From ralf at ark.in-berlin.de  Mon Oct  1 01:19:23 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 1 Oct 2007 10:19:23 +0200
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal to
	add	OpenDocument as an additional
Message-ID: <20071001081923.GA29575@ark.in-berlin.de>

I agree with bowerbird here:
> jeroen said:
> >    Some attempts are in progress to encode texts as TEI, 
> >    and automatically create text, html, and pdf from them. 
> 
> those attempts have been "in progress" for seven years now.
> i invite people to view the .html and .pdf files that are created.

The last release of the gnutenberg-press software was 2005.
There is no reply by the last maintainer on my eMails.
There is no place to report bugs which are plenty.
The package is effectively unmaintained.

I repeat here what I sent to Marcello:

I would maintain the package but I don't have a pglaf account,
and I know zilch about XSLT or stylesheets. I would play the
maintainer part (sf or berlios, your choice) and I could try my luck
with the LaTeX/PDF backend, though, if someone else does the rest of the
bugfixing.

This offer is up until the end of the month (2007-Oct).
I will not make this offer a third time.


Sincerely,
ralf


From marcello at perathoner.de  Mon Oct  1 07:01:51 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 01 Oct 2007 16:01:51 +0200
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to	add	OpenDocument as an additional
In-Reply-To: <20071001081923.GA29575@ark.in-berlin.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
Message-ID: <4700FDCF.1060009@perathoner.de>

Ralf Stephan wrote:
> I agree with bowerbird here:
>> jeroen said:
>>>    Some attempts are in progress to encode texts as TEI, 
>>>    and automatically create text, html, and pdf from them. 
>> those attempts have been "in progress" for seven years now.
>> i invite people to view the .html and .pdf files that are created.
> 
> The last release of the gnutenberg-press software was 2005.
> There is no reply by the last maintainer on my eMails.
> There is no place to report bugs which are plenty.
> The package is effectively unmaintained.
> 
> I repeat here what I sent to Marcello:
> 
> I would maintain the package but I don't have a pglaf account,
> and I know zilch about XSLT or stylesheets. I would play the
> maintainer part (sf or berlios, your choice) and I could try my luck
> with the LaTeX/PDF backend, though, if someone else does the rest of the
> bugfixing.

Maintenance of the "gnutenberg press" is on ice for the present.

1. I don't have much time.

2. The "gnutenberg press" is GPL. So if I don't maintain, you can. Other
people have already created derivative works.

3. My personal impression is that the adoption of TEI is not hampered by
the few small bugs in the conversion chain, but by the complete lack of
TEI authoring tools in DP. Those tools can be developed independently
from the conversion chain (aren't standards nice?). I don't know much
about DP internals, nor do I want to. Too little free time. Maybe if DP
"adopts" TEI and oodles of TEI books start pouring in, my motivation
will rise :-)

4. TEI version 5 is coming Real Soon Now. TEI 5 will have many backward
incompatibilities and new features added. No good to "maintain" now and
having to "re-maintain" later.

5. I'm fed up with buggy Perl XSL modules. I'm going to rewrite the next
version in Python. Also

6. the next version will use a standard presentational format (with
semantic hinting) as intermediate format, lets say: XSL-FO. So people
who want to write backends for their favourite pet formats (eg.
OpenDocument, epub,
that-other-noring-format-which-was-all-the-rage-some-months-ago-but-is-forgotten-now
etc.) can simply convert one open presentational standard into whatever
without having to worry about the "gnutenberg press"'s internal cogs.
The hard conversion between semantic TEI and presentational FO will be
the job of the "gnutenberg press". This all makes all output look more
consistent.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From rolsch at verizon.net  Mon Oct  1 08:31:06 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Mon, 01 Oct 2007 11:31:06 -0400
Subject: [gutvol-d]
 =?iso-8859-1?q?gnutenberg-press_maintenance_offer_=28w?=
 =?iso-8859-1?q?as_Re=3A_Proposal_to_add=09OpenDocument_as_an_additional?=
In-Reply-To: <4700FDCF.1060009@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4700FDCF.1060009@perathoner.de>
Message-ID: <200710011131.06953.rolsch@verizon.net>

On Monday 01 October 2007 10:01 am, Marcello Perathoner wrote:
> Ralf Stephan wrote:
> > I agree with bowerbird here:
> >> jeroen said:
> >>>    Some attempts are in progress to encode texts as TEI,
> >>>    and automatically create text, html, and pdf from them.
> >>
> >> those attempts have been "in progress" for seven years now.
> >> i invite people to view the .html and .pdf files that are created.
> >
> > The last release of the gnutenberg-press software was 2005.
> > There is no reply by the last maintainer on my eMails.
> > There is no place to report bugs which are plenty.
> > The package is effectively unmaintained.
> >
> > I repeat here what I sent to Marcello:
> >
> > I would maintain the package but I don't have a pglaf account,
> > and I know zilch about XSLT or stylesheets. I would play the
> > maintainer part (sf or berlios, your choice) and I could try my luck
> > with the LaTeX/PDF backend, though, if someone else does the rest of the
> > bugfixing.
>
> Maintenance of the "gnutenberg press" is on ice for the present.
>
> 1. I don't have much time.
>
> 2. The "gnutenberg press" is GPL. So if I don't maintain, you can. Other
> people have already created derivative works.

If a maintained version of "gnutenberg press" is created.  Will it become the 
official one used at PG.

> 3. My personal impression is that the adoption of TEI is not hampered by
> the few small bugs in the conversion chain, but by the complete lack of
> TEI authoring tools in DP. Those tools can be developed independently
> from the conversion chain (aren't standards nice?). I don't know much
> about DP internals, nor do I want to. Too little free time. Maybe if DP
> "adopts" TEI and oodles of TEI books start pouring in, my motivation
> will rise :-)

IHMO, I have noted a rise of interest by a number of Post-Processors to use 
TEI to Post-Process their projects.  However, their first impression of the 
conversion chain has been their biggest disappointment.  I will agree that a 
TEI version of Guiguts would go a long way in helping adoption of TEI.

> 4. TEI version 5 is coming Real Soon Now. TEI 5 will have many backward
> incompatibilities and new features added. No good to "maintain" now and
> having to "re-maintain" later.

I may be a good idea after TEI 5 comes out to establish a working group to 
discuss, design and write a new PG conversion tool and a DP to TEI tool.

> 5. I'm fed up with buggy Perl XSL modules. I'm going to rewrite the next
> version in Python. Also

By all means please write it in Python.

> 6. the next version will use a standard presentational format (with
> semantic hinting) as intermediate format, lets say: XSL-FO. So people
> who want to write backends for their favourite pet formats (eg.
> OpenDocument, epub,
> that-other-noring-format-which-was-all-the-rage-some-months-ago-but-is-forg
>otten-now etc.) can simply convert one open presentational standard into
> whatever without having to worry about the "gnutenberg press"'s internal
> cogs. The hard conversion between semantic TEI and presentational FO will
> be the job of the "gnutenberg press". This all makes all output look more
> consistent.

From ralf at ark.in-berlin.de  Mon Oct  1 09:59:28 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 1 Oct 2007 18:59:28 +0200
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re:	Proposal
	to add?OpenDocument as an additional
In-Reply-To: <200710011131.06953.rolsch@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4700FDCF.1060009@perathoner.de>
	<200710011131.06953.rolsch@verizon.net>
Message-ID: <20071001165928.GA18806@ark.in-berlin.de>

Thanks Marcello for a clear statement.

Roland:
> If a maintained version of "gnutenberg press" is created.  Will it become the 
> official one used at PG.

As Marcello intends to rewrite the package, even collecting bug
reports for 0.4 is useless. I draw back the maintenance offer
for 0.4 but will offer for the next version to launch pages on
sourceforge or berlios.de for collecting bugs (at least) and to
have a specific forum.

However, a help text for the DP Wiki for 0.4 users is still a 
good idea, and after finishing a TEI drama project, I think I'll
have some stuff to write.


Regards,
ralf


From jeroen.mailinglist at bohol.ph  Mon Oct  1 11:34:23 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Mon, 01 Oct 2007 20:34:23 +0200
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to	add	OpenDocument as an additional
In-Reply-To: <4700FDCF.1060009@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4700FDCF.1060009@perathoner.de>
Message-ID: <47013DAF.5090400@bohol.ph>

Marcello Perathoner wrote:
> Maintenance of the "gnutenberg press" is on ice for the present.
> 1. I don't have much time.
>
>   
A common problem. Sometimes I feel I spend too much time on preparing
actual texts in TEI, and too little on tool development, but then, I
want my texts to be TEI.

> 2. The "gnutenberg press" is GPL. So if I don't maintain, you can. Other
> people have already created derivative works.
>
> 3. My personal impression is that the adoption of TEI is not hampered by
> the few small bugs in the conversion chain, but by the complete lack of
> TEI authoring tools in DP. Those tools can be developed independently
> from the conversion chain (aren't standards nice?). I don't know much
> about DP internals, nor do I want to. Too little free time. Maybe if DP
> "adopts" TEI and oodles of TEI books start pouring in, my motivation
> will rise :-)
>
>   
I think the biggest barrier here is the steep learning curve of TEI (20%
of the tags cover 80% of the things you encounter, but every other book
you will need something from those remaining 80%, and, oh gosh, which
tag can I use then), combined with the fact that it is a far stretch
from TEI to WYSIWYG. Maybe somebody can help build an authoring tool,
but, in my opinion, it should not even try to be WYSIWYG, as with proper
tagged text, you get much more than you can see in one view....  WYGIMMTWYS

> 4. TEI version 5 is coming Real Soon Now. TEI 5 will have many backward
> incompatibilities and new features added. No good to "maintain" now and
> having to "re-maintain" later.
>   
Their will be migration paths from P3 to version 5, supported by XSLT,
etc. However, version 5 will mainly add much awaited features.

> 5. I'm fed up with buggy Perl XSL modules. I'm going to rewrite the next
> version in Python. Also
>   
Yep, they are a pain, as are those in PHP.
> 6. the next version will use a standard presentational format (with
> semantic hinting) as intermediate format, lets say: XSL-FO. So people
> who want to write backends for their favourite pet formats 
>  can simply convert one open presentational standard into whatever
> without having to worry about the "gnutenberg press"'s internal cogs.
> The hard conversion between semantic TEI and presentational FO will be
> the job of the "gnutenberg press". This all makes all output look more
> consistent.
>
>
>   
I found XSL-FO some kind of overkill for most projects, and have had
good results using Prince (unfortunately a non-free tool, supporting
many CSS3 features) with the generated HTML (or actually, somewhat
modified generated HTML), to get to a printable PDF.

Jeroen.


From Bowerbird at aol.com  Mon Oct  1 12:48:30 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 1 Oct 2007 15:48:30 EDT
Subject: [gutvol-d] more good news
Message-ID: <d05.1afcf6a0.3432a90e@aol.com>

there's more good news on the light-markup revolution.

john macfarlane, out of uberkeley, reports:
>    I've put together two small web apps to demonstrate pandoc

"pandoc" does conversions between various other formats.

john says:
>    http://johnmacfarlane.net/pandoc/html2x.html
>    can convert most web pages to markdown, reStructuredText, 
>    DocBook, LaTeX, ConTeXt, RTF, or groff man. 

he continues:
>    http://johnmacfarlane.net/pandoc/try
>    allows you to experiment with pandoc 
>    without going to the trouble of 
>    installing it on your system.

these are great steps forward for people exploring light-markup.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071001/02211ccd/attachment.htm 

From julio.reis at tintazul.com.pt  Mon Oct  1 13:43:00 2007
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio_Reis?=)
Date: Mon, 01 Oct 2007 21:43:00 +0100
Subject: [gutvol-d] PG-E shows text, encoding jumbled
In-Reply-To: <mailman.2.1191265202.24007.gutvol-d@lists.pglaf.org>
References: <mailman.2.1191265202.24007.gutvol-d@lists.pglaf.org>
Message-ID: <47015BD4.20809@tintazul.com.pt>

I've checked today, and PG-Europe seems to be... huh... different than 
it was. Still not OK, for sure. It's good to see that apparently someone 
fiddled with it -- so there's hope that DP-E might get fixed.

I used to search for texts in Portuguese and get the full list. The text 
pages themselves were then unreachable (following a link to any text 
returned an empty document.) Now when I search for Portuguese, I get... 
one record at a time. Go ahead, try it. 
http://pge.rastko.net/catalog/world/search

The upshot is -- now I can /actually /follow the link to the e-text 
page, and I can click to download the document. Dandy! But the encoding 
is wrong. For instance, in http://pge.rastko.net/etext/7384 there are 
utf-7 and iso-8859-1 versions of the /Carta da Companhia /by Jos? de 
Anchieta. The guy's name inside the file shows as "Jos+AOk- de 
Anchieta". It's the same in either encoding.

The number of documents listed in the /Language/ drop-down list box 
doesn't match the records found on the database. For instance, it reads 
"Portuguese (77)" and when I search I get the message "85 headings 
found" (and then I am given a single record at a time, of course. :-) 
That is a minor issue, though.

When I search for texts in French, after a few minutes I get: "More than 
1000 records found. Please refine your query." That's minor too, because 
I was only checking what would happen if I searched for other languages. 
Now if I were /actually /meaning to find texts in French, it would be 
not minor, but a major issue. The huge time lag smells like a 
badly-programmed query, or an inefficient database.

Thanks to the people who give their time to fix stuff like this... DP-E 
is really important; there are huge chunks of European recent literary 
history which can't yet be released in the USA. So kudos to all the 
Euro-guys who are taking care of the Euro-stuff...

Tintazul.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071001/2bc8f33b/attachment.htm 

From desrod at gnu-designs.com  Mon Oct  1 16:29:47 2007
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Mon, 01 Oct 2007 19:29:47 -0400
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re:	Proposal
	to add	OpenDocument as an additional
In-Reply-To: <200710011131.06953.rolsch@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4700FDCF.1060009@perathoner.de>
	<200710011131.06953.rolsch@verizon.net>
Message-ID: <1191281387.9646.2.camel@localhost.localdomain>


On Mon, 2007-10-01 at 11:31 -0400, Roland Schlenker wrote:
> > 5. I'm fed up with buggy Perl XSL modules. I'm going to rewrite the
> > next version in Python.

> By all means please write it in Python.

Great idea! Let's trade the "buggy" XSL modules in Perl with memory
leaks in Python instead. 

Can you point me to the bug reports you've filed against these XSL
modules, so I can test/follow-up on them myself? Thanks.


-- 
David A. Desrosiers
desrod at gnu-designs.com
setuid at gmail.com
http://projects.plkr.org/
Skype...: 860-967-3820
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071001/af4f38ee/attachment.pgp 

From desrod at gnu-designs.com  Mon Oct  1 16:51:15 2007
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Mon, 01 Oct 2007 19:51:15 -0400
Subject: [gutvol-d] more good news
In-Reply-To: <d05.1afcf6a0.3432a90e@aol.com>
References: <d05.1afcf6a0.3432a90e@aol.com>
Message-ID: <1191282675.9646.9.camel@localhost.localdomain>

On Mon, 2007-10-01 at 15:48 -0400, Bowerbird at aol.com wrote:
> these are great steps forward for people exploring light-markup.

This reminds me of those tools that purport to convert PG texts to HTML,
by slapping <html><body> at the top and </body></html> at the bottom,
and calling it done. 

I tried a bunch of documents through the conversion, including DocBook
and Markdown, and what it produced... was... how shall I say.. short of
the mark I expected. 

This is a great start though.. and to draw a parallel, rss feeds are
causing people to think about a whole new way of reproducing, writing
and sharing news/blogs/etc. with other people.

There will probably always be two camps, light and heavy markup. I'm in
the heavy-markup camp, simply because I haven't yet seen the proof that
all of the necessary semantics a complex document requires can be
reproduced purely with light markup. Add to this that light markup
output can be produced from heavy markup, but not the reverse. 

If given a choice, I'd rather have the book, than the CliffNotes. I can
always produce my own CliffNotes from the book, but I can't create the
book from the CliffNotes. 


-- 
David A. Desrosiers
desrod at gnu-designs.com
setuid at gmail.com
http://projects.plkr.org/
Skype...: 860-967-3820
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071001/5b0460ee/attachment.pgp 

From Bowerbird at aol.com  Mon Oct  1 17:19:11 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 1 Oct 2007 20:19:11 EDT
Subject: [gutvol-d] more good news
Message-ID: <c20.20656c65.3432e87f@aol.com>

david said:
>    There will probably always be two camps, light and heavy markup. 
>    I'm in the heavy-markup camp, simply because I haven't yet seen 
>    the proof that all of the necessary semantics a complex document 
>    requires can be reproduced purely with light markup. 

show me that complex document.   preferably one already in p.g.
or, if you're willing to o.c.r. and correct it, _any_ google scan-set.


>    Add to this that light markup output 
>    can be produced from heavy markup, 
>    but not the reverse.

well, first of all, your first part hasn't been adequately demonstrated.
and second of all, your second part has not been fully _disproven_...

but set all of that aside.   if the markup could be applied by _magic_,
i would chose the heavy markup too!   (and then convert it to light.)

the point is that it's more costly to apply and maintain heavy markup.
_much_ more costly.   so much more costly we actually can't afford it.
and that added cost is for benefits that haven't yet proven themselves.

if you're living in fairy-tale land, then sure, you take the heavy-markup.

but for those of us living in the real world, light-markup is a _bargain_
that costs 15% as much and delivers 85% of the benefits.   easy decision.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071001/7a92cc4b/attachment.htm 

From jon at noring.name  Tue Oct  2 07:23:36 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 2 Oct 2007 08:23:36 -0600
Subject: [gutvol-d] Update on the use of captchas to "proof" digital texts
Message-ID: <56977291.20071002082336@noring.name>

http://news.bbc.co.uk/1/hi/technology/7023627.stm


From desrod at gnu-designs.com  Tue Oct  2 07:40:01 2007
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Tue, 02 Oct 2007 10:40:01 -0400
Subject: [gutvol-d] Update on the use of captchas to "proof"
	digital	texts
In-Reply-To: <56977291.20071002082336@noring.name>
References: <56977291.20071002082336@noring.name>
Message-ID: <1191336001.9646.20.camel@localhost.localdomain>

On Tue, 2007-10-02 at 08:23 -0600, Jon Noring wrote:
> http://news.bbc.co.uk/1/hi/technology/7023627.stm

I've been using reCaptcha on my Wordpress blog for quite some time now,
and I highly recommend it to anyone else who has a well-trafficked blog.

http://recaptcha.net/plugins/wordpress/


-- 
David A. Desrosiers
desrod at gnu-designs.com
setuid at gmail.com
http://projects.plkr.org/
Skype...: 860-967-3820
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071002/9c6d15b3/attachment.pgp 

From Bowerbird at aol.com  Tue Oct  2 14:34:53 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 2 Oct 2007 17:34:53 EDT
Subject: [gutvol-d] if the entire population of the earth was a village of
	100 people
Message-ID: <d1f.fb030df.3434137d@aol.com>

michael-

this will give you some updates you've been looking for:
>    http://www.miniature-earth.com/

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071002/866938fb/attachment.htm 

From hart at pglaf.org  Wed Oct  3 10:46:16 2007
From: hart at pglaf.org (Michael Hart)
Date: Wed, 3 Oct 2007 10:46:16 -0700 (PDT)
Subject: [gutvol-d] if the entire population of the earth was a village
 of 100 people
In-Reply-To: <d1f.fb030df.3434137d@aol.com>
References: <d1f.fb030df.3434137d@aol.com>
Message-ID: <Pine.LNX.4.64.0710031045500.11617@pglaf.org>


Does anyone have these in plain text nubmers?

mh


On Tue, 2 Oct 2007, Bowerbird at aol.com wrote:

> michael-
>
> this will give you some updates you've been looking for:
>>    http://www.miniature-earth.com/
>
> -bowerbird
>
>
>
> **************************************
> See what's new at http://www.aol.com
>

From Bowerbird at aol.com  Wed Oct  3 11:00:51 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 3 Oct 2007 14:00:51 EDT
Subject: [gutvol-d] sony stupidity
Message-ID: <c3e.1e78e464.343532d3@aol.com>

sony says ripping your own cd to mp3 is "stealing":
>    
http://arstechnica.com/news.ars/post/20071002-sony-bmgs-chief-anti-piracy-lawyer-copying-music-you-own-is-stealing.html

in court, under oath, jennifer pariser said that.
and she's the head of litigation for sony b.m.g.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071003/d4eda06f/attachment.htm 

From Bowerbird at aol.com  Thu Oct  4 10:05:57 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 4 Oct 2007 13:05:57 EDT
Subject: [gutvol-d] give 1, get fuzzy
Message-ID: <bf3.1611f26c.34367775@aol.com>

you've probably heard that the o.l.p.c. program
will be offering a give-1-get-1 deal next month,
where $399 will purchase one o.l.p.c. machine
for a needy child and one for yourself...   great!

the offer starts november 12th.   it's scheduled
to run for two weeks, or until they sell a certain
number of units, and some people think that
they will reach that number of units _quickly_
-- some predictions say it'll be the first day --
so if you really want one, plan to buy _early_...

but if you can afford to give a $199 machine
to a needy child without getting one yourself,
you can do it right _now_, without any waiting:
>    http://www.xogiving.org

you might not get a machine, but i guarantee
you will get a gratifying warm fuzzy feeling...

i haven't seen a more inspirational tech project
than this one in recent memory, maybe never...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071004/b2e092be/attachment.htm 

From Bowerbird at aol.com  Thu Oct  4 12:58:06 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 4 Oct 2007 15:58:06 EDT
Subject: [gutvol-d] how much does it cost to print a book?
Message-ID: <be2.1e90f0f4.34369fce@aol.com>

michael-

it looks like you could use some information
on how much it typically costs to print a book:
>    http://z-m-l.com/misc/printerpricing.html

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071004/2b9da5c7/attachment.htm 

From lee at novomail.net  Thu Oct  4 15:12:18 2007
From: lee at novomail.net (Lee Passey)
Date: Thu, 04 Oct 2007 16:12:18 -0600
Subject: [gutvol-d] more good news
In-Reply-To: <c20.20656c65.3432e87f@aol.com>
References: <c20.20656c65.3432e87f@aol.com>
Message-ID: <47056542.4000002@novomail.net>

Bowerbird at aol.com wrote:

> the point is that it's more costly to apply and maintain heavy markup.
> _much_ more costly.

You know, I have never bought this argument. Personally, I find it much 
easier and cheaper to maintain XHTML that some kind of structured text 
which relies on subtle markup that can easily be mistaken for content. 
The use of subtle distinctions in text as markup has been one of my 
greatest sources of annoyance (and lost productivity) in trying to 
"proofread" for Distributed Proofreaders. I would much rather have an 
"in-your-face" <h3> tag for a chapter heading than being forced to move 
the cursor into a text field and counting the number of blank lines 
before and after a line. And is it one space or two, or maybe a tab 
character, that signals that word-wrapping should be turned off? 
"In-your-face" markup may be distracting if you're trying to read around 
it, but if you're looking for, and trying to manipulate, the markup, 
having it blatant is a huge time saver and great check against errors.

Additionally, there are all sorts of great tools to maintain XML files, 
and virtually nothing to parse or check Plain Text Markup Languages.

I would be interested in knowing what other people's experiences have 
been in the maintenance of full featured, obvious markup languages vs. 
"lite", subtle markup languages.

-- 
Nothing of significance below this line.


From lee at novomail.net  Thu Oct  4 15:20:18 2007
From: lee at novomail.net (Lee Passey)
Date: Thu, 04 Oct 2007 16:20:18 -0600
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to	add	OpenDocument as an additional
In-Reply-To: <47013DAF.5090400@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>	<4700FDCF.1060009@perathoner.de>
	<47013DAF.5090400@bohol.ph>
Message-ID: <47056722.9000801@novomail.net>

Jeroen Hellingman (Mailing List Account) wrote:

> I think the biggest barrier here is the steep learning curve of TEI (20%
> of the tags cover 80% of the things you encounter, but every other book
> you will need something from those remaining 80%, and, oh gosh, which
> tag can I use then) ....

I am intrigued by this comment (and not only because it mirrors my own 
experience). So by way of information gathering among those who use TEI 
on a regular basis, I would you to tell me, perhaps simply as an ordered 
list, what TEI tags you believe are most used and most valuable (not 
necessarily the same thing). In other words, what are the 20% of the 
tags that cover 80% of the need, and from the remaining 80% what seems 
to come up the most often?

I'm thinking of writing a little script that will try to automate the 
collection of usage data from current Gutenberg TEI texts.

-- 
Nothing of significance below this line.


From jon at noring.name  Thu Oct  4 18:14:13 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 4 Oct 2007 19:14:13 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <47056722.9000801@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4700FDCF.1060009@perathoner.de> <47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>
Message-ID: <392096647.20071004191413@noring.name>

Lee wrote:
> Jeroen Hellingman wrote:

>> I think the biggest barrier here is the steep learning curve of TEI
>> (20% of the tags cover 80% of the things you encounter, but every
>> other book you will need something from those remaining 80%, and,
>> oh gosh, which tag can I use then) ....

> I am intrigued by this comment (and not only because it mirrors my
> own experience). So by way of information gathering among those who
> use TEI on a regular basis, I would you to tell me, perhaps simply
> as an ordered list, what TEI tags you believe are most used and most
> valuable (not necessarily the same thing). In other words, what are
> the 20% of the tags that cover 80% of the need, and from the
> remaining 80% what seems to come up the most often?
>
> I'm thinking of writing a little script that will try to automate
> the collection of usage data from current Gutenberg TEI texts.

Regarding the last paragraph Lee wrote, I think that's a splendid
idea, to see what elements, attributes and attribute values have been
used, doing a statistical analysis of their usage. I hope the analysis
will also look at content models, maybe building a minimal DTD to which
all the documents will validate, (I believe there are tools which will
build a common DTD from a set of XML documents -- hmmm, maybe that's
the approach to take first, then one can do a statistical analysis by
comparing to this minimal DTD. Also, inspection of that DTD will
provide insights.)

*****

Regarding the rest of what Lee wrote, a few days ago I outlined in a
private email to Lee some preliminary ideas, which I'll restate here
for discussion purposes:

The gist of the idea is that a group of us create a very strict subset
of TEI: elements, attributes and *standardized* attribute values, and
constrained element content models, along with any other markup usage
rules that cannot be enforced by a DTD or schema. This subset and
associated ruleset would be sufficient to consistently, uniformly, and
in standardized fashion (especially attribute values), markup 80% or
90% of the public domain books which PG/DP works with. A related goal
(if possible to achieve) would be that when different people
independently markup the same book, and follow the rules, the marked
up documents would be, for all practical purposes, canonically
identical with one another.

Furthermore, we would actually build our own DTD or schema so that
those authoring to this strictly applied subset could immediately
validate to it. Also, we could write a script to do conformance
checking to check any other requirements that cannot be enforced by
DTD or schema -- a sort of "conformance chekcer." We could also write
a brief document describing how to markup documents using this subset
vocabulary, and minimizing the need for people learning to markup
using our simple subset to have to slog through the TEI manual to
figure out how to do something.

And for most (but clearly not all) of the remaining texts, we could
slowly build a "superset" of the basic DTD, so that at least the more
complicated books follow the strict subset in uniformity of basic
markup. This superset could be slowly built over time.

The benefit of this approach is that we can now involve more people in
marking up books, have the validation tools, and provide a much more
uniform basis by which authoring tool and conversion tools can be
built more quickly. The problem with the full-blown TEI, and even
TEI-Lite (which is not "Lite"), is that it is so massive, and the
manual so difficult to comprehend without spending a year studying it,
that those trying to build dedicated authoring tools and conversion
tools to handle all possibilities is much more difficult. And I'm not
advocating not using the full blown TEI for the extremely funky texts,
but let's at least standardize on something simple and uniform for the
vast majority of the books.

Anyway, that's the core of the idea. And not actually new as Josh has
mentioned it in some fashion, but maybe what is proposed here has a
few twists on what previously has been proposed.

And I'm willing to help build the DTD (I prefer DTDs since they should
be sufficient for this purpose and there are other advantages to DTD
over schema which I won't get into here.) I am quite experienced in
building DTDs, having built by hand the OEBPS 1.2 Document and
Package DTDs, the OpenReader Binder DTD, and the BookX DTD (which has
a similar philosophy to that described above but focused on new book
publishers, clearly not for use by PG/DP for reasons I won't delve
into.)

Jon Noring


(p.s., since TEI P5 is soon to be released -- it is currently at
version 0.9 and may be elevated to 1.0 at this years TEI Annual
Meeting in November -- our subset should definitely be built on P5.)


From rolsch at verizon.net  Thu Oct  4 18:33:42 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Thu, 04 Oct 2007 21:33:42 -0400
Subject: [gutvol-d]
 =?iso-8859-1?q?gnutenberg-press_maintenance_offer_=28w?=
 =?iso-8859-1?q?as_Re=3A_Proposal_to_add=09OpenDocument_as_an_additional?=
In-Reply-To: <47056722.9000801@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
Message-ID: <200710042133.43012.rolsch@verizon.net>

On Thursday 04 October 2007 6:20 pm, Lee Passey wrote:
> Jeroen Hellingman (Mailing List Account) wrote:
> > I think the biggest barrier here is the steep learning curve of TEI (20%
> > of the tags cover 80% of the things you encounter, but every other book
> > you will need something from those remaining 80%, and, oh gosh, which
> > tag can I use then) ....
>
> I am intrigued by this comment (and not only because it mirrors my own
> experience). So by way of information gathering among those who use TEI
> on a regular basis, I would you to tell me, perhaps simply as an ordered
> list, what TEI tags you believe are most used and most valuable (not
> necessarily the same thing). In other words, what are the 20% of the
> tags that cover 80% of the need, and from the remaining 80% what seems
> to come up the most often?

In my experience, it is not the simple matter of elements but, of the usage of 
which element to use for a peculiar piece of text.  I use a python program, 
which marks up about 90% of a DP formatted text.  So, my time is spent 
marking up the non-general use cases.

I use a minimal TEI file as a test file to check how a peculiar piece of text 
will render in the three current formats.  Then, when the TEI markup is 
correct, I copy it into the master TEI file.

The following are some examples from a condensed test file:


  <div rend="page-break-before: right" type="edition">
    <pb n="1"/><anchor id="Pg1"/>

    <p rend="text-align: center; text-indent: 0; font-size: 150%"><hi
    rend="font-variant: small-caps">Marcia Schuyler</hi><lb/><lb/></p>

    <p rend="text-align: center; text-indent: 0"><hi rend="font-size:
    75%">SIXTH EDITION</hi></p>
  </div>

  <div rend="page-break-before: always" type="frontispiece">
    <pb n="4"/><anchor id="Pg4"/>
    <p>
      <figure url="images/image01.png" rend="page-float: 'htb'; text-align:
      center; width: 100%">
        <head><hi rend="font-size: 50%">Copyright by C. Klackner</hi><lb/>
        <hi rend="font-variant: small-caps; font-size: 63%"><q>Oh, You Naughty
        Man!</q> She Exclaimed Prettily, <q>How Dare You!</q></hi></head>
        <figDesc>Illustration: Copyright by C. Klackner<lb/><hi
        rend="font-variant: small-caps"><q>Oh, You Naughty Man!</q> She
        Exclaimed Prettily, <q>How Dare You!</q></hi></figDesc>
      </figure>
    </p>
  </div>

  <div rend="page-break-before: right" type="titlepage">
    <pb n="5"/><anchor id="Pg5"/>

    <p rend="text-align: center; text-indent: 0; font-size: 250%">Marcia
    Schuyler<lb/><lb/></p>

    <p rend="text-align: center; text-indent: 0; font-size:
    75%">by<lb/><lb/><hi rend="font-size: 150%">Grace Livingston Hill
    Lutz</hi><lb/>Author of <q>The Story of a Whim,</q> <q>According to
    the<lb/>Pattern,</q> <q>An Unwilling Guest,</q> etc.<lb/><lb/></p>

    <p rend="text-align: center; text-indent: 0; font-size: 75%"><hi
    rend="font-style: italic">Illustrations by</hi><lb/>E. L. HENRY,
    N.A.<lb/><lb/></p>

    <p rend="page-float: 'b'; text-align: center; text-indent: 0">GROSSET
    &amp; DUNLAP<lb/><hi rend="font-size: 75%">PUBLISHERS &middot;
    NEW YORK</hi></p>
  </div>

  <div rend="page-break-before: always" type="verso">
    <pb n="6"/><anchor id="Pg6"/>

    <p rend="text-align: center; text-indent: 0; font-size: 50%">Copyright,
    1908<lb/>By J. B. Lippincott Company<lb/><lb/></p>

    <p rend="text-align: center; text-indent: 0; font-size: 50%">Published
    February, 1908<lb/><lb/></p>

    <p rend="page-float: 'b'; text-align: center; text-indent: 0; font-size:
    50%"><hi rend="font-style: italic">Electrotyped and printed by J. B.
    Lippincott Company<lb/>The Washington Square Press, Philadelphia, U. S.
    A.</hi></p>
  </div>

  <div rend="page-break-before: right" type="dedication">
    <pb n="7"/><anchor id="Pg7"/>

    <p rend="text-align: center; text-indent: 0; font-size: 50%">TO<lb/>THE
    DEAR MEMORY OF<lb/>MY FATHER<lb/><hi rend="font-size: 75%">The Rev.
    CHARLES MONTGOMERY LIVINGSTON</hi><lb/>WHOSE COMPANIONSHIP AND
    ENCOURAGEMENT<lb/>HAVE BEEN MY HELP THROUGH<lb/>THE YEARS</p>
  </div>


<p>The Squire with deepening frown was studying his elder
senses that a girl of his could be so heartless.</p>

<quote rend="display; pre: none; post: none">
  <p><q><hi rend="font-variant: small-caps">Dear David</hi>,</q>
  the letter ran,&mdash;written as though in a hurry, done at the last
  moment,&mdash;which indeed it was:&mdash;</p>

  <p><q rend="post: none">I want you to forgive me for what I am
  doing. I know you will feel bad about it, but really I never was the right
  one for you. I&rsquo;m sure you thought me all too good, and I never could
  have stayed in a strait-jacket, it would have
  <pb n="51"/><anchor id="pg51"/>
  killed me. I shall always consider you the best man in the world, and I
  like you better than anyone else except Captain Leavenworth. I can&rsquo;t
  help it, you know, that I care more for him than anyone else, though
  I&rsquo;ve tried. So I am going away to-night and when you read this we
  shall have been married. You are so very good that I know you will forgive
  me, and be glad I am happy. Don&rsquo;t think hardly of me for I always did
  care a great deal for you.</q></p>

  <p rend="text-align: right; margin-right: 5"><q rend="post: none">Your
  loving</q></p>

  <p rend="text-align: right"><q><hi rend="font-variant:
  small-caps">Kate.</hi></q></p>
</quote>

<p>It was characteristic of Kate that she demanded the love
the mantel-piece.</p>


<p>They waxed a trifle sentimental at the parting, but when
spirit without a guiding star.</p>

<quote rend="display; pre: none; post: none">
  <p><q><hi rend="font-variant: small-caps">Dear Lemuel</hi>:</q> she
  wrote:&mdash;</p>

  <p><q rend="post: none">I am coming home. I wonder if you will be
  glad?</q></p>
</quote>

<p rend="text-indent: 0">(Artful Hannah, as if she did not know!)</p>

<quote rend="display; pre: none; post: none">
  <p><q rend="post: none">It is very delightful in New York and I have been
  having a gay time since I came, and everybody has been most pleasant,
  but&mdash;</q></p>

  <lg rend="margin-left: 2" type="verse">
    <l>&ldquo;&rsquo;Mid pleasures and palaces though we may roam,</l>
    <l>Still, be it ever so humble, there&rsquo;s no place like home.</l>
    <l>A charm from the skies seems to hallow it there,</l>
    <l>Which, go through the world, you&rsquo;ll not meet with elsewhere.</l>
    <l rend="text-align: center">Home, home, sweet home!</l>
    <l rend="text-align: center">There&rsquo;s no place like
    home.&rsquo;[**PM typo: no &rsquo;]</l>
  </lg>

  <p><q rend="post: none">That is a new song, Lemuel, that everybody
  here is singing. It is written by a young American named John Howard Payne
  who is in London now acting in a great playhouse. Everybody is wild over
  this song. I&rsquo;ll sing it for you when I come home.</q></p>

  <p><q rend="post: none">I shall be at home in time for singing school
  next week, Lemuel. I wonder if you&rsquo;ll come to see me at once and
  welcome me. You cannot think how glad I shall be to get home again. It
  seems as though I had been gone a year at least. Hoping to see you soon, I
  remain</q></p>

  <p rend="text-align: right; margin-right: 5"><q rend="post: none">Always
  your sincere friend,</q></p>

  <p rend="text-align: right"><q><hi rend="font-variant: small-caps">Hannah
  Heath.</hi></q></p>
</quote>

<pb n="256"/><anchor id="pg256"/>

<p>And thus did Hannah make smooth her path before her,
further time in chasing will-o-the-wisps.</p>


<p>It did not take her long to reduce the dinner table to
one they sang in school,</p>

<lg rend="margin-left: 13" type="song">
  <l rend="margin-left: -1">&ldquo;Sister, thou wast mild and lovely,</l>
  <l rend="margin-left: 2">Gentle as the summer breeze,</l>
  <l>Pleasant as the air of evening</l>
  <l rend="margin-left: 2">When it floats among the trees.&rdquo;</l>
</lg>

<p rend="text-indent: 0">But the first words set her to thinking of her own
sister, and girl for whom that song was written.</p>

Roland Schlenker


From rolsch at verizon.net  Thu Oct  4 19:29:39 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Thu, 04 Oct 2007 22:29:39 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <392096647.20071004191413@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47056722.9000801@novomail.net> <392096647.20071004191413@noring.name>
Message-ID: <200710042229.39180.rolsch@verizon.net>

On Thursday 04 October 2007 9:14 pm, Jon Noring wrote:
> Lee wrote:
> > Jeroen Hellingman wrote:
> >> I think the biggest barrier here is the steep learning curve of TEI
> >> (20% of the tags cover 80% of the things you encounter, but every
> >> other book you will need something from those remaining 80%, and,
> >> oh gosh, which tag can I use then) ....
> >
> > I am intrigued by this comment (and not only because it mirrors my
> > own experience). So by way of information gathering among those who
> > use TEI on a regular basis, I would you to tell me, perhaps simply
> > as an ordered list, what TEI tags you believe are most used and most
> > valuable (not necessarily the same thing). In other words, what are
> > the 20% of the tags that cover 80% of the need, and from the
> > remaining 80% what seems to come up the most often?
> >
> > I'm thinking of writing a little script that will try to automate
> > the collection of usage data from current Gutenberg TEI texts.
>
> Regarding the last paragraph Lee wrote, I think that's a splendid
> idea, to see what elements, attributes and attribute values have been
> used, doing a statistical analysis of their usage. I hope the analysis
> will also look at content models, maybe building a minimal DTD to which
> all the documents will validate, (I believe there are tools which will
> build a common DTD from a set of XML documents -- hmmm, maybe that's
> the approach to take first, then one can do a statistical analysis by
> comparing to this minimal DTD. Also, inspection of that DTD will
> provide insights.)
>
> *****
>
> Regarding the rest of what Lee wrote, a few days ago I outlined in a
> private email to Lee some preliminary ideas, which I'll restate here
> for discussion purposes:
>
> The gist of the idea is that a group of us create a very strict subset
> of TEI: elements, attributes and *standardized* attribute values, and
> constrained element content models, along with any other markup usage
> rules that cannot be enforced by a DTD or schema. This subset and
> associated ruleset would be sufficient to consistently, uniformly, and
> in standardized fashion (especially attribute values), markup 80% or
> 90% of the public domain books which PG/DP works with. A related goal
> (if possible to achieve) would be that when different people
> independently markup the same book, and follow the rules, the marked
> up documents would be, for all practical purposes, canonically
> identical with one another.

I have been following my rule "make look like the original book".  Thou, I 
have thought at times that what I am producing is a PG edition of the 
original book.  That is, it would be nice if all PG books had the same look 
and feel, like a book from O'Rielly Publishers.
 
>
> Furthermore, we would actually build our own DTD or schema so that
> those authoring to this strictly applied subset could immediately
> validate to it. Also, we could write a script to do conformance
> checking to check any other requirements that cannot be enforced by
> DTD or schema -- a sort of "conformance chekcer." We could also write
> a brief document describing how to markup documents using this subset
> vocabulary, and minimizing the need for people learning to markup
> using our simple subset to have to slog through the TEI manual to
> figure out how to do something.

IMO, is this not what we have a the present time, PGTEI is a minimal subset of 
TEI.  The "Guide to PGTEI" a document describing PGTEI.

In my short time using TEI, having marked up five books of simple fiction.  I 
have encountered enough non-general use cases, that have not been covered by 
PGTEI and the "Guide to PGTEI".  That have required me to seek for more 
information in TEI-Lite and TEI4 to mark them up.

>
> And for most (but clearly not all) of the remaining texts, we could
> slowly build a "superset" of the basic DTD, so that at least the more
> complicated books follow the strict subset in uniformity of basic
> markup. This superset could be slowly built over time.
>
> The benefit of this approach is that we can now involve more people in
> marking up books, have the validation tools, and provide a much more
> uniform basis by which authoring tool and conversion tools can be
> built more quickly. The problem with the full-blown TEI, and even
> TEI-Lite (which is not "Lite"), is that it is so massive, and the
> manual so difficult to comprehend without spending a year studying it,
> that those trying to build dedicated authoring tools and conversion
> tools to handle all possibilities is much more difficult. And I'm not
> advocating not using the full blown TEI for the extremely funky texts,
> but let's at least standardize on something simple and uniform for the
> vast majority of the books.
>
> Anyway, that's the core of the idea. And not actually new as Josh has
> mentioned it in some fashion, but maybe what is proposed here has a
> few twists on what previously has been proposed.
>
> And I'm willing to help build the DTD (I prefer DTDs since they should
> be sufficient for this purpose and there are other advantages to DTD
> over schema which I won't get into here.) I am quite experienced in
> building DTDs, having built by hand the OEBPS 1.2 Document and
> Package DTDs, the OpenReader Binder DTD, and the BookX DTD (which has
> a similar philosophy to that described above but focused on new book
> publishers, clearly not for use by PG/DP for reasons I won't delve
> into.)
>
> Jon Noring
>
>
> (p.s., since TEI P5 is soon to be released -- it is currently at
> version 0.9 and may be elevated to 1.0 at this years TEI Annual
> Meeting in November -- our subset should definitely be built on P5.)

IMO, I think that there are two obstacles to the adoption of TEI.  One, there 
is no conversion tool such as Guiguts to convert a DP formatted text to TEI.  
Two, DP has very strong community of PPers, who know how to mark up a DP 
formatted text into HTML.  I believe a conversion tool to TEI and a very 
helpfull group of DPers, well versed in TEI, are needed to further the 
adoption of TEI.

Roland Schlenker


From vze3rknp at verizon.net  Thu Oct  4 19:40:31 2007
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Thu, 04 Oct 2007 22:40:31 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <392096647.20071004191413@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4700FDCF.1060009@perathoner.de> <47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net> <392096647.20071004191413@noring.name>
Message-ID: <4705A41F.2060708@verizon.net>

In thinking about persuading DP volunteers to use TEI, one area to 
consider is the periodicals. They raise interesting formatting problems, 
and have lots and lots of instances so that time put into work on one 
will be useful over and over. Punch is a good sample case. It has some 
funky formatting issues, that we've come to deal with in a way that is 
common across the final document producers (post-processors). Any 
proposed TEI DTD or schema or whatever will have to be able to  produce 
something that looks very similar to the issues that we've produced so 
far. I know that Josh has been using TEI for American Missionary. It 
would be good to have some nice examples of other periodicals as well.

Another place where the investment of some time by someone skilled in 
such things would really pay off is in making different style sheets 
(forgive me if I have the term wrong) for drama. Having one that lays 
out the material for reading, and another that lays it out as a script 
for actors to work from (with lots more white space, for example) would 
provide a clear benefit for semantic markup of the basic elements (who's 
talking, stage directions, acts and scenes, list of characters, etc). 
Metrical verse plays can introduce some interesting combinations as can 
ones that use both prose and verse. But the basic idea is the same.

I'm not a current DP post-processor, but my impression from interactions 
with many of them is that in addition to the lack of authoring tools for 
TEI, another major stumbling block is that the currently available 
transforms (or whatever they are properly called) produce ugly looking 
results. Post-processors put a lot of time into their projects and they 
want them to look nice. It's all very well to say that only semantic 
markup should be used, but that's not going to win converts if the 
results are dramatically different in character and feel from the 
original book and just don't look good. This applies mostly to the html 
versions, since there's not a lot that can be done to make plain text 
worse. Something that comes to mind immediately is dealing with 
decorative capital letters at the beginning of a chapter or paragraph. 
It's a presentation matter, that has nothing to do with the semantics of 
the book, but can be important to the look and feel.

I guess my basic message is that two things are needed in order to 
persuade the majority of book producers for PG to use TEI (or any other 
master format). The first is the need to produce a result that the 
post-processors can feel proud of when they see it, and the second is 
good authoring tools.

JulietS

Jon Noring wrote:
> Lee wrote:
>   
>> Jeroen Hellingman wrote:
>>     
>
>   
>>> I think the biggest barrier here is the steep learning curve of TEI
>>> (20% of the tags cover 80% of the things you encounter, but every
>>> other book you will need something from those remaining 80%, and,
>>> oh gosh, which tag can I use then) ....
>>>       
>
>   
>> I am intrigued by this comment (and not only because it mirrors my
>> own experience). So by way of information gathering among those who
>> use TEI on a regular basis, I would you to tell me, perhaps simply
>> as an ordered list, what TEI tags you believe are most used and most
>> valuable (not necessarily the same thing). In other words, what are
>> the 20% of the tags that cover 80% of the need, and from the
>> remaining 80% what seems to come up the most often?
>>
>> I'm thinking of writing a little script that will try to automate
>> the collection of usage data from current Gutenberg TEI texts.
>>     
>
> Regarding the last paragraph Lee wrote, I think that's a splendid
> idea, to see what elements, attributes and attribute values have been
> used, doing a statistical analysis of their usage. I hope the analysis
> will also look at content models, maybe building a minimal DTD to which
> all the documents will validate, (I believe there are tools which will
> build a common DTD from a set of XML documents -- hmmm, maybe that's
> the approach to take first, then one can do a statistical analysis by
> comparing to this minimal DTD. Also, inspection of that DTD will
> provide insights.)
>
> *****
>
> Regarding the rest of what Lee wrote, a few days ago I outlined in a
> private email to Lee some preliminary ideas, which I'll restate here
> for discussion purposes:
>
> The gist of the idea is that a group of us create a very strict subset
> of TEI: elements, attributes and *standardized* attribute values, and
> constrained element content models, along with any other markup usage
> rules that cannot be enforced by a DTD or schema. This subset and
> associated ruleset would be sufficient to consistently, uniformly, and
> in standardized fashion (especially attribute values), markup 80% or
> 90% of the public domain books which PG/DP works with. A related goal
> (if possible to achieve) would be that when different people
> independently markup the same book, and follow the rules, the marked
> up documents would be, for all practical purposes, canonically
> identical with one another.
>
> Furthermore, we would actually build our own DTD or schema so that
> those authoring to this strictly applied subset could immediately
> validate to it. Also, we could write a script to do conformance
> checking to check any other requirements that cannot be enforced by
> DTD or schema -- a sort of "conformance chekcer." We could also write
> a brief document describing how to markup documents using this subset
> vocabulary, and minimizing the need for people learning to markup
> using our simple subset to have to slog through the TEI manual to
> figure out how to do something.
>
> And for most (but clearly not all) of the remaining texts, we could
> slowly build a "superset" of the basic DTD, so that at least the more
> complicated books follow the strict subset in uniformity of basic
> markup. This superset could be slowly built over time.
>
> The benefit of this approach is that we can now involve more people in
> marking up books, have the validation tools, and provide a much more
> uniform basis by which authoring tool and conversion tools can be
> built more quickly. The problem with the full-blown TEI, and even
> TEI-Lite (which is not "Lite"), is that it is so massive, and the
> manual so difficult to comprehend without spending a year studying it,
> that those trying to build dedicated authoring tools and conversion
> tools to handle all possibilities is much more difficult. And I'm not
> advocating not using the full blown TEI for the extremely funky texts,
> but let's at least standardize on something simple and uniform for the
> vast majority of the books.
>
> Anyway, that's the core of the idea. And not actually new as Josh has
> mentioned it in some fashion, but maybe what is proposed here has a
> few twists on what previously has been proposed.
>
> And I'm willing to help build the DTD (I prefer DTDs since they should
> be sufficient for this purpose and there are other advantages to DTD
> over schema which I won't get into here.) I am quite experienced in
> building DTDs, having built by hand the OEBPS 1.2 Document and
> Package DTDs, the OpenReader Binder DTD, and the BookX DTD (which has
> a similar philosophy to that described above but focused on new book
> publishers, clearly not for use by PG/DP for reasons I won't delve
> into.)
>
> Jon Noring
>
>
> (p.s., since TEI P5 is soon to be released -- it is currently at
> version 0.9 and may be elevated to 1.0 at this years TEI Annual
> Meeting in November -- our subset should definitely be built on P5.)
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>   


From joshua at hutchinson.net  Fri Oct  5 06:45:51 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Fri, 5 Oct 2007 13:45:51 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>

>----Original Message----
>From: vze3rknp at verizon.net
>
>... my impression from interactions 
>with many of them is that in addition to the lack of authoring tools 
for 
>TEI, another major stumbling block is that the currently available 
>transforms (or whatever they are properly called) produce ugly 
looking 
>results. 

This is a huge issue, believe it or not.

Basically, the problem lies in that there are two camps of thought.

Camp 1 - The layout of the original book is important and the final 
product (in the HTML and PDF at least) should as closely resemble the 
original look and feel of the book as is possible.  Things like drop 
caps at the beginning of chapters, poems that are centered on the page 
as opposed to left justified, etc. are EXTREMELY important to folks in 
this camp.

Camp 2 - The content is king.  This group doesn't care so much about 
the original layout of the book, but rather what that layout is meant 
to convey.  Is a chapter heading BOLD, CENTER and ITALICS in the 
original?  Great.  But in the PG version, it should be a chapter 
heading and use whatever standard PG stylesheets use.  Things like drop 
caps or matching the idiosyncratic layout of the original table of 
contents are "fluff" to folks in this camp.

***

You will find loud proponents of both camps.  (You'll also find quiet 
and reasonable proponents, too.)  

The problem is that TEI is better suited to Camp 2 (though Camp 1 can 
be accomodated ... it's just much more work).

***

Is one camp right and the other wrong?  Is it necessary to have one 
camp or the other "win"?  Can both be adequately served?  Is it worth 
the effort to TRY to serve both camps?

These are some questions that need to be answered.

Josh

PS I don't believe that the current output of PGTEI process is 
"ugly".  Rather, it is uniform and loses the original "charm" of the 
original book's layout.


From rolsch at verizon.net  Fri Oct  5 07:27:32 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Fri, 05 Oct 2007 10:27:32 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <200710051027.32646.rolsch@verizon.net>

On Friday 05 October 2007 9:45 am, joshua at hutchinson.net wrote:
> >----Original Message----
> >From: vze3rknp at verizon.net
> >
> >... my impression from interactions
> >with many of them is that in addition to the lack of authoring tools
>
> for
>
> >TEI, another major stumbling block is that the currently available
> >transforms (or whatever they are properly called) produce ugly
>
> looking
>
> >results.
>
> This is a huge issue, believe it or not.
>
> Basically, the problem lies in that there are two camps of thought.
>
> Camp 1 - The layout of the original book is important and the final
> product (in the HTML and PDF at least) should as closely resemble the
> original look and feel of the book as is possible.  Things like drop
> caps at the beginning of chapters, poems that are centered on the page
> as opposed to left justified, etc. are EXTREMELY important to folks in
> this camp.
>
> Camp 2 - The content is king.  This group doesn't care so much about
> the original layout of the book, but rather what that layout is meant
> to convey.  Is a chapter heading BOLD, CENTER and ITALICS in the
> original?  Great.  But in the PG version, it should be a chapter
> heading and use whatever standard PG stylesheets use.  Things like drop
> caps or matching the idiosyncratic layout of the original table of
> contents are "fluff" to folks in this camp.
>
> ***
>
> You will find loud proponents of both camps.  (You'll also find quiet
> and reasonable proponents, too.)
>
> The problem is that TEI is better suited to Camp 2 (though Camp 1 can
> be accomodated ... it's just much more work).
>
> ***
>
> Is one camp right and the other wrong?  Is it necessary to have one
> camp or the other "win"?  Can both be adequately served?  Is it worth
> the effort to TRY to serve both camps?

I do not think it is a matter of which camp is right or wrong but, how many of 
the PPers at DP, which are followers of Camp 1, can be convinced to adopt 
TEI.  Since, I believe at present, the PPers of Camp 1 greatly out number the 
PPers of Camp 2.  So, I think it very much worth the effort to accommodate 
the followers of Camp 1.  Otherwise, the number of PPers who use TEI will 
continue to expand at the present slow rate of growth.  Thereby, earning us 
very little on our time and effort.

> These are some questions that need to be answered.
>
> Josh
>
> PS I don't believe that the current output of PGTEI process is
> "ugly".  Rather, it is uniform and loses the original "charm" of the
> original book's layout.

Roland Schlenker

From jon at noring.name  Fri Oct  5 09:05:15 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 5 Oct 2007 10:05:15 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <724651335.20071005100515@noring.name>

Joshua wrote:

> Basically, the problem lies in that there are two camps of thought.
>
> Camp 1 - The layout of the original book is important and the final 
> product (in the HTML and PDF at least) should as closely resemble the 
> original look and feel of the book as is possible.  Things like drop 
> caps at the beginning of chapters, poems that are centered on the page
> as opposed to left justified, etc. are EXTREMELY important to folks in
> this camp.
>
> Camp 2 - The content is king.  This group doesn't care so much about 
> the original layout of the book, but rather what that layout is meant 
> to convey.  Is a chapter heading BOLD, CENTER and ITALICS in the 
> original?  Great.  But in the PG version, it should be a chapter 
> heading and use whatever standard PG stylesheets use.  Things like drop
> caps or matching the idiosyncratic layout of the original table of 
> contents are "fluff" to folks in this camp.
>
> ***
>
> The problem is that TEI is better suited to Camp 2 (though Camp 1 can 
> be accomodated ... it's just much more work).


Josh, this is an excellent summary of the two camps!

Most who have followed my messages over the years know that I am
firmly in Camp 2, but I have no difficulty with interested individuals
in taking the digital content and producing "facsimile" digital
renditions. (Juliet's message seems to indicate there are many at DP
who fall into Camp 1.)

One thing I do know, that with a properly proofed and marked up digital
content master which focuses on identifying the universally-important
document structures and inline text semantics, it is possible to
repurpose the content almost anyway one wants, and this includes
producing facsimile renditions (sometimes this can be done
automatically, other times does require human intervention.)

However, if the markup is mostly presentationally-oriented so as to
focus on only on facsimile production, then the content is much less
reusable. I do believe most of the Camp 1 people at DP understand
this and strive in their work product to markup document structures
and inline text semantics as much as possible. (There are a number of
older PG texts, though, where the HTML markup is wholly
presentational, such as what happened to the 1001 Nights -- the markup
of that is *so bad* that I gave up trying to elevate it to proper
markup. Harumph. Also, the source text used was NOT the right one to
use, unfortunately, but that's not germane to this discussion.)

Two other factors to consider:

1) Most now recognize the importance of having the original page scans
   available alongside the digitized text master. For some who wish to
   see how the original book was typeset, this is more than
   sufficient and they would not need to see a facsimile digital
   rendition.

   (Aside, I've always believed that if we are to scan the public
   domain books, we should do so at sufficient scan quality so the
   scans will be useful for ludic reading, and not just as feed for
   OCR, and thus have always advocated higher quality master scans
   than has been done. This zeal for speed has troubled me, especially
   in that the bottleneck at DP is not scans, but proofing -- I would
   hope that DP will begin to encourage book scanners to focus on
   archival quality -- even presentation quality -- rather than the
   current "just scan 'em good enough for OCR." Make the scans
   themselves a valuable work product that the scanner can be proud to
   distribute to the world, and not only to the inner workings of DP.
   Let's have a "proudly scanned by ..." added to the information
   (metadata) provided along-side by the scansets.)

2) When one analyzes the total work to get from a paper book to an
   accurate digital facsimile rendition, most of the work is still
   in the scanning, proofing, and structure/semantic markup stages
   (with most in the first two). I can only guesstimate the percentage
   of the total person-hours to accomplish these three items, but I
   believe it's well into the upper 90% range. The person-hours to
   take the digital master and then produce a facsimile, even if done
   mostly manually, is relatively minor -- those skilled at it
   actually enjoy doing this, and will do it mostly manually anyway,
   and can work quite fast, sometimes just a few hours. (True
   facsimiles probably require InDesign to layout the text.)


Juliet mentioned the difficulties with "Punch", and definitely Punch
stretches things with it's "stream of consciousness",
visually-oriented approach. I've only looked at one so far, #22698:

   http://www.gutenberg.org/files/22698/22698-h/22698-h.htm

But as I look at the HTML example (not sure where the original page
scans are), it still appears the content is pretty linear and so a
digital text master can still be produced which focuses on structural/
semantic markup (just a stream of section after section, each section
autonomous from the others.) So long as the page scans are available,
someone can then take that digital text master and produce a facsimile
version.

In some ways, the complications DP is facing with Punch is *because*
many there want to directly *autoconvert* from the Master to the
Facsimile:

   Digital Master --> Digital Facsimile

What I propose is the following:

   Digital Master --> "Facsimile Master" --> Digital Facsimile

The focus would be on producing the Digital Master, but if someone
wants to produce a "Facsimile Master" from the Digital Master (even
if it necessitates redoing the document markup from scratch or putting
in a lot of hand labor), then all the power to them.

I see a separation into different groups:

1) Distributed Scanning

2) Distributed Proofing

3) Digital Text Mastering (sufficient to autoconvert into ebook
   formats optimized for various target platforms -- focus on content.)

4) Digital Facsimile Production

I see getting to #3 as the most important for fully usable texts. I do
not see #4 as being that important for the vast majority of texts,
especially now that we will have the original scan sets available in
sufficiently high quality for reference or direct reading. However, all
the power to those who want to produce Digital Facsimiles of *selected*
Digital Text Masters. (Do all public domain texts require we produce
digital facsimiles of them, at least right away, especially when we
now will have scansets available? I'd say only for certain texts do
digital facsimiles make any sense -- that's why I have trouble with
the zeal to make digital facsimiles of all books.)

To be frank, I see Juliet in a sort of bind since a lot of the DP
volunteers are so enamored with #4 that they don't properly focus on
#3 in a separate manner. A separation into different "projects" makes
sense.

Of course, those who love digital facsimiles can continue to advise the
digital text mastering folk how to better markup the digital masters
to make facsimile production a little easier, but there's a point when
the digital mastering folk have to say "enough" and tell the facsimile
folk to do it themselves. The digital text mastering folk *have* to
focus on the repurposeability and accessibility of their work product
to the world at large, to focus on the content and not on original
presentation (most of which is arbitrary anyway.)


Well, obviously I've opened up a lot of contentious issues here, but
they are my opinions, and hope others will respond in an objective,
unemotional way.

Jon Noring


From joshua at hutchinson.net  Fri Oct  5 09:29:35 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Fri, 5 Oct 2007 16:29:35 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <28140207.1191601775105.JavaMail.?@fh1064.dia.cp.net>

>----Original Message----
>From: jon at noring.name
>
>Juliet mentioned the difficulties with "Punch", and definitely Punch
>stretches things with it's "stream of consciousness",
>visually-oriented approach. I've only looked at one so far, #22698:
>
>   http://www.gutenberg.org/files/22698/22698-h/22698-h.htm
>

Just a quick FYI:

The page images can currently be found here:

http://pageimages.pglaf.org/Loewenstein/22698/22698-page-images/

and will be moved to the archives once the WWer (in this case, Joe) 
okays them.  We're about a month behind posting images due to September 
being a pretty heavy month.

Josh


From jon at noring.name  Fri Oct  5 10:03:01 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 5 Oct 2007 11:03:01 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <28140207.1191601775105.JavaMail.?@fh1064.dia.cp.net>
References: <28140207.1191601775105.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <422003665.20071005110301@noring.name>

Josh wrote:

> Just a quick FYI:
>
> The page images can currently be found here:
>
> http://pageimages.pglaf.org/Loewenstein/22698/22698-page-images/
>
> and will be moved to the archives once the WWer (in this case, Joe) 
> okays them.  We're about a month behind posting images due to September
> being a pretty heavy month.

Thanks!

Hmmm, from a Digital Text Mastering approach, that version of Punch
would be relatively easy. It is actually a quite linear text divided
up into autonomous sections/divisions. The only complication is that
it is a mix of Prose, Verse, Drama, and apparently stand-alone images,
but TEI has markup for all that which can be applied in standardized
manner.

So my view would be to mark this up using TEI (and where necessary
bring in both the Verse and Drama modules for their added "stuff"), in
linear fashion and not be concerned at all about layout since in the
example I looked at I did not see any "layout is content" stuff (and
when there is the rare "typographic layout is itself content", use SVG
to mark that up.)

Issue that TEI as a Digital Text Master, then let the Digital
Facsimile enthusiasts, if they so choose, to take that and produce a
facsimile edition by creating their own derivative master(s), using
InDesign, or whatever path(s) makes sense for their purposes.

But trying to embed the digital text master with special markup so it
will autoconvert to the desired facsimile result is, in my opinion,
not the way to go to produce a near-facsimile (such as HTML) or a
perfect-facsimile (e.g., PDF or SVG).

This might be where some bottlenecks in the DP process is occurring
and making adoption of TEI for digital text mastering difficult: a
focus on facsimile reproduction *directly* from the Master. If the
digital text mastering is separated from the digital facsimile
production, such as by separating into different groups, this may free
up one bottleneck. Just an outside observation.

To be clear, I am not hostile to creating digital facsimiles, but
trying to produce a single Digital Text Master, along with a universal
conversion system, which will pushbutton convert to both 1) optimized
digital format versions focusing on target platforms and content, and
2) quite faithful facsimile reproductions of the original typesetting,
is a near impossible proposition in a practical sense -- it is doable,
but so damn complicated it will never be implemented (I am aware of
commercial systems that do this, such as Rosetta Solutions, and they
are complicated, meant to be used commercially.)

This probably explains why DP and PG are still struggling with TEI --
to try to be all things to all people.

The separation of Digital Text Mastering with Digital Facsimile
production makes sense, then, as I previously noted. DTM would focus
on the content and repurposeability to all target digital platforms;
DFP would focus on taking the DTM of selected texts and producing
various levels of facsimiles using a variety of tools and output
formats optimized for that purpose. DFP will probably create its own
"internal master" if they wish, but this would not replace the DTM
master.

Jon Noring


From grythumn at gmail.com  Fri Oct  5 10:17:58 2007
From: grythumn at gmail.com (Robert Cicconetti)
Date: Fri, 5 Oct 2007 13:17:58 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <724651335.20071005100515@noring.name>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<724651335.20071005100515@noring.name>
Message-ID: <15cfa2a50710051017i4b4476acv19a8aa63037c3089@mail.gmail.com>

On 10/5/07, Jon Noring <jon at noring.name> wrote:
>    (Aside, I've always believed that if we are to scan the public
>    domain books, we should do so at sufficient scan quality so the
>    scans will be useful for ludic reading, and not just as feed for
>    OCR, and thus have always advocated higher quality master scans
>    than has been done. This zeal for speed has troubled me, especially
>    in that the bottleneck at DP is not scans, but proofing -- I would
>    hope that DP will begin to encourage book scanners to focus on
>    archival quality -- even presentation quality -- rather than the
>    current "just scan 'em good enough for OCR." Make the scans

You're confusing speed of scanning with bandwidth restrictions[0]..
CPers are encouraged to provide sub-100k page images[1] to make sure
that page load times are not prohibitive for people on modems, (and,
incidently, to stay under the monthly transfer limits on the server..)
I generally scan in grayscale, and later convert to B/W before
uploading.

In addition, most of the page images actually look pretty good scaled
down if you use something better than nearest neighbor.. try using
Opera, or IE7 with a line of CSS to enable bicubic scaling.

R C
[0] Although IIRC some of the high-speed scanners are black and white only..
[1] But not required.. I've provided several projects with 4 or 8
color grayscale pages weighing in at 150-200k.. when the condition of
the text required it.

From jon at noring.name  Fri Oct  5 10:43:02 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 5 Oct 2007 11:43:02 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <15cfa2a50710051017i4b4476acv19a8aa63037c3089@mail.gmail.com>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<724651335.20071005100515@noring.name>
	<15cfa2a50710051017i4b4476acv19a8aa63037c3089@mail.gmail.com>
Message-ID: <1941308865.20071005114302@noring.name>

Robert
> Jon Noring wrote:

>> (Aside, I've always believed that if we are to scan the public
>> domain books, we should do so at sufficient scan quality so the
>> scans will be useful for ludic reading, and not just as feed for
>> OCR, and thus have always advocated higher quality master scans
>> than has been done. This zeal for speed has troubled me, especially
>> in that the bottleneck at DP is not scans, but proofing -- I would
>> hope that DP will begin to encourage book scanners to focus on
>> archival quality -- even presentation quality -- rather than the
>> current "just scan 'em good enough for OCR." Make the scans

> You're confusing speed of scanning with bandwidth restrictions[0]..
> CPers are encouraged to provide sub-100k page images[1] to make sure
> that page load times are not prohibitive for people on modems, (and,
> incidently, to stay under the monthly transfer limits on the server..)
> I generally scan in grayscale, and later convert to B/W before
> uploading.
>
> In addition, most of the page images actually look pretty good scaled
> down if you use something better than nearest neighbor.. try using
> Opera, or IE7 with a line of CSS to enable bicubic scaling.

Well, I am aware of "bandwidth restrictions", but I still think
scanning should be an autonomous activity whose work product is itself
publishable, such as donating to IA. One can certainly burn DVDs
containing a couple gigs of scans of a book, then mail the DVD to IA
and/or elsewhere, including DP folk.

And certainly once one has hi-rez versions, they can be downsampled
before uploading to DP for OCR purposes.

I think the confusion lies in that DP still considers the sole purpose
of scans to be input to its process, so are not concerned as much by
resolution and color depth (other than for images in the books).

So long as DP does not make any effort to encourage those who scan
books to do so at archival or even presentational quality, most won't.
But if it is encouraged, I think most will take the time to do it. If
volunteers are given reasons why to do something to a certain higher
level of quality, most will gladly do so. I reject the notion that
*asking* them to take the effort to produce archival quality will
turn them away. The end result is that a lot of high quality scan sets
will result and be made available to the world.

Good enough for DP should NOT be considered good enough.

Jon Noring


From editor at pg-news.org  Fri Oct  5 11:19:22 2007
From: editor at pg-news.org (Mike Cook)
Date: Fri, 5 Oct 2007 20:19:22 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <392096647.20071004191413@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>	<4700FDCF.1060009@perathoner.de>
	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>
	<392096647.20071004191413@noring.name>
Message-ID: <005901c8077c$42e25f10$c8a71d30$@org>

Of the 70 PG texts I've made into TEI, this is my list of current tags. Just of
the <body> section. (I don't think I've missed anything out ;-)

<body>
<div type="blankpage">
<div type="book" n="xx">
<div type="part" n="xx">
<div type="volume" n="xx">
<div>
<emph>
<figDesc>
<figure url="images/">
<head type="sub">
<head>
<index index="fig" />
<index index="pdf" />
<index index="toc" />
<l rend="margin-left: 10%">
<l>
<lg rend="margin-left: 10%">
<lg>
<milestone unit="tb" />
<note place="foot">
<p rend="margin-left: 10%">
<p rend="text-align: center">
<p>
<pb>
<q>
<quote>
<sub>
<epigraph>


And also

&backslash;
&braceleft;
&braceright;
&deg;
&gt;
&hellip;
&lt;
&mdash;
&nbsp;
&ndash;
&qdash;

That's all I have :)

In some of my previous messages I talked about making very simple TEI files.
Once we have all the PG texts in this type of minimal markup then the TEI guru's
can start adding the more interesting markup options.

Mike


-----Original Message-----
From: Jon Noring [mailto:jon at noring.name] 
Sent: 05 October 2007 03:14
To: Project Gutenberg Volunteer Discussion
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?

Lee wrote:
> Jeroen Hellingman wrote:

>> I think the biggest barrier here is the steep learning curve of TEI
>> (20% of the tags cover 80% of the things you encounter, but every
>> other book you will need something from those remaining 80%, and,
>> oh gosh, which tag can I use then) ....

> I am intrigued by this comment (and not only because it mirrors my
> own experience). So by way of information gathering among those who
> use TEI on a regular basis, I would you to tell me, perhaps simply
> as an ordered list, what TEI tags you believe are most used and most
> valuable (not necessarily the same thing). In other words, what are
> the 20% of the tags that cover 80% of the need, and from the
> remaining 80% what seems to come up the most often?
>
> I'm thinking of writing a little script that will try to automate
> the collection of usage data from current Gutenberg TEI texts.

Regarding the last paragraph Lee wrote, I think that's a splendid
idea, to see what elements, attributes and attribute values have been
used, doing a statistical analysis of their usage. I hope the analysis
will also look at content models, maybe building a minimal DTD to which
all the documents will validate, (I believe there are tools which will
build a common DTD from a set of XML documents -- hmmm, maybe that's
the approach to take first, then one can do a statistical analysis by
comparing to this minimal DTD. Also, inspection of that DTD will
provide insights.)

*****

Regarding the rest of what Lee wrote, a few days ago I outlined in a
private email to Lee some preliminary ideas, which I'll restate here
for discussion purposes:

The gist of the idea is that a group of us create a very strict subset
of TEI: elements, attributes and *standardized* attribute values, and
constrained element content models, along with any other markup usage
rules that cannot be enforced by a DTD or schema. This subset and
associated ruleset would be sufficient to consistently, uniformly, and
in standardized fashion (especially attribute values), markup 80% or
90% of the public domain books which PG/DP works with. A related goal
(if possible to achieve) would be that when different people
independently markup the same book, and follow the rules, the marked
up documents would be, for all practical purposes, canonically
identical with one another.

Furthermore, we would actually build our own DTD or schema so that
those authoring to this strictly applied subset could immediately
validate to it. Also, we could write a script to do conformance
checking to check any other requirements that cannot be enforced by
DTD or schema -- a sort of "conformance chekcer." We could also write
a brief document describing how to markup documents using this subset
vocabulary, and minimizing the need for people learning to markup
using our simple subset to have to slog through the TEI manual to
figure out how to do something.

And for most (but clearly not all) of the remaining texts, we could
slowly build a "superset" of the basic DTD, so that at least the more
complicated books follow the strict subset in uniformity of basic
markup. This superset could be slowly built over time.

The benefit of this approach is that we can now involve more people in
marking up books, have the validation tools, and provide a much more
uniform basis by which authoring tool and conversion tools can be
built more quickly. The problem with the full-blown TEI, and even
TEI-Lite (which is not "Lite"), is that it is so massive, and the
manual so difficult to comprehend without spending a year studying it,
that those trying to build dedicated authoring tools and conversion
tools to handle all possibilities is much more difficult. And I'm not
advocating not using the full blown TEI for the extremely funky texts,
but let's at least standardize on something simple and uniform for the
vast majority of the books.

Anyway, that's the core of the idea. And not actually new as Josh has
mentioned it in some fashion, but maybe what is proposed here has a
few twists on what previously has been proposed.

And I'm willing to help build the DTD (I prefer DTDs since they should
be sufficient for this purpose and there are other advantages to DTD
over schema which I won't get into here.) I am quite experienced in
building DTDs, having built by hand the OEBPS 1.2 Document and
Package DTDs, the OpenReader Binder DTD, and the BookX DTD (which has
a similar philosophy to that described above but focused on new book
publishers, clearly not for use by PG/DP for reasons I won't delve
into.)

Jon Noring


(p.s., since TEI P5 is soon to be released -- it is currently at
version 0.9 and may be elevated to 1.0 at this years TEI Annual
Meeting in November -- our subset should definitely be built on P5.)


From lee at novomail.net  Fri Oct  5 11:33:36 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 05 Oct 2007 12:33:36 -0600
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to add OpenDocument as an additional
In-Reply-To: <200710042133.43012.rolsch@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>
	<200710042133.43012.rolsch@verizon.net>
Message-ID: <47068380.1060802@novomail.net>

Roland Schlenker wrote:

> On Thursday 04 October 2007 6:20 pm, Lee Passey wrote:

[snip]

>> So by way of information gathering among those who use TEI
>> on a regular basis, I would you to tell me, perhaps simply as an ordered
>> list, what TEI tags you believe are most used and most valuable (not
>> necessarily the same thing). In other words, what are the 20% of the
>> tags that cover 80% of the need, and from the remaining 80% what seems
>> to come up the most often?

[snip]

> I use a minimal TEI file as a test file to check how a peculiar piece of text 
> will render in the three current formats.  Then, when the TEI markup is 
> correct, I copy it into the master TEI file.
> 
> The following are some examples from a condensed test file:

This post wasn't really a response to the question I asked, which I 
suppose is not surprising.  I have taken the liberty of editing the 
examples, removing purely presentational attributes, and attempting to 
distinguish between ambiguous uses of different elements. This is what I 
conclude are the elements most important to Mr. Schlenker:

<div>
<div> 	// used as <titlePage>
<head>
<p>
<p>	// Used as <title>
<p>	// Used as <docEdition>
<p>	// used as <ab>
<p>	// used as <byline>
<p>	// used as <respStmt>
<p>	// used as <imprint>
<p>	// used as <imprint><publisher>
<p>	// used as <imprint><date>
<p>	// used as <availability>
<p>	// used as <div type="dedication">
<p>	// used as <closer><salute>
<p>	// used as <closer><signed>
<figure>
<figDesc>
<q>	// used as <said>
<q>	// used for quotation marks in a context other than a quote
<quote>
<lg>
<l>
----------------------
<hi>	// Used purely for presentation
<pb>
<anchor>
<lb>

Interestingly, this is only 13 unique elements (although it would be at 
least double that if the <p> element were used correctly). All in all, 
not an unreasonably large number of elements to learn.

I have also segregated the <pb>, <lb> and <anchor> tags from the rest 
because I just have this gut feeling that while they are useful 
elements, I don't really think they are crucial. I'm trying to 
categorize elements as "crucial", "useful", and "esoteric." No doubt 
there will be a fair amount of controversy when it comes to the border 
cases, but maybe there will some consensus for a significant number of 
elements.

I also think I'm going to ignore (for now) those elements unique to the 
<teiHeader> (which encode metadata) and focus on those elements used in 
the <text> element (which encode the actual content). Personally, I 
think encoding of metadata /is/ crucial, but it feels enough different 
from the encoding of the text that I think it can be dealt with separately.

Hopefully, I can get some more feedback in this type of summary format, 
and I'll try to post my own list (perhaps with some justifications) 
later this weekend.

-- 
Nothing of significance below this line.


From joshua at hutchinson.net  Fri Oct  5 11:38:38 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Fri, 5 Oct 2007 18:38:38 +0000 (UTC)
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to add OpenDocument as an additional
Message-ID: <19938225.1191609518323.JavaMail.?@fh1064.dia.cp.net>

Work from Mike's list.  Quite comprehension (the only other tag I can 
think of that I use regularly is the <divGen> tag that generates 
various automated sections of text, like a titlepage or a table of 
contents).

Josh

>----Original Message----
>From: lee at novomail.net
>
>Hopefully, I can get some more feedback in this type of summary 
format, 
>and I'll try to post my own list (perhaps with some justifications) 
>later this weekend.
>


From Bowerbird at aol.com  Fri Oct  5 12:46:12 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 5 Oct 2007 15:46:12 EDT
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <d31.17e946be.3437ee84@aol.com>

robert said:
>    You're confusing speed of scanning with bandwidth restrictions[0]..

is noring back on his "scan at high resolution" merry-go-round?
why not use the "distributed scanners" yahoogroup to discuss it?

and wasn't there a separate listserve set up to discuss t.e.i. issues?
or if you're discussing d.p. doing .tei, why not discuss it over there?

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071005/ecf5d5a6/attachment.htm 

From jon at noring.name  Fri Oct  5 14:22:10 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 5 Oct 2007 15:22:10 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <d31.17e946be.3437ee84@aol.com>
References: <d31.17e946be.3437ee84@aol.com>
Message-ID: <1491082365.20071005152210@noring.name>

Bowerbird wrote:

>  is noring back on his "scan at high resolution" merry-go-round?
>  why not use the "distributed scanners" yahoogroup to discuss it?
>  
>  and wasn't there a separate listserve set up to discuss t.e.i. issues?
>  or if you're discussing d.p. doing .tei, why not discuss it over there?

Hmmm, are you saying that discussion about books scans is off-topic
here?

And has Greg and Michael given you the authority to decide what is on-
and off-topic in this group?

Jon Noring


From rolsch at verizon.net  Fri Oct  5 14:27:42 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Fri, 05 Oct 2007 17:27:42 -0400
Subject: [gutvol-d]
 =?iso-8859-1?q?gnutenberg-press_maintenance_offer_=28w?=
 =?iso-8859-1?q?as_Re=3A_Proposal_to_add=09OpenDocument_as_an_additional?=
In-Reply-To: <47056722.9000801@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
Message-ID: <200710051727.42671.rolsch@verizon.net>

On Thursday 04 October 2007 6:20 pm, Lee Passey wrote:
> Jeroen Hellingman (Mailing List Account) wrote:
> > I think the biggest barrier here is the steep learning curve of TEI (20%
> > of the tags cover 80% of the things you encounter, but every other book
> > you will need something from those remaining 80%, and, oh gosh, which
> > tag can I use then) ....
>
> I am intrigued by this comment (and not only because it mirrors my own
> experience). So by way of information gathering among those who use TEI
> on a regular basis, I would you to tell me, perhaps simply as an ordered
> list, what TEI tags you believe are most used and most valuable (not
> necessarily the same thing). In other words, what are the 20% of the
> tags that cover 80% of the need, and from the remaining 80% what seems
> to come up the most often?
>
> I'm thinking of writing a little script that will try to automate the
> collection of usage data from current Gutenberg TEI texts.

From my lastest project, Marcia Schuyler, by Grace Livingston Hill Lutz:

<p>                  - 1687
<q>                  - 934
<anchor>             - 434
<pb>                 - 358
<hi>                 - 204
<item>               - 118
<ref>                - 76
<lb/>                - 71
<index>              - 63
<div>                - 41
<list>               - 39
<corr>               - 38
<head>               - 37
<milestone>          - 27
<l>                  - 25
<quote>              - 8
<lg>                 - 5
<divGen>             - 5
<figure>             - 4
<figDesc>            - 4
<title>              - 3
<name>               - 3
<date>               - 3
<publisher>          - 2
<idno>               - 2
<classCode>          - 2
<bibl>               - 2
<author>             - 2
<titleStmt>          - 1
<textClass>          - 1
<text>               - 1
<teiHeader>          - 1
<taxonomy>           - 1
<sourceDesc>         - 1
<revisionDesc>       - 1
<respStmt>           - 1
<publicationStmt>    - 1
<pubPlace>           - 1
<projectDesc>        - 1
<profileDesc>        - 1
<language>           - 1
<langUsage>          - 1
<keywords>           - 1
<imprint>            - 1
<front>              - 1
<fileDesc>           - 1
<encodingDesc>       - 1
<editorialDecl>      - 1
<editionStmt>        - 1
<edition>            - 1
<classDecl>          - 1
<change>             - 1
<body>               - 1
<back>               - 1
<availability>       - 1
<TEI.2>              - 1

The python script used to create the above:

#!/usr/bin/env python

import sys

# Usage filename.py filename
xml = open(sys.argv[1]).read()

# Create a list of elements
xml = xml.split('<')

xml = [x for x in xml if not x.startswith('!--')]
xml = [x for x in xml if not x.startswith('-->')]
xml = [x for x in xml if not x.startswith('!DOCTYPE')]
xml = [x for x in xml if not x.startswith('?xml')]

xml = [x.split()[0] for x in xml if len(x.split()) > 0]
xml = [x.split('>')[0] for x in xml if len(x.split('>')) > 0]

xml = [x for x in xml if not x.startswith('/')]

# Count number of elements
elements = {}
for element in xml:
    if element in elements:
        elements[element] += 1
    else:
        elements[element] = 1

# Create a sorted list of elements and counts
sorted_list = [[value, '<' + element + '>', value]
               for element, value in elements.iteritems()]
sorted_list.sort()
sorted_list.reverse()
sorted_list = [r[1:3] for r in sorted_list]

# Output to standard output a formatted sorted list and counts
result = ['%-20s - %s\n' % (e, c) for e, c in sorted_list]
result = ''.join(result)
print result

Roland Schlenker

PS Best viewed using a fixed font.


From lee at novomail.net  Fri Oct  5 16:08:55 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 05 Oct 2007 17:08:55 -0600
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to add OpenDocument as an additional
In-Reply-To: <200710051727.42671.rolsch@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
Message-ID: <4706C407.7060706@novomail.net>

Roland Schlenker wrote:

[snip]

>>From my lastest project, Marcia Schuyler, by Grace Livingston Hill Lutz:

Thanks, that's excellent data! Of course, the number have to be looked 
at in context. For example, even though there's only one <body> tag, 
/every/ TEI file is going to have to have one, so it's pretty important. 
On the other hand, if we find elements that are used only rarely /in/ a 
document, and are used only rarely /across/ documents, that's a good 
candidate for "esoteric" status. And if those esoteric elements can be 
modeled with other more generic tags (you could probably mark up an 
entire text using only <div>, <ab> and <seg>) then maybe they aren't 
necessary to include in a TEI tutorial.

Thanks again for the data.

-- 
Nothing of significance below this line.


From jon at noring.name  Fri Oct  5 17:26:30 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 5 Oct 2007 18:26:30 -0600
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
	to add OpenDocument as an additional
In-Reply-To: <4706C407.7060706@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net> <4706C407.7060706@novomail.net>
Message-ID: <463976383.20071005182630@noring.name>

Lee wrote:
> Roland Schlenker wrote:

>> From my lastest project, Marcia Schuyler, by Grace Livingston Hill Lutz:

> Thanks, that's excellent data! Of course, the number have to be looked
> at in context. For example, even though there's only one <body> tag, 
> /every/ TEI file is going to have to have one, so it's pretty important.
> On the other hand, if we find elements that are used only rarely /in/ a
> document, and are used only rarely /across/ documents, that's a good 
> candidate for "esoteric" status. And if those esoteric elements can be
> modeled with other more generic tags (you could probably mark up an 
> entire text using only <div>, <ab> and <seg>) then maybe they aren't 
> necessary to include in a TEI tutorial.
>
> Thanks again for the data.

As I previously mentioned, another thing that could be done would be
to gather some or all the TEI documents done for DP/PG and from them
generate a "minimum DTD" that covers all of them.

I think there exists, but I'm not certain, tools which will do this. I
know there exists tools which build a DTD from a single XML document,
but not certain a tool exists which will generate a DTD for a batch of
XML documents using a common vocabulary (of course the same root
element). This is something I'll be happy to check on if no here knows
of a tool and thinks such a minimal DTD would be useful for analysis.

I believe a minimal DTD would be useful in that it gathers in a single
place all the important things about an XML vocabulary:

1) Elements used
2) Attributes used
3) Attribute values used
4) Element content models  (often overlooked but important!)

Of course, one still needs to do a statistical analysis to determine,
as Lee describes above, how often and across how many documents
certain tags are used in DP/PG TEI texts.

Another value of such a DTD is that it could be the starting point, if
we deem it useful, to build a "Basic DP/PG" TEI-subset DTD which
covers a reasonable fraction of the PG texts (say 80 or 90%). Of
course, we'd have to adapt the DTD to the changes (and new elements)
in the upcoming TEI P5 vocabulary, as well as do other changes as deemed
necessary to streamline the vocabulary (such as trying to constrain it
so there's just one way to markup documents that will be fully
characterized by the Basic DTD.) I think it is a way that maybe we can
finally converge on a TEI-based strategy for mastering PG/DP texts
which is easy-to-use for the vast majority of the texts and allows for
better standardization of authoring and conversion tools.

Jon Noring


From prosfilaes at gmail.com  Fri Oct  5 17:46:11 2007
From: prosfilaes at gmail.com (David Starner)
Date: Fri, 5 Oct 2007 20:46:11 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>

On 10/5/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
> Things like drop
> caps or matching the idiosyncratic layout of the original table of
> contents are "fluff" to folks in this camp.

The frustrating thing about this is that the distinctive features of
the original were frequently added carefully and with great care, and
you lose a lot by folding it into a regular pattern.

> PS I don't believe that the current output of PGTEI process is
> "ugly".  Rather, it is uniform and loses the original "charm" of the
> original book's layout.

I disagree; in particular, my stopping point has always been the title
page. The title page of
<http://www.gutenberg.org/files/19487/19487-h/19487-h.html> doesn't
look like any title page I've ever seen. Looking "Money, Money, Money"
by Ed McBain, "Outwitting History", by Aaron Lansky, and Dover's
edition of Fantomas, all three of them have the text centered, the
title in the largest font, the subtitle, then the author's name, which
is also in a font that is larger and more dominant than the font for
the body of the text. Then, on obverse side, is all the fine detail
about editions and copyrights and years and publication history, in a
small font. The TEI-Lite one just looks like some text dumped onto a
page in a way that doesn't stand out in the least; in fact, in a
typesetting atrocity, "GENERAL PREFACE" and other section headings are
larger than the title.

<http://www.gutenberg.org/files/21195/21195-h/21195-h.html> actually
manages to be worse; the title page is a hideous atrocity of
Lovecraftian proportions. Once again, the text isn't centered. In an
incredibly tacky, Anglo-centric manner, we introduce English onto the
title page of a book written completely in Esperanto. Bad English,
mind you, since English speakers always say First Edition, never
Edition 1. I've never seen anything like "Edition 1, (April 20,
2007)", either; you don't separate off a parenthetic clause with a
comma. Nor have I ever seen the full date written out on a title page.
Nor is this the first edition; that's been justified to me on the
claims that it's the first PG edition, but that's still very
misleading, especially as Project Gutenberg isn't mentioned on the
title page. We also fail to credit the translator on the title page,
which is usually done in more respectable books.

And don't give me for one second that all we need to do is translated
the appropriate files for TEI-Lite. That's not an excuse for
publishing the book without translating those files. Furthermore, I
know there are languages where those files can't be simply translated,
where the gender of the author matters in which word-forms you need to
use. And we may want to do books in languages that no one--or at least
we--aren't entirely fluent in.

Beyond, it mostly looks acceptable, with the exception of the
continued Anglo-centric use of pg to indicate page numbers in an
Esperanto text. I object to how the page numbers are handled; even in
an English text, pg instead of page looks informal and unprofessional.
Any repeating word should be unnecessary. It also ignores best current
practices in how to display the page number; most modern DP HTML texts
don't display it so large and loud.

In a lot of cases, what makes a DP HTML text look good is not the
original "charm"; it's the work put into making the HTML look good.
There doesn't seem to have been much if any such work on making the
output of TEI-Lite look good. Nobody seems to have looked at the ways
that HTML writers have produced page numbers and made it look right,
or looked at title pages in etexts and in real life, or even bothered
to listen to what I said last time I complained about the title pages.
Listening to people who don't already think everything about TEI-Lite
is the bee's knees might help you reach the 90% that can be swayed.

From traverso at posso.dm.unipi.it  Fri Oct  5 22:40:38 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Sat,  6 Oct 2007 07:40:38 +0200 (CEST)
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re:
	Proposal	to add OpenDocument as an additional
In-Reply-To: <463976383.20071005182630@noring.name> (message from Jon Noring
	on Fri, 5 Oct 2007 18:26:30 -0600)
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net> <4706C407.7060706@novomail.net>
	<463976383.20071005182630@noring.name>
Message-ID: <20071006054038.34D0C101FD@posso.dm.unipi.it>

>>>>> "Jon" == Jon Noring <jon at noring.name> writes:

    Jon> As I previously mentioned, another thing that could be done
    Jon> would be to gather some or all the TEI documents done for
    Jon> DP/PG and from them generate a "minimum DTD" that covers all
    Jon> of them.

I think that it would be important not only to gather what has been
used, but also what should have been used. For example, I have never
done TEI PG books, but I am planning to do them, and in the tags that
I plan to use, and one of the main reasons to me to use TEI, are the
<corr> and <sic> tags, to document the errors in the original. The
support in the conversion is not important, (as long as the tag does
not make the conversion fail, but the tag is just discarded), since
the feature may be added later, but it is a kind of information that
is preserved in DP proofreading and should not be discarded.


A different consideration, of more general type. It would be useful to
have a network of formats and conversion tools; of these formats some
should be considered "essential" (currently, it is only the txt
format), and an ebook coded in any format could be considered a
"master text" if from it every essential format can be obtained with
some combination of the tools. For example, to come back to the
original thread, an OpenDocument book might be accepted as master, if
an accepted tool exists to convert (a subset of) OpenDocument to
PGTEI, (and the book submitted can be converted with this tool) since
from PGTEI the other formats may be obtained. Another example, a
carefully hand-made HTML could be accepted as master if a tool to
convert to PGTEI exists.

Of course, even a (regularized) txt format might be a master, if every
essential format can be reached from it. 


A "master format" should be a format in which a very large part of
books can be represented faithfully as master text. 

Carlo

From rolsch at verizon.net  Sat Oct  6 06:03:32 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Sat, 06 Oct 2007 09:03:32 -0400
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to add OpenDocument as an additional
In-Reply-To: <4706C407.7060706@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<200710051727.42671.rolsch@verizon.net> <4706C407.7060706@novomail.net>
Message-ID: <200710060903.32265.rolsch@verizon.net>

On Friday 05 October 2007 7:08 pm, Lee Passey wrote:
> Roland Schlenker wrote:
>
> [snip]
>
> >>From my lastest project, Marcia Schuyler, by Grace Livingston Hill Lutz:
>
> Thanks, that's excellent data! Of course, the number have to be looked
> at in context. For example, even though there's only one <body> tag,
> /every/ TEI file is going to have to have one, so it's pretty important.
> On the other hand, if we find elements that are used only rarely /in/ a
> document, and are used only rarely /across/ documents, that's a good
> candidate for "esoteric" status. And if those esoteric elements can be
> modeled with other more generic tags (you could probably mark up an
> entire text using only <div>, <ab> and <seg>) then maybe they aren't
> necessary to include in a TEI tutorial.
>
> Thanks again for the data.

I would be willing to process a random sample of TEI book available at PG for 
more data.

How would this be for a starters:

english only
etext #
elements > 3
formatted the same

Posted to the list?

Any suggestions?

Roland Schlenker


From marcello at perathoner.de  Sat Oct  6 07:51:53 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 06 Oct 2007 16:51:53 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
Message-ID: <4707A109.5070000@perathoner.de>

David Starner wrote:

>> PS I don't believe that the current output of PGTEI process is
>> "ugly".  Rather, it is uniform and loses the original "charm" of the
>> original book's layout.
> 
> I disagree; in particular, my stopping point has always been the title
> page. The title page of
> <http://www.gutenberg.org/files/19487/19487-h/19487-h.html> doesn't
> look like any title page I've ever seen.

You don't realize that a TEI title page is in no way different from any
other page in the book and you can make it look any way you want. See:

  http://www.gnutenberg.de/download/candide/candi.html
  http://www.gnutenberg.de/download/candide/candi.pdf
  http://www.gnutenberg.de/download/candide/candi.txt
  http://www.gnutenberg.de/download/candide/candi.tei

As you can see "Candide" was done with the TEI converter of 2003. The
current converter can do much more than this.


The reason why most TEI title pages in the archive follow the same
template is that there is a labour-saving macro

  <divGen type="titlepage" />

that generates a stock title page from the data in the TEI header.

Of course, you don't have to use that macro. You may use it if you think
the time it makes you save is better spent making more books than
dinking with the title page yourself.


> <http://www.gutenberg.org/files/21195/21195-h/21195-h.html> actually
> manages to be worse; the title page is a hideous atrocity of
> Lovecraftian proportions. Once again, the text isn't centered. In an
> incredibly tacky, Anglo-centric manner, we introduce English onto the
> title page of a book written completely in Esperanto.

Again, see the Candide example for a book done entirely in the French
manner down to the French spaces before punctuation.


> And don't give me for one second that all we need to do is translated
> the appropriate files for TEI-Lite.

You don't need even that. Just build your own title page and use all the
words you like. Full unicode support comes included.


> Beyond, it mostly looks acceptable, with the exception of the
> continued Anglo-centric use of pg to indicate page numbers in an
> Esperanto text.

The "[pg ]" was an explicit request from DP PPers. I personally would
dump page numbers from all formats except the TEI master.


> There doesn't seem to have been much if any such work on making the
> output of TEI-Lite look good.

Of course not. Because every PPer has a different idea of "look good".
What for one PPer is a "conditio sine qua non TEI" will make the other
PPer go berserk for hours about your "ugly output".

The PGTEI output is intentionally left as neutral as possible. If you
don't like that, you can use a style sheets inside your TEI master.


> Nobody seems to have looked at the ways
> that HTML writers have produced page numbers and made it look right,
> or looked at title pages in etexts and in real life, or even bothered
> to listen to what I said last time I complained about the title pages.

If you -- again -- criticize things that have been solved since at least
four years, nobody will listen to you -- again.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From joshua at hutchinson.net  Sat Oct  6 09:11:07 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Sat, 6 Oct 2007 16:11:07 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>

>----Original Message----
>From: prosfilaes at gmail.com
>
>I disagree; in particular, my stopping point has always been the 
title
>page. 
>
><http://www.gutenberg.org/files/21195/21195-h/21195-h.html> actually
>manages to be worse; the title page is a hideous atrocity of
>Lovecraftian proportions. 

Yep, it's basic and quick and dirty.  But it's my fault, not PGTEI.  
I, as the PPer, don't care about fancy title page layout (though you do 
have a point about the translator's name being missing.  That is my 
bad.)  I just wanted a bare minimum that let the reader know what 
he/she was getting.

However, this is not the standard.  You can create a manual title page 
(I used the built in title page macro).  It can be as elaborate and 
francy as you want.  *Almost* anything you can do in HTML can be done 
in a manual title page.

>Beyond, it mostly looks acceptable, with the exception of the
>continued Anglo-centric use of pg to indicate page numbers in an
>Esperanto text. 

The [Pg xxx] format was chosen at the behest of DP suggestions.  Since 
there are always English parts of a PG texts (see header/footer), it 
doesn't seem like a big deal to me.  I honestly don't know if it can be 
changed in the master or if it is hardwired into the conversion right 
now.

>most modern DP HTML texts don't display it so large and loud.

That can be changed by adding a style-sheet to the TEI master.

>
>Listening to people who don't already think everything about TEI-Lite
>is the bee's knees might help you reach the 90% that can be swayed.
>

Honestly, most of the things you don't like are my fault, not the 
TEI's.  I'll see about marking up some better examples (maybe some of 
the bajillion Punch issues waiting over at DP like Juliet suggested).  
The output should look just like Punch issues done in HTML.

Josh

From prosfilaes at gmail.com  Sat Oct  6 11:19:35 2007
From: prosfilaes at gmail.com (David Starner)
Date: Sat, 6 Oct 2007 14:19:35 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4707A109.5070000@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
Message-ID: <6d99d1fd0710061119o7bd0e56ax7a96c679d6dc3a33@mail.gmail.com>

On 10/6/07, Marcello Perathoner <marcello at perathoner.de> wrote:
> You don't realize that a TEI title page is in no way different from any
> other page in the book and you can make it look any way you want. See:

No, of course I realize that. Of course that's true for every format
in PG; if I don't like the way that the HTML file looks, I can open it
up in a text editor and change how it looks. The issue is, the default
looks like crap.

This is what I saw when I clicked on the HTML versions of TEI-Lite in
the archive. This is what most of our users are going to see, and very
few of them are going to realize that it's supposed to be flexible,
and a small percent of that group is going to be willing to put in the
work to change it. It has to be good out of the box.

> The reason why most TEI title pages in the archive follow the same
> template is that there is a labour-saving macro
>
>   <divGen type="titlepage" />
>
> that generates a stock title page from the data in the TEI header.
>
> Of course, you don't have to use that macro. You may use it if you think
> the time it makes you save is better spent making more books than
> dinking with the title page yourself.

So basically the default is "look like crap" and if you actually want
it to look decent, the people who do TEI are going to mock you for
putting the extra work in. Wow, that's a motivation to use TEI.

> > There doesn't seem to have been much if any such work on making the
> > output of TEI-Lite look good.
>
> Of course not. Because every PPer has a different idea of "look good".
> What for one PPer is a "conditio sine qua non TEI" will make the other
> PPer go berserk for hours about your "ugly output".

Once again you dismiss any concepts that it could be done better by
attacking the very concept of better. There's lots of agreement among
PPers about what you need for a high quality HTML version, agreement
that you completely ignore.

> The PGTEI output is intentionally left as neutral as possible.

This is about as neutral as bowerbird's writing style. As I said
before, it blatantly violates the standards that virtually all books
I've seen printed after 1700 adhere to. I've never seen Edition 1 or
the full date on a title page, ever. How is that neutral?

> If you -- again -- criticize things that have been solved since at least
> four years, nobody will listen to you -- again.

If you want to ignore your critics, that's your right. But very few
people are doing TEI-Lite, and that's not going to change if you
ignore the reasons people aren't using it.

From jon at noring.name  Sat Oct  6 11:45:54 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 6 Oct 2007 12:45:54 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
Message-ID: <1977299086.20071006124554@noring.name>

Josh wrote:

> I, as the PPer, don't care about fancy title page layout (though you do
> have a point about the translator's name being missing.  That is my 
> bad.)  I just wanted a bare minimum that let the reader know what 
> he/she was getting.

I view title pages in books to be essentially metadata, and in some
cases a work of art to be treated as a graphic. But, in toto, not
part of the book's textual content.

Sometimes title pages includes textual content (almost always
epigraphs and dedications), but these can be considered independent
of the title page and marked up in their own special sections (usually
placed in the frontmatter division.)

So for digital text mastering I wouldn't even bother with trying to
markup title pages so long as:

1) the metadata contained is recorded in a special metadata section so
   a "title page" optimized for every target reading platform/format
   can be built (e.g., like what MS Reader does for LIT -- it uses the
   Dublin Core data in the LIT's OEBPS Publication to auto-generate a
   title page.)

   (I think Dublin Core is sufficient, particularly if we use the IDPF
   OPS extension for Creator/Contributor. How can we incorporate
   Dublin Core in TEI digital masters?)

2) A high-quality scan of the original title page (and any related
   page or pages) is available to the reader.

For creating a digital facsimile rendition, the facsimile creator can
certainly re-create the title page using the original page scan as a
visual template. The facsimile title page can either be mastered in
SVG, or in XHTML (or a similar vocabulary) using CSS or XSL-FO for
exact styling, or whatever format the facsimile creator prefers.

*****

Likewise, the same goes for a table of contents and other types of
navigational lists we see in books (I'll ignore back of book indexes
in this discussion -- that's a whole different animal.) These
nav-lists are not part of a book's content, but are a sort of
navigational metadata.

Thus, for the digital text master we could do one of at least two
methods to record the navigational data:

1) Use Digital Talking Book's NCX. The NCX can be embedded in the
   digital text master document (using either prefixed namespaces
   which is messy, or within a CDATA section), or kept external. TEI
   might offer other mechanisms. Note that NCX is now legally mandatory
   for accessible educational materials and is required in all OPS
   Publications (which are inside "EPub".)

2) Markup the navigation target points along with sufficient metadata
   for each target describing what nav-list it belongs to, some sort of
   notemark or index number, some sort of title, and its hierarchical
   level which is important to include for accessibility.

With either method, a table of contents and other nav-lists (as needed)
can be machine built fully optimized for each target platform/format.
Adobe Digital Editions, for example, does this using the NCX in EPub.

Again, for those who want to create a facsimile version, they will
refer to the original page scans and build their facsimile Table of
Contents, etc., exactly the way they want and without worry of
repurposability for non-facsimile formats and platforms.


Jon Noring


From marcello at perathoner.de  Sat Oct  6 14:47:18 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 06 Oct 2007 23:47:18 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6d99d1fd0710061119o7bd0e56ax7a96c679d6dc3a33@mail.gmail.com>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>	<4707A109.5070000@perathoner.de>
	<6d99d1fd0710061119o7bd0e56ax7a96c679d6dc3a33@mail.gmail.com>
Message-ID: <47080266.20403@perathoner.de>

David Starner wrote:

> On 10/6/07, Marcello Perathoner <marcello at perathoner.de> wrote:
>> You don't realize that a TEI title page is in no way different from any
>> other page in the book and you can make it look any way you want. See:
> 
> No, of course I realize that. Of course that's true for every format
> in PG; if I don't like the way that the HTML file looks, I can open it
> up in a text editor and change how it looks. The issue is, the default
> looks like crap.

If the PPers thought so, they wouldn't have used the divGen macro. If
they use it, they probably don't think it "looks like crap". As for
everything else, crap is in the eyes of the beholder.


> It has to be good out of the box.

Well. It cannot, because the divGen macro can only use the metadata it
finds in the TEI header. If you want more information or if you want
prettier formatting, you have to roll your own title page.

To roll your own title page will take you about half an hour. Much less
than the time you spent griping here.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jeroen.mailinglist at bohol.ph  Sat Oct  6 15:23:34 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Sun, 07 Oct 2007 00:23:34 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <005901c8077c$42e25f10$c8a71d30$@org>
References: <20071001081923.GA29575@ark.in-berlin.de>	<4700FDCF.1060009@perathoner.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<392096647.20071004191413@noring.name>
	<005901c8077c$42e25f10$c8a71d30$@org>
Message-ID: <47080AE6.8060906@bohol.ph>

Mike Cook wrote:
> Of the 70 PG texts I've made into TEI, this is my list of current tags. Just of
> the <body> section. (I don't think I've missed anything out ;-)
>
>
>   
Just the tag count from my most recent TEI file (Expedition to Borneo of
H.M.S. Dido, by Henry Keppel and James Brooke, only HTML and Text got
posted). This shows that most tags are used only a few times, and the
title page and teiHeader account for half of these. Both could be
provided as templates to fill in.


    XML-Tag Frequencies

Tag 	Count
|argument| 	*24*
|author| 	*2*
|availability| 	*1*
|back| 	*1*
|bibl| 	*1*
|body| 	*4*
|byline| 	*1*
|cell| 	*291*
|corr| 	*25*
|date| 	*2*
|dateline| 	*2*
|div1| 	*34*
|div2| 	*7*
|divGen| 	*1*
|docAuthor| 	*1*
|docDate| 	*1*
|docImprint| 	*1*
|docTitle| 	*1*
|encodingDesc| 	*1*
|fileDesc| 	*1*
|front| 	*1*
|head| 	*41*
|hi| 	*759*
|idno| 	*3*
|item| 	*25*
|language| 	*7*
|langUsage| 	*1*
|lb| 	*16*
|list| 	*2*
|milestone| 	*2*
|name| 	*1*
|note| 	*40*
|opener| 	*3*
|p| 	*1439*
|pb| 	*436*
|profileDesc| 	*1*
|publicationStmt| 	*1*
|publisher| 	*1*
|pubPlace| 	*1*
|q| 	*10*
|ref| 	*36*
|resp| 	*1*
|respStmt| 	*1*
|revisionDesc| 	*1*
|row| 	*71*
|salute| 	*2*
|sic| 	*1*
|signed| 	*3*
|sourceDesc| 	*1*
|table| 	*11*
|TEI.2| 	*1*
|teiHeader| 	*1*
|text| 	*4*
|title| 	*2*
|titlePage| 	*1*
|titlePart| 	*2*
|titleStmt| 	*1*
|xref| 	*4*


From prosfilaes at gmail.com  Sat Oct  6 16:12:13 2007
From: prosfilaes at gmail.com (David Starner)
Date: Sat, 6 Oct 2007 19:12:13 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
Message-ID: <6d99d1fd0710061612r529723fftdc41b7d0c05435df@mail.gmail.com>

On 10/6/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
> Yep, it's basic and quick and dirty.  But it's my fault, not PGTEI.
> I, as the PPer, don't care about fancy title page layout (though you do
> have a point about the translator's name being missing.  That is my
> bad.)  I just wanted a bare minimum that let the reader know what
> he/she was getting.

I don't necessarily care about fancy title page layout; but if you
look at almost any book, almost any webpage, heck almost any movie or
TV show, the biggest, most prominent text is the title, followed by
the author/main artists. We layout poetry so it looks like poetry, we
layout prose so it looks like prose, we need to layout title pages so
they look like title pages.

I also disagree with the text/metadata dictonomy that Jon Noring
brings up here. The title page is text, and as long as we are
precisely recording the rest of the book, I don't see any reason not
to precisely record the title page. If you want to make a pretty
modernized edition, then you're can do whatever you want to the title
page, but that's not what we do here.

> However, this is not the standard.  You can create a manual title page
> (I used the built in title page macro).  It can be as elaborate and
> francy as you want.  *Almost* anything you can do in HTML can be done
> in a manual title page.

You never want to have a macro that shouldn't be used. Either it
should be fixed to look like a title page and use proper English (not
Edition 1!), or it should be deleted so nobody uses it.

> The [Pg xxx] format was chosen at the behest of DP suggestions.  Since
> there are always English parts of a PG texts (see header/footer), it
> doesn't seem like a big deal to me.

The header/footer are more inevitable facts of nature type things, and
they are header and footer, that is outside the text itself.

> >most modern DP HTML texts don't display it so large and loud.
>
> That can be changed by adding a style-sheet to the TEI master.

But the default needs to look sharp, because that is the only thing
95% of our readers will ever see, and I doubt even most of that 5%
will regularly regenerate HTML from TEI masters.

> Honestly, most of the things you don't like are my fault, not the
> TEI's.

To the extent that that is true, and I'm not sure that it entirely is,
even then, if TEI is going to be a success, the books that people look
at in the archives that are it TEI need to look good. Excuses and
blame doesn't matter. Also, what people do most with our books is look
at them and read them, so they have to continue to look good. All the
cool and fancy stuff you can do with TEI isn't going to win anything
if they make people who pick up an HTML versions (and that means the
default style-sheet) wish it hadn't been done with TEI.

From prosfilaes at gmail.com  Sat Oct  6 16:19:30 2007
From: prosfilaes at gmail.com (David Starner)
Date: Sat, 6 Oct 2007 19:19:30 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <1977299086.20071006124554@noring.name>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name>
Message-ID: <6d99d1fd0710061619q17716999ia749ba4447ebb40b@mail.gmail.com>

On 10/6/07, Jon Noring <jon at noring.name> wrote:
> I view title pages in books to be essentially metadata, and in some
> cases a work of art to be treated as a graphic. But, in toto, not
> part of the book's textual content.

It may be metadata, but that doesn't stop it from being text.  If we
were making new editions, then that might be another thing, but as
long as we're making faithful copies of old editions, we must preserve
that information, all of that information, in a way that it is
accessible to the end user in HTML, not just the TEI geek.

Table of Contents and such are different, mainly because they can be
automatically regenerated and they aren't so diverse in style and
content.

From prosfilaes at gmail.com  Sat Oct  6 16:26:15 2007
From: prosfilaes at gmail.com (David Starner)
Date: Sat, 6 Oct 2007 19:26:15 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <47080266.20403@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
	<6d99d1fd0710061119o7bd0e56ax7a96c679d6dc3a33@mail.gmail.com>
	<47080266.20403@perathoner.de>
Message-ID: <6d99d1fd0710061626h3791b588h464a1eac38eaf6c2@mail.gmail.com>

On 10/6/07, Marcello Perathoner <marcello at perathoner.de> wrote:
> If the PPers thought so, they wouldn't have used the divGen macro. If
> they use it, they probably don't think it "looks like crap". As for
> everything else, crap is in the eyes of the beholder.

Which format should be used for Project Gutenberg texts is also in the
eye of the beholder. But somehow you manage to hold and loudly express
your opinion on that. Perhaps PPers should start sending in their
texts in their favorite wordprocessing format; after all, it's all in
the eye of the beholder and what people complain about really doesn't
matter.

From Bowerbird at aol.com  Sat Oct  6 17:06:17 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 6 Oct 2007 20:06:17 EDT
Subject: [gutvol-d] introducing zandbox
Message-ID: <c6d.1910ae02.34397cf9@aol.com>

zandbox -- the z.m.l. sandbox -- is a free authoring tool
that helps authors conveniently format their book in z.m.l.
-- "zen markup language" -- a revolutionary light markup.

zandbox uses a convenient 2-up interface, where you can
edit text on one half of the window, and have it displayed
in its fully-formatted form on the other half of the window,
displaying very much like it'll be shown to the people who
will eventually read your book using the free zml-viewer...

because it helps ensure that your z.m.l. is how you want it,
you can think of zandbox as a "validity-checker" for z.m.l.

if the display-side doesn't look and act the way you want,
you'll know you need to change the text on the edit-side.
the "rules" of z.m.l. are simple, so it's usually obvious from
the display what went wrong, and how to correct the text.

and even before all that, zandbox is a way to _learn_ z.m.l.

from the immediate feedback via the display-side, you will
rapidly learn the rules for formatting text on the edit-side,
and before long you'll be able to create z.m.l. in any editor.

of course, since other editors don't display z.m.l. correctly,
you'll probably want to stick with zandbox to do your z.m.l.
but it's comforting to know that you could use any editor...

to get your preview copy of sandbox, just backchannel me.

the zandbox manual, in progress:
>    http://z-m-l.com/go/zandbox_manual.zml

a z.m.l. "skeleton document", for use with zandbox: 
>    http://z-m-l.com/go/zml_skeleton_book.zml

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071006/93ad1582/attachment.htm 

From joshua at hutchinson.net  Sun Oct  7 17:43:05 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Mon, 8 Oct 2007 00:43:05 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <22751892.1191804185436.JavaMail.?@fh1036.dia.cp.net>

>----Original Message----
>From: prosfilaes at gmail.com
>
>On 10/6/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
>
>> That can be changed by adding a style-sheet to the TEI master.
>
>But the default needs to look sharp, because that is the only thing
>95% of our readers will ever see, and I doubt even most of that 5%
>will regularly regenerate HTML from TEI masters.
>

I was unclear.  I mean the style-sheet embedded in the TEI master 
file.  If the PPer changes the "setting" in the master, the resultant 
files that are automatically generated and posted to the archive will 
have that change.

Josh

From ralf at ark.in-berlin.de  Mon Oct  8 00:08:35 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 8 Oct 2007 09:08:35 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4707A109.5070000@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
Message-ID: <20071008070835.GA27881@ark.in-berlin.de>

>   http://www.gnutenberg.de/download/candide/candi.pdf
> [...]
> 
> > And don't give me for one second that all we need to do is translated
> > the appropriate files for TEI-Lite.
> 
> You don't need even that. Just build your own title page and use all the
> words you like. Full unicode support comes included.

That example isn't representative because 'Chapitre' and 'Candide'
which are the only words in the running header don't have accented
characters.


ralf


From ralf at ark.in-berlin.de  Mon Oct  8 00:25:24 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 8 Oct 2007 09:25:24 +0200
Subject: [gutvol-d] full TEI on PG?
Message-ID: <20071008072523.GB27881@ark.in-berlin.de>

This may be a FAQ. Would PG accept files that are a superset
of PGTEI and a subset of TEI? If so, which ending should the
file have to not confuse it with a possible PGTEI file?

Maybe like others, I'm thinking about a Plan B in case there
is no PGTEI 0.5 version. Also, without doubt, any file marked
up with such a set is both worth preserving and publishing.


ralf


From tb at baechler.net  Mon Oct  8 00:55:55 2007
From: tb at baechler.net (Tony Baechler)
Date: Mon, 08 Oct 2007 00:55:55 -0700
Subject: [gutvol-d] PG-E shows text, encoding jumbled
In-Reply-To: <47015BD4.20809@tintazul.com.pt>
References: <mailman.2.1191265202.24007.gutvol-d@lists.pglaf.org>
	<47015BD4.20809@tintazul.com.pt>
Message-ID: <20071008080125.6112B352602@mail1.pglaf.org>

Hello, sorry if this post is irrelevant and/or out of date.  I think 
there's still a problem with PGE as outlined in this email.  Etext 
7384 is from PG US.  Check GUTINDEX.ALL.  There are 7-bit and 8-bit 
files apparently.  I only know English so I know nothing about 
encoding but I see the 8-bit file in the PG US index.

At 09:43 PM 10/1/07 +0100, you wrote:
>The upshot is -- now I can actually follow the link to the e-text 
>page, and I can click to download the document. Dandy! But the 
>encoding is wrong. For instance, in 
><http://pge.rastko.net/etext/7384>http://pge.rastko.net/etext/7384 
>there are utf-7 and iso-8859-1 versions of the Carta da Companhia by 
>Jos? de Anchieta. The guy's name inside the file shows as "Jos+AOk- 
>de Anchieta". It's the same in either encoding.


From walter.van.holst at xs4all.nl  Mon Oct  8 01:13:07 2007
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Mon, 08 Oct 2007 10:13:07 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <1977299086.20071006124554@noring.name>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name>
Message-ID: <4709E693.9040900@xs4all.nl>

Jon Noring wrote:
> I view title pages in books to be essentially metadata, and in some
> cases a work of art to be treated as a graphic. But, in toto, not
> part of the book's textual content.

As mostly a consumer of Gutenberg etexts (and occassionally proofing a 
page or two on DP), I have to say that the title pages are often an 
indication of the effort that has been put in to make the etext's 
reading a pleasant experience. An ugly title pages will put readers off. 
  But maybe less readers will be put off by ugly title pages than by 
etexts that are only available as ASCII files.

Regards,

  Walter

From marcello at perathoner.de  Mon Oct  8 02:23:20 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 11:23:20 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <20071008070835.GA27881@ark.in-berlin.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>	<4707A109.5070000@perathoner.de>
	<20071008070835.GA27881@ark.in-berlin.de>
Message-ID: <4709F708.20604@perathoner.de>

Ralf Stephan wrote:

>>   http://www.gnutenberg.de/download/candide/candi.pdf
>> [...]
>>
>>> And don't give me for one second that all we need to do is translated
>>> the appropriate files for TEI-Lite.
>> You don't need even that. Just build your own title page and use all the
>> words you like. Full unicode support comes included.
> 
> That example isn't representative because 'Chapitre' and 'Candide'
> which are the only words in the running header don't have accented
> characters.

So what? "Accented characters" are no different from other ones.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From marcello at perathoner.de  Mon Oct  8 02:28:08 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 11:28:08 +0200
Subject: [gutvol-d] full TEI on PG?
In-Reply-To: <20071008072523.GB27881@ark.in-berlin.de>
References: <20071008072523.GB27881@ark.in-berlin.de>
Message-ID: <4709F828.9000207@perathoner.de>

Ralf Stephan wrote:

> This may be a FAQ. Would PG accept files that are a superset
> of PGTEI and a subset of TEI? If so, which ending should the
> file have to not confuse it with a possible PGTEI file?

.tei

You cannot confuse them files because PGTEI requires a pointer to the
used DTD in the DOCTYPE.


> Maybe like others, I'm thinking about a Plan B in case there
> is no PGTEI 0.5 version. Also, without doubt, any file marked
> up with such a set is both worth preserving and publishing.

There should be no reservations if the file validates against the full
TEI DTD and you provide at least a plain vanilla TXT version to go along
with it.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From marcello at perathoner.de  Mon Oct  8 02:46:44 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 11:46:44 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4709E693.9040900@xs4all.nl>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>	<1977299086.20071006124554@noring.name>
	<4709E693.9040900@xs4all.nl>
Message-ID: <4709FC84.1090709@perathoner.de>

Walter van Holst wrote:

> As mostly a consumer of Gutenberg etexts (and occassionally proofing a 
> page or two on DP), I have to say that the title pages are often an 
> indication of the effort that has been put in to make the etext's 
> reading a pleasant experience. An ugly title pages will put readers off. 

Did you do any research to prove these claims?


Google is the most popular page on the web and look at their "title
page". But maybe they are successful because they put their efforts into
search engine programming and not into cute title page design.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From walter.van.holst at xs4all.nl  Mon Oct  8 02:52:49 2007
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Mon, 08 Oct 2007 11:52:49 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4709FC84.1090709@perathoner.de>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>	<1977299086.20071006124554@noring.name>
	<4709E693.9040900@xs4all.nl> <4709FC84.1090709@perathoner.de>
Message-ID: <4709FDF1.7080406@xs4all.nl>

Marcello Perathoner wrote:

>> As mostly a consumer of Gutenberg etexts (and occassionally proofing a 
>> page or two on DP), I have to say that the title pages are often an 
>> indication of the effort that has been put in to make the etext's 
>> reading a pleasant experience. An ugly title pages will put readers off. 
> 
> Did you do any research to prove these claims?
> 
> 
> Google is the most popular page on the web and look at their "title
> page". But maybe they are successful because they put their efforts into
> search engine programming and not into cute title page design.

Ugly versus beautiful is always subject, however, Google's title page is 
too minimalistic to be ugly by most standards.

It is a apples and oranges comparison anyways, people do judge books by 
their covers. Everytime you see the cover or the title page of a new 
book it is something you have to make a new decision about whether it is 
worth your time. In Google's case it is almost always a repeat business. 
After a first experience Google's users _know_ how good it is, so that 
minimalistic title page (which in the end contributes to its usability 
as a _search engine_, mind you, not a _book_) won't put many users off 
anymore.

Regards,

  Walter

From ralf at ark.in-berlin.de  Mon Oct  8 02:36:52 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 8 Oct 2007 11:36:52 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4709F708.20604@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
	<20071008070835.GA27881@ark.in-berlin.de>
	<4709F708.20604@perathoner.de>
Message-ID: <20071008093652.GA24464@ark.in-berlin.de>

Marcello:
> > That example isn't representative because 'Chapitre' and 'Candide'
> > which are the only words in the running header don't have accented
> > characters.
> 
> So what? "Accented characters" are no different from other ones.

You don't even read those bug reports? A shame. Plan B it will be.

To repeat: Latin-1 characters in the <title>, even coded as HTML entities
like &auml;, garble PDF output in the running header.


ralf


From ralf at ark.in-berlin.de  Mon Oct  8 02:39:12 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 8 Oct 2007 11:39:12 +0200
Subject: [gutvol-d] full TEI on PG?
In-Reply-To: <4709F828.9000207@perathoner.de>
References: <20071008072523.GB27881@ark.in-berlin.de>
	<4709F828.9000207@perathoner.de>
Message-ID: <20071008093912.GB24464@ark.in-berlin.de>

> > This may be a FAQ. Would PG accept files that are a superset
> > of PGTEI and a subset of TEI? If so, which ending should the
> > file have to not confuse it with a possible PGTEI file?
> 
> .tei
> 
> You cannot confuse them files because PGTEI requires a pointer to the
> used DTD in the DOCTYPE.

And what if they are both in the PG directory?

> > Maybe like others, I'm thinking about a Plan B in case there
> > is no PGTEI 0.5 version. Also, without doubt, any file marked
> > up with such a set is both worth preserving and publishing.
> 
> There should be no reservations if the file validates against the full
> TEI DTD and you provide at least a plain vanilla TXT version to go along
> with it.

Of course.


ralf


From joshua at hutchinson.net  Mon Oct  8 05:31:25 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Mon, 8 Oct 2007 12:31:25 +0000 (UTC)
Subject: [gutvol-d] full TEI on PG?
Message-ID: <15854003.1191846685163.JavaMail.?@fh1036.dia.cp.net>

The point is that the tei file itself defines what version of TEI it 
was written to, so there is no need to change the file extension.

Josh

>----Original Message----
>From: ralf at ark.in-berlin.de
>Date: Oct 8, 2007 5:39 
>To: "Marcello Perathoner"<marcello at perathoner.de>
>Cc: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: Re: [gutvol-d] full TEI on PG?
>
>> > This may be a FAQ. Would PG accept files that are a superset
>> > of PGTEI and a subset of TEI? If so, which ending should the
>> > file have to not confuse it with a possible PGTEI file?
>> 
>> .tei
>> 
>> You cannot confuse them files because PGTEI requires a pointer to 
the
>> used DTD in the DOCTYPE.
>
>And what if they are both in the PG directory?
>
>> > Maybe like others, I'm thinking about a Plan B in case there
>> > is no PGTEI 0.5 version. Also, without doubt, any file marked
>> > up with such a set is both worth preserving and publishing.
>> 
>> There should be no reservations if the file validates against the 
full
>> TEI DTD and you provide at least a plain vanilla TXT version to go 
along
>> with it.
>
>Of course.
>
>
>ralf
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From traverso at posso.dm.unipi.it  Mon Oct  8 05:50:31 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Mon,  8 Oct 2007 14:50:31 +0200 (CEST)
Subject: [gutvol-d] PG-E shows text, encoding jumbled
In-Reply-To: <20071008080125.6112B352602@mail1.pglaf.org> (message from Tony
	Baechler on Mon, 08 Oct 2007 00:55:55 -0700)
References: <mailman.2.1191265202.24007.gutvol-d@lists.pglaf.org>
	<47015BD4.20809@tintazul.com.pt>
	<20071008080125.6112B352602@mail1.pglaf.org>
Message-ID: <20071008125031.58D5E93B62@posso.dm.unipi.it>

>>>>> "Tony" == Tony Baechler <tb at baechler.net> writes:

    Tony> Hello, sorry if this post is irrelevant and/or out of date.
    Tony> I think there's still a problem with PGE as outlined in this
    Tony> email.  Etext 7384 is from PG US.  Check GUTINDEX.ALL.
    Tony> There are 7-bit and 8-bit files apparently.  I only know
    Tony> English so I know nothing about encoding but I see the 8-bit
    Tony> file in the PG US index.

The problem is at PG US, the file is encoded in UTF-7, that is
obsolete, but the zip file is labeled iso-8859-1, that is wrong.

There is (of course) no 7-bit txt file, that does not make sense in
portuguese. 

Carlo


From marcello at perathoner.de  Mon Oct  8 06:07:45 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 15:07:45 +0200
Subject: [gutvol-d] full TEI on PG?
In-Reply-To: <20071008093912.GB24464@ark.in-berlin.de>
References: <20071008072523.GB27881@ark.in-berlin.de>	<4709F828.9000207@perathoner.de>
	<20071008093912.GB24464@ark.in-berlin.de>
Message-ID: <470A2BA1.20507@perathoner.de>

Ralf Stephan wrote:

>>> This may be a FAQ. Would PG accept files that are a superset
>>> of PGTEI and a subset of TEI? If so, which ending should the
>>> file have to not confuse it with a possible PGTEI file?
>> .tei
>>
>> You cannot confuse them files because PGTEI requires a pointer to the
>> used DTD in the DOCTYPE.
> 
> And what if they are both in the PG directory?

You mean doing the same book both in PGTEI *and* in "full" TEI?

We have posted books with two different PDF versions, so posting a book
with two different TEI versions should not be impossible.

Before doing that you should make sure you absolutely need those extra
tags and check if the PGTEI converter actually chokes on them. The
converter quietly drops unknown tags and this should output the Right
Thing in most cases.

Also, to save work, you should use a transform to go from full TEI to
PGTEI just tweaking the extra tags.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Mon Oct  8 07:32:46 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 08:32:46 -0600
Subject: [gutvol-d] Separation of the "master" from the "reader" versions
In-Reply-To: <4709E693.9040900@xs4all.nl>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
Message-ID: <1868761478.20071008083246@noring.name>

Walter wrote:
> Jon Noring wrote:

>> I view title pages in books to be essentially metadata, and in some
>> cases a work of art to be treated as a graphic. But, in toto, not
>> part of the book's textual content.

> As mostly a consumer of Gutenberg etexts (and occassionally proofing a
> page or two on DP), I have to say that the title pages are often an 
> indication of the effort that has been put in to make the etext's 
> reading a pleasant experience. An ugly title pages will put readers off.
>   But maybe less readers will be put off by ugly title pages than by 
> etexts that are only available as ASCII files.

Hmmmm.

This answer sort of represents the older paradigm thinking where the
"master" is also the "display" copy. Back when Michael started PG
there were various technical reasons why this made sense. But not in
2007. Yet that mode of thinking still permeates our thinking today.

So, we must not confuse mastering with end-user formatting.

The purpose of this discussion is to explore the concept of a PG
"master", a single format, or a small number of compatible formats,
from which all the readable versions are derived by standardized
conversion processes (preferably all automated but for a few end
formats may require some human intervention.)

That is, we don't care if the "master" is itself in a form ready for
the end-user to *directly* enjoy [endnote]. Rather, the focus is on
requirements so that the "master" contains the most important stuff
(accurate content, document structure, metadata, etc.) so as to allow
conversion to virtually any format, both digital and material.

When we look at it this way, the only purpose of carefully trying to
reproduce not only the content of a title page (which is NOT the
content of the Work's Expression itself -- a Title Page contains
mostly to completely metadata which the *publisher*, not the *author*
produced) is to aid those here who produce reading versions which are
semi-exact and exact facsimiles of the original source book.

(It should be clear that I am not hostile to those who wish to produce
digital versions which are semi- to full-facsimiles of the original --
It's just that such facsimile versions are *derivative* end-user
formats, not masters.)

So long as we have the original page scans sitting along-side the
"master" format, those who want to produce a semi-facsimile (usually
HTML) or facsimile copy (e.g. PDF) should put in the work to format
the title page so it looks like it originally did. This is work that is
unnecessary for the mastering process since for most digital formats
that facsimile title page markup can't be used anyway (after all, it
is NOT part of the authored content) and has to be tossed aside.
Should the "mastering team" put in that extra effort, especially for
books where those who build facsimile editions may decide to pass on
the book?

This separation of mastering from producing facsimiles now frees up
those creating facsimile versions to not try to thread the needle of
creating title pages which are both "repurposeable" (which as just
noted is a losing proposition -- many formats require special Title
Pages built from the metadata, e.g. MS Reader LIT, so all that hard
work will be simply get tossed aside) and which are formatted to
render very much like the original. (Note, too, that so long as we
have original page scans, the title page scan can be viewed by the
end-user as a graphic, and this is about as original as one can get.)

Now, I am intrigued with the "Facsimile Team" taking the Digital
Master and producing an SVG document which reproduces the original
title page. This SVG document would not be part of the master, but
could go into the repository to sit along side the master, graphics
images, page scans, and various end-user derivatives. SVG support is
slowly becoming ubiquitous in web browsers.


Jon Noring


[endnote: Of course, Bowerbird would say that we can have both
mastering and direct readability of the master -- his ZML. And that is
true. The issue is not whether ZML can be used as a mastering format
-- it can -- but whether it sufficiently meets the various requirements
PG/DP needs in a mastering format. I don't believe it does.]


From jon at noring.name  Mon Oct  8 08:11:57 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 09:11:57 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4709FC84.1090709@perathoner.de>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de>
Message-ID: <1659264922.20071008091157@noring.name>

Marcello wrote:
> Walter van Holst wrote:

>> As mostly a consumer of Gutenberg etexts (and occassionally proofing a 
>> page or two on DP), I have to say that the title pages are often an 
>> indication of the effort that has been put in to make the etext's 
>> reading a pleasant experience. An ugly title pages will put readers off. 

> Did you do any research to prove these claims?
>
> Google is the most popular page on the web and look at their "title
> page". But maybe they are successful because they put their efforts into
> search engine programming and not into cute title page design.

One can certainly take a book's title page metadata (title, creators,
contributors, etc.) and build a beautiful, standardized "title page",
such as to use for the web presentation version.

Now, this title page won't be a facsimile, but it will be attractive
and appealing. If one wants to show the reader how the original title
page looked, simply provide a link to the original page scan image (or
embed it directly). One can't get more exact than that.

And the "Facsimile Edition Team" (FET) can of course produce a true
facsimile of the original title page, either in SVG or an elaborately
styled XHTML+CSS. But re-creating the original title page down to the
last serif in the TEI master should not be included as part of the
mastering process. Rather, focus on the metadata to assure it is
complete so that platform-optimized title pages can be auto-generated.

Jon Noring


From lee at novomail.net  Mon Oct  8 09:16:57 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 08 Oct 2007 10:16:57 -0600
Subject: [gutvol-d] full TEI on PG?
In-Reply-To: <4709F828.9000207@perathoner.de>
References: <20071008072523.GB27881@ark.in-berlin.de>
	<4709F828.9000207@perathoner.de>
Message-ID: <470A57F9.7090806@novomail.net>

Marcello Perathoner wrote:

> Ralf Stephan wrote:

[snip]

>> Maybe like others, I'm thinking about a Plan B in case there is no
>> PGTEI 0.5 version. Also, without doubt, any file marked up with
>> such a set is both worth preserving and publishing.

Ostensibly, PGTEI is not a superset of TEI, but simply a refinement of
it (some attribute values which are left undefined in TEI are defined in
PGTEI). Thus, every valid TEI file is also a valid PGTEI file (it simply 
doesn't take advantage of the attribute set defined by PGTEI) and vice 
versa.

> There should be no reservations if the file validates against the
> full TEI DTD and you provide at least a plain vanilla TXT version to
> go along with it.

It should, by now, be well established that Mr. Hart and the PGPTB are 
strongly opposed to the establishment of any file format as the 
"preferred" format, regardless of its capabilities. If you look 
carefully at the PG FAQ you will not that while an ASCII text version is 
requested, it is not required. Thus, you should be able to submit a 
valid TEI file to PG, and no other format. Those people who want a 
degraded text version can derive it from the TEI file just as those 
people who want an RTF version can do so.


-- 
Nothing of significance below this line.


From prosfilaes at gmail.com  Mon Oct  8 10:59:18 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 8 Oct 2007 13:59:18 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4709FC84.1090709@perathoner.de>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de>
Message-ID: <6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>

On 10/8/07, Marcello Perathoner <marcello at perathoner.de> wrote:
> Walter van Holst wrote:
>
> > As mostly a consumer of Gutenberg etexts (and occassionally proofing a
> > page or two on DP), I have to say that the title pages are often an
> > indication of the effort that has been put in to make the etext's
> > reading a pleasant experience. An ugly title pages will put readers off.
>
> Did you do any research to prove these claims?

Did you do any research to disprove these claims? Maybe just about
every serious publisher in the world is wasting time spending a lot of
time making title pages, but I'd be willing to be bet they have done
research.

> Google is the most popular page on the web and look at their "title
> page". But maybe they are successful because they put their efforts into
> search engine programming and not into cute title page design.

Look at <http://www.msn.com> and <http://www.yahoo.com> and
<http://www.weather.com>. You think it's a coincidence that
<http://www.google.com> looks very little like them, that's it's
laziness?

I use Google because they have cute title page design, because they
have title page design that's attractive to the eye (uses colors,
appropriately centered) and doesn't clutter up the page with a lot of
junk. That's good title page design, and it didn't come by magic or
luck.

From joshua at hutchinson.net  Mon Oct  8 11:06:57 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Mon, 8 Oct 2007 18:06:57 +0000 (UTC)
Subject: [gutvol-d] full TEI on PG?
Message-ID: <29931807.1191866817909.JavaMail.?@fh1036.dia.cp.net>

Well, yes and no.  The FAQ does not say it is required ... but none of 
the whitewashers will post it without a text file.  You'd have to go 
through Greg Newby and get a special dispensation from on high.  :)  
And there has to be a "read good reason" to not post a text version.

That being said, the part about any TEI file working is pretty much 
correct.

Josh

>----Original Message----
>From: lee at novomail.net
>Date: Oct 8, 2007 12:16 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: Re: [gutvol-d] full TEI on PG?
>
>Marcello Perathoner wrote:
>
>> Ralf Stephan wrote:
>
>[snip]
>
>>> Maybe like others, I'm thinking about a Plan B in case there is no
>>> PGTEI 0.5 version. Also, without doubt, any file marked up with
>>> such a set is both worth preserving and publishing.
>
>Ostensibly, PGTEI is not a superset of TEI, but simply a refinement 
of
>it (some attribute values which are left undefined in TEI are defined 
in
>PGTEI). Thus, every valid TEI file is also a valid PGTEI file (it 
simply 
>doesn't take advantage of the attribute set defined by PGTEI) and 
vice 
>versa.
>
>> There should be no reservations if the file validates against the
>> full TEI DTD and you provide at least a plain vanilla TXT version 
to
>> go along with it.
>
>It should, by now, be well established that Mr. Hart and the PGPTB 
are 
>strongly opposed to the establishment of any file format as the 
>"preferred" format, regardless of its capabilities. If you look 
>carefully at the PG FAQ you will not that while an ASCII text version 
is 
>requested, it is not required. Thus, you should be able to submit a 
>valid TEI file to PG, and no other format. Those people who want a 
>degraded text version can derive it from the TEI file just as those 
>people who want an RTF version can do so.
>
>
>-- 
>Nothing of significance below this line.
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From prosfilaes at gmail.com  Mon Oct  8 11:10:45 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 8 Oct 2007 14:10:45 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <1659264922.20071008091157@noring.name>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de>
	<1659264922.20071008091157@noring.name>
Message-ID: <6d99d1fd0710081110l29ab6d60q66d06fd5cc5201b7@mail.gmail.com>

On 10/8/07, Jon Noring <jon at noring.name> wrote:
> But re-creating the original title page down to the
> last serif in the TEI master should not be included as part of the
> mastering process.

I think this is a strawman here. I'm not, nor is anyone else here I've
read, asking for the original title page being reproduced down to the
last serif. I would like for the original text of the title page to be
preserved just like the original text anywhere else in the book. I
would happily settle for a decent looking title page that preserves
and displays the basic information every title page has, combined with
the original edition information that good scholarly editions (for
Dover to the Library of America) never forget to include, all in a
nice sharp neutral package. (And once again, neutral means it looks
like everything else out there, not just throw it on the page.)

From Bowerbird at aol.com  Mon Oct  8 11:25:57 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 8 Oct 2007 14:25:57 EDT
Subject: [gutvol-d] what's all the fuss about title pages?
Message-ID: <d4d.fd415c7.343bd035@aol.com>

what's all the fuss about title pages?

just _center_ the text on the title page,
and do a few other small adjustments,
and david starner will be happy with it...

-bowerbird

p.s.   i've never seen a professionally-made
book that didn't have its title page centered.


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071008/93dcbe8f/attachment-0001.htm 

From jon at noring.name  Mon Oct  8 11:30:35 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 12:30:35 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de>
	<6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>
Message-ID: <294960705.20071008123035@noring.name>

David Starner wrote:
> Marcello Perathoner wrote:

>> Did you do any research to prove these claims?

> Did you do any research to disprove these claims? Maybe just about
> every serious publisher in the world is wasting time spending a lot of
> time making title pages, but I'd be willing to be bet they have done
> research.

For final end-user viewing of a PG/DP book, certainly a nice title
page improves the reading experience.

But such a title page can be auto-generated from the book's
title-page-related metadata, and presented to the end-user in a
form which is beautiful, standardized, and optimized for the target
platform. It will also "brand" the book. And now with swappable style
sheets, there's even the intriguing ability to let the end-user choose
the CSS formatting for the whole book, and not be constrained by the
styling someone chose for them.

Most of the modern paperbook reissues of public domain books that
I've seen have redesigned title pages that don't look anything like
the title pages in the "original" (pre-1923) books. Why? Because the
title page does not contain (other than the occasional dedication or
epigraph) authorial content -- it is a publisher device to present the
Work to the reader -- the content (the "text") in a title page is NOT
part of the Work itself -- it is not part of the Work's content.

Thus, I see no reason for digital text mastering to laboriously try to
exactly format the title page. It's a waste of the digital text
master's time and makes the markup more complex, and produces something
that is of interest *only* to those who wish to produce some sort of
facsimile edition. (And again, make sure the end-user has access to
the original title page scan.)

Now, if the "Facsimile Editions Team" wishes to take the digital text
master and create semi- to full-facsimile reader versions, all the
power to them! That's great! And now that team has free rein to produce
the title page anyway they want for a particular book: SVG, LaTeX, PDF,
RTF, XHTML+CSS -- whatever works best for them without having to worry
about what they produce being repurposeable or accessible or whatever
(since facsimile versions are for visual presentation only.)

Jon Noring


From marcello at perathoner.de  Mon Oct  8 11:33:10 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 20:33:10 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <4709FDF1.7080406@xs4all.nl>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>	<1977299086.20071006124554@noring.name>	<4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de> <4709FDF1.7080406@xs4all.nl>
Message-ID: <470A77E6.5020301@perathoner.de>

Walter van Holst wrote:

> It is a apples and oranges comparison anyways, people do judge books by 
> their covers.

Some people surely do. But by different standards.

I personally prefer "ugly" web sites because experience has shown their
contents to be more informative. Every author has only a finite amount
of time, if more time goes into the formatting, less time will go into
the contents, and vice versa.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From prosfilaes at gmail.com  Mon Oct  8 11:35:43 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 8 Oct 2007 14:35:43 -0400
Subject: [gutvol-d] Separation of the "master" from the "reader" versions
In-Reply-To: <1868761478.20071008083246@noring.name>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<1868761478.20071008083246@noring.name>
Message-ID: <6d99d1fd0710081135h4275354bpe9a079ed79d4fec7@mail.gmail.com>

On 10/8/07, Jon Noring <jon at noring.name> wrote:
> Walter wrote:
> > Jon Noring wrote:
>
> >> I view title pages in books to be essentially metadata, and in some
> >> cases a work of art to be treated as a graphic. But, in toto, not
> >> part of the book's textual content.
>
> > As mostly a consumer of Gutenberg etexts (and occassionally proofing a
> > page or two on DP), I have to say that the title pages are often an
> > indication of the effort that has been put in to make the etext's
> > reading a pleasant experience. An ugly title pages will put readers off.
> >   But maybe less readers will be put off by ugly title pages than by
> > etexts that are only available as ASCII files.
>
> Hmmmm.
>
> This answer sort of represents the older paradigm thinking where the
> "master" is also the "display" copy.

Jon, are you listening? The issue is, the title page of the display
copy, the HTML edition, looks like crap, and that will turn off
readers.

> The purpose of this discussion is to explore the concept of a PG
> "master", a single format,

The point of my comments in this discussion is that all the pie in the
sky winnings aren't going to get people to work with if what they see
looks terrible.

> When we look at it this way, the only purpose of carefully trying to
> reproduce not only the content of a title page (which is NOT the
> content of the Work's Expression itself -- a Title Page contains
> mostly to completely metadata which the *publisher*, not the *author*
> produced) is to aid those here who produce reading versions which are
> semi-exact and exact facsimiles of the original source book.

I don't make this distinction between author and publisher that you
do. I want to capture books that drove generations wild. The
generations didn't read the original manuscript, they read the book.
That's what I want to capture, that book, no matter what the publisher
did. Furthermore, your line between publisher and author doesn't make
a whole lot sense for a lot of material, material that has passed
through many hands beside the author, be it committee-produced or
anthologies. or on the flip side material that was produced entirely
by the author, title page included.

Not only that, changing and reformatting this information is
inherently a lossy process; if we record it exactly as it was in the
original book, we preserve that information in a lossless manner.

And more to the point, I believe we should be doing as little editing
as possible. Creating a new title page is like updating the spelling;
it's changing the original volume into a more modern form. Updating
and modernizing is not our job; we are merely ideal scribes, copying
the original works.

From marcello at perathoner.de  Mon Oct  8 11:41:52 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 20:41:52 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>	<1977299086.20071006124554@noring.name>
	<4709E693.9040900@xs4all.nl>	<4709FC84.1090709@perathoner.de>
	<6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>
Message-ID: <470A79F0.3020709@perathoner.de>

David Starner wrote:
> On 10/8/07, Marcello Perathoner <marcello at perathoner.de> wrote:
>> Walter van Holst wrote:
>>
>>> As mostly a consumer of Gutenberg etexts (and occassionally proofing a
>>> page or two on DP), I have to say that the title pages are often an
>>> indication of the effort that has been put in to make the etext's
>>> reading a pleasant experience. An ugly title pages will put readers off.
>> Did you do any research to prove these claims?
> 
> Did you do any research to disprove these claims?

In my world the person who makes a claim that has to prove it.

Also, there are two claims made. The one you didn't spot is that the
amount of work gone into the text is directly proportional to the amount
of work gone into the title page. This is speculative at best.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From prosfilaes at gmail.com  Mon Oct  8 11:45:05 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 8 Oct 2007 14:45:05 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <294960705.20071008123035@noring.name>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de>
	<6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>
	<294960705.20071008123035@noring.name>
Message-ID: <6d99d1fd0710081145y38e6fa9dq5024ea84f32b58f6@mail.gmail.com>

On 10/8/07, Jon Noring <jon at noring.name> wrote:
> But such a title page can be auto-generated from the book's
> title-page-related metadata, and presented to the end-user in a
> form which is beautiful, standardized, and optimized for the target
> platform.

Great. Then let's see it done. I'm tired of hearing about theory, Jon.
I wouldn't have brought it up if we were looking beautiful title
pages. But we've all heard huge promise made for things that never
worked out. I'm not going to work on TEI, nor will I encourage others
to, until it actually works right.

From joshua at hutchinson.net  Mon Oct  8 11:50:09 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Mon, 8 Oct 2007 18:50:09 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <6947627.1191869409300.JavaMail.?@fh1036.dia.cp.net>

Ok, everyone, David has a point.  A very GOOD point.

Master formats are great and all, but the final output has to look 
nice or no one will want to use it.

Let's table this part of the discussion for a few days.  I'll try to 
fix up some examples of nice looking title pages and post them up.  
Hopefully, we can get some consensus on whether TEI can do the job (and 
some feedback on how hard it was/wasn't from me).

Josh

>----Original Message----
>From: prosfilaes at gmail.com
>
>Great. Then let's see it done. I'm tired of hearing about theory, 
Jon.
>I wouldn't have brought it up if we were looking beautiful title
>pages. But we've all heard huge promise made for things that never
>worked out. I'm not going to work on TEI, nor will I encourage others
>to, until it actually works right.


From jon at noring.name  Mon Oct  8 12:18:08 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 13:18:08 -0600
Subject: [gutvol-d] Separation of the "master" from the "reader" versions
In-Reply-To: <6d99d1fd0710081135h4275354bpe9a079ed79d4fec7@mail.gmail.com>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<1868761478.20071008083246@noring.name>
	<6d99d1fd0710081135h4275354bpe9a079ed79d4fec7@mail.gmail.com>
Message-ID: <44282966.20071008131808@noring.name>

David Starner wrote:

> Updating and modernizing is not our job; we are merely ideal scribes,
> copying the original works.


Well, we do agree on some aspects of this last point David made.

The digital text master (DTM) should reproduce the original textual
content in the source book as much as possible, including authors'
and printers' errors.

The nice thing about XML is that we can add markup in the DTM pointing
to corrections to clearly known errors (which over time can expand),
then let those who create reader editions to decide how they want to
tweak the text.

And I also agree 120% with Bowerbird on the need for the DTM to
include the exact point of line breaks (including mid-word) and page
breaks in the source book. In XML this would be done by markup
specific for this purpose. This is one of the few "original typographic
presentation" items I would include in the DTM markup.

Why? For at least three reasons (if you think of other reasons, let
us know!):

1) Alignment of the text with the source book for future proofing and
   other unforeseen needs, and

2) For page breaks to know the original page number associated with a
   piece of content so existing references by page number can be
   perfectly pointed to that piece of content, and

3) To aid in the production of *perfect* facsimiles for those who wish
   to do so.

   Perfect facsimiles (PF) require access to the original page scans
   *anyway*, and there's a lot of items we need not record in the DTM
   since they can be seen in the PF and with relatively little work
   implemented in the PF. However, it would take a whole lot of work to
   put back the exact points of line and page breaks if those were
   stripped away in the DTM XML version. I mean, major big time work.
   (To cite an example: Putting the first word of every chapter in small
   caps is pretty trivial to do, but reinserting many thousands of line
   breaks, including breaking words, and hundreds of page breaks, is
   downright a lot of work.)


The bottom line is that I see the DTM embodying the most critical
information in the book which requires the most work to capture
correctly (perfect text to the original, preserving line and page
breaks), and information needed for general repurposability and
accessibility. In addition, I see that the DTM *must* have associated
with it the original page scans, which is a huge problem with the PG
collection as it now stands.

For those Works where a perfect facsimile (PF) is called for (and many
works simply don't need this level of digital reproduction), those
interested can take the DTM+scans and with relatively little work
create a PF in a format optimized for that purpose, such as PDF,
LaTeX, SVG, or whatever.

The same argument applies for those wanting to create a "semi-facsimile",
which would be, for example, an XHTML document that significantly
captures the typographic flavor of the original but is not concerned
with exact line breaks.

These are my thoughts. What do the others think?

Jon Noring


From jon at noring.name  Mon Oct  8 12:35:23 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 13:35:23 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6947627.1191869409300.JavaMail.?@fh1036.dia.cp.net>
References: <6947627.1191869409300.JavaMail.?@fh1036.dia.cp.net>
Message-ID: <49101967.20071008133523@noring.name>

Josh wrote:

> Ok, everyone, David has a point.  A very GOOD point.
>
> Master formats are great and all, but the final output has to look 
> nice or no one will want to use it.
>
> Let's table this part of the discussion for a few days.  I'll try to 
> fix up some examples of nice looking title pages and post them up.  
> Hopefully, we can get some consensus on whether TEI can do the job (and
> some feedback on how hard it was/wasn't from me).

Definitely cobble up some XHTML title pages! It is a useful exercise.

But do keep in mind the alternative to simply not encode the title page
as markup in the TEI "master" but move the information to the metadata
section. Then when the XHTML derivative is created move the information
from the metadata tags to the XHTML markup of the "title page".

(Do note that if one builds XHTML markup for the title page, it should
be optimized so it will look nice on all browsers for all platforms,
including handhelds! It would not surprise me if some of the "facsimile"
XHTML produced over at DP does not render well on smaller devices. It
is also important that the title page is "readable" when all CSS styling
is removed, so this requires the use of the h1-to-h6 tags, etc. This
is also useful for accessibility.)

Btw, this is what is being planned for BookX, and I've talked with a
couple script experts (considering hiring them), and it is pretty
trivial to do. In some cases such a transformation can be done with
XSLT. This is in reply to David Starner who asked this be
demonstrated, which is a reasonable request. But in this case I
think the script people here will agree with me in that the markup for
a standardized XHTML "title page" can be autogenerated from metadata
information in TEI. There are times when demonstration is needed to
prove something, and times when it is not. This is one time it is not
needed -- it can be done, it's just a matter whether or not this is
the best way to do it.

Jon Noring


From walter.van.holst at xs4all.nl  Mon Oct  8 12:58:31 2007
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Mon, 08 Oct 2007 21:58:31 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6d99d1fd0710081145y38e6fa9dq5024ea84f32b58f6@mail.gmail.com>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>	<1977299086.20071006124554@noring.name>
	<4709E693.9040900@xs4all.nl>	<4709FC84.1090709@perathoner.de>	<6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>	<294960705.20071008123035@noring.name>
	<6d99d1fd0710081145y38e6fa9dq5024ea84f32b58f6@mail.gmail.com>
Message-ID: <470A8BE7.7090902@xs4all.nl>

David Starner wrote:

> Great. Then let's see it done. I'm tired of hearing about theory, Jon.
> I wouldn't have brought it up if we were looking beautiful title
> pages. But we've all heard huge promise made for things that never
> worked out. I'm not going to work on TEI, nor will I encourage others
> to, until it actually works right.

I'd be willing to put effort in adding proper CSS formatting to TEI 
texts or even adding TEI mark-up if there was a (preferably web-based) 
proper tool for that. The current crop of XML editors I've tried so far 
were way too arcane for a casual user like me.

Regards,

  Walter

From joshua at hutchinson.net  Mon Oct  8 13:05:04 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Mon, 8 Oct 2007 20:05:04 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>

>----Original Message----
>From: jon at noring.name
>
>But do keep in mind the alternative to simply not encode the title 
page
>as markup in the TEI "master" but move the information to the 
metadata
>section. Then when the XHTML derivative is created move the 
information
>from the metadata tags to the XHTML markup of the "title page".
>

No, that is the result that is causing the complaint (the title page 
macro just populates from the meta data).  Granted I might be able to 
fix some of the complaints directly in the meta data (ie, Change 
"Edition 1" to something like "First PG Edition" or some such) but some 
of it needs to be handled at the level of manually controlling the 
title page.

Josh

From jon at noring.name  Mon Oct  8 13:19:38 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 14:19:38 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470A8BE7.7090902@xs4all.nl>
References: <32420778.1191687067584.JavaMail.?@fh1035.dia.cp.net>
	<1977299086.20071006124554@noring.name> <4709E693.9040900@xs4all.nl>
	<4709FC84.1090709@perathoner.de>
	<6d99d1fd0710081059s6b296329n9a5cf7245cb6585b@mail.gmail.com>
	<294960705.20071008123035@noring.name>
	<6d99d1fd0710081145y38e6fa9dq5024ea84f32b58f6@mail.gmail.com>
	<470A8BE7.7090902@xs4all.nl>
Message-ID: <597714017.20071008141938@noring.name>

Walter wrote:
> David Starner wrote:

>> Great. Then let's see it done. I'm tired of hearing about theory, Jon.
>> I wouldn't have brought it up if we were looking beautiful title
>> pages. But we've all heard huge promise made for things that never
>> worked out. I'm not going to work on TEI, nor will I encourage others
>> to, until it actually works right.

> I'd be willing to put effort in adding proper CSS formatting to TEI 
> texts or even adding TEI mark-up if there was a (preferably web-based)
> proper tool for that. The current crop of XML editors I've tried so far
> were way too arcane for a casual user like me.

I've always been intrigued with using CSS to directly view TEI
documents in browsers, if for nothing else as a means of markup
visualization for editing purposes.

Of course, the problem with TEI is that it is not wholly compatible
with CSS. For example, TEI may include an inline <note> element which
itself can contain a whole document. In the absence of CSS, browsers
will simply leave it inline and the main flow of text becomes jumbled
(the web paradigm never included an inline <note> tag intended to be
yanked out of the flow and presented elsewhere.) CSS can float such
notes to the side, but that CSS only works in Opera and Firefox, not
IE -- and it is "messy".

Then there are differences between the TEI table model and the
HTML-CSS table model.

What is intriguing, though, and appears workable, is to NOT include
markup in the body for a title page, but to use the metadata section
at the front of TEI documents, combined with CSS, to make a "title
page". I did this very thing for the BookX vocabulary. Here's a BookX
example using silly (and not very pretty) CSS strictly for document
visualization purposes -- unfortunately it only really works for Opera
and Firefox -- IE barfs on the CSS (it might be possible to tweak the
CSS to make it work in IE, but not sure):

   http://www.openreader.org/myantonia/BookX/myantonia-bookx.xml

Look at the source of this XML document -- the metadata used for
generating the "title page" is located in the <bookinfo> section.

Note that the "title page" is created from the "metadata" using CSS.
It might be possible to do likewise from the TEI metadata provided it
is properly ordered (a restriction). With a script approach to
generate XHTML from TEI, the metadata order is not important.

Jon Noring


From jon at noring.name  Mon Oct  8 13:24:19 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 14:24:19 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
Message-ID: <589228901.20071008142419@noring.name>

Josh wrote:

> No, that is the result that is causing the complaint (the title page 
> macro just populates from the meta data).  Granted I might be able to 
> fix some of the complaints directly in the meta data (ie, Change 
> "Edition 1" to something like "First PG Edition" or some such) but some
> of it needs to be handled at the level of manually controlling the 
> title page.

What kind of complaints?

By taking the metadata, one should be able to do just about anything
with it, including building standardized title pages.

Can you give us one or two examples of TEI metadata that leads to
title pages leading to complaints?

Thanks.

Jon Noring


From marcello at perathoner.de  Mon Oct  8 14:34:28 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 23:34:28 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <20071008093652.GA24464@ark.in-berlin.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
	<20071008070835.GA27881@ark.in-berlin.de>
	<4709F708.20604@perathoner.de>
	<20071008093652.GA24464@ark.in-berlin.de>
Message-ID: <470AA264.2070908@perathoner.de>

Ralf Stephan wrote:

> Marcello:
>>> That example isn't representative because 'Chapitre' and 'Candide'
>>> which are the only words in the running header don't have accented
>>> characters.
>> So what? "Accented characters" are no different from other ones.
> 
> You don't even read those bug reports? A shame. Plan B it will be.
> 
> To repeat: Latin-1 characters in the <title>, even coded as HTML entities
> like &auml;, garble PDF output in the running header.

We were talking about title pages, which do "accented characters" quite
well. Candide is an example of a formatted title page.


You are talking about a bug you reported in the page headers in the PDF
output. I'm trying to fix it, but I have to work around some LaTeX
internals. (TeX is not unicode compatible. Version 0.5 will probably not
use TeX any more if they don't implement full unicode support by then.)

I can put in a quick work around so the book title doesn't automatically
get assigned to the PDF left page headers. But that will change existing
PDFs.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From marcello at perathoner.de  Mon Oct  8 14:44:07 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 08 Oct 2007 23:44:07 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <589228901.20071008142419@noring.name>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
	<589228901.20071008142419@noring.name>
Message-ID: <470AA4A7.4010908@perathoner.de>

Jon Noring wrote:

> By taking the metadata, one should be able to do just about anything
> with it, including building standardized title pages.

Bonjour. That's what we *are* doing.


> Can you give us one or two examples of TEI metadata that leads to
> title pages leading to complaints?

The complaint so far is that the title page doesn't use the font sizes
and text alignment the plaintiff likes best.

Also there is confusion between the ebook title page, which contains PG
metadata and PG edition and publication date, and the title page of the
paper book. IMO they are different entites. You can provide one or both
of them, but at the discretion of the PPer.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From prosfilaes at gmail.com  Mon Oct  8 15:23:27 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 8 Oct 2007 18:23:27 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AA4A7.4010908@perathoner.de>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
	<589228901.20071008142419@noring.name>
	<470AA4A7.4010908@perathoner.de>
Message-ID: <6d99d1fd0710081523l5776b6dcvb4080599b1cac94e@mail.gmail.com>

On 10/8/07, Marcello Perathoner <marcello at perathoner.de> wrote:
> The complaint so far is that the title page doesn't use the font sizes
> and text alignment the plaintiff likes best.

The complaint is that the title page doesn't look anything like any
title page I have ever seen, be it paper book, web page, or
hand-created ebook.  I can pull books off my shelves, books printed in
Japanese, in Russian, in Esperanto, in English, in Romanian, books
printed in Japan, Brazil, England, the Soviet Union, the US, books
printed in 2007, books printed in 1831, mass-market paperbacks,
expensive math books. The title pages look pretty similar, and nothing
like the one your macro generates.

From lee at novomail.net  Mon Oct  8 16:44:39 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 08 Oct 2007 17:44:39 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <470AC0E7.1010208@novomail.net>

joshua at hutchinson.net wrote:

>  Is one camp right and the other wrong?

Sure ;-). The problem is deciding which is which.

You see, the whole meaning of "right" and "wrong" requires a standard. 
In the case of morality, right and wrong are defined in relationship to 
the word of God (however you conceive of Her). In the case of legality, 
right and wrong are defined in relationship to the law of the land 
(English common law is an interesting hybrid that sort of straddles 
legality and morality). In the case of democracy, right and wrong are 
defined in relationship to the opinion of the majority. In the case of 
logical positivism right and wrong are defined in relationship to 
whatever provides the greatest good for the greatest number. In the case 
of textual markup, you can't decide which camp is "right" and which is 
"wrong" until you have determined what the standard is against which 
these competing philosophies are to be judged.

>  Is it necessary to have one camp or the other "win"?

When you ask if one camp must "win" you imply that the other must 
"lose." As you pointed out TEI can serve both camps at least as well as 
any other solution, with the possible exception of page images for the 
WYSIWYG camp. (You suggest that camp 2 can accommodate camp 1, but that 
"it's just much more work." I disagree. It's only slightly more work.)

Thus, if the "content" camp wins, the WYSIWYG camp wins too; there is no 
loser. On the other hand, purely presentational markup is patently 
unsuited for virtually any purpose other than use by a fully-functional 
human (the visually impaired, even the myopic, need not apply). Thus, if 
the WYSIWYG camp wins, the "content" camp loses.

I don't think it's necessary to have one camp "win." Rather I think it's 
important that neither camp lose.

>  Can both be adequately served?

Only by the adoption of structural/semantic markup.

>  Is it worth the effort to TRY to serve both camps?

It depends on how much of an altruist you are.

In psychology there is the concept of "projection." In the classic 
formulation, projection is a defense mechanism whereby one "projects" 
one's own undesirable thoughts, motivations, desires, and feelings onto 
someone else (http://en.wikipedia.org/wiki/Psychological_projection). I 
believe that projection extends beyond /undesirable/ thoughts and 
motivations, and usually includes /all/ thoughts, motivations, etc. This 
more inclusive formulation of projection can still be considered a 
defense mechanism due to the common, pervasive desire of humans to be 
considered part of the norm. If I believe something as innocuous as 
"blue is a cool color" (in the temperature sense, not in the social 
acceptability sense) I will tend to believe that everyone else thinks 
blue is a cool color as well.

The more controversial the belief, and the more threatening its denial 
is to our fundamental belief system, the stronger our hold on projection 
will be (thus the Freudian formulation of /undesirable/ thoughts. If you 
express the opinion that blue is a hot color (again in the temperature 
sense, not in the social acceptability sense) I would tend to accept 
that you hold a differing opinion, as your opinion does not challenge my 
fundamental beliefs. On the other hand, if you were to express the 
opinion that most people believe that war is a good thing (in either the 
moral or the legal sense) I would probably reject that opinion, as I am 
strongly opposed to war and therefore believe that most people would 
share my believe (despite all the evidence to the contrary).

Likewise, those people who believe that posting unencrypted, unprotected 
e-books to the internet will cause those books to be widely pirated and 
shared, probably believe this because in similar circumstances it is 
what they, themselves, would do.

Thus, Mr. Hart, who believes that degraded ASCII texts is sufficient for 
his purposes, projects that belief onto others, and believes that it is 
sufficient for /everyone's/ purposes. He has expressed his opinion so 
frequently, and in so many forums that it has been deeply incorporated 
into his fundamental belief system. When presented with contrary 
opinions, even when such opinions seem to be in the majority, he will 
continue to believe that those people expressing the contrary opinions 
are the aberrations, not the norm; otherwise, his own belief system 
would be the aberration, and this is too great a challenge to his 
fundamental beliefs.

Likewise, Bowerbird has invested a great deal of emotional capital in 
the creation and promotion of his proprietary Zen Markup Language. He 
has a strong, internalized belief that it is capable of completely 
representing everything required for the electronic version of a book. 
Thus, not only are those constructs it is not capable of representing by 
definition unimportant, the vast majority of people when exposed to ZML 
will recognize its superiority. The critics of ZML are the aberration, 
not Bowerbird.

To put it in simple syllogistic terms:
1. All right-minded people will recognize that ZML is the best markup 
language possible (psychological projection).
2. Lee Passey does not recognize that ZML is a good markup language.
3. QED, Lee Passey is not in his right mind.

A completely logical conclusion, but completely out of touch with reality.

Those people who seem to believe that a certain presentation is "ugly" 
also seem to me to be those people who are most likely to project this 
belief on to others, as though questions of aesthetics could have any 
kind of universal standard, and are least likely to accept any 
presentation which is not the same as what they would do in similar 
circumstances. They are also less likely to see the value in someone 
else's preferences or needs (which are, obviously, aberrations). Thus, 
Mr. Starner needn't explain why the PG TEI-to-HTML XSL script generates 
a title page that "looks like crap," nor need he explain what needs to 
be done to fix it; virtually everyone in the world shares his artistic 
sensibilities, so all you should have to do is look at it to understand.

Personally, I have spoken with many people (mostly programmers) about 
marking up text semantically or structurally instead of 
presentationally. My experience is that a certain percentage of them 
(probably less than half) "get it" almost immediately. No arguments or 
justifications are needed; you just say, "if we mark up the text 
according to the document's semantic, we can produce any presentation, 
and consume it like a database as well", and they reply "well, of 
course, that's the way it should be done."

Those people who don't "get it" almost immediately rarely, if ever, "get 
it." They can't seem to imagine that anyone could use the e-book in any 
way other than how /they/ want to use it, or to see it in any other way 
than the way they want to see it. The reaction has its source in 
sub-conscious emotions and motivations, and no amount of logic can alter 
an emotional response.

So, is it worth the effort to try to serve both camps? Is it even 
/possible/ to serve both camps?

Because semantic markup is much more powerful than presentational 
markup, and because those who favor semantic markup tend to be more 
far-sighted and less ego-centric than those caught up in presentation, I 
believe it is possible, with little additional effort, for those in camp 
#2 to produce a product that will satisfy those in camp #1. But I do not 
believe that there is any way to create a work flow that would allow the 
adherents of camp #1 to produce a product that would be acceptable to 
those in camp #2; at least nothing that provides any significant 
advantage to just going to the scan set and starting over.

So yes, I think it is worth the effort for those in camp #2 to try and 
satisfy the needs of camp #1, so long as they understand that there will 
be no reciprocity, and the "good" will have to co-exist in the same 
database with the "bad" and the "ugly." At the same time, I don't think 
there is any process possible that would permit those who are 
presentationally oriented to create something that the "content" camp 
will find useful, and I don't think it is worth the effort for those in 
camp #2 to try and persuade those in camp #1 that semantic markup is a 
better way. They just can't see it. As the saying goes, "Never try to 
teach a pig to sing -- it wastes your time and annoys the pig."


From jon at noring.name  Mon Oct  8 16:56:57 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 17:56:57 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AA4A7.4010908@perathoner.de>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
	<589228901.20071008142419@noring.name> <470AA4A7.4010908@perathoner.de>
Message-ID: <1414923511.20071008175657@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> By taking the metadata, one should be able to do just about anything
>> with it, including building standardized title pages.

> Bonjour. That's what we *are* doing.

>> Can you give us one or two examples of TEI metadata that leads to
>> title pages leading to complaints?

> The complaint so far is that the title page doesn't use the font sizes
> and text alignment the plaintiff likes best.

Hmmm, that doesn't seem to be related to the issue of drawing the
title page information from the metadata.


> Also there is confusion between the ebook title page, which contains PG
> metadata and PG edition and publication date, and the title page of the
> paper book. IMO they are different entites. You can provide one or both
> of them, but at the discretion of the PPer.

Hmmm, isn't there a way to provide both in the same TEI document?

Maybe one way is to insert the PG-related metadata as some Dublin Core
markup, and/or in some RDF, and embed that in a CDATA section. That
info will thus be available to scripts, and will pass XML validation
to the TEI DTD.

Jon Noring


From lee at novomail.net  Mon Oct  8 18:23:39 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 08 Oct 2007 19:23:39 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <1414923511.20071008175657@noring.name>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>	<589228901.20071008142419@noring.name>
	<470AA4A7.4010908@perathoner.de>
	<1414923511.20071008175657@noring.name>
Message-ID: <470AD81B.6060301@novomail.net>

Jon Noring wrote:

>  Marcello wrote:
 >
> > Jon Noring wrote:
 >>
> >> By taking the metadata, one should be able to do just about
> >> anything with it, including building standardized title pages.
>
> > Bonjour. That's what we *are* doing.
>
> >> Can you give us one or two examples of TEI metadata that leads to
> >>  title pages leading to complaints?
>
> > The complaint so far is that the title page doesn't use the font
> > sizes and text alignment the plaintiff likes best.
>
>  Hmmm, that doesn't seem to be related to the issue of drawing the
>  title page information from the metadata.

But it is.

The <teiHeader> element is used for all the metadata about a book. You 
should never see /any/ of the data from the <teiHeader> when displaying 
the book, which is why in my CSS for TEI I have marked the <teiHeader> 
as "display:none". If you want a displayable title page in your e-book 
(particularly if you want to control its presentation), one way is to 
create a <titlePage> element in the <front> and construct your title 
page there. This is not the common way it is done in PGTEI, however.

In TEI there is an element called <divGen>. In essence, it is a function 
call instruction; it says, "at this point in the text, generate this 
'type' of a <div> element." According to the draft P5 specification, 
"This element is intended primarily for use in document production or 
manipulation, rather than in the transcription of pre-existing 
materials; it makes it easier to specify the location of indices, tables 
of contents, etc., to be generated by text preparation or word 
processing software." Most PGTEI texts do not contain a transcribed 
<titlePage>, instead they contain a <divGen type="titlepage"> which the 
PG XSL script interprets as an instruction to generate a standard title 
page from pieces of the <teiHeader> data.

If you haven't realized it yet, PGTEI is inextricably linked to the PG 
XSL conversion scripts. If you're not happy with the output of those 
scripts, there is no purpose in using the PGTEI extensions to TEI.

Thus, what Mr. Starner is complaining about is /not/ TEI, or the way TEI 
has been used to encode a particular e-book. He's complaining about the 
way Mr. Perathoner's script generates the title page from the given 
metadata.

> > Also there is confusion between the ebook title page, which
> > contains PG metadata and PG edition and publication date, and the
> > title page of the paper book. IMO they are different entites. You
> > can provide one or both of them, but at the discretion of the PPer.
> >
>
>  Hmmm, isn't there a way to provide both in the same TEI document?

Sure. Add complete metadata to the <teiHeader> and create a title page 
using <titlePage>. If you're displaying the document using CSS, nothing 
in the <teiHeader> will be displayed (presumably, if you're using a good 
style sheet), nor will the <divGen> element (although it's contents, if 
any, will be - although the only contents allowed in a <divGen> is a 
<head>). Then, when you write a script to do some other transformation 
(which is about the only way you can get anything useful out of a 
<divGen> element), you add a conditional to suppress the <titlePage> 
element if a <divGen type="titlepage"> is present, or alternatively to 
only generate a title page if a <titlePage> is /not/ present.

You know, in XML order of child elements is (typically) not important. 
Just to make things a little more forgiving for those who /don't/ have a 
good CSS style sheet for TEI, I think the standard ought to be to put 
the <teiHeader> /after/ the <text> element instead of before.


From prosfilaes at gmail.com  Mon Oct  8 20:10:07 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 8 Oct 2007 23:10:07 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AC0E7.1010208@novomail.net>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<470AC0E7.1010208@novomail.net>
Message-ID: <6d99d1fd0710082010o31dd45e6xf48306ba89e4a9ea@mail.gmail.com>

On 10/8/07, Lee Passey <lee at novomail.net> wrote:
> Thus,
> Mr. Starner needn't explain why the PG TEI-to-HTML XSL script generates
> a title page that "looks like crap," nor need he explain what needs to
> be done to fix it; virtually everyone in the world shares his artistic
> sensibilities, so all you should have to do is look at it to understand.

Or, rather, Lee Passey doesn't need to read Mr. Starner's messages,
the one's where he points out that the format of title pages is
virtually unanimous across time and space, and where Mr. Starner
complains that titles are usually larger than anything else in the
book, that the author is correspondingly large, and that the whole
bulk of material is generally centered.

> those who favor semantic markup tend to be more
> far-sighted and less ego-centric than those caught up in presentation,

I.E. "people who agree with me tend to be morally superior to those
who don't." Besides the fact that that is the most questionable type
of conclusion to be drawn, given that it's completely self-serving and
blinding to any positive aspects of the other side, most people feel
it's a bit of a personal attack to be called short-sighted and
ego-centric and don't really want to work with you when you toss that
type of phrasing around.

From jon at noring.name  Mon Oct  8 21:56:55 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 8 Oct 2007 22:56:55 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AD81B.6060301@novomail.net>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
	<589228901.20071008142419@noring.name> <470AA4A7.4010908@perathoner.de>
	<1414923511.20071008175657@noring.name> <470AD81B.6060301@novomail.net>
Message-ID: <1649568367.20071008225655@noring.name>

Lee Passey wrote:
> Jon Noring wrote:
>> Marcello wrote:
>>> Jon Noring wrote:

>>>> Can you give us one or two examples of TEI metadata that leads to
>>>> title pages leading to complaints?

>>> The complaint so far is that the title page doesn't use the font
>>> sizes and text alignment the plaintiff likes best.

>> Hmmm, that doesn't seem to be related to the issue of drawing the
>> title page information from the metadata.

> But it is.

Er, ok.

But before commenting further, I just had to keep all the prior
comments in this thread to get five levels of comments. <lol/>


> The <teiHeader> element is used for all the metadata about a book. You
> should never see /any/ of the data from the <teiHeader> when displaying
> the book, which is why in my CSS for TEI I have marked the <teiHeader>
> as "display:none". If you want a displayable title page in your e-book
> (particularly if you want to control its presentation), one way is to 
> create a <titlePage> element in the <front> and construct your title 
> page there. This is not the common way it is done in PGTEI, however.

O.k.

If the purpose of the TEI document is to actually be read by end-users
in a web browser with no "plugin", then I agree with Lee that the
title page must be built into the document in the <titlePage>. (Of
course, if the TEI document is to be used in this end-user manner,
then other restrictions probably have to also be established, such as
no inline <note>, limitations on the TEI table elements used, and
possibly a couple others to make the TEI as XHTML/CSS-compatible as
possible. And note the problems with embedding images and enabling
hypertext links, too.)

But if the purpose of the TEI document is solely as a "master" for
script conversion to readable formats, then my current thinking is
that <titlePage> is redundant, and derivative formats would use the
data in the <teiHeader> to build optimized title pages for the target
platform, as Marcello says he does.

And certainly for CSS "visualization" of the TEI master during the
document authoring process, the metadata in <teiHeader> may certainly
be displayed -- unless God Almighty herself disallows it, it is sort
of arbitrary when it comes to visualization during the document
editing process.


> In TEI there is an element called <divGen>. In essence, it is a function
> call instruction; it says, "at this point in the text, generate this 
> 'type' of a <div> element." According to the draft P5 specification, 
> "This element is intended primarily for use in document production or 
> manipulation, rather than in the transcription of pre-existing 
> materials; it makes it easier to specify the location of indices, tables
> of contents, etc., to be generated by text preparation or word 
> processing software." Most PGTEI texts do not contain a transcribed 
> <titlePage>, instead they contain a <divGen type="titlepage"> which the
> PG XSL script interprets as an instruction to generate a standard title
> page from pieces of the <teiHeader> data.

It still seems to me that for using TEI solely as a master, <divGen>
is not needed for generating title pages and navigational lists, since
the best "locations" for thesea are heavily platform/format dependent.

I'm still intrigued in using Digital Talking Book's NCX for describing
the navigational lists of each work, but I can see the alternative
where the necessary nav-list information is encoded at each target
point location, and optimized nav-lists built for each target platform
using that information. (For OPS Publications, an NCX would thus be
built -- note that specifying hierarchical level is important in NCX
Table of Contents, the reasons of which I won't go into here.)


> Thus, what Mr. Starner is complaining about is /not/ TEI, or the way TEI
> has been used to encode a particular e-book. He's complaining about the
> way Mr. Perathoner's script generates the title page from the given 
> metadata.

Hmmm, are you saying the complaints are because:

1) the title page is generated from the metadata. Period. Or

2) the title page markup generated by Marcello's script is somehow not
   right?

If the first, then I'd like to know why. If the second, what is the
fix to Marcello's script to make "better" title pages?


>>> Also there is confusion between the ebook title page, which
>>> contains PG metadata and PG edition and publication date, and the
>>> title page of the paper book. IMO they are different entites. You
>>> can provide one or both of them, but at the discretion of the PPer.

>> Hmmm, isn't there a way to provide both in the same TEI document?

> Sure. Add complete metadata to the <teiHeader> and create a title page
> using <titlePage>. If you're displaying the document using CSS, nothing
> in the <teiHeader> will be displayed (presumably, if you're using a good
> style sheet), nor will the <divGen> element (although it's contents, if
> any, will be - although the only contents allowed in a <divGen> is a 
> <head>). Then, when you write a script to do some other transformation
> (which is about the only way you can get anything useful out of a 
> <divGen> element), you add a conditional to suppress the <titlePage> 
> element if a <divGen type="titlepage"> is present, or alternatively to
> only generate a title page if a <titlePage> is /not/ present.

O.k. I had assumed from the prior comment that the metadata in the
<teiHeader> cannot easily simultaneously contain:

1) Work/Expression metadata,

2) Source book (Manifestation) metadata, and

3) PG-related metadata.

But I assume by your comment, Lee, that indeed all three types of
metadata can coexist in <teiHeader> and be unambiguously identified
for what they are by scripts. I've not yet closely studied the
metadata facility in TEI.

Btw, I *love* the OEBPS/OPS method for identifying the role played by
both creators and contributors of a given work. This provides very
useful information when generating title pages from the metadata since
we now know what role each creator placed in producing the book (e.g.,
be able to differentiate between author, illustrator, translator,
etc.)

Can the OEBPS/OPS "role" be implemented in TEI?


> You know, in XML order of child elements is (typically) not important.

It depends upon the content model in the DTD. For example, if wanted,
one could build a TEI-subset DTD where order is important of sibling
elements.

There are times when flexibility of order are important, but there are
other times when fixing order is important. It is a case-by-case sort
of thing so as to meet specific requirements.


> Just to make things a little more forgiving for those who /don't/ have a
> good CSS style sheet for TEI, I think the standard ought to be to put 
> the <teiHeader> /after/ the <text> element instead of before.

I assume in the full blow TEI DTD that <teiHeader> may appear after
<text>?


Thanks, Lee, for clarifying several things.

Jon


From ralf at ark.in-berlin.de  Tue Oct  9 01:09:50 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Tue, 9 Oct 2007 10:09:50 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AA264.2070908@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
	<20071008070835.GA27881@ark.in-berlin.de>
	<4709F708.20604@perathoner.de>
	<20071008093652.GA24464@ark.in-berlin.de>
	<470AA264.2070908@perathoner.de>
Message-ID: <20071009080950.GC27456@ark.in-berlin.de>

You wrote 
> internals. (TeX is not unicode compatible. Version 0.5 will probably not
> use TeX any more if they don't implement full unicode support by then.)

pango/cairo has a Unicode PS backend which should be mature enough.


ralf


From ralf at ark.in-berlin.de  Tue Oct  9 00:55:31 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Tue, 9 Oct 2007 09:55:31 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AA264.2070908@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
	<20071008070835.GA27881@ark.in-berlin.de>
	<4709F708.20604@perathoner.de>
	<20071008093652.GA24464@ark.in-berlin.de>
	<470AA264.2070908@perathoner.de>
Message-ID: <20071009075531.GA27456@ark.in-berlin.de>

> You are talking about a bug you reported in the page headers in the PDF
> output. I'm trying to fix it, but I have to work around some LaTeX
> internals. (TeX is not unicode compatible. Version 0.5 will probably not
> use TeX any more if they don't implement full unicode support by then.)
> 
> I can put in a quick work around so the book title doesn't automatically
> get assigned to the PDF left page headers. But that will change existing
> PDFs.

Your decision. It would remove garbled running headers like in
http://www.gutenberg.org/files/19239/19239-pdf.zip


ralf


From marcello at perathoner.de  Tue Oct  9 03:11:05 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 09 Oct 2007 12:11:05 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AC0E7.1010208@novomail.net>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<470AC0E7.1010208@novomail.net>
Message-ID: <470B53B9.9050804@perathoner.de>

Lee Passey wrote:

> Because semantic markup is much more powerful than presentational 
> markup, and because those who favor semantic markup tend to be more 
> far-sighted and less ego-centric than those caught up in presentation, I 
> believe it is possible, with little additional effort, for those in camp 
> #2 to produce a product that will satisfy those in camp #1.

If the "semantic camp" has done a book in TEI, the "presentational camp"
can easily augment the markup to make it "look" much like the original
paper copy.

In TEI all presentational markup is confined to the "rend" attribute,
which can be attached to any element without changing the semantic
structure of the text.

I suggest introducing "semantic" and "presentational" rounds at DP. They
are nearly orthogonal and it should be very seldom necessary to change
the tag structure to accomodate presentational-oriented refinements.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From marcello at perathoner.de  Tue Oct  9 03:20:16 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 09 Oct 2007 12:20:16 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470AD81B.6060301@novomail.net>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>	<589228901.20071008142419@noring.name>	<470AA4A7.4010908@perathoner.de>	<1414923511.20071008175657@noring.name>
	<470AD81B.6060301@novomail.net>
Message-ID: <470B55E0.3010409@perathoner.de>

Lee Passey wrote:

> You know, in XML order of child elements is (typically) not important. 
> Just to make things a little more forgiving for those who /don't/ have a 
> good CSS style sheet for TEI, I think the standard ought to be to put 
> the <teiHeader> /after/ the <text> element instead of before.

The TEI DTD prescribes that the <teiHeader> must come before the <text>.
There's nothing we can do about that.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Tue Oct  9 07:08:19 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 9 Oct 2007 08:08:19 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <470B53B9.9050804@perathoner.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<470AC0E7.1010208@novomail.net> <470B53B9.9050804@perathoner.de>
Message-ID: <727840511.20071009080819@noring.name>

Marcello wrote:
> Lee Passey wrote:

>> Because semantic markup is much more powerful than presentational 
>> markup, and because those who favor semantic markup tend to be more 
>> far-sighted and less ego-centric than those caught up in presentation, I 
>> believe it is possible, with little additional effort, for those in camp 
>> #2 to produce a product that will satisfy those in camp #1.

> I suggest introducing "semantic" and "presentational" rounds at DP. They
> are nearly orthogonal and it should be very seldom necessary to change
> the tag structure to accomodate presentational-oriented refinements.

This is similar to my suggestion that DP "officially" separate when
the time is right into the "mastering" and the "facsimile" groups.

Jon Noring


From lee at novomail.net  Tue Oct  9 11:40:22 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 09 Oct 2007 12:40:22 -0600
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <1649568367.20071008225655@noring.name>
References: <2835006.1191873904270.JavaMail.?@fh1036.dia.cp.net>
	<589228901.20071008142419@noring.name>
	<470AA4A7.4010908@perathoner.de>
	<1414923511.20071008175657@noring.name>
	<470AD81B.6060301@novomail.net>
	<1649568367.20071008225655@noring.name>
Message-ID: <470BCB16.3010306@novomail.net>

Jon Noring wrote:

[snip]

> It still seems to me that for using TEI solely as a master, <divGen>
> is not needed for generating title pages and navigational lists, since
> the best "locations" for these are heavily platform/format dependent.

I tend to agree. If one is generating a title page (or table of 
contents, or list of whatever) via a script, the script is probably in 
the best position to decide where it should go. Having a command to 
"generate a title page, and put it here" embedded in the file is 
probably unnecessary. I suspect it's mostly a result of the script 
author encountering the <divGen> element and saying, "this is cool, 
let's see if we can make it work."

The <divGen> element is, however, interesting evidence of TEI's somewhat 
schizophrenic nature. This schizophrenia is a result of the fact that 
TEI can be used to transcribe existing works, as well as to encode new, 
never before published works.

As mentioned earlier, the P5 draft specification states that "[the 
divGen] element is intended primarily for use in document production or 
manipulation, rather than in the transcription of pre-existing 
materials; it makes it easier to specify the location of indices, tables 
of contents, etc., to be generated by text preparation or word 
processing software."

It appears to me that use of the <divGen> element in PG texts is 
probably inappropriate; if I'm transcribing an existing work I would 
probably want to include a <titlePage> element in the <front> section, 
placed where the title page occurred in the existing work, and 
containing all the data included on that existing title page, and 
nothing more. After all, it's a transcription.

On the other hand, if I'm writing "TEI for Dummies," it might make sense 
to use the <divGen> (only inside a <front> element) to say, "generate a 
title page for me here following corporate standards." In this case, 
it's part of a document production work flow, not a transcription.

This same schizophrenia frequently manifests itself in the use of the 
"rend" attribute. In the case of my "For Dummies" book, if I use the 
element <hi rend="italic"> I mean "I don't know why, but when this 
phrase gets rendered, it should be rendered in an italic font." On the 
other hand, if I use the same element in a transcription of an existing 
book I mean "I can't figure out why this phrase was rendered in an 
italic font, but it was."

Hopefully you can see the distinction; in one case it indicates how it 
was done in the past, and in the other it indicates how it should be 
done in the future. The two cases are not necessarily equivalent. 
Personally, for PG's purposes I would think the focus should be on 
transcription (how it was) and not document production (how it should be).

[snip]

> Hmmm, are you saying the complaints are because:
> 
> 1) the title page is generated from the metadata. Period. Or
> 
> 2) the title page markup generated by Marcello's script is somehow not
>    right?
> 
> If the first, then I'd like to know why. If the second, what is the
> fix to Marcello's script to make "better" title pages?

The second. Although I wouldn't say that the complaint is that the title 
page markup is not right, but rather that it produces a result which is 
aesthetically unpleasing to the vast majority of consumers (i.e. me).

XSL is a fairly complex and esoteric scripting language. I've glanced at 
Mr. Perathoner's conversion script, but nowhere near closely enough to 
know where the code is that generates the title page. But if one knows 
XSL programming, one could certainly take the script and modify it to 
generate the kind of title page one prefers.

If someone wanted Mr. Perathoner to do the work for him, he would 
probably have to generate sample title pages in HTML and PDF, present 
them to Mr. Perathoner, and then convince him that the samples are 
superior to the current output and that he should modify the scripts to 
generate output similar to the sample markup.

[snip]

> O.k. I had assumed from the prior comment that the metadata in the
> <teiHeader> cannot easily simultaneously contain:
> 
> 1) Work/Expression metadata,
> 
> 2) Source book (Manifestation) metadata, and
> 
> 3) PG-related metadata.
> 
> But I assume by your comment, Lee, that indeed all three types of
> metadata can coexist in <teiHeader> and be unambiguously identified
> for what they are by scripts. I've not yet closely studied the
> metadata facility in TEI.

Yep. In fact, I think that TEI support is probably much superior to 
Dublin Core. A <teiHeader> can contain a <fileDesc> which "provides a 
title and statements of responsibility together with details of the 
publication or distribution of the file, of any series to which it 
belongs, and detailed bibliographic notes for matters not addressed 
elsewhere in the header. It also contains a full bibliographic 
description for the source or sources from which the electronic text was 
derived." It also can contain a <encodingDesc> element which "documents 
the relationship between an electronic text and the source or sources 
from which it was derived."

All in all, the <teiHeader> information is very powerful. On the other 
hand, IIRC the <divGen> process gets its information from certain fields 
in the <teiHeader> with no fallback to other fields if the preferred 
fields are absent. Thus, you may cram all sorts of good data into the 
<teiHeader> and still end up with a sparsely generated title page. It 
would probably be a good thing if we had some documentation that says, 
in essence, "the PG XSL script generates a title page from information 
found in these elements; if you want to have a complete title page you 
must include data in those elements."

[snip]

> Can the OEBPS/OPS "role" be implemented in TEI?

Yes. See <respStmt> 
(http://www.tei-c.org/release/doc/tei-p5-doc/html/CO.html#COBICOR).

>> You know, in XML order of child elements is (typically) not important.
> 
> It depends upon the content model in the DTD. For example, if wanted,
> one could build a TEI-subset DTD where order is important of sibling
> elements.
> 
> There are times when flexibility of order are important, but there are
> other times when fixing order is important. It is a case-by-case sort
> of thing so as to meet specific requirements.

True, which is why I added the qualifier "typically." I have rarely 
encountered a DTD where element order /is/ important, but they do exist, 
and, in fact, the TEI DTD is one of them (the notion that you can't have 
a <div> after a <p> still boggles my mind).

>> Just to make things a little more forgiving for those who /don't/ have a
>> good CSS style sheet for TEI, I think the standard ought to be to put 
>> the <teiHeader> /after/ the <text> element instead of before.
> 
> I assume in the full blow TEI DTD that <teiHeader> may appear after
> <text>?

Relying on Mr. Perathoner's message, apparently not, although I cannot 
imagine any reason why it shouldn't. It's probably an oversight in the 
creation of the TEI DTD, and someone ought to suggest to the TEI-C that 
the restriction ought to be removed.

-- 
Nothing of significance below this line.


From Bowerbird at aol.com  Tue Oct  9 14:49:58 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 9 Oct 2007 17:49:58 EDT
Subject: [gutvol-d] speaking of title-pages
Message-ID: <d6a.e9c34e9.343d5186@aol.com>

speaking of title-pages...

i'm finding that editing the title-pages is
one of the more interesting aspects of the
conversion from pg-ascii to z.m.l. format...

it's fun to make 'em look nice.

anyway, i've produced an offline application
to grab the first chunk of text from an e-text
so that i can do the edits and previews offline,
and then the app sends it back up to the site...

if anyone wants to try it out to see if they would
also enjoy doing this -- only if you will enjoy it,
i'm not looking for "volunteers" to do "work" --
backchannel me and i will send you the program
and some general instructions how to do the edits.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071009/4f34b1fd/attachment.htm 

From jon at noring.name  Tue Oct  9 17:06:27 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 9 Oct 2007 18:06:27 -0600
Subject: [gutvol-d] speaking of title-pages
In-Reply-To: <d6a.e9c34e9.343d5186@aol.com>
References: <d6a.e9c34e9.343d5186@aol.com>
Message-ID: <926033296.20071009180627@noring.name>

Bowerbird wrote:

>  i'm finding that editing the title-pages is
>  one of the more interesting aspects of the
>  conversion from pg-ascii to z.m.l. format...
>  
>  it's fun to make 'em look nice.

Yes, I agree. The title page is something where the publisher can do
some very creative things for both branding and improving the quality
of the reading experience.

One suggestion to PG and DP is to consider coming up with a "standard"
title page template, which can be quite elaborate and/or ornate. All
that's needed is to fill the fields with the title page information.

This would be used for XHTML versions of the texts, and other higher
typographic resolution formats like PDF. Even SVG and raster image
versions could be built. The "master" for the title page could be
SVG, and that's of real interest since a lot can be done with that
(more than XHTML+CSS), such as incorporation into PDF, conversion to
raster graphics, and direct rendering on SVG-aware browsers.

The folk at DP have looked at enough elaborate and ornate title pages
that someone there with an artistic flair could come up with a pretty
good title page template based on either one particular book, or a
"composite" from a number of title pages.

Jon Noring


From marcello at perathoner.de  Wed Oct 10 03:53:21 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 10 Oct 2007 12:53:21 +0200
Subject: [gutvol-d] speaking of title-pages
In-Reply-To: <926033296.20071009180627@noring.name>
References: <d6a.e9c34e9.343d5186@aol.com>
	<926033296.20071009180627@noring.name>
Message-ID: <470CAF21.5090709@perathoner.de>

Jon Noring wrote:

> Yes, I agree. The title page is something where the publisher can do
> some very creative things for both branding and improving the quality
> of the reading experience.

Usually the "creative" branding is done on the book cover and the actual
title page is quite dull.

The title page, being somewhere in no man's land between four-color
front cover and where the jumble of words actually starts, is probably
the most never-looked-at page in the whole book (bar the empty ones).

Contents, yes. People do sometimes look at the contents page. Index
also. People sometimes use the index. But title page? What title page? I
bet 9 people out of 10, if you give them a book, can't even tell you
where the title page *is*.

Until recently the book cover was not provided by the book publisher at
all, but by the book binder, and was selected to match the cover of all
the other books in your library.

Also we must here consider that cover art may have different copyright
status than book contents, especially in live + x countries.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Wed Oct 10 07:29:26 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 10 Oct 2007 08:29:26 -0600
Subject: [gutvol-d] speaking of title-pages
In-Reply-To: <470CAF21.5090709@perathoner.de>
References: <d6a.e9c34e9.343d5186@aol.com>
	<926033296.20071009180627@noring.name> <470CAF21.5090709@perathoner.de>
Message-ID: <832724833.20071010082926@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> Yes, I agree. The title page is something where the publisher can do
>> some very creative things for both branding and improving the quality
>> of the reading experience.

> Usually the "creative" branding is done on the book cover and the actual
> title page is quite dull.

For paper books, yes, most title pages are rather dull.


> The title page, being somewhere in no man's land between four-color
> front cover and where the jumble of words actually starts, is probably
> the most never-looked-at page in the whole book (bar the empty ones).

For digital versions of public domain books, we no longer have the
"luxury" of having a binding, front cover, etc. So all that's really
left in the lead-up to the actual textual content is the title page.

(Some public domain books did have something like a front cover
graphic, but most did not. I somehow doubt PG will create new front
covers for every etext that didn't have one in the original paper
edition.)

Since Greg Newby has stated many times that he considers PG a
publisher with its own brand name ("Project Gutenberg"), then the PG
and DP folk might consider coming up with a flashy title page design
that may be used for those digital rendition formats that don't
already include their own built-in title page mechanism. SVG is an
intriguing "mastering" format for such title pages. And certainly the
title page may use color -- RGB digital ink costs the same as digital
black ink. <lol/>

Jon Noring


From ralf at ark.in-berlin.de  Wed Oct 10 02:45:16 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Wed, 10 Oct 2007 11:45:16 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <20071009075531.GA27456@ark.in-berlin.de>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>
	<6d99d1fd0710051746r5cf61006v9e83091543fcb830@mail.gmail.com>
	<4707A109.5070000@perathoner.de>
	<20071008070835.GA27881@ark.in-berlin.de>
	<4709F708.20604@perathoner.de>
	<20071008093652.GA24464@ark.in-berlin.de>
	<470AA264.2070908@perathoner.de>
	<20071009075531.GA27456@ark.in-berlin.de>
Message-ID: <20071010094516.GA29264@ark.in-berlin.de>

> > I can put in a quick work around so the book title doesn't automatically
> > get assigned to the PDF left page headers. But that will change existing
> > PDFs.
> 
> Your decision. It would remove garbled running headers like in
> http://www.gutenberg.org/files/19239/19239-pdf.zip

Also, it would work around the problem of overly long titles
garbling running headers as for example in
http://www.gutenberg.org/files/22492/22492-pdf.zip

So I'm all for it, at the moment.


ralf


From Bowerbird at aol.com  Wed Oct 10 11:07:17 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 10 Oct 2007 14:07:17 EDT
Subject: [gutvol-d] speaking of title-pages
Message-ID: <bc8.1b1aecbe.343e6ed5@aol.com>

people want to know what i mean about the title-pages...

here's a very good example, with a full explanation below it:
>    http://snowy.arsc.alaska.edu/bowerbird/misc/screen1.html
(wow, that work was from june of 2006!   seems like yesterday.)

the idea is to edit your title-page -text so the zml-viewer
knows how to copy-fit it to the screen in the same manner
that a relatively decent typographer would set it for a page.

(the main difference being that the zml-viewer must do this
for a _wide_ variety of screen-sizes, which is why i described
the task as "copy-fitting", a term that -- though it fits well --
is probably rarely applied to the setting of a print title-page.)

here's another one:
>    http://www.z-m-l.com/go/rieger/oya-cover.html
the type-size on the title of that one seems big to me, but...

and here's some more:
>    http://www.z-m-l.com/go/mabie/mabiec001.html
>    http://www.z-m-l.com/go/mabie/mabiec002.html

those last two are reworked versions of the page-scans --
>    http://snowy.arsc.alaska.edu/bowerbird/bachwm/bachwmp001.png
>    http://snowy.arsc.alaska.edu/bowerbird/bachwm/bachwmp004.png
>    http://snowy.arsc.alaska.edu/bowerbird/bachwm/bachwmp005.png
-- but they're a relatively good example of what i aim for.
the psychedelia makes it fairly clear this is a _new_ page...

the "title-page" in a z.m.l. file is a cross between a "cover"
in the traditional sense and a typical p-book "title-page"...

it is the first thing that people see, and -- as it says in the
"zandbox" manual -- the first text on it must be the title:
>?? http://z-m-l.com/go/zandbox_manual.zml
it's a reaction against the p.g. header, which -- to my mind --
fails its main mission, i.e., to inform readers what this file _is_.

because i feel free to discard the publisher information that
is typically found at the bottom of the title-page, i can often
use the cover as the "official" page-scan for my title-page:
>   http://z-m-l.com/go/myant/myantc001.html
>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy001.jpg
>    http://snowy.arsc.alaska.edu/bowerbird/sgfhb/sgfhbc001.jpg

but i'll include relevant text from the title-page (e.g., on illustrators):
>    http://www.z-m-l.com/go/myant/myantf003.png
>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy003.png

more than just title-pages, the task is on all frontmatter pages.

section 2 in a .zml file is reserved for the hotlinked table of contents:
>    http://www.z-m-l.com/go/mabie/mabiec002.html

for an example of how i edit the table of contents page,
compare this text as it appeared originally in the p.g. e-text:
>    http://snowy.arsc.alaska.edu/bowerbird/misc/screen2before.jpg
with the edited version that would appear in the .zml file:
>    http://snowy.arsc.alaska.edu/bowerbird/misc/screen2after.jpg
which will then be displayed in the zml-viewer like this:
>    http://snowy.arsc.alaska.edu/bowerbird/misc/screen2.jpg
where, of course, the entries will be hotlinked to their chapters.
(the look of this page compares well with the original, and also
corresponds to the form it takes in the .html versions from d.p.)

as this last series demonstrates, i feel very little compulsion to
"match the original".   people can look at the page-scan for that.
what i'm doing is creating an "in-house style" such that _all_ of
the books in my library have a consistent look-and-feel to them.

if the original book had a frontispiece, that would go in section 3.
>    http://snowy.arsc.alaska.edu/bowerbird/betsy/betsy002.jpg

plus, to preserve the flavor of the p-book's original facing-spread
if a book had a frontispiece, i'll include the title-page as section 4.
(sections 3 and 4 might "be" pages 1 and 2, i.e., p001 and p002,
in the case that sections 1 and 2 were named as c001 and c002.
the renaming/renumbering of frontmatter pages is case-by-case,
depending on frontmatter idiosyncrasies of the original p-book.)

it goes on for other frontmatter pages.   a dedication page has
its text centered and displayed in the upper third of the screen.

at one time, i even did the programming to "properly" format the
typical p-book verso with the library of congress cataloging info.
(if memory serves correctly, i did that for lawrence lessig's book.)

anyway...

to boil it all down again, it's just a matter of proper copy-fitting...

-bowerbird

p.s.   it's kind of cute that, in the recent pg-ascii files out of d.p.,
all of the title-page information is actually "centered" in the text.
that is, they've used leading spaces to create a "centered" look.
(of course, when you have a 23-inch cinema-screen like i do,
and the window is expanded to the size of the screen, the text
isn't really "centered" at all.   but that's kind of beside the point.)
i was wondering why i was getting funny results, when i had my
viewer center the text, and discovered it was the leading spaces.
no problem, they're easy enough to have the ap get rid of 'em...


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071010/f9086d36/attachment.htm 

From jon at noring.name  Wed Oct 10 12:23:34 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 10 Oct 2007 13:23:34 -0600
Subject: [gutvol-d] speaking of title-pages
In-Reply-To: <bc8.1b1aecbe.343e6ed5@aol.com>
References: <bc8.1b1aecbe.343e6ed5@aol.com>
Message-ID: <477696541.20071010132334@noring.name>

Bowerbird wrote:

>  as this last series demonstrates, i feel very little compulsion to
>  "match the original".?  people can look at the page-scan for that.
>  what i'm doing is creating an "in-house style" such that _all_ of
>  the books in my library have a consistent look-and-feel to them.

This is a good way to put it.

So long as PG/DP considers the work product they produce to be
"brandable" (e.g., PG considers itself a publisher), then it makes
sense that a consistent and brandable title page be generated for
most, if not all, digital renditions which are put online.

But then, maybe this consistency is something PG does now want since
then it becomes a sort of "requirement" which stifles individuality.
But at least some in DP, when they produce an XHTML rendition, might
consider doing this...

Jon Noring


From piggy at netronome.com  Thu Oct 11 10:02:24 2007
From: piggy at netronome.com (La Monte Henry Piggy Yarroll)
Date: Thu, 11 Oct 2007 13:02:24 -0400
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to add OpenDocument as an additional
In-Reply-To: <200710051727.42671.rolsch@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
Message-ID: <470E5720.6030207@netronome.com>

Roland Schlenker wrote:
> On Thursday 04 October 2007 6:20 pm, Lee Passey wrote:
>   
>> Jeroen Hellingman (Mailing List Account) wrote:
>>     
>>> I think the biggest barrier here is the steep learning curve of TEI (20%
>>> of the tags cover 80% of the things you encounter, but every other book
>>> you will need something from those remaining 80%, and, oh gosh, which
>>> tag can I use then) ....
>>>       
>> I am intrigued by this comment (and not only because it mirrors my own
>> experience). So by way of information gathering among those who use TEI
>> on a regular basis, I would you to tell me, perhaps simply as an ordered
>> list, what TEI tags you believe are most used and most valuable (not
>> necessarily the same thing). In other words, what are the 20% of the
>> tags that cover 80% of the need, and from the remaining 80% what seems
>> to come up the most often?
>>
>> I'm thinking of writing a little script that will try to automate the
>> collection of usage data from current Gutenberg TEI texts.
>>     
>
> >From my lastest project, Marcia Schuyler, by Grace Livingston Hill Lutz:
>
> <p>                  - 1687
> <q>                  - 934
> <anchor>             - 434
> <pb>                 - 358
> <hi>                 - 204
>   

Thanks for the lists!

I'm particularly interested in the typical range of values used for the 
rend attribute on <hi>. I've added the (I think) fictional 
rend="gesperrt" to the book I'm working on. At some point, I'll have to 
figure out the right way to do that.

The biggest problem I've had getting started with TEI is the staggering 
plethora of documentation. There doesn't seem to be a strong consensus 
on best documentation or best tools. I didn't even notice "The Guide to 
PGTEI" until I'd been playing with TEI for a couple weeks.

A good solution for me would be to have a few of the TEI experts start 
hanging out on #pgdp. If there's a better place to find live humans to 
ask questions of, I'd love to hear about it. This list appears to be the 
best place I've found so far and I'm a little uncomfortable with asking 
questions that I OUGHT to be able to extract from that great pile of 
documentation.

Well, back to the firehose for another sip...

> <item>               - 118
> <ref>                - 76
> <lb/>                - 71
> <index>              - 63
> <div>                - 41
> <list>               - 39
> <corr>               - 38
> <head>               - 37
> <milestone>          - 27
> <l>                  - 25
> <quote>              - 8
> <lg>                 - 5
> <divGen>             - 5
> <figure>             - 4
> <figDesc>            - 4
> <title>              - 3
> <name>               - 3
> <date>               - 3
> <publisher>          - 2
> <idno>               - 2
> <classCode>          - 2
> <bibl>               - 2
> <author>             - 2
> <titleStmt>          - 1
> <textClass>          - 1
> <text>               - 1
> <teiHeader>          - 1
> <taxonomy>           - 1
> <sourceDesc>         - 1
> <revisionDesc>       - 1
> <respStmt>           - 1
> <publicationStmt>    - 1
> <pubPlace>           - 1
> <projectDesc>        - 1
> <profileDesc>        - 1
> <language>           - 1
> <langUsage>          - 1
> <keywords>           - 1
> <imprint>            - 1
> <front>              - 1
> <fileDesc>           - 1
> <encodingDesc>       - 1
> <editorialDecl>      - 1
> <editionStmt>        - 1
> <edition>            - 1
> <classDecl>          - 1
> <change>             - 1
> <body>               - 1
> <back>               - 1
> <availability>       - 1
> <TEI.2>              - 1
>   


From jon at noring.name  Thu Oct 11 11:23:21 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 11 Oct 2007 12:23:21 -0600
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
	to add OpenDocument as an additional
In-Reply-To: <470E5720.6030207@netronome.com>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
Message-ID: <1354905005.20071011122321@noring.name>

La Monte Henry Piggy Yarroll wrote:

> The biggest problem I've had getting started with TEI is the staggering
> plethora of documentation. There doesn't seem to be a strong consensus
> on best documentation or best tools. I didn't even notice "The Guide to
> PGTEI" until I'd been playing with TEI for a couple weeks.
>
> A good solution for me would be to have a few of the TEI experts start
> hanging out on #pgdp. If there's a better place to find live humans to
> ask questions of, I'd love to hear about it. This list appears to be the
> best place I've found so far and I'm a little uncomfortable with asking
> questions that I OUGHT to be able to extract from that great pile of 
> documentation.

An approach which I previously discussed is based on the recognition
that maybe 80% (a guesstimate for now) of all the books are quite
simple in overall structure, and the TEI subset to properly add document
structure and inline text semantics for these books is pretty small and
manageable by most anyone with familiarity in marking up documents.

This brings up the possibility of PG/DP coming up with such a basic TEI
subset (and associated DTD and usage ruleset) for marking up books, along
with a "usage manual". Those 20% (or so) of books which are more complex
would then be turned over to those familiar with the full TEI vocabulary.

I previously gave a few other requirements I think we should have in
using TEI this way, but won't repeat them in this message.

Jon Noring


From Bowerbird at aol.com  Thu Oct 11 11:39:05 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 11 Oct 2007 14:39:05 EDT
Subject: [gutvol-d] here's another one
Message-ID: <c4b.1f2b7fef.343fc7c9@aol.com>

here's another form of light-markup, asciidoc:
>    http://www.methods.co.nz/asciidoc/

from the home-page:
>    AsciiDoc is a text document format for writing short documents, 
>    articles, books and UNIX man pages.
>    ...
>    The asciidoc(1) command translates AsciiDoc files to HTML, XHTML 
>    and DocBook markups. DocBook can be post-processed to 
>    presentation formats such as HTML, PDF, roff, and Postscript 
>    using readily available Open Source tools.

asciidoc seems to be very well developed,
compared to many light-markup systems.

asciidoc does music (lilypond and abc) 
and math (asciimathml and latexmathml).

among an interesting mix of projects using asciidoc is
the linux kernel git source code management system...

***

here's the asciidoc user-guide web-page:
>    http://www.methods.co.nz/asciidoc/userguide.html

the inventor of asciidoc was using docbook before, but said:
>    But DocBook is a complex language, the marked up text is 
>    difficult to read and even more difficult to write directly --
>    I found I was spending more time typing markup tags, 
>    consulting reference manuals and fixing syntax errors, 
>    than I was writing...

i'd say that about sums it up...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071011/9448394e/attachment.htm 

From jon at noring.name  Thu Oct 11 13:15:08 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 11 Oct 2007 14:15:08 -0600
Subject: [gutvol-d] here's another one
In-Reply-To: <c4b.1f2b7fef.343fc7c9@aol.com>
References: <c4b.1f2b7fef.343fc7c9@aol.com>
Message-ID: <1899484476.20071011141508@noring.name>

Bowerbird wrote:

> here's another form of light-markup, asciidoc:
>
> [snip]
>
>  i'd say that about sums it up...


I do believe that ZML needs to be given a chance to show its stuff.

So again I ask the PG/DP folk to select a good representative cross-
section of texts, something like 10 or so, from simple to complex,
and ask Bowerbird to put them into ZML.

Then we can analyze them and see if the "ZML markup" is sufficient
to represent these books for PG purposes.

If ZML shows itself sufficient for all of them, then great, we can
discuss where to go from there.

If there are some areas where ZML is deemed deficient, then we can see
if ZML can be tweaked while staying within what Bowerbird deems
"ZMLness" (only he knows what that is.) Then we go from there.

So, PG/DPers. Can you suggest, by number, some PG etexts that
Bowerbird should consider converting to ZML?

Jon Noring


From joshua at hutchinson.net  Thu Oct 11 14:17:28 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Thu, 11 Oct 2007 21:17:28 +0000 (UTC)
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
Message-ID: <5235198.1192137448655.JavaMail.?@fh1037.dia.cp.net>

Ok, I have an issue of the magazine Punch available for comment.  I 
*had* hoped to have a few more, but time is ever a fleeting thing.  :)

http://pglaf.org/~joshua/punch/

There is a version created with the built-in macro for the title page 
and a version with a manually created title page that is properly 
centered.  The txt versions are also there.

The remainder of each version is the same.  Only the titlepages have 
been changed.

To keep the discussion on topic, I ask that you refrain from comments 
on the markup used (yes, I know I went lazy and marked up the italics 
with <hi> instead of <emph>, etc).  I want to know what people think of 
the built-in macro verses the manually created title page and what 
could be improved on each.

Thanks,
Josh

>----Original Message----
>From: joshua at hutchinson.net
>Date: Oct 8, 2007 14:50 
>To: <gutvol-d at lists.pglaf.org>
>Subj: Re: [gutvol-d] Very strict subset of TEI P5 for most PG/DP 
books?
>
>Ok, everyone, David has a point.  A very GOOD point.
>
>Master formats are great and all, but the final output has to look 
>nice or no one will want to use it.
>
>Let's table this part of the discussion for a few days.  I'll try to 
>fix up some examples of nice looking title pages and post them up.  
>Hopefully, we can get some consensus on whether TEI can do the job 
(and 
>some feedback on how hard it was/wasn't from me).
>
>Josh
>
>>----Original Message----
>>From: prosfilaes at gmail.com
>>
>>Great. Then let's see it done. I'm tired of hearing about theory, 
>Jon.
>>I wouldn't have brought it up if we were looking beautiful title
>>pages. But we've all heard huge promise made for things that never
>>worked out. I'm not going to work on TEI, nor will I encourage 
others
>>to, until it actually works right.
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From lee at novomail.net  Thu Oct 11 16:41:48 2007
From: lee at novomail.net (Lee Passey)
Date: Thu, 11 Oct 2007 17:41:48 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470E5720.6030207@netronome.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
Message-ID: <470EB4BC.9090201@novomail.net>

La Monte Henry Piggy Yarroll wrote:

> Roland Schlenker wrote:
>
>> On Thursday 04 October 2007 6:20 pm, Lee Passey wrote:

[snip]

>>> I'm thinking of writing a little script that will try to automate the
>>> collection of usage data from current Gutenberg TEI texts.

[snip]

Yet more grist for the mill:

I wrote a program that downloaded each of the alleged 112 TEI files 
stored at Project Gutenberg. Of these, on three occasions the PG server 
responded with a 404 or 406, leaving a total of 109 files for analysis. 
I loaded each file into a DOM, and then counted all of the elements used 
in the <text> element. As a result of this strategy 1. what is 
identified are only those elements used to transcribe the document, not 
those used to record metadata, and 2. certain elements may be 
under-counted if they are used both in the <teiHeader> element and the 
<text> element. (Analysis of <teiHeader> elements is down the road).

The data has been presented as a table of comma-separated values; it 
should be possible to copy this table and save it as a .csv file which 
can then be opened in any reasonably capable spreadsheet program.

The first line is a header line. The first column counts the total 
number of documents in which the named element appears; the second 
column is how that relates as a percentage of the total number of 
documents scanned. The third column records the total number of uses of 
the named element, and the fourth column is the average number of uses 
for those documents in which the element is used (not the total number 
of documents scanned). The last column is the name of the element.

Total Docs,Percentage, Total Use,Avg per Doc,   Element Name
        109,    100.00,      9928,     91.08,    head
        109,    100.00,      9654,     88.57,    div
        109,    100.00,       623,      5.72,    divGen
        109,    100.00,       109,      1.00,    front
        109,    100.00,       109,      1.00,    body
        109,    100.00,       109,      1.00,    back
        108,     99.08,     87341,    808.71,    p
        106,     97.25,     18293,    172.58,    index
         95,     87.16,     28950,    304.74,    hi
         69,     63.30,      5170,     74.93,    lb
         66,     60.55,      8796,    133.27,    note
         63,     57.80,       175,      2.78,    then
         63,     57.80,       175,      2.78,    pgIf
         63,     57.80,       174,      2.76,    else
         52,     47.71,      9189,    176.71,    pb
         49,     44.95,      8731,    178.18,    anchor
         34,     31.19,       640,     18.82,    figure
         34,     31.19,       640,     18.82,    figDesc
         32,     29.36,     11891,    371.59,    l
         32,     29.36,       899,     28.09,    lg
         30,     27.52,      3898,    129.93,    milestone
         22,     20.18,        33,      1.50,    titlePart
         22,     20.18,        23,      1.05,    docImprint
         22,     20.18,        22,      1.00,    docTitle
         22,     20.18,        22,      1.00,    titlePage
         21,     19.27,        22,      1.05,    docAuthor
         20,     18.35,       165,      8.25,    quote
         20,     18.35,        21,      1.05,    byline
         16,     14.68,      9818,    613.63,    cell
         16,     14.68,      4167,    260.44,    row
         16,     14.68,       313,     19.56,    table
         14,     12.84,      5392,    385.14,    ref
         14,     12.84,        14,      1.00,    docDate
         13,     11.93,      1038,     79.85,    item
         13,     11.93,       177,     13.62,    list
         11,     10.09,       134,     12.18,    corr
          8,      7.34,      7161,    895.13,    q
          7,      6.42,         7,      1.00,    docEdition
          5,      4.59,       559,    111.80,    emph
          4,      3.67,       152,     38.00,    title
          3,      2.75,       494,    164.67,    abbr
          3,      2.75,       212,     70.67,    foreign
          3,      2.75,        81,     27.00,    reg
          3,      2.75,         8,      2.67,    name
          2,      1.83,       671,    335.50,    formula
          2,      1.83,        28,     14.00,    sic
          2,      1.83,        27,     13.50,    bibl
          2,      1.83,        26,     13.00,    author
          2,      1.83,         2,      1.00,    epigraph
          2,      1.83,         2,      1.00,    date
          2,      1.83,         2,      1.00,    add
          2,      1.83,         2,      1.00,    trailer
          1,      0.92,        13,     13.00,    label
          1,      0.92,         8,      8.00,    argument
          1,      0.92,         2,      2.00,    del
          1,      0.92,         1,      1.00,    eg
          1,      0.92,         1,      1.00,    cit

I'm sure this data will reveal some oddities and insights, more than 
just the ones that jump out at me at first blush.

Looking at paragraphs, I'm sure this number is over-inflated because 
there is so much paragraph abuse apparent in PGTEI texts (pretty much 
every block of text is labeled a paragraph, even those which are 
obviously not). I'm sure a little more discretion in the use of the <p> 
element would result in an increase is those elements which are part of 
the <titlePage>.

Regarding paragraphs, one of the oddities is that there is apparently 
one document that doesn't contain a single paragraph! I thought perhaps 
it was the TEI version of the American Declaration of Independence, but 
that proved not to be the case.

I note that 100% of the files contained a <divGen> element (usually used 
to create a table of contents). I also note that 22 of the files have a 
<titlePage> element as well, which means that in some cases there may be 
both a generated title page and a hand-crafted title page as well.

My perusal of PGTEI files indicates that the <index> element is used 
almost exclusive to support the PGTEI XSL scripts which generate title 
pages, tables of contents, and lists of illustration. My guess is that 
the <index> count is also over-inflated if you were to disregard the 
effects of the PGTEI conversion scripts.

Yet another oddity is that the <quote> element is used in 20 documents 
while the <q> element is used only in 8. This seemed odd to me at first, 
as the <quote> element is only used for those quotations which are 
attributed by the author to some agency external to the text. And in 
TEI-Lite, <quote> has been deprecated in favor of <q>, which is kind of 
a "catch-all" element in TEI. As I think about it, however, the TEI 
specification does suggest that it is acceptable to not use the <q> 
element at all, instead retaining the original quotation marks. Maybe 
the disparity between <q> and <quote> is really not odd after all.

> Thanks for the lists!
> 
> I'm particularly interested in the typical range of values used for the 
> rend attribute on <hi>. I've added the (I think) fictional 
> rend="gesperrt" to the book I'm working on. At some point, I'll have to 
> figure out the right way to do that.

One of the data points I found most interesting is that while the <hi> 
tag (typically used to record italicization when the reason is not 
discernible) is used in 95% of all the documents, the <emph> element 
(indicating emphasized text) which is probably the most common use of 
italicization, was used in only 5 of the texts, and the <foreign> tag, 
indicating a word foreign to the language of the text, was used in only 
3 of the texts.

The preponderance of the <hi> element (which is almost purely 
presentational) together with the amount of paragraph abuse leads me to 
conclude that even those people who are using TEI because it is 
semantic, not presentational, are still marking up text using a 
presentational philosophy. I would bet that when I do an analysis of the 
"rend" attribute on the <hi> element we will find that when it exists it 
will almost always be 'italic'.

Thus, as we work towards a tutorial on the use of TEI one of the major 
focuses will need to be how to avoid the tendency to use presentational 
markup.

> The biggest problem I've had getting started with TEI is the staggering 
> plethora of documentation. There doesn't seem to be a strong consensus 
> on best documentation or best tools. I didn't even notice "The Guide to 
> PGTEI" until I'd been playing with TEI for a couple weeks.

And if you look at "The Guide to PGTEI" I think you will find that the 
only thing it does is document the PG extensions to TEI; it does nothing 
to actually help novices get started with TEI itself, nor does it expose 
any Known Best Practices. That is what I am trying to develop in this 
thread.

The questions I want resolved are 1. What are the core TEI elements 
which everyone /must/ understand before attempting to encode /any/ work 
(examples include <text>, <front>, <body>, and <back>)? 2. What are the 
most commonly encountered elements that I should understand to correctly 
encode 80% of the books in the database? 3. What are the less common but 
still useful TEI elements that I will need to understand occasionally? 
4. What are the common mistakes made in TEI encoding books, and how can 
and should one avoid them? 5. What are the TEI elements that can safely 
be ignored (unless you're creating a dictionary, I know a how tassle of 
them).

Thus, while I agree with Mr. Noring's suggestion of creating a "usage 
manual" for the most common and useful TEI elements, I don't agree with 
the suggestion of creating a DTD for any "approved" subset of TEI. You 
see, DTDs are only useful in detecting deviations from a standard. They 
are no good whatsoever in helping people decide which parts of the 
standard are appropriate in any given circumstance.

I think any usage that satisfies the full TEI DTD ought to be acceptable 
(and I wouldn't be opposed to the development of a "liberalized" TEI DTD 
that removes some of the restrictions of the current TEI DTD). But 
restricting documents to a certain subset of elements does nothing to 
promote an understanding of those elements. We need a document which 
does not tell us which elements and usages are acceptable, we need a 
document which tells us which elements and usages are important and 
appropriate in given scenarios.

Looking at the numbers above, there appear to be about 40 documents that 
were created using only the base TEI structural elements with <hi> and 
<p>. These documents are no doubt valid (i.e. they satisfy the TEI DTD) 
but they can't possibly all be good. The right document should help us 
create documents which are valid /and/ good.

I'd be very interested in hearing what other observations and insights 
these data provoke in other people.

[remainder snipped]

-- 
Nothing of significance below this line.


From marcello at perathoner.de  Thu Oct 11 16:49:58 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 12 Oct 2007 01:49:58 +0200
Subject: [gutvol-d] gnutenberg-press maintenance offer (was Re: Proposal
 to add OpenDocument as an additional
In-Reply-To: <470E5720.6030207@netronome.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
Message-ID: <470EB6A6.3030308@perathoner.de>

La Monte Henry Piggy Yarroll wrote:

> I'm particularly interested in the typical range of values used for the 
> rend attribute on <hi>. I've added the (I think) fictional 
> rend="gesperrt" to the book I'm working on. At some point, I'll have to 
> figure out the right way to do that.

  rend="letter-spacing: 0.15em"

PGTEI aims to support all attributes of CSS 2.1 / 3 at some point in the
future.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Thu Oct 11 17:03:16 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 11 Oct 2007 20:03:16 EDT
Subject: [gutvol-d] if you wanteded to convince people that t.e.i. is easy
Message-ID: <c56.1fa41180.344013c4@aol.com>

if you wanted to convince people that t.e.i. is easy,
these humongous threads are not the way to do it.

my finger is tired from deleting so many messages
from my spam folder...                  ;+)

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071011/fa96561e/attachment.htm 

From Bowerbird at aol.com  Thu Oct 11 17:07:16 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 11 Oct 2007 20:07:16 EDT
Subject: [gutvol-d] if you wanteded to convince people that t.e.i. is
	easy
Message-ID: <cba.1c94b728.344014b4@aol.com>

i said:
>   my finger is tired from deleting so many messages
>   from my spam folder...???????????????? ;+)

on the other hand -- which would be my left one --
the fingers are still quite spry and ready for action,
which is why the middle finger pressed "ed" twice,
thus explaining the "wanteded" typo in the subject.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071011/facec2a2/attachment.htm 

From marcello at perathoner.de  Thu Oct 11 17:10:09 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri, 12 Oct 2007 02:10:09 +0200
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <5235198.1192137448655.JavaMail.?@fh1037.dia.cp.net>
References: <5235198.1192137448655.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <470EBB61.4080407@perathoner.de>

joshua at hutchinson.net wrote:

> I want to know what people think of 
> the built-in macro verses the manually created title page and what 
> could be improved on each.

I would use rend="margin-top: 4em" instead of <lb/><lb/>.

You were using <lb/> for purely presentational purposes (to create a
vertical gap). There ain't no such thing as 2 adjacent linebreaks anyway.

Maybe <docImprint> instead of <docEdition> ?

<titlePage rend="page-break-before: right; text-align: center">
  <docTitle>
    <titlePart type="main" rend="font-size:
xx-large">Punch</titlePart><lb />
    <titlePart type="sub" rend="font-size: x-large">or the London
Charivari</titlePart>
  </docTitle>

  <docImprint rend="margin-top: 4em">
    Volume 98<lb />
    <docDate value="1890-01-11">11th January 1890</docDate>
  </docImprint>

</titlePage>


-- 
Marcello Perathoner
webmaster at gutenberg.org


From prosfilaes at gmail.com  Fri Oct 12 07:30:11 2007
From: prosfilaes at gmail.com (David Starner)
Date: Fri, 12 Oct 2007 10:30:11 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <5235198.1192137448655.JavaMail.?@fh1037.dia.cp.net>
References: <5235198.1192137448655.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <6d99d1fd0710120730n75c56f6cy62839239e727fb3a@mail.gmail.com>

On 10/11/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
> To keep the discussion on topic, I ask that you refrain from comments
> on the markup used (yes, I know I went lazy and marked up the italics
> with <hi> instead of <emph>, etc).  I want to know what people think of
> the built-in macro verses the manually created title page and what
> could be improved on each.

The manual page looks fine, and I assume it compares well to the
original. My biggest gripe on the built-in page is the comma between
"First Project Gutenberg Edition" and "(October 2007)"; parenthetical
clauses are never separated off with a comma.

I don't know how flexible the macro page is, but given that the macro
is the one-size-fits-all title page, I'd like to see more, including
information about the original publisher, publication place and date.

From jeroen.mailinglist at bohol.ph  Fri Oct 12 11:55:39 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Fri, 12 Oct 2007 20:55:39 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470EB4BC.9090201@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>
Message-ID: <470FC32B.5060409@bohol.ph>


Thanks Lee for the analysis.

I grabbed the over 300 .tei files on my disk (master of both the posted
HTML files I've posted to PG, and those in progress), and am currently
converting them to XML (I use SGML in pure ASCII, combined with various
transcription methods that most tools can't handle).

If interested, I will create an archive of these, so you can repeat the
analysis. Is there a place where I can drop this (rather large) archive?

Looking at your observation.

<p> is probably the most common occurance. The only documents I can
imagine with very few <p> tags are those exclusively dealing with poetry
or plays, as these will use the <l> tag. TEI often requires <p> in
places you won't expect them, such as inside an argument.

I hardly use <index>, which I believe would be the best way to deal with
pre-existing indexes, but would be extremely labour intensive (unless
somebody takes the time to build a tool for it. Ideally, you would
resolve every term appearing in the index, and replace it with an
<index> tag on the exact place intended by the indexer. This could be
partly automated, but good indexes are often smart, that is, they would
refer to a person mentioned on a page using a normalized name, which
would break automated index-resolving tool. Some of my texts use <ref>
to the extreme, as I let every entry in the index point back to the <pb>
before the page they point to as a poor-mans alternative.

I never used PGTEI, for the following reasons:

    * Use of SGML. I use the SGML version of TEI, which is slightly
      easier for human editors than its XML reincarnation. Since I
      employ an automated conversion from SGML to XML, this is no
      problem. The automatic conversion is performed with J. Clark's SX
      tool, available at www.jclark.com <http://www.jclark.com/sp/>.
      After this I run the tei2tei.xsl
      <http://www.tei-c.org/Activities/MI/Tools/tei2tei.xsl> stylesheet.
    * Use of ASCII only. Since my SGML work predates Unicode, I don't
      use Unicode and stick to ASCII only. All characters outside ASCII
      are encoded with entities. When including sections in non-Latin
      script, such as Greek, I use ad-hoc transcription schemes. Since I
      have tools to convert these to Unicode, this is no problem.
    * Use of extensions. I try to avoid extensions to TEI, and stick
      exclusively to TEILite, or borrow elements from the full-blown TEI
      on a case-by-case basis when required.
    * Use of the |rend| attribute. We both use the |rend| attribute to
      provide hints on rendering elements. I use the concepts of
      rendition ladders, whereas PGTEI uses (since version 0.4) slightly
      modified CSS. Since this is mainly a syntactic distinction, I may
      migrate to CSS in future. (Which means I'll have to write a
      conversion tool for this purpose.)
    * Use of |<divGen>| for tables of contents. Since we are digitizing
      pre-existing texts, I avoid the use of the |<divGen>| attribute
      for tables of contents and similar sections in favor of encoding
      these as they appear in the source. The only exception is where
      the source has no table of contents. Note that titles in original
      tables of contents often differ considerably from the actual
      headings used. Sometimes this is (apparently) intentional,
      sometimes a mistake, which I will then correct.
    * Use of |<divGen>| for footnotes. I automatically generate footnote
      sections at the end of the chapter they appear in. This requires
      some tweaks with quoted sections, etc., but these can be handled
      by software easily.
    * Use of |<q>| elements. I try to follow the principle that TEI
      texts should be the characters in the source plus tagging, and
      thus encode all quotation marks with the proper characters or
      character entities, as they appear in the source. I only use the
      |<q>| element when required to add attributes to a run-in
      quotation, but do not object to tagging it, but insist on keeping
      the quotation marks.


Jeroen.

Lee Passey wrote:
>
> [snip]
>
> Yet more grist for the mill:
>
> I wrote a program that downloaded each of the alleged 112 TEI files 
> stored at Project Gutenberg. 


From lee at novomail.net  Fri Oct 12 12:59:09 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 12 Oct 2007 13:59:09 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470FC32B.5060409@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>
Message-ID: <470FD20D.1020000@novomail.net>

Jeroen Hellingman (Mailing List Account) wrote:

> Thanks Lee for the analysis.
> 
> I grabbed the over 300 .tei files on my disk (master of both the posted
> HTML files I've posted to PG, and those in progress), and am currently
> converting them to XML (I use SGML in pure ASCII, combined with various
> transcription methods that most tools can't handle).
> 
> If interested, I will create an archive of these, so you can repeat the
> analysis. Is there a place where I can drop this (rather large) archive?

I would love to analyze your corpus. The way my program works (it is a 
'C' program, not a script) is it looks in an XHTML table for a <td> 
element containing an "href" attribute. If the value of that attribute 
starts with "file://" it looks in the local file system for the TEI file 
to parse. If the value of the attribute starts with "http://" it parses 
the URL, opens a socket to the remote machine and transmits an http GET 
command for the file. It then parses the file from the incoming socket 
stream, and the file is never stored on the local file system.

Thus, if your files are available on an HTTP (web) server, all I need is 
a list of the URLs.

> Looking at your observation.
> 
> <p> is probably the most common occurance. The only documents I can
> imagine with very few <p> tags are those exclusively dealing with poetry
> or plays, as these will use the <l> tag. TEI often requires <p> in
> places you won't expect them, such as inside an argument.

The predominance of <p> is not surprising; when we write we almost 
always write in paragraphs. A problem, however, is that people who are 
steeped in the word-processing paradigm tend to mark every block of text 
that looks like a paragraph with the <p> tag, even when it's not a 
paragraph. I wish there were a good automated way to detect these 
paragraph abuses, but so far I haven't figured one out.

> I hardly use <index>, which I believe would be the best way to deal with
> pre-existing indexes, but would be extremely labour intensive (unless
> somebody takes the time to build a tool for it. Ideally, you would
> resolve every term appearing in the index, and replace it with an
> <index> tag on the exact place intended by the indexer. This could be
> partly automated, but good indexes are often smart, that is, they would
> refer to a person mentioned on a page using a normalized name, which
> would break automated index-resolving tool. Some of my texts use <ref>
> to the extreme, as I let every entry in the index point back to the <pb>
> before the page they point to as a poor-mans alternative.

I think you've hit the proper use of the index element right on the 
head. As it turns out, the PGTEI files use the element differently, 
primarily as targets for the <divGen> function. When I wrote my tei2html 
program I used a somewhat different approach to the automated creation 
of tables of contents and illustrations. For the table of contents I 
scanned for the <head>ers of <div>s and used them to create a 
hierarchical table with the same hierarchy that the <div>isions had; for 
the list of illustrations I just looked for every <figure> tag. I don't 
like the way PG uses the <index> element, which is why I think its not 
one of the elements that deserves inclusion in the group of "crucial" 
elements.

> I never used PGTEI, for the following reasons:

[snip]

In my view, the PGTEI constraints are focused very strongly on 
supporting the XSLT transformations built into the PG web site. Because 
I find the output from those scripts unacceptable, the scripts, and 
therefore the markup supporting those scripts, and of very little 
importance to me. They certainly have little, if any, general applicability.

The use of the "rend" attribute throughout is very problematic, because 
it has two, somewhat different, meanings. It can mean "in the original 
text this region had this presentation," or it can mean "when you 
convert this document to a presentation document, force it to have this 
presentation." Because I strongly support the proposition that the end 
user should have simple and direct control over the presentation of a 
document, I fully support the first use of the rend attribute, but give 
only qualified support to the second.

Finding all the "rend" attributes, and listing the array of values and 
what elements they are used on will be one of my next projects.

-- 
Nothing of significance below this line.


From lee at novomail.net  Fri Oct 12 13:59:12 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 12 Oct 2007 14:59:12 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470EB4BC.9090201@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>
Message-ID: <470FE020.3080608@novomail.net>

Lee Passey wrote:

> Yet more grist for the mill:

[snip]

> Total Docs, Percentage, Total Use, Avg per Doc, Element Name

[snip]

>         69,     63.30,      5170,     74.93,    lb

[snip]

> I'm sure this data will reveal some oddities and insights, more than 
> just the ones that jump out at me at first blush.

A couple of other observations that have recently occurred to me:

If you remove the items in the list which are obviously included only to 
support the PG XSLT scripts (divGen, index, pgIf, then, else) over 50% 
of the TEI documents in the PG database are created using only 9 tags in 
the body text (i.e. not including metadata). While I suspect that this 
is a result of a certain amount of /over/-simplification, it is evidence 
that TEI is not nearly as complex as its detractors would have us believe.

63% of the PG texts included the <lb> element averaging 75 uses per 
document. There are some individuals who believe that line endings in 
the original source document should be preserved in the TEI 
transcription, and this is obviously one use for the <lb> element. 
Others obviously view the <lb> element as being analogous to the HTML 
<br> element which instructs the user agent to begin a new line when the 
document is being presented. These two uses are to some extent 
contradictory.

If all the PG documents which use the <lb> element used it to record 
line breaks in the text, I can't believe that there would average only 
75 instances per text. On the other hand, it seems to me that if all the 
uses were to indicate a desired presentation, there wouldn't average 75 
in a document. I suspect that most of the documents scanned use the <lb> 
element to force a line break on presentation, not to memorialize the 
format of the original document, but there are probably a few documents 
in the mix which /do/ use it for its intended purpose, which skews the 
numbers (some way of identifying outliers would be useful). Hopefully, 
all of the documents contain some indication of how the <lb> element is 
used in that particular document.

In any event, the <lb> element needs to join the <hi> element in the 
list of elements prone to abuse that need to be distinguished.

-- 
Nothing of significance below this line.


From jon at noring.name  Fri Oct 12 14:03:03 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 12 Oct 2007 15:03:03 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470FD20D.1020000@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<470FD20D.1020000@novomail.net>
Message-ID: <1603744685.20071012150303@noring.name>

Lee Passey wrote:

> Finding all the "rend" attributes, and listing the array of values and
> what elements they are used on will be one of my next projects.

It will be interesting to see how the "rend" attribute has been used.

I do like the idea that PG/DP needs to standardize on something when
using the "rend" attribute (and, yes, I agree with Lee that the value
in "rend" should simply describe how the element was rendered in the
paper source.) I like Marcello's approach of using CSS 2.1/3.0 as the
attribute value. In cases where CSS cannot be used (can't think of
any, but no doubt it will occur for real odd stuff), then the PG/DP
folk need to standardize on a set of values.

Jon Noring


From klofstrom at gmail.com  Fri Oct 12 14:15:08 2007
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 12 Oct 2007 11:15:08 -1000
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <1603744685.20071012150303@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph> <470FD20D.1020000@novomail.net>
	<1603744685.20071012150303@noring.name>
Message-ID: <1e8e65080710121415icf7cba3vf89bf01c896ea46e@mail.gmail.com>

A question, from someone who hasn't grappled with TEI yet:

Perhaps it would be possible to do the TEI in two stages? One, a plain
vanilla TEI. Academic quality. Two, this TEI marked up into PGTEI (a
markup as automated as possible), a specialized format designed for
easy on-the-fly generation of ebooks in various formats.

This would make the plain vanilla TEI into the archival master, the
PGTEI into a completely behind the scenes format that could be changed
as formats and ebook readers change.

Just a thought. May or may not work.

--
Karen Lofstrom

From lee at novomail.net  Fri Oct 12 14:29:30 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 12 Oct 2007 15:29:30 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470EB4BC.9090201@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>
Message-ID: <470FE73A.6060302@novomail.net>

Forwarded without further comment. Metadata elements (<teiHeader> 
contents) from the PG TEI corpus:

Total Docs,  Percent , Total Use,  Avg / Doc, Element Name
        109,    100.00,       637,       5.84,            p
        109,    100.00,       276,       2.53,         date
        109,    100.00,       275,       2.52,         name
        109,    100.00,       244,       2.24,        title
        109,    100.00,       150,       1.38,         item
        109,    100.00,       143,       1.31,    publisher
        109,    100.00,       138,       1.27,     respStmt
        109,    100.00,       135,       1.24,       change
        109,    100.00,       126,       1.16,         idno
        109,    100.00,       124,       1.14,     language
        109,    100.00,       109,       1.00,    langUsage
        109,    100.00,       109,       1.00,publicationStmt
        109,    100.00,       109,       1.00, availability
        109,    100.00,       109,       1.00,    titleStmt
        109,    100.00,       109,       1.00,   sourceDesc
        109,    100.00,       109,       1.00,  profileDesc
        109,    100.00,       109,       1.00,     fileDesc
        109,    100.00,       109,       1.00, encodingDesc
        109,    100.00,       109,       1.00, revisionDesc
        108,     99.08,       151,       1.40,       author
        104,     95.41,       104,       1.00,      edition
        104,     95.41,       104,       1.00,  editionStmt
        101,     92.66,       200,       1.98,         bibl
        100,     91.74,       100,       1.00,     taxonomy
        100,     91.74,       100,       1.00,    classDecl
         99,     90.83,        99,       1.00,    textClass
         40,     36.70,        83,       2.08,           lb
         34,     31.19,        38,       1.12,    classCode
         33,     30.28,        44,       1.33,     pubPlace
         33,     30.28,        33,       1.00,         list
         33,     30.28,        33,       1.00,     keywords
         31,     28.44,        31,       1.00,      imprint
         10,      9.17,        10,       1.00,  projectDesc
          7,      6.42,         8,       1.14,       editor
          4,      3.67,         4,       1.00,editorialDecl
          3,      2.75,         4,       1.33,         xref
          3,      2.75,         3,       1.00,         resp

-- 
Nothing of significance below this line.


From jon at noring.name  Fri Oct 12 14:52:07 2007
From: jon at noring.name (Jon Noring)
Date: Fri, 12 Oct 2007 15:52:07 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470FC32B.5060409@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
Message-ID: <1408403531.20071012155207@noring.name>

Jeroen wrote:

[A quite informative message in response to Lee's]


> I hardly use <index>, which I believe would be the best way to deal with
> pre-existing indexes, but would be extremely labour intensive (unless
> somebody takes the time to build a tool for it. Ideally, you would
> resolve every term appearing in the index, and replace it with an
> <index> tag on the exact place intended by the indexer. This could be
> partly automated, but good indexes are often smart, that is, they would
> refer to a person mentioned on a page using a normalized name, which
> would break automated index-resolving tool. Some of my texts use <ref>
> to the extreme, as I let every entry in the index point back to the <pb>
> before the page they point to as a poor-mans alternative.

Back-of-book indexes are difficult to deal with for mastering
purposes, and something we almost need to put together a separate
working group to hammer out. I'm fortunate to have talked with one of
the world's top experts at indexing, and marking up for it. It is
complicated. Since we are dealing with existing books with existing
indexes (e.g., they will not be expanded or added to in the master),
we may determine that using <index> is the best way. (Unlike Tables of
Contents and title pages, I view original back-of-book indexes as being
*closer* to content, but not exactly there -- a sort of Twilight Zone
sort of thing -- oooh, I see Rod Serling walk out now!)

To be discussed at the appropriate time.


> I never used PGTEI, for the following reasons:
>
>     * Use of ASCII only. Since my SGML work predates Unicode, I don't
>       use Unicode and stick to ASCII only. All characters outside ASCII
>       are encoded with entities. When including sections in non-Latin
>       script, such as Greek, I use ad-hoc transcription schemes. Since I
>       have tools to convert these to Unicode, this is no problem.

There's something to be said with this approach, especially for
primarily English texts where the use of non-ASCII characters is
pretty constrained. Using a mnemonic character entities set, such as
that used in TEI, makes sense. (I don't remember: is the HTML
mnemonic character entities a subset of that used in TEI?)

As time goes on, our text editors will become more and more Unicode
conformant to the point where we may never use mnemonic character
entities in them at any stage of authorship.

Now this is not to say I'd disallow UTF-8 master documents which
encode characters beyond ASCII. We'd allow them. There now exist cool
freeware tools to convert between encodings and any character entities
in them, Such as BabelPad for Windows (highly recommended! No doubt
Mac has some similar freeware text encoding conversion tools.)

(In my case, I love my vi editor, and since it is not Unicode
conformant, I simply use either mnemonic or numerical entities to
represent characters beyond ASCII -- it's easy for me to type, for
example, &mdash; and &ouml;. Later, when I convert the docs to UTF-8
with all character entities converted to encoded characters, I use
BabelPad -- and if need be I can use BabelPad to go the other
direction.)


>     * Use of extensions. I try to avoid extensions to TEI, and stick
>       exclusively to TEILite, or borrow elements from the full-blown TEI
>       on a case-by-case basis when required.

Another good piece of advice. For primarily prose works, the TEI-Lite
probably has enough to do the job. And if not, then pull in individual
elements as needed.


>     * Use of |<divGen>| for tables of contents. Since we are digitizing
>       pre-existing texts, I avoid the use of the |<divGen>| attribute
>       for tables of contents and similar sections in favor of encoding
>       these as they appear in the source. The only exception is where
>       the source has no table of contents. Note that titles in original
>       tables of contents often differ considerably from the actual
>       headings used. Sometimes this is (apparently) intentional,
>       sometimes a mistake, which I will then correct.

Hmmm, I think we should seriously consider using Digital Talking Book's
NCX to "format" the tables of contents and other navigation lists
(like "List of Illustrations"), with the targets being ID's placed on
the appropriate elements in the content. NCX is quite powerful at
structuring such nav-lists, including ways to designate the table of
contents item description which, as someone else mentioned, oftentimes
is described differently in the original table of contents than the
associated header title. It is also hierarchical in structure, and,
finally, meets legal requirements for educational use of the books.
Oh, and the NCX is now ready for use in EPub.

The NCX may be embedded within the "master" TEI, probably using a
CDATA section so as to not create problems with DTD validation (one
could consider using namespaces, but namespacing, especially with
regards to validation, is still a royal mess.)


>     * Use of |<divGen>| for footnotes. I automatically generate footnote
>       sections at the end of the chapter they appear in. This requires
>       some tweaks with quoted sections, etc., but these can be handled
>       by software easily.

Hmmm, I believe the best system is to simply place the annotation
(whether it is a footnote, endnote, sidebar, etc.) at the point in the
main flow of the text where it naturally fits using the <note> tag. If
need be, add the appropriate attribute/value to describe where it was
placed originally.

The problem is that different digital renditions from the master will
each have its own best place and method to present the annotations. We
must not "force" placement in a particular place or manner. Let the
conversion tools take care of placement.

*****

Just my thoughts, but I may have overlooked some issues we need to
discuss, or there may be even better ways to do these things even
given my present understanding.

Jon Noring


From piggy at netronome.com  Fri Oct 12 18:53:52 2007
From: piggy at netronome.com (La Monte Henry Piggy Yarroll)
Date: Fri, 12 Oct 2007 21:53:52 -0400
Subject: [gutvol-d] Very strict subset of TEI P5 for most PG/DP books?
In-Reply-To: <1941308865.20071005114302@noring.name>
References: <6688876.1191591951762.JavaMail.?@fh1064.dia.cp.net>	<724651335.20071005100515@noring.name>	<15cfa2a50710051017i4b4476acv19a8aa63037c3089@mail.gmail.com>
	<1941308865.20071005114302@noring.name>
Message-ID: <47102530.5000607@netronome.com>

Jon Noring wrote:
> Robert
>   
>> Jon Noring wrote:
>>     
>>> (Aside, I've always believed that if we are to scan the public
>>> domain books, we should do so at sufficient scan quality so the
>>> scans will be useful for ludic reading, and not just as feed for
>>> OCR, and thus have always advocated higher quality master scans
>>> than has been done. This zeal for speed has troubled me, especially
>>> in that the bottleneck at DP is not scans, but proofing -- I would
>>> hope that DP will begin to encourage book scanners to focus on
>>> archival quality -- even presentation quality -- rather than the
>>> current "just scan 'em good enough for OCR." Make the scans
> ...
> So long as DP does not make any effort to encourage those who scan
> books to do so at archival or even presentational quality, most won't.
> But if it is encouraged, I think most will take the time to do it. If
> volunteers are given reasons why to do something to a certain higher
> level of quality, most will gladly do so. I reject the notion that
> *asking* them to take the effort to produce archival quality will
> turn them away. The end result is that a lot of high quality scan sets
> will result and be made available to the world.
>
> Good enough for DP should NOT be considered good enough.
>   
Here here!

But I don't think this is an issue for DP. We need a CPer site similar 
to DP suitable for distributing the work associated with producing 
digital facsimiles. I have about 150 books in various states of 
preparedness to contribute.

If you build it, I will come. :-)


From marcello at perathoner.de  Sat Oct 13 07:54:10 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 13 Oct 2007 16:54:10 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <1e8e65080710121415icf7cba3vf89bf01c896ea46e@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>	<470FC32B.5060409@bohol.ph>
	<470FD20D.1020000@novomail.net>	<1603744685.20071012150303@noring.name>
	<1e8e65080710121415icf7cba3vf89bf01c896ea46e@mail.gmail.com>
Message-ID: <4710DC12.7090402@perathoner.de>

Karen Lofstrom wrote:

> Perhaps it would be possible to do the TEI in two stages? One, a
> plain vanilla TEI. Academic quality. Two, this TEI marked up into
> PGTEI (a markup as automated as possible), a specialized format
> designed for easy on-the-fly generation of ebooks in various formats.

People think PGTEI is some sort of degraded, impure, tainted TEI because
it adds some tags and defines the rend attribute (which TEI leaves
intentionally undefined).

Well, it is not. PGTEI is a TEI application (extends TEI). PGTEI is in
no way different than TEI-Lite, which is another TEI application.

TEI was expressly designed with extensibility in mind and ways of
building TEI applications have been built into the TEI DTD:

> In brief, the TEI Guidelines define a general-purpose encoding scheme
> which makes it possible to encode different views of text, possibly
> intended for different applications, serving the majority of
> scholarly purposes of text studies in the humanities. However, no
> predefined encoding scheme can serve all research purposes.
> Therefore, the TEI also provides means of modifying and extending the
> encoding scheme defined by the Guidelines (see chapter 29 Modifying
> and Customizing the TEI DTD).

  ---- http://www.tei-c.org/P4X/AB.html#ABDPIU


-- 
Marcello Perathoner
webmaster at gutenberg.org


From lee at novomail.net  Sat Oct 13 11:16:13 2007
From: lee at novomail.net (Lee Passey)
Date: Sat, 13 Oct 2007 12:16:13 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <1e8e65080710121415icf7cba3vf89bf01c896ea46e@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>	<470FC32B.5060409@bohol.ph>
	<470FD20D.1020000@novomail.net>	<1603744685.20071012150303@noring.name>
	<1e8e65080710121415icf7cba3vf89bf01c896ea46e@mail.gmail.com>
Message-ID: <47110B6D.50806@novomail.net>

Karen Lofstrom wrote:

>  A question, from someone who hasn't grappled with TEI yet:
>
>  Perhaps it would be possible to do the TEI in two stages? One, a
>  plain vanilla TEI. Academic quality. Two, this TEI marked up into
>  PGTEI (a markup as automated as possible), a specialized format
>  designed for easy on-the-fly generation of ebooks in various formats.
>
>
>  This would make the plain vanilla TEI into the archival master, the
>  PGTEI into a completely behind the scenes format that could be
>  changed as formats and ebook readers change.
>
>  Just a thought. May or may not work.

Mr. Perathoner's response was a bit testy and defensive, so you may not 
have recognized that he answered your question.

Yes, what you are proposing could work; but in fact, step 2 is probably 
unnecessary because a good TEI file should contain all the information 
needed to generate the various formats without any additional human 
intervention.

As I understand it, PGTEI does extend TEI, in an approved way, but I 
also believe that those extensions are for the most part 
inconsequential. More importantly, as Mr. Perathoner points out, PGTEI 
/refines/ TEI to make it more usable.

Consider the "rend" attribute. In TEI it is used to indicate how a 
particular phrase was presented in the work being transcribed (for 
original works the rend attribute is not nearly as important). In most 
printed works, emphasized phrases are presented as italic text. In early 
PG editions, emphasized phrases are frequently presented as uppercase 
text. This distinction can be preserved in TEI by using the "rend" 
attribute: e.g. <emph rend="italic"> vs. <emph rend="uppercase">. In 
both cases you have noted that the text is emphasized, not merely 
rendered differently (potentially important in the case of 
text-to-speech), you permit the end user to decide how s/he prefers 
emphasized text rendered (ignoring the "rend" attribute) but you have 
preserved the distinction between the two.

The problem with the "rend" attribute is that TEI has provided no 
controlled vocabulary for the values associated with "rend" attributes. 
The P4 guidelines /do/ provide the <rendition> attribute which is 
designed to map "rend" values to some formal language but the element 
description does not explain how that is to be done (the draft P5 
guidelines have remedied that ambiguity).

PGTEI refines TEI by stating that only CSS rules may be used as values 
for the "rend" attribute. This simple statement formalizes TEI in such a 
way that it can now be used by automated processes designed to transform 
TEI into a presentation format. Small constraints as this, which cannot 
be expressed in a DTD, make a huge impact on the usability of a file, 
but they have no deleterious effect on its "academic quality."

Other refinements to TEI, such as the use of the <index> element 
combined with the <divGen> element to create lists, or the use of the 
<divGen> element to create a title page when the necessary information 
has been stored in the <teiHeader> element, are of marginal utility, but 
they do not make the file any less pure. As I see it, the biggest 
problem with these constructs is that they encourage people to omit 
transcribing title pages and tables of contents, believing that they 
will be generated later on the fly. Limited support for the <divGen> 
element, combined with the uncommon way it is used by the PG XSLT 
script, makes files which rely on <divGen> as an alternative to 
transcription less useful. Nevertheless, this problem is a side effect 
of the PGTEI extensions, not a result. If PPers make the effort to 
transcribe these valuable parts of a work, the presence or absence of 
the <divGen> is irrelevant.

As I understand it, there are some PGTEI extensions that may be required 
if you want to use the PG XSLT scripts to generate PDF files. Because 
PDF is a pre-print format, not an e-book format, I have paid no 
attention to these extensions. I do not know how the PG XSLT PDF 
conversion scripts might respond to a TEI file without those extensions, 
so someone else will have to fill us in on that score.

The bottom line is that a PGTEI conformant file can work as the master 
as well as a "plain vanilla" TEI file, and vice versa. If there is an 
automated way to add PGTEI extensions to a TEI file, then the later 
conversion processes can be modified to incorporate those methods 
rendering the addition of PGTEI extensions moot. At this point, while 
the two-stage model you propose is definitely feasible, I don't see any 
way it would be useful.


From ralf at ark.in-berlin.de  Sat Oct 13 11:28:08 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sat, 13 Oct 2007 20:28:08 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470FC32B.5060409@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
Message-ID: <20071013182808.GB5263@ark.in-berlin.de>

>     * Use of |<divGen>| for footnotes. I automatically generate footnote
>       sections at the end of the chapter they appear in. This requires
>       some tweaks with quoted sections, etc., but these can be handled
>       by software easily.

That's possible in PGTEI 0.4 too. Use <note place="end"> and
a divGen "footnote" at the end iof each chapter. However, numbers
are no reset, so you'll get marks with hundreds in some books.


ralf


From ralf at ark.in-berlin.de  Sat Oct 13 11:23:04 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sat, 13 Oct 2007 20:23:04 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470EB4BC.9090201@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
Message-ID: <20071013182304.GA5263@ark.in-berlin.de>

> The preponderance of the <hi> element (which is almost purely 
> presentational) together with the amount of paragraph abuse leads me to 
> conclude that even those people who are using TEI because it is 
> semantic, not presentational, are still marking up text using a 
> presentational philosophy. 

I don't thin so. I'm using <hi> as a first translation of <i> and <b>,
with the plan to change the <hi>s later to the special mark up. I'm
sure I've read somewhere in the docs an advice to that effect.

> I would bet that when I do an analysis of the 
> "rend" attribute on the <hi> element we will find that when it exists it 
> will almost always be 'italic'.

You will see bold and, in my german language text, gesperrt/antiqua,
which is then further defined in the style sheet.


ralf

From klofstrom at gmail.com  Sat Oct 13 12:49:34 2007
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Sat, 13 Oct 2007 09:49:34 -1000
Subject: [gutvol-d] Kindness to TEI novices
Message-ID: <1e8e65080710131249g6930021x648e18c64054b632@mail.gmail.com>

Thanks, Lee, for being so kind as to explain further.

I've got a charming but tiny, simple book to put through DP. I've
sworn to myself that I'm going to shepherd it through all the stages
of preparation. When I post-process it, I'm going to prepare a PGTEI
version, my first. Expect further stupid questions :)

-- 
Karen Lofstrom
aka Zora

From marcello at perathoner.de  Sat Oct 13 14:10:08 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 13 Oct 2007 23:10:08 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470FC32B.5060409@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>
Message-ID: <47113430.7050608@perathoner.de>

Jeroen Hellingman (Mailing List Account) wrote:

> I never used PGTEI, for the following reasons:

>     * Use of ASCII only. Since my SGML work predates Unicode, I don't
>       use Unicode and stick to ASCII only. All characters outside ASCII
>       are encoded with entities. When including sections in non-Latin
>       script, such as Greek, I use ad-hoc transcription schemes. Since I
>       have tools to convert these to Unicode, this is no problem.

You can do that in English. With other European languages it starts to
look bad and becomes completely impossible with cyrillic and asian scripts.

PGTEI has to accomodate all languages and therefore has full support for
unicode. Of course, it inputs many lesser encodings too, so if your
editor just won't do unicode, you still can use those other encodings +
entites.


>     * Use of extensions. I try to avoid extensions to TEI, and stick
>       exclusively to TEILite, or borrow elements from the full-blown TEI
>       on a case-by-case basis when required.

Funny. TEI-Lite is an extension of TEI.


>     * Use of |<divGen>| for tables of contents. Since we are digitizing
>       pre-existing texts, I avoid the use of the |<divGen>| attribute
>       for tables of contents and similar sections in favor of encoding
>       these as they appear in the source. The only exception is where
>       the source has no table of contents. Note that titles in original
>       tables of contents often differ considerably from the actual
>       headings used. Sometimes this is (apparently) intentional,
>       sometimes a mistake, which I will then correct.

Just the same as for title pages, use of <divGen> for building tables of
contents is optional.

You can build your table of contents by hand. Use <ref> and <anchor>
instead of <index> and <divGen>. It will just take you longer.


>     * Use of |<divGen>| for footnotes. I automatically generate footnote
>       sections at the end of the chapter they appear in. This requires
>       some tweaks with quoted sections, etc., but these can be handled
>       by software easily.

Use of <divGen> makes PGTEI more flexible because you can collect
footnotes at the end of the chapter OR at the end of the book, whichever
you like best. (More exactly: you can collect them *any* place a
<divGen> is legal markup.)


>     * Use of |<q>| elements. I try to follow the principle that TEI
>       texts should be the characters in the source plus tagging, and
>       thus encode all quotation marks with the proper characters or
>       character entities, as they appear in the source. I only use the
>       |<q>| element when required to add attributes to a run-in
>       quotation, but do not object to tagging it, but insist on keeping
>       the quotation marks.

Use the embedded TEI stylesheet to set rend="pre: none; post: none" to
<q> and <quote>. PGTEI will then no longer insert any quotation marks of
its own.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Sat Oct 13 16:32:19 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 13 Oct 2007 19:32:19 EDT
Subject: [gutvol-d] tim over at librarything.com
Message-ID: <c27.22dc4d12.3442af83@aol.com>

tim over at librarything.com is doing some interesting stuff,
providing a new feature called "common knowledge", which is
a "fielded wiki" where people can supply semi-structured data
about books and authors, such as "important places in a book",
"characters in a book", "author's residences", and so on...

you can read more about it here:
>    
http://www.librarything.com/blog/2007/10/common-knowledge-social-cataloging.php
>    http://www.librarything.com/blog/2007/10/common-knowledge-explodes.php

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071013/e5617b68/attachment.htm 

From joshua at hutchinson.net  Sun Oct 14 06:16:57 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Sun, 14 Oct 2007 13:16:57 +0000 (UTC)
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
Message-ID: <4994264.1192367817750.JavaMail.?@fh1064.dia.cp.net>

Plus, the way TEI works ... if someone were to take our "PGTEI" marked 
document and run it through their system that doesn't support things 
like <divGen> and our rend structure ... it'll just ignore them.  You 
wouldn't get an automated Table of Contents, but the rest of it would 
come across.  The layout might look different, since it would be 
ignoring the rend attributes, but the content would still be there.

Josh

>----Original Message----
>From: marcello at perathoner.de
>Date: Oct 13, 2007 10:54 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: Re: [gutvol-d] The TEI 80/20 rule - empirical data
>
>Karen Lofstrom wrote:
>
>> Perhaps it would be possible to do the TEI in two stages? One, a
>> plain vanilla TEI. Academic quality. Two, this TEI marked up into
>> PGTEI (a markup as automated as possible), a specialized format
>> designed for easy on-the-fly generation of ebooks in various 
formats.
>
>People think PGTEI is some sort of degraded, impure, tainted TEI 
because
>it adds some tags and defines the rend attribute (which TEI leaves
>intentionally undefined).
>
>Well, it is not. PGTEI is a TEI application (extends TEI). PGTEI is 
in
>no way different than TEI-Lite, which is another TEI application.
>
>TEI was expressly designed with extensibility in mind and ways of
>building TEI applications have been built into the TEI DTD:
>
>> In brief, the TEI Guidelines define a general-purpose encoding 
scheme
>> which makes it possible to encode different views of text, possibly
>> intended for different applications, serving the majority of
>> scholarly purposes of text studies in the humanities. However, no
>> predefined encoding scheme can serve all research purposes.
>> Therefore, the TEI also provides means of modifying and extending 
the
>> encoding scheme defined by the Guidelines (see chapter 29 Modifying
>> and Customizing the TEI DTD).
>
>  ---- http://www.tei-c.org/P4X/AB.html#ABDPIU
>
>
>
>-- 
>Marcello Perathoner
>webmaster at gutenberg.org
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From jeroen.mailinglist at bohol.ph  Sun Oct 14 12:24:32 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Sun, 14 Oct 2007 21:24:32 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <47113430.7050608@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>	<470FC32B.5060409@bohol.ph>
	<47113430.7050608@perathoner.de>
Message-ID: <47126CF0.7080508@bohol.ph>

Marcello Perathoner wrote:
> Jeroen Hellingman (Mailing List Account) wrote:
>
>
> PGTEI has to accomodate all languages and therefore has full support for
> unicode. Of course, it inputs many lesser encodings too, so if your
> editor just won't do unicode, you still can use those other encodings +
> entites.
>
>   
Ten years ago, support for Unicode was non-existing, and using any
character set but ASCII was a nightmare when working with more than one
system. (I had to work on both Macs, PCs in DOS and PCs in Windows at
that time.

Now that we have Unicode, it has become much easier.

>>     * Use of extensions. I try to avoid extensions to TEI, and stick
>>       exclusively to TEILite, or borrow elements from the full-blown TEI
>>       on a case-by-case basis when required.
>>     
>
> Funny. TEI-Lite is an extension of TEI.
>
>   
If you belong to the school that considers a subset an extension, I can
agree. I will not use extensions where perfectly valid TEI constructs
exists, or where I think the purpose of the tagging lies outside the
scope of semantic tagging (such as conditional switches in the tagged
text, not in the rendering code). If I need some really odd,
one-of-a-kind construct, I can always include an illustration. However,
if I have a need for an extension, I will certainly invent it.

> You can build your table of contents by hand. Use <ref> and <anchor>
> instead of <index> and <divGen>. It will just take you longer.
>
>   
When I build my table of contents by hand, based on the book at hand, I
will more accurately capture their contents. If I regenerate from the
available heads, they are often quite different.

> Use of <divGen> makes PGTEI more flexible because you can collect
> footnotes at the end of the chapter OR at the end of the book, whichever
> you like best. (More exactly: you can collect them *any* place a
> <divGen> is legal markup.)
>   

By tweaking my XSLT, I can put them anywhere I like, and leave that
choice to the person rendering, not the person encoding the text. I have
XSLT scripts that produce tweaked HTML that goes through Prince, and the
footnotes end up as footnotes in a PDF.


Jeroen.


From jon at noring.name  Sun Oct 14 13:10:26 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 14 Oct 2007 14:10:26 -0600
Subject: [gutvol-d] "California court tilts towards mandating web
	accessibility" -- ebook connection?
Message-ID: <1421257391.20071014141026@noring.name>

[I have already posted the following to The eBook Community, and am
reposting it here for discussion specific to PG and DP. I won't go
into detail in this preamble what I see are the connections, but
briefly it revolves around the use of the PG corpus in public schools,
and my suggestion that for TEI mastering, NCX be used for the
navigational lists, including the Table of Contents.]


Everyone,

Large and small publishers of digital text content, such as ebooks,
need to be aware that legal requirements (such as the Americans with
Disabilities Act, ADA) may eventually force them to adopt accessible-
friendly formats, and to implement them in an accessible manner.

(Btw, this is something I've predicted would happen the last 10 years
on The eBook Community, and we are now seeing the opening salvos.)

A recent ruling regarding Target and the accessibility of its web site
is a sort of presage of what may come to the digital publishing world:

   http://www.theregister.co.uk/2007/10/14/california_target_web_accessibility/

At first glance, this court case appears limited to publicly-accessible
web sites, but it is clear that any textual-content which is digitally
readable, whether online, or remotely, may be subject to disability
laws in the future, especially if such content is used in the public
sector, such as for education, the government, public libraries, etc.
(The likely scenario is that publishers have to provide at least an
accessible version, but because of production costs, this will likely
lead to simply using accessible formats for all publications as will
be mentioned later.)

Now a lot of digital text formats are accessible IF IMPLEMENTED
PROPERLY, but the issue is that many publishers today do not implement
them properly, as the Target case illustrates. (It's well-known that a
lot of web sites are wholly inaccessible because their markup focuses
on presentational markup rather than focusing on document structure
and important inline text semantics, and using CSS for most visual
styling -- refer to CSS Zen Garden, http://www.csszengarden.com/ , for
a demo on how web markup *should* be done. The W3C Web Content
Accessibility Guidelines, WCAG, should be religously followed as they
apply: http://www.w3.org/WAI/intro/wcag20 .)

Now, many ebook formats actually use HTML/XHTML in some fashion, such
as the new IDPF "EPub" format. So it is important that publishers use
only clean XHTML which minimizes presentational elements and
attributes -- and NEVER NEVER NEVER use tables for layout purposes --
and religiously follow WCAG guidelines as they apply. (I've run a web
site for several years now which uses tables for layout, something I'm
very much NOT proud of, and plan to upgrade it very soon.)

Unfortunately some tools which generate XHTML from word processing
formats produce garbage XHTML, and I mean garbage which is completely
inaccessible. Interestingly, the more "web accessible" the markup is,
the easier it is to author and edit and maintain, and it is much more
repurposeable, so with web accessibility comes greater
repurposeability, simpler documents, and ultimately lower cost in the
publishing workflow -- and without sacrificing presentation quality.
(So publishers can have their cake and eat it, too.)

Another aspect of the accessibility of digital texts is navigation,
and one that's less understood. It is little known that in the U.S.
all textbooks used in the K-12 sector must also be provided in a form
that follows DAISY's Digital Talking Book format, a sort of
supercharged XHTML (with some TEI influenced markup) that EPub now
supports. As part of this requirement, all DTB (and EPub) *must*
include what's called NCX, an XML document that contains each
publication's "navigational lists", including a required Table of
Contents. Publishers need to begin to understand NCX (at least the
EPub subset of NCX which is REQUIRED for all EPubs) and implement it
in their work flow.


Jon Noring


From jon at noring.name  Sun Oct 14 13:15:19 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 14 Oct 2007 14:15:19 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <47126CF0.7080508@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<47113430.7050608@perathoner.de> <47126CF0.7080508@bohol.ph>
Message-ID: <1249433446.20071014141519@noring.name>

Jeroen wrote:

> By tweaking my XSLT, I can put them anywhere I like, and leave that
> choice to the person rendering, not the person encoding the text. I have
> XSLT scripts that produce tweaked HTML that goes through Prince, and the
> footnotes end up as footnotes in a PDF.

Cool!

I've been following the development of the Prince application for a
few years now, and have gotten to know quite well Michael Day who
developed Prince. It shows the power of XML+CSS to produce fixed paged
output.


Jon Noring


From jeroen.mailinglist at bohol.ph  Sun Oct 14 13:56:56 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Sun, 14 Oct 2007 22:56:56 +0200
Subject: [gutvol-d] "California court tilts towards mandating
 web	accessibility" -- ebook connection?
In-Reply-To: <1421257391.20071014141026@noring.name>
References: <1421257391.20071014141026@noring.name>
Message-ID: <47128298.50100@bohol.ph>

Jon Noring wrote:
> Large and small publishers of digital text content, such as ebooks,
> need to be aware that legal requirements (such as the Americans with
> Disabilities Act, ADA) may eventually force them to adopt accessible-
> friendly formats, and to implement them in an accessible manner.
>   

I think (and hope), in the US, this type of thinking will eventually be
scrapped by the workings of the first amendment, as such requirements
add considerable cost to developing websites, and are thus an impediment
to free speech (These things are not just about using sane techniques
and semantic mark-up). It should be up to the speaker to decide in which
language to speak, and, if such language cannot be understood by a group
of people, that should be the speaker's choice. Otherwise, requirements
to make websites in basic English only will soon pop-up, and we will
only be able to serve the lowest common denominator of the idiots that
can somehow operate a computer. In other countries, I also hope common
sense will prevail.

Leaving principle aside, I strongly believe it is in your own interest,
if you want to be heard, to make sites as accessible as possible. I
fully believe that government sites, and others paid for by public money
should be legally prescribed to be accessible, but I reject the notion
that you, as a private individual or company, need to adjust your speech
to be heard by everybody, if you don't want that. Any other direction
would be a kind of censorship. (Just as those idiotic one sentence
French clauses in contracts that state that parties have agreed to use
English for the remainder).

As you may be aware, there has been a lot of talking about the latest
round of accessibility guidelines by various groups, and I believe, with
other critics
(http://www.theregister.co.uk/2007/10/10/web_accessibility_critic/) that
many of them will be ignored, and for good reasons as well (too vague to
be of practical applicability, too limiting on artistic expressions, etc.)

I've been working hard to make ebooks accessible, by providing HTML
versions with reading hints and navigation aids, and will add more of
such features in future, but I will never go as far as to rephrase them
in simple language (as some of the accessibility guidelines suggest).

Jeroen Hellingman


From marcello at perathoner.de  Sun Oct 14 16:18:59 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 15 Oct 2007 01:18:59 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <47126CF0.7080508@bohol.ph>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>	<470FC32B.5060409@bohol.ph>	<47113430.7050608@perathoner.de>
	<47126CF0.7080508@bohol.ph>
Message-ID: <4712A3E3.5030204@perathoner.de>

Jeroen Hellingman (Mailing List Account) wrote:

>> Funny. TEI-Lite is an extension of TEI.
>>   
> If you belong to the school that considers a subset an extension, I can
> agree.

TEI-Lite extends TEI in that it adds some tags that are not in TEI.


> When I build my table of contents by hand, based on the book at hand, I
> will more accurately capture their contents. If I regenerate from the
> available heads, they are often quite different.

PGTEI allows to build a TOC with entries differing from the chapter
heads, like this:

  <div>
    <index level1="Chapter 2">
    <head>Chapter the Second</head>

The TOC will say: Chapter 2.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Sun Oct 14 22:31:41 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 14 Oct 2007 23:31:41 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <4712A3E3.5030204@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<47113430.7050608@perathoner.de> <47126CF0.7080508@bohol.ph>
	<4712A3E3.5030204@perathoner.de>
Message-ID: <102352901.20071014233141@noring.name>

Marcello wrote:

> TEI-Lite extends TEI in that it adds some tags that are not in TEI.

Wow, I did not know this!

I was under the impression that TEI-Lite was a pure subset of TEI,
meaning that any conceivable XML document valid to TEI-Lite will
also validate to TEI.

So what are the "additions"? Is that briefly documented somewhere?

Such "additions" can be elements, attributes, attribute values (for
those attributes having a set of possible values), and element content
model differences.

Jon Noring


From ralf at ark.in-berlin.de  Mon Oct 15 01:21:11 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Mon, 15 Oct 2007 10:21:11 +0200
Subject: [gutvol-d] Kindness to TEI novices
In-Reply-To: <1e8e65080710131249g6930021x648e18c64054b632@mail.gmail.com>
References: <1e8e65080710131249g6930021x648e18c64054b632@mail.gmail.com>
Message-ID: <20071015082111.GB6969@ark.in-berlin.de>

> I've got a charming but tiny, simple book to put through DP. I've
> sworn to myself that I'm going to shepherd it through all the stages
> of preparation. When I post-process it, I'm going to prepare a PGTEI
> version, my first. Expect further stupid questions :)

Please see our DP thread
http://www.pgdp.net/phpBB2/viewtopic.php?t=16031

You might want to have a look also at the (in construction)
http://www.pgdp.net/wiki/Post-Processing_With_PGTEI_0.4


ralf


From ralf at ark.in-berlin.de  Sun Oct 14 07:40:27 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Sun, 14 Oct 2007 16:40:27 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <20071013182808.GB5263@ark.in-berlin.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<20071013182808.GB5263@ark.in-berlin.de>
Message-ID: <20071014144027.GA6093@ark.in-berlin.de>

(replying to myself)

> That's possible in PGTEI 0.4 too. Use <note place="end"> and
> a divGen "footnote" at the end iof each chapter. However, numbers
> are no reset, so you'll get marks with hundreds in some books.

Also, what's not possible is multiple marks for one footnote, and
a mark inside <sp> or <speaker> for marking the speaker name.

Both of which I would need for my current project.


ralf


From marcello at perathoner.de  Mon Oct 15 04:41:46 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 15 Oct 2007 13:41:46 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <102352901.20071014233141@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>	<47113430.7050608@perathoner.de>
	<47126CF0.7080508@bohol.ph>	<4712A3E3.5030204@perathoner.de>
	<102352901.20071014233141@noring.name>
Message-ID: <471351FA.2060100@perathoner.de>

Jon Noring wrote:

> So what are the "additions"? Is that briefly documented somewhere?


<!-- TEILiteX.dtd:  TEI.extensions.dtd file for TEI Lite      -->
<!-- Define some additions for the phrase level tags          -->
<!-- Revisions:                                               -->
<!-- 2002-01-21 : LB add type attribute for consistency       -->
<!-- 2001-12-07 : LB : parameterize for P4                    -->
<!-- 1995-02-17 : CMSMcQ : make file after agreements w/LB    -->

<!ENTITY % gi 'INCLUDE' >
<![ %gi; [
<!ELEMENT %n.gi;        %om.RO;  (#PCDATA)                          >
<!ATTLIST %n.gi;             %a.global;
          TEI                (yes | no)          "yes"
          TEIform            CDATA               'gi'           >
]]>

<!ENTITY % eg 'INCLUDE' >
<![ %eg; [
<!ELEMENT %n.eg;        %om.RR;  (#PCDATA)                          >
<!ATTLIST %n.eg;             %a.global;
          TEIform            CDATA               'eg'           >
]]>

<!ENTITY % code 'INCLUDE' >
<![ %code; [
<!ELEMENT code          %om.RO;  (#PCDATA)                          >
<!ATTLIST code               %a.global;                         >
]]>

<!ENTITY % ident 'INCLUDE' >
<![ %ident; [
<!ELEMENT ident         %om.RO;  (#PCDATA)                          >
<!ATTLIST ident              %a.global;
          type               CDATA #IMPLIED         >
]]>

<!ENTITY % kw 'INCLUDE' >
<![ %kw; [
<!ELEMENT kw            %om.RO;  (#PCDATA)                          >
<!ATTLIST kw                 %a.global;
          type          CDATA #IMPLIED     >
]]>


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Mon Oct 15 07:23:13 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 08:23:13 -0600
Subject: [gutvol-d] What about P5? (was additions to TEI-Lite)
In-Reply-To: <471351FA.2060100@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<47113430.7050608@perathoner.de> <47126CF0.7080508@bohol.ph>
	<4712A3E3.5030204@perathoner.de> <102352901.20071014233141@noring.name>
	<471351FA.2060100@perathoner.de>
Message-ID: <1811467071.20071015082313@noring.name>

[TEI P5 questions asked at end]


Marcello wrote:
> Jon Noring wrote:

>> So what are the "additions"? Is that briefly documented somewhere?

> <!-- TEILiteX.dtd:  TEI.extensions.dtd file for TEI Lite      -->
> <!-- Define some additions for the phrase level tags          -->
> <!-- Revisions:                                               -->
> <!-- 2002-01-21 : LB add type attribute for consistency       -->
> <!-- 2001-12-07 : LB : parameterize for P4                    -->
> <!-- 1995-02-17 : CMSMcQ : make file after agreements w/LB    -->

The element additions in TEI-Lite appear to be the elements <gi>,
<eg>, <code>, <ident> and <kw>. The attribute addition appears to be
to allow the 'type' attribute on <lb/>. Not sure what CMSMcQ is, or if
it is even relevant to PG/DP usage.

Checking the new P5, it looks like all the elements listed above have
been added (although not sure on <kw> -- there is a new <keywords>
which appears to do the same thing.) It does not appear that the
'type' attribute has been added to <lb/>.

That's only a quick analysis. I'm sure others here can provide more
complete analysis.

*****


O.k., about the new TEI P5 that seems poised to be issued as version
1.0 -- the usual questions: How do the TEI experts here view P5? Is
it an improvement? Should all PG/DP documents be upgraded to conform
to P5 (if not already)? Etc.


Jon Noring


From lee at novomail.net  Mon Oct 15 08:33:26 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 15 Oct 2007 09:33:26 -0600
Subject: [gutvol-d] What about P5? (was additions to TEI-Lite)
In-Reply-To: <1811467071.20071015082313@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>	<47113430.7050608@perathoner.de>
	<47126CF0.7080508@bohol.ph>	<4712A3E3.5030204@perathoner.de>
	<102352901.20071014233141@noring.name>	<471351FA.2060100@perathoner.de>
	<1811467071.20071015082313@noring.name>
Message-ID: <47138846.5030705@novomail.net>

Jon Noring wrote:
> [TEI P5 questions asked at end]
> 
> 
> Marcello wrote:
>> Jon Noring wrote:
> 
>>> So what are the "additions"? Is that briefly documented somewhere?
> 
>> <!-- TEILiteX.dtd:  TEI.extensions.dtd file for TEI Lite      -->
>> <!-- Define some additions for the phrase level tags          -->
>> <!-- Revisions:                                               -->
>> <!-- 2002-01-21 : LB add type attribute for consistency       -->
>> <!-- 2001-12-07 : LB : parameterize for P4                    -->
>> <!-- 1995-02-17 : CMSMcQ : make file after agreements w/LB    -->
> 
> The element additions in TEI-Lite appear to be the elements <gi>,
> <eg>, <code>, <ident> and <kw>. The attribute addition appears to be
> to allow the 'type' attribute on <lb/>. Not sure what CMSMcQ is, or if
> it is even relevant to PG/DP usage.

LB => Lou Burnard
CMSMcQ => C. M. Sperberg-McQueen

As near as I can tell, (I speak only pidgin DTD) there is nothing in 
here about the <lb> element. Instead, the DTD indicates that TEI was 
extended by the addition of 5 new elements, which presumably are 
documented elsewhere.

-- 
Nothing of significance below this line.


From hart at pglaf.org  Mon Oct 15 08:34:22 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 15 Oct 2007 08:34:22 -0700 (PDT)
Subject: [gutvol-d] Research on Listservers
Message-ID: <Pine.LNX.4.64.0710150834060.16118@pglaf.org>

Research on Listservers


It has now been nearly 20 years since I started my first
listserver, and I have been wondering if anyone else may
have noticed any yearly trends they could report on.

Even if you have just a small suspicion that listservers
act slightly differently at some times of the year, just
let me know what you think, and perhaps we may spot some
kind of patterns that may help understand listservers in
future of the Internet.


Please email me at:

hart at pglaf.org

I will reply to all such emails, so if you do not get an
answer in a few days, please email me again.

Please feel free to forward this to other listservers.


Thanks!!!


Michael S. Hart
Founder
Project Gutenberg


From marcello at perathoner.de  Mon Oct 15 08:40:14 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 15 Oct 2007 17:40:14 +0200
Subject: [gutvol-d] What about P5? (was additions to TEI-Lite)
In-Reply-To: <1811467071.20071015082313@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>	<47113430.7050608@perathoner.de>
	<47126CF0.7080508@bohol.ph>	<4712A3E3.5030204@perathoner.de>
	<102352901.20071014233141@noring.name>	<471351FA.2060100@perathoner.de>
	<1811467071.20071015082313@noring.name>
Message-ID: <471389DE.6040801@perathoner.de>

Jon Noring wrote:

>> <!-- 2002-01-21 : LB add type attribute for consistency       -->

> It does not appear that the
> 'type' attribute has been added to <lb/>.

LB == Lou Burnard


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Mon Oct 15 08:48:36 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 09:48:36 -0600
Subject: [gutvol-d] What about P5? (was additions to TEI-Lite)
In-Reply-To: <471389DE.6040801@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<47113430.7050608@perathoner.de> <47126CF0.7080508@bohol.ph>
	<4712A3E3.5030204@perathoner.de> <102352901.20071014233141@noring.name>
	<471351FA.2060100@perathoner.de>
	<1811467071.20071015082313@noring.name>
	<471389DE.6040801@perathoner.de>
Message-ID: <1247294775.20071015094836@noring.name>

Marcello wrote:
> Jon Noring wrote:

>>> <!-- 2002-01-21 : LB add type attribute for consistency       -->

>> It does not appear that the
>> 'type' attribute has been added to <lb/>.

> LB == Lou Burnard

<laugh type="egg on my face"/>

I should have noticed since LB was capitalized!

Jon


From jon at noring.name  Mon Oct 15 08:55:03 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 09:55:03 -0600
Subject: [gutvol-d] What about P5? (was additions to TEI-Lite)
In-Reply-To: <47138846.5030705@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net> <470FC32B.5060409@bohol.ph>
	<47113430.7050608@perathoner.de> <47126CF0.7080508@bohol.ph>
	<4712A3E3.5030204@perathoner.de> <102352901.20071014233141@noring.name>
	<471351FA.2060100@perathoner.de>
	<1811467071.20071015082313@noring.name>
	<47138846.5030705@novomail.net>
Message-ID: <1936014050.20071015095503@noring.name>

Lee Passey wrote:
> Jon Noring wrote:
>> From TEI-Lite DTD

>>> <!-- TEILiteX.dtd:  TEI.extensions.dtd file for TEI Lite      -->
>>> <!-- Define some additions for the phrase level tags          -->
>>> <!-- Revisions:                                               -->
>>> <!-- 2002-01-21 : LB add type attribute for consistency       -->
>>> <!-- 2001-12-07 : LB : parameterize for P4                    -->
>>> <!-- 1995-02-17 : CMSMcQ : make file after agreements w/LB    -->

>> The element additions in TEI-Lite appear to be the elements <gi>,
>> <eg>, <code>, <ident> and <kw>. The attribute addition appears to be
>> to allow the 'type' attribute on <lb/>. Not sure what CMSMcQ is, or if
>> it is even relevant to PG/DP usage.

> LB =>> Lou Burnard
> CMSMcQ =>> C. M. Sperberg-McQueen

As noted in my other reply, I certainly misread that DTD comment big
time. The capitalization should have been a clue that they weren't
element names.

I was fortunate to have met C. Michael Sperberg-McQueen (brother of
Roger Sperberg) at an XML Conference in Philadelphia in late 1999. We
chatted for a short while. Brilliant person.


> As near as I can tell, (I speak only pidgin DTD) there is nothing in 
> here about the <lb> element. Instead, the DTD indicates that TEI was 
> extended by the addition of 5 new elements, which presumably are 
> documented elsewhere.

It does seem like four of these elements have been added to P5, with the
fifth, TEI-Lite <kw>, being represented by the P5 <keywords> (not sure
on this, though.)


Jon Noring


From lee at novomail.net  Mon Oct 15 09:18:07 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 15 Oct 2007 10:18:07 -0600
Subject: [gutvol-d] What about P5? (was additions to TEI-Lite)
In-Reply-To: <1936014050.20071015095503@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>	<47113430.7050608@perathoner.de>
	<47126CF0.7080508@bohol.ph>	<4712A3E3.5030204@perathoner.de>
	<102352901.20071014233141@noring.name>	<471351FA.2060100@perathoner.de>	<1811467071.20071015082313@noring.name>	<47138846.5030705@novomail.net>
	<1936014050.20071015095503@noring.name>
Message-ID: <471392BF.10302@novomail.net>

Jon Noring wrote:

> It does seem like four of these elements have been added to P5, with the
> fifth, TEI-Lite <kw>, being represented by the P5 <keywords> (not sure
> on this, though.)

The <keywords> tag already exists in P5, P4 and Lite. It is designed to 
hold a list of keywords, perhaps from some controlled vocabulary. In 
TEI-Lite, the individual keywords can be indicated using the <kw> tag. 
In P5, the individual keywords are indicated either by using the <term> 
tag, or by using a <list> element which is composed of <item>s.


-- 
Nothing of significance below this line.


From piggy at netronome.com  Mon Oct 15 10:26:24 2007
From: piggy at netronome.com (La Monte Henry Piggy Yarroll)
Date: Mon, 15 Oct 2007 13:26:24 -0400
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <470EB4BC.9090201@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>
Message-ID: <4713A2C0.1090004@netronome.com>

Lee Passey wrote:
> [snip]
>
> Yet more grist for the mill:
>
> ...
> Total Docs,Percentage, Total Use,Avg per Doc,   Element Name
>         109,    100.00,      9928,     91.08,    head
>         109,    100.00,      9654,     88.57,    div
>         109,    100.00,       623,      5.72,    divGen
>         109,    100.00,       109,      1.00,    front
>         109,    100.00,       109,      1.00,    body
>         109,    100.00,       109,      1.00,    back
>         108,     99.08,     87341,    808.71,    p
>         106,     97.25,     18293,    172.58,    index
>          95,     87.16,     28950,    304.74,    hi
>          69,     63.30,      5170,     74.93,    lb
>          66,     60.55,      8796,    133.27,    note
>          63,     57.80,       175,      2.78,    then
>          63,     57.80,       175,      2.78,    pgIf
>          63,     57.80,       174,      2.76,    else
>          52,     47.71,      9189,    176.71,    pb
>          49,     44.95,      8731,    178.18,    anchor
>          34,     31.19,       640,     18.82,    figure
>          34,     31.19,       640,     18.82,    figDesc
>          32,     29.36,     11891,    371.59,    l
>          32,     29.36,       899,     28.09,    lg
>          30,     27.52,      3898,    129.93,    milestone
>          22,     20.18,        33,      1.50,    titlePart
>          22,     20.18,        23,      1.05,    docImprint
>          22,     20.18,        22,      1.00,    docTitle
>          22,     20.18,        22,      1.00,    titlePage
>          21,     19.27,        22,      1.05,    docAuthor
>          20,     18.35,       165,      8.25,    quote
>          20,     18.35,        21,      1.05,    byline
>          16,     14.68,      9818,    613.63,    cell
>          16,     14.68,      4167,    260.44,    row
>          16,     14.68,       313,     19.56,    table
>          14,     12.84,      5392,    385.14,    ref
>          14,     12.84,        14,      1.00,    docDate
>          13,     11.93,      1038,     79.85,    item
>          13,     11.93,       177,     13.62,    list
>          11,     10.09,       134,     12.18,    corr
>           8,      7.34,      7161,    895.13,    q
>           7,      6.42,         7,      1.00,    docEdition
>           5,      4.59,       559,    111.80,    emph
>           4,      3.67,       152,     38.00,    title
>           3,      2.75,       494,    164.67,    abbr
>           3,      2.75,       212,     70.67,    foreign
>           3,      2.75,        81,     27.00,    reg
>           3,      2.75,         8,      2.67,    name
>           2,      1.83,       671,    335.50,    formula
>           2,      1.83,        28,     14.00,    sic
>           2,      1.83,        27,     13.50,    bibl
>           2,      1.83,        26,     13.00,    author
>           2,      1.83,         2,      1.00,    epigraph
>           2,      1.83,         2,      1.00,    date
>           2,      1.83,         2,      1.00,    add
>           2,      1.83,         2,      1.00,    trailer
>           1,      0.92,        13,     13.00,    label
>           1,      0.92,         8,      8.00,    argument
>           1,      0.92,         2,      2.00,    del
>           1,      0.92,         1,      1.00,    eg
>           1,      0.92,         1,      1.00,    cit
>   
Fascinating. With all three datasets I see the same rough shape. Simply 
plot the frequency of each element in rank order.

It looks to me like two separate statistical processes operating on the 
same data.

If you look at the first four, p, hi, index and l, you have the start of 
something like an exponential distribution.

 From rank 5 through rank 15, we have something that looks more like a 
Gaussian hump. This encompasses head, cell, div, pb, note, anchor, q, 
ref, lb, row and milestone.

 From rank 16 onward, item, lg, formula, figure, etc... we have a long 
tail which might be the sum of a Gaussian tail and an exponential tail.

What are the two selection processes? I am going to speculate that 
presentational markup is the exponential process and structural markup 
is the Gaussian process.

I would further speculate that we could define a "flatness" metric which 
would be higher for documents using proportionately more structural 
markup than presentational markup.


From lee at novomail.net  Mon Oct 15 10:53:52 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 15 Oct 2007 11:53:52 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <20071013182304.GA5263@ark.in-berlin.de>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
Message-ID: <4713A930.7060103@novomail.net>

Ralf Stephan wrote:

> > The preponderance of the <hi> element (which is almost purely
> > presentational) together with the amount of paragraph abuse leads
> > me to conclude that even those people who are using TEI because it
> > is semantic, not presentational, are still marking up text using a
> > presentational philosophy.
>
>  I don't think so. I'm using <hi> as a first translation of <i> and
>  <b>, with the plan to change the <hi>s later to the special mark up.
>  I'm sure I've read somewhere in the docs an advice to that effect.


I think Josh Hutchinson is the biggest proponent of TEI levels, level 1 
being bare bones and level 3 (or higher) being only for uber-geeks. Mr. 
Perathoner alludes to it his "Guide to PGTEI" 
(http://pgtei.pglaf.org/marcello/0.3/doc/20000-h.html) when he states:

<cite>
You can and should mark up a text incrementally. That is: make more than 
one pass over the whole text and in each pass mark up a subset of elements.

You may start marking only the most prominent text features like 
chapters and paragraphs. Later you make a second pass marking all 
italicized text. If you still want to do more, make another pass 
replacing all quotation marks with the <q> element.

TODO: a PG working group needs to codify different ?levels? of PGTEI markup.
</cite>

However, just because you are following this guidance, doesn't mean that 
the markup isn't presentational; indeed it makes it more likely that it 
/is/ presentational, and not semantic. (Semantic: "of, pertaining to, or 
arising from the different meanings of words or other symbols.")

When you mark a passage with <hi rend="italic"> (or some variation 
thereof) what you are saying is, "I cannot or will not state with 
confidence why this passage was italicized, but it was." When you mark a 
passage with <emph rend="italic"> what you are saying is "this passage 
is emphasized; in the original text it was italicized, but you may 
render it in whatever way you render emphasized text." Saying that 
something was presented in a particular way is presentational markup, 
saying /why/ it was presented in a particular way is semantic markup.

Generally, I support the notion of levels of markup. However, it can 
lead to some unfortunate consequences. In the first place it can lead to 
the loss of data. While the <hi> element can contain an attribute 
indicating how the highlighting was rendered in a particular addition, 
the "type" attribute is not allowed. Therefore, there is no good way to 
indicate "this <hi> differs from that <hi>, and the second needs to be 
revisited to determine its semantic meaning." Neither is there any good 
way to record any hints about why this particular rendering may have 
been used. If you view purely presentational markup as "level one," then 
I would suggest that HTML ought to be level one markup; it contains all 
the power of purely presentational TEI, and is more directly useful.

The second, and perhaps more pernicious, problem with being satisfied 
with low levels of markup is that once such a text is placed in the PG 
database the chances of anyone coming along later to fix the markup 
become vanishingly small. Looking at my data I see that while 87% of the 
texts use the <hi> (presentational) element, only 4.6% use the <emph> 
(semantic) element. What are the chances that anyone is going to go 
through all those texts and convert all the presentational markup to 
semantic markup? And how much harder would it have been for the original 
poster to just use semantic markup in the preparation of the texts in 
the first place? I'm a firm believer in the old adage that it is easier 
to do things right than to do them over. My suggestion is that if you're 
using <hi> in the first pass, with the intent to convert them to 
semantic markup in a subsequent pass, you probably ought to keep them in 
your queue and not pass them on to PG until the upgrade has occurred.

What I am trying to do is come up with an overview of different levels 
of markup complexity which still maintain the unique semantic markup 
which is the hallmark of TEI.

As a more complete example, consider the <p> element. This element is 
supposed to be used for paragraphs, which are a group of one or more 
complete sentences which express a single thought or topic. 
Unfortunately, OCR programs are so far incapable of detecting complete 
sentences to say nothing of single thoughts or topics. So HTML output 
from an OCR program will mark /every/ block with the <p> element; they 
simply can't do any better than that.

So let's say you've just completed an OCR of "Rip Foster Rides the Gray 
Planet." You might find a resultant passage such as:

...
could hear, "I'll bust the bubble of any son of a space sausage who 
laughs!"</p>
<p style="text-align:center">Chapter Two - Rake That Radiation!</p>
<p>The deputy commander and the safety officer got untangled and hurried to
...

It's pretty obvious that the second "paragraph" is not really a 
paragraph, it's the title of the next chapter. If you were to change the 
"style" keyword to "rend" you would have perfectly legal TEI markup -- 
but it would still be purely presentational. In fact, marking the 
chapter title as a <p> may be said to be anti-semantic; by doing so you 
are, in essence, saying, "make this block /look/ like a paragraph, but 
don't give it the /meaning/ of a paragraph."

Now if you wanted to keep the presentational aspects of a paragraph 
without lying about the semantics, you could replace the <p> in the 
title with <ab>, which "contains any arbitrary component-level unit of 
text, ... analogous to, but without the semantic baggage of, a 
paragraph." And if you wanted to take advantage of the PGTEI XSLT 
transformation script to auto-generate a table of contents you could add 
an <index> element. Now you would have:

...
could hear, "I'll bust the bubble of any son of a space sausage who 
laughs!"</p>
<index index="toc" level1="Chapter Two - Rake That Radiation!" />
<ab style="text-align:center">Chapter Two - Rake That Radiation!</ab>
<p>The deputy commander and the safety officer got untangled and hurried to
...

Note: the attributes of the <index> element have changed in P5.

If you look carefully, you will see that this markup is still mostly 
presentational. The only improvement is that now only real paragraphs 
are marked as <p>. Even the use of the <index> element is 
presentational; it says "build me a list referring to this point in the 
text, without any implication as to what the list may signify."

What we really want to do is change the <ab> to <head>, because that's 
what the phrase is: it's the heading on a chapter. Unfortunately, we 
can't just change the elements, because TEI only allows <head> elements 
to appear at the beginning of a textual division, before any paragraphs. 
That's really not a big deal, because we want to identify each chapter 
as its own division of text anyway. So if we add some <div>s, and change 
the <ab> to head we might end up with:

...
could hear, "I'll bust the bubble of any son of a space sausage who 
laughs!"</p>
</div>
<index index="toc" level1="Chapter Two - Rake That Radiation!" />
<div rend="page-break-before: always">
<head style="text-align:center">Chapter Two - Rake That Radiation!</head>
<p>The deputy commander and the safety officer got untangled and hurried to
...

Now we're starting to get some semantic markup in the file. All <p>s are 
paragraphs, chapter headings are marked as headings, and each chapter is 
marked as being a unified division of the text. Unfortunately, we still 
have some presentational cruft hanging around. Maybe we want to keep it 
around for historical purposes, but maybe we can remove it as being 
implied in semantic markup (or even make it explicit in a <rendition> 
element).

Our most recent revision doesn't really specify what the nature of the 
divisions are, so let's start by making that explicit. And if all our 
chapters start on a new page, the "rend" attribute that tells us that 
that is how it was done is superfluous. So let's add a "type" attribute 
to our <div>s, and get rid of the "rend". We now have:

...
could hear, "I'll bust the bubble of any son of a space sausage who 
laughs!"</p>
</div>
<index index="toc" level1="Chapter Two - Rake That Radiation!" />
<div type="chapter">
<head style="text-align:center">Chapter Two - Rake That Radiation!</head>
<p>The deputy commander and the safety officer got untangled and hurried to
...

Now we can see that because all chapter headings are centered, and 
because we now know that the <head> element belongs to a chapter, the 
"rend" attribute on the <head> element is unnecessary. And because we 
know where every chapter begins, and we know how to get the title of 
every chapter, we can build a table of contents if we wanted to without 
the need of an <index> element. The resulting markup is now:

...
could hear, "I'll bust the bubble of any son of a space sausage who 
laughs!"</p>
</div>
<div type="chapter">
<head>Chapter Two - Rake That Radiation!</head>
<p>The deputy commander and the safety officer got untangled and hurried to
...

What we have done here is taken a purely presentational TEI markup and 
transformed it into a purely semantic TEI markup. And yet, the only tags 
we have used are <div>, <head> and <p>. This feels to me like it is 
still "level one;" we have only used basic tags and not very many of 
them at that. What /has/ changed is our mindset in marking up the text.


From jeroen.mailinglist at bohol.ph  Mon Oct 15 11:20:30 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Mon, 15 Oct 2007 20:20:30 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <20071014144027.GA6093@ark.in-berlin.de>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>
	<470FC32B.5060409@bohol.ph>	<20071013182808.GB5263@ark.in-berlin.de>
	<20071014144027.GA6093@ark.in-berlin.de>
Message-ID: <4713AF6E.9090606@bohol.ph>


I typically resolve the case of multiple markers for a footnote by saying:

blah blah<ref target=n123.2 type=noteref>2</ref> blah blah.

My XSLT will pick up this reference (based on its type), and render it
as a link to the footnote in question, similar to the footnote reference
for the footnote, taking care to replace the number with whatever the
number of the footnote in the result has become.

The speaker marks would require some tweaking of the DTD (and processing
scripts)

Jeroen.

Ralf Stephan wrote:
> (replying to myself)
>
>   
>> That's possible in PGTEI 0.4 too. Use <note place="end"> and
>> a divGen "footnote" at the end iof each chapter. However, numbers
>> are no reset, so you'll get marks with hundreds in some books.
>>     
>
> Also, what's not possible is multiple marks for one footnote, and
> a mark inside <sp> or <speaker> for marking the speaker name.
>
> Both of which I would need for my current project.
>
>
> ralf
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>   


From marcello at perathoner.de  Mon Oct 15 11:30:47 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 15 Oct 2007 20:30:47 +0200
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <4713A930.7060103@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
Message-ID: <4713B1D7.7020301@perathoner.de>

Lee Passey wrote:

> I think Josh Hutchinson is the biggest proponent of TEI levels, level 1 
> being bare bones and level 3 (or higher) being only for uber-geeks. Mr. 
> Perathoner alludes to it his "Guide to PGTEI" 
> (http://pgtei.pglaf.org/marcello/0.3/doc/20000-h.html) when he states:
> 
> <cite>
> You can and should mark up a text incrementally. That is: make more than 
> one pass over the whole text and in each pass mark up a subset of elements.
> 
> You may start marking only the most prominent text features like 
> chapters and paragraphs. Later you make a second pass marking all 
> italicized text. If you still want to do more, make another pass 
> replacing all quotation marks with the <q> element.
> 
> TODO: a PG working group needs to codify different ?levels? of PGTEI markup.
> </cite>
> 
> However, just because you are following this guidance, doesn't mean that 
> the markup isn't presentational; indeed it makes it more likely that it 
> /is/ presentational, and not semantic. (Semantic: "of, pertaining to, or 
> arising from the different meanings of words or other symbols.")

Doing a complete text in TEI in one big single pass can take many days.
Doing multiple passes has several advantages:

- Newbies can do the easy stuff
- and experts the hard stuff
- Different experts can work their fields of expertise:
  - teiHeader metadata expert
  - title page formatting expert
  - native speakers of LOTE
  - history expert
- You get more consistent markup because you focus on one aspect only
- You know what to expect from previous passes (use grep)
- Markup levels can be assigned to DP rounds


> While the <hi> element can contain an attribute 
> indicating how the highlighting was rendered in a particular addition, 
> the "type" attribute is not allowed. Therefore, there is no good way to 
> indicate "this <hi> differs from that <hi>, and the second needs to be 
> revisited to determine its semantic meaning." Neither is there any good 
> way to record any hints about why this particular rendering may have 
> been used.

<hi rend="italic"><!-- fixme: emph or foreign? -->Par Dieu!</hi>.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Mon Oct 15 11:37:15 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 15 Oct 2007 14:37:15 EDT
Subject: [gutvol-d] it's good to see the .tei people
Message-ID: <be5.18c1e5c6.34450d5b@aol.com>

it's good to see the .tei people wasting their time trying to figure it out.
i was waiting for a long time for them to start doing that.   i welcome it!

but don't make the mistake of letting them waste _your_ time!            :+)

seriously, light markup will give the big benefits _without_ the big costs.

for the record, those big benefits are (1) a simple transition from o.c.r.
into a plain-text "master format", (2) easy maintenance of that "master",
(3) button-click conversion to other formats by users themselves, and
(4) new functionalities from developers due to a straightforward format.

i mean, jump into the gobbledygook soup if you _like_ that sorta thing,
but if it makes your skin crawl, understand that you do _not_ "need" it...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071015/166407b3/attachment.htm 

From jon at noring.name  Mon Oct 15 11:52:16 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 12:52:16 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <be5.18c1e5c6.34450d5b@aol.com>
References: <be5.18c1e5c6.34450d5b@aol.com>
Message-ID: <6410280365.20071015125216@noring.name>

Bowerbird wrote:

>  i mean, jump into the gobbledygook soup if you _like_ that sorta thing,
>  but if it makes your skin crawl, understand that you do _not_ "need" it...

On what objective basis do you make this claim ("you do not need TEI")?

As I've asked recently, the proof is in the pudding (to use
Bowerbird's own phrase), and the best proof to show that ZML is
sufficient to properly structure PG texts for *mastering* purposes is
to take a representative sample of PG texts (I said 10, but 50 or 100
would be better) and "mark them up" in ZML. Then post for commentary.

It is important the sample include some tough stuff. So let the PG/DP
crowd pick the list of representative texts to convert to ZML.

Once we have a bunch of files, then that puts the burden on the PG and
DPers to focus on them, and tell the group here what is missing in the
ZML renderings for mastering purposes. This should lead to a
discussion of what is important the master needs to do (which has not
yet been put all together into a cogent story), and possibly point out
where ZML could be tweaked to improve it. (Of course, some of us believe
that ZML is not sufficient for mastering purposes, and of course we
eventually have to get down to specifics, just as Bowerbird also needs
to get down to specifics rather than making vague statements that ZML
is "sufficient".)

Nevertheless, as I've always said, so long as there is a need for the
PG collection to include "plain text" versions of the books, ZML is a
good candidate for that since it does normalize the texts. This is not
the same as mastering.

Jon Noring


From lee at novomail.net  Mon Oct 15 14:53:54 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 15 Oct 2007 15:53:54 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <6410280365.20071015125216@noring.name>
References: <be5.18c1e5c6.34450d5b@aol.com>
	<6410280365.20071015125216@noring.name>
Message-ID: <4713E172.3070106@novomail.net>

Jon Noring wrote:

> Nevertheless, as I've always said, so long as there is a need for the
> PG collection to include "plain text" versions of the books, ZML is a
> good candidate for that since it does normalize the texts. 

ZML does not have a mechanism to block indent text, which for me is a 
show-stopper.

-- 
Nothing of significance below this line.


From jon at noring.name  Mon Oct 15 14:55:47 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 15:55:47 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <4713E172.3070106@novomail.net>
References: <be5.18c1e5c6.34450d5b@aol.com>
	<6410280365.20071015125216@noring.name> <4713E172.3070106@novomail.net>
Message-ID: <1877073571.20071015155547@noring.name>

Lee wrote:
> Jon Noring wrote:

>> Nevertheless, as I've always said, so long as there is a need for the
>> PG collection to include "plain text" versions of the books, ZML is a
>> good candidate for that since it does normalize the texts. 

> ZML does not have a mechanism to block indent text, which for me is a 
> show-stopper.

Can you explain that in a little more detail, maybe with an example?
And is this for mastering purposes, or a plain text rendition (not for
mastering)?

Jon


From lee at novomail.net  Mon Oct 15 15:26:39 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 15 Oct 2007 16:26:39 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <1877073571.20071015155547@noring.name>
References: <be5.18c1e5c6.34450d5b@aol.com>	<6410280365.20071015125216@noring.name>
	<4713E172.3070106@novomail.net>
	<1877073571.20071015155547@noring.name>
Message-ID: <4713E91F.1040707@novomail.net>

Jon Noring wrote:
> Lee wrote:
>> Jon Noring wrote:
>>
>>> Nevertheless, as I've always said, so long as there is a need for the
>>> PG collection to include "plain text" versions of the books, ZML is a
>>> good candidate for that since it does normalize the texts. 
>>
>> ZML does not have a mechanism to block indent text, which for me is a 
>> show-stopper.
> 
> Can you explain that in a little more detail, maybe with an example?

Sure. You're looking at it.

Block indentation is when you take a "component-level unit of
text" and indent the entire unit. Typically, this kind of text is a 
lengthy quotation from another source, but it certainly doesn't have to 
be. Me, quoting you, would be block indented in a message. You, quoting 
me, quoting you, should probably be nested indents. Me, quoting you, 
quoting me, quoting you would be yet another nested indent. My e-mail 
program (Thunderbird) indicates each one of these indent levels with an 
angle bracket in the first column. A good user agent will actually 
indent with spaces rather than a visual cue.

In HTML, these block indents are marked with the <blockquote> element. 
So in HTML the above example would be coded as:

Jon Noring wrote:
<blockquote>
   Lee wrote:
   <blockquote>
     Jon Noring wrote:
     <blockquote>
       Nevertheless, as I've always said ...
     </blockquote>
     ZML does not have a ...
   </blockquote>
   Can you explain that in ...
</blockquote>

The important thing is that not only must the text be repeatedly 
indented, when appropriate, the interior text must also be word-wrapped. 
In ZML, any whitespace in the first column means that word-wrapping is 
turned off, so block indentation cannot be accomplished by hand. Some 
markup, such as the angle brackets used by Thunderbird or the 
<blockquote> tag in HTML has to be devised; so far it has not been.

> And is this for mastering purposes, or a plain text rendition (not for
> mastering)?

Because the capability doesn't exist, it doesn't matter. It doesn't 
exist for /either/ purpose.

-- 
Nothing of significance below this line.


From jon at noring.name  Mon Oct 15 18:29:37 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 19:29:37 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <4713E91F.1040707@novomail.net>
References: <be5.18c1e5c6.34450d5b@aol.com>
	<6410280365.20071015125216@noring.name> <4713E172.3070106@novomail.net>
	<1877073571.20071015155547@noring.name> <4713E91F.1040707@novomail.net>
Message-ID: <1272903299.20071015192937@noring.name>

Lee Passey wrote:
> Jon Noring wrote:
>> Lee wrote:

>>> ZML does not have a mechanism to block indent text, which for me is a 
>>> show-stopper.

>> Can you explain that in a little more detail, maybe with an example?

> In ZML, any whitespace in the first column means that word-wrapping is
> turned off, so block indentation cannot be accomplished by hand. Some 
> markup, such as the angle brackets used by Thunderbird or the 
> <blockquote> tag in HTML has to be devised; so far it has not been.

O.k.!

I looked at Bowerbird's online "11 Rules of ZML", and if those are
still the complete set of rules, then I see no means to identify block
quotes which contain prose (like paragraphs), where the user agent
is expected to auto-wrap the block quote content (to differentiate it
from fixed line structures like verse lines in poetry.)

If this is indeed the case (if Bowerbird has not come up with a clever
way to identify block quotes in ZML, or I missed how to do this given
the rules I've looked at), then ZML is truly insufficient as a
mastering format and iffy as a derivative plain text format.

Block quote support is a must for any mastering format. (Note that a
block quote may contain a mix of prose, verse, and other things -- in
essence a block quote can be a standalone ZML document in and of
itself.)

I've cc'd Bowerbird on this, but his supposed spam filter will
probably sieve this one out. I guess a friend of his can forward this
to him. <smile/>

Jon Noring


(p.s., if ZML were to differentiate the purpose between tabs and spaces,
then I can see how this might be done. Use tabs for identifying block
quotes, one tab for the first level, two tabs for the second level
(which will be extremely rare), and so on. And save spaces for no-wrap
line situations. Never use tabs except for this sole purpose of
identifying a block quote and the level it is at.)


From prosfilaes at gmail.com  Mon Oct 15 18:43:20 2007
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 15 Oct 2007 21:43:20 -0400
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <4713A930.7060103@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
Message-ID: <6d99d1fd0710151843l3e2f1ba6wd9e317062f75c363@mail.gmail.com>

On 10/15/07, Lee Passey <lee at novomail.net> wrote:
> Generally, I support the notion of levels of markup. However, it can
> lead to some unfortunate consequences. In the first place it can lead to
> the loss of data.

When transcribing an edition, using hi instead of emph isn't a loss of
data; that data isn't in the text. It's stating clearly what we know.
To convert that to emph may add data, but it also adds uncertainty and
editorial opinion; the data it adds isn't pure and doesn't come from
the text before us.

From jon at noring.name  Mon Oct 15 20:31:12 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 15 Oct 2007 21:31:12 -0600
Subject: [gutvol-d] The TEI 80/20 rule - empirical data
In-Reply-To: <6d99d1fd0710151843l3e2f1ba6wd9e317062f75c363@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<6d99d1fd0710151843l3e2f1ba6wd9e317062f75c363@mail.gmail.com>
Message-ID: <8410580897.20071015213112@noring.name>

David wrote:
> Lee Passey wrote:

>> Generally, I support the notion of levels of markup. However, it can
>> lead to some unfortunate consequences. In the first place it can lead to
>> the loss of data.

> When transcribing an edition, using hi instead of emph isn't a loss of
> data; that data isn't in the text. It's stating clearly what we know.
> To convert that to emph may add data, but it also adds uncertainty and
> editorial opinion; the data it adds isn't pure and doesn't come from
> the text before us.

Definitely when we try to describe the "why" something is emphasized
in the original paper edition, we will certainly sometimes be wrong.
Or, it is just plain difficult to know exactly for sure -- there are
times when trying to fit it into our "standardized" list of elements
and attribute values (which PG/DP should do) may be difficult or
ambiguous. (In some cases two or more apply simultaneously.)

Nevertheless, it is a good thing to do, and I believe accuracy can be
quite high without much thought or effort.

What I think Lee is really saying is that PG/DP should not consider
releasing a TEI document to the public until each and every <hi> is
converted to the PG/DP standardized semantic description (and PG/DP
should standardize on something.) With Lee I agree.

Certainly a 2-3 stage markup process may be used where the easy ones
are first handled by those less experienced, leaving the few tough ones
to the more seasoned veterans. In rare cases a decision may be need to
be made by "committee", and in some cases the "standardized" list may
need to be expanded or tweaked. I expect the need for committee-level
treatment to be pretty rare, but enough that it needs to be planned for.

Nevertheless, it is a *good* thing in the long run to remove all <hi>
and the value of that will also be very instructive for the users of
the TEI masters -- they will notice that and over time *understand*.

Jon Noring


From ralf at ark.in-berlin.de  Tue Oct 16 01:05:28 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Tue, 16 Oct 2007 10:05:28 +0200
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical	data
In-Reply-To: <4713A930.7060103@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
Message-ID: <20071016080528.GA4609@ark.in-berlin.de>

Lee wrote 
> ... Saying that 
> something was presented in a particular way is presentational markup, 
> saying /why/ it was presented in a particular way is semantic markup.

Yes, and that's why a first pass using scripts, the suggested Level 1,
always tends to produce presentational markup. Scripts are stupid,
they cannot handle the 'why' question _if given presentational markup_.

What it looks to me is that semantic markup can't be had cheaply,
can it? Not even when a bit more intelligent than average humans are
involved, as can be clearly supposed being part of DP.

> Generally, I support the notion of levels of markup. However, it can 
> lead to some unfortunate consequences. In the first place it can lead to 
> the loss of data.

Supposing the foofing process in DP didn't already lose it.

> What are the chances that anyone is going to go 
> through all those texts and convert all the presentational markup to 
> semantic markup?

Maybe you should include some (unknown future) AI with 'anyone'.

> And how much harder would it have been for the original 
> poster to just use semantic markup in the preparation of the texts in 
> the first place?

Not the poster, the foofer.

> I'm a firm believer in the old adage that it is easier 
> to do things right than to do them over. My suggestion is that if you're 
> using <hi> in the first pass, with the intent to convert them to 
> semantic markup in a subsequent pass, you probably ought to keep them in 
> your queue and not pass them on to PG until the upgrade has occurred.

Why you would declare the poster to be better suited for that task than
the foofer, you didn't explain.

And, of course it's easier to do it right from the start, but we don't
have an AI just now, do we? So we need to build it up stepwise.

> Unfortunately, OCR programs are so far incapable of detecting complete 
> sentences to say nothing of single thoughts or topics.

I'm with you there.

Let's assume that Google is in the best position to come up with
some AI that can "do better". What it would need would be a corpus
of semantically marked up texts (for a specific language).

Is there any effort outside PG to come up with such a thing?

Why would any sane person mark up some text semantically except
for being a hopeless bibliophile or for

 M O N E Y ?


Just send me your proposals in this respect per eMail,
ralf

From traverso at posso.dm.unipi.it  Tue Oct 16 03:31:05 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Tue, 16 Oct 2007 12:31:05 +0200 (CEST)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule -
	empirical	data
In-Reply-To: <20071016080528.GA4609@ark.in-berlin.de> (message from Ralf
	Stephan on Tue, 16 Oct 2007 10:05:28 +0200)
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
Message-ID: <20071016103105.795EE10225@posso.dm.unipi.it>

>>>>> "Ralf" == Ralf Stephan <ralf at ark.in-berlin.de> writes:


    Ralf> Why would any sane person mark up some text semantically
    Ralf> except for being a hopeless bibliophile or for

    Ralf>  M O N E Y ?

Some kind of semantic markup can make ebooks better accessible, (e.g. a
foreign tag to drive prononciation of automatically reading), this
might be a motivation for some volunteers (I know some projects in DP
in which the PM asks to include <foreing> markup).

(true, DPers might also qualify for "hopeless bibliophile", and some
for "insane").

Carlo


From marcello at perathoner.de  Tue Oct 16 07:15:35 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 16 Oct 2007 16:15:35 +0200
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <6410280365.20071015125216@noring.name>
References: <be5.18c1e5c6.34450d5b@aol.com>
	<6410280365.20071015125216@noring.name>
Message-ID: <4714C787.3060609@perathoner.de>

Jon Noring wrote:

> Nevertheless, as I've always said, so long as there is a need for the
> PG collection to include "plain text" versions of the books, ZML is a
> good candidate for that since it does normalize the texts.

ZML is the worst candidate for that. The design of ZML is fundamentally
flawed.

To format a text in ZML you have to use combinations of characters that
you cannot distinguish from each other on screen (space, tab and newline).

Space and tab look very much the same. (After a word of 7 chars, a space
and a tab look *exactly* the same.) You also need trailing tabs on a
line, which are also invisible.


You clearly see the problem when you read the zml documentation. It has
to use the markup `~tab~' instead of the tab character to be at all
readable. This should have tipped off BB, that an invisible character is
a bad choice for a markup tag.

Moreover, some editors chop off trailing whitespace. You can fuck up
your ZML text simply by loading and saving.

Moreover some editors substitute tabs and spaces without asking.


All in all, using non-printing characters as markup tags must be the
most bone-headed design decision ever.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Tue Oct 16 08:08:47 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 16 Oct 2007 09:08:47 -0600
Subject: [gutvol-d] Establish plain text normalization rules? (was "it's
	good to see the .tei people")
In-Reply-To: <4714C787.3060609@perathoner.de>
References: <be5.18c1e5c6.34450d5b@aol.com>
	<6410280365.20071015125216@noring.name>
	<4714C787.3060609@perathoner.de>
Message-ID: <876403294.20071016090847@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> Nevertheless, as I've always said, so long as there is a need for the
>> PG collection to include "plain text" versions of the books, ZML is a
>> good candidate for that since it does normalize the texts.

> All in all, using non-printing characters as markup tags must be the
> most bone-headed design decision ever.

I agree that using white space characters to communicate document
structure for *machine-processing* (i.e., for mastering purposes) is
a show stopper, for a few reasons.

However, my comments had to do with creating plain text renditions
whose sole purpose is for direct reading and not to be converted to
something else. (The plain text is NOT for machine conversion.)

It is clear that in plain text renditions white space *must be used*
(and is used looking at all PG plain text renditions) to create
document structure for *human-processing*. So plain text editions
can't avoid using white spaces for communicating structure to human
readers (who are intelligent enough to "figure it out.")

Now, as I think about it though, even here, tabs should never be used
because how text editors interpret tabs can vary plus the tabs muck
up the usability of the text when some text is extracted for reuse.
So long as the user *knows* that all the white space there is the
ASCII space character plus the usual EOL stuff, then they will know
how to process it. But when tabs are mixed in with spaces, that is not
good -- it is downright annoying and depending upon the text editor
used can lead to unpredictable results (e.g., in some situations, a
tab character may visually pass for a single space character.) By and
large, the tab character is evil and, in my opinion, should never be
used in plain text renditions of books.

It may be possible Bowerbird can create a tabless ZML, but now things
get tricky since I think he will have to establish rules based on using
a specific number of white space characters plus the use of some of the
other ASCII characters (such as the ">" character which could be used
for block quotes) to communicate structure. But using other ASCII
characters in certain situations actually adds "content characters" to
the content, and that is something that should be avoided. All in all,
Bowerbird is caught between a rock and a hard place to come up with
some plain text normalization rules useful for mastering that do not
break some "thou shalt not do this" rule. To summarize the two I came
up with in this message:

1) Thou shalt not use any white space character other than the
   ASCII space character and EOL characters,

2) Thou shalt not use any non-white space character in the Unicode set
   except when that character is actually used in the textual content
   of the work.

)


*****

So, a question for PG/DP to maybe discuss. Is it important that PG even
establish some sort "normalization rules" for the formatting of plain
text renditions of books solely used for direct reading, or are we past
that now and it doesn't matter any more?

Jon Noring


From lee at novomail.net  Tue Oct 16 08:16:48 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 16 Oct 2007 09:16:48 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <4714C787.3060609@perathoner.de>
References: <be5.18c1e5c6.34450d5b@aol.com>	<6410280365.20071015125216@noring.name>
	<4714C787.3060609@perathoner.de>
Message-ID: <4714D5E0.1050704@novomail.net>

Marcello Perathoner wrote:

> All in all, using non-printing characters as markup tags must be the
> most bone-headed design decision ever.

I don't know, I've seen some pretty bone-headed design decisions in my 
time. I will agree, however, that it's probably in the top ten.

Somebody needs to pass this on to the Distributed Proofreaders so they 
can fix their proofing process in this regard as well.

-- 
Nothing of significance below this line.


From jon at noring.name  Tue Oct 16 08:33:15 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 16 Oct 2007 09:33:15 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <4714D5E0.1050704@novomail.net>
References: <be5.18c1e5c6.34450d5b@aol.com>
	<6410280365.20071015125216@noring.name>
	<4714C787.3060609@perathoner.de> <4714D5E0.1050704@novomail.net>
Message-ID: <26157054.20071016093315@noring.name>

Lee wrote:
> Marcello Perathoner wrote:

>> All in all, using non-printing characters as markup tags must be the
>> most bone-headed design decision ever.

> I don't know, I've seen some pretty bone-headed design decisions in my
> time. I will agree, however, that it's probably in the top ten.
>
> Somebody needs to pass this on to the Distributed Proofreaders so they
> can fix their proofing process in this regard as well.

Agreed. The biggest abuse I've seen is the use of the non-breaking
space character (usually inserted in XML docs using the character
entity "&nbsp;" or its numerical equivalent -- it may also be encoded
at the bit level in UTF-* encoded texts.)

It should *never* be used in either TEI or XHTML in the context PG/DP
uses them (I do recognize in some situations it is a quick fix for web
authoring, but PG/DP should not be doing "quick fixes").

There is *always* a structural or inline semantic reason why (notice
the word "why"?), during visual presentation, one may want to see more
space inserted between chunks of text -- in this case mark it up properly
and use CSS to add the necessary space if really, really, really needed.

Jon Noring


From joshua at hutchinson.net  Tue Oct 16 09:28:09 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Tue, 16 Oct 2007 16:28:09 +0000 (UTC)
Subject: [gutvol-d] it's good to see the .tei people
Message-ID: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>

Ok, time to disagree.

&nbsp; is very useful if you have something that you don't want broken 
up in a word wrap.

For instance, let's say I put my initials in here: J. H.

A word wrapping program could wrap that to J.
H.

Not real nice looking.  

But, if you put in J.&nbsp;H. ...  it'll never get split apart, which 
is where the vast majority of &nbsp; get used in DP texts.

Josh

>----Original Message----
>From: jon at noring.name
>Date: Oct 16, 2007 11:33 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: Re: [gutvol-d] it&#39;s good to see the .tei people
>
>Lee wrote:
>> Marcello Perathoner wrote:
>
>>> All in all, using non-printing characters as markup tags must be 
the
>>> most bone-headed design decision ever.
>
>> I don't know, I've seen some pretty bone-headed design decisions in 
my
>> time. I will agree, however, that it's probably in the top ten.
>>
>> Somebody needs to pass this on to the Distributed Proofreaders so 
they
>> can fix their proofing process in this regard as well.
>
>Agreed. The biggest abuse I've seen is the use of the non-breaking
>space character (usually inserted in XML docs using the character
>entity "&nbsp;" or its numerical equivalent -- it may also be encoded
>at the bit level in UTF-* encoded texts.)
>
>It should *never* be used in either TEI or XHTML in the context PG/DP
>uses them (I do recognize in some situations it is a quick fix for 
web
>authoring, but PG/DP should not be doing "quick fixes").
>
>There is *always* a structural or inline semantic reason why (notice
>the word "why"?), during visual presentation, one may want to see 
more
>space inserted between chunks of text -- in this case mark it up 
properly
>and use CSS to add the necessary space if really, really, really 
needed.
>
>Jon Noring
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From jon at noring.name  Tue Oct 16 09:52:53 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 16 Oct 2007 10:52:53 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
Message-ID: <447877789.20071016105253@noring.name>

Joshua wrote:

> Ok, time to disagree.

Great! :^)


> &nbsp; is very useful if you have something that you don't want broken
> up in a word wrap.
>
> For instance, let's say I put my initials in here: J. H.
>
> A word wrapping program could wrap that to J.
> H.
>
> Not real nice looking.  
>
> But, if you put in J.&nbsp;H. ...  it'll never get split apart, which 
> is where the vast majority of &nbsp; get used in DP texts.

In XHTML (there's a TEI equivalent):

   <span class="keeptogether">J. H.</span>

   <!-- use whatever classname you want, example only -->

In CSS, one may then, if they wish to:

   span.keeptogether {white-space: nowrap}

   (see: http://www.w3.org/TR/CSS21/text.html#propdef-white-space )

(The value of the above is that we now have a better semantic idea
*why* we are doing something. Putting in &nbsp; we have a lesser idea
why, and in some cases, to someone reading the document, may become
confused, or in some situations ambiguous. Also note that we are,
using both techniques, adding presentationally-oriented markup. One
can imagine just leaving it out entirely, and letting conversion
systems hunt down those instances and treat them as desired.)

There are times, such as for extremely limited displays or space,
where forcing nowrapping creates a situation worse than allowing the
J. and H. to be broken on separate lines. After all, if we begin to be
worried about breaking the J. and H., then we have to be equally
"anal" about things like orphans and widows -- now we move into the
realm of typesetting engines and the like... Is this the role of the
master format to be worried about?

Now, granted, I had not thought of this situation, even though I am
aware of it, since in *so many* PG (X)HTML texts I've looked at,
&nbsp; is rampantly being abused, such as for indentation of
paragraphs and verse lines, etc. It's better to see &nbsp; being used
only for keeping words together rather than forcing spacing in visual
presentation (since that is its purpose.) Yet, &nbsp; is still
something I am not fond of using in virtually any circumstance,
especially in that in most instances there is a markup solution, as
illustrated above.

Jon Noring


From lee at novomail.net  Tue Oct 16 09:56:08 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 16 Oct 2007 10:56:08 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <20071016080528.GA4609@ark.in-berlin.de>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>
	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
Message-ID: <4714ED28.6070008@novomail.net>

Ralf Stephan wrote:
> Lee wrote 
>> ... Saying that 
>> something was presented in a particular way is presentational markup, 
>> saying /why/ it was presented in a particular way is semantic markup.
> 
> Yes, and that's why a first pass using scripts, the suggested Level 1,
> always tends to produce presentational markup. Scripts are stupid,
> they cannot handle the 'why' question _if given presentational markup_.

I agree, a first pass /using scripts/ will always produce presentational 
markup. But where is it suggested that a first pass using scripts is 
sufficient to create a Level 1 document? And where is it suggested that 
Level 1 documents are sufficient to "check in" to PG?

TEI is inherently a semantic/structural markup language. When you create 
a TEI file you are making an implicit promise that it contains at least 
a modicum of semantic markup. When you create a TEI file that is purely 
presentational you are breaking that promise.

There are, of course, other markup languages that are much better than 
TEI for carrying presentational information, not the least of which is 
XHTML. It is at least as easy to write an XSLT script to convert XHTML 
to PDF or RTF as it is to write a script to convert TEI. Plus, XHTML is 
directly usable by most User Agent software without conversion.

I would suggest that if one is going to create files that are purely 
presentational then XHTML is a better choice. When the time comes to add 
semantic markup XHTML can easily first be converted to TEI.

> What it looks to me is that semantic markup can't be had cheaply,
> can it? Not even when a bit more intelligent than average humans are
> involved, as can be clearly supposed being part of DP.

I disagree. I think that a significant amount of semantic markup /can/ 
be had cheaply, particularly when humans are involved as in DP. 
Consider, for example, the DP proofing rules regarding the beginning of 
chapters. The last time I looked, the DP rules were:

<cit>
Put 4 blank lines before the "CHAPTER XXX".... Then leave one blank line 
between each additional part of the chapter header, such as a chapter 
description, opening quote, etc., and finally leave two blank lines 
before the start of the text of the chapter.
<xptr>http://www.pgdp.net/c/faq/document.php#chap_head</xptr>
</cit>

Overlooking the fact that it is a boneheaded design to use non-printing 
characters as markup tags, how is the existing rule any easier for 
people to use that a rule such as:

<ab>
Begin each chapter with '<div type="chapter">.' Chapter headers should 
begin with <head> and end with </head>, and should appear immediately 
after the "<div>" line, e.g.:

<div type="chapter">
<head>CHAPTER XXX</head>
</ab>?

>> Generally, I support the notion of levels of markup. However, it can 
>> lead to some unfortunate consequences. In the first place it can lead to 
>> the loss of data.
> 
> Supposing the foofing process in DP didn't already lose it.

Well, that can be a problem. But I'm trying to establish some parameters 
for the use of TEI, regardless of whether or not DP is involved in the 
process.

>> What are the chances that anyone is going to go 
>> through all those texts and convert all the presentational markup to 
>> semantic markup?
> 
> Maybe you should include some (unknown future) AI with 'anyone'.

Oh, I have. I know that computer scientists have been researching the 
problems of natural language processing for more than four decades now, 
and while great strides have been made, I still don't think that we will 
have in my lifetime, or in my children's lifetimes, an AI that can read 
a page and say "that phrase is italicized because it is emphasized."

I think that true TEI texts are valuable today, and that means that some 
human intervention will be required. If we are going to wait for some 
hypothetical AI in the future, we would be better off spending our time 
preserving books as paper artifacts rather than trying to convert them 
to /any/ electronic format.

>> And how much harder would it have been for the original 
>> poster to just use semantic markup in the preparation of the texts in 
>> the first place?
> 
> Not the poster, the foofer.

The person or persons who caused a TEI file to be prepared and added to 
the PG database. Use whatever term you like for him/her/them.

>> I'm a firm believer in the old adage that it is easier 
>> to do things right than to do them over. My suggestion is that if you're 
>> using <hi> in the first pass, with the intent to convert them to 
>> semantic markup in a subsequent pass, you probably ought to keep them in 
>> your queue and not pass them on to PG until the upgrade has occurred.
> 
> Why you would declare the poster to be better suited for that task than
> the foofer, you didn't explain.
> 
> And, of course it's easier to do it right from the start, but we don't
> have an AI just now, do we? So we need to build it up stepwise.

Yes. And the first step /requires/ human input, and those humans need to 
be taught to think in terms of semantics not presentation. That human 
input may be an individual who cares deeply about a single work and is 
capable of carrying the process from beginning to end (see 
http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml). 
That human input may be a more or less formal organization where some 
individuals are responsible for scanning the books, others are 
responsible for proofreading the content, and others are responsible for 
assembling the completed work.

I don't think that Level 1 texts need to be /complete/ semantic markup; 
but I do think they ought to be semantic markup. Mr. Perathoner has 
suggested a hack (in the most positive sense of the word) whereby XML 
comments could be included in a TEI file to explain why a purely 
presentational markup was used instead of the expected semantic markup. 
I think that even Level 1 texts should have these kinds of comments 
whenever any kind of non-semantic element (e.g. <ab>, <hi>, <seg>, etc.) 
is used, explaining why a semantic element could not be chosen.

>> Unfortunately, OCR programs are so far incapable of detecting complete 
>> sentences to say nothing of single thoughts or topics.
> 
> I'm with you there.
> 
> Let's assume that Google is in the best position to come up with
> some AI that can "do better". What it would need would be a corpus
> of semantically marked up texts (for a specific language).
> 
> Is there any effort outside PG to come up with such a thing?

Yes, in Colleges and Universities around the world. Natural language 
processing is a hot topic, and there is almost always some research 
going on in the area (my brother-in-law did his master's thesis on the 
topic).

> Why would any sane person mark up some text semantically except
> for being a hopeless bibliophile or for
> 
>  M O N E Y ?

Because some people are altruists and believe that what they are doing 
is for the good of mankind. I suspect that this was Michael Hart's 
original motivation and is the motivation of virtually everyone who 
participates in Distributed Proofreaders. And unlike Michael Hart, I 
believe that if those volunteers are given instructions and guidelines 
as to how to produce a better work product they would happily accept and 
adopt those guidelines.

-- 
Nothing of significance below this line.


From jon at noring.name  Tue Oct 16 10:26:45 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 16 Oct 2007 11:26:45 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <4714ED28.6070008@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net>
Message-ID: <177426060.20071016112645@noring.name>

Lee Passey wrote:

> [snip of a lot of excellent insights]
>
> Oh, I have. I know that computer scientists have been researching the 
> problems of natural language processing for more than four decades now,
> and while great strides have been made, I still don't think that we will
> have in my lifetime, or in my children's lifetimes, an AI that can read
> a page and say "that phrase is italicized because it is emphasized."

I've always said that when we have AI at the level of a Commander Data in
Star Trek, then we can turn over all our text digitization completely to
machines, at least to do it properly and completely and perfectly, the
way we know it needs to be done.

Such AI has to essentially be "sentient-level", and has to learn
language as a human and understand human nature and social systems as
a human, and must especially understand the language and culture
associated with a particular text being transcribed and structured.

When will that happen? Nobody really knows, but to be a little bit off
topic here, I think we are closer to this than many might think, but it
will be based on understanding how the human brain really works and
building machines to mimic that (I believe, but it is only a belief,
that true intelligence is only possible as a result of quantum
effects, so quantum computing now under development may be a component
of this AI revolution... Back in the 90's I had some fascinating
private talks with some of the quantum physicists at LLNL on this very
topic. Of course, a few quantum physicists believe all solutions to all
problems are based on applying quantum mechanics to them! A spooky lot
these quantum physicists are. LOL)


> I think that true TEI texts are valuable today, and that means that some
> human intervention will be required. If we are going to wait for some 
> hypothetical AI in the future, we would be better off spending our time
> preserving books as paper artifacts rather than trying to convert them
> to /any/ electronic format.

Well said!


> Because some people are altruists and believe that what they are doing
> is for the good of mankind. I suspect that this was Michael Hart's 
> original motivation and is the motivation of virtually everyone who 
> participates in Distributed Proofreaders. And unlike Michael Hart, I 
> believe that if those volunteers are given instructions and guidelines
> as to how to produce a better work product they would happily accept and
> adopt those guidelines.

Yes, agreed on this, too.

In many ways, PG's laxness grew as a result of Michael's personality
which is very individualistic ("nobody tells me what to do"). And that's
fine -- we need those people in the world.

Personally, I think PG would have gotten *more* people involved had it
had a little more structure and stricter guidelines from the start
because most people who are altruistic are also those who need and
gladly accept guidance, and see the value in doing things right: "give
me a 'to-do' check list, and I'll follow it exactly." For example, I
decided not get involved back in 1994 with PG at the text production
level for the reason that PG was too lax in a number of important ways.

It is unfortunate that it is the self-motivated, individualistic types
who often get projects started, and the idea of providing a strict
check-list of guidelines is something that is alien to them -- they
assume everyone else is just like them.

Jon Noring


From vze3rknp at verizon.net  Tue Oct 16 11:25:11 2007
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Tue, 16 Oct 2007 14:25:11 -0400
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <447877789.20071016105253@noring.name>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
	<447877789.20071016105253@noring.name>
Message-ID: <47150207.9010203@verizon.net>

Jon Noring wrote:
> Joshua wrote:
>
>   
>> Ok, time to disagree.
>>     
>
> Great! :^)
>
>   
>> &nbsp; is very useful if you have something that you don't want broken
>> up in a word wrap.
>>
>> For instance, let's say I put my initials in here: J. H.
>>
>> A word wrapping program could wrap that to J.
>> H.
>>
>> Not real nice looking.  
>>
>> But, if you put in J.&nbsp;H. ...  it'll never get split apart, which 
>> is where the vast majority of &nbsp; get used in DP texts.
>>     
>
> In XHTML (there's a TEI equivalent):
>
>    <span class="keeptogether">J. H.</span>
>
>    <!-- use whatever classname you want, example only -->
>
> In CSS, one may then, if they wish to:
>
>    span.keeptogether {white-space: nowrap}
>
>    (see: http://www.w3.org/TR/CSS21/text.html#propdef-white-space )
Other places that might well get non-breaking spaces are abbreviations 
such as i.e. or e.g. Here I've rendered them without a space, but they 
are clearly spaced in some books and we have ongoing debates about how 
to handle them. We first standardized on always closing up the space (so 
that they wouldn't rewrap with if line break fell between the letters) 
but that proved confusing. (What are initials? What aren't? Which 
abbreviation should be closed up? How do we know? etc, etc) We've now 
moved to transcribing as it is printed in the book. Leave a space if one 
appears, don't if one doesn't. Note that these rules also cover initials 
as in Josh's example. Also, please, please note that this is how the 
proofers and formatters transcribe things in the rounds. What the 
post-processor does is different. He/she can space the initials with no 
non-breaking space, add the non-breaking space, or choose not to space 
the initials at all. Our only requirement is that whatever method is 
chosen be used consistently throughout that book.

Still another place where non-breaking spaces can come up is in simple, 
in-line equations. x + y = z for example.

And one more significant use for the non-breaking space is in indenting 
poetry. We have agreed that indentation in poetry is part of the 
author's intent and that we need to preserve it.  I gather that there 
aren't easy ways of doing this in xhmtl and thus that the non-breaking 
spaces are often used for that purpose.

I'm not saying that the way our post-processors use non-breaking spaces 
is always correct. Just pointing out some more places where they arise.

JulietS


From vze3rknp at verizon.net  Tue Oct 16 11:59:18 2007
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Tue, 16 Oct 2007 14:59:18 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <4714ED28.6070008@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net>
Message-ID: <47150A06.1010207@verizon.net>


Lee Passey wrote:
> I disagree. I think that a significant amount of semantic markup /can/ 
> be had cheaply, particularly when humans are involved as in DP. 
> Consider, for example, the DP proofing rules regarding the beginning of 
> chapters. The last time I looked, the DP rules were:
>
> <cit>
> Put 4 blank lines before the "CHAPTER XXX".... Then leave one blank line 
> between each additional part of the chapter header, such as a chapter 
> description, opening quote, etc., and finally leave two blank lines 
> before the start of the text of the chapter.
> <xptr>http://www.pgdp.net/c/faq/document.php#chap_head</xptr>
> </cit>
>
> Overlooking the fact that it is a boneheaded design to use non-printing 
> characters as markup tags, how is the existing rule any easier for 
> people to use that a rule such as:
>
> <ab>
> Begin each chapter with '<div type="chapter">.' Chapter headers should 
> begin with <head> and end with </head>, and should appear immediately 
> after the "<div>" line, e.g.:
>
> <div type="chapter">
> <head>CHAPTER XXX</head>
> </ab>?
Please remember that there is a big difference between what we do in the 
formatting rounds and what gets produced by the post-processors. What we 
ask our formatters to do has to be quickly learned, easily remembered, 
and easily typed. By these criteria, the blank line rules work quite 
well. It would probably work as well to use some markup that says 
"chapter" with an opening and closing tag, but for historical reasons, 
we don't. Nonetheless, the post-processing software can find the 
4-stuff-2 spacing and convert that automatically into whatever chapter 
heading markup is appropriate. Similarly with the 2 before, 1 after line 
spacing for sections of a chapter. The post-processor will have to look 
at each of these and determine that they really are chapters, sections, 
subsections, etc. but hopefully the worst of the grunt work will have 
been done.

Just out of curiosity, how would you markup a two line chapter heading? 
Something like

CHAPTER 1
Missy Goes to Space

Would you just assume that both lines are part of the title, that that 
is all that matters semantically and ignore the fact that the whole 
thing was printed on two lines? That would seem to be a strictly 
semantic approach. But what happens if someone decides that rendering 
chapters with the chapter number first and then the title on another 
line would look better. I understand that this is covered by the 
transform that converts the semantic information into something 
presentational. But how can that transform offer the option of two line 
chapter headers if the information about where to break the line has 
been lost?

Or, in a more complicated case, how would you handle the article header 
seen at
http://www.pgdp.net/c/tools/project_manager/displayimage.php?project=projectID40faf2a2aaaff&imagefile=226.png 
<http://www.pgdp.net/c/tools/project_manager/displayimage.php?project=projectID40faf2a2aaaff&imagefile=226.png>
I'm assuming that you'd use some combination of Chapter Title, Chapter 
Subtitle, Author, InfoReAuthor, and maybe something to indicate the 
opening quote as being different from most other quotes (or not). That 
makes perfect sense to me. What concerns me, however, is whether the 
currently existing PGTEI transforms that would render this would make 
something that looks decent and that makes sense of the information to 
the reader. Not that the exact typography be reproduced, but so that the 
reader can tell at a glance what's what and how the various lines of 
information relate to each other.

Which brings me to my final point. I believe that much of the reluctance 
among the DP post-processors about using PGTEI is not because it is 
semantic markup, but because they don't trust the current system of 
rendering the semantic information to produce something that look 
acceptable. Jon, this is quite different from wanting to exactly 
reproduce the typography. And Marcello, it's all very well to say that 
anyone can write their own XSLT (or style sheets or whatever the thing 
is called that converts the TEI to html, plain text, etc) but the facts 
are that the DP volunteers don't have the skills to do it. I'd say that 
for many of the books we produce, it isn't terribly important exactly 
how chapter titles are rendered, for example, but it is important that 
the result look nice. I know that Josh said he would see about producing 
something that makes a better looking title page. That is the sort of 
change that will be necessary for DP to adopt TEI for at least the 
simple projects.

JulietS


From joshua at hutchinson.net  Tue Oct 16 12:41:15 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Tue, 16 Oct 2007 19:41:15 +0000 (UTC)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
Message-ID: <10536393.1192563675046.JavaMail.?@fh1064.dia.cp.net>


>----Original Message----
>From: vze3rknp at verizon.net
>
>Just out of curiosity, how would you markup a two line chapter 
heading? 
>Something like
>
>CHAPTER 1
>Missy Goes to Space
>

Here is how I've handled it in the past (which may or may not be the 
"best" way):

<div>
<index index="toc" level1="Missy Goes to Space" />
<index index="pdf" level1="Missy Goes to Space" />
<head>CHAPTER 1 - Missy Goes to Space</head>

<p>...</p>
</div>

If you wanted it on two lines, you could "force" a line break with a 
<lb /> after the Chapter 1, I suppose.  

>
>Or, in a more complicated case, how would you handle the article 
header 
>seen at
>http://www.pgdp.net/c/tools/project_manager/displayimage.php?
project=projectID40faf2a2aaaff&imagefile=226.png 
><http://www.pgdp.net/c/tools/project_manager/displayimage.php?
project=projectID40faf2a2aaaff&imagefile=226.png>

Here is a first pass attempt.  My apologies in advance if I type 
something stupid:

<div>
<index index="toc" />
<index index="pdf" />
<head rend="text-align: center">VI - AMERICAN BUSINESS IN THE 
WAR</head>
<head type="sub" rend="text-align: center">Voluntary Cooperation of 
Experts and Loyal Support of Labor Put Our Industries on a War 
Basis</head>

<p rend="text-align: center">By Grosvenor B. Clarkson</p>

<p rend="text-align: center">Director of the U.S. Council of National 
Defense and of Its Advisory Commission</p>

<p>...</p>
</div>

You could add semantic tags such as marking Clarkson as a name, but 
that isn't necessary.

>
>Which brings me to my final point. I believe that much of the 
reluctance 
>among the DP post-processors about using PGTEI is not because it is 
>semantic markup, but because they don't trust the current system of 
>rendering the semantic information to produce something that look 
>acceptable. 
>

I agree and add one more major stumbling block ... lack of 
tools/scripts.  The HTML crowd has a lot of existing tools to help go 
from DP output to HTML final product.  TEI does not (and I wish I had 
the time and ability to create such tools!).

Josh

From Bowerbird at aol.com  Tue Oct 16 13:00:36 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 16 Oct 2007 16:00:36 EDT
Subject: [gutvol-d] it's good to see the .tei people
Message-ID: <ceb.1e171e0b.34467264@aol.com>

wow, i see jon and lee have dragged people into another t.e.i. black-hole.

oh well, i'm working on my .pdf converter right now, so i can't be bothered.

***

juliet said:
>   What the post-processor does is different. 
>    He/she can space the initials with no non-breaking space, 
>    add the non-breaking space, or choose not to space the initials at all. 
>    Our only requirement is that whatever method is chosen 
>    be used consistently throughout that book.

and that "requirement" means individual _books_ are "uniform",
but that the _library_ is _inconsistent_.   oh well, i'll deal with it...
it would be nice, though, if other people thought of the library
as a whole, as developers cannot build on an inconsistent base.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071016/d9999386/attachment.htm 

From marcello at perathoner.de  Tue Oct 16 14:29:52 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 16 Oct 2007 23:29:52 +0200
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <47150A06.1010207@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>
Message-ID: <47152D50.3000205@perathoner.de>

Juliet Sutherland wrote:

> Just out of curiosity, how would you markup a two line chapter heading? 
> Something like
> 
> CHAPTER 1
> Missy Goes to Space

First do it sematically:

<div>
  <index type="toc" level1="Chapter 1: Missy goes to space" />
  <head>CHAPTER 1</head>
  <head type="sub">Missy Goes to Space</head>

Then add presentational stuff:

<div>
  <index type="toc" level1="Chapter 1: Missy goes to space" />
  <head rend="font-size: 200%; text-align: center">CHAPTER 1</head>
  <head type="sub" rend="font-size: 80%; text-align: center">Missy Goes
to Space</head>


> Or, in a more complicated case, how would you handle the article header 
> seen at
> http://www.pgdp.net/c/tools/project_manager/displayimage.php?project=projectID40faf2a2aaaff&imagefile=226.png 

First do it semantically:

<group>
  <text>
    <front>
      <titlePage>
        <docTitle>
          <titlePart>VI - American Business in the War</titlePart>
	  <titlePart>Voluntary Co?peration ... Basis</titlePart>
        </docTitle>
        <byline>
          By <docAuthor>Grosvenor B. Clarkson</docAuthor><lb/>
          Director of ... Commission
        </byline>
        <epigraph>
          <cit>
            <q>Modern wars are not won by ... force.</q>
            <bibl>&mdash;Woodrow Wilson.</bibl>
          </cit>
        </epigraph>
      </titlePage>
    </front>
    <body>
      <p>War today means ...

Then add the presentational stuff:

<index index="toc" level1="VI - American Business in the War" />
<docTitle>
  <titlePart rend="display: block; text-align: center; font-size: 200%;
text-transform: uppercase"
            >VI - American Business in the War</titlePart>
  <titlePart rend="display: block; text-align: center; font-size: 150%"
            >Voluntary Co?peration ... Basis</titlePart>


The rest is left as exercise to the reader.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From prosfilaes at gmail.com  Tue Oct 16 15:43:05 2007
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 16 Oct 2007 18:43:05 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <47152D50.3000205@perathoner.de>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47152D50.3000205@perathoner.de>
Message-ID: <6d99d1fd0710161543n3fe18eb1wccadceb312a14b03@mail.gmail.com>

On 10/16/07, Marcello Perathoner <marcello at perathoner.de> wrote:
> Juliet Sutherland wrote:
>
> > Just out of curiosity, how would you markup a two line chapter heading?
> > Something like
> >
> > CHAPTER 1
> > Missy Goes to Space
>
> First do it sematically:
>
> <div>
>   <index type="toc" level1="Chapter 1: Missy goes to space" />
>   <head>CHAPTER 1</head>
>   <head type="sub">Missy Goes to Space</head>
>
> Then add presentational stuff:
>
> <div>
>   <index type="toc" level1="Chapter 1: Missy goes to space" />
>   <head rend="font-size: 200%; text-align: center">CHAPTER 1</head>
>   <head type="sub" rend="font-size: 80%; text-align: center">Missy Goes
> to Space</head>

That's at least double the work of HTML. The whole promise of TEI is
that we shouldn't have to add the presentational stuff to make it come
out right. The TEI has to produce the equivalent of that
presentational stuff in the HTML and PDF editions to make this whole
thing worth our time.

From lee at novomail.net  Tue Oct 16 15:46:04 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 16 Oct 2007 16:46:04 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <47150A06.1010207@verizon.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<47013DAF.5090400@bohol.ph>	<47056722.9000801@novomail.net>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>
Message-ID: <47153F2C.8060308@novomail.net>

Juliet Sutherland wrote:

[snip]

> Please remember that there is a big difference between what we do in the 
> formatting rounds and what gets produced by the post-processors. What we 
> ask our formatters to do has to be quickly learned, easily remembered, 
> and easily typed. By these criteria, the blank line rules work quite 
> well. 

Not for me. The requirement that I put a cursor in a spot on the screen 
and then count keystrokes as I move it is quite annoying, and one of the 
things that keeps me from doing more work at DP. And it's really hard to 
just look at a gap on the screen and know whether it's 3 lines, or 4, or 
5. If the markup is explicit (i.e., doesn't rely on non-printing 
characters) I can see in a glance if the markup is correct.

> It would probably work as well to use some markup that says 
> "chapter" with an opening and closing tag, but for historical reasons, 
> we don't.

Then perhaps it's time to re-examine the process, and change it if it 
makes sense. "Because that's the way we've always done it" is about the 
/worst/ reason I can imagine to justify anything.

[snip]

> Just out of curiosity, how would you markup a two line chapter heading? 
> Something like
> 
> CHAPTER 1
> Missy Goes to Space

<div type="chapter">
<head>CHAPTER 1</head>
<head>Missy Goes to Space</head>

My recollection is that TEI allows as many <head> elements as you want 
just so long as they all come before other elements.

Or...

<div type="chapter">
<head>CHAPTER 1 <lb/>Missy Goes to Space</head>

This is saying, "there is a single header semantically, which was broken 
into two lines in the original." (According to the TEI specification, 
the <lb/> element should be inserted at the /beginning/ of the new line, 
not the end of the old one. I added a space so it would look acceptable 
if you were viewing the file natively in a software User Agent that 
didn't have good support for CSS).

> Would you just assume that both lines are part of the title, that that 
> is all that matters semantically and ignore the fact that the whole 
> thing was printed on two lines? That would seem to be a strictly 
> semantic approach. But what happens if someone decides that rendering 
> chapters with the chapter number first and then the title on another 
> line would look better. I understand that this is covered by the 
> transform that converts the semantic information into something 
> presentational. But how can that transform offer the option of two line 
> chapter headers if the information about where to break the line has 
> been lost?
> 
> Or, in a more complicated case, how would you handle the article header 
> seen at
> http://www.pgdp.net/c/tools/project_manager/displayimage.php?project=projectID40faf2a2aaaff&imagefile=226.png 

Assuming that what you have posted is a chapter, off the top of my head 
(I may have violated some picky TEI DTD requirement):

<div type="chapter">
   <head>VI&mdash;AMERICAN BUSINESS IN THE WAR</head>
   <head type="sub">Voluntary Cooperation of Experts and Loyal Support 
of Labor Put Our Industries on a War Basis</head>
   <byline>By <docAuthor>Grosvenor B. Clarkson</docAuthor>
   <lb />Director of the U.S. Council of National Defense and of Its 
Advisory Commission</byline>
   <epigraph>
     <cit>
       <p>Modern wars are not won by mere numbers. They are not won by 
mere enthusiasm. They are not won by mere national spirit. They are won 
by the scientific conduct of war, the scientific application of 
irresistible force.</p>
       <bibl><author>Woodrow Wilson</author></bibl>
     </cit>
   </epigraph>
   <p>War today means that for every man on the fighting line...
</div>

> I'm assuming that you'd use some combination of Chapter Title, Chapter 
> Subtitle, Author, InfoReAuthor, and maybe something to indicate the 
> opening quote as being different from most other quotes (or not). That 
> makes perfect sense to me. What concerns me, however, is whether the 
> currently existing PGTEI transforms that would render this would make 
> something that looks decent and that makes sense of the information to 
> the reader. Not that the exact typography be reproduced, but so that the 
> reader can tell at a glance what's what and how the various lines of 
> information relate to each other.

I don't know how the PGTEI XSL script would handle this. My own tei2html 
program rendered this fragment as:

<DIV class="tei-div chapter">
   <H3 class="tei-head">VI&mdash;AMERICAN BUSINESS IN THE WAR</H3>
   <H3 class="tei-head sub">Voluntary Cooperation of Experts and Loyal 
Support of Labor Put Our Industries on a War Basis</H3>
   <H1 class="tei-byline">By <SPAN class="tei-docAuthor">Grosvenor B. 
Clarkson</SPAN> <BR class="tei-lb" />Director of the U.S. Council of 
National Defense and of Its Advisory Commission</H1>
   <DIV class="tei-epigraph">
     <blockquote class="tei-cit">
       <P>Modern wars are not won by mere numbers. They are not won by 
mere enthusiasm. They are not won by mere national spirit. They are won 
by the scientific conduct of war, the scientific application of 
irresistible force.</P>
       <span class="tei-bibl"><SPAN class="tei-author">Woodrow 
Wilson</SPAN></span>
     </blockquote>
   </DIV>
   <P>War today means that for every man on the fighting line... </P>
</DIV>

You have to have a CSS which sets off the epigraph, and right aligns the 
  "tei-bibl", but I think it looks pretty good (I'll e-mail you a screen 
shot if you'd like.

> Which brings me to my final point. I believe that much of the reluctance 
> among the DP post-processors about using PGTEI is not because it is 
> semantic markup, but because they don't trust the current system of 
> rendering the semantic information to produce something that look 
> acceptable.

I think by your wording you have hit the nail on the head: "they don't 
trust..." It really is a matter of trust. Because I understand how XML 
transformations can occur, and how CSS works, and even the XSL scripting 
language a little, I have complete confidence that semantic TEI markup 
is capable of preserving all the data necessary to re-render the text in 
a way that is aesthetically pleasing to me (I refuse to make any 
judgments as to whether or not it is aesthetically pleasing to anyone 
else). Even if I thought the HTML output from Mr. Perathoner's XSLT 
scripts sucks (which I do) it wouldn't make any difference to me because 
all the data is still there in the master file.

Those volunteers who are contemplating using TEI markup will have to 
learn to trust that if they do so there will be tools, either existing 
or in the future, that will create output that doesn't suck (I believe 
it is possible to create a CSS file that will let people see good output 
from the TEI files directly in a web browser). I understand that this 
can be a hard thing to do, and for many people the only way they will 
develop this level of trust is to actually see it. But I would encourage 
everyone to take the leap of faith.

[snip]

-- 
Nothing of significance below this line.


From prosfilaes at gmail.com  Tue Oct 16 15:49:50 2007
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 16 Oct 2007 18:49:50 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <4714ED28.6070008@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47013DAF.5090400@bohol.ph> <47056722.9000801@novomail.net>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net>
Message-ID: <6d99d1fd0710161549m9fd7ee7v887ba662dca0fe66@mail.gmail.com>

On 10/16/07, Lee Passey <lee at novomail.net> wrote:
> TEI is inherently a semantic/structural markup language. When you create
> a TEI file you are making an implicit promise that it contains at least
> a modicum of semantic markup. When you create a TEI file that is purely
> presentational you are breaking that promise.

The hi tag is in TEI, so by using it you aren't breaking any promises.

> I would suggest that if one is going to create files that are purely
> presentational then XHTML is a better choice.

Whatever that means. XHTML is distinctly lacking in several regards
for the type of documents I think we should be creating. There's lots
of things TEI does better than XHTML, like sidenotes and footnotes and
page numbers and chapter markings, things that require little to no
editorial interpretation to figure out.

> Overlooking the fact that it is a boneheaded design to use non-printing
> characters as markup tags, how is the existing rule any easier for
> people to use that a rule such as:

We've found that the more complex the markup, the more likely it is to
get mistyped. The chapter markup is probably on the way out some time,
but never for raw TEI.

> I think that true TEI texts are valuable today,

If a lot of people found them valuable enough, they'd be common. As it
is, plain text and HTML and PDF etexts are common; TEI etexts are
rare. They're rare because it's hard to make them, and people don't
find the additional markup worth it. You could make so many etexts in
TEI and make your collection so important that people will choose to
work in TEI to work with you; but if you decide to choose to get
people working on an existing project to change, then you have to
understand that they don't think it's valuable right now.

> If we are going to wait for some
> hypothetical AI in the future, we would be better off spending our time
> preserving books as paper artifacts rather than trying to convert them
> to /any/ electronic format.

Plain text and HTML have enough value to do today; the users have
shown that. That doesn't mean your changes have enough value to do
today.

> I think that even Level 1 texts should have these kinds of comments
> whenever any kind of non-semantic element (e.g. <ab>, <hi>, <seg>, etc.)
> is used, explaining why a semantic element could not be chosen.

I think that's a waste of time. These elements are perfectly good TEI.

From prosfilaes at gmail.com  Tue Oct 16 16:11:02 2007
From: prosfilaes at gmail.com (David Starner)
Date: Tue, 16 Oct 2007 19:11:02 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <47153F2C.8060308@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
Message-ID: <6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>

On 10/16/07, Lee Passey <lee at novomail.net> wrote:
> If the markup is explicit (i.e., doesn't rely on non-printing
> characters) I can see in a glance if the markup is correct.

Can you honestly see in a glance if

<div type="chapter">
    <head>VI&emdash;AMERICAN BUSINESS IN THE WAR</head>
    <head type="sub">Voluntary Cooperation of Experts and Loyal Support
 of Labor Put Our Industries on a War Basis</head>
    <byline>By <docAuthor>Grosvenor B. Clarkson</docAuthor>
    <lb/>Director of the U.S. Council of National Defense and of Its
 Advisory Commission</byline>
    <epigraph>
      <cit>
        <p>Modern wars are not won by mere numbers. They are not won by
 mere enthusiasm. They are not won by mere national spirit. They are won
 by the scientific conduct of war, the scientific application of
 irresistible force.</p>
        <bibl><author>Woodrow Wilson</author></bibl>
      </cit>
    </epigraph>
    <p>War today means that for every man on the fighting line...
 </div>

is correct? Noisy markup is not easy to verify.

> "Because that's the way we've always done it" is about the
> /worst/ reason I can imagine to justify anything.

Every time we change markup at DP, we get a long period where people
get confused about what the right way to do things is. There's some
definite frustration at DP every so often about how proofers and
formatters are constantly having to learn new rules. Not changing
things unless there's a definite large benefit is a good reason.

If you're familiar with computers at all, there's a huge history of
that. Why do AMD-64s boot up in 8-bit mode? Because every chip that
has decided that "that's the way we've always done it" was a bad
reason to do things and looked to compete with the Intel x86 line has
failed. Why does UTF-8--an ASCII compatible encoding for
Unicode--exist? It wasn't part of the original design of Unicode,
which specified 16 (or 32) bit characters only; but Unix programmers
weren't about to give up dealing with ASCII bytes. I'm sure Unicode
could have fought that, but I don't think that would have been a good
move for them.

> I think by your wording you have hit the nail on the head: "they don't
> trust..." It really is a matter of trust.

There's an old Arabic saying: "Trust in Allah; but tie up your camel".
I've seen obscure formats for text come and go; we want to see it
working.

> I have complete confidence that semantic TEI markup
> is capable of preserving all the data necessary to re-render the text in
> a way that is aesthetically pleasing to me

The original scans preserve all this data. We need to see it in practice.

> Those volunteers who are contemplating using TEI markup will have to
> learn to trust that if they do so there will be tools, either existing
> or in the future, that will create output that doesn't suck

I don't see why they should, and I don't think they will. We want to
know that our work is usable today, and we don't want to worry about
such tools not appearing.

> But I would encourage
> everyone to take the leap of faith.

And I would encourage everyone not to dedicate work into a text format
until the tools do the things they want.

From lee at novomail.net  Tue Oct 16 21:24:40 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 16 Oct 2007 22:24:40 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
Message-ID: <47158E88.8080404@novomail.net>

David Starner wrote:
> On 10/16/07, Lee Passey <lee at novomail.net> wrote:
>   
>> If the markup is explicit (i.e., doesn't rely on non-printing
>> characters) I can see in a glance if the markup is correct.
>>     
>
> Can you honestly see in a glance if
>   

[snip]

> is correct? Noisy markup is not easy to verify.
>   

Well, yes, I can see it easily in a glance. Although I will admit it 
would probably take your average DP volunteer two or three tries to rise 
to that level of proficiency.

However, I believe you have mis-stated the proposition, which was that

<div type="chapter">
<head>CHAPTER VII</head>
<head>THE MERRY LITTLE BREEZES HELP LIGHTFOOT</head>

is easier to validate at a glance than is:


CHAPTER VII


THE MERRY LITTLE BREEZES HELP
LIGHTFOOT


Could you have seen the hunter...

Or how about:


 VI--AMERICAN BUSINESS IN THE WAR
 Voluntary Cooperation of Experts and Loyal Support
 of Labor Put Our Industries on a War Basis
 By Grosvenor B. Clarkson
 Director of the U.S. Council of National Defense and of Its
 Advisory Commission


Modern wars are not won by mere numbers. They are not won by mere enthusiasm. They are not won by mere national spirit. They are won by the scientific conduct of war, the scientific application of irresistible force.

Woodrow Wilson

War today means that for every man on the fighting line...

Did you see the errors in the foregoing?

"Noisy" markup may be hard for humans to validate (although it is trivial for automation to validate), but ambiguous, invisible, and subtle markup is even harder.


>> "Because that's the way we've always done it" is about the
>> /worst/ reason I can imagine to justify anything.
>>     
>
> Every time we change markup at DP, we get a long period where people
> get confused about what the right way to do things is. There's some
> definite frustration at DP every so often about how proofers and
> formatters are constantly having to learn new rules. Not changing
> things unless there's a definite large benefit is a good reason.
>   

Well, not changing things unless the benefit outweighs the cost is a 
good reason. But you're changing the argument from "Because that's the 
way we've always done it" to "the DP volunteers have such a hard time 
adapting to new processes that making any change at all is too 
disruptive to our work." That's definitely a valid argument, I just 
don't believe it.

[snip]

> Why does UTF-8--an ASCII compatible encoding for
> Unicode--exist? It wasn't part of the original design of Unicode,
> which specified 16 (or 32) bit characters only; but Unix programmers
> weren't about to give up dealing with ASCII bytes. I'm sure Unicode
> could have fought that, but I don't think that would have been a good
> move for them.
>   

The 'C' standard libraries contain a number of routines designed to 
manipulate strings, which were defined as a series of 7 bit characters 
terminated with the "null" (0) character. If programmers would have 
started using strings where each character was 16 bits (2 bytes, UCS-2 
encoding) the standard 'C' libraries could not have been used, because 
the strings would have contained embedded zeros. Even now, many 
databases cannot store double-byte strings except as Binary Large 
OBjects (BLOBs). UTF-8 was developed because it enabled programmers to 
store Unicode characters in strings, without encountering embedded 
nulls, and while continuing to be able to use a large corpus of code 
written initially for 7 bit characters.

UTF-8 exists not because unix programmers were unwilling to change, but 
because it enabled the continued use of a large body of existing code 
and programs.

[snip]

>> I have complete confidence that semantic TEI markup
>> is capable of preserving all the data necessary to re-render the text in
>> a way that is aesthetically pleasing to me
>>     
>
> The original scans preserve all this data. We need to see it in practice.
>   

True, the original scans preserve all the data, if they are complete and 
of sufficient quality, but at a huge cost in usability. I simply cannot 
envision reading by looking at image scans on my 2.8 inch PDA screen. 
TEI encoding can preserve all the same data, but with an exponential 
increase in usability.

>> Those volunteers who are contemplating using TEI markup will have to
>> learn to trust that if they do so there will be tools, either existing
>> or in the future, that will create output that doesn't suck
>>     
>
> I don't see why they should, and I don't think they will. We want to
> know that our work is usable today, and we don't want to worry about
> such tools not appearing.
>   

TEI (or a similar markup) is the future. I just don't want to have to 
redo your work next year, because what you did today is inadequate.


From marcello at perathoner.de  Wed Oct 17 02:22:27 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 17 Oct 2007 11:22:27 +0200
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <6d99d1fd0710161543n3fe18eb1wccadceb312a14b03@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>
	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>	<47152D50.3000205@perathoner.de>
	<6d99d1fd0710161543n3fe18eb1wccadceb312a14b03@mail.gmail.com>
Message-ID: <4715D453.2020907@perathoner.de>

David Starner wrote:

>> <div>
>>   <index type="toc" level1="Chapter 1: Missy goes to space" />
>>   <head rend="font-size: 200%; text-align: center">CHAPTER 1</head>
>>   <head type="sub" rend="font-size: 80%; text-align: center">Missy Goes
>> to Space</head>
> 
> That's at least double the work of HTML.

No, that's not. And it gets you 3 user formats in one markup run.

BTW, the code above was written to answer Juliets question. In a
production environment you would use a PGTEI stylesheet, so you would
only write those formatting rules down once for all heads and subheads.


> The whole promise of TEI is
> that we shouldn't have to add the presentational stuff to make it come
> out right.

Definitely! The TEI converter should send tiny experimental signals down
the neural pathways to the visual and spacial cognition centers of the
PPers brain to see what kind of visual formatting is most likely to
please this one particular PPer.

The next version of PGTEI will support the "Sirius Cybernetics USB 2.0
Synaptic Scanner". Until then, you can work around using stylesheets and
the "rend" attribute, just like you are accustomed to do in HTML.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From joshua at hutchinson.net  Wed Oct 17 05:50:29 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Wed, 17 Oct 2007 12:50:29 +0000 (UTC)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
Message-ID: <3466526.1192625429600.JavaMail.?@fh1064.dia.cp.net>

Ok, all sarcastic commentary aside ... the default render for <head> 
and <head type="sub"> gets the job done.

ie.

<head>CHAPTER 1</head>
<head type="sub">Missy Goes to Space</head>

Renders with a large "CHAPTER 1" and a slightly smaller "Missy Goes to 
Space".  The only rend attribute necessary is a centering attribute 
(the same thing you'd need to add in an HTML document because <h1> 
isn't automatically centered).  Now, just like HTML, you can center 
each <head> individually (<head rend="text-align: center">) or you can 
put a line in the stylesheet section at the beginning telling it to 
center every <head> element.

Honestly, for 99% of the stuff we see, the TEI code is no more complex 
than the HTML code equivalent.  It's just that it is DIFFERENT from the 
HTML code equivalent and therefore needs different tools/scripts if you 
want to automate any part of it.

Marcello has said, repeatedly, he's not interested nor has the time to 
write such tools and scripts.  I've admitted I don't have the ability.  
Lee sounds like he has the ability, but perhaps not the time or 
inclination.  If someone was willing to step up and start creating a 
tool/regex scripts/perl scripts/whatever, I'd be happy to work with 
them and I'm positive so would the couple of other people active in 
trying to work with TEI (Ralf and others from DP).

Right now, the arguments are going to be fairly useless and circular 
in nature simply because we don't have the tools to take the process to 
the next level.  And that is the main reason I haven't been actively 
stumping for TEI in quite a while (just answering questions as I see 
them).

Josh

>----Original Message----
>From: marcello at perathoner.de
>Date: Oct 17, 2007 5:22 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: Re: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - 
empirical data
>
>David Starner wrote:
>
>>> <div>
>>>   <index type="toc" level1="Chapter 1: Missy goes to space" />
>>>   <head rend="font-size: 200%; text-align: center">CHAPTER 
1</head>
>>>   <head type="sub" rend="font-size: 80%; text-align: center">Missy 
Goes
>>> to Space</head>
>> 
>> That's at least double the work of HTML.
>
>No, that's not. And it gets you 3 user formats in one markup run.
>
>BTW, the code above was written to answer Juliets question. In a
>production environment you would use a PGTEI stylesheet, so you would
>only write those formatting rules down once for all heads and 
subheads.
>
>
>> The whole promise of TEI is
>> that we shouldn't have to add the presentational stuff to make it 
come
>> out right.
>
>Definitely! The TEI converter should send tiny experimental signals 
down
>the neural pathways to the visual and spacial cognition centers of 
the
>PPers brain to see what kind of visual formatting is most likely to
>please this one particular PPer.
>
>The next version of PGTEI will support the "Sirius Cybernetics USB 
2.0
>Synaptic Scanner". Until then, you can work around using stylesheets 
and
>the "rend" attribute, just like you are accustomed to do in HTML.
>
>
>
>-- 
>Marcello Perathoner
>webmaster at gutenberg.org
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From piggy at netronome.com  Wed Oct 17 06:14:45 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Wed, 17 Oct 2007 09:14:45 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <47158E88.8080404@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>	<200710051727.42671.rolsch@verizon.net>	<470E5720.6030207@netronome.com>	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>	<47150A06.1010207@verizon.net>	<47153F2C.8060308@novomail.net>	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net>
Message-ID: <47160AC5.2060007@netronome.com>

Lee Passey wrote:
> David Starner wrote:
>   
>> On 10/16/07, Lee Passey <lee at novomail.net> wrote:
>>   
>>     
>> ...
>>> "Because that's the way we've always done it" is about the
>>> /worst/ reason I can imagine to justify anything.
>>>     
>>>       
>> Every time we change markup at DP, we get a long period where people
>> get confused about what the right way to do things is. There's some
>> definite frustration at DP every so often about how proofers and
>> formatters are constantly having to learn new rules. Not changing
>> things unless there's a definite large benefit is a good reason.
>>
>>     
>
> Well, not changing things unless the benefit outweighs the cost is a 
> good reason. But you're changing the argument from "Because that's the 
> way we've always done it" to "the DP volunteers have such a hard time 
> adapting to new processes that making any change at all is too 
> disruptive to our work." That's definitely a valid argument, I just 
> don't believe it.
>   

Experiments at PGDP are pretty easy to conduct. Could we hear from 
someone who has run a book through PGDP asking the F* rounds to use 
PGTEI instead of normal PGDP markup?

I have some easy novels similar to novels already in PG which I am 
willing to make available to someone willing to run such an experiment. 
Can we even get enough interest from formatters to complete one such book?

>>> I have complete confidence that semantic TEI markup
>>> is capable of preserving all the data necessary to re-render the text in
>>> a way that is aesthetically pleasing to me
>>>     
>>>       
>> The original scans preserve all this data. We need to see it in practice.
>>   
>>     
>
> True, the original scans preserve all the data, if they are complete and 
> of sufficient quality, but at a huge cost in usability. I simply cannot 
> envision reading by looking at image scans on my 2.8 inch PDA screen. 
> TEI encoding can preserve all the same data, but with an exponential 
> increase in usability.
>   
I tend to agree that TEI is more easily manipulated than even very good 
page scans, but I take issue with the claim that it can preserve all the 
same data.

I have a large collection of high-resolution color and grayscale scans 
of blank paper. I'm more than happy to provide one or two to anyone who 
would like to attempt TEI markup of the paper properties I'm interested in.

TEI is great, but we need to preserve high-grade scans too.
>   
>>> Those volunteers who are contemplating using TEI markup will have to
>>> learn to trust that if they do so there will be tools, either existing
>>> or in the future, that will create output that doesn't suck
>>>     
>>>       
>> I don't see why they should, and I don't think they will. We want to
>> know that our work is usable today, and we don't want to worry about
>> such tools not appearing.
>>   
>>     
>
> TEI (or a similar markup) is the future. I just don't want to have to 
> redo your work next year, because what you did today is inadequate.
>   
If I'm researching the metrical structure of Shakespearean sonnets, I 
might START with PG TEI, but the TEI markup I'm looking for is most 
likely missing or not very reliable. No TEI text will be sufficient for 
every potential user.

I think we should be able to make a good case for TEI without making 
unnecessarily broad claims. Whatever we do today WILL be inadequate for 
some future user. Let's do what we can to give them a good starting point.


From prosfilaes at gmail.com  Wed Oct 17 07:04:17 2007
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 17 Oct 2007 10:04:17 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <47158E88.8080404@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net>
Message-ID: <6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>

On 10/17/07, Lee Passey <lee at novomail.net> wrote:
> TEI (or a similar markup) is the future.

No one knows what's in the future. From my vantage point, I've seen a
lot of movement towards HTML, with all those devices that supposedly
need specialized handling getting more and more artful at dealing with
the ever-present HTML.

> I just don't want to have to
> redo your work next year, because what you did today is inadequate.

If you have the time to go around redoing inadequate work, you mind if
I upload my scans of the Grammar of the Lau Language to you? Between
the ASCIIifaction and the loss of graphics, it's been on my list for
redoing for a long time. Thanks.

From lee at novomail.net  Wed Oct 17 08:53:59 2007
From: lee at novomail.net (Lee Passey)
Date: Wed, 17 Oct 2007 09:53:59 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>	<47153F2C.8060308@novomail.net>	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>	<47158E88.8080404@novomail.net>
	<6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
Message-ID: <47163017.5090605@novomail.net>

David Starner wrote:
> 
> If you have the time to go around redoing inadequate work, you mind if
> I upload my scans of the Grammar of the Lau Language to you? Between
> the ASCIIifaction and the loss of graphics, it's been on my list for
> redoing for a long time. Thanks.

Ordinarily I'm open to these kind of suggestions, but there are a few 
extra considerations.

Most importantly, I want to get the biggest bang for my buck. There are, 
no doubt, hundreds, if not thousands, of works in the early PG corpus 
that not only need to be redone, but which are quite popular. While 
<title>The Grammar of the Lau Language</title> is no doubt of interest 
to you, it seems to be a rather esoteric work, devoid of interest to the 
public at large.

Mr. Perathoner claims that the most popular work at PG is Jane Austen's 
<title>Pride and Prejudice</title>. He has offered no evidence to 
support this claim, but it doesn't seem unlikely to me. So after I 
complete the two conversions I have in my queue right now, and if I 
can't find a reputable TEI version of <title>Pride and Prejudice</title> 
that will probably be my next project.

It would be really great if we could figure out a way to gather download 
statistics from PG over the past 4-5 years, so we could get a better 
handle on just what works /are/ of the greatest interest to the general 
public, and then focus our effort on re-doing those works. The Gutenberg 
web site only lists the most popular downloads in the past 30 days, but 
I note that the Internet Archive's Wayback machine has archived these 
pages since Sept. 2004, so I might be able to write a tool that can 
aggregate these pages.

I thought I had read in the PG faq that PG is not really interested in 
archiving multiple editions of the same work. After all, Project 
Gutenberg is, in point of fact, an e-publisher that publishes its own 
editions. So I don't think that PG would be open to archiving different 
editions of the same work.

I also think that Mr. Hutchinson is right when he says that while the 
submission of a degraded text version of any particular work is not a 
<foreign>de jure</foreign> it /is/ a <foreign>de facto</foreign> rule. I 
have absolutely no interest in creating degraded text versions of any of 
the works I transcribe, nor do I have any interest in encouraging other 
people to do so.

So, in all likelihood, I will not submit to Project Gutenberg any 
versions of its most popular downloads that I have redone. These will be 
submitted to the Internet Archive instead.

If anyone would like to join me in my efforts, I would be glad to have 
the help.

-- 
Nothing of significance below this line.


From jon at noring.name  Wed Oct 17 09:13:31 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 10:13:31 -0600
Subject: [gutvol-d] A proposed list of common understandings on the TEI
	mastering threads
In-Reply-To: <47160AC5.2060007@netronome.com>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<200710051727.42671.rolsch@verizon.net>
	<470E5720.6030207@netronome.com> <470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net> <47160AC5.2060007@netronome.com>
Message-ID: <121591611.20071017101331@noring.name>

La Monte H.P. Yarroll wrote:

> I think we should be able to make a good case for TEI without making 
> unnecessarily broad claims. Whatever we do today WILL be inadequate
> for some future user. Let's do what we can to give them a good
> starting point.

This is an excellent comment.

The common understandings we have in this set of TEI-related threads
are the following:

1) Each text project will use a known source book, and the final
   digitized text, in whatever form, will be "accurate" to that source
   book, and will include metadata referencing that source book.
   (Note that in this statement "accurate" remains undefined.)

2) Each text project will always make available the source book
   scanset in (at least) sufficient quality for OCR, human proofing,
   verifying text accuracy by end-users, and discerning the original
   typography. (I believe every scanset should be archival quality
   but this is an issue not germane to this particular discussion.)

3) Each text project will produce a "digital master" from which all
   user renditions, and other types of uses, will be derived.

4) The "digital master" will be an XML document marked up with some
   "flavor" of TEI.

[Note: There may be a couple other common understandings that I've not
included in the above list, and certainly mention them if you think of
them. But I think this is a good starting point of where I believe
most of us participating in these threads agree with. However, if we
don't have super-majority agreement on the above four items, then the
gap in views is wider than I suspected, and I doubt we can ever get to
any agreement at all on the specifics of implementation if we can't
even agree on the general principles. I'll assume in the comments
below that we have collective majority agreement on the above general
understandings.]


What is obvious, though, is that these common understandings are not
of sufficient completeness that the specifics of implementation become
crystal clear -- they just don't fall out. So that's the reason for
our discussions, to clarify each understanding and maybe also add to
the list. And this is proving difficult because we tend to fall into
different "camps" as Josh, and then Lee, so eloquently explained.

Alright, now to provide maybe a little more on the above from my
perspective...

Obviously, a dream we all have is that the "master" will have all
that is needed to allow push-button auto-conversion, using today's
technology, for *all conceivable renditions and uses* we can ever
imagine. But the reality is that this is unreasonable and probably
impossible. I think we do agree on this.

Thus, I see a "master" as a sort of intermediary which captures the
most important information common to all conceivable uses, and maybe
with some added support for some select uses. Thus the "master"
becomes, as La Monte says, a good starting point.

I believe, then, that we come to some agreement about what minimum
information the master should capture, and what form the information
will take in the master. This is where we disagree on the specifics.

[As an aside, hopefully we can work towards some general set of
requirements to aid in decision making -- this is what any competent
engineering organization does in project development. But we haven't
yet taken an objective requirements approach to this, and if we don't,
then agreement will never be reached. Rather it will become a
Darwinian race between various factions to see whose views prevail,
and contrary to what others may otherwise think, oftentimes such races
do not lead to the best long-term result for the common vision we all
share. The best long-term result might happen, but then it is more
likely not -- it depends upon the views of those pushing their
particular solution.]

Obviously, I think we can agree that if some information is pretty
trivial (effort-wise) to add to the master during mastering which is
useful to certain *recognized* end-uses, and such information does
not inhibit other important uses, then it makes sense to add it to
the master. We might, too, consider which end-uses are the most
important and make sure we have enough information to allow for full
auto-conversion for those uses, or get it very close to that level.

So that may be a discussion thread: what are the most important user
renditions/uses we need to support above all others?

(The one clinker in this last consideration has less to do with
typography and more to do with "accuracy" -- error correction. I
believe the master must, for a couple reasons I've noted previously,
faithfully preserve the original text in the source book, including
author's, publisher's, and typesetter's errors. And, yes, sometimes
decisions have to be made about what exactly to transcribe to be
"accurate" to the original. But this does not preclude marking up, in
the "master", corrections to such errors. Conversion systems for
some end-use purpose can then decide whether to use the "original text
warts and all", or use the corrections, or a particular set of
corrections since we should, I believe, allow for different sets of
corrections based on different perspectives or end-uses.)

Anyway, I could go on, but I'll mercifully end this message. <laugh/>

Thoughts? Additions to the general agreements list?

Jon 


From traverso at posso.dm.unipi.it  Wed Oct 17 09:13:52 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Wed, 17 Oct 2007 18:13:52 +0200 (CEST)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <47163017.5090605@novomail.net> (message from Lee Passey on Wed, 
	17 Oct 2007 09:53:59 -0600)
References: <20071001081923.GA29575@ark.in-berlin.de>	<470EB4BC.9090201@novomail.net>	<20071013182304.GA5263@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>	<47153F2C.8060308@novomail.net>	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>	<47158E88.8080404@novomail.net>
	<6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
	<47163017.5090605@novomail.net>
Message-ID: <20071017161352.5A02410231@posso.dm.unipi.it>

>>>>> "Lee" == Lee Passey <lee at novomail.net> writes:

    Lee> I thought I had read in the PG faq that PG is not really
    Lee> interested in archiving multiple editions of the same
    Lee> work. After all, Project Gutenberg is, in point of fact, an
    Lee> e-publisher that publishes its own editions. So I don't think
    Lee> that PG would be open to archiving different editions of the
    Lee> same work.

I don't know if this is written in the FAQ, but practically it is
absolutely false. PG has many examples of multiple editions,
(transcriptions of different editions marginally different), and many
are added regularly. It has also a few multiple transcriptions of the
same edition, but this has stopped: new transcriptions are used to
produce a merged new edition. A new transcription is never discarded.

Carlo Traverso

From jon at noring.name  Wed Oct 17 09:25:23 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 10:25:23 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<470EB4BC.9090201@novomail.net>
	<20071013182304.GA5263@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net>
	<6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
Message-ID: <1388279505.20071017102523@noring.name>

David Starner wrote:

> No one knows what's in the future. From my vantage point, I've seen a
> lot of movement towards HTML, with all those devices that supposedly
> need specialized handling getting more and more artful at dealing with
> the ever-present HTML.

Well, David's comment seems to make the assumption that the end-user
version is the same as the "master" version. Lee's comments that led
to this assume a "master" from which other renditions, like XHTML, are
derived.

Is the concept of an intermediary "master" from which all end-user
renditions are derived something that we have not yet come to a
collective agreement?

Jon Noring


From prosfilaes at gmail.com  Wed Oct 17 09:35:55 2007
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 17 Oct 2007 12:35:55 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <1388279505.20071017102523@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net>
	<6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
	<1388279505.20071017102523@noring.name>
Message-ID: <6d99d1fd0710170935i76b58b9en3aa954c579785371@mail.gmail.com>

On 10/17/07, Jon Noring <jon at noring.name> wrote:
> David Starner wrote:
>
> > No one knows what's in the future. From my vantage point, I've seen a
> > lot of movement towards HTML, with all those devices that supposedly
> > need specialized handling getting more and more artful at dealing with
> > the ever-present HTML.
>
> Well, David's comment seems to make the assumption that the end-user
> version is the same as the "master" version.

It doesn't make that assumption. Today, HTML is the most common master
format for ebooks, at least the non-commercial kind. And partially
because of that, most ebook readers read HTML, either directly, or by
converting it into a different end-user version. Frankly, I don't see
HTML being unseated as the primary master ebook format by TEI; people
are going to use HTML because it's easy to create, and everyone can
read it directly. The question for me is whether TEI is going to be a
major format for PG.

From joshua at hutchinson.net  Wed Oct 17 09:36:29 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Wed, 17 Oct 2007 16:36:29 +0000 (UTC)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule -
	empirical	data
Message-ID: <14178793.1192638989699.JavaMail.?@fh1064.dia.cp.net>

The problem is that "markup nuts" (and I use the term affectionately), 
like you (Jon) and Lee, view a master document on its own, assuming 
that conversion to end user formats is possible and nothing to be 
worried about.

Folks like David, judge a master document by the end-user format that 
results from it.  So the HTML and PDF that outputs from a TEI master 
are the yardstick by which TEI is being measured.

Both are a valid yardstick, but they make the arguments back and forth 
seem to have a conceptual gap between them.

Josh

>----Original Message----
>From: jon at noring.name
>Date: Oct 17, 2007 12:25 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: Re: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - 
empirical	data
>
>David Starner wrote:
>
>> No one knows what's in the future. From my vantage point, I've seen 
a
>> lot of movement towards HTML, with all those devices that 
supposedly
>> need specialized handling getting more and more artful at dealing 
with
>> the ever-present HTML.
>
>Well, David's comment seems to make the assumption that the end-user
>version is the same as the "master" version. Lee's comments that led
>to this assume a "master" from which other renditions, like XHTML, 
are
>derived.
>
>Is the concept of an intermediary "master" from which all end-user
>renditions are derived something that we have not yet come to a
>collective agreement?
>
>Jon Noring
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From prosfilaes at gmail.com  Wed Oct 17 09:42:55 2007
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 17 Oct 2007 12:42:55 -0400
Subject: [gutvol-d] A proposed list of common understandings on the TEI
	mastering threads
In-Reply-To: <121591611.20071017101331@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net> <47160AC5.2060007@netronome.com>
	<121591611.20071017101331@noring.name>
Message-ID: <6d99d1fd0710170942k28c03fb1q6cb55f7936b92ca6@mail.gmail.com>

On 10/17/07, Jon Noring <jon at noring.name> wrote:
> I
> believe the master must, for a couple reasons I've noted previously,
> faithfully preserve the original text in the source book, including
> author's, publisher's, and typesetter's errors.

Eh. There are some volumes for which this is important. But if you
really want to go through  French and Oriental Love in a Harem
(http://www.gutenberg.org/etext/21868) and pick out all the times
where the typesetter forgot which way u's go or ran out of b's and
started using h's, go ahead.

From prosfilaes at gmail.com  Wed Oct 17 09:51:38 2007
From: prosfilaes at gmail.com (David Starner)
Date: Wed, 17 Oct 2007 12:51:38 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <47163017.5090605@novomail.net>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net>
	<6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
	<47163017.5090605@novomail.net>
Message-ID: <6d99d1fd0710170951r14a2a14r61db24125666032a@mail.gmail.com>

On 10/17/07, Lee Passey <lee at novomail.net> wrote:
> Mr. Perathoner claims that the most popular work at PG is Jane Austen's
> <title>Pride and Prejudice</title>. He has offered no evidence to
> support this claim, but it doesn't seem unlikely to me. So after I
> complete the two conversions I have in my queue right now, and if I
> can't find a reputable TEI version of <title>Pride and Prejudice</title>
> that will probably be my next project.

But is there any evidence that the PG edition of Pride and Prejudice
is lacking anything? It seems very political to go after the most
highly visible material, rather than working on stuff that most needs
doing.

> So, in all likelihood, I will not submit to Project Gutenberg any
> versions of its most popular downloads that I have redone. These will be
> submitted to the Internet Archive instead.

Then why are you posting here? This is a list of PG volunteers to
discuss PG. If you want to discuss ebooks unrelated to PG, the
bookpeople list is quite open to that.

> If anyone would like to join me in my efforts, I would be glad to have
> the help.

It's generally considered tacky to try and drag away volunteers from a
competing project on their mailing lists.

From piggy at netronome.com  Wed Oct 17 10:08:51 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Wed, 17 Oct 2007 13:08:51 -0400
Subject: [gutvol-d] A proposed list of common understandings on the TEI
 mastering threads
In-Reply-To: <6d99d1fd0710170942k28c03fb1q6cb55f7936b92ca6@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>	<4713A930.7060103@novomail.net>	<20071016080528.GA4609@ark.in-berlin.de>	<4714ED28.6070008@novomail.net>
	<47150A06.1010207@verizon.net>	<47153F2C.8060308@novomail.net>	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>	<47158E88.8080404@novomail.net>
	<47160AC5.2060007@netronome.com>	<121591611.20071017101331@noring.name>
	<6d99d1fd0710170942k28c03fb1q6cb55f7936b92ca6@mail.gmail.com>
Message-ID: <471641A3.2010704@netronome.com>

David Starner wrote:
> On 10/17/07, Jon Noring <jon at noring.name> wrote:
>   
>> I
>> believe the master must, for a couple reasons I've noted previously,
>> faithfully preserve the original text in the source book, including
>> author's, publisher's, and typesetter's errors.
>>     
>
> Eh. There are some volumes for which this is important. But if you
> really want to go through  French and Oriental Love in a Harem
> (http://www.gutenberg.org/etext/21868) and pick out all the times
> where the typesetter forgot which way u's go or ran out of b's and
> started using h's, go ahead.
>   

David brings out two very nice points:

1) Not every work deserves the same amount of effort.
2) In a volunteer project, each volunteer gets to decide what deserves 
their effort.

I'm experimenting with TEI because I have been frustrated with getting 
both HTML and text editions that are consistent with each other and make 
me feel good about having produced them.


From jon at noring.name  Wed Oct 17 10:21:48 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 11:21:48 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <6d99d1fd0710170951r14a2a14r61db24125666032a@mail.gmail.com>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net>
	<6d99d1fd0710170704w3b2ee680k87e8e99da324d2a@mail.gmail.com>
	<47163017.5090605@novomail.net>
	<6d99d1fd0710170951r14a2a14r61db24125666032a@mail.gmail.com>
Message-ID: <1839225867.20071017112148@noring.name>

David Starner wrote:
> Lee Passey wrote:

>> Mr. Perathoner claims that the most popular work at PG is Jane Austen's
>> <title>Pride and Prejudice</title>. He has offered no evidence to
>> support this claim, but it doesn't seem unlikely to me. So after I
>> complete the two conversions I have in my queue right now, and if I
>> can't find a reputable TEI version of <title>Pride and Prejudice</title>
>> that will probably be my next project.

> But is there any evidence that the PG edition of Pride and Prejudice
> is lacking anything? It seems very political to go after the most
> highly visible material, rather than working on stuff that most needs
> doing.

Well, without rehashing the *many* reasons, the most popular works of
the Public Domain in the PG corpus need to be redone from scratch. The
PG versions, particularly the pre-DP ones, are wholly unsatisfactory
for one or more reasons.

And regarding "what most needs doing", DP is working at a fast clip,
so I don't have any worries that Lee will somehow slow DP down in any
way. He won't. And those who work outside of DP will continue to do
what they will continue to do.


>> So, in all likelihood, I will not submit to Project Gutenberg any
>> versions of its most popular downloads that I have redone. These will be
>> submitted to the Internet Archive instead.

> Then why are you posting here? This is a list of PG volunteers to
> discuss PG. If you want to discuss ebooks unrelated to PG, the
> bookpeople list is quite open to that.

Well, maybe Lee is finding who here might wish to join him.  :^)

After all, the goal of PG is to digitize the Public Domain books, and
make them freely available to the world. I know Michael is *supportive*
of other projects that do the same. He himself has said it. So Lee's
comment is very much appropriate to gutvol-d.

Now if Greg and Michael think differently than I wrote above, then I
hope they chime in since they are the ones who actually administer
this group.


>> If anyone would like to join me in my efforts, I would be glad to have
>> the help.

> It's generally considered tacky to try and drag away volunteers from a
> competing project on their mailing lists.

See my comment above. What Lee proposes is not competitive -- it is in
the spirit of PG as envisioned by Michael Hart. So, by definition, it
cannot take away volunteers from PG. If anything, Lee's proposal will
add volunteers to this goal.

Btw, count me in as joining Lee in his effort. Anyone else here
interested in helping Lee out, contact him directly -- or contact me
if you can't get a hold of him.

Jon Noring


From jon at noring.name  Wed Oct 17 10:38:39 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 11:38:39 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <14178793.1192638989699.JavaMail.?@fh1064.dia.cp.net>
References: <14178793.1192638989699.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <751740422.20071017113839@noring.name>

Josh wrote:

> The problem is that "markup nuts" (and I use the term affectionately),
> like you (Jon) and Lee, view a master document on its own, assuming 
> that conversion to end user formats is possible and nothing to be 
> worried about.

LOL.

To the contrary, we consider the conversion to end-user renditions to
be critical, but we are looking at the even bigger picture of ebook
formats, various platforms from 2" screens and larger, etc. (I think
many of the "beautiful" XHTML editions being produced at DP will not
do well on such platforms.) And also the needs of the accessibility
community.

Nevertheless, I don't think the gap between the two camps is as large
as some may think it to be since the arguments have more to do with
details rather than general philosophy. I also suspect for 80% to 90%
of the books, there are not that many issues that lead to wild
disagreements regarding markup.

As another example, Juliet mentioned about some poetry where the verse
lines are indented various lengths in the original, and that is
important information that needs to be preserved since the author had
some reason for variable indentation. So I see value for some renditions
needing to reproduce that, and the "rend" attribute seems sufficient to
communicate that information. But note that for some end-user
renditions, such as for 2" screens, you *don't* want to force much if
any indentation of the verse because the verse will become unreadable.
And what about text-to-speech for the blind? I don't think Lee and I
are saying we will dump certain typographic information, but that it
must be preserved the right way, and used where appropriate, and NOT
used where appropriate. We must not FORCE all end-user renditions to
present the texts a certain way. And I see some of the DP work product
to get perilously close to this border. That is, DP may be releasing
end-user renditions as "masters" but which do not make good "masters"
and also do not make good renditions for certain platforms.


> Folks like David, judge a master document by the end-user format that 
> results from it.  So the HTML and PDF that outputs from a TEI master 
> are the yardstick by which TEI is being measured.
>
> Both are a valid yardstick, but they make the arguments back and forth
> seem to have a conceptual gap between them.

Yes, agreed.

In my other message where I propose the understandings we do seem to
share, I make note that the "master" may certainly include information
of value for the more recognized end-use renditions. Refer to that for
more details. And of course my example mentioned above.


Jon


From jon at noring.name  Wed Oct 17 10:53:51 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 11:53:51 -0600
Subject: [gutvol-d] A proposed list of common understandings on the TEI
	mastering threads
In-Reply-To: <471641A3.2010704@netronome.com>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<4713A930.7060103@novomail.net>
	<20071016080528.GA4609@ark.in-berlin.de>
	<4714ED28.6070008@novomail.net> <47150A06.1010207@verizon.net>
	<47153F2C.8060308@novomail.net>
	<6d99d1fd0710161611n5005bad3he9364a9b37b30e75@mail.gmail.com>
	<47158E88.8080404@novomail.net> <47160AC5.2060007@netronome.com>
	<121591611.20071017101331@noring.name>
	<6d99d1fd0710170942k28c03fb1q6cb55f7936b92ca6@mail.gmail.com>
	<471641A3.2010704@netronome.com>
Message-ID: <371969742.20071017115351@noring.name>

La Monte H.P. Yarroll wrote:
> David Starner wrote:

>> Eh. There are some volumes for which this is important. But if you
>> really want to go through  French and Oriental Love in a Harem
>> (http://www.gutenberg.org/etext/21868) and pick out all the times
>> where the typesetter forgot which way u's go or ran out of b's and
>> started using h's, go ahead.

To answer David, one reason is for purposes of aligning the master
with the digital scans for future proofing, OCRing, etc. Plus, there
are scholars who may be interested in this. Certainly for these kinds
of texts which PG/DP is now doing, they will correct it *anyway*, so
we can certainly provide the original *and* the marked up corrections.

That is, DP gets the original information, and then throws it away
with the corrections. Who says the original information needs to be
thrown away?

(As an aside, I do believe the "master" must preserve the original
line breaks, even "in-word", since this is important for alignment
plus the event someone didn't correctly join an EOL broken word. In
addition, it will be useful for those who may wish to produce an exact
facsimile reproduction. After all, it is information that the OCR
or key entry gives us, so stripping it out *removes* information we
already have which may prove useful to both the project and some users
of the "master".)


> David brings out two very nice points:
>
> 1) Not every work deserves the same amount of effort.

Exactly!


> 2) In a volunteer project, each volunteer gets to decide what deserves
>   their effort.
>
> I'm experimenting with TEI because I have been frustrated with getting
> both HTML and text editions that are consistent with each other and make
> me feel good about having produced them.

This again is the idea of a "master", to capture the critical
information in the most processible manner so others who wish to
produce specific renditions meeting specific requirements have a
reasonable starting point to build upon. Certainly we should strive
for auto-conversion for the most important renditions which are most
widely used, but other more specialized renditions may require some
human effort, and whether anyone will do the effort depends upon
whether the work is worth it to them. Apparently David Starner feels
that "French and Oriental Love in a Harem" is important enough for him
to produce a very fancy rendition of it, while to others that book
will not be considered important enough and the original page scans
and the "raw accurate text" sufficient for their needs...

Jon Noring


From grythumn at gmail.com  Wed Oct 17 11:29:18 2007
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed, 17 Oct 2007 14:29:18 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <751740422.20071017113839@noring.name>
References: <14178793.1192638989699.JavaMail.?@fh1064.dia.cp.net>
	<751740422.20071017113839@noring.name>
Message-ID: <15cfa2a50710171129u26d95c6cldca8cd6e406d74a2@mail.gmail.com>

On 10/17/07, Jon Noring <jon at noring.name> wrote:
> As another example, Juliet mentioned about some poetry where the verse
> lines are indented various lengths in the original, and that is
> important information that needs to be preserved since the author had
> some reason for variable indentation. So I see value for some renditions
> needing to reproduce that, and the "rend" attribute seems sufficient to
> communicate that information. But note that for some end-user
> renditions, such as for 2" screens, you *don't* want to force much if
> any indentation of the verse because the verse will become unreadable.

Just to have an example handy, Christmas and its associations has a
poem formatted in the shape of a stocking. I didn't PP this one, I
just scanned it.

http://www.gutenberg.org/files/22042/22042-h/22042-h.htm
http://www.gutenberg.org/files/22042/22042-h/images/stocking.jpg

R C

From Bowerbird at aol.com  Wed Oct 17 11:39:22 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 17 Oct 2007 14:39:22 EDT
Subject: [gutvol-d] pushing the merry-go-round
Message-ID: <c26.21872cfd.3447b0da@aol.com>

my spam folder is overflowing!

it looks like lee is pushing the noring merry-go-round,
and taking the whole listserve for a ride...           :+)

i'm glad i got off that thing, or i would be _very_ dizzy.

meanwhile, my .pdf converter is starting to look good!

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071017/f5369898/attachment.htm 

From jon at noring.name  Wed Oct 17 11:41:55 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 12:41:55 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <15cfa2a50710171129u26d95c6cldca8cd6e406d74a2@mail.gmail.com>
References: <14178793.1192638989699.JavaMail.?@fh1064.dia.cp.net>
	<751740422.20071017113839@noring.name>
	<15cfa2a50710171129u26d95c6cldca8cd6e406d74a2@mail.gmail.com>
Message-ID: <769730035.20071017124155@noring.name>

Robert wrote:
> Jon Noring wrote:

>> As another example, Juliet mentioned about some poetry where the verse
>> lines are indented various lengths in the original, and that is
>> important information that needs to be preserved since the author had
>> some reason for variable indentation. So I see value for some renditions
>> needing to reproduce that, and the "rend" attribute seems sufficient to
>> communicate that information. But note that for some end-user
>> renditions, such as for 2" screens, you *don't* want to force much if
>> any indentation of the verse because the verse will become unreadable.

> Just to have an example handy, Christmas and its associations has a
> poem formatted in the shape of a stocking. I didn't PP this one, I
> just scanned it.
>
> http://www.gutenberg.org/files/22042/22042-h/22042-h.htm
> http://www.gutenberg.org/files/22042/22042-h/images/stocking.jpg

LOL, cool. :^)

In many prior messages I do note that when the typography itself
becomes content, that one may consider SVG. This way the content is
still accessible, and has fallback to the core text, but it will
also reproduce the layout. Fortunately things like this are not
that common -- they occur, but only in a small percentage.

This is one of those things that really challenges cross-platform
presentation. Yet we must not throw up our hands in despair and say it
can't be done, but simply compromise as to what will be presented on
some platforms. Trying to *force* the stocking typography on 2"
cellphone screens may end up making the content inside the stocking,
which is the core content, very difficult if not impossible to read.
Is that what we want to do?

Thus, our solutions have to take into account the wide cross-platform
visual presentation support we'd like to give. And of course, for the
stocking, how does one render that in text-to-speech? (Ideally, a
human would say, "and the following text is shaped on the paper in the
form of a stocking...".)

Jon Noring


From ralf at ark.in-berlin.de  Wed Oct 17 03:32:30 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Wed, 17 Oct 2007 12:32:30 +0200
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <47150207.9010203@verizon.net>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
	<447877789.20071016105253@noring.name>
	<47150207.9010203@verizon.net>
Message-ID: <20071017103230.GA17928@ark.in-berlin.de>

Wouldn't it be better, before discussing &nbsp; to look what
current browsers already do automatically when rendering text?

I have the impression that Firefoxc already handles some of the
cases but can't put the finger on it.


ralf


From jon at noring.name  Wed Oct 17 12:09:03 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 13:09:03 -0600
Subject: [gutvol-d] pushing the merry-go-round
In-Reply-To: <c26.21872cfd.3447b0da@aol.com>
References: <c26.21872cfd.3447b0da@aol.com>
Message-ID: <6710107542.20071017130903@noring.name>

Bowerbird wrote:

> my spam folder is overflowing!

I bet. LOL.


>  it looks like lee is pushing the noring merry-go-round,
>  and taking the whole listserve for a ride...?????????  :+)

Bringing in names is irrelevant in this discussion. Calling it a
"merry-go-round" is fine, but adding a name is a form of disparagement
that creates a hostile discussion environment. Some may even classify
what you are saying as a form of hate speech by your focusing on the
individual rather than on the thoughts and ideas being discussed.


> i'm glad i got off that thing, or i would be _very_ dizzy.

No comment, but I am so tempted... <smile/>


> meanwhile, my .pdf converter is starting to look good!

Great!

Btw, did you not realize that there does seem consensus that your ZML
is insufficient for the purpose you are promoting it for? And reasons
have been given? (I have a couple other reasons but have not taken the
time to elaborate on them.)

You can demonstrate all the conversion "toolz" you want, but the core
input itself is insufficient for PG/DP purposes of a "master" format,
therefore whatever you demonstrate in output will be wasted effort.

You probably disagree, and the best way to prove us wrong is to get a
list of 10 to 100 representative PG texts that the PG/DP experts
suggest, format them in ZML, then see what you can do with them. (Of
course, a couple of the books have to include block quotes, and maybe
notes inside of notes.)

Of course, I've repeatedly called for the PG/DP folk to recommend a
list (and this is another plea), but no one has. Either no one is
reading my messages (I'm getting a lot of replies on my messages so it
can't be that), or they simply don't want to provide *you* with such a
list. If they don't want to, I wonder why?

Jon Noring


From jon at noring.name  Wed Oct 17 12:11:02 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 17 Oct 2007 13:11:02 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <20071017103230.GA17928@ark.in-berlin.de>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
	<447877789.20071016105253@noring.name> <47150207.9010203@verizon.net>
	<20071017103230.GA17928@ark.in-berlin.de>
Message-ID: <1819197792.20071017131102@noring.name>

Ralf wrote:

> Wouldn't it be better, before discussing &nbsp; to look what
> current browsers already do automatically when rendering text?
>
> I have the impression that Firefoxc already handles some of the
> cases but can't put the finger on it.

Yes, this is interesting.

In many respects it is the quality and sophistication of the
typographic rendering engine used.

Jon Noring


From Bowerbird at aol.com  Wed Oct 17 12:47:49 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 17 Oct 2007 15:47:49 EDT
Subject: [gutvol-d] A proposed list of common understandings on the TEI
	mastering threads
Message-ID: <c88.171dc933.3447c0e5@aol.com>

piggy said:
>   I have some easy novels similar to novels already in PG which I am
>    willing to make available to someone willing to run such an experiment.

why work on simple books?   you need to complete a _test-suite_
which will give you the confidence that you need to proceed fully. 


>    Can we even get enough interest from formatters to complete one such 
book?

wouldn't it be nice if some of the .tei advocates took on this task?
really, if they won't even do this little bit of work, why should you?

the advice is exactly the same as what i gave to that gentleman who
recently proposed the e-texts be repurposed in the opendoc format:
mount a mirror of the library where the e-texts are in such a format
and -- if the format is really useful people -- your mirror will prevail.
convert the e-texts that are released daily, then work on the backlog.

the .tei people want _someone_else_ to do the work.
(and i don't blame 'em, because .tei is a lot of work...)

***

piggy said:
>   I'm experimenting with TEI because I have been frustrated with 
>    getting both HTML and text editions that are consistent with each other 
>    and make me feel good about having produced them.

when you tire of experimenting with .tei, take a look at z.m.l.
since the .html is auto-generated from the text version, they
will stay consistent.   and you will feel good about the quality.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071017/4d06cbf6/attachment.htm 

From Bowerbird at aol.com  Wed Oct 17 12:49:38 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 17 Oct 2007 15:49:38 EDT
Subject: [gutvol-d] taking my own advice
Message-ID: <d03.1dd2ff7a.3447c152@aol.com>

so, i'm now capable of easily churning out the conversion of
a couple of hundred pg-ascii e-texts into z.m.l. at a stretch...

so i expect to be releasing 1,000 at a time in the near future,
in the building of my z.m.l. mirror of the p.g. library.   game on.

if anyone has any preferences as to _where_ i should start
in the p.g. library, please say so, either here or backchannel
(e.g., start at 12000, or start at 16,500, or so on)...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071017/f7a0af5a/attachment.htm 

From rolsch at verizon.net  Wed Oct 17 13:18:47 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Wed, 17 Oct 2007 16:18:47 -0400
Subject: [gutvol-d] A proposed list of common understandings on the TEI
 mastering threads
In-Reply-To: <121591611.20071017101331@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<47160AC5.2060007@netronome.com> <121591611.20071017101331@noring.name>
Message-ID: <200710171618.47926.rolsch@verizon.net>

On Wednesday 17 October 2007 12:13:31 pm Jon Noring wrote:
> La Monte H.P. Yarroll wrote:
> > I think we should be able to make a good case for TEI without making
> > unnecessarily broad claims. Whatever we do today WILL be inadequate
> > for some future user. Let's do what we can to give them a good
> > starting point.
>
> This is an excellent comment.
>
> The common understandings we have in this set of TEI-related threads
> are the following:
>
> 1) Each text project will use a known source book, and the final
>    digitized text, in whatever form, will be "accurate" to that source
>    book, and will include metadata referencing that source book.
>    (Note that in this statement "accurate" remains undefined.)
>
> 2) Each text project will always make available the source book
>    scanset in (at least) sufficient quality for OCR, human proofing,
>    verifying text accuracy by end-users, and discerning the original
>    typography. (I believe every scanset should be archival quality
>    but this is an issue not germane to this particular discussion.)

IMO, this is very important.  It will always allow someone to refer back the 
the original source material.  It also will allow someone to further refine 
the TEI file or even to produce a total different file using a total 
different markup method.

>
> 3) Each text project will produce a "digital master" from which all
>    user renditions, and other types of uses, will be derived.
>
> 4) The "digital master" will be an XML document marked up with some
>    "flavor" of TEI.
>
> [Note: There may be a couple other common understandings that I've not
> included in the above list, and certainly mention them if you think of
> them. But I think this is a good starting point of where I believe
> most of us participating in these threads agree with. However, if we
> don't have super-majority agreement on the above four items, then the
> gap in views is wider than I suspected, and I doubt we can ever get to
> any agreement at all on the specifics of implementation if we can't
> even agree on the general principles. I'll assume in the comments
> below that we have collective majority agreement on the above general
> understandings.]
>
>
> What is obvious, though, is that these common understandings are not
> of sufficient completeness that the specifics of implementation become
> crystal clear -- they just don't fall out. So that's the reason for
> our discussions, to clarify each understanding and maybe also add to
> the list. And this is proving difficult because we tend to fall into
> different "camps" as Josh, and then Lee, so eloquently explained.
>
> Alright, now to provide maybe a little more on the above from my
> perspective...
>
> Obviously, a dream we all have is that the "master" will have all
> that is needed to allow push-button auto-conversion, using today's
> technology, for *all conceivable renditions and uses* we can ever
> imagine. But the reality is that this is unreasonable and probably
> impossible. I think we do agree on this.
>
> Thus, I see a "master" as a sort of intermediary which captures the
> most important information common to all conceivable uses, and maybe
> with some added support for some select uses. Thus the "master"
> becomes, as La Monte says, a good starting point.
>
> I believe, then, that we come to some agreement about what minimum
> information the master should capture, and what form the information
> will take in the master. This is where we disagree on the specifics.
>
> [As an aside, hopefully we can work towards some general set of
> requirements to aid in decision making -- this is what any competent
> engineering organization does in project development. But we haven't
> yet taken an objective requirements approach to this, and if we don't,
> then agreement will never be reached. Rather it will become a
> Darwinian race between various factions to see whose views prevail,
> and contrary to what others may otherwise think, oftentimes such races
> do not lead to the best long-term result for the common vision we all
> share. The best long-term result might happen, but then it is more
> likely not -- it depends upon the views of those pushing their
> particular solution.]
>
> Obviously, I think we can agree that if some information is pretty
> trivial (effort-wise) to add to the master during mastering which is
> useful to certain *recognized* end-uses, and such information does
> not inhibit other important uses, then it makes sense to add it to
> the master. We might, too, consider which end-uses are the most
> important and make sure we have enough information to allow for full
> auto-conversion for those uses, or get it very close to that level.
>
> So that may be a discussion thread: what are the most important user
> renditions/uses we need to support above all others?
>
> (The one clinker in this last consideration has less to do with
> typography and more to do with "accuracy" -- error correction. I
> believe the master must, for a couple reasons I've noted previously,
> faithfully preserve the original text in the source book, including
> author's, publisher's, and typesetter's errors. And, yes, sometimes
> decisions have to be made about what exactly to transcribe to be
> "accurate" to the original. But this does not preclude marking up, in
> the "master", corrections to such errors. Conversion systems for
> some end-use purpose can then decide whether to use the "original text
> warts and all", or use the corrections, or a particular set of
> corrections since we should, I believe, allow for different sets of
> corrections based on different perspectives or end-uses.)

From the P5 specs:

If the encoder elects both to record the original source text and to provide a 
correction for the sake of word-search and other programs, both sic and corr 
are used, wrapped in a choice: 

? marginal comments which indicate that the <choice>
  <corr>dates</corr>
  <sic>date's</sic>
 </choice> mentioned in the main body of the text are incorrect.

Question: DP proofing guidelines current call for contractions to be closed 
up.  Are we to mark this up as 

<choice>
  <corr>wouldn't</corr>
  <sic>would n't</sic>
</choice>

>
> Anyway, I could go on, but I'll mercifully end this message. <laugh/>
>
> Thoughts? Additions to the general agreements list?
>
> Jon
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


From rolsch at verizon.net  Wed Oct 17 13:32:34 2007
From: rolsch at verizon.net (Roland Schlenker)
Date: Wed, 17 Oct 2007 16:32:34 -0400
Subject: [gutvol-d] A proposed list of common understandings on the TEI
 mastering threads
In-Reply-To: <371969742.20071017115351@noring.name>
References: <20071001081923.GA29575@ark.in-berlin.de>
	<471641A3.2010704@netronome.com> <371969742.20071017115351@noring.name>
Message-ID: <200710171632.34321.rolsch@verizon.net>

On Wednesday 17 October 2007 1:53:51 pm Jon Noring wrote:
> La Monte H.P. Yarroll wrote:
> > David Starner wrote:
> >> Eh. There are some volumes for which this is important. But if you
> >> really want to go through  French and Oriental Love in a Harem
> >> (http://www.gutenberg.org/etext/21868) and pick out all the times
> >> where the typesetter forgot which way u's go or ran out of b's and
> >> started using h's, go ahead.
>
> To answer David, one reason is for purposes of aligning the master
> with the digital scans for future proofing, OCRing, etc. Plus, there
> are scholars who may be interested in this. Certainly for these kinds
> of texts which PG/DP is now doing, they will correct it *anyway*, so
> we can certainly provide the original *and* the marked up corrections.
>
> That is, DP gets the original information, and then throws it away
> with the corrections. Who says the original information needs to be
> thrown away?
>
> (As an aside, I do believe the "master" must preserve the original
> line breaks, even "in-word", since this is important for alignment
> plus the event someone didn't correctly join an EOL broken word. In
> addition, it will be useful for those who may wish to produce an exact
> facsimile reproduction. After all, it is information that the OCR
> or key entry gives us, so stripping it out *removes* information we
> already have which may prove useful to both the project and some users
> of the "master".)

If original breaks are to be retained, then EOL hyphenations are going to be 
retained.  DP proofing guidelines current call for EOL hyphenations to be 
closed up to the line above.  IMO, I do not see that the DP proofing 
guidelines are going to be changed for this reason.

>
> > David brings out two very nice points:
> >
> > 1) Not every work deserves the same amount of effort.
>
> Exactly!
>
> > 2) In a volunteer project, each volunteer gets to decide what deserves
> >   their effort.
> >
> > I'm experimenting with TEI because I have been frustrated with getting
> > both HTML and text editions that are consistent with each other and make
> > me feel good about having produced them.
>
> This again is the idea of a "master", to capture the critical
> information in the most processible manner so others who wish to
> produce specific renditions meeting specific requirements have a
> reasonable starting point to build upon. Certainly we should strive
> for auto-conversion for the most important renditions which are most
> widely used, but other more specialized renditions may require some
> human effort, and whether anyone will do the effort depends upon
> whether the work is worth it to them. Apparently David Starner feels
> that "French and Oriental Love in a Harem" is important enough for him
> to produce a very fancy rendition of it, while to others that book
> will not be considered important enough and the original page scans
> and the "raw accurate text" sufficient for their needs...
>
> Jon Noring

Roland Schlenker


From joshua at hutchinson.net  Wed Oct 17 14:11:14 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Wed, 17 Oct 2007 21:11:14 +0000 (UTC)
Subject: [gutvol-d] pushing the merry-go-round
Message-ID: <28703431.1192655474902.JavaMail.?@fh1064.dia.cp.net>

>----Original Message----
>From: jon at noring.name
>
> ** on why no one has provided bowerbird with any list of texts to 
work from **
>
>... they simply don't want to provide *you* with such a
>list. If they don't want to, I wonder why?
>

Ding! Ding! Ding!  We have a winner!

Josh

From marcello at perathoner.de  Thu Oct 18 06:11:49 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu, 18 Oct 2007 15:11:49 +0200
Subject: [gutvol-d] [Fwd: Wiki2Tei converter 1.0!]
Message-ID: <47175B95.8090905@perathoner.de>

<hype type="noring">

Wow! Cool! This sounds great! Rave on!

</hype>


-------- Original Message --------
Subject: Wiki2Tei converter 1.0!
Date: Wed, 10 Oct 2007 20:33:30 +0200
From: Sylvain Loiseau <sylvain.loiseau at U-PARIS10.FR>
Reply-To: Sylvain Loiseau <sylvain.loiseau at U-PARIS10.FR>
To: TEI-L at LISTSERV.BROWN.EDU


We are pleased to announce the first release of the Wiki2Tei software.
Wiki2Tei is a converter from the mediawiki format to XML (TEI vocabulary).

The mediawiki format is used by wikimedia fundation wikis (Wikipedia,
Wikibooks, Wikisource), and many other wikis using the mediawiki software.
Large amounts of free hight-quality structured texts are available in this
format. These texts are used more and more often in NLP (natural language
processing) projects. However, the mediawiki parser is oriented towards
rendition and the mediawiki syntax is complex and hard to parse.

The Wiki2Tei converter makes available the information contained in wiki
syntax
(structuration, highlighting, etc.), and allows to properly retrieve the
plain
text. This conversion is intended to preserve all the properties of the
original text. Wiki2Tei is closely coupled with the mediawiki software,
allowing to convert all the features of the mediawiki syntax.

The Wiki2Tei converter provides a rich set of tools for converting
mediawiki
text from several sources (file, mediawiki database) and managing
collections
of files to be converted. The TEI vocabulary used is documented, according
to
the TEI Guidelines, in an ODD document. The code is open source and may be
downloaded from the SourceForge download area:

    http://sourceforge.net/projects/wiki2tei/
    http://sourceforge.net/project/showfiles.php?group_id=198407

The web site contains full documentation and a "demo":

    http://wiki2tei.sourceforge.net/
    http://wiki2tei.sourceforge.net/demo/

A mailing list is open:

    https://lists.sourceforge.net/lists/listinfo/wiki2tei-users

Best,
Bernard Desgraupes,
Sylvain Loiseau


----------------------------------------------------------------
Ce message a ete envoye par IMP, grace a l'Universite Paris 10 Nanterre


-- 
Marcello Perathoner
webmaster at gutenberg.org


From hart at pglaf.org  Thu Oct 18 10:13:31 2007
From: hart at pglaf.org (Michael Hart)
Date: Thu, 18 Oct 2007 10:13:31 -0700 (PDT)
Subject: [gutvol-d] taking my own advice
In-Reply-To: <d03.1dd2ff7a.3447c152@aol.com>
References: <d03.1dd2ff7a.3447c152@aol.com>
Message-ID: <Pine.LNX.4.64.0710181012590.21491@pglaf.org>


Follow the instructions in "Alice". . . .


On Wed, 17 Oct 2007, Bowerbird at aol.com wrote:

> so, i'm now capable of easily churning out the conversion of
> a couple of hundred pg-ascii e-texts into z.m.l. at a stretch...
>
> so i expect to be releasing 1,000 at a time in the near future,
> in the building of my z.m.l. mirror of the p.g. library.   game on.
>
> if anyone has any preferences as to _where_ i should start
> in the p.g. library, please say so, either here or backchannel
> (e.g., start at 12000, or start at 16,500, or so on)...
>
> -bowerbird
>
>
>
> **************************************
> See what's new at http://www.aol.com
>

From piggy at netronome.com  Thu Oct 18 10:27:36 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Thu, 18 Oct 2007 13:27:36 -0400
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <4714C787.3060609@perathoner.de>
References: <be5.18c1e5c6.34450d5b@aol.com>	<6410280365.20071015125216@noring.name>
	<4714C787.3060609@perathoner.de>
Message-ID: <47179788.6080801@netronome.com>

Marcello Perathoner wrote:
> All in all, using non-printing characters as markup tags must be the
> most bone-headed design decision ever.
>   

But I LIKE python! :-)


From jon at noring.name  Thu Oct 18 10:33:34 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 18 Oct 2007 11:33:34 -0600
Subject: [gutvol-d] How are block quotes handled in ZML?
In-Reply-To: <Pine.LNX.4.64.0710181012590.21491@pglaf.org>
References: <d03.1dd2ff7a.3447c152@aol.com>
	<Pine.LNX.4.64.0710181012590.21491@pglaf.org>
Message-ID: <684145262.20071018113334@noring.name>

[Josh and others in PG/DP, do you know of specific PG texts which
contain block quotes? And I'd love to see some where the block quote
itself contains multiple paragraphs and maybe mixed with other
structures like verse. Jon]


Bowerbird wrote:

> so, i'm now capable of easily churning out the conversion of
> a couple of hundred pg-ascii e-texts into z.m.l. at a stretch...
>
> so i expect to be releasing 1,000 at a time in the near future,
> in the building of my z.m.l. mirror of the p.g. library.   game on.
>
> if anyone has any preferences as to _where_ i should start
> in the p.g. library, please say so, either here or backchannel
> (e.g., start at 12000, or start at 16,500, or so on)...

Cool!


As asked before, Bowerbird, how does ZML handle block quotes?

Many books have block quotes, which themselves usually contain one or
more *ordinary paragraphs* and may also contain pretty much anything
else (i.e., a block quote may be a "mini document" all to itself, such
as paragraphs, verse, and even other block quotes.)

According to Lee, ZML does not have a special rule to recognize a block
quote so it can be properly presented, which is that a paragraph in a
block quote needs to be reflowed upon presentation like any paragraph
would. (In fact, to be recognized as a paragraph and not lines of
verse.)

But I'd rather hear it from the ZML expert: how does ZML specifically
identify block quotes?

If it can't, this is one show stopper for the current ZML spec being a
universal mastering format. If it can, then we'd like to know, and of
course maybe mention how in your "11 Rulez."

Jon Noring


From lee at novomail.net  Thu Oct 18 10:56:07 2007
From: lee at novomail.net (Lee Passey)
Date: Thu, 18 Oct 2007 11:56:07 -0600
Subject: [gutvol-d] How are block quotes handled in ZML?
In-Reply-To: <684145262.20071018113334@noring.name>
References: <d03.1dd2ff7a.3447c152@aol.com>	<Pine.LNX.4.64.0710181012590.21491@pglaf.org>
	<684145262.20071018113334@noring.name>
Message-ID: <47179E37.9040304@novomail.net>

Jon Noring wrote:

> [Josh and others in PG/DP, do you know of specific PG texts which
> contain block quotes? And I'd love to see some where the block quote
> itself contains multiple paragraphs and maybe mixed with other
> structures like verse. Jon]

How about: http://www.gutenberg.org/files/16494/16494-h.zip?

[snip]

> According to Lee, ZML does not have a special rule to recognize a block
> quote so it can be properly presented, which is that a paragraph in a
> block quote needs to be reflowed upon presentation like any paragraph
> would. (In fact, to be recognized as a paragraph and not lines of
> verse.)

Well, I'm not sure I ever said that ZML didn't have a rule to recognize 
a block quote (and if I did, I apologize). I only said if one exists 
that I couldn't find any documentation of it. The rule may exist in a 
local file, or in BB's brain, or somewhere else I'm not aware of; I just 
can't find it.

-- 
Nothing of significance below this line.


From jon at noring.name  Thu Oct 18 11:10:40 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 18 Oct 2007 12:10:40 -0600
Subject: [gutvol-d] How are block quotes handled in ZML?
In-Reply-To: <47179E37.9040304@novomail.net>
References: <d03.1dd2ff7a.3447c152@aol.com>
	<Pine.LNX.4.64.0710181012590.21491@pglaf.org>
	<684145262.20071018113334@noring.name> <47179E37.9040304@novomail.net>
Message-ID: <784139522.20071018121040@noring.name>

Lee wrote:
> Jon Noring wrote:

>> [Josh and others in PG/DP, do you know of specific PG texts which
>> contain block quotes? And I'd love to see some where the block quote
>> itself contains multiple paragraphs and maybe mixed with other
>> structures like verse. Jon]

> How about: http://www.gutenberg.org/files/16494/16494-h.zip?

LOL, yes, that's a good one.

There you go, Bowerbird...


>> According to Lee, ZML does not have a special rule to recognize a block
>> quote so it can be properly presented, which is that a paragraph in a
>> block quote needs to be reflowed upon presentation like any paragraph
>> would. (In fact, to be recognized as a paragraph and not lines of
>> verse.)

> Well, I'm not sure I ever said that ZML didn't have a rule to recognize
> a block quote (and if I did, I apologize). I only said if one exists 
> that I couldn't find any documentation of it. The rule may exist in a 
> local file, or in BB's brain, or somewhere else I'm not aware of; I just
> can't find it.

Lee, I apologize for extrapolating what you said. Indeed you were careful
to say that if there is a ZML rule for "marking up" block quotes, it is
not yet documented.

To clarify again to Bowerbird. A "block quote" may itself be a whole
standalone document (which may contain paragraphs, other block quotes,
verse, letters, etc., etc.) which is quoted inside another document.
So ZML, for mastering purposes, must be able to handle this construct
as just described (essentially a "nesting" of documents.)

Of course, it is possible to come up with a rule using an unusual
combination of white space (tabs and spaces) to accomplish this, but
now we are really getting into other difficulties as Lee previously
mentioned. Differentiating tabs and spaces in many text editors is
oftentimes difficult, plus other problems in requiring counting spaces
and tabs.

Jon Noring


From Bowerbird at aol.com  Thu Oct 18 11:34:31 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 18 Oct 2007 14:34:31 EDT
Subject: [gutvol-d] taking my own advice
Message-ID: <d5e.14465252.34490137@aol.com>

michael said:
>    Follow the instructions in "Alice". . . .

i asked michael backchannel what he meant.

the reply:
>   "Start at the beginning"

yeah, i should've mentioned that.

because the filenaming structures
are in flux with pre-10000 e-texts,
i am not dealing with those at first...

i'm not sure why y'all don't make the
library consistent in its filenames, but
until you do, i can't be bothered with it.

once i've got all the 10000+ files in z.m.l.,
i'll do the rest.   but probably not until then.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071018/dae3d51b/attachment.htm 

From lee at novomail.net  Thu Oct 18 12:30:36 2007
From: lee at novomail.net (Lee Passey)
Date: Thu, 18 Oct 2007 13:30:36 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <3466526.1192625429600.JavaMail.?@fh1064.dia.cp.net>
References: <3466526.1192625429600.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <4717B45C.5050603@novomail.net>

You raise several interesting issues here, some of which I would like to 
explore in more depth.

joshua at hutchinson.net wrote:

> Ok, all sarcastic commentary aside ... the default render for <head> 
> and <head type="sub"> gets the job done.
> 
> ie.
> 
> <head>CHAPTER 1</head>
> <head type="sub">Missy Goes to Space</head>
>
> renders with a large "CHAPTER 1" and a slightly smaller "Missy Goes to 
> Space".  

Actually, there really /isn't/ a default rendering for <head>, just as 
there is no default rendering for /most/ XML vocabularies. I think what 
you are saying is that Mr. Perathoner's XSL script will automatically 
add these rendering styles to it's output, if that's the conversion 
method you choose.

I have been generally unsuccessful in discovering documentation as to 
just what styles will be automatically added during an XSLT 
transformation, unless you count the old unix programmer adage that "the 
code is the documentation." Some simple documentation about just what an 
end user should expect from the transformation would probably be helpful.

On the other hand, it would probably be possible to /create/ a default 
rendering for PGTEI file using a CSS file. In this case you would 
probably create a couple of rules such as:

div[type~="chapter"] head              { font-size:150% }
div[type~="chapter"] head[type~="sub"] { font-size:120% }

> The only rend attribute necessary is a centering attribute 
> (the same thing you'd need to add in an HTML document because <h1> 
> isn't automatically centered).  Now, just like HTML, you can center 
> each <head> individually (<head rend="text-align: center">) or you can 
> put a line in the stylesheet section at the beginning telling it to 
> center every <head> element.

Or add

head { text-align:center }

to your own personal CSS file and just add

<?xml-stylesheet href="mytei.css" type="text/css"?>

to the beginning of every TEI file you download (you would probably want 
to do the same thing for the default CSS file as well).

> Honestly, for 99% of the stuff we see, the TEI code is no more complex 
> than the HTML code equivalent. 

This is /so/ true. The notion that TEI is difficult to use is a myth 
promulgated by those people who are scared by the shear size of the TEI 
specification (and its reliance on DTDs, which are a truly foreign 
language) or by people who have a vested interest in maintaining the 
status quo or are promoting an alternative.

The biggest stumbling block to the adoption of TEI is the perception 
that it is hard. But it's all a perception problem, because TEI is 
really no harder to use than any other markup language, and a good deal 
simpler and more straight-forward than ZML.

> It's just that it is DIFFERENT from the 
> HTML code equivalent and therefore needs different tools/scripts if you 
> want to automate any part of it.

Perhaps more importantly, it requires a different mind set. What You See 
is not only /not/ What You Get, What You See Is Mostly Irrelevant. On 
the other hand, when people rely on their brains instead of their eyes 
things seem to just fall into place.

> Marcello has said, repeatedly, he's not interested nor has the time to 
> write such tools and scripts.  I've admitted I don't have the ability.  
> Lee sounds like he has the ability, but perhaps not the time or 
> inclination.  If someone was willing to step up and start creating a 
> tool/regex scripts/perl scripts/whatever, I'd be happy to work with 
> them and I'm positive so would the couple of other people active in 
> trying to work with TEI (Ralf and others from DP).

Check out http://www.passkeysoft.com/~lee/te12html.

You could also check out http://www.passkeysoft.com/~lee/antonia.xml to 
see what TEI + CSS looks like (use Firefox or Opera, Microsoft is still 
learning how to do XML right).

> Right now, the arguments are going to be fairly useless and circular 
> in nature simply because we don't have the tools to take the process to 
> the next level. 

On the other hand, it's difficult to develop tools because there are so 
few complex TEI texts available for testing, and so little feedback as 
to just what kind of tools might be desired (I'm fairly certain there 
are commercial WYSIWYG XML editors available, but you would have to 
develop a default CSS file to make them work -- and TEI is not really a 
WYSIWYG markup anyway).

One of the first things I learned in law school was that legal 
technicalities are not nearly so important as the layman thinks. 
Yesterday a colleague suggested that because his credit card says "not 
valid unless signed," if a merchant accepted the unsigned card as 
payment then he might be able to avoid paying the charge. A court would 
blow by that argument so fast it would make your head spin. The fact is, 
what matters is what a thing /is/, not what it is called.

Most of what is available from PG as "TEI", isn't. It's HTML with a 
different set of tags. It's like the people who slap <pre> tags around 
PG degraded text and call the result HTML. It's a legal technicality, 
that just doesn't fly in the real world. These files may validate 
against the TEI DTDs, but they don't contain the Tao of TEI.

We definitely have a chicken and egg problem here, but because TEI can 
be created fairly easily in a simple text editor, I'm inclined to favor 
the creation of TEI documents as the first step, as opposed to the 
creation of TEI manipulation tools. (That and the fact that converting 
presentationally-oriented markup to TEI is virtually impossible without 
a very powerful AI engine). It would be very useful to have some 
examples of real TEI to work with, and not just the pseudo-TEI which is 
so pervasive.

-- 
Nothing of significance below this line.


From jon at noring.name  Thu Oct 18 12:38:37 2007
From: jon at noring.name (Jon Noring)
Date: Thu, 18 Oct 2007 13:38:37 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <4717B45C.5050603@novomail.net>
References: <3466526.1192625429600.JavaMail.?@fh1064.dia.cp.net>
	<4717B45C.5050603@novomail.net>
Message-ID: <34367422.20071018133837@noring.name>

Lee wrote:

> Most of what is available from PG as "TEI", isn't. It's HTML with a 
> different set of tags. It's like the people who slap <pre> tags around
> PG degraded text and call the result HTML. It's a legal technicality, 
> that just doesn't fly in the real world. These files may validate 
> against the TEI DTDs, but they don't contain the Tao of TEI.

"Tao of TEI"

I love this! I checked Google to see if it appears anywhere else and
it doesn't. It is a Lee Passey original and, again, I love it!

Jon


From joshua at hutchinson.net  Thu Oct 18 14:08:04 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Thu, 18 Oct 2007 21:08:04 +0000 (UTC)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
Message-ID: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>

>----Original Message----
>From: lee at novomail.net
>
>On the other hand, it's difficult to develop tools because there are 
so 
>few complex TEI texts available for testing, and so little feedback 
as 
>to just what kind of tools might be desired (I'm fairly certain 
there 
>are commercial WYSIWYG XML editors available, but you would have to 
>develop a default CSS file to make them work -- and TEI is not really 
a 
>WYSIWYG markup anyway).
>

Here is exactly what is needed to start things moving at DP (which is 
were critical mass *has* to occur):

1 - A script/utility/plug-in/whatever that takes DP formatted input 
and spits out a generic TEI encoded document.  

For instance

- paragraphs need to be enclosed in <p></p>.  
- Poetry (/* */ enclosed in DP) needs converted to <lg><l> ... 
</l></lg> type markup including the indention information (DP uses 2 
spaces per indent level).
- /# #/ gets converted to a blockquote markup.
- Chapter headings need to be used as <div> dividers and enclosed with 
<head></head> markup properly.
- The DP page divider markup needs to be converted into <pb> markup so 
that the original page break informaion is retained.
- [Illustration: Caption] markup needs to be converted to a generic 
<figure> markup with <head>Caption</head> information.
- Convert the [1] type footnote markup (out of line) into inline TEI 
<note> markup
- Things that *are* presentational need to be handled: <i> = italics  
<sc> = small caps. <b> = bold, etc etc.  Just because semantic is 
important ... doesn't mean you can ignore presentational.

2 - A utility/program/web form that allows easy entry of book meta 
data and then spits out a happy little teiHeader.  That thing is a 
bloody nightmare to encode by hand.

***

We get that much going and DP will start using it.  Keep it all open 
source and it's much easier to extend it as needed.  And it not only 
doesn't have to be a WYSIWYG editor ... it doesn't have to be an editor 
at all.  Just a widget that plugs in DP text in one end and spits out 
TEI on the other.

The first utility doesn't have to be perfect.  Just good enough so 
that a manual second pass can catch the "weird" stuff.

I also don't think we should worry about getting all the heavy lifting 
done up front.  Get something covers the "easy" fiction type books 
first and then move on from there.  People will throw easy stuff at it 
at first to get comfortable ... then start adding more and more 
difficult stuff.  I don't expect some to throw a scholarly critique of 
Middle English poetry on a first go.  But the Campfire Girl Take 
Another Unsupervised Trip Where They Shouldn't Be ... heck, yeah.  
Let's mark it up!

Josh

From traverso at posso.dm.unipi.it  Fri Oct 19 02:30:53 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Fri, 19 Oct 2007 11:30:53 +0200 (CEST)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
	(joshua@hutchinson.net)
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
Message-ID: <20071019093053.5C406101F0@posso.dm.unipi.it>

>>>>> "josh" == joshua at hutchinson net <joshua at hutchinson.net> writes:


    josh> Here is exactly what is needed to start things moving at DP
    josh> (which is were critical mass *has* to occur):

    josh> 1 - A script/utility/plug-in/whatever that takes DP
    josh> formatted input and spits out a generic TEI encoded
    josh> document.

    josh> For instance .....

I warmly agree. I also think that we might SLIGHTLY modify DP markup
to allow better working of such a tool. This should NOT include using
anything else than blank line for paragraphs (no <p>...</p> in the
formatting stages) but might e.g. require some light markup for
chapters and sections, speakers and stage notes in drama, standardized
markup for corrections, etc.; of course, this has to be supported by
the DP proofing interface.

BTW, DP code already prepares a skeletal TEI from the txt files,
extremely limited and out of sync with the current guidelines. But can
be used as a template for a better version.

Because of the need to mix DP coding and TEI coding this can only be
made by a DP team. If Josh is willing to help, I can give my
contribution for a prototype, including modifications to the DP proofing
interface to test in the DP test site.

A problem with this approach is the handling of markup errors; a
markup validation step should be included in the DP page saving
code. And an utility to convert DP-marked txt after the first phases
of post-processing is needed too. Interacting with guiguts might hence
be necessary, complicating the issues.

    
    josh> 2 - A utility/program/web form that allows easy entry of book meta 
    josh> data and then spits out a happy little teiHeader.  That thing is a 
    josh> bloody nightmare to encode by hand.

Ditto, this can be auto-generated from the project data, (and is
present in an insufficient form in the skeletal DP TEI), but the
project data form and project database have to be modified to keep all
the information needed.

I see this as less urgent and more difficult to provide at DP, but can
be easily made with a simple tool if it is well specified. 


Carlo

From ralf at ark.in-berlin.de  Thu Oct 18 09:24:29 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Thu, 18 Oct 2007 18:24:29 +0200
Subject: [gutvol-d] taking my own advice
In-Reply-To: <d03.1dd2ff7a.3447c152@aol.com>
References: <d03.1dd2ff7a.3447c152@aol.com>
Message-ID: <20071018162429.GA20062@ark.in-berlin.de>

You wrote 
> if anyone has any preferences as to _where_ i should start
> in the p.g. library, please say so, either here or backchannel
> (e.g., start at 12000, or start at 16,500, or so on)...

In any case, include 22001 and 13006, please.


From ralf at ark.in-berlin.de  Fri Oct 19 03:41:36 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Fri, 19 Oct 2007 12:41:36 +0200
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule -	empirical
	data
In-Reply-To: <20071019093053.5C406101F0@posso.dm.unipi.it>
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
	<20071019093053.5C406101F0@posso.dm.unipi.it>
Message-ID: <20071019104135.GA26101@ark.in-berlin.de>

Carlo wrote 
> BTW, DP code already prepares a skeletal TEI from the txt files,
> extremely limited and out of sync with the current guidelines. But can
> be used as a template for a better version.

But very awkwardly, as one can see from my tutorial
http://www.pgdp.net/wiki/Post-Processing_With_PGTEI_0.4

and making the TEI template PGTEI 0.4 conform would greatly
simplify this document! So count me in with the work.

Where is that script that produces that 0.3 output to be found,
anyway?

> A problem with this approach is the handling of markup errors; a

OTOH, it could make an automated verification of the foofing stage,
make a button for the foofers like the WordCheck one.


ralf


From joshua at hutchinson.net  Fri Oct 19 06:18:32 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Fri, 19 Oct 2007 13:18:32 +0000 (UTC)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
Message-ID: <11915392.1192799912118.JavaMail.?@fh1039.dia.cp.net>

Count me in, Carlos.  Tell me what you need and I'll put together as 
fast as I can.  Tell me what needs testing and feedback and I'll get on 
it.

I am at your disposal, sir.

Josh

>----Original Message----
>From: traverso at posso.dm.unipi.it
>Date: Oct 19, 2007 5:30 
>To: <joshua at hutchinson.net>, <gutvol-d at lists.pglaf.org>
>Cc: <gutvol-d at lists.pglaf.org>
>Subj: Re: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - 
empirical data
>
>>>>>> "josh" == joshua at hutchinson net <joshua at hutchinson.net> writes:
>
>
>    josh> Here is exactly what is needed to start things moving at DP
>    josh> (which is were critical mass *has* to occur):
>
>    josh> 1 - A script/utility/plug-in/whatever that takes DP
>    josh> formatted input and spits out a generic TEI encoded
>    josh> document.
>
>    josh> For instance .....
>
>I warmly agree. I also think that we might SLIGHTLY modify DP markup
>to allow better working of such a tool. This should NOT include using
>anything else than blank line for paragraphs (no <p>...</p> in the
>formatting stages) but might e.g. require some light markup for
>chapters and sections, speakers and stage notes in drama, 
standardized
>markup for corrections, etc.; of course, this has to be supported by
>the DP proofing interface.
>
>BTW, DP code already prepares a skeletal TEI from the txt files,
>extremely limited and out of sync with the current guidelines. But 
can
>be used as a template for a better version.
>
>Because of the need to mix DP coding and TEI coding this can only be
>made by a DP team. If Josh is willing to help, I can give my
>contribution for a prototype, including modifications to the DP 
proofing
>interface to test in the DP test site.
>
>A problem with this approach is the handling of markup errors; a
>markup validation step should be included in the DP page saving
>code. And an utility to convert DP-marked txt after the first phases
>of post-processing is needed too. Interacting with guiguts might 
hence
>be necessary, complicating the issues.
>
>    
>    josh> 2 - A utility/program/web form that allows easy entry of 
book meta 
>    josh> data and then spits out a happy little teiHeader.  That 
thing is a 
>    josh> bloody nightmare to encode by hand.
>
>Ditto, this can be auto-generated from the project data, (and is
>present in an insufficient form in the skeletal DP TEI), but the
>project data form and project database have to be modified to keep 
all
>the information needed.
>
>I see this as less urgent and more difficult to provide at DP, but 
can
>be easily made with a simple tool if it is well specified. 
>
>
>Carlo


From traverso at posso.dm.unipi.it  Fri Oct 19 07:12:11 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Fri, 19 Oct 2007 16:12:11 +0200 (CEST)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule
	-	empirical	data
In-Reply-To: <20071019104135.GA26101@ark.in-berlin.de> (message from Ralf
	Stephan on Fri, 19 Oct 2007 12:41:36 +0200)
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
	<20071019093053.5C406101F0@posso.dm.unipi.it>
	<20071019104135.GA26101@ark.in-berlin.de>
Message-ID: <20071019141211.3B19E101E6@posso.dm.unipi.it>

>>>>> "Ralf" == Ralf Stephan <ralf at ark.in-berlin.de> writes:

    Ralf> Carlo wrote
    >> BTW, DP code already prepares a skeletal TEI from the txt
    >> files, extremely limited and out of sync with the current
    >> guidelines. But can be used as a template for a better version.

    Ralf> But very awkwardly, as one can see from my tutorial
    Ralf> http://www.pgdp.net/wiki/Post-Processing_With_PGTEI_0.4

    Ralf> and making the TEI template PGTEI 0.4 conform would greatly
    Ralf> simplify this document! So count me in with the work.

    Ralf> Where is that script that produces that 0.3 output to be
    Ralf> found, anyway?

the DP code is in http://dproofreaders.sourceforge.net , the file is
tools/project_manager/post_files.inc , the function join_proofed_text_tei

Carlo


From traverso at posso.dm.unipi.it  Fri Oct 19 07:14:43 2007
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Fri, 19 Oct 2007 16:14:43 +0200 (CEST)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <11915392.1192799912118.JavaMail.?@fh1039.dia.cp.net>
	(joshua@hutchinson.net)
References: <11915392.1192799912118.JavaMail.?@fh1039.dia.cp.net>
Message-ID: <20071019141443.339F9101E8@posso.dm.unipi.it>

>>>>> "josh" == joshua at hutchinson net <joshua at hutchinson.net> writes:

    josh> Count me in, Carlos.  Tell me what you need and I'll put
    josh> together as fast as I can.  Tell me what needs testing and
    josh> feedback and I'll get on it.

    josh> I am at your disposal, sir.

    josh> Josh

I start a thread in the DP forum. 

Carlo

From lee at novomail.net  Fri Oct 19 09:38:04 2007
From: lee at novomail.net (Lee Passey)
Date: Fri, 19 Oct 2007 10:38:04 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
Message-ID: <4718DD6C.6090207@novomail.net>

joshua at hutchinson.net wrote:

> Here is exactly what is needed to start things moving at DP (which 
> is where critical mass *has* to occur):  

I don't think so. Given the institutional inertia that DP has so far 
exhibited, I don't think DP is any longer the proper venue for 
innovation. I would suggest a smaller, nimbler group who can develop a 
new proofing processes without having to cater to outdated conventions.

> 1 - A script/utility/plug-in/whatever that takes DP formatted input 
> and spits out a generic TEI encoded document.

The problem here is that by the time you have DP formatted input you're 
already reduced to purely presentational markup. Converting this to TEI 
will result in purely presentational TEI, which, while valid, doesn't 
buy you anything. If you're going to be satisfied with purely 
presentational markup a tool to input DP markup and convert to HTML 
would be even better; or we could easily convert from DPML to ZML, 
because we /know/ that the process of converting ZML to HTML is already 
perfected. ;-)

I would suggest that the best approach would be to do away with DPML 
altogether. Instead, we/I could write a script/program that would take 
the HTML output from an OCR engine and convert it to (admittedly purely 
presentational) TEI. For any markup which is questionable (such as <hi>) 
the output should be intentionally made invalid, perhaps by the addition 
of the attribute "class='invalid'" (there is no "class" attribute 
anywhere in TEI). This oversimplified TEI would then go on to the 
proofing rounds.

Now, the DP proofing guidelines would be re-written to replace DPML with 
TEI. As Mr. Hutchinson has pointed out, TEI really is quite easy to use. 
There's no reason a proofer couldn't use <quote 
rend="display:block"></quote> instead of /# #/; I, myself, would find it 
easier because the TEI markup virtually describes itself, where as /# is 
completely arbitrary, and easy to confuse mentally with /* (which we all 
know is how you mark up comments which are not part of the text).

Mr. Hutchinson's list, below, shows just how easy it would be to 
re-write the DPML markup rules.
> For instance
>
> - paragraphs need to be enclosed in <p></p>.
And everything that is /not/ a paragraph needs to be enclosed in 
something else.

> - Poetry (/* */ enclosed in DP) needs converted to <lg><l> ... 
> </l></lg> type markup including the indention information (DP uses 2 
> spaces per indent level).
> - /# #/ gets converted to a blockquote markup.
> - Chapter headings need to be used as <div> dividers and enclosed with 
> <head></head> markup properly.
> - The DP page divider markup needs to be converted into <pb> markup so 
> that the original page break informaion is retained.
> - [Illustration: Caption] markup needs to be converted to a generic 
> <figure> markup with <head>Caption</head> information.
> - Convert the [1] type footnote markup (out of line) into inline TEI 
> <note> markup
> - Things that *are* presentational need to be handled: <i> = italics  
> <sc> = small caps. <b> = bold, etc etc.

I wasn't aware that DP used "etc etc." as markup. What do you want it 
converted to?

> Just because semantic is 
> important ... doesn't mean you can ignore presentational.
>   

And just because presentation is important ... doesn't mean you can 
ignore semantics, particularly when the target markup language is 
inherently semantic in nature. But this is exactly what you would get if 
you don't engage a human brain in the process.

BTW, if I wanted to get a hold of a DPML file to experiment with, how 
would I get one/several?

> 2 - A utility/program/web form that allows easy entry of book meta 
> data and then spits out a happy little teiHeader.  That thing is a 
> bloody nightmare to encode by hand.
>   

OK. Do you want it web based, or a stand-alone application? Please 
provide a mockup of what you want it to look like.
> ***
>
> We get that much going and DP will start using it.  

Well, get that much going, AND make it work end to end, AND produce a 
significant number of "real" TEI texts, then DP MAY adopt it, perhaps in 
parallel with its current work flow. It has been said that "big ships do 
not turn in tight circles," and I think this is particularly true of big 
ships with powerful engines and tiny rudders. Personally, I'd be happy 
to get a process going that works end to end, and let DP worry about 
whether it can be integrated into its own work flow.

> Keep it all open 
> source and it's much easier to extend it as needed.  And it not only 
> doesn't have to be a WYSIWYG editor ... it doesn't have to be an editor 
> at all.  Just a widget that plugs in DP text in one end and spits out 
> TEI on the other.
>
> The first utility doesn't have to be perfect.  Just good enough so 
> that a manual second pass can catch the "weird" stuff.
>   

In my mind, the first pass will be /all/ weird stuff, because you're 
moving from a purely presentational paradigm to what is at least a 
semantic/presentational hybrid. Perhaps the most useful tool would be an 
application which would scan a file for everything that is questionable 
(and I would rank every instance of <p> and <hi> in that category) and 
ask for human confirmation that the markup is, in fact, correct, and 
offer reasonable alternatives if it is not.

> I also don't think we should worry about getting all the heavy lifting 
> done up front.  Get something covers the "easy" fiction type books 
> first and then move on from there.  People will throw easy stuff at it 
> at first to get comfortable ... then start adding more and more 
> difficult stuff.  I don't expect some to throw a scholarly critique of 
> Middle English poetry on a first go.  But the Campfire Girl Take 
> Another Unsupervised Trip Where They Shouldn't Be ... heck, yeah.  
> Let's mark it up!
>   

I agree. How do you eat an elephant? One bite at a time. This idea is 
what prompted my original post in this thread, "The TEI 80/20 rule." 
What is the 20% of the TEI markup vocabulary that can be used to cover 
80% of the e-books we are trying to create? Let's promote that. My 
suspicion is that once someone has becomes comfortable with the 20% 
cream, s/he will be able to easily dip into the the 80% when it is needed.

On the other hand, (as Tevya said, there's always an other hand) we need 
to remember the admonition of Albert Einstein, to "make everything as 
simple as possible, but not simpler." Things can always be made simpler, 
but at a certain point the simplification starts to make it impossible 
to meet the goal you are trying to achieve. I think that allowing purely 
presentational TEI to be distributed outside of the production chain is 
making things simpler than is possible. We must be careful to not allow 
our quest for simplicity to render the process as a whole futile.


From grythumn at gmail.com  Fri Oct 19 09:47:50 2007
From: grythumn at gmail.com (Robert Cicconetti)
Date: Fri, 19 Oct 2007 12:47:50 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <4718DD6C.6090207@novomail.net>
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
	<4718DD6C.6090207@novomail.net>
Message-ID: <15cfa2a50710190947n4fe02d27x233996bca243453a@mail.gmail.com>

On 10/19/07, Lee Passey <lee at novomail.net> wrote:
> BTW, if I wanted to get a hold of a DPML file to experiment with, how
> would I get one/several?

Pick any project in PP, click the download zipped text link, or for
that matter, the download zipped TEI link. You can grab stuff in
earlier states by using the download concatenated text button.

http://www.pgdp.net/c/tools/project_manager/projectmgr.php?show=search&state%5B%5D=proj_post_first_available&state%5B%5D=proj_post_first_checked_out&state%5B%5D=proj_post_second_available&state%5B%5D=proj_post_second_checked_out&n_results_per_page=100

You may need a DP account to access it.

R C

From joshua at hutchinson.net  Fri Oct 19 10:51:23 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Fri, 19 Oct 2007 17:51:23 +0000 (UTC)
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
Message-ID: <14789848.1192816283649.JavaMail.?@fh1039.dia.cp.net>

>----Original Message----
>From: lee at novomail.net
>
>joshua at hutchinson.net wrote:
>
>> Here is exactly what is needed to start things moving at DP (which 
>> is where critical mass *has* to occur):  
>
>I don't think so. Given the institutional inertia that DP has so far 
>exhibited, I don't think DP is any longer the proper venue for 
>innovation. I would suggest a smaller, nimbler group who can develop 
a 
>new proofing processes without having to cater to outdated 
conventions.
>

You've just pretty much guaranteed failure.  If you don't want to work 
within the DP framework, that's up to you.  But our discussion is 
pretty much over at this point.

Josh

PS DP markup is NOT purely presentational.  In fact, it is semantic in 
many ways, but that is an argument for another time.


From donovan at abs.net  Fri Oct 19 10:51:56 2007
From: donovan at abs.net (D Garcia)
Date: Fri, 19 Oct 2007 13:51:56 -0400
Subject: [gutvol-d]
	=?iso-8859-1?q?Gimme_that_AI_now_Re=3A_The_TEI_80/20_r?=
	=?iso-8859-1?q?ule_-=09empirical=09data?=
In-Reply-To: <20071019141211.3B19E101E6@posso.dm.unipi.it>
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
	<20071019104135.GA26101@ark.in-berlin.de>
	<20071019141211.3B19E101E6@posso.dm.unipi.it>
Message-ID: <200710191351.57120.donovan@abs.net>

On Friday 19 October 2007 10:12, Carlo Traverso wrote:
> >>>>> "Ralf" == Ralf Stephan <ralf at ark.in-berlin.de> writes:
>
>     Ralf> Carlo wrote
>
>     >> BTW, DP code already prepares a skeletal TEI from the txt
>     >> files, extremely limited and out of sync with the current
>     >> guidelines. But can be used as a template for a better version.
>
>     Ralf> But very awkwardly, as one can see from my tutorial
>     Ralf> http://www.pgdp.net/wiki/Post-Processing_With_PGTEI_0.4
>
>     Ralf> and making the TEI template PGTEI 0.4 conform would greatly
>     Ralf> simplify this document! So count me in with the work.
>
>     Ralf> Where is that script that produces that 0.3 output to be
>     Ralf> found, anyway?
>
> the DP code is in http://dproofreaders.sourceforge.net , the file is
> tools/project_manager/post_files.inc , the function join_proofed_text_tei
>

There is an open task at DP to correct these issues, but it has languished for 
lack of knowledgeable resources as well as being low-priority.

Reference http://www.pgdp.net/c/tasks.php?f=detail&tid=426  for details on the 
known issues with that code.

From Bowerbird at aol.com  Fri Oct 19 11:54:31 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 19 Oct 2007 14:54:31 EDT
Subject: [gutvol-d] taking my own advice
Message-ID: <cf3.1d0d02a3.344a5767@aol.com>

ralf said:
>    In any case, include 22001 and 13006, please.

sure thing ralf.

it'll be good for me to do a play as an example...
if you can make the scan-set for 13006 available
-- the image-files seemed to be missing on o.l.s. --
i'd be happy to do both those after this weekend...

as for 22001 -- a straightforward book of poetry --
the one obvious shortcoming -- vis a vis z.m.l. -- is
that every line of poetry doesn't have a leading space,
to indicate that each block of text is _not_ a paragraph
-- and thus should not have its first line indented --
_and_ that all those lines should not be unwrapped...

and this isn't just a "problem" when it comes to z.m.l.

_every_ conversion program that rewraps the lines
will have the identical problem with a file like this...
so this is one of the worst problems in the library...

it also appears, from a quick glance at the text-file
without reference to the scans, that the _headings_
(in this case, poem titles) do not have the standard
4-blank-lines-before-and-2-blank-lines-after...

-bowerbird

>    http://www.pgdp.org/ols/tools/display.php?nextpage=001.png&
lastpage=120.png&numpages=120&book=4026e97f4b0fc


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071019/b3f26f84/attachment.htm 

From prosfilaes at gmail.com  Fri Oct 19 14:16:44 2007
From: prosfilaes at gmail.com (David Starner)
Date: Fri, 19 Oct 2007 17:16:44 -0400
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
	data
In-Reply-To: <4718DD6C.6090207@novomail.net>
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>
	<4718DD6C.6090207@novomail.net>
Message-ID: <6d99d1fd0710191416g3aa7d342if9ff3eeeb2cb6860@mail.gmail.com>

On 10/19/07, Lee Passey <lee at novomail.net> wrote:
> Converting this to TEI
> will result in purely presentational TEI, which, while valid, doesn't
> buy you anything.

Repeating the claim doesn't make it true. Being able to specify page
numbers, sidenotes, and footnotes as such is a huge advantage over
HTML, where you can, with enough work, define page numbers and
sidenotes in an opaque way, but never footnotes.

> There's no reason a proofer couldn't use <quote
> rend="display:block"></quote> instead of /# #/;

This shows a lack of experience with the system. We've seen footnote
spelled a dozen different ways; the last thing we want to do is
replace two characters with 20, especially when the 20 characters are
just as confusing to someone who hasn't touched TEI-Lite, especially
as most of our users haven't written XML or HTML.

> I, myself, would find it
> easier because the TEI markup virtually describes itself, where as /# is
> completely arbitrary,

The choice of " over ' or ? is completely arbitrary (well, ? isn't
completely arbitrary, but you can't expect most of our users to know
that.) Why rend="display:block" instead of rend=display:block or
display=block or even just block, again from the point of view of
someone to whom XML means nothing?

> But this is exactly what you would get if
> you don't engage a human brain in the process.

What, insulting of people you want to convince?

From lee at novomail.net  Mon Oct 22 13:06:30 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 22 Oct 2007 14:06:30 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <6310459816.20071020133137@noring.name>
References: <6310459816.20071020133137@noring.name>
Message-ID: <471D02C6.9010201@novomail.net>

Jon Noring wrote:

> I plan to use the following for this:
> 
> <choice>
>    <sic>...</sic>
>    <corr>...</corr>
> </choice>
> 
> (and of course various attributes will be applied...)

I think that is the correct way, and, as I understand it, the only 
acceptable way under P5.

> Anyway, one thing that interests me are your reasons for placing the
> sole note in My Antonia where you did in the text, at the end of the
> chapter it is referenced.
> 
> I've been going through the pros and cons of three placements:
> 
> 1) Inline at the point of occurrence,
> 
> 2) At the end of the division they are referenced in (which you did)
> 
> 3) At the end of the document in a stand-alone "note dump" section
>   (which for books with many notes would collect all the notes in one
>   place.)
> 
> So what are your thoughts?

I think the question implies much broader issues than what you see at 
face value, specifically in regard to two dichotomies that exist in TEI: 
  those of past presentation versus future presentation, and implicit 
markup versus explicit markup.

When I placed the note at the end of the chapter, I did it primarily to 
make the file display nicely when using CSS. I'm sure you understand CSS 
much better than I do, but I couldn't figure out any way, using CSS, to 
move a note from an inline presentation. The one thing we /don't/ want 
is for the note to be displayed at the same place it is referenced.

Your questions prompted further reflection on my part, however. I asked 
myself, what is the purpose of footnotes/endnotes, and how are they 
traditionally presented.

It seems to me that generally there are two kinds of footnotes, 
explanatory footnotes and bibliographic footnotes. Bibliographic 
footnotes are used to contain a reference to the source of a quotation 
or viewpoint, or to further useful information pertaining to the subject 
in the main text. Explanatory footnotes are used for additional 
information or explanatory notes that might be too digressive for the 
main text, as an alternative to a parenthetical comment.

Footnotes are notes of text placed at the bottom of a page in a book or 
document. Endnotes are similar to footnotes, but differ in that rather 
than appearing at the foot of the particular page, they are collected 
together at the end of the chapter or at the end of the work. Rarely do 
you see notes actually embedded in the text; when you do they are called 
parenthetical expressions and are delimited from the surrounding text by 
parentheses.

In print documents endnotes are considered more inconvenient than 
footnotes because of the need to move back and forth between the main 
text and the endnote section. I think it is for this reason that 
explanatory notes are typically presented as footnotes, so you can 
quickly glance at the note while still in the main text, and endnotes 
are typically bibliographic notes.

In electronic documents, the practical distinction between footnotes and 
endnotes becomes less important. Assuming you have implemented the note 
with reciprocal pointers, it doesn't matter where in the file the note 
appears, except that you /don't/ want the note to appear in the main 
text where it will disrupt the flow of the text. In HTML there are a 
couple of ways to achieve this. You could collect all the notes at the 
end of the main file, or in a separate notes file. Each reference in the 
main text would point to a note using "href='#xxx'", and each note would 
point back to the reference in the main text. Alternatively, each 
reference in the main text could contain a "title" attribute which most 
browsers display when the mouse cursor "flies over" the reference. Or, 
the two options can be combined.

Given the ease of navigating between the main text and the notes in 
electronic documents, there doesn't seem to be much need to try to force 
the note text onto the same screen "page" as the referring text, unless 
the electronic format is simply a precursor to a printed document, as 
with PDF.

So why would you ever place the note inline at the place of the 
reference? Well, I think this brings us to the past/future dichotomy I 
mentioned earlier.

Most commonly TEI is probably used as a transcription markup; that is, 
TEI is used to transcribe an existing printed work into an electronic 
format. But it is also possible to use TEI as a text mastering format. 
How certain TEI elements should be interpreted depends on which of these 
two uses you have chosen.

For example, consider the <lb/> element which, according to the P5 
guidelines, "marks the start of a new (typographic) line in some edition 
or version of a text." You have indicated that in your "faithful" 
edition of My ?ntonia you want to maintain a record of the original line 
breaks. For this you would use the <lb/> element. But when presenting 
your edition you certainly don't want the User Agent to display these 
line breaks, unless the end user has explicitly declared that this is 
what s/he wants. On the other hand, in most PGTEI editions of other 
works the <lb/> element is used to indicate where line breaks should 
occur when presenting the current editions. In other words, in your 
document <lb/> is an indication of where line breaks appeared /in the 
past/ whereas in PGTEI editions <lb/> is an indication of where line 
breaks should appear /in the future/.

This same analysis applies to the presentation of a Title Page, which is 
one of the bones of contention on the Gutenberg discussion list. The 
<titlePage> element is used to transcribe how a title page appeared /in 
a past edition/, whereas the <divGen> element is used in PGTEI to create 
a new, standardized title page /in a future edition/.

We both know that Project Gutenberg is its own electronic publisher, and 
is not really concerned about archiving or preserving past editions of 
any particular work. Thus, the use of <lb/> as a forced line break, and 
the use of <divGen> to create a new title page, are completely 
appropriate uses of the markup. The implied "ed" attribute on the <lb/> 
element is, essentially, "that edition which will be created when the 
PGTEI transformation script is run."

Returning to the problem of notes, if your intended use is to master a 
future edition you may want to embed the note at the point in the main 
text where the reference occurs. When you're actually composing the 
text, it's at that point during composition that the explanation or 
reference is close at hand. When you're editing or maintaining the text 
having the actual note embedded in the main text will help to be sure 
that edits to the text will not invalidate the reference or require 
alteration of the explanation. When the file is transformed into a 
presentation format the note can be moved to wherever is most 
appropriate for that particular format. During the transformation a new 
intra-document reference will have to be created because the 
relationship between an in-line note and its context is only implicit.

On the other hand, if you are transcribing a work from an existing 
edition, and alteration of the text is not foreseen, I don't see how the 
justifications for creating in-line notes apply. One of the downsides to 
using embedded notes is that if you try to view the document using an 
appropriate Cascading Style Sheet the note text will remain visible in 
the middle of the noted text in the displayed document; yet moving text 
which is too digressive is exactly why an author or publisher used a 
footnote in the first place. We want our notes to be stored somewhere 
where they can be easily accessed, but only when we choose to do so, and 
displayed in a manner which will not disrupt the flow of the main text.

Another of the problems of the using in-line notes, at least as you have 
used them (from my perspective) is that they imply a reference without 
explicitly creating one. This may just be my irrational bias, but I get 
really nervous with implied content; everything that can be made 
explicit, should be.

For example, even if you were to leave the note in-line, you should 
probably include a reference at that point, making the linkage between 
the main text and the note explicit, e.g.:

<p id="p0016">I first heard of ?ntonia<ref xml:id="ref1" type="note" 
target="#fn1>[*]</ref><note xml:id="fn1" place="foot" 
target="#ref1><p>The Bohemian name <name rend="font-style: 
italic">?ntonia</name> is strongly accented on the first syllable, like 
the English name <name rend="font-style: italic">Anthony</name>, ...

(As an alternative, you may want to omit a note marker and use the noted 
text itself, e.g.:

<p id="p0016">I first heard of <ref xml:id="ref1" type="note" 
target="#fn1>?ntonia</ref><note xml:id="fn1" place="foot" 
target="#ref1><p>The Bohemian name <name rend="font-style: 
italic">?ntonia</name> is strongly accented on the first syllable, like 
the English name <name rend="font-style: italic">Anthony</name>, ...)

Likewise, I feel that anytime there is a risk of confusion between "in 
the past" uses of the document and "in the future" uses, something needs 
to be added to the markup to make the use explicit. For example, every 
<lb/> element that is intended to force a line break in all future 
presentations should have some indication that it is more that the 
simple description of the presentation of the source text that the 
guidelines envisioned; perhaps something like <lb ed="all"/> or <lb 
rend="display"/>.

Given the purposes and use of footnotes, I think I have concluded that 
for TEI encoding of notes in transcribed texts I would follow these 
guidelines:

1. Never encode notes inline.
2. Encode explanatory notes immediately after the paragraph in which 
they are referenced.
3. Encode bibliographic notes in blocks, at the end of the document in 
the <back> element if they are not extensive, or at the end of each 
chapter if they are.
4. If bibliographic notes are placed at the end of the text in the 
source material, place all notes in the <back> element.
5. If the notes are a combination of explanatory notes and bibliographic 
notes, choose one method that best reflects their use in the text and 
use it exclusively. Mimicking the presentation in the source text is 
probably the best option.
6. Notes should include <ref>erences back to the point in the text that 
referenced them.

If you want to play with a document that has extensive footnotes, both 
explanatory and bibliographic, see 
http://www.passkeysoft.com/~lee/911-Commission-Report.zip.


From lee at novomail.net  Mon Oct 22 14:14:22 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 22 Oct 2007 15:14:22 -0600
Subject: [gutvol-d] Gimme that AI now Re: The TEI 80/20 rule - empirical
 data
In-Reply-To: <6d99d1fd0710191416g3aa7d342if9ff3eeeb2cb6860@mail.gmail.com>
References: <18948644.1192741684598.JavaMail.?@fh1036.dia.cp.net>	<4718DD6C.6090207@novomail.net>
	<6d99d1fd0710191416g3aa7d342if9ff3eeeb2cb6860@mail.gmail.com>
Message-ID: <471D12AE.5000801@novomail.net>

David Starner wrote:

> On 10/19/07, Lee Passey <lee at novomail.net> wrote:
>> Converting this to TEI
>> will result in purely presentational TEI, which, while valid, doesn't
>> buy you anything.
> 
> Repeating the claim doesn't make it true. Being able to specify page
> numbers, sidenotes, and footnotes as such is a huge advantage over
> HTML, where you can, with enough work, define page numbers and
> sidenotes in an opaque way, but never footnotes.

Rats! I wish you would have told me this 5 years ago. All this time I've 
been doing it without knowing it was impossible. Just think how much 
time I could have saved.

>> There's no reason a proofer couldn't use <quote
>> rend="display:block"></quote> instead of /# #/;
> 
> This shows a lack of experience with the system. We've seen footnote
> spelled a dozen different ways; the last thing we want to do is
> replace two characters with 20, especially when the 20 characters are
> just as confusing to someone who hasn't touched TEI-Lite, especially
> as most of our users haven't written XML or HTML.

Well, you obviously have a lower opinion of the average DPer than I do. 
I think your average volunteer wouldn't have any trouble at all with a 
new, and slightly more verbose, markup. And with DTD validation you 
should be able to get feedback on mistakes much quicker.

>> I, myself, would find it
>> easier because the TEI markup virtually describes itself, where as /# is
>> completely arbitrary,
> 
> The choice of " over ' or ? is completely arbitrary (well, ? isn't
> completely arbitrary, but you can't expect most of our users to know
> that.) Why rend="display:block" instead of rend=display:block or
> display=block or even just block, again from the point of view of
> someone to whom XML means nothing?

/Everything/ is arbitrary from the point of view of someone who doesn't 
understand the context in which it is meaningful. The most recent 
version of PGTEI calls for the values of "rend" attributes to be valid 
CSS rules. Thus, "rend='display:block'" has meaning to everyone who 
understands CSS, or someone inquisitive enough to question what the 
underlying meaning might be, yet it is no more arbitrary than "/# #/" to 
someone who does not understand CSS. It may not have meaning to all 
people in all contexts, but it does have meaning to some people in some 
contexts, which is better than having no meaning at all to anyone.

>> But this is exactly what you would get if
>> you don't engage a human brain in the process.
> 
> What, insulting of people you want to convince?

It is regrettable that you have chosen to be offended by my comments. 
Perhaps I was not clear enough in expressing my opinion that it is 
impossible to add meaning (semantics) to a file in an automated way. It 
/requires/ a human in the process. I'm not suggesting that humans are 
involved, but not engaging their brains, I'm suggesting that any process 
that excludes human involvement (i.e. automated scripts) cannot succeed 
in this particular task (although I'm open to being convinced of the 
contrary; in fact, I would /love/ to be convinced of the contrary).

In any event, I'm not trying to convince anyone of anything. I'm just 
gathering information, and trying to share what I have gathered and 
concluded with others. I have no illusions about my ability to affect 
the institutional inertia of DP, or Michael Hart's commitment to 
anarchy. If someone finds what I am saying interesting, or wants to 
explore the ideas further, that's great. If not, I'm not troubled; I 
have no expectation that what I say will be wildly popular.

-- 
Nothing of significance below this line.


From piggy at netronome.com  Mon Oct 22 17:16:10 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Mon, 22 Oct 2007 20:16:10 -0400
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D02C6.9010201@novomail.net>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net>
Message-ID: <471D3D4A.2080909@netronome.com>

Lee Passey wrote:
> Jon Noring wrote:
>
>   
>> I plan to use the following for this:
>>
>> <choice>
>>    <sic>...</sic>
>>    <corr>...</corr>
>> </choice>
>>
>> (and of course various attributes will be applied...)
>>     
>
> I think that is the correct way, and, as I understand it, the only 
> acceptable way under P5.
>   

That's a shame as I really like the asymmetry in <corr 
sic="error">correction</corr> recommended in P4. The P5 solution is so 
painfully verbose.


From jon at noring.name  Mon Oct 22 17:35:07 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 22 Oct 2007 18:35:07 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D3D4A.2080909@netronome.com>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net> <471D3D4A.2080909@netronome.com>
Message-ID: <1795766454.20071022183507@noring.name>

La Monte H.P. Yarroll wrote:
> Lee Passey wrote:
>> Jon Noring wrote:

>>> I plan to use the following for this:
>>>
>>> <choice>
>>>    <sic>...</sic>
>>>    <corr>...</corr>
>>> </choice>
>>>
>>> (and of course various attributes will be applied...)

>> I think that is the correct way, and, as I understand it, the only
>> acceptable way under P5.

> That's a shame as I really like the asymmetry in
> <corr sic="error">>correction</corr> recommended in P4. The P5
> solution is so painfully verbose.

Hmmm, in actually marking up the error corrections to "My Antonia",
I ran into the issue of placing "corrected" text into attribute
values, which leads to some problems under certain circumstances which
I won't get into here. It is much better if the corrected text is also
PCDATA in its own markup (and if needed may include TEI tags which is
one of the problems with using attribute values.) The P5 solution is
better, imho, and very clean to understand and follow.

In addition, with the P5 system, we can do the following:

<choice>
   <sic>...</sic>
   <corr resp="#xx">...</corr>
   <corr resp="#yy">...</corr>
   <corr resp="#zz">...</corr>
</choice>

Thus if we have some differences of opinion as to what a correction
should be, we can handle those multiple suggestions using the P5
method.

As I look more at the <choice> element, the more I like it for
mastering purposes.


Of course, just imho.

Jon Noring


From Bowerbird at aol.com  Mon Oct 22 19:44:20 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 22 Oct 2007 22:44:20 EDT
Subject: [gutvol-d] nice weekend
Message-ID: <c02.2371c2ea.344eba04@aol.com>

what a nice weekend, blessedly free of t.e.i. posts in my spam folder...

i suppose this means the .tei guys have gotten their stuff figured out,
which is good, because maybe they'll make pudding instead of spam.       :+)

meanwhile, i've been working on my .zml-to-.pdf conversion.

yeah, .pdf still stinks, for the most part, but if i'm gonna do it,
i want to do it right.

you might recall that .zml has been able to create a nice-looking .pdf
for some time now.   indeed, if you go over and look at this page here:
>    http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/
you'll see that i showed you versions of a .pdf of "alice in wonderland"
way back in september of 2005.   two years ago!   seems like yesterday!

(ok, maybe not yesterday.   but it's hard to believe it was 2 years ago;
i can still remember how marcello squealed like a stuck pig about it.
then again, it was 3 years ago boston was down 3-0 to the yankees.)

anyway, at that time, i had not yet installed the nifty navigational links
that i consider to be crucial in an electronic-book.   i've now done that.
so now the .pdf doesn't just look good, it's also high-powered as well.

you can now download a demo .pdf, created using my p.g. test-suite:
>    http://z-m-l.com/go/suite/test-suite-demo.pdf

the z.m.l. file -- from which this .pdf was auto-generated -- is here:
>    http://z-m-l.com/go/suite/test-suite-demo.zml

you'll find, on page 2, a table of contents that is thoroughly hotlinked.
moreover, every chapter-heading links _back_ to this table of contents,
which means it's very easy for a reader to get an overview of the e-book.

in addition, every chapter-heading has links in the upper-corners that
will conveniently transport you to the previous/next chapter-heading...

i found this capability _so_ useful that i put these links on _every_ page.
when i read, i often look ahead -- to see where the next chapter starts --
so i know whether to finish up the current chapter, or stop where i am.
and the "jump to the next chapter" button comes in very handy for that...

the links in the bottom-corners transport you to the previous/next page.
so you can quickly and easily navigate a whole document with the mouse.

in addition, the "internal" links are installed as well.   so -- for instance 
--
when there is a back-reference in chapter 10 back to chapter 2, it is a
hotlink so you can just click on the back-reference to jump to chapter 2.
any reference to a chapter-heading automatically becomes a link in .zml.
(if you have a 2-part chapter-heading, _either_ part will satisfy this rule,
so you can have a reference to "chapter 2" or "the sections of the book")

i've also installed links for the footnotes, so you can jump back and forth
between the footnote anchors in the text and their footnotes at the end...
in addition, i've also used the .pdf "note" capacity to "pop up" the footnote
right on the page where it occurs, when you put your cursor over the note.

eventually i will offer users the ability to have the footnote actually 
printed
on the page itself, for when they want to print out hard-copy, but for now,
i think the double-functionality of links and pop-ups should be sufficient.

i also install links for each u.r.l. that's mentioned.   the adobe reader 
will
do this automatically, but only for an end-user who activates that option,
and some might have it turned off, so i have my converter do it as well...

i also install links to the u.r.l. for each picture, if that was given in the 
z.m.l.
(i have a few such pictures in the test-suite, so you can see how they work.)

which brings up the shortcomings of this demo.   the pictures are not yet
being positioned in an optimal way, so you will have to forgive that 
glitch...

also, i didn't take the time to figure out how to do text styling in this 
demo,
so the italics are being rendered as bold, and the spacing is kind of 
weird...

also, all my links have visible rectangles drawn around them, which is 
putrid.
(it should be an option that the end-user can turn 'em off or on, as 
desired.)

finally, i used helvetica for this demo.   some people _love_ helvetica, but
for the life of me, i don't know why.   i like sans serif, but helvetica is 
ugly,
so i apologize for offending your sensibilities if you can't stand helvetica.

***

for a long time now, the price for anyone wanting to buy the rights to z.m.l.
has been "six figures".   and i think it was a couple years back that i 
raised it
to $200,000 minimum.   now, with fairly solid conversions to .html and .pdf,
an offline-standalone authoring tool, and a to-be-announced-quite-soon
web-based authoring tool, plus viewer-apps, i'll be raising the price 
again...

as of november 1, 2007, the price for full rights to z.m.l. will be $350,000.
since this is just 10% of what amazon paid for mobipocket, it's a fair 
price...

preference will be given to buyers who will make the package open-source,
and such buyers can negotiate for a substantial discount, maybe up to 50%...

of course, you know, you could just figure it all out for yourself.   it's 
simple...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071022/3c2f6457/attachment-0001.htm 

From lee at novomail.net  Mon Oct 22 20:49:18 2007
From: lee at novomail.net (Lee Passey)
Date: Mon, 22 Oct 2007 21:49:18 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D3D4A.2080909@netronome.com>
References: <6310459816.20071020133137@noring.name>	<471D02C6.9010201@novomail.net>
	<471D3D4A.2080909@netronome.com>
Message-ID: <471D6F3E.30004@novomail.net>

La Monte H.P. Yarroll wrote:

> Lee Passey wrote:
>> Jon Noring wrote:
>>
>>> I plan to use the following for this:
>>>
>>> <choice>
>>>    <sic>...</sic>
>>>    <corr>...</corr>
>>> </choice>
>>>
>>> (and of course various attributes will be applied...)
>>>     
>> I think that is the correct way, and, as I understand it, the only 
>> acceptable way under P5.
> 
> That's a shame as I really like the asymmetry in <corr 
> sic="error">correction</corr> recommended in P4. The P5 solution is so 
> painfully verbose.

 From a purely aesthetic point of view, I tend to agree. But as some 
people on this list may recall, I believe that it is not only possible, 
but desirable, to create TEI files which can render natively in 
CSS-aware browsers, such as Opera and Firefox.

I haven't figured out how (if it's even possible) to get an attribute 
value to be displayed as part of the text using CSS, but with the new P5 
structure it should be possible to do:

sic { display:none } to see only the corrected text, or
corr { display:none } to see only the uncorrected text.

What's really kind of fun is that you could get output like

A thoroughly modem [sic] Millie

from

A thoroughly <choice><sic>modem</sic><corr>modern</corr></choice> Millie

by using this:

corr { display:none }
sic:after { content: " [sic]" }

so all in all I think the change is positive, even if not quite as 
elegant as the P4 version.


From jon at noring.name  Mon Oct 22 23:28:14 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 00:28:14 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D6F3E.30004@novomail.net>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net> <471D3D4A.2080909@netronome.com>
	<471D6F3E.30004@novomail.net>
Message-ID: <05211275.20071023002814@noring.name>

Lee wrote:

> I haven't figured out how (if it's even possible) to get an attribute 
> value to be displayed as part of the text using CSS.

This, I believe, is not possible with CSS.


It is usually (but not always) a bad idea to place content into an
attribute value. And, if I am interpreting the XML spec correctly (see
Sec. 3.1), for XML well-formedness we cannot have start or end tags in
an attribute value, either direct or indirect:

   "Well-formedness constraint: No < in Attribute Values

   "The replacement text of any entity referred to directly or
   indirectly in an attribute value MUST NOT contain a <."

   (That means, I believe, one may not use "<", "&lt;", "&amp;lt;",
   etc., in attribute values. Nor the numeric character entity
   equivalent to "<". No "<" in any form. Period. Thus, there is no
   way by XML well-formedness rules to include start/end tags in
   attribute values.)


The <choice> technique is actually very elegant, since it allows a
lot more flexiblity and power, such as choosing between multiple
corrections, including certain markup in the corrected text, and using
CSS for desired visualization.

Jon Noring


From marcello at perathoner.de  Tue Oct 23 01:50:31 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 23 Oct 2007 10:50:31 +0200
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D6F3E.30004@novomail.net>
References: <6310459816.20071020133137@noring.name>	<471D02C6.9010201@novomail.net>	<471D3D4A.2080909@netronome.com>
	<471D6F3E.30004@novomail.net>
Message-ID: <471DB5D7.7000304@perathoner.de>

Lee Passey wrote:

> I haven't figured out how (if it's even possible) to get an attribute 
> value to be displayed as part of the text using CSS

corr:after { content: " " attr(sic);
             text-decoration: line-through; color: red; }

The hard part is to do this using M$ browsers ...


-- 
Marcello Perathoner
webmaster at gutenberg.org


From piggy at netronome.com  Tue Oct 23 04:51:08 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Tue, 23 Oct 2007 07:51:08 -0400
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D6F3E.30004@novomail.net>
References: <6310459816.20071020133137@noring.name>	<471D02C6.9010201@novomail.net>	<471D3D4A.2080909@netronome.com>
	<471D6F3E.30004@novomail.net>
Message-ID: <471DE02C.2040905@netronome.com>

Lee Passey wrote:
> La Monte H.P. Yarroll wrote:
>
>   
>> Lee Passey wrote:
>>     
>>> Jon Noring wrote:
>>>
>>>       
>>>> I plan to use the following for this:
>>>>
>>>> <choice>
>>>>    <sic>...</sic>
>>>>    <corr>...</corr>
>>>> </choice>
>>>>
>>>> (and of course various attributes will be applied...)
>>>>     
>>>>         
>>> I think that is the correct way, and, as I understand it, the only 
>>> acceptable way under P5.
>>>       
>> That's a shame as I really like the asymmetry in <corr 
>> sic="error">correction</corr> recommended in P4. The P5 solution is so 
>> painfully verbose.
>>     
>
>  From a purely aesthetic point of view, I tend to agree. But as some 
> people on this list may recall, I believe that it is not only possible, 
> but desirable, to create TEI files which can render natively in 
> CSS-aware browsers, such as Opera and Firefox.
>
> I haven't figured out how (if it's even possible) to get an attribute 
> value to be displayed as part of the text using CSS, but with the new P5 
> structure it should be possible to do:
>
> sic { display:none } to see only the corrected text, or
> corr { display:none } to see only the uncorrected text.
>
> What's really kind of fun is that you could get output like
>
> A thoroughly modem [sic] Millie
>
> from
>
> A thoroughly <choice><sic>modem</sic><corr>modern</corr></choice> Millie
>
> by using this:
>
> corr { display:none }
> sic:after { content: " [sic]" }
>
> so all in all I think the change is positive, even if not quite as 
> elegant as the P4 version.
>   
OK, I'm convinced.

Anybody care to create CSS which will show the <sic> in a <choice> as 
mouse-over on the <corr>?


From jon at noring.name  Tue Oct 23 08:03:26 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 09:03:26 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471DB5D7.7000304@perathoner.de>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net> <471D3D4A.2080909@netronome.com>
	<471D6F3E.30004@novomail.net> <471DB5D7.7000304@perathoner.de>
Message-ID: <159627520.20071023090326@noring.name>

Marcello wrote:
> Lee Passey wrote:

>> I haven't figured out how (if it's even possible) to get an attribute 
>> value to be displayed as part of the text using CSS

> corr:after { content: " " attr(sic);
>              text-decoration: line-through; color: red; }

Wow, thanks! My prior comment on this was wrong. I should have checked
the CSS spec before saying it could not be done. I do know there are
some things regarding value transfer between XML and CSS that cannot be
done.


> The hard part is to do this using M$ browsers ...

I tested the above, and indeed IE does not recognize this CSS.
However, the CSS works like a charm in Firefox and Opera.

If the idea is to use CSS for visualizing TEI documents during the
authoring process, where we don't care about end-user presentation,
and don't care that we can't enable image embedding and hypertext
links, then the fact that we can't use IE does not really matter.

Jon Noring


From lee at novomail.net  Tue Oct 23 08:36:21 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 23 Oct 2007 09:36:21 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471DB5D7.7000304@perathoner.de>
References: <6310459816.20071020133137@noring.name>	<471D02C6.9010201@novomail.net>	<471D3D4A.2080909@netronome.com>	<471D6F3E.30004@novomail.net>
	<471DB5D7.7000304@perathoner.de>
Message-ID: <471E14F5.5060409@novomail.net>

Marcello Perathoner wrote:

> Lee Passey wrote:
> 
>> I haven't figured out how (if it's even possible) to get an attribute 
>> value to be displayed as part of the text using CSS
> 
> corr:after { content: " " attr(sic);
>              text-decoration: line-through; color: red; }

Thank you.

> The hard part is to do this using M$ browsers ...

Well, I think we all know what the solution to /this/ problem is ... :-)

-- 
Nothing of significance below this line.


From lee at novomail.net  Tue Oct 23 10:18:50 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 23 Oct 2007 11:18:50 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <447877789.20071016105253@noring.name>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
	<447877789.20071016105253@noring.name>
Message-ID: <471E2CFA.5060708@novomail.net>

Jon Noring wrote:

> Now, granted, I had not thought of this situation, even though I am
> aware of it, since in *so many* PG (X)HTML texts I've looked at,
> &nbsp; is rampantly being abused, such as for indentation of
> paragraphs and verse lines, etc. It's better to see &nbsp; being used
> only for keeping words together rather than forcing spacing in visual
> presentation (since that is its purpose.) Yet, &nbsp; is still
> something I am not fond of using in virtually any circumstance,
> especially in that in most instances there is a markup solution, as
> illustrated above.

Playing with My ?ntonia, I notice you have left the original typography 
in place with regards to contractions, e.g. "do n't". I find that 
frequently this will cause the "do" to end one line, and "n't" to start 
the following line. Don't you think this is an appropriate use for 
&nbsp; (e.g. "do&nbsp;n't")? After all, the nbsp stands for Non-Breaking 
SPace, which seems to be exactly what it is being used for here.

-- 
Nothing of significance below this line.


From ralf at ark.in-berlin.de  Tue Oct 23 09:50:48 2007
From: ralf at ark.in-berlin.de (Ralf Stephan)
Date: Tue, 23 Oct 2007 18:50:48 +0200
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471DB5D7.7000304@perathoner.de>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net>
	<471D3D4A.2080909@netronome.com> <471D6F3E.30004@novomail.net>
	<471DB5D7.7000304@perathoner.de>
Message-ID: <20071023165048.GA25910@ark.in-berlin.de>

Marcello Perathoner wrote 
> Lee Passey wrote:
> > I haven't figured out how (if it's even possible) to get an attribute 
> > value to be displayed as part of the text using CSS
> 
> corr:after { content: " " attr(sic);
>              text-decoration: line-through; color: red; }

Would that be acceptable behaviour for a TEI file for PG?

I'm just in the process of providing a default stylesheet
with the TEI file skeleton that is downloadable on DP project
pages, and I'm asking myself if this should go in.


ralf


From jon at noring.name  Tue Oct 23 11:03:29 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 12:03:29 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471D02C6.9010201@novomail.net>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net>
Message-ID: <1963269351.20071023120329@noring.name>

Lee wrote:
> Jon Noring wrote:

>> Anyway, one thing that interests me are your reasons for placing the
>> sole note in My Antonia where you did in the text, at the end of the
>> chapter it is referenced.
>>
>> I've been going through the pros and cons of three placements:
>>
>> 1) Inline at the point of occurrence,
>>
>> 2) At the end of the division they are referenced in (which you did)
>>
>> 3) At the end of the document in a stand-alone "note dump" section
>>   (which for books with many notes would collect all the notes in one
>>   place.)

> When I placed the note at the end of the chapter, I did it primarily
> to make the file display nicely when using CSS. I'm sure you
> understand CSS much better than I do, but I couldn't figure out any
> way, using CSS, to move a note from an inline presentation. The one
> thing we /don't/ want is for the note to be displayed at the same
> place it is referenced.

I've experimented using CSS in yanking and moving inline text to
elsewhere in the document (such as to the left or right of the
containing paragraph using the "float" property) during browser
presentation, and it's not really worked to my satisfaction. (See CSS
2.1, Section 9: Visual formatting model, for the grab-bag of CSS
properties that may be used: http://www.w3.org/TR/CSS21/visuren.html )

One can, of course, present the inline note as its own "block"
dividing the main flow of the text (and differentiate it from the
main, use "display: block;"). One can even handle multiple levels of
notes (e.g., a note to a note.) Lee, I sent you a demo of doing this.
It works quite nicely in Opera and Firefox, but IE wants to generate a
new paragraph after the inline note is presented in its own block.

Obviously for end-user presentation this is not a good way to present
inline annotations, but for visualization to aid in the document
markup process, it works quite well.

To me, the only reason to apply CSS to the TEI master is for
visualization during document markup and not for end-user purposes
since we cannot embed images (other than forcing it by custom CSS
which is sort of silly to do, in my opinion) and more importantly for
hypertext links for tables of contents and referenced annotations
(with Firefox we can enable hypertext links using XLink, but then
that's not "vanilla" TEI markup -- we'd have to introduce xlink:
namespace stuff into our TEI master documents -- not sure we want to
go that direction. And we'd be restricted to Firefox browsers at the
moment: Opera nor IE support any XLink -- Firefox support is only
for hypertext links and not embedded images.)


> Your questions prompted further reflection on my part, however. I
> asked myself, what is the purpose of footnotes/endnotes, and how are
> they traditionally presented.

And this is a good thing to understand. :^)


> It seems to me that generally there are two kinds of footnotes,
> explanatory footnotes and bibliographic footnotes. Bibliographic
> footnotes are used to contain a reference to the source of a quotation
> or viewpoint, or to further useful information pertaining to the subject
> in the main text. Explanatory footnotes are used for additional
> information or explanatory notes that might be too digressive for the
> main text, as an alternative to a parenthetical comment.

Yes, it does seem like *referenced* annotations come in two basic
flavors as Lee noted. An annotation may also mix the two. (I use
the word "annotation" loosely here since it may include only a
bibliographic reference -- maybe "amplificatory" is a better word?)

I'm not sure we need to differentiate between the two in mastering,
however, unless we plan, upon conversion to an end-user format, to
separate the two types from each other so they are presented to the
end-user in "different areas". But I'll try not to become religious
on this issue until I hear more pros and cons. <smile/>

Hopefully DP folk will let us know of some books which use funky/
mixed/multiple ways to handle referenced annotations. It's the
"outliers" that may help in decision-making when we have multiple
options.


> Footnotes are notes of text placed at the bottom of a page in a book
> or document. Endnotes are similar to footnotes, but differ in that
> rather than appearing at the foot of the particular page, they are
> collected together at the end of the chapter or at the end of the
> work. Rarely do you see notes actually embedded in the text; when
> you do they are called parenthetical expressions and are delimited
> from the surrounding text by parentheses.

Summarizing what Lee said, we have essentially three general places
where referenced annotations may occur in the original paper source:

1) Inline at the exact point or range of reference.

2) On the page the annotation is first referenced but not at the point
   of reference (an annotation of course can be referenced more than
   once.) Two general places we find them:

   a) Footnotes area.

   b) Sidebar area (right and/or left)

3) Gathered at the end of some document hierarchical level in which
   the references occur. Could be the same division, or as high as
   the top level of the book (the whole book -- here called endnotes.)

Of course, some books may use any two or all three of these general
locations to place referenced annotations.


> In print documents endnotes are considered more inconvenient than
> footnotes because of the need to move back and forth between the
> main text and the endnote section. I think it is for this reason
> that explanatory notes are typically presented as footnotes, so you
> can quickly glance at the note while still in the main text, and
> endnotes are typically bibliographic notes.
>
> In electronic documents, the practical distinction between footnotes
> and endnotes becomes less important. Assuming you have implemented
> the note with reciprocal pointers, it doesn't matter where in the
> file the note appears, except that you /don't/ want the note to
> appear in the main text where it will disrupt the flow of the text.

This is an important point. For mastering purposes, one can certainly
describe where an annotation was placed (for those wishing to create
a facsimile reproduction), but the placement is almost always itself
arbitrary and is not part of the content itself. It is usually a
decision of the publisher/typesetter as to where to place annotations
based on readability/usability for the *particular paper artifact*,
the "Manifestation" of the "Expression", being produced (see below for
a better explanation and reference to Manifestation and Expression.)


> In HTML there are a couple of ways to achieve this. You could
> collect all the notes at the end of the main file, or in a separate
> notes file. Each reference in the main text would point to a note
> using "href='#xxx'", and each note would > point back to the
> reference in the main text.

Usually an annotation is referenced only once from the main text, but
we do find that annotations may be referenced multiple times in a
book. In this case, when I've implemented "reference back" in XHTML,
I've always pointed back to the first reference occurence.


> Alternatively, each reference in the main text could contain a
> "title" attribute which most browsers display when the mouse cursor
> "flies over" the reference. Or, the two options can be combined.

Lots of things can be done in the end-user renditions, but the question
is what makes sense for the TEI master. Of course, making it easier to
produce the most important end-use renditions may factor into what is
done for the master (when we have multiple options for what we can do
in the master, we then invoke various requirements to decide which
among the various options to implement.)


> Given the ease of navigating between the main text and the notes in
> electronic documents, there doesn't seem to be much need to try to
> force the note text onto the same screen "page" as the referring
> text, unless the electronic format is simply a precursor to a
> printed document, as with PDF.

Agreed! The placement of referenced annotations is almost always (if
not "always always") totally arbitrary in the original source book.
For mastering we can record placement for those who want that
information, and let those who produce end-user renditions to decide
how to handle them (and whether or not to use the original placement
information.)

In essence, referenced annotations we find in old books is a primitive
form of hypertext linking.


> So why would you ever place the note inline at the place of the
> reference? Well, I think this brings us to the past/future
> dichotomy I mentioned earlier.

Since Lee and I seem to agree that the placement of referenced
annotations in the TEI master document is pretty much arbitrary, where
we place them in the master then depends upon other requirements.


> Most commonly TEI is probably used as a transcription markup; that
> is, TEI is used to transcribe an existing printed work into an
> electronic format. But it is also possible to use TEI as a text
> mastering format. How certain TEI elements should be interpreted
> depends on which of these two uses you have chosen.

Agreed.

My philosophical view of where to focus is based upon the Work-
Expression-Manifestation-Item FRBR group 1 entities:

   http://en.wikipedia.org/wiki/FRBR

(I use the acronym WEMI as a sort of memory aid. It rhymes with
"hemi". <smile/>)

I believe for maximum repurposeability and usability, the TEI digital
master must capture the Expression, not the Manifestation. Once we
free ourselves from the (almost always) arbitrary layout of a
particular physical book (the Manifestation), that makes it clearer
what is and is not important to capture in the TEI Master, and how to
do so. I think this is where many of us have differences of opinion,
since several here are interested for the digital master to capture
the Manifestation of the source book. From what Lee has said, I
think we agree the focus should be on the Expression for maximum
usability of the digitized texts.

Now, I am NOT hostile to those who wish to produce facsimile
reproductions (at any level of exactness), but that must not drive the
markup of the Master. So long as we preserve original page scans,
those who wish to take the TEI Master and produce some sort of
facsimile reproduction are certainly encouraged to do so! I view
the facsimile reproduction as not a "master", but an "end-user"
rendition.


> For example, consider the <lb/> element which, according to the P5
> guidelines, "marks the start of a new (typographic) line in some
> edition or version of a text."

At this time I plan to use <lb/> in test mastering as a mark for a
typographic "end of line" in the original, with no statement or
inference as to whether there's any meaning to the break -- that's
for other markup to flesh out. So this is a deviation from the P5
meaning of <lb/>. Maybe I should use <milestone> instead with a
custom value if I want to flag typographic EOL ("tEOL"?)


> You have indicated that in your "faithful" edition of My ?ntonia you
> want to maintain a record of the original line breaks. For this you
> would use the <lb/> element.

I preserved the exact spot (including within hyphenation) for
typographic EOL. This is done for four primary reasons:

1) For purposes of future OCR and proofing work on the texts -- we'd
   like to know the exact spot of each EOL.

2) References found in the "old" literature which may refer not only
   to a page but to a line number on the page.

3) To aid those who wish to produce an *exact* facsimile reproduction.
   If we did not have this information, they would have to reinsert
   these EOL markers, which would be a *lot* of work. Since we will
   have this info from the OCR and/or key entry stages, why throw it
   away?

4) To record hyphenated EOL and the decision made on whether the
   resulting re-joined word includes a hard hyphen or not. (This is
   one of the few "editorial" decisions that needs to be made in
   producing the master -- to infer the original text -- and we could
   get some of these wrong the first time, so being able to know where
   these EOL hyphenations occur is important for future fixes of the
   master.)


> But when presenting your edition you certainly don't want the User
> Agent to display these line breaks, unless the end user has
> explicitly declared that this is what s/he wants. On the other hand,
> in most PGTEI editions of other works the <lb/> element is used to
> indicate where line breaks should occur when presenting the current
> editions. In other words, in your document <lb/> is an indication of
> where line breaks appeared /in the past/ whereas in PGTEI editions
> <lb/> is an indication of where line breaks should appear /in the
> future/.

Yes! Good observation. My purpose for preserving source EOL points
is for the previously mentioned four reasons -- only one of which
is for rendition use -- and only provided because it is easy for us
to do (since we will preserve it anyway for the other cited reasons.)

For repurposeability, for lines which were broken in the source work
for reasons *other* than simple typography, it is better to use the
appropriate structural/semantic markup to describe why we may want,
in most end-user renditions, to break the line at such a point. There
is a reason "why", and we should mark it up. In many DP produced
texts, they simply force a line break on presentation without saying
why, and this is troubling since such line breaks will oftentimes make
the presentation needlessly look awful on some platforms. Forcing
line breaks in presentation is something that should not be taken for
granted.


> This same analysis applies to the presentation of a Title Page,
> which is one of the bones of contention on the Gutenberg discussion
> list. The <titlePage> element is used to transcribe how a title page
> appeared /in a past edition/, whereas the <divGen> element is used
> in PGTEI to create a new, standardized title page /in a future
> edition/.

Yes, I view the Title Page in nearly all books to be part of the
Manifestation, but not the Expression. The Title Page includes
important metadata which must be preserved, but in a way which is
machine processible as metadata.

Personally, in TEI mastering, I would not even include a title
page nor even a <divGen>, since we will preserve the original page
scan anyway. Rather, for markup visualization purposes only (not
end-user use), I'd simply use CSS to present the metadata as a sort of
"title page" information. It's when the TEI master is converted to
end-user formats that a title page will be created (as necessary),
and each target format has its own requirements which we cannot
a priori predict. This is why I am not enamored with the extra
effort it takes to produce some sort of "title page" in the TEI
Master -- it's sort of useless imho.

(I really don't like using <divGen> at all in TEI mastering. Maybe
Marcello and Lee can provide some very useful reasons that it may be
used for some purposes.)


> Returning to the problem of notes, if your intended use is to master
> a future edition you may want to embed the note at the point in the
> main text where the reference occurs. When you're actually composing
> the text, it's at that point during composition that the explanation
> or reference is close at hand. When you're editing or maintaining
> the text having the actual note embedded in the main text will help
> to be sure that edits to the text will not invalidate the reference
> or require alteration of the explanation. When the file is
> transformed into a presentation format the note can be moved to
> wherever is most appropriate for that particular format. During the
> transformation a new intra-document reference will have to be
> created because the relationship between an in-line note and its
> context is only implicit.

These are good points, and in my opinion I am still intrigued with
embedding all referenced annotations at the point in the text they are
referenced.

But I do see Lee's point that we must add xml:id to each note so when
the master is repurposed, there will exist a standardized ID for each
note for both intra- and inter-publication linking purposes.

Furthermore, we should add markup to declare the point or range in the
main text which references the annotation, and such markup would
specify the equivalent of IDREF (this would also be used when we have
one annotation which is referenced in multiple spots in the text.)
Don't know what the appropriate TEI markup would be for this.

(upon rereading this, Lee mentioned the <ref> element, more below.)


> On the other hand, if you are transcribing a work from an existing
> edition, and alteration of the text is not foreseen, I don't see
> how the justifications for creating in-line notes apply. One of the
> downsides to using embedded notes is that if you try to view the
> document using an appropriate Cascading Style Sheet the note text
> will remain visible in the middle of the noted text in the displayed
> document; yet moving text which is too digressive is exactly why an
> author or publisher used a footnote in the first place. We want our
> notes to be stored somewhere where they can be easily accessed, but
> only when we choose to do so, and displayed in a manner which will
> not disrupt the flow of the main text.

My next favorite location in the TEI Master (not end-user renditions
necessarily) to place all the book's annotations is to collect them all
at the end of the book in a special "end notes" section. We now have
them all in one place, and from my work with a few books that contain
hundreds of annotations, collecting them in one place has great
benefit to the authoring process.


> Another of the problems of the using in-line notes, at least as you
> have used them (from my perspective) is that they imply a reference
> without explicitly creating one. This may just be my irrational
> bias, but I get really nervous with implied content; everything that
> can be made explicit, should be.

I agree. See my comment earlier on the need to add referencing markup.


> For example, even if you were to leave the note in-line, you should
> probably include a reference at that point, making the linkage
> between the main text and the note explicit, e.g. [I redid this
> markup a little to make it easier to see]:

**********************************************************************
<p id="p0016">I first heard of ?ntonia

<ref xml:id="ref1" type="note" target="#fn1>[*]</ref>

<note xml:id="fn1" place="foot" target="#ref1>

   <p>The Bohemian name <name rend="font-style: italic">>?ntonia</name>
    is strongly accented on the first syllable, like the English name
    <name rend="font-style: italic">Anthony</name>, ... </p>

</note>

blah, blah, blah...

</p>
**********************************************************************


> (As an alternative, you may want to omit a note marker and use the noted
> text itself, e.g.:

**********************************************************************
<p id="p0016">I first heard of

<ref xml:id="ref1" type="note" target="#fn1>>?ntonia</ref>

<note xml:id="fn1" place="foot" target="#ref1>

   <p>The Bohemian name <name rend="font-style: italic">>?ntonia</name>
    is strongly accented on the first syllable, like the English name
    <name rend="font-style: italic">Anthony</name>, ... </p>

</note>

blah, blah, blah...

</p>
**********************************************************************

I *like* the second above since I believe, for TEI Mastering, the
kinds of note markers used in the source book (with a few rare
exceptions) is NOT important at the Expression level. We should try to
record what the original marker was, but try avoid putting it into
content itself since it is NOT content of the Expression.

Thus, we should let the conversion system add the appropriate
referencing markers, numbers or letters, as needed.


> Likewise, I feel that anytime there is a risk of confusion between
> "in the past" uses of the document and "in the future" uses,
> something needs to be added to the markup to make the use explicit.
> For example, every <lb/> element that is intended to force a line
> break in all future presentations should have some indication that
> it is more that the simple description of the presentation of the
> source text that the guidelines envisioned; perhaps something like
> <lb ed="all"/> or <lb rend="display"/>.

Well, as noted above, we should do our absolute best to markup 'why'
a line break is to occur there in end-user renditions. I personally
believe we can cover 99.9% this way (mostly for verse.) If we still
gotta have a forced line break somewhere and can't mark it up
semantically, we could consider using the <milestone> element with a
"standardized" value that is ours and unambiguous as to meaning.


> Given the purposes and use of footnotes, I think I have concluded
> that for TEI encoding of notes in transcribed texts I would follow
> these guidelines:

Hmmm, so far I am not convinced in separating bibliographic types of
annotations from exposition types. Given that I remain unconvinced,
and I could be persuaded differently, then here's the order of where
I'd place the referenced annotations in the TEI master:

1) At the point of reference.

2) In a special end-notes section which collects them all in one place.

3) At the end of the division where the annotations are *first*
   referenced (note that an annotation may be referenced multiple
   times.)

#1 and #2 are pretty close in my mind, and with #2 we can get quite
powerful CSS visualization, plus we now have them all collected so
conversion is made easier for end-user formats where having all the
annotations collected makes sense (e.g., Microsoft LIT -- which I
do in my commercial version of Burton's "Kama Sutra".)

And I do agree with Lee that no matter what, each annotation must
carry an "xml:id" and that we include a <ref> at the point or range
where the annotation is referenced from, and includes an IDREF. In
addition, I would NOT place at the point of reference the original
notemarker, or any other notemarker, in the *content* of the TEI
master, but do believe we should record the original note marker
within an attribute value in <ref>, however that may be done (not
in the note.)


Jon Noring


From jon at noring.name  Tue Oct 23 11:17:21 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 12:17:21 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <471E2CFA.5060708@novomail.net>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
	<447877789.20071016105253@noring.name> <471E2CFA.5060708@novomail.net>
Message-ID: <1461317943.20071023121721@noring.name>

> Jon Noring wrote:

>> Now, granted, I had not thought of this situation, even though I am
>> aware of it, since in *so many* PG (X)HTML texts I've looked at,
>> &nbsp; is rampantly being abused, such as for indentation of
>> paragraphs and verse lines, etc. It's better to see &nbsp; being used
>> only for keeping words together rather than forcing spacing in visual
>> presentation (since that is its purpose.) Yet, &nbsp; is still
>> something I am not fond of using in virtually any circumstance,
>> especially in that in most instances there is a markup solution, as
>> illustrated above.

> Playing with My ?ntonia, I notice you have left the original typography
> in place with regards to contractions, e.g. "do n't". I find that 
> frequently this will cause the "do" to end one line, and "n't" to start
> the following line. Don't you think this is an appropriate use for 
> &nbsp; (e.g. "do&nbsp;n't")? After all, the nbsp stands for Non-Breaking
> SPace, which seems to be exactly what it is being used for here.

Hmmm, yes, this might be a place to use &nbsp;. I'll see if, in the
original source text, an EOL never occurs in the use of these
contractions.

Anyway, using &nbsp; for purposes other than "requesting" the user
agent not put a linebreak at the point, should not be allowed in the
TEI Master.

Jon Noring


From jon at noring.name  Tue Oct 23 11:41:09 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 12:41:09 -0600
Subject: [gutvol-d] it's good to see the .tei people
In-Reply-To: <1461317943.20071023121721@noring.name>
References: <3027016.1192552089449.JavaMail.?@fh1035.dia.cp.net>
	<447877789.20071016105253@noring.name> <471E2CFA.5060708@novomail.net>
	<1461317943.20071023121721@noring.name>
Message-ID: <19210344480.20071023124109@noring.name>

Lee wrote:

> Playing with My ?ntonia, I notice you have left the original typography
> in place with regards to contractions, e.g. "do n't". I find that
> frequently this will cause the "do" to end one line, and "n't" to start
> the following line. Don't you think this is an appropriate use for
> &nbsp; (e.g. "do&nbsp;n't")? After all, the nbsp stands for Non-Breaking
> SPace, which seems to be exactly what it is being used for here.

As a followup to my prior reply, some here may be interested in the
summary regarding the use of white space, the Unicode space
characters, line breaking and the use of the soft hyphen in XML
documents -- a sort of grab-bag of interrelated topics:

   http://www.openreader.org/spec/bnd10.html#sec3.3.7


Especially note a couple references that add insights into these
topics:

1. Unicode in XML and other Markup Languages:

   http://www.w3.org/TR/unicode-xml/

2. Section 6.2 of the Unicode 4.0 Standard (PDF):

   http://www.unicode.org/versions/Unicode4.0.0/ch06.pdf

3. And the easy-to-understand (NOT) Unicode Standard Annex #14 --
  Line Breaking Properties:

   http://www.unicode.org/unicode/reports/tr14/


Jon


From marcello at perathoner.de  Tue Oct 23 11:56:19 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 23 Oct 2007 20:56:19 +0200
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <20071023165048.GA25910@ark.in-berlin.de>
References: <6310459816.20071020133137@noring.name>	<471D02C6.9010201@novomail.net>	<471D3D4A.2080909@netronome.com>
	<471D6F3E.30004@novomail.net>	<471DB5D7.7000304@perathoner.de>
	<20071023165048.GA25910@ark.in-berlin.de>
Message-ID: <471E43D3.0@perathoner.de>

Ralf Stephan wrote:

> Marcello Perathoner wrote 
>> Lee Passey wrote:
>>> I haven't figured out how (if it's even possible) to get an attribute 
>>> value to be displayed as part of the text using CSS
>> corr:after { content: " " attr(sic);
>>              text-decoration: line-through; color: red; }
> 
> Would that be acceptable behaviour for a TEI file for PG?

As already said, no version of IE supports these standard CSS 2
declarations: you won't see anything on IE.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From Bowerbird at aol.com  Tue Oct 23 12:13:16 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 23 Oct 2007 15:13:16 EDT
Subject: [gutvol-d] early results of radiohead's experiment
Message-ID: <bcb.1780a102.344fa1cc@aol.com>

perhaps you've been wondering about radiohead's experiment?

>    http://mashable.com/2007/10/19/radiohead-album-sales/
>
>    Radiohead, which offered its latest album as free downloads last week, 
>    has seen 1.2 million downloads of ?In Rainbows.? With no label, 
>    no promotions, and direct access to fans, Radiohead   gave up its music 
>    for free and asked for donations, whatever fans deemed reasonable, 
>    in return. What the band got was an average of $8 per album sold, 
>    bringing estimates of profit to about $10 million. Not too shabby 
>    for one week. The number of albums sold in the past week 
>    exceeded the launch week sales of its three previous albums combined.

even if i'm not sure i believe they got "an average of $8" over _all_ of 
those
1.2 million downloads, an average of even half that much would be great,
when you consider distribution costs were minimal, and their cut is 100%...

especially when you factor in the benefits they derive from the bonding
of their fans because of this measure of generosity, a long-term benefit.

ironically, the fans got something over and above a digital download too,
the warm feeling that they have supported a band they love, in a way that
making a buy of a recording-company product probably never matched...

this is truly the model of the future.   and no, when it's an everyday thing,
it won't have the big impact that it had here -- as a novelty occurrence --
but the development of this methodology as "the normal course of music"
will indeed exert a tremendous influence on 21st-century human relations.

the virtually-zero cost of reproducing digital goods will allow a generosity
of spirit to emerge that would have been impossible to engineer in the age
of physical goods, with their comparatively expensive nature of reproduction.

which is not to say that physical goods will disappear.   indeed, radiohead 
is
likely to make a good deal of money from sales of the "hard-copy" versions.
but that $80 package will rightly be seen as a "souvenir" of the _free_ 
music...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071023/2d1a0e3e/attachment.htm 

From jon at noring.name  Tue Oct 23 12:14:13 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 13:14:13 -0600
Subject: [gutvol-d] placement of the note in the TEI of My Antonia
In-Reply-To: <471E43D3.0@perathoner.de>
References: <6310459816.20071020133137@noring.name>
	<471D02C6.9010201@novomail.net> <471D3D4A.2080909@netronome.com>
	<471D6F3E.30004@novomail.net> <471DB5D7.7000304@perathoner.de>
	<20071023165048.GA25910@ark.in-berlin.de> <471E43D3.0@perathoner.de>
Message-ID: <1919890506.20071023131413@noring.name>

Marcello wrote:

> corr:after { content: " " attr(sic);
>              text-decoration: line-through; color: red; }
>
> As already said, no version of IE supports these standard CSS 2
> declarations: you won't see anything on IE.

If the purpose of using this CSS is for visualization of a TEI master
for authoring and maintenance purposes, then we don't care about IE
support so long as one browser supports these properties. It turns out
we have both Opera and Firefox that work with this CSS. Good enough
for me. LOL.

Jon Noring


From Bowerbird at aol.com  Tue Oct 23 14:21:08 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 23 Oct 2007 17:21:08 EDT
Subject: [gutvol-d] nice weekend
Message-ID: <c27.23dce9b6.344fbfc4@aol.com>

here are a few follow-up thoughts on .zml generation of .pdf files...

.pdf gives z.m.l. entry into the world of handheld reader-machines,
including the sony reader, the iliad, and -- of course -- the iphone.

of course, these machines also support .html, but there are people
who are more comfortable with the paged presentation of the .pdf.

and, realistically, since these machines have non-resizable screens,
the frozen-page nature of .pdf is _not_ a deficit; it's even a benefit.

however, my present web approaches work _dandy_ on my iphone,
after a few u.i. tweaks (e.g., enlarging the buttons for my fat fingers).

it's even workable to read a _scan-set_, as i show with "my antonia":
>    http://www.z-m-l.com/go/iphone/myantf007.html
to go forward to the next scan, click on the right 3/4ths of the page;
and to go back one, click the left quarter.   ignore the small buttons...

(thank you to jon noring for his good job of scanning and cropping!
each page of the book displays clearly and readable on the iphone.)

i'm likely to make a native app for the iphone when the developer kit
comes out next february, since i'm an apple/iphone person, but still,
it's nice to know i don't _have_ to, that -- already -- "it just works"...

-bowerbird

p.s.   you could also download the zipped scan-set of "my antonia" --
http://z-m-l.com/go/myant/myant.zip -- and load 'em to your ipod, 
then use the nifty flicking motion to thumb through all of the images.


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071023/382ffa87/attachment.htm 

From sly at victoria.tc.ca  Tue Oct 23 18:47:29 2007
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue, 23 Oct 2007 18:47:29 -0700 (PDT)
Subject: [gutvol-d] Canadian Public Domain music scores site taken town
Message-ID: <Pine.GSO.4.58.0710231846020.22363@vtn1.victoria.tc.ca>


For those interested in Copyright, cease and desist orders, etc.

Geist, Knopf and Universal Edition
http://www.p2pnet.net/story/13749

Universal Edition AG Forces Public Domain Website Offline
http://www.slyck.com/story1603_Universal_Edition_AG_Forces_Public_Domain_Website_Offline

Andrew

From Bowerbird at aol.com  Tue Oct 23 19:01:28 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 23 Oct 2007 22:01:28 EDT
Subject: [gutvol-d] Canadian Public Domain music scores site taken town
Message-ID: <c5c.1b80c936.34500178@aol.com>


michael hart has already come to the rescue...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071023/c7b3d5ca/attachment.htm 

From jon at noring.name  Tue Oct 23 22:44:04 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 23 Oct 2007 23:44:04 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <c27.23dce9b6.344fbfc4@aol.com>
References: <c27.23dce9b6.344fbfc4@aol.com>
Message-ID: <1493909058.20071023234404@noring.name>

Bowerbird wrote:

>  (thank you to jon noring for his good job of scanning and cropping!
>  each page of the book displays clearly and readable on the iphone.)

Thank you for the kind words!

Those who have followed the "Distributed Scanners" YahooGroup, which
was (and still is) intended to discuss the feasibility of starting a
distributed effort at scanning books, know that I tend to fall on the
side of rigor when scanning books. If one gets only one chance at
scanning a book, best to err on the side of overkill rather than
rushing the job through...

   (see: http://groups.yahoo.com/group/distscan/ )

My scan of an original 1st edition of "My Antonia" was done at 600 dpi
and 24-bit color depth (essentially full color). The book was
carefully disassembled and the edges cut so each page could be truly
flat on the flatbed scanner glass. I was pretty rigorous at keeping the
glass clean. Each scanned image was saved in lossless png (typical
compression is about 50% compared to uncompressed bitmap.)

After scanning was done, I then used Paint Shop Pro (DO NOT USE ANY
IMAGE PROCESSING SOFTWARE USED IN OCR PACKAGES) to deskew each page
scan using a *true* image rotation tool (won't explain why this is
important, but trust me, not all deskewing algorithms use true image
rotation.) I then cropped each image to be the same x-y pixel size
(2210x3716) and cropped so as to align the text on all the pages
similarly as best I could. The result was again saved in 600 dpi,
24-bit lossless PNG.

For an example (page 311) of a scanned page image which has been
processed as just mentioned, see:

   http://www.openreader.org/myantonia/orig-pagescans/311.png

(warning, 14 meg file!)

I learned a lot from scanning a few books, and in the future (when I
get a new flatbed scanner) will rescan "My Antonia" following this
partial list of practices (I've kept the disassembled book saved in a
ziploc bag to keep it from collecting any more dust than it has):

1) All pages with print will be scanned at 600 dpi, 24-bit color.
   Those with graphics/images, and the title page, will be done at
   1200 dpi, 24-bit color.

2) I will scan a color calibration chart at both the beginning and the
   end of the job. These images will become part of the book scan set.

3) Each paper page will be thoroughly cleaned to remove as much dust
   as I can. How I'm not sure, so ideas here are welcome (I'd love a
   machine where one simply inserts the page and it comes out the
   other end thoroughly cleaned of all dust.) Of course, the glass
   will also be religiously kept clean.

4) Each scanned image will be saved in lossless format (likely PNG
   since it is an open standard.)

5) This time I will save the raw page scans before they are deskewed
   and cropped. The raw set will be the "raw masters", and the
   deskewed and cropped set the "normalized masters". Of course,
   deskewing will be done using a true rotation algorithm in a
   professional-grade image processing application.

6) Proper filenaming will be done (won't go into specifics here).

7) All scan sets will be burned to multiple DVD and distributed to
   various archives, including the Internet Archive. Other methods of
   storage will also be experimented with. Redundancy and distribution
   of the raw master is important for long-term preservation.

There's a few other "best practices" I will also do in the scanning,
but the above are the major ones.


Now, someone will say why the fuss?

A variety of reasons:

1) For the great Works in the Public Domain, we *should* produce
   archival-quality scan sets of the "canonical" printings of those
   Works. The 1st Edition of "My Antonia" which I possess falls into
   this special category. Of course, I believe all books should be
   scanned with the same rigor, but realistically this is unlikely to
   happen unless I can find an army of people who believe as I do in
   "doing it ueber-right." (Anyone here willing to do this for the
   "canonical" printings of the great Works in the Public Domain, let
   me know.)

2) 600 dpi creates scans which are quite highly readable "as is" and
   for image processing has a lot more information to get good results
   when downsized and processed in various ways.

3) 24-bit provides a lot of useful information. Not only does it
   preserve the natural look of the page, but it provides 3 channels
   (RGB) that we can play with for image processing *and* OCR. I'm
   intrigued of running OCR on each channel and comparing. In
   addition, for things like rotation and resampling, this higher
   color depth will give more accurate results.

4) For each color channel (which effectively is 256 gray scale), we
   can also convert each page to 2 color using a variety of thresholds,
   OCR each one, and do comparison of the results. Starting with high
   quality, high resolution images should give us better results.


Anyway, thanks again to Bowerbird's kind words on the My Antonia scan
set. It looks good on the iPhone because I started with high-quality,
high-resolution, high-color depth masters, and was careful in image
normalization.

Jon Noring


From walter.van.holst at xs4all.nl  Wed Oct 24 09:18:50 2007
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Wed, 24 Oct 2007 18:18:50 +0200
Subject: [gutvol-d] [Fwd: Hands on with Google's OCRopus open-source
	scanning software]
Message-ID: <471F706A.3040704@xs4all.nl>


Ars Technica has reviewed Google's OCRopus software:

<http://arstechnica.com/news.ars/post/20071024-hands-on-with-googles-ocropus-open-source-scanning-software.html>

From hart at pglaf.org  Wed Oct 24 12:58:30 2007
From: hart at pglaf.org (Michael Hart)
Date: Wed, 24 Oct 2007 12:58:30 -0700 (PDT)
Subject: [gutvol-d] Canadian Public Domain music scores site taken town
In-Reply-To: <c5c.1b80c936.34500178@aol.com>
References: <c5c.1b80c936.34500178@aol.com>
Message-ID: <Pine.LNX.4.64.0710241258070.23478@pglaf.org>


Well, it wasn't just me directly, it was PG.


On Tue, 23 Oct 2007, Bowerbird at aol.com wrote:

>
> michael hart has already come to the rescue...
>
> -bowerbird
>
>
>
> **************************************
> See what's new at http://www.aol.com
>

From gbnewby at pglaf.org  Wed Oct 24 17:28:08 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed, 24 Oct 2007 17:28:08 -0700
Subject: [gutvol-d] Canadian Public Domain music scores site taken town
In-Reply-To: <Pine.LNX.4.64.0710241258070.23478@pglaf.org>
References: <c5c.1b80c936.34500178@aol.com>
	<Pine.LNX.4.64.0710241258070.23478@pglaf.org>
Message-ID: <20071025002808.GA27052@mail.pglaf.org>

On Wed, Oct 24, 2007 at 12:58:30PM -0700, Michael Hart wrote:
> 
> Well, it wasn't just me directly, it was PG.

As covered on slashdot, based on a note to the 
BookPeople list:
  http://yro.slashdot.org/article.pl?sid=07/10/24/0325256

We have not heard back from the IMSLP fellow in the past day or
so, but I do think he'll accept our offer.
  -- Greg

> On Tue, 23 Oct 2007, Bowerbird at aol.com wrote:
> 
> >
> > michael hart has already come to the rescue...
> >
> > -bowerbird
> >
> >
> >
> > **************************************
> > See what's new at http://www.aol.com
> >
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From julio.reis at tintazul.com.pt  Thu Oct 25 04:08:46 2007
From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio_Reis?=)
Date: Thu, 25 Oct 2007 12:08:46 +0100
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <mailman.2.1193252402.22887.gutvol-d@lists.pglaf.org>
References: <mailman.2.1193252402.22887.gutvol-d@lists.pglaf.org>
Message-ID: <4720793E.8020907@tintazul.com.pt>

A quick (?) general copyright clearance question - are international 
treaties covered by copyright or other restrictions in the USA, which 
would prevent PG from freely distributing them?

We already have a disclaimer which explicitly says people use the texts 
at their own risk, so we're not offering legal help of any kind, only 
informing and distributing legal texts, so we're clear there.

Some such treaties are useful to have around; potentially all? I am 
thinking right now of the Act of Paris of the Berne Convention (1971), 
which Portugal signed in 1978. Administrative texts are not subject to 
copyright in Portugal, so I could find a reliable official source and 
produce a clear text version of the Portuguese translation for upload to 
PG; perhaps HTML too.

FYI - PG Europe carries the Human Rights Declaration. The database is 
down again, so you can't really read it, but... Oh, and if it's the case 
that Gutenberg (USA) can carry international treaties, then I'd like to 
know if the Human Rights Declaration can be offered from gutenberg.org 
also, and what I should do, if anything.

J?lio a.k.a. Tintazul.

From jeroen.mailinglist at bohol.ph  Thu Oct 25 14:04:38 2007
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Thu, 25 Oct 2007 23:04:38 +0200
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <4720793E.8020907@tintazul.com.pt>
References: <mailman.2.1193252402.22887.gutvol-d@lists.pglaf.org>
	<4720793E.8020907@tintazul.com.pt>
Message-ID: <472104E6.2060808@bohol.ph>


Facts cannot be copyrighted. Since any text that has force of law cannot
be paraphrased in any way without loosing its legal standing, laws and
treaties with force of law, at least in the US fall under the
fact-expression merger doctrine, and thus loose copyright restrictions,
certainly when use in a context of law and legal discussion. In other
words, laws are uncopyrightable facts.

I didn't find this rule in PG books, though, but there is enough
jurisprudence available. Note that this is US only. Some countries have
even more insane copyright laws.

Jeroen Hellingman

J?lio Reis wrote:
> A quick (?) general copyright clearance question - are international 
> treaties covered by copyright or other restrictions in the USA, which 
> would prevent PG from freely distributing them?
>
> We already have a disclaimer which explicitly says people use the texts 
> at their own risk, so we're not offering legal help of any kind, only 
> informing and distributing legal texts, so we're clear there.
>
> Some such treaties are useful to have around; potentially all? I am 
> thinking right now of the Act of Paris of the Berne Convention (1971), 
> which Portugal signed in 1978. Administrative texts are not subject to 
> copyright in Portugal, so I could find a reliable official source and 
> produce a clear text version of the Portuguese translation for upload to 
> PG; perhaps HTML too.
>
> FYI - PG Europe carries the Human Rights Declaration. The database is 
> down again, so you can't really read it, but... Oh, and if it's the case 
> that Gutenberg (USA) can carry international treaties, then I'd like to 
> know if the Human Rights Declaration can be offered from gutenberg.org 
> also, and what I should do, if anything.
>
> J?lio a.k.a. Tintazul.
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>   


From Bowerbird at aol.com  Thu Oct 25 15:35:59 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 25 Oct 2007 18:35:59 EDT
Subject: [gutvol-d] by by
Message-ID: <c43.1d0079ab.3452744f@aol.com>

>    http://www.gutenberg.org/etext/23181
>   Thomas Jefferson Brown by By James Oliver Curwood

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071025/bdff5157/attachment.htm 

From hart at pglaf.org  Thu Oct 25 17:39:13 2007
From: hart at pglaf.org (Michael Hart)
Date: Thu, 25 Oct 2007 17:39:13 -0700 (PDT)
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <472104E6.2060808@bohol.ph>
References: <mailman.2.1193252402.22887.gutvol-d@lists.pglaf.org>
	<4720793E.8020907@tintazul.com.pt> <472104E6.2060808@bohol.ph>
Message-ID: <Pine.LNX.4.64.0710251738520.16871@pglaf.org>


I think you will find that laws are copyrighted in England.


mh


On Thu, 25 Oct 2007, Jeroen Hellingman (Mailing List Account) wrote:

>
> Facts cannot be copyrighted. Since any text that has force of law cannot
> be paraphrased in any way without loosing its legal standing, laws and
> treaties with force of law, at least in the US fall under the
> fact-expression merger doctrine, and thus loose copyright restrictions,
> certainly when use in a context of law and legal discussion. In other
> words, laws are uncopyrightable facts.
>
> I didn't find this rule in PG books, though, but there is enough
> jurisprudence available. Note that this is US only. Some countries have
> even more insane copyright laws.
>
> Jeroen Hellingman
>
> J?lio Reis wrote:
>> A quick (?) general copyright clearance question - are international
>> treaties covered by copyright or other restrictions in the USA, which
>> would prevent PG from freely distributing them?
>>
>> We already have a disclaimer which explicitly says people use the texts
>> at their own risk, so we're not offering legal help of any kind, only
>> informing and distributing legal texts, so we're clear there.
>>
>> Some such treaties are useful to have around; potentially all? I am
>> thinking right now of the Act of Paris of the Berne Convention (1971),
>> which Portugal signed in 1978. Administrative texts are not subject to
>> copyright in Portugal, so I could find a reliable official source and
>> produce a clear text version of the Portuguese translation for upload to
>> PG; perhaps HTML too.
>>
>> FYI - PG Europe carries the Human Rights Declaration. The database is
>> down again, so you can't really read it, but... Oh, and if it's the case
>> that Gutenberg (USA) can carry international treaties, then I'd like to
>> know if the Human Rights Declaration can be offered from gutenberg.org
>> also, and what I should do, if anything.
>>
>> J?lio a.k.a. Tintazul.
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d at lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>>
>>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From gbnewby at pglaf.org  Thu Oct 25 19:11:30 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu, 25 Oct 2007 19:11:30 -0700
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <4720793E.8020907@tintazul.com.pt>
References: <mailman.2.1193252402.22887.gutvol-d@lists.pglaf.org>
	<4720793E.8020907@tintazul.com.pt>
Message-ID: <20071026021130.GH18258@mail.pglaf.org>

On Thu, Oct 25, 2007 at 12:08:46PM +0100, J?lio Reis wrote:
> A quick (?) general copyright clearance question - are international 
> treaties covered by copyright or other restrictions in the USA, which 
> would prevent PG from freely distributing them?

Dear J?lio:

For the most part, these will be eligible for Project Gutenberg.
The key will be to find a source where the US Government publishes
the treaties (such as in the Federal Register), to confirm that
no copyright was claimed.  Then, we should be able to clear
under our Rule 8.

I can think of several variations that might make it more difficult,
but I can correspond individually (or just submit at copy.pglaf.org
with details).
  -- Greg

> We already have a disclaimer which explicitly says people use the texts 
> at their own risk, so we're not offering legal help of any kind, only 
> informing and distributing legal texts, so we're clear there.
> 
> Some such treaties are useful to have around; potentially all? I am 
> thinking right now of the Act of Paris of the Berne Convention (1971), 
> which Portugal signed in 1978. Administrative texts are not subject to 
> copyright in Portugal, so I could find a reliable official source and 
> produce a clear text version of the Portuguese translation for upload to 
> PG; perhaps HTML too.
> 
> FYI - PG Europe carries the Human Rights Declaration. The database is 
> down again, so you can't really read it, but... Oh, and if it's the case 
> that Gutenberg (USA) can carry international treaties, then I'd like to 
> know if the Human Rights Declaration can be offered from gutenberg.org 
> also, and what I should do, if anything.
> 
> J?lio a.k.a. Tintazul.
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From wvholst at xs4all.nl  Fri Oct 26 02:40:01 2007
From: wvholst at xs4all.nl (Walter H. van Holst)
Date: Fri, 26 Oct 2007 11:40:01 +0200 (CEST)
Subject: [gutvol-d] Copyright on international treaties
Message-ID: <14569.80.127.124.230.1193391601.squirrel@webmail.xs4all.nl>

>
>
> I think you will find that laws are copyrighted in England.

Article 2.4 of the Berne Convention reads as follows:

"It shall be a matter for legislation in the countries of the Union to
determine the protection to be granted to official texts of a
legislative, administrative and legal nature, and to official
translations of such texts."

Several signatories, including The Netherlands, have made explicit
allowances in their copyright laws for legislation and judicial
verdicts to be exempted from copyright. The UK is indeed an exception
with an especially convoluted set of Crown copyright, Parliamentary
copyright and Copyright in Acts and Measures.

Regards,

 Walter


From Bowerbird at aol.com  Fri Oct 26 10:33:34 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 26 Oct 2007 13:33:34 EDT
Subject: [gutvol-d] web-based wysiwyg word-processors
Message-ID: <c0e.18afedb3.34537eee@aol.com>

the web-based wysiwyg word-processors just keep getting better.

this one -- based in flash -- was even recently bought by adobe:
>    http://www.buzzword.com

the heavy-markup people will soon be faced with a tough choice:
they can align themselves with the light-markup revolution instead,
or face a world where structure is abandoned entirely and we have
documents formatted in a totally undisciplined wysiwyg manner...

it might help to refresh their memory about the way this battle
played out when it was fought on the desktop the last 20 years.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071026/92b4d5c5/attachment.htm 

From Bowerbird at aol.com  Fri Oct 26 13:01:11 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 26 Oct 2007 16:01:11 EDT
Subject: [gutvol-d] z.m.l. auto-conversions to .html and .pdf
Message-ID: <ca2.1ab9da5e.3453a187@aol.com>

ok, i've got z.m.l. auto-converting pretty well to both .html and .pdf now.
so if anyone wants to send me some z.m.l. files to convert for them, i will.

this will let me see where i've got problems with my converter-routines, or
in my formatting rules, or just where people seem "confused" on the rules
(which -- by my philosophy -- means that the rules need to be changed)...

and once i've been reasonably convinced that you can create correct z.m.l.,
i'll give you a u.r.l. where you can just do the .html conversion by 
yourself...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071026/107e0209/attachment.htm 

From Bowerbird at aol.com  Fri Oct 26 16:46:45 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 26 Oct 2007 19:46:45 EDT
Subject: [gutvol-d] funny and too funny
Message-ID: <bff.24d64510.3453d665@aol.com>

funny: is it christmas?
>    http://www.isitchristmas.com/

too funny: the site has an r.s.s. feed:
>    http://www.isitchristmas.com/rss.xml

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071026/059d9b54/attachment.htm 

From piggy at netronome.com  Fri Oct 26 20:19:02 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Fri, 26 Oct 2007 23:19:02 -0400
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <1493909058.20071023234404@noring.name>
References: <c27.23dce9b6.344fbfc4@aol.com>
	<1493909058.20071023234404@noring.name>
Message-ID: <4722AE26.5000804@netronome.com>

Jon Noring wrote:
> ...
> I learned a lot from scanning a few books, and in the future (when I
> get a new flatbed scanner) will rescan "My Antonia" following this
> partial list of practices (I've kept the disassembled book saved in a
> ziploc bag to keep it from collecting any more dust than it has):
>
> 1) All pages with print will be scanned at 600 dpi, 24-bit color.
>    Those with graphics/images, and the title page, will be done at
>    1200 dpi, 24-bit color.
>   
As much as I enjoy the texture of nice paper, 24-bit color seems 
excessive for text. I find for most books that 8-bit gray actually 
produces a more visually pleasing result at 1/3rd the storage. For most 
text I find 200 dpi more than sufficient,  but I do agree that certain 
well-printed works benefit from 600 dpi.

My general rule for deciding if I've used a high enough resolution to 
scan artwork is to ask if I have captured any artifacts which the artist 
did not intend. If I see bubbles in the ink, evidence of aberrations in 
the pen, or even small defects in printing registration, then I have 
scanned at high enough resolution.

I generally agree with your choice of 1200 dpi X 24 bit for illos, but 
there are some caveats.

My scanner will not stream at 1200 dpi on a full-width page. It stops 
every few inches. The places it stops show up as half-pixel deviations 
across the full page. Sometimes these artifacts are sufficiently 
annoying that I am happier with a 600 dpi scan which WILL stream.

I have run across a handful of very high-quality etchings for which I 
have been pleased to have 2400 dpi available. Unfortunately, the pause 
artifacts are even more frequent, but I don't have a good way around that.

> 2) I will scan a color calibration chart at both the beginning and the
>    end of the job. These images will become part of the book scan set.
>   

I have been very pleased with my target from 
http://www.targets.coloraid.de/. I only use the color target with books 
where the color is critical such as illustrated natural history books or 
text books on color. Other than those, the calculated color scanning 
profile for the scanner is quite sufficient.

In the course of scanning a single book, your scanner will not shift 
enough to make a measurable difference between the first and last page 
to justify using the calibration chart twice. One scan every few weeks 
is more than sufficient. Once per book is even overkill, but it does 
simplify record keeping.
> 3) Each paper page will be thoroughly cleaned to remove as much dust
>    as I can. How I'm not sure, so ideas here are welcome (I'd love a
>    machine where one simply inserts the page and it comes out the
>    other end thoroughly cleaned of all dust.) Of course, the glass
>    will also be religiously kept clean.
>   

I have had success with canned air. With some books though, the paper is 
actively turning into dust and it quickly becomes an issue of seriously 
diminishing returns. I have also used scotch tape to pick up particles. 
The scotch tape is more of a touch-up tool than a full cleaning.

For the glass I use Windex wipes followed by a dry lintless cloth. 
Depending on the book I clean anywhere from once per page to once every 
20 or even 30 pages.

I learned the hard way to use the canned air on the glass first. 
Occasionally you come across a particle which will scratch glass.
> 4) Each scanned image will be saved in lossless format (likely PNG
>    since it is an open standard.)
>   

I use png's for almost everything.
> 5) This time I will save the raw page scans before they are deskewed
>    and cropped. The raw set will be the "raw masters", and the
>    deskewed and cropped set the "normalized masters". Of course,
>    deskewing will be done using a true rotation algorithm in a
>    professional-grade image processing application.
>   
Yes, keeping raw originals is VERY useful. I have them for all of my books.

The deskew capability in leptonica is very good and uses the 
fastest-known algorithm. The patent expired a couple years ago.

By your reference to "true rotation" I gather that you object to 
rotation by successive shears? I have been unable to differentiate 
sheared pages from so-called "true rotations" by visual inspection--even 
very close visual inspection. After 50 books, the amount of time you 
spend waiting for page rotations really adds up. Archive your raw scans 
and future generations have the opportunity to redo your rotations if 
they aren't happy with your work.
> 6) Proper filenaming will be done (won't go into specifics here).
>   

Filenames which match page numbers are very handy, but I've lately found 
that a hand-built HTML TOC is even more useful for about the same amount 
of effort. Combine that with a good page-turning interface and exactly 
correct page number filenames are not all that critical.
> 7) All scan sets will be burned to multiple DVD and distributed to
>    various archives, including the Internet Archive. Other methods of
>    storage will also be experimented with. Redundancy and distribution
>    of the raw master is important for long-term preservation.
>   
I highly recommend a simple page-turning interface on top of your raw 
pages. The program "curator" produces a simple HTML hierarchy which 
increases the utility of a CD or DVD of page images quite a bit with 
very little extra overhead.
> There's a few other "best practices" I will also do in the scanning,
> but the above are the major ones.
>
>
> Now, someone will say why the fuss?
>
> A variety of reasons:
>
> 1) For the great Works in the Public Domain, we *should* produce
>    archival-quality scan sets of the "canonical" printings of those
>    Works. The 1st Edition of "My Antonia" which I possess falls into
>    this special category. Of course, I believe all books should be
>    scanned with the same rigor, but realistically this is unlikely to
>    happen unless I can find an army of people who believe as I do in
>    "doing it ueber-right." (Anyone here willing to do this for the
>    "canonical" printings of the great Works in the Public Domain, let
>    me know.)
>   

I feel that we have until 2019 to catch up. Only certain works are worth 
archival-grade preservation. For the bulk of human authorship, simple 
preservation of the content is sufficient. I have even come to feel that 
there are works which are *gasp* not worth preserving.
> 2) 600 dpi creates scans which are quite highly readable "as is" and
>    for image processing has a lot more information to get good results
>    when downsized and processed in various ways.
>   

I find 200 dpi very readable for most works. Interestingly, OCR does 
work better on larger images, but scaling 200 dpi scans to 600 dpi (with 
geometric interpolation) actually works just as well as scanning at 600 
dpi to start with.

> 3) 24-bit provides a lot of useful information. Not only does it
>    preserve the natural look of the page, but it provides 3 channels
>    (RGB) that we can play with for image processing *and* OCR. I'm
>    intrigued of running OCR on each channel and comparing. In
>    addition, for things like rotation and resampling, this higher
>    color depth will give more accurate results.
>   
Unless the book is foxed or badly oxidized, I've found no value to 
scanning text in color.

I HAVE tried the multiple channel OCR trick and the differences are 
insignificant.

The exception is books with lots of foxing. Generally the green channel 
gives the best results, but occasionally the blue channel is better.
> 4) For each color channel (which effectively is 256 gray scale), we
>    can also convert each page to 2 color using a variety of thresholds,
>    OCR each one, and do comparison of the results. Starting with high
>    quality, high resolution images should give us better results.
>
>   
I haven't tried this.

I have observed that most text scanning yields nearly gray results--i.e. 
the color channels tend to have very close numeric values. Heavily 
oxidized paper skews the results, but even then we're talking about a 
few percent difference among channels. Variations across a page 
(especially if we have gutter noise) could be an order of magnitude 
larger than the inter-channel differences.

Now, a good and fast localized thresholding algorithm would be very handy.
>
> Anyway, thanks again to Bowerbird's kind words on the My Antonia scan
> set. It looks good on the iPhone because I started with high-quality,
> high-resolution, high-color depth masters, and was careful in image
> normalization.
>
> Jon Noring
>   
I would add my kudos.


From piggy at netronome.com  Fri Oct 26 20:30:05 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Fri, 26 Oct 2007 23:30:05 -0400
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <472104E6.2060808@bohol.ph>
References: <mailman.2.1193252402.22887.gutvol-d@lists.pglaf.org>	<4720793E.8020907@tintazul.com.pt>
	<472104E6.2060808@bohol.ph>
Message-ID: <4722B0BD.7090802@netronome.com>

Jeroen Hellingman (Mailing List Account) wrote:
> Facts cannot be copyrighted. Since any text that has force of law cannot
> be paraphrased in any way without loosing its legal standing, laws and
> treaties with force of law, at least in the US fall under the
> fact-expression merger doctrine, and thus loose copyright restrictions,
> certainly when use in a context of law and legal discussion. In other
> words, laws are uncopyrightable facts.
>
>   
The case law in the US is not so clear:

http://www.g4tv.com/techtvvault/features/32238/Who_Owns_the_Law.html

> I didn't find this rule in PG books, though, but there is enough
> jurisprudence available. Note that this is US only. Some countries have
> even more insane copyright laws.
>
> Jeroen Hellingman
>   


From dbalexander2 at comcast.net  Fri Oct 26 20:59:31 2007
From: dbalexander2 at comcast.net (David Alexander)
Date: Fri, 26 Oct 2007 22:59:31 -0500
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <4722B0BD.7090802@netronome.com>
Message-ID: <004c01c8184d$cb2ef5e0$640fa8c0@youru3ef4ouuir>

http://www.ca5.uscourts.gov:8081/isysquery/irl346f/1/doc

The full 5th circuit overturned that ruling. Unless the Supreme Court has
said otherwise, or the 5th circuit has since changed its mind, city laws are
public domain in the 5th circuit.

-----Original Message-----
From: gutvol-d-bounces at lists.pglaf.org
[mailto:gutvol-d-bounces at lists.pglaf.org] On Behalf Of La Monte H.P. Yarroll
Sent: Friday, October 26, 2007 10:30 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Copyright on international treaties


Jeroen Hellingman (Mailing List Account) wrote:
> Facts cannot be copyrighted. Since any text that has force of law 
> cannot be paraphrased in any way without loosing its legal standing, 
> laws and treaties with force of law, at least in the US fall under the 
> fact-expression merger doctrine, and thus loose copyright 
> restrictions, certainly when use in a context of law and legal 
> discussion. In other words, laws are uncopyrightable facts.
>
>   
The case law in the US is not so clear:

http://www.g4tv.com/techtvvault/features/32238/Who_Owns_the_Law.html

> I didn't find this rule in PG books, though, but there is enough 
> jurisprudence available. Note that this is US only. Some countries 
> have even more insane copyright laws.
>
> Jeroen Hellingman
>   

_______________________________________________
gutvol-d mailing list
gutvol-d at lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d


From Bowerbird at aol.com  Sat Oct 27 00:35:51 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 27 Oct 2007 03:35:51 EDT
Subject: [gutvol-d] Comment on My Antonia scan set
Message-ID: <c72.1b5258c5.34544457@aol.com>

piggy said:
>    24-bit color seems excessive for text.

oh geez, is noring going on again about high-resolution scanning?

like i said, he did a nice job on the "my antonia" scan-set, what with
the cropping he did.   he also regularized all scans to the same size.
those are the things that need to be done to make a scan-set nice...

but yeah, he scanned them at 600dpi, which just bloated their size...

and he saved 'em as .png files, probably because .jpg is "lossy", but
that just meant they were bigger than they needed to be, which has
a negative impact on both storage requirements _and_ bandwidth...

plus, of course, scanning at 600dpi is 4 times slower than 300dpi,
unless you're using a camera-based setup like the big boys have,
so it's really a waste of most people's time to scan at that resolution.

and 24-bit color?   time-consuming!   it'd take days to scan one book.
which -- by the way -- is how long it took noring to scan "my antonia".
which is probably why he hasn't scanned more of them, i would think...


>   I find for most books that 8-bit gray actually produces 
>    a more visually pleasing result at 1/3rd the storage.

my opinion is that, if the printer considered a page to be black ink on
white paper, then that is exactly how _we_ should consider the page...
if it's something different than that, fine.   otherwise, it should be that.
scan in grayscale if it improves the o.c.r.   but when we make the scans
available to the masses, make 'em bandwidth-kind black-and-white...


>   Filenames which match page numbers are very handy, but 
>    I've lately found that a hand-built HTML TOC is even more useful 
>    for about the same amount of effort.

wrong.   but i'm tired of arguing this one.


>   Combine that with a good page-turning interface and 
>    exactly correct page number filenames are not all that critical.

"a good page-turning interface" is just a basic start on a good tool.

it's even better, though, when you can just type in any pagenumber,
hit <return>, and instantly be at that page; that's how my tools work;
that's how they've worked for _years_, and because of my experience,
over the course of years, i know that it's stupid to work any other way.
and once you've worked for years with tools that have this capability,
you'll agree with me, and see the need for using _proper_ filenames...


>   I highly recommend a simple page-turning interface on top of 
>    your raw pages. The program "curator" produces a simple HTML 
>    hierarchy which increases the utility of a CD or DVD of page images 
>    quite a bit with very little extra overhead.

in the very exact same way that a "simple" interface can "increase utility" 
by
"quite a bit", an interface just a little bit less crude also gives a huge 
jump...

but like i said, i'm tired of arguing this.   try it sometime, and you'll 
see...

***

aside from these few points where you are off badly, however,
the rest of your post was very informative on a number of topics.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071027/448be772/attachment-0001.htm 

From grythumn at gmail.com  Sat Oct 27 05:46:26 2007
From: grythumn at gmail.com (Robert Cicconetti)
Date: Sat, 27 Oct 2007 08:46:26 -0400
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <4722AE26.5000804@netronome.com>
References: <c27.23dce9b6.344fbfc4@aol.com>
	<1493909058.20071023234404@noring.name>
	<4722AE26.5000804@netronome.com>
Message-ID: <15cfa2a50710270546r6b04b76fx6f589fde268c3d5c@mail.gmail.com>

On 10/26/07, La Monte H.P. Yarroll <piggy at netronome.com> wrote:
> By your reference to "true rotation" I gather that you object to
> rotation by successive shears? I have been unable to differentiate
> sheared pages from so-called "true rotations" by visual inspection--even

It eats line art for breakfast:

http://home.comcast.net/~grythumn/abbyy/abbyy_shearing.png

R C

From piggy at netronome.com  Sat Oct 27 05:50:42 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Sat, 27 Oct 2007 08:50:42 -0400
Subject: [gutvol-d] Copyright on international treaties
In-Reply-To: <004c01c8184d$cb2ef5e0$640fa8c0@youru3ef4ouuir>
References: <004c01c8184d$cb2ef5e0$640fa8c0@youru3ef4ouuir>
Message-ID: <47233422.5040508@netronome.com>

David Alexander wrote:
> http://www.ca5.uscourts.gov:8081/isysquery/irl346f/1/doc
>
> The full 5th circuit overturned that ruling. Unless the Supreme Court has
> said otherwise, or the 5th circuit has since changed its mind, city laws are
> public domain in the 5th circuit.
>   

Hooray! I was greatly troubled by the original decision.

> -----Original Message-----
> From: gutvol-d-bounces at lists.pglaf.org
> [mailto:gutvol-d-bounces at lists.pglaf.org] On Behalf Of La Monte H.P. Yarroll
> Sent: Friday, October 26, 2007 10:30 PM
> To: Project Gutenberg Volunteer Discussion
> Subject: Re: [gutvol-d] Copyright on international treaties
>
>
> Jeroen Hellingman (Mailing List Account) wrote:
>   
>> Facts cannot be copyrighted. Since any text that has force of law 
>> cannot be paraphrased in any way without loosing its legal standing, 
>> laws and treaties with force of law, at least in the US fall under the 
>> fact-expression merger doctrine, and thus loose copyright 
>> restrictions, certainly when use in a context of law and legal 
>> discussion. In other words, laws are uncopyrightable facts.
>>
>>   
>>     
> The case law in the US is not so clear:
>
> http://www.g4tv.com/techtvvault/features/32238/Who_Owns_the_Law.html
>
>   
>> I didn't find this rule in PG books, though, but there is enough 
>> jurisprudence available. Note that this is US only. Some countries 
>> have even more insane copyright laws.
>>
>> Jeroen Hellingman
>>   
>>     
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>   


From jon at noring.name  Sat Oct 27 09:36:58 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 27 Oct 2007 10:36:58 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <c72.1b5258c5.34544457@aol.com>
References: <c72.1b5258c5.34544457@aol.com>
Message-ID: <16264944.20071027103658@noring.name>

Bowerbird wrote:

> oh geez, is noring going on again about high-resolution scanning?

Oh geez, is Bowerbird still pushing ITF (impoverished text format)?
<laugh/>

O.k., to get to a couple of his points.


> but yeah, he scanned them at 600dpi, which just bloated their size...

No, I scanned them at 600 dpi, full color, and they are the size they
are. I don't consider file size to be that important for *mastering*
purposes.


> and he saved 'em as .png files, probably because .jpg is "lossy", but
> that just meant they were bigger than they needed to be, which has
> a negative impact on both storage requirements _and_ bandwidth...

Yes, I saved them as PNG since "masters" should not be saved in lossy
formats which add *visible* artifacts to the scans.

When music studios record a performer, do they convert the audio to
MP3 right off the bat, And use the MP3 as their "master"?

(Most professional grade audio recording equipment today, including
that used by amateurs in their basement, sample at 96K and store the
audio in lossy WAV. For iPod use, this is overkill. So why should they
bother to "master" their music at audiophile quality if most people
only listen to the low quality, audibly lossy formats on their iPods?
There are some similarities between music recording and book scanning.)

  
> plus, of course, scanning at 600dpi is 4 times slower than 300dpi,
> unless you're using a camera-based setup like the big boys have,
> so it's really a waste of most people's time to scan at that
> resolution

If something is to be done, it should be done right.

  
> and 24-bit color??  time-consuming!?  it'd take days to scan one book.
> which -- by the way -- is how long it took noring to scan "my antonia".
> which is probably why he hasn't scanned more of them, i would think...

First, there's a lot of potential useful information in those three
color channels. Plus, when there's image processing to be done, that
information is actually quite important to assure the most accurate
results, particularly deskewing by image rotation (not shearing,
thanks for the correct word, La Monte.)

Now, it took a few days because I had other things to do in that
period of time. Each page did take a while, however, about one minute
from start to start (I used the wait time for some online work in
post-processing the images.) So, yes, it took time, but it is time
well spent for that particular book which was a first edition (which
in this case is considered the canonical version) of one of the great
works of American fiction.

And another part of the issue is the speed/quality of the scanner, and
the time to push the data to the computer. The consumer-grade scanning
equipment is improving in speed and quality. Disk space is becoming
dirt cheap, and even DVD-ROM drives and media are getting cheap.

And the reason I haven't done any more books is because my scanner
went on the blink right after that. I'm about ready to buy a new
scanner, and I am looking at the Plustek. I plan to be scanning a few
books pretty soon, some of which I am not at liberty to "chop".

As an aside, since I haven't done scanner product research lately,
have other scanner manufactures put out models competitive to the
Plustek OpticBook?

  
> my opinion is that, if the printer considered a page to be black ink on
> white paper, then that is exactly how _we_ should consider the page...
> if it's something different than that, fine.?  otherwise, it should be that.
> scan in grayscale if it improves the o.c.r.?  but when we make the scans
> available to the masses, make 'em bandwidth-kind black-and-white...

Do note that I downsampled the My Antonia master scans for public
dissemination. Again, you have to differentiate between master scans
and distributable scans.

Here's the link to the various scan set options:

   http://www.openreader.org/myantonia/index.htm

Notice I have 600 dpi bitonal, and 120 dpi anti-aliased gray scale. I
could distribute pretty much anything I want (<= 600 dpi, <= 24 bit
color), each of which will be optimal because I have color masters
from which to generate them. But if one has low quality masters,
repurposing leads to visibly bad results.

Btw, when one is dealing with books with smaller print, down in the
4-5 point size, one must do 600 dpi to get reasonable results for
both OCR and direct viewing. So even if one finds 300 dpi sufficient
for most books *for their immediate needs*, there will be some books
that have to be scanned at 600 dpi, even if only for OCR purposes.

(Part of the hesitancy of many to do higher quality scans is because
they are essentially scanning for a particular process in mind,
usually OCR for use in DP or similar project. In essence, to them the
scans are "throw away" -- simply an intermediary to some particular
end goal. So if they are throw-away, why put in the effort and time to
make them archival quality? -- they'll probably end up disappearing
in some black hole never to be seen by the public. For example, PG
distributes very few scan sets associated with the texts. Yes, I know
DP intends to make its scan sets available someday... But I am talking
about the present time and the message that sends.)


> aside from these few points where you are off badly, however,
> the rest of your post was very informative on a number of topics.

Let me rewrite what you said, Bowerbird:

"aside from a few points where *I believe* you are off badly, however,"

You forgot to add "I believe". <smile/>


Anyway, all the points I bring up in this message are imho, and other
than my comment on ZML, I've avoided focusing on an individual.

Jon Noring


From jon at noring.name  Sat Oct 27 09:50:04 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 27 Oct 2007 10:50:04 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <15cfa2a50710270546r6b04b76fx6f589fde268c3d5c@mail.gmail.com>
References: <c27.23dce9b6.344fbfc4@aol.com>
	<1493909058.20071023234404@noring.name>
	<4722AE26.5000804@netronome.com>
	<15cfa2a50710270546r6b04b76fx6f589fde268c3d5c@mail.gmail.com>
Message-ID: <1083089549.20071027105004@noring.name>

Robert wrote:
> La Monte H.P. Yarroll wrote:

>> By your reference to "true rotation" I gather that you object to
>> rotation by successive shears? I have been unable to differentiate
>> sheared pages from so-called "true rotations" by visual inspection--even

> It eats line art for breakfast:
>
> http://home.comcast.net/~grythumn/abbyy/abbyy_shearing.png

This was discussed in the "Distributed Scanners" YahooGroup:

   http://groups.yahoo.com/group/distscan/


And of course, the shearing method introduces its own distortion to
the characters, which gets worse as the skewing angle increases (both
narrowing and angling of the characters backwards or forwards -- and
not to mention line art as Robert brought up.)

In the "My Antonia" scanning project, a number of pages had skewing as
bad as 2-3 degrees, even though I was careful to align the pages (after
all, I had the time <laugh/> -- imagine those who *hurry* their
scanning.) Part of the problem is that the print block on pages itself
can be skewed a little during printing, and the binding can also
introduce some skewing.

And the effects of shearing on 2-3 degree skews is visibly noticeable
in the characters, and I can't help but think it might even affect OCR
a little bit. With true rotation, the characters do not get distorted.
And in order to get good results with true rotation, it is best to
apply it to high resolution, high color depth images since one has a
lot more "information" the rotation algorithm can use to good effect.

Jon Noring


From Bowerbird at aol.com  Sat Oct 27 11:47:19 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 27 Oct 2007 14:47:19 EDT
Subject: [gutvol-d] Comment on My Antonia scan set
Message-ID: <c49.1b3c29ee.3454e1b7@aol.com>

geez, i think i figured out why my spam folder is always _so_ overflowing;
is jon sending the same message to the listserve _and_ to me personally?

if so, jon, don't bother, because they both end up in the same spam folder.

and no, i am _not_ gonna get on your merry-go-round for the exact same
"discussion" that we've already gone through 89 times (at least) in the past.

if you wanna be totally anal-compulsive and scan at 87000dpi, be my guest.
if you wanna tell everyone else they should be as anal-compulsive as you are,
be my guest; if they're stupid enough to listen, they will deserve what they 
get.

but for anyone who wants to listen to common-sense, then listen up.

if you're generating page-images with a camera, where it takes the same time
to shoot higher-resolution as lower-resolution, fine, shoot 
higher-resolution.

but when you're dragging a scan-head over a page, and it takes 4 times as 
long
to do 600dpi as 300dpi, and another 4 times as long to jump it up to 1200dpi,
and another 4 times as long to go 2400dpi, high-resolution is a waste of 
time...

we've got millions of books needing to be scanned, and 300dpi is good enough.
the only exception is the rare and fragile book that can only be scanned 
_once_.

and, as long as you're not saving and resaving continually, .jpg will be just 
fine...
want to see for yourself?   then take a look and compare a .jpg with a .png 
here:
>    http://z-m-l.com/misc/myantf009.html

-bowerbird

p.s.   now, if by some _miracle_, noring has come up with a new point, 
please,
won't someone share it?   because i'm not gonna read his same old shit again.

p.p.s.   the funniest thing about jon's "my antonia" project is the _cover_ 
scan.
it's a 50-megabyte file.   that's right, 50 megabytes!   for one scan -- the 
cover!
if you want a summary of jon noring's philosophy, that's the best one there 
is.
boy, did i ever feel like a bloomin' idiot after i downloaded _that_...


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071027/f9f6b900/attachment.htm 

From jon at noring.name  Sat Oct 27 14:10:00 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 27 Oct 2007 15:10:00 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <c49.1b3c29ee.3454e1b7@aol.com>
References: <c49.1b3c29ee.3454e1b7@aol.com>
Message-ID: <1581359295.20071027151000@noring.name>

Bowerbird wrote:

> geez, i think i figured out why my spam folder is always _so_ overflowing;
> is jon sending the same message to the listserve _and_ to me personally?

Nope, just to gutvol-d. So your comment on "spam folder" overflowing
is simply hyperbole ("hype"). Oops, maybe I should not use the word
"hype"? <smile/>


>  and no, i am _not_ gonna get on your merry-go-round for the exact same
>  "discussion" that we've already gone through 89 times (at least) in the past.

Just like the ZML merry-go-round? It is clear that nobody on gutvol-d
and DP are interested in your ZML. If anyone has been the number one
*proponent* of it (not for mastering, but for a normalized end-user
plain text rendition), it is ME. It must irk you to no end that I am
the number one supporter of ZML (again as a distribution format, to
make that clear.)

I even asked the DP folk to provide a list of a few representative
texts for you to master in ZML to show them what you can do. So you
could strut your stuff to show ZML could also be a viable mastering
format. They haven't even bothered to do that (well, actually a couple
were mentioned -- have you converted those yet?) I wonder why the
the PG/DP communities have not worked with you on ZML?

And you can demonstrate all you want about converting ZML to crappy
HTML (which you've intentionally made crappy for reasons that are
totally irrational and violate several of *your own principles*) and
probably crappy PDF, too... Making pudding, and making pudding that
is the best tasting in the world, are not one and the same. You
certainly are making pudding. But how will the world view its taste?
Will it be edible?


>  if you wanna be totally anal-compulsive and scan at 87000dpi, be my guest.
>  if you wanna tell everyone else they should be as anal-compulsive as you are,
>  be my guest; if they're stupid enough to listen, they will deserve what they
> get.

If I were truly anal-compulsive, I'd be advocating reproduction
quality, which is a step above archival quality. <smile/>

Jim Weiler, posting to DistScan, proposed 5 levels of quality that
seems to have been embraced by pretty much everyone on DistScan:

   http://groups.yahoo.com/group/distscan/message/6

They are, from highest quality to lowest quality:

5) Reproduction quality. (extreme requirements)

4) Archival quality.

3) Recognition quality.

2) Reference quality.

1) Poor quality. (I term it "unusable for any purpose.")

A lot of the stuff OCA lately produces seems to be a pretty solid
Level 3 (they might claim it is archival quality -- some of their
scans don't reach that level, imho.) Google's quality is quite
variable but seems to also be getting better.

(And again, for OCA at least, they may have internal "masters" of
higher quality, but of that I'm not sure of.)


>  but for anyone who wants to listen to common-sense, then listen up.

600 dpi, 24-bit for text is reasonable and common sense for
*mastering* page scans usable for *pretty much all uses except
for facsimile reproduction.* And I've noted *reasons* why which you
don't address because you can't (you call it a merry-go-round, I call
it rational debate.)

Certainly, your concern is quantity, to hurry up and scan the books.
Fine, I share the same concern.

But the key is that OCA is doing this at a rate that outstrips all of
the PG/DP folk. (Not to mention Google.) In that case, unless one
finds books that OCA will not scan for a number of years (how does one
really know?), then maybe it is better that the DP/PG folk concentrate
on the books which are available through OCA and Google?

That is, if one were to apply "common-sense", then let the "pros" do
the scanning using their camera scanners so DP's work can be focused
on the OCR/proofing side. (Yes, there will be arguments now as to why
the PG/DP folk still need to scan, but I am citing "common-sense"
which is not necessarily the best advice. That is, using the phrase
"common-sense" is oftentimes simply an empty phrase used to stifle
rational debate.)

Btw, it is interesting that I don't see the PG/DP folk promoting an
archive of book scan sets, or stressing the need to submit a copy to
the Internet Archive or something. I believe part of the reason is
the prevailing viewpoint that page scans are simply a "throw away"
intermediary to structured and proofed digital text, which is the real
product. This "meme" persists today even though some are now beginning
to see value in distributing the scan sets online.

While the world applauds the production and archiving of book scans,
we don't see much interest in that here in the PG world, except to use
the scans that OCA/Google produces. Yet maybe over 10,000 book scan
sets have been produced over the years by the PG/DP folk, yet very few
of them are available to the world. (To be fair, DP plans to make its
scan sets available, but has not done so because of some programming
needs. Maybe Juliet can give us an update on the status of this...)

So, yes, my views on scan quality are in the small minority here in the
PG/DP communities. Does holding such an opinion make it automatically
wrong? No. It depends upon what one considers the purpose/requirements
of the scan sets.

And certainly I can get a little extreme in my views that others should
do archival quality scanning. I do not mean to give offense, but I would
like people when they scan that book to ask themselves if the scan set
they produce could have value in and of itself, if they should make
their scans available to the world, and would they be proud of the
quality of their work? So my purpose here is to provide a perspective,
and various reasons, why they should consider taking that extra time
to produce an archival quality scan set. If all they see is that they
submit their scan set to DP and it then disappears from sight (like
the scan set of the Kama Sutra I submitted to DP), they certainly would
NOT be interested in making that extra effort.

Maybe that's all DP needs to do: ask all those who produce scan sets to
not only submit them into their "system" but to also submit them to the
Internet Archive. Brewster has a standing invitation to receive such
scan sets, and it would not take long to write up exact step-by-step
instructions as to how PG/DP volunteers can submit their book scan
sets to the Internet Archive: just gather up the minimum metadata they
require, how to name the file(s), where to upload, etc. Since it is
voluntary, some won't, but I think many will gladly do so since I
think most people will see the need to preserve scan sets in and of
themselves. In turn, the PG archive of DP texts will now be able to
provide a link to the associated scan sets at IA, and if PG wants, can
even download that and offer the same scan set from its own server.


>  if you're generating page-images with a camera, where it takes the same time
>  to shoot higher-resolution as lower-resolution, fine, shoot higher-resolution.
>  
>  but when you're dragging a scan-head over a page, and it takes 4 times as long
>  to do 600dpi as 300dpi, and another 4 times as long to jump it up to 1200dpi,
>  and another 4 times as long to go 2400dpi, high-resolution is a waste of time...

By your argument, let's go to 150 dpi. LOL. The thing you are ignoring
in all this discussion is the purpose/reason for scanning. If the sole
purpose of the scans is simply to be fodder for OCR, then that
establishes the scanning requirements.

And note that flatbed scanners continue to improve in both mechanical
and data transfer speeds.


>  we've got millions of books needing to be scanned, and 300dpi is good enough.
>  the only exception is the rare and fragile book that can only be scanned _once_.

Even if one were to start a "Distributed Scanners" project which is
all volunteer driven and lets everyone decide the quality level they
want to scan, it will still not keep up with OCA and Google. (Now I
may be wrong, but we've had a number of years for such a project to
arise, and we don't see it yet. Are there some dynamics that such a
project would not get started and reach some sort of critical size to
challenge OCA in daily output?)


>  and, as long as you're not saving and resaving continually, .jpg will be just fine...
>  want to see for yourself??  then take a look and compare a .jpg with a .png here:
>
>??  http://z-m-l.com/misc/myantf009.html

And you are mixing up "mastering" with "distribution". I'm not
advocating distribution "to the poverty stricken in the third world"
using the high rez masters. In fact, for bitonal of text only pages,
JPG is *also* wasteful -- there are much better "lossless" and "lossy"
algorithms than PNG and JPG, respectively. DjVu comes to mind. (Yes
some issues regarding proprietary formats, browser viewability, and
such... but if you are talking about "waste" then one has to put it
into proper perspective...)

(And also note that for bitonal images, the difference in size between
PNG and JPG is rather small. In your example we are talking about
73.8k for lossless PNG which these days is already reasonable, and
47.1k for lossy JPG which is not that much smaller. Not a big
difference. Hmmm, I wonder how the two images would OCR using different
OCR packages? Anyone wanting to take Bowerbird's examples and run OCR
on them?)

By your argument, you'd say amateur musicians should convert all their
"recorded masters" (at 96K, lossless WAV) to MP3 since that's all the
world is interested in these days? No, you'd say they should take their
"master" and generate a MP3 for "distribution" and keep the master
laying around. Now with storage media getting so dirt cheap, and
(hopefully) the Internet getting faster and faster, they can now even
consider distributing the "master" as-is so everyone can experience
the highest fidelity in the original recording, if they have the
equipment to bring out the full fidelity.

In fact, I follow online music distribution (most of which is
"unauthorized"), since I am interested in this area. I'm noticing
lately a huge rise in the distribution of lossless FLAC files rather
than lossy MP3. More and more people are ripping CD audio tracks and
converting them to lossless FLAC for distribution -- FLAC exactly
preserves the original digital audio bit-for-bit, while MP3 actually
alters the digital audio and introduces artifacts which for even
192 kbit I can hear on my audio system from blind tests I've done --
and looking at the frequency/time domain maps one really sees the
major distortion MP3 does to the wave. Imagine a future 100 years from
now when the studio masters are gone and all we have are MP3's and
similar lossy formats floating around which have been discombobulated
such that even remastering them is more difficult because the waveform
has been so fundamentally altered. There is hope the MP3 and other
lossy formats will be pushed out of the picture to be replaced by
lossless audio formats. I hope so...  (Btw, typical lossless
compression of digital stereo audio averages around 50%, very similar
to lossless compression of continuous tone, full color digital
images -- also at about 50%.)


>  p.s.?  now, if by some _miracle_, noring has come up with a new point, please,
>  won't someone share it??  because i'm not gonna read his same old shit again.

It would not surprise me if over time there will be new points come up
as to why we should have "mastered" book page scans at archival
quality. The thing is that we *don't know the future.* We already
discovered how decisions made in the mid 90's are now rearing their
ugly heads today in the PG world. Those decisions were made for
expediency, granted, given the constraints of the time, but in nearly
all cases the decisions were not based on a rational requirements
analysis, nor any consideration of future needs and opportunities --
the decisions I saw were pretty ad hoc and did not consider
intermediary alternatives that would have worked then (e.g.,
preservation of accented characters found in many English texts.)


>  p.p.s.?  the funniest thing about jon's "my antonia" project is the _cover_ scan.
>  it's a 50-megabyte file.?  that's right, 50 megabytes!?  for one scan -- the cover!
>  if you want a summary of jon noring's philosophy, that's the best one there is.
>  boy, did i ever feel like a bloomin' idiot after i downloaded _that_...

Yes, so what's your point? That cover image is the *master*. I can
easily generate lower resolution and JPEG versions if I wanted to, and
to come to think of it, should provide a link to a browser viewable
image, probably in JPEG.

The cover to My Antonia is not that interesting anyway, but it is the
book cover to the First Edition (considered the canonical PD version)
of this great classic of American fiction. Subsequent editions (which
are not PD) are considered inferior since some stuff in the
introduction was removed.

Jon Noring


From Bowerbird at aol.com  Sat Oct 27 16:33:19 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 27 Oct 2007 19:33:19 EDT
Subject: [gutvol-d] Comment on My Antonia scan set
Message-ID: <c2f.1f4ca40e.345524bf@aol.com>


i'm not biting at your bait, jon.   i had enough of your merry-go-round years 
ago.

like i said, you did a good job on my antonia, but you've scanned _2_ darn 
books!
even the george w. bush library has 5 books.   (coloring books, but _5_ of 
them.)
your own (lack of) behavior points out that your "ideals" are _not_ 
cost-effective.

i'd much rather have the thousands of crappy scan-sets from d.p. than your 
_2_:
>    http://www.pgdp.org/ols

also excellent are the 400 scan-sets produced by nicholas hodson, which 
people
can locate quite easily by searching the o.c.a. text-file archives for "
athelstane"...
high-enough resolution, carefully cropped and size-standardized, great 
work...
good enough to read, if we must.   but most importantly, good enough for 
o.c.r.,
so as to make small-footprint mini-storage narrow-bandwith light-markup text.

and, while i'm on the topic, in this regard, the "digital reprint" that jose 
menendez
made of "my antonia", at 2.2 megs, is 50 times better than your 30-meg 
scan-set:
>    http://www.ibiblio.org/ebooks/Cather/
faster, cheaper, _and_ higher-quality.   that combo breaks several laws of 
physics.

and that, in the long run, is why high-resolution scan-sets are a waste of 
time:
because once we verify we digitized 'em correctly, we will have little use 
for 'em.
the digital reprints we've created will be _better_ for every conceivable 
purpose.
as michael hart has been telling us all along, "a picture of a book is not a 
book"...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071027/8c870610/attachment.htm 

From jon at noring.name  Sat Oct 27 19:02:08 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 27 Oct 2007 20:02:08 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <c2f.1f4ca40e.345524bf@aol.com>
References: <c2f.1f4ca40e.345524bf@aol.com>
Message-ID: <686943385.20071027200208@noring.name>

Bowerbird wrote:

> like i said, you did a good job on my antonia, but you've scanned _2_ darn books!
> even the george w. bush library has 5 books.?  (coloring books, but _5_ of them.)
> your own (lack of) behavior points out that your "ideals" are _not_ cost-effective.

So how many books have you scanned and have made available online?

Your comment again reminds me of a schoolyard taunt as to whose is
bigger. I won't even dignify your comment with any kind of
explanation.


> also excellent are the 400 scan-sets produced by nicholas hodson, which people
> can locate quite easily by searching the o.c.a. text-file archives for "athelstane"...
> high-enough resolution, carefully cropped and size-standardized, great work...
> good enough to read, if we must.?  but most importantly, good enough for o.c.r.,
> so as to make small-footprint mini-storage narrow-bandwith light-markup text.

Yes, Nicholas does laudable work. He participated in a lot of the
discussion on "Distscan", and he and I had a lot of personal email
exchanges. Even though he and I disagree on several things, I have the
greatest respect for him and his work. And I'm glad his work is being
contributed to OCA. He is very passionate about what he does and what
he believes, and I always admire passionate people who are also polite
towards everyone, as he is.

So, don't bring up individuals and try to say that my comments about
scan quality are a slap in the face of these individuals. That's the
same tired psychological warfare you've used in the past. You don't
really care for these people, in my estimation at least, because you
are *clearly* misusing them as a "ploy" to avoid rational debate. It
is, to be blunt, an ad hominem type of attack.


> and, while i'm on the topic, in this regard, the "digital reprint" that jose menendez
> made of "my antonia", at 2.2 megs, is 50 times better than your 30-meg scan-set:
> ?  http://www.ibiblio.org/ebooks/Cather/
> faster, cheaper, _and_ higher-quality.?  that combo breaks several laws of physics.

Since some will mistaken your comment (which you make clearer below),
let me note that Jose's 2.2 meg PDF is a modern-typeset "reproduction"
of the original, while my 30-meg scan set is simply a set of page scan
images of the original sampled down to whatever they are.

Without the following clarification, what you said would be an
apples/oranges comparison.


> and that, in the long run, is why high-resolution scan-sets are a waste of time:
> because once we verify we digitized 'em correctly, we will have little use for 'em.
> the digital reprints we've created will be _better_ for every conceivable purpose.
> as michael hart has been telling us all along, "a picture of a book is not a book"...

Of course, structured and proofed digital text is where it's at, but
scanned images play a role. There are some very smart people behind
what OCA is doing (especially Brewster), and they feel a need to
achieve almost archival quality in what they do. Many of their book
scans actually exceed 600 dpi (typically they are 400-500 dpi -- using
a fixed camera the resolution actually varies) and are at 24-bit color
depth. And note that they actually distribute these as JPEG2000 images
at this high image quality -- I just grabbed a whole set of a
particular book as JP2 images -- 550 megs -- took one hour while I was
doing something else.)

So, in a sense what you are saying is that the quality they produce
*and* distribute at is pointless. So if I were to decide on who to
listen to, who would I tend to ascribe greater authority? Maybe we
should be asking why OCA itself has chosen this quality? And don't
bring up that tired "well, since they use a camera they may as well
capture it all" argument again -- it is fallacious argument (for
reasons I won't delve into here), plus it doesn't cover the
*distribution* of the hi-rez scan sets which are beyond the needs
of just OCR (and in your estimation are overkill for all purposes.)

(In a shameless name drop, I have had the fortune of personally
talking with Brewster in the past about scan quality, and he is
clearly passionate about this. His comments and arguments to me
actually form a part of what I am advocating here. If he could
master scan all books at 600 dpi/24-bit resolution and save in
lossless format, I believe he would. What mostly keeps him from the
lossless is simply disk space in doing massive quantities of books at
a site -- but he won't compromise any further than that. His choice
for JPEG2000 rather than JPG revolves, I would guess, around it
having better image quality for the same compression, and does not
introduce the same kind of chunky artifacts that JPG does. See the
Wikipedia article on JPEG 2000. As a result, I plan in the future
of still mastering at 600/24/lossless (and pages with graphics at
1200 dpi), but will create JPEG2000 of the scan sets for online
distribution in zip archives -- for most books the zip archives will
fit on a CD-ROM, and will be a slightly lower quality "master"
backup. This does not preclude *also* distributing derivative scan
sets at lower resolution and color depth, including derivatives
optimized for OCR.)

And about Michael's statement, a picture of a book (a set of scanned
pages) is definitely a *book* since there's really no difference (other
than maybe quality) between viewing the original page and a digital
facsimile of it (and such a digital facsimile can itself be printed
out onto paper.) Now, we know what Michael really means by his
statement (and I agree structured and proofed digital text is far
superior), but you are apparently misusing his statement to further
your argument. Maybe Michael will explain what he means by his
statement in regards to this discussion?

*****

Finally, to interject a personal and, yes, emotional note directed
to Bowerbird: Feel very fortunate that you have found a home in
gutvol-d where you can be "yourself." Be thankful that Greg and
Michael cut you a whole lot of slack. On nearly all other groups
I've participated in you would have been thrown out a long time ago
for what I term fostering a hostile discussion environment. For
example, calling whatever someone writes a "merry-go-round" is
exactly an ad-hominem attack on their person. It is, in my opinion,
a form of hate speech and has no place in rational discourse.

Jon Noring


From jon at noring.name  Sat Oct 27 20:20:16 2007
From: jon at noring.name (Jon Noring)
Date: Sat, 27 Oct 2007 21:20:16 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <686943385.20071027200208@noring.name>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
Message-ID: <1789330137.20071027212016@noring.name>

I previously wrote:


> Finally, to interject a personal and, yes, emotional note directed
> to Bowerbird: Feel very fortunate that you have found a home in
> gutvol-d where you can be "yourself." Be thankful that Greg and
> Michael cut you a whole lot of slack. On nearly all other groups
> I've participated in you would have been thrown out a long time ago
> for what I term fostering a hostile discussion environment. For
> example, calling whatever someone writes a "merry-go-round" is
> exactly an ad-hominem attack on their person. It is, in my opinion,
> a form of hate speech and has no place in rational discourse.

And to add an addendum.

Clearly, everything above is "in my opinion" based on observation. And
since I run quite a few mail-based forums myself (all YahooGroups),
including The eBook Community, I am supportive of those who administer
any forum, even when I may disagree with their policies. Administering
public groups like this is a thankless job since one can never please
all the people all the time -- and sometimes we group administrators/
moderators have to make tough, often-times no-win decisions.

In effect, Greg and Michael are the ones who have defacto control of
this group simply by running the software that administers it. Thus,
they ultimately decide who gets to post and what they allow to be
posted here. They may even deny they have this power, but they do have
this power by default -- if the voluntarily give up this power, they
do so because they have the power. <smile/>

And I may say that such-and-such is "hate speech", or so-and-so is
creating a "hostile discussion environment." But it is not what I say,
but what they say that matters. If many of us don't like how this
group is run to the point where we get nothing out of it, we simply
vote with our feet and leave, maybe even starting a new discussion
group if we find value in the discussion but want a different set of
group policies.

So with that said, I still believe what I wrote previously and
reproduced above. But ultimately all that matters is what Greg and
Michael think. If they decide that Bowerbird can pretty much write and
say what he wants to gutvol-*, then that's the way it is.

Jon Noring


From Bowerbird at aol.com  Sun Oct 28 05:47:31 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 28 Oct 2007 08:47:31 EDT
Subject: [gutvol-d] Comment on My Antonia scan set
Message-ID: <d32.17fbd96a.3455dee3@aol.com>

more posts with this header in my spam folder.

i don't know if noring is attempting to bait me.

but i do know that i'm not biting.   end of story.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071028/335ff4f7/attachment.htm 

From jon at noring.name  Sun Oct 28 06:42:31 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 28 Oct 2007 07:42:31 -0600
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <d32.17fbd96a.3455dee3@aol.com>
References: <d32.17fbd96a.3455dee3@aol.com>
Message-ID: <419590423.20071028074231@noring.name>

Bowerbird wrote:

> more posts with this header in my spam folder.
>  
> i don't know if noring is attempting to bait me.
>  
> but i do know that i'm not biting.?  end of story.

Quote the fisherman.


Yes, end of story.

Now, back to real discussion.


Jon


From Bowerbird at aol.com  Sun Oct 28 09:14:47 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 28 Oct 2007 12:14:47 EDT
Subject: [gutvol-d] z.m.l. and youtube
Message-ID: <c75.1c39797f.34560f77@aol.com>

i'm now supporting embedding of youtube videos into
the .html versions that i auto-convert from a .zml file...

i'm not sure if i can plug them into the .pdf versions or
handle them in my offline viewer, but time will tell me...

i'm not sure if this is a bad thing or a good thing...      :+)
i guess it is what it is...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071028/2b808a69/attachment.htm 

From hart at pglaf.org  Sun Oct 28 10:20:57 2007
From: hart at pglaf.org (Michael Hart)
Date: Sun, 28 Oct 2007 10:20:57 -0700 (PDT)
Subject: [gutvol-d] Comment on My Antonia scan set
In-Reply-To: <419590423.20071028074231@noring.name>
References: <d32.17fbd96a.3455dee3@aol.com>
	<419590423.20071028074231@noring.name>
Message-ID: <Pine.LNX.4.64.0710281019420.32230@pglaf.org>


Jon Noring makes us wonder about reviving moderation,
just for him. . . .

His folder is full of this sort of thing. . . .


Michael S. Hart
Founder
Project Gutenberg


On Sun, 28 Oct 2007, Jon Noring wrote:

> Bowerbird wrote:
>
>> more posts with this header in my spam folder.
>>
>> i don't know if noring is attempting to bait me.
>>
>> but i do know that i'm not biting.?  end of story.
>
> Quote the fisherman.
>
>
> Yes, end of story.
>
> Now, back to real discussion.
>
>
> Jon
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Sun Oct 28 10:24:25 2007
From: hart at pglaf.org (Michael Hart)
Date: Sun, 28 Oct 2007 10:24:25 -0700 (PDT)
Subject: [gutvol-d] !@!Re:  Comment on My Antonia scan set
In-Reply-To: <1789330137.20071027212016@noring.name>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
	<1789330137.20071027212016@noring.name>
Message-ID: <Pine.LNX.4.64.0710281022550.32230@pglaf.org>


On Sat, 27 Oct 2007, Jon Noring wrote:

> I previously wrote:
>
>
>> Finally, to interject a personal and, yes, emotional note directed
>> to Bowerbird: Feel very fortunate that you have found a home in
>> gutvol-d where you can be "yourself." Be thankful that Greg and
>> Michael cut you a whole lot of slack. On nearly all other groups
>> I've participated in you would have been thrown out a long time ago
>> for what I term fostering a hostile discussion environment. For
>> example, calling whatever someone writes a "merry-go-round" is
>> exactly an ad-hominem attack on their person. It is, in my opinion,
>> a form of hate speech and has no place in rational discourse.

Jon's Noring's remarks are as close to "a form of hate speech"
as anyone's I have read here.

Michael S. Hart
Founder
Project Gutenberg


>
> And to add an addendum.
>
> Clearly, everything above is "in my opinion" based on observation. And
> since I run quite a few mail-based forums myself (all YahooGroups),
> including The eBook Community, I am supportive of those who administer
> any forum, even when I may disagree with their policies. Administering
> public groups like this is a thankless job since one can never please
> all the people all the time -- and sometimes we group administrators/
> moderators have to make tough, often-times no-win decisions.
>
> In effect, Greg and Michael are the ones who have defacto control of
> this group simply by running the software that administers it. Thus,
> they ultimately decide who gets to post and what they allow to be
> posted here. They may even deny they have this power, but they do have
> this power by default -- if the voluntarily give up this power, they
> do so because they have the power. <smile/>
>
> And I may say that such-and-such is "hate speech", or so-and-so is
> creating a "hostile discussion environment." But it is not what I say,
> but what they say that matters. If many of us don't like how this
> group is run to the point where we get nothing out of it, we simply
> vote with our feet and leave, maybe even starting a new discussion
> group if we find value in the discussion but want a different set of
> group policies.
>
> So with that said, I still believe what I wrote previously and
> reproduced above. But ultimately all that matters is what Greg and
> Michael think. If they decide that Bowerbird can pretty much write and
> say what he wants to gutvol-*, then that's the way it is.
>
> Jon Noring
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Sun Oct 28 10:38:11 2007
From: hart at pglaf.org (Michael Hart)
Date: Sun, 28 Oct 2007 10:38:11 -0700 (PDT)
Subject: [gutvol-d] !@! Re:  Comment on My Antonia scan set
In-Reply-To: <686943385.20071027200208@noring.name>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
Message-ID: <Pine.LNX.4.64.0710281032540.32230@pglaf.org>


I have gotten to the point where I would prefer Jon Noring NOT
quote me, or pretend to quote me, as correctly below, and I am
going to as him now, here, in public, never to quote me again,
on or off this list, as is my right by copyright.

I would prefer Mr. Noring actually never mention me, directly,
or indirectly, or forward my messages without permission.

Sorry Jon, but you've just overdone it this week.

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


On Sat, 27 Oct 2007, Jon Noring wrote:

> (In a shameless name drop, I have had the fortune of personally 
> talking with Brewster in the past about scan quality, and he is 
> clearly passionate about this. His comments and arguments to me 
> actually form a part of what I am advocating here. If he could 
> master scan all books at 600 dpi/24-bit resolution and save in 
> lossless format, I believe he would. What mostly keeps him from 
> the lossless is simply disk space in doing massive quantities 
> of books at a site -- but he won't compromise any further than 
> that. His choice for JPEG2000 rather than JPG revolves, I would 
> guess, around it having better image quality for the same 
> compression, and does not introduce the same kind of chunky 
> artifacts that JPG does. See the Wikipedia article on JPEG 
> 2000. As a result, I plan in the future of still mastering at 
> 600/24/lossless (and pages with graphics at 1200 dpi), but will 
> create JPEG2000 of the scan sets for online distribution in zip 
> archives -- for most books the zip archives will fit on a 
> CD-ROM, and will be a slightly lower quality "master" backup. 
> This does not preclude *also* distributing derivative scan sets 
> at lower resolution and color depth, including derivatives 
> optimized for OCR.)
>
> And about Michael's statement, a picture of a book (a set of 
> scanned pages) is definitely a *book* since there's really no 
> difference (other than maybe quality) between viewing the 
> original page and a digital facsimile of it (and such a digital 
> facsimile can itself be printed out onto paper.) Now, we know 
> what Michael really means by his statement (and I agree 
> structured and proofed digital text is far superior), but you 
> are apparently misusing his statement to further your argument. 
> Maybe Michael will explain what he means by his statement in 
> regards to this discussion?
>
> *****
>
> Finally, to interject a personal and, yes, emotional note 
> directed to Bowerbird: Feel very fortunate that you have found 
> a home in gutvol-d where you can be "yourself." Be thankful 
> that Greg and Michael cut you a whole lot of slack. On nearly 
> all other groups I've participated in you would have been 
> thrown out a long time ago for what I term fostering a hostile 
> discussion environment. For example, calling whatever someone 
> writes a "merry-go-round" is exactly an ad-hominem attack on 
> their person. It is, in my opinion, a form of hate speech and 
> has no place in rational discourse.
>
> Jon Noring

From hart at pglaf.org  Sun Oct 28 11:11:55 2007
From: hart at pglaf.org (Michael Hart)
Date: Sun, 28 Oct 2007 11:11:55 -0700 (PDT)
Subject: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
In-Reply-To: <1789330137.20071027212016@noring.name>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
	<1789330137.20071027212016@noring.name>
Message-ID: <Pine.LNX.4.64.0710281049010.32230@pglaf.org>


My apologies, my previous attempt at replying didn't work out.

Trying again below, perhaps with more patience.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


On Sat, 27 Oct 2007, Jon Noring wrote:

> I previously wrote:
>
>
>> Finally, to interject a personal and, yes, emotional note 
>> directed to Bowerbird: Feel very fortunate that you have found 
>> a home in gutvol-d where you can be "yourself." Be thankful 
>> that Greg and Michael cut you a whole lot of slack. On nearly 
>> all other groups I've participated in you would have been 
>> thrown out a long time ago for what I term fostering a hostile 
>> discussion environment. For example, calling whatever someone 
>> writes a "merry-go-round" is exactly an ad-hominem attack on 
>> their person. It is, in my opinion, a form of hate speech and 
>> has no place in rational discourse.

As we have commented before, Mr. Noring is certainly one of the
top people  whom "Greg and Michael cut you a whole lot of slack."

Mr. Noring seems as guilty of writing merry-go-rounds and/or
hate speech as much as anyone.  If anyone is going to be left
off this list, you can be sure Mr. Noring will be among them.

His baiting of Mr. Bowerbird, and of myself, is the mere work
of an apprentice baiter. . .not even up to journeyman level.

Yet I am sure Mr. Noring would elevate his words beyond that.

Enough said, obviously more than enough words to the wise and
thus not expected to reach Mr. Noring or his loyal opposition
as it were.


> And to add an addendum.
>
> Clearly, everything above is "in my opinion" based on 
> observation. And since I run quite a few mail-based forums 
> myself (all YahooGroups), including The eBook Community, I am 
> supportive of those who administer any forum, even when I may 
> disagree with their policies. Administering public groups like 
> this is a thankless job since one can never please all the 
> people all the time -- and sometimes we group administrators/ 
> moderators have to make tough, often-times no-win decisions.

This is one reason why we don't do moderation here.

Another is the obvious misuse of political powers--
as so often requested by Mr. Noring--both in front,
where people can see it, and behind the scenes from
perspectives hidden from the normal list members.

I am sure all the list personnel know Mr. Noring is
a person with an agenda who wishes other agendas to
be silenced while his own goes forwards.


> In effect, Greg and Michael are the ones who have defacto 
> control of this group simply by running the software that 
> administers it. Thus, they ultimately decide who gets to post 
> and what they allow to be posted here. They may even deny they 
> have this power, but they do have this power by default -- if 
> the voluntarily give up this power, they do so because they 
> have the power. <smile/>

It is only those who desire such power who consider
this sort of thing.

Gladly we have plenty of support to keep moderators
out of the fray, unlike the other lists Mr. Noring,
and others, have lobbied for such power.

Even before these recent outbursts we have noticed,
and commented on, Mr. Noring's apalling behavior.


> And I may say that such-and-such is "hate speech", or so-and-so 
> is creating a "hostile discussion environment." But it is not 
> what I say, but what they say that matters. If many of us don't 
> like how this group is run to the point where we get nothing 
> out of it, we simply vote with our feet and leave, maybe even 
> starting a new discussion group if we find value in the 
> discussion but want a different set of group policies.

Mr. Noring is striking out at himself here, as much as anyone.

As most of us here are aware, and also other servers, Noring's 
speech is as much "hate speech" as anyone's, largely ignored-- 
but saved in the archives for future reference.  Anyone should
be able to trace Mr. Noring's comments for themselves, and see
the trends, over the years.  A "hostile discussin environment"
usually follows Mr. Noring's agenda, rather than the opposite.


> So with that said, I still believe what I wrote previously and 
> reproduced above. But ultimately all that matters is what Greg 
> and Michael think. If they decide that Bowerbird can pretty 
> much write and say what he wants to gutvol-*, then that's the 
> way it is.

Mr. Noring appears to by saying, here and elsewhere, that his
right to free speech trumps everyone else's rights.

It is all to obvious that Mr. Noring and a few others baited,
and continute to bait, Mr. Bowerbird and others, to create an
environment in which he could claim hostility.

This is not working here, and it should not work elsewhere.

> Jon Noring


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


From marcello at perathoner.de  Sun Oct 28 11:47:20 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 28 Oct 2007 19:47:20 +0100
Subject: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
In-Reply-To: <Pine.LNX.4.64.0710281049010.32230@pglaf.org>
References: <c2f.1f4ca40e.345524bf@aol.com>	<686943385.20071027200208@noring.name>	<1789330137.20071027212016@noring.name>
	<Pine.LNX.4.64.0710281049010.32230@pglaf.org>
Message-ID: <4724D938.2090009@perathoner.de>

Michael Hart wrote:

> It is all to obvious that Mr. Noring and a few others baited,
> and continute to bait, Mr. Bowerbird and others, to create an
> environment in which he could claim hostility.

There are three persons on this list you should not take seriously.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Sun Oct 28 13:01:40 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 28 Oct 2007 14:01:40 -0600
Subject: [gutvol-d] Amazing! <smile/>
In-Reply-To: <Pine.LNX.4.64.0710281049010.32230@pglaf.org>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
	<1789330137.20071027212016@noring.name>
	<Pine.LNX.4.64.0710281049010.32230@pglaf.org>
Message-ID: <1253347043.20071028140140@noring.name>

In reply to Michael's four detailed messages this morning where I am
the topic of discussion:


Michael and I certainly have quite different perspectives regarding
general world-view (politics, economics, etc.) as well as specific
issues in the realm of digitizing the Public Domain.

For example, we recently contributed differing viewpoints on an
ongoing discussion at Book People regarding book pricing ("where does
the money go"). Some gutvol-ers here who do not subscribe to the Book
People mailing list may be interested in that discussion -- refer to
the BP archive at:

   http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2007

and look for the various messages which include "Book Price Inflation"
in the subject header field. Quite a few people have contributed to
that discussion.

Despite our very major differences, I greatly respect Michael for what
he has observably accomplished over the years. He will be written up
in the history books for his true accomplishments and for various
predictions of his which eventually come true. I've said this many
times because it is a fact -- not because I'm trying to curry favor.

So of course being specially singled out by Michael, who along with
Greg runs this group, for a couple messages I posted last night,
whether deserved or not as others here will decide for themselves,
definitely did not make my day!...

...I spent some time writing up something to go in this spot, to
continue with the above train of thought, and of course to provide my
own perspective. But it would have ended up resaying what I've said
before, and make this message overly long. Those here who are even
following this discussion have pretty much already made up their minds
on a number of issues and people. So for each reader what I write
would either be a futile exercise at convincing, or a preaching to the
choir. And of course, a diversion from rational, respectful, and
cordial discussion on topics of interest to the PG community.

Jon Noring


From hart at pglaf.org  Sun Oct 28 13:19:36 2007
From: hart at pglaf.org (Michael Hart)
Date: Sun, 28 Oct 2007 13:19:36 -0700 (PDT)
Subject: [gutvol-d] Amazing! <smile/>
In-Reply-To: <1253347043.20071028140140@noring.name>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
	<1789330137.20071027212016@noring.name>
	<Pine.LNX.4.64.0710281049010.32230@pglaf.org>
	<1253347043.20071028140140@noring.name>
Message-ID: <Pine.LNX.4.64.0710281305280.32230@pglaf.org>


Of course, Mr. Noring is leaving out that much of my replies
to his ranting and raving were not passed by the moderator--
and I can't understand why Mr. Noring's were passed.

Hence, I presume that is why he recommends you reading there
because his words were passed and mind were not.

I should perhaps also mention that a private message send in
a private manner to Mr. Noring and not to the moderator, was
still forwarded by Mr. Noring to the moderator.

Mr. Noring will probably tell you he didn't see the header--
indicating it was NOT a listserver message. . .an error even
the moderator was quick to point out.


On Sun, 28 Oct 2007, Jon Noring wrote:

> In reply to Michael's four detailed messages this morning where 
> I am the topic of discussion:
>
>
> Michael and I certainly have quite different perspectives 
> regarding general world-view (politics, economics, etc.) as 
> well as specific issues in the realm of digitizing the Public 
> Domain.
>
> For example, we recently contributed differing viewpoints on an 
> ongoing discussion at Book People regarding book pricing 
> ("where does the money go"). Some gutvol-ers here who do not 
> subscribe to the Book People mailing list may be interested in 
> that discussion -- refer to the BP archive at:
>
> 
> http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2007
>
> and look for the various messages which include "Book Price 
> Inflation" in the subject header field. Quite a few people have 
> contributed to that discussion.
>
> Despite our very major differences, I greatly respect Michael 
> for what he has observably accomplished over the years. He will 
> be written up in the history books for his true accomplishments 
> and for various predictions of his which eventually come true. 
> I've said this many times because it is a fact -- not because 
> I'm trying to curry favor.

With respect such as received from Mr. Noring, no disrespect is
ever going to be needed.


> So of course being specially singled out by Michael, who along 
> with Greg runs this group, for a couple messages I posted last 
> night, whether deserved or not as others here will decide for 
> themselves, definitely did not make my day!...

Mr. Noring simply cannot admit this group has no moderation,
that no one runs it at all, and never has.

Ooops!!!

Other than one time I can remember when, at Mr. Noring's request
with a few others, one person was "moderated" for a time.

That would make Mr. Noring the one who ran it the most. . . .
In my own personal opinion, since no one else ever got anyone
blacklisted. . .and if Mr. Noring gets himself blacklisted in
the same respect. . .it will again be his own doing, as he is
in receipt of plenty of warning.


> ...I spent some time writing up something to go in this spot, 
> to continue with the above train of thought, and of course to 
> provide my own perspective. But it would have ended up resaying 
> what I've said before, and make this message overly long. Those 
> here who are even following this discussion have pretty much 
> already made up their minds on a number of issues and people. 
> So for each reader what I write would either be a futile 
> exercise at convincing, or a preaching to the choir. And of 
> course, a diversion from rational, respectful, and cordial 
> discussion on topics of interest to the PG community.

If only Mr. Noring would say and do the same everywhere. . . .


> Jon Noring


Michael

PS  I am hoping to be too busy to reply to Mr. Noring again
for some time, so he is welcome to dig his hole further.


From hart at pglaf.org  Sun Oct 28 13:31:48 2007
From: hart at pglaf.org (Michael Hart)
Date: Sun, 28 Oct 2007 13:31:48 -0700 (PDT)
Subject: [gutvol-d] z.m.l. and youtube
In-Reply-To: <c75.1c39797f.34560f77@aol.com>
References: <c75.1c39797f.34560f77@aol.com>
Message-ID: <Pine.LNX.4.64.0710281331330.3375@pglaf.org>


What will happen when YouTube kills the vids?


On Sun, 28 Oct 2007, Bowerbird at aol.com wrote:

> i'm now supporting embedding of youtube videos into
> the .html versions that i auto-convert from a .zml file...
>
> i'm not sure if i can plug them into the .pdf versions or
> handle them in my offline viewer, but time will tell me...
>
> i'm not sure if this is a bad thing or a good thing...      :+)
> i guess it is what it is...
>
> -bowerbird
>
>
>
> **************************************
> See what's new at http://www.aol.com
>

From jon at noring.name  Sun Oct 28 13:52:08 2007
From: jon at noring.name (Jon Noring)
Date: Sun, 28 Oct 2007 14:52:08 -0600
Subject: [gutvol-d] Amazing! <smile/>
In-Reply-To: <Pine.LNX.4.64.0710281305280.32230@pglaf.org>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
	<1789330137.20071027212016@noring.name>
	<Pine.LNX.4.64.0710281049010.32230@pglaf.org>
	<1253347043.20071028140140@noring.name>
	<Pine.LNX.4.64.0710281305280.32230@pglaf.org>
Message-ID: <1711258186.20071028145208@noring.name>

Michael wrote:

> Of course, Mr. Noring is leaving out that much of my replies
> to his ranting and raving were not passed by the moderator--
> and I can't understand why Mr. Noring's were passed.

Well, here's the URL to the message in that thread which I believe
has the highest level of "ranting and raving". I'll let the others
here on gutvol-d decide on the R&R level on a scale from 0 to 10:

   http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2007&post=2007-10-04,5

Does it qualify as a 10?

I posted other rants and raves, too, as reference to the 2007 archive
messages will show:

   http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2007

Yes, I'm just a no good ranter and raver. <smile/>


> Hence, I presume that is why he recommends you reading there
> because his words were passed and mind were not.

Actually, I had two or three messages on this topic filtered out by
John Mark Ockerbloom as well. You may not realize this, Michael, but
John has disallowed a number of my messages, too. I thank him for his
moderation of the list. Am I sometimes disappointed? Certainly. But
that's how he runs his group. Overall discussion is pretty good there.

Would I run BP that way? Probably not. Just as Michael and Greg set
the policy and guidelines for gutvol-*, so Mr. Ockerbloom has the
right to set the guidelines for his group. We can complain, and say it
is short-sighted, but that's the way it goes.


The final point is that Bowerbird (I believe), started the YahooGroup
"bpsuper" to be a place where messages rejected for Book People could
be posted. When the group started, John Mark Ockerbloom actually
allowed an announcement of that group, and it would not surprise me if
he'd allow another announcement. This is a way that rejected messages
could still be seen, and archived at Google. Here's that's group URL:

   http://groups.yahoo.com/group/bpsuper/


Btw, this brings up the issue that the archives for gutvol-d are not
open to the public, therefore they are not archived by Google. Is this
the intent?

Jon Noring


From Bowerbird at aol.com  Sun Oct 28 14:36:01 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 28 Oct 2007 17:36:01 EDT
Subject: [gutvol-d] z.m.l. and youtube
Message-ID: <c6e.1b548301.34565ac1@aol.com>

michael said:
>    What will happen when YouTube kills the vids?

i'm not sure, since i don't know exactly how they'll do it.

in general, my inclination as to what i'll do when a file is
requested from the internet and not delivered will be to
inform the user that the requested file was not delivered.

but it could vary.   photobucket, for example, sends out a
graphic that says "this account has exceeded its bandwidth"
when that happens, instead of sending the requested photo.

a bigger problem is when the file which _was_ at a u.r.l.
is replaced by another file at the same u.r.l., which means
your readers won't get the file you intended them to get...

this is an insolvable problem with the internet as a whole.
ted nelson is right that such shifting sands are quicksand.

for this reason, and also because "hotlinking" pisses off
some people, i recommend only using files in your control,
either because you (a) bundle them with your document, or
(b) store them at a web-location which _you_ fully control...

http://www.zamzar.com is a free file-conversion website
which will convert youtube videos into your chosen format.
whether it's legal to repurpose the vids is another question.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071028/35ef4492/attachment.htm 

From Bowerbird at aol.com  Sun Oct 28 15:48:51 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 28 Oct 2007 18:48:51 EDT
Subject: [gutvol-d] academic digitization
Message-ID: <d50.163ac575.34566bd3@aol.com>

let's see, first duguid -- the academic who wrote the "first monday" piece
that, among other things, was critical of the project gutenberg version of
"tristram shandy" -- later wrote another one equally critical of google's 
scan
of that same book, and their operations in general...

then there was a post about the second article over on the o'reilly blogs:
>    http://radar.oreilly.com/archives/2007/08/the_google_exch.html#comments

i made several comments there, including one critical of this white-paper:
>    http://www.clir.org/activities/details/lsdi.pdf
which someone had recommended.   i found it clueless, in content and form.

i reformatted the white-paper, into z.m.l., as a demonstration...

i have now auto-converted the z.m.l. file into .pdf format...

the z.m.l. file is here:
>    http://z-m-l.com/oyayr/oyayr.zml

the auto-generated .pdf file is here:
>    http://z-m-l.com/oyayr/oya-sunday.pdf

over 120 footnotes in this baby.   i even added a couple of footnotes
onto existing footnotes, to show david starner that it's no big deal...
(as is often the case with him, i couldn't figure out for the life of me
why he thought it would be hard.)   you'll find them at the very end.

i also put in some .pdf links in my table-of-contents, so if you click
the little boxes at the extreme right, you'll open up the original .pdf
to the appropriate section, assuming you name it "oya-lsdi.pdf" and
put it in the same folder as my .pdf.   just a little thing to amuse me...

oh yeah, and once again, the footnotes are in pop-up boxes that are
displayed when you mouseover the "note" icon in the right-hard margin.
(if you click the footnote number, you'll jump to the end-note section,
to the exact page that contains that note.   clicking the footnote number
there jumps back to the referent in the body of the text, as you'd expect.)

notes in pop-ups have been one of jon noring's _favorite_horses_ on his
merry-go-round, so if he downloaded my earlier "test-suite" .pdf and
checked it out and made a post on it, i'm sure he's already mentioned it...
yep, jon, i did pop-ups _just_for_you_...   (not really, but it sounds good.)

anyway, if anyone has any comments on this .pdf -- either specific to it or
more general reactions to the auto-conversion as a whole -- i welcome it.

i'm quite satisfied all the functionality that needs to be included _is_ 
there,
but if anyone has any requests, i'd love to entertain them.   plus if anyone
has any suggestions about enhancing the _beauty_ of the beast, speak up.

as with the earlier test-suite .pdf, i used the butt-ugly helvetica font and
all of the links have an obnoxious black rectangle so you can't miss 'em...

so you don't have to comment on those things.   but anything else is open.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071028/844c67d9/attachment.htm 

From gbnewby at pglaf.org  Sun Oct 28 17:23:56 2007
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 28 Oct 2007 17:23:56 -0700
Subject: [gutvol-d] Metadiscussion (Re:  Amazing! <smile/>)
In-Reply-To: <1711258186.20071028145208@noring.name>
References: <c2f.1f4ca40e.345524bf@aol.com>
	<686943385.20071027200208@noring.name>
	<1789330137.20071027212016@noring.name>
	<Pine.LNX.4.64.0710281049010.32230@pglaf.org>
	<1253347043.20071028140140@noring.name>
	<Pine.LNX.4.64.0710281305280.32230@pglaf.org>
	<1711258186.20071028145208@noring.name>
Message-ID: <20071029002356.GA6330@mail.pglaf.org>

On Sun, Oct 28, 2007 at 02:52:08PM -0600, Jon Noring wrote:
> Michael wrote:
> ...
> Would I run BP that way? Probably not. Just as Michael and Greg set
> the policy and guidelines for gutvol-*, so Mr. Ockerbloom has the
> right to set the guidelines for his group. We can complain, and say it
> is short-sighted, but that's the way it goes.

I wasn't really following this thread, but saw Michael's responses and
see the thread has partially turned into a meta-discussion concerning
gutvol-d.  A few thoughts on this...

As many people on the list can recall, the list is non-moderated by
choice.  Everyone is encouraged to use their own judgement about which
threads to follow, which email addresses to filter, etc.  Because there
is a generally high level of technical competency on the list (and
willingness of list members to offer help!), we expect people can can
handle their own email preferences, filtering, etc.

Casting non-moderation as a policy is accurate.

Seeking deeper meaning or symbolism in non-moderation of the
PGLAF-hosted lists is more hazardous.  This is because Project Gutenberg
as an effort is much more than the gutvol-d list and other lists
(contrary to the BP list, which to my knowledge isn't in place to
support any particular centralized effort).

It's also because this list isn't policy-making for PG, though of course
it often provides wonderful input for policy.

  -- Greg

From joshua at hutchinson.net  Mon Oct 29 06:24:22 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Mon, 29 Oct 2007 13:24:22 +0000 (UTC)
Subject: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
Message-ID: <4734168.1193664262993.JavaMail.?@fh1064.dia.cp.net>

NOTE: I apologize for this post in advance.  It is completely off-topic 
for the list, but I don't feel I can stand by without calling foul.  
This will be my last comment on the matter.

***

Ok, this is getting ridiculous.

Noring, while often painfully verbose and long-winded (even he admits 
that at times), he is almost always unfailingly polite.  The only 
person he ever argues with is BB.  And BB does everything he can to 
push Noring's buttons (and quite a few other people).  I long ago 
learned to shunt BB's message straight to a kill file because my blood 
pressure just couldn't take him.

Michael, you've made no effort to hide the fact that you flat out 
don't like Noring.  Fine. You've made the same thing clear with me in 
the past, too.  Again fine.  But don't try to say Noring is using hate 
speech.

Aside to Jon: I strongly recommend putting BB in your kill file.  Your 
frustration level will thank you, the rest of the list will thank you 
and perhaps we can have more constructive conversations around here 
because of it.

Again, to everyone else, I'm sorry for the off-topic post and have a 
nice day.

Josh


>----Original Message----
>From: hart at pglaf.org
>Date: Oct 28, 2007 13:11 
>To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
org>
>Subj: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
>
>
>My apologies, my previous attempt at replying didn't work out.
>
>Trying again below, perhaps with more patience.
>
>
>Thanks!!!
>
>Michael S. Hart
>Founder
>Project Gutenberg
>
>
>On Sat, 27 Oct 2007, Jon Noring wrote:
>
>> I previously wrote:
>>
>>
>>> Finally, to interject a personal and, yes, emotional note 
>>> directed to Bowerbird: Feel very fortunate that you have found 
>>> a home in gutvol-d where you can be "yourself." Be thankful 
>>> that Greg and Michael cut you a whole lot of slack. On nearly 
>>> all other groups I've participated in you would have been 
>>> thrown out a long time ago for what I term fostering a hostile 
>>> discussion environment. For example, calling whatever someone 
>>> writes a "merry-go-round" is exactly an ad-hominem attack on 
>>> their person. It is, in my opinion, a form of hate speech and 
>>> has no place in rational discourse.
>
>As we have commented before, Mr. Noring is certainly one of the
>top people  whom "Greg and Michael cut you a whole lot of slack."
>
>Mr. Noring seems as guilty of writing merry-go-rounds and/or
>hate speech as much as anyone.  If anyone is going to be left
>off this list, you can be sure Mr. Noring will be among them.
>
>His baiting of Mr. Bowerbird, and of myself, is the mere work
>of an apprentice baiter. . .not even up to journeyman level.
>
>Yet I am sure Mr. Noring would elevate his words beyond that.
>
>Enough said, obviously more than enough words to the wise and
>thus not expected to reach Mr. Noring or his loyal opposition
>as it were.
>
>
>> And to add an addendum.
>>
>> Clearly, everything above is "in my opinion" based on 
>> observation. And since I run quite a few mail-based forums 
>> myself (all YahooGroups), including The eBook Community, I am 
>> supportive of those who administer any forum, even when I may 
>> disagree with their policies. Administering public groups like 
>> this is a thankless job since one can never please all the 
>> people all the time -- and sometimes we group administrators/ 
>> moderators have to make tough, often-times no-win decisions.
>
>This is one reason why we don't do moderation here.
>
>Another is the obvious misuse of political powers--
>as so often requested by Mr. Noring--both in front,
>where people can see it, and behind the scenes from
>perspectives hidden from the normal list members.
>
>I am sure all the list personnel know Mr. Noring is
>a person with an agenda who wishes other agendas to
>be silenced while his own goes forwards.
>
>
>> In effect, Greg and Michael are the ones who have defacto 
>> control of this group simply by running the software that 
>> administers it. Thus, they ultimately decide who gets to post 
>> and what they allow to be posted here. They may even deny they 
>> have this power, but they do have this power by default -- if 
>> the voluntarily give up this power, they do so because they 
>> have the power. <smile/>
>
>It is only those who desire such power who consider
>this sort of thing.
>
>Gladly we have plenty of support to keep moderators
>out of the fray, unlike the other lists Mr. Noring,
>and others, have lobbied for such power.
>
>Even before these recent outbursts we have noticed,
>and commented on, Mr. Noring's apalling behavior.
>
>
>> And I may say that such-and-such is "hate speech", or so-and-so 
>> is creating a "hostile discussion environment." But it is not 
>> what I say, but what they say that matters. If many of us don't 
>> like how this group is run to the point where we get nothing 
>> out of it, we simply vote with our feet and leave, maybe even 
>> starting a new discussion group if we find value in the 
>> discussion but want a different set of group policies.
>
>Mr. Noring is striking out at himself here, as much as anyone.
>
>As most of us here are aware, and also other servers, Noring's 
>speech is as much "hate speech" as anyone's, largely ignored-- 
>but saved in the archives for future reference.  Anyone should
>be able to trace Mr. Noring's comments for themselves, and see
>the trends, over the years.  A "hostile discussin environment"
>usually follows Mr. Noring's agenda, rather than the opposite.
>
>
>> So with that said, I still believe what I wrote previously and 
>> reproduced above. But ultimately all that matters is what Greg 
>> and Michael think. If they decide that Bowerbird can pretty 
>> much write and say what he wants to gutvol-*, then that's the 
>> way it is.
>
>Mr. Noring appears to by saying, here and elsewhere, that his
>right to free speech trumps everyone else's rights.
>
>It is all to obvious that Mr. Noring and a few others baited,
>and continute to bait, Mr. Bowerbird and others, to create an
>environment in which he could claim hostility.
>
>This is not working here, and it should not work elsewhere.
>
>> Jon Noring
>
>
>Thanks!!!
>
>Michael S. Hart
>Founder
>Project Gutenberg
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d at lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From hart at pglaf.org  Mon Oct 29 09:40:13 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 09:40:13 -0700 (PDT)
Subject: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
In-Reply-To: <4734168.1193664262993.JavaMail.?@fh1064.dia.cp.net>
References: <4734168.1193664262993.JavaMail.?@fh1064.dia.cp.net>
Message-ID: <Pine.LNX.4.64.0710290904200.18344@pglaf.org>


On Mon, 29 Oct 2007, joshua at hutchinson.net wrote:

> NOTE: I apologize for this post in advance.  It is completely 
> off-topic for the list, but I don't feel I can stand by without 
> calling foul. This will be my last comment on the matter.
>
> ***
>
> Ok, this is getting ridiculous.
>
> Noring, while often painfully verbose and long-winded (even he 
> admits that at times), he is almost always unfailingly polite. 
> The only person he ever argues with is BB.  And BB does 
> everything he can to push Noring's buttons (and quite a few 
> other people).  I long ago learned to shunt BB's message 
> straight to a kill file because my blood pressure just couldn't 
> take him.
>
> Michael, you've made no effort to hide the fact that you flat 
> out don't like Noring.  Fine. You've made the same thing clear 
> with me in the past, too.  Again fine.  But don't try to say 
> Noring is using hate speech.

Mr. Noring pushes more buttons here and elsewhere as anyone,
except, as you say, perhaps Mr. Bowerbird.

Just because they are not YOUR buttons is not the criterion.


Mr. Noring claims the protection of free speech for himself,
while denying it to the world at large. . .out of bounds.

Mr. Noring claims Dr. Newby and myself run things to a kind
of advantage for Mr. Bowerbird and disadvantage to himself,
again. . .out of bounds.


What Mr. Noring really has wanted is to take over Gutenberg
by creating a Board of Directors to his own liking, and the
top priority of his proposed administration was money, even
to the point of accusing me of being in it for the money.

This has not been forgotten.

I should perhaps add here that I still have not been paid a
single salary check for the last 4.5 years, or so, but I do
get some office expenses, perhaps half of the entitled in a
similar period of time.

I seriously doubt anyone else here would work as hard for a
similar period on a career that was not paying any salary.

Mr. Noring's points are rarely, if ever, about getting more
eBooks to more people out there in the world without books,
at least to the degree most of us are used to.

One reason we keep up the freedom of speech is to find out,
in this case, whether the policies of Mr. Noring or those a
policy of Mr. Bowerbird will work better, impartially.

I warn Mr. Bowerbird as often as I warn Mr. Noring, with an
apparently greater effect, as I don't have to keep warning,
eventually in public, to calm him down.

Mr. Noring has had some success in getting Mr. Bowerbird or
others censored, here and/or on other listserverers, and he
understands that certain semantic tricks usually work on an
ordinary Moderator.

He is frustrated that neither Dr. Newby nor myself are some
of those his tricks work on.

By the way, I think you will find a pattern to Mr. Noring's
and others' attempts to start flame wars, if you look.


> Aside to Jon: I strongly recommend putting BB in your kill 
> file.  Your frustration level will thank you, the rest of the 
> list will thank you and perhaps we can have more constructive 
> conversations around here because of it.

Josh, I hate to put it this way but Mr. Noring likes his own
voice more than anyone else's, and will use Mr. Bowerbird or
anyone else as an excuse to voice the same lines over again.

He doesnt' WANT an excuse to avoid Mr. Bowerbird or others a
trolling for flame wars might work on.

Mr. Noring's comments cause flame wars, incite censorship to
a greater degree, etc. . .just not censorship here.

As I said before, if Mr. Noring's attempts do cause any kind
of censorship, he will be among the first batch to go.

However, in regards to his previous years of such incitement
I have noticed a tone that leads me to think that censorship
of himself and Mr. Bowerbird would be regarded as exchanging
queens, to employ a game theory analogy, to his benefit.

Personally, I think of Mr. Noring as a bellweather warning a
few of us when his ranting and raving get much more support,
warning us of his potential to sway the minds of others.

Mr. Bowerbird, on the other hand, cannot be accused of such;
he is obviously not trying to curry favor with anyone.  I am
including myself, because I find his attitude as annoying as
the rest of you, I am just more of the kind to take that old
and new advice just given here, to ignore him.

If everyone here ignored Mr. Bowerbird and Mr. Noring we all
would be much calmer.  Mr. Bowerbird at least has a product,
one we could possibly take advantage of without taking those
attitudinal qualities along with. . .a product that might be
used to get a lot more eBooks to a lot more people, if those
claims he makes are true.  On the other hand, I'm not sure a
wide implementation of Mr. Noring's suggestions would have a
similar possible effect.

However, I don't silence either one of them, even when these
events take place as they have this week, which I see as the
ploy of Mr. Noring more than of Mr. Bowerbird.

Nevertheless, Mr. Noring's emails will continue to be passed
through to our membership, though it has been suggested that
his messages be put into a weekly digest to calm the waters,
and thus avoid his attempts to incite flames.

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


> Again, to everyone else, I'm sorry for the off-topic post and 
> have a nice day.
>
> Josh
>
>
>> ----Original Message----
>> From: hart at pglaf.org
>> Date: Oct 28, 2007 13:11
>> To: "Project Gutenberg Volunteer Discussion"<gutvol-d at lists.pglaf.
> org>
>> Subj: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
>>
>>
>> My apologies, my previous attempt at replying didn't work out.
>>
>> Trying again below, perhaps with more patience.
>>
>>
>> Thanks!!!
>>
>> Michael S. Hart
>> Founder
>> Project Gutenberg
>>
>>
>> On Sat, 27 Oct 2007, Jon Noring wrote:
>>
>>> I previously wrote:
>>>
>>>
>>>> Finally, to interject a personal and, yes, emotional note
>>>> directed to Bowerbird: Feel very fortunate that you have found
>>>> a home in gutvol-d where you can be "yourself." Be thankful
>>>> that Greg and Michael cut you a whole lot of slack. On nearly
>>>> all other groups I've participated in you would have been
>>>> thrown out a long time ago for what I term fostering a hostile
>>>> discussion environment. For example, calling whatever someone
>>>> writes a "merry-go-round" is exactly an ad-hominem attack on
>>>> their person. It is, in my opinion, a form of hate speech and
>>>> has no place in rational discourse.
>>
>> As we have commented before, Mr. Noring is certainly one of the
>> top people  whom "Greg and Michael cut you a whole lot of slack."
>>
>> Mr. Noring seems as guilty of writing merry-go-rounds and/or
>> hate speech as much as anyone.  If anyone is going to be left
>> off this list, you can be sure Mr. Noring will be among them.
>>
>> His baiting of Mr. Bowerbird, and of myself, is the mere work
>> of an apprentice baiter. . .not even up to journeyman level.
>>
>> Yet I am sure Mr. Noring would elevate his words beyond that.
>>
>> Enough said, obviously more than enough words to the wise and
>> thus not expected to reach Mr. Noring or his loyal opposition
>> as it were.
>>
>>
>>> And to add an addendum.
>>>
>>> Clearly, everything above is "in my opinion" based on
>>> observation. And since I run quite a few mail-based forums
>>> myself (all YahooGroups), including The eBook Community, I am
>>> supportive of those who administer any forum, even when I may
>>> disagree with their policies. Administering public groups like
>>> this is a thankless job since one can never please all the
>>> people all the time -- and sometimes we group administrators/
>>> moderators have to make tough, often-times no-win decisions.
>>
>> This is one reason why we don't do moderation here.
>>
>> Another is the obvious misuse of political powers--
>> as so often requested by Mr. Noring--both in front,
>> where people can see it, and behind the scenes from
>> perspectives hidden from the normal list members.
>>
>> I am sure all the list personnel know Mr. Noring is
>> a person with an agenda who wishes other agendas to
>> be silenced while his own goes forwards.
>>
>>
>>> In effect, Greg and Michael are the ones who have defacto
>>> control of this group simply by running the software that
>>> administers it. Thus, they ultimately decide who gets to post
>>> and what they allow to be posted here. They may even deny they
>>> have this power, but they do have this power by default -- if
>>> the voluntarily give up this power, they do so because they
>>> have the power. <smile/>
>>
>> It is only those who desire such power who consider
>> this sort of thing.
>>
>> Gladly we have plenty of support to keep moderators
>> out of the fray, unlike the other lists Mr. Noring,
>> and others, have lobbied for such power.
>>
>> Even before these recent outbursts we have noticed,
>> and commented on, Mr. Noring's apalling behavior.
>>
>>
>>> And I may say that such-and-such is "hate speech", or so-and-so
>>> is creating a "hostile discussion environment." But it is not
>>> what I say, but what they say that matters. If many of us don't
>>> like how this group is run to the point where we get nothing
>>> out of it, we simply vote with our feet and leave, maybe even
>>> starting a new discussion group if we find value in the
>>> discussion but want a different set of group policies.
>>
>> Mr. Noring is striking out at himself here, as much as anyone.
>>
>> As most of us here are aware, and also other servers, Noring's
>> speech is as much "hate speech" as anyone's, largely ignored--
>> but saved in the archives for future reference.  Anyone should
>> be able to trace Mr. Noring's comments for themselves, and see
>> the trends, over the years.  A "hostile discussin environment"
>> usually follows Mr. Noring's agenda, rather than the opposite.
>>
>>
>>> So with that said, I still believe what I wrote previously and
>>> reproduced above. But ultimately all that matters is what Greg
>>> and Michael think. If they decide that Bowerbird can pretty
>>> much write and say what he wants to gutvol-*, then that's the
>>> way it is.
>>
>> Mr. Noring appears to by saying, here and elsewhere, that his
>> right to free speech trumps everyone else's rights.
>>
>> It is all to obvious that Mr. Noring and a few others baited,
>> and continute to bait, Mr. Bowerbird and others, to create an
>> environment in which he could claim hostility.
>>
>> This is not working here, and it should not work elsewhere.
>>
>>> Jon Noring
>>
>>
>> Thanks!!!
>>
>> Michael S. Hart
>> Founder
>> Project Gutenberg
>>
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d at lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From piggy at netronome.com  Mon Oct 29 09:54:38 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Mon, 29 Oct 2007 12:54:38 -0400
Subject: [gutvol-d] !@!  RESEND Re:  Comment on My Antonia scan set
In-Reply-To: <4724D938.2090009@perathoner.de>
References: <c2f.1f4ca40e.345524bf@aol.com>	<686943385.20071027200208@noring.name>	<1789330137.20071027212016@noring.name>	<Pine.LNX.4.64.0710281049010.32230@pglaf.org>
	<4724D938.2090009@perathoner.de>
Message-ID: <4726104E.7060904@netronome.com>

Marcello Perathoner wrote:
> Michael Hart wrote:
>
>   
>> It is all to obvious that Mr. Noring and a few others baited,
>> and continute to bait, Mr. Bowerbird and others, to create an
>> environment in which he could claim hostility.
>>     
>
> There are three persons on this list you should not take seriously.
>
>   
But I'm grateful that most of you read my postings anyway :-).


From Bowerbird at aol.com  Mon Oct 29 10:20:02 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 29 Oct 2007 13:20:02 EDT
Subject: [gutvol-d] another tempest in the gutvol-d teapot
Message-ID: <d44.14d31b9b.34577042@aol.com>

wow, the spin has churned up to the point where the term "hate speech"
is being used?   c'mon, folks, get a sense of proportion here.   there are
people in the world who're _really_ being victimized by hate speech, and
it is a shame when their experience is compared to something so trivial...

it's a listserve.   easy to ignore.   just press delete.   unsubscribe if you 
must.

as for the term "merry-go-round", well, isn't it patently obvious to _all_
that jon has a tendency to repeat himself?   when given any opportunity?

i mean, geez, i talk a lot about the same old set of subjects, it's true, but
at least i try to bring something new to the table with every message...

sometimes it takes me two frickin' years to bring that "something new",
as was the case with the .pdf improvements i recently discussed here,
but i've always been in this e-book game for the long haul, ever since
i started doing it 25-plus years ago.   i'm tenacious.   but i bore easily 
too.

indeed, a main reason i quit responding to the noring merry-go-round
was that i got tired of making the same old replies time after time after...
gotta keep stuff fresh, especially if you want people to keep reading you.

but yeah, i _chuckle_ when people accuse me of "pushing their buttons".
what a classic way to blame _me_ for _their_behavior_.   it's very amusing.

i disagree with noring on a lot of points.   even on the issues where _he_
thinks he's in _agreement _with me, half the time he's misunderstanding.

but hey, i'd be _extremely_ happy to put our opposing positions on a wiki
somewhere, one time, and point people to it when the question came up.
jon seems to prefer to do the same old little dance over and over and over.

in the long run, though, people aren't interested in _discussions_ of things.
they want the _proof_ in the _pudding_.   if you can't deliver it, you're 
done.

so don't be too hard on "mr. noring".   he's just frustrated.

he wants other people to mark up books in the manner he's prescribed,
but they don't seem willing to do that.

he wants other people to write converters that'll create beautiful books,
but they don't seem willing to do that.

he wants open-source programmers to write viewer-apps for his format,
but they don't seem willing to do that.

he wants them to program innovative tools that will make authoring easy,
but they don't seem willing to do that.

he wants all the other format advocates to give up to his one true format,
but they don't seem willing to do that.

he wants to "win friends and influence people" and even change the world,
but the world doesn't seem willing to do that.

so he's frustrated.   and it's totally understandable.

if i were in his shoes, i would be frustrated too...

but i'm doing my markup myself, and coding my own programs, and i don't
give a flying frog if the world listens up or not, i'm just doing it to have 
fun...

and if that "pushes your buttons", well, i'm sorry about that, but i'm not 
likely
to stop anytime soon...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071029/fe12279c/attachment-0001.htm 

From f.fuchs at gmx.net  Mon Oct 29 10:50:28 2007
From: f.fuchs at gmx.net (Franz Fuchs)
Date: Mon, 29 Oct 2007 18:50:28 +0100
Subject: [gutvol-d] New Yorker: Anthony Grafton: Digitization and its
	discontents
In-Reply-To: <Pine.LNX.4.64.0710290904200.18344@pglaf.org>
Message-ID: <MHBBKILBOBDADPCENMGKCENNDDAA.f.fuchs@gmx.net>


---
Future Reading

Digitization and its discontents.
by Anthony Grafton 
---

http://www.newyorker.com/reporting/2007/11/05/071105fa_fact_grafton
(ca. 4 300 words)

From jon at noring.name  Mon Oct 29 11:36:35 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 12:36:35 -0600
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
	the PGLAF board
Message-ID: <913787777.20071029123635@noring.name>

In order to constructively focus on some of Michael's comments
about me, let's redirect discussion to focus on what is good for PG.
Why he has spent an incredible amount of time focusing on me, my
motives, etc., is sort of puzzling.

*****

One issue Michael brought up:

The PGLAF Board and the general operation of the organization.

It is very true that a few years ago I proposed that PGLAF reorganize
to improve its organization, potential for fund-raising, and to work
more cooperatively and tangibly with other organizations digitizing
the public domain.

I continue to suggest two actions, which were based upon talking
with experts in non-profit organizations, including one who is a
very well noted attorney in that area and who has advised me on a
couple projects I was involved with. And both of these suggestions
were supported in whole, or in part, by several other notable people
who are involved with PG, DP and other projects to digitize the public
domain tests

My two suggestions are:

1) Setting up a real Board of Trustees that would include notables
   from the public domain digitization arena.

2) Transfer the "Project Gutenberg" trademark to PGLAF.

There are reasons for these recommendations. I've noted them before.

Now Michael has taken the above proposals as a sort of "power play
grab". He is, to be frank, ascribing certain motives on my part for
having even proposed them. And I know that my motives are NOT what he
believes them to me. My motives are what I believe is best for the
movement, just as I believe that Michael's motives are what he
believes is best for the movement.

I hope Michael will accept on face-value what I just said my motives
are. Those who know me know that I am not interested in power nor
fame.

Now, obviously, Michael, who still holds the defacto reins of power in
PGLAF, has vehemently opposed my two proposals, as one can see by his
last few messages where *he* brings these up. Why? Well, again, I will
only ascribe pure motives on his part and that he is afraid that
embracing the above suggestions will harm the mission of the Project
Gutenberg "movement."

I very much hope that Michael will provide us the exact, detailed
reasons why PGLAF should continue to be organized as it is, and why
he should personally hold the trademark, which is universally "not
recommended." He has not yet done so.

What makes this even more ironic is that my two suggestions were
offered at a time when Greg and Michael officially asked for ideas as
to how to improve the movement. So, I offered mine, and others offered
theirs. Funny thing though that Michael has repeated and viciously
attacked mine, and has not offered objective reasons why the two
suggestions should not even be considered in some form.

Regarding #1, I mentioned that the current PGLAF Board is a rump
board. No one on that Board has real experience with digitizing the
Public Domain, nor are notable in any sense in the arena. Three of the
four Board members work at UIUC in Illinois. No doubt they are nice
people, and competent at what they do (e.g., one has a Ph.D. in
aeronautical engineering), but they are not the kind of people one
would want to completely fill the Board of Trustees. I do not think I
need to go into the reasons why the right people on the Board will
greatly benefit PG.

For example, I have mentioned the kind of people who should be asked
to serve on the PGLAF Board, and over time have proposed names, a
sort of "dream team" list:

   a) Charles Franks
   b) Juliet Sutherland
   c) Brewster Kahle
   d) John Mark Ockerbloom
   e) Dr. Widger
   f) Steve Harris
   g) Peter Brantley
   h) Dr. Allen Renear (who is at UIUC)

   
(Btw, for the current PGLAF Board, refer to:

http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Literary_Archive_Foundation

)


Regarding #2, well no need to explain that in any detail. It is never
recommended that a trademark be held by an individual. Individuals do
funny things at times -- like die. And organizations can be more
aggressive at defending their trademark. There are other benefits,
too.

Again, Michael has NOT explained WHY he must hold on to the Project
Gutenberg trademark. I hope he offers an explanation. If not, people
will begin to think of their own reasons, some of which are not
flattering to Michael nor the PG movement. And PGLAF is taken less
seriously by others.


Jon Noring


From jon at noring.name  Mon Oct 29 12:06:07 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 13:06:07 -0600
Subject: [gutvol-d] Focus on the ideas, not the person
In-Reply-To: <d44.14d31b9b.34577042@aol.com>
References: <d44.14d31b9b.34577042@aol.com>
Message-ID: <308293095.20071029130607@noring.name>

Bowerbird wrote:

> [an amazing list of what he says I want]

What is important to see in both Michael's and Bowerbird's replies is
that they strictly focus on me: my motives, my wants, my deficiencies
(I have many, Josh pointed out one), etc., etc.

One has to ask the question: What's the point? How does this benefit
Project Gutenberg? And how do such messages *harm* the PG Community?

Whatever happened to the healthy debate of thoughts and ideas? To view
them as completely independent of the person proposing them? To let
the thoughts and ideas and debate points stand or fall on their own
merits?

I've run a lot of mailing lists the last 14 years, and one thing I've
observed is that when a group focuses on the thoughts and ideas, and
treats the proposer almost as "anonymous" in a cordial manner, the
group thrives. Discussion is robust and meaningful, and sometimes
leads to some new group-level insight. I also notice greater
participation over time because the group is "safe" for participation.

Cordiality and *full respect* for others (and yes I've probably not
been as respectful as I should have been) actually leads to more
fruitful discussion -- eventually leading to new ideas and ways of
doing things.

As soon as discussion turns toward the person proposing an idea or
debate point -- to focus on personality or motives -- the group
rapidly devolves into chaos and the topic is never explored in
sufficient depth to provide information for everyone to make up
their own mind.

I wonder how many here who follow gutvol-d are interested in sharing
their ideas, but have not out of fear that doing so might lead to a
personal attack? (Feel free to email me in private if indeed you are
put off by the recent "tar and feathering" messages.)

Jon Noring


From johnson.leonard at gmail.com  Mon Oct 29 12:25:32 2007
From: johnson.leonard at gmail.com (Leonard Johnson)
Date: Mon, 29 Oct 2007 15:25:32 -0400
Subject: [gutvol-d] New Yorker: Anthony Grafton: Digitization and its
	discontents
In-Reply-To: <MHBBKILBOBDADPCENMGKCENNDDAA.f.fuchs@gmx.net>
References: <Pine.LNX.4.64.0710290904200.18344@pglaf.org>
	<MHBBKILBOBDADPCENMGKCENNDDAA.f.fuchs@gmx.net>
Message-ID: <748ba8e50710291225y151d805bj3c4cafc659717963@mail.gmail.com>

On 10/29/07, Franz Fuchs <f.fuchs at gmx.net> wrote:
>
> ---
> Future Reading
>
> Digitization and its discontents.
> by Anthony Grafton
> ---
>
> http://www.newyorker.com/reporting/2007/11/05/071105fa_fact_grafton
> (ca. 4 300 words)
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
I seldom get involved with the discussions here, but I wish to thank
Franz Fuchs for the link. I found the article very interesting.

-- 
http://members.cox.net/leaonarddjohnson/

From jon at noring.name  Mon Oct 29 12:36:47 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 13:36:47 -0600
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
	the PGLAF board
In-Reply-To: <913787777.20071029123635@noring.name>
References: <913787777.20071029123635@noring.name>
Message-ID: <1103954005.20071029133647@noring.name>

Oops, need to clarify a couple points in my prior message. I wrote:

> I very much hope that Michael will provide us the exact, detailed
> reasons why PGLAF should continue to be organized as it is, and why
> he should personally hold the trademark, which is universally "not
> recommended." He has not yet done so.

Now Michael has given reasons here and there, but in my opinion they
are not yet sufficient, nor cogently and objectively organized, to
form a final answer on his part. Hopefully he will fully clarify his
thoughts to the level or organization where he can simply repost it
whenever someone brings it up again. Of course, he can refuse to
answer this request, but that does not help for transparency of an
organization/movement whose whole philosophy is built around
transparency and openness, and relies upon thousands of volunteers.

To summarize, I'd like to hear Michael's beliefs on:

1) Why is the PGLAF Board made up the way it is -- what is Michael's
   philosophy towards the purpose and role of the Board, and what
   kind of people should not serve on the PGLAF Board (besides me, of
   course. <smile/>)

   I also continue to be mystified why at least Juliet Sutherland or
   someone else from DP is not on the PGLAF Board (maybe she was asked
   and turned it down, but this is important to know given the
   importance of DP to the PG "movement".)

2) What value he sees in himself personally holding the PG trademark
   rather than turning it over to PGLAF. How does personal ownership
   benefit the long-term mission of the Project Gutenberg "movement"?


> Regarding #1, I mentioned that the current PGLAF Board is a rump
> board. No one on that Board has real experience with digitizing the
> Public Domain, nor are notable in any sense in the arena.

Obviously, one of the four Board members listed at the URL I gave, and
I assume that is an updated list, is Dr. Greg Newby, who is the Board
Chair and also the CEO. I was referring to the other three members.
Refer to:

http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Literary_Archive_Foundation


Jon Noring


From bowerbird at aol.com  Mon Oct 29 13:44:03 2007
From: bowerbird at aol.com (bowerbird at aol.com)
Date: Mon, 29 Oct 2007 16:44:03 -0400
Subject: [gutvol-d] i've asked before, but i'll ask again`
Message-ID: <8C9E88393776E53-554-31ED@FWM-D08.sysops.aol.com>

jon-

i've asked before, but i'll ask again.

when you send a message to the list, it comes to me.
so there's no need to send a second copy to me too...

so please stop doing that.

and please don't make me have to ask you a third time.

i'm not even reading messages that come from you anyway
-- and i mostly succeed not reading even the fragments
that people include when they make a reply to you --
so it's really unnecessary to double up your messages...

now, i'm gonna say some things to you here, jon, but
i'm not going to read your reply if you make one, so
if that bothers you, then don't even read the rest...

***

you make a big deal about "respect". but the thing is,
you long ago sacrificed the _modicum_ of respect that
_everyone_ deserves from me, by virtue of being human.

so the only "respect" you have left is what you _earn_,
by your _ideas_, and the little bit of that which you
_once_ had has evaporated, because you've proven that
your ideas generally don't pass the cost-benefit test,
even in terms of _your_very_own_behavior_.

and, quite frankly, i don't feel any "respect" coming
in the opposite direction, from you toward me, which
doesn't bother me, since i'm not hung up on "respect".
plus i simply don't ascribe much value to your judgment.
but this hypocrisy in your position always humors me...

finally, i don't see you have much "respect" for the
institution of _dialog_and_discussion_, because you
never seem to learn much of anything at all from it.
it's merely a way for you to reiterate your opinions.

like the rest of the "win friends and influence people"
crowd, it seems you want others to bend to you, but yet
the notion that you might bend to them is inconceivable.

i think my positions are correct too, but that's because
i'm willing to shift 'em immediately if a stronger argument
emerges for any another position.

so there. reply if you want to, but i won't even read it.
i spent way too many years already reading your messages,
and way too much time replying to them, until i finally
decided that you no longer had anything of value to me...

and that you _hadn't_ had anything of value for years...

and it took several years to wean myself off of replying,
because it had just become a bad habit. and also because
i thought it was necessary to counter many of your ideas,
just in case some newbies believed you, but now i realize
that's not even a problem any more, so i am fully clean.

or maybe not, because look at me, here i am once again,
wasting my time writing a post to jon noring... geez!

go waste other people's time, and leave me alone. goodbye.

-bowerbird

________________________________________________________________________
Email and AIM finally together. You've gotta check out free AOL Mail! - 
http://mail.aol.com

From hart at pglaf.org  Mon Oct 29 14:22:42 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 14:22:42 -0700 (PDT)
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
 the PGLAF board
In-Reply-To: <1103954005.20071029133647@noring.name>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
Message-ID: <Pine.LNX.4.64.0710291411260.24540@pglaf.org>


Jon has already had all the answers to these questions,
and he knows it, he is just trying once again to move a
certain conversation to the point where he can expound,
again, at length, how he thinks things should be run.

The most obvious and complete answer is, of course, the
one you have heard the most often, that he is welcome--
nay, ENCOURAGED, to stop talking and start DOING and as
with all suggested courses of ACTION, Project Gutenberg
will provide as much assistance as possible.

HOWEVER, as long as Jon is igNoring the call to ACTION,
his words remaind just that, words, and it is obvious--
thank goodness--that his words never carry any weight--
and no one has tried to elect him to anything.

DOUBLY HOWEVER, it Mr. Noring SHOULD eventually amass a
group who want him to lead them, Project Gutenberg will
be only to glad to offer all possible assistance.

As it has been, as it is, and we can only hope. . . .

Distributed Proofreaders is a perfect example, and this
example should be more than enough to provide Jon with,
we hope, all the encouragement needed.

DP is its own entity, has its own leaders, and gets the
support it requests from Project Gutenberg, just as Mr.
Noring would/could/should have had if he were a willing
worker as much as he is a willing talker.

Now you understand why we don't censor what he says, he
is just too good an example, we'd never find better.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


On Mon, 29 Oct 2007, Jon Noring wrote:

> Oops, need to clarify a couple points in my prior message. I 
> wrote:
>
>> I very much hope that Michael will provide us the exact, 
>> detailed reasons why PGLAF should continue to be organized as 
>> it is, and why he should personally hold the trademark, which 
>> is universally "not recommended." He has not yet done so.
>
> Now Michael has given reasons here and there, but in my opinion 
> they are not yet sufficient, nor cogently and objectively 
> organized, to form a final answer on his part. Hopefully he 
> will fully clarify his thoughts to the level or organization 
> where he can simply repost it whenever someone brings it up 
> again. Of course, he can refuse to answer this request, but 
> that does not help for transparency of an organization/movement 
> whose whole philosophy is built around transparency and 
> openness, and relies upon thousands of volunteers.
>
> To summarize, I'd like to hear Michael's beliefs on:
>
> 1) Why is the PGLAF Board made up the way it is -- what is 
> Michael's
>   philosophy towards the purpose and role of the Board, and 
> what
>   kind of people should not serve on the PGLAF Board (besides 
> me, of
>   course. <smile/>)
>
>   I also continue to be mystified why at least Juliet 
> Sutherland or
>   someone else from DP is not on the PGLAF Board (maybe she was 
> asked
>   and turned it down, but this is important to know given the
>   importance of DP to the PG "movement".)
>
> 2) What value he sees in himself personally holding the PG 
> trademark
>   rather than turning it over to PGLAF. How does personal 
> ownership
>   benefit the long-term mission of the Project Gutenberg 
> "movement"?
>
>
>> Regarding #1, I mentioned that the current PGLAF Board is a 
>> rump board. No one on that Board has real experience with 
>> digitizing the Public Domain, nor are notable in any sense in 
>> the arena.
>
> Obviously, one of the four Board members listed at the URL I 
> gave, and I assume that is an updated list, is Dr. Greg Newby, 
> who is the Board Chair and also the CEO. I was referring to the 
> other three members. Refer to:
>
> http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Literary_Archive_Foundation
>
>
>
> Jon Noring
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From marcello at perathoner.de  Mon Oct 29 14:47:22 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 29 Oct 2007 22:47:22 +0100
Subject: [gutvol-d] i've asked before, but i'll ask again`
In-Reply-To: <8C9E88393776E53-554-31ED@FWM-D08.sysops.aol.com>
References: <8C9E88393776E53-554-31ED@FWM-D08.sysops.aol.com>
Message-ID: <472654EA.8000701@perathoner.de>

bowerbird at aol.com wrote:

> when you send a message to the list, it comes to me.
> so there's no need to send a second copy to me too...

Stop bugging people. *Your* mail agent is misconfigured. You are sending
a copy of all messages to yourself:

> From: Bowerbird at aol.com
> Message-ID: <d44.14d31b9b.34577042 at aol.com>
> Date: Mon, 29 Oct 2007 13:20:02 EDT
> To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com

Now, if people hit 'reply', each former recipient gets an answer. It's
*your* fault, not Jon's.

Solution: stop sending a copy of your messages to yourself. Or, if you
absolutely need a copy, send it as BCC.


While you're reconfiguring, stop sending your post as text *and* HTML.
That will save you even more bandwidth than if you get answers twice.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From jon at noring.name  Mon Oct 29 14:52:09 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 15:52:09 -0600
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
	the PGLAF board
In-Reply-To: <Pine.LNX.4.64.0710291411260.24540@pglaf.org>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
Message-ID: <1035758925.20071029155209@noring.name>

Michael wrote:

> Jon has already had all the answers to these questions,
> and he knows it, he is just trying once again to move a
> certain conversation to the point where he can expound,
> again, at length, how he thinks things should be run.

These answers have NOT been given, and if they have, I'd be happy to
have a link to the message in the archive, or a summary of your
reasons.

For the defacto leader of the PG movement, these are legitimate
questions to ask:

1) The makeup of the PGLAF Board, why it is as it is, and future
   plans, and

2) The personal ownership of the trademark.

People will begin to wonder why you continue to avoid answering these
simple yet important questions. The governance of any non-profit
organization is a legitimate thing to ask when it asks thousands of
people to volunteer for it.

So are you saying that asking these questions is out of bounds?

You can keep trying to divert attention back to me and my foibles
(real and imagined). But the questions asked are legitimate, and they
have not yet been cogently answered in any public forum. If you have,
then point a link to your answer in the archives and I'll be happy to
blog the link somewhere. Better yet, provide a link to it on the
Gutenberg site.

Jon Noring


From hart at pglaf.org  Mon Oct 29 14:56:24 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 14:56:24 -0700 (PDT)
Subject: [gutvol-d] Focus on the ideas, not the person
In-Reply-To: <308293095.20071029130607@noring.name>
References: <d44.14d31b9b.34577042@aol.com>
	<308293095.20071029130607@noring.name>
Message-ID: <Pine.LNX.4.64.0710291426000.24540@pglaf.org>

On Mon, 29 Oct 2007, Jon Noring wrote:

> Bowerbird wrote:
>
>> [an amazing list of what he says I want]
>
> What is important to see in both Michael's and Bowerbird's 
> replies is that they strictly focus on me: my motives, my 
> wants, my deficiencies (I have many, Josh pointed out one), 
> etc., etc.

No one I know has any idea of Mr. Noring's "motives," "wants,"
etc., for the simple reason that he never presents any goal to
consider other than that he and his cabinet should control PG.

PG has been created, from the very start, up to today, simply,
and completely, by DOING. . .not just TALKING.

Mr. Noring has continually been invited to run anything he is
interested in up the flagpole, with or without example works,
though many presume it would work better if he had at least a
small handful of examples, and then just see who salutes.

Mr. Noring is not getting the kinds of salutes he wants so he
has asked for some kind of knighthood process again and again
in the hopes that if we acknowledge him BEFORE his action the
action will then prove worthy of this knighthood.

We continually offer him all the support we possibly can.

The results are there for all to see.


> One has to ask the question: What's the point? How does this 
> benefit Project Gutenberg? And how do such messages *harm* the 
> PG Community?

Just wasting our time and energy, seems to be all for the moment,
but I always wonder what else Mr. Noring has in mind.


> Whatever happened to the healthy debate of thoughts and ideas? 
> To view them as completely independent of the person proposing 
> them? To let the thoughts and ideas and debate points stand or 
> fall on their own merits?

I'm sorry, did I miss Mr. Noring's presentation of some project?

Perhaps I am just not able to read between the lines to get some
inner meaning to something that will change Project gutenberg or
perhaps even help change the world.

If Mr. Noring has a way to get more books to more people, I have
nothing but interest.

However, this is not what I seem to have been receiving.

And I most sincerely apologize to all concerned, and Mr. Noring,
a dozen times over, for any such lack.


> I've run a lot of mailing lists the last 14 years, and one 
> thing I've observed is that when a group focuses on the 
> thoughts and ideas, and treats the proposer almost as 
> "anonymous" in a cordial manner, the group thrives. Discussion 
> is robust and meaningful, and sometimes leads to some new 
> group-level insight. I also notice greater participation over 
> time because the group is "safe" for participation.

This seems to be just the opposite of Mr. Noring's approach with
"The Book People" mailing list, where my responses to him have a
double rate of censorship than what he claimed for his own and I
strongly suspect this is not accidental on his part.


> Cordiality and *full respect* for others (and yes I've probably 
> not been as respectful as I should have been) actually leads to 
> more fruitful discussion -- eventually leading to new ideas and 
> ways of doing things.

Mr. Noring claims to respect me, and perhaps Project Gutenberg as
well, but, again,  I have trouble with this kind of respect.

Mr. Noring seems to take the same strategic stand every year at a
similar time, around the equinoxes, has anyone noticed.

I strongly suspect the entire world has a certain susceptability,
if you will pardon me, to Seasonal Affective Disorder, not just a
certain individual, but enough to flavor the world at large.

Comments?

I have asked "The Book People" Moderator, but he instantly denied
any such susceptability to Seasonal Affective Distorder, even tho
he seems to censor me more at these times. . .perhaps only due to
the conversations with Mr. Noring, and the world at large.

> As soon as discussion turns toward the person proposing an idea 
> or debate point -- to focus on personality or motives -- the 
> group rapidly devolves into chaos and the topic is never 
> explored in sufficient depth to provide information for 
> everyone to make up their own mind.

Then make a real proposal, which is what we always ask. . . .

Just what is it that you have in mind that no one is accepting?


> I wonder how many here who follow gutvol-d are interested in 
> sharing their ideas, but have not out of fear that doing so 
> might lead to a personal attack? (Feel free to email me in 
> private if indeed you are put off by the recent "tar and 
> feathering" messages.)

Jon, you have had me "tarred and feathered" far more than any
such treatment you have received, I think most will agree tho
with the exception of the few you always bring with you and a
perhaps new recruit this year, if you are aware enough to get
such a recruit firmly on your side.

If you feel "tarred and feathered" by me, I don't see why, as
it appears you have simply made your usual fray into politics
at the usual time, and you never seemed to feel the responses
from me, or others, were out of place.

Again, and again, and again, it always comes down to the one,
simple, unavoidable question:

"What do you want to do?"

If you want to take over some ACTION, you must first ACT.

I'm not at all sure just what ACTIONS you are proposing other
than your continual efforts to stack the Board of Directors--
but for what PURPOSE, other than your own personal power.

What is the Project Gutenberg of YOUR dreams???

If you will just give us that handful of examples we ask for,
year after year, we will as always offered, give you your own
directory, with all permissions, your own newsletter slot and
all the publicity we can to promote you and your project.


> Jon Noring


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


From jon at noring.name  Mon Oct 29 15:02:24 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 16:02:24 -0600
Subject: [gutvol-d] Focus on the ideas, not the person
In-Reply-To: <Pine.LNX.4.64.0710291426000.24540@pglaf.org>
References: <d44.14d31b9b.34577042@aol.com>
	<308293095.20071029130607@noring.name>
	<Pine.LNX.4.64.0710291426000.24540@pglaf.org>
Message-ID: <391113694.20071029160224@noring.name>

Michael Hart wrote:

> No one I know has any idea of Mr. Noring's "motives," "wants,"
> etc., for the simple reason that he never presents any goal to
> consider other than that he and his cabinet should control PG.

Ah, this is the crux.

You have no basis to say this Michael, because I *never* said I wanted
control, nor advocated any structure where *I* and my "cabinet" would
have control. (whoever my "cabinet" is)

Do you have any evidence to back up this ridiculous charge? Or did you
just imagine I'm part of the Trilateral commission or something.

Please Michael, return to reality.

Jon Noring


From hart at pglaf.org  Mon Oct 29 15:11:30 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 15:11:30 -0700 (PDT)
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
 the PGLAF board
In-Reply-To: <1035758925.20071029155209@noring.name>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
	<1035758925.20071029155209@noring.name>
Message-ID: <Pine.LNX.4.64.0710291501120.24540@pglaf.org>


Jon, over all the years I've done eBooks, only one other person
than yourself has ever mentioned who should be on the Board and
we had already offered that person a board position.

The fact is, as we have told you before, that there is not much
direction from anyone, Board or not, as to what YOU should do--
you are free to do whatever you think best to do eBooks.

The trouble, it seems, is that everyone one else is free, too--
and they don't seem to want to do it your way.

I would certainly be only too happy if they would.

Then perhaps you would stop this.

No one seems to want to take over Project Gutenberg but you.

And we have rolled out the red carpet for any project you might
want to try. . .you can have your own project. . .take all of a
whole world of credit for it. . .and more power to you!!!

Just Do It!

Michael

PS  I seriously doubt there will be any "meat" on this "bone of
contention" again this year, if the follows the patter of Jon's
past attempts, so I will be passing over Mr. Noring's emails at
least for a few days, as I have a presentation to do for the UI
Library Mortenson Center for visiting librarians from at least,
so I am told, 14 countries.

Jon, please do not take offense. . .only possibly you should be
happier than I, if you get the volunteers you want to do what's
in your mind, as long as it's designed to further eBooks in the
sense we have been doing, or perhaps even in a better way.


On Mon, 29 Oct 2007, Jon Noring wrote:

> Michael wrote:
>
>> Jon has already had all the answers to these questions,
>> and he knows it, he is just trying once again to move a
>> certain conversation to the point where he can expound,
>> again, at length, how he thinks things should be run.
>
> These answers have NOT been given, and if they have, I'd be happy to
> have a link to the message in the archive, or a summary of your
> reasons.
>
> For the defacto leader of the PG movement, these are legitimate
> questions to ask:
>
> 1) The makeup of the PGLAF Board, why it is as it is, and future
>   plans, and
>
> 2) The personal ownership of the trademark.
>
> People will begin to wonder why you continue to avoid answering these
> simple yet important questions. The governance of any non-profit
> organization is a legitimate thing to ask when it asks thousands of
> people to volunteer for it.
>
> So are you saying that asking these questions is out of bounds?
>
> You can keep trying to divert attention back to me and my foibles
> (real and imagined). But the questions asked are legitimate, and they
> have not yet been cogently answered in any public forum. If you have,
> then point a link to your answer in the archives and I'll be happy to
> blog the link somewhere. Better yet, provide a link to it on the
> Gutenberg site.
>
> Jon Noring
>
>
>

From hart at pglaf.org  Mon Oct 29 15:23:03 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 15:23:03 -0700 (PDT)
Subject: [gutvol-d] i've asked before, but i'll ask again`
In-Reply-To: <472654EA.8000701@perathoner.de>
References: <8C9E88393776E53-554-31ED@FWM-D08.sysops.aol.com>
	<472654EA.8000701@perathoner.de>
Message-ID: <Pine.LNX.4.64.0710291512010.24540@pglaf.org>


Marcello may not be quite correct, at least for my emailer,
and some others I have tried.

"reply" and "reply to all" many times are different commands,
which each generate different results.

"reply" goes only to the sender, or their assigned "reply to"
[which might be a different address than the sending address]

Depending on your own default settings, you might get one or
the other of these two commands when you reply to an email.

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


On Mon, 29 Oct 2007, Marcello Perathoner wrote:

> bowerbird at aol.com wrote:
>
>> when you send a message to the list, it comes to me.
>> so there's no need to send a second copy to me too...
>
> Stop bugging people. *Your* mail agent is misconfigured. You are sending
> a copy of all messages to yourself:
>
>> From: Bowerbird at aol.com
>> Message-ID: <d44.14d31b9b.34577042 at aol.com>
>> Date: Mon, 29 Oct 2007 13:20:02 EDT
>> To: gutvol-d at lists.pglaf.org, Bowerbird at aol.com
>
> Now, if people hit 'reply', each former recipient gets an answer. It's
> *your* fault, not Jon's.
>
> Solution: stop sending a copy of your messages to yourself. Or, if you
> absolutely need a copy, send it as BCC.
>
>
> While you're reconfiguring, stop sending your post as text *and* HTML.
> That will save you even more bandwidth than if you get answers twice.
>
>
>
> -- 
> Marcello Perathoner
> webmaster at gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From jon at noring.name  Mon Oct 29 15:26:19 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 16:26:19 -0600
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
	the PGLAF board
In-Reply-To: <Pine.LNX.4.64.0710291501120.24540@pglaf.org>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
	<1035758925.20071029155209@noring.name>
	<Pine.LNX.4.64.0710291501120.24540@pglaf.org>
Message-ID: <1373634448.20071029162619@noring.name>

Michael wrote:

> Jon, over all the years I've done eBooks, only one other person
> than yourself has ever mentioned who should be on the Board and
> we had already offered that person a board position.

O.k., thanks.

I think my suggestion is for PGLAF to look at the importance of
remaking its board. I believe it will provide positive value in
several ways I've talked about before. This does not mean that PG
can't continue to foster all kinds of independent projects (which is
actually good). But having a good core Board may not only better help
proposed projects but possibly foster more. One should not
underestimate the value and potential of a good Board.

Jon


From hart at pglaf.org  Mon Oct 29 15:28:23 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 15:28:23 -0700 (PDT)
Subject: [gutvol-d] Focus on the ideas, not the person
In-Reply-To: <391113694.20071029160224@noring.name>
References: <d44.14d31b9b.34577042@aol.com>
	<308293095.20071029130607@noring.name>
	<Pine.LNX.4.64.0710291426000.24540@pglaf.org>
	<391113694.20071029160224@noring.name>
Message-ID: <Pine.LNX.4.64.0710291527440.24540@pglaf.org>


OK, Jon has suggested returning to reality, and I agree.

Nothing more need be said on the unreal.


On Mon, 29 Oct 2007, Jon Noring wrote:

> Michael Hart wrote:
>
>> No one I know has any idea of Mr. Noring's "motives," "wants,"
>> etc., for the simple reason that he never presents any goal to
>> consider other than that he and his cabinet should control PG.
>
> Ah, this is the crux.
>
> You have no basis to say this Michael, because I *never* said I wanted
> control, nor advocated any structure where *I* and my "cabinet" would
> have control. (whoever my "cabinet" is)
>
> Do you have any evidence to back up this ridiculous charge? Or did you
> just imagine I'm part of the Trilateral commission or something.
>
> Please Michael, return to reality.
>
> Jon Noring
>

From hart at pglaf.org  Mon Oct 29 15:34:08 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 15:34:08 -0700 (PDT)
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
 the PGLAF board
In-Reply-To: <1373634448.20071029162619@noring.name>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
	<1035758925.20071029155209@noring.name>
	<Pine.LNX.4.64.0710291501120.24540@pglaf.org>
	<1373634448.20071029162619@noring.name>
Message-ID: <Pine.LNX.4.64.0710291528540.24540@pglaf.org>


The idea of the proposed projects is still lacking,
that's apparently where our "realities" differ.

Jon want the political power of the Board before an
assortment of projects is proposed.

However, this is putting the cart before the horse.

In Project Gutenberg the political power is ignored
in favor of a "Just Do It!" kind of attitude.

What Jon wants is the political power without works
on the projects that would earn it.

Because we offer that kind of power free to all who
ask for it, Jon doesn't seem to want it.

He seems to want the other kind of power.

Power over others without. . . .


enough said. . .I hope. . .

mh


On Mon, 29 Oct 2007, Jon Noring wrote:

> Michael wrote:
>
>> Jon, over all the years I've done eBooks, only one other person
>> than yourself has ever mentioned who should be on the Board and
>> we had already offered that person a board position.
>
> O.k., thanks.
>
> I think my suggestion is for PGLAF to look at the importance of
> remaking its board. I believe it will provide positive value in
> several ways I've talked about before. This does not mean that PG
> can't continue to foster all kinds of independent projects (which is
> actually good). But having a good core Board may not only better help
> proposed projects but possibly foster more. One should not
> underestimate the value and potential of a good Board.
>
> Jon
>
>
>

From jon at noring.name  Mon Oct 29 15:52:32 2007
From: jon at noring.name (Jon Noring)
Date: Mon, 29 Oct 2007 16:52:32 -0600
Subject: [gutvol-d] Since MIchael brought up some points, e.g.,
	the PGLAF board
In-Reply-To: <Pine.LNX.4.64.0710291528540.24540@pglaf.org>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
	<1035758925.20071029155209@noring.name>
	<Pine.LNX.4.64.0710291501120.24540@pglaf.org>
	<1373634448.20071029162619@noring.name>
	<Pine.LNX.4.64.0710291528540.24540@pglaf.org>
Message-ID: <1786059042.20071029165232@noring.name>

Michael wrote:

> enough said. . .I hope. . .

Well, you've had your say, and I've had mine. So we'll leave it to the
others who are following this (if anyone!) to decide for themselves.

Jon


From klofstrom at gmail.com  Mon Oct 29 16:03:28 2007
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Mon, 29 Oct 2007 13:03:28 -1000
Subject: [gutvol-d] Founder's syndrome
Message-ID: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>

If Jon is to be ignored because he hasn't done enough books, perhaps
my voice will be heard. I've proofed some 39,000 pages at DP, over the
course of four years, and post-processed several books.

PG has a bad case of Founder's Syndrome:

http://www.help4nonprofits.com/NP_Bd_FoundersSyndrome_Art.htm

It's frequent, it's predictable, it's the common cold of non-profits.

Jon has raised some sensible questions about governance and ownership
of the trademark and they shouldn't be dismissed with accusations that
Jon is trying to take over PG.

Alas, I don't expect that I *will* be heard. So I won't belabor the point.

-- 
Karen Lofstrom

From Bowerbird at aol.com  Mon Oct 29 17:34:07 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 29 Oct 2007 20:34:07 EDT
Subject: [gutvol-d] the z.m.l. dingus
Message-ID: <bf3.18a4718f.3457d5ff@aol.com>

ok, i'm not really sure if i'm ready to make this public yet,
but let's give this "reality" a whirl and see what happens...

i've mentioned here before my zml-to-html converter, at:
>    http://z-m-l.com/go/vl3.pl

this takes pre-formatted .zml texts and auto-converts them
to an .html version that is displayed right there on the page.

that gives you a _taste_ of pudding, so it is a nice demo, but
it doesn't let you stick your finger in the pudding and swirl it.

so here's the z.m.l. dingus:
>    http://z-m-l.com/go/zmldingus093.pl

it's live.   like a wiki.   you edit the field, click the "do it" button,
and boom, whatever you edited gets converted _from_ z.m.l.
into .html.

you can click in the same preformatted texts, if you want, and
confirm they still work.   but you can also enter your own stuff.

to get you started, click "skeleton" to pull in a bare-bones file.

of course, if what you enter is not "correct" z.m.l., then it won't
get converted right.   indeed, since the dingus is "in-progress",
you might even do "correct" z.m.l. and have it come out wrong.
in such a case, i'd like to know about it.

if the output isn't right, there is a chance your input is wrong...
so please do make sure that your input is "correct" z.m.l. first...

because if i get a bunch of people saying "it doesn't work right"
when the _real_ problem is that they just fed it some bad input,
i'll just shut the free-entry dingus off again, and continue with
the preformatted stuff i _know_ is right, to prove my pudding...

-bowerbird

p.s. it has display glitches in internet explorer -- imagine that! --
where you'll find the generated .html _underneath_ the editfield.
on most other browsers, you should find them to be side-by-side.
the "w=##" and "h=##" fields let you adjust the height and width
of the edit-field, since c.s.s. seems not to affect an .html editfield.


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071029/31fdcf47/attachment.htm 

From editor at pg-news.org  Mon Oct 29 17:55:22 2007
From: editor at pg-news.org (Mike Cook)
Date: Tue, 30 Oct 2007 00:55:22 -0000
Subject: [gutvol-d] Since MIchael brought up some points,
	e.g. the PGLAF board
In-Reply-To: <Pine.LNX.4.64.0710291411260.24540@pglaf.org>
References: <913787777.20071029123635@noring.name>	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
Message-ID: <001901c81a8f$8e2034e0$aa609ea0$@org>

>> Jon has already had all the answers to these questions,
>> and he knows it

Perhaps he has...and perhaps he hasn't...but I would be very interested in
hearing a response to those questions put forward by Jon.

Mike


-----Original Message-----
From: Michael Hart [mailto:hart at pglaf.org] 
Sent: 29 October 2007 21:23
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Since MIchael brought up some points, e.g., the PGLAF
board


Jon has already had all the answers to these questions,
and he knows it, he is just trying once again to move a
certain conversation to the point where he can expound,
again, at length, how he thinks things should be run.

The most obvious and complete answer is, of course, the
one you have heard the most often, that he is welcome--
nay, ENCOURAGED, to stop talking and start DOING and as
with all suggested courses of ACTION, Project Gutenberg
will provide as much assistance as possible.

HOWEVER, as long as Jon is igNoring the call to ACTION,
his words remaind just that, words, and it is obvious--
thank goodness--that his words never carry any weight--
and no one has tried to elect him to anything.

DOUBLY HOWEVER, it Mr. Noring SHOULD eventually amass a
group who want him to lead them, Project Gutenberg will
be only to glad to offer all possible assistance.

As it has been, as it is, and we can only hope. . . .

Distributed Proofreaders is a perfect example, and this
example should be more than enough to provide Jon with,
we hope, all the encouragement needed.

DP is its own entity, has its own leaders, and gets the
support it requests from Project Gutenberg, just as Mr.
Noring would/could/should have had if he were a willing
worker as much as he is a willing talker.

Now you understand why we don't censor what he says, he
is just too good an example, we'd never find better.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


On Mon, 29 Oct 2007, Jon Noring wrote:

> Oops, need to clarify a couple points in my prior message. I 
> wrote:
>
>> I very much hope that Michael will provide us the exact, 
>> detailed reasons why PGLAF should continue to be organized as 
>> it is, and why he should personally hold the trademark, which 
>> is universally "not recommended." He has not yet done so.
>
> Now Michael has given reasons here and there, but in my opinion 
> they are not yet sufficient, nor cogently and objectively 
> organized, to form a final answer on his part. Hopefully he 
> will fully clarify his thoughts to the level or organization 
> where he can simply repost it whenever someone brings it up 
> again. Of course, he can refuse to answer this request, but 
> that does not help for transparency of an organization/movement 
> whose whole philosophy is built around transparency and 
> openness, and relies upon thousands of volunteers.
>
> To summarize, I'd like to hear Michael's beliefs on:
>
> 1) Why is the PGLAF Board made up the way it is -- what is 
> Michael's
>   philosophy towards the purpose and role of the Board, and 
> what
>   kind of people should not serve on the PGLAF Board (besides 
> me, of
>   course. <smile/>)
>
>   I also continue to be mystified why at least Juliet 
> Sutherland or
>   someone else from DP is not on the PGLAF Board (maybe she was 
> asked
>   and turned it down, but this is important to know given the
>   importance of DP to the PG "movement".)
>
> 2) What value he sees in himself personally holding the PG 
> trademark
>   rather than turning it over to PGLAF. How does personal 
> ownership
>   benefit the long-term mission of the Project Gutenberg 
> "movement"?
>
>
>> Regarding #1, I mentioned that the current PGLAF Board is a 
>> rump board. No one on that Board has real experience with 
>> digitizing the Public Domain, nor are notable in any sense in 
>> the arena.
>
> Obviously, one of the four Board members listed at the URL I 
> gave, and I assume that is an updated list, is Dr. Greg Newby, 
> who is the Board Chair and also the CEO. I was referring to the 
> other three members. Refer to:
>
>
http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Literary_Archive_Found
ation
>
>
>
> Jon Noring
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>


From Bowerbird at aol.com  Mon Oct 29 18:23:56 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 29 Oct 2007 21:23:56 EDT
Subject: [gutvol-d] some responses for mike
Message-ID: <d2b.1a66f95b.3457e1ac@aol.com>

mike said:
>    I would be very interested in hearing a response 
>    to those questions put forward by Jon.

since anyone seems to be able to put forward the questions,
maybe anyone is able to put forward some responses for mike.

so here are mine:

1.   the project gutenberg board has decided the current constitution of
the p.g. board is just fine.   they'll let you know if they change their 
mind.

2.   michael hart, who owns the project gutenberg trademark likely because
he's the person who ordered and paid for it, not to mention who nurtured
the project through its first decades without much support from _anyone_,
thinks his ownership is just fine.   he'll let you know if he changes his 
mind.

if you don't like those, try these:

1.   actually, it never occurred to the people on the board that someone
would object to the volunteer service they rendered for many years, so
they haven't even really thought about putting anyone else on the board,
but if they were questioned, they'd wonder why you brought the issue up.

2.   michael sleeps better at night knowing that he owns the trademark.
you know how parents worry about their children when they're out late.


i'm on a roll now:

1.   we don't need no stinkeen' board.

2.   i sleep better at night knowing that michael owns the trademark.


or, if you don't like any of those, how 'bout these?

1.   what's it to you?

2.   what's it to you?

perhaps you'll get the "flavor" of these last two responses if you remember
that project gutenberg was conceived and nurtured not far from chicago...
so if you will put a chicago construction worker "accent" on those answers,
you might grok them a bit better...

anyway, let me know if you need any more, and i'll do my best...        :+)

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071029/b8ede745/attachment.htm 

From Bowerbird at aol.com  Mon Oct 29 18:42:55 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 29 Oct 2007 21:42:55 EDT
Subject: [gutvol-d] "founder's syndrome"
Message-ID: <d4f.160f782e.3457e61f@aol.com>

i'll have a response to that silly "founder's syndrome" message, too...

indeed, i hope michael doesn't even feel the need to explain himself.

i'll go into it in more detail later, but i'm going out to dinner now...

in a nutshell, though, the answer is that michael -- _intentionally_ --
set out to build a different kind of "non-profit" organization than the
"typical" one _some_ people here now seem to want him to have built.

jimmy wales now has the same kind of "problem" over at wikipedia...
(indeed, his is even _more_ pronounced, because he has to deal with
all the people who want him to "capitalize" on all of his "page-views".
at least michael doesn't have to appear to "give up" a bunch of cash.)

make no mistake about it, the "networking" power of the internet can
create a lot of money.   google is on its way to being the richest business
in the history of the planet, eclipsing a good many nations of the world.

but all that "networking" power can _also_ be used for _collaboration_...

and when it _is_ used for that purpose, it will _transform_the_world_...

so it depends on if you wanna live in the old world, where greed rules,
or the new world, where people live together in peace and harmony...

once you've really "tuned in" to the idea of "unlimited distribution",
you will get it.   that's a notion that makes no sense in the old world.
but really, it's not about "distribution" at all, as that implies "product".
the essence of this new world is that it is about generous spirituality.

but, you know, people still have to eat.   and now i'm late for dinner...     
 :+)

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071029/aa9a35c1/attachment.htm 

From ksclarke at gmail.com  Mon Oct 29 18:44:28 2007
From: ksclarke at gmail.com (Kevin S. Clarke)
Date: Mon, 29 Oct 2007 21:44:28 -0400
Subject: [gutvol-d] Since MIchael brought up some points,
	e.g. the PGLAF board
In-Reply-To: <001901c81a8f$8e2034e0$aa609ea0$@org>
References: <913787777.20071029123635@noring.name>
	<1103954005.20071029133647@noring.name>
	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
	<001901c81a8f$8e2034e0$aa609ea0$@org>
Message-ID: <3557b8d0710291844k1c2868e0s262005d0bdf792c8@mail.gmail.com>

I'm curious too, but I think the answer might be gleaned from reading:

http://www.gutenberg.org/wiki/Gutenberg:Administrivia_by_Michael_Hart

It seems the purpose of the board is to do as little as possible.  The
organization seems to encourage "being the change" they'd like to see
(it offers no incentives to bring about any one given change).

That makes Michael Hart's responses for Jon Noring to DO something
make a little more sense (though it didn't seem like much of an answer
to the questions asked to me before I read the above URL).  I think
what it boils down to is that the questions Noring is asking aren't of
much interest to PG as an organization.

I think a good example of this "be the change" perspective is seen in
PG moving towards XML as a format.  When I've seen it discussed over
the years, the answer always seems to be, "If you are interested do
it."  There isn't any movement on from the PG organization; they
aren't interested in it, it seems... the question always falls back to
"What standard?  We can't reach agreement."

So, Noring, why try to make PG something other than what it is?  It
doesn't seem like the contributors are clamouring for something
different.

On a related note, anyone attempted lately to do autoconversion from
the plain text formats to XML (any format)?  Any luck with it?  Just
curious,

Kevin


On 10/29/07, Mike Cook <editor at pg-news.org> wrote:
> >> Jon has already had all the answers to these questions,
> >> and he knows it
>
> Perhaps he has...and perhaps he hasn't...but I would be very interested in
> hearing a response to those questions put forward by Jon.
>
> Mike
>
>
> -----Original Message-----
> From: Michael Hart [mailto:hart at pglaf.org]
> Sent: 29 October 2007 21:23
> To: Project Gutenberg Volunteer Discussion
> Subject: Re: [gutvol-d] Since MIchael brought up some points, e.g., the PGLAF
> board
>
>
> Jon has already had all the answers to these questions,
> and he knows it, he is just trying once again to move a
> certain conversation to the point where he can expound,
> again, at length, how he thinks things should be run.
>
> The most obvious and complete answer is, of course, the
> one you have heard the most often, that he is welcome--
> nay, ENCOURAGED, to stop talking and start DOING and as
> with all suggested courses of ACTION, Project Gutenberg
> will provide as much assistance as possible.
>
> HOWEVER, as long as Jon is igNoring the call to ACTION,
> his words remaind just that, words, and it is obvious--
> thank goodness--that his words never carry any weight--
> and no one has tried to elect him to anything.
>
> DOUBLY HOWEVER, it Mr. Noring SHOULD eventually amass a
> group who want him to lead them, Project Gutenberg will
> be only to glad to offer all possible assistance.
>
> As it has been, as it is, and we can only hope. . . .
>
> Distributed Proofreaders is a perfect example, and this
> example should be more than enough to provide Jon with,
> we hope, all the encouragement needed.
>
> DP is its own entity, has its own leaders, and gets the
> support it requests from Project Gutenberg, just as Mr.
> Noring would/could/should have had if he were a willing
> worker as much as he is a willing talker.
>
> Now you understand why we don't censor what he says, he
> is just too good an example, we'd never find better.
>
>
> Thanks!!!
>
> Michael S. Hart
> Founder
> Project Gutenberg
>
>
>
> On Mon, 29 Oct 2007, Jon Noring wrote:
>
> > Oops, need to clarify a couple points in my prior message. I
> > wrote:
> >
> >> I very much hope that Michael will provide us the exact,
> >> detailed reasons why PGLAF should continue to be organized as
> >> it is, and why he should personally hold the trademark, which
> >> is universally "not recommended." He has not yet done so.
> >
> > Now Michael has given reasons here and there, but in my opinion
> > they are not yet sufficient, nor cogently and objectively
> > organized, to form a final answer on his part. Hopefully he
> > will fully clarify his thoughts to the level or organization
> > where he can simply repost it whenever someone brings it up
> > again. Of course, he can refuse to answer this request, but
> > that does not help for transparency of an organization/movement
> > whose whole philosophy is built around transparency and
> > openness, and relies upon thousands of volunteers.
> >
> > To summarize, I'd like to hear Michael's beliefs on:
> >
> > 1) Why is the PGLAF Board made up the way it is -- what is
> > Michael's
> >   philosophy towards the purpose and role of the Board, and
> > what
> >   kind of people should not serve on the PGLAF Board (besides
> > me, of
> >   course. <smile/>)
> >
> >   I also continue to be mystified why at least Juliet
> > Sutherland or
> >   someone else from DP is not on the PGLAF Board (maybe she was
> > asked
> >   and turned it down, but this is important to know given the
> >   importance of DP to the PG "movement".)
> >
> > 2) What value he sees in himself personally holding the PG
> > trademark
> >   rather than turning it over to PGLAF. How does personal
> > ownership
> >   benefit the long-term mission of the Project Gutenberg
> > "movement"?
> >
> >
> >> Regarding #1, I mentioned that the current PGLAF Board is a
> >> rump board. No one on that Board has real experience with
> >> digitizing the Public Domain, nor are notable in any sense in
> >> the arena.
> >
> > Obviously, one of the four Board members listed at the URL I
> > gave, and I assume that is an updated list, is Dr. Greg Newby,
> > who is the Board Chair and also the CEO. I was referring to the
> > other three members. Refer to:
> >
> >
> http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Literary_Archive_Found
> ation
> >
> >
> >
> > Jon Noring
> >
> > _______________________________________________
> > gutvol-d mailing list
> > gutvol-d at lists.pglaf.org
> > http://lists.pglaf.org/listinfo.cgi/gutvol-d
> >
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From marcello at perathoner.de  Tue Oct 30 04:12:07 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 30 Oct 2007 12:12:07 +0100
Subject: [gutvol-d] Since MIchael brought up some points,
 e.g. the PGLAF board
In-Reply-To: <001901c81a8f$8e2034e0$aa609ea0$@org>
References: <913787777.20071029123635@noring.name>	<1103954005.20071029133647@noring.name>	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>
	<001901c81a8f$8e2034e0$aa609ea0$@org>
Message-ID: <47271187.5090206@perathoner.de>

Mike Cook wrote:

>>> Jon has already had all the answers to these questions,
>>> and he knows it
> 
> Perhaps he has...and perhaps he hasn't...but I would be very interested in
> hearing a response to those questions put forward by Jon.

Me too!

Those were the very questions that emerged on this list during the
aftermath of the "PG II incident".

And even if the answers were already given, a short summary would not
hurt at this place.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From richfield at telkomsa.net  Tue Oct 30 04:54:38 2007
From: richfield at telkomsa.net (Jon Richfield)
Date: Tue, 30 Oct 2007 13:54:38 +0200
Subject: [gutvol-d] Harmless monsters
Message-ID: <47271B7E.5000408@telkomsa.net>

You know folks, much of the tone of this forum is frustrating.  In many 
on-line forums mutual satire, extending to outright abuse, are 
appropriate, widely enjoyed, and even admired; their regulars frequent 
them for just such performances, a sort of verbal all-in-wrestling to 
please those still callow enough to be impressed by the delusion that a 
flaming amounts to a flaying, and that conveying an insult in long words 
or capitals will cow an opponent and thrill the groupies.   However, in 
a forum of literate people, where there is work to be done, the 
appropriate emblem might be the bitten tongue.  

Unfortunately, the more strongly anyone feels about the superiority of 
his own ideas or products, the more passionately he is likely to resent 
rival ideas or slighting responses, and accordingly, the more spitefully 
he is likely retaliate for any offence, real or fancied.  The problem is 
that in the resulting fuss and bother, sound points deserving fair 
consideration, or weighing against each other in appropriate contexts, 
are lost or distorted, without compensatory benefit to anyone.  They are 
hardly worth even a smirk from a competitor who imagines that he has 
administered a well-deserved gob-smacking.  

The plain fact is that this little playpen is no heavyweight boxing 
ring.  In terms of literary intimidation it has yielded neither an Ali 
to cower before, nor a Bierce to enjoy, nor yet a Swift to respect, just 
a few intrusive Donald Ducks to ignore.   I ask you: on reading the most 
vituperative exchanges during say, the last few months, was there a 
solitary one that, if its like had occurred in a kindergarten, you would 
have dignified with special attention?  Is there one quip or insight 
that you were tempted to frame for your desk or memorise for your next 
literary dinner?  

As for putting anyone's name on a kill list, suit yourself of course, 
but it amounts to sulking and is about as effective.  The participants 
might be dead losses as polemicists, and not all their ideas worth the 
paper that one hopes they are not printed upon, but sifted from the 
dreck, some of the actual substance of their material is professional 
and may be rewarding.  

But, you insist, you are too thin-skinned to put up with the nonsense or 
malice pervading the writings of certain parties?  Your blood pressure 
cannot take the nastiness or the stupidity?  Well, bad luck yer 'avin!  
You will just have to put up with missing (unfortunately) something like 
half the substance of the forum, and console yourself with my 
overflowing sympathy, and no doubt, that of some of correspondents with 
no time to waste on all that nonsense.  If that is your view, please be 
very careful not to reflect on the altogether higher blood pressure 
attendant on contemplating the joys you missed by ignoring the bonnest 
of their mots and leaving them to die in silence in the empty house.  
When pays things attention commensurate to their sources, it can be 
quite startling to find how soon one forgets to give a damn, or even 
fails to notice anything to damn.  Winnowing the relevant material from 
the chaffing becomes fully automatic.

It calls to mind one of Pope's more pungent observations (which seems 
not to be anthologised in any material that I have seen in PG).  He was 
embroiled in an ink-and-spittle match with one John Dennis who matched 
him in smallness of spirit, if not of person, but not in largeness of 
talent.  

Should Dennis print how once you robb'd your brother,
Traduc'd your monarch and debauched your mother;
Say what revenge on Dennis can be had,
Too dull for laughter, for reply too mad?

Of one so poor you cannot take the law;
On one so old your sword you cannot draw.
Uncag'd, then, let the harmless monster rage,
Secure in dullness, madness, want and age.
                Alexander Pope  
                (From Cohen, "More Comic and curious verse"
                 Penguin 1956)

In practice Pope did not follow his own advice (knowing anything about 
his nature, how surprising do you find that?) but in this forum it works 
for me.  My blood pressure is fine, thanks for asking, and skimming the 
digested input is far less problematic than tuning kill criteria to keep 
protecting oneself adequately and harmlessly from correspondence from 
undesirable sources.  I live in hopes of some day stumbling across some 
really worth-while squelch or insult from the munchkins, but faced with 
their barrenness so far, it is just as well that I have other reasons 
for such skimming as I do.  

Meanwhile, can everyone else please refrain from taking prisoners, 
refuse to show any mercy, and concentrate on matters in hand, instead of 
rewarding undeserving tantrums?

Cheers,

Jon


From nwolcott2ster at gmail.com  Tue Oct 30 08:33:51 2007
From: nwolcott2ster at gmail.com (Norm Wolcott)
Date: Tue, 30 Oct 2007 10:33:51 -0500
Subject: [gutvol-d] Harmless monsters
References: <47271B7E.5000408@telkomsa.net>
Message-ID: <008401c81b0a$527ce980$660fa8c0@atlanticbb.net>

As I suppose many I use the Outlook Express to move certain author's emails
to a junk file where I can later peruse them or delete them. (28 on Oct
28th) Unfortunately
the list of offending names seems to grow and grow, and the amount of good
information reduces in proportion. .
nwolcott2 at post.harvard.edu
----- Original Message -----
From: "Jon Richfield" <richfield at telkomsa.net>
To: <gutvol-d at lists.pglaf.org>
Sent: Tuesday, October 30, 2007 6:54 AM
Subject: [gutvol-d] Harmless monsters


> You know folks, much of the tone of this forum is frustrating.  In many
> on-line forums mutual satire, extending to outright abuse, are
> appropriate, widely enjoyed, and even admired; their regulars frequent
> them for just such performances, a sort of verbal all-in-wrestling to
> please those still callow enough to be impressed by the delusion that a
> flaming amounts to a flaying, and that conveying an insult in long words
> or capitals will cow an opponent and thrill the groupies.   However, in
> a forum of literate people, where there is work to be done, the
> appropriate emblem might be the bitten tongue.
>
> Unfortunately, the more strongly anyone feels about the superiority of
> his own ideas or products, the more passionately he is likely to resent
> rival ideas or slighting responses, and accordingly, the more spitefully
> he is likely retaliate for any offence, real or fancied.  The problem is
> that in the resulting fuss and bother, sound points deserving fair
> consideration, or weighing against each other in appropriate contexts,
> are lost or distorted, without compensatory benefit to anyone.  They are
> hardly worth even a smirk from a competitor who imagines that he has
> administered a well-deserved gob-smacking.
>
> The plain fact is that this little playpen is no heavyweight boxing
> ring.  In terms of literary intimidation it has yielded neither an Ali
> to cower before, nor a Bierce to enjoy, nor yet a Swift to respect, just
> a few intrusive Donald Ducks to ignore.   I ask you: on reading the most
> vituperative exchanges during say, the last few months, was there a
> solitary one that, if its like had occurred in a kindergarten, you would
> have dignified with special attention?  Is there one quip or insight
> that you were tempted to frame for your desk or memorise for your next
> literary dinner?
>
> As for putting anyone's name on a kill list, suit yourself of course,
> but it amounts to sulking and is about as effective.  The participants
> might be dead losses as polemicists, and not all their ideas worth the
> paper that one hopes they are not printed upon, but sifted from the
> dreck, some of the actual substance of their material is professional
> and may be rewarding.
>
> But, you insist, you are too thin-skinned to put up with the nonsense or
> malice pervading the writings of certain parties?  Your blood pressure
> cannot take the nastiness or the stupidity?  Well, bad luck yer 'avin!
> You will just have to put up with missing (unfortunately) something like
> half the substance of the forum, and console yourself with my
> overflowing sympathy, and no doubt, that of some of correspondents with
> no time to waste on all that nonsense.  If that is your view, please be
> very careful not to reflect on the altogether higher blood pressure
> attendant on contemplating the joys you missed by ignoring the bonnest
> of their mots and leaving them to die in silence in the empty house.
> When pays things attention commensurate to their sources, it can be
> quite startling to find how soon one forgets to give a damn, or even
> fails to notice anything to damn.  Winnowing the relevant material from
> the chaffing becomes fully automatic.
>
> It calls to mind one of Pope's more pungent observations (which seems
> not to be anthologised in any material that I have seen in PG).  He was
> embroiled in an ink-and-spittle match with one John Dennis who matched
> him in smallness of spirit, if not of person, but not in largeness of
> talent.
>
> Should Dennis print how once you robb'd your brother,
> Traduc'd your monarch and debauched your mother;
> Say what revenge on Dennis can be had,
> Too dull for laughter, for reply too mad?
>
> Of one so poor you cannot take the law;
> On one so old your sword you cannot draw.
> Uncag'd, then, let the harmless monster rage,
> Secure in dullness, madness, want and age.
>                 Alexander Pope
>                 (From Cohen, "More Comic and curious verse"
>                  Penguin 1956)
>
> In practice Pope did not follow his own advice (knowing anything about
> his nature, how surprising do you find that?) but in this forum it works
> for me.  My blood pressure is fine, thanks for asking, and skimming the
> digested input is far less problematic than tuning kill criteria to keep
> protecting oneself adequately and harmlessly from correspondence from
> undesirable sources.  I live in hopes of some day stumbling across some
> really worth-while squelch or insult from the munchkins, but faced with
> their barrenness so far, it is just as well that I have other reasons
> for such skimming as I do.
>
> Meanwhile, can everyone else please refrain from taking prisoners,
> refuse to show any mercy, and concentrate on matters in hand, instead of
> rewarding undeserving tantrums?
>
> Cheers,
>
> Jon
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d


From lee at novomail.net  Tue Oct 30 10:01:14 2007
From: lee at novomail.net (Lee Passey)
Date: Tue, 30 Oct 2007 10:01:14 -0700
Subject: [gutvol-d] Since MIchael brought up some points,
 e.g. the PGLAF board
In-Reply-To: <3557b8d0710291844k1c2868e0s262005d0bdf792c8@mail.gmail.com>
References: <913787777.20071029123635@noring.name>	<1103954005.20071029133647@noring.name>	<Pine.LNX.4.64.0710291411260.24540@pglaf.org>	<001901c81a8f$8e2034e0$aa609ea0$@org>
	<3557b8d0710291844k1c2868e0s262005d0bdf792c8@mail.gmail.com>
Message-ID: <4727635A.9030305@novomail.net>

Kevin S. Clarke wrote:

[snip]

> On a related note, anyone attempted lately to do autoconversion from
> the plain text formats to XML (any format)?  Any luck with it?  Just
> curious,
> 
> Kevin

I gave up on trying to convert PG's Impoverished Text Format to any XML 
vocabulary several years ago, when I concluded that it would be 
impossible to do it in any meaningful way.

Most of the work in this area has been done by BowerBird. Five years ago 
he was claiming that he would 'soon' be able to write a program that 
would 'intuit' markup from a PG file, much like a human being can. I 
think that he realized that this was too great a task for him alone, so 
he started developing a new markup language which, to the uninitiated, 
would look like a PG ITF file, but which had subtle, almost 
imperceptible markup that a computer could detect. He has since written 
a perl script which would take a file written in his markup language, 
which he calls z.m.l., and convert it to HTML. I haven't looked at the 
output carefully enough to determine if it is XHTML, which is the direct 
answer to your question, but in any case it would always be possible to 
take BB's HTML (assuming it is valid) and run it through Tidy to get 
valid XHTML.

As of today I don't think an automated conversion process from PG ITF to 
an XML vocabulary exists. You could, of course, do a hand conversion of 
PG ITF to z.m.l., and from there use BB's perl script and Tidy to get 
XHTML to automate at least a portion of it.

Of course, there are any number of ways to autoconvert PG ITF to 
non-meaningful XML vocabularies, but I don't think that's what you were 
asking.

-- 
Nothing of significance below this line.


From Bowerbird at aol.com  Tue Oct 30 10:53:47 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 30 Oct 2007 13:53:47 EDT
Subject: [gutvol-d] autoconversion from the plain text formats to XML
Message-ID: <beb.1ef18e3b.3458c9ab@aol.com>

kevin said:
>    anyone attempted lately to do autoconversion 
>    from the plain text formats to XML (any format)?? 
>    Any luck with it?? Just curious,

i'm here on a regular basis giving advice on that...         :+)

there are any number of light-markup formats
that _could_ be used to generate x.m.l. output,
from markdown (the frontrunner i always plug)
to ascii-doc (the newest contender i mentioned):
>   http://www.methods.co.nz/asciidoc/index.html

and there's no question that these methodologies
_can_ generate the complexity of markup required.
markdown is being used all over cyberspace now.
restructured text is used by the python community
as the light-markup for all of their documentation.
ascii-doc is used for a number of technoid things.

then, of course, there is my zen markup language,
which is actually _based_ on the pg-ascii format...
this means that "conversion" of the e-texts to .zml
is almost fully automatic, with the main exception
being the front-matter, which is typically a mess.

(the largest offenders on this are the title-pages.
while i believe i can code routines to fix them too,
interacting with my clean version of p.g.'s catalog,
the unfortunate fact is i have not yet _done_ that.
once it is done, though, conversion of the entire
project gutenberg library is just one click away.)

although the other light-markup methodologies
make x.m.l. by default, and mine could do x.m.l.,
i don't usually stress x.m.l. as an output format,
since i don't wanna give ideas to my antagonists.
(they still cling to the idea that "it's impossible".)

but if you want the fast-track to an x.m.l. library,
light-markup is the solution you're looking for...

of course, the _real_ problem is the _inconsistency_
of the library in its current state, which would need
to be remedied before you can do any conversions.
(i am using routines i developed over several years,
but i'm not sharing them with my opponents here.)

but if you do _that_, followed by auto-conversion,
then what's the purpose of x.m.l. in the first place?

it makes more sense to use the light-markup files
as your "master", especially since they're infinitely
more malleable than crusty bloatware x.m.l. files...
(because even when they aren't, a conversion to
the x.m.l. variant is just one button-click away...)

that's the existential conundrum of heavy-markup.

until it can be applied _automatically_, it's too costly.
but if it can be auto-applied, it becomes unnecessary.
the best it can hope for is to be a transitory middleman.

and once you understand this conundrum, it's funny...

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071030/8c6bdca9/attachment.htm 

From jon at noring.name  Tue Oct 30 11:29:15 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 30 Oct 2007 12:29:15 -0600
Subject: [gutvol-d] autoconversion from the plain text formats to XML
In-Reply-To: <beb.1ef18e3b.3458c9ab@aol.com>
References: <beb.1ef18e3b.3458c9ab@aol.com>
Message-ID: <907090513.20071030122915@noring.name>

Bowerbird wrote:

>  although the other light-markup methodologies
>  make x.m.l. by default, and mine could do x.m.l.,
>  i don't usually stress x.m.l. as an output format,
>  since i don't wanna give ideas to my antagonists.
>  (they still cling to the idea that "it's impossible".)

Well, since you've added me to the "it's impossible" list without
explaining what you mean by "impossible," I've noted for a while now
that one can go from normalized plain text (where the rules for
normalization identify document structures and stuff), into XML of any
compatible vocabulary.

In fact, it's intrigued me to talk with a script friend of mine and
see how long it would take him to write a script to take ZML (assuming
we have the latest rule set) and convert it into a pre-defined XML
vocabulary, probably XHTML with a couple pre-defined classes. What
would be more interesting is to reverse the process from that exactly
defined XML vocabulary back into ZML. Again, I know it can be done.
(Yes, I know you've already written a ZML to crappy SGML-based HTML
script, but again I knew this could be done before you did it.)

The thing is it can be done. Just like we know when you drop a bowling
ball from a ten story building it will not fly up but rather down to
the ground, we know that when one has ZML one can convert it to XML.
No need to perform the experiment since we know the result. The key
is: is it worth it to spend the time writing the scripts?

There are other things requiring experimentation to see if our
speculations hold true or not -- this is not one of them.


>  but if you want the fast-track to an x.m.l. library,
>  light-markup is the solution you're looking for...

If you replace "is" by "may be", then what you say will be correct, in
my opinion. Using the word "is" is sales jargon.


>  of course, the _real_ problem is the _inconsistency_
>  of the library in its current state, which would need
>  to be remedied before you can do any conversions.
>  (i am using routines i developed over several years,
>  but i'm not sharing them with my opponents here.)

Funny thing how you continue to treat this as a game, a competition.
Whatever happened to peace and harmony and working with your fellow
man and all of that? Open source collaboration, etc.


>  but if you do _that_, followed by auto-conversion,
>  then what's the purpose of x.m.l. in the first place?

This is a loaded question since it presupposes that ZML is sufficient,
and we've noted time and again that that remains to be seen, and many
of us believe it is not sufficient. Such as how to handle blockquotes
(which themselves can be documents -- I even suggested a tweak to ZML
to make it work). Unless you can demonstrate how to properly identify
a block quote from verse, ZML is insufficient. (Now maybe you've fixed
this deficiency in ZML, but I've not seen it in your online rule set,
or are you hiding the latest ZML rules as proprietary?)


Jon Noring


From hart at pglaf.org  Tue Oct 30 12:30:48 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 12:30:48 -0700 (PDT)
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>
References: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0710301203340.14197@pglaf.org>


The real issue about Jon's frequent efforts to rewrite PG
in his own image is that Jon really hasn't presented this
image for anyone to consider. . .something we have all in
a number of ways given him permission and encouragements,
again and again, to do.

Whatever Jon wants to do, we will gladly support, as will
we support any number of other such parties.

However, it seems to come down to the idea that Jon wants
to control the means of such support not just receive it,
and I haven't seen anyone support that idea/ideal.

Jon wants not only to control what he does, what his own,
should they arise, army of volunteers does, but what some
others would be encouraged or discouraged to do, at least
that's what it has always look to me like, over the years
as Jon has tried various suggestions to stack the Boards,
get control of the trademark, make money the top priority
and the rest of his agenda, none of which seems to be the
actual process of getting more eBook to more people.

I would be only too happy to see Jon raise his army, have
some use for the support we continually offer him, and do
something really truly great in the world of eBooks.

Very little would make me happier. . . .

However, as also said before, this is not to be at a huge
expense of giving Jon control over PG at large, either in
budgetary considerations [not that we have a real budget,
since we don't have any real money], or in terms of Board
membership, placement as CEO, etc.

We will help Jon do whatever project he has in mind.

We will not help Jon or anyone else gain political power,
simply because we don't believe in political power.

Jon WANTS there to be political power, financial power or
even more kinds of power in Project Gutenberg, but, those
who could have had such power do NOT want it in their own
hands or in anyone else's.

That, in a nutshell, is the reply to Mr. Noring's effort.

And, it has all been said pretty much that way before.

I, personally, do not like referring people to archives--
or just pulling out pieces of archives and reprinting the
ones I find most relevant. . .it's too much of the past--
and even though Mr. Noring's arguments are of the past in
very large part, I find it more worthwhile to consider it
in the present, especially since he does.

As for "Founder's Syndrome," I continue to encourage each
of you to found her/his own projects and to expect a full
order of all support we can muster, then you can each say
whatever you like about "Founder's Syndrome" from the one
point of view you haven't had. . .as Founder.

No one here is upset about "Founder's Syndrome" rather it
is just the opposite, someone straining to attain powers,
so great, they COULD create their own Founder's Syndrome.

Project Gutenberg has never relied on political power and
financial power, and that is real issue here, to create a
power system than can then be manipulated or taken over.

Not going to happen. . . .

"Freedom, as knowledge, is best served when all have it."

And you can quote me on that.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


On Mon, 29 Oct 2007, Karen Lofstrom wrote:

> If Jon is to be ignored because he hasn't done enough books, perhaps
> my voice will be heard. I've proofed some 39,000 pages at DP, over the
> course of four years, and post-processed several books.
>
> PG has a bad case of Founder's Syndrome:
>
> http://www.help4nonprofits.com/NP_Bd_FoundersSyndrome_Art.htm
>
> It's frequent, it's predictable, it's the common cold of non-profits.
>
> Jon has raised some sensible questions about governance and ownership
> of the trademark and they shouldn't be dismissed with accusations that
> Jon is trying to take over PG.
>
> Alas, I don't expect that I *will* be heard. So I won't belabor the point.
>
> -- 
> Karen Lofstrom
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From joshua at hutchinson.net  Tue Oct 30 12:50:28 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Tue, 30 Oct 2007 19:50:28 +0000 (UTC)
Subject: [gutvol-d] !@! Re:  Founder's syndrome
Message-ID: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>

>----Original Message----
>From: hart at pglaf.org
>
>as Jon has tried various suggestions to stack the Boards,
>get control of the trademark, make money the top priority
>and the rest of his agenda, none of which seems to be the
>actual process of getting more eBook to more people.
>

Jon has not done any of the above things in any of the MANY notes I've 
read.

You've accused him of it often enough, true, but that isn't the same 
thing.

He has asked about the makeup of the board.  Even suggested folks he 
thought would be good folks to have on the board.  Never once has he 
suggested himself OR any of the people you've accused of being his 
"groupies" (your term from a couple messages back).  You've never 
answered how the board was constituted, but then again that may not 
really be your area.  Jon should probably aim that question to Greg.

He has asked why you own the trademark and not the PGLAF foundation.  
In fact, LOTS of people have asked this question.  To be honest, I kind 
of remember Greg saying it was at the direction of a lawyer at some 
point, but I don't remember for certain.  An actual answer to this 
would be appreciated.

He has said making money would be a good thing.  But he never said it 
was making money for profit, but as a way to increase the amount of 
good work PG puts out there.  Whether you agree with this (and I don't 
know that I do), it is most definitely not an evil plot, as you 
insinuate.

Now, as far as personally working to get more books in more hands ... 
you probably have a point as far as his PG efforts are concerned.  
Other than the My Antonia scans, I don't know of a whole lot he has 
done.  But that doesn't stop folks from having opinions and shouldn't 
stop him from expressing them.  Provided he is polite and all ... which 
he usually is.  More so than I am, oftimes.

Josh

From hart at pglaf.org  Tue Oct 30 12:54:42 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 12:54:42 -0700 (PDT)
Subject: [gutvol-d] Founder's syndrome
In-Reply-To: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>
References: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0710301230520.14197@pglaf.org>


A bit about the specifics of "Founder's Syndrome"


Here are the basics from the article mentioned:

http://www.help4nonprofits.com/NP_Bd_FoundersSyndrome_Art.htm


1.  Ideas of the Founder are "rubber stamped"


With Project Gutenberg it is just the opposite:

What Mr. Noring is really upset about is that ALL ideas in PG
are "rubber stamped" to a nearly 100% degree.

Mr. Noring would prefer the kind of political power that gets
its real power from saying "NO!"


Not going to happen. . . .


Project Gutenberg will continue to encourage all efforts.


2.  The current leadership took over in "tough times" or as
"a start-up" or through "a growth spurt" or surviving types
of "financial collapse," etc.

Sorry, but none of this has happened here.

No one took over in "tough times."

No one took us through "financial collapse."

Everyone is encouraged to found "a start-up" of their own.

Greg Newby, Harry Hilton, Mark Zinzow, the board members of
Project Gutenberg, have all been here for 20 years, and all
encourage anyone who would like to start their own projects
in every way they possibly can.

The Distributed Proofreaders is a perfect example.

Do you think they bought their own super-scanners.


3.  "$12 million community powerhouse"

Sorry, PG hasn't had even $1 million in all 37 years. . . .

Yes, we ARE a powerhouse, but not through financials. . . .

We are a powerhouse because we are all volunteers, and thus
no can can stop us by stopping our funding.


4.

"The Founder might lose his/her total control of the organization.
Boards of these organizations usually don't govern, 
but instead 'approve' what the founder suggests."

What Mr. Noring hates the most is that I do NOT exercise 
"control of the organization" much less "total control,"
and that neither does the Board of Directors.

Instead, everyone is encouraged to try their projects to
make the world of eBooks a better place.

Mr. Noring would prefer a system that had such control.


At least that's the way I hear his proposals for Boards
of Directors and money as the top priority.


He is welcome to that kind of control in whatever types
of project he creates, over however many volunteers his
army is made up of, but that will never be 100%, which,
truly, is as it should be.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


From Bowerbird at aol.com  Tue Oct 30 12:54:44 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 30 Oct 2007 15:54:44 EDT
Subject: [gutvol-d] =?iso-8859-1?q?!=40!_Re=3A=A0_Founder=27s_syndrome?=
Message-ID: <c14.26b7cb0f.3458e604@aol.com>

michael said:
>    Project Gutenberg has never relied on political power and
>   financial power, and that is real issue here, to create a
>   power system than can then be manipulated or taken over.

and what's _most_ interesting about this is the _meta_ level...

anyone can build an organization that rejects a power system.
a few people can even make such an organization _work_well_.

but michael has gone beyond even that.

michael says, "if you want to create your own organization that
relies on a power system, you can use my p.g. content to do it!"

wow.   that's impressive.

michael is so confident in the superiority of his methodology
that he is willing to _give_other_people_ the fruit of its labor
so they can build upon it to create a competing organization.

he'll even give that competing organization _webspace_ and
pay for their _bandwidth_.   and do it all with a happy heart...

that takes _balls_.

-bowerbird


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071030/f8627adb/attachment.htm 

From jon at noring.name  Tue Oct 30 13:03:53 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 30 Oct 2007 14:03:53 -0600
Subject: [gutvol-d] Wow! Re:  !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301203340.14197@pglaf.org>
References: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>
	<Pine.LNX.4.64.0710301203340.14197@pglaf.org>
Message-ID: <76647410.20071030140353@noring.name>

Michael wrote:

> The real issue about Jon's frequent efforts to rewrite PG
> in his own image is that Jon really hasn't presented this
> image for anyone to consider. . .something we have all in
> a number of ways given him permission and encouragements,
> again and again, to do.

> Whatever Jon wants to do, we will gladly support, as will
> we support any number of other such parties.
>
>
> [and more things about me that I didn't know about myself.]

Wow!


Well, I certainly now know what it feels like to be tarred and
feathered in a public forum! <smile/> (I've experienced similar over
the years, but this is probably the most profound.)

Anyway, I will not reciprocate in kind since that serves no good
purpose.

What is interesting is that Michael's message was in response to a
message posted by Karen Lofstrom, and which I never replied.


For those still reading this, three questions to ponder:

1) How appropriate was Michael's answer to the mission of PG?

2) Was it important for Michael, the founder of the PG movement, to
   write what he did, focusing on my character and his perception of
   my nefarious designs towards PG?

   (Yes, he made some general philosophical points, but I appeared
   to be the "example" he wanted to use.)

3) Will this group, gutvol-d, now be a better place because of that
   message?


Jon Noring


From hart at pglaf.org  Tue Oct 30 13:15:38 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 13:15:38 -0700 (PDT)
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <Pine.LNX.4.64.0710301256540.14197@pglaf.org>


On Tue, 30 Oct 2007, joshua at hutchinson.net wrote:

>> ----Original Message----
>> From: hart at pglaf.org
>>
>> as Jon has tried various suggestions to stack the Boards,
>> get control of the trademark, make money the top priority
>> and the rest of his agenda, none of which seems to be the
>> actual process of getting more eBook to more people.
>>
>
> Jon has not done any of the above things in any of the MANY notes I've
> read.
>
> You've accused him of it often enough, true, but that isn't the same
> thing.

Just go back to where he proposed duplicating Stanford's Board.

Then when he nominated Brewster Kahle.

Did he not also nominate Richard Stallman around that time?


Am I the only one who remembers?


What need has PG of a Board structure of a billion dollar a year
major world university?


If you have me listed as using the term "Groupies" then I do not
doubt that you may also have mixed up everything else.


As for Kahle and Stallman, I have probably spent more time with
them than anyone here, both in person, on the phone or email.

I think they do a fine job with their own organizations, but it
is not the same in either case as the goals of Project Gutenberg.

If you want your own Board structures, or places on them, just do
what Distributed Proofreaders did. . .you don't even have to make
yourselves a corporation, they didn't for a long time.

If you are successful, as we hope you are, you can either stay as
a part of Project Gutenberg or become and affiliate; as suggested
below, some of this is Greg's turf, I'm not up on all the legals,
as to what constitutes and "affiliate" or "parter" or other.

I have answered the trademark question before, and will again, to
your never to be had satisfaction. . .I believe in the separation
of powers. . .I don't wan't someone to be able to take over Board
of Directors positions and thence all claim to everything that is
named "Project Gutenberg."

I give everyone who asks permission to open their own sites in an
approved "Project Gutenberg" legal fashion:  again you might want
to ask Greg about all the legal details, but as soon as they will
be accomplished there is no other impediment.

The only thing I see here is more of the power to say "NO!"

Project Gutenberg is a resounding "YES!"

And I think THAT is what bothers Mr. Noring and the others whom I
think would NOT share the extremely "Open Door Policies" of Greg,
Harry, Mark, myself, and the rest of the people who just want the
eBooks to get out there to as many people as possible.

Most Boards, CEO's and Founders, say "NO!" most of the time. . .!

Project Gutenberg says "YES!" nearly all of the time. . . !

THAT is what you see me fighting so hard to preserver here!!!!!!!


Thank You!!!


Give the world eBooks in 2007!!!


Michael S. Hart
Founder
Project Gutenberg

100,000 eBooks easy to download at:
http://www.gutenberg.org [coming up on 25,000 eBooks]
http://www/gutenberg.cc [already passed 75,000 eBooks]
http://gutenberg.net.au   Project Gutenberg of Australia 1500+
http://pge.rastko.net 65 languages  PG of Europe ~500
http://gutenberg.ca  Project Gutenberg of Canada

Blog at http://hart.pglaf.org


> He has asked about the makeup of the board.  Even suggested folks he
> thought would be good folks to have on the board.  Never once has he
> suggested himself OR any of the people you've accused of being his
> "groupies" (your term from a couple messages back).  You've never
> answered how the board was constituted, but then again that may not
> really be your area.  Jon should probably aim that question to Greg.
>
> He has asked why you own the trademark and not the PGLAF foundation.
> In fact, LOTS of people have asked this question.  To be honest, I kind
> of remember Greg saying it was at the direction of a lawyer at some
> point, but I don't remember for certain.  An actual answer to this
> would be appreciated.
>
> He has said making money would be a good thing.  But he never said it
> was making money for profit, but as a way to increase the amount of
> good work PG puts out there.  Whether you agree with this (and I don't
> know that I do), it is most definitely not an evil plot, as you
> insinuate.
>
> Now, as far as personally working to get more books in more hands ...
> you probably have a point as far as his PG efforts are concerned.
> Other than the My Antonia scans, I don't know of a whole lot he has
> done.  But that doesn't stop folks from having opinions and shouldn't
> stop him from expressing them.  Provided he is polite and all ... which
> he usually is.  More so than I am, oftimes.
>
> Josh
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Tue Oct 30 13:55:25 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 13:55:25 -0700 (PDT)
Subject: [gutvol-d] Wow! Re:  !@! Re:  Founder's syndrome
In-Reply-To: <76647410.20071030140353@noring.name>
References: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>
	<Pine.LNX.4.64.0710301203340.14197@pglaf.org>
	<76647410.20071030140353@noring.name>
Message-ID: <Pine.LNX.4.64.0710301331130.14197@pglaf.org>


On Tue, 30 Oct 2007, Jon Noring wrote:

> Michael wrote:
>
>> The real issue about Jon's frequent efforts to rewrite PG
>> in his own image is that Jon really hasn't presented this
>> image for anyone to consider. . .something we have all in
>> a number of ways given him permission and encouragements,
>> again and again, to do.
>
>> Whatever Jon wants to do, we will gladly support, as will
>> we support any number of other such parties.
>>
>>
>> [and more things about me that I didn't know about myself.]
>
> Wow!
>
>
> Well, I certainly now know what it feels like to be tarred and
> feathered in a public forum! <smile/> (I've experienced similar over
> the years, but this is probably the most profound.)
>
> Anyway, I will not reciprocate in kind since that serves no good
> purpose.
>
> What is interesting is that Michael's message was in response to a
> message posted by Karen Lofstrom, and which I never replied.
>
>
> For those still reading this, three questions to ponder:
>
> 1) How appropriate was Michael's answer to the mission of PG?
>
> 2) Was it important for Michael, the founder of the PG movement, to
>   write what he did, focusing on my character and his perception of
>   my nefarious designs towards PG?
>
>   (Yes, he made some general philosophical points, but I appeared
>   to be the "example" he wanted to use.)
>
> 3) Will this group, gutvol-d, now be a better place because of that
>   message?
>
>
> Jon Noring


Jon brings up at least one point here we should really consider,
how will this conversation be recorded in history, will Project
Gutenberg "now be a better place because of that message?"

As for his other comments, Jon has insisted on being the focal-
point of this conversation, year after year.

Should I just ignore him if he tries again next year?

Should I have ignored all this this year?

Now my reply:

The reason I take so much time answering Jon's yearly efforts,
such as they are, is to make sure people know I answer all the
email I receive, other than the most obvious of trolls, spams,
and the like, and to find out just how seriously persons might
be taking these sorts of things.

I answered again for what I considered obvious reasons, even a
day after hoping the conversation was over for another year.

That reason being that someone seemed sincerely concerned.

However, the question still comes down do what would Jon do in
the future with the power he wants that he can't do now?

Personally, I can't think of anything, other than saying he is
the whatever position he would like of Project Gutenberg as an
entire entity, as opposed to his own team of volunteers.

The real question to be considered in response to this message
is "what will all this look like years from now?"

I can only presume that it will continue to look as if Noring,
over and over and over again, has tried to remake a Gutenberg,
in his own image of what he thinks it should be like, but with
political power to establish no policy.

Let us not forget that policy is the root of politics, and the
resulting term, political power.

Jon has no policy.

This is why my answers always include questions as to what Jon
would like to DO. . .what ACTION he would like to take. . . .

It seems all to obvious that Jon wants the power for itself on
no other basis or he would have put forth any number of plans,
projects, proposals, etc., over the years that would have made
him the position he seems to desire so longingly.

Mr. Noring is welcome, once again, to all the supports Project
Gutenberg has offered to everyone else who asks, and to name a
title to his own liking within whatever projects he does.

He is more than free to do what he wants to do.

Does anyone know what that is?

Other than change Project Gutenberg as a whole to whatever?

Why does Mr. Noring feel this great need to Project Gutenberg,
as a whole, when he has not even staked out his own portion of
it as anyone and everyone is encouraged to do?

Is it really not that completely obvious?


This is why I answer. . .for the future. . . .


So when anyone looks back on what has happened before, they're
going to hopefully find they are not alone in the future.

The question is:


Will it be the future Mr./Ms. Noring's who look?

Or will it be those s/he wishes to manipulate?


Thank You For Your Time And Attention,

and

hopefully,

for your support. . .in the years, decades, centuries to come.


Michael S. Hart
Founder
Project Gutenberg


From jon at noring.name  Tue Oct 30 14:03:46 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 30 Oct 2007 15:03:46 -0600
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301256540.14197@pglaf.org>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
Message-ID: <1952325838.20071030150346@noring.name>

Michael Hart wrote:

> Just go back to where he proposed duplicating Stanford's Board.
>
> Then when he nominated Brewster Kahle.
>
> Did he not also nominate Richard Stallman around that time?

So what is wrong with having a Board include a number of notables in
the PD text digitization arena, and related areas such as open source
software, who themselves are making things happen?

Brewster would be a *great* addition to the PGLAF Board, as would one
or two from DP. And Richard Stallman -- maybe (I don't have any feelings
one way or the other for him, but suggested him as an example.)

And remember, my proposals were part of an early 2004 call at the
time, which followed the one and only PG "face-to-face" meeting at
Brewster's place, for ideas on strengthening the PG organization. And I
remember that part of the call was to strengthen the organization so
as to increase donations from various sources, as well as other
potential sources of revenue.

PG may be austere and live on very little, but it still needs the good
graces of money or its equivalent to keep going even at its current
level of organization.

So in good faith I made suggestions. I guess no good deed goes
unpunished.


> Am I the only one who remembers?

No, I remembered, too. <smile/>

In fact, I remember a lot of things said at the 2003 PG "face-to-face"
get together....


> What need has PG of a Board structure of a billion dollar a year
> major world university?

That's not the point. The point is good governance, and that includes
people on the Board who are active in the field and with a proven
track record in making things happen. It has several benefits.

E.g., this helps build bridges to other organizations. The one thing I
know everyone notices is how "isolated" PG is. Now some might say that
is a good thing, while others say it is not a good thing. There are
good reasons on both sides. I tend to fall on the side that PG is best
served by coming out of its isolation *some*. I reject the notion that
doing so will somehow taint the organization and lead to losing its
"mojo".

*****

Anyway, I think the point is that it is a stretch to claim I was
trying to "take over" PG by my simple suggestions. Anyone with a
rational mind would know I had and still have no power base to effect
anything like that. Thus it is a ridiculous suggestion.

In fact, I'm somewhat flattered that anyone would even think I have
that level of power that scares them so. Geez, if I had that level of
power, maybe I *should* use it?

One good thing to come out of this is that at least, in the noise, the
makeup of the PGLAF Board is being discussed, that the philosophy of
the organization is being discussed, etc. This is healthy no matter
what results.

Jon Noring


From grythumn at gmail.com  Tue Oct 30 14:06:59 2007
From: grythumn at gmail.com (Robert Cicconetti)
Date: Tue, 30 Oct 2007 17:06:59 -0400
Subject: [gutvol-d] !@! Re: Founder's syndrome
In-Reply-To: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <15cfa2a50710301406t31e57f13t2aa62c2636aaf3da@mail.gmail.com>

On 10/30/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
> suggested himself OR any of the people you've accused of being his
> "groupies" (your term from a couple messages back).  You've never

Josh, I think you'll find that was Jon Richfield that first mentioned
the word "groupies"... unless you are referring to a message that I
have not received.


I don't want to get entangled in the rest of it, but I would like to
hear the reason why the PG trademark is set up the way it is.

I also think there is a difference between a) mandating something, b)
recommending something but accepting virtually anything, and c)
accepting virtually everything and recommending very little. There
seems to be little discussion of the middle path.

R C
(Who is not going to read gutvol-d for a few days until he finishes up
some Rule 6 research.)

From joshua at hutchinson.net  Tue Oct 30 14:14:48 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Tue, 30 Oct 2007 21:14:48 +0000 (UTC)
Subject: [gutvol-d] !@! Re: Founder's syndrome
Message-ID: <22297019.1193778888173.JavaMail.?@fh1038.dia.cp.net>

>----Original Message----
>From: grythumn at gmail.com
>
>On 10/30/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
>> suggested himself OR any of the people you've accused of being his
>> "groupies" (your term from a couple messages back).  You've never
>
>Josh, I think you'll find that was Jon Richfield that first mentioned
>the word "groupies"... unless you are referring to a message that I
>have not received.
>

I apologize.  I should have double-checked that attribution.

Josh

From hart at pglaf.org  Tue Oct 30 14:37:41 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 14:37:41 -0700 (PDT)
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <1952325838.20071030150346@noring.name>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
	<1952325838.20071030150346@noring.name>
Message-ID: <Pine.LNX.4.64.0710301409140.14197@pglaf.org>


On Tue, 30 Oct 2007, Jon Noring wrote:

> Michael Hart wrote:
>
>> Just go back to where he proposed duplicating Stanford's Board.
>>
>> Then when he nominated Brewster Kahle.
>>
>> Did he not also nominate Richard Stallman around that time?
>
> So what is wrong with having a Board include a number of notables in
> the PD text digitization arena, and related areas such as open source
> software, who themselves are making things happen?
>
> Brewster would be a *great* addition to the PGLAF Board, as would one
> or two from DP. And Richard Stallman -- maybe (I don't have any feelings
> one way or the other for him, but suggested him as an example.)

What exactly would we be gaining from Messrs Kahle and Stallman
that we can't get from them simply by asking?


> And remember, my proposals were part of an early 2004 call at 
> the time, which followed the one and only PG "face-to-face" 
> meeting at Brewster's place, for ideas on strengthening the PG 
> organization. And I remember that part of the call was to 
> strengthen the organization so as to increase donations from 
> various sources, as well as other potential sources of revenue.

The First Fule of Journalism:  "Follow The Money"

So. . it still comes down to the fact that Mr. Noring wants PG's
top priority to be money.

However, Mr. Noring, very obviously, still refuses to outline an
array of projects he would enact if he were able to get APPROVAL
from Project Gutenberg in some way he can't already get.

WHAT does Mr. Noring want?

He won't say. . . .

Other than what appears to be simple political/financial power.


> PG may be austere and live on very little, but it still needs 
> the good graces of money or its equivalent to keep going even 
> at its current level of organization.

Not in any real sense.

We've never had a project we couldn't pay for.

We've never had a bill received we couldn't pay for.

And there is no reason Mr. Noring couldn't go out fundraising--
in the name of Project Gutenberg--for whatever projects.

> So in good faith I made suggestions. I guess no good deed goes
> unpunished.

That's just the problem. . .WHAT are Mr. Noring's suggestions???

What are the "good deeds" Mr. Noring wants to accomplish?


>> Am I the only one who remembers?
>
> No, I remembered, too. <smile/>
>
> In fact, I remember a lot of things said at the 2003 PG "face-to-face"
> get together....
>
>
>> What need has PG of a Board structure of a billion dollar a year
>> major world university?
>
> That's not the point. The point is good governance, and that includes
> people on the Board who are active in the field and with a proven
> track record in making things happen. It has several benefits.

Mr. Noring never says just what he would do with "good governance."


> E.g., this helps build bridges to other organizations. The one 
> thing I know everyone notices is how "isolated" PG is.

Everyone notices how "isolated" PG is???

Then how does Mr. Noring explain the hundreds of eLibraries eBook
donations come to us from, or the thousands of people from world-
wide geographic locations, inside and outside academia, etc???

What kind of "isolation" is this???


> Now some might say that is a good thing, while others say it is 
> not a good thing. There are good reasons on both sides. I tend 
> to fall on the side that PG is best served by coming out of its 
> isolation *some*. I reject the notion that doing so will 
> somehow taint the organization and lead to losing its "mojo".

Project Gutenberg allows everyone to use our eBook collection.

Perhaps Mr. Noring's response to this is on the same order as the
reaction to the fact that everyone in Project Gutenberg gets YES!
as a response to nearly every project they want to try.

Still no answer to what Mr. Noring's version of the Board may get
on the agenda that would be be there under the current board....


It still comes down to the fact that Mr. Noring as no ACTION in a
long, long, long series of TALKING. . .no proposal of ACTION.

The reason is quite literally that any ACTION he would propose is
likely to get an IMMEDIATE "YES!"

And then he would have nothing to complain about.

As it is, the only think he can complain about not getting is the
political and financial power that is not Project Gutenberg.

If you want that kind of poliitical/financial power, it will have
to be found elsewhere than in Project Gutenberg.


>
> *****
>
> Anyway, I think the point is that it is a stretch to claim I 
> was trying to "take over" PG by my simple suggestions. Anyone 
> with a rational mind would know I had and still have no power 
> base to effect anything like that. Thus it is a ridiculous 
> suggestion.

Still only the requests for money any political power.

Certainly "a ridiculous suggestion" for Project Gutenberg.


> In fact, I'm somewhat flattered that anyone would even think I 
> have that level of power that scares them so. Geez, if I had 
> that level of power, maybe I *should* use it?

You have managed to get more attention focused on you than any
other person has ever had on this list, not enough for you?


> One good thing to come out of this is that at least, in the 
> noise, the makeup of the PGLAF Board is being discussed, that 
> the philosophy of the organization is being discussed, etc. 
> This is healthy no matter what results.

Except that this conversation has been all about TALK. . . .

It would be nice if you put the same attention to bring eBooks
to a greater and greater portion of the world.


>
> Jon Noring


Michael Hart

> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From hart at pglaf.org  Tue Oct 30 14:40:41 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 14:40:41 -0700 (PDT)
Subject: [gutvol-d] !@! Re: Founder's syndrome
In-Reply-To: <15cfa2a50710301406t31e57f13t2aa62c2636aaf3da@mail.gmail.com>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<15cfa2a50710301406t31e57f13t2aa62c2636aaf3da@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0710301440050.18320@pglaf.org>


On Tue, 30 Oct 2007, Robert Cicconetti wrote:

> On 10/30/07, joshua at hutchinson.net <joshua at hutchinson.net> wrote:
>> suggested himself OR any of the people you've accused of being his
>> "groupies" (your term from a couple messages back).  You've never
>
> Josh, I think you'll find that was Jon Richfield that first mentioned
> the word "groupies"... unless you are referring to a message that I
> have not received.
>
>
> I don't want to get entangled in the rest of it, but I would like to
> hear the reason why the PG trademark is set up the way it is.
>
> I also think there is a difference between a) mandating something, b)
> recommending something but accepting virtually anything, and c)
> accepting virtually everything and recommending very little. There
> seems to be little discussion of the middle path.

The simple reason is the saying "YES!" gets you more than "NO!"

michael

>
> R C
> (Who is not going to read gutvol-d for a few days until he finishes up
> some Rule 6 research.)
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From jon at noring.name  Tue Oct 30 14:53:29 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 30 Oct 2007 15:53:29 -0600
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301409140.14197@pglaf.org>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
	<1952325838.20071030150346@noring.name>
	<Pine.LNX.4.64.0710301409140.14197@pglaf.org>
Message-ID: <892078063.20071030155329@noring.name>

Michael wrote:

> WHAT does Mr. Noring want?
>
> He won't say. . . .
>
> Other than what appears to be simple political/financial power.

Michael, I've had it with your flights of fantasies and delusions.

I do not plan to continue any conversation with you under the current
circumstances of the irrational hostility you are showing me. If you
want to believe you've won this "debate", go right ahead. It's not a
debate, it's an irrational spewing of delusions, and my dad told me a
long time ago there's no use arguing with a crazy person.

I actually feel sorry for you.

There's enough messages out there that the few others who are even
continuing to follow this exchange (and I don't blame them if they
gave up a long time ago) will be able to form their own opinions.

Jon Noring


From hart at pglaf.org  Tue Oct 30 15:00:56 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 15:00:56 -0700 (PDT)
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <892078063.20071030155329@noring.name>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
	<1952325838.20071030150346@noring.name>
	<Pine.LNX.4.64.0710301409140.14197@pglaf.org>
	<892078063.20071030155329@noring.name>
Message-ID: <Pine.LNX.4.64.0710301455440.18320@pglaf.org>


I must say I agree wholeheartedly with Jon on this:

Jon and I had agreed to let this all go yesterday,
and I can only wish everyone had felt the same.

I can only offer Jon my most sincere apologies for
what has transpired since. . . .

I honestly wish it hadn't happened.

I feel Jon did get more support, but obviously not
with the results he and I both intended yesterday.

I feel I should perhaps have asked his permission,
and perhaps everyone's, before answering the other
messages that continued things.

I feel even MORE strongly that I should have asked
all concerned if they would mind if I NOT answer--
out of respect for Jon's and my decision. . . .

Hopefully I will do better next time. . .presuming
there IS a next time. . . .

Michael


On Tue, 30 Oct 2007, Jon Noring wrote:

> Michael wrote:
>
>> WHAT does Mr. Noring want?
>>
>> He won't say. . . .
>>
>> Other than what appears to be simple political/financial power.
>
> Michael, I've had it with your flights of fantasies and delusions.
>
> I do not plan to continue any conversation with you under the current
> circumstances of the irrational hostility you are showing me. If you
> want to believe you've won this "debate", go right ahead. It's not a
> debate, it's an irrational spewing of delusions, and my dad told me a
> long time ago there's no use arguing with a crazy person.
>
> I actually feel sorry for you.
>
> There's enough messages out there that the few others who are even
> continuing to follow this exchange (and I don't blame them if they
> gave up a long time ago) will be able to form their own opinions.
>
> Jon Noring
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From Bowerbird at aol.com  Tue Oct 30 15:03:50 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 30 Oct 2007 18:03:50 EDT
Subject: [gutvol-d] "founder's syndrome" as it relates to project gutenberg
Message-ID: <cb3.1ee13732.34590446@aol.com>

here's the webpage on "founder's syndrome" that was "recommended":
>    http://www.help4nonprofits.com/NP_Bd_FoundersSyndrome_Art.htm

so let's see how it relates to project gutenberg...

it says:
>    Founder?s Syndrome occurs when a single individual 
>    or a small group of individuals bring an organization 
>    through tough times (a start-up, a growth spurt, 
>    a financial collapse, etc.). Often these sorts of situations
>    require a strong passionate personality - someone who 
>    can make fast decisions and motivate people to action.

well, it _kinda_ sounds like project gutenberg.
except for that "fast decisions" part, that is...
for its first 20 years, p.g. moved _very_ slowly.

and i'm sure that, back in those early days, when
a person worked on a book, it went slowly, and
michael wasn't there "motivating them to action".


>    Once those rough times are over, however, 
>    the decision-making needs of the organization change, 
>    requiring mechanisms for shared responsibility and authority.

it's not at all clear to me that "the decision-making needs" of
project gutenberg have changed...   it's "same as it ever was"...


>    It is when those decision-making mechanisms don?t change,
>    regardless of growth and changes on the program side,
>    that Founder?s Syndrome becomes an issue. 

there have been no "growth and changes on the program side".

if the staff of this non-profit organization had grown immensely,
or the modus operandi was significantly different than it had been,
then there might be some need to change.   but that hasn't happened.

a lot more books are coming in, yeah, but they're still treated the same.


>    We see this most frequently with organizations that have grown from 
>    a mom-and-pop operation to a $12 million community powerhouse, 
>    while decisions are still made as if the founders are gathered around 
>    someone?s living room, desperately trying to hold things together.

so, now we see were this guy is coming from.   if project gutenberg had
grown to a $12-million organization, maybe michael's laissez-faire style
would no longer be the best one.   or if michael was clinging to a style of
"desperately holding things together" when that was not appropriate,
then there might be some validity to a call for change.   but not only is
michael not now "desperately holding things together", he never was...
there's very little money in the budget now, and there never has been,
but i've never heard project gutenberg described as being "desperate".

michael consciously and _intentionally_ created a thing which is able to
survive and even thrive _without_much_cash_, a remarkable achievement.

project gutenberg is a _weed_, one that cannot be killed, no matter what.

and this is _precisely_ the magic of project gutenberg.   (and wikipedia 
too.)
to create a no-budget organization that goes on to produce something of
_immense_value_ is an extraordinary accomplishment.   it's truly _amazing_.

instead of criticizing michael's style, people should be _emulating_ it,
and trying (and failing) to find the words to express its _brilliance_...

you don't do a weed a favor trying to transform it into a hot-house flower.


>    the main symptom of Founder?s Syndrome is that 
>    decisions are not made collectively. Most decisions 
>    are simply made by the "founder." All other parties 
>    merely rubber stamp what the founder suggests. 

now, do you see why this whole topic is just _silly_ in regard to p.g.?

michael doesn't "make all the decisions".   he makes _no_ decisions!

there isn't any "rubber-stamping" happening at project gutenberg...
the _only_ rubber-stamp at project gutenberg is one that says "yes!"


>    There is generally strong resistance to any change in that 
decision-making, 
>    where the Founder might lose his/her total control of the organization. 

ha!   like michael has "total control of the organization".   what a laugh!
there's no organization to control; it's everybody do whatever you want.

want to make your own library?   fine!   take our books! free of charge!
if you don't want to go through the hassle of downloading all of them,
we'll even send you a d.v.d. with them.   we'll even pay for the postage!

and if the organization you create -- with the board that you want and
the decision-making structure you think is best -- out-performs p.g.,
then _more_power_to_you_.   p.g. will be happy to have helped you out.
that's an amazing philosophy.   one that is _confident_ in its strength...


>    Boards of these organizations usually don't govern, 
>    but instead "approve" what the founder suggests. 
>    Planning isn't done collectively, but by the founder. 
>    And plans / ideas that do NOT come from the founder
>    usually don't go very far.

this whole "founder's syndrome" thing is about autocratic leaders and their
refusal to give up power.   michael never took any power in the first 
place...

so this notion is simply _not_applicable_.   but let me address one last 
point...


>    Some may ask, ?So what?s wrong with that?? And the answer is simple: 
>    If the ?founder? is hit by a bus tomorrow, the organization is not 
sustainable, 
>    and all the good work the organization has done over the years is 
>    in danger of screeching to a halt. That?s because organizations facing 
>    Founder?s Syndrome usually have little infrastructure in place, because 
it 
>    simply hasn?t been needed. In these situations, the founder IS the 
infrastructure!

let me tell you something that i don't need to tell you.   when michael dies
-- let's hope it's due to old age after he's just celebrated birthday #108 --
project gutenberg _will_ live on...   michael's organization _is_ 
sustainable...

and _none_ of the good work it has done over the years is in _any_ danger of
"screeching to a halt".   the infrastructure of project gutenberg _is_ "in 
place"...

indeed, like the good weed that it is, it has reached out to millions of 
cracks in
the internet sidewalk, and we couldn't get rid of it no matter how hard we 
tried.
even if we wanted to.   and _most_ of us here don't even want to.   end of 
post...

-bowerbird

p.s.   well... i was _going_ to end the post right there.   but you know 
what?
there _was_ some stuff on that webpage that might, ironically, be applicable.
the advice was to prepare your organization for the time when you pass away.
michael, i think you need to take a good hard look at the community right 
here.
among them, you will see some -- not many, it's true, but it's the _loud_ 
ones --
who will try to sabotage the organization that _you_ built.   they want a 
_different_
type of organization, the kind you _intentionally_ set out to set yourself 
apart from.
you need to think about putting into place some protection against these 
saboteurs.
maybe, after considering it, you'll decide your organization is sufficiently 
resistant to
repulse their takeover attempts.   and if that's your decision, i will bow to 
your wisdom.
i believe in the strength of your organization, because i've seen the power 
of the weed.
but please, please, please do consider what i have said here.   your 
contribution has been
far too valuable to the future of the world-at-large to let small-minded 
people destroy it.


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071030/a2acbd3a/attachment-0001.htm 

From jon at noring.name  Tue Oct 30 15:10:40 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 30 Oct 2007 16:10:40 -0600
Subject: [gutvol-d] Re[:  !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301455440.18320@pglaf.org>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
	<1952325838.20071030150346@noring.name>
	<Pine.LNX.4.64.0710301409140.14197@pglaf.org>
	<892078063.20071030155329@noring.name>
	<Pine.LNX.4.64.0710301455440.18320@pglaf.org>
Message-ID: <27986341.20071030161040@noring.name>

Michael,

I appreciate your apology, and accept it.

And likewise I offer an apology for things that I said which were
hurtful towards you.

After all, you and I are only human. And it takes a big person to
offer the first apology.

We certainly disagree on a whole lot of things, and we are driven both
by passion in what we believe is right. That's where we agree. We also
agree in that we need to digitize the public domain and get it out
there for preservation and for free use by the public who is the owner
of the public domain. So it is good to focus on where we agree.

Again, thanks, Michael. And I will do my best to always reply to your
messages in a cordial and respectful manner, even when I disagree with
you (which I guess is somewhat often <laugh/>).

Jon


From hart at pglaf.org  Tue Oct 30 15:53:10 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 30 Oct 2007 15:53:10 -0700 (PDT)
Subject: [gutvol-d] Re[:  !@! Re:  Founder's syndrome
In-Reply-To: <27986341.20071030161040@noring.name>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
	<1952325838.20071030150346@noring.name>
	<Pine.LNX.4.64.0710301409140.14197@pglaf.org>
	<892078063.20071030155329@noring.name>
	<Pine.LNX.4.64.0710301455440.18320@pglaf.org>
	<27986341.20071030161040@noring.name>
Message-ID: <Pine.LNX.4.64.0710301537090.18320@pglaf.org>


On Tue, 30 Oct 2007, Jon Noring wrote:

> Michael,
>
> I appreciate your apology, and accept it.
>
> And likewise I offer an apology for things that I said which 
> were hurtful towards you.

Thank you!


> After all, you and I are only human. And it takes a big person 
> to offer the first apology.

I never wanted to "win" in a manner that was hurtful to you,
just to keep an open balance for all concerned.  I feel your
desires are to upset that balance, hence I defend it.

No one you accuse of wanting power wants it.

We don't want anyone to have that kind of power at PG.

Everyone should be equally empowered.

Thus there is no need for "Board Approval" on anything, only
the rarest requests are even brought to the board beforehand
as we just presume everything but the oddest requests are to
be approved, and everyone already knows that.

In the cases where things ARE brought to the Board I am very
pleased to report that they are even more open-minded than I
would have had any expectation of. . . .

We would like to keep it this way. . . .


> We certainly disagree on a whole lot of things, and we are 
> driven both by passion in what we believe is right. That's 
> where we agree. We also agree in that we need to digitize the 
> public domain and get it out there for preservation and for 
> free use by the public who is the owner of the public domain. 
> So it is good to focus on where we agree.

As I said earlier today, and I hope you can/did see that part,
it is my greatest hope for you that you come to manage project
armies that are even beyond your own expectations, just not to
the exclusion of other such projects, if you understand. . . .


> Again, thanks, Michael. And I will do my best to always reply 
> to your messages in a cordial and respectful manner, even when 
> I disagree with you (which I guess is somewhat often <laugh/>).

I think it is obvious to all concerned that we disagree on any
number of things, I fear you, and they, may not understand the
desire I have NOT to have the kind of power aforementioned and
for NO ONE to have that kind of power over Project Gutenberg.

I think very highly of Brewster's work, but he also has a kind
of power over his projects I wouldn't want here, but he has an
entirely different protocol and system, and he pays for it, in
more ways than one, that I could never do.

Richard Stallman and I perhaps share equal credit for starting
"The Open Source Movement," but, again, he has a kind of power
I would never want to have, nor would I pay his price, either.

I am what I am, and worst of all, for some, I like who I am.

I hope who I am, and what I have done, and hope to do, bring a
whole lifetime of opportunities your way, just not the the kind
of opportunities I mentioned earlier, but would not now.

I hope we can achieve some kind of balance, rather than balance
of having this contesing for control every year.


> Jon


Michael

From jon at noring.name  Tue Oct 30 16:18:55 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 30 Oct 2007 17:18:55 -0600
Subject: [gutvol-d] Re[:  !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301537090.18320@pglaf.org>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
	<1952325838.20071030150346@noring.name>
	<Pine.LNX.4.64.0710301409140.14197@pglaf.org>
	<892078063.20071030155329@noring.name>
	<Pine.LNX.4.64.0710301455440.18320@pglaf.org>
	<27986341.20071030161040@noring.name>
	<Pine.LNX.4.64.0710301537090.18320@pglaf.org>
Message-ID: <1646571185.20071030171855@noring.name>

Michael wrote:

> [a lot of good things, even if somethings I disagree with.]
>
> I hope we can achieve some kind of balance, rather than balance
> of having this contesing for control every year.


I appreciate your reply here, and yes, we will always disagree on a
number of things, but then are there any two people who ever think
exactly alike (other than maybe identical twins)?

The key is for both of us to disagree the right away, and I am guilty
of sometimes not doing it the right way. If we do, then everyone
benefits by how the differing views seed the "idea commons".

Hopefully when we meet again we won't need to punch each other out,
<smile/> (and you are a little bigger than me), but rather sit down
with beers or sodas in hand and verbally argue with each other with
smiles on our faces.

Jon


From piggy at netronome.com  Wed Oct 31 05:32:48 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Wed, 31 Oct 2007 08:32:48 -0400
Subject: [gutvol-d] Wow! Re:  !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301331130.14197@pglaf.org>
References: <1e8e65080710291603y5bbe76a4s72b2bc516dfea73d@mail.gmail.com>	<Pine.LNX.4.64.0710301203340.14197@pglaf.org>	<76647410.20071030140353@noring.name>
	<Pine.LNX.4.64.0710301331130.14197@pglaf.org>
Message-ID: <472875F0.4040606@netronome.com>

Michael Hart wrote:
> Now my reply:
>
> The reason I take so much time answering Jon's yearly efforts,
> such as they are, is to make sure people know I answer all the
> email I receive, other than the most obvious of trolls, spams,
> and the like, and to find out just how seriously persons might
> be taking these sorts of things.
>   

Please forgive me for addressing a Noring topic. There is one point he 
brought up which I for one take seriously.

I have some long-term concerns about the Project Gutenberg trademark. I 
am quite content with how Michael is managing the trademark and have 
every expectation that his sensible policies will continue throughout 
his lifetime.

May I suggest that Michael designate PGLAF as the heir of the trademark?

I can tell you at close second hand that dealing with heirs and estates 
over intellectual property rights is at least an order of magnitude more 
difficult than dealing with the original creators. Heirs tend to look at 
most intellectual property as financial assets rather than ideas which 
need to be disseminated.

We have a similar problem in the Linux community over Linus' ownership 
of the Linux trademark. We have an obvious heir for the trademark, the 
Linux Foundation. To the best of my knowledge, Linus has made no public 
statement about arrangements to safeguard the trademark for future 
generations.

Obviously there are many other issues for both Linux and PG to address 
about surviving their founders. There is no real rush in either 
case--both Linus and Michael are in good health and fully in command of 
their faculties. Each has an excellent set of capable lieutenants.

At some point in the next decade or two, it would be reassuring to hear 
that Michael has made long-term arrangements for the stability of the 
trademark.


From piggy at netronome.com  Wed Oct 31 05:42:47 2007
From: piggy at netronome.com (La Monte H.P. Yarroll)
Date: Wed, 31 Oct 2007 08:42:47 -0400
Subject: [gutvol-d] !@! Re:  Founder's syndrome
In-Reply-To: <Pine.LNX.4.64.0710301256540.14197@pglaf.org>
References: <24187201.1193773828225.JavaMail.?@fh1038.dia.cp.net>
	<Pine.LNX.4.64.0710301256540.14197@pglaf.org>
Message-ID: <47287847.1030509@netronome.com>

Michael Hart wrote:
> I have answered the trademark question before, and will again, to
> your never to be had satisfaction. . .I believe in the separation
> of powers. . .I don't wan't someone to be able to take over Board
> of Directors positions and thence all claim to everything that is
> named "Project Gutenberg."
>   
Thanks. This makes good sense.

I amend my earlier suggestion to designate PGLAF as the heir to the PG 
trademark to encourage you to designate a specific heir sometime in the 
next decade or two.


From Bowerbird at aol.com  Wed Oct 31 09:55:08 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 31 Oct 2007 12:55:08 EDT
Subject: [gutvol-d] harmless monsters, and the dogs that have stopped barking
Message-ID: <d6a.10b59719.345a0d6c@aol.com>

what a delightful post from jon richfield!            :+)

i believe mr. richfield has been here for a while, but i'm not sure how 
long...

and there might be other people here who have been subscribed for less than
a period of many years, for whom i can provide some background information.

because one of the most interesting aspects of this listserve _these_days_ is
the dogs who are not barking.   in other words, posts you are _not_ seeing...

(i assume you're familiar with the sherlock holmes story where he solves the
mystery by noting that a dog did _not_ bark, when he usually _would_have_,
thus indicating the murderer was someone with whom the dog was familiar.)

and there is a lot of "non-barking" going on here these days...   a ton of 
it...

to those who know what the noise level _could_ be, what it actually has been,
this silence is deafening.

the dogs have stopped barking...

***

so here's the backstory...

it's kinda long, if you prefer to save it for when you have some time...

if you want to cut right to the chase, and read the rest later (or never),
skip toward the end, where you'll find a header saying "the message"...

***

i subscribed almost exactly 4 years ago.   i believe the date was 2003/11/11.
that was what i intended anyway, but i mighta jumped the gun by a few days.

i read all the archives -- even going back to previous web-based forums --
before posting, so i was thoroughly immersed in the issues being discussed.

but the one issue that i was most interested in was "the planned move to 
.tei",

this transition had been "in the works" for several years even at that time, 
but
nothing had actually happened.   that didn't surprise me, because .tei is 
_hard_.
way too difficult for the ordinary people that were doing volunteering for 
p.g.

the reason i appeared at this time was because i'd been tracking p.g. 
progress
for well over 20 years by then.   to tell you the truth, i was skeptical back 
then...
oh sure, i loved the idea.   heck, it was _my_ idea.   i'd been speaking 
incessantly
since 1980 myself about "putting all the books in the world on the computer".

"and then all the pictures," i would add.   "and all the music, and all the 
movies."
i figured it would go in that order because of the bandwidth required by 
each.
obviously, with college dorms wired with fiber, i was wrong.   music led the 
way.

so i knew about project gutenberg way back when it was almost entirely vapor.
and i called it "almost entirely vapor" _then_, as that was the truth of the 
matter.

still, there was something i admired about michael hart.   not only did he 
have
a very good idea -- i.e., my idea -- but also a punk d.i.y. attitude that i 
loved.
(d.i.y. stands for "do it yourself", for all you young'uns out there.)

i was a graduate student at the time, spending way too much time in 
libraries,
surrounded by books, so i envisioned some big effort to digitize those books.

(the university had also just debuted a newfangled thing called "text 
editing"
on the mainframe, which i loved, as i was one of those people who write once
and then rewrite a thousand times.   but all my scribbling and lines and 
arrows
and inserts and crossouts and stuff drove me crazy.   the computer gave me
an absolutely clean copy any time i wanted it, which i found quite heavenly.
i harangued my fellow graduate students constantly about word-processing,
and went out immediately and bought an osborne when it became available,
so that i could do my writing at home.   and the rest, as they say, is histor
y...
the point is, i knew that very soon, _all_ documents would be digital in 
form.
so the only thing we needed to do was to digitize the paper-book "backlog".)

anyway, back to project gutenberg.   i loved michael's spirit, and the fact 
that
he was willing to _type_in_ a darn book, if that's the only way to get it 
online.

the other thing that i loved was that he was very good about getting press...
oh sure, i was jealous that _he_ was getting all the credit for _my_ idea, 
but
since _i_ wasn't putting the idea out into the big world, i was glad _he_ 
was...
and he was doing a good job.   i especially remembered an article in "wired".

but the fact of the matter is, for some 20 years, he had almost nothing done.
a few high-profile _documents_.   the bible.   shakespeare.   that was about 
it.
anyone who was _determined_ could have caught up to his total quite easily.

even _years_ after the emergence of the web, which i saw as the _beginning_
of the time when "all of the books in the world would be on the computer" --
except starting with new born-digital documents rather than old paper books,
which (in my opinion) was better than the public-domain of project gutenberg
-- the p.g. "library" was almost laughably small.   again, anyone with 
_stamina_
and a good scanner could have matched and surpassed michael's total easily...

but then, michael's "doubling cube" started to give us some impressive 
output.
i mean, when you're doubling your output from 100 to 200 books, no big deal.
even going from 1000 to 2000 isn't all that overwhelming.   but when you go
from 2000 to 4000, and then double to 8000, all of a sudden it _is_ a big 
deal.

because now you're starting to talk about a rather large number of books...

so, along about 2002, i started planning to engage project gutenberg soon.
i'm remembering it was around 8000 books then, so it was increasingly clear
michael had brought his project to critical-mass.   so he deserved the 
credit.

i'd been writing viewer-programs all along, so i decided to make one for 
p.g.,
as a present for michael hart.   for the first time in my work, i decided to 
use a
2-up facing-pages interface.   for more than a decade, it was my orientation
that "an electronic-book doesn't have to look like a physical-book", a mantra
that was shared by virtually all of the other e-book programmers of that 
time.

ordinary people, however, pushed back against e-book viewer-applications,
and one of the things they said was that they liked the "look" of a 
paper-book.

so, more or less as a lark, i said, "well heck, why don't i just give 'em 
that look?"

still, i was of the opinion that, once people "got used to" an 
electronic-book,
they would give up this silliness about wanting it to "look like" a "real" 
book...

boy, was i ever wrong.

what _really_ happened was that _i_ got convinced that the 2-up facing-pages
display has many natural advantages, and that it was indeed the best 
interface.

and, no, _not_ because it "looks" like a "real" book.   although that doesn't 
hurt.
(and i want to stress that it _doesn't_ hurt for an e-book to look like a 
p-book;
indeed, i'm beginning to think it's actually a good thing.   but save that 
for later.)

it's good because the right-side page can "drop away" and be a _workspace_.
this _workspace_ might hold a table of contents, for example, or a list of 
hits
from a user's search operation, or a full-size version of a picture in the 
book,
or any number of other things.   but what's important is the _left-side_ page
is undisturbed by all this.   so you always maintain your "contact" with the 
text.

compare this with adobe's acrobat viewer.   when you summon up "bookmarks",
the page of text from the book is resized, which makes you "lose contact" 
with it.
and when you dismiss the "bookmarks", the text-page is resized once again, so
you "lose contact" again.   all this mental effort required to re-establish 
"contact"
is unnecessarily draining, and it detracts from the overall reading 
experience...

and the clincher was that monitors, which had been getting bigger all along,
were now wide enough that the most _readable_ line-length only used up
_half_ of the monitor, meaning there was finally _room_ for a 2-up display.

so i was quite happy with this new viewer-program i had made.

i had also developed my own methodology for compressing an e-text
-- the 4-meg bible i was using in my demo compressed to 1.2 megs --
so i figured i could make good use of the p.g. library as a demo corpus.
by this time, with 9000+ e-texts in it, it was far away from vapor-land...

when i happened to be in chicago in august of 2003, i decided to visit 
michael.
by then, i'd been talking with him on various e-book listserves for 8 years, 
but
i'd never met him.   it was a pleasure to see him, buy him dinner, and have 
him
talk my ear off about shakespeare.   i told him about my viewer-program and 
my
compression method, and asked him if i could use his library as a demo 
corpus.
he said yes, of course, and offered disk-space.   he said he could not 
guarantee
me any volunteers, that i'd have to get those myself, but he'd help with that 
too...
i told him i intended to do it all myself, and i wouldn't know how to use any 
help.
i was on my way...

***

                                    the message...

***

in the fall of 2003, as p.g. zeroed in on 10,000 e-texts, a victory 
celebration was
scheduled for december in san francisco, so i was able to attend.   as 
precursor to 
all of that, i decided to get on the project gutenberg listserves for the 
first time...

my work with the plain-ascii e-texts from project gutenberg had convinced me
that -- with just a little bit of work -- they could be modified to make 
e-books
that (a) have high-powered functionality, and (b) are typographically 
beautiful.

as i did more and more of this work, i became more and more impressed with
the power of plain-text.   i'd never been fond of heavy-markup, but didn't 
have
any _allergies_ to it either.   but as i came to realize it is completely 
unnecessary,
my aversion to it grew.   so i approached the p.g. listserves to share a 
message:
you don't need heavy-markup, which is costly, to get the benefits you 
desire...

and boy, did my message get flak in return.

as the archives will clearly reveal to anyone who reads them, the 
markup-crowd
responded with a vengeance   there were days with literally dozens of 
messages,
all of them _hostile_, all of them saying "your methodologies will never 
work..."
(there was a lot of ad hominem crap as well, but i have an extremely thick 
skin.)

this went on for months.   for months and months.   for _years_, in fact.

i was dumbfounded.   i had a methodology that worked.   i _knew_ it worked.
i watched it work, on my machine, every day, day in and day out, thank you.

but these people insisted they knew better, that my stuff _couldn't_ work.
and they did it loudly, and incessantly, and with entirely too much venom.

from my vantage-point, it was quite easy to see that they were fools.

because i knew that when i revealed my evidence, they would lose
every single ounce of their credibility.   why did they squander it so?
i still can't give you a good answer to that question.   why would you
bet against someone -- anyone -- about what was in their pocket?
think about it -- you have no idea what's in their pocket, not really,
and they probably do.   so why even consider betting against them?

heck, even if i told you something outrageous, like that i can _fly_
without an airplane, i could see you being very skeptical about it.
i could see you saying, "until you really show me, i don't believe it."

but why would you assert -- flat out -- that "that is _impossible_",
and even _bet_your_entire_credibility_ that what i said was untrue?
doesn't make any sense to me.   yet that is _exactly_ what they did...

so i just kept the poker game running, made them bet their wad...

perhaps it was "cruel" to make them lose _all_ of their credibility,
when i could've ended it right away and let them retain _some_,
but i figured if they were going to treat the credibility they had
in such a cavalier manner, then they didn't _deserve_ to have it.

by the way, i was on vacation a few weeks back, and we went to the
glider-port at torrey pines, and saw humans fly _without_airplanes_.
one was using a hang-glider, and the others had those sailing-seats.
i was reminded of all of those old short-films that showed old-timers
trying to take flight with "wings" they'd built, or various contraptions.
they all crashed hard, of course, and watching those films is _funny_.
but still, here was the hang-glider, stepping off a cliff into an updraft,
and sailing up into the sky.   somehow, the juxtaposition was poignant.

anyway, back to all that flak i was getting...

in a phrase, whenever i made a post, the dogs would start barking.
(actually, since they ganged up on me, it was more like a wolf-pack,
but that doesn't fit the metaphor, so we'll stick to calling 'em dogs.)

now, at the time, i even said that, once i started revealing my proof,
my antagonists would clam up, pretending they never said otherwise.
i said this many times.   you can go back in the archives and confirm it.

sure enough, over the past few years, that is exactly what happened.

for a while, they attempted to do some back-pedaling, but their pride
just wouldn't let them.   even in the past year, when i pressed him into
specifying a percentage of the books in the p.g. library that i would be
_unable_ to handle with z.m.l., jon noring put his estimate at _50%_...

yet, when i've asked jon and others to point to some examples, i get
stone cold silence in return, since they know they simple cannot do it.
if they could, the animosity they've shown me assures me they would.
in the _abstract_, they love saying what z.m.l. cannot do.   but when i
boil things down to the _reality_, then they "can't be bothered" with it.
especially since the pudding i've revealed so far proves they're wrong.

so these days, when i make a post that reveals the latest bit of proof
in my z.m.l. pudding, there is no longer any flurry of hostile replies...

the dogs have stopped barking...

a while back, i announced an offline authoring-tool to make z.m.l.

no response.   it's not like people "just aren't interested" in the topic.
in the past, they've gone on and on and on about how _difficult_ it
would be to author z.m.l.   my word, you'd have to _manually_count_
the linebreaks to tell if something was a header.   or so they claimed...
yet when i present an authoring-tool that gives a _formatted_ display
-- no linebreak counting necessary, thank you very much -- _silence_.

the dogs have stopped barking...

heck, the other day i posted a message about a demo .pdf conversion,
and there wasn't a single reply.   four years ago, there was no shortage
of barking dogs that would respond to every single one of my posts...
as long as z.m.l. was "theoretical", they thought they could destroy me,
and they showed a keen interested in doing that.   now that it's _real_...

even two years ago, when i posted the precursor of my .pdf conversion,
the thread ran on and on for days, trying to pick little holes in my work.

but this week?   crickets...

the dogs have stopped barking...

when they thought they could strangle this little baby in its crib,
they were positively _viscious_ in their attacks.   now that it's big,
big enough to kick their ass, they've got absolutely nothing to say.
(except maybe to whine about how i'm making this "a competition".
they didn't mind that at all when they thought _they_ were winning.)

monday, for the first time, i revealed a "live" zml-to-html converter.
it demonstrates rather clearly that z.m.l. works, and works fairly well.
it'll work better as i improve it, but it's absolutely clear that it _works_.

a year ago, when i put up this same converter that used z.m.l. files
that i had preformatted, the dogs barked that it had been "rigged",
that it couldn't handle anything _except_ those preformatted files...
back then, they _dared_ me to make it live, thinking they "had me"...

but this year -- this week -- yesterday and today -- when given the
evidence that their claims couldn't hold water, you heard _nothing_.

the dogs have stopped barking...

and as i continue to reveal more and more evidence that z.m.l. works,
it will be increasing clear that i can handle almost _all_ of the library...
(i'm unsure because i don't know if i want to clutter z.m.l. with latex.)

so even now, it's obvious that my antagonists have lost their credibility.
lost it completely.   and all of their "good arguments" for heavy-markup
are quickly disintegrating into dust.   nobody will listen to 'em any more.
not without laughing.   and that's _why_ the dogs have stopped barking...

it's easy to get distracted by the flash of the flames happening here.
some arguments can still be _loud_, and go on for far, far too long.
(especially when certain people keep bringing them up repeatedly.)

but the most interesting sound on this listserve these days is the silence...

and the silence that you hear is because the dogs have stopped barking...

and you don't have to be sherlock holmes to figure that out, or know why.

-bowerbird

p.s.   there stretches out in front of me a long string of technologies
that i will continue to debut over the course of some time to come...
indeed, as my tool-chain is quickly beginning to achieve coherence
across the entire work-flow, my pace will be picking up very shortly.
further, the long-term cost-effectiveness of a z.m.l. library will take...
well... it will take a "long-term" amount of time in order to prove it...
so that deafening silence you now hear will get louder and louder.
louder and louder after every post i make nailing z.m.l. down more.
and, of course, the assiduous revolution toward light-markup will
continue to exert itself in a great many arenas across cyberspace...
so, at this point, the silence of my critics on the usefulness of z.m.l.
is just a humorous curiosity, maybe even a welcome respite after 
years and years of yapping.   however, at some point down the line,
when it's clear to all heavy-markup was an evolutionary dead-end
-- since it makes more sense to put smarts in apps, not formats --
they will either continue to resist it (and look stupid), or embrace it
(in which case you know i'll say "i told you so", even from the grave),
or they'll fade away (with tail between legs), probably best for us all.
for those interested in the long-game, those will be interesting times.
when you're predicting the future, it's not just "a difference of opinion"
if people disagree.   one might be right, at most.   the other _is_ wrong.
_accuracy_matters_.   accuracy is _all_ that matters.   as alan kay said,
"the best way to predict the future is to invent it".   so i invented z.m.l.
but i'm under no illusions.   i'm very confident that, in five or ten years,
some people will be trying to rip it out of my hands, pontificating how
"of course you invented it, but _now_ it belongs to _the_community_",
as they try their hardest to change it from what it is to what they want.


**************************************
 See what's new at http://www.aol.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071031/26ab8fc6/attachment-0001.htm 

From jon at noring.name  Wed Oct 31 10:48:15 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 31 Oct 2007 11:48:15 -0600
Subject: [gutvol-d] Note the definition of "work" (was harmless monsters,
	and the dogs that have stopped barking)
In-Reply-To: <d6a.10b59719.345a0d6c@aol.com>
References: <d6a.10b59719.345a0d6c@aol.com>
Message-ID: <754722244.20071031114815@noring.name>

To summarize my longer message below, let's look at the use of the
phrase "it works":

Bowerbird:

   "See, I take this ZML document and convert it to HTML and PDF, and
   doesn't it look pretty? ZML works as a mastering format!"

Most of the rest of us studying the issue of a PG "master" format":

   "A universal mastering format must adequately represent all book
   types, and meet several other requirements regarding archiving,
   retrieval, etc. Certainly ZML documents may be converted to other
   formats. But as a universal mastering format ZML is insufficient,
   and therefore will not work *as a universal mastering format.*"

Notice the differing definitions of "works"? Words can be very
powerful as Bowerbird knows being a performance poet, and so it is
important to understand the nuances of the underlying definitions
that are being asserted when the words are used.


Bowerbird wrote:

> [overall a good summary of his historical perspective of things.]
>
>  as the archives will clearly reveal to anyone who reads them, the markup-crowd
>  responded with a vengeance?there were days with literally dozens of messages,
>  all of them _hostile_, all of them saying "your methodologies will never work..."
>  (there was a lot of ad hominem crap as well, but i have an extremely thick skin.)

Well, Bowerbird continues to rewrite history and intentions with
this "it won't work" message. Do you think if you say it long enough
it will become true?

Most of the "markup crowd" never said "it would never work" as you
have defined "work". I proposed back in the early days of ebook-list
(now TeBC) that PG should normalize its plain texts so as to make
reliable conversion/repurposing possible, which shows I understood
that regularized/normalized plain text can certainly be used in the
role of a "master". So I knew from the start that normalized plain
text (which ZML is one flavor) can be converted to other formats for
presentation.

And anyone who works with document conversion understands this. I
believe the "light markup revolution" as you call it has been around
since the dawn of the computer era, since there are applications where
"light markup" to plain text *is sufficient*. Heck, in the early days
of PG, PG deployed "light markup" so as to identify highlighted text,
for example. (And this message uses "light markup".)

What the "markup crowd" here essentially said, collectively (with some
individual differences) was that "oh, that's quaint -- will it be able
to properly identify this, and that, and this other thing?". As well as
meet certain other requirements for a "master" rendition *for the
entire PG collection, present and future* from which everything else is
derived?

That is, the "markup crowd" believed, and overall still believes
today, that normalized plain text is "not sufficient". This is NOT the
same as "it won't work" per Bowerbird's definition of "it works".

Of course it will "work" by Bowerbird's definition of work, which is
simply "here's a ZML document, and here's the HTML or PDF derivative of
it -- doesn't it look pretty in this viewer or browser? See, it works!"

Even if you fix the "block quote" deficiency of the current published
ZML spec (and I suspect you are since you've not addressed my comment
even in messages where you did reply, making me believe you've
privately added support for that for some grand splash to come soon),
there are several other things where putting all our eggs into the
"let's master all our books in ZML" will lead to a host of problems.

Now I do like normalized plain text for plain text end-user
renditions (not as a master), and ZML is a viable candidate for this
role (fix the block quote thing as I described before and it becomes
an even stronger candidate for this role.)

As I've said before, of all the people who've commented on ZML, I'm the
only one, I believe, who has been intrigued with ZML for the role it
can play in the "digital text ecology", and that role is as the
preferred plain text end-user rendition, not the master.

Jon Noring


From marcello at perathoner.de  Wed Oct 31 12:12:18 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 31 Oct 2007 20:12:18 +0100
Subject: [gutvol-d] Note the definition of "work" (was harmless monsters,
 and the dogs that have stopped barking)
In-Reply-To: <754722244.20071031114815@noring.name>
References: <d6a.10b59719.345a0d6c@aol.com>
	<754722244.20071031114815@noring.name>
Message-ID: <4728D392.6040105@perathoner.de>

Jon Noring wrote:

> Bowerbird:
> 
>    "See, I take this ZML document and convert it to HTML and PDF, and
>    doesn't it look pretty? ZML works as a mastering format!"

I would rather take a still of my dog and convert it to a movie.

In the movie the dog sits still for 2 hours, but hey! with format
changed to "movie" now I can post it to youtube!


All ZML will ever do is produce an ascii text with the ending changed to
.html.


-- 
Marcello Perathoner
webmaster at gutenberg.org


From joshua at hutchinson.net  Wed Oct 31 12:21:45 2007
From: joshua at hutchinson.net (joshua at hutchinson.net)
Date: Wed, 31 Oct 2007 19:21:45 +0000 (UTC)
Subject: [gutvol-d] Librivox Hits 1000!
Message-ID: <4643470.1193858505675.JavaMail.?@fh1064.dia.cp.net>

I just go this press release from the Librivox folks!  We've got a 
little over 300 of their audio books so far and more are being added.  
(See www.librivox.org)

***

LibriVox makes it to 1,000!

LibriVox, the free audio book project has just cataloged it's 1,000th 
book: "The Murders in the Rue Morgue," by Edgar Allan Poe (read by 
Reynard T. Fox).

LibriVox.org started in August 2005 with a simple objective: "to make
all public domain books available as free audio books." Thirteen 
people collaborated to make the first recording, Joseph Conrad's 
"Secret Agent."

Two years later,  LibriVox has become the most prolific audiobook 
publisher in the world - we are now putting out 60-70 books a month, we 
have a catalog of 1,000 works, which represents a little over 6 months 
of *continuous* audio; we have some 1,500 volunteers who have 
contributed audio to the project; and a catalog that includes Jane 
Austin's "Pride and Prejudice," "Moby Dick," Darwin's "Origin of the 
Species," "Alice's Adventures in Wonderland," Einstein's "Relativity: 
The Special and General Theory," Kant's "Critique of Pure Reason," and 
other less well-known gems such as "Romance of Rubber" edited by John 
Martin. We have recordings in 21 languages, and about half of our 
recordings are solo efforts by one
reader, while the other half are collaborations among many readers.

We are always looking for new volunteers! Come join us.

From creeva at gmail.com  Wed Oct 31 12:35:14 2007
From: creeva at gmail.com (Brent Gueth)
Date: Wed, 31 Oct 2007 15:35:14 -0400
Subject: [gutvol-d] Note the definition of "work" (was harmless monsters,
	and the dogs that have stopped barking)
In-Reply-To: <754722244.20071031114815@noring.name>
References: <d6a.10b59719.345a0d6c@aol.com>
	<754722244.20071031114815@noring.name>
Message-ID: <2510ddab0710311235r77ad9e97tdfaaa3b96c510019@mail.gmail.com>

You know I first posted on the gutenberg mailing list 2 years ago on this
very subject.   Somehow no one seems to be able to get around this issue.
Pert of the issue back then stemmed out of getting rid of plaintext version
and moving to XML - which I defended and said their should always be a plain
text version.  As long as a plaintext version is always created and hosted
by PG of every work I have no complaints.

That being said - PG has source editors - what format works best for these
people?  The end user and normal PG contributor really doesn't need to know
much about whatever type of format it is- whether it be plaintext or xml or
html version 423.5 - some people like myself have popped on this - and since
the argument has turned slightly (may I mention again this has been TWO
YEARS this discussion has been going on) who are the parties specifically
responsible for handling and passing out the masters?  What do they want?
If they come up with a consensus on a format they can agree on AND provide
the output tools to convert it to plain text or PDF or whatever - what does
it truly matter since other editors can work on those editions?

I think the focus on this should be the merits of the formats - what
conversion tools exist - what is the limitations of the conversion tools -
and who is going to create adequate conversion tools for the sub par ones?

Which formats is PG going ot be supplying to the public - regardless of the
master markup language?

Since PG is now trying to include original images that were in the
manuscripts - the master faithfully render the following:

PDF with embedded images?
HTML - with identical output as the PDF - if it's not identical we need to
look at the tools?
TXT - that is well formatted with the pointers to the images stripper?
JPG - images that output should look the same as the pdf

Any pure text based conversion tool whether TXT or DOC - should look
identical when everything is said and done.   Everything that includes
images should look identical when everything is said and don (i.e all images
should be centered the same way and bold should be added where needed
identically in TXT leave out codes to say bold - no markups in plain text).

When someone is trumping a master format make them responsible to show it's
flexibility at being able to convert the master format to every other format
indentically.  If they can not prove that - chair the discussion on that
format until progress of the conversion is deemed acceptable by all members.


I don't think any of us really cares about the master format as long as the
people standing behind them have the proper tools or a team of people
working on the tools to faithfully render these identically across any
format.

On 10/31/07, Jon Noring <jon at noring.name> wrote:
>
> To summarize my longer message below, let's look at the use of the
> phrase "it works":
>
> Bowerbird:
>
>    "See, I take this ZML document and convert it to HTML and PDF, and
>    doesn't it look pretty? ZML works as a mastering format!"
>
> Most of the rest of us studying the issue of a PG "master" format":
>
>    "A universal mastering format must adequately represent all book
>    types, and meet several other requirements regarding archiving,
>    retrieval, etc. Certainly ZML documents may be converted to other
>    formats. But as a universal mastering format ZML is insufficient,
>    and therefore will not work *as a universal mastering format.*"
>
> Notice the differing definitions of "works"? Words can be very
> powerful as Bowerbird knows being a performance poet, and so it is
> important to understand the nuances of the underlying definitions
> that are being asserted when the words are used.
>
>
> Bowerbird wrote:
>
> > [overall a good summary of his historical perspective of things.]
> >
> >  as the archives will clearly reveal to anyone who reads them, the
> markup-crowd
> >  responded with a vengeancethere were days with literally dozens of
> messages,
> >  all of them _hostile_, all of them saying "your methodologies will
> never work..."
> >  (there was a lot of ad hominem crap as well, but i have an extremely
> thick skin.)
>
> Well, Bowerbird continues to rewrite history and intentions with
> this "it won't work" message. Do you think if you say it long enough
> it will become true?
>
> Most of the "markup crowd" never said "it would never work" as you
> have defined "work". I proposed back in the early days of ebook-list
> (now TeBC) that PG should normalize its plain texts so as to make
> reliable conversion/repurposing possible, which shows I understood
> that regularized/normalized plain text can certainly be used in the
> role of a "master". So I knew from the start that normalized plain
> text (which ZML is one flavor) can be converted to other formats for
> presentation.
>
> And anyone who works with document conversion understands this. I
> believe the "light markup revolution" as you call it has been around
> since the dawn of the computer era, since there are applications where
> "light markup" to plain text *is sufficient*. Heck, in the early days
> of PG, PG deployed "light markup" so as to identify highlighted text,
> for example. (And this message uses "light markup".)
>
> What the "markup crowd" here essentially said, collectively (with some
> individual differences) was that "oh, that's quaint -- will it be able
> to properly identify this, and that, and this other thing?". As well as
> meet certain other requirements for a "master" rendition *for the
> entire PG collection, present and future* from which everything else is
> derived?
>
> That is, the "markup crowd" believed, and overall still believes
> today, that normalized plain text is "not sufficient". This is NOT the
> same as "it won't work" per Bowerbird's definition of "it works".
>
> Of course it will "work" by Bowerbird's definition of work, which is
> simply "here's a ZML document, and here's the HTML or PDF derivative of
> it -- doesn't it look pretty in this viewer or browser? See, it works!"
>
> Even if you fix the "block quote" deficiency of the current published
> ZML spec (and I suspect you are since you've not addressed my comment
> even in messages where you did reply, making me believe you've
> privately added support for that for some grand splash to come soon),
> there are several other things where putting all our eggs into the
> "let's master all our books in ZML" will lead to a host of problems.
>
> Now I do like normalized plain text for plain text end-user
> renditions (not as a master), and ZML is a viable candidate for this
> role (fix the block quote thing as I described before and it becomes
> an even stronger candidate for this role.)
>
> As I've said before, of all the people who've commented on ZML, I'm the
> only one, I believe, who has been intrigued with ZML for the role it
> can play in the "digital text ecology", and that role is as the
> preferred plain text end-user rendition, not the master.
>
> Jon Noring
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20071031/db1f8e25/attachment.htm 

From jon at noring.name  Wed Oct 31 13:26:59 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 31 Oct 2007 14:26:59 -0600
Subject: [gutvol-d] Note the definition of "work" (was harmless monsters,
	and the dogs that have stopped barking)
In-Reply-To: <2510ddab0710311235r77ad9e97tdfaaa3b96c510019@mail.gmail.com>
References: <d6a.10b59719.345a0d6c@aol.com>
	<754722244.20071031114815@noring.name>
	<2510ddab0710311235r77ad9e97tdfaaa3b96c510019@mail.gmail.com>
Message-ID: <1419906406.20071031142659@noring.name>

Brent wrote:

> When someone is trumping a master format make them responsible to
> show it's flexibility at being able to convert the master format to
> every other format indentically.? If they can not prove that - chair
> the discussion on that format until progress of the conversion is
> deemed acceptable by all members. 

You bring up several good points.

Let's look at this from a different angle, at least from the plain
text markup approach (which encompasses XML-vocabularies, like TEI,
and ZML):

Bowerbird's ZML is actually no different from XML with respect to
conversion to other formats. There is, for example, an XML equivalent
to ZML where it is possible to build ZML <---> XML converters, and I
believe they are almost trivial, that can perfectly round-trip between
the two. (I'll call this ZML-equivalent vocabulary "zXML".)

Thus, ZML essentially defines a vocabulary to markup various
structures and inline text semantics. Although the conversion tools
may be a little different, in terms of programming effort they are
about the same (one approach to use with ZML is conversion to zXML,
then use standard XSLT to convert to XHTML or whatever else -- this
may actually be the easiest approach and now allows ZML to plugin to
the myriad existing XML processing tools.)

Thus the question *has* to go back to the "vocabulary" used. (Plus
other benefits that XML confers that the present ZML does not, and
probably cannot.)

Is the ZML "vocabulary" sufficient for representing all the books in
the PG Corpus? Many of us believe it is not. And not only for
representing structures/semantics, but for metadata, referencing/
citation, and text durability (in ZML, white space normalization is
*critical*, in XML white space is totally flexible.)

(ZML does not yet even include something inside of it which will say
to a machine: "I am ZML". For philosophical reasons I think Bowerbird
is opposed to adding any identifier to the text file which identifies
the content as conforming to ZML. THIS IS CRITICAL, among other
critical things. And don't ask him about machine readable metadata --
I've never seen anyone who is so opposed to even flagging simple
metadata/catalog information in a machine-readable manner. He has this
belief we can write programs today which will extract all the needed
metadata right from the plain ZML content. Of course, he has decided
what metadata is meaningful and what is not.)

Thus, building some sort of conversion toolset does NOT demonstrate
that ZML is the mastering format PG is looking for. Bowerbird is
trying to convince us it is -- it is not. In fact, he is hoping it
will fool people since people like to see "results", so he is going to
show them "results". Notice that Bowerbird chooses the example texts
he normalizes in ZML. And he does not address the other various issues
brought up by many of us -- I think I've only scratched the surface --
others will have their own.

He continues to say (over and over and over again like a broken record
-- who's on the merry-go-round?): "Look at my conversion and reading
toolz -- See! ZML works!"

It's like someone standing on the 10th floor of a building and dropping
a bowling ball -- it hits the ground, and then they say "see, gravity
works as I told you so! If you want to crack nuts I now have the simple
solution! Who needs a complicated mechanism to crack nuts?"


Jon Noring


From bowerbird at aol.com  Wed Oct 31 14:12:07 2007
From: bowerbird at aol.com (bowerbird at aol.com)
Date: Wed, 31 Oct 2007 17:12:07 -0400
Subject: [gutvol-d] back to the basics
In-Reply-To: <2510ddab0710311235r77ad9e97tdfaaa3b96c510019@mail.gmail.com>
References: <2510ddab0710311235r77ad9e97tdfaaa3b96c510019@mail.gmail.com>
Message-ID: <8C9EA19D44675AC-4FC-5175@FWM-M15.sysops.aol.com>


-----Original Message-----
From: Brent Gueth <creeva at gmail.com>
To: Jon Noring <jon at noring.name>; Project Gutenberg Volunteer 
Discussion <gutvol-d at lists.pglaf.org>
Sent: Wed, Oct 31  3:35 PM
Subject: Re: [gutvol-d] Note the definition of "work" (was harmless 
monsters, and the dogs that have stopped barking)

brent said:
>   Since PG is now trying to include original images that were in the 
manuscripts -
>   the master faithfully render the following:
>   PDF with embedded images?
>   HTML - with identical output as the PDF - if it's not identical we 
need to look at the tools?
>   TXT - that is well formatted with the pointers to the images 
stripper?
>   JPG - images that output should look the same as the pdf

ok, i can see it's time for a refresher on the basics, because
some people are confused, because other people are trying
their darndest to _make_ us confused, the better to snow us.

first of all, let me make it clear that i am here precisely because
my message has a great deal of relevance to project gutenberg.

further, were it not for michael hart's insistence on plain-text,
which meant his library was structured that way, it's possible
-- perhaps even likely -- i wouldn't have learned the value of it...
so i came here to share, in an effort to repay michael for that...

having said all that, though, i have no intention of letting z.m.l.
fade away just because of "enemies" here at project gutenberg.

i'd always intended on creating my own mirror of the p.g. library.

originally, i thought i would have to, because i intended on using
my compression format on the e-texts, and i just assumed that
i would be the only person interested in doing such a boring job.
and since it's just a massive file-handling job -- the compression
itself is button-click automatic -- it's not a job that can be shared.
so heck, i was jazzed michael offered me webspace/bandwidth...

anyway, when i realized my routines were fast enough to handle
a regular file -- without having it be compressed first -- i was freed
from the compression task.  but i decided i'd still mount a mirror.

so it matters to me not one tiny little whit if p.g. doesn't use z.m.l.
because i'm going to prove its efficacy all by myself with my mirror.

indeed, i sincerely hope that p.g. does _not_ use z.m.l. at first...

i hope the markup freaks continue their campaign and drive p.g.
into a state of pure stagnation, followed by a complete collapse
when the complexities of their system result in total confusion...

because that will lead to a total housecleaning, and the new staff
will understand deeply the need for a system based on simplicity.

and my mirror will be the model letting them know that is possible.

so they'll just copy over my z.m.l. files and call it the p.g. library.
(which is only fitting, because i got the files from p.g. originally.)

so how will my mirror differ from the p.g. library that exists now?

first, it will have one file only for each e-text -- the z.m.l. 
version...
you can call it the "master" if you like, but since there will be no
"slaves" around, it's really unnecessary.  there's just one version.

so this is one arena in which you've been confused, namely this
focus on "master" files.  the best way to get the answer you want
is to frame the question in your target's head.  the x.m.l. crowd
asks you about "a master file" so your answer becomes "x.m.l."
(go read the "win friends and influence people" manifestos, and
you'll see how this strategy is laid out and completely explained.)

the z.m.l. version will look very much like a pg-ascii version now,
with the big difference (aside from the structured nature of z.m.l.)
being that the z.m.l. file will contain references to _illustrations_.
(such references are now unceremoniously stripped from pg-ascii.)

end-users can read the z.m.l. versions on the web if they want.
for an example of that, see the "babelfish" script located here:
>   http://z-m-l.com/go/babelfish19.pl
(my website seems to be down today, probably due to some
perl hacking i was doing on it yesterday, so check back later.)

but i expect most people will download the z.m.l. files instead,
either to their desktop/laptop machines, or to wireless readers.

z.m.l. viewer-programs -- similar to babelfish, but even better --
will display the z.m.l. files offline.  (these viewer-programs will
also download the entire library, including newly-posted books,
in the background, to _duplicate_ books as widely as possible,
while at the same time constantly growing the person's library.)

z.m.l. viewer-programs are amazingly easy to code, and to port
to other platforms.  (i have written them in 3 languages already.)
and -- importantly -- they kick other viewer-programs to the curb.

eventually, there'll be little need to "produce derivative formats".

indeed, i would say "no need", but i'm not dumb enough to think
that companies with deep pockets -- like adobe and amazon --
are simply gonna roll over and play dead and let z.m.l. be king.
and the publishing companies will play a role as well, because
the possibility for authors to easily create e-books scares them,
so they'll attempt to impose a complex system on publishing...

so the upshot is we're gonna be dealing with .pdf and .mobi for
a long time to come.  furthermore, the web is a factor as well,
so .html is something that must be brought into the picture too.

that is why i decided to demonstrate that z.m.l. converts well...

well, that plus -- at least at the outset -- the so-called ability of
x.m.l.-based methodologies to "produce derivative formats" was
one of the big selling-points.  indeed, go over to the d.p. forums
and you'll see that this is _still_ one of the selling-points for .tei.

it is, of course, hype.  we've looked at the "conversions" produced,
and found them to be wanting.  they fall far short of what's desired.
sometimes they fall _so_ short that "vapor" is a more honest label.
plus, in spite of the fact that there were supposedly "all kinds of"
x.s.l.t. scripts capable of auto-generating a _plethora_ of formats,
when push came to shove, it ended up that marcello is the go-to,
and he made it clear that he didn't really care much about output...
so the .tei folks are stuck with some pretty crummy-looking .pdfs.

meanwhile, i've managed to turn out some pretty respectable .pdfs.
they're not yet as beautiful as i'd like them to be, not by a long shot,
but they've got a ton of functionality built into them, and that's 
good...

given the fact that .pdf is "supported" on a lot of the reader-machines,
i think it's probably important that z.m.l. can auto-generate .pdf 
now...
but again, long-term, my viewer-program can kick the acrobat's ass...

same thing with .html.  with the web, it's obviously an important 
format.
at least until web-browsers support light-markup internally, which will 
be
not too long from now, at least if light-markup continues its steady 
march.
in the meantime, though, it's very good that .zml can convert to 
.html...

.html is also good for reader-machines, since they all support that 
format.

furthermore, something the heavy-markup crowd wants you not to notice
is that the .html output that is auto-generated from a .zml file is 
basically
_the_exact_same_thing_ as the .html output generated from an .xml file.
indeed, i believe if you compare my "my antonia" .html version with 
jon's,
you'll find that mine is actually _more_ capable.  if i remember 
correctly,
the only thing his had that mine didn't was an i.d. on _every_ 
paragraph.
and it would take me less than 10 minutes to add that to my routines...

but again, a focus on "conversions" to "other formats" is beside the 
point.
yeah, i talk about it, but only because i want to show clearly that i 
can
undermine even their very best selling point.  same benefits.  lower 
price.

so don't get hung up on "conversions".  it's better not to have to do 
any.
people who own a rocketbook will tell you they can now do conversions
in their sleep.  but they'll also tell you they would rather be 
dreaming...

with the easy-to-author, easy-to-edit, easy-to-remix zen markup 
language,
people will soon come to see that there's just no need for another 
format...
especially since, if they _do_ ever need one, it's just a button-click 
away.
but it will surprise you how quickly they'll decide to jettison other 
formats.

and yeah, yeah, it will be easy for my opponents to label that 
"ridiculous".
so that might set them off to yapping for a little while again, in an 
attempt
to distract you from the fact they have no credibility left.  but 
believe me,
in a couple of years from now, when i have proven it, with tasty 
pudding,
they will again want you to forget they ever said anything to the 
contrary...

-bowerbird


________________________________________________________________________
Email and AIM finally together. You've gotta check out free AOL Mail! - 
http://mail.aol.com

From hart at pglaf.org  Mon Oct 29 08:49:59 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 08:49:59 -0700 (PDT)
Subject: [gutvol-d] Oct. 29, 1991 "Internet" First Appears
Message-ID: <Pine.LNX.4.64.0710290846210.18344@pglaf.org>


16 years ago today the word "Internet" first appeared on a
front page or cover of the major media.

Wall Street Journal, Page One, the story about eBooks.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg


From hart at pglaf.org  Mon Oct 29 11:31:54 2007
From: hart at pglaf.org (Michael Hart)
Date: Mon, 29 Oct 2007 11:31:54 -0700 (PDT)
Subject: [gutvol-d] CORRECTION: Oct. 29, 1991 "Internet" First Appears
Message-ID: <Pine.LNX.4.64.0710291125080.21583@pglaf.org>


Perhaps I should have included a reference to an earlier
article in the Washington Post about the "worm" that did
some serious slowing down of the Internet in 1988.

I don't know why the article I was referring to did not,
in any way, mention this. . .perhaps they don't thing of
a political paper such as The Washington Post as "major"
world media, or perhaps it just wasn't in their index.

Micheael


16 years ago today the word "Internet" first appeared on a
front page or cover of the major media.

Wall Street Journal, Page One, the story about eBooks.


Thanks!!!

Michael S. Hart
Founder
Project Gutenberg