From Gutenberg9443 at aol.com  Fri Oct  1 05:07:09 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct  1 05:07:21 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <144.3502adb2.2e8ea26d@aol.com>

 
In a message dated 9/19/2004 10:38:22 AM Mountain Standard Time,  
marevalo@marevalo.net writes:

And it would be great to have the complete bibliographical record of the
book (or books) used as source for the digital edition on every new text.


Or at least the date of original publication and the name of the original  
publisher. Not having those makes it difficult to cite the book in research  
work. I usually resort to going to the Library of Congress record to get that,  
and then include the fact that this is an on-line edition etc. etc. etc.
 
Anne
From Bowerbird at aol.com  Fri Oct  1 10:56:23 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct  1 10:56:35 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <8e.1648d182.2e8ef447@aol.com>

marevalo said:
>   >   it would be great to have the complete bibliographical record 
>   >   of the book (or books) used as source for the digital edition 

anne said:
>   Or at least the date of original publication 
>   and the name of the original publisher.

this is a recurring request, of course.

it might be interesting to have a public forum, like a wiki, where
requests like these could be made, so we could see the cumulation.

as it is, i get the distinct impression these go in one ear and
out the other, and we go along in ignorant bliss thinking that
all of our users are completely happy...

-bowerbird
From marcello at perathoner.de  Fri Oct  1 11:10:06 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct  1 11:10:15 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <144.3502adb2.2e8ea26d@aol.com>
References: <144.3502adb2.2e8ea26d@aol.com>
Message-ID: <415D9D7E.5000908@perathoner.de>

Gutenberg9443@aol.com wrote:

>     And it would be great to have the complete bibliographical record of the
>     book (or books) used as source for the digital edition on every new text.
> 
> Or at least the date of original publication and the name of the 
> original publisher. Not having those makes it difficult to cite the book 
> in research work. I usually resort to going to the Library of Congress 
> record to get that, and then include the fact that this is an on-line 
> edition etc. etc. etc.


You give me a list of all publication dates and places etc. and I'll 
insert them into the database. If PG has all TP&Vs archived for 
copyright purposes, it should not be an impossible task.

Adding database support for more attributes is a matter of a few hours. 
It just doesn't make sense to add more attributes if nobody volunteers 
to fill them in.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From traverso at dm.unipi.it  Fri Oct  1 11:19:53 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Fri Oct  1 11:20:02 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <415D9D7E.5000908@perathoner.de> (message from Marcello
	Perathoner on Fri, 01 Oct 2004 20:10:06 +0200)
References: <144.3502adb2.2e8ea26d@aol.com> <415D9D7E.5000908@perathoner.de>
Message-ID: <200410011819.i91IJr72024672@posso.dm.unipi.it>

>>>>> "Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:

    Marcello> Gutenberg9443@aol.com wrote:

    >> And it would be great to have the complete bibliographical
    >> record of the book (or books) used as source for the digital
    >> edition on every new text.
    >> 
    >> Or at least the date of original publication and the name of
    >> the original publisher. Not having those makes it difficult to
    >> cite the book in research work. I usually resort to going to
    >> the Library of Congress record to get that, and then include
    >> the fact that this is an on-line edition etc. etc. etc.


    Marcello> You give me a list of all publication dates and places
    Marcello> etc. and I'll insert them into the database. If PG has
    Marcello> all TP&Vs archived for copyright purposes, it should not
    Marcello> be an impossible task.

    Marcello> Adding database support for more attributes is a matter
    Marcello> of a few hours.  It just doesn't make sense to add more
    Marcello> attributes if nobody volunteers to fill them in.


It would be very simple if the information were not systematically
removed. At DP, the full information (including a transcription of the
title page) is kept through proofreading, but it is then removed, if
not by the post-processors, then by the whitewashers.

Carlo


From jon at noring.name  Fri Oct  1 11:42:05 2004
From: jon at noring.name (Jon Noring)
Date: Fri Oct  1 11:42:23 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <8e.1648d182.2e8ef447@aol.com>
References: <8e.1648d182.2e8ef447@aol.com>
Message-ID: <169440744187.20041001124205@noring.name>

marevalo said:
>>   >   it would be great to have the complete bibliographical record 
>>   >   of the book (or books) used as source for the digital edition 


anne said:
> Or at least the date of original publication
> and the name of the original publisher.


Bowerbird said:

> this is a recurring request, of course.
>
> it might be interesting to have a public forum, like a wiki, where
> requests like these could be made, so we could see the cumulation.
>
> as it is, i get the distinct impression these go in one ear and
> out the other, and we go along in ignorant bliss thinking that
> all of our users are completely happy...

Agreed.

I believe PG should change their policy (if they haven't already) and,
for all new titles, include the full citation of the source (or
sources, if someone made a composite using two or more differing
editions, which btw I believe PG should discourage).

I surmise the reason for the past (and I assume current) policy of
obfuscating the source had to do with fear of copyright litigation --
in essence "providing information to the enemy." However, since nearly
all the texts produced today are from scans which are preserved, it is
no longer possible to obfuscate the source.

Anyway, there are a few other arguments in support of full source
citation, including the most important: assuring integrity of the text
to the original source. This is not a trivial issue.

Jon Noring

From skip at nextra.sk  Sun Oct  3 17:14:55 2004
From: skip at nextra.sk (Skippi)
Date: Sun Oct  3 17:15:23 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <169440744187.20041001124205@noring.name>
References: <8e.1648d182.2e8ef447@aol.com>
	<169440744187.20041001124205@noring.name>
Message-ID: <1194651402.20041004021455@nextra.sk>

Friday, October 1, 2004, 8:42:05 PM, Jon wrote:

> I believe PG should change their policy (if they haven't already) and
> for all new titles to include the full citation of the source (or
> sources if someone made a composite using two or more differing
> editions, which btw I believe PG should discourage.)
...
> Anyway, there are a few other arguments in support of full source
> citation, including the most important: assuring integrity of the text
> to the original source. This is not a trivial issue.

I agree too, and suggest that maybe this information could be kept in
an XML format conforming to some DTD (PG's own), so that the book can be
very easily processed or catalogued. But this is going too far from the
plain-text idea PG was built on and still successfully lives on.

-- 
 Skippi                            mailto:skip@nextra.sk

From hart at pglaf.org  Mon Oct  4 10:00:35 2004
From: hart at pglaf.org (Michael Hart)
Date: Mon Oct  4 10:00:36 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <1194651402.20041004021455@nextra.sk>
References: <8e.1648d182.2e8ef447@aol.com>
	<169440744187.20041001124205@noring.name>
	<1194651402.20041004021455@nextra.sk>
Message-ID: <Pine.LNX.4.60.0410040959480.10957@pglaf.org>


If the original source you use turns out to have errors,
as nearly all books do, do you want the errors preserved?

mh

On Mon, 4 Oct 2004, Skippi wrote:

> Friday, October 1, 2004, 8:42:05 PM, Jon wrote:
>
>> I believe PG should change their policy (if they haven't already) and
>> for all new titles to include the full citation of the source (or
>> sources if someone made a composite using two or more differing
>> editions, which btw I believe PG should discourage.)
> ...
>> Anyway, there are a few other arguments in support of full source
>> citation, including the most important: assuring integrity of the text
>> to the original source. This is not a trivial issue.
>
> I agree too, and suggest that maybe this information could be kept in
> an XML format conforming to some DTD (PG's own), so that the book can be
> very easily processed or catalogued. But this is going too far from the
> plain-text idea PG was built on and still successfully lives on.
>
> --
> Skippi                            mailto:skip@nextra.sk
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From shalesller at writeme.com  Mon Oct  4 10:28:58 2004
From: shalesller at writeme.com (D. Starner)
Date: Mon Oct  4 10:29:03 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <20041004172858.1311E4BDA9@ws1-1.us4.outblaze.com>

Michael Hart writes:
 
> If the original source you use turns out to have errors, 
> as nearly all books do, do you want the errors preserved? 

Yes. That's way too unconditional, and there's a lot of
minor errors in texts that we can just correct; but I've
been reading a book on Beowulf which talks about one of
the first published transcriptions that "corrected" a lot
of things and screwed with work on Beowulf for 50 years.
I hate to imagine us adding anachronistic spelling to a
work or making a work harder to understand. On DP, people
frequently ask about obvious errors that turn out to be
correct. If we want to be editors, let us be editors and
take full responsibility for checking other editions and
writing introductions and bibliographies and keeping notes
about what we've changed.

From hart at pglaf.org  Mon Oct  4 10:41:08 2004
From: hart at pglaf.org (Michael Hart)
Date: Mon Oct  4 10:41:10 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <20041004172858.1311E4BDA9@ws1-1.us4.outblaze.com>
References: <20041004172858.1311E4BDA9@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0410041035200.13188@pglaf.org>


On Mon, 4 Oct 2004, D. Starner wrote:

> Michael Hart writes:
>
>> If the original source you use turns out to have errors,
>> as nearly all books do, do you want the errors preserved?
>
> Yes. That's way too unconditional, and there's a lot of
> minor errors in texts that we can just correct; but I've
> been reading a book on Beowulf which talks about one of
> the first published transcriptions that "corrected" a lot
> of things and screwed with work on Beowulf for 50 years.
> I hate to imagine us adding anachronistic spelling to a
> work or making a work harder to understand. On DP, people
> frequently ask about obvious errors that turn out to be
> correct. If we want to be editors, let us be editors and
> take full responsibility for checking other editions and
> writing introductions and bibliographies and keeping notes
> about what we've changed.

If we do take such responsibility, then we are creating
a new "critical edition," which was always our goal.

However, we have tried to avoid the arguments that come with
this sort of thing, as per a previous discussion about this,
when we argued about the punctuation in "To be or not to be."

Once we get into that kind of discussion, it really never ends,
take my father's word for it, he was a great Shakespeare prof.

Thanks!

Michael
From ke at gnu.franken.de  Mon Oct  4 10:40:41 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Mon Oct  4 11:31:55 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <1194651402.20041004021455@nextra.sk> (Skippi's message of "Mon, 
	4 Oct 2004 02:14:55 +0200")
References: <8e.1648d182.2e8ef447@aol.com>
	<169440744187.20041001124205@noring.name>
	<1194651402.20041004021455@nextra.sk>
Message-ID: <shwty6z8iu.fsf@tux.gnu.franken.de>

Skippi <skip@nextra.sk> writes:

> I agree too, and suggest that maybe this information could be kept in
> an XML format conforming to some DTD (PG's own), so that the book can be
> very easily processed or catalogued.

It would be wise to go with the TEI DTD and, actually, some support for
the TEI DTD is already available.


Michael Hart <hart@pglaf.org> writes:

> If the original source you use turns out to have errors,
> as nearly all books do, do you want the errors preserved?

Sure.  At least, say what you changed and why.  Also, I strongly
recommend keeping the original page references; you can hide them, but
it should be possible to make them visible if the user is interested in
them.  This could be done using a simple CSS mechanism.
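
A minimal sketch of the hidden-but-recoverable page references, in
Python. The `[page N]` marker, the `pageno` class and the `show-pages`
body class are made-up names for illustration; nothing here is
prescribed by PG or by TEI.

```python
import re
from html import escape

# Hypothetical plain-text marker for an original page break, e.g. "[page 12]".
PAGE_MARK = re.compile(r"\[page (\d+)\]")

# Page numbers are hidden by default and shown only when the reader
# (or a small script) adds the "show-pages" class to <body>.
CSS = """
.pageno { display: none; }
body.show-pages .pageno { display: inline; color: gray; font-size: smaller; }
"""

def mark_pages(text):
    """Escape the text for HTML and wrap each page marker in a span."""
    escaped = escape(text)
    return PAGE_MARK.sub(r'<span class="pageno" id="page\1">[\1]</span>', escaped)

print(mark_pages("end of one [page 12]start & next"))
```

The page references stay in the file, so a citation to the original
pagination can always be recovered, while casual readers never see them.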

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From scott_bulkmail at productarchitect.com  Mon Oct  4 11:23:10 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Mon Oct  4 13:02:12 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <Pine.LNX.4.60.0410040959480.10957@pglaf.org>
References: <8e.1648d182.2e8ef447@aol.com>
	<169440744187.20041001124205@noring.name>
	<1194651402.20041004021455@nextra.sk>
	<Pine.LNX.4.60.0410040959480.10957@pglaf.org>
Message-ID: <p06110408bd87417f1e60@[192.168.0.52]>

>If the original source you use turns out to have errors,
>as nearly all books do, do you want the errors preserved?

One of the advantages of an XML (or similar) "master version" is that we can have our cake and eat it too.  The MASTER can capture the mistake AND the correction; a separate process can render either or both in various formats, e.g. HTML with little "tool tips" and of course plain text (optionally with or without "errors", anachronistic spellings, etc.).

Three examples:
	<alt why="modern" what="today">to-day</alt>
	<alt why="typo" what="spelling">speeling</alt>
	<alt why="intentional" what="for">f8r</alt>

As an example of the third case, Twain's CT Yankee includes a "newspaper article" full of "typos" that are intended to illustrate an amateur job of typesetting.

NOTE that the xml tags and attributes are just made up on the fly.  I'm NOT advocating any specific tags.  Also, I think it's better to start with something rather than wait for a perfect design.  It's generally easy to transform from one xml into another, e.g. to (or from):
	<archaic modern="today">to-day</archaic>
	<typo correct="spelling">speeling</typo>
	<intentional index="for">f8r</intentional>
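
A rough sketch of the "separate process" idea, using Python's standard
xml.etree.ElementTree and the made-up <alt> tags from the first set of
examples. The `render` function and its `apply` parameter are
hypothetical names, and the sketch only handles <alt> elements that sit
directly under the root element.

```python
import xml.etree.ElementTree as ET

def render(master_xml, apply=frozenset()):
    """Flatten a master fragment to plain text.

    `apply` names the `why` categories whose replacement (the `what`
    attribute) should be used instead of the original reading.
    """
    root = ET.fromstring(master_xml)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "alt" and child.get("why") in apply:
            parts.append(child.get("what", ""))      # corrected reading
        else:
            parts.append("".join(child.itertext()))  # reading as printed
        parts.append(child.tail or "")
    return "".join(parts)

MASTER = (
    '<p>Check the <alt why="typo" what="spelling">speeling</alt> '
    '<alt why="modern" what="today">to-day</alt>, '
    '<alt why="intentional" what="for">f8r</alt> sure.</p>'
)

print(render(MASTER))                      # text as printed
print(render(MASTER, {"typo"}))            # only typos corrected
print(render(MASTER, {"typo", "modern"}))  # typos fixed, spelling modernized
```

Leaving "intentional" out of `apply` keeps deliberate misprints such as
the Twain newspaper article untouched in every rendering.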
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From Bowerbird at aol.com  Mon Oct  4 13:21:09 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct  4 13:21:32 2004
Subject: [gutvol-d] re: toasted-cheese sandwich
Message-ID: <1e8.2b80fab1.2e930ab5@aol.com>

michael said:
>   If the original source you use turns out to have errors,
>   as nearly all books do, do you want the errors preserved?

me?  nope.  i want 'em corrected, with a note to that effect.

but that's if you're _sure_ it's an error.  it's not always that clear.

if you _suspect_ something _might_ be an error, but aren't sure,
even after detective work, i would want a note made to that effect.

***

>   However, we have tried to avoid the arguments that
>   come with this sort of thing, as per a previous discussion 
>   about this, when we argued about the punctuation in
>   "To be or not to be."

in such cases, make note of the arguments.           :+)

***

skippi said:
>   XML

karl said:
>   TEI DTD 
...
>   CSS

scott said:
>   XML

that x.m.l.  it's good for _everything_, isn't it?       :+)

i just heard there's a new thing out now where x.m.l. can 
actually make you a toasted-cheese sandwich, no kidding,
and you can even specify -- with a tag -- how dark or light
you want the bread toasted.  i gotta get me that thing, man,
because i'm jonesin' for a toasted-cheese sandwich right now...

-bowerbird
From scott_bulkmail at productarchitect.com  Mon Oct  4 13:50:16 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Mon Oct  4 13:50:53 2004
Subject: [gutvol-d] re: toasted-cheese sandwich
In-Reply-To: <1e8.2b80fab1.2e930ab5@aol.com>
References: <1e8.2b80fab1.2e930ab5@aol.com>
Message-ID: <p0611040dbd8766e8e8a2@[192.168.0.52]>

>that x.m.l.  it's good for _everything_, isn't it?       :+)

It does an outstanding job of all the uses that I've ever seen it suggested for on PG lists (based on reviewing several months of archives).

How does Z/M/L handle the 3 cases that I listed?
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From Bowerbird at aol.com  Mon Oct  4 14:13:37 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct  4 14:14:06 2004
Subject: [gutvol-d] re: toasted-cheese sandwich
Message-ID: <1d8.2cd34e3d.2e931701@aol.com>

scott said:
>   How does Z/M/L handle the 3 cases that I listed?

oh please, nobody here wants to talk about z.m.l.

ok, maybe 2 or 3 people, but nobody else.

and those 2 or 3 should join the beta-test
for my viewer-program, where we can
discuss issues like the ones scott lists.

     zml_talk-subscribe@yahoogroups.com

now i'm off to make that sandwich...

-bowerbird

p.s.  to answer your question, though, scott,
annotations could be used to make those notes.
From nwolcott2 at kreative.net  Sat Oct  2 13:03:17 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Tue Oct  5 06:14:19 2004
Subject: [gutvol-d] Indexing Editors, etc.
References: <144.3502adb2.2e8ea26d@aol.com>
Message-ID: <008e01c4aadc$b6ac78e0$0b9495ce@net>

I understand the need to conceal the date, publisher etc for most PG books. But I think an exception could be made if the book is over 100 years old. Books of this age are likely to be among the first published and hence have authenticity. Often the publisher has ceased publication, and in this case (as opposed to being bought out) there is no harm in listing the publisher. 
nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest
  ----- Original Message ----- 
  From: Gutenberg9443@aol.com 
  To: gutvol-d@lists.pglaf.org 
  Sent: Friday, October 01, 2004 8:07 AM
  Subject: Re: [gutvol-d] Indexing Editors, etc.


  In a message dated 9/19/2004 10:38:22 AM Mountain Standard Time, marevalo@marevalo.net writes:
    And it would be great to have the complete bibliographical record of the
    book (or books) used as source for the digital edition on every new text.

  Or at least the date of original publication and the name of the original publisher. Not having those makes it difficult to cite the book in research work. I usually resort to going to the Library of Congress record to get that, and then include the fact that this is an on-line edition etc. etc. etc.

  Anne


From shalesller at writeme.com  Tue Oct  5 13:23:08 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct  5 13:23:57 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <20041005202308.51B3C4BDA9@ws1-1.us4.outblaze.com>

> I understand the need to conceal the date, publisher etc for most PG books. 

I don't. Most of the reprint editions I've seen don't; they usually have
printed right on the verso which edition they are a copy of.

> But I think an exception could be made if the book is over 100 years old. 
> Books of this age are likely to be among the first published and hence have 
> authenticity. Often the publisher has ceased publication, and in this case 
> (as opposed to being bought out) there is no harm in listing the publisher. 

I'm doing a lot of books over 100 years old that are new editions of older 
books. And there's a lot of cases where (for example) the American edition 
came out months after the British edition, but isn't considered authentic 
at all. Likewise, the Early English Text Society is still publishing, as is
the Oxford University Press and most other university presses. I don't
think time makes a bit of difference here.

From marcello at perathoner.de  Thu Oct  7 11:34:59 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct  7 11:35:09 2004
Subject: [gutvol-d] Fundraising thru Amazon ???
Message-ID: <41658C53.9000108@perathoner.de>

Warning: potential heresy ahead!


Seeing that many `independent' PG websites are trying to make money 
using our books and our catalog data, basically mixing search results 
from PG and Amazon Ads, eg.

   http://www.abacci.com/books/
   http://textual.net/access.gutenberg

why not do some fundraising thru Amazon ourselves?


Basically, we set up a page and tell our visitors:

   If you ever feel the need to buy a book at Amazon
   don't go there directly but always thru the PG site.
   Thus Amazon will pass a small percentage of the revenue
   back to PG. This way you can donate to PG without
   spending anything and virtually without any trouble.
   Just delete your old Amazon bookmark and bookmark
   this page instead.


More details at:

   http://www.amazon.com/gp/browse.html?node=3435371


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Thu Oct  7 11:38:37 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct  7 11:38:50 2004
Subject: [gutvol-d] Fundraising thru Amazon ???
Message-ID: <20041007183837.811752F9DE@ws6-3.us4.outblaze.com>

I've got no problem with it.  Just don't make it too obtrusive and I think it's a fine way to try to bring some income into our little corner of the Net.

Josh


From maitriv at yahoo.com  Thu Oct  7 11:55:33 2004
From: maitriv at yahoo.com (maitri venkat-ramani)
Date: Thu Oct  7 11:55:44 2004
Subject: [gutvol-d] Fundraising thru Amazon ???
In-Reply-To: <41658C53.9000108@perathoner.de>
Message-ID: <20041007185533.97941.qmail@web52302.mail.yahoo.com>


The method that you suggest is what Newsscan does at the end of their
Honorary Subscriber segment.  The only problem I have with it is
sending people to Amazon at all, when they should be getting out of
their houses and keeping local bookstores afloat.  Granted, it's easier
and cheaper (and how many of us live in the internet age now), but
still gives me that feeling akin to shopping at WalMart.

Social policy aside, it's a great way to get some of the money Amazon
is giving away.  

Cheers,
Maitri



From gbnewby at pglaf.org  Thu Oct  7 11:59:09 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Oct  7 11:59:11 2004
Subject: [gutvol-d] Fundraising thru Amazon ???
In-Reply-To: <41658C53.9000108@perathoner.de>
References: <41658C53.9000108@perathoner.de>
Message-ID: <20041007185909.GA25006@pglaf.org>

On Thu, Oct 07, 2004 at 08:34:59PM +0200, Marcello Perathoner wrote:
> Warning: potential heresy ahead!
> 
> 
> 
> Seeing that many `independent' PG websites are trying to make money 
> using our books and our catalog data, basically mixing search results 
> from PG and Amazon Ads, eg.
> 
>   http://www.abacci.com/books/
>   http://textual.net/access.gutenberg
> 
> why not do some fundraising thru Amazon ourselves?
> 
> 
> Basically, we set up a page and tell our visitors:
> 
>   If you ever feel the need to buy a book at Amazon
>   don't go there directly but always thru the PG site.
>   Thus Amazon will pass a small percentage of the revenue
>   back to PG. This way you can donate to PG without
>   spending anything and virtually without any trouble.
>   Just delete your old Amazon bookmark and bookmark
>   this page instead.
> 
> 
> More details at:
> 
>   http://www.amazon.com/gp/browse.html?node=3435371

While I'd like to add a few hundred words of disclaimer
about how Amazon has stolen our works and the works
of our authors (including copyrighted contemporary
authors like Sam Vaknin),

and about how they keep trying to "partner" with PG,
but always drop off the edge of the earth after we do
a bunch of work for them, and have never done *anything*
for us, including what they've offered & promised,

and about how putting ink on dead trees is completely passe,

and about how I'm *still* boycotting them over the
1-click patent thing, and so should you,

and about how they give eBooks a bad name by having
completely dysfunctional "within the book" pages on their site,
and DRM'd versions of public domain content,

and more ...

I think it's OK to put them in some far-off corner of
gutenberg.net (*not* on "links & affiliates", please,
which should be for people we *like*) with this pass-through
link.

I'd like to do the same for O'Reilly & BN.com, if they
have similar programs.

I'd also like to make sure we get the $$$ from Amazon (and know
how much it will be), so we can have full disclosure to
our buyers - um, I mean, readers - about what their actions
do for us.  Maybe someone else will want to pay us
$200 per year or so to NOT put a link to Amazon (I'm
guessing that's about the most we'd get from this).
  -- Greg

From j.hagerson at comcast.net  Thu Oct  7 18:16:38 2004
From: j.hagerson at comcast.net (John Hagerson)
Date: Thu Oct  7 18:19:04 2004
Subject: [gutvol-d] Fundraising thru Amazon ???
In-Reply-To: <20041007185909.GA25006@pglaf.org>
Message-ID: <00be01c4acd4$7998ae70$6401a8c0@enterprise>

Yesterday, I saw a financial news story to the effect that Google is
planning a similar (full-text search of books) service.

Here is a link I found in Google News, by searching "Google and books"
http://www.itweek.co.uk/news/1158624
Many more citations are available.


From sly at victoria.tc.ca  Sat Oct  9 01:26:13 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Oct  9 01:26:36 2004
Subject: [gutvol-d] Extra spaces in html files
Message-ID: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>

Dear fellow PG volunteers,

I know that discussing issues of markup in PG files
is a pointless argument that rarely goes anywhere.
Still, I must ask if it is generally acceptable to
most PG volunteers to have HTML files in the collection
with massive amounts of redundant white space in them?

By this point in time, there are megabytes of storage
space in the PG archive which consist of only spaces
because of much indentation in html files.

Take a look at the html source of the recently released
Edward Lear "A Book of Nonsense" to see an example a little
more extreme than most I've seen:

http://www.gutenberg.net/etext/13646

Andrew
From hyphen at hyphenologist.co.uk  Sat Oct  9 01:59:12 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Oct  9 02:00:03 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
Message-ID: <hh9fm0pqc9ftshnl2159mhhmo32ae9e64l@4ax.com>

On Sat, 9 Oct 2004 01:26:13 -0700 (PDT),  Andrew Sly <sly@victoria.tc.ca>
wrote:

| Dear fellow PG volunteers,
| 
| I know that discussing issues of markup in PG files
| is a pointless argument that rarely goes anywhere.
| Still, I must ask if it is generally acceptable to
| most PG volunteers to have HTML files in the collection
| with massive amounts of redundant white space in them?
| 
| By this point in time, there are megabytes of storage
| space in the PG archive which consist of only spaces
| because of much indentation in html files.
| 
| Take a look at the html source of the recently released
| Edward Lear "A Book of Nonsense" to see an example a little
| more extreme than most I've seen:
| 
| http://www.gutenberg.net/etext/13646

Just had a look at it and IMO it appears to be *very* well done.
The indentation is only *two* spaces per level, whereas some would use
*eight* spaces per level.

As anyone who has done hand programming of html or any computer language
knows, the indenting and other white space in the code is *absolutely*
essential for understanding the code, especially after a year or two, when
you have forgotten everything about it.   The white space is even more
essential when modifying other people's code.

-- 
Dave Fawthrop <dave hyphenologist co uk>  Don't eat cousin Banana she
shares 50% of your genes.  Do not kill cousin House Mouse, it is not his
fault he is doubly incontinent.  Flies need your help.   Killing cousin
salmonella with bleach is murder, he is as much alive as you are. ;-)


From traverso at dm.unipi.it  Sat Oct  9 02:25:04 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sat Oct  9 02:25:29 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca> (message
	from Andrew Sly on Sat, 9 Oct 2004 01:26:13 -0700 (PDT))
References: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
Message-ID: <200410090925.i999P4iv004470@posso.dm.unipi.it>

>>>>> "Andrew" == Andrew Sly <sly@victoria.tc.ca> writes:

    Andrew> Dear fellow PG volunteers,

    Andrew> I know that discussing issues of markup in PG files is a
    Andrew> pointless argument that rarely goes anywhere.  Still, I
    Andrew> must ask if it is generally acceptable to most PG
    Andrew> volunteers to have HTML files in the collection with
    Andrew> massive amounts of redundant white space in them?

    Andrew> By this point in time, there are megabytes of storage
    Andrew> space in the PG archive which consist of only spaces
    Andrew> because of much indentation in html files.

    Andrew> Take a look at the html source of the recently released
    Andrew> Edward Lear "A Book of Nonsense" to see an example a
    Andrew> little more extreme than most I've seen:

I have taken the file, unzipped, replaced every multiple whitespace
with single whitespace and rezipped; the saving has been 365 bytes (out
of 640KB).
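
A rough way to repeat this kind of measurement, as a minimal sketch only: the sample text below is a made-up stand-in for an indented HTML file (not the Lear file itself), zlib stands in for the zip step, and the helper names are hypothetical. Unlike the experiment described above, this version keeps line breaks and collapses only runs of spaces and tabs.

```python
import re
import zlib


def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces/tabs with one space, keeping line breaks."""
    return re.sub(r"[ \t]+", " ", text)


def compressed_size(text: str) -> int:
    """Size in bytes after deflate compression (a stand-in for zipping)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# Hypothetical sample: 500 lines, each indented by eight tabs.
sample = "".join("\t" * 8 + f"<p>Limerick number {i}</p>\n" for i in range(500))
collapsed = collapse_whitespace(sample)

raw_saving = len(sample) - len(collapsed)
zip_saving = compressed_size(sample) - compressed_size(collapsed)
print("saved uncompressed:", raw_saving, "bytes")
print("saved compressed:  ", zip_saving, "bytes")
```

Repetitive indentation compresses very well, so the compressed saving should come out far smaller than the uncompressed one, which is the point being made above.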

Andrew's message, as I received it with all the headers etc.,
was 3247 bytes.

Although one might discuss logical indenting in html sources, versus
75 column texts, I don't think that the space is at issue; discussing
bytes, or even megabytes, when the archive is terabytes, is discussing
.00001% savings.

Carlo
From jonathan_ingram at yahoo.com  Sat Oct  9 03:46:10 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Sat Oct  9 03:46:33 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
Message-ID: <20041009104610.11378.qmail@web41722.mail.yahoo.com>


--- Andrew Sly <sly@victoria.tc.ca> wrote:

> Dear fellow PG volunteers,
> 
> I know that discussing issues of markup in PG files
> is a pointless argument that rarely goes anywhere.
> Still, I must ask if it is generally acceptable to
> most PG volunteers to have HTML files in the collection
> with massive amounts of redundant white space in them?
> 
> By this point in time, there are megabytes of storage
> space in the PG archive which consist of only spaces
> because of much indentation in html files.

Just as much space is wasted by the pointless way we insert newlines into text
editions to keep line lengths down to 80 characters. Much more space is wasted
by the odd decision to include in PG poor-quality computerized 'readings' of PG
material.

The easiest way to drastically reduce the amount of wasted space used by PG is
to get rid of the multiple editions, transition to one decently marked up XML
master format, and convert to required output formats on the fly. This has
approximately zero chance of happening any time soon.

--
Jon Ingram


From jeroen at bohol.ph  Sat Oct  9 07:07:16 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Sat Oct  9 07:06:03 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: <20041009104610.11378.qmail@web41722.mail.yahoo.com>
References: <20041009104610.11378.qmail@web41722.mail.yahoo.com>
Message-ID: <4167F094.5020908@bohol.ph>

Jonathan Ingram wrote:

>The easiest way to drastically reduce the amount of wasted space used by PG is
>to get rid of the multiple editions, transition to one decently marked up XML
>master format, and convert to required output formats on the fly. This has
>approximately zero chance of happening any time soon.
>  
>
I am a big supporter of XML, but I challenge you to automatically create 
an acceptable ASCII version from one of my XML files without manual 
intervention... One small warning, they have loads of tables and other 
challenging stuff. I think it can be done, but it is far from trivial.

Jeroen.
From sly at victoria.tc.ca  Sat Oct  9 09:38:15 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Oct  9 09:38:20 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: <hh9fm0pqc9ftshnl2159mhhmo32ae9e64l@4ax.com>
References: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
	<hh9fm0pqc9ftshnl2159mhhmo32ae9e64l@4ax.com>
Message-ID: <Pine.GSO.4.58.0410090931340.7155@vtn1.victoria.tc.ca>


Thank you for everyone's feedback.

A closer look at the file I mentioned shows that it uses tabs,
not spaces for indenting, so it will appear differently depending
on what program you use to view it.
(the main body of the text is all indented by eight tabs, which
for me, made it appear to start in the 64th column)
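
To illustrate why the same file looks different from viewer to viewer, here is a small sketch; the sample line and the helper name are made up for the example. It expands eight leading tabs at a few common tab widths and reports how many columns of indentation result.

```python
def indent_width(line: str, tab_width: int) -> int:
    """Columns of leading blank space once tabs are expanded."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


# Hypothetical line of body text indented by eight tabs.
line = "\t" * 8 + "There was an Old Man with a beard"

for tab_width in (2, 4, 8):
    print(f"tab width {tab_width}: {indent_width(line, tab_width)} columns of indent")
```

At a tab width of 8 the eight tabs produce 64 columns of indentation, which matches what I saw; a viewer set to 2 would show only 16.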

Thanks,
Andrew
From ke at gnu.franken.de  Sat Oct  9 09:50:38 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Sat Oct  9 09:44:47 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <4167F094.5020908@bohol.ph> (Jeroen Hellingman's message of "Sat, 
	09 Oct 2004 16:07:16 +0200")
References: <20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph>
Message-ID: <sh3c0nhm3l.fsf@tux.gnu.franken.de>

Jeroen Hellingman <jeroen@bohol.ph> writes:

> I am a big supporter of XML, but I challenge you to automatically create 
> an acceptable ASCII version from one of my XML files without manual 
> intervention...

Don't waste your time on so-called ASCII versions.  Simple HTML as a
replacement for the traditional ASCII version is "good enough" - then
tools like lynx or w3m or links(?) can do the dirty work.  I do not know
whether there are special HTML devices for the blind; but I know some of
them use lynx to browse (parts of) the web.

> One small warning, they have loads of tables and other challenging
> stuff. I think it can be done, but it is far from trivial.

First, these text browsers can display tables, and if this is not good
enough, you can always press a magic key and view the HTML source.

Of course, if people want to spend their time on ASCII versions, it is
their business.  But the XML version must be the source for all other
formats.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From gbnewby at pglaf.org  Sat Oct  9 19:18:14 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Oct  9 19:18:15 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <sh3c0nhm3l.fsf@tux.gnu.franken.de>
References: <20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph> <sh3c0nhm3l.fsf@tux.gnu.franken.de>
Message-ID: <20041010021814.GB15791@pglaf.org>

On Sat, Oct 09, 2004 at 06:50:38PM +0200, Karl Eichwalder wrote:
> Jeroen Hellingman <jeroen@bohol.ph> writes:
> 
> > I am a big supporter of XML, but I challenge you to automatically create 
> > an acceptable ASCII version from one of my XML files without manual 
> > intervention...
> 
> Don't waste your time on so-called ASCII versions.  Simple HTML as a
> replacement for the traditional ASCII version is "good enough" - then
> tools like lynx or w3m or links(?) can do the dirty work.  I do not know
> whether there are special HTML devices for the blind; but I know some of
> them use lynx to browse (parts of) the web.
> 
> > One small warning, they have loads of tables and other challenging
> > stuff. I think it can be done, but it is far from trivial.
> 
> First, these text browsers can display tables, and if this is not good
> enough, you can always press a magic key and view the HTML source.
> 
> Of course, if people want to spend their time on ASCII versions, it is
> their business.  But the XML version must be the source for all other
> formats.

I'm just writing to point out that Karl's statements are not
consistent with how Project Gutenberg processes and distributes
eBooks.  See more in our FAQ at gutenberg.net

In short:
- we *require* plain text, except in cases where the format,
  language or other aspects make it impossible or highly difficult

As Jeroen mentioned, we're anxious to have an automatic
transformation from XML to HTML and from XML to plain text.
These have proven more difficult than expected, although both
Jeroen & Marcello have solutions that are pretty good.  People
who think they know how to accomplish this task should 
send a URL to documentation & a demonstration.
  -- Greg

PS: People who want PDF-only, XML-only, HTML-only, TeX-only,
etc. are welcome to start their own projects.  PG might even
be willing to license our name to you (more on this in
http://gutenberg.net/about).
From nwolcott2 at kreative.net  Sat Oct  9 23:04:21 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Sat Oct  9 23:21:05 2004
Subject: [gutvol-d] Extra spaces in html files
References: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
Message-ID: <008601c4ae90$c56264a0$1d9895ce@net>

Most of the white space in the html is tab characters, so that cuts things
down quite a bit. Also if a few tabs make the html source more readable (and
editable) then why not? In any event the tabs use up far less space than the
pictures.
nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest

From tb at baechler.net  Sun Oct 10 01:28:13 2004
From: tb at baechler.net (Tony Baechler)
Date: Sun Oct 10 01:27:38 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <sh3c0nhm3l.fsf@tux.gnu.franken.de>
References: <4167F094.5020908@bohol.ph>
	<20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph>
Message-ID: <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com>

At 06:50 PM 10/9/2004 +0200, you wrote:
>Jeroen Hellingman <jeroen@bohol.ph> writes:
>
> > I am a big supporter of XML, but I challenge you to automatically create
> > an acceptable ASCII version from one of my XML files without manual
> > intervention...
>
>Don't waste your time on so-called ASCII versions.  Simple HTML as a
>replacement for the traditional ASCII version is "good enough" - then
>tools like lynx or w3m or links(?) can do the dirty work.  I do not know
>whether there are special HTML devices for the blind; but I know some of
>them use lynx to browse (parts of) the web.


Hello.  Yes, I am blind and I still use Lynx regularly.  However, it does 
not create clean ASCII files.  Every page I convert has two blank spaces at 
the beginning of every line and it inserts junk to mark links and image 
placeholders.  Also, more and more sites no longer work with text browsers 
so using Lynx or Links is becoming a thing of the past.  Please don't even 
get me started on how poorly Internet Explorer does at plain text dumps; 
however, it is currently the most accessible graphical browser.

One thing I really like about the current PG model is that I can quickly go 
to the ftp site, grab a file, unzip it and have readable plain text.  I 
would not want to have to download a master xml file and convert it or have 
the PG site convert it on the fly and try to download it with my 
browser.  Let's not lose sight of the goal of PG, to make as many ebooks 
available to as many people on as many platforms as possible.  I can 
download the same file to my Windows or Linux machines and they are just as 
accessible.  I can load it into a portable notetaker for the blind and it 
is still just as accessible.  I can even put it on my old Apple II and yes, 
it's still accessible.  I hope this doesn't change. 

From ke at gnu.franken.de  Sat Oct  9 22:18:19 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Sun Oct 10 07:31:48 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <20041010021814.GB15791@pglaf.org> (Greg Newby's message of "Sat, 
	9 Oct 2004 19:18:14 -0700")
References: <20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph> <sh3c0nhm3l.fsf@tux.gnu.franken.de>
	<20041010021814.GB15791@pglaf.org>
Message-ID: <shu0t3f8x0.fsf@tux.gnu.franken.de>

Greg Newby <gbnewby@pglaf.org> writes:

> - we *require* plain text, except in cases where the format,
>   language or other aspects make it impossible or highly difficult

In cases where plain text is not impossible or highly difficult, use
lynx's or w3m's -dump option.  Problem solved.  Most of the time this
will look better than hand-crafted .txt files.

> PS: People who want PDF-only, XML-only, HTML-only, TeX-only,
> etc. are welcome to start their own projects.

That's what I do.  But this does not mean I did not try to cooperate.

> PG might even be willing to license our name to you (more on this in
> http://gutenberg.net/about).

The name is not important for my (little) project.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From ke at gnu.franken.de  Sun Oct 10 07:19:24 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Sun Oct 10 07:31:49 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> (Tony
	Baechler's message of "Sun, 10 Oct 2004 01:28:13 -0700")
References: <4167F094.5020908@bohol.ph>
	<20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph>
	<5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com>
Message-ID: <shhdp2fyfn.fsf@tux.gnu.franken.de>

Tony Baechler <tb@baechler.net> writes:

> However, it does not create clean ASCII files.  Every page I convert
> has two blank spaces at the beginning of every line and it inserts
> junk to mark links and image placeholders.

I appreciate your feedback very much!  I guess with a little bit of
post-processing we can improve the output.  Or we should use 'w3m' for
creating txt files.
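
As a minimal sketch of what such post-processing might look like (an assumption on my part, not an existing PG tool): the function below assumes the dump marks links as bracketed numbers like [12], ends with a "References" list, and indents every line by the same amount. The function name and the sample dump are hypothetical.

```python
import re
import textwrap

# Assumed shape of the link markers in the dump: [1], [2], ...
LINK_MARKER = re.compile(r"\[\d+\]")


def clean_dump(dump: str) -> str:
    """Strip link markers, the trailing reference list and common indentation."""
    # Drop everything from a line consisting only of "References" onwards.
    body = re.split(r"(?m)^References\s*$", dump, maxsplit=1)[0]
    # Caution: this also removes genuine bracketed numbers in the text itself.
    body = LINK_MARKER.sub("", body)
    # Remove the indentation shared by all non-blank lines.
    body = textwrap.dedent(body)
    lines = [line.rstrip() for line in body.splitlines()]
    return "\n".join(lines).strip("\n") + "\n"


sample_dump = (
    "   [1]Chapter I\n"
    "\n"
    "   It was a dark night.[2]\n"
    "\n"
    "References\n"
    "\n"
    "   1. http://example.org/ch1\n"
)
print(clean_dump(sample_dump), end="")
```

Image placeholders and books that legitimately contain bracketed numbers (footnote markers, for example) would need more care than this.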

> One thing I really like about the current PG model is that I can quickly go 
> to the ftp site, grab a file, unzip it and have readable plain text.

Yes, I don't want you to produce txt files on your own.  We should
change the way we create txt files.  Doing txt files by hand is too
slow.  Often it is necessary to improve a text (typos, missing parts,
random garbage); if you have to apply the same correction to various
files manually you must spend more time than necessary and such a
procedure is error-prone by itself.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From Bowerbird at aol.com  Sun Oct 10 09:12:12 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Oct 10 09:12:23 2004
Subject: [gutvol-d] Re: Extra spaces in html files
Message-ID: <1dc.2d93808d.2e9ab95c@aol.com>


tony, thank you!

your input as a blind reader is extremely valuable.  i love it.


>   Let's not lose sight of the goal of PG

ha!  and you've got a sense of humor too!         ;+)

-bowerbird
From nwolcott2 at kreative.net  Sun Oct 10 20:00:32 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Sun Oct 10 20:04:50 2004
Subject: [gutvol-d] St. Nicholas Magazines
Message-ID: <005e01c4af3e$84610340$ae9495ce@net>

I have about 10 years of annuals of St. Nicholas Magazine rescued from a
dumpster. Does anyone know the address of the DP high-speed scanning place,
and whether anyone there would process these? They have wonderful
engravings. I would pay the postage, if I knew where to send them and that
someone would process them to DP.


nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest
From hart at pglaf.org  Thu Oct 14 04:59:47 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Oct 14 04:59:48 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <shhdp2fyfn.fsf@tux.gnu.franken.de>
References: <4167F094.5020908@bohol.ph>
	<20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph>
	<5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com>
	<shhdp2fyfn.fsf@tux.gnu.franken.de>
Message-ID: <Pine.LNX.4.60.0410140457120.9205@pglaf.org>


On Sun, 10 Oct 2004, Karl Eichwalder wrote:

> Tony Baechler <tb@baechler.net> writes:
>
>> However, it does not create clean ASCII files.  Every page I convert
>> has two blank spaces at the beginning of every line and it inserts
>> junk to mark links and image placeholders.
>
> I appreciate your feedback very much!  I guess with a little bit of
> post-processing we can improve the output.  Or we should use 'w3m' for
> creating txt files.
>
>> One thing I really like about the current PG model is that I can quickly go
>> to the ftp site, grab a file, unzip it and have readable plain text.
>
> Yes, I don't want you to produce txt files on your own.  We should
> change the way we create txt files.  Doing txt files by hand is too
> slow.  Often it is necessary to improve a text (typos, missing parts,
> random garbage); if you have to apply the same correction to various
> files manually you must spend more time than necessary and such a
> procedure is error-prone by itself.

When I was faced with these problems, I just wrote macros for my
word processor to take out leading and trailing spaces.  If there
were sections of poetry or songs that looked better indented, then
I just changed the spaces in those to @'s and then did a global
search and replace [after first searching for @'s already there].
These steps all combined take less time than I spent writing this.
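
For anyone without such a word processor, the same steps can be sketched in a few lines of script. This is only an illustration under assumptions: the function name is made up, the caller has to say which lines are poetry, and only the leading spaces of those lines are protected.

```python
def strip_but_keep_poetry(lines, poetry_rows, placeholder="@"):
    """Strip leading/trailing spaces everywhere except poetry indentation."""
    # First search for the placeholder already being present in the text.
    if any(placeholder in line for line in lines):
        raise ValueError("placeholder already occurs in the text")

    protected = []
    for row, line in enumerate(lines):
        if row in poetry_rows:
            body = line.rstrip()
            indent = len(body) - len(body.lstrip(" "))
            # Change the leading spaces of poetry lines to placeholders.
            line = placeholder * indent + body.lstrip(" ")
        protected.append(line)

    # Take out leading and trailing spaces, then put the indentation back.
    stripped = [line.strip() for line in protected]
    return [line.replace(placeholder, " ") for line in stripped]


text = ["  Prose line.  ", "    Indented verse", "  More prose."]
print(strip_but_keep_poetry(text, poetry_rows={1}))
```

If the text already contains an @, pick some other placeholder character that does not occur in it.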

Michael Hart

From hart at pglaf.org  Thu Oct 14 05:26:14 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Oct 14 05:26:16 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com>
References: <4167F094.5020908@bohol.ph>
	<20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph>
	<5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com>
Message-ID: <Pine.LNX.4.60.0410140524110.9205@pglaf.org>


One more suggestion:

there are many brands of word processors and other programs
that include file conversion [and the kinds of macros I had
mentioned earlier], so I should think it would be easy enuf
to find one that met your specifications.

My own suggestion would be to start with things such as the
Word Perfect versions; don't just try one version, as they
are quite different from version to version.

Michael

From hart at pglaf.org  Thu Oct 14 05:37:21 2004
From: hart at pglaf.org (Michael Hart)
Date: Thu Oct 14 05:37:23 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: <Pine.GSO.4.58.0410090931340.7155@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0410090124170.12378@vtn1.victoria.tc.ca>
	<hh9fm0pqc9ftshnl2159mhhmo32ae9e64l@4ax.com>
	<Pine.GSO.4.58.0410090931340.7155@vtn1.victoria.tc.ca>
Message-ID: <Pine.LNX.4.60.0410140536520.9205@pglaf.org>


On Sat, 9 Oct 2004, Andrew Sly wrote:

>
> Thank you for everyone's feedback.
>
> A closer look at the file I mentioned shows that it uses tabs,
> not spaces for indenting, so it will appear differently depending
> on what program you use to view it.
> (the main body of the text is all indented by eight tabs, which
> for me, made it appear to start in the 64th column)

That's why TABS are not recommended in any Project Gutenberg file.

Michael

From ke at gnu.franken.de  Thu Oct 14 07:44:08 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Thu Oct 14 09:31:40 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <Pine.LNX.4.60.0410140524110.9205@pglaf.org> (Michael Hart's
	message of "Thu, 14 Oct 2004 05:26:14 -0700 (PDT)")
References: <4167F094.5020908@bohol.ph>
	<20041009104610.11378.qmail@web41722.mail.yahoo.com>
	<4167F094.5020908@bohol.ph>
	<5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com>
	<Pine.LNX.4.60.0410140524110.9205@pglaf.org>
Message-ID: <shvfdduzpj.fsf@tux.gnu.franken.de>

Michael Hart <hart@pglaf.org> writes:

> there are many brands of word processors and other programs
> that include file conversion [and the kinds of macros I had
> mentioned earlier], so I should think it would be easy enuf
> to find one that met your specifications.

Converting any file into HTML is easy, but that's not the point.  If you
are interested in good HTML or PDF you must start with a semantically
tagged file (these days that's mostly an XML file).

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From Bowerbird at aol.com  Thu Oct 14 10:02:44 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 14 10:03:01 2004
Subject: [gutvol-d] Re: Extra spaces in html files
Message-ID: <42.5a526037.2ea00b34@aol.com>

karl said:
>   If you are interested in good HTML or PDF 
>   you must start with a semantically tagged file
>   (these days that's mostly an XML file).

can you give and defend your definition of "good" in this case?

ditto with "semantically tagged file"?

and, if you are up to the challenge, what is your recommendation
as to the route that should be taken to get a library of 14,000+
e-texts converted to the brand of x.m.l. markup you think is best?

(bonus points if you can convince all the other x.m.l. advocates that
the markup version you prefer is better than the ones they prefer.)

finally, greg recently requested that people come forward with
working routines to implement an x.m.l.-master methodology.
are you able to answer that call?  did you?  if so, do let us know.

-bowerbird
From marcello at perathoner.de  Thu Oct 14 10:10:27 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 14 10:10:36 2004
Subject: [gutvol-d] Don't feed the troll ! [Was: Extra spaces in html files]
In-Reply-To: <42.5a526037.2ea00b34@aol.com>
References: <42.5a526037.2ea00b34@aol.com>
Message-ID: <416EB303.8040800@perathoner.de>

Bowerbird@aol.com wrote:

>>  If you are interested in good HTML or PDF 
>>  you must start with a semantically tagged file
>>  (these days that's mostly an XML file).
> 
> 
> can you give and defend your definition of "good" in this case?
> 
> ditto with "semantically tagged file"?
> 
> and, if you are up to the challenge, what is your recommendation
> as to the route that should be taken to get a library of 14,000+
> e-texts converted to the brand of x.m.l. markup you think is best?
> 
> (bonus points if you can convince all the other x.m.l. advocates that
> the markup version you prefer is better than the ones they prefer.)
> 
> finally, greg recently requested that people come forward with
> working routines to implement an x.m.l.-master methodology.
> are you able to answer that call?  did you?  if so, do let us know.
> 
> -bowerbird


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Oct 14 10:35:36 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 14 10:35:54 2004
Subject: [gutvol-d] Don't feed the troll ! [Was: Extra spaces in html
	files]
Message-ID: <128.4dabf8bd.2ea012e8@aol.com>

i have a question for the person tending this listserve:
is this kind of name-calling condoned on this listserve?

-bowerbird
From jeroen at bohol.ph  Thu Oct 14 15:29:08 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Thu Oct 14 15:29:25 2004
Subject: [gutvol-d] Don't feed the troll ! [Was: Extra spaces in html
	files]
In-Reply-To: <416EB303.8040800@perathoner.de>
References: <42.5a526037.2ea00b34@aol.com> <416EB303.8040800@perathoner.de>
Message-ID: <416EFDB4.9090708@bohol.ph>


Well, let's keep the name calling off-line, and the discussion pure...,
and realise that XML is not a format, but a way of specifying formats
(and probably all that these formats have in common is that they use
angled brackets in some way). Semantically tagged is an ideal that even
the most ambitious attempts at a generic DTD for pre-existing texts
(and that is what we are mostly dealing with in PG) have not reached.
It is either unreachable (since we can't know the original intent
behind much of the formatting we encounter) or impractical (since the
effort to do all this tagging is just too big, and isn't really needed
by 99% of the users).

In my opinion, the best attempt at such a generic beast has been the
TEI effort, which is described in a massive 1400 page document and
still requires customization for numerous academic projects (both are
bad news; both are unavoidable given the complexity of the task) -- but
which can cover 95 percent of all text with just 5 percent of that bulk
in an incarnation called TEI-Lite, and that is basically all I suggest
PG adopt as a standard. The nice thing about this monster is that we
can add those 5 percent, and if somebody decides to add more, nothing
will stop him, and he can easily return the improved version to the
collection.

Doing fully automatic conversion to good paged PDFs for printing nice 
copies (and I mean good, as different from workable) will probably 
always remain a dream, as good layout, just like good typographic 
design, is a skill learned through doing it a lot. Even in a highly 
programmable environment such as TeX, I've never been able to print 
something from "semantic" markup without manual interventions once in a 
while -- even for something as arcane as a two column dictionary.
Similarly, doing a good HTML (as different from a reasonable HTML) will 
probably also require manual intervention and tweaking once in a 
while... but neither of these things negates the large benefits we 
could have from having TEI tagged master copies in our collection, even 
if just at a relatively simple level of tagging (just marking headings, 
divisions, italics, footnotes, and tables).
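
To give an idea of what the simple level of tagging buys, here is a minimal sketch of going from lightly tagged XML to plain text. It is an illustration under assumptions, not one of the existing converters: it handles only head, p and hi elements, writes italics as _underscores_, upper-cases headings, and the function names and sample fragment are made up. Footnotes and tables, the hard parts, are left out entirely.

```python
import textwrap
import xml.etree.ElementTree as ET


def inline_text(elem) -> str:
    """Flatten an element to text, marking <hi> spans as _italic_."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        if child.tag == "hi":
            inner = f"_{inner}_"
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


def to_plain_text(xml_source: str, width: int = 72) -> str:
    """Render headings and paragraphs as wrapped plain text."""
    root = ET.fromstring(xml_source)
    blocks = []
    for elem in root.iter():
        text = " ".join(inline_text(elem).split())
        if elem.tag == "head":
            blocks.append(text.upper())
        elif elem.tag == "p":
            blocks.append(textwrap.fill(text, width))
    return "\n\n".join(blocks) + "\n"


# Hypothetical fragment in the spirit of TEI-Lite.
sample = (
    "<div><head>Chapter I</head>"
    '<p>It was a <hi rend="italic">very</hi> dark night.</p></div>'
)
print(to_plain_text(sample), end="")
```

The same tagged source could feed an HTML writer by swapping the two output rules, which is the whole attraction of a master format.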

The task of producing nice HTML / Printable versions of XML documents is 
further complicated by the highly verbose and somewhat unintuitive model 
of XSLT, which is presented as the most important tool for this task -- 
from the computer scientist purist point of view that might be true, but 
for many lesser gods, who think five lines of BASIC is already a lot, its 
functional programming model and verbosity are a real piss-off.

Getting 14000+ texts to XML can be done, just as they were produced 
initially, by starting somewhere with the first one, and not stopping 
until we've completed them all.

A very simple alternative way would be to load them in OpenOffice, apply 
the formatting you like and save it. (OpenOffice uses XML files for 
everything, and collects them in zip archives. If you don't believe 
that, change the extension of an OpenOffice document to .zip, and have a 
look inside.) Of course, that formatting would be very much 
non-"semantic".
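
The "look inside" step can also be done without renaming anything, as in the sketch below. To keep it self-contained it builds a tiny in-memory zip as a stand-in for a saved document; the member names and contents are illustrative only, and the helper name is made up.

```python
import io
import zipfile


def list_members(document_bytes: bytes) -> list[str]:
    """Return the names of the files packed inside a zip-based document."""
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        return archive.namelist()


# Stand-in for a saved document: a zip holding a couple of XML files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<doc><p>Hello</p></doc>")
    archive.writestr("styles.xml", "<styles/>")

print(list_members(buffer.getvalue()))
```

With a real document you would read the file's bytes from disk and pass them to the same function.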

Jeroen.

(Still formatting his ebooks in SGML based TEI)



From jon at noring.name  Thu Oct 14 16:45:07 2004
From: jon at noring.name (Jon Noring)
Date: Thu Oct 14 16:46:12 2004
Subject: YesLogic's Prince and OpenReader (was Re: [gutvol-d] Don't feed the
	troll ! [Was: Extra spaces in html files])
In-Reply-To: <416EFDB4.9090708@bohol.ph>
References: <42.5a526037.2ea00b34@aol.com> <416EB303.8040800@perathoner.de>
	<416EFDB4.9090708@bohol.ph>
Message-ID: <8536547921.20041014174507@noring.name>

Jeroen wrote:

> The task of producing nice HTML / Printable versions of XML documents is 
> further complicated by the highly verbose and somewhat unintuitive model 
> of XSLT, which is presented as the most important tool for this task -- 
> from the computer scientist purist point of view that might be true, but 
> for many lesser gods, who think five lines of BASIC is already a lot, its 
> functional programming model and verbosity are a real piss-off.

There is actually a fairly powerful "non-professional" alternative to
the XSLT/XSL-FO approach to converting XML into PDF (or similar
page-oriented layout): YesLogic's Prince product (soon to be at version
4.0 with optimized PDF output and embedded fonts -- wait until 4.0 is
released in the next few days.)

Prince uses the XML+CSS approach, and of course invokes the advanced CSS2
and some of the proposed CSS3 constructs. The founder of YesLogic, Michael
Day, serves on the CSS Working Group of W3C, so he is quite aware of
the power and limitations of CSS. Of course, there are a few knotty things
that the current CSS2 cannot do, but YesLogic has added a few "custom" CSS
constructs to fill in the voids, just as both Mozilla and Opera have
(little known, btw). (I also want to add for those few here interested
that the CSS parser in Prince is probably the best out there.)

Now, I do agree that the absolute best outputs for print from XML
sources via the XSLT/XSL-FO and Prince approaches require human
intervention ("tweaking"), but the nice thing with a tool like Prince is
that it gets one most of the way there, uses the slightly easier-to-use
CSS, and allows for manual tweaking until the PDF is just right.

Prince supports SVG and plans to add MathML support as well. They are
a major supporter of the OpenReader System which I'm leading the
development of: http://www.openreader.org .

As an aside, for OpenReader I'm now building a supporter's/endorser's
page, and any company, organization or individual willing to add their
logo or name to the page, contact me in private email -- I'll send you
the link to the current draft supporters page if you're interested in
supporting/endorsing OpenReader. Maybe PG Foundation is interested?
Greg? Michael?

Btw, OpenReader plans to eventually natively support TEI-Lite (or
maybe a well-defined subset of TEI or TEI-Lite) without need for
conversion, including supporting constructs not supported in HTML web
browsers such as inline notes and the like. Refer to the OpenReader
web site for the details. Heck, we may even support ZML if it becomes
popular as Bowerbird believes it will -- it'd be trivial to support
ZML, actually (we'd internally convert it to XML and then present it
using standardized CSS style sheets.)

Jon Noring

From j.hagerson at comcast.net  Thu Oct 14 19:19:08 2004
From: j.hagerson at comcast.net (John Hagerson)
Date: Thu Oct 14 19:17:31 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <002401c4b25d$5e1c4470$6401a8c0@enterprise>

Please picture this scenario:

I'm a volunteer who has scanned a public-domain book and wants to make it
available through the PG distribution mechanism (free of charge, available
until the Internet collapses under the weight of spam and next-generation
pornography, yadda, yadda, yadda).

Today, if I can convert this book to plain text (according to some stated
formatting conventions), I may submit the book. If I'm ambitious, I can
create an HTML version, which presents the same information, but allows
"real" formatting rather than _italic_ and *bold*. 

In the background, however, there is this Whole New World(tm) of semantic
tagging, which presumably will allow the book to make snacks and provide
entertainment during the reading process. But, for me, as a volunteer, who
spends a considerable amount of time working on books, but enjoys actually
finishing one and seeing it posted, I can't get my arms around the benefits.

Except for recognizing the acronyms, I am agnostic to XML/ZML/TEI/ABC/EIEIO.

Could someone please explain the benefit of semantic tagging and why it
won't horribly lengthen the amount of time required to produce an eBook?

Thank you.


From stephen.thomas at adelaide.edu.au  Thu Oct 14 20:11:18 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Thu Oct 14 20:11:39 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
References: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
Message-ID: <416F3FD6.5080705@adelaide.edu.au>

John Hagerson wrote:
> ...
> 
> Could someone please explain the benefit of semantic tagging and why it
> won't horribly lengthen the amount of time required to produce an eBook?

Well, I'll try:

First, let me say that for many works, for the purpose of
*reading* the work, it doesn't matter. (I'll probably be flamed
for that, but never mind.) Your simple, basic novel, in which
there are a great many paragraphs of text, divided into chapters
with obvious headings like "CHAPTER II", doesn't really need much
more than the very basic, simple HTML P tag.

However, not all works are so simple. Yesterday I had cause to 
look at Immanuel Kant's /The Science of Right/, in which the 
author chose to use a great many divisions, subdivisions, 
sections, etc. -- all with their own headers. Since I converted 
this from plain text to HTML, I needed to determine from the 
plain text which were headings, subheadings, sub-sub-headings, 
etc. And unfortunately, this has required some guess-work by me. 
So, one benefit of more detailed tagging would be that for such 
a work, it would be made obvious and explicit which were 
headings, and which sub-headings. In other words, the structure 
intended by Kant is recorded in the tagging.
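
To make that concrete, here is a minimal Python sketch, not anything PG
actually runs. The div/head element names are only loosely modeled on TEI,
and the sample headings are invented for illustration rather than taken
from Kant's text. The point is that nesting records each heading's level,
so a program can recover the outline without guessing.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged fragment: the nesting says which heading belongs where.
TAGGED = """
<body>
  <div><head>FIRST PART</head>
    <div><head>CHAPTER I</head>
      <div><head>SECTION 1</head></div>
      <div><head>SECTION 2</head></div>
    </div>
  </div>
</body>
"""


def outline(element, depth=0):
    """Return (depth, heading) pairs for every nested <div>, in document order."""
    rows = []
    for div in element.findall("div"):
        head = div.find("head")
        if head is not None:
            rows.append((depth, head.text.strip()))
        rows.extend(outline(div, depth + 1))
    return rows


def print_outline(xml_text):
    """Print the headings indented by their depth in the document."""
    for depth, title in outline(ET.fromstring(xml_text)):
        print("  " * depth + title)


print_outline(TAGGED)
```

In a plain text file all four headings would typically be upper-case lines,
and the depth would have to be guessed by a human.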

Another example: look at any play. You have speech, names of 
speakers, stage directions, headings, and divisions into Act and 
Scene. All of these are made explicit by the tagging. Without 
tagging, there may well be confusion at some point as to what is 
speech and what is stage direction, for example.

In a plain text file, we do make some effort to distinguish 
different elements of a work: quotations are indented, headings 
in UPPER CASE and centered, etc. But any kind of complexity in 
the work tends quickly to make that unworkable.


Regards,
Steve

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From ke at gnu.franken.de  Thu Oct 14 20:13:45 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Thu Oct 14 20:31:06 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <42.5a526037.2ea00b34@aol.com> (Bowerbird@aol.com's message of
	"Thu, 14 Oct 2004 13:02:44 EDT")
References: <42.5a526037.2ea00b34@aol.com>
Message-ID: <shlle8vfkm.fsf@tux.gnu.franken.de>

Bowerbird@aol.com writes:

> can you give and defend your definition of "good" in this case?

"Save as HTML" normally is not good enough.

> ditto with "semantically tagged file"?

Why do you ask?

> and, if you are up to the challenge, what is your recommendation
> as to the route that should be taken to get a library of 14,000+
> e-texts converted to the brand of x.m.l. markup you think is best?

We can keep the old file unchanged for the time being.  XML produced by
http://www.pgdp.net/ is good enough to work with.

> finally, greg recently requested that people come forward with
> working routines to implement an x.m.l.-master methodology.
> are you able to answer that call?  did you?  if so, do let us know.

For converting TEI XML to HTML and PDF you can use Sebastian Rahtz' XSL
stylesheets:

    http://www.tei-c.org/Stylesheets/teixsl.html

I'm old fashioned and like playing with DSSSL tools (that's all in
German and not that polished nor finished -- take it as a proof of
concept):

    http://www.gnu.franken.de/Tieck/
    http://www.gnu.franken.de/Tieck/Dokumente/Koepke/

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From joshua at hutchinson.net  Fri Oct 15 04:08:16 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 04:08:20 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
References: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
Message-ID: <416FAFA0.1030304@hutchinson.net>

Steve makes a good answer in another post, but I wanted to add my 
personal holy grail that hopefully a TEI-Lite master format will help 
bring about...

A single master document.

Right now, I create an ASCII version and then an HTML version.  If I make
the ASCII version first, it almost never fails that I find at least one
more mistake when I then do the HTML version.  I fix it there, but I
have to remember it and go back to the ASCII version and make the fix
there.  And god forbid the fix requires another rewrap.

A master document format that is auto-converted to the others (at an 
acceptable level) would be wonderful and, imo, worth a little extra up 
front effort to prepare it.
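
A rough sketch of the idea in Python, under heavy assumptions: the "master"
here is just a hypothetical list of (kind, text) pairs standing in for a
real TEI-Lite file, and the two renderers are far simpler than anything
acceptable for posting. It only illustrates that a fix made once in the
master reaches both outputs, and that rewrapping becomes automatic.

```python
import html
import textwrap

# Hypothetical in-memory "master": (kind, text) pairs. A real master would be
# a TEI or other XML file; this structure is only an assumption for the sketch.
MASTER = [
    ("heading", "Chapter I"),
    ("para", "It was a dark and stormy night; the rain fell in torrents."),
]


def to_text(master, width=70):
    """Render the master as wrapped plain text with upper-case headings."""
    blocks = []
    for kind, text in master:
        if kind == "heading":
            blocks.append(text.upper())
        else:
            blocks.append(textwrap.fill(text, width=width))
    return "\n\n".join(blocks) + "\n"


def to_html(master):
    """Render the same master as minimal HTML, escaping special characters."""
    parts = []
    for kind, text in master:
        tag = "h2" if kind == "heading" else "p"
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(parts) + "\n"


print(to_text(MASTER))
print(to_html(MASTER))
```

Correcting a typo in MASTER and re-running both functions replaces the
"fix it in HTML, then remember to fix it in ASCII" step.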

If someone could get a working bit of code in place, I'd be happy to 
start testing it like crazy and work on old texts to get it converted to 
that format.

Josh


From nihil_obstat at mindspring.com  Fri Oct 15 06:13:56 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Fri Oct 15 06:14:00 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <13026847.1097846036718.JavaMail.root@wamui07.slb.atl.earthlink.net>


I started e-books in the old days when PG was only plain text.  Then after quite a long lapse I returned to discover that I could release a book in HTML if I wished, supplying a standard TXT along with it.

I am happy with this arrangement, sometimes doing both HTML and TXT, and sometimes just TXT depending on how highly formatted the original was.  I tend to work the opposite way, though, doing the HTML first (using a text editor incidentally), then stripping the code for the TXT.  It is probably not the most efficient way, but hobbies are not supposed to be efficient.
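
For anyone curious, the "stripping the code" step can be sketched in Python with the standard library's html.parser. This is only an illustration under assumptions, not the procedure described above: the list of block-level tags is a guess at what a simple book needs, and inline formatting such as italics is simply dropped rather than turned into _italic_.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, starting a new block at each block-level tag."""

    # Assumed set of tags that should start a new paragraph in the TXT.
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "br", "li"}

    def __init__(self):
        super().__init__()
        self.blocks = [""]

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.blocks.append("")

    def handle_data(self, data):
        self.blocks[-1] += data


def strip_tags(html_text):
    """Return the text of an HTML fragment as blank-line-separated blocks."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace inside each block and drop empty blocks.
    cleaned = (" ".join(block.split()) for block in parser.blocks)
    return "\n\n".join(block for block in cleaned if block)


print(strip_tags("<h1>Title</h1><p>Some <i>italic</i> text &amp; more.</p>"))
```

The result would still need wrapping and a manual check before it could serve as the TXT version.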

I am ignorant too about the acronyms you mentioned.  I am also very pragmatic, and hope to remain totally ignorant of these until someone proves to me--with a history of examples--that it is worth it.  TXT and HTML have such histories, so I shall stick with these for now.

Regarding HTML, some thoughts. . .

- Use the full range of tags when appropriate (but if possible stick with the older 3.2 tags unless necessary.  I always try the simplest tool first that will do the job).  There was a reply about the limitations in TXT with heading hierarchies.  HTML has several levels of header tags that are meant to be used for this purpose.  Other tags can be used creatively to achieve other ends.  A list of the 3.2 tags is at http://www.htmlhelp.com/reference/wilbur/list.html (don't forget to validate, though).

- The huge benefit of HTML (besides the text formatting that you mentioned) is the ability to insert images.  Some books I would never have considered working on if I could not have done an HTML version.

- Don't forget to set the background color if you want a specific color (in the BODY tag, or style sheet).  I have seen hundreds of pages where the writer assumes that white is always the default background color for everyone (not true) intending the graphics to blend into the background.




---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From scott_bulkmail at productarchitect.com  Fri Oct 15 06:39:29 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Fri Oct 15 06:41:36 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
References: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
Message-ID: <p06110427bd9571da32d4@[192.168.0.52]>

I'll take your questions in reverse order.

>why [semantic tagging]
>won't horribly lengthen the amount of time required to produce an eBook?

I think a two-part answer is important here.

1. The great news is that basic semantic tagging is roughly the same effort as HTML.  And, if PG had acceptable MASTER-to-text conversion, the overall effort would be REDUCED compared to creating BOTH text and HTML by hand.

Today, creating an eText involves throwing information away, e.g. converting what is clearly multiple levels of heading into ALL CAPS -- which loses any distinction between the levels.  The key to creating a MASTER is to preserve this information.

Sometimes this will require a tiny bit more time (to use the correct tag or add the appropriate attribute) but often it will take less time than manually converting to ALL CAPS or whatever.

And, as I've argued elsewhere, there's no need to wait for widespread agreement on any particular set of XML tags.  If used consistently, it's much, much easier to convert from one XML representation to another than to convert from text to HTML.  In fact, it's also fine to skip XML and just use consistent HTML with appropriate div/span tags and/or attributes on regular HTML tags.  What's important is to stop throwing useful information away and instead to capture it in a way that can be processed automatically.
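
As a hedged illustration of how mechanical that conversion can be, here is a short Python sketch using the standard xml.etree.ElementTree module. The source tag names and the mapping are hypothetical; the sketch assumes the tags were used consistently and that a one-to-one rename is enough, which real vocabularies will not always allow.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one home-grown tag set to HTML-like names.
TAG_MAP = {
    "book": "div",
    "title": "h1",
    "chapter": "div",
    "heading": "h2",
    "para": "p",
}


def convert(xml_text, tag_map=TAG_MAP):
    """Rename every element per tag_map, leaving text and attributes alone."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


SOURCE = (
    "<book><title>Example</title>"
    "<chapter><heading>One</heading><para>Text.</para></chapter></book>"
)
print(convert(SOURCE))
```

Going the other way, from ALL CAPS plain text back to distinct heading levels, has no comparable table-driven solution, because the distinction is no longer in the file.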

Takeaway point: reliable MASTER-to-text conversion would increase the number of eTexts produced per unit of volunteer time investment.  (And, as DP folks have argued, additional automation would streamline other stages too.)


2. There's a second level of semantic tagging that *does* require more effort: adding information that's useful but isn't represented in print.  For example, perhaps we want to label every quotation with the name of the speaker.  That's easy in a play, since the name is printed.  That's quite a lot of work in prose since the name may or may not occur adjacent to the quote, and even when it does, could be before or after, and may be represented several ways (e.g. "Arthur", "The King", "His Majesty").

I'm actually a fan of rich semantic markup, but, to be honest, the benefits of this second level are much smaller and the effort much greater.  In the foreseeable future, this is likely only to be done when the volunteer has a specific end use in mind.


>Could someone please explain the benefit of semantic tagging

Others have addressed this, but I want to summarize and add a few points.

1. A single MASTER copy from which all other versions can be generated automatically.  Plain text and HTML of course, but also PDF and the various eBook formats.  Just as important: more than one rendition of any particular format can be created, e.g. a set of HTML files split by chapter or even page, or PDF formatted for a particular screen size, paper size, or printing layout (e.g. as a booklet).

2. Capture information that's beyond what is generally printed, but is useful to certain audiences and/or in certain contexts.  e.g. (from an earlier thread) the MASTER can capture a mistake AND the correction; or other variations.  See Re[2]: [gutvol-d] Indexing Editors, etc. from Oct. 4, 2004 for details.

3. Automated processes that "add value" in some way, e.g. using a different computer voice for different characters, or creating an index by character.
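
Point 2 above (capturing a mistake AND its correction) can be sketched in a few lines of Python. The inline notation is an assumption made for this example, loosely modeled on TEI, and a regular expression stands in for a proper XML parser only to keep the sketch short.

```python
import re

# Hypothetical inline notation: <sic corr="right">wrong</sic>. The element and
# attribute names are illustrative, not a statement of what PG would adopt.
SIC = re.compile(r'<sic corr="([^"]*)">([^<]*)</sic>')


def render(master, corrected=True):
    """Return the text with either the corrections or the original readings."""
    group = 1 if corrected else 2
    return SIC.sub(lambda match: match.group(group), master)


LINE = 'He <sic corr="received">recieved</sic> the letter.'
print(render(LINE))
print(render(LINE, corrected=False))
```

One MASTER line thus yields both a reading text and a faithful transcription, depending on the audience.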
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From nwolcott2 at kreative.net  Fri Oct 15 07:53:39 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Fri Oct 15 07:57:43 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
References: <13026847.1097846036718.JavaMail.root@wamui07.slb.atl.earthlink.net>
Message-ID: <004801c4b2c6$ca0042e0$c99495ce@net>

But PG has adopted standards which limit the range of tags and CSS you can
use, so you may not be able to specify changes in background color or font,
as a book such as Alice in Wonderland might call for. Some contributors put
their HTML elsewhere, perhaps for this reason. Bad news.
nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest

From nihil_obstat at mindspring.com  Fri Oct 15 08:26:31 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Fri Oct 15 08:26:35 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <28709391.1097853991576.JavaMail.root@wamui05.slb.atl.earthlink.net>

I once had to ask the whitewasher to put the background color code back in (it is only a few characters long) and he did it.  He had put in a new automated header code that overwrote my BODY tag.

I see the wisdom in avoiding the FONT tag as much as possible; the font face in particular (e.g. Arial, Comic Sans) seems bound to get lost or go unrecognized in future browser/OS versions.  If the font face is important enough (0.01% of the time?) one could do a PDF or page scans.

Even when I am being a stickler about preserving the text format as much as possible, I only aim for it to be 99% useful for 99% of readers, rather than trying to make a perfect reprint.  Of course there is no perfect reproduction. . .

I once heard a researcher talk about a man he found smelling manuscripts in a library.  A conversation started in which the man explained he was trying to trace diseases in European towns.  A vinegar spray was apparently used at one time as an attempted disinfectant when papers were transferred between infected and uninfected areas.  You shall never get a reproduced smell from microfilm, a page scan, or an e-book.



---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From joshua at hutchinson.net  Fri Oct 15 08:33:25 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 08:33:30 2004
Subject: [gutvol-d] PG TEI pages
Message-ID: <20041015153325.EB1EFEDEC5@ws6-1.us4.outblaze.com>

The recent discussion has me wanting to go back and refresh myself on what TEI options are currently available.  However, the link to the online TEI converter on the PG home page seems to be dead.  Is this something that was removed, or has the link just grown old and retired when no one was looking?

http://www.gutenberg.org/tei/

Josh
From Bowerbird at aol.com  Fri Oct 15 10:17:16 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 10:17:31 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <6d.357ee9e9.2ea1601c@aol.com>

dennis said:
>   - The huge benefit of HTML (besides the text formatting 
>   that you mentioned) is the ability to insert images.  
>   Some books I would never have considered working on 
>   if could not have done an HTML.

the ability of a web-browser to combine text and images
is indeed truly wonderful.  doing that -- cross-platform,
24/7, world-wide -- was one big reason the web took off.

but, just as you need to use a certain kind of viewer-app
(i.e., that web-browser) to attain this inclusion of images,
an intelligent viewer for e-texts can _also_ achieve this.
regular text-viewers won't do it.  but specialized ones will.

indeed, my viewer-program shows images when it is used
to display a text-file, _provided_ that text-file includes
information that tells _which_ image to display _where_.
(just like a web-browser needs the img tag with that info.)

amazingly, however, this obviously-relevant-and-important
information is often simply _not_included_ in the text-file.

indeed, the information is sometimes _stripped_from_ files!
(the in-process working files from distributed proofreaders
routinely contain a note regarding the presence of an image,
a line that contains the caption for the image if there is one.)

what is needed, so that an intelligent viewer-program can
know where to place an image, and what file contains it,
is some kind of indicator in the file giving that information.
the indicator could be as crude as a filename, or it could be
more subtle.  (i'll detail this, if you would like me to do so.)

all of this is just to resubmit a plea that i have made before
(and will _continue_ making until i get a positive response!)
for information about the name and location of graphic-files
to be included in the _plain-text_ versions of the e-texts...

-bowerbird
From joshua at hutchinson.net  Fri Oct 15 10:39:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 10:39:27 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>


> all of this is just to resubmit a plea that i have made before
> (and will _continue_ making until i get a positive response!)
> for information about the name and location of graphic-files
> to be included in the _plain-text_ versions of the e-texts...

And now it is no longer a plain-text file.  With the added penalty of having no existing validators to make sure that the markup used is done correctly.

Basically, you're reinventing the wheel for no purpose here.

Josh
From joel at oneporpoise.com  Fri Oct 15 11:05:22 2004
From: joel at oneporpoise.com (Joel A. Erickson)
Date: Fri Oct 15 11:05:16 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
References: <28709391.1097853991576.JavaMail.root@wamui05.slb.atl.earthlink.net>
Message-ID: <001d01c4b2e1$8a48a960$6501a8c0@JOEL>

From: "Dennis McCarthy":
> I once heard a researcher talk about a man he found smelling manuscripts 
> in a library.  A conversation started where the man explained he was 
> trying to trace diseases in European towns.  A vinegar spray was 
> apparently used at one time as an attempted disinfectant when papers where 
> transfered between infected and uninfected areas.  You shall never get a 
> reproduced smell from microfilm, a page scan, on an e-book.


Is that "on an e-book" intended to be "or an e-book"? If so, I'm not so sure 
about never being able to reproduce the smell. Not that I'm particularly 
keen on smelling books, but I've heard of working prototypes of scent 
devices activated digitally. 

From Bowerbird at aol.com  Fri Oct 15 11:20:11 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 11:20:38 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <13c.3d465e9.2ea16edb@aol.com>

john said:
>   for me, as a volunteer, who spends 
>   a considerable amount of time working on books, 
>   but enjoys actually finishing one and seeing it posted, 
>   I can't get my arms around the benefits.
...
>   Could someone please explain the benefit of semantic tagging 
>   and why it won't horribly lengthen the amount of time required 
>   to produce an eBook?

first of all, thank you for asking your questions.
i look forward to hearing some answers to them.

and thank you for your history of doing e-texts
for project gutenberg.  it's important to retain
the volunteers who have been working all along...

i wanted to make a point about one thing you said...


>   If I'm ambitious, I can create an HTML version, 
>   which presents the same information, but allows
>   "real" formatting rather than _italic_ and *bold*. 

actually, if you take a look at that "real" formatting
in the html-source, you'll see it's plain-ascii, namely:
     [i]italic[/i] and [b]bold[/b]
or -- if you prefer --
     [em]emphasis[/em] and [strong]strong[/strong]
except, of course, using angle-brackets 
instead of the square ones that i used so
the brackets wouldn't get swallowed up or interpreted.

but yes, of course, i know what you _meant_,
which is that when the e-text is _displayed_,
the _viewer-program_ converts that "markup"
appropriately, into "real" italics and real bold,
even though there were no italics or bold in the source,
just the _tags_ that indicated that styling was present.

that is, you need to use the appropriate "user agent"
(to use the markup-geek terminology now in favor)
that knows how to interpret the markup and render it.

however, it's not that difficult to write a viewer-app
that can take the plain-text file as input and render
any words surrounded with _underscores_ as italics,
and any words surrounded with *asterisks* as bold.
it's just a different "user agent" interpreting the
different markup, and rendering it as called for...

i say that based on experience.  i've written such an app.

and indeed, it's not that difficult to write a converter
that will change the underscore _form_of_italics_
into the other [i]form of italics[/i] that uses brackets.
it's rather easy to see they are functionally equivalent.
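
A minimal sketch of the kind of converter described above, written in Python purely for illustration. It is not bowerbird's code: the function name to_html and the two regular expressions are assumptions, as is the rule that underscores inside a span such as _form_of_italics_ stand in for spaces.

```python
import re
from html import escape

# Assumed conventions (not confirmed by the thread beyond the examples given):
#   _word_ or _several_words_  -> italics, inner underscores read as spaces
#   *word*                     -> bold
ITALIC = re.compile(r"(?<![A-Za-z0-9_])_([A-Za-z0-9][^\n]*?)_(?![A-Za-z0-9_])")
BOLD = re.compile(r"(?<![A-Za-z0-9*])\*([A-Za-z0-9][^\n*]*?)\*(?![A-Za-z0-9*])")


def to_html(line: str) -> str:
    """Convert one line of underscore/asterisk markup to HTML tags."""
    # Escape &, < and > first so literal text is not mistaken for markup.
    line = escape(line, quote=False)
    # Italic spans: turn the inner underscores back into spaces.
    line = ITALIC.sub(
        lambda m: "<i>" + m.group(1).replace("_", " ") + "</i>", line
    )
    # Bold spans need no further rewriting.
    line = BOLD.sub(r"<b>\1</b>", line)
    return line


if __name__ == "__main__":
    print(to_html("_italic_ and *bold*"))
    print(to_html("the underscore _form_of_italics_ here"))
```

The lookbehind and lookahead keep identifiers such as file_name from being treated as markup. Real e-texts would likely need more rules than this.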

the difference between the two forms in the _raw_ file is
the underscore form _enhances_ the user's comprehension,
while the bracket form [em]obscures[/em] it, and badly...

rather than creating 14,000+ new files, with all the work
that entails, we can achieve the same end by distributing
_one_ viewer-program that utilizes the existing e-texts...

-bowerbird
From Bowerbird at aol.com  Fri Oct 15 11:27:56 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 11:28:10 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <1e.35f0956e.2ea170ac@aol.com>

joshua said:
>   I wanted to add my personal holy grail that 
>   hopefully a TEI-Lite master format will help bring about...
>   A single master document.

i've detailed this on another reply i am writing,
which is still in progress, but i'll headline it here:
a methodology of a "master document" is worthwhile,
but there's no reason that master _must_ be in t.e.i.
any form of markup that captures all the information
that's deemed necessary can serve capably as a master.


>   A master document format that is auto-converted to the others 
>   (at an acceptable level) would be wonderful and, imo, 
>   worth a little extra up front effort to prepare it.

well, yes, it would be wonderful.  and worth the effort.
and if it didn't take much extra effort, but instead was
intuitive even to untrained volunteers, that would be
_really_ special...

   
>   If someone could get a working bit of code in place, 
>   I'd be happy to start testing it like crazy and 
>   work on old texts to get it converted to that format.

i've already got a "bit of code in place" using z.m.l. -- 
a.k.a. zen markup language, a.k.a. zero markup language
-- but i doubt you're interested in testing it _at_all_,
let alone "like crazy", which is quite alright with me,
thank you very much, as i don't need _your_ help...

nonetheless, the beta-test is open to all, by e-mailing:
     zml_talk-subscribe@yahoogroups.com

-bowerbird
From joshua at hutchinson.net  Fri Oct 15 11:42:03 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 11:42:10 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <20041015184203.5F882EDED2@ws6-1.us4.outblaze.com>

> i've already got a "bit of code in place" using z.m.l. -- 
> a.k.a. zen markup language, a.k.a. zero markup language
> -- but i doubt you're interested in testing it _at_all_,
> let alone "like crazy", which is quite alright with me,
> thank you very much, as i don't need _your_ help...
> 
> nonetheless, the beta-test is open to all, by e-mailing:
>      zml_talk-subscribe@yahoogroups.com
> 

If you *ever* actually release any code that people can test, I *will* test it "like crazy."  And if, all indications to the contrary, you've actually produced something useful, I'll be the first person to eat crow on the public boards.

Remember, though, it has to be able to convert from a "master" format to other formats easily and automatically... otherwise, you're just reinventing HTML in your own image.

Josh
From gbnewby at pglaf.org  Fri Oct 15 11:51:18 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Oct 15 11:51:19 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <001d01c4b2e1$8a48a960$6501a8c0@JOEL>
References: <28709391.1097853991576.JavaMail.root@wamui05.slb.atl.earthlink.net>
	<001d01c4b2e1$8a48a960$6501a8c0@JOEL>
Message-ID: <20041015185118.GB16361@pglaf.org>

On Fri, Oct 15, 2004 at 11:05:22AM -0700, Joel A. Erickson wrote:
> From: "Dennis McCarthy":
> >I once heard a researcher talk about a man he found smelling manuscripts 
> >in a library.  A conversation started where the man explained he was 
> >trying to trace diseases in European towns.  A vinegar spray was 
> >apparently used at one time as an attempted disinfectant when papers where 
> >transfered between infected and uninfected areas.  You shall never get a 
> >reproduced smell from microfilm, a page scan, on an e-book.
> 
> 
> Is that "on an e-book" intended to be "or an e-book." If so, I'm not so 
> sure about never being able to reproduce the smell. Not that I'm 
> particularly keen on smelling books, but I've heard of working prototypes 
> of scent devices activated digitally. 

I do not know if there is an online source for this story,
but have seen a printed copy and believe it is legitimate.
The story is that old books can develop mold or fungus.
Sometimes this can be very light, and it might be between
the pages (not just on the cover).  Any preservation
librarian can verify this fact.

The interesting part is that the molds or fungi (or spores)
have demonstrated psychoactive properties.  In short, 
sniffing old books can get you high and/or cause hallucination.
  -- Greg


From joshua at hutchinson.net  Fri Oct 15 12:34:31 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 12:34:40 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <20041015193431.A293D109A28@ws6-4.us4.outblaze.com>


> 
> The interesting part is that the molds or fungi (or spores)
> have demonstrated psychoactive properties.  In short, 
> sniffing old books can get you high and/or cause hallucination.
>   -- Greg
> 

This explains an awful lot about some of my old English professors ...
From nihil_obstat at mindspring.com  Fri Oct 15 12:55:00 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Fri Oct 15 12:55:10 2004
Subject: [gutvol-d] Sniffing Books
Message-ID: <6187538.1097870101072.JavaMail.root@wamui04.slb.atl.earthlink.net>

1) Correction, read "or an e-book" for "on an e-book."

2) I am pretty sure I heard this on public radio while driving home two or more years ago.  The specific topic was second thoughts on an initiative by some libraries to put books on microfilm, then purge the originals from their collection (as a space saving project).  Could not tell you the name of the speaker, but he had been an advocate of purging hard copies, and the vinegar incident helped him rethink it.

3) The vinegar idea sounds strange enough that the researcher may have thought it up after sniffing enough mold spores, and actually ended up believing it.

---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From colc at gutenberg.net.au  Fri Oct 15 15:02:18 2004
From: colc at gutenberg.net.au (Col Choat)
Date: Fri Oct 15 15:04:16 2004
Subject: [gutvol-d] Sniffing Books
In-Reply-To: <6187538.1097870101072.JavaMail.root@wamui04.slb.atl.earthlink.net>
Message-ID: <LGEBJEPCJPGOHPFBBJFKEEEFDDAA.colc@gutenberg.net.au>

Don't get too hung up on this one, as I am working on a "virtual aroma
emitter". You rub your left ear in a certain way as you read the e-book and
can then bring forth mould, vinegar, coffee, new-mown grass or whatever is
required by that page to enhance your reading experience. I just have a few
technical hitches to overcome. Version 2 will emit the smells without
rubbing your ear. It will recognise words like coffee, grass, perfume, roast
beef, etc. We will need a black list and a white list, of course. Most of us
don't want to experience the actual smells as we are reading about running
around the sewers below the streets of Paris.

Col Choat


From ian at babcockbrown.com  Fri Oct 15 15:28:57 2004
From: ian at babcockbrown.com (Ian Stoba)
Date: Fri Oct 15 15:27:02 2004
Subject: [gutvol-d] Sniffing Books
In-Reply-To: <LGEBJEPCJPGOHPFBBJFKEEEFDDAA.colc@gutenberg.net.au>
References: <LGEBJEPCJPGOHPFBBJFKEEEFDDAA.colc@gutenberg.net.au>
Message-ID: <9A993116-1EF9-11D9-A9B2-003065D6440E@babcockbrown.com>

Isn't there a way we could incorporate this into the XML markup?

Given the recent discussions on this list, this option might appeal to 
both groups:

1. Those who are attracted to markup:

	<locus city="Paris" location="sewers" elevation="-15" 
illumination="dim">
		<odor category="stench" source="sewage" state="raw" 
intensity="overwhelming" />
	</locus>

2. Those who think XML smells like....

Sorry, I couldn't help it.

--Ian


On Oct 15, 2004, at 3:02 PM, Col Choat wrote:

> Don't get too hung up on this one, as I am working on a "virtual aroma
> emitter". You rub your left ear in a certain way as you read the e-book and
> can then bring forth mould, vinegar, coffee, new-mown grass or whatever is
> required by that page to enhance your reading experience. I just have a few
> technical hitches to overcome. Version 2 will emit the smells without
> rubbing your ear. It will recognise words like coffee, grass, perfume, roast
> beef, etc. We will need a black list and a white list, of course. Most of us
> don't want to experience the actual smells as we are reading about running
> around the sewers below the streets of Paris.
>

From gbnewby at pglaf.org  Fri Oct 15 16:28:20 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Oct 15 16:28:21 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <004801c4b2c6$ca0042e0$c99495ce@net>
References: <13026847.1097846036718.JavaMail.root@wamui07.slb.atl.earthlink.net>
	<004801c4b2c6$ca0042e0$c99495ce@net>
Message-ID: <20041015232820.GC22068@pglaf.org>

On Fri, Oct 15, 2004 at 10:53:39AM -0400, Norm Wolcott wrote:
> But PG has adopted standards which limit the range of tags and CSS you can
> use, so you may not be able to specify changes in background color or font,
> such as Alice in Wonderland. Some contributors put their HTML elsewhere,
> perhaps for this reason. Bad news.

A slight correction: it's true that if you submit HTML files,
it's likely for CSS, bgcolors and other stuff to be stripped
out.  Part of this is our automated "add a header" programs.
Part is a desire to let the HTML be fairly generic.

But if you have an eBook that you'd really like to be
displayed with particular colors, fonts, etc., just ask.  
The only real "standard" is that we strongly desire valid
HTML (per http://validator.w3.org).  The rest is processing,
programs and procedures, which might have the same impact
as a standard sometimes, but should not be mistaken for
one.

As MH likes to say, we're pretty well willing to try almost
anything, at least in small quantities.  Just ask.
  -- Greg

From Bowerbird at aol.com  Fri Oct 15 17:13:17 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 17:13:36 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <155.40f4f729.2ea1c19d@aol.com>

i've got responses in the hopper to 
jeroen, jon, stephen, karl, and scott.

(being a troll is hard work,
but somebody's gotta do it;
and god gave me the looks.)      ;+)

but i'll save all those until monday,
give everyone some time to reflect,
rest, maybe write their own post...

but my bottom-line summary is this:

1)  z.m.l. is a fantastic format for users,
in the hands of an intelligent viewer-app;

2)  z.m.l. can be a great master-format,
as it's easy to create and maintain; and

3)  though z.m.l. will create other formats,
people will prefer z.m.l., due to the viewer.

doubt it?  then join the beta-test,
and tear my little baby to shreds...
zml_talk-subscribe@yahoogroups.com

oh, i wrote a reply to josh too.
that one i'll save until tuesday...
or wednesday...  or next month...     :+)

just one thing before i go, so i can give
stephen the weekend to get a head-start...


stephen said:  
>   In a plain text file, we do make some effort to 
>   distinguish different elements of a work: 
>   quotations are indented, headings in UPPER CASE and 
>   centered, etc. But any kind of complexity in the work 
>   tends quickly to make that unworkable.

my findings are that you are _incorrect_ in that assessment.

i don't believe you can show me many e-texts from the library
that i cannot format unequivocally using zen markup language.
the figure i usually give is 3%, which now is 420+ e-texts,
but i'll be surprised if the number you can find gets that high.
frankly, i don't think you'll be able to find more than a few...

but you are welcome to try...

dig up a list (of 20-40?) e-texts from the library
that you think can not be handled with my z.m.l.,
and i'll take a look at them and see if you're right...

give it your best shot...

(if anyone wants to help stephen out with some pointers to
some particularly difficult e-texts, send him a backchannel!)

***

have a nice weekend, everyone!

-bowerbird

p.s.  i'm still wondering if name-calling
and personal attacks are condoned here...
From Gutenberg9443 at aol.com  Fri Oct 15 17:42:47 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct 15 17:43:06 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <e5.45a670a.2ea1c887@aol.com>

 
In a message dated 10/15/2004 12:51:30 PM Mountain Standard Time,  
gbnewby@pglaf.org writes:

In short, 
sniffing old books can get you high and/or cause hallucination.


And if you have asthma . . . I'm trying to work (my 
own, not PGLAF's) on copying some 1930s translations 
of Ancient Egyptian medical textbooks. I need them 
for a book I'm writing. Egad! They are killing me!
 
But just in case anyone wonders, the Egyptians by 
about 2500 to 3000 BCE had medicine at a height 
it wouldn't reach again until the late 19th-early 20th 
centuries. Imhotep was both the architect for the
Great Pyramid AND the author of the first surgical
textbook known to have existed. He got a lot of
practice by studying people who had been injured
at the construction site, but he clearly also accompanied
an army into battle at least once, because he also
describes battlefield injuries.
 
I would scan them for PGLAF before sending them
back to the universities they came from, but my
scanner program has indigestion. It's glad to 
photocopy, but if I ask it to scan and save it lies
down and turns up its little curly toes, insisting it
HAS saved when it patently has not. Also, one of
the books is somewhat bigger than my scanner.
 
Anne
From Gutenberg9443 at aol.com  Fri Oct 15 17:47:22 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct 15 17:47:39 2004
Subject: [gutvol-d] Sniffing Books
Message-ID: <1d8.2d75eafe.2ea1c99a@aol.com>

 
In a message dated 10/15/2004 4:04:24 PM Mountain Standard Time,  
colc@gutenberg.net.au writes:

>>Don't get too hung up on this one, as I am working on a "virtual aroma
>>emitter". You rub your left ear in a certain way as you read the e-book and
>>can then bring forth mould, vinegar, coffee, new-mown grass or whatever is
>>required by that page to enhance your reading experience. I just have a few
>>technical hitches to overcome. Version 2 will emit the smells without
>>rubbing your ear. It will recognise words like coffee, grass, perfume, roast
>>beef, etc. We will need a black list and a white list, of course. Most of us
>>don't want to experience the actual smells as we are reading about running
>>around the sewers below the streets of Paris.


Aha! So YOU'RE the one who has been sneaking 
into Snape's study abducting his potion ingredients! 
Doggone it, you KNOW Harry and Ron and Hermione 
got in trouble over it! Apologize and admit your guilt. 
(Gad. That sounds like a line from THE LAST EMPEROR 
or TO LIVE, doesn't it!)
 
Anne
From Gutenberg9443 at aol.com  Fri Oct 15 17:49:52 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct 15 17:50:09 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <b8.6426d19e.2ea1ca30@aol.com>

 
In a message dated 10/15/2004 6:13:45 PM Mountain Standard Time,  
Bowerbird@aol.com writes:

>>p.s.  i'm still wondering if name-calling
>>and  personal attacks are condoned here...


Only if done with a wink. Like this--
 
;-)
 
Anne
From foundation3 at softhome.net  Fri Oct 15 18:17:33 2004
From: foundation3 at softhome.net (Craig Morehouse)
Date: Fri Oct 15 18:18:19 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To: <155.40f4f729.2ea1c19d@aol.com>
References: <155.40f4f729.2ea1c19d@aol.com>
Message-ID: <1097889453.2641.2.camel@localhost>

On Fri, 2004-10-15 at 20:13, Bowerbird@aol.com wrote:

[snip]

> 
> have a nice weekend, everyone!
> 

Thanks. It should be fun. No hurricanes on the Mid-Florida agenda this
week.

> -bowerbird
> 
> p.s.  i'm still wondering if name-calling
> and personal attacks are condoned here...
> ___

No, they're not, you poofing friggleschnitz!

;-)

-- 

The fact that no one understands you doesn't mean you're an artist.

From shalesller at writeme.com  Fri Oct 15 16:24:52 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Oct 15 19:14:08 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <20041015232452.18E1D4BDAB@ws1-1.us4.outblaze.com>

> and indeed, it's not that difficult to write a converter 
> that will change the underscore _form_of_italics_ 
> into the other [i]form of italics[/i] that uses brackets. 
> it's rather easy to see they are functionally equivalent. 

Which is odd, because they aren't. How do you convert _th_operat_er_
to brackets? Is it [i]th[/i] operat [i]er[/i], or [i]th[/i]operat[i]er[/i]
(which would be unsurprising in some of the Middle English editions I've
been scanning)?

Once again, you're going to blow this off as irrelevant, aren't you?
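
The ambiguity can be made concrete with a short Python sketch. Both functions are hypothetical and only illustrate two of the possible readings of the same input. Neither is claimed to be what any existing converter does, and the square-bracket tags follow the notation used in this thread.

```python
import re


def pair_as_delimiters(text: str) -> str:
    """Reading 1: every pair of underscores opens and closes an italic run."""
    return re.sub(r"_([^_]*)_", r"[i]\1[/i]", text)


def single_span(text: str) -> str:
    """Reading 2: the outer underscores delimit one italic span and the
    inner underscores stand in for spaces."""
    inner = text.strip("_")
    return "[i]" + inner.replace("_", " ") + "[/i]"


if __name__ == "__main__":
    sample = "_th_operat_er_"
    print(pair_as_delimiters(sample))
    print(single_span(sample))
```

Both outputs are defensible, so a converter needs some extra rule (or extra information in the source) to pick one.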

From stephen.thomas at adelaide.edu.au  Fri Oct 15 20:16:15 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Fri Oct 15 20:16:37 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>
References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>
Message-ID: <4170927F.8040902@adelaide.edu.au>

Actually, I took bowerbird's plea to mean simply that he wanted 
*some* indication in a plain text version of where images 
appeared, and which image was used. E.g.:

	text
	text

	[image: xyz.gif]

	text
	text

This would not seem to be too much to ask, and I think Lynx will 
do this if you use the -dump option to save HTML as plain text.
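
As a rough illustration of how a viewer might pick up such a marker, here is a small Python sketch. The marker syntax is taken from the example above. The name split_text_and_images and the rule that a marker must sit alone on its line are assumptions, and nothing here is claimed to match what Lynx actually emits.

```python
import re

# Assumed marker form: a line consisting only of "[image: filename]".
IMAGE_MARKER = re.compile(r"^\s*\[image:\s*([^\]\s]+)\s*\]\s*$", re.IGNORECASE)


def split_text_and_images(text):
    """Return a list of ("image", filename) or ("text", line) tuples,
    one per input line, in document order."""
    items = []
    for line in text.splitlines():
        match = IMAGE_MARKER.match(line)
        if match:
            items.append(("image", match.group(1)))
        else:
            items.append(("text", line))
    return items


if __name__ == "__main__":
    sample = "text\ntext\n\n[image: xyz.gif]\n\ntext\ntext"
    for kind, value in split_text_and_images(sample):
        print(kind, repr(value))
```

A viewer could then load the named file from the same directory as the text; where the image files live is a separate question this sketch does not address.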


Incidentally, at what point did the world decide to indicate 
italics by placing underscores before and after text? The 
canonical usage used to be to use a forward slash (solidus), / 
to indicate italics, and underscores to indicate underlining. 
(And * to indicate bold.)


Steve


Joshua Hutchinson wrote:

>  
> 
>>all of this is just to resubmit a plea that i have made before
>>(and will _continue_ making until i get a positive response!)
>>for information about the name and location of graphic-files
>>to be included in the _plain-text_ versions of the e-texts...
> 
> 
> And now it is no longer a plain-text file.  With the added penalty of having no existing validators to make sure that the markup used is done correctly.
> 
> Basically, you're reinventing the wheel for no purpose here.
> 
> Josh

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From joshua at hutchinson.net  Fri Oct 15 20:32:42 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 20:32:35 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <4170927F.8040902@adelaide.edu.au>
References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>
	<4170927F.8040902@adelaide.edu.au>
Message-ID: <4170965A.5020706@hutchinson.net>

Steve Thomas wrote:

> Actually, I took bowerbird's plea to mean simply that he wanted *some* 
> indication in a plain text version of where images appeared, and which 
> image was used. E.g.:
>
>     text
>     text
>
>     [image: xyz.gif]
>
>     text
>     text
>
> This would not seem to be too much to ask, and I think Lynx will do 
> this if you use the -dump option to save HTML as plain text.
>
Sure, it isn't hard.  But neither is HTML markup for the same thing.

[img src="xyz.gif"]

And this has the added benefit of being parse-able (is that a word?) by 
a whole slew of already existing validators, link-checkers, editors, etc.
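
To illustrate the "already parse-able" point, here is a short sketch using only Python's standard html.parser module. It assumes the square brackets above stand for angle brackets, as elsewhere in this thread. The class name ImageCollector and the helper image_sources are made up for the example.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every img tag encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html_text):
    """Return the image file names referenced in html_text, in order."""
    collector = ImageCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    print(image_sources('<p>text</p>\n<img src="xyz.gif">\n<p>text</p>'))
```

No custom grammar is needed here; the parser ships with the language, which is the kind of existing tooling the argument refers to.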

It isn't that what bowerbird proposes is impossible to do.  It's that 
it's already been done, and arguably better than he'll do it, simply 
because there are YEARS of development and more people than I can count 
behind HTML.  ZML has bowerbird and nothing (so far) to show for it.

The big problem is that how bowerbird defines plain text (basically, ASCII 
letters with some markup for italics, bold, images, etc.) also fits 
HTML/XML.  But he claims that HTML/XML don't qualify as plain text. 

What exactly is different about them?  Complexity?  Hardly.  HTML can be 
QUITE simple.  If all you want is to mark italics, bold, and images, 
HTML is ridiculously simple.

Josh
From shalesller at writeme.com  Fri Oct 15 19:51:53 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Oct 15 20:51:54 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com>

Bowerbird@aol.com writes:
> 3) though z.m.l. will create other formats, 
> people will prefer z.m.l., due to the viewer. 

Only those people who will install a viewer to read Project
Gutenberg books, probably a small percentage of those who
visit Project Gutenberg.
 
> doubt it? then join the beta-test, 
> and tear my little baby to shreds... 

You ignore my critiques, so I'd rather not waste my time
writing them. Furthermore, I run Un*x, not Windows.
 
> i don't believe you can show me many e-texts from the library 
> that i cannot format unequivocally using zen markup language. 
> the figure i usually give is 3%, which now is 420+ e-texts, 
> but i'll be surprised if the number you can find gets that high. 
> frankly, i don't think you'll be able to find more than a few... 

Last time I checked, you only supported ASCII, so we can toss all
our non-English texts in there. "Selections from the Writings of
Lord Dunsany" provides a great example of an English book where
you can't just magically add the accents in from a list of accented
words, because the accents are in the characters' names. Add to that
anything that has Greek, anything discussing Eastern Europe,
translations of the Sanskrit holy works, etc.

Another fact is that Project Gutenberg's books were only ASCII
plain text for a long time, and still are for a large part. 
Thus people doing work for PG did books that could be done well 
in ASCII plain text. Of course, you can handle 97% today, but DP 
does more and more stuff that isn't just novels with purely linear 
text.
 
And last, I still object to this standard. We wouldn't build a
building that 3% of the people couldn't enter. We shouldn't
even consider standardizing on a system that can't handle 3%
of the books we do. 420 books is a lot of books, and even if
it stays three percent, it will only get larger.


From Bowerbird at aol.com  Fri Oct 15 21:43:38 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 21:43:59 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <e8.43fe1bb.2ea200fa@aol.com>

anne said:
>   Only if done with a wink. Like this--     ;-)

wonderful!        ;+)

-bowerbird
From hacker at gnu-designs.com  Fri Oct 15 21:52:35 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Fri Oct 15 21:53:35 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com>
References: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410160052260.23465@angst.gnu-designs.com>


>> 3) though z.m.l. will create other formats, people will prefer z.m.l., 
>> due to the viewer.

> Only those people who will install a viewer to read Project Gutenberg 
> books, probably a small percentage of those who visit Project Gutenberg.

  	I've been lurking, but I'm a long-time contributor to these and 
similar efforts. I really don't think any of this bickering is very 
productive.

  	Can we all just begin turning our attention to a common goal, 
instead of arguing about who has the better Acme Widget this week?

  	I've got some comments to lend to the discussion, but I've 
refrained, because I see how some of the seemingly-neutral comments are 
taken, and responded to.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From stephen.thomas at adelaide.edu.au  Fri Oct 15 21:53:59 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Fri Oct 15 21:54:20 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <4170965A.5020706@hutchinson.net>
References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>
	<4170927F.8040902@adelaide.edu.au>
	<4170965A.5020706@hutchinson.net>
Message-ID: <4170A967.70507@adelaide.edu.au>

Josh,

I can't argue with you -- I've spent years marking up plain text 
into HTML, because I believe that HTML provides a superior ebook 
to plain text. (Others may feel free to disagree -- just don't 
tell me about it.)

But PG seems wedded to the idea that there must always be a 
plain text version, and if we're going to create a plain text 
from an HTML with images, then where's the problem with 
retaining at least the location of the images in the plain text?

Steve
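
A minimal sketch of the idea above, assuming the plain text is generated
from the HTML edition: walk the HTML and emit an "[image: ...]" line
wherever an img tag appeared. The class and function names here are
hypothetical, and this is not how Lynx or any PG tool does it, only one
way it could be done with the Python standard library.

```python
from html.parser import HTMLParser


class ImagePlaceholderText(HTMLParser):
    """Collect the text of an HTML document, marking where images appeared."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src", "")
            # Keep the image's position and filename in the text output.
            self.parts.append(f"\n\n[image: {src}]\n\n")
        elif tag == "p":
            # Start each paragraph on its own block.
            self.parts.append("\n\n")
        elif tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of html with image placeholders inserted."""
    parser = ImagePlaceholderText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling html_to_text() on a page keeps the text in document order, so
each placeholder lands between the same paragraphs the image sat between.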


Joshua Hutchinson wrote:
> Steve Thomas wrote:
> 
>> Actually, I took bowerbird's plea to mean simply that he wanted *some* 
>> indication in a plain text version of where images appeared, and which 
>> image was used. E.g.:
>>
>>     text
>>     text
>>
>>     [image: xyz.gif]
>>
>>     text
>>     text
>>
>> This would not seem to be too much to ask, and I think Lynx will do 
>> this if you use the -dump option to save HTML as plain text.
>>
> Sure, it isn't hard.  But neither is HTML markup for the same thing.
> 
> [img src="xyz.gif"]
> 
> And this has the added benefit of being parse-able (is that a word?) by 
> a whole slew of already existing validators, link-checkers, editors, etc.
> 
> It isn't that what bowerbird proposes is impossible to do.  It's that 
> it's already been done, and arguably better than he'll do it, simply 
> because there are YEARS of development and more people than I can count 
> behind HTML.  ZML has bowerbird and nothing (so far) to show for it.
> 
> The big problem is that bowerbird's definition of plain text (basically, 
> ASCII letters with some markup for italics, bold, images, etc.) also fits 
> HTML/XML.  But he claims that HTML/XML don't qualify as plain text.
> What exactly is different about them?  Complexity?  Hardly.  HTML can be 
> QUITE simple.  If all you want is to mark italics, bold, and images, 
> HTML is ridiculously simple.
> 
> Josh

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From traverso at dm.unipi.it  Fri Oct 15 22:15:37 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Fri Oct 15 22:15:59 2004
Subject: [gutvol-d] Fw: [PG-EU] Daisy and Gutenberg
References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>
	<4170927F.8040902@adelaide.edu.au>
	<4170965A.5020706@hutchinson.net> <4170A967.70507@adelaide.edu.au>
Message-ID: <200410160515.i9G5FbF7004572@posso.dm.unipi.it>


I am forwarding this message, which appeared on PG-EU:
Carlo

------------------------------------------------------------------------

From: "Branko Collin" <collin@xs4all.nl>
To: pg-eu@vrijschrift.org
Subject: [PG-EU] Daisy and Gutenberg
Sender: pg-eu-admin@vrijschrift.org
Reply-To: pg-eu@vrijschrift.org
Date: Sat, 16 Oct 2004 01:09:34 +0200


The following is more of a gutvol-d subject, but I am no longer 
subscribed there, so I'll post it here.

Vrijschrift was kind enough to get a seat reserved for a Project 
Gutenberg volunteer at the Symposium for Alternative Models for 
Copyright, and Wiebe gave me that seat.

At the symposium, drs. Maarten Verboom of FNB 
(<http://www.fnb.nl/sub_home/english.html>, subtitle: "literature and 
information for people with a reading disability") gave a talk, which 
I did not get to hear, because I had to go and visit a prospective 
customer.

However, afterwards I did chat with his colleague, Arne Leeman. I 
asked him whether they knew of Project Gutenberg ("yes, definitely"), 
whether they are using our texts ("Yes", though not many), and 
whether there are things we could do to make things easier for them.

Actually, there is, and that is publishing books in Daisy (an XML 
application specifically geared to speaking books). I told him that 
that may not be the XML standard we will be ending up with, to which 
he replied that any XML would be better than plain vanilla text, 
especially richer mark-up, because it could make texts easier to 
convert to Daisy.

I also invited FNB to send us requests for public domain books they 
would like to see digitized. (I seem to remember that there are content 
providers at DP who take requests.)

-- 
branko collin
collin@xs4all.nl
From Bowerbird at aol.com  Fri Oct 15 22:37:09 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 22:37:33 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <1dc.2e23ae00.2ea20d85@aol.com>

david said:
>   Once again, you're going to blow this off as irrelevant, aren't you.

no, i'm going to tell you to join the beta-testing listserve,
where questions such as this can be raised and answered...         :+)

have a nice weekend, mr. starner...         ;+)

-bowerbird
From tb at baechler.net  Fri Oct 15 23:07:30 2004
From: tb at baechler.net (Tony Baechler)
Date: Fri Oct 15 23:06:43 2004
Subject: [gutvol-d] Fw: [PG-EU] Daisy and Gutenberg
In-Reply-To: <200410160515.i9G5FbF7004572@posso.dm.unipi.it>
References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com>
	<4170927F.8040902@adelaide.edu.au>
	<4170965A.5020706@hutchinson.net> <4170A967.70507@adelaide.edu.au>
Message-ID: <5.2.0.9.0.20041015230622.01fbb910@snoopy2.trkhosting.com>

Hello.  You can find more information about DAISY including conversion 
tools below.  I think that the html books could be converted fairly easily 
since they include headings already.

http://www.daisy.org/

I think there is a free converter but I'm not sure.  I know most of them 
are commercial.

From colc at gutenberg.net.au  Sat Oct 16 00:44:13 2004
From: colc at gutenberg.net.au (Col Choat)
Date: Sat Oct 16 00:46:19 2004
Subject: [gutvol-d] Sniffing Books
In-Reply-To: <1d8.2d75eafe.2ea1c99a@aol.com>
Message-ID: <LGEBJEPCJPGOHPFBBJFKMEEKDDAA.colc@gutenberg.net.au>

Wow, you guys really have helped me overcome the glitches. I knew that I
needed two magic words to jolt the emitter into life and, while I certainly
wasn't the one abducting Snape's ingredients, once Harry's name was brought
up I KNEW that one of the words just HAD to be 'Expelliarmus!'. I would
never have guessed that 'XML' was the other word, but it was. I am sitting
here now, reading the recently posted 'Thrilling Stories Of The Ocean', by
Marmaduke Park, and the smell of the sea is wafting about me. It looks like
xml IS a magic bullet after all. I just wonder if it can help me to create a
gentle breeze to flutter the curtains a little. Or would that be asking too
much?
  -----Original Message-----
  From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Gutenberg9443@aol.com
  Sent: Saturday, 16 October 2004 10:47 AM
  To: gutvol-d@lists.pglaf.org
  Subject: Re: [gutvol-d] Sniffing Books


  In a message dated 10/15/2004 4:04:24 PM Mountain Standard Time,
colc@gutenberg.net.au writes:
    >>Don't get too hung up on this one, as I am working on a "virtual aroma
    >>emitter". You rub your left ear in a certain way as you read the e-book and
    >>can then bring forth mould, vinegar, coffee, new-mown grass or whatever is
    >>required by that page to enhance your reading experience. I just have a few
    >>technical hitches to overcome. Version 2 will emit the smells without
    >>rubbing your ear. It will recognise words like coffee, grass, perfume, roast
    >>beef, etc. We will need a black list and a white list, of course. Most of us
    >>don't want to experience the actual smells as we are reading about running
    >>around the sewers below the streets of Paris.


  Aha! So YOU'RE the one who has been sneaking
  into Snape's study abducting his potion ingredients!
  Doggone it, you KNOW Harry and Ron and Hermione
  got in trouble over it! Apologize and admit your guilt.
  (Gad. That sounds like a line from THE LAST EMPEROR
  or TO LIVE, doesn't it!)

  Anne
From Bowerbird at aol.com  Sat Oct 16 01:14:39 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Oct 16 01:15:05 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <1db.2ce31fcb.2ea2326f@aol.com>

hacker@gnu-designs.com
>   I've been lurking, but I'm a long-time contributor 
>   to these and similar efforts.

are you the "hacker" who did the 9/11 report,
and was talking about doing wikipedia too?

-bowerbird
From shalesller at writeme.com  Sat Oct 16 12:42:13 2004
From: shalesller at writeme.com (D. Starner)
Date: Sat Oct 16 12:42:22 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <20041016194213.A60C24BDAB@ws1-1.us4.outblaze.com>

"David A. Desrosiers" writes:

> I've got some comments to lend to the discussion, but I've 
> refrained, because I see how some of the seemingly-neutral comments are 
> taken, and responded to. 

"seemingly" is an important word there. Bowerbird has started several
flame wars here, and most of us have given up any hope of seeing 
anything productive out of him. Trust us, we're a lot more open to
people who haven't repeatedly told us "I could help you, but I won't,
because you're not worth it."

For one exact quote, try:

> i've written the routines to do this (and other things), so i have 
> supreme confidence that the investment would be worthwhile. 
>
> (if i hadn't been treated so badly by some people here, i would 
> be happy to give you the routines. but cheer up, they are _not_ 
> that difficult to write.) 


From hacker at gnu-designs.com  Sat Oct 16 21:12:29 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Oct 16 21:13:34 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To: <1db.2ce31fcb.2ea2326f@aol.com>
References: <1db.2ce31fcb.2ea2326f@aol.com>
Message-ID: <Pine.LNX.4.61.0410170012100.18275@aphrodite.gnu-designs.com>


> are you the "hacker" who did the 9/11 report, and was talking about 
> doing wikipedia too?

 	One and the same, yes.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hacker at gnu-designs.com  Sat Oct 16 21:13:16 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Oct 16 21:14:34 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To: <Pine.GSO.4.58.0410160833460.20467@vtn1.victoria.tc.ca>
References: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com>
	<Pine.LNX.4.61.0410160052260.23465@angst.gnu-designs.com>
	<Pine.GSO.4.58.0410160833460.20467@vtn1.victoria.tc.ca>
Message-ID: <Pine.LNX.4.61.0410170007230.18275@aphrodite.gnu-designs.com>


> All that I've learned from these "discussions" is that a number of 
> people have a number of various ideas about mark-up (myself 
> included), and there's no way to make everybody happy.

 	...much like bringing 5 friends into a video store, and trying 
to agree on one movie for everyone to watch. Not going to happen ;)

 	That being said, I'd be interested in seeing a list of the 
tools people know of, or are working on, or have worked with in the 
past, that can be used to take a 7-bit ascii text PG work, and convert 
it into other formats.

 	Like you, I have some ideas of my own (as well as some tools 
I've rolled myself to help), and I'd like to see what everyone else is 
using right now.

 	A quick google for my full name (with initial and in quotes) 
will tell you exactly why I'm interested in this exact topic of 
discussion ;)


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From Bowerbird at aol.com  Mon Oct 18 12:54:18 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct 18 12:54:34 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <6d.35c457ea.2ea5796a@aol.com>

craig said:
>   you poofing friggleschnitz!

mommy, mommy, craig is starting a flamewar!
he called me a poofing friggleschnitz, mommy!
i demand that he be banned!      ;+)

***

david said:
>   One and the same, yes.

great!          :+)

and you did a little bit of work on...
what was it?... plucker, right?     ;+)

if you are prepared to tackle the i.m.d.b.,
15,000 e-texts should be a piece of cake...


>   ...much like bringing 5 friends into a video store, 
>   and trying  to agree on one movie for everyone 
>   to watch. Not going to happen ;)

i wish things were that inconsequential.
but the project gutenberg e-texts make up
the most important e-library, historically.

although even now it's starting to be dwarfed
by other efforts, it would be a proper tribute
to michael if it were to be well-maintained...


>   That being said, I'd be interested in seeing 
>   a list of the  tools  people know of, or are 
>   working on, or have worked with in the past, 
>   that can be used to take a 7-bit ascii text 
>   PG work, and convert it into other formats.

"convert" is a rather loose and 
unspecific word, wouldn't you say?    :+)

nonetheless, i'll cut to the chase...

the main problem with the e-texts is
their formatting is _so_ inconsistent.
so before you can do anything useful
with them, you must write routines
that can resolve their inconsistency.

the inconsistency is very maddening,
because it's so pointless.  although
some is understandable, considering
how many hands created the e-texts,
the sadder truth is that much of it
could have been prevented; however,
mr. newby and company simply fail 
to grasp the negative consequences 
of the inconsistency, and thus never
made it their priority to minimize it.

the good news is you _can_ write
routines that will fix the problem.
it is _not_ impossible, just thorny;
the biggest expenditure of time is a
quality-control check to make sure
that you knew every inconsistency.
their variety will amaze and astound.

subsequent conversion to any format
is straightforward once you have done
the job of resolving the inconsistency.
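
As a rough illustration of what such a clean-up routine might look like,
here is a minimal sketch. The four rules below are invented examples of
formatting inconsistencies, not the rules used by bowerbird, blackmask,
or gutenmark, none of which are described in this thread.

```python
import re


def normalize_etext(text):
    """Apply a few illustrative clean-up rules to a plain-text e-text."""
    # Use one line-ending convention throughout.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing spaces and tabs on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of three or more blank lines into exactly two.
    text = re.sub(r"\n{4,}", "\n\n\n", text)
    # Remove blank lines at both ends and finish with a single newline.
    return text.strip("\n") + "\n"
```

The function is idempotent: running it on its own output changes
nothing, which makes the quality-control pass easier, because any
difference between runs points to a rule that is still incomplete.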

you don't even have to do that job,
if you don't want to, you can just
go to david moynihan at blackmask
and get his files, as he has edited
out almost all the inconsistency,
which is what then allowed him
to make a half-dozen versions of
most e-texts in the entire library.

if you're looking for explicit info,
ron burkey did a converter called
"gutenmark", and his website at
http://www.sandroid.org/gutenmark
does a good job of documenting the
inconsistency he faced on the way,
before he gave up the effort, saying:
>   the more perfect my 
>   automated conversions became, 
>   the farther (in my own mind) 
>   I seemed to be from 
>   having a perfect conversion. 

i think that's a nice way of saying that
the more he learned about the e-texts,
the more he found out how bad they are,
from the standpoint of consistency...

there is also some basic information at:
palmdigitalmedia.com/dropbook/converting

but i'd guess that at this point in time,
moynihan will have the most expertise
about the problems you would be facing.
much of it might be inside his noggin,
but i do know he has a _lot_ of macros
that undoubtedly embed gobs of wisdom.
and, more to the point, david has shown,
incontrovertibly, that mass conversions
to a plethora of formats are fully possible.

recently, david even _offered_ his files
to project gutenberg, but -- as far as i
know -- his gift was spurned, for some
bizarre reason i'll never be able to grasp.

oh yeah, i've written some routines that
squash out most of the inconsistency, and 
there's a way you could pry 'em out of me
-- namely, if you got support for my z.m.l.
(zen markup language) built into plucker.
it's a simple rule-set; you could probably
have it up-and-running in a couple days...
backchannel me if you're interested.     :+)

once you've vanquished the inconsistency,
there are other concerns, which might or
might not be a problem to you, including:
1.  errors in the e-texts, lots of them.
2.  styling lost or converted to all-caps.
3.  information about images discarded.
4.  image filenames are often not unique.
5.  accents lost in many foreign e-texts.
6.  a confusing redundancy of some books.
7.  attacks levied if you reveal problems.

oh yeah, also make sure that you are always
working with the freshest e-texts available,
as i'm not sure if they make an announcement
whenever they make corrections to an e-text;
they just quietly substitute in the new file...

***

i would welcome you here, but i am
on my way out the door _very_ soon...     :+)

there are a handful of tarking naugshlocks here
_so_ unworthy of my help they made me decide
to decline to do any work for project gutenberg,
in spite of its great historical importance and 
my highest regard for the genius of michael hart.
i'm sure others, like you, will cover my absence,
while i will be happy grazing greener pastures...

at any rate, have a nice day...          ;+)

-bowerbird
From joshua at hutchinson.net  Mon Oct 18 13:23:58 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 18 13:24:07 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <20041018202358.BA3CC9E93C@ws6-2.us4.outblaze.com>

----- Original Message -----
From: Bowerbird@aol.com

> i would welcome you here, but i am
> on my way out the door _very_ soon...     :+)

Don't tease me.

Josh
From hacker at gnu-designs.com  Mon Oct 18 13:34:59 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Mon Oct 18 13:35:38 2004
Subject: [gutvol-d] re: poofing and tarking
In-Reply-To: <6d.35c457ea.2ea5796a@aol.com>
References: <6d.35c457ea.2ea5796a@aol.com>
Message-ID: <Pine.LNX.4.61.0410181559590.26733@aphrodite.gnu-designs.com>


> and you did a little bit of work on... what was it?... plucker, 
> right?  ;+)

 	Quite a bit more than "a little", but yes, that's me.

> if you are prepared to tackle the i.m.d.b., 15,000 e-texts should be 
> a piece of cake...

 	Yep, once the structures are laid out for the classifications 
of works covered by Gutenberg. A good bulk of this has already been 
done by hundreds of contributors over the years.

> although even now it's starting to be dwarfed by other efforts, it 
> would be a proper tribute to michael if it were to be 
> well-maintained...

 	What other efforts are you alluding to? Why not help those 
people who insist on reinventing a fleet of new wheels, to collaborate 
with existing projects that have similar/same goals?

> "convert" is a rather loose and unspecific word, wouldn't you say? 
> :+)

 	Yes, and specifically chosen for that reason. Gutenberg etexts 
are nonspecific, and "converting" them means taking a slightly 
different approach for each work, depending on what I'm converting: 
poems, plays, books, etc. You can't use a single rigid approach for 
all works.

> the main problem with the e-texts is their formatting is _so_ 
> inconsistent. so before you can do anything useful with them, you 
> must write routines that can resolve their inconsistency.

 	And this is exactly what the Distributed Proofreaders project 
proposes to solve, and they've been pretty successful thus far, IIRC.

> the inconsistency is very maddening, because it's so pointless. 
> although some is understandable, considering how many hands created 
> the e-texts, the sadder truth is that much of it could have been 
> prevented; however, mr. newby and company simply fail to grasp the 
> negative consequences of the inconsistency, and thus never made it 
> their priority to minimize it.

 	I've had a lot of luck stepping out of the box, and analyzing 
the text based on the "style" of the text, versus the actual content 
itself. I was approached by someone who is doing a paper and his PhD 
thesis on this exact kind of approach. Basically (with my expertise 
and help) he's taking the bulk of Gutenberg, importing every word from 
every work into a database, and then running his own algorithms across 
the entire collection, to pull out the styles by known authors.

 	For example, with his approach, you can determine that a work 
claiming to be by "A. Einstein" is by the same author as one claiming 
to be by "Albert Einstein" (S. Clemens -> Mark Twain -> Samuel Clemens, 
etc.).

 	From there, you can then begin correcting the inaccuracies in 
the titling, authoring, and inflection of the work itself, including 
basic things like sentence structure, spelling, and so on.

 	I've extended the schema quite a bit to allow some interesting 
other queries to be run ("Show me all works larger than 100 pages, 
written by male authors between the years 1951 to 1957").
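
A query of that kind could be sketched as follows. The schema is not
described anywhere in this thread, so the table names, column names, and
sample rows are all hypothetical placeholders, and the sketch assumes a
page count and a year are stored for each work.

```python
import sqlite3

# Hypothetical schema: table and column names are invented for this sketch.
SCHEMA = """
CREATE TABLE authors (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    gender TEXT
);
CREATE TABLE works (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    pages     INTEGER,
    year      INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database holding a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO authors (id, name, gender) VALUES (?, ?, ?)",
        [(1, "Author One", "male"), (2, "Author Two", "female")],
    )
    conn.executemany(
        "INSERT INTO works (title, author_id, pages, year) VALUES (?, ?, ?, ?)",
        [
            ("Work A", 1, 250, 1953),
            ("Work B", 1, 80, 1955),
            ("Work C", 2, 300, 1952),
            ("Work D", 1, 400, 1960),
        ],
    )
    return conn


def long_works_by_male_authors(conn, min_pages, first_year, last_year):
    """Titles of works over min_pages by male authors within the year range."""
    rows = conn.execute(
        """
        SELECT w.title
        FROM works AS w
        JOIN authors AS a ON a.id = w.author_id
        WHERE w.pages > ?
          AND a.gender = 'male'
          AND w.year BETWEEN ? AND ?
        ORDER BY w.title
        """,
        (min_pages, first_year, last_year),
    ).fetchall()
    return [title for (title,) in rows]
```

With the placeholder rows above, long_works_by_male_authors(conn, 100,
1951, 1957) picks out only "Work A"; the others fail on length, author,
or year.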

 	With that done, it is a (relatively) simple matter to convert 
the 7-bit ascii text to something more manageable, such as structured 
XML with an associated DTD, which can then be turned into something else.
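
A minimal sketch of the text-to-XML step, assuming paragraphs in the
source are separated by blank lines. The element names (book, title,
body, p) are made up for illustration and do not follow any DTD
mentioned in this thread.

```python
import re
import xml.etree.ElementTree as ET


def text_to_xml(title, text):
    """Wrap blank-line-separated paragraphs of plain text in simple XML."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    body = ET.SubElement(book, "body")
    for block in re.split(r"\n\s*\n", text.strip()):
        # Re-join hard-wrapped lines into one paragraph string.
        ET.SubElement(body, "p").text = " ".join(block.split())
    # ElementTree escapes characters such as & and < on output.
    return ET.tostring(book, encoding="unicode")
```

Letting the library build the tree, rather than pasting tags around
strings, means the output stays well-formed even when the text itself
contains markup characters.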

> the good news is you _can_ write routines that will fix the problem. 
> it is _not_ impossible, just thorny; the biggest expenditure of time 
> is a quality-control check to make sure that you knew every 
> inconsistency. their variety will amaze and astound.

 	And I assume you've done this? And your routines are made 
public somewhere, so others can correct and keep improving them? I 
don't recall seeing a URL to download your code or routines. Can you 
reply with that, so we can take a look?

> you don't even have to do that job, if you don't want to, you can 
> just go to david moynihan at blackmask and get his files, as he has 
> edited out almost all the inconsistency, which is what then allowed 
> him to make a half-dozen versions of most e-texts in the entire 
> library.

 	And where is his code? Where are his "routines"? I don't see 
them on his site at all. I'll send him an email later this week to see 
if he wants to contribute those all back.

 	ALL of the talk of how "easy" this is, is completely 
irrelevant, if nobody wants to actually contribute that knowledge back 
so others can improve and benefit from it.

 	If you're not willing to do this, then our conversation stops 
here. There is no point in continuing the discussion, if you intend on 
trying to retain "control" of this kind of logic within your own 
circle of projects.

> if you're looking for explicit info, ron burkey did a converter 
> called "gutenmark", and his website at 
> http://www.sandroid.org/gutenmark does a good job of documenting the 
> inconsistency he faced on the way, before he gave up the effort, 
> saying:

 	I've talked to Ron before via email, and described some of my 
needs for improvements to his tool. He's no longer maintaining it, so 
it is up to me (if I choose) to update his code and improve it 
further.

> recently, david even _offered_ his files to project gutenberg, but 
> -- as far as i know -- his gift was spurned, for some bizarre reason 
> i'll never be able to grasp.

 	What was that "bizarre reason"? Is he still on this list? Did 
anyone else obtain his code? Does it exist out there for download?

> oh yeah, i've written some routines that squash out most of the 
> inconsistency, and there's a way you could pry 'em out of me -- 
> namely, if you got support for my z.m.l. (zen markup language) built 
> into plucker. it's a simple rule-set; you could probably have it 
> up-and-running in a couple days... backchannel me if you're 
> interested.  :+)

 	Not interested. Our code is freely available. If you want 
someone to support "your" format, then you'll probably have to take 
that first step by justifying and documenting it. The only page I 
could find describing the format was here:

 	http://czt.sourceforge.net/zml/

 	And I assume that's not your project or code.

 	If it is anything different than HTML, it would require 
significant re-engineering of the core parser components used in 
Plucker and a lot of testing to make sure it didn't break anything in 
the existing parser in the process.

 	In other words, not a couple-of-days of effort as you suggest.

> once you've vanquished the inconsistency, there are other concerns, 
> which might or might not be a problem to you, including:
> 1.  errors in the e-texts, lots of them.

 	What kind of errors? Incorrect hyphens? Broken paragraphs? 
Missing end quotes? (this is common)

> 2.  styling lost or converted to all-caps.

 	Impossible to regain, unless you have the original work 
in-hand, to see if there were actual CAPS used, or not. Maybe the 
"errors" were intentional. Many authors use poetic license to express 
their thoughts, and sometimes those things break the rules of grammar 
and spelling.

> 3.  information about images discarded.

 	Same, see above.

> 4.  image filenames are often not unique.

 	How do you mean? You mean 1.jpg 1.jpg 1.jpg appearing in three 
places, but intended to represent 3 _different_ images? Where do you 
see this inconsistency? Give me an example of a Gutenberg work that 
shows this. I'd like to verify it for myself.

> 5.  accents lost in many foreign e-texts.

 	Seems to be a problem with the auditor/editor's charset or 
support for those charsets in their editor. I agree that the original 
nature and charset of the document should be retained. How do you 
express a Cyrillic text in 7-bit ascii? You can't.
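
The point can be checked mechanically. A small sketch, with
hypothetical function names, that reports whether a text survives
7-bit ASCII and which characters would be lost:

```python
def fits_in_ascii(text):
    """Return True if every character of text can be encoded as 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def non_ascii_characters(text):
    """Return the distinct characters in text that 7-bit ASCII cannot hold."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Running non_ascii_characters() over a text before it is saved as ASCII
gives a list of exactly what an accent-stripping conversion would
throw away.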

> 6.  a confusing redundancy of some books.

 	Such as?

> 7.  attacks levied if you reveal problems.

 	Are you revealing the "problems" in a condescending way? Or in 
a constructive way? The way you approach the "Hey, this is broke" 
process is very telling as to how you will be received and responded 
to for same.

> oh yeah, also make sure that you are always working with the 
> freshest e-texts available, as i'm not sure if they make an 
> announcement whenever they make corrections to an e-text; they just 
> quietly substitute in the new file...

 	..which is exactly why you should have your own mirror of 
Gutenberg, or a subset of it as you work on the pieces.
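
With a local mirror, silent replacements can be caught by comparing
checksums between two syncs. A minimal sketch, assuming the mirror is
an ordinary directory tree on disk; the function names are hypothetical
and this is not part of any existing PG mirroring tool.

```python
import hashlib
from pathlib import Path


def snapshot(mirror_dir):
    """Map each file's relative path to the SHA-256 digest of its contents."""
    root = Path(mirror_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old, new):
    """List paths that are new or whose contents differ from the old snapshot."""
    return sorted(path for path, digest in new.items() if old.get(path) != digest)
```

Take a snapshot before each sync and another one after it;
changed_files(before, after) then names every e-text that was added or
quietly replaced in between.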

> there are a handful of tarking naugshlocks here _so_ unworthy of my 
> help they made me decide to decline to do any work for project 
> gutenberg, in spite of its great historical importance and my 
> highest regard for the genius of michael hart. i'm sure others, like 
> you, will cover my absence, while i will be happy grazing greener 
> pastures...

 	If you are "moving on", then it behooves you to try to 
contribute what you've learned (in terms of knowledge, code, or 
"routines") back to those who will continue to contribute and learn.

 	We're only here to help the next generation learn and improve. 
If we're not leaving anything here by which others can remember us and 
grow themselves; if we're not teaching others as we learn ourselves, 
then what is the point?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From Gutenberg9443 at aol.com  Mon Oct 18 14:11:12 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Mon Oct 18 14:11:35 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <1b9.3df0833.2ea58b70@aol.com>

 
In a message dated 10/18/2004 1:54:55 PM Mountain Standard Time,  
Bowerbird@aol.com writes:

but --  as far as i
know -- his gift was spurned, for some
bizarre reason i'll  never be able to grasp.


I don't know why it was spurned, but I do know that he's posted a lot of  
stuff that is still in copyright and sooner or later lawyers are going to eat  
his lunch. As long as nobody could figure out who owned the copyright, that was  
okay, but now that courts have ruled on the owner, it's another matter.
 
Anne
From Gutenberg9443 at aol.com  Mon Oct 18 14:12:35 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Mon Oct 18 14:12:51 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <6.35ff8a75.2ea58bc3@aol.com>

 
In a message dated 10/18/2004 1:54:55 PM Mountain Standard Time,  
Bowerbird@aol.com writes:

there  are a handful of tarking naugshlocks here
_so_ unworthy of my help they  made me decide
to decline to do any work for project gutenberg,
in spite  of its great historical importance and 
my highest regard for the genius of  michael hart.
i'm sure others, like you, will cover my absence,
while i  will be happy grazing greener pastures...


Why can't you just vanish, if such is your preference,
without being obnoxious about it?
From joshua at hutchinson.net  Mon Oct 18 14:28:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 18 14:28:31 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <20041018212821.A66BB4F583@ws6-5.us4.outblaze.com>


----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] re: poofing and tarking
Date: Mon, 18 Oct 2004 16:34:59 -0400 (EDT)

> 
>  	I've had a lot of luck stepping out of the box, and analyzing 
> the text based on the "style" of the text, versus the actual content 
> itself. I was approached by someone who is doing a paper and his PhD 
> thesis on this exact kind of approach. Basically (with my expertise 
> and help) he's taking the bulk of Gutenberg, importing every word from 
> every work into a database, and then running his own algorithms across 
> the entire collection, to pull out the styles by known authors.

<snipped some good stuff>

You've really intrigued me by your description of what you're working on.  Is there anywhere I can read up more on it?  Sounds very promising.

Josh
From Bowerbird at aol.com  Mon Oct 18 15:19:17 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct 18 15:19:34 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <193.313f28ef.2ea59b65@aol.com>

anne said:
>   I don't know why it was spurned, but 
>   I do know that he's posted a lot of 
>   stuff that is still in copyright and 
>   sooner or later lawyers are going to 
>   eat his lunch. As long as nobody could 
>   figure out who owned the copyright, 
>   that was okay, but now that courts have 
>   ruled on the owner, it's another matter.

i don't know anything about that.

i'm very supportive of people who will
have the guts to publish something and
take a chance at being dragged into court,
if they are making that thing _available_
when it was an orphan out of circulation.

but again, i don't know about blackmask,
so i don't know if that applies, or not...

-bowerbird
From Bowerbird at aol.com  Mon Oct 18 15:54:29 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct 18 15:54:58 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <1d8.2db73d37.2ea5a3a5@aol.com>

david said:
>   Quite a bit more than "a little", but yes, that's me.

that's what the winkey-smile was all about...       :+)


>   What other efforts are you alluding to? 
>   Why not help those people who insist on 
>   reinventing a fleet of new wheels, to collaborate 
>   with existing projects that have similar/same goals?

well, mainly because amazon isn't looking for any "help".
and my guess is that google will be equally isolationist.
moreover, the publishers they are both coddling do not 
share the "let's share" mentality that drove michael hart.

stanford has talked about a huge digitization effort, but
i'm not sure how far they are, or if they've even started.
but it is safe to say they've got enough money that they
won't be on the lookout for any volunteers to assist them.

and i'm not sure what's up with the million-book project,
or even who is in charge of that effort, but their objective
seems slightly different than the one guiding things here;
they seem to make scan-books, not cleaned-up e-texts.
(they _are_ doing o.c.r., but they're not proofing the stuff.)

and the specific area that's of _most_ interest to me is
the one comprised of the d.i.y. authors using cyberspace
to connect directly to their audience, sidestepping the
clutches of the middlemen who were necessary before.
this is _new_ content, so i think it'll eventually eclipse
the public-domain that is the thrust of project gutenberg...


>   And this is exactly what the Distributed Proofreaders project 
>   proposes to solve, and they've been pretty successful thus far, IIRC.

um... distributed proofreaders _might_ solve the problem of
inconsistent formatting _sometime_ down the line, if their
policies settle into place.  but if you've looked at their output 
throughout their history, you'll know they have not done it yet.

but, honestly, that shouldn't even be their job in the first place.
project gutenberg should have had solid formatting rules down
long before distributed proofreaders even came into being...


>   I've had a lot of luck stepping out of the box, and 
>   analyzing the text based on the "style" of the text, 
>   versus the actual content itself. I was approached by 
>   someone who is doing a paper and his PhD thesis on 
>   this exact kind of approach. Basically (with my expertise 
>   and help) he's taking the bulk of Gutenberg, importing 
>   every word from every work into a database, and 
>   then running his own algorithms across the entire collection, 
>   to pull out the styles by known authors.

well, that sounds interesting.        :+)

but "authorial style" is fairly irrelevant to the matter of
formatting the text in the way that a typographer does it,
or an e-book requires, which is actually the task at hand...


>   And I assume you've done this? 

yep.


>   And your routines are made public somewhere, 
>   so others can improve and correct them to continue to be better?

nope.  if i turned 'em loose now, the tarking naugshlocks here
could just pick them up for their own nefarious purposes, and
i have _no_ intention of letting my hard work be used that way.

i came here to the project gutenberg listserves in the first place
because i intended to share, but that intention has been squashed.

besides, all the negative feedback i've gotten here has convinced me
that people out in the world think that what i've done is impossible.
so i figure there must be a couple bucks in it, and why give them up?
i don't have a day-job, and my girlfriend deserves some nice things...


>   And where is his code? Where are his "routines"?

i don't believe that moynihan has ever made his macros available.
he _did_ offer them to project gutenberg, but i guess they declined.
i can't express to you just how stupid i think _that_ decision was...


>   ALL of the talk of how "easy" this is, is completely irrelevant, 
>   if nobody wants to actually contribute that knowledge back 
>   so others can improve and benefit from it.

if somebody thinks something (which would be very valuable to them)
is impossible, and you know that it isn't, don't you think that you should
_tell_them_?

i do.  that's why i'm here. 

i'm not gonna do it _for_ them, because they've abused me so
-- are you in the habit of helping people who mistreat you? --
but, since i do believe so passionately in electronic-books,
i feel i have an obligation to _try_ and make them wake up.

that's why i've stayed here for so long, nearly a year,
and taken all the abuse that they have dished out to me.
because i believe in e-books, and i admire michael hart.
(michael, by the way, has been very supportive of me.)

but i'm about to give up, because they just won't listen.
nonetheless, i feel _good_ about the fact that i _tried_...


>   If you're not willing to do this, then our conversation stops here.

ok.  no problem.

i believe in sharing, and told you what you could share with me,
if you want me to share my work back with you, but if that's not
acceptable to you, then i'm cool with that.  i'll go my own way...


>   What was that "bizarre reason"?

you'll have to ask the people in charge.


>   If you want someone to support "your" format, 
>   then you'll probably have to take that first step by 
>   justifying and documenting it. 

i've done that.  i've posted it several times on this listserve.
and i will send it to you backchannel.  11 dirt-simple rules.


>   The only page I could find describing the format was here:
>   http://czt.sourceforge.net/zml/
>   And I assume that's not your project or code.

you're right, that's not it.


>   What kind of errors? Incorrect hyphens? Broken paragraphs? 
>   Missing end quotes? (this is common)

all of those, yes.  and many more.  every kind imaginable.
and some that you never would have been able to imagine.


>   Impossible to regain, unless you have the original work in-hand, 
>   to see if there were actual CAPS used, or not. 

right.


>   Maybe the "errors" were intentional. Many authors use
>   poetic license to express their thoughts, and sometimes 
>   those things break the rules of grammar and spelling.

that's not it.  the all-caps convention dates back to the days of 
keypunch machines, when computers had no lower-case characters.
(there is a rumor that i started that michael hart actually entered
"alice in wonderland" on a keypunch machine.  don't know if it's true.)


>   How do you mean? You mean 1.jpg 1.jpg 1.jpg appearing in 
>   three places, but intended to represent 3 _different_ images? 

no, i mean 1.jpg being used in 3 different _e-texts_.

which means that you can't dump those e-texts into the same folder
without experiencing a filename crash.  which means that you need to
rework all the filenames in the library if you want 'em to be unique,
which is something that you do really want, if you value your sanity...
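
one way to get unique names is to prefix each file with the number of
the e-text it belongs to. what follows is only a minimal sketch of that
idea, not anything project gutenberg actually does: the e-text numbers,
the file lists, and the "NNNNN-name" scheme are all made up for
illustration.

```python
from pathlib import PurePosixPath


def unique_name(etext_id: int, filename: str) -> str:
    """Prefix a filename with its e-text number (hypothetical naming scheme)."""
    return f"{etext_id:05d}-{PurePosixPath(filename).name}"


def merge_into_one_folder(etexts: dict[int, list[str]]) -> dict[str, tuple[int, str]]:
    """Map every new, unique name back to (etext_id, original filename)."""
    merged: dict[str, tuple[int, str]] = {}
    for etext_id, files in etexts.items():
        for original in files:
            new_name = unique_name(etext_id, original)
            if new_name in merged:
                # Only happens if one e-text lists the same file twice.
                raise ValueError(f"collision on {new_name}")
            merged[new_name] = (etext_id, original)
    return merged


# Three invented e-texts that each ship a file called 1.jpg.
library = {
    101: ["1.jpg", "2.jpg"],
    202: ["1.jpg"],
    303: ["1.jpg", "cover.jpg"],
}
merged = merge_into_one_folder(library)
```

the mapping that comes back matters as much as the renaming, because
any reference to "1.jpg" inside an e-text has to be rewritten to the
new name as well.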


>   How do you express a Cyrillic text in 7-bit ascii? You can't.

right, that's the problem, though 8-bit e-texts have become common.
many mostly-english texts, though, do have foreign words in them
where an 8-bit diacritic was chopped down into a 7-bit character.
some of these can be automatically replaced.  but then you are
running the risk of turning what _was_ a non-diacritic into one.
(for example, burkey cites the change of "role" to "role" with a hat
on the "o".  but you know there are plenty of plain "roles" out there.)
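
the risk is easy to show in code. this is a minimal sketch with an
invented three-word replacement table (it is not burkey's list): the
blind version rewrites every match, while the cautious version only
reports candidates so that a person can decide each case.

```python
import re

# Hypothetical table: 7-bit spelling -> accented spelling.
RESTORE = {"role": "rôle", "cafe": "café", "naive": "naïve"}
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RESTORE)) + r")\b")


def blind_restore(text: str) -> str:
    """Replace every whole-word match, whether or not it ever had an accent."""
    return PATTERN.sub(lambda m: RESTORE[m.group(1)], text)


def flag_candidates(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, found, suggestion) for each match, changing nothing."""
    return [(m.start(), m.group(1), RESTORE[m.group(1)])
            for m in PATTERN.finditer(text)]


sample = "She played the role of a naive girl in the cafe."
changed = blind_restore(sample)
candidates = flag_candidates(sample)
```

the program cannot know whether the source book printed "role" with or
without the hat, so the flagged list still has to be checked against
the page scans or the paper copy.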


>   Such as?

many large works have been split up into smaller sections,
but then also "collected" into one e-text as well.

there are also "collections" of certain authors, and so on.

you'll want to cull out this redundancy...
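
a rough way to find such overlaps is to measure how many paragraphs of
one e-text also appear in another. the sketch below assumes paragraphs
are separated by blank lines and differ only in whitespace and case;
the sample texts are invented, and real files would need their headers
and footers stripped first.

```python
import hashlib


def paragraph_hashes(text: str) -> set[str]:
    """Hash each blank-line-separated paragraph after normalizing whitespace."""
    paragraphs = (" ".join(p.split()).lower() for p in text.split("\n\n"))
    return {hashlib.sha1(p.encode("utf-8")).hexdigest() for p in paragraphs if p}


def containment(part: str, whole: str) -> float:
    """Fraction of the part's paragraphs that also occur in the whole."""
    part_hashes = paragraph_hashes(part)
    if not part_hashes:
        return 0.0
    return len(part_hashes & paragraph_hashes(whole)) / len(part_hashes)


volume_1 = "Chapter one text.\n\nMore of   chapter one."
collected = "Chapter one text.\n\nMore of chapter one.\n\nChapter two text."
unrelated = "Something else entirely."

score_part = containment(volume_1, collected)
score_other = containment(unrelated, collected)
```

a score near 1.0 suggests the smaller file is a section of the larger
one; deciding which copy to keep is still a human judgment.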


>   Are you revealing the "problems" in a condescending way? 
>   Or in a constructive way? 

i know of no more constructive way to reveal a problem than to
diagram the code that will fix it and volunteer to write the app.
i've done that, and had shit heaped at me.  you be the judge.


>   The way you approach the "Hey, this is broke" process 
>   is very telling as to how you will be received and 
>   responded to for same.

and the way i am "received and responded to"
is very telling as to whether i will _continue_
to offer my code that will help fix the problems...


>   which is exactly why you should have 
>   your own mirror of Gutenberg, 
>   or a subset of it as you work on the pieces.

i was just letting you know that piece of information.
otherwise, you might think that you could get the d.v.d.
of the e-texts, and simply work on the e-texts from that.
odds are some of those files have already been replaced...


>   If you are "moving on", then it behooves you to try to 
>   contribute what you've learned (in terms of knowledge, code, or 
>   "routines") back to those who will continue to contribute and learn.

i've left a ton of messages, detailing the problems and
laying out exquisite details about the fixes i suggested.
feel free to mine that, if you can plow through the flak.

(the vast bulk of my posts are over on the u.n.c. archives;
i'm not sure if those have been brought to this list, or not.
you should also be aware that the .html conversion program
over on the u.n.c. machine was faulty, and many of the threads
are cut off in midstream.  the full thread is in the .html source,
so you would have to recover the missing messages from there.
that's a good example of how a conversion program can mess up.)


>   We're only here to help the next generation learn and improve. 
>   If we're not leaving anything here by which others can 
>   remember us and grow themselves; if we're not teaching others 
>   as we learn ourselves, then what is the point?

oh, don't get me wrong, david...          :+)

as i outlined above, there are plenty of arenas in the e-book world,
project gutenberg is just one of them.  all the others need help too.

so i don't intend to stop speaking, or to stop working on e-books.
i've been doing continuous work on e-books for over 25 years now.

to the contrary, i intend to _continue_to_speak_, and _loudly_,
and to finally _go_to_work_ and get some of my things finished,
so e-book authors out there in the world can start _using_ them.

the difference is, rather than speak here _quietly_  and _privately_
with the project gutenberg folks here _behind_the_scenes_ on their
own listserves, trying to get them to pay attention to the problems
in their library, i will instead speak _publicly_, using my new blog,
making noise about the many problems that are being ignored here,
so at least the rest of the world learns -- and grows -- from them.

so instead of working to make the "people in charge" here smarter,
which has essentially meant banging my head against a brick wall,
i can instead spend some time _productively_ by making programs
that people can use to spread e-books out into the world at large.
my time, thoughts, and work-product are too valuable and important
to continue wasting them here on people who do not appreciate them.

yours probably are too, but maybe you're more "diplomatic", and
maybe they won't ignore you or badger you when you say something
that they desperately _need_ to hear, even if they don't _want_ to.

oh yeah, and eventually i'll even come back and clean up the e-texts,
when i've automated my various procedures to the fullest extent and
implemented them in code, and the people-in-charge here have made
a mess of the library in the process of trying to make x.m.l. work.

because michael hart deserves better than that...

-bowerbird
From hacker at gnu-designs.com  Mon Oct 18 18:08:16 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Mon Oct 18 18:08:48 2004
Subject: [gutvol-d] [ANN] Project Gutenberg IRC channel
Message-ID: <Pine.LNX.4.61.0410182053490.19416@aphrodite.gnu-designs.com>


 	I've offered this before, and maybe it got lost in the chaff 
of previous messages and threads, so I'll offer it up again.

 	If anyone wants to feel free to discuss issues related to 
Project Gutenberg, ebooks, conversions, tools, bugs, the weather, or 
whatever else in "real-time", they can join our network to do so. I 
run an irc network that is dedicated to developer-related support 
issues on various projects, and I would like to extend that invitation 
to include Project Gutenberg as well.

 	Feel free to join the channel #gutenberg or #project.gutenberg 
on our network... (irc.sourcefubar.net) and talk about any of the 
current, past, future, or whatever events you'd like, related to PG 
and other similar projects and products. We have redundant tri-coastal 
links, so the network will not "split" or go down like (cough) other 
similar networks.

 	We also offer ssl-only ports, for those who wish to make sure 
their conversations are "secured". Many project teams use our network 
for exactly that purpose, in private, secured channels to discuss 
issues related to their projects.

 	If you join the same network on port 994, and configure your 
client appropriately (making sure to get the validated cert from 
cacert.org), you will be able to talk securely from client-to-server, 
without the fear of snooping.

 	These are exciting times, and I hope to see everyone there!

 	Can someone with authority publicize this on PG's website?

 	Welcome aboard!


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From Bowerbird at aol.com  Mon Oct 18 20:13:39 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct 18 20:13:59 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <67.356bca9c.2ea5e063@aol.com>

anne said:
>   Why can't you just vanish, 
>   if such is your preference,
>   without being obnoxious about it?

i don't think my posts are "obnoxious",
anne, or i wouldn't write them.

but if you think they _are_ "obnoxious",
anne, then why do you read them?

-bowerbird
From Bowerbird at aol.com  Mon Oct 18 23:15:41 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Oct 18 23:16:04 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <1d4.2d17520f.2ea60b0d@aol.com>

jeroen said:
>   Well, let's keep the name calling off-line, and the discussion pure...

sounds like an excellent idea to me.  let's see if marcello will agree.

***

i appreciate your analysis, and agree with it in large part,
because i think you've faced a good number of the problems.

to pull them out into a bullet-point list, they are these:

>   that semantically tagged is an ideal, that even the most 
>   ambitious attempts at a generic DTD for pre-existing texts 

>   (and that is what we are mostly dealing with in PG) 
>   have not reached

>   and is either unreachable (since we can't know 
>   the original intent with much of the formatting we encounter) 

>   or impractical
>   (since the effort to do all this tagging is just too big

>   and isn't really needed by 99% of the users.) 

>   In my opinion, the best attempt to 
>   such a generic beast has been the TEI effort

>   which is described in a massive 1400 page document, 

>   still requires customization for numerous academic projects 

>   (both are bad news; both are unavoidable 
>   given the complexity of the task)


>   but which can cover 95 percent of all text 
>   with just 5 percent of that bulk 

>   in an incarnation called TEI-Lite, 
>   and that is basically all I suggest to PG to adopt as a standard. 

so if i was to summarize the bulk of what you've said here,
concentrating on the negative, but hopefully in a fair way...

     semantically tagging is an ideal 
     which may be unreachable, 
     and is certainly impractical,
     since it is a big effort and is
     just not needed by most readers.
     
     one method -- t.e.i. -- runs to
     1,400 pages of documentation,
     yet still requires "customization".
     
     however, a less-complex subset
     -- called t.e.i.-lite -- is available,
     and that is what i recommend...

again, i don't mean to "load" the argument by concentrating
on the negative aspects of a heavy-markup approach like this,
because i can certainly see benefits of marked-up e-texts too.

certainly a minimal form of markup is practically a requirement
to move the e-texts to a reasonable e-book and typographic future.

and if the library was already marked-up in x.m.l., and working,
i would probably have no objections at all to continuing with it...

but the reality is that the library is _not_ marked-up already.
so it is necessary for us to examine very closely the _costs_ of
_doing_ any markup, to make sure the _benefits_ outweigh them.

in a phrase, we need to be cognizant of the _cost-benefit_ratio_.

in particular, we should also consider _all_forms_of_markup_
that we think could give us a reasonable set of the benefits at a 
range of costs, to see which gives us the best cost-benefit ratio.


>   Doing fully automatic conversion to good paged PDFs for 
>   printing nice copies (and I mean good, as different from workable) 
>   will probably always remain a dream

sometimes dreams come true, you know...      :+)


>   as good layout, just like good typographic design, 
>   is a skill, learned through doing it a lot. 

i agree.  completely.

it is also worth noting that we need to be able to deliver
not just _one_ "good paged .pdf" of an e-text, but rather
an entire _spectrum_ of "good paged .pdfs" -- in order to
satisfy the entire spectrum of _readers_ out in the world.
we can't just churn out a .pdf in 12-point-type and be done,
because some readers will want 18-point-type, or 36-point.
most will want a plain white background, but some will want
a pale blue one, or a faint yellow one, or who knows what color.

to be able to give the user that full range of options and _still_
deliver "a good paged pdf with good typographic design" is hard!

i believe it is also true, however, that this skill can be
implemented in source-code if we dedicate some effort.
(it's difficult.  but it's not like sending a man to the moon.)

i have taken the first steps in making that effort, and i would
encourage you to feel free to give me constructive criticism
in examining the progress that i've made, and guiding it along.
that beta-test listserve:  zml_talk-subscribe@yahoogroups.com

or, since you are doing well here in the realm of theoretical,
perhaps you might want to instead specify what "a good pdf"
would look like, or what _you_ mean by a "nice" printed copy.

i don't think there is a lot of awareness here along these lines,
and i think it would move the discussion along _significantly_
if we could come to share some agreement on what we _want_.

at some point in time, we are going to have to evaluate the quality
of the output we get from various methodologies, to determine if it
is "good enough" or not.  to do that, we need to develop a standard...

i'm not saying i think it will be _difficult_ to create our standard.
to the contrary, i think it will be fairly easy, once we get started.
rather what i am saying is that that work has not been done here,
so we are still operating in the dark to a large degree.


>   Even in a highly programmable environment such as TeX, 
>   I've never been able to print something from "semantic" markup 
>   without manual interventions once in a while -- 
>   even for something as arcane as a two column dictionary.

i believe you.


>   Similarly, doing a good HTML (as different from a reasonable HTML) 
>   will probably also require manual intervention and tweaking 

i believe you here, too.

and once again, there is little conscious agreement here
about _what_ constitutes a "good" .html version of an e-text
(as distinguished from a "reasonable" one, to use your terms).

as with the pdf/print standard, i think that it will be fairly simple
to come to agreement about what we want .html versions to be like
-- the best of the files being done now come fairly close, i'd say --
but we haven't actually done the process of forming that agreement.


>   but both these things do not disqualify the large benefits 
>   we  could have from having TEI tagged master copies 

here you are confounding two arguments.

the argument for having a "master" version that will
generate all the "ancillary" versions is _overwhelming_.
it's just ridiculous to try and maintain multiple versions;
the costs of that are far too high for the benefits returned.

but the argument that that "master" version should be t.e.i. 
-- or t.e.i.-lite or any of the other x.m.l.-based formats --
is _far_ less compelling.  i think z.m.l. makes a better master.
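
the "one master, many outputs" idea can be sketched without taking a
side on the format. in the sketch below the master is just a list of
blocks held in memory; the Block type and the two block kinds are
invented for illustration and are neither t.e.i. nor z.m.l.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Block:
    kind: str  # "heading" or "para" (hypothetical block kinds)
    text: str


def to_plain(blocks: list[Block]) -> str:
    """Render the master as plain text, one blank line between blocks."""
    return "\n\n".join(b.text for b in blocks) + "\n"


def to_html(blocks: list[Block]) -> str:
    """Render the same master as minimal HTML."""
    tags = {"heading": "h2", "para": "p"}
    lines = [f"<{tags[b.kind]}>{escape(b.text)}</{tags[b.kind]}>" for b in blocks]
    return "\n".join(lines) + "\n"


master = [
    Block("heading", "Chapter 1"),
    Block("para", "A correction made once & in one place."),
]
plain = to_plain(master)
page = to_html(master)
```

a typo fixed in the master shows up in every output the next time they
are generated, which is the whole case against maintaining the
versions separately.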


>   even if just at a relatively simple level of tagging 
>   (just marking headings, divisions, italics, footnotes, and tables).

i wholeheartedly agree that a "simple level of tagging"
that "marks" these type of things in an unequivocal manner
is a very important minimum-usability hurdle to clear.

as you might expect, though, i don't think angle-brackets
are necessary at all to create this "simple level of tagging".

i do _not_ expect you to take that on faith, however.
i'll show you how to do it.  the proof is in the pudding.


>   The task of producing nice HTML / Printable versions 
>   of XML documents is further complicated by the 
>   highly verbose and somewhat unintuitive model of XSLT, 
>   which is presented as the most important tool for this task

agreed, and i'm glad you recognize the huge costs in this arena.


>   from the computer scientist purist point of view 
>   that might be true, but for many lesser gods, 
>   who think five lines of basic is already a lot, 
>   its  functional programming model and verbosity 
>   is a real piss-off.

i'm glad you said that, so i didn't have to...


>   Getting 14000+ texts to XML can be done, 
>   just as they were produced initially, 
>   by starting somewhere with the first one,
>   and not stopping until we've completed them all.

that's the attitude!        :+)

is that the wisest choice of action, though?
i'm not nearly so convinced of that.
i think we need to set a better path,
and go off on _that_ one...


>   A very simple alternative way would be to 
>   load them in OpenOffice, 
>   apply the formatting you like 
>   and save it 

i am even less convinced of the wisdom _or_ 
the "simplicity", of _that_ course of action...

any manual methodology is likely to be quite inferior,
from a cost-benefit perspective, because the costs
would be astronomical.  even if you're using volunteers,
at some point, you have to place value on human labor...

if you cannot automate some 95% of the initial markup,
you need to take your method back to the drawing board.
we need to save the human labor to do the _checking_ of
the markup, not waste it doing the initial markup itself...
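
here is a minimal sketch of that division of labor: a program makes
the first guess at the structure and flags only the doubtful cases for
a person to check. the two rules in it are invented for illustration
(they are not the z.m.l. rules), and the 60-character cutoff is an
arbitrary assumption.

```python
import re

# Hypothetical rule: a line like "CHAPTER I" or "Book 2" is a heading.
CHAPTER = re.compile(r"^(chapter|book|part)\s+([ivxlc]+|\d+)\b", re.IGNORECASE)


def guess_blocks(text: str) -> list[tuple[str, str, bool]]:
    """Return (kind, text, needs_review) for each blank-line-separated chunk."""
    results = []
    for chunk in text.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        one_line = "\n" not in chunk
        if one_line and CHAPTER.match(chunk):
            results.append(("heading", chunk, False))
        elif one_line and len(chunk) < 60 and not chunk.endswith((".", "?", "!", '"')):
            # Short line without closing punctuation: likely a heading, but unsure.
            results.append(("heading", chunk, True))
        else:
            results.append(("para", chunk, False))
    return results


sample = (
    "CHAPTER I\n\n"
    "Down the Rabbit-Hole\n\n"
    "Alice was beginning to get very tired\n"
    "of sitting by her sister on the bank."
)
blocks = guess_blocks(sample)
to_check = [text for kind, text, needs_review in blocks if needs_review]
```

only the entries in to_check go to a volunteer; everything else is
accepted as marked unless a later proofing pass disagrees.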


>   of course that formatting would be very much non-"semantic".

which, of course, negates a lot of the benefits as well,
and thus degenerates the cost-benefit ratio even further.

(and i should point out that none of your discussion really
gets at the essence of what _semantic_ markup would be.)


>   (Still formatting his ebooks in SGML based TEI)

i respect the work you are putting into the effort, immensely.

-bowerbird
From jonathan_ingram at yahoo.com  Tue Oct 19 05:33:46 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Tue Oct 19 05:33:48 2004
Subject: [gutvol-d] [ANN] Project Gutenberg IRC channel
In-Reply-To: <Pine.LNX.4.61.0410182053490.19416@aphrodite.gnu-designs.com>
Message-ID: <20041019123346.80922.qmail@web41726.mail.yahoo.com>


--- "David A. Desrosiers" <hacker@gnu-designs.com> wrote:
>  	Feel free to join the channel #gutenberg or #project.gutenberg 
> on our network... (irc.sourcefubar.net) and talk about any of the 
> current, past, future, or whatever events you'd like, related to PG 
> and other similar projects and products. We have redundant tri-coastal 
> links, so the network will not "split" or go down like (cough) other 
> similar networks.

Along similar lines, we at Distributed Proofreaders also use a chat/conference
room to communicate. We use Jabber's multi-user chat feature, though, rather
than IRC. If you'd like to join, then create a Jabber account (if you haven't
already), and connect to pgdp@muc.jabber.org . All welcome, particularly if
you're a user of/contributor to DP.

-- 
Jon Ingram


From marcello at perathoner.de  Tue Oct 19 06:54:37 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 06:54:43 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <1d4.2d17520f.2ea60b0d@aol.com>
References: <1d4.2d17520f.2ea60b0d@aol.com>
Message-ID: <41751C9D.2010609@perathoner.de>

N.B. Bowerbird's first post on gutvol-p on 11/06/03 is attached below in 
its full glory for the readers' convenience.


 > bowerbird 11/06/03
 > it reminds me that i firmly believe us users should file
 > a class-action lawsuit against you computer overlords

 > bowerbird 10/14/04
 > is this kind of name-calling condoned on this listserve?

Bowerbird started name-calling in his very first post and now he is 
soooo sensitive about it. In good ol' usenet tradition, he who dishes 
out freely must also be able to pocket graciously, but, of course, this 
applies to other people, not to Bowerbird himself.

Also he got himself kicked out from at least one newsgroup (ask Jon 
Noring for details) and was at one point set under moderator supervision 
on this list. (Why has this changed? Have some bits got lost in the move 
to pglaf.org?)


 > bowerbird 11/06/03
 > this isn't flame-bait, 'cause i ain't even gonna argue with ya.
 > i've concluded it's a waste of my time to even _discuss_ x.m.l.

Since then, Bowerbird has done nothing else than _cuss_ XML, that is: 
belittle the XML language and the people who are using it. He concluded 
that he'd rather waste _other_ people's time than his own.

If Bowerbird was really interested in establishing an alternative to XML 
markup, he would have fired up emacs and started coding his reader. In a 
month or two he would have shown us the first prototype. If the thing 
really was heaps better than XML, we would have acclaimed him and 
considered changing the DP formatting rules to his format.

Of course, flamewars being his favorite pastime, he got nowhere with his 
reader.

Furthermore he wastes the time of other volunteers and newcomers to this 
mailing list by luring them into yet another endless discussion of the 
exact same topic we already had plenty before. (The topic being: the 
self-celebration of His Royal Highness Bowerbird.)


 > bowerbird 11/06/03
 > you're already over-budget and severely behind-schedule [...]
 > i have written such a program, and i'll have a beta version soon.

Bowerbird first announced his reader on 02/14/03.

   > -- for immediate release --
   > [...]
   > bowerbird intelligentleman announces
   > an open-source project geared toward
   > creating an o.e.b. "presentation system",
   > i.e., a cross-platform reader-program
   > that will allow users to read o.e.b files.
   > [...]
   > bowerbird further indicated that he is fully confident that
   > the effort would bear fruit quickly, since he has previously
   > programmed a wide variety of electronic-book applications.

   http://www.gnutemberg.org/pipermail/libergnu.mbox/libergnu.mbox

That was 20 months ago.

Since then we saw a lot of announcements but never a line of source 
code. (Note: he says "Open Source" in his press release, and he also 
says OEB, which is an XML application.)

If Bowerbird was in good faith, he would have published some source code 
immediately after his announcement to have people review it and comment 
on it. As it stands, nobody has ever seen one single line of his alleged 
mother of all readers. (All we did see were some `screenshots' probably 
done with Microsoft Paint.)


 > bowerbird 11/06/03
 > so you better know i'm prepared to deliver.

I defy Bowerbird to publish the source code of what he has done in 20+ 
months of development. Hic Rhodus, Bowerbird, hic salta! Prove to us 
that you can build a better reader. Don't give us any of your lame 
excuses but deliver now or be silent forever. (Lame excuses we already 
had include: I won't show you because you are so nasty.)


 > bowerbird 11/06/03
 > in a phrase, it's time to put up or shut up.

Of course, this rule again applies to other people, not to Bowerbird 
himself: he did not put up and never will shut up.


Conclusion: Bowerbird is a kook

   (def: http://www.catb.org/~esr/jargon/html/K/kook.html )

who knowingly wastes the time of volunteers who could otherwise do many 
useful things for PG. And he has got nothing to show for it.


And now, for the enlightenment of the newcomers, and for the 
entertainment of those of us who know how hard Bowerbird has been 
working and how much he has achieved in this short year, Bowerbird's 
posting debut on gutvol-p of nearly a year ago ... unabridged.


> i've been writing some apps for project gutenberg,
> so i subscribed to this listserve this evening, and
> i went back and read all the posts for a full year,
> just to get the flavor of what has gone on here...
> 
> boy, what a waste of time...        :+)
> 
> it reminds me that i firmly believe us users should file
> a class-action lawsuit against you computer overlords
> for all the time and trouble you have dragged us through
> in trying to transition us to x.m.l.
> 
> you're already over-budget and severely behind-schedule
> in delivering on the things x.m.l. was supposed to bring us,
> and we haven't seen even a fraction of the promised benefits.
> 
> this isn't flame-bait, 'cause i ain't even gonna argue with ya.
> i've concluded it's a waste of my time to even _discuss_ x.m.l.
> 
> there are 10,000 e-texts in the project gutenberg library --
> 4,000 more than there were when you had your last flamewar
> -- so you've got lots of opportunity to show the value of x.m.l.,
> just get to work, and let us know when you're done doing markup.
> 
> heck, don't even bother to contact us then, just go right to work
> making some x.m.l.-savvy _viewer-programs_ for us end-users,
> because it doesn't do us one bit of good to have marked-up files
> if we don't have any viewers that can make use of that mark-up.
> 
> in a phrase, it's time to put up or shut up.
> 
> and yes, i realize you'll throw that challenge right back at me,
> so you better know i'm prepared to deliver.
> 
> i say we don't need much "markup" -- sometimes _none_ --
> to turn project gutenberg's plain-ascii e-text files into a
> slick electronic-book experience for end-users, if we only
> put a little bit of intelligence into an e-book viewer-program.
> 
> i have written such a program, and i'll have a beta version soon.
> so i hope i've pissed you off, because i _like_ hostile beta-testers;
> i trust them to step past the polite praise and tell me what's wrong.
> 
> that's enough for now, i gotta get back to work.  and so do you...
> 
> -bowerbird


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Oct 19 08:21:46 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 08:22:03 2004
Subject: [gutvol-d] Why Bowerbird is a kook
Message-ID: <1e4.2d39426d.2ea68b0a@aol.com>


once again, because i guess i haven't said it enough,
the beta-test for my viewer-program is now open...

you can join in and see the elusive software yourself
by e-mailing:  zml_talk-subscribe@yahoogroups.com

as soon as i have enough people to get that test going,
i'll be able to move on to that class-action suit against
the computer overlords for wasting our time with x.m.l.
i think the settlement on _that_ is gonna be _huge_!      ;+)

-bowerbird
From marcello at perathoner.de  Tue Oct 19 08:28:43 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 08:28:49 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <1e4.2d39426d.2ea68b0a@aol.com>
References: <1e4.2d39426d.2ea68b0a@aol.com>
Message-ID: <417532AB.9000400@perathoner.de>

Bowerbird@aol.com wrote:

> you can join in and see the elusive software yourself
> by e-mailing:  zml_talk-subscribe@yahoogroups.com

You said it was Open Source. Then why don't you just mail me the sources 
or even better, post a link to the sources tarball, so everybody can see?

Or are you going back on that?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Oct 19 09:33:57 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 09:34:13 2004
Subject: [gutvol-d] Re: Extra spaces in html files
Message-ID: <197.316bf610.2ea69bf5@aol.com>

karl said:
>   "Save as HTML" normally is not good enough.

well, that tells me what's "not good enough".

but it doesn't tell me about what "good" is...


>   Why do you ask?

because i want to know what you think.

and, as i said, i think the conversation here would benefit
from creating a _standard_ that we can use to _evaluate_
the output that we expect from the methodology we adopt.

if a procedure can create an .html version and a .pdf version
-- and whatever other versions we decide are necessary --
that meet this standard, then we know we've got a winner...


>   We can keep the old file unchanged for the time being.  
>   XML produced by http://www.pgdp.net/ 
>   is good enough to work with.

ok, i'll take your word for it on that.

so i can take an e-text, run it through some converter
located somewhere on the site (where, exactly, is it?),
and come out with some x.m.l., if i understand correctly.

if i do, then the question as to how you get the entire library
converted over is answered -- run all the e-texts through this.

and thus the conversion to x.m.l. is simple.  (if i understand you.)

and then what?

how do i turn that x.m.l. file into an .html file?  into a .pdf?
back into a plain-text file?  (for looking "ahead" to the time
when the x.m.l. file, as the "master", is the only one retained.)
i know the standard answer is through x.s.l.t. conversions,
but how does a person step through those conversions today?

there is also the question of _maintaining_ the x.m.l. file --
entailing things like editing errors out of it, updating it, etc.
where do we get volunteers who have the expertise to do that?
a quick review at x.m.l.-coding -- i'm looking at a .tei version
of alice in wonderland from marcello -- reveals it is complex,
definitely not the type of thing you could entrust to most people.
since the whitewashers are even now at the point of overload
and burnout, just from the task of verifying the submissions of
distributed proofreaders, who will be responsible for _this_?


>   For converting TEI XML to HTML and PDF 
>   you can use Sebastian Rahtz' XSL stylesheets:
>   http://www.tei-c.org/Stylesheets/teixsl.html

thanks, that's good info.  would you please take some e-texts
-- you can choose any you want -- and convert them to x.m.l.
and do the output conversion to .html and .pdf for us please?

that way, we can subject these output-files to evaluation...


>   I'm old fashioned and like playing with DSSSL tools 
>   (that's all in German and not that polished nor finished 
>   -- take it as a proof of concept):
>    http://www.gnu.franken.de/Tieck/
>    http://www.gnu.franken.de/Tieck/Dokumente/Koepke/

i don't know what "dsssl" is, or understand german, but
i'll go take a look at those websites to see what i see...

in the meantime, will you generate those samples please?
(or feel free to point us to some that you've already done.)

-bowerbird

p.s.  i've jotted down a few of the elements that _i_ think
are essential to an electronic-book, and which should be
included in any "standard" that we create, and will post that...
From marcello at perathoner.de  Tue Oct 19 09:49:41 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 09:49:48 2004
Subject: [gutvol-d] How new technology scares away users
Message-ID: <417545A5.5060105@perathoner.de>

This is a video that turned up in the TEI list about the problems users 
have with all those new-fangled publishing technologies.

It's in Danish, but you still get the gist.

   http://homepages.nyu.edu/~mz34/helpdesk.WMV


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Oct 19 09:56:27 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 09:56:41 2004
Subject: [gutvol-d] Why Bowerbird is a kook
Message-ID: <e2.3ebe5bb.2ea6a13b@aol.com>

marcello said:
>   You said it was Open Source. 

my open-source effort failed to attract any programmers.
so it's dead in the water.  (oh, and here's a hint, marcello:
if you see me issuing a _press_release_, it's likely a joke,
as i make fun of people who think "spin" is doing something.
if it's a pro-x.m.l. press release to boot, you can be certain.)

but yeah, no programmers = no open-source program...

however, jon noring's open-source project is still alive --
   http://www.openreader.org
-- and i can assure you that that one is _not_ a put-on.
(which doesn't mean it will eventually succeed either.)
so i would encourage any interested people to join that.
i'm guessing jon would welcome programmers _warmly_.

meanwhile, my own individual viewer-program is fine!
did i mention that i'm now getting the beta-test going?
you can subscribe!  zml_talk-subscribe@yahoogroups.com


>   Or are you going back on that?

many open-source proposals never get off the ground.
just a fact of life.  thanks for your interest, however.
but now i must get back to the discussion in progress...

-bowerbird

p.s.  i'm not "soooo sensitive", as you put it.  if name-calling
_is_ condoned on this list -- i still haven't got an answer --
then that's just _fine_ with me, because there are a couple
tarking naugshlocks that i'd be happy to call names...     :+)
you see, i am able to maintain a _sense_of_humor_ in this.
i just want to make sure that i understand the ground-rules...
From hacker at gnu-designs.com  Tue Oct 19 10:02:04 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Tue Oct 19 10:02:41 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <197.316bf610.2ea69bf5@aol.com>
References: <197.316bf610.2ea69bf5@aol.com>
Message-ID: <Pine.LNX.4.61.0410191247470.21990@aphrodite.gnu-designs.com>


> so i can take an e-text, run it through some converter located 
> somewhere on the site (where, exactly, is it?), and come out with 
> some x.m.l., if i understand correctly.

 	Can we start using proper acronyms here? The industry accepted 
term you want to be using here is XML, not "x.m.l.", unless by "x.m.l" 
you mean some other format which is not XML.

> how do i turn that x.m.l. file into an .html file?  into a .pdf? 
> back into a plain-text file?  (for looking "ahead" to the time when 
> the x.m.l. file, as the "master", is the only one retained.) i know 
> the standard answer is through x.s.l.t. conversions, but how does a 
> person step through those conversions today?

 	You use an XSLT.

> there is also the question of _maintaining_ the x.m.l. file -- 
> entailing things like editing errors out of it, updating it, etc. 
> where do we get volunteers who have the expertise to do that?

 	Create a tool that can go from PG etext, in "normalized" 
format to PG's accepted version of an XML document of that PG work.
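
 	[A minimal sketch of the kind of tool described above, offered only 
as an illustration. It assumes a "normalized" etext is one where 
paragraphs are separated by blank lines. The element names (book, 
title, body, p) and the function name etext_to_xml are hypothetical 
and are not an accepted PG format.]

```python
import re
import xml.etree.ElementTree as ET


def etext_to_xml(text, title="Untitled"):
    """Turn blank-line-separated plain text into a tiny XML tree.

    The vocabulary used here is made up for this sketch.
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    body = ET.SubElement(book, "body")
    # A paragraph is a run of non-blank lines; one or more blank lines end it.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            # Rejoin hard-wrapped lines so the paragraph is a single string.
            ET.SubElement(body, "p").text = " ".join(lines)
    return book


sample = (
    "Alice was beginning to get very tired\n"
    "of sitting by her sister.\n"
    "\n"
    "There was nothing so very remarkable in that.\n"
)
print(ET.tostring(etext_to_xml(sample, "Sample"), encoding="unicode"))
```

 	[Under this approach the plain text stays the file that people 
edit, and the XML is regenerated whenever the text changes.]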

> thanks, that's good info.  would you please take some e-texts -- you 
> can choose any you want -- and convert them to x.m.l. and do the 
> output conversion to .html and .pdf for us please?

> that way, we can subject these output-files to evaluation...

 	I thought your tool did exactly this. Am I mistaken?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From ciesiels at bigpond.net.au  Tue Oct 19 10:02:41 2004
From: ciesiels at bigpond.net.au (Michael Ciesielski)
Date: Tue Oct 19 10:02:55 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <e2.3ebe5bb.2ea6a13b@aol.com>
References: <e2.3ebe5bb.2ea6a13b@aol.com>
Message-ID: <417548B1.5080305@bigpond.net.au>

Bowerbird@aol.com wrote:

>meanwhile, my own individual viewer-program is fine!
>did i mention that i'm now getting the beta-test going?
>you can subscribe!  zml_talk-subscribe@yahoogroups.com
>  
>

Your beta test program isn't 'fine'. It's the most counter-intuitive 
software I've ever used. There's also the itty-bitty issue of it not 
being able to *open* files, save your preselected texts, the sources of 
which are conveniently hidden.

The 'talk' list is farcical, as all posts must be approved by bowerbird.

Mike

Email me off-list if you'd like a copy of this rancid pudding without 
surrendering your soul/Yahoo ID.
From Bowerbird at aol.com  Tue Oct 19 10:12:27 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 10:12:42 2004
Subject: [gutvol-d] Why Bowerbird is a kook
Message-ID: <1a7.29e5ae04.2ea6a4fb@aol.com>

mike said:
>   Email me off-list if you'd like a copy of this
>   rancid pudding without surrendering your soul/Yahoo ID.

alright, mike!

i haven't even released my app yet,
and it's already being _bootlegged_!

that makes me feel all warm and fuzzy inside...      :+)

thankyouthankyouthankyou...

-bowerbird

p.s.  i'll answer the rest of mike's post next week (or next month),
but this part was just too good to pass up...        ;+)
From hacker at gnu-designs.com  Tue Oct 19 10:14:15 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Tue Oct 19 10:14:41 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <e2.3ebe5bb.2ea6a13b@aol.com>
References: <e2.3ebe5bb.2ea6a13b@aol.com>
Message-ID: <Pine.LNX.4.61.0410191313100.21990@aphrodite.gnu-designs.com>


> but yeah, no programmers = no open-source program...

 	Aren't you the programmer in this case? You get to choose the 
license, as the person responsible for actually creating the project 
and writing the code. Or are you asking for others to write the code?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From marcello at perathoner.de  Tue Oct 19 10:14:41 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 10:14:49 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <197.316bf610.2ea69bf5@aol.com>
References: <197.316bf610.2ea69bf5@aol.com>
Message-ID: <41754B81.6070800@perathoner.de>

Bowerbird@aol.com wrote:

> thanks, that's good info.  would you please take some e-texts
> -- you can choose any you want -- and convert them to x.m.l.
> and do the output conversion to .html and .pdf for us please?

Why don't you go ahead and publish the source code for the "Open Source" 
ebook reader program you announced on 14 Feb 2003 and which has been 
almost in beta stage ever since?

Instead of burdening your homework onto other volunteers?

If you want to see the output from those stylesheets you can run them 
yourself. Unlike your vapourware reader, Sebastian's stylesheets are 
working and put up for download.


 > that way, we can subject these output-files to evaluation...

If you want to evaluate, go to

    http://www.gutenberg.org/tei/examples/

there are TEI source + HTML, PDF, TXT and PalmDoc generated versions for 
Alice in Wonderland and Life on the Mississippi ready to download.

Of course, you'll also find the sources for the conversion tools there, 
under GPL.

Of course, you'll also find there an online utility to convert from TEI 
to HTML, PDF, TXT and PalmDoc.

Of course, you'll also find there a manual explaining how to mark up 
your text so they work best with the conversion utilities.


> in the meantime, will you generate those samples please?
> (or feel free to point us to some that you've already done.)

In the meantime will you roll a tarball of your "rancid pudding"* reader 
sources and post them please?

The only thing which hasn't made an inch of progress in 20 months is 
your reader program. But maybe that's the reason you want to keep 
everybody else too from doing useful stuff.


* "rancid pudding": endearing epithet uttered by a beta-tester of this 
reader.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Tue Oct 19 10:29:00 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 10:29:08 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <e2.3ebe5bb.2ea6a13b@aol.com>
References: <e2.3ebe5bb.2ea6a13b@aol.com>
Message-ID: <41754EDC.9000406@perathoner.de>

Bowerbird@aol.com wrote:

> my open-source effort failed to attract any programmers.
> so it's dead in the water.  (oh, and here's a hint, marcello:
> if you see me issuing a _press_release_, it's likely a joke,

Let me get this straight:

1. 14 Feb 2003: You announce you will code an open source ebook reader.

2. 19 Oct 2004: You have nothing to show.

3. You retroactively declare the announcement to be a joke.

4. You think that did save your face.

Think again.


> bowerbird: 14 Feb 2003 press release
> bowerbird further indicated that he is fully confident that
> the effort would bear fruit quickly, since he has previously
> programmed a wide variety of electronic-book applications.

 > bowerbird: 19 Oct 2004
> but yeah, no programmers = no open-source program...

So you admit you were lying to the press as you told them you were 
confident you could pull off this thing because you were such a good 
programmer yourself.


> meanwhile, my own individual viewer-program is fine!
> did i mention that i'm now getting the beta-test going?
> you can subscribe!  zml_talk-subscribe@yahoogroups.com

Yeah. I heard from one of your beta-testers: "Rancid pudding".


>>  Or are you going back on that?
> 
> many open-source proposals never get off the ground.

So you just switched to closed source.

Why should I be even remotely interested in a reader that:

  - is closed source
  - uses a proprietary non-standard file format
  - has got "rancid pudding" as best review to date?

If I wanted that, minus the bad review, I would take the Micro$oft 
Reader, which is ready and working today.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Oct 19 10:56:54 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 10:57:08 2004
Subject: [gutvol-d] Re: Extra spaces in html files
Message-ID: <f6.43244b0c.2ea6af66@aol.com>

david said:
>   Can we start using proper acronyms here? 
>   The industry accepted term you want to be using here is XML, 
>   not "x.m.l.", unless by "x.m.l" you mean some other format 
>   which is not XML.

well, i'm not much for "industry accepted terms",
but the _real_ reason i put the periods in is that
with any acronym, when you use all-lower-case,
as i do, using periods helps reader comprehension.
i even do it with z.m.l., so it's no slam on x.m.l., ok?
if it bothers you, just ignore it.

***

i said:
>   >   there is also the question of _maintaining_ the x.m.l. file -- 
>   >   entailing things like editing errors out of it, updating it, etc. 
>   >   where do we get volunteers who have the expertise to do that?

david said:
>   Create a tool that can go from PG etext, in "normalized" format 
>   to PG's accepted version of an XML document of that PG work.

i'm not clear what you're saying here.
the question is the difficulty of maintaining an x.m.l. file.
you would transform the x.m.l. file to do maintenance on it?

also, there is no "normalized" format, nor is there an "accepted 
version" of an x.m.l. document of the corresponding p.g. e-text.
(but there are lots of contenders for that position.)


>   I thought your tool did exactly this. Am I mistaken?

it will.  and when it does, i'll submit those files for evaluation.
in the meantime, i'm asking karl if he can do it with x.m.l. now.
everyone talks about it like it's a developed, stable technology.
so i'm wondering what the hold-up is...

-bowerbird
From Gutenberg9443 at aol.com  Tue Oct 19 11:17:53 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Tue Oct 19 11:18:07 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <14.362c6fea.2ea6b451@aol.com>

 
In a message dated 10/18/2004 4:19:45 PM Mountain Standard Time,  
Bowerbird@aol.com writes:

>>i don't know anything about that.

>>i'm very supportive of people who will
>>have the guts to publish something and
>>take a chance at being dragged into court,
>>if they are making that thing _available_
>>when it was an orphan out of circulation.

>>but again, i don't know about blackmask,
>>so i don't know if that applies, or not...


I agree. Two of the series involved were Doc Savage
and The Shadow. Other people had posted all of
these, and David got them from their sites and 
reposted them at Blackmask (I think he did make 
other arrangements for the last Doc Savage books, 
which had not been posted when the decision 
was handed down.). At this point, it was a public service, 
and I appreciated it very much. However, several months
ago courts ruled that Conde Nast owned both
series. Whether Conde Nast DESERVES to own 
the copyrights, having acquired them by buying 
a company that had bought another company 
and so on for several steps, is now a moot 
point. Conde OWNS them. I know for a fact, 
having been in touch with Conde Nast, that 
Conde Nast intends to rerelease them, and 
expects to finalize agreements on how
to do that by the end of the year.
 
When the court case wound up, the people doing the
original posting immediately pulled their sites down.
David continued to keep them on his. It is
arguably still a public service, because the titles
still aren't available commercially. But legally,
he is bucking copyright. SO FAR Conde Nast
has not gone after anybody, and is being as
considerate as legally possible with those who 
kept the books alive. But sooner or later, if David 
doesn't remove them from his site, Conde Nast 
WILL go after him.
 
I am trying to convince Conde Nast that, as they
wait for whatever it takes to get them all in print,
that they allow downloads from Blackmask and
from FictionWise, my favorite commercial e-
book publisher, for the nominal sum of a dollar
apiece, and that they take down the dollar
version when the new version is ready on each
book. I don't know yet what they're going to wind
up doing.
 
However, because of my personal knowledge of
this particular situation, I would really hesitate to
post any of Blackmask's titles without finding out
for myself what the copyright status is. Project
Gutenberg MUST comply with the law, no matter
what other people do or don't do.
 
As a writer, I agree a hundred and ten percent
that books should be kept alive, legally if possible,
by piracy otherwise. I don't WANT my books to die
when I do. All the same, once a copyright has been
established and firmly assigned to person or
corporation A, it becomes illegal for person or
corporation B to continue to traffic in that book,
even if the trafficking is free.
 
Anne
From hacker at gnu-designs.com  Tue Oct 19 11:19:28 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Tue Oct 19 11:20:43 2004
Subject: [gutvol-d] Re: Extra spaces in html files
In-Reply-To: <f6.43244b0c.2ea6af66@aol.com>
References: <f6.43244b0c.2ea6af66@aol.com>
Message-ID: <Pine.LNX.4.61.0410191413210.32163@aphrodite.gnu-designs.com>


> well, i'm not much for "industry accepted terms", but the _real_ 
> reason i put the periods in is that with any acronym, when you use 
> all-lower-case, as i do, using periods helps reader comprehension. i 
> even do it with z.m.l., so it's no slam on x.m.l., ok? if it bothers 
> you, just ignore it.

 	But you continue to refer to HTML as such, and not h.t.m.l, 
and PDF as such instead of p.d.f. Can we try to be consistent?

 	For readers (including myself) who are trying to grasp the 
broad number of projects that people are creating and using to work 
with PG texts, it makes it difficult when doing research on "ZML" 
brings up a completely unrelated series of projects. Similarly, 
searching for "z.m.l." in this case, doesn't surface anything 
relevant.

> i'm not clear what you're saying here. the question is the 
> difficulty of maintaining an x.m.l. file. you would transform the 
> x.m.l. file to do maintenance on it?

 	And I'm saying DON'T maintain an XML file.

 	Maintain the text, in a structured, normalized format (i.e. 
add some parameters by which paragraphs can be spaced, quotes can be 
used, etc., ala LaTeX).

> also, there is no "normalized" format, nor is there an "accepted 
> version" of an x.m.l. document of the corresponding p.g. e-text. 
> (but there are lots of contenders for that position.)

 	XML is infinitely extensible (hence the 'X' in the acronym). 
This means two completely independent authors can use their own XML 
formatting and rules, and yet both can output completely compatible 
formats from their own transformations. That is the whole point of 
XML. XML is nothing more than a container, an empty bucket. XML does 
absolutely nothing on its own.
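
 	[A small illustration of the point above, as a sketch only. The 
two vocabularies (book/para and text/p) are invented for this example, 
and Python's xml.etree.ElementTree stands in for whatever 
transformation each author would really use, such as an XSLT 
stylesheet. Each author keeps their own markup, yet both 
transformations emit the same HTML.]

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies describing the same content.
DOC_A = "<book><para>Down the rabbit-hole.</para></book>"
DOC_B = "<text><p>Down the rabbit-hole.</p></text>"


def to_html(xml_source, paragraph_tag):
    """Render every element named paragraph_tag as an HTML <p>."""
    root = ET.fromstring(xml_source)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for node in root.iter(paragraph_tag):
        ET.SubElement(body, "p").text = node.text
    return ET.tostring(html, encoding="unicode")


html_a = to_html(DOC_A, "para")  # author A's transformation
html_b = to_html(DOC_B, "p")     # author B's transformation
print(html_a)
print(html_a == html_b)
```

 	[The markup on its own does nothing, as stated above; all of the 
behaviour lives in the transformation that reads it.]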

> it will.  and when it does, i'll submit those files for evaluation. 
> in the meantime, i'm asking karl if he can do it with x.m.l. now. 
> everyone talks about it like it's a developed, stable technology. so 
> i'm wondering what the hold-up is...

 	From what I read in recent posts, others are wondering the 
same about your tool, and what it purports to produce. If you have it 
in beta test already, why not submit what kinds of files IT can 
produce, and let others compare that output to their own versions of 
what their own tools can produce.

 	The point is, turf-wars, name-calling, and vaporware projects 
aren't adding value to the overall goals of PG, as I read them. That 
includes my tools as well, but mine aren't really along the same sort 
of strategic direction as PG's overall goals. They just happen to 
intersect at several places.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From Gutenberg9443 at aol.com  Tue Oct 19 11:22:02 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Tue Oct 19 11:22:20 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <13e.4135e4d.2ea6b54a@aol.com>

 
In a message dated 10/18/2004 9:14:03 PM Mountain Standard Time,  
Bowerbird@aol.com writes:

> but if you think they _are_ "obnoxious",
> anne, then why do you read them?


The same reason I read all the other posts.
I need to know what's going on. This statement
from you is rather disingenuous, don't you think,
when you are deliberately being obnoxious.
 
If goody-goody-goody-I-have-the-solution-to-
your-problem-but-I-won't-share-it-with-you-
because-I-don't-like-you isn't being deliberately
obnoxious, I don't know what is. Lead, follow,
or get out of the way.
 
Anne
From Gutenberg9443 at aol.com  Tue Oct 19 11:49:27 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Tue Oct 19 11:49:47 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <e6.5b7cdcf5.2ea6bbb7@aol.com>

 
In a message dated 10/19/2004 12:16:07 AM Mountain Standard Time,  
Bowerbird@aol.com writes:

>   Doing fully automatic conversion to good paged PDFs for
>   printing nice copies (and I mean good, as different from workable)
>   will probably always remain a dream


This is A goal. It is not, and cannot be, THE goal.
It would be great to have everything in printable
PDF for people who want printable PDF. If
you want to keep ten thousand books on your
computer, printable PDF isn't worth the end
product of bovine digestion.
 
I loathe PDF. I'm sure I'm not the only person 
who uses Gutenberg who is in my situation: 
I'm going blind--slowly, fortunately, unlike a 
neighbor who went blind overnight--and I can't get
PDF documents on my Rocket, which means
that as my vision continues to deteriorate I'm
going to have to read sitting in front of my
computer if I want to read something that is
not available in a format I can convert to text
or HTML in order to convert it to Rocket.
 
I agree with Michael. Post everything in TXT
first AND THEN do anything else you want to
do with it. I believe that is one of the goals
of the DP team, which has all the scanned
pages on computer to work from. HTML,
even the "Save As" kind of HTML, can
maintain formatting if you tell it to; I know
because I've done it often.
 
A basic problem in this entire discussion is that there
are a lot of people here who are program-happy,
as opposed to computer-happy. I'm computer-
happy, but like the vast majority of people who
use Gutenberg, I'm really not interested in umpteen
different programs. I just want a book I can read.
As a scholar, I might at times need the specific
coding which will tell me what used this punctuation
mark or that whatever that doesn't come across on
txt, but if I need that, I can obtain the book someway 
and reinsert the punctuation and formatting and
whatever.
 
The village schoolmaster in a third world village, who
has two hours of electricity a day, one cellular phone
for the entire village, and an obsolete laptop donated
to him by a first world company with a connection
from the phone to the laptop cobbled together by
a gadget-minded Peace Corps volunteer or church
or UN aid worker, doesn't give a squiddly about
umlauts and grave accents. He just wants BOOKS
that he can READ to his students during the
two hours a day that the electricity is on.
 
The cowboy who's going to be stuck all winter in
a back-country cabin looking after a herd of cattle
in a snowed-in high pasture, or the astronaut,
or the submariner, or the scientists in a South Pole 
research station, or the kids going to bush-
school on the radio in Australia or Alaska--these
people don't need pretty pages. THEY NEED
BOOKS. They need good books. That's all.
 
If we go back to the very basics, this is the goal of
Project Gutenberg. It is no mistake that the very first 
things Michael posted were the most important 
documents of freedom. An educated populace
can be kept enslaved for only so long, and
then the privy hits the fan.
 
We are the world's free public library. We do not
serve, nor do we even NEED to serve, the
few people in elite professions who want,
and need, to be able to account for
every comma and every umlaut. People
who are arguing their heads off about ten
different ways to format are losing sight 
of the goal. It is hard to remember that your
goal was draining the swamp if you are
up to your a** in alligators. Stop creating
alligators. If YOU--whoever YOU happens
to be--want to create all kinds of pretty
formats, do it. That's grand. But don't try
to inflict your vision on all of PGLAF.
 
The TXT versions MUST come first. Then
people can be joyfully reading the new
books, while other people create other
formats for those nice new books.
 
Now can we go back to draining the swamp?
Notice I said "can," not "may." We Ph.D.s in
English know our grammar. I MEANT "can."
 
Anne
From shalesller at writeme.com  Tue Oct 19 12:13:23 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct 19 12:26:14 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com>

> The village schoolmaster in a third world village, who
> has two hours of electricity a day, one cellular phone
> for the entire village, and an obsolete laptop donated
> to him by a first world company with a connection
> from the phone to the laptop cobbled together by
> a gadget-minded Peace Corps volunteer or church
> or UN aid worker, doesn't give a squiddly about
> umlauts and grave accents. 

Of course he does. How on Earth can he teach German or
French, or expect his students to read a book in a language
they are familiar with (in large parts of Africa, that
would be French), without the proper umlauts and grave
accents?

From jeroen at bohol.ph  Tue Oct 19 12:49:09 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Oct 19 12:49:22 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com>
References: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com>
Message-ID: <41756FB5.3090702@bohol.ph>

D. Starner wrote:

>Of course he does. How on Earth can he teach German or
>French, or expect his students to read a book in a language
>they are familiar with (in large parts of Africa, that
>would be French), without the proper umlauts and grave
>accents?
>  
>
Even worse, many African languages are written with the Latin alphabet 
but use additional letters, such as an F with a curl, which until 
very recently weren't supported by most computers or typewriters and 
were thus conveniently replaced by their nearest counterparts. You could have 
a look at this page on the Gentium font, which is really nice and 
was developed with support for African languages in mind. 
(http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=Gentium)
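
[A short sketch of what replacing letters "by their nearest 
counterparts" does to a text. The function name fold_to_ascii is 
made up for this example, and the last input is an arbitrary string 
containing the hooked f, not a real word. It uses only Python's 
standard unicodedata module.]

```python
import unicodedata


def fold_to_ascii(text):
    """Reduce text to 7-bit ASCII, as a plain-ASCII-only pipeline would."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything that still is not ASCII (marks, special letters) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# German: "schoen" spelled with o-umlaut means "beautiful";
# folded it becomes "schon", which means "already".
print(fold_to_ascii("sch\u00f6n"))

# French: the grave and acute accents simply vanish.
print(fold_to_ascii("\u00e9l\u00e8ve"))

# U+0192 (f with hook) does not decompose to a plain f,
# so the letter disappears altogether.
print(fold_to_ascii("a\u0192a"))
```

[The folded output is still readable in some cases, but it is no 
longer the same text, and for letters like the hooked f nothing is 
left to read.]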

Support for most Indian languages has only been widely available since Windows 
and Office XP, and many less widely used languages are still not 
supported, let alone on the old hardware we donate. (I decided recently 
to raise my bottom line from Pentium 90 to Pentium II 266 for 
machines I donate to schools in the Philippines; the latter can just run 
Windows 2000 with Unicode support.)

Jeroen.

From jeroen at bohol.ph  Tue Oct 19 12:51:12 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Oct 19 12:50:51 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <e6.5b7cdcf5.2ea6bbb7@aol.com>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com>
Message-ID: <41757030.1020209@bohol.ph>

Gutenberg9443@aol.com wrote:

>  
> The TXT versions MUST come first. Then
> people can be joyfully reading the new
> books, while other people create other
> formats for those nice new books.
>  

> Anne


For a novel, you may be right, but complex texts, like most teaching 
materials, require something more than that: illustrations, tables, maybe 
formulas, etc. HTML is really the bottom line here. Also, since it is 
normally easier to throw something away than to add it, I prefer to go to 
XML first, and then create HTML and text from that.
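
A minimal sketch of that order of work, assuming a tiny TEI-like master 
file. The element names and the functions to_html and to_text are made up 
for illustration; this is not an actual production stylesheet.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical master file: a tiny TEI-like fragment (element names assumed).
MASTER = """<text>
  <head>Chapter 1</head>
  <p>It was a <hi>dark</hi> and stormy night.</p>
</text>"""

def to_html(root):
    """Keep the structure: map each master element to an HTML element."""
    parts = []
    for el in root:
        if el.tag == "head":
            parts.append("<h1>%s</h1>" % escape("".join(el.itertext())))
        elif el.tag == "p":
            inner = escape(el.text or "")
            for child in el:  # <hi> becomes <i>; its tail text follows it
                inner += "<i>%s</i>%s" % (escape(child.text or ""),
                                          escape(child.tail or ""))
            parts.append("<p>%s</p>" % inner)
    return "\n".join(parts)

def to_text(root, width=70):
    """Throw the markup away: headings in capitals, paragraphs wrapped."""
    blocks = []
    for el in root:
        content = "".join(el.itertext())
        if el.tag == "head":
            content = content.upper()
        blocks.append(textwrap.fill(content, width))
    return "\n\n".join(blocks)

root = ET.fromstring(MASTER)
html_out = to_html(root)
text_out = to_text(root)
```

Going the other way, from the text version back to the marked-up one, 
would mean guessing where the emphasis and the heading were.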

Jeroen.

From marcello at perathoner.de  Tue Oct 19 12:52:43 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 12:52:54 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <e6.5b7cdcf5.2ea6bbb7@aol.com>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com>
Message-ID: <4175708B.3080800@perathoner.de>

Gutenberg9443@aol.com wrote:

> The village schoolmaster in a third world village, who
> has two hours of electricity a day, one cellular phone
> for the entire village, and an obsolete laptop donated
> to him by a first world company with a connection
> from the phone to the laptop cobbled together by
> a gadget-minded Peace Corps volunteer or church
> or UN aid worker, doesn't give a squiddly about
> umlauts and grave accents.

True, if he happens to speak or teach English.

If he happens to speak or teach any other language of the world he will 
care very much for accents, squigglies and umlauts. I wouldn't want to 
teach my pupils, e.g., French from a book without accents.

And don't get me started on schoolmasters who speak or teach Chinese, 
Korean, Japanese, Hebrew, Arabic, Vietnamese, Thai etc.

Actually the 7bit craze at PG at some point went so far as to convert 
Chinese etexts to 7bit, completely mangling the text. Now suppose some 
Chinese reader did actually download one of those garbled texts and 
tried to make it work. Not good for the image of PG. Fortunately those 
bogus files have been tossed out since.

Personally I hold that all the 7bit files of foreign books are useless 
and dangerous because people may get hold of them instead of the 8bit 
files they can use.
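
To illustrate what forcing text into 7 bits does, here is a small sketch. 
The sample strings are made up and the function name to_7bit is only for 
illustration; it does not reproduce how any actual PG file was converted.

```python
# Illustrative strings only, written with escapes so this snippet stays ASCII.
french = "\u00e9l\u00e8ve \u00e0 l'\u00e9cole"   # French words with their accents
chinese = "\u4e2d\u6587"                          # two Chinese characters

def to_7bit(text):
    """Force text into 7-bit ASCII, replacing whatever does not fit."""
    return text.encode("ascii", errors="replace").decode("ascii")

mangled_french = to_7bit(french)
mangled_chinese = to_7bit(chinese)

# An 8-bit encoding such as Latin-1 keeps the French intact ...
french_survives_8bit = french.encode("latin-1").decode("latin-1") == french
# ... and UTF-8 keeps both strings intact.
both_survive_utf8 = all(s.encode("utf-8").decode("utf-8") == s
                        for s in (french, chinese))
```

The French comes out with holes in it; of the Chinese nothing is left.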

Notice I said "can," not "could."


> He just wants BOOKS
> that he can READ to his students during the
> two hours a day that the electricity is on.

In this case he should trade the notebook against a PDA with solar cell 
charger. Then he could read 24 hours a day.

Ah, and, of course, he would have to download the HTML or PDB file, 
because the hard-wrapped TXT files are very hard to read on a PDA.


> The cowboy who's going to be stuck all winter in
> a back-country cabin looking after a herd of cattle
> in a snowed-in high pasture, or the astronaut,
> or the submariner, or the scientists in a South Pole 
> research station, 

Actually the South Pole research stations have a pretty fat pipe and 
plenty of the latest and greatest in computer gadgets. Wish I had.


> If we go back to the very basics, this is the goal of
> Project Gutenberg. It is no mistake that the very first 
> things Michael posted were the most important 
> documents of freedom.

You are very America-centric, aren't you? The most important documents 
of freedom are those of the French Revolution (with accents). And if I 
were Chinese I would probably hold that the most important etc. etc. is 
Mao's Red Book (Unicode). Importance is relative.

Michael posted those first because the computer he was on couldn't hold 
any longer texts.


> An educated populace
> can be kept enslaved for only so long, and
> then the privy hits the fan.

Don't kid yourself. You can fool almost all of the people almost all of 
the time. The rest you shoot.

Or, how do you explain that the most "civilized" countries of this world 
still use war as an instrument for "solving" international conflicts?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Tue Oct 19 13:06:36 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 19 13:06:47 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <41756FB5.3090702@bohol.ph>
References: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com>
	<41756FB5.3090702@bohol.ph>
Message-ID: <417573CC.7020002@perathoner.de>

Jeroen Hellingman wrote:

> Support for most Indian languages is only widely available since Windows 
> and Office XP, and many less widely used languages are still not 
> supported, let alone on the old hardware we donate (I decided recently 
> to increase my bottom line from Pentium 90 to Pentium II 266 for 
> machines I donate to schools in the Philippines, the latter can just run 
> windows 2000 with Unicode support.)

Then Linux should run just fine.

It has also the advantage of not tying those countries into harmful 
financial dependencies.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Oct 19 13:46:50 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 13:47:13 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <a0.199ecc11.2ea6d73a@aol.com>

here's my "first draft" on specific elements of a well-done e-book.

for a similar analysis, on behalf of distributed proofreaders, see
      http://dave.maddockfamily.org:81/dp/htmlspec/proposal.html

i would particularly like to hear from jon ingram on this topic,
as he's done some rather nice .html files for project gutenberg.

note that these aspects are for _books_, albeit a large range of them.
magazines and newspapers present additional interesting challenges.
(ingram could comment on those, too, because he's done magazines.)

these are the criteria _i_ use to evaluate the quality of an e-book.

please feel free to add anything you feel is missing,
or challenge anything you believe to be unnecessary...

-bowerbird


_____ headers and links

headers should be big and/or bold, and start on a new page

multiple levels of headers should be exposed/hidden at will.
headers at different levels should be sized differently.

a table of contents should be created (preferably automatically)
(ditto a table of illustrations, footnotes, tables, when applicable)

headers should be hotlinked from a table of contents,
and they should hotlink _back_ to the table of contents

headers should hotlink to the previous header, and the next

internal references to a header (see chapter 2) should be hotlinked

any fully-specified u.r.l. should be a hotlink to that website
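
a rough sketch of that last item, assuming "fully-specified" means the
text starts with http:// or https://. the pattern and the function name
linkify are assumptions made for illustration, not part of any viewer.

```python
import re
from html import escape

# Assumed definition of a "fully-specified" URL: scheme plus non-space text.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

def linkify(text):
    """Escape plain text and wrap each fully specified URL in an anchor."""
    out, last = [], 0
    for m in URL_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = m.group(0).rstrip(".,;:)")
        end = m.start() + len(url)
        out.append(escape(text[last:m.start()]))
        out.append('<a href="%s">%s</a>' % (escape(url), escape(url)))
        last = end
    out.append(escape(text[last:]))
    return "".join(out)

linked = linkify("see http://lists.pglaf.org/listinfo.cgi/gutvol-d.")
```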


_____ other typography

block-quotes should be indented, maybe set off in a box

tables should look "nicely done", maybe set off in a box

the title-page and front-matter should look presentable

note indicators should be linked to the note itself,
and the note should be backlinked to the indicator

index items should be linked to the place in the text,
and a backlink should be made as well, if at all possible

"also see" references in an index should hotlink appropriately

images should be viewable and resizeable at will

widow/orphan control is essential

display of line-numbers in poems should be optional

special treatment of each character's dialogue in a play
(e.g., each rendered in a distinct color) should be an option

if the e-book is replicating an existing paper-book,
then the page-numbers from that p-book should be
available, with a user option as to their display, and
the e-book display should be able to mimic the p-book

if the e-book is replicating an existing paper-book, then
the user should be able to print it and duplicate the p-book,
_but_also_ be able to change any print parameters at will


_____ things about the viewer-program...

fast-loading, responsive, customizable, 1-page or 2-page display

pagesize, font, fontsize, leading, text-color, background-color,
significant lines and strings highlighted, annotations possible,
justification (vertical and horizontal) at the choice of the user
From jeroen at bohol.ph  Tue Oct 19 14:48:49 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Oct 19 14:48:03 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <1d4.2d17520f.2ea60b0d@aol.com>
References: <1d4.2d17520f.2ea60b0d@aol.com>
Message-ID: <41758BC1.8040300@bohol.ph>

Bowerbird@aol.com wrote:

>     
>     however, a less-complex subset
>     -- called t.e.i.-lite -- is available,
>     and that is what i recommend...
>
>  
>
You do have a curious way of avoiding capital letters... :-)

After all my objections against XML and TEI, you may wonder why I still 
recommend using TEI Lite. The reason is that it forms a very decent base 
to start some structural tagging with -- you don't need the full 1400 
pages of TEI to get started with it, and you also don't need to reinvent 
the wheel and come up with some alternative, equally simple scheme. Doing 
this has the added benefit that for those texts that require it, you can 
easily step up and work with the full set, if so required or desired.

If you're just against using angled brackets: they are simple to use and 
understand for both humans and computers. You can do more fancy tricks to 
make marked-up texts look more like plain text, but attempts to do so, 
whether in TeX or SGML, add considerable complexity for the reader -- both 
machine _and_ human. XML has one thing going for it, and that is its 
simplicity (and the complicated things some people build on it, such as 
namespaces, XSLT, etc., which require a course in computer science, can 
be quite hidden from most users).

You can of course object out of principle against something 1400 pages 
thick, but that is unavoidable, given the complexity and wide diversity 
of books that have been published in the 500+ years since Gutenberg's 
invention.

Much of the difficult stuff of XML will eventually be hidden from 
users. Future versions of layout programs will probably be able to read 
a thing coded in TEI directly (doing an XSLT transform to some internal 
format), and format it nicely according to some defaults. You can then 
apply all the required formatting tweaks to it, export to some nice 
layout format (XSL-FO, maybe, PDF, or who knows), and save all your 
nice tweaks, linked to your original TEI, so you have the best of both 
worlds.

I already have numerous benefits from working in XML, in that I can 
generate nice HTML files (that often need no touch-up at all) and 
reasonable plain ASCII for PG, but also have spell checking on a 
per-language basis, extract all fragments in a certain language, create 
tables of contents, etc. on the fly, extract Dublin Core bibliographic 
records, and more.
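
Two of those benefits can be sketched in a few lines, assuming a small 
TEI-like sample in which foreign phrases carry an xml:lang attribute. The 
sample text and the function name fragments_in are invented for 
illustration; real TEI files are richer than this.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample, loosely modelled on TEI Lite element names.
SAMPLE = """<text xml:lang="en">
  <div><head>Arrival</head>
    <p>He greeted us with <foreign xml:lang="fr">bonjour</foreign>.</p></div>
  <div><head>Departure</head>
    <p>She replied <foreign xml:lang="de">auf Wiedersehen</foreign>.</p></div>
</text>"""

root = ET.fromstring(SAMPLE)

# Table of contents generated on the fly from the <head> of each <div>.
toc = [head.text for head in root.iter("head")]

def fragments_in(root, lang):
    """Collect the text of every element explicitly tagged with `lang`."""
    return ["".join(el.itertext()) for el in root.iter()
            if el.get(XML_LANG) == lang]

french = fragments_in(root, "fr")
german = fragments_in(root, "de")
```

The same lists could feed a per-language spelling checker, since each 
fragment arrives already labelled with its language.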

Jeroen.
From shalesller at writeme.com  Tue Oct 19 15:19:07 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct 19 15:20:05 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <20041019221907.EAF404BDA9@ws1-1.us4.outblaze.com>

Bowerbird@aol.com writes:
> headers should be big and/or bold, and start on a new page 
> headers at different levels should be sized differently.

Headers should be headers, and defined by the user-agent.
 
> multiple levels of headers should be exposed/hidden at will. 

I don't understand this one. But don't bother responding, because
I know better than to expect an explanation from you. 
 
> (ditto a table of illustrations, footnotes, tables, when applicable) 

Why a table of footnotes? Footnotes are designed to be of minimal
interest.
 
> headers should be hotlinked from a table of contents, 
> and they should hotlink _back_ to the table of contents 

Don't waste time linking them back; Bowerbird's the only
one who wants that feature.

> headers should hotlink to the previous header, and the next 

Why, especially if they're linked to the table of contents?

> any fully-specified u.r.l. should be a hotlink to that website 

I thought you weren't concerned with features less than 1% of PG's
books have?  
 
> _____ other typography 

Typography is not something we should worry about.
 
> block-quotes should be indented, maybe set off in a box 

block-quotes shouldn't be set off in a box; I've never
seen a book that did that, and it would usually provide too
much emphasis.

> tables should look "nicely done", maybe set off in a box 
> 
> the title-page and front-matter should look presentable 

No, they should look poorly done and unpresentable. And again,
tables shouldn't be set off in a box, for the same reasons
block-quotes shouldn't be.

> index items should be linked to the place in the text, 
> and a backlink should be made as well, if at all possible 

A backlink from where? And why? I think we should use links
only where they are explicit or at least loudly implicit in
the original work.
 
> images should be viewable and resizeable at will 

Again, of course they should be viewable, and resizeable at
will strikes me as both largely gratuitous and user-interface
dependent.
 
> widow/orphan control is essential 

Again, this is all about the user-interface. And I can hardly
say that it's essential; it strikes me as a feature almost 
pointless in online reading.

> display of line-numbers in poems should be optional 

Again, this is all about the user-interface. Again, why does it
matter? Do we really have a whole lot of people desperately
crying out not to see the line-numbers on their poetry?

> special treatment of each character's dialogue in a play 
> (e.g., each rendered in a distinct color) should be an option 

Again, this is all about the user-interface. The underlying
principle is sound.

> if the e-book is replicating an existing paper-book, 
> then the page-numbers from that p-books should be 
> available, with a user option as to their display, and 

Yes.

> the e-book display should be able to mimic the p-book 
> 
> if the e-book is replicating an existing paper-book, then 
> the user should be able to print it and duplicate the p-book, 

"Duplicate" the physical book? We aren't preserving nearly
enough information to do that. Long-s, line endings, and
font information are just the start.

> _but_also_ be able to change any print parameters at will 

Again, uselessly vague. What print parameters do you want to
be able to change?
 
> _____ things about the viewer-program... 
> 
> fast-loading, responsive, customizable, 1-page or 2-page display 
> 
> pagesize, font, fontsize, leading, text-color, background-color, 
> significant lines and strings highlighted, annotations possible, 
> justification (vertical and horizontal) at the choice of the user 

Sure, whatever. That can all be left to the viewing program's authors,
and none of it is exactly revolutionary.

From Bowerbird at aol.com  Tue Oct 19 16:02:21 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 16:02:41 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <1ea.2d89e678.2ea6f6fd@aol.com>

jeroen said:
>   You do have a curious way of avoiding capital letters... :-)

yes i do.       :+)

thank you for this additional post, which
makes your analysis even more even-handed.

-bowerbird
From jgruber at tampabay.rr.com  Tue Oct 19 16:26:42 2004
From: jgruber at tampabay.rr.com (Joseph R. Gruber)
Date: Tue Oct 19 16:26:17 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <1a7.29e5ae04.2ea6a4fb@aol.com>
Message-ID: <200410192326.i9JNPwNw027485@ms-smtp-05.tampabay.rr.com>

For anyone that wants this <quote>rancid pudding</quote> you can grab it
from:

http://www.josephgruber.com/pudding0727-exe.zip
http://www.josephgruber.com/pudding0727[1].osx.sit

Leech away...  Oh, btw, feel free to sue me or send a cease and desist to my
ISP.  It's BrightHouse Networks in case you need any help...

Joseph

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com
Sent: Tuesday, October 19, 2004 1:12 PM
To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com
Subject: re: Re: [gutvol-d] Why Bowerbird is a kook

mike said:
>   Email me off-list if you'd like a copy of this
>   rancid pudding without surrendering your soul/Yahoo ID.

alright, mike!

i haven't even released my app yet,
and it's already being _bootlegged_!

that makes me feel all warm and fuzzy inside...      :+)

thankyouthankyouthankyou...

-bowerbird

p.s.  i'll answer the rest of mike's post next week (or next month),
but this part was just too good to pass up...        ;+)


From Bowerbird at aol.com  Tue Oct 19 16:37:02 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 16:37:20 2004
Subject: [gutvol-d] Why Bowerbird is a kook
Message-ID: <1f5.12f2dac.2ea6ff1e@aol.com>

david said:
>   Aren't you the programmer in this case? 

nope, not for the o.e.b. reader-program.

i _am_ the programmer for my program,
which takes a text-file in z.m.l. format.
this is my own project as an individual;
i wrote it as a present for michael hart.

for the o.e.b. viewer, i would have volunteered
my services toward shepherding the project and
designing the user-interface and functionality.

but, as i said, since jon noring has now started
his own open-source effort to make a reader-app,
i would suggest programmers support him instead.

***

>   And I'm saying DON'T maintain an XML file.
>   Maintain the text, in a structured, normalized format 
>   (i.e. add some parameters by which paragraphs 
>   can be spaced, quotes can be used, etc., ala LaTeX).

that's what i'm saying, david.
and z.m.l. is my version of that
"structured, normalized format".
but the company line here is that
the x.m.l. file will be the master.


>   If you have it in beta test already, why not submit 
>   what kinds of files IT can produce

my program is a viewer-program.  it doesn't produce files.
it takes ordinary plain-text raw-ascii files as _input_
-- i.e., files just like the current e-texts in the library --
and displays them, giving the user the complete range of
functionalities they should expect in an electronic-book.

generating other versions of an e-text -- like .html --
(and i'll try to remember to preface "html" with a period)
is an add-on for a later version of the program.

the purpose of my program is to give the end-user all of
the _benefits_ of a marked-up file, without any markup.
to do that, my program has to figure out the _structure_
of the file on its own.  naturally, once it has _done_ that,
it will be relatively straightforward for it to churn out
an .html file (or an .rtf file) that reflects that structure.
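
a toy sketch of that idea, under loud assumptions: the only rule used
here (a single short line in capitals is a header) is invented for
illustration and is not how z.m.l. or the actual program works.

```python
from html import escape

# Made-up plain-text sample in the style of a hard-wrapped e-text.
SAMPLE = """CHAPTER I

It was a dark and stormy
night; the rain fell.

CHAPTER II

The morning was clear."""

def looks_like_header(block):
    """Assumed heuristic: one short line written entirely in capitals."""
    return "\n" not in block and len(block) < 60 and block.isupper()

def infer_structure(text):
    """Split on blank lines and label each block 'header' or 'para'."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return [("header" if looks_like_header(b) else "para", " ".join(b.split()))
            for b in blocks]

def to_html(structure):
    """Once the structure is known, writing .html is the easy part."""
    tags = {"header": "h2", "para": "p"}
    return "\n".join("<%s>%s</%s>" % (tags[kind], escape(body), tags[kind])
                     for kind, body in structure)

structure = infer_structure(SAMPLE)
html_out = to_html(structure)
```

a real e-text needs far more rules than this, and the hard cases are
exactly the ones a single rule gets wrong.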

as far as generating a .pdf, the end-user can do that by
printing the e-book to a .pdf driver.


>   The point is, turf-wars, name-calling, and vaporware projects 
>   aren't adding value to the overall goals of PG, as I read them. 

i agree.  but my program is not "vaporware".

and if i can -- as i say -- give the benefits of markup to end-users
without requiring project gutenberg to actually _do_ any markup,
then i think i'll add immense value to the overall goal of the project.

but i'm sure the x.m.l. advocates will _still_ want to do markup.
some people just _like_ doing things the hard way...       ;+)

-bowerbird
From Bowerbird at aol.com  Tue Oct 19 17:05:07 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 19 17:05:30 2004
Subject: [gutvol-d] Why Bowerbird is a kook
Message-ID: <f5.440d4b6f.2ea705b3@aol.com>

joseph said:
>   http://www.josephgruber.com/pudding0727-exe.zip
>   http://www.josephgruber.com/pudding0727[1].osx.sit

alright, the bootleggers are going to work for me!

i'll have a new version up in the next few days,
be sure to check back regularly for that!       :+)

but to make sure your copy is virus-free,
you should download it from yahoogroups.

because if you get it from somewhere else,
someone _might_ have tampered with it...

just a word to the wise, since i know that
you p.c. people have a hard time with virii...

-bowerbird
From hacker at gnu-designs.com  Tue Oct 19 17:35:33 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Tue Oct 19 17:36:47 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <f5.440d4b6f.2ea705b3@aol.com>
References: <f5.440d4b6f.2ea705b3@aol.com>
Message-ID: <Pine.LNX.4.61.0410192031520.30110@aphrodite.gnu-designs.com>


> but to make sure your copy is virus-free, you should download it 
> from yahoogroups.

 	Virus-free? Aren't you distributing it in source form?

 	If you're not distributing source, running a nondescript 
binary, whether it is assured to be virus-free or not, is a really 
stupid thing to do. I wouldn't trust the code without being able to 
audit/edit the source anyway.

> because if you get it from somewhere else, someone _might_ have 
> tampered with it...

 	Which is why a multitude of virus scanners exist, for those 
platforms that happen to be susceptible to these kinds of things.

> just a word to the wise, since i know that you p.c. people have a 
> hard time with virii...

 	A word to the wiser, "virii" is not a word[1].


[1] http://code.gnu-designs.com/plural-of-virus.html


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From stephen.thomas at adelaide.edu.au  Tue Oct 19 17:40:57 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Tue Oct 19 17:41:19 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <e6.5b7cdcf5.2ea6bbb7@aol.com>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com>
Message-ID: <4175B419.6030301@adelaide.edu.au>

Thanks Anne for reminding us all about the original objective of 
PG -- making texts available for people to read, wherever, and 
with whatever equipment.

OK, you've somewhat overstated the case, and I think by now we'd 
all agree that "8-bit" characters are important. But it is a 
shame that most of the geeks -- no offence, I count myself as 
one -- on this list, immediately skipped your main point to 
whine about the need for accents and foreign scripts. You guys 
can't seem to see the wood for the trees.

Personally, I've seen the debate about XML (not to mention 
z.m.l.) somewhere before -- oh, wait up, it was on THIS list, 
what, about eight months back? And didn't Jon go and set up a 
pgxml list for that discussion to continue? And didn't that list 
go strangely quiet shortly thereafter?

You can draw your own conclusions from that. Me? I decided that 
my own project -- building a library of high-quality HTML "web 
books" was more important than trying to get a room full of 
experts to agree on even basic things like should we use 
TEI-lite or invent our own DTD. Basically, Anne is right -- who 
cares about this stuff? Only the few enthusiasts on this list. 
Most users of PG don't go around grumbling about the lack of XML 
or the ability to output as PDF. They're just stoked to be able 
to find the text online.

And on the subject of PDF, I agree with Anne -- it sucks. Why? 
Well, apart from being too fuzzy to read on screen, it locks the 
user into a format that's chosen by the engine which created it. 
Want a different font or type size? Too bad, whoever wrote the 
XSLT decided that for you.

But create an HTML file, properly, and then the user can do what 
they like with it. Want to print it out in Georgia 24pt? No 
problem. Your choice.

Anyway ... think I'll go and convert a few more books now.

Steve


Gutenberg9443@aol.com wrote:

> ...
>  
> A basic problem in this entire discussion is that there
> are a lot of people here who are program-happy,
> as opposed to computer-happy. I'm computer-
> happy, but like the vast majority of people who
> use Gutenberg, I'm really not interested in umpteen
> different programs. I just want a book I can read.  ...

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From ke at gnu.franken.de  Tue Oct 19 18:38:52 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Tue Oct 19 19:35:44 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <4175B419.6030301@adelaide.edu.au> (Steve Thomas's message of
	"Wed, 20 Oct 2004 10:10:57 +0930")
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
Message-ID: <shacuiqic3.fsf@tux.gnu.franken.de>

Steve Thomas <stephen.thomas@adelaide.edu.au> writes:

> But create an HTML file, properly, and then the user can do what 
> they like with it. Want to print it out in Georgia 24pt? No 
> problem. Your choice.

HTML isn't the best choice if you are interested in printing.  Define
"properly created" :)  XML plus a customizable stylesheet (XSL or DSSSL)
is better.  For those who do not want to create a printable PDF file on
their own offer a pre-generated PDF file.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From ke at gnu.franken.de  Tue Oct 19 18:57:10 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Tue Oct 19 19:35:45 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <41758BC1.8040300@bohol.ph> (Jeroen Hellingman's message of "Tue, 
	19 Oct 2004 23:48:49 +0200")
References: <1d4.2d17520f.2ea60b0d@aol.com> <41758BC1.8040300@bohol.ph>
Message-ID: <sh6556qhhl.fsf@tux.gnu.franken.de>

Jeroen Hellingman <jeroen@bohol.ph> writes:

> Since much of the difficult stuff of XML will eventually be hidden from 
> users. Future versions of layout programs will probably be able to read 
> a thing coded in TEI directly (doing an XSLT transform to some internal 
> format), and format it nicely according to some defaults.

It already works for simple books using CSS; here is an example (a
text by Ludwig Tieck in German):

    http://www.gnu.franken.de/Tieck/Werke/dichterleben/

Sorry for the strange layout - it is just a test.  It works with Mozilla
1.6 and better.

Of course, the other features you mentioned are more important.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From ke at gnu.franken.de  Tue Oct 19 19:08:36 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Tue Oct 19 19:35:48 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041019221907.EAF404BDA9@ws1-1.us4.outblaze.com> (D. Starner's
	message of "Tue, 19 Oct 2004 14:19:07 -0800")
References: <20041019221907.EAF404BDA9@ws1-1.us4.outblaze.com>
Message-ID: <sh1xfuqgyj.fsf@tux.gnu.franken.de>

"D. Starner" <shalesller@writeme.com> writes:

>> headers should be hotlinked from a table of contents, 
>> and they should hotlink _back_ to the table of contents 
>
> Don't waste time linking them back; Bowerbird's the only
> one who wants that feature.
>
>> headers should hotlink to the previous header, and the next 
>
> Why, especially if they're linked to the table of contents?

Here Bowerbird is right - that's a nice feature and, of course, it has
been supported by HTML for ages (check out the "link" element).
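
A small sketch of what that looks like, assuming one HTML file per 
chapter. The file names and the function name nav_links are hypothetical; 
the rel values "contents", "prev" and "next" are link types listed in 
HTML 4.

```python
from html import escape

# Hypothetical file layout: one file per chapter plus a contents page.
chapters = ["toc.html", "ch1.html", "ch2.html", "ch3.html"]

def nav_links(files, index, contents="toc.html"):
    """Return the <link> elements for the <head> of files[index]."""
    links = ['<link rel="contents" href="%s">' % escape(contents)]
    if index > 0:
        links.append('<link rel="prev" href="%s">' % escape(files[index - 1]))
    if index < len(files) - 1:
        links.append('<link rel="next" href="%s">' % escape(files[index + 1]))
    return links

ch2_links = nav_links(chapters, 2)
last_links = nav_links(chapters, 3)
```

Whether the links show up as a navigation bar depends on the browser; 
the markup only states the relationships.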

>> display of line-numbers in poems should be optional 
>
> Again, this is all about the user-interface. Again, why does it
> matter? Do we really have a whole lot of people desperately
> crying out not to see the line-numbers on their poetry?

It is easy to solve this issue using different CSS stylesheets.

>> special treatment of each character's dialogue in a play 
>> (e.g., each rendered in a distinct color) should be an option 
>
> Again, this is all about the user-interface. The underlying
> principle is sound.

Yes, user-interface issue (-> CSS).

>> if the e-book is replicating an existing paper-book, 
>> then the page-numbers from that p-books should be 
>> available, with a user option as to their display, and 
>
> Yes.

CSS.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From skip at nextra.sk  Tue Oct 19 19:41:45 2004
From: skip at nextra.sk (Skippi)
Date: Tue Oct 19 19:42:16 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <shacuiqic3.fsf@tux.gnu.franken.de>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
	<shacuiqic3.fsf@tux.gnu.franken.de>
Message-ID: <1169277476.20041020044145@nextra.sk>

Hello Karl!

Wednesday, October 20, 2004, 3:38:52 AM, you wrote:

> HTML isn't the best choice if you are interested in printing. 

IMO: HTML isn't the best choice if you are interested in anything.

This obsolete format should not be considered, or even mentioned. Use
XHTML + CSS instead, if you are allergic to XML. With properly
written XHTML and customizable CSS the user can do whatever he wishes
with the files and things are still as they should be.

-- 

 Skippi                            mailto:skip@nextra.sk

From shalesller at writeme.com  Tue Oct 19 19:58:44 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct 19 19:58:59 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020025844.2E7264BDA9@ws1-1.us4.outblaze.com>

Karl Eichwalder writes:
> It is easy to solve this issue using different CSS stylesheets. 

I have three problems with using CSS. First, and most fundamental,
if you want to turn page numbers and line numbers on and off, and
offer a choice of 5 different background colors, you need 20 different
CSS options. Another feature will at least double the number of files.

Secondly, it doesn't work on many web browsers, and I don't think
that lynx and friends have any intent of ever supporting CSS.

Lastly, it's never struck me as particularly user-friendly. If you
can point me to a place where it's actually used and it's easy to
change without being a computer science major, I'd appreciate it.

From hacker at gnu-designs.com  Tue Oct 19 20:00:04 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Tue Oct 19 20:00:49 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <1169277476.20041020044145@nextra.sk>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
	<shacuiqic3.fsf@tux.gnu.franken.de>
	<1169277476.20041020044145@nextra.sk>
Message-ID: <Pine.LNX.4.61.0410192253510.7955@aphrodite.gnu-designs.com>


> IMO: HTML isn't the best choice if you are interested in anything.

> This obsolete format should not be considered if even mentioned. Use 
> XHTML + CSS instead, if you are allergic to XML.

 	XHTML is HTML 4.0 designed to work as an XML application. In 
fact, there aren't a lot of differences between HTML and XHTML.
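
 	One of the few practical differences can be sketched with any 
XML parser: XHTML has to be well-formed XML, while HTML 4 lets you omit 
some end tags. The two sample strings and the function name 
is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same content twice: HTML 4 allows the omitted </p> and the bare <br>.
html_style = "<p>First line<br>second line"
xhtml_style = "<p>First line<br />second line</p>"

def is_well_formed(markup):
    """True if a plain XML parser accepts the markup as it stands."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

html_ok = is_well_formed(html_style)
xhtml_ok = is_well_formed(xhtml_style)
```

 	The XHTML version goes through generic XML tools unchanged; 
the HTML version needs an HTML-aware parser first.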

 	Of course, XHTML, HTML, and XML are all "children" of SGML 
anyway, so we're all talking about generalized markup in some form or 
another.

 	You can't set up a rigid set of rules that will apply across 
all past, present and future versions of printed works in electronic 
format. Whatever format you choose to use must be extensible enough 
to scale for future capabilities, as well as able to handle the 
capabilities of documents created in the past.

> With properly written XHTML and customizable CSS user can do what 
> ever he wishes with the files and the things are still as they 
> should be.

 	Almost whatever s/he wishes. There are limitations in every 
format, depending on how broadly you want to consider using it.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hacker at gnu-designs.com  Tue Oct 19 20:05:02 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Tue Oct 19 20:05:49 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020025844.2E7264BDA9@ws1-1.us4.outblaze.com>
References: <20041020025844.2E7264BDA9@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410192300440.7955@aphrodite.gnu-designs.com>


> I have three problems with using CSS. First, and most fundamental, 
> if you want to turn page numbers and line numbers on and off, and 
> offer a choice of 5 different background colors, you need 20 
> different CSS options. Another feature will at least double the 
> number of files.

 	Not quite... unless you're not using CSS properly.

 	With the proper use of CSS selectors, hidden and visible 
properties, and other attributes and classes, you can make this very 
small and tight. It just takes a bit of up-front planning to get it 
all working right. Most people don't use CSS in any sort of optimized 
format.
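
A rough sketch of the arithmetic behind this exchange, written in Python 
rather than CSS so the counts can be checked. The option names and class 
names are made up for illustration; nothing here comes from an actual PG 
stylesheet. If every combination of settings gets its own stylesheet, the 
count multiplies (2 x 2 x 5 = 20). If each setting is an independent class 
on the body element with its own rule, the count only adds up.

```python
from itertools import product

# Hypothetical display options; the names are illustrative assumptions.
OPTIONS = {
    "page-numbers": ["on", "off"],
    "line-numbers": ["on", "off"],
    "background": ["white", "cream", "grey", "black", "sepia"],
}


def combined_stylesheets(options):
    """One stylesheet per combination of settings (multiplicative)."""
    return list(product(*options.values()))


def independent_rules(options):
    """One CSS selector per individual setting (additive).

    Each selector keys off a class on <body>, so the settings can be
    mixed freely without writing a rule for every combination.
    """
    selectors = []
    for name, values in options.items():
        for value in values:
            selectors.append(f"body.{name}-{value}")
    return selectors


if __name__ == "__main__":
    print(len(combined_stylesheets(OPTIONS)))  # 2 * 2 * 5 combinations
    print(len(independent_rules(OPTIONS)))     # 2 + 2 + 5 selectors
```

Adding one more on/off feature doubles the first count but adds only two 
selectors to the second.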

> Secondly, it doesn't work on many web browsers, and I don't think 
> that lynx and friends have any intent of ever supporting CSS.

 	Hence the "C" part of the CSS spec.

 	It should always degrade properly to continue to work with the 
lesser capabilities of older browsers or browsers that don't support 
the full rich CSS styles. This includes PDAs, cellphones, WAP devices, 
screen scrapers, syndicated feeds, text-based browsers, text-to-speech 
devices, and so on.

> Lastly, it's never struck me as particularly user-friendly. If you 
> can point me to a place where it's actually used and it's easy to 
> change without being a computer science major, I'd appreciate it.

 	Why should you want to "change" the CSS? Maybe I'm missing 
your goals here. Can you try to explain this a bit further, perhaps by 
providing some examples you've done that solve/show these problems?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From shalesller at writeme.com  Tue Oct 19 20:10:50 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct 19 20:11:06 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041020031050.D502F4BDA9@ws1-1.us4.outblaze.com>

Steve Thomas writes:
> Most users of PG don't go around grumbling about the lack of XML 
> or the ability to output as PDF. They're just stoked to be able 
> to find the text online. 

That's why they're users of PG. If they needed XML or PDF, they
go elsewhere. And frankly, I've heard many complaints about how
hard it is to process PG texts and how much information is lost.
I've personally found it a pain to produce good printed versions
of the PG etexts.

I think it a bad idea to start saying that "Most users ... don't
go around grumbling", because many of those who would grumble
will go elsewhere, and many of those who do grumble don't do
so where we can hear.


From shalesller at writeme.com  Tue Oct 19 20:16:03 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct 19 20:18:41 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020031603.B46C74BDA9@ws1-1.us4.outblaze.com>

"David A. Desrosiers" writes:

> With the proper use of CSS selectors, hidden and visible 
> properties, and other attributes and classes, you can make this very 
> small and tight.

I don't care how it works internally; how does it look to the
users?

> It should always degrade properly to continue to work with the 
> lesser capabilities of older browsers or browsers that don't support 
> the full rich CSS styles. This includes PDAs, cellphones, WAP devices, 
> screen scrapers, syndicated feeds, text-based browsers, text-to-speech 
> devices, and so on. 

So this won't work for many users, in fact the group of users that
would most likely want to turn off line numbers on poetry. I think
that important to remember.
 
> Why should you want to "change" the CSS? Maybe I'm missing 
> your goals here. Can you try to explain this a bit further, perhaps by 
> providing some examples you've done that solve/show these problems? 

Again, I'm not looking at it internally. So far, all I've seen with CSS
forces you to change the HTML code to change things. How does this
look to the end user?

From stephen.thomas at adelaide.edu.au  Tue Oct 19 21:54:30 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Tue Oct 19 21:54:54 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <shacuiqic3.fsf@tux.gnu.franken.de>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
	<shacuiqic3.fsf@tux.gnu.franken.de>
Message-ID: <4175EF86.8060108@adelaide.edu.au>

As usual, people have missed the point of the original post 
(Anne's) which was that we need to remember the *user* -- that 
guy in Africa with only 2 hours of electricity each day. Anne 
suggested (I think) that he uses a laptop, but more likely he's 
using a worn-out IBM 486 running Windows 3, so all this 
geek-talk about XML and XSLT etc. is irrelevant to him -- he'll 
be lucky if he can run a standard web browser.

[I can't believe that people still think they're doing good by 
shipping old 486's to Africa -- but apparently it's true. I 
recently donated some old Pentium II's to a charity, and they 
couldn't believe their luck.]

Anyway:

Karl Eichwalder wrote:
> Steve Thomas <stephen.thomas@adelaide.edu.au> writes:
> 
> 
>> But create an HTML file, properly, and then the user can do
>>  what they like with it. Want to print it out in Georgia 
>> 24pt? No problem. Your choice.
> 
> 
> HTML isn't the best choice if you are interested in printing.
>  Define "properly created" :)  XML plus a customizable 
> stylesheet (XSL or DSSSL) is better.  For those who do not 
> want to create a printable PDF file on their own offer a 
> pre-generated PDF file.
> 

My definition of "properly created" HTML would be HTML4 strict, 
plus CSS. I was trying to avoid obvious detail.

And HTML is the *best* choice for printing if you don't have the 
in-depth knowledge about XML/XSL etc. or the tools to make that 
happen. Anyone with IE6 can make a pretty good print of my HTML 
books, straight from the browser.


Skippi wrote:

> This obsolete format should not be considered if even
> mentioned. Use XHTML + CSS instead, if you are allergic to
> XML. With properly written XHTML and customizable CSS user
> can do what ever he wishes with the files and the things are
> still as they should be.
> 

XHTML is -- for practical purposes -- the same as HTML 4 strict, 
except that it enforces good practice, whereas HTML allows the 
author some latitude.

The important difference, to a user, is that HTML is pretty much 
guaranteed to work in all browsers, whereas XHTML can be 
"difficult" in some circumstances -- e.g. if you include the 
<?xml ...?> header, it can foul up IE6.

A while ago, I started converting all my ebooks to XHTML, but 
immediately ran into problems that weren't worth my time to fix.

One day, browsers will commonly deal correctly with XML of any 
type, with an appropriate style sheet. Right now, HTML is the 
format that works best. This is, I know, very boring to those of 
us who like playing with the latest gizmos and formats. But the 
reality is, if you want the widest possible audience, you've got 
to give them the format that's easiest for them.

For a more detailed discussion of this topic, see the archives, 
about January this year if memory serves, where you'll probably 
find me -- and you -- saying much the same things.


-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From tb at baechler.net  Tue Oct 19 23:55:02 2004
From: tb at baechler.net (Tony Baechler)
Date: Tue Oct 19 23:54:11 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <20041020031050.D502F4BDA9@ws1-1.us4.outblaze.com>
Message-ID: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com>

At 07:10 PM 10/19/2004 -0800, you wrote:
>Steve Thomas writes:
> > Most users of PG don't go around grumbling about the lack of XML
> > or the ability to output as PDF. They're just stoked to be able
> > to find the text online.
>
>That's why they're users of PG. If they needed XML or PDF, they
>go elsewhere. And frankly, I've heard many complaints about how
>hard it is to process PG texts and how much information is lost.


I don't want to add to the flame war here, but I can say this, which has 
been said here before.  Sometimes I will find a PG text which I would like 
more information about, so I will go to google and search for it.  In 
almost all cases, I have found tons of sites which somehow convert the 
books into html or a similar format.  blackmask.com immediately comes to 
mind but there are lots of others.  Many don't give credit to PG at 
all.  My point is that yes, I agree with gutenberg9443 in that I would much 
rather have plain text first and worry about the rest later, but many 
people don't need to complain to PG about plain text only for the simple 
reason that they can look for almost anything on google and find a nicer 
formatted version.

I would like to see PG eventually go to xml not because I particularly like 
the format but because the new DAISY standard for digital talking books for 
the blind uses a form of xml.  It should, in theory, be possible to convert 
html to DAISY, but how well that would work I don't know.  If anyone wants 
to analyze a set of DAISY files, go to http://bookshare.org/ and search for 
an early PG title.  I say "early" because they apparently quit adding the 
newer titles.  I think there might be a demo link on there just for public 
domain books.

I will make one other comment on accents.  Yes, I can see the importance of 
8-bit files.  I have a local mirror of almost all of PG on my system and I 
finally switched to getting 8-bit files only for non-English 
works.  However, since I am blind and I read with speech, the accents 
really don't matter since the synthesizer doesn't pronounce them 
anyway.  If it sees a letter in the high ASCII range, it skips it.  This 
is especially bad, for example, with the works of Tolkien because accents 
are used so heavily. 

From Bowerbird at aol.com  Wed Oct 20 00:16:06 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 00:16:30 2004
Subject: [gutvol-d] re: e-books for blind people
Message-ID: <1f4.13259f2.2ea76ab6@aol.com>

tony, my "viewer" program has text-to-speech.

right now i've just turned it on for the mac,
because speech synthesis is so easy there...

but if you are on windows, and you're willing to
give me feedback on it, i'll do the extra work to
make windows work too, if that's what you own,
as the headers from your e-mail would indicate...

accessibility is _so_ very important to society!
while text-to-speech is _vital_ for blind people,
i also think a lot of sighted people will come to
appreciate it as well, so this is a very important
arena to me, and i'd appreciate your help with it
very much.  think of it as your own private app!      :+)

for instance, i can program around that problem
where accented letters are skipped, and any other
glitches too, so the plain-text files, as they are,
will be as useful to you as any daisy file would be.
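
One way such a workaround could look, as a minimal sketch only. This is 
Python using the standard unicodedata module, not the viewer's actual code 
(which is not public), and the function name fold_accents is an assumption 
made for illustration. The idea is to replace each accented letter with 
its base letter before the text reaches the synthesizer, so nothing is 
skipped.

```python
import unicodedata


def fold_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFD splits a letter like "e-acute" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Recompose whatever is left into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(fold_accents("E\u00e4rendil was a mariner"))
```

Letters that have no decomposition (for example the German sharp s or the 
Scandinavian o with a stroke) pass through unchanged, so a real tool would 
probably need a small substitution table on top of this.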

because at the rate books are being scanned these days,
nobody is going to have enough time to mark 'em all up
-- i'm not surprised the daisy people can't keep pace --
so we have to find a way to make plain-text shine...

as you say, it's workable right now, in fact often it's the
best choice available, but i know we can make it better...

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 01:26:13 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 01:26:38 2004
Subject: [gutvol-d] is your head spinning?
Message-ID: <1d5.2d0bf731.2ea77b25@aol.com>

is your head spinning?
do you feel like you're swimming through acronym soup?

here's a little _overview_ to help you get your bearings,
at least in regard to _my_ work, _my_ viewer-program,
_my_ format, _my_ markup system, and _my_ philosophy.

1.  the e-texts -- as they are now -- must be regularized.
2.  i can write programs to do most of that automatically.
3.  the results need to be checked for quality control, and
4.  some missing information will need to be re-inserted.
5.  once that is done, the files will be _finished_, in that
6.  my viewer will present them as high-powered e-books.
7.  users can push a button to create high-end .html files,
8.  or save text as an .rtf file, or print out to paper or .pdf,
9.  in a way that gives 'em customized high-quality output.
10.  my program will do text-to-speech, and screenshots,
11.  and let people explore the project gutenberg library,
12.  and easily report errors they encounter in any e-text.
13.  those error-correction reports will be automatically
14.  routed to a system that presents all the material, so
15.  a human only has to say "yes" to approve the mod, and
16.  change-logs will be updated and a notice distributed.
17.  this e-text standardization and ease of handling will
18.  nurture a flowering of synergistic uses of the library
19.  by an array of creative and imaginative programmers
20.  that will engender a book-driven revolution in thought.
21.  and everyone will live happily ever after.  the end.

-bowerbird
From holden.mcgroin at dsl.pipex.com  Wed Oct 20 02:44:33 2004
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Wed Oct 20 02:44:12 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <200410192326.i9JNPwNw027485@ms-smtp-05.tampabay.rr.com>
References: <200410192326.i9JNPwNw027485@ms-smtp-05.tampabay.rr.com>
Message-ID: <41763381.4010304@dsl.pipex.com>

Joseph R. Gruber wrote:
> For anyone that wants this <quote>rancid pudding</quote> you can grab it
> from:
> 
> http://www.josephgruber.com/pudding0727-exe.zip
> http://www.josephgruber.com/pudding0727[1].osx.sit
> 
> Leech away...  Oh, btw, feel free to sue me or send a cease and desist to my
> ISP.  It's BrightHouse Networks in case you need any help...

Wow, a Windows version _and_ an OSX version. Any suggestions on how to 
get it running on Linux? How about the source code?

Cheers,
Holden
From holden.mcgroin at dsl.pipex.com  Wed Oct 20 02:57:29 2004
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Wed Oct 20 02:57:08 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <1169277476.20041020044145@nextra.sk>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com>
	<4175B419.6030301@adelaide.edu.au>	<shacuiqic3.fsf@tux.gnu.franken.de>
	<1169277476.20041020044145@nextra.sk>
Message-ID: <41763689.3040002@dsl.pipex.com>

Skippi wrote:
> This obsolete format should not be considered if even mentioned. Use
> XHTML + CSS instead, if you are allergic to XML. With properly
> written XHTML and customizable CSS user can do what ever he wishes
> with the files and the things are still as they should be.

I've heard some mobile devices with limited memory aren't able to parse 
XHTML files. Someone using an older browser may not either, plus the CSS 
may not even be of any use to them. I don't see why we should exclude 
these people. HTML has its uses just like XHTML, XML and PDF do. The 
ideal would be to have the texts in an XML-based format so transforming 
to standards-compliant HTML (and XHTML/PDF) is trivial. An XML->HTML 
converter could then be written by anyone who cares enough about it to 
do so.
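
To give a feel for how small such a converter can be, here is a minimal 
sketch in Python using only the standard library. The <book>, <chapter>, 
<heading> and <p> element names are a made-up master format invented for 
this example; they are not TEI and not anything PG has adopted.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny document in the hypothetical master format.
SAMPLE = """<book>
  <title>A Sample Book</title>
  <chapter>
    <heading>Chapter I</heading>
    <p>It was a dark &amp; stormy night.</p>
  </chapter>
</book>"""


def book_to_html(xml_text):
    """Convert the hypothetical <book> format into a plain HTML page."""
    book = ET.fromstring(xml_text)
    title = escape(book.findtext("title", default="Untitled"))
    lines = [
        "<html>",
        f"<head><title>{title}</title></head>",
        "<body>",
        f"<h1>{title}</h1>",
    ]
    for chapter in book.findall("chapter"):
        heading = chapter.findtext("heading")
        if heading:
            lines.append(f"<h2>{escape(heading)}</h2>")
        for para in chapter.findall("p"):
            # itertext() also picks up text inside nested inline elements.
            text = "".join(para.itertext())
            lines.append(f"<p>{escape(text)}</p>")
    lines.append("</body>")
    lines.append("</html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(book_to_html(SAMPLE))
```

A real converter would have to handle far more (footnotes, poetry, tables, 
page numbers), but the shape stays the same: parse once, then write out 
whichever target format is wanted.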

Cheers,
Holden
From Bowerbird at aol.com  Wed Oct 20 03:29:15 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 03:30:07 2004
Subject: [gutvol-d] Why Bowerbird is a kook
Message-ID: <fe.44a4cff.2ea797fb@aol.com>

holden said:
>   Wow, a Windows version _and_ an OSX version. 

classic mac is also available, for people running that.


>   Any suggestions on how to get it running on Linux? 

i've announced that for release in 2005.

you can make it happen faster 
by buying me a linux machine.

or you can just wait for it...        :+)


>   How about the source code?

it's not available.

even if it was,
it's in realbasic.

so you'd need to
buy that program
(for which source
is not available)
to compile it.

you could also
port it to some
other language,
i guess.

but why not
just rewrite it?

the ideas are not
all that complex.

and i'm not
that good of 
a programmer.

in fact, i write
spaghetti code.
good spaghetti --
never crashes,
always works.
but hard to
decipher...

so rewriting it is
what you'd end up
doing anyway,
i would guess...

so grab a copy,
see how it works,
and start coding...

-bowerbird
From stephen.thomas at adelaide.edu.au  Wed Oct 20 03:37:05 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Wed Oct 20 03:37:35 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com>
References: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com>
Message-ID: <41763FD1.7060004@adelaide.edu.au>

At 07:10 PM 10/19/2004 -0800, somebody wrote:

> Steve Thomas writes:
>> Most users of PG don't go around grumbling about the lack
>> of XML or the ability to output as PDF. They're just stoked
>> to be able to find the text online.
> 
> That's why they're users of PG. If they needed XML or PDF,
> they go elsewhere. 

That's not the point. People don't go to PG thinking, "hmmm, I 
wonder if they have any XML files". They go looking for a book. 
If you want the text of a particular book, you'll use it 
whatever format it comes in, so long as you have the software to 
handle that format. Nobody "needs" XML or PDF. They "need" the 
words of the book. Formats are secondary.

One of the original ideals of PG was that there had to be a 
plain text version, on the basis that everyone had at least the 
tools to handle plain text. Now-a-days, almost everyone has a 
web browser, so HTML comes second on the accessibility list.

Very few people, I imagine, have the necessary tools to work 
with a TEI or SGML file.

Now, there's nothing wrong with the notion of converting all PG 
texts to some XML master format, and then exporting that to 
umpteen other formats on demand. Practically though, that's a 
lot of work -- a *lot* of work -- and I don't yet see any signs 
of that progressing. Commercially (if one were to do this 
commercially -- this is a hypothetical), I'd estimate such a 
conversion task, for 10,000 books, to cost around $1,000,000 in 
salaries alone.

Of course, there's always volunteer effort. But if volunteers 
are busy converting plain texts to XML so that they can be 
output as plain text (or HTML/PDF/...), does that reduce the 
effort put into scanning/OCR/proof-reading?

Could it be better to put the PG effort into getting plain text 
editions out, and leave it to others to do the extra conversion 
to XML etc.? This is a model that has worked really very well 
for quite a few years, without complaint from any but a few 
tech-enthusiasts.


-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From marcello at perathoner.de  Wed Oct 20 03:41:10 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 03:41:34 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <4175B419.6030301@adelaide.edu.au>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
Message-ID: <417640C6.5020607@perathoner.de>

Steve Thomas wrote:


> Basically, Anne is right -- who cares about this stuff? 

That is the exact same answer Tim Berners-Lee got when he first 
presented his stuff. :-)

"I can view a text file with "more", I just hit the space bar until I 
get to the right page. With your new-fangled format I need a -- what? -- 
browser? I don't have one. Why should I need a `browser' just to read 
some text?"


> Only the few
> enthusiasts on this list. Most users of PG don't go around grumbling 
> about the lack of XML or the ability to output as PDF. They're just 
> stoked to be able to find the text online.

I can assure you that some do. Many start their own projects to mark up 
PG texts; most of them don't go very far, though. One example:

   http://gutenberg.hwg.org/


> And on the subject of PDF, I agree with Anne -- it sucks. 

The only format we have today to bring mathematics to the 
unsophisticated user. If you don't want to install TeX, PDF is the only way.

PDF is not so bad. It is widely accepted, well documented, free tools 
exist to generate PDFs.

It has all the limitations of paper books, though. You cannot resize a 
printed book, or change the font, etc. Well same limitations for PDF. It 
hasn't stopped people from buying paper books.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jgruber at tampabay.rr.com  Wed Oct 20 03:50:58 2004
From: jgruber at tampabay.rr.com (Joseph R. Gruber)
Date: Wed Oct 20 03:51:19 2004
Subject: [gutvol-d] Why Bowerbird is a kook
In-Reply-To: <fe.44a4cff.2ea797fb@aol.com>
Message-ID: <200410201050.i9KAooNw006283@ms-smtp-05.tampabay.rr.com>

>> in fact, i write
>> spaghetti code.
>> good spaghetti --
>> never crashes,
>> always works.
>> but hard to
>> decipher...

Bulls__t -- Try this.  Run the program and then Ctrl+Alt+Del it.
Ooops...there goes a "never-crash".   First time I ever seen a program crash
when you try to end task on it. ;)

Joseph

P.S. Why are you hiding the etexts in your code instead of making them
separate .txt's?


From jonathan_ingram at yahoo.com  Wed Oct 20 04:00:09 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Wed Oct 20 04:00:32 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <417640C6.5020607@perathoner.de>
Message-ID: <20041020110009.62059.qmail@web41702.mail.yahoo.com>


--- Marcello Perathoner <marcello@perathoner.de> wrote:

> Steve Thomas wrote:
> 
> 
> > Basically, Anne is right -- who cares about this stuff? 
> 
> That is the exact same answer Tim Berners-Lee got when he first 
> presented his stuff. :-)

Indeed. Over at DP we're progressing, in small baby-steps, toward producing
decently marked up editions of all new material we produce. And when we at DP
find a markup format we're comfortable with, then PG had better get comfortable
with it as well, because we now produce the vast majority of all PG
material.

-- 
Jon Ingram


From marcello at perathoner.de  Wed Oct 20 04:13:02 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 04:13:26 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <41763FD1.7060004@adelaide.edu.au>
References: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com>
	<41763FD1.7060004@adelaide.edu.au>
Message-ID: <4176483E.4040403@perathoner.de>

Steve Thomas wrote:

> Nobody "needs" XML 
> or PDF. They "need" the words of the book.

Nobody "needs" television or cars. All they "need" is a newspaper and a 
pair of shoes.


> Very few people, I imagine, have the necessary tools to work with a TEI 
> or SGML file.

TEI is not intended as end-user format. End-users should grab the 
generated HTML file.


> Now, there's nothing wrong with the notion of converting all PG texts to 
> some XML master format, and then exporting that to umpteen other formats 
> on demand. [...] I'd estimate 
> such a conversion task, for 10,000 books, to cost around $1,000,000 in 
> salaries alone.

So think of what great value we would donate to the world.


> Could it be better to put the PG effort into getting plain text editions 
> out, and leave it to others to do the extra conversion to XML etc.? This 
> is a model that has worked really very well for quite a few years, 
> without complaint from any but a few tech-enthusiasts.

The main downside is that they mark up a *copy* of the text. When the 
original gets updated, the marked up copy falls out of sync and so do 
all the generated formats.

This problem can only be obviated if PG itself marks up the original.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Wed Oct 20 04:17:51 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 04:18:14 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <20041020110009.62059.qmail@web41702.mail.yahoo.com>
References: <20041020110009.62059.qmail@web41702.mail.yahoo.com>
Message-ID: <4176495F.8070805@perathoner.de>

Jonathan Ingram wrote:

> Indeed. Over at DP we're progressing, in small baby-steps, toward producing
> decently marked up editions of all new material we produce. And when we at DP
> find a markup format we're comfortable with, then PG had better get comfortable
> with it as well, because we now produce the vast majority of all PG
> material.


A simple XSLT will convert your format into TEI.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Oct 20 05:05:36 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 05:05:43 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
Message-ID: <20041020120536.AE85F4F45F@ws6-5.us4.outblaze.com>

Most of us use HTML as a shorthand for HTML with CSS (or XHTML, if you prefer).

----- Original Message -----
From: Skippi <skip@nextra.sk>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] Re: jeroen's even-handed analysis
Date: Wed, 20 Oct 2004 04:41:45 +0200

> 
> Hello Karl!
> 
> Wednesday, October 20, 2004, 3:38:52 AM, you wrote:
> 
> > HTML isn't the best choice if you are interested in printing. 
> 
> IMO: HTML isn't the best choice if you are interested in anything.
> 
> This obsolete format should not be considered if even mentioned. Use
> XHTML + CSS instead, if you are allergic to XML. With properly
> written XHTML and customizable CSS user can do what ever he wishes
> with the files and the things are still as they should be.
> 
> -- 
> 
>  Skippi                            mailto:skip@nextra.sk
> 

From joshua at hutchinson.net  Wed Oct 20 05:16:13 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 05:16:15 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com>

Well, I use CSS to allow the user to switch between showing page numbers and not showing page numbers on the fly.  In fact, both CSS style sheets are embedded within the main HTML file so that extra files are unnecessary.

Here is a link to a Bay State Monthly issue that uses this feature...

http://www.gutenberg.org/dirs/1/3/7/6/13761/13761-h/13761-h.htm

In Mozilla-based browsers, you can switch between the style sheets very easily by clicking the icon in the lower left corner of the browser window.  The default setting is NOT to show the page numbers, since the majority of people could care less how the original paper version was numbered.  But for those that WANT to know, clicking the Original Page Numbers style will have all the original page numbers appear in the margin.

Now Internet Explorer doesn't seem to have a way to switch styles on the fly, which is a shame, but it just defaults to not showing the page numbers.  The page numbers are still in the HTML source, though, so someone that needs it can still get to the information if they absolutely had to.

I am *hoping* that when the switch to XML/TEI happens in the future, this markup will transition fairly easily, too.

Josh

----- Original Message -----
From: "D. Starner" <shalesller@writeme.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] Re: aspects of  a well-done e-book
Date: Tue, 19 Oct 2004 18:58:44 -0800

> 
> Karl Eichwalder writes:
> > It is easy to solve this issue using different CSS stylesheets. 
> 
> I have three problems with using CSS. First, and most fundamental,
> if you want to turn page numbers and line numbers on and off, and
> offer a choice of 5 different background colors, you need 20 different
> CSS options. Another feature will at least double the number of files.
> 
> Secondly, it doesn't work on many web browsers, and I don't think
> that lynx and friends have any intent of ever supporting CSS.
> 
> Lastly, it's never struck me as particularly user-friendly. If you
> can point me to a place where it's actually used and it's easy to
> change without being a computer science major, I'd appreciate it.

From joshua at hutchinson.net  Wed Oct 20 05:21:17 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 05:21:20 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020122118.117874F481@ws6-5.us4.outblaze.com>

----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>

> 
>  	With the proper use of CSS selectors, hidden and visible 
> properties, and other attributes and classes, you can make this very 
> small and tight. It just takes a bit of up-front planning to get it 
> all working right. Most people don't use CSS in any sort of optimized 
> format.

It sounds like you're just the expert I've been looking for.

Can CSS somehow specify a "general" part of the style and then have "special" sections that modify it when that style is selected?

For instance, a general section that sets up a margin size, justifies the text ...  Then a style for showing page numbers and a section for not showing page numbers.

Right now, my CSS header includes the general section twice, once for each style.  If I could just have it once, it would cut down on the size of the header quite a bit AND allow me to add some new features to the CSS header without feeling bad about how huge the header is getting.
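
A sketch of the layout being asked about, under assumptions. HTML 4 distinguishes a "persistent" style sheet (no title attribute, always applied) from titled ones that the browser offers as a choice. If that mechanism is relied on, the general rules could live once in an untitled block and each titled block would hold only the differences. The Python helper below just assembles such a header so the structure can be checked; the rule text, the titles, and the class name pagenum are invented for illustration, and how a particular browser treats several titled style blocks should be tested rather than taken from this sketch.

```python
# Rules shared by every view; written once, in an untitled block.
GENERAL_RULES = "body { margin: 0 10%; text-align: justify; }"

# Hypothetical titled variants; only the differences go here.
VARIANT_RULES = {
    "No Page Numbers": ".pagenum { display: none; }",
    "Original Page Numbers": ".pagenum { display: inline; }",
}


def build_style_header(general, variants):
    """Return <style> blocks: one untitled shared block, one per variant."""
    blocks = [f'<style type="text/css">\n{general}\n</style>']
    for title, rules in variants.items():
        blocks.append(
            f'<style type="text/css" title="{title}">\n{rules}\n</style>'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_style_header(GENERAL_RULES, VARIANT_RULES))
```

With this shape, a new feature adds one short titled block rather than another full copy of the general section.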

Josh
From hacker at gnu-designs.com  Wed Oct 20 06:27:08 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 06:27:38 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com>
References: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410200919020.26661@aphrodite.gnu-designs.com>


> In fact, both CSS style sheets are embedded within the main HTML 
> file so that extra files are unnecessary.

 	If you have a lot of texts, putting the stylesheet directly 
inside the HTML unnecessarily bloats the content, and removes one of 
the main benefits of CSS.. being able to separate content from 
presentation.

 	This means that if you have 1,500 works all formatted with an 
internal stylesheet, and you want to change the fonts for one class 
and add some borders around another, and add a selector for a new text 
class... you have to modify 1,500 stylesheets, instead of one. Yes, 
you could do all of that with a single perl one-liner, but why should 
you?
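
 	A rough sketch of what moving to a single shared sheet could look like, assuming every text carries its styles in <style> elements and that a shared file (here called pg.css, a hypothetical name) sits next to the HTML. The externalize_styles helper is illustrative only, and a regular expression is not a real HTML parser, so treat this as a starting point rather than a finished tool:

```python
import re

# Matches an embedded stylesheet; DOTALL lets "." span the rules' line breaks.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>\s*", re.DOTALL | re.IGNORECASE)


def externalize_styles(html, href="pg.css"):
    """Replace all embedded <style> blocks with one <link> to a shared sheet."""
    link = '<link rel="stylesheet" type="text/css" href="%s">\n' % href
    seen = []

    def swap(match):
        # The first block becomes the <link>; any further blocks are dropped.
        seen.append(match)
        return link if len(seen) == 1 else ""

    return STYLE_BLOCK.sub(swap, html)


if __name__ == "__main__":
    sample = (
        "<head>\n"
        '<style type="text/css">\np { margin: 0; }\n</style>\n'
        "<title>Example</title>\n</head>"
    )
    print(externalize_styles(sample))
```

 	After a one-time pass like this, a later font or border change is an edit to the one shared file rather than to every text.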

> In Mozilla-based browsers, you can switch between the style sheets 
> very easily by clicking the icon in the lower left corner of the 
> browser window.

 	Or, more correctly, by going to View -> Use Style, because 
there is no such selector in Mozilla or "Mozilla-based browsers" in 
the lower left-hand corner. At least not on my Unix, Linux and Windows 
versions of Mozilla (all current).

> But for those that WANT to know, clicking the Original Page Numbers 
> style will have all the original page numbers appear in the margin.

 	Why not also break up the pages with border-bottom on the 
bottom of each respective div, so they look like _actual_ pages.

> Now Internet Explorer doesn't seem to have a way to switch styles on 
> the fly, which is a shame, but it just defaults to not showing the 
> page numbers.

 	Well, that is mostly because MSIE is not a browser, at least 
not according to the standards body which defines how a web browser 
should function, from the socket level all the way on up to the 
presentation level.

 	MSIE is a file manager, based on an ActiveX control that tries 
to render HTML. It supports HTML3.2 fully, "most" of HTML4, "some" of 
CSS1, hardly any CSS2, and CSS3... what's that?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From skip at nextra.sk  Wed Oct 20 06:48:21 2004
From: skip at nextra.sk (Skippi)
Date: Wed Oct 20 06:48:35 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <Pine.LNX.4.61.0410200919020.26661@aphrodite.gnu-designs.com>
References: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com>
	<Pine.LNX.4.61.0410200919020.26661@aphrodite.gnu-designs.com>
Message-ID: <1888158910.20041020154821@nextra.sk>

Hello David!

Wednesday, October 20, 2004, 3:27:08 PM, you wrote:

>         Or, more correctly, by going to View -> Use Style, because 
> there is no such selector in Mozilla or "Mozilla-based browsers" in 
> the lower left-hand corner. At least not on my Unix, Linux and Windows
> versions of Mozilla (all current).

It works perfectly on Firefox. The icon in the lower left corner is
not always visible; it appears only when a suitable file is loaded.

-- 
 Skippi

From jonathan_ingram at yahoo.com  Wed Oct 20 06:57:50 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Wed Oct 20 06:57:53 2004
Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed
	analysis)
In-Reply-To: <4176495F.8070805@perathoner.de>
Message-ID: <20041020135750.11303.qmail@web41728.mail.yahoo.com>


--- Marcello Perathoner <marcello@perathoner.de> wrote:
> A simple XSLT will convert your format into TEI.

I'm not sure any use of XSLT can be called simple :). I've tried reading the
spec, and I'm still recovering from the headaches. Fortunately there are easier
ways to style (rather than transform) XML, using CSS. This is very well
supported in all the Mozilla derivatives. While XSLT is something I'm going to
have to look at eventually, for the moment I'm happy with CSS :).

If you want an example of what I'm playing with at the moment: recently I and
another DP volunteer have been kicking around some ideas for semantic markup of
drama. While initially we were working with straight HTML, this quickly gets
annoying, due to the amount of messing around with divs involved, and the need
to consider how the output will be displayed on browsers with poor support for
CSS. I've found it much easier to investigate options by working with an
'HTML+extra tags' markup.

You can see my current working by looking at the blah.* files here:

  http://www.pgdp.net/phpBB2/viewtopic.php?p=94734

Save each file to the name given in its post subject heading. Any Mozilla
derivative should show the .xml file styled in a way which almost exactly
replicates the .html file. The source for the XML edition is much easier to
read.

Those of you who know TEI can probably tell that 'my' markup is very similar to
TEI markup (although a little more verbose). Much of it was arrived at
independently, which makes me more confident that this styling approach is
relatively sensible. The example demonstrates markup of drama and poetry, with
decent handling of line continuations and line numbers in poetry, and stage
directions in drama. I've used the HTML 'edition' of this poetry markup for
quite a while now in texts I've PPed for PG.

Note that this is still a work in progress, so resist the temptation to
criticise the minutiae of my CSS :).

One of the other reasons I think a simple XML-style is useful is that we're
currently planning to separate the proofreading rounds from the markup rounds
at DP. Every page of a DP project currently goes through two 'rounds' of
processing. In each round proofers are expected to not only detect OCR errors,
but add inline markup for italic, bold, material in non-Latin alphabets, etc.,
and add block markup for poetry, tables, and so on. This will be split into an
initial two rounds only concerned with the text, plus an extra procedure to
mark the text correctly. At the moment the markup we use is homegrown and
kludgy -- we have a great opportunity at the moment to move to something more
sensible, and I strongly believe that some simple XML-derivative is the markup
we need. 

I'm even more convinced of the utility of XML for DP now that I've seen how
easy it is to style it. One of the problems of relying on something like XSLT
is that it can be hard to go backwards from errors in the output to find the
corresponding error in the original XML input. Being able to get direct
feedback by viewing a styled version of the XML makes life much easier.

-- 
Jon Ingram


From joshua at hutchinson.net  Wed Oct 20 07:43:59 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 07:44:02 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020144359.37C401096F3@ws6-4.us4.outblaze.com>


----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>
> 
> > In fact, both CSS style sheets are embedded within the main HTML 
> > file so that extra files are unnecessary.
> 
>  	If you have a lot of texts, putting the stylesheet directly 
> inside the HTML unnecessarily bloats the content, and removes one of 
> the main benefits of CSS.. being able to separate content from 
> presentation.
> 
>  	This means that if you have 1,500 works all formatted with an 
> internal stylesheet, and you want to change the fonts for one class 
> and add some borders around another, and add a selector for a new text 
> class... you have to modify 1,500 stylesheets, instead of one. Yes, 
> you could do all of that with a single perl one-liner, but why should 
> you?

Well, in a perfect world, we could guarantee that the separate CSS file is accessible and life is good.  Unfortunately, since we can't guarantee the CSS file is there, we decided to embed the CSS inside the HTML.  It bloats it somewhat, but it is still smaller than the obligatory PG header information, so I don't feel TOO badly about it.  And now we get a fully self-contained file.

> 
> > In Mozilla-based browsers, you can switch between the style sheets 
> > very easily by clicking the icon in the lower left corner of the 
> > browser window.
> 
>  	Or, more correctly, by going to View -> Use Style, because 
> there is no such selector in Mozilla or "Mozilla-based browsers" in 
> the lower left-hand corner. At least not on my Unix, Linux and Windows 
> versions of Mozilla (all current).
> 

Now that I think about it, you may be right... In Firefox (which is what I have on this machine), there is no View -> Use Style menu option, but there is the icon in the bottom left corner.  *shrug*

> > But for those that WANT to know, clicking the Original Page Numbers 
> > style will have all the original page numbers appear in the margin.
> 
>  	Why not also break up the pages with border-bottom on the 
> bottom of each respective div, so they look like _actual_ pages.
> 

I have a big aversion to taking an electronic document and presenting it as "pages."  First and foremost, it is ugly.  Second, it is going to wreak havoc whenever the user wants to change font sizes, page sizes, etc.  This method allows the "scholar" to have original page number references (which the scholars in the original discussion said was important) without tying the online layout to the limitations of the physical page layout.

> > Now Internet Explorer doesn't seem to have a way to switch styles on 
> > the fly, which is a shame, but it just defaults to not showing the 
> > page numbers.
> 
>  	Well, that is mostly because MSIE is not a browser, at least 
> not according to the standards body which defines how a web browser 
> should function, from the socket level all the way on up to the 
> presentation level.
> 
>  	MSIE is a file manager, based on an ActiveX control that tries 
> to render HTML. It supports HTML3.2 fully, "most" of HTML4, "some" of 
> CSS1, hardly any CSS2, and CSS3... what's that?
> 
> 

You are preaching to the choir here!  The only thing I use IE for anymore is to check a new HTML document before posting it to make sure IE isn't mangling it TOO badly.

JHutch
From marcello at perathoner.de  Wed Oct 20 08:25:29 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 08:25:37 2004
Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's
	even-handed analysis)
In-Reply-To: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
Message-ID: <41768369.6050204@perathoner.de>

Jonathan Ingram wrote:

> I'm not sure any use of XSLT can be called simple :). I've tried reading the
> spec, and I'm still recovering from the headaches. Fortunately there are easier
> ways to style (rather than transform) XML, using CSS. This is very well
> supported in all the Mozilla derivatives. While XSLT is something I'm going to
> have to look at eventually, for the moment I'm happy with CSS :).

CSS, while simpler, is less powerful and gives you only HTML.


> If you want an example of what I'm playing with at the moment: recently I and
> another DP volunteer have been kicking around some ideas for semantic markup of
> drama.

What I've done with Faust is to reformat the text file in a sensible way 
and then use perl to automatically add TEI markup.

I advise using a perl script to add the basic markup and then refining 
the markup in a second markup-proofing step.
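
A minimal sketch of that first automatic pass, written in Python rather than perl and resting on assumptions that are not from the Faust work itself: speeches are separated by blank lines, a speaker name is an all-capitals line ending in a period, and a stage direction is a line wrapped in parentheses. The add_tei_markup helper and the sample text are made up for illustration; <sp>, <speaker>, <l> and <stage> are the usual TEI drama and verse elements.

```python
import re
from xml.sax.saxutils import escape


def is_stage(line):
    """A line wholly wrapped in parentheses is taken as a stage direction."""
    return line.startswith("(") and line.endswith(")")


def add_tei_markup(text):
    """Wrap blank-line-separated speeches in <sp>, guessing each line's type."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if len(lines) == 1 and is_stage(lines[0]):
            # A stage direction standing alone belongs outside any speech.
            out.append("<stage>%s</stage>" % escape(lines[0][1:-1]))
            continue
        out.append("<sp>")
        for i, line in enumerate(lines):
            if i == 0 and line.isupper() and line.endswith("."):
                out.append("  <speaker>%s</speaker>" % escape(line.rstrip(".")))
            elif is_stage(line):
                out.append("  <stage>%s</stage>" % escape(line[1:-1]))
            else:
                out.append("  <l>%s</l>" % escape(line))
        out.append("</sp>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "FAUST.\nA first line of verse.\n(He opens the book.)\n\n(Enter a second speaker.)"
    print(add_tei_markup(sample))
```

Anything the guesses get wrong (prose speeches, split verse lines, directions without parentheses) is what the second, human markup-proofing step is for.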


> Those of you who know TEI can probably tell that 'my' markup is very similar to
> TEI markup (although a little more verbose). Much of it was arrived at
> independently, which makes me more confident that this styling approach is
> relatively sensible. 

Why do people keep reinventing the wheel? TEI is perfectly good and 
designed explicitly for the task we have at hand. And it is a standard 
that is already in use in many e-libraries worldwide.

I don't think we'll get PG to post texts in non-standard cooked-up 
formats. They are already making enough fuss over perfectly valid TEI files.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From hacker at gnu-designs.com  Wed Oct 20 08:28:28 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 08:29:39 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020144359.37C401096F3@ws6-4.us4.outblaze.com>
References: <20041020144359.37C401096F3@ws6-4.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410201105160.20548@aphrodite.gnu-designs.com>


> Well, in a perfect world, we could guarantee that the separate CSS 
> file is accessible and life is good.  Unfortunately, since we can't 
> guarantee the CSS file is there, we decided to embed the CSS inside 
> the HTML.

 	If you can guarantee the HTML is there, you can guarantee that 
the CSS is there. If the CSS is missing, it shouldn't "break" the 
usability of the HTML document.

> It bloats it somewhat, but it is still smaller than the obligatory 
> PG header information, so I don't feel TOO badly about it.  And now 
> we get a fully self-contained file.

 	I don't understand the correlation. What does your CSS size 
have to do with the obligatory PG header size?

> Now that I think about it, you may be right... In Firefox (which is 
> what I have on this machine), there is no View -> Use Style menu 
> option, but there is the icon in the bottom left corner.  *shrug*

 	For those that want to see this in a much-more expanded 
version, go to http://w3.org/Style/ in a Gecko-based browser, and 
click on the icon, or go to View -> Use Style, and try the various 
stylesheets listed there.

> I have a big aversion to taking an electronic document and 
> presenting it as "pages."  First and foremost, it is ugly.

 	I submit that having page numbers in an unintuitive place 
(left-side margins, which doesn't appear in any printed work I can 
find), is just as ugly.

> Second, it is going to wreak havoc whenever the user wants to 
> change font sizes, page sizes, etc.

 	Having the border at the bottom of page 423 with a font size 
of 1.0em is still going to put the border at the bottom of the page 
when the font is 2.8em.

 	I think I'm missing your allegory here. Can you explain?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From joshua at hutchinson.net  Wed Oct 20 08:48:00 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 08:48:05 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>


----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>
> 
> > Well, in a perfect world, we could guarantee that the separate CSS 
> > file is accessible and life is good.  Unfortunately, since we can't 
> > guarantee the CSS file is there, we decided to embed the CSS inside 
> > the HTML.
> 
>  	If you can guarantee the HTML is there, you can guarantee that 
> the CSS is there. If the CSS is missing, it shouldn't "break" the 
> usability of the HTML document.
> 

I can guarantee that the CSS file is in the PG directory.  I can't guarantee that Joe Sixpack will download that when he grabs the HTML file.  Overall, this makes things simpler for the consumer of the e-text.

> > It bloats it somewhat, but it is still smaller than the obligatory 
> > PG header information, so I don't feel TOO badly about it.  And now 
> > we get a fully self-contained file.
> 
>  	I don't understand the correlation. What does your CSS size 
> have to do with the obligatory PG header size?
> 

The CSS adds to the size of the etext, and on some level that feels ... wrong.  I can't explain why, it just does.  However, whenever PG posts a new e-text, they add a great big header and footer to the document for legal reasons.  That thing absolutely dwarfs the CSS style header in size, so I don't feel AS badly as I might otherwise.  It was mostly a throw-away comment, so don't read too much into it.

> > Now that I think about it, you may be right... In Firefox (which is 
> > what I have on this machine), there is no View -> Use Style menu 
> > option, but there is the icon in the bottom left corner.  *shrug*
> 
>  	For those that want to see this in a much-more expanded 
> version, go to http://w3.org/Style/ in a Gecko-based browser, and 
> click on the icon, or go to View -> Use Style, and try the various 
> stylesheets listed there.
> 
> > I have a big aversion to taking an electronic document and 
> > presenting it as "pages."  First and foremost, it is ugly.
> 
>  	I submit that having page numbers in an unintuitive place 
> (left-side margins, which doesn't appear in any printed work I can 
> find), is just as ugly.
> 

The original page breaks were necessitated by the size of paper the publisher used.  There is almost never a functional meaning to the page breaks in a book (except things like chapter breaks, which are easily marked up with horizontal rules or something to that effect).  The page numbers in the margins are small and fairly unobtrusive, yet give the information in the easiest manner I could devise.  Furthermore, they are completely hidden unless the reader WANTS to have that information.

> > Second, it is going to wreak havoc whenever the user wants to 
> > change font sizes, page sizes, etc.
> 
>  	Having the border at the bottom of page 423 with a font size 
> of 1.0em is still going to put the border at the bottom of the page 
> when the font is 2.8em.
> 
>  	I think I'm missing your allegory here. Can you explain?
> 

If you put visible page breaks into an HTML document, the user is going to expect that document to print to his printer at exactly those page breaks.  Good luck.  

Also, page breaks would only make sense if you broke them into "visual" chunks.  By that, I mean sizes that fit into one screen at a time -- no scrolling.  However, if the user has a different resolution than you, it ain't gonna work.  If he changes the font size, it ain't gonna work.

Basically, using visual page dividers is getting into typography, something you want to avoid.  Good HTML lets the browser and the user format the text.  You just tell them what KIND of text it is.  The page numbers are not meant to give you visual indication of page breaks as much as contextual information regarding the original source... which some people find very important and as it's fairly easy for me to include that information without disturbing the other readers, I do.

Josh

PS None of this is an argument for my CSS based HTML over TEI-Lite.  I would LOVE it if we had TEI-Lite capabilities right now... But we don't.  
From hacker at gnu-designs.com  Wed Oct 20 08:58:32 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 08:59:42 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>
References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>


> I can guarantee that the CSS file is in the PG directory.  I can't 
> guarantee that Joe Sixpack will download that when he grabs the HTML 
> file.

 	Agreed. If he wants a richer reading experience, he should 
grab the CSS. Pretty simple overall. If the reader wants to grab 200 
etexts, it's easier to let them know they need one .css file than 200 
identical css stanzas.

 	I understand your needs, but you're un-CSS-ifying CSS.

> The CSS adds to the size of the etext, and on some level that feels 
> ... wrong.  I can't explain why, it just does.  However, whenever PG 
> posts a new e-text, they add a great big header and footer to the 
> document for legal reasons.  That thing absolutely dwarfs the CSS 
> style header in size, so I don't feel AS badly as I might otherwise.

 	The PG header is considered "content", while CSS is considered 
"presentation". Again, I understand where you're coming from here, I 
just don't personally agree with it. I'm more of a purist, in the 
strictest sense of the word. ;)

> If you put visible page breaks into an HTML document, the user is 
> going to expect that document to print to his printer at exactly 
> those page breaks.  Good luck.

 	This is why 'media="print"' exists in a CSS declaration.

 	See here for more:

 	http://www.w3.org/TR/REC-CSS2/media.html

> Also, page breaks would only make sense if you broke them into 
> "visual" chunks.  By that, I mean sizes that fit into one screen at 
> a time -- no scrolling.  However, if the user has a different 
> resolution than you, it ain't gonna work.  If he changes the font 
> size, it ain't gonna work.

 	You can't translate a book into something read in a web 
browser, and retain the same functionality. The whole point of a 
scrollbar is to remove that constraint.

 	Though I agree, unnecessarily long webpages (scrolling down for 
hundreds of pages) are a pain, but the alternative is much more 
painful.

> The page numbers are not meant to give you visual indication of page 
> breaks as much as contextual information regarding the original 
> source... which some people find very important and as it's fairly 
> easy for me to include that information without disturbing the other 
> readers, I do.

 	Right. Your page numbers don't correlate to anything, except 
an "Oh that's neat!" kind of feeling as you imagine what it would be 
like to be reading page 423 in the printed (dead-tree) version of that 
particular work. Page 423 in your numbering scheme is not the 423rd 
page as seen in my browser.

> PS None of this is an argument for my CSS based HTML over TEI-Lite. 
> I would LOVE it if we had TEI-Lite capabilities right now... But we 
> don't.

 	I'm still gathering info and doing research on all of the 
alternatives presented thus far. TEI is one of the datapoints in my 
research.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From jon at noring.name  Wed Oct 20 09:03:14 2004
From: jon at noring.name (Jon Noring)
Date: Wed Oct 20 09:03:31 2004
Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's
	even-handed analysis)
In-Reply-To: <41768369.6050204@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de>
Message-ID: <167527178375.20041020100314@noring.name>

Marcello wrote:
> Jonathan Ingram wrote:

>> I'm not sure any use of XSLT can be called simple :). I've tried
>> reading the spec, and I'm still recovering from the headaches.
>> Fortunately there are easier ways to style (rather than transform)
>> XML, using CSS. This is very well supported in all the Mozilla
>> derivatives. While XSLT is something I'm going to have to look at
>> eventually, for the moment I'm happy with CSS :).

> CSS, while simpler, is less powerful and gives you only HTML.

CSS can be applied to any XML markup for viewing on web standards
browsers, but because current CSS is limited there are certain HTML
functions (tags) it simply won't be able to enable, or to enable
cleanly. In CSS 'display', for instance, there is no value for
identifying when some XML element represents an object/image, nor a
hypertext link/anchor. Obviously, there is no 'display' value for an
inline note because HTML never supported this. (We find markup for
inline notes in the TEI and DocBook vocabularies. Note that CSS can
move a span of inline text to the side in its own box, I've tested it
out myself, but IE6 unfortunately does not recognize the needed CSS so
in IE6 the inline note stays inline, not a good thing.)

Of course, one can use XLink for object/image embedding and anchors
(and XLink makes more sense anyway than using CSS since it is a
vocabulary-independent means to embed objects and enable links), 
but then current web browsers are very deficient in XLink support
(Mozilla has very limited XLink support -- haven't tested Firefox yet
-- while IE and Opera have zero XLink support.)

The OpenReader System 1.0, should it become a reality (and we are
working on it -- we've made great strides in the last few weeks in
garnering fairly high-level support), intends to fully support the
more important parts of the XLink specification in version 1.0. We may
also add one or more custom CSS values to 'display' to emulate links/
anchors, objects/images and inline notes (OpenReader will include a
facility to open 'booklets' to display non-inline content, in part to
support OEBPS which enables this cool ebook feature.) We also plan to
investigate a future version of OpenReader to *natively* support
TEI-Lite or some subset of TEI (including handling inline notes which
will be trivial for OpenReader to handle.) We may even develop a
next-generation styling language to address the deficiencies of
current CSS2 and CSS3 but which doesn't have the complexity of
XSLT/XSL-FO. The problem with CSS is its ties to the HTML paradigm and
legacy support. In OpenReader, we are freeing ourselves from these
legacy issues and thus can think outside the box and move on to the
next generation web browser -- in essence to go beyond HTML.

Jon Noring
OpenReader: http://www.openreader.org/

From jonathan_ingram at yahoo.com  Wed Oct 20 09:03:28 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Wed Oct 20 09:03:35 2004
Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's
	even-handed analysis)
In-Reply-To: <41768369.6050204@perathoner.de>
Message-ID: <20041020160328.75110.qmail@web41704.mail.yahoo.com>


--- Marcello Perathoner <marcello@perathoner.de> wrote:
> I don't think we'll get PG to post texts in non-standard cooked-up 
> formats. 

Neither do I, and I don't want them to. Hopefully we'll either use TEI, or a
markup which can be easily and losslessly transformed into TEI. However, there are
a lot of people out there, including a lot of DP volunteers, who are
unconvinced about the utility of XML, and one of the best ways to *fail* to
change their mind is to plonk 1400 pages of documentation in front of them and
say 'here's what you should be using, come back when you've finished reading'
-- this is true even of TEI-lite, which has some foibles you have to see past
(overly terse tags, for example -- at least to my mind :) ).

I used to be one of the members of the 'undecided about XML' camp myself. I've
gradually changed my mind, and I'm working on helping to change the minds of
those I'm working with. I also don't just automatically accept the TEI-way as
being best for the applications I wish to use it for -- so I've been developing
my own structured markup which I'm happy with, and which happens to have
converged very closely to the corresponding TEI markup. As I said in my
previous email, this makes me much more confident to accept the use of
TEI-style markup in areas where I haven't had the time to investigate
alternatives.

Those of you who aren't involved in DP will probably see nothing more about
this until the day that 85% of new PG ebooks come with a TEI edition :).

-- 
Jon Ingram


From joshua at hutchinson.net  Wed Oct 20 09:09:40 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 09:09:46 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020160941.073CD109756@ws6-4.us4.outblaze.com>


----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>
> 
>  	Right. Your page numbers don't correlate to anything, except 
> an "Oh that's neat!" kind of feeling as you imagine what it would be 
> like to be reading page 423 in the printed (dead-tree) version of that 
> particular work. Page 423 in your numbering scheme is not the 423rd 
> page as seen in my browser.

But that isn't its purpose.  Its purpose, solely and completely, is to provide information about the original source.  This isn't widely needed, so it is hidden.  But, for the few that need that information (and again, this discussion was held in the DP forums and the scholarly types really clamored for this), the information is available.  It is most definitely not there to provide you any indication of where page breaks would or should occur in your browser or in a printed copy.

Josh
From Bowerbird at aol.com  Wed Oct 20 09:25:32 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 09:25:45 2004
Subject: [gutvol-d] Why Bowerbird is a genius
Message-ID: <ba.63165d94.2ea7eb7c@aol.com>

gruber said:
>   Run the program and then Ctrl+Alt+Del it.
>   Ooops...there goes a "never-crash". 

control-alt-delete?
is that how you end all your programs?      :+)
i recommend you try the "quit" button instead,
or choose "quit" or "exit" under the file menu.

but if that's a bug, which is entirely possible
-- to be expected in fact -- in a beta-version,
i'll fix it.  but please take the bug-reports to
the beta-test listserve, so they'll be logged.

but don't bother with doing that _now_.
as your version is almost 3 months old.
a new version will be out very soon, and
you should wait to test that one instead...


>   P.S. Why are you hiding the etexts in your code
>   instead of making them separate .txt's?

first things first.  priorities. and control of the
degrees of freedom for enhanced troubleshooting

the first objective is to get the program solid.
to focus on that, it's wise to use content that
i _know_ has been correctly formatted in z.m.l.

once the app is acting correctly and is stable,
i'll turn to texts that might be marked up wrong,
confident that if i get unexpected behavior, it's
due to an incorrect text or a defective z.m.l. rule,
in all likelihood, and not some bug in the program.

but the e-texts aren't really "hidden" right now.
if you scrutinize any version of the program in
a file-viewer, you will see the e-texts inside,
"hiding" in plain sight in their raw-ascii glory.
they can be recovered, in full, easily, any time.

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 09:27:01 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 09:27:21 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <190.3192affa.2ea7ebd5@aol.com>

jon said:
>   And when we at DP find a markup format we're comfortable with, 
>   then PG had better get comfortable with it as well, because 
>   we now produce the vast majority of all PG material.

"we make all your base."

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 09:35:39 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 09:35:54 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <146.364dc5ec.2ea7eddb@aol.com>

josh said:
>   Now Internet Explorer doesn't seem to have a way 
>   to switch styles on the fly, which is a shame

a big shame, since i.e. still has -- what -- 93% of all surfers?

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 09:46:25 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 09:46:37 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <92.1829c179.2ea7f061@aol.com>

the hacker (there's too many davids running around here) said:
>   If you have a lot of texts, putting the stylesheet directly 
>   inside the HTML unnecessarily bloats the content, and 
>   removes one of the main benefits of CSS.. being able to 
>   separate content from presentation.
>   This means that if you have 1,500 works 
>   all formatted with an internal stylesheet, 
>   and you want to change the fonts for one class 
>   and add some borders around another, and 
>   add a selector for a new text class... you have to 
>   modify 1,500 stylesheets, instead of one. 
>   Yes, you could do all of that with 
>   a single perl one-liner, but why should you?

sometimes people act like c.s.s. is some magic new technology.

in reality, it's just a stylesheet.  and this trade-off between
"one stylesheet for all your documents" versus "one for each"
is a well-known dilemma to anyone who has used stylesheets.
(my virginity in that arena went to ventura publisher in 1989.)

the upshot is that each method has benefits and shortcomings,
and you can only really make an informed decision relevant to
your particular situation when you know all of them full-on.

i look forward to the learning process -- over the next 5 years? --
as this comes to be appreciated by the c.s.s. community at large,
and relish the time of its culmination, when we can start to make
some real progress, instead of just re-grasping old knowledge...

-bowerbird
From hacker at gnu-designs.com  Wed Oct 20 09:47:32 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 09:48:03 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <146.364dc5ec.2ea7eddb@aol.com>
References: <146.364dc5ec.2ea7eddb@aol.com>
Message-ID: <Pine.LNX.4.61.0410201246160.2084@aphrodite.gnu-designs.com>


> a big shame, since i.e. still has -- what -- 93% of all surfers?

 	And decreasing every day.

 	Users aren't using MSIE because it is the superior product, 
they're using it because they have no idea there are significantly more 
secure, functional, compliant browser alternatives out there, and 
because it came with their pee-cee, with a nice convenient icon right 
on their desktop.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From hacker at gnu-designs.com  Wed Oct 20 09:51:36 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 09:52:03 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <92.1829c179.2ea7f061@aol.com>
References: <92.1829c179.2ea7f061@aol.com>
Message-ID: <Pine.LNX.4.61.0410201251060.2084@aphrodite.gnu-designs.com>


> sometimes people act like c.s.s. is some magic new technology.

 	I think you mean CSS. There is no such thing as "c.s.s.".


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From Bowerbird at aol.com  Wed Oct 20 09:56:00 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 09:56:26 2004
Subject: [gutvol-d] re: homegrown and kludgy
Message-ID: <8.59f797c8.2ea7f2a0@aol.com>

ingram (there's too many jons running around here) said:
>   At the moment the markup we use is homegrown and kludgy

yay!  somebody actually said it, right out loud!


>   -- we have a great opportunity at the moment 
>   to move to something more sensible, and 
>   I strongly believe that some 
>   simple XML-derivative 
>   is the markup we need. 

"simple xml-derivative" is an oxymoron.


>   I'm even more convinced of the utility of XML for DP 
>   now that I've seen how easy it is to style it. 
>   One of the problems of relying on something like XSLT
>   is that it can be hard to go backwards from
>   errors in the output to find 
>   the corresponding error in the original XML input. 
>   Being able to get direct feedback by viewing 
>   a styled version of the XML makes life much easier.

yep, things will be a lot better when the current crop
comes to realize some of the benefits of w.y.s.i.w.y.g.
(threw the baby out with the bathwater on that one...)

if we could take the 5-year learning curve on _that_
and do it simultaneously with the one on stylesheets,
that would be _really_ great, wouldn't it?

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 10:06:45 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 10:07:08 2004
Subject: [gutvol-d] re: what i've been suggesting all along
Message-ID: <191.30f2ff61.2ea7f525@aol.com>

marcello said:
>   What I've done with Faust is to 
>   reformat the text file in a sensible way 

>   and then use perl to automatically add TEI markup.

bingo.  now do that to the whole library.
that's what i've been suggesting all along.


>   I advise to use a perl script to add the basic markup 
>   and to refine the markup in a second markup-proofing step.

i advise to write a program to add the basic markup
(perl is fine, but so is any other tool someone uses),
and then to refine your _program_ until you no longer
need to do _any_ further refinements of its output.
(or until it's easier to refine output than the program.)
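
[A minimal sketch of "write a program to add the basic markup", in
Python rather than Perl. The heading rule (a one-line block starting
with "CHAPTER") and the <div>/<head>/<p> element names are assumptions
made for illustration; they are not the script Marcello used on Faust
nor any tool described in this thread.]

```python
import re
from xml.sax.saxutils import escape

# Hypothetical heuristic: a one-line block starting with "CHAPTER" is a heading.
HEADING = re.compile(r"^CHAPTER\b", re.IGNORECASE)

def add_basic_markup(text):
    """Wrap blank-line-separated blocks in <head> or <p> elements."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    for block in blocks:
        # Join hard-wrapped lines so each paragraph becomes one string.
        joined = " ".join(line.strip() for line in block.splitlines())
        is_heading = "\n" not in block and HEADING.match(block)
        tag = "head" if is_heading else "p"
        # escape() protects &, < and > so the output stays well-formed.
        out.append("<{0}>{1}</{0}>".format(tag, escape(joined)))
    return "<div>\n" + "\n".join(out) + "\n</div>"

sample = "CHAPTER I\n\nIt was a dark & stormy\nnight.\n\nThe rain fell."
print(add_basic_markup(sample))
```

[Refining the program, as suggested above, would mean improving rules
like HEADING until the output needs little or no hand correction.]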

in the long run, doing things _that_ way will save you
_tons_and_tons_ of unnecessary, one-time-only work.

and, again, this is what i've been suggesting all along.

now someone will come along and say, "that can't be done".
and then i'll say "you're wrong, it can be done, i've done it."
rinse and repeat, month after month, for nearly a year now.

and no, i _won't_ do it for you, just to "prove" that i can...

-bowerbird
From Gutenberg9443 at aol.com  Wed Oct 20 10:10:46 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 10:10:59 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <12c.4e97793b.2ea7f616@aol.com>

 
In a message dated 10/19/2004 3:20:51 PM Mountain Standard Time,  
jeroen@bohol.ph writes:

Also, since it is 
normally easier to throw something away than to add, I prefer to go to 
XML first, and then create HTML and Text from that.


I have no problem at all with that, as long
as the HTML and TXT also are posted.
 
As to whoever it was who said that I am American-
centered, yes, I am. The only language I speak
besides English is Spanish. I'm learning to read
French so that I can read LE FIGARO better,
but will probably never pronounce it 
properly. Would like to learn German but I
probably won't live that long. The fact remains
that the MAJORITY of PGLAF's books are in
English. So I'm talking about what is necessary
IN ENGLISH.
 
I did not mention PDAs at all. I mentioned
laptops. The PDA I have refuses to speak
to my computer or allow my computer to
speak to it, so I'm rather limited in that
direction.
 
Anne
From jonathan_ingram at yahoo.com  Wed Oct 20 10:19:42 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Wed Oct 20 10:19:49 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <190.3192affa.2ea7ebd5@aol.com>
Message-ID: <20041020171942.21949.qmail@web41713.mail.yahoo.com>


--- Bowerbird@aol.com wrote:

> jon said:
> >   And when we at DP find a markup format we're comfortable with, 
> >   then PG had better get comfortable with it as well, because 
> >   we now produce the vast majority of all PG material.
> 
> "we make all your base."

In a very real sense, all PG's base do, indeed, belong to DP.

-- 
Jon Ingram


From marcello at perathoner.de  Wed Oct 20 10:29:42 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 10:29:50 2004
Subject: XML won't eat your children (was Re: [gutvol-d]
	jeroen's	even-handed analysis)
In-Reply-To: <167527178375.20041020100314@noring.name>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de>
	<167527178375.20041020100314@noring.name>
Message-ID: <4176A086.8030101@perathoner.de>

Jon Noring wrote:

> The OpenReader System 1.0, should it become a reality (and we are
> working on it -- we've made great strides in the last few weeks in
> garnering fairly high-level support), intends to fully support the
> more important parts of the XLink specification in version 1.0. We may
> also add one or more custom CSS values to 'display' to emulate links/
> anchors, objects/images and inline notes (OpenReader will include a
> facility to open 'booklets' to display non-inline content, in part to
> support OEBPS which enables this cool ebook feature.) We also plan to
> investigate a future version of OpenReader to *natively* support
> TEI-Lite or some subset of TEI (including handling inline notes which
> will be trivial for OpenReader to handle.) We may even develop a
> next-generation styling language to address the deficiencies of
> current CSS2 and CSS3 but which doesn't have the complexity of
> XSLT/XSL-FO. The problem with CSS is its ties to the HTML paradigm and
> legacy support. In OpenReader, we are freeing ourselves from these
> legacy issues and thus can think outside the box and move on to the
> next generation web browser -- in essence to go beyond HTML.

May I ask how many people are working on this and what the time frame 
may be?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Wed Oct 20 10:35:00 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 10:35:17 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <68.4790aea6.2ea7fbc4@aol.com>

the hacker said:
>   Your page numbers don't correlate to anything, 
>   except an "Oh that's neat!" kind of feeling 
>   as you imagine what it would be like to be 
>   reading page 423 in the printed (dead-tree) 
>   version of that particular work. 
>   Page 423 in your numbering scheme 
>   is not the 423rd page as seen in my browser.

actually, having a solid congruence between
our information as it exists as ink-on-paper
and as it exists when displayed on-screen
will ultimately prove to be far more crucial
than merely an "oh, that's neat" kind of feeling.

it doesn't have to be a 1-to-1 congruence,
but some kind of major ratio is important.

it might be 1-to-2, where one printed page
equals 2 screens, such as is the case now
for most monitors, in the sense that they'll
nicely display _half_ of an 8.5*11-inch page.

or it can be 2-to-1,  where two printed pages
equal 1 screen, such as is the case right now
for most monitors, in the sense that they'll
nicely display a 2-page spread of a 5*8 novel.

and it _could_ be 1-to-1, too, as is the case
now if we take our monitors and turn them
from landscape to portrait, where they will
display an 8.5*11-inch pagesize quite nicely.
(go ahead and place a piece of paper up against
your monitor right now, you'll see what i mean.)

oh yeah, please don't some yahoo pipe up and say
"but we can't expect that every screen will be
the size of our desktop monitors".  _of_course_
there will be a wide variety of screen-sizes, but
the notion that a major-ratio congruence will be 
useful _still_ has the same credence and weight.

one of the main reasons people resonate to .pdf
is that the congruence between screen and paper
makes them comfortable.  they see equivalence.
making the equivalence as transparent as possible
is a powerful step to ease people toward e-books.

and further, once we have "clipboard computers"
-- a p.c. with the form-factor of a clipboard,
with wireless web-access from anywhere --
there'll be mass movement to that screensize,
exactly because it maps 1-to-1 on 8.5*11 paper.

-bowerbird
From jtinsley at pobox.com  Wed Oct 20 10:35:28 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Wed Oct 20 10:36:21 2004
Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's
	even-handed analysis)
In-Reply-To: <41768369.6050204@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de>
Message-ID: <20041020173528.GB3366@panix.com>

On Wed, 20 Oct 2004 17:25:29 +0200, Marcello Perathoner <marcello@perathoner.de> wrote:

>
>I don't think we'll get PG to post texts in non-standard cooked-up 
>formats. They are already making enough fuss over perfectly valid TEI files.


That last is, if not inaccurate, at least misleading.

And I think you mean, by "PG" and "they" above, the WWs. So let's get
down to it.

Nobody has an objection to valid TEI texts, but valid TEI texts alone
_are not enough_. An XML file that cannot be read (by an actual human)
is as useful as a lock with no key.

We need the key as well as the lock.

I really no longer give any headroom at all to the approach "Post XML
Now Because That Is The One True Way And We'll Figure Out How To Read
It Later." If for no other reason, then because the most important
part of the WW job is to check the texts before posting, and if we
can't read it, we can't find the errors, and if we can't find the
errors, we can't fix 'em.

We WWs would all LOVE to have only one format (XML) uploaded, and
generate all posting files from that. It would cut out an amazing
amount of work and uncertainty. Further down the line, we can get to
looking at posting just the XML, and generate other formats on the
fly, but let's take one step at a time. Considering that this step to
date has already taken three years or so, that's not overly cautious!

The first thing we need to do is get substantial agreement on a flavor
of XML -- not ruling out the addition of future flavors, you
understand, but we need to get at least one of them bedded down before
we attack others. Teixlite seems to be the majority choice among those
relatively few volunteers who are enthusiastic about XML, so let's
say, for the purpose of this discussion, that that's the one we're
working on.

Next, we need a process for adding the header and footer for PG texts
for the selected flavor. That shouldn't be a problem; if we can agree
how to tag them, we can automate that. (We don't actually _have_
agreement about tagging them, but I can't believe that could end up
being a problem, once we settle on the rest.)

Next, we need a process, using open-source, cross-platform tools --
the standarder the better -- to convert that XML into, at a minimum,
plain text and HTML. Other formats are welcome but optional. That
process must work for _all_ teixlite files, not just ones that are
specially cooked, using constraints not specified within the chosen
DTD. Here's where we hit the rocks today. 

I give considerable credit to you, Marcello, and to Jeroen, as the
only people I know of who have come up with at least partial answers
and approaches to this. Maybe you have refined your processes, but the
last time I tried, I couldn't put Jeroen's files through your process,
and get the expected results. I think you have most of it down,
though. Is it close enough to try again?

I don't want to imply specific means from which this process is to be
constructed. Obviously XSLT is one possible approach, but I certainly
do not want to imply limitations on what that process should use. The
only things we must have -- both for our own internal practical
purposes and for the use of future readers -- are that it should work
reliably on _all_ texts that conform to the XML DTD chosen, be open
source, and be cross-platform. A reader needs to be able to tweak the
transform and re-run on her own desktop. 
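
[A minimal sketch of the kind of process described above, using only
the Python standard library so that it is open source and
cross-platform. The element-to-tag mapping and the sample fragment are
assumptions for illustration; this handles only a few TEI-Lite-style
elements and is nowhere near the "works on all teixlite files"
requirement.]

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed mapping from a few TEI-Lite-style elements to HTML tags.
HTML_TAGS = {"div": "div", "head": "h2", "p": "p", "hi": "em"}

def to_html(elem):
    """Recursively render an element as HTML using HTML_TAGS."""
    tag = HTML_TAGS.get(elem.tag, "span")  # unknown elements become <span>
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))
    return "<{0}>{1}</{0}>".format(tag, "".join(parts))

def to_text(elem):
    """Render <head> and <p> as plain-text blocks separated by blank lines."""
    blocks = []
    for block in elem.iter():
        if block.tag in ("head", "p"):
            # itertext() drops inline markup; split/join normalizes spaces.
            blocks.append(" ".join("".join(block.itertext()).split()))
    return "\n\n".join(blocks)

# Hypothetical sample input, not a real PG file.
SOURCE = "<div><head>Faust</head><p>A <hi>simple</hi> line.</p></div>"
root = ET.fromstring(SOURCE)
html_out = to_html(root)
text_out = to_text(root)
print(html_out)
print(text_out)
```

[Because both outputs come from one small script, a reader could change
HTML_TAGS and re-run it on her own desktop, which is the property asked
for above.]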

And just re-reading that last, when I say "must work reliably on ALL
texts" I do not mean to imply that the same XSLT must be used for all
texts, though obviously that would be of benefit, if we can manage it.

I've held just about every position on XML at one time or another,
and I'm all XMLed out. I no longer believe it is worth spending my
time on, until somebody (else!) solves the issues I've just laid out.

jim

From marcello at perathoner.de  Wed Oct 20 10:37:53 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 10:38:01 2004
Subject: XML won't eat your children (was Re: [gutvol-d]
	jeroen's	even-handed analysis)
In-Reply-To: <20041020160328.75110.qmail@web41704.mail.yahoo.com>
References: <20041020160328.75110.qmail@web41704.mail.yahoo.com>
Message-ID: <4176A271.2010105@perathoner.de>

Jonathan Ingram wrote:

> and one of the best ways to *fail* to
> change their mind is to plonk 1400 pages of documentation in front of them and
> say 'here's what you should be using, 

Then don't do that.

You don't plonk the IBM PC Technical Reference Manual (5000 pages) in 
front of your secretary if you want her to type a few pages in M$-Word. 
You just give her a "Word for Dummies" book and that is all she needs. 
She doesn't need to know about the difference between AGP and PCI-X bus.

The full TEI spec explains the DTD and what not. Nobody needs that 
except the implementors.

There are many gentle introductions to TEI-Lite floating around. And 
that's another advantage of using a standard. You don't have to write 
that stuff yourself.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Gutenberg9443 at aol.com  Wed Oct 20 10:46:53 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 10:47:08 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <9a.174b3315.2ea7fe8d@aol.com>

 
In a message dated 10/19/2004 3:36:41 PM Mountain Standard Time,  
jeroen@bohol.ph writes:

>Of course he does. How on Earth can he teach German or
>French, or expect his students to read a book in a language
>they are familiar with (in large parts of Africa, that
>would be French), without the proper umlauts and grave
>accents?
>
>Even worse, many African languages are written with the
>Latin alphabet, but using additional letters, such as an F
>with a curl, which, until very recently weren't supported by
>most computers or typewriters, and thus conveniently replaced
>by their nearest counterparts. You could have


Lead, follow, or get out of the way. Can you supply a way to do this? If so,
do it. If not, quit bellyaching. I have gotten a sufficient number of
letters and emails from Africans to be aware that in many African countries,
learning English is very desirable but is not done well.

I proofread for PGLAF a book in French which had been translated into
English but had maintained the French forms of a good many names, titles,
and other words. As my husband speaks French fluently, I had him check
everything I had done. It wound up being posted in two versions: one without
the French characters and one with the French characters. As I had worked
extremely hard to make sure the French characters were right, I felt sad
when I tried to read the version without the French characters. But all the
same, I'd rather that readers have that version than no version at all of
the book.

The principle of the greatest good for the greatest number doesn't mean
let's throw out the lesser numbers.

IF I AM WRITING IN ENGLISH OR READING IN ENGLISH I don't need the grave
accents and the umlauts UNLESS I AM DOING SCHOLARLY WORK. I cannot reasonably
express an opinion of how to do works in other languages because I don't
speak those languages. I do know that books in English posted in TXT are
readable to all English-speaking people, and that includes many people for
whom English is their second or third language.
 
Anne
From marcello at perathoner.de  Wed Oct 20 10:47:10 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 10:47:18 2004
Subject: [gutvol-d] Why Bowerbird is a genius
In-Reply-To: <ba.63165d94.2ea7eb7c@aol.com>
References: <ba.63165d94.2ea7eb7c@aol.com>
Message-ID: <4176A49E.5090707@perathoner.de>

Bowerbird@aol.com wrote:

> a new version will be out very soon, and
> you should wait to test that one instead...

Don't fear. We haven't done anything else since you first announced your 
reader on 14 Feb 2003, 20 months ago.


>>  P.S. Why are you hiding the etexts in your code
>>  instead of making them separate .txt's?
> 
> the first objective is to get the program solid.
> to focus on that, it's wise to use content that
> i _know_ has been correctly formatted in z.m.l.

Malicious tongues would argue that you want to keep beta-testers from 
using their own ZML texts and discovering what a useless piece of crap 
your software is.

If I want to read a new book I have to get a new reader?

Even Micro$oft never went that far in Digital Restriction Management.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From shalesller at writeme.com  Wed Oct 20 10:04:36 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 10:48:11 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
Message-ID: <20041020170436.532794BDAA@ws1-1.us4.outblaze.com>

Steve Thomas writes:
> As usual, people have missed the point of the original post 
> (Anne's) which was that we need to remember the *user* -- that 
> guy in Africa with only 2 hours of electricity each day.

I'm not spending as much time as I do with PG for him. I seriously
doubt that he's interested in Ossian in Germany or Selections
from Early Middle English. My target user is a scholar, whether
a kid in high school, or a college student or professor or other
person who may not have or may not be interested in waiting on
interlibrary loan. 

From Gutenberg9443 at aol.com  Wed Oct 20 10:48:08 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 10:49:40 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <13e.44a2533.2ea7fed8@aol.com>

 
In a message dated 10/19/2004 3:48:35 PM Mountain Standard Time,  
jeroen@bohol.ph writes:

I already have numerous benefits from working in XML, in that I can 
generate nice HTML files (that often need no touch-up at all) and 
reasonable plain ASCII for PG, but also have spell checking on a 
per-language basis, extract all fragments in a certain language, create 
tables of contents, etc. on the fly, extract Dublin Core bibliographic 
records, and more.


Good. Do it.
 
Anne
From Gutenberg9443 at aol.com  Wed Oct 20 10:53:52 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 10:54:02 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <199.31dbff61.2ea80030@aol.com>

 
In a message dated 10/19/2004 5:00:14 PM Mountain Standard Time,  
shalesller@writeme.com writes:

>  index items should be linked to the place in the text, 
> and a backlink should be made as well, if at all possible 

A backlink from where? And why? I think we should use links
only where they are explicit or at least loudly implicit in
the original work.


I do understand this one. If I can click the index number in the text to
take me to the index entry, I then want to click something on the index entry
that will take me back to the same place in the text. Ditto footnotes and
endnotes, which I now do by inserting them at the end of the appropriate
paragraph and double-indenting them.
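
[A minimal sketch of the link-and-backlink idea, written in Python that
emits HTML. The function name, the "ref-"/"note-" id scheme, and the
word "back" are assumptions for illustration, not an existing PG or DP
convention.]

```python
from html import escape

def linked_note(note_id, marker, note_text):
    """Return (in-text reference, note entry) HTML linked in both directions."""
    # The reference in the running text jumps forward to the note...
    ref = '<a id="ref-{0}" href="#note-{0}">[{1}]</a>'.format(note_id, marker)
    # ...and the note carries a link back to the exact place it was cited.
    note = '<p id="note-{0}">[{1}] {2} <a href="#ref-{0}">back</a></p>'.format(
        note_id, marker, escape(note_text))
    return ref, note

ref, note = linked_note("n1", 1, "See the index entry for Faust.")
print(ref)
print(note)
```

[The same pairing would work for index entries: each target gets an id,
and each side holds a link to the other side's id.]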
 
Anne
From Bowerbird at aol.com  Wed Oct 20 10:53:53 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 10:54:17 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <159.4250e0be.2ea80031@aol.com>

the hacker said:
>   And decreasing every day.

not nearly fast enough, though.        :+)

and given that today's machines will likely
serve their owners' needs -- i.e., do e-mail --
for the next decade, it's gonna be a long haul.


>   Users aren't using MSIE because it is the superior product

d'uh...        ;+)


>   they're using it because they have no idea there are 
>   significantly more secure, functional, compliant 
>   browser alternatives out there

there are a lot of things that users "don't know".

but unless you're willing to actually inform them
-- which can be a _tremendously_ difficult job --
then you must accept that if you want to be of service 
to them, then you have to work within their limitations.

project gutenberg wants to be of service to people;
that's why it has been the most successful e-library.

the alternative -- favored by most techies, it seems --
is to leave people behind.  that's fine if you want to be
a minority.  i have been a mac user for a very long time,
so that tells you where i stand on that matter personally.
but know that people aren't going to be writing you letters
thanking you for what you've done, like they do michael hart.


>   and because it came with their pee-cee, 
>   with a nice convenient icon right on their desktop.

monopolies suck, don't they...

feel free to fight the power,
but know that 85% of those 93%
won't be bothered to follow you...

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 10:55:47 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 10:55:57 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <1b8.41e17b6.2ea800a3@aol.com>

the hacker said:
>   I think you mean CSS.

yes, that is exactly what i meant.
how psychic of you to pick that up.      ;+)

-bowerbird
From marcello at perathoner.de  Wed Oct 20 10:57:02 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 10:57:10 2004
Subject: [gutvol-d] re: what i've been suggesting all along
In-Reply-To: <191.30f2ff61.2ea7f525@aol.com>
References: <191.30f2ff61.2ea7f525@aol.com>
Message-ID: <4176A6EE.8030707@perathoner.de>

Bowerbird@aol.com wrote:

>>  and then use perl to automatically add TEI markup.
> 
> bingo.  now do that to the whole library.
> that's what i've been suggesting all along.

You have been saying nothing of the kind.

You said all markup was wasted because ZML was "two steps better" than XML.

TEI is an XML application after all.


> and no, i _won't_ do it for you, just to "prove" that i can...

Until now you have only proven that you don't know what you are talking 
about.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From shalesller at writeme.com  Wed Oct 20 10:15:40 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 11:01:37 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041020171540.523E14BDA9@ws1-1.us4.outblaze.com>

Steve Thomas writes:

> That's not the point. People don't go to PG thinking, "hmmm, I 
> wonder if they have any XML files". They go looking for a book. 
> If you want the text of a particular book, you'll use it 
> whatever format it comes in, so long as you have the software to 
> handle that format. Nobody "needs" XML or PDF. They "need" the 
> words of the book. Formats are secondary. 

What if they have a page reference to the standard (or only) edition 
of the book? Then they "need" the page numbers. What if they have
a speech synthesizer smart enough to do multiple languages, but
they need the languages marked? Then they "need" language tagging.
What if they "need" to process a table? Then they "need" a system
that doesn't ASCII-format tables.

(And, BTW, a speech synthesizer that just skips accented letters
is just lame. Removing the accents could be done in one line of
Perl or a dozen lines of Fortran.)
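
[A minimal sketch of that accent-removal step, in Python rather than
Perl or Fortran. It assumes the input is Unicode text; the function
name is made up for illustration.]

```python
import unicodedata

def strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("déjà vu, Müller"))
```

[Letters with no canonical decomposition, such as the F with a curl
mentioned elsewhere in this thread, pass through unchanged, so this
covers accents and umlauts but not every extended Latin letter.]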
 
> Could it be better to put the PG effort into getting plain text 
> editions out, and leave it to others to do the extra conversion 
> to XML etc.? This is a model that has worked really very well 
> for quite a few years, without complaint from any but a few 
> tech-enthusiasts. 

No, it doesn't work real well. The value of XML is in what it
includes that plain text doesn't, and a lot of that is lost in
the plain text version. You need the original book to fix that. 
Even with the original book, it can be a pain, whereas it's trivial 
to keep page numbers (for example) in the original processing.


From Gutenberg9443 at aol.com  Wed Oct 20 11:03:25 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 11:03:46 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <a9.647db767.2ea8026d@aol.com>

 
In a message dated 10/20/2004 12:54:25 AM Mountain Standard Time,  
tb@baechler.net writes:

I would like to see PG eventually go to xml not because I particularly like 
the format but because the new DAISY standard for digital talking books for 
the blind uses a form of xml. 


Thank you for this input.
 
Anne
From shalesller at writeme.com  Wed Oct 20 10:21:03 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 11:06:40 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020172103.2DE244BDA9@ws1-1.us4.outblaze.com>

"David A. Desrosiers" writes:

> > a big shame, since i.e. still has -- what -- 93% of all surfers? 
> 
> And decreasing every day. 
> 
> Users aren't using MSIE because it is the superior product, 
> they're using it because they have no idea there are significantly more 
> secure, functional, compliant browser alternatives out there, and 
> because it came with their pee-cee, with a nice convenient icon right 
> on their desktop. 

And nothing's going to fix that in the foreseeable future. And what about 
us who surf the net through the services of a library, and have no option 
on which browser to use?

I really think you all are dismissing IE and Lynx too quickly. We can't
just support Mozilla, now or in the future.

From shalesller at writeme.com  Wed Oct 20 10:26:09 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 11:10:53 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041020172609.C9E684BDAB@ws1-1.us4.outblaze.com>

Gutenberg9443@aol.com writes:
> So I'm talking about what is necessary
> IN ENGLISH.

Then let's not act like we're doing this for third-world countries,
since many of them don't speak English. If we were doing this
for third-world countries, we should be doing a lot more Spanish
and French and Arabic and a bunch of other languages that we generally
totally ignore.

From Bowerbird at aol.com  Wed Oct 20 11:12:13 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 11:12:29 2004
Subject: [gutvol-d] re: at a minimum, plain text and HTML
Message-ID: <1e0.2cd9670c.2ea8047d@aol.com>

jim said:
>   I no longer believe it is worth spending my time on, 
>   until somebody (else!) solves the issues I've just laid out.

thanks to jim for saying what needs to be said.

now one specific point about one of the paragraphs in his post...

>   we need a process, using open-source, cross-platform tools
>   -- the standarder the better -- to convert that XML 
>   into, at a minimum, plain text and HTML. 
>   Other formats are welcome but optional.

if you're willing to settle for plain-text and .html,
then doing the files in plain-text and refining your
text2html converter is _far_ more cost-effective.

if -- sometime down the line -- the move to x.m.l.
really is inevitable (and not just hyped to be that),
you will find that your text2html converter can be
improved so that it will convert to x.m.l., and you
will have saved yourself an enormous tagging job...

-bowerbird
From hacker at gnu-designs.com  Wed Oct 20 11:14:36 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 11:15:05 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <159.4250e0be.2ea80031@aol.com>
References: <159.4250e0be.2ea80031@aol.com>
Message-ID: <Pine.LNX.4.61.0410201411500.12059@aphrodite.gnu-designs.com>


> there are a lot of things that users "don't know".

> but unless you're willing to actually inform them -- which can be a 
> _tremendously_ difficult job -- then you must accept that if you 
> want to be of service to them, then you have to work within their 
> limitations.

 	Speak for yourself, but I've successfully gotten 14 local 
businesses to switch completely over to Firefox on Windows for their 
primary browser, in the last 60 days. This includes every workstation 
running a browser with Internet connectivity in all 14 businesses.

 	I'm doing my part. Are you?

>>   and because it came with their pee-cee,
>>   with a nice convenient icon right on their desktop.

> monopolies suck, don't they...

 	Actually, they don't even cross my radar, ever.

> feel free to fight the power, but know that 85% of those 93% won't 
> be bothered to follow you...

 	Nor do I care. I only care for the ones who are willing to 
make their lives, and the lives of others better. For the users who 
refuse to learn, to adapt, and to grow, they can stagnate and stay in 
their own nice warm puddle.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Gutenberg9443 at aol.com  Wed Oct 20 11:19:47 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 11:19:59 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <15a.4173d96b.2ea80643@aol.com>

 
In a message dated 10/20/2004 4:41:26 AM Mountain Standard Time,  
marcello@perathoner.de writes:

Well  same limitations for PDF. It 
hasn't stopped people from buying paper  books.


They've da** well stopped ME from buying
paper  books. I CAN'T READ THE BLOODY
THINGS! I have about 800 paperback books
right beside my bed that I CANNOT READ IN
BED and most of them are in no other format.
I read them in the living room, using a magnifying
glass when necessary. Most of my hardcover
books I can still read in the living room, but I
can't read them in bed either. I'm to the
point that I would far rather read a hundred-
year-old book on screen than a brand new one
on paper, even if it's a topic in which I am
extremely interested.
 
I'm really not interested in conversion
from XTM or XTL or whatever it is, if
you're expecting the reader to do the
conversion. Back to my third-world
schoolmaster with his donated 486
and a slow CD reader--if we send
him a CD of PG books in English he
can read them and he can use them
to teach his students English, which will
greatly improve their chances of finding
decent work when they are adults.
But he can do this only if the books are
in TXT format.
 
Please. I am not trying to start a flame war.
I detest flame wars. I am simply returning,
again and again, to Michael Hart's original
vision. No matter what ELSE we do to the
texts, we are betraying what makes PG
special if we require everybody to have this
program or that program which probably won't
run on most obsolete or obsolescent computers.
 
All this other stuff sounds grand. I wish I
could understand it. But I can't. Neither
can 99.9999999% of the other people who
use PD.
 
Anne
From Gutenberg9443 at aol.com  Wed Oct 20 11:28:13 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 11:28:30 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <29.642cce53.2ea8083d@aol.com>

 
In a message dated 10/20/2004 9:48:33 AM Mountain Standard Time,  
joshua@hutchinson.net writes:

The  original page breaks were necessitated by the size of paper the 
publisher  used.  There is almost never a functional meaning to the page breaks in a  
book (except things like chapter breaks, which are easily marked up with  
horizontal rules or something to that effect).  


Speaking as a writer, I strongly disagree. I often use page breaks as a  
transition, and most other fiction writers do the same thing. (That's where I  
learned it.) To keep page breaks in a TXT version, simply insert # # # at the  
left margin where the page break belongs. That way TXT isn't confused or  
confusing, and the reader can see that as a page break.
 
Anne
 
From hacker at gnu-designs.com  Wed Oct 20 11:31:15 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 11:32:06 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020172103.2DE244BDA9@ws1-1.us4.outblaze.com>
References: <20041020172103.2DE244BDA9@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410201423210.12059@aphrodite.gnu-designs.com>


> I really think y'all are dismissing IE and Lynx too quickly. We 
> can't just support Mozilla, now or in the future.

 	There is this myth, and you just confirmed it again, supported 
by 100% of the people who hear that supporting MSIE is not a wise 
decision, that for some reason, that is interpreted as "won't" support 
MSIE.

 	I see this in web development circles all the time, when I 
explain that I develop against the standards, and in Mozilla, and I 
test in 13 browsers, including MSIE. I _always_ get people who come 
back with "Why don't you support MSIE?". Apparently logic and clear 
thought aren't among their better traits. For some reason, the notion 
that I develop in Mozilla, using standards, somehow means I am not 
making code that would work in MSIE. Nothing could be farther from the 
truth.

 	Just because I support Mozilla, does not mean I do NOT support 
MSIE. That being said, if my code works in 13 browsers, and fails in 
MSIE, my code is not the problem. I do, however, refuse to add "hacks" 
to get MSIE to do what it should be doing anyway... following the 
standards.

 	If the code works in MSIE, and breaks in Mozilla, MSIE is the
 	problem.

 	If the code works in Mozilla, and breaks in MSIE, MSIE is the
 	problem.

 	Which brings me to a great quote I found which is related to 
this exact issue of Microsoft intentionally ignoring the published 
standards:

         "Microsoft properly asserts that OpenOffice is not 100%
          compatible with their product. Microsoft, however, has
          apparently decided not to support the OpenOffice formats
          either, for which they have no excuse: the standards for
          OpenOffice documents are publicly available, whereas
          Microsoft makes it a habit to sue people for reverse
          engineering their own formats."

 	Anyway, I don't want to turn this into a browser religious 
war.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Bowerbird at aol.com  Wed Oct 20 11:31:55 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 11:32:40 2004
Subject: [gutvol-d] Why Bowerbird is a genius
Message-ID: <12c.4e9a7208.2ea8091b@aol.com>

marcello said:
>   Malicious tongues would argue that 

malicious tongues would, would they?
but _your_tongue_ wouldn't, would it?


>   Malicious tongues would argue that 
>   you want to keep beta-testers from 
>   using their own ZML texts 

once they have helped me locate all the bugs in the program,
i'll actively _want_ beta-testers to run their own z.m.l. texts.

until then, though, i don't want somebody reporting a "bug"
in the program that is _actually_ due to their improper z.m.l.

you have to do things in the correct order, that's all.

people also have to keep in mind that the rules of z.m.l. are
also "in-progress" at the same time, being continually refined,
as conditions require, so there are many open parameters here.
use of a constrained set of texts is the logical course of action.


>   and discovering what a useless piece of crap your software is.

well, much better to learn that early from the beta copy
than having to wait all that time for the release version,
don't you think?        :+)

-bowerbird

p.s.  one definition:  "condone... 2. to give tacit approval to:
_by_his_silence,_he_seemed_to_condone_their_behavior_."
-- page 278, random house webster's college dictionary...
From joshua at hutchinson.net  Wed Oct 20 11:41:00 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 11:41:09 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020184101.0F2849E980@ws6-2.us4.outblaze.com>

I can hate IE and still support it.  Honest!  

What I do, personally, is create a minimum functionality that will work in all browsers.  Then, if I see a need that goes beyond that, and only some of the browsers support it, fine.  As long as it doesn't degrade the minimum functionality in all browsers, I'm willing to add it.

That is what the page-number markup currently does.  It hides the page numbers as the minimum, default behavior, but if you have a browser that supports it, you can see those page numbers appear.  Similarly with poetry.  It has features that let the browser rewrap a long line nicely if the necessary CSS support is there ... but if not, it still displays the poem with its normal indents; it just doesn't rewrap nicely for you.

If I try something and it dies on one of the browsers, I take it back out or find another compatible way.

In my case, at the very least, if you find something I've worked on that does NOT degrade gracefully in Lynx, etc....  Let me know.  I consider that a bug in my work.

Josh

----- Original Message -----
From: "D. Starner" <shalesller@writeme.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: re: Re: [gutvol-d] Re: aspects of  a well-done e-book
Date: Wed, 20 Oct 2004 09:21:03 -0800

> 
> "David A. Desrosiers" writes:
> 
> > > a big shame, since i.e. still has -- what -- 93% of all surfers? 
> > 
> > And decreasing every day. 
> > 
> > Users aren't using MSIE because it is the superior product, 
> > they're using it because they have no idea there are significantly more 
> > secure, functional, compliant browser alternatives out there, and 
> > because it came with their pee-cee, with a nice convenient icon right 
> > on their desktop. 
> 
> And nothing's going to fix that in the foreseeable future. And what about 
> us who surf the net through the services of a library, and have no option 
> on which browser to use?
> 
> I really think y'all are dismissing IE and Lynx too quickly. We can't
> just support Mozilla, now or in the future.

From marcello at perathoner.de  Wed Oct 20 11:43:14 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 11:43:23 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <190.3192affa.2ea7ebd5@aol.com>
References: <190.3192affa.2ea7ebd5@aol.com>
Message-ID: <4176B1C2.3010007@perathoner.de>

Bowerbird@aol.com wrote:

> "we make all your base."

You should definitely learn to get your quotes right.

   all your base are belong to us

http://www.catb.org/%7Eesr/jargon/html/A/all-your-base-are-belong-to-us.html


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Oct 20 11:46:04 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 11:46:14 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041020184605.DB96B2F95F@ws6-3.us4.outblaze.com>

Let me assure you.  NONE of us (not even bowerbird) is expecting Joe Sixpack to have the conversion tools loaded on his computer.

Whether the eventual process will have the whitewashers creating the TXT and HTML files from XML or the server building them on the fly, no one wants that burden to fall on the reader.  There will always be plain text files available.  Even the most hardened XML-phile among us isn't going to take that away.  (Even if they refuse to ever actually read a text in plain-ascii format!  ;) )

Josh

----- Original Message -----
From: Gutenberg9443@aol.com

Please. I am not trying to start a flame war.
I detest flame wars. I am simply returning,
again and again, to Michael Hart's original
vision. No matter what ELSE we do to the
texts, we are betraying what makes PG
special if we require everybody to have this
program or that program which probably won't
run on most obsolete or obsolescent computers.
From Bowerbird at aol.com  Wed Oct 20 11:46:45 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 11:46:59 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <55.645b1253.2ea80c95@aol.com>

the hacker said:
>   Nor do I care.  I only care for the ones who are 
>   willing to make their lives, and the lives of others better. 
>   For the users who refuse to learn, to adapt, and to grow,
>   they can stagnate and stay in their own nice warm puddle.

fine.  i have no problem with that.  none whatsoever.
take whatever attitude you want to, that's what i do.

but michael hart's attitude -- which is the one that has
made project gutenberg the most successful e-library
in all cyberspace -- is to serve the trailing-edge user...

and now _you_ -- like so many other e-book initiatives before,
and i specifically include myself in that group -- find yourself
coming here to use michael's e-text files for your own purpose.

kind of ironic, isn't it?

-bowerbird
From joshua at hutchinson.net  Wed Oct 20 11:49:52 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 11:50:00 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020184952.01B2AEDC4F@ws6-1.us4.outblaze.com>

Are you sure we're talking about the same thing?  By page break, I mean when you get to the bottom of the physical piece of paper.  What you describe sounds more to me like what I usually refer to as a thought break ... a little white space or a graphic symbol between sections of text to indicate a scene transition or time passing, etc.  Typically, in PG texts, those are marked with 5 asterisks (functionally equivalent to your # # #).
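
As a small illustration of how such thought-break markers could be carried from TXT into HTML, here is a sketch in Python. The pattern, the function name mark_thought_breaks, and the choice of a horizontal rule are all assumptions for the example, not an existing PG convention checker.

```python
import html
import re

# Assumption: a thought break is a line holding only five asterisks or
# three hash marks, with optional spaces around and between them.
BREAK = re.compile(r"^\s*(?:\*\s*){5}$|^\s*(?:#\s*){3}$")


def mark_thought_breaks(lines):
    """Replace thought-break lines with <hr />, escaping every other line."""
    out = []
    for line in lines:
        if BREAK.match(line):
            out.append("<hr />")
        else:
            out.append(html.escape(line))
    return out


if __name__ == "__main__":
    for converted in mark_thought_breaks(["End of scene.", "* * * * *", "Next scene."]):
        print(converted)
```

Physical page breaks would need a different marker, since this sketch treats both notations as the same kind of break.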

Josh

----- Original Message -----
From: Gutenberg9443@aol.com
 
Speaking as a writer, I strongly disagree. I often use page breaks as a transition, and most other fiction writers do the same thing. (That's where I learned it.) To keep page breaks in a TXT version, simply insert # # # at the left margin where the page break belongs. That way TXT isn't confused or confusing, and the reader can see that as a page break.

Anne


From Bowerbird at aol.com  Wed Oct 20 11:56:32 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 11:56:48 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <197.31921c87.2ea80ee0@aol.com>

marcello said:
>   You should definitely learn to get your quotes right.
>   all your base are belong to us

except _that_ doesn't correspond to what ingram said...

which is precisely why i reworked the phrase.

and whether you rework gibberish "correctly"
or not seems to be rather beside the point, not?

at any rate, have a nice day, marcello.       :+)

-bowerbird
From hacker at gnu-designs.com  Wed Oct 20 11:57:02 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 11:58:07 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <55.645b1253.2ea80c95@aol.com>
References: <55.645b1253.2ea80c95@aol.com>
Message-ID: <Pine.LNX.4.61.0410201455280.12672@aphrodite.gnu-designs.com>


>>   Nor do I care.  I only care for the ones who are
>>   willing to make their lives, and the lives of others better.
>>   For the users who refuse to learn, to adapt, and to grow,
>>   they can stagnate and stay in their own nice warm puddle.

> and now _you_ -- like so many other e-book initiatives before, and i 
> specifically include myself in that group -- find yourself coming 
> here to use michael's e-text files for your own purpose.

> kind of ironic, isn't it?

 	..only in the sense that you've taken my words completely out 
of context, and twisted them to suit a discussion that wasn't even 
part of the original reply.

 	I'm rapidly tiring of this, and it's a waste of my time, as 
well as the time of others. If we're not moving forward, we're not 
moving, and that is never a wise thing to continue to expend effort 
upon.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Bowerbird at aol.com  Wed Oct 20 12:01:07 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 12:01:26 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <1da.2e1f99d3.2ea80ff3@aol.com>

starner said:
>   The value of XML is in what it includes that plain text doesn't, 
>   and a lot of that is lost in the plain text version.

lost?  or just not currently included?  or even deliberately thrown out?

consider carefully what your language implies, it might constrain you...

meanwhile, i will point out once again that nobody has challenged my
contention that i can represent all important book features using z.m.l.
plain text, when formatted wisely, is far more powerful than you know.

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 12:09:26 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 12:09:49 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <1f2.13d38da.2ea811e6@aol.com>

joshua said:
>   NONE of us (not even bowerbird) is expecting Joe Sixpack 
>   to have the conversion tools loaded on his computer.

i expect that joe sixpack won't _need_ a "conversion tool"
because my viewer will give him more e-book functionality
from z.m.l. plain-text than any other format/viewer gives him.

and if it happens he _does_ need a conversion for a good reason,
then i'll build that conversion routine into my program for him...

so i _do_ expect that he'll have a conversion tool on his computer,
but i _also_ expect that he'll never find any good reason to use it...

(in the long run, anyway.  but until my viewer is ported to the pda,
and to web-servers, an .html converter will probably be necessary.)

-bowerbird
From joshua at hutchinson.net  Wed Oct 20 12:17:37 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 12:17:50 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041020191737.29D554F462@ws6-5.us4.outblaze.com>


----- Original Message -----
From: Bowerbird@aol.com
> meanwhile, i will point out once again that nobody has challenged my
> contention that i can represent all important book features using z.m.l.
> plain text, when formatted wisely, is far more powerful than you know.


Repeat after me...

ZML ... IS ... NOT ... PLAIN ... TEXT!
From Bowerbird at aol.com  Wed Oct 20 12:18:39 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 12:18:56 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <1e2.2ca102ae.2ea8140f@aol.com>

the hacker said:
>   only in the sense that you've taken my words completely out 
>   of context, and twisted them to suit a discussion that wasn't even 
>   part of the original reply.

you said -- as clearly as it can be said -- that
your attitude is to leave the trailing edge behind.

i only said that's not what michael's attitude is.

and it is a historical fact that it has been michael
-- and _not_ all the people with your attitude --
that has created the best library in cyberspace so far.

if you don't give him -- and his attitude -- that credit,
you will (like so many others) fail to learn from history.


>   I'm rapidly tiring of this, and its a waste of my time, 
>   as well as the time of others. 

you're a quick study, david.
it took me 6 months to figure that out.
you've gotten the message in one week.         :+)


>   If we're not moving forward, we're not moving

this place hasn't been able to move forward on x.m.l. in 3 years.
if you read the archives, you'll see all these threads are re-runs.


>   and that is never a wise thing to continue to expend effort upon.

righto.  that's why i'm leaving shortly.
i just came back for one last hurrah,
to set the record straight one more time
that "i told you so, and did it repeatedly..."

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 12:25:44 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 12:25:58 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <1dc.2e9d3a95.2ea815b8@aol.com>

josh said:
>   ZML ... IS ... NOT ... PLAIN ... TEXT!

of course it is.

x.m.l. and h.t.m.l. and s.g.m.l. are too.

it all reduces down to 1s and 0s.
it's just that some formats are
more or less readable in that state.

-bowerbird
From marcello at perathoner.de  Wed Oct 20 12:35:11 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 12:35:20 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020173528.GB3366@panix.com>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de>
	<20041020173528.GB3366@panix.com>
Message-ID: <4176BDEF.7050008@perathoner.de>

Jim Tinsley wrote:

> Nobody has an objection to valid TEI texts, but valid TEI texts alone
> _are not enough_. An XML file that cannot be read (by an actual human)
> is as useful as a lock with no key.

Not so. Having a TEI text posted would enable third-party developers to 
come up with their own converter solutions even if we didn't get very far 
with ours. There are a lot of people around who already convert the text 
files into other formats. Their jobs would get much easier.
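
To give an idea of how little a third-party converter needs to get started, here is a rough sketch in Python (standard library only) that flattens a TEI-like fragment to plain text. The sample fragment, the function name tei_to_text and the "headings in capitals" rule are assumptions for illustration; this is not the PGTEI conversion chain, and it checks well-formedness only, not validity against a DTD.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical TEI-like sample, not taken from any posted file.
SAMPLE = """<text><body><div>
<head>Chapter I</head>
<p>Alice was beginning to get <hi>very</hi> tired.</p>
</div></body></text>"""


def tei_to_text(xml_source: str, width: int = 72) -> str:
    """Render <head> and <p> elements as wrapped plain-text blocks."""
    root = ET.fromstring(xml_source)  # raises ParseError if not well-formed
    blocks = []
    for elem in root.iter():
        if elem.tag in ("head", "p"):
            # itertext() flattens inline markup such as <hi>.
            content = " ".join("".join(elem.itertext()).split())
            if elem.tag == "head":
                content = content.upper()
            blocks.append(textwrap.fill(content, width=width))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(tei_to_text(SAMPLE))
```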


> I really no longer give any headroom at all to the approach "Post XML
> Now Because That Is The One True Way And We'll Figure Out How To Read
> It Later." If for no other reason, then because the most important
> part of the WW job is to check the texts before posting, and if we
> can't read it, we can't find the errors, and if we can't find the
> errors, we can't fix 'em.

A TEI text is basically a text file. So you can read it in any editor. 
If you use emacs you can also validate the TEI file against the DTD 
without leaving the editor.

A perfectly valid TEI file with no spelling errors should be good enough 
to post.

What you expect from us TEI developers is that we produce the 150% 
perfect solution before you even consider starting to post files. That 
is not the way software development works.

And this attitude is in my opinion the main cause why we have gotten 
nowhere with TEI in the last 3 years.

Let's start now with a version 0.0.1 of the TEI process. Of course at 
some later time we'll have to do all the posted files over again. 
Probably more than once. But it's better than sitting here and playing 
with bowerbird because we are bored.


> Next, we need a process, using open-source, cross-platform tools --
> the standarder the better -- to convert that XML into, at a minimum,
> plain text and HTML. Other formats are welcome but optional. That
> process must work for _all_ teixlite files, not just ones that are
> specially cooked, using constraints not specified within the chosen
> DTD. Here's where we hit the rocks today. 

TEI defines a standard way to extend the DTD. I used this standard way 
to extend the TEI DTD into what I called PGTEI. This still is a 
perfectly valid TEI DTD according to the TEI specs.


> I don't want to imply specific means from which this process is to be
> constructed. Obviously XSLT is one possible approach, but I certainly
> do not want to imply limitations on what that process should use. The
> only things we must have -- both for our own internal practical
> purposes and for the use of future readers -- is that it should work
> reliably on _all_ texts that conform to the XML DTD chosen, be open
> source, and be cross-platform. A reader needs to be able to tweak the
> transform and re-run on her own desktop. 

You misunderstand what a DTD is. It just gives you syntactical 
correctness. I can cook up a perfectly valid XHTML file which is 
semantically bogus:

   <div><h6>1</h6>
     <div><h5>1.1</h5>
        <div><h4>1.1.1</h4>
          ...
        </div>
     </div>
   </div>

This is valid HTML (didn't bother to check) but will render not so well.

You cannot build a conversion tool that will produce good results on all 
syntactically valid TEI files, like you cannot build a browser that will 
make sense out of semantically bogus HTML files.
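
To make the difference concrete, here is a rough sketch in Python (standard library only) of a check that a DTD cannot express: it flags headings that become more prominent as the div nesting gets deeper. The name heading_problems and the rule itself are assumptions for illustration, and the walk assumes each heading comes before the nested divs it governs.

```python
import xml.etree.ElementTree as ET

HEADING_LEVELS = {"h%d" % n: n for n in range(1, 7)}

# The inverted nesting from the example above, written on one line.
BOGUS = "<div><h6>1</h6><div><h5>1.1</h5><div><h4>1.1.1</h4></div></div></div>"


def heading_problems(xml_source):
    """List headings that are not deeper than the section enclosing them."""
    problems = []

    def walk(div, outer_level):
        level = outer_level
        for child in div:
            if child.tag in HEADING_LEVELS:
                level = HEADING_LEVELS[child.tag]
                if level <= outer_level:
                    problems.append(
                        "%s '%s' sits inside a section headed at level %d"
                        % (child.tag, child.text, outer_level)
                    )
            elif child.tag == "div":
                walk(child, level)

    walk(ET.fromstring(xml_source), 0)
    return problems


if __name__ == "__main__":
    for problem in heading_problems(BOGUS):
        print(problem)
```

A converter would need many such rules, one per convention it relies on, which is why syntactic validity alone does not guarantee a good rendering.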

Furthermore TEI is geared towards marking up existent texts, so scholars 
can study the text without having to get the physical book. It is not so 
good as a master format for print processing. That's why I had to add 
some more tags and attributes to my DTD. (Which doesn't make any text 
that uses my DTD less standard, because TEI is expressly designed to be 
extensible. But I'm repeating myself.)

> And just re-reading that last, when I say "must work reliably on ALL
> texts" I do not mean to imply that the same XSLT must be used for all
> texts, though obviously that would be of benefit, if we can manage it.

So why not start posting texts marked up in PGTEI, which will by 
definition work well in my conversion chain?

And at the same time start posting Jeroen's texts, which will convert 
fine in his chain?

This way we could both start putting up an automatic online conversion 
chain. (The guy who did this already in Java has somehow vanished, so I 
think we have to start over again.)

For the start I will act as interim Post-Processor for people wanting to 
post PGTEI and pass on to you only the perfectly good ones. You'll just 
have to stick in the etext number where I put 5 asterisks.

I claim the .pgtei file extension, Jeroen can claim what extension he 
sees fit for his files. So we can have both an alice30.pgtei and an 
alice30.jtei.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Wed Oct 20 12:47:51 2004
From: jon at noring.name (Jon Noring)
Date: Wed Oct 20 12:48:21 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <4176BDEF.7050008@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com>
	<4176BDEF.7050008@perathoner.de>
Message-ID: <103540655640.20041020134751@noring.name>

Marcello wrote:

> TEI defines a standard way to extend the DTD. I used this standard way 
> to extend the TEI DTD into what I called PGTEI. This still is a 
> perfectly valid TEI DTD according to the TEI specs.

I probably missed it from one of your prior messages, but do you have
your PGTEI documented anywhere? Have you put together an actual
Schema/DTD which can be used to validate documents for validity to
PGTEI? And a list of your custom vocabulary extensions?

Also, another question to ask is if it is documented anywhere how
Jeroen's version of TEI compares with your PGTEI?

Thanks!

Jon

From marcello at perathoner.de  Wed Oct 20 12:52:26 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 12:52:39 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <1e2.2ca102ae.2ea8140f@aol.com>
References: <1e2.2ca102ae.2ea8140f@aol.com>
Message-ID: <4176C1FA.6080706@perathoner.de>

Bowerbird@aol.com wrote:

> righto.  that's why i'm leaving shortly.

Promises. Promises.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Wed Oct 20 12:55:49 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 20 12:55:59 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <103540655640.20041020134751@noring.name>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de>
	<20041020173528.GB3366@panix.com>	<4176BDEF.7050008@perathoner.de>
	<103540655640.20041020134751@noring.name>
Message-ID: <4176C2C5.6070801@perathoner.de>

Jon Noring wrote:

> I probably missed it from one of your prior messages, but do you have
> your PGTEI documented anywhere? Have you put together an actual
> Schema/DTD which can be used to validate documents for validity to
> PGTEI? And a list of your custom vocabulary extensions?

Start here:

   http://www.gutenberg.org/tei/

> Also, another question to ask is if it is documented anywhere how
> Jeroen's version of TEI compares with your PGTEI?

No.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Wed Oct 20 12:56:29 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 12:56:45 2004
Subject: [gutvol-d] sudden surge in demand
Message-ID: <8b.17ea1fb8.2ea81ced@aol.com>

due to the sudden surge in demand, i've uploaded
a new version of my beta-test viewer-program...

it can be found in the files section of the 
yahoogroups listserve for the beta-test,
which is under the name of "zml_talk".

you can join that beta-test by subscribing via
e-mail:  zml_talk-subscribe@yahoogroups.com

this upload is just "the daily build", and has _not_
been reviewed, so it should be considered as such...

people who want to run a version _known_ to be stable
should hold off on this.  such a version will go up soon.

additionally, no reports are necessary on this version,
since a few of the features have yet to be implemented.

but if you'd like to see the development of the program
since the version uploaded on 7/27, it's there for you...

sorry it's taken so long, but you know how things are.
time flies when you're squashing bugs...

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 13:15:18 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 13:15:44 2004
Subject: [gutvol-d] Posting TEI
Message-ID: <5b.5b84bd22.2ea82156@aol.com>

marcello said:
>   A TEI text is basically a text file. 

right.  if you could tell josh this, i'd appreciate it.


>   So you can read it in any editor. 

"read" might not be the best word-choice there.
"view" is more appropriate, i would think, since
the markup will often obscure the content itself.


>   If you use emacs you can also 
>   validate the TEI file against the DTD 
>   without leaving the editor.

that's a nice thing to be able to do.

but the .tei might be _valid_ and
still not do what it's supposed to do,
accurately reflect the underlying structure
and therefore result in a correct rendering...

because i know you're all curious,
with the z.m.l. authoring-tool,
people will be able to see exactly
what the viewer-program will display,
in a window alongside their edit window.

if it doesn't look right in the display window,
you make changes in the edit window until it does.
the rules are so simple that this is very easy to do.


>   A perfectly valid TEI file with no spelling errors 
>   should be good enough to post.

only if it's marked up _correctly_, though, right?


>   So we can have both an alice30.pgtei and an alice30.jtei.

that sounds like loads of fun...  twice as much, at least!

-bowerbird
From hacker at gnu-designs.com  Wed Oct 20 13:27:40 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 13:29:08 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <1e2.2ca102ae.2ea8140f@aol.com>
References: <1e2.2ca102ae.2ea8140f@aol.com>
Message-ID: <Pine.LNX.4.61.0410201625120.3746@angst.gnu-designs.com>


> you said -- as clearly as it can be said -- that your attitude is to 
> leave the trailing edge behind.

 	I said no such thing, and now I know why you seem to have such a 
loyal following of people "supporting" your efforts here.

 	Try spending a little more time learning why people hold the 
opinions and convictions they have about this project, and a little less 
time rewording what they've said to suit your next argument to counter 
them with. I think you'll find a great deal more exists if you tempt 
people with wine than vinegar.

> this place hasn't been able to move forward on x.m.l. in 3 years. if you 
> read the archives, you'll see all these threads are re-runs.

 	I believe the term you meant to use there was XML, not x.m.l.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From joshua at hutchinson.net  Wed Oct 20 13:29:30 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 13:29:41 2004
Subject: [gutvol-d] Posting TEI
Message-ID: <20041020202930.5295E2F8F4@ws6-3.us4.outblaze.com>


----- Original Message -----
From: Bowerbird@aol.com
> 
> marcello said:
> >   A TEI text is basically a text file. 
> 
> right.  if you could tell josh this, i'd appreciate it.
> 

Text file and PLAIN text file are two different things.  XML/HTML/XHTML/TEI/TEI-Lite/ZML ... those are all text files.  None of them are PLAIN text files, which is what you always seem to advocate.

ZML (Zero Markup Language) is ... wait for it ... a MARKUP LANGUAGE.  

That means it is just like XML/HTML, etc.  Except that the rest are open standards with open source utilities available.  ZML is you just wasting our time.

Josh
From joshua at hutchinson.net  Wed Oct 20 13:34:14 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 13:34:23 2004
Subject: [gutvol-d] Posting TEI
Message-ID: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> Jim Tinsley wrote:
> 
> > Nobody has an objection to valid TEI texts, but valid TEI texts alone
> > _are not enough_. An XML file that cannot be read (by an actual human)
> > is as useful as a lock with no key.
> 
> Not so. Having a TEI text posted would enable third-party developers to 
> come up with their own converter solutions even if we didn't get very far 
> with ours. There are a lot of people around who already convert the text 
> files into other formats. Their jobs would get much easier.


I'm hoping Jim (or someone else) can clear up something for me.  If I create a TEI document, use it to create a regular 8-bit ASCII file and valid HTML file, then submit all three to the whitewashers ... will they post all three (assuming the ASCII file clears GutCheck and the HTML clears the W3C validator)?

If not, why not?

If yes, why can't this be the "incremental" development that Marcello alluded to?

Please, this is not attacking anybody's stance.  I'm really just trying to understand the positions/policies here.

Josh
From jtinsley at pobox.com  Wed Oct 20 13:59:34 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Wed Oct 20 14:00:58 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <4176BDEF.7050008@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com>
	<4176BDEF.7050008@perathoner.de>
Message-ID: <20041020205934.GA22445@panix.com>

On Wed, Oct 20, 2004 at 09:35:11PM +0200, Marcello Perathoner wrote:
>Jim Tinsley wrote:
>
>>Nobody has an objection to valid TEI texts, but valid TEI texts alone
>>_are not enough_. An XML file that cannot be read (by an actual human)
>>is as useful as a lock with no key.
>
>Not so. Having a TEI text posted would enable third-party developers to 
>come up with their own converter solutions even if we didn't get very far 
>with ours. There are a lot of people around who already convert the text 
>files into other formats. Their jobs would get much easier.
>

I really do not mean to be disrespectful when I -- speaking for myself --
say that I'm not interested in spending my time making developers' jobs
easier. That's not what I'm here for. We have text, and HTML, both
proven and well-supported formats that we know how to work with and for
which we know there is a demand. I'll stick to those until we can see
a way clear through to making successful XML.


>>I really no longer give any headroom at all to the approach "Post XML
>>Now Because That Is The One True Way And We'll Figure Out How To Read
>>It Later." If for no other reason, then because the most important
>>part of the WW job is to check the texts before posting, and if we
>>can't read it, we can't find the errors, and if we can't find the
>>errors, we can't fix 'em.
>
>A TEI text is basically a text file. So you can read it in any editor. 
>If you use emacs you can also validate the TEI file against the DTD 
>without leaving the editor.
>
>A perfectly valid TEI file with no spelling errors should be good enough 
>to post.

Correct spelling is necessary but not sufficient. I don't know about
other people, but I most commonly find errors by skimming the text.
I can't do that with XML. Also, the validity of the XML gives me
no comfort at all that, say, paragraphs are sensibly separated. I can
do that with text or HTML to a high degree of accuracy, because I can
read them naturally in a viewer program. There are many such types of
problems that I can detect by eye quite quickly -- provided I am seeing
the text laid out in a natural way.
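
To make the distinction concrete, here is a minimal Python sketch of a
check that goes beyond validity. It assumes paragraphs are marked with
<p> elements, as in TEI Lite; the function name, the rules, and the
400-word threshold are illustrative assumptions, not anything PG uses.
It only flags candidates for a human to look at.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def suspicious_paragraphs(xml_text, max_words=400):
    """Return (index, reason) pairs for <p> elements that look wrongly split.

    The rules and the threshold are illustrative assumptions only.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    findings = []
    index = 0
    for elem in root.iter():
        if not isinstance(elem.tag, str) or local(elem.tag) != "p":
            continue
        index += 1
        # Collapse all whitespace so line wrapping in the source is ignored.
        text = " ".join("".join(elem.itertext()).split())
        if not text:
            findings.append((index, "empty paragraph"))
        elif text[0].islower():
            findings.append((index, "starts in lower case (split mid-sentence?)"))
        elif len(text.split()) > max_words:
            findings.append((index, "very long (missing paragraph break?)"))
    return findings
```

A file can pass a DTD validator and still produce findings here, which
is the gap between "valid" and "checked" described above.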

>
>What you expect from us TEI developers is that we produce the 150% 
>perfect solution before you even consider starting to post files. That 
>is not the way software development works.
>

Not 150%, surely! :-)

And it may not be the way software development works, but then we're not
a software development project. HTML already works. TeX already works.
I've spent enough of my hours trying to get XML to work; I now leave
that to others.

>And this attitude is in my opinion the main cause why we have gotten 
>nowhere with TEI in the last 3 years.
>
>Let's start now with a version 0.0.1 of the TEI process. Of course at 
>some later time we'll have to do all the posted files over again. 
>Probably more than once. But it's better than sitting here and playing 
>with bowerbird 

 . . . or vice-versa? :-) . . .

>because we are bored.
>

Anyway, I disagree with your substantive point above. I say that until we
have (or SOMEBODY has) a . . . . OK, a 90% solution, we should not post.

>
>>Next, we need a process, using open-source, cross-platform tools --
>>the standarder the better -- to convert that XML into, at a minimum,
>>plain text and HTML. Other formats are welcome but optional. That
>>process must work for _all_ teixlite files, not just ones that are
>>specially cooked, using constraints not specified within the chosen
>>DTD. Here's where we hit the rocks today. 
>
>TEI defines a standard way to extend the DTD. I used this standard way 
>to extend the TEI DTD into what I called PGTEI. This still is a 
>perfectly valid TEI DTD according to the TEI specs.
>
>
>>I don't want to imply specific means from which this process is to be
>>constructed. Obviously XSLT is one possible approach, but I certainly
>>do not want to imply limitations on what that process should use. The
>>only things we must have -- both for our own internal practical
>>purposes and for the use of future readers -- is that it should work
>>reliably on _all_ texts that conform to the XML DTD chosen, be open
>>source, and be cross-platform. A reader needs to be able to tweak the
>>transform and re-run on her own desktop. 
>
>You misunderstand what a DTD is. It just gives you syntactical 
>correctness. I can cook up a perfectly valid XHTML file which is 
>semantically bogus:
>
>  <div><h6>1</h6>
>    <div><h5>1.1</h5>
>       <div><h4>1.1.1</h4>
>         ...
>       </div>
>    </div>
>  </div>
>
>This is valid HTML (didn't bother to check) but will render not so well.
>
>You cannot build a conversion tool that will produce good results on all 
>syntactically valid TEI files, like you cannot build a browser that will 
>make sense out of semantically bogus HTML files.

I think one of us is not understanding the other, or perhaps both. I'm pretty
sure I did not misunderstand what a DTD is. I do understand that an XML file
that is valid just means that it is syntactically correct. This is actually
the same point I made above: the fact that the XML is valid does not mean 
that paragraph breaks are in the right place -- which is one of the reasons
why I must be able to convert it to something I can read in order to check it.

I certainly do not require a conversion tool that will correct misplacement
of paragraph marks (though it would be nice! :-) -- I just require that the
process for, say, teixlite will work reliably on all teixlite files; that it
will produce syntactically valid HTML, and, I suppose you might reasonably
say "syntactically valid" text. Actually, now that I say that, I recall a
case where syntactically valid XML made invalid HTML through a bug. Anyway,
that's not the problem. If the process we agree for teixlite is, say, run
it through Saxon, then I expect to be able to run all teixlite files 
through Saxon, and not have a submitter say "oh, no, you must use Xalan for
this file, and not just any Xalan, but one with my patch in it."

I have no objection to requiring, say, a patched version of Saxon, but if so 
I expect that patched version to be stable, to work for all teixlite files 
submitted, to be open-source, and to be cross-platform.
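
As a rough sketch of what "one repeatable process for every file" could
look like, the Python below derives both plain text and HTML from the
same XML master using only the standard library. It is not Marcello's
or Jeroen's chain, nor Saxon or Xalan: the function names are
hypothetical, and it assumes that only <head> and <p> elements carry
text to be output, which real teixlite files will not satisfy (a <p>
inside the TEI header, for instance, would be emitted too).

```python
import html
import textwrap
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def _blocks(xml_text):
    """Yield (kind, text) for each <head> and <p> in document order."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        kind = _local(elem.tag)
        if kind in ("head", "p"):
            # Normalise whitespace so source line breaks do not matter.
            yield kind, " ".join("".join(elem.itertext()).split())


def to_plain_text(xml_text, width=72):
    """Render headings in capitals and paragraphs wrapped to `width`."""
    parts = []
    for kind, text in _blocks(xml_text):
        parts.append(text.upper() if kind == "head" else textwrap.fill(text, width))
    return "\n\n".join(parts) + "\n"


def to_html(xml_text):
    """Render the same blocks as a bare-bones HTML page."""
    body = []
    for kind, text in _blocks(xml_text):
        tag = "h2" if kind == "head" else "p"
        body.append("<{0}>{1}</{0}>".format(tag, html.escape(text)))
    return "<html><body>\n" + "\n".join(body) + "\n</body></html>\n"
```

Because both outputs come from the same pass over the same master, a
reader on any platform with Python can re-run the transform and should
get the same files.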


>
>Furthermore TEI is geared towards marking up existent texts, so scholars 
>can study the text without having to get the physical book. It is not so 
>good as a master format for print processing. That's why I had to add 
>some more tags and attributes to my DTD. (Which doesn't make any text 
>that uses my DTD less standard, because TEI is expressly designed to be 
>extensible. But I'm repeating myself.)
>
>>And just re-reading that last, when I say "must work reliably on ALL
>>texts" I do not mean to imply that the same XSLT must be used for all
>>texts, though obviously that would be of benefit, if we can manage it.
>
>So why not start posting texts marked up in PGTEI, which will by 
>definition work well in my conversion chain?
>

I think we were very close to that a year and a half ago. I had a 
request in to you to fix the "blockquote" thing, Greg had laid
down the requirements for the license. And if anyone has followed
up any of that, they didn't copy me on it.

Does anyone apart from you favor using PGTEI? In principle, of 
course, it doesn't matter, but in practice, we really couldn't
cope with multiple XSLT conversion methods all happening at the
same time.

Your chain was, at least, rather difficult to implement. I haven't
checked to see whether it still is. Can it be implemented on a Mac?
on Win32? Is there a stable tarball somewhere?

You see, we appear to differ very fundamentally on one point. It's
my lock and key analogy again. I do not want to start down the road
of producing posted files from an XML if the transform will be, for
any reason, not repeatable in a year's time, or five, or ten. I do
not want to start down the road of producing posted files from XML
if an end-user who wants to -- on whatever platform -- cannot 
replicate the process. I think that you don't care about this, or
at least, it's not a priority for you, but it is one for me.

>And at the same time start posting Jeroen's texts, which will convert 
>fine in his chain?
>

What we said last year still holds: we need somebody -- who is not me,
not any of us WWs -- to create the process. The one that I defined in
my earlier posting today. When we've got that, stable and documented,
or at least understood, I really think we can proceed. But _I_, at
least, have not got the time to spend experimenting, and I _know_ that
David Widger doesn't.

>This way we could both start putting up an automatic online conversion 
>chain. (The guy who did this already in Java has somehow vanished, so I 
>think we have to start over again.)
>
>For the start I will act as interim Post-Processor for people wanting to 
>post PGTEI and pass on to you only the perfectly good ones. You'll just 
>have to stick in the etext number where I put 5 asterisks.
>

No; I, at least, don't want to work with an experimental process in which
each text is an exception. I want a process in which the text comes in,
I add the header, I run the conversion process and I check the resulting
files. If we can't get to that point, I don't, as I said before, want
to spend time on it. If _you_ can do this, then there is no reason,
given a stable process, why _I_ can't.

When somebody gets to this point, please let me know.

>I claim the .pgtei file extension, Jeroen can claim what extension he 
>sees fit for his files. So we can have both an alice30.pgtei and an 
>alice30.jtei.
>

Why can't we just name them .xml? I see no reason to invent extensions.
_Is_ there one? Not that it matters much, just curious why you would 
think this a good idea.

jim

From jtinsley at pobox.com  Wed Oct 20 14:09:12 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Wed Oct 20 14:09:22 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com>
References: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com>
Message-ID: <20041020210912.GB22445@panix.com>

On Wed, Oct 20, 2004 at 03:34:14PM -0500, Joshua Hutchinson wrote:
>
>
>I'm hoping Jim (or someone else) can clear up something for me.  If I create a TEI document, use it to create a regular 8-bit ASCII file and valid HTML file, then submit all three to the whitewashers ... will they post all three (assuming the ASCII file clears GutCheck and the HTML clears the W3C validator)?
>

No. Not today, and, I hope, never.

>If not, why not?
>

This is exactly what was starting to happen, and what we
backed away from. In the scenario you quote, where you
create the HTML and text from the XML, how do I check the
XML? Take your word for it that you didn't change anything
when creating the HTML? If you could create the HTML, why
can't I? What happened in a few cases was that I spent many
hours checking each of the three files separately, and if
I find a markup error in the HTML, how do I relate that back
to the XML, and . . . it was just a nightmare. Not a good
way to go. I think we were all clear on this much: the XML
way forward is to develop a reliable conversion method that
the WWs can use to produce the other files. 

I really, honestly, do think that until we've got that (and
why shouldn't we have it?? what's so unreasonable about it?)
we should hold off. Which is what we agreed. A moratorium.
That has lasted a lot longer than any of us would have believed 
at the time, because despite the apparent reasonableness --
to me, at least -- of the request, we still ain't got it.
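
The checking problem described above (three hand-submitted files, and
no way to know whether the text still matches the XML) is the kind of
thing a repeatable conversion makes mechanical. A minimal Python
sketch, assuming a stand-in converter that emits one line per <p>
element; derive_text and differences are hypothetical names, and a real
converter would replace the stand-in:

```python
import difflib
import xml.etree.ElementTree as ET


def derive_text(xml_text):
    """Stand-in converter: one line per <p>, whitespace normalised.

    Assumes un-namespaced <p> elements; purely illustrative.
    """
    root = ET.fromstring(xml_text)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("p")]


def differences(xml_text, submitted_text):
    """Return unified-diff lines between regenerated and submitted text."""
    expected = derive_text(xml_text)
    # Ignore blank separator lines in the submitted file.
    actual = [line for line in submitted_text.splitlines() if line.strip()]
    return list(difflib.unified_diff(expected, actual,
                                     "derived-from-xml", "submitted",
                                     lineterm=""))
```

An empty result means the submitted text is exactly what the XML
produces; anything else points at the lines that were changed by hand.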

jim

From jonathan_ingram at yahoo.com  Wed Oct 20 14:14:31 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Wed Oct 20 14:14:42 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020205934.GA22445@panix.com>
Message-ID: <20041020211431.57061.qmail@web41709.mail.yahoo.com>

--- Jim Tinsley <jtinsley@pobox.com> wrote:
> Correct spelling is necessary but not sufficient. I don't know about
> other people, but I most commonly find errors by skimming the text.
> I can't do that with XML. 

As my post earlier on today indicates, this isn't true. 

Assume that PG starts accepting some TEI-related schema. All you need is a
relatively simple CSS stylesheet, and you can open the XML and view it
perfectly directly.

See 
http://faculty.washington.edu/dillon/xml/
for some examples where you can view styled (XML-conformant) TEI directly in
your browser, with no intermediate transformations required.

-- 
Jon Ingram


From shalesller at writeme.com  Wed Oct 20 13:41:26 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 14:15:12 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com>

"David A. Desrosiers" writes:

> Just because I support Mozilla, does not mean I do NOT support 
> MSIE. That being said, if my code works in 13 browsers, and fails in 
> MSIE, my code is not the problem. I do, however, refuse to add "hacks" 
> to get MSIE to do what it should be doing anyway... following the 
> standards. 

If your job is to build a castle in a swamp, you don't get
to blame the fact that the castle sinks into the swamp on the
fact that the swamp doesn't follow standards. In real life,
MSIE is a de facto standard, and not fixing your code to work
with it is being an ideological pain in the ass that doesn't
like to work with reality. I don't have an option which browser
to use. Many other people just don't care enough about computers
and your causes to switch. Let's support our users as they come,
and not ignore them because they aren't interested in computers.

From jonathan_ingram at yahoo.com  Wed Oct 20 14:24:13 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Wed Oct 20 14:24:24 2004
Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's
	even-handed analysis)
In-Reply-To: <4176A271.2010105@perathoner.de>
Message-ID: <20041020212413.8205.qmail@web41708.mail.yahoo.com>


--- Marcello Perathoner <marcello@perathoner.de> wrote:

> Jonathan Ingram wrote:
> 
> > and one of the best ways to *fail* to
> > change their mind is to plonk 1400 pages of documentation in front of them
> and
> > say 'here's what you should be using, 
> 
> Then don't do that.
> 
> You don't plonk the IBM PC Technical Reference Manual (5000 pages) in 
> front of your secretary if you want her to type a few pages in M$-Word. 
> You just give her a "Word for Dummies" book and that is all she needs. 
> She don't need to know about the difference between AGP and PCI-X bus.

You're quite right. I let the current confrontational 'vibe' of this mailing
list get the better of me. Sorry.

The point I was trying to make is that there are many people, myself included,
who need to be given real arguments in favour of using something like TEI, and
who won't accept that TEI does things the right way just because it's been
around for a while :). There's quite a few people like me at DP, and I imagine
there are quite a few more reading gutvol-d. As I've convinced myself that, at
least in the areas I've investigated, TEI's methods seem quite sensible, I'm
more open to 'trusting' the rest of it... and I thought some people would be
interested in joining me on this journey.

As gutvol-d is being a little too confrontational for me at the moment, I'll
probably go back to exhibiting my enthusiasm in the more congenial atmosphere
of DP.

-- 
Jon Ingram


From shalesller at writeme.com  Wed Oct 20 13:54:51 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 14:29:15 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041020205451.E250A4BDA9@ws1-1.us4.outblaze.com>

"Joshua Hutchinson" writes:

> In my case, at the very least, if you find something 
> I've worked on that does NOT degrade gracefully in 
> Lynx, etc.... Let me know. I consider that a bug in my work. 

Part of my problem is that I'd rather have it just work in
Lynx, IE, etc. instead of degrading gracefully. In the long
run, I'd like to see us generate HTML from the XML that
just works in Lynx and IE by hard-coding the CSS stuff in
where possible, possibly even producing a special text-browser
HTML; this all of course alongside the HTML for standards-compliant
browsers (which we probably ought to call Netscape 6+, Mozilla and
most other non-IE browsers, instead of assuming our client base
knows or cares about standards-compliance.)

Of course, CSS is the best option right now. 

(It's a little off-topic, but are you still up for doing the
first Early English Text Society HTML edition?)

From jtinsley at pobox.com  Wed Oct 20 14:32:06 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Wed Oct 20 14:32:17 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020211431.57061.qmail@web41709.mail.yahoo.com>
References: <20041020205934.GA22445@panix.com>
	<20041020211431.57061.qmail@web41709.mail.yahoo.com>
Message-ID: <20041020213206.GA10983@panix.com>

On Wed, Oct 20, 2004 at 02:14:31PM -0700, Jonathan Ingram wrote:
>--- Jim Tinsley <jtinsley@pobox.com> wrote:
>> Correct spelling is necessary but not sufficient. I don't know about
>> other people, but I most commonly find errors by skimming the text.
>> I can't do that with XML. 
>
>As my post earlier on today indicates, this isn't true. 
>

If I may nit-pick, I think it more correct to say that it
isn't _always_ true. That is, it is not true when there 
exists a CSS that works with the XML.

Jeroen provided XML like this, which I thought was very
good indeed. For any of you who haven't seen it, please
point your browsers to http://www.gutenberg.org/dirs/1/1/3/3/11335/11335-x/11335-x.xml
which is an absolute pleasure to read. (Well, if you're
a geek, that is, and if you ain't, whatcha doin' here?? :-)

I said before, and I say again, that where such an XML is 
provided, HTML is probably redundant. ("Probably" because
a significant use of HTML is as input to PDA readers like,
say, Mobipocket, and I'm not sure if they would swallow
this XML without requiring a Heimlich.)

I know of no CSS for Marcello's PGTEI. Perhaps one could
be crafted for it.

>Assume that PG starts accepting some TEI-related schema. All you need is a
>relatively simple CSS stylesheet, and you can open the XML and view it
>perfectly directly.
>
>See 
>http://faculty.washington.edu/dillon/xml/
>for some examples where you can view styled (XML-conformant) TEI directly in
>your browser, with no intermediate transformations required.
>

It does still leave the plain-text question hanging, but I
do think that XML+CSS is a Good Thing, even if the XML is
also destined to go through XSLT as well.
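
For anyone who has not tried XML+CSS, the mechanism is a single
xml-stylesheet processing instruction in the XML prolog plus a
stylesheet keyed on the element names. The Python sketch below attaches
one to an existing file. The CSS rules, the tei.css file name, and the
function names are illustrative assumptions (the element names follow
TEI Lite); this is not the stylesheet that ships with Jeroen's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet: element names follow TEI Lite, rules are illustrative.
TEI_CSS = """\
teiHeader { display: none; }
head      { display: block; font-size: 1.4em; margin: 1em 0; }
p         { display: block; margin: 0.8em 0; text-indent: 1.5em; }
hi        { font-style: italic; }
"""


def attach_stylesheet(xml_text, href="tei.css"):
    """Insert an xml-stylesheet processing instruction before the root element."""
    pi = '<?xml-stylesheet type="text/css" href="{}"?>\n'.format(href)
    if xml_text.startswith("<?xml"):
        # Keep the XML declaration first; the instruction goes right after it.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:].lstrip("\n")
    return pi + xml_text


def is_well_formed(xml_text):
    """Return True if the document still parses after the change."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Saving TEI_CSS as tei.css next to the XML file is then enough for a
CSS-capable browser to lay the text out; the XML itself is untouched
apart from the one added line.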

jim

From hacker at gnu-designs.com  Wed Oct 20 14:36:46 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 14:38:08 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com>
References: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410201727390.4096@angst.gnu-designs.com>


> In real life, MSIE is a de facto standard, and not fixing your code to 
> work with it is being an ideological pain in the ass that doesn't like 
> to work with reality.

 	Exactly. It's a good thing my code works in everything, from PDA to 
full blown xinerama-enabled desktop with browser, without any changes or 
hacks or workarounds required.

 	But I agree with part of your statement. There are thousands of 
sites out there that don't take the same level of care that I take with my 
code, to ensure this overly-pedantic level of compatibility.

> I don't have an option which browser to use. Many other people just 
> don't care enough about computers and your causes to switch.

 	These aren't "my causes". If users want a richer Internet browsing 
experience, they'll explore the alternatives, or they won't. If they want 
to reduce their level of malicious exposure, they'll explore the other 
alternatives, or they won't. If companies want to reduce the amount of 
technical support calls and man hours required to support MSIE, they'll 
switch to a standards-compliant browser, or they won't.

 	The choice is theirs.

> Let's support our users as they come, and not ignore them because they 
> aren't interested in computers.

 	I support several thousand users in various capacities and across 
many dozens of projects, including my own. As a mentor, educator, and 
student myself, it is part of my process to present all of the possible 
choices to solve a particular problem to the user, and let them decide.

 	Just saying that you can't support suggesting the alternatives 
because Microsoft has a larger percentage of their file manager in use on 
the desktop environment, isn't fair to the end-user.

 	But this is getting way off topic, into the realm of religious 
wars about "Which editor is best?" (vi, of course ;), or browser wars. 
Let's get back to focusing on the issues related to PG and making the 
project and ancillary support tools and formats better and better.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From shalesller at writeme.com  Wed Oct 20 14:12:16 2004
From: shalesller at writeme.com (D. Starner)
Date: Wed Oct 20 14:43:25 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041020211216.7ECBE4BDA9@ws1-1.us4.outblaze.com>

Gutenberg9443@aol.com writes:
> Lead, follow, or get out of the way. Can you supply a way 
> to do this? If so, do it. 

We can. XML, among other things, is a simple way to do this.

> But all the same, I'd rather that readers have that version 
> than no version at all of the book.

How many people can't read the Latin-1 version? How many people
read that version when they could read the Latin-1 version
just fine? People have sent back editions with the accents re-added
despite the fact that we already had an edition with accents,
so there is evidence that having both editions is bad.

> The principle of the greatest good for the greatest number 
> doesn't mean let's throw out the lesser numbers.

How dare we be so provincial as to ignore the EBCDIC users. There's
thousands of character sets in the world--what gives us the right
to ignore the "lesser numbers" of non-ASCII users?
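
One way to avoid maintaining two hand-made editions is to derive the
7-bit version mechanically from the accented one. A minimal Python
sketch, assuming the master is Latin-1; the to_ascii name and the small
FALLBACKS table are illustrative assumptions, and a real table would
need to be longer.

```python
import unicodedata

# Letters with no canonical decomposition; this table is an assumption
# and deliberately incomplete.
FALLBACKS = {
    "\u00e6": "ae",  # ae ligature
    "\u00c6": "AE",  # AE ligature
    "\u00df": "ss",  # sharp s
    "\u00f8": "o",   # o with stroke
    "\u00d8": "O",   # O with stroke
}


def to_ascii(text):
    """Return a 7-bit approximation of `text` with accents removed."""
    out = []
    for ch in text:
        if ch in FALLBACKS:
            out.append(FALLBACKS[ch])
            continue
        # Split the letter from its accent, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    # Anything still outside ASCII becomes "?" rather than being lost silently.
    return "".join(out).encode("ascii", "replace").decode("ascii")
```

Run over the Latin-1 file, this regenerates the plain version on
demand, so only the accented edition ever needs correcting.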

From jtinsley at pobox.com  Wed Oct 20 14:50:41 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Wed Oct 20 14:50:57 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020213206.GA10983@panix.com>
References: <20041020205934.GA22445@panix.com>
	<20041020211431.57061.qmail@web41709.mail.yahoo.com>
	<20041020213206.GA10983@panix.com>
Message-ID: <20041020215041.GA3631@panix.com>

I was just reading over my last posting, hoping I wasn't
the one sending bad vibes to Jonathan, who is exactly the
kind of person we _need_ in a discussion like this, when
I came across something else that Marcello said, that I
didn't comment on first time round:

>Let's start now with a version 0.0.1 of the TEI process. Of course at 
>some later time we'll have to do all the posted files over again. 

Now, please don't take this as a policy statement or 
anything, but I really, really HATE doing anything 
KNOWING that it's wrong and will have to be done again.
I mean, bone-deep HATE it.

Factor that in however you will. An argument against setting up
an experiment in a production environment, or a personal foible?
I report -- you decide! :-)


jim

From Gutenberg9443 at aol.com  Wed Oct 20 15:13:04 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 15:14:38 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <1e3.2c7b180e.2ea83cf0@aol.com>

 
In a message dated 10/20/2004 12:43:31 PM Mountain Standard Time,  
shalesller@writeme.com writes:

we should be doing a lot more Spanish
and French and Arabic and a bunch of other languages that we generally
totally ignore.


We don't ignore them. We beg, plead, and implore for them. But we don't get
them. I sent a personal letter to the King of Saudi Arabia explaining what we
are doing and telling him that we would greatly appreciate both books in
Arabic and Arabic books that are translated into English. In case you don't
know it, some of the most important books of exploration and history in the
Middle Ages happen to be in Arabic. So are some seminal mathematical books,
along with a good many other books.

His Majesty's staff ignored me.

I sent a similar letter to the Saudi Aramco Oil Company. I got similar
results, despite the fact that Saudi Aramco World is one of the best
National Geographic-type magazines in print. Every month when our copy
arrives my husband reads it first on the grounds that he's the historian in
the family; then I get it the second he lays it down.

We WANT all these other texts. Obviously what goes for English does not
necessarily go for other languages, so quit badgering me about that. Now, have
you got any bright ideas where we can GET those books in other languages? If
so, get them, and the sooner the better. I assure you that they will be posted
as soon as their copyright status is determined.
 
Anne
From Gutenberg9443 at aol.com  Wed Oct 20 15:15:15 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 15:15:45 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <67.35a9d1d7.2ea83d73@aol.com>

 
In a message dated 10/20/2004 12:50:16 PM Mountain Standard Time,  
joshua@hutchinson.net writes:

Are you sure we're talking about the same thing?  By page break, I mean when
you get to the bottom of the physical piece of paper.  What you describe
sounds more to me like what I usually refer to as a thought break ... a little
white space or a graphic symbol between sections of text to indicate a scene
transition or time passing, etc.  Typically, in PG texts, those are marked
with 5 asterisks (functionally equivalent to your # # #).


Okay. The name of those things is page break. Obviously in computerese the
word page break has a different meaning, as I should remember myself
considering how many times I have inserted a page break into something.
 
Anne
From Bowerbird at aol.com  Wed Oct 20 15:16:32 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 15:16:47 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <8b.17ed6e78.2ea83dc0@aol.com>

the hacker said:
>   I said no such thing

it's silly to argue about this.  this is what you said:
>   For the users who refuse to learn, to adapt, and to grow,
>   they can stagnate and stay in their own nice warm puddle.

you also said this:
>   if my code works in 13 browsers, and fails in MSIE, 
>   my code is not the problem. I do, however, 
>   refuse to add "hacks" to get MSIE to do what 
>   it should be doing anyway... following the standards.

those aren't the types of things michael hart would say.

there's no reason to get all up in a huff.  i'm not telling you
that you need to change.  on most days, i share your feelings.

and i have said often that every e-book-related innovation
will come sniffing for a chance to work with this library,
and needs to prove its ability to handle it to earn its stripes.
so i'm not faulting you for being here.  that's why i'm here.

but what you've said here is _not_ what michael would say.

so the simple fact is that yours is _not_ the attitude that
has guided project gutenberg from where it started to today.
the mission here has been to do _whatever_it_might_take_
for those e-texts to be available to the maximum audience.
among the many things that that has meant is to work with
the _trailing_ edge, not the _leading_ edge, of technology.
and that strategy hasn't caused it to "stagnate", but rather
is what has caused it to grow into the biggest cyber-library...

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 15:22:58 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 15:23:11 2004
Subject: [gutvol-d] re: news flash, josh is psychic too
Message-ID: <12a.4e63e09c.2ea83f42@aol.com>

josh said:
>   ZML (Zero Markup Language) is ... wait for it ... a MARKUP LANGUAGE.  

you seem to be psychic today too.  must be the rain...

a plain-text file and a z.m.l. file are so similar that
-- even using your newfound psychic powers, josh --
you probably can't tell them apart.  look at alice30.txt.
is it a plain-text file?  or is it a z.m.l. file?  you tell me.

the important points about z.m.l. that make it relevant are that:
1)  it is extremely simple, so simple anyone can use it, and
2)  it does the job that needs to be done.

now, let us discuss x.m.l. in that context...

-bowerbird
From Gutenberg9443 at aol.com  Wed Oct 20 15:34:14 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Oct 20 15:34:32 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <194.308d19da.2ea841e6@aol.com>

 
In a message dated 10/20/2004 3:43:48 PM Mountain Standard Time,  
shalesller@writeme.com writes:

>> The principle of the greatest good for the greatest number
>> doesn't mean let's throw out the lesser numbers.
>
> How dare we be so provincial as to ignore the EBCDIC users. There's
> thousands of character sets in the world--what gives us the right
> to ignore the "lesser numbers" of non-ASCII users?


Excuse me, I thought that was exactly what I said. Doesn't "doesn't mean  
let's throw out the lesser numbers" mean the same as "what gives us the right to  
ignore the lesser numbers"?  I don't want to ignore anybody. 
 
And I'm crawling back into the woodwork. Every time I start posting I wind  
up in a flame war, which is the last thing on earth I want.
 
Anne
 
Egads.
 
From sly at victoria.tc.ca  Wed Oct 20 15:40:15 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Oct 20 15:40:26 2004
Subject: [gutvol-d] Languages in PG
In-Reply-To: <1e3.2c7b180e.2ea83cf0@aol.com>
References: <1e3.2c7b180e.2ea83cf0@aol.com>
Message-ID: <Pine.GSO.4.58.0410201530130.16427@vtn1.victoria.tc.ca>


> In a message dated 10/20/2004 12:43:31 PM Mountain Standard Time,
> shalesller@writeme.com writes:
>
> we  should be doing a lot more Spanish
> and French and Arabic and a bunch of  other languages that we generally
> totally  ignore.

In reply, Anne, Gutenberg9443@aol.com wrote:

> We don't ignore them. We beg, plead, and implore for them. But we don't get
> them.

More in the same vein...

Perhaps for a little reminder, check out this faq:
http://gutenberg.net/faq/G-15

I've been contributing a few French-Canadian books to PG
myself, by reformatting some already online elsewhere.
I've also done the same with German texts in the past.
I find it goes a good deal slower when I'm not too
familiar with the language in question, because I'm
afraid of letting obvious mistakes get through...

Also, the numbers below (taken from the catalog) show that,
although PG's non-English content can certainly be expanded,
it is not insignificant:
French (367)
German (307)
Finnish (85)
Chinese (69)
Spanish (59)
Italian (36)


I just scanned through a list of the titles posted in the last
seven days, and a quick count gave me 23 in languages other
than English. That doesn't seem to me to be "totally ignoring" them.


Andrew
From hacker at gnu-designs.com  Wed Oct 20 15:45:07 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 15:46:10 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <8b.17ed6e78.2ea83dc0@aol.com>
References: <8b.17ed6e78.2ea83dc0@aol.com>
Message-ID: <Pine.LNX.4.61.0410201834500.4629@angst.gnu-designs.com>


> it's silly to argue about this.  this is what you said:

>>   For the users who refuse to learn, to adapt, and to grow,
>>   they can stagnate and stay in their own nice warm puddle.

 	You wrongly asserted that I voted to "leave the trailing edge 
behind". I said no such thing. Users make their own choice to learn or not 
to learn. The results of that choice are their own, and I have nothing to 
do with it. I can only do my part to educate and mentor as necessary. End 
of discussion on this point.

> you also said this:

>>   if my code works in 13 browsers, and fails in MSIE,
>>   my code is not the problem. I do, however,
>>   refuse to add "hacks" to get MSIE to do what
>>   it should be doing anyway... following the standards.

> those aren't the types of things michael hart would say.

 	Well that isn't surprising. I'm not Michael Hart.

> so the simple fact is that yours is _not_ the attitude that has guided 
> project gutenberg from where it started to today.

 	You're comparing how I treat content delivered for an audience 
using primarily web browsers (i.e. webpages) with PG etexts, which are not 
being viewed in a web browser. My ideas and beliefs on web development are 
quite different from my ideas and beliefs about how to best engineer a 
scalable electronic book format.

 	Please don't mix my words up like this. This is the third time 
you've done it, and each time, you've strategically taken my words out of 
context to try to suit your own bend in the discussion.

> among the many things that that has meant is to work with the _trailing_ 
> edge, not the _leading_ edge, of technology. and that strategy hasn't 
> caused it to "stagnate", but rather what has caused it to grow into the 
> biggest cyber-library...

 	My goal is to provide PG etexts (as well as those from about a 
dozen other places) in a format for everyone who can read, regardless of 
platform, reader, language, file format, and also to include those who 
cannot read at all.

 	I never claimed that my interest in the PG project was in direct 
alignment with Michael Hart, or anyone else on this list for that matter. 
I fully expect that my beliefs will intersect a lot of the beliefs of 
others, and contradict those of yet other members here. Such is life, and 
it is through this combination of agreement and disagreement, that actual 
action gets done.

 	Nothing would evolve if there weren't others who thought their 
own beliefs were better than the beliefs of others, and who were 
committed to pursuing them until completion, despite strong objection from 
others.

 	There's no point in continuing this further. I'm done.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Bowerbird at aol.com  Wed Oct 20 16:03:36 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 16:03:54 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in
	hand
Message-ID: <1e1.2d1fafad.2ea848c8@aol.com>

the hacker said:
>   You wrongly asserted that I voted to 
>   "leave the trailing edge behind". I said no such thing. 
>   Users make their own choice to learn or not to learn. 

the trailing edge _is_ largely composed of people who 
-- as you put it -- "refuse to learn, to adapt, and to grow".
(ironically, some of them actually want to read books!)

further, telling them you don't care if they "stagnate and 
stay in their own nice warm puddle" is leaving them behind.
you can try to fancy it up with a spin, but that's what it is.

and the fact of the matter is that a whole lot of people are
_stuck_ with machines that simply will not run a browser
that is "standards-compatible", because they don't believe
-- rightly or wrongly -- that they can afford a newer one,
perhaps because this month they're trying to decide whether
they want to spend their last dollars keeping warm or eating.

there are people right here on this listserve who've told you
that they don't control the browser that their machine runs,
probably because they are using a computer in their library
that billy g. installed there precisely to extend his monopoly.

or maybe, once again ironically, it was a computer that was
put into a school when microsoft fulfilled the terms of the
court judgment, thereby (amazingly) extending the monopoly.

or maybe, even more likely, they're in a school that hasn't had
any budget to buy new computers since the ones they got in '94.

there's a whole big world out there, with lots of people in it...

myself, i'm running a mac g3, circa 1998, with os8.1.
can you tell me a standards-compliant browser to use?

-bowerbird
From jgruber at tampabay.rr.com  Wed Oct 20 16:09:09 2004
From: jgruber at tampabay.rr.com (Joseph R. Gruber)
Date: Wed Oct 20 16:08:46 2004
Subject: [gutvol-d] Why Bowerbird is a genius
In-Reply-To: <ba.63165d94.2ea7eb7c@aol.com>
Message-ID: <200410202308.i9KN8SB9009523@ms-smtp-04.tampabay.rr.com>

Not raw ascii glory -- hex...get it right.

And no, I don't end all my programs with ctrl+alt+del, but the ones that lock
up I do (e.g., your program).

Also, why announce your program for beta testing and then say don't worry
about reporting bugs since you have a new version coming out soon
anyway...what a waste of time.

Oh and you can put your ZML in .txt's and package them with an installer.
You can then guarantee the txts are properly "formatted".

Joseph

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com
Sent: Wednesday, October 20, 2004 12:26 PM
To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com
Subject: re: RE: Re: [gutvol-d] Why Bowerbird is a genius

gruber said:
>   Run the program and then Ctrl+Alt+Del it.
>   Ooops...there goes a "never-crash". 

control-alt-delete?
is that how you end all your programs?      :+)
i recommend you try the "quit" button instead,
or choose "quit" or "exit" under the file menu.

but if that's a bug, which is entirely possible
-- to be expected in fact -- in a beta-version,
i'll fix it.  but please take the bug-reports to
the beta-test listserve, so they'll be logged.

but don't bother with doing that _now_.
as your version is almost 3 months old.
a new version will be out very soon, and
you should wait to test that one instead...


>   P.S. Why are you hiding the etexts in your code
>   instead of making them separate .txt's?

first things first.  priorities. and control of the
degrees of freedom for enhanced troubleshooting

the first objective is to get the program solid.
to focus on that, it's wise to use content that
i _know_ has been correctly formatted in z.m.l.

once the app is acting correctly and is stable,
i'll turn to texts that might be marked up wrong,
confident that if i get unexpected behavior, it's
due to an incorrect text or a defective z.m.l. rule,
in all likelihood, and not some bug in the program.

but the e-texts aren't really "hidden" right now.
if you scrutinize any version of the program in
a file-viewer, you will see the e-texts inside,
"hiding" in plain sight in their raw-ascii glory.
they can be recovered, in full, easily, any time.

-bowerbird
_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d


From jgruber at tampabay.rr.com  Wed Oct 20 16:13:49 2004
From: jgruber at tampabay.rr.com (Joseph R. Gruber)
Date: Wed Oct 20 16:13:25 2004
Subject: [gutvol-d] sudden surge in demand
In-Reply-To: <8b.17ea1fb8.2ea81ced@aol.com>
Message-ID: <200410202313.i9KND7T6008392@ms-smtp-03.tampabay.rr.com>

For those who don't want to give your email to this (I'll hold off on what I
really want to say) -- you can get the latest version at:

http://www.josephgruber.com/pudding1020-exe.zip

It's virus free (but if you don't trust it feel free to give your info to
this....nm. ;)

Joseph

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org
[mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com
Sent: Wednesday, October 20, 2004 3:56 PM
To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com
Subject: [gutvol-d] sudden surge in demand

due to the sudden surge in demand, i've uploaded
a new version of my beta-test viewer-program...

it can be found in the files section of the 
yahoogroups listserve for the beta-test,
which is under the name of "zml_talk".

you can join that beta-test by subscribing via
e-mail:  zml_talk-subscribe@yahoogroups.com

this upload is just "the daily build", and has _not_
been reviewed, so it should be considered as such...

people who want to run a version _known_ to be stable
should hold off on this.  such a version will go up soon.

additionally, no reports are necessary on this version,
since a few of the features have yet to be implemented.

but if you'd like to see the development of the program
since the version uploaded on 7/27, it's there for you...

sorry it's taken so long, but you know how things are.
time flies when you're squashing bugs...

-bowerbird


From sly at victoria.tc.ca  Wed Oct 20 16:30:06 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Oct 20 16:30:18 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020215041.GA3631@panix.com>
References: <20041020205934.GA22445@panix.com>
	<20041020211431.57061.qmail@web41709.mail.yahoo.com>
	<20041020213206.GA10983@panix.com> <20041020215041.GA3631@panix.com>
Message-ID: <Pine.GSO.4.58.0410201629570.4958@vtn1.victoria.tc.ca>


I've read almost everything that's been sent to the gutvol-d list in
the recent burst of messages.

I think it may be worthwhile trying to place everything that's
been said in the larger perspective...

Throughout much of its history, PG as an organization has
been open to posting texts with formatting details or
additional file formats done as volunteers wished to
contribute them. Some examples of closed, proprietary
formats (such as .prc and .lit) can even be found.

This freedom has led to a wonderful array of inconsistencies
and differences of approach which are probably most fully
realized only by those who try to analyze, or convert large
portions of the PG collection. (At least a few people involved
in these recent discussions fall into that category.)

I would argue that if we go about posting various people's
implementations of markup using XML, we risk forming an
increasingly incompatible jumble of formats.

Andrew

From Bowerbird at aol.com  Wed Oct 20 16:38:01 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 16:38:17 2004
Subject: [gutvol-d] press releases and puke on a reporter's shoes
Message-ID: <158.419ce2ec.2ea850d9@aol.com>

marcello said:
>   1. 14 Feb 2003: You announce you will 
>   code an open source ebook reader.

yes.  that was a big joke.      :+)
i was pulling jon noring's leg.
read posts on that listserve.
everyone knew i am anti-xml.
everyone knew i am anti-oeb.
anyone who's stupid enough to
think _i_ would be _serious_
in announcing an effort to write
an oeb viewer is _really_ stupid.

nonetheless, i _would_ have
gone ahead with the project
if anyone would have responded.
as long as _some_ programmers
were willing to puzzle through 
the difficulty of figuring out o.e.b.,
i'd happily advise them on the u.i.

but no one showed up, so it died.

that's not unusual.  there are many
open-source projects on sourceforge
that have died with one contributor.

heck, there are more than a couple
_directly_ for project gutenberg
-- creating viewer-programs --
that have died early on the vine...

my guess is that jon noring still has
less than 3 programmers involved
-- jon, care to comment on that? --
in his open-source openreader thing,
and david rothman has been flogging
openreader _incessantly_ on his blog,
even relaying a specific request for
mac programmers to join in and help.

nonetheless, i would _still_ go ahead
with _my_ open-source o.e.b. project,
if any programmers were to turn up.
do you know any?  let's go to work!
the more viewers we have, the better.

on second thought, have 'em join jon's effort.
the fewer open-source projects that _fail_,
the better off we will be, in the long run...

see, i've prodded jon noring for _years_ to 
get a viewer-program for his beloved o.e.b.
it's _silly_ to propose a "standard format"
and then not have any tools that support it!
(you need a viewer _and_ an authoring-tool!)

i don't know if this joke was the thing that
actually got jon to get to work on the task,
but if it was, then i am sure glad i did it.
(he's a hard worker.  if he directed his energy
in a productive way, he might do a good job.)

actually, it was probably the full-on review
i wrote in response to jon's o.e.b. puff piece
on ebookweb.com that was the real motivation,
if there was anything specific that _i_ did.
but whatever got him to pay some attention
to the point i'd been making for many years,
it was "a good thing", as jon would put it.


>   2. 19 Oct 2004: You have nothing to show.

_that_ project has "nothing to show".

my own viewer-program, which has _never_
been open-source, and probably never will be,
not until the open-source community can match it,
is ready for beta-testing.  have i given the address
that people can use to join that beta-test listserve?
yes, i do believe i have.


>   3. You retroactively declare the announcement to be a joke.

it was a joke from the time the post was a gleam in my eye.   :+)
anyone who knows me knows that i do not do press releases.
the mere _thought_ is funny.  i puke on the shoes of the press.


>   4. You think that did save your face.

>   Think again.

i'm thinking all the time, marcello, all the time...


>   So you admit you were lying to the press

how outrageous!  it's _their_ job to lie to _me_!    :+)

what was i thinking?  oh yeah, i know.  that "press release"
never got any farther than the listserve where i "released" it,
which -- if i remember correctly -- was populated by about
two posters at the time, jon noring and me.  hence, the joke...

-bowerbird
From stephen.thomas at adelaide.edu.au  Wed Oct 20 17:02:37 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Wed Oct 20 17:02:59 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>
References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>
	<Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>
Message-ID: <4176FC9D.6000409@adelaide.edu.au>

David A. Desrosiers wrote:
> 
>> I can guarantee that CSS file is in the PG directory.  I can't 
>> guarantee that Joe Sixpack will download that when he grabs the HTML 
>> file.
> 

This is one of those problems with no easy answer. If you want 
the user to be able to download your book to read offline, then 
you've got to also make sure the user downloads the style sheet 
that goes with it.

[If you only expect them to read online, it doesn't matter.]

My own solution is simply to make a zip file for downloading, 
which includes both the html page(s) and style sheet.

I use the same style sheet for all books, but actually copy it 
to each ebook's directory, so there are currently around 850 
copies of the same style sheet. But it is trivial to update them 
all from the master. It still uses more space, but the 
alternative, having all the html files link to a single css 
doesn't allow for zipping and downloading.
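
The copy-from-master and zip-per-book routine described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Steve's actual script: the directory layout (one subdirectory per ebook) and the function names sync_stylesheet and zip_book are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def sync_stylesheet(master: Path, library: Path) -> int:
    """Copy the master stylesheet into every ebook directory under `library`.

    Assumes (hypothetically) that each ebook lives in its own subdirectory.
    Returns the number of directories updated.
    """
    count = 0
    for book_dir in sorted(p for p in library.iterdir() if p.is_dir()):
        shutil.copyfile(master, book_dir / master.name)
        count += 1
    return count


def zip_book(book_dir: Path, out_dir: Path) -> Path:
    """Bundle one ebook's HTML pages and its stylesheet into a single zip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{book_dir.name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(book_dir.iterdir()):
            # Only the pages and the style sheet go into the download.
            if path.suffix.lower() in {".html", ".htm", ".css"}:
                zf.write(path, arcname=f"{book_dir.name}/{path.name}")
    return archive
```

Because the stylesheet sits next to the pages inside the archive, a relative link to it keeps working after the reader unzips the download offline.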


Steve

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From stephen.thomas at adelaide.edu.au  Wed Oct 20 17:02:47 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Wed Oct 20 17:03:05 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>
References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>
	<Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>
Message-ID: <4176FCA7.5060809@adelaide.edu.au>

David A. Desrosiers wrote:
> 
>     You can't translate a book into something read in a web browser, and 
> retain the same functionality. The whole point of a scrollbar is to 
> remove that constraint.

Yeay! Something I've been saying for years. The "e" in ebook 
gives us opportunities that don't exist in print, so let's use them.

> 
>     Though I agree, unnecessarily-long webpages (scrolling down for 
> hundreds of pages) are a pain, but the alternative is much more painful.

Reading a book with hundreds of pages is painful. I don't see 
why scrolling is any more painful than turning pages. (The 
Mobipocket reader for Palm also has an auto scroll option which 
just scrolls the text slowly by, which could be a nice feature 
in browsers.)

One advantage of print is the ease of bookmarking a spot -- 
something that can't be done easily on most ebooks, although I'm 
working on a simple HTML solution.

I also now provide a single HTML file version and a multi-page 
version of my ebooks. Usually the multi-page version splits the 
work into chapters (or whatever is the major division for the 
work). The multi-page version was mainly intended to make online 
reading easier -- there's less to download for each chapter. It 
also means that Google is more likely to index the content -- 
they have, I think, a 100k limit per file. But most browsers can 
easily accommodate the complete, single-file version of the 
average work, up to a MB or so. Something like Don Quixote is a 
bit more of a problem as a single file, being large in text size 
and also carrying many illustrations, making the total download 
many megabytes. Something that large really needs to be split.
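
A rough sketch of the chapter split, for anyone who wants to try it on a plain-text source. The heading pattern (lines starting with "CHAPTER") and the 100k threshold are assumptions taken from the discussion above, not confirmed limits, and the function names are made up for illustration.

```python
import re

# Assumed heading convention: a line that starts with the word CHAPTER.
CHAPTER_RE = re.compile(r"^CHAPTER\b.*$", re.MULTILINE)
# The "100k" figure mentioned above, treated here as an assumption.
SIZE_LIMIT = 100 * 1024


def split_chapters(text: str) -> list[str]:
    """Split a work at chapter headings, keeping any front matter as part 0."""
    starts = [m.start() for m in CHAPTER_RE.finditer(text)]
    if not starts:
        return [text]
    if starts[0] != 0:
        starts.insert(0, 0)  # text before the first heading is kept
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def oversized(parts: list[str], limit: int = SIZE_LIMIT) -> list[int]:
    """Return the indexes of parts whose UTF-8 size exceeds `limit` bytes."""
    return [i for i, part in enumerate(parts) if len(part.encode("utf-8")) > limit]
```

Joining the parts back together reproduces the original text, so the single-file and multi-page versions can both be generated from one source.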


Steve

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From Bowerbird at aol.com  Wed Oct 20 17:15:09 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 17:15:25 2004
Subject: [gutvol-d] Why Bowerbird is a genius
Message-ID: <154.41f2c1d2.2ea8598d@aol.com>

joseph said:
>   Not raw ascii glory -- hex...get it right.

have you looked?  maybe it's different on the p.c. side,
but on my mac, when i pull either one of the versions
into a file-viewer, and scroll down, the text is right there.
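
One way to settle the "raw ascii or hex" question without a file-viewer is to pull the printable runs out of the program file, much as the Unix strings tool does. The sketch below is only an illustration: the function name ascii_runs and the minimum run length are arbitrary choices, and it says nothing about how the viewer actually stores its texts.

```python
import re


def ascii_runs(data: bytes, min_len: int = 20) -> list[str]:
    """Return runs of printable ASCII (plus tab, CR, LF) at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

If the e-texts are stored as plain ASCII, reading the executable with open(path, "rb").read() and passing the bytes to ascii_runs should return them as long readable runs; if they are encoded or compressed, only short fragments will show up.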


>   Also, why announce your program for beta testing 
>   and then say don't worry about reporting bugs since 
>   you have a new version coming out soon anyway...
>   what a waste of time.

you seemed very eager to have the program, and to
spread it around, even posting it on your own site,
so i figured that kind of devotion deserved the
newest version i could provide.  and once i have
gotten all the features to where _i_ want them,
in a few days or so, _then_ i will be ready for
beta-testers to send me reports on that version.
until then, anything they report might well be
something that i already intend to fix anyway...

but then again, you don't really care anyway, do you?
nonetheless, i will respond to you, at least until we
have posted as many messages with the "genius" header
as we posted with the "kook" header, which was a lot...

-bowerbird
From hacker at gnu-designs.com  Wed Oct 20 17:18:16 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 17:19:13 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <4176FC9D.6000409@adelaide.edu.au>
References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>
	<Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>
	<4176FC9D.6000409@adelaide.edu.au>
Message-ID: <Pine.LNX.4.61.0410202012310.10090@angst.gnu-designs.com>


> I use the same style sheet for all books, but actually copy it to each 
> ebook's directory, so there are currently around 850 copies of the same 
> style sheet. But it is trivial to update them all from the master.

 	That seems like a horrible waste of inodes. I feel this pain, 
because I ran out of inodes on one of my arrays working on some PG works, 
even though I had 50GiB of space free on the drive. I had to reformat with 
more inodes to work around the problem.
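
For anyone who wants to check how close a volume is to that limit, here is a minimal sketch for Unix-like systems using Python's os.statvfs. The function name inode_usage is made up for illustration, and some filesystems report zero total inodes, which is handled by returning None for the fraction.

```python
import os


def inode_usage(path: str = "."):
    """Return (total_inodes, free_inodes, used_fraction) for the filesystem at path.

    used_fraction is None when the filesystem does not report an inode total.
    """
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    used_fraction = None if total == 0 else (total - free) / total
    return total, free, used_fraction
```

A volume can show plenty of free bytes and still have a used fraction near 1.0, which is the situation described above.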

> It still uses more space, but the alternative, having all the html files 
> link to a single css doesn't allow for zipping and downloading.

 	Here's an easy solution: In each .zip, you include a copy of the 
stylesheet, the same stylesheet you include with every copy... except, 
when you unzip the works, they go into a structure like this:

 	Gutenberg/
 	|-- books
 	|   |-- Book_One.xml
 	|   `-- Book_Two.xml
 	`-- styles
 	    `-- Gutenberg.css

 	Every .zip that you unzip into there will overwrite Gutenberg.css 
with the copy that you duplicate inside each .zip file, and the .xml (or 
.html or text or whatever) versions of the books go into a separate 
subdir. In your .html files, you use the standard <base href="..."> element 
(or xml:base in .xml files), or simply point your style declaration to 
../styles/Gutenberg.css.

 	This is exactly how it works on the Web in general, for very 
similar projects. Did that make sense?
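
The layout above can be sketched with Python's zipfile module. This is an illustration only, assuming each archive carries one book under Gutenberg/books/ and the shared stylesheet under Gutenberg/styles/; the function names make_book_zip and unpack_all are hypothetical.

```python
import zipfile
from pathlib import Path


def make_book_zip(archive: Path, book_file: Path, stylesheet: Path) -> None:
    """Write one book plus a copy of the shared stylesheet into an archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(book_file, arcname=f"Gutenberg/books/{book_file.name}")
        zf.write(stylesheet, arcname="Gutenberg/styles/Gutenberg.css")


def unpack_all(archives, dest: Path) -> None:
    """Extract every archive into the same root directory.

    Each extraction overwrites Gutenberg/styles/Gutenberg.css, so the last
    archive unpacked decides which copy of the stylesheet is on disk.
    """
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
```

A book under Gutenberg/books/ then links to the stylesheet with the relative path ../styles/Gutenberg.css, and the reader's disk holds one stylesheet no matter how many books are unpacked.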


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hacker at gnu-designs.com  Wed Oct 20 17:28:51 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 17:30:14 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <4176FCA7.5060809@adelaide.edu.au>
References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com>
	<Pine.LNX.4.61.0410201150160.28742@aphrodite.gnu-designs.com>
	<4176FCA7.5060809@adelaide.edu.au>
Message-ID: <Pine.LNX.4.61.0410202019250.10090@angst.gnu-designs.com>


> Reading a book with hundreds of pages is painful. I don't see why 
> scrolling is any more painful than turning pages. (The Mobipocket reader 
> for Palm also has an auto scroll option which just scrolls the text 
> slowly by, which could be a nice feature in browsers.)

 	We've had that in Plucker for quite some time also (and Plucker's 
format is openly documented, unlike MobiPocket's format).

 	Related to that, you CAN have autoscroll in your browser (again, 
making the assumption that you're using a standards-compliant browser).

 	http://autoscroll.mozdev.org/

> One advantage of print is the ease of bookmarking a spot -- something 
> that can't be done easily on most ebooks, although I'm working on a 
> simple HTML solution.

 	We've got bookmarking, and we're adding cross-document bookmarks 
and interlinking in our next version. We've been thinking about these (and 
other similar problems and solutions) for quite awhile now.

> I also now provide a single HTML file version and a multi-page version 
> of my ebooks. Usually the multi-page version splits the work into 
> chapters (or whatever is the major division for the work).

 	I do the same for my HOWTO documents, sourced from SGML. One call 
each with jade or sgmltools will generate the multi-document version 
of HTML or the single-document version. I run that through hindent and 
tidy for a few passes, and out comes properly-validated XHTML (mostly).

 	You can see what one of those kinds of preparations looks like 
over here. This particular work is only HTML4.0 Transitional, and not 
fully validated yet, but you can see what I did with the stylesheet and 
general output of the SGML:

 	http://faqs.gnu-designs.com/pokerfaq/

 	The mobile version is over here (with screenshots):

 	http://plkr.org/news/46

> The multi-page version was mainly intended to make online reading easier 
> -- there's less to download for each chapter. It also means that Google 
> is more likely to index the content -- they have, I think, a 100k limit 
> per file.

 	Funny you mention that. I've been doing some SEO work on my HTML
version of the 9/11 Commission Report, and the original chapters I 
converted were 100+k and more, many of them into the 200k and 300k range. 
I took some time to split those up into their own subchapters. You can see 
THAT work over here:

 	http://911.gnu-designs.com/

 	I put a ton of hand-editing and automated work into this 
particular effort. With over 7,000 downloads of the mobile formats I've 
created from that work, it seems to be quite popular. It is this same 
level of quality that I am striving for with PG works I convert.

> Something that large really needs to be split.

 	We agree.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Bowerbird at aol.com  Wed Oct 20 17:32:59 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 17:33:14 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <75.360fea32.2ea85dbb@aol.com>

stephen said:
>   One advantage of print is the ease of bookmarking a spot -- 
>   something that can't be done easily on most ebooks, 

depends on the viewer-program.


>   although I'm working on a simple HTML solution.

ok.       :+)


>   most browsers can easily accommodate the complete, 
>   single-file version of the average work, up to a MB or so. 

that download is downright painful if you're on dial-up.

and if there are images involved, it gets even worse.

and -- at least on some browsers, not naming any names --
when c.s.s. is used, the formatting doesn't seem to get done
until the whole file is downloaded, which is a huge handicap.

(and every time you resize the window, you have to wait again.)


>   Something like Don Quixote is a bit more of a problem 
>   as a single file, being large in text size and also carrying
>   many illustrations, making the total download many megabytes. 
>   Something that large really needs to be split.

or downloaded as a zip and read offline.  and even then,
it can take a while for a browser to load and display it.
if you want, i can do some actual timed trials on my mac;
i suspect that the results might surprise you.  in general,
i've found speed in computers is very easy to get used to;
it's difficult to remember how slow an old computer was,
unless you actually fire it up and deal with it once again.
when you do, it'll usually dumbfound you how slow it was,
and you wonder how you ever got anything done at that pace.
same with broadband.  put a bunch of techies back on dialup,
and they would understand why the masses are slow to adopt
-- the systems the techies are building are simply unusable...

-bowerbird
From joshua at hutchinson.net  Wed Oct 20 17:46:40 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 17:46:37 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020213206.GA10983@panix.com>
References: <20041020205934.GA22445@panix.com>	<20041020211431.57061.qmail@web41709.mail.yahoo.com>
	<20041020213206.GA10983@panix.com>
Message-ID: <417706F0.4070207@hutchinson.net>

Jim Tinsley wrote:

>On Wed, Oct 20, 2004 at 02:14:31PM -0700, Jonathan Ingram wrote:
>  
>
>>--- Jim Tinsley <jtinsley@pobox.com> wrote:
>>    
>>
>Jeroen provided XML like this, which I thought was very
>good indeed. For any of you who haven't seen it, please
>point your browsers to http://www.gutenberg.org/dirs/1/1/3/3/11335/11335-x/11335-x.xml
>which is an absolute pleasure to read. (Well, if you're
>a geek, that is, and if you ain't, whatcha doin' here?? :-)
>  
>
First off, let me say that ... is a beautiful e-text.  I really like the 
look and thanks to Jeroen for producing it and Jim for pointing it out!


And next, let me make a modest proposal.  Jon (in the DP forums) is 
making some progress toward an XML/CSS standard of sorts.  I'm going to 
be watching closely (and helping as much as I can).  One of the things 
I'm going to be pushing for is TEI-Lite compliance as much as possible.

Since Marcello has his PGTEI document guidelines on the web site, I'll 
be looking through that for ideas and such.

I'll be going over this with Jon when I can, but my early idea is that 
we work on a couple of DP e-texts (the two of us have TONS to choose 
from!) and improve the XML markup standard enough for basic work.  In a 
few weeks or so, I'd like to get a few projects posted to PG that use 
XML (TEI) with a CSS style sheet in place of the normal HTML that we 
always produce on our projects.  The normal text file will of course be 
created.  Once we have a canon of TEI to work with, hopefully the 
developers out there can start working on tools to help produce HTML or 
TEXT or PDF directly from the master.

It seems to me that the XML/CSS process is the best method to 
incrementally approach an XML master.  Marcello has a point that if we 
wait until we have a 100% solution, we may never get there... But an 
XML/CSS process is doable now and it gets us closer.

Now... everyone let me know where my logic fails.

(Everyone but bowerbird... don't even bother to respond, please... I'm 
trying to actually get something going besides a diverting flame war!)

Josh
From joshua at hutchinson.net  Wed Oct 20 17:51:51 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 17:51:48 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <Pine.LNX.4.61.0410201727390.4096@angst.gnu-designs.com>
References: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com>
	<Pine.LNX.4.61.0410201727390.4096@angst.gnu-designs.com>
Message-ID: <41770827.4070100@hutchinson.net>

David A. Desrosiers wrote:

>     But this is getting way off topic, into the realm of religious 
> wars about "Which editor is best?" (vi, of course ;), or browser wars. 
> Let's get back to focusing on the issues related to PG and making the 
> project and ancillary support tools and formats better and better.

Bah... Winguts rules VI!

(Ok, you have to be a DP geek to know the winguts editor... but trust 
me, it rules! ;) )

Josh
From joshua at hutchinson.net  Wed Oct 20 17:58:18 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 17:58:15 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <194.308d19da.2ea841e6@aol.com>
References: <194.308d19da.2ea841e6@aol.com>
Message-ID: <417709AA.7040201@hutchinson.net>

Gutenberg9443@aol.com wrote:

>  
> And I'm crawling back into the woodwork. Every time I start posting I 
> wind up in a flame war, which is the last thing on earth I want.
>  
> Anne
>  

Come on back out, Anne.  It's just that everyone gets so worked up when 
bowerbird is around that they start snapping at everything.

Plus, each of us has his or her own little pet issues (for instance, 
mine is wanting a master document format ... David's is character set 
support and LYNX level browser support ... Jon loves his CSS markup).  
When our pet issues come up, sometimes we have trouble seeing the forest 
for the trees or realizing the tone of our words.

David and I have argued plenty of times on these subjects and probably 
will plenty of times more.... but when I stop to think about it, we 
actually agree on most things.  We are just passionate and text 
communication doesn't always handle passion well.

Anyways, no one is trying to chase you away.  Please come back out to play!

Josh
From joshua at hutchinson.net  Wed Oct 20 18:01:02 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 20 18:00:59 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <Pine.LNX.4.61.0410201834500.4629@angst.gnu-designs.com>
References: <8b.17ed6e78.2ea83dc0@aol.com>
	<Pine.LNX.4.61.0410201834500.4629@angst.gnu-designs.com>
Message-ID: <41770A4E.9030305@hutchinson.net>

David A. Desrosiers wrote:

>     There's no point in continuing this further. I'm done.

You know, I think everyone that has a conversation with bowerbird says 
this at some point.  Heck, I've said it half a dozen times or more ... and 
then he says something so aggravating that I can't help but respond.

If nothing else, and I've said this before, bowerbird is a very good troll.

Josh
From hacker at gnu-designs.com  Wed Oct 20 18:11:02 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Oct 20 18:12:15 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <41770A4E.9030305@hutchinson.net>
References: <8b.17ed6e78.2ea83dc0@aol.com>
	<Pine.LNX.4.61.0410201834500.4629@angst.gnu-designs.com>
	<41770A4E.9030305@hutchinson.net>
Message-ID: <Pine.LNX.4.61.0410202110350.11249@angst.gnu-designs.com>


> If nothing else, and I've said this before, bowerbird is a very good 
> troll.

 	Every bridge has its troll.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From servalan at ar.com.au  Wed Oct 20 18:18:35 2004
From: servalan at ar.com.au (Pauline)
Date: Wed Oct 20 18:20:19 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <41770827.4070100@hutchinson.net>
References: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com>	<Pine.LNX.4.61.0410201727390.4096@angst.gnu-designs.com>
	<41770827.4070100@hutchinson.net>
Message-ID: <41770E6B.4050904@ar.com.au>

<delurk>
Joshua Hutchinson wrote:
> Bah... Winguts rules VI!

Yup. Huge time saver for processing texts: ascii, HTML, Unicode, 
Foo-AutoFormat8001...

Get da gooeyguts here:
http://mywebpages.comcast.net/thundergnat/guiguts.html

& the best software support I've ever encountered. More info & guiguts 
help/discussion in the DP Forums. You'll need to register at DP to view 
the relevant Forums.
</delurk>

Cheers,
P
-- 
Distributed Proofreaders: http://www.pgdp.net
"Preserving history one page at a time."

JabberID: servalan@jabber.org
Jabber? - http://www.jabber.org/about/overview.php

From Bowerbird at aol.com  Wed Oct 20 18:26:51 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 18:27:11 2004
Subject: [gutvol-d] Posting TEI
Message-ID: <1de.2c209486.2ea86a5b@aol.com>

joshua said:
>   (Everyone but bowerbird... 
>   don't even bother to respond, please... 
>   I'm trying to actually get something going 

i'm all in favor of actually getting something going.

as far as not "bothering" to respond,
if you'll do the same for me, it's a deal.

-bowerbird
From jon at noring.name  Wed Oct 20 18:38:57 2004
From: jon at noring.name (Jon Noring)
Date: Wed Oct 20 18:39:22 2004
Subject: [gutvol-d] press releases and puke on a reporter's shoes
In-Reply-To: <158.419ce2ec.2ea850d9@aol.com>
References: <158.419ce2ec.2ea850d9@aol.com>
Message-ID: <91561720843.20041020193857@noring.name>

Bowerbird wrote:
> marcello said:

>>   1. 14 Feb 2003: You announce you will 
>>   code an open source ebook reader.

> yes.  that was a big joke.      :+)
> i was pulling jon noring's leg.

Hark, I hear my name!


> my guess is that jon noring still has
> less than 3 programmers involved
> -- jon, care to comment on that? --

Yes.

Plan A for the OpenReader 1.0 code base (there is a Plan B) is 75%
complete (and in good shape.) Because of a request not to discuss it
in any more detail, I can't say any more on this except that the
people working on it are really sharp and active in the XML and CSS
worlds (the founder actively serves on the W3C CSS working group),
with a *proven* track record in creating a real-pudding,
honest-to-god, and excellent XML+CSS-based product. They know their
document rendering stuff as well as anyone. We are seeking support to
finish the last 25% of the job, since they are a commercial outfit
with professional programmers and an investment in the code base they
do have, so that is holding things up, but the needed support is
small, and the final codebase will be donated and released under an
open source license under the control of the Consortium (discussed
below). It's a much better approach than kludging something together
from scratch since the codebase we are starting from is of very high
commercial quality, fast, compact, and supports many of the advanced
features we need such as SVG and advanced font-handling (not to
mention probably the best CSS parser in the world, and of course
fairly complex XML document handling capabilities suitable for OEBPS.)
It is also fully cross-platform (it is primarily developed for Linux
but already portable to Windows, thus it will easily port to Windows
and Mac OS X, both desktop and mobile flavors). Support for legacy Mac
and Palm is detailed at our web site (for the Plucker developers
reading this, I'd like to chat with you!)

However, there is more than just issuing the open source code base. We
also need to intelligently hammer out the OpenReader encapsulation
format spec (which is intended for more than just ebooks, such as
encapsulating web sites to compete with Microsoft's proprietary MHT
format), and most importantly the OpenReader conformance requirements,
so anyone else building their own OpenReader browser will not deviate
too far from the vision (we will encourage competitive OpenReader
browsers -- Mozilla, Opera and Safari folk are all capable of building
their own OpenReader versions, although it won't be trivial for them
since they will need to add SVG support and higher typographic
rendering capability including "paged" display which at present they
don't do for web browsers.) We will balance out the need for following
strict conformance rules in order to use the name 'OpenReader' with a
desire not to stifle innovation. I believe we will reach a proper
balance.

In addition, it is important to establish a Consortium, which is
simply an organized group of various key players in the ebook and
digital publication worlds who want OR to succeed (and are dedicated
to both open source and open standards in the digital publication
industry) since they will take advantage of it in some way which will
benefit them (either profit-wise for for-profit companies, or by furthering
the goals of non-profits.) The Consortium (comprising
the members, and not any one individual such as yours truly) will hold
the IP to the OpenReader trademark so as to enforce conformance
requirements, and to maintain and improve the specifications via
established Technical Working Groups, either working under OpenReader
or maybe under some other umbrella organization (I've been offered the
DAISY-NISO umbrella, for example.)

So a lot of effort is going on behind-the-scenes to build the needed
relationships and interest in the Consortium, and we've had a great
increase in interest in the last couple weeks, with some fairly *big*
names in the ebook universe deciding to throw their name behind the
OpenReader vision. I don't believe we are at "critical mass" yet, but
we are definitely getting a lot closer. Will the little train make it
over the hill? -- we'll see.

Of course, anyone reading this, whether representing a company or
organization, or simply an interested individual, who wishes to
publicly state their support/endorsement for OpenReader (with no other
obligation asked for), please contact me in private. We are preparing
a supporters/endorsers web page showing the logos of companies/
organizations with links, and the names/affiliations of individuals.


> in his open-source openreader thing,
> and david rothman has been flogging
> openreader _incessantly_ on his blog,
> even relaying a specific request for
> mac programmers to join in and help.

Yes, we want OpenReader to be a community developed and maintained
effort. Community is the key to success in this instance, in my
opinion, since the ebook and digital publication realms are
essentially commercially and organizationally oriented (authors,
publishers, retailers, accessibility activists, librarians/archivists,
etc. -- but we will not forget to give ebook buyers and readers a say
in the process, unlike past standardization efforts in the ebook
realm which ignored them.) For example, refer to the Digital Radio
Mondiale effort ( http://www.drm.org/ -- yes, they use 'DRM' as their
acronym!) for an archetype of a fairly successful community effort to
establish an international, open standard for shortwave (and BCB/AM)
digital radio. Notice how they formed their "Consortium" -- getting
buy-in from a large number of companies/organizations who are working
together for mutual interest.

Jon Noring
http://www.openreader.org/

From Bowerbird at aol.com  Wed Oct 20 18:56:09 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 18:56:30 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <148.366aff1b.2ea87139@aol.com>

joshua said:
>   Come on back out, Anne.  It's just that 
>   everyone gets so worked up when bowerbird is around 
>   that they start snapping at everything.

that deal, of course, includes talking about me
in posts that you address to other people too...

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 19:02:54 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 19:03:13 2004
Subject: [gutvol-d] press releases and puke on a reporter's shoes
Message-ID: <42.5ae9eb0a.2ea872ce@aol.com>


that's all wonderful to hear, jon, congratulations!
the world needs a good e-book viewer-program.
(now don't forget the authoring-tool as well!)         :+)

-bowerbird
From Bowerbird at aol.com  Wed Oct 20 19:03:54 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 20 19:04:10 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <1dc.2ea74971.2ea8730a@aol.com>

joshua said:
>   If nothing else, and I've said this before, bowerbird is a very good 
>   troll.

just can't help but blame me for your own bad behavior, can you...

-bowerbird
From ke at gnu.franken.de  Wed Oct 20 20:00:14 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Wed Oct 20 20:31:00 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <20041020184605.DB96B2F95F@ws6-3.us4.outblaze.com> (Joshua
	Hutchinson's message of "Wed, 20 Oct 2004 13:46:04 -0500")
References: <20041020184605.DB96B2F95F@ws6-3.us4.outblaze.com>
Message-ID: <shr7nsojwh.fsf@tux.gnu.franken.de>

"Joshua Hutchinson" <joshua@hutchinson.net> writes:

> There will always be plain text files available.

You do more harm than good with these "plain text files".  At least,
please adjust your filename conventions:

    foo.zip   -> foo-7.zip
    foo-8.zip -> foo.zip

Otherwise you can fool the innocent downloader easily.  And stop wasting
bandwidth: delete all .txt files if a .zip file is available.
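
The renaming proposed above could be sketched roughly as follows. This
is only an illustration: the function name proposed_name is made up, and
it assumes that archive names carry at most a one-character tag such as
"-8" or "-h" before ".zip", which may not hold for every file.

```python
import re

# Hypothetical helper, not an existing PG tool.
# Assumption: names look like "<stem>.zip" or "<stem>-<tag>.zip",
# where <tag> is a single digit or lowercase letter.
_NAME = re.compile(r"(?P<stem>.+?)(?:-(?P<tag>[0-9a-z]))?\.zip")


def proposed_name(filename: str) -> str:
    """Map a current archive name to the name suggested in this message.

    foo.zip   (7-bit text) -> foo-7.zip
    foo-8.zip (8-bit text) -> foo.zip
    Any other name is returned unchanged.
    """
    match = _NAME.fullmatch(filename)
    if match is None:
        return filename                      # not a .zip archive
    stem, tag = match.group("stem"), match.group("tag")
    if tag is None:
        return f"{stem}-7.zip"               # untagged file is the 7-bit one
    if tag == "8":
        return f"{stem}.zip"                 # 8-bit file becomes the default
    return filename                          # e.g. foo-h.zip stays as it is


for name in ("foo.zip", "foo-8.zip", "foo-h.zip"):
    print(name, "->", proposed_name(name))
```

When actually renaming files, foo.zip would have to be moved to
foo-7.zip before foo-8.zip takes over the name foo.zip, otherwise the
second rename overwrites the first file.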

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From lofstrom at lava.net  Wed Oct 20 20:49:51 2004
From: lofstrom at lava.net (Karen Lofstrom)
Date: Wed Oct 20 20:50:08 2004
Subject: [gutvol-d] Aside on old computers
In-Reply-To: <4175EF86.8060108@adelaide.edu.au>
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
	<shacuiqic3.fsf@tux.gnu.franken.de>
	<4175EF86.8060108@adelaide.edu.au>
Message-ID: <Pine.BSI.4.58.0410201743510.20578@malasada.lava.net>


On Wed, 20 Oct 2004, Steve Thomas wrote:

> [I can't believe that people still think they're doing good by
> shipping old 486's to Africa -- but apparently its true. I
> recently donated some old Pentium II's to a charity, and they
> couldn't believe their luck.]

My Linux users group installs thin client computer labs for schools. We
happily accept PIIs, but turn down 486s. We use PIIs and PIIIs as thin
clients, removing the hard drives and installing bootable NIC cards, and
connect them to a fast server running K12LTSP Linux. We can create a
usable 30 client computer lab for $3000 or so, since the clients are all
donations.

Currently we're preparing clients for our first foreign lab, to be run by
Peace Corps Volunteers in Western Samoa. If that works -- then perhaps
Africa :)

-- 
Karen Lofstrom
Zora on DP

From tb at baechler.net  Thu Oct 21 00:13:59 2004
From: tb at baechler.net (Tony Baechler)
Date: Thu Oct 21 00:13:07 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041020184101.0F2849E980@ws6-2.us4.outblaze.com>
Message-ID: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>

At 01:41 PM 10/20/2004 -0500, you wrote:
>That is what the page numbers markup currently does.  It hides the page 
>numbers for those the minimum, default behavior, but if you have a browser 
>that supports it, you can see those page numbers appear.  Similarly with 
>poetry.  It has features that allow the browser to rewrap nicely if there 
>is a long line, if the necessary CSS support is there ... but if not, it 
>still displays the poem with its normal indents, it just doesn't rewrap 
>nicely for you.


OK, but I have a question.  I regularly use Lynx because of convenience.  I 
prefer plain text but I will sometimes use lynx to convert html when 
necessary.  Let's say that I do, in fact, want the page numbers.  How am I 
supposed to get them if my browser doesn't support it?  Lynx doesn't do css 
as far as I know, so what you're saying is that page numbers will always be 
hidden from me unless I want to look at the raw html source.  Because plain 
text is, among other things, removing the markup from html, wouldn't that 
also eliminate the page numbers?  I can use IE and it is accessible to the 
blind, but according to what you said IE hides different styles 
anyway.  So, unless I misunderstood the above completely, some information 
will always be inaccessible to me.  Right?  Please don't tell me to use 
Mozilla or some other browser.  At some point they will probably be 
accessible, but not for now.  They are working on it but aren't there yet.

I would like to repeat that I still prefer plain text and normally I 
wouldn't even care about line indents or page numbers. However, it would be 
nice to at least be able to access the information if I have a need for it. 

From tb at baechler.net  Thu Oct 21 00:27:51 2004
From: tb at baechler.net (Tony Baechler)
Date: Thu Oct 21 00:26:59 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
In-Reply-To: <20041020170436.532794BDAA@ws1-1.us4.outblaze.com>
Message-ID: <5.2.0.9.0.20041021002504.025972f0@snoopy2.trkhosting.com>

At 09:04 AM 10/20/2004 -0800, you wrote:
>I'm not spending as much time as I do with PG for him. I seriously
>doubt that he's interested in Ossian in Germany or Selections
>from Early Middle English. My target user is a scholar, whether
>a kid in high school, or a college student or professor or other
>person who may not have or may not be interested in waiting on
>interlibrary loan.


I would like to briefly comment on this.  As I've said here before, I am 
blind.  Yes, it is possible to get books in Braille or on cassette.  Until 
recently, electronic books were very hard to come by and PG was really the 
only major producer of them.  When I was in high school, often teachers 
wouldn't bother to get books in Braille in time.  I would often find out 
literally the day before that I needed to produce a book out of thin 
air.  PG saved my butt many times.  I was able to keep up because David 
Price had posted the works of Dickens and others had posted the works of 
Twain.  I would have never made it through English otherwise.  For that I 
will always be grateful! 

From jonathan_ingram at yahoo.com  Thu Oct 21 01:02:54 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Thu Oct 21 01:03:12 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>
Message-ID: <20041021080254.30607.qmail@web41727.mail.yahoo.com>


--- Tony Baechler <tb@baechler.net> wrote:
> OK, but I have a question.  I regularly use Lynx because of convenience.  I 
> prefer plain text but I will sometimes use lynx to convert html when 
> necessary.  Let's say that I do, in fact, want the page numbers.  How am I 
> supposed to get them if my browser doesn't support it?  

The markup I'm currently using for page numbers will not display them on
non-CSS-capable browsers -- and by default won't display on CSS capable
browsers either unless you change the stylesheet / switch to an alternate
stylesheet. It would be possible to use a different markup, which wouldn't
display page numbers on CSS-capable browsers (which can hide sections of HTML),
but would always display them on non-CSS-capable browsers.

As text-mode browsers are the main example of non-CSS browsers in use today,
the former markup made more sense to use, as it replicates the behaviour
exhibited by the text-only edition (which doesn't record page numbers).

Both markup styles allow you to navigate to a particular page number, even in
non-CSS browsers, by using named anchors (i.e. append #pageXXX to the end of
the URL).

As you say, the information is in the source file, but currently inaccessible
to you. One of the ways to solve this problem is to switch to a relatively
standard master document format, such as TEI, combined with flexible tools that
could convert the source to other editions such as HTML or text, while allowing
us to choose how much of the preserved information to include, and also how that
information is encoded. You could then easily generate for yourself a 'with
page numbers' text edition of the document you're interested in.
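
As a rough illustration of that last point, here is a minimal sketch of
such a converter. It assumes a tiny TEI-like master without namespaces,
where page breaks are recorded as <pb n="..."/> inside <p> elements; the
sample text and the function name to_text are invented for the example,
and a real tool would have to handle far more markup.

```python
import xml.etree.ElementTree as ET

# Invented sample, standing in for a real master document.
SAMPLE = """<text><body>
<p>It was a dark and stormy night;<pb n="2"/> the rain fell in torrents.</p>
<p>The second paragraph.</p>
</body></text>"""


def to_text(xml_source: str, page_numbers: bool = False) -> str:
    """Render the <p> elements as plain text, optionally keeping page breaks."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iter("p"):
        parts = [p.text or ""]
        for child in p:
            if child.tag == "pb":
                if page_numbers:
                    parts.append(f"[p. {child.get('n')}]")
            else:
                # Other inline elements: keep their text, drop the tags.
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        # Collapse runs of whitespace left over from the markup.
        paragraphs.append(" ".join("".join(parts).split()))
    return "\n\n".join(paragraphs)


print(to_text(SAMPLE))                     # edition without page numbers
print(to_text(SAMPLE, page_numbers=True))  # edition with "[p. 2]" markers
```

The same master yields both editions; which one a reader gets is a
choice made at conversion time rather than something fixed by the
posted file.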

-- 
Jon Ingram


From jonathan_ingram at yahoo.com  Thu Oct 21 01:09:22 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Thu Oct 21 01:09:41 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <417706F0.4070207@hutchinson.net>
Message-ID: <20041021080922.31094.qmail@web41721.mail.yahoo.com>


--- Joshua Hutchinson <joshua@hutchinson.net> wrote:
> I'll be going over this with Jon when I can, but my early idea is that 
> we work on a couple of DP e-texts (the two of us have TONS to choose 
> from!) and improve the XML markup standard enough for basic work.  In a 
> few weeks or so, I'd like to get a few projects posted to PG that use 
> XML (TEI) with a CSS style sheet in place of the normal HTML that we 
> always produce on our projects.  The normal text file will of course be 
> created.  Once we have a canon of TEI to work with, hopefully the 
> developers out there can start working on tools to help produce HTML or 
> TEXT or PDF directly from the master.

Just to put people's minds at rest, I don't believe we should post XML+CSS
without (at the very least) an HTML edition -- certainly not until we have
agreement on a common base of XML to use, and well tested tools to convert from
this to (at the minimum) an HTML edition that displays acceptably on a wide
range of browsers.

Even if we do end up using a 'nonstandard' XML markup at DP, I agree with
Joshua that we should try as hard as possible to ensure it can be converted
easily to TEI (derivatives of which seem to be in favour around here). People
at PG will not see the DP-internal markup, only our output, which will conform
to the standards we will hopefully agree on at some point :).

-- 
Jon Ingram


From Bowerbird at aol.com  Thu Oct 21 02:14:47 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 02:15:12 2004
Subject: [gutvol-d] 126 messages in one day
Message-ID: <1f4.144353d.2ea8d807@aol.com>

by my count, there were 126 messages yesterday.

there were probably some _months_ in the last half year
when this listserve did not total that many messages...

there are a certain number of messages that i _will_
post before leaving here.  if y'all like this intense level
of traffic, then just keep on responding like you have been.
we can turn each one of my topics into a long-drawn-out
thread, if that's really what you want to do.  we can.

on the other hand, if you'd prefer a more sedate experience,
i suggest you just let me post my messages and move on...

whatever you want to do is quite alright with me...

but let's look at my "aspects of a well-done e-book" thread;
that generated 36 replies, and the only one that was really
pursuing the topic was david starner's, and his message was...
well, go and re-read it, if you want, and evaluate it yourself.
suffice it to say that y'all still don't have a standard for that.
it might be nice if we could have a _useful_ discussion.

as for the let's-get-x.m.l.-going crowd, perhaps what you need
is your own _separate_listserve_ to coordinate your efforts,
one where i am not allowed, so you can really be "productive".
good idea, eh?

-bowerbird
From marcello at perathoner.de  Thu Oct 21 02:42:16 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 02:42:39 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020210912.GB22445@panix.com>
References: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com>
	<20041020210912.GB22445@panix.com>
Message-ID: <41778478.3060609@perathoner.de>

Jim Tinsley wrote:

> That has lasted a lot longer than any of us would have believed 
> at the time, because despite the apparent reasonableness --
> to me, at least -- of the request, we still ain't got it.

Maybe the request is just that: reasonable to *you* and nobody else.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Oct 21 03:10:18 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 03:10:43 2004
Subject: [gutvol-d] Posting TEI
Message-ID: <1e1.2d2ba8d4.2ea8e50a@aol.com>

marcello said:
>   Maybe the request is just that: 
>   reasonable to *you* and nobody else.

yeah, right...

-bowerbird
From marcello at perathoner.de  Thu Oct 21 03:31:59 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 03:32:29 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020205934.GA22445@panix.com>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de>
	<20041020173528.GB3366@panix.com>	<4176BDEF.7050008@perathoner.de>
	<20041020205934.GA22445@panix.com>
Message-ID: <4177901F.7010006@perathoner.de>

Jim Tinsley wrote:

> I really do not mean to be disrepectful when I -- speaking for myself --
> say that I'm not interested in spending my time making developers' jobs
> easier. That's not what I'm here for. 

On the one hand you wonder why we developers are not able to come up with a 
solution; on the other hand you are not disposed to give an inch on 
your developer-unfriendly position.

Your policy of not posting TEI files is at present the main roadblock.

It's like requesting the final release of the product before allowing a 
beta test.

I have been doing other (hopefully useful) work and have not looked at 
the TEI code for about a year now because I don't see a way to get it to 
work with this `moratorium' in place.


> We have text, and HTML, both
> proven and well-supported formats that we know how to work with and for
> which we know there is a demand. I'll stick to those until we can see
> a way clear through to making successful XML.

You sure know how to work with PLAIN ALL CAPS ASCII TEXT FILES but 
that's not a reason to shun all progress since.


> Correct spelling is necessary but not sufficient. I don't know about
> other people, but I most commonly find errors by skimming the text.
> I can't do that with XML.

After a few weeks you'll skim thru TEI like you skim thru plain text. 
(Use an editor that highlights the tags and use a low contrast color for 
the tags.)


> And it may not be the way software development works, but then we're not
> a software development project. 

But you depend on software. DP is 250,000 lines of code. If it was not 
for software you wouldn't have much to do.


> that's not the problem. If the process we agree for teixlite is, say, run
> it through Saxon, then I expect to be able to run all teixlite files 
> through Saxon, and not have a submitter say "oh, no, you must use Xalan for
> this file, and not just any Xalan, but one with my patch in it."

You have to use PGTEI stylesheets to convert PGTEI text. You can use 
them with any XSLT 1.0 compliant processor.


> You see, we appear to differ very fundamentally on one point. It's
> my lock and key analogy again. I do not want to start down the road
> of producing posted files from an XML if the transform, will be, for
> any reason, not repeatable in a year's time, or five, or ten.

This amounts to the same as: never start at all. Remember: the first 
files were uppercase ascii. We *had* to do them over again. We *are* 
doing all pre-10K texts over again. We *will* have to do the TEI files 
over again, maybe more than once. That's only being realistic.


> I do
> not want to start down the road of producing posted files from XML
> if an end-user who wants to -- on whatever platform -- cannot 
> replicate the process.

Then you should also post all the scanned pages so a user can redo the 
OCR on her platform if she wants to.

I think we can postpone this, because the user can grab the converted 
files. And if converting at home is an issue with him, hey!, the tools 
are Free Software. He can change them until they work on his platform 
and submit the patches to us.


>>For the start I will act as interim Post-Processor for people wanting to 
>>post PGTEI and pass on to you only the perfectly good ones. You'll just 
>>have to stick in the etext number where I put 5 asterisks.
> 
> No; I, at least, don't want to work with an experimental process in which
> each text is an exception.

Is there some qualifying exam to become a whitewasher?

I ask, because by now I'm so desperate that I'm quite willing to become 
a whitewasher myself just to see some TEI texts posted.


> Why can't we just name them .xml? I see no reason to invent extensions.
> _Is_ there one? Not that it matters much, just curious why you would 
> think this a good idea.

Because there ain't such a thing as an XML file. XML is just a framework 
for building applications. XHTML is an XML application, SVG is an XML 
application, TEI is an XML application, OpenOffice file format is an XML 
application ...

Labelling a file .xml is like labelling a Word file .bytes
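
To illustrate: the only way to tell what kind of ".xml" file you have is
to open it and look at the root element. A minimal sketch, in which the
function name xml_application and the KNOWN_ROOTS table are made up for
the example, and the table covers just three applications (TEI P4 files
use the root element TEI.2 without a namespace):

```python
import xml.etree.ElementTree as ET

# Illustrative table only; a real one would be much longer.
# ElementTree writes namespaced tags as "{namespace-uri}localname".
KNOWN_ROOTS = {
    "{http://www.w3.org/1999/xhtml}html": "XHTML",
    "{http://www.w3.org/2000/svg}svg": "SVG",
    "TEI.2": "TEI (P4)",
}


def xml_application(xml_source: str) -> str:
    """Guess the XML application of a document from its root element."""
    root = ET.fromstring(xml_source)
    return KNOWN_ROOTS.get(root.tag, "unknown")


print(xml_application('<svg xmlns="http://www.w3.org/2000/svg"/>'))
print(xml_application("<TEI.2><text/></TEI.2>"))
```

A more specific file extension gives the same information without
having to parse the file first.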


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Oct 21 03:59:48 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 04:00:19 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020213206.GA10983@panix.com>
References: <20041020205934.GA22445@panix.com>	<20041020211431.57061.qmail@web41709.mail.yahoo.com>
	<20041020213206.GA10983@panix.com>
Message-ID: <417796A4.8000703@perathoner.de>

Jim Tinsley wrote:

> I know of no CSS for Marcello's PGTEI. Perhaps one could
> be crafted for it.

It already works pretty well with Jeroen's XSL:

   http://www.gutenberg.org/tei/examples/css/lmiss.xml

I had to replace all named entities (like &mdash;) with numeric ones. I 
did that manually, so maybe I got some of them wrong.
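
That replacement could probably be scripted instead. A minimal sketch,
assuming the named entities in the file are the ones HTML 4 defines
(which is what Python's html.entities table contains); the function name
numeric_entities is made up, and unknown names are left untouched rather
than guessed at:

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser knows already; leave them alone.
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}


def numeric_entities(text: str) -> str:
    """Replace named entities such as &mdash; with numeric ones (&#8212;)."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_BUILTINS or name not in name2codepoint:
            return match.group(0)            # keep as written
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(numeric_entities("wait&mdash;what? a &amp; b"))
```

Entities declared only in the document's own DTD would not be in that
table, so they would still need to be handled by hand.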

All quotation signs are missing because I replace quotation signs with 
<q> </q> and Jeroen does not. But this should be very easy to add to 
Jeroen's XSL.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From cweyant at twcny.rr.com  Thu Oct 21 04:44:40 2004
From: cweyant at twcny.rr.com (Curtis A. Weyant)
Date: Thu Oct 21 04:41:12 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <Pine.LNX.4.61.0410200919020.26661@aphrodite.gnu-designs.com>
References: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com>
	<Pine.LNX.4.61.0410200919020.26661@aphrodite.gnu-designs.com>
Message-ID: <4177A128.10803@twcny.rr.com>

David A. Desrosiers wrote:

>     Or, more correctly, by going to View -> Use Style, because there is 
> no such selector in Mozilla or "Mozilla-based browsers" in the lower 
> left-hand corner. At least not on my Unix, Linux and Windows versions of 
> Mozilla (all current).

Firefox has an icon as JHutch describes.

Curtis.
From cweyant at twcny.rr.com  Thu Oct 21 04:52:03 2004
From: cweyant at twcny.rr.com (Curtis A. Weyant)
Date: Thu Oct 21 04:48:35 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <Pine.LNX.4.61.0410201246160.2084@aphrodite.gnu-designs.com>
References: <146.364dc5ec.2ea7eddb@aol.com>
	<Pine.LNX.4.61.0410201246160.2084@aphrodite.gnu-designs.com>
Message-ID: <4177A2E3.9080703@twcny.rr.com>

David A. Desrosiers wrote:

>     Users aren't using MSIE because it is the superior product, they're 
> using it because they have no idea there are significanly more secure, 
> functional, compliant browser alternatives out there, and because it 
> came with their pee-cee, with a nice convenient icon right on their 
> desktop.

This is why I installed Firefox on my mom's computer for her. I then 
deleted the IE icon and changed the Firefox icon to read "Internet" -- 
and I have no doubt she has not noticed a single change.

;O)

Curtis.
From marcello at perathoner.de  Thu Oct 21 04:49:30 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 04:49:55 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <8b.17ed6e78.2ea83dc0@aol.com>
References: <8b.17ed6e78.2ea83dc0@aol.com>
Message-ID: <4177A24A.40000@perathoner.de>

Bowerbird@aol.com wrote:

> those aren't the types of things michael hart would say.

That ain't no Hank Williams song!


> among the many things that that has meant is to work with
> the _trailing_ edge, not the _leading_ edge, of technology.
> and that strategy hasn't caused it to "stagnate", but rather
> what has caused it to grow into the biggest cyber-library...

Have you got any data to sustain your theory, like, uhm, a 
representative poll of pg user population?

Or are you bumbling along without a clue as usual, causally linking to 
facts that, for all *you* know, may be completely unrelated.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Oct 21 05:00:38 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 05:00:42 2004
Subject: [gutvol-d] press releases and puke on a reporter's shoes
In-Reply-To: <158.419ce2ec.2ea850d9@aol.com>
References: <158.419ce2ec.2ea850d9@aol.com>
Message-ID: <4177A4E6.8060805@perathoner.de>

Bowerbird@aol.com wrote:

> anyone who's stupid enough to
> think _i_ would be _serious_
> in announcing an effort to write
> an oeb viewer is _really_ stupid.

Hey! Why be narrow-minded about this?

I think everybody who thinks that you'll ever be _serious_ is really stupid.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Thu Oct 21 05:09:19 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 05:09:19 2004
Subject: [gutvol-d] Re: jeroen's even-handed analysis
Message-ID: <20041021120919.105919E97A@ws6-2.us4.outblaze.com>


----- Original Message -----
From: Karl Eichwalder <ke@gnu.franken.de>
> 
> "Joshua Hutchinson" <joshua@hutchinson.net> writes:
> 
> > There will always be plain text files available.
> 
> You do more harm than good with these "plain text files".  At least,
> please adjust your filename conventions:
> 
>     foo.zip   -> foo-7.zip
>     foo-8.zip -> foo.zip
> 
> Otherwise you can fool the innocent downloader easily.  And stop wasting
> bandwidth: delete all .txt files if a .zip file is available.
> 

Personally, I lump 8-bit and 7-bit text files together when I say plain text files.  I wouldn't shed a single tear if we did away with the 7-bit text files from here on out.  Then again, I wouldn't shed a tear if we did away with text files completely, but that is a personal preference.  7-bit files aren't any easier to read than 8-bit files, and the lost information (accents, etc.) can be significant.
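
To make the "lost information" point concrete, here is a rough Python sketch. The function to_7bit is a made-up name, and stripping accents this way is only an assumption about how a 7-bit edition might be derived, not a description of what any PG tool actually does.

```python
import unicodedata


def to_7bit(text: str) -> str:
    """Reduce text to plain ASCII by dropping accents (illustrative only)."""
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with errors="ignore" then discards the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


for word in ("résumé", "resume", "Zoë", "naïve"):
    print(word, "->", to_7bit(word))
```

Two different words ("résumé" and "resume") come out identical, and there is no way to get the accents back from the 7-bit result.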

As far as having everything .zip ... that's being a little anal, don'tcha think?

Josh
From marcello at perathoner.de  Thu Oct 21 05:10:43 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 05:10:46 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <Pine.LNX.4.61.0410202110350.11249@angst.gnu-designs.com>
References: <8b.17ed6e78.2ea83dc0@aol.com>	<Pine.LNX.4.61.0410201834500.4629@angst.gnu-designs.com>	<41770A4E.9030305@hutchinson.net>
	<Pine.LNX.4.61.0410202110350.11249@angst.gnu-designs.com>
Message-ID: <4177A743.6020108@perathoner.de>

David A. Desrosiers wrote:

>> If nothing else, and I've said this before, bowerbird is a very good 
>> troll.
> 
>     Every bridge has its troll.

How did you get rid of the troll?

I seem to recall, by having the bear come after you.

Maybe we could lure another troll onto this list and have the two of 
them fight each other until they've both had it.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Thu Oct 21 05:15:12 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 05:15:14 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041021121512.D434A4F4AB@ws6-5.us4.outblaze.com>

Well, the page numbers are something I looked into having in the non-CSS versions (Lynx).  However, it ended up being unworkable.  Basically, since you can't move them to the side in Lynx, they appear right in the middle of the text.  And then, if you didn't already know what they were, they looked like part of the sentence you were reading and didn't make a whole lot of sense.

This was the best compromise I could come up with.  A minimum functionality (the functionality offered by the vast majority of HTML texts we offer) with a few added benefits for the browsers that can support them.

And IE can show the page numbers, it just doesn't do it by default.  Since IE doesn't have on the fly style switching, it requires modifying the HTML doc manually.  Definitely something I wish I could avoid, but since I can't, I tried to default the layout to the least obtrusive.

Josh


----- Original Message -----
From: Tony Baechler <tb@baechler.net>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Subject: re: Re: [gutvol-d] Re: aspects of  a well-done e-book
Date: Thu, 21 Oct 2004 00:13:59 -0700

> 
> At 01:41 PM 10/20/2004 -0500, you wrote:
> >That is what the page numbers markup currently does.  It hides the page 
> >numbers for those the minimum, default behavior, but if you have a browser 
> >that supports it, you can see those page numbers appear.  Similarly with 
> >poetry.  It has features that allow the browser to rewrap nicely if there 
> >is a long line, if the necessary CSS support is there ... but if not, it 
> >still displays the poem with its normal indents, it just doesn't rewrap 
> >nicely for you.
> 
> 
> OK, but I have a question.  I regularly use Lynx because of convenience.  I 
> prefer plain text but I will sometimes use lynx to convert html when 
> necessary.  Let's say that I do, in fact, want the page numbers.  How am I 
> supposed to get them if my browser doesn't support it?  Lynx doesn't do css 
> as far as I know, so what you're saying is that page numbers will always be 
> hidden from me unless I want to look at the raw html source.  Because plain 
> text is, among other things, removing the markup from html, wouldn't that 
> also eliminate the page numbers?  I can use IE and it is accessible to the 
> blind, but according to what you said IE hides different styles 
> anyway.  So, unless I misunderstood the above completely, some information 
> will always be inaccessible to me.  Right?  Please don't tell me to use 
> Mozilla or some other browser.  At some point they will probably be 
> accessible, but not for now.  They are working on it but aren't there yet.
> 
> I would like to repeat that I still prefer plain text and normally I 
> wouldn't even care about line indents or page numbers. However, it would be 
> nice to at least be able to access the information if I have a need for it. 
> 

From joshua at hutchinson.net  Thu Oct 21 05:22:48 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 05:22:50 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041021122248.8451EEDD67@ws6-1.us4.outblaze.com>


----- Original Message -----
From: "Curtis A. Weyant" <cweyant@twcny.rr.com>
> 
> This is why I installed Firefox on my mom's computer for her. I then 
> deleted the IE icon and changed the Firefox icon to read "Internet" -- 
> and I have no doubt she has not noticed a single change.
> 

Heh!  I'm not the only one that does "stealth" fixes on relatives' computers! ;)

Josh
From hacker at gnu-designs.com  Thu Oct 21 06:21:28 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Oct 21 06:22:11 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041021080922.31094.qmail@web41721.mail.yahoo.com>
References: <20041021080922.31094.qmail@web41721.mail.yahoo.com>
Message-ID: <Pine.LNX.4.61.0410210920090.25167@angst.gnu-designs.com>


> Even if we do end-up using a 'nonstandard' XML markup at DP, I agree 
> with Joshua that we should try as hard as possible to ensure it can be 
> converted easily to TEI (derivatives of which seem to be in favour 
> around here). People at PG will not see the DP-internal markup, only our 
> output, which will conform to the standards we will hopefully agree on 
> at some point :).

 	It doesn't matter if it is "non-standard XML" (of course, there is 
no such thing, as long as it is a well-formed XML document). Once the 
format is in something like XML, we are all free to create our own output 
from that base, including "correcting" the XML to output a different form 
of XML from it.
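
 	As a rough illustration of that XML-to-XML step, here is a small Python sketch. The source element names (chapter, title, para) and the RENAME table are invented for the example and are not DP's actual internal markup; the target names are just the TEI-style div/head/p.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an in-house vocabulary to TEI-style names.
RENAME = {"chapter": "div", "title": "head", "para": "p"}


def to_tei_like(source: str) -> str:
    """Rename elements of a well-formed XML document using RENAME."""
    root = ET.fromstring(source)
    for elem in root.iter():  # iter() visits the root element too
        elem.tag = RENAME.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


internal = "<chapter><title>I</title><para>Call me Ishmael.</para></chapter>"
print(to_tei_like(internal))
# prints: <div><head>I</head><p>Call me Ishmael.</p></div>
```

 	A real conversion would more likely be an XSLT stylesheet, and would also have to deal with attributes and structural differences, but the principle is the same.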


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hacker at gnu-designs.com  Thu Oct 21 06:26:23 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Oct 21 06:27:10 2004
Subject: [gutvol-d] 126 messages in one day
In-Reply-To: <1f4.144353d.2ea8d807@aol.com>
References: <1f4.144353d.2ea8d807@aol.com>
Message-ID: <Pine.LNX.4.61.0410210921530.25167@angst.gnu-designs.com>


> there were probably some _months_ in the last half year when this 
> listserve did not total that many messages...

 	Excuse me, "LISTSERV" (not "listserve") is the trademarked name of a 
product owned by L-Soft International, Inc. We shouldn't be referring to the 
PG lists as lists run by that product because, well, they aren't. This is a 
mailing list run by a product called Mailman.

 	I may be being pedantic here, but since we should all be aware of 
certain copyright and trademark issues as we continue to work on PG and 
other similar products, the distinction is an important one to make.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hacker at gnu-designs.com  Thu Oct 21 06:31:18 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Oct 21 06:32:11 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <4177A2E3.9080703@twcny.rr.com>
References: <146.364dc5ec.2ea7eddb@aol.com>
	<Pine.LNX.4.61.0410201246160.2084@aphrodite.gnu-designs.com>
	<4177A2E3.9080703@twcny.rr.com>
Message-ID: <Pine.LNX.4.61.0410210927110.25167@angst.gnu-designs.com>


> This is why I installed Firefox on my mom's computer for her. I then 
> deleted the IE icon and changed the Firefox icon to read "Internet" -- 
> and I have no doubt she has not noticed a single change.

 	I've done something similar for some local businesses here as 
well, except that the IE icon itself launches FireFox. You also have to 
remember to set the user's default browser to FireFox, or IE will always 
be used, and you have to make sure you change the shortcut in the taskbar 
as well (the one to the right of the Start button, next to the desktop 
icon, etc.).

 	After that, I installed about 6 extensions that help them with 
their daily work, and to compel them to stay on FireFox, and I installed 
the IE FireFox theme, so it "looks" identical.

 	So far, no complaints, and one company even said they can't 
believe they were using MSIE before for all of their Inter and Intranet 
work, given the huge number of nice features in FireFox that are lacking 
in MSIE. (Little trivia fact: The last version of MSIE was released over 
two years ago).

 	This also brought a dramatic reduction in their support calls 
after installing FireFox, because they no longer had to send out a tech 
to deal with viruses, trojans, spyware, popups, and the other things that 
MSIE seems prone to proliferate.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From SCREAMING.xml.queen at gmail.com  Thu Oct 21 06:36:33 2004
From: SCREAMING.xml.queen at gmail.com (name>XML Queen</name)
Date: Thu Oct 21 06:36:36 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <4a9b73c9041021063668bcda88@mail.gmail.com>

Marcello Perathoner wrote:

> Maybe we could lure another troll onto this list and have the two of them fight each other until they've both had it.

You rang, babycakes?

I am XML Queen Letitia! Hear me ROAR!! 

You've been doing things all 
wrong for the last 30 years. :+?/>

x/M/l is the Way, the Truth, and the Light. There's no point
what-so-fricken-ever
in
giving people choice. FEED the Democratic citizens of Gutenbergia XmL
and they will
use it
with my viewer PROGRAM which will dazzle them. 

I have written 
ROUTINES in LOGO which will transform your worthless TXT into XML,
XSLT, XQML, and XBB38GGOBS.
You cannot have these
routines unless you bow down to me becuase otherwise YOU are not worthy. 
It will be fun for you to 
redo work already done ;+?/>

I have an ally in this list I SEE. BOWERBIRD 
is very evanGELical, but she 
consistently mis-labels xML (I know z and x are very close on the keyboard). 

I am a competitive interpretive yogalates-funk artist AND I KNOW
BETTER THAN YOU GEEKS.

Fear me,
Letita.
--
twajs
From tb at baechler.net  Thu Oct 21 06:46:03 2004
From: tb at baechler.net (Tony Baechler)
Date: Thu Oct 21 06:44:58 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <20041021080254.30607.qmail@web41727.mail.yahoo.com>
References: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>
Message-ID: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>

At 01:02 AM 10/21/2004 -0700, you wrote:
>As you say, the information is in the source file, but currently inaccessible
>to you. One of the ways to solve this problem is to switch to a relatively
>standard master document format, such as TEI, combined with flexible tools 
>that
>could convert the source to other editions such as HTML or text, while 
>allowing
>us to choose how much of the preserved information, and to also choose how 
>that
>information was encoded. You could then easily generate for yourself a 'with
>page numbers' text edition of the document you're interested in.


So, does this mean that I now have to download the master xml 
file, the css, and a set of conversion tools?  You must be kidding, 
right?  If it came to that, I would rather have the plain text and forget 
the page numbers.  It is already inconvenient to use "lynx -dump -nolist 
filename.htm."  Why in the world would I want to run it through a 
conversion tool and still have to do that anyway?  OK, so a plain text file 
can be output directly from the xml.  I still have to go through at least 
one extra conversion step that I wouldn't have to otherwise.  I had a look 
at sgml just to see how hard it would be to get plain text.  What a royal 
pain!  I gave up when it kept complaining about some file missing when I 
was using their samples. 

From joshua at hutchinson.net  Thu Oct 21 06:51:02 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 06:51:06 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
Message-ID: <20041021135103.0B17A1097DB@ws6-4.us4.outblaze.com>

No, no, no.  We NEVER expect you (as a consumer) to have to convert files from the master XML format to format du jour.

However, if you want something non-standard (like a text file that *includes* source page numbers), *then* you may have to convert it yourself.  If you want an HTML with pink backgrounds and purple text, you can convert a copy yourself (if you have the tools and the know-how) but we won't have such a version pre-made on the download page.
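
To give a feel for what "convert it yourself" could look like, here is a small Python sketch. It assumes a tiny TEI-like master in which page breaks are marked with empty pb elements carrying an n attribute; the sample document and the function name to_text are made up, and this is not one of the converters discussed on the list.

```python
import xml.etree.ElementTree as ET


def to_text(source: str, show_pages: bool = False) -> str:
    """Flatten the <p> elements of a small XML master into plain text.

    Only <pb/> elements that are direct children of a paragraph are
    handled; that is a simplification for this sketch.
    """
    root = ET.fromstring(source)
    paragraphs = []
    for p in root.iter("p"):
        parts = [p.text or ""]
        for child in p:
            if child.tag == "pb":
                if show_pages:
                    parts.append(f"[p. {child.get('n')}] ")
            else:
                # Keep the text of any other inline element.
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        # Collapse runs of whitespace left over from the markup.
        paragraphs.append(" ".join("".join(parts).split()))
    return "\n\n".join(paragraphs)


master = ('<text><p>It was the best of times, '
          '<pb n="2"/>it was the worst of times.</p></text>')
print(to_text(master))
print(to_text(master, show_pages=True))
```

The first call gives the ordinary text edition; the second gives the same text with "[p. 2]" inline. The point is that both come from one master file, and the choice is a switch rather than a separate hand-made edition.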

Basically, we will never have LESS available than we do now.  The XML initiative is about giving more options above and beyond the baseline we already have (as well as making life easier on the text preparers and maintainers).

Josh


----- Original Message -----
From: Tony Baechler <tb@baechler.net>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Subject: re: Re: [gutvol-d] Re: aspects of  a well-done e-book
Date: Thu, 21 Oct 2004 06:46:03 -0700

> 
> At 01:02 AM 10/21/2004 -0700, you wrote:
> >As you say, the information is in the source file, but currently inaccessible
> >to you. One of the ways to solve this problem is to switch to a relatively
> >standard master document format, such as TEI, combined with flexible tools 
> >that
> >could convert the source to other editions such as HTML or text, while 
> >allowing
> >us to choose how much of the preserved information, and to also choose how 
> >that
> >information was encoded. You could then easily generate for yourself a 'with
> >page numbers' text edition of the document you're interested in.
> 
> 
> So, does this mean that I now not only have to download the master xml 
> file, the css, and a set of conversion tools?  You must be kidding, 
> right?  If it came to that, I would rather have the plain text and forget 
> the page numbers.  It is already inconvenient to use "lynx -dump -nolist 
> filename.htm."  Why in the world would I want to run it through a 
> conversion tool and still have to do that anyway?  OK, so a plain text file 
> can be output directly from the xml.  I still have to go through at least 
> one extra conversion step that I wouldn't have to otherwise.  I had a look 
> at sgml just to see how hard it would be to get plain text.  What a royal 
> pain!  I gave up when it kept complaining about some file missing when I 
> was using their samples. 
> 

From marcello at perathoner.de  Thu Oct 21 07:06:08 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 07:06:13 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020215041.GA3631@panix.com>
References: <20041020205934.GA22445@panix.com>	<20041020211431.57061.qmail@web41709.mail.yahoo.com>	<20041020213206.GA10983@panix.com>
	<20041020215041.GA3631@panix.com>
Message-ID: <4177C250.3020600@perathoner.de>

Jim Tinsley wrote:

>>Lets start now with a version 0.0.1 of the TEI process. Of course at 
>>some later time we'll have to do all the posted files over again. 
> 
> Now, please don't take this as a policy statement or 
> anything, but I really, really HATE doing anything 
> KNOWING that it's wrong and will have to be done again.
> I mean, bone-deep HATE it.

*Why* is it wrong?

If having to redo something is an indication of a premature start, we 
shouldn't have posted the first 10,000 books, because we have to repost 
them all.

I have been architecting and programming for 20 years now and I cannot 
remember one single instance where I got it 100% right the first time.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jonathan_ingram at yahoo.com  Thu Oct 21 07:12:44 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Thu Oct 21 07:12:47 2004
Subject: [gutvol-d] Re: aspects of  a well-done e-book
In-Reply-To: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
Message-ID: <20041021141244.32137.qmail@web41701.mail.yahoo.com>


--- Tony Baechler <tb@baechler.net> wrote:
> So, does this mean that I now not only have to download the master xml 
> file, the css, and a set of conversion tools? 

If you wanted material in plaintext format which wasn't in the plaintext
edition provided already, yes.

> If it came to that, I would rather have the plain text and forget the 
> page numbers.  

If you want the mainstream plain text edition, then you download the mainstream
plain text edition. If you want to create your own edition with extra material,
then at the moment you're completely out of luck, as there's no way for you to
generate it. With a transformable master document, then you get the chance to
get your hands on this information in the format you want, for the first time.

-- 
Jon Ingram


From gbnewby at pglaf.org  Thu Oct 21 08:02:27 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Oct 21 08:02:28 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <4177901F.7010006@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com>
	<4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com>
	<4177901F.7010006@perathoner.de>
Message-ID: <20041021150227.GA17442@pglaf.org>

On Thu, Oct 21, 2004 at 12:31:59PM +0200, Marcello Perathoner wrote:
> Jim Tinsley wrote:
> 
> >I really do not mean to be disrepectful when I -- speaking for myself --
> >say that I'm not interested in spending my time making developers' jobs
> >easier. That's not what I'm here for. 
> 
> On one hand you wonder why we developers are not able to come up with a 
> solution; on the other hand you are not disposed to give one inch on 
> your developer-unfriendly position.
> 
> Your policy of not posting TEI files is at present the main roadblock.
> 
> It's like requesting the final release of the product before allowing a 
> beta test.
> 
> I have been doing other (hopefully useful) work and have not looked at 
> the TEI code for about a year now because I don't see a way to get it to 
> work with this `moratorium' in place.

I don't understand this limitation, so will rephrase what
we're waiting for.  It was among the first messages in this thread.

** What we want is an automatic means of generating canonical
** documents from an XML master.

The minimums are:
	XML --> HTML
and	XML --> text  (yes, it's ok to go via HTML)
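
As a bare-bones picture of that two-step chain (XML to HTML, then HTML to text), here is a Python sketch. It is not Marcello's or Jeroen's code; the TAG_MAP table, the function names, and the sample document are all assumptions made for illustration, and a real TEI file needs far more than this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumed mapping from a few TEI-style elements to HTML elements.
TAG_MAP = {"text": "body", "div": "div", "head": "h2", "p": "p"}


def xml_to_html(source: str) -> str:
    """Step 1: rename master elements to HTML ones (unknown tags -> span)."""
    root = ET.fromstring(source)
    for elem in root.iter():
        elem.tag = TAG_MAP.get(elem.tag, "span")
    return ET.tostring(root, encoding="unicode")


class _TextExtractor(HTMLParser):
    """Collects the text of each heading or paragraph as one block."""

    BLOCKS = {"h2", "p"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def handle_data(self, data):
        self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCKS:
            self.blocks.append(" ".join("".join(self._buf).split()))
            self._buf = []


def html_to_text(html: str) -> str:
    """Step 2: derive plain text from the generated HTML."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(block for block in parser.blocks if block)


master = "<text><div><head>Chapter 1</head><p>Call me Ishmael.</p></div></text>"
html = xml_to_html(master)
print(html)
print(html_to_text(html))
```

Going via HTML means the text edition inherits whatever decisions the HTML step made, which is convenient but also ties the two outputs together.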

Displaying XML directly in a browser is not a requirement, but
is nice to have.  There are a few subsidiary requirements,
like incorporating the header materials in a sanely marked
up way (trivial with teixlite.dtd, but not unambiguous).

Both Marcello and Jeroen have demonstrated techniques for
these, but neither is quite ready.  We do have several
XML documents online, we also have this list (also gutvol-p),
and there are a couple of demonstration pages.

Your claim that we need to start posting more stuff in
XML in order to achieve the ** goal above does not make sense
to me.  I do not see the logic.

I'm personally not strongly opposed to doing all sorts of
experimentation, and do NOT feel the urge to get it right
from the start.  I also am certain that there is not going
to be a one-size-fits-all technical solution for all of our
content.

I've asked both Marcello & Jeroen for updates & ideas in
the past months.  Maybe they did not get my messages.  My
belief is that there is a definite commitment at PG (including
DP) in creating XML masters.  I also believe that TEI-lite
encoding will work well for the majority of our content.

From my point of view, I'd rather see the gutvol-d group
of highly motivated & talented individuals focused on
solving the remaining challenges for the solutions that Marcello
and Jeroen have in place already.  Arguing about whether
we'll use XML is a waste of time: we will.  The challenges
before us are primarily technical, not policy.
  -- Greg

PS: No, I have not read every message in the threads over
the past few days.  If there's another solution somewhere,
I hope someone can point it out to me.

From brad at chenla.org  Wed Oct 20 00:25:18 2004
From: brad at chenla.org (Brad Collins)
Date: Thu Oct 21 08:03:32 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <4175B419.6030301@adelaide.edu.au> (Steve Thomas's message of
	"Wed, 20 Oct 2004 10:10:57 +0930")
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
Message-ID: <wkis95am1t.fsf@chenla.org>


Ack! This is a looong post.... and I'd promised myself I wouldn't get
dragged into this flame-fest :(

Steve Thomas <stephen.thomas@adelaide.edu.au> writes:

> OK, you've somewhat overstated the case, and I think by now we'd all
> agree that "8-bit" characters are important. But it is a shame that
> most of the geeks -- no offence, I count myself as one -- on this
> list, immediately skipped your main point to whine about the need for
> accents and foreign scripts. You guys can't seem to see the wood for
> the trees.
>

You're right, it's not just about accents, and it's not just about
consistently converting texts into different formats, though these
are both important issues in their own right.

This aside, it's you who have it backwards.  You keep talking about
the end-use of the text, which is opening up a file and reading it.

But it's far from being this simple.  XML is not meant for humans, it
is meant for software.  The XML will be converted to plain-text, HTML
and PDF for humans but mostly the XML will be used by applications
humans need to find texts and determine if they are worth reading in
the first place.

If you have a small library with 10,000 books in it, and the library
is shelved roughly by category you can easily get to know it just by
glancing over the spines.  

You could even have a rough list that breaks down the books by title,
author and category.  But if you have 100,000 or a 1,000,000 books in
your library your job of finding things becomes a lot more
difficult.  Keyword searching à la Google will never cut it.  Google
gives you a means of finding your car keys -- you know what you are
looking for and you ask it to look for places which it thinks might
have them.

Ask Google for a list of the works by Charles Dickens and you will get
a list of web pages it thinks has lists of Charles Dickens' works.
Ask the LOC Catalog the same question and it will return you a list
of items in their catalog which claim to have been written by Charles
Dickens.  But this list would be huge because of duplicate editions
of individual works.  A Christmas Carol alone turns up a couple
hundred items.

But what if you could ask this same question and it would return a
list of works (not web pages, or different editions) by Charles
Dickens organized in any way you want?  But this is not a good
example.  Can you ask for a list of all the characters in Great
Expectations?  Can you search for all contemporary obituaries of
Charles Dickens?
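
To make the works-versus-editions distinction concrete, here is a toy sketch in Python. The Edition record, the works_by function, and the sample catalog (including "Example Press" and the years) are all invented for illustration; a real catalog along FRBR lines would carry far more detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    author: str
    work: str        # title of the abstract work this edition embodies
    publisher: str
    year: int


def works_by(author, catalog):
    """Return {work title: [editions, oldest first]} for one author."""
    grouped = defaultdict(list)
    for edition in catalog:
        if edition.author == author:
            grouped[edition.work].append(edition)
    return {
        work: sorted(editions, key=lambda e: e.year)
        for work, editions in sorted(grouped.items())
    }


# Invented sample records, not real bibliographic data.
catalog = [
    Edition("Dickens, Charles", "A Christmas Carol", "Example Press", 1905),
    Edition("Dickens, Charles", "A Christmas Carol", "Example Press", 1880),
    Edition("Dickens, Charles", "Great Expectations", "Example Press", 1890),
    Edition("Twain, Mark", "Roughing It", "Example Press", 1900),
]

for work, editions in works_by("Dickens, Charles", catalog).items():
    print(work, [e.year for e in editions])
```

The answer to "works by Dickens" is two entries, not three, and the editions hang off each work instead of cluttering the list. That only works if the catalog records which work each edition belongs to.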

To build applications which answer these types of questions requires
more than a good cataloging system (though the FRBR approach goes a
long way in this regard); you need the table of contents of each work
(a TOC is a description of the structure of a text) and you need to have
a good index of what is in the text.  A back of book index is more
than just a matter of keywords, it is a form of semantic markup.  It
maps concepts, people, places and events to the text itself.

By combining the catalog metadata, the table of contents, and a good
quality index we have the basic tools for finding a book and
determining if it is worth reading.  We do this today in libraries but
it is a slow laborious task which requires you going to a catalog
looking for possible candidates, then retrieving each candidate and
scanning its TOC, preface, dust-jacket blurb or introduction or index
to determine if it's worth reading.

Traditional libraries are restricted by the physical medium that
books are published in.  But if you could pull all of these elements
together into a consistent framework, you would have a remarkable
resource which would transform an archive of books into a repository
of knowledge which is far more valuable and powerful than the sum of
its parts.

Semantic markup like TEI is needed not only for creating this kind of
library, but for creating services which will be needed as the amount
of information on the Net grows beyond what even monster search
services like Google can handle.

You talk about missing the forest for the trees but you forget that a
large part of the forest is a tangled root system deep underground
which the end user will never see.  Without that root system the
forest will die.  Structured, semantic markup and rich cataloging are
the root system of a library.

Anyone who says -- I don't care about the technical stuff just give
me what I want, doesn't understand that it's the technical stuff
which enables them to get the stuff they want.

Is this hard work?  Hell yes, and it should be.  Understanding,
evaluating and making sense of the world around us is the most
difficult thing humans do.  But saying that it's not worth doing
because it's hard is simply pathetic.

Look at works like the OED.  Would they have been created if their
attitude was, oh, it's too hard to build a dictionary based on
historical principles and I don't read the quotes much anyway, so just
give me a list of words.  Even if you don't read the quotes, the
unabridged OED and the unabridged Webster's, or Century Dictionary
were used to create brilliant concise works like Merriam-Webster's
Collegiate, or the Concise Oxford English Dictionary.  The OED and
the massive collection of research and material that was created to
write it are the root system for all dictionaries Oxford produces.

The more important question we should be asking is, what is the role
that PG and even DP should be playing in all of this.  It's
reasonable to ask that PG produce basic structured markup which shows
the basic structure and important elements in each text.  This is no
more difficult than HTML.

I believe that a new group needs to be established which will take
the simple TEI produced by PG and DP and then do more complex
cataloging, indexing and semantic markup, which will then be sent back
to PG to be released as new editions.

The TEI documentation (which is 1,400 pages -- not 14,000 as Bowerbird
exaggerated) recommends that markup be done in several passes.  Start
with simple structural markup (which, as I said, is about the same as HTML),
and then pass it on to another team which can do a second more detailed
pass, and so on until it's complete.  In this way you have a means of
creating texts which will be gradually woven into the library but
everyone will be using a consistent and interoperable format which can
be as simple or as complex as anyone requires.

If everything is in basic TEI-Lite, it will be easy for smaller
specialized groups to come along to do this additional markup.  A
group could form around a single author like Mark Twain, or around a
category of works like mathematics.  Then it will be easy for them to
donate back their work to PG, making the texts richer, rather than
their work becoming a separate branch of the texts which isn't
interoperable with the PG editions.

Plain-text, HTML and PDF can't do this because they are display
formats for human consumption.  Each has its uses and its
markets.  TEI is used for the root system which needs to be grown,
tended and cared for as the forest grows, even if 90% of people aren't
even aware it's there or don't understand that the applications they
depend on to find any particular tree in the forest and see if it's
the tree they need, wouldn't work without it.

To understand where I'm coming from on all of this, I should mention
(plug plug plug) that I've been working on just such a system
(http://www.chenla.org) which is divided into two parts -- the Burr
Metadata Framework (BMF) which is meant to be sort of a Wiki markup
for both integration of and export to TEI and MARC. 

The second part is the Librarium which uses BMF to integrate the
catalog with the works in a library.  We have recently put up our
first experimental record (an authority record for Charles Dickens) which
has been converted into html, and plain text.  Conversion to TEI and
MARC is coming.

Taken together, the system can be used to integrate library catalogs
with books and other texts and reference works all together with
authority data for persons and groups, geographic locations, events
and concepts.  We don't intend to be a service for the general public
but rather to create a catalog and content for use in other
libraries and web sites.

The site is hosted at ibiblio.  In the next few weeks we should have
enough documentation and another 30 or more records (which we call
Burrs) online to make a general announcement of the project.  I'm
still ironing out some bugs in the version control software and still
need to do a lot of work to complete a general introduction to the
design but it's all getting there.

At the moment we can convert BMF to Emacs-Wiki format which I then use
to publish to Blosxom which delivers basic HTML.  BMF was designed
with conversion to TEI in mind, though this might seem hard to
believe when you look at the BMF source the first time (there is a
link to a pretty-print version of the Dickens source).

So what's in it for PG?

The Librarium will be developing detailed authority and bibliographic
records for all PG material, and it's hoped that PG can eventually draw
on our catalog material for its own authority records and catalog.
This should help both with books already in PG's collection and
with copyright clearance for new books, and free up resources for
putting out more books, with better metadata.

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From joshua at hutchinson.net  Thu Oct 21 08:18:16 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 08:18:20 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <20041021151816.691302F975@ws6-3.us4.outblaze.com>


----- Original Message -----
From: Brad Collins <brad@chenla.org>

<snipped a wonderful essay>

Thank you, Brad.  That was probably the best essay I've read on WHY XML markup is such a good idea.  I've got a good 40-50 years left in this lifetime.  I want to get to the digital e-text nirvana by then.  XML seems to be the best first step.

Josh
From nwolcott2 at kreative.net  Thu Oct 21 08:42:46 2004
From: nwolcott2 at kreative.net (Norm Wolcott)
Date: Thu Oct 21 08:47:29 2004
Subject: [gutvol-d] POD update.
Message-ID: <000f01c4b784$b30609e0$4f9495ce@net>

This is probably of most interest to DP'ers, but this is the only discussion site. 

Recently several new self-publishing sites have emerged with zero up-front publishing costs. Two I have looked at are www.lulu.com and www.cafepress.com. There are 2 Jules Verne books on Cafe. Costs are $7 + .03/page for Cafe and $4 + .02/page for Lulu. Print resolution and gray-scale images are not covered on the web sites.

These prices are comparable to xerox costs. I checked out the shipping: it was $4 flat on Cafe, and $2.46 media mail from Lulu for an 8 1/2 by 11, 300-page book.

Journey to the Interior of the Earth was 300 pages, so it might include some illustrations. No preview was available. It might be on Amazon.com; the publisher is given as Boulder Pier Press.

Check back with me if you get any more info on these, or on better no-cost-upfront services.


nwolcott2@post.harvard.edu  Friar Wolcott, Gutenberg Abbey, Sherwood Forrest
From ke at gnu.franken.de  Thu Oct 21 08:21:54 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Thu Oct 21 09:31:50 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <20041021150227.GA17442@pglaf.org> (Greg Newby's message of "Thu, 
	21 Oct 2004 08:02:27 -0700")
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com>
	<4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com>
	<4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org>
Message-ID: <shis94nlkd.fsf@tux.gnu.franken.de>

Greg Newby <gbnewby@pglaf.org> writes:

> Your claim that we need to start posting more stuff in
> XML in order to achieve the ** goal above does not make sense
> to me.  I do not see the logic.

I am very interested in XML files.  Please post them even if they look
useless to you.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From Bowerbird at aol.com  Thu Oct 21 09:33:37 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 09:33:49 2004
Subject: [gutvol-d] press releases and puke on a reporter's shoes
Message-ID: <1e2.2cbee04e.2ea93ee1@aol.com>

marcello said:
>   I think everybody who thinks that 
>   you'll ever be _serious_ is really stupid.

and i think any reasonable outsider reading these threads
will be able to clearly see that i contribute a lot of thought,
while all you do here is come on and badger me continuously.

-bowerbird
From joshua at hutchinson.net  Thu Oct 21 09:40:44 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 09:40:50 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <20041021164044.47AE29E9A9@ws6-2.us4.outblaze.com>

Honestly, I see an easy compromise here.

As long as a conformant TEXT file and a conformant HTML file show up with the XML file, I say post all three.  Granted, right now we don't have a method for the WW'ers to verify the XML file is valid, so if you want to put a disclaimer to that effect in the file ... fine.

But this way, the XML folks can get the catalog of texts they want and the process will be able to make incremental steps forward.

Josh


----- Original Message -----
From: Karl Eichwalder <ke@gnu.franken.de>
To: gutvol-d@lists.pglaf.org
Subject: [gutvol-d] Re: barriers to XML posting
Date: Thu, 21 Oct 2004 17:21:54 +0200

> 
> Greg Newby <gbnewby@pglaf.org> writes:
> 
> > Your claim that we need to start posting more stuff in
> > XML in order to achieve the ** goal above does not make sense
> > to me.  I do not see the logic.
> 
> I am very interested in XML files.  Please post them even if they look
> useless to you.
> 
> -- 
>                                                          |      ,__o
>                                                          |    _-\_<,
> http://www.gnu.franken.de/ke/                            |   (*)/'(*)

From joshua at hutchinson.net  Thu Oct 21 09:46:05 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 09:46:10 2004
Subject: [gutvol-d] press releases and puke on a reporter's shoes
Message-ID: <20041021164605.1A47FEDE98@ws6-1.us4.outblaze.com>


----- Original Message -----
From: Bowerbird@aol.com
> 
> marcello said:
> >   I think everybody who thinks that 
> >   you'll ever be _serious_ is really stupid.
> 
> and i think any reasonable outsider reading these threads
> will be able to clearly see that i contribute a lot of thought,
> while all you do here is come on and badger me continuously.
> 

Yep ... That's why *so* many people have been jumping to your defense lately!  You're the poor little victim of us big mean bullies!

Josh
From holden.mcgroin at dsl.pipex.com  Thu Oct 21 09:49:00 2004
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Thu Oct 21 09:48:17 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
References: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>
	<5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
Message-ID: <4177E87C.90101@dsl.pipex.com>

> So, does this mean that I now not only have to download the master xml 
> file, the css, and a set of conversion tools?  You must be kidding, 
> right?  If it came to that, I would rather have the plain text and 
> forget the page numbers.  It is already inconvenient to use "lynx -dump 
> -nolist filename.htm."  Why in the world would I want to run it through 
> a conversion tool and still have to do that anyway?  OK, so a plain text 
> file can be output directly from the xml.  I still have to go through at 
> least one extra conversion step that I wouldn't have to otherwise.

Why? The whole idea behind PG moving to XML is not to complicate things, 
it's to give more flexibility while retaining simplicity. How about this 
situation:

PG files are, by default, coded in XML. All other formats are then 
automatically generated from that XML format. There would still be TXT 
versions, there would still be HTML versions. Getting those would be no 
harder than it is for you to retrieve a TXT file now. All this 
conversion stuff should be done by the PG back-end, not the end-user 
(why make a human do a machine's job?).

That way, instead of manually preparing every different format like what 
goes on at PG for the most part now, we could make every format 
available with only the effort of creating a super-format from which 
every other format could be derived and a set of tools which could 
automatically generate other formats from the super-format. If someone 
wants the entire PG library to be available in some obscure format, then 
it could be if they can just write a converter which outputs that format.
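
A minimal sketch of that back-end idea, in Python: one master file, plus a table of converter functions that each derive one output format from it. The miniature markup, the function names and the CONVERTERS table are all made up for illustration; this is not PG's actual toolchain, and real TEI needs far more handling than the two element types covered here.

```python
import html
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical miniature master file; a real PG master would be far richer.
MASTER = """<text><body><div>
<head>A Short Title</head>
<p>First paragraph,
wrapped over two lines.</p>
<p>Second paragraph with Tom &amp; Jerry.</p>
</div></body></text>"""


def _blocks(root):
    """Yield (tag, whitespace-normalized text) for each head and p, in order."""
    for el in root.iter():
        if el.tag in ("head", "p"):
            yield el.tag, " ".join("".join(el.itertext()).split())


def to_text(root, width=72):
    """Derive a plain-text edition: upper-case headings, re-wrapped paragraphs."""
    parts = []
    for tag, text in _blocks(root):
        parts.append(text.upper() if tag == "head" else textwrap.fill(text, width))
    return "\n\n".join(parts) + "\n"


def to_html(root):
    """Derive a bare-bones HTML edition from the same master."""
    tags = {"head": "h1", "p": "p"}
    body = "\n".join(
        f"<{tags[tag]}>{html.escape(text)}</{tags[tag]}>" for tag, text in _blocks(root)
    )
    return f"<html><body>\n{body}\n</body></html>\n"


# One entry per derived format (hypothetical registry).
CONVERTERS = {"txt": to_text, "html": to_html}


def build_all(master_xml):
    """Parse the master once and run every registered converter on it."""
    root = ET.fromstring(master_xml)
    return {ext: convert(root) for ext, convert in CONVERTERS.items()}
```

Under this arrangement, supporting one more format would mean writing one more function and adding it to the table; the master file and the other converters stay untouched.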

Cheers,
Holden
From marcello at perathoner.de  Thu Oct 21 09:54:35 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 09:54:43 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041021150227.GA17442@pglaf.org>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de>
	<20041020173528.GB3366@panix.com>	<4176BDEF.7050008@perathoner.de>
	<20041020205934.GA22445@panix.com>	<4177901F.7010006@perathoner.de>
	<20041021150227.GA17442@pglaf.org>
Message-ID: <4177E9CB.7080200@perathoner.de>

Greg Newby wrote:

> I don't understand this limitation, so will rephrase what
> we're waiting for.  It was among the first messages in this thread.
> 
> ** What we want is an automatic means of generating canonical
> ** documents from an XML master.
> 
> The minimums are:
> 	XML --> HTML
> and	XML --> text  (yes, it's ok to go via HTML)

You already got that. There are 2 different ways to do this, both of 
them mature enough for beta testing.

The roadblock is that a 100% correct and complete solution was requested 
by Jim before he considered starting to post TEI texts.

Now, we don't have a toolchain for the whitewashers that is equivalent 
to the one already in place for TXT and HTML files.

That's why I volunteered to act as "interim" whitewasher: to manually go 
thru the steps needed to post a TEI file and derivative formats, to 
understand how this toolchain needs to be built. I will only take a few 
texts (maybe a dozen) from a few selected sources.


Some of the objections raised by Jim will not go away real soon. He says 
he cannot skim thru a TEI file like thru a TXT or HTML file. But there 
are at present no readers that accept TEI as a native file format. If we 
have to build that first (Jon Noring is trying), we will likely never 
start posting.

I feel Jim is raising artificial objections he knows we cannot overcome. 
If he doesn't want to learn TEI and he doesn't feel like proofing a TEI 
text in emacs, fine. But then, he should step aside and let other people 
do this work.


Now for another thing.

Jim fears that we will end up with a lot of files marked up in 
differing TEI dialects. OTOH, the moratorium has actively encouraged this.

People being eager to try TEI and there being no official place to post 
TEI files, everybody has posted the files they have marked up in a 
different place. I have been working on my dialect, Jeroen on his and DP 
is cooking up another one. There is no central "clearing house" where we 
can see the other guys' work. I don't say it would be impossible for me 
to obtain a glimpse of the TEI texts the folks at DP are working on, it 
would just be much easier if I could get them from the archive.

At this point we need to set a signal that the TEI era has started.

We don't need more discussion about whether TEI is the right language, I 
think we are all agreed on that. pgxml.org is dead and ZML is good for 
laughs.

What we need now is for all who have been doing TEI to compare notes and 
get to an agreement on which dialect to use. That can best be reached if 
we all post samples of our work and try to run the other guys' markup 
thru our XSL etc. etc.
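
One cheap way to start comparing notes, before any XSL is involved, is to list which elements each sample actually uses. The following is only a sketch in Python with hypothetical function names; it looks at element names alone and ignores attributes and nesting, so it shows where two dialects differ, not whether either one is valid TEI.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def element_inventory(xml_text):
    """Count how often each element name occurs in one document."""
    return Counter(el.tag for el in ET.fromstring(xml_text).iter())


def dialect_diff(xml_a, xml_b):
    """Return (elements only in A, elements only in B), each sorted by name."""
    inv_a = element_inventory(xml_a)
    inv_b = element_inventory(xml_b)
    return sorted(set(inv_a) - set(inv_b)), sorted(set(inv_b) - set(inv_a))
```

Running this over two posted samples would show, for instance, that one dialect uses numbered divisions (div1) where the other uses plain div.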


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Oct 21 10:35:55 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 10:36:11 2004
Subject: [gutvol-d] re: poor little victim
Message-ID: <f6.4368ae82.2ea94d7b@aol.com>

joshua said:
>   You're the poor little victim of us big mean bullies!

i'm no "poor little victim".

i am the one toying with all of you.

i wasn't "complaining" about marcello's badgering
-- i find it amusing he shoots himself in the foot, and 
the last thing i will do is go whining to a moderator --
i was simply remarking on it as a mere observation,
confident that an objective outsider would share it...

your little circle of friends deludes itself
if y'all think that the lurkers can't see that...

-bowerbird
From Bowerbird at aol.com  Thu Oct 21 11:00:02 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 11:00:18 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <1b8.4414d73.2ea95322@aol.com>

the hacker said:
>   Every bridge has its troll.

you're now a full-fledged member of the group, david.  _welcome_...

-bowerbird
From Gutenberg9443 at aol.com  Thu Oct 21 12:35:48 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Oct 21 12:36:14 2004
Subject: [gutvol-d] jeroen's even-handed analysis
Message-ID: <104.53658a2d.2ea96994@aol.com>

 
In a message dated 10/20/2004 6:58:19 PM Mountain Standard Time,  
joshua@hutchinson.net writes:

Anyways,  no one is trying to chase you away.  Please come back out to  play!


Thank you.
 
I continue to think that PG is one of the most important weapons of  freedom, 
and if that is naive, so be it.
 
Anne
From Gutenberg9443 at aol.com  Thu Oct 21 12:36:47 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Oct 21 12:36:59 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <d9.1768f65b.2ea969cf@aol.com>

 
In a message dated 10/20/2004 7:12:28 PM Mountain Standard Time,  
hacker@gnu-designs.com writes:

Every  bridge has its troll.


So where are the billy goats gruff? THEY know how to deal with  trolls.
 
Anne
From joshua at hutchinson.net  Thu Oct 21 12:38:31 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 12:38:41 2004
Subject: [gutvol-d] A layman's critique of PGTEI
Message-ID: <20041021193831.7050C109723@ws6-4.us4.outblaze.com>

First of all, please take this as it is intended, namely my experiences while attempting to convert a short and sweet text to the PGTEI format found here (http://www.gutenberg.org/tei/).  It is my hope that this will lead to some improvements in the process.

I also apologize for the large size of this message.  I felt it was necessary to get all the information in place in the email so no one was making assumptions about info that was alluded to but not present.

Ok, I decided to start at the beginning ... namely our first e-text, the US Declaration of Independence (original: http://www.gutenberg.org/etext/1).  The upside is that it is short and very simple.

The first thing I did was grab the standard PGTEI header from the documentation (http://www.gutenberg.org/tei/doc/pg-guide.html#toc_12).  This spaghetti is a lot easier than it looks.  Further, I definitely see how this could be easily generated by filling out a web form (somewhat similar to what I understand is done right now when DP submits a text to the whitewashers).  It contains information like the original creation date, who wrote it, Library of Congress subject classifications, who converted it to TEI (me), etc.
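
To illustrate the web-form idea, here is a rough sketch in Python of turning a few form fields into part of the header. The element names follow the attached file; the function name and the form keys are hypothetical, and only a small slice of the real PGTEI header is covered.

```python
import xml.etree.ElementTree as ET


def build_header(form):
    """Build a partial teiHeader from a dict of (hypothetical) form fields."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")

    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = form["title"]
    ET.SubElement(title_stmt, "author").text = form.get("author", "")

    publication = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(publication, "publisher").text = "Project Gutenberg"
    idno = ET.SubElement(publication, "idno", {"type": "etext-nr"})
    idno.text = str(form["etext_nr"])
    return header


def header_as_string(form):
    """Serialize the header; special characters are escaped by the library."""
    return ET.tostring(build_header(form), encoding="unicode")
```

Because the serializer does the escaping, whoever fills in the form never has to think about markup characters in titles or names.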

(The PGTEI encoded file is attached at the end of this message for reference purposes.)

The nice thing here is that the PG header and footer information is auto-generated by the <front></front> and <back></back> sections.

The Declaration file in PG has a small foreword from Michael.  I felt that should not be marked as if it were part of the main document.  Luckily, TEI-Lite documentation provided a solution.  (http://www.tei-c.org/Lite/teiu5_en.html#h52)  You can mark up some text in the front matter (the <front></front> section) with a type of foreword.

EXAMPLE:

<div1 type="foreword">
  <p>This is an example foreword section.</p>
</div1>

Note: Paragraphs have to be surrounded by <p></p> markup, just like HTML.  This shouldn't be difficult for anyone trying to tackle this... It certainly felt natural enough for me.

Next, I added the actual Declaration text in the <body></body> section.  Again, all the paragraphs needed to be wrapped in <p></p>.  I also ran into one problem with the & character.  Instead of looking up the escape code, I got lazy and just converted it to 'and'.  This would also be required for an HTML edition, so I don't consider that a big deal.
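
Both chores (wrapping paragraphs and escaping &) can be scripted. A minimal sketch in Python, assuming the plain text separates paragraphs with blank lines; the function name is made up, and the escape call is the standard library's, which turns & into &amp; and also handles < and >.

```python
import re
from xml.sax.saxutils import escape


def paragraphs_to_tei(plain):
    """Wrap blank-line-separated paragraphs in <p></p>, escaping &, < and >."""
    chunks = re.split(r"\n\s*\n", plain.strip())
    return "\n\n".join(
        "<p>" + escape(chunk.strip()) + "</p>" for chunk in chunks if chunk.strip()
    )


print(paragraphs_to_tei("Tom & Jerry"))  # prints <p>Tom &amp; Jerry</p>
```

With this, the original & can stay in the text instead of being rewritten as 'and'.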

That was it on creating the PGTEI markup.  Total time, even with looking things up, maybe 20 minutes of my time.  And, no, I haven't done this before, so this is coming into it raw.

Next step was to use the validator on the page (http://www.gutenberg.org/tei/services/tei-online).  It complained about one typo on my part and the & I mentioned before.  The errors are NOT very friendly, but anyone familiar with the W3C validator should be able to puzzle it out.
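
A quick local check can catch this kind of mistake before the file goes to the online validator. The sketch below, in Python, only tests well-formedness (unescaped &, mismatched tags); it does not validate against pgtei.dtd, which the standard library parser cannot do, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text parses, else the parser message plus the bad line."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position
        lines = xml_text.splitlines()
        # The reported line can lie past the end of the text (e.g. missing close tag).
        offending = lines[line - 1].strip() if 0 < line <= len(lines) else ""
        return f"{err}\n  {offending}"
    return None
```

A bare & inside a paragraph, or a mistyped closing tag, comes back as a message with the line number and the offending line, which is a little easier to read than a raw validator report.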

Next, I had it create a text file.  This went very well.  The resulting file looked pretty good to me.  I didn't run it through GutCheck, but nothing jumped out at me as being problematic.  Granted, this was a very simple text, so there are probably limitations in this conversion that I just haven't run into yet.

Lastly, I had it create a HTML file.  There are two problems I encountered here.  One, cosmetic and fixable by changing the CSS, isn't that big a deal.  The second is more of a deal breaker, but still fixable, I'd imagine.

1) The CSS specifies more of a printed page style than a web based style.  For instance, all the paragraphs have no blank line between them and have a first line indent, just like a printed page.  However, to me, this was a bit jarring, since it isn't the format I'm used to on the web.  Again, this is mostly cosmetic and easily changeable.

2) The resulting HTML, while rendering fine in the browsers I have here, is NOT valid HTML.  The file specifies HTML 4.01 strict, but there were 13 warnings/errors when I used W3C's validator on it.  I didn't check real closely, but it looked like some of them were perfectly valid under HTML 4.01 transitional, and the others are fixable.  The XSLT conversion process can probably be tweaked by someone knowledgeable in that area to eliminate the validation errors.

***

Well, that's my quick personal experiment.  My question for the experts: Can the HTML validation problem be easily fixed?  I'd also like to request a change to the CSS used, but that is a personal preference and something to really worry about after the show-stoppers are fixed.

My next experiment will choose a text with some other stuff like poetry in it, so that I can see what more complexity does to the whole process.

Josh

****

Attached Declaration PGTEI file:


<?xml version="1.0" encoding="iso-8859-1" ?>

<!DOCTYPE TEI.2 SYSTEM "pgtei.dtd">

<TEI.2 lang="en-us">
<teiHeader>

  <fileDesc>

    <titleStmt>
      <title>The Declaration of Independence</title>
      <author></author>
    </titleStmt>

    <editionStmt>
      <edition n="12">Edition 12 
        <date value="2004-10">October 2004</date>
      </edition>
    </editionStmt>

    <publicationStmt>
      <publisher>Project Gutenberg</publisher>
      <pubPlace><xref url="www.gutenberg.org">www.gutenberg.org</xref></pubPlace>
      <date value="2004-10">October 2004</date>
      <idno type='etext-nr'>1</idno>
      <idno type='etext-file'>when</idno>
      <availability status='free'>
        <p>This eBook is for the use of anyone anywhere at no cost and with
        almost no restrictions whatsoever. You may copy it, give it away or
        re-use it under the terms of the Project Gutenberg License included
        online at <xref url="www.gutenberg.org/license">www.gutenberg.org/license</xref></p>
      </availability>
    </publicationStmt>

    <sourceDesc>
      <bibl>
        unknown
      </bibl>
    </sourceDesc>

  </fileDesc>

  <encodingDesc>
    <classDecl>
      <taxonomy id="lc">
        <bibl>
          <title>Library of Congress Classification</title>
        </bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>

  <profileDesc>
    <langUsage>
      <language id="en-us">American</language>
    </langUsage>

    <textClass>
      <classCode scheme="lc">
        JK: Political science: Political inst. and pub. Admin.: United States
      </classCode>
      <keywords>
        <list>
          <item>Government</item>
          <item>United States</item>
        </list>
      </keywords>
    </textClass>

  </profileDesc>

  <revisionDesc>
    <change>
      <date value="1971-12">December, 1971</date>
      <respStmt>
        <name>Michael S. Hart</name>
      </respStmt>
      <item>Project Gutenberg Edition 12</item>
    </change>
    <change>
      <date value="2004-10">October 2004</date>
      <respStmt>
        <name>Joshua Hutchinson</name>
      </respStmt>
      <item>TEI markup</item>
    </change>
  </revisionDesc>
</teiHeader>

<text>
  <front>
    <divGen type="titlepage" />
    <divGen type="pgheader" rend="newpage" />
    <divGen type="toc"      rend="newdoublepage" />

<div1 type="foreword">
<p>The United States Declaration of Independence was the first Etext 
released by Project Gutenberg, early in 1971.  The title was stored
in an emailed instruction set which required a tape or diskpack be
hand mounted for retrieval.  The diskpack was the size of a large
cake in a cake carrier, cost $1500, and contained 5 megabytes, of
which this file took 1-2%.  Two tape backups were kept plus one on
paper tape.  The 10,000 files we hope to have online by the end of
2001 should take about 1-2% of a comparably priced drive in 2001.</p>

<p>This file was never copyrighted, Sharewared, etc., and is thus for
all to use and copy in any manner they choose.  Please feel free to
make your own edition using this as a base.</p>

<p>In my research for creating this transcription of our first Etext,
I have come across enough discrepancies [even within that official
documentation provided by the United States] to conclude that even
"facsimiles" of the Declaration of Independence are NOT going to be
all the same as the original, nor of other "facsimiles."  There is
a plethora of variations in capitalization, punctuation, and, even
where names appear on the documents [which names I have left out].</p>

<p>The resulting document has several misspellings removed from those
parchment "facsimiles" I used back in 1971, and which I should not
be able to easily find at this time, including "Brittain."</p>
</div1>

  </front>

  <body>

<div>
  <head>The Declaration of Independence of The United States of America</head>

<p>When in the Course of human events, it becomes necessary for
one people to dissolve the political bands which have connected
them with another, and to assume, among the Powers of the earth,
the separate and equal station to which the Laws of Nature and
of Nature's God entitle them, a decent respect to the opinions
of mankind requires that they should declare the causes which
impel them to the separation.</p>

<p>We hold these truths to be self-evident, that all men are created equal,
that they are endowed by their Creator with certain unalienable Rights,
that among these are Life, Liberty, and the pursuit of Happiness.
That to secure these rights, Governments are instituted among Men,
deriving their just powers from the consent of the governed,
That whenever any Form of Government becomes destructive of these ends,
it is the Right of the People to alter or to abolish it, and to institute
new Government, laying its foundation on such principles and organizing
its powers in such form, as to them shall seem most likely to effect
their Safety and Happiness.  Prudence, indeed, will dictate that Governments
long established should not be changed for light and transient causes;
and accordingly all experience hath shown, that mankind are more disposed
to suffer, while evils are sufferable, than to right themselves by abolishing
the forms to which they are accustomed.  But when a long train of abuses and
usurpations, pursuing invariably the same Object evinces a design to reduce
them under absolute Despotism, it is their right, it is their duty, to throw
off such Government, and to provide new Guards for their future security.
--Such has been the patient sufferance of these Colonies; and such is now
the necessity which constrains them to alter their former Systems of Government.
The history of the present King of Great Britain is a history of repeated
injuries and usurpations, all having in direct object the establishment
of an absolute Tyranny over these States.  To prove this, let Facts
be submitted to a candid world.</p>

<p>He has refused his Assent to Laws, the most wholesome and necessary
for the public good.</p>

<p>He has forbidden his Governors to pass Laws of immediate
and pressing importance, unless suspended in their operation
till his Assent should be obtained; and when so suspended,
he has utterly neglected to attend to them.</p>

<p>He has refused to pass other Laws for the accommodation of
large districts of people, unless those people would relinquish
the right of Representation in the Legislature, a right
inestimable to them and formidable to tyrants only.</p>

<p>He has called together legislative bodies at places unusual,
uncomfortable, and distant from the depository of their
Public Records, for the sole purpose of fatiguing them
into compliance with his measures.</p>

<p>He has dissolved Representative Houses repeatedly, for opposing
with manly firmness his invasions on the rights of the people.</p>

<p>He has refused for a long time, after such dissolutions,
to cause others to be elected; whereby the Legislative Powers,
incapable of Annihilation, have returned to the People at large
for their exercise; the State remaining in the mean time exposed
to all the dangers of invasion from without, and convulsions within.</p>

<p>He has endeavoured to prevent the population of these States;
for that purpose obstructing the Laws of Naturalization of Foreigners;
refusing to pass others to encourage their migration hither,
and raising the conditions of new Appropriations of Lands.</p>

<p>He has obstructed the Administration of Justice, by refusing his Assent
to Laws for establishing Judiciary Powers.</p>

<p>He has made judges dependent on his Will alone, for the tenure
of their offices, and the amount and payment of their salaries.</p>

<p>He has erected a multitude of New Offices, and sent hither swarms of
Officers to harass our People, and eat out their substance.</p>

<p>He has kept among us, in times of peace, Standing Armies
without the Consent of our legislatures.</p>

<p>He has affected to render the Military independent of
and superior to the Civil Power.</p>

<p>He has combined with others to subject us to a jurisdiction
foreign to our constitution, and unacknowledged by our laws;
giving his Assent to their Acts of pretended legislation:</p>

<p>For quartering large bodies of armed troops among us:</p>

<p>For protecting them, by a mock Trial, from Punishment for any Murders
which they should commit on the Inhabitants of these States:</p>

<p>For cutting off our Trade with all parts of the world:</p>

<p>For imposing taxes on us without our Consent:</p>

<p>For depriving us, in many cases, of the benefits of Trial by Jury:</p>

<p>For transporting us beyond Seas to be tried for pretended offences:</p>

<p>For abolishing the free System of English Laws in a neighbouring
Province, establishing therein an Arbitrary government,
and enlarging its Boundaries so as to render it at once
an example and fit instrument for introducing the same
absolute rule into these Colonies:</p>

<p>For taking away our Charters, abolishing our most valuable Laws,
and altering fundamentally the Forms of our Governments:</p>

<p>For suspending our own Legislatures, and declaring themselves
invested with Power to legislate for us in all cases whatsoever.</p>

<p>He has abdicated Government here, by declaring us out of his Protection
and waging War against us.</p>

<p>He has plundered our seas, ravaged our Coasts, burnt our towns,
and destroyed the lives of our people.</p>

<p>He is at this time transporting large armies of foreign mercenaries
to compleat the works of death, desolation and tyranny, already begun
with circumstances of Cruelty and perfidy scarcely paralleled in the
most barbarous ages, and totally unworthy of the Head of a civilized nation.</p>

<p>He has constrained our fellow Citizens taken Captive on the high Seas
to bear Arms against their Country, to become the executioners of
their friends and Brethren, or to fall themselves by their Hands.</p>

<p>He has excited domestic insurrections amongst us, and has
endeavoured to bring on the inhabitants of our frontiers,
the merciless Indian Savages, whose known rule of warfare,
is an undistinguished destruction of all ages, sexes and conditions.</p>

<p>In every stage of these Oppressions We have Petitioned for Redress
in the most humble terms:  Our repeated Petitions have been answered
only by repeated injury.  A Prince, whose character is thus marked
by every act which may define a Tyrant, is unfit to be the ruler
of a free People.</p>

<p>Nor have We been wanting in attention to our British brethren.
We have warned them from time to time of attempts by their
legislature to extend an unwarrantable jurisdiction over us.
We have reminded them of the circumstances of our emigration and
settlement here.  We have appealed to their native justice
and magnanimity, and we have conjured them by the ties of our
common kindred to disavow these usurpations, which would inevitably
interrupt our connections and correspondence.  They too have been
deaf to the voice of justice and of consanguinity.  We must, therefore,
acquiesce in the necessity, which denounces our Separation, and hold them,
as we hold the rest of mankind, Enemies in War, in Peace Friends.</p>

<p>We, therefore, the Representatives of the United States of America,
in General Congress, Assembled, appealing to the Supreme Judge of
the world for the rectitude of our intentions, do, in the Name,
and by the Authority of the good People of these Colonies,
solemnly publish and declare, That these United Colonies are,
and of Right ought to be Free and Independent States;
that they are Absolved from all Allegiance to the British Crown,
and that all political connection between them and the State
of Great Britain, is and ought to be totally dissolved;
and that as Free and Independent States, they have full Power to
levy War, conclude Peace, contract Alliances, establish Commerce,
and to do all other Acts and Things which Independent States may
of right do.  And for the support of this Declaration, with a firm
reliance on the Protection of Divine Providence, we mutually pledge
to each other our Lives, our Fortunes and our sacred Honor.</p>

</div>

  </body>

  <back rend="newdoublepage">
    <divGen type="footnotes" />
    <divGen type="colophon" rend="newpage" />
    <divGen type="pgfooter" rend="newpage" />
  </back>

</text>
</TEI.2>
From Bowerbird at aol.com  Thu Oct 21 13:00:46 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 13:01:00 2004
Subject: [gutvol-d] A layman's critique of PGTEI
Message-ID: <c4.185a07b4.2ea96f6e@aol.com>

joshua said:
>   My next experiment will choose a text 
>   with some other stuff like poetry in it,
>   so that I can see what more complexity 
>   does to the whole process.

when you have worked your way up to it,
i suggest you do the test-suite i created.

when you can mark _that_ up in a way that
makes everyone here happy, you'll have gone
a long way towards meeting jim's criteria...

and when you can convert it to all formats,
you will have gone _all_ the way...

(of course, that still won't get the e-texts
actually marked up, but that's minor, right?)

-bowerbird
From hacker at gnu-designs.com  Thu Oct 21 13:04:10 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Oct 21 13:05:27 2004
Subject: [gutvol-d] A layman's critique of PGTEI
In-Reply-To: <20041021193831.7050C109723@ws6-4.us4.outblaze.com>
References: <20041021193831.7050C109723@ws6-4.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410211553290.2109@angst.gnu-designs.com>


> First of all, please take this as it is intended, namely my experiences 
> while attempting to convert a short and sweet text to the PGTEI format 
> found here (http://www.gutenberg.org/tei/).  It is my hope that this 
> will lead to some improvements in the process.

 	I went through every link there, and could not find a reference to 
download any sort of tool or set of tools that purports to convert the TEI 
format to other formats. Where did you find the converter to use locally?

 	The docs link to an "online" converter, which no longer exists at 
that link, apparently. This one is dead:

 	http://www.gutenberg.net/testing/gnutenberg/tei-online.php

 	This one (linked from the front page) is not:

 	http://www.gutenberg.org/tei/services/tei-online

 	I don't see where the code to this online converter, or any 
converter that works with TEI for that matter, is documented, referenced, 
or linked to. Did I miss the link on one of the other pages? It looks like 
we're all back to square one... reinventing all of our own wheels from 
scratch.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From jeroen at bohol.ph  Thu Oct 21 13:18:03 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Thu Oct 21 13:17:43 2004
Subject: [gutvol-d] Posting TEI
In-Reply-To: <20041020213206.GA10983@panix.com>
References: <20041020205934.GA22445@panix.com><20041020211431.57061.qmail@web41709.mail.yahoo.com>
	<20041020213206.GA10983@panix.com>
Message-ID: <4178197B.7060804@bohol.ph>

Jim Tinsley wrote:

> If I may nit-pick, I think it more correct to say that it
>
>isn't _always_ true. That is, it is not true when there 
>exists a CSS that works with the XML.
>
>Jeroen provided XML like this, which I thought was very
>good indeed. For any of you who haven't seen it, please
>point your browsers to http://www.gutenberg.org/dirs/1/1/3/3/11335/11335-x/11335-x.xml
>which is an absolute pleasure to read. (Well, if you're
>a geek, that is, and if you ain't, whatcha doin. here?? :-)
>
>I said before, and I say again, that where such an XML is 
>provided, HTML is probably redundant. ("Probably" because
>a significant use of HTML is as input to PDA readers like,
>say, Mobipocket, and I'm not sure if they would swallow
>this XML without requiring a Heimlich.)
>
>I know of no CSS for Marcello's PGTEI. Perhaps one could
>be crafted for it.
>
>  
>

One additional note: before the XML of this text is rendered in your 
browser, it is fed through an XSLT stylesheet, which turns it into HTML, 
and then, to that HTML, CSS is applied. The entire process is done for 
you by your browser. The XML follows TEILite, and validates on a 
validating parser; the HTML should validate on a validating HTML parser.
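
A minimal sketch of the first step in that chain, i.e. how a browser learns
which XSLT sheet to apply. It assumes the XML file names its stylesheet in an
xml-stylesheet processing instruction (the usual mechanism, though not
confirmed here for this particular file). The sample document, the file name
"tei2html.xsl" and the function name are made up for illustration.

```python
import re
from xml.dom import Node, minidom

# Made-up sample; "tei2html.xsl" is a placeholder stylesheet name.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="tei2html.xsl"?>
<TEI.2><text><body><p>Hello</p></body></text></TEI.2>"""


def stylesheet_hrefs(xml_text):
    """Return the href of every top-level xml-stylesheet instruction."""
    doc = minidom.parseString(xml_text)
    hrefs = []
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction's data looks like attributes but is plain text,
            # so pull out href by hand (double-quoted values only).
            match = re.search(r'href\s*=\s*"([^"]+)"', node.data)
            if match:
                hrefs.append(match.group(1))
    return hrefs


print(stylesheet_hrefs(SAMPLE))
```

The browser then runs the named sheet over the XML and applies CSS to the
resulting HTML; neither of those later steps is shown here.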

Marcello's PGTEI is close enough to TEI that this will probably give 
very decent results on his files too. He basically added a few small 
extensions to TEILite, which are "documented" in his well-commented DTD 
or XSLT sheets. (But that is stuff for specialists, really.)

Jeroen.

From jeroen at bohol.ph  Thu Oct 21 13:33:34 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Thu Oct 21 13:33:10 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <4177E9CB.7080200@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de><20041020173528.GB3366@panix.com>	<4176BDEF.7050008@perathoner.de><20041020205934.GA22445@panix.com>	<4177901F.7010006@perathoner.de><20041021150227.GA17442@pglaf.org>
	<4177E9CB.7080200@perathoner.de>
Message-ID: <41781D1E.5010700@bohol.ph>


>
> People being eager to try TEI and there being no official place to 
> post TEI files, everybody has posted the files they have marked up in 
> a different place. I have been working on my dialect, Jeroen on his 
> and DP is cooking up another one. There is no central "clearing house" 
> where we can see the other guys work. I don't say it would be 
> impossible for me to obtain a glimpse of the TEI texts the folks at DP 
> are working on, it would just be much easier if I could get them from 
> the archive.


Personally, I try to stick as closely to TEILite as possible. I can add 
extensions to it, but can then easily produce XSLT to pull out those 
extensions before posting. I think a few of Marcello's extensions for 
PGTEI are not needed, as elements exist to encode the same information, 
or alternative mechanisms can be devised within TEILite -- but even if 
you stick to pure TEILite, you will need to agree on conventions. For 
example, I leave in quotation marks (I have numerous old works that 
deal with these in a very irregular way, so turning them into <q> and </q> 
would be difficult). Marcello leaves them out. We can fix an XSLT to 
re-supply them, and even an XSLT to supply them only if they are removed 
(given we agree on a standard way of documenting this fact) -- and that 
is what you need if you're working on a certain project using TEI -- a 
gentle introduction and some guidelines. A few very nice ones are on the 
Net.
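
A rough sketch of the "re-supply the quotation marks" idea, written in Python
rather than XSLT. The element name q comes from the discussion above;
everything else is an assumption made for illustration: the function name,
the sample sentence, and the quotes_removed flag, which stands in for the
not-yet-agreed way of documenting that the marks were stripped.

```python
import xml.etree.ElementTree as ET


def to_text(elem, quotes_removed=True, open_q='"', close_q='"'):
    """Flatten an element to plain text, wrapping <q> content in quotes.

    quotes_removed is a hypothetical flag: True means the markup relies on
    <q> alone, so the marks must be supplied; False means the transcriber
    left the original marks in the text, so none are added.
    """
    add_marks = quotes_removed and elem.tag == "q"
    parts = [open_q] if add_marks else []
    parts.append(elem.text or "")
    for child in elem:
        parts.append(to_text(child, quotes_removed, open_q, close_q))
        parts.append(child.tail or "")  # text between this child and the next
    if add_marks:
        parts.append(close_q)
    return "".join(parts)


sample = ET.fromstring("<p>He said, <q>Go <hi>now</hi>.</q> Then he left.</p>")
print(to_text(sample))
print(to_text(sample, quotes_removed=False))
```

With the flag set, the marks are supplied around the q element; with it
cleared, the text passes through untouched, which is the "only if they are
removed" case.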

If people wish, I can set up a website with TEI versions of _all_ my 
posted texts, both in my original master SGML, and converted to XML. 
Gives everybody something to experiment with.

Jeroen.


From joshua at hutchinson.net  Thu Oct 21 13:58:02 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 21 13:58:11 2004
Subject: [gutvol-d] A layman's critique of PGTEI
Message-ID: <20041021205802.255382F8EB@ws6-3.us4.outblaze.com>

I used the online converter.  I'm baby-stepping my way into this.  Marcello's converter was the easiest starting point for me.

I didn't check into it any closer, but it looks like there is a link to a .zip file containing Marcello's XSLT stylesheets.

And I don't think we're all reinventing the wheel... Other than Marcello's stuff, I don't see ANYBODY's wheel out there.  I'm working with what I have available.

I figure if a TEI newb like me can get something working reliably, we're getting somewhere.

Josh



From hacker at gnu-designs.com  Thu Oct 21 14:06:43 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Thu Oct 21 14:07:27 2004
Subject: [gutvol-d] A layman's critique of PGTEI
In-Reply-To: <20041021205802.255382F8EB@ws6-3.us4.outblaze.com>
References: <20041021205802.255382F8EB@ws6-3.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0410211704390.2109@angst.gnu-designs.com>


> And I don't think we're all reinventing the wheel... Other than Marcello's 
> stuff, I don't see ANYBODY's wheel out there.  I'm working with what I 
> have available.

 	You've nailed the problem dead-on. Nobody is providing any tools 
or converters for this, and hence, everyone is forced to reinvent their 
own version of the wheel. And because they had to do it themselves, they 
don't care to release it (or are embarrassed because of the quality of 
their own code), compounding the problem for everyone else.

 	There is a real, documented, psychology behind this, and I think 
this stumbling block is really causing a lot of fracturing amongst the 
rest of the potential contributors out there, myself included.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Bowerbird at aol.com  Thu Oct 21 14:29:49 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 14:30:01 2004
Subject: [gutvol-d] 126 messages in one day
Message-ID: <155.41991e6b.2ea9844d@aol.com>

david said:
>   Excuse me, "Listserve" is the trademarked name 
>   of a product owned and created by L-Soft International, Inc.

right.  and amazon has a patent on one-click web purchasing.
and various companies now "own" pieces of the human genome.

if l-soft wants to come after me, they know my e-mail address.

-bowerbird
From Bowerbird at aol.com  Thu Oct 21 14:37:33 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 14:37:51 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <1e2.2cc70258.2ea9861d@aol.com>

joshua said:
>   Honestly, I see an easy compromise here.

oh-oh, something smells bad here...         :+)


>   As long as a conformant TEXT file 
>   and a conformant HTML file 
>   show up with the XML file, 
>   I say post all three.  

well then i'm sure glad you don't _have_ a say...


>   Granted, right now we don't have a method 
>   for the WW'ers to verify the XML file is valid

all these minor details, eh?


>   so if you want to put a disclaimer to that effect in the file ... fine.

a disclaimer?  that's your "compromise".  yeah, right.

for those on the outside, who wonder why this fuss is being made
whether an .xml file can be "posted", it's because the x.m.l. people
want the imprimatur of being "official".  why is that so important?
because that's how the x.m.l. ponzi game is being played these days.
people are adopting x.m.l. not because they think it's the best route,
but rather because they've been "convinced" that it is "inevitable".
even though -- in too many situations -- it just plain doesn't work,
this "inevitability" makes people shrug and say, "ok, give me some."
after all, you don't want to miss out on the ground floor, do you?

they will tell you time and time again how there are "so many tools"
for dealing with x.m.l., how x.m.l. is gonna be able to do conversions
for whatever format anyone wants, but they can't even demonstrate
a simple ability to convert out a text file and an .html version now.
and when you call them on that, they whine about how unreasonable
you are being, and how unfair it is to expect "150% perfection".  bull.

it is _far_ more sensible -- especially in, as jim delicately put it,
a "production environment rather than an experimental one" -- to
make the process _work_ before you put it in play.  but the x.m.l. people
don't want to be bothered with that "technicality" before the fact.
that's something that someone will figure out "later".  yeah, right.
if x.m.l. gets the stamp of approval here, what's the motivation for
x.m.l. experts to come make it work?  after all, there's no money
in it for them here.  they're off being high-paid consultants, telling
the next mark, "look, even project gutenberg is using x.m.l. now too."

as marcello puts it:

>   At this point we need to set a signal that the TEI era has started.

he's not interested in actually making t.e.i. _work_ in reality
-- he tried, and got a grand total of two simple e-texts done --
he just wants to "set a signal" that the "era has started" here...

-bowerbird
From marcello at perathoner.de  Thu Oct 21 15:05:02 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 15:05:15 2004
Subject: [gutvol-d] A layman's critique of PGTEI
In-Reply-To: <Pine.LNX.4.61.0410211553290.2109@angst.gnu-designs.com>
References: <20041021193831.7050C109723@ws6-4.us4.outblaze.com>
	<Pine.LNX.4.61.0410211553290.2109@angst.gnu-designs.com>
Message-ID: <4178328E.8050502@perathoner.de>

David A. Desrosiers wrote:

>     I don't see where the code to this online converter, or any 
> converter that works with TEI for that matter, is documented, 
> referenced, or linked to. 

The link is on this page.

   http://www.gutenberg.org/tei/


   http://www.gutenberg.org/tei/src/gnutenberg-press-0.0.2.tgz


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Oct 21 15:21:54 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 15:22:08 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <145.3687ad7b.2ea99082@aol.com>

marcello said:
>   Have you got any data to sustain your theory, like, uhm, 
>   a representative poll of pg user population?

first of all, let me say that i think it would be _great_
to have some way to survey project gutenberg users!
might open a lot of eyes around here.  (or probably not.)

i've talked to a lot of people about project gutenberg,
and discussed it on a lot of listserves over the years.
since i'm not involved in it, and because i'm curious,
i learn what people think; i hear the good and the bad.

and you know what the biggest eye-opener was for me?
it was last december, at the 10,000th-e-text gathering,
when michael gave a talk at the berkeley public library.
many of the attendees seemed to be just learning about
the project; they were excited, but had some questions.
the most frequent one revolved on how to read the e-texts.
in other words, these people didn't even know how to open
and read a text-file, or an .html file.  and when greg had to
try and explain a "zip" file, their eyes started to glaze over.
we overestimate -- and that word is an understatement -- 
the sophistication of the audience far too frequently here.
so i think it'd be great to open a communication pipeline
between us and them, so we got to know them a bit better...

anyway...

my view on the factors that have made project gutenberg a
success is based on listening to what michael himself says,
coupled with some very-long-term observation of the many
different e-book projects that have _failed_ along the way...

the more they depended on tech not yet in the mainstream,
the faster they plummeted.

the more they depended on having the newest hardware,
the faster they plummeted.

the more they depended on special knowledge by the user,
the faster they plummeted.

the more they cluttered up the text with extraneous stuff
-- including everything from proprietary formats to d.r.m. --
the faster they plummeted.

yet michael's project -- which michael himself considers to 
be successful exactly because he stripped down to basics --
maintained its ground and grew just as he predicted it would.

and today, most computer owners have grown totally weary
of the need to constantly update, to buy new hardware and/or
install complex new software.  they are digging their heels in,
deciding to make do with what they have.  p.c. sales have been
in _decline_ for years, after a decade-plus of yearly increases.

and the situation will only get worse as things go from here.
software has always grown the must-upgrade pie in the past,
but there's just no money in it now, so the decline will spiral.
when billy g. talks to his shareholders these days, he doesn't
talk about his software; nope, he talks about his i.p. patents.
"innovation" used to be his buzzword (albeit a very big lie),
but now it's "licensing" (and this one we can surely believe).
when the 800-pound gorilla decides to get in your way, beware.
an absence of reasons to upgrade will make users dig in deeper.
they'll live with what they've got, and we'll have to live with that.

and this is _not_ -- as some techies would have you believe --
because they are stupid, or don't want to "grow", it's because
they don't want to always have to be updating their computer.

just like some people like to tinker with their car -- great! --
but other people just want to get in it and drive somewhere...

i'm from the mac side, where the mantra has always been
"it just works", so maybe i'm biased, but it just _amazes_
me how much time the rest of the world spends _fiddling_.
but guess what?  you've used up all of the users' patience.
if what you give 'em won't work without fiddling, forget it.

and i'll kick this up to another level of abstraction as well.
the reason _books_ -- paper ones -- have been so successful
is because they are utterly and completely _simple_ to use.
a child can learn how to use a book.  the more difficulty that
you tack onto electronic-books, the more you buck history...

books have also been an instrument that let people _rise_.
it's part of the philosophy underlying public library systems.
when you move books out to a place where only new machines
can access them, you're making a very bad political decision.
instead of books closing the gap between the rich and the poor,
they become another wedge making that gap bigger and bigger...

e-texts "too cheap to meter" is michael's bedrock philosophy,
and moving this library to a technological methodology that
won't run on the trailing-edge -- let alone one that cannot even
prove itself to _work_ -- does that philosophy grave disservice.

in conclusion...

what this means is that the new imperative is to "make it work"
using the existing infrastructure, or you too will plummet, fast.

and _that_ means that if you want to introduce some innovation,
the obligation should be on you to prove that it works, first...

the end-user is tired of being your guinea pig...

-bowerbird
From marcello at perathoner.de  Thu Oct 21 15:43:55 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 15:44:07 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <1e2.2cc70258.2ea9861d@aol.com>
References: <1e2.2cc70258.2ea9861d@aol.com>
Message-ID: <41783BAB.2020901@perathoner.de>

Bowerbird@aol.com wrote:

> as marcello puts it:
> 
>>  At this point we need to set a signal that the TEI era has started.
> 
> he's not interested in actually making t.e.i. _work_ in reality
> -- he tried, and got a grand total of two simple e-texts done --

Bzzzzt. Wrong. But thank you for playing.

He has *25* titles marked up in 3 languages.

Ranging from Alice (illustrated), to Life on the Mississippi (tables and 
footnotes), Faust and Wallenstein (plays), Deutschland. Ein 
Wintermärchen (lyrics) and a technical manual about, guess what? PGTEI.

Go to

   http://www.gnutenberg.de/search/titles/results/

and eat crow, Bowerbird.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From shalesller at writeme.com  Thu Oct 21 15:39:32 2004
From: shalesller at writeme.com (D. Starner)
Date: Thu Oct 21 15:51:31 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <20041021223932.16F954BDA9@ws1-1.us4.outblaze.com>

Bowerbird@aol.com writes:

> what this means is that the new imperative is to "make it work" 
> using the existing infrastructure, or you too will plummet, fast. 
> 
> and _that_ means that if you want to introduce some innovation, 
> the obligation should be on you to prove that it works, first... 

So does that mean you're going to stop pushing ZML on us? Or are you
going to argue over the definition of the word "is" again?

From Bowerbird at aol.com  Thu Oct 21 15:59:13 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 15:59:39 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <12a.4e8737b7.2ea99941@aol.com>

marcello said:
>   He has *25* titles marked up in 3 languages.

have you run out the .html and .txt versions,
and put them online, so i can evaluate them?


>   Go to

>   http://www.gnutenberg.de/search/titles/results/

>   and eat crow, Bowerbird.

actually, i'm tired of being a guinea pig,
so i won't be going anywhere today, thanks.

if you say you've got 25 titles done, i'll believe it.

but you could have 250 done and it wouldn't change my point.

let me know when you get to 2,500, marked up and converted
to .html and plain-text.  because that number will impress me.

-bowerbird
From marcello at perathoner.de  Thu Oct 21 15:59:55 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 16:00:08 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <145.3687ad7b.2ea99082@aol.com>
References: <145.3687ad7b.2ea99082@aol.com>
Message-ID: <41783F6B.1080500@perathoner.de>

Bowerbird@aol.com wrote:

> i've talked to a lot of people about project gutenberg,
> and discussed it on a lot of listserves over the years.

Dissed and cussed, yes, but not discussed.


> the most frequent one revolved on how to read the e-texts.
> in other words, these people didn't even know how to open
> and read a text-file, or an .html file.  and when greg had to
> try and explain a "zip" file, their eyes started to glaze over.

So it's better to give them a reader that freezes on them the first time 
they use it and takes their whole machine down if they press ctrl-alt-del.


> the reason _books_ -- paper ones -- have been so successful
> is because they are utterly and completely _simple_ to use.

That's what you think: take a look here if your decrepit Macintrash can 
handle videos:

   http://homepages.nyu.edu/~mz34/helpdesk.WMV


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Oct 21 16:03:35 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 16:03:47 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <145.3687ad7b.2ea99082@aol.com>
References: <145.3687ad7b.2ea99082@aol.com>
Message-ID: <41784047.8060203@perathoner.de>

Bowerbird@aol.com wrote:

> what this means is that the new imperative is to "make it work"
> using the existing infrastructure, or you too will plummet, fast.

I heard your program still crashes after printing the headline ...

But, of course, the advice you are meting out is for other people to 
follow, not for you.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Oct 21 16:23:41 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 16:24:02 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <9e.176f512e.2ea99efd@aol.com>

marcello said:
>   Dissed and cussed, yes, but not discussed.

actually, i think you would be hard-pressed to find
even one place where i have said something bad
about project gutenberg.  i give it very high praise.

even on my upcoming blog, where i fully intend to
speak frankly about some bad decisions that i think
are being made here by some people around michael,
i will continue to say good things about the library,
and especially about michael.  as many people recall,
you tarking naugshlocks here have often accused me of 
"kissing michael's ass".  no one's ever said that about me,
not in regard to anyone, but i do think michael is a genius,
so i'll continue to say good things about him and his library.

but some _other_ people might not get such rosy treatment...


>   So it's better to give them a reader that 
>   freezes on them the first time they use it and 
>   takes their whole machine down if they press ctrl-alt-del.

beta-test software can do that sometimes.

but i've had not one report of my program doing that,
if that's what you are trying to imply here, marcello.

if the tester can't report or replicate the crash,
and describe the conditions of its occurrence too,
i'll believe it's a problem unique to their machine.
most likely with their windows operating system.
maybe you've heard there are a few bugs in that...


>   That's what you think: take a look here 
>   if your decrepit Macintrash can handle videos:

you think you're cool insulting someone
because their machine is old, don't you?

but yeah, my mac can "handle videos" just fine, thank you.
so can the authoring-tools that i _program_ on this old mac.

of course, they also work fine on the circa-1989 mac that
i used before this one, thanks to a thing called quicktime,
have you heard of that?, so i guess it's not too surprising...

-bowerbird
From Bowerbird at aol.com  Thu Oct 21 16:28:47 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 16:29:06 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <156.420709c0.2ea9a02f@aol.com>

marcello said:
>   I heard your program still crashes after printing the headline ...

i've heard not a single report of that, confirmed or unconfirmed.

but if you can report that, take it to my beta-test listserve.
the people signed up to this listserve don't want to hear it...

-bowerbird
From marcello at perathoner.de  Thu Oct 21 17:28:08 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 21 17:28:23 2004
Subject: [gutvol-d] Best of Bowerbird
Message-ID: <41785418.4020306@perathoner.de>


New "Best of Bowerbird" fansite at:

   http://www.gnutenberg.de/bowerbird/


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Oct 21 17:39:20 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 17:39:52 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
Message-ID: <dc.1784d972.2ea9b0b8@aol.com>

starner said:
>   So does that mean you're going to stop pushing ZML on us? 

pushing?  that's a rather loaded term, don't you think?

and inaccurate too.  z.m.l. is 
a solution to problems with the library.
i will be the person who implements that solution.

i came here to _show_ you that solution, so
if y'all were _smart_ enough to see it as such,
y'all could help me with that implementation...

but i didn't count any chickens before they were hatched,
so it is no loss to me that y'all are unable to see clearly.

i was always willing to implement the solution myself,
and i still am.  what _i_ got out of the whole deal was
a much more detailed picture of how to make it happen,
by virtue of having explained it some six ways to sunday.

meanwhile, same rule as day one:  the proof is in the pudding.


>   Or are you going to argue over the definition of the word "is" again?

what?

-bowerbird
From jtinsley at pobox.com  Thu Oct 21 18:20:30 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Thu Oct 21 18:20:45 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <4177E9CB.7080200@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com>
	<4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com>
	<4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org>
	<4177E9CB.7080200@perathoner.de>
Message-ID: <20041022012030.GA23907@panix.com>

On Thu, 21 Oct 2004 18:54:35 +0200, Marcello Perathoner <marcello@perathoner.de> wrote:

>I feel Jim is raising artificial objections he knows we cannot overcome. 
>If he doesn't want to learn TEI and he doesn't feel like proofing a TEI 
>text in emacs, fine. But then, he should step aside and let other people 
>do this work.

I find this very offensive.

I came home, and was reading happily enough through the threads until
this.

I differ with you quite profoundly about the implementation of XML,
and, I'm sure, several other issues. But my opinions are honest, and
based on what I believe is best for PG as a whole. I do not "raise
artificial objections" -- these are the expectations I have had for
XML as far back as I can remember, and they are expectations regularly
assumed, if not met, by people who evangelize XML. I "learned TEI"
(not all of it, of course) with the hope of using it in PG, in late
2001/early 2002, and I marked up my first book in XML in February,
2002, which was long before I ever heard your name.

If you can't accept that I am debating these issues in good faith,
there is no point in continuing this discussion.

jim

From cannona at fireantproductions.com  Thu Oct 21 18:29:56 2004
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Oct 21 18:31:18 2004
Subject: [gutvol-d] Best of Bowerbird
In-Reply-To: <41785418.4020306@perathoner.de>
References: <41785418.4020306@perathoner.de>
Message-ID: <6.1.2.0.0.20041021202349.01c217f8@mail.fireantproductions.com>

I was just wondering today which text would be deemed worthy of becoming 
#15000, now that we're close.

I'll be looking forward to seeing it posted.

http://www.gutenberg.org/dirs/1/5/0/0/15000/15000.zml


At 07:28 PM 10/21/2004, you wrote:

>New "Best of Bowerbird" fansite at:
>
>   http://www.gnutenberg.de/bowerbird/
>
>
>
>--
>Marcello Perathoner
>webmaster@gutenberg.org
>


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From Bowerbird at aol.com  Thu Oct 21 18:50:21 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 21 18:50:41 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <1e2.2cced801.2ea9c15d@aol.com>


>   If you can't accept that I am debating these issues in good faith,
>   there is no point in continuing this discussion.

oh c'mon, jim, "good faith"?  from marcello?

yeah, i got a good laugh out of that one...        :+)

-bowerbird
From Gutenberg9443 at aol.com  Thu Oct 21 22:05:19 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Oct 21 22:05:39 2004
Subject: [gutvol-d] English and request
Message-ID: <193.31ace50b.2ea9ef0f@aol.com>

I don't know what European language is spoken in Somalia; in the past parts  
of it have been colonized by British, French, and Italians. I met a Somali  
refugee today, one of a good many that our Church has "adopted." Catholic  
charities are also working with some of the Somalis in Salt Lake City, and I'm  
sure that other churches I don't know about are doing the same.
 
In this small group, there is one adult male, who speaks broken English; two 
young adult women, one of whom speaks broken English and the other of whom 
speaks only a few words; an older adult woman who speaks very little English; 
and four children. They are all Muslims, and their first language is Somali. 
The man got a job about two days after he got here; the young adult women are 
both starting work tomorrow, even though one of them had a miscarriage only 
yesterday. The grandmother will be caring for the children while the other 
adults are working.
 
They all want desperately to learn English.
 
PG has a lot of children's books, and I can prepare a CD of them. But T and
I don't have a spare computer. Does ANYBODY have an extra laptop or notebook
that could be given to them? If so, let me know, and I'll find out whether it
should be sent directly to them or to me to get to them. I think the
grandmother could learn a lot more English in a hurry if they had a computer and if
the two adults who speak reasonably good English would read and translate.
 
Uh . . . what was that about people in Africa don't need books in  English?
 
We gave them some paper reference books, but we long since gave all our  
children's books to the shelter for battered women and their children. We'll  
probably be able to get them some books through the library's book sale, but  
having a whole CD of children's books and a way to read them would help them so  
much.
 
I am a scholar. I have written reference books, and I am very grateful to
those who post scholarly material, especially Pepys's diaries, which are some of
the most fascinating books I have ever read.

But to my mind, it is these people, and others in unfortunate situations or
locations, at whom PG should be mainly aiming. Scholars are going to get their
books one way or another, if they have to hitchhike to the closest good
library.
 
I grew up in a town without a library; when I went to visit my grandmother  
she knew that the first place she had to take me was the library. What a  
blessing it would have been to me to have computers and PG then!
 
Anne
 
From brad at chenla.org  Thu Oct 21 23:34:32 2004
From: brad at chenla.org (Brad Collins)
Date: Thu Oct 21 23:36:19 2004
Subject: [gutvol-d] Aside on old computers
In-Reply-To: <Pine.BSI.4.58.0410201743510.20578@malasada.lava.net> (Karen
	Lofstrom's message of "Wed, 20 Oct 2004 17:49:51 -1000 (HST)")
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
	<shacuiqic3.fsf@tux.gnu.franken.de>
	<4175EF86.8060108@adelaide.edu.au>
	<Pine.BSI.4.58.0410201743510.20578@malasada.lava.net>
Message-ID: <wkhdonjm6f.fsf@chenla.org>

Karen Lofstrom <lofstrom@lava.net> writes:

> On Wed, 20 Oct 2004, Steve Thomas wrote:
>
>> [I can't believe that people still think they're doing good by
>> shipping old 486's to Africa -- but apparently it's true. I
>> recently donated some old Pentium II's to a charity, and they
>> couldn't believe their luck.]
>
> My Linux users group installs thin client computer labs for schools. We
> happily accept PIIs, but turn down 486s. We use PIIs and PIIIs as thin
> clients, removing the hard drives and installing bootable NIC cards, and
> connect them to a fast server running K12LTSP Linux. We can create a
> usable 30 client computer lab for $3000 or so, since the clients are all
> donations.
>

I can't speak for Africa, but I have spent the last 14 years living in
the deepest parts of China, Laos, and Cambodia.  In the last 7 years I
have not seen anything older than a PII except in a few old government
systems and ancient bank computer networks running OS/2.

And I'm not talking about the big cities like Vientiane, I'm talking
about villages which barely have electricity and other odd corners
with flaky old generators grumbling in blackened, soot-covered back sheds.

Just because the electricity is only on for a few hours a day doesn't
mean that people don't have access to okay technology.  Hell, I've
seen rice farmers along the Mekong River using picture phones to send
pictures of babies to relatives in Bangkok.

The third world ain't always as backward as people in the first world
think.  There are large areas that are that bad, but then ebooks will
not be an option for them until they have bridges connecting them to
settled areas, or proper water, bottled gas for cooking....

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From tb at baechler.net  Fri Oct 22 01:14:40 2004
From: tb at baechler.net (Tony Baechler)
Date: Fri Oct 22 01:13:45 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <4177E87C.90101@dsl.pipex.com>
References: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
	<5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>
	<5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
Message-ID: <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com>

At 05:49 PM 10/21/2004 +0100, you wrote:
>>So, does this mean that I now not only have to download the master xml 
>>file, the css, and a set of conversion tools?  You must be kidding, 
>>right?  If it came to that, I would rather have the plain text and forget 
>>the page numbers.  It is already inconvenient to use "lynx -dump -nolist 
>>filename.htm."  Why in the world would I want to run it through a 
>>conversion tool and still have to do that anyway?  OK, so a plain text 
>>file can be output directly from the xml.  I still have to go through at 
>>least one extra conversion step that I wouldn't have to otherwise.
>
>Why? The whole idea behind PG moving to XML is not to complicate things, 
>it's to give more flexibility while retaining simplicity. How about this 
>situation:


Apparently context was lost here.  The "why" is that, according to what 
Joshua was saying, the page numbers are not available anywhere in the plain 
text because they would look ugly.  OK, I understand that and I myself 
might not even want them most of the time.  However, if I decide that for a 
particular file I want them, I have to go to the master xml document and do 
my own conversion.  The PG supplied plain text won't help me, and the html 
won't work correctly in Lynx or IE.  Therefore, I have to redo the 
conversion to get the information I want in the plain text file or whatever 
other format.  This does not seem simpler to me. 

From traverso at dm.unipi.it  Fri Oct 22 01:33:31 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Fri Oct 22 01:33:55 2004
Subject: [gutvol-d] Aside on old computers
In-Reply-To: <wkhdonjm6f.fsf@chenla.org> (message from Brad Collins on Fri, 22
	Oct 2004 13:34:32 +0700)
References: <e6.5b7cdcf5.2ea6bbb7@aol.com> <4175B419.6030301@adelaide.edu.au>
	<shacuiqic3.fsf@tux.gnu.franken.de>
	<4175EF86.8060108@adelaide.edu.au>
	<Pine.BSI.4.58.0410201743510.20578@malasada.lava.net>
	<wkhdonjm6f.fsf@chenla.org>
Message-ID: <200410220833.i9M8XVW4016806@posso.dm.unipi.it>

>>>>> "Brad" == Brad Collins <brad@chenla.org> writes:

    Brad> Karen Lofstrom <lofstrom@lava.net> writes:

    >> On Wed, 20 Oct 2004, Steve Thomas wrote:
    >> 
    >>> [I can't believe that people still think they're doing good by
    >>> shipping old 486's to Africa -- but apparently it's true. I
    >>> recently donated some old Pentium II's to a charity, and they
    >>> couldn't believe their luck.]
    >>  My Linux users group installs thin client computer labs for
    >> schools. We happily accept PIIs, but turn down 486s. We use
    >> PIIs and PIIIs as thin clients, removing the hard drives and
    >> installing bootable NIC cards, and connect them to a fast
    >> server running K12LTSP Linux. We can create a usable 30 client
    >> computer lab for $3000 or so, since the clients are all
    >> donations.
    >> 

    Brad> I can't speak for Africa, but I have spent the last 14 years
    Brad> living in the deepest parts of China, Laos, and Cambodia.
    Brad> In the last 7 years I have not seen anything older than a
    Brad> PII except in a few old government systems and ancient bank
    Brad> computer networks running OS/2.

    Brad> And I'm not talking about the big cities like Vientiane, I'm
    Brad> talking about villages which barely have electricity and
    Brad> other odd corners with flaky old generators grumbling in
    Brad> blackened, soot-covered back sheds.

When I started an EU-financed international research project on
symbolic computation, some 12 years ago, the computers we were
using were 486s. They ran Linux (Slackware) and X, and I was able
to run TeX and view high-quality output. I am still using and
developing the software that we wrote in this project.

Something must be wrong if, in 12 years, what was good enough for a
half-million-dollar research project is no longer good enough for a
forest village. The same goes for the following generation of
processors (Pentium I).

Carlo


From traverso at dm.unipi.it  Fri Oct 22 02:06:53 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Fri Oct 22 02:07:16 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <41783BAB.2020901@perathoner.de> (message from Marcello
	Perathoner on Fri, 22 Oct 2004 00:43:55 +0200)
References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de>
Message-ID: <200410220906.i9M96rMS019592@posso.dm.unipi.it>


The problem is how to have beta-testing AND respect PG tradition of
posting only definitive stuff.


Would this be useful?


I might offer web space, computing and bandwidth to post XML, convert
it to txt, html and whatever else, and submit the result to
whitewashing.

You will be able to have all the software needed for the conversion
installed, and to have submissions converted by automatic procedures.

This might be seen as a beta-test of XML whitewashing procedures. I am
at most neutral towards XML (I recognize that it is unavoidable, but I
regret the trend; I would prefer a more human-friendly markup). So it
will not be biased in favour of XML. And I am authorized to whitewash,
so this can be seen as doing my whitewashing in public.

The posting-and-converting should be automatic: a web interface to
submit a zip/tar.gz/tar.bz2 file, semi-automatic unzipping and
conversion, and an OK from both the poster and the site administrator
to make the posting public. Then the whitewashing could start WITHOUT
corrections: if anything in the result is wrong, the submission
should be repeated. Whether the post will be XML plus converted
files, or converted files only, will be PG's choice. The posts,
complete with XML, will remain indefinitely on the test site. Of
course, an additional line will be included to warn that the file is
not an official PG file but only an intermediate working file. But
except for this line, everything should be identical to a PG file,
header and footer, PG number and filename included.
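
The unpacking and approval steps described above could be sketched
roughly as follows in Python. This is only an illustration under
stated assumptions: the function names, the accepted-suffix list and
the two-flag approval check are hypothetical and are not part of any
existing PG or DP tool.

```python
import shutil
from pathlib import Path
from typing import List

# Archive types named in the proposal; anything else is rejected.
ACCEPTED_SUFFIXES = (".zip", ".tar.gz", ".tar.bz2")


def unpack_submission(archive: Path, workdir: Path) -> List[str]:
    """Unpack an accepted archive into workdir and list the files it contained."""
    if not archive.name.endswith(ACCEPTED_SUFFIXES):
        raise ValueError(f"unsupported archive type: {archive.name}")
    # shutil picks the unpacker from the file extension.
    # A real service would also have to vet member paths of untrusted uploads.
    shutil.unpack_archive(str(archive), str(workdir))
    return sorted(
        str(p.relative_to(workdir)) for p in workdir.rglob("*") if p.is_file()
    )


def ready_to_publish(poster_ok: bool, admin_ok: bool) -> bool:
    """A posting goes public only when poster and site administrator both agree."""
    return poster_ok and admin_ok
```

Rejecting unknown archive types before unpacking keeps the conversion
step from ever seeing a submission it was not written to handle.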

Drawback: the server is located in Italy, so I cannot do it for items
that are not clearable in the EU. You'll have to submit clearance for
death+70 (with procedures to be decided, but a copy of a LOC authority
record or an encyclopaedia article will of course be enough).


Carlo
From marcello at perathoner.de  Fri Oct 22 03:15:59 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 03:16:21 2004
Subject: [gutvol-d] English and request
In-Reply-To: <193.31ace50b.2ea9ef0f@aol.com>
References: <193.31ace50b.2ea9ef0f@aol.com>
Message-ID: <4178DDDF.4010307@perathoner.de>

Gutenberg9443@aol.com wrote:

> Uh . . . what was that about people in Africa don't need books in  English?

You said those people were refugees. You said they came to the US and 
wanted to learn English. That is very clear.

But I fail to see how this could possibly imply that people who live in 
Africa want to learn English.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Fri Oct 22 03:22:50 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 03:23:15 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com>
References: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>	<5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>	<5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
	<5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com>
Message-ID: <4178DF7A.6000808@perathoner.de>

Tony Baechler wrote:

> according to what 
> Joshua was saying, the page numbers are not available anywhere in the 
> plain text because they would look ugly.

> However, if I decide 
> that for a particular file I want them, I have to go to the master xml 
> document and do my own conversion.

> This does not seem simpler to me.

That may not be simple, but it is still better than what you have now: if 
the txt file happens to lack the page numbers, there is no way you could 
get them short of redoing the book.

In TEI, it may not be quite simple to set up, but you *can* do it. (And 
you can do a lot more.) In my eyes that's a big advantage.

Of course, XML is not and has never claimed to be the solution to all 
the world's problems.

ZML is :-)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From stephen.thomas at adelaide.edu.au  Fri Oct 22 03:47:10 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Fri Oct 22 03:47:40 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <200410220906.i9M96rMS019592@posso.dm.unipi.it>
References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de>
	<200410220906.i9M96rMS019592@posso.dm.unipi.it>
Message-ID: <4178E52E.2030107@adelaide.edu.au>

A question (possibly better put over on the DP list):

Is it possible to OCR a scan directly to XML? Or is the output 
from OCR always going to be text?

If the first, then we need two processes -- one to deal with new 
scans (OCR to XML), one to deal with existing plain texts (to 
convert them to XML).

But if the output of OCR is still going to be plain text, then 
we can use the same process to convert both existing and new 
books to XML.
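
For the plain-text case, a very rough Python sketch of such a
conversion step might look like this. It assumes paragraphs are
separated by blank lines and uses a bare <text>/<body>/<p> skeleton
chosen only for illustration; real markup (chapters, italics,
footnotes) would still need human work.

```python
import re
from xml.sax.saxutils import escape


def text_to_xml(plain: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> elements inside <text><body>."""
    blocks = re.split(r"\n\s*\n", plain.strip())
    # Collapse hard line wraps inside each paragraph to single spaces.
    paragraphs = [" ".join(block.split()) for block in blocks if block.strip()]
    # escape() protects &, < and > so the result stays well-formed.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<text>\n<body>\n{body}\n</body>\n</text>"
```

The same function would serve both freshly OCRed text and files
already in the collection, which is the point of the second case.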


Steve

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From marcello at perathoner.de  Fri Oct 22 03:49:34 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 03:49:56 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <200410220906.i9M96rMS019592@posso.dm.unipi.it>
References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de>
	<200410220906.i9M96rMS019592@posso.dm.unipi.it>
Message-ID: <4178E5BE.9010805@perathoner.de>

Carlo Traverso wrote:

> The problem is how to have beta-testing AND respect PG tradition of
> posting only definitive stuff.

I believe the PG policy is (or at least, has been at some point) to 
encourage the posting of preliminary material.

 From the PG header:

   Please note:  neither this list nor its contents are final till
   midnight of the last day of the month of any such announcement.
   The official release date of all Project Gutenberg Etexts is at
   Midnight, Central Time, of the last day of the stated month.  A
   preliminary version may often be posted for suggestion, comment
   and editing by those who wish to do so.  To be sure you have an
   up to date first edition [xxxxx10x.xxx] please check file sizes
   in the first week of the next month.


That is exactly what we want to do: post a preliminary version for 
suggestion, comment and editing.

I don't understand why this is not possible for a TEI file.


> I might offer web space, computing and  bandwidth to post XML, convert
> it to txt and html and what else, and submit the result to
> whitewashing. 

Thank you. As for the server I can also offer one located in Germany, so 
the same limitations apply.

But this is sooo tedious! We have to replicate the exact setup of 
gutenberg.org *and* pglaf.org to get reliable results from the beta-test.

Example: my servers are all Debian and have Perl 5.8 whereas ibiblio is 
Red Hat Enterprise with Perl 5.6. This has often given me headaches 
because programs that ran at home mysteriously failed at ibiblio.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From holden.mcgroin at dsl.pipex.com  Fri Oct 22 03:54:23 2004
From: holden.mcgroin at dsl.pipex.com (Holden McGroin)
Date: Fri Oct 22 03:53:45 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com>
References: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>	<5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com>	<5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com>
	<5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com>
Message-ID: <4178E6DF.9020101@dsl.pipex.com>

Tony Baechler wrote:
>>> So, does this mean that I now not only have to download the master 
>>> xml file, the css, and a set of conversion tools?  You must be 
>>> kidding, right?  If it came to that, I would rather have the plain 
>>> text and forget the page numbers.  It is already inconvenient to use 
>>> "lynx -dump -nolist filename.htm."  Why in the world would I want to 
>>> run it through a conversion tool and still have to do that anyway?  
>>> OK, so a plain text file can be output directly from the xml.  I 
>>> still have to go through at least one extra conversion step that I 
>>> wouldn't have to otherwise.
>>
>>
>> Why? The whole idea behind PG moving to XML is not to complicate 
>> things, it's to give more flexibility while retaining simplicity. How 
>> about this situation:
> 
> Apparently context was lost here.  The "why" is that, according to what 
> Joshua was saying, the page numbers are not available anywhere in the 
> plain text because they would look ugly.  OK, I understand that and I 
> myself might not even want them most of the time.  However, if I decide 
> that for a particular file I want them, I have to go to the master xml 
> document and do my own conversion.  The PG supplied plain text won't 
> help me, and the html won't work correctly in Lynx or IE.  Therefore, I 
> have to redo the conversion to get the information I want in the plain 
> text file or whatever other format.  This does not seem simpler to me.

Why must _you_ do it? If the information's available, then it would be 
TRIVIAL to add an option to the TXT or HTML converter which says "check 
here if you want page numbers included."

We're really arguing over features in a system which hasn't been built 
yet, where even the form of the system isn't even set yet. _I_ can 
envision a system where we have the standard TXT and HTML files 
generated in the same format as we have them now but where there's a 
simple web page where you can configure the version you want. Want Page 
Numbers? Tick a box. Want each chapter in a separate file? Tick a box.

So, whereas before, you had to have the standard TXT or HTML versions 
because that was all that was available, now we can actually talk about 
making customised versions as people want them. Maybe the settings could 
even be stored as a Cookie so you choose which settings you want once 
then every time you look at a text on PG, the text will be created as 
_you_ like it.

We can only do cool stuff like this _because_ we're creating this new 
super-format which contains information far beyond what was previously 
available in the TXT and HTML versions.
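
A "tick a box" option of this kind could be sketched in Python along
the following lines. This is a minimal illustration, not an existing
PG converter: it assumes un-namespaced, TEI-like markup with <p>
paragraphs and <pb n="..."/> page breaks, and the function name and
the "[p. N]" output style are invented for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<text><body>
<p>It was a dark and stormy night.<pb n="2"/> The rain fell in torrents.</p>
<p>Morning came at last.</p>
</body></text>"""


def to_plain_text(xml_source: str, page_numbers: bool = False) -> str:
    """Render each <p> as a text paragraph, optionally keeping page-break markers."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iter("p"):
        parts = [p.text or ""]
        for child in p:
            if child.tag == "pb":
                # The page number is in the master file; emit it only on request.
                if page_numbers:
                    parts.append(f" [p. {child.get('n')}]")
            else:
                # Keep the text of any other inline element.
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        # Normalize whitespace left over from line wraps in the source.
        paragraphs.append(" ".join("".join(parts).split()))
    return "\n\n".join(paragraphs)
```

Calling to_plain_text(SAMPLE) gives the text as it is served today,
while to_plain_text(SAMPLE, page_numbers=True) keeps the page marker;
both come from the same master file.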

Cheers,
Holden
From marcello at perathoner.de  Fri Oct 22 03:57:00 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 03:57:23 2004
Subject: [gutvol-d] A layman's critique of PGTEI
In-Reply-To: <20041021193831.7050C109723@ws6-4.us4.outblaze.com>
References: <20041021193831.7050C109723@ws6-4.us4.outblaze.com>
Message-ID: <4178E77C.2010309@perathoner.de>

Joshua Hutchinson wrote:

> Well, that's my quick personal experiment.  My question for the
> experts: Can the HTML validation problem be easily fixed?  


At present there is a known bug in the title page. The converter 
produces a H1 inside a SPAN. (The SPAN should be a DIV or the H1 should 
be dropped.)
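
The kind of nesting error described here (a block-level H1 inside an
inline SPAN) can be spotted with a small check. The following Python
sketch is only an illustration: the tag lists are deliberately
incomplete assumptions, and it is no substitute for Tidy or the W3C
validator.

```python
from html.parser import HTMLParser

# Partial lists, chosen for illustration only.
INLINE = {"span", "a", "em", "strong", "i", "b"}
BLOCK = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "div"}


class NestingChecker(HTMLParser):
    """Record every block-level tag opened while an inline tag is still open."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            inline_parents = [t for t in self.stack if t in INLINE]
            if inline_parents:
                self.problems.append((tag, inline_parents[-1]))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the most recent matching open tag.
            while self.stack.pop() != tag:
                pass


def find_block_in_inline(html: str):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

For the title-page case, find_block_in_inline applied to a SPAN
wrapping an H1 reports the pair ("h1", "span"), while the same H1
inside a DIV reports nothing.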

This is a thing I would like to postpone until we have agreed about how 
to format a title page. There are far too many ways you can do that 
according to the specs. Supporting them all will be very difficult.

I didn't get any other warnings running your example thru the validator. 
Can you mail me the output of the validator with "show code" enabled?


 > I'd also
 > like to request a change to the CSS used, but that is a personal
 > preference and something to really worry about after the
 > show-stoppers are fixed.

The CSS is in an external file. You can make your own.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From traverso at dm.unipi.it  Fri Oct 22 04:20:58 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Fri Oct 22 04:21:23 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <4178E5BE.9010805@perathoner.de> (message from Marcello
	Perathoner on Fri, 22 Oct 2004 12:49:34 +0200)
References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de>
	<200410220906.i9M96rMS019592@posso.dm.unipi.it>
	<4178E5BE.9010805@perathoner.de>
Message-ID: <200410221120.i9MBKwwx022743@posso.dm.unipi.it>

>>>>> "Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:

    Marcello> Example: my servers are all Debian and have Perl 5.8
    Marcello> whereas ibiblio is Red Hat Enterprise with Perl 5.6. This
    Marcello> has often given me headaches because programs that
    Marcello> ran at home mysteriously failed at ibiblio.

That's one of the points. The conversion tools are mature when they
are independent of the exact version of the software that you
have. And having a "neutral" site for testing is one of the important
points: you cannot rely on your own configuration.

PG has to rely on tools that are stable, not on the bleeding edge.

Carlo

From marcello at perathoner.de  Fri Oct 22 05:24:53 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 05:25:18 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <200410221120.i9MBKwwx022743@posso.dm.unipi.it>
References: <1e2.2cc70258.2ea9861d@aol.com>
	<41783BAB.2020901@perathoner.de>	<200410220906.i9M96rMS019592@posso.dm.unipi.it>	<4178E5BE.9010805@perathoner.de>
	<200410221120.i9MBKwwx022743@posso.dm.unipi.it>
Message-ID: <4178FC15.1020303@perathoner.de>

Carlo Traverso wrote:

> That's one of the points. The conversion tools are mature when they
> are independent of the exact version of the software that you
> have.

I was referring to the scripts that run the catalog.


> PG has to rely on tools that are stable, not on bleeding edge. 

This is open source development. We don't have enough resources to test 
the tools everywhere before releasing. We need bug reports and patches 
from the people out there. I don't even have a Winsloth machine ...


The mentality of "everything has to be perfect before we start" doesn't 
work.


Linus didn't post Linux when it was ready, he posted it when it was no 
more than a filesystem with a bit of memory management attached.

Tim Berners-Lee didn't start with XHTML 1.1. He started with what he had 
and refined it later.

Michael Hart didn't wait till he got a computer that understood lower 
case. He started with upper case only and fixed that later.


Success stories. Not an argument, but maybe an illustration.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Fri Oct 22 05:45:37 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 05:46:00 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <20041022124537.B625AEDE84@ws6-1.us4.outblaze.com>


----- Original Message -----
From: Tony Baechler <tb@baechler.net>
> 
> Apparently context was lost here.  The "why" is that, according to what 
> Joshua was saying, the page numbers are not available anywhere in the plain 
> text because they would look ugly.  OK, I understand that and I myself 
> might not even want them most of the time.  However, if I decide that for a 
> particular file I want them, I have to go to the master xml document and do 
> my own conversion.  The PG supplied plain text won't help me, and the html 
> won't work correctly in Lynx or IE.  Therefore, I have to redo the 
> conversion to get the information I want in the plain text file or whatever 
> other format.  This does not seem simpler to me. 
> 

You're right, converting your own plain text is not simpler.

Right now, if you grab the plain text file of any project in the collection, it won't have page numbers.

Right now, if you grab a few select HTML files in the collection, it has an option to show page numbers.

In the future, if you want what we have now, nothing will change.  You'll grab the text file and it won't have page numbers.

In the future, if you want something more/different than what we have now ... you can get it, but it requires an extra step.  Right now, you can't get it, period.  That's the advancement that XML promises.

I really want people to understand that the move to XML master documents will NOT take away anything from what we have now.  It will give us more options beyond that basic setup.

Josh

PS In the distant future, I foresee a web page where you can customize what options you want and the server generates the file to your specifications on the fly.   That's down the road, though.

From joshua at hutchinson.net  Fri Oct 22 05:54:43 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 05:55:06 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com>


----- Original Message -----
From: Steve Thomas <stephen.thomas@adelaide.edu.au>
> 
> A question (possibly better put over on the DP list):
> 
> Is it possible to OCR a scan directly to XML? Or is the output 
> from OCR always going to be text?
> 

That is a very DP related question, but I'll answer here as best as I understand the future plans (and let others correct me where needed).

The plan at DP is to move from the current 2 round proofing model to a (probably) 4 round proofing/markup model.

The content provider will take the scans and OCR them normally.  That part doesn't change.

Then, there are 2 rounds of proofing that concentrate on typos, spelling, etc.  Very similar to the 2 rounds we have now.

Then, there are 2 MORE rounds of markup.  Here is where all the markup like poetry, italics/bold, footnotes, chapter headings, thoughtbreaks, etc, etc are done.

Then, when the final result gets out of 4 rounds, it is nicely marked up (in theory) XML.  The post-processor does his/her normal magic, combining all the pages, running validators on it, etc.

As far as the OCR process, we currently run some pre-processors on text to fix common scannos, etc.  I'd be surprised if those pre-processors didn't improve/change as the XML world emerges at DP.

Josh
From joshua at hutchinson.net  Fri Oct 22 06:03:11 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 06:03:34 2004
Subject: [gutvol-d] A layman's critique of PGTEI
Message-ID: <20041022130311.8ACE8EDCC0@ws6-1.us4.outblaze.com>

My apologies.  I ran both Tidy and the Validator on the file (as is my normal HTML procedure, so it is second nature).  Tidy reported 13 warnings.  The W3C validator showed just one (which you mention below).

I misspoke when I said the errors were all from the validator.

Now, from your comments, it looks like the next thing to do is decide on a standard title page.  Personally, I don't have a problem with the format you have.  It is clean and easy to read.  There are some things, as I mentioned before, that I would change on a CSS level, but generally, I like the layout well enough.

Josh

----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> At present there is a known bug in the title page. The converter 
> produces a H1 inside a SPAN. (The SPAN should be a DIV or the H1 should 
> be dropped.)
> 
> This is a thing I would like to postpone until we have agreed about how 
> to format a title page. There are far too many ways you can do that 
> according to the specs. Supporting them all will be very difficult.
> 
> I didn't get any other warnings running your example thru the validator. 
> Can you mail me the output of the validator with "show code" enabled?
> 
> 
>  > I'd also
>  > like to request a change to the CSS used, but that is a personal
>  > preference and something to really worry about after the
>  > show-stoppers are fixed.
> 
> The CSS is in an external file. You can make your own.
> 
> 

From marcello at perathoner.de  Fri Oct 22 06:41:18 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 06:41:47 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041022012030.GA23907@panix.com>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>	<41768369.6050204@perathoner.de>
	<20041020173528.GB3366@panix.com>	<4176BDEF.7050008@perathoner.de>
	<20041020205934.GA22445@panix.com>	<4177901F.7010006@perathoner.de>
	<20041021150227.GA17442@pglaf.org>	<4177E9CB.7080200@perathoner.de>
	<20041022012030.GA23907@panix.com>
Message-ID: <41790DFE.3070606@perathoner.de>

Jim Tinsley wrote:

>>I feel Jim is raising artificial objections he knows we cannot overcome. 
>>If he doesn't want to learn TEI and he doesn't feel like proofing a TEI 
>>text in emacs, fine. But then, he should step aside and let other people 
>>do this work.
> 
> I find this very offensive.
> 
> I came home, and was reading happily enough through the threads until
> this.

I am sorry if I spoilt your evening and I apologize for that.

I said "I feel" and that's the truth. Maybe it's just my fault.


> these are the expectations I have had for
> XML as far back as I can remember, and they are expectations regularly
> assumed, if not met, by people who evangelize XML. 

Some of your expectations cannot be met. Some would imply an enormous 
expense of time on the developers' part to save relatively little time on 
your part.


> I "learned TEI"
> (not all of it, of course) with the hope of using it in PG, in late
> 2001/early 2002, and I marked up my first book in XML in February,
> 2002, which was long before I ever heard your name.

I have marked up 25 books, prose, lyrics and plays. And I transformed 
all of them successfully to HTML, TXT, PDF and PalmDoc. That was a year ago.

I could have done more but I felt that it was better to go public with 
what I had, to get comments and suggestions from other people. I thought 
if PG posted some of those files I would get comments.

Since then I have been waiting. I think I have done my part. My files 
are done better than many I see posted.

Even if we had to fix them later, the philosophy of PG did at some point 
expressly allow the posting of preliminary files. I cannot see why this 
simple request should cause so much trouble and fear today.


These are some of your expectations that you should reconsider:

> I really no longer give any headroom at all to the approach "Post XML
> Now Because That Is The One True Way And We'll Figure Out How To Read
> It Later." If for no other reason, then because the most important
> part of the WW job is to check the texts before posting, and if we
> can't read it, we can't find the errors, and if we can't find the
> errors, we can't fix 'em.

You can read a TEI file in an editor. You can spell-check it. You can 
validate it. You can find the errors. The process is just a bit 
different from what you have now, and always will be until some 
native TEI readers crop up.


> That
> process must work for _all_ teixlite files, not just ones that are
> specially cooked, using constraints not specified within the chosen
> DTD. Here's where we hit the rocks today. 

Impossible. There are things you cannot specify in a DTD that still must 
be followed to get a semantically correct file. (This holds for every 
XML application, not just for PGTEI.) You always have to obey some extra 
rules besides validity. These are put down in the PGTEI guide.


> The
> only things we must have -- both for our own internal practical
> purposes and for the use of future readers -- is that it should work
> reliably on _all_ texts that conform to the XML DTD chosen, be open
> source, and be cross-platform. A reader needs to be able to tweak the
> transform and re-run on her own desktop. 

Same as above. The DTD is not strict enough (RelaxNG will be better, but 
it's still early). There will always be valid TEI files that do not 
transform to `correct' output files.

I don't see why it is necessary for the conversion tools to run on 
everybody's desktop before we can start posting files. If the tools run 
on pglaf.org and gutenberg.org that is more than enough for a start. The 
tools can be fixed later. That won't make posted valid TEI files invalid.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From juliet.sutherland at verizon.net  Fri Oct 22 06:49:41 2004
From: juliet.sutherland at verizon.net (Juliet Sutherland)
Date: Fri Oct 22 06:50:05 2004
Subject: [gutvol-d] Re: barriers to XML posting
References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de>
	<200410220906.i9M96rMS019592@posso.dm.unipi.it>
Message-ID: <02b701c4b83d$fb0cc350$6501a8c0@Unicorn>

If Carlo or someone else is willing to help with administering it, we can 
provide webspace, computing, and bandwidth on either the PGDP server or our 
test server to be used for this same purpose. Being located in the US, we 
would be following the same copyright rules as PG.

We would also be happy to keep XML versions of any projects until PG is 
ready to accept them.

JulietS

----- Original Message ----- 
From: "Carlo Traverso" <traverso@dm.unipi.it>
To: <gutvol-d@lists.pglaf.org>
Sent: Friday, October 22, 2004 5:06 AM
Subject: Re: [gutvol-d] Re: barriers to XML posting


>
> The problem is how to have beta-testing AND respect PG tradition of
> posting only definitive stuff.
>
>
> Would this be useful?
>
>
> I might offer web space, computing and  bandwidth to post XML, convert
> it to txt and html and what else, and submit the result to
> whitewashing.
>
> You will be able to have installed all the software to handle the
> conversion, and have submissions converted by automatic procedures.
>
> This might be seen as a beta-test of xml whitewashing procedures. I am
> at most neutral to xml (I recognize its unavoidability, but I regret
> the trend; I would prefer a more human-friendly markup). So it will
> not be pro-XML biased. And I am authorized to whitewash, so this
> can be seen as doing my whitewashing in public.
>
> The posting-and-converting should be automatic: a web interface to
> submit a zip/tar.gz/tar.bz2 file, semi-automatic unzipping and
> conversion, poster and site administrator OK to make the posting
> public. Then the whitewashing could start WITHOUT corrections: if
> anything in the result is wrong, then one should repeat the
> submission.  Whether the post is XML + converted, or converted only,
> will be PG's choice. The posts, complete with XML, will remain
> indefinitely on the test site. Of course, an additional line will be
> included to warn that the file is not an official PG file but only an
> intermediate working file. But except for this line, everything should
> be identical to a PG file, header and footer, PG number and filename
> included.
>
> Drawback: the server is located in Italy, so I cannot do it for non-EU
> clearable items. You'll have to submit clearance for death+70 (with
> procedures to decide, but a copy of a LOC authority record or an
> encyclopaedia article will of course be enough).
>
>
>
> Carlo
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
> 


From marcello at perathoner.de  Fri Oct 22 06:52:18 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 06:52:43 2004
Subject: [gutvol-d] A layman's critique of PGTEI
In-Reply-To: <20041022130311.8ACE8EDCC0@ws6-1.us4.outblaze.com>
References: <20041022130311.8ACE8EDCC0@ws6-1.us4.outblaze.com>
Message-ID: <41791092.3010907@perathoner.de>

Joshua Hutchinson wrote:

> Now, from your comments, it looks like the next thing to do is decide
> on a standard title page.  I personally, don't have a problem with
> the format you have.  It is clean and easy to read.

The title page you see is automatically generated from the teiHeader 
when the <divGen type="titlepage"> element is transformed.

You can also have a custom title page if you replace the <divGen> with 
your own <div> like this:

   <div type="titlepage">
     <docTitle>Hamster Hooey and his Gooey Kablooie</docTitle>
     <p>by</p>
     <docAuthor><name>Maple Syrup</name></docAuthor>
     <p> bla bla </p>
   </div>
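
As a rough illustration of that swap, here is a small Python sketch that replaces the <divGen type="titlepage"> placeholder with a hand-written <div>. It uses only the standard library; the surrounding <text>/<front>/<body> skeleton and the function name are assumptions made for the example, and this is not how the PGTEI stylesheets themselves work.

```python
import xml.etree.ElementTree as ET

# Assumed minimal skeleton around the placeholder.
SOURCE = """<text>
<front><divGen type="titlepage"/></front>
<body><p>Once upon a time.</p></body>
</text>"""

# The custom title page from the message above.
CUSTOM = """<div type="titlepage">
  <docTitle>Hamster Hooey and his Gooey Kablooie</docTitle>
  <p>by</p>
  <docAuthor><name>Maple Syrup</name></docAuthor>
</div>"""

def use_custom_titlepage(tei_text, custom_div_text):
    """Replace <divGen type="titlepage"> with a hand-written <div>."""
    root = ET.fromstring(tei_text)
    custom = ET.fromstring(custom_div_text)
    # Take a snapshot of the elements so the tree can be edited safely.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "divGen" and child.get("type") == "titlepage":
                custom.tail = child.tail  # keep the surrounding whitespace
                parent.remove(child)
                parent.insert(index, custom)
    return ET.tostring(root, encoding="unicode")

result = use_custom_titlepage(SOURCE, CUSTOM)
```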


I didn't get very far implementing custom title pages because I have 
always worked from the PG text only. I never did see any real title page 
of a PG book and have no notion of how funny they get.

If you can spare the time, it would help immensely if you could grab 
some representative scanned title pages at DP and put them up somewhere 
for everybody to see so we could discuss how to mark them up.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From juliet.sutherland at verizon.net  Fri Oct 22 07:19:14 2004
From: juliet.sutherland at verizon.net (Juliet Sutherland)
Date: Fri Oct 22 07:19:37 2004
Subject: [gutvol-d] Re: barriers to XML posting
References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com>
Message-ID: <02ed01c4b842$1c31baf0$6501a8c0@Unicorn>

At DP, most of us use ABBYY Finereader (versions ranging from 5.0 to 7.0) to 
do the OCR work. It does not currently have an option to save the result as 
XML, though I suppose they might well implement something like that 
eventually. Also, for proofreading purposes, it is much easier to work with 
material that does not yet have all the XML tags, etc.

We have always planned to have formatting rounds, and, in fact, they are 
currently in active development and I hope they will be in place by the end 
of the year. I expect that the nature of the formatting rounds will change 
with time. My hope, however, is that in most cases, even the people working 
in the formatting rounds will not have to see all the verbosity that goes 
with XML and that can be represented unambiguously in a more reader-friendly 
way. Paragraph markers are an example that springs easily to mind. They can 
be added automatically later and it would be a serious waste of volunteer 
time to have to type those in in place of the blank line that we currently 
use. That case is trivially obvious, but most others may not be. One of the 
things that we will have to work out through experience is exactly what 
kinds of markup happen at which stage of the process.
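
The paragraph case can be sketched in a few lines of Python. This is only an illustration of the idea, assuming that proofed text separates paragraphs with blank lines; the function name and sample text are made up, and this is not the DP tool chain.

```python
import re
from xml.sax.saxutils import escape

def paragraphs_to_tei(plain_text):
    """Wrap each blank-line-separated block of text in <p>...</p>."""
    blocks = re.split(r"\n\s*\n", plain_text.strip())
    paragraphs = []
    for block in blocks:
        # Join the hard-wrapped lines of one paragraph and escape &, <, >.
        joined = " ".join(line.strip() for line in block.splitlines())
        paragraphs.append("<p>" + escape(joined) + "</p>")
    return "\n".join(paragraphs)

PROOFED = """The first paragraph runs
over two lines.

The second one mentions Smith & Co."""

marked_up = paragraphs_to_tei(PROOFED)
```

The volunteer types a blank line; the script adds the tags and takes care of escaping the ampersand.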

When one really gets into the details, there are a staggering number of 
them. A formatting/markup issue that we've struggled with recently is 
teaching people how to know when to include a period inside italics and when 
not to. And then getting them to do it correctly. Yes, I know this doesn't 
matter for html, and won't matter for XML, but it does matter for plain text 
versions that use underscores to mark italics. I mention it only as an 
example of the tiddly, little, significant details that must be worked out 
in a day-to-day production environment.
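
To make the italics example concrete, here is a small Python sketch, assuming italics are marked with <i>...</i> during proofing (an assumption for the example). It shows that the placement of the period in the markup carries straight through to the underscore form used in plain text.

```python
import re

def italics_to_underscores(marked_text):
    """Turn <i>...</i> into _..._ for the plain-text edition."""
    return re.sub(r"<i>(.*?)</i>", r"_\1_", marked_text, flags=re.DOTALL)

# Period inside the italics versus outside: both survive as typed.
inside = italics_to_underscores("He read <i>Punch.</i>")
outside = italics_to_underscores("He read <i>Punch</i>.")
```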

But back to my point. I expect that we will end up with a combination of 
automatic tools and manual intervention. Exactly what will happen where in 
the process remains to be determined. We'll try something, which inevitably 
won't be the right thing, and we will proceed with incremental changes until 
we end up with a system that works reasonably well. Ideally the output of 
that system will be an XML master file that can then be used to generate 
versions in whatever form any individual user requests.

And answering another request from another message: I have LOTS of scanned 
title pages. Where would you like them?

JulietS
DP Site Admin

----- Original Message ----- 
From: "Joshua Hutchinson" <joshua@hutchinson.net>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Sent: Friday, October 22, 2004 8:54 AM
Subject: Re: [gutvol-d] Re: barriers to XML posting


----- Original Message -----
From: Steve Thomas <stephen.thomas@adelaide.edu.au>
>
> A question (possibly better put over on the DP list):
>
> Is it possible to OCR a scan directly to XML? Or is the output
> from OCR always going to be text?
>

That is a very DP related question, but I'll answer here as best as I 
understand the future plans (and let others correct me where needed).

The plan at DP is to move from the current 2 round proofing model to a 
(probably) 4 round proofing/markup model.

The content provider will take the scans and OCR them normally.  That part 
doesn't change.

Then, there are 2 rounds of proofing that concentrate on typos, spelling, 
etc.  Very similar to the 2 rounds we have now.

Then, there are 2 MORE rounds of markup.  Here is where all the markup like 
poetry, italics/bold, footnotes, chapter headings, thoughtbreaks, etc, etc 
are done.

Then, when the final result gets out of 4 rounds, it is nicely marked up (in 
theory) XML.  The post-processor does his/her normal magic, combining all 
the pages, running validators on it, etc.

As far as the OCR process, we currently run some pre-processors on text to 
fix common scannos, etc.  I'd be surprised if those pre-processors didn't 
improve/change as the XML world emerges at DP.

Josh


From joshua at hutchinson.net  Fri Oct 22 07:21:22 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 07:21:46 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> Jim Tinsley wrote:
> 
> > That
> > process must work for _all_ teixlite files, not just ones that are
> > specially cooked, using constraints not specified within the chosen
> > DTD. Here's where we hit the rocks today. 
> 
> Impossible. There are things you cannot specify in a DTD but still must 
> be followed to get a semantically correct file. (This holds for every 
> XML application not just for PGTEI.) You always have to obey some extra 
> rules besides validity. These are put down in the PGTEI guide.
> 

Hmm... Maybe I misunderstand here.  If a file comes in, marked up in TEI-Lite and we cannot transform it with our standard process, it seems to me either the DTD we've chosen is incomplete or the TEI markup has a bug.

Now, if a new text needs a feature not in our current DTD (am I using the terminology right here), I'm not against modifying the DTD standard to include it, but there would need to be some procedure to do it so that it gets "reviewed" by others first.

Or, maybe there is a way to define new elements that are outside the standard DTD within the XML submission file itself?  Again, I'm trying to learn this as I go, so if my question is stupid, I apologize in advance.

> 
> > The
> > only things we must have -- both for our own internal practical
> > purposes and for the use of future readers -- is that it should work
> > reliably on _all_ texts that conform to the XML DTD chosen, be open
> > source, and be cross-platform. A reader needs to be able to tweak the
> > transform and re-run on her own desktop. 
> 
> Same as above. The DTD is not strict enough (RelaxNG will be better, but 
> it's still early). There will always be valid TEI files that do not 
> transform to `correct' output files.
> 
> I don't see why it is necessary for the conversion tools to run on 
> everybody's desktop before we can start posting files. If the tools run 
> on pglaf.org and gutenberg.org that is more than enough for a start. The 
> tools can be fixed later. That won't make posted valid TEI files invalid.
> 

If we have the tools on the server and available for use, that is sufficient for me.  But I also think that all the files (DTD, XSLT, and whatever else) should always be available for download for the industrious person that DOES want to run it on their own machine.

Josh
From joshua at hutchinson.net  Fri Oct 22 07:32:49 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 07:33:13 2004
Subject: [gutvol-d] A layman's critique of PGTEI
Message-ID: <20041022143249.9E4889E945@ws6-2.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> If you can spare the time, it would help immensely if you could grab 
> some representative scanned title pages at DP and put them up somewhere 
> for everybody to see so we could discuss how to mark them up.
> 

Hey, this I can do!

I'll try to get a good cross section of project types.

American Missionary (periodical) - http://www.pgdp.net/projects/projectID3f1ea8bfa6d0c/227.png

Manhood Perfectly Restored (non-fiction pamphlet) - http://www.pgdp.net/projects/projectID4173613f31c06/003.png

Mike Flannery On Duty and Off (novel fiction) - http://www.pgdp.net/projects/projectID4154ff24abb42/002.png

The History of Woman Suffrage (non-fiction) - http://www.pgdp.net/projects/projectID403a76a8ebb0f/0001.png

Josh
From Bowerbird at aol.com  Fri Oct 22 08:13:53 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 08:14:23 2004
Subject: [gutvol-d] Aside on old computers
Message-ID: <9a.178ae7fc.2eaa7db1@aol.com>

carlo said:
>   There should be something wrong 
>   if in 12 years what was good for 
>   a half-million-dollar research project 
>   isn't even good for a forest village.

well, that technology _did_ move _very_ fast
in those 12 years, so i don't believe one could
always say that unequivocally -- for instance,
there's no reason to make the third world wait
12 years to get the cell-phones we have now,
when we can give them to them immediately --
but nonetheless, it _is_ true that 12-year-old
computers _can_ display an electronic-book
just _fine_, if we use its resources _wisely_,
rather than imposing bloatware on it instead...

it's possible to build a state-of-the-art viewer
that runs under windows95.  i know, i've done it.

-bowerbird
From marcello at perathoner.de  Fri Oct 22 08:38:07 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 08:38:35 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <02ed01c4b842$1c31baf0$6501a8c0@Unicorn>
References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com>
	<02ed01c4b842$1c31baf0$6501a8c0@Unicorn>
Message-ID: <4179295F.1010702@perathoner.de>

Juliet Sutherland wrote:

> And answering another request from another message: I have LOTS of 
> scanned title pages. Where would you like them?

Anywhere I can get them.

Or, if you prefer, zip and mail them to me. I can put them up at PG.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Gutenberg9443 at aol.com  Fri Oct 22 08:39:07 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct 22 08:39:38 2004
Subject: [gutvol-d] English and request
Message-ID: <155.41b2bdc1.2eaa839b@aol.com>

 
In a message dated 10/22/2004 4:16:12 AM Mountain Standard Time,  
marcello@perathoner.de writes:

>>But I fail to see how this could possibly imply that people who live in
>>Africa wanted to learn English.


English is becoming, if you will pardon a rather risible expression, the
lingua franca of the business world. Most diplomacy is done in French and
English. Of course someone who expects his/her offspring to remain in exactly
their present location and circumstances has no need to learn, or to teach
their offspring, any languages other than those spoken locally. But many small
African countries have several different local languages--Ghana comes to mind
at once, with three languages and many dialects of those languages. English is
the official language there, because it's the only way that the country can
get its business done when people of one cultural group can't even talk with
people of another cultural group five miles away.
 
Let's look at another small country, not in Africa. This is a quotation from
the online version of World Book Encyclopedia:

"New Guineans speak more than 700 languages. Because of the number of
languages, many people cannot communicate with neighbors who live only a short
distance away. A growing number of eastern New Guineans speak Pidgin English, or
Tok Pisin, as a second language. This lingua franca, or common language,
enables speakers of different tongues to communicate with one another. In the
west, many people speak Malay as a second language."

I have read elsewhere that those 700 languages involve 48 different language
families.

Tok Pisin works, but it is too awkward to use for anything more than local
conversation. "Belly belonga me walk about too much" is an awkward way of
saying "I have an upset stomach," and "big feller you punch him teeth him cry"
doesn't immediately make me think of a piano. Malay is much better, but it
still isn't a language that will allow somebody to get into the worldwide
market.

Afghanistan has three languages. Most middle and upper class Afghans also
speak Farsi. I don't remember how many languages India has, but it's a lot.
 
Most Americans do not understand that many Europeans routinely speak several
languages, and do not realize that learning another language, or two or
three other languages, should ideally start in infancy.

Ideally, everybody worldwide would learn at least French and English in
addition to their own local languages. In the real world, that's not going to
happen. I have found that from speaking English and Spanish and having a working
knowledge of Latin and linguistics, I can read fairly well in Portuguese and
Italian. I miss a lot of words, but I can get the gist of what I'm reading.
I'm not up to reading a French textbook, but I can usually wade through an
article in Le Figaro. I'm hopeless in German, even if I recognize the root
words, because I do not comprehend the way German is put together.

I am not trying to be insular and I am not insulting anybody else's
language. I hate to see any language die, because just about every language is
able to express at least one thing that other languages can't. For reasons I do
not understand, ancient Egyptian translates better into German than into French
or English; therefore Egyptologists must learn German. I have to limp along on
English translations, realizing that they are inadequate.

I am certainly not saying that everybody in the world has to learn English.
What I am saying is that, like it or not, it is one of the languages one must
have to progress very far in life. Therefore, I think that books should be
made available in English to as close as possible to everybody. I would be
overjoyed if as many books were available in other languages, especially French,
as are available in English.
 
I hope this clarifies my position.
 
Anne

From scott_bulkmail at productarchitect.com  Fri Oct 22 09:00:34 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Fri Oct 22 09:05:52 2004
Subject: [gutvol-d] scanned title pages
In-Reply-To: <02ed01c4b842$1c31baf0$6501a8c0@Unicorn>
References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com>
	<02ed01c4b842$1c31baf0$6501a8c0@Unicorn>
Message-ID: <p06110406bd9edd2f6eac@[192.168.0.52]>

>And answering another request from another message: I have LOTS of scanned title pages. Where would you like them?

If there is diskspace + bandwidth to host and serve them, I think it would be useful to post every title page somewhere.  Or, at least to start with the top 100 or 1000 or some reasonable subset.

For example, I recently did a massive review of the "catalog" data (posted to GUTCAT, alas with minimal response).  With so many inconsistencies between various sources, I would like to be able to reference the original.
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From marcello at perathoner.de  Fri Oct 22 09:09:25 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 09:09:52 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>
References: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>
Message-ID: <417930B5.5040907@perathoner.de>

Joshua Hutchinson wrote:

> Hmm... Maybe I misunderstand here.  If a file comes in, marked up in
> TEI-Lite and we cannot transform it with our standard process, it
> seems to me either the DTD we've chosen is incomplete or the TEI
> markup has a bug.

Consider the following examples.

A DTD-based validator can catch this:

   <address>
     <date>01 Jan 2004</date>
   </address>

because a date has no business inside an address.

But not this:

   <address>
     <name>Chicago</name>
     <street>2830 North Clark</street>
     <place>Curl Up and Dye Beauty Salon</place>
   </address>

The validator cannot know that the markup is all wrong. Of course this 
will _transform_ all right.
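
The difference between the two cases can be sketched in Python. The table below is a hypothetical stand-in for a DTD content model (the real TEI DTD is far larger, and a real validator does much more); it only records which child elements may appear inside <address>. The check catches the misplaced <date> but has no way to notice that the labels in the second example are swapped.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD content model: which children may
# appear inside which element.
ALLOWED_CHILDREN = {
    "address": {"name", "street", "place"},
}

def structural_errors(xml_text):
    """List every child element that the content model does not allow."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # no rule recorded for this element
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed in <{parent.tag}>")
    return errors

DATE_IN_ADDRESS = "<address><date>01 Jan 2004</date></address>"
SWAPPED_LABELS = """<address>
  <name>Chicago</name>
  <street>2830 North Clark</street>
  <place>Curl Up and Dye Beauty Salon</place>
</address>"""

caught = structural_errors(DATE_IN_ADDRESS)  # structure is wrong: reported
missed = structural_errors(SWAPPED_LABELS)   # meaning is wrong: not reported
```

Catching the second kind of mistake takes a human reader or extra rules written down outside the schema.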


> Now, if a new text needs a feature not in our current DTD (am I using
> the terminology right here), I'm not against modifying the DTD
> standard to include it, but there would need to be some procedure to
> do it so that it gets "reviewed" by others first.

TEI has a well documented interface for exactly this purpose.

Experience has shown that not even the full TEI can accommodate all 
cases. So, if you need to mark up something completely new, e.g. the 
message you just got from an alien civilization, you can expand the TEI 
DTD and still conform to the TEI standard.


> Or, maybe there is a way to define new elements that are outside the
> standard DTD within the XML submission file itself?  Again, I'm
> trying to learn this as I go, so if my question is stupid, I
> apologize in advance.

No. All you can define inside an XML file is the DTD (or other schema) 
you want to use and entities like &myentity;
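
A short Python sketch of the entity case, using the standard library parser. The entity name comes from the message above; its replacement text and the <p> wrapper are made up for the example. An entity declared in the file's own DOCTYPE is expanded on parsing, while an undeclared one is an error.

```python
import xml.etree.ElementTree as ET

# The internal subset of the DOCTYPE declares one entity.
DOC = """<?xml version="1.0"?>
<!DOCTYPE p [
  <!ENTITY myentity "Project Gutenberg">
]>
<p>Posted by &myentity;.</p>"""

paragraph = ET.fromstring(DOC)
expanded = paragraph.text  # the parser has already substituted the entity

# An entity that was never declared makes the file unparseable.
try:
    ET.fromstring("<p>&undeclared;</p>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```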

Of course you can use a DTD that defines some stuff and then includes 
the standard TEI DTD. But, as said above, there is a better way to do 
that in TEI.


> If we have the tools on the server and available for use, that is
> sufficient for me.  But I also think that all the files (DTD, XSLT,
> and whatever else) should always be available for download for the
> industrious person that DOES want to run it on their own machine.

Already done. Start here:

   http://www.gutenberg.org/tei/


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Fri Oct 22 09:25:11 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 09:25:47 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <1f4.15d8e76.2eaa8e67@aol.com>

marcello said:
>   That may not be simple but is still better than what you have now:
>   if the txt file happens to lack the page numbers 
>   there is no way you could get them short of redoing the book.

you wouldn't need to "redo the book" to insert page number information.
and if the .html file had that information, you could do it automatically.

and if page-number information _was_ included in the text-file,
i would support it in my viewer-program.  so tony, or any other user,
could simply toggle its display on or off, by choosing a menu-item.
it's ridiculous to have users go through all the difficulty of doing a
conversion to access such a simple and basic piece of information.
y'all should step back and look at yourselves for even suggesting it.

and why should the text-file "happen to lack the page numbers" in the 
first place?  that kind of terminology makes it sound so "accidental".

distributed proofreaders retains page-number information through
all of its processes, because, get this, they find it's useful to them.
but then they drop it from the final product!  why?  don't they realize
that someone else might find it useful?  of course they do, that's why
people have started _retaining_ it (i almost said "including" it, but
that's the same type of error) in the .html versions.  but nonetheless,
it is still dropped from the text-file.  just like the information about
the names of image-files and their correct placement.  it's as if there
was a conscious attempt to make the text-files as useless as possible.

and what will the users at large think when they are informed that
this policy is in place?  i don't know for sure, but i'm gonna find out.

-bowerbird
From Bowerbird at aol.com  Fri Oct 22 09:30:52 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 09:31:25 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <d5.19a590f5.2eaa8fbc@aol.com>

marcello said:
>   But this is sooo tedious!  We have to replicate the exact setup of 
>   gutenberg.org *and* pglaf.org to get reliable results from the beta-test.
>   Example: my servers are all debian and have perl 5.8 
>   whereas ibiblio is redhat enterprise with perl 5.6. 
>   This has often before given me headaches because 
>   programs that ran at home mysteriously failed at ibiblio.

if it can't even span 2 versions of linux
which run perl that is .2 versions apart,
that means the process is very fragile;
not nearly as robust as x.m.l. advocates
always make it sound in their sales pitch.

-bowerbird
From Bowerbird at aol.com  Fri Oct 22 09:34:16 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 09:34:48 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <8e.181c91ac.2eaa9088@aol.com>

holden said:
>   So, whereas before, you had to have the standard TXT or HTML versions 
>   because that was all that was available, now we can actually talk about 
>   making customised versions as people want them. 

so every time a person wants to switch an option,
they have to go and do the conversion over again?

do you really not see why that won't appeal to them?

-bowerbird
From joshua at hutchinson.net  Fri Oct 22 09:38:42 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 09:39:09 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <20041022163842.776FD2F8CB@ws6-3.us4.outblaze.com>

----- Original Message -----
From: Bowerbird@aol.com
> 
> holden said:
> >   So, whereas before, you had to have the standard TXT or HTML versions 
> >   because that was all that was available, now we can actually talk about 
> >   making customised versions as people want them. 
> 
> so every time a person wants to switch an option,
> they have to go and do the conversion over again?
> 
> do you really not see why that won't appeal to them?
> 

As opposed to not having an option at all, like we have now?

So, they can either have exactly what they have now... or if they want to, they can have more with a little extra effort.

You're right, we shouldn't give them anything new.

Oh, and don't tell me the reader program should do it.  The reader program will never be the same for every reader, even if you do actually produce a working program.  The only way to provide this information that is platform/reader program independent is to somehow put that into the source in a standard format that multiple reader programs will support.  So far, XML is the *only* format anyone has suggested that will allow that.

Josh
From marcello at perathoner.de  Fri Oct 22 09:44:25 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 09:44:53 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <1f4.15d8e76.2eaa8e67@aol.com>
References: <1f4.15d8e76.2eaa8e67@aol.com>
Message-ID: <417938E9.7030307@perathoner.de>

Bowerbird@aol.com wrote:

> and if page-number information _was_ included in the text-file,
> i would support it in my viewer-program.  so tony, or any other user,
> could simply toggle its display on or off, by choosing a menu-item.

You are being narrow-minded about this.

What if the user wanted to see *only* the page numbers. Your reader does 
not support that.

But with a simple XSL transformation the user can easily strip all 
except the page numbers from the TEI master file.
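
The same idea, sketched in Python rather than XSLT so that it runs with nothing but the standard library. It assumes page breaks are recorded as <pb n="..."/> elements, as TEI allows; the sample text and function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample with three page breaks.
TEI_BODY = """<body>
<pb n="1"/><p>The first page starts here.</p>
<p>This paragraph runs<pb n="2"/> across a page break.</p>
<pb n="3"/><p>The last page.</p>
</body>"""

def page_numbers(tei_text):
    """Return the n attribute of every <pb/> (page break), in document order."""
    root = ET.fromstring(tei_text)
    return [pb.get("n") for pb in root.iter("pb")]

def text_without_page_breaks(tei_text):
    """The opposite view: running text only, page breaks ignored."""
    root = ET.fromstring(tei_text)
    return " ".join("".join(root.itertext()).split())

pages = page_numbers(TEI_BODY)
running_text = text_without_page_breaks(TEI_BODY)
```

Both views come from the same master file, which is the point being made: what to show is decided at conversion time, not at markup time.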


-- 
Marcello Perathoner
webmaster@gutenberg.org

From shalesller at writeme.com  Fri Oct 22 10:03:29 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Oct 22 10:03:59 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <20041022170330.0589F4BDAA@ws1-1.us4.outblaze.com>

Steve Thomas writes:

> Is it possible to OCR a scan directly to XML? Or is the output 
> from OCR always going to be text? 

We don't usually scan to text; we scan to RTF, and guiprep extracts
some of the markup and converts it to lightly marked up text. guiprep
could certainly convert the RTF to XML if we wanted, but DP plans
to separate the markup and proofing rounds.

From Bowerbird at aol.com  Fri Oct 22 10:10:52 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:11:28 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <df.452f944.2eaa991c@aol.com>

joshua said:
>   As opposed to not having an option at all, like we have now?

except there's a _much_ easier way to give them the option.

it's the same way you do it with .html -- put it into the file
in a way that enables them to turn it on and off as they wish.

if you're not imaginative enough to figure out how to do that,
then just sit on your hands until you see how i accomplish it.

if you ain't gonna put the information in the file,
no viewer on the surface of the planet can put it in.
but that's _your_ fault, not the fault of the program.
and it shows you don't have the user's interest at heart.


>   Oh, and don't tell me the reader program should do it.

telling _you_ is an exercise in futility.

but when i tell other people that a reader-program should do it,
and give them one which actually does it, _they_ will understand.
and then _they_ will start telling you to include that information.

i've tried speaking to "the powers that be" directly, and found them
nonresponsive, so i'll route stuff through the _users_ from now on.

-bowerbird
From Bowerbird at aol.com  Fri Oct 22 10:18:06 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:18:38 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <cf.1a1c0f40.2eaa9ace@aol.com>

marcello said:
>   You are being narrow-minded about this.

that's an _incredibly_ stupid thing to say.


>   What if the user wanted to see *only* the page numbers. 
>   Your reader does not support that.

well, my program doesn't support original-page-numbers
_at_all_ yet, we're talking about what i _will_ implement,
so i don't see how you can assert anything on the matter.

i'm not sure i understand the request, anyway -- 
you want to see _only_ the page numbers, nothing else?,
i don't see any utility in that, you would have to explain it,
but since the user can control the color of the text, they'd
just match it to the background color to make it "disappear" 
-- but if it is something that readers will want,
i _will_ support it.

-bowerbird
From jonathan_ingram at yahoo.com  Fri Oct 22 10:19:11 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Fri Oct 22 10:19:37 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <df.452f944.2eaa991c@aol.com>
Message-ID: <20041022171911.82526.qmail@web41721.mail.yahoo.com>


--- Bowerbird@aol.com wrote:

> joshua said:
> >   As opposed to not having an option at all, like we have now?
> 
> except there's a _much_ easier way to give them the option.
> 
> it's the same way you do it with .html -- put it into the file
> in a way that enables them to turn it on and off as they wish.

That's what we do, and the reader is any Mozilla derivative. If you don't wish
to use a Mozilla derivative, then when we switch to a TEI-based master format,
you can generate files to be used in other readers, such as any text editor, and
which will contain as much of the information from the original as you require.
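
A minimal sketch of what "turn it on and off" could look like once the
page breaks are recorded in the file. It assumes the breaks are stored
inline as TEI-style <pb n="..."/> milestones; the function name `render`
and the bracketed display form are made up for illustration and are not
taken from any existing PG tool.

```python
import re

# Assumption: page breaks are recorded inline as <pb n="12"/> milestones.
PB = re.compile(r'<pb n="([^"]*)"\s*/>')

def render(text, show_pages=True, show_text=True):
    """Return a display string with page numbers and/or running text."""
    if show_text:
        # Either show each break as [p. N] or drop it silently.
        return PB.sub(lambda m: f"[p. {m.group(1)}]" if show_pages else "", text)
    # Text hidden: list only the page numbers, if requested.
    return " ".join(f"[p. {n}]" for n in PB.findall(text)) if show_pages else ""

sample = 'It was a dark <pb n="12"/>and stormy night.'
print(render(sample))                    # text with page markers
print(render(sample, show_pages=False))  # text only
print(render(sample, show_text=False))   # page markers only
```

Because the information lives in the file, each display mode is a
choice made at rendering time rather than a property of the text.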

-- 
Jon Ingram


_______________________________
Do you Yahoo!?
Declare Yourself - Register online to vote today!
http://vote.yahoo.com
From marcello at perathoner.de  Fri Oct 22 10:19:14 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 10:19:46 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <d5.19a590f5.2eaa8fbc@aol.com>
References: <d5.19a590f5.2eaa8fbc@aol.com>
Message-ID: <41794112.9050309@perathoner.de>

Bowerbird@aol.com wrote:

> if it can't even span 2 versions of linux
> which run perl that is .2 versions apart,
> that means the process is very fragile;
> not nearly as robust as x.m.l. advocates
> always make it sound in their sales pitch.

Bumbling along without a clue, as usual?

perl has nothing to do with XML and the software I was speaking about 
drives the catalog.


-- 
Visit the Bowerbird Fansite: www.gnutenberg.de/bowerbird/
From jonathan_ingram at yahoo.com  Fri Oct 22 10:22:49 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Fri Oct 22 10:23:17 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <cf.1a1c0f40.2eaa9ace@aol.com>
Message-ID: <20041022172249.81135.qmail@web41727.mail.yahoo.com>

--- Bowerbird@aol.com wrote:
> i'm not sure i understand the request, anyway -- 
> you want to see _only_ the page numbers, nothing else?,
> i don't see any utility in that, 

One of the advantages of using a standard, structured, and well-supported
format for marking our texts is that we can do things with them that you don't
see the utility of.

-- 
Jon Ingram


From Bowerbird at aol.com  Fri Oct 22 10:28:25 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:28:58 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <1b9.463d0db.2eaa9d39@aol.com>

marcello said:
>   Bumbling along without a clue, as usual?

your process is _fragile_.  and everybody can see that.

the person who would be the emperor's tailor is naked.

-bowerbird
From marcello at perathoner.de  Fri Oct 22 10:29:45 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 10:30:15 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <df.452f944.2eaa991c@aol.com>
References: <df.452f944.2eaa991c@aol.com>
Message-ID: <41794389.9080702@perathoner.de>

Bowerbird@aol.com wrote:

> if you ain't gonna put the information in the file,
> no viewer on the surface of the planet can put it in.
> but that's _your_ fault, not the fault of the program.
> and it shows you don't have the user's interest at heart.

Assuming a worm has eaten himself thru a fat and juicy word in a book.
How do you mark that up in ZML ?

This is how we do it in TEI:

   B<gap type="wormhole" />werbird

And you ? Will there be a menu item to show / hide wormholes ?

Or are you gonna sacrifice the user's interest over an inadequacy of your 
reader program ?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Fri Oct 22 10:32:56 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:33:35 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <9a.178fd9b0.2eaa9e48@aol.com>

ingram said:
>   One of the advantages of using 
>   a standard, structured, and well-supported format 
>   for marking our texts is that we can do things with them 
>   that you don't see the utility of.

if users want something, i'll provide it,
even if i _cannot_ see the utility of it...

but perhaps _you_ can explain to me
the utility of showing page-numbers
and nothing else?  because, for the
_life_ of me, it escapes me right now.

i've only thought about it for 5 minutes,
and maybe something will come to me
just as soon as i hit "send" on this, but...

-bowerbird
From jon at noring.name  Fri Oct 22 10:35:32 2004
From: jon at noring.name (Jon Noring)
Date: Fri Oct 22 10:36:06 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <20041022170330.0589F4BDAA@ws1-1.us4.outblaze.com>
References: <20041022170330.0589F4BDAA@ws1-1.us4.outblaze.com>
Message-ID: <108705515875.20041022113532@noring.name>

D. Starner wrote:
> Steve Thomas writes:

>> Is it possible to OCR a scan directly to XML? Or is the output 
>> from OCR always going to be text? 

> We don't usually scan to text; we scan to RTF, and guiprep extracts
> some of the markup and converts it to lightly marked up text. guiprep
> could certainly convert the RTF to XML if we wanted, but DP plans
> to separate the markup and proofing rounds.

It is certainly possible to OCR directly to "XML", but it won't be
very useful XML. It is nigh impossible to train an OCR program, unless
we get breakthroughs with AI so we can build machines with human
intelligence, to unambiguously recognize and markup the *structure*
and *semantics* of documents and textual content (such as using the
TEI vocabulary designed for this purpose.)

Thus, there must be substantial human interaction to determine what
any chunk of text represents (structurally/semantically).

Of course, if the goal is simply to "clone" the original printed
text's visual presentation, then forget the above. But then the
resulting cloned text is a lot less useful for repurposing, for
accessibility and for other advanced purposes.

Jon Noring

From Bowerbird at aol.com  Fri Oct 22 10:46:02 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:46:37 2004
Subject: [gutvol-d] aspects of  a well-done e-book
Message-ID: <1b9.4659080.2eaaa15a@aol.com>

marcello said:
>   Assuming a worm has eaten himself
>   thru a fat and juicy word in a book.
>   How do you mark that up in ZML ?

with an annotation.

or with a form of emphasis you'd label as "wormhole".

(how is this _currently_ indicated in the e-texts?
oh, never mind, don't answer that, nobody cares...)

besides, this is the kind of question you should be asking
on the beta-test listserve for the z.m.l. viewer-program...

people on _this_ list don't need to read about
the simple implementational details of z.m.l.,
they just need to know that it's a good way to
escape from the complexity of x.m.l. markup,
a way that gives powerful e-book functionality,
and yet still resonates with the plain-text files
they are familiar with from project gutenberg...

-bowerbird
From hart at pglaf.org  Fri Oct 22 10:53:31 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Oct 22 10:53:32 2004
Subject: [gutvol-d] Languages in PG
In-Reply-To: <Pine.GSO.4.58.0410201530130.16427@vtn1.victoria.tc.ca>
References: <1e3.2c7b180e.2ea83cf0@aol.com>
	<Pine.GSO.4.58.0410201530130.16427@vtn1.victoria.tc.ca>
Message-ID: <Pine.LNX.4.60.0410221052520.20403@pglaf.org>


Don't forget all the languages available at pgcc.net


Michael

From joshua at hutchinson.net  Fri Oct 22 10:53:33 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 10:54:00 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <20041022175333.EC53EEDF76@ws6-1.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> Joshua Hutchinson wrote:
> 
> > Hmm... Maybe I misunderstand here.  If a file comes in, marked up in
> > TEI-Lite and we cannot transform it with our standard process, it
> > seems to me either the DTD we've chosen is incomplete or the TEI
> > markup has a bug.
> 
> Consider following examples.
> 
> A DTD-based validator can catch this:
> 
>    <address>
>      <date>01 Jan 2004</date>
>    </address>
> 
> because a date has no business inside an address.
> 
> But not this:
> 
>    <address>
>      <name>Chicago</name>
>      <street>2830 North Clark</street>
>      <place>Curl Up and Dye Beauty Salon</place>
>    </address>
> 
> The validator cannot know that the markup is all wrong. Of course this 
> will _transform_ all right.
> 
> 

Ok, I am learning here, honest.  But here's another dumb question in the meantime.

Shouldn't a TEI-Lite validator flag your second example as wrong, too?  Looking over the TEI-Lite documentation, you could markup that information, but in a slightly different format.

<address>
  <addrLine><name type="city">Chicago</name></addrLine>
  <addrLine><name type="street">2830 North Clark</name></addrLine>
  <addrLine><name type="place">Curl Up and Dye Beauty Salon</name></addrLine>
</address>

Now, putting aside the fact that I doubt I'd ever bother to mark up an address to that exacting of a detail ;) ... Am I understanding the role of the validator properly in that it should choke on the first, since it doesn't, as far as I can tell, conform to TEI-Lite?

> No. All you can define inside an XML file is the DTD (or other schema) 
> you want to use and entities like &myentity;
> 
> Of course you can use a DTD that defines some stuff and then includes 
> the standard TEI DTD. But, as said above, there is a better way to do 
> that in TEI.

That seems acceptable to me.  For instance, to continue your example above, if you wanted to add <place>, <name>, and <street> to the <address> markup, then you could put those elements in your personal DTD, which calls the TEI-Lite DTD, and then the validator should be able to parse it as acceptable code, right?

But the question then becomes, will the standard transform be able to handle the new code in your DTD?  If it just ignores what it doesn't understand, that would be acceptable, I'd think.  But if the new tags cause the transform to choke, then we'd have a problem.


Josh
From marcello at perathoner.de  Fri Oct 22 10:53:36 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 10:54:04 2004
Subject: [gutvol-d] aspects of  a well-done e-book
In-Reply-To: <9a.178fd9b0.2eaa9e48@aol.com>
References: <9a.178fd9b0.2eaa9e48@aol.com>
Message-ID: <41794920.5050109@perathoner.de>

Bowerbird@aol.com wrote:

> but perhaps _you_ can explain to me
> the utility of showing page-numbers
> and nothing else?  because, for the
> _life_ of me, it escapes me right now.

YHBT. YHL. HAND.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Fri Oct 22 11:02:57 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 11:03:24 2004
Subject: [gutvol-d] Languages in PG
Message-ID: <20041022180257.C0C05EDF79@ws6-1.us4.outblaze.com>

That isn't PG, though, so it doesn't really apply to a discussion about the numbers of languages we have support for.

Josh

----- Original Message -----
From: Michael Hart <hart@pglaf.org>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] Languages in PG
Date: Fri, 22 Oct 2004 10:53:31 -0700 (PDT)

> 
> 
> 
> Don't forget all the languages available at pgcc.net
> 
> 
> Michael
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
> 

From marcello at perathoner.de  Fri Oct 22 11:05:45 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 11:06:14 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041022175333.EC53EEDF76@ws6-1.us4.outblaze.com>
References: <20041022175333.EC53EEDF76@ws6-1.us4.outblaze.com>
Message-ID: <41794BF9.6060503@perathoner.de>

Joshua Hutchinson wrote:

> Shouldn't a TEI-Lite validator flag your second example as wrong,
> too?  Looking over the TEI-Lite documentation, you could markup that
> information, but in a slightly different format.

That was just a general example. It was not meant to be specific to TEI.

> Now, putting aside the fact that I doubt I'd ever bother to mark up
> an address to that exacting of a detail ;) ... Am I understanding the
> role of the validator properly in that it should choke on the first,
> since it doesn't, as far as I can tell, conform to TEI-Lite?

It will choke if you validate my example against the TEI DTD. But I 
could write a SOMETHING DTD that validates that example all right.
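
A minimal sketch of that distinction, using a toy content-model check
rather than a real DTD validator. The `ALLOWED` table is a hypothetical
stand-in for a schema that permits <name>, <street>, and <place> inside
<address>; it is not the TEI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each parent may contain.
ALLOWED = {"address": {"name", "street", "place"}}

def structural_errors(xml_text):
    """Report children that the content model does not allow."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag)
        if allowed is None:
            continue  # no rule for this element, nothing to check
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed in <{parent.tag}>")
    return errors

bad_structure = "<address><date>01 Jan 2004</date></address>"
bad_meaning = (
    "<address><name>Chicago</name>"
    "<street>2830 North Clark</street>"
    "<place>Curl Up and Dye Beauty Salon</place></address>"
)

print(structural_errors(bad_structure))  # the misplaced <date> is reported
print(structural_errors(bad_meaning))    # empty: tags are legal, labels are wrong
```

The second document passes because the check only knows which tags may
nest where. Nothing in it can tell that "Chicago" is not a name or that
a beauty salon is not a place in the intended sense.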

> That seems acceptable to me.  For instance, to continue your example
> above, if you wanted to add <place>, <name>, and <street> to the
> <address> markup, then you could put those elements in your
> personal DTD, which calls the TEI-Lite DTD, and then the validator
> should be able to parse it as acceptable code, right?

Yes.

> But the question then becomes, will the standard transform be able to
> handle the new code in your DTD?  If it just ignores what it doesn't
> understand, that would be acceptable, I'd think.  But if the new tags
> cause the transform to choke, then we'd have a problem.

A standard TEI transform will simply ignore all tags it doesn't 
recognize. Just like an HTML browser does.
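
A minimal sketch of that behaviour, written as a small Python renderer
rather than an actual TEI stylesheet. The `RULES` table and the output
conventions are assumptions for illustration only. The fallback mirrors
XSLT's built-in template rules, under which an unmatched element emits
no markup of its own but its text content is still processed.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering rules: element name -> (prefix, suffix).
RULES = {"emph": ("_", "_"), "head": ("\n== ", " ==\n")}

def to_text(elem):
    """Render an element; unknown tags contribute only their content."""
    prefix, suffix = RULES.get(elem.tag, ("", ""))
    parts = [prefix, elem.text or ""]
    for child in elem:
        parts.append(to_text(child))
        parts.append(child.tail or "")
    parts.append(suffix)
    return "".join(parts)

doc = ET.fromstring(
    "<p>Meet at <street>2830 North Clark</street>, <emph>not</emph> later.</p>"
)
print(to_text(doc))
```

Here <street> has no rule, so the tag is dropped and its text survives,
while <emph> is rendered with underscores.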


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Fri Oct 22 11:10:03 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Oct 22 11:10:05 2004
Subject: [gutvol-d] scanned title pages
In-Reply-To: <p06110406bd9edd2f6eac@[192.168.0.52]>
References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com>
	<02ed01c4b842$1c31baf0$6501a8c0@Unicorn>
	<p06110406bd9edd2f6eac@[192.168.0.52]>
Message-ID: <20041022181003.GB24510@pglaf.org>

On Fri, Oct 22, 2004 at 12:00:34PM -0400, Scott Lawton wrote:
> >And answering another request from another message: I have LOTS of scanned title pages. Where would you like them?
> 
> If there is diskspace + bandwidth to host and serve them, I think it would be useful to post every title page somewhere.  Or, at least to start with the top 100 or 1000 or some reasonable subset.
> 
> For example, I recently did a massive review of the "catalog" data (posted to GUTCAT, alas with minimal response).  With so many inconsistencies between various sources, I would like to be able to reference the original.

I do have all of the title pages & verso pages submitted
electronically.  This is thousands and thousands of images.

Our new copyright system makes it relatively easy to just
find one online (though I have not made this feature available -
but it's easy).  I think this will be a good method for
the future.

The older system is not as easy, but I still have the images.
Just email me if you need images for a particular item.  If
you'd rather, I could package up the older clearances (pre-August '04
or so) and get them to you.  It's probably < 2GB total.

N.B., this stuff is not suitable for public redistribution
with our eBooks.  Many scans are not very high quality.  Some
are, and it would be fine with me to make them publicly
available somewhere.  I don't have much opinion about 
including these with the eBooks themselves - that's something
for the producer to decide.  Most title & verso pages are pretty
boring, though, so probably are not worth including as part
of an eBook.
  -- Greg
From joshua at hutchinson.net  Fri Oct 22 11:23:58 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 11:24:26 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <20041022182358.9FF3A4F4E0@ws6-5.us4.outblaze.com>

Excellent!  Your answers are exactly what I was hoping they would be!  (Does this mean I'm starting to get my brain wrapped around this stuff?  ;) )

I think the next step(s) should be making some XML, running it through the transforms, critiquing the output, and improving the transform/CSS until we have a workable process, yes?

I'll try to take a look at some of the XML you and Jeroen have done up to this point and see what the transform on the server does with them.  We've already identified a need for a "standard" title page format, so that'll be the first area I look at.

Josh



From Bowerbird at aol.com  Fri Oct 22 11:29:59 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 11:30:30 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front of your
	eyes)
Message-ID: <12d.4debe45c.2eaaaba7@aol.com>

noring said:
>   It is certainly possible to OCR directly to "XML", 
>   but it won't be very useful XML. 

this is an important point here, folks, pay attention.

although lots and lots of the time, the x.m.l. advocates
talk about x.m.l. as if it were uniformly high-quality --
which it needs to be to spit out all those conversions,
or magically transform into a different breed of x.m.l.,
two qualities that are often discussed as "automatic"
-- the fact of the matter is that x.m.l. markup can be
awful, and serve no real purpose that is of use to us...

you should look at the x.m.l. that some apps churn out.

so even after a decision is made as to which brand of
t.e.i. to use (such as tei-lite), the real hard decisions
about how to implement a markup strategy still exist.

and after those are answered, the even more difficult
decisions about how to actually implement the strategy
will rear their ugly heads.  it's gonna be a long road, folks.


>   It is nigh impossible to train an OCR program, unless
>   we get breakthroughs with AI so we can build machines 
>   with human intelligence, to unambiguously recognize 
>   and markup the *structure* and *semantics* of 
>   documents and textual content (such as using 
>   the TEI vocabulary designed for this purpose.)

if you're waiting for "artificial intelligence" to come through,
you're gonna be waiting for a really long time.


>   Thus, there must be substantial human interaction to determine 
>   what any chunk of text represents (structurally/semantically).

that is the common understanding.

it is also wrong.

it might (or might not) be true of the _semantic_ nature of a "chunk".
(but we can put that matter aside, because _that_ issue is gonna be
difficult enough even when you have _humans_ work on it directly.)

but it is _definitely_ not true for the _structural_ role of a chunk.

in any book that was prepared by a professional typographer,
_presentation_ *is* _structure_, because that is _exactly_ what 
a good typographer does, uses _presentation_ to show _structure_.
that's why humans don't have trouble figuring out a book's structure.

i don't blame people for telling you this.  they don't know any better.
but the mere fact that they don't know any better does _not_ mean
that you have to believe what they tell you.  because they are wrong.

when they tell you the only way you can have that information is to
have humans encode it in a complex markup system, don't believe it.
they are wrong, and their mistake will waste _tons_ of your labor.
their emperor is naked, and you must tell them they need to go away.

you can get that information, easily.  it's right in front of your eyes.

of course, when they willy-nilly flatten the o.c.r. results of a book
to plain text, they throw away most of that valuable information.

but _even_then_, it's possible to ascertain most all of its structure.

for an obvious example, people recognize headers because they are
big and bold.  strip away fontsize and styling, it gets more difficult.
nonetheless, if you're smart, you can still locate headers accurately.
you can even write computer routines that will do it for you.  fast.
i know, because i've written 'em.  other people could write 'em too.
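
A minimal sketch of the kind of routine described here, working from
layout cues alone in flattened plain text. This is not the author's
actual routine; the function name `find_headers`, the 60-character
limit, and the keyword list are all assumptions chosen for illustration,
and a heuristic this naive will misfire on some books.

```python
import re

def find_headers(text):
    """Guess header lines in plain text from layout cues alone."""
    lines = text.split("\n")
    headers = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or len(stripped) > 60:
            continue
        # Cue 1: the line stands alone, with blank lines on both sides.
        before = i == 0 or not lines[i - 1].strip()
        after = i == len(lines) - 1 or not lines[i + 1].strip()
        # Cue 2: it is all capitals or starts with a chapter-like word.
        shouty = stripped.isupper()
        labelled = re.match(r"(chapter|book|part)\s+\w+", stripped, re.I)
        if before and after and (shouty or labelled):
            headers.append(stripped)
    return headers

sample = """CHAPTER I

It was a dark and stormy night. The rain fell in torrents.

THE END OF THE BEGINNING

Morning came, and with it the post."""
print(find_headers(sample))
```

On the sample this picks out the two capitalised lines and skips the
body paragraphs, which also stand alone but carry neither cue.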

i repeat:  in a well-laid-out book, presentation _is_ structure.

and that is the message i have been communicating here for a year.
but nobody here seemed to want to believe it.  your advance notice
period has expired now, so i will go and tell the rest of the world...

-bowerbird
From hart at pglaf.org  Fri Oct 22 12:00:58 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Oct 22 12:01:00 2004
Subject: [gutvol-d] re: coming to michael doorstep with hat in hand
In-Reply-To: <4177A743.6020108@perathoner.de>
References: <8b.17ed6e78.2ea83dc0@aol.com>
	<Pine.LNX.4.61.0410201834500.4629@angst.gnu-designs.com>
	<41770A4E.9030305@hutchinson.net>
	<Pine.LNX.4.61.0410202110350.11249@angst.gnu-designs.com>
	<4177A743.6020108@perathoner.de>
Message-ID: <Pine.LNX.4.60.0410221200310.20403@pglaf.org>


I'm just going to ask all parties concerned to dial it down a few notches.


From hart at pglaf.org  Fri Oct 22 12:08:27 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Oct 22 12:08:27 2004
Subject: [gutvol-d] Languages in PG
In-Reply-To: <20041022180257.C0C05EDF79@ws6-1.us4.outblaze.com>
References: <20041022180257.C0C05EDF79@ws6-1.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0410221206550.20403@pglaf.org>


On Fri, 22 Oct 2004, Joshua Hutchinson wrote:

> That isn't PG, though, so it doesn't really apply to a discussion about the 
> numbers of languages we have support for.

Just making sure people know there are over 100 languages
available there for the taking when/if they want them.

mh
From ke at gnu.franken.de  Fri Oct 22 12:33:49 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Oct 22 13:30:56 2004
Subject: [gutvol-d] Re: Languages in PG
In-Reply-To: <Pine.GSO.4.58.0410201530130.16427@vtn1.victoria.tc.ca> (Andrew
	Sly's message of "Wed, 20 Oct 2004 15:40:15 -0700 (PDT)")
References: <1e3.2c7b180e.2ea83cf0@aol.com>
	<Pine.GSO.4.58.0410201530130.16427@vtn1.victoria.tc.ca>
Message-ID: <shlldylf8i.fsf@tux.gnu.franken.de>

Andrew Sly <sly@victoria.tc.ca> writes:

> Also, the numbers below (taken from the catalog) show that,
> although PG's non-english content can certainly be expanded,
> it is not insignificant:
> French (367)
> German (307)
> Finnish (85)
> Chinese (69)
> Spanish (59)
> Italian (36)

Not too bad.  German is "slow" because many good texts are available
elsewhere.  It starts with http://gutenberg.spiegel.de; continues with
sites dedicated to special authors like Karl May, Arno Schmidt, Novalis,
or Georg Simmel; and does not end with digitizing projects located at
Universities (Göttingen, Trier, München, Bielefeld, Innsbruck).
Especially the Austrian project (alo - austrian literature online:
http://www.literature.at/) is very interesting even if it seems to offer
only PDF "for free".

More German texts are tracked at http://www.litlinks.it

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From joshua at hutchinson.net  Fri Oct 22 13:38:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 13:38:50 2004
Subject: [gutvol-d] Re: Languages in PG
Message-ID: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com>

Interesting... Can any of those sites be raided for content to bolster our German titles?  (I can't read German, so my checking directly wouldn't do me any good!)

Josh


From jon at noring.name  Fri Oct 22 14:20:47 2004
From: jon at noring.name (Jon Noring)
Date: Fri Oct 22 14:21:53 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front of
	your eyes)
In-Reply-To: <12d.4debe45c.2eaaaba7@aol.com>
References: <12d.4debe45c.2eaaaba7@aol.com>
Message-ID: <84719031375.20041022152047@noring.name>

Bowerbird wrote:
> Jon Noring said:

>> It is certainly possible to OCR directly to "XML", but it won't be
>> very useful XML. 

> although lots and lots of the time, the x.m.l. advocates
> talk about x.m.l. as if it were uniformly high-quality --
> which it needs to be to spit out all those conversions,
> or magically transform into a different breed of x.m.l.,
> two qualities that are often discussed as "automatic"
> -- the fact of the matter is that x.m.l. markup can be
> awful, and serve no real purpose that is of use to us...

Definitely. The key is to use the right markup vocabulary and apply it
consistently. Any system representing document structure (such as ZML)
must be "right and sufficient" and be applied consistently.

Those who understand XML know that it is not in and of itself a
specific markup vocabulary; it is a rule-set or framework for
applying markup to textual content. There are an
infinity of markup vocabularies, and a good markup vocabulary depends
upon the purpose of the markup. XML is used for both database and
publishing applications, and there are many extraordinarily successful
applications of XML. One of the most recent applications of XML which
a lot of people recognize and use is RSS, used for blog feeds and the
like. XHTML is used a lot on the Internet, and is no more complex (in
fact it is simpler in many ways) than legacy HTML.

ZML is an example of a "regularized plain text" system to represent
certain important textual document structures in a way which is fully
machine-readable. I could easily create an XML-based markup vocabulary
clone of the ZML system to represent the same identical structures.


>> Thus, there must be substantial human interaction to determine 
>> what any chunk of text represents (structurally/semantically).

> in any book that was prepared by a professional typographer,
> _presentation_ *is* _structure_, because that is _exactly_ what 
> a good typographer does, uses _presentation_ to show _structure_.
> that's why humans don't have trouble figuring out a book's structure.

Definitely. But what we require is to be able to machine-read and
machine-process the structure and semantics of a textual document.
That humans can figure this out from a simple glance at the content
in a high-typographic-quality presentation does not
automatically mean it is easy for machines to do likewise. It is
also not easy to codify because visual presentation is "fuzzy" (pun
not intended), sometimes relying on surrounding context to precisely
define the document structure.

We have to remember that there are a lot of variances in conventions
(both historically and geographically) used for typographic layouts to
visually represent structure and semantics. Not only that, in some
cases they don't even follow conventions, especially when there are
oddities in the content where no convention has been firmly
established. And as previously noted, sometimes the context must be
factored in to fully ascertain structure and semantics.

The "Gedanken" test I use for the minimum requirements of machine-
readable markup (or system such as ZML) for textual documents is if a
text-to-speech engine is potentially capable of communicating the
structure and semantics of the content to a blind listener (who is
unfamiliar with any print conventions -- they've never heard the
terms 'italic' or 'bold') so they can, in real-time (i.e., a one-time
linear audio presentation), gain the same level of comprehension as a
sighted person (familiar with typographic conventions) would in reading
a high-quality print version of the text. Pass this test, and the
markup will likely be pretty good for just about any purpose in
addition to accessibility.

Is ZML or other type of "regularized plain text" (or the XML-based
ZML markup vocabulary analog) sufficient to pass this test?


> when they tell you the only way you can have that information is to
> have humans encode it in a complex markup system, don't believe it.

The system only needs to be as complicated as needed to represent the
needed document structures and content semantics in a machine-readable
way such that it passes the test described above.

The $64,000 question therefore is what structure and semantics need
to be represented in a machine-readable way, and to what degree of
precision. Maybe ZML (and its markup analog) is sufficient, maybe it
isn't.

I interpret from those here who have first-hand experience handling
large numbers of the various types of texts in Project Gutenberg, that
ZML (or any other type of "regularized plain text" system) does not
have sufficient granularity to pass the "test."

Of course, we can argue whether the test as I describe above is too
strict, or maybe not even on-target. But keep in mind this is what the
*accessibility community* wants in machine-readable textual documents,
and what they are working towards in their activities -- they've
wholeheartedly embraced XML-based approaches, for example.

To wave one's hand in dismissal and say they are being unrealistic or
stupid, or that they don't really matter in our decision-making, is a
pretty bigoted and "blind" position (pun intended) to take -- it is
also stupid since meeting their needs for structure and semantics has
many other benefits as well. I might ask a few text-to-speech experts
I know at DAISY to look at the ZML system and tell me if it has
sufficient structural granularity for high-quality text-to-speech
purposes. As far as I am concerned, if they come back and say "no it
doesn't", then I would recommend that PG should not consider ZML for
its Master format, but maybe consider ZML for its plain text output
versions.


> for an obvious example, people recognize headers because they are
> big and bold.  strip away fontsize and styling, it gets more
> difficult.  nonetheless, if you're smart, you can still locate
> headers accurately. you can even write computer routines that will
> do it for you.  fast.  i know, because i've written 'em.  other
> people could write 'em too.

Bold lines which appear by themselves in the flow of text are
sometimes used for structures other than headers. There are many other
similar weirdities involved with italicized text, indented text, etc.,
that we see in visual layouts of texts. Context is often important to
consider to unambiguously discern structure for a visual cue. For
example, one convention often used is that the names of ships are to
be italicized. Thus, if a machine is to discern the name of a ship
from linguistically emphasized text, it has to look at the context.


> i repeat:  in a well-laid-out book, presentation _is_ structure.

No, I'd say it is more accurate to say "for reading by eyesight,
structure is represented by visual presentation cues." Remember,
there are different types of presentation of text, not only visual.
To focus on visual as the only form of presentation that matters is
being very short-sighted (pun intended.)

The only time we must give up and focus only on the visual is when
visual presentation is an important and integral part of the content
itself, such as "poetry as art" and similar avant-garde things. (Here
SVG is of especial appeal, so we have an XML-based solution for this
as well.)


> and that is the message i have been communicating here for a year.
> but nobody here seemed to want to believe it.  your advance notice
> period has expired now, so i will go and tell the rest of the world...

And I've stated the core question to answer is:

"Is ZML (or any other system of regularized plain text) sufficient to
represent document structure and semantics for Project Gutenberg
Master texts?"

I assume Bowerbird is saying "yes", and many others here are saying
"No". I answer the question with a "No". Amusingly, Networker, a very
insightful ebook expert who often posts to The eBook Community, calls
ZML a type of ITF, "Impoverished Text Format", to indicate ZML has
insufficient granularity -- it is "impoverished".

Jon Noring

From jtinsley at pobox.com  Fri Oct 22 16:51:23 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Fri Oct 22 16:51:58 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <41790DFE.3070606@perathoner.de>
References: <20041020135750.11303.qmail@web41728.mail.yahoo.com>
	<41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com>
	<4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com>
	<4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org>
	<4177E9CB.7080200@perathoner.de> <20041022012030.GA23907@panix.com>
	<41790DFE.3070606@perathoner.de>
Message-ID: <20041022235123.GD27926@panix.com>

On Fri, Oct 22, 2004 at 03:41:18PM +0200, Marcello Perathoner wrote:
>Jim Tinsley wrote:
>
>>>I feel Jim is raising artificial objections he knows we cannot overcome. 
>>>If he doesn't want to learn TEI and he doesn't feel like proofing a TEI 
>>>text in emacs, fine. But then, he should step aside and let other people 
>>>do this work.
>>
>>I find this very offensive.
>>
>>I came home, and was reading happily enough through the threads until
>>this.
>
>I am sorry if I spoilt your evening and I apologize for that.

You didn't spoil my evening; just my participation in the thread.
Having been so accused of evilly blocking the righteous progress
of destiny because of my own hidden agenda and neuroses, it's 
hard to say constructive things. 

But I will say one more thing: if you read what I actually _said_
you will realize that almost everyone in this thread -- I would
think -- could fairly easily create an XML and transform that meets
the criteria I laid down. I could myself. So could you, or Jeroen,
for sure. Josh and Jon, no problem. Anyone I've left out?

Of course, none of us could do it for ALL texts. Not yet. But it
doesn't need to be done for all texts; that was explicitly stated.
If somebody wants to set up a standard that works for prose texts
containing Title, Author, Chapter Heads, Paragraphs, Verses,
Letter Headings and Signatures -- plus emphasis and languages, and
try to work with that for a while, that would do. And which of us
could NOT do that with just Xalan or Saxon, a simple XSLT, and
quite a limited HTML-to-text converter?
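
To make the scope concrete, here is a minimal sketch of what such a
limited standard might look like in practice. Everything in it is an
assumption for illustration: the element names (book, title, author,
chapter, head, p, em) are hypothetical and not an agreed PG
vocabulary, and Python's standard library stands in for the Xalan or
Saxon plus XSLT pipeline mentioned above.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical minimal vocabulary; none of these names is a PG standard.
SAMPLE = """<book>
  <title>A Sample Book</title>
  <author>A. N. Author</author>
  <chapter>
    <head>Chapter I</head>
    <p>This is the <em>first</em> paragraph.</p>
    <p>This is the second.</p>
  </chapter>
</book>"""


def inline_text(elem):
    """Flatten an element to one line of text, marking <em> as _text_."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        parts.append(f"_{inner}_" if child.tag == "em" else inner)
        parts.append(child.tail or "")
    # Collapse runs of whitespace left over from the XML layout.
    return " ".join("".join(parts).split())


def to_plain_text(xml_source, width=70):
    """Render the tiny vocabulary above as wrapped plain text."""
    root = ET.fromstring(xml_source)
    blocks = [
        inline_text(root.find("title")).upper(),
        "by " + inline_text(root.find("author")),
    ]
    for chapter in root.iter("chapter"):
        for node in chapter:
            if node.tag == "head":
                blocks.append(inline_text(node).upper())
            elif node.tag == "p":
                blocks.append(textwrap.fill(inline_text(node), width))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_plain_text(SAMPLE))
```

A real setup would more likely keep the rules in an XSLT stylesheet so
that the same source yields both HTML and text; the point of the
sketch is only that a vocabulary this small needs very little code.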

Of course, it wouldn't handle Alice. It wouldn't handle footnotes
or tables. But for books that don't need these features it 
would work fine. There would be some details to work out in how
the PG header works with them, and maybe the XML file itself
should contain a description of how the HTML and text formats
were derived, so that when we fixed the texts we would know how
to remake them, or that some future reader could re-do the transform
to their own tastes.

And it would be good if, having got all that straight, we could set
it up and document it as a standard so that other people wouldn't 
need to reinvent that wheel. It may be limited, but nobody said that
we have to have a standard to cover all cases before tackling any.
And then the people who are interested could go on to add more 
features, enlarging the standard.

I'm surprised, after last year, that nobody has done this already.
I'm surprised that you and Jeroen, who, in your different ways,
had the best shot at XML didn't get together on it. Certainly, Greg
has been asking you both about it. It _would_ be nice if we had a
few people working together on it, so we get a shared understanding
and consensus.

Frankly, what is going to happen is that a few people at DP are
going to forge a workable standard between them. Others will take
it up, and then everyone will be doing it, so personally I'm just
waiting for it to happen.

jim

From Bowerbird at aol.com  Fri Oct 22 17:39:57 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 17:40:36 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front
	of your eyes)
Message-ID: <ba.6368d422.2eab025d@aol.com>


thank you jon, for weighing in...

***

jon said:
>   Those who understand and speak of XML, they know that 
>   XML is not in and of itself a specific markup vocabulary, 

those who _know_ x.m.l. do know that, right.

but some of the people who _speak_ of x.m.l.
do _not_ seem to know it, and they gloss over
all the difficulties without much comprehension.

they think as long as something is "in x.m.l.",
it's gonna have all these magical properties,
when the truth of the matter is that you must
put a lot of sweat into it to get most of them,
sometimes more sweat than they're even worth.


>   there are many extraordinarily successful applications of XML.
>   One of the most recent applications of XML which a lot of people 
>   recognize and use is RSS, used for blog feeds and the like. 

on that you are correct.  any time that you want to exchange data
between incongruent applications, x.m.l. _can_ be a good solution.

(it's not _necessarily_ good, a lot of complications can occur that
mess things up regardless, but the _potential_ is certainly there.)

but even on this "successful" use in the case of r.s.s. and blog feeds,
there is -- as i am sure you know -- a great deal of "controversy"
concerning whether r.s.s. is the best way of doing it, or "atom" is...

and there are additional controversies about _which_ r.s.s. version
is the _best_ one.

and even when all those things get sorted out, what bloggers might
find is they have simply reinvented the wheel previously known as
an announcement listserve, where a missive is sent out to a group
of subscribers and simultaneously added to a cumulative website,
in which case a whole lot of work was done for no real good reason.
but hey, as long as everyone had fun along the way, i guess that's ok.


>   ZML is an example of a "regularized plain text" system 
>   to represent certain important textual document structures 
>   in a way which is fully machine-readable. I could easily create 
>   an XML-based markup vocabulary clone of the ZML system 
>   to represent the same identical structures.

you say that often.  but you've never really told us what the point is.

even if it's possible to represent a simple system in a complex one,
nothing is gained.  you've only lost the benefit you had of simplicity.

and indeed, that's my essence:  use the most simple system possible.


>   Definitely. But what we require is to be able to machine-read and
>   machine-process the structure and semantics of a textual document.

right, and my "machine" (i.e., app) can read and process the structure.

(and we really need to handle "structure" and "semantics" separately,
because semantics is a _lot_ more complex, and much too thorny to
just toss off so casually.  but i'll have more to say on that later...)


>   Even if humans can figure this out by a simple visual glance of 
>   the content in a high-typographic-quality presentation, does 
>   not automatically mean it is easy for machines to do likewise. 

let's put aside the question of how "easy" it is for a machine to do it.
what i have said here, and will say elsewhere, is my routines _can_.
and when i release the proof, other people will know that it's possible,
and they'll then be able to write their own routines that can do it too.
then everyone will wonder why they thought it was so difficult before.


>   It is also not easy to codify because visual presentation is "fuzzy" 
>   (pun not intended), sometimes relying on surrounding context 
>   to precisely define the document structure.

well, you can go on and on about all the reasons why it is difficult.
but once people are doing it, routinely, those "reasons" won't matter.


>   We have to remember that there are a lot of variances in conventions
>   (both historically and geographically) used for typographic layouts 
>   to visually represent structure and semantics. 

so someone will modify their routines to work with those conventions.


>   Not only that, in some cases they don't even follow conventions, 
>   especially when there are oddities in the content where 
>   no convention has been firmly established. 

"oddities" are only "oddities" until someone figures out their pattern.
because if there is no pattern, then nobody understood the structure
in the first place, so there's no way to mark it up using _any_ system.


>   And as previously noted, sometimes the context must be
>   factored in to fully ascertain structure and semantics.

ok, _now_ you're finally getting into the "semantic" part.

if the only way you can understand how to mark up the text
is to actually _understand_ the content, that is _semantic_.

and yes, you need a high level of "intelligence" -- either human or
artificial, and the artificial kind ain't here yet -- to do that markup,
which means that you need humans to do it, and that's why it's costly.

and even if you've got a lot of volunteer labor to throw at the task,
it might not be enough, because this job is also _complex_ to boot.
so you can't just use any volunteers, they have to be highly skilled.

and to top it all off, it's time-consuming, so it's even more costly.

that's why there are very high costs to doing semantic markup,
much higher than the costs of (even manual) structural markup.

and you know what the real kicker is?  even though the _costs_
are sky-high, the _benefits_ of semantic markup ain't that great.
certainly not from the standpoint of the average reader, anyway.
(some scholars might make out, if you coded what they want.)

hey, it's great that the machine can now tell you with certainty 
that the reason "new york times" has been rendered in italics is 
because it's a newspaper.  but the reader _already_knew_that_.
the writer made it clear in the course of setting the context.

i will get to more examples down below, but you get the drift...


>   The "Gedanken" test I use for the minimum requirements 
>   of machine-readable markup (or system such as ZML) 
>   for textual documents is if a text-to-speech engine 
>   is potentially capable of communicating the
>   structure and semantics of the content to a blind listener 
>   (who is unfamiliar with any print conventions -- 
>   they've never heard the terms 'italic' or 'bold')

i doubt you'd find a blind person who's never heard those terms.
but go on...


>   so they can, in real-time (i.e., a one-time linear audio presentation), 
>   gain the same level of comprehension as a sighted person 
>   (familiar with typographic conventions) would in reading
>   a high-quality print version of the text. Pass this test, and 
>   the markup will likely be pretty good for just about any purpose 
>   in addition to accessibility.

not only will a text-to-speech engine be "potentially capable"
of communicating the content to a blind person, i actually
intend to build such an engine right into my viewer-program.

whether or not it delivers the _semantics_ of the content is 
wholly dependent on whether you put that information _into_
the file in the first place.  and -- of course -- that's true of
_any_ markup system.  but z.m.l. will have a way to put it in,
yes, and if you do, then there'll be a way to get it out as well.
you'll have to specify exactly _how_ the text-to-speech engine
should vocalize this info.  but any way you can do it, i can too.


>   Is ZML or other type of "regularized plain text" 
>   (or the XML-based ZML markup vocabulary analog) 
>   sufficient to pass this test?

yes.  that's what i've been saying all along.
that's what the test-suite is all about, baby.


>   The system only needs to be as complicated as needed to 
>   represent the needed document structures and content semantics in 
>   a machine-readable way such that it passes the test described above.

if you can do it, i can too.


>   The $64,000 question therefore is 
>   what structure and semantics need to 
>   be represented in a machine-readable way, 
>   and to what degree of precision. 

different people will require different degrees of "precision".
my target-population is the one michael has always targeted.


>   Maybe ZML (and its markup analog) is sufficient, maybe it isn't.

of course, we can say that about any system, can't we...        ;+)


>   I interpret from those here who have 
>   first-hand experience handling large numbers of
>   the various types of texts in Project Gutenberg, 
>   that ZML (or any other type of "regularized plain text" system) 
>   does not have sufficient granularity to pass the "test."

well, that's how i read the feelings of everyone here
who has chimed in so far on the matter, except myself
and maybe a couple of other people in varying degrees.

but i note once again, for the record, that no one has
yet given me a list of "hard e-texts" that they think
might give my z.m.l. a run for its money on difficulty.
so we really don't have an answer to that yet, do we?


>   Of course, we can argue whether the test 
>   as I describe above is too strict, or maybe not even on-target. 

well, my primary aim is sighted people, so your test is not
"on-target", but that's ok, i understand what your point is.

i should note, however, that blind people seem to me to be
the most delighted group of users that project gutenberg has,
and are probably the people _most_ appreciative of plain text.

all this in spite of the fact that there is _no_ semantic markup 
-- and very little structural markup either -- in the e-texts.
no, it appears the magic formula for _that_ has been simple -- 
get everything else _out_of_the_way_ of the words themselves.

i will let you think about that...


>   But keep in mind this is what the *accessibility community* 
>   wants in machine-readable textual documents, and 
>   what they are working towards in their activities -- they've 
>   wholeheartedly embraced XML-based approaches, for example.

they've been misled to believe the promises just like everyone else.


>   To wave one's hand in dismissal 

it is dishonest to try to imply i am "waving my hand in dismissal".
please don't do that.


>   and say they are being unrealistic or stupid, 

i, of course, have never said anything like that.  don't say that i have.
please don't do that.


>   or that they don't really matter in our decision-making, 

it is unseemly of you to put those kind of words in _my_ mouth.
please don't do that.


>   is a pretty bigoted and "blind" position (pun intended) to take

which is what makes it so distasteful.  so just stop it.
please don't do that.


>   -- it is also stupid since meeting their needs for 
>   structure and semantics has many other benefits as well. 

enough, jon.
please don't do that.


>   I might ask a few text-to-speech experts I know at DAISY 
>   to look at the ZML system and tell me if it has
>   sufficient structural granularity for 
>   high-quality text-to-speech purposes. 

the judgement of bureaucrats doesn't impress me.
i'll listen to the reports of blind users themselves.


>   As far as I am concerned, if they come back and say 
>   "no it doesn't", then I would recommend that 
>   PG should not consider ZML for its Master format

i'm not seeking your endorsement, jon, so please feel free to
make any recommendation to project gutenberg that you want
concerning what they should consider for their master format.


>   but maybe consider ZML for its plain text output versions.

whatever.


>   Bold lines which appear by themselves in the flow of text 
>   are sometimes used for structures other than headers. 

my routines are not so brain-dead as to be confused by that.
but thanks for enlightening me.


>   There are many other similar weirdities involved with 
>   italicized text, indented text, etc., that we see in
>   visual layouts of texts. 

please do let me know about any mistakes that my routines 
make on any e-text in the library if you review my program,
as i am sure there are "weirdities" i've not yet come across.


>   Context is often important to consider to 
>   unambiguously discern structure for a visual cue. 
>   For example, one convention often used is that 
>   the names of ships are to be italicized. Thus, 
>   if a machine is to discern the name of a ship from 
>   linguistically emphasized text, it has to look at the context.

that's a very good example, jon, so i'll discuss it a bit.

my approach is to have the o.c.r. program _retain_text_styling_.
so if the ship-name was italicized in the original book, it would
continue to be italicized in the o.c.r. text (assuming recognition),
and that would carry through all the editing to the final version.

unless the person creating the digital version were to indicate
that those italics represented a ship-name, they would remain as
simple italics, and an end-user would be on her own to know why.
_just_like_she's_on_her_own_when_she_reads_a_paper-book_.

you might consider it to be some huge problem that the reader
doesn't know _exactly_why_ something is being italicized, but 
i don't think it is, because they virtually always figure it out...

even a blind reader can figure it out.  heck, even in the e-texts
with the italics stripped out, the blind reader can figure it out.

if you asked any of those readers -- sighted or blind -- how much
money they would pay to have that information supplied, to assess
how much _value_ they place on it, they would laugh in your face.
and that's _all_ you need to know about _that_ cost-benefit ratio.

in the _rare_ case where that information _might_ be valuable,
i have ways to mark it.  and as soon as you show me those cases,
and show me exactly how your x.m.l. markup provides a solution,
i will be quite happy to show you exactly how i would do it too.


>   No, I'd say it is more accurate to say "for reading by eyesight,
>   structure is represented by visual presentation cues." 

you're talking more about _output_ here.
whereas i am talking about _input_ instead.

i'm talking about how to examine the p-book -- specifically,
the o.c.r. that results from it -- to automatically determine
the structure of the text.  that structured text can then be
rendered visually (on-screen or paper) or via text-to-speech.

when i talk about "presentation", i'm talking about the p-book
that we work with as our original source.

however, in an aside, i've never even heard this _discussed_ yet,
not here or anywhere else for that matter, but the time has come
where we can expect to start seeing (or should i say "hearing")
books that have been "input" using voice-recognition technology.
in other words, the age of scanning might come to an abrupt end,
or taper off significantly, when people start creating e-books by
reading a book aloud into a voice-recognition system.  they are
remarkably improved these days, according to everything i read,
plus their cost might fall _considerably_ in the near future too,
and the number of people who might be willing to "enter" a book
in this manner is probably far greater than those willing to scan.
of course, it will take a new kind of software program to "fix"
the transcription errors that will occur using this input method,
but maybe that's already a part of these systems, i don't know...
not making any predictions here, just keeping my eye open for it.

what this might mean for blind people, i don't even have to say...


>   Remember, there are different types of presentation of text, 
>   not only visual.

the mac has had text-to-speech for well over a decade now, jon,
right in the system.  i've already put it in some of my e-book apps.


>   To focus on visual as the only form of presentation that matters 
>   is being very short-sighted (pun intended.)

good pun, if there can be said to be such a thing...     ;+)
but making the point to me is totally unnecessary.


>   And I've stated the core question to answer is:
>   "Is ZML (or any other system of regularized plain text) 
>   sufficient to represent document structure and semantics 
>   for Project Gutenberg Master texts?"

that _is_ the right question.


>   I assume Bowerbird is saying "yes"

there's no reason to "assume" that i am saying "yes".
i've actually _said_it_, over and over and over again.

and built a test-suite to prove it.


>   and many others here are saying "No". 

well, most everyone who has spoken up has said "no".

(dale and maybe james have given a limp "perhaps".)

and there might be some lurkers who i have convinced.

but by and large, all the loudmouths have loudly said "no".


>   I answer the question with a "No". 

well, thanks for putting yourself firmly on the record jon.
again.


>   Amusingly, Networker, a very insightful ebook expert who 
>   often posts to The eBook Community, calls ZML a type of ITF, 
>   "Impoverished Text Format", to indicate ZML has
>   insufficient granularity -- it is "impoverished".

well, heck, jon, if the only thing i'd ever heard about z.m.l.
was the one-sided "descriptions" you've given it over there,
i would think that it sounded like a ludicrous idea too.

networker will come around when he sees the real thing.
everyone will.  after all, the proof _is_ in the pudding...

-bowerbird
From hart at pglaf.org  Fri Oct 22 17:49:02 2004
From: hart at pglaf.org (Michael Hart)
Date: Fri Oct 22 17:49:05 2004
Subject: [gutvol-d] Re: Languages in PG
In-Reply-To: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com>
References: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com>
Message-ID: <Pine.LNX.4.60.0410221747080.3421@pglaf.org>


On Fri, 22 Oct 2004, Joshua Hutchinson wrote:

> Interesting... Can any of those sites be raided for content to bolster our 
> German titles?  (I can't read German, so my checking directly wouldn't do me 
> any good!)
>
> Josh

I forwarded this directly to our German Team leader to check for us.

Also, anyone interested might also want to take a look at Gunther Hille's
for the Gutenberg Projekt-DE

mh


[DE is the German abbr. for Germany]

>
> ----- Original Message ----- From: Karl Eichwalder <ke@gnu.franken.de> To: 
> gutvol-d@lists.pglaf.org Subject: [gutvol-d] Re: Languages in PG Date: Fri, 
> 22 Oct 2004 21:33:49 +0200
>
>>
>> Andrew Sly <sly@victoria.tc.ca> writes:
>>
>>> Also, the numbers below (taken from the catalog) show that,
>>> although PG's non-english content can certainly be expanded,
>>> it is not insignificant:
>>> French (367)
>>> German (307)
>>> Finnish (85)
>>> Chinese (69)
>>> Spanish (59)
>>> Italian (36)
>>
>> Not too bad.  German is "slow" because many good texts are available
>> elsewhere.  It starts with http://gutenberg.spiegel.de; continues with
>> sites dedicated to special authors like Karl May, Arno Schmidt, Novalis,
>> or Georg Simmel; and does not end with digitizing projects located at
>> Universities (Göttingen, Trier, München, Bielefeld, Innsbruck).
>> Especially the Austrian project (alo - austrian literature online:
>> http://www.literature.at/) is very interesting even if they seem to offer
>> only PDF "for free".
>>
>> More German texts are tracked at http://www.litlinks.it
>>
>> --
>>                                                          |      ,__o
>>                                                          |    _-\_<,
>> http://www.gnu.franken.de/ke/                            |   (*)/'(*)
>> _______________________________________________
>> gutvol-d mailing list
>> gutvol-d@lists.pglaf.org
>> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>>
>
From ke at gnu.franken.de  Fri Oct 22 19:51:52 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Oct 22 20:31:03 2004
Subject: [gutvol-d] Re: Languages in PG
In-Reply-To: <Pine.LNX.4.60.0410221747080.3421@pglaf.org> (Michael Hart's
	message of "Fri, 22 Oct 2004 17:49:02 -0700 (PDT)")
References: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com>
	<Pine.LNX.4.60.0410221747080.3421@pglaf.org>
Message-ID: <shd5zakuyf.fsf@tux.gnu.franken.de>

Michael Hart <hart@pglaf.org> writes:

> On Fri, 22 Oct 2004, Joshua Hutchinson wrote:
>
>> Interesting... Can any of those sites be raided for content to bolster
>> our German titles?

At least, you can use their texts for comparison purposes.  Some of
them are "hidden" behind web interfaces (frames/javascript) and highly
fragmented...

> I forwarded this directly to our German Team leader to check for us.
>
> Also, anyone interested might also want to take a look at Gunther
> Hille's for the Gutenberg Projekt-DE

Gutenberg-DE is now to be found at http://gutenberg.spiegel.de.

-- 
                                                         |      ,__o
                                                         |    _-\_<,
http://www.gnu.franken.de/ke/                            |   (*)/'(*)
From cweyant at twcny.rr.com  Sat Oct 23 05:33:36 2004
From: cweyant at twcny.rr.com (Curtis A. Weyant)
Date: Sat Oct 23 05:29:41 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417930B5.5040907@perathoner.de>
References: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>
	<417930B5.5040907@perathoner.de>
Message-ID: <417A4FA0.3080206@twcny.rr.com>

Marcello Perathoner wrote:

> No. All you can define inside an XML file is the DTD (or other schema) 
> you want to use and entities like &myentity;

That's not true. You can define a full DTD (or a subset of one) within 
the XML document itself if you want. The W3C gives the following example 
in the XML 1.0 (3rd ed.) spec:

	<?xml version="1.0" encoding="UTF-8" ?>
	<!DOCTYPE greeting [
	  <!ELEMENT greeting (#PCDATA)>
	]>
	<greeting>Hello, world!</greeting>

This is a fully valid and well-formed XML file with the DTD defined in 
the DOCTYPE header instead of in a separate DTD file.

Of course, while you _can_ do that, it's probably not the best way.
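
For anyone who wants to try it, here is a small sketch (my own
illustration, assuming Python's standard xml.dom.minidom module) that
parses the W3C example and reads back the document type name. Note
that the underlying expat parser is non-validating: it checks
well-formedness and reads the internal subset, but it does not enforce
the <!ELEMENT> declaration.

```python
from xml.dom import minidom

# The W3C example, with the DTD carried in the internal subset.
DOC = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    '<!DOCTYPE greeting [\n'
    '  <!ELEMENT greeting (#PCDATA)>\n'
    ']>\n'
    '<greeting>Hello, world!</greeting>\n'
)


def describe(xml_text):
    """Return (doctype name, text of the root element)."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    return doc.doctype.name, doc.documentElement.firstChild.data


if __name__ == "__main__":
    print(describe(DOC))
```

Checking the content against the declaration would need a validating
parser, which the standard library does not provide.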

Curtis.
From marcello at perathoner.de  Sat Oct 23 05:44:02 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 05:44:47 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front	of
	your eyes)
In-Reply-To: <ba.6368d422.2eab025d@aol.com>
References: <ba.6368d422.2eab025d@aol.com>
Message-ID: <417A5212.4080700@perathoner.de>

Bowerbird@aol.com wrote:

> and indeed, that's my essence:  use the most simple system possible.

Use the most simple tool that does the job, but don't use a simpler one.

If you had done any research on ebooks before proclaiming yourself a
demi-god, you might have noticed that your toy markup language is woefully
underpowered. You don't even handle the very first page of Sherlock Holmes.

Mark this up in ZML.

Note that "Being a reprint" is a subtitle to "Part I" and not a 
paragraph. Same goes for "Mr. Sherlock Holmes".

Note also that: "John H. Watson" is emphasized, although it's the only 
part of the title that's not italic.


---

PART I.

_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D.,
_late of the Army Medical Department._


CHAPTER I.

MR. SHERLOCK HOLMES.


IN the year 1878 I took my degree of Doctor of Medicine
of the University of London, and proceeded to Netley to go
through the course prescribed for surgeons in the army.
...

---
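
To illustrate the difficulty, here is a deliberately naive heuristic,
sketched in Python. It is an assumption for the sake of argument only:
it is not ZML and not anybody's actual routines. It treats any short,
isolated, all-capitals block as a heading.

```python
SAMPLE = """PART I.

_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D.,
_late of the Army Medical Department._


CHAPTER I.

MR. SHERLOCK HOLMES.


IN the year 1878 I took my degree of Doctor of Medicine
of the University of London, and proceeded to Netley to go
through the course prescribed for surgeons in the army.
"""


def blocks(text):
    """Yield blocks of text separated by one or more blank lines."""
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            yield " ".join(current)
            current = []
    if current:
        yield " ".join(current)


def looks_like_heading(block, max_len=40):
    """Naive rule: a short block whose letters are all upper case."""
    letters = [c for c in block if c.isalpha()]
    return (
        bool(letters)
        and len(block) <= max_len
        and all(c.isupper() for c in letters)
    )


headings = [b for b in blocks(SAMPLE) if looks_like_heading(b)]

if __name__ == "__main__":
    for heading in headings:
        print(heading)
```

On the sample this rule picks out "PART I.", "CHAPTER I." and
"MR. SHERLOCK HOLMES." but gives them all the same rank, and it misses
the italic subtitle entirely. Telling a subtitle from a heading or a
paragraph takes more than layout cues of this kind.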


> but i note once again, for the record, that no one has
> yet given me a list of "hard e-texts" that they think
> might give my z.m.l. a run for its money on difficulty.
> so we really don't have an answer to that yet, do we?

How about doing your homework yourself? The world at large was not 
created to do your bidding.

Go, find a slew of difficult texts, mark them up, fix your program and 
show us what you can do.

But, please, stop whining about us not doing your work.


> of course, it will take a new kind of software program to "fix"
> the transcription errors that will occur using this input method,
> but maybe that's already a part of these systems, i don't know...

Again, researching your stuff before starting a colossal handwave is out 
of the question.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Sat Oct 23 06:01:12 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 06:01:57 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417A4FA0.3080206@twcny.rr.com>
References: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>	<417930B5.5040907@perathoner.de>
	<417A4FA0.3080206@twcny.rr.com>
Message-ID: <417A5618.5030809@perathoner.de>

Curtis A. Weyant wrote:

> That's not true. You can define a full DTD (or a subset of one) within 
> the XML document itself if you want. The W3C gives the following example 
> in the XML 1.0 (3rd ed.) spec:
> 
>     <?xml version="1.0" encoding="UTF-8" ?>
>     <!DOCTYPE greeting [
>       <!ELEMENT greeting (#PCDATA)>
>     ]>
>     <greeting>Hello, world!</greeting>

You are right.

This way you could introduce some personal tags into a document and slip 
them past the validator.


> Of course, while you _can_ do that, it's probably not the best way.

It could become difficult to track which translator goes with which 
files. It is easier if you just reference one out of a known set of DTDs.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jlinden at projectgutenberg.ca  Sat Oct 23 11:13:15 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Sat Oct 23 11:19:07 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <4177E9CB.7080200@perathoner.de>
Message-ID: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>

   Two points to mention before you read my actual reply:

   1) My apologies for the delay in responding. A configuration issue has
caused all my email to bounce from pglaf.org's mail server for the past
couple weeks, and only yesterday was I able to get it rectified. I will be
posting many delayed replies for the next few days.

   2) While the original email that I'm replying to here was written by
Marcello, my replies are not directed at him personally, but rather, at
everyone in the community.

----------------------------

   Of the many assumptions being made on the list these days, here are three
of the most erroneous -- and they all came in a single paragraph.

> We don't need more discussion about whether TEI is the right language, I
> think we are all agreed on that.

   Not everyone agrees with the use of TEI. I'm not even going to begin my
arguments again -- there would be no point. Some of us, such as myself, foresee
issues, based on our own experiences, with using TEI (or a variant) for PG
work. I'm resigned to the fact that it may be the only way to get XML into
PG at all, so I'll just deal with the issues on my own time at that point.
Simply put, TEI is one of the most verbose markup vocabularies available,
and using it for PG is going to turn off a LOT of people to XML. A simpler,
more concise vocabulary would be less intimidating!

> pgxml.org is dead

   PGXML is not dead at all. Just like other XML stuff in PG, it's been on
hiatus, mostly for two reasons: 1) lack of agreement in the community, and
2) lack of personal time to work on it. I will be meeting with the other
co-founder of pgxml.org (Ben Crowder) in November, after which time, we hope
to present a definitive plan for pgxml.org to the community.

> and ZML is good for laughs.

  You can laugh at ZML all you want, but from the examples and personal
discussion with Bowerbird, I have learned that ZML is not at all what most
people think it is. From the examples that I have seen, ZML is basically PG
vanilla text format, but cleaned up and normalized.

----------------------------

  The entire rest of this email is a rant, so please feel free to skip it.
You have been given the choice!

----------------------------

  Maybe if you learned to listen to other people, you'd not make such
erroneous assumptions. Maybe, just maybe, other people do have a clue, and
you aren't the only one that knows something.

  Yes, I have blocked Bowerbird from joining the PGXML list, but he is the
_only_ person that I've blocked. The only reason for this is because of
other people's reactions to him, not because of Bowerbird himself. While he
can be irritating and annoying, Bowerbird does have a clue about some of the
issues we have to deal with in PG work, particularly in converting to other
formats, etc. If you don't like his attitude, ignore his posts. You can at
least try to extract the useful information that he does give from the flame
wars they often come in. This way, you might actually LEARN something.

  More mud is slung on this list than on ANY other list that I'm subscribed
to, but I have to admit, I'm only subscribed to about 200 active lists, so I
may be missing the mud-slinging ones. If you don't like the way something is
being done in PG, don't throw a hissy-fit. Get off your arse and do
something about it, or sit down and shut up, and let other people do what
they think should be done.

  I've made no secret of my personal opinions of PG:

    1) the website is a disgrace
    2) the archive is poorly organized
    3) the catalog system is a hack job done by unqualified people
    4) the PG text format is extremely disgusting
    5) PG makes volunteers work uphill to get anything done
    6) the lack of quality in our content offsets any gain from it

  Just because I have the opinions doesn't automatically mean I A) know what
I'm talking about, or B) have an alternative solution. Some of the biggest
technological innovations came not from people who had a better idea, but
from people who knew the current idea wasn't that good, and were open minded
when the better idea came along.

  As a whole, I find that PG is not a very open-minded community. As a
community, we regularly discourage people from volunteering, mostly because we
don't support them well. At times, not only do we not support them, but we
actively, and publicly, bash their skulls.

  We lie to the general public about PG on a regular basis.

  When posting ebooks, we ignore the wishes of the volunteers who made the
texts.

  We don't even provide well-suited tools for the volunteers to use to
improve PG, because, oh my god, maybe the tool isn't 100% open-source! Maybe
the tool has been offered to PG on a perpetual right to use for PG status,
but oh, lordy, that's just not good enough.

  We regularly tell some of our hardest working volunteers that they are full
of crap (more or less). True, it's usually a paragraph long description of
what they are doing "wrong", but it's basically telling them their work is
unwanted.

  Do you realize that out of over 300 librarians I've talked to personally,
the first thing that came to mind for "ebooks" was the University of
Virginia's eText Library? PG simply is not a mover or a shaker in the ebook
world, regardless of the hocus pocus you might hear to the contrary.

----------------------------

  And, to all these things I've said, I'm a victim of some, but a
perpetrator of others. I'm just as guilty as you are. I have no soapbox,
only a conscience and a desire to change things.

----------------------------

  So, dangit all to heck, we don't have to be like this. PG _could_ be
great. PG SHOULD be great.

-- James


From marcello at perathoner.de  Sat Oct 23 12:39:10 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 12:40:02 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
References: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
Message-ID: <417AB35E.6010202@perathoner.de>

James Linden wrote:

>>pgxml.org is dead 
> 
>    PGXML is not dead at all.

I stand corrected. Some time ago I noticed the domain had expired, but 
somebody re-registered it just yesterday. But I gather it wasn't you. 
Who is this ?


> $ whois pgxml.org
Domain ID:D105038884-LROR
Domain Name:PGXML.ORG
Created On:22-Oct-2004 20:47:27 UTC
Last Updated On:22-Oct-2004 20:47:48 UTC
Expiration Date:22-Oct-2005 20:47:27 UTC
Sponsoring Registrar:Go Daddy Software, Inc. (R91-LROR)
Status:TRANSFER PROHIBITED
Registrant ID:GODA-08593143
Registrant Name:Registration Private
Registrant Organization:Domains by Proxy, Inc.
Registrant Street1:15111 N Hayden Rd., Suite 160
Registrant Street2:PMB353
Registrant Street3:
Registrant City:Scottsdale
Registrant State/Province:Arizona
Registrant Postal Code:85260
Registrant Country:US
Registrant Phone:+1.4806242599
Registrant Phone Ext.:
Registrant FAX:
Registrant FAX Ext.:
Registrant Email:PGXML.ORG@domainsbyproxy.com
...

-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Sat Oct 23 12:56:32 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Oct 23 12:57:26 2004
Subject: [gutvol-d] re: the king of vaporware
Message-ID: <1e.3685a94c.2eac1170@aol.com>

james said:
>   More mud is slung on this list than on ANY other list 
>   that I'm subscribed to, but I have to admit, 
>   I'm only subscribed to about 200 active lists, 
>   so I may be missing the mud-slinging ones.

yeah, it's tough to keep up.        :+)

and those are just the listserves.

don't forget the forums...

like the one over at distributed proofreaders...

why just the other day (maybe yesterday?),
two minutes after you posted a message
where you mentioned pgxml.org, someone --
who must not know that kodekrash is you
-- said, "last i heard, pgxml was being run
by james linden, the king of vaporware..."

i guess he didn't know that's supposed to be
_my_ title...           :+)

anyway, like i said, it's tough to keep up...        ;+)

-bowerbird
From joshua at hutchinson.net  Sat Oct 23 13:04:54 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Oct 23 13:05:19 2004
Subject: [gutvol-d] re: the king of vaporware
In-Reply-To: <1e.3685a94c.2eac1170@aol.com>
References: <1e.3685a94c.2eac1170@aol.com>
Message-ID: <417AB966.8010204@hutchinson.net>

Bowerbird@aol.com wrote:

>said, "last i heard, pgxml was being run
>by james linden, the king of vaporware..."
>
>i guess he didn't know that's supposed to be
>_my_ title...           :+)
>  
>
Nah... You're the clown prince of vaporware.  The king position is still 
open.

While I haven't seen much beyond talk from James, he is usually very 
informed and reasonable.  I don't always agree with him, but I respect 
*his* opinions.

Josh
From joshua at hutchinson.net  Sat Oct 23 13:09:40 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Oct 23 13:10:03 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
References: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
Message-ID: <417ABA84.80705@hutchinson.net>

James Linden wrote:

<snipped a lot of stuff>

There are two things I wanted to say to this:

One was that other than bowerbird, the rest of the discussions, while 
sometimes passionate, are not typically mean spirited.  We just have 
people who feel passionate about their particular vision of what PG 
should be (as it looks like you feel as well, from your post).  In the 
end, most of the people who write the most respect each other and each 
other's opinions.  And discussion of the topics at hand is the only 
method we have of moving forward.

The second is that while a more simple XML markup, like what you loosely 
described as PGXML, sounds wonderful on the surface ... it requires, 
once again, largely reinventing the wheel AND not being compatible with a 
standard that seems to be gaining momentum out there.  Granted, the very 
nature of XML makes converting from a home-grown markup to TEI a 
possibility, but removing the need to convert would seem to be the wiser path.

Josh
From marcello at perathoner.de  Sat Oct 23 13:29:10 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 13:30:00 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
References: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
Message-ID: <417ABF16.40203@perathoner.de>

James Linden wrote:

>   You can laugh at ZML all you want, but from the examples and personal
> discussion with Bowerbird, I have learned that ZML is not at all what most
> people think it is. From the examples that I have seen, ZML is basically PG
> vanilla text format, but cleaned up and normalized.

Because Bowerbird is more enamoured of hearing himself talk than of 
doing any research whatsoever, ZML doesn't address the simplest issues 
of text markup.

Bowerbird just somehow found the DP rules for formatting proofed texts 
and amplified them with some rather sub-optimal ad hoc extensions, like 
the use of tabs for marking centered text etc.

Altogether ZML is not much better (more likely worse) than what DP is 
outputting right now.


>   Yes, I have blocked Bowerbird from joining the PGXML list, but he is the
> _only_ person that I've blocked. The only reason for this is because of
> other people's reactions to him, not because of Bowerbird himself. 

That is an original way of seeing things ...


>   I've made no secret of my personal opinions of PG:
> 
>     1) the website is a disgrace
>     3) the catalog system is a hack job done by unqualified people

That's what I've fixed in the last year.


>     4) the PG text format is extremely disgusting

That's what I tried to fix. But ran against 5)

>     5) PG makes volunteers work uphill to get anything done


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jlinden at projectgutenberg.ca  Sat Oct 23 13:43:16 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Sat Oct 23 13:45:02 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417AB35E.6010202@perathoner.de>
Message-ID: <BJENJKIGOCIKEDJIGCBBAECFDCAA.jlinden@projectgutenberg.ca>

Oh geez... I'm going to ask Ben what happened. :-( Ok, so as far as I know
pgxml.org doesn't belong to us anymore, but that doesn't make it dead, just
the domain will have to change.

-- James



From jlinden at projectgutenberg.ca  Sat Oct 23 13:48:14 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Sat Oct 23 13:50:00 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417ABF16.40203@perathoner.de>
Message-ID: <BJENJKIGOCIKEDJIGCBBEECFDCAA.jlinden@projectgutenberg.ca>

> >   I've made no secret of my personal opinions of PG:
> >
> >     1) the website is a disgrace
> >     3) the catalog system is a hack job done by unqualified people
>
> That's what I've fixed in the last year.

  Yes, I have to admit that you've been working VERY hard on the site, but
in my own opinion, it is still not up to par. That's not really any fault of
yours, though.

  I wouldn't call your new catalog system "fixed", but it IS better than the
old CGI one. :-)

> >     4) the PG text format is extremely disgusting
>
> That's what I tried to fix. But ran against 5)
>
> >     5) PG makes volunteers work uphill to get anything done

  Hey, I'm with you there. :-|

-- James


From scott_bulkmail at productarchitect.com  Sat Oct 23 14:20:24 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sat Oct 23 14:32:13 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417ABA84.80705@hutchinson.net>
References: <BJENJKIGOCIKEDJIGCBBOEBPDCAA.jlinden@projectgutenberg.ca>
	<417ABA84.80705@hutchinson.net>
Message-ID: <p0611040fbda0764f8e4a@[192.168.0.52]>

>The second is that while a more simple XML markup, like what you loosely described as PGXML, sounds wonderful on the surface ... it requires, once again, largely reinvented the wheel

"Reinventing the wheel" is often something to be avoided, but I'm not sure it's a compelling issue here.  First, there are other models to use, e.g. XHTML.  Second, the most important standard is XML itself.  That's what enables an incredible variety of tools and platforms; the specific DTD is much less important.  (In fact, XML's designers made sure it was useful even without a DTD.)  Third, TEI was created for a very different world: scholarly publishing.  If PG's markup was going to be done by paid experts, TEI would probably be the best choice.  But I'm not convinced it's appropriate for a volunteer organization.  XML can be much simpler than HTML, yet TEI is (IMHO) more complex not less.

I just finished converting The Wonderful World of Oz to PGTEI.  (I'll post it on Classicosm.com once I have a chance to write up my impressions.)

During my learning process, I came across an interesting comparison of Shakespeare marked up using TEI and an "ad hoc" markup used by Jon Bosak (a key inventor of XML).  Though the comparison was done by a TEI advocate, I think Jon's is a much better model for our purpose.

http://www.tei-c.org.uk/Sample_Manuals/mueller-main.htm
A very gentle introduction to the TEI
(the comparison is near the end -- look for the garish background colors)


>Granted, the very nature of XML makes converting from a home-grown markup to TEI a possibility, removing the need to convert would seem to be the wiser path.

The whole point of a master format is that PG is going to convert to other useful formats.  If TEI is useful in and of itself, that can be just another conversion.
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From scott_bulkmail at productarchitect.com  Sat Oct 23 14:30:09 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sat Oct 23 14:32:19 2004
Subject: [gutvol-d] scanned title pages
In-Reply-To: <20041022181003.GB24510@pglaf.org>
References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com>
	<02ed01c4b842$1c31baf0$6501a8c0@Unicorn>
	<p06110406bd9edd2f6eac@[192.168.0.52]>
	<20041022181003.GB24510@pglaf.org>
Message-ID: <p06110410bda07baed185@[192.168.0.52]>

Greg typed:
>I do have all of the title pages & verso pages submitted
>electronically.  This is thousands and thousands of images.

>Just email me if you need images for a particular item.  If
>you'd rather, I could package up the older clearances (pre-August '04
>or so) and get them to you.  It's probably < 2GB total.

Thanks much for the offer.  I've moved beyond cataloging onto the main part of my project.  If I ever revisit the entire catalog, it would be great to have a DVD of all the title+verso pages (if they don't get posted somewhere in the meantime).  Meanwhile, I may indeed make some one-off requests.


>N.B., this stuff is not suitable for public redistribution
>with our eBooks.  Many scans are not very high quality.  Some
>are, and it would be fine with me to make them publicly
>available somewhere.  I don't have much opinion about
>including these with the eBooks themselves - that's something
>for the producer to decide.  Most title & verso pages are pretty
>boring, though, so probably are not worth including as part
>of an eBook.

I completely agree that there's no reason to include these in the default .zip file for the HTML or any other edition/format.  I just think it's important to make them available somewhere people can find them, e.g. for librarians, scholars or other catalogers.
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From shalesller at writeme.com  Sat Oct 23 16:50:34 2004
From: shalesller at writeme.com (D. Starner)
Date: Sat Oct 23 16:51:29 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <20041023235034.B91FC4BDA9@ws1-1.us4.outblaze.com>

Scott Lawton writes:

> During my learning process, I came across an interesting 
> comparison of Shakespeare marked up using TEI and an "ad 
> hoc" markup used by Jon Bosak (a key inventor of XML).  
> Though the comparison was done by a TEI advocate, I think 
> Jon's is a much better model for our purpose.
> 
> http://www.tei-c.org.uk/Sample_Manuals/mueller-main.htm
> A very gentle introduction to the TEI
> (the comparison is near the end -- look for the garish background colors)

I'd disagree. The bibliographic information in Jon's is insufficient;
it doesn't even give us an author, much less the rest of the information
we need to have computer readable. Once we've dumped the <hi rend="i">
lines in the TEI, the main difference is the line numbers (which are 
optional in TEI and necessary for some books) and the capacity to
do split metric lines, which is again necessary for some books. 


From shalesller at writeme.com  Sat Oct 23 22:44:46 2004
From: shalesller at writeme.com (D. Starner)
Date: Sat Oct 23 22:45:48 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <20041024054446.5B63A4BDA9@ws1-1.us4.outblaze.com>

"James Linden" writes:
>   Yes, I have blocked Bowerbird from joining the PGXML list, but he is the
> _only_ person that I've blocked. The only reason for this is because of
> other people's reactions to him, not because of Bowerbird himself. 

Just like we inoculate against polio, not because of polio itself,
but because of other people's reactions to it?

> You can at
> least try to extract the useful information that he does give from the flame
> wars they often come in. This way, you might actually LEARN something.

My stress level is high enough without listening to a self-centered egomaniac
who has repeatedly stated his unwillingness to work with me.

>   More mud is slung on this list than on ANY other list that I'm subscribed
> to, but I have to admit, I'm only subscribed to about 200 active lists, so I
> may be missing the mud-slinging ones. 

I haven't seen that at all. It really doesn't compare to debian-devel and
other lists I've been on at all.

>   I've made no secret of my personal opinions of PG:
> 
>     1) the website is a disgrace
>     2) the archive is poorly organized
>     3) the catalog system is a hack job done by unqualified people
>     4) the PG text format is extremely disgusting
>     5) PG makes volunteers work uphill to get anything done
>     6) the lack of quality in our content offsets any gain from it

[More wild criticisms of PG]

What makes you think this is at all constructive? You use emotionally
charged words--"disgrace", "hack job", "disgusting"--and criticize 
almost everything about PG. I think it would be much more constructive
in the future to deal with one issue at a time.

> We don't even provide well-suited tools for the volunteers to use to
> improve PG, because, oh my god, maybe the tool isn't 100% open-source! Maybe
> the tool has been offered to PG on a perpetual right to use for PG status,
> but oh, lordy, that's just not good enough.

I have no idea what the context was on this, and knowing it would be terribly 
helpful. I'm sceptical of the idea that PG would turn down a great 
improvement on our current tools merely because they aren't open source.
However, open source is about the flexibility to get the job done.
There was a non-web based frontend to DP, but it had to be abandoned
because the author disappeared and nobody had the source to fix it as
the site changed. Having an open-source program means we can fix it,
we can port it, and we don't have to worry about whether we're using it
for "PG status" or for Rastko or for our private project. That's a
valuable thing, and something that PG should push for when possible.



From gbnewby at pglaf.org  Sun Oct 24 11:28:57 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Oct 24 11:28:59 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041024054446.5B63A4BDA9@ws1-1.us4.outblaze.com>
References: <20041024054446.5B63A4BDA9@ws1-1.us4.outblaze.com>
Message-ID: <20041024182857.GA28975@pglaf.org>

On Sat, Oct 23, 2004 at 09:44:46PM -0800, D. Starner wrote:
> "James Linden" writes:
...
> > We don't even provide well-suited tools for the volunteers to use to
> > improve PG, because, oh my god, maybe the tool isn't 100% open-source! Maybe
> > the tool has been offered to PG on a perpetual right to use for PG status,
> > but oh, lordy, that's just not good enough.
> 
> I have no idea what the context was on this, and that would be terribly 
> helpful. I'm sceptical to the idea that PG would turn down a great 
> improvement on our current tools merely because they aren't open source.
> However, open source is about the flexibility to get the job done.
> There was a non-web based frontend to DP, but it had to be abandoned
> because the author disappeared and nobody had the source to fix it as
> the site changed. Having an open-source program means we can fix it,
> we can port it, and we don't have to worry about whether we're using it
> for "PG status" or for Rastko or for our private project. That's a
> valuable thing, and something that PG should push for when possible.

I've never heard or seen a "party line" on open source from/for
PG.  Yes, it's preferable for the reasons David mentions.  Yes,
it's often free (which matters a lot when we want volunteers to
get their own copy).  Yes, there is a conceptual alignment between
PG's efforts to enhance the public domain and many open source
philosophies.  But we're essentially pragmatists, using the tools
we have available to do the best job we can do.  

If people have tools to offer, they can/should offer them.  There
is a full range of tools in use at PG, and lots of stuff we 
developed ourselves.  There's always room for more tools that people
might be able to use.
  -- Greg
From joshua at hutchinson.net  Mon Oct 25 07:03:27 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 25 07:03:30 2004
Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website
Message-ID: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com>

Continuing my evaluation of Marcello's PGTEI setup on the gutenberg website (http://www.gutenberg.org/tei/)...

I used the same Declaration of Independence file I used last week to comment on the XML markup itself.  This time I converted that XML file to HTML and TEXT using the online services section.  Below are the bulleted items that *I* believe need some improvement.

If anyone wants to duplicate my conversions, see my post from last week that contained the XML I used (or send me a quick e-mail and I'll forward the file on).

Josh

***

HTML conversion items:

1 - First thing that jumps out is the need for bigger left and right margins.  This is a simple CSS change.  Currently, DP has *mostly* standardized on 10% margins on the left and right.  This gives some nice white space for easier reading and gives room for things like original source page numbers and sidenotes to be put in the margin area.

2 - If the author field is left blank, the conversion shouldn't put a "by" out there all by itself.  Both the HTML and the TEXT version have this dangling word.

3 - The publication and edition dates are both being printed, but it isn't clear which is which.  Maybe put an "Original publication date:" label before the date itself?

4 - Since the title, author, etc. are already listed in the first few lines, the second listing below the gutenberg disclaimer line is redundant.  Also, in that same spot, the language code is printed, which is nice, but I would suggest changing the format slightly.  Namely, put the language code in brackets after the written out language.

  i.e.:  English-United States [en-us]

For most of us normal humans, the language codes are not intuitive.

5 - In the CONTENTS section, if there are no footnotes/endnotes, don't list a NOTES section.

6 - Use standard HTML paragraph spacing.  Right now, the CSS specifies no blank line between paragraphs and an indent to the beginning of each paragraph.  While this matches the original paper source, for me at least, it is jarring to read on a computer screen.  This type of formatting would make perfect sense in the PDF conversion, since that one is geared for printing on paper.

7 - Need a horizontal rule (75% width seems right to me) between the CONTENTS section and the first section of the text.  Right now, they run together.

8 - Need horizontal rule between major divisions of the text.  Currently, the large type header gives a visual indication, but I don't believe it is enough.

9 - No need for the extra horizontal rule to mark off the FOOTNOTES section if there is no footnote section in that text.  Currently, this situation makes for two horizontal rules in a row in a text with no footnotes.
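
Items 2, 4 and 5 above all come down to "only emit a piece when there is data behind it".  Here is a rough sketch of that idea in Python.  It is not the converter's actual code; the function name, its parameters and the "Language:" label are made up for illustration.

```python
def front_matter(title, author="", language=("", ""), notes=()):
    """Build the opening lines of a converted text, skipping empty parts.

    Hypothetical helper: the names and the output layout are assumptions,
    not what the PG conversion service really does.
    """
    lines = [title]

    # Item 2: only print a byline when there is an author to print.
    if author.strip():
        lines.append("by " + author.strip())

    # Item 4: written-out language first, code in brackets after it.
    name, code = language
    if name and code:
        lines.append(f"Language: {name} [{code}]")

    # Item 5: only list a NOTES section when the text has notes.
    if notes:
        lines.append("NOTES")

    return lines


if __name__ == "__main__":
    for line in front_matter("The Declaration of Independence",
                             language=("English-United States", "en-us")):
        print(line)
```

With no author and no notes, the result has no dangling "by" and no NOTES entry.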

***

TEXT conversion items:

1 - It lists "The Project Gutenberg EBook of" twice.

2 - Has a dangling "by" line even when no author is specified.

3 - Same redundant title/author info as in the HTML conversion.

4 - Notes section appears whether there are any footnotes or not.


From joshua at hutchinson.net  Mon Oct 25 07:21:13 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 25 07:21:16 2004
Subject: [gutvol-d] Request for a PGTEI web form developer...
Message-ID: <20041025142113.2CA442F8FE@ws6-3.us4.outblaze.com>

The hardest part of creating a PGTEI text right now is the header information.  It is simply confusing for a human to parse and is very dense in information.  This will be the section that most quickly kills people's enthusiasm for the format.

However, this section is very important because there is SO MUCH good information stored there.  And it is information that needs to be in XML because it is exactly the type of information most likely to be accessed by a computer indexing program or something similar.

That said, I am asking for someone with good web developing skills to step up to the plate and create a web page that allows a human to enter information in a nicely formatted web form and then spits out the PGTEI-compliant header to be included in the beginning of a converted text.

I am very willing to help test such a beast and provide whatever help I can, but my HTML skills don't extend beyond the normal layout variety.  I've never played with forms, much less a backend that can create output from the web form input.
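
To make the request a bit more concrete, here is a minimal sketch in Python of what the backend half might look like: it takes the values a form would collect and writes out a header.  The element names are the generic TEI ones (teiHeader, fileDesc, titleStmt, publicationStmt, sourceDesc); what PGTEI actually requires beyond that is not covered here, and the function name and the field names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_header(fields):
    """Turn a dict of form values into a minimal TEI-style header string.

    Assumptions: `fields` uses the hypothetical keys "title", "author",
    "publisher", "date" and "source"; only "title" is required here.
    """
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")

    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = fields["title"]
    if fields.get("author"):
        ET.SubElement(title_stmt, "author").text = fields["author"]

    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "publisher").text = fields.get(
        "publisher", "Project Gutenberg")
    if fields.get("date"):
        ET.SubElement(pub_stmt, "date").text = fields["date"]

    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "bibl").text = fields.get("source", "")

    # ElementTree escapes characters such as "&" and "<" in the text.
    return ET.tostring(header, encoding="unicode")


if __name__ == "__main__":
    print(build_header({"title": "The Declaration of Independence",
                        "date": "1776"}))
```

The point of going through an XML library rather than pasting strings together is that whatever the volunteer types into the form comes out properly escaped.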

Thank you in advance to any willing to take this on!

Josh
From marcello at perathoner.de  Mon Oct 25 07:36:06 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon Oct 25 07:36:10 2004
Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website
In-Reply-To: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com>
References: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com>
Message-ID: <417D0F56.4080405@perathoner.de>

Joshua Hutchinson wrote:

> 1 - First thing that jumps out is the need for bigger left and right
> margins.  This is a simple CSS change.  Currently, DP has *mostly*
> standardized on 10% margins on the left and right.  This gives some
> nice white space for easier reading and gives room for things like
> original source page numbers and sidenotes to be put in the margin
> area.

OTOH I like to read texts in a small (horizontally) browser window so I 
can put a shell window and the browser window on one screen. The shell 
is usually compiling something or doing boring work. If the shell 
stumbles over something I can immediately switch over, correct and 
switch back to my reading.

Big margins in the browser window would definitely be a major annoyance.

I think the CSS provided is just an example. Everybody here has enough 
skills to build a CSS he/she likes. For the end user we may consider an 
"alternate stylesheet" model where she may switch between a set of 
predefined ones.


> 6 - Use standard HTML paragraph spacing.

Same as above.


> 7 - Need a horizontal rule (75% width seems right to me) between the
> CONTENTS section and the first section of the text.  Right now, they
> run together.
> 
> 8 - Need horizontal rule between major divisions of the text.
> Currently, the large type header gives a visual indication, but I
> don't believe it is enough.

Use the rend="newpage" or rend="newdoublepage" attribute on a div, 
front, or back element, e.g.:

   <div rend="newpage" type="chapter">

This will start a new page on paginated media and put a rule on HTML.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Mon Oct 25 07:45:22 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 25 07:45:26 2004
Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website
Message-ID: <20041025144522.560E1109764@ws6-4.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> Joshua Hutchinson wrote:
> 
> > 1 - First thing that jumps out is the need for bigger left and right
> > margins.  This is a simple CSS change.  Currently, DP has *mostly*
> > standardized on 10% margins on the left and right.  This gives some
> > nice white space for easier reading and gives room for things like
> > original source page numbers and sidenotes to be put in the margin
> > area.
> 
> OTOH I like to read texts in a small (horizontally) browser window so I 
> can put a shell window and the browser window on one screen. The shell 
> is usually compiling something or doing boring work. If the shell 
> stumbles over something I can immediately switch over, correct and 
> switch back to my reading.
> 
> Big margins in the browser window would definitely be a major annoyance.
> 
> I think, the CSS provided is just an example. Everybody here has enough 
> skills to build a CSS he/she likes. For the end user we may consider an 
> "alternate stylesheet" model where she may switch between a set of 
> predefined ones.
> 

That is why a 10% margin is used instead of a fixed value.  On a small window, the margin almost disappears, while on a "normal" sized window, it provides the white space.
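
To put rough numbers on the percentage argument, here is a small arithmetic sketch. The window widths are illustrative assumptions, not measurements from anyone's setup, and the function names are hypothetical.

```python
def side_margin_px(window_width_px, margin_percent=10.0):
    """Width in pixels of ONE side margin for a percentage-based margin."""
    return window_width_px * margin_percent / 100.0


def text_width_px(window_width_px, margin_percent=10.0):
    """Width left for the text after subtracting both side margins."""
    return window_width_px - 2 * side_margin_px(window_width_px, margin_percent)


if __name__ == "__main__":
    # Assumed window widths: a narrow reading window, a common one, a wide one.
    for width in (400, 1024, 1600):
        print(width, side_margin_px(width), text_width_px(width))
```

With a 10% margin, a 400-pixel window gives up 40 pixels per side, while a 1600-pixel window gives up 160 per side; the text always keeps 80% of the width.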

Also, while I agree CSS changes are the fix for these, what I am trying to do here is help to create a "standard" conversion that is workable.  We don't really want the volunteer to HAVE to create their own CSS.  We want them to have the ability to, of course, if they want to, but the standard conversion should have a baseline that is the best we can do.

NOTE: Many of my suggestions are my personal opinion, such as the margins, and part of the purpose here is to get conflicting opinions from others.  So, coming to a baseline style consensus is a second objective here.

> 
> > 6 - Use standard HTML paragraph spacing.
> 
> Same as above.
> 

Which should be our baseline style, though?  If other people like the printer style paragraphs better, that's fine.  This is, again, my opinion here.

> 
> > 7 - Need a horizontal rule (75% width seems right to me) between the
> > CONTENTS section and the first section of the text.  Right now, they
> > run together.
> > 
> > 8 - Need horizontal rule between major divisions of the text.
> > Currently, the large type header gives a visual indication, but I
> > don't believe it is enough.
> 
> Use the rend="newpage" or rend="newdoublepage" attribute on a div, 
> front, or back element, e.g.:
> 
>    <div rend="newpage" type="chapter">
> 
> This will start a new page on paginated media and put a rule on HTML.
> 

Cool, I learned something here.  That takes care of that concern (at least to my mind, it does).

Josh
From joel at oneporpoise.com  Mon Oct 25 08:11:53 2004
From: joel at oneporpoise.com (Joel A. Erickson)
Date: Mon Oct 25 08:30:08 2004
Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website
References: <20041025144522.560E1109764@ws6-4.us4.outblaze.com>
Message-ID: <003001c4baa4$f604a940$6501a8c0@JOEL>

Joshua Hutchinson wrote:
> NOTE: Many of my suggestions are my personal opinion, such as
> the margins, and part of the purpose here is to get conflicting opinions
> from others.  So, coming to a baseline style consensus is a second
> objective here.

I vote for 10% margins and indenting paragraphs with a space above about 1/2 
to 3/4ths the height of a standard line. 

From jon at noring.name  Mon Oct 25 09:50:22 2004
From: jon at noring.name (Jon Noring)
Date: Mon Oct 25 09:50:32 2004
Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website
In-Reply-To: <417D0F56.4080405@perathoner.de>
References: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com>
	<417D0F56.4080405@perathoner.de>
Message-ID: <48961996703.20041025105022@noring.name>

Marcello wrote:
> Joshua Hutchinson wrote:

>> 1 - First thing that jumps out is the need for bigger left and right
>> margins.  This is a simple CSS change.  Currently, DP has *mostly*
>> standardized on 10% margins on the left and right.  This gives some
>> nice white space for easier reading and gives room for things like
>> original source page numbers and sidenotes to be put in the margin
>> area.

> ... I think, the CSS provided is just an example. Everybody here has
> enough skills to build a CSS he/she likes. For the end user we may
> consider an "alternate stylesheet" model where she may switch
> between a set of predefined ones.

The beauty of transforming "standardized" TEI documents into XHTML
[see note at end] is that, when done right (with no presentational
markup), the XHTML for all the documents will itself be uniform and
standardized, thus amenable to swappable CSS style sheets which can be
applied to almost the whole collection, if not all of it. Of course,
the documents will also be reasonably accessible since accessibility
is enhanced by this approach.

A favorite site of mine which demonstrates the power of swappable CSS
is "CSS Zen Garden", http://www.csszengarden.com/ , which essentially
uses the same, high quality (and accessible) document, and invites
anyone to submit their own CSS style sheet -- hundreds of style sheets
have been submitted so far from many web designers/artists/enthusiasts.
It's amazing to see the variation of complex styling which can be
applied to such a simple document (try viewing the base document
without CSS -- images are separate from the document and also
swappable in CSS Zen Garden.)

Certainly, how PG would enable style sheet swapping may be different
than how CSS Zen Garden does it, but that's beside the point. The
important point is that it can be done, and will be an exciting
addition to PG by allowing readers to "have it their way" rather than
"having it our way." We will not have to argue on whether we want 10%
or 20% margins, etc. This will also entice many to submit their own
CSS designs for people to use. But it all starts with the Master
markup being done *right*.

Jon Noring


[Note referenced above: This indicates that there should be NO
presentational markup in the source TEI-conforming documents -- to
take a pure structural/semantic approach to markup. About XHTML, the
documents spit out from XSLT should be XHTML 1.1, or at least the
content markup itself between <body>...</body> be valid to XHTML 1.1.
I suppose we could also offer a "legacy", pre-styled, non-CSS HTML
for those running really old and crusty, non-CSS browsers.]

From joshua at hutchinson.net  Mon Oct 25 10:23:43 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 25 10:23:51 2004
Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website
Message-ID: <20041025172343.43DD410976D@ws6-4.us4.outblaze.com>

----- Original Message -----
From: Jon Noring <jon@noring.name>
> Marcello wrote:
> > Joshua Hutchinson wrote:
> 
> Certainly, how PG would enable style sheet swapping may be different
> than how CSS Zen Garden does it, but that's beside the point. The
> important point is that it can be done, and will be an exciting
> addition to PG by allowing readers to "have it their way" rather than
> "having it our way." We will not have to argue on whether we want 10%
> or 20% margins, etc. This will also entice many to submit their own
> CSS designs for people to use. But it all starts with the Master
> markup being done *right*.
> 

I agree that the CSS provides a powerful and easy way to have different formats.  However, we need a "standard" format as a base.  The default style is the one that Joe Sixpack, for instance, will see when he clicks on the HTML doc link at the main PG website.  We can then have the ability to "swap" CSS at the click of a button, but that functionality is somewhere down the road.  We need a functional style in place now for this to move forward (back to Marcello's baby steps).

Some of the issues I brought up are purely presentational and fall under the CSS heading.  Some are functional (i.e., the extra PG header at the beginning of the TEXT version).  I think both need to be addressed and a "fix" decided on before XML can move forward.

Currently, I'm working on marking up The Hunting of the Snark in PGTEI to see what further issues are introduced by poetry markup.  As expected, there are a few and they are largely presentational in nature (CSS), but they need to be addressed to make sure that the XML itself is sufficient to the task of handling the content.

Josh
From joshua at hutchinson.net  Tue Oct 26 06:33:04 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 06:33:08 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
Message-ID: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>

This e-mail concludes the "common" items I want to check in PGTEI.  With the items in my previous tests and the ones today, I could mark up 95%+ of what I see in DP.  So, my next step is to start trying to understand how the transforms work and see what *I* can do to improve things.  Expect this to be slow going, folks, I'm an old English major who likes computers, so I have to puzzle things out as I go!  :)

****

For this one, I created a hodgepodge of stuff from the beginning of The Hunting of the Snark and my own gibberish additions to get to all the features I wanted to test.  The XML file is attached at the end for anyone wanting to reproduce my tests.  As before, the conversion was done with Marcello's online transforms at http://www.gutenberg.org/tei/services/tei-online.

****

Table markup:

1 - XML was very straightforward.  It is similar to HTML table markup, just with slightly different tag words: <row> instead of <tr>, <cell> instead of <td>.  All in all, the more human-friendly tags in the XML are easier to parse than the HTML.

Under the HTML conversion, the tables came out well.  No complaints there.

Under the TEXT conversion, the small table came out well.  However, when I used longer data items in the second table, the TEXT conversion did not do so well.  Basically, the text conversion does not try to line-wrap the table cells at all, so the table grew to be extremely wide.  This one is a bit of a show stopper as far as automated conversion is concerned.  Granted, the tables could be manually edited, but that defeats the whole purpose of using a master document format.
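
One possible approach to the wide-table problem, sketched in Python rather than in whatever the online transforms use: wrap each cell to a fixed column width and print the wrapped pieces side by side. The 30-character column width and the function names are assumptions for illustration only; this is not a description of how the existing converter works.

```python
import textwrap
from itertools import zip_longest


def render_row(cells, col_width=30):
    """Render one table row as text lines, wrapping every cell to col_width."""
    # textwrap.wrap returns [] for an empty string, so fall back to [""].
    wrapped = [textwrap.wrap(cell, width=col_width) or [""] for cell in cells]
    lines = []
    # Walk the wrapped cells in parallel; shorter cells are padded with "".
    for parts in zip_longest(*wrapped, fillvalue=""):
        line = " | ".join(part.ljust(col_width) for part in parts)
        lines.append(line.rstrip())
    return lines


def render_table(rows, col_width=30):
    """Render all rows, with a rule line after each row."""
    rule = "-+-".join("-" * col_width for _ in rows[0])
    out = []
    for row in rows:
        out.extend(render_row(row, col_width))
        out.append(rule)
    return "\n".join(out)


if __name__ == "__main__":
    long_text = "REALLY " * 12 + "LONG"
    rows = [
        ["Column 1 Heading - " + long_text, "Column 2 Heading - " + long_text],
        ["Column 1 Data - " + long_text, "Column 2 Data - " + long_text],
    ]
    print(render_table(rows))
```

With two 30-character columns, no output line exceeds 63 characters, however long the cell contents are.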

**

Footnote markup:

Again, no real complaints on the footnotes/endnotes.  It is pretty straightforward once you read the formatting rules.  The nice thing is that the conversion process handles moving the notes to their proper location for you.

However, I did have one question that I couldn't find the answer to.  How would you handle sidenotes?  It looked like you could put a place="left" (or "right") in the <note> tag, but PGTEI doesn't support that.  Is that even the right semantic tag for a sidenote?

TEXT conversion had one glitch.  For some reason, the footnote listing at the end of the text did not put a number 1 in front of the first footnote.  The second footnote was labelled with a 2 correctly.  This problem was not present in the HTML conversion.

**

Page number markup:

No complaints.  I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern.

**

Blockquotes:

I wanted to mark up a blockquote example, but I didn't see how.  Anyone out there know how to handle a blockquote within a text?

**

Poetry markup:

I had the most notes for poetry, so I left it for last in the markup.

1 - How should we mark up poetry indents?  In HTML, I use &nbsp;&nbsp; to put two spaces for indents on the text....  *edit* I just found in Marcello's guide that he suggests using &emsp; as a quad indent.  Works for me, unless someone has a different suggestion.

2 - It was unclear to me at first that a poetry fragment still needed <lg> around it.  <l>, which marks off one line of poetry, is insufficient, because the poem line would still be treated as inline in the sentence with just <l>.  Putting <lg> around it set it off on its own line.

3 - If I understand the markup right, <lg> represents a portion of the poem, such as a single stanza.  To represent the whole poem in one structural element, you need a higher level tag.  Would <div1> work ok here?  Or is there some poem tag I'm missing?

HTML results - Poetry is not marked off well.  The poems are flush with the left margin.  Adding a larger margin around the poem will help it appear distinct from the prose text around it.

Also, the paragraph indenting is affected by poetry.  Since the conversion only indents a paragraph if the previous line was the end of a paragraph, it doesn't indent after a poem.  This is taken care of if we revert to standard HTML paragraph spacing.

****

Josh

****

source.xml--

============

<?xml version="1.0" encoding="iso-8859-1" ?>

<!DOCTYPE TEI.2 SYSTEM "pgtei.dtd">

<TEI.2 lang="en-gb">
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Hunting of the Snark</title>
      <author><name>Lewis Carroll</name></author>
    </titleStmt>
    <editionStmt>
      <edition n="12">Edition 12 
        <date value="1992-3">March 1992</date>
      </edition>
    </editionStmt>
    <publicationStmt>
      <publisher>Project Gutenberg</publisher>
      <pubPlace><xref url="www.gutenberg.org">www.gutenberg.org</xref></pubPlace>
      <date value="1992-3">March 1992</date>
      <idno type='etext-file'>snark12</idno>
      <availability>
        <p>This eBook is for the use of anyone anywhere at no cost and with
	almost no restrictions whatsoever. You may copy it, give it away or
	re-use it under the terms of the Project Gutenberg License included
	online at <xref url="www.gutenberg.org/license">www.gutenberg.org/license</xref></p>
      </availability>
    </publicationStmt>
    <sourceDesc>
      <bibl>
        THE MILLENNIUM&nbsp;FULCRUM&nbsp;EDITION 1.2
      </bibl>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <classDecl>
      <taxonomy id="lc">
        <bibl>
	  <title>Library of Congress Classification</title>
	</bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language id="en-gb">British</language>
    </langUsage>
    <textClass>
      <classCode scheme="lc">
        *** <!-- LoC Class (PR, PQ, ...) -->
      </classCode>
      <keywords>
        <list>
	  <!-- <item>***</item> any keywords for PG search engine -->
	</list>
      </keywords>
    </textClass>
  </profileDesc>
  <revisionDesc>
    <change>
      <date value="1992-3">March 1992</date>
      <respStmt>
        <name>unknown</name> <!-- email: *** -->
      </respStmt>
      <item>Project Gutenberg Edition</item>
    </change>
    <change>
      <date value="2004-10">October 2004</date>
      <respStmt>
        <name>Joshua Hutchinson</name> <!-- your email -->
      </respStmt>
      <item>TEI markup</item>
    </change>
  </revisionDesc>
</teiHeader>

<text>
  <front>
    <divGen type="titlepage" />
    <divGen type="pgheader" rend="newpage" />
    <divGen type="toc"      rend="newdoublepage" />
  </front>

  <body>

<div>

<index index="toc" />

<index index="pdf" />

<index index="pdb" />

<head> THE HUNTING OF THE SNARK </head>

<head type="sub">an Agony in Eight Fits</head>

<head type="sub">  Lewis Carroll </head>

<head type="sub">  THE MILLENNIUM&nbsp;FULCRUM&nbsp;EDITION 1.2 </head>


</div>

<div rend="newpage" type="preface">

<index index="toc" />

<index index="pdf" />

<index index="pdb" />

<pb n="i" />

<head> PREFACE </head>

<p> If&mdash;and the thing is wildly possible&mdash;the charge of writing nonsense
were ever brought against the author of this brief but instructive
poem, it would be based, I feel convinced, on the line (in p.4) </p>

<lg><l>"Then the bowsprit got mixed with the rudder sometimes."
<note place="foot">
This is an example footnote.
</note>
</l></lg>

<p> In view of this painful possibility, I will not (as I might) appeal
indignantly to my other writings as a proof that I am incapable of
such a deed: I will not (as I might) point to the strong moral purpose
of this poem itself, to the arithmetical principles so cautiously
inculcated in it, or to its noble teachings in Natural History&mdash; I will
take the more prosaic course of simply explaining how it happened. </p>

<p>  The Bellman, who was almost morbidly sensitive about appearances,
used to have the bowsprit unshipped once or twice a week to be revarnished,
and it more than once happened, when the time came for replacing it, that
no one on board could remember which end of the ship it belonged to.
They knew it was not of the slightest use to appeal to the Bellman about it&nbsp;&mdash;
he would only refer to his Naval Code, and read out in pathetic tones
Admiralty Instructions which none of them had ever been able to understand&nbsp;&mdash;
so it generally ended in its being fastened on, anyhow, across the rudder.
The helmsman used to stand by with tears in his eyes; he knew it was all wrong,
but alas! Rule 42 of the Code, "No one shall speak to the Man at the Helm,"
had been completed by the Bellman himself with the words "and the Man at the
Helm shall speak to no one." So remonstrance was impossible, and no steering
could be done till the next varnishing day. During these bewildering intervals
the ship usually sailed backwards. </p>

<p>  As this poem is to some extent connected with the lay of the Jabberwock,
let me take this opportunity of answering a question that has often been asked
me, how to pronounce "slithy toves." The "i" in "slithy" is long, as in
"writhe"; and "toves" is pronounced so as to rhyme with "groves." Again, the
first "o" in "borogoves" is pronounced like the "o" in "borrow." I have heard
people try to give it the sound of the "o" in "worry." Such is Human
Perversity. </p>

<p>  This also seems a fitting occasion to notice the other hard words in that
poem. Humpty-Dumpty's theory, of two meanings packed into one word like a
portmanteau, seems to me the right explanation for all. </p>

<pb n="ii" />

<p>  For instance, take the two words "fuming" and "furious." Make up your
mind that you will say both words, but leave it unsettled which you will say
first. Now open your mouth and speak. If your thoughts incline ever so
little towards "fuming," you will say "fuming-furious;" if they turn, by even
a hair's breadth, towards "furious," you will say "furious-fuming;" but if you
have the rarest of gifts, a perfectly balanced mind, you will say "frumious."</p>

<p>  Supposing that, when Pistol uttered the well-known words&nbsp;&mdash;</p>

<lg><l>  "Under which king, Bezonian? Speak or die!" 
<note place="foot">
<p>This is, hopefully, an example of a multi-line footnote.</p>
<p>Here is where the second line of the footnote should be.</p>
</note>
</l></lg>

<p> Justice Shallow had felt certain that it was either William or Richard, but
had not been able to settle which, so that he could not possibly say either
name before the other, can it be doubted that, rather than die, he would have
gasped out "Rilchiam!" </p>

</div>


<div rend="newdoublepage">

<pb n="1" />

<index index="toc" />

<index index="pdf" />

<index index="pdb" />

<head>  Fit the First </head>

<head type="sub">  THE LANDING </head>

<lg>
<l>"Just the place for a Snark!" the Bellman cried,</l>
<l n="2">As he landed his crew with care;</l>
<l>Supporting each man on the top of the tide</l>
<l>By a finger entwined in his hair.  </l>
</lg>

<lg>
<l> "Just the place for a Snark! I have said it twice:</l>
<l>That alone should encourage the crew. </l>
<l>Just the place for a Snark! I have said it thrice: </l>
<l>What I tell you three times is true."</l>
</lg>

<lg>
<l>The crew was complete: it included a Boots&nbsp;&mdash; </l>
<l>A maker of Bonnets and Hoods&nbsp;&mdash; </l>
<l>A Barrister, brought to arrange their disputes&nbsp;&mdash; </l>
<l>And a Broker, to value their goods. </l>
</lg>

</div>

<div rend="newpage">

<pb n="2" />

<index index="toc" />

<index index="pdf" />

<index index="pdb" />

<head>Example of a Table</head>

<table rows="2" cols="2">
<row role="label">
<cell>Column 1 Heading</cell><cell>Column 2 Heading</cell>
</row>
<row role="data">
<cell>Column 1 Data</cell><cell>Column 2 Data</cell>
</row>
</table>

<table rows="2" cols="2">
<row role="label">
<cell>Column 1 Heading - REALLY REALLY REALLY REALLY REALLY REALLY REALLY 
REALLY REALLY REALLY REALLY REALLY LONG</cell><cell>Column 2 Heading - 
REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY 
REALLY REALLY REALLY REALLY REALLY LONG</cell>
</row>
<row role="data">
<cell>Column 1 Data - REALLY REALLY REALLY REALLY REALLY REALLY REALLY 
REALLY REALLY REALLY REALLY REALLY REALLY LONG</cell><cell>Column 2 Data - 
REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY 
REALLY REALLY REALLY REALLY REALLY REALLY LONG</cell>
</row>
</table>

</div>

<div>

<index index="toc" />

<index index="pdf" />

<index index="pdb" />

<head>  THE END</head>


<p>   </p>

</div>


 </body>

  <back rend="newdoublepage">
    <divGen type="footnotes" />
    <divGen type="colophon" rend="newpage" />
    <divGen type="pgfooter" rend="newpage" />
  </back>

</text>
</TEI.2>

From jon at noring.name  Tue Oct 26 08:39:13 2004
From: jon at noring.name (Jon Noring)
Date: Tue Oct 26 08:39:24 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
In-Reply-To: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
Message-ID: <141044128062.20041026093913@noring.name>

Joshua wrote:

> This e-mail concludes the "common" items I want to check in PGTEI.
>
> ...
>
> 1 - How should we mark up poetry indents?  In HTML, I use
> &nbsp;&nbsp; to put two spaces for indents on the text....  *edit* I
> just found in Marcello's guide that he suggests using &emsp; as a quad
> indent.  Works for me, unless someone has a different suggestion.

As we've discussed (and argued) before, it is my belief that, except
where typography is integral to the poem itself ("poetry as visual
art"), poetry should be marked up in a structural, not
presentational, sense. This means text characters should NEVER be used
for visual layout purposes -- characters should be used only for
representing textual content. Using text characters for layout mucks
up usability, repurposeability, CSS styling, and accessibility.

Use XSL*, CSS or other styling language to effect the desired output.
End-users will now have more ability to tailor the verse to their
particular reading devices. Of course, a non-parsed comment could be
added to the markup explaining how the original was typeset for those
wishing to try to duplicate the original layout (but then, that's one
purpose for having access to the original page scans.)

Why some here are so enamored with needlessly duplicating the layout
of verse in markup is beyond me -- especially when the original page
scans are now preserved. I see no one here saying if the original text
had indented paragraphs, that we must use a tab or spaces at the start
of each paragraph in markup to duplicate that. Wherever the typography
is used to help the end-user identify the structure of the poem, that
is automatically amenable to structural markup (even if it has to be
customized for some really weird poem.) Only when the typography *is*
the poem itself does one resort to presentational markup, and here SVG
makes the most sense.

In a project I'm working on, the 1001 Arabian Nights by Sir Richard
F. Burton, there are literally thousands of "quatrains" spread
throughout the work. Burton, or the typesetter, chose to present
these quatrains in an unusual way, no doubt simply to save paper since
the following format makes each quatrain much more compact, and with
thousands of quatrains in 6000+ pages, this could mean a lot fewer
pages and substantially lower printing costs. Here's an example of how
a quatrain is typeset in the source:

   The blear-eyed scapes the pits  *  Wherein the lynx-eyed fall:
   A word the wise man slays       *  And saves the natural:
   The Moslem fails of food        *  The Kafir feasts in hall:
   What art or act is man's?       *  God's will obligeth all!

It is clear that the layout used in this example has nothing to do
with the quatrain itself (the original being Arabic and very likely
formatted in a totally different way.) In XHTML, here's how I have
chosen to structure it (as you see, the '*' character seen above is
not reproduced since its purpose in the original is for typographic
layout only -- it is not part of the content of the verse, just as
page numbers are not part of the content of a work):

<div class="quatrain" id="q1234">
<p class="verse1">The blear-eyed scapes the pits</p>
<p class="verse2">Wherein the lynx-eyed fall:</p>
<p class="verse1">A word the wise man slays</p>
<p class="verse2">And saves the natural:</p>
<p class="verse1">The Moslem fails of food</p>
<p class="verse2">The Kafir feasts in hall:</p>
<p class="verse1">What art or act is man's?</p>
<p class="verse2">God's will obligeth all!</p>
</div>


With XSLT, if I wanted, the above could be transformed into the
original format Burton used in print, or it could be output in the
more traditional ABABABAB form of most 19th century Western poetry,
with no loss in comprehension of the quatrain itself. There is nothing
sacred about the typographic layout of *most* poetry I've seen, pretty
as it might be in the printed source -- it simply extends the various
typographic conventions used for ordinary prose to aid in
understanding the "voiceability" of the verse and how the verses
relate to each other. Only when we get to the "poetry as visual art"
craze we see a lot of in 20th century poetry (and, as a few have noted,
in older works) do we need to preserve the exact layout. As just noted,
SVG is certainly an intriguing way to do this layout preservation.
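
To illustrate the point that one structural source can feed several layouts, here is a rough sketch. It uses Python as a stand-in for the XSLT mentioned above, not XSLT itself, and the function names, the four-space indent, and the assumption that verse1 and verse2 lines strictly alternate are all hypothetical choices made for the example.

```python
import xml.etree.ElementTree as ET

# The quatrain exactly as marked up above.
QUATRAIN = """<div class="quatrain" id="q1234">
<p class="verse1">The blear-eyed scapes the pits</p>
<p class="verse2">Wherein the lynx-eyed fall:</p>
<p class="verse1">A word the wise man slays</p>
<p class="verse2">And saves the natural:</p>
<p class="verse1">The Moslem fails of food</p>
<p class="verse2">The Kafir feasts in hall:</p>
<p class="verse1">What art or act is man's?</p>
<p class="verse2">God's will obligeth all!</p>
</div>"""


def verses(xhtml):
    """Return (class, text) pairs for every <p> directly inside the div."""
    root = ET.fromstring(xhtml)
    return [(p.get("class"), p.text.strip()) for p in root.findall("p")]


def burton_layout(xhtml):
    """Two verses per row separated by '*', as in the printed source."""
    texts = [text for _, text in verses(xhtml)]
    # Assumes verse1/verse2 alternate, so even/odd positions form the pairs.
    pairs = list(zip(texts[0::2], texts[1::2]))
    width = max(len(first) for first, _ in pairs)
    return "\n".join(
        f"{first.ljust(width)}  *  {second}" for first, second in pairs
    )


def line_per_verse_layout(xhtml, indent="    "):
    """One verse per line, with the verse2 lines indented."""
    return "\n".join(
        (indent if cls == "verse2" else "") + text
        for cls, text in verses(xhtml)
    )


if __name__ == "__main__":
    print(burton_layout(QUATRAIN))
    print()
    print(line_per_verse_layout(QUATRAIN))
```

Neither layout requires touching the markup; the '*' separator and the indent exist only in the output functions.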

(This is not the only possible markup scheme, but works for my
purposes. I suggest PG study a more generalized structural markup
scheme for verse -- study maybe 100 random works containing verse and
see if for at least 90 of them some sort of general markup scheme can
be developed which, when converted to XHTML, allows a single CSS style
sheet to reasonably display the poetry as originally typeset. It would
not surprise me if such a 90% generalized markup scheme is possible:
a sort of "Poetry Markup Language" -- the other 10% would be covered
by customized extensions, and for "poetry as visual art" by SVG.)

Jon Noring

From joshua at hutchinson.net  Tue Oct 26 08:57:08 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 08:57:13 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
Message-ID: <20041026155708.248B0EDDD0@ws6-1.us4.outblaze.com>

----- Original Message -----
From: Jon Noring <jon@noring.name>
To: gutvol-d@lists.pglaf.org
Subject: Re: [gutvol-d] Final PGTEI run-thru for a while...
Date: Tue, 26 Oct 2004 09:39:13 -0600

> 
> Joshua wrote:
> 
> > This e-mail concludes the "common" items I want to check in PGTEI.
> >
> > ...
> >
> > 1 - How should we mark up poetry indents?  In HTML, I use
> > &nbsp;&nbsp; to put two spaces for indents on the text....  *edit* I
> > just found in Marcello's guide that he suggests using &emsp; as a quad
> > indent.  Works for me, unless someone has a different suggestion.
> 
> As we've discussed (and argued) before, it is my belief that, except
> where typography is integral to the poem itself ("poetry as visual
> art"), poetry should be marked up in a structural, not
> presentational, sense. This means text characters should NEVER be used
> for visual layout purposes -- characters should be used only for
> representing textual content. Using text characters for layout mucks
> up usability, repurposeability, CSS styling, and accessibility.
> 
> Use XSL*, CSS or other styling language to effect the desired output.
> End-users will now have more ability to tailor the verse to their
> particular reading devices. Of course, a non-parsed comment could be
> added to the markup explaining how the original was typeset for those
> wishing to try to duplicate the original layout (but then, that's one
> purpose for having access to the original page scans.)
> 

The big difference here is that the indent spacing in lines of poetry is NOT just presentational.  It can and has been argued that the spacing is INTENTIONAL and STRUCTURAL to the poem.  Hence, the addition of the leading spaces, either through &nbsp; (non-breaking space) or &emsp; (quad space).

Your example of the layout of the quatrains is purely presentational.  They do not provide any structural meaning to the poem, whereas poetry indentions often DO provide structural meaning.

I'll let David argue the case further if necessary, as he's been the biggest proponent of poetry indents as structural vs presentational in the DP forum discussions on the subject.

Josh
From jon at noring.name  Tue Oct 26 09:21:06 2004
From: jon at noring.name (Jon Noring)
Date: Tue Oct 26 09:21:23 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
In-Reply-To: <20041026155708.248B0EDDD0@ws6-1.us4.outblaze.com>
References: <20041026155708.248B0EDDD0@ws6-1.us4.outblaze.com>
Message-ID: <301046640781.20041026102106@noring.name>

Josh wrote:
> Jon Noring wrote:

>> As we've discussed (and argued) before, it is my belief that, except
>> where typography is integral to the poem itself ("poetry as visual
>> art"), poetry should be marked up in a structural, not
>> presentational, sense. This means text characters should NEVER be used
>> for visual layout purposes -- characters should be used only for
>> representing textual content. Using text characters for layout mucks
>> up usability, repurposeability, CSS styling, and accessibility.
>> 
>> Use XSL*, CSS or other styling language to effect the desired output.
>> End-users will now have more ability to tailor the verse to their
>> particular reading devices. Of course, a non-parsed comment could be
>> added to the markup explaining how the original was typeset for those
>> wishing to try to duplicate the original layout (but then, that's one
>> purpose for having access to the original page scans.)

> The big difference here is that the indent spacing in lines of
> poetry is NOT just presentational.  It can and has been argued that
> the spacing is INTENTIONAL and STRUCTURAL to the poem.  Hence, the
> addition of the leading spaces, either through &nbsp; (non-breaking
> space) or &emsp; (quad space).

How is this different than indenting the first line of a paragraph to
communicate to the reader "this is a new paragraph"?

All the poetry I've seen (and of course I've not seen it all), uses
indentations of various types to communicate to the reader the
*structure* of the poem, thus the poetry is amenable to structural
markup provided the granularity of the markup is sufficient. CSS can
be used to reproduce the original typographic layout if desired.

I also assert that except for "poetry as visual art", any indentation
intentionally added by the poet was done simply to assist the reader
in understanding the structure -- this is comparable to an author
insisting that new paragraphs indent the first line.


> Your example of the layout of the quatrains is purely presentational.
> They do not provide any structural meaning to the poem, whereas
> poetry indentions often DO provide structural meaning.

To the contrary -- the presentation used in Burton's quatrains does
communicate the underlying structure of quatrains, which is
essentially ABABABAB.

Jon Noring

From joshua at hutchinson.net  Tue Oct 26 09:50:30 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 09:50:38 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
Message-ID: <20041026165030.B76192F94C@ws6-3.us4.outblaze.com>


----- Original Message -----
From: Jon Noring <jon@noring.name>

<snipped poetry indent discussion>

Fundamentally, it comes down to whether the indentation is structural (it provides some author-intended meaning to the poem) or simply presentational (the typesetter put those indents there to make everything "pretty").

First argument for structural ...  The author can use indention differently for different poems within the same book.  That, to me, means that the indention pattern has some meaning for that author.  Eg: It was intentional that the third and fifth lines are indented and the seventh and ninth lines were double indented.  Yet, in the next poem, every other line is indented equally.  That is an intentional indentation and implies meaning to it.

Second argument for structural ... In the Chicago Manual of Style (reference I used is here: http://nutsandbolts.washcoll.edu/chicago.html), it does include an indent in the quoted poem it gives as the first example.  Since the style guide thinks that indention is important and should be preserved, I think that argues for the preservation in our texts as well.

Argument for using &nbsp; (non-breaking space) or &emsp; (quad space) ...  The full TEI spec (much less TEI-Lite) does not seem to have a tag element to control indents in a poem.  Hence, the use of spaces to specify those indents, with the added benefit of being XML format independent.  If our text is converted to some other flavor of XML, the indents are kept intact.  Whereas if we tried to create a custom tag element to control indention, that tag element (and hence the information) would most likely be lost if the text was converted to some other flavor of XML.

Josh

PS After reading the full TEI spec section on poetry, I'm not sure at all anymore on how to tag an entire poem as a single entity.

I.e., if you have two separate poems in a book of poetry, what element(s) do you use to mark one poem as separate from the other?  <lg> markup seems to indicate "a stanza, refrain, verse paragraph, etc." (http://www.tei-c.org/P4X/VE.html)

It mentions using <div> to mark off sections of a long poem into Cantos or Books... Should I use the same <div> markup for setting off individual poems as separate entities?
From joshua at hutchinson.net  Tue Oct 26 10:12:42 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 10:12:49 2004
Subject: [gutvol-d] Found an indention example in TEI...
Message-ID: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>

Google is such a wonderful thing... :)

I did some further digging and found an example (of the exact poem I was using for an example, no less) from the Electronic Text Center at the University of Virginia.  They use a simple rend attribute within the <l> tag to mark an indent.

Anyone have a problem with that?  It keeps the indention information, but does it within the XML tag element.

The only possible issue I see is that it doesn't allow for variable length indents.

Josh

****

Lewis Carroll's The Hunting of the Snark

    <div1 type="fit" n="1">
    <head> Fit the First: THE LANDING </head>
    <pb n="45" />

        <lg type="stanza">
        <l>"Just the place for a Snark!" the Bellman cried,</l>
        <l rend="indent">As he landed his crew with care;</l>
        <l>Supporting each man on the top of the tide</l>
        <l rend="indent">By a finger entwined in his hair.</l>
        </lg>

        <pb n="46" />

        <lg type="stanza">
        <l>"Just the place for a Snark! I have said it twice:</l>
        <l rend="indent">That alone should encourage the crew.</l>
        <l>Just the place for a Snark! I have said it thrice:</l>
        <l rend="indent">What I tell you three times is true."</l>
        </lg>

        [ETC....] 

    </div1> 
From sly at victoria.tc.ca  Tue Oct 26 10:31:28 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Oct 26 10:31:36 2004
Subject: [gutvol-d] Found an indention example in TEI...
In-Reply-To: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
References: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
Message-ID: <Pine.GSO.4.58.0410261026190.5656@vtn1.victoria.tc.ca>


Some recent texts coming from dp have had css used for indicating
indentation in verse. They use class selectors such as i2, i4, i6
to indicate various levels of indentation. Could something similar
be adopted in this case?
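
A minimal sketch of that idea in Python, under stated assumptions: the class names i2, i4, i6 come from the dp texts mentioned above, but the mapping from indent level to class name and the margin-left values are guesses made for illustration, not what dp actually emits. The helper names (indent_class, css_rule, stylesheet) are hypothetical.

```python
def indent_class(level: int, step: int = 2) -> str:
    """Return a class name such as 'i2' for level 1 or 'i4' for level 2.

    Assumption: each indent level adds `step` to the number in the name.
    """
    if level < 1:
        raise ValueError("indent level must be 1 or greater")
    return f"i{level * step}"


def css_rule(level: int, step: int = 2, em_per_level: float = 1.0) -> str:
    """Build one CSS rule that shifts a verse line right by `level` steps."""
    name = indent_class(level, step)
    return f".{name} {{ margin-left: {level * em_per_level:g}em; }}"


def stylesheet(max_level: int = 3) -> str:
    """Join the rules for levels 1..max_level into a small stylesheet."""
    return "\n".join(css_rule(n) for n in range(1, max_level + 1))


if __name__ == "__main__":
    print(stylesheet())
```

A converter could then emit class="i4" wherever the source marks a second-level indent, and readers could restyle the indents without touching the text.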

Andrew

On Tue, 26 Oct 2004, Joshua Hutchinson wrote:

> Google is such a wonderful thing... :)
>
> I did some further digging and found an example (of the exact poem I was using for an example, no less) from the Electronic Text Center at the University of Virginia.  They use a simple rend attribute within the <l> tag to mark an indent.
>
> Anyone have a problem with that?  It keeps the indention information, but does it within the XML tag element.
>
> The only possible issue I see is that it doesn't allow for variable length indents.
>
> Josh
>
> ****
>
> Lewis Carroll's The Hunting of the Snark
>
>     <div1 type="fit" n="1">
>     <head> Fit the First: THE LANDING </head>
>     <pb n="45" />
>
>         <lg type="stanza">
>         <l>"Just the place for a Snark!" the Bellman cried,</l>
>         <l rend="indent">As he landed his crew with care;</l>
>         <l>Supporting each man on the top of the tide</l>
>         <l rend="indent">By a finger entwined in his hair.</l>
>         </lg>
>
>         <pb n="46" />
>
>         <lg type="stanza">
>         <l>"Just the place for a Snark! I have said it twice:</l>
>         <l rend="indent">That alone should encourage the crew.</l>
>         <l>Just the place for a Snark! I have said it thrice:</l>
>         <l rend="indent">What I tell you three times is true."</l>
>         </lg>
>
>         [ETC....]
>
>     </div1>
From dlainson at sympatico.ca  Tue Oct 26 10:32:06 2004
From: dlainson at sympatico.ca (dlainson@sympatico.ca)
Date: Tue Oct 26 10:32:34 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
Message-ID: <417E51D6.16261.2CC444@localhost>


    Hello

    Here's a letter (which I'm apparently breaking some US law by 
    forwarding, but I'll take the risk) which I find disturbing.  Seems 
    that "Project Gutenberg established PGA to permit the illegal 
    downloading of works".  Of this I wasn't aware.  As a big contributor 
    to PGA it concerns me personally, as well as setting a very dangerous 
    precedent.

    Does one country have the right to dictate to another what a website 
    can contain when it falls within the law of the host country, and can 
    they force some sort of restrictions on the downloading of material?

    Don.

    ------- Forwarded message follows -------
    From:           	"Col Choat" <colc@gutenberg.net.au>
    To:             	"Don Lainson" <dlainson@sympatico.ca>
    Subject:        	FW: Copyright Infringement of Gone With the Wind
    Date sent:      	Tue, 26 Oct 2004 09:36:48 +1000


    -----Original Message-----
    From: Gonzalez, Dalgis [mailto:dgonzalez@fkkslaw.com]On Behalf Of 
    Selz, Thomas
    Sent: Tuesday, 26 October 2004 6:29 AM
    To: colc@gutenberg.net.au
    Cc: Paul Anderson Sr. (E-mail); Paul Anderson Jr. (E-mail); Thomas 
    Hal Clarke (E-mail); Thomas Hal Clarke (E-mail 2); Selz, Thomas
    Subject: Copyright Infringement of Gone With the Wind

    October 25, 2004
    
    
    Certified Mail-
    Return receipt Requested
    
    Project Gutenberg
    405 West Elm Street
    Urbana, IL 61801
    
    By e-mail (colc@gutenberg.net.au)

    Project Gutenberg of Australia
    
    
    Re: Copyright Infringement of Gone With the Wind

    To Whom It May Concern:

    We represent the Stephens Mitchell Trusts (the "Trusts"), the owner
    of the copyright to the book, Gone With The Wind ("GWTW"). There are
    copyright provisions around the world, including, without limitation,
    the United States Copyright Act, 17 U.S.C. §101 et. seq, which grant
    the Trusts, as copyright owner, the exclusive right to reproduce and
    distribute GWTW in the United States and elsewhere.

    It has come to our attention that Project Gutenberg's affiliate,
    Project Gutenberg of Australia ("PGA"), is publishing GWTW in
    electronic book form on its web site located at www.gutenberg.net.au
    (the "Web Site"). The Web Site states that PGA "produces etexts in
    accordance with Australian law" and that the books available on its
    site are in the public domain in Australia. While the Web Site warns
    that some of its ebooks may still be protected by copyright in the
    U.S. and suggests that U.S. users check U.S. copyright laws or visit
    Project Gutenberg's U.S. web site for its list of public domain
    works, there is nothing to prevent any U.S. user from simply
    downloading GWTW from the Web Site. Indeed, we were able to do so
    easily.

    It appears to us that Project Gutenberg established PGA to permit the
    illegal downloading of works that are still subject to copyright
    protection in the U.S. and elsewhere. Project Gutenberg's and PGA's
    willful, knowing and unauthorized distribution of GWTW to users in
    the U.S. and elsewhere where copyright protection remains available
    is a blatant violation of our client's rights under applicable
    statutes and common law. Please be advised that Project Gutenberg
    and PGA are subject to U.S. copyright law and to jurisdiction in the
    U.S. for their infringing activities through applicable jurisdiction
    statutes governing the commission of acts of infringement that either
    occur in the U.S. or have an effect in the U.S.

    On behalf of the Trusts, we hereby demand that Project Gutenberg
    and/or PGA confirm to us within five (5) days of receipt of this
    letter that you have removed GWTW from the Web Site entirely or that
    you have taken all necessary steps to prevent the downloading of GWTW
    in all places in which it is protected by copyright.

    Please be advised that if we have not received confirmation of your
    willingness to comply with the foregoing demands, we will take all
    appropriate steps to protect and enforce our clients' rights.

    This demand is without prejudice to all of the Trusts' rights and
    remedies in this matter, both legal and equitable, all of which are
    specifically and expressly reserved.
    
     Very truly yours,
    
    
     Thomas D. Selz
    
    cc:Paul H. Anderson, Sr., Esq.
     Paul Anderson, Jr., Esq.
     Thomas Hal Clarke, Jr., Esq.
    Dalgis E. Gonzalez 
    FrankfurtKurnit Klein & Selz, PC 
    488 Madison Avenue 
    New York, New York 10022 
    Tel: (212) 980-0120 x6735 
    Fax: (212) 593-9175 
    E-mail: dgonzalez@fkkslaw.com 

    This e-mail and any attached files are intended solely for the use of 
    the individual or entity to which this mail is addressed and may 
    contain information that is privileged, confidential and exempt from 
    disclosure under applicable law. Any use, disclosure, copying or 
    distribution of this e-mail or the attached files by anyone other 
    than the intended recipient is strictly prohibited. If you have 
    received this e-mail in error, please notify the sender by reply
    e-mail or collect call to (212) 980-0120 and delete this e-mail and
    attached files from your system. Thank you.
    
    ------- End of forwarded message -------

    Don Lainson
    dlainson@sympatico.ca


From joshua at hutchinson.net  Tue Oct 26 10:53:37 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 10:53:45 2004
Subject: [gutvol-d] Found an indention example in TEI...
Message-ID: <20041026175337.E9330109774@ws6-4.us4.outblaze.com>

I found another example of using the rend attribute.

<l rend="indent1">As he landed his crew with care;</l>

<l rend="indent2">As he landed his crew with care;</l>

etc.

The indent level would indicate the number of tab stop indentions for that line.  It is not in the TEI spec itself, but it seems to be a widely used modification/addition.

Josh

----- Original Message -----
From: Andrew Sly <sly@victoria.tc.ca>
To: Project Gutenberg Volunteer Discussion <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] Found an indention example in TEI...
Date: Tue, 26 Oct 2004 10:31:28 -0700 (PDT)

> 
> 
> Some recent texts coming from dp have had css used for indicating
> indentation in verse. They use class selectors such as i2, i4, i6
> to indicate various levels of indentation. Could something similar
> be adopted in this case?
> 
> Andrew
> 
> On Tue, 26 Oct 2004, Joshua Hutchinson wrote:
> 
> >         <l rend="indent">As he landed his crew with care;</l>

From Bowerbird at aol.com  Tue Oct 26 11:05:41 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 26 11:05:55 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
Message-ID: <1e0.2d85fdac.2eafebf5@aol.com>

frankfurtkurnit klein & selz said:
>   Tel: (212) 980-0120 x6735

michael, whenever you want us to call these guys,
you just let us know...         :+)

-bowerbird
From jeroen at bohol.ph  Tue Oct 26 11:39:48 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Oct 26 11:39:41 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
In-Reply-To: <417E51D6.16261.2CC444@localhost>
References: <417E51D6.16261.2CC444@localhost>
Message-ID: <417E99F4.5000605@bohol.ph>


I would reply with something along these lines.

1. The copyright of the work has expired according to Australian law.
2. PG of A is not subject to US copyright law, as it has no activities 
in the US. It is the responsibility of US visitors to our website to 
comply with US law, similar to a US visitor visiting Australia, and 
buying a printed copy of GWTW and bringing it to the US, or buying the 
same by postal order.
3. Nothing in Australian law requires PG of A to prevent access to 
public domain materials for visitors apparently from outside Australia. 
Furthermore, no reliable means exists to determine the geographical 
location of a visitor to our website. Even if we were able to 
implement such a mechanism, it would be easy to circumvent using proxy 
servers.
4. The downloading for personal use and study, as is facilitated by the 
PG of Oz website, may, in many jurisdictions, constitute fair use of 
the work by the visitor, and hence, the downloading is not necessarily 
illegal in third countries, even if the work is still under copyright.
5. Project Gutenberg of Australia is legally an entirely independent 
organisation from PG of US, and PG of US cannot be held liable for any 
actions of PG of A. It has not been set up for the express purpose of 
evading US law, but as an independent sister organisation to allow 
Australian volunteers to distribute works in the public domain in Au. 
Nobody from PG US has any hand in establishing or running PG of A.

Hence, PG US lacks the means to comply with your request, and PG of A is 
fully within its rights to behave as it does.

Any litigation will be fully without merit. We advise you that any such 
litigation will be accompanied by a campaign on our side to increase 
public awareness of the overly long duration of copyright, and the 
highly disputable way such extensions have been bought in the US.

I think Michael has dealt with issues like these a few times, so may 
have a letter ready in the correct legalese...

Jeroen Hellingman



From cannona at fireantproductions.com  Tue Oct 26 11:40:50 2004
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Tue Oct 26 11:42:16 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
In-Reply-To: <417E51D6.16261.2CC444@localhost>
References: <417E51D6.16261.2CC444@localhost>
Message-ID: <6.1.2.0.0.20041026132146.01ff6c30@mail.fireantproductions.com>

Cute letter.  It's built upon a foundation of faulty suppositions, and 
imaginative stretches of logic, but it's cute. :)

I'll be interested (if the powers that be decide to share) to see the 
official response (if any).

Sincerely
Aaron Cannon


At 12:32 PM 10/26/2004, you wrote:

> <snipped forwarded letter>


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) 


From marcello at perathoner.de  Tue Oct 26 13:42:13 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 26 13:42:26 2004
Subject: [gutvol-d] Found an indention example in TEI...
In-Reply-To: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
References: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
Message-ID: <417EB6A5.5060805@perathoner.de>

Joshua Hutchinson wrote:

>         <l>"Just the place for a Snark!" the Bellman cried,</l>
>         <l rend="indent">As he landed his crew with care;</l>

Indenting is difficult to handle because it is such a hybrid of 
structural and presentational stuff.

There are at least 3 types of indent:

  1) indent of a block(quote)
  2) indent of the first line of a paragraph
  3) indent of a verse line

Let's just consider 3)


Ways of purely presentational tagging:

   <l>&emsp;

     simple, robust, standard-conforming and already implemented.

   <l rend="indent">

     found in the TEI spec but limited because just one level.

   <l rend="indentX"> for X = 1, 2, 3

     ugly, makes me want to puke. Negative indents ?

   <l rend="indent indent">

     better. Compatible with TEI spec. Falls back to one indent
     if more than one is not supported by XSLT.

   <l rend="indent(-1)">

     still better.

   <l indent="1">

     most elegant but not so standard. Needs new attribute.


Structural tagging:

   <lg type="limerick">
   <l>There was a young lady of Riga,</l>
   <l>Who smiled as she rode on a tiger;</l>
   <lg rend="indent">
     <l>They returned from the ride</l>
     <l>With the lady inside,</l>
   </lg>
   <l>And the smile on the face of the tiger.</l>
   </lg>


This is all just off of the top of my head. Once we have figured out 
what we want, I can start implementing.
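
As a rough sketch of what such an implementation might have to do, the Python below reduces the rend variants listed above to a single signed indent level. It rests on assumptions that are not in the TEI spec or in any existing PG tool: that whitespace-separated "indent" keywords add up, that "indentN" and "indent(N)" both mean N levels, and that unrelated rend keywords are ignored. The function name indent_level is hypothetical.

```python
import re

# One rend keyword: "indent", "indent2", "indent-1" or "indent(-1)".
_INDENT_TOKEN = re.compile(r"^indent(?:\((-?\d+)\)|(-?\d+))?$")


def indent_level(rend: str) -> int:
    """Sum the indent levels expressed by a rend attribute value."""
    level = 0
    for token in rend.split():
        match = _INDENT_TOKEN.match(token)
        if match is None:
            continue  # not an indent keyword, e.g. "italic"
        number = match.group(1) or match.group(2)
        # A bare "indent" counts as one level.
        level += int(number) if number is not None else 1
    return level


if __name__ == "__main__":
    for value in ("indent", "indent indent", "indent2", "indent(-1)"):
        print(repr(value), "->", indent_level(value))
```

Normalising every variant to one integer would let the stylesheet side stay the same whichever spelling is finally chosen.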


-- 
Marcello Perathoner
webmaster@gutenberg.org

From nihil_obstat at mindspring.com  Tue Oct 26 13:45:03 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Tue Oct 26 13:45:14 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
Message-ID: <11773933.1098823504193.JavaMail.root@wamui09.slb.atl.earthlink.net>


Have they claimed that the cab driver who hit Margaret Mitchell at 13th and Peachtree in 1949 was also working for the vast P.G. conspiracy?  Perhaps if she had known how it would have affected her copyright term in Australia, she would have been more careful crossing the street.

Has anyone looked into rewriting the Gutenberg License to prohibit inherited copyright holders and their attorneys from downloading PG works?


---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From joshua at hutchinson.net  Tue Oct 26 14:02:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 14:02:30 2004
Subject: [gutvol-d] Found an indention example in TEI...
Message-ID: <20041026210221.6A880109711@ws6-4.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> Joshua Hutchinson wrote:
> 
> >         <l>"Just the place for a Snark!" the Bellman cried,</l>
> >         <l rend="indent">As he landed his crew with care;</l>
> 

<snipped>

I think I understood everything but this line.

> 
>    <l rend="indentX"> for X = 1, 2, 3
> 
>      ugly, makes me want to puke. Negative indents ?
> 

Negative indents?  1, 2, 3 would seem to indicate the number of indents to apply, which would seem to me to be a positive indent number.  I'm missing something here, I know it.


>    <l rend="indent indent">
> 
>      better. Compatible with TEI spec. Falls back to one indent
>      if more than one is not supported by XSLT.
> 

I don't really like it.  Too verbose and not as intuitive (IMO).


>    <l rend="indent(-1)">
> 
>      still better.

Related to the negative indent comment above, I'm sure, but I'm still not connecting the dots.

> 
>    <l indent="1">
> 
>      most elegant but not so standard. Needs new attribute.
> 
> 

The "not so standard" part is the only problem I have.  But I could go either way if you feel strongly about it.

> Structural tagging:
> 
>    <lg type="limerick">
>    <l>There was a young lady of Riga,</l>
>    <l>Who smiled as she rode on a tiger;</l>
>    <lg rend="indent">
>      <l>They returned from the ride</l>
>      <l>With the lady inside,</l>
>    </lg>
>    <l>And the smile on the face of the tiger.</l>
>    </lg>
> 

Looks fine to me (as long as the rend attribute is possible in the <l> tag, too, if necessary).

Josh
From stephen.thomas at adelaide.edu.au  Tue Oct 26 17:33:01 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Tue Oct 26 17:33:20 2004
Subject: [gutvol-d] Found an indention example in TEI...
In-Reply-To: <20041026175337.E9330109774@ws6-4.us4.outblaze.com>
References: <20041026175337.E9330109774@ws6-4.us4.outblaze.com>
Message-ID: <417EECBD.7060107@adelaide.edu.au>

As it says in the "Snark":

"What's the good of Mercator's North Poles and Equators,
"Tropics, Zones, and Meridian Lines?"
So the Bellman would cry: and the crew would reply
"'They are merely conventional signs!

Ditto markup: merely conventional signs.  ;-)

The discussion is fascinating. About nine months back, I got
seriously interested in TEI, and was looking at converting all
my ebooks to TEI. Among a number of stumbling blocks I
encountered was this question of what to do with poetry.
Probably, I lack sufficient energy or interest, or possibly time
-- always a great excuse. But I regret to say that I gave up at
this point.

But during my "research" into the poetry question, I wondered
aloud on the TEI list whether there were identified verse
structures which could/should be used in markup. E.g. sonnets
and limericks seem to have a generally accepted layout, so maybe
there were other forms too. Unfortunately, possibly because I
didn't pay attention in school, I am rather ignorant about such
things. Unfortunately, no one else on that list seemed to know
either.

Now, someone here just posted an example which began:

	<lg type="limerick"> ...

and for TEI's purposes, that's probably enough. (Although TEI
has the rend attribute, TEI is actually pretty weak on the
presentational side -- not just my opinion, but that of many
experts on the TEI list.) Unfortunately, it is not possible to
define a CSS style which will translate "limerick" into the
desired presentation.
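
One possible way to get from a verse type to a layout, sketched here in Python as a processing step before any styling, is a lookup table of indent patterns. The only pattern shown is read off the limerick example posted earlier in this thread (third and fourth lines indented); the table name PATTERNS, the function apply_form and the four-space unit are assumptions for illustration, and real verse forms vary far more than this.

```python
from typing import Dict, List, Sequence, Tuple

# Indent level for each line of a known verse form (hypothetical table).
PATTERNS: Dict[str, Tuple[int, ...]] = {
    "limerick": (0, 0, 1, 1, 0),
}


def apply_form(form: str, lines: Sequence[str], unit: str = "    ") -> List[str]:
    """Indent `lines` according to the pattern registered for `form`.

    Unknown forms, or stanzas whose length does not fit the pattern,
    are returned unchanged rather than guessed at.
    """
    pattern = PATTERNS.get(form)
    if pattern is None or len(pattern) != len(lines):
        return list(lines)
    return [unit * level + line for level, line in zip(pattern, lines)]


if __name__ == "__main__":
    limerick = [
        "There was a young lady of Riga,",
        "Who smiled as she rode on a tiger;",
        "They returned from the ride",
        "With the lady inside,",
        "And the smile on the face of the tiger.",
    ]
    print("\n".join(apply_form("limerick", limerick)))
```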

In my HTML, I've used the em-space entity to indent lines where
necessary. It's the easy way out, I know, but somehow I can't
stomach the mess that results from

	<l rend="indent2">

etc.


Steve

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From gbnewby at pglaf.org  Wed Oct 27 01:34:41 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Oct 27 01:34:44 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
In-Reply-To: <417E51D6.16261.2CC444@localhost>
References: <417E51D6.16261.2CC444@localhost>
Message-ID: <20041027083441.GB5668@pglaf.org>

On Tue, Oct 26, 2004 at 01:32:06PM -0400, dlainson@sympatico.ca wrote:
> 
>     Hello
> 
>     Here's a letter (which I'm apparently breaking some US law by
>     forwarding, but I'll take the risk) which I find disturbing.  Seems
>     that "Project Gutenberg established PGA to permit the illegal
>     downloading of works".  Of this I wasn't aware.  As a big contributor
>     to PGA it concerns me personally, as well as setting a very dangerous
>     precedent.

Folks, it's safe to assume that the people who sent the letter (or
other folks who might send other letters) could access the gutvol-d
list or archives.  So, in the interest of not helping them to think of
new ways to harass us, I won't send a lot of detail on this particular
case, or the ones like it.

Suffice it to say that, as others have commented, these folks are
incorrect in many things.  The PG response to such threats is to tell
them this (politely), and mention that we have done extensive legal
research over the years (in consultation with numerous lawyers) to
support our notions.  If *they* know of laws or legal precedents to
the contrary, we would be very happy to hear of them and will seek to
comply, as we do with all other laws.

We also offer to help them, by providing information about these laws
in the copies of the eBook(s) in question that we distribute, and to
help further by writing letters to infringers they can identify.  In
short, we point out the errors in their requests, assumptions, claims,
etc. and put the ball back in their court.

This has been an effective strategy over the years.  But of course,
it's effective mostly because we're right, and legal -- PG's diligence
in copyright procedures gives us a strong moral & practical ground to
stand on.

>     Does one country have the right to dictate to another what a website
>     can contain when it falls within the law of the host country, and can
>     they force some sort of restrictions on the downloading of material?

The short answer is, no.  We are not aware of anything like this, and
have looked extensively, and consulted with many legal experts.  There
are definitely some vague points and unknowns, and eventually there
might be treaties etc. that address some of these issues.
  -- Greg

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org

From marcello at perathoner.de  Wed Oct 27 02:25:13 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 27 02:25:36 2004
Subject: [gutvol-d] Found an indention example in TEI...
In-Reply-To: <20041026210221.6A880109711@ws6-4.us4.outblaze.com>
References: <20041026210221.6A880109711@ws6-4.us4.outblaze.com>
Message-ID: <417F6979.2030200@perathoner.de>

Joshua Hutchinson wrote:

> Negative indents?  1, 2, 3 would seem to indicate number of indents
> to apply, which would seem to me would be a positive indent number.
> I'm missing something here, I know it.

What if you want a line to stick out

       stanza
       stanza
     like this?
       stanza
       stanza

Solutions:

   <l rend="indent-1">like this</l>

or

   <l rend="indent(-1)">like this</l>


I prefer the latter.
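
For anyone writing a converter, here is a minimal Python sketch of how
the second notation could be read.  The names parse_indent and
render_line, the base indent of two levels, and the two spaces per
level are assumptions made up for illustration; none of them come from
PGTEI itself.

```python
import re

# Hypothetical pattern for the functional notation rend="indent(N)".
INDENT_RE = re.compile(r"indent\((-?\d+)\)")

def parse_indent(rend):
    """Return the signed indent level in a rend value, or 0 if none is given."""
    match = INDENT_RE.search(rend or "")
    return int(match.group(1)) if match else 0

def render_line(text, rend=None, base=2, width=2):
    """Indent a verse line; base and width are assumed defaults."""
    level = max(base + parse_indent(rend), 0)  # never go left of column 0
    return " " * (level * width) + text
```

With these assumed defaults, a line marked indent(-1) sits one level
to the left of its unmarked neighbours, which is the "stick out"
effect shown above.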


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Oct 27 05:16:57 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 27 05:16:59 2004
Subject: [gutvol-d] Found an indention example in TEI...
Message-ID: <20041027121657.305649E930@ws6-2.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> Joshua Hutchinson wrote:
> 
> > Negative indents?  1, 2, 3 would seem to indicate number of indents
> > to apply, which would seem to me would be a positive indent number.
> > I'm missing something here, I know it.
> 
> What if you want a line to stick out
> 
>        stanza
>        stanza
>      like this?
>        stanza
>        stanza
> 
> Solutions:
> 
>    <l rend="indent-1">like this</l>
> 
> or
> 
>    <l rend="indent(-1)">like this</l>
> 
> 
> I prefer the latter.
> 
> 
Ah!  *light bulb goes on*  I understand now.  Ok, the second example is fine with me and it does look better in the (hopefully rare) instance where a negative indent is required.

Josh

From hart at pglaf.org  Wed Oct 27 09:24:02 2004
From: hart at pglaf.org (Michael Hart)
Date: Wed Oct 27 09:24:03 2004
Subject: !@!Re: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the
	Wind
In-Reply-To: <417E99F4.5000605@bohol.ph>
References: <417E51D6.16261.2CC444@localhost> <417E99F4.5000605@bohol.ph>
Message-ID: <Pine.LNX.4.60.0410270921400.13376@pglaf.org>


I should add that PGUS has had nothing to do with the formation
of any other PG effort, no monies that I am aware of have ever
been exchanged, and that we couldn't stop any of these if we
wanted to under my understanding of the current legal system.

Michael

From gbnewby at pglaf.org  Wed Oct 27 10:39:47 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Oct 27 10:39:49 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
In-Reply-To: <20041027083441.GB5668@pglaf.org>
References: <417E51D6.16261.2CC444@localhost> <20041027083441.GB5668@pglaf.org>
Message-ID: <20041027173947.GA17942@pglaf.org>

On Wed, Oct 27, 2004 at 01:34:41AM -0700, Greg Newby wrote:
> On Tue, Oct 26, 2004 at 01:32:06PM -0400, dlainson@sympatico.ca wrote:
> > 
> >     Hello
> > 
> >     Here's a letter (which I'm apparently breaking some US law by
> >     forwarding, but I'll take the risk) which I find disturbing.  Seems
> >     that "Project Gutenberg established PGA to permit the illegal
> >     downloading of works".  Of this I wasn't aware.  As a big contributor
> >     to PGA it concerns me personally, as well as setting a very dangerous
> >     precedent.
> 
> Folks, it's safe to assume that the people who sent the letter (or
> other folks who might send other letters) could access the gutvol-d
> list or archives.  So, in the interest of not helping them to think of
> new ways to harass us, I won't send a lot of detail on this particular
> case, or the ones like it.

Saw this story on slashdot, as well as Michael's Part I of today's
newsletter.  So much for keeping this under our hats :-)

One quick factoid: None of PG (of US) has received any contact from
the Mitchell lawyers (not to Michael's house, despite the statements
in the letter, not to our business office in Utah, nor to my house in
Fairbanks, which is the corporate business address of record).  So,
it's a little premature to make any sort of response.

In case you were wondering, one of the first things we do when we get
such letters is confirm that the people involved are who they say they
are, and have some sort of legal relationship to the texts.  In this
case, we only have one forwarded email, which could be fake or from
parties without legal standing.  Thus, it's premature to even feel
confident that the letter/complaint is real, or that (if it is real)
the people who sent it are legitimate.
  -- Greg

PS: I probably won't have time to post to /. today [my day job is
calling], but people can feel free to repost my comments or extracts.
There aren't any secrets here, but I do urge a modicum of discretion
since so few facts are known.

From scott_bulkmail at productarchitect.com  Wed Oct 27 11:25:00 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Oct 27 11:25:54 2004
Subject: [gutvol-d] Final PGTEI... blockquote
In-Reply-To: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
Message-ID: <p06110421bda4db9e53d4@[192.168.0.52]>

>Blockquotes:
>
>I wanted to markup a blockquote example, but I didn't see how.  Anyone out there know how to handle a blockquote with a text?

Perhaps:
	<div rend="display">
	<q rend="display">

(not very intuitive, eh?)

It's mentioned briefly in Marcello's docs, once you know what to look for ("wider margins").

Also, search for these in Marcello's alice.tei and lmiss.tei examples.  The "q" one will add quote marks, unless suppressed via the appropriate attribute.

I couldn't find it in the TEI Lite docs (though I assume it's there somewhere).  I did find it in Section 4.3 of "Bare Bones TEI" http://www.tei-c.org/Vault/Bare/ -- which also suggests that rend="block" is equivalent.  (I didn't find independent confirmation.)
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From scott_bulkmail at productarchitect.com  Wed Oct 27 11:25:15 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Oct 27 11:25:58 2004
Subject: [gutvol-d] Final PGTEI... page numbers
In-Reply-To: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
Message-ID: <p0611041ebda47356d3c0@[192.168.0.52]>

>Page number markup:
>
>No complaints.  I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern.

I didn't see an explicit way to mark the original page numbers.  Perhaps as a marginal note?

<note place="margin">27</note>
-- 

Scott

Practical Software Innovation (tm), http://ProductArchitect.com/
From joshua at hutchinson.net  Wed Oct 27 11:33:52 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 27 11:34:05 2004
Subject: [gutvol-d] Final PGTEI... page numbers
Message-ID: <20041027183357.8AFEE4F4A7@ws6-5.us4.outblaze.com>

<pb n="x" />

Where x is the original source page number.

Here is a tentative TEI2HTML transform for it...

   <xsl:template match="/TEI.2/text//pb[@n]">
     <span id="page{@n}"
	   class="pagenum">[pg <xsl:value-of select="@n"/>]</span>
   </xsl:template>

(Thanks to Marcello for cleaning up my ugly first attempt at the transform.)

The CSS will need a span.pagenum defined to put the [pg x] markup in the side margin.
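
For those who would rather prototype outside XSLT, the same idea can
be sketched in Python with the standard xml.etree.ElementTree module.
This is only an illustration: the function name mark_page_numbers is
made up, and unlike the template above it handles every pb under
whatever element it is given rather than only those below
/TEI.2/text.

```python
import xml.etree.ElementTree as ET

def mark_page_numbers(root):
    """Turn each <pb n="x"/> into <span id="pagex" class="pagenum">[pg x]</span>."""
    for pb in list(root.iter("pb")):
        n = pb.attrib.pop("n", None)
        if n is None:
            continue  # like pb[@n]: leave unnumbered page breaks alone
        pb.tag = "span"
        pb.set("id", "page" + n)
        pb.set("class", "pagenum")
        pb.text = "[pg " + n + "]"
    return root

# Small made-up sample; the text following the pb is kept in place.
sample = ET.fromstring('<p>end of page<pb n="27" />start of next</p>')
html = ET.tostring(mark_page_numbers(sample), encoding="unicode")
```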

Josh


From joshua at hutchinson.net  Wed Oct 27 11:40:58 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 27 11:41:06 2004
Subject: [gutvol-d] Final PGTEI... blockquote
Message-ID: <20041027184058.DB3E8EDCF0@ws6-1.us4.outblaze.com>

I think I'd prefer the rend="block" over "display" just because it is a little more intuitive.  Thanks for tracking that down Scott.

I also found how to handle sidenotes since I wrote that original message.

<note place="margin"> 

It is actually used in Marcello's PGTEI guide (though not talked about specifically... it is used to provide some info on another topic).

If anyone is wondering, I am keeping track of all this markup.  I'm going to see about putting together a webpage with the ongoing notes.

Josh


From marcello at perathoner.de  Wed Oct 27 13:45:11 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Oct 27 13:45:23 2004
Subject: [gutvol-d] Final PGTEI... blockquote
In-Reply-To: <20041027184058.DB3E8EDCF0@ws6-1.us4.outblaze.com>
References: <20041027184058.DB3E8EDCF0@ws6-1.us4.outblaze.com>
Message-ID: <418008D7.8040702@perathoner.de>

Joshua Hutchinson wrote:

> I think I'd prefer the rend="block" over "display" just because it is
> a little more intuitive.  Thanks for tracking that down Scott.

"display" is different from "block".

A display is a chunk of text set off (displayed) from the rest usually 
by enlarging the left and right margins and inserting some top and 
bottom margins.

A block is just a plain old block with nothing special to distinguish it 
from any block fore or aft.

The HTML tag <blockquote> is just a bit misnamed.
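
To make the distinction concrete for plain-text output, here is a
rough Python sketch.  The function names and the default widths and
margins are assumptions for illustration only, not what any PG
converter actually does.

```python
import textwrap

def render_block(text, width=72):
    """A plain block: wrapped to the full measure, nothing else."""
    return textwrap.fill(" ".join(text.split()), width=width)

def render_display(text, width=72, margin=4):
    """A display: narrower on both sides, with a blank line above and below."""
    pad = " " * margin
    body = textwrap.fill(
        " ".join(text.split()),
        width=width - margin,      # leaves `margin` columns free on the right
        initial_indent=pad,        # and `margin` columns free on the left
        subsequent_indent=pad,
    )
    return "\n" + body + "\n"
```

The only difference between the two functions is the extra margin on
all four sides, which is the whole point of a display.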

> I also found how to handle sidenotes since I wrote that original
> message.
> 
> <note place="margin">

This works only in HTML for now. Ideas on how to display it in TXT 
without wasting a lot of space?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Oct 27 13:55:05 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 27 13:55:14 2004
Subject: [gutvol-d] Final PGTEI... blockquote
Message-ID: <20041027205505.B98BA2F959@ws6-3.us4.outblaze.com>


----- Original Message -----
From: Marcello Perathoner <marcello@perathoner.de>
> 
> Joshua Hutchinson wrote:
> 
> > I think I'd prefer the rend="block" over "display" just because it is
> > a little more intuitive.  Thanks for tracking that down Scott.
> 
> "display" is different from "block".
> 
> A display is a chunk of text set off (displayed) from the rest usually 
> by enlarging the left and right margins and inserting some top and 
> bottom margins.
> 
> A block is just a plain old block with nothing special to distinguish it 
> from any block fore or aft.
> 
> The HTML tag <blockquote> is just a bit misnamed.
> 

The barebones guide didn't seem to make a distinction between the two, but if "display" seems more correct to you, I'm willing to use it.

> > I also found how to handle sidenotes since I wrote that original
> > message.
> > 
> > <note place="margin">
> 
> This works only in HTML for now. Ideas on how to display it in TXT 
> without wasting a lot of space?
> 

In a text file, I would say one of two ways.

1 - Easy way, just treat it like a footnote/endnote and stick it at the end.

2 - Slightly better way (pulled from how DP texts do it) ... Move the note to before the paragraph it is part of and mark it with a [Sidenote: blah blah] markup.

Both methods lose a little fidelity, since the Sidenote is not printed exactly right by the text it refers to, like it would in the original.  But method two keeps it fairly close, and context should allow the reader to easily tell the part of the paragraph it refers to.  Method one would allow the marker to appear near its original source location, but the information is now not in the same eye region.  The user must click to the notes section to see the information, which is commonly meant to be more accessible/more important than a typical footnote.
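
A rough Python sketch of method two, in case it helps.  It assumes the
sidenotes are direct children of the paragraph, marked as
<note place="margin">, and the function name paragraph_to_text is made
up for this example.

```python
import xml.etree.ElementTree as ET

def paragraph_to_text(p):
    """Render a <p> as plain text with its margin notes hoisted above it."""
    sidenotes = []
    body = [p.text or ""]
    for child in p:
        if child.tag == "note" and child.get("place") == "margin":
            note = "".join(child.itertext()).strip()
            sidenotes.append("[Sidenote: " + note + "]")
        else:
            body.append("".join(child.itertext()))
        body.append(child.tail or "")  # text after the child stays in the body
    text = " ".join("".join(body).split())  # collapse runs of whitespace
    return "\n\n".join(sidenotes + [text])
```

Each sidenote ends up on its own line directly above the paragraph it
came from, so the reader does not have to leave the page to find it.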

Josh
From Bowerbird at aol.com  Wed Oct 27 15:26:11 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 27 15:26:35 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
Message-ID: <1a5.29beadb0.2eb17a83@aol.com>

joshua said:
>   Both methods lose a little fidelity, since the Sidenote 
>   is not printed exactly right by the text it refers to, 
>   like it would in the original.  But method two keeps it 
>   fairly close, and context should allow the reader 
>   to easily tell the part of the paragraph it refers to.  
>   Method one would allow the marker to appear near its original 
>   source location, but the information is now 
>   not in the same eye region.  The user must click to 
>   the notes section to see the information, which is commonly 
>   meant to be more accessible/more important than a typical footnote.


is it unreasonable to want to view _all_ notes
-- sidenotes, footnotes, _and_ endnotes _too_ --
right there close to the context where they apply?

i think not.

in print form, it cannot be done, of course.
(not by the printer, anyway, although readers
can do a pretty good job of using their hands 
to hold both pages, and switch between them.)

hotlinks between a note and its referent can
enable a person to "switch" in a similar way.

but you're still looking at either one page or
the other, when you want to look at _both_.

and in the electronic arena, we can easily go
that step better, so why not take advantage?

in my viewer-program, all notes are stored in an 
end-note section, but any note can be "popped up"
by a user just by clicking on its note-indicator.

so on the left half of the screen, they have the
body of the text, and on the right-half they have
the end-note section in a scrollable edit-field.
(actually the whole file, but it's auto-positioned
at the appropriate note in the end-note section.)

this lets them see each note in the context of
the notes that surround it -- which can be very
useful when the author has used ibids and op cits.

in addition, if the user double-clicks inside the
scrolling-field on the number of _another_ note,
the display on the left-hand side jumps to show
the page that has the text that calls _that_ note.

so even though the notes are collected together
in a place that is removed from their referents
in the _file_, the viewer-program brings them
together in a way giving users maximum power,
letting them see text and note at the same time,
and navigate easily amongst all of the notes.

as i experiment with this system, i'm quite happy
i've achieved a very good solution to the problem,
and i consider myself to be "done" working on it...
(at least until i consider what to do when printing.)

but if you can think of any other capability i should
add to it, please suggest it, i would love to hear it!

and let the programmers of _your_ favorite viewer
-- whoever they are -- know that _you_ would enjoy
this ability to view _all_ notes just like sidenotes,
simultaneously with the text that references them,
so you would appreciate it if they would program that.

let those programmers know that you'll be willing to
format notes however they require in order to provide
this feature, but you definitely _want_ the capability.

and if you're not on speaking terms with the people
who are programming your viewer-tools, _why_not_?

-bowerbird
From joshua at hutchinson.net  Wed Oct 27 16:35:33 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Oct 27 16:35:02 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
In-Reply-To: <1a5.29beadb0.2eb17a83@aol.com>
References: <1a5.29beadb0.2eb17a83@aol.com>
Message-ID: <418030C5.7070505@hutchinson.net>

Yes, genius, we have that ability, too, in the HTML.  We were talking 
about the plain text version which is reader program agnostic.

Josh


From Bowerbird at aol.com  Wed Oct 27 16:48:42 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Oct 27 16:48:58 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes,
	and end-notes
Message-ID: <b9.49420cb0.2eb18dda@aol.com>

joshua said:
>   Yes, genius, we have that ability, too, in the HTML.  

do you now?
then why don't i see more .html versions prepared this way?
read what i wrote, carefully, and then show me some e-texts
that have an .html version that can match those capabilities...

and what's with the "genius" comment?  are you being snide?


>   We were talking about the plain text version 
>   which is reader program agnostic.

my viewer-program takes plain-text files.

you can be as "agnostic" as you care to be, but
if you don't serve the readers, who are you serving?

-bowerbird
From brad at chenla.org  Wed Oct 27 21:02:32 2004
From: brad at chenla.org (Brad Collins)
Date: Wed Oct 27 21:04:47 2004
Subject: [gutvol-d] Final PGTEI... page numbers
In-Reply-To: <p0611041ebda47356d3c0@[192.168.0.52]> (Scott Lawton's message
	of "Wed, 27 Oct 2004 14:25:15 -0400")
References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
	<p0611041ebda47356d3c0@[192.168.0.52]>
Message-ID: <wk4qkfze07.fsf@chenla.org>

Scott Lawton <scott_bulkmail@productarchitect.com> writes:

>>Page number markup:
>>
>>No complaints.  I'll be looking into a transform that will place the
>>numbers in the margin, but that is a secondary concern.
>
> I didn't see an explicit way to mark the original page numbers.
> Perhaps as a marginal note?
>
> <note place="margin">27</note>
> -- 

Page numbers are put in the `pb' pagebreak element.

   <pb n="27" />

,----[ TEI Manual: 6.9.3 Milestone Tags ]
|  - <pb> marks the boundary between one page of a text and the next
|    in a standard reference system.
|
|    `ed' (edition) indicates the edition or version in which the page
|    break is located at this point
|  
|  - <lb> marks the start of a new (typographic) line in some edition
|    or version of a text.
| 
|    `ed' (edition) indicates the edition or version in which the line
|    break is located at this point
| 
|  - <cb> marks the boundary between one column of a text and the next
|    in a standard reference system.
| 
|    `ed' (edition) indicates the edition or version in which the column
|    break is located at this point
`----

There is no need for a `place' attribute, you can use rend="margin"
instead.  But this is confusing because it's saying that the page
breaks in the original edition were in the margin.  And if so, which
margin, left or right?

Presentational markup should be used to indicate how the original was
marked up.  Instructions for how something should be displayed should
be done using CSS or XSLT.

I'm using an EETS edition of The Merlin as a development text because
it has a running analysis in the left margin, footnotes, and
indicates the page breaks in the original manuscript.

So in the electronic edition I need to indicate two different sets
of page breaks, one for the original manuscript and another for the
page breaks in the EETS edition.

This can easily be done using the edition `ed' attribute.

  <pb ed="ms" n="27" />

  <pb ed="EETS" n="53" />
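
As a small illustration of why the `ed' attribute is handy, here is a
Python sketch that sorts page breaks by edition.  The function name
page_breaks_by_edition and the sample fragment are made up for this
example; only the pb element and its `ed' and `n' attributes come from
TEI.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def page_breaks_by_edition(root):
    """Map each ed value to the page numbers of its <pb> elements, in document order."""
    pages = defaultdict(list)
    for pb in root.iter("pb"):
        # A pb without an ed attribute is filed under the empty string.
        pages[pb.get("ed", "")].append(pb.get("n"))
    return dict(pages)

sample = ET.fromstring(
    '<div><pb ed="EETS" n="53" /><p>text <pb ed="ms" n="27" /> more text</p></div>'
)
breaks = page_breaks_by_edition(sample)
```

A stylesheet or script can then pick whichever numbering it wants to
show, or show both, without touching the text itself.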

Learning TEI is like learning Emacs or Unix-like systems.  It's a
gradual process of incremental epiphanies.

TEI is a large and complex spec and takes some time to digest.  More
than once over the past couple of years I have quickly looked up
something in TEI and thought that it was silly and then came up with
my own alternate solution.  

However, most of the time, after putting my hack into practice I found
it didn't work as I expected and finally understood why TEI had done
things the way they had.

I've come to respect TEI more and more as a mature body of
experience which I am trusting more and more.  If something seems
stupid or awkward I now try to stop and step back and assume that
there is a good chance I don't understand the design before trying to
cobble together my own solution.

Detractors of XML on this list have brought up the fact that the TEI
manual is 1400 pages long as a negative.  Why?  This shows that TEI is
well documented.  As a general rule, the more documentation that is
available for a spec the more mature and useful the standard and the
easier it is to learn and implement.

I remember a sig file from someone back in the early 90's that went
something like, "documentation is a sign of failure".   This is
somewhat true for simple end-user applications, but it certainly
isn't true for things like computer languages and markup languages.

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From Bowerbird at aol.com  Thu Oct 28 01:44:37 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 28 01:45:01 2004
Subject: [gutvol-d] Final PGTEI... page numbers
Message-ID: <65.36dc4587.2eb20b75@aol.com>

brad said:
>   Detractors of XML on this list have brought up 
>   the fact that the TEI manual is 1400 pages long 
>   as a negative.  Why?

actually, it was jeroen who initially mentioned that fact.

but since you asked, the reason this is seen as a "negative"
is because we think that precious few of the volunteers who
have traditionally shouldered the effort of creating e-texts
will continue to do so if an understanding of those 1400 pages
of t.e.i. documentation were to become a prerequisite.

but maybe now distributed proofreaders has enough people
on-board that they feel less uncomfortable taking that risk...

or maybe not, as their tentative plan thus far involves
adding two "markup" rounds (at least, and maybe more)
to their existing two "proofing" rounds, so as to minimize
the number of people who need to be concerned with markup.


>   This shows that TEI is well documented.

um, well, yes, i guess it does.  although _more_ documentation
is not _always_ a good sign of _better_ documentation, is it?


>   As a general rule, the more documentation that is available 
>   for a spec the more mature and useful the standard 
>   and the easier it is to learn and implement.

i'm not quite so sure i agree with that "general rule", brad...

i think it would be just as possible -- and more compelling --
to formulate a "general rule" that the more documentation that
a spec needs, the more complex it is, which means that it is
_harder_ to "learn and implement"...

i'm not afraid of documentation.  indeed, quite to the contrary,
i'm one of those rare people who often prefers reading it _first_,
because if you can stomach it, it'll save you lots of fiddling time.

and i'm a word geek too.  so i find the massive t.e.i. documentation
-- and indeed the whole framework itself -- to be a remarkable
and fascinating piece of work.  it is mind-boggling to witness how
_complex_ and _variegated_ a comprehensive examination of text
can become, once you pour a foundation and start building a building.

on the other hand, i can equally admire a system that boils things
down to their essence, and creates great benefits with few costs.

if all the volunteers contributing their efforts to project gutenberg
were word geeks as willing to throw themselves into a devotion of
documentation, like you and me, brad, it might not matter whether
we went with the complex system or one that is a lot more easy.

but given that they probably aren't, we should think very carefully
before committing them to a world with a high degree of difficulty.

as you put it, the learning of a complex system like t.e.i. is often
"a gradual process of incremental epiphanies".  can we _survive_
the situation where thousands of volunteers are put through that?
with perhaps many becoming alienated in the course of doing so?

unless i miss my guess, just the last few days of "how do we do this?"
posts on this listserve have tried the patience of most subscribers...
(which leads me to suggest that perhaps there is another listserve
that is more appropriate for that, where the markup geeks can go?)

-bowerbird
From brad at chenla.org  Thu Oct 28 06:57:42 2004
From: brad at chenla.org (Brad Collins)
Date: Thu Oct 28 06:59:25 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
In-Reply-To: <418030C5.7070505@hutchinson.net> (Joshua Hutchinson's message
	of "Wed, 27 Oct 2004 19:35:33 -0400")
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net>
Message-ID: <wk7jpbj67d.fsf@chenla.org>


The side note is not always the same as an end note or foot note.
Take the following attempt at an ascii version of the first page of
the EETS edition of the Romance of Merlin.  It's the best I could do
using a fixed-width font.

Note: This example is 80 columns--some mail programs might mangle
this.  If it looks like it's a mess, try re-sizing your window larger.
If it still doesn't look right then your mailer has inserted hard
line-breaks.

-- Begin --
			The Romance of Merlin.

                              ---------

			      CHAPTER I
.
	     CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

Fvll wrothe and angry  was  the  Deuell,  whan that  oure  lorde  [Fol 1a.]
hadde  ben  in  helle, and  had  take  oute  Adam  and  Eve, and  Anger of the
other at his plesier; and whan the fendes sien that, they hadden  Devil against
right  grete feer and  gret merveile;  thei assembleden to-gedir, our Lord.
and  seiden,  "What  is  he  this  thus  vs  supprisith and dis-
troyeth, in so moche that our strengthes ne nought ellis that we  Assembly of
haue  may  nought with-holde  hym,  nor again  hym  stonde in no  the fiends
diffence; but that he doth all that hym lyketh, we ne  trowe not  and their dis-
that eny man myght  be bore  of  woman,  but  that he sholde ben  cussion.
oures, and he that thus vs distroyeth, how is he born in whom we 
[did]1  knowe non erthely delyte."   Than ansuerde anothir fende 
and seide,  "He this hath distroyed  that  which we wende sholde 
haue be mooste oure a-vaile.  Remembre ye not  how the prophetes  The prophets
seiden,  how  that god  shulde  come in to erthe for to saue the  said that God
synners  of Adam and Eve,  and we  yeden  bysily  a-boute  theym  should come
that so seiden,  and dide them moste turment of eny othir pepill, on earth to
and it semed by their [feire]1  semblant, that it greved hem but  save sinners.
litill or nought, but they comforted hem that weren synners, and 
seide that oon  sholde come,  which sholde  delyuer  hem  out of 
tharldome and disese.

                            1 Illegible

                                                                1
-- End --

First let's ignore the running analysis and do a simple markup of the
main body of the passage:

---
<div n="1" type="chapter">
 <head type="title">The Romance of Merlin.</head>
 <head type="section title">CHAPTER I</head>
 <head type="subtitle">CONSULTATION OF DEVILS, AND BIRTH OF
 MERLIN.</head>

 <pb ed="folio" n="1a" />

 <p>Fvll wrothe and angry was the Deuell, whan that oure lorde hadde
  ben in helle, and had take oute Adam and Eve, and other at his
  plesier; and whan the fendes sien that, they hadden right grete feer
  and gret merveile; thei assembleden to-gedir, and seiden, "What is
  he this thus vs supprisith and dis-troyeth, in so moche that our
  strengthes ne nought ellis that we haue may nought with-holde hym,
  nor again hym stonde in no diffence; but that he doth all that hym
  lyketh, we ne trowe not that eny man myght be bore of woman, but
  that he sholde ben oures, and he that thus vs distroyeth, how is he
  born in whom we <unclear resp="wheatley">did</unclear> knowe non
  erthely delyte."  Than ansuerde anothir fende and seide, "He this
  hath distroyed that which we wende sholde haue be mooste oure
  a-vaile.  Remembre ye not how the prophetes seiden, how that god
  shulde come in to erthe for to saue the synners of Adam and Eve, and
  we yeden bysily a-boute theym that so seiden, and dide them moste
  turment of eny othir pepill, and it semed by their <unclear
  resp="wheatley">feire</unclear> semblant, that it greved hem but
  litill or nought, but they comforted hem that weren synners, and
  seide that oon sholde come, which sholde delyuer hem out of
  tharldome and disese.</p>

 <pb ed="eets" n="1" />
</div>
...

Except for the <unclear> tag this is all basic TEI-Lite.

We have replaced the folio note and the page number with <pb> tags
which indicate which edition each break comes from (the original folio
manuscript or the EETS edition).

We have also marked up the `Illegible' text with the <unclear> tag,
with the responsibility attribute indicating that the original editor
of the EETS edition, Henry B. Wheatley, was responsible for indicating
that the marked text was unclear.
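
As a rough illustration of what this markup buys you, here is a minimal
sketch that reads the page breaks and the unclear readings back out.  It
assumes Python's standard xml.etree.ElementTree; the SAMPLE string is an
abbreviated stand-in for the chapter above, and the function names are
made up for the example, not part of any PG tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the chapter markup (text shortened for the example).
SAMPLE = """<div n="1" type="chapter">
 <pb ed="folio" n="1a" />
 <p>born in whom we <unclear resp="wheatley">did</unclear> knowe non
  erthely delyte ... by their <unclear resp="wheatley">feire</unclear>
  semblant</p>
 <pb ed="eets" n="1" />
</div>"""


def page_breaks(root):
    """Return (edition, page number) pairs for every <pb>, in document order."""
    return [(pb.get("ed"), pb.get("n")) for pb in root.iter("pb")]


def unclear_readings(root):
    """Return (responsible editor, reading) pairs for every <unclear> span."""
    return [(u.get("resp"), u.text) for u in root.iter("unclear")]


root = ET.fromstring(SAMPLE)
print(page_breaks(root))
print(unclear_readings(root))
```

The same two queries could drive either a text or an HTML rendering,
which is the point of keeping the edition and responsibility in
attributes rather than in the running text.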

Now, what about the running analysis?  Can we use TEI to mark this up
as well?

Yes.

---
<div n="1" type="chapter">
 <head type="title">The Romance of Merlin.</head>
 <head type="section title">CHAPTER I</head>
 <head type="subtitle">CONSULTATION OF DEVILS, AND BIRTH OF
 MERLIN.</head>

 <pb ed="folio" n="1a" />

 <p>
  <seg id="1">Fvll wrothe and angry was the Deuell, whan that oure
  lorde hadde ben in helle, and had take oute Adam and Eve, and other
  at his plesier; and whan the fendes sien that, they hadden right
  grete feer and gret merveile;</seg>

  <seg id="2">thei assembleden to-gedir, and seiden, "What is he this
  thus vs supprisith and dis-troyeth, in so moche that our strengthes
  ne nought ellis that we haue may nought with-holde hym, nor again
  hym stonde in no diffence; but that he doth all that hym lyketh, we
  ne trowe not that eny man myght be bore of woman, but that he sholde
  ben oures, and he that thus vs distroyeth, how is he born in whom we
  <unclear resp="wheatley">did</unclear> knowe non erthely delyte."  Than
  ansuerde anothir fende and seide, "He this hath distroyed that which
  we wende sholde haue be mooste oure a-vaile.</seg>

  <seg id="3">Remembre ye not how the prophetes seiden, how that god
  shulde come in to erthe for to saue the synners of Adam and Eve, and
  we yeden bysily a-boute theym that so seiden, and dide them moste
  turment of eny othir pepill, and it semed by their <unclear
  resp="wheatley">feire</unclear> semblant, that it greved hem but
  litill or nought, but they comforted hem that weren synners, and
  seide that oon sholde come, which sholde delyuer hem out of
  tharldome and disese.</seg>
 </p>

 <pb ed="eets" n="1" />

 <interpGrp type='analysis' resp="wheatley">
   <interp id='1' value='Anger of the Devil against our Lord.' />
   <interp id='2' value='Assembly of the fiends and their discussion.' />
   <interp id='3' value='The prophets said that God should come on earth
   to save sinners.' />
 </interpGrp>
</div>
...

We've broken the paragraph into segments, each of which has an id that
is used to link in the textual analysis using <interp> tags, which are
collected in an `interpGrp' group tag.

An <interpGrp> can be created for the whole chapter or paragraph by
paragraph. 
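
To make the linkage concrete, here is a minimal sketch that pairs each
segment with its analysis by matching id values, as in the example
above.  It assumes Python's standard xml.etree.ElementTree, which does
not validate; SAMPLE is an abbreviated stand-in and analysis_by_segment
is a made-up name.  One caveat, as far as I can tell: XML id values are
meant to be unique within a document and to start with a letter, so a
validating parser would probably want distinct values on <seg> and
<interp> (TEI also offers an `ana' pointer for this kind of link).

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the segmented chapter (text shortened).
SAMPLE = """<div n="1" type="chapter">
 <p>
  <seg id="1">Fvll wrothe and angry was the Deuell ...</seg>
  <seg id="2">thei assembleden to-gedir, and seiden ...</seg>
  <seg id="3">Remembre ye not how the prophetes seiden ...</seg>
 </p>
 <interpGrp type="analysis" resp="wheatley">
  <interp id="1" value="Anger of the Devil against our Lord." />
  <interp id="2" value="Assembly of the fiends and their discussion." />
  <interp id="3" value="The prophets said that God should come on earth to save sinners." />
 </interpGrp>
</div>"""


def analysis_by_segment(root):
    """Pair each <seg> with the <interp> value that carries the same id."""
    notes = {i.get("id"): i.get("value") for i in root.iter("interp")}
    pairs = []
    for seg in root.iter("seg"):
        # Collapse line breaks and runs of spaces inside the segment text.
        text = " ".join("".join(seg.itertext()).split())
        pairs.append((notes.get(seg.get("id")), text))
    return pairs


for note, text in analysis_by_segment(ET.fromstring(SAMPLE)):
    print(f"[{note}] {text[:40]}")
```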

Obviously this solution uses full blown TEI, not just the TEI-Lite
subset, but it does work.

I would suggest that in the case of works like The Romance of Merlin,
PG should create two editions.

The first would be a clean, reference edition of the original text,
and the second an edition with all of the textual analysis and notes
from the Victorian edition.

So using FRBR Entities the resulting texts would look something like
this (I think, this is very confusing and I could have screwed this up).

   W  Romance of Merlin (circa 1450-1460).
   E  ... Text of Middle Eng. Translation of Middle French Suite De Merlin.
   M  ...... Original MS. Transcription.
   I  ......... Manuscript (University Library, Cambridge University)
   E  ...... Merlin A Prose Romance (Edited By Henry B. Wheatley, 
              Introduction by Edward Mead, 1899)
   M  ......... EETS ed. London, 1899.
   M  ......... PG TEI Master ed based on EETS ed. 2004.
   F  ............... PG Plain Text Ed.
   F  ............... PG HTML Ed.
   E  ...... Text Only Electronic Edition (TEI Ed. 2004).
   M  ............ PG TEI Master Ed.
   F  ............... PG Plain Text Ed.
   F  ............... PG HTML Ed.

W ==> Work
E ==> Expression
M ==> Manifestation
F ==> Format
I ==> Item/Instance

The Text-Only TEI version could then be used as a base reference text
by anyone to create new annotated editions.
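
For anyone who finds the outline easier to follow as a data structure,
here is a minimal sketch of the text-only branch as a small tree.  The
Entity class and outline function are hypothetical names for the
example, and the one-letter codes are simply the legend above (note that
`Format' is this list's own addition; FRBR itself stops at Work,
Expression, Manifestation and Item).

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str    # "W", "E", "M", "F" or "I", following the legend above
    label: str
    children: list = field(default_factory=list)

    def add(self, kind, label):
        """Attach a child entity and return it so calls can be chained."""
        child = Entity(kind, label)
        self.children.append(child)
        return child


def outline(entity, depth=0):
    """Return the tree as indented lines, one entity per line."""
    lines = ["   " * depth + f"{entity.kind} {entity.label}"]
    for child in entity.children:
        lines.extend(outline(child, depth + 1))
    return lines


work = Entity("W", "Romance of Merlin (circa 1450-1460)")
text_only = work.add("E", "Text Only Electronic Edition (TEI Ed. 2004)")
master = text_only.add("M", "PG TEI Master Ed.")
master.add("F", "PG Plain Text Ed.")
master.add("F", "PG HTML Ed.")

print("\n".join(outline(work)))
```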

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From jon at noring.name  Thu Oct 28 07:24:14 2004
From: jon at noring.name (Jon Noring)
Date: Thu Oct 28 07:24:26 2004
Subject: [gutvol-d] Final PGTEI... page numbers
In-Reply-To: <65.36dc4587.2eb20b75@aol.com>
References: <65.36dc4587.2eb20b75@aol.com>
Message-ID: <501212428078.20041028082414@noring.name>

Bowerbird said:
> brad said:

>> Detractors of XML on this list have brought up the fact that the
>> TEI manual is 1400 pages long as a negative.  Why?

> but since you asked, the reason this is seen as a "negative"
> is because we think that precious few of the volunteers who
> have traditionally shouldered the effort of creating e-texts
> will continue to do so if an understanding of those 1400 pages
> of t.e.i. documentation were to become a prerequisite.
>
> but maybe now distributed proofreaders has enough people
> on-board that they feel less uncomfortable taking that risk...

The "1400" pages is for the full-blown TEI spec, which includes some
pretty obscure stuff. Interspersed within it are long and (to me)
fascinating general discourses on the structure of textual documents,
with copious examples. In essence, it is probably one of the better
"textbooks" ever written on this topic even if it is only there to
support the description of the TEI markup.


> or maybe not, as their tentative plan thus far involves
> adding two "markup" rounds (at least, and maybe more)
> to their existing two "proofing" rounds, so as to minimize
> the number of people who need to be concerned with markup.

Essentially yes. Distributed Proofreaders' longer-term vision, as I
understand it (and Juliet can correct me where I'm off anywhere in
this message), is to settle upon some subset of TEI to apply to all
documents (either use TEI-Lite or some other comparable subset -- for
the occasional oddball document the more extended TEI will be used in
"manual" mode.) In addition, for most of DP's volunteers, the markup
will be "under-the-hood" and largely invisible -- most of the
volunteer work anyway is for copyediting the text (correcting OCR
errors), not markup insertion, so no need to require these volunteers
to learn the gory details of TEI. Only the most experienced and
interested of the DP volunteers, who do the final cleanup/finishing
stages, will actually play with the markup itself.


> as you put it, the learning of a complex system like t.e.i. is often
> "a gradual process of incremental epiphanies".  can we _survive_
> the situation where thousands of volunteers are put through that?
> with perhaps many becoming alienated in the course of doing so?

Well as I noted above, DP, where the action is for large-scale
production of e-texts (they are now the actual engine which drives
PG's growth), does not plan to inflict TEI on the general first-level
volunteers (this is what I inferred from my talks a while back with
Charles.) With regards to the specifics of the markup which DP will
eventually use (likely a subset of TEI as previously noted), that will
ultimately be determined by them based on compatibility with the
production interface as well as what works best for the various uses
(note the plural) of the texts.

[Aside: the DP-produced XML Master texts will certainly be used for
many purposes, all of which impose requirements on the markup
specification, and which must be considered -- this is the biggest
missing area not being discussed on gutvol-*. The most exciting of
these is where the DPXML texts will be archived into a special
library-like repository which allows a very high level of end-user
interaction with and customization of the collection (e.g., bookmarking,
annotation, interlinking within the repository and to other content
repositories, blogging, etc. -- all things several associates and I
are now working on.) Of course, the other uses are to generate
portable digital formats as the end-user wants, higher-quality
text-to-speech capability, and Michael Hart's dream of language
translation. These, too, guide the nature of the Master markup
vocabulary. Of course, there must be library-compatible and properly
designed catalog, metadata, and identifier information for each e-text
in the repository. And where they exist, the original page scans of
the source documents will also be available and interlinked with the
XML versions. Brewster Kahle at the Internet Archive will *gladly*
archive the page scans for DP/PG. I envision that most of the earlier
portion of the PG collection, which contains most of the classics,
will be redone by DP from source documents to assure proper metadata
collection, uniformity and conformity with the rest of the DPXML texts
and to have the page scans available. Once DP gets into major
production with many more volunteers, redoing the earlier texts won't
be a big deal -- it needs to eventually be done anyway, in my view.]

I would think and hope that DP will convene a formalized working group
of the various experts and enthusiasts here and elsewhere to hammer
out the DP Markup Specification based on requirements gathering and
analysis, which is the proper way to do this. The DPMWG will have a
more formalized and committed leadership structure, with weekly
teleconference calls. From my standards working group experience, it's
amazing how much stuff gets done during weekly teleconferences and the
occasional face-to-face meeting (biannual or annual), while written
listserv exchanges in a group like gutvol-* usually end up going
around and around in circles. I expect it won't take that long to
hammer out the "beta" of the DP Markup Vocabulary when the working
group is organized properly and committed to generate and then resolve
the various requirements.

I would even ask someone like C. Michael Sperberg-McQueen to be an
advisor to the working group (his brother Roger Sperberg and I have
worked closely together on various projects in the past. <smile/>.) I
would think that DP's vision to include TEI in its next generation
system so as to do *large-scale* production of e-texts (possibly up to
a few hundred *per day* to begin the process of one million texts in
a decade or two) will greatly excite the TEI community and we will
attract some pretty smart and dedicated working group members to add
to the several already here. Volunteerism is not only for the "rank
and file" (those who will do the basic copyediting), but also includes
those who are more technically minded and understand the markup issues
as they relate to the production environment.

Jon Noring

From joshua at hutchinson.net  Thu Oct 28 07:36:51 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 28 07:36:55 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
Message-ID: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com>

That does work, but why not use the already existing <note place="margin"> markup?  Then, you don't have to do the extra work of segmenting your paragraph for the same result.  For an HTML edition, it already works in our transform.  For the text version, we just have to decide HOW to handle it, then code up the transform.

My version of your example:

 <div n="1" type="chapter">
  <head type="title">The Romance of Merlin.</head>
  <head type="section title">CHAPTER I</head>
  <head type="subtitle">CONSULTATION OF DEVILS, AND BIRTH OF
  MERLIN.</head>
 
  <pb ed="folio" n="1a" />
 
  <p><note place="margin">Anger of the Devil against our Lord.</note>
   Fvll wrothe and angry was the Deuell, whan that oure lorde hadde
   ben in helle, and had take oute Adam and Eve, and other at his
   plesier; and whan the fendes sien that, they hadden right grete feer
   and gret merveile; <note place="margin">Assembly of the fiends and their discussion.</note>
   thei assembleden to-gedir, and seiden, "What is
   he this thus vs supprisith and dis-troyeth, in so moche that our
   strengthes ne nought ellis that we haue may nought with-holde hym,
   nor again hym stonde in no diffence; but that he doth all that hym
   lyketh, we ne trowe not that eny man myght be bore of woman, but
   that he sholde ben oures, and he that thus vs distroyeth, how is he
   born in whom we <unclear resp="wheatley">did</unclear> knowe non
   erthely delyte."  Than ansuerde anothir fende and seide, "He this
   hath distroyed that which we wende sholde haue be mooste oure
   a-vaile.  <note place="margin">The prophets said that God should come on earth to save sinners.</note>
   Remembre ye not how the prophetes seiden, how that god
   shulde come in to erthe for to saue the synners of Adam and Eve, and
   we yeden bysily a-boute theym that so seiden, and dide them moste
   turment of eny othir pepill, and it semed by their <unclear
   resp="wheatley">feire</unclear> semblant, that it greved hem but
   litill or nought, but they comforted hem that weren synners, and
   seide that oon sholde come, which sholde delyuer hem out of
   tharldome and disese.</p>
 
  <pb ed="eets" n="1" />
 </div>
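
On the open question of how the text version might handle these notes,
one possible choice (an assumption for illustration, not a decision
anyone has made) is to pull each margin note inline in square brackets.
A minimal sketch, using Python's standard xml.etree.ElementTree on an
abbreviated stand-in for the paragraph above; paragraph_to_text is a
made-up name, not part of the existing transforms:

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the paragraph with margin notes (text shortened).
SAMPLE = """<p><note place="margin">Anger of the Devil against our Lord.</note>
 Fvll wrothe and angry was the Deuell ... and gret merveile;
 <note place="margin">Assembly of the fiends and their discussion.</note>
 thei assembleden to-gedir, and seiden ...</p>"""


def paragraph_to_text(p):
    """Flatten a <p>, turning margin notes into bracketed inline text."""
    parts = [p.text or ""]
    for child in p:
        inner = "".join(child.itertext())
        if child.tag == "note" and child.get("place") == "margin":
            parts.append(f"[Sidenote: {inner}]")
        else:
            # Any other inline element just contributes its text.
            parts.append(inner)
        # Text that follows the child element belongs to the paragraph.
        parts.append(child.tail or "")
    # Collapse source line breaks and indentation into single spaces.
    return " ".join("".join(parts).split())


print(paragraph_to_text(ET.fromstring(SAMPLE)))
```

Re-wrapping the result to a line width, or setting the notes in a right
margin as in the printed page, would be a separate step.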


Josh



From joshua at hutchinson.net  Thu Oct 28 07:58:18 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 28 07:58:23 2004
Subject: [gutvol-d] Final PGTEI... page numbers
Message-ID: <20041028145818.F30DF9E98F@ws6-2.us4.outblaze.com>


----- Original Message -----
From: Jon Noring <jon@noring.name>
> 

<snipped a lot of good stuff>

I just wanted to address the DP/PG TEI Working Subcommittee idea.

Probably not going to happen.  Things tend to happen like this around DP:

One or two people see something irritating they want to scratch badly enough to start working on it themselves.  Then, they come up with a workable implementation that is put out as a "test" of some sort.  It gets refined and used more and more until it becomes a de facto standard.  (See our current proofing guidelines at DP for a prime example of this process.)

In this case, TEI was talked about but no one wanted to "scratch the itch" enough to take it on as their baby.  I'm finally irritated enough by multiple formats that I've decided to get something going.  With Marcello's invaluable technical expertise, I'm trying to keep the ball rolling.  It's slowed down by the fact that I am NOT a TEI/XML expert, but I am making some progress.  I've already learned a ton in the last week ... enough to be impressed by the TEI spec and by Marcello's work on the transforms so far.

My goal is to have a working XML->HTML and XML->TEXT conversion for 90% of the texts that go through DP sometime before Christmas.  The caveat here is that I'm still learning and while it looks doable to me now, I may learn something tomorrow that makes me revise my estimate.

So, for now, I guess I'm the "unofficial" working committee (well, Marcello, too, since I keep bugging the poor guy constantly and he's still nice enough to respond to my e-mails).  Others have provided very helpful pointers and advice, too, but I'm hoping to push this past just talking about it and into actually having something that works at some level.  (Right now, the XML -> HTML conversion is "almost" there ... the XML -> TEXT conversion needs more work.)

Josh
From scott_bulkmail at productarchitect.com  Thu Oct 28 08:03:37 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Thu Oct 28 08:11:58 2004
Subject: [gutvol-d] PGTEI and more
Message-ID: <p06110402bda6b3197d03@[192.168.0.52]>

My feedback on PGTEI is too long for email, so I posted it here:
	http://Classicosm.com/xml/feedbackonpgtei.html

Sections include:
	PGTEI Vocabulary
	PGTEI Examples
	PGTEI Documentation
	Generated HTML
	Default CSS
	Generated PDF
	Generated Text
	PGText to PGTEI

I also put together a (rough draft!) Quick Reference table, including a comparison to XHTML http://Classicosm.com/xml/pgteiquickreference.html

To help explore alternatives to TEI, I critique a side-by-side comparison to a dedicated vocabulary for plays: http://Classicosm.com/xml/tei-vs-play.html

Feedback welcome!!!
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - Classic Books
http://ProductArchitect.com/ - consulting
From scott_bulkmail at productarchitect.com  Thu Oct 28 08:03:42 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Thu Oct 28 08:12:02 2004
Subject: [gutvol-d] Final PGTEI... page numbers
In-Reply-To: <wk4qkfze07.fsf@chenla.org>
References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
	<p0611041ebda47356d3c0@[192.168.0.52]> <wk4qkfze07.fsf@chenla.org>
Message-ID: <p06110401bda6b2083cb3@[192.168.0.52]>

>Presentational markup should be used to indicate how the original was
>marked up.

Aha!  That wasn't clear to me since I've been approaching TEI as a "master" format, whereas it was really designed to describe existing texts (which is fine; that's also something I hope is part of PG's XML solution).


>  Instructions for how something should be displayed should
>be done using CSS or XSLT.

Agreed.  (Though I include all transformation methods here, not just XSLT.)


>I've come to respect TEI more and more as a mature body of
>experience which I am trusting more and more.  If something seems
>stupid or awkward I now try to stop and step back and assume that
>there is a good chance I don't understand the design before trying to
>cobble together my own solution.

I think that's a good approach with things like TEI, XHTML, etc.  A bunch of very smart people spent quite a bit of time on them.

Three caveats:
1. there are still aspects that are *truly* awkward, e.g. rend="display" to indent (though I welcome a good explanation)

2. the design goals for TEI (or any other particular solution) may not match PG's design goals

3. different people work differently, so there's often no one "best" answer (e.g. some people love XSLT, some hate it)
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - Classic Books
http://ProductArchitect.com/ - consulting
From joshua at hutchinson.net  Thu Oct 28 08:38:05 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 28 08:38:09 2004
Subject: [gutvol-d] PGTEI notes webpage
Message-ID: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com>

I have almost no notes on the page, but I do have a link to the current test.xml I'm using and the resulting html and text from the online conversion.  If you're at all interested in on-going progress, you can see it here.

http://home.alltel.net/hutch2000/test.html

Josh
From jon at noring.name  Thu Oct 28 09:13:24 2004
From: jon at noring.name (Jon Noring)
Date: Thu Oct 28 09:15:11 2004
Subject: (And a note on organization/Board) Re: [gutvol-d] Final PGTEI... page
	numbers
In-Reply-To: <20041028145818.F30DF9E98F@ws6-2.us4.outblaze.com>
References: <20041028145818.F30DF9E98F@ws6-2.us4.outblaze.com>
Message-ID: <1201218978515.20041028101324@noring.name>

Josh wrote:
> Jon Noring:

> I just wanted to address the DP/PG TEI Working Subcommittee idea.
>
> Probably not going to happen.  Things tend to happen like this around DP:
>
> One or two people see something irritating they want to scratch
> badly enough to start working on it themselves.  Then, they come up
> with a workable implementation that is put out as a "test" of some
> sort.  It gets refined and used more and more until it becomes a
> de facto standard.  (See our current proofing guidelines at DP for a
> prime example of this process.)

Well, that is the way things are currently done in DP. But if Charles
and Juliet decide it is time to formalize some of the next generation
system development, they will make it happen. Creating a formal
Working Group is always an option, and is recommended at some
stage, even if it is to simply "finish" what is currently being done
by the various people individually hammering away at it and doing an
excellent job (such as you and Marcello, among others.)

Such a formal Working Group can attract some pretty sharp minds in the
TEI, text conversion, and other related communities to contribute
their time and energy and informed insights, and this has **many other
tangible benefits** to the goals of the DP and PG projects besides
just coming up with a workable TEI subset DP can use in its future
activities: it is important not to ignore the human and social
networking element in the equation, something which techno-geeks tend
to overlook.

For example, this gets "buy-in" to the DP/PG vision by many interested
communities (it now becomes "their" project), and their many
connections will greatly benefit DP and PG in its various activities,
such as greatly improving the chances of Foundation and similar
funding to help move DP's and PG's activities to the next level of
production, quality and wider acceptance. DP and PG are volunteer
activities -- it is best to do what is necessary to get the largest
number of the sharpest volunteer minds, individual and organizational.
Formalizing the various processes will help with attracting these
volunteers -- they tend not to join movements which have no
centralized authority and which don't try to forge close working
relationships with many related communities. (PG is essentially
rudderless in leadership by design, and makes little effort to reach
out to other well-known organizations to form strategic partnerships
-- it acts as if the rest of the world does not exist. For example,
has PG tried to form a close working alliance with DAISY so as to
plug in to the accessibility community and to mobilize its help?)

(Note that Mozilla is now exploding on the scene and making a huge
impact with Firefox by competing directly with IE, and this is partly
because it is coming together in a more formal, organized way -- the
Mozilla Foundation -- with leadership which recognizes that even
volunteer, open source projects which aspire to greatness need to be
well-organized and to "network" closely with various recognized
communities -- to play with the proverbial "Big Boys". I know the
anarchist-oriented geeks here do not accept this assessment. For info
on who serves on the Board of Directors of the Mozilla Foundation:
http://www.mozilla.org/foundation/ . It includes Mitch Kapor, as the
Chair.)

For starters, why doesn't the PG Board of Trustees include some of the
top names in the etext, library and digital archiving, accessibility,
and public domain advocacy communities, all of whom support the
purpose and goals of PG and DP? Why isn't Brewster Kahle, for example,
on the Board? Why isn't there a representative of ALA on the Board?
Why isn't George Kerscher of DAISY, or someone of his caliber in the
accessibility community, on the Board? What about Larry Lessig or
John Perry Barlow or Cory Doctorow? Having such a distinguished Board
will open up all kinds of doors for PG including funding opportunities
-- and this can be done without compromising any of the goals and
vision of PG. Personally, I believe it will *attract* many more
enthusiastic volunteers, too, and create new excitement. And the DP-
produced texts will now become more important to many other
organizations since they now have a more personal stake in the work
product. Success breeds success; momentum is built.

Anyway, this is getting off of the main topic of gutvol-d...

Jon Noring

From joshua at hutchinson.net  Thu Oct 28 09:25:09 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 28 09:25:14 2004
Subject: [gutvol-d] PGTEI and more
Message-ID: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>


----- Original Message -----
From: Scott Lawton <scott_bulkmail@productarchitect.com>
> 
> My feedback on PGTEI is too long for email, so I posted it here:
> 	http://Classicosm.com/xml/feedbackonpgtei.html
> 
> Feedback welcome!!!

    quote is used in an example but apparently isn't part of TEI Lite (it's not in Appendix A). What's the story?

It is part of the full TEI spec.  Thanks for pointing it out.  I meant to have it in my test.xml, but I forgot.  The test.xml should have <quote rend="display"> for blockquotes (and will on the next update.)

TEI-Lite is the starting point, but we will probably pull in other stuff from the full spec where we need it.

**

    q: in cases where the quotation marks don't balance, it may be difficult to automatically convert quotation marks to the appropriate q.../q form, and time consuming to manually proof. Accordingly, I suggest this step be left as optional.

I actually agree here.  I prefer using " instead of <q>.  Can any of the experts explain why this is a "bad idea"?

**

    pgHeader looks like it contains information that should be described in teiHeader (though I'm new to TEI so may be wrong). alice.tei and lmiss.tei both contain pgHeader; the generated PGTEI does not.

Assuming I understand this part right ... The teiHeader contains all the information.  pgHeader is the call out to the part that takes the info in teiHeader and formats it into a standard display header when you convert to HTML or TEXT.  Marcello is probably the guy to explain it more fully.

**

    Having separate index tags for TOC, PDF and PDB strikes me as unnecessary and prone to error. Shouldn't the TOC one suffice for all?

    In fact, the tag itself seems redundant. Shouldn't the head itself suffice? (If TEI requires it, that's another example of where I think TEI is too complex.)

Well, the reason they are separate is for the occasion where you have a header, but you don't want that header to appear in the Table of Contents.  HTML requires both an anchor and <h1> markup ... this is the TEI equivalent.

As for the multiple index entries, I wondered about the need myself, but I haven't gotten around to asking Marcello about it (or digging through documentation to try to understand the need).

**

     alice.tei: reg="Carroll, Lewis" should use the complete "authority" form, which I believe is "Carroll, Lewis, 1832-1898". Note that unlike the PG website, there are no parens around the dates. Here's an illustration of paren usage: "Baum, L. Frank (Lyman Frank), 1856-1919".

I'm hoping consistency in format will be achieved when we have 1) some examples in place and 2) a web form for generating the, admittedly confusing, teiHeader section.

**

    There appear to be two validation errors, e.g. in the PGTEI documentation:
  Error (7/117): <SPAN> must not contain block level elements like <H1>.
  Error (379/1): The start tag for </P> can't be found.

Marcello knows about these and they will be fixed.

**

    In the documentation, why is "Versprich mir, Heinrich" repeated in the output, the second time in white?

This one confused me for a minute, too... Then I realized, it is the only way an HTML browser will be able to space over the right amount.  In effect, Marcello is trying to make the text invisible.  There may be a better way to hide the spacing text, but I haven't given it much thought yet.  It works now, if not in an "elegant solution" manner.

**

    The lack of space between paragraphs goes against Web conventions. (It's fine as an option but a poor choice for the default.)

Agreed.  I promise it will be changed.

**

Thanks again for your analysis!

Josh
From Bowerbird at aol.com  Thu Oct 28 09:46:28 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Oct 28 09:46:53 2004
Subject: [gutvol-d] Final PGTEI... page numbers
Message-ID: <85.192a0367.2eb27c64@aol.com>

jon noring said:
>   for most of DP's volunteers, the markup will be 
>   "under-the-hood" and largely invisible -- most of the
>   volunteer work anyway is for copyediting the text 
>   (correcting OCR errors), not markup insertion, 
>   so no need to require these volunteers to learn 
>   the gory details of TEI.

actually, that represents a poor understanding of the work-flow
at distributed proofreaders, under the current system anyway,
where the proofers are using a clumsy system of pseudo-markup.


>   Only the most experienced and interested of the DP volunteers, 
>   who do the final cleanup/finishing stages, will actually 
>   play with the markup itself.

well, now you're talking about the system that will be created,
and what that will ultimately look like has not yet been decided.
the way you've put it here is, to some degree, what is desired,
but there's some question about whether proofers can do their job
prior to the introduction of any markup at all.  of course, there is
_also_ some question about how easy it will be to do the proofing
if any obtrusive markup is "inflicted" on the text prior to proofing.

further, at present, "proofing" -- the act of catching and correcting
errors, either in the text or in the formatting -- happens right up
until the end of the text's processing, and i think the finding will be
that obtrusive markup, whenever it occurs, will short-circuit that.
whether the early rounds can be improved to the point that this
"short-circuiting" causes no problems is yet another open question.


>   Aside: the DP-produced XML Master texts will certainly be 
>   used for many purposes, all of which instill requirements on 
>   the markup specification, and which must be considered -- 
>   this is the biggest missing area not being discussed on gutvol-*. 

well, the discussion _here_ carries absolutely no weight at all.
if you want to know what d.p. is going to do, you'll have to go over
to their forums, where they're batting about these issues right now.
(look under the "everything but distributed proofreaders"  section,
which is an odd place to put such a discussion, wouldn't you think?)


>   The most exciting of these is where the DPXML texts will be 
>   archived into a special library-like repository which allows 
>   a very high-level of end-user interface and customizability 
>   to the collection (e.g., bookmarking, annotation, interlinking 
>   within the repository and to other content repositories, blogging, 
>   etc.) -- all things several associates and I are now working on.

sounds like you're off and running.  perhaps you could teach people here
how to crawl first.

-bowerbird
From jeroen at bohol.ph  Thu Oct 28 12:13:17 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Thu Oct 28 12:13:00 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
In-Reply-To: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com>
References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com>
Message-ID: <418144CD.7070707@bohol.ph>


As I promised already some time ago, I've prepared a draft TEI Lite 
conventions document, and a zip file with a TEI encoded file, following 
that. The end result of the TEI to HTML transform can already be seen in 
PG: http://www.gutenberg.net/etext/10772

To get the guidelines in PDF format: 
http://www.bohol.ph/PG/TEI-PG-Guidelines-0.1.pdf
in open office format: http://www.bohol.ph/PG/TEI-PG-Guidelines-0.1.sxw

To get the sample file: http://www.bohol.ph/PG/IncaLand.zip

I hope to be adding more examples soon, and update the guidelines, among 
others with what things are optional, and what will be required. I hope 
to keep the instructions within 30 pages.

Jeroen Hellingman.
From marcello at perathoner.de  Thu Oct 28 12:19:18 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 28 12:19:27 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <p06110402bda6b3197d03@[192.168.0.52]>
References: <p06110402bda6b3197d03@[192.168.0.52]>
Message-ID: <41814636.9080904@perathoner.de>

Scott Lawton wrote:

> Section 18: I strongly recommend omitting the requirement that TEX
> and NROFF characters must be escaped. (As far as I can tell, that's
> not part of TEI.) It may well be a useful optional feature; perhaps
> it could be turned on by including a specific processing instruction.

That limitation will go away once XSLT can handle entities or once I 
rewrite the XSLT transform in Perl.


> quote is used in an example but apparently isn't part of TEI Lite
> (it's not in Appendix A). What's the story?

<quote> is to be used when you quote some written source, <q> when you 
quote direct speech.


> q: in cases where the quotation marks don't balance, it may be
> difficult to automatically convert quotation marks to the appropriate
> q.../q form, and time consuming to manually proof. Accordingly, I
> suggest this step be left as optional.

It is optional. Using <q> will debug your quotes automatically, 
something that is near impossible to program any other way. Using <q> 
will also get the most pretty quote signs the output format can display.
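
The point about <q> debugging quotes can be illustrated with a small
sketch. This is not the PGTEI converter; the function names and sample
sentences are made up for illustration. With plain " marks, a program can
only count characters and guess; with <q> markup, any XML parser rejects
a quotation that was opened but never closed.

```python
import xml.etree.ElementTree as ET


def unbalanced_paragraphs(paragraphs):
    """Return indexes of paragraphs with an odd number of straight double quotes.

    This is only a heuristic: it cannot say where the missing mark belongs.
    """
    return [i for i, text in enumerate(paragraphs) if text.count('"') % 2 != 0]


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, so every <q> has its </q>."""
    try:
        # Wrap in a hypothetical <p> so the fragment has a single root element.
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(unbalanced_paragraphs(['"Hi," she said.', '"Oops, he said.']))
    print(is_well_formed("<q>Hi,</q> she said."))
    print(is_well_formed("<q>Oops, he said."))
```

Note that the counting heuristic also flags quotations that legitimately
run over several paragraphs without a closing mark, which is one reason
it is hard to automate with plain quote characters.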


> langUsage: I suggest the standard should be to omit the content of
> the tag (e.g. "British", which is probably more useful as "British
> English" or "English (British)"). This information should be
> generated to ensure consistency. (They appear in the generated PGTEI
> and in alice.tei, but not in lmiss.tei.)

You have to include only the languages you actually use in the text. The 
converter includes some more because it is easier to delete than to add 
and if you declare too many it doesn't hurt.

> pgHeader looks like it contains information that should be
> described in teiHeader (though I'm new to TEI so may be wrong).
> alice.tei and lmiss.tei both contain pgHeader; the generated PGTEI
> does not.

pgHeader is a hack that can be removed once we agree on how to insert 
all that information in the teiHeader.


> Having separate index tags for TOC, PDF and PDB strikes me as
> unnecessary and prone to error. Shouldn't the TOC one suffice for
> all?

Some formats have limitations, e.g. PalmDoc bookmarks have a maximum of 
16 characters. PDF bookmarks have to use iso-8859-1 chars. Moreover you 
don't always want the full <head> to appear in the contents.
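
A rough sketch of what those two limits mean in practice, assuming the
16-character PalmDoc limit and the iso-8859-1 restriction for PDF stated
above. The function names and sample headings are hypothetical, and a
real converter would presumably use the hand-written index entries
rather than derive them mechanically like this.

```python
def palmdoc_bookmark(head, limit=16):
    """Truncate a heading to the PalmDoc bookmark length limit."""
    return head[:limit]


def pdf_bookmark(head):
    """Replace characters outside iso-8859-1 with '?' so the bookmark is encodable."""
    return head.encode("iso-8859-1", errors="replace").decode("iso-8859-1")


if __name__ == "__main__":
    print(palmdoc_bookmark("Chapter I. Down the Rabbit-Hole"))
    print(pdf_bookmark("\u0152uvres compl\u00e8tes"))
```

Blind truncation and character replacement give poor bookmarks, which is
one argument for letting the encoder supply a separate, shorter entry
per format.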


> In the documentation, why is "Versprich mir, Heinrich" repeated in
> the output, the second time in white?

Because there is no other way to properly indent a continuation line in 
HTML. If you can figure out one (that does not use tables or 
javascript!) I'd like to hear.
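
One possible alternative to white text, offered only as a sketch and not
as what the PGTEI transform does: CSS "visibility: hidden" keeps the
space an element would occupy but does not paint it, so the repeated
text still indents the continuation line without depending on the page
background colour. The function name and the sample continuation text
below are assumptions for illustration.

```python
from html import escape


def continuation_line(previous_line, continuation):
    """Indent `continuation` by the rendered width of `previous_line`.

    The previous line is repeated inside a span that takes up space
    but is not painted.
    """
    hidden = f'<span style="visibility: hidden">{escape(previous_line)}</span>'
    return f"{hidden}{escape(continuation)}"


if __name__ == "__main__":
    print(continuation_line("Versprich mir, Heinrich!", "Was ich kann!"))
```

The hidden copy is still present in the markup, so text-mode browsers
and copy-and-paste may still show it.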


> Are the heuristics from things like GutenMark included? That would
> seem to be quite valuable.

No. It's just a quick perl hack.

Better take GutenMark and make it output TEI instead of HTML. But that 
is a job for the author of GutenMark. Better bug him :-)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From brad at chenla.org  Thu Oct 28 12:45:38 2004
From: brad at chenla.org (Brad Collins)
Date: Thu Oct 28 12:47:21 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
In-Reply-To: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com> (Joshua
	Hutchinson's message of "Thu, 28 Oct 2004 09:36:51 -0500")
References: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com>
Message-ID: <wk3bzyk4nx.fsf@chenla.org>

"Joshua Hutchinson" <joshua@hutchinson.net> writes:

> That does work, but why not use the already existing <note
> place="margin"> markup?  Then, you don't have to do the extra work
> of segmenting your paragraph for the same result.  For an HTML
> edition, it already works in our transform.  The text version we
> just have to decide HOW to handle it, then code up the transform.
>

I'm not very happy with segmenting either...  but I am trying to fit
the concept of a running analysis into a larger framework for scaling
texts.

To understand what I am talking about it helps to think of a book in
terms of the 3D Modeling concept of LOD (Level of Detail).

A 3D computer model is made up of polygons.  The more polygons you
have, the more detailed the model.  Models used in big Hollywood films
may have millions of polygons in a model.  This allows you to create
believable virtual characters like Gollum, or (sadly) Jar Jar Binks.
But all of those polygons are expensive to render.  And if you have a
shot with Gollum in the distance, you will be spending enormous
amounts of resources to draw polygons that can't be seen.

To deal with this, LOD is used to reduce the number of polygons in a
model the farther away it gets and then increase them again as the
model gets closer.

When you see a book from far away, you may only see the title on the
spine on the shelf.  When you get closer you take the book off the
shelf and read a synopsis of the contents of the book on the dust
jacket.  Get closer still and you see a table of contents.  Closer
again you turn to a chapter and there might be a summary of the
chapter at the beginning.  Then, in the case of works like the
Merlin, there is a running analysis which provides a paragraph by
paragraph summary.  Then, finally you get as close as you can and are
confronted by the body of text itself.

So, in this way you can see the running analysis as a way of zooming
in or scaling the text.  In the XML tests I was doing two years ago
on the Merlin I used this concept to progressively zoom in on the
book from a single title in a list, to a brief synopsis to the
detailed synopsis to a table of contents to a chapter summary to the
running analysis.

With this kind of a structural approach to summaries and descriptions
it was easy to create some very powerful browsing interfaces and
indexing mechanisms.

I haven't been happy with what I'd done before with the running
analysis so my last post was working notes towards finding a way
of incorporating summaries at different scales into a text rather
than a proposal for PG.

Your note approach is fine for providing a presentational means of
adding in a running analysis, but it doesn't tell us the span of text
that each note describes.  This is why TEI offers the <seg> and
<span> approach.
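
To make the difference concrete, here is a minimal sketch. The sample
sentences, the id values, and the use of a target attribute to tie each
note to a <seg> are assumptions for illustration, not the markup from my
earlier post. Once the text is segmented, a program can recover exactly
which span each marginal note summarizes.

```python
import xml.etree.ElementTree as ET

# Made-up sample paragraph: two segments, each with a marginal summary.
SAMPLE = """<p><seg id="s1">Merlin came to the king and greeted him.</seg>
<seg id="s2">The king asked what he should do.</seg>
<note place="margin" target="s1">Merlin arrives.</note>
<note place="margin" target="s2">The king seeks counsel.</note></p>"""


def spans_by_note(paragraph):
    """Map each note's text to the full text of the segment it points at."""
    segs = {seg.get("id"): "".join(seg.itertext()) for seg in paragraph.iter("seg")}
    return {note.text: segs[note.get("target")] for note in paragraph.iter("note")}


if __name__ == "__main__":
    for summary, span in spans_by_note(ET.fromstring(SAMPLE)).items():
        print(f"{summary!r} covers {span!r}")
```

With a bare <note place="margin"> and no segments, the same lookup is
impossible: the note only marks a starting point.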

I _DO_ agree that this is overkill for PG texts.  When I went to
market today, I was struck by the fact that the egg stalls receive
eggs from the farms in plastic flats.  The eggs arrive in an
organized, structured way.  The egg sellers then proceed to pile them
into bins by price.  This helps to muddle the difference
between the eggs when you, the customer, are picking through them.

The structure and organization provided by the flats was
counter productive for the egg sellers because it made it more
difficult to unload bum eggs.

In the same way, excessive structure in a marked up text makes it more
difficult to transform into simpler formats.  It would be very awkward
to map the notes in the interp tags to the segs in the text. Your
notes approach is a lot easier.

But I like the idea of having a base reference text which others can
use to overlay their own annotations.  The `resp' attribute is good
at indicating who has annotated what in a text, so that you could
easily toggle between annotations from different sources, or strip
them out altogether.

The EETS version of the Merlin is a base text on which Wheatley has
overlaid all sorts of information.  It's a good idea to keep the
markup mechanisms for overlaid annotations separate from the base
text that is being annotated.

This is a larger issue and goal than simply providing electronic
editions of books, and is beyond what PG is about.  But it is worth
keeping these ideas in the back of your mind, if for no other reason
than to remember that reading a book from cover to cover is not the
only or even the most common way that books are used.

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From joshua at hutchinson.net  Thu Oct 28 13:04:53 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Oct 28 13:05:02 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
Message-ID: <20041028200453.19AACEDCA3@ws6-1.us4.outblaze.com>


----- Original Message -----
From: Brad Collins <brad@chenla.org>

<snipped explanation of using <seg> vs using <note> >

OK, the gist I got was that you want to be able to tie the information from the margins to specific sections (a beginning and ending point) rather than tie to a single spot and assume it applies to some indeterminate length ahead.  I can see where <seg> allows that.

Is that much fidelity in the original?  For instance, the <seg> marks you applied ... is there something in the original that says this segment stops *right here*?  I ask because while the beginning of the <seg> seems clear, in the text I'm picturing in my head, the end of the segment would be kind of fuzzy.  (Unless the text used a marker or line break or something at the same spot you are using <seg> breaks.)

If it is fuzzy, it seems the <note place="margin"> markup provides the same "here forward to some point" fidelity in information.

(I think if I had a scan of the original, I'd probably more fully understand what you were trying to do.)

Josh
From jon at noring.name  Thu Oct 28 13:27:23 2004
From: jon at noring.name (Jon Noring)
Date: Thu Oct 28 13:28:05 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
In-Reply-To: <wk3bzyk4nx.fsf@chenla.org>
References: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com>
	<wk3bzyk4nx.fsf@chenla.org>
Message-ID: <1881234217531.20041028142723@noring.name>

Brad wrote:
> Josh wrote:

>> That does work, but why not use the already existing <note
>> place="margin"> markup?  Then, you don't have to do the extra work
>> of segmenting your paragraph for the same result.  For an HTML
>> edition, it already works in our transform.  The text version we
>> just have to decide HOW to handle it, then code up the transform.

> I'm not very happy with segmenting either...  but I am trying to fit
> the concept of a running analysis into a larger framework for scaling
> texts.
>
> [snip of example]
>
> Your note approach is fine for providing a presentational means of
> adding in a running analysis, but it doesn't tell us the span of text
> that each note describes.  This is why TEI offers the <seg> and
> <span> approach...
>
> In the same way, excessive structure in a marked up text makes it more
> difficult to transform into simpler formats.  It would be very awkward
> to map the notes in the interp tags to the segs in the text. Your
> notes approach is a lot easier...
>
> The EETS version of the Merlin is a base text on which Wheatley has
> overlaid all sorts of information.  It's a good idea to keep the
> markup mechanisms for overlaid annotations separate from the base
> text that is being annotated.
>
> This is a larger issue and goal than simply providing electronic
> editions of books, and is beyond what PG is about.  But it is worth
> keeping these ideas in the back of your mind, if for no other reason
> than to remember that reading a book from cover to cover is not the
> only or even the most common way that books are used.

We have two issues as I see them here:

1) Notes, sidebars, running analysis, and other types of "out-of-spine"
   chunks of texts, as found in the original source work.

   Which chunk of text in the main, "in-spine" text flow each
   out-of-spine chunk applies to may not be explicitly marked in the
   source text. Rather it must be figured out by contextual analysis.

   Obviously, these "out-of-spine" chunks are important to be kept with
   the Master document format, whatever that may be.

2) Bookmarks, annotations, running commentary, references to and from
   other digital text works, etc., which is added on by third parties.

   This is the exciting aspect to make digital texts very useful, as
   I've previously noted. It is important to keep this stuff separate
   from the Master document format.
   

Assuming the Master digital texts are XML documents, item (2) can be
implemented using the various related W3C specifications of
XLink/XPath/XPointer. For example, it is possible with the full
XPointer specification to define an exact chunk of text within an XML
document.
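
As a rough illustration of keeping annotations outside the Master
document: Python's standard library does not implement XPointer, so the
sketch below substitutes an ElementTree path plus character offsets as a
stand-in addressing scheme. The sample document, the id values, and the
annotation record layout are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up master document; it is never modified by the annotation.
MASTER = """<text><body>
<p id="p1">It was the best of times, it was the worst of times.</p>
<p id="p2">A second paragraph.</p>
</body></text>"""


def resolve(root, path, start, end):
    """Return characters [start:end) of the element selected by an ElementTree path."""
    element = root.find(path)
    if element is None:
        raise LookupError(f"no element matches {path!r}")
    return "".join(element.itertext())[start:end]


# A third-party annotation stored separately from the master file.
ANNOTATION = {"path": ".//p[@id='p1']", "start": 11, "end": 24, "note": "famous opening"}

if __name__ == "__main__":
    root = ET.fromstring(MASTER)
    print(resolve(root, ANNOTATION["path"], ANNOTATION["start"], ANNOTATION["end"]))
```

Character offsets break if the Master text is later corrected, which is
the kind of problem a full range-addressing specification has to deal
with.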

From the discussion of implementing item (1) within TEI, it appears
there's more than one way to do it, with segmenting allowing one to
specify the exact range of in-spine text which any out-of-spine
chunk applies to.

Just some general observations without any suggestions.

Jon Noring


(p.s., I use the terms "out-of-spine" and "in-spine" loosely based
upon the Open eBook Publication Structure, which defines these
constructs so ebook reading systems can implement more advanced
ways to present "out-of-spine" content. As Bowerbird noted, such
"out-of-spine" stuff can be presented in more innovative ways than
print, or even HTML, allows. For example, OEBPS
suggests popups to present out-of-spine content, which the Microsoft
Reader system implements (but which is largely unknown.)

The biggest mistake which the creators of HTML made is not to include
a <note> (or more generically-named) tag, which can define some chunk
of inline text as being "out-of-spine", and thereby be presented in a
popup window or similar innovative fashion. Of course, this would have
added significant complexity to the early browsers such as Mosaic, and
thus probably explains why this feature was not implemented. But this
lack of vision for such a powerful feature is still regrettable.
OpenReader definitely plans on making this a major feature, thus one
reason we're interested in native recognition of TEI documents.)

From marcello at perathoner.de  Thu Oct 28 13:52:10 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Oct 28 13:52:25 2004
Subject: [gutvol-d] Final PGTEI... page numbers
In-Reply-To: <501212428078.20041028082414@noring.name>
References: <65.36dc4587.2eb20b75@aol.com>
	<501212428078.20041028082414@noring.name>
Message-ID: <41815BFA.2000402@perathoner.de>

Jon Noring wrote:

> I would think and hope that DP will convene a formalized working group
> of the various experts and enthusiasts here and elsewhere to hammer
> out the DP Markup Specification based on requirements gathering and
> analysis, which is the proper way to do this.

I think design-by-committee is the wrong way to go about this. 
Experimenting with markup on more and more complicated books and refining 
the specs along the way seems to me far more promising.

But that's the Cathedral vs. the Bazaar discussion again.

To see a particularly disgusting example of design by committee just 
look at XSLT.


> The DPMWG will have a
> more formalized and committed leadership structure, with weekly
> teleconference calls. From my standards working group experience, it's
> amazing how much stuff gets done during weekly teleconferences and the
> occasional face-to-face meeting (biannual or annual), while written
> listserv exchanges in a group like gutvol-* usually end up going
> around and around in circles. 

Teleconferencing will essentially shut out all non-US-based people via 
the prohibitive costs or via the language barrier. Non native English 
speakers like me may have a better standing in a written discussion channel.


> I would even ask someone like C. Michael Sperberg-McQueen to be an
> advisor to the working group

I don't know if the TEI people could advise us much.

What we need is not advice about the use of TEI as markup language but 
about the use of TEI as master format for automatic rendition into a 
wide variety of output formats. There is the tei-presentation list for 
this sort of thing but traffic there has been very light.

The only person who could really help is Sebastian Rahtz.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From shalesller at writeme.com  Thu Oct 28 15:07:35 2004
From: shalesller at writeme.com (D. Starner)
Date: Thu Oct 28 15:08:07 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
Message-ID: <20041028220735.784F56EEF6@ws1-5.us4.outblaze.com>

"Joshua Hutchinson" writes:
> (I think if I had a scan of the original, I'd probably more 
> fully understand what you were trying to do.) 

Take a look at any of the EETS works by PM EETS (that is, all
of them except the book titled Early Middle English). They've
got good examples of where you'd want to attach a note to a
stretch of text. 

From shalesller at writeme.com  Thu Oct 28 16:06:39 2004
From: shalesller at writeme.com (D. Starner)
Date: Thu Oct 28 16:06:51 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041028230639.1A4AA4BE64@ws1-1.us4.outblaze.com>

I have a few comments on the draft guidelines. It'd be nice to have page
numbers printed on the pages. A letter size PDF would be useful, but the
margins on this one seem generous enough to print on letter-sized paper.

DP probably will not be preserving the long-s, and I think it a little
unrealistic to expect most of PG's XML documents to preserve it. Also,
the description is incorrect; in English, it's used everywhere except
at the end of the word, and it was used until about 1800, so it was
still in use throughout the 18th century. 

It's always used in Fraktur; are we going to preserve that? Counting 
that, it was used until the middle of the 20th century. It's probably
too minor for this document, but several German documents I've seen
use a non-ligatured long-s/s combination for the eszett, while not
using the long-s elsewhere. Even at the most pedantic, it's arguable 
whether this should be encoded with the long-s.

There should be an option to preserve running headers where they encode
information not found elsewhere. 

I think we should go with standards on the languages section; that is,
RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found
in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish
(how does this differ from sco?) in the draft, and I doubt anything would
choke on it today. #6 is a bad idea, especially as 3 letter 639 codes 
sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad
or x-gaddang is a better idea.
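
A small sketch of a syntax check for such tags, assuming only the
generic shape from RFC 3066: a primary subtag of 1 to 8 letters followed
by hyphen-separated subtags of 1 to 8 letters or digits. It does not
consult any registry and does not enforce the rules about what a
particular subtag may mean, so passing this check does not make a tag
registered or conforming. The function name is hypothetical.

```python
import re

# Primary subtag: 1-8 letters. Later subtags: 1-8 letters or digits.
RFC3066_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def has_rfc3066_shape(tag):
    """Return True if `tag` matches the generic RFC 3066 subtag syntax."""
    return RFC3066_SHAPE.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["en-x-1800", "en-x-Scottish", "phi-x-SIL-gad", "x-gaddang", "en_GB"]:
        print(tag, has_rfc3066_shape(tag))
```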

What happened to emph? All I see is rend. Likewise, I'd rather see foreign
do italics and let you mark it with rend="none" if needed, as that would
match how most books do it, and give a guideline to when to use foreign.

I partially marked up Japanese Literature, and eventually decided not to
mark up all the non-italics Japanese words used in running English text,
like names of plants and such. I think the guidelines should say to mark
up running foreign text and italicized foreign words, but to skip single
words, like the names of plants and foods, in running text if not italicized.

From jon at noring.name  Thu Oct 28 17:30:18 2004
From: jon at noring.name (Jon Noring)
Date: Thu Oct 28 17:30:41 2004
Subject: [gutvol-d] Working Group comments
Message-ID: <151248792125.20041028183018@noring.name>

Marcello wrote:
>Jon Noring wrote:

>> I would think and hope that DP will convene a formalized working group
>> of the various experts and enthusiasts here and elsewhere to hammer
>> out the DP Markup Specification based on requirements gathering and
>> analysis, which is the proper way to do this.

> I think design-by-committee is the wrong way to go about this. 
> Experimenting with markup on more and more complicated books and refining 
> the specs along the way seems to me far more promising.

Experimentation can be done *while* the working group is working on
the specification, in an iterative process. It is important that
experimentation itself is guided by the requirements established by
the working group in looking at the bigger picture which can only be
done by formalized collaboration between the members representing the
various stakeholders interested in this. Of course, the most important
player is DP, which has to implement the spec into their work flow,
and they will certainly have their own laundry list of requirements.

Anyway, you and others have already made a great head start on the
experimental side, as well as establishing the first proposed "beta"
specification, PGTEI. The working group will not need to start from
scratch, but will be able to build upon the proverbial shoulders of
giants.

And to address the issue that the working group will likely not get it
perfect the first time (it won't) -- in an informal chat I had with
Charles a while back, one strategy he brought up is to come up with a
"version 1.0" of the DP Markup spec, and implement that in the DP
workflow. Then, as production rolls along, continue to update the spec
as necessary to handle problem texts. The issue of compatibility
(forward/backward) between versions of DP Markup spec will need to be
addressed, and may be solved by keeping version 1.0 simpler (a smaller
number of tags) and then add tags as needed over time. We don't want
to remove tags as time goes on (since we may have finished texts using
the deprecated/removed tags), but rather add support over time as
needed.
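
The add-only rule above can be checked mechanically. What follows is only
a rough sketch: the tag inventories and version numbers are invented for
illustration (no DP Markup spec exists yet), and the function names are
hypothetical.

```python
# Hypothetical tag inventories per spec version; these are assumptions
# made up for illustration, not an actual DP Markup specification.
SPEC_TAGS = {
    "1.0": {"div", "head", "p", "q", "note"},
    "1.1": {"div", "head", "p", "q", "note", "foreign"},
}


def removed_tags(old_version, new_version, specs=SPEC_TAGS):
    """Return the tags that exist in old_version but not in new_version."""
    return sorted(specs[old_version] - specs[new_version])


def is_backward_compatible(old_version, new_version, specs=SPEC_TAGS):
    """A newer version is compatible if it only adds tags, never removes them."""
    return not removed_tags(old_version, new_version, specs)
```

A text finished under version 1.0 stays valid under 1.1 as long as
is_backward_compatible("1.0", "1.1") holds.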


> But that's the Cathedral vs. the Bazaar discussion again.

Yep.


> To see a particularly disgusting example of design by committee just 
> look at XSLT.

And to see another particularly disgusting example of design by
committee, look at SGML and XML. And the worst of them all: TEI.
Nothing good ever comes out of committees / formalized working groups.

<smile/>


>> The DPMWG will have a
>> more formalized and committed leadership structure, with weekly
>> teleconference calls. From my standards working group experience, it's
>> amazing how much stuff gets done during weekly teleconferences and the
>> occasional face-to-face meeting (biannual or annual), while written
>> listserv exchanges in a group like gutvol-* usually end up going
>> around and around in circles. 

> Teleconferencing will essentially shut out all non-us based people
> via the prohibitive costs or via the language barrier. Non native
> English speakers like me may have a better standing in a written
> discussion channel.

This is of course an issue. There are now very inexpensive and free
ways to hold a teleconference via VoIP or similar. In addition, all
teleconference meetings, typically 1.0 to 1.5 hours in length, will be
scribed and written Minutes produced. In-between teleconferences we
can discuss details on a group forum. It is possible to hold the
teleconferences at a time which is convenient for those in North/South
America and those in Europe -- it gets tougher to find a good time
when there are those in Australia/East Asia to include along with
Europe/Americas, but this can be worked out somehow.

From firsthand experience it is amazing how much more effective technical
working groups are when they can at least meet by phone on a regular
basis. It is a social and psychological thing which enhances working
together and group creativity, which I've yet to see in text-only
working relationships. At first I was skeptical (being a strong
introvert myself who has always worked "online"), but came around as I
observed the importance of human interaction by voice and in person
for *technical working groups*. It's amazing, really.


>> I would even ask someone like C. Michael Sperberg-McQueen to be an
>> advisor to the working group

> I don't know if the TEI people could advise us much.

I think they will be very helpful -- I know a few others who are very
knowledgeable about TEI in an XML-based publishing environment who can
help establish requirements. A couple people I know are world-class at
XSLT/XSL-FO (I think one of them served a while back on the original
XSL working group at W3C.) A close business acquaintance presently
serves on the CSS3 Working Group at W3C, and is working on high-
quality presentation using XML+CSS in his business.

Having varied views on the DP Markup vocabulary is important.


> What we need is not advice about the use of TEI as markup language but 
> about the use of TEI as master format for automatic rendition into a 
> wide variety of output formats. There is the tei-presentation list for 
> this sort of thing but traffic there has been very light.

Yes, definitely! But note that other groups will also be helpful by
providing requirements, including end-user requirements. Having sharp
tech people from the accessibility, the library/archive, the ebook
publishing, and related communities, will contribute to the bigger
picture by providing requirements that they see from their unique
perspectives (e.g., metadata from the librarian types.) If we want
DP-TNG (TNG: The Next Generation) texts to be good for a wide range of
uses, then we have to have people representing the various user groups
to have their say in the global design of DP Markup.

And I need to re-emphasize how important it is to the acceptance and
success of PG/DP to involve the various user groups with PG/DP; the
inclusive working group approach is one of several strategies to
achieve this "buy in".


> The only person who could really help is Sebastian Rahtz.

Yes, definitely. I recall he and I chatted in private email a couple
years ago (for what I don't remember). I'm not sure if he is
subscribed to this forum. He is definitely very sharp and would make a
great addition to the working group if he is able to participate in
some capacity, even if only as an advisor.

Jon Noring

From bkeir at pgdp.net  Thu Oct 28 20:49:32 2004
From: bkeir at pgdp.net (bkeir@pgdp.net)
Date: Thu Oct 28 20:49:49 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
Message-ID: <50877.203.12.144.232.1099021772.squirrel@203.12.144.232>

>     q: in cases where the quotation marks don't balance, it may be
> difficult to automatically convert quotation marks to the appropriate
> q.../q form, and time consuming to manually proof. Accordingly, I
> suggest this step be left as optional.
>
> I actually agree here.  I prefer using " instead of <q>.  Can any of the
> experts explain why this is a "bad idea"?


Presumably <q> </q> is meant to top and tail a quotation, making it
possible to extract quotations from within a work if desired.

However I'd be worried about going to <q> because of the possible
ambiguities in quotations of multiple paragraphs, and the dangers of these
being retransformed to " incorrectly for the text versions.


"We often find at DP that people brought up on reading only contemporary
works, which rarely quote several paragraphs at a time, incorrectly expect
that each paragraph of a quotation needs a closing quote mark.

"People who have read a lot of 19th century books are well aware that
correct usage is that while each paragraph in a quoted passage starts with
a quotation mark, only the final paragraph in a quoted passage gets a
closing one.

"Like this."
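
A rough sketch of how a text-version converter could apply that rule when
turning a multi-paragraph quotation back into plain quote marks. The
function name and the idea of passing the quotation as a list of paragraph
strings are assumptions for illustration, not part of any existing DP or
PGTEI tool.

```python
def render_quoted_passage(paragraphs, open_mark='"', close_mark='"'):
    """Render a quotation that spans one or more paragraphs.

    Every paragraph gets an opening quote mark; only the final
    paragraph gets a closing one.
    """
    if not paragraphs:
        return []
    rendered = [open_mark + text for text in paragraphs]
    rendered[-1] += close_mark
    return rendered
```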


From scott_bulkmail at productarchitect.com  Thu Oct 28 21:47:00 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Thu Oct 28 21:51:27 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <wk7jpbj67d.fsf@chenla.org>
References: <1a5.29beadb0.2eb17a83@aol.com>
	<418030C5.7070505@hutchinson.net> <wk7jpbj67d.fsf@chenla.org>
Message-ID: <p06110405bda77364aa78@[192.168.0.52]>

I'd like to address a different issue raised by Brad's example.  It may even be a typo of sorts or just a quick-and-dirty sample that's not representative -- but I've seen it elsewhere and think it should be covered in docs and perhaps verification suites.

>			      CHAPTER I
>.
>	     CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.


><div n="1" type="chapter">
> <head type="title">The Romance of Merlin.</head>
> <head type="section title">CHAPTER I</head>
> <head type="sybtitle">CONSULTATION OF DEVILS, AND BIRTH OF
> MERLIN.</head>

Using the plain meaning of the terms (rather than any special TEI meaning), it's clear that "CONSULTATION..." is the chapter title.  In this particular book, the chapter number appears on the previous line, as a roman numeral, preceded by the word "CHAPTER" in all caps.  That's worth recording so that we can reproduce the original, but I don't think the above is the best way to do it.

I'm going to suggest some alternatives that seem more logical; perhaps TEI experts can "translate" these into valid TEI (or suggest extensions that are TEI-like).

First, let's take a simpler case; a chapter that starts with just the bare title:

         CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

I think the markup here can be very simple:

<div n="1" type="chapter">
 <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>

I don't think any TYPE attribute is required; that's clear from context.

Now, let's add "CHAPTER I".  It's sort of a label that precedes the actual chapter title (much like "Figure" or such for certain illustrations); that gives us:

<div n="1" type="chapter">
 <head type="label">CHAPTER I</head>
 <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>

NOTE: when automatically extracting chapter titles, it's important to get the first unadorned <head>, i.e. skip <head type="label">.  And, AFAIK, no "index" tag is required.
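
A minimal sketch of that extraction rule, using Python's standard
xml.etree.ElementTree. The function name chapter_title is made up for
illustration, and the sample closes the <div> so that it parses on its own.

```python
import xml.etree.ElementTree as ET


def chapter_title(div):
    """Return the text of the first <head> without a type attribute."""
    for head in div.findall("head"):
        if head.get("type") is None:
            return "".join(head.itertext()).strip()
    return None


SAMPLE = """<div n="1" type="chapter">
 <head type="label">CHAPTER I</head>
 <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
</div>"""

title = chapter_title(ET.fromstring(SAMPLE))
```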

Since the original example is the first chapter, it has an additional (and common) complication: the book title appears first.  Well, that description suggests:

<div n="1" type="chapter">
 <head type="book">The Romance of Merlin.</head>
 <head type="label">CHAPTER I</head>
 <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>

Thoughts?
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From marcello at perathoner.de  Fri Oct 29 04:00:19 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 29 04:00:45 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <p06110405bda77364aa78@[192.168.0.52]>
References: <1a5.29beadb0.2eb17a83@aol.com>	<418030C5.7070505@hutchinson.net>
	<wk7jpbj67d.fsf@chenla.org> <p06110405bda77364aa78@[192.168.0.52]>
Message-ID: <418222C3.10402@perathoner.de>

Scott Lawton wrote:
> Since the original example is the first chapter, it has an additional (and common) complication: the book title appears first.  Well, that description suggests:
> 
> <div n="1" type="chapter">
>  <head type="book">The Romance of Merlin.</head>
>  <head type="label">CHAPTER I</head>
>  <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
> 
> Thoughts?

The book title is at a different level from a chapter title so it gets 
its own div. If you find multiple chapter titles, you decide which is 
the main one and which are subtitles.

   <div type="book">
     <head>The Romance of Merlin</head>

     <div type="chapter">
       <head type="sub">Chapter I</head>
       <index index="toc" />
       <head>Consultations of Devils, and Birth of Merlin</head>


In PGTEI the <index level1> attribute defaults to the contents of the 
next <head> element. This will give you "Consultations ..." in the TOC 
instead of "Chapter I".

If you want it different just use

   <index index="toc" level1="Chapter I, Birth of Merlin" />

or something like this.
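
To make the defaulting rule concrete, here is a rough sketch of how a
processor might collect TOC entries. It only imitates the behaviour
described above; it is not the actual PGTEI implementation, and the
function name toc_entries is an assumption.

```python
import xml.etree.ElementTree as ET


def toc_entries(root):
    """Collect one TOC label per <index index="toc"> element.

    An explicit level1 attribute wins; otherwise the label defaults to
    the text of the next <head> sibling that follows the <index>.
    """
    entries = []
    for parent in root.iter():
        children = list(parent)
        for position, child in enumerate(children):
            if child.tag != "index" or child.get("index") != "toc":
                continue
            label = child.get("level1")
            if label is None:
                following = children[position + 1:]
                next_head = next((c for c in following if c.tag == "head"), None)
                if next_head is not None:
                    label = "".join(next_head.itertext()).strip()
            if label is not None:
                entries.append(label)
    return entries


SAMPLE = """<div type="book">
  <head>The Romance of Merlin</head>
  <div type="chapter">
    <head type="sub">Chapter I</head>
    <index index="toc" />
    <head>Consultations of Devils, and Birth of Merlin</head>
  </div>
</div>"""

entries = toc_entries(ET.fromstring(SAMPLE))
```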


-- 
Marcello Perathoner
webmaster@gutenberg.org

From brad at chenla.org  Fri Oct 29 06:22:22 2004
From: brad at chenla.org (Brad Collins)
Date: Fri Oct 29 06:23:58 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <418222C3.10402@perathoner.de> (Marcello Perathoner's message
	of "Fri, 29 Oct 2004 13:00:19 +0200")
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net>
	<wk7jpbj67d.fsf@chenla.org> <p06110405bda77364aa78@[192.168.0.52]>
	<418222C3.10402@perathoner.de>
Message-ID: <wkvfctzmk1.fsf@chenla.org>

Marcello Perathoner <marcello@perathoner.de> writes:


> The book title is at a different level from a chapter title so it gets
> its own div. If you find multiple chapter titles, you decide which is
> the main one and which are subtitles.

In the case of this specific example, the title of the book is
included on the first page of the first chapter.... which I wasn't
quite sure how to mark up.  There is no need to include it in an
electronic edition.  The title page in its own div is the correct
way to go.

The markup for the head elements was off the top of my head, I'm glad
someone caught that and pointed it out.

Cheers,

b/

--
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From joshua at hutchinson.net  Fri Oct 29 08:17:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 29 08:17:26 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041029151721.737234F4F8@ws6-5.us4.outblaze.com>

All of this is very good stuff.  But I hope you don't mind if most of it is pushed back to the second iteration of PGTEI.  

My personal thoughts are to get a "standard" in place that handles what DP would normally label Easy through Normal difficulty, in Latin-1 compatible texts.  Then, once we have that in place, move on to the stuff that gives DP fits on a regular basis, like fraktur, long-s, non-Latin-1 texts (granted DP-Europe handles most of those now).

I definitely want to see the issues you bring up addressed.  I'm just trying to set some realistic boundaries on what we can address on an incremental basis.

Josh

----- Original Message -----
From: "D. Starner" <shalesller@writeme.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] draft TEI conventions and larger example file
Date: Thu, 28 Oct 2004 15:06:39 -0800

> 
> I have a few comments on the draft guidelines. It'd be nice to have page
> numbers printed on the pages. A letter size PDF would be useful, but the
> margins on this one seem generous enough to print on letter-sized paper.
> 
> DP probably will not be preserving the long-s, and I think it a little
> unrealistic to expect most of PG's XML documents to preserve it. Also,
> the description is incorrect; in English, it's used everywhere except
> at the end of the word, and it was used until about 1800, making it
> used in the 18th century. 
> 
> It's always used in Fraktur; are we going to preserve that? Counting 
> that, it was used until the middle of the 20th century. It's probably
> too minor for this document, but several German documents I've seen
> use a non-ligatured long-s/s combination for the eszett, while not
> using the long-s elsewhere. Even at the most pedantic, it's arguable 
> whether this should be encoded with the long-s.
> 
> There should be an option to preserve running headers where they encode
> information not found elsewhere. 
> 
> I think we should go with standards on the languages section; that is,
> RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found
> in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish
> (how does this differ from sco?) in the draft, and I doubt anything would
> choke on it today. #6 is a bad idea, especially as 3 letter 639 codes 
> sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad
> or x-gaddang is a better idea.
> 
> What happened to emph? All I see is rend. Likewise, I'd rather see foreign
> do italics and let you mark it with rend="none" if needed, as that would
> match how most books do it, and give a guideline to when to use foreign.
> 
> I partially marked up Japanese Literature, and eventually decided not to
> mark up all the non-italics Japanese words used in running English text,
> like names of plants and such. I think a comment would help: mark up running
> foreign text and italicized foreign words, but avoid single words, like
> the names of plants and foods, in running text if not italicized.

From joshua at hutchinson.net  Fri Oct 29 08:26:43 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 29 08:26:48 2004
Subject: [gutvol-d] PGTEI and more
Message-ID: <20041029152643.91D679E987@ws6-2.us4.outblaze.com>


----- Original Message -----
From: bkeir@pgdp.net
> 
> >     q: in cases where the quotation marks don't balance, it may be
> > difficult to automatically convert quotation marks to the appropriate
> > q.../q form, and time consuming to manually proof. Accordingly, I
> > suggest this step be left as optional.
> >
> > I actually agree here.  I prefer using " instead of <q>.  Can any of the
> > experts explain why this is a "bad idea"?
> 
> 
> Presumably <q> </q> is meant to top and tail a quotation, making it
> possible to extract quotations from within a work if desired.
> 
> However I'd be worried about going to <q> because of the possible
> ambiguities in quotations of multiple paragraphs, and the dangers of these
> being retransformed to " incorrectly for the text versions.
> 
> 
> 
> "We often find at DP that people brought up on reading only contemporary
> works, which rarely quote several paragraphs at a time, incorrectly expect
> that each paragraph of a quotation needs a closing quote mark.
> 
> "People who have read a lot of 19th century books are well aware that
> correct usage is that while each paragraph in a quoted passage starts with
> a quotation mark, only the final paragraph in a quoted passage gets a
> closing one.
> 
> "Like this."
> 
There *is* a way to tell TEI to not print a closing quote, but it is rather cumbersome.  I think <q></q> markup is acceptable if used, but should be optional at best.

Josh
From joshua at hutchinson.net  Fri Oct 29 14:59:11 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 29 14:58:32 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes
In-Reply-To: <b9.49420cb0.2eb18dda@aol.com>
References: <b9.49420cb0.2eb18dda@aol.com>
Message-ID: <4182BD2F.9060701@hutchinson.net>

Bowerbird@aol.com wrote:

>joshua said:
>>  Yes, genius, we have that ability, too, in the HTML.
>
>do you now?
>then why don't i see more .html versions prepared this way?
>read what i wrote, carefully, and then show me some e-texts
>that have an .html version that can match those capabilities...
>
Most of the recent HTML texts from DP use that sort of thing.


>and what's with the "genius" comment?  are you being snide?

Caught that, did you, genius?

>>  We were talking about the plain text version
>>  which is reader program agnostic.
>
>my viewer-program takes plain-text files.
>
>you can be as "agnostic" as you care to be, but
>if you don't serve the readers, who are you serving?
>
If the only thing that does what you want is a program that doesn't 
exist, how is that serving the readers?

Josh
From Bowerbird at aol.com  Fri Oct 29 16:24:36 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 29 16:24:57 2004
Subject: [gutvol-d] on the question of sidenotes, footnotes,
	and end-notes
Message-ID: <1e0.2de5534a.2eb42b34@aol.com>

joshua said:
>   Most of the recent HTML texts from DP use that sort of thing.

nope.  you didn't read my post carefully.

show me the .html e-books from d.p. where the reader can
see the body of the text _and_ the note _simultaneously_...


>   Caught that, did you, genius?

yep.  just making sure everyone else did too.

there's this myth going around -- and you're the biggest
proponent of it -- that _i_ am the rude one around here...


>   If the only thing that does what you want 
>   is a program that doesn't exist, 
>   how is that serving the readers?

another falsehood you are working hard to propagate...

the newest beta of my program will go up this evening.
and i expect that it'll come out of beta very soon now...

-bowerbird
From Jeroen.Hellingman at kabelfoon.nl  Fri Oct 29 04:28:42 2004
From: Jeroen.Hellingman at kabelfoon.nl (Jeroen.Hellingman@kabelfoon.nl)
Date: Fri Oct 29 17:21:00 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041029112842.21153556E7@betazoid.kabelfoon.nl>

On 29-10-2004 01:06, you wrote:

Thanks for your comments.

> I have a few comments on the draft guidelines. It'd be nice to have page
> numbers printed on the pages. A letter size PDF would be useful, but the
> margins on this one seem generous enough to print on letter-sized paper.
I follow the ISO standard A4 size, which is an international standard,
and offers several benefits over letter, which is only used in the US. I
will keep margins generous though.

> DP probably will not be preserving the long-s, and I think it a little
> unrealistic to expect most of PG's XML documents to preserve it. Also,
> the description is incorrect; in English, it's used everywhere except
> at the end of the word, and it was used until about 1800, making it
> used in the 18th century.

Agreed. I will make this optional, and indeed, expect most people to
drop it. I normally keep it.

> It's always used in Fraktur; are we going to preserve that? Counting
> that, it was used until the middle of the 20th century. It's probably
> too minor for this document, but several German documents I've seen
> use a non-ligatured long-s/s combination for the eszett, while not
> using the long-s elsewhere. Even at the most pedantic, it's arguable
> whether this should be encoded with the long-s.

I would suggest, let the person who prepares the text decide.

> There should be an option to preserve running headers where they encode
> information not found elsewhere.

I did this once, in an easy but non-TEI fashion; the formal method is
to use <fw> tags.


> I think we should go with standards on the languages section; that is,
> RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found
> in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish
> (how does this differ from sco?) in the draft, and I doubt anything would
> choke on it today. #6 is a bad idea, especially as 3 letter 639 codes
> sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad
> or x-gaddang is a better idea.

Isn't sco gaelic?

I agree, and will adjust the guidelines (and the texts and tools I have).
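
For checking tags like these in tools, a small well-formedness test is
easy to write. The sketch below covers only the RFC 3066 syntax (a primary
subtag of 1 to 8 letters, then subtags of 1 to 8 letters or digits); it
does not look anything up in the ISO 639 or IANA registries, and the
function name is my own.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT),
# joined by hyphens. This checks the shape only, not registered meanings.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_wellformed_rfc3066(tag):
    """Return True if tag matches the RFC 3066 Language-Tag grammar."""
    return RFC3066_TAG.fullmatch(tag) is not None
```

All four tags suggested above (en-x-1800, en-x-Scottish, phi-x-SIL-gad,
x-gaddang) pass this check.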

> What happened to emph? All I see is rend. Likewise, I'd rather see foreign
> do italics and let you mark it with rend="none" if needed, as that would
> match how most books do it, and give a guideline to when to use foreign.

This is due to the way texts are produced from printed sources, and it
is often difficult to establish the reason a word is in italics. It
could be because it is considered foreign, or for some other reason.

I use foreign exclusively as a holder for the lang attribute. If you
use the lang attribute consistently, you can use that to isolate
fragments in a certain language.


> I partially marked up Japanese Literature, and eventually decided not to
> mark up all the non-italics Japanese words used in running English text,
> like names of plants and such. I think a comment would help: mark up running
> foreign text and italicized foreign words, but avoid single words, like
> the names of plants and foods, in running text if not italicized.

The decision to mark up words as foreign is sometimes difficult, and
can impose a lot of work in such cases. I normally do this, as it helps
a great deal with spell checking. However, I would say it is not required.


Jeroen.


From sly at victoria.tc.ca  Fri Oct 29 23:55:46 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Oct 29 23:56:07 2004
Subject: [gutvol-d] Sidney L. Gulick
In-Reply-To: <1881234217531.20041028142723@noring.name>
References: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com>
	<wk3bzyk4nx.fsf@chenla.org> <1881234217531.20041028142723@noring.name>
Message-ID: <Pine.GSO.4.58.0410292354520.22362@vtn1.victoria.tc.ca>

And now for a slight break from the xml/tei discussion...

One nice thing about helping to add additional information
to the PG catalog is that sometimes I end up learning about
some person who I never would have run across otherwise.

The most recent example is Sidney L. Gulick

As I read a little more about this man, I thought that this is
someone I would not hesitate to call an American hero.
He dedicated his life to international friendship and understanding,
most notably instigating a program whereby over 12,000 "friendship
dolls" were made by Americans and sent to Japanese schools.

I've put together a wikipedia article about him and linked to it
from his author record in the PG catalog.

http://en.wikipedia.org/wiki/Sidney_Gulick

From stephen.thomas at adelaide.edu.au  Sat Oct 30 00:29:38 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Sat Oct 30 00:30:02 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
Message-ID: <418342E2.5080906@adelaide.edu.au>

Joshua Hutchinson wrote:
> 
> quote is used in an example but apparently isn't part of TEI
> Lite (it's not in Appendix A). What's the story?

The common advice seems to be to use <q> to enclose quoted 
speech *inline*, and use <quote> for quoting larger blocks of 
text. The P4 TEI manual was a bit vague on this, but that seems 
to be a sensible convention worth using.

> 
> It is part of the full TEI spec.  Thanks for pointing it out.
> I meant to have it in my test.xml, but I forgot.  The
> test.xml should have <quote rend="display"> for blockquotes
> (and will on the next update.)

As I understand this (from an earlier post), 'rend="display"' is 
supposed to mean that the block should be indented (rather like 
the HTML blockquote).

This seems like a very poor choice of terms to me. CSS has a 
"display" property, which can take values such as "inline", 
"block", and -- crucially -- "none". "display:none" is used 
where you don't want the content displayed at all.

So using this rend="display" seems likely to result in confusion.

In any case, the choice is poor because it does not convey the 
information desired. If you use <quote> on its own without 
rend="display", does that indicate you don't want to display the 
content? Or that you don't want to indent it?

I personally don't see any need to use rend here. If you are 
quoting a passage from some other work, then enclose it in 
<quote> .. </quote>. That's enough. When someone comes to 
present this (e.g. in an HTML version), the most natural thing 
would be to convert the tag to blockquote. The rend is redundant.

> 
> q: in cases where the quotation marks don't balance, it may
> be difficult to automatically convert quotation marks to the
> appropriate q.../q form, and time consuming to manually
> proof. Accordingly, I suggest this step be left as optional.
> 
> I actually agree here.  I prefer using " instead of <q>.  Can
> any of the experts explain why this is a "bad idea"?

This was thrashed out at great length almost a year ago. 
Basically, while purists will see enormous merit in using <q> 
instead of quote marks, the practical approach is to stick with 
the quote marks, due to reasons outlined by another poster. (The 
terminating quote question with multi-paragraph quotes.)

There's also nothing *wrong* with using this:

	<q>"Hello,"</q> she said.

at least it's not disallowed in TEI.

I believe there's a place in the TEI header to indicate which 
practice you are using in the text.


-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From stephen.thomas at adelaide.edu.au  Sat Oct 30 00:39:00 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Sat Oct 30 00:39:21 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <p06110405bda77364aa78@[192.168.0.52]>
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net>
	<wk7jpbj67d.fsf@chenla.org> <p06110405bda77364aa78@[192.168.0.52]>
Message-ID: <41834514.5040201@adelaide.edu.au>

All you need is this:

<div id="ch1" type="chapter" n="1">
    <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>

The "rendering agent" can then, if desired, use the type and n 
attributes to generate the additional "Chapter 1" heading.
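
As a rough illustration of what such a rendering agent might do (the
function name is made up, and a real renderer would probably also want to
choose between arabic and roman numerals), assuming the <div> is closed so
that it parses:

```python
import xml.etree.ElementTree as ET


def generated_heading(div):
    """Build a heading such as "Chapter 1" from the type and n attributes."""
    kind = div.get("type")
    number = div.get("n")
    if not kind or not number:
        return None
    return f"{kind.capitalize()} {number}"


SAMPLE = """<div id="ch1" type="chapter" n="1">
    <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
</div>"""

heading = generated_heading(ET.fromstring(SAMPLE))
```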


Steve


Scott Lawton wrote:

> I'd like to address a different issue raised by Brad's example.  It may even be a typo of sorts or just a quick-and-dirty sample that's not representative -- but I've seen it elsewhere and think it should be covered in docs and perhaps verification suites.
> 
> 
>>			      CHAPTER I
>>.
>>	     CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.
> 
> 
> 
>><div n="1" type="chapter">
>><head type="title">The Romance of Merlin.</head>
>><head type="section title">CHAPTER I</head>
>><head type="sybtitle">CONSULTATION OF DEVILS, AND BIRTH OF
>>MERLIN.</head>
> 
> 
> Using the plain meaning of the terms (rather than any special TEI meaning), it's clear that "CONSULTATION..." is the chapter title.  In this particular book, the chapter number appears on the previous line, as a roman numeral, preceded by the word "CHAPTER" in all caps.  That's worth recording so that we can reproduce the original, but I don't think the above is the best way to do it.
> 
> I'm going to suggest some alternatives that seem more logical; perhaps TEI experts can "translate" these into valid TEI (or suggest extensions that are TEI-like).
> 
> First, let's take a simpler case; a chapter that starts with just the bare title:
> 
>          CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.
> 
> I think the markup here can be very simple:
> 
> <div n="1" type="chapter">
>  <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
> 
> I don't think any TYPE attribute is required; that's clear from context.
> 
> Now, let's add "CHAPTER I".  It's sort of a label that precedes the actual chapter title (much like "Figure" or such for certain illustrations); that gives us:
> 
> <div n="1" type="chapter">
>  <head type="label">CHAPTER I</head>
>  <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
> 
> NOTE: when automatically extracting chapter titles, it's important to get the first unadorned <head>, i.e. skip <head type="label">.  And, AFAIK, no "index" tag is required.
> 
> Since the original example is the first chapter, it has an additional (and common) complication: the book title appears first.  Well, that description suggests:
> 
> <div n="1" type="chapter">
>  <head type="book">The Romance of Merlin.</head>
>  <head type="label">CHAPTER I</head>
>  <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
> 
> Thoughts?

-- 
Stephen Thomas,
Senior Systems Analyst,
Adelaide University Library
ADELAIDE UNIVERSITY SA 5005
AUSTRALIA
Tel: +61 8 8303 5190  Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From marcello at perathoner.de  Sat Oct 30 03:00:49 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 03:01:13 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <418342E2.5080906@adelaide.edu.au>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au>
Message-ID: <41836651.3040107@perathoner.de>

Steve Thomas wrote:

> The common advice seems to be to use <q> to enclose quoted speech 
> *inline*, and use <quote> for quoting larger blocks of text. The P4 TEI 
> manual was a bit vague on this, but that seems to be a sensible 
> convention worth using.

That would be presentational markup and very against the TEI specs. The 
specs are very detailed on this:

6.3.3 Quotation

This section discusses the following elements, all of which are often 
rendered by the use of quotation marks:

     * <q> contains a quotation or apparent quotation -- a representation 
of speech or thought marked as being quoted from someone else (whether 
in fact quoted or not); in narrative, the words are usually those of 
a character or speaker; in dictionaries, q may be used to mark real or 
contrived examples of usage.
     * <quote> contains a phrase or passage attributed by the narrator 
or author to some agency external to the text.
     * <cit> A quotation from some other document, together with a 
bibliographic reference to its source.
     * <soCalled> contains a word or phrase for which the author or 
narrator indicates a disclaiming of responsibility, for example by the 
use of scare quotes or italics.

One form of presentational variation found particularly frequently in 
written and printed texts is the use of quotation marks. As with the 
typographic variations discussed in the preceding section, it is 
generally helpful to separate the encoding of the underlying textual 
feature (for example, a quotation or a piece of direct speech) from the 
encoding of its rendering (for example, the use of a particular style of 
quotation marks).

The most common and important use of quotation marks is, of course, to 
mark quotation, by which we mean simply any part of the text attributed 
by the author or narrator to some agency other than the narrative voice. 
Typical examples include passages cited from other works, for which the 
element <quote> may be used, and words or phrases attributed to other 
voices within the current work, for which the element <q> may be used. 
If this distinction between intra-textual and inter-textual voices 
cannot be made reliably, or is not of interest, then all quoted matter 
may simply be marked using the <q> tag. The editorial policy in this 
respect should be stated in the encoding description of the TEI Header. 
The <soCalled> element is used for cases where the author or narrator 
distances him or herself from the words in question without however 
attributing them to any other voice in particular.

   http://www.tei-c.org/P4X/CO.html#COHQQ


> As I understand this (from an earlier post), 'rend="display"' is 
> supposed to mean that the block should be indented (rather like the HTML 
> blockquote).
> 
> This seems like a very poor choice of terms to me. CSS has a "display" 
> property, which can take values such as "inline", "block", and -- 
> crucially -- "none". "display:none" is used where you don't want the 
> content displayed at all.
> 
> So using this rend="display" seems likely to result in confusion.
> 
> In any case, the choice is poor because it does not convey the 
> information desired. If you use <quote> on its own without 
> rend="display", does that indicate you don't want to display the 
> content? Or that you don't want to indent it?

   "These Guidelines make no binding recommendations for the
   values of the rend attribute; the characteristics of
   visual presentation vary too much from text to text and
   the decision to record or ignore individual characteristics
   varies too much from project to project. Some potentially
   useful conventions are noted from time to time at
   appropriate points in the Guidelines."

   -- http://www.tei-c.org/P4X/ref-GLOBAL.html

Thus we are perfectly right in making up a convention of our own. But 
TEI is not CSS. Although CSS and the rend attribute are both purely 
presentational we should not mix TEI and CSS conventions.


The "display" choice may be poor but it is exactly the same choice 
Sebastian Rahtz made in his stylesheets. Look at the code in:

   http://www.tei-c.org/Stylesheets/P4/html/teihtml-misc.xsl

While not dictated by TEI specs, using rend="display" makes our 
convention compatible with Sebastian's stylesheets.


Also, using <q rend="block"> would be a still poorer choice because the 
rend attribute is global and can be used on all TEI elements.

   <div rend="block">

is perfectly valid TEI and it would be quite counter-intuitive to have 
it set a display margin around the block, whereas

   <div rend="display">

makes quite clear what you want.


> This was thrashed out at great length almost a year ago. Basically, 
> while purists will see enormous merit in using <q> instead of quote 
> marks, the practical approach is to stick with the quote marks, due to 
> reasons outlined by another poster. (The terminating quote question with 
> multi-paragraph quotes.)

Using <q> has advantages:

  - automatically finds quotation mark errors
  - renderer can use prettiest quote in output format,
    e.g. plain ugly apostrophe in TXT and pretty typographical
    quotes in PDF.
  - automatically extract quotes from text

and disadvantages:

  - more work


The argument about the terminating quote character in multi-paragraph 
quotes is moot since there is a way to deal with it:

   <p>He said: <q rend="pre">Blah.</q></p>

   <p><q>And blah.</q></p>
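
A minimal sketch of what the renderer side of this could look like, in
Python. It assumes that rend="pre" means "print only the opening mark",
which is how the example above reads; the function name render_quotes is
made up for illustration and is not part of any existing converter.

```python
import xml.etree.ElementTree as ET


def render_quotes(elem, open_q="\u201c", close_q="\u201d"):
    """Flatten an element to text, turning <q> into quotation marks.

    Assumption: <q rend="pre"> gets an opening mark but no closing mark,
    so a quotation can continue in the next paragraph.
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = render_quotes(child, open_q, close_q)
        if child.tag == "q":
            closing = "" if child.get("rend") == "pre" else close_q
            inner = open_q + inner + closing
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


first = ET.fromstring('<p>He said: <q rend="pre">Blah.</q></p>')
second = ET.fromstring("<p><q>And blah.</q></p>")
print(render_quotes(first))             # typographic marks, e.g. for PDF
print(render_quotes(second, '"', '"'))  # plain marks, e.g. for TXT
```

The marks are parameters, so the same markup can feed a TXT and a PDF
renderer. Nested quotations with alternating mark styles are not handled
in this sketch.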


-- 
Marcello Perathoner
webmaster@gutenberg.org

From traverso at dm.unipi.it  Sat Oct 30 03:03:51 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sat Oct 30 03:04:15 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <418342E2.5080906@adelaide.edu.au> (message from Steve Thomas on
	Sat, 30 Oct 2004 16:59:38 +0930)
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au>
Message-ID: <200410301003.i9UA3p43031055@posso.dm.unipi.it>


It is usual, in French typography (and in French typewriting too,
btw) to include a half-width, non-breaking space before "broken
punctuation", i.e. [:;!?].

Some typesetting engines (e.g. TeX through the \frenchspacing
declaration, and LaTeX through the \usepackage[francais]{babel}
header) implement this convention. So the TeX source should not contain
these spaces, which will be inserted by the rendering engine. Putting
these spaces in and taking them out can of course be automated.

What should be done to correctly encode a French text in TEI, and what
is (should be) done by the text rendering engine? For French text in
ISO-Latin it is customary to include a full non-breaking space; in
Unicode, half-width spaces should be used. 
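
As a rough illustration of how putting these spaces in and taking them
out might be automated, here is a small Python sketch. The function
names are hypothetical. It assumes U+202F (NARROW NO-BREAK SPACE) as the
half-width space for Unicode output and U+00A0 (NO-BREAK SPACE) as the
ISO-Latin fallback.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, assumed half-width space for Unicode
NBSP = "\u00a0"   # NO-BREAK SPACE, full-width fallback for ISO-Latin output

# Any run of ordinary or no-break spaces directly before : ; ! ?
# (the lookbehind keeps punctuation at the start of a line untouched).
_BEFORE_PUNCT = re.compile(r"(?<=\S)[ \u00a0\u202f]*([:;!?]+)")


def add_french_spacing(text, space=NNBSP):
    """Put exactly one `space` before each group of : ; ! ? characters."""
    return _BEFORE_PUNCT.sub(lambda m: space + m.group(1), text)


def strip_french_spacing(text):
    """Remove the spaces before : ; ! ? so the renderer can supply them."""
    return _BEFORE_PUNCT.sub(r"\1", text)
```

Stripping first and adding later makes the step repeatable. A real tool
would have to skip URLs, times such as 12:30, and non-French passages,
none of which this sketch attempts.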

Similar conventions apply for em-dashes; here however spaces can be
broken, so half-width (breaking) spaces can be used instead. 

Carlo
From traverso at dm.unipi.it  Sat Oct 30 03:34:49 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sat Oct 30 03:35:14 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <41836651.3040107@perathoner.de> (message from Marcello
	Perathoner on Sat, 30 Oct 2004 12:00:49 +0200)
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
Message-ID: <200410301034.i9UAYnHO024145@posso.dm.unipi.it>

>>>>> "Marcello" == Marcello Perathoner <marcello@perathoner.de> writes:

    Marcello> Steve Thomas wrote:

    >> The common advice seems to be to use <q> to enclose quoted
    >> speech *inline*, and use <quote> for quoting larger blocks of
    >> text. The P4 TEI manual was a bit vague on this, but that seems
    >> to be a sensible convention worth using.

    Marcello> That would be presentational markup and very against the
    Marcello> TEI specs. The specs are very detailed on this:


If TEI has to be used only semantically, then it is inadequate for PG
needs. PG markup has to contain presentational elements, in such a way
that one can obtain presentations "faithful to the original".

A PG-TEI encoded text should allow a transform to a presentation
form driven by an "original" formatting specification, recovering
whatever was in the original (as well as other specifications that
change it). This might include (referring to quotations) the
possibility of rendering a quoted section with running quotation
marks at the start of each line.

One should never forget that presentation IS semantic: this is evident
with heavily formatted poetry (Mallarmé's "Un coup de dés jamais
n'abolira le hasard" is a quite extreme case), but in some form or
another it is always true.

Carlo


From jeroen at bohol.ph  Sat Oct 30 04:47:13 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Sat Oct 30 04:47:07 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <41836651.3040107@perathoner.de>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com><418342E2.5080906@adelaide.edu.au>
	<41836651.3040107@perathoner.de>
Message-ID: <41837F41.2000302@bohol.ph>

Marcello Perathoner wrote:

> Steve Thomas wrote:
>
>> The common advice seems to be to use <q> to enclose quoted speech 
>> *inline*, and use <quote> for quoting larger blocks of text. The P4 
>> TEI manual was a bit vague on this, but that seems to be a sensible 
>> convention worth using.
>
>
> That would be presentational markup and very against the TEI specs. 
> The specs are very detailed on this:
>
I do not agree with this, especially not in the context of pre-existing 
books, for a number of reasons.

0. TEI is highly flexible, and prescribes fairly little. You choose what 
elements you wish to mark up and which not.
1. Quotations do not nest well with paragraphs. TEI (or XML) do not 
provide mechanisms to properly represent overlapping hierarchies. Older 
books can be quite difficult to mark up this way, as closing marks are 
often missing, etc. (I can provide examples)
2. Quotation marks can be considered part of the content, and thus 
should be retained. Adding <q> elements to these parts is fully 
optional, and I would only provide these if I have a good reason to do 
so, as indicated in Marcello's mail. (and I would add, if you would like 
to create an aural style sheet, and have parts spoken by different 
voices, they also make sense, just as providing expansions of 
abbreviations, etc.!)
3. Adding <q> to all quotations (even with help of a script) is labour 
intensive, and adds little value.

>
>
>
> The argument about the terminating quote character in multi-paragraph 
> quotes is moot since there is a way to deal with it:
>
>   <p>He said: <q rend="pre">Blah.</q></p>
>
>   <p><q>And blah.</q></p>
>
And you will either need a very smart renderer to supply them correctly, 
or have to leave the quotation marks intact (inside or outside the <q>), 
or provide cumbersome rend attributes.

>
>
>

From jeroen at bohol.ph  Sat Oct 30 04:49:25 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Sat Oct 30 04:49:17 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <200410301003.i9UA3p43031055@posso.dm.unipi.it>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com><418342E2.5080906@adelaide.edu.au>
	<200410301003.i9UA3p43031055@posso.dm.unipi.it>
Message-ID: <41837FC5.60208@bohol.ph>

Carlo Traverso wrote:

>It is usual, in French typography (and in French typewriting too,
>btw) to include a half-width, non-breaking space before "broken
>punctuation", i.e. [:;!?].
>
>  
>
My own go would be to ignore it in the encoded version, and let the 
rendering process deal with it.

Jeroen.

PS. The Dutch story about Pisa is now in PP. Hope to post it somewhere 
next week -- and of course will prepare a TEI version.

From marcello at perathoner.de  Sat Oct 30 04:49:21 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 04:49:47 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <200410301034.i9UAYnHO024145@posso.dm.unipi.it>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>	<418342E2.5080906@adelaide.edu.au>
	<41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
Message-ID: <41837FC1.8080704@perathoner.de>

Carlo Traverso wrote:

>     Marcello> Steve Thomas wrote:
> 
>     >> The common advice seems to be to use <q> to enclose quoted
>     >> speech *inline*, and use <quote> for quoting larger blocks of
>     >> text. The P4 TEI manual was a bit vague on this, but that seems
>     >> to be a sensible convention worth using.
> 
>     Marcello> That would be presentational markup and very against the
>     Marcello> TEI specs. The specs are very detailed on this:
> 
> If TEI has to be used only semantically, then it is inadequate for PG
> needs. PG markup has to contain presentational elements, in such a way
> that one can obtain presentations "faithful to the original".

I didn't say that. I said that using <q> and <quote> to mark up inline 
and block quotes respectively was wrong.

In TEI all of the presentational stuff should be done with the rend 
attribute.

   <q rend="display">


As to the "faithful to the original" debate:

Most people are far too much enamoured of exactly replicating the one 
edition of the text they happen to work on. (I can understand people 
wanting to faithfully replicate a Shakespeare First Folio, but not the 
books PG usually produces.)

Most of the presentational attributes of any edition of a text are just 
whims of the publisher. Who cares if the author's name was printed in 
Zapf Chancery Slanted 17,4 pt gold embossed with 0.1em of extra 
inter-character spacing added? If you get a different edition of the 
same work the author's name will be printed in a very different font.

The best guess is to just encode that this is the author's name.


> One should never forget that presentation IS semantic: this is evident
> with heavily formatted poetry (Mallarmé's "Un coup de dés jamais
> n'abolira le hasard" is a quite extreme case), but in some form or
> another it is always true.

That is a half-truth at best.

Presentation encodes semantics, but it is a lossy encoding.

The same presentational attribute "italics" can encode a wide range of 
semantic features like "emphasis", "foreign word", "name", etc.

If presentation could losslessly encode semantics, and an accepted 
standard existed how to do this, a program could recover the semantics 
from the presentation and mark up a text all by itself. But then, if a 
program can guess, why mark up at all?

This is Bowerbird's ZML approach. What Bowerbird does not understand is 
that there are far too many semantic features to make a presentational 
encoding reversible. (Technically Bowerbird is farther off the rocker 
still: he says that ASCII TXT can encode all semantics in the world, 
which is even sillier than to say that typography can.)

Mathematically speaking:

   Let PRE be the set of all presentational attributes
   that can reasonably be distinguished by human eye,
   and SEM be the set of all semantics.

   Then there is no bijective function PRE = f (SEM)

Thus we can say "presentation hints at semantics" but not "presentation 
IS semantic".
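
To make the point concrete, here is a toy Python model. The RENDERING
table is a made-up illustration using only the italics examples given
above, not a real inventory of either set.

```python
# Hypothetical, deliberately tiny model of a mapping from SEM to PRE.
RENDERING = {
    "emphasis": "italics",
    "foreign word": "italics",
    "name": "italics",
    "quotation": "quotation marks",
}


def candidates(presentation):
    """Return every semantic feature that is rendered as `presentation`."""
    return sorted(sem for sem, pre in RENDERING.items() if pre == presentation)


def is_invertible(mapping):
    """True only if no two semantic features share one presentation."""
    return len(set(mapping.values())) == len(mapping)


print(candidates("italics"))     # three candidates for a single rendering
print(is_invertible(RENDERING))  # False: the semantics cannot be recovered
```

As soon as two features share one rendering, a program looking only at
the rendering has to guess between them.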


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Sat Oct 30 05:35:12 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 05:35:16 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <41837F41.2000302@bohol.ph>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com><418342E2.5080906@adelaide.edu.au>	<41836651.3040107@perathoner.de>
	<41837F41.2000302@bohol.ph>
Message-ID: <41838A80.7050808@perathoner.de>

Jeroen Hellingman wrote:

> 0. TEI is highly flexible, and prescribes fairly little. You choose what 
> elements you wish to mark up and which not.

Yes. But *if* you mark up you have to use the right element. Using 
<quote> for all displayed quotes is wrong.


> 1. Quotations do not nest well with paragraphs. TEI (or XML) do not 
> provide mechanisms to properly represent overlapping hierarchies. Older 
> books can be quite difficult to mark up this way, as closing marks are 
> often missing, etc. (I can provide examples)

I have marked up a lot of books with multi-paragraph quotations. I also 
have a script that replaces most quotation signs with <q> </q> and even 
gets <q rend="pre"> right most of the time. I found a lot of quotation 
mark errors in PG texts this way.
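
For readers who want to try the idea, here is a much simplified Python
sketch. It is not my actual script. It assumes straight double quotes
only, treats a quotation left open at the end of a paragraph as a
continued one (rend="pre"), and reports those paragraphs so a human can
check whether the closing mark is really just missing.

```python
def mark_quotes(paragraphs):
    """Replace straight double quotes with <q>...</q>, paragraph by paragraph.

    Returns the marked-up paragraphs and the 1-based numbers of paragraphs
    with an unclosed quotation, which are either continued quotations or
    quotation mark errors. XML escaping is left out of this sketch.
    """
    marked, suspects = [], []
    for number, para in enumerate(paragraphs, start=1):
        pieces = para.split('"')
        out = [pieces[0]]
        for i in range(1, len(pieces)):
            if i % 2 == 1:
                # Text that follows an opening mark.
                unclosed = i == len(pieces) - 1
                tag = '<q rend="pre">' if unclosed else "<q>"
                out.append(tag + pieces[i] + "</q>")
                if unclosed:
                    suspects.append(number)
            else:
                # Text that follows a closing mark.
                out.append(pieces[i])
        marked.append("".join(out))
    return marked, suspects
```

Single quotes are the hard part, because the same character is also the
apostrophe. That is where the manual pass comes in.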


> 2. Quotation marks can be considered part of the content, and thus 
> should be retained. Adding <q> elements to these parts is fully 
> optional, and I would only provide these if I have a good reason to do 
> so, as indicated in Marcello's mail. (and I would add, if you would like 
> to create an aural style sheet, and have parts spoken by different 
> voices, they also make sense, just as providing expansions of 
> abbreviations, etc.!)

1. Quotation marks are just presentational markup for "this is a quote", 
no more than italic is presentational markup for "this is emphasized". 
You should retain the underlying semantic feature not the presentation.


2. Replacing quotation signs with <q> </q> will actually preserve them 
*better*.

Unless you replace all apostrophe characters with the correct lsquo and rsquo 
characters or entities, almost every output will look nearer to the 
original if the renderer can insert the correct Unicode lsquo rsquo 
glyphs. (Note: it's difficult for a renderer to guess from context if it 
should render apos as apos, lsquo or rsquo, but it is easy to transform 
<q> and </q>.)

But *if* you replace apos with lsquo and rsquo you may as well replace 
it with <q> and </q>.


But of course all this discussion is moot, because my converter supports 
both ways and you can do as you like.


> 3. Adding <q> to all quotations (even with help of a script) is labour 
> intensive, and adds little value.

Not at all. The script finds most of these. The validator finds some 
more. Then you make a last pass in the editor with a regexp search. (Of 
course doing Mark Twain will take a little longer.)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From brad at chenla.org  Sat Oct 30 06:12:51 2004
From: brad at chenla.org (Brad Collins)
Date: Sat Oct 30 06:14:35 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <41834514.5040201@adelaide.edu.au> (Steve Thomas's message of
	"Sat, 30 Oct 2004 17:09:00 +0930")
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net>
	<wk7jpbj67d.fsf@chenla.org> <p06110405bda77364aa78@[192.168.0.52]>
	<41834514.5040201@adelaide.edu.au>
Message-ID: <wkoeike4do.fsf@chenla.org>

Steve Thomas <stephen.thomas@adelaide.edu.au> writes:

> All you need is this:
>
> <div id="ch1" type="chapter" n="1">
>     <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
>
> The "rendering agent" can then, if desired, use the type and n
> attributes to generate the additional "Chapter 1" heading.
>
>
> Steve

This should work, at least on the vast majority of modern texts. And I
agree that the purpose of marking up a text is to markup the content
of the text, not duplicate the original layout or typography. PG is
producing electronic editions, not electronic facsimiles of an
original.

This is no problem in texts like A Christmas Carol where Chapters are
called `staves', but what if the text uses an alternate spelling for
the word chapter, or only uses numbers or spells out the number into
words?  For example:

    1
    i.
    Chapter One
    CHAPTER ONE
    chapter 1.
    Chap 1.
    CH I
    First Chapter

The type attribute should generally use enumerated values so that
processing software can understand that all of these different forms
of the concept `chapter' are the same.
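
A rough Python sketch of how processing software might reduce those
label variants to one integer for the `n' attribute. The word lists and
the name chapter_number are assumptions for illustration; a real tool
would need larger tables and proper Roman numeral validation.

```python
import re

_ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}
_PAIRS = [("one", "first"), ("two", "second"), ("three", "third"),
          ("four", "fourth"), ("five", "fifth"), ("six", "sixth"),
          ("seven", "seventh"), ("eight", "eighth"), ("nine", "ninth"),
          ("ten", "tenth")]
_WORDS = {word: i for i, pair in enumerate(_PAIRS, start=1) for word in pair}
_LABEL_WORDS = {"chapter", "chap", "ch", "stave"}  # hypothetical list


def roman_to_int(numeral):
    """Convert a lowercase Roman numeral (no validation) to an integer."""
    values = [_ROMAN[c] for c in numeral]
    total = 0
    for i, value in enumerate(values):
        smaller_before_larger = i + 1 < len(values) and value < values[i + 1]
        total += -value if smaller_before_larger else value
    return total


def chapter_number(label):
    """Return the chapter number hidden in a heading label, or None."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    tokens = [t for t in tokens if t not in _LABEL_WORDS]
    if len(tokens) != 1:
        return None
    token = tokens[0]
    if token.isdigit():
        return int(token)
    if token in _WORDS:
        return _WORDS[token]
    if all(c in _ROMAN for c in token):
        return roman_to_int(token)
    return None
```

With something like this, every form in the list above maps to 1, while
the original wording can stay in the text as a label.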

We could normalize all headings, no matter what the original was, but
I would prefer to keep the original.  In rare cases it reflects
authorial intent or is a stylistic element in the overall flow of the
work.

For A Christmas Carol I would rather use Scott's approach:

   <div id="ch1" type="chapter" n="1">
      <head type="DivLabel">STAVE ONE.</head>
      <head>MARLEY'S GHOST.</head>

Rather than:

   <div id="ch1" type="stave" n="1">

In this way processing software would understand that a stave is a
chapter when it looks up a reference in another work which points to
Chapter 1, page 4 in the Carol.

I also think that the type of label should be stated clearly.  There
might be many kinds of labels in a complex document.

------

Ack!  Before sending this I had a look through the TEI manual and
found this example:

   <div1 type="book" n="Herod I">
       <head>Libro Primo</head>

Which is somewhere between Scott's idea and Steve's.  If you used
this for the Carol it might look like this:

   <div id="ch1" type="chapter" n="Stave 1">
      <head>MARLEY'S GHOST.</head>

This preserves the type value as an enumerated value, but using the
`n' value as a text string rather than an integer makes it more
difficult for processing agents to understand the structure of the
text.  I would prefer that the `n' value be an integer and use the
head/label approach.

I believe that as a general rule attribute values should be used for
items which help process a text, or clarify the meaning of a text,
rather than for any part of the text which is displayed.

The spec defines the datatype for `n' as CDATA.  So `Stave 1' is a
legal value, but I would seriously consider making the value more
restrictive.

All in all I think I still like Scott's approach but I'm still open
to any better suggestions.

BTW: This has been a fantastic discussion and has helped me clarify a
lot of details in using TEI which I hadn't completely worked out
before.

This is very difficult stuff folks, and 

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From marcello at perathoner.de  Sat Oct 30 06:22:36 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 06:22:42 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
In-Reply-To: <418144CD.7070707@bohol.ph>
References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com>
	<418144CD.7070707@bohol.ph>
Message-ID: <4183959C.6010806@perathoner.de>

Jeroen Hellingman wrote:

> As I promised already some time ago, I've prepared a draft TEI Lite 
> conventions document,

1. The main point of incompatibility with my proposal is the lack of 
support for plain <div>. I support both, <div> and <divN>. I think you 
should also.


2. The rend attributes are ill-chosen and need reworking.

rend is a global attribute and can be used on all TEI elements. It is 
counter-intuitive to make the effect dependent on the element.

   <figure rend="left"> floats the picture to the left

   <p rend="left"> makes a ragged-right paragraph

better use

   <figure rend="float(left)">

   <p rend="text-align(left)">
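
A small Python sketch of how a converter might read rend values written
this way. The key(value) syntax is taken from the two examples above;
everything else, including the name parse_rend and the idea that several
steps may be separated by spaces, is an assumption.

```python
import re

# One step of the form key(value); steps are assumed to be space-separated.
_STEP = re.compile(r"([A-Za-z-]+)\(([^()]*)\)")


def parse_rend(rend):
    """Split a rend value such as 'float(left) text-align(left)' into a dict."""
    return {key: value.strip() for key, value in _STEP.findall(rend)}


print(parse_rend("float(left)"))
print(parse_rend("float(left) text-align(left)"))
```

Because each step names the effect itself, the same key means the same
thing on <figure> and on <p>, and a bare value like "left" is simply not
recognised.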


3. The urls have to be changed.

www.gutenberg.org/css/ is already taken for the site css and I don't 
want to mix those with the book css.

www.gutenberg.org/xslt/ and www.gutenberg.org/dtd/ are off the main 
directory. I try to keep the number of subdirectories in the main 
directory to a minimum.

Proposal: one directory off the main with a hierarchy to accommodate all 
xslt stuff by different people.

   www.gutenberg.org/tei/
   www.gutenberg.org/tei/jeroen/
   www.gutenberg.org/tei/jeroen/css/
   www.gutenberg.org/tei/jeroen/dtd/
   www.gutenberg.org/tei/jeroen/xslt/
   www.gutenberg.org/tei/marcello/
   www.gutenberg.org/tei/marcello/css
   www.gutenberg.org/tei/marcello/dtd
   www.gutenberg.org/tei/marcello/xslt
   etc.

The prefix www.gutenberg.org/tei/jeroen/ should also be used for all 
your namespaces.

Is this ok?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Sat Oct 30 06:43:24 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Oct 30 06:42:35 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <418342E2.5080906@adelaide.edu.au>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au>
Message-ID: <41839A7C.8090500@hutchinson.net>

Steve Thomas wrote:

>
> As I understand this (from an earlier post), 'rend="display"' is 
> supposed to mean that the block should be indented (rather like the 
> HTML blockquote).
>
> This seems like a very poor choice of terms to me. CSS has a "display" 
> property, which can take values such as "inline", "block", and -- 
> crucially -- "none". "display:none" is used where you don't want the 
> content displayed at all.
>
> So using this rend="display" seems likely to result in confusion.
>
> In any case, the choice is poor because it does not convey the 
> information desired. If you use <quote> on its own without 
> rend="display", does that indicate you don't want to display the 
> content? Or that you don't want to indent it?
>
> I personally don't see any need to use rend here. If you are quoting a 
> passage from some other work, then enclose it in <quote> .. </quote>. 
> That's enough. When someone comes to present this (e.g. in an HTML 
> version), the most natural thing would be to convert the tag to 
> blockquote. The rend is redundant.

You know... Thank you, Steve.  When I read this, I had a "duh!" moment 
and slapped my head. 

You are absolutely right.  <quote> *should* just result in a blockquote 
when converted to HTML.  The rend=display is redundant here.

Josh
From brad at chenla.org  Sat Oct 30 06:41:35 2004
From: brad at chenla.org (Brad Collins)
Date: Sat Oct 30 06:43:17 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <41837FC1.8080704@perathoner.de> (Marcello Perathoner's message
	of "Sat, 30 Oct 2004 13:49:21 +0200")
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
	<41837FC1.8080704@perathoner.de>
Message-ID: <wkhdoce31s.fsf@chenla.org>

Marcello Perathoner <marcello@perathoner.de> writes:

> Carlo Traverso wrote:
>
>>     Marcello> Steve Thomas wrote:
>>     >> The common advice seems to be to use <q> to enclose quoted
>>     >> speech *inline*, and use <quote> for quoting larger blocks of
>>     >> text. The P4 TEI manual was a bit vague on this, but that seems
>>     >> to be a sensible convention worth using.
>>     Marcello> That would be presentational markup and very against
>> the
>>     Marcello> TEI specs. The specs are very detailed on this:
>> If TEI has to be used only semantically, then it is inadequate for
>> PG
>> needs. PG markup has to contain presentational elements, in such a way
>> that one can obtain presentations "faithful to the original".
>

Marcello of course is completely correct, but that doesn't mean that
Steve is wrong....  Einstein didn't invalidate Newton, he refined
Newton.  That's how progressive passes of markup should work.

A lot of people are coming to TEI from an HTML background.  It's the
ol' "when the only tool you have is a hammer, everything begins to look
like a nail" problem.

And in a way, as a general rule you could say that <q> is for inline
and <quote> is for block quotes.  And many times you'd be right, even
though many times it would be for the wrong reasons.

Steve has voiced a sort of first-pass, rule of thumb.  It's a bit
like the <hi> tag in TEI which isn't terribly semantic when used as a
first pass general markup tag.

I would love to see a defined first-pass set of markup tags which
would be as easy as HTML to learn and apply.  This would help
enormously in early stages of markup which could then be done by
folks who haven't spent long lonely hours poring over the TEI manual
and then testing chunks of code in nxml-mode (an XML editing mode in
Emacs).

b/

Who is bloody thankful the sun just went down after a blistering day in
the big shitty.... sometimes I wish I could afford air-con. And the
hot season is still yet to come.

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From marcello at perathoner.de  Sat Oct 30 06:49:04 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 06:49:09 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <wkoeike4do.fsf@chenla.org>
References: <1a5.29beadb0.2eb17a83@aol.com>
	<418030C5.7070505@hutchinson.net>	<wk7jpbj67d.fsf@chenla.org>
	<p06110405bda77364aa78@[192.168.0.52]>	<41834514.5040201@adelaide.edu.au>
	<wkoeike4do.fsf@chenla.org>
Message-ID: <41839BD0.2010405@perathoner.de>

Brad Collins wrote:

> For A Christmas Carol I would rather use Scott's approach:
> 
>    <div id="ch1" type="chapter" n="1">
>       <head type="DivLabel">STAVE ONE.</head>
>       <head>MARLEY'S GHOST.</head>
> 
> Rather than:
> 
>    <div id="ch1" type="stave" n="1">

You'll also have to consider XPath queries. In a couple of years we'll 
likely put all of the PG TEI files into a giant XML database. No more 
files.

You'll retrieve a book with an XPath query like (simplified):

   /org/gutenberg/etext/12345

You'll get the book title(s) with

   /org/gutenberg/etext/12345//titleStmt/title

and the title of the first chapter with

   /org/gutenberg/etext/12345//div[@type="chapter"][@n=1]/head

Of course this will only work if the first chapter always has attribute 
type="chapter" and attribute n=1 and not n="I" or n="Chapter 1" or 
n="Chapter I" ...
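
You can already try this kind of query on a single file. The Python
sketch below uses the standard xml.etree.ElementTree module, which only
implements a subset of XPath: attribute tests are string comparisons, so
the predicate is written [@n='1'] rather than [@n=1]. The sample
document and the name chapter_title are made up for illustration; the
function also skips any <head> that carries a type attribute, such as a
label.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<text>
  <body>
    <div type="chapter" n="1">
      <head type="label">CHAPTER I</head>
      <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
    </div>
  </body>
</text>
"""


def chapter_title(root, n):
    """Return the first <head> without a type attribute in chapter n, or None."""
    query = f".//div[@type='chapter'][@n='{n}']/head"
    for head in root.findall(query):
        if head.get("type") is None:
            return head.text
    return None


root = ET.fromstring(SAMPLE)
print(chapter_title(root, 1))    # the chapter title, not the label
print(chapter_title(root, "I"))  # None: n="I" does not match n="1"
```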


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jeroen at bohol.ph  Sat Oct 30 06:52:24 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Sat Oct 30 06:51:27 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
In-Reply-To: <4183959C.6010806@perathoner.de>
References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com><418144CD.7070707@bohol.ph>
	<4183959C.6010806@perathoner.de>
Message-ID: <41839C98.4000702@bohol.ph>

Marcello Perathoner wrote:

> 1. The main point of incompatibility with my proposal is the lack of 
> support for plain <div>. I support both, <div> and <divN>. I think you 
> should also.

It means some extra programming from my side. I would like to see the 
various people working on this issue converge to a single standard.

> 2. The rend attributes are ill-chosen and need reworking.
>
> rend is a global attribute and can be used on all TEI elements. It is 
> counter-intuitive to make the effect dependent on the element.
>
>   <figure rend="left"> floats the picture to the left
>   <p rend="left"> makes a ragged-right paragraph
>
> better use
>
>   <figure rend="float(left)">
>   <p rend="text-align(left)">
>
I agree the rendition ladder approach is much better. The "simple" rend 
attributes are actually quick hacks. I've looked in your code, and the 
generic code for rend attributes you use is a much better way to deal 
with it. Quite some work needs to be done though before rendition 
ladders are fully supported.

>
> 3. The urls have to be changed.
>
> www.gutenberg.org/css/ is already taken for the site css and I don't 
> want to mix those with the book css.
>
> www.gutenberg.org/xslt/ and www.gutenberg.org/dtd/ are off the main 
> directory. I try to keep the number of subdirectories in the main 
> directory to a minimum.
>
> Proposal: one directory off the main with a hierarchy to accommodate 
> all xslt stuff by different people.
>
>   www.gutenberg.org/tei/
>   www.gutenberg.org/tei/jeroen/
>   www.gutenberg.org/tei/jeroen/css/
>   www.gutenberg.org/tei/jeroen/dtd/
>   www.gutenberg.org/tei/jeroen/xslt/
>   www.gutenberg.org/tei/marcello/
>   www.gutenberg.org/tei/marcello/css
>   www.gutenberg.org/tei/marcello/dtd
>   www.gutenberg.org/tei/marcello/xslt
>   etc.
>
> The prefix www.gutenberg.org/tei/jeroen/ should also be used for all 
> your namespaces.
>
That sounds like a good proposal. What do others think about it, 
especially if books in the books hierarchy start referencing these 
things? Currently, they are rather self-contained, with all 
required stuff in one place. Doing this will basically require us to 
keep things in the generic directories backward compatible with texts 
posted before.

Jeroen
From brad at chenla.org  Sat Oct 30 06:59:22 2004
From: brad at chenla.org (Brad Collins)
Date: Sat Oct 30 07:01:05 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
In-Reply-To: <4183959C.6010806@perathoner.de> (Marcello Perathoner's message
	of "Sat, 30 Oct 2004 15:22:36 +0200")
References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com>
	<418144CD.7070707@bohol.ph> <4183959C.6010806@perathoner.de>
Message-ID: <wk654se285.fsf@chenla.org>

Marcello Perathoner <marcello@perathoner.de> writes:

> Jeroen Hellingman wrote:
>
> 2. The rend attributes are ill-chosen and need reworking.
>
> rend is a global attribute and can be used on all TEI elements. It is
> counter-intuitive to make the effect dependent on the element.
>
>    <figure rend="left"> floats the picture to the left
>    <p rend="left"> makes a ragged-right paragraph
>
> better use
>
>    <figure rend="float(left)">
>    <p rend="text-align(left)">


Should `rend' then be a means of passing CSS to a processor?

I see a lot of people using the `rend' attribute as a means of
dumping in presentational instructions, when it should be used as a
means of describing the original:

,----[ TEI Manual: Global Attributes ]
| rend (rendition or presentation) indicates how the element in question
|      was rendered or presented in the source text.
|      Datatype: CDATA
|      Values: any string of characters; if the typographic rendition
| 	     of a text is to be systematically recorded, a
| 	     systematic set of values for the rend attribute should
| 	     be defined.
|      Default: #IMPLIED
`----

I would suggest that PG define a set of enumerated values for `rend'
which then can be mapped to CSS.

That is more restrictive and requires changing the datatype.
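
A minimal sketch of that idea in Python, assuming a small enumerated
set of `rend' values. The value names and the CSS they map to are
invented here for illustration; they are not defined by TEI or by PG.

```python
# Hypothetical enumerated rend values and the CSS they might map to.
# Neither the value names nor the mapping come from TEI or from PG.
REND_TO_CSS = {
    "italic": "font-style: italic",
    "bold": "font-weight: bold",
    "small-caps": "font-variant: small-caps",
    "centered": "text-align: center",
}


def rend_to_css(rend):
    """Translate a space-separated rend value into a CSS declaration list."""
    declarations = []
    for token in rend.split():
        if token not in REND_TO_CSS:
            # An enumerated datatype means unknown values are rejected.
            raise ValueError("rend value not in the enumerated set: " + token)
        declarations.append(REND_TO_CSS[token])
    return "; ".join(declarations)
```

Rejecting unknown values is what makes the set "enumerated"; a more
forgiving processor could skip them instead.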

Any thoughts?

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From jon at noring.name  Sat Oct 30 07:57:30 2004
From: jon at noring.name (Jon Noring)
Date: Sat Oct 30 07:57:44 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <200410301034.i9UAYnHO024145@posso.dm.unipi.it>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
Message-ID: <1181387224250.20041030085730@noring.name>

Carlo wrote:
> Marcello wrote
>> Steve Thomas wrote:

>>> The common advice seems to be to use <q> to enclose quoted
>>> speech *inline*, and use <quote> for quoting larger blocks of
>>> text. The P4 TEI manual was a bit vague on this, but that seems
>>> to be a sensible convention worth using.

>> That would be presentational markup and very against the
>> TEI specs. The specs are very detailed on this:

> If TEI has to be used only semantically, then it is inadequate for PG
> needs. PG markup has to contain presentational elements, in such a way
> that one can obtain presentations "faithful to the original".

Is this a requirement that it be possible *without some manual work*
to regenerate the typographic layout of the source document?

And what impact does this attempt to be 'faithful to the original' have
on accessibility and non-visual uses of the PG texts?


> A PG-TEI encoded text should allow to call a transform to a
> presentation form with an "original" formatting specification,
> allowing to recover whatever was in the original, (as well as other
> specifications allowing to change it). This might include, (referring
> to quotations), the possibility of rendering a quoted section with
> running quotation marks at the start of each line.

This implies, for example, that "long-s" characters, common in
pre-19th century English texts, should be preserved (e.g., use the
Unicode character equivalent). For modern usage someone can later
transform all Unicode "long-s" characters to the ordinary "s". But to
do it the other way around is more difficult. (Yes, a special
character is not usually a "presentation" issue, but in this case it
has become a modern presentation issue.)


> One should never forget that presentation IS semantic: this is evident
> with heavily formatted poetry, (Mallarme's "Un coup de des jamais
> n'abolira le hasard" is a quite extreme case) but in some form or
> another it is always true.

I disagree with this in a general sense. Presentation is most used to
communicate document structure and sometimes the semantics of
particular chunks of content (e.g., "this is a foreign phrase".) In a
few cases visual layout becomes part of content itself ("poetry as
visual art"). In these rare cases I believe that SVG should be used
since there are facilities in SVG for accessibility, and SVG will
truly get it exactly right all the time. SVG is XML-based, too.

Jon Noring

From marcello at perathoner.de  Sat Oct 30 08:27:19 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 08:27:26 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
In-Reply-To: <wk654se285.fsf@chenla.org>
References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com>	<418144CD.7070707@bohol.ph>
	<4183959C.6010806@perathoner.de> <wk654se285.fsf@chenla.org>
Message-ID: <4183B2D7.6010109@perathoner.de>

Brad Collins wrote:

>>   <figure rend="float(left)">
>>   <p rend="text-align(left)">
> 
> Should `rend' then be a means of passing CSS to a processor?

Not necessarily CSS. I used those values as example.


> I see a lot of people using the `rend' attribute as a means of
> dumping in presentational instructions, when it should be used as a
> means of describing the original:

Of course, you should use rend judiciously, and only to preserve the 
rendition of the original when you feel it needs recording.

If we are going to define a set of values we will most probably end up 
with something resembling CSS very much.
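
To illustrate how close the function-style values quoted above already
are to CSS, here is a rough Python sketch that reads tokens such as
float(left) and writes them out as CSS declarations. The helper names
parse_rend and to_css are hypothetical, and the sketch assumes the
token name is meant to be a CSS property name, which is only one
possible reading of the example values.

```python
import re

# One token looks like name(value), e.g. float(left) or text-align(left).
TOKEN = re.compile(r"([a-z-]+)\(([^()]*)\)")


def parse_rend(rend):
    """Return (name, value) pairs found in a function-style rend value."""
    return [(name, value.strip()) for name, value in TOKEN.findall(rend)]


def to_css(rend):
    """Render the parsed pairs as a CSS declaration list."""
    return "; ".join(name + ": " + value for name, value in parse_rend(rend))
```

For example, to_css("float(left)") gives "float: left".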


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Sat Oct 30 09:40:42 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Oct 30 09:40:53 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <wkoeike4do.fsf@chenla.org>
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net>
	<wk7jpbj67d.fsf@chenla.org> <p06110405bda77364aa78@[192.168.0.52]>
	<41834514.5040201@adelaide.edu.au> <wkoeike4do.fsf@chenla.org>
Message-ID: <Pine.GSO.4.58.0410300939190.3240@vtn1.victoria.tc.ca>


H. G. Wells texts very often use:
Chapter the First
Chapter the Second
Chapter the Third
etc.


Andrew

On Sat, 30 Oct 2004, Brad Collins wrote:

> This is no problem in texts like A Christmas Carol where Chapters are
> called `staves', but what if the text uses an alternate spelling for
> the word chapter, or only uses numbers or spells out the number into
> words?  For example.
>
>     1
>     i.
>     Chapter One
>     CHAPTER ONE
>     chapter 1.
>     Chap 1.
>     CH I
>     First Chapter
>

From gbnewby at pglaf.org  Sat Oct 30 11:20:32 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Oct 30 11:20:33 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <1181387224250.20041030085730@noring.name>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
	<1181387224250.20041030085730@noring.name>
Message-ID: <20041030182032.GA7344@pglaf.org>

On Sat, Oct 30, 2004 at 08:57:30AM -0600, Jon Noring wrote:
> Carlo wrote:
> > If TEI has to be used only semantically, then it is inadequate for PG
> > needs. PG markup has to contain presentational elements, in such a way
> > that one can obtain presentations "faithful to the original".
> 
> Is this a requirement that it be possible *without some manual work*
> to regenerate the typographic layout of the source document?

I have not heard this as a requirement.  Of course, some eBook
producers might believe it's valuable, and they are welcome to prepare
their work to be "typographically correct" (whatever that might mean
to them).

However, it *is* a requirement to automatically regenerate plain text
and HTML (perhaps other formats as desired) from the XML.
  -- Greg
From jon at noring.name  Sat Oct 30 11:43:12 2004
From: jon at noring.name (Jon Noring)
Date: Sat Oct 30 11:43:30 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <20041030182032.GA7344@pglaf.org>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
	<1181387224250.20041030085730@noring.name>
	<20041030182032.GA7344@pglaf.org>
Message-ID: <251400766265.20041030124312@noring.name>

Greg Newby wrote:
> Jon Noring wrote:
>> Carlo wrote:

>>> If TEI has to be used only semantically, then it is inadequate for PG
>>> needs. PG markup has to contain presentational elements, in such a way
>>> that one can obtain presentations "faithful to the original".

>> Is this a requirement that it be possible *without some manual work*
>> to regenerate the typographic layout of the source document?

> I have not heard this as a requirement.  Of course, some eBook
> producers might believe it's valuable, and they are welcome to prepare
> their work to be "typographically correct" (whatever that might mean
> to them).

My question was more rhetorical than inquisitive. The discussion
shows there are different views on the issue of what we
preserve, in a presentational sense, of the original source document.

For me, only rarely must the typographic layout be reproduced in some
manner (such as "poetry as visual art" and a few other rarities as
have been brought out here.) And for this, I recommend using SVG
rather than trying to use presentational markup plus CSS to effect the
desired result in the digital text version. I've previously commented
on eschewing tabs and spaces for poetry/verse used to preserve visual
indentation (my view is to use structural or semantic markup instead
-- and where poetry moves into the visual art realm, then use SVG.)

Whether to preserve the "long-s" or not is more problematic, since
where do we draw the line? For example, if we have an old Russian
text, do we transliterate the character set to Latin? Of course we
don't. Isn't the use of a "long-s" part of a variant character used at
the time of publication? It is easy to auto-convert the Unicode
equivalent of the 'long-s' character to an ordinary 's' (as it is for
the German ess-tsett), but going the other way is much more difficult.
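
The easy direction can be sketched in a couple of lines of Python. The
function name modernize_long_s is made up for this example; the only
fact relied on is that the long-s is the Unicode character U+017F.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S


def modernize_long_s(text):
    """Replace every long-s with an ordinary lowercase s."""
    return text.replace(LONG_S, "s")
```

Nothing is lost for a modern reader, but the information about where
the long-s stood in the source is gone after this step, which is why
the master text would have to keep it.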


> However, it *is* a requirement to automatically regenerate plain text
> and HTML (perhaps other formats as desired) from the XML.

Definitely! Both repurposeability and accessibility are vital. My view
is that, as much as possible, make the final master digital content as
agnostic with respect to presentation type as possible. And in the
rare instances this is not possible, then use SVG, which when done
right allows much better accessibility and repurposeability. If enough
agree here, we might want to begin discussing how to integrate islands
of SVG within the TEI framework.

Jon Noring

From scott_bulkmail at productarchitect.com  Sat Oct 30 15:33:03 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sat Oct 30 15:36:34 2004
Subject: [gutvol-d] capture original presentation?
In-Reply-To: <251400766265.20041030124312@noring.name>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
	<1181387224250.20041030085730@noring.name>
	<20041030182032.GA7344@pglaf.org>
	<251400766265.20041030124312@noring.name>
Message-ID: <p06110406bda9bc8aeba2@[192.168.0.52]>

I've taken the liberty of starting a new thread since I think this issue is important.

It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup.

It's fine if others don't place any or much importance on that goal, but I hope they will still contribute TEI/markup knowledge so that this choice is supported.  Here, I think we can have our cake and eat it too.


>Jon Noring typed:
>
>My view
>is that, as much as possible, make the final master digital content as
>agnostic with respect to presentation type as possible. And in the
>rare instances this is not possible, then use SVG

I think there's a better middle ground here.  Yes, SVG is useful in "extreme" cases, but I don't think it addresses the primary use case.

My suggestion is that structural markup is *required*, and additional presentational markup is *optional*.  For those who want an agnostic master file, just ignore the presentational markup -- i.e. we have to design the XML so that the presentation is clearly distinct from structure.  Paraphrasing what Brad said to me in an earlier thread, the "rend" attribute describes the original presentation but doesn't enforce any specific output presentation.
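
As a rough sketch of "just ignore the presentational markup", the
following Python removes every rend attribute from a fragment and
leaves the structure alone. It assumes that all optional presentation
data lives in rend attributes, which is a simplification, and the
function name strip_rend is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_rend(xml_text):
    """Return the XML with every rend attribute removed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # pop() with a default does nothing when the attribute is absent.
        element.attrib.pop("rend", None)
    return ET.tostring(root, encoding="unicode")
```

Because the structural elements are untouched, the same master file
serves both those who want the original presentation and those who do
not.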

Here's an example where SVG is clearly overkill:

            --Introduction--
1.  The Cyclone
2.  The Council with the Munchkins

Structural markup and regeneration would yield:

Introduction
1.  The Cyclone
2.  The Council with the Munchkins

That's perfectly reasonable, and may suffice for most people.  I just want there to be a way for those who think it's worth the effort to capture the former presentation in the master file.  For example:

	<head index="            --Introduction--">Introduction</head>

Or, using Marcello's index tag:

	<index index="toc" level1="            --Introduction--" />
	<head>Introduction</head>

(In both cases, -- should probably be &mdash; and there may be a better solution than hardcoding the leading spaces.)
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From scott_bulkmail at productarchitect.com  Sat Oct 30 15:34:58 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sat Oct 30 15:36:39 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <418222C3.10402@perathoner.de>
References: <1a5.29beadb0.2eb17a83@aol.com>
	<418030C5.7070505@hutchinson.net>	<wk7jpbj67d.fsf@chenla.org>
	<p06110405bda77364aa78@[192.168.0.52]> <418222C3.10402@perathoner.de>
Message-ID: <p06110404bda984e8dd5b@[192.168.0.52]>

>><div n="1" type="chapter">
>> <head type="book">The Romance of Merlin.</head>
>> <head type="label">CHAPTER I</head>
>> <head>CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.</head>
>>
>>Thoughts?
>
>The book title is at a different level from a chapter title so it gets its own div. If you find multiple chapter titles, you decide which is the main one and which are subtitles.
>
>  <div type="book">
>    <head>The Romance of Merlin</head>
>
>    <div type="chapter">
>      <head type="sub">Chapter I</head>
>      <index index="toc" />
>      <head>Consultations of Devils, and Birth of Merlin</head>

I want to make sure that I understand how <div type=book> fits into the big picture.  Is the following correct?

<text>
  <front>
    ... table of contents, introduction ...
  </front>

  <body>
    <div type="book">
      <head>The Romance of Merlin</head>

      <div type="chapter">
        <head type="sub">Chapter I</head> ... or type="label"
        <head>Consultations of Devils, and Birth of Merlin</head>

If so:
1. I agree that it's consistent, and may be the best TEI-centric solution
2. it introduces a level of hierarchy that some may find confusing

Whether #2 is important depends in part on who will be doing the most markup: volunteers who have some HTML experience vs. volunteers who are already TEI savvy (or don't mind the additional complexity).


>      <head type="sub">Chapter I</head>

I much prefer type=label.  "Chapter I" is not a subhead according to the plain meaning of the term.  Also, unlike a true subhead, it may be something that some people want to strip out or translate or standardize.
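
A small Python sketch of why a distinct type=label is convenient: a
processor can leave the "Chapter I" label out of a generated contents
list without guessing. The element and attribute names follow the
example quoted above; the function name chapter_titles and its
skip_types parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def chapter_titles(xml_text, skip_types=("label",)):
    """Collect chapter headings, leaving out heads whose type is skipped."""
    root = ET.fromstring(xml_text)
    titles = []
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue
        for head in div.findall("head"):
            if head.get("type") not in skip_types:
                titles.append("".join(head.itertext()).strip())
    return titles
```

The same loop could translate or standardize the labels instead of
dropping them.
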
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From shalesller at writeme.com  Sat Oct 30 14:04:40 2004
From: shalesller at writeme.com (D. Starner)
Date: Sat Oct 30 16:06:37 2004
Subject: [gutvol-d] PGTEI and more
Message-ID: <20041030210440.B0B1B4C0CF@ws1-1.us4.outblaze.com>

Jon Noring writes:

> Whether to preserve the "long-s" or not is more problematic, since 
> where do we draw the line? For example, if we have an old Russian 
> text, do we transliterate the character set to Latin? Of course we 
> don't. 

Yes, because that would be stupid and useless. The real question is
if we have an old Russian text, do we convert the fitas and other
letters that are strictly redundant and hence abandoned in modern
Russian to their modern Russian equivalent? Most English editions
convert the long-s, and I believe most Russian editions convert the
old letters to the modern equivalents.

Or how about the o with e above that was the written form of o-umlaut?
Do we preserve that or convert it to the modern form? There's one book
in DP that preserves it in the Unicode edition because umlauts were
used in a brief section, but I don't know that it was more important
than a change to Fraktur or bold. Again, modern German that was originally
printed with an o-e above is converted to umlauts when reprinted; the
e above was encoded for people who wanted to use it with middle German to
contrast with modern German.

> Isn't the use of a "long-s" part of a variant character used at 
> the time of publication? It is easy to auto-convert the Unicode 
> equivalent of the 'long-s' character to an ordinary 's' (as it is for 
> the German ess-tsett), but going the other way is much more difficult. 

It depends. In English, most of the long-s usage is trivial to convert;
if it's not at the end of a word, it's a long-s. Sometimes a long-s s
combination is seen for ss, but that's consistent within one work generally.
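
The simple English rule stated above can be sketched in Python as
follows. This is only the rule as given here (a lowercase s that is
not the last letter of a word becomes a long-s); real printing
conventions had further exceptions that the sketch ignores, and the
function name restore_long_s is made up.

```python
import re

# A lowercase s followed by another letter, i.e. not at the end of a word.
NON_FINAL_S = re.compile(r"s(?=[^\W\d_])")


def restore_long_s(text):
    """Turn every non-final lowercase s into a long-s (U+017F)."""
    return NON_FINAL_S.sub("\u017f", text)
```

Capital S is left alone, and a work that prints a long-s followed by
s for ss already matches what this rule produces.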

In German, it takes a dictionary lookup and there's one or two minor
examples, comparable to the ones about Polish and polish in English,
where it can't be automatically converted.
 
But one question should be our readers. There's a lot of well-educated people
who aren't familiar with the long-s; are we doing more good in keeping what's
more a detail of the typography than the spelling at the cost of some of our
readers? The vast majority of the editions I've seen that reprint pre-1800
English works or German works originally printed in Fraktur in modern fonts do
not use the long-s, even when preserving original spelling.

From gbnewby at pglaf.org  Sat Oct 30 17:57:26 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Oct 30 17:57:27 2004
Subject: [gutvol-d] capture original presentation?
In-Reply-To: <p06110406bda9bc8aeba2@[192.168.0.52]>
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com>
	<418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
	<200410301034.i9UAYnHO024145@posso.dm.unipi.it>
	<1181387224250.20041030085730@noring.name>
	<20041030182032.GA7344@pglaf.org>
	<251400766265.20041030124312@noring.name>
	<p06110406bda9bc8aeba2@[192.168.0.52]>
Message-ID: <20041031005726.GA14737@pglaf.org>

On Sat, Oct 30, 2004 at 06:33:03PM -0400, Scott Lawton wrote:
> I've taken the liberty of starting a new thread since I think this issue is important.
> 
> It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup.

Just a quick note related to this, and my apologies if
it turned up in the thread already and I missed it:

We're planning to include the scanned page images along
with eBooks.  In fact, this is part of the intent with
the new directory structure for the PG servers (the
/1/0/8/0/...  structure).

We haven't done any (or many, anyway) because we're still
trying to figure out how to best name the page files, and how
to link them on a page-by-page basis into the (marked up?)
eBooks.  Jim Tinsley drafted some general guidelines for
the image files themselves, but linking them to the eBooks
is something we need to figure out still.

(BTW, the Million Books project at archive.org uses djvu
for this purpose.  It's not bad, but I like our intended 
solution of XML markup much better.  Plus, of course, the MBP
is mostly working with relatively poor quality proofreading.
For PG, the text has taken the main emphasis, not the appearance.)

My notion is that the PGTEI and TEI lite solutions I've been
reading about in this list will be easily adaptable to including
links to specific page image files, so I've not mentioned
it until now.

But since it's related to your desire for preservation of the actual
appearance of the scanned page, I figured I'd type it up now.  That
accomplished, please continue with your further thoughts - preserving
appearance is definitely something that is frequently desired.
  -- Greg


From scott_bulkmail at productarchitect.com  Sat Oct 30 18:11:54 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sat Oct 30 18:12:49 2004
Subject: [gutvol-d] PGTEI and more
Message-ID: <p06110408bda9eb42e5bd@[192.168.0.52]>

>>langUsage: I suggest the standard should be to omit the content of
>>the tag (e.g. "British", which is probably more useful as "British
>>English" or "English (British)"). This information should be
>>generated to ensure consistency. (They appear in the generated PGTEI
>>and in alice.tei, but not in lmiss.tei.)
>
>You have to include only the languages you actually use in the text.

What about the content of the tag?  i.e. which is correct?

      <language id="en-gb"></language>        # lmiss.tei
      <language id="en-gb">British</language> # alice.tei

I think the first is much better.  Given the second, it will be extra work to enforce a consistent word or phrase.


>The converter includes some more because it is easier to delete than to add and if you declare too many it doesn't hurt.

I agree that it's easier to delete; hence my suggestion to include a note.

Actually, all languages except the main one should be able to be determined programmatically, right?  Just extract and dedup lang= attributes.
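
A minimal Python sketch of "extract and dedup", assuming the text is
well-formed XML and that languages are marked with a plain lang
attribute as in the examples in this thread. The function name
languages_used is hypothetical.

```python
import xml.etree.ElementTree as ET


def languages_used(xml_text):
    """Return the sorted, de-duplicated lang attribute values in the text."""
    root = ET.fromstring(xml_text)
    return sorted({el.get("lang") for el in root.iter() if el.get("lang")})
```

The main language still has to be stated by hand unless it is also
given as a lang attribute on an enclosing element.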

We certainly don't want to include languages that aren't used; no point in bothering with all this XML if we're just going to populate it with wrong data.


>>Having separate index tags for TOC, PDF and PDB strikes me as
>>unnecessary and prone to error. Shouldn't the TOC one suffice for
>>all?
>
>Some formats have limitations, e.g. PalmDoc bookmarks have a maximum of 16 characters. PDF bookmarks have to use iso-8859-1 chars. Moreover you don't always want the full <head> to appear in the contents.

So, the PalmDoc and PDF headers can be generated to conform to those limitations.  I don't see the benefit of including these extra tags for every chapter of every document in the PG collection!
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From marcello at perathoner.de  Sat Oct 30 18:37:56 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 30 18:38:12 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <p06110408bda9eb42e5bd@[192.168.0.52]>
References: <p06110408bda9eb42e5bd@[192.168.0.52]>
Message-ID: <418441F4.4030708@perathoner.de>

Scott Lawton wrote:

> What about the content of the tag?  i.e. which is correct?
> 
>       <language id="en-gb"></language>        # lmiss.tei
>       <language id="en-gb">British</language> # alice.tei

Both work. The contents of the tag does not matter.

The lang attribute is an IDREF. If you say <foreign lang="fr"> then you 
must have an element somewhere in your TEI with an id of "fr"; otherwise 
it will not validate. The <langUsage> section is just a bin to hold 
those elements.
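
The IDREF rule can be approximated without a validating parser. The
Python sketch below lists every lang value that has no element with a
matching id; it is an illustration of the rule only, not a substitute
for validating against the DTD, and the function name
undeclared_languages is hypothetical.

```python
import xml.etree.ElementTree as ET


def undeclared_languages(xml_text):
    """Return lang values that no element declares through a matching id."""
    root = ET.fromstring(xml_text)
    declared = {el.get("id") for el in root.iter() if el.get("id")}
    used = {el.get("lang") for el in root.iter() if el.get("lang")}
    return sorted(used - declared)
```

An empty result means every lang reference points at some id in the
file.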


>> Some formats have limitations, e.g. PalmDoc bookmarks have a maximum
>> of 16 characters. PDF bookmarks have to use iso-8859-1 chars.
>> Moreover you don't always want the full <head> to appear in the
>> contents.
> 
> So, the PalmDoc and PDF headers can be generated to conform to those
> limitations.  I don't see the benefit of including these extra tags
> for every chapter of every document in the PG collection!

How do you go about condensing a longer title into 16 characters? There 
is no algorithm that can do that nearly as well as a human. A human will 
always choose to include the most important part.

CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

=> Birth of Merlin
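
To make the difference concrete, here is a Python sketch of a
bookmark generator that uses a hand-written short title when one is
supplied and otherwise falls back to cutting at a word boundary. The
16-character limit is the PalmDoc figure mentioned above; the function
name palm_bookmark and its parameters are hypothetical.

```python
def palm_bookmark(title, short=None, limit=16):
    """Return a bookmark of at most `limit` characters."""
    if short is not None:
        if len(short) > limit:
            raise ValueError("short title exceeds the bookmark limit")
        return short
    if len(title) <= limit:
        return title
    cut = title[:limit]
    # If the cut landed in the middle of a word, back up to the last space.
    if title[limit] != " " and " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut.rstrip(" ,;:.")
```

On the title above the automatic fallback yields "CONSULTATION OF",
while the human-chosen "Birth of Merlin" fits the limit and says more.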


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Sun Oct 31 01:18:28 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Oct 31 01:18:51 2004
Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise)
Message-ID: <1b9.54e4965.2eb607e4@aol.com>

john said:
>   Please picture this scenario:
>
>   I'm a volunteer who has scanned a public-domain book and 
>   wants to make it available through the PG distribution mechanism 
>   (free of charge, available until the Internet collapses under the weight 
>   of spam and next-generation pornography, yadda, yadda, yadda).
>
>   Today, if I can convert this book to plain text (according to 
>   some stated formatting conventions), I may submit the book. 
>   If I'm ambitious, I can create an HTML version, which presents 
>   the same information, but allows "real" formatting rather than 
>   _italic_ and *bold*. 
>
>   In the background, however, there is this Whole New World(tm) 
>   of semantic tagging, which presumably will allow the book to 
>   make snacks and provide entertainment during the reading process. 
>   But, for me, as a volunteer, who spends a considerable amount of time 
>   working on books, but enjoys actually finishing one and seeing it posted, 
>   I can't get my arms around the benefits.
>
>   Except for recognizing the acronyms, I am agnostic to 
>   XML/ZML/TEI/ABC/EIEIO.
>
>   Could someone please explain the benefit of semantic tagging 
>   and why it won't horribly lengthen the amount of time required 
>   to produce an eBook?
>
>   Thank you.

well?

-bowerbird
From marcello at perathoner.de  Sun Oct 31 03:57:19 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Oct 31 03:57:46 2004
Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise)
In-Reply-To: <1b9.54e4965.2eb607e4@aol.com>
References: <1b9.54e4965.2eb607e4@aol.com>
Message-ID: <4184D31F.4030204@perathoner.de>

Bowerbird@aol.com wrote:

>>  Could someone please explain the benefit of semantic tagging 
>>  and why it won't horribly lengthen the amount of time required 
>>  to produce an eBook?
> 
> well?

This has already been discussed at great length. He can go to the 
archives to read it up.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jeroen at bohol.ph  Sun Oct 31 07:46:36 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Sun Oct 31 07:46:28 2004
Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise)
In-Reply-To: <4184D31F.4030204@perathoner.de>
References: <1b9.54e4965.2eb607e4@aol.com> <4184D31F.4030204@perathoner.de>
Message-ID: <418508DC.4080109@bohol.ph>

Marcello Perathoner wrote:

> Bowerbird@aol.com wrote:
>
>>>  Could someone please explain the benefit of semantic tagging  and 
>>> why it won't horribly lengthen the amount of time required  to 
>>> produce an eBook?
>>

You'll need about one hour to add very basic level TEI tagging to a 
simple work, such as a novel. For scientific works with loads of tables, 
footnotes, foreign citations, and numerous cross references, it can take 
several days, but such tagging will increasingly be required to be able 
to handle such works at all.

The learning curve for basic TEI is not too steep, and it can be learned as 
easily as HTML, in a few hours. Then, as you encounter more difficult 
constructs, you can gradually absorb more of the stuff. For books 
requiring special things, we will probably end up having specialists.

Important in this stage is that we will have tools available such that 
people can easily validate what they are doing.
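
As a very small example of such a tool, the Python sketch below checks
only that a file is well-formed XML and reports the parser's message
if it is not. Checking against the TEI DTD needs a validating parser
and is not shown; the function name well_formedness_error is
hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if the text parses as XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None
```

The message includes the line and column of the first problem, which
is usually enough for a volunteer to find a missing end tag.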

Jeroen.

From hyphen at hyphenologist.co.uk  Sun Oct 31 08:19:21 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sun Oct 31 08:19:38 2004
Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise)
In-Reply-To: <418508DC.4080109@bohol.ph>
References: <1b9.54e4965.2eb607e4@aol.com> <4184D31F.4030204@perathoner.de>
	<418508DC.4080109@bohol.ph>
Message-ID: <hf3ao0p6si6f22ch6mequfdkiqquvi9ien@4ax.com>

On Sun, 31 Oct 2004 16:46:36 +0100,  Jeroen Hellingman <jeroen@bohol.ph>
wrote:

| Marcello Perathoner wrote:
| 
| > Bowerbird@aol.com wrote:
| >
| >>>  Could someone please explain the benefit of semantic tagging  and 
| >>> why it won't horribly lengthen the amount of time required  to 
| >>> produce an eBook?
| >>
| 
| You'll need about one hour to add very basic level TEI tagging to a 
| simple work, such as a novel. For scientific works with loads of tables, 
| footnotes, foreign citations, and numerous cross references, it can take 
| several days, but they will be increasingly required to be able to 
| handle such works at all.
| 
| The learning curve for basic TEI is not too steep, and it can be learned as 
| easily as HTML, in a few hours. Then, as you encounter more difficult 
| constructs, you can gradually absorb more of the stuff. For books 
| requiring special things, we will probably end up having specialists.
| 
| Important in this stage is that we will have tools available such that 
| people can easily validate what they are doing.

Last time I marked up a text by hand was, Hmmmm 19 years ago using nroff,
and a great pain in the **** it was.   Then someone invented WYSIWYG (What
You See is What You Get), and producing properly laid out text became a
doddle.

Surely you are not suggesting that we go back to the Dark Ages, nay, Neolithic
times?

Which Windoze WYSIWYG application produces TEI tagging, to whatever
standard PG proposes?

-- 
Dave F


From jeroen at bohol.ph  Sun Oct 31 09:08:31 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Sun Oct 31 09:08:00 2004
Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise)
In-Reply-To: <hf3ao0p6si6f22ch6mequfdkiqquvi9ien@4ax.com>
References: <1b9.54e4965.2eb607e4@aol.com>
	<4184D31F.4030204@perathoner.de><418508DC.4080109@bohol.ph>
	<hf3ao0p6si6f22ch6mequfdkiqquvi9ien@4ax.com>
Message-ID: <41851C0F.1020203@bohol.ph>

Dave Fawthrop wrote:

>
>Which Windoze WYSIWYG application produces TEI tagging, to whatever
>standard PG proposes?
>
>  
>
OpenOffice can do so with some customization...

Jeroen.

From jmdyck at ibiblio.org  Sun Oct 31 18:18:24 2004
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Sun Oct 31 18:18:41 2004
Subject: [gutvol-d] test #5; please ignore
Message-ID: <41859CF0.9667CAA5@ibiblio.org>

 
From gbnewby at pglaf.org  Sun Oct 31 20:31:33 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Oct 31 20:31:34 2004
Subject: [gutvol-d] pglaf.org settings might block some messages
Message-ID: <20041101043133.GA26281@pglaf.org>

I've heard in the past 2 weeks from two people who discovered that
their mail server settings resulted in pglaf.org bouncing their
messages to gutvol-d@lists.pglaf.org (not necessarily through fault of
their own).

If you've sent messages to pglaf.org, but are not sure if they got
through, you might want to send a test message or check the list
archives (http://lists.pglaf.org).  You can set your own list settings
to get your own messages, and/or an acknowledgement message.

The setting we use is to enforce the standard that mail servers (aka
MTAs) must have reverse DNS entries (also known as PTRs).  Several of
the big ISPs, such as AOL, enforce this rule.  Others don't.  Mail
from addresses with no corresponding PTR is (at pglaf.org) well over
99.9% likely to be spam.  Over 10,000 such messages are blocked
per week on the pglaf.org server using this single rule.
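
For anyone who wants to check their own mail server's address, a PTR
lookup can be sketched in Python as below. This only illustrates the
rule; it is not the postfix configuration used at pglaf.org. The
function name has_reverse_dns and its resolver parameter are
hypothetical; socket.gethostbyaddr is the standard library call that
performs the reverse lookup.

```python
import socket


def has_reverse_dns(ip_address, resolver=socket.gethostbyaddr):
    """Return True if a reverse (PTR) lookup for the address succeeds."""
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except OSError:
        # Lookup failures are reported as subclasses of OSError.
        return False
    return bool(hostname)
```

The resolver argument exists only so the check can be tried without
network access; in normal use it is left at its default.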

The postfix server at pglaf.org has had this setting since the spring,
and while I'm sure it has bounced some legitimate messages, it's only
recently that anyone has brought any false positives to my attention.
  -- Greg


From brad at chenla.org  Sun Oct 31 21:43:41 2004
From: brad at chenla.org (Brad Collins)
Date: Sun Oct 31 21:45:32 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
Message-ID: <wk1xfeunsi.fsf@chenla.org>


I took a look at the source for the recent handsome re-release of
PG's edition of A Christmas Carol (46-h).

The code is a bit old: <p> tags are not terminated, and the formatting
could be improved to make it more readable.

For example, the first paragraph looked like this:

<p>
<span class="caps">Marley</span> was dead: to begin with. There is no
doubt whatever about that. The register of his burial was signed by
the clergyman, the clerk, the undertaker, and the chief
mourner. Scrooge signed it: and Scrooge&#8217;s name was good upon
&#8217;Change, for anything he chose to put his hand to. Old Marley
was as dead as a door-nail.

I ran the file through HTML-Tidy which turned it into this:

    <p><span class="caps">Marley</span> was dead: to begin with.
    There is no doubt whatever about that. The register of his burial
    was signed by the clergyman, the clerk, the undertaker, and the
    chief mourner. Scrooge signed it: and Scrooge's name was good
    upon 'Change, for anything he chose to put his hand to. Old
    Marley was as dead as a door-nail.</p>

It took about ten seconds to open the file, run it through tidy
and save it.  This resulted in a file which is consistent, standards
compliant and far easier to read and process.

Unclosed tags in HTML are an artifact of SGML; they can confuse some
browsers and processing software, and limit what you can do with CSS.
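
A rough way to see how many paragraphs in a file are left open, without
modifying anything, is sketched below using only Python's standard
html.parser module. The names ParagraphTagCounter and
unclosed_paragraphs are invented for this example; it only counts tags
and is not a validator.

```python
from html.parser import HTMLParser


class ParagraphTagCounter(HTMLParser):
    """Count <p> start tags and </p> end tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.closed += 1


def unclosed_paragraphs(html_text):
    """Return how many <p> tags have no explicit </p>."""
    counter = ParagraphTagCounter()
    counter.feed(html_text)
    counter.close()
    return counter.opened - counter.closed
```

A result of zero means every paragraph is explicitly closed; anything
above zero is the number of paragraphs relying on implied end tags.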

I suggest that all PG html files be run through Tidy before being
released.

If anyone wants the tidy'd version let me know.

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand

From gbnewby at pglaf.org  Sun Oct 31 22:27:04 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Oct 31 22:27:06 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
In-Reply-To: <wk1xfeunsi.fsf@chenla.org>
References: <wk1xfeunsi.fsf@chenla.org>
Message-ID: <20041101062704.GA28837@pglaf.org>

On Mon, Nov 01, 2004 at 12:43:41PM +0700, Brad Collins wrote:
> 
> I took a look at the source for the recent handsome re-release of
> PG's edition of A Christmas Carol (46-h).
> 
> The code is a bit old: <p> tags are not terminated, and the formatting
> could be improved a bit to make it more readable.
...

Strangely, this title doesn't have the usual filename mask
in GUTINDEX.ALL.  I'm cc'ing George to see about adding this.

The answer, as you saw, is that the file is old and therefore
predates our current procedures.  Lacking a /p doesn't prevent
a file from passing the validator at w3c, except for the
most recent HTML versions, so this file could probably still
pass today.

Anyway: cleaning up HTML is definitely welcome.  When we update
a file, these days, we also move it into the new directory
structure (the post-10K naming scheme), so this would be /4/46/46h.htm
rather than /etext91/xmas10h.htm or whatever.  

We also add a new header, and apply it to all other files for
this eBook.  In short, it's more involved than just fixing the
file.

David Widger has updated hundreds of titles, and we would welcome
anyone else who wants to work on this task.  Personally, I would
not mind waiting until we also have good XML procedures in place,
so that we could kill two birds with one stone (actually, more than
one stone, since it's more work).

Finally, let me mention that we usually also run gutcheck and
find/fix many other errors in a typical older title.

I hope this helps explain.  I didn't mention any limitations
of Tidy, but of course, as with any tool, you need to make sure it
doesn't accidentally do more harm than good.

Really finally: send updated files (or URLs) to
errata AT pglaf.org , even if you didn't do all of the
above.  Thanks!
  -- Greg

From sly at victoria.tc.ca  Sun Oct 31 22:33:34 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Oct 31 22:33:51 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
In-Reply-To: <wk1xfeunsi.fsf@chenla.org>
References: <wk1xfeunsi.fsf@chenla.org>
Message-ID: <Pine.GSO.4.58.0410312219110.4470@vtn1.victoria.tc.ca>


A possible argument against using tidy as you mention is that
it can have side effects the user does not intend.

In the example you gave below, it appears to have replaced the
numeric character entities the volunteer wanted to put in.
Also, in the tidy executable I have, results are not always
reliable in the contents of a "pre" tag; I have seen Tidy
remove blank lines from within them before.
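
One way to catch the first of these side effects is to compare the
numeric character references before and after any automated pass. The
following is a small sketch in Python; the function name
lost_numeric_references is invented for this example, and it only looks
at references of the form &#NNNN; or &#xHHHH;, not named entities.

```python
import re
from collections import Counter

# Matches decimal (&#8217;) and hexadecimal (&#x2019;) references.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def lost_numeric_references(before, after):
    """Return a Counter of references present in `before` but not `after`.

    Counter subtraction keeps only positive counts, so the result is
    empty when nothing was lost.
    """
    found_before = Counter(NUMERIC_REF.findall(before))
    found_after = Counter(NUMERIC_REF.findall(after))
    return found_before - found_after
```

Run on the two versions of the paragraph quoted below, a check like
this would report both &#8217; references as missing from the tidied
text.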

When I'm preparing an html and plain text file for PG, I
almost always do so in a way which has all the line endings
in the same place, which makes it much easier for anyone in
the future making corrections etc...

I use tidy to check html, but not to produce a final version.

I've just checked, and the file in question, while not
necessarily the way I would have marked it up, _is_ valid
HTML 4.01 Transitional, which matches what is required
to add it to PG.

Andrew

On Mon, 1 Nov 2004, Brad Collins wrote:

>
> I took a look at the source for the recent handsome re-release of
> PG's edition of A Christmas Carol (46-h).
>
> The code is a bit old: <p> tags are not terminated, and the formatting
> could be improved a bit to make it more readable.
>
> For example, the first paragraph looked like this:
>
> <p>
> <span class="caps">Marley</span> was dead: to begin with. There is no
> doubt whatever about that. The register of his burial was signed by
> the clergyman, the clerk, the undertaker, and the chief
> mourner. Scrooge signed it: and Scrooge&#8217;s name was good upon
> &#8217;Change, for anything he chose to put his hand to. Old Marley
> was as dead as a door-nail.
>
> I ran the file through HTML-Tidy which turned it into this:
>
>     <p><span class="caps">Marley</span> was dead: to begin with.
>     There is no doubt whatever about that. The register of his burial
>     was signed by the clergyman, the clerk, the undertaker, and the
>     chief mourner. Scrooge signed it: and Scrooge's name was good
>     upon 'Change, for anything he chose to put his hand to. Old
>     Marley was as dead as a door-nail.</p>
>
> It took about ten seconds to open the file, run it through tidy
> and save it.  This resulted in a file which is consistent, standards
> compliant and far easier to read and process.
>
> Unclosed tags in HTML are an artifact of SGML; they can confuse some
> browsers and processing software, and limit what you can do with CSS.
>
> I suggest that all PG html files be run through Tidy before being
> released.
>
From jtinsley at pobox.com  Sun Oct 31 22:39:23 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Sun Oct 31 22:39:41 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
In-Reply-To: <Pine.GSO.4.58.0410312219110.4470@vtn1.victoria.tc.ca>
References: <wk1xfeunsi.fsf@chenla.org>
	<Pine.GSO.4.58.0410312219110.4470@vtn1.victoria.tc.ca>
Message-ID: <20041101063923.GD6833@panix.com>

On Sun, Oct 31, 2004 at 10:33:34PM -0800, Andrew Sly wrote:
>
>I use tidy to check html, but not to produce a final version.
>

Me too.

jim

From sly at victoria.tc.ca  Sun Oct 31 22:47:31 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Oct 31 22:47:49 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
In-Reply-To: <20041101062704.GA28837@pglaf.org>
References: <wk1xfeunsi.fsf@chenla.org> <20041101062704.GA28837@pglaf.org>
Message-ID: <Pine.GSO.4.58.0410312236280.7942@vtn1.victoria.tc.ca>


Just to clear up any confusion....

Our text of A Christmas Carol has already been moved into the
new directory structure. Its "base directory" can be found at:
http://www.gutenberg.org/dirs/4/46/
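
As a sketch of how such a base directory might be derived from an eBook
number: the only case confirmed in this thread is 46 -> 4/46. The
general rule below (every digit except the last becomes one directory
level, and single-digit numbers sit under "0") is an assumption, and
the function name base_directory is made up for this example.

```python
def base_directory(ebook_number):
    """Return the assumed base directory for an eBook number.

    Only 46 -> "4/46" is confirmed above; the handling of longer and
    single-digit numbers is an assumption about the naming scheme.
    """
    digits = str(ebook_number)
    # Assumption: each digit except the last becomes one directory
    # level, and single-digit numbers live under "0".
    levels = list(digits[:-1]) or ["0"]
    return "/".join(levels + [digits])
```

For example, base_directory(46) gives "4/46", which matches the URL
above once it is appended to http://www.gutenberg.org/dirs/.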

When a file is reposted, the Gutindex line is modified, so as to
no longer include the filename mask (which no longer serves a
purpose in finding the files).

Also, note that not all files that are fixed up are
automatically put into the new directory structure.

For instance, our text of "The Count of Monte Cristo" was updated
in the past day, and is still in its old location:
http://www.gutenberg.org/etext/1184


Andrew


From jtinsley at pobox.com  Sun Oct 31 22:49:27 2004
From: jtinsley at pobox.com (Jim Tinsley)
Date: Sun Oct 31 22:49:46 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
In-Reply-To: <20041101062704.GA28837@pglaf.org>
References: <wk1xfeunsi.fsf@chenla.org> <20041101062704.GA28837@pglaf.org>
Message-ID: <20041101064927.GE6833@panix.com>

On Sun, Oct 31, 2004 at 10:27:04PM -0800, Greg Newby wrote:
>
>Strangely, this title doesn't have the usual filename mask
>in GUTINDEX.ALL.  I'm cc'ing George to see about adding this.
>

The file has been reposted into the new filesystem, and
its GUTINDEX entry is 

A Christmas Carol, A Ghost Story of Christmas, by Charles Dickens           46

which is correct.


>Really finally: send updated files (or URLs) to
>errata AT pglaf.org , even if you didn't do all of the
>above.  Thanks!

Please, please do not.

There are a few cases when it is better to send a whole
replacement file, but in my experience they amount to no
more than 1% of all cases. Mostly, sending a whole file
just causes a lot of unnecessary work and confusion.

And please don't send a replacement HTML just because you
like your HTML coded in a different style. The W3C sets quite
enough standards, thank you: if everyone starts insisting on
their own home-grown standards in addition, we'll never get 
anything useful done, even if open warfare does not break
out between the various DIY standards-setters!

And please, whatever you do, never send a file that has been
put through any kind of automatic converter, tidier, rewrapper,
or anything that programmatically alters the text. If you
really feel that something like that needs to be used, then you
really need to re-proof the whole thing.
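
If a tool has been run anyway, a check along the following lines can at
least flag whether the readable text changed. This is a sketch in
Python using the standard html.parser module; the names TextExtractor,
visible_text and text_unchanged are invented for this example. It
ignores differences in line wrapping and markup, it does not skip
script or style content, and it is no substitute for re-proofing.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, with character references decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html_text):
    """Return the document text with all whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def text_unchanged(before, after):
    """True if both documents carry the same text, ignoring rewrapping."""
    return visible_text(before) == visible_text(after)
```

A False result means the tool changed characters in the text itself,
for instance by turning a curly apostrophe reference into a plain one.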

jim

From gbnewby at pglaf.org  Sun Oct 31 22:52:14 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Oct 31 22:52:16 2004
Subject: [gutvol-d] PG Policy for Releasing HTML?
In-Reply-To: <Pine.GSO.4.58.0410312236280.7942@vtn1.victoria.tc.ca>
References: <wk1xfeunsi.fsf@chenla.org> <20041101062704.GA28837@pglaf.org>
	<Pine.GSO.4.58.0410312236280.7942@vtn1.victoria.tc.ca>
Message-ID: <20041101065214.GA29779@pglaf.org>

On Sun, Oct 31, 2004 at 10:47:31PM -0800, Andrew Sly wrote:
> 
> Just to clear up any confusion....
> 
> Our text of A Christmas Carol has already been moved into the
> new directory structure. Its "base directory" can be found at:
> http://www.gutenberg.org/dirs/4/46/

Duh!  Sorry...

It's probably time for me to stop typing now, and get some rest.
  -- Greg
From hyphen at hyphenologist.co.uk  Sun Oct 31 22:59:48 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sun Oct 31 23:00:20 2004
Subject: [gutvol-d] pglaf.org settings might block some messages
In-Reply-To: <20041101043133.GA26281@pglaf.org>
References: <20041101043133.GA26281@pglaf.org>
Message-ID: <lhnbo0959v8pntpal744qdm1gvk4v2bm92@4ax.com>

On Sun, 31 Oct 2004 20:31:33 -0800,  Greg Newby <gbnewby@pglaf.org> wrote:

| I've heard in the past 2 weeks from two people who discovered that
| their mail server settings resulted in pglaf.org bouncing their
| messages to gutvol-d@lists.pglaf.org (not necessarily through fault of
| their own).
| 
| If you've sent messages to pglaf.org, but are not sure if they got
| through, you might want to send a test message or check the list
| archives (http://lists.pglaf.org).  You can set your own list settings
| to get your own messages, and/or an acknowledgement message.
| 
| The setting we use is to enforce the standard that mail servers (aka
| MTAs) must have reverse DNS entries (also known as PTRs).  Several of
| the big ISPs, such as AOL, enforce this rule.  Others don't.  Mail
| from addresses with no corresponding PTR is (at pglaf.org) well over
| 99.9% likely to be spam.  Over 10,000 such messages are blocked
| per week on the pglaf.org server using this single rule.
| 
| The postfix server at pglaf.org has had this setting since the spring,
| and while I'm sure it has bounced some legitimate messages, it's only
| recently that anyone has brought any false positives to my attention.
|   -- Greg

I am blocked, and I use BTConnect; BT is one of the largest ISPs in the UK.

-- 
Dave Fawthrop <hyphen hyphenologist co uk> 
<http://www.hyphenologist.co.uk>
8 Cooper Grove, Shelf, Halifax, HX3 7RF, UK, 
Tel/F/A +44(0)1274 691092. H 01274 677161 M: +44(0)7720455248 


From gbnewby at pglaf.org  Sun Oct 31 23:10:16 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Oct 31 23:10:17 2004
Subject: [gutvol-d] pglaf.org settings might block some messages
In-Reply-To: <lhnbo0959v8pntpal744qdm1gvk4v2bm92@4ax.com>
References: <20041101043133.GA26281@pglaf.org>
	<lhnbo0959v8pntpal744qdm1gvk4v2bm92@4ax.com>
Message-ID: <20041101071016.GA30421@pglaf.org>

On Mon, Nov 01, 2004 at 06:59:48AM +0000, Dave Fawthrop wrote:
> I am blocked, and I use BTConnect; BT is one of the largest ISPs in the UK.

You don't seem to be blocked.  Your message arrived.  Do you have
bounced messages?  If so, please send them and I'll try to diagnose.
  -- gbn