From shimmin at uiuc.edu  Mon Aug  1 07:15:44 2005
From: shimmin at uiuc.edu (Robert Shimmin)
Date: Mon Aug  1 07:15:50 2005
Subject: [gutvol-d] Rule 6 Question
In-Reply-To: <Pine.LNX.4.44.0507311932420.7352-100000@durendal.durendal.org>
References: <Pine.LNX.4.44.0507311932420.7352-100000@durendal.durendal.org>
Message-ID: <42EE2E90.4030404@uiuc.edu>

Greg Weeks wrote:
> Is it possible to do a rule 6 clearance on a book where the author isn't a
> US national? Didn't non-US copyright holders only get the extension during
> the 1924-1964 block when they either renewed or filed for a "Notice of
> Intent to Enforce (NIE) a Restored Copyright"? Or is it just much more
> complicated than that?

This is my understanding, but I am not a lawyer:

Copyrights on works of foreign nationals that were still under copyright 
in the creator's home country, and had fallen out of copyright in the 
U.S. due to failure to observe U.S. copyright formalities, were 
automatically restored by the Uruguay Round Agreements Act on January 1, 
1996.

However, in order to enforce the restored copyright against a party that 
had been utilizing the work's public domain status prior to the passage 
of the URAA (December 8, 1994), the rights holder must have filed a 
Notice of Intent to Enforce.

It is my understanding that filing an NIE is not necessary to enforce a 
copyright against an infringer whose infringement began after December 
8, 1994, and so PG cannot usefully exploit this loophole to create new 
editions of works whose rights holders did not file an NIE.

-- RS

From greg at durendal.org  Mon Aug  1 08:01:21 2005
From: greg at durendal.org (Greg Weeks)
Date: Mon Aug  1 08:01:28 2005
Subject: [gutvol-d] Rule 6 Question
In-Reply-To: <42EE2E90.4030404@uiuc.edu>
Message-ID: <Pine.LNX.4.44.0508011100320.10325-100000@durendal.durendal.org>

On Mon, 1 Aug 2005, Robert Shimmin wrote:

> Greg Weeks wrote:
> > Is it possible to do a rule 6 clearance on a book where the author isn't a
> > US national? Didn't non-US copyright holders only get the extension during
> > the 1924-1964 block when they either renewed or filed for a "Notice of
> > Intent to Enforce (NIE) a Restored Copyright"? Or is it just much more
> > complicated than that?
>
> This is my understanding, but I am not a lawyer:
>
> Copyrights on works of foreign nationals that were still under copyright
> in the creator's home country, and had fallen out of copyright in the
> U.S. due to failure to observe U.S. copyright formalities, were
> automatically restored by the Uruguay Round Agreements Act on January 1,
> 1996.
>
> However, in order to enforce the restored copyright against a party that
> had been utilizing the work's public domain status prior to the passage
> of the URAA (December 8, 1994), the rights holder must have filed a
> Notice of Intent to Enforce.
>
> It is my understanding that filing an NIE is not necessary to enforce a
> copyright against an infringer whose infringement began after December
> 8, 1994, and so PG cannot usefully exploit this loophole to create new
> editions of works whose rights holders did not file an NIE.

That makes sense. At least as much sense as the rest of the US copyright
nonsense.

Thanks.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From ehage at hot.rr.com  Mon Aug  1 05:57:37 2005
From: ehage at hot.rr.com (Ellen V. Hage)
Date: Mon Aug  1 08:26:47 2005
Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey
Message-ID: <200508011357.j71DvwgJ022097@ms-smtp-04.texas.rr.com>

Hello,

 
My name is Ellen V. Hage and I am doing my dissertation on e-book technology
usage and self-efficacy.  I am short of my target number of participants.
Hopefully you all can help me out.  The survey has only 28 questions and
should take no more than 5 minutes to complete. The survey is a blind survey
and doesn't collect any confidential information.  The URL is: 

 
http://www.zoomerang.com/survey.zgi?p=WEB224GUJSUZZE

 
Thanks, again,

 
Ellen V. Hage

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050801/4dc47c7c/attachment.html
From hart at pglaf.org  Mon Aug  1 10:12:21 2005
From: hart at pglaf.org (Michael Hart)
Date: Mon Aug  1 10:12:24 2005
Subject: [gutvol-d] Rule 6 Question
In-Reply-To: <Pine.LNX.4.44.0508011100320.10325-100000@durendal.durendal.org>
References: <Pine.LNX.4.44.0508011100320.10325-100000@durendal.durendal.org>
Message-ID: <Pine.LNX.4.60.0508011011070.29909@pglaf.org>


BTW, does anyone know about Japanese copyrights from before WWII???

We have a volunteer with some pre-war Japanese materials.

I would usually use the 1923 cutoff.

Thanks!

Michael
From prosfilaes at gmail.com  Mon Aug  1 12:18:02 2005
From: prosfilaes at gmail.com (David Starner)
Date: Mon Aug  1 12:18:10 2005
Subject: [gutvol-d] Rule 6 Question
In-Reply-To: <Pine.LNX.4.60.0508011011070.29909@pglaf.org>
References: <Pine.LNX.4.44.0508011100320.10325-100000@durendal.durendal.org>
	<Pine.LNX.4.60.0508011011070.29909@pglaf.org>
Message-ID: <6d99d1fd050801121831fc8f55@mail.gmail.com>

On 8/1/05, Michael Hart <hart@pglaf.org> wrote:
> 
> BTW, does anyone know about Japanese copyrights from before WWII???
> 
> We have a volunteer with some pre-war Japanese materials.
> 
> I would usually use the 1923 cutoff.

It'd be Life+50 in Japan, so for a Rule 6 clearance in the US the author
would have to have died before 1946 and the work must not have been
renewed in the US, if I understand correctly.
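A minimal sketch of the date arithmetic above, under simplifying assumptions (IANAL): it assumes the Japanese term runs to the end of the 50th calendar year after the author's death, and it checks only the home-country condition, ignoring wartime extensions, US renewal records and the pre-1923 cutoff. The helper names are hypothetical. It shows why "died before 1946" is the line: such a term ended on or before December 31, 1995, one day before the January 1, 1996 restoration date.

```python
from datetime import date

# Restoration date under the Uruguay Round Agreements Act (URAA).
URAA_RESTORATION_DATE = date(1996, 1, 1)


def source_term_expiry(death_year: int, term_years: int = 50) -> date:
    """Last day of protection in a life+N country (hypothetical helper).

    Assumes the term runs to the end of the calendar year in which the
    N-th anniversary of the author's death falls.
    """
    return date(death_year + term_years, 12, 31)


def still_protected_at_home(death_year: int, term_years: int = 50) -> bool:
    """True if the home-country term was still running on the restoration date."""
    return source_term_expiry(death_year, term_years) >= URAA_RESTORATION_DATE


for year in (1944, 1945, 1946):
    print(year, still_protected_at_home(year))
# prints: 1944 False, 1945 False, 1946 True (one per line)
```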

What about foreign nationals who became American citizens? The book
I'm looking at is Nabokov's Russian translation of Alice in Wonderland,
which he published in 1923 in Berlin, and then he became an American
citizen in 1945.
From collin at xs4all.nl  Mon Aug  1 12:52:50 2005
From: collin at xs4all.nl (Branko Collin)
Date: Mon Aug  1 12:37:42 2005
Subject: [gutvol-d] Rule 6 Question
In-Reply-To: <Pine.LNX.4.60.0508011011070.29909@pglaf.org>
References: <Pine.LNX.4.44.0508011100320.10325-100000@durendal.durendal.org>
Message-ID: <42EE99B2.32640.17423734@localhost>

On 1 Aug 2005, at 10:12, Michael Hart wrote:

> BTW, does anyone know about Japanese copyrights from before WWII???
> 
> We have a volunteer with some pre-war Japanese materials.
> 
> I would usually use the 1923 cutoff.

Why not this time? Is this for publication in the US? If so, pre-1923 
should be fine (IANAL). If not, I suggest the volunteer talk to the 
Aozora Bunko people. 

The Wikipedia article on copyright in Japan mentions it as a Life+50 
country, but also mentions that it has some war time exceptions.


-- 
branko collin
collin@xs4all.nl
From Gutenberg9443 at aol.com  Fri Aug  5 16:22:19 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Aug  5 16:22:44 2005
Subject: [gutvol-d] In-Reply-to
Message-ID: <fd.190af9eb.30254eab@aol.com>

 
In a message dated 7/29/2005 2:14:54 AM Mountain Daylight Time,  
Bowerbird@aol.com writes:

> but  nonetheless, i don't see an in-reply-to header
> on your post either, so yes,  the problem _is_ there...


Then the problem is at your end, not at mine, because I have a reply-to 
header.
 

Anne

Do you like to breathe?
Then save the trees! 
Begin a personal relationship
with an ebook 
TODAY!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050805/5f949c83/attachment.html
From j.hagerson at comcast.net  Sat Aug  6 02:40:53 2005
From: j.hagerson at comcast.net (John Hagerson)
Date: Sat Aug  6 02:41:23 2005
Subject: [gutvol-d] E-books on iPods?
Message-ID: <00de01c59a6a$f3393270$0300a8c0@sarek>

I received a question about loading the MP3 version of our e-books into an
iPod. I may be the only person on the planet who does not own an iPod, but I
have no experience doing this.

Has anyone successfully loaded our e-books into an iPod? If so, could you
please provide me with a procedure that I can pass along?

Thank you.


From collin at xs4all.nl  Sat Aug  6 03:50:33 2005
From: collin at xs4all.nl (Branko Collin)
Date: Sat Aug  6 03:35:49 2005
Subject: [gutvol-d] E-books on iPods?
In-Reply-To: <00de01c59a6a$f3393270$0300a8c0@sarek>
Message-ID: <42F4B219.1734.8CFD702@localhost>


On 6 Aug 2005, at 4:40, John Hagerson wrote:

> I received a question about loading the MP3 version of our e-books
> into an iPod. I may be the only person on the planet who does not own
> an iPod, but I have no experience doing this.
> 
> Has anyone successfully loaded our e-books into an iPod? If so, could
> you please provide me with a procedure that I can pass along?

I do not own an iPod, but can tell you this: an MP3 e-book is an MP3 
file, just like all the music MP3s out there. Your friend should 
upload the ebook the same way s/he uploads music MP3s. Surely there 
must be something in either the iPod or the iTunes manual about this 
procedure? This is basic functionality. 

-- 
branko collin
collin@xs4all.nl
From Bowerbird at aol.com  Sat Aug  6 06:00:43 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug  6 06:00:50 2005
Subject: [gutvol-d] In-Reply-to
Message-ID: <86.2d93f1b7.30260e7b@aol.com>

anne said:
>    Then the problem is at your end, not at mine, 
>    because I have a reply to header.

oh please, anne.   wake up.

of course our posts have a reply-to header.

what they do not have is an in-reply-to header.

pay attention, dear.   you could start by
noticing the subject-header of this thread...
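The difference between the two headers can be shown with Python's standard `email` package. A minimal sketch; the sample message and its addresses are made up for illustration, and `threading_headers` is a hypothetical helper. Reply-To says where a reply should be sent, while In-Reply-To carries the Message-ID of the message being answered, which is what threading archives rely on. Compare the messages in this archive that show an In-Reply-To line with those that do not.

```python
from email import message_from_string

# Made-up message for illustration only.
RAW = """\
From: someone@example.org
Reply-To: list@example.org
Subject: Re: [gutvol-d] In-Reply-to
Message-ID: <reply-2@example.org>
In-Reply-To: <original-1@example.org>

Body text.
"""


def threading_headers(raw_message: str) -> dict:
    """Return the two headers that are easy to confuse."""
    msg = message_from_string(raw_message)
    return {
        # Reply-To: the address a reply should be sent to.
        "reply_to": msg.get("Reply-To"),
        # In-Reply-To: Message-ID of the message being answered;
        # None when the header is absent.
        "in_reply_to": msg.get("In-Reply-To"),
    }


print(threading_headers(RAW))
```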

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050806/63b4ebeb/attachment.html
From collin at xs4all.nl  Sun Aug  7 07:24:16 2005
From: collin at xs4all.nl (Branko Collin)
Date: Sun Aug  7 07:46:15 2005
Subject: [gutvol-d] newsletter?
Message-ID: <42F635B0.16647.76987F@localhost>


When I tried to look up the newsletter on the website, I noticed the 
most recent copy was from three months ago. Are the newsletters no 
longer published at www.gutenberg.org, or did I experience some weird 
caching problem?


-- 
branko collin
collin@xs4all.nl
From cannona at fireantproductions.com  Sun Aug  7 13:12:46 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Aug  7 13:19:30 2005
Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey
Message-ID: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't think this went through the first time, so here goes again.  Sorry
if it's a duplicate.

If you haven't gotten around to completing this survey, let me encourage
you to do so.  Ellen is not just someone who dropped in here to get her
survey filled out.  She has been a volunteer for a while and helped out
sending CDs and DVDs.  Also, it doesn't take long at all.

Sincerely
Aaron Cannon


At 07:57 AM 8/1/2005, you wrote:
>Hello,
>
>My name is Ellen V. Hage and I am doing my dissertation on e-book
>technology usage and self-efficacy.  I am short of my target number of
>participants.  Hopefully you all can help me out.  The survey has only 28
>questions and should take no more than 5 minutes to complete. The survey
>is a blind survey and doesn't collect any confidential information.  The
>URL is:
>
>http://www.zoomerang.com/survey.zgi?p=WEB224GUJSUZZE
>
>Thanks, again,
>
>Ellen V. Hage
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFC9mzMI7J99hVZuJcRAsGOAKCvh7h/LcfCOTzg4U+PXXA5mnEPzwCfUWBE
n6lOkWq+NGFLZS9JVR0Npg8=
=4m+Y
-----END PGP SIGNATURE-----

From gbnewby at pglaf.org  Sun Aug  7 14:06:19 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Aug  7 14:06:20 2005
Subject: [gutvol-d] newsletter?
In-Reply-To: <42F635B0.16647.76987F@localhost>
References: <42F635B0.16647.76987F@localhost>
Message-ID: <20050807210619.GB18600@pglaf.org>

On Sun, Aug 07, 2005 at 04:24:16PM +0200, Branko Collin wrote:
> 
> When I tried to look up the newsletter on the website, I noticed the 
> most recent copy was from three months ago. Are the newsletters no 
> longer published at www.gutenberg.org, or did I experience some weird 
> caching problem?

Hi, Branko.  It was great seeing you at WTH!

This is a task that was done by hand.  We had a
semi-automated method, but it didn't survive our
many mailing list and Web site changes.  The
fellow who had been doing it recently resigned
(about 3 weeks ago).

It's pretty easy to maintain the archive.  If
anyone is interested in doing it, let me know.  

Maybe we should just link to the pipermail archives,
or mirror them.  These are publicly available:
	http://lists.pglaf.org/pipermail/gweekly/

  -- Greg
From marcello at perathoner.de  Sun Aug  7 14:34:55 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug  7 14:41:50 2005
Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey
In-Reply-To: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com>
References: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com>
Message-ID: <42F67E7F.6030807@perathoner.de>

Aaron Cannon wrote:

> If you haven't gotten around to completing this survey, let me encourage
> you to do so.  Ellen is not just someone who dropped in here to get her
> survey filled out.  She has been a volunteer for a while and helped out
> sending CDs and DVDs.  Also, it doesn't take long at all.

I'm afraid that recruiting a significant portion of your respondents 
from gutvol-d will give you non-representative results.


Also, many questions are `ambiguous':

   "I feel confident about my ability to purchase e-books online."

I feel *very* confident about my ability to do so, but I would *never* 
do it. I would buy the paper edition, because I'm sure the paper book 
remains usable if I have to change my bookshelves. Also I can lend the 
paper book to my friends etc.


The survey should be separated into 2 parts. Part 1 about the acceptance 
of free ebooks, Part 2 about the acceptance of fettered ebooks / DRM / 
proprietary devices / proprietary formats etc.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From cannona at fireantproductions.com  Sun Aug  7 15:58:12 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Aug  7 16:00:28 2005
Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey
In-Reply-To: <42F67E7F.6030807@perathoner.de>
References: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com>
	<42F67E7F.6030807@perathoner.de>
Message-ID: <6.2.1.2.0.20050807175259.04155688@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 04:34 PM 8/7/2005, you wrote:

>I'm afraid that recruiting a significant portion of your respondents from
>gutvol-d will give you non-representative results.


So that we're all clear, I wasn't asked to vouch for her.  I simply thought
I'd step in and do so as she isn't that active on this list from what I can
tell, but she does help out with the DVDs.

Still, I agree with you that asking the volunteers who create Ebooks about
Ebooks will produce skewed results.  I would assume that this is not the
only place she has sought help, but can't say for sure.


Sincerely
Aaron Cannon


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFC9pKEI7J99hVZuJcRAk7jAJwKKtQC5r4MrIH/gyS+JmvuhkYzSACghUEM
ROwcwxrw4lqpSGhcsawMlrE=
=U15S
-----END PGP SIGNATURE-----

From collin at xs4all.nl  Sun Aug  7 17:22:42 2005
From: collin at xs4all.nl (Branko Collin)
Date: Sun Aug  7 17:07:24 2005
Subject: [gutvol-d] newsletter?
In-Reply-To: <20050807210619.GB18600@pglaf.org>
References: <42F635B0.16647.76987F@localhost>
Message-ID: <42F6C1F2.30393.29A8E81@localhost>

On 7 Aug 2005, at 14:06, Greg Newby wrote:
> On Sun, Aug 07, 2005 at 04:24:16PM +0200, Branko Collin wrote:
> > 
> > When I tried to look up the newsletter on the website, I noticed the
> > most recent copy was from three months ago. Are the newsletters no
> > longer published at www.gutenberg.org, or did I experience some
> > weird caching problem?
> 
> Hi, Branko.  It was great seeing you at WTH!

Likewise!
 
> Maybe we should just link to the pipermail archives,
> or mirror them.  These are publicly available:
> http://lists.pglaf.org/pipermail/gweekly/

This seems an adequate solution, and is probably easier to implement 
than building a new volunteer.

-- 
branko collin
collin@xs4all.nl
From Bowerbird at aol.com  Sun Aug  7 21:06:46 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug  7 21:12:14 2005
Subject: [gutvol-d] f.t.p. for banana-cream
Message-ID: <1e9.418f0a5c.30283456@aol.com>

greg, if you could arrange that f.t.p. access
for me when you get a chance, please...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050808/ec5ab0dc/attachment.html
From ag737 at freenet.carleton.ca  Thu Aug 11 12:04:18 2005
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Thu Aug 11 13:05:08 2005
Subject: [gutvol-d] Historical book publishing statistics
Message-ID: <7cd0b7f2e3.7f2e37cd0b@ncf.ca>


Does anyone have quick-and-dirty, and preferably annual, statistics on 
numbers of books published in the United States, the UK, and any other 
country, over time?


From hart at pglaf.org  Thu Aug 11 13:10:20 2005
From: hart at pglaf.org (Michael Hart)
Date: Thu Aug 11 13:10:21 2005
Subject: [gutvol-d] Historical book publishing statistics
In-Reply-To: <7cd0b7f2e3.7f2e37cd0b@ncf.ca>
References: <7cd0b7f2e3.7f2e37cd0b@ncf.ca>
Message-ID: <Pine.LNX.4.60.0508111309230.21762@pglaf.org>


On Thu, 11 Aug 2005, Wallace J.McLean wrote:

>
> Does anyone have quick-and-dirty, and preferably annual, statistics on
> numbers of books published in the United States, the UK, and any other
> country, over time?

You can get some of these via "The Bowker Annual" at most reference desks.

Michael

I have last year's, and the 1955, right here, if these are of interest.
From sly at victoria.tc.ca  Thu Aug 11 15:12:25 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Aug 11 15:19:13 2005
Subject: [gutvol-d] Historical book publishing statistics
In-Reply-To: <7cd0b7f2e3.7f2e37cd0b@ncf.ca>
References: <7cd0b7f2e3.7f2e37cd0b@ncf.ca>
Message-ID: <Pine.GSO.4.58.0508111509170.18524@vtn1.victoria.tc.ca>


In the library, I've seen large annual volumes of "New Published Books"
from the early 1900s, but I don't remember a summary of numbers in
them.

A quick google search led me to some pages like these, that
might help:

http://observer.guardian.co.uk/review/story/0,6903,1288046,00.html

http://mjroseblog.typepad.com/buzz_balls_hype/2005/06/is_yours_the_28.html

http://www.primezone.com/pub/headlines.mhtml?d=53251


Andrew

On Thu, 11 Aug 2005, Wallace J.McLean wrote:

>
> Does anyone have quick-and-dirty, and preferably annual, statistics on
> numbers of books published in the United States, the UK, and any other
> country, over time?
>
>
From marcello at perathoner.de  Thu Aug 11 15:31:05 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Aug 11 15:31:20 2005
Subject: [gutvol-d] Amazon abuse of PG trademark
Message-ID: <42FBD1A9.4090204@perathoner.de>

See:

   http://www.alexa.com/browse/general/?CategoryID=1219096

I'm referring to the sidebar that says: "Bestselling Products in Project 
Gutenberg". This gives the impression we are selling those books.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Thu Aug 11 21:55:52 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Aug 11 21:55:54 2005
Subject: [gutvol-d] Re: Problems with PG uploads
In-Reply-To: <200507272144.39040.donovan@abs.net>
References: <200507272127.58606.donovan@abs.net>
	<200507272144.39040.donovan@abs.net>
Message-ID: <20050812045552.GD1544@pglaf.org>

On Wed, Jul 27, 2005 at 09:44:38PM -0400, D Garcia wrote:
> On Wednesday 27 July 2005 09:27 pm, D Garcia wrote:
> > Quite a few of the DP folks are seeing the following error when uploading
> > files to PG for the WW Team:
> >
> > "Aborting: no user information!"
> 
> (Yes, I'm replying to myself ... :)
> 
> The problem appears to be with the 'cz' program, which was recently updated by 
> Jim Tinsley. I have sent him a note asking him to look into it. Apparently it 
> isn't able to determine what user it is running as on the server, and emits 
> the "Aborting: no user information!" error message and exits.

(Working through some older emails)

I'm just confirming that this was fixed -- there was a little
weirdness in our Apache configuration.  I tweaked the upload scripts,
and voila!
  -- Greg
From cweyant at twcny.rr.com  Fri Aug 12 04:58:07 2005
From: cweyant at twcny.rr.com (Curtis A. Weyant)
Date: Fri Aug 12 05:35:20 2005
Subject: [gutvol-d] E-books on iPods?
In-Reply-To: <42F4B219.1734.8CFD702@localhost>
References: <42F4B219.1734.8CFD702@localhost>
Message-ID: <42FC8ECF.7000505@twcny.rr.com>

There's a news item on the PG front page about converting plain-text files
for use with the iPod's Notes feature. Short story, go here:

http://www.ambience.sk/ipod-ebook-creator/ipod-book-notes-text-conversion.php
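For anyone who would rather script the conversion, the core step is cutting a plain-text e-book into pieces the Notes reader will accept. A minimal Python sketch, assuming the commonly cited limit of 4 KB per note (check Apple's Notes documentation for your model); `split_for_notes` is a hypothetical helper, not the converter linked above.

```python
# Assumption: the iPod Notes reader shows at most 4 KB of each note.
NOTE_LIMIT_BYTES = 4096


def _fit(piece: str, room: int) -> int:
    """Count how many leading characters of `piece` fit in `room` UTF-8 bytes."""
    total = 0
    for i, ch in enumerate(piece):
        total += len(ch.encode("utf-8"))
        if total > room:
            return i
    return len(piece)


def split_for_notes(text: str, limit: int = NOTE_LIMIT_BYTES) -> list[str]:
    """Split text into chunks of at most `limit` UTF-8 bytes.

    Chunks break at line boundaries where possible; a single line longer
    than `limit` is cut wherever it must be.
    """
    if limit < 4:  # one UTF-8 character can take up to 4 bytes
        raise ValueError("limit must be at least 4 bytes")
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        while line:
            room = limit - len(current.encode("utf-8"))
            n = _fit(line, room)
            if n == len(line):  # whole line fits in the current chunk
                current += line
                line = ""
            elif current:  # close the current chunk, then retry the line
                chunks.append(current)
                current = ""
            else:  # line alone is too long: hard cut
                chunks.append(line[:n])
                line = line[n:]
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks back together gives the original text, so nothing is lost in the split.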

Curtis.

P.S. I also do not have an iPod but am hoping to remedy that in the near 
future. :O)

Branko Collin wrote:
> On 6 Aug 2005, at 4:40, John Hagerson wrote:
> 
> 
>>I received a question about loading the MP3 version of our e-books
>>into an iPod. I may be the only person on the planet who does not own
>>an iPod, but I have no experience doing this.
>>
>>Has anyone successfully loaded our e-books into an iPod? If so, could
>>you please provide me with a procedure that I can pass along?
> 
> 
> I do not own an iPod, but can tell you this: an MP3 e-book is an MP3 
> file, just like all the music MP3s out there. Your friend should 
> upload the ebook the same way s/he uploads music MP3s. Surely there 
> must be something in either the iPod or the iTunes manual about this 
> procedure? This is basic functionality. 
> 

From collin at xs4all.nl  Fri Aug 12 07:55:28 2005
From: collin at xs4all.nl (Branko Collin)
Date: Fri Aug 12 07:39:57 2005
Subject: [gutvol-d] What the Hack?! and other conferences next week
In-Reply-To: <20050725025148.GA14434@pglaf.org>
References: <42E42405.32070.11B1840@localhost>
Message-ID: <42FCD480.15055.7322F5@localhost>


On 24 Jul 2005, at 19:51, Greg Newby wrote:
> On Sun, Jul 24, 2005 at 11:28:05PM +0200, Branko Collin wrote:
> 
> > http://www.whatthehack.org
> 
> One of my talks will be about Project Gutenberg, while
> the other is about information retrieval:
> 
> Saturday July 30
> "Literature wants to be free!" (Day 3, Tent 4, 1:00-2:00 pm)
> 
> Friday July 29
> "Search engine internal processes" (Day 2, Tent 2, 10:00 - 11:00 am)
> 
> These will be recorded, but not streamed live.

I have been trying to find these recordings, but the Hack-Tic hacker 
camps have a tradition of sporting the Worst Website Evar, and this 
one is no exception. In other words, I could not find them.

-- 
branko collin
collin@xs4all.nl
From curtzt at nuprometheus.com  Tue Aug 16 18:29:04 2005
From: curtzt at nuprometheus.com (Thad Curtz)
Date: Tue Aug 16 18:54:09 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <92d068b011c466997116e41ba04a2359@nuprometheus.com>

Hi. I'm a college lit teacher and have been thinking about doing 
footnotes and annotations of the sort most editions for college 
students supply for some PG classics, so my students could have the 
usual kinds of help reading them, and print them out, mark them up, 
bring them to class for discussion, etc. (I think the lack of notes, 
not the quality of the texts themselves, is currently the main barrier 
to more widespread use of PG texts in classes.) If I did this, I'd want 
to make the annotations available free for anybody else who wanted to 
use them for teaching (or just to read). Some form of structured markup 
that allowed people to reformat and print to different sizes and 
devices in the future would be nice, rather than pdfs...

I've looked at the archive, and haven't found any discussion of this 
topic; any suggestions or advice about such a project (or where to look 
next) would be appreciated.

Thanks,
Thad

  
From jtinsley at pobox.com  Tue Aug 16 19:31:06 2005
From: jtinsley at pobox.com (Jim Tinsley)
Date: Tue Aug 16 19:31:33 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
References: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
Message-ID: <20050817023106.GA6405@panix.com>

On Tue, Aug 16, 2005 at 06:29:04PM -0700, Thad Curtz wrote:
>Hi. I'm a college lit teacher and have been thinking about doing 
>footnotes and annotations of the sort most editions for college 
>students supply for some PG classics, so my students could have the 
>usual kinds of help reading them, and print them out, mark them up, 
>bring them to class for discussion, etc. (I think the lack of notes, 
>not the quality of the texts themselves, is currently the main barrier 
>to more widespread use of PG texts in classes.) If I did this, I'd want 
>to make the annotations available free for anybody else who wanted to 
>use them for teaching (or just to read). Some form of structured markup 
>that allowed people to reformat and print to different sizes and 
>devices in the future would be nice, rather than pdfs...
>
>I've looked at the archive, and haven't found any discussion of this 
>topic; any suggestions or advice about such a project (or where to look 
>next) would be appreciated.

It has been discussed a few times before, but perhaps
not on this list.

You are, of course, welcome to create annotated
versions for yourself, and to make them available through
your own website. That's great, and we hope it works well
for you.

However, we will not accept them back for posting into PG
as revised texts. If we did, we'd be inundated with Creationists
annotating Darwin, Darwinists annotating Genesis, and every
nutball who would be instantly removed from the lobby of any
publisher wanting to add their essays on The Meaning Of . . .

Believe me, they're out there. I've dealt with a few. It's
not a practical proposition. Once you allow one person to
annotate our texts, you have to give others an equal right,
and the whole thing devolves instantly.

As for formatting, while XML is theoretically ideal, there
are practical issues. Some members of this list are breaking 
ground in this area, and may be able to make suggestions. 
Nearly all current markup work uses HTML + CSS, which is
pretty flexible, and, as a practical matter, HTML is the 
Universal Input at the moment -- it can be immediately
converted to PDA formats, PDF and so on. If you choose your
conventions carefully, HTML or XHTML can be as well-structured
as you like.

jim

From Bowerbird at aol.com  Tue Aug 16 20:01:31 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Aug 16 20:01:52 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <15d.56e5af43.3034028b@aol.com>

thad said:
>    Hi. I'm a college lit teacher and 
>    have been thinking about doing footnotes and annotations 
>    of the sort most editions for college students supply 
>    for some PG classics, so my students could have the
>    usual kinds of help reading them, and print them out, 
>    mark them up, bring them to class for discussion, etc. 
>    (I think the lack of notes, not the quality of the texts 
>    themselves, is currently the main barrier to 
>    more widespread use of PG texts in classes.) 
>    If I did this, I'd want to make the annotations available 
>    free for anybody else who wanted to use them for teaching 
>    (or just to read). Some form of structured markup that 
>    allowed people to reformat and print to different sizes 
>    and devices in the future would be nice, rather than pdfs...

this is an entirely reasonable course of action.

what you need is a viewer-program that can
incorporate freestanding annotations into the
presentation of the text (which remains static).
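One way to picture such freestanding annotations is standoff markup: the notes live in a separate list that points into the unchanged text, and only the rendered copy shows them. A minimal Python sketch, assuming annotations are anchored by character offsets; `Annotation` and `render_with_notes` are hypothetical names, and this is not the viewer-program described here.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Character offset into the unchanged source text where the note applies.
    offset: int
    note: str


def render_with_notes(text: str, annotations: list[Annotation]) -> str:
    """Merge freestanding annotations into a plain-text presentation.

    The source text is never modified; markers like [1] appear only in
    the rendered copy, and the notes are listed at the end.
    """
    ordered = sorted(annotations, key=lambda a: a.offset)
    pieces, notes, cursor = [], [], 0
    for number, ann in enumerate(ordered, start=1):
        pieces.append(text[cursor:ann.offset])
        pieces.append(f"[{number}]")
        notes.append(f"[{number}] {ann.note}")
        cursor = ann.offset
    pieces.append(text[cursor:])
    return "".join(pieces) + "\n\nNOTES\n" + "\n".join(notes)


text = "Call me Ishmael. Some years ago"
print(render_with_notes(text, [Annotation(15, "The narrator's name.")]))
```

Because the notes are a separate list, several sets (a teacher's, a student's) could be kept side by side against the same text.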

i have written such a viewer-program, but have
not yet programmed the annotation capabilities.

if you'd be able to outline the ones you would like,
and be willing to test them once i programmed 'em,
i'd be happy to proceed that way...

we usually think of "annotations" as textual, but
they can actually manifest in a variety of ways,
such as margin highlights, graphics, movies, etc.
(see tk3 at nightkitchen.com for a program that
is adept at allowing multiple types of annotations.)

making your annotations widely and freely available,
such as putting them on your website for download,
is a generous thing to do...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050816/5e3080a9/attachment.html
From sly at victoria.tc.ca  Tue Aug 16 23:56:23 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Aug 16 23:56:43 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
References: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
Message-ID: <Pine.GSO.4.58.0508162355030.19442@vtn1.victoria.tc.ca>


On Tue, 16 Aug 2005, Thad Curtz wrote:

> Hi. I'm a college lit teacher and have been thinking about doing
> footnotes and annotations of the sort most editions for college
> students supply for some PG classics, so my students could have the
> usual kinds of help reading them, and print them out, mark them up,
> bring them to class for discussion, etc.


I hadn't thought about this before, but I can see the sense
in Jim's argument. On one hand, I think it's too bad if
PG misses out on a well-researched, comprehensive set of
annotations. On the other hand, I can see that in the
larger scheme of things it's probably better to stick with
our practise of only adding contemporary material if it has
already been published in "dead-tree" form.

Thad:
If you are looking for some other longer-term home for
this type of text, one possibility may be Wikibooks.

See:
http://en.wikibooks.org/wiki/Wikibooks:Annotated_texts


Thanks,
Andrew
From collin at xs4all.nl  Wed Aug 17 02:34:56 2005
From: collin at xs4all.nl (Branko Collin)
Date: Wed Aug 17 02:19:34 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
Message-ID: <430320E0.31217.11260C@localhost>


On 16 Aug 2005, at 18:29, Thad Curtz wrote:

> Hi. I'm a college lit teacher and have been thinking about doing
> footnotes and annotations of the sort most editions for college
> students supply for some PG classics, so my students could have the
> usual kinds of help reading them, and print them out, mark them up,
> bring them to class for discussion, etc. (I think the lack of notes,
> not the quality of the texts themselves, is currently the main barrier
> to more widespread use of PG texts in classes.) If I did this, I'd
> want to make the annotations available free for anybody else who
> wanted to use them for teaching (or just to read). Some form of
> structured markup that allowed people to reformat and print to
> different sizes and devices in the future would be nice, rather than
> pdfs...
> 
> I've looked at the archive, and haven't found any discussion of this
> topic; any suggestions or advice about such a project (or where to
> look next) would be appreciated.

I would assume this works the way the creation of any other 
educational material works: teachers sit down and write the stuff. At 
least, that is the impression I got from looking at text books and 
annotated editions. What did your fellow teachers suggest?

-- 
branko collin
collin@xs4all.nl
From Bowerbird at aol.com  Wed Aug 17 10:27:12 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 10:32:28 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1c3.2ee2dccc.3034cd70@aol.com>

branko said:
>    I would assume this works the way 
>    the creation of any other educational material works: 
>    teachers sit down and write the stuff.

but once you've done that,
you want your students to have
a tool that presents the annotations
alongside the text to which they refer.

preferably a tool that then lets _them_
edit them, or add their own annotations.

and finally, a tool that lets everyone _share_
annotations, perhaps even _pool_ all of them
-- within a single class, or across many years --
perhaps in such a manner that the annotations
themselves can become an independent e-book.

-bowerbird
From scott_bulkmail at productarchitect.com  Wed Aug 17 12:18:33 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Aug 17 12:29:11 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
References: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
Message-ID: <p06110405bf2937b00688@[192.168.0.52]>

Sounds like a wonderful project!  Even though PG can't host it, don't let that dampen your enthusiasm.  There are lots of other ways to make the results available for free.


>Some form of structured markup that allowed people to reformat and print to different sizes and devices in the future would be nice, rather than pdfs...

Agreed.  Here's the challenge: if you make annotations within the existing PG files, then it's difficult to just see the annotations.  If you keep the annotations separate, it's hard to see them in context.

The ideal solution would be a tiny bit of automation (perhaps created by a student if techie stuff isn't your thing).  Then you could keep the annotations separate, and just add small markers to the original text.  Simple scripts could do things like:
- format the annotations on their own
- insert the annotations into the text, preferably with appropriate HTML wrapper that lets readers show/hide using CSS (style sheets) or JavaScript.
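A minimal sketch of the first kind of script, assuming the annotations are kept separately as a Python dict keyed by marker id (the ids, sample notes, and page layout are hypothetical). It renders the notes as a standalone, printable HTML page, giving each item an 'id' so inline markers could link back to it:

```python
from html import escape

def format_annotations(title, annotations):
    """Render {marker_id: annotation} pairs as a standalone, printable HTML page."""
    items = "\n".join(
        f'<li id="{escape(note_id)}"><b>{escape(note_id)}</b>: {escape(text)}</li>'
        for note_id, text in annotations.items()
    )
    return (
        f"<html><head><title>Notes: {escape(title)}</title></head><body>\n"
        f"<ol>\n{items}\n</ol>\n</body></html>"
    )

# Hypothetical marker ids and annotation text, kept outside the book file.
notes = {
    "n1": "entailed: legally restricted to male heirs",
    "n2": "Netherfield & Longbourn are neighbouring estates",
}
print(format_annotations("Pride and Prejudice", notes))
```

Escaping every value keeps characters such as '&' from breaking the generated HTML.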

As others have noted, HTML or the newer XHTML is ideal here.  (If a specific "book" that you need doesn't exist in HTML, I'll bet some people here would help do at least a basic conversion.)


>I've looked at the archive, and haven't found any discussion of this topic; any suggestions or advice about such a project (or where to look next) would be appreciated.

If you want to take a more ambitious approach, review the list for discussions on "PGTEI", "TEI" and "XML".  But, effective use of these is likely to be more work.  (It's not overly difficult given the appropriate technical background, so it depends on what sort of resources you have available.)

Note for the record that highly-structured XML "originals" would let you "point" each annotation to the appropriate place in a document without altering the original (using XPATH).  That's great in theory, but is again probably too much tech work for your project.  (I'd be happy for an XPATH expert to show that I'm wrong; perhaps it's easier than I think to point to a specific location in a typical PG HTML file, presumably using Tidy and such to convert to XHTML.)
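For what it's worth, Python's standard xml.etree.ElementTree supports a limited subset of XPath, which is enough to sketch the idea for a well-formed XHTML file (full XPath 1.0 would need a third-party library such as lxml). The document, paths, and notes below are hypothetical; a typical PG HTML file would first have to be cleaned up into XHTML, e.g. with Tidy:

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}

# A tiny, hypothetical XHTML "original" that the annotations never modify.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <p>It is a truth universally acknowledged...</p>
  <p id="p2">However little known the feelings or views of such a man...</p>
</body></html>"""

# Each annotation stores a path into the book instead of a marker in it.
ANNOTATIONS = [
    (".//x:body/x:p[1]", "The novel's famous opening line."),     # by position
    (".//x:p[@id='p2']", "'such a man': the wealthy newcomer."),  # by id
    (".//x:p[@id='p9']", "Points at a paragraph that does not exist."),
]

def resolve(root, path):
    """Return the element an annotation points at, or None if the link is broken."""
    return root.find(path, NS)

root = ET.fromstring(DOC)
for path, note in ANNOTATIONS:
    target = resolve(root, path)
    where = "BROKEN" if target is None else target.text[:25]
    print(f"{where!r}: {note}")
```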
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From Bowerbird at aol.com  Wed Aug 17 12:59:49 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 13:00:06 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1c1.2ea2ca2c.3034f135@aol.com>

scott said:
>    The ideal solution would be a tiny bit of automation 
>    (perhaps created by a student if techie stuff isn't your thing).

"tiny" is a very misleading term, i think.
unless you can show me this "tiny" thing.


>    Then you could keep the annotations separate, 
>    and just add small markers to the original text.

um, keeping the annotations separate is a good idea.
but requiring "small markers" in the original text is not.
the text should remain unchanged, for many reasons.


>    Simple scripts could do things like:
>    - format the annotations on their own
>    - insert the annotations into the text, 
>    preferably with appropriate HTML wrapper that 
>    lets readers show/hide using CSS (style sheets) or JavaScript.

except what you have described is _far_ from "simple",
as well as i can tell.   do you have sample implementations?


>    As others have noted, HTML 
>    or the newer XHTML is ideal here.

"ideal"?   i think not.

indeed, to the direct contrary, i believe that heavy markup
makes on-the-fly adding of annotations _extremely_ difficult.

but, as i said, if you can show me some examples,
ones that make it as simple as you make it sound,
i am open to being convinced otherwise...

-bowerbird
From scott_bulkmail at productarchitect.com  Wed Aug 17 13:38:06 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Aug 17 13:44:16 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1c1.2ea2ca2c.3034f135@aol.com>
References: <1c1.2ea2ca2c.3034f135@aol.com>
Message-ID: <p06110402bf294c24fccc@[192.168.0.52]>

> >   The ideal solution would be a tiny bit of automation
> >   (perhaps created by a student if techie stuff isn't your thing). 
>
>"tiny" is a very misleading term, i think.
>unless you can show me this "tiny" thing.

All that's required is a tab-delimited file (or database or spreadsheet) with 2 columns: id, annotation.

Now, in your favorite scripting or programming language, iterate thru the file, read the id and annotation, then replace the former with the latter in the marked-up book.  (Including the appropriate (X)HTML wrapper, as noted.)

For someone who doesn't write scripts, it's not trivial.  For someone who does, it's a few lines of code.
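Those few lines might look like the following sketch. The '[[id]]' marker syntax and the 'note' class on the wrapper are assumptions for illustration, not an existing convention:

```python
import csv
import io
from html import escape

def load_annotations(tsv_file):
    """Read a two-column, tab-delimited file: id <TAB> annotation."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def merge(book_html, annotations):
    """Replace each hypothetical [[id]] marker with an (X)HTML wrapper holding the note."""
    for note_id, text in annotations.items():
        wrapper = (f'<span class="note" id="note-{escape(note_id)}">'
                   f"{escape(text)}</span>")
        book_html = book_html.replace(f"[[{note_id}]]", wrapper)
    return book_html

tsv = io.StringIO("n1\tAn entail limits who may inherit.\n"
                  "n2\tMr. Collins: the cousin who will inherit.\n")
book = "<p>The estate was entailed[[n1]] on Mr. Collins.[[n2]]</p>"
print(merge(book, load_annotations(tsv)))
```

Because the whole bracketed marker is replaced, '[[n1]]' cannot accidentally match inside '[[n10]]'. A stylesheet rule hiding the hypothetical 'note' class, toggled by a small script, would supply the show/hide behaviour.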

Note that with a little more techie work, the process could be simplified for the annotators.  They could add the annotation text directly in the document, surrounded by unique delimiters.  Then, a script could generate any version, e.g. replace delimiters with (X)HTML wrapper and/or with a generated unique ID; extract the annotations to a separate (X)HTML file that can be printed on its own, etc.
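A rough sketch of that delimiter approach, assuming hypothetical '{{...}}' delimiters and hypothetical output files book.html and notes.html. Each inline note becomes a numbered link with a generated id, and the note bodies are collected into a separate, printable page:

```python
import re
from html import escape

NOTE = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)  # hypothetical {{...}} delimiters

def split_annotations(annotated_html):
    """Turn inline {{note}} text into numbered links plus a separate notes page."""
    notes = []

    def to_link(match):
        notes.append(match.group(1).strip())
        n = len(notes)  # generated unique id: note-1, note-2, ...
        return f'<sup><a id="ref-{n}" href="notes.html#note-{n}">{n}</a></sup>'

    text = NOTE.sub(to_link, annotated_html)
    notes_page = "\n".join(
        f'<p id="note-{n}"><a href="book.html#ref-{n}">{n}.</a> {escape(body)}</p>'
        for n, body in enumerate(notes, start=1)
    )
    return text, notes_page

book, notes = split_annotations(
    "<p>It is a truth universally acknowledged{{Irony: the 'truth' is gossip.}}, "
    "that a single man in possession of a good fortune{{i.e. wealth}} ...</p>"
)
print(book)
print(notes)
```

The sketch treats note bodies as plain text and escapes them; annotators who want to write HTML inside notes would need that escaping removed.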

All this stuff is pretty easy for a college student with any scripting experience.


> >   Then you could keep the annotations separate,
> >   and just add small markers to the original text. 
>
>um, keeping the annotations separate is a good idea.
>but requiring "small markers" in the original text is not.
>the text should remain unchanged, for many reasons.

Sure, in the ideal world.  Meanwhile, inserting unique IDs is a pragmatic solution.  Of course there's some work involved to sync the marked version with any future PG updates, but given "diff" tools, it's not that much work for a handful of annotated books.  (And, with a little work, could be largely automated.)
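One way the syncing could be partly automated is sketched below with Python's standard difflib, assuming hypothetical '[[id]]' markers in a line-oriented text. Markers on lines that are unchanged in the new PG release are carried over automatically; markers on changed or deleted lines are flagged for a human to place:

```python
import difflib
import re

MARKER = re.compile(r"\[\[[^\]]+\]\]")  # hypothetical [[id]] markers

def resync(old_marked, new_plain):
    """Carry [[id]] markers from an annotated old text onto a new release.

    Lines unchanged between releases keep their markers automatically;
    markers on changed or deleted lines are returned for manual review.
    """
    old_lines = old_marked.splitlines()
    old_plain = [MARKER.sub("", line) for line in old_lines]
    new_lines = new_plain.splitlines()
    result = list(new_lines)
    needs_review = []
    matcher = difflib.SequenceMatcher(None, old_plain, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        for offset, i in enumerate(range(i1, i2)):
            if not MARKER.search(old_lines[i]):
                continue
            if tag == "equal":
                result[j1 + offset] = old_lines[i]  # same text: reuse the marked line
            else:
                needs_review.append(old_lines[i])
    return "\n".join(result), needs_review

merged, review = resync(
    "Chapter 1\nIt is a truth[[n1]] universally acknowledged\nthat a single man",
    "CHAPTER I\nIt is a truth universally acknowledged\nthat a single man",
)
print(merged)
print("review:", review)
```

Working line by line keeps the sketch short but coarse: a single corrected typo on a marked line sends that marker to manual review.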

And, I'm all in favor of someone taking the time to get an XPATH solution working.
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From jon at noring.name  Wed Aug 17 13:36:24 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 13:51:55 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1c1.2ea2ca2c.3034f135@aol.com>
References: <1c1.2ea2ca2c.3034f135@aol.com>
Message-ID: <1499876850.20050817143624@noring.name>

Bowerbird wrote:
> scott said:

>> As others have noted, HTML
>> or the newer XHTML is ideal here.

>  "ideal"?  i think not.

XHTML is certainly not the best XML-based vocabulary for marking up
books. A carefully selected subset of TEI is much better. Both are
used in the context of XML.

XHTML can be adapted to books (and is.) If XHTML is used in a major
way to markup books, it makes sense to come up with a standardized
set of pre-defined classes to identify text structures and content
semantics. At this point, though, it still makes more sense to switch
to TEI. This is apparently what DP plans to do.


> indeed, to the direct contrary, i believe that heavy markup
> makes on-the-fly adding of annotations _extremely_ difficult.

How is that? Annotations can be linked to the text using the markup
as "hooks" (e.g., using XPointer.) The more markup there is, the more
hooks to latch onto.


> but, as i said, if you can show me some examples,
> ones that make it as simple as you make it sound,
> i am open to being convinced otherwise...

XPointer provides an XML-based standard to point to any spot in an XML
document.

Pointing to 'id' ("fragment identifiers") is the most robust and can
survive various types of document edits. In plain text systems, where
annotations have to hook to the content itself (rather than markup
which is separate from the content), it is more difficult to prevent
link breakage.

Jon

From Bowerbird at aol.com  Wed Aug 17 15:13:17 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 15:13:33 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <fe.19c619c0.3035107d@aol.com>

jon said:
>   How is that? 
>    Annotations can be linked to the text 
>    using the markup as "hooks" (e.g., using XPointer.) 
>    The more markup there is, 
>    the more hooks to latch onto.

please show me -- and the original poster --
an implementation that actually works, now.


>    Pointing to 'id' ("fragment identifiers") is the most robust 
>    and can survive various types of document edits. 
>    In plain text systems, where annotations have to 
>    hook to the content itself (rather than markup
>    which is separate from the content), 
>    it is more difficult to prevent link breakage.

this is another case of disingenuous sleight-of-hand.

you are trying to make us believe that
the text changes and the markup doesn't.

what you've done, though, is merely specified that there is
markup which _cannot_ change (the "fragment identifiers"),
so as to assure link-permanence.   if i were to specify content
that can not change, i can guarantee link-permanence as well.

and in almost all cases, we're more likely to have text-invariance
than to have markup-invariance.   (but this is beside the point,
since it's easy enough to specify invariance of text and markup.
it is also very easy to show link breakage in cases of variance.)

-bowerbird
From Bowerbird at aol.com  Wed Aug 17 15:37:54 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 15:38:19 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1a8.3cf237d1.30351642@aol.com>

scott said:
>    All that's required is a tab-delimitted file 
>    (or database or spreadsheet) with 2 columns: 
>    id, annotation.

again, i say, please provide people with this solution!

if it's as easy as you say, it shouldn't take much time.   right?


>    Now, in your favorite scripting or programming language,
>    iterate thru the file, read the id and annotation, 
>    then replace the former with the latter in the marked-up book.
>    (Including the appropriate (X)HTML wrapper, as noted.)

you've got some slippery thinking here about the "id" markers.
do they already exist in the text?   how did they get there?
how did the database, which links id markers and annotations,
come into existence?   how are annotations shared with others?
can annotations be made on annotations?   what about things like
graphics and movies -- how can they be utilized as annotations?


>    For someone who doesn't write scripts, it's not trivial.
>    For someone who does, it's a few lines of code.

as above, if it's just a few lines of code, why won't you write 'em?


>    Note that with a little more techie work, 
>    the process could be simplified for the annotators.

that simplification would be a good thing, yes, a good thing indeed.


>    They could add the annotation text directly in the document, 
>    surrounded by unique delimiters.

and maybe with just a little bit more techie work,
you could provide those "unique delimiters" for them,
save them the trouble.   not everyone knows x.h.t.m.l.,
and not everyone wants to learn it.


>    Then, a script could generate any version, 
>    e.g. replace delimiters with (X)HTML wrapper 
>    and/or with a generated unique ID; 
>    extract the annotations to a separate (X)HTML file 
>    that can be printed on its own, etc.

sounds nifty.   i'll implement this for a plain-text file,
and you do the implementation for a marked-up file,
and we'll see who gets done first, and who has the
solution that proves to be more powerful and robust.
how about it scott?   are you up for the challenge?


>    All this stuff is pretty easy for a college student 
>    with any scripting experience.

then it should be a piece of cake for you, scott, right?


>    Sure, in the ideal world.  Meanwhile, 
>    inserting unique IDs is a pragmatic solution.

i think that such a "pragmatic solution" would prove to be
a false economy, by causing more problems than it solves,
over the long run.   any time you fork the original text into
a different file, you're creating a brittleness that will bite you,
and causing yourself unnecessary editing trouble in the future.


>    Of course there's some work involved 
>    to sync the marked version with any future PG updates

that's exactly what i was just talking about, yep.


>    but given "diff" tools, it's not that much work 
>    for a handful of annotated books.
>    (And, with a little work, could be largely automated.)

have you ever done this type of work?
have you been successful in automating it?

if so, then you would be well-advised to start a business,
and charge the world for your expertise, since there are
lots of companies who are finding it expensive to do this,
and they would dearly love to find a less-costly solution...


>    And, I'm all in favor of someone taking the time 
>    to get an XPATH solution working.

me too!   how about you doing that work?

-bowerbird
From jon at noring.name  Wed Aug 17 16:47:27 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 16:47:45 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <fe.19c619c0.3035107d@aol.com>
References: <fe.19c619c0.3035107d@aol.com>
Message-ID: <1011340781.20050817174727@noring.name>

Bowerbird wrote:
> jon said:

>> How is that? Annotations can be linked to the text using the markup
>> as "hooks" (e.g., using XPointer.) The more markup there is, the
>> more hooks to latch onto.

> please show me -- and the original poster -- an implementation that
> actually works, now.

Simply add an 'id' to any XHTML tag, and you can link to it using any
web application. Fragment identifiers are used in many millions of web
pages, if not billions.

The first step to an "annotation application" (which you are driving
at) is to have the underlying standards (the "hooks") worked out so as
to easily allow linking by the annotation application. This is already
worked out in the XML world (e.g., XPointer.) That's what I meant. In
LibraryCity we are now working on such annotation and related social
networking applications (e.g., blogs, wikis) to digital texts using
XPointer (which includes simple 'id' links.) Since the W3C standards
are open and universal, others can likewise build their own
application -- no need to invent a new hooking mechanism.


>> Pointing to 'id' ("fragment identifiers") is the most robust and
>> can survive various types of document edits. In plain text systems,
>> where annotations have to hook to the content itself (rather than
>> markup which is separate from the content), it is more difficult to
>> prevent link breakage.

> you are trying to make us believe that the text changes and the
> markup doesn't.

Markup can change, but in the case of using 'id', the document
maintainers will be careful to keep 'id's undisturbed as much as
possible during document edits. This actually allows *major* changes
to documents and yet keep existing links unbroken. Can this be done
with plain text? Not as easily (it's not impossible, but requires some
sort of mapping system, or a knowledge of all the known
externally-generated links into the original text document.)

For example, an author may issue some work, and later revise it by
rewording paragraphs, adding new paragraphs, new chapters, etc. If
they are careful, they can assure integrity of existing links into the
updated Work by assuring the 'id's are properly preserved and placed
where they should be.

Here's an example:

======================================================================
[First Edition of a work]

   <p id="1234">First paragraph.</p>
   <p id="1235">Second paragraph.</p>

[Second Edition]

   <p id="1234">First paragraph with some minor edits.</p>
   <p id="4567">Inserted whole new paragraph.</p>
   <p id="1235">Second paragraph with some minor edits.</p>

======================================================================

If I have an external annotation which points to the content of
the second paragraph in the First Edition: id="1235", then in the
Second Edition, the link will remain unbroken even if a new
paragraph was inserted before it *and* the content in that paragraph
was revised but not enough to be topically different.
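Jon's two editions can be run through a short script to show the difference between id-based and position-based anchoring. This is a sketch only: strictly, XHTML 1.0 'id' values cannot begin with a digit, so a real file would use something like "p1235"; the numeric ids are kept here to mirror the example above.

```python
import xml.etree.ElementTree as ET

FIRST = """<body>
  <p id="1234">First paragraph.</p>
  <p id="1235">Second paragraph.</p>
</body>"""

SECOND = """<body>
  <p id="1234">First paragraph with some minor edits.</p>
  <p id="4567">Inserted whole new paragraph.</p>
  <p id="1235">Second paragraph with some minor edits.</p>
</body>"""

def by_id(doc, frag_id):
    """Resolve an annotation anchored to an id (a fragment identifier)."""
    el = ET.fromstring(doc).find(f".//p[@id='{frag_id}']")
    return None if el is None else el.text

def by_position(doc, index):
    """Resolve an annotation anchored to 'the Nth paragraph' (0-based)."""
    paras = ET.fromstring(doc).findall(".//p")
    return paras[index].text if index < len(paras) else None

for edition in (FIRST, SECOND):
    print("id 1235  ->", by_id(edition, "1235"))
    print("2nd para ->", by_position(edition, 1))
```

In the Second Edition the id still lands on the revised second paragraph, while "the 2nd paragraph" now silently points at the inserted one.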

PSWG discussed the issues of interpublication linking for over a month
for the next generation Open eBook Publication Structure, where we
wanted to enable robust interpublication linking, annotation, etc.,
into OEBPS Publications. There are a *lot* of subtle and not-so-subtle
issues involved, some of which I've outlined in prior messages to
gutvol-d and TeBC a while back.

The original proposal to allow external annotation of digital texts
(like PG texts) may seem like a new idea to many PGers here, but it's
been something several of us have considered for quite a while (I was
thinking of it back in 2000 for Yomu.) It's not new to me. I've even
mentioned it here a few times, but not so explicitly (because we had
not yet publicly announced LibraryCity.)


> what you've done, though, is merely specified that there is markup
> which _cannot_ change (the "fragment identifiers"), so as to assure
> link-permanence.  if i were to specify content that can not change, i
> can guarantee link-permanence as well.

Of course, one can come up with a scheme to link into plain text
documents by character counting, paragraph counting, or a number of
other methods (including Ted Nelson's Project Xanadu approach.) It is
entirely possible someone has even come up with an IETF/RFC or
something else covering this. Have you researched to see what others
have already proposed for such a standard?

(For Ted Nelson's Xanadu, refer to http://www.xanadu.com/ )
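As an illustration of the content-based alternative, here is a hypothetical scheme (neither a published standard nor Bowerbird's design): store the exact quoted passage plus a little preceding context, and search for it in the revised plain text rather than trusting character offsets:

```python
def make_anchor(text, start, end, context=20):
    """Anchor a span of plain text by its content, not by its offset."""
    return {"prefix": text[max(0, start - context):start], "exact": text[start:end]}

def resolve(anchor, text):
    """Find the anchored span in a (possibly revised) text.

    Returns (start, end) offsets, or None if the link is broken or ambiguous.
    """
    exact = anchor["exact"]
    hit = text.find(anchor["prefix"] + exact)
    if hit != -1:
        start = hit + len(anchor["prefix"])
        return start, start + len(exact)
    # The context was edited: accept the quote alone only if it is unique.
    if text.count(exact) == 1:
        start = text.find(exact)
        return start, start + len(exact)
    return None

OLD = ("It is a truth universally acknowledged, that a single man in "
       "possession of a good fortune, must be in want of a wife.")
start = OLD.find("a good fortune")
anchor = make_anchor(OLD, start, start + len("a good fortune"))

revised = "CHAPTER I.\n\n" + OLD.replace("in possession of", "possessing")
span = resolve(anchor, revised)
print(span, revised[span[0]:span[1]] if span else "BROKEN")
```

The link survives both the shifted offsets and the edited context, but, as with ids, it breaks once the quoted words themselves change.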


> and in almost all cases, we're more likely to have text-invariance
> than to have markup-invariance.  (but this is beside the point,
> since it's easy enough to specify invariance of text and markup.
> it is also very easy to show link breakage in cases of variance.)

Ah, but with XHTML 'id' (and now XML's xml:id), it is possible to do
significant text amendment and preserve existing links based on 'id'.
Of course, as noted above, link preservation can be achieved for plain
text emendations, but it appears to be much messier, especially at the
authoring level. But then someone smart and motivated (like Bowerbird)
may come up with a clever way to make this work for plain text
documents.

Jon

From Bowerbird at aol.com  Wed Aug 17 17:16:25 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 17:21:50 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1d4.42484bf0.30352d59@aol.com>

jon said:
>    Since the W3C standards are open and universal, 
>    others can likewise build their own application -- 
>    no need to invent a new hooking mechanism.

so you are telling the original poster that he can
"build his own application", is that what i'm getting?

nobody in the big wide world of x.m.l. has done it yet?

it's interesting how you always say "x.m.l. can do this",
but when it gets right down to it, nobody has done it.
when are y'all gonna get around to solving these issues?

you're telling people that they can do it themselves,
but meanwhile the experts haven't even done it yet!
don't you sense the disconnect in what you're saying?

vapor vapor vapor vapor vapor.


>    Markup can change, but in the case of using 'id', 
>    the document maintainers will be careful to keep 'id's 
>    undisturbed as much as possible during document edits.

keep that in mind when you're building your system, scott!


>    Can this be done with plain text? Not as easily

i'll worry about the plain-text implementation.

and it'll be all you can do to keep up with me...


>    (I was thinking of it back in 2000 for Yomu.)

what was "yomu"?   people here might want to know.

-bowerbird
From donovan at abs.net  Wed Aug 17 17:37:10 2005
From: donovan at abs.net (D Garcia)
Date: Wed Aug 17 17:51:02 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <20050817234746.634B18C914@pglaf.org>
References: <20050817234746.634B18C914@pglaf.org>
Message-ID: <200508172037.11095.donovan@abs.net>

Separation of content and presentation, anyone?

As an experiment, try putting a tag such as this in the HTML version of the document:

blah blah blah <note1> blah blah

and in your header a link to an external style sheet (CSS2)

In that external style sheet, put something like

note1:after {
  content: "Your comment here";
}

or perhaps

note1:after {
  content: url(note1.html);
}

Of course, this will only work in a CSS2 compliant browser, so for now that's 
everybody except IE.
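Writing one rule per note by hand gets tedious. As a hedged sketch, a short script could generate the stylesheet from a table of notes instead. The element names (note1 and so on) follow the example above and are hypothetical. The script emits only quoted strings, because CSS 2.1 describes content: url(...) as designating an external resource such as an image, so an HTML file referenced that way may not display:

```python
def css_string(text):
    """Quote text as a CSS string literal (escape backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\A ")
    return f'"{escaped}"'

def notes_to_css(annotations):
    """Emit one generated-content rule per hypothetical note element."""
    return "\n".join(
        f"{tag}:after {{ content: {css_string(text)}; }}"
        for tag, text in annotations.items()
    )

print(notes_to_css({
    "note1": 'Irony: the "truth" is village gossip.',
    "note2": "A single man\nwith a good fortune.",
}))
```

The annotators then only maintain the table; the stylesheet is regenerated whenever a note changes.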
From scott_bulkmail at productarchitect.com  Wed Aug 17 17:52:57 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Aug 17 17:53:33 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1a8.3cf237d1.30351642@aol.com>
References: <1a8.3cf237d1.30351642@aol.com>
Message-ID: <p06110408bf2983ec19ac@[192.168.0.52]>

>you've got some slippery thinking here about the "id" markers.
>do they already exist in the text?  how did they get there?
>how did the database, which links id markers and annotations,
>come into existence?  how are annotations shared with others?
>can annotations be made on annotations?  what about things like
>graphics and movies -- how can they be utilized as annotations?

I don't have time to debate Bowerbird on these points, but if the original poster or anyone else who is actually going to work on annotation has serious questions, I'll be happy to throw in my $0.02 if I notice the thread.


>as above, if it's just a few lines of code, why won't you write 'em?

Several reasons, including:

- the original poster is better off having it done close to home so they have more control over implementation details, enhancements, etc.

- this isn't generally a tech list

- I typically code such things in UserTalk, which is most useful to those who already use Frontier for other reasons rather than as a quick-and-dirty scripting solution for an unknown environment.  If the original poster decides to use Frontier and whatever techie they find gets stuck, I'll be happy to help them out.


> >   but given "diff" tools, it's not that much work
> >   for a handful of annotated books. 
> >   (And, with a little work, could be largely automated.)
>
>have you ever done this type of work?
>have you been successful in automating it?
>
>if so, then you would be well-advised to start a business,
>and charge the world for your expertise, since there are
>lots of companies who are finding it expensive to do this,
>and they would dearly love to find a less-costly solution...

I have done this kind of work, have automated it, am in business, and have and do charge for it.
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From jon at noring.name  Wed Aug 17 18:30:20 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 18:34:44 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <p06110408bf2983ec19ac@[192.168.0.52]>
References: <1a8.3cf237d1.30351642@aol.com>
	<p06110408bf2983ec19ac@[192.168.0.52]>
Message-ID: <1508707233.20050817193020@noring.name>

Scott wrote:
> Bowerbird wrote:

>> you've got some slippery thinking here about the "id" markers.
>> do they already exist in the text?  how did they get there?
>> how did the database, which links id markers and annotations,
>> come into existence?  how are annotations shared with others?
>> can annotations be made on annotations?  what about things like
>> graphics and movies -- how can they be utilized as annotations?

> I don't have time to debate Bowerbird on these points, but if the
> original poster or anyone else who is actually going to work on
> annotation has serious questions, I'll be happy to throw in my $0.02
> if I notice the thread.

Most of the time one doesn't have to author an actual implementation
to determine whether it will be hard or not. Most experienced and
even inexperienced programmers instinctively know the difficulty of
most proposed applications.

It's like saying "if I go to the roof of a tall building and toss a
bowling ball off the side, it will begin accelerating to the ground."
It's obvious what will happen -- there's no need to even waste the time
and run the experiment.

Some aspects of the current discussion about external annotation of
digital texts are similar. If one wants to implement some system,
planning and forethought are needed *before* writing lines of code.
For example, does XML confer benefits in the annotation system over
plain text, or vice-versa?

Obviously, Bowerbird wants everyone in the digital text arena to
embrace regularized digital text, and he is (apparently) building a
set of working applications (he calls them "tools") to prove his point.
All the power to him.

But we certainly have the right to bring up the requirements issues and
call into question whether regularized plain text is sufficient for all
uses and needs of the digital text universe. There are those among us,
including yours truly, who believe that XML should form the core of
digital publishing processes and formats. Even if Bowerbird implements
his system, we will ask: "will it do this and do that?" We *know* that
XML and its many associated W3C and IETF specifications and RFCs confer
a powerful and sufficient foundation to do all the myriad things proposed
for the digital publishing universe (that I know of at least, and I've
looked at a *lot* of advanced uses.) And we don't have to write code to
*know* this as true (see bowling ball discussion above.)

Those who listen to Garrison Keillor's "Prairie Home Companion" are
familiar with the fictional "Ralph's Pretty Good Grocery" (RPGG) in
Lake Wobegon. Their advertising slogan is "If you can't get it at
Ralph's, you can probably get along without it." Bowerbird's system is
clearly a RPGG since I know it will NOT do everything that has been
discussed for digital texts. Whether it will hit the sweet spot and
win the hearts and minds of the ebook masses (which must include all
the important stakeholders in the digital publishing universe), who
will overlook its deficiencies, remains to be seen. I'm skeptical, but
will wait to see what arrives.

Jon

From Bowerbird at aol.com  Wed Aug 17 19:10:20 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 19:10:41 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <d6.2bb96232.3035480c@aol.com>

donovan said:
>    Try as an experiment 
>    putting in the HTML version of the document 
>    a tag such as
>    blah blah blah <note1> blah blah
>    and in your header a link to an external style sheet (CSS2)
>    In that external style sheet, put something like
>    note1 {
>      content:after "Your comment here";
>    }
>    or perhaps
>      content: url(note1.html);

so, you want the user to edit an .html file
at the insertion point of every annotation,
and then edit the .css file appropriately...

i'll let the original poster tell us whether that
is something he feels is reasonable or not...


>    Of course, this will only work in a CSS2 compliant browser, 
>    so for now that's everybody except IE.

or, to put it another way, 1 out of 8 people.

***

scott said:
>   I don't have time to debate Bowerbird on these points,

i asked some fairly simple questions.   there is no "debate".
all you have to do is answer the fairly simple questions...


>    If the original poster decides to use Frontier 
>    and whatever techie they find gets stuck, 
>    I'll be happy to help them out.

so even a "techie" might "get stuck" doing this?   well, ok.
but your offer to help them out is certainly generous...


>   I have done this kind of work, have automated it, 
>    am in business, and have and do charge for it.

great!

can you point us to your for-a-fee solution? i would be
interested in pricing it.   do you have a cost-free demo?

i'd like to see the profit-margin on "a few simple scripts".

-bowerbird
From Bowerbird at aol.com  Wed Aug 17 19:29:11 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 19:29:32 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <42.6f5c078e.30354c77@aol.com>

jon said:
>    Most of the time one doesn't have to 
>    author an actual implementation
>    to determine whether it will be hard or not. 

i program, jon.   i'm not the world's greatest programmer,
but there aren't that many things that are hard to program,
even for me.   you just have to be willing to break the job down
into small enough pieces.   oh yeah, it also helps to have users
who don't care how big and/or slow your application might be...

but until you've actually programmed your application,
you don't really know what kind of obstacles lie hidden.

heck, sometimes you don't find out until end-users tell you.


>    Most experienced and even inexperienced programmers 
>    instinctively know the difficulty of most proposed applications.

most inexperienced programmers can't count on their "instincts".

(and someone like you -- a nonprogrammer -- certainly can't.)

and experienced programmers know that some projects that
look easy on the outside have a lot of those hidden obstacles.

but one thing i can tell you for sure, as a programmer, is that
if you let a nonprogrammer tell you what you should be doing,
you're in for a very rough time, unless you're paid by the hour,
and hard-up for the cash.   (it also helps if you are a masochist.)

another thing i can tell you for sure, as a programmer, is that
the fact that a format is "open and universal" doesn't mean
diddly-squat in terms of whether it'll be easy to program for it.


>    Bowerbird's system is clearly a RPGG 
>    since I know it will NOT do everything 
>    that has been discussed for digital texts.

oh really?

you seem to have some kind of super-e.s.p.
when it comes to knowing about my stuff,
some of which i haven't even programmed,
which could provide _useful_info_ to me...

heck, i can get a critique of my software
_before_i_even_write_it_!   that's awesome.

so jon, tell me what it won't do...

-bowerbird
From jon at noring.name  Wed Aug 17 19:29:20 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 19:29:46 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1d4.42484bf0.30352d59@aol.com>
References: <1d4.42484bf0.30352d59@aol.com>
Message-ID: <487792669.20050817202920@noring.name>

Bowerbird wrote:
> jon said:

>> Since the W3C standards are open and universal, others can likewise
>> build their own application -- no need to invent a new hooking
>> mechanism.

> so you are telling the original poster that he can "build his own
> application", is that what i'm getting?

I suppose, if he's interested. And if so, he wants guidance as to the
various issues involved, and the various proposed solutions. That's
what this discussion is about. Anyone can build "tools", but there's
a gazillion "tools" out there gathering dust on shelves because the
authors did not do their homework properly and try to understand the
truly important requirements leading to widescale embracement.

Anyone who builds such an annotation system *should* see the bigger
picture of the various issues involved *before* just building
something out of the blue -- and to understand how annotation fits
into the bigger picture of the general use of digital publications. We
(including you) are providing some of that foundation by giving our
respective views and perspectives on the matter. No need to build
"tools" to provide this perspective. That's silly. The tools can be
built, whether based on XML or plain text, when there is a need and
a decision to go ahead *after* fully understanding the requirements.

That you are supposedly going ahead with a plain text solution is
noble, but not germane to this discussion. You have not stated *why*
you believe your plain text solution is superior to XML for this
particular application. You've only dissed the XML approach w/o
going into detail of how your plain text approach will sufficiently
solve the external annotation of digital publications and meet all
the important requirements as we understand them now.

So far, all you've implied is "Trust me, *I'm* building a tool" (which
reminds me of John Kerry in the last prez election when he promised
many times "Trust me, I have a plan" but never gave specifics at the
time.) At least I tried to explain why the XML suite of specifications
provides a good foundation upon which to build that specific
functionality. So, how specifically would you implement external
annotation of plain texts in your system and why is the plain text
approach superior to the XML approach *for this specific purpose*?


> nobody in the big wide world of x.m.l. has done it yet?

It is built *when* there is a need for it, or somebody just takes
an interest, whether there's a need or not.


> it's interesting how you always say "x.m.l. can do this",
> but when it gets right down to it, nobody has done it.
> when are y'all gonna get around to solving these issues?

See my other message where I discuss this ("bowling ball experiment".)
To build anything, there has to be a perceived need, and up to now
there's not been the need, at least in the digital publishing
universe.


>  you're telling people that they can do it themselves,
>  but meanwhile the experts haven't even done it yet!
>  don't you sense the disconnect in what you're saying?

You are being disingenuous by implying "they" (the XML experts) haven't
done it because it can't be done. That's wrong. The people who
authored XML included a large number of *experienced* software
developers who would eat your lunch. They developed XML not to solve
specific problems (although specific problems were in the back of
their minds), but rather to provide a powerful base upon which
applications to process text-based documents and data sets could
be built *when there is a need*. Just refer to the XML Cover Pages
for getting an idea of how well XML is being used to solve all kinds
of problems. It's amazing and overwhelming: http://xml.coverpages.org/

When you say "they" (whoever "they" are) haven't implemented a
particular application of *your* choosing -- using XML technologies --
as "proof" that XML is no good -- that is beyond silly. But that's
exactly what you continue to imply. It is a form of circular
reasoning -- clever, but easily seen through.

Anyway, there have been companies who've built proprietary systems to
interlink XML data using XPath and XPointer -- I know this for a
*fact*. One of my associates consulted for that company but I don't
recall the details of company name and product name -- it was shared
with me at the time, two years ago, in confidence under NDA. I'm sure if
one does a search at the XML Cover Pages, one will find several
implementations of the same W3C standards one would use for creating a
powerful annotation environment for XML documents. (I'm not going to do
it because it is unnecessary at this time for the current state of
this discussion.)
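A minimal sketch of the external-annotation idea described above, assuming Python's standard xml.etree.ElementTree. Note that ElementTree supports only a limited subset of XPath and nothing of XPointer, so this is an illustration of the principle, not of the W3C machinery itself. The document, the id values and the notes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny XHTML-like document; the id values are hypothetical.
DOC = """<body>
  <p id="p1">Call me Ishmael.</p>
  <p id="p2">Some years ago, never mind how long precisely...</p>
</body>"""

# External annotations live outside the document. Each one points at its
# target with an ElementTree path expression (a limited XPath subset),
# so the source document itself is never modified.
ANNOTATIONS = [
    {"target": ".//p[@id='p1']", "note": "Narrator introduces himself."},
    {"target": ".//p[@id='p2']", "note": "Deliberately vague about time."},
]


def resolve(root, annotations):
    """Return (target text, note) pairs for every annotation whose target resolves."""
    pairs = []
    for ann in annotations:
        node = root.find(ann["target"])
        if node is not None:  # silently skip targets that no longer exist
            pairs.append((node.text, ann["note"]))
    return pairs


if __name__ == "__main__":
    for text, note in resolve(ET.fromstring(DOC), ANNOTATIONS):
        print(f"{text!r} -> {note}")
```

Keeping the notes in a separate list means several people could publish competing annotation sets against the same unchanged text.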


> and it'll be all you can do to keep up with me...

I'll hand it to you -- you got chutzpah, and have been implying the
same thing for the nine or so years I've known you since ebook-list.
I'm dizzy trying to keep up with you! <smile/>

Jon


From Bowerbird at aol.com  Wed Aug 17 19:43:51 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 19:44:09 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <20c.748be19.30354fe7@aol.com>

jon said:
>    Anyone can build "tools", but there's a gazillion 
>    "tools" out there gathering dust on shelves 

there are?   can you point us to a half-dozen of these gazillion?


>    Anyone can build "tools", but there's a gazillion 
>    "tools" out there gathering dust on shelves 
>    because the authors did not do their homework properly 
>    and try to understand the truly important requirements 
>    leading to widescale embracement.

well, have you written reviews on these tools to
inform the authors about "the proper homework"
and "the truly important requirements"?   if so,
i'd like to see the reviews and get this head-start.

let's examine an actual example that i gave in this thread.
tk3, from nightkitchen.com, has good annotation features.
these include dogearing, highlighting (in 4 different colors),
stickies (in those same 4 colors), and a notebook capability,
which allows the user to include text, graphics, and movies
from the e-book in their notes.   annotations can be shared.
how does that stack up to your gazillion other tools, jon?


>    No need to build "tools" to provide this perspective. 
>    That's silly. The tools can be built, whether based on 
>    XML or plain text, when there is a need and a decision 
>    to go ahead *after* fully understanding the requirements.

except you will _not_ fully comprehend the situation _until_
you build the tools.   it's an iterative process.   and the fact that
you don't know that is one of our main points of contention...

i encourage you to get some programming experience, jon.
it'll make you a lot smarter...

-bowerbird
From scott_bulkmail at productarchitect.com  Wed Aug 17 19:57:44 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Aug 17 19:58:18 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <d6.2bb96232.3035480c@aol.com>
References: <d6.2bb96232.3035480c@aol.com>
Message-ID: <p0611040bbf299fbda295@[192.168.0.52]>

In answer to a specific suggestion, I typed:
> >   I have done this kind of work, have automated it,
> >   am in business, and have and do charge for it.

Bowerbird replied:
>great!
>
>can you point us to your for-a-fee solution? i would be
>interested in pricing it.  do you have a cost-free demo?
>
>i'd like to see the profit-margin on "a few simple scripts".

You left out some important context: my comment above was a reply to automated "diff", which is NOT something the original poster asked about, so it wasn't covered in my reply to him.

If Thad (who started this thread) is looking for a pragmatic solution that can work well for several books, it really is just a few simple scripts.  That would make a pretty meager product.

Of course annotation can be much more complex, but there's no need to make it so just to deliver some useful content to students.

I hate to leave issues hanging, but in your case I make an exception.  Folks who are new to the list may find it a bit rude of me not to reply to the many points you raised; folks who have been on awhile or stumbled across the many relevant portions of the archives will understand.
-- 

Cheers,

Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting
From jon at noring.name  Wed Aug 17 20:03:04 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 20:03:24 2005
Subject: SOPHIE (replacement for tk3 and XML-based) Re: [gutvol-d] Annotations
	for students
In-Reply-To: <42.6f5c078e.30354c77@aol.com>
References: <42.6f5c078e.30354c77@aol.com>
Message-ID: <208952740.20050817210304@noring.name>

Bowerbird wrote:
> jon said:

>> Most experienced and even inexperienced programmers
>> instinctively know the difficulty of most proposed applications.

>  most inexperienced programmers can't count on their "instincts".
>
>  (and someone like you -- a nonprogrammer -- certainly can't.)

Well, I have written over 100,000 lines of code (Fortran) over the
years. I've also written some scripts. I've edited and compiled some C
code. I've written a bunch of GWBasic programs. And I have a couple
associates in the XML world who are programming wizards and who I
often consult with regarding what can and can't be done. I've also
worked in engineering teams which included C++ coders. I took several
graduate level classes in computer science back in the 1980's, mostly
numerical analysis.

I have a pretty good "lay" understanding of contemporary programming.
And in the XML world, I keep abreast of successful XML applications
(e.g., web browsers) -- what they can do and can't do.

Next?


>  and experienced programmers know that some projects that
>  look easy on the outside have a lot of those hidden obstacles.

Certainly! I ran into this all the time when I was coding numerical
simulations of complex thermochemical systems (it's fun solving over
200 non-linear, and quite unstable equations in the same number of
unknowns.)

But overall one can *usually* get a good grasp at the general
difficulty of a programming task before sitting down and writing
code. Encountering problems is the norm, and it simply takes either
work-arounds, or changing the algorithm, in order to resolve. It is
expected.

*****

Btw, the tk3 example you gave in the other email you just sent (and I
just read), the replacement for it, SOPHIE, will be *XML-based*. The
developer of tk3, Bob Stein, is the major player of SOPHIE. He's been
around for years and years -- his partner is very experienced.

Gee, I wonder why they will now change gears and embrace XML? They are
*experienced* developers with a lot of knowledge of the tk3 product
(over 15 years), and *they* are switching to XML.

   http://www.annenberg.edu/futureofthebook/content/Mellon.pdf

   http://rit.mellon.org:8080/dev/projects/Sophie/

You better hurry and convince Bob Stein he needs to get rid of XML and
embrace plain text! Hurry, before it is too late! Your programming
experience should convince them of the folly of their ways.

Jon

From Bowerbird at aol.com  Wed Aug 17 20:21:08 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 20:21:28 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <15d.56facc3b.303558a4@aol.com>

scott said:
>   I hate to leave issues hanging, 
>    but in your case I make an exception.

no problem, scott.

i'll have an annotation app out soon,
so thad won't be left hanging for long.

***

jon said:
>    You have not stated *why* you believe your plain text solution 
>    is superior to XML for this particular application. 

sorry, i thought i've said that too many times to repeat again.
a plain-text solution is superior to x.m.l. because it frees people
from the hassle of doing markup.   fewer costs, equivalent benefits.

fairly straightforward...


>    You've only dissed the XML approach w/o going into detail 
>    of how your plain text approach will sufficiently solve 
>    the external annotation of digital publications and meet 
>    all the important requirements as we understand them now.

first of all, i haven't "dissed" the x.m.l. approach.
if somebody can present me an x.m.l. solution that works,
and is perfectly transparent to users, i'll be happy as a clam.   

what i have pointed out, respectfully, is that
acronyms are not solutions.   the feeling here
seems to be that "x.m.l. can do that" is an answer.
it's not.   if x.m.l. can truly do something, give people
a straightforward answer about _how_ to do it...

i know people who do tech support for software companies.
they are constantly having to deal with the expectations that
were falsely raised by the marketing department that says
"oh sure, our software can do that" simply to close the sale.
then, when the customer (rightfully) expects the software to
"do that", they are in for the surprise of their life, because
-- although the software _can_ do that -- it usually requires
a very expensive person (or even a whole crew) to work it.

you're doing the same kind of bait-and-switch to people here.
everyone gets the _impression_ that x.m.l. is all-powerful,
but x.m.l. actually never gets around to doing anything at all!

i hope thad, the original poster, will correct me if i'm wrong,
but i don't believe he came here asking about what kind of
core methodology he could use that would "eventually" let him
"write some scripts" or "build an application" to do annotations.
he wants to do annotations now, and wants a way to do them
that doesn't involve a lot of unnecessary work to implement.
the specific task that he has in mind is writing the annotations,
not creating an annotation system.

believe me, if x.m.l. delivered even _one-fifth_ of its promises,
i'd be one of its biggest supporters.   but so far it's 95% vapor.

and let us not forget that, in order to start milking benefits,
_first_ we have to mark up the entire library, and i must again
remind people that little progress is happening on that front...


>    So far, all you've implied is "Trust me, *I'm* building a tool"

wrong.   my saying is "the proof is in the pudding".
i'm not asking anybody to trust anybody.   be skeptical!

when my annotation program is ready, i will let you know.
and i "trust" that when your system is ready, you'll tell me.
and i "trust" the lurkers will notice who finishes first, and
finishes best, and which system is more robust and powerful
and cost-efficient and end-user friendly and all that good stuff.


>   You are being disingenuous by implying "they" 
>    (the XML experts) haven't done it because it can't be done.

wrong.

it _can_ be done in x.m.l.
the degree of difficulty is
higher than with plain-text.
but it's _eminently_ doable.

nothing is that difficult in this arena.
this ain't putting a man on the moon.
it's basically a note-taking application.

what i _am_ curious about, though,
is why it hasn't been done already?

you'd think that with the huge number of people
running around saying "x.m.l. is the solution" that
_somebody_ would've put a bell on this cat by now.

perhaps that degree of difficulty is higher than we think.


>   The people who authored XML included 
>    a large number of *experienced* software
>    developers who would eat your lunch.

when they finally write an annotation app,
show me their program!   if it works well, 
i'll buy 'em beer to go along with my lunch,
and take 'em to the strip club that evening...

-bowerbird
From jon at noring.name  Wed Aug 17 22:24:29 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 23:18:03 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <15d.56facc3b.303558a4@aol.com>
References: <15d.56facc3b.303558a4@aol.com>
Message-ID: <334121333.20050817232429@noring.name>

Bowerbird wrote:

>  i'll have an annotation app out soon,
>  so thad won't be left hanging for long.

But will it work for *him*? Will it meet the requirements he sees that
such a system must fulfill?

That's the purpose of this discussion -- to assess the requirements
for any system enabling external annotations of digital publications.
This list of requirements is independent of solution type, but will
certainly be useful in evaluating how one would implement the
application (whether plain text or XML-based.)

Btw, since DP is providing the lion's share of the PG texts, to master
them in XML (TEI and XHTML), and plans to redo many of the early PG
texts, the issue that PG's corpus is now mostly plain text becomes less
compelling.

Jon

From tb at baechler.net  Wed Aug 17 22:52:01 2005
From: tb at baechler.net (Tony Baechler)
Date: Wed Aug 17 23:24:39 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <Pine.GSO.4.58.0508162355030.19442@vtn1.victoria.tc.ca>
References: <92d068b011c466997116e41ba04a2359@nuprometheus.com>
	<92d068b011c466997116e41ba04a2359@nuprometheus.com>
Message-ID: <5.2.0.9.0.20050817224723.03b3eca0@bisinc.us>

Hello all.  I'm sorry, but I'm a little confused about something here.

PG has an offer from a teacher to annotate editions of PG classics which 
are already available.  Greg, Jim and Andrew all seem to agree that such 
notes should not be included in PG.  This is too bad since I would like to 
see such annotating.  However, what about a book by O. Henry?  There are two 
editions of it: one without notes and one with notes by Joe, who posted 
it.  I think the precedent has been set to allow books with contemporary 
notes added.  Why not just assign them a new etext number?  Book 17000, for 
example, could be a Mark Twain book with footnotes added.  If this is 
unacceptable, why not just have an xxxxx-notes file or directory?  For example:
17000.txt, 17000-h.htm, 17000-notes.txt

Or, the same as above but 17000-notes/ would be a separate directory with 
17000.txt that has notes added.  Hopefully this is clear.  Is there any 
reason why this can't be done?  This is how page images are done, right?
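The two layouts proposed above can be written out as a small helper. This is only a sketch of the naming scheme in this message, not an established PG convention, and the function name companion_files is hypothetical.

```python
def companion_files(etext_no: int, notes_in_subdir: bool = False) -> list[str]:
    """List the files for one etext under the proposed '-notes' scheme.

    notes_in_subdir=False: annotated copy sits beside the originals.
    notes_in_subdir=True:  annotated copy lives in its own '-notes/' directory.
    """
    base = str(etext_no)
    files = [f"{base}.txt", f"{base}-h.htm"]  # plain text and HTML editions
    if notes_in_subdir:
        files.append(f"{base}-notes/{base}.txt")
    else:
        files.append(f"{base}-notes.txt")
    return files


if __name__ == "__main__":
    print(companion_files(17000))
    print(companion_files(17000, notes_in_subdir=True))
```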

From Bowerbird at aol.com  Wed Aug 17 23:54:38 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 23:54:59 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <e1.19bc9e14.30358aae@aol.com>

tony said:
>    Greg, Jim and Andrew all seem to agree that such notes 
>    should not be included in PG. This is too bad since 
>    I would like to see such annotating.

i would think it would be too difficult to decide _who_
would get to provide the notes for a particular e-text...

it's easy enough for people to put their annotations
on their website for download by anyone who wants.
i think that's the reasonable course of action to take.

***

jon said:
>    But will it work for *him*? 

i dunno.   thad will have to tell me.          :+)


>    Will it meet the requirements he sees
>    that such a system must fulfill?

i'd think so, since i've done a lot of thinking
and searching and researching on what the
realm of requirements are.   but who knows?
maybe thad has an idea for a better mousetrap.

tell me, what do _you_ think "the requirements" are?


>    That's the purpose of this discussion 
>    -- to assess the requirements for any system 
>    enabling external annotations of digital publications.

not really.   i got the impression that
thad just wants something that works;
he wasn't trying to start a philosophical
discussion about the realm of annotation.
he just wants to juxtapose his commentary
next to the text to which it refers, i'd think.
(thad, don't let this melee scare you off, man;
step right in and tell us what you really think!)        ;+)


>    This list of requirements is independent of 
>    solution type, but will certainly be useful in 
>    evaluating how one would implement the application 
>    (whether plain text or XML-based.)

again, i don't think this is all that complicated.

if you've examined all the types of annotation
needed, and seen how various e-book programs
have implemented solutions, it is straightforward.


>    Btw, since DP is providing the lion's share of 
>    the PG texts, to master them in XML (TEI and XHTML), 
>    and plans to redo many of the early PG texts, 
>    the issue that PG's corpus is now mostly plain text 
>    becomes less compelling.

yeah, well, you'll let me know when that markup is complete,
won't you please?   because i'm not holding my breath on it...

-bowerbird

p.s.   i'm still doing some exploring about "sophie", 
but i'll be getting back to you on that very soon...
From curtzt at nuprometheus.com  Thu Aug 18 00:33:38 2005
From: curtzt at nuprometheus.com (Thad Curtz)
Date: Thu Aug 18 00:34:03 2005
Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 10
In-Reply-To: <20050818065502.10FAF8C916@pglaf.org>
References: <20050818065502.10FAF8C916@pglaf.org>
Message-ID: <222943DD-1873-4497-9F94-898D3DBF76A6@nuprometheus.com>

It's true - I don't have terribly grand aspirations for now (and I  
have done a good deal of amateur programming in a variety of  
languages, so a certain amount of technical overhead probably would  
be OK.) What I'm thinking about at the moment is just:

1. Some basic text formatting including superscripts for footnotes  
and italics.
2. Line numbers on the side every five lines for poems.
3. Glosses on difficult words - either as footnotes on the side, or  
as mouseover popups (with overlib, I suppose), or at the bottom of  
each page.
4. Longer explanatory notes.
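Item 2 is easy to prototype for a plain-text rendition. A minimal sketch, assuming stanza breaks (blank lines) are not counted, which is a common but not universal convention; the function name number_lines is hypothetical.

```python
def number_lines(poem: str, every: int = 5, width: int = 4) -> str:
    """Put a line number in a left margin on every `every`-th verse line.

    Blank lines (stanza breaks) are passed through and not counted.
    """
    out = []
    n = 0
    for line in poem.splitlines():
        if not line.strip():
            out.append("")  # stanza break: not counted, not numbered
            continue
        n += 1
        label = str(n) if n % every == 0 else ""
        out.append(f"{label:>{width}}  {line}")  # right-aligned margin label
    return "\n".join(out)


if __name__ == "__main__":
    print(number_lines("a\nb\n\nc\nd\ne\nf"))
```

The same counting logic would carry over to an HTML version, where the label would go in a margin element instead of padded text.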

I think the longer notes need to be at the bottoms of the pages when  
they're printed, and it might be nice to have the dictionary glosses  
there too. (There's really not that much space on the side of the  
page for them...)

And I'd like to do it in some way that's as standard as possible, as  
fast as possible, as simple for students to access as possible, and  
that makes it as easy as possible for other people who might want to  
do more to build on what I've done rather than starting over from  
scratch. (Eventually I assume people will want to be formatting this  
stuff for all sorts of digital reading platforms and all sorts of  
page sizes with cheap automated binding at home or at Kinko's or in  
college printshops.)

The problems about getting it to print as decently formatted pages  
with footnotes at the bottom on a variety of printers are one big  
reason I'd rather not do it in straight HTML. The problems about  
people being able to add to it (and file size issues) are reasons I'd  
rather not do it with PDFs.

But I've got now several leads for other tools and tactics to take a  
look at from your discussion. (As well as some ideas I hadn't thought  
of about why Gutenberg isn't already doing it...) I'll keep you  
posted if I get anything done.

Thanks,
Thad
From Bowerbird at aol.com  Thu Aug 18 01:40:08 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 18 01:40:37 2005
Subject: [gutvol-d] re: thad weighs in
Message-ID: <1a9.3d27336d.3035a368@aol.com>

thad said:
>    What I'm thinking about at the moment is just:

cool.   thanks for weighing in again.           :+)

one thing i'd suggest to you is to take a look at some of
the .html versions prepared by distributed proofreaders
where they have used side-notes.   it's pretty impressive,
considering they are operating without any javascript...

(but yeah, your comment about the difficulty of
getting well-formatted pages when printing from
.html is well-taken...)

-bowerbird
From creeva at gmail.com  Thu Aug 18 12:14:16 2005
From: creeva at gmail.com (Brent Gueth)
Date: Thu Aug 18 12:21:14 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <2510ddab05081812141c19d667@mail.gmail.com>

Going through the archives in my mail box the last few days I wanted
to add my .02 as someone who is not as close to the project as any of
you.   I think that any XML work (if Gutenberg goes that way)  needs
to be done in addition to, not as a replacement for, plaintext.

I don't care if XML becomes as common as plaintext and everyone uses
it; you can run into a problem in 20 years where XML falls out of
favor and there won't be software to render it properly.  That would
leave some poor fools having to redo all the documents all over again.
This is not a good thing.   Picking plaintext is genius in the sense
that, unless basic ASCII changes (not likely compared to XML losing
favor), plaintext will always be able to be read.   This also allows it
to be read on older machines.   Maybe some of you don't care that
the guy with a Commodore 64 can read plaintext but can't read XML,
because he is only one person on the planet.    But when you add
all the other one-person cases together, it becomes a decent
percentage.

My thought on software to do the annotations would be to have a
reader that could overlay annotations on the screen but keep the
base document in plaintext, or else maintain a separate annotated edition.
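One way to read the first option is as stand-off annotation: the notes live in a separate file and point into the unchanged plain text by character offset. A minimal sketch; the offset scheme, the Note type and the overlay function are assumptions for illustration, and offsets stop matching if the base text is ever edited.

```python
from dataclasses import dataclass


@dataclass
class Note:
    start: int  # character offset into the base text (inclusive)
    end: int    # character offset into the base text (exclusive)
    text: str   # the annotation itself


def overlay(base: str, notes: list[Note]) -> str:
    """Render numbered markers after each annotated span, plus a note list.

    The base text is never modified; the rendering is built alongside it.
    Overlapping spans are not handled specially in this sketch.
    """
    ordered = sorted(notes, key=lambda n: n.end)  # markers go in reading order
    out, pos = [], 0
    for i, note in enumerate(ordered, start=1):
        out.append(base[pos:note.end])
        out.append(f"[{i}]")
        pos = note.end
    out.append(base[pos:])
    footer = [f"[{i}] {base[n.start:n.end]}: {n.text}"
              for i, n in enumerate(ordered, start=1)]
    return "".join(out) + "\n\n" + "\n".join(footer)


if __name__ == "__main__":
    print(overlay("Call me Ishmael.", [Note(8, 15, "The narrator's name.")]))
```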


Another problem we run into when we discuss modern annotations is
how we decide who is qualified to release (write up) the annotations
for a certain book.   I may not agree with the annotator that Bob
likes, and Sue will hate the choices Bob and I make.

The best solution I could honestly see to keep a degree of sanity is
to set up a wiki for each book you want to annotate.


But I'll go back to reading the archives now; I just thought
plaintext still needed a champion before the whole world went
completely XML crazy.

Remember -
plaintext was supposed to be replaced by PostScript
plaintext was supposed to be replaced by WordPerfect
plaintext was supposed to be replaced by Word
plaintext was supposed to be replaced by PDF
plaintext was supposed to be replaced by HTML
plaintext is supposed to be replaced by XML?
Not bloody likely
From collin at xs4all.nl  Thu Aug 18 12:53:53 2005
From: collin at xs4all.nl (Branko Collin)
Date: Thu Aug 18 12:38:19 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com>
Message-ID: <43050371.25822.59AAF1@localhost>


On 18 Aug 2005, at 12:14, Brent Gueth wrote:

> Going through the archives in my mail box the last few days I wanted
> to add my .02 as someone who is not as close to the project as any of
> you.   I think that any XML work (if Gutenberg goes that way)  needs
> to be done in addition to, not as a replacement for, plaintext.

You are absolutely right, and I do not think you have anything to 
fear. 

The way I understood it, if some application of XML is going to be 
used at all, it will be as a storage format. From that format an 
immediate plain vanilla text file will be generated, that will be 
stored alongside the XML version.
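The arrangement described above (XML as the master copy, plain text generated from it) can be sketched with Python's standard library. This is an illustrative assumption, not PG's actual toolchain: it handles only <p> and <i>, and renders italics with underscores, the convention PG plain-text files commonly use. The sample markup and function names are hypothetical.

```python
import textwrap
import xml.etree.ElementTree as ET

SOURCE = """<body>
<p>It was the <i>best</i> of times, it was the worst of times.</p>
<p>It was the age of wisdom.</p>
</body>"""


def inline_text(elem):
    """Flatten an element to text, wrapping <i> content in underscores."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        parts.append(f"_{inner}_" if child.tag == "i" else inner)
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


def to_plain_text(xml_source, width=72):
    """Generate a plain-text rendition: wrapped paragraphs, blank line between."""
    root = ET.fromstring(xml_source)
    paras = [textwrap.fill(" ".join(inline_text(p).split()), width)
             for p in root.iter("p")]
    return "\n\n".join(paras) + "\n"


if __name__ == "__main__":
    print(to_plain_text(SOURCE), end="")
```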

-- 
branko collin
collin@xs4all.nl
From Bowerbird at aol.com  Thu Aug 18 13:06:14 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 18 13:06:31 2005
Subject: SOPHIE (replacement for tk3 and XML-based) Re: [gutvol-d]
	Annotations for students
Message-ID: <157.56f3b5c8.30364436@aol.com>

jon said:
>    Well, I have written over 100,000 lines of code (Fortran) 
>    over the years. I've also written some scripts. 
>    I've edited and compiled some C code. 
>    I've written a bunch of GWBasic programs. 
>    And I have a couple associates in the XML world 
>    who are programming wizards and who I often 
>    consult with regarding what can and can't be done.

great!   then i'm really looking forward to your program, jon...
like i always say, the proof is in the pudding.   deliver pudding.


>   Btw, the tk3 example you gave in the other email 
>    you just sent (and I just read), the replacement for it, 
>    SOPHIE, will be *XML-based*. The developer of tk3, 
>    Bob Stein, is the major player of SOPHIE. He's been
>    around for years and years -- his partner is very experienced.

i brought up tk3 because david rothman mentioned it in his blog,
so i took a fresh look at it.   and since the topic was annotations,
and tk3 is one of the programs with good annotation capabilities,
it seemed appropriate.   now, as you'll note in comments i made
over on david's blog, i know tk3 -- and bob stein -- very well.
(not personally, but i've followed his work since voyager days.)

i've talked to bob, and steve riggins (and steve's wife) as well.
i've even dropped some of my e-book programs on them, and
they've been impressed.   tell 'em to watch out for me!      :+)

but even after having visited the website you listed, i am puzzled.

"sophie" is actually the name of an e-book viewer-app written by
_richard_gaskin_, of fourth-world, another l.a.-based programmer.
see it at: http://www.fourthworld.com/products/sophie/index.html

so this makes me wonder if gaskin and stein are now teaming up?
that would be sweet.   i've been looking for a worthy competitor
in the e-book viewer-program realm, and openreader has been a
huge vapor bust so far, so a stein/gaskin product might be the one!

but i don't see anything on either website to indicate a merger?...

(after reading the .pdf, i see now that there's probably no merger.
riggins works in small-talk, while gaskin uses run-time revolution.
so it appears that this is just an unfortunate program-name crash,
all revolving around programmers who've been on the left coast.)


>    Gee, I wonder why they will now change gears 
>    and embrace XML? They are *experienced* developers 
>    with a lot of knowledge of the tk3 product (over 15 years), 
>    and *they* are switching to XML.

well, wonder no more, jon, because i can tell you why.
they're going for some hefty venture-capital bucks, and
x.m.l. is the trend-word of the decade.   if i was looking
for an investor sugar-daddy, i'd be spouting x.m.l. too!


>    You better hurry and convince Bob Stein he needs to 
>    get rid of XML and embrace plain text! 

again, you think i have something against x.m.l.   i don't.
if x.m.l. gave me useful tools, and hid all the complicated
file-formatting under the hood, i'd be happy to embrace it.

what i _do_ have something against is _vaporware_.

and that's _especially_ true in regard to electronic-books,
which have stagnated through cycle after cycle of _hype_
because nobody -- except for adobe -- has made it easy to
_author_ electronic-books that work well on all platforms.

read the "sophie" website you listed, and their .pdf, and
you will find that they both say the very exact same thing:
we need easy authoring-tools to make a revolution happen.

but instead of delivering honest-to-goodness, simple-to-use
authoring-tools, we've wasted the time fiddling with formats!

and when someone comes and asks a "how do i do this?"
question, we snow them with a bogus vaporware answer.

and then we wonder why nothing ever gets accomplished.


>    You better hurry and convince Bob Stein he needs to 
>    get rid of XML and embrace plain text! Hurry, 
>    before it is too late! Your programming experience 
>    should convince them of the folly of their ways.

nah.   i'll let bob burn through that investor cash instead;
he's already run tk3 through a couple rounds of funding.
(and got another quarter-of-a-million from u.s.c. recently.
man, i wonder what p.g. could do with a cool $250,000!)
yep, i _like_ to see the venture capitalists fall on their ass
chasing the trend-word of the decade, i really do...     :+)

-bowerbird
From marcello at perathoner.de  Thu Aug 18 14:28:20 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Aug 18 14:28:33 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com>
References: <2510ddab05081812141c19d667@mail.gmail.com>
Message-ID: <4304FD74.5020208@perathoner.de>

Brent Gueth wrote:

> This is not a good thing.   Picking plaintext is genius in the sense,
> that unless basic ASCII changes (not likely compared to XML losing
> favor)  plaintext will always be able to be read.

Are we confusing ASCII with plain text? Because the former is an 
encoding and the latter is a format. You are comparing apples with rocks 
and telling us we should eat rocks because they last longer.

Plaintext will stay forever because it defines nothing, and so will 
never have to be changed. TANSTAAPF: there ain't no such thing as a 
plaintext format. There are roughly 16,000 plaintext formats around, 
because every etext defines its own format. You cannot talk of a 
plaintext "format" at all.


> Maybe some of you don't care that
> the guy with commodore 64 can read plaintext but can't read XML
> because he is only one person on the planet.

That's easy to fix: he should get a girlfriend. (But he should leave the 
C64 at home on the first few dates.)

Basically you say that millions of people with modern PCs should be 
forced to use stone-age technology because one person somewhere cannot 
afford to get an old PC from ebay? Even the PCs we are sending to 
African Schools are Pentium class machines!


> plaintext was supposed to be replaced by Postscript
> plaintext was supposed to be replaced by word perfect
> plaintext was supposed to be replaced by word
> plaintext was supposed to be replaced by PDF
> plaintext was supposed to be replaced by HTML
> plaintext is supposed to be replaced by XML?  
> Not bloody likely

Horses were supposed to be replaced by cars.

Are we confusing existence with fitness for purpose? Or are we confusing 
existence with demand?

Because nobody wants plaintext.

Plaintext is ugly on a screen, is ugly on a PDA, is ugly on paper.

Plaintext cannot be converted automatically into anything else.

But, yes, it exists, like Treponema pallidum.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Thu Aug 18 22:54:20 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Aug 18 22:54:21 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com>
References: <2510ddab05081812141c19d667@mail.gmail.com>
Message-ID: <20050819055420.GC22610@pglaf.org>

On Thu, Aug 18, 2005 at 12:14:16PM -0700, Brent Gueth wrote:
> ...
> The problem also we come into when we discuss modern annotations is
> how do we decide who is qualified to release (write up) the annotations
> for a certain book.   I may not agree with the annotationist that Bob
> likes, and Sue will hate the choices bob and I will make.
> 
> The best solution I could honestly see to keep a degree of sanity is
> to Wiki each book you wanted to annotate.

Thanks for your note, Brent, and for taking the time to
read through the archives.

A quick comment on this: PG is more likely to let other
folks take care of annotation.  Although we have some
producer-contributed reviews etc. in some eBooks, we generally
look to other sites to host reviews and other editorial
content.  For example, many of our catalog entries have
links to Wikipedia articles for info about authors & titles.

It might be that we'll have a "PG metadata"-type project
affiliate at some point (see our philosophy/FAQ/about
documents for some essays on this type of experimentation
& growth).  But I don't see adding such content to the
eBooks themselves any time soon.

Of course, such views could change as the people involved
in PG change, and the world continues to change...
  -- Greg


From greg at durendal.org  Fri Aug 19 11:01:38 2005
From: greg at durendal.org (Greg Weeks)
Date: Fri Aug 19 11:30:20 2005
Subject: [gutvol-d] Another rule 6 question
Message-ID: <Pine.LNX.4.44.0508191400170.30232-100000@durendal.durendal.org>


When a book is cleared with rule 6, is the artwork cleared also? Can the
artwork for the cover and interior illustration have a separate renewal?

-- 
Greg Weeks
http://durendal.org:8080/greg/


From joshua at hutchinson.net  Fri Aug 19 11:58:12 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Aug 19 12:21:26 2005
Subject: [gutvol-d] How to submit a new file to a posted ebook?
Message-ID: <20050819185812.A5B9C2FBB8@ws6-3.us4.outblaze.com>

If I've finished producing a version of a posted text (such as HTML or PDF) and I want to get that submitted and posted ... how would I go about that/who should I bug with my e-mail?

I'm thinking it would either be one of the white-washers or the errata e-mail (which ends up in Jim's lap anyway), but I don't want to bother those folks who are overworked already, unless I know I'm supposed to be bothering them.  :)

Josh
From jtinsley at pobox.com  Fri Aug 19 13:23:20 2005
From: jtinsley at pobox.com (Jim Tinsley)
Date: Fri Aug 19 13:23:31 2005
Subject: [gutvol-d] How to submit a new file to a posted ebook?
In-Reply-To: <20050819185812.A5B9C2FBB8@ws6-3.us4.outblaze.com>
References: <20050819185812.A5B9C2FBB8@ws6-3.us4.outblaze.com>
Message-ID: <20050819202320.GA6921@panix.com>

On Fri, Aug 19, 2005 at 01:58:12PM -0500, Joshua Hutchinson wrote:
>If I've finished producing a version of a posted text (such as HTML or PDF) and I want to get that submitted and posted ... how would I go about that/who should I bug with my e-mail?

If it's a blind format conversion, please don't. 
http://www.gutenberg.org/faq/H-8

If it's just a late completion of a recent text,
just upload it in the normal way.

>
>I'm thinking it would either be one of the white-washers or the errata e-mail (which ends up in Jim's lap anyway), but I don't want to bother those folks who are overworked already, unless I know I'm supposed to be bothering them.  :)

I hereby declare myself to be so far behind that
nobody is supposed to be bothering me for years!

jim

From brad at chenla.org  Sat Aug 20 06:45:55 2005
From: brad at chenla.org (Brad Collins)
Date: Sat Aug 20 06:46:20 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com> (Brent Gueth's
	message of "Thu, 18 Aug 2005 12:14:16 -0700")
References: <2510ddab05081812141c19d667@mail.gmail.com>
Message-ID: <d5o8en70.fsf@chenla.org>

Brent Gueth <creeva@gmail.com> writes:

> I don't care if XML becomes as common as plaintext and everyone uses
> it, you can run into a problem in 20 years where XML falls out of
> favor and there won't be software to render it properly.  This will
> lead poor fools having to redo all the documents all over again.  
> This is not a good thing.   

As has already been mentioned, ASCII is an encoding and plaintext is a
format.

And ASCII is being replaced with Unicode.  Some decades from now ASCII
will gradually go the way of the Dodo.  This is inevitable, since the
vast majority of the world's people need a larger character set to read
and write their languages than native English speakers do.

As for plaintext, one of the core design goals for XML is that
you'll be able to open it in any text editor and read it.  If a file
is human readable when it's opened in a text editor then it's a type
of plain text.  All XML does is place tags around text in order to
give the text a structure that machines can understand.

As long as you have a text editor, you'll be able to read XML.  A good
text editor can clean out all of the tags with a simple regular
expression like "<[^>]*>".  Script languages like perl, python,
ruby or any other language likely to come down the pike will be able
to process XML and convert it into whatever comes along in the future.
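A minimal Python sketch of the tag-stripping idea, using only the standard `re` module. It also shows why the exact pattern matters: a pattern with a greedy `.*`, such as `<.*[^>]*>`, matches from the first `<` to the last `>` on a line and so deletes the text between tags too, whereas `<[^>]*>` removes one tag at a time. The sample line is made up, and a regex like this knows nothing about comments, CDATA sections, entity references, or attribute values that contain `>`, so treat it as a rough tool rather than an XML parser.

```python
import re

# A greedy ".*" runs to the LAST ">" on the line, so this pattern
# deletes the text between tags along with the tags themselves.
GREEDY = re.compile(r"<.*[^>]*>")

# "<", then a run of non-">" characters, then ">": one tag per match.
TAG = re.compile(r"<[^>]*>")


def strip_tags(xml_text: str) -> str:
    """Remove anything shaped like a tag; a rough sketch, not a parser."""
    return TAG.sub("", xml_text)


line = "<p>It was the <i>best</i> of times.</p>"
print(repr(GREEDY.sub("", line)))  # ''
print(strip_tags(line))            # It was the best of times.
```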

Very few applications render XML directly (except perhaps word
processors); everyone else converts it into html, pdf or other formats
for display.

SGML (XML's older sister) has been around for, what, twenty years or
more?  And all SGML documents are easily converted into XML.  XML is
simpler and designed to be around as an archive format for far longer
than that.

Think of the XML version of an ebook as the expression of a work, which is
then converted into various manifestations including html, latex
(which can be converted to PDF via Postscript), tei, as well as
a plain text file with no markup.  Most people will never know about
the master version in XML; they will only see the file formats they
use to read books.  XML is only a long-term and safe archive format
which is flexible enough to describe both the structure of a text and,
if you want it, also the semantic content of a text.
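A sketch of the single-master idea in Python, using the standard-library `xml.etree.ElementTree`: one small XML source, two derived manifestations (HTML and plain text). The element names `<book>`, `<title>` and `<p>` are hypothetical, not taken from TEI or PGTEI, and the plain-text layout (uppercase title, blank line between paragraphs) is just one assumed convention.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical master file; <book>, <title> and <p> are illustrative
# element names, not taken from TEI or PGTEI.
MASTER = """<book>
  <title>A Short Example</title>
  <p>First paragraph.</p>
  <p>Second &amp; last paragraph.</p>
</book>"""


def to_html(root: ET.Element) -> str:
    """Derive a minimal HTML fragment from the master."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for p in root.iter("p"):
        parts.append(f"<p>{escape(''.join(p.itertext()))}</p>")
    return "\n".join(parts)


def to_plain_text(root: ET.Element) -> str:
    """Derive plain text: uppercase title, blank line between paragraphs."""
    lines = [root.findtext("title").upper(), ""]
    for p in root.iter("p"):
        lines.append("".join(p.itertext()))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


root = ET.fromstring(MASTER)
print(to_html(root))
print(to_plain_text(root))
```

Both outputs come from the same parsed tree, so a correction made once in the master reaches every derived format.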

I suggest that you google for a basic intro to XML to get an idea of
what it really is.  If you know anything about HTML, XML is very
easy -- you can think of it as HTML where you can invent your own tags.

I personally don't like DOM and XSLT which are both used for
processing XML and converting it into formats like html which browsers
can render.  But this is no problem because I can just as easily
convert an XML document into a LISP data structure of S-expressions
which Lisp, Elisp, Scheme or Guile can process very easily.  
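The S-expression route can be sketched in a few lines. This version is in Python (an assumption; the message names Lisp, Elisp, Scheme and Guile, not Python); it walks an ElementTree and prints a nested list whose attribute notation is loosely modelled on SXML's `(@ ...)` convention. Whitespace-only text, such as indentation between elements, is dropped; other text is kept verbatim so mixed content keeps its spaces.

```python
import xml.etree.ElementTree as ET


def quote(s: str) -> str:
    """Return s as a double-quoted literal, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexp(elem: ET.Element) -> str:
    """Render an element as (tag (@ (attr "value") ...) text child tail ...)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f"({k} {quote(v)})" for k, v in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():      # skip whitespace-only text
        parts.append(quote(elem.text))
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail and child.tail.strip():  # text after the child
            parts.append(quote(child.tail))
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring('<p n="1">It was the <i>best</i> of times.</p>')
print(to_sexp(doc))
# (p (@ (n "1")) "It was the " (i "best") " of times.")
```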

Once you understand that XML is just plain text, you can use any
software for processing text to work with it.  As long as there is a
text editor, an XML document will never be lost.

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From jon at noring.name  Sat Aug 20 12:01:08 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 12:01:25 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <d5o8en70.fsf@chenla.org>
References: <2510ddab05081812141c19d667@mail.gmail.com>
	<d5o8en70.fsf@chenla.org>
Message-ID: <14610115858.20050820130108@noring.name>

Brad Collins wrote:
> Brent Gueth writes:

>> I don't care if XML becomes as common as plaintext and everyone uses
>> it, you can run into a problem in 20 years where XML falls out of
>> favor and there won't be software to render it properly.  This will
>> lead poor fools having to redo all the documents all over again.  
>> This is not a good thing.   

> [snip]
>
> As for plaintext, one of the core design goals for XML is that
> you'll be able to open it in any text editor and read it.  If a file
> is human readable when it's opened in a text editor then it's a type
> of plain text.  All XML does is place tags around text in order to
> give the text a structure that machines can understand.

Good points.

Properly marked up documents, where the XML vocabulary describes the
structure and semantics of the text, are highly repurposable.

Should the day come that XML disappears from use, it will be
relatively easy to transform such XML documents into whatever is
new. Why? As Brad notes it's because an XML document comprises "plain"
text which has markup added (the markup itself is also "plain" text)
describing what the text is. One can think of markup as simply a sort
of descriptive metadata.

In the worst case scenario where one can't find anyone to write a
script or apply an XML processing application to do the transformation
(a scenario which will only happen if world-wide catastrophe strikes),
so long as there are running computers with text editors lying
around, one can open up the XML document in a text editor, and there
is the "plain" text, right in front of you, nicely described with
markup. Though it may take some work (depending upon the extent of the
markup), and some text metadata information may be lost, one can use
the text editor to strip out the markup and restore the content to
"traditional" PG plain text -- if so desired.

(In essence, XML markup follows Michael Hart's philosophy of using
text encoding to digitally preserve public domain Works.)

DP plans to apply an intelligently-designed XML vocabulary optimized
for book materials to their first-generation masters (they are looking
at a well-constrained subset of TEI, such as PGTEI now under
development by Marcello and others.) This is a good plan.

Jon

From Bowerbird at aol.com  Sat Aug 20 12:54:48 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 12:55:01 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <11.4b9bef1f.3038e488@aol.com>

brad said:
>    As has already been mentioned, 
>   ASCII is an encoding and plaintext is a format.

i fail to see how this distinction has any importance to
the original point.   the user wants the words free of markup.


>    And ASCII is being replaced with Unicode.  Some decades 
>    from now ASCII will gradually go the way of the Dodo.

well, if you want to get into this kind of doubletalk --
which i don't because, as i just said, it has no importance
-- then it is inaccurate to say that ascii is "being replaced"
by unicode, since the bottom 128 characters of unicode
are the same 128 ascii characters we've come to know.
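The ASCII/Unicode overlap can be checked directly. ASCII defines 128 code points (0 through 127); Unicode assigns the same characters to those code points, and UTF-8 encodes each of them as a single byte of the same value. A short Python illustration:

```python
# ASCII covers code points 0..127, i.e. 128 characters.
ascii_chars = "".join(chr(cp) for cp in range(128))

# Unicode keeps the same characters at those code points, and UTF-8
# stores each one as a single byte with the same value, so the two
# encodings produce identical bytes for pure ASCII text.
assert ascii_chars.encode("ascii") == ascii_chars.encode("utf-8")
assert all(len(ch.encode("utf-8")) == 1 for ch in ascii_chars)

# Beyond 127 they part ways: ASCII cannot encode the character at all,
# while UTF-8 uses a multi-byte sequence.
e_acute = "\u00e9"  # é
try:
    e_acute.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
print(e_acute.encode("utf-8"))  # b'\xc3\xa9'
```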

if we give the original poster a unicode-aware text-editor,
and a file that contains no heavy markup, he will be happy.
he wants the words, all the words, and nothing but the words.


>    As for plaintext, one of the core design goals for XML is that 
>    it you'll be able to open it in any text editor and read it.

ok, and now here you seem to be trying to say that an x.m.l. file
is a plain-text file.   it's not.   it might consist of nothing more than
those 128 ascii characters, but it is decidedly not a plain-text file.

the original poster knows it's not plain-text.   so does michael hart.
most people do.   including, i suspect, you.   why confuse the issue?


>    If a file is human readable when it's opened in a text editor 
>    then it's a type of plain text.

again, this subterfuge is dishonest.

first, it's inaccurate to say that an x.m.l. file is "human readable".
and second, it's misleading to say it is "a type of plain text".
it might be an ascii file, but it's decidedly _not_ "plain-text".

>    All XML does is place tags around text in order to 
>    give the text a structure that machines can understand.

you give machines far too little credit.   they can be made to be
far smarter than a dirt-dumb x.m.l. processor, which can _only_
be made to "understand" the structure of text _if_ it is tagged.


>    As long as you have a text editor, you'll be able to read XML.

let's give the original poster an x.m.l. file, and
have _him_ say whether he is able to "read it".

just because you can load a file into a text-editor
doesn't mean you'll actually be able to figure out
_how_ to edit the darn thing in the way you want.

and _that_ is the real topic at hand here...

these semantic games do nothing but cloud the discussion.


>    A good text editor can clean out all of the tags 
>    with a simple regular expression like "<.*[^>]*>".

ok, well at least now you're starting to talk about _issues_.

but of course, you're glossing over the reality even here.

the inference you are trying to get us to make is that
"cleaning out all the tags" will convert an x.m.l. file into
a plain-text file, magically.   it won't.   not in all cases anyway.
not unless the x.m.l. file was created -- carefully -- with that 
specific conversion in mind.   i've been writing a separate post
that will give details how this careful consideration and crafting
must be done.   (some hints: whitespace, quotemarks, and tables.)
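To make the whitespace and quotemark hints concrete, here is a small Python sketch. The markup is hypothetical: paragraphs with no newline between them, an `&amp;` entity, and a `<q>` element whose quotation marks are implied rather than written. Blind tag removal runs the paragraphs together and leaves the entity in place; the `careful` function (a hypothetical helper) supplies what the tags implied before stripping them. It is tied to these particular element names, does not attempt tables, and relies on `html.unescape`, which covers XML's predefined entities and numeric references but not custom entities declared in a DTD.

```python
import re
from html import unescape

TAG = re.compile(r"<[^>]*>")

# Hypothetical markup: no newline between paragraphs, an entity
# reference, and a <q> element whose quote marks are implied.
xml_text = "<p>He said <q>no</q>.</p><p>Fish &amp; chips.</p>"

naive = TAG.sub("", xml_text)
print(naive)  # He said no.Fish &amp; chips.


def careful(text: str) -> str:
    """Put back what the tags implied, then strip them.

    Tied to <p> and <q> only; tables and other elements are not handled.
    """
    text = re.sub(r"</p>\s*", "\n\n", text)  # paragraph end -> blank line
    text = text.replace("<q>", "\u201c")     # opening curly quote
    text = text.replace("</q>", "\u201d")    # closing curly quote
    text = TAG.sub("", text)                 # drop the remaining tags
    return unescape(text).strip() + "\n"     # &amp; -> &


print(careful(xml_text))
```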


>    Script languages like perl, python, ruby or any other language 
>    likely to come down the pike will be able to process XML and 
>    convert it into whatever comes along in the future.

it's telling how all of the hype about x.m.l. is in the present-tense,
but when you focus down to particulars, it moves to future-tense.

pay attention to this, lurkers!   it's a sure sign of vapor-ware!


>    Very few applications render XML directly 
>    (except perhaps word processors), 
>    everyone else converts it into html, pdf 
>    or other formats for display.

ask yourself why this is the case.   the answer is interesting.


>    SGML (XML's older sister) has been around for, what, 
>    twenty years or more?  And all SGML documents are 
>    easily converted into XML.  XML is simpler and 
>    designed to be around as an archive format 
>    for far longer than that.

in its day, s.g.m.l. made all the same promises as x.m.l. does now.
it couldn't keep them, so s.g.m.l. people had to invent a variant,
so they could regenerate all their hype from scratch and reuse it.

and sure enough, the public is gullible enough to believe it all again.

of course, the same difficulties that thwarted s.g.m.l. back in the day
-- sabotaging all their hype -- will return and bite x.m.l. in the butt.

but by the time we figure out how we've been had this time around,
all the x.m.l. proponents will have carted off their consultant cash...


>    Most people will never know about the master version in XML, 
>    they only will see the file formats they use to read books.

they'll "know about" that x.m.l. version indirectly;
it will be the reason their books are so expensive.
due to all that cash those consultants carted away.


>    XML is only a long term and safe archive format

hype and marketing.


>    Once you understand that XML is just plain text, 
>    you can use any software for processing text to work with it.

you can save a spreadsheet in "plain-text" form too,
and then "use any software for processing" that too.
but you're going to find yourself coming up short.

likewise when working with an x.m.l. file in a plain-text editor;
yes, it can be done, but you will find yourself coming up short.

but x.m.l. people will continue telling us this untruth, because
they want us to believe that x.m.l. is really simple.   but it's not.


>    As long as there is a text editor, 
>    an XML documment will never be lost.

of course, if it ain't human-readable in that form,
it doesn't really matter if it "will never be lost".

it won't need to be "lost" once it has been "tossed"...

***

i will repeat:   make x.m.l. work if you want us to respect it.
don't come and _tell_ us how wonderful it will be; show us.

the proof is in the pudding.   not in the hype and marketing.

-bowerbird
From creeva at gmail.com  Sat Aug 20 13:38:24 2005
From: creeva at gmail.com (Brent Gueth)
Date: Sat Aug 20 13:38:35 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <11.4b9bef1f.3038e488@aol.com>
References: <11.4b9bef1f.3038e488@aol.com>
Message-ID: <2510ddab050820133870f972bf@mail.gmail.com>

I do understand that unicode is the next generation of ascii, which as
bowerbird pointed out includes the standard ascii characters and is
enhanced from there.

While we could make XML the standard, why shouldn't we just include it
alongside the plaintext human readable revision without markup tags?

If the work is so easy, it would be negligible to keep both revisions
around from initial editing.   I remember years ago when PG first
started (I was just an outside reader completely then) that they chose
plain ascii text to not get mired in any particular format that may be
lost.   Hence the stone tablets of the computer world.


To one of the people that commented on my email - yes I want everyone
to eat rocks.   I believe in PG as an archival society.  The reason
the format is so successful, even in this day and age, is that it is
ubiquitous.   It is this commonality that makes it flexible.

Do I really care if there is a separate XML revision from the
plaintext?  No I do not.   I don't care if we make adobe pagemaker
versions.   I just don't want to lose the plaintext.


PG hasn't been futurist in the sense of betting on what is going to be
common in the coming decades.   In any discussion where we would
consider replacing the plaintext revisions with XML, it needs to be
asked whether PG is a futurist or an archival society.

If I ever manage to get my hands on the first edition of Dumas's Count
of Monte Cristo like I want, I am not going to complain that it is in
French and I cannot read French.   In that sense it is not usable to
me (it wouldn't matter in book form or any other), since I don't read
French.

But for some reason the desire to pretty up the archives with a
replacement format is just that: a desire to pretty them up.  I would
rather have a book that has survived 150 years (give or take a decade
or two) and is still able to be translated and worked from easily.
Unicode will last at least that long.  I guarantee that whichever XML
revision is chosen will be replaced and made obsolete within 150 years.
Unicode, on the other hand, will still be around because it is the
workhorse of a computer society.

Finally, let's leave this with a bit of my own dealing with XML.  I
work for a company that produced a major application; we moved it from
standard plaintext config files and plaintext logfiles to XML-based ones.
This in turn made tweaking and troubleshooting much more difficult
than it was worth.   There were also other problems that arose with
that.   The cumbersome activities of our development staff turned
a lot of people away, and gained a lot of new customers at great cost (I
don't want to go into too much detail about my company or product).
The main difference, though, is that my company is supposed to be
forward thinking and is trying to keep up with the Joneses.  PG has no
Joneses to compete with; it is a single entity above that petty
bickering.  It is a beautiful idea of preserving civilization for
future generations, to survive copyright laws and make the works
available.  I also believe, though, that new books should be edited now
by PG and locked in a storage vault for release at a later date, so the
books themselves survive even if the print copies don't.  I add that in
to show I don't follow a straight PG line, but I'm all about keeping the
existence of this information alive for future generations.   I'm
about the information, the access to it, and its survival.  You can
beautify it all you want, but I strongly feel the essence and soul
should be maintained as it is now, before we lose what makes us
special.
From hacker at gnu-designs.com  Sat Aug 20 13:45:48 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 13:46:48 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab050820133870f972bf@mail.gmail.com>
References: <11.4b9bef1f.3038e488@aol.com>
	<2510ddab050820133870f972bf@mail.gmail.com>
Message-ID: <Pine.LNX.4.61.0508201640360.27717@aphrodite.gnu-designs.com>


> While we could make XML the standard, why shouldn't we just include 
> it alongside the plaintext human readable revision without markup 
> tags.

 	I agree. Use the XML version as the base format, and transform 
that XML into plain text (or pdf, jpg, postscript, etc.) from there. 
Great solution and I believe that is what this discussion is leading 
to.

> do I really care if there is a separate XML revision from the 
> plaintext?  No I do not.  I don't care if we make adobe pagemaker 
> versions.  I just don't want to lose the plaintext.

 	Exactly. That's what the XML version provides: one consistent 
base format through which all others are derived, making the final 
text, Adobe PageMaker, whatever... versions identical in content to 
the original XML version. Plain text is one of those formats, and if 
you prefer to read it in that format, you can do so.

> Finally let's leave this with a bit of my own dealing with XML.  I 
> work for a company that produced a major application we moved from 
> standard plaintext config files and plaintext logfiles to XML based. 
> This in turn made tweaking and troubleshooting much more difficult 
> than it was worth.

 	Why did your company move from plain text to XML? What tools 
were you using to process the XML? Moving to XML "Just Because(tm)", 
is not a good reason to move that direction. There's a lot of "XML is 
the Future" FUD flying around, and too many people are believing it. 
Without a solid reason for migrating to XML (as for config files in 
your case), it's the wrong solution.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From hacker at gnu-designs.com  Sat Aug 20 13:40:19 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 13:46:50 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <11.4b9bef1f.3038e488@aol.com>
References: <11.4b9bef1f.3038e488@aol.com>
Message-ID: <Pine.LNX.4.61.0508201616520.27717@aphrodite.gnu-designs.com>


> i fail to see how this distinction has any importance to the 
> original point.  the user wants the words free of markup.

 	[snip]

> if we give the original poster a unicode-aware text-editor, and a 
> file that contains no heavy markup, he will be happy. he wants the 
> words, all the words, and nothing but the words.

 	[snip]

> ok, and now here you seem to be trying to say that an x.m.l. file is 
> a plain-text file.  it's not.  it might consist of nothing more than

 	You spelled XML incorrectly again.

> again, this subterfuge is dishonest.

> first, it's inaccurate to say that an x.m.l. file is "human 
> readable". and second, it's misleading to say it is "a type of plain 
> text". it might be an ascii file, but it's decidedly _not_ 
> "plain-text".

 	Are graphical buttons that contain letters "human readable"? 
What about product labels? Billboard signs? None of those are "human 
readable" (at least not in the sense that, say, an OCR application 
could decipher their meaning).

> let's give the original poster an x.m.l. file, and have _him_ say 
> whether he is able to "read it".

 	Sure, you can read the XML file with a browser, if you have 
the appropriate stylesheet that goes with it. A text editor does 
nothing more than "render" the text to the user's screen. Markup consists of 
the semantic instructions that describe exactly how that text is going 
to be rendered. A "text editor" that understands XML can easily make 
those tags invisible to the end user, or fold the sections, etc.

 	This is all just a silly argument, and by your definition, 
your own wacky ZML format is not human readable either. What exactly 
is your point with this diatribe anyway? You're not going to save the 
world from XML, and you're certainly not going to convince others here 
who use it in their daily jobs.

 	So what exactly is your point?

> just because you can load a file into a text-editor doesn't mean 
> you'll actually be able to figure out _how_ to edit the darn thing 
> in the way you want.

 	First it was about giving a 'user' the XML file to read, and 
now it's about editing the file? Which is it? If you're trying to edit 
the file, you should be expected to have the necessary tools and 
skills to do so.

 	"Users" shouldn't be expected to build software on their 
machines without the proper development tools and environment set up 
to do so.

 	Which brings me to another point: Is source code "human 
readable"? It's marked up in a way that provides instructions to the 
user's editor and compiler. By your definition, it too can't be 
considered "human readable" unless we remove those instructions. 
Removing them however... fundamentally changes how the "text file" is 
handled by the reader.

 	And also by your definition, since an XML file is not "human 
readable", it must fail the test for GPL compliance. How would you 
provide a person with the "human readable" format of the source, to 
remain in compliance with that license? Would you consider XML the 
"machine readable" source instead?

> these semantic games do nothing but cloud the discussion.

 	By trying to assert that XML isn't plain text, you are the one 
confusing the issue. Since < and > are within the 0-127 character 
limit, XML is actually ascii text. That means it is "plain text". You 
lose this argument based on your own conclusions.
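A related detail, sketched in Python: XML's default encoding is UTF-8, so an XML file is not ASCII merely because `<` and `>` are. A document can nonetheless be kept pure ASCII by writing other characters as numeric character references, which a parser turns back into Unicode text. The sample string is made up.

```python
import xml.etree.ElementTree as ET

# The serialized document uses only ASCII: the accented letter is
# written as the numeric character reference &#233;.
source = "<p>caf&#233; au lait</p>"
assert source.isascii()

# Parsing turns the reference back into a Unicode character.
text = ET.fromstring(source).text
print(text)            # café au lait
print(text.isascii())  # False
```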

> the inference you are trying to get us to make is that "cleaning out 
> all the tags" will convert an x.m.l. file into a plain-text file, 
> magically.  it won't.

 	It won't, because it already is a "plain-text" file. Cleaning 
them out just removes some of the plain text, leaving other plain text 
behind. Removing <this> and <that> from the text is no different from 
removing (this) and (that).

> i've been writing a separate post that will give details how this 
> careful consideration and crafting must be done.  (some hints: 
> whitespace, quotemarks, and tables.)

 	Does it pass an XML validator? Is it well-formed? If not, then 
it isn't XML, and it is some other plain-text format with whitespace, 
quotemarks and tables.
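Well-formedness, at least, is easy to test. The sketch below uses Python's standard `xml.etree.ElementTree`, which rejects mismatched tags and bare `&` characters; `is_well_formed` is a hypothetical helper name. It checks well-formedness only; validating against a DTD or schema needs a separate tool (lxml, for example, supports that).

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML.

    This checks well-formedness only, not validity against a DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>It was the <i>best</i> of times.</p>"))  # True
print(is_well_formed("<p>It was the <i>best</p> of times.</i>"))  # False: tags overlap
print(is_well_formed("<p>Fish & chips</p>"))                      # False: bare &
```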

> pay attention to this, lurkers!  it's a sure sign of vapor-ware!

 	What is the vaporware? I haven't seen it yet. XML exists; it's 
not vaporware. I use it quite heavily to store Palm records with 
pilot-link. It's a great medium for atomic, record-level data in that 
specific case.

 	But I'm seeing that your argument is full of hot air... or 
vapor, if you wish to use proper semantics. ;)

> in its day, s.g.m.l. made all the same promises as x.m.l. does now. 
> it couldn't keep them, so s.g.m.l. people had to invent a variant, 
> so they could regenerate all their hype from scratch and reuse it.

 	No, SGML is completely different in goal and purpose from XML.

> and sure enough, the public is gullible enough to believe it all again.

 	When you believe the hype that XML has anything at all to do 
with the "Web", then you're the gullible one. XML is an empty bucket, 
nothing more. It simply "holds". That's it. This whole "XML is the 
future of the web" business is all just hype pushed by companies 
trying to sell you products based on XML that intersect with the web.

> of course, the same difficulties that thwarted s.g.m.l. back in the 
> day -- sabotaging all their hype -- will return and bite x.m.l. in 
> the butt.

 	You're spelling SGML and XML incorrectly again. For someone 
who is trying to defend what is, and what is not "plain text" or 
"ascii" or "unicode", you certainly don't know how to use grammar and 
spelling correctly. You would add significant weight to your arguments 
if you were able to articulate them using proper English.

> they'll "know about" that x.m.l. version indirectly; it will be the 
> reason their books are so expensive. due to all that cash those 
> consultants carted away.

 	Excuse me? How does storing a textual work in XML in any way 
increase its price? In fact, it should dramatically decrease the 
"price", because it requires less handling to convert to any of a 
dozen or more formats. Having to recreate a work in Word, pdf, XML, 
text, and so on is much more "interactive" work if your base format is 
something other than XML. It requires much more "carbon-based" 
handling to maintain in those formats (not to mention additional 
storage and processing and maintenance at update time).

>>    XML is only a long term and safe archive format

> hype and marketing.

 	And your solution is what? Your wacky ZML answer? Please.

> you can save a spreadsheet in "plain-text" form too, and then "use 
> any software for processing" that too. but you're going to find 
> yourself coming up short.

 	Not by your definition of "plain text".

> likewise when working with an x.m.l. file in a plain-text editor; 
> yes, it can be done, but you will find yourself coming up short.

 	Funny, not a single anti-XML argument I've ever read (and I've 
read hundreds) has ever said "XML is hard to work with because it's not 
plain text". Except here of course.

> but x.m.l. people will continue telling us this untruth, because 
> they want us to believe that x.m.l. is really simple.  but it's not.

 	Just because you're the only one who doesn't seem to grasp the 
ways XML can be used, edited and converted does not mean the format 
suffers or is lacking in any way.

 	The "X" in XML stands for Extensible. So extend it to suit 
your needs, or use something else. Nobody is twisting your arm.

> of course, if it ain't human-readable in that form, it doesn't 
> really matter if it "will never be lost".

 	Right, since XML is plain and simple and human readable, the 
document's contents will never be lost or buried in an unparsable 
format or a format that requires specialized tools to edit or 
maintain.

> i will repeat:  make x.m.l. work if you want us to respect it. don't 
> come and _tell_ us how wonderful it will be; show us. the proof is 
> in the pudding.  not in the hype and marketing.

 	Have you ever read an XML file that is properly styled, in an 
editor that properly renders it with that styling intact? XML was not 
meant to be "read" by human eyes. It's a bucket; it "holds". You 
process it to turn it into something that can be read by humans or 
other machines or whatever. It is "source code" in that respect, to 
the "compiler" (XSLT, DOM, parsers) that is used to read it.
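
As a rough illustration of the "source code" point above, here is a minimal Python sketch using the standard-library DOM parser (xml.dom.minidom). The element names (book, title, p) are a made-up toy vocabulary, not PG's or anyone's actual schema; the point is only that a program, not a human, turns the stored markup into readable text.

```python
import xml.dom.minidom

# Hypothetical fragment; the element names are illustrative only.
SOURCE = """<book>
  <title>A Sample Title</title>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</book>"""


def text_of(node):
    """Concatenate all text-node descendants of a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


def to_plain_text(source):
    """'Compile' the stored markup into a human-readable plain-text rendering."""
    dom = xml.dom.minidom.parseString(source)
    title = text_of(dom.getElementsByTagName("title")[0]).strip()
    paras = [text_of(p).strip() for p in dom.getElementsByTagName("p")]
    return "\n\n".join([title.upper()] + paras) + "\n"


if __name__ == "__main__":
    print(to_plain_text(SOURCE))
```

The same parsed tree could just as easily be walked by a second function that emits HTML or anything else; the stored file itself never changes.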

 	And as much as I hate to bring it up, how many times have you 
openly exclaimed that you were leaving for good, and failed to do so?

 	More hype and marketing?


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From joshua at hutchinson.net  Sat Aug 20 15:43:05 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Aug 20 14:35:28 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab050820133870f972bf@mail.gmail.com>
References: <11.4b9bef1f.3038e488@aol.com>
	<2510ddab050820133870f972bf@mail.gmail.com>
Message-ID: <4307B1F9.6000807@hutchinson.net>

Brent Gueth wrote:

>I do understand that unicode is the next generation of ascii, which as
>bowerbird pointed out includes the standard ascii characters and is
>enhanced from there.
>
>While we could make XML the standard, why shouldn't we just include it
>alongside the plaintext human readable revision without markup tags.
>  
>
Let me try to give the reasons why *I* started pursuing XML and see if 
that doesn't help allay some of your concerns.

My main involvement with PG texts comes from a DP background.  I'm one 
of the folks that help put the PG texts in place.  So my perspective is 
not so much that of someone reading the texts as that of someone 
producing them.  This isn't to say I don't consider the reader, but everyone 
tries to scratch their own itches first, and my itches are from a 
producer's point of view.

When producing a PG text nowadays, most people create multiple 
"versions."  At the most basic, people usually create a text version 
and an HTML version.  Text because that is the minimum required at PG, 
and HTML because there is a lot of information that cannot be well 
represented by a plain text file opened in Notepad.  Images are the 
first example that comes to mind.

Then, there are some texts which require/practically beg for additional 
"versions".  We have scientific texts that really need a LaTeX master 
document that is rendered to PDF.  Languages Other Than English (LOTE) 
texts that require a larger character set than ASCII, so you might do a 
UTF-8 encoded text.
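
To make the character-set point concrete, here is a small Python sketch that checks which of the three encodings can represent a given string exactly. The sample strings are invented for illustration; only built-in str.encode is used.

```python
SAMPLES = {
    "English": "The quick brown fox",
    "French": "Café, déjà vu",
    "Greek": "Καλημέρα κόσμε",
}
ENCODINGS = ("ascii", "latin-1", "utf-8")


def supported_encodings(text):
    """Return the encodings from ENCODINGS that can represent text exactly."""
    ok = []
    for enc in ENCODINGS:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue  # at least one character has no code in this encoding
        ok.append(enc)
    return ok


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:8} {supported_encodings(text)}")
```

The English sample fits all three encodings, the French one needs at least Latin-1, and the Greek one only fits UTF-8.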

The problem is, once you've created the first version (let's say it is 
the UTF-8 encoded plaintext format), you now have to do the manual work 
for the other formats.  Sometimes this is trivial, sometimes it is not.  
But to make matters worse, it is not uncommon to notice a typo in the 
HTML that you didn't fix earlier.  Now, you have to go back to the other 
versions and make the same "fix".  This very quickly becomes an 
organizational nightmare as I'm sure you can imagine.

XML solves this to a large extent.  I create one "master" document and 
then literally click a button and I get a UTF-8 encoded .txt file, a 
Latin-1 encoded .txt file, an ASCII encoded .txt file, an HTML 
file, and a PDF file.  I post all of them to the ww'ers (whitewashers) in a 
fraction of the time.  Plus, if someone down the road finds a problem in 
the text, the fix can be applied to the master XML and the other files can 
be regenerated.
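
A toy version of the "one master, many outputs" workflow described above, as a Python sketch using only the standard library (xml.etree.ElementTree). The tiny vocabulary here (text, head, p, hi) is loosely TEI-flavored but is an assumption for illustration, not the actual PGTEI schema or the converters at gutenberg.org. Both renderers read the same tree, so a typo fixed once in MASTER shows up in every regenerated output.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified master document (not real PGTEI).
MASTER = """<text>
  <head>Chapter I</head>
  <p>It was a <hi>dark</hi> and stormy night.</p>
  <p>The rain fell in torrents.</p>
</text>"""


def plain(elem):
    """All character data inside elem, with tags dropped."""
    return "".join(elem.itertext())


def inline_html(elem):
    """Render elem's mixed content, turning <hi> into <em>."""
    out = html.escape(elem.text or "")
    for child in elem:
        body = inline_html(child)
        out += ("<em>" + body + "</em>") if child.tag == "hi" else body
        out += html.escape(child.tail or "")
    return out


def to_txt(root):
    blocks = []
    for elem in root:
        if elem.tag == "head":
            blocks.append(plain(elem).upper())
        elif elem.tag == "p":
            blocks.append(plain(elem))
    return "\n\n".join(blocks) + "\n"


def to_html(root):
    lines = []
    for elem in root:
        if elem.tag == "head":
            lines.append("<h2>" + html.escape(plain(elem)) + "</h2>")
        elif elem.tag == "p":
            lines.append("<p>" + inline_html(elem) + "</p>")
    return "\n".join(lines) + "\n"


def build_all(master):
    """Regenerate every output format from the single master."""
    root = ET.fromstring(master)
    return {"txt": to_txt(root), "html": to_html(root)}


if __name__ == "__main__":
    for fmt, output in build_all(MASTER).items():
        print(f"--- {fmt} ---")
        print(output)
```

Further formats (Latin-1 or ASCII text, PDF via a typesetter) would be additional renderers over the same tree rather than separately maintained files.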

We are not doing away with the .txt files you want.  We are coming up 
with a more efficient way to create them (along with the many other 
document formats people want).

Oh, and yes, it is possible to create conversion routines for other 
formats as well.  Marcello had a Palm format working at one point, if I 
remember correctly.  An MS Reader .LIT file is possible (the specs are freely 
available and under a free license, we just need someone to take the 
time to create the converter).  Rocket ebook reader and others should 
all be possible as long as the spec for the format is freely available.

Please feel free to ask any questions you want on the subject.  I'll be 
happy to run at the mouth all you want!  ;)

Josh

From jon at noring.name  Sat Aug 20 14:42:55 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 14:43:09 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <11.4b9bef1f.3038e488@aol.com>
References: <11.4b9bef1f.3038e488@aol.com>
Message-ID: <1467962782.20050820154255@noring.name>

Bowerbird wrote:

>  i will repeat:  make x.m.l. work if you want us to respect it.

Who's "us"? Didn't you mean to say "me"?

From jon at noring.name  Sat Aug 20 14:52:48 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 14:53:00 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307B1F9.6000807@hutchinson.net>
References: <11.4b9bef1f.3038e488@aol.com>
	<2510ddab050820133870f972bf@mail.gmail.com>
	<4307B1F9.6000807@hutchinson.net>
Message-ID: <1834636618.20050820155248@noring.name>

Joshua wrote:

[keeping his whole reply intact]

> My main involvement with PG texts comes from a DP background.  I'm one
> of the folks that help put the PG texts in place.  So my perspective is
> not as much from the point of reading the texts and it is producing the
> texts.  This isn't to say I don't consider the reader, but everyone 
> tries to scratch their own itches first, and my itches are from a 
> producer's point of view.
>
> When you create a PG text now a days, most people create multiple 
> "versions."  At the most basic, people usually create the text version
> and a HTML version.  Text is because that is the minimum required at PG,
> and HTML because there is a lot of information that cannot be well 
> represented by a plain text file opened in Notepad.  Images are the 
> first example that come to mind.
>
> Then, there are some texts which require/practically beg for additional
> "versions".  We have scientific texts that really need a latex master
> document that is rendered to PDF.  Languages Other Than English (LOTE)
> texts that require a larger character set than ASCII, so you might do a
> UTF-8 encoded text.
>
> The problem is, once you've create the first version (let's say it is
> the UTF-8 encoded plaintext format), you now have to do the manual work
> for the other formats.  Sometimes this is trivial, sometimes it is not.
> But to make matters worse, it is not uncommon to notice a typo in the
> HTML that you didn't fix earlier.  Now, you have to go back to the other
> versions and make the same "fix".  This very quickly becomes an 
> organizational nightmare as I'm sure you can imagine.
>
> XML solves this to a large extent.  I create one "master" document and
> then literally click a button and I get a UTF-8 encoded .txt file, a
> Latin-1 encoded .txt file, an ASCII encoded .txt file, a HTML encoded
> file, and a PDF file.  I post all of them to the ww'ers in a fraction of
> the time.  Plus, if someone down the road finds a problem in the text,
> the fix can be applied to the master XML and the others files can be
> regenerated.
>
> We are not doing away with the .txt files you want.  We are coming up
> with a more efficient way to create it (along with the many other 
> document formats people want).
>
> Oh, and yes, it is possible to create conversion routines for other 
> formats as well.  Marcello had a Palm format working at one point, if I
> remember correctly.  A MS reader .LIT is possible (the specs are freely
> available and under a free license, we just need someone to take the
> time to create the converter).  Rocket ebook reader and others should
> all be possible as long as the spec for the format is freely available.
>
> Please feel free to ask any questions you want on the subject.  I'll be
> happy to run at the mouth all you want!  ;)

Kudos!

This is by far the best reply I've yet seen on the practical benefits of
XML for producing structured digital texts. Cogent, simple, and to the
point, backed up by real-world experience.

Joshua, you might consider submitting what you wrote to David Rothman's
TeleRead blog as a guest blog article (his blog is one of the more popular
blogs on the Internet, and by far the most read blog regarding ebooks
and digital libraries.) Let me know -- I will be glad to assist.

Jon

From Bowerbird at aol.com  Sat Aug 20 15:14:55 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 15:15:12 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <dc.2bf1b747.3039055f@aol.com>

david said:
>    So what exactly is your point?

my point, exactly, is that a lot of people here
tout x.m.l. as the solution, but no one seems to
want to pay the cost of actually doing the markup.

why is that?

as to all the verbiage about what constitutes
whether a file is "human readable" or not,
if we ask some humans, the answer is clear.


>   Since < and > are within the 0-127 character limit, 
>    XML is actually ascii text. That means it is "plain text". 

um, no.   an x.m.l. file is not "plain text".   ask a human.


>   How does storing a textual work in XML 
>    in any way increase its price?

applying the markup requires expensive expertise.


>    In fact, it should dramatically decrease the "price", 
>    because it requires less handling to convert to 
>    any of a dozen or more formats.

or so the hype goes.   but where is the pudding?

meanwhile, over at blackmask, david has been
converting the entire project gutenberg library
into a half-dozen formats for several years now,
based on the plain-text versions, with zero x.m.l.


>   And as much as I hate to bring it up, 
>    how many times have you openly exclaimed that 
>    you were leaving for good, and failed to do so?

you'll be dealing with me for a long time, david...

provide some pudding.   don't just talk about it.
there are 17,000 e-texts waiting to be marked up.

-bowerbird
From Bowerbird at aol.com  Sat Aug 20 15:18:47 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 15:19:03 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <1ac.3db11948.30390647@aol.com>

jon noring said something or other.

however, since jon is still moderating michael hart
over on jon's listserve, i won't be responding to him.

-bowerbird
From marcello at perathoner.de  Sat Aug 20 16:03:14 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 16:03:46 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307B1F9.6000807@hutchinson.net>
References: <11.4b9bef1f.3038e488@aol.com>	<2510ddab050820133870f972bf@mail.gmail.com>
	<4307B1F9.6000807@hutchinson.net>
Message-ID: <4307B6B2.1000905@perathoner.de>

Joshua Hutchinson wrote:

> Marcello had a Palm format working at one point, if I 
> remember correctly.

I dropped it because pluckering the html file gives you a better 
experience at a smaller file size.

The same conversion should be possible for Pocket-PC formats, but I'm 
not going to buy one just to test this.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Sat Aug 20 16:12:09 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 16:12:23 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab050820133870f972bf@mail.gmail.com>
References: <11.4b9bef1f.3038e488@aol.com>
	<2510ddab050820133870f972bf@mail.gmail.com>
Message-ID: <4307B8C9.1080407@perathoner.de>

Brent Gueth wrote:

> While we could make XML the standard, why shouldn't we just include it
> alongside the plaintext human readable revision without markup tags.

That's just what we were going to do.


> To one of the people that commented on my email - yes I want everyone
> to eat rocks.

Tip: don't open a restaurant.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Sat Aug 20 16:26:05 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Aug 20 16:26:20 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307B1F9.6000807@hutchinson.net>
References: <11.4b9bef1f.3038e488@aol.com>
	<2510ddab050820133870f972bf@mail.gmail.com>
	<4307B1F9.6000807@hutchinson.net>
Message-ID: <Pine.GSO.4.58.0508201620070.29468@vtn1.victoria.tc.ca>


On Sat, 20 Aug 2005, Joshua Hutchinson wrote:

>
> The problem is, once you've create the first version (let's say it is
> the UTF-8 encoded plaintext format), you now have to do the manual work
> for the other formats.  Sometimes this is trivial, sometimes it is not.
> But to make matters worse, it is not uncommon to notice a typo in the
> HTML that you didn't fix earlier.  Now, you have to go back to the other
> versions and make the same "fix".  This very quickly becomes an
> organizational nightmare as I'm sure you can imagine.
>
> XML solves this to a large extent.  I create one "master" document and
> then literally click a button and I get a UTF-8 encoded .txt file, a
> Latin-1 encoded .txt file, an ASCII encoded .txt file, a HTML encoded
> file, and a PDF file.  I post all of them to the ww'ers in a fraction of
> the time.  Plus, if someone down the road finds a problem in the text,
> the fix can be applied to the master XML and the others files can be
> regenerated.


I'll add this to Josh's well-worded message. For the whitewashers
and anyone doing maintenance on the PG files, having a variety of
file formats to deal with does sometimes make quite a headache.

Recently, I was making some corrections in a text that was in the
collection in txt, htm, and rtf formats, and I can tell you that
editing rtf manually is not fun.

Also note that, in the example Josh mentioned above, after he
submits the files a whitewasher will review them with some
automatic checking before they are posted, and any corrections
made at that stage will need to be applied to each file format individually.

Andrew
From joshua at hutchinson.net  Sat Aug 20 16:50:40 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Aug 20 16:50:53 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>

I played a little with the ReaderWorks converter for HTML to LIT.  The biggest limitation is that the LIT format supports a nice Table of Contents feature which a basic HTML to LIT conversion doesn't support.  The LIT specs are supposedly free (and under a Free License) but I haven't checked into it any further than that.  I suppose that after TXT, HTML and PDF are working in the PG mainstream, I'll move on to other formats like the Palm and Reader formats.

----- Original Message -----
From: "Marcello Perathoner" <marcello@perathoner.de>
> 
> Joshua Hutchinson wrote:
> 
> > Marcello had a Palm format working at one point, if I remember correctly.
> 
> I dropped it because pluckering the html file gives you a better experience at 
> a smaller file size.
> 
> The same conversion should be possible for Pocket-PC formats, but I'm not 
> going to buy one just to test this.
> 
> 
> -- Marcello Perathoner
> webmaster@gutenberg.org

From marcello at perathoner.de  Sat Aug 20 17:25:09 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 17:25:25 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
Message-ID: <4307C9E5.5070606@perathoner.de>

Joshua Hutchinson wrote:

> I played a little with the ReaderWorks converter for HTML to LIT.
> The biggest limitation is that the LIT format supports a nice Table
> of Contents feature which a basic HTML to LIT conversion doesn't
> support.  The LIT specs are supposedly free (and under a Free
> License) but I haven't checked into it any further than that.  I
> supposed after TXT, HTML and PDF are working in the PG mainstream,
> I'll move on to other formats like the Palm and Reader formats.


Plucker lets you download a web site (and likewise an html ebook) to 
your Palm. Links and images still work. It's GPLed. But it's PalmOS only.

AvantGo does the same for PocketPC. But it is payware.

We need a reader for PocketPC (and Symbian) and an html converter that 
runs on (at least) linux. Both must be open source.

Any suggestions?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Sat Aug 20 17:20:23 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 17:25:39 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <99.64e100fd.303922c7@aol.com>

andrew said:
>    having a variety of file formats to deal with 
>    does sometimes make quite a headache.

yes, having a master-file is indeed a good thing.

but it's just not true that x.m.l. is the only form
that a master can take.   and it might not be true
that x.m.l. is even the _best_ form that it can take.

again, david moynihan at blackmask.com has proven
that x.m.l. isn't necessary to generate lots of formats.
through the use of standardized formatting, he has been
able to do what none of you have even been able to start.
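
For contrast, the "standardized formatting" approach can also be sketched in a few lines of Python. The conventions assumed here (blank lines separate blocks; a one-line, all-uppercase block is a heading) are invented for illustration and are not Blackmask's actual rules. One trade-off is visible in the code: the structure is inferred by heuristics rather than stated, so a one-line paragraph typed in capitals would be read as a heading.

```python
import html


def plain_to_html(text):
    """Convert plain text to HTML using simple, assumed layout conventions."""
    out = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        block = block.strip()
        if not block:
            continue  # skip runs of extra blank lines
        lines = [line.strip() for line in block.splitlines()]
        if len(lines) == 1 and block.isupper():
            out.append("<h2>" + html.escape(block) + "</h2>")
        else:
            # Re-join hard-wrapped lines into one paragraph.
            out.append("<p>" + html.escape(" ".join(lines)) + "</p>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(plain_to_html("CHAPTER I\n\nIt was a dark\nand stormy night.\n"))
```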

but hey, i look forward to the day when you do get underway,
and to the time long after that when all the markup gets done,
because then the e-texts will finally be in a regularized format...

-bowerbird
From hacker at gnu-designs.com  Sat Aug 20 18:14:20 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 18:14:52 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307C9E5.5070606@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<4307C9E5.5070606@perathoner.de>
Message-ID: <Pine.LNX.4.61.0508202113001.2861@aphrodite.gnu-designs.com>


> Plucker lets you download a web site (and conversely an html ebook) 
> to your Palm. Links and images still work. Its GPLed. But its PalmOS 
> only.

 	Incorrect. Plucker runs on PalmOS, PocketPC, Windows Mobile, 
Linux and on non-PDA desktop machines. There are ports of the viewer 
for those platforms, many of which we carry in CVS.

> AvantGo does the same for PocketPC. But it is payware.

 	AvantGo falls short of about 40 of Plucker's core features.

> We need a reader for PocketPC (and Symbian) and an html converter 
> that runs on (at least) linux. Both must be open source.

 	Plucker and Vade Mecum (the PocketPC viewer based on Plucker) are 
the tools you need.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From gbnewby at pglaf.org  Sat Aug 20 18:23:35 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Aug 20 18:23:36 2005
Subject: [gutvol-d] Another rule 6 question
In-Reply-To: <Pine.LNX.4.44.0508191400170.30232-100000@durendal.durendal.org>
References: <Pine.LNX.4.44.0508191400170.30232-100000@durendal.durendal.org>
Message-ID: <20050821012335.GC2094@pglaf.org>

On Fri, Aug 19, 2005 at 02:01:38PM -0400, Greg Weeks wrote:
> 
> When a book is cleared with rule 6, is the artwork cleared also? Can the
> artwork for the cover and interior illustration have a separate renewal?

The intent is for a clearance to clear an entire item:
intro, artwork, footnotes, etc.  When we think it doesn't
(for example, if there is a modern/new intro for an old
title), we try to mention this in the clearance notes.

If *you* think the artwork post-dates the printed volume,
please let us know so we can judge (we == Juliet & I).
Otherwise, yes: it's safe to think that any copyright rule
applied is for the entire printed work, including artwork.
  -- Greg
From marcello at perathoner.de  Sat Aug 20 18:33:30 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 18:34:02 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <Pine.LNX.4.61.0508202113001.2861@aphrodite.gnu-designs.com>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>	<4307C9E5.5070606@perathoner.de>
	<Pine.LNX.4.61.0508202113001.2861@aphrodite.gnu-designs.com>
Message-ID: <4307D9EA.5040704@perathoner.de>

David A. Desrosiers wrote:

>     Incorrect. Plucker runs on PalmOS, PocketPC, Windows MObile, Linux 
> and on non-PDA desktop machines. There are ports of the viewer for those 
> platforms, many of which we carry in CVS.

2.1 What platforms does Plucker run on?

The viewer should run on any PalmOS® device running version 2.0.4 or 
higher of PalmOS, while the desktop tools are supported on Linux, 
Windows, Mac OS X, and OS/2.

   ---- http://www.plkr.org/faq/2.1


And, no, I won't tell Aunt Tillie that she just has to pull the sources 
from CVS and compile if she wants to read a book.


>     Plucker, Vade Mecum (the PocketPC viewer based on Plucker) are the 
> tools you need.

Is this thing GPLed? Why don't I find any reference to this on the 
plucker site?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From hacker at gnu-designs.com  Sat Aug 20 19:19:06 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 19:19:53 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307D9EA.5040704@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<4307C9E5.5070606@perathoner.de>
	<Pine.LNX.4.61.0508202113001.2861@aphrodite.gnu-designs.com>
	<4307D9EA.5040704@perathoner.de>
Message-ID: <Pine.LNX.4.61.0508202214280.4442@aphrodite.gnu-designs.com>


> The viewer should run on any PalmOS® device running version 2.0.4 or 
> higher of PalmOS, while the desktop tools are supported on Linux, 
> Windows, Mac OS X, and OS/2.
>
>  ---- http://www.plkr.org/faq/2.1

 	As you know, the documentation is the last thing to be 
updated, and we can never track every single project out there using 
Plucker as an engine (there are now over two dozen of them, commercial 
and non).

> And, no, I won't tell Aunt Tillie that she just has to pull the 
> sources from CVS and compile if she wants to read a book.

 	Of course not; download the binaries provided on the other 
websites. In the case of Linux-based PDAs, use the reader packaged for 
those platforms (we don't provide packages for them, of course; that's 
not our job).

 	The same goes for the PocketPC and WindowsMobile versions. I'm 
not sure about a Symbian version, but I know Plucker runs on that new 
Nokia/Linux tablet device.

> Is this thing GPLed? Why don't I find any reference to this on the 
> plucker site?

 	Perhaps you didn't look? It's been there for almost exactly 2 
years:

 	http://www.plkr.org/news/31

 	As for the "cobwebs" on the site, the Plucker site is being 
rewritten from the ground up, and that includes catching up on about 
30 news articles that have to be made public as well.

 	We all have day jobs and that takes away from our time to play 
with these kinds of things. I've recently been asking the community to 
help us bring the docs and FAQ and other bits up to date, but the 
response has been depressingly light.

 	http://code.plkr.org/docwiki/

 	And some things I've been working on are over here:

 	http://code.plkr.org/


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From jon at noring.name  Sat Aug 20 19:06:22 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 20:24:04 2005
Subject: OEBPS to LIT Converter (was Re: [gutvol-d] About the XML debate)
In-Reply-To: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
Message-ID: <109403289.20050820200622@noring.name>

Joshua wrote:

> I played a little with the ReaderWorks converter for HTML to LIT.
> The biggest limitation is that the LIT format supports a nice Table
> of Contents feature which a basic HTML to LIT conversion doesn't
> support.  The LIT specs are supposedly free (and under a Free
> License) but I haven't checked into it any further than that.  I
> supposed after TXT, HTML and PDF are working in the PG mainstream,
> I'll move on to other formats like the Palm and Reader formats.

LIT is essentially an encapsulated OEBPS 1.0.1 Publication. What
ReaderWorks does is take HTML and "conform" it internally to OEBPS,
then converts it to LIT using Microsoft's litgen.dll.

Microsoft has a Reader SDK which includes a "demo" to convert OEBPS
1.0.1 into LIT. I've taken that demo and tweaked the C++ code some and
then compiled it to generate a "production" level converter which I
use for my publishing business. ReaderWorks has some bugs that prevent
using the full power of OEBPS that LIT supports.

The LIT format supports the OEBPS Tours and "out-of-spine" feature
(where "out-of-spine" content is presented in "pagelets".) Most
publishers who produce LIT (using either ReaderWorks or, heaven
forbid, Word HTML as the input) are totally unaware of these cool
features. I use Tours and "out-of-spine" content a lot in my ebooks
(e.g., I put all footnotes into popup pagelets.)

Joshua, I'd be happy to share my OEBPS to LIT converter, as well as a
sample OEBPS Publication. You can use the Package supplied in the
sample Publication as a template to build your own Packages and
implement Tours and "out-of-spine" content. Let me know...

Jon Noring

From brad at chenla.org  Sun Aug 21 00:09:50 2005
From: brad at chenla.org (Brad Collins)
Date: Sun Aug 21 00:10:44 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <1ac.3db11948.30390647@aol.com> (Bowerbird@aol.com's message of
	"Sat, 20 Aug 2005 18:18:47 EDT")
References: <1ac.3db11948.30390647@aol.com>
Message-ID: <8xyvepfl.fsf@chenla.org>


Bowerbird /

I don't understand why a person who hates XML/HTML so much sends HTML
formatted mail to the list.

You should really practice what you preach :)

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From marcello at perathoner.de  Sun Aug 21 05:40:05 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 05:40:27 2005
Subject: OEBPS to LIT Converter (was Re: [gutvol-d] About the XML debate)
In-Reply-To: <109403289.20050820200622@noring.name>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<109403289.20050820200622@noring.name>
Message-ID: <43087625.9010104@perathoner.de>

Jon Noring wrote:

> LIT is essentially an encapsulated OEBPS 1.0.1 Publication. What
> ReaderWorks does is take HTML and "conforms" it internally to OEBPS,
> then converts it to LIT using Microsoft's litgen.dll.

I'll add new formats to the PGTEI converter on the condition that:

  1. all components of the converter MUST be open source,
  2. all components of the converter MUST run under linux,
  3. the new format SHOULD be documented and be an open standard,
  4. there SHOULD be at least one free as in beer reader.


Ad 1.

The converter must run on servers at ibiblio. We cannot afford server 
licenses. Besides, I'm a narrow-minded free software bigot bastard and 
proud of it.

Ad 2.

The converter must run on ibiblio servers which run on linux.

Ad 3.

Ideally the format should be an open standard like HTML. I personally 
won't do any work on undocumented formats. But if anybody else takes the 
trouble I'm not going to stand in their way.

Ad 4.

Ideally the viewer should be open source, but I'll settle for a free 
beer one. It just feels wrong to make people pay for a viewer to read 
free books on.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Sun Aug 21 07:01:10 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sun Aug 21 05:54:42 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <4307C9E5.5070606@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<4307C9E5.5070606@perathoner.de>
Message-ID: <43088926.70704@hutchinson.net>

Thanks to some back and forth with David Widger, we have posted a text 
to the PG archives that consists of the XML master plus the txt, html 
and pdf files converted straight from it.

http://www.gutenberg.org/1/6/5/2/16523

For those interested:  This book (Kitab-i-Aqdas) is a religious book 
from the Baha'i Faith.  The text is freely available from the Baha'i 
website with a usage license that allows us to post the text to our 
archive as long as we don't make any content changes.  I've basically 
converted it from the Microsoft Word file they posted into a 
PGTEI-based master and used that to create text in UTF-8, Latin-1 and 7-bit
ASCII, html and pdf.

Regarding the XML.  The XML file can be found in the 16523-x 
subdirectory.  These files are not designed to be read directly in a web 
browser like IE or Firefox.  They are plain text files and open just 
fine in Notepad or vi or any other text editor of choice.  For those 
wishing to play with the XML, our online validator and conversion tools 
can be found here:

http://www.gutenberg.org/tei

Besides wanting to celebrate the first XML posting ;)  ... I'm also 
looking for constructive criticism.  What doesn't look right?  What 
problems do you see with the results?

Thanks for your attention,
Joshua Hutchinson
From hacker at gnu-designs.com  Sun Aug 21 06:25:55 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sun Aug 21 06:26:49 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net>
Message-ID: <Pine.LNX.4.61.0508210924510.17973@aphrodite.gnu-designs.com>


> Besides wanting to celebrate the first XML posting ;)  ... I'm also 
> looking for contructive criticism.  What doesn't look right?  What 
> problems do you see with the results?

 	Other than the unicode changes, what is the difference between 
16523-0.txt and 16523-8.txt? They appear to contain identical content.
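
One quick way to answer that kind of question is to decode each file with its own encoding and diff the resulting text rather than the raw bytes. This Python sketch assumes the usual PG naming convention that a -0 file is UTF-8 and a -8 file is ISO-8859-1 (Latin-1); only standard-library calls are used. An empty diff means the two files carry identical text and differ only in byte encoding; any lines that do appear show where the Latin-1 version had to substitute characters.

```python
import difflib
import sys
from pathlib import Path


def decoded_diff(utf8_bytes, latin1_bytes):
    """Unified diff of the decoded text, ignoring byte-level encoding."""
    a = utf8_bytes.decode("utf-8").splitlines()
    b = latin1_bytes.decode("latin-1").splitlines()  # Latin-1 never fails to decode
    return list(difflib.unified_diff(
        a, b, fromfile="-0 (UTF-8)", tofile="-8 (Latin-1)", lineterm=""))


def compare_files(utf8_path, latin1_path):
    return decoded_diff(Path(utf8_path).read_bytes(),
                        Path(latin1_path).read_bytes())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (assumed): python compare.py 16523-0.txt 16523-8.txt
    for line in compare_files(sys.argv[1], sys.argv[2]):
        print(line)
```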


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From marcello at perathoner.de  Sun Aug 21 07:14:25 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 07:14:33 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>	<4307C9E5.5070606@perathoner.de>
	<43088926.70704@hutchinson.net>
Message-ID: <43088C41.3000004@perathoner.de>

Joshua Hutchinson wrote:

> Besides wanting to celebrate the first XML posting ;)  ... I'm also 
> looking for constructive criticism.  What doesn't look right?  What 
> problems do you see with the results?

1.

The TEI files would be better named .tei and put into a 16523-tei/ 
directory. We have other types of XML files (MusicXML) and we don't want 
to get them confused. Besides, TEI is a more specific name than XML.

2.

The PDF shows some overly long page headlines. The page headline in pdf 
is taken from the toc entry ... Maybe I should change that to be taken 
from the pdf bookmarks, so you have a little more control over it.

Personally, I would just not include all the "notes" in the toc. This is 
allowed by the license ("in whole or in part").

3.

PDF again. The "Synopsis and Codification" section is not indented like 
in TXT and HTML. That is probably a bug in the converter. I'll look into it.

4.

PDF again. Some chapter names contain unicode characters like em-dash 
and pretty quotes. These are not supported by PDF bookmarks. You have to 
provide a `dumbed-down' title for the bookmark with: <index index="pdf"> 
before the <head>.
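A minimal Python sketch of computing such a dumbed-down title, assuming the troublesome characters are mainly typographic dashes, quotes and ellipses. The mapping table and the `bookmark_title` helper are hypothetical, not part of the PGTEI converter, and stripping accents from any remaining letters is an extra assumption; the sketch only produces the ASCII string and does not emit the `<index>` markup.

```python
import unicodedata

# Hypothetical table of typographic characters and their ASCII stand-ins.
TYPOGRAPHIC_TO_ASCII = {
    "\u2014": "--",   # em-dash
    "\u2013": "-",    # en-dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
}

def bookmark_title(title: str) -> str:
    """Return an ASCII-only version of a chapter title for a PDF bookmark."""
    # First replace the typographic characters we know about.
    for char, replacement in TYPOGRAPHIC_TO_ASCII.items():
        title = title.replace(char, replacement)
    # Assumed fallback: decompose accented letters (a + combining accent)
    # and drop whatever is still not ASCII, i.e. the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(bookmark_title("Notes \u2014 \u201cThe Most Holy Book\u201d"))
```

The explicit table keeps the common cases readable ("--" rather than nothing for an em-dash); the NFKD step is only a catch-all for characters the table does not list.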


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Sun Aug 21 08:51:04 2005
From: jon at noring.name (Jon Noring)
Date: Sun Aug 21 08:51:16 2005
Subject: OEBPS to LIT Converter (was Re: [gutvol-d] About the XML debate)
In-Reply-To: <43087625.9010104@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<109403289.20050820200622@noring.name> <43087625.9010104@perathoner.de>
Message-ID: <348550588.20050821095104@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> LIT is essentially an encapsulated OEBPS 1.0.1 Publication. What
>> ReaderWorks does is take HTML and "conforms" it internally to OEBPS,
>> then converts it to LIT using Microsoft's litgen.dll.

> I'll add new formats to the PGTEI converter on the condition that:
>
>   1. all components of the converter MUST be open source,
>   2. all components of the converter MUST run under linux,
>   3. the new format SHOULD be documented and be an open standard,
>   4. there SHOULD be at least one free as in beer reader.

Well, that pretty much leaves LIT out of the picture (essentially by 3
and 4).

However, OEBPS 1.0.1 would be a viable format to produce (and quite
easy if the book's documents validate as XHTML 1.0 Strict.) Then
end-users can produce LIT if they so choose. (I'd also produce an
OEBPS 1.2 Publication version -- there are subtle differences
between the two.) As a format, OEBPS fulfills all the openness
requirements.

There are a couple of primitive viewers (still under development) for
OEBPS 1.0.1 and 1.2. This includes the "OpenBerg" project. OpenReader
(the format) is planning on embracing OEBPS 1.2 and later a selected
subset of TEI (PGTEI?).

Jon Noring


From joshua at hutchinson.net  Sun Aug 21 11:40:27 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sun Aug 21 10:33:26 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <Pine.LNX.4.61.0508210924510.17973@aphrodite.gnu-designs.com>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>	<4307C9E5.5070606@perathoner.de>
	<43088926.70704@hutchinson.net>
	<Pine.LNX.4.61.0508210924510.17973@aphrodite.gnu-designs.com>
Message-ID: <4308CA9B.9030601@hutchinson.net>

David A. Desrosiers wrote:

>
>> Besides wanting to celebrate the first XML posting ;)  ... I'm also 
>> looking for constructive criticism.  What doesn't look right?  What 
>> problems do you see with the results?
>
>
>     Other than the unicode changes, what is the difference between 
> 16523-0.txt and 16523-8.txt? They appear to contain identical content.
>
>
16523-0.txt is UTF-8 encoding

16523-8.txt is Latin-1 encoding

16523-7.txt is ASCII encoding.

The content should otherwise be identical.
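As a rough illustration of how the three files can hold the same text yet differ byte for byte, here is a Python sketch that encodes one string three ways. The `-0`/`-8`/`-7` suffixes follow the naming above; the `encode_variants` helper and the sample string are hypothetical, and the ASCII transliteration shown (dropping accents) is only one possible choice, not necessarily what the PGTEI converter does.

```python
import unicodedata

def encode_variants(text: str) -> dict[str, bytes]:
    """Encode one text as the -0 (UTF-8), -8 (Latin-1) and -7 (ASCII) variants."""
    # Assumed ASCII fallback: decompose accented letters, drop combining marks.
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return {
        "-0.txt": text.encode("utf-8"),
        # Characters outside Latin-1 would become '?'.
        "-8.txt": text.encode("latin-1", errors="replace"),
        "-7.txt": ascii_text.encode("ascii"),
    }

for suffix, data in encode_variants("Kit\u00e1b-i-Aqdas").items():
    print(suffix, data)
```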

From Bowerbird at aol.com  Sun Aug 21 12:42:09 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug 21 12:42:19 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <128.635cfd05.303a3311@aol.com>

brad said:
>    I don't understand why a person who hates XML/HTML 
>    so much sends HTML formated mail to the list.

it's been forced upon me, as i can't turn it off.


>    You should really practice what you preach :)

yes, i should.   instead the powers-that-be are
forcing things on me, which is what y'all preach.       ;+)

meanwhile, this thread has hit 20+ posts over the weekend,
which is very rude, since many people do not check their
e-mailboxes on the weekend, so i'll opt out until tomorrow.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050821/ff937c93/attachment.html
From marcello at perathoner.de  Sun Aug 21 13:01:25 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 13:01:50 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <128.635cfd05.303a3311@aol.com>
References: <128.635cfd05.303a3311@aol.com>
Message-ID: <4308DD95.3080001@perathoner.de>

Bowerbird@aol.com wrote:

> it's been forced upon me, as i can't turn it off.

ROTFL: the great programmer can't figure out how to install a decent 
mail program.

Your words are soo big, but your actual abilities are pretty small.


> so i'll opt out until tomorrow.

Promises, promises ...


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Sun Aug 21 14:14:21 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sun Aug 21 13:04:32 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <128.635cfd05.303a3311@aol.com>
References: <128.635cfd05.303a3311@aol.com>
Message-ID: <4308EEAD.1000805@hutchinson.net>

Bowerbird@aol.com wrote:

> meanwhile, this thread has hit 20+ posts over the weekend,
> which is very rude, since many people do not check their
> e-mailboxes on the weekend, so i'll opt out until tomorrow.

Huh?  That sentence is perfect English, yet makes no sense to me at all.

Josh
From Bowerbird at aol.com  Sun Aug 21 13:51:15 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug 21 13:51:31 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
Message-ID: <88.2d4a4147.303a4343@aol.com>

joshua said:
>    ANNOUNCEMENT: XML has hit the PG archives!

that's great!

one down, 16000+ left to convert...

congratulations!

-bowerbird
From Bowerbird at aol.com  Sun Aug 21 13:52:07 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug 21 13:52:20 2005
Subject: [gutvol-d] re: narrow-minded free software bigot bastard,
	and proud of it
Message-ID: <d1.2f5e525c.303a4377@aol.com>

marcello said:
>    Besides, I'm a narrow-minded free software bigot bastard 
>    and proud of it.

great.   i like people who stand up for what they believe in.

now i have a question.   who runs project gutenberg, anyway?

-bowerbird
From marcello at perathoner.de  Sun Aug 21 14:31:44 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 14:31:58 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <88.2d4a4147.303a4343@aol.com>
References: <88.2d4a4147.303a4343@aol.com>
Message-ID: <4308F2C0.7060401@perathoner.de>

Bowerbird@aol.com wrote:

>>   ANNOUNCEMENT: XML has hit the PG archives!
> 
> that's great!
> 
> one down, 16000+ left to convert...

That's exactly one more done item than you can show.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org  Sun Aug 21 14:40:04 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Aug 21 14:40:06 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
	<4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net>
Message-ID: <20050821214004.GB3229@pglaf.org>

On Sun, Aug 21, 2005 at 10:01:10AM -0400, Joshua Hutchinson wrote:
> Thanks to some back and forth with David Widger, we have posted a text 
> to the PG archives that is basically the XML with its straight from 
> conversion txt, html and pdf files.
> 
> http://www.gutenberg.org/1/6/5/2/16523

Thanks, Joshua.  This is major!!

I'm still ready to post Gilgamesh, too (and in fact,
had been thinking of just "going for it").  I hope
you'll be able to work on it soon.

Today (yesterday?) will stand as a great day in 
Project Gutenberg history.  XML as the base format
for these "static" and forthcoming "dynamic" conversions
is what we've been talking about for years.  It's
the key to many of the activities we've anticipated.

Congratulations!!!
  -- Greg
From jeroen.mailinglist at bohol.ph  Mon Aug 22 12:43:19 2005
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Mon Aug 22 13:09:50 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>	<4307C9E5.5070606@perathoner.de>
	<43088926.70704@hutchinson.net>
Message-ID: <430A2AD7.8050302@bohol.ph>


Hurray, more XML!

Some time ago (in February 2004), I'd already prepared

The Einstein Theory of Relativity by H.A. Lorentz

as http://www.gutenberg.org/etext/11335

This was also posted in XML with derived text and HTML.

Jeroen.


Joshua Hutchinson wrote:

> Thanks to some back and forth with David Widger, we have posted a text 
> to the PG archives that is basically the XML with its straight from 
> conversion txt, html and pdf files.
>
> http://www.gutenberg.org/1/6/5/2/16523
>
> For those interested:  This book (Kitab-i-Aqdas) is a religious book 
> from the Baha'i Faith.  The text is freely available from the Baha'i 
> website with a usage license that allows us to post the text to our 
> archive as long as we don't make any content changes.  I've basically 
> converted their posted Microsoft Word version into a PGTEI-based 
> master and used that to create text in UTF-8, Latin-1 and 7-bit 
> ASCII, html and pdf.
>
> Regarding the XML.  The XML file can be found in the 16523-x 
> subdirectory.  These files are not designed to be read directly in a 
> web browser like IE or Firefox.  They are plain text files and open 
> just fine in Notepad or vi or any other text editor of choice.  For 
> those wishing to play with the XML, our online validator and 
> conversion tools can be found here:
>
> http://www.gutenberg.org/tei
>
> Besides wanting to celebrate the first XML posting ;)  ... I'm also 
> looking for constructive criticism.  What doesn't look right?  What 
> problems do you see with the results?
>
> Thanks for your attention,
> Joshua Hutchinson
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>
>

From Bowerbird at aol.com  Mon Aug 22 14:02:58 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Aug 22 14:03:14 2005
Subject: [gutvol-d] re: the great programmer
Message-ID: <1a2.3a3a3de0.303b9782@aol.com>

marcello said:
>    ROTFL: the great programmer 
>    can't figure out how to 
>    install a decent mail program.
>    Your words are soo big, 
>    but your actual abilities are pretty small.

yeah, i'm just one of the 30 million a.o.l. idiots
slobbering on our keyboards...            :+)

-bowerbird
From lee at novomail.net  Mon Aug 22 15:59:41 2005
From: lee at novomail.net (Lee Passey)
Date: Tue Aug 23 01:49:29 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <20050821132650.DA1298C992@pglaf.org>
References: <20050821132650.DA1298C992@pglaf.org>
Message-ID: <430A58DD.60004@novomail.net>

Joshua Hutchinson <joshua@hutchinson.net> wrote:

> Thanks to some back and forth with David Widger, we have posted a text 
> to the PG archives that is basically the XML with its straight from 
> conversion txt, html and pdf files.
>
> http://www.gutenberg.org/1/6/5/2/16523
>
> For those interested: This book (Kitab-i-Aqdas) is a religious book 
> from the Baha'i Faith. The text is freely available from the Baha'i 
> website with a usage license that allows us to post the text to our 
> archive as long as we don't make any content changes. I've basically 
> converted their posted Microsoft Word version into a PGTEI-based 
> master and used that to create text in UTF-8, Latin-1 and 7-bit 
> ASCII, html and pdf.
>
> Regarding the XML. The XML file can be found in the 16523-x 
> subdirectory. These files are not designed to be read directly in a 
> web browser like IE or Firefox. They are plain text files and open 
> just fine in Notepad or vi or any other text editor of choice. For 
> those wishing to play with the XML, our online validator and 
> conversion tools can be found here:
>
> http://www.gutenberg.org/tei
>
> Besides wanting to celebrate the first XML posting ;) ... I'm also 
> looking for constructive criticism. What doesn't look right? What 
> problems do you see with the results?

Congratulations on a worthwhile accomplishment.

I would like to point out, however, that this is _not_ Gutenberg's first 
XML posting; I believe there are hundreds of XHTML files currently 
available. You probably intended to say that this is Gutenberg's first 
TEI-XML posting. I know that this seems like picking at some pretty 
minor nits, but there are some people who believe that there is actually 
a text markup language called XML. XML is actually a syntax for creating 
markup languages, and there are many markup languages available which 
conform to the XML syntax, e.g. XHTML, TEI, and DocBook. For clarity's 
sake it is probably desirable to always refer to a specific XML 
vocabulary, except when discussing the XML syntax which applies to all 
XML vocabularies equally.

Some specific, and very preliminary observations:

As Mr. Noring is always quick to point out, XML files can be viewed 
natively in both Firefox and IE6 when accompanied by appropriate style 
sheets, so I attempted to open this file directly in both of these browsers.

In IE6, I get the error "The system cannot locate the object specified. 
Error processing resource 
'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent'." Apparently, your 
dtd, http://www.gutenberg.org/tei/marcello/0.3/dtd/pgtei.dtd, contains 
the line:

<!ENTITY % TEI SYSTEM "http://www.tei-c.org/P4X/DTD/tei2.dtd"> %TEI;

It looks like IE sees a full url for the TEI SYSTEM entity, so it 
assumes that

<!ENTITY % TEI.extensions.ent SYSTEM "pgtei-extensions.ent" >

refers to a file on the same system as "tei2.dtd." Of course, the TEI 
consortium doesn't maintain a file called "pgtei-extensions.ent", so IE 
fails catastrophically. Now I'm still having a hard time wrapping my 
head around dtd's, so I have no idea if IE's behavior is technically 
correct or not, but it would be nice if the dtd's could be reworked in 
such a way that this failure does not occur, perhaps by hosting the TEI 
dtd's at http://www.gutenberg.org/tei/marcello/0.3/dtd/, and referencing 
them there.
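The two candidate locations for that entity can be reproduced with Python's `urllib.parse.urljoin`, which resolves a relative reference against a base URL the same way a browser resolves a relative link. Which base a parser ought to use is the open question here. As far as I understand XML 1.0 (section 4.2.2), a relative system identifier is resolved against the entity in which the declaration occurs, which would favour the gutenberg.org location if the declaration sits in pgtei.dtd (an assumption); this sketch only shows the two outcomes.

```python
from urllib.parse import urljoin

RELATIVE_ID = "pgtei-extensions.ent"

# Base if the reference is resolved against the PG driver DTD.
PG_DTD = "http://www.gutenberg.org/tei/marcello/0.3/dtd/pgtei.dtd"
# Base if it is resolved against the TEI consortium's tei2.dtd,
# which is what IE6's error message suggests it does.
TEI_DTD = "http://www.tei-c.org/P4X/DTD/tei2.dtd"

for base in (PG_DTD, TEI_DTD):
    print(urljoin(base, RELATIVE_ID))
```

The second result is exactly the URL in IE's error message, which supports the guess about what IE is doing.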

Firefox does not have this problem, but Firefox also breaks when it 
encounters named entities, even when the entities are referenced in .ent 
files included from the dtd's, leading me to believe that Firefox avoids 
the problems associated with "roaming dtd's" by simply not parsing them 
in the first place. Numerical entities _are_ recognized, and rendered 
appropriately, as are named entities when the entity definition is 
contained in the XML file itself. I have no solution to this problem, 
except to suggest that named entities simply be avoided in favor of 
numeric entities, at least in the short term (I do note that the etext 
16523-x.xml does not contain any named entities).
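A sketch of that workaround: rewrite named entity references as numeric character references before the file reaches a browser. It uses Python's `html.entities.name2codepoint`, which covers only the HTML 4 entity names; a PGTEI file drawing on other entity sets would need its own table, so treat the lookup as an assumption. The five entities predefined by XML itself are left alone, and unknown names pass through unchanged. The regex does not skip comments or CDATA sections.

```python
import re
from html.entities import name2codepoint

# Entities every XML parser knows without a DTD; leave these untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_numeric(xml_text: str) -> str:
    """Replace known named entity references with &#NNNN; references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown: keep as written
        return f"&#{name2codepoint[name]};"
    return ENTITY_REF.sub(replace, xml_text)

print(named_to_numeric("<p>ll&aacute;h &mdash; &amp; more</p>"))
```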

One of my pet peeves is the use of the <p> (paragraph) tag as a generic 
block tag, rather than limiting its use to true paragraphs, and using 
the <div> tag for generic blocks of text. I am happy to say that the 
text is mostly correct in this regard. The byline <p>by Bah??u?ll?h</p> 
should be marked using the <byline> tag instead of <p>; there may be 
other similar problems I simply haven't encountered yet.

It appears that the file is latin-1 encoded, despite the fact that the 
DTD claims that it is utf-8 encoded. This caused Firefox some grief as 
it tried to utf-8-decode some latin-1 accented vowels.
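The failure is easy to reproduce outside a browser: a Latin-1 accented vowel is a single byte in the 0x80 to 0xFF range, which on its own is not a valid UTF-8 sequence. This Python sketch (the `looks_like_utf8` helper is hypothetical) shows one way a producer could check a file's bytes against a UTF-8 declaration before posting.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

latin1_bytes = "Kit\u00e1b".encode("latin-1")  # b'Kit\xe1b'
utf8_bytes = "Kit\u00e1b".encode("utf-8")      # b'Kit\xc3\xa1b'

print(looks_like_utf8(latin1_bytes))  # False: 0xE1 needs two continuation bytes, 'b' is not one
print(looks_like_utf8(utf8_bytes))    # True
```

A clean decode does not prove a file is UTF-8 (pure ASCII passes too), but a failure reliably shows that a UTF-8 declaration is wrong.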

I grabbed an arbitrary "tei.css" style sheet off the net, and added the 
line:

<?xml-stylesheet href="tei.css" type="text/css"?>

to the beginning of the file. Looking at it in both browsers (after I 
had copied enough .dtd's and .ent's to my local file system that IE 
could cope) the document looked quirky, but readable. When I deleted the 
.css file the document turned into a plain-text file, totally without 
styling, but nothing broke. I think every PGTEI document should probably 
start with the three lines:

<?xml-stylesheet href="tei.css" type="text/css"?>
<?xml-stylesheet href="pgtei.css" type="text/css"?>
<?xml-stylesheet href="usertei.css" type="text/css"?>

and one of the next tasks should be to develop CSS files for generic TEI 
files and PG TEI files (the "usertei.css" file should be reserved for 
sophisticated users who may want to override the standard styles). If 
this were done (and the dtd issues are resolved for IE), the production 
TEI files should be usable directly by a modern web browser without any 
kind of pre-processing.
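A minimal Python sketch of adding those processing instructions to an existing file. It assumes the XML declaration (`<?xml ...?>`), if present, must stay first, so the three `xml-stylesheet` lines go immediately after it rather than at the very top; the stylesheet names are the ones proposed above, none of which exists yet. Whether every browser applies all three sheets in order is worth testing.

```python
STYLESHEETS = ["tei.css", "pgtei.css", "usertei.css"]

def add_stylesheet_pis(xml_text: str, sheets=STYLESHEETS) -> str:
    """Insert xml-stylesheet PIs after the XML declaration (or at the top if none)."""
    pis = "".join(f'<?xml-stylesheet href="{s}" type="text/css"?>\n' for s in sheets)
    if xml_text.startswith("<?xml "):
        # The XML declaration must remain the first thing in the file.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pis + xml_text[end:].lstrip("\n")
    return pis + xml_text
```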

If you're interested, I'll start putting together a generic CSS file for 
TEI.

From joshua at hutchinson.net  Tue Aug 23 05:06:34 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Aug 23 04:14:54 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <430A58DD.60004@novomail.net>
References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net>
Message-ID: <430B114A.4030808@hutchinson.net>

Lee Passey wrote:

> Congratulations on a worthwhile accomplishment.
>
Thanks!

> I would like to point out, however, that this is _not_ Gutenberg's 
> first XML posting; I believe there are hundreds of XHTML files 
> currently available. You probably intended to say that this is 
> Gutenberg's first TEI-XML posting. I know that this seems like picking 
> at some pretty minor nits, but there are some people who believe that 
> there is actually a text markup language called XML. XML is actually a 
> syntax for creating markup languages, and there are many markup 
> languages available which conform to the XML syntax, e.g. XHTML, TEI, 
> and DocBook. For clarity's sake it is probably desirable to always 
> refer to a specific XML vocabulary, except when discussing the XML 
> syntax which applies to all XML vocabularies equally.
>

We've had some back channel discussion on just how to name this and 
we've decided to change the extension to .tei to give a better 
indication of what the file is.

> Some specific, and very preliminary observations:
>
> As Mr. Noring is always quick to point out, XML files can be viewed 
> natively in both Firefox and IE6 when accompanied by appropriate style 
> sheets, so I attempted to open this file directly in both of these 
> browsers.
>
While this is true, our tei files are specifically meant as a master 
document and NOT as a viewing document.  They will NOT parse in any 
browser "out of the box".  As you've seen, you can jury-rig things to 
the point where it is usable, but that is not our intention.  We 
provide the HTML files directly for people that want to browse the file 
in IE or Firefox.

Also, we have had some backchannel discussion about how the web server 
should serve the .tei files.  I think Marcello is going to change the 
server to tell your browser that .tei files have a text MIME type, 
so that they will display like a .txt file would.  This will help 
prevent people from being confused when their browser tries to display 
the file directly and fails miserably.
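The real change belongs in the gutenberg.org web server's configuration, which isn't shown here. As a self-contained stand-in, this Python sketch uses the standard library's `http.server` to serve a directory with `.tei` files labelled `text/plain`. The `TeiAsTextHandler` and `serve` names are hypothetical, and the UTF-8 charset parameter is an assumption based on the masters being UTF-8.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

class TeiAsTextHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .tei files as plain text."""
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".tei": "text/plain; charset=utf-8",
    }

def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve `directory` until interrupted; browsers show .tei like .txt."""
    handler = partial(TeiAsTextHandler, directory=directory)
    HTTPServer(("", port), handler).serve_forever()
```

Calling `serve(".")` in a directory holding the posted files lets you check what a browser does with the new content type.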

>
> Firefox does not have this problem, but Firefox also breaks when it 
> encounters named entities, even when the entities are referenced in 
> .ent files included from the dtd's, leading me to believe that Firefox 
> avoids the problems associated with "roaming dtd's" by simply not 
> parsing them in the first place. Numerical entities _are_ recognized, 
> and rendered appropriately, as are named entities when the entity 
> definition is contained in the XML file itself. I have no solution to 
> this problem, except to suggest that named entities simply be avoided 
> in favor of numeric entities, at least in the short term (I do note 
> that the etext 16523-x.xml does not contain any named entities).
>
I personally prefer numeric entities, as well, but for the more common 
ones, the conversion process will support named entities in the .tei 
file.  Most of them appear as unicode in the HTML, so it typically isn't 
an issue in the final product.

> One of my pet peeves is the use of the <p> (paragraph) tag as a 
> generic block tag, rather than limiting its use to true paragraphs, 
> and using the <div> tag for generic blocks of text. I am happy to say 
> that the text is mostly correct in this regard. The byline <p>by 
> Bahá’u’lláh</p> should be marked using the <byline> tag instead of 
> <p>; there may be other similar problems I simply haven't encountered 
> yet.
>
You are correct.  That'll get fixed today.

> It appears that the file is latin-1 encoded, despite the fact that the 
> DTD claims that it is utf-8 encoded. This caused Firefox some grief as 
> it tried to utf-8-decode some latin-1 accented vowels.
>
I may be wrong here (Marcello is my unicode guru), but I thought UTF-8 
was a superset of Latin1?  Anyway, I know in this particular file there 
are quite a few UTF-8 encoded characters (and a couple more that should 
be that we found yesterday backchannel).

>
> If you're interested, I'll start putting together a generic CSS file 
> for TEI.

We aren't too interested in CSS directly for the TEI file (the css file 
sitting beside the TEI file right now is a mistake ... that should be 
changed later today).  However, once I have a few more documents posted 
and people seem fairly satisfied with the results, I want to get 
alternate CSS files submitted by other people for the HTML documents.

Also, if any industrious programmers out there know TEI conversions and 
would like to tackle the job of preparing a conversion process for other 
end formats (such as Palm files, Plucker, MS Reader, etc) please let me 
and/or Marcello know.  The conversion must run on Linux (our server OS) 
and be open source (for future compatibility).

Josh
From greg at durendal.org  Tue Aug 23 04:47:20 2005
From: greg at durendal.org (Greg Weeks)
Date: Tue Aug 23 04:47:44 2005
Subject: [gutvol-d] 1950 periodicals renewals
Message-ID: <Pine.LNX.4.44.0508230743070.30069-100000@durendal.durendal.org>


When looking for the periodicals renewals for 1950 last night I didn't
find any. 1947 had about 900 renewals and 1951 had about 1000, but 1948,
1949 and 1950 didn't even have a renewals section in the book. This is
"The Catalog of Copyright Entries" in the Carnegie Library. Does anyone
know what's going on with these? I've got some journal entries I'm trying
to put a rule 6 clearance together for and I need information on 1950
through 1955.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From Bowerbird at aol.com  Tue Aug 23 08:07:38 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Aug 23 08:07:46 2005
Subject: [gutvol-d] the issues with using stylesheets across many documents
Message-ID: <1d9.4317f85a.303c95ba@aol.com>

i'd like to assure myself that there is experience here on
the issues with using stylesheets across many documents.

how would you sum up _the_major_question_, and answer it?
what problems typically arise?   what are some workarounds?
when/how/why do workarounds cause their own problems?

anyone who has worked extensively with stylesheets knows
their magical power is typically matched by an ornery ability
to mess things up too, and very badly.   is that expertise here?

please, someone, show me that it is, with a detailed treatment.
(or else i will have to come in and give one, and you all know
how insufferable my superior tone can be, right?).   thank you.

-bowerbird
From jon at noring.name  Tue Aug 23 08:21:04 2005
From: jon at noring.name (Jon Noring)
Date: Tue Aug 23 09:02:12 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <430B114A.4030808@hutchinson.net>
References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net>
	<430B114A.4030808@hutchinson.net>
Message-ID: <484135928.20050823092104@noring.name>

Joshua wrote:
> Lee Passey wrote:

>> As Mr. Noring is always quick to point out, XML files can be viewed
>> natively in both Firefox and IE6 when accompanied by appropriate style
>> sheets, so I attempted to open this file directly in both of these 
>> browsers.

> While this is true, our tei files are specifically meant as a master
> document and NOT as a viewing document.  They will NOT parse in any 
> browser "out of the box".  As you've seen, you can jury-rig things to
> the point where it is usable, but that is not our intention.  We 
> provide the HTML files directly for people that want to browse the file
> in IE or Firefox.

One value in the direct viewing of PG-TEI documents is for checking
the markup -- to make sure the content is properly marked up (Lee
later brought up a specific example of incorrectly applied markup to
the particular PG-TEI document under discussion.)

For example, one could put together a "silly.css", using a variety
of text colors, font-styles, font-weights, etc., to highlight
certain structures and text semantics.
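A rough sketch of such a proofing stylesheet, generated from Python so the colour assignments are easy to change. The element names are common TEI ones (`p`, `head`, `byline`, `note`, `quote`, `l`); which of them a given PGTEI file uses, and the colours, are assumptions. CSS type selectors match XML element names directly, so a byline wrongly marked as `<p>` would show up black instead of green.

```python
# Hypothetical element -> CSS declarations map for a markup-proofing sheet.
PROOF_STYLES = {
    "p":      "display: block; margin: 0.5em 0; color: black",
    "head":   "display: block; font-weight: bold; color: navy",
    "byline": "display: block; font-style: italic; color: green",
    "note":   "display: block; color: purple; border: 1px dashed purple",
    "quote":  "color: teal",
    "l":      "display: block; color: maroon",
}

def proofing_css(styles: dict[str, str] = PROOF_STYLES) -> str:
    """Render one CSS rule per element name, one rule per line."""
    return "\n".join(f"{element} {{ {decls}; }}" for element, decls in styles.items())

print(proofing_css())
```

Saved as silly.css, the output can be attached to a master with an `xml-stylesheet` processing instruction like the one Lee used.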

Another knotty issue is that TEI includes structural/semantic markup
that current HTML-based browsers don't know how to natively (without
CSS) handle or interpret properly (and even with the right CSS some
substandard browsers like IE6 can't be forced to handle properly.)

This includes the inline note tag -- HTML has never had an inline note
tag where it is assumed, even without CSS, the browser will pull the
note out of the main flow and present it separately (such as in a
popup window.) [HTML *should* have had this feature from the start but
that's water under the bridge -- XHTML 2.0 plans to include
functionality to allow this, so future browsers will have to be able,
without CSS, to extract certain inline stuff and render it outside the
main flow, such as in a popup window, to the side, or other means. My
kudos to the XHTML working group for implementing this!]


> Also, we have had some backchannel discussion about how the web server
> should serve the .tei files.  I think Marcello is going to change the
> server to tell your browser that .tei files have a text MIME type,
> so that they will display like a .txt file would.  This will help
> prevent people from being confused when their browser tries to display
> the file directly and fails miserably.

Good point! Another way around the issue is to simply zip up the TEI
document for download, and include a separate "readthisfirst.txt"
file describing what it is and how to directly render it if that is
of interest to the end-user.
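The packaging step is straightforward with Python's `zipfile` module. A minimal sketch, assuming hypothetical file names (`16523-0.tei`, following the .tei naming discussed in this thread) and placeholder README wording:

```python
import zipfile
from pathlib import Path

# Placeholder text for the explanatory file; wording is illustrative only.
README = (
    "This archive contains the PGTEI master of the book.\n"
    "It is a TEI (XML) file meant for conversion, not for direct reading;\n"
    "see http://www.gutenberg.org/tei for the validator and converters.\n"
)

def package_tei(tei_path: str, zip_path: str) -> None:
    """Zip a TEI master together with a readthisfirst.txt explanation."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(tei_path, arcname=Path(tei_path).name)
        zf.writestr("readthisfirst.txt", README)
```

For example, `package_tei("16523-0.tei", "16523-tei.zip")` would produce an archive that a browser downloads instead of trying to render.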


>> Firefox does not have this problem, but Firefox also breaks when it
>> encounters named entities, even when the entities are referenced in
>> .ent files included from the dtd's, leading me to believe that Firefox
>> avoids the problems associated with "roaming dtd's" by simply not 
>> parsing them in the first place.

This is interesting. Didn't know this. I don't think Firefox has
concentrated on general XML rendering. Interestingly FF does support
a subset of XLink, thus it is possible, using XLink, to create
hypertext links in non-XHTML documents (with the full XLink, it is
possible to do other things, such as embed images, to be equivalent
to the HTML <img> and <object> tags.) I'll have to repeat this
experiment with Opera 8 to see if they've enabled some XLink stuff
(Opera 7 did not.)


>> It appears that the file is latin-1 encoded, despite the fact that the
>> DTD claims that it is utf-8 encoded. This caused Firefox some grief as
>> it tried to utf-8-decode some latin-1 accented vowels.

> I may be wrong here (Marcello is my unicode guru), but I thought UTF-8
> was a superset of Latin1?  Anyway, I know in this particular file there
> are quite a few UTF-8 encoded characters (and a couple more that should
> be that we found yesterday backchannel).

If what Lee refers to as "Latin-1" is ISO-8859-1, then Lee is right: it
is NOT correct to specify the document encoding as UTF-8, since the two
encode non-ASCII characters differently.

It is my personal view that ISO-8859 should never be used for the PG
masters -- UTF-8 should be used instead. That "7-bit" ASCII conforms
to UTF-8 is a nice bonus. (But ISO-8859-1, a.k.a. "Latin-1" and loosely
"8-bit ASCII", does not conform to UTF-8, nor do the other ISO-8859-x parts.)
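A short Python check of the distinction being drawn here: every Latin-1 character also exists in Unicode (Latin-1 byte N is code point U+00NN), which is probably what "superset" was getting at, but the two byte encodings coincide only for the 7-bit ASCII range.

```python
# Latin-1 maps byte N to code point U+00NN, so its repertoire is a subset of Unicode.
all_latin1 = bytes(range(256)).decode("latin-1")
assert [ord(c) for c in all_latin1] == list(range(256))

# The byte encodings, however, only agree on the first 128 characters.
same_bytes = [c for c in all_latin1 if c.encode("utf-8") == c.encode("latin-1")]
print(len(same_bytes))  # 128: exactly U+0000 to U+007F
```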


>> If you're interested, I'll start putting together a generic CSS file
>> for TEI.

> We aren't too interested in CSS directly for the TEI file (the css file
> sitting beside the TEI file right now is a mistake ... that should be
> changed later today).  However, once I have a few more documents posted
> and people seem fairly satisfied with the results, I want to get 
> alternate CSS files submitted by other people for the HTML documents.

As noted above, I think a generic CSS file for PG-TEI would be a great
idea! It allows direct viewing of the master for errors, and the CSS
can be tweaked for direct viewing by end-users (probably restricted
to Firefox and Opera in order to handle inline notes, where the CSS
has to move the inline notes and similar stuff to a box outside of the
flow of the text, maybe highlighted in some way -- as noted above, IE6
chokes on this CSS2 stuff.)

Another issue of incompatibility, where CSS may break down, is that
the table model in TEI is different in some ways from the HTML table
model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI
include support for TEI tables? (I would assume it does.)


> Also, if any industrious programmers out there know TEI conversions and
> would like to tackle the job of preparing a conversion process for other
> end formats (such as Palm files, Plucker, MS Reader, etc) please let me
> and/or Marcello know.  The conversion must run on Linux (our server OS)
> and be open source (for future compatibility).

For MS Reader, unless one wants to build an unapproved and possibly
illegal converter (since the LIT format has been cracked it is now
possible), one has to use Microsoft's litgen.dll to produce LIT files,
thus restricting the converter to MS Windows (litgen.dll requires, in
turn, MSXML for XML document parsing and validation.) Litgen takes as
input an OEBPS 1.0.1 Publication.

Now I do think it worthwhile to produce OEBPS as one of the output
formats. PG/DP can generate both OEBPS 1.0.1 (optimized for conversion
into LIT so others may do so automatically), and OEBPS 1.2 (which is
the current OEBPS standard and is preferable.) Essentially, the process
works as follows:

PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s)

OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication

Inline notes would be handled by inserting an anchor link where the
note was, and pulling the note into a separate XHTML/OEBPS document.
The notes can either be aggregated into one document, or each be kept
in their own document. The OEBPS 1.x framework will easily handle
multiple documents that comprise one publication (it's very cool,
really, in how it works.)
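A minimal sketch of that note-splitting step, in Python with only the standard library's ElementTree. It assumes the notes reach the XHTML stage as un-namespaced <note> elements and that all notes are aggregated into one file called notes.html; both the element name at that stage and the file name are hypothetical, not something PGTEI defines.

```python
import xml.etree.ElementTree as ET


def split_notes(body_xml: str, notes_href: str = "notes.html"):
    """Replace each inline <note> with a numbered link; return (body, notes).

    Assumes un-namespaced markup and no notes nested inside other notes.
    """
    root = ET.fromstring(body_xml)
    # Collect (parent, note) pairs first so the tree is not mutated mid-iteration.
    pairs = [(parent, child) for parent in root.iter()
             for child in parent if child.tag == "note"]
    notes = []
    for n, (parent, note) in enumerate(pairs, start=1):
        link = ET.Element("a", {"href": f"{notes_href}#note{n}", "id": f"ref{n}"})
        link.text = f"[{n}]"
        link.tail = note.tail  # keep the text that followed the note
        position = list(parent).index(note)
        parent.remove(note)
        parent.insert(position, link)
        notes.append((f"note{n}", "".join(note.itertext()).strip()))
    return ET.tostring(root, encoding="unicode"), notes


def build_notes_document(notes) -> str:
    """Aggregate all extracted notes into one separate document body."""
    body = ET.Element("body")
    for note_id, text in notes:
        ET.SubElement(body, "p", {"id": note_id}).text = text
    return ET.tostring(body, encoding="unicode")
```

Keeping each note in its own document instead would just mean writing one file per (id, text) pair rather than aggregating them.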

Jon


(p.s., Lee, did you experiment with Opera 8? They have a full-featured
free version -- just have to put up with the ads in the free version.)

From joshua at hutchinson.net  Tue Aug 23 09:21:50 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Aug 23 09:21:56 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
Message-ID: <20050823162150.78010EE161@ws6-1.us4.outblaze.com>


----- Original Message -----
From: "Jon Noring" <jon@noring.name>
> 
> Another issue of incompatibility, where CSS may break down, is that
> the table model in TEI is different in some ways from the HTML table
> model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI
> include support for TEI tables? (I would assume it does.)
> 

Yes it does.  See www.gutenberg.org/tei for a link to the documentation we have on PGTEI.

> 
> PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s)
> 
> OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication
> 

If you or anyone else would like to code something up, I'd be happy to test it out.  I'm afraid my talents do not lie in that direction!  ;)

Josh
From marcello at perathoner.de  Tue Aug 23 09:34:40 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Aug 23 09:34:56 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <484135928.20050823092104@noring.name>
References: <20050821132650.DA1298C992@pglaf.org>
	<430A58DD.60004@novomail.net>	<430B114A.4030808@hutchinson.net>
	<484135928.20050823092104@noring.name>
Message-ID: <430B5020.7040307@perathoner.de>

Jon Noring wrote:

> As noted above, I think a generic CSS file for PG-TEI would be a great
> idea!

Every PGTEI producer is free to use as many CSS files as she wants. It
just doesn't make sense to post them.


> Now I do think it worthwhile to produce OEBPS as one of the output
> formats. PG/DP can generate both OEBPS 1.0.1 (optimized for conversion
> into LIT so others may do so automatically), and OEBPS 1.2 (which is
> the current OEBPS standard and is preferable.) Essentially, the process
> works as follows:
> 
> PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s)
> 
> OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication

We already produce XHTML 1.0. So if you want to build a converter XHTML 
-> OEBPS you may start right now.
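For anyone taking that up, a rough Python sketch of the packaging half: given XHTML files that already exist, it writes a skeletal OEBPS package with a manifest and a spine in reading order. The namespace URI and media types are assumptions from a reading of the OEBPS specs and should be checked against them; the required Dublin Core metadata is deliberately left out, so the output is not yet a complete, valid package.

```python
import xml.etree.ElementTree as ET

# Assumed values; verify against the OEBPS 1.2 specification.
OEB_PACKAGE_NS = "http://openebook.org/namespaces/oeb-package/1.0/"
XHTML_MEDIA_TYPE = "application/xhtml+xml"  # OEBPS 1.0.1 used text/x-oeb1-document


def build_package(unique_id: str, documents: list[str],
                  media_type: str = XHTML_MEDIA_TYPE) -> str:
    """Return a skeletal package listing `documents` in reading order."""
    # The namespace is written as a plain xmlns attribute to keep tags unprefixed.
    package = ET.Element("package", {"xmlns": OEB_PACKAGE_NS,
                                     "unique-identifier": unique_id})
    ET.SubElement(package, "metadata")  # placeholder: dc:Title etc. omitted
    manifest = ET.SubElement(package, "manifest")
    spine = ET.SubElement(package, "spine")
    for i, href in enumerate(documents, start=1):
        item_id = f"doc{i}"
        ET.SubElement(manifest, "item", {"id": item_id, "href": href,
                                         "media-type": media_type})
        ET.SubElement(spine, "itemref", {"idref": item_id})
    return ET.tostring(package, encoding="unicode")
```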


P.S. I have neither the time nor the inclination to read all your 
words. If you want better answers I suggest getting to the point faster.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Tue Aug 23 09:54:54 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Aug 23 09:55:04 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <430B114A.4030808@hutchinson.net>
References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net>
	<430B114A.4030808@hutchinson.net>
Message-ID: <Pine.GSO.4.58.0508230941540.11970@vtn1.victoria.tc.ca>


On Tue, 23 Aug 2005, Joshua Hutchinson wrote:

> I may be wrong here (Marcello is my unicode guru), but I thought UTF-8
> was a superset of Latin1?  Anyway, I know in this particular file there
> are quite a few UTF-8 encoded characters (and a couple more that should
> be that we found yesterday backchannel).
>


Well, if you look merely at abstract numbered code points,
it is correct to say that the initial code points of Unicode
are numbered the same as ISO Latin-1.

However, you have to realize that, while ISO Latin-1 is a
legacy encoding in which each character is encoded using
only one byte, the nature of Unicode has led to different
methods (Unicode Transformation Formats) of actually
encoding each character as a series of bytes.

One way to look at UTF-8 is as a compressed format.
(When used to encode texts which consist primarily of
the characters found in lower ASCII, UTF-16, which uses
two bytes for each of those characters, results in
noticeably longer files.) ASCII characters are encoded
the same in UTF-8 as in common legacy single-byte
encodings, but all higher-numbered characters are
represented by multi-byte sequences.

Excerpt from: http://en.wikipedia.org/wiki/UTF-8
   So the first 128 characters need one byte. The next 1920 characters
   need two bytes to encode. This includes Latin alphabet characters
   with diacritics, Greek, Cyrillic, Coptic, Armenian, Hebrew, and
   Arabic characters. The rest of the BMP characters use three bytes,
   and additional characters are encoded in four bytes.
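To make those ranges concrete, a small Python sketch (not from the Wikipedia article) that computes the UTF-8 length of a code point from the RFC 3629 boundaries and cross-checks it against Python's own encoder. It also shows that only the first 256 code points fit in Latin-1 at all.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed for one code point in UTF-8 (RFC 3629 caps it at 4)."""
    if code_point < 0x80:
        return 1  # ASCII: identical to Latin-1 and other legacy sets
    if code_point < 0x800:
        return 2  # the "next 1920 characters": 0x800 - 0x80 == 1920
    if code_point < 0x10000:
        return 3  # rest of the Basic Multilingual Plane
    return 4      # supplementary planes


def latin1_length(ch: str):
    """1 if the character exists in Latin-1, otherwise None."""
    try:
        return len(ch.encode("latin-1"))
    except UnicodeEncodeError:
        return None


for ch in ["A", "\u00e9", "\u0416", "\u20ac", "\U0001d11e"]:
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_length(ord(ch))} byte(s), "
          f"Latin-1 {latin1_length(ch)}")
```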

I hope that is somewhat clear....

Andrew
From marcello at perathoner.de  Tue Aug 23 09:54:57 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Aug 23 09:55:07 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <430A58DD.60004@novomail.net>
References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net>
Message-ID: <430B54E1.9080900@perathoner.de>

Lee Passey wrote:

> It appears that the file is latin-1 encoded, despite the fact that the 
> DTD claims that it is utf-8 encoded. This caused Firefox some grief as 
> it tried to utf-8-decode some latin-1 accented vowels.

That is just what Apache thinks it is because it doesn't look inside the 
file before serving it. Apache can be made to serve the encoding based 
on the file extension. Lacking a definite extension it will serve the 
default which is iso-8859-1.

The same problem exists with all plain text files in the archive. They 
are all served as iso-8859-1. We cannot fix that unless we rename all files:

   12345-8.txt  --> 12345-8.txt.8
   12345-0.txt  --> 12345-0.txt.0

In this case Apache sees the .0 extension, strips it, and serves the 
file as 12345-0.txt with utf-8 encoding.
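In Apache itself this would be a few mod_mime AddCharset lines. The Python sketch below only models the lookup logic described above, where every extension of a name like 12345-0.txt.0 is examined and the trailing one supplies the charset. The extension tables are a hypothetical example, not the actual server configuration.

```python
# Hypothetical tables standing in for AddType / AddCharset directives.
TYPE_BY_EXTENSION = {"txt": "text/plain", "html": "text/html", "xml": "text/xml"}
CHARSET_BY_EXTENSION = {"8": "iso-8859-1", "0": "utf-8"}
DEFAULT_CHARSET = "iso-8859-1"  # the fallback the server uses today


def content_type(filename: str) -> str:
    """Examine every extension after the first dot, as mod_mime does."""
    mime_type, charset = "application/octet-stream", DEFAULT_CHARSET
    for extension in filename.split(".")[1:]:
        mime_type = TYPE_BY_EXTENSION.get(extension, mime_type)
        charset = CHARSET_BY_EXTENSION.get(extension, charset)
    return f"{mime_type}; charset={charset}"
```

With these tables, content_type("12345-0.txt") still falls back to iso-8859-1, while content_type("12345-0.txt.0") yields utf-8.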

And don't look at me. I made this suggestion before the new filesystem 
went live.


> I grabbed an arbitrary "tei.css" style sheet off the net, and added the 
> line:
> 
> <?xml-stylesheet href="tei.css" type="text/css"?>

You can also include an XSL stylesheet which gives you far more power.

But why do you want to look at the TEI file in the browser when there is 
an HTML file available?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From kth at srv.net  Tue Aug 23 09:45:50 2005
From: kth at srv.net (Kevin Handy)
Date: Tue Aug 23 10:08:09 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <20050823162150.78010EE161@ws6-1.us4.outblaze.com>
References: <20050823162150.78010EE161@ws6-1.us4.outblaze.com>
Message-ID: <430B52BE.5000906@srv.net>

Joshua Hutchinson wrote:

>----- Original Message -----
>From: "Jon Noring" <jon@noring.name>
>  
>
>>Another issue of incompatibility, where CSS may break down, is that
>>the table model in TEI is different in some ways from the HTML table
>>model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI
>>include support for TEI tables? (I would assume it does.)
>
>Yes it does.  See www.gutenberg.org/tei for a link to the documentation we have on PGTEI.
>
Any plans on making something like guiguts for pgtei, and
bundling all the conversion routines with it?

>>PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s)
>>
>>OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication
>
>If you or anyone else would like to code something up, I'd be happy to test it out.  I'm afraid my talents do not lie in that direction!  ;)

From marcello at perathoner.de  Tue Aug 23 10:12:28 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Aug 23 10:12:36 2005
Subject: [gutvol-d] the issues with using stylesheets across many documents
In-Reply-To: <1d9.4317f85a.303c95ba@aol.com>
References: <1d9.4317f85a.303c95ba@aol.com>
Message-ID: <430B58FC.3030109@perathoner.de>

Bowerbird@aol.com wrote:

> i'd like to assure myself that there is experience here on
> the issues with using stylesheets across many documents.

No, there is not. We are all new to this. We want to get there and we 
will learn as we go.

Software development is an iterative business. Many problems surface as 
you start using the first implementation. With what you learn from the 
first implementation you go back and do the second. And so on.

Requiring all problems to be known and solved in advance is the one sure 
fire thing to never get started. Known as: "analysis paralysis".


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Tue Aug 23 10:31:49 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Aug 23 10:32:03 2005
Subject: [gutvol-d] the issues with using stylesheets across many documents
Message-ID: <19d.3a8005db.303cb785@aol.com>

marcello said:
>    No, there is not. 

marcello, _you_ might not have any
large-scale stylesheet experience.

but perhaps someone else here has.

if not, then it will be a very bumpy ride
over the next few years, as y'all learn.


>    We are all new to this. 

um, well, speak for yourself.

i started using stylesheets back in 1987,
with the first release of ventura publisher.

and i figure i made the 428 common mistakes
most people make, and learned to avoid 'em...


>    We want to get there and we will learn as we go.

and again, i'm guessing that someone here has already 
been there, and back, and will guide you, if you let them.
there are a lot of people on this listserve, willing to help...


>    Software development is an iterative business.

and again, someone has already done that back-and-forth.


>    Many problems surface as you 
>    start using the first implementation. 

stylesheets are well past their "first implementation".


>    With what you learn from the first implementation 
>    you go back and do the second. And so on.

stylesheets are well past their "second implementation".


>    Requiring all problems to be known 
>    and solved in advance is the one sure fire thing 
>    to never get started. Known as: "analysis paralysis".

and ignoring the lessons of the past is
one sure-fire way to repeat the pain...

now, is anyone willing to step in
and help out project gutenberg?

please?   thank you...

-bowerbird
From joshua at hutchinson.net  Tue Aug 23 11:08:53 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Aug 23 11:09:00 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
Message-ID: <20050823180853.5BF0EEE19A@ws6-1.us4.outblaze.com>


----- Original Message -----
From: "Kevin Handy" <kth@srv.net>

> Any plans on making something like guiguts for pgtei, and
> bundling all the conversion routines with it?

Not by me personally, but when I turn DP loose on this format, I expect a flurry of tools to follow.  :)

As an aside, I do all my editing for my TEI files in GuiGuts right now.  My only annoyance right now is that I wish it didn't add the darned Byte Order Mark to the beginning of the file when it detects UTF-8 characters.
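Until the editor grows an option for it, stripping the BOM afterwards is easy. A minimal Python sketch; the function names are illustrative, not part of GuiGuts or any PG tool. (When reading, Python's "utf-8-sig" codec also discards a leading BOM.)

```python
import codecs
from pathlib import Path


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark (EF BB BF)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: Path) -> bool:
    """Rewrite the file only if it started with a BOM; report whether it did."""
    data = path.read_bytes()
    cleaned = strip_bom(data)
    if cleaned != data:
        path.write_bytes(cleaned)
        return True
    return False
```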

Josh
From lee at novomail.net  Tue Aug 23 14:45:34 2005
From: lee at novomail.net (Lee Passey)
Date: Tue Aug 23 14:45:49 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 20)
In-Reply-To: <20050823190003.BF81A8C832@pglaf.org>
References: <20050823190003.BF81A8C832@pglaf.org>
Message-ID: <430B98FE.3030906@novomail.net>

Marcello Perathoner <marcello@perathoner.de> wrote:

> Lee Passey wrote:
>
>> It appears that the file is latin-1 encoded, despite the fact that 
>> the DTD claims that it is utf-8 encoded. This caused Firefox some 
>> grief as it tried to utf-8-decode some latin-1 accented vowels.
>
>
> That is just what Apache thinks it is because it doesn't look inside 
> the file before serving it. Apache can be made to serve the encoding 
> based on the file extension. Lacking a definite extension it will 
> serve the default which is iso-8859-1.


In this case I saved the file to my local file system before doing 
anything with it. Are you suggesting that Apache (your server) looked at 
the contents of the file it was serving and replaced the <?xml ...> 
declaration with "<?xml version="1.0" encoding="utf-8" ?>" before serving 
it? Or are you suggesting that as it transferred the file it changed 
utf-8 encoded characters to Latin-1 encoding? (I've never seen that 
behavior in Apache before, but I could have overlooked something.) If I 
retrieved the file via FTP would it be different than if I retrieved it 
using HTTP?


>> I grabbed an arbitrary "tei.css" style sheet off the net, and added 
>> the line:
>>
>> <?xml-stylesheet href="tei.css" type="text/css"?>
>
>
> You can also include an XSL stylesheet which gives you far more power.


XSL isn't really a stylesheet, it is a scripting language for a 
transformational engine. XSL has many good uses, but applying styles to 
a document isn't one of them. Indeed, I've never figured out how to use 
XSL to style an XML file without having an existing Cascading Style 
Sheet that I could use for the actual styles.

> But why do you want to look at the TEI file in the browser when there 
> is an HTML file available?


Why ask why?

Actually, I'm not interested in looking at the file at all; it's as 
boring as hell. What I _am_ interested in is exploring the use of TEI as 
an archive format, _and_ as a content delivery format. I think that 
enabling a TEI-XML file to be used by a browser directly, if it can be 
done without compromising its function as an archive format, is a 
worthwhile goal, and in many cases better than requiring some sort of 
XSL transformation before it can be viewed.

From joshua at hutchinson.net  Tue Aug 23 16:48:52 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Aug 23 15:24:19 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 20)
In-Reply-To: <430B98FE.3030906@novomail.net>
References: <20050823190003.BF81A8C832@pglaf.org>
	<430B98FE.3030906@novomail.net>
Message-ID: <430BB5E4.5090607@hutchinson.net>

Lee Passey wrote:

>>
>>
>>> It appears that the file is latin-1 encoded, despite the fact that 
>>> the DTD claims that it is utf-8 encoded. This caused Firefox some 
>>> grief as it tried to utf-8-decode some latin-1 accented vowels.
>>

Ok, I tried to see what grief you are talking about ... all the accented 
vowels I looked at are appearing correctly.  Which ones are you having 
trouble with?  (This is looking at the XML directly in Firefox)

I thought everything in Latin-1 encoding would be the same under a UTF-8 
encoding, but evidently I'm mistaken there (which wouldn't be 
surprising, my encoding set knowledge is often shaky at best).

Josh
From jon at noring.name  Tue Aug 23 15:31:54 2005
From: jon at noring.name (Jon Noring)
Date: Tue Aug 23 15:32:08 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 20)
In-Reply-To: <430BB5E4.5090607@hutchinson.net>
References: <20050823190003.BF81A8C832@pglaf.org>
	<430B98FE.3030906@novomail.net> <430BB5E4.5090607@hutchinson.net>
Message-ID: <122792848.20050823163154@noring.name>

Joshua wrote:
> Lee Passey wrote:

>> It appears that the file is latin-1 encoded, despite the fact that
>> the DTD claims that it is utf-8 encoded. This caused Firefox some
>> grief as it tried to utf-8-decode some latin-1 accented vowels.

> Ok, I tried to see what grief you are talking about ... all the accented
> vowels I looked at are appearing correctly.  Which ones are you having
> trouble with?  (This is looking at the XML directly in Firefox)
>
> I thought everything in Latin-1 encoding would be the same under a UTF-8
> encoding, but evidently I'm mistaken there (which wouldn't be 
> surprising, my encoding set knowledge is often shaky at best).

Hmmm, I notice in the PG-TEI documentation (version 0.3 at URL:
http://www.gutenberg.org/tei/marcello/0.3/doc/20000-h/20000-h.html#toc_12 )
that the "template" has the following XML declaration:

<?xml version="1.0" encoding="iso-8859-1" ?>

Why isn't it

<?xml version="1.0" encoding="utf-8" ?>

?

Is this the issue of what Lee observed, or is this a different issue?

Jon

From marcello at perathoner.de  Tue Aug 23 15:52:05 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Aug 23 15:52:18 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 20)
In-Reply-To: <122792848.20050823163154@noring.name>
References: <20050823190003.BF81A8C832@pglaf.org>	<430B98FE.3030906@novomail.net>
	<430BB5E4.5090607@hutchinson.net>
	<122792848.20050823163154@noring.name>
Message-ID: <430BA895.7080802@perathoner.de>

Jon Noring wrote:

> Hmmm, I notice in the PG-TEI documentation (version 0.3 at URL:
> http://www.gutenberg.org/tei/marcello/0.3/doc/20000-h/20000-h.html#toc_12 )
> that the "template" has the following XML declaration:
> 
> <?xml version="1.0" encoding="iso-8859-1" ?>
> 
> Why isn't it
> 
> <?xml version="1.0" encoding="utf-8" ?>

Because most people will want to author their TEI files in iso-8859-1.

If you want to use utf-8, just change the declaration. But you'll need 
an editor that groks utf-8.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Tue Aug 23 15:53:25 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Aug 23 15:53:38 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 20)
In-Reply-To: <430B98FE.3030906@novomail.net>
References: <20050823190003.BF81A8C832@pglaf.org>
	<430B98FE.3030906@novomail.net>
Message-ID: <430BA8E5.4050305@perathoner.de>

Lee Passey wrote:

> In this case I saved the file to my local file system before doing 
> anything with it.

Then I don't know. The file is correct utf-8.

Did you tell your editor that it is a utf-8 file?
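One way to settle whether a file really is what its declaration says, without trusting any editor or server, is to check the raw bytes. A Python sketch under simple assumptions: the XML declaration sits at the very start (after an optional BOM), and a missing encoding means UTF-8, as XML specifies. Since any byte sequence decodes as ISO-8859-1, this can only catch a false UTF-8 claim, not a false Latin-1 one.

```python
import codecs
import re

DECLARATION = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data: bytes) -> str:
    """Encoding named in the XML declaration, defaulting to UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = DECLARATION.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def matches_declaration(data: bytes) -> bool:
    """True if the bytes decode cleanly in the declared encoding.

    An encoding name Python does not know raises LookupError.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    try:
        data.decode(declared_encoding(data))
    except UnicodeDecodeError:
        return False
    return True
```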


-- 
Marcello Perathoner
webmaster@gutenberg.org

From lee at novomail.net  Tue Aug 23 16:31:03 2005
From: lee at novomail.net (Lee Passey)
Date: Tue Aug 23 16:31:18 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
	(gutvol-d Digest, Vol 13, Issue 19)
In-Reply-To: <20050823160216.6E6438C8E8@pglaf.org>
References: <20050823160216.6E6438C8E8@pglaf.org>
Message-ID: <430BB1B7.1080006@novomail.net>

Joshua Hutchinson <joshua@hutchinson.net> wrote:

> Lee Passey wrote:

[snip]

>>
>> As Mr. Noring is always quick to point out, XML files can be viewed 
>> natively in both Firefox and IE6 when accompanied by appropriate 
>> style sheets, so I attempted to open this file directly in both of 
>> these browsers.
>>
> While this is true, our tei files are specifically meant as a master 
> document and NOT as a viewing document.  They will NOT parse in any 
> browser "out of the box".  As you've seen, you can jury-rig things to 
> the point where it is usable, but that is not our intention.  We 
> provide the HTML files directly for people that want to browse the 
> file in IE or Firefox.


I understand that creating a file format which could be viewed without 
further processing was not your intention, but now that we have some 
evidence that suggests that it is a real possibility, is there any reason 
_not_ to pursue that possibility, especially if it only requires adding 
three lines to the source (and making sure that all the dtd's are 
accessible)?

[snip]

>> I have no solution to this problem, except to suggest that named 
>> entities simply be avoided in favor of numeric entities, at least in 
>> the short term (I do note that the etext 16523-x.xml does not contain 
>> any named entities).
>>
> I personally prefer numeric entities, as well, but for the more common 
> ones, the conversion process will support named entities in the .tei 
> file.  Most of them appear as unicode in the HTML, so it typically 
> isn't an issue in the final product.


You are correct; so long as you are relying on conversion to HTML (or 
some other file format) before the file is used, there should be no 
problem (so long as the conversion utility can get to the correct .ent 
files). Use of named entities is only a problem if you are attempting to 
display the TEI-XML directly.
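For anyone who does want to display the TEI directly, the conversion suggested above (named entities to numeric ones) is mechanical. A Python sketch, assuming the entity names used in the .tei files match the HTML 4 names in Python's html.entities table (true for the common ISO Latin-1 ones such as eacute). Unknown names are left alone for a human to check, and XML's five predefined entities are kept as they are.

```python
import re
from html.entities import name2codepoint

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text: str) -> str:
    """Replace known named entities with decimal character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # always legal in XML, no .ent file needed
        code_point = name2codepoint.get(name)
        return f"&#{code_point};" if code_point is not None else match.group(0)
    return ENTITY.sub(replace, text)
```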

[snip]

>> It appears that the file is latin-1 encoded, despite the fact that 
>> the DTD claims that it is utf-8 encoded. This caused Firefox some 
>> grief as it tried to utf-8-decode some latin-1 accented vowels.
>>
> I may be wrong here (Marcello is my unicode guru), but I thought UTF-8 
> was a superset of Latin1?  Anyway, I know in this particular file 
> there are quite a few UTF-8 encoded characters (and a couple more that 
> should be that we found yesterday backchannel).


UTF-8 and Latin-1 (aka ISO-8859-1) are both encoding methods. They share 
the same code points for the first 256 characters (an acute 'e' is 233 
in both), but they encode them differently. Neither is a superset or 
subset of the other. Values from 0 to 127 are encoded identically in 
both, but values from 128 to 255 are encoded in a single byte in Latin-1 
whereas those same values are encoded in two bytes in UTF-8. Values 
above 255 take two to four bytes in UTF-8 (the original design allowed 
up to six; RFC 3629 caps it at four), whereas those same values cannot 
be represented directly in Latin-1 at all. From an efficiency standpoint 
(which is not always the best way to look at things), if you have an 
English text which contains only a few characters with values above 
127, about as many of them above 255 as below, or if you have a text 
which contains a large number of characters with values above 255, 
UTF-8 is probably the most efficient encoding (size-wise). If you have 
a Western European text with a large number of characters above 127 
but very few above 255 (French is a good example), Latin-1, with values 
above 255 expressed as entities (numeric or named), is probably the 
most efficient encoding. If you have a text where most of the 
characters have values above 2047 (U+07FF), UTF-16 is probably the most 
efficient encoding (now we're really straying from the point).
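These size trade-offs are easy to measure on real text. A Python sketch that counts bytes for the same string as UTF-8, as Latin-1 with everything above 255 turned into numeric character references, and as UTF-16 (little-endian, no BOM). The sample strings are illustrative only.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts for one text under three storage choices."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Characters above 255 become references such as &#8212;
        "latin-1+refs": len(text.encode("latin-1", errors="xmlcharrefreplace")),
        "utf-16": len(text.encode("utf-16-le")),
    }


for sample in ["\u00e9t\u00e9", "\u4e2d\u6587", "dash\u2014dash"]:
    print(repr(sample), encoded_sizes(sample))
```

The French-style sample is smallest in Latin-1, the CJK sample in UTF-16, and the mostly-ASCII sample with one dash in UTF-8, matching the rules of thumb above.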

In any case, it doesn't matter which encoding is used, so long as it is 
not misrepresented in the <?xml ...> declaration.

>> If you're interested, I'll start putting together a generic CSS file 
>> for TEI.
>
>
> We aren't too interested in CSS directly for the TEI file (the css 
> file sitting beside the TEI file right now is a mistake ... that 
> should be changed later today).  However, once I have a few more 
> documents posted and people seem fairly satisfied with the results, I 
> want to get alternate CSS files submitted by other people for the HTML 
> documents.


Well, I might do it anyway for my own edification and enjoyment (and 
because I think you _will_ be interested at some point in the future ;-).)

Some months ago I put together a couple of tables showing how HTML could 
be mapped to TEI-lite, and vice-versa. The goal was to create a mapping 
that could be used for round-tripping via XSLT; that is, a TEI-lite 
document could be used to create an HTML document which could then be 
transformed back into TEI without loss of markup. I will probably start 
from those tables in creating a tei.css file. They may also be useful to 
you in creating XSLT scripts (aka XSL style sheets). If you're 
interested they can be found at www.passkeysoft.com/~lee/xhtml2tei.html 
and www.passkeysoft.com/~lee/tei2xhtml.html.
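The property that makes such tables usable for round-tripping is that no two TEI forms map to the same HTML form. A Python sketch of that check; the rows are hypothetical examples of (element, rend or class) pairs such a table might contain, not the contents of the tables linked above.

```python
# Hypothetical rows: (TEI element, rend) -> (HTML element, class)
TEI_TO_HTML = {
    ("head", None): ("h2", None),
    ("p", None): ("p", None),
    ("hi", "italic"): ("i", None),
    ("foreign", None): ("i", "foreign"),
    ("title", None): ("cite", None),
}


def invert(mapping: dict) -> dict:
    """Build the reverse table, refusing collisions that would lose markup."""
    reverse = {}
    for tei, html in mapping.items():
        if html in reverse:
            raise ValueError(f"{tei} and {reverse[html]} both map to {html}")
        reverse[html] = tei
    return reverse


HTML_TO_TEI = invert(TEI_TO_HTML)
```

Using a class attribute to tell <foreign> apart from <hi rend="italic"> is what keeps both italic forms recoverable.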

>
> Also, if any industrious programmers out there know TEI conversions 
> and would like to tackle the job of preparing a conversion process for 
> other end formats (such as Palm files, Plucker, MS Reader, etc) please 
> let me and/or Marcello know.  The conversion must run on Linux (our 
> server OS) and be open source (for future compatibility).


You probably don't need anything more than someone with basic shell 
scripting capabilities, as all the software to do this exists currently. 
When you say Palm files, I am assuming you mean PalmDOC files, which are 
nothing more than text files converted into the Palm Database format. 
This conversion can be performed by the command line program "Makedoc". 
Source code is available at 
http://linuxmafia.com/pub/palmos/other-os/makedoc9.tar.gz. The shell 
script would be:

PGTEI -> (via XSLT) -> .txt -> (via makedoc9) -> .pdb
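That pipeline is short enough to sketch as a Python wrapper around two command-line tools. xsltproc (from libxslt) and its -o option are real; the argument order assumed for makedoc (text file, output file, title) and the stylesheet name tei2txt.xsl are assumptions, so check the makedoc documentation before relying on it.

```python
import subprocess
from pathlib import Path


def palmdoc_commands(tei_file: str, stylesheet: str, title: str) -> list:
    """Build the two commands: TEI -> plain text -> PalmDOC .pdb."""
    stem = Path(tei_file).with_suffix("")
    txt, pdb = f"{stem}.txt", f"{stem}.pdb"
    return [
        ["xsltproc", "-o", txt, stylesheet, tei_file],
        ["makedoc", txt, pdb, title],  # assumed makedoc argument order
    ]


def run(commands: list) -> None:
    """Execute each step, stopping at the first one that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes a dry run (just printing the lists) trivial.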

Plucker is a program which encapsulates a bundle of HTML files into a 
single file which can be rendered on the PalmOS. The script for a 
plucker transformation should be very similar to the PalmDOC 
transformation (I'm certain Mr. Desrosiers could help you with the 
precise syntax):

PGTEI -> (via XSLT) -> HTML -> (via plucker distiller) -> .pdb

To my knowledge there are no known lit compilers that run on Linux (thus 
making them ineligible under your requirements). This is not really a big 
deal because most MSReader users who are familiar with Project Gutenberg 
are comfortable making .lit files from HTML themselves, so if you can 
serve good HTML they will be happy.

What I would really like to see is an XSL script that could do a PGTEI 
-> RTF transformation. It probably wouldn't be very useful, but it would 
sure be interesting.

Now on a separate note:

As part of my CSS experimentation, I set the display setting for the 
<teiHeader> element to "none", because while I think the data is 
important, I'm not particularly interested in seeing it when I'm 
reading. When I did this, I thought I lost the title of the book because 
it only appears in the <teiHeader> element. I discovered later the 
title was repeated in the <front> element, identified as a <head>er.  As 
I read the TEI spec (and I am by no means well-versed), I believe that 
there should also exist a <titlePage> element which should be part of 
the <front>, and which should contain all the information traditionally 
found on the title page of a book. The main title should be marked as 
<titlePart type="main">, subtitles should be marked as <titlePart 
type="sub">, and the byline should be marked as <byline>. This would be 
in addition to the information included in the <teiHeader> element, 
which may be formatted differently (e.g. the author's name may be 
presented last name first for automated catalog processing).

I also had some question about the difference between the <titlePart> 
element and the <title> element. Looking at the spec it seems that the 
<title> element is not to be used to indicate the title of the work, as 
would appear on a title page, but the title of _another_ work referenced 
in the main work (these are the titles we were taught to underline back 
in the days of single font typewriters). For example, if _The 
Kitáb-i-Aqdas_ made reference to the _Baghad-Vita_, it would be marked 
as <title>The Baghad-Vita</title>, and should probably be rendered with 
an italicised font.

I also note that you encoded the glossary at the end of the work with 
<p> tags (naughty, naughty). Based on what I saw in the TEI docs I would 
have encoded it as follows:

<div type="glossary">
<head>Glossary</head>
<list type="gloss">
<label>'Abdu'l-Bahá</label>
<gloss>The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest son 
and appointed Successor of Bahá'u'lláh, and the Centre of His 
Covenant.</gloss>
<label>Abjad</label>
<gloss>The ancient Arabic system of allocating a numerical value to 
letters of the alphabet, so that numbers may be represented by letters 
and vice versa. Thus every word has both a literal meaning and a 
numerical value.</gloss>

etc.

</list></div>

I hope you find this useful.

From marcello at perathoner.de  Tue Aug 23 17:09:27 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Aug 23 17:09:41 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!	(gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <430BB1B7.1080006@novomail.net>
References: <20050823160216.6E6438C8E8@pglaf.org>
	<430BB1B7.1080006@novomail.net>
Message-ID: <430BBAB7.8050906@perathoner.de>

Lee Passey wrote:

> I understand that creating a file format which could be viewed without 
> further processing was not your intention, but now that we have some 
> evidence that suggests that it is a real possibility, is there any reason 
> _not_ to pursue that possibility, especially if it only requires adding 
> three lines to the source (and making sure that all the dtd's are 
> accessible)?

Supporting CSS styling will add another complexity layer to an already 
overly complex thing. A software architect has to leave things out to 
make the design implementable.

Also, things like footnotes are impossible with CSS. So why bother?


> In any case, it doesn't matter which encoding is used, so long as it is 
> not misrepresented in the <?xml ...> declaration.

Both the TEI and the XHTML file are correct. I don't know why it doesn't 
work for you.


> As part of my CSS experimentation, I set the display setting for the 
> <teiHeader> element to "none", because while I think the data is 
> important, I'm not particularly interested in seeing it when I'm 
> reading. When I did this, I thought I lost the title of the book because 
> it only appears in the <teiHeader> element. I discovered later the 
> title was repeated in the <front> element, identified as a <head>er.  As 
> I read the TEI spec, (and I am by no means well-versed) I believe that 
> there should also exist a <titlePage> element which should be part of 
> the <front>, and which should contain all the information traditionally 
> found on the title page of a book.

That is for the encoder to decide. If the title page is interesting 
enough to warrant a separate encoding, she will use <titlePage> etc. to 
mark it up.

If the title page is just plain boring you can generate a standard title 
page with <divGen type="titlepage">. This will pull all data out of the 
<teiHeader> and save you the trouble.

There are a lot of such shortcuts implemented like <divGen type="toc"> 
and <divGen type="footnotes">.
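Roughly, a generated title page copies fields out of the header. A Python sketch of that idea, assuming a TEI P4 document rooted at <TEI.2> with no namespace and a single title and author in titleStmt; it is an illustration of the concept, not PG's actual divGen implementation, which is not shown here.

```python
import xml.etree.ElementTree as ET


def title_page_from_header(tei_xml: str) -> ET.Element:
    """Build a simple <titlePage> from teiHeader/fileDesc/titleStmt."""
    root = ET.fromstring(tei_xml)
    stmt = root.find("teiHeader/fileDesc/titleStmt")
    if stmt is None:
        raise ValueError("no teiHeader/fileDesc/titleStmt in document")
    page = ET.Element("titlePage")
    doc_title = ET.SubElement(page, "docTitle")
    main = ET.SubElement(doc_title, "titlePart", {"type": "main"})
    main.text = (stmt.findtext("title") or "").strip()
    byline = ET.SubElement(page, "byline")
    byline.text = "by "
    author = ET.SubElement(byline, "docAuthor")
    author.text = (stmt.findtext("author") or "").strip()
    return page
```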


> I also note that you encoded the glossary at the end of the work with 
> <p> tags (naughty, naughty). Based on what I saw in the TEI docs I would 
> have encoded it as follows:
> 
> <div type="glossary">
> <head>Glossary</head>
> <list type="gloss">
> <label>'Abdu'l-Bahá</label>
> <gloss>The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest son 
> and appointed Successor of Bahá'u'lláh, and the Centre of His 
> Covenant.</gloss>

And it wouldn't have validated because gloss has no business inside list.
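On one reading of the TEI Guidelines (worth checking against the chapter on lists), a glossary-style list pairs each <label> with an <item>, while <gloss> is a phrase-level element rather than a list member. A Python sketch that emits the glossary in that label/item form; the abbreviated definition in the usage is illustrative.

```python
import xml.etree.ElementTree as ET


def glossary_div(entries) -> ET.Element:
    """Build <div type="glossary"> with a label/item gloss list."""
    div = ET.Element("div", {"type": "glossary"})
    ET.SubElement(div, "head").text = "Glossary"
    gloss_list = ET.SubElement(div, "list", {"type": "gloss"})
    for term, definition in entries:
        ET.SubElement(gloss_list, "label").text = term
        ET.SubElement(gloss_list, "item").text = definition
    return div
```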


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jmdyck at ibiblio.org  Tue Aug 23 17:55:07 2005
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Tue Aug 23 17:57:45 2005
Subject: [gutvol-d] 1950 periodicals renewals
References: <Pine.LNX.4.44.0508230743070.30069-100000@durendal.durendal.org>
Message-ID: <430BC56B.5D6E7524@ibiblio.org>

Greg Weeks wrote:
> 
> When looking for the periodicals renewals for 1950 last night I didn't
> find any. 1947 had about 900 renewals and 1951 had about 1000, but 1948,
> 1949 and 1950 didn't even have a renewals section in the book. This is
> "The Catalog of Copyright Entries" in the Carnegie Library. Does anyone
> know what's going on with these? I've got some journal entries I'm trying
> to put a rule 6 clearance together for and I needed information on 1950
> through 1955.

From 1947 to 1950, renewals for all classes of registrations were
published together in Part 14 of the CCE (which was actually split into
14B for Music and 14A for everything else). In 1947, they divided up 14A
somewhat by class (so periodicals renewals only occupied 3 pages), but
then they gave up and put all of 14A into a single collation.

So the periodical (class B) renewals for 1950 are in Part 14A,
interfiled with all the other renewals for everything-but-music. E.g.,
for Jan-June 1950, they're spread over pages 1-60 of Part 14A, and for
July-Dec 1950, they're spread over pages 61-121. 

PG has text versions of the *book* renewals for 1950-1977. Because of
the 1948-1950 interfiling, PG also has the periodical renewals for 1950,
scattered throughout etexts #11801 (Jan-June 1950) and #11802 (July-Dec
1950). E.g., in #11801, the first periodical renewal is under "Abbott's
Digest of All the New York Reports". (You can tell it's a periodical
renewal because its original registration starts with 'B'.) Page images
for these two volumes appear at
<http://onlinebooks.library.upenn.edu/cce/1950r.html>.
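The rule of thumb above for spotting periodical renewals in the interfiled 1950 lists (the original registration number starts with 'B') is easy to apply mechanically once the entries are in structured form. A minimal Python sketch follows; the Renewal record type and its field names are hypothetical, and parsing the actual etext into such records is not shown.

```python
from typing import Iterable, List, NamedTuple

class Renewal(NamedTuple):
    """One renewal entry. Hypothetical structure; the CCE text is not laid out like this."""
    title: str
    original_reg: str  # original registration number; class B = periodicals
    renewal_reg: str

def periodical_renewals(entries: Iterable[Renewal]) -> List[Renewal]:
    """Keep only class B (periodical) renewals, judged by the original registration."""
    return [e for e in entries if e.original_reg.strip().upper().startswith("B")]
```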

---

After 1950, the Copyright Office discontinued Part 14 of the CCE, and
went back to having each Part of the CCE also contain the renewal
records in the classes covered by that Part.

So the periodical renewals for 1951-1977 are in Part 2 of the CCE. E.g.,
for Jan-June 1951, they're on pages 155-159 of Part 2, and for July-Dec
1951, they're on pages 303-307. Page images for these two sections
appear at <http://onlinebooks.library.upenn.edu/cce/1951r.html>, about
3/4 of the way down the page. 
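The whole answer can be summarized as a small lookup table. This Python sketch only encodes the years described in this message (1947-1977); the function name is hypothetical, and it raises for anything outside that range rather than guessing.

```python
def periodical_renewal_location(year: int) -> str:
    """Where to look for periodical (class B) renewals published in `year`,
    per the summary above; other years are deliberately not covered."""
    if year == 1947:
        return "CCE Part 14A (periodicals in their own short section)"
    if 1948 <= year <= 1950:
        return "CCE Part 14A (interfiled with all non-music renewals)"
    if 1951 <= year <= 1977:
        return "CCE Part 2"
    raise ValueError(f"{year} is not covered by this summary")
```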

-Michael Dyck

From jon at noring.name  Tue Aug 23 18:12:05 2005
From: jon at noring.name (Jon Noring)
Date: Tue Aug 23 18:12:27 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <430BB1B7.1080006@novomail.net>
References: <20050823160216.6E6438C8E8@pglaf.org>
	<430BB1B7.1080006@novomail.net>
Message-ID: <1218078621.20050823191205@noring.name>

Lee Passey wrote:
> Joshua Hutchinson wrote:

>> While this is true, our tei files are specifically meant as a master
>> document and NOT as a viewing document.  They will NOT parse in any
>> browser "out of the box".  As you've seen, you can jury-rig things to
>> the point where it is usuable, but that is not our intention.  We 
>> provide the HTML files directly for people that want to browse the 
>> file in IE or Firefox.

> I understand that creating a file format which could be viewed without
> further processing was not your intention, but now that we have some
> evidence that suggests that it is a real possibility, is there any reason
> _not_ to pursue that possibility, especially if it only requires adding
> three lines to the source (and making sure that all the DTDs are 
> accessible)?

Well, my investigation into PG-TEI and TEI-P4X (thank heavens for TEI
Pizza Chef to flatten the otherwise unreadable TEI-P4 DTD!) shows it is
also a real possibility. But I believe, subject to change as I learn
more from the experts here and the TEI-L folk, that in order to make
PG-TEI+CSS2 render in web standards browsers (limited now to Firefox
and maybe Opera 8) we also have to appropriately constrain/subset the
PG-TEI vocabulary (allowed elements/attributes/attr-values) and
content models (the result may be somewhat like TEI-Lite, but not
exactly the same -- we can certainly add our own tags as needs
require.) We may also have to give up a couple of things.

[Note: Even if CSS2 rendering is not of interest, I think PG-TEI, when
released as version 1.0, needs to be appropriately constrained to make
life a whole lot easier for everyone using it -- subject of a future
message if this topic comes up.]
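One way to enforce such a constrained subset, short of writing a new DTD, is a whitelist check run over each file. A minimal Python sketch; the ALLOWED_ELEMENTS set below is purely illustrative and is not a proposed PG-TEI vocabulary. Attributes and attribute values could be checked the same way.

```python
import xml.etree.ElementTree as ET

# Purely illustrative whitelist, not a proposed PG-TEI vocabulary.
ALLOWED_ELEMENTS = {
    "TEI.2", "teiHeader", "fileDesc", "titleStmt", "title", "author",
    "text", "front", "body", "back", "div", "head", "p", "hi",
    "list", "label", "item", "note", "ref", "lb", "pb",
}

def disallowed_elements(tei_xml: str, allowed=ALLOWED_ELEMENTS) -> list:
    """Sorted names of elements in the document that fall outside `allowed`."""
    root = ET.fromstring(tei_xml)
    return sorted({el.tag for el in root.iter() if el.tag not in allowed})
```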

Assuming appropriate constraints, here are the five items needing
further investigation to see how to get them to render properly using
CSS2 (there may be other TEI constructs that don't fit well into the
XHTML model):

1) The TEI <note> tag. If placed directly inline (not indirectly
   referenced), it is possible in CSS2 to declare it block and move it
   outside of the main flow, which is a reasonable way to present it
   (even if not the best.) I've actually experimented with this, but
   my test files are inexplicably long-lost <fuming class="mad"/>.
   This won't work in IE6, but then IE6 sucks when it comes to web
   standards support. (I assume with XSLT that more advanced moving
   around of the content within notes is possible to do, such as
   dumping it into another document or placing it in a notes section.)

2) Hypertext links. CSS2 'display' provides no mapping for anchors.
   XLink will work, but then that's outside of TEI. (XLink for
   hypertext linking is recognized in Mozilla/Firefox, but not in
   Opera 7 -- I don't know about Opera 8 yet.) Try the following test:

      http://www.windspun.com/demoxml/demolink.xml

3) Tables. I think the basic TEI table model will map to the XHTML
   model (there's quite a few table-related CSS2 'display' values.)

   However, if PG-TEI will optionally allow other table models to
   be used, such as CALS, all bets are off. I'm not sure that even
   XSLT will be able to properly map any CALS table to XHTML (may
   require something outside of XSLT to do the transformation.)

4) Lists. I think that TEI Lists can be made to render properly with
   CSS2 'display', but not sure. It needs experimentation.

5) Images. CSS2 'display' has no mapping for images and objects. XLink
   provides the ability to embed objects, but no web browser appears to
   support this functionality of XLink yet, and anyway XLink will not
   be used to specify images in PG-TEI documents.

   (Hmmm, I think here it may be possible with CSS2 to pull out the
   name of the image and then use that name as a string to embed the
   image back in -- CSS2 is capable of image embedding. Need to
   experiment with it. It might work in IE6, too.)
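Item 1 above assumes XSLT could move notes into a separate notes section. The same transformation can be sketched in Python with xml.etree.ElementTree. This is not PG's converter, and the numbering scheme (a <ref target="noteN"> left in place of each note, plus a trailing <div type="notes">) is an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET

def collect_notes(body: ET.Element) -> ET.Element:
    """Move every inline <note> into a trailing <div type="notes">, leaving a
    numbered <ref> in its place. Modifies `body` in place and returns it."""
    # Snapshot (parent, note) pairs before changing the tree.
    pairs = [(parent, child) for parent in body.iter() for child in parent
             if child.tag == "note"]
    notes_div = ET.Element("div", type="notes")
    for n, (parent, note) in enumerate(pairs, start=1):
        ref = ET.Element("ref", target=f"note{n}")
        ref.text = str(n)
        ref.tail = note.tail          # keep the text that followed the note
        index = list(parent).index(note)
        parent.remove(note)
        parent.insert(index, ref)
        note.tail = None
        note.set("id", f"note{n}")    # TEI P4 uses a plain id attribute
        notes_div.append(note)
    if pairs:
        body.append(notes_div)
    return body
```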


>> I personally prefer numeric entities, as well, but for the more common
>> ones, the conversion process will support named entities in the .tei
>> file.  Most of them appear as unicode in the HTML, so it typically 
>> isn't an issue in the final product.

> You are correct; so long as you are relying on conversion to HTML (or
> some other file format) before the file is used, there should be no 
> problem (so long as the conversion utility can get to the correct .ent
> files). Use of named entities is only a problem if you are attempting to
> display the TEI-XML directly.

Yes, definitely! Of course, those named character entities which are
defined in HTML/XHTML will be renderable in web standards browsers.
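Converting named character entities to numeric ones is mechanical for any name the converter knows. A minimal Python sketch using the standard library's HTML entity table (html.entities.name2codepoint). HTML's Latin-1 entity names come from the ISO entity sets, but a TEI file may use names HTML lacks, so unknown names are left alone rather than guessed, and XML's five predefined entities are kept as they are.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text: str) -> str:
    """Rewrite &name; as &#xHHHH; when the name is a known HTML entity;
    leave XML's predefined entities and unknown names untouched."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#x{name2codepoint[name]:X};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)
```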

But I think it best, in whatever DP exports as PG-TEI, to use numeric
character entities. For primarily "ASCII" documents, a manifest of
non-ASCII characters used in the document can be placed in a comment
somewhere in the header. This allows someone to know what &#x1234;
found in the text is (here it is an Ethiopic character), without
having to refer to the Unicode docs. I build a non-ASCII character
manifest for many of the XHTML documents I author.
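Such a manifest is easy to generate rather than write by hand. A minimal Python sketch (the function name and the comment layout are hypothetical) that lists each distinct non-ASCII character as a numeric reference followed by its Unicode name:

```python
import unicodedata

def non_ascii_manifest(text: str) -> str:
    """Return an XML comment listing each distinct non-ASCII character in
    `text` as a numeric reference plus its Unicode name."""
    chars = sorted({c for c in text if ord(c) > 0x7F})
    lines = [f"  &#x{ord(c):04X}; {unicodedata.name(c, 'UNNAMED')}" for c in chars]
    return "<!-- Non-ASCII characters used:\n" + "\n".join(lines) + "\n-->"
```

Inside a comment the references are plain text, so they document the characters without being expanded.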


> In any case, it doesn't matter which encoding is used, so long as it is
> not misrepresented in the <?xml ...> declaration.

Yes. To reply to Marcello's comment in another message, the PG-TEI
documentation should make clear, with an example, how to declare
either ISO-8859-1 or UTF-8 in the XML declaration.

If I had my druthers, only UTF-8 would be used, but a compromise
where ISO-8859-1 can also be used is acceptable. But no other encodings
for mostly-Latin documents! And at some future time I'd work to
re-encode documents in ISO-8859-1 into UTF-8.
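A check for the policy above might look like this Python sketch; the function names are hypothetical, and byte-order marks and UTF-16 are ignored. Note that any byte sequence decodes as ISO-8859-1, so the check can only catch a file that claims UTF-8 but is not; it cannot prove that a Latin-1 label is right.

```python
import re

ALLOWED_ENCODINGS = {"utf-8", "iso-8859-1"}  # the policy proposed above

def declared_encoding(data: bytes) -> str:
    """Encoding named in the XML declaration; XML's default is UTF-8."""
    match = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"

def encoding_ok(data: bytes) -> bool:
    """True if the declared encoding is allowed and the bytes decode under it."""
    enc = declared_encoding(data)
    if enc not in ALLOWED_ENCODINGS:
        return False
    try:
        data.decode(enc)
    except UnicodeDecodeError:
        return False
    return True
```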


>> We aren't too interested in CSS directly for the TEI file (the css 
>> file sitting beside the TEI file right now is a mistake ... that 
>> should be changed later today).  However, once I have a few more 
>> documents posted and people seem fairly satisfied with the results, I
>> want to get alternate CSS files submitted by other people for the HTML
>> documents.

> Well, I might do it anyway for my own edification and enjoyment (and
> because I think you _will_ be interested at some point in the future ;-).)

<laugh> Careful Lee, you almost sound like Bowerbird on that one (but
not quite.)

I think it is an excellent exercise to explore how to properly render 
XML-conforming TEI documents using only CSS2 in web standards browsers.
It may indicate how to constrain TEI so it is renderable, which may
be useful for the set of criteria to build the constrained PG-TEI
subset of TEI.

It is also useful for the proposed TEI support in OpenReader.


> Some months ago I put together a couple of tables showing how HTML could
> be mapped to TEI-lite, and vice-versa. The goal was to create a mapping
> that could be used for round-tripping via XSLT; that is, a TEI-lite 
> document could be used to create an HTML document which could then be
> transformed back into TEI without loss of markup. I will probably start
> from those tables in creating a tei.css file. They may also be useful to
> you in creating XSLT scripts (aka XSL style sheets). If you're 
> interested they can be found at
> www.passkeysoft.com/~lee/xhtml2tei.html 
> and www.passkeysoft.com/~lee/tei2xhtml.html.

Well, round-tripping using XSLT and direct rendering of TEI using CSS2
are two different things. I believe XSLT has more power, but CSS2 is
not bad, and CSS3 adds some new stuff (but mostly not supported in
Firefox and Opera.)
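The round-tripping idea Lee describes depends on the XHTML side remembering which TEI element each piece came from. A minimal Python sketch of one way to do that, recording the TEI element name in the class attribute. The mapping table is illustrative, not Lee's, and TEI attributes such as rend are dropped here, so only the element structure survives the round trip.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping only; Lee's tables at the URLs above are the real proposal.
TEI_TO_XHTML = {"div": "div", "head": "h2", "p": "p", "hi": "em",
                "list": "ul", "item": "li", "note": "span"}

def tei_to_xhtml(el: ET.Element) -> ET.Element:
    """Map each TEI element to an XHTML one, recording the TEI name in class."""
    out = ET.Element(TEI_TO_XHTML.get(el.tag, "span"), {"class": el.tag})
    out.text, out.tail = el.text, el.tail
    out.extend([tei_to_xhtml(child) for child in el])
    return out

def xhtml_to_tei(el: ET.Element) -> ET.Element:
    """Invert tei_to_xhtml using the class attribute (TEI attributes are not kept)."""
    out = ET.Element(el.get("class"))
    out.text, out.tail = el.text, el.tail
    out.extend([xhtml_to_tei(child) for child in el])
    return out
```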


>> Also, if any industrious programmers out there know TEI conversions
>> and would like to tackle the job of preparing a conversion process for
>> other end formats (such as Palm files, Plucker, MS Reader, etc) please
>> let me and/or Marcello know.  The conversion must run on Linux (our
>> server OS) and be open source (for future compatibility).

> To my knowledge there are no known lit compilers that run on Linux (thus
> making them ineligible by your requirements). This is not really a big
> deal because most MSReader users who are familiar with Project Gutenberg
> are comfortable making .lit files from HTML themselves, so if you can
> serve good HTML they will be happy.

My view on LIT production is to go from PG-TEI to well-structured
XHTML 1.1 (which is probably what Lee means by "HTML".) Then from
there build OEBPS 1.0.1 (LIT optimized) and OEBPS 1.2. Then let
end-users convert the OEBPS 1.0.1 to LIT using the simple
litconvertdemo in MS Reader's SDK (I have a "non-demo" version of the
same). This approach takes full advantage of what LIT provides, while
ReaderWorks does not (RW is buggy and does not support a couple of
the Reader/LIT features.) That is, to produce the highest-quality LIT
with the full range of Reader/LIT features available, it is much
better to start with OEBPS 1.0.1 than to use ReaderWorks, which
assembles HTML fragments.

Jon


From greg at durendal.org  Tue Aug 23 18:20:51 2005
From: greg at durendal.org (Greg Weeks)
Date: Tue Aug 23 18:21:05 2005
Subject: [gutvol-d] 1950 periodicals renewals
In-Reply-To: <430BC56B.5D6E7524@ibiblio.org>
Message-ID: <Pine.LNX.4.44.0508232119230.7430-100000@durendal.durendal.org>

On Tue, 23 Aug 2005, Michael Dyck wrote:

> Greg Weeks wrote:
> >
> > When looking for the periodicals renewals for 1950 last night I didn't
> > find any. 1947 had about 900 renewals and 1951 had about 1000, but 1948,
> > 1949 and 1950 didn't even have a renewals section in the book. This is
> > "The Catalog of Copyright Entries" in the Carnegie Library. Does anyone
> > know what's going on with these? I've got some journal entries I'm trying
> > to put a rule 6 clearance together for and I need information on 1950
> > through 1955.
>
> From 1947 to 1950, renewals for all classes of registrations were
> published together in Part 14 of the CCE (which was actually split into
> 14B for Music and 14A for everything else). In 1947, they divided up 14A
> somewhat by class (so periodicals renewals only occupied 3 pages), but
> then they gave up and put all of 14A into a single collation.
>
> So the periodical (class B) renewals for 1950 are in Part 14A,
> interfiled with all the other renewals for everything-but-music. E.g.,
> for Jan-June 1950, they're spread over pages 1-60 of Part 14A, and for
> July-Dec 1950, they're spread over pages 61-121.
>
> PG has text versions of the *book* renewals for 1950-1977. Because of
> the 1948-1950 interfiling, PG also has the periodical renewals for 1950,
> scattered throughout etexts #11801 (Jan-June 1950) and #11802 (July-Dec
> 1950). E.g., in #11801, the first periodical renewal is under "Abbott's
> Digest of All the New York Reports". (You can tell it's a periodical
> renewal because its original registration starts with 'B'.) Page images
> for these two volumes appear at
> <http://onlinebooks.library.upenn.edu/cce/1950r.html>.
>
> ---
>
> After 1950, the Copyright Office discontinued Part 14 of the CCE, and
> went back to having each Part of the CCE also contain the renewal
> records in the classes covered by that Part.
>
> So the periodical renewals for 1951-1977 are in Part 2 of the CCE. E.g.,
> for Jan-June 1951, they're on pages 155-159 of Part 2, and for July-Dec
> 1951, they're on pages 303-307. Page images for these two sections
> appear at <http://onlinebooks.library.upenn.edu/cce/1951r.html>, about
> 3/4 of the way down the page.

Thank you. That's what I wanted to know. 1950 is already done then for PG.
I have photocopies now of 1951-1969 for periodicals renewals. I know books
have already been done by DP.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From Bowerbird at aol.com  Tue Aug 23 19:18:38 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Aug 23 19:18:56 2005
Subject: [gutvol-d] the new mantra for project gutenberg
Message-ID: <82.2ed2bc5e.303d32fe@aol.com>

marcello said:
>    I don't know why it doesn't work for you.

this will increasingly become the mantra for project gutenberg...

-bowerbird
From jon at noring.name  Tue Aug 23 19:38:42 2005
From: jon at noring.name (Jon Noring)
Date: Tue Aug 23 19:38:59 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <430BBAB7.8050906@perathoner.de>
References: <20050823160216.6E6438C8E8@pglaf.org>
	<430BB1B7.1080006@novomail.net> <430BBAB7.8050906@perathoner.de>
Message-ID: <143389361.20050823203842@noring.name>

Marcello wrote:
> Lee Passey wrote:

>> I also note that you encoded the glossary at the end of the work with
>> <p> tags (naughty, naughty). Based on what I saw in the TEI docs I would
>> have encoded it as follows:
>> 
>> <div type="glossary">
>> <head>Glossary</head>
>> <list type="gloss">
>> <label>'Abdu'l-Bahá</label>
>> <gloss>The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest son
>> and appointed Successor of Bahá'u'lláh, and the Centre of His 
>> Covenant.</gloss>

> And it wouldn't have validated because gloss has no business inside list.

TEI P4 shows how to do it (I think):

http://www.tei-c.org/P4X/DS.html#TDX-280

Then from there it links to:

http://www.tei-c.org/P4X/CO.html#COLI

Where it gives the following example:

   <list type="gloss">
    <head>Report of the conduct and progress of Ernest Pontifex.
      Upper Vth form &mdash; half term ending Midsummer 1851</head>
    <label>Classics</label>    <item>Idle listless and unimproving</item>
    <label>Mathematics</label> <item>ditto</item>
    <label>Divinity</label>    <item>ditto</item>
    <label>Conduct in house</label> <item>Orderly</item>
    <label>General conduct</label>
    <item>Not satisfactory, on account of his great
       unpunctuality and inattention to duties</item>
   </list>


Also refer to: http://www.tei-c.org/P4X/CO.html#COHQU

Which talks about the <gloss> element. This particular markup problem
apparently comes up often enough that TEI P4 discusses it (see the
links above.)

Definitely Lee is right that <p> is not the best element for this
purpose, and Marcello is right that the way Lee used <gloss> is
incorrect. The closer I look at the above example, the more it
resembles an XHTML definition list, with an almost exact mapping
between the two, except that XHTML <dl> (analogous to TEI <list
type="gloss">) cannot contain anything but <dt>/<dd> pairs, while the
TEI version can also contain a <head>er. So getting the example above
to work in XHTML is problematic because of the <head> line. XHTML has
pretty poor support for internal headers and the like in lists (ol and
ul only allow li, and dl only allows dt and dd), so this looks like
item #6 in my "problems with TEI+CSS2 rendering" list.
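As a rough illustration of item #6, here is a Python sketch that maps a TEI <list type="gloss"> onto an XHTML definition list, moving the <head> out in front of the <dl> since the <dl> cannot hold it. The wrapper <div class="gloss"> and the <h3> are arbitrary choices, the function name is hypothetical, and inline markup inside labels and items is flattened to plain text.

```python
import xml.etree.ElementTree as ET

def gloss_list_to_xhtml(tei_list: ET.Element) -> ET.Element:
    """Convert <list type="gloss"> (optional <head>, then <label>/<item> pairs)
    into <div><h3/><dl><dt/><dd/>...</dl></div>."""
    def plain(el: ET.Element) -> str:
        return " ".join("".join(el.itertext()).split())  # collapse whitespace

    wrapper = ET.Element("div", {"class": "gloss"})
    head = tei_list.find("head")
    if head is not None:
        # <dl> may contain only <dt>/<dd>, so the heading goes before it.
        ET.SubElement(wrapper, "h3").text = plain(head)
    dl = ET.SubElement(wrapper, "dl")
    for child in tei_list:
        if child.tag == "label":
            ET.SubElement(dl, "dt").text = plain(child)
        elif child.tag == "item":
            ET.SubElement(dl, "dd").text = plain(child)
    return wrapper
```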

Jon

From joshua at hutchinson.net  Tue Aug 23 21:28:27 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Aug 23 20:05:36 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <143389361.20050823203842@noring.name>
References: <20050823160216.6E6438C8E8@pglaf.org>	<430BB1B7.1080006@novomail.net>
	<430BBAB7.8050906@perathoner.de>
	<143389361.20050823203842@noring.name>
Message-ID: <430BF76B.9010701@hutchinson.net>

Jon Noring wrote:

>Marcello wrote:
>  
>
>>Lee Passey wrote:
>>    
>>
>
>  
>
>>>I also note that you encoded the glossary at the end of the work with
>>><p> tags (naughty, naughty). Based on what I saw in the TEI docs I would
>>>have encoded it as follows:
>>>
>>><div type="glossary">
>>><head>Glossary</head>
>>><list type="gloss">
>>><label>'Abdu'l-Bahá</label>
>>><gloss>The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest son
>>>and appointed Successor of Bahá'u'lláh, and the Centre of His 
>>>Covenant.</gloss>
>>>      
>>>
>
>  
>
>>And it wouldn't have validated because gloss has no business inside list.
>>    
>>
><snip good info on glossary markup>
>  
>
Marcello and I have discussed the concept of markup
"levels".  When it comes to something like TEI, there are so many ways
you can add metadata that it is completely daunting at times.  In this
example, yes, a more specific markup could have been used.  But, in the
final render, it works just fine as <p> blocks.

Another example is a text with foreign words interspersed throughout.  
Often, those words would be printed in italics in the original book.  
Now, the simplest markup in TEI would be to put <hi 
rend="italics">around</hi> the word.  But you could also mark the word 
with a <foreign lang="en">foreign</foreign> tag.  In the final render, 
it would look exactly the same, but the second option provides more 
specific metadata.  You could even go further by providing a translation
of the foreign word inside an attribute (the markup escapes me at the
moment).
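The point that both encodings render identically, while only the second keeps the extra metadata, can be shown with a toy Python renderer. Everything here (the sample sentence, the lang="fr" value, the render function) is made up for illustration and is not part of any PG conversion tool.

```python
import xml.etree.ElementTree as ET

LEVEL_ONE = '<p>He said <hi rend="italics">bonjour</hi>.</p>'
LEVEL_TWO = '<p>He said <foreign lang="fr">bonjour</foreign>.</p>'

def render(tei_fragment: str) -> str:
    """Toy renderer: italic <hi> and <foreign> both become XHTML <i>;
    other elements keep their names, and all attributes are dropped."""
    def convert(el: ET.Element) -> ET.Element:
        italic = el.tag == "foreign" or (el.tag == "hi" and el.get("rend") == "italics")
        out = ET.Element("i" if italic else el.tag)
        out.text, out.tail = el.text, el.tail
        out.extend([convert(child) for child in el])
        return out
    return ET.tostring(convert(ET.fromstring(tei_fragment)), encoding="unicode")
```

The rendered output is the same, but only the level-two source still records which language the word is in.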

The markup that would cover what PG currently has would be what I would 
call "level one markup", and that is, obviously, the minimum that a TEI 
file could be marked up to.  Level two would add a little more metadata, 
but nothing drastic: maybe marking certain words as foreign instead of 
italics, or marking a letter as such instead of just a block of indented 
paragraphs, etc.

Level three would be going the extra, extra mile.  It's the kind of 
markup I don't expect to see, but is possible in TEI.

I expect most TEI documents we post will fall in level one or level two.

Josh
From JBuck814366460 at aol.com  Tue Aug 23 20:43:22 2005
From: JBuck814366460 at aol.com (Jared Buck)
Date: Tue Aug 23 20:48:47 2005
Subject: [gutvol-d] the new mantra for project gutenberg
In-Reply-To: <82.2ed2bc5e.303d32fe@aol.com>
References: <82.2ed2bc5e.303d32fe@aol.com>
Message-ID: <430BECDA.9040801@aol.com>

An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050823/44e1fc85/attachment.html
From lee at novomail.net  Wed Aug 24 08:59:45 2005
From: lee at novomail.net (Lee Passey)
Date: Wed Aug 24 08:59:56 2005
Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 21
In-Reply-To: <20050824005745.3DB258C8E0@pglaf.org>
References: <20050824005745.3DB258C8E0@pglaf.org>
Message-ID: <430C9971.7070800@novomail.net>

Marcello Perathoner <marcello@perathoner.de> wrote:

> Lee Passey wrote:
>
>> I understand that creating a file format which could be viewed 
>> without further processing was not your intention, but now that we 
>> have some evidence that suggests that it is a real possibility, is 
>> there any reason _not_ to pursue that possibility, especially if it 
>> only requires adding three lines to the source (and making sure that 
>> all the DTDs are accessible)?
>
>
> Supporting CSS styling will add another complexity layer to an already 
> overly complex thing. A software architect has to leave things out to 
> make the design implementable.


Adding three lines to the template file adds complexity?
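Lee doesn't list the three lines here; presumably at least one of them is an xml-stylesheet processing instruction, the W3C mechanism for associating a stylesheet with an XML document. Assuming that, adding it could be a one-step post-processing pass like this Python sketch (the function name and the tei.css file name are hypothetical):

```python
def add_stylesheet_pi(tei_xml: str, href: str) -> str:
    """Insert an xml-stylesheet processing instruction after the XML
    declaration, or at the very top if there is no declaration."""
    pi = f'<?xml-stylesheet type="text/css" href="{href}"?>'
    if tei_xml.startswith("<?xml "):   # the XML declaration itself
        end = tei_xml.index("?>") + 2
        return tei_xml[:end] + "\n" + pi + tei_xml[end:]
    return pi + "\n" + tei_xml
```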

> Also, things like footnotes are impossible with CSS. So why bother?


I've never had any problems with footnotes and CSS. But the real key is 
to separate the TEI and the CSS, even at a conceptual level. I would 
never suggest creating a TEI file with the assumption that it would be 
rendered in conjunction with some specific CSS file, or indeed assuming 
that it will have some specific rendering at all. I recommend creating 
TEI files that are both valid and correct with regard to the TEI spec, 
without being concerned at all about how they might be rendered. On the other 
hand, simple modifications which will enable other people and 
applications to select a rendering should be acceptable if they are not a 
hindrance to the primary goal of producing valid and correct TEI.

>> In any case, it doesn't matter which encoding is used, so long as it 
>> is not misrepresented in the <?xml ...> declaration.
>
>
> Both the TEI and the XHTML file are correct. I don't know why it 
> doesn't work for you.


I think I may. I downloaded the ZIP archive of the file to be sure that 
there were no issues involving Apache or Firefox. After extracting the 
contents, and before touching the file with any other application, I did 
a hexdump on the file. Sure enough, it was valid UTF-8 encoding.

I don't know which of my editors did the conversion, but I suspect it 
was Microsoft's XML editor, which ships as part of Visual Studio .NET. 
I also suspect that the conversion was not to iso-8859-1 but to 
win-1252, which is indistinguishable from 8859-1 except in the range 
128-159. The HTML editor that shipped with earlier versions of Visual 
C++ was known to do this conversion under the covers and without 
warning. After all, if you're running on a version of Microsoft Windows 
you're obviously going to want to be using Microsoft's own character 
mappings, right?
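The point about the 128-159 range is easy to check: in ISO-8859-1 those byte values are C1 control characters, while Windows-1252 assigns most of them to printable characters such as curly quotes. A short Python sketch that flags such bytes in a file that claims to be ISO-8859-1 (the function name is hypothetical):

```python
def suspicious_c1_bytes(data: bytes) -> list:
    """Offsets of bytes 0x80-0x9F. In text that really is ISO-8859-1 these are
    control characters and almost never appear; in Windows-1252 most of them
    are printable (curly quotes, dashes, the euro sign)."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

sample = b"\x93quoted\x94"
# Windows-1252 reads these two bytes as U+201C/U+201D curly quotes...
assert sample.decode("cp1252") == "\u201cquoted\u201d"
# ...while ISO-8859-1 maps them to invisible C1 control characters.
assert sample.decode("latin-1") == "\x93quoted\x94"
```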

>> As part of my CSS experimentation, I set the display setting for the 
>> <tei-header> element to "none", because while I think the data is 
>> important, I'm not particularly interested in seeing it when I'm 
>> reading. When I did this, I thought I lost the title of the book 
>> because it only appears in the <tei-header> element. I discovered 
>> later the title was repeated in the <front> element, identified as a 
>> <head>er.  As I read the TEI spec, (and I am by no means well-versed) 
>> I believe that there should also exist a <titlePage> element which 
>> should be part of the <front>, and which should contain all the 
>> information traditionally found on the title page of a book.
>
>
> That is for the encoder to decide. If the title page is interesting 
> enough to warrant a separate encoding, she will use <titlePage> etc. 
> to mark it up.


Indeed. And as Mr. Hutchinson is the encoder, I'm suggesting he ought to 
consider it.

> If the title page is just plain boring you can generate a standard 
> title page with <divGen type="titlepage">. This will pull all data out 
> of the <teiHeader> and save you the trouble.
>
> There are a lot of such shortcuts implemented like <divGen type="toc"> 
> and <divGen type="footnotes">.


I'm not terribly enamored with <divGen> tags, because it seems to rely 
on software that so far is largely unimplemented. _I_ wouldn't recommend 
its use, but it _is_ part of the spec...

>> I also note that you encoded the glossary at the end of the work with 
>> <p> tags (naughty, naughty). Based on what I saw in the TEI docs I 
>> would have encoded it as follows:
>>
>> <div type="glossary">
>> <head>Glossary</head>
>> <list type="gloss">
>> <label>'Abdu'l-Bahá</label>
>> <gloss>The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest 
>> son and appointed Successor of Bahá'u'lláh, and the Centre of His 
>> Covenant.</gloss>
>
>
> And it wouldn't have validated because gloss has no business inside list.


True. I made the same mistake that Mr. Hutchinson did with the <title> 
element: assuming that an element designed to indicate usage was instead 
structural. Mr. Hutchinson's original was valid but incorrect (because 
the textual fragments he was dealing with were not paragraphs). My 
example was correct but invalid (because <gloss> cannot appear within 
lists). By wrapping the <gloss> elements in <item> elements, the 
glossary would become both correct _and_ valid. (My thanks to Mr. Noring 
for posting the links to the relevant portions of the spec so I don't 
have to.)
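Wrapping each <gloss> in an <item> can be applied mechanically. A minimal Python sketch with xml.etree.ElementTree; the function name is hypothetical, and it only touches <gloss> elements that are direct children of the list.

```python
import xml.etree.ElementTree as ET

def wrap_glosses(tei_list: ET.Element) -> ET.Element:
    """Wrap each <gloss> that sits directly inside a <list> in an <item>.
    Modifies the list in place and returns it."""
    for index, child in enumerate(list(tei_list)):   # iterate over a snapshot
        if child.tag == "gloss":
            item = ET.Element("item")
            item.tail, child.tail = child.tail, None  # text after gloss follows item
            tei_list.remove(child)
            item.append(child)
            tei_list.insert(index, item)
    return tei_list
```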

From Gutenberg9443 at aol.com  Wed Aug 24 09:14:26 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Aug 24 09:19:45 2005
Subject: [gutvol-d] Re: prank someone is pulling
Message-ID: <2b.79e03427.303df6e2@aol.com>

TO ALL:
 
Someone purporting to be from PG has faxed a book, in Finnish, the name of
which is "Fredrika Runeberg," to the state of New Jersey Surveying Office.
Whoever did it somehow managed to get around the requirement that the
sender's name and telephone number be on the first page of all faxes. I
have assured him that nobody in our organization did it. He wanted to fax
the first page to me, but my fax is down right now, and I wanted him to
e-mail it as an attached file, but he doesn't have a scanner. Therefore,
he has lost the 33 sheets of paper that were in his fax machine and is
afraid to try to reuse the fax machine because it will immediately try to
go on printing. (I told him to turn it off, unplug it, then replug it and
turn it on, and he would probably then be able to use the fax machine
normally.)
 
He didn't give me his name other than "Jim." I have his telephone number
but will release it only to Greg.
 
I have asked him to snail mail me the first page and let me see what I can
find out.
 
If anybody on this ML is the culprit, cease and desist and notify me
personally that you did it. Then I will only bite your head off, spit it
into your face, and then turn it over to Greg.
 
If anybody on this ML is the culprit and does not admit it and gets caught,
that person's ass is grass and that person will be permanently barred from
this ML and everything else I can get him barred from. This conduct is
unconscionable.

Anne

Do you like to breathe?
Then save the trees! 
Begin a personal relationship
with an ebook 
TODAY!
From joshua at hutchinson.net  Wed Aug 24 09:35:39 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Aug 24 09:35:45 2005
Subject: [gutvol-d] Re: prank someone is pulling
Message-ID: <20050824163539.CC2B74F629@ws6-5.us4.outblaze.com>

Anne, this is called a FAX bomb.  It is similar to an "e-mail bomb", where someone drops a bunch of e-mail garbage on someone to clog their e-mail account.  It is usually done by someone who wants to "get back" at someone else.

FAX bombs are especially nasty because you are not only tying up their service, you are wasting tangible resources like paper and ink.

The reason our files are being used is probably pretty simple ... we have great big text files that are easily accessible and work perfectly for this kind of griefing.

It is doubtful that anyone who has anything to do with PG had anything to do with this.

Josh


From marcello at perathoner.de  Wed Aug 24 10:33:32 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Aug 24 10:33:54 2005
Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 21
In-Reply-To: <430C9971.7070800@novomail.net>
References: <20050824005745.3DB258C8E0@pglaf.org>
	<430C9971.7070800@novomail.net>
Message-ID: <430CAF6C.2020606@perathoner.de>

Lee Passey wrote:

> Adding three lines to the template file adds complexity?

No, but soon you'll start saying things like: if we did this to the TEI 
file, the rendering thru CSS would be so much easier, etc. etc.


> I would 
> never suggest creating a TEI file with the assumption that it would be 
> rendered in conjunction with some specific CSS file, or indeed assuming 
> that it will have some specific rendering at all. I recommend creating 
> TEI files that are both valid and correct with regard to the TEI spec, 
> without being concerned at all about how they might be rendered.

You just proposed the exact opposite thing: to hard-code a set of CSS 
stylesheets into the file.


> On the other 
> hand, simple modifications which will enable other people and 
> applications to select a rendering should be acceptable if they are not a 
> hindrance to the primary goal of producing valid and correct TEI.

If anybody wants to view their TEI thru CSS they should apply their 
preferred stylesheet by hand. Some browsers let you select a user 
stylesheet. No need to pollute the TEI file with that.


> I'm not terribly enamored with <divGen> tags, because it seems to rely 
> on software that so far is largely unimplemented. _I_ wouldn't recommend 
> its use, but it _is_ part of the spec...

You need not use them. If you want to code the toc by hand, feel free. 
But you get some goodies if you use divGen, like a toc with correct page 
numbers in the pdf. And you don't have to juggle hundreds of links.
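The table of contents that a divGen placeholder asks for can be computed from the document's own <div>/<head> nesting. Below is a minimal Python sketch standing in for the XSLT a real toolchain would use. It assumes unnamespaced TEI P4-style markup, the function name build_toc is hypothetical, and page numbers are left to whatever renders the output:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_toc(tei_xml: str) -> List[Tuple[int, str]]:
    """Collect (depth, heading) pairs for every <div> that has a <head>.

    Assumes unnamespaced TEI (P4 style); depth 0 is the outermost <div>.
    """
    root = ET.fromstring(tei_xml)
    toc: List[Tuple[int, str]] = []

    def walk(elem: ET.Element, depth: int) -> None:
        for child in elem:
            if child.tag == "div":
                head = child.find("head")  # direct child <head> only
                if head is not None:
                    # itertext() flattens inline markup such as <hi> inside <head>
                    toc.append((depth, "".join(head.itertext()).strip()))
                walk(child, depth + 1)
            else:
                walk(child, depth)  # e.g. <text>, <body>: same nesting level

    walk(root, 0)
    return toc


sample = (
    "<TEI.2><text><body>"
    "<div><head>Chapter I</head><p>...</p>"
    "<div><head>Section 1</head><p>...</p></div></div>"
    "<div><head>Chapter <hi>II</hi></head><p>...</p></div>"
    "</body></text></TEI.2>"
)
for depth, title in build_toc(sample):
    print("  " * depth + title)
```

Because the list is derived from the structure, inserting or renumbering a chapter needs no hand-edited links.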


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Wed Aug 24 11:43:04 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 24 11:43:20 2005
Subject: [gutvol-d] on viewing the .pgtei file directly
Message-ID: <1c6.2f5def07.303e19b8@aol.com>

i think it would be marvelous to view the .pgtei file directly.

why go through the pain of conversion if you don't have to?
and the .pgtei file is the one with all the information in it, not?
might as well view that, rather than some pale conversion...

but let's get real here for a minute, ok?

if the only people who can view the .pgtei file directly
are the few who happen to be using a specific browser,
there's no need to put a lot of resources in that direction.

however, that's not really what lee is talking about, is it?

no, it isn't.

no sir.   what lee is _really_ talking about is "openreader",
which he has begun programming.   (you _have_ begun,
haven't you, lee?   because there's no time like the present.)

because, you see, a specialized e-book viewer-program
(like openreader) can deliver an e-book experience that
_far_surpasses_ the one that an end-user gets in a browser.

and _that_ is the reason why people would want to view a
.pgtei file directly, rather than look at an .html conversion;
not because of the files per se -- it's silly to think end-users
care anything about formats -- but because of the _viewer_
and the e-book _experience_ that was delivered therein...

(savvy lurkers will recognize this as a straightforward
variant of the argument i have been making all along...)

so, if lee can deliver an openreader that is _cross-platform_
and runs on _older_hardware_, using _minimal_resources_,
and can render the .pgtei file directly, giving the end-user a 
powerful e-book experience, all from a free-beer program,
no one will use their funky web-browser to read an e-text...

so let's wish lee success in his endeavor,
for the ultimate good of all the end-users...

-bowerbird

p.s.   when i try to view 16523-x.xml directly in firefox, it says:
"this xml file does not appear to have any style information
associated with it.   the document tree is shown below." and
then it shows me the document tree.   how can i fix this problem?
From lee at novomail.net  Wed Aug 24 12:03:37 2005
From: lee at novomail.net (Lee Passey)
Date: Wed Aug 24 12:03:50 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 22)
In-Reply-To: <20050824155955.AAFC78C8ED@pglaf.org>
References: <20050824155955.AAFC78C8ED@pglaf.org>
Message-ID: <430CC489.5090701@novomail.net>

Jon Noring <jon@noring.name> wrote:

>Lee Passey wrote:
>

[snip]

>>Well, I might do it anyway for my own edification and enjoyment (and
>>because I think you _will_ be interested at some point in the future ;-).)
>>    
>>
>
><laugh> Careful Lee, you almost sound like Bowerbird on that one (but
>not quite.)
>  
>

The difference is that if I _do_ do it (no promises, as of right now 
it's just speculation) I will make it publicly available, even in an 
unfinished state. As a result of the discussions here on the <gloss> tag 
I can already see that I'm going to have to make changes to my TEI to 
HTML tables. Hey, it's a work in progress, and I could be wrong (I 
obviously have been in the recent past ;-)),

[snip]

>It is also useful for the proposed TEI support in OpenReader.
>  
>

Y'eh think? ;-)

>>Some months ago I put together a couple of tables showing how HTML could
>>be mapped to TEI-lite, and vice-versa. The goal was to create a mapping
>>that could be used for round-tripping via XSLT; that is, a TEI-lite 
>>document could be used to create an HTML document which could then be
>>transformed back into TEI without loss of markup. I will probably start
>>from those tables in creating a tei.css file. They may also be useful to
>>you in creating XSLT scripts (aka XSL style sheets). If you're 
>>interested they can be found at
>>www.passkeysoft.com/~lee/xhtml2tei.html 
>>and www.passkeysoft.com/~lee/tei2xhtml.html.
>>    
>>
>
>Well, round-tripping using XSLT and direct rendering of TEI using CSS2
>are two different things.
>

Absolutely. These kinds of tables are useful in developing a CSS file in 
a different sort of way. If you go out to w3c.org you can find a file 
that is basically the style sheet for XHTML. If you had a User Agent 
that knew how to render XML+CSS, but which knew nothing about HTML, you 
could add this style sheet to an XHTML file and it would render just 
like in a browser. So if you know that <hi> in TEI maps to <i> in HTML, 
you could use the same style that <i> uses in the HTML style sheet for 
the <hi> element in the TEI style sheet. This purely mechanical process 
isn't going to give you a perfect (or perhaps even adequate) style sheet 
for TEI, but it will probably get you more than 50% of the way there.
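The purely mechanical step can be sketched in a few lines of Python. Both tables here are hypothetical stand-ins: HTML_RULES is a small invented excerpt in the spirit of the W3C sample style sheet for HTML (not copied from it), and TEI_TO_HTML is the sort of correspondence a TEI-to-XHTML mapping table would list:

```python
# Hypothetical excerpt of HTML default styles (not the W3C sheet verbatim).
HTML_RULES = {
    "p": "display: block; margin: 1em 0",
    "i": "font-style: italic",
    "h2": "display: block; font-size: 1.5em; font-weight: bold",
    "li": "display: list-item",
}

# Hypothetical TEI -> HTML correspondences, as a mapping table might list them.
TEI_TO_HTML = {
    "p": "p",
    "hi": "i",
    "head": "h2",
    "item": "li",
    "teiHeader": None,  # no HTML counterpart
}


def derive_tei_css(tei_to_html, html_rules):
    """Give each TEI element the declarations of its HTML counterpart."""
    lines = []
    for tei_elem, html_elem in tei_to_html.items():
        decls = html_rules.get(html_elem) if html_elem else None
        if decls is None:
            # Nothing to borrow mechanically: this is the part left for hand-tuning.
            lines.append(f"/* {tei_elem}: nothing to borrow; style by hand */")
        else:
            lines.append(f"{tei_elem} {{ {decls} }}")
    return "\n".join(lines)


print(derive_tei_css(TEI_TO_HTML, HTML_RULES))
```

The comment lines in the output mark exactly the remainder that has to be written by hand.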

[snip]


>My view in LIT production is to go from PG-TEI to well-structured
>XHTML 1.1 (which is probably what Lee means by "HTML".)
>

Oh, yeah. If you're going to use XSLT to transform TEI to HTML it makes 
absolutely no sense to output anything _other_ than XHTML 1.1. To my 
knowledge there are no tools that rely on structures of HTML 3.2 which 
are unavailable in XHTML 1.1 (except for the fact that some older 
browsers need a space before the slash on empty elements, e.g. <hr />). 
When I say HTML you can always assume I'm talking about XHTML unless I 
make it explicit otherwise.

>Definitely Lee is right in that <p> is not the best for this purpose,
>and Marcello is right in that how Lee used it is incorrect. In fact,
>the closer I look at the above example, the more it looks like XHTML
>definition lists with almost an exact mapping between the two except
>that XHTML <dl> (analogous to TEI <list type="gloss">) cannot contain
>anything but <dd> <dt> pairs, while the TEI version can also contain a
><head>er. 
>  
>

This is something that actually bothers me quite a bit about the TEI 
implementation of lists. As a programmer, I want a definition list (of 
which a glossary is a specific instance) to be structured in such a way 
that I can grab _one_ element and get both the term _and_ the associated 
definition. I really dislike both the HTML and the TEI implementations, 
which rely on the definition being in a separate element from the term, 
immediately following it. The two elements are obviously 
inextricably linked, but the vocabularies require the encoder to make 
the link explicit if it is to exist at all.
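To make the contrast concrete, here is a minimal Python sketch of what a program has to do with each structure. The function names are hypothetical, and the test data is plain ASCII because entities like &imacr; are not predefined in XML and would need a DTD:

```python
import xml.etree.ElementTree as ET


def text_of(elem):
    """All text inside an element, inline markup flattened."""
    return "".join(elem.itertext()).strip()


def pairs_from_siblings(list_xml):
    """TEI/HTML style: pair each <label> with the <item> right after it."""
    pairs, pending = [], None
    for child in ET.fromstring(list_xml):
        if child.tag == "label":
            pending = text_of(child)
        elif child.tag == "item" and pending is not None:
            pairs.append((pending, text_of(child)))
            pending = None
        # <head> and unpaired items are skipped: the link exists only by adjacency
    return pairs


def pairs_from_items(list_xml):
    """Proposed style: one <item> holds both <term> and <gloss>.

    Assumes every <item> has both children.
    """
    return [(text_of(item.find("term")), text_of(item.find("gloss")))
            for item in ET.fromstring(list_xml).iter("item")]
```

The first function has to carry state between siblings. The second grabs one element per entry, which is the property asked for above.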

If I ruled the world, the term and its gloss would be combined into an 
item element, as follows (example modified from the sample at 
http://www.tei-c.org/P4X/CO.html#COLI):

<list type="gloss">
  <head>Unit Three --Vocabulary</head>
  <item>
    <term lang="la">acerbus, -a, -um</term>
    <gloss>bitter, harsh</gloss>
  </item>
  <item>
    <term lang="la">ager, agr&imacr;, M.</term>
    <gloss>field</gloss>
  </item>
  <item>
    <term lang="la">audi&omacr;, -&imacr;re, -&imacr;v&imacr;, -&imacr;tus</term>
    <gloss>hear, listen (to)</gloss>
  </item>
  <!-- etc. -->
</list>

I believe that this implementation of a glossary list would pass the 
scrutiny of an XML validator, but it is nonetheless incorrect as the TEI 
spec clearly states that "it is a semantic error for a list tagged with 
type='gloss' not to have labels." Heck, I dislike the TEI implementation 
of glossary lists so much that I am tempted to suggest using lists of 
type "glossary" instead of type "gloss" just to avoid the 
specification's requirements (which, by the way, an XML validator would 
not catch. Validators can tell you when you've done something wrong, but 
not when you've failed to do something right).
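A DTD cannot make a content model depend on the value of an attribute such as type, so a rule like "gloss lists must have labels" needs a check of its own. A minimal Python sketch follows. It assumes unnamespaced TEI, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def unlabelled_gloss_lists(tei_xml):
    """Return every <list type="gloss"> that has no <label> child.

    This is the semantic rule quoted above, checked separately because
    DTD validation cannot condition on the type attribute's value.
    """
    root = ET.fromstring(tei_xml)
    return [lst for lst in root.iter("list")
            if lst.get("type") == "gloss" and lst.find("label") is None]
```

Run against a list shaped like the example above, this reports it, even though a validator would accept it.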

>In fact, as I look at it, getting the example above to work
>in XHTML is problematic because of the <head> line. In fact, XHTML has
>pretty poor list support for internal headers and the like (all the
>lists: ol, ul, and dl, only support li, and dd/dt for dl), so this
>looks like item #6 in my "problems with TEI+CSS2 rendering" list.
>  
>

Not so, because the problem is one of mapping between TEI and XHTML, not 
one of rendering TEI with CSS; although you could certainly add it to 
your "problems with transforming TEI to XHTML" list. CSS can deal with 
headers inside of lists without problem, it's HTML that has the problem.

>Jon
>  
>
From lee at novomail.net  Wed Aug 24 12:03:59 2005
From: lee at novomail.net (Lee Passey)
Date: Wed Aug 24 12:04:15 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 22)
In-Reply-To: <20050824155955.AAFC78C8ED@pglaf.org>
References: <20050824155955.AAFC78C8ED@pglaf.org>
Message-ID: <430CC49F.2070408@novomail.net>

Joshua Hutchinson <joshua@hutchinson.net> wrote:

> There is a concept that Marcello and I have discussed of markup 
> "levels".  When it comes to something like TEI, there are so many ways 
> you can add meta data it is completely daunting at times.  In this 
> example, yes, a more specific markup could have been used.  But, in 
> the final render, it works just fine as <p> blocks.


But, you see, it doesn't. On my PocketPC, and in all dedicated e-book 
programs on my desktop computer, <p> elements (in HTML of course) have 
the first line indented, and there is no blank space between paragraphs. 
In this case your glossary just looks odd: there is an indented word or 
phrase, then another indented word or phrase, then another indented word 
or phrase, most of which are not complete sentences (although every once 
in a while there is a complete sentence thrown in just to confuse me). 
There is no typographical convention that indicates which is the term 
and which is the gloss. Of course I can figure it out relatively easily, 
but to do so I have to exit "immersive reading" mode and go into "copy 
editing" mode, which is disruptive to my reading experience.

Had you created the list as a "<list 
type="gloss"><label>word</label><item>definition</item>", your XSL 
script could have transformed it into "<ul><li><strong>word</strong>: 
definition</li>", and preserved all the typographically conventions I 
have come to expect. As it is, these fragments are identified as 
paragraphs, and XSL scripts and CSS style sheets have to treat them in 
the same way they treat real paragraphs.
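The transformation described here would normally be an XSL script. The same mapping is sketched below in Python with ElementTree instead of XSLT (function name hypothetical) to show how little logic is needed once terms and definitions are marked as such:

```python
import xml.etree.ElementTree as ET


def gloss_list_to_xhtml(list_xml):
    """Render a TEI gloss list as <ul><li><strong>term</strong>: gloss</li>..."""
    ul = ET.Element("ul")
    term = None
    for child in ET.fromstring(list_xml):
        text = "".join(child.itertext()).strip()
        if child.tag == "label":
            term = text
        elif child.tag == "item":
            li = ET.SubElement(ul, "li")
            if term is None:
                li.text = text  # unlabelled item: plain list entry
            else:
                strong = ET.SubElement(li, "strong")
                strong.text = term
                strong.tail = ": " + text
                term = None
        # <head> is ignored here; a fuller version would emit a heading
    return ET.tostring(ul, encoding="unicode")


print(gloss_list_to_xhtml(
    '<list type="gloss"><label>score</label><item>twenty</item></list>'))
```

If the same entries had been tagged as <p>, no script could recover which part is the term.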

I have never understood the resistance to the notion that text blocks 
with indeterminable semantics should be identified as <div> rather than 
<p>. It's a simple change in mindset. When in doubt, use <div>. Look up 
the definition of the word 'paragraph' in a dictionary. If you don't 
think you could convince your English (or any other language for that 
matter) teacher that the block of text satisfies the definition, use 
<div> (or some other more appropriate element), not <p>.

This, I think, is a good example of the distinction between correctness 
and validity. I could mark up a phrase as "Four <term 
id="score">score</term> and <gloss target="score">seven</gloss> years 
ago ..." and it would be valid, although it would not be correct, as the 
word 'seven' is not really a gloss for the word 'score'.

There are times when it is valuable to know that a certain block of text 
is, in fact, a paragraph. Suppose, for example, that someone might want 
to create an annotation to accompany <title>The Kitáb-i-Aqdas</title>. 
He or she might want to preface some text with "if you look at the 
second paragraph following header 77 ..." If the user has a dead-tree (printed) edition, 
finding this passage is fairly easy: flip through the book for something 
that looks like a header and is numbered 77, and count the paragraphs 
that follow. If there are only a few paragraphs after header 77 and 
before header 78 this is quite easy. If you're looking for the 935th 
paragraph following header 77 it can quickly become tedious. Luckily for 
us, tedious is something that computers do very well. Unluckily for us, 
Bowerbird has not yet released his algorithm for determining whether a 
block of unstructured text is a paragraph. So for today, we must rely on 
the coders to correctly identify which blocks are paragraphs, and, just 
as importantly, which blocks are not paragraphs. If every indeterminate 
block of text is marked as a paragraph, then the value of the <p> tag is 
lost; it has just become a synonym for <div>, and is redundant. As the 
pointed man in the pointless forest said to Oblio, "A point in every 
direction is as good as no point at all."
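The tedious counting is easy to automate once paragraphs are marked reliably. Below is a minimal Python sketch. It assumes unnamespaced TEI where each section begins with a <head> whose text is the header number, and the function name is hypothetical. Any non-paragraph block mis-tagged as <p> would be counted too, which is exactly the risk described above:

```python
import xml.etree.ElementTree as ET


def nth_paragraph_after(tei_xml, head_text, n):
    """Text of the n-th <p> (1-based) after the <head> whose text is head_text.

    Counting stops at the next <head>, so only that section's paragraphs count.
    Returns None if the header or the paragraph does not exist.
    """
    found = False
    count = 0
    for elem in ET.fromstring(tei_xml).iter():  # document order
        if elem.tag == "head":
            if found:
                break  # reached the next header
            found = "".join(elem.itertext()).strip() == head_text
        elif elem.tag == "p" and found:
            count += 1
            if count == n:
                return "".join(elem.itertext()).strip()
    return None
```

A call such as nth_paragraph_after(text, "77", 935) does the counting that would be tedious by hand.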

So, if being conscientious about only identifying as a paragraph that 
text which really _is_ a paragraph adds value to the file (perhaps not 
to you, but to someone, and if you didn't want this file to be useful to 
someone else you wouldn't be doing it in the first place), and if it is 
just as easy to be discriminating about paragraphs as it is _not_ to be, 
why not do it?

> Another example is a text with foreign words interspersed throughout.  
> Often, those words would be printed in italics in the original book.  
> Now, the simplest markup in TEI would be to put <hi 
> rend="italics">around</hi> the word.  But you could also mark the word 
> with a <foreign lang="en">foreign</foreign> tag.  In the final render, 
> it would look exactly the same, but the second option provides more 
> specific metadata.  You could even go further by providing a translation 
> of the foreign word inside the attribute (the markup escapes me at the 
> moment).
>
> The markup that would cover what PG currently has would be what I 
> would call a "level one markup" and that is the minimum, obviously, 
> that a TEI could be marked to.  Level two would be given a little more 
> metadata, but nothing drastic.  Maybe marking certain words as foreign 
> instead of italics.  Marking a letter as such instead of just a block 
> of indented paragraphs.  etc. etc.
>
> Level three would be going the extra, extra mile.  It's the kind of 
> markup I don't expect to see, but is possible in TEI.


I can completely agree with this notion of markup levels, but it seems 
to me that the thing that should distinguish the levels is completeness, 
not correctness. Documents at every level should be correct, even if not 
complete. In your example, the use of the <hi> tag tells the user (or 
more accurately, his or her software agent) "this text was italicized in 
the original text, but I am unable or unwilling to tell you why." The 
markup is incomplete, but it is not incorrect. If you were to mark up a 
block quotation with the <div> tag you are telling the user agent "this 
text was set aside as a block in the original text, but I am unable or 
unwilling to tell you why." I can live with that. But if you mark up a 
block of text with the <p> tag you are telling the user agent "this 
block of text contains one or more complete sentences, and deals with a 
single thought or topic or quotes one speaker's continuous words." 
Marking up a definition term as a paragraph is as incorrect as marking 
up the word 'seven' as a gloss for the word 'score.'

Please don't let the reasonable need to tolerate incompleteness become 
an excuse for incorrectness.

> I expect most TEI documents we post will fall in level one or level two.
>
> Josh 


From gbnewby at pglaf.org  Wed Aug 24 12:08:41 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed Aug 24 12:08:43 2005
Subject: [gutvol-d] Re: prank someone is pulling
In-Reply-To: <20050824163539.CC2B74F629@ws6-5.us4.outblaze.com>
References: <20050824163539.CC2B74F629@ws6-5.us4.outblaze.com>
Message-ID: <20050824190841.GD21452@pglaf.org>

On Wed, Aug 24, 2005 at 11:35:39AM -0500, Joshua Hutchinson wrote:
> Ann, this is called a FAX bomb.  It is similar to an "e-mail bomb" where someone drops a bunch of e-mail garbage on someone to clog their e-mail account.  It is usually someone who wants to "get back" at someone else.
> 
> FAX bombs are especially nasty because then you are not only tying up their service, you are wasting tangible resources like paper and ink.
> 
> The reason our file is being used is probably pretty simple ... we have great big text files that are easily accessible and work perfectly for this kind of griefing.
> 
> It is doubtful anyone that has anything to do with PG had anything to do with this.
> 
> Josh

Agreed.

It's not our responsibility to help people with their faxes...
be polite, and firm, and tell them to seek local advice on
how to cancel incoming faxes.

Unfortunately, it's pretty easy to use our eBooks for this
type of thing...including for SPAM emails.
  -- Greg


> ----- Original Message -----
> From: Gutenberg9443@aol.com
> To: gutvol-d@lists.pglaf.org, gbnewby@pglaf.org
> Subject: [gutvol-d] Re: prank someone is pulling
> Date: Wed, 24 Aug 2005 12:14:26 EDT
> 
> > 
> > TO ALL:
> > 
> > Someone purporting to be from PG has faxed a book, in Finnish, the name of
> > which is "Fredrika Runeberg," to the state of New Jersey Surveying Office.
> > Whoever did it somehow managed to get around the requirement that the
> > sender's name and telephone number is to be on the first page of all faxes.
> > I have assured him that nobody in our organization did it. He wanted to fax
> > the first page to me but my fax is down right now, and I wanted him to mail
> > it as an attached file but he doesn't have a scanner. Therefore, he has lost
> > the 33 sheets of paper that were in his fax machine and is afraid to try to
> > reuse the fax machine because it will immediately try to go on printing. (I
> > told him to turn it off, unplug it, then replug it and turn it on, and he
> > would probably then be able to use the fax machine normally.)
> > 
> > He didn't give me his name other than "Jim." I have his telephone number but
> > will release it only to Greg.
> > 
> > I have asked him to snail mail me the first page and let me see what I can
> > find out.
> > 
> > If anybody on this ML is the culprit, cease and desist and notify me
> > personally that you did it. Then I will only bite your head off, spit it into
> > your face, and then turn it over to Greg.
> > 
> > If anybody on this ML is the culprit and does not admit it and gets caught,
> > that person's ass is grass and that person will be permanently barred from
> > this ML and everything else I can get him barred from. This conduct is
> > unconscionable.
> > 
> > Anne
> > 
> > Do you like to  breathe?
> > Then save the trees!
> > Begin a personal relationship
> > with an  ebook
> > TODAY!
From Bowerbird at aol.com  Wed Aug 24 12:35:55 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 24 12:36:12 2005
Subject: [gutvol-d] re: unluckily for us
Message-ID: <1d4.42c952fa.303e261b@aol.com>

lee said:
>     Unluckily for us, Bowerbird has not yet 
>    released his algorithm for determining whether 
>    a block of unstructured text is a paragraph.

oh gee, i'm sorry.   i thought i had.

in z.m.l., anything surrounded by two or more blank lines
is a paragraph.   for counting purposes, this works just fine,
even if your english teacher might not "approve" of it...

a z.m.l. viewer-app will display the paragraph numbers,
so there's absolutely no ambiguity about what they are...

as long as the programs you are using count the same way,
there's no reason to get strung out in definitional sand-traps.
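Blank-line paragraph numbering can be sketched in Python. The sketch assumes our reading of the rule, where a block separated from its neighbours by blank lines counts as one paragraph. z.m.l.'s handling of headers and other special blocks is not modelled, and the function name is hypothetical. Every program that applies the same rule gets the same numbers:

```python
import re


def number_paragraphs(text):
    """Number the blocks of text separated by one or more blank lines."""
    # A "blank" line may hold stray spaces or tabs; \s* also swallows extra blank lines.
    blocks = (b.strip() for b in re.split(r"\n\s*\n", text))
    return list(enumerate((b for b in blocks if b), start=1))
```

Note that this rule counts any block as a paragraph, including a glossary entry or a verse stanza. That is the definitional question raised earlier in the thread.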

any more questions?

-bowerbird
From joshua at hutchinson.net  Wed Aug 24 12:39:04 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Aug 24 12:39:12 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
Message-ID: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com>


----- Original Message -----
From: "Lee Passey" <lee@novomail.net>

> 
> Joshua Hutchinson <joshua@hutchinson.net> wrote:
> 
> When in doubt, use <div>.

I've learned to fear the <div> container.  Let me show you why using a <div> container for "non-specific" blocks of text won't work.

<div>
<head>Level 1</head>
<p>Paragraph 1.</p>

  <div>Block o' text.</div>

<p> Paragraph 2.</p>

</div>

***

The above will not validate.  Once you go one deeper in a nest, you cannot come back up just one level.  You have to close the whole nesting.
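The constraint described here belongs to the TEI content model for <div>, not to XML well-formedness. The fragment above parses as XML, and only validation against the TEI DTD rejects it. The check below is a hypothetical Python sketch under a simplified reading of that rule: once a nested <div> begins, only further <div>s may follow. It ignores the extra allowances in the real content model, such as <divGen> or closing material like <trailer>:

```python
import xml.etree.ElementTree as ET


def misplaced_after_subdiv(div_xml):
    """List the tags of non-<div> children that follow a nested <div>.

    Simplified rule: once a subdivision starts, only subdivisions may follow.
    (The real TEI content model has further allowances this ignores.)
    """
    seen_subdiv = False
    misplaced = []
    for child in ET.fromstring(div_xml):
        if child.tag == "div":
            seen_subdiv = True
        elif seen_subdiv:
            misplaced.append(child.tag)
    return misplaced


fragment = ("<div><head>Level 1</head><p>Paragraph 1.</p>"
            "<div>Block o' text.</div><p> Paragraph 2.</p></div>")
print(misplaced_after_subdiv(fragment))  # the trailing <p> is reported
```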

The above would work if changed to:

<div>
<head>Level 1</head>
<p>Paragraph 1.</p>

  <div>Block o' text.</div>
</div>

<div>

<p> Paragraph 2.</p>

</div>

***

The problem is that now you have two distinct blocks of text that should really be treated as one block.

***

NOTE: None of this is to dispute that the more correct way to handle the example you gave would be to mark it as something other than <p> chunks.  You are right there.  It's just that your counterexample using <div> wouldn't work at all.

Josh
From hacker at gnu-designs.com  Wed Aug 24 12:49:24 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Aug 24 12:50:35 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
In-Reply-To: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com>
References: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0508241546230.16202@aphrodite.gnu-designs.com>


> I've learned to fear the <div> container.  Let me show you why 
> using a <div> container for "non-specific" blocks of text won't 
> work.

> <div>
> <head>Level 1</head>
> <p>Paragraph 1.</p>
>
>  <div>Block o' text.</div>
>
> <p> Paragraph 2.</p>
>
> </div>

 	Improperly nested tags will never validate. You can't have a 
bare string inside the <head> tag like that, and <head> isn't a child 
of <div>, so that won't work either. After correcting those errors, it 
validates fine.

> The above will not validate.  Once you go one deeper in a nest, you 
> cannot come back up just one level.  You have to close the whole 
> nesting.

 	Nope, this is completely untrue.

> The above would work if changed to:
>
> <div>
> <head>Level 1</head>
> <p>Paragraph 1.</p>
>
>  <div>Block o' text.</div>
> </div>
>
> <div>
>
> <p> Paragraph 2.</p>
>
> </div>

 	You're still producing invalid markup. Try something like 
this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

    <head><title>Level 1</title></head>
    <body>
       <p>Level 1</p>
       <div>

          <p>Paragraph 1.</p>

          <div>Block o' text.</div>

          <p> Paragraph 2.</p>

       </div>
    </body>
</html>


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From Gutenberg9443 at aol.com  Wed Aug 24 12:51:12 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Aug 24 12:51:28 2005
Subject: [gutvol-d] Re: prank someone is pulling
Message-ID: <1df.42c8f7e2.303e29b0@aol.com>

 
In a message dated 8/24/2005 1:09:05 PM Mountain Daylight Time,  
gbnewby@pglaf.org writes:

Unfortunately, it's pretty easy to use our eBooks for this
type of thing...including for SPAM emails.


I agree. I told him that anybody in the world, anywhere in the world, can
download our files and then use them in this way. I suspect he has annoyed
someone who is using this means of revenge. But he was just about frothing at
the mouth when he called, because he was convinced that it WAS somebody in our
organization. It took a while to calm him down enough to listen to reason.
 

Anne

Do you like to  breathe?
Then save the trees! 
Begin a personal relationship
with an  ebook 
TODAY!
From Gutenberg9443 at aol.com  Wed Aug 24 12:49:35 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Wed Aug 24 12:54:55 2005
Subject: [gutvol-d] Re: prank someone is pulling
Message-ID: <90.649164cc.303e294f@aol.com>

 
In a message dated 8/24/2005 10:36:42 AM Mountain Daylight Time,  
joshua@hutchinson.net writes:

It is doubtful anyone that has anything to do with PG had anything to do
with this.


I agree. I told him that anybody in the world, anywhere in the world, can
download our files and then use them in this way. I suspect he has annoyed
someone who is using this means of revenge.
 

Anne

Do you like to  breathe?
Then save the trees! 
Begin a personal relationship
with an  ebook 
TODAY!
From lee at novomail.net  Wed Aug 24 12:57:12 2005
From: lee at novomail.net (Lee Passey)
Date: Wed Aug 24 12:57:27 2005
Subject: [gutvol-d] on viewing the .pgtei file directly (gutvol-d Digest, 
	Vol 13, Issue 23)
In-Reply-To: <20050824190003.BA8CD8C8E8@pglaf.org>
References: <20050824190003.BA8CD8C8E8@pglaf.org>
Message-ID: <430CD118.4060500@novomail.net>

Bowerbird@aol.com wrote:

> p.s.  when i try to view 16523-x.xml directly in firefox, it says:
> "this xml file does not appear to have any style information
> associated with it.  the document tree is shown below." and
> then it shows me the document tree.  how can i fix this problem?


Download the file 16523-x.zip, and extract the files. Hopefully, the 
file "persistent.css" is still part of the package.

Edit the .xml file with a simple text editor (beware Microsoft tools!) 
to add the line:

<?xml-stylesheet href="persistent.css" type="text/css"?>

immediately after the line:

<?xml version="1.0" encoding="utf-8" ?>

Save the file, and hopefully your editor won't have screwed up the utf-8 
encoding.
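If you would rather not hand-edit the file, the same edit can be done with the minimal Python sketch below (function names hypothetical). It reads and writes UTF-8 explicitly so the encoding is not disturbed, and it defaults to 'persistent.css' as named above:

```python
def add_stylesheet_pi(xml_text, href="persistent.css"):
    """Insert an xml-stylesheet processing instruction after the XML declaration."""
    pi = f'<?xml-stylesheet href="{href}" type="text/css"?>'
    if pi in xml_text:
        return xml_text  # already linked; do nothing
    if xml_text.startswith("<?xml "):  # the declaration, not another PI
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text  # no declaration: the PI can simply go first


def link_stylesheet(path, href="persistent.css"):
    # utf-8-sig also strips a byte-order mark if an editor added one
    with open(path, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(add_stylesheet_pi(text, href))
```

Run link_stylesheet("16523-x.xml") in the directory where the zip was extracted. Running it a second time changes nothing.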

Now view the file from your local file system with Firefox. You should 
see just a bunch of run-on text, because the styles in 'persistent.css' 
are designed to be used with the XSL transformation to XHTML, and so 
none of them apply. But the bare document tree will no longer be shown.

You can experiment by adding new styles to 'persistent.css' (don't 
forget to save the file and reload your browser after adding rules). For 
example, add "p { display:block; text-indent: 3em }" and all of a sudden 
you will get distinct, indented paragraphs (and some non-paragraphs will 
also become distinct and indented). Add "teiHeader { display: none }" 
and all the Gutenberg legal cruft, together with the metadata which is 
typically only of interest to archivers, will disappear (it's still 
there, it's just not "in your face" anymore). Add "head { display:block; 
font-size: x-large; text-align: center }" and the headers will pop out. 
Add "hi { font-style: italic }" and highlighted text will become 
italicized. Use "hi { background: yellow }" instead and the highlighted 
text will look like you have run over it with a yellow highlighter. Or 
combine them both: "hi { font-style: italic; background: yellow }".

The TEI markup used in this file is fairly simple, so you could probably 
get a pretty good looking file by using no more than a dozen or so CSS 
rules. Mr. Perathoner's TEI version of Alice in Wonderland 
(http://www.gutenberg.org/tei/marcello/0.3/examples/alice/) looks like 
it is much more complex, and would be funner to play with. I really like 
_Alice_ as an experimental text because it has quite a few typographical 
oddities which make it a good test case.

Unfortunately, I haven't been able to figure out how to tell Firefox to 
use a user-specified CSS file (I've got version 1.0.4). If anyone can 
enlighten me on this score, I would be most grateful. Mr. Noring tells 
me that I can do it with Opera, but I've yet to try it.

From joshua at hutchinson.net  Wed Aug 24 13:05:44 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Aug 24 13:05:53 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
Message-ID: <20050824200544.A415A2F90F@ws6-3.us4.outblaze.com>

Dave, we're talking about TEI/XML.  Nothing of what you said applies here (but you would be absolutely right on a HTML document, as I understand it). ;)

Josh


----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>
To: "Project Gutenberg Volunteer Discussion" <gutvol-d@lists.pglaf.org>
Subject: Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d	Digest, Vol
Date: Wed, 24 Aug 2005 15:49:24 -0400 (EDT)

> 
> 
> > I've learned to fear the <div> container.  Let me show you why using a 
> > <div> container for "non-specific" blocks of text won't work.
> 
> > <div>
> > <head>Level 1</head>
> > <p>Paragraph 1.</p>
> >
> >  <div>Block o' text.</div>
> >
> > <p> Paragraph 2.</p>
> >
> > </div>
> 
> 	Improperly nested tags will never validate. You can't have a bare string 
> inside the <head> tag like that, and <head> isn't a child of <div>, so that 
> won't work either. After correcting those errors, it validates fine.
> 
> > The above will not validate.  Once you go one deeper in a nest, you cannot 
> > come back up just one level.  You have to close the whole nesting.
> 
> 	Nope, this is completely untrue.
> 
> > The above would work if changed to:
> >
> > <div>
> > <head>Level 1</head>
> > <p>Paragraph 1.</p>
> >
> >  <div>Block o' text.</div>
> > </div>
> >
> > <div>
> >
> > <p> Paragraph 2.</p>
> >
> > </div>
> 
> 	You're still producing invalid markup. Try something like this:
> 
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> 	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
> 
>     <head><title>Level 1</title></head>
>     <body>
>        <p>Level 1</p>
>        <div>
> 
>           <p>Paragraph 1.</p>
> 
>           <div>Block o' text.</div>
> 
>           <p> Paragraph 2.</p>
> 
>        </div>
>     </body>
> </html>
> 
> 
> David A. Desrosiers
> desrod@gnu-designs.com
> http://gnu-designs.com

From hacker at gnu-designs.com  Wed Aug 24 13:08:00 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Aug 24 13:08:36 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
In-Reply-To: <20050824200544.A415A2F90F@ws6-3.us4.outblaze.com>
References: <20050824200544.A415A2F90F@ws6-3.us4.outblaze.com>
Message-ID: <Pine.LNX.4.61.0508241607100.28921@aphrodite.gnu-designs.com>


> Dave, we're talking about TEI/XML.  Nothing of what you said applies 
> here (but you would be absolutely right on a HTML document, as I 
> understand it). ;)

 	It's not even well-formed XML, and fails XML validation (as the 
doctype shown below shows). Perhaps the TEI/XML needs to start 
conforming.


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From jeroen.mailinglist at bohol.ph  Wed Aug 24 13:15:57 2005
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Wed Aug 24 13:15:24 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <1218078621.20050823191205@noring.name>
References: <20050823160216.6E6438C8E8@pglaf.org>	<430BB1B7.1080006@novomail.net>
	<1218078621.20050823191205@noring.name>
Message-ID: <430CD57D.4000901@bohol.ph>

Jon Noring wrote:

>
>Well, my investigation into PG-TEI and TEI-P4X (thank heavens for TEI
>Pizza Chef to flatten the otherwise unreadable TEI-P4 DTD!) shows it is
>also a real possibility. But I believe, subject to change as I learn
>more from the experts here and the TEI-L folk, that in order to make
>PGTEI+CSS2 to render in web standards browsers (limited now to Firefox
>and maybe Opera 8) we also have to appropriately constrain/subset the
>PG-TEI vocabulary (allowed elements/attributes/attr-values) and
>content models (what results may be somewhat like TEI-Lite, but not
>exactly the same -- we can certainly add our own tags as needs
>require.) We may also have to give up a couple things.
>  
>

You can render XML, using XSLT + CSS in Firefox and IE, for a small 
demo, look at
http://www.gutenberg.org/files/11335/11335-x/11335-x.xml. This sample 
still has a few rough edges,
but can be made more beautiful. The XSLT is simply pulled in by the browser.

For any TEI file to work in an actual environment, you need a 
set of working instructions and
conventions, such as what to put in rend attributes and how to 
interpret certain things. TEI is mainly
concerned with semantics, but to render it, you also need to concern 
yourself, at least minimally,
with looks.

Just some examples: I consider the foreign tag to imply no rendering 
information, only a language change. I
will use <hi> with a lang (and rend) attribute to indicate a rendering 
change as well as a language change. If somebody
applies italics to all foreign tags, it won't be as I intended.

Similarly, I consider quotation marks part of the text, and will leave 
them, even when I use <q> tags, and never emit quotation
marks when rendering TEI. Another user may choose differently.
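Conventions like these can be written straight into a converter. A minimal Python sketch of a hypothetical TEI-to-HTML step that follows the three conventions above: `<foreign>` changes only the language, `<hi>` changes language and rendering (passing `rend` through as a CSS class is an assumption here, not a PG-TEI rule), and `<q>` adds no quotation marks because they are already in the text. The function name and sample fragment are made up.

```python
import xml.etree.ElementTree as ET
from html import escape

def render(elem):
    """Render a small TEI fragment to HTML under the conventions described above."""
    # Text before the first child is .text; text after a child's end tag is its .tail.
    inner = escape(elem.text or "", quote=False)
    for child in elem:
        inner += render(child) + escape(child.tail or "", quote=False)
    lang = elem.get("lang")
    lang_attr = f' lang="{escape(lang)}"' if lang else ""
    if elem.tag == "foreign":
        # Language change only: no italics or other styling.
        return f"<span{lang_attr}>{inner}</span>"
    if elem.tag == "hi":
        # Rendering change, optionally combined with a language change.
        rend = escape(elem.get("rend", ""))
        return f'<span class="{rend}"{lang_attr}>{inner}</span>'
    if elem.tag == "q":
        # Quotation marks are already part of the text, so none are added.
        return inner
    return f"<{elem.tag}>{inner}</{elem.tag}>"

fragment = ET.fromstring(
    '<p><q>"Ready?"</q> she asked, <foreign lang="fr">en passant</foreign>, '
    'and <hi rend="italic" lang="la">ad hoc</hi>.</p>'
)
print(render(fragment))
```

Someone who prefers italics for every `<foreign>` would change a single branch, which is why such choices need to be written down somewhere.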

Some have argued (with valid reasons) that the entire idea of TEI markup 
is broken, and have proposed systems
in which the mark-up is separated from the text (stream of characters), 
in such a way that multiple, parallel systems of
mark-up can exist. Think of a separate (part of a) file, saying 
characters 21 to 34 are italics, and so on. This may sound
odd, but it is the way the old Macintosh wordprocessor MacWrite worked.
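The idea can be sketched in Python, assuming a hypothetical format in which each markup layer is a list of (start, end, tag) character ranges over an unchanged text string. In this example the italics cross a clause boundary, so the two layers cannot share one XML hierarchy without splitting an element, yet each layer renders fine on its own. Tag names and offsets are illustrative only.

```python
# Stand-off markup: the text is never modified; every markup layer is a
# separate list of (start, end, tag) ranges. Offsets are 0-based and
# end-exclusive, like Python slices.
TEXT = "It was the best of times, it was the worst of times."

LAYERS = {
    "typography": [(19, 32, "i")],                 # italicizes "times, it was"
    "clauses": [(0, 24, "seg"), (26, 51, "seg")],  # two clauses
}

def render_layer(text, ranges):
    """Insert inline tags for one layer; ranges within a layer must not overlap."""
    out, pos = [], 0
    for start, end, tag in sorted(ranges):
        if start < pos:
            raise ValueError("overlapping ranges within a single layer")
        out.append(text[pos:start])
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)

for name, ranges in LAYERS.items():
    print(name, render_layer(TEXT, ranges))
```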

Jeroen.
From jon_niehof at yahoo.com  Wed Aug 24 13:09:22 2005
From: jon_niehof at yahoo.com (Jon Niehof)
Date: Wed Aug 24 13:16:14 2005
Subject: [gutvol-d] Re: prank someone is pulling
In-Reply-To: <1df.42c8f7e2.303e29b0@aol.com>
Message-ID: <20050824200922.11755.qmail@web32912.mail.mud.yahoo.com>

--- Gutenberg9443@aol.com wrote:
> But he was just about frothing at the mouth when he called,
> because he was convinced that it WAS somebody in our  
> organization. It took a while to calm him down enough to
> listen to reason.

Hmm. If someone faxed him Julius Caesar, I wonder* if he'd be
mad at ol' Will. Or Caesar.

It annoys me no end that the good names of volunteers are
besmirched in this fashion (or via google spamming, as posted
earlier).

(*but not enough to try it or recommend someone else try)


From jeroen.mailinglist at bohol.ph  Wed Aug 24 12:58:39 2005
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Wed Aug 24 13:21:00 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
In-Reply-To: <Pine.LNX.4.61.0508241546230.16202@aphrodite.gnu-designs.com>
References: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com>
	<Pine.LNX.4.61.0508241546230.16202@aphrodite.gnu-designs.com>
Message-ID: <430CD16F.5050906@bohol.ph>


The people were talking about TEI here, not HTML, as in your example... 
in TEI <div> means a division of a text, not some ad-hoc container as in 
HTML.

Jeroen.

David A. Desrosiers wrote:

>
>     You're still producing invalid markup. Try something like this:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
>
>
>

From hacker at gnu-designs.com  Wed Aug 24 13:28:26 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Wed Aug 24 13:29:34 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <430CD57D.4000901@bohol.ph>
References: <20050823160216.6E6438C8E8@pglaf.org>
	<430BB1B7.1080006@novomail.net>
	<1218078621.20050823191205@noring.name> <430CD57D.4000901@bohol.ph>
Message-ID: <Pine.LNX.4.61.0508241626020.506@aphrodite.gnu-designs.com>


> You can render XML, using XSLT + CSS in Firefox and IE, for a small 
> demo, look at 
> http://www.gutenberg.org/files/11335/11335-x/11335-x.xml. This 
> sample still has a few rough edges, but can be made more beautiful. 
> The XSLT is simply pulled in by the browser.

 	I've been doing XML styling for years... There's nothing 
really magical about it. You can see that here:

 	http://plkr.org/rss.pl

 	There's plenty of Gutenberg XML examples here as well:

 	http://gutenberg.hwg.org/checkdoc2.html


David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
From joshua at hutchinson.net  Wed Aug 24 13:31:32 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Aug 24 13:31:41 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
Message-ID: <20050824203132.70A95EE1A5@ws6-1.us4.outblaze.com>


----- Original Message -----
From: "David A. Desrosiers" <hacker@gnu-designs.com>
> 
> > Dave, we're talking about TEI/XML.  Nothing of what you said applies here 
> > (but you would be absolutely right on a HTML document, as I understand it). 
> > ;)
> 
> 	It's not even well-formed XML, and fails XML validation (as the doctype shown 
> below shows). Perhaps the TEI/XML needs to start conforming.
> 
> 

My example is perfectly legal XML.  Follow the TEI DTD.

Your example uses:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

That is the XHTML 1.0 Strict DTD (xhtml1-strict.dtd): a completely different DTD, and hence a completely different set of markup rules.  For instance, <head> means VERY different things and is used in VERY different ways in a TEI document and in an XHTML document.

Josh
From lee at novomail.net  Wed Aug 24 13:49:38 2005
From: lee at novomail.net (Lee Passey)
Date: Wed Aug 24 13:49:52 2005
Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 24
In-Reply-To: <20050824195127.2F8FC8C905@pglaf.org>
References: <20050824195127.2F8FC8C905@pglaf.org>
Message-ID: <430CDD62.8090404@novomail.net>

"David A. Desrosiers" <hacker@gnu-designs.com> wrote:

>> I've learned to fear the <div> container.  Let me show you why using 
>> a <div> container for "non-specific" blocks of text won't work.
>
>> <div>
>> <head>Level 1</head>
>> <p>Paragraph 1.</p>
>>  <div>Block o' text.</div>
>> <p> Paragraph 2.</p>
>> </div>
>
>
>     Improperly nested tags will never validate. You can't have a bare 
> string inside the <head> tag like that, and <head> isn't a child of 
> <div>, so that won't work either. After correcting those errors, it 
> validates fine.
>
>> The above will not validate.  Once you go one deeper in a nest, you 
>> cannot come back up just one level.  You have to close the whole 
>> nesting.
>
>
>     Nope, this is completely untrue.
>
>> The above would work if changed to:
>>
>> <div>
>> <head>Level 1</head>
>> <p>Paragraph 1.</p>
>>  <div>Block o' text.</div>
>> </div>
>> <div>
>> <p> Paragraph 2.</p>
>> </div>
>
>
>     You're still producing invalid markup. Try something like this:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
>
>    <head><title>Level 1</title></head>
>    <body>
>       <p>Level 1</p>
>       <div>
>          <p>Paragraph 1.</p>
>          <div>Block o' text.</div>
>          <p> Paragraph 2.</p>
>       </div>
>    </body>
> </html>

I fear that Mr. Desrosiers has made the classic error of confusing 
varieties of fruits, which is third only to "never become involved in a 
land war in Asia" and "never bet with a Sicilian when death is on the 
line." Mr. Hutchinson's code snippet was encoded using the TEI 
vocabulary of XML, not the XHTML vocabulary. In TEI the <head> tag is 
analogous to the HTML <h1> tag, and the HTML <head> tag is analogous to 
the TEI <teiHeader> tag. Apples and oranges.

On the other hand, I don't see how Mr. Hutchinson's second example could 
validate if the first does not, particularly given the fact that DTDs 
are not structured in such a way as to permit a validator to make that kind 
of a judgment ("if a <div> contains a <div> it must be the last element 
of the first <div>" or "if a <div> contains a <div> it may be preceded 
by a <p>, but not followed by one"). I'm not that great at deciphering 
DTDs, but I don't see anything in http://www.tei-c.org/P4X/DS.html which 
would cause me to believe that example 1 is not valid.

In this particular case, I suspect a bug in the validator program. I 
mean, writing validators is hard, and I am aware of at least one bug in 
the W3C's online HTML validator.

Supposedly, Xerces is a validating parser. Maybe I'll see if I can find 
the time to run the snippet through Xerces and see if (and where) it breaks.

From marcello at perathoner.de  Wed Aug 24 14:05:30 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Aug 24 14:05:47 2005
Subject: [gutvol-d] re: unluckily for us
In-Reply-To: <1d4.42c952fa.303e261b@aol.com>
References: <1d4.42c952fa.303e261b@aol.com>
Message-ID: <430CE11A.5070301@perathoner.de>

Bowerbird@aol.com wrote:

> in z.m.l., anything surrounded by two or more blank lines
> is a paragraph.

Is a tennis court a paragraph?


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Wed Aug 24 14:10:19 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Aug 24 14:10:30 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol
In-Reply-To: <Pine.LNX.4.61.0508241546230.16202@aphrodite.gnu-designs.com>
References: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com>
	<Pine.LNX.4.61.0508241546230.16202@aphrodite.gnu-designs.com>
Message-ID: <430CE23B.9080003@perathoner.de>

David A. Desrosiers wrote:

>     Improperly nested tags will never validate. You can't have a bare 
> string inside the <head> tag like that, and <head> isn't a child of 
> <div>, so that won't work either. After correcting those errors, it 
> validates fine.

You are in the wrong picture!

We are talking about TEI. You are talking about HTML.

What Joshua says is true. Go to the TEI-L archives and search for "div 
tessellation problem".


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Wed Aug 24 14:17:42 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 24 14:17:56 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
Message-ID: <e3.1a59842a.303e3df6@aol.com>

lee said:
>   Edit the .xml file with a simple text editor 
>    (beware Microsoft tools!) to add the line:
>    <?xml-stylesheet href="persistent.css" type="text/css"?>
>   You can experiment by adding new styles to 'persistent.css' 
>    (don't forget to save the file and reload your browser after 
>    adding rules). For example, add 
>    "p { display:block; text-indent: 3em }" 
>    and all of a sudden you will get distinct, indented paragraphs 
>    (and some non-paragraphs will also become distinct and 
>    indented). Add "teiHeader { display: none }" and all the 
>    Gutenberg legal cruft, together with the metadata which is
>    typically only of interest to archivers, will disappear 
>    (it's still there, it's just not "in your face" anymore).

that is, in other words, if i tell it to use a stylesheet,
and then go and create that stylesheet, it will work.          :+)

i knew that anyway, but i guess it's good to be reminded.       ;+)
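The manual edit quoted above can also be scripted. A minimal Python sketch, assuming a UTF-8 file and the `persistent.css` name from the quoted instructions; `add_stylesheet_pi` and `link_file` are hypothetical names. The processing instruction must sit in the prolog, after the XML declaration when one is present, because the declaration has to be the very first thing in the file. Reading with `utf-8-sig` also tolerates the byte-order mark some Windows editors add.

```python
import re

STYLESHEET_PI = '<?xml-stylesheet href="persistent.css" type="text/css"?>'

# An XML declaration such as <?xml version="1.0"?> plus any whitespace after it.
# Requiring whitespace after "xml" keeps it from matching <?xml-stylesheet ...?>.
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>\s*")

def add_stylesheet_pi(xml_text, pi=STYLESHEET_PI):
    """Return xml_text with the PI placed after the XML declaration, or at the start."""
    if pi in xml_text:
        return xml_text  # already linked; leave the document unchanged
    match = XML_DECL.match(xml_text)
    insert_at = match.end() if match else 0
    return xml_text[:insert_at] + pi + "\n" + xml_text[insert_at:]

def link_file(path):
    """Rewrite a file in place (assumes UTF-8 content)."""
    with open(path, encoding="utf-8-sig") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_stylesheet_pi(text))

src = '<?xml version="1.0" encoding="UTF-8"?>\n<TEI.2/>\n'
print(add_stylesheet_pi(src))
```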

***

jeroen said:
>    You can render XML, using XSLT + CSS in Firefox and IE, 
>    for a small demo, look at
>    http://www.gutenberg.org/files/11335/11335-x/11335-x.xml.

yes, i should have mentioned jeroen's files work in firefox...
(not in safari.   but in firefox.)


>    Some have argued (with valid reasons) that 
>    the entire idea of TEI markup is broken, and 
>    have proposed systems in which the mark-up is 
>    separated from the text (stream of characters),
>    in such a way that multiple, parallel systems of 
>    mark-up can exist. Think of a separate (part of a) file, 
>    saying characters 21 to 34 are italics, and so on. 
>    This may sound odd, but it is the way the old 
>    Macintosh wordprocessor MacWrite worked.

actually, that's the way the underlying _editfield_
of the (classic) mac operating system is structured.

-bowerbird
From marcello at perathoner.de  Wed Aug 24 14:30:02 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Aug 24 14:30:14 2005
Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 24
In-Reply-To: <430CDD62.8090404@novomail.net>
References: <20050824195127.2F8FC8C905@pglaf.org>
	<430CDD62.8090404@novomail.net>
Message-ID: <430CE6DA.5050605@perathoner.de>

Lee Passey wrote:

> On the other hand, I don't see how Mr. Hutchinson's second example could 
> validate if the first does not, particularly given the fact that DTD's 
> are not structured in such a way to permit a validator to make that kind 
> of a judgment ("if a <div> contains a <div> it must be the last element 
> of the first <div>" or "if a <div> contains a <div> it may be preceded 
> by a <p>, but not followed by one").

This simple declaration does exactly that:

   <!ELEMENT div (p*, div*)>

"A div may contain zero or more p followed by zero or more div."


> In this particular case, I suspect a bug in the validator program. I 
> mean, writing validators is hard, and I am aware of at least one bug in 
> the W3C's online HTML validator.

No bug. The TEI DTD is broken as designed.
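A content model such as `(p*, div*)` is, in effect, a regular expression over the names of an element's children, which is why a DTD can forbid a `<p>` after a nested `<div>`. The Python sketch below checks this with the standard `re` and `xml.etree.ElementTree` modules. The model it uses is an assumption: the `(p*, div*)` declaration from this thread with an optional leading `<head>` added so that Joshua's two snippets can be tested. It is not TEI P4's actual declaration for `<div>`, and it is no substitute for a real validator such as xmllint.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model, in DTD notation: (head?, p*, div*).
# Each child name is followed by a space so names cannot run together.
DIV_MODEL = re.compile(r"(?:head )?(?:p )*(?:div )*")

def invalid_divs(xml_text):
    """Return the child sequence of every <div> that does not fit DIV_MODEL."""
    root = ET.fromstring(xml_text)
    failures = []
    for div in root.iter("div"):  # iter() includes the root itself if it is a <div>
        names = "".join(child.tag + " " for child in div)
        if DIV_MODEL.fullmatch(names) is None:
            failures.append(names.strip())
    return failures

# Joshua's first snippet: a <p> follows the nested <div>.
example_1 = """<div><head>Level 1</head><p>Paragraph 1.</p>
<div>Block o' text.</div><p> Paragraph 2.</p></div>"""

# His second snippet, wrapped in <body> so it has a single root element.
example_2 = """<body><div><head>Level 1</head><p>Paragraph 1.</p>
<div>Block o' text.</div></div><div><p> Paragraph 2.</p></div></body>"""

print(invalid_divs(example_1))  # ['head p div p']
print(invalid_divs(example_2))  # []
```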


> Supposedly, Xerces is a validating parser. Maybe I'll see if I can find 
> the time to run the snippet through Xerces and see if (and where) it 
> breaks.

Get libxml2 from xmlsoft.org and use xmllint.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Wed Aug 24 14:38:47 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 24 14:38:59 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <e3.1a59842a.303e3df6@aol.com>
References: <e3.1a59842a.303e3df6@aol.com>
Message-ID: <908745031.20050824153847@noring.name>

Bowerbird wrote:
>  jeroen said:

>> Some have argued (with valid reasons) that the entire idea of TEI
>> markup is broken, and have proposed systems in which the mark-up is
>> separated from the text (stream of characters), in such a way that
>> multiple, parallel systems of mark-up can exist. Think of a separate
>> (part of a) file, saying characters 21 to 34 are italics, and so on.
>> This may sound odd, but it is the way the old Macintosh wordprocessor
>> MacWrite worked.

> actually, that's the way the underlying _editfield_
> of the (classic) mac operating system is structured.

I was told by someone (who I think is in the know) that the idea of
separating markup from content (by having layers) was first proposed
years ago by Ted Nelson of "Project Xanadu" fame.

I recall asking Dr. Stephen DeRose at Brown University (one of the
world's leading electronic document experts) about Ted Nelson's
proposal and how it compares with SGML/XML markup. Dr. DeRose's
reply was essentially that layering has some obvious advantages (i.e.,
easier to represent non-hierarchical structures), but that there were
a lot of real world disadvantages as well. In the early days, before
SGML, the researchers were exploring all kinds of avenues, and nearly
all of them moved in the direction of direct markup rather than Ted
Nelson's layering. Of course, one wonders if the dynamics have changed
enough that revisiting the issue would yield a different result. Can't
answer that, but other than being able to non-hierarchically "markup"
documents with layering, I do not see any compelling advantages --
there'd have to be some whole new killer application which requires
such layering to work properly, and I've not seen such an application
arise in the last few years.

(It is possible in XML to do some non-hierarchical markup using empty
"milemarkers" with ID/IDREF pairs. But one would have to build a
special application to read such documents -- that's no different than
building an application to process the "layer" approach. An example of
non-hierarchical documents is the modern Bible, where verses can cross
sentence and even paragraph boundaries. So one has the choice in
SGML/XML of marking it up by chapter/paragraphs, and put in verse
"milemarkers", or the opposite. Most would agree that one applies
hierarchical markup to document structure (paragraphs), and then add
milemarkers to locate the start of a new verse.)
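The verse case above can be sketched in Python. This sketch assumes TEI-style empty `<milestone unit="verse" n="..."/>` elements rather than ID/IDREF pairs, and the sample text is made up. It only shows that a reader has to walk the whole tree in document order to collect a verse that crosses a paragraph boundary; `events` and `verses` are hypothetical names.

```python
import xml.etree.ElementTree as ET

DOC = """<body>
<p><milestone unit="verse" n="1"/>Verse one is short. <milestone unit="verse" n="2"/>Verse two begins here</p>
<p>and ends in the next paragraph. <milestone unit="verse" n="3"/>Verse three.</p>
</body>"""

def events(elem):
    """Yield ('start', element) and ('text', string) events in document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:  # text after a child's end tag belongs to the parent
            yield ("text", child.tail)

def verses(xml_text):
    """Collect the text after each verse milestone, across <p> boundaries."""
    collected, current = {}, None
    for kind, value in events(ET.fromstring(xml_text)):
        if kind == "start" and value.tag == "milestone" and value.get("unit") == "verse":
            current = value.get("n")
            collected[current] = []
        elif kind == "text" and current is not None:
            collected[current].append(value)
    # Collapse the whitespace left over from line breaks between paragraphs.
    return {n: " ".join("".join(parts).split()) for n, parts in collected.items()}

for n, text in verses(DOC).items():
    print(n, text)
```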

Jon
  

From jon at noring.name  Wed Aug 24 15:05:05 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 24 15:05:18 2005
Subject: [gutvol-d] on viewing the .pgtei file directly
In-Reply-To: <1c6.2f5def07.303e19b8@aol.com>
References: <1c6.2f5def07.303e19b8@aol.com>
Message-ID: <1859002177.20050824160505@noring.name>

Bowerbird wrote:

> why go through the pain of conversion if you don't have to?
> and the .pgtei file is the one with all the information in it, not?
> might as well view that, rather than some pale conversion...
>
> but let's get real here for a minute, ok?
>
> if the only people who can view the .pgtei file directly
> are the few who happen to be using a specific browser,
> there's no need to put a lot of resources in that direction.

Browsers are slowly moving towards better CSS2 and even CSS3 support
(this is a major component of what is called 'web standards'). So
things are not fixed in the browser arena. Overall Firefox and Opera 8
have the best web standards support, but they're not yet 100% (note
that Haakon Wium Lie, the CTO of Opera, is one of the principal
players in W3C's CSS development.) IE6 is way behind. It is unknown
how much better IE7 will be. It doesn't really matter -- Firefox and
Opera are plowing ahead, and continue to gain market share across
platforms.


> no sir.  what lee is _really_ talking about is "openreader",
> which he has begun programming.  (you _have_ begun,
> haven't you, lee?  because there's no time like the present.)

Lee is the chair of the OpenReader Development Working Group, which is
now working on "Orca", the name we've given to the OpenReader "user
agent". I'm not sure where Lee and the WG are at present, although as
you are probably aware there's not been much public activity this
Summer on the WG list. Summer is usually a slow time in standards and
development work. Good thing the principals of OpenReader don't live
in Norway. In Norway, everything shuts down for two to three months in
the summer, and understandably so. <smile/>

Btw, David Teller in France is working on an OEBPS "browser", which
could also be used to render OpenReader publications. So there are
parallel efforts, which is good!


> because, you see, a specialized e-book viewer-program
> (like openreader) can deliver an e-book experience that
> _far_surpasses_ the one that an end-user gets in a browser.

That's the plan for OpenReader. Refer to the interim OR site at:

   http://www.openreader.org/

Also, refer to the page where we discuss the freedom that OR gives us
with respect to rendering. We are no longer constrained by the web
browser/HTML paradigm:

   http://www.openreader.org/browsers.html

A lot of this "freeing" up comes from OEBPS. The OEBPS "out-of-spine"
construct is proving itself to be a powerful feature. There is a
reason why I continue to discuss the TEI <note> tag as I do. Since
OpenReader will handle "out-of-spine" content (in Orca via "Booklets"),
we automatically have a way to beautifully handle inline TEI <note>
-- just view it in an optional popup window (or other mechanism).
This is one reason why it will be *easier* for OpenReader to support
TEI than it would be for current web browsers since Orca and any
other OR user agent *has* to handle OEBPS "out-of-spine" content.

Interestingly, XHTML 2.0 also plans to introduce a new attribute
which is similar to (and actually more powerful than) TEI's <note>,
and it will force web browsers to display such marked-up content
outside the main flow of the text. I wonder how Opera and
Firefox will do it? <smile/>

For the proposal in XHTML 2.0, refer to:

   http://www.w3.org/TR/xhtml2/mod-role.html#s_rolemodule

Look at the 'role' attribute (i.e., 'role="note"'). It is part of
the Common attributes collection.

With this attribute, one can make just about any tag become a note
(annotation, parenthetical content, etc.) In my private chat with
Steven Pemberton, the chair of the XHTML working group, it is
intended for web browsers to somehow display to the end-user the
content within 'role="note"'. Thus, it appears to not be too different
from TEI <note>.


> and _that_ is the reason why people would want to view a
> .pgtei file directly, rather than look at an .html conversion;
> not because of the files per se -- it's silly to think end-users
> care anything about formats -- but because of the _viewer_
> and the e-book _experience_ that was delivered therein...

The format is integral to the reading experience. One can't really
separate them. But the format does come first. It needs to be
intelligently designed so as to allow the greatest reading experience,
among other things. Fortunately OEBPS has done a lot of the work
already.

Regarding TEI, we at OpenReader are definitely interested in supporting
TEI in some fashion at some future time. The specifics have yet to
be resolved. What Lee and I are doing is *learning* about TEI with
respect to utilization in ebook presentation. This means we need to
learn the vocabulary, learn its limitations and advantages, see how
it relates to XHTML/OEBPS, look at direct rendering issues (incl.
CSS support), etc., to get a grasp of the major issues. Lee is doing
it his way, I'm doing it my way. Obviously, PG-TEI is of interest to
us since it is pretty much the TEI implementation closest to our
interests.


> so, if lee can deliver an openreader that is _cross-platform_
> and runs on _older_hardware_, using _minimal_resources_,
> and can render the .pgtei file directly, giving the end-user a 
> powerful e-book experience, all from a free-beer program,
> no one will use their funky web-browser to read an e-text...

Define "older" hardware.

Our determination is that very old hardware, and low-power hardware,
such as older PDAs, simply don't have the horsepower required to
deliver a nice digital publication (ebook) reading experience. We are
focusing on the future, thus the OR format and Orca (which is intended
to be a reference implementation, a demo if you prefer) is going to
draw the line somewhere with legacy hardware support. I doubt Orca
will be developed to be compiled on older Macs (only OS X), but this
doesn't prevent someone else from building their own OpenReader user
agent to run on whatever platform(s) they desire. We view Orca to be
similar to the early days of the web, when Mosaic was developed to be
a reference implementation of an HTML user agent. Mosaic launched the
web, and over time there have been dozens of web browsers developed.
Notice that Mosaic doesn't even exist any more.

For more info on OpenReader legacy support, refer to:

   http://www.openreader.org/macpalm.html

There we talk about Mac (a little) and Palm support. We do not talk
about older Mac (pre OS X) support. Whether Orca supports older Mac or
not sort of depends upon the final architecture of the code base. We
don't deem it important for Orca to support pre OS X Macs (sorry!)


> so let's wish lee success in his endeavor,
> for the ultimate good of all the end-users...

We plan that a group of programmers, working together, will develop
Orca. The kudos, should it happen, will go out to all those who
contribute. It *should* be a team effort. Of course, if any developer
here is interested in helping develop Orca, contact Lee.

Jon Noring
OpenReader Consortium


From Bowerbird at aol.com  Wed Aug 24 16:20:45 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 24 16:21:03 2005
Subject: [gutvol-d] on viewing the .pgtei file directly
Message-ID: <2b.79e7bacb.303e5acd@aol.com>

hey jon noring, as long as you're still moderating michael hart
over on your listserve, i'm still declining to talk with you here...

but please, do have a nice day anyway...            :+)

-bowerbird
From lee at novomail.net  Wed Aug 24 16:48:11 2005
From: lee at novomail.net (Lee Passey)
Date: Wed Aug 24 16:48:24 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 25)
In-Reply-To: <20050824204954.02BEC8C906@pglaf.org>
References: <20050824204954.02BEC8C906@pglaf.org>
Message-ID: <430D073B.30104@novomail.net>

Jeroen Hellingman wrote:

[snip]

> Some have argued (with valid reasons) that the entire idea of TEI 
> markup is broken, and have proposed systems
> in which the mark-up is separated from the text (stream of 
> characters), in such a way that multiple, parallel systems of
> mark-up can exist. Think of a separate (part of a) file, saying 
> characters 21 to 34 are italics, and so on. This may sound
> odd, but it is the way the old Macintosh wordprocessor MacWrite worked.
>
> Jeroen.


This is also the way HTML Tidy works, internally. As an HTML file is 
parsed (and fixed, if necessary) a DOM tree is built. But when a text 
node is encountered rather than malloc'ing a potentially small amount of 
memory and storing a pointer, the text is copied into a pre-allocated 
text buffer, and the start and end points of the fragment are saved in 
the node structure (the start and end points are actually saved in every 
node, so you can grab any node in the tree and know that it encompasses 
"this much" of the actual text.) When 'pretty-printing' the tree, text 
is grabbed from the buffer as needed.

Having created this structure in memory, there is no reason at all it 
couldn't be saved out separately, with text nodes simply referring to an 
offset and length in a separate file which receives the entire text 
buffer, or a separate segment in the same file that contains the text.
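The layout described above can be sketched in Python. This illustrates the idea only and is not HTML Tidy's actual C data structure: all character data goes into one shared buffer, every node records just start and end offsets into it, and the tree can then be saved as JSON separately from the text. `Node`, `TextBuffer`, `build` and the other names are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Node:
    tag: str        # element name, or "#text" for a text node
    start: int = 0  # offset of the node's first character in the buffer
    end: int = 0    # offset one past its last character
    children: list = field(default_factory=list)

class TextBuffer:
    """Append-only store that receives all of a document's character data."""
    def __init__(self):
        self._chunks = []
        self.length = 0

    def append(self, text):
        start = self.length
        self._chunks.append(text)
        self.length += len(text)
        return start, self.length

    def value(self):
        return "".join(self._chunks)

def build(buf, spec):
    """Build a tree from a (tag, [children]) spec; strings become text nodes."""
    if isinstance(spec, str):
        return Node("#text", *buf.append(spec))
    tag, kids = spec
    node = Node(tag, start=buf.length)
    node.children = [build(buf, kid) for kid in kids]
    node.end = buf.length  # an element spans all of its descendants' text
    return node

def from_dict(d):
    return Node(d["tag"], d["start"], d["end"], [from_dict(c) for c in d["children"]])

def render(node, text):
    """Pretty-print by pulling each text node's characters out of the buffer."""
    if node.tag == "#text":
        return text[node.start:node.end]
    inner = "".join(render(child, text) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"

buf = TextBuffer()
tree = build(buf, ("p", ["Call me ", ("i", ["Ishmael"]), "."]))

text_file = buf.value()                 # the plain text alone
markup_file = json.dumps(asdict(tree))  # structure with offsets, no text

reloaded = from_dict(json.loads(markup_file))
print(render(reloaded, text_file))      # <p>Call me <i>Ishmael</i>.</p>
```

Because each element's span covers its descendants, `text_file[node.start:node.end]` gives the full text under any node without walking its children.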

Likewise, if someone wanted, hypothetically mind you, to write a set of 
annotations and footnotes to classic literature found at Gutenberg, the 
same sort of strategy could be used; the annotations would be in a 
separate file and refer to text at a certain offset in the base file. 
You'd have to write a small application to merge the two files for 
presentations, but that sort of thing is trivial, perfectly suited for 
perl, awk or python.

This type of division between markup and content is also perfectly 
suited to writing an application to display e-books in a low memory/low 
power device. The DOM tree could quickly be loaded into memory and 
remain resident, permitting fast navigation and styling, but the actual 
text could remain in static storage, only being accessed when needed.

One of the downsides to this sort of system is that the base content 
_must_ remain 1. accessible and 2. inviolate. The Gutenberg edition of 
_The Adventures of Sherlock Holmes_ was first released in 1999 and has 
gone through 12 revisions, the most recent being in 2002. Version 10 is 
still available at gutenberg.org, but I can't find any earlier versions 
(this is not a criticism; PG is not an archive, after all). So if I were 
to write an annotation designed to be overlaid on the PG text I would
want to have some assurances that the base text were always available, 
or I would want to be sure that the base text was always physically 
attached to the annotation file (to the extent that anything digital can 
be said to be physical). If I were to write a separate HTML markup file 
for TAOSH, I would want some assurance that the base text would not be 
altered in any way which would change the position of any character in 
the file, otherwise my markup would break.
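Both the merge step and the fragility problem can be sketched in Python under stated assumptions: annotations are (offset, note) pairs kept apart from the text, and the annotation data records a SHA-256 digest of the exact base text it was written against. The merge refuses to run if the base text has changed, since even a one-character edit can shift every later offset. The function names and the sample note are hypothetical.

```python
import hashlib

def fingerprint(base_text):
    """SHA-256 digest of the exact base text (UTF-8 encoded)."""
    return hashlib.sha256(base_text.encode("utf-8")).hexdigest()

def make_annotations(base_text, notes):
    """Bundle (offset, note) pairs with the digest of the text they refer to."""
    return {"base_sha256": fingerprint(base_text), "notes": sorted(notes)}

def merge(base_text, annotations):
    """Insert [n] markers at their offsets and list the notes at the end."""
    if fingerprint(base_text) != annotations["base_sha256"]:
        raise ValueError("base text has changed; annotation offsets may be wrong")
    pieces, footnotes, pos = [], [], 0
    for number, (offset, note) in enumerate(annotations["notes"], start=1):
        pieces.append(base_text[pos:offset])
        pieces.append(f"[{number}]")
        footnotes.append(f"[{number}] {note}")
        pos = offset
    pieces.append(base_text[pos:])
    return "".join(pieces) + "\n\n" + "\n".join(footnotes)

base = "To Sherlock Holmes she is always THE woman."
notes = make_annotations(base, [(18, "A hypothetical annotation.")])
print(merge(base, notes))
```

Storing the digest does not keep the base text available; it only makes a mismatch fail loudly instead of producing misplaced notes.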

So there are definitely problems with this sort of application, but 
there are real benefits too, in some circumstances.

From sly at victoria.tc.ca  Wed Aug 24 17:11:00 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Aug 24 17:11:16 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 25)
In-Reply-To: <430D073B.30104@novomail.net>
References: <20050824204954.02BEC8C906@pglaf.org> <430D073B.30104@novomail.net>
Message-ID: <Pine.GSO.4.58.0508241703330.13923@vtn1.victoria.tc.ca>


On Wed, 24 Aug 2005, Lee Passey wrote:

> _must_ remain 1. accessible and 2. inviolate. The Gutenberg edition of
> _The Adventures of Sherlock Holmes_ was first released in 1999 and has
> gone through 12 revisions, the most recent being in 2002. Version 10 is
> still available at gutenberg.org, but I can't find any earlier versions
> (this is not a criticism; PG is not an archive, after all).

A brief explanation here of the historical edition numbering of
PG texts. Every text was released initially in a version "10"
(Think of that as 1.0) And then subsequent "editions" would be
numbered 11, 12, etc. If you look hard enough in the pre-10,000
files, you can find a couple of exceptions, but that will
cover most cases. Also note that the consensus that emerged
was that a small number of minor corrections could be made
without increasing the edition number.
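As a small illustration of this convention, the Python sketch below reads the two-digit edition number from an old-style filename such as `anne11.txt` and reports it in the "1.0" style described above. The filename pattern, and reading 11 as "1.1", are assumptions; some pre-10,000 files are exceptions, and minor corrections may not change the number at all. `edition_info` is a hypothetical name.

```python
import re

# Assumed old-style pattern: letters, a two-digit edition number, then ".txt".
OLD_NAME = re.compile(r"([A-Za-z]+)(\d{2})\.txt")

def edition_info(filename):
    """Return (base, edition, version, revisions), or None if the name doesn't fit."""
    match = OLD_NAME.fullmatch(filename)
    if match is None:
        return None  # an exception to the convention, or a newer-style name
    base, edition = match.group(1), int(match.group(2))
    if edition < 10:
        return None  # editions start at 10 under this convention
    version = f"{edition // 10}.{edition % 10}"  # 10 -> "1.0", 12 -> "1.2"
    revisions = edition - 10                     # numbered editions after the first
    return base, edition, version, revisions

print(edition_info("anne11.txt"))  # ('anne', 11, '1.1', 1)
```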


Andrew
From Bowerbird at aol.com  Wed Aug 24 17:49:40 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 24 17:50:01 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 25)
Message-ID: <13d.1a0b3c0e.303e6fa4@aol.com>

andrew said:
>    A brief explanation here of 
>    the historical edition numbering of PG texts. 
>    Every text was released initially in a version "10"
>    (Think of that as 1.0) And then subsequent "editions" 
>    would be numbered 11, 12, etc. If you look hard enough 
>    in the pre-10,000 files, you can find a couple of exceptions, 
>    but that will cover most cases.   Also note that the consensus 
>    that emerged was that a small number of minor corrections 
>    could be made without increasing the edition number.

it is this last "consensus that emerged" that is most troublesome.

as long as the filename stays unique for each different version,
at least people can depend on the name to identify the version.

when you substitute in a different file without changing its name,
you've introduced unnecessary ambiguity into the situation, and
thus made it extremely difficult for people to keep track of things.

and when you fail to provide changelogs on the entire process,
the difficulty-factor starts to climb into the "impossible" range...

apologies for spoiling your fun...

-bowerbird
From grythumn at gmail.com  Wed Aug 24 21:08:14 2005
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed Aug 24 21:15:27 2005
Subject: [gutvol-d] on viewing the .pgtei file directly (gutvol-d Digest,
	Vol 13, Issue 23)
In-Reply-To: <430CD118.4060500@novomail.net>
References: <20050824190003.BA8CD8C8E8@pglaf.org>
	<430CD118.4060500@novomail.net>
Message-ID: <15cfa2a505082421087bab263e@mail.gmail.com>

On 8/24/05, Lee Passey <lee@novomail.net> wrote:
> Unfortunately, I haven't been able to figure out how to tell Firefox how
> to use a user specified css file (I've got version 1.0.4). If anyone can
> enlighten me on this score, I would be most grateful. Mr. Noring tells
> me that I can do it with Opera, but I've yet to try it.

Put it in userContent.css?

http://www.mozilla.org/support/firefox/edit#content

R C
From sly at victoria.tc.ca  Thu Aug 25 00:25:09 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Aug 25 00:25:30 2005
Subject: [gutvol-d] TEI markup question
Message-ID: <Pine.GSO.4.58.0508250024010.26981@vtn1.victoria.tc.ca>

Here is a question for those interested in TEI markup.
(non-constructive answers will be ignored)

Has anyone here had experience in marking up a passage
which contains a couple of quoted lines of verse that
clearly occur within a paragraph?

Here is a particular example (from anne11.txt):

== Begin Excerpt ==
The cows swung placidly down the lane, and Anne followed them
dreamily, repeating aloud the battle canto from MARMION--which
had also been part of their English course the preceding winter
and which Miss Stacy had made them learn off by heart--and
exulting in its rushing lines and the clash of spears in its
imagery.  When she came to the lines

             The stubborn spearsmen still made good
             Their dark impenetrable wood,

she stopped in ecstasy to shut her eyes that she might the better
fancy herself one of that heroic ring.  When she opened them
again it was to behold Diana coming through the gate that led
into the Barry field and looking so important that Anne instantly
divined there was news to be told.  But betray too eager
curiosity she would not.

== End Excerpt ==

The first reaction, which I have seen done before, is
to use a <p> then a <lg> then a <p>. This is not an ideal
solution, (as I am sure Lee would not hesitate to point out).
From a semantic point of view the section beginning "she
stopped in ecstasy to shut her eyes" is not structurally
a complete paragraph. From a presentational point of
view, you are likely to get an undesirable styling on
that last element if someone decides to have initial
indentation on all paragraphs in the document.


According to the PGTEI dtd, is it valid to have a <lg>
within a <p>?

If not, I don't suppose we can label a <p> as Initial, Medial
or Final as with the <l> element...

Andrew
From gsmith at nc.rr.com  Wed Aug 24 17:32:11 2005
From: gsmith at nc.rr.com (Greg Smith)
Date: Thu Aug 25 00:31:37 2005
Subject: [gutvol-d] newbie question: copyright editions of public domain
	works
Message-ID: <1124929931.3443.13.camel@localhost.localdomain>

I inherited from my grandfather 25 years ago 4 collections from his
library.  He was a Baptist minister (read poor).

The first collection is the 11th edition of the Encyclopaedia Britannica
published 1910-1911.  The other three collections are editions of works
that were public domain at the time of the edition (eg `Best Known
Works: Defoe').  The copyright dates range from the mid teens to the
early forties.

But what are these copyrights copyrighting?  I can understand editor
comments, translations, pictures, etc.  But what if spelling/grammar was
modernized?  Or, portions cut or moved?  Would these books (~120
volumes) be of any use?

Apologies for my ignorance,

Greg Smith

From sly at victoria.tc.ca  Thu Aug 25 00:45:15 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Aug 25 00:45:35 2005
Subject: [gutvol-d] newbie question: copyright editions of public domain
	works
In-Reply-To: <1124929931.3443.13.camel@localhost.localdomain>
References: <1124929931.3443.13.camel@localhost.localdomain>
Message-ID: <Pine.GSO.4.58.0508250033110.6220@vtn1.victoria.tc.ca>

Hi Greg.

Please don't be afraid to ask questions...

First, the usual disclaimer. I am not a lawyer--copyright laws
are different in every country, and often in a state of flux
--etc.

Now here's my understanding of the topics you mention.
First of all, published books often claim copyright
on re-published material when it is really not
merited. However, for our purposes at PG, we can't
know for sure that editorial interventions such as
you've mentioned have not happened--without doing
a comparison with a proven public domain edition.
And if you have that PD edition available anyway,
you may as well work from it.

(On a side note, I'll mention that I have occasionally
made some use of supposedly copyrighted imprints like
this before. For example, if I'm reformatting a German
text to include in PG, and I get copyright clearance
from some late 19th century edition with a hard-to-
read Fraktur font, then I may use a more recent edition
as a reference, or to do some spot-checks.)

Andrew

On Wed, 24 Aug 2005, Greg Smith wrote:

> I inherited from my grandfather 25 years ago 4 collections from his
> library.  He was a Baptist minister (read poor).
>
> The first collection is the 11th edition of the Encyclopaedia Britannica
> published 1910-1911.  The other three collections are editions of works
> that were public domain at the time of the edition (eg `Best Known
> Works: Defoe').  The copyright dates range from the mid teens to the
> early forties.
>
> But what are these copyrights copyrighting?  I can understand editor
> comments, translations, pictures, etc.  But what if spelling/grammar was
> modernized?  Or, portions cut or moved?  Would these books (~120
> volumes) be of any use?
>
> Apologies for my ignorance,
>
> Greg Smith
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
From marcello at perathoner.de  Thu Aug 25 05:04:31 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Aug 25 05:04:39 2005
Subject: [gutvol-d] TEI markup question
In-Reply-To: <Pine.GSO.4.58.0508250024010.26981@vtn1.victoria.tc.ca>
References: <Pine.GSO.4.58.0508250024010.26981@vtn1.victoria.tc.ca>
Message-ID: <430DB3CF.90203@perathoner.de>

Andrew Sly wrote:

> Has anyone here had experience in marking up a passage
> which contains a couple of quoted lines of verse that
> clearly occur within a paragraph?

TEI does not have the petty limitations of HTML. A TEI p can contain q, list,
table, figure, text. (This is also the reason why XSL transformation 
from TEI to HTML is hard. An HTML p may not contain blockquote, ul, ol, 
dl, table.)

Mark it up straight like this:

-------------------
<p>The cows swung placidly down the lane, and Anne followed them
dreamily, repeating aloud the battle canto from 
<title>Marmion</title>&mdash;which
had also been part of their English course the preceding winter
and which Miss Stacy had made them learn off by heart&mdash;and
exulting in its rushing lines and the clash of spears in its
imagery.  When she came to the lines

<quote rend="display">
<lg>
<l>The stubborn spearsmen still made good</l>
<l>Their dark impenetrable wood,</l>
</lg>
</quote>

she stopped in ecstasy to shut her eyes that she might the better
fancy herself one of that heroic ring.  When she opened them
again it was to behold Diana coming through the gate that led
into the Barry field and looking so important that Anne instantly
divined there was news to be told.  But betray too eager
curiosity she would not.</p>
----------------------

You may also use <q> instead of <quote>. But <quote> is more correct if 
it references a published work.
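The remark about why TEI-to-HTML conversion is hard can be made concrete. Here is a minimal Python sketch, not the PG XSL stylesheet. The function name and the `line` and `noindent` class names are assumptions. It splits one TEI `<p>` around a `<quote rend="display">` into the `<p>`, `<blockquote>`, `<p>` sequence that HTML requires. It also marks the continuation so that a stylesheet can suppress its indent:

```python
import xml.etree.ElementTree as ET
from typing import List


def _has_content(el: ET.Element) -> bool:
    """True if the element has non-blank text or any children."""
    return bool((el.text or "").strip()) or len(el) > 0


def tei_p_to_html(p: ET.Element) -> List[ET.Element]:
    """Split a TEI <p> around display quotes into HTML-legal blocks."""
    blocks: List[ET.Element] = []
    current = ET.Element("p")
    current.text = p.text
    for child in p:
        if child.tag == "quote" and child.get("rend") == "display":
            if _has_content(current):
                blocks.append(current)
            # An HTML <p> cannot hold a block quote, so it becomes a sibling.
            quote = ET.Element("blockquote")
            for line in child.iter("l"):
                div = ET.SubElement(quote, "div", {"class": "line"})
                div.text = "".join(line.itertext())
            blocks.append(quote)
            # The rest of the TEI paragraph continues after the quote.
            current = ET.Element("p", {"class": "noindent"})
            current.text = child.tail
        else:
            # Inline child: TEI <title> becomes <cite>, anything else <span>.
            inline = ET.SubElement(current, "cite" if child.tag == "title" else "span")
            inline.text = "".join(child.itertext())
            inline.tail = child.tail
    if _has_content(current):
        blocks.append(current)
    return blocks
```

Serialize each returned block with `ET.tostring(block, encoding="unicode")`. Entities such as `&mdash;` must be resolved before parsing, because ElementTree does not read the PGTEI DTD.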


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Thu Aug 25 05:05:40 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Aug 25 05:05:41 2005
Subject: [gutvol-d] TEI markup question
Message-ID: <20050825120540.12C7C1098E9@ws6-4.us4.outblaze.com>

I don't remember if you can place a <lg> within a <p> block, but ...

You can put a rend="noindent" on the second paragraph to ensure that it is never indented.

As to what is the "right way to do it" ... I don't know.
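The rend="noindent" suggestion keeps Andrew's `<p>`, `<lg>`, `<p>` sequence and flags the continuation instead. Here is a minimal sketch of applying that mechanically. `mark_continuations` is hypothetical. It also assumes `rend` holds a space-separated list of keywords, which is a convention chosen here rather than something the TEI `rend` attribute guarantees:

```python
import xml.etree.ElementTree as ET


def mark_continuations(parent: ET.Element) -> int:
    """Add rend="noindent" to each <p> that directly follows an <lg> sibling.

    Returns the number of paragraphs changed. Only direct children of
    `parent` are examined; existing rend keywords are preserved.
    """
    changed = 0
    previous = None
    for child in parent:
        if child.tag == "p" and previous is not None and previous.tag == "lg":
            rend = child.get("rend", "")
            if "noindent" not in rend.split():
                child.set("rend", (rend + " noindent").strip())
                changed += 1
        previous = child
    return changed
```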

Josh


----- Original Message -----
From: "Andrew Sly" <sly@victoria.tc.ca>
> 
> Here is a question for those interested in TEI markup.
> (non-constructive answers will be ignored)
> 
> Has anyone here had experience in marking up a passage
> which contains a couple of quoted lines of verse that
> clearly occur within a paragraph?
> 
> Here is a particular example (from anne11.txt):
> 
> == Begin Excerpt ==
> The cows swung placidly down the lane, and Anne followed them
> dreamily, repeating aloud the battle canto from MARMION--which
> had also been part of their English course the preceding winter
> and which Miss Stacy had made them learn off by heart--and
> exulting in its rushing lines and the clash of spears in its
> imagery.  When she came to the lines
> 
>               The stubborn spearsmen still made good
>               Their dark impenetrable wood,
> 
> she stopped in ecstasy to shut her eyes that she might the better
> fancy herself one of that heroic ring.  When she opened them
> again it was to behold Diana coming through the gate that led
> into the Barry field and looking so important that Anne instantly
> divined there was news to be told.  But betray too eager
> curiosity she would not.
> 
> == End Excerpt ==
> 
> The first reaction, which I have seen done before, is
> to use a <p> then a <lg> then a <p>. This is not an ideal
> solution, (as I am sure Lee would not hesitate to point out).
> From a semantic point of view the section beginning "she
> stopped in ecstasy to shut her eyes" is not structurally
> a complete paragraph. From a presentational point of
> view, you are likely to get an undesirable styling on
> that last element if someone decides to have initial
> indentation on all paragraphs in the document.
> 
> 
> According to the PGTEI dtd, is it valid to have a <lg>
> within a <p>?
> 
> If not, I don't suppose we can label a <p> as Initial, Medial
> or Final as with the <l> element...
> 
> Andrew

From brad at chenla.org  Thu Aug 25 07:47:30 2005
From: brad at chenla.org (Brad Collins)
Date: Thu Aug 25 07:48:36 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <908745031.20050824153847@noring.name> (Jon Noring's message of
	"Wed, 24 Aug 2005 15:38:47 -0600")
References: <e3.1a59842a.303e3df6@aol.com>
	<908745031.20050824153847@noring.name>
Message-ID: <pss2f4zh.fsf@chenla.org>

Jon Noring <jon@noring.name> writes:

> Bowerbird wrote:
>>  jeroen said:
>
>>> Some have argued (with valid reasons) that the entire idea of TEI
>>> markup is broken, and have proposed systems in which the mark-up is
>>> separated from the text (stream of characters), in such a way that
>>> multiple, parallel systems of mark-up can exist. Think of a separate
>>> (part of a) file, saying characters 21 to 34 are italics, and so on.
>>> This may sound odd, but it is the way the old Macintosh wordprocessor
>>> MacWrite worked.

About two years ago I was playing around with the same idea.  My
solution was to take a CSS approach to layering.

CSS places an external layer of formatting instructions on top of a
text, so why not extend CSS to also be able to add layers of semantic
markup to a text?

This would make it easy to add semantic markup including glosses,
notes, comments (scholia) etc. to a text, even if the text is located
on a server somewhere on the Net.

The folks doing the Hyperreal Dictionary of Mathematics are creating a
scholia system based on Emacs text properties to add layers of scholia
to texts.
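The standoff idea in the quoted message is a separate layer saying that "characters 21 to 34 are italics." It is simple to prototype. This is a minimal sketch under three assumptions. First, each layer is a list of `(start, end, tag)` character ranges, with `end` exclusive and greater than `start`. Second, ranges within one layer nest or are disjoint. Third, overlapping ranges, which standoff markup exists to allow, would have to be split before being rendered as inline tags:

```python
from html import escape
from typing import Dict, List, Tuple

# One annotation: characters [start, end) carry `tag`.
Annotation = Tuple[int, int, str]


def render(text: str, layer: List[Annotation]) -> str:
    """Merge one standoff layer into the text as inline <tag>...</tag> markup."""
    opens: Dict[int, List[Tuple[int, str]]] = {}
    closes: Dict[int, List[Tuple[int, str]]] = {}
    for start, end, tag in layer:
        opens.setdefault(start, []).append((end, tag))
        closes.setdefault(end, []).append((start, tag))
    out: List[str] = []
    for i in range(len(text) + 1):
        # Close innermost ranges first: the one that started latest.
        for _, tag in sorted(closes.get(i, []), reverse=True):
            out.append(f"</{tag}>")
        # Open outermost ranges first: the one that ends latest.
        for _, tag in sorted(opens.get(i, []), key=lambda e: (-e[0], e[1])):
            out.append(f"<{tag}>")
        if i < len(text):
            out.append(escape(text[i], quote=False))
    return "".join(out)
```

The same `text` can carry several independent layers, for example one typographic and one semantic. Each layer is rendered or queried on its own, and the underlying character stream is never touched.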

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From jon at noring.name  Thu Aug 25 07:59:22 2005
From: jon at noring.name (Jon Noring)
Date: Thu Aug 25 07:59:28 2005
Subject: [gutvol-d] TEI markup question
In-Reply-To: <430DB3CF.90203@perathoner.de>
References: <Pine.GSO.4.58.0508250024010.26981@vtn1.victoria.tc.ca>
	<430DB3CF.90203@perathoner.de>
Message-ID: <115134081.20050825085922@noring.name>

Marcello wrote:
> Andrew Sly wrote:

>> Has anyone here had experience in marking up a passage
>> which contains a couple of quoted lines of verse that
>> clearly occur within a paragraph?

> TEI does not have the petty limitations of HTML. A TEI p can contain q, list,
> table, figure, text. (This is also the reason why XSL transformation
> from TEI to HTML is hard. An HTML p may not contain blockquote, ul, ol,
> dl, table.)
>
> Mark it up straight like this:
>
> -------------------
> <p>The cows swung placidly down the lane, and Anne followed them
> dreamily, repeating aloud the battle canto from 
> <title>Marmion</title>&mdash;which
> had also been part of their English course the preceding winter
> and which Miss Stacy had made them learn off by heart&mdash;and
> exulting in its rushing lines and the clash of spears in its
> imagery.  When she came to the lines
>
> <quote rend="display">
> <lg>
> <l>The stubborn spearsmen still made good</l>
> <l>Their dark impenetrable wood,</l>
> </lg>
> </quote>
>
> she stopped in ecstasy to shut her eyes that she might the better
> fancy herself one of that heroic ring.  When she opened them
> again it was to behold Diana coming through the gate that led
> into the Barry field and looking so important that Anne instantly
> divined there was news to be told.  But betray too eager
> curiosity she would not.</p>
> ----------------------
>
> You may also use <q> instead of <quote>. But <quote> is more correct if
> it references a published work.


My first step, as always when encountering text to mark up,
is to understand the presentation-agnostic *structure* and/or *semantics*
of what I'm seeing in the typography. The two lines in the example form
a structural block of a certain kind. The question now is *what*
this block represents. Well, for starters, it is a snippet of verse
that appears within a paragraph but obviously is intended to be
autonomous to the paragraph (rather than just being quoted inline as
is often done.) But what kind of verse, or where does it come from?

Marcello identifies it as a <quote> (which appears correct to me), so that
works for me (I need to study up more on the TEI <quote> tag). A <lg> just
doesn't have enough semantic meaning (without adding an attribute) to let
it be a child of the <p> tag. The TEI authors apparently recognize this
and allow only certain "block-level" tags within <p>, so the markup wonk
has to somehow fit the "in paragraph" block they encounter into one of
those allowed in TEI. (Another area to study -- the elements allowed as
children of the TEI <p> tag -- looking at the full flattened DTD now,
which I'll append at the end.)

*****

Regarding the limitation of HTML <p>, yes that is frustrating. There
are block-level tags (actually tags that can be either block or
inline, among them <del>, <ins>, <script>, and <noscript>) that can be
placed within <p>, but none of them are *meant* to be used for what
we'd like and should not be used that way. One could take advantage of
the flexible <div> tag (such as <div class="paragraph">) to contain both
inline and block-level stuff and use that instead of <p> in this case.
But this is also a kludge which, without a CSS stylesheet, does not
lead to proper rendering of the paragraph PCDATA portion -- but if one
doesn't care about non-CSS rendering for a particular document, then it
would be what I'd do, and state in a comment within the document that
<div class="paragraph"> is intended to be identical to <p>. (Note:
HTML was never designed to represent the more complex structures we
find in books.)

XHTML 2.0 is planning to remove the limitation of <p> and allow it to
contain some block-level stuff in mixed fashion with PCDATA (it will
allow lists, blockquote and the new blockcode, tables and <pre>.) On a
related matter, XHTML 2.0 is even planning to drop the <br /> tag (which is
problematic: to me it is almost always a lazy person's tag, used to avoid
marking up the full structure) and replace it with the inline <l> tag,
which represents a line, so one may have in XHTML 2.0:

   <!-- XHTML 2.0 example -->
   <p>Some text.
      <l>A line.</l>
      <l>A second line.</l>
   Some more text.</p>

(Refer to http://www.w3.org/TR/xhtml2/mod-text.html#s_textmodule
for what the XHTML people have in mind for the <l> tag for XHTML 2.0.)

The XHTML 2.0 folk are aware of the limitations of HTML (up to XHTML
1.1) with respect to document structure and are improving it. XHTML
2.0 is not intended to "compete" with TEI, but certainly will make it
possible for XHTML to be a little more compatible with TEI when
mapping between the two. The ability to map inline TEI <note> to XHTML
2.0 is particularly of interest.

(Btw, looking at the XHTML 2.0 draft, I notice that the block lists
<dl>, <ol>, <ul>, and the new <nl> (for a navigational list!) can now
take a <label> tag, similar to the TEI <head> used within lists.)


Jon


(p.s., I used TEI Pizza Chef to generate a flattened DTD for the
complete P4X. At least I think it is complete based on checking the
various options at Pizza Chef. Here's the content model for the TEI
<p> element -- notice it does not allow <lg> as a child:

<!ELEMENT p
        (#PCDATA | abbr | address | date | dateRange | dateStruct 
        | expan | geogName | lang | measure | name | num | orgName 
        | persName | placeName | rs | time | timeRange | timeStruct 
        | add | app | corr | damage | del | orig | reg | restore 
        | sic | space | supplied | unclear | oRef | oVar | pRef 
        | pVar | formula | handShift | distinct | emph | foreign 
        | gloss | hi | mentioned | soCalled | term | title | ptr 
        | ref | xptr | xref | caesura | c | cl | m | phr | s | seg 
        | w | bibl | biblFull | biblStruct | castList | cit | q 
        | quote | label | list | listBibl | note | witDetail | stage 
        | camera | caption | move | sound | tech | view | table 
        | text | anchor | addSpan | delSpan | gap | figure | alt 
        | altGrp | certainty | fLib | fs | fsLib | fvLib | index 
        | interp | interpGrp | join | joinGrp | link | linkGrp | 
        respons | span | spanGrp | timeline | cb | fw | lb | milestone 
        | pb)* >

<!-- And for completeness, the ATTLIST for <p> -->

<!ATTLIST p 
        group CDATA #IMPLIED
        grpPtr IDREF #IMPLIED
        depend CDATA #IMPLIED
        depPtr IDREF #IMPLIED
        corresp IDREFS #IMPLIED
        synch IDREFS #IMPLIED
        sameAs IDREF #IMPLIED
        copyOf IDREF #IMPLIED
        next IDREF #IMPLIED
        prev IDREF #IMPLIED
        exclude IDREFS #IMPLIED
        select IDREFS #IMPLIED
        ana IDREFS #IMPLIED
        id ID #IMPLIED
        n CDATA #IMPLIED
        lang IDREF #IMPLIED
        rend CDATA #IMPLIED
        TEIform CDATA "p" >

)
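The reading of this content model can be checked mechanically. The sketch below pulls the element names out of the `<p>` model quoted above. It only handles a flat `(a | b | ...)*` alternation like this one, not general DTD grammar:

```python
import re
from typing import Set

# TEI P4 content model for <p>, as quoted from the Pizza Chef flattened DTD.
P_CONTENT_MODEL = """
(#PCDATA | abbr | address | date | dateRange | dateStruct
| expan | geogName | lang | measure | name | num | orgName
| persName | placeName | rs | time | timeRange | timeStruct
| add | app | corr | damage | del | orig | reg | restore
| sic | space | supplied | unclear | oRef | oVar | pRef
| pVar | formula | handShift | distinct | emph | foreign
| gloss | hi | mentioned | soCalled | term | title | ptr
| ref | xptr | xref | caesura | c | cl | m | phr | s | seg
| w | bibl | biblFull | biblStruct | castList | cit | q
| quote | label | list | listBibl | note | witDetail | stage
| camera | caption | move | sound | tech | view | table
| text | anchor | addSpan | delSpan | gap | figure | alt
| altGrp | certainty | fLib | fs | fsLib | fvLib | index
| interp | interpGrp | join | joinGrp | link | linkGrp
| respons | span | spanGrp | timeline | cb | fw | lb | milestone
| pb)*
"""


def allowed_children(model: str) -> Set[str]:
    """Element names (plus #PCDATA) in a flat "(a | b | ...)*" content model.

    Sequences, nested groups and parameter entities are out of scope.
    """
    return set(re.findall(r"#?[A-Za-z][A-Za-z0-9]*", model))
```

`allowed_children(P_CONTENT_MODEL)` contains `quote` and `q` but neither `lg` nor `div`. That matches the observation that verse must be wrapped in something like `<quote>` to sit inside a TEI `<p>`.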

From ag737 at freenet.carleton.ca  Thu Aug 25 08:03:58 2005
From: ag737 at freenet.carleton.ca (Wallace J.McLean)
Date: Thu Aug 25 08:04:52 2005
Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 27
Message-ID: <9fe769dfed.9dfed9fe76@ncf.ca>


> The other three collections are editions of works
> that were public domain at the time of the edition (eg `Best Known
> Works: Defoe').

Other than any new material, arguably editorial changes, arguably (or 
certainly, in some jurisdictions) typographical arrangement, copyright 
would subsist in the compilation alone of a "Best Known Works" type 
dealie.
From jon at noring.name  Thu Aug 25 08:50:43 2005
From: jon at noring.name (Jon Noring)
Date: Thu Aug 25 08:50:48 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <pss2f4zh.fsf@chenla.org>
References: <e3.1a59842a.303e3df6@aol.com>
	<908745031.20050824153847@noring.name> <pss2f4zh.fsf@chenla.org>
Message-ID: <182290311.20050825095043@noring.name>

Brad Collins wrote:
>  jeroen said:

>> Some have argued (with valid reasons) that the entire idea of TEI
>> markup is broken, and have proposed systems in which the mark-up is
>> separated from the text (stream of characters), in such a way that
>> multiple, parallel systems of mark-up can exist. Think of a separate
>> (part of a) file, saying characters 21 to 34 are italics, and so on.
>> This may sound odd, but it is the way the old Macintosh wordprocessor
>> MacWrite worked.

> About two years ago I was playing around with the same idea.  My
> solution was to take a CSS approach to layering.
>
> CSS places an external layer of formating instructions on top of a
> text, so why not extend CSS to also be able to add layers of semantic
> markup to a text?
>
> This would make it easy to add semantic markup including glosses,
> notes, comments (scholia) etc to a text, even it the text is located
> on a server somewhere on the Net.
>
> The folks doing the Hypereal Dictionary of Mathematics are creating a
> scholia system based on Emacs text properties to add layers of scholia
> to texts.

Interesting! I'll not comment directly on Brad's idea, but will talk
about a distantly related idea, which sort of intersects with what
Brad is talking about when we propose tweaking with CSS.

A couple years ago I floated the idea to both OeBF (as part of OEBPS
work) and to the accessibility folk (such as DAISY) that we explore a
better way a document author can assign structural semantics to the
tags in arbitrary XML documents.

A problem the accessibility people have when encountering an arbitrary
XML document (from an unknown vocabulary) is knowing what the tags mean
from a document structure viewpoint. A text-to-speech converter needs
to know this unambiguously to do an effective job at properly
conveying the content to the listener. An attached visual CSS style
sheet (standards conforming at least) is insufficient to communicate
the exact structures in such arbitrary XML documents.

So I proposed something called a "Rosetta Stone", which would be a
sort of attached document (probably XML) which describes the semantics
of the tags in the content document so the document structure can be
identified by machine processing. The RS may syntactically be based
upon XSLT, but it is not intended to be a markup transformation --
it is solely a way to assign semantics to elements so the user agent
(such as text-to-speech engine) can figure out what to do with them.

Key to the Rosetta Stone is setting up a universal "metavocabulary"
to describe common document structures. Now, I have no illusion this
will be easy -- it will not be easy -- it will be damn hard to do
right. Then there's the issue of the granularity of the metavocabulary
-- how fine with document structure does one go -- and what types of
documents will be targeted?
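Here is a toy illustration of the "Rosetta Stone" idea. It is written in Python rather than the XSLT-like syntax suggested above. The source element names (`chapter`, `para`, `vl`, and so on) and the role names of the metavocabulary are invented for illustration. No real vocabulary or standard is implied:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Tuple

# Hypothetical Rosetta Stone: tags of an unknown vocabulary -> structural roles.
ROSETTA: Dict[str, str] = {
    "chapter": "section",
    "chaptitle": "heading",
    "para": "paragraph",
    "verse": "line-group",
    "vl": "line",
}


def structural_walk(
    root: ET.Element, rosetta: Dict[str, str]
) -> Iterator[Tuple[str, str]]:
    """Yield (role, text) for each element with direct text, in document order.

    Tags missing from the Rosetta Stone get the role "unknown", so a
    text-to-speech engine can at least fall back to reading them as plain text.
    """
    for el in root.iter():
        text = (el.text or "").strip()
        if text:
            yield rosetta.get(el.tag, "unknown"), text
```

The design point is that the mapping lives outside the content document, so the same walker could serve any vocabulary for which such a mapping exists.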

By and large CSS was not designed for the purpose of assigning
structural semantics to tags. CSS does have the 'display' property
which assigns, at a very rudimentary level, some critical structural
semantics (block, inline, table, list). But as we know, the allowed
'display' values are quite limited -- they don't, and in practical
sense cannot, assign some critical semantics such as hypertext links,
embedded images and objects (XLink is the vocabulary-agnostic solution
for these particular things.) There is no CSS 'display' property for
section headers, for example (in CSS, a header has to be treated as
simply a kind of "block-level" tag), yet it is clear for
text-to-speech that section headers be specifically identified as
such, and not lumped in with paragraphs.

Then there's the issue that CSS is intended for *styling* during
presentation (by and large visual styling). That is its purpose --
it's not designed to be a "Rosetta Stone" for conveying detailed
structural information.

I don't know if the "Rosetta Stone" idea is tractable, or whether it will in the
long run solve any real problems. In lieu of that, the accessibility
community, and I think anyone else using markup to structure texts,
would want all XML documents representing publications to conform with
particular, well-defined vocabularies which are marked up in an
acceptable structural, presentational agnostic manner. Properly done
TEI is one such acceptable vocabulary, the more I study it. The
accessibility folk have proposed their own, Digital Talking Book,
which is essentially XHTML with some interesting TEI-like extensions.

(Just about any markup vocabulary can be abused/misused to make it
more difficult to convey the structural/semantic meaning of the
content. Even TEI -- this is why I'm interested in subsetting and
constraining the TEI vocabulary to assure the marked up content will
be more accessible which includes presentation agnosticism.)

Jon


From lee at novomail.net  Thu Aug 25 10:49:59 2005
From: lee at novomail.net (Lee Passey)
Date: Thu Aug 25 10:50:08 2005
Subject: [gutvol-d] 
 Re: User style sheets in Firefox (gutvol-d Digest, Vol 13, Issue 27)
In-Reply-To: <20050825145930.C5D088C905@pglaf.org>
References: <20050825145930.C5D088C905@pglaf.org>
Message-ID: <430E04C7.8010705@novomail.net>

Robert Cicconetti <grythumn@gmail.com> wrote:

>On 8/24/05, Lee Passey <lee@novomail.net> wrote:
>  
>
>>Unfortunately, I haven't been able to figure out how to tell Firefox how
>>to use a user specified css file (I've got version 1.0.4). If anyone can
>>enlighten me on this score, I would be most grateful. Mr. Noring tells
>>me that I can do it with Opera, but I've yet to try it.
>>    
>>
>
>Put it in userContent.css?
>
>http://www.mozilla.org/support/firefox/edit#content
>
>R C
>  
>

I _thought_ that was what I had done, but at your urging I did some more 
experimentation. I finally discovered that on this particular box 
(Windows XP, which I ordinarily wouldn't use, but sometimes you have to 
compromise) the userContent.css file had to be in C:\Documents and 
Settings\lpassey\Application 
Data\Mozilla\Firefox\Profiles\0wlij0cb.default\chrome -- not the most 
intuitive location. Once I had found the right location it still didn't 
work until I had restarted Firefox, so apparently the .css file is 
loaded on startup, and not consulted thereafter -- great if you want to 
override HTML defaults, but not so good if you want to switch style 
sheets when looking at a particular document. And I haven't tested it 
yet but my guess is that this means it will _not_ override some of the 
screwy choices that some document authors choose to use. That is, if an 
HTML file links to "screwy.css" those styles will be applied _after_ my
preferences, not before.

And the userContent.css styles are only applied to HTML documents, not 
to XML documents.

Ah well, I guess our reach should always exceed our grasp, that's how 
progress is made.

Thanks for your help.
From greg at durendal.org  Thu Aug 25 11:44:46 2005
From: greg at durendal.org (Greg Weeks)
Date: Thu Aug 25 11:44:55 2005
Subject: [gutvol-d] 1950 periodicals renewals
In-Reply-To: <Pine.LNX.4.44.0508232119230.7430-100000@durendal.durendal.org>
Message-ID: <Pine.LNX.4.44.0508251441490.4580-100000@durendal.durendal.org>

On Tue, 23 Aug 2005, Greg Weeks wrote:

> I have photocopies now of 1951-1969 for periodicals renewals. I know books
> have already been done by DP.

http://durendal.org:8080/pdrn/

Has what I've got scanned so far. 1951-1952, 1954-1962 and parts of
others. I'll update this as I get the rest scanned. These will eventually
end up on John Ockerbloom's site as well.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From lee at novomail.net  Thu Aug 25 14:07:25 2005
From: lee at novomail.net (Lee Passey)
Date: Thu Aug 25 14:07:37 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 26)
In-Reply-To: <20050824234827.0530C8C8C9@pglaf.org>
References: <20050824234827.0530C8C8C9@pglaf.org>
Message-ID: <430E330D.4060205@novomail.net>

Marcello Perathoner <marcello@perathoner.de> wrote:

> Lee Passey wrote:
>
>> On the other hand, I don't see how Mr. Hutchinson's second example 
>> could validate if the first does not, particularly given the fact 
>> that DTD's are not structured in such a way to permit a validator to 
>> make that kind of a judgment ("if a <div> contains a <div> it must be 
>> the last element of the first <div>" or "if a <div> contains a <div> 
>> it may be preceded by a <p>, but not followed by one").
>
>
> This simple declaration does exactly that:
>
>   <!ELEMENT div (p*, div*)>
>
> "A div may contain zero or more p followed by zero or more div."


Well, I carefully decomposed the TEI DTD and discovered that you're 
absolutely right (but you knew that already, didn't you :-)). As I 
understand it, a <div> can contain just about any other element, but 
once you include another <div> you can't include anything else (almost).

What the hell were they thinking?

I don't see anything in the English spec that would have led me to this 
conclusion, and I can't think of any rationale why it should be this 
way. Is it possible that the DTD has incorrectly implemented the TEI 
spec? Or did the authors really intend this inane result? I have to 
admit, this requirement (and the fact that <div> is not allowed inside 
<p>) really makes me have second thoughts about the usefulness of TEI as 
an encoding (because it hinders you from making a level-one, incomplete, 
encoding).

I would really like to know what the rationale for this rule is.

>> In this particular case, I suspect a bug in the validator program. I 
>> mean, writing validators is hard, and I am aware of at least one bug 
>> in the W3C's online HTML validator.
>
>
> No bug. The TEI dtd is broken as designed.


Well, at least I was able to figure out that _something_ was broken.
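The one-line declaration `<!ELEMENT div (p*, div*)>` can be read as a regular expression over the sequence of child element names. Read that way, it shows why a `<p>` after a nested `<div>` is rejected. This is a minimal sketch. It checks only that simplified model, not the full TEI `<div>` content model, and it examines only the direct children of one element:

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT div (p*, div*)> as a regex over space-terminated child tag names.
DIV_MODEL = re.compile(r"(p )*(div )*")


def div_is_valid(div: ET.Element) -> bool:
    """True if the direct children of `div` match (p*, div*)."""
    sequence = "".join(child.tag + " " for child in div)
    return DIV_MODEL.fullmatch(sequence) is not None
```

`<div><p/><p/><div/></div>` passes, while `<div><p/><div/><p/></div>` fails: once `(div )*` has started, nothing in the model can consume the trailing `p`.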
From lee at novomail.net  Thu Aug 25 14:15:27 2005
From: lee at novomail.net (Lee Passey)
Date: Thu Aug 25 14:15:38 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13, Issue 24)
In-Reply-To: <20050824195127.2F8FC8C905@pglaf.org>
References: <20050824195127.2F8FC8C905@pglaf.org>
Message-ID: <430E34EF.6050308@novomail.net>

Bowerbird@aol.com wrote:

> lee said:
> >    Unluckily for us, Bowerbird has not yet
> >   released his algorithm for determining whether
> >   a block of unstructured text is a paragraph.
>
> oh gee, i'm sorry.  i thought i had.
>
> in z.m.l., anything surrounded by two or more blank lines
> is a paragraph.  


In z.m.l., anything surrounded by two or more blank lines is structured 
text.

When you can recognize a paragraph, and distinguish it from a title or a
block quotation inside of a paragraph, no matter how many blank lines 
surround them, then I'll be impressed.

From collin at xs4all.nl  Thu Aug 25 14:50:24 2005
From: collin at xs4all.nl (Branko Collin)
Date: Thu Aug 25 14:34:38 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
	(gutvol-d	Digest, Vol 13, Issue 26)
In-Reply-To: <430E330D.4060205@novomail.net>
References: <20050824234827.0530C8C8C9@pglaf.org>
Message-ID: <430E5940.3100.D3728D@localhost>

On 25 Aug 2005, at 15:07, Lee Passey wrote:
> Marcello Perathoner <marcello@perathoner.de> wrote:
> > Lee Passey wrote:
> >
> >> On the other hand, I don't see how Mr. Hutchinson's second example
> >> could validate if the first does not, particularly given the fact
> >> that DTD's are not structured in such a way to permit a validator
> >> to make that kind of a judgment ("if a <div> contains a <div> it
> >> must be the last element of the first <div>" or "if a <div>
> >> contains a <div> it may be preceded by a <p>, but not followed by
> >> one").
> >
> > This simple declaration does exactly that:
> >
> >   <!ELEMENT div (p*, div*)>
> >
> > "A div may contain zero or more p followed by zero or more div."

> I would really like to know what the rationale for this rule is.

There's been much discussion about this on the TEI-L, but not much 
resolution, as far as I can tell. 

Here's what Lou Burnard wrote in the "<p> and <divN>" thread: 

"There is a long tradition of embedding distinct narratives within an 
overarching framing narrative: as well as the Arabian nights, we 
could cite Boccaccio, Chaucer etc. I continue, stubbornly, to think 
that the right way to deal with these is as embedded texts.

The one which my learned colleague Rahtz refers to is rather 
different: here we have a distinct paragraph-like object within a div 
which has the unusual property of itself containing paragraph-like 
objects, but which is not really a self-contained text. We could call 
it a paraDiv and maybe, if we can find more evidence, it should be 
admitted into P5."

It would seem that most often, a text (like a letter) included in 
another text would be marked up something like 
<q><text>...</text></q> or <ab><text>...</text></ab>.

The archives for TEI-L can be found at 
<http://listserv.brown.edu/archives/tei-l.html>. Just search for 
"div" among the thread names and you should find plenty of discussion 
about this problem.

BTW, is this the sort of thing we should be discussing at gutvol-d? 
Wasn't there a PG-XML list or something?


-- 
branko collin
collin@xs4all.nl
From Bowerbird at aol.com  Thu Aug 25 15:08:10 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 25 15:08:33 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 24)
Message-ID: <1f6.10a59866.303f9b4a@aol.com>

i said:
>    in z.m.l., anything surrounded by two or more blank lines
>    is a paragraph.

pardon me.   that's a goof.

anything preceded by _one_ or more blank lines
is considered to be a new paragraph in z.m.l.

i sometimes accidentally mix up "blank lines" and 
"line-endings".   _two_ consecutive line-endings
(i.e., _one_ blank line) delineates a new paragraph.

two blank lines constitute a "thought break".

three or more blank lines constitute a new section.
(three is more like a "subsection", while four marks
the lowest-level section, five the next-lowest, etc.)
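The blank-line rules just described can be sketched in a few lines of Python. This is a minimal illustration that assumes the rules exactly as stated here (one blank line starts a paragraph, two mark a thought break, three a subsection, four or more a section); the function name `classify_breaks` and the "section rank" labels are hypothetical and are not taken from the actual z.m.l. viewer.

```python
import re


def classify_breaks(text: str) -> list[tuple[str, str]]:
    """Split text on runs of blank lines and label the break before each block.

    Hypothetical sketch of the rules described above, not the z.m.l. viewer:
      1 blank line   -> new paragraph
      2 blank lines  -> thought break
      3 blank lines  -> subsection
      4+ blank lines -> section (rank 1 = lowest level, rank 2 = next, ...)
    """
    # Normalize Windows line endings and drop leading/trailing newlines.
    text = text.replace("\r\n", "\n").strip("\n")
    # The capturing group keeps each separator so its blank lines can be
    # counted; a line holding only spaces or tabs also counts as blank.
    parts = re.split(r"(\n(?:[ \t]*\n)+)", text)
    labelled = [("start", parts[0])]
    for separator, block in zip(parts[1::2], parts[2::2]):
        blank_lines = separator.count("\n") - 1
        if blank_lines == 1:
            label = "paragraph"
        elif blank_lines == 2:
            label = "thought break"
        elif blank_lines == 3:
            label = "subsection"
        else:
            label = f"section rank {blank_lines - 3}"
        labelled.append((label, block))
    return labelled


if __name__ == "__main__":
    sample = (
        "Title\n\n\n\n\n"
        "First paragraph,\nwrapped over two lines.\n\n"
        "Second paragraph.\n\n\n"
        "After a thought break."
    )
    for label, block in classify_breaks(sample):
        print(f"{label:>15}: {block.splitlines()[0]}")
```

A single line ending inside a paragraph is not treated as a break, so wrapped lines stay in the same block.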


>    In z.m.l., anything surrounded by 
>    two or more blank lines is structured text.

wait a minute!   _i_ make the rules for z.m.l., not you.      :+)


>    When you can recognize a paragraph, 

i can recognize a paragraph.
i just explained how, up above.


>    and distinquish it from a title

"distinquish" is a nice word.   what does it mean?
to distinguish and then squish?           :+)

i can recognize a title too, and thus "distinguish" it.
i just explained how, up above.


>    or a block quotation inside of a paragraph, 

what's so hard about that?

it's very easy to recognize the block quotation
-- because those things are indented in z.m.l. --
and if the paragraph above it was not terminated,
then the block quote is "inside" of it.

(well, due to how z.m.l. defines "a paragraph",
the block-quote itself is its own "paragraph"...
but there's no need to discuss these semantics.)

at any rate, the z.m.l. viewer-program is happy to
show you a listbox of all the paragraphs in the file,
nicely numbered and everything.   or it will show you
a list of all the words in the file, also nicely numbered.

and at the bottom of each page that it shows you,
it gives the character-numbers and line-numbers
of the range of text on that page.   so an end-user will
have an easy time quantifying exactly where they are.
but you know what?   very few of them ever have a need.


>    no matter how many blank lines surround them

in z.m.l., blank lines are the very thing that _define_
paragraphs -- and titles too.   so i am afraid that z.m.l.
will never "be able to" do what you are asking.   but...


>    then I'll be impressed.

...my goal is to render e-books properly, not impress you.

-bowerbird
From Bowerbird at aol.com  Thu Aug 25 15:13:15 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 25 15:13:33 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 26)
Message-ID: <c1.5f2f7987.303f9c7b@aol.com>

lee said:
>   What the hell were they thinking?

c'mon lee, don't spoil their fun...             ;+)

actually, the infighting on that one was pretty fierce.
and a good part of it was frontchannel and archived,
so you can read through every bit of the gory details.

(gee, according to branko, it's still going on!
well, you know how a good argument is,
you just can't stop talking about it.   maybe
you can go talk some sense into them, lee.)

-bowerbird

p.s.   and yes, branko, you're absolutely right,
this thread should be on the x.m.l. list, not here!
From marcello at perathoner.de  Thu Aug 25 15:25:59 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Aug 25 15:26:22 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 26)
In-Reply-To: <430E330D.4060205@novomail.net>
References: <20050824234827.0530C8C8C9@pglaf.org>
	<430E330D.4060205@novomail.net>
Message-ID: <430E4577.9030704@perathoner.de>

Lee Passey wrote:

> Well, I carefully decomposed the TEI DTD and discovered that you're 
> absolutely right (but you knew that already, didn't you :-)). As I 
> understand it, a <div> can contain just about any other element, but 
> once you include another <div> you can't include anything else (almost).
> 
> What the hell were they thinking?
> 
> I don't see anything in the English spec that would have led me to this 
> conclusion, and I can't think of any rationale why it should be this 
> way. Is it possible that the DTD has incorrectly implemented the TEI 
> spec? Or did the authors really intend this inane result? I have to 
> admit, this requirement (and the fact that <div> is not allowed inside 
> <p>) really makes me have second thoughts about the usefulness of TEI as 
> an encoding (because it hinders you from making a level-one, incomplete, 
> encoding).

This "quirk" is intended and many a rebarbative mail has been written on 
the TEI-L mailing list defending or belittling this choice.

You may subscribe and start another thread. Maybe they'll get tired of 
explaining this thing to people and change it in the new TEI revision. 
(I did this 2 years ago and it did me no good.)


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Aug 25 15:34:02 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Aug 25 15:34:17 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!	(gutvol-d
	Digest, Vol 13, Issue 26)
In-Reply-To: <430E5940.3100.D3728D@localhost>
References: <20050824234827.0530C8C8C9@pglaf.org>
	<430E5940.3100.D3728D@localhost>
Message-ID: <430E475A.5040103@perathoner.de>

Branko Collin wrote:

> BTW, is this the sort of thing we should be discussing at gutvol-d? 
> Wasn't there a PG-XML list or something?

That list is for James Linden's markup language and is not hosted at 
pglaf.org.

We should use gutvol-p so we can get you-know-who moderated.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Aug 25 15:54:16 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 25 15:54:34 2005
Subject: [gutvol-d] re: the proof is in the tarts
Message-ID: <bb.5e61262b.303fa618@aol.com>

lee said:
>    Mr. Perathoner's TEI version of Alice in Wonderland
>    (http://www.gutenberg.org/tei/marcello/0.3/examples/alice/) 
>    looks like it is much more complex, 
>    and would be funner to play with. 
>    I really like _Alice_ as an experimental text 
>    because it has quite a few typographical oddities 
>    which make it a good test case.

alice is one of the most-frequently-used
e-texts for e-book demos.   maybe _the_ most.

i worked up a z.m.l. version a while back.

perhaps you'd like to create a version, too,
in whatever format you like (openreader?),
and we can do some real-world user testing?

the proof _is_ in the pudding, don't you know...

-bowerbird
From jon at noring.name  Thu Aug 25 16:08:07 2005
From: jon at noring.name (Jon Noring)
Date: Thu Aug 25 16:08:19 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 26)
In-Reply-To: <430E330D.4060205@novomail.net>
References: <20050824234827.0530C8C8C9@pglaf.org>
	<430E330D.4060205@novomail.net>
Message-ID: <217099393.20050825170807@noring.name>

Lee Passey wrote:
> Marcello Perathoner wrote:
>> Lee Passey wrote:

>>> On the other hand, I don't see how Mr. Hutchinson's second example
>>> could validate if the first does not, particularly given the fact 
>>> that DTD's are not structured in such a way to permit a validator to
>>> make that kind of a judgment ("if a <div> contains a <div> it must be
>>> the last element of the first <div>" or "if a <div> contains a <div>
>>> it may be preceded by a <p>, but not followed by one").

>> This simple declaration does exactly that:
>>
>>   <!ELEMENT div (p*, div*)>
>>
>> "A div may contain zero or more p followed by zero or more div."

> Well, I carefully decomposed the TEI DTD and discovered that you're 
> absolutely right (but you knew that already, didn't you :-)). As I 
> understand it, a <div> can contain just about any other element, but
> once you include another <div> you can't include anything else (almost).
>
> What the hell were they thinking?

Hmmm, when I parsed the full content model for TEI <div> (which I
obtained from TEI Pizza Chef -- again a cool way to flatten the TEI
P4X DTD), removing all elements except for <div> and <p> (let's assume
that's all we're going to use in our little thought experiment here),
I get for the content model:

   <!ELEMENT div (div+ | (p+, div*))>

(Compared to the one mentioned by Marcello, it is close to the same
one, but not exactly the same -- i.e., we cannot have <div></div>;
<div> has to contain something. But other than this small difference
it is logically the same, so Marcello's is easier to wrap our minds
around.)

Not sure, Lee, what you mean by your statement. What is interesting,
as noted by Lee in a prior message (see above), is that one may not
have a <p> after a <div> within another <div>. I.e., this appears to
be invalid TEI P4X:

<div>
   <p>something</p>
   <div>...</div>
   <p>something else</p>
</div>

While this is valid:

<div>
  <p>something</p>
  <p>something else</p>
  <div>...</div>
</div> 
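Both content models can be checked against these two examples with a short Python sketch. It approximates each DTD content model as a regular expression over child tag names, restricted to <p> and <div> as in the thought experiment; it is not a DTD validator, and the names `MARCELLO_MODEL`, `FLATTENED_MODEL` and `outer_div_matches` are hypothetical. Only the outermost <div> is checked, because the inner <div>...</div> placeholders have no element content and the flattened model requires at least one child.

```python
import re
import xml.etree.ElementTree as ET

# Content models restricted to <p> and <div>, approximated as regular
# expressions over child tag names (each tag followed by a space).
MARCELLO_MODEL = re.compile(r"(p )*(div )*")             # (p*, div*)
FLATTENED_MODEL = re.compile(r"(div )+|(p )+(div )*")   # (div+ | (p+, div*))

FIRST_EXAMPLE = """<div>
   <p>something</p>
   <div>...</div>
   <p>something else</p>
</div>"""

SECOND_EXAMPLE = """<div>
  <p>something</p>
  <p>something else</p>
  <div>...</div>
</div>"""


def outer_div_matches(model: re.Pattern, xml_text: str) -> bool:
    """Check the children of the outermost <div> against a content model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return model.fullmatch(sequence) is not None


if __name__ == "__main__":
    for name, model in (("(p*, div*)", MARCELLO_MODEL),
                        ("(div+ | (p+, div*))", FLATTENED_MODEL)):
        print(name,
              outer_div_matches(model, FIRST_EXAMPLE),
              outer_div_matches(model, SECOND_EXAMPLE))
```

Both models reject the first example and accept the second; they differ only on an empty <div>, which the simplified (p*, div*) model allows and the flattened model does not.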

Last year I talked about Burton's Arabian Nights which structurally
and logically follows, in many places, the p-div-p pattern, to several
levels of nesting. So if I were to customize the TEI content model for
<div>, I would seriously allow a <p> after a child <div>. I don't see
a problem in doing this, but then I don't fully understand the quite
subtle explanations offered on TEI-L regarding this (especially as it
pertains to the Arabian Nights where stories are wrapped inside of
stories to several levels of nesting.)


> I don't see anything in the English spec that would have led me to this
> conclusion, and I can't think of any rationale why it should be this
> way. Is it possible that the DTD has incorrectly implemented the TEI
> spec? Or did the authors really intend this inane result? I have to 
> admit, this requirement (and the fact that <div> is not allowed inside
> <p>) really makes me have second thoughts about the usefulness of TEI as
> an encoding (because it hinders you from making a level-one, incomplete,
> encoding).
>
> I would really like to know what the rationale for this rule is.

I suggest that you repackage your inquiry and post it to TEI-L. That's
where the bulk of the TEI experts, including those involved with TEI
development, hang out. Sebastian Rahtz is one of the brilliant people
there who is interested in a more constrained TEI subset for ebook
use.

   http://www.lsoft.com/scripts/wl.exe?SL1=TEI-L&H=LISTSERV.BROWN.EDU

Maybe this time around the TEI mavens will be able to cogently explain
(at the highest abstract level) why one should not have a <p> after a
<div> within another <div>, but can have a <p> before the <div>. I'm
still quite perplexed, as it appears Marcello is as well.

Jon


From Gutenberg9443 at aol.com  Thu Aug 25 16:13:40 2005
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Thu Aug 25 16:19:02 2005
Subject: [gutvol-d] Re: prank someone is pulling
Message-ID: <1a3.3a911bab.303faaa4@aol.com>

In a message dated 8/24/2005 2:16:34 PM Mountain Daylight Time,
jon_niehof@yahoo.com writes:

>Hmm.  If someone faxed him Julius Caesar, I wonder* if he'd be
>mad at ol' Will. Or Caesar.

>It annoys me no end that the good names of volunteers are
>besmirched in this fashion (or via google spamming, as posted
>earlier).

>(*but not enough to try it or recommend someone else try)


Good question! I reassured him about three dozen times that nobody in our
organization did it, but I don't think he believed me for the first two and a
half dozen times.

Anne

Do you like to breathe?
Then save the trees!
Begin a personal relationship
with an ebook
TODAY!
From lee at novomail.net  Thu Aug 25 16:50:07 2005
From: lee at novomail.net (Lee Passey)
Date: Thu Aug 25 16:50:22 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13, Issue 30)
In-Reply-To: <20050825230821.2C34F8C915@pglaf.org>
References: <20050825230821.2C34F8C915@pglaf.org>
Message-ID: <430E592F.6030707@novomail.net>

Bowerbird@aol.com wrote:

> >   In z.m.l., anything surrounded by
> >   two or more blank lines is structured text.
>
> wait a minute!  _i_ make the rules for z.m.l., not you.     :+)


Sure, but you don't get to call a lemon purple, just because that's the 
way you see it.

So if you require that paragraphs in your markup language be segregated 
from the surrounding text by one, and only one, blank line, you have 
forced a structure on the text. Hence, it is, by definition, a 
structured text.

[snip]

> in z.m.l., blank lines are the very thing that _define_
> paragraphs -- and titles too.  so i am afraid that z.m.l.
> will never "be able to" do what you are asking.  but...


Precisely my point. I was hoping that you were coming up with an 
algorithm that would identify _real_ paragraphs (as defined in the 
English language) inside unstructured text (human beings are pretty good 
at it, but even they are not infallible), and instead you have told me 
that you can detect blocks of random text which are delimited by blank 
lines (which we shall refer to as bns-paragraphs, for "bowerbird 
new-speak paragraphs") from inside of z.m.l.-structured text. The µBook 
ebook reader can do that right now (PG simplified text looks really good 
in µBook). I can't see why this particular wheel needs re-inventing.

From lee at novomail.net  Thu Aug 25 17:08:38 2005
From: lee at novomail.net (Lee Passey)
Date: Thu Aug 25 17:08:51 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 30)
In-Reply-To: <20050825230821.2C34F8C915@pglaf.org>
References: <20050825230821.2C34F8C915@pglaf.org>
Message-ID: <430E5D86.8030406@novomail.net>

Jon Noring <jon@noring.name> wrote:

>Lee Passey wrote:
>
>>I would really like to know what the rationale for this rule is.
>>    
>>
>I suggest that you repackage your inquiry and post it to TEI-L. That's
>where the bulk of the TEI experts, including those involved with TEI
>development, hang out. Sebastian Rahtz is one of the brilliant people
>there who is interested in a more constrained TEI subset for ebook
>use.
>
>   http://www.lsoft.com/scripts/wl.exe?SL1=TEI-L&H=LISTSERV.BROWN.EDU
>
>Maybe this time around the TEI mavens will be able to cogently explain
>(at the highest abstract level) why one should not have a <p> after a
><div> within another <div>, but can have a <p> before the <div>. I'm
>still quite perplexed, as it appears Marcello is as well.
>  
>

OK, so I guess I really don't want to know that badly. Everyone has to 
select from among those things they feel passionate about, which ones 
they are going to spend time working on, and for me this is not one of 
those things. Besides, if they're not going to listen to Mr. Perathoner, 
or Mr. Collin, or even, apparently, Mr. Rahtz, one more random voice 
crying from the wilderness is not going to make much difference.

Besides, being the anarchist that I am, my solution would be just to 
grab the TEI dtds from tei-c.org, rename them to "pg<whatever>.dtd", 
replace the one offending comma with a space, and then validate to that 
instead.
From Bowerbird at aol.com  Thu Aug 25 17:20:21 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 25 17:20:40 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <fe.1a6301d8.303fba45@aol.com>

lee said:
>    Sure, but you don't get to call a lemon purple, 
>    just because that's the way you see it.

and if i _do_ call a lemon purple?   or you call it chartreuse?
who cares?   if it ain't important in rendering the e-book,
it makes absolutely no difference what color anyone calls it.


>    I can't see why this particular wheel needs re-inventing.

and your obsession with paragraphs strikes me as being
as unimportant and trivial as the number of angels that
can dance on the head of a pin.

so, lee, are you up for our usability challenge on alice?
or is this little exercise your attempt to deflect attention?

-bowerbird
From jon at noring.name  Thu Aug 25 18:15:26 2005
From: jon at noring.name (Jon Noring)
Date: Thu Aug 25 18:15:41 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
In-Reply-To: <fe.1a6301d8.303fba45@aol.com>
References: <fe.1a6301d8.303fba45@aol.com>
Message-ID: <552287622.20050825191526@noring.name>

Bowerbird wrote:
> lee said:

>> I can't see why this particular wheel needs re-inventing.

> and your obsession with paragraphs strikes me as being
> as unimportant and trivial as the number of angels that
> can dance on the head of a pin.

It is NOT trivial as Lee explained.

During high-quality ebook presentation, an end-user may want to style
the output so true paragraphs are indented, but other blocks of
standalone stuff (non-paragraphs) are not indented (and maybe
presented with certain typographic differences.) So identifying the
structure we call a true paragraph is important.

Furthermore, the example given earlier of a block of verse within a
paragraph is very interesting and illustrates the importance of
identifying what is and is not a paragraph, and how far it extends.
I'll repeat the example here in "plain text":

[plain text example]
**********************************************************************
The cows swung placidly down the lane, and Anne followed them
dreamily, repeating aloud the battle canto from "Marmion" -- which had
also been part of their English course the preceding winter and which
Miss Stacy had made them learn off by heart -- and exulting in its
rushing lines and the clash of spears in its imagery. When she came to
the lines

      stubborn spearsmen still made good
      Their dark impenetrable wood,

she stopped in ecstasy to shut her eyes that she might the better
fancy herself one of that heroic ring. When she opened them again it
was to behold Diana coming through the gate that led into the Barry
field and looking so important that Anne instantly divined there was
news to be told. But betray too eager curiosity she would not.
**********************************************************************
(end of example text)

Now, it is clear that we have only one paragraph here, so if we rely
upon regularized text to perfectly structure that text, how does one
identify, without machine textual analysis (such as analysis of upper/
lower case, which in Bowerbird's writings would fail!) that the
portion starting with "she stopped in ecstasy..." is part of the prior
paragraph and not the start of a new paragraph? In the scenario where
the end-user wants paragraphs indented, if the last portion is
misidentified as being the start of a new paragraph, then "she stopped
in ecstasy..." will be indented -- not exactly the presentation result
desired. The reader will be distracted upon seeing such an obvious
"typo".
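The problem can be demonstrated with a small Python sketch. It assumes a renderer that splits on blank lines and treats indented blocks as quotations (as described for z.m.l. earlier in the thread); the lowercase-start rule is a hypothetical stand-in for the case analysis mentioned above, the function names are illustrative, and the passage is abridged from the example.

```python
import re

# Abridged from the example above; the "..." marks the omitted middle.
PASSAGE = (
    "The cows swung placidly down the lane, and Anne followed them\n"
    "dreamily, ... When she came to\n"
    "the lines\n"
    "\n"
    "      stubborn spearsmen still made good\n"
    "      Their dark impenetrable wood,\n"
    "\n"
    "she stopped in ecstasy to shut her eyes that she might the better\n"
    "fancy herself one of that heroic ring.\n"
)


def split_blocks(text: str) -> list[str]:
    """Split on blank lines, the only structural signal in plain text."""
    return [b for b in re.split(r"\n[ \t]*\n", text.strip()) if b.strip()]


def paragraph_starts(blocks: list[str], use_case_heuristic: bool) -> list[str]:
    """Return the first word of every block a renderer would indent."""
    starts = []
    for i, block in enumerate(blocks):
        if block.startswith((" ", "\t")):
            continue  # indented block: treat as a quotation, not a new paragraph
        if use_case_heuristic and i > 0 and block[:1].islower():
            continue  # lowercase start: guess it continues the previous paragraph
        starts.append(block.split()[0])
    return starts


if __name__ == "__main__":
    blocks = split_blocks(PASSAGE)
    # Without the heuristic, "she stopped ..." is wrongly treated as a new paragraph.
    print(paragraph_starts(blocks, use_case_heuristic=False))
    # With it, the passage is one paragraph, but all-lowercase prose now breaks.
    print(paragraph_starts(blocks, use_case_heuristic=True))
    lowercase_prose = split_blocks("i can recognize a paragraph.\n\ni just explained how.")
    print(paragraph_starts(lowercase_prose, use_case_heuristic=True))
```

The case rule fixes the Anne passage but merges two genuine paragraphs of all-lowercase prose, which is the kind of exception-chasing described in the next paragraph.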

It is clear that stand-alone regularized plain text (such as
following the ZML system -- there are other systems) is limited in the
number and type of document structures it can communicate to the
processor. Now obviously one can add textual analysis (case and
punctuation analysis which has to be language/country/era specific) to
try to handle "exceptions" (and no doubt, on closer thought, the number
of exceptions will turn out to be quite large), but now one requires
fairly complex AI-like analysis of the content itself (it will have to
be extensible to work for Han script, too!) to recover what would
otherwise be simple to communicate unambiguously with XML markup.

I suppose some would say that such fine structural detail capable in
XML is still not important. Fine. But as I said before, one gives up
the ability at fine (and essentially unlimited) structural (and
semantic) detail possibilities when relying on regularized plain text
to communicate document structure to a processor (human beings, if
they understand the language, can take regularized plain text and
discern finer structural detail by content and contextual analysis,
namely human intelligence.) There's only so much that can be done with
plain text when machine processed using today's level of AI.

Since the major players in PG and DP (this includes Greg Newby) appear
to be gung-ho on formatting the master etexts in XML, the burden is on
those advocating plain text solutions to demonstrate that regularized
plain text *is sufficient* for the purposes of high-quality
presentation. If a "tool" is built to demonstrate the supremacy of
regularized plain text, it better be able to handle the many
exceptions such as the one above -- when the end-user wants paragraphs
to be indented during presentation, "she stopped in ecstasy..." better
not be indented.

Jon Noring


From jon at noring.name  Thu Aug 25 17:25:48 2005
From: jon at noring.name (Jon Noring)
Date: Thu Aug 25 18:15:58 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 30)
In-Reply-To: <430E5D86.8030406@novomail.net>
References: <20050825230821.2C34F8C915@pglaf.org>
	<430E5D86.8030406@novomail.net>
Message-ID: <1433729628.20050825182548@noring.name>

Lee wrote:
> Jon Noring <jon@noring.name> wrote:

>> Maybe this time around the TEI mavens will be able to cogently explain
>> (at the highest abstract level) why one should not have a <p> after a
>> <div> within another <div>, but can have a <p> before the <div>. I'm
>> still quite perplexed, as it appears Marcello is as well.

> OK, so I guess I really don't want to know that badly. Everyone has to
> select from among those things they feel passionate about, which ones
> they are going to spend time working on, and for me this is not one of
> those things. Besides, if they're not going to listen to Mr. Perathoner,
> or Mr. Collin, or even, apparently, Mr. Rahtz, one more random voice
> crying from the wilderness is not going to make much difference.

Yes, definitely. Your comment reminds me of the Book of Ecclesiastes
in the Bible about the "right time" and "wrong time" to pursue
something.

Nevertheless, those working on TEI should subscribe to TEI-L and
casually follow the discussion threads. It's quite interesting.


> Besides, being the anarchist that I am, my solution would be just to
> grab the TEI dtds from tei-c.org, rename them to "pg<whatever>.dtd",
> replace the one offending comma with a space, and then validate to that
> instead.

This is my thought as well. I do think PG-TEI should, at least for its
"basic" vocabulary, pick a subset of TEI (subset the elements, the
attributes, and constrain/select particular attribute values), and
build a flat (non-modular) DTD for that (Pizza Chef will make this
almost trivial to do.) And certainly one could tweak the content model
(which should be rare) as seen fit (such as tweaking the content model
for <div> to include a final <p> after a child <div>.) The resulting
DTD won't be a pure subset of TEI P4X, but very close to it. So long
as each deviation from pure TEI P4X is well-documented and cogently
explained, I don't see anyone having a problem with that.
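Such a tweak might be declared as `<!ELEMENT div (p | div)*>` in a customized DTD. That declaration is a hypothetical example, not taken from TEI or from any existing PG-TEI DTD. The sketch below again approximates content models as regular expressions over child tag names (restricted to <p> and <div>) and checks an invented Arabian Nights-style p-div-p nesting against the stock (p*, div*) model and against that customization.

```python
import re
import xml.etree.ElementTree as ET

# Content models restricted to <p> and <div>, approximated as regular
# expressions over child tag names (each followed by a space).
STOCK_MODEL = re.compile(r"(p )*(div )*")     # TEI-style (p*, div*)
CUSTOM_MODEL = re.compile(r"((p|div) )*")     # hypothetical (p | div)*

# Invented Arabian Nights-style nesting: narration resumes after an embedded tale.
NESTED = """<div>
  <p>The frame story begins.</p>
  <div>
    <p>An embedded tale begins.</p>
    <div><p>A tale within the tale.</p></div>
    <p>The embedded tale resumes.</p>
  </div>
  <p>The frame story resumes.</p>
</div>"""


def every_div_matches(model: re.Pattern, xml_text: str) -> bool:
    """Check every <div>, at any depth, against the given content model."""
    root = ET.fromstring(xml_text)
    return all(
        model.fullmatch("".join(child.tag + " " for child in div)) is not None
        for div in root.iter("div")
    )


if __name__ == "__main__":
    print("stock model accepts nesting:", every_div_matches(STOCK_MODEL, NESTED))
    print("custom model accepts nesting:", every_div_matches(CUSTOM_MODEL, NESTED))
```

(p | div)* accepts exactly the same sequences as (p*, (div, p*)*), so it permits a final <p> after a child <div> at every level of nesting.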

Jon


From Bowerbird at aol.com  Thu Aug 25 18:40:03 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 25 18:40:21 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <e0.1a736f30.303fccf3@aol.com>

jon noring said:
>    It is NOT trivial as Lee explained.

as i explained, jon, i'm not speaking with you here.

as usual, all of your points can be addressed.
if anyone cares to hear them being addressed,
they will have to make the points themselves.

but i think most people just want this thread to
go poof...

-bowerbird
From j.hagerson at comcast.net  Thu Aug 25 19:33:24 2005
From: j.hagerson at comcast.net (John Hagerson)
Date: Thu Aug 25 19:38:52 2005
Subject: [gutvol-d] Copyright Renewals
Message-ID: <01b101c5a9e6$8b654810$0300a8c0@sarek>

Through my contacts with people as part of the CD and DVD distribution
effort, I had a recent contact with a person who was confused by something
that she found on the shelves of our library.

She found a book that she was interested in reading, but she could not
figure out how to download it. After some back and forth, I found that she
had apparently used one of the full text search engines which found a
citation of an interesting book within one of the copyright renewals files.

Of course, because the copyright has been renewed, this particular book will
not be available on PG for quite a while.

This may be an isolated instance. However, would it make sense to prevent
the full text search engines from indexing the copyright renewal files?
Would it make sense to add some text to the PG index page to indicate that
the copyright renewal files list books that are NOT in PG, rather than books
that are?
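If the renewal files were to be hidden from web search engines, the usual mechanism is a robots.txt file, which keeps well-behaved crawlers from fetching the listed paths (it cannot force a crawler to comply, and it does not affect PG's own search). A minimal sketch with Python's standard `urllib.robotparser` follows; the `/renewals/` path and the example.org URLs are hypothetical placeholders, not the real location of the files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/renewals/" is a placeholder path,
# not the actual location of PG's copyright renewal files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /renewals/
"""


def crawler_may_fetch(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a well-behaved crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in ("https://www.example.org/renewals/1950.txt",
                "https://www.example.org/etext/11"):
        print(url, crawler_may_fetch(url))
```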

Thank you.
John Hagerson


From joshua at hutchinson.net  Thu Aug 25 21:06:25 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Aug 25 19:44:53 2005
Subject: [gutvol-d] Copyright Renewals
In-Reply-To: <01b101c5a9e6$8b654810$0300a8c0@sarek>
References: <01b101c5a9e6$8b654810$0300a8c0@sarek>
Message-ID: <430E9541.5050406@hutchinson.net>

I don't think so.  Most people will understand that if you look in a 
book titled Copyright Renewals, it will have something to do with 
copyrights that have been renewed.  Just like *most* people know that 
even though you *can* put a coffee mug on that handy tray that pops out 
of the front of your computer, that isn't what it is *meant* for.

;)

Josh

John Hagerson wrote:

>Through my contacts with people as part of the CD and DVD distribution
>effort, I had a recent contact with a person who was confused by something
>that she found on the shelves of our library.
>
>She found a book that she was interested in reading, but she could not
>figure out how to download it. After some back and forth, I found that she
>had apparently used one of the full text search engines which found a
>citation of an interesting book within one of the copyright renewals files.
>
>Of course, because the copyright has been renewed, this particular book will
>not be available on PG for quite a while.
>
>This may be an isolated instance. However, would it make sense to prevent
>the full text search engines from indexing the copyright renewal files?
>Would it make sense to add some text to the PG index page to indicate that
>the copyright renewal files list books that are NOT in PG, rather than books
>that are?
>
>Thank you.
>John Hagerson

From cannona at fireantproductions.com  Thu Aug 25 19:46:19 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Aug 25 19:56:28 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
In-Reply-To: <e0.1a736f30.303fccf3@aol.com>
References: <e0.1a736f30.303fccf3@aol.com>
Message-ID: <6.2.1.2.0.20050825213542.04246f08@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 08:40 PM 8/25/2005, Bowerbird wrote:
>jon noring said:
> >   It is NOT trivial as Lee explained.
>
>as i explained, jon, i'm not speaking with you here.

Convenient.  I'll have to keep that in mind.  The next time someone asks
difficult questions which I don't wish to or can't answer, I'll simply find
something which that person has done that I don't agree with, and refuse to
talk to them on the basis of moral superiority.


>but i think most people just want this thread to
>go poof...

"Most people" being code for Bowerbird?


Sincerely
Aaron Cannon


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFDDoNNI7J99hVZuJcRAuGSAJ9PCb6NLnQRlMxsv+hsRBya93r1AACeMng6
Vro5FwGMCSJjugs5pjU16Pw=
=K6Wq
-----END PGP SIGNATURE-----
From prosfilaes at gmail.com  Thu Aug 25 20:06:39 2005
From: prosfilaes at gmail.com (David Starner)
Date: Thu Aug 25 20:13:23 2005
Subject: [gutvol-d] Copyright Renewals
In-Reply-To: <01b101c5a9e6$8b654810$0300a8c0@sarek>
References: <01b101c5a9e6$8b654810$0300a8c0@sarek>
Message-ID: <6d99d1fd0508252006bff13b9@mail.gmail.com>

On 8/25/05, John Hagerson <j.hagerson@comcast.net> wrote:
> She found a book that she was interested in reading, but she could not
> figure out how to download it. After some back and forth, I found that she
> had apparently used one of the full text search engines which found a
> citation of an interesting book within one of the copyright renewals files.

I don't see how this is limited to the renewals. We have a
bibliography of Talbot's works in PG, despite some of them not going
into the public domain for a while. I don't see that this type of
error will be common enough to take drastic measures against.
From joshua at hutchinson.net  Thu Aug 25 21:38:00 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu Aug 25 20:14:27 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,	Issue
	30)
In-Reply-To: <6.2.1.2.0.20050825213542.04246f08@mail.fireantproductions.com>
References: <e0.1a736f30.303fccf3@aol.com>
	<6.2.1.2.0.20050825213542.04246f08@mail.fireantproductions.com>
Message-ID: <430E9CA8.1000704@hutchinson.net>

Aaron Cannon wrote:

>
>> but i think most people just want this thread to
>> go poof...
>
>
> "Most people" being code for Bowerbird?
>
>

Close ...

"this thread" is code for "Bowerbird", actually.

From Morasch at aol.com  Thu Aug 25 20:17:02 2005
From: Morasch at aol.com (Morasch@aol.com)
Date: Thu Aug 25 20:22:28 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <77.4c6491df.303fe3ae@aol.com>

aaron said:
>    The next time someone asks difficult questions 
>    which I don't wish to or can't answer, I'll simply 
>    find something which that person has done that 
>    I don't agree with, and refuse to talk to them 
>    on the basis of moral superiority.

i wasn't claiming any "moral superiority".

i just think that to force the father of e-books
to undergo "moderation" on an e-book listserve
is too galling to let pass by without a reaction...

where was noring back in 1971, when michael
was typing in the declaration of independence?


>    "Most people" being code for Bowerbird?

actually, i don't mind the so-called "flamewars" a bit.
in case you haven't noticed.   in fact, some people have
accused me of being the _cause_ of them, haven't they?

so if you want to continue them, it's quite alright by me.
i buy the fire-retardant foam they use at the airports and
get a cheap rate because i buy it by the tanker-truckload,
so these little dances around the campfire are fun for me.

and aaron, if you really think jon has asked me some
"difficult questions" that "i don't wish to" or even "can't"
answer, you should step right up and ask them again.

because if you ask them to me, i _will_ answer them...

and i assure you i can answer the questions jon asked,
and that they are actually quite simplistic, not difficult,
and furthermore, that i would be positively _delighted_
to talk at great length about z.m.l. here on this listserve.

(but, like i said, i think most people on this listserve
-- and by that, i mean most people on this listserve --
wish these bowerbird/noring threads would go poof.
but again, if you disagree, let me know, and i'll post.)

i don't think it'd match the density of the x.m.l. threads,
because z.m.l. dispenses with unnecessary complexity,
but i could certainly give my best effort.   and i _am_
quite capable of going on and on, all by my lonesome,
as you also must have noticed by now.   so, by all means,
feel free to pull my string and i will post and post and post.

you might also want to know, though, that i will shortly
start to upload demo programs for your amusement
and edification -- it's pudding time, kids! -- so if you
would rather deal in reality (for a change) instead of
all this speculative, high-falutin' vapor-ware noise,
you might want to wait-and-see for a day-or-two...

-bowerbird
From Bowerbird at aol.com  Thu Aug 25 20:30:40 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Aug 25 20:31:00 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <1dc.4389ae30.303fe6e0@aol.com>

josh said:
>    "this thread" is code for "Bowerbird", actually.

now that's funny.   i like funny.              :+)

but most people want the _conflict_ to go poof.
and once i start delivering the pudding to y'all,
the reaction will be silence from the other side,
pure silence, so the conflict will indeed go away.

-bowerbird

p.s.   oops.   posted from my girlfriend's account.
but from me, without a doubt, that tone is unique.         :+)
From brad at chenla.org  Thu Aug 25 21:32:41 2005
From: brad at chenla.org (Brad Collins)
Date: Thu Aug 25 21:33:31 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d
	Digest, Vol 13, Issue 19)
In-Reply-To: <182290311.20050825095043@noring.name> (Jon Noring's message of
	"Thu, 25 Aug 2005 09:50:43 -0600")
References: <e3.1a59842a.303e3df6@aol.com>
	<908745031.20050824153847@noring.name> <pss2f4zh.fsf@chenla.org>
	<182290311.20050825095043@noring.name>
Message-ID: <ll2pfhcm.fsf@chenla.org>

Jon Noring <jon@noring.name> writes:

> Brad Collins wrote:

> Key to the Rosetta Stone is setting up a universal "metavocabulary"
> to describe common document structures. Now, I have no illusion this
> will be easy -- it will not be easy -- it will be damn hard to do
> right. Then there's the issue of the granularity of the metavocabulary
> -- how fine with document structure does one go -- and what types of
> documents will be targeted?

But it is possible as long as people can make the distinction between
description and meaning.  We might agree to call something the same
thing, but not agree on what it means.

This is a good thing to work towards. XHTML isn't a bad basic,
universal structural language, but it has no way of dealing with
semantic markup.  This is why DocBook and TEI are becoming
increasingly popular: they provide a way of semantically describing a
text.

A lot of semantic markup won't be displayed to the end user at all.
This is as it should be.  Wikipedia is a good example of over-linking
to articles which often do nothing to help explain the concept being
described by the article or are actually related terms.  Are most of
these links generated automatically?  The links in the jrank edition
of the 1911 Encyclopædia Britannica is another example of pointless
automatically generated links.

The main purpose of semantic tagging is for indexing and search.  If
all texts marked Personal Names, Place Names, Event Names, and names
of Works (books, serials, etc) we could build applications which would
provide a far richer user experience.

Search services could then provide fine-grained searching of a
particular text, rather than just pointing to a document.

I don't think that PG should take on the job of doing this kind of
markup... PG has enough on its plate as it is :)

> By and large CSS was not designed for the purpose of assigning
> structural semantics to tags. CSS does have the 'display' property
> which assigns, at a very rudimentary level, some critical structural
> semantics (block, inline, table, list). But as we know, the allowed
> 'display' values are quite limited -- they don't, and in practical
> sense cannot, assign some critical semantics such as hypertext links,
> embedded images and objects (XLink is the vocabulary-agnostic solution
> for these particular things.) There is no CSS 'display' property for
> section headers, for example (in CSS, a header has to be treated as
> simply a kind of "block-level" tag), yet it is clear for
> text-to-speech that section headers be specifically identified as
> such, and not lumped in with paragraphs.
>
> Then there's the issue that CSS is intended for *styling* during
> presentation (by and large visual styling). That is its purpose --
> it's not designed to be a "Rosetta Stone" for conveying detailed
> structural information.

I was thinking along the lines of creating a new CSS module which
would allow XPath as an alternative to CSS Selectors, and then create
semantic css elements.  I suppose you could declare at the beginning
of a style sheet it is semantic or style and which selector type you
want to use.

This is just off the top of my head but you could then have something
like this in the stylesheet (the XPATH is probably not correct):

//p/ string("Scrooge") {
  semantic-type : personal-name
  used-for      : Ebenezer Scrooge
  defined-by    : bxid://aut:OSE0-1157
  url           : http://chenla.org/blah/blah/Scrooge.html
  scope-note    : "Ebenezer Scrooge is the miserly old man visited by
                  the ghost of his dead partner Jacob Marley in
                  Charles Dickens's A Christmas Carol."
}
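
Brad notes that his XPath is probably not correct. In XPath 1.0 the usual way to select paragraphs whose text contains a string is //p[contains(., 'Scrooge')]. Python's standard xml.etree.ElementTree supports only a subset of XPath and has no contains(), so the minimal sketch below does the same filtering by hand and attaches the proposed properties to the matching paragraphs as attributes. The property names are taken from Brad's example and are hypothetical; they are not part of CSS or XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic properties, copied from the stylesheet sketch above.
# They are not part of CSS, XPath, or any other standard.
SEMANTICS = {
    "semantic-type": "personal-name",
    "used-for": "Ebenezer Scrooge",
}

DOC = """<body>
<p>Marley was dead, to begin with.</p>
<p>Scrooge signed it.</p>
</body>"""


def tag_paragraphs(xml_text: str, needle: str, props: dict) -> ET.Element:
    """Copy `props` onto every <p> whose full text contains `needle`.

    Same selection as the XPath 1.0 expression //p[contains(., needle)],
    done by hand because ElementTree's XPath subset has no contains().
    """
    root = ET.fromstring(xml_text)
    for p in root.iter("p"):
        # itertext() also picks up text inside child elements, like XPath's "."
        if needle in "".join(p.itertext()):
            p.attrib.update(props)
    return root


if __name__ == "__main__":
    tagged = tag_paragraphs(DOC, "Scrooge", SEMANTICS)
    print(ET.tostring(tagged, encoding="unicode"))
```

A real "semantic stylesheet" would need a processor that reads such rules from a file; here the single rule is hard-coded to keep the selection logic visible.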

I think it's an interesting idea, but it would require adoption by
each browser which is a long shot at best.  I think it's more
practical to define a master XML text and then let people apply markup
to a local copy which would act as a layer.  Multiple layers could be
merged together with something that works like diff and patch....

This would work just like ARCH (a version control system like CVS) --
all local copies are branches which you can keep private, merge with
the original, or branch the version into a new edition.
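
The layer idea can be illustrated, very roughly, with Python's standard difflib: the local markup is recorded as a delta against the master text, and the delta is replayed to rebuild the marked-up copy. This sketch covers only the storage half of the idea. difflib.ndiff deltas carry the unchanged lines as well as the changes, and merging several independent layers would need a real three-way merge of the kind a version control system provides. The sample lines and the TEI-style persName markup are illustrative.

```python
import difflib

# Illustrative master text (two lines from A Christmas Carol).
MASTER = [
    "Marley was dead, to begin with.\n",
    "Scrooge signed it.\n",
]

# A hypothetical local copy: same text, plus TEI-style name markup.
LOCAL = [
    "Marley was dead, to begin with.\n",
    "<persName>Scrooge</persName> signed it.\n",
]


def make_layer(master: list[str], local: list[str]) -> list[str]:
    """Record the local edits as an ndiff delta against the master text."""
    return list(difflib.ndiff(master, local))


def apply_layer(delta: list[str]) -> list[str]:
    """Rebuild the marked-up copy (the second sequence) from the delta."""
    return list(difflib.restore(delta, 2))


if __name__ == "__main__":
    delta = make_layer(MASTER, LOCAL)
    print("".join(delta), end="")
    # The same delta can regenerate either side of the comparison.
    assert apply_layer(delta) == LOCAL
    assert list(difflib.restore(delta, 1)) == MASTER
```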

b/

-- 
Brad Collins <brad@chenla.org>, Bangkok, Thailand
From gbnewby at pglaf.org  Thu Aug 25 23:24:33 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Aug 25 23:24:34 2005
Subject: [gutvol-d] Copyright Renewals
In-Reply-To: <01b101c5a9e6$8b654810$0300a8c0@sarek>
References: <01b101c5a9e6$8b654810$0300a8c0@sarek>
Message-ID: <20050826062433.GA9899@pglaf.org>

On Thu, Aug 25, 2005 at 09:33:24PM -0500, John Hagerson wrote:
> Through my contacts with people as part of the CD and DVD distribution
> effort, I had a recent contact with a person who was confused by something
> that she found on the shelves of our library.
> 
> She found a book that she was interested in reading, but she could not
> figure out how to download it. After some back and forth, I found that she
> had apparently used one of the full text search engines which found a
> citation of an interesting book within one of the copyright renewals files.

Hi, John.   This describes an FAQ we get to help@pglaf.org
at least once or twice per week.

Yes, it might make sense to set our robots.txt file
to eliminate these files from search engine indexing.
I'm cc'ing Marcello, so he can tell us whether this
seems worthwhile given our system stats and such.
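
A minimal sketch of the robots.txt suggestion, using Python's standard urllib.robotparser to check which URLs a well-behaved crawler would skip. The /renewals/ path is a hypothetical placeholder, not the actual location of the renewal files on gutenberg.org. Note also that robots.txt only asks compliant crawlers to stay away; it does not hide the files from anyone browsing the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/renewals/" is a placeholder path, not the real
# location of the copyright renewal files on www.gutenberg.org.
ROBOTS_TXT = """\
User-agent: *
Disallow: /renewals/
"""


def is_indexable(url: str, agent: str = "*") -> bool:
    """Return True if a crawler honouring ROBOTS_TXT may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.gutenberg.org/renewals/1950.txt",
                "http://www.gutenberg.org/"):
        print(url, "->", "indexable" if is_indexable(url) else "blocked")
```

The trade-off is the one raised later in the thread: a rule like this also stops people from finding the renewal files through search engines at all.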

  - Greg

> Of course, because the copyright has been renewed, this particular book will
> not be available on PG for quite a while.
> 
> This may be an isolated instance. However, would it make sense to prevent
> the full text search engines from indexing the copyright renewal files?
> Would it make sense to add some text to the PG index page to indicate that
> the copyright renewal files list books that are NOT in PG, rather than books
> that are?
> 
> Thank you.
> John Hagerson
From jmdyck at ibiblio.org  Fri Aug 26 00:57:07 2005
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Fri Aug 26 00:58:13 2005
Subject: [gutvol-d] Re: PT1 Weekly Project Gutenberg Newsletter
References: <Pine.LNX.4.60.0508240958150.13907@pglaf.org>
Message-ID: <430ECB53.76F6699D@ibiblio.org>

Michael Hart wrote:
> 
> Weekly_August_24.txt
>
> >>>   !!!17,000+ eBooks at http://www.gutenberg.org as of today!!!   <<<

As I understand it (from part 2 of the newsletter), yesterday's count of
17020 includes 476 from PG-Australia that (for copyright reasons) are
not available from http://www.gutenberg.org (nor are they in its
catalog). If that's the case, then it's somewhat misleading to say
"17,000+ eBooks at http://www.gutenberg.org". Instead, "17,000+ eBooks
available from Project Gutenberg" would be sufficiently vague to cover
both PG-US and PG-Aus.
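
Spelling out the arithmetic behind this point: if 476 of the 17,020 titles are PG-Australia texts not served from gutenberg.org, the number actually available there falls just short of 17,000:

```latex
\[
17020 - 476 = 16544 < 17000
\]
```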

-Michael Dyck
From marcello at perathoner.de  Fri Aug 26 02:39:10 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Aug 26 02:39:41 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,	Issue
	30)
In-Reply-To: <e0.1a736f30.303fccf3@aol.com>
References: <e0.1a736f30.303fccf3@aol.com>
Message-ID: <430EE33E.9090406@perathoner.de>

Bowerbird@aol.com wrote:

> but i think most people just want this thread to
> go poof...

Guess who started this thread?

Think about that before you start another one like this.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Fri Aug 26 03:01:42 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Aug 26 03:02:10 2005
Subject: [gutvol-d] Copyright Renewals
In-Reply-To: <01b101c5a9e6$8b654810$0300a8c0@sarek>
References: <01b101c5a9e6$8b654810$0300a8c0@sarek>
Message-ID: <430EE886.8000102@perathoner.de>

John Hagerson wrote:

> She found a book that she was interested in reading, but she could not
> figure out how to download it. After some back and forth, I found that she
> had apparently used one of the full text search engines which found a
> citation of an interesting book within one of the copyright renewals files.

> This may be an isolated instance. However, would it make sense to prevent
> the full text search engines from indexing the copyright renewal files?

That would make the copyright renewal files unusable for those who 
wanted to search them. Besides, that is another thing that would have to 
be maintained by hand ... better write it off as pilot error (and 
remember the story for the next time you need a good one.)


> Would it make sense to add some text to the PG index page to indicate that
> the copyright renewal files list books that are NOT in PG, rather than books
> that are?

The note would be displayed on this page:

   http://www.gutenberg.org/etext/11800

but, if she got to that page, I don't see how she could possibly have 
mistaken the book for something else, even without a note.

I guess she just looked at the file in the Google cache and read the PG 
header.


-- 
Marcello Perathoner
webmaster@gutenberg.org
From marcello at perathoner.de  Fri Aug 26 03:16:23 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Aug 26 03:16:46 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,	Issue
	30)
In-Reply-To: <77.4c6491df.303fe3ae@aol.com>
References: <77.4c6491df.303fe3ae@aol.com>
Message-ID: <430EEBF7.6010305@perathoner.de>

Morasch@aol.com wrote:

Who might this new guy be, who walks like a bowerbird and quacks like a 
bowerbird?


> where was noring back in 1971, when michael
> was typing in the declaration of independence?

Where was "Morasch" ?


> i don't think it'd match the density of the x.m.l. threads,
> because z.m.l. dispenses with unnecessary complexity,

It dispenses with everything else too, like, it dispenses with a first 
implementation after 2.5 years of announcements.


> you might also want to know, though, that i will shortly
> start to upload demo programs for your amusement

I know. You already said that in February 2003:

> -- for immediate release --
> 
> date:  14 february 2003
> dateline:  los angeles, california
> contact:  bowerbird intelligentleman
> bowerbird@aol.com  310.980.9202
> 
> bowerbird intelligentleman announces
> an open-source project geared toward
> creating an o.e.b. "presentation system",
> i.e., a cross-platform reader-program
> that will allow users to read o.e.b files. 
> 
> [...]
> 
> bowerbird further indicated that he is fully confident that
> the effort would bear fruit quickly, since he has previously
> programmed a wide variety of electronic-book applications. 

... released a vast array of fruitless announcements.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Fri Aug 26 03:56:11 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Aug 26 03:56:35 2005
Subject: [gutvol-d] The Bowerbird
Message-ID: <430EF54B.7010905@perathoner.de>

The Bowerbird

   "Bower-" said I, "-bird of evil! -- Bore and wastrel, fiend and devil!
   Get thee back into thy bower at the Californian shore!
   Be today your time of parting, get a life!" I shrieked, upstarting --
   "We are tired of your blurting! -- Quit! and post here nevermore!
   Will you -- *will* you leave this listserve? -- tell me -- tell me, I 
implore!"
                                            Quoth the Raven, "Nevermore."


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Fri Aug 26 08:41:42 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Aug 26 08:41:55 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <81.2e9e2b07.30409236@aol.com>

marcello said:
>    Guess who started this thread?

i don't need to "guess".

the phrase "unluckily for us" was the 
beginning of an assertion made by lee.
that's what "started this thread".

but the real reason people get fed up
is no-content posts like yours here...

think about that before you make another.

-bowerbird
From distributedmel at gmail.com  Fri Aug 26 09:46:30 2005
From: distributedmel at gmail.com (Melissa Er-Raqabi)
Date: Fri Aug 26 09:54:35 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
In-Reply-To: <81.2e9e2b07.30409236@aol.com>
References: <81.2e9e2b07.30409236@aol.com>
Message-ID: <a5a2a50205082609464b5eff4@mail.gmail.com>

Is there a reason your text needs to be so giant lately, Bowerbird?
It's really quite intrusively large.

Melissa

On 8/26/05, Bowerbird@aol.com <Bowerbird@aol.com> wrote:
> marcello said:
>  >   Guess who started this thread?
>  
>  i don't need to "guess".
>  
>  the phrase "unluckily for us" was the 
>  beginning of an assertion made by lee.
>  that's what "started this thread".
>  
>  but the real reason people get fed up
>  is no-content posts like yours here...
>  
>  think about that before you make another.
>  
>  -bowerbird
>  
From Bowerbird at aol.com  Fri Aug 26 12:42:07 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Aug 26 12:42:20 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <9f.65fd5580.3040ca8f@aol.com>

melissa said:
>    Is there a reason your text 
>    needs to be so giant lately Bowerbird?
>    It's really quite intrusively large.

sorry.   it's a new version of the a.o.l. client,
and i don't seem to be able to turn off .html.

please accept my apologies...

i too long for the days of plain-text...        :+(

-bowerbird
From marcello at perathoner.de  Fri Aug 26 13:04:09 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Aug 26 13:04:20 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,	Issue
	30)
In-Reply-To: <9f.65fd5580.3040ca8f@aol.com>
References: <9f.65fd5580.3040ca8f@aol.com>
Message-ID: <430F75B9.1080705@perathoner.de>

Bowerbird@aol.com wrote:

>>   Is there a reason your text 
>>   needs to be so giant lately Bowerbird?
>>   It's really quite intrusively large.
> 
> sorry.   it's a new version of the a.o.l. client,
> and i don't seem to be able to turn off .html.

You don't have many marbles, do you?


If the AOL version of Thunderbird is crippled, then go to:

   http://www.mozilla.org/products/thunderbird/all.html

and download the official version of Thunderbird for Mac OS X.


And stop blaming everything that went wrong for you on XML and AOL.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Fri Aug 26 13:06:37 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Aug 26 13:07:01 2005
Subject: [gutvol-d] TEI 'floating' divs
Message-ID: <430F764D.6090505@perathoner.de>

If anybody wants to learn more about the "<p> followed by <div> is
invalid" kind of problem, just subscribe to the TEI-L list. They are at it
again ...

To subscribe to the TEI-L mailing-list:

   http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1


-- 
Marcello Perathoner
webmaster@gutenberg.org


From cannona at fireantproductions.com  Fri Aug 26 13:23:44 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Aug 26 13:28:21 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
In-Reply-To: <77.4c6491df.303fe3ae@aol.com>
References: <77.4c6491df.303fe3ae@aol.com>
Message-ID: <6.2.1.2.0.20050826150712.042f3fd0@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 10:17 PM 8/25/2005, you wrote:

>i wasn't claiming any "moral superiority".
>
>i just think that to force the father of e-books
>to undergo "moderation" on an e-book listserve
>is too galling to let pass by without a reaction...

Could have fooled me.


>where was noring back in 1971, when michael
>was typing in the declaration of independence?

Awe, I see.  So in order to be able to criticize anything, you must be the
founder?


<snip>
>and aaron, if you really think jon has asked me some
>"difficult questions" that "i don't wish to" or even "can't"
>answer, you should step right up and ask them again.

I'd rather not play the Bowerbird game.  Kannons and Katapults is much more
enjoyable thanks.  You have the questions.  You will either answer them or
you won't.  Should you choose the latter, it certainly won't have been the
first time.


>because if you ask them to me, i _will_ answer them...

That's ok.  I'll let you get back to your work.  Careful, your pudding is
burning.  Of course that's only to be expected when you cook it for 2+ years.


Sincerely
Aaron Cannon


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFDD3tXI7J99hVZuJcRAvRoAKDxhVMmbYfERT6CG6WjT+C3ZEGjMQCcD/sx
CB60WUOhJvJ0VQCjhSdXJes=
=Wt87
-----END PGP SIGNATURE-----
From Bowerbird at aol.com  Fri Aug 26 14:01:09 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Aug 26 14:01:33 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <42.6f844ae5.3040dd15@aol.com>

aaron said:
>    Awe, I see.  So in order to be able to criticize anything,
>    you must be the founder?

um, no.

does that gem really represent your capacity for inference?

if so, then i'll be happy to explain it to you:
anyone should be "able to criticize anything",
as you put it; and the very act of censorship
(or "moderation", as it is sometimes called),
puts a chill on that.   furthermore, if _anyone_
is to be "given a pass" and allowed to speak
freely, it should be the _leading_pioneer_...

for _michael_hart_ to be "moderated" on a
listserve for _electronic_books_ is laughable.

it is as humorous as sprinkling sugar on pizza.


>    I'd rather not play the Bowerbird game.
>    Kannons and Katapults is much more enjoyable thanks.
>    You have the questions.  You will either answer them
>    or you won't.  Should you choose the latter,
>    it certainly won't have been the first time.

you don't really want the answers to the questions,
because you'd rather say i don't have the answers.

you don't have any intellectual curiosity about this,
you just have a desire to try and make me look bad.

and you'd rather say my programs are vaporware
than to simply go and download them, wouldn't you?

i'm just not clear how this helps your _integrity_, aaron.

even if you were to ignore how they make you look,
don't the lies coming out of your mouth _bother_ you?


>    That's ok.  I'll let you get back to your work.
>    Careful, your pudding is burning.  Of course
>    that's only to be expected when you 
>    cook it for 2+ years.

no problem.   new batch up soon.

-bowerbird
From hart at pglaf.org  Fri Aug 26 14:26:15 2005
From: hart at pglaf.org (Michael Hart)
Date: Fri Aug 26 14:26:17 2005
Subject: [gutvol-d] Re: PT1 Weekly Project Gutenberg Newsletter
In-Reply-To: <430ECB53.76F6699D@ibiblio.org>
References: <Pine.LNX.4.60.0508240958150.13907@pglaf.org>
	<430ECB53.76F6699D@ibiblio.org>
Message-ID: <Pine.LNX.4.60.0508261425220.26790@pglaf.org>


Fixed in both Newsletter and my sig block,
thanks for pointing out the errors.

Michael


On Fri, 26 Aug 2005, Michael Dyck wrote:

> Michael Hart wrote:
>>
>> Weekly_August_24.txt
>>
>>>>>   !!!17,000+ eBooks at http://www.gutenberg.org as of today!!!   <<<
>
> As I understand it (from part 2 of the newsletter), yesterday's count of
> 17020 includes 476 from PG-Australia that (for copyright reasons) are
> not available from http://www.gutenberg.org (nor are they in its
> catalog). If that's the case, then it's somewhat misleading to say
> "17,000+ eBooks at http://www.gutenberg.org". Instead, "17,000+ eBooks
> available from Project Gutenberg" would be sufficiently vague to cover
> both PG-US and PG-Aus.
>
> -Michael Dyck
From Bowerbird at aol.com  Fri Aug 26 15:09:33 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Aug 26 15:09:47 2005
Subject: [gutvol-d] speaking of pudding, banana-cream is ready
Message-ID: <ae.7872aed5.3040ed1d@aol.com>

speaking of pudding, "banana-cream" is ready.

greg asked, a while back, for demonstrations of
how online page-scans might be incorporated
into the e-book experience.   this is one entry.

i used the text of jon noring's "my antonia" version,
which can be obtained from the openreader site:
      http://www.openreader.org/myantonia

the text is presented to the end-user, page by page.
the end-user can then just click a button to download
the scan-image for a specific page to be displayed too.
(for instance, if they wanted to check a possible error.)

the scan-image is retrieved from the openreader site. 

when an image is downloaded, it's stored on your machine,
and any further accesses are made using that local copy...

the option is also given to download _all_ of the images
in a batch, in either a foreground or background process.
that's what a pre-processor or post-processor from d.p.
might do, so they could work throughout the entire book.
an end-user might also choose to download all the scans,
if they wanted to see how the physical book itself looked.

(to do this another way for the demo, download the .zip file of 
all the image-scans from the openreader site to begin with.)

i also threw in text-to-speech, because it was easy, so
the computer can read you the text as you view the scan.
text-to-speech _does_ work on newer p.c. systems, but
might not work automatically on older ones, i'm not sure.
of course, macs have had text-to-speech for over a decade.

this is a demo-app, so it's not polished, and has no docs.
if people like it, and ask me to polish it up, i certainly can.

but i anticipate that the reaction from here will be silence.
sorry to spoil your "bowerbird makes vaporware" fun...

banana-cream currently runs on the mac, from os8.1 up,
including o.s.x., and on windows, from windows98 up.

if there are any linux users who want to alpha-test for me,
speak up.

i'll be uploading banana-cream to the web next week;
but anyone who would like to use it before then can
backchannel me for a preview copy...

screenshots:
>    http://snowy.arsc.alaska.edu/bowerbird/bc--1.jpg
>    http://snowy.arsc.alaska.edu/bowerbird/bc-03.jpg
>    http://snowy.arsc.alaska.edu/bowerbird/bc003.jpg

(i put one error on each scan, for "where's waldo" fans.)

-bowerbird
From cannona at fireantproductions.com  Fri Aug 26 16:52:59 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Aug 26 16:57:59 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
In-Reply-To: <42.6f844ae5.3040dd15@aol.com>
References: <42.6f844ae5.3040dd15@aol.com>
Message-ID: <6.2.1.2.0.20050826163545.04420e70@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 04:01 PM 8/26/2005, you wrote:
>aaron said:
> >   Awe, I see.  So in order to be able to criticize anything,
> >   you must be the founder?
>
>um, no.
>
>does that gem really represent your capacity for inference?
>
>if so, then i'll be happy to explain it to you:
>anyone should be "able to criticize anything",
>as you put it; and the very act of censorship
>(or "moderation", as it is sometimes called),
>puts a chill on that.  furthermore, if _anyone_
>is to be "given a pass" and allowed to speak
>freely, it should be the _leading_pioneer_...
>
>for _michael_hart_ to be "moderated" on a
>listserve for _electronic_books_ is laughable.
>
>it is as humorous as sprinkling sugar on pizza.

Nice try.  You asked the question "where was noring back in 1971, when
Michael was typing in the declaration of independence?"  The obvious answer
to this is "not there."  The purpose of this rhetorical question was
clearly to point out that Mr. Noring is some how unqualified to speak on
the matter due to his absence from such a momentous occasion.  It doesn't
take all that much inference, unless of course one attempts to spin what
you said to match your argument.


>you don't really want the answers to the questions,
>because you'd rather say i don't have the answers.

And you'd rather avoid the questions than provide the answers.


>you don't have any intellectual curiosity about this,
>you just have a desire to try and make me look bad.

Exactly.  I have no interest in any of this.  The only reason I subscribe
to the list and read the messages is so that I can torment you.  My life
revolves around Bowerbird.  If it weren't for you, I don't think I'd even
have an e-mail address.


>and you'd rather say my programs are vaporware
>than to simply go and download them, wouldn't you?

Awe yes.  Please provide me with a link so I can download these
revolutionary applications.


>i'm just not clear how this helps your _integrity_, aaron.

>even if you were to ignore how they make you look,
>don't the lies coming out of your mouth _bother_ you?
>
Yes they do.  I'm sorry everyone.  I confess.  It was I, not Bowerbird who
told the world two years ago that I was going to release a really neat
program that would make you eat your own children with envy; then I never
did.  Please forgive me.


Aaron Cannon


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFDD6x5I7J99hVZuJcRAstZAJ9Lv2maoXWH+HCXR7OIGmOT0LapqACfZcQi
6WUWo3gy7kjfgLBbNxdQopI=
=OCoz
-----END PGP SIGNATURE-----
From Bowerbird at aol.com  Fri Aug 26 17:12:00 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Aug 26 17:12:18 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <1ff.8b1bd7d.304109d0@aol.com>

aaron said:
>    The purpose of this rhetorical question was
>    clearly to point out that Mr. Noring is somehow 
>    unqualified to speak on the matter 

no.   again, your problems with inference manifest.

the point is that mr. noring should be _embarrassed_ to 
subject michael hart to moderation on an e-book listserve.

michael hart pioneered e-books.   he should be allowed to
speak, to speak freely, and to speak as much as he wants.

that's _my_ opinion anyway.   and i feel strongly enough
about it that i don't care to speak to jon here, thank you.

i am not telling you that _you_ can't speak to him here,
or that _he_ can't speak here.   i'm just saying that i don't
care to speak to him, so i will be ignoring any questions
that he asks in his posts.   but again, if _you_ want the
answers to any of his questions, just ask me yourself.


>    Awe yes.  Please provide me with a link 
>    so I can download these revolutionary applications.

well, i've given it numerous times already,
but i'm happy to give it to you again, aaron.

join the zml_talk listserve at yahoogroups, by sending
an e-mail to zml_talk-subscribe@yahoogroups.com
and you will find the zml-viewer in the "files" section.

banana-cream will be available for download next week.
or, if you'd like to see it sooner, just request it backchannel.

if you have any more questions, save them for next week.
as i've said, it's rude to flood the listserve on the weekend.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050826/4eeb13c6/attachment.html
From cannona at fireantproductions.com  Fri Aug 26 17:25:57 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Fri Aug 26 17:43:21 2005
Subject: [gutvol-d] Greg, please read
Message-ID: <6.2.1.2.0.20050826192348.04418fa0@mail.fireantproductions.com>

Please check your spam folder for a message from mdelato@gmail.com.  She is
one of our CD/DVD volunteers who has been attempting to reach you without 
success.

Thanks and sorry everyone else for the interruption.

Sincerely
Aaron Cannon


--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail 
address.)  

From joshua at hutchinson.net  Fri Aug 26 20:40:01 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Aug 26 19:17:14 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,	Issue
	30)
In-Reply-To: <1ff.8b1bd7d.304109d0@aol.com>
References: <1ff.8b1bd7d.304109d0@aol.com>
Message-ID: <430FE091.2010007@hutchinson.net>

Bowerbird@aol.com wrote:

> that's _my_ opinion anyway.  and i feel strongly enough
> about it that i don't care to speak to jon here, thank you.


Oh, oh, oh.  I'm censoring Michael Hart.  I won't let him speak at my 
family reunion unless he submits his speech to an oversight committee.

Please, does that mean you won't talk to me, either?  Please.

(Note to everyone else:  If this works, feel free to steal my idea and 
censor Michael to shut up bowerbird!)

Josh
From gbnewby at pglaf.org  Fri Aug 26 19:43:54 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Aug 26 19:43:56 2005
Subject: [gutvol-d] Greg, please read
In-Reply-To: <6.2.1.2.0.20050826192348.04418fa0@mail.fireantproductions.com>
References: <6.2.1.2.0.20050826192348.04418fa0@mail.fireantproductions.com>
Message-ID: <20050827024354.GB1238@pglaf.org>

On Fri, Aug 26, 2005 at 07:25:57PM -0500, Aaron Cannon wrote:
> Please check your spam folder for a message from mdelato@gmail.com.  She is
> one of our CD/DVD volunteers who has been attempting to reach you without 
> success.
> 
> Thanks and sorry everyone else for the interruption.

Oops....sorry.  Done!
  -- Greg

> Sincerely
> Aaron Cannon
> 
> 
> 
> --
> E-mail: cannona@fireantproductions.com
> Skype: cannona
> MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail 
> address.)  
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
From cannona at fireantproductions.com  Sat Aug 27 12:13:02 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat Aug 27 12:14:35 2005
Subject: [gutvol-d] Re: unluckily for us (gutvol-d Digest, Vol 13,
	Issue 30)
Message-ID: <6.2.1.2.0.20050827141248.035ea208@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 07:12 PM 8/26/2005, you wrote:

>no.  again, your problems with inference manifest.
>
>the point is that mr. noring should be _embarrassed_ to
>subject michael hart to moderation on an e-book listserve.

Ok.  I am sure we can all use our imaginations for this one and make your
interpretation fit.


>michael hart pioneered e-books.  he should be allowed to
>speak, to speak freely, and to speak as much as he wants.
>
>that's _my_ opinion anyway.  and i feel strongly enough
>about it that i don't care to speak to jon here, thank you.
>
>i am not telling you that _you_ can't speak to him here,
>or that _he_ can't speak here.  i'm just saying that i don't
>care to speak to him, so i will be ignoring any questions
>that he asks in his posts.  but again, if _you_ want the
>answers to any of his questions, just ask me yourself.

Again, I am curious, but I will not ask you.  I'm not interested in playing
the Bowerbird Game.  If you do not wish to answer serious questions about
your project, why should I or anyone else care?

It's like this: you have two people, X and Y, and person X designs Z.

X: "You should really use Z. I designed it and it will solve all of the
problems that you have."
Y: "Oh really X?  If I feed A and B to Z, will I get output C?"
X: "I'm sorry I can't answer that question because you called me X and not
Mr. X."
Y: "Yeah, but what about D and E and F?  How does project Z handle those?"
X: "hmm hmm hmm lalala not listening; not listening."
Y: "Fine, I'll use project W instead.  It's been around longer, has better
documentation, and more extensive testing."
X: "You're just trying to make me look bad!  Project W is stupid and
everyone knows it.  You are nothing but a vocal minority.  I am the only
one who knows what everyone else wants."

It's not a perfect analogy, but pretty dang close.


> >   Awe yes.  Please provide me with a link
> >   so I can download these revolutionary applications.
>
>well, i've given it numerous times already,
>but i'm happy to give it to you again, aaron.
>
>join the zml_talk listserve at yahoogroups, by sending
>an e-mail to zml_talk-subscribe@yahoogroups.com
>and you will find the zml-viewer in the "files" section.

That's not a link.  A link generally, but not always, begins with http://....

Do you keep it locked up under the files section to control its
distribution?  That doesn't exactly sound like a distribution model that Project
Gutenberg is likely to embrace.  Since you've worked on so many ebook-related
apps in the past, it doesn't seem like it would be much of a
challenge for you to find somewhere to host the files.


>banana-cream will be available for download next week.
>or, if you'd like to see it sooner, just request it backchannel.

No thanks.  I did, however, take a look at the give program.  It could use
some redesign as far as the pull-down menus are concerned.  I've never seen
so many in my life.  Also, I can tell you for sure that it is not very
accessible for a blind user.  By not very accessible, I mean that I would
never read a book in it, although I could read a page if I had to.

It's too bad really.  You've got some half-decent ideas here.  However, I
can promise you they will never be embraced by anyone if you don't
radically alter your social conduct.


>if you have any more questions, save them for next week.
>as i've said, it's rude to flood the listserve on the weekend.

Oh that's classic stuff!  That's right Bowerbird, we wouldn't want to flood
the listserve.  Could this be simply another means of avoiding difficult
questions?


Sincerely
Aaron Cannon


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFDELuQI7J99hVZuJcRAvrxAKCeV0w/19oL0c8uHAXqp7B/KcoxNgCffF86
yneUAn5plrhQWFe762xvKWc=
=uLAi
-----END PGP SIGNATURE-----
From Bowerbird at aol.com  Sat Aug 27 16:02:15 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 27 16:02:32 2005
Subject: [gutvol-d] how can you
Message-ID: <74.5ae1629b.30424af7@aol.com>

aaron said:
>   It's too bad really.  You've got some half-decent ideas here.  
>    However, I can promise you they will never be embraced 
>    by anyone if you don't radically alter your social conduct.

how can you have any pudding if you don't eat your meat?

-bowerbird
From JBuck814366460 at aol.com  Sat Aug 27 19:31:07 2005
From: JBuck814366460 at aol.com (Jared Buck)
Date: Sat Aug 27 19:31:25 2005
Subject: [gutvol-d] how can you
In-Reply-To: <74.5ae1629b.30424af7@aol.com>
References: <74.5ae1629b.30424af7@aol.com>
Message-ID: <431121EB.8080407@aol.com>

An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050827/c98e9a60/attachment.html
From cannona at fireantproductions.com  Sun Aug 28 12:00:24 2005
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun Aug 28 12:01:31 2005
Subject: [gutvol-d] how can you
In-Reply-To: <74.5ae1629b.30424af7@aol.com>
References: <74.5ae1629b.30424af7@aol.com>
Message-ID: <6.2.1.2.0.20050828135916.0453d298@mail.fireantproductions.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 06:02 PM 8/27/2005, you wrote:

>how can you have any pudding if you don't eat your meat?


"Full of sound and fury, signifying nothing."

Aaron Cannon


- --
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFDEgoBI7J99hVZuJcRAg46AJ0Yp8nj4nDljGWF8bQQp0fzDFe3xwCg4Ea2
so1COFZkVW16/yj6Hkd+JhU=
=inxw
-----END PGP SIGNATURE-----
From Bowerbird at aol.com  Mon Aug 29 13:37:48 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Aug 29 13:38:03 2005
Subject: [gutvol-d] reinventing the wheel
Message-ID: <1ad.3daf93f5.3044cc1c@aol.com>

lee said:
>   The µBook ebook reader can do that right now 
>    (PG simplified text looks really good in µBook). 

so µbook has upgraded the project gutenberg e-texts
from "impoverished text format" to "p.g. simplified text"?
that's good news to hear.   imagine what a more powerful
viewer-app might do for the reputation of these e-texts...


>    (PG simplified text looks really good in µBook). 
>    I can't see why this particular wheel needs re-inventing.

you know, lee, over the many years, i've heard you
say a lot of smart things.   and a few dumb things too.
this statement has to be one of your dumbest ever...

you point me -- a mac user -- to a pc-only program,
and then suggest that by writing the same type of app,
i am "reinventing the wheel"?    what a silly thing to say.

but let me reassure you that i am not stopping there at 
the viewer-program.   no sir, that's only the first step...

if you take a look at my viewer now, it already contains
several features that give the reader a sense of mastery
over the document, like a multiple-term search routine.

and i'll extend those features greatly, offering things like
automatic generation of a concordance and kwic feature.

none of these features is available in any other viewer.

but as i said, i'm not stopping there...

i've gone to the next step -- writing the authoring-tool --
a number of times.   that's the more important program;
until someone creates e-books, readers can't read 'em.

and tomorrow, i will upload "banana-cream", a demo that
shows how a reader might incorporate on-line scans into
their e-book experience.   i expect people who know how to
digitize paper-books will realize this app can help them too.

but i have gone way past that step as well.   my research
has examined the question of how to put many thousands
of e-books on a person's machine in a manageable way
which maximizes the potential obtained from all of them.
(this arena includes realms like searching and interlinking.)

i have also studied the best way to form and use a "database"
of the various things that a person has read on their computer
(not just e-books, but e-mail and websites and what have you),
so you can quickly and easily pull out anything you have read,
even if you don't remember when you read it, or its source...

this power to recall anything you've ever read (provided that
you read it on your computer) will be a revolutionary spark...

so i'm not "reinventing the wheel".   i'm reinventing 18 of 'em,
and putting them under a big rig, so i can roll down the road...

i'm not standing still, lee.   you shouldn't either...

-bowerbird
From Bowerbird at aol.com  Wed Aug 31 13:50:29 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 31 13:50:48 2005
Subject: [gutvol-d] banana-cream
Message-ID: <24.778a6d43.30477215@aol.com>

i've decided not to take the banana-cream demo public yet.

the current version works on jon noring's "my antonia".

i've started work on a version that uses another e-text,
which will be submitted very soon to project gutenberg,
and that's the demonstration that i will take public first...

however, if you'd like to take a look at the current version,
just request it from me backchannel.   (tell me your o.s.)

have a nice day...

-bowerbird

p.s.   if anyone wants to smoothread an advance-copy of
this other e-text prior to its being submitted, let me know.
it's a space-story set in the future, by meyer moldeven...

From joshua at hutchinson.net  Wed Aug 31 13:54:35 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Aug 31 13:54:44 2005
Subject: [gutvol-d] banana-cream
Message-ID: <20050831205435.4FE89EDFF1@ws6-1.us4.outblaze.com>


----- Original Message -----
From: Bowerbird@aol.com
> 
> i've decided not to take the banana-cream demo public yet.
> 


I am absolutely shocked by this.  Shocked and surprised and dismayed.

Who would have thought bowerbird would disappoint us on a promised release?

Josh
From Bowerbird at aol.com  Wed Aug 31 14:55:02 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 31 14:55:16 2005
Subject: [gutvol-d] banana-cream
Message-ID: <7b.4c77fed6.30478136@aol.com>

josh said:
>    I am absolutely shocked by this.  
>    Shocked and surprised and dismayed.

gee, gosh, i didn't know you cared so much!


>    Who would have thought bowerbird would 
>    disappoint us on a promised release?

well, i don't think anyone is "disappointed".
but if they are, they can just request a copy
of the program backchannel.   and _voila!_

-bowerbird
From marcello at perathoner.de  Wed Aug 31 15:03:13 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Aug 31 15:03:44 2005
Subject: [gutvol-d] banana-cream
In-Reply-To: <24.778a6d43.30477215@aol.com>
References: <24.778a6d43.30477215@aol.com>
Message-ID: <43162921.8070001@perathoner.de>

Bowerbird@aol.com wrote:

> i've decided not to take the banana-cream demo public yet.
> 
> the current version works on jon noring's "my antonia".
> 
> i've started work on a version that uses another e-text,

If I understand this right, the etext is hard-coded into the reader?

So you'll have to program a reader for each of the 16,000 ebooks?

And that from the guy that always said marking up was too much trouble.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Wed Aug 31 15:08:48 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Aug 31 15:08:59 2005
Subject: [gutvol-d] banana-cream
In-Reply-To: <7b.4c77fed6.30478136@aol.com>
References: <7b.4c77fed6.30478136@aol.com>
Message-ID: <43162A70.9040808@perathoner.de>

Bowerbird@aol.com wrote:

> well, i don't think anyone is "disappointed".
> but if they are, they can just request a copy
> of the program backchannel.   and _voila!_

I request a copy of your program for linux.

Please mail it to me asap.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From Mark_Davies at byu.edu  Wed Aug 31 15:04:05 2005
From: Mark_Davies at byu.edu (Mark Davies)
Date: Wed Aug 31 15:16:15 2005
Subject: [gutvol-d] Original publication dates?
Message-ID: <CA8D39028955ED41AFCB2B50C54BA7E9018DB492@THORN.exch.ad.byu.edu>

Sorry if this is a FAQ, or a variation on a FAQ.

Are there any Project Gutenberg databases that show the *original*
publication dates (e.g. 1875, 1916) for all or most of the texts?  I've
created a database (current as of a few months ago) that has info on
each book -- author, title, LC classification, etc. -- but nowhere in
the metadata for the texts could I find the original publication date.  

Unless I can find such a database, I'm going to get a research assistant
to find this info for all ([original] English language) texts in the
collection, or else write a script to automate the process.

FWIW, I'm planning on using the Gutenberg texts as part of a 100 million
word corpus of texts from English (British and US) from the 1800s-1900s,
similar to what I've done for the 100 million word British National
Corpus (http://view.byu.edu) and the 100 million word Corpus del Espanol
(www.corpusdelespanol.org).

Thanks in advance for any info you might have.

Mark Davies

=================================================

Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906

http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **

================================================= 
From Bowerbird at aol.com  Wed Aug 31 15:35:43 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 31 15:35:57 2005
Subject: [gutvol-d] banana-cream
Message-ID: <19b.3b350271.30478abf@aol.com>

marcello said:
>    If I understand this right, the etext is hard-coded into the reader?

for the demo, yes, it is.

both the text and the download u.r.l.s are hard-coded.
it wouldn't make sense to do a demo program differently.


>    So you'll have to program a reader for each of the 16,000 ebooks?

for the actual program, no.
what would be the sense in that?


>    And that from the guy that always said 
>    marking up was too much trouble.

get it straight, marcello.

i said that markup is _costly_.
and that it might not be _worth_its_benefits_.
(especially considering that most of the hype
will not be realized, for one reason or another.)

but whether markup is "too much trouble"
depends on whether _i_ do it, or _you_ do it.

if you do it, i don't think it'll be "too much trouble" at all.
indeed, i'm impatient for you to start to get to work on it.
and i've waited for several years.   what is your hangup?

-bowerbird
From Bowerbird at aol.com  Wed Aug 31 15:37:57 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 31 15:38:15 2005
Subject: [gutvol-d] banana-cream
Message-ID: <1da.4356e517.30478b45@aol.com>

marcello said:
>    I request a copy of your program for linux.

it doesn't exist for linux.   only for mac and windows.

the compiler i use -- realbasic -- can indeed produce
linux programs, from the same code that produces the
mac and windows apps, so it is possible for me to make
linux apps as well, if there should be demand for them.

but since i don't have a linux machine to do any testing,
it wouldn't be a good use of my time to pursue that now.

but if anyone wants to be my linux alpha-tester, just
let me know.   i'll feed you a succession of programs,
starting with "hello world" and working from there,
to bring the full range of my programs to your o.s.

-bowerbird