From shimmin at uiuc.edu Mon Aug 1 07:15:44 2005 From: shimmin at uiuc.edu (Robert Shimmin) Date: Mon Aug 1 07:15:50 2005 Subject: [gutvol-d] Rule 6 Question In-Reply-To: References: Message-ID: <42EE2E90.4030404@uiuc.edu> Greg Weeks wrote: > Is it possible to do a rule 6 clearance on a book where the author isn't a > US national? Didn't non-US copyright holders only get the extension during > the 1924-1964 block when they either renewed or filed for a "Notice of > Intent to Enforce (NIE) a Restored Copyright"? Or is it just much more > complicated than that? This is my understanding, but I am not a lawyer: Copyright on works of foreign nationals that were still under copyright in the creator's home country, and had fallen out of copyright in the U.S. due to failure to observe U.S. copyright formalities, were automatically restored by the Uruguay Round Agreement Act on January 1, 1996. However, in order to enforce the restored copyright against a party that had been utilizing the work's public domain status prior to the passage of the URAA (December 8, 1994), the rights holder must have filed a Notice of Intent to Enforce. It is my understanding that filing an NIE is not necessary to enforce a copyright against an infringer whose infringement began after December 8, 1994, and so PG cannot usefully exploit this loophole to create new editions of works whose rights holders did not file an NIE. -- RS From greg at durendal.org Mon Aug 1 08:01:21 2005 From: greg at durendal.org (Greg Weeks) Date: Mon Aug 1 08:01:28 2005 Subject: [gutvol-d] Rule 6 Question In-Reply-To: <42EE2E90.4030404@uiuc.edu> Message-ID: On Mon, 1 Aug 2005, Robert Shimmin wrote: > Greg Weeks wrote: > > Is it possible to do a rule 6 clearance on a book where the author isn't a > > US national? Didn't non-US copyright holders only get the extension during > > the 1924-1964 block when they either renewed or filed for a "Notice of > > Intent to Enforce (NIE) a Restored Copyright"? 
Or is it just much more > > complicated than that? > > This is my understanding, but I am not a lawyer: > > Copyright on works of foreign nationals that were still under copyright > in the creator's home country, and had fallen out of copyright in the > U.S. due to failure to observe U.S. copyright formalities, were > automatically restored by the Uruguay Round Agreement Act on January 1, > 1996. > > However, in order to enforce the restored copyright against a party that > had been utilizing the work's public domain status prior to the passage > of the URAA (December 8, 1994), the rights holder must have filed a > Notice of Intent to Enforce. > > It is my understanding that filing an NIE is not necessary to enforce a > copyright against an infringer whose infringement began after December > 8, 1994, and so PG cannot usefully exploit this loophole to create new > editions of works whose rights holders did not file an NIE. That makes sense. At least as much sense as the rest of the US copyright nonsense. Thanks. -- Greg Weeks http://durendal.org:8080/greg/ From ehage at hot.rr.com Mon Aug 1 05:57:37 2005 From: ehage at hot.rr.com (Ellen V. Hage) Date: Mon Aug 1 08:26:47 2005 Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey Message-ID: <200508011357.j71DvwgJ022097@ms-smtp-04.texas.rr.com> Hello, My name is Ellen V. Hage and I am doing my dissertation on e-book technology usage and self-efficacy. I am off my target of needed participants. Hopefully you all can help me out. The survey has only 28 questions and should take no more than 5 minutes to complete. The survey is a blind survey and doesn't collect any confidential information. The URL is: http://www.zoomerang.com/survey.zgi?p=WEB224GUJSUZZE Thanks, again, Ellen V. Hage
From hart at pglaf.org Mon Aug 1 10:12:21 2005 From: hart at pglaf.org (Michael Hart) Date: Mon Aug 1 10:12:24 2005 Subject: [gutvol-d] Rule 6 Question In-Reply-To: References: Message-ID: BTW, does anyone know about Japanese copyrights from before WWII??? We have a volunteer with some pre-war Japanese materials. I would usually use the 1923 cutoff. Thanks! Michael From prosfilaes at gmail.com Mon Aug 1 12:18:02 2005 From: prosfilaes at gmail.com (David Starner) Date: Mon Aug 1 12:18:10 2005 Subject: [gutvol-d] Rule 6 Question In-Reply-To: References: Message-ID: <6d99d1fd050801121831fc8f55@mail.gmail.com> On 8/1/05, Michael Hart wrote: > > BTW, does anyone know about Japanese copyrights from before WWII??? > > We have a volunteer with some pre-war Japanese materials. > > I would usually use the 1923 cutoff. It'd be Life+50 in Japan, and thus for the US they'd have to die before 1946 and not have renewed their work in the US for a Rule 6 clearance, if I understand correctly. What about foreign nationals who became American citizens? The book I'm looking at is Nabokov's Alice in Wonderland, which he published in 1923 in France, and then he became an American citizen in 1945. From collin at xs4all.nl Mon Aug 1 12:52:50 2005 From: collin at xs4all.nl (Branko Collin) Date: Mon Aug 1 12:37:42 2005 Subject: [gutvol-d] Rule 6 Question In-Reply-To: References: Message-ID: <42EE99B2.32640.17423734@localhost> On 1 Aug 2005, at 10:12, Michael Hart wrote: > BTW, does anyone know about Japanese copyrights from before WWII??? > > We have a volunteer with some pre-war Japanese materials. > > I would usually use the 1923 cutoff. Why not now? Is this for publication in the US? If so, pre-1923 should be fine (IANAL). If not, I suggest the volunteer talk to the Aozora Bunko people. 
The Wikipedia article on copyright in Japan mentions it as a Life+50 country, but also mentions that it has some war time exceptions. -- branko collin collin@xs4all.nl From Gutenberg9443 at aol.com Fri Aug 5 16:22:19 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Aug 5 16:22:44 2005 Subject: [gutvol-d] In-Reply-to Message-ID: In a message dated 7/29/2005 2:14:54 AM Mountain Daylight Time, Bowerbird@aol.com writes: but nonetheless, i don't see an in-reply-to header on your post either, so yes, the problem _is_ there... Then the problem is at your end, not at mine, because I have a reply to header. Anne Do you like to breathe? Then save the trees! Begin a personal relationship with an ebook TODAY! From j.hagerson at comcast.net Sat Aug 6 02:40:53 2005 From: j.hagerson at comcast.net (John Hagerson) Date: Sat Aug 6 02:41:23 2005 Subject: [gutvol-d] E-books on iPods? Message-ID: <00de01c59a6a$f3393270$0300a8c0@sarek> I received a question about loading the MP3 version of our e-books into an iPod. I may be the only person on the planet who does not own an iPod, but I have no experience doing this. Has anyone successfully loaded our e-books into an iPod? If so, could you please provide me with a procedure that I can pass along? Thank you. From collin at xs4all.nl Sat Aug 6 03:50:33 2005 From: collin at xs4all.nl (Branko Collin) Date: Sat Aug 6 03:35:49 2005 Subject: [gutvol-d] E-books on iPods? In-Reply-To: <00de01c59a6a$f3393270$0300a8c0@sarek> Message-ID: <42F4B219.1734.8CFD702@localhost> On 6 Aug 2005, at 4:40, John Hagerson wrote: > I received a question about loading the MP3 version of our e-books > into an iPod. I may be the only person on the planet who does not own > an iPod, but I have no experience doing this. > > Has anyone successfully loaded our e-books into an iPod? 
If so, could > you please provide me with a procedure that I can pass along? I do not own an iPod, but can tell you this: an MP3 e-book is an MP3 file, just like all the music MP3s out there. Your friend should upload the ebook the same way s/he uploads music MP3s. Surely there must be something in either the iPod or the iTunes manual about this procedure? This is basic functionality. -- branko collin collin@xs4all.nl From Bowerbird at aol.com Sat Aug 6 06:00:43 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Aug 6 06:00:50 2005 Subject: [gutvol-d] In-Reply-to Message-ID: <86.2d93f1b7.30260e7b@aol.com> anne said: > Then the problem is at your end, not at mine, > because I have a reply to header. oh please, anne. wake up. of course our posts have a reply-to header. what they do not have is an in-reply-to header. pay attention, dear. you could start by noticing the subject-header of this thread... -bowerbird From collin at xs4all.nl Sun Aug 7 07:24:16 2005 From: collin at xs4all.nl (Branko Collin) Date: Sun Aug 7 07:46:15 2005 Subject: [gutvol-d] newsletter? Message-ID: <42F635B0.16647.76987F@localhost> When I tried to look up the newsletter on the website, I noticed the most recent copy was of three months ago. Are the newsletters no longer published at www.gutenberg.org, or did I experience some weird caching problem? -- branko collin collin@xs4all.nl From cannona at fireantproductions.com Sun Aug 7 13:12:46 2005 From: cannona at fireantproductions.com (Aaron Cannon) Date: Sun Aug 7 13:19:30 2005 Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey Message-ID: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I don't think this went through the first time, so here goes again. Sorry if it's a duplicate. 
If you haven't gotten around to completing this survey, let me encourage you to do so. Ellen is not just someone who dropped in here to get her survey filled out. She has been a volunteer for a while and helped out sending CDs and DVDs. Also, it doesn't take long at all. Sincerely Aaron Cannon At 07:57 AM 8/1/2005, you wrote: >Hello, > >My name is Ellen V. Hage and I am doing my dissertation on e-book >technology usage and self-efficacy. I am off my target of needed >participants. Hopefully you all can help be out. The survey has only 28 >questions and should take no more than 5 minutes to complete. The survey >is a blind survey and doesn't collect any confidential information. The >URL is: > >http://www.zoomerang.com/survey.zgi?p=WEB224GUJSUZZE > >Thanks, again, > >Ellen V. Hage >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d - -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFC9mzMI7J99hVZuJcRAsGOAKCvh7h/LcfCOTzg4U+PXXA5mnEPzwCfUWBE n6lOkWq+NGFLZS9JVR0Npg8= =4m+Y -----END PGP SIGNATURE----- From gbnewby at pglaf.org Sun Aug 7 14:06:19 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Aug 7 14:06:20 2005 Subject: [gutvol-d] newsletter? In-Reply-To: <42F635B0.16647.76987F@localhost> References: <42F635B0.16647.76987F@localhost> Message-ID: <20050807210619.GB18600@pglaf.org> On Sun, Aug 07, 2005 at 04:24:16PM +0200, Branko Collin wrote: > > When I tried to look up the newsletter on the website, I noticed the > most recent copy was of three months ago. Are the newsletters no > longer published at www.gutenberg.org, or did I experience some weird > caching problem? Hi, Branko. It was great seeing you at WTH! This is a task that was done by hand. 
We had a semi-automated method, but it didn't survive our many mailing list and Web site changes. The fellow who had been doing it recently resigned (about 3 weeks ago). It's pretty easy to maintain the archive. If anyone is interested in doing it, let me know. Maybe we should just link to the pipermail archives, or mirror them. These are publicly available: http://lists.pglaf.org/pipermail/gweekly/ -- Greg From marcello at perathoner.de Sun Aug 7 14:34:55 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Aug 7 14:41:50 2005 Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey In-Reply-To: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com> References: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com> Message-ID: <42F67E7F.6030807@perathoner.de> Aaron Cannon wrote: > If you haven't gotten around to completing this survey, let me encourage > you to do so. Ellen is not just someone who dropped in here to get her > survey filled out. She has been a volunteer for a while and helped out > sending CDs and DVDs. Also, it doesn't take long at all. I'm afraid that recruiting a significant portion of your respondents from gutvol-d will give you non-representative results. Also, many questions are `ambiguous': "I feel confident about my ability to purchase e-books online." I feel *very* confident about my ability to do so, but I would *never* do it. I would buy the paper edition, because I'm sure the paperbook remains usable if I have to change my bookshelves. Also I can lend the paperbook to my friends etc. The survey should be separated into 2 parts. Part 1 about the acceptance of free ebooks, Part 2 about the acceptance of fettered ebooks / DRM / proprietary devices / proprietary formats etc. 
-- Marcello Perathoner webmaster@gutenberg.org From cannona at fireantproductions.com Sun Aug 7 15:58:12 2005 From: cannona at fireantproductions.com (Aaron Cannon) Date: Sun Aug 7 16:00:28 2005 Subject: [gutvol-d] Volunteer Needs Help with Doctoral Survey In-Reply-To: <42F67E7F.6030807@perathoner.de> References: <6.2.1.2.0.20050807151208.037d2968@mail.fireantproductions.com> <42F67E7F.6030807@perathoner.de> Message-ID: <6.2.1.2.0.20050807175259.04155688@mail.fireantproductions.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 At 04:34 PM 8/7/2005, you wrote: >I'm afraid that recruiting a significant portion of your respondents from >gutvol-d will give you non-representative results. So that we're all clear, I wasn't asked to vouch for her. I simply thought I'd step in and do so as she isn't that active on this list from what I can tell, but she does help out with the DVDs. Still, I agree with you that asking the volunteers who create Ebooks about the same will produce skewed results. I would assume that this is not the only place she has sought help, but can't say for sure. Sincerely Aaron Cannon - -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFC9pKEI7J99hVZuJcRAk7jAJwKKtQC5r4MrIH/gyS+JmvuhkYzSACghUEM ROwcwxrw4lqpSGhcsawMlrE= =U15S -----END PGP SIGNATURE----- From collin at xs4all.nl Sun Aug 7 17:22:42 2005 From: collin at xs4all.nl (Branko Collin) Date: Sun Aug 7 17:07:24 2005 Subject: [gutvol-d] newsletter? 
In-Reply-To: <20050807210619.GB18600@pglaf.org> References: <42F635B0.16647.76987F@localhost> Message-ID: <42F6C1F2.30393.29A8E81@localhost> On 7 Aug 2005, at 14:06, Greg Newby wrote: > On Sun, Aug 07, 2005 at 04:24:16PM +0200, Branko Collin wrote: > > > > When I tried to look up the newsletter on the website, I noticed the > > most recent copy was of three months ago. Are the newsletters no > > longer published at www.gutenberg.org, or did I experience some > > weird caching problem? > > Hi, Branko. It was great seeing you at WTH! Likewise! > Maybe we should just link to the pipermail archives, > or mirror them. These are publicly available: > http://lists.pglaf.org/pipermail/gweekly/ This seems an adequate solution, and is probably easier to implement than building a new volunteer. -- branko collin collin@xs4all.nl From Bowerbird at aol.com Sun Aug 7 21:06:46 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Aug 7 21:12:14 2005 Subject: [gutvol-d] f.t.p. for banana-cream Message-ID: <1e9.418f0a5c.30283456@aol.com> greg, if you could arrange that f.t.p. access for me when you get a chance, please... -bowerbird From ag737 at freenet.carleton.ca Thu Aug 11 12:04:18 2005 From: ag737 at freenet.carleton.ca (Wallace J.McLean) Date: Thu Aug 11 13:05:08 2005 Subject: [gutvol-d] Historical book publishing statistics Message-ID: <7cd0b7f2e3.7f2e37cd0b@ncf.ca> Does anyone have quick-and-dirty, and preferably annual, statistics on numbers of books published in the United States, the UK, and any other country, over time? 
From hart at pglaf.org Thu Aug 11 13:10:20 2005 From: hart at pglaf.org (Michael Hart) Date: Thu Aug 11 13:10:21 2005 Subject: [gutvol-d] Historical book publishing statistics In-Reply-To: <7cd0b7f2e3.7f2e37cd0b@ncf.ca> References: <7cd0b7f2e3.7f2e37cd0b@ncf.ca> Message-ID: On Thu, 11 Aug 2005, Wallace J.McLean wrote: > > Does anyone have quick-and-dirty, and preferably annual, statistics on > numbers of books published in the United States, the UK, and any other > country, over time? You can get some of these via "The Bowker Annual" at most reference desks. Michael I have last year's, and the 1955, right here, if these are of interest. From sly at victoria.tc.ca Thu Aug 11 15:12:25 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Aug 11 15:19:13 2005 Subject: [gutvol-d] Historical book publishing statistics In-Reply-To: <7cd0b7f2e3.7f2e37cd0b@ncf.ca> References: <7cd0b7f2e3.7f2e37cd0b@ncf.ca> Message-ID: In the library, I've seen large annual volumes of "New Published Books" from the earlier 1900s, but I don't remember a summary of numbers in them. A quick google search led me to some pages like these, that might help: http://observer.guardian.co.uk/review/story/0,6903,1288046,00.html http://mjroseblog.typepad.com/buzz_balls_hype/2005/06/is_yours_the_28.html http://www.primezone.com/pub/headlines.mhtml?d=53251 Andrew On Thu, 11 Aug 2005, Wallace J.McLean wrote: > > Does anyone have quick-and-dirty, and preferably annual, statistics on > numbers of books published in the United States, the UK, and any other > country, over time? > > From marcello at perathoner.de Thu Aug 11 15:31:05 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Aug 11 15:31:20 2005 Subject: [gutvol-d] Amazon abuse of PG trademark Message-ID: <42FBD1A9.4090204@perathoner.de> See: http://www.alexa.com/browse/general/?CategoryID=1219096 I'm referring to the sidebar that says: "Bestselling Products in Project Gutenberg". 
This gives the impression we are selling those books. -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Thu Aug 11 21:55:52 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Aug 11 21:55:54 2005 Subject: [gutvol-d] Re: Problems with PG uploads In-Reply-To: <200507272144.39040.donovan@abs.net> References: <200507272127.58606.donovan@abs.net> <200507272144.39040.donovan@abs.net> Message-ID: <20050812045552.GD1544@pglaf.org> On Wed, Jul 27, 2005 at 09:44:38PM -0400, D Garcia wrote: > On Wednesday 27 July 2005 09:27 pm, D Garcia wrote: > > Quite a few of the DP folks are seeing the following error when uploading > > files to PG for the WW Team: > > > > "Aborting: no user information!" > > (Yes, I'm replying to myself ... :) > > The problem appears to be with the 'cz' program, which was recently updated by > Jim Tinsley. I have sent him a note asking him to look into it. Apparently it > isn't able to determine what user it is running as on the server, and emits > the "Aborting: no user information!" error message and exits. (Working through some older emails) I'm just confirming that this was fixed -- there was a little weirdness in our Apache configuration. I tweaked the upload scripts, and voila! -- Greg From cweyant at twcny.rr.com Fri Aug 12 04:58:07 2005 From: cweyant at twcny.rr.com (Curtis A. Weyant) Date: Fri Aug 12 05:35:20 2005 Subject: [gutvol-d] E-books on iPods? In-Reply-To: <42F4B219.1734.8CFD702@localhost> References: <42F4B219.1734.8CFD702@localhost> Message-ID: <42FC8ECF.7000505@twcny.rr.com> There's a news item on the PG frontpage about converting TEXT for use with the iPod's Note feature. Short story, go here: http://www.ambience.sk/ipod-ebook-creator/ipod-book-notes-text-conversion.php Curtis. P.S. I also do not have an iPod but am hoping to remedy that in the near future. 
:O) Branko Collin wrote: > On 6 Aug 2005, at 4:40, John Hagerson wrote: > > >>I received a question about loading the MP3 version of our e-books >>into an iPod. I may be the only person on the planet who does not own >>an iPod, but I have no experience doing this. >> >>Has anyone successfully loaded our e-books into an iPod? If so, could >>you please provide me with a procedure that I can pass along? > > > I do not own an iPod, but can tell you this: an MP3 e-book is an MP3 > file, just like all the music MP3s out there. Your friend should > upload the ebook the same way s/he uploads music MP3s. Surely there > must be something in either the iPod or the iTunes manual about this > procedure? This is basic functionality. > From collin at xs4all.nl Fri Aug 12 07:55:28 2005 From: collin at xs4all.nl (Branko Collin) Date: Fri Aug 12 07:39:57 2005 Subject: [gutvol-d] What the Hack?! and other conferences next week In-Reply-To: <20050725025148.GA14434@pglaf.org> References: <42E42405.32070.11B1840@localhost> Message-ID: <42FCD480.15055.7322F5@localhost> On 24 Jul 2005, at 19:51, Greg Newby wrote: > On Sun, Jul 24, 2005 at 11:28:05PM +0200, Branko Collin wrote: > > > http://www.whatthehack.org > > One of my talks will be about Project Gutenberg, while > the other is about information retrieval: > > Saturday July 30 > "Literature wants to be free!" (Day 3, Tent 4, 1:00-2:00 pm) > > Friday July 29 > "Search engine internal processes" (Day 2, Tent 2, 10:00 - 11:00 am) > > These will be recorded, but not streamed live. I have been trying to find these recordings, but the Hacktick hacker camps have a tradition of sporting the Worst Website Evar, and this one is no exception. In other words, I could not find them. 
-- branko collin collin@xs4all.nl From curtzt at nuprometheus.com Tue Aug 16 18:29:04 2005 From: curtzt at nuprometheus.com (Thad Curtz) Date: Tue Aug 16 18:54:09 2005 Subject: [gutvol-d] Annotations for students Message-ID: <92d068b011c466997116e41ba04a2359@nuprometheus.com> Hi. I'm a college lit teacher and have been thinking about doing footnotes and annotations of the sort most editions for college students supply for some PG classics, so my students could have the usual kinds of help reading them, and print them out, mark them up, bring them to class for discussion, etc. (I think the lack of notes. not the quality of the texts themselves, is currently the main barrier to more widespread use of PG texts in classes.) If I did this, I'd want to make the annotations available free for anybody else who wanted to use them for teaching (or just to read). Some form of structured markup that allowed people to reformat and print to different sizes and devices in the future would be nice, rather than pdfs... I've looked at the archive, and haven't found any discussion of this topic; any suggestions or advice about such a project (or where to look next) would be appreciated. Thanks, Thad From jtinsley at pobox.com Tue Aug 16 19:31:06 2005 From: jtinsley at pobox.com (Jim Tinsley) Date: Tue Aug 16 19:31:33 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com> References: <92d068b011c466997116e41ba04a2359@nuprometheus.com> Message-ID: <20050817023106.GA6405@panix.com> On Tue, Aug 16, 2005 at 06:29:04PM -0700, Thad Curtz wrote: >Hi. I'm a college lit teacher and have been thinking about doing >footnotes and annotations of the sort most editions for college >students supply for some PG classics, so my students could have the >usual kinds of help reading them, and print them out, mark them up, >bring them to class for discussion, etc. (I think the lack of notes. 
>not the quality of the texts themselves, is currently the main barrier >to more widespread use of PG texts in classes.) If I did this, I'd want >to make the annotations available free for anybody else who wanted to >use them for teaching (or just to read). Some form of structured markup >that allowed people to reformat and print to different sizes and >devices in the future would be nice, rather than pdfs... > >I've looked at the archive, and haven't found any discussion of this >topic; any suggestions or advice about such a project (or where to look >next) would be appreciated. It has been discussed a few times before, but perhaps not on this list. You are, of course, welcome to create annotated versions for yourself, and to make them available through your own website. That's great, and we hope it works well for you. However, we will not accept them back for posting into PG as revised texts. If we did, we'd be inundated with Creationists annotating Darwin, Darwinists annotating Genesis, and every nutball who would be instantly removed from the lobby of any publisher wanting to add their essays on The Meaning Of . . . Believe me, they're out there. I've dealt with a few. It's not a practical proposition. Once you allow one person to annotate our texts, you have to give others an equal right, and the whole thing devolves instantly. As for formatting, while XML is theoretically ideal, there are practical issues. Some members of this list are breaking ground in this area, and may be able to make suggestions. Nearly all current markup work uses HTML + CSS, which is pretty flexible, and, as a practical matter, HTML is the Universal Input at the moment -- it can be immediately converted to PDA formats, PDF and so on. If you choose your conventions carefully, HTML or XHTML can be as well-structured as you like. 
jim From Bowerbird at aol.com Tue Aug 16 20:01:31 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Aug 16 20:01:52 2005 Subject: [gutvol-d] Annotations for students Message-ID: <15d.56e5af43.3034028b@aol.com> thad said: > Hi. I'm a college lit teacher and > have been thinking about doing footnotes and annotations > of the sort most editions for college students supply > for some PG classics, so my students could have the > usual kinds of help reading them, and print them out, > mark them up, bring them to class for discussion, etc. > (I think the lack of notes. not the quality of the texts > themselves, is currently the main barrier to > more widespread use of PG texts in classes.) > If I did this, I'd want to make the annotations available > free for anybody else who wanted to use them for teaching > (or just to read). Some form of structured markup that > allowed people to reformat and print to different sizes > and devices in the future would be nice, rather than pdfs... this is an entirely reasonable course of action. what you need is a viewer-program that can incorporate freestanding annotations into the presentation of the text (which remains static). i have written such a viewer-program, but have not yet programmed the annotation capabilities. if you'd be able to outline the ones you would like, and be willing to test them once i programmed 'em, i'd be happy to proceed that way... we usually think of "annotations" as textual, but they can actually manifest in a variety of ways, such as margin highlights, graphics, movies, etc. (see tk3 at nightkitchen.com for a program that is adept at allowing multiple types of annotations.) making your annotations widely and freely available, such as putting them on your website for download, is a generous thing to do... -bowerbird
From sly at victoria.tc.ca Tue Aug 16 23:56:23 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Aug 16 23:56:43 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com> References: <92d068b011c466997116e41ba04a2359@nuprometheus.com> Message-ID: On Tue, 16 Aug 2005, Thad Curtz wrote: > Hi. I'm a college lit teacher and have been thinking about doing > footnotes and annotations of the sort most editions for college > students supply for some PG classics, so my students could have the > usual kinds of help reading them, and print them out, mark them up, > bring them to class for discussion, etc. I hadn't thought about this before, but I can see the sense in Jim's argument. On one hand, I think it's too bad if PG misses out on a well-researched, comprehensive set of annotations. On the other hand, I can see that in the larger scheme of things it's probably better to stick with our practice of only adding contemporary material if it has already been published in "dead-tree" form. Thad: If you are looking for some other longer-term home for this type of text, one possibility may be Wikibooks. See: http://en.wikibooks.org/wiki/Wikibooks:Annotated_texts Thanks, Andrew From collin at xs4all.nl Wed Aug 17 02:34:56 2005 From: collin at xs4all.nl (Branko Collin) Date: Wed Aug 17 02:19:34 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com> Message-ID: <430320E0.31217.11260C@localhost> On 16 Aug 2005, at 18:29, Thad Curtz wrote: > Hi. I'm a college lit teacher and have been thinking about doing > footnotes and annotations of the sort most editions for college > students supply for some PG classics, so my students could have the > usual kinds of help reading them, and print them out, mark them up, > bring them to class for discussion, etc. 
(I think the lack of notes. > not the quality of the texts themselves, is currently the main barrier > to more widespread use of PG texts in classes.) If I did this, I'd > want to make the annotations available free for anybody else who > wanted to use them for teaching (or just to read). Some form of > structured markup that allowed people to reformat and print to > different sizes and devices in the future would be nice, rather than > pdfs... > > I've looked at the archive, and haven't found any discussion of this > topic; any suggestions or advice about such a project (or where to > look next) would be appreciated. I would assume this works the way the creation of any other educational material works: teachers sit down and write the stuff. At least, that is the impression I got from looking at text books and annotated editions. What did your fellow teachers suggest? -- branko collin collin@xs4all.nl From Bowerbird at aol.com Wed Aug 17 10:27:12 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 17 10:32:28 2005 Subject: [gutvol-d] Annotations for students Message-ID: <1c3.2ee2dccc.3034cd70@aol.com> branko said: > I would assume this works the way > the creation of any other educational material works: > teachers sit down and write the stuff. but once you've done that, you want your students to have a tool that presents the annotations alongside the text to which they refer. preferably a tool that then lets _them_ edit them, or add their own annotations. and finally, a tool that lets everyone _share_ annotations, perhaps even _pool_ all of them -- within a single class, or across many years -- perhaps in such a manner that the annotations themselves can become an independent e-book. -bowerbird
From scott_bulkmail at productarchitect.com Wed Aug 17 12:18:33 2005 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Wed Aug 17 12:29:11 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: <92d068b011c466997116e41ba04a2359@nuprometheus.com> References: <92d068b011c466997116e41ba04a2359@nuprometheus.com> Message-ID: Sounds like a wonderful project! Even though PG can't host it, don't let that dampen your enthusiasm. There are lots of other ways to make the results available for free. >Some form of structured markup that allowed people to reformat and print to different sizes and devices in the future would be nice, rather than pdfs... Agreed. Here's the challenge: if you make annotations within the existing PG files, then it's difficult to just see the annotations. If you keep the annotations separate, it's hard to see them in context. The ideal solution would be a tiny bit of automation (perhaps created by a student if techie stuff isn't your thing). Then you could keep the annotations separate, and just add small markers to the original text. Simple scripts could do things like: - format the annotations on their own - insert the annotations into the text, preferably with appropriate HTML wrapper that lets readers show/hide using CSS (style sheets) or JavaScript. As others have noted, HTML or the newer XHTML is ideal here. (If a specific "book" that you need doesn't exist in HTML, I'll bet some people here would help do at least a basic conversion.) >I've looked at the archive, and haven't found any discussion of this topic; any suggestions or advice about such a project (or where to look next) would be appreciated. If you want to take a more ambitious approach, review the list for discussions on "PGTEI", "TEI" and "XML". But, effective use of these is likely to be more work. 
(It's not overly difficult given the appropriate technical background, so it depends on what sort of resources you have available.)

Note for the record that highly-structured XML "originals" would let you "point" each annotation to the appropriate place in a document without altering the original (using XPATH). That's great in theory, but is again probably too much tech work for your project. (I'd be happy for an XPATH expert to show that I'm wrong; perhaps it's easier than I think to point to a specific location in a typical PG HTML file, presumably using Tidy and such to convert to XHTML.)

--
Cheers,
Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting

From Bowerbird at aol.com Wed Aug 17 12:59:49 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 13:00:06 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1c1.2ea2ca2c.3034f135@aol.com>

scott said:
> The ideal solution would be a tiny bit of automation
> (perhaps created by a student if techie stuff isn't your thing).

"tiny" is a very misleading term, i think. unless you can show me this "tiny" thing.

> Then you could keep the annotations separate,
> and just add small markers to the original text.

um, keeping the annotations separate is a good idea. but requiring "small markers" in the original text is not. the text should remain unchanged, for many reasons.

> Simple scripts could do things like:
> - format the annotations on their own
> - insert the annotations into the text,
> preferably with appropriate HTML wrapper that
> lets readers show/hide using CSS (style sheets) or JavaScript.

except what you have described is _far_ from "simple", as well as i can tell. do you have sample implementations?

> As others have noted, HTML
> or the newer XHTML is ideal here.

"ideal"? i think not. indeed, to the direct contrary, i believe that heavy markup makes on-the-fly adding of annotations _extremely_ difficult.
but, as i said, if you can show me some examples, ones that make it as simple as you make it sound, i am open to being convinced otherwise...

-bowerbird

From scott_bulkmail at productarchitect.com Wed Aug 17 13:38:06 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Aug 17 13:44:16 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1c1.2ea2ca2c.3034f135@aol.com>
References: <1c1.2ea2ca2c.3034f135@aol.com>
Message-ID:

> > The ideal solution would be a tiny bit of automation
> > (perhaps created by a student if techie stuff isn't your thing).
>
>"tiny" is a very misleading term, i think.
>unless you can show me this "tiny" thing.

All that's required is a tab-delimited file (or database or spreadsheet) with 2 columns: id, annotation. Now, in your favorite scripting or programming language, iterate thru the file, read the id and annotation, then replace the former with the latter in the marked-up book. (Including the appropriate (X)HTML wrapper, as noted.)

For someone who doesn't write scripts, it's not trivial. For someone who does, it's a few lines of code.

Note that with a little more techie work, the process could be simplified for the annotators. They could add the annotation text directly in the document, surrounded by unique delimiters. Then, a script could generate any version, e.g. replace delimiters with (X)HTML wrapper and/or with a generated unique ID; extract the annotations to a separate (X)HTML file that can be printed on its own, etc.

All this stuff is pretty easy for a college student with any scripting experience.

> > Then you could keep the annotations separate,
> > and just add small markers to the original text.
>
>um, keeping the annotations separate is a good idea.
>but requiring "small markers" in the original text is not.
>the text should remain unchanged, for many reasons.

Sure, in the ideal world. Meanwhile, inserting unique IDs is a pragmatic solution. Of course there's some work involved to sync the marked version with any future PG updates, but given "diff" tools, it's not that much work for a handful of annotated books. (And, with a little work, could be largely automated.)

And, I'm all in favor of someone taking the time to get an XPATH solution working.

--
Cheers,
Scott S. Lawton
http://Classicosm.com/ - classic books
http://ProductArchitect.com/ - consulting

From jon at noring.name Wed Aug 17 13:36:24 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 13:51:55 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1c1.2ea2ca2c.3034f135@aol.com>
References: <1c1.2ea2ca2c.3034f135@aol.com>
Message-ID: <1499876850.20050817143624@noring.name>

Bowerbird wrote:
> scott said:
>> As others have noted, HTML
>> or the newer XHTML is ideal here.

> "ideal"? i think not.

XHTML is certainly not the best XML-based vocabulary for marking up books. A carefully selected subset of TEI is much better. Both are used in the context of XML.

XHTML can be adapted to books (and is.) If XHTML is used in a major way to mark up books, it makes sense to come up with a standardized set of pre-defined classes to identify text structures and content semantics. At this point, though, it still makes more sense to switch to TEI. This is apparently what DP plans to do.

> indeed, to the direct contrary, i believe that heavy markup
> makes on-the-fly adding of annotations _extremely_ difficult.

How is that? Annotations can be linked to the text using the markup as "hooks" (e.g., using XPointer.) The more markup there is, the more hooks to latch onto.

> but, as i said, if you can show me some examples,
> ones that make it as simple as you make it sound,
> i am open to being convinced otherwise...

XPointer provides an XML-based standard to point to any spot in an XML document.
Pointing to 'id' ("fragment identifiers") is the most robust and can survive various types of document edits. In plain text systems, where annotations have to hook to the content itself (rather than markup which is separate from the content), it is more difficult to prevent link breakage.

Jon

From Bowerbird at aol.com Wed Aug 17 15:13:17 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 15:13:33 2005
Subject: [gutvol-d] Annotations for students
Message-ID:

jon said:
> How is that?
> Annotations can be linked to the text
> using the markup as "hooks" (e.g., using XPointer.)
> The more markup there is,
> the more hooks to latch onto.

please show me -- and the original poster -- an implementation that actually works, now.

> Pointing to 'id' ("fragment identifiers") is the most robust
> and can survive various types of document edits.
> In plain text systems, where annotations have to
> hook to the content itself (rather than markup
> which is separate from the content),
> it is more difficult to prevent link breakage.

this is another case of disingenuous sleight-of-hand.

you are trying to make us believe that the text changes and the markup doesn't. what you've done, though, is merely specified that there is markup which _cannot_ change (the "fragment identifiers"), so as to assure link-permanence.

if i were to specify content that can not change, i can guarantee link-permanence as well.

and in almost all cases, we're more likely to have text-invariance than to have markup-invariance. (but this is beside the point, since it's easy enough to specify invariance of text and markup. it is also very easy to show link breakage in cases of variance.)

-bowerbird

From Bowerbird at aol.com Wed Aug 17 15:37:54 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 15:38:19 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1a8.3cf237d1.30351642@aol.com>

scott said:
> All that's required is a tab-delimited file
> (or database or spreadsheet) with 2 columns:
> id, annotation.

again, i say, please provide people with this solution! if it's as easy as you say, it shouldn't take much time. right?

> Now, in your favorite scripting or programming language,
> iterate thru the file, read the id and annotation,
> then replace the former with the latter in the marked-up book.
> (Including the appropriate (X)HTML wrapper, as noted.)

you've got some slippery thinking here about the "id" markers. do they already exist in the text? how did they get there? how did the database, which links id markers and annotations, come into existence? how are annotations shared with others? can annotations be made on annotations? what about things like graphics and movies -- how can they be utilized as annotations?

> For someone who doesn't write scripts, it's not trivial.
> For someone who does, it's a few lines of code.

as above, if it's just a few lines of code, why won't you write 'em?

> Note that with a little more techie work,
> the process could be simplified for the annotators.

that simplification would be a good thing, yes, a good thing indeed.

> They could add the annotation text directly in the document,
> surrounded by unique delimiters.

and maybe with just a little bit more techie work, you could provide those "unique delimiters" for them, save them the trouble. not everyone knows x.h.t.m.l., and not everyone wants to learn it.

> Then, a script could generate any version,
> e.g.
replace delimiters with (X)HTML wrapper
> and/or with a generated unique ID;
> extract the annotations to a separate (X)HTML file
> that can be printed on its own, etc.

sounds nifty. i'll implement this for a plain-text file, and you do the implementation for a marked-up file, and we'll see who gets done first, and who has the solution that proves to be more powerful and robust. how about it scott? are you up for the challenge?

> All this stuff is pretty easy for a college student
> with any scripting experience.

then it should be a piece of cake for you, scott, right?

> Sure, in the ideal world. Meanwhile,
> inserting unique IDs is a pragmatic solution.

i think that such a "pragmatic solution" would prove to be a false economy, by causing more problems than it solves, over the long run. any time you fork the original text into a different file, you're creating a brittleness that will bite you, and causing yourself unnecessary editing trouble in the future.

> Of course there's some work involved
> to sync the marked version with any future PG updates

that's exactly what i was just talking about, yep.

> but given "diff" tools, it's not that much work
> for a handful of annotated books.
> (And, with a little work, could be largely automated.)

have you ever done this type of work? have you been successful in automating it?

if so, then you would be well-advised to start a business, and charge the world for your expertise, since there are lots of companies who are finding it expensive to do this, and they would dearly love to find a less-costly solution...

> And, I'm all in favor of someone taking the time
> to get an XPATH solution working.

me too! how about you doing that work?

-bowerbird

From jon at noring.name Wed Aug 17 16:47:27 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 16:47:45 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To:
References:
Message-ID: <1011340781.20050817174727@noring.name>

Bowerbird wrote:
> jon said:
>> How is that? Annotations can be linked to the text using the markup
>> as "hooks" (e.g., using XPointer.) The more markup there is, the
>> more hooks to latch onto.

> please show me -- and the original poster -- an implementation that
> actually works, now.

Simply add an 'id' to any XHTML tag, and you can link to it using any web application. Fragment identifiers are used in many millions of web pages, if not billions.

The first step to an "annotation application" (which you are driving at) is to have the underlying standards (the "hooks") worked out so as to easily allow linking by the annotation application. This is already worked out in the XML world (e.g., XPointer.) That's what I meant.

In LibraryCity we are now working on such annotation and related social networking applications (e.g., blogs, wikis) to digital texts using XPointer (which includes simple 'id' links.) Since the W3C standards are open and universal, others can likewise build their own application -- no need to invent a new hooking mechanism.

>> Pointing to 'id' ("fragment identifiers") is the most robust and
>> can survive various types of document edits. In plain text systems,
>> where annotations have to hook to the content itself (rather than
>> markup which is separate from the content), it is more difficult to
>> prevent link breakage.

> you are trying to make us believe that the text changes and the
> markup doesn't.

Markup can change, but in the case of using 'id', the document maintainers will be careful to keep 'id's undisturbed as much as possible during document edits.
This actually allows *major* changes to documents and yet keep existing links unbroken. Can this be done with plain text? Not as easily (it's not impossible, but requires some sort of mapping system, or a knowledge of all the known externally-generated links into the original text document.) For example, an author may issue some work, and later revise it by rewording paragraphs, adding new paragraphs, new chapters, etc. If they are careful, they can assure integrity of existing links into the updated Work by assuring the 'id's are properly preserved and placed where they should be. Here's an example: ====================================================================== [First Edition of a work]

First paragraph.

Second paragraph.

[Second Edition]

First paragraph with some minor edits.

Inserted whole new paragraph.

Second paragraph with some minor edits.
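
The markup in the two listings above seems to have been lost in archiving, so here is a minimal Python sketch of the same idea, assuming each paragraph was a <p> element carrying an 'id'. Only id 1235 is named in the message; ids 1234 and 1236, the <body> wrapper, and the helper names are made up for illustration. It compares an annotation anchored by 'id' with one anchored by paragraph position.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the two editions. Only id="1235" comes
# from the message; the other ids and the <body> wrapper are assumptions.
FIRST_EDITION = """<body>
<p id="1234">First paragraph.</p>
<p id="1235">Second paragraph.</p>
</body>"""

SECOND_EDITION = """<body>
<p id="1234">First paragraph with some minor edits.</p>
<p id="1236">Inserted whole new paragraph.</p>
<p id="1235">Second paragraph with some minor edits.</p>
</body>"""


def text_by_id(document, target_id):
    """Return the text of the element whose 'id' equals target_id, or None."""
    root = ET.fromstring(document)
    element = root.find(f".//*[@id='{target_id}']")
    return None if element is None else element.text


def text_by_position(document, index):
    """Return the text of the index-th <p> (1-based): a position-based anchor."""
    paragraphs = ET.fromstring(document).findall("p")
    return paragraphs[index - 1].text


if __name__ == "__main__":
    for name, edition in (("first", FIRST_EDITION), ("second", SECOND_EDITION)):
        print(name, "edition, by id 1235:  ", text_by_id(edition, "1235"))
        print(name, "edition, by position 2:", text_by_position(edition, 2))
```

In this sketch the id-based lookup still lands on the revised second paragraph in the second edition, while the position-based lookup lands on the inserted paragraph instead.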

======================================================================

If I have an external annotation which points to the content of the second paragraph in the First Edition: id="1235", then in the Second Edition, the link will remain unbroken even if a new paragraph was inserted before it *and* the content in that paragraph was revised but not enough to be topically different.

PSWG discussed the issues of interpublication linking for over a month for the next generation Open eBook Publication Structure, where we wanted to enable robust interpublication linking, annotation, etc., into OEBPS Publications. There's a *lot* of subtle and not-so-subtle issues involved, some of which I've outlined in prior messages to gutvol-d and TeBC a while back.

The original proposal to allow external annotation of digital texts (like PG texts) may seem like a new idea to many PGers here, but it's been something several of us have considered for quite a while (I was thinking of it back in 2000 for Yomu.) It's not new to me. I've even mentioned it here a few times, but not so explicitly (because we had not yet publicly announced LibraryCity.)

> what you've done, though, is merely specified that there is markup
> which _cannot_ change (the "fragment identifiers"), so as to assure
> link-permanence. if i were to specify content that can not change, i
> can guarantee link-permanence as well.

Of course, one can come up with a scheme to link into plain text documents by character counting, paragraph counting, or a number of other methods (including Ted Nelson's Project Xanadu approach.) It is entirely possible someone has even come up with an IETF/RFC or something else covering this. Have you researched to see what others have already proposed for such a standard?
(For Ted Nelson's Xanadu, refer to http://www.xanadu.com/ )

> and in almost all cases, we're more likely to have text-invariance
> than to have markup-invariance. (but this is beside the point,
> since it's easy enough to specify invariance of text and markup.
> it is also very easy to show link breakage in cases of variance.)

Ah, but with XHTML 'id' (and now XML xml:id), it is possible to do significant text amendment and preserve existing links based on 'id'. Of course, as noted above link preservation can be achieved for plain text amendments, but it appears to be much messier, especially at the authoring level. But then someone smart and motivated (like Bowerbird) may come up with a clever way to make this work for plain text documents.

Jon

From Bowerbird at aol.com Wed Aug 17 17:16:25 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 17:21:50 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <1d4.42484bf0.30352d59@aol.com>

jon said:
> Since the W3C standards are open and universal,
> others can likewise build their own application --
> no need to invent a new hooking mechanism.

so you are telling the original poster that he can "build his own application", is that what i'm getting?

nobody in the big wide world of x.m.l. has done it yet?

it's interesting how you always say "x.m.l. can do this", but when it gets right down to it, nobody has done it. when are y'all gonna get around to solving these issues?

you're telling people that they can do it themselves, but meanwhile the experts haven't even done it yet! don't you sense the disconnect in what you're saying?

vapor vapor vapor vapor vapor.

> Markup can change, but in the case of using 'id',
> the document maintainers will be careful to keep 'id's
> undisturbed as much as possible during document edits.

keep that in mind when you're building your system, scott!

> Can this be done with plain text? Not as easily

i'll worry about the plain-text implementation.
and it'll be all you can do to keep up with me...

> (I was thinking of it back in 2000 for Yomu.)

what was "yomu"? people here might want to know.

-bowerbird

From donovan at abs.net Wed Aug 17 17:37:10 2005
From: donovan at abs.net (D Garcia)
Date: Wed Aug 17 17:51:02 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <20050817234746.634B18C914@pglaf.org>
References: <20050817234746.634B18C914@pglaf.org>
Message-ID: <200508172037.11095.donovan@abs.net>

Separation of content, anyone?

Try as an experiment putting in the HTML version of the document a tag such as

blah blah blah blah blah

and in your header a link to an external style sheet (CSS2)

In that external style sheet, put something like

note1:after {
  content: "Your comment here";
}

or perhaps

  content: url(note1.html);

Of course, this will only work in a CSS2 compliant browser, so for now that's everybody except IE.

From scott_bulkmail at productarchitect.com Wed Aug 17 17:52:57 2005
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Wed Aug 17 17:53:33 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1a8.3cf237d1.30351642@aol.com>
References: <1a8.3cf237d1.30351642@aol.com>
Message-ID:

>you've got some slippery thinking here about the "id" markers.
>do they already exist in the text? how did they get there?
>how did the database, which links id markers and annotations,
>come into existence? how are annotations shared with others?
>can annotations be made on annotations? what about things like
>graphics and movies -- how can they be utilized as annotations?

I don't have time to debate Bowerbird on these points, but if the original poster or anyone else who is actually going to work on annotation has serious questions, I'll be happy to throw in my $0.02 if I notice the thread.
>as above, if it's just a few lines of code, why won't you write 'em? Several reasons, including: - the original poster is better off having it done close to home so they have more control over implementation details, enhancements, etc. - this isn't generally a tech list - I typically code such things in UserTalk, which is most useful to those who already use Frontier for other reasons rather than as a quick-and-dirty scripting solution for an unknown environment. If the original poster decides to use Frontier and whatever techie they find gets stuck, I'll be happy to help them out. > > but given "diff" tools, it's not that much work > > for a handful of annotated books. > > (And, with a little work, could be largely automated.) > >have you ever done this type of work? >have you been successful in automating it? > >if so, then you would be well-advised to start a business, >and charge the world for your expertise, since there are >lots of companies who are finding it expensive to do this, >and they would dearly love to find a less-costly solution... I have done this kind of work, have automated it, am in business, and have and do charge for it. -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From jon at noring.name Wed Aug 17 18:30:20 2005 From: jon at noring.name (Jon Noring) Date: Wed Aug 17 18:34:44 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: References: <1a8.3cf237d1.30351642@aol.com> Message-ID: <1508707233.20050817193020@noring.name> Scott wrote: > Bowerbird wrote: >> you've got some slippery thinking here about the "id" markers. >> do they already exist in the text? how did they get there? >> how did the database, which links id markers and annotations, >> come into existence? how are annotations shared with others? >> can annotations be made on annotations? what about things like >> graphics and movies -- how can they be utilized as annotations? 
> I don't have time to debate Bowerbird on these points, but if the > original poster or anyone else who is actually going to work on > annotation has serious questions, I'll be happy to throw in my $0.02 > if I notice the thread. Most of the time one doesn't have to author an actual implementation to determine whether it will be hard or not. Most experienced and even inexperienced programmers instinctively know the difficulty of most proposed applications. It's like saying "if I go to the roof of a tall building and toss a bowling ball off the side, it will begin accelerating to the ground." It's obvious what will happen --there's no need to even waste the time and run the experiment. Some aspects of the current discussion about external annotation of digital texts is similar. If one wants to implement some system, planning and forethought are needed *before* writing lines of code. For example, does XML confer benefits in the annotation system over plain text, or vice-versa? Obviously, Bowerbird wants everyone in the digital text arena to embrace regularized digital text, and he is (apparently) building a set of working applications (he calls them "tools") to prove his point. All the power to him. But we certainly have the right to bring up the requirements issues and call into question whether regularized plain text is sufficient for all uses and needs of the digital text universe. There are those among us, including yours truly, who believe that XML should form the core of digital publishing processes and formats. Even if Bowerbird implements his system, we will ask: "will it do this and do that?" We *know* that XML and its many associated W3C and IETF specifications and RFCs confers a powerful and sufficient foundation to do all the myriad things proposed for the digital publishing universe (that I know of at least, and I've looked at a *lot* of advanced uses.) And we don't have to write code to *know* this as true (see bowling ball discussion above.) 
Those who listen to Garrison Keillor's "Prairie Home Companion" are familiar with the fictional "Ralph's Pretty Good Grocery" (RPGG) in Lake Wobegon. Their advertising slogan is "If you can't get it at Ralph's, you can probably get along without it."

Bowerbird's system is clearly a RPGG since I know it will NOT do everything that has been discussed for digital texts. Whether it will hit the sweet spot and win the hearts and minds of the ebook masses (which must include all the important stakeholders in the digital publishing universe), who will overlook its deficiencies, remains to be seen. I'm skeptical, but will wait to see what arrives.

Jon

From Bowerbird at aol.com Wed Aug 17 19:10:20 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 19:10:41 2005
Subject: [gutvol-d] Annotations for students
Message-ID:

donovan said:
> Try as an experiment
> putting in the HTML version of the document
> a tag such as
> blah blah blah blah blah
> and in your header a link to an external style sheet (CSS2)
> In that external style sheet, put something like
> note1 {
>   content:after "Your comment here";
> }
> or perhaps
>   content: url(note1.html);

so, you want the user to edit an .html file at the insertion point of every annotation, and then edit the .css file appropriately...

i'll let the original poster tell us whether that is something he feels is reasonable or not...

> Of course, this will only work in a CSS2 compliant browser,
> so for now that's everybody except IE.

or, to put it another way, 1 out of 8 people.

***

scott said:
> I don't have time to debate Bowerbird on these points,

i asked some fairly simple questions. there is no "debate". all you have to do is answer the fairly simple questions...

> If the original poster decides to use Frontier
> and whatever techie they find gets stuck,
> I'll be happy to help them out.

so even a "techie" might "get stuck" doing this? well, ok. but your offer to help them out is certainly generous...
> I have done this kind of work, have automated it,
> am in business, and have and do charge for it.

great!

can you point us to your for-a-fee solution? i would be interested in pricing it. do you have a cost-free demo?

i'd like to see the profit-margin on "a few simple scripts".

-bowerbird

From Bowerbird at aol.com Wed Aug 17 19:29:11 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Aug 17 19:29:32 2005
Subject: [gutvol-d] Annotations for students
Message-ID: <42.6f5c078e.30354c77@aol.com>

jon said:
> Most of the time one doesn't have to
> author an actual implementation
> to determine whether it will be hard or not.

i program, jon.

i'm not the world's greatest programmer, but there aren't that many things that are hard to program, even for me. you just have to be willing to break the job down into small enough pieces. oh yeah, it also helps to have users who don't care how big and/or slow your application might be...

but until you've actually programmed your application, you don't really know what kind of obstacles lie hidden. heck, sometimes you don't find out until end-users tell you.

> Most experienced and even inexperienced programmers
> instinctively know the difficulty of most proposed applications.

most inexperienced programmers can't count on their "instincts". (and someone like you -- a nonprogrammer -- certainly can't.)

and experienced programmers know that some projects that look easy on the outside have a lot of those hidden obstacles.

but one thing i can tell you for sure, as a programmer, is that if you let a nonprogrammer tell you what you should be doing, you're in for a very rough time, unless you're paid by the hour, and hard-up for the cash. (it also helps if you are a masochist.)
another thing i can tell you for sure, as a programmer, is that the fact that a format is "open and universal" doesn't mean diddly-squat in terms of whether it'll be easy to program for it.

> Bowerbird's system is clearly a RPGG
> since I know it will NOT do everything
> that has been discussed for digital texts.

oh really? you seem to have some kind of super-e.s.p. when it comes to knowing about my stuff, some of which i haven't even programmed, which could provide _useful_info_ to me... heck, i can get a critique of my software _before_i_even_write_it_! that's awesome.

so jon, tell me what it won't do...

-bowerbird

From jon at noring.name Wed Aug 17 19:29:20 2005
From: jon at noring.name (Jon Noring)
Date: Wed Aug 17 19:29:46 2005
Subject: [gutvol-d] Annotations for students
In-Reply-To: <1d4.42484bf0.30352d59@aol.com>
References: <1d4.42484bf0.30352d59@aol.com>
Message-ID: <487792669.20050817202920@noring.name>

Bowerbird wrote:
> jon said:
>> Since the W3C standards are open and universal, others can likewise
>> build their own application -- no need to invent a new hooking
>> mechanism.

> so you are telling the original poster that he can "build his own
> application", is that what i'm getting?

I suppose, if he's interested. And if so, he wants guidance as to the various issues involved, and the various proposed solutions. That's what this discussion is about.

Anyone can build "tools", but there's a gazillion "tools" out there gathering dust on shelves because the authors did not do their homework properly and try to understand the truly important requirements leading to widescale embracement.
Anyone who builds such an annotation system *should* see the bigger picture of the various issues involved *before* just building something out of the blue -- and to understand how annotation fits into the bigger picture of the general use of digital publications. We (including you) are providing some of that foundation by giving our respective views and perspectives on the matter. No need to build "tools" to provide this perspective. That's silly. The tools can be built, whether based on XML or plain text, when there is a need and a decision to go ahead *after* fully understanding the requirements. That you are supposedly going ahead with a plain text solution is noble, but not germane to this discussion. You have not stated *why* you believe your plain text solution is superior to XML for this particular application. You've only dissed the XML approach w/o going into detail of how your plain text approach will sufficiently solve the external annotation of digital publications and meet all the important requirements as we understand them now. So far, all you've implied is "Trust me, *I'm* building a tool" (which reminds me of John Kerry in the last prez election when he promised many times "Trust me, I have a plan" but never gave specifics at the time.) At least I tried to explain why the XML suite of specifications provides a good foundation upon which to build that specific functionality. So, how specifically would you implement external annotation of plain texts in your system and why is the plain text approach superior to the XML approach *for this specific purpose*? > nobody in the big wide world of x.m.l. has done it yet? It is built *when* there is a need for it, or somebody just takes an interest, whether there's a need or not. > it's interesting how you always say "x.m.l. can do this", > but when it gets right down to it, nobody has done it. > when are y'all gonna get around to solving these issues? 
See my other message where I discuss this ("bowling ball experiment".) To build anything, there has to be a perceived need, and up to now there's not been the need, at least in the digital publishing universe.

> you're telling people that they can do it themselves,
> but meanwhile the experts haven't even done it yet!
> don't you sense the disconnect in what you're saying?

You are being disingenuous by implying "they" (the XML experts) haven't done it because it can't be done. That's wrong. The people who authored XML included a large number of *experienced* software developers who would eat your lunch. They developed XML not to solve specific problems (although specific problems were in the back of their minds), but rather to provide a powerful base upon which applications to process text-based documents and data sets could be built *when there is a need*.

Just refer to the XML Cover Pages for getting an idea of how well XML is being used to solve all kinds of problems. It's amazing and overwhelming: http://xml.coverpages.org/

When you say "they" (whoever "they" are) haven't implemented a particular application of *your* choosing -- using XML technologies -- as "proof" that XML is no good -- that is beyond silly. But that's exactly what you continue to imply. It is a form of circular reasoning -- clever, but easily seen through.

Anyway, there have been companies who've built proprietary systems to interlink XML data using XPath and XPointer -- I know this for a *fact*. One of my associates consulted for that company but I don't recall the details of company name and product name -- it was shared to me at the time, two years ago, in confidence under NDA. I'm sure if one does a search at the XML Cover Pages, one will find several implementations of the same W3C standards one would use for creating a powerful annotation environment for XML documents. I'm not going to do it because it is unnecessary at this time for the current state of this discussion.)
> and it'll be all you can do to keep up with me... I'll hand it to you -- you got chutzpah, and have been implying the same thing for the nine or so years I've known you since ebook-list. I'm dizzy trying to keep up with you! Jon From Bowerbird at aol.com Wed Aug 17 19:43:51 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 17 19:44:09 2005 Subject: [gutvol-d] Annotations for students Message-ID: <20c.748be19.30354fe7@aol.com> jon said: > Anyone can build "tools", but there's a gazillion > "tools" out there gathering dust on shelves there are? can you point us to a half-dozen of these gazillion? > Anyone can build "tools", but there's a gazillion > "tools" out there gathering dust on shelves > because the authors did not do their homework properly > and try to understand the truly important requirements > leading to widescale embracement. well, have you written reviews on these tools to inform the authors about "the proper homework" and "the truly important requirements"? if so, i'd like to see the reviews and get this head-start. let's examine an actual example that i gave in this thread. tk3, from nightkitchen.com, has good annotation features. these include dogearing, highlighting (in 4 different colors), stickies (in those same 4 colors), and a notebook capability, which allows the user to include text, graphics, and movies from the e-book in their notes. annotations can be shared. how does that stack up to your gazillion other tools, jon? > No need to build "tools" to provide this perspective. > That's silly. The tools can be built, whether based on > XML or plain text, when there is a need and a decision > to go ahead *after* fully understanding the requirements. except you will _not_ fully comprehend the situation _until_ you build the tools. it's an iterative process. and the fact that you don't know that is one of our main points of contention... i encourage you to get some programming experience, jon. it'll make you a lot smarter... 
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050817/3db5492c/attachment.html From scott_bulkmail at productarchitect.com Wed Aug 17 19:57:44 2005 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Wed Aug 17 19:58:18 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: References: Message-ID: In answer to a specific suggestion, I typed: > > I have done this kind of work, have automated it, > > am in business, and have and do charge for it. Bowerbird replied: >great! > >can you point us to your for-a-fee solution? i would be >interested in pricing it. do you have a cost-free demo? > >i'd like to see the profit-margin on "a few simple scripts". You left out some important context: my comment above was a reply to automated "diff", which is NOT something the original poster asked about, so it wasn't covered in my reply to him. If Thad (who started this thread) is looking for a pragmatic solution that can work well for several books, it really is just a few simple scripts. That would make a pretty meager product. Of course annotation can be much more complex, but there's no need to make it so just to deliver some useful content to students. I hate to leave issues hanging, but in your case I make an exception. Folks who are new to the list may find it a bit rude of me not to reply to the many points you raised; folks who have been on awhile or stumbled across the many relevant portions of the archives will understand. -- Cheers, Scott S. 
Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From jon at noring.name Wed Aug 17 20:03:04 2005 From: jon at noring.name (Jon Noring) Date: Wed Aug 17 20:03:24 2005 Subject: SOPHIE (replacement for tk3 and XML-based) Re: [gutvol-d] Annotations for students In-Reply-To: <42.6f5c078e.30354c77@aol.com> References: <42.6f5c078e.30354c77@aol.com> Message-ID: <208952740.20050817210304@noring.name> Bowerbird wrote: > jon said: >> Most experienced and even inexperienced programmers >> instinctively know the difficulty of most proposed applications. > most inexperienced programmers can't count on their "instincts". > > (and someone like you -- a nonprogrammer -- certainly can't.) Well, I have written over 100,000 lines of code (Fortran) over the years. I've also written some scripts. I've edited and compiled some C code. I've written a bunch of GWBasic programs. And I have a couple associates in the XML world who are programming wizards and who I often consult with regarding what can and can't be done. I've also worked in engineering teams which included C++ coders. I took several graduate level classes in computer science back in the 1980's, mostly numerical analysis. I have a pretty good "lay" understanding of contemporary programming. And in the XML world, I keep abreast of successful XML applications (e.g., web browsers) -- what they can do and can't do. Next? > and experienced programmers know that some projects that > look easy on the outside have a lot of those hidden obstacles. Certainly! I ran into this all the time when I was coding numerical simulations of complex thermochemical systems (it's fun solving over 200 non-linear, and quite unstable equations in the same number of unknowns.) But overall one can *usually* get a good grasp at the general difficulty of a programming task before sitting down and writing code. 
Encountering problems is the norm, and it simply takes either work-arounds, or changing the algorithm, in order to resolve. It is expected. ***** Btw, the tk3 example you gave in the other email you just sent (and I just read), the replacement for it, SOPHIE, will be *XML-based*. The developer of tk3, Bob Stein, is the major player of SOPHIE. He's been around for years and years -- his partner is very experienced. Gee, I wonder why they will now change gears and embrace XML? They are *experienced* developers with a lot of knowledge of the tk3 product (over 15 years), and *they* are switching to XML. http://www.annenberg.edu/futureofthebook/content/Mellon.pdf http://rit.mellon.org:8080/dev/projects/Sophie/ You better hurry and convince Bob Stein he needs to get rid of XML and embrace plain text! Hurry, before it is too late! Your programming experience should convince them of the folly of their ways. Jon From Bowerbird at aol.com Wed Aug 17 20:21:08 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 17 20:21:28 2005 Subject: [gutvol-d] Annotations for students Message-ID: <15d.56facc3b.303558a4@aol.com> scott said: > I hate to leave issues hanging, > but in your case I make an exception. no problem, scott. i'll have an annotation app out soon, so thad won't be left hanging for long. *** jon said: > You have not stated *why* you believe your plain text solution > is superior to XML for this particular application. sorry, i thought i've said that too many times to repeat again. a plain-text solution is superior to x.m.l. because it frees people from the hassle of doing markup. fewer costs, equivalent benefits. fairly straightforward... > You've only dissed the XML approach w/o going into detail > of how your plain text approach will sufficiently solve > the external annotation of digital publications and meet > all the important requirements as we understand them now. first of all, i haven't "dissed" the x.m.l. approach. 
if somebody can present me an x.m.l. solution that works, and is perfectly transparent to users, i'll be happy as a clam. what i have pointed out, respectfully, is that acronyms are not solutions. the feeling here seems to be that "x.m.l. can do that" is an answer. it's not. if x.m.l. can truly do something, give people a straightforward answer about _how_ to do it... i know people who do tech support for software companies. they are constantly having to deal with the expectations that were falsely raised by the marketing department that says "oh sure, our software can do that" simply to close the sale. then, when the customer (rightfully) expects the software to "do that", they are in for the surprise of their life, because -- although the software _can_ do that -- it usually requires a very expensive person (or even a whole crew) to work it. you're doing the same kind of bait-and-switch to people here. everyone gets the _impression_ that x.m.l. is all-powerful, but x.m.l. actually never gets around to doing anything at all! i hope thad, the original poster, will correct me if i'm wrong, but i don't believe he came here asking about what kind of core methodology he could use that would "eventually" let him "write some scripts" or "build an application" to do annotations. he wants to do annotations now, and wants a way to do them that doesn't involve a lot of unnecessary work to implement. the specific task that he has in mind is writing the annotations, not creating an annotation system. believe me, if x.m.l. delivered even _one-fifth_ of its promises, i'd be one of its biggest supporters. but so far it's 95% vapor. and let us not forget that, in order to start milking benefits, _first_ we have to mark up the entire library, and i must again remind people that little progress is happening on that front... > So far, all you've implied is "Trust me, *I'm* building a tool" wrong. my saying is "the proof is in the pudding". i'm not asking anybody to trust anybody. 
be skeptical! when my annotation program is ready, i will let you know. and i "trust" that when your system is ready, you'll tell me. and i "trust" the lurkers will notice who finishes first, and finishes best, and which system is more robust and powerful and cost-efficient and end-user friendly and all that good stuff. > You are being disingenous by implying "they" > (the XML experts) haven't done it because it can't be done. wrong. it _can_ be done in x.m.l. the degree of difficulty is higher than with plain-text. but it's _eminently_ doable. nothing is that difficult in this arena. this ain't putting a man on the moon. it's basically a note-taking application. what i _am_ curious about, though, is why it hasn't been done already? you'd think that with the huge number of people running around saying "x.m.l. is the solution" that _somebody_ would've put a bell on this cat by now. perhaps that degree of difficulty is higher than we think. > The people who authored XML included > a large number of *experienced* software > developers who would eat your lunch. when they finally write an annotation app, show me their program! if it works well, i'll buy 'em beer to go along with my lunch, and take 'em to the strip club that evening... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050817/5aaf16f0/attachment.html From jon at noring.name Wed Aug 17 22:24:29 2005 From: jon at noring.name (Jon Noring) Date: Wed Aug 17 23:18:03 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: <15d.56facc3b.303558a4@aol.com> References: <15d.56facc3b.303558a4@aol.com> Message-ID: <334121333.20050817232429@noring.name> Bowerbird wrote: > i'll have an annotation app out soon, > so thad won't be left hanging for long. But will it work for *him*? Will it meet the requirements he sees that such a system must fulfill? 
That's the purpose of this discussion -- to assess the requirements for any system enabling external annotations of digital publications. This list of requirements is independent of solution type, but will certainly be useful in evaluating how one would implement the application (whether plain text or XML-based.) Btw, since DP, which is providing the lion's share of the PG texts, plans to master them in XML (TEI and XHTML) and to redo many of the early PG texts, the issue that PG's corpus is now mostly plain text becomes less compelling. Jon From tb at baechler.net Wed Aug 17 22:52:01 2005 From: tb at baechler.net (Tony Baechler) Date: Wed Aug 17 23:24:39 2005 Subject: [gutvol-d] Annotations for students In-Reply-To: References: <92d068b011c466997116e41ba04a2359@nuprometheus.com> <92d068b011c466997116e41ba04a2359@nuprometheus.com> Message-ID: <5.2.0.9.0.20050817224723.03b3eca0@bisinc.us> Hello all. I'm sorry, but I'm a little confused about something here. PG has an offer from a teacher to annotate editions of PG classics which are already available. Greg, Jim and Andrew all seem to agree that such notes should not be included in PG. This is too bad since I would like to see such annotating. However, what about a book by O. Henry? There are two editions of this: one without notes, and one with notes by Joe, who posted it. I think the precedent has been set to allow books with contemporary notes added. Why not just assign them a new etext number? Book 17,000, for example, could be a Mark Twain book with footnotes added. If this is unacceptable, why not just have an xxxxx-notes file or directory? For example: 17000.txt, 17000-h.htm, 17000-notes.txt Or, the same as above but 17000-notes/ would be a separate directory with 17000.txt that has notes added. Hopefully this is clear. Is there any reason why this can't be done? This is how page images are done, right?
From Bowerbird at aol.com Wed Aug 17 23:54:38 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 17 23:54:59 2005 Subject: [gutvol-d] Annotations for students Message-ID: tony said: > Greg, Jim and Andrew all seem to agree that such notes > should not be included in PG. This is too bad since > I would like to see such annotating. i would think it would be too difficult to decide _who_ would get to provide the notes for a particular e-text... it's easy enough for people to put their annotations on their website for download by anyone who wants. i think that's the reasonable course of action to take. *** jon said: > But will it work for *him*? i dunno. thad will have to tell me. :+) > Will it meet the requirements he sees > that such a system must fulfill? i'd think so, since i've done a lot of thinking and searching and researching on what the realm of requirements are. but who knows? maybe thad has an idea for a better mousetrap. tell me, what do _you_ think "the requirements" are? > That's the purpose of this discussion > -- to assess the requirements for any system > enabling external annotations of digital publications. not really. i got the impression that thad just wants something that works; he wasn't trying to start a philosophical discussion about the realm of annotation. he just wants to juxtapose his commentary next to the text to which it refers, i'd think. (thad, don't let this melee scare you off, man; step right in and tell us what you really think!) ;+) > This list of requirements is independent of > solution type, but will certainly be useful in > evaluating how one would implement the application > (whether plain text or XML-based.) again, i don't think this is all that complicated. if you've examined all the types of annotation needed, and seen how various e-book programs have implemented solutions, it is straightforward.
> Btw, since DP is providing the lion's share of > the PG texts, to master them in XML (TEI and XHTML), > and plans to redo many of the early PG texts, > the issue that PG's corpus is now mostly plain text > becomes less compelling. yeah, well, you'll let me know when that markup is complete, won't you please? because i'm not holding my breath on it... -bowerbird p.s. i'm still doing some exploring about "sophie", but i'll be getting back to you on that very soon... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050818/31980ea1/attachment-0001.html From curtzt at nuprometheus.com Thu Aug 18 00:33:38 2005 From: curtzt at nuprometheus.com (Thad Curtz) Date: Thu Aug 18 00:34:03 2005 Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 10 In-Reply-To: <20050818065502.10FAF8C916@pglaf.org> References: <20050818065502.10FAF8C916@pglaf.org> Message-ID: <222943DD-1873-4497-9F94-898D3DBF76A6@nuprometheus.com> It's true - I don't have terribly grand aspirations for now (and I have done a good deal of amateur programming in a variety of languages, so a certain amount of technical overhead probably would be OK.) What I'm thinking about at the moment is just: 1. Some basic text formatting including superscripts for footnotes and italics. 2. Line numbers on the side every five lines for poems. 3. Glosses on difficult words - either as footnotes on the side, or as mouseover popups (with overlib, I suppose), or at the bottom of each page. 4. Longer explanatory notes. I think the longer notes need to be at the bottoms of the pages when they're printed, and it might be nice to have the dictionary glosses there too. (There's really not that much space on the side of the page for them...) 
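Item 2 in the list above (a line number in the margin every five lines of a poem) is small enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not anything proposed in the thread: the function name `number_lines`, the right-margin placement, and the 60-column default width are all hypothetical choices.

```python
def number_lines(poem: str, every: int = 5, width: int = 60) -> str:
    """Put a line number in the right margin of every `every`-th verse line.

    Blank lines (stanza breaks) are kept in the output but not counted.
    `every` and `width` are illustrative defaults, not a standard.
    """
    out = []
    count = 0
    for line in poem.splitlines():
        if not line.strip():
            # Stanza break: keep it, do not count it.
            out.append("")
            continue
        count += 1
        if count % every == 0:
            # Pad the verse line to `width` columns, then add the number.
            out.append(f"{line:<{width}}{count:>4}")
        else:
            out.append(line)
    return "\n".join(out)
```

Because the numbering is computed from the text rather than typed into it, the base file can stay unnumbered and the numbers can be regenerated for any page width.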
And I'd like to do it in some way that's as standard as possible, as fast as possible, as simple for students to access as possible, and that makes it as easy as possible for other people who might want to do more to build on what I've done rather than starting over from scratch. (Eventually I assume people will want to be formatting this stuff for all sorts of digital reading platforms and all sorts of page sizes with cheap automated binding at home or at Kinkos or in college printshops.) The problems about getting it to print as decently formatted pages with footnotes at the bottom on a variety of printers are one big reason I'd rather not do it in straight HTML. The problems about people being able to add to it (and file size issues) are reasons I'd rather not do it with PDFs. But I've got now several leads for other tools and tactics to take a look at from your discussion. (As well as some ideas I hadn't thought of about why Gutenberg isn't already doing it...) I'll keep you posted if I get anything done. Thanks, Thad From Bowerbird at aol.com Thu Aug 18 01:40:08 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Aug 18 01:40:37 2005 Subject: [gutvol-d] re: thad weighs in Message-ID: <1a9.3d27336d.3035a368@aol.com> thad said: > What I'm thinking about at the moment is just: cool. thanks for weighing in again. :+) one thing i'd suggest to you is to take a look at some of the .html versions prepared by distributed proofreaders where they have used side-notes. it's pretty impressive, considering they are operating without any javascript... (but yeah, your comment about the difficulty of getting well-formatted pages when printing from .html is well-taken...) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050818/b37198a2/attachment.html From creeva at gmail.com Thu Aug 18 12:14:16 2005 From: creeva at gmail.com (Brent Gueth) Date: Thu Aug 18 12:21:14 2005 Subject: [gutvol-d] About the XML debate Message-ID: <2510ddab05081812141c19d667@mail.gmail.com> Going through the archives in my mail box the last few days I wanted to add my .02 as someone who is not as close to the project as any of you. I think that any XML work (if Gutenberg goes that way) needs to be done in addition to, not in replacement of plaintext. I don't care if XML becomes as common as plaintext and everyone uses it, you can run into a problem in 20 years where XML falls out of favor and there won't be software to render it properly. This will lead to poor fools having to redo all the documents all over again. This is not a good thing. Picking plaintext is genius in the sense that unless basic ASCII changes (not likely compared to XML losing favor) plaintext will always be able to be read. This allows it also to be read on older machines. Maybe some of you don't care that the guy with a Commodore 64 can read plaintext but can't read XML because he is only one person on the planet. But when you see all the other 1 person implementations added together it becomes a decent percentage. My thoughts on software to do the annotations would be to have a reader that could overlay annotations on the screen but maintain the base document in plaintext, or maintain a separate annotated edition. The problem we also run into when we discuss modern annotations is how we decide who is qualified to release (write up) the annotations for a certain book. I may not agree with the annotationist that Bob likes, and Sue will hate the choices Bob and I will make. The best solution I could honestly see to keep a degree of sanity is to Wiki each book you wanted to annotate.
But I'll go back to reading the archives now. I just thought plaintext still needed a champion before the whole world went completely XML crazy. Remember -
plaintext was supposed to be replaced by PostScript
plaintext was supposed to be replaced by WordPerfect
plaintext was supposed to be replaced by Word
plaintext was supposed to be replaced by PDF
plaintext was supposed to be replaced by HTML
plaintext is supposed to be replaced by XML?
Not bloody likely
From collin at xs4all.nl Thu Aug 18 12:53:53 2005 From: collin at xs4all.nl (Branko Collin) Date: Thu Aug 18 12:38:19 2005 Subject: [gutvol-d] About the XML debate In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com> Message-ID: <43050371.25822.59AAF1@localhost> On 18 Aug 2005, at 12:14, Brent Gueth wrote: > Going through the archives in my mail box the last few days I wanted > to add my .02 as someone who is not as close to the project as any of > you. I think that any XML work (if Gutenberg goes that way) needs > to be done in addition to, not in replacement of plaintext. You are absolutely right, and I do not think you have anything to fear. The way I understood it, if some application of XML is going to be used at all, it will be as a storage format. From that format a plain vanilla text file will immediately be generated, which will be stored alongside the XML version. -- branko collin collin@xs4all.nl From Bowerbird at aol.com Thu Aug 18 13:06:14 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Aug 18 13:06:31 2005 Subject: SOPHIE (replacement for tk3 and XML-based) Re: [gutvol-d] Annotations for students Message-ID: <157.56f3b5c8.30364436@aol.com> jon said: > Well, I have written over 100,000 lines of code (Fortran) > over the years. I've also written some scripts. > I've edited and compiled some C code. > I've written a bunch of GWBasic programs.
> And I have a couple associates in the XML world > who are programming wizards and who I often > consult with regarding what can and can't be done. great! then i'm really looking forward to your program, jon... like i always say, the proof is in the pudding. deliver pudding. > Btw, the tk3 example you gave in the other email > you just sent (and I just read), the replacement for it, > SOPHIE, will be *XML-based*. The developer of tk3, > Bob Stein, is the major player of SOPHIE. He's been > around for years and years -- his partner is very experienced. i brought up tk3 because david rothman mentioned it in his blog, so i took a fresh look at it. and since the topic was annotations, and tk3 is one of the programs with good annotation capabilities, it seemed appropriate. now, as you'll note in comments i made over on david's blog, i know tk3 -- and bob stein -- very well. (not personally, but i've followed his work since voyager days.) i've talked to bob, and steve riggins (and steve's wife) as well. i've even dropped some of my e-book programs on them, and they've been impressed. tell 'em to watch out for me! :+) but even after having visited the website you listed, i am puzzled. "sophie" is actually the name of an e-book viewer-app written by _richard_gaskin_, of fourth-world, another l.a.-based programmer. see it at: http://www.fourthworld.com/products/sophie/index.html so this makes me wonder if gaskin and stein are now teaming up? that would be sweet. i've been looking for a worthy competitor in the e-book viewer-program realm, and openreader has been a huge vapor bust so far, so a stein/gaskin product might be the one! but i don't see anything on either website to indicate a merger?... (after reading the .pdf, i see now that there's probably no merger. riggins works in small-talk, while gaskin uses run-time revolution. so it appears that this is just an unfortunate program-name crash, all revolving around programmers who've been on the left coast.) 
> Gee, I wonder why they will now change gears > and embrace XML? They are *experienced* developers > with a lot of knowledge of the tk3 product (over 15 years), > and *they* are switching to XML. well, wonder no more, jon, because i can tell you why. they're going for some hefty venture-capital bucks, and x.m.l. is the trend-word of the decade. if i was looking for an investor sugar-daddy, i'd be spouting x.m.l. too! > You better hurry and convince Bob Stein he needs to > get rid of XML and embrace plain text! again, you think i have something against x.m.l. i don't. if x.m.l. gave me useful tools, and hid all the complicated file-formatting under the hood, i'd be happy to embrace it. what i _do_ have something against is _vaporware_. and that's _especially_ true in regard to electronic-books, which have stagnated through cycle after cycle of _hype_ because nobody -- except for adobe -- has made it easy to _author_ electronic-books that work well on all platforms. read the "sophie" website you listed, and their .pdf, and you will find that they both say the very exact same thing: we need easy authoring-tools to make a revolution happen. but instead of delivering honest-to-goodness, simple-to-use authoring-tools, we've wasted the time fiddling with formats! and when someone comes and asks a "how do i do this?" question, we snow them with a bogus vaporware answer. and then we wonder why nothing ever gets accomplished. > You better hurry and convince Bob Stein he needs to > get rid of XML and embrace plain text! Hurry, > before it is too late! Your programming experience > should convince them of the folly of their ways. nah. i'll let bob burn through that investor cash instead; he's already run tk3 through a couple rounds of funding. (and got another quarter-of-a-million from u.s.c. recently. man, i wonder what p.g. could do with a cool $250,000!) yep, i _like_ to see the venture capitalists fall on their ass chasing the trend-word of the decade, i really do... 
:+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050818/9b3d30a4/attachment.html From marcello at perathoner.de Thu Aug 18 14:28:20 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Aug 18 14:28:33 2005 Subject: [gutvol-d] About the XML debate In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com> References: <2510ddab05081812141c19d667@mail.gmail.com> Message-ID: <4304FD74.5020208@perathoner.de> Brent Gueth wrote: > This is not a good thing. Picking plaintext is genius in the sense > that unless basic ASCII changes (not likely compared to XML losing > favor) plaintext will always be able to be read. Are we confusing ASCII with plain text? Because the former is an encoding and the latter is a format. You are comparing apples with rocks and telling us we should eat rocks because they last longer. Plaintext will stay forever because it defines nothing, and so will never have to be changed. TANSTAAPF: there ain't no such thing as a plaintext format. There are roughly 16,000 plaintext formats around, because every etext defines its own format. You cannot talk of a plaintext "format" at all. > Maybe some of you don't care that > the guy with commodore 64 can read plaintext but can't read XML > because he is only one person on the planet. That's easy to fix: he should get a girlfriend. (But he should leave the C64 at home on the first few dates.) Basically you say that millions of people with modern PCs should be forced to use stone-age technology because one person somewhere cannot afford to get an old PC from eBay? Even the PCs we are sending to African Schools are Pentium class machines!
> plaintext was supposed to be replaced by Postscript > plaintext was supposed to be replaced by word perfect > plaintext was supposed ot be replaced by word > plaintext was supposed to be replaced by PDF > plantext was supposed to be replaced by HTML > plaintext is supposed ot be replaced by XML? > Not bloody likely Horses were supposed to be replaced by cars. Are we confusing existence with fitness for purpose? Or are we confusing existence with demand? Because nobody wants plaintext. Plaintext is ugly on a screen, is ugly on a PDA, is ugly on paper. Plaintext cannot be converted automatically into anything else. But, yes, it exists, like the treponema pallidum. -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Thu Aug 18 22:54:20 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Aug 18 22:54:21 2005 Subject: [gutvol-d] About the XML debate In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com> References: <2510ddab05081812141c19d667@mail.gmail.com> Message-ID: <20050819055420.GC22610@pglaf.org> On Thu, Aug 18, 2005 at 12:14:16PM -0700, Brent Gueth wrote: > ... > The problem also we come into when we discuss modern annotations is > who do we decide who is qualfied to release (write up) the annotations > for a certain book. I may not agree with the annotationist that Bob > likes, and Sue will hate the choices bob and I will make. > > The best solution I could honestly see to keep a degree of sanity is > to Wiki each book you wanted to annotate. Thanks for your note, Brent, and for taking the time to read through the archives. A quick comment on this: PG is more likely to let other folks take care of annotation. Although we have some producer-contributed reviews etc. in some eBooks, we generally look to other sites to host reviews and other editorial content. For example, many of our catalog entries have links to Wikipedia articles for info about authors & titles. 
It might be that we'll have a "PG metadata"-type project affiliate at some point (see our philosophy/FAQ/about documents for some essays on this type of experimentation & growth). But I don't see adding such content to the eBooks themselves any time soon. Of course, such views could change as the people involved in PG change, and the world continues to change... -- Greg From greg at durendal.org Fri Aug 19 11:01:38 2005 From: greg at durendal.org (Greg Weeks) Date: Fri Aug 19 11:30:20 2005 Subject: [gutvol-d] Another rule 6 question Message-ID: When a book is cleared with rule 6, is the artwork cleared also? Can the artwork for the cover and interior illustration have a separate renewal? -- Greg Weeks http://durendal.org:8080/greg/ From joshua at hutchinson.net Fri Aug 19 11:58:12 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Aug 19 12:21:26 2005 Subject: [gutvol-d] How to submit a new file to a posted ebook? Message-ID: <20050819185812.A5B9C2FBB8@ws6-3.us4.outblaze.com> If I've finished producing a version of a posted text (such as HTML or PDF) and I want to get that submitted and posted ... how would I go about that/who should I bug with my e-mail? I'm thinking it would either be one of the white-washers or the errata e-mail (which ends up in Jim's lap anyway), but I don't want to bother those folks who are overworked already, unless I know I'm supposed to be bothering them. :) Josh From jtinsley at pobox.com Fri Aug 19 13:23:20 2005 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Aug 19 13:23:31 2005 Subject: [gutvol-d] How to submit a new file to a posted ebook? In-Reply-To: <20050819185812.A5B9C2FBB8@ws6-3.us4.outblaze.com> References: <20050819185812.A5B9C2FBB8@ws6-3.us4.outblaze.com> Message-ID: <20050819202320.GA6921@panix.com> On Fri, Aug 19, 2005 at 01:58:12PM -0500, Joshua Hutchinson wrote: >If I've finished producing a version of a posted text (such as HTML or PDF) and I want to get that submitted and posted ...
>how would I go about that/who should I bug with my e-mail?

If it's a blind format conversion, please don't.
http://www.gutenberg.org/faq/H-8

If it's just a late completion of a recent text, just upload it in the
normal way.

>
>I'm thinking it would either be one of the white-washers or the errata
>e-mail (which ends up in Jim's lap anyway), but I don't want to bother
>those folks who are overworked already, unless I know I'm supposed to
>be bothering them. :)

I hereby declare myself to be so far behind that nobody is supposed to
be bothering me for years!

jim

From brad at chenla.org Sat Aug 20 06:45:55 2005
From: brad at chenla.org (Brad Collins)
Date: Sat Aug 20 06:46:20 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab05081812141c19d667@mail.gmail.com> (Brent Gueth's
 message of "Thu, 18 Aug 2005 12:14:16 -0700")
References: <2510ddab05081812141c19d667@mail.gmail.com>
Message-ID:

Brent Gueth writes:

> I don't care if XML becomes as common as plaintext and everyone uses
> it, you can run into a problem in 20 years where XML falls out of
> favor and there won't be software to render it properly. This will
> lead poor fools having to redo all the documents all over again.
> This is not a good thing.

As has already been mentioned, ASCII is an encoding and plaintext is a
format.

And ASCII is being replaced with Unicode. Some decades from now ASCII
will gradually go the way of the Dodo. This is inevitable as the vast
number of people in the world require a larger character set to read
and write than native English speakers.

As for plaintext, one of the core design goals for XML is that you'll
be able to open it in any text editor and read it. If a file is human
readable when it's opened in a text editor then it's a type of plain
text. All XML does is place tags around text in order to give the text
a structure that machines can understand.

As long as you have a text editor, you'll be able to read XML.
A good text editor can clean out all of the tags with a simple regular
expression like "<.*[^>]*>".

Script languages like perl, python, ruby or any other language likely
to come down the pike will be able to process XML and convert it into
whatever comes along in the future.

Very few applications render XML directly (except perhaps word
processors), everyone else converts it into html, pdf or other formats
for display.

SGML (XML's older sister) has been around for, what, twenty years or
more? And all SGML documents are easily converted into XML. XML is
simpler and designed to be around as an archive format for far longer
than that.

Think of the XML version of an ebook as an expression of a work, which
is then converted into various manifestations including html, latex
(which can be converted to PDF via Postscript), tei, as well as a plain
text file with no markup.

Most people will never know about the master version in XML, they only
will see the file formats they use to read books.

XML is only a long term and safe archive format which is flexible
enough to describe both the structure of a text and if you want it,
also the semantic content of a text.

I suggest that you google for a basic intro to XML to get an idea of
what it really is. If you know anything about HTML, XML is very easy --
you can think of it as HTML where you can invent your own tags.

I personally don't like DOM and XSLT which are both used for processing
XML and converting it into formats like html which browsers can render.
But this is no problem because I can just as easily convert an XML
document into a LISP data structure of S-expressions which Lisp, Elisp,
Scheme or Guile can process very easily.

Once you understand that XML is just plain text, you can use any
software for processing text to work with it.

As long as there is a text editor, an XML document will never be lost.
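
A minimal sketch of the tag-stripping idea from this message, in
Python. One caveat about the pattern as quoted: "<.*[^>]*>" is greedy,
so on a line holding several tags it matches from the first "<" to the
last ">" and takes the text in between with it. The sketch also tries
"<[^>]+>", which matches one tag at a time; that replacement is an
assumption about the intent, and the sample markup is hypothetical.
Comments, CDATA sections and entities are ignored here.

```python
import re

# Hypothetical one-line sample; not taken from any PG text.
SAMPLE = "<p>Call me <emph>Ishmael</emph>.</p>"

GREEDY = re.compile(r"<.*[^>]*>")   # the pattern as quoted in the message
PER_TAG = re.compile(r"<[^>]+>")    # assumed intent: one match per tag


def strip_tags(text, pattern=PER_TAG):
    """Remove every match of `pattern` from `text`."""
    return pattern.sub("", text)


if __name__ == "__main__":
    print(repr(strip_tags(SAMPLE, GREEDY)))    # ''
    print(repr(strip_tags(SAMPLE, PER_TAG)))   # 'Call me Ishmael.'
```

The greedy form swallows the whole line, which is why a per-tag pattern
(or a real XML parser) is the safer starting point.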
b/

--
Brad Collins, Bangkok, Thailand

From jon at noring.name Sat Aug 20 12:01:08 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 12:01:25 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To:
References: <2510ddab05081812141c19d667@mail.gmail.com>
Message-ID: <14610115858.20050820130108@noring.name>

Brad Collins wrote:
> Brent Gueth writes:

>> I don't care if XML becomes as common as plaintext and everyone uses
>> it, you can run into a problem in 20 years where XML falls out of
>> favor and there won't be software to render it properly. This will
>> lead poor fools having to redo all the documents all over again.
>> This is not a good thing.

> [snip]
>
> As for plaintext, one of the core design goals for XML is that
> you'll be able to open it in any text editor and read it. If a file
> is human readable when it's opened in a text editor then it's a type
> of plain text. All XML does is place tags around text in order to
> give the text a structure that machines can understand.

Good points.

Properly marked up documents, where the XML vocabulary describes the
structure and semantics of the text, are highly repurposeable. Should
the day come that XML disappears from use, it will be relatively easy
to transform such XML documents into whatever is new.

Why? As Brad notes it's because an XML document comprises "plain" text
which has markup added (the markup itself is also "plain" text)
describing what the text is. One can think of markup as simply a sort
of descriptive metadata.

In the worst case scenario where one can't find anyone to write a
script or apply an XML processing application to do the transformation
(a scenario which will only happen if world-wide catastrophe strikes),
so long as there are running computers with text editors lying around,
one can open up the XML document in a text editor, and there is the
"plain" text, right in front of you, nicely described with markup.
Though it may take some work (depending upon the extent of the markup),
and some text metadata information may be lost, one can use the text
editor to strip out the markup and restore the content to "traditional"
PG plain text -- if so desired.

(In essence, XML markup follows Michael Hart's philosophy of using text
encoding to digitally preserve public domain Works.)

DP plans to apply an intelligently-designed XML vocabulary optimized
for book materials to their first-generation masters (they are looking
at a well-constrained subset of TEI, such as PGTEI now under
development by Marcello and others.) This is a good plan.

Jon

From Bowerbird at aol.com Sat Aug 20 12:54:48 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 12:55:01 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <11.4b9bef1f.3038e488@aol.com>

brad said:
> As has already been mentioned,
> ASCII is an encoding and plaintext is a format.

i fail to see how this distinction has any importance
to the original point. the user wants the words free of markup.

> And ASCII is being replaced with Unicode. Some decades
> from now ASCII will gradually go the way of the Dodo.

well, if you want to get into this kind of doubletalk --
which i don't because, as i just said, it has no importance --
then it is inaccurate to say that ascii is "being replaced"
by unicode, since the bottom 127 characters of unicode are
the same 127 ascii characters we've come to know.

if we give the original poster a unicode-aware text-editor,
and a file that contains no heavy markup, he will be happy.
he wants the words, all the words, and nothing but the words.

> As for plaintext, one of the core design goals for XML is that
> it you'll be able to open it in any text editor and read it.

ok, and now here you seem to be trying to say that
an x.m.l. file is a plain-text file. it's not.
it might consist of nothing more than those 127 ascii characters,
but it is decidedly not a plain-text file.
the original poster knows it's not plain-text. so does michael hart.
most people do. including, i suspect, you. why confuse the issue?

> If a file is human readable when it's opened in a text editor
> then it's a type of plain text.

again, this subterfuge is dishonest.

first, it's inaccurate to say that an x.m.l. file is "human readable".
and second, it's misleading to say it is "a type of plain text".
it might be an ascii file, but it's decidedly _not_ "plain-text".

> All XML does is place tags around text in order to
> give the text a structure that machines can understand.

you give machines far too little credit. they can be made to be
far smarter than a dirt-dumb x.m.l. processor, which can _only_
be made to "understand" the structure of text _if_ it is tagged.

> As long as you have a text editor, you'll be able to read XML.

let's give the original poster an x.m.l. file,
and have _him_ say whether he is able to "read it".

just because you can load a file into a text-editor
doesn't mean you'll actually be able to figure out
_how_ to edit the darn thing in the way you want.

and _that_ is the real topic at hand here...

these semantic games do nothing but cloud the discussion.

> A good text editor can clean out all of the tags
> with a simple regular expression like "<.*[^>]*>".

ok, well at least now you're starting to talk about _issues_.
but of course, you're glossing over the reality even here.

the inference you are trying to get us to make is that
"cleaning out all the tags" will convert an x.m.l. file
into a plain-text file, magically. it won't.
not in all cases anyway. not unless the x.m.l. file was created
-- carefully -- with that specific conversion in mind.

i've been writing a separate post that will give details how
this careful consideration and crafting must be done.
(some hints: whitespace, quotemarks, and tables.)
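
The whitespace and quotemark hints can be illustrated with a small
Python sketch. It assumes, purely for illustration, that quotations are
marked with a hypothetical <q> element rather than typed quote
characters, and that the indentation inside the file is only layout.
Deleting the tags then leaves stray whitespace and no quote marks,
while a converter that knows the vocabulary can put both right. Tables
are not attempted here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: <q> marks a quotation, so the file itself
# contains no quote characters, and the indentation is only layout.
SAMPLE = """<p>
    He said, <q>stop</q>
    and left.
</p>"""


def naive_strip(xml_text):
    """Delete the tags and nothing else."""
    return re.sub(r"<[^>]+>", "", xml_text)


def render(elem):
    """Rebuild the text of an element, turning <q> into quote marks."""
    parts = [elem.text or ""]
    for child in elem:
        inner = render(child)
        parts.append('"' + inner + '"' if child.tag == "q" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def to_plain(xml_text):
    """Render the markup, then collapse layout whitespace to single spaces."""
    return " ".join(render(ET.fromstring(xml_text)).split())


if __name__ == "__main__":
    print(repr(naive_strip(SAMPLE)))
    print(repr(to_plain(SAMPLE)))
```

The naive result keeps the newlines and indentation and has lost the
quotation; the vocabulary-aware result reads as an ordinary sentence.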
> Script languages like perl, python, ruby or any other language
> likely to come down the pike will be able to process XML and
> convert it into whatever comes along in the future.

it's telling how all of the hype about x.m.l. is in the present-tense,
but when you focus down to particulars, it moves to future-tense.
pay attention to this, lurkers! it's a sure sign of vapor-ware!

> Very few applications render XML directly
> (except perhaps word processors),
> everyone else converts it into html, pdf
> or other formats for display.

ask yourself why this is the case. the answer is interesting.

> SGML (XML's older sister) has been around for, what,
> twenty years or more? And all SGML documents are
> easily converted into XML. XML is simpler and
> designed to be around as an archive format
> for far longer than that.

in its day, s.g.m.l. made all the same promises as x.m.l. does now.
it couldn't keep them, so s.g.m.l. people had to invent a variant,
so they could regenerate all their hype from scratch and reuse it.
and sure enough, the public is gullible enough to believe it all again.

of course, the same difficulties that thwarted s.g.m.l. back in the day
-- sabotaging all their hype -- will return and bite x.m.l. in the butt.
but by the time we figure out how we've been had this time around,
all the x.m.l. proponents will have carted off their consultant cash...

> Most people will never know about the master version in XML,
> they only will see the file formats they use to read books.

they'll "know about" that x.m.l. version indirectly;
it will be the reason their books are so expensive.
due to all that cash those consultants carted away.

> XML is only a long term and safe archive format

hype and marketing.

> Once you understand that XML is just plain text,
> you can use any software for processing text to work with it.

you can save a spreadsheet in "plain-text" form too,
and then "use any software for processing" that too.
but you're going to find yourself coming up short.

likewise when working with an x.m.l. file in a plain-text editor;
yes, it can be done, but you will find yourself coming up short.

but x.m.l. people will continue telling us this untruth, because
they want us to believe that x.m.l. is really simple. but it's not.

> As long as there is a text editor,
> an XML documment will never be lost.

of course, if it ain't human-readable in that form,
it doesn't really matter if it "will never be lost".
it won't need to be "lost" once it has been "tossed"...

***

i will repeat: make x.m.l. work if you want us to respect it.
don't come and _tell_ us how wonderful it will be; show us.
the proof is in the pudding. not in the hype and marketing.

-bowerbird

From creeva at gmail.com Sat Aug 20 13:38:24 2005
From: creeva at gmail.com (Brent Gueth)
Date: Sat Aug 20 13:38:35 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <11.4b9bef1f.3038e488@aol.com>
References: <11.4b9bef1f.3038e488@aol.com>
Message-ID: <2510ddab050820133870f972bf@mail.gmail.com>

I do understand that unicode is the next generation of ascii, which as
bowerbird pointed out includes the standard ascii characters and is
enhanced from there.

While we could make XML the standard, why shouldn't we just include it
alongside the plaintext human readable revision without markup tags?
If the work is so easy, it would be negligible to keep both revisions
around from initial editing.

I remember years ago when PG first started (I was just an outside
reader completely then) that they chose plain ascii text to not get
mired in any particular format that may be lost. Hence the stone
tablets of the computer world. To one of the people that commented on
my email - yes I want everyone to eat rocks. I believe in PG as an
archival society.
The reason the format is so successful, even in this day and age, is
that it is ubiquitous. It is this commonality that makes it flexible.

Do I really care if there is a separate XML revision from the
plaintext? No I do not. I don't care if we make adobe pagemaker
versions. I just don't want to lose the plaintext. PG hasn't been
futurists in the sense of betting what is going to be common in the
coming decades. In any discussion where we consider replacing the
plaintext revisions with XML, it needs to be asked whether PG is a
futurist or an archival society.

If I ever manage to get my hands on the first edition of Dumas's Count
of Monte Cristo like I want, I am not going to complain that it is in
French, and I can not read French. In that sense it is not usable to me
(it wouldn't matter in book form or any other) since I don't read
French. But for some reason the desire to pretty up the archives with a
replacement format is just that. I would have a book that has survived
150 years (give or take a decade or 2) that is still able to be
translated and worked from easily. Unicode will last at least that. I
guarantee whichever XML revision is chosen, it will be replaced in 150
years and be made obsolete. Unicode on the other hand will still be
around because it is the workhorse of a computer society.

Finally let's leave this with a bit of my own dealing with XML. I work
for a company that produced a major application; we moved from standard
plaintext config files and plaintext logfiles to XML-based ones. This
in turn made tweaking and troubleshooting much more difficult than it
was worth. There were also other problems that arose with that. The
cumbersome activities of our development staff turned a lot of people
away, and gained a lot of new customers at great cost (I don't want to
go into too much detail about my company or product). The main
difference though is my company is supposed to be forward thinking and
is trying to keep up with the Joneses.
PG has no Joneses to compete with; it is a single entity above that
petty bickering. It is a beautiful idea of preserving civilization for
future generations. To survive copyright laws and make the works
available. I also believe though that new books should be edited now by
PG and locked in a storage vault for release at a later date, so the
books themselves survive even if the print copies don't. I add that in
to show I don't follow a straight PG line, but I'm all about keeping
the existence of this information alive for future generations. I'm
about the information and the access and survival of it.

You can beautify it all you want, but I strongly feel the essence and
soul should be maintained as it is now before we lose what makes us
special.

From hacker at gnu-designs.com Sat Aug 20 13:45:48 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 13:46:48 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab050820133870f972bf@mail.gmail.com>
References: <11.4b9bef1f.3038e488@aol.com>
 <2510ddab050820133870f972bf@mail.gmail.com>
Message-ID:

> While we could make XML the standard, why shouldn't we just include
> it alongside the plaintext human readable revision without markup
> tags.

I agree. Use the XML version as the base format, and transform that XML
into plain text (or pdf, jpg, postscript, etc.) from there. Great
solution and I believe that is what this discussion is leading to.

> do I really care if there is a separate XML revision from the
> plaintext? No I do not. I don't care if we make adobe pagemaker
> versions. I just don't want to lose the plaintext.

Exactly. That's what the XML version provides: one consistent base
format through which all others are derived, making the final text,
Adobe PageMaker, whatever... versions identical in content to the
original XML version. Plain text is one of those formats, and if you
prefer to read it in that format, you can do so.
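
The "one base format, everything else derived" idea can be sketched in
a few lines of Python. This is only an illustration under stated
assumptions: the element names (book, title, chapter, head, p) are
invented for the example and are not PGTEI or any other real
vocabulary, and the two renderers are deliberately minimal.

```python
import html
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical master file; the element names are invented for this
# sketch and are not PGTEI or any other real vocabulary.
MASTER = """<book>
  <title>A Sample Book</title>
  <chapter>
    <head>Chapter I</head>
    <p>It was a dark and stormy night.</p>
    <p>The rain fell in torrents.</p>
  </chapter>
</book>"""


def clean(elem):
    """Text of an element with layout whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def to_text(root, width=72):
    """Plain-text rendering: upper-cased title, then wrapped paragraphs."""
    blocks = [clean(root.find("title")).upper()]
    for chapter in root.iter("chapter"):
        blocks.append(clean(chapter.find("head")))
        blocks.extend(textwrap.fill(clean(p), width) for p in chapter.iter("p"))
    return "\n\n".join(blocks) + "\n"


def to_html(root):
    """HTML rendering derived from the same tree."""
    out = ["<h1>" + html.escape(clean(root.find("title"))) + "</h1>"]
    for chapter in root.iter("chapter"):
        out.append("<h2>" + html.escape(clean(chapter.find("head"))) + "</h2>")
        out.extend("<p>" + html.escape(clean(p)) + "</p>" for p in chapter.iter("p"))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    root = ET.fromstring(MASTER)
    print(to_text(root))
    print(to_html(root))
```

Because both outputs are regenerated from MASTER, a typo fixed once in
the master shows up fixed in the text and the HTML on the next run.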
> Finally let's leave this with a bit of my own dealing with XML. I
> work for a company that produced a major application; we moved from
> standard plaintext config files and plaintext logfiles to XML based.
> This in turn made tweaking and troubleshooting much more difficult
> than it was worth.

Why did your company move from plain text to XML? What tools were you
using to process the XML? Moving to XML "Just Because(tm)", is not a
good reason to move that direction. There's a lot of "XML is the
Future" FUD flying around, and too many people are believing it.

Without a solid reason for migrating to XML (as for config files in
your case), then it's the wrong solution.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From hacker at gnu-designs.com Sat Aug 20 13:40:19 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 13:46:50 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <11.4b9bef1f.3038e488@aol.com>
References: <11.4b9bef1f.3038e488@aol.com>
Message-ID:

> i fail to see how this distinction has any importance to the
> original point. the user wants the words free of markup.

[snip]

> if we give the original poster a unicode-aware text-editor, and a
> file that contains no heavy markup, he will be happy. he wants the
> words, all the words, and nothing but the words.

[snip]

> ok, and now here you seem to be trying to say that an x.m.l. file is
> a plain-text file. it's not. it might consist of nothing more than

You spelled XML incorrectly again.

> again, this subterfuge is dishonest.
> first, it's inaccurate to say that an x.m.l. file is "human
> readable". and second, it's misleading to say it is "a type of plain
> text". it might be an ascii file, but it's decidedly _not_
> "plain-text".

Are graphical buttons that contain letters "human readable"? What
about product labels? Billboard signs? None of those are "human
readable" (at least in the capacity that say...
an OCR application could be able to decipher their meaning).

> let's give the original poster an x.m.l. file, and have _him_ say
> whether he is able to "read it".

Sure, you can read the XML file with a browser, if you have the
appropriate stylesheet that goes with it. A text editor does nothing
more than "render" the text to the user's screen. Markup is the
semantic instructions that describe exactly how that text is going to
be rendered.

A "text editor" that understands XML can easily make those tags
invisible to the end user, or fold the sections, etc. This is all just
a silly argument, and by your definition, your own wacky ZML format is
not human readable either.

What exactly is your point with this diatribe anyway? You're not going
to save the world from XML, and you're certainly not going to convince
others here who use it in their daily jobs. So what exactly is your
point?

> just because you can load a file into a text-editor doesn't mean
> you'll actually be able to figure out _how_ to edit the darn thing
> in the way you want.

First it was about giving a 'user' the XML file to read, and now it's
about editing the file? Which is it? If you're trying to edit the file,
you should be expected to have the necessary tools and skills to do so.
"Users" shouldn't be expected to build software on their machines
without the proper development tools and environment set up to do so.

Which brings me to another point: Is source code "human readable"?
It's marked up in a way that provides instructions to the user's editor
and compiler. By your definition, it too can't be considered "human
readable" unless we remove those instructions. Removing them however...
fundamentally changes how the "text file" is handled by the reader.

And also by your definition, since an XML file is not "human readable",
it must fail the test for GPL compliance. How would you provide a
person with the "human readable" format of the source, to remain in
compliance with that license?
Would you consider XML the "machine readable" source instead?

> these semantic games do nothing but cloud the discussion.

By trying to assert that XML isn't plain text, you are the one
confusing the issue. Since < and > are within the 0-127 character
limit, XML is actually ascii text. That means it is "plain text". You
lose this argument based on your own conclusions.

> the inference you are trying to get us to make is that "cleaning out
> all the tags" will convert an x.m.l. file into a plain-text file,
> magically. it won't.

It won't, because it already is a "plain-text" file. Cleaning them out
just removes some of the plain text, leaving other plain text behind.
There is nothing different from removing and from the text, just like
removing (this) and (that) from the text.

> i've been writing a separate post that will give details how this
> careful consideration and crafting must be done. (some hints:
> whitespace, quotemarks, and tables.)

Does it pass an XML validator? Is it well-formed? If not, then it isn't
XML, and it is some other plain-text format with whitespace, quotemarks
and tables.

> pay attention to this, lurkers! it's a sure sign of vapor-ware!

What is the vaporware? I haven't seen it yet. XML exists, it's not
vaporware. I use it quite heavily to store Palm records with
pilot-link. It's a great medium for atomic, record-level data in that
specific case. But I'm seeing that your argument is full of hot air...
or vapor, if you wish to use proper semantics. ;)

> in its day, s.g.m.l. made all the same promises as x.m.l. does now.
> it couldn't keep them, so s.g.m.l. people had to invent a variant,
> so they could regenerate all their hype from scratch and reuse it.

No, SGML is completely different in goal and purpose from XML.

> and sure enough, the public is gullible enough to believe it all again.

When you believe the hype that XML has anything at all to do with the
"Web", then you're the gullible one. XML is an empty bucket, nothing
more.
It simply "holds". That's it. This whole "XML is the future of the web"
business is all just hype pushed by companies trying to sell you
products based on XML that intersect with the web.

> of course, the same difficulties that thwarted s.g.m.l. back in the
> day -- sabotaging all their hype -- will return and bite x.m.l. in
> the butt.

You're spelling SGML and XML incorrectly again. For someone who is
trying to defend what is, and what is not "plain text" or "ascii" or
"unicode", you certainly don't know how to use grammar and spelling
correctly. You would add significant weight to your arguments if you
were able to articulate them using proper English.

> they'll "know about" that x.m.l. version indirectly; it will be the
> reason their books are so expensive. due to all that cash those
> consultants carted away.

Excuse me? How does storing a textual work in XML in any way increase
its price? In fact, it should dramatically decrease the "price",
because it requires less handling to convert to any of a dozen or more
formats.

Having to recreate a work in Word, pdf, XML, text, and so on is much
more "interactive" work if your base format is something other than
XML. It requires much more "carbon-based" handling to maintain in those
formats (not to mention additional storage and processing and
maintenance at update time).

>> XML is only a long term and safe archive format
> hype and marketing.

And your solution is what? Your wacky ZML answer? Please.

> you can save a spreadsheet in "plain-text" form too, and then "use
> any software for processing" that too. but you're going to find
> yourself coming up short.

Not by your definition of "plain text".

> likewise when working with an x.m.l. file in a plain-text editor;
> yes, it can be done, but you will find yourself coming up short.

Funny, not a single anti-XML argument I've ever read (and I've read
hundreds) has ever said "XML is hard to work with because it's not
plain text". Except here of course.

> but x.m.l.
> people will continue telling us this untruth, because
> they want us to believe that x.m.l. is really simple. but it's not.

Because you're the only one who doesn't seem to grasp the means by
which XML can be used, edited and converted, does not mean the format
suffers or is lacking in any way. The "X" in XML stands for Extensible.
So extend it to suit your needs, or use something else. Nobody is
twisting your arm.

> of course, if it ain't human-readable in that form, it doesn't
> really matter if it "will never be lost".

Right, since XML is plain and simple and human readable, the document's
contents will never be lost or buried in an unparsable format or a
format that requires specialized tools to edit or maintain.

> i will repeat: make x.m.l. work if you want us to respect it. don't
> come and _tell_ us how wonderful it will be; show us. the proof is
> in the pudding. not in the hype and marketing.

Have you ever read an XML file that is properly styled, in an editor
that properly renders it with that styling intact? XML was not meant to
be "read" by human eyes. It's a bucket, it "holds". You process it to
turn it into something that can be read by humans or other machines or
whatever. It is "source code" in that respect, to the "compiler" (XSLT,
DOM, parsers) that is used to read it.

And as much as I hate to bring it up, how many times have you openly
exclaimed that you were leaving for good, and failed to do so? More
hype and marketing?

David A.
Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From joshua at hutchinson.net Sat Aug 20 15:43:05 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Aug 20 14:35:28 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab050820133870f972bf@mail.gmail.com>
References: <11.4b9bef1f.3038e488@aol.com>
 <2510ddab050820133870f972bf@mail.gmail.com>
Message-ID: <4307B1F9.6000807@hutchinson.net>

Brent Gueth wrote:

>I do understand that unicode is the next generation of ascii, which as
>bowerbird pointed out includes the standard ascii characters and is
>enhanced from there.
>
>While we could make XML the standard, why shouldn't we just include it
>alongside the plaintext human readable revision without markup tags.
>
>

Let me try to give the reasons why *I* started pursuing XML and see if
that doesn't help allay some of your concerns.

My main involvement with PG texts comes from a DP background. I'm one
of the folks that help put the PG texts in place. So my perspective is
not as much from the point of reading the texts as it is from producing
the texts. This isn't to say I don't consider the reader, but everyone
tries to scratch their own itches first, and my itches are from a
producer's point of view.

When you create a PG text nowadays, most people create multiple
"versions." At the most basic, people usually create the text version
and an HTML version. Text is because that is the minimum required at
PG, and HTML because there is a lot of information that cannot be well
represented by a plain text file opened in Notepad. Images are the
first example that come to mind.

Then, there are some texts which require/practically beg for additional
"versions". We have scientific texts that really need a latex master
document that is rendered to PDF. Languages Other Than English (LOTE)
texts that require a larger character set than ASCII, so you might do a
UTF-8 encoded text.
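
A small Python sketch of why a LOTE text may need more than one
encoded version. The sample strings are hypothetical. It tries to
encode the same text as UTF-8, Latin-1 and ASCII, and also shows that a
pure-ASCII string comes out as the same bytes in all three, which is
the sense in which ASCII survives inside the larger character sets.

```python
# Hypothetical sample strings, chosen only for illustration.
ACCENTED = "Les Misérables"
PLAIN = "Moby Dick"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the charset lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for name in ("utf-8", "latin-1", "ascii"):
        ok = try_encode(ACCENTED, name) is not None
        print(name, "can represent it" if ok else "cannot represent it")

    # A pure-ASCII string is byte-for-byte identical in all three.
    print(PLAIN.encode("ascii") == PLAIN.encode("utf-8") == PLAIN.encode("latin-1"))
```

Here the accented title encodes as UTF-8 and Latin-1 but not as ASCII,
so an ASCII edition would need some substitution for the accented
letter.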
The problem is, once you've created the first version (let's say it is
the UTF-8 encoded plaintext format), you now have to do the manual work
for the other formats. Sometimes this is trivial, sometimes it is not.
But to make matters worse, it is not uncommon to notice a typo in the
HTML that you didn't fix earlier. Now, you have to go back to the other
versions and make the same "fix". This very quickly becomes an
organizational nightmare as I'm sure you can imagine.

XML solves this to a large extent. I create one "master" document and
then literally click a button and I get a UTF-8 encoded .txt file, a
Latin-1 encoded .txt file, an ASCII encoded .txt file, an HTML encoded
file, and a PDF file. I post all of them to the ww'ers in a fraction of
the time. Plus, if someone down the road finds a problem in the text,
the fix can be applied to the master XML and the other files can be
regenerated.

We are not doing away with the .txt files you want. We are coming up
with a more efficient way to create them (along with the many other
document formats people want).

Oh, and yes, it is possible to create conversion routines for other
formats as well. Marcello had a Palm format working at one point, if I
remember correctly. A MS reader .LIT is possible (the specs are freely
available and under a free license, we just need someone to take the
time to create the converter). Rocket ebook reader and others should
all be possible as long as the spec for the format is freely available.

Please feel free to ask any questions you want on the subject. I'll be
happy to run at the mouth all you want! ;)

Josh

From jon at noring.name Sat Aug 20 14:42:55 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 14:43:09 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <11.4b9bef1f.3038e488@aol.com>
References: <11.4b9bef1f.3038e488@aol.com>
Message-ID: <1467962782.20050820154255@noring.name>

Bowerbird wrote:

> i will repeat: make x.m.l. work if you want us to respect it.
Who's "us"? Didn't you mean to say "me"?

From jon at noring.name Sat Aug 20 14:52:48 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 14:53:00 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307B1F9.6000807@hutchinson.net>
References: <11.4b9bef1f.3038e488@aol.com>
 <2510ddab050820133870f972bf@mail.gmail.com>
 <4307B1F9.6000807@hutchinson.net>
Message-ID: <1834636618.20050820155248@noring.name>

Joshua wrote:

[keeping his whole reply intact]

> My main involvement with PG texts comes from a DP background. I'm one
> of the folks that help put the PG texts in place. So my perspective is
> not as much from the point of reading the texts as it is from producing
> the texts. This isn't to say I don't consider the reader, but everyone
> tries to scratch their own itches first, and my itches are from a
> producer's point of view.
>
> When you create a PG text nowadays, most people create multiple
> "versions." At the most basic, people usually create the text version
> and an HTML version. Text is because that is the minimum required at PG,
> and HTML because there is a lot of information that cannot be well
> represented by a plain text file opened in Notepad. Images are the
> first example that come to mind.
>
> Then, there are some texts which require/practically beg for additional
> "versions". We have scientific texts that really need a latex master
> document that is rendered to PDF. Languages Other Than English (LOTE)
> texts that require a larger character set than ASCII, so you might do a
> UTF-8 encoded text.
>
> The problem is, once you've created the first version (let's say it is
> the UTF-8 encoded plaintext format), you now have to do the manual work
> for the other formats. Sometimes this is trivial, sometimes it is not.
> But to make matters worse, it is not uncommon to notice a typo in the
> HTML that you didn't fix earlier. Now, you have to go back to the other
> versions and make the same "fix".
This very quickly becomes an
> organizational nightmare as I'm sure you can imagine.
>
> XML solves this to a large extent. I create one "master" document and
> then literally click a button and I get a UTF-8 encoded .txt file, a
> Latin-1 encoded .txt file, an ASCII encoded .txt file, an HTML encoded
> file, and a PDF file. I post all of them to the ww'ers in a fraction of
> the time. Plus, if someone down the road finds a problem in the text,
> the fix can be applied to the master XML and the other files can be
> regenerated.
>
> We are not doing away with the .txt files you want. We are coming up
> with a more efficient way to create them (along with the many other
> document formats people want).
>
> Oh, and yes, it is possible to create conversion routines for other
> formats as well. Marcello had a Palm format working at one point, if I
> remember correctly. An MS Reader .LIT version is possible (the specs are
> freely available and under a free license, we just need someone to take the
> time to create the converter). Rocket ebook reader and others should
> all be possible as long as the spec for the format is freely available.
>
> Please feel free to ask any questions you want on the subject. I'll be
> happy to run at the mouth all you want! ;)

Kudos! This is by far the best reply I've yet seen on the practical benefits of XML for producing structured digital texts. Cogent, simple, and to the point, backed up by real-world experience.

Joshua, you might consider submitting what you wrote to David Rothman's TeleRead blog as a guest blog article (his blog is one of the more popular blogs on the Internet, and by far the most read blog regarding ebooks and digital libraries.)

Let me know -- I will be glad to assist.

Jon

From Bowerbird at aol.com Sat Aug 20 15:14:55 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 15:15:12 2005
Subject: [gutvol-d] About the XML debate
Message-ID:

david said:
> So what exactly is your point?
my point, exactly, is that a lot of people here tout x.m.l. as the solution, but no one seems to want to pay the cost of actually doing the markup. why is that?

as to all the verbiage about what constitutes whether a file is "human readable" or not, if we ask some humans, the answer is clear.

> Since < and > are within the 0-127 character limit,
> XML is actually ascii text. That means it is "plain text".

um, no. an x.m.l. file is not "plain text". ask a human.

> How does storing a textual work in XML
> in any way increase its price?

applying the markup requires expensive expertise.

> In fact, it should dramatically decrease the "price",
> because it requires less handling to convert to
> any of a dozen or more formats.

or so the hype goes. but where is the pudding?

meanwhile, over at blackmask, daniel has been converting the entire project gutenberg library into a half-dozen formats for several years now, based on the plain-text versions, with zero x.m.l.

> And as much as I hate to bring it up,
> how many times have you openly exclaimed that
> you were leaving for good, and failed to do so?

you'll be dealing with me for a long time, david...

provide some pudding. don't just talk about it. there are 17,000 e-texts waiting to be marked up.

-bowerbird

From Bowerbird at aol.com Sat Aug 20 15:18:47 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 15:19:03 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <1ac.3db11948.30390647@aol.com>

jon noring said something or other.

however, since jon is still moderating michael hart over on jon's listserve, i won't be responding to him.

-bowerbird

From marcello at perathoner.de Sat Aug 20 16:03:14 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 16:03:46 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307B1F9.6000807@hutchinson.net>
References: <11.4b9bef1f.3038e488@aol.com> <2510ddab050820133870f972bf@mail.gmail.com> <4307B1F9.6000807@hutchinson.net>
Message-ID: <4307B6B2.1000905@perathoner.de>

Joshua Hutchinson wrote:

> Marcello had a Palm format working at one point, if I
> remember correctly.

I dropped it because pluckering the html file gives you a better experience at a smaller file size.

The same conversion should be possible for Pocket-PC formats, but I'm not going to buy one just to test this.

-- Marcello Perathoner webmaster@gutenberg.org

From marcello at perathoner.de Sat Aug 20 16:12:09 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 16:12:23 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <2510ddab050820133870f972bf@mail.gmail.com>
References: <11.4b9bef1f.3038e488@aol.com> <2510ddab050820133870f972bf@mail.gmail.com>
Message-ID: <4307B8C9.1080407@perathoner.de>

Brent Gueth wrote:

> While we could make XML the standard, why shouldn't we just include it
> alongside the plaintext human readable revision without markup tags.

That's just what we were going to do.

> To one of the people that commented on my email - yes I want everyone
> to eat rocks.

Tip: don't open a restaurant.
-- Marcello Perathoner webmaster@gutenberg.org

From sly at victoria.tc.ca Sat Aug 20 16:26:05 2005
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Aug 20 16:26:20 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307B1F9.6000807@hutchinson.net>
References: <11.4b9bef1f.3038e488@aol.com> <2510ddab050820133870f972bf@mail.gmail.com> <4307B1F9.6000807@hutchinson.net>
Message-ID:

On Sat, 20 Aug 2005, Joshua Hutchinson wrote:

>
> The problem is, once you've created the first version (let's say it is
> the UTF-8 encoded plaintext format), you now have to do the manual work
> for the other formats. Sometimes this is trivial, sometimes it is not.
> But to make matters worse, it is not uncommon to notice a typo in the
> HTML that you didn't fix earlier. Now, you have to go back to the other
> versions and make the same "fix". This very quickly becomes an
> organizational nightmare as I'm sure you can imagine.
>
> XML solves this to a large extent. I create one "master" document and
> then literally click a button and I get a UTF-8 encoded .txt file, a
> Latin-1 encoded .txt file, an ASCII encoded .txt file, an HTML encoded
> file, and a PDF file. I post all of them to the ww'ers in a fraction of
> the time. Plus, if someone down the road finds a problem in the text,
> the fix can be applied to the master XML and the other files can be
> regenerated.

I'll add this to Josh's well-worded message.

For the white washers and anyone doing maintenance on the PG files, having a variety of file formats to deal with does sometimes make quite a headache.

Recently, I was making some corrections in a text that was in the collection in txt, htm, and rtf formats, and I can tell you that editing rtf manually is not fun.

Also a note that for the example Josh mentioned above, after he submits the files, a white washer will review them with some automatic checking before they are posted, and any corrections made will need to be done individually to each file format.
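
The single-master workflow discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PGTEI converter: the function names and the sample string are hypothetical, and the ASCII fallback shown here (split accented letters, then drop whatever is not 7-bit) is just one possible policy.

```python
import unicodedata

def to_ascii(text):
    """Best-effort 7-bit fallback: split accented letters, drop the accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore")

def render_all(master):
    """Derive every plain-text edition from the one master string."""
    return {
        "utf-8": master.encode("utf-8"),
        "latin-1": master.encode("latin-1", "replace"),
        "ascii": to_ascii(master),
    }

# Hypothetical master text containing a typo ("teh").
master = "Kit\u00e1b-i-Aqdas, teh sample text"

# The correction is made once, in the master, and every edition is regenerated.
fixed = master.replace("teh", "the")
editions = render_all(fixed)
```

The point of the design is that a correction touches one file only; the per-format copies are build products and are never edited by hand.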
Andrew

From joshua at hutchinson.net Sat Aug 20 16:50:40 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sat Aug 20 16:50:53 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>

I played a little with the ReaderWorks converter for HTML to LIT. The biggest limitation is that the LIT format supports a nice Table of Contents feature which a basic HTML to LIT conversion doesn't support. The LIT specs are supposedly free (and under a Free License) but I haven't checked into it any further than that. I suppose after TXT, HTML and PDF are working in the PG mainstream, I'll move on to other formats like the Palm and Reader formats.

----- Original Message -----
From: "Marcello Perathoner"
>
> Joshua Hutchinson wrote:
>
> > Marcello had a Palm format working at one point, if I remember correctly.
>
> I dropped it because pluckering the html file gives you a better experience at
> a smaller file size.
>
> The same conversion should be possible for Pocket-PC formats, but I'm not
> going to buy one just to test this.
>
>
> -- Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From marcello at perathoner.de Sat Aug 20 17:25:09 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 17:25:25 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
Message-ID: <4307C9E5.5070606@perathoner.de>

Joshua Hutchinson wrote:

> I played a little with the ReaderWorks converter for HTML to LIT.
> The biggest limitation is that the LIT format supports a nice Table
> of Contents feature which a basic HTML to LIT conversion doesn't
> support.
The LIT specs are supposedly free (and under a Free
> License) but I haven't checked into it any further than that. I
> suppose after TXT, HTML and PDF are working in the PG mainstream,
> I'll move on to other formats like the Palm and Reader formats.

Plucker lets you download a web site (and therefore an html ebook) to your Palm. Links and images still work. It's GPLed. But it's PalmOS only.

AvantGo does the same for PocketPC. But it is payware.

We need a reader for PocketPC (and Symbian) and an html converter that runs on (at least) linux. Both must be open source.

Any suggestions?

-- Marcello Perathoner webmaster@gutenberg.org

From Bowerbird at aol.com Sat Aug 20 17:20:23 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Aug 20 17:25:39 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <99.64e100fd.303922c7@aol.com>

andrew said:
> having a variety of file formats to deal with
> does sometimes make quite a headache.

yes, having a master-file is indeed a good thing.

but it's just not true that x.m.l. is the only form that a master can take. and it might not be true that x.m.l. is even the _best_ form that it can take.

again, david moynihan at blackmask.com has proven that x.m.l. isn't necessary to generate lots of formats. through the use of standardized formatting, he has been able to do what none of you have even been able to start.

but hey, i look forward to the day when you do get underway, and to the time long after that when all the markup gets done, because then the e-texts will finally be in a regularized format...

-bowerbird

From hacker at gnu-designs.com Sat Aug 20 18:14:20 2005
From: hacker at gnu-designs.com (David A.
Desrosiers)
Date: Sat Aug 20 18:14:52 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307C9E5.5070606@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de>
Message-ID:

> Plucker lets you download a web site (and therefore an html ebook)
> to your Palm. Links and images still work. It's GPLed. But it's PalmOS
> only.

Incorrect. Plucker runs on PalmOS, PocketPC, Windows Mobile, Linux and on non-PDA desktop machines. There are ports of the viewer for those platforms, many of which we carry in CVS.

> AvantGo does the same for PocketPC. But it is payware.

AvantGo falls short of about 40 of Plucker's core features.

> We need a reader for PocketPC (and Symbian) and an html converter
> that runs on (at least) linux. Both must be open source.

Plucker and Vade Mecum (the PocketPC viewer based on Plucker) are the tools you need.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From gbnewby at pglaf.org Sat Aug 20 18:23:35 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sat Aug 20 18:23:36 2005
Subject: [gutvol-d] Another rule 6 question
In-Reply-To:
References:
Message-ID: <20050821012335.GC2094@pglaf.org>

On Fri, Aug 19, 2005 at 02:01:38PM -0400, Greg Weeks wrote:
>
> When a book is cleared with rule 6, is the artwork cleared also? Can the
> artwork for the cover and interior illustration have a separate renewal?

The intent is for a clearance to clear an entire item: intro, artwork, footnotes, etc. When we think it doesn't (for example, if there is a modern/new intro for an old title), we try to mention this in the clearance notes.

If *you* think the artwork post-dates the printed volume, please let us know so we can judge (we == Juliet & I). Otherwise, yes: it's safe to think that any copyright rule applied is for the entire printed work, including artwork.
-- Greg

From marcello at perathoner.de Sat Aug 20 18:33:30 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Aug 20 18:34:02 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To:
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de>
Message-ID: <4307D9EA.5040704@perathoner.de>

David A. Desrosiers wrote:

> Incorrect. Plucker runs on PalmOS, PocketPC, Windows Mobile, Linux
> and on non-PDA desktop machines. There are ports of the viewer for those
> platforms, many of which we carry in CVS.

2.1 What platforms does Plucker run on?

The viewer should run on any PalmOS device running version 2.0.4 or higher of PalmOS, while the desktop tools are supported on Linux, Windows, Mac OS X, and OS/2.

---- http://www.plkr.org/faq/2.1

And, no, I won't tell Aunt Tillie that she just has to pull the sources from CVS and compile if she wants to read a book.

> Plucker and Vade Mecum (the PocketPC viewer based on Plucker) are the
> tools you need.

Is this thing GPLed? Why don't I find any reference to this on the plucker site?

-- Marcello Perathoner webmaster@gutenberg.org

From hacker at gnu-designs.com Sat Aug 20 19:19:06 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Aug 20 19:19:53 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <4307D9EA.5040704@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de> <4307D9EA.5040704@perathoner.de>
Message-ID:

> The viewer should run on any PalmOS device running version 2.0.4 or
> higher of PalmOS, while the desktop tools are supported on Linux,
> Windows, Mac OS X, and OS/2.
>
> ---- http://www.plkr.org/faq/2.1

As you know, the documentation is the last thing to be updated, and we can never track every single project out there using Plucker as an engine (there are now over two dozen of them, commercial and non).
> And, no, I won't tell Aunt Tillie that she just has to pull the
> sources from CVS and compile if she wants to read a book.

Of course not; download the binaries provided on the other websites. In the case of Linux-based PDAs, use the reader packaged for those platforms (we don't provide packages for them, of course, that's not our job). The same goes for the PocketPC and Windows Mobile versions. I'm not sure about a Symbian version, but I know Plucker runs on that new Nokia/Linux tablet device.

> Is this thing GPLed? Why don't I find any reference to this on the
> plucker site?

Perhaps you didn't look? It's been there for almost exactly 2 years:

http://www.plkr.org/news/31

As for the "cobwebs" on the site, the Plucker site is being rewritten from the ground up, and that includes catching up on about 30 news articles that have to be made public as well. We all have day jobs and that takes away from our time to play with these kinds of things.

I've recently been asking the community to help us bring the docs and FAQ and other bits up to date, but the response has been depressingly light.

http://code.plkr.org/docwiki/

And some things I've been working on are over here:

http://code.plkr.org/

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From jon at noring.name Sat Aug 20 19:06:22 2005
From: jon at noring.name (Jon Noring)
Date: Sat Aug 20 20:24:04 2005
Subject: OEBPS to LIT Converter (was Re: [gutvol-d] About the XML debate)
In-Reply-To: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com>
Message-ID: <109403289.20050820200622@noring.name>

Joshua wrote:

> I played a little with the ReaderWorks converter for HTML to LIT.
> The biggest limitation is that the LIT format supports a nice Table
> of Contents feature which a basic HTML to LIT conversion doesn't
> support. The LIT specs are supposedly free (and under a Free
> License) but I haven't checked into it any further than that.
I
> suppose after TXT, HTML and PDF are working in the PG mainstream,
> I'll move on to other formats like the Palm and Reader formats.

LIT is essentially an encapsulated OEBPS 1.0.1 Publication. What ReaderWorks does is take HTML, "conform" it internally to OEBPS, then convert it to LIT using Microsoft's litgen.dll.

Microsoft has a Reader SDK which includes a "demo" to convert OEBPS 1.0.1 into LIT. I've taken that demo and tweaked the C++ code some and then compiled it to generate a "production" level converter which I use for my publishing business. ReaderWorks has some bugs that prevent using the full power of OEBPS that LIT supports.

The LIT format supports the OEBPS Tours and "out-of-spine" feature (where "out-of-spine" content is presented in "pagelets".) Most publishers who produce LIT (using either ReaderWorks or, heaven forbid, Word HTML as the input) are totally unaware of these cool features. I use Tours and "out-of-spine" content a lot in my ebooks (e.g., I put all footnotes into popup pagelets.)

Joshua, I'd be happy to share my OEBPS to LIT converter, as well as a sample OEBPS Publication. You can use the Package supplied in the sample Publication as a template to build your own Packages and implement Tours and "out-of-spine" content.

Let me know...

Jon Noring

From brad at chenla.org Sun Aug 21 00:09:50 2005
From: brad at chenla.org (Brad Collins)
Date: Sun Aug 21 00:10:44 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <1ac.3db11948.30390647@aol.com> (Bowerbird@aol.com's message of "Sat, 20 Aug 2005 18:18:47 EDT")
References: <1ac.3db11948.30390647@aol.com>
Message-ID: <8xyvepfl.fsf@chenla.org>

Bowerbird /

I don't understand why a person who hates XML/HTML so much sends HTML formatted mail to the list.
You should really practice what you preach :) b/ -- Brad Collins , Bangkok, Thailand From marcello at perathoner.de Sun Aug 21 05:40:05 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Aug 21 05:40:27 2005 Subject: OEBPS to LIT Converter (was Re: [gutvol-d] About the XML debate) In-Reply-To: <109403289.20050820200622@noring.name> References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <109403289.20050820200622@noring.name> Message-ID: <43087625.9010104@perathoner.de> Jon Noring wrote: > LIT is essentially an encapsulated OEBPS 1.0.1 Publication. What > ReaderWorks does is take HTML and "conforms" it internally to OEBPS, > then converts it to LIT using Microsoft's litgen.dll. I'll add new formats to the PGTEI converter on the condition that: 1. all components of the converter MUST be open source, 2. all components of the converter MUST run under linux, 3. the new format SHOULD be documented and be an open standard, 4. there SHOULD be at least one free as in beer reader. Ad 1. The converter must run on servers at ibiblio. We cannot afford server licenses. Besides, I'm a narrow-minded free software bigot bastard and proud of it. Ad 2. The converter must run on ibiblio servers which run on linux. Ad 3. Ideally the format should be an open standard like HTML. I personally won't do any work on undocumented formats. But if anybody else takes the trouble I'm not going to stand in their way. Ad 4. Ideally the viewer should be open source, but I'll settle for a free beer one. It just feels wrong to make people pay for a viewer to read free books on. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Sun Aug 21 07:01:10 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sun Aug 21 05:54:42 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! 
In-Reply-To: <4307C9E5.5070606@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de>
Message-ID: <43088926.70704@hutchinson.net>

Thanks to some back and forth with David Widger, we have posted a text to the PG archives that is basically the XML with its straight-from-conversion txt, html and pdf files.

http://www.gutenberg.org/1/6/5/2/16523

For those interested: This book (Kitab-i-Aqdas) is a religious book from the Baha'i Faith. The text is freely available from the Baha'i website with a usage license that allows us to post the text to our archive as long as we don't make any content changes. I've basically converted it from the Microsoft Word format they posted it in to a PGTEI-based master and used that to create text in UTF-8, Latin-1 and 7-bit ASCII, html and pdf.

Regarding the XML. The XML file can be found in the 16523-x subdirectory. These files are not designed to be read directly in a web browser like IE or Firefox. They are plain text files and open just fine in Notepad or vi or any other text editor of choice. For those wishing to play with the XML, our online validator and conversion tools can be found here:

http://www.gutenberg.org/tei

Besides wanting to celebrate the first XML posting ;) ... I'm also looking for constructive criticism. What doesn't look right? What problems do you see with the results?

Thanks for your attention,
Joshua Hutchinson

From hacker at gnu-designs.com Sun Aug 21 06:25:55 2005
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sun Aug 21 06:26:49 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net>
Message-ID:

> Besides wanting to celebrate the first XML posting ;) ... I'm also
> looking for constructive criticism. What doesn't look right?
What
> problems do you see with the results?

Other than the unicode changes, what is the difference between 16523-0.txt and 16523-8.txt? They appear to contain identical content.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From marcello at perathoner.de Sun Aug 21 07:14:25 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 07:14:33 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net>
Message-ID: <43088C41.3000004@perathoner.de>

Joshua Hutchinson wrote:

> Besides wanting to celebrate the first XML posting ;) ... I'm also
> looking for constructive criticism. What doesn't look right? What
> problems do you see with the results?

1. The TEI files would be better named .tei and put into a 16523-tei/ directory. We have other types of XML files (MusicXML) and we don't want to get confused. Besides, TEI is a more specific name than XML.

2. The PDF shows some overly long page headlines. The page headline in pdf is taken from the toc entry ... Maybe I should change that to be taken from the pdf bookmarks, so you have a little more control over it. Personally I would just not include all "notes" in the toc. This is allowed by the license ("in whole or in part").

3. PDF again. The "Synopsis and Codification" section is not indented like in TXT and HTML. That is probably a bug in the converter. I'll look into it.

4. PDF again. Some chapter names contain unicode characters like em-dash and pretty quotes. These are not supported by PDF bookmarks. You have to provide a `dumbed-down' title for the bookmark with: before the .
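
As a rough illustration of point 4, a plain-ASCII bookmark title could be derived mechanically from the typographic one. The Python sketch below rests on assumptions: the replacement table is hypothetical and incomplete, the function name is invented, and the TEI markup that would carry the plain title is not reproduced here because it did not survive in the archived message.

```python
# Typographic characters mapped to plain ASCII stand-ins (an assumed, incomplete table).
PLAIN = {
    "\u2014": "--",   # em-dash
    "\u2013": "-",    # en-dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
}

def plain_title(title):
    """Return a copy of title with typographic characters replaced."""
    return title.translate(str.maketrans(PLAIN))

# Hypothetical chapter title using pretty quotes and an em-dash.
bookmark = plain_title("\u201cNotes\u201d \u2014 Part 1")
```

Characters outside the table pass through unchanged, so a real converter would still need a policy for anything else the bookmark format cannot hold.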
-- Marcello Perathoner webmaster@gutenberg.org

From jon at noring.name Sun Aug 21 08:51:04 2005
From: jon at noring.name (Jon Noring)
Date: Sun Aug 21 08:51:16 2005
Subject: OEBPS to LIT Converter (was Re: [gutvol-d] About the XML debate)
In-Reply-To: <43087625.9010104@perathoner.de>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <109403289.20050820200622@noring.name> <43087625.9010104@perathoner.de>
Message-ID: <348550588.20050821095104@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> LIT is essentially an encapsulated OEBPS 1.0.1 Publication. What
>> ReaderWorks does is take HTML, "conform" it internally to OEBPS,
>> then convert it to LIT using Microsoft's litgen.dll.

> I'll add new formats to the PGTEI converter on the condition that:
>
> 1. all components of the converter MUST be open source,
> 2. all components of the converter MUST run under linux,
> 3. the new format SHOULD be documented and be an open standard,
> 4. there SHOULD be at least one free as in beer reader.

Well, that pretty much leaves LIT out of the picture (essentially by 3 and 4).

However, OEBPS 1.0.1 would be a viable format to produce (and quite easy if the book's documents validate as XHTML 1.0 Strict). Then end-users can produce LIT if they so choose. (I'd also produce an OEBPS 1.2 Publication version -- there are subtle differences between the two.)

As a format, OEBPS fulfills all the openness requirements. There are a couple of primitive viewers (still under development) for OEBPS 1.0.1 and 1.2. These include the "OpenBerg" project. OpenReader (the format) is planning on embracing OEBPS 1.2 and later a selected subset of TEI (PGTEI?).

Jon Noring

From joshua at hutchinson.net Sun Aug 21 11:40:27 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sun Aug 21 10:33:26 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To:
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net>
Message-ID: <4308CA9B.9030601@hutchinson.net>

David A. Desrosiers wrote:

>
>> Besides wanting to celebrate the first XML posting ;) ... I'm also
>> looking for constructive criticism. What doesn't look right? What
>> problems do you see with the results?
>
>
> Other than the unicode changes, what is the difference between
> 16523-0.txt and 16523-8.txt? They appear to contain identical content.
>
>

16523-0.txt is UTF-8 encoding
16523-8.txt is Latin-1 encoding
16523-7.txt is ASCII encoding.

The content should otherwise be identical.

From Bowerbird at aol.com Sun Aug 21 12:42:09 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug 21 12:42:19 2005
Subject: [gutvol-d] About the XML debate
Message-ID: <128.635cfd05.303a3311@aol.com>

brad said:
> I don't understand why a person who hates XML/HTML
> so much sends HTML formatted mail to the list.

it's been forced upon me, as i can't turn it off.

> You should really practice what you preach :)

yes, i should. instead the powers-that-be are forcing things on me, which is what y'all preach. ;+)

meanwhile, this thread has hit 20+ posts over the weekend, which is very rude, since many people do not check their e-mailboxes on the weekend, so i'll opt out until tomorrow.

-bowerbird

From marcello at perathoner.de Sun Aug 21 13:01:25 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 13:01:50 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <128.635cfd05.303a3311@aol.com>
References: <128.635cfd05.303a3311@aol.com>
Message-ID: <4308DD95.3080001@perathoner.de>

Bowerbird@aol.com wrote:

> it's been forced upon me, as i can't turn it off.
ROTFL: the great programmer can't figure out how to install a decent mail program.

Your words are so big, but your actual abilities are pretty small.

> so i'll opt out until tomorrow.

Promises, promises ...

-- Marcello Perathoner webmaster@gutenberg.org

From joshua at hutchinson.net Sun Aug 21 14:14:21 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Sun Aug 21 13:04:32 2005
Subject: [gutvol-d] About the XML debate
In-Reply-To: <128.635cfd05.303a3311@aol.com>
References: <128.635cfd05.303a3311@aol.com>
Message-ID: <4308EEAD.1000805@hutchinson.net>

Bowerbird@aol.com wrote:

> meanwhile, this thread has hit 20+ posts over the weekend,
> which is very rude, since many people do not check their
> e-mailboxes on the weekend, so i'll opt out until tomorrow.

Huh? That sentence is perfect English, yet makes no sense to me at all.

Josh

From Bowerbird at aol.com Sun Aug 21 13:51:15 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug 21 13:51:31 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
Message-ID: <88.2d4a4147.303a4343@aol.com>

joshua said:
> ANNOUNCEMENT: XML has hit the PG archives!

that's great!

one down, 16000+ left to convert...

congratulations!

-bowerbird

From Bowerbird at aol.com Sun Aug 21 13:52:07 2005
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sun Aug 21 13:52:20 2005
Subject: [gutvol-d] re: narrow-minded free software bigot bastard, and proud of it
Message-ID:

marcello said:
> Besides, I'm a narrow-minded free software bigot bastard
> and proud of it.

great. i like people who stand up for what they believe in.

now i have a question. who runs project gutenberg, anyway?

-bowerbird

From marcello at perathoner.de Sun Aug 21 14:31:44 2005
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun Aug 21 14:31:58 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <88.2d4a4147.303a4343@aol.com>
References: <88.2d4a4147.303a4343@aol.com>
Message-ID: <4308F2C0.7060401@perathoner.de>

Bowerbird@aol.com wrote:

>> ANNOUNCEMENT: XML has hit the PG archives!
>
> that's great!
>
> one down, 16000+ left to convert...

That's exactly one more finished item than you can show.

-- Marcello Perathoner webmaster@gutenberg.org

From gbnewby at pglaf.org Sun Aug 21 14:40:04 2005
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun Aug 21 14:40:06 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
In-Reply-To: <43088926.70704@hutchinson.net>
References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net>
Message-ID: <20050821214004.GB3229@pglaf.org>

On Sun, Aug 21, 2005 at 10:01:10AM -0400, Joshua Hutchinson wrote:
> Thanks to some back and forth with David Widger, we have posted a text
> to the PG archives that is basically the XML with its
> straight-from-conversion txt, html and pdf files.
>
> http://www.gutenberg.org/1/6/5/2/16523

Thanks, Joshua. This is major!!

I'm still ready to post Gilgamesh, too (and in fact, had been thinking of just "going for it"). I hope you'll be able to work on it soon.

Today (yesterday?) will stand as a great day in Project Gutenberg history. XML as the base format for these "static" and forthcoming "dynamic" conversions is what we've been talking about for years. It's the key to many of the activities we've anticipated.

Congratulations!!!
-- Greg From jeroen.mailinglist at bohol.ph Mon Aug 22 12:43:19 2005 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Mon Aug 22 13:09:50 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <43088926.70704@hutchinson.net> References: <20050820235041.05A8C109688@ws6-4.us4.outblaze.com> <4307C9E5.5070606@perathoner.de> <43088926.70704@hutchinson.net> Message-ID: <430A2AD7.8050302@bohol.ph> Hurray, more XML! Some time ago (in February 2004), I'd already prepared The Einstein Theory of Relativity by H.A. Lorentz as http://www.gutenberg.org/etext/11335 This was also posted in XML with derived text and HTML. Jeroen. Joshua Hutchinson wrote: > Thanks to some back and forth with David Widger, we have posted a text > to the PG archives that is basically the XML with its straight from > conversion txt, html and pdf files. > > http://www.gutenberg.org/1/6/5/2/16523 > > For those interested: This book (Kitab-i-Aqdas) is a religious book > from the Baha'i Faith. The text is freely available from the Baha'i > website with a usage license that allows us to post the text to our > archive as long as we don't make any content changes. I've basically > converted it from the Microsoft Word format they posted in to a PGTEI > based master and used that to create text in UTF-8, Latin-1 and 7-bit > ASCII, html and pdf. > > Regarding the XML. The XML file can be found in the 16523-x > subdirectory. These files are not designed to be read directly in a > web browser like IE or Firefox. They are plain text files and open > just fine in Notepad or vi or any other text editor of choice. For > those wishing to play with the XML, our online validator and > conversion tools can be found here: > > http://www.gutenberg.org/tei > > Besides wanting to celebrate the first XML posting ;) ... I'm also > looking for contructive criticism. What doesn't look right? What > problems do you see with the results? 
> > Thanks for your attention, > Joshua Hutchinson > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From Bowerbird at aol.com Mon Aug 22 14:02:58 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Aug 22 14:03:14 2005 Subject: [gutvol-d] re: the great programmer Message-ID: <1a2.3a3a3de0.303b9782@aol.com> marcello said: > ROTFL: the great programmer > can't figure out how to > install a decent mail program. > Your words are soo big, > but your actual abilities are pretty small. yeah, i'm just one of the 30 million a.o.l. idiots slobbering on our keyboards... :+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050822/1f5e632a/attachment.html From lee at novomail.net Mon Aug 22 15:59:41 2005 From: lee at novomail.net (Lee Passey) Date: Tue Aug 23 01:49:29 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <20050821132650.DA1298C992@pglaf.org> References: <20050821132650.DA1298C992@pglaf.org> Message-ID: <430A58DD.60004@novomail.net> Joshua Hutchinson wrote: > Thanks to some back and forth with David Widger, we have posted a text > to the PG archives that is basically the XML with its straight from > conversion txt, html and pdf files. > > http://www.gutenberg.org/1/6/5/2/16523 > > For those interested: This book (Kitab-i-Aqdas) is a religious book > from the Baha'i Faith. The text is freely available from the Baha'i > website with a usage license that allows us to post the text to our > archive as long as we don't make any content changes. I've basically > converted it from the Microsoft Word format they posted in to a PGTEI > based master and used that to create text in UTF-8, Latin-1 and 7-bit > ASCII, html and pdf. > > Regarding the XML. The XML file can be found in the 16523-x > subdirectory. 
These files are not designed to be read directly in a > web browser like IE or Firefox. They are plain text files and open > just fine in Notepad or vi or any other text editor of choice. For > those wishing to play with the XML, our online validator and > conversion tools can be found here: > > http://www.gutenberg.org/tei > > Besides wanting to celebrate the first XML posting ;) ... I'm also > looking for contructive criticism. What doesn't look right? What > problems do you see with the results? Congratulations on a worthwhile accomplishment. I would like to point out, however, that this is _not_ Gutenberg's first XML posting; I believe there are hundreds of XHTML files currently available. You probably intended to say that this is Gutenberg's first TEI-XML posting. I know that this seems like picking at some pretty minor nits, but there are some people who believe that there is actually a text markup language called XML. XML is actually a syntax for creating markup languages, and there are many markup language available which conform to the XML syntax, e.g. XHTML, TEI, and DocBook. For clarity's sake it is probably desirable to always refer to a specific XML vocabulary, except when discussing the XML syntax which applies to all XML vocabularies equally. Some specific, and very preliminary observations: As Mr. Noring is always quick to point out, XML files can be viewed natively in both Firefox and IE6 when accompanied by appropriate style sheets, so I attempted to open this file directly in both of these browsers. In IE6, I get the error "The system cannot locate the object specified. Error processing resource 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent'. Apparently, your dtd, http://www.gutenberg.org/tei/marcello/0.3/dtd/pgtei.dtd, contains the line: %TEI; It looks like IE sees a full url for the TEI SYSTEM entity, so it assumes that refers to a file on the same system as "tei2.dtd." 
Of course, the TEI consortium doesn't maintain a file called "pgtei-extensions.ent", so IE fails catastrophically. Now I'm still having a hard time wrapping my head around dtd's, so I have no idea if IE's behavior is technically correct or not, but it would be nice if the dtd's could be reworked in such a way that this failure does not occur, perhaps by hosting the TEI dtd's at http://www.gutenberg.org/tei/marcello/0.3/dtd/, and referencing them there. Firefox does not have this problem, but Firefox also breaks when it encounters named entities, even when the entities are referenced in .ent files included from the dtd's, leading me to believe that Firefox avoids the problems associated with "roaming dtd's" by simply not parsing them in the first place. Numerical entities _are_ recognized, and rendered appropriately, as are named entities when the entity definition is contained in the XML file itself. I have no solution to this problem, except to suggest that named entities simply be avoided in favor of numeric entities, at least in the short term (I do note that the etext 16523-x.xml does not contain any named entities). One of my pet peeves is the use of the

<p> (paragraph) tag as a generic block tag, rather than limiting its use to true paragraphs, and using the <div> tag for generic blocks of text. I am happy to say that the text is mostly correct in this regard. The byline <p>by Bahá'u'lláh</p> should be marked using the <byline> tag instead of <p>
; there may be other similar problems I simply haven't encountered yet. It appears that the file is latin-1 encoded, despite the fact that the DTD claims that it is utf-8 encoded. This caused Firefox some grief as it tried to utf-8-decode some latin-1 accented vowels. I grabbed an arbitrary "tei.css" style sheet off the net, and added the line: to the beginning of the file. Looking at it in both browsers (after I had copied enough .dtd's and .ent's to my local file system that IE could cope) the document looked quirky, but readable. When I deleted the .css file the document turned into a plain-text file, totally without styling, but nothing broke. I think every PGTEI document should probably start with the three lines: and one of the next tasks should be to develop CSS files for generic TEI files and PG TEI files (the "usertei.css" file should be reserved for sophisticated users who may want to override the standard styles). If this were done (and the dtd issues are resolved for IE), the production TEI files should be usable directly by a modern web browser without any kind of pre-processing. If you're interested, I'll start putting together a generic CSS file for TEI. From joshua at hutchinson.net Tue Aug 23 05:06:34 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Aug 23 04:14:54 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <430A58DD.60004@novomail.net> References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net> Message-ID: <430B114A.4030808@hutchinson.net> Lee Passey wrote: > Congratulations on a worthwhile accomplishment. > Thanks! > I would like to point out, however, that this is _not_ Gutenberg's > first XML posting; I believe there are hundreds of XHTML files > currently available. You probably intended to say that this is > Gutenberg's first TEI-XML posting. 
I know that this seems like picking > at some pretty minor nits, but there are some people who believe that > there is actually a text markup language called XML. XML is actually a > syntax for creating markup languages, and there are many markup > language available which conform to the XML syntax, e.g. XHTML, TEI, > and DocBook. For clarity's sake it is probably desirable to always > refer to a specific XML vocabulary, except when discussing the XML > syntax which applies to all XML vocabularies equally. > We've had some back channel discussion on just how to name this and we've decided to change the extension to .tei to give a better indication of what the file is. > Some specific, and very preliminary observations: > > As Mr. Noring is always quick to point out, XML files can be viewed > natively in both Firefox and IE6 when accompanied by appropriate style > sheets, so I attempted to open this file directly in both of these > browsers. > While this is true, our tei files are specifically meant as a master document and NOT as a viewing document. They will NOT parse in any browser "out of the box". As you've seen, you can jury-rig things to the point where it is usuable, but that is not our intention. We provide the HTML files directly for people that want to browse the file in IE or Firefox. Also, we have had some backchannel discussion about how the web server should serve the .tei files. I think Marcello is going to change the server to tell your browser that the .tei files is a mime encoding of text so that it will display like a .txt file would. This will help prevent people from being confused when their browser tries to display the file directly and fails miserably. 
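As a rough illustration of the idea of announcing .tei files as plain text, here is a minimal Python sketch. It is not the gutenberg.org server configuration (that lives in the web server, not in Python); it only uses Python's standard mimetypes table to show what "map the extension to text/plain" amounts to. The file name 16523-x.tei and the helper content_type_for are assumptions made up for this example.

```python
import mimetypes

# Assumption for illustration: ".tei" is mapped to text/plain, as proposed
# in the message above. This edits Python's own lookup table only; it is
# not how the real server is configured.
mimetypes.add_type("text/plain", ".tei")


def content_type_for(filename):
    """Return the MIME type a server using this table would announce."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


# Hypothetical file name, following the 16523-x naming used in the thread.
print(content_type_for("16523-x.tei"))
```

A browser that receives text/plain shows the markup as-is instead of trying to parse it as XML, which is the confusion the change is meant to avoid.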
> > Firefox does not have this problem, but Firefox also breaks when it > encounters named entities, even when the entities are referenced in > .ent files included from the dtd's, leading me to believe that Firefox > avoids the problems associated with "roaming dtd's" by simply not > parsing them in the first place. Numerical entities _are_ recognized, > and rendered appropriately, as are named entities when the entity > definition is contained in the XML file itself. I have no solution to > this problem, except to suggest that named entities simply be avoided > in favor of numeric entities, at least in the short term (I do note > that the etext 16523-x.xml does not contain any named entities). > I personally prefer numeric entities, as well, but for the more common ones, the conversion process will support named entities in the .tei file. Most of them appear as unicode in the HTML, so it typically isn't an issue in the final product. > One of my pet peeves is the use of the

<p> (paragraph) tag as a > generic block tag, rather than limiting its use to true paragraphs, > and using the <div> tag for generic blocks of text. I am happy to say > that the text is mostly correct in this regard. The byline <p>by > Bahá'u'lláh</p> should be marked using the <byline> tag instead of > <p>
; there may be other similar problems I simply haven't encountered > yet. > You are correct. That'll get fixed today. > It appears that the file is latin-1 encoded, despite the fact that the > DTD claims that it is utf-8 encoded. This caused Firefox some grief as > it tried to utf-8-decode some latin-1 accented vowels. > I may be wrong here (Marcello is my unicode guru), but I thought UTF-8 was a superset of Latin1? Anyway, I know if this particular file there are quite a few UTF-8 encoded characters (and a couple more that should be that we found yesterday backchannel). > > If you're interested, I'll start putting together a generic CSS file > for TEI. We aren't too interested in CSS directly for the TEI file (the css file sitting beside the TEI file right now is a mistake ... that should be changed later today). However, once I have a few more documents posted and people seem fairly satisfied with the results, I want to get alternate CSS files submitted by other people for the HTML documents. Also, if any industrious programmers out there know TEI conversions and would like to tackle the job of preparing a conversion process for other end formats (such as Palm files, Plucker, MS Reader, etc) please let me and/or Marcello know. The conversion must run on Linux (our server OS) and be open source (for future compatibility). Josh From greg at durendal.org Tue Aug 23 04:47:20 2005 From: greg at durendal.org (Greg Weeks) Date: Tue Aug 23 04:47:44 2005 Subject: [gutvol-d] 1950 periodicals renewals Message-ID: When looking for the periodicals renewals for 1950 last night I didn't find any. 1947 had about 900 renewals and 1951 had about 1000, but 1948, 1949 and 1950 didn't even have a renewals section in the book. This is "The Catalog of Copyright Entries" in the Carnegie Library. Does anyone know what's going on with these? I've got some journal entries I'm trying to put a rule 6 clearance together for and I neede information on 1950 through 1955. 
-- Greg Weeks http://durendal.org:8080/greg/ From Bowerbird at aol.com Tue Aug 23 08:07:38 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Aug 23 08:07:46 2005 Subject: [gutvol-d] the issues with using stylesheets across many documents Message-ID: <1d9.4317f85a.303c95ba@aol.com> i'd like to assure myself that there is experience here on the issues with using stylesheets across many documents. how would you sum up _the_major_question_, and answer it? what problems typically arise? what are some workarounds? when/how/why do workarounds cause their own problems? anyone who has worked extensively with stylesheets knows their magical power is typically matched by an ornery ability to mess things up too, and very badly. is that expertise here? please, someone, show me that it is, with a detailed treatment. (or else i will have to come in and give one, and you all know how insufferable my superior tone can be, right?). thank you. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050823/1abce568/attachment.html From jon at noring.name Tue Aug 23 08:21:04 2005 From: jon at noring.name (Jon Noring) Date: Tue Aug 23 09:02:12 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <430B114A.4030808@hutchinson.net> References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net> <430B114A.4030808@hutchinson.net> Message-ID: <484135928.20050823092104@noring.name> Joshua wrote: > Lee Passey wrote: >> As Mr. Noring is always quick to point out, XML files can be viewed >> natively in both Firefox and IE6 when accompanied by appropriate style >> sheets, so I attempted to open this file directly in both of these >> browsers. > While this is true, our tei files are specifically meant as a master > document and NOT as a viewing document. They will NOT parse in any > browser "out of the box". 
As you've seen, you can jury-rig things to > the point where it is usuable, but that is not our intention. We > provide the HTML files directly for people that want to browse the file > in IE or Firefox. One value in the direct viewing of PG-TEI documents is for checking the markup -- to make sure the content is properly marked up (Lee later brought up a specific example of incorrectly applied markup to the particular PG-TEI document under discussion.) For example, one could put together a "silly.css", using a variety of text colors, font-styles, font-weights, etc., to highlight certain structures and text semantics. Another knotty issue is that TEI includes structural/semantic markup that current HTML-based browsers don't know how to natively (without CSS) handle or interpret properly (and even with the right CSS some substandard browsers like IE6 can't be forced to handle properly.) This includes the inline note tag -- HTML has never had an inline note tag where it is assumed, even without CSS, the browser will pull the note out of the main flow and present it separately (such as in a popup window.) [HTML *should* have had this feature from the start but that's water under the bridge -- XHTML 2.0 plans to include functionality to allow this, so future browsers will have to be able, without CSS, to extract certain inline stuff and render it outside the main flow, such as in a popup window, to the side, or other means. My kudos to the XHTML working group for implementing this!] > Also, we have had some backchannel discussion about how the web server > should serve the .tei files. I think Marcello is going to change the > server to tell your browser that the .tei files is a mime encoding of > text so that it will display like a .txt file would. This will help > prevent people from being confused when their browser tries to display > the file directly and fails miserably. Good point! 
Another way around the issue is to simply zip up the TEI document for download, and include a separate "readthisfirst.txt" file describing what it is and how to directly render it if that is of interest to the end-user. >> Firefox does not have this problem, but Firefox also breaks when it >> encounters named entities, even when the entities are referenced in >> .ent files included from the dtd's, leading me to believe that Firefox >> avoids the problems associated with "roaming dtd's" by simply not >> parsing them in the first place. This is interesting. Didn't know this. I don't think Firefox has concentrated on general XML rendering. Interestingly FF does support a subset of XLink, thus it is possible, using XLink, to create hypertext links in non-XHTML documents (with the full XLink, it is possible to do other things, such as embed images, to be equivalent to the HTML and tags.) I'll have to repeat this experiment with Opera 8 to see if they've enabled some XLink stuff (Opera 7 did not.) >> It appears that the file is latin-1 encoded, despite the fact that the >> DTD claims that it is utf-8 encoded. This caused Firefox some grief as >> it tried to utf-8-decode some latin-1 accented vowels. > I may be wrong here (Marcello is my unicode guru), but I thought UTF-8 > was a superset of Latin1? Anyway, I know if this particular file there > are quite a few UTF-8 encoded characters (and a couple more that should > be that we found yesterday backchannel). If what Lee refers to as "Latin-1" is ISO-8859, then Lee is right, it is NOT correct to specify the document encoding as UTF-8 since they are incompatible. It is my personal view that ISO-8859 should never be used for the PG masters -- UTF-8 should be used instead. That "7-bit" ASCII conforms to UTF-8 is a nice bonus. (But ISO-8859-x, a.k.a. "8-bit ASCII" and "Latin-1", does not conform to UTF-8.) >> If you're interested, I'll start putting together a generic CSS file >> for TEI. 
> We aren't too interested in CSS directly for the TEI file (the css file > sitting beside the TEI file right now is a mistake ... that should be > changed later today). However, once I have a few more documents posted > and people seem fairly satisfied with the results, I want to get > alternate CSS files submitted by other people for the HTML documents. As noted above, I think a generic CSS file for PG-TEI would be a great idea! It allows direct viewing of the master for errors, and the CSS can be tweaked for direct viewing by end-users (probably restricted to Firefox and Opera in order to handle inline notes, where the CSS has to move the inline notes and similar stuff to a box outside of the flow of the text, maybe highlighted in some way -- as noted above, IE6 chokes on this CSS2 stuff.) Another issue of incompatibility, where CSS may break down, is that the table model in TEI is different in some ways from the HTML table model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI include support for TEI tables? (I would assume it does.) > Also, if any industrious programmers out there know TEI conversions and > would like to tackle the job of preparing a conversion process for other > end formats (such as Palm files, Plucker, MS Reader, etc) please let me > and/or Marcello know. The conversion must run on Linux (our server OS) > and be open source (for future compatibility). For MS Reader, unless one wants to build an unapproved and possibly illegal converter (since the LIT format has been cracked it is now possible), one has to use Microsoft's litgen.dll to produce LIT files, thus restricting the converter to MS Windows (litgen.dll requires, in turn, MSXML for XML document parsing and validation.) Litgen takes as input an OEBPS 1.0.1 Publication. Now I do think it worthwhile to produce OEBPS as one of the output formats. 
PG/DP can generate both OEBPS 1.0.1 (optimized for conversion into LIT so others may do so automatically), and OEBPS 1.2 (which is the current OEBPS standard and is preferable.) Essentially, the process works as follows: PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s) OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication Inline notes would be handled by inserting an anchor link where the note was, and pulling the note into a separate XHTML/OEBPS document. The notes can either be aggregated into one document, or each be kept in their own document. The OEBPS 1.x framework will easily handle multiple documents that comprise one publication (it's very cool, really, in how it works.) Jon (p.s., Lee, did you experiment with Opera 8? They have a full-featured free version -- just have to put up with the ads in the free version.) From joshua at hutchinson.net Tue Aug 23 09:21:50 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Aug 23 09:21:56 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! Message-ID: <20050823162150.78010EE161@ws6-1.us4.outblaze.com> ----- Original Message ----- From: "Jon Noring" > > Another issue of incompatibility, where CSS may break down, is that > the table model in TEI is different in some ways from the HTML table > model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI > include support for TEI tables? (I would assume it does.) > Yes it does. See www.gutenberg.org/tei for a link to the documentation we have on PGTEI. > > PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s) > > OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication > If you or anyone else would like to code something up, I'd be happy to test it out. I'm afraid my talents do not lie in that direction! 
;) Josh From marcello at perathoner.de Tue Aug 23 09:34:40 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Aug 23 09:34:56 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <484135928.20050823092104@noring.name> References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net> <430B114A.4030808@hutchinson.net> <484135928.20050823092104@noring.name> Message-ID: <430B5020.7040307@perathoner.de> Jon Noring wrote: > As noted above, I think a generic CSS file for PG-TEI would be a great > idea! Every PGTEI producer is free to use as many CSS she wants. It just doesn't make sense to post them. > Now I do think it worthwhile to produce OEBPS as one of the output > formats. PG/DP can generate both OEBPS 1.0.1 (optimized for conversion > into LIT so others may do so automatically), and OEBPS 1.2 (which is > the current OEBPS standard and is preferable.) Essentially, the process > works as follows: > > PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s) > > OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication We already produce XHTML 1.0. So if you want to build a converter XHTML -> OEBPS you may start right now. P.S. I just don't have the time nor the inclination to read all your words. If you want better answers I suggest getting to the point faster. -- Marcello Perathoner webmaster@gutenberg.org From sly at victoria.tc.ca Tue Aug 23 09:54:54 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Aug 23 09:55:04 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <430B114A.4030808@hutchinson.net> References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net> <430B114A.4030808@hutchinson.net> Message-ID: On Tue, 23 Aug 2005, Joshua Hutchinson wrote: > I may be wrong here (Marcello is my unicode guru), but I thought UTF-8 > was a superset of Latin1? 
Anyway, I know if this particular file there > are quite a few UTF-8 encoded characters (and a couple more that should > be that we found yesterday backchannel). > Well, if you look merely at abstract numbered code points, it is correct to say that the initial code points of Unicode are numbered the same as ISO Latin-1. However, you have to realize that, while ISO Latin-1 is a legacy encoding in which each character is encoded using only one byte, the nature of Unicode has led to several different methods (Unicode Transformation Formats) of actually encoding each character in a series of bytes. One way to look at UTF-8 is as a compressed format. (When used to encode texts which consist primarily of the characters found in lower ASCII, UTF-16, which uses at least two bytes for each character, results in noticeably longer files.) ASCII characters are encoded the same in UTF-8 as in common legacy single-byte encodings, but all higher numbered characters are represented by multi-byte sequences. Excerpt from: http://en.wikipedia.org/wiki/UTF-8 So the first 128 characters need one byte. The next 1920 characters need two bytes to encode. This includes Latin alphabet characters with diacritics, Greek, Cyrillic, Coptic, Armenian, Hebrew, and Arabic characters. The rest of the BMP characters use three bytes, and additional characters are encoded in four bytes. I hope that is somewhat clear.... Andrew From marcello at perathoner.de Tue Aug 23 09:54:57 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Aug 23 09:55:07 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <430A58DD.60004@novomail.net> References: <20050821132650.DA1298C992@pglaf.org> <430A58DD.60004@novomail.net> Message-ID: <430B54E1.9080900@perathoner.de> Lee Passey wrote: > It appears that the file is latin-1 encoded, despite the fact that the > DTD claims that it is utf-8 encoded. This caused Firefox some grief as > it tried to utf-8-decode some latin-1 accented vowels.
That is just what Apache thinks it is because it doesn't look inside the file before serving it. Apache can be made to serve the encoding based on the file extension. Lacking a definite extension it will serve the default which is iso-8859-1. The same problem exists with all plain text files in the archive. They are all served as iso-8859-1. We cannot fix that unless we rename all files: 12345-8.txt --> 12345-8.txt.8 12345-0.txt --> 12345-0.txt.0 In this case Apache sees the .0 extension, strips it, and serves the file as 12345-0.txt with utf-8 encoding. And don't look at me. I made this suggestion before the new filesystem went live. > I grabbed an arbitrary "tei.css" style sheet off the net, and added the > line: > > You can also include an XSL stylesheet which gives you far more power. But why do you want to look at the TEI file in the browser when there is an HTML file available? -- Marcello Perathoner webmaster@gutenberg.org From kth at srv.net Tue Aug 23 09:45:50 2005 From: kth at srv.net (Kevin Handy) Date: Tue Aug 23 10:08:09 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! In-Reply-To: <20050823162150.78010EE161@ws6-1.us4.outblaze.com> References: <20050823162150.78010EE161@ws6-1.us4.outblaze.com> Message-ID: <430B52BE.5000906@srv.net> Joshua Hutchinson wrote: >----- Original Message ----- >From: "Jon Noring" > > >>Another issue of incompatibility, where CSS may break down, is that >>the table model in TEI is different in some ways from the HTML table >>model. Not sure if this can be fixed with CSS 'display'. Does PG-TEI >>include support for TEI tables? (I would assume it does.) >> >> >> > >Yes it does. See www.gutenberg.org/tei for a link to the documentation we have on PGTEI. > > > Any plans on making something like guiguts for pgtei, and bundling all the conversion routines with it? 
>>PGTEI --> XHTML 1.1 (or XHTML 1.0 Strict) --> OEBPS 1.x Document(s) >> >>OEBPS 1.x document(s) + OEBPS Package --> OEBPS 1.x Publication >> >> >> > >If you or anyone else would like to code something up, I'd be happy to test it out. I'm afraid my talents do not lie in that direction! ;) > > From marcello at perathoner.de Tue Aug 23 10:12:28 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Aug 23 10:12:36 2005 Subject: [gutvol-d] the issues with using stylesheets across many documents In-Reply-To: <1d9.4317f85a.303c95ba@aol.com> References: <1d9.4317f85a.303c95ba@aol.com> Message-ID: <430B58FC.3030109@perathoner.de> Bowerbird@aol.com wrote: > i'd like to assure myself that there is experience here on > the issues with using stylesheets across many documents. No, there is not. We are all new into this. We want to get there and we will learn as we go. Software development is an iterative business. Many problems surface as you start using the first implementation. With what you learn from the first implementation you go back and do the second. And so on. Requiring all problems to be known and solved in advance is the one sure fire thing to never get started. Known as: "analysis paralysis". -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Aug 23 10:31:49 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Aug 23 10:32:03 2005 Subject: [gutvol-d] the issues with using stylesheets across many documents Message-ID: <19d.3a8005db.303cb785@aol.com> marcello said: > No, there is not. marcello, _you_ might not have any large-scale stylesheet experience. but perhaps someone else here has. if not, then it will be a very bumpy ride over the next few years, as y'all learn. > We are all new into this. um, well, speak for yourself. i started using stylesheets back in 1987, with the first release of ventura publisher. and i figure i made the 428 common mistakes most people make, and learned to avoid 'em... 
> We want to get there and we will learn as we go. and again, i'm guessing that someone here has already been there, and back, and will guide you, if you let them. there are a lot of people on this listserve, willing to help... > Software development is an iterative business. and again, someone has already done that back-and-forth. > Many problems surface as you > start using the first implementation. stylesheets are well past their "first implementation". > With what you learn from the first implementation > you go back and do the second. And so on. stylesheets are well past their "second implementation". > Requiring all problems to be known > and solved in advance is the one sure fire thing > to never get started. Known as: "analysis paralysis". and ignoring the lessons of the past is one sure-fire way to repeat the pain... now, is anyone willing to step in and help out project gutenberg? please? thank you... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050823/0edbda33/attachment.html From joshua at hutchinson.net Tue Aug 23 11:08:53 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Aug 23 11:09:00 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! Message-ID: <20050823180853.5BF0EEE19A@ws6-1.us4.outblaze.com> ----- Original Message ----- From: "Kevin Handy" > Any plans on making something like guiguts for pgtei, and > bundling all the conversion routines with it? Not by me personally, but when I turn DP loose on this format, I expect a flurry of tools to follow. :) As an aside, I do all my editing for my TEI files in GuiGuts right now. My only annoyance right now is that I wish it didn't add the darned Byte Order Mark to the beginning of the file when it detects UTF-8 characters. 
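For anyone bothered by the same thing, a Byte Order Mark at the start of a UTF-8 file is just the three bytes EF BB BF and can be removed after saving. The following is a minimal Python sketch of that clean-up step; it says nothing about how GuiGuts itself works, and the function name strip_utf8_bom is made up for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 Byte Order Mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # the bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample bytes standing in for a saved file: a BOM followed by UTF-8 text.
sample = codecs.BOM_UTF8 + "Bah\u00e1'u'll\u00e1h".encode("utf-8")
cleaned = strip_utf8_bom(sample)

print(len(sample) - len(cleaned))  # number of bytes removed
print(cleaned.decode("utf-8"))
```

When only reading such a file, Python's "utf-8-sig" codec does the same job at decode time: it drops a leading BOM if there is one and otherwise behaves like plain "utf-8".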
Josh From lee at novomail.net Tue Aug 23 14:45:34 2005 From: lee at novomail.net (Lee Passey) Date: Tue Aug 23 14:45:49 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 20) In-Reply-To: <20050823190003.BF81A8C832@pglaf.org> References: <20050823190003.BF81A8C832@pglaf.org> Message-ID: <430B98FE.3030906@novomail.net> Marcello Perathoner wrote: > Lee Passey wrote: > >> It appears that the file is latin-1 encoded, despite the fact that >> the DTD claims that it is utf-8 encoded. This caused Firefox some >> grief as it tried to utf-8-decode some latin-1 accented vowels. > > > That is just what Apache thinks it is because it doesn't look inside > the file before serving it. Apache can be made to serve the encoding > based on the file extension. Lacking a definite extension it will > serve the default which is iso-8859-1. In this case I saved the file to my local file system before doing anything with it. Are you suggesting that Apache (your server) looked at the contents of the file it was serving and replace the declaration to "" before serving it? Or are you suggesting that as it transfered the file it changed utf-8 encoded characters to Latin-1 encoding? (I've never seen that behavior in Apache before, but I could have overlooked something.) If I retrieved the file via FTP would it be different than if I retrieved it using HTTP? >> I grabbed an arbitrary "tei.css" style sheet off the net, and added >> the line: >> >> > > > You can also include an XSL stylesheet which gives you far more power. XSL isn't really a stylesheet, it is a scripting language for a transformational engine. XSL has many good uses, but applying styles to a document isn't one of them. Indeed, I've never figured out how to use XSL to style an XML file without having an existing Cascading Style Sheet that I could use for the actual styles. > But why do you want to look at the TEI file in the browser when there > is an HTML file available? 
Why ask why? Actually, I'm not interested in looking at the file at all; it's as boring as hell. What I _am_ interested in is exploring the use of TEI as an archive format, _and_ as a content delivery format. I think that enabling a TEI-XML file to be used by a browser directly, if it can be done without compromising its function as an archive format, is a worthwhile goal, and in many cases better than requiring some sort of XSL transformation before it can be viewed. From joshua at hutchinson.net Tue Aug 23 16:48:52 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Aug 23 15:24:19 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 20) In-Reply-To: <430B98FE.3030906@novomail.net> References: <20050823190003.BF81A8C832@pglaf.org> <430B98FE.3030906@novomail.net> Message-ID: <430BB5E4.5090607@hutchinson.net> Lee Passey wrote: >> >> >>> It appears that the file is latin-1 encoded, despite the fact that >>> the DTD claims that it is utf-8 encoded. This caused Firefox some >>> grief as it tried to utf-8-decode some latin-1 accented vowels. >> Ok, I tried to see what grief you are talking about ... all the accented vowels I looked at are appearing correctly. Which ones are you having trouble with? (This is looking at the XML directly in Firefox) I thought everything in Latin-1 encoding would be the same under a UTF-8 encoding, but evidentally I'm mistaken there (which wouldn't be surprising, my encoding set knowledge is often shaky at best). Josh From jon at noring.name Tue Aug 23 15:31:54 2005 From: jon at noring.name (Jon Noring) Date: Tue Aug 23 15:32:08 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! 
(gutvol-d Digest, Vol 13, Issue 20) In-Reply-To: <430BB5E4.5090607@hutchinson.net> References: <20050823190003.BF81A8C832@pglaf.org> <430B98FE.3030906@novomail.net> <430BB5E4.5090607@hutchinson.net> Message-ID: <122792848.20050823163154@noring.name> Joshua wrote: > Lee Passey wrote: >> It appears that the file is latin-1 encoded, despite the fact that >> the DTD claims that it is utf-8 encoded. This caused Firefox some >> grief as it tried to utf-8-decode some latin-1 accented vowels. > Ok, I tried to see what grief you are talking about ... all the accented > vowels I looked at are appearing correctly. Which ones are you having > trouble with? (This is looking at the XML directly in Firefox) > > I thought everything in Latin-1 encoding would be the same under a UTF-8 > encoding, but evidentally I'm mistaken there (which wouldn't be > surprising, my encoding set knowledge is often shaky at best). Hmmm, I notice in the PG-TEI documentation (version 0.3 at URL: http://www.gutenberg.org/tei/marcello/0.3/doc/20000-h/20000-h.html#toc_12 ) that the "template" has the following DOCTYPE: Why isn't it ? Is this the issue of what Lee observed, or is this a different issue? Jon From marcello at perathoner.de Tue Aug 23 15:52:05 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Aug 23 15:52:18 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 20) In-Reply-To: <122792848.20050823163154@noring.name> References: <20050823190003.BF81A8C832@pglaf.org> <430B98FE.3030906@novomail.net> <430BB5E4.5090607@hutchinson.net> <122792848.20050823163154@noring.name> Message-ID: <430BA895.7080802@perathoner.de> Jon Noring wrote: > Hmmm, I notice in the PG-TEI documentation (version 0.3 at URL: > http://www.gutenberg.org/tei/marcello/0.3/doc/20000-h/20000-h.html#toc_12 ) > that the "template" has the following DOCTYPE: > > > > Why isn't it > > Because most people will want to author their TEI files in iso-8859-1. 
If you want to use utf-8, just change the declaration. But you'll need an editor that groks utf-8. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Tue Aug 23 15:53:25 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Aug 23 15:53:38 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 20) In-Reply-To: <430B98FE.3030906@novomail.net> References: <20050823190003.BF81A8C832@pglaf.org> <430B98FE.3030906@novomail.net> Message-ID: <430BA8E5.4050305@perathoner.de> Lee Passey wrote: > In this case I saved the file to my local file system before doing > anything with it. Then I don't know. The file is correct utf-8. Did you tell your editor that it is an utf-8 file? -- Marcello Perathoner webmaster@gutenberg.org From lee at novomail.net Tue Aug 23 16:31:03 2005 From: lee at novomail.net (Lee Passey) Date: Tue Aug 23 16:31:18 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <20050823160216.6E6438C8E8@pglaf.org> References: <20050823160216.6E6438C8E8@pglaf.org> Message-ID: <430BB1B7.1080006@novomail.net> Joshua Hutchinson wrote: > Lee Passey wrote: [snip] >> >> As Mr. Noring is always quick to point out, XML files can be viewed >> natively in both Firefox and IE6 when accompanied by appropriate >> style sheets, so I attempted to open this file directly in both of >> these browsers. >> > While this is true, our tei files are specifically meant as a master > document and NOT as a viewing document. They will NOT parse in any > browser "out of the box". As you've seen, you can jury-rig things to > the point where it is usuable, but that is not our intention. We > provide the HTML files directly for people that want to browse the > file in IE or Firefox. 
I understand that creating a file format which could be viewed without further processing was not your intention, but now that we have some evidence that suggests that it is a real possibility, is there any reason _not_ to pursue that possibility, especially if it only requires adding three lines to the source (and making sure that all the dtd's are accessible)? [snip] >> I have no solution to this problem, except to suggest that named >> entities simply be avoided in favor of numeric entities, at least in >> the short term (I do note that the etext 16523-x.xml does not contain >> any named entities). >> > I personally prefer numeric entities, as well, but for the more common > ones, the conversion process will support named entities in the .tei > file. Most of them appear as unicode in the HTML, so it typically > isn't an issue in the final product. You are correct; so long as you are relying on conversion to HTML (or some other file format) before the file is used, there should be no problem (so long as the conversion utility can get to the correct .ent files). Use of named entities is only a problem if you are attempting to display the TEI-XML directly. [snip] >> It appears that the file is latin-1 encoded, despite the fact that >> the DTD claims that it is utf-8 encoded. This caused Firefox some >> grief as it tried to utf-8-decode some latin-1 accented vowels. >> > I may be wrong here (Marcello is my unicode guru), but I thought UTF-8 > was a superset of Latin1? Anyway, I know in this particular file > there are quite a few UTF-8 encoded characters (and a couple more that > should be that we found yesterday backchannel). UTF-8 and Latin-1 (aka ISO-8859-1) are both encoding methods. They share the same codepoints (the value of an acute 'e' is 233 in both encodings) but they use different encoding methods. Neither is a superset or subset of the other.
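The distinction between a shared code point and its differing byte forms can be checked directly. What follows is a minimal Python sketch, not anything taken from the PG tools; the helper names `byte_lengths` and `decodes_as_utf8` are made up for illustration. It shows acute 'e' as code point 233 in both encodings, one byte in Latin-1 and two in UTF-8, and it shows why a Latin-1 byte stream fed to a UTF-8 decoder causes the kind of grief described for Firefox.

```python
# Illustrative sketch only: helper names are hypothetical, not part of any PG tool.

E_ACUTE = "\u00e9"  # acute 'e', code point 233


def byte_lengths(ch):
    """Return (Latin-1 length or None, UTF-8 length) for one character."""
    try:
        latin1_len = len(ch.encode("latin-1"))
    except UnicodeEncodeError:
        latin1_len = None  # the character cannot be represented in Latin-1
    return latin1_len, len(ch.encode("utf-8"))


def decodes_as_utf8(data):
    """True if the byte string is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(ord(E_ACUTE))                # same code point in both encodings
    print(E_ACUTE.encode("latin-1"))   # one byte
    print(E_ACUTE.encode("utf-8"))     # two bytes
    # Code points 0-127 take one byte in UTF-8, 128-2047 take two,
    # and 2048 and up take three or more.
    for ch in ("e", E_ACUTE, "\u07ff", "\u0800"):
        print(hex(ord(ch)), byte_lengths(ch))
    # A Latin-1 file mislabelled as UTF-8 fails to decode:
    print(decodes_as_utf8("caf\u00e9".encode("latin-1")))
    print(decodes_as_utf8("caf\u00e9".encode("utf-8")))
```

Running a check like `decodes_as_utf8` over a saved copy of a file is one quick way to tell whether the bytes on disk match what the declaration claims.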
Values from 0 to 127 are the same in both encodings, but values from 128 to 255 are encoded in a single byte in Latin-1 whereas those same values are encoded in two bytes in UTF-8. Values above 255 are represented in two or more bytes in UTF-8 (up to 6) where those same values cannot be represented at all in Latin-1. From an efficiency standpoint (which is not always the best way to look at things) if you have an English text which contains some few characters having values above 127, and which has as many above 255 as below, or if you have a text which contains a large number of characters with values above 255, UTF-8 is probably the most efficient encoding (size-wise). If you have a Western European text with a large number of characters above 127, but very few above 255 (French is a good example), Latin-1, with values above 255 expressed as entities (numeric or named), is probably the most efficient encoding. If you have a text where most of the characters have values above 1920, UTF-16 is probably the most efficient encoding (now we're really straying from the point). In any case, it doesn't matter which encoding is used, so long as it is not misrepresented in the declaration. >> If you're interested, I'll start putting together a generic CSS file >> for TEI. > > > We aren't too interested in CSS directly for the TEI file (the css > file sitting beside the TEI file right now is a mistake ... that > should be changed later today). However, once I have a few more > documents posted and people seem fairly satisfied with the results, I > want to get alternate CSS files submitted by other people for the HTML > documents. Well, I might do it anyway for my own edification and enjoyment (and because I think you _will_ be interested at some point in the future ;-).) Some months ago I put together a couple of tables showing how HTML could be mapped to TEI-lite, and vice-versa.
The goal was to create a mapping that could be used for round-tripping via XSLT; that is, a TEI-lite document could be used to create an HTML document which could then be transformed back into TEI without loss of markup. I will probably start from those tables in creating a tei.css file. They may also be useful to you in creating XSLT scripts (aka XSL style sheets). If you're interested they can be found at www.passkeysoft.com/~lee/xhtml2tei.html and www.passkeysoft.com/~lee/tei2xhtml.html. > > Also, if any industrious programmers out there know TEI conversions > and would like to tackle the job of preparing a conversion process for > other end formats (such as Palm files, Plucker, MS Reader, etc) please > let me and/or Marcello know. The conversion must run on Linux (our > server OS) and be open source (for future compatibility). You probably don't need anything more than someone with basic shell scripting capabilities, as all the software to do this exists currently. When you say Palm files, I am assuming you mean PalmDOC files, which are nothing more than text files converted into the Palm Database format. This conversion can be performed by the command line program "Makedoc". Source code is available at http://linuxmafia.com/pub/palmos/other-os/makedoc9.tar.gz. The shell script would be: PGTEI -> (via XSLT) -> .txt -> (via makedoc9) -> .pdb Plucker is a program which encapsulates a bundle of HTML files into a single file which can be rendered on the PalmOS. The script for a plucker transformation should be very similar to the PalmDOC transformation (I'm certain Mr. Desrosiers could help you with the precise syntax): PGTEI -> (via XSLT) -> HTML -> (via plucker distiller) -> .pdb To my knowledge there are no lit compilers that run on Linux (thus making them ineligible by your requirements).
This is not really a big deal because most MSReader users who are familiar with Project Gutenberg are comfortable making .lit files from HTML themselves, so if you can serve good HTML they will be happy. What I would really like to see is an XSL script that could do a PGTEI -> RTF transformation. It probably wouldn't be very useful, but it would sure be interesting. Now on a separate note: As part of my CSS experimentation, I set the display setting for the element to "none", because while I think the data is important, I'm not particularly interested in seeing it when I'm reading. When I did this, I thought I lost the title of the book because it only appears in the element. I discovered later the title was repeated in the element, identified as a er. As I read the TEI spec, (and I am by no means well-versed) I believe that there should also exist a element which should be part of the , and which should contain all the information traditionally found on the title page of a book. The main title should be marked as , subtitles should be marked as , and the byline should be marked as . This would be in addition to the information included in the element, which may be formatted differently (e.g. the author's name may be presented last name first for automated catalog processing). I also had some question about the difference between the element and the element. Looking at the spec it seems that the <title> element is not to be used to indicate the title of the work, as would appear on a title page, but the title of _another_ work referenced in the main work (these are the titles we were taught to underline back in the days of single font typewriters). For example, if _The Kitáb-i-Aqdas_ made reference to the _Baghad-Vita_, it would be marked as <title>The Baghad-Vita, and should probably be rendered with an italicised font. I also note that you encoded the glossary at the end of the work with

tags (naughty, naughty). Based on what I saw in the TEI docs I would have encoded it as follows:

Glossary The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest son and appointed Successor of Bahá'u'lláh, and the Centre of His Covenant.
I hope you find this useful. From marcello at perathoner.de Tue Aug 23 17:09:27 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Aug 23 17:09:41 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <430BB1B7.1080006@novomail.net> References: <20050823160216.6E6438C8E8@pglaf.org> <430BB1B7.1080006@novomail.net> Message-ID: <430BBAB7.8050906@perathoner.de> Lee Passey wrote: > I understand that creating a file format which could be viewed without > further processing was not your intention, but now that we have some > evidence that suggests that it is a real possiblity is there any reason > _not_ to pursue that possiblity, especially if it only requires adding > three lines to the source (and making sure that all the dtd's are > accessible)? Supporting CSS styling will add another complexity layer to an already overly complex thing. A software architect has to leave things out to make the design implementable. Also, things like footnotes are impossible with CSS. So why bother? > In any case, it doesn't matter which encoding is used, so long as it is > not misrepresented in the declaration. Both the TEI and the XHTML file are correct. I don't know why it doesn't work for you. > As part of my CSS experimentation, I set the display setting for the > element to "none", because while I think the data is > important, I'm not particularly interested in seeing it when I'm > reading. When I did this, I thought I lost the title of the book because > it only appears in the element. I discovered later the > title was repeated in the element, identified as a er. As > I read the TEI spec, (and I am by no means well-versed) I believe that > there should also exist a element which should be part of > the , and which should contain all the information traditionally > found on the title page of a book. That is for the encoder to decide. 
If the title page is interesting enough to warrant a separate encoding, she will use etc. to mark it up. If the title page is just plain boring you can generate a standard title page with . This will pull all data out of the and save you the trouble. There are a lot of such shortcuts implemented like and . > I also note that you encoded the glossary at the end of the work with >

tags (naughty, naughty). Based on what I saw in the TEI docs I would > have encoded it as follows: > >

> Glossary > > > The "Servant of Bahá", Abbás Effendi (1844-1921), the eldest son > and appointed Successor of Bahá'u'lláh, and the Centre of His > Covenant. And it wouldn't have validated because gloss has no business inside list. -- Marcello Perathoner webmaster@gutenberg.org From jmdyck at ibiblio.org Tue Aug 23 17:55:07 2005 From: jmdyck at ibiblio.org (Michael Dyck) Date: Tue Aug 23 17:57:45 2005 Subject: [gutvol-d] 1950 periodicals renewals References: Message-ID: <430BC56B.5D6E7524@ibiblio.org> Greg Weeks wrote: > > When looking for the periodicals renewals for 1950 last night I didn't > find any. 1947 had about 900 renewals and 1951 had about 1000, but 1948, > 1949 and 1950 didn't even have a renewals section in the book. This is > "The Catalog of Copyright Entries" in the Carnegie Library. Does anyone > know what's going on with these? I've got some journal entries I'm trying > to put a rule 6 clearance together for and I need information on 1950 > through 1955. From 1947 to 1950, renewals for all classes of registrations were published together in Part 14 of the CCE (which was actually split into 14B for Music and 14A for everything else). In 1947, they divided up 14A somewhat by class (so periodicals renewals only occupied 3 pages), but then they gave up and put all of 14A into a single collation. So the periodical (class B) renewals for 1950 are in Part 14A, interfiled with all the other renewals for everything-but-music. E.g., for Jan-June 1950, they're spread over pages 1-60 of Part 14A, and for July-Dec 1950, they're spread over pages 61-121. PG has text versions of the *book* renewals for 1950-1977. Because of the 1948-1950 interfiling, PG also has the periodical renewals for 1950, scattered throughout etexts #11801 (Jan-June 1950) and #11802 (July-Dec 1950). E.g., in #11801, the first periodical renewal is under "Abbott's Digest of All the New York Reports".
(You can tell it's a periodical renewal because its original registration starts with 'B'.) Page images for these two volumes appear at . --- After 1950, the Copyright Office discontinued Part 14 of the CCE, and went back to having each Part of the CCE also contain the renewal records in the classes covered by that Part. So the periodical renewals for 1951-1977 are in Part 2 of the CCE. E.g., for Jan-June 1951, they're on pages 155-159 of Part 2, and for July-Dec 1951, they're on pages 303-307. Page images for these two sections appear at , about 3/4 of the way down the page. -Michael Dyck From jon at noring.name Tue Aug 23 18:12:05 2005 From: jon at noring.name (Jon Noring) Date: Tue Aug 23 18:12:27 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <430BB1B7.1080006@novomail.net> References: <20050823160216.6E6438C8E8@pglaf.org> <430BB1B7.1080006@novomail.net> Message-ID: <1218078621.20050823191205@noring.name> Lee Passey wrote: > Joshua Hutchinson wrote: >> While this is true, our tei files are specifically meant as a master >> document and NOT as a viewing document. They will NOT parse in any >> browser "out of the box". As you've seen, you can jury-rig things to >> the point where it is usuable, but that is not our intention. We >> provide the HTML files directly for people that want to browse the >> file in IE or Firefox. > I understand that creating a file format which could be viewed without > further processing was not your intention, but now that we have some > evidence that suggests that it is a real possiblity is there any reason > _not_ to pursue that possiblity, especially if it only requires adding > three lines to the source (and making sure that all the dtd's are > accessible)? Well, my investigation into PG-TEI and TEI-P4X (thank heavens for TEI Pizza Chef to flatten the otherwise unreadable TEI-P4 DTD!) shows it is also a real possibility. 
But I believe, subject to change as I learn more from the experts here and the TEI-L folk, that in order to make PGTEI+CSS2 to render in web standards browsers (limited now to Firefox and maybe Opera 8) we also have to appropriately constrain/subset the PG-TEI vocabulary (allowed elements/attributes/attr-values) and content models (what results may be somewhat like TEI-Lite, but not exactly the same -- we can certainly add our own tags as needs require.) We may also have to give up a couple things. [Note: Even if CSS2 rendering is not of interest, I think PG-TEI, when released as version 1.0, needs to be appropriately constrained to make life a whole lot easier for everyone using it -- subject of a future message if this topic comes up.] Assuming appropriate constraints, here's the five items needing further investigation to see how to get them to render properly using CSS2 (there may be other TEI constructs which don't fit well into the XHTML model): 1) The TEI tag. If placed directly inline (not indirectly referenced), it is possible in CSS2 to declare it block and move it outside of the main flow, which is a reasonable way to present it (even if not the best.) I've actually experimented with this, but my test files are inexplicably long-lost . This won't work in IE6, but then IE6 sucks when it comes to web standards support. (I assume with XSLT that more advanced moving around of the content within notes is possible to do, such as dumping it into another document or placing it in a notes section.) 2) Hypertext links. CSS2 'display' provides no mapping for anchors. XLink will work, but then that's outside of TEI. (XLink for hypertext linking is recognized in Mozilla/Firefox, but not in Opera 7 -- don't know about Opera 8 yet. Try the following test: http://www.windspun.com/demoxml/demolink.xml 3) Tables. I think the basic TEI table model will map to the XHTML model (there's quite a few table-related CSS2 'display' values.) 
However, if PG-TEI will optionally allow other table models to be used, such as CALS, all bets are off. I'm not sure that even XSLT will be able to properly map any CALS table to XHTML (may require something outside of XSLT to do the transformation.) 4) Lists. I think that TEI Lists can be made to render properly with CSS2 'display', but not sure. It needs experimentation. 5) Images. CSS2 'display' has no mapping for images and objects. XLink provides the ability to embed objects, but no web browser appears to support this functionality of XLink yet, and anyway XLink will not be used to specify images in PG-TEI documents. (Hmmm, I think here it may be possible with CSS2 to pull out the name of the image and then use that name as a string to embed the image back in -- CSS2 is capable of image embedding. Need to experiment with it. It might work in IE6, too.) >> I personally prefer numeric entities, as well, but for the more common >> ones, the conversion process will support named entities in the .tei >> file. Most of them appear as unicode in the HTML, so it typically >> isn't an issue in the final product. > You are correct; so long as you are relying on conversion to HTML (or > some other file format) before the file is used, there should be no > problem (so long as the conversion utility can get to the correct .ent > files). Use of named entities is only a problem if you are attempting to > display the TEI-XML directly. Yes, definitely! Of course, those named character entities which are defined in HTML/XHTML will be renderable in webs standards browsers. But I think it best, in whatever DP exports as PG-TEI, to use numeric character entities. For primarily "ASCII" documents, a manifest of non-ASCII characters used in the document can be placed in a comment somewhere in the header. This allows someone to know what ሴ found in the text is (here it is an Ethiopic character), without having to refer to the Unicode docs. 
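A manifest of this kind is easy to generate mechanically. The sketch below is a hypothetical Python illustration, not a description of how anyone on the list actually builds theirs; the function names `non_ascii_manifest` and `as_xml_comment` are invented here. It lists each non-ASCII character in a text as a numeric character reference together with its Unicode name and a count, then formats the result as an XML comment suitable for a header.

```python
# Hypothetical sketch: function names and output layout are assumptions.
import unicodedata
from collections import Counter


def non_ascii_manifest(text):
    """Return sorted (numeric reference, Unicode name, count) tuples."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"&#x{ord(ch):04X};", unicodedata.name(ch, "UNNAMED"), n)
        for ch, n in sorted(counts.items())
    ]


def as_xml_comment(manifest):
    """Format the manifest as an XML comment for the document header."""
    lines = ["<!-- Non-ASCII characters used in this document:"]
    for ref, name, n in manifest:
        lines.append(f"     {ref}  {name}  (x{n})")
    lines.append("-->")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "caf\u00e9 na\u00efve caf\u00e9"
    print(as_xml_comment(non_ascii_manifest(sample)))
```

Someone reading the file then sees the character's name next to its numeric reference without opening the Unicode charts.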
I build a non-ASCII character manifest for many of the XHTML documents I author. > In any case, it doesn't matter which encoding is used, so long as it is > not misrepresented in the declaration. Yes. To reply to Marcello's comment in another message, the PG-TEI documentation should make it clear, and provide an example, of using either ISO-8859-1 or UTF-8 in the XML declaration. If it was my druthers, only UTF-8 should be used, but a compromise where ISO-8859-1 can also be used is acceptable. But no others for all mostly Latin documents! And I'd work at a future time to re-encode documents in ISO-8859-1 into UTF-8. >> We aren't too interested in CSS directly for the TEI file (the css >> file sitting beside the TEI file right now is a mistake ... that >> should be changed later today). However, once I have a few more >> documents posted and people seem fairly satisfied with the results, I >> want to get alternate CSS files submitted by other people for the HTML >> documents. > Well, I might do it anyway for my own edification and enjoyment (and > because I think you _will_ be interested at some point in the future ;-).) Careful Lee, you almost sound like Bowerbird on that one (but not quite.) I think it is an excellent exercise to explore how to properly render XML-conforming TEI documents using only CSS2 in web standards browsers. It may indicate how to constrain TEI so it is renderable, which may be useful for the set of criteria to build the constrained PG-TEI subset of TEI. It is also useful for the proposed TEI support in OpenReader. > Some months ago I put together a couple of tables showing how HTML could > be mapped to TEI-lite, and vice-versa. The goal was to create a mapping > that could be used for round-tripping via XSLT; that is, a TEI-lite > document could be used to create an HTML document which could then be > transformed back into TEI without loss of markup. I will probably start > from those tables in creating a tei.css file. 
They may also be useful to > you in creating XSLT scripts (aka XSL style sheets). If you're > interested they can be found at > www.passkeysoft.com/~lee/xhtml2tei.html > and www.passkeysoft.com/~lee/tei2xhtml.html. Well, round-tripping using XSLT and direct rendering of TEI using CSS2 are two different things. I believe XSLT has more power, but CSS2 is not bad, and CSS3 adds some new stuff (but mostly not supported in Firefox and Opera.) >> Also, if any industrious programmers out there know TEI conversions >> and would like to tackle the job of preparing a conversion process for >> other end formats (such as Palm files, Plucker, MS Reader, etc) please >> let me and/or Marcello know. The conversion must run on Linux (our >> server OS) and be open source (for future compatibility). > To my knowledge there are no known lit compilers that run on Linux (thus > making them ineligble by your requirements). This is not really a big > deal because most MSReader users who are familiar with Project Gutenberg > are comfortable making .lit files from HTML themselves, so if you can > serve good HTML they will be happy. My view in LIT production is to go from PG-TEI to well-structured XHTML 1.1 (which is probably what Lee means by "HTML".) Then from there build OEBPS 1.0.1 (LIT optimized) and OEBPS 1.2. Then let end-users convert the OEBPS 1.0.1 to LIT using the simple litconvertdemo in MS Reader's SDK (I have a "non-demo" version of the same). This approach takes full advantage of what LIT provides, while ReaderWorks does not (RW is buggy plus does not support a couple of the Reader/LIT features.) That is, to produce the hightest quality LIT having available the full range of Reader/LIT features, it is much better to start with OEBPS 1.0.1 than to use ReaderWorks which assembles HTML fragments. 
Jon From greg at durendal.org Tue Aug 23 18:20:51 2005 From: greg at durendal.org (Greg Weeks) Date: Tue Aug 23 18:21:05 2005 Subject: [gutvol-d] 1950 periodicals renewals In-Reply-To: <430BC56B.5D6E7524@ibiblio.org> Message-ID: On Tue, 23 Aug 2005, Michael Dyck wrote: > Greg Weeks wrote: > > > > When looking for the periodicals renewals for 1950 last night I didn't > > find any. 1947 had about 900 renewals and 1951 had about 1000, but 1948, > > 1949 and 1950 didn't even have a renewals section in the book. This is > > "The Catalog of Copyright Entries" in the Carnegie Library. Does anyone > > know what's going on with these? I've got some journal entries I'm trying > > to put a rule 6 clearance together for and I neede information on 1950 > > through 1955. > > >From 1947 to 1950, renewals for all classes of registrations were > published together in Part 14 of the CCE (which was actually split into > 14B for Music and 14A for everything else). In 1947, they divided up 14A > somewhat by class (so periodicals renewals only occupied 3 pages), but > then they gave up and put all of 14A into a single collation. > > So the periodical (class B) renewals for 1950 are in Part 14A, > interfiled with all the other renewals for everything-but-music. E.g., > for Jan-June 1950, they're spread over pages 1-60 of Part 14A, and for > July-Dec 1950, they're spread over pages 61-121. > > PG has text versions of the *book* renewals for 1950-1977. Because of > the 1948-1950 interfiling, PG also has the periodical renewals for 1950, > scattered throughout etexts #11801 (Jan-June 1950) and #11802 (July-Dec > 1950). E.g., in #11801, the first periodical renewal is under "Abbott's > Digest of All the New York Reports". (You can tell it's a periodical > renewal because its original registration starts with 'B'.) Page images > for these two volumes appear at > . 
> > --- > > After 1950, the Copyright Office discontinued Part 14 of the CCE, and > went back to having each Part of the CCE also contain the renewal > records in the classes covered by that Part. > > So the periodical renewals for 1951-1977 are in Part 2 of the CCE. E.g., > for Jan-June 1951, they're on pages 155-159 of Part 2, and for July-Dec > 1951, they're on pages 303-307. Page images for these two sections > appear at , about > 3/4 of the way down the page. Thank you. That's what I wanted to know. 1950 is already done then for PG. I have photocopies now of 1951-1969 for periodicals renewals. I know books have already been done by DP. -- Greg Weeks http://durendal.org:8080/greg/ From Bowerbird at aol.com Tue Aug 23 19:18:38 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Aug 23 19:18:56 2005 Subject: [gutvol-d] the new mantra for project gutenberg Message-ID: <82.2ed2bc5e.303d32fe@aol.com> marcello said: > I don't know why it doesn't work for you. this will increasingly become the mantra for project gutenberg... -bowerbird From jon at noring.name Tue Aug 23 19:38:42 2005 From: jon at noring.name (Jon Noring) Date: Tue Aug 23 19:38:59 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <430BBAB7.8050906@perathoner.de> References: <20050823160216.6E6438C8E8@pglaf.org> <430BB1B7.1080006@novomail.net> <430BBAB7.8050906@perathoner.de> Message-ID: <143389361.20050823203842@noring.name> Marcello wrote: > Lee Passey wrote: >> I also note that you encoded the glossary at the end of the work with >>

tags (naughty, naughty). Based on what I saw in the TEI docs I would >> have encoded it as follows: >> >>

>> Glossary >> >> >> The "Servant of Bah?", Abb?s Effendi (1844-1921), the eldest son >> and appointed Successor of Bah?'u'll?h, and the Centre of His >> Covenant. > And it wouldn't have validated because gloss has no business inside list. TEI P4 shows how to do it (I think): http://www.tei-c.org/P4X/DS.html#TDX-280 Then from there it links to: http://www.tei-c.org/P4X/CO.html#COLI Where it gives the following example: Report of the conduct and progress of Ernest Pontifex. Upper Vth form — half term ending Midsummer 1851 Idle listless and unimproving ditto ditto Orderly Not satisfactory, on account of his great unpunctuality and inattention to duties Also refer to: http://www.tei-c.org/P4X/CO.html#COHQU Which talks about the element. It appears that this particular markup problem has appeared before for TEI-P4 to even discuss it (see the prior links.) Definitely Lee is right in that

<p> is not the best for this purpose, and Marcello is right in that how Lee used it is incorrect. In fact, the closer I look at the above example, the more it looks like XHTML definition lists with almost an exact mapping between the two except that XHTML <dl> (analogous to TEI <list type="gloss">) cannot contain anything but <dd> <dt> pairs, while the TEI version can also contain a <head>er. In fact, as I look at it, getting the example above to work in XHTML is problematic because of the <head> line. In fact, XHTML has pretty poor list support for internal headers and the like (all the lists: ol, ul, and dl, only support li, and dd/dt for dl), so this looks like item #6 in my "problems with TEI+CSS2 rendering" list.

Jon

From joshua at hutchinson.net Tue Aug 23 21:28:27 2005
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Aug 23 20:05:36 2005
Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19)
In-Reply-To: <143389361.20050823203842@noring.name>
References: <20050823160216.6E6438C8E8@pglaf.org> <430BB1B7.1080006@novomail.net> <430BBAB7.8050906@perathoner.de> <143389361.20050823203842@noring.name>
Message-ID: <430BF76B.9010701@hutchinson.net>

Jon Noring wrote:
>Marcello wrote:
>>Lee Passey wrote:
>>>I also note that you encoded the glossary at the end of the work with >>>

<p> tags (naughty, naughty). Based on what I saw in the TEI docs I would >>>have encoded it as follows: >>> >>>

>>>Glossary >>> >>> >>>The "Servant of Bah?", Abb?s Effendi (1844-1921), the eldest son >>>and appointed Successor of Bah?'u'll?h, and the Centre of His >>>Covenant. >>> >>> > > > >>And it wouldn't have validated because gloss has no business inside list. >> >> > > > There is a concept that Marcello and I have discussed of markup "levels". When it comes to something like TEI, there are so many ways you can add meta data it is completely daunting at times. In this example, yes, a more specific markup could have been used. But, in the final render, it works just fine as

<p> blocks.

Another example is a text with foreign words interspersed throughout. Often, those words would be printed in italics in the original book. Now, the simplest markup in TEI would be to put <hi rend="italics"> around the word. But you could also mark the word with a foreign tag. In the final render, it would look exactly the same, but the second option provides more specific metadata. You could even go further by providing a translation of the foreign word inside the attribute (the markup escapes me at the moment).

The markup that would cover what PG currently has would be what I would call a "level one markup" and that is the minimum, obviously, that a TEI could be marked to. Level two would be given a little more metadata, but nothing drastic. Maybe marking certain words as foreign instead of italics. Marking a letter as such instead of just a block of indented paragraphs. etc. etc.

Level three would be going the extra, extra mile. It's the kind of markup I don't expect to see, but is possible in TEI.

I expect most TEI documents we post will fall in level one or level two.

Josh

From JBuck814366460 at aol.com Tue Aug 23 20:43:22 2005
From: JBuck814366460 at aol.com (Jared Buck)
Date: Tue Aug 23 20:48:47 2005
Subject: [gutvol-d] the new mantra for project gutenberg
In-Reply-To: <82.2ed2bc5e.303d32fe@aol.com>
References: <82.2ed2bc5e.303d32fe@aol.com>
Message-ID: <430BECDA.9040801@aol.com>

An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050823/44e1fc85/attachment.html From lee at novomail.net Wed Aug 24 08:59:45 2005 From: lee at novomail.net (Lee Passey) Date: Wed Aug 24 08:59:56 2005 Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 21 In-Reply-To: <20050824005745.3DB258C8E0@pglaf.org> References: <20050824005745.3DB258C8E0@pglaf.org> Message-ID: <430C9971.7070800@novomail.net> Marcello Perathoner wrote: > Lee Passey wrote: > >> I understand that creating a file format which could be viewed >> without further processing was not your intention, but now that we >> have some evidence that suggests that it is a real possiblity is >> there any reason _not_ to pursue that possiblity, especially if it >> only requires adding three lines to the source (and making sure that >> all the dtd's are accessible)? > > > Supporting CSS styling will add another complexity layer to an already > overly complex thing. A software architect has to leave things out to > make the design implementable. Adding three lines to the template file adds complexity? > Also, things like footnotes are impossible with CSS. So why bother? I've never had any problems with footnotes and CSS. But the real key is to separate the TEI and the CSS, even at a conceptual level. I would never suggest creating a TEI file with the assumption that it would be rendered in conjunction with some specific CSS file, or indeed assuming that it will have some specific rendering at all. I recommend creating TEI files that are both valid and correct with regard to the TEI spec and not be concerned at all about how it might be rendered. On the other hand, simple modifications which will enable other people and applications to select a rendering should be acceptable if it's not a hinderance to the primary goal of producing valid and correct TEI. >> In any case, it doesn't matter which encoding is used, so long as it >> is not misrepresented in the declaration. 
> > > Both the TEI and the XHTML file are correct. I don't know why it > doesn't work for you. I think I may. I downloaded the ZIP archive of the file to be sure that there were no issues involving Apache or Firefox. After extracting the contents, and before touching the file with any other application, I did a hexdump on the file. Sure enough, it was valid UTF-8 encoding. I don't know which of my editors did the conversion, but I suspect it was Microsoft's XML editor which ships as part of of Visual Studio Dot Net. I also suspect that the conversion was not to iso-8859-1 but to win-1252, which is indistinquishable from 8859-1 except in the range of 128-159. The HTML editor that shipped with earlier versions of Visual C++ was known to do this conversion under the covers and without warning. After all, if you're running on a version of Microsoft Windows you're obviously going to want to be using Microsoft's own character mappings, right? >> As part of my CSS experimentation, I set the display setting for the >> element to "none", because while I think the data is >> important, I'm not particularly interested in seeing it when I'm >> reading. When I did this, I thought I lost the title of the book >> because it only appears in the element. I discovered >> later the title was repeated in the element, identified as a >> er. As I read the TEI spec, (and I am by no means well-versed) >> I believe that there should also exist a element which >> should be part of the , and which should contain all the >> information traditionally found on the title page of a book. > > > That is for the encoder to decide. If the title page is interesting > enough to warrant a separate encoding, she will use etc. > to mark it up. Indeed. And as Mr. Hutchinson is the encoder, I'm suggesting he ought to conside it. > If the title page is just plain boring you can generate a standard > title page with . This will pull all data out > of the and save you the trouble. 
> > There are a lot of such shortcuts implemented like > and . I'm not terribly enamored with <divGen> tags, because it seems to rely on software that so far is largely unimplemented. _I_ wouldn't recommend its use, but it _is_ part of the spec... >> I also note that you encoded the glossary at the end of the work with >>

<p> tags (naughty, naughty). Based on what I saw in the TEI docs I >> would have encoded it as follows: >> >>

>> Glossary >> >> >> The "Servant of Bah?", Abb?s Effendi (1844-1921), the eldest >> son and appointed Successor of Bah?'u'll?h, and the Centre of His >> Covenant. > > > And it wouldn't have validated because gloss has no business inside list. True. I made the same mistake that Mr. Hutchinson did with the element: assuming that an element designed to indicate usage was instead structural. Mr. Hutchinson's orignal was valid but incorrect (because the textual fragments he was dealing with were not paragraphs). My example was correct but invalid (because <gloss> cannot appear within lists). By encapsulating the <gloss> elements with <item> elements the glossary would become both correct _and_ valid. (My thanks to Mr. Noring for posting the links to the relevant portions of the spec so I don't have to.) From Gutenberg9443 at aol.com Wed Aug 24 09:14:26 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Aug 24 09:19:45 2005 Subject: [gutvol-d] Re: prank someone is pulling Message-ID: <2b.79e03427.303df6e2@aol.com> TO ALL: Someone purporting to be from PG has faxed a book, in Finnish, the name of which is "Fredrika Runeberg," to the state of New Jersey Surveying Office. Whoever did it somehow managed to get around the requirement that the sender's name and telephone number is to be on the first page of all faxes. I have assured him that nobody in our organization did it. He wanted to fax the first page to me but my fax is down right now, and I wanted him to mail it as an attached file but he doesn't have a scanner. Therefore, he has lost the 33 sheets of paper that were in his fax machine and is afraid to try to reuse the fax machine because it will immediately try to go on printing. (I told him to turn it off, unplug it, then replug it and turn it on, and he would probably then be able to use the fax machine normally.) He didn't give me his name other than "Jim." I have his telephone number but will release it only to Greg. 
I have asked him to snail mail me the first page and let me see what I can find out. If anybody on this ML is the culprit, cease and desist and notify me personally that you did it. Then I will only bite your head off, spit it into your face, and then turn it over to Greg. If anybody on this ML is the culprit and does not admit it and gets caught, that person's ass is grass and that person will be permanently barred from this ML and everything else I can get him barred from. This conduct is unconscionable. Anne Do you like to breathe? Then save the trees! Begin a personal relationship with an ebook TODAY! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050824/806e42fb/attachment.html From joshua at hutchinson.net Wed Aug 24 09:35:39 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Aug 24 09:35:45 2005 Subject: [gutvol-d] Re: prank someone is pulling Message-ID: <20050824163539.CC2B74F629@ws6-5.us4.outblaze.com> Ann, this is called a FAX bomb. It is similar to a "e-mail bomb" where someone drops a bunch of e-mail garbage on someone to clog their e-mail account. It is usually someone who wants to "get back" at someone else. FAX bombs are especially nasty because then you are not only tying up their service, you are wasting tangible resources like paper and ink. The reason our file is being used is probably pretty simple ... we have great big text files that are easily accessible and work perfectly for this kind of griefing. It is doubtful anyone that has anything to do with PG had anything to do with this. 
Josh
> > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From marcello at perathoner.de Wed Aug 24 10:33:32 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Aug 24 10:33:54 2005 Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 21 In-Reply-To: <430C9971.7070800@novomail.net> References: <20050824005745.3DB258C8E0@pglaf.org> <430C9971.7070800@novomail.net> Message-ID: <430CAF6C.2020606@perathoner.de> Lee Passey wrote: > Adding three lines to the template file adds complexity? No, but soon you'll start and say things like: if we did this to the TEI file, the rendering thru CCS would be so much easier, etc. etc. > I would > never suggest creating a TEI file with the assumption that it would be > rendered in conjunction with some specific CSS file, or indeed assuming > that it will have some specific rendering at all. I recommend creating > TEI files that are both valid and correct with regard to the TEI spec > and not be concerned at all about how it might be rendered. You just proposed the exact opposite thing: to hard-code a set of CSS stylesheets into the file. > On the other > hand, simple modifications which will enable other people and > applications to select a rendering should be acceptable if it's not a > hinderance to the primary goal of producing valid and correct TEI. If anybody wants to view their TEI thru CSS they should apply their preferred stylesheet by hand. Some browsers let you select a user stylesheet. No need to pollute the TEI file with that. > I'm not terribly enamored with <divGen> tags, because it seems to rely > on software that so far is largely unimplemented. _I_ wouldn't recommend > its use, but it _is_ part of the spec... You need not use them. If you want to code the toc by hand, feel free. But you get some goodies if you use divGen like a toc with correct page numbers in the pdf. 
And you don't have to jiggle hundreds of links. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Aug 24 11:43:04 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 24 11:43:20 2005 Subject: [gutvol-d] on viewing the .pgtei file directly Message-ID: <1c6.2f5def07.303e19b8@aol.com> i think it would be marvelous to view the .pgtei file directly. why go through the pain of conversion if you don't have to? and the .pgtei file is the one with all the information in it, not? might as well view that, rather than some pale conversion... but let's get real here for a minute, ok? if the only people who can view the .pgtei file directly are the few who happen to be using a specific browser, there's no need to put a lot of resources in that direction. however, that's not really what lee is talking about, is it? no, it isn't. no sir. what lee is _really_ talking about is "openreader", which he has begun programming. (you _have_ begun, haven't you, lee? because there's no time like the present.) because, you see, a specialized e-book viewer-program (like openreader) can deliver an e-book experience that _far_surpasses_ the one that an end-user gets in a browser. and _that_ is the reason why people would want to view a .pgtei file directly, rather than look at an .html conversion; not because of the files per se -- it's silly to think end-users care anything about formats -- but because of the _viewer_ and the e-book _experience_ that was delivered therein... (savvy lurkers will recognize this as a straightforward variant of the argument i have been making all along...) so, if lee can deliver an openreader that is _cross-platform_ and runs on _older_hardware_, using _minimal_resources_, and can render the .pgtei file directly, giving the end-user a powerful e-book experience, all from a free-beer program, no one will use their funky web-browser to read an e-text... 
so let's wish lee success in his endeavor, for the ultimate good of all the end-users... -bowerbird p.s. when i try to view 16523-x.xml directly in firefox, it says: "this xml file does not appear to have any style information associated with it. the document tree is shown below." and then it shows me the document tree. how can i fix this problem? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050824/298f5841/attachment.html From lee at novomail.net Wed Aug 24 12:03:37 2005 From: lee at novomail.net (Lee Passey) Date: Wed Aug 24 12:03:50 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 22) In-Reply-To: <20050824155955.AAFC78C8ED@pglaf.org> References: <20050824155955.AAFC78C8ED@pglaf.org> Message-ID: <430CC489.5090701@novomail.net> Jon Noring <jon@noring.name> wrote: >Lee Passey wrote: > [snip] >>Well, I might do it anyway for my own edification and enjoyment (and >>because I think you _will_ be interested at some point in the future ;-).) >> >> > ><laugh> Careful Lee, you almost sound like Bowerbird on that one (but >not quite.) > > The difference is that if I _do_ do it (no promises, as of right now it's just speculation) I will make it publicly available, even in an unfinished state. As a result of the discussions here on the <gloss> tag I can already see that I'm going to have to make changes to my TEI to HTML tables. Hey, it's a work in progress, and I could be wrong (I obviously have been in the recent past ;-)), [snip] >It is also useful for the proposed TEI support in OpenReader. > > Y'eh think? ;-) >>Some months ago I put together a couple of tables showing how HTML could >>be mapped to TEI-lite, and vice-versa. 
The goal was to create a mapping >>that could be used for round-tripping via XSLT; that is, a TEI-lite >>document could be used to create an HTML document which could then be >>transformed back into TEI without loss of markup. I will probably start >>from those tables in creating a tei.css file. They may also be useful to >>you in creating XSLT scripts (aka XSL style sheets). If you're >>interested they can be found at >>www.passkeysoft.com/~lee/xhtml2tei.html >>and www.passkeysoft.com/~lee/tei2xhtml.html. >> >> > >Well, round-tripping using XSLT and direct rendering of TEI using CSS2 >are two different things. > Absolutely. These kind of tables are useful in developing a CSS file in a different sort of way. If you go out to w3c.org you can find a file that is basically the style sheet for XHTML. If you had a User Agent that knew how to render XML+CSS, but which knew nothing about HTML, you could add this style sheet to an XHTML file and it would render just like in a browser. So if you know that <hi> in TEI maps to <i> in HTML, you could use the same style that <i> uses in the HTML style sheet for the <hi> element in the TEI style sheet. This purely mechanical process isn't going to give you a perfect (or perhaps even adequate) style sheet for TEI, but it will probably get you more than 50% of the way there. [snip] >My view in LIT production is to go from PG-TEI to well-structured >XHTML 1.1 (which is probably what Lee means by "HTML".) > Oh, yeah. If you're going to use XSLT to transform TEI to HTML it makes absolutely no sense to output anything _other_ than XHTML 1.1. To my knowledge there are no tools that rely on structures of HTML 3.2 which are unavailable in XHTML 1.1 (except for the fact that some older browsers need a space before the slash on empty elements, e.g. <hr />). When I say HTML you can always assume I'm talking about XHTML unless I make it explicit otherwise. 
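The mechanical process described above (giving each TEI element the default style of the HTML element it maps to) could be sketched roughly as follows. This is only an illustration in Python: both tables are tiny, hypothetical stand-ins for a real TEI-to-HTML mapping table and for a default style sheet for HTML, and the element pairs and declarations in them are assumptions rather than anything taken from those sources.

```python
# Hypothetical TEI -> HTML element mapping; a real table would be far longer.
TEI_TO_HTML = {
    "hi": "i",
    "p": "p",
    "head": "h2",
    "list": "ul",
    "item": "li",
}

# Hypothetical default declarations for a few HTML elements (assumed values).
HTML_DEFAULT_STYLES = {
    "i": "font-style: italic",
    "p": "display: block; margin: 1em 0",
    "h2": "display: block; font-size: 1.5em; font-weight: bold",
    "ul": "display: block; margin: 1em 0; padding-left: 40px",
    "li": "display: list-item",
}


def build_tei_css(tei_to_html, html_styles):
    """Build one CSS rule per TEI element by borrowing the HTML element's style."""
    rules = []
    for tei_name in sorted(tei_to_html):
        declarations = html_styles.get(tei_to_html[tei_name])
        if declarations is None:
            continue  # no known HTML style to borrow; leave the element unstyled
        rules.append(f"{tei_name} {{ {declarations} }}")
    return "\n".join(rules)


print(build_tei_css(TEI_TO_HTML, HTML_DEFAULT_STYLES))
```

As noted, a table-driven sheet like this is only a starting point; elements with no HTML counterpart would still need hand-written rules.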
>Definitely Lee is right in that <p> is not the best for this purpose, >and Marcello is right in that how Lee used it is incorrect. In fact, >the closer I look at the above example, the more it looks like XHTML >definition lists with almost an exact mapping between the two except >that XHTML <dl> (analogous to TEI <list type="gloss">) cannot contain >anything but <dd> <dt> pairs, while the TEI version can also contain a ><head>er. > > This is something that actually bothers me quite a bit about the TEI implementation of lists. As a programmer, I want a definition list (of which a glossary is a specific instance) to be structured in such a way that I can grab _one_ element and get both the term _and_ the associated definition. I really dislike both the HTML and the TEI implementation where it relies on the definition to be in a separate element from the term, but immediately following it. The two elements are obviously inextricably linked, but the vocabularies require the encoder to make the link explicit if it is to exist at all. If I ruled the world, the term and its gloss would be combined into an item element, as follows (example modified from the sample at http://www.tei-c.org/P4X/CO.html#COLI): <list type="gloss"> <head>Unit Three --Vocabulary</head> <item> <term lang="la">acerbus, -a, -um</term> <gloss>bitter, harsh</gloss> </item> <item> <term lang="la">ager, agrī, M.</term> <gloss>field</gloss> </item> <item> <term lang="la">audiō, -īre, -īvī, -ītus</term> <gloss>hear, listen (to)</gloss> </item> <!-- etc. --> </list> I believe that this implementation of a glossary list would pass the scrutiny of a XML validator, but it is nonetheless incorrect as the TEI spec clearly states that "it is a semantic error for a list tagged with type='gloss' not to have labels." 
Heck, I dislike the TEI implementation of glossary lists so much that I am tempted to suggest using lists of type "glossary" instead of type "gloss" just to avoid the specification's requirements (which, by the way, an XML validator would not catch. Validators can tell you when you've done something wrong, but not when you've failed to do something right). >In fact, as I look at it, getting the example above to work >in XHTML is problematic because of the <head> line. In fact, XHTML has >pretty poor list support for internal headers and the like (all the >lists: ol, ul, and dl, only support li, and dd/dt for dl), so this >looks like item #6 in my "problems with TEI+CSS2 rendering" list. > > Not so, because the problem is one of mapping between TEI and XHTML, not one of rendering TEI with CSS; although you could certainly add it to your "problems with transforming TEI to XHTML" list. CSS can deal with headers inside of lists without problem, it's HTML that has the problem. >Jon > > From lee at novomail.net Wed Aug 24 12:03:59 2005 From: lee at novomail.net (Lee Passey) Date: Wed Aug 24 12:04:15 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 22) In-Reply-To: <20050824155955.AAFC78C8ED@pglaf.org> References: <20050824155955.AAFC78C8ED@pglaf.org> Message-ID: <430CC49F.2070408@novomail.net> Joshua Hutchinson <joshua@hutchinson.net> wrote: > There is a concept that Marcello and I have discussed of markup > "levels". When it comes to something like TEI, there are so many ways > you can add meta data it is completely daunting at times. In this > example, yes, a more specific markup could have been used. But, in > the final render, it works just fine as <p> blocks. But, you see, it doesn't. On my PocketPC, and in all dedicated e-book programs on my desktop computer, <p> elements (in HTML of course) have the first line indented, and there is no blank space between paragraphs. 
In this case your glossary just looks odd: there is an indented word or phrase, then another indented word or phrase, then another indented word or phrase, most of which are not complete sentences (although every once in a while there is a complete sentence thrown in just to confuse me). There is no typographical convention that indicates which is the term and which is the gloss. Of course I can figure it out relatively easily, but to do so I have to exit "immersive reading" mode and go into "copy editing" mode, which is disruptive to my reading experience. Had you created the list as a "<list type="gloss"><label>word</label><item>definition</item>", your XSL script could have transformed it into "<ul><li><strong>word</strong>: definition</li>", and preserved all the typographically conventions I have come to expect. As it is, these fragments are identified as paragraphs, and XSL scripts and CSS style sheets have to treat them in the same way they treat real paragraphs. I have never understood the resistance to the notion that text blocks with indeterminable semantics should be identified as <div> rather than <p>. It's a simple change in mindset. When in doubt, use <div>. Look up the definition of the word 'paragraph' in a dictionary. If you don't think you could convince your English (or any other language for that matter) teacher that the block of text satisfies the definition, use <div> (or some other more appropriate element), not <p>. This, I think, is a good example of the distinction between correctness and validity. I could mark up a phrase as "Four <term id="score">score</term> and <gloss target="score">seven</gloss> years ago ..." and it would be valid, although it would not be correct, as the word 'seven' is not really a gloss for the word 'score'. There are times when it is valuable to know that a certain block of text is, in fact, a paragraph. Suppose, for example, that someone might want to create an annotation to accompany <title>The Kit?b-i-Aqdas. 
He or she might want to preface some text with "if you look at the second paragraph following header 77 ..." If the user has a dt edition, finding this passage is fairly easy: flip through the book for something that looks like a header and is numbered 77, and count the paragraphs that follow. If there are only a few paragraphs after header 77 and before header 78 this is quite easy. If you're looking for the 935th paragraph following header 77 it can quickly become tedious. Luckily for us, tedious is something that computers do very well. Unluckily for us, Bowerbird has not yet released his algorithm for determining whether a block of unstructured text is a paragraph. So for today, we must rely on the coders to correctly identify which blocks are paragraphs, and, just as importantly, which blocks are not paragraphs. If every indeterminate block of text is marked as a paragraph, then the value of the

<p> tag is lost; it has just become a synonym for <div>

, and is redundant. As the pointed man in the pointless forest said to Oblio, "A point in every direction is as good as no point at all." So, if being conscientious about only identifying as a paragraph that text which really _is_ a paragraph adds value to the file (perhaps not to you, but to someone, and if you didn't want this file to be useful to someone else you wouldn't be doing it in the first place), and if it is just as easy to be discriminating about paragraphs as it is _not_ to be, why not do it? > Another example is a text with foreign words interspersed throughout. > Often, those words would be printed in italics in the original book. > Now, the simplest markup in TEI would be to put rend="italics">around the word. But you could also mark the word > with a foreign tag. In the final render, > it would look exactly the same, but the second option provides more > specific metadata. You could even go further by provide a translation > of the foreign word inside the attribute (the markup escapes me at the > moment). > > The markup that would cover what PG currently has would be want I > would call a "level one markup" and that is the minimum, obviously, > that a TEI could be marked to. Level two would be given a little more > metadata, but nothing drastic. Maybe marking certain words as foreign > instead of italics. Marking a letter as such instead of just a block > of indented paragraphs. etc. etc. > > Level three would be going the extra, extra mile. It's the kind of > markup I don't expect to see, but is possible in TEI. I can completely agree with this notion of markup levels, but it seems to me that the thing that should distinquish the levels is completeness, not correctness. Documents at every level should be correct, even if not complete. In your example, the use of the tag tells the user (or more accurately, his or her software agent) "this text was italicized in the original text, but I am unable or unwilling to tell you why." 
The markup is incomplete, but it is not incorrect. If you were to mark up a block quotation with the
<div> tag you are telling the user agent "this text was set aside as a block in the original text, but I am unable or unwilling to tell you why." I can live with that. But if you mark up a block of text with the <p>

tag you are telling the user agent "this block of text contains one or more compete sentences, and deals with a single thought or topic or quotes one speaker's continuous words." Marking up a definition term as a paragraph is as incorrect as marking up the word 'seven' as a gloss for the word 'score.' Please don't let the reasonable need to tolerate incompleteness become an excuse for incorrectness. > I expect most TEI documents we post will fall in level one or level two. > > Josh From gbnewby at pglaf.org Wed Aug 24 12:08:41 2005 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Aug 24 12:08:43 2005 Subject: [gutvol-d] Re: prank someone is pulling In-Reply-To: <20050824163539.CC2B74F629@ws6-5.us4.outblaze.com> References: <20050824163539.CC2B74F629@ws6-5.us4.outblaze.com> Message-ID: <20050824190841.GD21452@pglaf.org> On Wed, Aug 24, 2005 at 11:35:39AM -0500, Joshua Hutchinson wrote: > Ann, this is called a FAX bomb. It is similar to a "e-mail bomb" where someone drops a bunch of e-mail garbage on someone to clog their e-mail account. It is usually someone who wants to "get back" at someone else. > > FAX bombs are especially nasty because then you are not only tying up their service, you are wasting tangible resources like paper and ink. > > The reason our file is being used is probably pretty simple ... we have great big text files that are easily accessible and work perfectly for this kind of griefing. > > It is doubtful anyone that has anything to do with PG had anything to do with this. > > Josh Agreed. It's not our responsibility to help people with their faxes... be polite, and firm, and tell them to seek local advice on how to cancel incoming faxes. Unfortunately, it's pretty easy to use our eBooks for this type of thing...including for SPAM emails. 
-- Greg
> > > > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Wed Aug 24 12:35:55 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 24 12:36:12 2005 Subject: [gutvol-d] re: unluckily for us Message-ID: <1d4.42c952fa.303e261b@aol.com> lee said: > Unluckily for us, Bowerbird has not yet > released his algorithm for determining whether > a block of unstructured text is a paragraph. oh gee, i'm sorry. i thought i had. in z.m.l., anything surrounded by two or more blank lines is a paragraph. for counting purposes, this works just fine, even if your english teacher might not "approve" of it... a z.m.l. viewer-app will display the paragraph numbers, so there's absolutely no ambiguity about what they are... as long as the programs you are using count the same way, there's no reason to get strung out in definitional sand-traps. any more questions? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050824/6dfad56e/attachment.html From joshua at hutchinson.net Wed Aug 24 12:39:04 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Aug 24 12:39:12 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol Message-ID: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "Lee Passey" > > Joshua Hutchinson wrote: > > When in doubt, use

<p>.

I've learned to fear the <div> container. Let me show you why using a <div> container for "non-specific" blocks of text won't work.

<div>
  <head>Level 1</head>
  <p>Paragraph 1.</p>
  <div>Block o' text.</div>
  <p>Paragraph 2.</p>
</div>

*** The above will not validate. Once you go one deeper in a nest, you cannot come back up just one level. You have to close the whole nesting. The above would work if changed to:

<div>
  <head>Level 1</head>
  <p>Paragraph 1.</p>
  <div>Block o' text.</div>
</div>
<div>
  <p>Paragraph 2.</p>
</div>

*** The problem is that now you have two distinct blocks of text that should really be treated as one block. *** NOTE: None of this is to argue that the more correct way to handle the example you gave would be to mark it as something other than

<p> chunks. You are right there. Just that your counter example of <div>

wouldn't work at all. Josh From hacker at gnu-designs.com Wed Aug 24 12:49:24 2005 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Aug 24 12:50:35 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol In-Reply-To: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com> References: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com> Message-ID: > I've learned the fear the
<div> container. Let me show you why
> using a <div> container for "non-specific" blocks of text won't
> work.
>
> <div>
> <head>Level 1</head>
> <p>Paragraph 1.</p>
>
> <div>Block o' text.</div>
>
> <p>Paragraph 2.</p>
>
> </div>

Improperly nested tags will never validate. You can't have a bare string inside the tag like that, and isn't a child of
, so that won't work either. After correcting those errors, it validates fine. > The above will not validate. Once you go one deeper in a nest, you > cannot come back up just one level. You have to close the whole > nesting. Nope, this is completely untrue. > The above would work if changed to: > >
> Level 1 >

Paragraph 1.

> >
Block o' text.
>
> >
> >

Paragraph 2.

> >
You're still producing invalid markup. Try something like this: Level 1

Level 1

Paragraph 1.

Block o' text.

Paragraph 2.

David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Gutenberg9443 at aol.com Wed Aug 24 12:51:12 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Aug 24 12:51:28 2005 Subject: [gutvol-d] Re: prank someone is pulling Message-ID: <1df.42c8f7e2.303e29b0@aol.com> In a message dated 8/24/2005 1:09:05 PM Mountain Daylight Time, gbnewby@pglaf.org writes: Unfortunately, it's pretty easy to use our eBooks for this type of thing...including for SPAM emails. I agree. I told him that anybody in the world, anywhere in the world, can download our files and then use them in this way. I suspect he has annoyed someone who is using this means of revenge. But he was just about frothing at the mouth when he called, because he was convinced that it WAS somebody in our organization. It took a while to calm him down enough to listen to reason. Anne Do you like to breathe? Then save the trees! Begin a personal relationship with an ebook TODAY! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050824/b1225fe5/attachment-0001.html From Gutenberg9443 at aol.com Wed Aug 24 12:49:35 2005 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Aug 24 12:54:55 2005 Subject: [gutvol-d] Re: prank someone is pulling Message-ID: <90.649164cc.303e294f@aol.com> In a message dated 8/24/2005 10:36:42 AM Mountain Daylight Time, joshua@hutchinson.net writes: It is doubtful anyone that has anything to do with PG had anything to do with this. I agree. I told him that anybody in the world, anywhere in the world, can download our files and then use them in this way. I suspect he has annoyed someone who is using this means of revenge. Anne Do you like to breathe? Then save the trees! Begin a personal relationship with an ebook TODAY! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050824/41c3d958/attachment.html From lee at novomail.net Wed Aug 24 12:57:12 2005 From: lee at novomail.net (Lee Passey) Date: Wed Aug 24 12:57:27 2005 Subject: [gutvol-d] on viewing the .pgtei file directly (gutvol-d Digest, Vol 13, Issue 23) In-Reply-To: <20050824190003.BA8CD8C8E8@pglaf.org> References: <20050824190003.BA8CD8C8E8@pglaf.org> Message-ID: <430CD118.4060500@novomail.net> Bowerbird@aol.com wrote: > p.s. when i try to view 16523-x.xml directly in firefox, it says: > "this xml file does not appear to have any style information > associated with it. the document tree is shown below." and > then it shows me the document tree. how can i fix this problem? Download the file 16523-x.zip, and extract the files. Hopefully, the file "persistent.css" is still part of the package. Edit the .xml file with a simple text editor (beware Microsoft tools!) to add the line: immediately after the line: Save the file, and hopefully your editor won't have screwed up the utf-8 encoding. Now view the file from your local file system with Firefox. You should see just a bunch of run-on text, because the styles in 'persistent.css' are designed to be used with the XSL transformation to XHTML, and so none of them apply. But the document structure will disappear. You can experiment by adding new styles to 'persistent.css' (don't forget to save the file and reload your browser after adding rules). For example, add "p { display:block; text-indent: 3em }" and all of a sudden you will get distinct, indented paragraphs (and some non-paragraphs will also become distinct and indented). Add "teiHeader { display: none }" and all the Gutenberg legal cruft, together with the metadata which is typically only of interest to archivers, will disappear (it's still there, it's just not "in your face" anymore). Add "head { display:block; font-size: x-large; text-align: center }" and the headers will pop out. 
Add "hi { font-style: italic }" and highlighted text will become italicized. Use "hi { background: yellow }" instead and the highlighted text will look like you have run over it with a yellow highlighter. Or combine them both: "hi { font-style: italic; background: yellow }". The TEI markup used in this file is fairly simple, so you could probably get a pretty good looking file by using no more than a dozen or so CSS rules. Mr. Perathoner's TEI version of Alice in Wonderland (http://www.gutenberg.org/tei/marcello/0.3/examples/alice/) looks like it is much more complex, and would be funner to play with. I really like _Alice_ as an experimental text because it has quite a few typographical oddities which make it a good test case. Unfortunately, I haven't been able to figure out how to tell Firefox how to use a user specified css file (I've got version 1.0.4). If anyone can enlighten me on this score, I would be most grateful. Mr. Noring tells me that I can do it with Opera, but I've yet to try it. From joshua at hutchinson.net Wed Aug 24 13:05:44 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Aug 24 13:05:53 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol Message-ID: <20050824200544.A415A2F90F@ws6-3.us4.outblaze.com> Dave, we're talking about TEI/XML. Nothing of what you said applies here (but you would be absolutely right on a HTML document, as I understand it). ;) Josh ----- Original Message ----- From: "David A. Desrosiers" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol Date: Wed, 24 Aug 2005 15:49:24 -0400 (EDT) > > > > I've learned the fear the
container. Let me show you why using a > >
container for "non-specific" blocks of text won't work. > > >
> > Level 1 > >

Paragraph 1.

> > > >
Block o' text.
> > > >

Paragraph 2.

> > > >
> > Improperly nested tags will never validate. You can't have a bare string > inside the tag like that, and isn't a child of
, so that > won't work either. After correcting those errors, it validates fine. > > > The above will not validate. Once you go one deeper in a nest, you cannot > > come back up just one level. You have to close the whole nesting. > > Nope, this is completely untrue. > > > The above would work if changed to: > > > >
> > Level 1 > >

Paragraph 1.

> > > >
Block o' text.
> >
> > > >
> > > >

Paragraph 2.

> > > >
> > You're still producing invalid markup. Try something like this: > > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> > > > Level 1 > >

Level 1

>
> >

Paragraph 1.

> >
Block o' text.
> >

Paragraph 2.

> >
> > > > > David A. Desrosiers > desrod@gnu-designs.com > http://gnu-designs.com > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From hacker at gnu-designs.com Wed Aug 24 13:08:00 2005 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Aug 24 13:08:36 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol In-Reply-To: <20050824200544.A415A2F90F@ws6-3.us4.outblaze.com> References: <20050824200544.A415A2F90F@ws6-3.us4.outblaze.com> Message-ID: > Dave, we're talking about TEI/XML. Nothing of what you said applies > here (but you would be absolutely right on a HTML document, as I > understand it). ;) Its not even well-formed XML, and fails XML validation (as the doctype shown below shows). Perhaps the TEI/XML needs to start conforming. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jeroen.mailinglist at bohol.ph Wed Aug 24 13:15:57 2005 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Wed Aug 24 13:15:24 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <1218078621.20050823191205@noring.name> References: <20050823160216.6E6438C8E8@pglaf.org> <430BB1B7.1080006@novomail.net> <1218078621.20050823191205@noring.name> Message-ID: <430CD57D.4000901@bohol.ph> Jon Noring wrote: > >Well, my investigation into PG-TEI and TEI-P4X (thank heavens for TEI >Pizza Chef to flatten the otherwise unreadable TEI-P4 DTD!) shows it is >also a real possibility. 
But I believe, subject to change as I learn >more from the experts here and the TEI-L folk, that in order to make >PGTEI+CSS2 to render in web standards browsers (limited now to Firefox >and maybe Opera 8) we also have to appropriately constrain/subset the >PG-TEI vocabulary (allowed elements/attributes/attr-values) and >content models (what results may be somewhat like TEI-Lite, but not >exactly the same -- we can certainly add our own tags as needs >require.) We may also have to give up a couple things. > > You can render XML, using XSLT + CSS in Firefox and IE, for a small demo, look at http://www.gutenberg.org/files/11335/11335-x/11335-x.xml. This sample still has a few rough edges, but can be made more beautiful. The XSLT is simply pulled in by the browser. For any TEI file to work in an actual environment, you need to have a set of working instructions and conventions, such as what to put in rend attributes, and how to interpret certain things. TEI is mainly concerned about the semantics, but to render it, you need, even in a minimal way, also concern yourself about looks. Just some examples: I consider the foreign tag to imply no rendering information, only a language change. I will use with a lang (and rend) attribute to indicate a rendering change as well as a language change. If somebody applies italics to all foreign tags, it wont be as I intended it. Similarly, I consider quotation marks part of the text, and will leave them, even when I use tags, and never emit quotation marks when rendering TEI. Another user may choose different. Some have argued (with valid reasons) that the entire idea of TEI markup is broken, and have proposed systems in which the mark-up is separated from the text (stream of characters), in such a way that multiple, parallel systems of mark-up can exist. Think of a separate (part of a) file, saying characters 21 to 34 are italics, and so on. This may sound odd, but it is the way the old Macintosh wordprocessor MacWrite worked. 
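
To make the "characters 21 to 34 are italics" idea concrete, here is a minimal Python sketch of markup kept apart from the character stream. The layer names, the half-open (start, end) ranges and the function name `segments` are assumptions for illustration only; this is not how MacWrite or any TEI tool actually stores its data. The point it shows is that two layers may overlap without either having to nest inside the other.

```python
def segments(text, layers):
    """Split text at every range boundary.

    layers maps a layer name to a list of half-open (start, end) character
    ranges. Returns a list of (substring, frozenset of active layer names).
    """
    cuts = {0, len(text)}
    for name, ranges in layers.items():
        for start, end in ranges:
            if not 0 <= start <= end <= len(text):
                raise ValueError(f"range {(start, end)} in layer {name!r} is out of bounds")
            cuts.update((start, end))

    points = sorted(cuts)
    result = []
    for lo, hi in zip(points, points[1:]):
        # A layer is active on this piece if one of its ranges covers it entirely.
        active = frozenset(
            name
            for name, ranges in layers.items()
            if any(s <= lo and hi <= e for s, e in ranges)
        )
        result.append((text[lo:hi], active))
    return result


if __name__ == "__main__":
    sample = "The quick brown fox"
    # Two independent layers that overlap on the word "brown".
    markup = {"italic": [(4, 15)], "verse": [(10, 19)]}
    for piece, active in segments(sample, markup):
        print(repr(piece), sorted(active))
```

A renderer would then decide, piece by piece, what each combination of layers should look like; the text itself is never touched.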
Jeroen. From jon_niehof at yahoo.com Wed Aug 24 13:09:22 2005 From: jon_niehof at yahoo.com (Jon Niehof) Date: Wed Aug 24 13:16:14 2005 Subject: [gutvol-d] Re: prank someone is pulling In-Reply-To: <1df.42c8f7e2.303e29b0@aol.com> Message-ID: <20050824200922.11755.qmail@web32912.mail.mud.yahoo.com> --- Gutenberg9443@aol.com wrote: > But he was just about frothing at the mouth when he called, > because he was convinced that it WAS somebody in our > organization. It took a while to calm him down enough to > listen to reason. Hmm. If someone faxed him Julius Caesar, I wonder* if he'd be mad at ol' Will. Or Caesar. It annoys me no end that the good names of volunteers are besmirched in this fashion (or via google spamming, as posted earlier). (*but not enough to try it or recommend someone else try) __________________________________ Yahoo! Mail for Mobile Take Yahoo! Mail with you! Check email on your mobile phone. http://mobile.yahoo.com/learn/mail From jeroen.mailinglist at bohol.ph Wed Aug 24 12:58:39 2005 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Wed Aug 24 13:21:00 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol In-Reply-To: References: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com> Message-ID: <430CD16F.5050906@bohol.ph> The people were talking about TEI here, not HTML, as in your example... in TEI
means a division of a text, not some ad-hoc container as in HTML. Jeroen. David A. Desrosiers wrote: > > You're still producing invalid markup. Try something like this: > > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> > > > > From hacker at gnu-designs.com Wed Aug 24 13:28:26 2005 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Aug 24 13:29:34 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <430CD57D.4000901@bohol.ph> References: <20050823160216.6E6438C8E8@pglaf.org> <430BB1B7.1080006@novomail.net> <1218078621.20050823191205@noring.name> <430CD57D.4000901@bohol.ph> Message-ID: > You can render XML, using XSLT + CSS in Firefox and IE, for a small > demo, look at > http://www.gutenberg.org/files/11335/11335-x/11335-x.xml. This > sample still has a few rough edges, but can be made more beautiful. > The XSLT is simply pulled in by the browser. I've been doing XML styling for years... There's nothing really magical about it. You can see that here: http://plkr.org/rss.pl There's plenty of Gutenberg XML examples here as well: http://gutenberg.hwg.org/checkdoc2.html David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From joshua at hutchinson.net Wed Aug 24 13:31:32 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Aug 24 13:31:41 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol Message-ID: <20050824203132.70A95EE1A5@ws6-1.us4.outblaze.com> ----- Original Message ----- From: "David A. Desrosiers" > > > Dave, we're talking about TEI/XML. Nothing of what you said applies here > > (but you would be absolutely right on a HTML document, as I understand it). > > ;) > > Its not even well-formed XML, and fails XML validation (as the doctype shown > below shows). Perhaps the TEI/XML needs to start conforming. > > My example is perfectly legal XML. Follow the TEI DTD. Your example uses: That is a xhtml1-strict.dtd. 
Completely different DTD and hence completely different set of markup rules. For instance, means VERY different things and is used in VERY different ways between a TEI document and a XHTML document. Josh From lee at novomail.net Wed Aug 24 13:49:38 2005 From: lee at novomail.net (Lee Passey) Date: Wed Aug 24 13:49:52 2005 Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 24 In-Reply-To: <20050824195127.2F8FC8C905@pglaf.org> References: <20050824195127.2F8FC8C905@pglaf.org> Message-ID: <430CDD62.8090404@novomail.net> David A. Desrosiers" wrote: >> I've learned the fear the
container. Let me show you why using >> a
container for "non-specific" blocks of text won't work. > >>
>> Level 1 >>

Paragraph 1.

>>
Block o' text.
>>

Paragraph 2.

>>
> > > Improperly nested tags will never validate. You can't have a bare > string inside the tag like that, and isn't a child of >
, so that won't work either. After correcting those errors, it > validates fine. > >> The above will not validate. Once you go one deeper in a nest, you >> cannot come back up just one level. You have to close the whole >> nesting. > > > Nope, this is completely untrue. > >> The above would work if changed to: >> >>
>> Level 1 >>

Paragraph 1.

>>
Block o' text.
>>
>>
>>

Paragraph 2.

>>
> > > You're still producing invalid markup. Try something like this: > > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> > > > Level 1 > >

Level 1

>
>

Paragraph 1.

>
Block o' text.
>

Paragraph 2.

>
> > I fear that Mr. Desrosiers has made the classic error of confusing varieties of fruits, which is third only to "never become involved in a land war in Asia." and "never bet with a Sicilian when death is on the line." Mr. Hutchinson's code snippet was encoded using the TEI vocabulary of XML, not the XHTML vocabulary. In TEI the tag is analogous to the HTML

tag, and the HTML tag is analogous to the TEI tag. Apples and oranges. On the other hand, I don't see how Mr. Hutchinson's second example could validate if the first does not, particularly given the fact that DTD's are not structured in such a way to permit a validator to make that kind of a judgment ("if a
<div> contains a <div> it must be the last element of the first <div>" or "if a <div> contains a <div> it may be preceded by a <p>
, but not followed by one"). I'm not that great at deciphering DTDs, but I don't see anything in http://www.tei-c.org/P4X/DS.html which would cause me to believe that example 1 is not valid. In this particular case, I suspect a bug in the validator program. I mean, writing validators is hard, and I am aware of at least one bug in the W3C's online HTML validator. Supposedly, Xerces is a validating parser. Maybe I'll see if I can find the time to run the snippet through Xerces and see if (and where) it breaks. From marcello at perathoner.de Wed Aug 24 14:05:30 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Aug 24 14:05:47 2005 Subject: [gutvol-d] re: unluckily for us In-Reply-To: <1d4.42c952fa.303e261b@aol.com> References: <1d4.42c952fa.303e261b@aol.com> Message-ID: <430CE11A.5070301@perathoner.de> Bowerbird@aol.com wrote: > in z.m.l., anything surrounded by two or more blank lines > is a paragraph. Is a tennis court a paragraph? -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Aug 24 14:10:19 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Aug 24 14:10:30 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol In-Reply-To: References: <20050824193904.1CE474F675@ws6-5.us4.outblaze.com> Message-ID: <430CE23B.9080003@perathoner.de> David A. Desrosiers wrote: > Improperly nested tags will never validate. You can't have a bare > string inside the tag like that, and isn't a child of >

, so that won't work either. After correcting those errors, it > validates fine. You are in the wrong picture! We are talking about TEI. You are talking about HTML. What Joshua says is true. Go to the TEI-L archives and search for "div tessellation problem". -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Aug 24 14:17:42 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 24 14:17:56 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) Message-ID: lee said: > Edit the .xml file with a simple text editor > (beware Microsoft tools!) to add the line: > ?xml-stylesheet href="persistent.css" type="text/css"? > You can experiment by adding new styles to 'persistent.css' > (don't forget to save the file and reload your browser after > adding rules). For example, add > "p { display:block; text-indent: 3em }" > and all of a sudden you will get distinct, indented paragraphs > (and some non-paragraphs will also become distinct and > indented). Add "teiHeader { display: none }" and all the > Gutenberg legal cruft, together with the metadata which is > typically only of interest to archivers, will disappear > (it's still there, it's just not "in your face" anymore). that is, in other words, if i tell it to use a stylesheet, and then go and create that stylesheet, it will work. :+) i knew that anyway, but i guess it's good to be reminded. ;+) *** jeroen said: > You can render XML, using XSLT + CSS in Firefox and IE, > for a small demo, look at > http://www.gutenberg.org/files/11335/11335-x/11335-x.xml. yes, i should have mentioned jeroen's files work in firefox... (not in safari. but in firefox.) > Some have argued (with valid reasons) that > the entire idea of TEI markup is broken, and > have proposed systems in which the mark-up is > separated from the text (stream of characters), > in such a way that multiple, parallel systems of > mark-up can exist. 
Think of a separate (part of a) file, > saying characters 21 to 34 are italics, and so on. > This may sound odd, but it is the way the old > Macintosh wordprocessor MacWrite worked. actually, that's the way the underlying _editfield_ of the (classic) mac operating system is structured. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20050824/e9bdbde1/attachment.html From marcello at perathoner.de Wed Aug 24 14:30:02 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Aug 24 14:30:14 2005 Subject: [gutvol-d] Re: gutvol-d Digest, Vol 13, Issue 24 In-Reply-To: <430CDD62.8090404@novomail.net> References: <20050824195127.2F8FC8C905@pglaf.org> <430CDD62.8090404@novomail.net> Message-ID: <430CE6DA.5050605@perathoner.de> Lee Passey wrote: > On the other hand, I don't see how Mr. Hutchinson's second example could > validate if the first does not, particularly given the fact that DTD's > are not structured in such a way to permit a validator to make that kind > of a judgment ("if a
<div> contains a <div> it must be the last element > of the first <div>" or "if a <div> contains a <div> it may be preceded > by a <p>
, but not followed by one"). This simple declaration does exactly that: "A div may contain zero or more p followed by zero or more div." > In this particular case, I suspect a bug in the validator program. I > mean, writing validators is hard, and I am aware of at least one bug in > the W3C's online HTML validator. No bug. The TEI dtd is broken as designed. > Supposedly, Xerces is a validating parser. Maybe I'll see if I can find > the time to run the snippet through Xerces and see if (and where) it > breaks. Get libxml2 from xmlsoft.org and use xmllint. -- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Wed Aug 24 14:38:47 2005 From: jon at noring.name (Jon Noring) Date: Wed Aug 24 14:38:59 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: References: Message-ID: <908745031.20050824153847@noring.name> Bowerbird wrote: > jeroen said: >>?Some have argued (with valid reasons) that the entire idea of TEI >> markup is broken, and have proposed systems in which the mark-up is >> separated from the text (stream of characters), in such a way that >> multiple, parallel systems of mark-up can exist. Think of a separate >> (part of a) file, saying characters 21 to 34 are italics, and so on. >>?This may sound odd, but it is the way the old Macintosh wordprocessor >> MacWrite worked. > actually, that's the way the underlying _editfield_ > of the (classic) mac operating system is structured. I was told by someone (who I think is in the know) that the idea of separating markup from content (by having layers) was first proposed years ago by Ted Nelson of "Project Xanadu" fame. I recall asking Dr. Stephen DeRose at Brown University (one of the world's leading electronic document experts) about Ted Nelson's proposal and how it compares with SGML/XML markup. Dr. 
DeRose's reply was essentially that layering has some obvious advantages (i.e., easier to represent non-hierarchical structures), but that there were a lot of real world disadvantages as well. In the early days, before SGML, the researchers were exploring all kinds of avenues, and nearly all of them moved in the direction of direct markup rather than Ted Nelson's layering. Of course, one wonders if the dynamics have changed enough that revisiting the issue would yield a different result. Can't answer that, but other than being able to non-hierarchically "markup" documents with layering, I do not see any compelling advantages -- there'd have to be some whole new killer application which requires such layering to work properly, and I've not seen such an application arise the last few years.. (It is possible in XML to do some non-hierarchical markup using empty "milemarkers" with ID/IDREF pairs. But one would have to build a special application to read such documents -- that's no different than building an application to process the "layer" approach. An example of non-hierarchical documents is the modern Bible, where verses can cross sentence and even paragraph boundaries. So one has the choice in SGML/XML of marking it up by chapter/paragraphs, and put in verse "milemarkers", or the opposite. Most would agree that one applies hierarchical markup to document structure (paragraphs), and then add milemarkers to locate the start of a new verse.) Jon From jon at noring.name Wed Aug 24 15:05:05 2005 From: jon at noring.name (Jon Noring) Date: Wed Aug 24 15:05:18 2005 Subject: [gutvol-d] on viewing the .pgtei file directly In-Reply-To: <1c6.2f5def07.303e19b8@aol.com> References: <1c6.2f5def07.303e19b8@aol.com> Message-ID: <1859002177.20050824160505@noring.name> Bowerbird wrote: > why go through the pain of conversion if you don't have to? > and the .pgtei file is the one with all the information in it, not? > might as well view that, rather than some pale conversion... 
> > but let's get real here for a minute, ok? > > if the only people who can view the .pgtei file directly > are the few who happen to be using a specific browser, > there's no need to put a lot of resources in that direction. Browsers are slowly moving towards better CSS2 and even CSS3 support (this is a major component of what is called 'web standards'). So things are not fixed in the browser arena. Overall Firefox and Opera 8 have the best web standards support, but they're not yet 100% (note that Haakon Wium Lie, the CEO of Opera, is one of the principal players in W3C's CSS development.) IE6 is way behind. It is unknown how much better IE7 will be. It doesn't really matter -- Firefox and Opera are plowing ahead, and continue to gain market share across platforms. > no sir.? what lee is _really_ talking about is "openreader", > which he has begun programming.? (you _have_ begun, > haven't you, lee?? because there's no time like the present.) Lee is the chair of the OpenReader Development Working Group, which is now working on "Orca", the name we've given to the OpenReader "user agent". I'm not sure where Lee and the WG are at present, although as you are probably aware there's not been much public activity this Summer on the WG list. Summer is usually a slow time in standards and development work. Good thing the principals of OpenReader don't live in Norway. In Norway, everything shuts down for two to three months in the summer, and understandably so. Btw, David Teller in France is working on an OEBPS "browser", which could also be used to render OpenReader publications. So there are parallel efforts, which is good! > because, you see, a specialized e-book viewer-program > (like openreader) can deliver an e-book experience that > _far_surpasses_ the one that an end-user gets in a browser. That's the plan for OpenReader. 
Refer to the interim OR site at: http://www.openreader.org/ Also, refer to the page where we discuss the freedom that OR gives us with respect to rendering. We are no longer constrained by the web browser/HTML paradigm: http://www.openreader.org/browsers.html A lot of this "freeing" up comes from OEBPS. The OEBPS "out-of-spine" construct is proving itself to be a powerful feature. There is a reason why I continue to discuss the TEI tag as I do. Since OpenReader will handle "out-of-spine" content (in Orca via "Booklets"), we automatically have a way to beautifully handle inline TEI -- just view it in an optional popup window (or other mechanism). This is one reason why it will be *easier* for OpenReader to support TEI than it would be for current web browsers since Orca and any other OR user agent *has* to handle OEBPS "out-of-spine" content. Interestingly, XHTML 2.0 also plans to introduce a new attribute which is similar (and actually more powerful) than TEI's , and it will force web browsers to render such marked-up content to be displayed outside the main flow of the text. I wonder how Opera and Firefox will do it? For the proposal in XHTML 2.0, refer to: http://www.w3.org/TR/xhtml2/mod-role.html#s_rolemodule Look at the 'role' attribute (i.e., 'role="note"'). It is part of the Common attributes collection. With this attribute, one can make just about any tag become a note (annotation, parenthetical content, etc.) In my private chat with Stephen Pemberton, the chair of the XHTML working group, it is intended for web browsers to somehow display to the end-user the content within 'role="note"'. Thus, it appears to not be too different from TEI . > and _that_ is the reason why people would want to view a > .pgtei file directly, rather than look at an .html conversion; > not because of the files per se -- it's silly to think end-users > care anything about formats -- but because of the _viewer_ > and the e-book _experience_ that was delivered therein... 
The format is integral to the reading experience. One can't really separate them. But the format does come first. It needs to be intelligently designed so as to allow the greatest reading experience, among other things. Fortunately OEBPS has done a lot of the work already. Regarding TEI, we at OpenReader are definitely interested, at some future time, to support TEI in some fashion. The specifics have yet to be resolved. What Lee and I are doing is *learning* about TEI with respect to utilization in ebook presentation. This means we need to learn the vocabulary, learn its limitations and advantages, see how it relates to XHTML/OEBPS, look at direct rendering issues (incl. CSS support), etc. To get a grasp of the major issues. Lee is doing it his way, I'm doing it my way. Obviously, PG-TEI is of interest to us since it is pretty much the TEI implementation closest to our interests. > so, if lee can deliver an openreader that is _cross-platform_ > and runs on _older_hardware_, using _minimal_resources_, > and can render the .pgtei file directly, giving the end-user a > powerful e-book experience, all from a free-beer program, > no one will use their funky web-browser to read an e-text... Define "older" hardware. Our determination is that very old hardware, and low-power hardware, such as older PDAs, simply don't have the horsepower required to deliver a nice digital publication (ebook) reading experience. We are focusing on the future, thus the OR format and Orca (which is intended to be a reference implementation, a demo if you prefer) is going to draw the line somewhere with legacy hardware support. I doubt Orca will be developed to be compiled on older Macs (only OS X), but this doesn't prevent someone else from building their own OpenReader user agent to run on whatever platform(s) they desire. We view Orca to be similar to the early days of the web, when Mosaic was developed to be a reference implementation of an HTML user agent. 
Mosaic launched the web, and over time dozens of web browsers have been developed. Notice that Mosaic doesn't even exist any more. For more info on OpenReader legacy support, refer to: http://www.openreader.org/macpalm.html There we talk about Mac (a little) and Palm support. We do not talk about older Mac (pre OS X) support. Whether Orca supports older Macs or not depends on the final architecture of the code base. We don't deem it important for Orca to support pre OS X Macs (sorry!) > so let's wish lee success in his endeavor, > for the ultimate good of all the end-users... We plan that a group of programmers, working together, will develop Orca. The kudos, should it happen, will go out to all those who contribute. It *should* be a team effort. Of course, if any developer here is interested in helping develop Orca, contact Lee. Jon Noring OpenReader Consortium From Bowerbird at aol.com Wed Aug 24 16:20:45 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 24 16:21:03 2005 Subject: [gutvol-d] on viewing the .pgtei file directly Message-ID: <2b.79e7bacb.303e5acd@aol.com> hey jon noring, as long as you're still moderating michael hart over on your listserve, i'm still declining to talk with you here... but please, do have a nice day anyway... :+) -bowerbird From lee at novomail.net Wed Aug 24 16:48:11 2005 From: lee at novomail.net (Lee Passey) Date: Wed Aug 24 16:48:24 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives!
(gutvol-d Digest, Vol 13, Issue 25) In-Reply-To: <20050824204954.02BEC8C906@pglaf.org> References: <20050824204954.02BEC8C906@pglaf.org> Message-ID: <430D073B.30104@novomail.net> Jeroen Hellingman wrote: [snip] > Some have argued (with valid reasons) that the entire idea of TEI > markup is broken, and have proposed systems > in which the mark-up is separated from the text (stream of > characters), in such a way that multiple, parallel systems of > mark-up can exist. Think of a separate (part of a) file, saying > characters 21 to 34 are italics, and so on. This may sound > odd, but it is the way the old Macintosh wordprocessor MacWrite worked. > > Jeroen. This is also the way HTML Tidy works, internally. As an HTML file is parsed (and fixed, if necessary) a DOM tree is built. But when a text node is encountered rather than malloc'ing a potentially small amount of memory and storing a pointer, the text is copied into a pre-allocated text buffer, and the start and end points of the fragment are saved in the node structure (the start and end points are actually saved in every node, so you can grab any node in the tree and know that it encompasses "this much" of the actual text.) When 'pretty-printing' the tree, text is grabbed from the buffer as needed. Having created this structure in memory, there is no reason at all it couldn't be saved out separately, with text nodes simply referring to an offset and length in a separate file which receives the entire text buffer, or a separate segment in the same file that contains the text. Likewise, if someone wanted, hypothetically mind you, to write a set of annotations and footnotes to classic literature found at Gutenberg, the same sort of strategy could be used; the annotations would be in a separate file and refer to text at a certain offset in the base file. You'd have to write a small application to merge the two files for presentations, but that sort of thing is trivial, perfectly suited for perl, awk or python. 
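The merge step described above can be sketched roughly as follows. This is only a minimal illustration, not anything Tidy or PG actually ships: the `Annotation` record, the `merge` function, and the bracketed `[n]` marker style are all hypothetical names and conventions chosen for the example, and it assumes offsets are counted in characters from the start of the base text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One stand-off note (hypothetical format): it refers to
    base_text[offset:offset + length] and never modifies the base text."""
    offset: int
    length: int
    note: str


def merge(base_text: str, annotations: list[Annotation]) -> str:
    """Return the base text with a [n] marker after each annotated span,
    followed by the numbered notes."""
    # Number the notes in the order in which their spans end in the text.
    ordered = sorted(annotations, key=lambda a: a.offset + a.length)
    pieces = []
    cursor = 0
    for number, ann in enumerate(ordered, start=1):
        end = ann.offset + ann.length
        if ann.offset < 0 or end > len(base_text):
            raise ValueError(f"annotation {number} falls outside the base text")
        pieces.append(base_text[cursor:end])  # untouched text up to the span end
        pieces.append(f"[{number}]")          # marker inserted only in the output
        cursor = end
    pieces.append(base_text[cursor:])
    notes = [f"[{n}] {a.note}" for n, a in enumerate(ordered, start=1)]
    return "".join(pieces) + "\n\n" + "\n".join(notes)


if __name__ == "__main__":
    base = "To Sherlock Holmes she is always the woman."
    notes = [
        Annotation(offset=33, length=9, note="Irene Adler."),
        Annotation(offset=3, length=15, note="The detective."),
    ]
    print(merge(base, notes))
```

Because the annotations only carry offsets, the base file is read but never rewritten; the markers exist only in the merged output.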
This type of division between markup and content is also perfectly suited to writing an application to display e-books in a low memory/low power device. The DOM tree could quickly be loaded into memory and remain resident, permitting fast navigation and styling, but the actual text could remain in static storage, only being accessed when needed. One of the downsides to this sort of system is that the base content _must_ remain 1. accessible and 2. inviolate. The Gutenberg edition of _The Adventures of Sherlock Holmes_ was first released in 1999 and has gone through 12 revisions, the most recent being in 2002. Version 10 is still available at gutenberg.org, but I can't find any earlier versions (this is not a criticism; PG is not an archive, after all). So if I were to write an annotation designed to be overlayed over the PG text I would want to have some assurances that the base text were always available, or I would want to be sure that the base text was always physically attached to the annotation file (to the extent that anything digital can be said to be physical). If I were to write a separate HTML markup file for TAOSH, I would want some assurance that the base text would not be altered in any way which would change the position of any character in the file, otherwise my markup would break. So there are definitely problems with this sort of application, but there are real benefits too, in some circumstances. From sly at victoria.tc.ca Wed Aug 24 17:11:00 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Aug 24 17:11:16 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 25) In-Reply-To: <430D073B.30104@novomail.net> References: <20050824204954.02BEC8C906@pglaf.org> <430D073B.30104@novomail.net> Message-ID: On Wed, 24 Aug 2005, Lee Passey wrote: > _must_ remain 1. accessible and 2. inviolate. 
The Gutenberg edition of > _The Adventures of Sherlock Holmes_ was first released in 1999 and has > gone through 12 revisions, the most recent being in 2002. Version 10 is > still available at gutenberg.org, but I can't find any earlier versions > (this is not a criticism; PG is not an archive, after all). A brief explanation here of the historical edition numbering of PG texts. Every text was released initially in a version "10" (Think of that as 1.0) And then subsequent "editions" would be numbered 11, 12, etc. If you look hard enough in the pre-10,000 files, you can find a couple of exceptions, but that will cover most cases. Also note that the consensus that emerged was that a small number of minor corrections could be made without increasing the edition number. Andrew From Bowerbird at aol.com Wed Aug 24 17:49:40 2005 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Aug 24 17:50:01 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 25) Message-ID: <13d.1a0b3c0e.303e6fa4@aol.com> andrew said: > A brief explanation here of > the historical edition numbering of PG texts. > Every text was released initially in a version "10" > (Think of that as 1.0) And then subsequent "editions" > would be numbered 11, 12, etc. If you look hard enough > in the pre-10,000 files, you can find a couple of exceptions, > but that will cover most cases. Also note that the consensus > that emerged was that a small number of minor corrections > could be made without increasing the edition number. it is this last "consensus that emerged" that is most troublesome. as long as the filename stays unique for each different version, at least people can depend on the name to identify the version. when you substitute in a different file without changing its name, you've introduced unnecessary ambiguity into the situation, and thus made it extremely difficult for people to keep track of things. 
and when you fail to provide changelogs on the entire process, the difficulty-factor starts to climb into the "impossible" range... apologies for spoiling your fun... -bowerbird From grythumn at gmail.com Wed Aug 24 21:08:14 2005 From: grythumn at gmail.com (Robert Cicconetti) Date: Wed Aug 24 21:15:27 2005 Subject: [gutvol-d] on viewing the .pgtei file directly (gutvol-d Digest, Vol 13, Issue 23) In-Reply-To: <430CD118.4060500@novomail.net> References: <20050824190003.BA8CD8C8E8@pglaf.org> <430CD118.4060500@novomail.net> Message-ID: <15cfa2a505082421087bab263e@mail.gmail.com> On 8/24/05, Lee Passey wrote: > Unfortunately, I haven't been able to figure out how to tell Firefox how > to use a user specified css file (I've got version 1.0.4). If anyone can > enlighten me on this score, I would be most grateful. Mr. Noring tells > me that I can do it with Opera, but I've yet to try it. Put it in userContent.css? http://www.mozilla.org/support/firefox/edit#content R C From sly at victoria.tc.ca Thu Aug 25 00:25:09 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Aug 25 00:25:30 2005 Subject: [gutvol-d] TEI markup question Message-ID: Here is a question for those interested in TEI markup. (non-constructive answers will be ignored) Has anyone here had experience in marking up a passage which contains a couple of quoted lines of verse that clearly occur within a paragraph? Here is a particular example (from anne11.txt): == Begin Excerpt == The cows swung placidly down the lane, and Anne followed them dreamily, repeating aloud the battle canto from MARMION--which had also been part of their English course the preceding winter and which Miss Stacy had made them learn off by heart--and exulting in its rushing lines and the clash of spears in its imagery.
When she came to the lines The stubborn spearsmen still made good Their dark impenetrable wood, she stopped in ecstasy to shut her eyes that she might the better fancy herself one of that heroic ring. When she opened them again it was to behold Diana coming through the gate that led into the Barry field and looking so important that Anne instantly divined there was news to be told. But betray too eager curiosity she would not. == End Excerpt == The first reaction, which I have seen done before, is to use a

<p> then a then a <p>

. This is not an ideal solution (as I am sure Lee would not hesitate to point out). From a semantic point of view, the section beginning "she stopped in ecstasy to shut her eyes" is not structurally a complete paragraph. From a presentational point of view, you are likely to get undesirable styling on that last element if someone decides to have initial indentation on all paragraphs in the document. According to the PGTEI DTD, is it valid to have a within a

<p>? If not, I don't suppose we can label a <p>

as Initial, Medial or Final as with the element... Andrew From gsmith at nc.rr.com Wed Aug 24 17:32:11 2005 From: gsmith at nc.rr.com (Greg Smith) Date: Thu Aug 25 00:31:37 2005 Subject: [gutvol-d] newbie question: copyright editions of public domain works Message-ID: <1124929931.3443.13.camel@localhost.localdomain> I inherited from my grandfather 25 years ago 4 collections from his library. He was a Baptist minister (read poor). The first collection is the 11th edition of the Encyclopedia Britannica published 1910-1911. The other three collections are editions of works that were public domain at the time of the edition (e.g. `Best Known Works: Defoe'). The copyright dates range from the mid teens to the early forties. But what are these copyrights copyrighting? I can understand editor comments, translations, pictures, etc. But what if spelling/grammar was modernized? Or, portions cut or moved? Would these books (~120 volumes) be of any use? Apologies for my ignorance, Greg Smith From sly at victoria.tc.ca Thu Aug 25 00:45:15 2005 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Aug 25 00:45:35 2005 Subject: [gutvol-d] newbie question: copyright editions of public domain works In-Reply-To: <1124929931.3443.13.camel@localhost.localdomain> References: <1124929931.3443.13.camel@localhost.localdomain> Message-ID: Hi Greg. Please don't be afraid to ask questions... First, the usual disclaimer. I am not a lawyer--Copyright laws are different in every country, and often in a state of flux --etc. Now here's my understanding of the topics you mention. First of all, published books often claim copyright on re-published material when it is really not merited. However, for our purposes at PG, we can't know for sure that editorial interventions such as you've mentioned have not happened--without doing a comparison with a proven public domain edition. And if you have that PD edition available anyway, you may as well work from it.
(On a side note, I'll mention that I have occasionally made some use of supposedly copyrighted imprints like this before. For example, if I'm reformatting a German text to include in PG, and I get copyright clearance from some late 19th century edition with a hard-to-read Fraktur font, then I may use a more recent edition as a reference, or to do some spot-checks.) Andrew On Wed, 24 Aug 2005, Greg Smith wrote: > I inherited from my grandfather 25 years ago 4 collections from his > library. He was a Baptist minister (read poor). > > The first collection is the 11th edition of the Encyclopedia Britannica > published 1910-1911. The other three collections are editions of works > that were public domain at the time of the edition (e.g. `Best Known > Works: Defoe'). The copyright dates range from the mid teens to the > early forties. > > But what are these copyrights copyrighting? I can understand editor > comments, translations, pictures, etc. But what if spelling/grammar was > modernized? Or, portions cut or moved? Would these books (~120 > volumes) be of any use? > > Apologies for my ignorance, > > Greg Smith From marcello at perathoner.de Thu Aug 25 05:04:31 2005 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Aug 25 05:04:39 2005 Subject: [gutvol-d] TEI markup question In-Reply-To: References: Message-ID: <430DB3CF.90203@perathoner.de> Andrew Sly wrote: > Has anyone here had experience in marking up a passage > which contains a couple of quoted lines of verse that > clearly occur within a paragraph? TEI has not the petty limitations of HTML. A TEI p can contain q, list, table, figure, text. (This is also the reason why XSL transformation from TEI to HTML is hard. An HTML p may not contain blockquote, ul, ol, dl, table.) Mark it up straight like this: -------------------

The cows swung placidly down the lane, and Anne followed them dreamily, repeating aloud the battle canto from Marmion—which had also been part of their English course the preceding winter and which Miss Stacy had made them learn off by heart—and exulting in its rushing lines and the clash of spears in its imagery. When she came to the lines The stubborn spearsmen still made good Their dark impenetrable wood, she stopped in ecstasy to shut her eyes that she might the better fancy herself one of that heroic ring. When she opened them again it was to behold Diana coming through the gate that led into the Barry field and looking so important that Anne instantly divined there was news to be told. But betray too eager curiosity she would not.

---------------------- You may also use instead of . But is more correct if it references a published work. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Thu Aug 25 05:05:40 2005 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Aug 25 05:05:41 2005 Subject: [gutvol-d] TEI markup question Message-ID: <20050825120540.12C7C1098E9@ws6-4.us4.outblaze.com> I don't remember if you can place a within a

block, but ... You can put a rend="noindent" on the second paragraph to insure that it won't ever have an indention. As to what is the "right way to do it" ... I don't know. Josh ----- Original Message ----- From: "Andrew Sly" > > Here is a question for those interested in TEI markup. > (non-constuctive answers will be ignored) > > Has anyone here had experience in marking up a passage > which contains a couple of quoted lines of verse that > clearly occur within a paragraph? > > Here is a particular example (from anne11.txt): > > == Begin Excerpt == > The cows swung placidly down the lane, and Anne followed them > dreamily, repeating aloud the battle canto from MARMION--which > had also been part of their English course the preceding winter > and which Miss Stacy had made them learn off by heart--and > exulting in its rushing lines and the clash of spears in its > imagery. When she came to the lines > > The stubborn spearsmen still made good > Their dark impenetrable wood, > > she stopped in ecstasy to shut her eyes that she might the better > fancy herself one of that heroic ring. When she opened them > again it was to behold Diana coming through the gate that led > into the Barry field and looking so important that Anne instantly > divined there was news to be told. But betray too eager > curiosity she would not. > > == End Excerpt == > > The first reaction, which I have seen done before, is > to use a

<p> then a then a <p>

. This is not an ideal > solution, (as I am sure Lee would not hesitate to point out). > > From a semantic point of view the section beginning "she > stopped in ecstasy to shut her eyes" is not structually > a complete paragraph. From a presentational point of > view, you are likely to get an undesirable styling on > that last element if someone decides to have initial > indentation on all paragraphs in the document. > > > According to the PGTEI dtd, is it valid to have a > within a

<p>? > > If not, I don't suppose we can label a <p>

as Initial, Medial > or Final as with the element... > > Andrew From brad at chenla.org Thu Aug 25 07:47:30 2005 From: brad at chenla.org (Brad Collins) Date: Thu Aug 25 07:48:36 2005 Subject: [gutvol-d] ANNOUNCEMENT: XML has hit the PG archives! (gutvol-d Digest, Vol 13, Issue 19) In-Reply-To: <908745031.20050824153847@noring.name> (Jon Noring's message of "Wed, 24 Aug 2005 15:38:47 -0600") References: <908745031.20050824153847@noring.name> Message-ID: Jon Noring writes: > Bowerbird wrote: >> jeroen said: > >>> Some have argued (with valid reasons) that the entire idea of TEI >>> markup is broken, and have proposed systems in which the mark-up is >>> separated from the text (stream of characters), in such a way that >>> multiple, parallel systems of mark-up can exist. Think of a separate >>> (part of a) file, saying characters 21 to 34 are italics, and so on. >>> This may sound odd, but it is the way the old Macintosh wordprocessor >>> MacWrite worked. About two years ago I was playing around with the same idea. My solution was to take a CSS approach to layering. CSS places an external layer of formatting instructions on top of a text, so why not extend CSS to also be able to add layers of semantic markup to a text? This would make it easy to add semantic markup including glosses, notes, comments (scholia) etc. to a text, even if the text is located on a server somewhere on the Net. The folks doing the Hyperreal Dictionary of Mathematics are creating a scholia system based on Emacs text properties to add layers of scholia to texts.
b/ -- Brad Collins , Bangkok, Thailand From jon at noring.name Thu Aug 25 07:59:22 2005 From: jon at noring.name (Jon Noring) Date: Thu Aug 25 07:59:28 2005 Subject: [gutvol-d] TEI markup question In-Reply-To: <430DB3CF.90203@perathoner.de> References: <430DB3CF.90203@perathoner.de> Message-ID: <115134081.20050825085922@noring.name> Marcello wrote: > Andrew Sly wrote: >> Has anyone here had experience in marking up a passage >> which contains a couple of quoted lines of verse that >> clearly occur within a paragraph? > TEI has not the petty limitations of HTML. A TEI p can contain q, list, > table, figure, text. (This is also the reason why XSL transformation > from TEI to HTML is hard. An HTML p may not contain blockquote, ul, ol, > dl, table.) > > Mark it up straight like this: > > ------------------- >

The cows swung placidly down the lane, and Anne followed them > dreamily, repeating aloud the battle canto from > Marmion—which > had also been part of their English course the preceding winter > and which Miss Stacy had made them learn off by heart—and > exulting in its rushing lines and the clash of spears in its > imagery. When she came to the lines > > > > The stubborn spearsmen still made good > Their dark impenetrable wood, > > > > she stopped in ecstasy to shut her eyes that she might the better > fancy herself one of that heroic ring. When she opened them > again it was to behold Diana coming through the gate that led > into the Barry field and looking so important that Anne instantly > divined there was news to be told. But betray too eager > curiosity she would not.

> ---------------------- > > You may also use instead of . But is more correct if > it references a published work. My first thought, as always when I encounter text to mark up, is to understand the presentation-agnostic *structure* and/or *semantics* of what I'm seeing in the typography. The two lines in the example form a structural block of a certain kind. The question now is *what* this block represents. Well, for starters, it is a snippet of verse that appears within a paragraph but is obviously intended to stand apart from the paragraph (rather than just being quoted inline, as is often done). But what kind of verse is it, and where does it come from? Marcello identifies it (and this appears correct to me) as a , so that works for me (I need to study up more on the TEI tag). A just doesn't have enough semantic meaning (without adding an attribute) to let it be a child of the

<p> tag. The TEI authors apparently recognize this and allow only certain "block-level" tags within <p>

, so the markup wonk has to somehow fit the "in paragraph" block they encounter into one of those allowed in TEI. (Another area to study -- the elements allowed as children of the TEI

<p> tag -- looking at the full flattened DTD now, which I'll append at the end.) ***** Regarding the limitation of HTML <p>

, yes that is frustrating. There are block-level tags (actually tags that can be either block or inline, among them , ,