From Gutenberg9443 at aol.com Fri Oct 1 05:07:09 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct 1 05:07:21 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <144.3502adb2.2e8ea26d@aol.com>

In a message dated 9/19/2004 10:38:22 AM Mountain Standard Time, marevalo@marevalo.net writes:

> And it would be great to have the complete bibliographical record of the book (or books) used as source for the digital edition on every new text.

Or at least the date of original publication and the name of the original publisher. Not having those makes it difficult to cite the book in research work. I usually resort to going to the Library of Congress record to get that, and then include the fact that this is an on-line edition etc. etc. etc.

Anne

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041001/887c6099/attachment.html

From Bowerbird at aol.com Fri Oct 1 10:56:23 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 1 10:56:35 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <8e.1648d182.2e8ef447@aol.com>

marevalo said:
> > it would be great to have the complete bibliographical record
> > of the book (or books) used as source for the digital edition

anne said:
> Or at least the date of original publication
> and the name of the original publisher.

this is a recurring request, of course.

it might be interesting to have a public forum, like a wiki, where requests like these could be made, so we could see the cumulation.

as it is, i get the distinct impression these go in one ear and out the other, and we go along in ignorant bliss thinking that all of our users are completely happy...

-bowerbird

From marcello at perathoner.de Fri Oct 1 11:10:06 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 1 11:10:15 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <144.3502adb2.2e8ea26d@aol.com>
References: <144.3502adb2.2e8ea26d@aol.com>
Message-ID: <415D9D7E.5000908@perathoner.de>

Gutenberg9443@aol.com wrote:

> And it would be great to have the complete bibliographical record of the
> book (or books) used as source for the digital edition on every new text.
>
> Or at least the date of original publication and the name of the
> original publisher. Not having those makes it difficult to cite the book
> in research work. I usually resort to going to the Library of Congress
> record to get that, and then include the fact that this is an on-line
> edition etc. etc. etc.

You give me a list of all publication dates and places etc. and I'll insert them into the database. If PG has all TP&Vs archived for copyright purposes, it should not be an impossible task.

Adding database support for more attributes is a matter of a few hours. It just doesn't make sense to add more attributes if nobody volunteers to fill them in.

--
Marcello Perathoner
webmaster@gutenberg.org

From traverso at dm.unipi.it Fri Oct 1 11:19:53 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Fri Oct 1 11:20:02 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <415D9D7E.5000908@perathoner.de> (message from Marcello Perathoner on Fri, 01 Oct 2004 20:10:06 +0200)
References: <144.3502adb2.2e8ea26d@aol.com> <415D9D7E.5000908@perathoner.de>
Message-ID: <200410011819.i91IJr72024672@posso.dm.unipi.it>

>>>>> "Marcello" == Marcello Perathoner writes:

Marcello> Gutenberg9443@aol.com wrote:
>> And it would be great to have the complete bibliographical
>> record of the book (or books) used as source for the digital
>> edition on every new text.
>>
>> Or at least the date of original publication and the name of
>> the original publisher. Not having those makes it difficult to
>> cite the book in research work.
>> I usually resort to going to
>> the Library of Congress record to get that, and then include
>> the fact that this is an on-line edition etc. etc. etc.

Marcello> You give me a list of all publication dates and places
Marcello> etc. and I'll insert them into the database. If PG has
Marcello> all TP&Vs archived for copyright purposes, it should not
Marcello> be an impossible task.

Marcello> Adding database support for more attributes is a matter
Marcello> of a few hours. It just doesn't make sense to add more
Marcello> attributes if nobody volunteers to fill them in.

It would be very simple, if only the information were not systematically removed. At DP, the full information (including a transcription of the title page) is kept during proofreading, but it is removed, if not by the post-processors, then by the whitewashers.

Carlo

From jon at noring.name Fri Oct 1 11:42:05 2004
From: jon at noring.name (Jon Noring)
Date: Fri Oct 1 11:42:23 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <8e.1648d182.2e8ef447@aol.com>
References: <8e.1648d182.2e8ef447@aol.com>
Message-ID: <169440744187.20041001124205@noring.name>

marevalo said:
>> > it would be great to have the complete bibliographical record
>> > of the book (or books) used as source for the digital edition

anne said:
> Or at least the date of original publication
> and the name of the original publisher.

Bowerbird said:
> this is a recurring request, of course.
>
> it might be interesting to have a public forum, like a wiki, where
> requests like these could be made, so we could see the cumulation.
>
> as it is, i get the distinct impression these go in one ear and
> out the other, and we go along in ignorant bliss thinking that
> all of our users are completely happy...

Agreed.
I believe PG should change their policy (if they haven't already) and for all new titles to include the full citation of the source (or sources if someone made a composite using two or more differing editions, which btw I believe PG should discourage.)

I surmise the reason for the past (and I assume current) policy of obfuscating the source had to do with fear of copyright litigation -- in essence "providing information to the enemy." However, since nearly all the texts produced today are from scans which are preserved, it is no longer possible to obfuscate the source.

Anyway, there are a few other arguments in support of full source citation, including the most important: assuring integrity of the text to the original source. This is not a trivial issue.

Jon Noring

From skip at nextra.sk Sun Oct 3 17:14:55 2004
From: skip at nextra.sk (Skippi)
Date: Sun Oct 3 17:15:23 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <169440744187.20041001124205@noring.name>
References: <8e.1648d182.2e8ef447@aol.com> <169440744187.20041001124205@noring.name>
Message-ID: <1194651402.20041004021455@nextra.sk>

Friday, October 1, 2004, 8:42:05 PM, Jon wrote:

> I believe PG should change their policy (if they haven't already) and
> for all new titles to include the full citation of the source (or
> sources if someone made a composite using two or more differing
> editions, which btw I believe PG should discourage.)

...

> Anyway, there are a few other arguments in support of full source
> citation, including the most important: assuring integrity of the text
> to the original source. This is not a trivial issue.

I agree too, and suggest that maybe this information could be kept in an XML format conforming to some DTD (PG's own) so that the book can be very easily processed or catalogued. But this is going too far from the plain-text idea PG was built on and still successfully lives on.
-- Skippi mailto:skip@nextra.sk From hart at pglaf.org Mon Oct 4 10:00:35 2004 From: hart at pglaf.org (Michael Hart) Date: Mon Oct 4 10:00:36 2004 Subject: [gutvol-d] Indexing Editors, etc. In-Reply-To: <1194651402.20041004021455@nextra.sk> References: <8e.1648d182.2e8ef447@aol.com> <169440744187.20041001124205@noring.name> <1194651402.20041004021455@nextra.sk> Message-ID: If the original source you use turns out to have errors, as nearly all books do, do you want the errors preserved? mh On Mon, 4 Oct 2004, Skippi wrote: > Friday, October 1, 2004, 8:42:05 PM, Jon wrote: > >> I believe PG should change their policy (if they haven't already) and >> for all new titles to include the full citation of the source (or >> sources if someone made a composite using two or more differing >> editions, which btw I believe PG should discourage.) > ... >> Anyway, there are a few other arguments in support of full source >> citation, including the most important: assuring integrity of the text >> to the original source. This is not a trivial issue. > > I agree too and suggest that may be this information could be kept in > a XML format conforming some DTD (PG own) so that the book can be very > easily processed or catalogued. But this is going too far from the > plain text idea PG was built and still succesfuly lives on. > > -- > Skippi mailto:skip@nextra.sk > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From shalesller at writeme.com Mon Oct 4 10:28:58 2004 From: shalesller at writeme.com (D. Starner) Date: Mon Oct 4 10:29:03 2004 Subject: [gutvol-d] Indexing Editors, etc. Message-ID: <20041004172858.1311E4BDA9@ws1-1.us4.outblaze.com> Michael Hart writes: > If the original source you use turns out to have errors, > as nearly all books do, do you want the errors preserved? Yes. 
That's way too unconditional, and there's a lot of minor errors in texts that we can just correct; but I've been reading a book on Beowulf which talks about one of the first published transcriptions that "corrected" a lot of things and screwed with work on Beowulf for 50 years. I hate to imagine us adding anachronistic spelling to a work or making a work harder to understand. On DP, people frequently ask about obvious errors that turn out to be correct. If we want to be editors, let us be editors and take full responsibility for checking other editions and writing introductions and bibliographies and keeping notes about what we've changed.

--
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From hart at pglaf.org Mon Oct 4 10:41:08 2004
From: hart at pglaf.org (Michael Hart)
Date: Mon Oct 4 10:41:10 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <20041004172858.1311E4BDA9@ws1-1.us4.outblaze.com>
References: <20041004172858.1311E4BDA9@ws1-1.us4.outblaze.com>
Message-ID:

On Mon, 4 Oct 2004, D. Starner wrote:

> Michael Hart writes:
>
>> If the original source you use turns out to have errors,
>> as nearly all books do, do you want the errors preserved?
>
> Yes. That's way too unconditional, and there's a lot of
> minor errors in texts that we can just correct; but I've
> been reading a book on Beowulf which talks about one of
> the first published transcriptions that "corrected" a lot
> of things and screwed with work on Beowulf for 50 years.
> I hate to imagine us adding anachronistic spelling to a
> work or making a work harder to understand. On DP, people
> frequently ask about obvious errors that turn out to be
> correct. If we want to be editors, let us be editors and
> take full responsibility for checking other editions and
> writing introductions and bibliographies and keeping notes
> about what we've changed.
If we do take such responsibility, then we are creating a new "critical edition," which was always our goal. However, we have tried to avoid the arguments that come with this sort of thing, as per a previous discussion about this, when we argued about the punctuation in "To be or not to be."

Once we get into that kind of discussion, it really never ends, take my father's word for it, he was a great Shakespeare prof.

Thanks!

Michael

From ke at gnu.franken.de Mon Oct 4 10:40:41 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Mon Oct 4 11:31:55 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: <1194651402.20041004021455@nextra.sk> (Skippi's message of "Mon, 4 Oct 2004 02:14:55 +0200")
References: <8e.1648d182.2e8ef447@aol.com> <169440744187.20041001124205@noring.name> <1194651402.20041004021455@nextra.sk>
Message-ID:

Skippi writes:

> I agree too and suggest that may be this information could be kept in
> a XML format conforming some DTD (PG own) so that the book can be very
> easily processed or catalogued.

It would be wise to go with the TEI DTD and, actually, some support for the TEI DTD is already available.

Michael Hart writes:

> If the original source you use turns out to have errors,
> as nearly all books do, do you want the errors preserved?

Sure. At least, say what you changed and why.

Also, I strongly recommend keeping the original page references; you can hide them, but it should be possible to make them visible if the user is interested in them. This could be done using a simple CSS mechanism.

--
| ,__o
| _-\_<, http://www.gnu.franken.de/ke/
| (*)/'(*)

From scott_bulkmail at productarchitect.com Mon Oct 4 11:23:10 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Mon Oct 4 13:02:12 2004
Subject: [gutvol-d] Indexing Editors, etc.
In-Reply-To: References: <8e.1648d182.2e8ef447@aol.com> <169440744187.20041001124205@noring.name> <1194651402.20041004021455@nextra.sk> Message-ID: >If the original source you use turns out to have errors, >as nearly all books do, do you want the errors preserved? One of the advantages of an XML (or similar) "master version" is that we can have our cake and eat it too. The MASTER can capture the mistake AND the correction; a separate process can render either or both in various formats, e.g. HTML with little "tool tips" and of course plain text (optionally with or without "errors", anachronistic spellings, etc.). Three examples: to-day speeling f8r As an example of the third case, Twain's CT Yankee includes a "newspaper article" full of "typos" that are intended to illustrate an amateur job of typesetting. NOTE that the xml tags and attributes are just made up on the fly. I'm NOT advocating any specific tags. Also, I think it's better to start with something rather than wait for a perfect design. It's generally easy to transform from one xml into another, e.g. to (or from): to-day speeling f8r -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From Bowerbird at aol.com Mon Oct 4 13:21:09 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 4 13:21:32 2004 Subject: [gutvol-d] re: toasted-cheese sandwich Message-ID: <1e8.2b80fab1.2e930ab5@aol.com> michael said: > If the original source you use turns out to have errors, > as nearly all books do, do you want the errors preserved? me? nope. i want 'em corrected, with a note to that effect. but that's if you're _sure_ it's an error. it's not always that clear. if you _suspect_ something _might_ be an error, but aren't sure, even after detective work, i would want a note made to that effect. *** > However, we have tried to avoid the arguments that > come with this sort of thing, as per a previous discussion > about this, when we argued about the punctuation in > "To be or not to be." 
in such cases, make note of the arguments. :+) *** skippi said: > XML karl said: > TEI DTD ... > CSS scott said: > XML that x.m.l. it's good for _everything_, isn't it? :+) i just heard there's a new thing out now where x.m.l. can actually make you a toasted-cheese sandwich, no kidding, and you can even specify -- with a tag -- how dark or light you want the bread toasted. i gotta get me that thing, man, because i'm jonesin' for a toasted-cheese sandwich right now... -bowerbird From scott_bulkmail at productarchitect.com Mon Oct 4 13:50:16 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Mon Oct 4 13:50:53 2004 Subject: [gutvol-d] re: toasted-cheese sandwich In-Reply-To: <1e8.2b80fab1.2e930ab5@aol.com> References: <1e8.2b80fab1.2e930ab5@aol.com> Message-ID: >that x.m.l. it's good for _everything_, isn't it? :+) It does an outstanding job of all the uses that I've ever seen it suggested for on PG lists (based on reviewing several months of archives). How does Z/M/L handle the 3 cases that I listed? -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From Bowerbird at aol.com Mon Oct 4 14:13:37 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 4 14:14:06 2004 Subject: [gutvol-d] re: toasted-cheese sandwich Message-ID: <1d8.2cd34e3d.2e931701@aol.com> scott said: > How does Z/M/L handle the 3 cases that I listed? oh please, nobody here wants to talk about z.m.l. ok, maybe 2 or 3 people, but nobody else. and those 2 or 3 should join the beta-test for my viewer-program, where we can discuss issues like the ones scott lists. zml_talk-subscribe@yahoogroups.com now i'm off to make that sandwich... -bowerbird p.s. to answer your question, though, scott, annotations could be used to make those notes. From nwolcott2 at kreative.net Sat Oct 2 13:03:17 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Tue Oct 5 06:14:19 2004 Subject: [gutvol-d] Indexing Editors, etc. 
References: <144.3502adb2.2e8ea26d@aol.com>
Message-ID: <008e01c4aadc$b6ac78e0$0b9495ce@net>

I understand the need to conceal the date, publisher etc for most PG books. But I think an exception could be made if the book is over 100 years old. Books of this age are likely to be among the first published and hence have authenticity. Often the publisher has ceased publication, and in this case (as opposed to being bought out) there is no harm in listing the publisher.

nwolcott2@post.harvard.edu
Friar Wolcott, Gutenberg Abbey, Sherwood Forrest

----- Original Message -----
From: Gutenberg9443@aol.com
To: gutvol-d@lists.pglaf.org
Sent: Friday, October 01, 2004 8:07 AM
Subject: Re: [gutvol-d] Indexing Editors, etc.

In a message dated 9/19/2004 10:38:22 AM Mountain Standard Time, marevalo@marevalo.net writes:

And it would be great to have the complete bibliographical record of the book (o books) used as source for the digital edition on every new text.

Or at least the date of original publication and the name of the original publisher. Not having those makes it difficult to cite the book in research work. I usually resort to going to the Library of Congress record to get that, and then include the fact that this is an on-line edition etc. etc. etc.

Anne

From shalesller at writeme.com Tue Oct 5 13:23:08 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Oct 5 13:23:57 2004
Subject: [gutvol-d] Indexing Editors, etc.
Message-ID: <20041005202308.51B3C4BDA9@ws1-1.us4.outblaze.com>

> I understand the need to conceal the date, publisher etc for most PG books.

I don't.
Most of the reprint editions I've seen don't; they usually have printed right on the verso which edition they are a copy of. > But I think an exception could be made if the book is over 100 years old. > Books of this age are likely to be among the first published and hence have > authenticity. Often the publisher has ceased publication, and in this case > (as opposed to being bought out) there is no harm in listing the publisher. I'm doing a lot of books over 100 years old that are new editions of older books. And there's a lot of cases where (for example) the American edition came out months after the British edition, but isn't considered authentic at all. Likewise, the Early English Text Society is still publishing, as is the Oxford University Press and most other university presses. I don't think time makes a bit of difference here. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From marcello at perathoner.de Thu Oct 7 11:34:59 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 7 11:35:09 2004 Subject: [gutvol-d] Fundraising thru Amazon ??? Message-ID: <41658C53.9000108@perathoner.de> Warning: potential heresy ahead! Seeing that many `independent' PG websites are trying to make money using our books and our catalog data, basically mixing search results from PG and Amazon Ads, eg. http://www.abacci.com/books/ http://textual.net/access.gutenberg why not do some fundraising thru Amazon ourselves? Basically, we set up a page and tell our visitors: If you ever feel the need to buy a book at Amazon don't go there directly but always thru the PG site. Thus Amazon will pass a small percentage of the revenue back to PG. This way you can donate to PG without spending anything and virtually without any trouble. Just delete your old Amazon bookmark and bookmark this page instead. 
More details at: http://www.amazon.com/gp/browse.html?node=3435371 -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Thu Oct 7 11:38:37 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 7 11:38:50 2004 Subject: [gutvol-d] Fundraising thru Amazon ??? Message-ID: <20041007183837.811752F9DE@ws6-3.us4.outblaze.com> I've got no problem with it. Just don't make it too obtrusive and I think it's a fine way to try to bring some income into our little corner of the Net. Josh ----- Original Message ----- From: Marcello Perathoner Date: Thu, 07 Oct 2004 20:34:59 +0200 To: Project Gutenberg volunteer discussion Subject: [gutvol-d] Fundraising thru Amazon ??? > Warning: potential heresy ahead! > > > > Seeing that many `independent' PG websites are trying to make money > using our books and our catalog data, basically mixing search results > from PG and Amazon Ads, eg. > > http://www.abacci.com/books/ > http://textual.net/access.gutenberg > > why not do some fundraising thru Amazon ourselves? > > > Basically, we set up a page and tell our visitors: > > If you ever feel the need to buy a book at Amazon > don't go there directly but always thru the PG site. > Thus Amazon will pass a small percentage of the revenue > back to PG. This way you can donate to PG without > spending anything and virtually without any trouble. > Just delete your old Amazon bookmark and bookmark > this page instead. > > > More details at: > > http://www.amazon.com/gp/browse.html?node=3435371 > > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From maitriv at yahoo.com Thu Oct 7 11:55:33 2004 From: maitriv at yahoo.com (maitri venkat-ramani) Date: Thu Oct 7 11:55:44 2004 Subject: [gutvol-d] Fundraising thru Amazon ??? 
In-Reply-To: <41658C53.9000108@perathoner.de> Message-ID: <20041007185533.97941.qmail@web52302.mail.yahoo.com> The method that you suggest is what Newsscan does at the end of their Honorary Subscriber segment. The only problem I have with it is sending people to Amazon at all, when they should be getting out of their houses and keeping local bookstores afloat. Granted, it's easier and cheaper (and how many of us live in the internet age now), but still gives me that feeling akin to shopping at WalMart. Social policy aside, it's a great way to get some of the money Amazon is giving away. Cheers, Maitri --- Marcello Perathoner wrote: > Warning: potential heresy ahead! > > > > Seeing that many `independent' PG websites are trying to make money > using our books and our catalog data, basically mixing search results > > from PG and Amazon Ads, eg. > > http://www.abacci.com/books/ > http://textual.net/access.gutenberg > > why not do some fundraising thru Amazon ourselves? > > > Basically, we set up a page and tell our visitors: > > If you ever feel the need to buy a book at Amazon > don't go there directly but always thru the PG site. > Thus Amazon will pass a small percentage of the revenue > back to PG. This way you can donate to PG without > spending anything and virtually without any trouble. > Just delete your old Amazon bookmark and bookmark > this page instead. > > > More details at: > > http://www.amazon.com/gp/browse.html?node=3435371 > > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > _______________________________ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com From gbnewby at pglaf.org Thu Oct 7 11:59:09 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Oct 7 11:59:11 2004 Subject: [gutvol-d] Fundraising thru Amazon ??? 
In-Reply-To: <41658C53.9000108@perathoner.de>
References: <41658C53.9000108@perathoner.de>
Message-ID: <20041007185909.GA25006@pglaf.org>

On Thu, Oct 07, 2004 at 08:34:59PM +0200, Marcello Perathoner wrote:
> Warning: potential heresy ahead!
>
> Seeing that many `independent' PG websites are trying to make money
> using our books and our catalog data, basically mixing search results
> from PG and Amazon Ads, eg.
>
> http://www.abacci.com/books/
> http://textual.net/access.gutenberg
>
> why not do some fundraising thru Amazon ourselves?
>
> Basically, we set up a page and tell our visitors:
>
> If you ever feel the need to buy a book at Amazon
> don't go there directly but always thru the PG site.
> Thus Amazon will pass a small percentage of the revenue
> back to PG. This way you can donate to PG without
> spending anything and virtually without any trouble.
> Just delete your old Amazon bookmark and bookmark
> this page instead.
>
> More details at:
>
> http://www.amazon.com/gp/browse.html?node=3435371

While I'd like to add a few hundred words of disclaimer about how Amazon has stolen our works and the works of our authors (including copyrighted contemporary authors like Sam Vaknin), and about how they keep trying to "partner" with PG, but always drop off the edge of the earth after we do a bunch of work for them, and have never done *anything* for us, including what they've offered & promised, and about how putting ink on dead trees is completely passe, and about how I'm *still* boycotting them over the 1-click patent thing, and so should you, and about how they give eBooks a bad name by having completely dysfunctional "within the book" pages on their site, and DRM'd versions of public domain content, and more ...

I think it's OK to put them in some far-off corner of gutenberg.net (*not* on "links & affiliates", please, which should be for people we *like*) with this pass-through link.
I'd like to do the same for O'Reilly & BN.com, if they have similar programs. I'd also like to make sure we get the $$$ from Amazon (and how much it will be), so we can have full disclosure to our buyers - um, I mean, readers - about what their actions do for us.

Maybe someone else will want to pay us $200 per year or so to NOT put a link to Amazon (I'm guessing that's about the most we'd get from this).

-- Greg

From j.hagerson at comcast.net Thu Oct 7 18:16:38 2004
From: j.hagerson at comcast.net (John Hagerson)
Date: Thu Oct 7 18:19:04 2004
Subject: [gutvol-d] Fundraising thru Amazon ???
In-Reply-To: <20041007185909.GA25006@pglaf.org>
Message-ID: <00be01c4acd4$7998ae70$6401a8c0@enterprise>

Yesterday, I saw a financial news story to the effect that Google is planning a similar (full-text search of books) service. Here is a link I found in Google News, by searching "Google and books"

http://www.itweek.co.uk/news/1158624

Many more citations are available.

From sly at victoria.tc.ca Sat Oct 9 01:26:13 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sat Oct 9 01:26:36 2004
Subject: [gutvol-d] Extra spaces in html files
Message-ID:

Dear fellow PG volunteers,

I know that discussing issues of markup in PG files is a pointless argument that rarely goes anywhere. Still, I must ask if it is generally acceptable to most PG volunteers to have HTML files in the collection with massive amounts of redundant white space in them?

By this point in time, there are megabytes of storage space in the PG archive which consist of only spaces because of much indentation in html files.
Take a look at the html source of the recently released Edward Lear "A Book of Nonsense" to see an example a little more extreme than most I've seen:

http://www.gutenberg.net/etext/13646

Andrew

From hyphen at hyphenologist.co.uk Sat Oct 9 01:59:12 2004
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Sat Oct 9 02:00:03 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To:
References:
Message-ID:

On Sat, 9 Oct 2004 01:26:13 -0700 (PDT), Andrew Sly wrote:

| Dear fellow PG volunteers,
|
| I know that discussing issues of markup in PG files
| is a pointless argument that rarely goes anywhere.
| Still, I must ask if it is generally acceptable to
| most PG volunteers to have HTML files in the collection
| with massive amounts of redundant white space in them?
|
| By this point in time, there are megabytes of storage
| space in the PG archive which consist of only spaces
| because of much indentation in html files.
|
| Take a look at the html source of the recently released
| Edward Lear "A Book of Nonsense" to see an example a little
| more extreme than most I've seen:
|
| http://www.gutenberg.net/etext/13646

Just had a look at it and IMO it appears to be *very* well done. The indentation is only *two* spaces per level, whereas some would use *eight* spaces per level.

As anyone who has hand-coded html or any computer language knows, the indenting and other white space in the code is *absolutely* essential for understanding the code, especially after a year or two, when you have forgotten everything about it. The white space is even more essential when modifying other people's code.

--
Dave Fawthrop
Don't eat cousin Banana she shares 50% of your genes. Do not kill cousin House Mouse, it is not his fault he is doubly incontinent. Flies need your help. Killing cousin salmonella with bleach is murder, he is as much alive as you are.
;-)

From traverso at dm.unipi.it Sat Oct 9 02:25:04 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sat Oct 9 02:25:29 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To: (message from Andrew Sly on Sat, 9 Oct 2004 01:26:13 -0700 (PDT))
References:
Message-ID: <200410090925.i999P4iv004470@posso.dm.unipi.it>

>>>>> "Andrew" == Andrew Sly writes:

Andrew> Dear fellow PG volunteers,
Andrew> I know that discussing issues of markup in PG files is a
Andrew> pointless argument that rarely goes anywhere. Still, I
Andrew> must ask if it is generally acceptable to most PG
Andrew> volunteers to have HTML files in the collection with
Andrew> massive amounts of redundant white space in them?

Andrew> By this point in time, there are megabytes of storage
Andrew> space in the PG archive which consist of only spaces
Andrew> because of much indentation in html files.

Andrew> Take a look at the html source of the recently released
Andrew> Edward Lear "A Book of Nonsense" to see an example a
Andrew> little more extreme than most I've seen:

I have taken the file, unzipped it, replaced every run of multiple whitespace with a single whitespace, and rezipped; the saving was 365 bytes (out of 640KB). Andrew's message, as received by me with all the headers etc., was 3247 bytes.

Although one might discuss logical indenting in html sources, versus 75 column texts, I don't think that the space is at issue; discussing bytes, or even megabytes, when the archive is terabytes, is discussing 0.00001% savings.

Carlo

From jonathan_ingram at yahoo.com Sat Oct 9 03:46:10 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Sat Oct 9 03:46:33 2004
Subject: [gutvol-d] Extra spaces in html files
In-Reply-To:
Message-ID: <20041009104610.11378.qmail@web41722.mail.yahoo.com>

--- Andrew Sly wrote:
> Dear fellow PG volunteers,
>
> I know that discussing issues of markup in PG files
> is a pointless argument that rarely goes anywhere.
> Still, I must ask if it is generally acceptable to > most PG volunteers to have HTML files in the collection > with massive amounts of redundant white space in them? > > By this point in time, there are megabytes of storage > space in the PG archive which consist of only spaces > because of much indentation in html files. Just as much space is wasted by the pointless way we insert newlines into text editions to keep line lengths down to 80 characters. Much more space is wasted by the odd decision to include in PG poor-quality computerized 'readings' of PG material. The easiest way to drastically reduce the amount of wasted space used by PG is to get rid of the multiple editions, transition to one decently marked up XML master format, and convert to required output formats on the fly. This has approximately zero chance of happening any time soon. -- Jon Ingram From jeroen at bohol.ph Sat Oct 9 07:07:16 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sat Oct 9 07:06:03 2004 Subject: [gutvol-d] Extra spaces in html files In-Reply-To: <20041009104610.11378.qmail@web41722.mail.yahoo.com> References: <20041009104610.11378.qmail@web41722.mail.yahoo.com> Message-ID: <4167F094.5020908@bohol.ph> Jonathan Ingram wrote: >The easiest way to drastically reduce the amount of wasted space used by PG is >to get rid of the multiple editions, transition to one decently marked up XML >master format, and convert to required output formats on the fly. This has >approximately zero chance of happening any time soon. > > I am a big supporter of XML, but I challenge you to automatically create an acceptable ASCII version from one of my XML files without manual intervention... One small warning: they have loads of tables and other challenging stuff. I think it can be done, but it is far from trivial. Jeroen.
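To make Jeroen's point about tables concrete, here is a minimal sketch of the easy part of such a conversion, written in Python. It assumes a table tagged with `table`, `row` and `cell` elements (names borrowed from TEI for illustration; the actual markup in his files is not shown in this thread), and the function name `table_to_text` is hypothetical. It only pads columns to a common width.

```python
import xml.etree.ElementTree as ET


def table_to_text(xml_source: str) -> str:
    """Render a <table><row><cell> fragment as aligned plain-text columns."""
    table = ET.fromstring(xml_source)
    # Collect the text of every cell, collapsing internal whitespace.
    rows = [
        [" ".join("".join(cell.itertext()).split()) for cell in row.findall("cell")]
        for row in table.findall("row")
    ]
    if not rows:
        return ""
    # Pad short rows so every row has the same number of columns.
    n_cols = max(len(r) for r in rows)
    rows = [r + [""] * (n_cols - len(r)) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in rows) for i in range(n_cols)]
    lines = []
    for r in rows:
        line = "  ".join(cell.ljust(w) for cell, w in zip(r, widths))
        lines.append(line.rstrip())
    return "\n".join(lines)


# Made-up sample data, for illustration only.
SAMPLE = """<table>
  <row><cell>Item</cell><cell>Count</cell></row>
  <row><cell>apples</cell><cell>3</cell></row>
  <row><cell>pears</cell><cell>12</cell></row>
</table>"""

print(table_to_text(SAMPLE))
```

The sketch deliberately leaves out what makes the task "far from trivial": cells that must wrap to fit a 75-column line, cells spanning rows or columns, nested markup inside cells, and tables wider than the page.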
From sly at victoria.tc.ca Sat Oct 9 09:38:15 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Oct 9 09:38:20 2004 Subject: [gutvol-d] Extra spaces in html files In-Reply-To: References: Message-ID: Thank you for everyone's feedback. A closer look at the file I mentioned shows that it uses tabs, not spaces, for indenting, so it will appear differently depending on what program you use to view it. (The main body of the text is all indented by eight tabs, which, for me, made it appear to start in the 64th column.) Thanks, Andrew From ke at gnu.franken.de Sat Oct 9 09:50:38 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Sat Oct 9 09:44:47 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <4167F094.5020908@bohol.ph> (Jeroen Hellingman's message of "Sat, 09 Oct 2004 16:07:16 +0200") References: <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> Message-ID: Jeroen Hellingman writes: > I am a big supporter of XML, but I challenge you to automatically create > an acceptable ASCII version from one of my XML files without manual > intervention... Don't waste your time on a so-called ASCII version. Simple HTML as a replacement for the traditional ASCII version is "good enough" - then tools like lynx or w3m or links(?) can do the dirty work. I do not know whether there are special HTML devices for the blind; but I know some blind readers use lynx to browse (parts of) the web. > One small warning, they have loads of tables and other challenging > stuff. I think it can be done, but it is far from trivial. First, these text browsers can display tables, and if this is not good enough, you can always press a magic key and view the HTML source. Of course, if people want to spend their time on ASCII versions, it is their business. But the XML version must be the source for all other formats.
-- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From gbnewby at pglaf.org Sat Oct 9 19:18:14 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Oct 9 19:18:15 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: References: <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> Message-ID: <20041010021814.GB15791@pglaf.org> On Sat, Oct 09, 2004 at 06:50:38PM +0200, Karl Eichwalder wrote: > Jeroen Hellingman writes: > > > I am a big supporter of XML, but I challenge you to automatically create > > an acceptible ASCII version from one of my XML files without manual > > intervention... > > Don't waste your time on so called ASCII version. Simple HTML as a > replacement for the traditional ASCII version is "good enough" - then > tools like lynx or w3m or links(?) can do the dirty work. I do not know > whether there are special HTML device for the blind; but I know some of > them use lynx to browse (parts of) the web. > > > One small warning, they have loads of tables and other challenging > > stuff. I think it can be done, but it is far from trivial. > > First, these text browser can display tables and if this is not good > enough, you can always press a magic key and view the HTML source. > > Of course, if people want to spend their time on ASCII versions, it is > their business. But the XML version must be the source for all other > formats. I'm just writing to point out that Karl's statements are not consistent with how Project Gutenberg processes and distributes eBooks. See more in our FAQ at gutenberg.net In short: - we *require* plain text, except in cases where the format, language or other aspects make it impossible or highly difficult As Jeroen mentioned, we're anxious to have an automatic transformation from XML to HTML and from XML to plain text. These have proven more difficult than expected, although both Jeroen & Marcello have solutions that are pretty good. 
People who think they know how to accomplish this task should send a URL to documentation & a demonstration. -- Greg PS: People who want PDF-only, XML-only, HTML-only, TeX-only, etc. are welcome to start their own projects. PG might even be willing to license our name to you (more on this in http://gutenberg.net/about). From nwolcott2 at kreative.net Sat Oct 9 23:04:21 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Oct 9 23:21:05 2004 Subject: [gutvol-d] Extra spaces in html files References: Message-ID: <008601c4ae90$c56264a0$1d9895ce@net> Most of the white space in the html is tab characters, so that cuts things down quite a bit. Also if a few tabs make the html source more readable (and editable) then why not? In any event the tabs use up far less space than the pictures. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Andrew Sly" To: Sent: Saturday, October 09, 2004 4:26 AM Subject: [gutvol-d] Extra spaces in html files > Dear fellow PG volunteers, > > I know that discussing issues of markup in PG files > is a pointless argument that rarely goes anywhere. > Still, I must ask if is it generally acceptable to > most PG volunteers to have HTML files in the collection > with massive amounts of redundant white space in them? > > By this point in time, there are megabytes of storage > space in the PG archive which consist of only spaces > because of much indentation in html files. 
> > Take a look at the html source of the recently released > Edward Lear "A Book of Nonsense" to see an example a little > more extreme than most I've seen: > > http://www.gutenberg.net/etext/13646 > > Andrew > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From tb at baechler.net Sun Oct 10 01:28:13 2004 From: tb at baechler.net (Tony Baechler) Date: Sun Oct 10 01:27:38 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: References: <4167F094.5020908@bohol.ph> <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> Message-ID: <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> At 06:50 PM 10/9/2004 +0200, you wrote: >Jeroen Hellingman writes: > > > I am a big supporter of XML, but I challenge you to automatically create > > an acceptable ASCII version from one of my XML files without manual > > intervention... > >Don't waste your time on so called ASCII version. Simple HTML as a >replacement for the traditional ASCII version is "good enough" - then >tools like lynx or w3m or links(?) can do the dirty work. I do not know >whether there are special HTML device for the blind; but I know some of >them use lynx to browse (parts of) the web. Hello. Yes, I am blind and I still use Lynx regularly. However, it does not create clean ASCII files. Every page I convert has two blank spaces at the beginning of every line, and it inserts junk to mark links and image placeholders. Also, more and more sites no longer work with text browsers, so using Lynx or Links is becoming a thing of the past. Please don't even get me started on how poorly Internet Explorer does at plain text dumps; however, it is currently the most accessible graphical browser. One thing I really like about the current PG model is that I can quickly go to the ftp site, grab a file, unzip it and have readable plain text.
I would not want to have to download a master xml file and convert it or have the PG site convert it on the fly and try to download it with my browser. Let's not lose sight of the goal of PG, to make as many ebooks available to as many people on as many platforms as possible. I can download the same file to my Windows or Linux machines and they are just as accessible. I can load it into a portable notetaker for the blind and it is still just as accessible. I can even put it on my old Apple II and yes, it's still accessible. I hope this doesn't change. From ke at gnu.franken.de Sat Oct 9 22:18:19 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Sun Oct 10 07:31:48 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <20041010021814.GB15791@pglaf.org> (Greg Newby's message of "Sat, 9 Oct 2004 19:18:14 -0700") References: <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> <20041010021814.GB15791@pglaf.org> Message-ID: Greg Newby writes: > - we *require* plain text, except in cases where the format, > language or other aspects make it impossible or highly difficult In cases where plain text is not impossible or highly difficult, use lynx's or w3m's -dump option. Problem solved. Most of the time this will look better than hand-crafted .txt files. > PS: People who want PDF-only, XML-only, HTML-only, TeX-only, > etc. are welcome to start their own projects. That's what I do. But this does not mean I did not try to cooperate. > PG might even be willing to license our name to you (more on this in > http://gutenberg.net/about). The name is not important for my (little) project. 
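As a rough illustration of the post-processing that a text-browser dump would need before it could stand in for a hand-made .txt file, here is a minimal Python sketch aimed at the two problems Tony describes above: the blank margin at the start of every line and the markers inserted for links and images. The marker shapes in the two patterns (`[12]`-style link numbers, `[IMAGE]`-style placeholders) and the function name `clean_dump` are assumptions made for illustration, not a description of what any particular browser emits.

```python
import re

# Assumed marker shapes; adjust these to whatever the real dump contains.
LINK_MARKER = re.compile(r"\[\d+\]")                     # e.g. "[12]"
PLACEHOLDER = re.compile(r"\[(?:IMAGE|INLINE|LINK)\]")   # e.g. "[IMAGE]"


def clean_dump(text: str) -> str:
    """Strip the common left margin and the link/image markers from a dump."""
    lines = text.splitlines()
    # The margin is the smallest indent found on any non-blank line, so
    # deliberate extra indentation (verse, block quotes) is preserved.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    margin = min(indents, default=0)
    cleaned = []
    for l in lines:
        l = l[margin:] if l.strip() else ""
        l = PLACEHOLDER.sub("", l)
        l = LINK_MARKER.sub("", l)
        cleaned.append(l.rstrip())
    return "\n".join(cleaned)


# Made-up input resembling a dump with a two-space margin.
raw = "  Title line\n\n  [1]Contents [IMAGE]\n      An indented verse line\n"
print(clean_dump(raw))
```

One caveat with this approach: a pattern like `[12]` also matches bracketed footnote numbers that belong to the book itself, so a real filter would have to tell the two apart, which is one reason this kind of cleanup is hard to do blindly across a whole collection.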
-- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From ke at gnu.franken.de Sun Oct 10 07:19:24 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Sun Oct 10 07:31:49 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> (Tony Baechler's message of "Sun, 10 Oct 2004 01:28:13 -0700") References: <4167F094.5020908@bohol.ph> <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> Message-ID: Tony Baechler writes: > However, it does not create clean ASCII files. Every page I convert > has tww blank spaces at the beginning of every line and it inserts > junk to mark links and image placeholders. I appreciate your feedback very much! I guess with a little bit post-processing we can improve the output. Or we should use 'w3m' for creating txt files. > One thing I really like about the current PG model is that I can quickl go > to the ftp site, grab a file, unzip it and have readable plain text. Yes, I don't want you to produce txt files on your own. We should change the way how we create txt files. Doing txt files by hand is too slow. Often it is necessary to improve a text (typos, missing part, random garbage); if you have to apply the same correction to various files manually you must spend more time than necessary and such a procedure is error prone by itself. -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From Bowerbird at aol.com Sun Oct 10 09:12:12 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Oct 10 09:12:23 2004 Subject: [gutvol-d] Re: Extra spaces in html files Message-ID: <1dc.2d93808d.2e9ab95c@aol.com> tony, thank you! your input as a blind reader is extremely valuable. i love it. > Let's not lose sight of the goal of PG ha! and you've got a sense of humor too! 
;+) -bowerbird From nwolcott2 at kreative.net Sun Oct 10 20:00:32 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sun Oct 10 20:04:50 2004 Subject: [gutvol-d] St. Nicholas Magazines Message-ID: <005e01c4af3e$84610340$ae9495ce@net> I have about 10 years of annuals of St. Nicholas Magazine rescued from a dumpster. Does anyone know the address of the DP high-speed scanning place, and whether anyone there would process these? They have wonderful engravings. I would pay the postage, if I knew where to send them and that someone would process them to DP. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest From hart at pglaf.org Thu Oct 14 04:59:47 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Oct 14 04:59:48 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: References: <4167F094.5020908@bohol.ph> <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> Message-ID: On Sun, 10 Oct 2004, Karl Eichwalder wrote: > Tony Baechler writes: > >> However, it does not create clean ASCII files. Every page I convert >> has two blank spaces at the beginning of every line and it inserts >> junk to mark links and image placeholders. > > I appreciate your feedback very much! I guess with a little bit > post-processing we can improve the output. Or we should use 'w3m' for > creating txt files. > >> One thing I really like about the current PG model is that I can quickly go >> to the ftp site, grab a file, unzip it and have readable plain text. > > Yes, I don't want you to produce txt files on your own. We should > change the way how we create txt files. Doing txt files by hand is too > slow.
Often it is necessary to improve a text (typos, missing part, > random garbage); if you have to apply the same correction to various > files manually you must spend more time than necessary and such a > procedure is error prone by itself. When I was faced with these problems, I just wrote macros for my word processor to take out leading and trailing spaces. If there were sections of poetry or songs that looked better indented, then I just changed the spaces in those to @'s and then did a global search and replace [after first searching for @'s already there]. These steps all combined take less time than I spent writing this. Michael Hart From hart at pglaf.org Thu Oct 14 05:26:14 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Oct 14 05:26:16 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> References: <4167F094.5020908@bohol.ph> <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> Message-ID: One more suggestion: there are many brands of word processors and other programs that include file conversion [and the kinds of macros I had mentioned earlier], so I should think it would be easy enuf to find one that met your specifications. My own suggestion would be to start with things such as the Word Perfect versions, don't just try one version, they are quite different version to version. Michael From hart at pglaf.org Thu Oct 14 05:37:21 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Oct 14 05:37:23 2004 Subject: [gutvol-d] Extra spaces in html files In-Reply-To: References: Message-ID: On Sat, 9 Oct 2004, Andrew Sly wrote: > > Thank you for everyone's feedback. > > A closer look at the file I mentioned shows that it uses tabs, > not spaces for indenting, so it will appear differently depending > on what program you use to view it. 
> (the main body of the text is all indented by eight tabs, which > for me, made it appear to start in the 64th column) That's why TABS are not recommended in any Project Gutenberg file. Michael From ke at gnu.franken.de Thu Oct 14 07:44:08 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Thu Oct 14 09:31:40 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: (Michael Hart's message of "Thu, 14 Oct 2004 05:26:14 -0700 (PDT)") References: <4167F094.5020908@bohol.ph> <20041009104610.11378.qmail@web41722.mail.yahoo.com> <4167F094.5020908@bohol.ph> <5.2.0.9.0.20041010012245.00acf530@snoopy2.trkhosting.com> Message-ID: Michael Hart writes: > there are many brands of word processors and other programs > that include file conversion [and the kinds of macros I had > mentioned earlier], so I should think it would be easy enuf > to find one that met your specifications. Converting any file into HTML is easy, but that's not the point. If you are interested in good HTML or PDF you must start with a semantically tagged file (these days that's mostly an XML file). -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From Bowerbird at aol.com Thu Oct 14 10:02:44 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 14 10:03:01 2004 Subject: [gutvol-d] Re: Extra spaces in html files Message-ID: <42.5a526037.2ea00b34@aol.com> karl said: > If you are interested in good HTML or PDF > you must start with a semantically tagged file > (these days that's mostly an XML file). can you give and defend your definition of "good" in this case? ditto with "semantically tagged file"? and, if you are up to the challenge, what is your recommendation as to the route that should be taken to get a library of 14,000+ e-texts converted to the brand of x.m.l. markup you think is best? (bonus points if you can convince all the other x.m.l. advocates that the markup version you prefer is better than the ones they prefer.)
finally, greg recently requested that people come forward with working routines to implement an x.m.l.-master methodology. are you able to answer that call? did you? if so, do let us know. -bowerbird From marcello at perathoner.de Thu Oct 14 10:10:27 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 14 10:10:36 2004 Subject: [gutvol-d] Don't feed the troll ! [Was: Extra spaces in html files] In-Reply-To: <42.5a526037.2ea00b34@aol.com> References: <42.5a526037.2ea00b34@aol.com> Message-ID: <416EB303.8040800@perathoner.de> Bowerbird@aol.com wrote: >> If you are interested in good HTML or PDF >> you must start with a sematically tagged file >> (these days that's mostly an XML file). > > > can you give and defend your definition of "good" in this case? > > ditto with "semantically tagged file"? > > and, if you are up to the challenge, what is your recommendation > as to the route that should be taken to get a library of 14,000+ > e-texts converted to the brand of x.m.l. markup you think is best? > > (bonus points if you can convince all the other x.m.l. advocates that > the markup version you prefer is better than the ones they prefer.) > > finally, greg recently requested that people come forward with > working routines to implement an x.m.l.-master methodology. > are you able to answer that call? did you? if so, do let us know. > > -bowerbird -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Oct 14 10:35:36 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 14 10:35:54 2004 Subject: [gutvol-d] Don't feed the troll ! [Was: Extra spaces in html files] Message-ID: <128.4dabf8bd.2ea012e8@aol.com> i have a question for the person tending this listserve: is this kind of name-calling condoned on this listserve? -bowerbird From jeroen at bohol.ph Thu Oct 14 15:29:08 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Thu Oct 14 15:29:25 2004 Subject: [gutvol-d] Don't feed the troll ! 
[Was: Extra spaces in html files] In-Reply-To: <416EB303.8040800@perathoner.de> References: <42.5a526037.2ea00b34@aol.com> <416EB303.8040800@perathoner.de> Message-ID: <416EFDB4.9090708@bohol.ph> Well, let's keep the name-calling off-line and the discussion pure, and realise that XML is not a format, but a way of specifying formats (and probably all that these formats have in common is that they use angle brackets in some way), and that "semantically tagged" is an ideal that even the most ambitious attempts at a generic DTD for pre-existing texts (and that is what we are mostly dealing with in PG) have not reached, and that is either unreachable (since we can't know the original intent behind much of the formatting we encounter) or impractical (since the effort to do all this tagging is just too big, and isn't really needed by 99% of the users). In my opinion, the best attempt at such a generic beast has been the TEI effort, which is described in a massive 1400-page document and still requires customization for numerous academic projects (both are bad news; both are unavoidable given the complexity of the task) -- but which can cover 95 percent of all text with just 5 percent of that bulk in an incarnation called TEI-Lite, and that is basically all I suggest PG adopt as a standard. The nice thing about this monster is that we can add those 5 percent, and if somebody decides to add more, nothing will stop him, and he can easily return the improved version to the collection. Doing fully automatic conversion to good paged PDFs for printing nice copies (and I mean good, as distinct from workable) will probably always remain a dream, as good layout, just like good typographic design, is a skill learned through doing it a lot. Even in a highly programmable environment such as TeX, I've never been able to print something from "semantic" markup without manual intervention once in a while -- even for something as arcane as a two-column dictionary.
Similarly, doing a good HTML (as distinct from a reasonable HTML) will probably also require manual intervention and tweaking once in a while... but neither of these things cancels out the large benefits we could have from having TEI-tagged master copies in our collection, even if just at a relatively simple level of tagging (just marking headings, divisions, italics, footnotes, and tables). The task of producing nice HTML / printable versions of XML documents is further complicated by the highly verbose and somewhat unintuitive model of XSLT, which is presented as the most important tool for this task -- from the computer-science purist's point of view that might be true, but for many lesser gods, who think five lines of BASIC is already a lot, its functional programming model and verbosity are a real piss-off. Getting 14000+ texts to XML can be done, just as they were produced initially, by starting somewhere with the first one, and not stopping until we've completed them all. A very simple alternative way would be to load them in OpenOffice, apply the formatting you like and save it (OpenOffice uses XML files for everything, and collects them in zip archives. If you don't believe that, change the extension of an OpenOffice document to .zip, and have a look inside). Of course, that formatting would be very much non-"semantic". Jeroen. (Still formatting his ebooks in SGML-based TEI) Marcello Perathoner wrote: > Bowerbird@aol.com wrote: > >>> If you are interested in good HTML or PDF you must start with a >>> semantically tagged file >>> (these days that's mostly an XML file). >> >> >> >> can you give and defend your definition of "good" in this case? >> >> ditto with "semantically tagged file"? >> >> and, if you are up to the challenge, what is your recommendation >> as to the route that should be taken to get a library of 14,000+ >> e-texts converted to the brand of x.m.l. markup you think is best? >> >> (bonus points if you can convince all the other x.m.l.
advocates that >> the markup version you prefer is better than the ones they prefer.) >> >> finally, greg recently requested that people come forward with >> working routines to implement an x.m.l.-master methodology. >> are you able to answer that call? did you? if so, do let us know. >> >> -bowerbird > > > From jon at noring.name Thu Oct 14 16:45:07 2004 From: jon at noring.name (Jon Noring) Date: Thu Oct 14 16:46:12 2004 Subject: YesLogic's Prince and OpenReader (was Re: [gutvol-d] Don't feed the troll ! [Was: Extra spaces in html files]) In-Reply-To: <416EFDB4.9090708@bohol.ph> References: <42.5a526037.2ea00b34@aol.com> <416EB303.8040800@perathoner.de> <416EFDB4.9090708@bohol.ph> Message-ID: <8536547921.20041014174507@noring.name> Jeroen wrote: > The task of producing nice HTML / Printable versions of XML documents is > further complicated by the highly verbose and somewhat unintuitive model > of XSLT, which is presented as the most important tool for this task -- > from the computer scientist purist point of view that might be true, but > for many less gods, who think five lines of basic is already a lot, its > functional programming model and verbosity is a real piss-off. There is actually a fairly powerful "non-professional" alternative to the XSLT/XSL-FO approach to converting XML into PDF (or similar page-oriented layout): YesLogic's Prince product (soon to be at version 4.0 with optimized PDF output and embedded fonts -- wait until 4.0 is released in the next few days.) Prince uses the XML+CSS approach, and of course invokes the advanced CSS2 and some of the proposed CSS3 constructs. The founder of YesLogic, Michael Day, serves on the CSS Working Group of W3C, so he is quite aware of the power and limitations of CSS. Of course, there are a few knotty things that the current CSS2 cannot do, but YesLogic has added a few "custom" CSS constructs to fill in the voids, just as both Mozilla and Opera have (little known, btw). 
(I also want to add for those few here interested that the CSS parser in Prince is probably the best out there.) Now, I do agree that the absolute best outputs for print from XML sources via the XSLT/XSL-FO and Prince approaches require human intervention ("tweaking"), but the nice thing with a tool like Prince is that it gets one most of the way there, uses the slightly easier-to-use CSS, and allows for manual tweaking until the PDF is just right. Prince supports SVG and plans to add MathML support as well. They are a major supporter of the OpenReader System which I'm leading the development of: http://www.openreader.org . As an aside, for OpenReader I'm now building a supporter's/endorser's page, and any company, organization or individual willing to add their logo or name to the page, contact me in private email -- I'll send you the link to the current draft supporters page if you're interested in supporting/endorsing OpenReader. Maybe PG Foundation is interested? Greg? Michael? Btw, OpenReader plans to eventually natively support TEI-Lite (or maybe a well-defined subset of TEI or TEI-Lite) without need for conversion, including supporting constructs not supported in HTML web browsers such as inline notes and the like. Refer to the OpenReader web site for the details. Heck, we may even support ZML if it becomes popular as Bowerbird believes it will -- it'd be trivial to support ZML, actually (we'd internally convert it to XML and then present it using standardized CSS style sheets.) Jon Noring From j.hagerson at comcast.net Thu Oct 14 19:19:08 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Thu Oct 14 19:17:31 2004 Subject: [gutvol-d] I'm sorry but I don't get it... 
Message-ID: <002401c4b25d$5e1c4470$6401a8c0@enterprise> Please picture this scenario: I'm a volunteer who has scanned a public-domain book and wants to make it available through the PG distribution mechanism (free of charge, available until the Internet collapses under the weight of spam and next-generation pornography, yadda, yadda, yadda). Today, if I can convert this book to plain text (according to some stated formatting conventions), I may submit the book. If I'm ambitious, I can create an HTML version, which presents the same information, but allows "real" formatting rather than _italic_ and *bold*. In the background, however, there is this Whole New World(tm) of semantic tagging, which presumably will allow the book to make snacks and provide entertainment during the reading process. But, for me, as a volunteer, who spends a considerable amount of time working on books, but enjoys actually finishing one and seeing it posted, I can't get my arms around the benefits. Except for recognizing the acronyms, I am agnostic to XML/ZML/TEI/ABC/EIEIO. Could someone please explain the benefit of semantic tagging and why it won't horribly lengthen the amount of time required to produce an eBook? Thank you. From stephen.thomas at adelaide.edu.au Thu Oct 14 20:11:18 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Thu Oct 14 20:11:39 2004 Subject: [gutvol-d] I'm sorry but I don't get it... In-Reply-To: <002401c4b25d$5e1c4470$6401a8c0@enterprise> References: <002401c4b25d$5e1c4470$6401a8c0@enterprise> Message-ID: <416F3FD6.5080705@adelaide.edu.au> John Hagerson wrote: > ... > > Could someone please explain the benefit of semantic tagging and why it > won't horribly lengthen the amount of time required to produce an eBook? Well, I'll try: First, let me say that for many works, for the purpose of *reading* the work, it doesn't matter. (I'll probably be flamed for that, but never mind.) 
Your simple, basic novel, in which there are a great many paragraphs of text, divided into chapters with obvious headings like "CHAPTER II", doesn't really need much more than the very basic, simple HTML P tag. However, not all works are so simple. Yesterday I had cause to look at Immanuel Kant's /The Science of Right/, in which the author chose to use a great many divisions, subdivisions, sections, etc. -- all with their own headers. Since I converted this from plain text to HTML, I needed to determine from the plain text which were headings, subheadings, sub-sub-headings, etc. And unfortunately, this has required some guesswork on my part. So, one benefit of more detailed tagging would be that for such a work, it would be made obvious and explicit which were headings, and which sub-headings. In other words, the structure intended by Kant is recorded in the tagging. Another example: look at any play. You have speech, names of speakers, stage directions, headings, and divisions into Act and Scene. All of these are made explicit by the tagging. Without tagging, there may well be confusion at some point as to what is speech and what is stage direction, for example. In a plain text file, we do make some effort to distinguish different elements of a work: quotations are indented, headings in UPPER CASE and centered, etc. But any kind of complexity in the work tends quickly to make that unworkable.
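The play example can be made concrete with a small sketch in Python. It assumes a scene tagged with `head`, `stage`, `sp`, `speaker` and `p` elements (names borrowed from TEI's drama tagging purely for illustration; nothing in this thread fixes a tag set), and renders it to plain text using conventions like those just described: headings in upper case, stage directions in square brackets. The sample scene and the function names are made up.

```python
import xml.etree.ElementTree as ET

# A made-up scene; every element says what kind of thing it contains.
SCENE = """<div type="scene">
  <head>Scene I</head>
  <stage>Enter two volunteers.</stage>
  <sp><speaker>First Volunteer</speaker><p>Is this a heading or a speech?</p></sp>
  <sp><speaker>Second Volunteer</speaker><stage>Aside.</stage><p>The tags say which.</p></sp>
</div>"""


def text_of(elem):
    """Return the element's text with whitespace runs collapsed."""
    return " ".join("".join(elem.itertext()).split())


def render_plain(xml_source):
    """Render a tagged scene as plain text, one convention per element type."""
    root = ET.fromstring(xml_source)
    out = []
    for child in root:
        if child.tag == "head":
            out.append(text_of(child).upper())        # headings in UPPER CASE
        elif child.tag == "stage":
            out.append("[" + text_of(child) + "]")    # stage directions bracketed
        elif child.tag == "sp":
            for part in child:
                if part.tag == "speaker":
                    out.append(text_of(part).upper() + ".")
                elif part.tag == "stage":
                    out.append("  [" + text_of(part) + "]")
                else:
                    out.append("  " + text_of(part))  # spoken text, indented
    return "\n".join(out)


print(render_plain(SCENE))
```

Because the source says explicitly which words are stage direction and which are speech, the renderer never has to guess; going the other way, from the plain-text output back to the structure, is where the guesswork comes in.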
Regards, Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From ke at gnu.franken.de Thu Oct 14 20:13:45 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Thu Oct 14 20:31:06 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <42.5a526037.2ea00b34@aol.com> (Bowerbird@aol.com's message of "Thu, 14 Oct 2004 13:02:44 EDT") References: <42.5a526037.2ea00b34@aol.com> Message-ID: Bowerbird@aol.com writes: > can you give and defend your definition of "good" in this case? "Save as HTML" normally is not good enough. > ditto with "semantically tagged file"? Why do you ask? > and, if you are up to the challenge, what is your recommendation > as to the route that should be taken to get a library of 14,000+ > e-texts converted to the brand of x.m.l. markup you think is best? We can keep the old file unchanged for the time being. XML produced by http://www.pgdp.net/ is good enough to work with. > finally, greg recently requested that people come forward with > working routines to implement an x.m.l.-master methodology. > are you able to answer that call? did you? if so, do let us know. For converting TEI XML to HTML and PDF you can use Sebastian Rahtz' XSL stylesheets: http://www.tei-c.org/Stylesheets/teixsl.html I'm old fashioned and like playing with DSSSL tools (that's all in German and not that polished nor finished -- take it as a proof of concept): http://www.gnu.franken.de/Tieck/ http://www.gnu.franken.de/Tieck/Dokumente/Koepke/ -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From joshua at hutchinson.net Fri Oct 15 04:08:16 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 15 04:08:20 2004 Subject: [gutvol-d] I'm sorry but I don't get it... 
In-Reply-To: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
References: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
Message-ID: <416FAFA0.1030304@hutchinson.net>

Steve makes a good answer in another post, but I wanted to add my personal holy grail that hopefully a TEI-Lite master format will help bring about...

A single master document.

Right now, I create an ASCII version and then an HTML version. If I make the ASCII version first, it almost never fails that I find at least one more mistake when I then do the HTML version. I fix it there, but I have to remember it and go back to the ASCII version and make the fix there. And god forbid the fix requires another rewrap.

A master document format that is auto-converted to the others (at an acceptable level) would be wonderful and, imo, worth a little extra up-front effort to prepare it.

If someone could get a working bit of code in place, I'd be happy to start testing it like crazy and work on old texts to get them converted to that format.

Josh

John Hagerson wrote:

>Please picture this scenario:
>
>I'm a volunteer who has scanned a public-domain book and wants to make it
>available through the PG distribution mechanism (free of charge, available
>until the Internet collapses under the weight of spam and next-generation
>pornography, yadda, yadda, yadda).
>
>Today, if I can convert this book to plain text (according to some stated
>formatting conventions), I may submit the book. If I'm ambitious, I can
>create an HTML version, which presents the same information, but allows
>"real" formatting rather than _italic_ and *bold*.
>
>In the background, however, there is this Whole New World(tm) of semantic
>tagging, which presumably will allow the book to make snacks and provide
>entertainment during the reading process. But, for me, as a volunteer, who
>spends a considerable amount of time working on books, but enjoys actually
>finishing one and seeing it posted, I can't get my arms around the benefits.
> >Except for recognizing the acronyms, I am agnostic to XML/ZML/TEI/ABC/EIEIO. > >Could someone please explain the benefit of semantic tagging and why it >won't horribly lengthen the amount of time required to produce an eBook? > >Thank you. > > > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From nihil_obstat at mindspring.com Fri Oct 15 06:13:56 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Fri Oct 15 06:14:00 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <13026847.1097846036718.JavaMail.root@wamui07.slb.atl.earthlink.net> I started e-books in the old days when PG was only plain text. Then after quite a long lapse I had returned to discover that I could release a book in HTML if I wished, supplying a standard TXT along with it. I am happy with this arrangement, sometimes doing both HTML and TXT, and sometimes just TXT depending on how highly formatted the original was. I tend to work the opposite way, though, doing the HTML first (using a text editor incidentally), then stripping the code for the TXT. It is probably not the most efficient way, but hobbies are not supposed to be efficient. I am ignorant too about the acronyms you mentioned. I am also very pragmatic, and hope to remain totally ignorant of these until someone proves to me--with a history of examples--that it is worth it. TXT and HTML have such histories, so I shall stick with these for now. Regarding HTML, some thoughts. . . - Use the full range of tags when appropriate (but if possible stick with the older 3.2 tags unless necessary. I always try the simplest tool first that will do the job). There was a reply about the limitations in TXT with heading hierarchies. HTML has several levels of header tags that are meant to be used for this purpose. Other tags can be used creatively to achieve other ends. 
A list of the 3.2 tags is at http://www.htmlhelp.com/reference/wilbur/list.html (don't forget to validate, though).

- The huge benefit of HTML (besides the text formatting that you mentioned) is the ability to insert images. Some books I would never have considered working on if I could not have done an HTML.

- Don't forget to set the background color if you want a specific color (in the BODY tag, or style sheet). I have seen hundreds of pages where the writer assumes that white is always the default background color for everyone (not true) intending the graphics to blend into the background.

---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From scott_bulkmail at productarchitect.com Fri Oct 15 06:39:29 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Fri Oct 15 06:41:36 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
References: <002401c4b25d$5e1c4470$6401a8c0@enterprise>
Message-ID:

I'll take your questions in reverse order.

>why [semantic tagging]
>won't horribly lengthen the amount of time required to produce an eBook?

I think a two-part answer is important here.

1. The great news is that basic semantic tagging is roughly the same effort as HTML.
And, if PG had acceptable MASTER-to-text conversion, the overall effort would be REDUCED compared to creating BOTH text and HTML by hand. Today, creating an eText involves throwing information away, e.g. converting what is clearly multiple levels of heading into ALL CAPS -- which loses any distinction between the levels. The key to creating a MASTER is to preserve this information. Sometimes this will require a tiny bit more time (to use the correct tag or add the appropriate attribute) but often it will take less time than manually converting to ALL CAPS or whatever. And, as I've argued elsewhere, there's no need to wait for widespread agreement on any particular set of XML tags. If used consistently, it's much, much easier to convert from one XML representation to another than to convert from text to HTML. In fact, it's also fine to skip XML and just use consistent HTML with appropriate div/span tags and/or attributes on regular HTML tags. What's important is to stop throwing useful information away and instead to capture it in a way that can be processed automatically. Takeaway point: reliable MASTER-to-text conversion would increase the number of eTexts produced per unit of volunteer time investment. (And, as DP folks have argued, additional automation would streamline other stages too.) 2. There's a second level of semantic tagging that *does* require more effort: adding information that's useful but isn't represented in print. For example, perhaps we want to label every quotation with the name of the speaker. That's easy in a play, since the name is printed. That's quite a lot of work in prose since the name may or may not occur adjacent to the quote, and even when it does, could be before or after, and may be represented several ways (e.g. "Arthur", "The King", "His Majesty"). I'm actually a fan of rich semantic markup, but, to be honest, the benefits of this second level are much smaller and the effort much greater. 
In the foreseeable future, this is likely only to be done when the volunteer has a specific end use in mind. >Could someone please explain the benefit of semantic tagging Others have addressed this, but I want to summarize and add a few points. 1. A single MASTER copy from which all other versions can be generated automatically. Plain text and HTML of course, but also PDF and the various eBook formats. Just as important: more than one rendition of any particular format can be created, e.g. a set of HTML files split by chapter or even page, or PDF formatted for a particular screen size, paper size, or printing layout (e.g. as a booklet). 2. Capture information that's beyond what is generally printed, but is useful to certain audiences and/or in certain contexts. e.g. (from an earlier thread) the MASTER can capture a mistake AND the correction; or other variations. See Re[2]: [gutvol-d] Indexing Editors, etc. from Oct. 4, 2004 for details. 3. Automated processes that "add value" in some way, e.g. using a different computer voice for different characters, or creating an index by character. -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From nwolcott2 at kreative.net Fri Oct 15 07:53:39 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Oct 15 07:57:43 2004 Subject: [gutvol-d] I'm sorry but I don't get it... References: <13026847.1097846036718.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: <004801c4b2c6$ca0042e0$c99495ce@net> But PG has adopted standards which limit the range of tags and CSS you can use, so you may not be able to specify changes in background color or font, such as Alice in Wonderland. Some contributors put their HTML elsewhere, perhaps for this reason. Bad news. 
nwolcott2@post.harvard.edu
Friar Wolcott, Gutenberg Abbey, Sherwood Forrest

From nihil_obstat at mindspring.com Fri Oct 15 08:26:31 2004
From: nihil_obstat at mindspring.com (Dennis McCarthy)
Date: Fri Oct 15 08:26:35 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <28709391.1097853991576.JavaMail.root@wamui05.slb.atl.earthlink.net>

I once had to ask the whitewasher to put the background color code back in (it is only a few characters long) and he did it. He had put in a new automated header code that overwrote my BODY tag.

I see the wisdom in avoiding the FONT tag as much as possible, particularly the font face (i.e. Arial, ComicSans, etc.) seems something bound to get lost or unrecognized in future browser/OS versions. If the font face is important enough (0.01% of the time?) one could do a PDF or page scans.
Even when I am being a stickler about preserving the text format as much as possible, I only aim for it to be 99% useful for 99% of readers, rather than trying to make a perfect reprint.

Of course there is no perfect reproduction. . . I once heard a researcher talk about a man he found smelling manuscripts in a library. A conversation started where the man explained he was trying to trace diseases in European towns. A vinegar spray was apparently used at one time as an attempted disinfectant when papers where transfered between infected and uninfected areas. You shall never get a reproduced smell from microfilm, a page scan, on an e-book.

---------------------------
Dennis McCarthy
nihil_obstat@mindspring.com

From joshua at hutchinson.net Fri Oct 15 08:33:25 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 08:33:30 2004
Subject: [gutvol-d] PG TEI pages
Message-ID: <20041015153325.EB1EFEDEC5@ws6-1.us4.outblaze.com>

The recent discussion has me wanting to go back and refresh myself on what TEI options are currently available. However, the link to the online TEI converter on the PG home page seems to be dead. Is this something that was removed, or has the link just grown old and retired when no one was looking?

http://www.gutenberg.org/tei/

Josh

From Bowerbird at aol.com Fri Oct 15 10:17:16 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 15 10:17:31 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <6d.357ee9e9.2ea1601c@aol.com>

dennis said:
> - The huge benefit of HTML (besides the text formatting
> that you mentioned) is the ability to insert images.
> Some books I would never have considered working on
> if could not have done an HTML.

the ability of a web-browser to combine text and images is indeed truly wonderful. doing that -- cross-platform, 24/7, world-wide -- was one big reason the web took off.

but, just as you need to use a certain kind of viewer-app (i.e., that web-browser) to attain this inclusion of images, an intelligent viewer for e-texts can _also_ achieve this.
regular text-viewers won't do it. but specialized ones will. indeed, my viewer-program shows images when it is used to display a text-file, _provided_ that text-file includes information that tells _which_ image to display _where_. (just like a web-browser needs the img tag with that info.) amazingly, however, this obviously-relevant-and-important information is often simply _not_included_ in the text-file. indeed, the information is sometimes _stripped_from_ files! (the in-process working files from distributed proofreaders routinely contain a note regarding the presence of an image, a line that contains the caption for the image if there is one.) what is needed, so that an intelligent viewer-program can know where to place an image, and what file contains it, is some kind of indicator in the file giving that information. the indicator could be as crude as a filename, or it could be more subtle. (i'll detail this, if you would like me to do so.) all of this is just to resubmit a plea that i have made before (and will _continue_ making until i get a positive response!) for information about the name and location of graphic-files to be included in the _plain-text_ versions of the e-texts... -bowerbird From joshua at hutchinson.net Fri Oct 15 10:39:21 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 15 10:39:27 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> > all of this is just to resubmit a plea that i have made before > (and will _continue_ making until i get a positive response!) > for information about the name and location of graphic-files > to be included in the _plain-text_ versions of the e-texts... And now it is no longer a plain-text file. With the added penalty of having no existing validators to make sure that the markup used is done correctly. Basically, you're reinventing the wheel for no purpose here. 
Josh From joel at oneporpoise.com Fri Oct 15 11:05:22 2004 From: joel at oneporpoise.com (Joel A. Erickson) Date: Fri Oct 15 11:05:16 2004 Subject: [gutvol-d] I'm sorry but I don't get it... References: <28709391.1097853991576.JavaMail.root@wamui05.slb.atl.earthlink.net> Message-ID: <001d01c4b2e1$8a48a960$6501a8c0@JOEL> From: "Dennis McCarthy": > I once heard a researcher talk about a man he found smelling manuscripts > in a library. A conversation started where the man explained he was > trying to trace diseases in European towns. A vinegar spray was > apparently used at one time as an attempted disinfectant when papers where > transfered between infected and uninfected areas. You shall never get a > reproduced smell from microfilm, a page scan, on an e-book. Is that "on an e-book" intended to be "or an e-book." If so, I'm not so sure about never being able to reproduce the smell. Not that I'm particularly keen on smelling books, but I've heard of working prototypes of scent devices activated digitally. From Bowerbird at aol.com Fri Oct 15 11:20:11 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 15 11:20:38 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <13c.3d465e9.2ea16edb@aol.com> john said: > for me, as a volunteer, who spends > a considerable amount of time working on books, > but enjoys actually finishing one and seeing it posted, > I can't get my arms around the benefits. ... > Could someone please explain the benefit of semantic tagging > and why it won't horribly lengthen the amount of time required > to produce an eBook? first of all, thank you for asking your questions. i look forward to hearing some answers to them. and thank you for your history of doing e-texts for project gutenberg. it's important to retain the volunteers who have been working all along... i wanted to make a point about one thing you said... 
> If I'm ambitious, I can create an HTML version, > which presents the same information, but allows > "real" formatting rather than _italic_ and *bold*. actually, if you take a look at that "real" formatting in the html-source, you'll see it's plain-ascii, namely: [i]italic[/i] and [b]bold[/b] or -- if you prefer -- [em]emphasis[/em] and [strong]strong[/strong] except, of course, using angle-brackets instead of the square ones that i used so the brackets wouldn't get swallowed up or interpreted. but yes, of course, i know what you _meant_, which is that when the e-text is _displayed_, the _viewer-program_ converts that "markup" appropriately, into "real" italics and real bold, even though there were no italics or bold in the source, just the _tags_ that indicated that styling was present. that is, you need to use the appropriate "user agent" (to use the markup-geek terminology now in favor) that knows how to interpret the markup and render it. however, it's not that difficult to write a viewer-app that can take the plain-text file as input and render any words surrounded with _underscores_ as italics, and any words surrounded with *asterisks* as bold. it's just a different "user agent" interpreting the different markup, and rendering it as called for... i say that based on experience. i've written such an app. and indeed, it's not that difficult to write a converter that will change the underscore _form_of_italics_ into the other [i]form of italics[/i] that uses brackets. it's rather easy to see they are functionally equivalent. the difference between the two forms in the _raw_ file is the underscore form _enhances_ the user's comprehension, while the bracket form [em]obscures[/em] it, and badly... rather than creating 14,000+ new files, with all the work that entails, we can achieve the same end by distributing _one_ viewer-program that utilizes the existing e-texts... 
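The functional equivalence claimed above can be illustrated with a small converter. This is only a sketch under stated assumptions, not the viewer-program mentioned in the message: it assumes that _underscores_ mark italics and *asterisks* mark bold, that a marked run stays on one line, and that a marker only counts when it sits at a word boundary. The function name `plain_to_html` and the two patterns are hypothetical.

```python
import html
import re

# Assumed convention: _words_ are italic, *words* are bold.
# A marker must not touch a word character on its outer side and must
# hug non-space text on its inner side, so snake_case and "2 * 3" are left alone.
ITALIC = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")
BOLD = re.compile(r"(?<!\w)\*(?=\S)(.+?)(?<=\S)\*(?!\w)")


def plain_to_html(text):
    """Escape plain text for HTML, then turn _x_ into <i>x</i> and *x* into <b>x</b>."""
    escaped = html.escape(text, quote=False)   # protect &, < and > first
    escaped = ITALIC.sub(r"<i>\1</i>", escaped)
    escaped = BOLD.sub(r"<b>\1</b>", escaped)
    return escaped


if __name__ == "__main__":
    print(plain_to_html('"real" formatting rather than _italic_ and *bold*'))
```

A sketch like this also shows where the difficulty lies: underscores inside a marked phrase, markers that span a line break, and literal asterisks all need rules of their own, and those rules are exactly what a validator for the plain-text convention would have to check.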
-bowerbird From Bowerbird at aol.com Fri Oct 15 11:27:56 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 15 11:28:10 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <1e.35f0956e.2ea170ac@aol.com> joshua said: > I wanted to add my personal holy grail that > hopefully a TEI-Lite master format will help bring about... > A single master document. i've detailed this on another reply i am writing, which is still in progress, but i'll headline it here: a methodology of a "master document" is worthwhile, but there's no reason that master _must_ be in t.e.i. any form of markup that captures all the information that's deemed necessary can serve capably as a master. > A master document format that is auto-converted to the others > (at an acceptable level) would be wonderful and, imo, > worth a little extra up front effort to prepare it. well, yes, it would be wonderful. and worth the effort. and if it didn't take much extra effort, but instead was intuitive even to untrained volunteers, that would be _really_ special... > If someone could get a working bit of code in place, > I'd be happy to start testing it like crazy and > work on old texts to get it converted to that format. i've already got a "bit of code in place" using z.m.l. -- a.k.a. zen markup language, a.k.a. zero markup language -- but i doubt you're interested in testing it _at_all_, let alone "like crazy", which is quite alright with me, thank you very much, as i don't need _your_ help... nonetheless, the beta-test is open to all, by e-mailing: zml_talk-subscribe@yahoogroups.com -bowerbird From joshua at hutchinson.net Fri Oct 15 11:42:03 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 15 11:42:10 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <20041015184203.5F882EDED2@ws6-1.us4.outblaze.com> > i've already got a "bit of code in place" using z.m.l. -- > a.k.a. zen markup language, a.k.a. 
zero markup language
> -- but i doubt you're interested in testing it _at_all_,
> let alone "like crazy", which is quite alright with me,
> thank you very much, as i don't need _your_ help...
>
> nonetheless, the beta-test is open to all, by e-mailing:
> zml_talk-subscribe@yahoogroups.com
>

If you *ever* actually release any code that people can test, I *will* test it "like crazy." And if, all indications to the contrary, you've actually produced something useful, I'll be the first person to eat crow on the public boards.

Remember, though, it has to be able to convert from a "master" format to other formats easily and automatically... otherwise, you're just reinventing HTML in your own image.

Josh

From gbnewby at pglaf.org Fri Oct 15 11:51:18 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Oct 15 11:51:19 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <001d01c4b2e1$8a48a960$6501a8c0@JOEL>
References: <28709391.1097853991576.JavaMail.root@wamui05.slb.atl.earthlink.net> <001d01c4b2e1$8a48a960$6501a8c0@JOEL>
Message-ID: <20041015185118.GB16361@pglaf.org>

On Fri, Oct 15, 2004 at 11:05:22AM -0700, Joel A. Erickson wrote:
> From: "Dennis McCarthy":
> >I once heard a researcher talk about a man he found smelling manuscripts
> >in a library. A conversation started where the man explained he was
> >trying to trace diseases in European towns. A vinegar spray was
> >apparently used at one time as an attempted disinfectant when papers were
> >transferred between infected and uninfected areas. You shall never get a
> >reproduced smell from microfilm, a page scan, on an e-book.
> >
> Is that "on an e-book" intended to be "or an e-book." If so, I'm not so
> sure about never being able to reproduce the smell. Not that I'm
> particularly keen on smelling books, but I've heard of working prototypes
> of scent devices activated digitally.

I do not know if there is an online source for this story, but have seen a printed copy and believe it is legitimate.
The story is that old books can develop mold or fungus. Sometimes this can be very light, and it might be between the pages (not just on the cover). Any preservation librarian can verify this fact. The interesting part is that the molds or fungi (or spores) have demonstrated psychoactive properties. In short, sniffing old books can get you high and/or cause hallucination. -- Greg From joshua at hutchinson.net Fri Oct 15 12:34:31 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 15 12:34:40 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <20041015193431.A293D109A28@ws6-4.us4.outblaze.com> > > The interesting part is that the molds or fungi (or spores) > have demonstrated psychoactive properties. In short, > sniffing old books can get you high and/or cause hallucination. > -- Greg > This explains an awful lot about some of my old English professors ... From nihil_obstat at mindspring.com Fri Oct 15 12:55:00 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Fri Oct 15 12:55:10 2004 Subject: [gutvol-d] Sniffing Books Message-ID: <6187538.1097870101072.JavaMail.root@wamui04.slb.atl.earthlink.net> 1) Correction, read "or an e-book" for "on an e-book." 2) I am pretty sure I heard this on public radio while driving home two or more years ago. The specific topic was second thoughts on an initiative by some libraries to put books on microfilm, then purge the originals from their collection (as a space saving project). Could not tell you the name of the speaker, but he had been an advocate of purging hard copies, and the vinegar incident helped him rethink it. 3) The vinegar idea sounds strange enough that the researcher may have thought it up after sniffing enough mold spores, and actually ended up believing it. -----Original Message----- From: Greg Newby Sent: Oct 15, 2004 2:51 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] I'm sorry but I don't get it... 
On Fri, Oct 15, 2004 at 11:05:22AM -0700, Joel A. Erickson wrote: > From: "Dennis McCarthy": > >I once heard a researcher talk about a man he found smelling manuscripts > >in a library. A conversation started where the man explained he was > >trying to trace diseases in European towns. A vinegar spray was > >apparently used at one time as an attempted disinfectant when papers where > >transfered between infected and uninfected areas. You shall never get a > >reproduced smell from microfilm, a page scan, on an e-book. > > > Is that "on an e-book" intended to be "or an e-book." If so, I'm not so > sure about never being able to reproduce the smell. Not that I'm > particularly keen on smelling books, but I've heard of working prototypes > of scent devices activated digitally. I do not know if there is an online source for this story, but have seen a printed copy and believe it is legimate. The story is that old books can develop mold or fungus. Sometimes this can be very light, and it might be between the pages (not just on the cover). Any preservation librarian can verify this fact. The interesting part is that the molds or fungi (or spores) have demonstrated psychoactive properties. In short, sniffing old books can get you high and/or cause hallucination. -- Greg _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d --------------------------- Dennis McCarthy nihil_obstat@mindspring.com From colc at gutenberg.net.au Fri Oct 15 15:02:18 2004 From: colc at gutenberg.net.au (Col Choat) Date: Fri Oct 15 15:04:16 2004 Subject: [gutvol-d] Sniffing Books In-Reply-To: <6187538.1097870101072.JavaMail.root@wamui04.slb.atl.earthlink.net> Message-ID: Don't get too hung up on this one, as I am working on a "virtual aroma emitter". 
You rub your left ear in a certain way as you read the e-book and can then bring forth mould, vinegar, coffee, new-mown grass or whatever is required by that page to enhance your reading experience. I just have a few technical hitches to overcome. Version 2 will emit the smells without rubbing your ear. It will recognise words like coffee, grass, perfume, roast beef, etc. We will need a black list and a white list, of course. Most of us don't want to experience the actual smells as we are reading about running around the sewers below the streets of Paris. Col Choat -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Dennis McCarthy Sent: Saturday, 16 October 2004 5:55 AM To: Project Gutenberg Volunteer Discussion; Project Gutenberg Volunteer Discussion Subject: [gutvol-d] Sniffing Books 1) Correction, read "or an e-book" for "on an e-book." 2) I am pretty sure I heard this on public radio while driving home two or more years ago. The specific topic was second thoughts on an initiative by some libraries to put books on microfilm, then purge the originals from their collection (as a space saving project). Could not tell you the name of the speaker, but he had been an advocate of purging hard copies, and the vinegar incident helped him rethink it. 3) The vinegar idea sounds strange enough that the researcher may have thought it up after sniffing enough mold spores, and actually ended up believing it. -----Original Message----- From: Greg Newby Sent: Oct 15, 2004 2:51 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] I'm sorry but I don't get it... On Fri, Oct 15, 2004 at 11:05:22AM -0700, Joel A. Erickson wrote: > From: "Dennis McCarthy": > >I once heard a researcher talk about a man he found smelling manuscripts > >in a library. A conversation started where the man explained he was > >trying to trace diseases in European towns. 
A vinegar spray was > >apparently used at one time as an attempted disinfectant when papers where > >transfered between infected and uninfected areas. You shall never get a > >reproduced smell from microfilm, a page scan, on an e-book. > > > Is that "on an e-book" intended to be "or an e-book." If so, I'm not so > sure about never being able to reproduce the smell. Not that I'm > particularly keen on smelling books, but I've heard of working prototypes > of scent devices activated digitally. I do not know if there is an online source for this story, but have seen a printed copy and believe it is legimate. The story is that old books can develop mold or fungus. Sometimes this can be very light, and it might be between the pages (not just on the cover). Any preservation librarian can verify this fact. The interesting part is that the molds or fungi (or spores) have demonstrated psychoactive properties. In short, sniffing old books can get you high and/or cause hallucination. -- Greg _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d --------------------------- Dennis McCarthy nihil_obstat@mindspring.com _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From ian at babcockbrown.com Fri Oct 15 15:28:57 2004 From: ian at babcockbrown.com (Ian Stoba) Date: Fri Oct 15 15:27:02 2004 Subject: [gutvol-d] Sniffing Books In-Reply-To: References: Message-ID: <9A993116-1EF9-11D9-A9B2-003065D6440E@babcockbrown.com> Isn't there a way we could incorporate this into the XML markup? Given the recent discussions on this list, this option might appeal to both groups: 1. Those who are attracted to markup: 2. Those who think XML smells like.... Sorry, I couldn't help it. --Ian On Oct 15, 2004, at 3:02 PM, Col Choat wrote: > Don't get too hung up on this one, as I am working on a "virtual aroma > emitter". 
You rub your left ear in a certain way as you read the > e-book and > can then bring forth mould, vinegar, coffee, new-mown grass or > whatever is > required by that page to enhance your reading experience. I just have > a few > technical hitches to overcome. Version 2 will emit the smells without > rubbing your ear. It will recognise words like coffee, grass, perfume, > roast > beef, etc. We will need a black list and a white list, of course. Most > of us > don't want to experience the actual smells as we are reading about > running > around the sewers below the streets of Paris. > From gbnewby at pglaf.org Fri Oct 15 16:28:20 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Oct 15 16:28:21 2004 Subject: [gutvol-d] I'm sorry but I don't get it... In-Reply-To: <004801c4b2c6$ca0042e0$c99495ce@net> References: <13026847.1097846036718.JavaMail.root@wamui07.slb.atl.earthlink.net> <004801c4b2c6$ca0042e0$c99495ce@net> Message-ID: <20041015232820.GC22068@pglaf.org> On Fri, Oct 15, 2004 at 10:53:39AM -0400, Norm Wolcott wrote: > But PG has adopted standards which limit the range of tags and CSS you can > use, so you may not be able to specify changes in background color or font, > such as Alice in Wonderland. Some contributors put their HTML elsewhere, > perhaps for this reason. Bad news. A slight correction: it's true that if you submit HTML files, it's likely for CSS, bgcolors and other stuff to be stripped out. Part of this is our automated "add a header" programs. Part is a desire to let the HTML be fairly generic. But if you have an eBook that you'd really like to be displayed with particular colors, fonts, etc., just ask. The only real "standard" is that we strongly desire valid HTML (per http://validator.w3.org). The rest is processing, programs and procedures, which might have the same impact as a standard sometimes, but should not be mistaken for one. As MH likes to say, we're pretty well willing to try almost anything, at least in small quantities. 
Just ask. -- Greg From Bowerbird at aol.com Fri Oct 15 17:13:17 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 15 17:13:36 2004 Subject: [gutvol-d] responses in the hopper Message-ID: <155.40f4f729.2ea1c19d@aol.com> i've got responses in the hopper to jeroen, jon, stephen, karl, and scott. (being a troll is hard work, but somebody's gotta do it; and god gave me the looks.) ;+) but i'll save all those until monday, give everyone some time to reflect, rest, maybe write their own post... but my bottom-line summary is this: 1) z.m.l. is a fantastic format for users, in the hands of an intelligent viewer-app; 2) z.m.l. can be a great master-format, as it's easy to create and maintain; and 3) though z.m.l. will create other formats, people will prefer z.m.l., due to the viewer. doubt it? then join the beta-test, and tear my little baby to shreds... zml_talk-subscribe@yahoogroups.com oh, i wrote a reply to josh too. that one i'll save until tuesday... or wednesday... or next month... :+) just one thing before i go, so i can give stephen the weekend to get a head-start... stephen said: > In a plain text file, we do make some effort to > distinguish different elements of a work: > quotations are indented, headings in UPPER CASE and > centered, etc. But any kind of complexity in the work > tends quickly to make that unworkable. my findings are that you are _incorrect_ in that assessment. i don't believe you can show me many e-texts from the library that i cannot format unequivocally using zen markup language. the figure i usually give is 3%, which now is 420+ e-texts, but i'll be surprised if the number you can find gets that high. frankly, i don't think you'll be able to find more than a few... but you are welcome to try... dig up a list (of 20-40?) e-texts from the library that you think can not be handled with my z.m.l., and i'll take a look at them and see if you're right... give it your best shot... 
(if anyone wants to help stephen out with some pointers to
some particularly difficult e-texts, send him a backchannel!)

***

have a nice weekend, everyone!

-bowerbird

p.s. i'm still wondering if name-calling
and personal attacks are condoned here...

From Gutenberg9443 at aol.com Fri Oct 15 17:42:47 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Fri Oct 15 17:43:06 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID:

In a message dated 10/15/2004 12:51:30 PM Mountain Standard Time, gbnewby@pglaf.org writes:

> In short, sniffing old books can get you high and/or cause hallucination.

And if you have asthma . . . I'm trying to work (my own, not PGLAF's) on copying some 1930s translations of Ancient Egyptian medical textbooks. I need them for a book I'm writing. Egad! They are killing me!

But just in case anyone wonders, the Egyptians by about 2500 to 3000 BCE had medicine at a height it wouldn't reach again until the late 19th-early 20th centuries. Imhotep was both the architect for the Great Pyramid AND the author of the first surgical textbook known to have existed. He got a lot of practice by studying people who had been injured at the construction site, but he clearly also accompanied an army into battle at least once, because he also describes battlefield injuries.

I would scan them for PGLAF before sending them back to the universities they came from, but my scanner program has indigestion. It's glad to photocopy, but if I ask it to scan and save it lies down and turns up its little curly toes, insisting it HAS saved when it patently has not. Also, one of the books is somewhat bigger than my scanner.

Anne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041015/d1537f4d/attachment.html From Gutenberg9443 at aol.com Fri Oct 15 17:47:22 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Oct 15 17:47:39 2004 Subject: [gutvol-d] Sniffing Books Message-ID: <1d8.2d75eafe.2ea1c99a@aol.com> In a message dated 10/15/2004 4:04:24 PM Mountain Standard Time, colc@gutenberg.net.au writes: >>Don't get too hung up on this one, as I am working on a >>"virtual aroma >>emitter". You rub your left ear in a certain way as you >>read the e-book and >>can then bring forth mould, vinegar, coffee, new-mown >>grass or whatever is >>required by that page to enhance your reading >>experience. I just have a few >>technical hitches to overcome. Version 2 will emit the >>smells without >>rubbing your ear. It will recognise words like coffee, >>grass, perfume, roast >>beef, etc. We will need a black list and a white list, of >>course. Most of us >>don't want to experience the actual smells as we are >>reading about running >>around the sewers below the streets of Paris. Aha! So YOU'RE the one who has been sneaking into Snapes's study abducting his potion ingredients! Doggone it, you KNOW Harry and Ron and Hermione got in trouble over it! Apologize and admit your guilt. (Gad. That sounds like a line from THE LAST EMPEROR or TO LIVE, doesn't it!) Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041015/3bcaedf4/attachment-0001.html From Gutenberg9443 at aol.com Fri Oct 15 17:49:52 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Oct 15 17:50:09 2004 Subject: [gutvol-d] responses in the hopper Message-ID: In a message dated 10/15/2004 6:13:45 PM Mountain Standard Time, Bowerbird@aol.com writes: >>p.s. i'm still wondering if name-calling >>and personal attacks are condoned here... Only if done with a wink. 
Like this-- ;-)

Anne

From foundation3 at softhome.net Fri Oct 15 18:17:33 2004
From: foundation3 at softhome.net (Craig Morehouse)
Date: Fri Oct 15 18:18:19 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To: <155.40f4f729.2ea1c19d@aol.com>
References: <155.40f4f729.2ea1c19d@aol.com>
Message-ID: <1097889453.2641.2.camel@localhost>

On Fri, 2004-10-15 at 20:13, Bowerbird@aol.com wrote:
[snip]
>
> have a nice weekend, everyone!
>

Thanks. It should be fun. No hurricanes on the Mid-Florida agenda this week.

> -bowerbird
>
> p.s. i'm still wondering if name-calling
> and personal attacks are condoned here...
>

No, they're not, you poofing friggleschnitz! ;-)

--
The fact that no one understands you doesn't mean you're an artist.

From shalesller at writeme.com Fri Oct 15 16:24:52 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Oct 15 19:14:08 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
Message-ID: <20041015232452.18E1D4BDAB@ws1-1.us4.outblaze.com>

> and indeed, it's not that difficult to write a converter
> that will change the underscore _form_of_italics_
> into the other [i]form of italics[/i] that uses brackets.
> it's rather easy to see they are functionally equivalent.

Which is odd, because they aren't. How do you convert _th_operat_er_ to brackets? Is it [i]th[/i] operat [i]er[/i], or [i]th[/i]operat[i]er[/i] (which would be unsurprising in some of the Middle English editions I've been scanning)?

Once again, you're going to blow this off as irrelevant, aren't you.
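
The ambiguity can be made concrete with a small sketch. The two functions below are hypothetical converters written for this example, not code from z.m.l. or from any PG tool. Each is a defensible reading of the underscore convention, yet they disagree on the same input, which is the point being made above.

```python
import re


def pairwise(text):
    """Reading 1: every _..._ pair marks its own italic run."""
    return re.sub(r"_([^_\s]+)_", r"[i]\1[/i]", text)


def phrase(text):
    """Reading 2: one italic phrase whose inner underscores are spaces."""
    return re.sub(
        r"_(\S+)_",
        lambda m: "[i]" + m.group(1).replace("_", " ") + "[/i]",
        text,
    )


if __name__ == "__main__":
    for sample in ("_th_operat_er_", "_form_of_italics_"):
        print(sample, "->", pairwise(sample), "|", phrase(sample))
```

Under the first reading the Middle English case comes out as partial-word italics, but _form_of_italics_ loses its spaces; under the second reading the opposite happens. A converter has to pick one rule or consult extra information that the plain text does not carry.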
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From stephen.thomas at adelaide.edu.au Fri Oct 15 20:16:15 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Fri Oct 15 20:16:37 2004 Subject: [gutvol-d] I'm sorry but I don't get it... In-Reply-To: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> Message-ID: <4170927F.8040902@adelaide.edu.au> Actually, I took bowerbird's plea to mean simply that he wanted *some* indication in a plain text version of where images appeared, and which image was used. E.g.: text text [image: xyz.gif] text text This would not seem to be too much to ask, and I think Lynx will do this if you use the -dump option to save HTML as plain text. Incidentally, at what point did the world decide to indicate italics by placing underscores before and after text? The canonical usage used to be to use a forward slash (solidus), / to indicate italics, and underscores to indicate underlining. (And * to indicate bold.) Steve Joshua Hutchinson wrote: > > >>all of this is just to resubmit a plea that i have made before >>(and will _continue_ making until i get a positive response!) >>for information about the name and location of graphic-files >>to be included in the _plain-text_ versions of the e-texts... > > > And now it is no longer a plain-text file. With the added penalty of having no existing validators to make sure that the markup used is done correctly. > > Basically, you're reinventing the wheel for no purpose here. 
>
> Josh

--
Stephen Thomas, Senior Systems Analyst, Adelaide University Library
ADELAIDE UNIVERSITY SA 5005 AUSTRALIA
Tel: +61 8 8303 5190 Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From joshua at hutchinson.net Fri Oct 15 20:32:42 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 15 20:32:35 2004
Subject: [gutvol-d] I'm sorry but I don't get it...
In-Reply-To: <4170927F.8040902@adelaide.edu.au>
References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> <4170927F.8040902@adelaide.edu.au>
Message-ID: <4170965A.5020706@hutchinson.net>

Steve Thomas wrote:

> Actually, I took bowerbird's plea to mean simply that he wanted *some*
> indication in a plain text version of where images appeared, and which
> image was used. E.g.:
>
> text
> text
>
> [image: xyz.gif]
>
> text
> text
>
> This would not seem to be too much to ask, and I think Lynx will do
> this if you use the -dump option to save HTML as plain text.
>

Sure, it isn't hard. But neither is HTML markup for the same thing.

[img src="xyz.gif"]

And this has the added benefit of being parse-able (is that a word?) by a whole slew of already existing validators, link-checkers, editors, etc.

It isn't that what bowerbird proposes is impossible to do. It's that it's already been done, and arguably better than he'll do it, simply because there are YEARS of development and more people than I can count behind HTML. ZML has bowerbird and nothing (so far) to show for it.

The big problem with how bowerbird defines plain text (basically, ASCII letters with some markup for italics, bold, images, etc) is that the definition also fits HTML/XML. But he claims that HTML/XML don't qualify as plain text. What exactly is different about them? Complexity? Hardly. HTML can be QUITE simple.
If all you want is to mark italics, bold, and images, HTML is ridiculously simple. Josh From shalesller at writeme.com Fri Oct 15 19:51:53 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Oct 15 20:51:54 2004 Subject: [gutvol-d] responses in the hopper Message-ID: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com> Bowerbird@aol.com writes: > 3) though z.m.l. will create other formats, > people will prefer z.m.l., due to the viewer. Only those people who will install a viewer to read Project Gutenberg books, probably a small percentage of those who visit Project Gutenberg. > doubt it? then join the beta-test, > and tear my little baby to shreds... You ignore my critiques, so I'd rather not waste my time writing them. Furthermore, I run Un*x, not Windows. > i don't believe you can show me many e-texts from the library > that i cannot format unequivocally using zen markup language. > the figure i usually give is 3%, which now is 420+ e-texts, > but i'll be surprised if the number you can find gets that high. > frankly, i don't think you'll be able to find more than a few... Last time I checked, you only supported ASCII, so we can toss all our non-English texts in there. "Selections from the Writings of Lord Dunsany" provides a great example of an English book where you can't just magically add the accents in from a list of accented words, because the accents are in the character's names, plus anything that has Greek, or anything discussing Eastern Europe, or translations of the Sanskrit holy works, etc. Another fact is that Project Gutenberg's books were only ASCII plain text for a long time, and still are for a large part. Thus people doing work for PG did books that could be done well in ASCII plain text. Of course, you can handle 97% today, but DP does more and more stuff that isn't just novels with purely linear text. And last, I still object to this standard. We wouldn't build a building that 3% of the people couldn't enter. 
We shouldn't even consider standardizing on a system that can't handle 3% of the books we do. 420 books is a lot of books, and even if it stays three percent, it will only get larger. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Bowerbird at aol.com Fri Oct 15 21:43:38 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 15 21:43:59 2004 Subject: [gutvol-d] responses in the hopper Message-ID: anne said: > Only if done with a wink. Like this-- ;-) wonderful! ;+) -bowerbird From hacker at gnu-designs.com Fri Oct 15 21:52:35 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Fri Oct 15 21:53:35 2004 Subject: [gutvol-d] responses in the hopper In-Reply-To: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com> References: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com> Message-ID: >> 3) though z.m.l. will create other formats, people will prefer z.m.l., >> due to the viewer. > Only those people who will install a viewer to read Project Gutenberg > books, probably a small percentage of those who visit Project Gutenberg. I've been lurking, but I'm a long-time contributor to these and similar efforts. I really don't think any of this bickering is very productive. Can we all just begin turning our attention to a common goal, instead of arguing about who has the better Acme Widget this week? I've got some comments to lend to the discussion, but I've refrained, because I see how some of the seemingly-neutral comments are taken, and responded to. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From stephen.thomas at adelaide.edu.au Fri Oct 15 21:53:59 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Fri Oct 15 21:54:20 2004 Subject: [gutvol-d] I'm sorry but I don't get it... 
In-Reply-To: <4170965A.5020706@hutchinson.net> References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> <4170927F.8040902@adelaide.edu.au> <4170965A.5020706@hutchinson.net> Message-ID: <4170A967.70507@adelaide.edu.au> Josh, I can't argue with you -- I've spent years marking up plain text into HTML, because I believe that HTML provides a superior ebook to plain text. (Others may feel free to disagree -- just don't tell me about it.) But PG seems wedded to the idea that there must always be a plain text version, and if we're going to create a plain text from an HTML with images, then where's the problem with retaining at least the location of the images in the plain text? Steve Joshua Hutchinson wrote: > Steve Thomas wrote: > >> Actually, I took bowerbird's plea to mean simply that he wanted *some* >> indication in a plain text version of where images appeared, and which >> image was used. E.g.: >> >> text >> text >> >> [image: xyz.gif] >> >> text >> text >> >> This would not seem to be too much to ask, and I think Lynx will do >> this if you use the -dump option to save HTML as plain text. >> > Sure, it isn't hard. But neither is HTML markup for the same thing. > > [img src="xyz.gif"] > > And this has the added benefit of being parse-able (is that a word?) by > a whole slew of already existing validators, link-checkers, editors, etc. > > It isn't that what bowerbird proposes is impossible to do. It's that > it's already been done, and arguably better than he'll do it, simply > because there are YEARS of development and more people than I can count > behind HTML. ZML has bowerbird and nothing (so far) to show for it. > > The bi problem with how bowerbird defines plain text (basically, ASCII > letters with some markup for italics, bold, images, etc) also fits > HTML/XML. But he claims that HTML/XML don't qualify as plain text. > What exactly is different about them? Complexity? Hardly. HTML can be > QUITE simple. 
If all you want is to mark italics, bold, and images, > HTML is ridiculously simple. > > Josh > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From traverso at dm.unipi.it Fri Oct 15 22:15:37 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Oct 15 22:15:59 2004 Subject: [gutvol-d] Fw: [PG-EU] Daisy and Gutenberg References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> <4170927F.8040902@adelaide.edu.au> <4170965A.5020706@hutchinson.net> <4170A967.70507@adelaide.edu.au> Message-ID: <200410160515.i9G5FbF7004572@posso.dm.unipi.it> I forward this message, appeared on PG-EU: Carlo ------------------------------------------------------------------------ From: "Branko Collin" To: pg-eu@vrijschrift.org Priority: normal X-Virus-Scanned: by XS4ALL Virus Scanner Subject: [PG-EU] Daisy and Gutenberg Sender: pg-eu-admin@vrijschrift.org Reply-To: pg-eu@vrijschrift.org Date: Sat, 16 Oct 2004 01:09:34 +0200 X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on posso.dm.unipi.it X-Spam-Level: X-Spam-Status: No, hits=-4.8 required=2.5 tests=AWL,BAYES_00 autolearn=ham version=2.64 The following is more of a gutvol-d subject, but I am no longer subscribed there, so I'll post it here. Vrijschrift was kind enough to get a seat reserved for a Project Gutenberg volunteer at the Symposium for Alternative Models for Copyright, and Wiebe gave me that seat. At the symposium, drs. Maarten Verboom of FNB (, subtitle: "literature and information for people with a reading disability") gave a talk, which I did not get to hear, because I had to go and visit a prospective customer. However, afterwards I did chat with his colleague, Arne Leeman. 
I asked him whether they knew of Project Gutenberg ("yes, definitely"), whether they are using our texts ("Yes", though not many), and whether there are things we could do to make things easier for them. Actually, there is, and that is publishing books in Daisy (an XML application specifically geared to speaking books). I told him that that may not be the XML standard we will be ending up with, to which he replied that any XML would be better than plain vanilla text, especially richer mark-up, because it could make texts easier to convert to Daisy. I also invited FNB to send us requests for public domain books they like to see digitized. (I seem to remember that there are content providers at DP who take requests.) -- branko collin collin@xs4all.nl _______________________________________________ PG-EU mailing list PG-EU@vrijschrift.org http://mailman.vrijschrift.nl/listinfo/pg-eu From Bowerbird at aol.com Fri Oct 15 22:37:09 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 15 22:37:33 2004 Subject: [gutvol-d] I'm sorry but I don't get it... Message-ID: <1dc.2e23ae00.2ea20d85@aol.com> david said: > Once again, you're going to blow this off as irrelevant, aren't you. no, i'm going to tell you to join the beta-testing listserve, where questions such as this can be raised and answered... :+) have a nice weekend, mr. starner... ;+) -bowerbird From tb at baechler.net Fri Oct 15 23:07:30 2004 From: tb at baechler.net (Tony Baechler) Date: Fri Oct 15 23:06:43 2004 Subject: [gutvol-d] Fw: [PG-EU] Daisy and Gutenberg In-Reply-To: <200410160515.i9G5FbF7004572@posso.dm.unipi.it> References: <20041015173921.527C3EDBEB@ws6-1.us4.outblaze.com> <4170927F.8040902@adelaide.edu.au> <4170965A.5020706@hutchinson.net> <4170A967.70507@adelaide.edu.au> Message-ID: <5.2.0.9.0.20041015230622.01fbb910@snoopy2.trkhosting.com> Hello. You can find more information about DAISY including conversion tools below. 
I think that the HTML books could be converted fairly easily, since they already include headings.

http://www.daisy.org/

I think there is a free converter, but I'm not sure. I know most of them are commercial.

From colc at gutenberg.net.au Sat Oct 16 00:44:13 2004
From: colc at gutenberg.net.au (Col Choat)
Date: Sat Oct 16 00:46:19 2004
Subject: [gutvol-d] Sniffing Books
In-Reply-To: <1d8.2d75eafe.2ea1c99a@aol.com>
Message-ID:

Wow, you guys really have helped me overcome the glitches. I knew that I needed two magic words to jolt the emitter into life and, while I certainly wasn't the one abducting Snape's ingredients, once Harry's name was brought up I KNEW that one of the words just HAD to be 'Expelliarmus!'. I would never have guessed that 'XML' was the other word, but it was.

I am sitting here now, reading the recently posted 'Thrilling Stories Of The Ocean', by Marmaduke Park, and the smell of the sea is wafting about me. It looks like XML IS a magic bullet after all. I just wonder if it can help me to create a gentle breeze to flutter the curtains a little. Or would that be asking too much?

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Gutenberg9443@aol.com
Sent: Saturday, 16 October 2004 10:47 AM
To: gutvol-d@lists.pglaf.org
Subject: Re: [gutvol-d] Sniffing Books

In a message dated 10/15/2004 4:04:24 PM Mountain Standard Time, colc@gutenberg.net.au writes:

>>Don't get too hung up on this one, as I am working on a "virtual aroma
>>emitter". You rub your left ear in a certain way as you read the e-book and
>>can then bring forth mould, vinegar, coffee, new-mown grass or whatever is
>>required by that page to enhance your reading experience. I just have a few
>>technical hitches to overcome. Version 2 will emit the smells without
>>rubbing your ear. It will recognise words like coffee, grass, perfume, roast
>>beef, etc. We will need a black list and a white list, of course.
>>Most of us don't want to experience the actual smells as we are reading
>>about running around the sewers below the streets of Paris.

Aha! So YOU'RE the one who has been sneaking into Snape's study abducting his potion ingredients! Doggone it, you KNOW Harry and Ron and Hermione got in trouble over it! Apologize and admit your guilt.

(Gad. That sounds like a line from THE LAST EMPEROR or TO LIVE, doesn't it!)

Anne

From Bowerbird at aol.com Sat Oct 16 01:14:39 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Oct 16 01:15:05 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <1db.2ce31fcb.2ea2326f@aol.com>

hacker@gnu-designs.com
> I've been lurking, but I'm a long-time contributor
> to these and similar efforts.

are you the "hacker" who did the 9/11 report, and was talking about doing wikipedia too?

-bowerbird

From shalesller at writeme.com Sat Oct 16 12:42:13 2004
From: shalesller at writeme.com (D. Starner)
Date: Sat Oct 16 12:42:22 2004
Subject: [gutvol-d] responses in the hopper
Message-ID: <20041016194213.A60C24BDAB@ws1-1.us4.outblaze.com>

"David A. Desrosiers" writes:

> I've got some comments to lend to the discussion, but I've
> refrained, because I see how some of the seemingly-neutral comments are
> taken, and responded to.

"seemingly" is an important word there. Bowerbird has started several flame wars here, and most of us have given up any hope of seeing anything productive out of him. Trust us, we're a lot more open to people who haven't repeatedly told us "I could help you, but I won't, because you're not worth it." (For one exact quote, try:

> i've written the routines to do this (and other things), so i have
> supreme confidence that the investment would be worthwhile.
>
> (if i hadn't been treated so badly by some people here, i would
> be happy to give you the routines. but cheer up, they are _not_
> that difficult to write.)

From hacker at gnu-designs.com Sat Oct 16 21:12:29 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Oct 16 21:13:34 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To: <1db.2ce31fcb.2ea2326f@aol.com>
References: <1db.2ce31fcb.2ea2326f@aol.com>
Message-ID:

> are you the "hacker" who did the 9/11 report, and was talking about
> doing wikipedia too?

One and the same, yes.

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From hacker at gnu-designs.com Sat Oct 16 21:13:16 2004
From: hacker at gnu-designs.com (David A. Desrosiers)
Date: Sat Oct 16 21:14:34 2004
Subject: [gutvol-d] responses in the hopper
In-Reply-To:
References: <20041016025153.EDB874BDAB@ws1-1.us4.outblaze.com>
Message-ID:

> All that I've learned from these "discussions" is that a number of
> people have a number of various ideas about mark-up (myself
> included), and there's no way to make everybody happy.

...much like bringing 5 friends into a video store, and trying to agree on one movie for everyone to watch. Not going to happen ;)

That being said, I'd be interested in seeing a list of the tools people know of, or are working on, or have worked with in the past, that can be used to take a 7-bit ASCII text PG work and convert it into other formats.

Like you, I have some ideas of my own (as well as some tools I've rolled myself to help), and I'd like to see what everyone else is using right now.

A quick google for my full name (with initial and in quotes) will tell you exactly why I'm interested in this exact topic of discussion ;)

David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Mon Oct 18 12:54:18 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 18 12:54:34 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <6d.35c457ea.2ea5796a@aol.com> craig said: > you poofing friggleschnitz! mommy, mommy, craig is starting a flamewar! he called me a poofing friggleschnitz, mommy! i demand that he be banned! ;+) *** david said: > One and the same, yes. great! :+) and you did a little bit of work on... what was it?... plucker, right? ;+) if you are prepared to tackle the i.m.d.b., 15,000 e-texts should be a piece of cake... > ...much like bringing 5 friends into a video store, > and trying to agree on one movie for everyone > to watch. Not going to happen ;) i wish things were that inconsequential. but the project gutenberg e-texts make up the most important e-library, historically. although even now it's starting to be dwarfed by other efforts, it would be a proper tribute to michael if it were to be well-maintained... > That being said, I'd be interested in seeing > a list of the tools people know of, or are > working on, or have worked with in the past, > that can be used to take a 7-bit ascii text > PG work, and convert it into other formats. "convert" is a rather loose and unspecific word, wouldn't you say? :+) nonetheless, i'll cut to the chase... the main problem with the e-texts is their formatting is _so_ inconsistent. so before you can do anything useful with them, you must write routines that can resolve their inconsistency. the inconsistency is very maddening, because it's so pointless. although some is understandable, considering how many hands created the e-texts, the sadder truth is that much of it could have been prevented; however, mr. newby and company simply fail to grasp the negative consequences of the inconsistency, and thus never made it their priority to minimize it. 
the good news is you _can_ write routines that will fix the problem. it is _not_ impossible, just thorny; the biggest expenditure of time is a quality-control check to make sure that you knew every inconsistency. their variety will amaze and astound. subsequent conversion to any format is straightforward once you have done the job of resolving the inconsistency. you don't even have to do that job, if you don't want to, you can just go to david moynihan at blackmask and get his files, as he has edited out almost all the inconsistency, which is what then allowed him to make a half-dozen versions of most e-texts in the entire library. if you're looking for explicit info, ron burkey did a converter called "gutenmark", and his website at http://www.sandroid.org/gutenmark does a good job of documenting the inconsistency he faced on the way, before he gave up the effort, saying: > the more perfect my > automated conversions became, > the farther (in my own mind) > I seemed to be from > having a perfect conversion. i think that's a nice way of saying that the more he learned about the e-texts, the more he found out how bad they are, from the standpoint of consistency... there is also some basic information at: palmdigitalmedia.com/dropbook/converting but i'd guess that at this point in time, moynihan will have the most expertise about the problems you would be facing. much of it might be inside his noggin, but i do know he has a _lot_ of macros that undoubtedly embed gobs of wisdom. and, more to the point, david has shown, incontrovertibly, that mass conversions to a plethora of formats is fully possible. recently, david even _offered_ his files to project gutenberg, but -- as far as i know -- his gift was spurned, for some bizarre reason i'll never be able to grasp. oh yeah, i've written some routines that squash out most of the inconsistency, and there's a way you could pry 'em out of me -- namely, if you got support for my z.m.l. 
(zen markup language) built into plucker. it's a simple rule-set; you could probably have it up-and-running in a couple days... backchannel me if you're interested. :+) once you've vanquished the inconsistency, there are other concerns, which might or might not be a problem to you, including: 1. errors in the e-texts, lots of them. 2. styling lost or converted to all-caps. 3. information about images discarded. 4. image filenames are often not unique. 5. accents lost in many foreign e-texts. 6. a confusing redundancy of some books. 7. attacks levied if you reveal problems. oh yeah, also make sure that you are always working with the freshest e-texts available, as i'm not sure if they make an announcement whenever they make corrections to an e-text; they just quietly substitute in the new file... *** i would welcome you here, but i am on my way out the door _very_ soon... :+) there are a handful of tarking naugshlocks here _so_ unworthy of my help they made me decide to decline to do any work for project gutenberg, in spite of its great historical importance and my highest regard for the genius of michael hart. i'm sure others, like you, will cover my absence, while i will be happy grazing greener pastures... at any rate, have a nice day... ;+) -bowerbird From joshua at hutchinson.net Mon Oct 18 13:23:58 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Oct 18 13:24:07 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <20041018202358.BA3CC9E93C@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > i would welcome you here, but i am > on my way out the door _very_ soon... :+) Don't tease me. Josh From hacker at gnu-designs.com Mon Oct 18 13:34:59 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Mon Oct 18 13:35:38 2004 Subject: [gutvol-d] re: poofing and tarking In-Reply-To: <6d.35c457ea.2ea5796a@aol.com> References: <6d.35c457ea.2ea5796a@aol.com> Message-ID: > and you did a little bit of work on... 
what was it?... plucker, right? ;+)

Quite a bit more than "a little", but yes, that's me.

> if you are prepared to tackle the i.m.d.b., 15,000 e-texts should be
> a piece of cake...

Yep, once the structures are laid out for the classifications of works covered by Gutenberg. A good bulk of this has already been done by hundreds of contributors over the years.

> although even now it's starting to be dwarfed by other efforts, it
> would be a proper tribute to michael if it were to be
> well-maintained...

What other efforts are you alluding to? Why not help those people who insist on reinventing a fleet of new wheels to collaborate with existing projects that have similar/same goals?

> "convert" is a rather loose and unspecific word, wouldn't you say?
> :+)

Yes, and specifically chosen for that reason. Gutenberg etexts are nonspecific, and "converting" them means taking a slightly different approach for each work, depending on what I'm converting: poems, plays, books, etc. You can't use a single rigid approach for all works.

> the main problem with the e-texts is their formatting is _so_
> inconsistent. so before you can do anything useful with them, you
> must write routines that can resolve their inconsistency.

And this is exactly what the Distributed Proofreaders project proposes to solve, and they've been pretty successful thus far, IIRC.

> the inconsistency is very maddening, because it's so pointless.
> although some is understandable, considering how many hands created
> the e-texts, the sadder truth is that much of it could have been
> prevented; however, mr. newby and company simply fail to grasp the
> negative consequences of the inconsistency, and thus never made it
> their priority to minimize it.

I've had a lot of luck stepping out of the box, and analyzing the text based on the "style" of the text, versus the actual content itself. I was approached by someone who is doing a paper and his PhD thesis on this exact kind of approach.
Basically (with my expertise and help) he's taking the bulk of Gutenberg, importing every word from every work into a database, and then running his own algorithms across the entire collection, to pull out the styles by known authors. For example, with his approach, you can determine that a work claiming to be by "A. Einstein", is the same author as one claiming to be by "Albert Einstein", (S. Clemens -> Mark Twain -> Samuel Clemens, etc.) From there, you can then begin correcting the inaccuracies in the titling, authoring, and inflection of the work itself, including basic things like sentence structure, spelling, and so on. I've extended the schema quite a bit to allow some interesting other queries to be run ("Show me all works larger than 100 pages, written by male authors between the years 1951 to 1957"). With that done, it is a (relatively) simple matter to convert the 7-bit ascii text to something more manageable, such as structured XML + an associated DTD to turn that into something else. > the good news is you _can_ write routines that will fix the problem. > it is _not_ impossible, just thorny; the biggest expenditure of time > is a quality-control check to make sure that you knew every > inconsistency. their variety will amaze and astound. And I assume you've done this? And your routines are made public somewhere, so others can improve and correct them to continue to be better? I don't recall seeing a URL to download your code or routines. Can you reply back with that, so we can take a look? > you don't even have to do that job, if you don't want to, you can > just go to david moynihan at blackmask and get his files, as he has > edited out almost all the inconsistency, which is what then allowed > him to make a half-dozen versions of most e-texts in the entire > library. And where is his code? Where are his "routines"? I don't see them on his site at all. I'll send him an email later this week to see if he wants to contribute those all back. 
ALL of the talk of how "easy" this is, is completely irrelevant, if nobody wants to actually contribute that knowledge back so others can improve and benefit from it.

If you're not willing to do this, then our conversation stops here. There is no point in continuing the discussion, if you intend on trying to retain "control" of this kind of logic within your own circle of projects.

> if you're looking for explicit info, ron burkey did a converter
> called "gutenmark", and his website at
> http://www.sandroid.org/gutenmark does a good job of documenting the
> inconsistency he faced on the way, before he gave up the effort,
> saying:

I've talked to Ron before via email, and described some of my needs for improvements to his tool. He's no longer maintaining it, so it is up to me (if I choose) to update his code and improve it further.

> recently, david even _offered_ his files to project gutenberg, but
> -- as far as i know -- his gift was spurned, for some bizarre reason
> i'll never be able to grasp.

What was that "bizarre reason"? Is he still on this list? Did anyone else obtain his code? Does it exist out there for download?

> oh yeah, i've written some routines that squash out most of the
> inconsistency, and there's a way you could pry 'em out of me --
> namely, if you got support for my z.m.l. (zen markup language) built
> into plucker. it's a simple rule-set; you could probably have it
> up-and-running in a couple days... backchannel me if you're
> interested. :+)

Not interested. Our code is freely available. If you want someone to support "your" format, then you'll probably have to take that first step by justifying and documenting it. The only page I could find describing the format was here:

http://czt.sourceforge.net/zml/

And I assume that's not your project or code.
If it is anything different than HTML, it would require significant re-engineering of the core parser components used in Plucker and a lot of testing to make sure it didn't break anything in the existing parser in the process. In other words, not a couple-of-days of effort as you suggest. > once you've vanquished the inconsistency, there are other concerns, > which might or might not be a problem to you, including: > 1. errors in the e-texts, lots of them. What kind of errors? Incorrect hyphens? Broken paragraphs? Missing end quotes? (this is common) > 2. styling lost or converted to all-caps. Impossible to regain, unless you have the original work in-hand, to see if there were actual CAPS used, or not. Maybe the "errors" were intentional. Many authors use poetic license to express their thoughts, and sometimes those things break the rules of grammar and spelling. > 3. information about images discarded. Same, see above. > 4. image filenames are often not unique. How do you mean? You mean 1.jpg 1.jpg 1.jpg appearing in three places, but intended to represent 3 _different_ images? Where do you see this inconsistency? Give me an example of a Gutenberg work that shows this. I'd like to verify it for myself. > 5. accents lost in many foreign e-texts. Seems to be a problem with the auditor/editor's charset or support for those charsets in their editor. I agree that the original nature and charset of the document should be retained. How do you express a Cyrillic text in 7-bit ascii? You can't. > 6. a confusing redundancy of some books. Such as? > 7. attacks levied if you reveal problems. Are you revealing the "problems" in a condescending way? Or in a constructive way? The way you approach the "Hey, this is broke" process is very telling as to how you will be received and responded to for same. 
> oh yeah, also make sure that you are always working with the
> freshest e-texts available, as i'm not sure if they make an
> announcement whenever they make corrections to an e-text; they just
> quietly substitute in the new file...

...which is exactly why you should have your own mirror of Gutenberg, or a subset of it, as you work on the pieces.

> there are a handful of tarking naugshlocks here _so_ unworthy of my
> help they made me decide to decline to do any work for project
> gutenberg, in spite of its great historical importance and my
> highest regard for the genius of michael hart. i'm sure others, like
> you, will cover my absence, while i will be happy grazing greener
> pastures...

If you are "moving on", then it behooves you to try to contribute what you've learned (in terms of knowledge, code, or "routines") back to those who will continue to contribute and learn.

We're only here to help the next generation learn and improve. If we're not leaving anything here by which others can remember us and grow themselves; if we're not teaching others as we learn ourselves, then what is the point?

David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com

From Gutenberg9443 at aol.com Mon Oct 18 14:11:12 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Mon Oct 18 14:11:35 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <1b9.3df0833.2ea58b70@aol.com>

In a message dated 10/18/2004 1:54:55 PM Mountain Standard Time, Bowerbird@aol.com writes:

> but -- as far as i know -- his gift was spurned, for some bizarre reason
> i'll never be able to grasp.

I don't know why it was spurned, but I do know that he's posted a lot of stuff that is still in copyright and sooner or later lawyers are going to eat his lunch. As long as nobody could figure out who owned the copyright, that was okay, but now that courts have ruled on the owner, it's another matter.

Anne

From Gutenberg9443 at aol.com Mon Oct 18 14:12:35 2004
From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com)
Date: Mon Oct 18 14:12:51 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <6.35ff8a75.2ea58bc3@aol.com>

In a message dated 10/18/2004 1:54:55 PM Mountain Standard Time, Bowerbird@aol.com writes:

> there are a handful of tarking naugshlocks here _so_ unworthy of my
> help they made me decide to decline to do any work for project
> gutenberg, in spite of its great historical importance and my
> highest regard for the genius of michael hart. i'm sure others, like
> you, will cover my absence, while i will be happy grazing greener
> pastures...

Why can't you just vanish, if such is your preference, without being obnoxious about it?

From joshua at hutchinson.net Mon Oct 18 14:28:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Oct 18 14:28:31 2004
Subject: [gutvol-d] re: poofing and tarking
Message-ID: <20041018212821.A66BB4F583@ws6-5.us4.outblaze.com>

----- Original Message -----
From: "David A. Desrosiers"
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] re: poofing and tarking
Date: Mon, 18 Oct 2004 16:34:59 -0400 (EDT)

>
> I've had a lot of luck stepping out of the box, and analyzing
> the text based on the "style" of the text, versus the actual content
> itself. I was approached by someone who is doing a paper and his PhD
> thesis on this exact kind of approach. Basically (with my expertise
> and help) he's taking the bulk of Gutenberg, importing every word from
> every work into a database, and then running his own algorithms across
> the entire collection, to pull out the styles by known authors.
You've really intrigued me by your description of what you're working on. Is there anywhere I can read up more on it? Sounds very promising. Josh From Bowerbird at aol.com Mon Oct 18 15:19:17 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 18 15:19:34 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <193.313f28ef.2ea59b65@aol.com> anne said: > I don't know why it was spurned, but > I do know that he's posted a lot of > stuff that is still in copyright and > sooner or later lawyers are going to > eat his lunch. As long as nobody could > figure out who owned the copyright, > that was okay, but now that courts have > ruled on the owner, it's another matter. i don't know anything about that. i'm very supportive of people who will have the guts to publish something and take a chance at being dragged into court, if they are making that thing _available_ when it was an orphan out of circulation. but again, i don't know about blackmask, so i don't know if that applies, or not... -bowerbird From Bowerbird at aol.com Mon Oct 18 15:54:29 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 18 15:54:58 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <1d8.2db73d37.2ea5a3a5@aol.com> david said: > Quite a bit more than "a little", but yes, thats me. that's what the winkey-smile was all about... :+) > What other efforts are you alluding to? > Why not help those people who insist on > reinventing a fleet of new wheels, to collaborate > with existing projects that have similar/same goals? well, mainly because amazon is looking for any "help". and my guess is that google will be equally isolationist. moreover, the publishers they are both coddling do not share the "let's share" mentality that drove michael hart. stanford has talked about a huge digitization effort, but i'm not sure how far they are, or if they've even started. 
but it is safe to say they've got enough money that they won't be on the lookout for any volunteers to assist them.

and i'm not sure what's up with the million-book project, or even who is in charge of that effort, but their objective seems slightly different than the one guiding things here; they seem to make scan-books, not cleaned-up e-texts. (they _are_ doing o.c.r., but they're not proofing the stuff.)

and the specific area that's of _most_ interest to me is the one comprised of the d.i.y. authors using cyberspace to connect directly to their audience, sidestepping the clutches of the middlemen who were necessary before. this is _new_ content, so i think it'll eventually eclipse the public-domain that is the thrust of project gutenberg...

> And this is exactly what the Distributed Proofreaders project
> proposes to solve, and they've been pretty successful thus far, IIRC.

um... distributed proofreaders _might_ solve the problem of inconsistent formatting _sometime_ down the line, if their policies settle into place. but if you've looked at their output throughout their history, you'll know they have not done it yet.

but, honestly, that shouldn't even be their job in the first place. project gutenberg should have had solid formatting rules down long before distributed proofreaders even came into being...

> I've had a lot of luck stepping out of the box, and
> analyzing the text based on the "style" of the text,
> versus the actual content itself. I was approached by
> someone who is doing a paper and his PhD thesis on
> this exact kind of approach. Basically (with my expertise
> and help) he's taking the bulk of Gutenberg, importing
> every word from every work into a database, and
> then running his own algorithms across the entire collection,
> to pull out the styles by known authors.

well, that sounds interesting.
:+) but "authorial style" is fairly irrelevant to the matter of formatting the text in the way that a typographer does it, or an e-book requires, which is actually the task at hand... > And I assume you've done this? yep. > And your routines are made public somewhere, > so others can improve and correct them to continue to be better? nope. if i turned 'em loose now, the tarking naugshlocks here could just pick them up for their own nefarious purposes, and i have _no_ intention of letting my hard work be used that way. i came here to the project gutenberg listserves in the first place because i intended to share, but that intention has been squashed. besides, all the negative feedback i've gotten here has convinced me that people out in the world think that what i've done is impossible. so i figure there must be a couple bucks in it, and why give them up? i don't have a day-job, and my girlfriend deserves some nice things... > And where is his code? Where are his "routines"? i don't believe that moynihan has ever made his macros available. he _did_ offer them to project gutenberg, but i guess they declined. i can't express to you just how stupid i think _that_ decision was... > ALL of the talk of how "easy" this is, is completely irrelevant, > if nobody wants to actually contribute that knowledge back > so others can improve and benefit from it. if somebody thinks something (which would be very valuable to them) is impossible, and you know that it isn't, don't you think that you should _tell_them_? i do. that's why i'm here. i'm not gonna do it _for_ them, because they've abused me so -- are you in the habit of helping people who mistreat you? -- but, since i do believe so passionately in electronic-books, i feel i have an obligation to _try_ and make them wake up. that's why i've stayed here for so long, nearly a year, and taken all the abuse that they have dished out to me. because i believe in e-books, and i admire michael hart. 
(michael, by the way, has been very supportive of me.) but i'm about to give up, because they just won't listen. nonetheless, i feel _good_ about the fact that i _tried_... > If you're not willing to do this, then our conversation stops here. ok. no problem. i believe in sharing, and told you what you could share with me, if you want me to share my work back with you, but if that's not acceptable to you, then i'm cool with that. i'll go my own way... > What was that "bizarre reason"? you'll have to ask the people in charge. > If you want someone to support "your" format, > then you'll probably have to take that first step by > justifying and documenting it. i've done that. i've posted it several times on this listserve. and i will send it to you backchannel. 11 dirt-simple rules. > The only page I could find describing the format was here: > http://czt.sourceforge.net/zml/ > And I assume thats not your project or code. you're right, that's not it. > What kind of errors? Incorrect hyphens? Broken paragraphs? > Missing end quotes? (this is common) all of those, yes. and many more. every kind imaginable. and some that you never would have been able to imagine. > Impossible to regain, unless you have the original work in-hand, > to see if there were actual CAPS used, or not. right. > Maybe the "errors" were intentional. Many authors use > poetic license to express their thoughts, and sometimes > those things break the rules of grammar and spelling. that's not it. the all-caps convention dates back to the days of keypunch machines, when computers had no lower-case characters. (there is a rumor that i started that michael hart actually entered "alice in wonderland" on a keypunch machine. don't know if it's true.) > How do you mean? You mean 1.jpg 1.jpg 1.jpg appearing in > three places, but intended to represent 3 _different_ images? no, i mean 1.jpg being used in 3 different _e-texts_. 
which means that you can't dump those e-texts into the same folder without experiencing a filename crash. which means that you need to rework all the filenames in the library if you want 'em to be unique, which is something that you do really want, if you value your sanity... > How do you express a Cyrillic text in 7-bit ascii? You can't. right, that's the problem, though 8-bit e-texts have become common. many mostly-english texts, though, do have foreign words in them where an 8-bit diacritic was chopped down into a 7-bit character. some of these can be automatically replaced. but then you are running the risk of turning what _was_ a non-diacritic into one. (for example, burkey cites the change of "role" to "role" with a hat on the "o". but you know there are plenty of plain "roles" out there.) > Such as? many large works have been split up into smaller sections, but then also "collected" into one e-text as well. there are also "collections" of certain authors, and so on. you'll want to cull out this redundancy... > Are you revealing the "problems" in a condescending way? > Or in a constructive way? i know of no more constructive way to reveal a problem than to diagram the code that will fix it and volunteer to write the app. i've done that, and had shit heaped at me. you be the judge. > The way you approach the "Hey, this is broke" process > is very telling as to how you will be received and > responded to for same. and the way i am "received and responded to" is very telling as to whether i will _continue_ to offer my code that will help fix the problems... > which is exactly why you should have > your own mirror of Gutenberg, > or a subset of it as you work on the pieces. i was just letting you know that piece of information. otherwise, you might think that you could get the d.v.d. of the e-texts, and simply work on the e-texts from that. odds are some of those files have already been replaced... 
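The filename clash described above (the same 1.jpg shipped with several different e-texts) can be illustrated with a minimal Python sketch. It assumes each e-text has a unique identifier that can be prefixed to its image names; the function name and the sample identifiers are hypothetical and are not part of any Project Gutenberg tool.

```python
def unique_image_names(etexts):
    """Return a mapping (etext_id, original_name) -> collision-free name.

    `etexts` maps a hypothetical e-text identifier to the list of image
    filenames shipped with that e-text. Prefixing each filename with the
    identifier lets all images live in one folder without clashing.
    """
    renamed = {}
    for etext_id, names in etexts.items():
        for name in names:
            renamed[(etext_id, name)] = f"{etext_id}-{name}"

    # Sanity check: every new name must be distinct, otherwise the
    # identifiers themselves were not unique enough for this scheme.
    if len(set(renamed.values())) != len(renamed):
        raise ValueError("renaming did not produce unique filenames")
    return renamed


if __name__ == "__main__":
    # Two made-up e-texts that both ship an image called 1.jpg.
    sample = {"11111": ["1.jpg"], "22222": ["1.jpg", "2.jpg"]}
    for (etext_id, old), new in unique_image_names(sample).items():
        print(etext_id, old, "->", new)
```

Any references to the images inside the e-texts would have to be rewritten to match, which this sketch does not attempt.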
> If you are "moving on", then it behooves you to try to contribute what you've learned (in terms of knowledge, code, or "routines") back to those who will continue to contribute and learn. i've left a ton of messages, detailing the problems and laying out exquisite details about the fixes i suggested. feel free to mine that, if you can plow through the flack. (the vast bulk of my posts are over on the u.n.c. archives; i'm not sure if those have been brought to this list, or not. you should also be aware that the .html conversion program over on the u.n.c. machine was faulty, and many of the threads are cut off in midstream. the full thread is in the .html source, so you would have to recover the missing messages from there. that's a good example of how a conversion program can mess up.) > We're only here to help the next generation learn and improve. > If we're not leaving anything here by which others can > remember us and grow themselves; if we're not teaching others > as we learn ourselves, then what is the point? oh, don't get me wrong, david... :+) as i outlined above, there are plenty of arenas in the e-book world, project gutenberg is just one of them. all the others need help too. so i don't intend to stop speaking, or to stop working on e-books. i've been doing continuous work on e-books for over 25 years now. to the contrary, i intend to _continue_to_speak_, and _loudly_, and to finally _go_to_work_ and get some of my things finished, so e-book authors out there in the world can start _using_ them. the difference is, rather than speak here _quietly_ and _privately_ with the project gutenberg folks here _behind_the_scenes_ on their own listserves, trying to get them to pay attention to the problems in their library, i will instead speak _publicly_, using my new blog, making noise about the many problems that are being ignored here, so at least the rest of the world learns -- and grows -- from them. 
so instead of working to make the "people in charge" here smarter, which has essentially meant banging my head against a brick wall, i can instead spend some time _productively_ by making programs that people can use to spread e-books out into the world at large. my time, thoughts, and work-product are too valuable and important to continue wasting them here on people who do not appreciate them. yours probably are too, but maybe you're more "diplomatic", and maybe they won't ignore you or badger you when you say something that they desperately _need_ to hear, even if they don't _want_ to. oh yeah, and eventually i'll even come back and clean up the e-texts, when i've automated my various procedures to the fullest extent and implemented them in code, and the people-in-charge here have made a mess of the library in the process of trying to make x.m.l. work. because michael hart deserves better than that... -bowerbird From hacker at gnu-designs.com Mon Oct 18 18:08:16 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Mon Oct 18 18:08:48 2004 Subject: [gutvol-d] [ANN] Project Gutenberg IRC channel Message-ID: I've offered this before, and maybe it got lost in the chaff of previous messages and threads, so I'll offer it up again. If anyone wants to feel free to discuss issues related to Project Gutenberg, ebooks, conversions, tools, bugs, the weather, or whatever else in "real-time", they can join our network to do so. I run an irc network that is dedicated to developer-related support issues on various projects, and I would like to extend that invitation to include Project Gutenberg as well. Feel free to join the channel #gutenberg or #project.gutenberg on our network... (irc.sourcefubar.net) and talk about any of the current, past, future, or whatever events you'd like, related to PG and other similar projects and products. We have redundant tri-coastal links, so the network will not "split" or go down like (cough) other similar networks. 
We also offer ssl-only ports, for those who wish to make sure their conversations are "secured". Many project teams use our network for exactly that purpose, in private, secured channels to discuss issues related to their projects. If you join the same network on port 994, and configure your client appropriately (making sure to get the validated cert from cacert.org), you will be able to talk securely from client to server, without the fear of snooping. These are exciting times, and I hope to see everyone there! Can someone with authority publicize this on PG's website? Welcome aboard! David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Mon Oct 18 20:13:39 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 18 20:13:59 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <67.356bca9c.2ea5e063@aol.com> anne said: > Why can't you just vanish, > if such is your preference, > without being obnoxious about it? i don't think my posts are "obnoxious", anne, or i wouldn't write them. but if you think they _are_ "obnoxious", anne, then why do you read them? -bowerbird From Bowerbird at aol.com Mon Oct 18 23:15:41 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Oct 18 23:16:04 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <1d4.2d17520f.2ea60b0d@aol.com> jeroen said: > Well, let's keep the name calling off-line, and the discussion pure... sounds like an excellent idea to me. let's see if marcello will agree. *** i appreciate your analysis, and agree with it in large part, because i think you've faced a good number of the problems.
to pull them out into a bullet-point list, they are these: > that semantically tagged is an ideal, that even the most > ambitious attempts at a generic DTD for pre-existing texts > (and that is what we are mostly dealing with in PG) > have not reached > and is either unreachable (since we can't know > the original intent with much of the formatting we encounter) > or impractical > (since the effort to do all this tagging is just too big > and isn't really needed by 99% of the users.) > In my opinion, the best attempt to > such a generic beast has been the TEI effort > which is described in a massive 1400 page document, > still requires customization for numerous academic projects > (both are bad news; both are unavoidable > given the complexity of the task) > but which can cover 95 percent of all text > with just 5 percent of that bulk > in an incarnation called TEI-Lite, > and that is basically all I suggest to PG to adopt as a standard. so if i was to summarize the bulk of what you've said here, concentrating on the negative, but hopefully in a fair way... semantic tagging is an ideal which may be unreachable, and is certainly impractical, since it is a big effort and is just not needed by most readers. one method -- t.e.i. -- runs to 1,400 pages of documentation, yet still requires "customization". however, a less-complex subset -- called t.e.i.-lite -- is available, and that is what i recommend... again, i don't mean to "load" the argument by concentrating on the negative aspects of a heavy-markup approach like this, because i can certainly see benefits of marked-up e-texts too. certainly a minimal form of markup is practically a requirement to move the e-texts to a reasonable e-book and typographic future. and if the library was already marked-up in x.m.l., and working, i would probably have no objections at all to continuing with it... but the reality is that the library is _not_ marked-up already.
so it is necessary for us to examine very closely the _costs_ of _doing_ any markup, to make sure the _benefits_ outweigh them. in a phrase, we need to be cognizant of the _cost-benefit_ratio_. in particular, we should also consider _all_forms_of_markup_ that we think could give us a reasonable set of the benefits at a range of costs, to see which gives us the best cost-benefit ratio. > Doing fully automatic conversion to good paged PDFs for > printing nice copies (and I mean good, as different from workable) > will probably always remain a dream sometimes dreams come true, you know... :+) > as good layout, just as good typographic design, > is a skill, learned through doing it a lot. i agree. completely. it is also worth noting that we need to be able to deliver not just _one_ "good paged .pdf" of an e-text, but rather an entire _spectrum_ of "good paged .pdfs" -- in order to satisfy the entire spectrum of _readers_ out in the world. we can't just churn out a .pdf in 12-point-type and be done, because some readers will want 18-point-type, or 36-point. most will want a plain white background, but some will want a pale blue one, or a faint yellow one, or who knows what color. to be able to give the user that full range of options and _still_ deliver "a good paged pdf with good typographic design" is hard! i believe it is also true, however, that this skill can be implemented in source-code if we dedicate some effort. (it's difficult. but it's not like sending a man to the moon.) i have taken the first steps in making that effort, and i would encourage you to feel free to give me constructive criticism in examining the progress that i've made, and guiding it along. that beta-test listserve: zml_talk-subscribe@yahoogroups.com or, since you are doing well here in the realm of theoretical, perhaps you might want to instead specify what "a good pdf" would look like, or what _you_ mean by a "nice" printed copy.
i don't think there is a lot of awareness here along these lines, and i think it would move the discussion along _significantly_ if we could come to share some agreement on what we _want_. at some point in time, we are going to have to evaluate the quality of the output we get from various methodologies, to determine if it is "good enough" or not. to do that, we need to develop a standard... i'm not saying i think it will be _difficult_ to create our standard. to the contrary, i think it will be fairly easy, once we get started. rather what i am saying is that that work has not been done here, so we are still operating in the dark to a large degree. > Even in a highly programmable environment such as TeX, > I've never been able to print something from "semantic" markup > without manual interventions once in a while -- > even for something as arcane as a two column dictionary. i believe you. > Similarly, doing a good HTML (as different from a reasonable HTML) > will probably also require manual intervention and tweaking i believe you here, too. and once again, there is little conscious agreement here about _what_ constitutes a "good" .html version of an e-text (as distinguished from a "reasonable" one, to use your terms). as with the pdf/print standard, i think that it will be fairly simple to come to agreement about what we want .html versions to be like -- the best of the files being done now come fairly close, i'd say -- but we haven't actually done the process of forming that agreement. > but both these things do not disqualify the large benefits > we could have from having TEI tagged master copies here you are confounding two arguments. the argument for having a "master" version that will generate all the "ancillary" versions is _overwhelming_. it's just ridiculous to try and maintain multiple versions; the costs of that are far too high for the benefits returned. but the argument that that "master" version should be t.e.i.
-- or t.e.i.-lite or any of the other x.m.l.-based formats -- is _far_ less compelling. i think z.m.l. makes a better master. > even if just at a relatively simple level of tagging > (just marking headings, divisions, italics, footnotes, and tables). i wholeheartedly agree that a "simple level of tagging" that "marks" these types of things in an unequivocal manner is a very important minimum-usability hurdle to clear. as you might expect, though, i don't think angle-brackets are necessary at all to create this "simple level of tagging". i do _not_ expect you to take that on faith, however. i'll show you how to do it. the proof is in the pudding. > The task of producing nice HTML / Printable versions > of XML documents is further complicated by the > highly verbose and somewhat unintuitive model of XSLT, > which is presented as the most important tool for this task agreed, and i'm glad you recognize the huge costs in this arena. > from the computer scientist purist point of view > that might be true, but for many lesser gods, > who think five lines of basic is already a lot, > its functional programming model and verbosity > is a real piss-off. i'm glad you said that, so i didn't have to... > Getting 14000+ texts to XML can be done, > just as they were produced initially, > by starting somewhere with the first one, > and not stopping until we've completed them all. that's the attitude! :+) is that the wisest choice of action, though? i'm not nearly so convinced of that. i think we need to set a better path, and go off on _that_ one... > A very simple alternative way would be to > load them in OpenOffice, > apply the formatting you like > and save it i am even less convinced of the wisdom _or_ the "simplicity", of _that_ course of action... any manual methodology is likely to be quite inferior, from a cost-benefit perspective, because the costs would be astronomical. even if you're using volunteers, at some point, you have to place value on human labor...
if you cannot automate some 95% of the initial markup, you need to take your method back to the drawing board. we need to save the human labor to do the _checking_ of the markup, not waste it doing the initial markup itself... > of course that formatting would be very much non-"semantic". which, of course, negates a lot of the benefits as well, and thus degenerates the cost-benefit ratio even further. (and i should point out that none of your discussion really gets at the essence of what _semantic_ markup would be.) > (Still formatting his ebooks in SGML based TEI) i respect the work you are putting into the effort, immensely. -bowerbird From jonathan_ingram at yahoo.com Tue Oct 19 05:33:46 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Tue Oct 19 05:33:48 2004 Subject: [gutvol-d] [ANN] Project Gutenberg IRC channel In-Reply-To: Message-ID: <20041019123346.80922.qmail@web41726.mail.yahoo.com> --- "David A. Desrosiers" wrote: > Feel free to join the channel #gutenberg or #project.gutenberg > on our network... (irc.sourcefubar.net) and talk about any of the > current, past, future, or whatever events you'd like, related to PG > and other similar projects and products. We have redundant tri-coastal > links, so the network will not "split" or go down like (cough) other > similar networks. Along similar lines, we at Distributed Proofreaders also use a chat/conference room to communicate. We use Jabber's multi-user chat feature, though, rather than IRC. If you'd like to join, then create a Jabber account (if you haven't already), and connect to pgdp@muc.jabber.org . All welcome, particularly if you're a user of/contributor to DP. -- Jon Ingram
From marcello at perathoner.de Tue Oct 19 06:54:37 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 06:54:43 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: <1d4.2d17520f.2ea60b0d@aol.com> References: <1d4.2d17520f.2ea60b0d@aol.com> Message-ID: <41751C9D.2010609@perathoner.de> N.B. Bowerbird's first post on gutvol-p on 11/06/03 is attached below in its full glory for the readers' convenience. > bowerbird 11/06/03 > it reminds me that i firmly believe us users should file > a class-action lawsuit against you computer overlords > bowerbird 10/14/04 > is this kind of name-calling condoned on this listserve? Bowerbird started name-calling in his very first post and now he is soooo sensitive about it. In good ol' usenet tradition, he who dishes out freely must also be able to pocket graciously, but, of course, this applies to other people, not to Bowerbird himself. Also he got himself kicked out from at least one newsgroup (ask Jon Noring for details) and was at one point set under moderator supervision on this list. (Why has this changed? Have some bits got lost in the move to pglaf.org?) > bowerbird 11/06/03 > this isn't flame-bait, 'cause i ain't even gonna argue with ya. > i've concluded it's a waste of my time to even _discuss_ x.m.l. Since then, Bowerbird has done nothing else than _cuss_ XML, that is: belittle the XML language and the people who are using it. He concluded that he'd rather waste _other_ people's time than his own. If Bowerbird was really interested in establishing an alternative to XML markup, he would have fired up emacs and started coding his reader. In a month or two he would have shown us the first prototype. If the thing really was heaps better than XML, we would have acclaimed him and considered changing the DP formatting rules to his format. Of course, flamewars being his favorite pastime, he got nowhere with his reader.
Furthermore he wastes the time of other volunteers and newcomers to this mailing list by luring them into yet another endless discussion of the exact same topic we already had plenty before. (The topic being: the self-celebration of His Royal Highness Bowerbird.) > bowerbird 11/06/03 > you're already over-budget and severely behind-schedule [...] > i have written such a program, and i'll have a beta version soon. Bowerbird first announced his reader on 02/14/03. > -- for immediate release -- > [...] > bowerbird intelligentleman announces > an open-source project geared toward > creating an o.e.b. "presentation system", > i.e., a cross-platform reader-program > that will allow users to read o.e.b files. > [...] > bowerbird further indicated that he is fully confident that > the effort would bear fruit quickly, since he has previously > programmed a wide variety of electronic-book applications. http://www.gnutemberg.org/pipermail/libergnu.mbox/libergnu.mbox That was 20 months ago. Since then we saw a lot of announcements but never a line of source code. (Note: he says "Open Source" in his press release, and he also says OEB, which is an XML application.) If Bowerbird was in good faith, he'd published some source code immediately after his announcement to have people review it and comment on it. As it stands, nobody has ever seen one single line of his alleged mother of all readers. (All we did see were some `screenshots' probably done with Microsoft Paint.) > bowerbird 11/06/03 > so you better know i'm prepared to deliver. I defy Bowerbird to publish the source code of what he has done in 20+ months of development. Hic Rhodus, Bowerbird, hic salta! Prove to us that you can build a better reader. Don't give us any of your lame excuses but deliver now or be silent forever. (Lame excuses we already had include: I won't show you because you are so nasty.) > bowerbird 11/06/03 > in a phrase, it's time to put up or shut up. 
Of course, this rule again applies to other people, not to Bowerbird himself: he did not put up and never will shut up. Conclusion: Bowerbird is a kook (def: http://www.catb.org/~esr/jargon/html/K/kook.html ) who knowingly wastes the time of volunteers who could otherwise do many useful things for PG. And he has got nothing to show for it. And now, for the enlightenment of the newcomers, and for the entertainment of those of us who know how hard Bowerbird has been working and how much he has achieved in this short year, Bowerbirds posting debut on gutvol-p of nearly a year ago ... unabridged. > i've been writing some apps for project gutenberg, > so i subscribed to this listserve this evening, and > i went back and read all the posts for a full year, > just to get the flavor of what has gone on here... > > boy, what a waste of time... :+) > > it reminds me that i firmly believe us users should file > a class-action lawsuit against you computer overlords > for all the time and trouble you have dragged us through > in trying to transition us to x.m.l. > > you're already over-budget and severely behind-schedule > in delivering on the things x.m.l. was supposed to bring us, > and we haven't seen even a fraction of the promised benefits. > > this isn't flame-bait, 'cause i ain't even gonna argue with ya. > i've concluded it's a waste of my time to even _discuss_ x.m.l. > > there are 10,000 e-texts in the project gutenberg library -- > 4,000 more than there were when you had your last flamewar > -- so you've got lots of opportunity to show the value of x.m.l., > just get to work, and let us know when you're done doing markup. > > heck, don't even bother to contact us then, just go right to work > making some x.m.l.-savvy _viewer-programs_ for us end-users, > because it doesn't do us one bit of good to have marked-up files > if we don't have any viewers that can make use of that mark-up. > > in a phrase, it's time to put up or shut up. 
> > and yes, i realize you'll throw that challenge right back at me, > so you better know i'm prepared to deliver. > > i say we don't need much "markup" -- sometimes _none_ -- > to turn project gutenberg's plain-ascii e-text files into a > slick electronic-book experience for end-users, if we only > put a little bit of intelligence into an e-book viewer-program. > > i have written such a program, and i'll have a beta version soon. > so i hope i've pissed you off, because i _like_ hostile beta-testers; > i trust them to step past the polite praise and tell me what's wrong. > > that's enough for now, i gotta get back to work. and so do you... > > -bowerbird -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Oct 19 08:21:46 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 08:22:03 2004 Subject: [gutvol-d] Why Bowerbird is a kook Message-ID: <1e4.2d39426d.2ea68b0a@aol.com> once again, because i guess i haven't said it enough, the beta-test for my viewer-program is now open... you can join in and see the elusive software yourself by e-mailing: zml_talk-subscribe@yahoogroups.com as soon as i have enough people to get that test going, i'll be able to move on to that class-action suit against the computer overlords for wasting our time with x.m.l. i think the settlement on _that_ is gonna be _huge_! ;+) -bowerbird From marcello at perathoner.de Tue Oct 19 08:28:43 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 08:28:49 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: <1e4.2d39426d.2ea68b0a@aol.com> References: <1e4.2d39426d.2ea68b0a@aol.com> Message-ID: <417532AB.9000400@perathoner.de> Bowerbird@aol.com wrote: > you can join in and see the elusive software yourself > by e-mailing: zml_talk-subscribe@yahoogroups.com You said it was Open Source. Then why don't you just mail me the sources or even better, post a link to the sources tarball, so everybody can see? 
Or are you going back on that? -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Oct 19 09:33:57 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 09:34:13 2004 Subject: [gutvol-d] Re: Extra spaces in html files Message-ID: <197.316bf610.2ea69bf5@aol.com> karl said: > "Save as HTML" normally is not good enough. well, that tells me what's "not good enough". but it doesn't tell me about what "good" is... > Why do you ask? because i want to know what you think. and, as i said, i think the conversation here would benefit from creating a _standard_ that we can use to _evaluate_ the output that we expect from the methodology we adopt. if a procedure can create an .html version and a .pdf version -- and whatever other versions we decide are necessary -- that meet this standard, then we know we've got a winner... > We can keep the old file unchanged for the time being. > XML produced by http://www.pgdp.net/ > is good enough to work with. ok, i'll take your word for it on that. so i can take an e-text, run it through some converter located somewhere on the site (where, exactly, is it?), and come out with some x.m.l., if i understand correctly. if i do, then the question as to how you get the entire library converted over is answered -- run all the e-texts through this. and thus the conversion to x.m.l. is simple. (if i understand you.) and then what? how do i turn that x.m.l. file into an .html file? into a .pdf? back into a plain-text file? (for looking "ahead" to the time when the x.m.l. file, as the "master", is the only one retained.) i know the standard answer is through x.s.l.t. conversions, but how does a person step through those conversions today? there is also the question of _maintaining_ the x.m.l. file -- entailing things like editing errors out of it, updating it, etc. where do we get volunteers who have the expertise to do that?
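To make the "one master, several outputs" question above concrete, here is a minimal sketch of deriving both an .html and a plain-text version from a single tagged master. It is deliberately not XSLT: it walks the tree with Python's standard-library xml.etree.ElementTree instead, purely to show the idea. The sample is a drastically simplified TEI-like fragment (only `head`, `p` and `hi`), and the choices of `<h2>` for headings, upper-case headings in plain text, and underscores for italics are assumptions made for this illustration, not PG's or the TEI stylesheets' actual behaviour.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny, simplified TEI-like master (assumed structure, for illustration only).
SAMPLE = """<text><body><div>
<head>Down the Rabbit-Hole</head>
<p>Alice was <hi rend="italic">very</hi> tired.</p>
</div></body></text>"""


def inline_html(elem):
    """Render an element's mixed content as HTML, mapping <hi> to <i>."""
    parts = [escape(elem.text or "")]
    for child in elem:
        inner = inline_html(child)
        parts.append(f"<i>{inner}</i>" if child.tag == "hi" else inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def inline_text(elem):
    """Render mixed content as plain text, marking <hi> with underscores."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_text(child)
        parts.append(f"_{inner}_" if child.tag == "hi" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def to_html(root):
    """Headings become <h2>, paragraphs become <p>; everything else is skipped."""
    out = []
    for elem in root.iter():
        if elem.tag == "head":
            out.append(f"<h2>{inline_html(elem)}</h2>")
        elif elem.tag == "p":
            out.append(f"<p>{inline_html(elem)}</p>")
    return "\n".join(out)


def to_text(root):
    """Headings become upper-case lines, paragraphs are separated by blank lines."""
    out = []
    for elem in root.iter():
        if elem.tag == "head":
            out.append(inline_text(elem).upper())
        elif elem.tag == "p":
            out.append(inline_text(elem))
    return "\n\n".join(out)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(to_html(root))
    print()
    print(to_text(root))
```

Both outputs come from the same parsed tree, so a correction made once in the master shows up in every derived version the next time they are regenerated.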
a quick look at x.m.l. coding -- i'm looking at a .tei version of alice in wonderland from marcello -- reveals it is complex, definitely not the type of thing you could entrust to most people. since the whitewashers are even now at the point of overload and burnout, just from the task of verifying the submissions of distributed proofreaders, who will be responsible for _this_? > For converting TEI XML to HTML and PDF > you can use Sebastian Rahtz' XSL stylesheets: > http://www.tei-c.org/Stylesheets/teixsl.html thanks, that's good info. would you please take some e-texts -- you can choose any you want -- and convert them to x.m.l. and do the output conversion to .html and .pdf for us please? that way, we can subject these output-files to evaluation... > I'm old fashioned and like playing with DSSSL tools > (that's all in German and not that polished nor finished > -- take it as a proof of concept): > http://www.gnu.franken.de/Tieck/ > http://www.gnu.franken.de/Tieck/Dokumente/Koepke/ i don't know what "dsssl" is, or understand german, but i'll go take a look at those websites to see what i see... in the meantime, will you generate those samples please? (or feel free to point us to some that you've already done.) -bowerbird p.s. i've jotted down a few of the elements that _i_ think are essential to an electronic-book, and which should be included in any "standard" that we create, and will post that... From marcello at perathoner.de Tue Oct 19 09:49:41 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 09:49:48 2004 Subject: [gutvol-d] How new technology scares away users Message-ID: <417545A5.5060105@perathoner.de> This is a video that turned up in the TEI list about the problems users have with all those new-fangled publishing technologies. It's in Danish but you still get the gist.
http://homepages.nyu.edu/~mz34/helpdesk.WMV -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Oct 19 09:56:27 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 09:56:41 2004 Subject: [gutvol-d] Why Bowerbird is a kook Message-ID: marcello said: > You said it was Open Source. my open-source effort failed to attract any programmers. so it's dead in the water. (oh, and here's a hint, marcello: if you see me issuing a _press_release_, it's likely a joke, as i make fun of people who think "spin" is doing something. if it's a pro-x.m.l. press release to boot, you can be certain.) but yeah, no programmers = no open-source program... however, jon noring's open-source project is still alive -- http://www.openreader.org -- and i can assure you that that one is _not_ a put-on. (which doesn't mean it will eventually succeed either.) so i would encourage any interested people to join that. i'm guessing jon would welcome programmers _warmly_. meanwhile, my own individual viewer-program is fine! did i mention that i'm now getting the beta-test going? you can subscribe! zml_talk-subscribe@yahoogroups.com > Or are you going back on that? many open-source proposals never get off the ground. just a fact of life. thanks for your interest, however. but now i must get back to the discussion in progress... -bowerbird p.s. i'm not "soooo sensitive", as you put it. if name-calling _is_ condoned on this list -- i still haven't got an answer -- then that's just _fine_ with me, because there are a couple tarking naugshlocks that i'd be happy to call names... :+) you see, i am able to maintain a _sense_of_humor_ in this. i just want to make sure that i understand the ground-rules... From hacker at gnu-designs.com Tue Oct 19 10:02:04 2004 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Tue Oct 19 10:02:41 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <197.316bf610.2ea69bf5@aol.com> References: <197.316bf610.2ea69bf5@aol.com> Message-ID: > so i can take an e-text, run it through some converter located > somewhere on the site (where, exactly, is it?), and come out with > some x.m.l., if i understand correctly. Can we start using proper acronyms here? The industry accepted term you want to be using here is XML, not "x.m.l.", unless by "x.m.l" you mean some other format which is not XML. > how do i turn that x.m.l. file into an .html file? into a .pdf? > back into a plain-text file? (for looking "ahead" to the time when > the x.m.l. file, as the "master", is the only one retained.) i know > the standard answer is through x.s.l.t. conversions, but how does a > person step through those conversions today? You use an XSLT. > there is also the question of _maintaining_ the x.m.l. file -- > entailing things like editing errors out of it, updating it, etc. > where do we get volunteers who have the expertise to do that? Create a tool that can go from PG etext, in "normalized" format to PG's accepted version of an XML document of that PG work. > thanks, that's good info. would you please take some e-texts -- you > can choose any you want -- and convert them to x.m.l. and do the > output conversion to .html and .pdf for us please? > that way, we can subject these output-files to evaluation... I thought your tool did exactly this. Am I mistaken? David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From ciesiels at bigpond.net.au Tue Oct 19 10:02:41 2004 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Tue Oct 19 10:02:55 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: References: Message-ID: <417548B1.5080305@bigpond.net.au> Bowerbird@aol.com wrote: >meanwhile, my own individual viewer-program is fine! >did i mention that i'm now getting the beta-test going? >you can subscribe! 
zml_talk-subscribe@yahoogroups.com > > Your beta test program isn't 'fine'. It's the most counter-intuitive software I've ever used. There's also the itty-bitty issue of it not being able to *open* files, save your preselected texts, the sources of which are conveniently hidden. The 'talk' list is farcical, as all posts must be approved by bowerbird. Mike Email me off-list if you'd like a copy of this rancid pudding without surrendering your soul/Yahoo ID. From Bowerbird at aol.com Tue Oct 19 10:12:27 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 10:12:42 2004 Subject: [gutvol-d] Why Bowerbird is a kook Message-ID: <1a7.29e5ae04.2ea6a4fb@aol.com> mike said: > Email me off-list if you'd like a copy of this > rancid pudding without surrendering your soul/Yahoo ID. alright, mike! i haven't even released my app yet, and it's already being _bootlegged_! that makes me feel all warm and fuzzy inside... :+) thankyouthankyouthankyou... -bowerbird p.s. i'll answer the rest of mike's post next week (or next month), but this part was just too good to pass up... ;+) From hacker at gnu-designs.com Tue Oct 19 10:14:15 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Tue Oct 19 10:14:41 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: References: Message-ID: > but yeah, no programmers = no open-source program... Aren't you the programmer in this case? You get to choose the license, as the person responsible for actually creating the project and writing the code. Or are you asking for others to write the code? David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From marcello at perathoner.de Tue Oct 19 10:14:41 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 10:14:49 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: <197.316bf610.2ea69bf5@aol.com> References: <197.316bf610.2ea69bf5@aol.com> Message-ID: <41754B81.6070800@perathoner.de> Bowerbird@aol.com wrote: > thanks, that's good info. would you please take some e-texts > -- you can choose any you want -- and convert them to x.m.l. > and do the output conversion to .html and .pdf for us please? Why don't you go ahead and publish the source code for the "Open Source" ebook reader program you announced on 14 Feb 2003 and which has been almost in beta stage ever since? Instead of unloading your homework onto other volunteers? If you want to see the output from those stylesheets you can run them yourself. Unlike your vapourware reader, Sebastian's stylesheets are working and put up for download. > that way, we can subject these output-files to evaluation... If you want to evaluate, go to http://www.gutenberg.org/tei/examples/ there are TEI source + HTML, PDF, TXT and PalmDoc generated versions for Alice in Wonderland and Life on the Mississippi ready to download. Of course, you'll also find the sources for the conversion tools there, under GPL. Of course, you'll also find there an online utility to convert from TEI to HTML, PDF, TXT and PalmDoc. Of course, you'll also find there a manual explaining how to mark up your texts so they work best with the conversion utilities. > in the meantime, will you generate those samples please? > (or feel free to point us to some that you've already done.) In the meantime will you roll a tarball of your "rancid pudding"* reader sources and post them please? The only thing which hasn't made an inch of progress in 20 months is your reader program. But maybe that's the reason you want to keep everybody else from doing useful stuff, too.
* "rancid pudding": endearing epithet uttered by a beta-tester of this reader. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Tue Oct 19 10:29:00 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 10:29:08 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: References: Message-ID: <41754EDC.9000406@perathoner.de> Bowerbird@aol.com wrote: > my open-source effort failed to attract any programmers. > so it's dead in the water. (oh, and here's a hint, marcello: > if you see me issuing a _press_release_, it's likely a joke, Let me get this straight: 1. 14 Feb 2003: You announce you will code an open source ebook reader. 2. 19 Oct 2004: You have nothing to show. 3. You retroactively declare the announcement to be a joke. 4. You think that did save your face. Think again. > bowerbird: 14 Feb 2003 press release > bowerbird further indicated that he is fully confident that > the effort would bear fruit quickly, since he has previously > programmed a wide variety of electronic-book applications. > bowerbird: 19 Oct 2004 > but yeah, no programmers = no open-source program... So you admit you were lying to the press as you told them you were confident you could pull off this thing because you were such a good programmer yourself. > meanwhile, my own individual viewer-program is fine! > did i mention that i'm now getting the beta-test going? > you can subscribe! zml_talk-subscribe@yahoogroups.com Yeah. I heard from one of your beta-testers: "Rancid pudding". >> Or are you going back on that? > > many open-source proposals never get off the ground. So you just switched to closed source. Why should I be even remotely interested in a reader that: - is closed source - uses a proprietary non-standard file format - has got "rancid pudding" as best review to date? If I wanted that, minus the bad review, I would take the Micro$oft Reader, which is ready and working today. 
-- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Oct 19 10:56:54 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 10:57:08 2004 Subject: [gutvol-d] Re: Extra spaces in html files Message-ID: david said: > Can we start using proper acronyms here? > The industry accepted term you want to be using here is XML, > not "x.m.l.", unless by "x.m.l" you mean some other format > which is not XML. well, i'm not much for "industry accepted terms", but the _real_ reason i put the periods in is that with any acronym, when you use all-lower-case, as i do, using periods helps reader comprehension. i even do it with z.m.l., so it's no slam on x.m.l., ok? if it bothers you, just ignore it. *** i said: > > there is also the question of _maintaining_ the x.m.l. file -- > > entailing things like editing errors out of it, updating it, etc. > > where do we get volunteers who have the expertise to do that? david said: > Create a tool that can go from PG etext, in "normalized" format > to PG's accepted version of an XML document of that PG work. i'm not clear what you're saying here. the question is the difficulty of maintaining an x.m.l. file. you would transform the x.m.l. file to do maintenance on it? also, there is no "normalized" format, nor is there an "accepted version" of an x.m.l. document of the corresponding p.g. e-text. (but there are lots of contenders for that position.) > I thought your tool did exactly this. Am I mistaken? it will. and when it does, i'll submit those files for evaluation. in the meantime, i'm asking karl if he can do it with x.m.l. now. everyone talks about it like it's a developed, stable technology. so i'm wondering what the hold-up is... 
-bowerbird From Gutenberg9443 at aol.com Tue Oct 19 11:17:53 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Tue Oct 19 11:18:07 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <14.362c6fea.2ea6b451@aol.com> In a message dated 10/18/2004 4:19:45 PM Mountain Standard Time, Bowerbird@aol.com writes: >>i don't know anything about that. >>i'm very supportive of people who will >>have the guts to publish something and >>take a chance at being dragged into court, >>if they are making that thing _available_ >>when it was an orphan out of circulation. >>but again, i don't know about blackmask, >>so i don't know if that applies, or not... I agree. Two of the series involved were Doc Savage and The Shadow. Other people had posted all of these, and David got them from their sites and reposted them at Blackmask (I think he did make other arrangements for the last Doc Savage books, which had not been posted when the decision was handed down.). At this point, it was a public service, and I appreciated it very much. However, several months ago courts ruled that Conde Nast owned both series. Whether Conde Nast DESERVES to own the copyrights, having acquired them by buying a company that had bought another company and so on for several steps, is now a moot point. Conde OWNS them. I know for a fact, having been in touch with Conde Nast, that Conde Nast intends to rerelease them, and expects to finalize agreements on how to do that by the end of the year. When the court case wound up, the people doing the original posting immediately pulled their sites down. David continued to keep them on his. It is arguably still a public service, because the titles still aren't available commercially. But legally, he is bucking copyright. SO FAR Conde Nast has not gone after anybody, and is being as considerate as legally possible with those who kept the books alive. But sooner or later, if David doesn't remove them from his site, Conde Nast WILL go after him. 
I am trying to convince Conde Nast that, as they wait for whatever it takes to get them all in print, that they allow downloads from Blackmask and from FictionWise, my favorite commercial e- book publisher, for the nominal sum of a dollar apiece, and that they take down the dollar version when the new version is ready on each book. I don't know yet what they're going to wind up doing. However, because of my personal knowledge of this particular situation, I would really hesitate to post any of Blackmask's titles without finding out for myself what the copyright status is. Project Gutenberg MUST comply with the law, no matter what other people do or don't do. As a writer, I agree a hundred and ten percent that books should be kept alive, legally if possible, by piracy otherwise. I don't WANT my books to die when I do. All the same, once a copyright has been established and firmly assigned to person or corporation A, it becomes illegal for person or corporation B to continue to traffic in that book, even if the trafficking is free. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041019/919c3ee0/attachment.html From hacker at gnu-designs.com Tue Oct 19 11:19:28 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Tue Oct 19 11:20:43 2004 Subject: [gutvol-d] Re: Extra spaces in html files In-Reply-To: References: Message-ID: > well, i'm not much for "industry accepted terms", but the _real_ > reason i put the periods in is that with any acronym, when you use > all-lower-case, as i do, using periods helps reader comprehension. i > even do it with z.m.l., so it's no slam on x.m.l., ok? if it bothers > you, just ignore it. But you continue to refer to HTML as such, and not h.t.m.l, and PDF as such instead of p.d.f. Can we try to be consistent? 
For readers (including myself) who are trying to grasp the large number of projects that people are creating and using to work with PG texts, it makes it difficult when doing research on "ZML" brings up a completely unrelated series of projects. Similarly, searching for "z.m.l." in this case, doesn't surface anything relevant. > i'm not clear what you're saying here. the question is the > difficulty of maintaining an x.m.l. file. you would transform the > x.m.l. file to do maintenance on it? And I'm saying DON'T maintain an XML file. Maintain the text, in a structured, normalized format (i.e. add some parameters by which paragraphs can be spaced, quotes can be used, etc., à la LaTeX). > also, there is no "normalized" format, nor is there an "accepted > version" of an x.m.l. document of the corresponding p.g. e-text. > (but there are lots of contenders for that position.) XML is infinitely extensible (hence the 'X' in the acronym). This means two completely independent authors can use their own XML formatting and rules, and yet both can output completely compatible formats from their own transformations. That is the whole point of XML. XML is nothing more than a container, an empty bucket. XML does absolutely nothing on its own. > it will. and when it does, i'll submit those files for evaluation. > in the meantime, i'm asking karl if he can do it with x.m.l. now. > everyone talks about it like it's a developed, stable technology. so > i'm wondering what the hold-up is... From what I read in recent posts, others are wondering the same about your tool, and what it purports to produce. If you have it in beta test already, why not submit what kinds of files IT can produce, and let others compare that output to their own versions of what their own tools can produce. The point is, turf-wars, name-calling, and vaporware projects aren't adding value to the overall goals of PG, as I read them.
That includes my tools as well, but mine aren't really along the same sort of strategic direction as PG's overall goals. They just happen to intersect at several places. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Gutenberg9443 at aol.com Tue Oct 19 11:22:02 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Tue Oct 19 11:22:20 2004 Subject: [gutvol-d] re: poofing and tarking Message-ID: <13e.4135e4d.2ea6b54a@aol.com> In a message dated 10/18/2004 9:14:03 PM Mountain Standard Time, Bowerbird@aol.com writes: but if you think they _are_ "obnoxious", anne, then why do you read them? The same reason I read all the other posts. I need to know what's going on. This statement from you is rather disingenuous, don't you think, when you are deliberately being obnoxious. If goody-goody-goody-I-have-the-solution-to- your-problem-but-I-won't-share-it-with-you- because-I-don't-like-you isn't being deliberately obnoxious, I don't know what is. Lead, follow, or get out of the way. Anne From Gutenberg9443 at aol.com Tue Oct 19 11:49:27 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Tue Oct 19 11:49:47 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: In a message dated 10/19/2004 12:16:07 AM Mountain Standard Time, Bowerbird@aol.com writes: Doing fully automatic conversion to good paged PDFs for > printing nice copies (and I mean good, as different from workable) > will probably always remain a dream This is A goal. It is not, and cannot be, THE goal. It would be great to have everything in printable PDF for people who want printable PDF. If you want to keep ten thousand books on your computer, printable PDF isn't worth the end product of bovine digestion. I loathe PDF.
I'm sure I'm not the only person who uses Gutenberg who is in my situation: I'm going blind--slowly, fortunately, unlike a neighbor who went blind overnight--and I can't get PDF documents on my Rocket, which means that as my vision continues to deteriorate I'm going to have to read sitting in front of my computer if I want to read something that is not available in a format I can convert to text or HTML in order to convert it to Rocket. I agree with Michael. Post everything in TXT first AND THEN do anything else you want to do with it. I believe that is one of the goals of the DP team, which has all the scanned pages on computer to work from. HTML, even the "Save As" kind of HTML, can maintain formatting if you tell it to; I know because I've done it often. A basic problem in this entire discussion is that there are a lot of people here who are program-happy, as opposed to computer-happy. I'm computer- happy, but like the vast majority of people who use Gutenberg, I'm really not interested in umpteen different programs. I just want a book I can read. As a scholar, I might at times need the specific coding which will tell me what used this punctuation mark or that whatever that doesn't come across on txt, but if I need that, I can obtain the book some way and reinsert the punctuation and formatting and whatever. The village schoolmaster in a third world village, who has two hours of electricity a day, one cellular phone for the entire village, and an obsolete laptop donated to him by a first world company with a connection from the phone to the laptop cobbled together by a gadget-minded Peace Corps volunteer or church or UN aid worker, doesn't give a squiddly about umlauts and grave accents. He just wants BOOKS that he can READ to his students during the two hours a day that the electricity is on.
The cowboy who's going to be stuck all winter in a back-country cabin looking after a herd of cattle in a snowed-in high pasture, or the astronaut, or the submariner, or the scientists in a South Pole research station, or the kids going to bush- school on the radio in Australia or Alaska--these people don't need pretty pages. THEY NEED BOOKS. They need good books. That's all. If we go back to the very basics, this is the goal of Project Gutenberg. It is no mistake that the very first things Michael posted were the most important documents of freedom. An educated populace can be kept enslaved for only so long, and then the privy hits the fan. We are the world's free public library. We do not serve, nor do we even NEED to serve, the few people in elite professions who want, and need, to be able to account for every comma and every umlaut. People who are arguing their heads off about ten different ways to format are losing sight of the goal. It is hard to remember that your goal was draining the swamp if you are up to your a** in alligators. Stop creating alligators. If YOU--whoever YOU happens to be--want to create all kinds of pretty formats, do it. That's grand. But don't try to inflict your vision on all of PGLAF. The TXT versions MUST come first. Then people can be joyfully reading the new books, while other people create other formats for those nice new books. Now can we go back to draining the swamp? Notice I said "can," not "may." We Ph.D.s in English know our grammar. I MEANT "can." Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041019/770e2424/attachment-0001.html From shalesller at writeme.com Tue Oct 19 12:13:23 2004 From: shalesller at writeme.com (D. 
Starner) Date: Tue Oct 19 12:26:14 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com> > The village schoolmaster in a third world village, who > has two hours of electricity a day, one cellular phone > for the entire village, and an obsolete laptop donated > to him by a first world company with a connection > from the phone to the laptop cobbled together by > a gadget-minded Peace Corps volunteer or church > or UN aid worker, doesn't give a squiddly about > umlauts and grave accents. Of course he does. How on Earth can he teach German or French, or expect his students to read a book in a language they are familiar with (in large parts of Africa, that would be French), without the proper umlauts and grave accents? From jeroen at bohol.ph Tue Oct 19 12:49:09 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Tue Oct 19 12:49:22 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com> References: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com> Message-ID: <41756FB5.3090702@bohol.ph> D. Starner wrote: >Of course he does. How on Earth can he teach German or >French, or expect his students to read a book in a language >they are familiar with (in large parts of Africa, that >would be French), without the proper umlauts and grave >accents? > > Even worse, many African languages are written with the Latin alphabet, but using additional letters, such as an F with a curl, which, until very recently, weren't supported by most computers or typewriters, and were thus conveniently replaced by their nearest counterparts. You could have a look at this nice page on the Gentium font, which is really nice, and was developed with support for African languages in mind.
(http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=Gentium) Support for most Indian languages is only widely available since Windows and Office XP, and many less widely used languages are still not supported, let alone on the old hardware we donate (I decided recently to increase my bottom line from Pentium 90 to Pentium II 266 for machines I donate to schools in the Philippines, the latter can just run Windows 2000 with Unicode support.) Jeroen. From jeroen at bohol.ph Tue Oct 19 12:51:12 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Tue Oct 19 12:50:51 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: References: Message-ID: <41757030.1020209@bohol.ph> Gutenberg9443@aol.com wrote: > > The TXT versions MUST come first. Then > people can be joyfully reading the new > books, while other people create other > formats for those nice new books. > > Anne For a novel, you may be right, but complex texts, like most teaching materials, require something more than that: illustrations, tables, maybe formulas, etc. HTML is really the bottom line here. Also, since it is normally easier to throw something away than to add, I prefer to go to XML first, and then create HTML and Text from that. Jeroen. From marcello at perathoner.de Tue Oct 19 12:52:43 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 12:52:54 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: References: Message-ID: <4175708B.3080800@perathoner.de> Gutenberg9443@aol.com wrote: > The village schoolmaster in a third world village, who > has two hours of electricity a day, one cellular phone > for the entire village, and an obsolete laptop donated > to him by a first world company with a connection > from the phone to the laptop cobbled together by > a gadget-minded Peace Corps volunteer or church > or UN aid worker, doesn't give a squiddly about > umlauts and grave accents. True, if he happens to speak or teach English.
If he happens to speak or teach any other language of the world he will care very much for accents, squigglies and umlauts. I wouldn't want to teach my pupils e.g. French from a book without accents. And don't get me started on schoolmasters who speak or teach Chinese, Korean, Japanese, Hebrew, Arabic, Vietnamese, Thai etc. Actually the 7bit craze at PG at some point went so far as to convert Chinese etexts to 7bit, completely mangling the text. Now suppose some Chinese reader did actually download one of those garbled texts and tried to make it work. Not good for the image of PG. Fortunately those bogus files have been tossed out since. Personally I hold that all the 7bit files of foreign books are useless and dangerous because people may get hold of them instead of the 8bit files they can use. Notice I said "can," not "could." > He just wants BOOKS > that he can READ to his students during the > two hours a day that the electricity is on. In this case he should trade the notebook for a PDA with solar cell charger. Then he could read 24 hours a day. Ah, and, of course, he would have to download the HTML or PDB file, because the hard-wrapped TXT files are very hard to read on a PDA. > The cowboy who's going to be stuck all winter in > a back-country cabin looking after a herd of cattle > in a snowed-in high pasture, or the astronaut, > or the submariner, or the scientists in a South Pole > research station, Actually the South Pole research stations have a pretty fat pipe and plenty of the latest and greatest in computer gadgets. Wish I had. > If we go back to the very basics, this is the goal of > Project Gutenberg. It is no mistake that the very first > things Michael posted were the most important > documents of freedom. You are very America-centric, aren't you? The most important documents of freedom are those of the French revolution (with accents). And if I were Chinese I would probably hold that the most important etc. etc. is Mao's Red Book (Unicode).
Importance is relative. Michael posted those first because the computer he was on couldn't hold any longer texts. > An educated populace > can be kept enslaved for only so long, and > then the privy hits the fan. Don't kid yourself. You can fool almost all of the people almost all of the time. The rest you shoot. Or, how do you explain that the most "civilized" countries of this world still use war as an instrument for "solving" international conflicts? -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Tue Oct 19 13:06:36 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Oct 19 13:06:47 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <41756FB5.3090702@bohol.ph> References: <20041019191323.6722A4BDA9@ws1-1.us4.outblaze.com> <41756FB5.3090702@bohol.ph> Message-ID: <417573CC.7020002@perathoner.de> Jeroen Hellingman wrote: > Support for most Indian languages is only widely available since Windows > and Office XP, and many less widely used languages are still not > supported, let alone on the old hardware we donate (I decided recently > to increase my bottom line from Pentium 90 to Pentium II 266 for > machines I donate to schools in the Philippines, the latter can just run > windows 2000 with Unicode support.) Then Linux should run just fine. It also has the advantage of not tying those countries into harmful financial dependencies. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Tue Oct 19 13:46:50 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 13:47:13 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: here's my "first draft" on specific elements of a well-done e-book. for a similar analysis, on behalf of distributed proofreaders, see http://dave.maddockfamily.org:81/dp/htmlspec/proposal.html i would particularly like to hear from jon ingram on this topic, as he's done some rather nice .html files for project gutenberg.
note that these aspects are for _books_, albeit a large range of them. magazines and newspapers present additional interesting challenges. (ingram could comment on those, too, because he's done magazines.) these are the criteria _i_ use to evaluate the quality of an e-book. please feel free to add anything you feel is missing, or challenge anything you believe to be unnecessary... -bowerbird _____ headers and links headers should be big and/or bold, and start on a new page multiple levels of headers should be exposed/hidden at will. headers at different levels should be sized differently. a table of contents should be created (preferably automatically) (ditto a table of illustrations, footnotes, tables, when applicable) headers should be hotlinked from a table of contents, and they should hotlink _back_ to the table of contents headers should hotlink to the previous header, and the next internal references to a header (see chapter 2) should be hotlinked any fully-specified u.r.l. should be a hotlink to that website _____ other typography block-quotes should be indented, maybe set off in a box tables should look "nicely done", maybe set off in a box the title-page and front-matter should look presentable note indicators should be linked to the note itself, and the note should be backlinked to the indicator index items should be linked to the place in the text, and a backlink should be made as well, if at all possible "also see" references in an index should hotlink appropriately images should be viewable and resizeable at will widow/orphan control is essential display of line-numbers in poems should be optional special treatment of each character's dialogue in a play (e.g., each rendered in a distinct color) should be an option if the e-book is replicating an existing paper-book, then the page-numbers from that p-book should be available, with a user option as to their display, and the e-book display should be able to mimic the p-book if the e-book is replicating
an existing paper-book, then the user should be able to print it and duplicate the p-book, _but_also_ be able to change any print parameters at will _____ things about the viewer-program... fast-loading, responsive, customizable, 1-page or 2-page display pagesize, font, fontsize, leading, text-color, background-color, significant lines and strings highlighted, annotations possible, justification (vertical and horizontal) at the choice of the user From jeroen at bohol.ph Tue Oct 19 14:48:49 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Tue Oct 19 14:48:03 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <1d4.2d17520f.2ea60b0d@aol.com> References: <1d4.2d17520f.2ea60b0d@aol.com> Message-ID: <41758BC1.8040300@bohol.ph> Bowerbird@aol.com wrote: > > however, a less-complex subset > -- called t.e.i.-lite -- is available, > and that is what i recommend... > > > You do have a curious way of avoiding capital letters... :-) After all my objections against XML and TEI, you may wonder why I still recommend TEI Lite. The reason is that it forms a very decent base to start some structural tagging with -- you don't need the full 1400 pages of TEI to get started with it, and you also don't need to reinvent the wheel, and come up with some alternative, equally simple scheme. Doing this has the added benefit that for those texts that require it, you can easily step up, and work with the full set, if so required or desired. If you're just against using angled brackets, they are simple to use and understand for both humans and computers. You can do more fancy tricks to make marked-up texts look more like plain text, but attempts to do so, both by TeX and SGML, add considerable complexity to the reader -- both machine _and_ human. XML has one thing for it, and that is its simplicity (and that some people build complicated things on it, such as namespaces, XSLT, etc., that require a course in computer-science could be quite hidden from most users.)
You can of course object out of principle against something 1400 pages thick, but that is unavoidable, given the complexity and wide diversity of books that have been published in the 500+ years since Gutenberg's invention. Much of the difficult stuff of XML will eventually be hidden from users. Future versions of layout programs will probably be able to read a thing coded in TEI directly (doing an XSLT transform to some internal format), and format it nicely according to some defaults. You can then apply all the required formatting tweaks to it, export to some nice lay-out format (XSL-FO, maybe, PDF, or who knows), and save all your nice tweaks, linked to your original TEI, so you have the best of both worlds. I already have numerous benefits from working in XML, in that I can generate nice HTML files (that often need no touch-up at all) and reasonable plain ASCII for PG, but also have spelling checking on a per-language basis, extract all fragments in a certain language, create tables of contents, etc. on the fly, extract Dublin Core bibliographic records, and more. Jeroen. From shalesller at writeme.com Tue Oct 19 15:19:07 2004 From: shalesller at writeme.com (D. Starner) Date: Tue Oct 19 15:20:05 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: <20041019221907.EAF404BDA9@ws1-1.us4.outblaze.com> Bowerbird@aol.com writes: > headers should be big and/or bold, and start on a new page > headers at different levels should be sized differently. Headers should be headers, and defined by the user-agent. > multiple levels of headers should be exposed/hidden at will. I don't understand this one. But don't bother responding, because I know better than to expect an explanation from you. > (ditto a table of illustrations, footnotes, tables, when applicable) Why a table of footnotes? Footnotes are designed to be of minimal interest.
> headers should be hotlinked from a table of contents, > and they should hotlink _back_ to the table of contents Don't waste time linking them back; Bowerbird's the only one who wants that feature. > headers should hotlink to the previous header, and the next Why, especially if they're linked to the table of contents? > any fully-specified u.r.l. should be a hotlink to that website I thought you weren't concerned with features less than 1% of PG's books have? > _____ other typography Typography is not something we should worry about. > block-quotes should be indented, maybe set off in a box block-quotes shouldn't be set off in a box; I've never seen a book that did that, and it would usually provide too much emphasis. > tables should look "nicely done", maybe set off in a box > > the title-page and front-matter should look presentable No, they should look poorly done and unpresentable. And again, tables shouldn't be set off in a box, for the same reasons block-quotes shouldn't be. > index items should be linked to the place in the text, > and a backlink should be made as well, if at all possible A backlink from where? And why? I think we should use links only where they are explicit or at least loudly implicit in the original work. > images should be viewable and resizeable at will Again, of course they should be viewable, but resizeable at will strikes me as largely gratuitous and user-interface dependent. > widow/orphan control is essential Again, this is all about the user-interface. And I can hardly say that it's essential; it strikes me as a feature almost pointless in online reading. > display of line-numbers in poems should be optional Again, this is all about the user-interface. Again, why does it matter? Do we really have a whole lot of people desperately crying out not to see the line-numbers on their poetry?
> special treatment of each character's dialogue in a play > (e.g., each rendered in a distinct color) should be an option Again, this is all about the user-interface. The underlying principle is sound. > if the e-book is replicating an existing paper-book, > then the page-numbers from that p-books should be > available, with a user option as to their display, and Yes. > the e-book display should be able to mimic the p-book > > if the e-book is replicating an existing paper-book, then > the user should be able to print it and duplicate the p-book, "Duplicate" the physical book? We aren't preserving nearly enough information to do that. Long-s, line endings, and font information are just the start. > _but_also_ be able to change any print parameters at will Again, uselessly vague. What print parameters do you want to be able to change? > _____ things about the viewer-program... > > fast-loading, responsive, customizable, 1-page or 2-page display > > pagesize, font, fontsize, leading, text-color, background-color, > significant lines and strings highlighted, annotations possible, > justification (vertical and horizontal) at the choice of the user Sure, whatever. That can all be left to the viewing program's authors, and none of it is exactly revolutionary. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Bowerbird at aol.com Tue Oct 19 16:02:21 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 16:02:41 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <1ea.2d89e678.2ea6f6fd@aol.com> jeroen said: > You do have a curious way of avoiding capital letters... :-) yes i do. :+) thank you for this additional post, which makes your analysis even more even-handed. -bowerbird From jgruber at tampabay.rr.com Tue Oct 19 16:26:42 2004 From: jgruber at tampabay.rr.com (Joseph R. 
Gruber) Date: Tue Oct 19 16:26:17 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: <1a7.29e5ae04.2ea6a4fb@aol.com> Message-ID: <200410192326.i9JNPwNw027485@ms-smtp-05.tampabay.rr.com> For anyone that wants this rancid pudding you can grab it from: http://www.josephgruber.com/pudding0727-exe.zip http://www.josephgruber.com/pudding0727[1].osx.sit Leech away... Oh, btw, feel free to sue me or send a cease and desist to my ISP. It's BrightHouse Networks in case you need any help... Joseph -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com Sent: Tuesday, October 19, 2004 1:12 PM To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com Subject: re: Re: [gutvol-d] Why Bowerbird is a kook mike said: > Email me off-list if you'd like a copy of this > rancid pudding without surrendering your soul/Yahoo ID. alright, mike! i haven't even released my app yet, and it's already being _bootlegged_! that makes me feel all warm and fuzzy inside... :+) thankyouthankyouthankyou... -bowerbird p.s. i'll answer the rest of mike's post next week (or next month), but this part was just too good to pass up... ;+) _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Tue Oct 19 16:37:02 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 16:37:20 2004 Subject: [gutvol-d] Why Bowerbird is a kook Message-ID: <1f5.12f2dac.2ea6ff1e@aol.com> david said: > Aren't you the programmer in this case? nope, not for the o.e.b. reader-program. i _am_ the programmer for my program, which takes a text-file in z.m.l. format. this is my own project as an individual; i wrote it as a present for michael hart. for the o.e.b. viewer, i would have volunteered my services toward sheparding the project and designing the user-interface and functionality. 
but, as i said, since jon noring has now started his own open-source effort to make a reader-app, i would suggest programmers support him instead. *** > And I'm saying DON'T maintain an XML file. > Maintain the text, in a structured, normalized format > (i.e. add some parameters by which paragraphs > can be spaced, quotes can be used, etc., ala LaTeX). that's what i'm saying, david. and z.m.l. is my version of that "structured, normalized format". but the company line here is that the x.m.l. file will be the master. > If you have it in beta test already, why not submit > what kinds of files IT can produce my program is a viewer-program. it doesn't produce files. it takes ordinary plain-text raw-ascii files as _input_ -- i.e., files just like the current e-texts in the library -- and displays them, giving the user the complete range of functionalities they should expect in an electronic-book. generating other versions of an e-text -- like .html -- (and i'll try to remember to preface "html" with a period) is an add-on for a later version of the program. the purpose of my program is to give the end-user all of the _benefits_ of a marked-up file, without any markup. to do that, my program has to figure out the _structure_ of the file on its own. naturally, once it has _done_ that, it will be relatively straightforward for it to churn out an .html file (or an .rtf file) that reflects that structure. as far as generating a .pdf, the end-user can do that by printing the e-book to a .pdf driver. > The point is, turf-wars, name-calling, and vaporware projects > aren't adding value to the overall goals of PG, as I read them. i agree. but my program is not "vaporware". and if i can -- as i say -- give the benefits of markup to end-users without requiring project gutenberg to actually _do_ any markup, then i think i'll add immense value to the overall goal of the project. but i'm sure the x.m.l. advocates will _still_ want to do markup. 
some people just _like_ doing things the hard way... ;+) -bowerbird From Bowerbird at aol.com Tue Oct 19 17:05:07 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Oct 19 17:05:30 2004 Subject: [gutvol-d] Why Bowerbird is a kook Message-ID: joseph said: > http://www.josephgruber.com/pudding0727-exe.zip > http://www.josephgruber.com/pudding0727[1].osx.sit alright, the bootleggers are going to work for me! i'll have a new version up in the next few days, be sure to check back regularly for that! :+) but to make sure your copy is virus-free, you should download it from yahoogroups. because if you get it from somewhere else, someone _might_ have tampered with it... just a word to the wise, since i know that you p.c. people have a hard time with virii... -bowerbird From hacker at gnu-designs.com Tue Oct 19 17:35:33 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Tue Oct 19 17:36:47 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: References: Message-ID: > but to make sure your copy is virus-free, you should download it > from yahoogroups. Virus-free? Aren't you distributing it in source form? If you're not distributing source, running a nondescript binary, whether it is assured to be virus-free or not, is a really stupid thing to do. I wouldn't trust the code without being able to audit/edit the source anyway. > because if you get it from somewhere else, someone _might_ have > tampered with it... Which is why a multitude of virus scanners exist, for those platforms that happen to be susceptible to these kinds of things. > just a word to the wise, since i know that you p.c. people have a > hard time with virii... A word to the wiser, "virii" is not a word[1]. [1] http://code.gnu-designs.com/plural-of-virus.html David A. 
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From stephen.thomas at adelaide.edu.au Tue Oct 19 17:40:57 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Tue Oct 19 17:41:19 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: References: Message-ID: <4175B419.6030301@adelaide.edu.au> Thanks Anne for reminding us all about the original objective of PG -- making texts available for people to read, wherever, and with whatever equipment. OK, you've somewhat overstated the case, and I think by now we'd all agree that "8-bit" characters are important. But it is a shame that most of the geeks -- no offence, I count myself as one -- on this list, immediately skipped your main point to whine about the need for accents and foreign scripts. You guys can't seem to see the wood for the trees. Personally, I've seen the debate about XML (not to mention z.m.l.) somewhere before -- oh, wait up, it was on THIS list, what, about eight months back? And didn't Jon go and set up a pgxml list for that discussion to continue? And didn't that list go strangely quiet shortly thereafter? You can draw your own conclusions from that. Me? I decided that my own project -- building a library of high-quality HTML "web books" was more important than trying to get a room full of experts to agree on even basic things like should we use TEI-lite or invent our own DTD. Basically, Anne is right -- who cares about this stuff? Only the few enthusiasts on this list. Most users of PG don't go around grumbling about the lack of XML or the ability to output as PDF. They're just stoked to be able to find the text online. And on the subject of PDF, I agree with Anne -- it sucks. Why? Well, apart from being too fuzzy to read on screen, it locks the user into a format that's chosen by the engine which created it. Want a different font or type size? Too bad, whoever wrote the XSLT decided that for you. 
But create an HTML file, properly, and then the user can do what they like with it. Want to print it out in Georgia 24pt? No problem. Your choice. Anyway ... think I'll go and convert a few more books now. Steve Gutenberg9443@aol.com wrote: > ... > > A basic problem in this entire discussion is that there > are a lot of people here who are program-happy, > as opposed to computer-happy. I'm computer- > happy, but like the vast majority of people who > use Gutenberg, I'm really not interested in umpteen > different programs. I just want a book I can read. ... -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From ke at gnu.franken.de Tue Oct 19 18:38:52 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Tue Oct 19 19:35:44 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: <4175B419.6030301@adelaide.edu.au> (Steve Thomas's message of "Wed, 20 Oct 2004 10:10:57 +0930") References: <4175B419.6030301@adelaide.edu.au> Message-ID: Steve Thomas writes: > But create an HTML file, properly, and then the user can do what > they like with it. Want to print it out in Georgia 24pt? No > problem. Your choice. HTML isn't the best choice if you are interested in printing. Define "properly created" :) XML plus a customizable stylesheet (XSL or DSSSL) is better. For those who do not want to create a printable PDF file on their own offer a pre-generated PDF file. 
-- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From ke at gnu.franken.de Tue Oct 19 18:57:10 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Tue Oct 19 19:35:45 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: <41758BC1.8040300@bohol.ph> (Jeroen Hellingman's message of "Tue, 19 Oct 2004 23:48:49 +0200") References: <1d4.2d17520f.2ea60b0d@aol.com> <41758BC1.8040300@bohol.ph> Message-ID: Jeroen Hellingman writes: > Since much of the difficult stuff of XML will eventually be hidden from > users. Future versions of layout programs will probably be able to read > a thing coded in TEI directly (doing an XSLT transform to some internal > format), and format it nicely according to some defaults. It already works for simple books using CSS; here is an example (a text by Ludwig Tieck in German): http://www.gnu.franken.de/Tieck/Werke/dichterleben/ Sorry for the strange layout - it is just a test. It works with Mozilla 1.6 and better. Of course, the other features you mentioned are more important. -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From ke at gnu.franken.de Tue Oct 19 19:08:36 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Tue Oct 19 19:35:48 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041019221907.EAF404BDA9@ws1-1.us4.outblaze.com> (D. Starner's message of "Tue, 19 Oct 2004 14:19:07 -0800") References: <20041019221907.EAF404BDA9@ws1-1.us4.outblaze.com> Message-ID: "D. Starner" writes: >> headers should be hotlinked from a table of contents, >> and they should hotlink _back_ to the table of contents > > Don't waste time linking them back; Bowerbird's the only > one who wants that feature. > >> headers should hotlink to the previous header, and the next > > Why, especially if they're linked to the table of contents? Here Bowerbird is right - that's a nice feature and, of course, it has been supported by HTML for ages (check out the "link" element). 
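The "link" element mentioned above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not anything posted to the list: the function name `nav_links`, the chapter file names, and `toc.html` are all hypothetical. It relies only on the `contents`, `prev`, and `next` link types, which HTML 4 defines for exactly this kind of navigation.

```python
from html import escape


def nav_links(chapters, index, toc_href="toc.html"):
    """Return HTML <link> elements for the chapter at position `index`.

    `chapters` is a list of chapter file names in reading order
    (hypothetical names; any URLs would do).
    """
    links = ['<link rel="contents" href="%s">' % escape(toc_href, quote=True)]
    if index > 0:
        # Every chapter except the first points back to its predecessor.
        links.append('<link rel="prev" href="%s">'
                     % escape(chapters[index - 1], quote=True))
    if index < len(chapters) - 1:
        # Every chapter except the last points forward to its successor.
        links.append('<link rel="next" href="%s">'
                     % escape(chapters[index + 1], quote=True))
    return links


if __name__ == "__main__":
    for line in nav_links(["ch1.html", "ch2.html", "ch3.html"], 1):
        print(line)
```

A browser that understands these elements can offer "previous", "next", and "contents" navigation without any visible links being added to the text itself; browsers that ignore them lose nothing.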
>> display of line-numbers in poems should be optional > > Again, this is all about the user-interface. Again, why does it > matter? Do we really have a whole lot of people desperately > crying out not to see the line-numbers on their poetry? It is easy to solve this issue using different CSS stylesheets. >> special treatment of each character's dialogue in a play >> (e.g., each rendered in a distinct color) should be an option > > Again, this is all about the user-interface. The underlying > principle is sound. Yes, user-interface issue (-> CSS). >> if the e-book is replicating an existing paper-book, >> then the page-numbers from that p-books should be >> available, with a user option as to their display, and > > Yes. CSS. -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From skip at nextra.sk Tue Oct 19 19:41:45 2004 From: skip at nextra.sk (Skippi) Date: Tue Oct 19 19:42:16 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: References: <4175B419.6030301@adelaide.edu.au> Message-ID: <1169277476.20041020044145@nextra.sk> Hello Karl! Wednesday, October 20, 2004, 3:38:52 AM, you wrote: > HTML isn't the best choice if you are interested in printing. IMO: HTML isn't the best choice if you are interested in anything. This obsolete format should not be considered if even mentioned. Use XHTML + CSS instead, if you are allergic to XML. With properly written XHTML and customizable CSS user can do what ever he wishes with the files and the things are still as they should be. -- Skippi mailto:skip@nextra.sk From shalesller at writeme.com Tue Oct 19 19:58:44 2004 From: shalesller at writeme.com (D. Starner) Date: Tue Oct 19 19:58:59 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020025844.2E7264BDA9@ws1-1.us4.outblaze.com> Karl Eichwalder writes: > It is easy to solve this issue using different CSS stylesheets. I have three problems with using CSS. 
First, and most fundamental, if you want to turn page numbers and line numbers on and off, and offer a choice of 5 different background colors, you need 20 different CSS options. Another feature will at least double the number of files. Secondly, it doesn't work on many web browsers, and I don't think that lynx and friends have any intent of ever supporting CSS. Lastly, it's never struck me as particularly user-friendly. If you can point me to a place where it's actually used and it's easy to change without being a computer science major, I'd appreciate it. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From hacker at gnu-designs.com Tue Oct 19 20:00:04 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Tue Oct 19 20:00:49 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: <1169277476.20041020044145@nextra.sk> References: <4175B419.6030301@adelaide.edu.au> <1169277476.20041020044145@nextra.sk> Message-ID: > IMO: HTML isn't the best choice if you are interested in anything. > This obsolete format should not be considered if even mentioned. Use > XHTML + CSS instead, if you are allergic to XML. XHTML is HTML 4.0 designed to work as an XML application. In fact, there aren't a lot of differences between HTML and XHTML. Of course, XHTML, HTML, and XML are all "children" of SGML anyway, so we're all talking about generalized markup in some form or another. You can't set up a rigid set of rules that will apply across all past, present and future versions of printed works in electronic format. Whatever format you choose to use, must be extensible enough to scale for future capabilities, as well as the ability to handle the capabilities of documents created in the past. > With properly written XHTML and customizable CSS user can do what > ever he wishes with the files and the things are still as they > should be. Almost whatever s/he wishes. 
There are limitations in every format, depending on how broadly you want to consider using it. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hacker at gnu-designs.com Tue Oct 19 20:05:02 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Tue Oct 19 20:05:49 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020025844.2E7264BDA9@ws1-1.us4.outblaze.com> References: <20041020025844.2E7264BDA9@ws1-1.us4.outblaze.com> Message-ID: > I have three problems with using CSS. First, and most fundamental, > if you want to turn page numbers and line numbers on and off, and > offer a choice of 5 different background colors, you need 20 > different CSS options. Another feature will at least double the > number of files. Not quite... unless you're not using CSS properly. With the proper use of CSS selectors, hidden and visible properties, and other attributes and classes, you can make this very small and tight. It just takes a bit of up-front planning to get it all working right. Most people don't use CSS in any sort of optimized format. > Secondly, it doesn't work on many web browsers, and I don't think > that lynx and friends have any intent of ever supporting CSS. Hence the "C" part of the CSS spec. It should always degrade properly to continue to work with the lesser capabilities of older browsers or browsers that don't support the full rich CSS styles. This includes PDAs, cellphones, WAP devices, screen scrapers, syndicated feeds, text-based browsers, text-to-speech devices, and so on. > Lastly, it's never struck me as particularly user-friendly. If you > can point me to a place where it's actually used and it's easy to > change without being a computer science major, I'd appreciate it. Why should you want to "change" the CSS? Maybe I'm missing your goals here. Can you try to explain this a bit further, perhaps by providing some examples you've done that solve/show these problems? David A. 
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From shalesller at writeme.com Tue Oct 19 20:10:50 2004 From: shalesller at writeme.com (D. Starner) Date: Tue Oct 19 20:11:06 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041020031050.D502F4BDA9@ws1-1.us4.outblaze.com> Steve Thomas writes: > Most users of PG don't go around grumbling about the lack of XML > or the ability to output as PDF. They're just stoked to be able > to find the text online. That's why they're users of PG. If they needed XML or PDF, they go elsewhere. And frankly, I've heard many complaints about how hard it is to process PG texts and how much information is lost. I've personally found it a pain to produce good printed versions of the PG etexts. I think it a bad idea to start saying that "Most users ... don't go around grumbling", because many of those who would grumble will go elsewhere, and many of those who do grumble don't do so where we can hear. From shalesller at writeme.com Tue Oct 19 20:16:03 2004 From: shalesller at writeme.com (D. Starner) Date: Tue Oct 19 20:18:41 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020031603.B46C74BDA9@ws1-1.us4.outblaze.com> "David A. Desrosiers" writes: > With the proper use of CSS selectors, hidden and visible > properties, and other attributes and classes, you can make this very > small and tight. I don't care how it works internally; how does it look to the users? > It should always degrade properly to continue to work with the > lesser capabilities of older browsers or browsers that don't support > the full rich CSS styles. This includes PDAs, cellphones, WAP devices, > screen scrapers, syndicated feeds, text-based browsers, text-to-speech > devices, and so on. 
So this won't work for many users, in fact the group of users that would most likely want to turn off line numbers on poetry. I think that's important to remember. > Why should you want to "change" the CSS? Maybe I'm missing > your goals here. Can you try to explain this a bit further, perhaps by > providing some examples you've done that solve/show these problems? Again, I'm not looking at it internally. So far, all I've seen with CSS forces you to change the HTML code to change things. How does this look to the end user? From stephen.thomas at adelaide.edu.au Tue Oct 19 21:54:30 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Tue Oct 19 21:54:54 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: References: <4175B419.6030301@adelaide.edu.au> Message-ID: <4175EF86.8060108@adelaide.edu.au> As usual, people have missed the point of the original post (Anne's) which was that we need to remember the *user* -- that guy in Africa with only 2 hours of electricity each day. Anne suggested (I think) that he uses a laptop, but more likely he's using a worn-out IBM 486 running Windows 3, so all this geek-talk about XML and XSLT etc. is irrelevant to him -- he'll be lucky if he can run a standard web browser. [I can't believe that people still think they're doing good by shipping old 486's to Africa -- but apparently it's true. I recently donated some old Pentium II's to a charity, and they couldn't believe their luck.] Anyway: Karl Eichwalder wrote: > Steve Thomas writes: > > >> But create an HTML file, properly, and then the user can do >> what they like with it. Want to print it out in Georgia >> 24pt? No problem. Your choice. > > > HTML isn't the best choice if you are interested in printing. > Define "properly created" :) XML plus a customizable > stylesheet (XSL or DSSSL) is better. 
For those who do not > want to create a printable PDF file on their own offer a > pre-generated PDF file. > My definition of "properly created" HTML would be HTML4 strict, plus CSS. I was trying to avoid obvious detail. And HTML is the *best* choice for printing if you don't have the in-depth knowledge about XML/XSL etc. or the tools to make that happen. Anyone with IE6 can make a pretty good print of my HTML books, straight from the browser. Skippi wrote: > This obsolete format should not be considered if even > mentioned. Use XHTML + CSS instead, if you are allergic to > XML. With properly written XHTML and customizable CSS user > can do what ever he wishes with the files and the things are > still as they should be. > XHTML is -- for practical purposes -- the same as HTML 4 strict, except that it enforces good practice, whereas HTML allows the author some latitude. The important difference, to a user, is that HTML is pretty much guaranteed to work in all browsers, whereas XHTML can be "difficult" in some circumstances -- e.g. if you include the header, it can foul up IE6. A while ago, I started converting all my ebooks to XHTML, but immediately ran into problems that weren't worth my time to fix. One day, browsers will commonly deal correctly with XML of any type, with an appropriate style sheet. Right now, HTML is the format that works best. This is, I know, very boring to those of us who like playing with the latest gizmos and formats. But the reality is, if you want the widest possible audience, you've got to give them the format that's easiest for them. For a more detailed discussion of this topic, see the archives, about January this year if memory serves, where you'll probably find me -- and you -- saying much the same things. 
-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From tb at baechler.net Tue Oct 19 23:55:02 2004 From: tb at baechler.net (Tony Baechler) Date: Tue Oct 19 23:54:11 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <20041020031050.D502F4BDA9@ws1-1.us4.outblaze.com> Message-ID: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com> At 07:10 PM 10/19/2004 -0800, you wrote: >Steve Thomas writes: > > Most users of PG don't go around grumbling about the lack of XML > > or the ability to output as PDF. They're just stoked to be able > > to find the text online. > >That's why they're users of PG. If they needed XML or PDF, they >go elsewhere. And frankly, I've heard many complaints about how >hard it is to process PG texts and how much information is lost. I don't want to add to the flame war here, but I can say this, which has been said here before. Sometimes I will find a PG text which I would like more information about, so I will go to google and search for it. In almost all cases, I have found tons of sites which somehow convert the books into html or a similar format. blackmask.com immediately comes to mind but there are lots of others. Many don't give credit to PG at all. My point is that yes, I agree with gutenberg9443 in that I would much rather have plain text first and worry about the rest later, but many people don't need to complain to PG about plain text only for the simple reason that they can look for almost anything on google and find a nicer formatted version. I would like to see PG eventually go to xml not because I particularly like the format but because the new DAISY standard for digital talking books for the blind uses a form of xml. It should, in theory, be possible to convert html to DAISY, but how well that would work I don't know. 
If anyone wants to analyze a set of DAISY files, go to http://bookshare.org/ and search for an early PG title. I say "early" because they apparently quit adding the newer titles. I think there might be a demo link on there just for public domain books. I will make one other comment on accents. Yes, I can see the importance of 8-bit files. I have a local mirror of almost all of PG on my system and I finally switched to getting 8-bit files only of works in non-English. However, since I am blind and I read with speech, the accents really don't matter since the synthesizer doesn't pronounce them anyway. If it sees a letter in the high ASCII range, it skips it. This is especially bad, for example, with the works of Tolkien because accents are used so heavily. From Bowerbird at aol.com Wed Oct 20 00:16:06 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 00:16:30 2004 Subject: [gutvol-d] re: e-books for blind people Message-ID: <1f4.13259f2.2ea76ab6@aol.com> tony, my "viewer" program has text-to-speech. right now i've just turned it on for the mac, because speech synthesis is so easy there... but if you are on windows, and you're willing to give me feedback on it, i'll do the extra work to make windows work too, if that's what you own, as the headers from your e-mail would indicate... accessibility is _so_ very important to society! while text-to-speech is _vital_ for blind people, i also think a lot of sighted people will come to appreciate it as well, so this is a very important arena to me, and i'd appreciate your help with it very much. think of it as your own private app! :+) for instance, i can program around that problem where accented letters are skipped, and any other glitches too, so the plain-text files, as they are, will be as useful to you as any daisy file would be. 
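One way a reader program might "program around" skipped accented letters is to fold them to their unaccented base letters before handing text to the speech synthesizer. The sketch below is an assumption about how that could be done, not code from the viewer-program discussed here; the function name `fold_accents` is hypothetical. It uses Unicode decomposition from Python's standard `unicodedata` module and assumes the text has already been decoded from its 8-bit encoding.

```python
import unicodedata


def fold_accents(text):
    """Replace accented letters with their unaccented base letters.

    NFKD normalization splits a character such as 'ú' into the base
    letter 'u' plus a combining accent; dropping the combining marks
    leaves plain letters that a synthesizer limited to ASCII can read.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold_accents("Númenor"))
    print(fold_accents("Éowyn"))
```

Letters that have no decomposition (for example "ß" or "ø") pass through unchanged, so a real program would still need a small substitution table for those.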
because at the rate books are being scanned these days, nobody is going to have enough time to mark 'em all up -- i'm not surprised the daisy people can't keep pace -- so we have to find a way to make plain-text shine... as you say, it's workable right now, in fact often it's the best choice available, but i know we can make it better... -bowerbird From Bowerbird at aol.com Wed Oct 20 01:26:13 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 01:26:38 2004 Subject: [gutvol-d] is your head spinning? Message-ID: <1d5.2d0bf731.2ea77b25@aol.com> is your head spinning? do you feel like you're swimming through acronym soup? here's a little _overview_ to help you get your bearings, at least in regard to _my_ work, _my_ viewer-program, _my_ format, _my_ markup system, and _my_ philosophy. 1. the e-texts -- as they are now -- must be regularized. 2. i can write programs to do most of that automatically. 3. the results need to be checked for quality control, and 4. some missing information will need to be re-inserted. 5. once that is done, the files will be _finished_, in that 6. my viewer will present them as high-powered e-books. 7. users can push a button to create high-end .html files, 8. or save text as an .rtf file, or print out to paper or .pdf, 9. in a way that gives 'em customized high-quality output. 10. my program will do text-to-speech, and screenshots, 11. and let people explore the project gutenberg library, 12. and easily report errors they encounter in any e-text. 13. those error-correction reports will be automatically 14. routed to a system that presents all the material, so 15. a human only has to say "yes" to approve the mod, and 16. change-logs will be updated and a notice distributed. 17. this e-text standardization and ease of handling will 18. nurture a flowering of synergistic uses of the library 19. by an array of creative and imaginative programmers 20. that will engender a book-driven revolution in thought. 21. 
and everyone will live happily ever after. the end. -bowerbird From holden.mcgroin at dsl.pipex.com Wed Oct 20 02:44:33 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Wed Oct 20 02:44:12 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: <200410192326.i9JNPwNw027485@ms-smtp-05.tampabay.rr.com> References: <200410192326.i9JNPwNw027485@ms-smtp-05.tampabay.rr.com> Message-ID: <41763381.4010304@dsl.pipex.com> Joseph R. Gruber wrote: > For anyone that wants this rancid pudding you can grab it > from: > > http://www.josephgruber.com/pudding0727-exe.zip > http://www.josephgruber.com/pudding0727[1].osx.sit > > Leech away... Oh, btw, feel free to sue me or send a cease and desist to my > ISP. It's BrightHouse Networks in case you need any help... Wow, a Windows version _and_ an OSX version. Any suggestions on how to get it running on Linux? How about the source code? Cheers, Holden From holden.mcgroin at dsl.pipex.com Wed Oct 20 02:57:29 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Wed Oct 20 02:57:08 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <1169277476.20041020044145@nextra.sk> References: <4175B419.6030301@adelaide.edu.au> <1169277476.20041020044145@nextra.sk> Message-ID: <41763689.3040002@dsl.pipex.com> Skippi wrote: > This obsolete format should not be considered if even mentioned. Use > XHTML + CSS instead, if you are allergic to XML. With properly > written XHTML and customizable CSS user can do what ever he wishes > with the files and the things are still as they should be. I've heard some mobile devices with limited memory aren't able to parse XHTML files. Someone using an older browser may not either, plus the CSS may not even be of any use to them. I don't see why we should exclude these people. HTML has its uses just like XHTML, XML and PDF do. The ideal would be to have the texts in an XML-based format so transforming to standards-compliant HTML (and XHTML/PDF) is trivial. 
An XML->HTML converter could then be written by anyone who cares enough about it to do so. Cheers, Holden From Bowerbird at aol.com Wed Oct 20 03:29:15 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 03:30:07 2004 Subject: [gutvol-d] Why Bowerbird is a kook Message-ID: holden said: > Wow, a Windows version _and_ an OSX version. classic mac is also available, for people running that. > Any suggestions on how to get it running on Linux? i've announced that for release in 2005. you can make it happen faster by buying me a linux machine. or you can just wait for it... :+) > How about the source code? it's not available. even if it was, it's in realbasic. so you'd need to buy that program (for which source is not available) to compile it. you could also port it to some other language, i guess. but why not just rewrite it? the ideas are not all that complex. and i'm not that good of a programmer. in fact, i write spaghetti code. good spaghetti -- never crashes, always works. but hard to decipher... so rewriting it is what you'd end up doing anyway, i would guess... so grab a copy, see how it works, and start coding... -bowerbird From stephen.thomas at adelaide.edu.au Wed Oct 20 03:37:05 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Oct 20 03:37:35 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com> Message-ID: <41763FD1.7060004@adelaide.edu.au> At 07:10 PM 10/19/2004 -0800, somebody wrote: > Steve Thomas writes: >> Most users of PG don't go around grumbling about the lack >> of XML or the ability to output as PDF. They're just stoked >> to be able to find the text online. > > That's why they're users of PG. If they needed XML or PDF, > they go elsewhere. That's not the point. People don't go to PG thinking, "hmmm, I wonder if they have any XML files". They go looking for a book. 
If you want the text of a particular book, you'll use it in whatever format it comes in, so long as you have the software to handle that format. Nobody "needs" XML or PDF. They "need" the words of the book. Formats are secondary. One of the original ideals of PG was that there had to be a plain text version, on the basis that everyone had at least the tools to handle plain text. Nowadays, almost everyone has a web browser, so HTML comes second on the accessibility list. Very few people, I imagine, have the necessary tools to work with a TEI or SGML file. Now, there's nothing wrong with the notion of converting all PG texts to some XML master format, and then exporting that to umpteen other formats on demand. Practically though, that's a lot of work -- a *lot* of work -- and I don't yet see any signs of that progressing. Commercially (if one were to do this commercially -- this is a hypothetical), I'd estimate such a conversion task, for 10,000 books, to cost around $1,000,000 in salaries alone. Of course, there's always volunteer effort. But if volunteers are busy converting plain texts to XML so that they can be output as plain text (or HTML/PDF/...), does that reduce the effort put into scanning/OCR/proof-reading? Could it be better to put the PG effort into getting plain text editions out, and leave it to others to do the extra conversion to XML etc.? This is a model that has worked really very well for quite a few years, without complaint from any but a few tech-enthusiasts.
-- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From marcello at perathoner.de Wed Oct 20 03:41:10 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 03:41:34 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <4175B419.6030301@adelaide.edu.au> References: <4175B419.6030301@adelaide.edu.au> Message-ID: <417640C6.5020607@perathoner.de> Steve Thomas wrote: > Basically, Anne is right -- who cares about this stuff? That is the exact same answer Tim Berners-Lee got when he first presented his stuff. :-) "I can view a text file with "more", I just hit the space bar until I get to the right page. With your new-fangled format I need a -- what? -- browser? I don't have one. Why should I need a `browser' just to read some text?" > Only the few > enthusiasts on this list. Most users of PG don't go around grumbling > about the lack of XML or the ability to output as PDF. They're just > stoked to be able to find the text online. I can assure you that some do. Many start their own projects to markup PG texts, most of them dont go very far, though. One example: http://gutenberg.hwg.org/ > And on the subject of PDF, I agree with Anne -- it sucks. The only format we have today to bring mathematics to the unsophisticated user. If you don't want to install TeX, PDF is the only way. PDF is not so bad. It is widely accepted, well documented, free tools exist to generate PDFs. It has all the limitations of paper books, though. You cannot resize a printed book, or change the font, etc. Well same limitations for PDF. It hasn't stopped people from buying paper books. -- Marcello Perathoner webmaster@gutenberg.org From jgruber at tampabay.rr.com Wed Oct 20 03:50:58 2004 From: jgruber at tampabay.rr.com (Joseph R. 
Gruber) Date: Wed Oct 20 03:51:19 2004 Subject: [gutvol-d] Why Bowerbird is a kook In-Reply-To: Message-ID: <200410201050.i9KAooNw006283@ms-smtp-05.tampabay.rr.com> >> in fact, i write >> spaghetti code. >> good spaghetti -- >> never crashes, >> always works. >> but hard to >> decipher... Bulls__t -- Try this. Run the program and then Ctrl+Alt+Del it. Ooops...there goes a "never-crash". First time I've ever seen a program crash when you try to end task on it. ;) Joseph P.S. Why are you hiding the etexts in your code instead of making them separate .txt's? From jonathan_ingram at yahoo.com Wed Oct 20 04:00:09 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Wed Oct 20 04:00:32 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <417640C6.5020607@perathoner.de> Message-ID: <20041020110009.62059.qmail@web41702.mail.yahoo.com> --- Marcello Perathoner wrote: > Steve Thomas wrote: > > > > Basically, Anne is right -- who cares about this stuff? > > That is the exact same answer Tim Berners-Lee got when he first > presented his stuff. :-) Indeed. Over at DP we're progressing, in small baby-steps, toward producing decently marked up editions of all new material we produce. And when we at DP find a markup format we're comfortable with, then PG had better get comfortable with it as well, because we now produce the vast majority of all PG material. -- Jon Ingram From marcello at perathoner.de Wed Oct 20 04:13:02 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 04:13:26 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <41763FD1.7060004@adelaide.edu.au> References: <5.2.0.9.0.20041019234711.01f57a70@snoopy2.trkhosting.com> <41763FD1.7060004@adelaide.edu.au> Message-ID: <4176483E.4040403@perathoner.de> Steve Thomas wrote: > Nobody "needs" XML > or PDF. They "need" the words of the book.
Nobody "needs" television or cars. All they "need" is a newspaper and a pair of shoes. > Very few people, I imagine, have the necessary tools to work with a TEI > or SGML file. TEI is not intended as end-user format. End-users should grab the generated HTML file. > Now, there's nothing wrong with the notion of converting all PG texts to > some XML master format, and then exporting that to umpteen other formats > on demand. [...] I'd estimate > such a conversion task, for 10,000 books, to cost around $1,000,000 in > salaries alone. So think of what great value we would donate to the world. > Could it be better to put the PG effort into getting plain text editions > out, and leave it to others to do the extra conversion to XML etc.? This > is a model that has worked really very well for quite a few years, > without complaint from any but a few tech-enthusiasts. The main downside is that they mark up a *copy* of the text. When the original gets updated, the marked up copy falls out of sync and so all the generated formats. This problem can only be obviated if PG is to markup the original. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Oct 20 04:17:51 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 04:18:14 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <20041020110009.62059.qmail@web41702.mail.yahoo.com> References: <20041020110009.62059.qmail@web41702.mail.yahoo.com> Message-ID: <4176495F.8070805@perathoner.de> Jonathan Ingram wrote: > Indeed. Over at DP we're progressing, in small baby-steps, toward producing > decently marked up editions of all new material we produce. And when we at DP > find a markup format we're comfortable with, then PG had better get comfortable > with it as well, because we are now produce the vast majority of all PG > material. A simple XSLT will convert your format into TEI. 
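A minimal sketch of the kind of mapping such a conversion performs, written in Python rather than XSLT and therefore only an illustration of the idea, not the stylesheet itself. The home-grown tag names on the left of TAG_MAP (speech, stagedir, line) are hypothetical stand-ins for whatever DP settles on; sp, speaker, stage and l are TEI element names for drama and verse.

```python
import xml.etree.ElementTree as ET

# Hypothetical home-grown tag names (left) mapped to TEI element names (right).
TAG_MAP = {
    "speech": "sp",
    "speaker": "speaker",
    "stagedir": "stage",
    "line": "l",
}


def to_tei(xml_text: str) -> str:
    """Rename every element that has an entry in TAG_MAP; leave the rest alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = TAG_MAP.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


# Invented sample input, not taken from any PG or DP text.
SAMPLE = (
    "<speech><speaker>FAUST</speaker>"
    "<stagedir>alone</stagedir>"
    "<line>A first line of verse.</line></speech>"
)

print(to_tei(SAMPLE))
```

A one-to-one rename like this only works when the two vocabularies line up element for element; anything that needs restructuring (attributes, nesting, headers) is where a real transformation language would earn its keep.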
-- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Oct 20 05:05:36 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 05:05:43 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis Message-ID: <20041020120536.AE85F4F45F@ws6-5.us4.outblaze.com> Most of us use HTML as a shorthand for HTML with CSS (or XHTML, if you prefer). ----- Original Message ----- From: Skippi To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Re: jeroen's even-handed analysis Date: Wed, 20 Oct 2004 04:41:45 +0200 > > Hello Karl! > > Wednesday, October 20, 2004, 3:38:52 AM, you wrote: > > > HTML isn't the best choice if you are interested in printing. > > IMO: HTML isn't the best choice if you are interested in anything. > > This obsolete format should not be considered if even mentioned. Use > XHTML + CSS instead, if you are allergic to XML. With properly > written XHTML and customizable CSS user can do what ever he wishes > with the files and the things are still as they should be. > > -- > > Skippi mailto:skip@nextra.sk > From joshua at hutchinson.net Wed Oct 20 05:16:13 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 05:16:15 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com> Well, I use CSS to allow the user to switch between showing page numbers and not showing page numbers on the fly. In fact, both CSS style sheets are embedded within the main HTML file so that extra files are unnecessary. Here is a link to a Bay State Monthly issue that uses this feature... http://www.gutenberg.org/dirs/1/3/7/6/13761/13761-h/13761-h.htm In Mozilla-based browsers, you can switch between the style sheets very easily by clicking the icon in the lower left corner of the browser window.
The default setting is NOT to show the page numbers, since the majority of people couldn't care less how the original paper version was numbered. But for those that WANT to know, clicking the Original Page Numbers style will have all the original page numbers appear in the margin. Now Internet Explorer doesn't seem to have a way to switch styles on the fly, which is a shame, but it just defaults to not showing the page numbers. The page numbers are still in the HTML source, though, so someone who needs them can still get to the information if they absolutely have to. I am *hoping* that when the switch to XML/TEI happens in the future, this markup will transition fairly easily, too. Josh ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] Re: aspects of a well-done e-book Date: Tue, 19 Oct 2004 18:58:44 -0800 > > Karl Eichwalder writes: > > It is easy to solve this issue using different CSS stylesheets. > > I have three problems with using CSS. First, and most fundamental, > if you want to turn page numbers and line numbers on and off, and > offer a choice of 5 different background colors, you need 20 different > CSS options. Another feature will at least double the number of files. > > Secondly, it doesn't work on many web browsers, and I don't think > that lynx and friends have any intent of ever supporting CSS. > > Lastly, it's never struck me as particularly user-friendly. If you > can point me to a place where it's actually used and it's easy to > change without being a computer science major, I'd appreciate it.
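The figure of 20 in the quoted message is the product of the choices (2 x 2 x 5). A small Python sketch of that arithmetic, set against the count you would get if each choice were its own rule and the rules could be combined freely. The option names are invented for illustration, and whether a given browser lets a reader combine rules this way is an assumption, not something established in this thread.

```python
from itertools import product

# Invented option names; only the counts matter here.
PAGE_NUMBERS = ["pagenums-on", "pagenums-off"]
LINE_NUMBERS = ["linenums-on", "linenums-off"]
BACKGROUNDS = ["bg-1", "bg-2", "bg-3", "bg-4", "bg-5"]

# One complete stylesheet per combination of choices.
combined_sheets = list(product(PAGE_NUMBERS, LINE_NUMBERS, BACKGROUNDS))

# One small rule per individual choice, combined at view time.
independent_rules = PAGE_NUMBERS + LINE_NUMBERS + BACKGROUNDS

print(len(combined_sheets))    # 2 * 2 * 5 = 20
print(len(independent_rules))  # 2 + 2 + 5 = 9
```

Adding one more on/off feature doubles the first count to 40 but adds only two rules to the second, which is the gap between multiplying and adding options.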
From joshua at hutchinson.net Wed Oct 20 05:21:17 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 05:21:20 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020122118.117874F481@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "David A. Desrosiers" > > With the proper use of CSS selectors, hidden and visible > properties, and other attributes and classes, you can make this very > small and tight. It just takes a bit of up-front planning to get it > all working right. Most people don't use CSS in any sort of optimized > format. It sounds like you're just the expert I've been looking for. Can CSS somehow specify a "general" part of the style and then have "special" sections that modify it when that style is selected? For instance, a general section that sets up a margin size, justifies the text ... Then a style for showing page numbers and a section for not showing page numbers. Right now, my CSS header includes the general section twice, once for each style. If I could just have it once, it would cut down on the size of the header quite a bit AND allow me to add some new features to the CSS header without feeling bad about how huge the header is getting. Josh From hacker at gnu-designs.com Wed Oct 20 06:27:08 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 06:27:38 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com> References: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com> Message-ID: > In fact, both CSS style sheets are embedded within the main HTML > file so that extra files are unnecessary.
If you have a lot of texts, putting the stylesheet directly inside the HTML unnecessarily bloats the content, and removes one of the main benefits of CSS: being able to separate content from presentation. This means that if you have 1,500 works all formatted with an internal stylesheet, and you want to change the fonts for one class and add some borders around another, and add a selector for a new text class... you have to modify 1,500 stylesheets, instead of one. Yes, you could do all of that with a single perl one-liner, but why should you? > In Mozilla-based browsers, you can switch between the style sheets > very easily by clicking the icon in the lower left corner of the > browser window. Or, more correctly, by going to View -> Use Style, because there is no such selector in Mozilla or "Mozilla-based browsers" in the lower left-hand corner. At least not on my Unix, Linux and Windows versions of Mozilla (all current). > But for those that WANT to know, clicking the Original Page Numbers > style will have all the original page numbers appear in the margin. Why not also break up the pages with border-bottom on the bottom of each respective div, so they look like _actual_ pages? > Now Internet Explorer doesn't seem to have a way to switch styles on > the fly, which is a shame, but it just defaults to not showing the > page numbers. Well, that is mostly because MSIE is not a browser, at least not according to the standards body which defines how a web browser should function, from the socket level all the way on up to the presentation level. MSIE is a file manager, based on an ActiveX control that tries to render HTML. It supports HTML3.2 fully, "most" of HTML4, "some" of CSS1, hardly any CSS2, and CSS3... what's that? David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From skip at nextra.sk Wed Oct 20 06:48:21 2004 From: skip at nextra.sk (Skippi) Date: Wed Oct 20 06:48:35 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: References: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com> Message-ID: <1888158910.20041020154821@nextra.sk> Hello David! Wednesday, October 20, 2004, 3:27:08 PM, you wrote: > Or, more correctly, by going to View -> Use Style, because > there is no such selector in Mozilla or "Mozilla-based browsers" in > the lower left-hand corner. At least not on my Unix, Linux and Windows > versions of Mozilla (all current). It works perfectly on Firefox. The icon in the lower left corner is not instant, appears only when proper file is loaded. -- Skippi From jonathan_ingram at yahoo.com Wed Oct 20 06:57:50 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Wed Oct 20 06:57:53 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <4176495F.8070805@perathoner.de> Message-ID: <20041020135750.11303.qmail@web41728.mail.yahoo.com> --- Marcello Perathoner wrote: > A simple XSLT will convert your format into TEI. I'm not sure any use of XSLT can be called simple :). I've tried reading the spec, and I'm still recovering from the headaches. Fortunately there are easier ways to style (rather than transform) XML, using CSS. This is very well supported in all the Mozilla derivatives. While XSLT is something I'm going to have to look at eventually, for the moment I'm happy with CSS :). If you want an example of what I'm playing with at the moment: recently I and another DP volunteer have been kicking around some ideas for semantic markup of drama. While initially we were working with straight HTML, this quickly gets annoying, due to the amount of messing around with divs involved, and the need to consider how the output will be displayed on browsers with poor support for CSS. 
I've found it much easier to investigate options by working with an 'HTML+extra tags' markup. You can see my current working by looking at the blah.* files here: http://www.pgdp.net/phpBB2/viewtopic.php?p=94734 Save each file to the name given in its post subject heading. Any Mozilla derivative should show the .xml file styled in a way which almost exactly replicates the .html file. The source for the XML edition is much easier to read. Those of you who know TEI can probably tell that 'my' markup is very similar to TEI markup (although a little more verbose). Much of it was arrived at independently, which makes me more confident that this styling approach is relatively sensible. The example demonstrates markup of drama and poetry, with decent handling of line continuations and line numbers in poetry, and stage directions in drama. I've used the HTML 'edition' of this poetry markup for quite a while now in texts I've PPed for PG. Note that this is still a work in progress, so resist the temptation to criticise the minutiae of my CSS :). One of the other reasons I think a simple XML-style is useful is that we're currently planning to separate the proofreading rounds from the markup rounds at DP. Every page of a DP project currently goes through two 'rounds' of processing. In each round proofers are expected to not only detect OCR errors, but also add inline markup for italic, bold, material in non-Latin alphabets, etc., and add block markup for poetry, tables, and so on. This will be split into an initial two rounds only concerned with the text, plus an extra procedure to mark the text correctly. At the moment the markup we use is homegrown and kludgy -- we have a great opportunity at the moment to move to something more sensible, and I strongly believe that some simple XML-derivative is the markup we need. I'm even more convinced of the utility of XML for DP now that I've seen how easy it is to style it.
One of the problems of relying on something like XSLT is that it can be hard to go backwards from errors in the output to find the corresponding error in the original XML input. Being able to get direct feedback by viewing a styled version of the XML makes life much easier. -- Jon Ingram _______________________________ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com From joshua at hutchinson.net Wed Oct 20 07:43:59 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 07:44:02 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020144359.37C401096F3@ws6-4.us4.outblaze.com> ----- Original Message ----- From: "David A. Desrosiers" > > > In fact, both CSS style sheets are embedded within the main HTML > > file so that extra files are unnecessary. > > If you have a lot of texts, putting the stylesheet directly > inside the HTML unnecessarily bloats the content, and removes one of > the main benefits of CSS.. being able to separate content from > presentation. > > This means that if you have 1,500 works all formatted with an > internal stylesheet, and you want to change the fonts for one class > and add some borders around another, and add a selector for a new text > class... you have to modify 1,500 stylesheets, insteasd of one. Yes, > you could do all of that with a single perl one-liner, but why should > you? Well, in a perfect world, we could guarantee that the separate CSS file is accessible and life is good. Unfortunately, since we can't guarantee the CSS file is there, we decided to embed the CSS inside the HTML. It bloats it somewhat, but it is still smaller than the obligatory PG header information, so I don't feel TOO badly about it. And now we get a fully self-contained file. > > > In Mozilla-based browsers, you can switch between the style sheets > > very easily by clicking the icon in the lower left corner of the > > browser window. 
> > Or, more correctly, by going to View -> Use Style, because > there is no such selector in Mozilla or "Mozilla-based browsers" in > the lower left-hand corner. At least not on my Unix, Linux and Windows > versions of Mozilla (all current). > Now that I think about it, you may be right... In Firefox (which is what I have on this machine), there is no View -> Use Style menu option, but there is the icon in the bottom left corner. *shrug* > > But for those that WANT to know, clicking the Original Page Numbers > > style will have all the original page numbers appear in the margin. > > Why not also break up the pages with border-bottom on the > bottom of each respective div, so they look like _actual_ pages. > I have a big aversion to taking an electronic document and presenting it as "pages." First and foremost, it is ugly. Second, it is going to wreck havoc whenever the user wants to change font sizes, page sizes, etc. This method allows the "scholar" to have original page number references (which the scholars in the original discussion said was important) without tying the online layout to the limitations of the physical page layout. > > Now Internet Explorer doesn't seem to have a way to switch styles on > > the fly, which is a shame, but it just defaults to not showing the > > page numbers. > > Well, that is mostly because MSIE is not a browser, at least > not according to the standards body which defines how a web browser > should function, from the socket level all the way on up to the > presentation level. > > MSIE is a file manager, based on an ActiveX control that tries > to render HTML. It supports HTML3.2 fully, "most" of HTML4, "some" of > CSS1, hardly any CSS2, and CSS3... whats that? > > You are preaching to the choir here! The only thing I use IE for anymore is to check a new HTML document before posting it to make sure IE isn't mangling it TOO badly. 
JHutch From marcello at perathoner.de Wed Oct 20 08:25:29 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 08:25:37 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <20041020135750.11303.qmail@web41728.mail.yahoo.com> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> Message-ID: <41768369.6050204@perathoner.de> Jonathan Ingram wrote: > I'm not sure any use of XSLT can be called simple :). I've tried reading the > spec, and I'm still recovering from the headaches. Fortunately there are easier > ways to style (rather than transform) XML, using CSS. This is very well > supported in all the Mozilla derivatives. While XSLT is something I'm going to > have to look at eventually, for the moment I'm happy with CSS :). CSS, while simpler, is less powerful and gives you only HTML. > If you want an example of what I'm playing with at the moment: recently I and > another DP volunteer have been kicking around some ideas for semantic markup of > drama. What I've done with Faust is to reformat the text file in a sensible way and then use perl to automatically add TEI markup. I advise using a perl script to add the basic markup and refining the markup in a second markup-proofing step. > Those of you who know TEI can probably tell that 'my' markup is very similar to > TEI markup (although a little more verbose). Much of it was arrived at > independently, which makes me more confident that this styling approach is > relatively sensible. Why do people keep reinventing the wheel? TEI is perfectly good and designed explicitly for the task we have at hand. And it is a standard that is already in use in many e-libraries worldwide. I don't think we'll get PG to post texts in non-standard cooked-up formats. They are already making enough fuss over perfectly valid TEI files.
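A rough sketch of the first, automatic step of the two-step approach described above, in Python rather than perl; it is not the script used for Faust. It assumes a regularized text in which a speaker name stands alone on a line in capitals followed by a period, a stage direction stands alone in parentheses, and every other non-blank line is a verse line. Those conventions, the function name mark_up and the sample text are all assumptions made for illustration; sp, speaker, stage and l are TEI element names.

```python
import re
from xml.sax.saxutils import escape

# Assumed conventions of the regularized input (see the lead-in above).
SPEAKER = re.compile(r"^([A-Z][A-Z ]+)\.$")   # e.g. "FAUST."
STAGE = re.compile(r"^\((.+)\)$")             # e.g. "(alone)"


def mark_up(text: str) -> str:
    """Wrap speakers, stage directions and verse lines in TEI-style elements."""
    out = []
    in_speech = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no markup of their own
        speaker = SPEAKER.match(line)
        stage = STAGE.match(line)
        indent = "  " if in_speech else ""
        if speaker:
            if in_speech:
                out.append("</sp>")  # close the previous speech
            out.append("<sp>")
            out.append(f"  <speaker>{escape(speaker.group(1))}</speaker>")
            in_speech = True
        elif stage:
            out.append(f"{indent}<stage>{escape(stage.group(1))}</stage>")
        else:
            out.append(f"{indent}<l>{escape(line)}</l>")
    if in_speech:
        out.append("</sp>")
    return "\n".join(out)


# Invented sample, not a quotation from any PG text.
SAMPLE = """\
FAUST.
(alone)
A first line of verse.
A second line & more.

WAGNER.
A reply.
"""

print(mark_up(SAMPLE))
```

A pattern-based pass like this will misfire on anything that breaks the conventions (a verse line set entirely in capitals, say), which is what the second, human markup-proofing step is for.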
-- Marcello Perathoner webmaster@gutenberg.org From hacker at gnu-designs.com Wed Oct 20 08:28:28 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 08:29:39 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020144359.37C401096F3@ws6-4.us4.outblaze.com> References: <20041020144359.37C401096F3@ws6-4.us4.outblaze.com> Message-ID: > Well, in a perfect world, we could guarantee that the separate CSS > file is accessible and life is good. Unfortunately, since we can't > guarantee the CSS file is there, we decided to embed the CSS inside > the HTML. If you can guarantee the HTML is there, you can guarantee that the CSS is there. If the CSS is missing, it shouldn't "break" the usability of the HTML document. > It bloats it somewhat, but it is still smaller than the obligatory > PG header information, so I don't feel TOO badly about it. And now > we get a fully self-contained file. I don't understand the correlation. What does your CSS size have to do with the obligatory PG header size? > Now that I think about it, you may be right... In Firefox (which is > what I have on this machine), there is no View -> Use Style menu > option, but there is the icon in the bottom left corner. *shrug* For those that want to see this in a much-more expanded version, go to http://w3.org/Style/ in a Gecko-based browser, and click on the icon, or go to View -> Use Style, and try the various stylesheets listed there. > I have a big aversion to taking an electronic document and > presenting it as "pages." First and foremost, it is ugly. I submit that having page numbers in an unintuitive place (left-side margins, which doesn't appear in any printed work I can find), is just as ugly. > Second, it is going to wreck havoc whenever the user wants to > change font sizes, page sizes, etc. Having the border at the bottom of page 423 with a font size of 1.0em is still going to put the border at the bottom of the page when the font is 2.8em. 
I think I'm missing your allegory here. Can you explain? David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From joshua at hutchinson.net Wed Oct 20 08:48:00 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 08:48:05 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> ----- Original Message ----- From: "David A. Desrosiers" > > > Well, in a perfect world, we could guarantee that the separate CSS > > file is accessible and life is good. Unfortunately, since we can't > > guarantee the CSS file is there, we decided to embed the CSS inside > > the HTML. > > If you can guarantee the HTML is there, you can guarantee that > the CSS is there. If the CSS is missing, it shouldn't "break" the > usability of the HTML document. > I can guarantee that CSS file is in the PG directory. I can't guarantee that Joe Sixpack will download that when he grabs the HTML file. Overall, this makes things simpler for the consumer of the e-text. > > It bloats it somewhat, but it is still smaller than the obligatory > > PG header information, so I don't feel TOO badly about it. And now > > we get a fully self-contained file. > > I don't understand the correlation. What does your CSS size > have to do with the obligatory PG header size? > The CSS adds to the size of the etext, and on some level that feels ... wrong. I can't explain why, it just does. However, whenever PG posts a new e-text, they add a great big header and footer to the document for legal reasons. That thing absolutely dwarfs the CSS style header is size, so I don't feel AS badly as I might otherwise. It was mostly a throw-away comment, so don't read too much into it. > > Now that I think about it, you may be right... In Firefox (which is > > what I have on this machine), there is no View -> Use Style menu > > option, but there is the icon in the bottom left corner. 
*shrug* > > For those that want to see this in a much-more expanded > version, go to http://w3.org/Style/ in a Gecko-based browser, and > click on the icon, or go to View -> Use Style, and try the various > stylesheets listed there. > > > I have a big aversion to taking an electronic document and > > presenting it as "pages." First and foremost, it is ugly. > > I submit that having page numbers in an unintuitive place > (left-side margins, which doesn't appear in any printed work I can > find), is just as ugly. > The original page breaks were necessitated by the size of paper the publisher used. There is almost never a functional meaning to the page breaks in a book (except things like chapter breaks, which are easily marked up with horizontal rules or something to that effect). The page numbers in the margins are small and fairly unobtrusive, yet give the information in the easiest manner I could devise. Furthermore, they are completely hidden unless the reader WANTS to have that information. > > Second, it is going to wreck havoc whenever the user wants to > > change font sizes, page sizes, etc. > > Having the border at the bottom of page 423 with a font size > of 1.0em is still going to put the border at the bottom of the page > when the font is 2.8em. > > I think I'm missing your allegory here. Can you explain? > If you put visible page breaks into an HTML document, the user is going to expect that document to print to his printer at exactly those page breaks. Good luck. Also, page breaks would only make sense if you broke them into "visual" chunks. By that, I mean sizes that fit into one screen at a time -- no scrolling. However, if the user has a different resolution than you, it ain't gonna work. If he changes the font size, it ain't gonna work. Basically, using visual page dividers is getting into typography, something you want to avoid. Good HTML lets the browser and the user format the text. You just tell them what KIND of text it is.
The page numbers are not meant to give you visual indication of page breaks as much as contextual information regarding the original source... which some people find very important and as it's fairly easy for me to include that information without disturbing the other readers, I do. Josh PS None of this is an argument for my CSS based HTML over TEI-Lite. I would LOVE if we have TEI-Lite capabilities right now... But we don't. From hacker at gnu-designs.com Wed Oct 20 08:58:32 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 08:59:42 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> Message-ID: > I can guarantee that CSS file is in the PG directory. I can't > guarantee that Joe Sixpack will download that when he grabs the HTML > file. Agreed. If he wants a richer reading experience, he should grab the CSS. Pretty simple overall. If the reader wants to grab 200 etexts, its easier to let them know they need one .css file, than 200 identical css stanzas. I understand your needs, but you're un-CSS-ifying CSS. > The CSS adds to the size of the etext, and on some level that feels > ... wrong. I can't explain why, it just does. However, whenever PG > posts a new e-text, they add a great big header and footer to the > document for legal reasons. That thing absolutely dwarfs the CSS > style header is size, so I don't feel AS badly as I might otherwise. The PG header is considered "content", while CSS is considered "presentation". Again, I understand where you're coming from here, I just don't personally agree with it. I'm more of a purist, in the strictest sense of the word. ;) > If you put visible page breaks into an HTML document, the user is > going to expect that document to print to his printer at exactly > those page breaks. Good luck. This is why 'media="print"' exists in a CSS declaration. 
See here for more: http://www.w3.org/TR/REC-CSS2/media.html > Also, page breaks would only make sense if you broke them into > "visual" chunks. By that, I mean sizes that fit into one screen at > a time -- no scrolling. However, if the user has a different > resolution than you, it ain't gonna work. If he changes the font > size, it ain't gonna work. You can't translate a book into something read in a web browser, and retain the same functionality. The whole point of a scrollbar is to remove that constraint. I agree that unnecessarily long webpages (scrolling down for hundreds of pages) are a pain, but the alternative is much more painful. > The page numbers are not meant to give you visual indication of page > breaks as much as contextual information regarding the original > source... which some people find very important and as it's fairly > easy for me to include that information without disturbing the other > readers, I do. Right. Your page numbers don't correlate to anything, except an "Oh thats neat!" kind of feeling as you imagine what it would be like to be reading page 423 in the printed (dead-tree) version of that particular work. Page 423 in your numbering scheme is not the 423'd page as seen in my browser. > PS None of this is an argument for my CSS based HTML over TEI-Lite. > I would LOVE if we have TEI-Lite capabilities right now... But we > don't. I'm still gathering info and doing research on all of the alternatives presented thus far. TEI is one of the datapoints in my research. David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jon at noring.name Wed Oct 20 09:03:14 2004 From: jon at noring.name (Jon Noring) Date: Wed Oct 20 09:03:31 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <41768369.6050204@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> Message-ID: <167527178375.20041020100314@noring.name> Marcello wrote: > Jonathan Ingram wrote: >> I'm not sure any use of XSLT can be called simple :). I've tried >> reading the spec, and I'm still recovering from the headaches. >> Fortunately there are easier ways to style (rather than transform) >> XML, using CSS. This is very well supported in all the Mozilla >> derivatives. While XSLT is something I'm going to have to look at >> eventually, for the moment I'm happy with CSS :). > CSS, while simpler, is less powerful and gives you only HTML. CSS can be applied to any XML markup for viewing on web standards browsers, but because current CSS is limited there are certain HTML functions (tags) it simply won't be able to enable, or to enable cleanly. In CSS 'display', for instance, there is no value for identifying when some XML element represents an object/image, nor a hypertext link/anchor. Obviously, there is no 'display' value for an inline note because HTML never supported this. (We find markup for inline notes in the TEI and DocBook vocabularies. Note that CSS can move a span of inline text to the side in its own box, I've tested it out myself, but IE6 unfortunately does not recognize the needed CSS so in IE6 the inline note stays inline, not a good thing.) 
Of course, one can use XLink for object/image embedding and anchors (and XLink makes more sense anyway than using CSS since it is a vocabulary-independent means to embed objects and enable links), but then current web browsers are very deficient in XLink support (Mozilla has very limited XLink support -- haven't tested FireFox yet -- while IE and Opera have zero XLink support.) The OpenReader System 1.0, should it become a reality (and we are working on it -- we've made great strides in the last few weeks in garnering fairly high-level support), intends to fully support the more important parts of the XLink specification in version 1.0. We may also add one or more custom CSS values to 'display' to emulate links/ anchors, objects/images and inline notes (OpenReader will include a facility to open 'booklets' to display non-inline content, in part to support OEBPS which enables this cool ebook feature.) We also plan to investigate a future version of OpenReader to *natively* support TEI-Lite or some subset of TEI (including handling inline notes which will be trivial for OpenReader to handle.) We may even develop a next-generation styling language to address the deficiencies of current CSS2 and CSS3 but which doesn't have the complexity of XSLT/XSL-FO. The problem with CSS is its ties to the HTML paradigm and legacy support. In OpenReader, we are freeing ourselves from these legacy issues and thus can think outside the box and move on to the next generation web browser -- in essence to go beyond HTML. 
Jon Noring OpenReader: http://www.openreader.org/ From jonathan_ingram at yahoo.com Wed Oct 20 09:03:28 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Wed Oct 20 09:03:35 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <41768369.6050204@perathoner.de> Message-ID: <20041020160328.75110.qmail@web41704.mail.yahoo.com> --- Marcello Perathoner wrote: > I don't think we'll get PG to post texts in non-standard cooked-up > formats. Neither do I, and I don't want them to. Hopefully we'll either use TEI, or a markup which can be easily and losslessly transformed into TEI. However, there are a lot of people out there, including a lot of DP volunteers, who are unconvinced about the utility of XML, and one of the best ways to *fail* to change their mind is to plonk 1400 pages of documentation in front of them and say 'here's what you should be using, come back when you've finished reading' -- this is true even of TEI-lite, which has some foibles you have to see past (overly terse tags, for example -- at least to my mind :) ). I used to be one of the members of the 'undecided about XML' camp myself. I've gradually changed my mind, and I'm working on helping to change the minds of those I'm working with. I also don't just automatically accept the TEI-way as being best for the applications I wish to use it for -- so I've been developing my own structured markup which I'm happy with, and which happens to have converged very closely to the corresponding TEI markup. As I said in my previous email, this makes me much more confident to accept the use of TEI-style markup in areas where I haven't had the time to investigate alternatives. Those of you who aren't involved in DP will probably see nothing more about this until the day that 85% of new PG ebooks come with a TEI edition :). -- Jon Ingram
From joshua at hutchinson.net Wed Oct 20 09:09:40 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 09:09:46 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020160941.073CD109756@ws6-4.us4.outblaze.com> ----- Original Message ----- From: "David A. Desrosiers" > > Right. Your page numbers don't correlate to anything, except > an "Oh thats neat!" kind of feeling as you imagine what it would be > like to be reading page 423 in the printed (dead-tree) version of that > particular work. Page 423 in your numbering scheme is not the 423'd > page as seen in my browser. But that isn't its purpose. Its purpose, solely and completely, is to provide information about the original source. This isn't widely needed, so it is hidden. But, for the few that need that information (and again, this discussion was held in the DP forums and the scholarly types really clamored for this), the information is available. It is most definitely not there to provide you any indication of where page breaks would or should occur in your browser or in a printed copy. Josh From Bowerbird at aol.com Wed Oct 20 09:25:32 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 09:25:45 2004 Subject: [gutvol-d] Why Bowerbird is a genius Message-ID: gruber said: > Run the program and then Ctrl+Alt+Del it. > Ooops...there goes a "never-crash". control-alt-delete? is that how you end all your programs? :+) i recommend you try the "quit" button instead, or choose "quit" or "exit" under the file menu. but if that's a bug, which is entirely possible -- to be expected in fact -- in a beta-version, i'll fix it. but please take the bug-reports to the beta-test listserve, so they'll be logged. but don't bother with doing that _now_. as your version is almost 3 months old. a new version will be out very soon, and you should wait to test that one instead... > P.S.
Why are you hiding the etexts in your code > instead of making them separate .txt's? first things first. priorities. and control of the degrees of freedom for enhanced troubleshooting the first objective is to get the program solid. to focus on that, it's wise to use content that i _know_ has been correctly formatted in z.m.l. once the app is acting correctly and is stable, i'll turn to texts that might be marked up wrong, confident that if i get unexpected behavior, it's due to an incorrect text or a defective z.m.l. rule, in all likelihood, and not some bug in the program. but the e-texts aren't really "hidden" right now. if you scrutinize any version of the program in a file-viewer, you will see the e-texts inside, "hiding" in plain sight in their raw-ascii glory. they can be recovered, in full, easily, any time. -bowerbird From Bowerbird at aol.com Wed Oct 20 09:27:01 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 09:27:21 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <190.3192affa.2ea7ebd5@aol.com> jon said: > And when we at DP find a markup format we're comfortable with, > then PG had better get comfortable with it as well, because > we are now produce the vast majority of all PG material. "we make all your base." -bowerbird From Bowerbird at aol.com Wed Oct 20 09:35:39 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 09:35:54 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <146.364dc5ec.2ea7eddb@aol.com> josh said: > Now Internet Explorer doesn't seem to have a way > to switch styles on the fly, which is a shame a big shame, since i.e. still has -- what -- 93% of all surfers? 
-bowerbird From Bowerbird at aol.com Wed Oct 20 09:46:25 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 09:46:37 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <92.1829c179.2ea7f061@aol.com> the hacker (there's too many davids running around here) said: > If you have a lot of texts, putting the stylesheet directly > inside the HTML unnecessarily bloats the content, and > removes one of the main benefits of CSS.. being able to > separate content from presentation. > This means that if you have 1,500 works > all formatted with an internal stylesheet, > and you want to change the fonts for one class > and add some borders around another, and > add a selector for a new text class... you have to > modify 1,500 stylesheets, insteasd of one. > Yes, you could do all of that with > a single perl one-liner, but why should you? sometimes people act like c.s.s. is some magic new technology. in reality, it's just a stylesheet. and this trade-off between "one stylesheet for all your documents" versus "one for each" is a well-known dilemma to anyone who has used stylesheets. (my virginity in that arena went to ventura publisher in 1989.) the upshot is that each method has benefits and shortcomings, and you can only really make an informed decision relevant to your particular situation when you know all of them full-on. i look forward to the learning process -- over the next 5 years? -- as this comes to be appreciated by the c.s.s. community at large, and relish the time of its culmination, when we can start to make some real progress, instead of just re-grasping old knowledge... -bowerbird From hacker at gnu-designs.com Wed Oct 20 09:47:32 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 09:48:03 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <146.364dc5ec.2ea7eddb@aol.com> References: <146.364dc5ec.2ea7eddb@aol.com> Message-ID: > a big shame, since i.e. 
still has -- what -- 93% of all surfers? And decreasing every day. Users aren't using MSIE because it is the superior product; they're using it because they have no idea there are significantly more secure, functional, compliant browser alternatives out there, and because it came with their pee-cee, with a nice convenient icon right on their desktop. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hacker at gnu-designs.com Wed Oct 20 09:51:36 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 09:52:03 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <92.1829c179.2ea7f061@aol.com> References: <92.1829c179.2ea7f061@aol.com> Message-ID: > sometimes people act like c.s.s. is some magic new technology. I think you mean CSS. There is no such thing as "c.s.s.". David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Wed Oct 20 09:56:00 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 09:56:26 2004 Subject: [gutvol-d] re: homegrown and kludgy Message-ID: <8.59f797c8.2ea7f2a0@aol.com> ingram (there's too many jons running around here) said: > At the moment the markup we use is homegrown and kludgy yay! somebody actually said it, right out loud! > -- we have a great opportunity at the moment > to move to something more sensible, and > I strongly believe that some > simple XML-derivative > is the markup we need. "simple xml-derivative" is an oxymoron. > I'm even more convinced of the utility of XML for DP > now that I've seen how easy it is to style it. > One of the problems of relying on something like XSLT > is that it can be hard to go backwards from > errors in the output to find > the corresponding error in the original XML input. > Being able to get direct feedback by viewing > a styled version of the XML makes life much easier. yep, things will be a lot better when the current crop comes to realize some of the benefits of w.y.s.i.w.y.g.
(threw the baby out with the bathwater on that one...) if we could take the 5-year learning curve on _that_ and do it simultaneously with the one on stylesheets, that would be _really_ great, wouldn't it? -bowerbird From Bowerbird at aol.com Wed Oct 20 10:06:45 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 10:07:08 2004 Subject: [gutvol-d] re: what i've been suggesting all along Message-ID: <191.30f2ff61.2ea7f525@aol.com> marcello said: > What I've done with Faust is to > reformat the text file in a sensible way > and then use perl to automatically add TEI markup. bingo. now do that to the whole library. that's what i've been suggesting all along. > I advise to use a perl script to add the basic markup > and to refine the markup in a second markup-proofing step. i advise to write a program to add the basic markup (perl is fine, but so is any other tool someone uses), and then to refine your _program_ until you no longer need to do _any_ further refinements of its output. (or until it's easier to refine output than the program.) in the long run, doing things _that_ way will save you _tons_and_tons_ of unnecessary, one-time-only work. and, again, this is what i've been suggesting all along. now someone will come along and say, "that can't be done". and then i'll say "you're wrong, it can be done, i've done it." rinse and repeat, month after month, for nearly a year now. and no, i _won't_ do it for you, just to "prove" that i can... -bowerbird From Gutenberg9443 at aol.com Wed Oct 20 10:10:46 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 10:10:59 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <12c.4e97793b.2ea7f616@aol.com> In a message dated 10/19/2004 3:20:51 PM Mountain Standard Time, jeroen@bohol.ph writes: Also, since it is normally easier to throw something away than to add, I prefer to go to XML first, and then create HTML and Text from that. 
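The "XML first, then create HTML and Text from that" workflow quoted above can be sketched roughly as follows. This is a minimal illustration and not anyone's actual conversion process: the sample is a tiny TEI-Lite-like fragment made up for the example, it assumes un-namespaced `<head>`, `<p>` and `<hi>` elements, and the function names are hypothetical. A real transform (XSLT or otherwise) would have to handle far more of the DTD.

```python
import html
import textwrap
import xml.etree.ElementTree as ET

# Made-up sample in the spirit of TEI-Lite (no namespace assumed).
SAMPLE = """<text><body><div>
<head>Chapter I</head>
<p>It was a <hi>dark</hi> and stormy night.</p>
<p>The rain fell in torrents.</p>
</div></body></text>"""


def flatten(elem: ET.Element) -> str:
    """Return the text of an element and its children, whitespace-normalised."""
    return " ".join("".join(elem.itertext()).split())


def to_plain_text(xml_source: str, width: int = 72) -> str:
    """Headings become upper-case lines, paragraphs are wrapped to `width`."""
    root = ET.fromstring(xml_source)
    blocks = []
    for elem in root.iter():  # document order
        if elem.tag == "head":
            blocks.append(flatten(elem).upper())
        elif elem.tag == "p":
            blocks.append(textwrap.fill(flatten(elem), width=width))
    return "\n\n".join(blocks) + "\n"


def to_html(xml_source: str) -> str:
    """Headings become <h2>, paragraphs become <p>; inline markup is dropped."""
    root = ET.fromstring(xml_source)
    parts = []
    for elem in root.iter():
        if elem.tag == "head":
            parts.append(f"<h2>{html.escape(flatten(elem))}</h2>")
        elif elem.tag == "p":
            parts.append(f"<p>{html.escape(flatten(elem))}</p>")
    return "\n".join(parts) + "\n"
```

Both outputs come from the same source file, which is the point of the approach: information such as the `<hi>` emphasis is thrown away on the way to plain text (and, in this simplified sketch, to HTML as well), but it never has to be added back by hand.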
I have no problem at all with that, as long as the HTML and TXT also are posted. As to whoever it was who said that I am American- centered, yes, I am. The only language I speak besides English is Spanish. I'm learning to read French so that I can read LE FIGARO better, but will probably never pronounce it properly. Would like to learn German but I probably won't live that long. The fact remains that the MAJORITY of PGLAF's books are in English. So I'm talking about what is necessary IN ENGLISH. I did not mention PDAs at all. I mentioned laptops. The PDA I have refuses to speak to my computer or allow my computer to speak to it, so I'm rather limited in that direction. Anne From jonathan_ingram at yahoo.com Wed Oct 20 10:19:42 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Wed Oct 20 10:19:49 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <190.3192affa.2ea7ebd5@aol.com> Message-ID: <20041020171942.21949.qmail@web41713.mail.yahoo.com> --- Bowerbird@aol.com wrote: > jon said: > > And when we at DP find a markup format we're comfortable with, > > then PG had better get comfortable with it as well, because > > we are now produce the vast majority of all PG material. > > "we make all your base." In a very real sense, all PG's base do, indeed, belong to DP. -- Jon Ingram
From marcello at perathoner.de Wed Oct 20 10:29:42 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 10:29:50 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <167527178375.20041020100314@noring.name> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <167527178375.20041020100314@noring.name> Message-ID: <4176A086.8030101@perathoner.de> Jon Noring wrote: > The OpenReader System 1.0, should it become a reality (and we are > working on it -- we've made great strides in the last few weeks in > garnering fairly high-level support), intends to fully support the > more important parts of the XLink specification in version 1.0. We may > also add one or more custom CSS values to 'display' to emulate links/ > anchors, objects/images and inline notes (OpenReader will include a > facility to open 'booklets' to display non-inline content, in part to > support OEBPS which enables this cool ebook feature.) We also plan to > investigate a future version of OpenReader to *natively* support > TEI-Lite or some subset of TEI (including handling inline notes which > will be trivial for OpenReader to handle.) We may even develop a > next-generation styling language to address the deficiencies of > current CSS2 and CSS3 but which doesn't have the complexity of > XSLT/XSL-FO. The problem with CSS is its ties to the HTML paradigm and > legacy support. In OpenReader, we are freeing ourselves from these > legacy issues and thus can think outside the box and move on to the > next generation web browser -- in essence to go beyond HTML. May I ask how many people are working on this and what the time frame may be?
-- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Oct 20 10:35:00 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 10:35:17 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <68.4790aea6.2ea7fbc4@aol.com> the hacker said: > Your page numbers don't correlate to anything, > except an "Oh thats neat!" kind of feeling > as you imagine what it would be like to be > reading page 423 in the printed (dead-tree) > version of that particular work. > Page 423 in your numbering scheme > is not the 423'd page as seen in my browser. actually, having a solid congruence between our information as it exists as ink-on-paper and as it exists when displayed on-screen will ultimately prove to be far more crucial than merely a "oh, that's neat" kind of feeling. it doesn't have to be a 1-to-1 congruence, but some kind of major ratio is important. it might be 1-to-2, where one printed page equals 2 screens, such as is the case now for most monitors, in the sense that they'll nicely display _half_ of an 8.5*11-inch page. or it can be 2-to-1, where two printed pages equal 1 screen, such as is the case right now for most monitors, in the sense that they'll nicely display a 2-page spread of a 5*8 novel. and it _could_ be 1-to-1, too, as is the case now if we take our monitors and turn them from landscape to portrait, where they will display an 8.5*11-inch pagesize quite nicely. (go ahead and place a piece of paper up against your monitor right now, you'll see what i mean.) oh yeah, please don't some yahoo pipe up and say "but we can't expect that every screen will be the size of our desktop monitors". _of_course_ there will be a wide variety of screen-sizes, but the notion that a major-ratio congruence will be useful _still_ has the same credence and weight. one of the main reasons people resonate to .pdf is that the congruence between screen and paper makes them comfortable. they see equivalence. 
making the equivalence as transparent as possible is a powerful step to ease people toward e-books. and further, once we have "clipboard computers" -- a p.c. with the form-factor of a clipboard, with wireless web-access from anywhere -- there'll be mass movement to that screensize, exactly because it maps 1-to-1 on 8.5*11 paper. -bowerbird From jtinsley at pobox.com Wed Oct 20 10:35:28 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Wed Oct 20 10:36:21 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <41768369.6050204@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> Message-ID: <20041020173528.GB3366@panix.com> On Wed, 20 Oct 2004 17:25:29 +0200, Marcello Perathoner wrote: > >I don't think we'll get PG to post texts in non-standard cooked-up >formats. They are already making enough fuzz over perfectly valid TEI files. That last is, if not inaccurate, at least misleading. And I think you mean, by "PG" and "they" above, the WWs. So let's get down to it. Nobody has an objection to valid TEI texts, but valid TEI texts alone _are not enough_. An XML file that cannot be read (by an actual human) is as useful as a lock with no key. We need the key as well as the lock. I really no longer give any headroom at all to the approach "Post XML Now Because That Is The One True Way And We'll Figure Out How To Read It Later." If for no other reason, then because the most important part of the WW job is to check the texts before posting, and if we can't read it, we can't find the errors, and if we can't find the errors, we can't fix 'em. We WWs would all LOVE to have only one format (XML) uploaded, and generate all posting files from that. It would cut out an amazing amount of work and uncertainty. Further down the line, we can get to looking at posting just the XML, and generating other formats on the fly, but let's take one step at a time.
Considering that this step to date has already taken three years or so, that's not overly cautious! The first thing we need to do is get substantial agreement on a flavor of XML -- not ruling out the addition of future flavors, you understand, but we need to get at least one of them bedded down before we attack others. Teixlite seems to be the majority choice among those relatively few volunteers who are enthusiastic about XML, so let's say, for the purpose of this discussion, that that's the one we're working on. Next, we need a process for adding the header and footer for PG texts for the selected flavor. That shouldn't be a problem; if we can agree how to tag them, we can automate that. (We don't actually _have_ agreement about tagging them, but I can't believe that could end up being a problem, once we settle on the rest.) Next, we need a process, using open-source, cross-platform tools -- the standarder the better -- to convert that XML into, at a minimum, plain text and HTML. Other formats are welcome but optional. That process must work for _all_ teixlite files, not just ones that are specially cooked, using constraints not specified within the chosen DTD. Here's where we hit the rocks today. I give considerable credit to you, Marcello, and to Jeroen, as the only people I know of who have come up with at least partial answers and approaches to this. Maybe you have refined your processes, but the last time I tried, I couldn't put Jeroen's files through your process, and get the expected results. I think you have most of it down, though. Is it close enough to try again? I don't want to imply specific means from which this process is to be constructed. Obviously XSLT is one possible approach, but I certainly do not want to imply limitations on what that process should use. 
The only thing we must have -- both for our own internal practical purposes and for the use of future readers -- is that it should work reliably on _all_ texts that conform to the XML DTD chosen, be open source, and be cross-platform. A reader needs to be able to tweak the transform and re-run on her own desktop. And just re-reading that last, when I say "must work reliably on ALL texts" I do not mean to imply that the same XSLT must be used for all texts, though obviously that would be of benefit, if we can manage it. I've held just about every position on XML at one time or another, and I'm all XMLed out. I no longer believe it is worth spending my time on, until somebody (else!) solves the issues I've just laid out. jim From marcello at perathoner.de Wed Oct 20 10:37:53 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 10:38:01 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <20041020160328.75110.qmail@web41704.mail.yahoo.com> References: <20041020160328.75110.qmail@web41704.mail.yahoo.com> Message-ID: <4176A271.2010105@perathoner.de> Jonathan Ingram wrote: > and one of the best ways to *fail* to > change their mind is to plonk 1400 pages of documentation in front of them and > say 'here's what you should be using, Then don't do that. You don't plonk the IBM PC Technical Reference Manual (5000 pages) in front of your secretary if you want her to type a few pages in M$-Word. You just give her a "Word for Dummies" book and that is all she needs. She doesn't need to know about the difference between an AGP and a PCI-X bus. The full TEI spec explains the DTD and what not. Nobody needs that except the implementors. There are many gentle introductions to TEI-Lite floating around. And that's another advantage of using a standard. You don't have to write that stuff yourself.
-- Marcello Perathoner webmaster@gutenberg.org From Gutenberg9443 at aol.com Wed Oct 20 10:46:53 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 10:47:08 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <9a.174b3315.2ea7fe8d@aol.com> In a message dated 10/19/2004 3:36:41 PM Mountain Standard Time, jeroen@bohol.ph writes: Of course he does. How on Earth can he teach German or >French, or expect his students to read a book in a language >they are familiar with (in large parts of Africa, that >would be French), without the proper umlauts and grave >accents? > > >Even worse, many African languages are written with the >Latin alphabet, >but using additional letters, such as an F with a curl, >which, until >very recently weren't supported by most computers or >typewriters, and >thus conveniently replaced by their nearest >counterparts. You could have Lead, follow, or get out of the way. Can you supply a way to do this? If so, do it. If not, quit bellyaching. I have gotten a sufficient number of letters and emails from Africans to be aware that in many African countries, learning English is very desirable but is not done well. I proofread for PGLAF a book in French which had been translated into English but had maintained the French forms of a good many names, titles, and other words. As my husband speaks French fluently, I had him check everything I had done. It wound up being posted in two versions: one without the French characters and one with the French characters. As I had worked extremely hard to make sure the French characters were right, I felt sad when I tried to read the version without the French characters. But all the same, I'd rather that readers have that version than no version at all of the book. The principle of the greatest good for the greatest number doesn't mean let's throw out the lesser numbers. 
IF I AM WRITING IN ENGLISH OR READING IN ENGLISH I don't need the grave accents and the umlauts UNLESS I AM DOING SCHOLARLY WORK. I cannot reasonably express an opinion of how to do works in other languages because I don't speak those languages. I do know that books in English posted in TXT are readable to all English-speaking people, and that includes many people for whom English is their second or third language. Anne From marcello at perathoner.de Wed Oct 20 10:47:10 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 10:47:18 2004 Subject: [gutvol-d] Why Bowerbird is a genius In-Reply-To: References: Message-ID: <4176A49E.5090707@perathoner.de> Bowerbird@aol.com wrote: > a new version will be out very soon, and > you should wait to test that one instead... Don't fear. We haven't done anything else since you first announced your reader on 14 Feb 2003, 20 months ago. >> P.S. Why are you hiding the etexts in your code >> instead of making them separate .txt's? > > the first objective is to get the program solid. > to focus on that, it's wise to use content that > i _know_ has been correctly formatted in z.m.l. Malicious tongues would argue that you want to keep beta-testers from using their own ZML texts and discovering what a useless piece of crap your software is. If I want to read a new book I have to get a new reader? Even Micro$oft never went that far in Digital Restriction Management. -- Marcello Perathoner webmaster@gutenberg.org From shalesller at writeme.com Wed Oct 20 10:04:36 2004 From: shalesller at writeme.com (D.
Starner) Date: Wed Oct 20 10:48:11 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis Message-ID: <20041020170436.532794BDAA@ws1-1.us4.outblaze.com> Steve Thomas writes: > As usual, people have missed the point of the original post > (Anne's) which was that we need to remember the *user* -- that > guy in Africa with only 2 hours of electricity each day. I'm not spending as much time as I do with PG for him. I seriously doubt that he's interested in Ossian in Germany or Selections from Early Middle English. My target user is a scholar, whether a kid in high school, or a college student or professor or other person who may not have or may not be interested in waiting on interlibrary loan. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Gutenberg9443 at aol.com Wed Oct 20 10:48:08 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 10:49:40 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <13e.44a2533.2ea7fed8@aol.com> In a message dated 10/19/2004 3:48:35 PM Mountain Standard Time, jeroen@bohol.ph writes: I already have numerous benefits from working in XML, in that I can generate nice HTML files (that often need no touch-up at all) and reasonable plain ASCII for PG, but also have spelling checking on a per language base, extract all fragments in a certain language, create tables of contents, etc. on the fly, extract dublin core bibliographic records, and more. Good. Do it. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/32b50650/attachment-0001.html From Gutenberg9443 at aol.com Wed Oct 20 10:53:52 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 10:54:02 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: <199.31dbff61.2ea80030@aol.com> In a message dated 10/19/2004 5:00:14 PM Mountain Standard Time, shalesller@writeme.com writes: > index items should be linked to the place in the text, > and a backlink should be made as well, if at all possible A backlink from where? And why? I think we should use links only where they are explicit or at least loudly implicit in the original work. I do understand this one. If I can click the index number in the text to take me to the index entry, I then want to click something on the index entry that will take me back to the same place in the text. Ditto footnotes and endnotes, which I now do by inserting them at the end of the appropriate paragraph and double-indenting them. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/7b335f69/attachment.html From Bowerbird at aol.com Wed Oct 20 10:53:53 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 10:54:17 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <159.4250e0be.2ea80031@aol.com> the hacker said: > And decreasing every day. not nearly fast enough, though. :+) and given that today's machines will likely serve their owners' needs -- i.e., do e-mail -- for the next decade, it's gonna be a long haul. > Users aren't using MSIE because it is the superior product d'uh... ;+) > they're using it because they have no idea there are > significantly more secure, functional, compliant > browser alternatives out there there are a lot of things that users "don't know".
but unless you're willing to actually inform them -- which can be a _tremendously_ difficult job -- then you must accept that if you want to be of service to them, then you have to work within their limitations. project gutenberg wants to be of service to people; that's why it has been the most successful e-library. the alternative -- favored by most techies, it seems -- is to leave people behind. that's fine if you want to be a minority. i have been a mac user for a very long time, so that tells you where i stand on that matter personally. but know that people aren't going to be writing you letters thanking you for what you've done, like they do michael hart. > and because it came with their pee-cee, > with a nice convenient icon right on their desktop. monopolies suck, don't they... feel free to fight the power, but know that 85% of those 93% won't be bothered to follow you... -bowerbird From Bowerbird at aol.com Wed Oct 20 10:55:47 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 10:55:57 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <1b8.41e17b6.2ea800a3@aol.com> the hacker said: > I think you mean CSS. yes, that is exactly what i meant. how psychic of you to pick that up. ;+) -bowerbird From marcello at perathoner.de Wed Oct 20 10:57:02 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 10:57:10 2004 Subject: [gutvol-d] re: what i've been suggesting all along In-Reply-To: <191.30f2ff61.2ea7f525@aol.com> References: <191.30f2ff61.2ea7f525@aol.com> Message-ID: <4176A6EE.8030707@perathoner.de> Bowerbird@aol.com wrote: >> and then use perl to automatically add TEI markup. > > bingo. now do that to the whole library. > that's what i've been suggesting all along. You have been saying nothing of the kind. You said all markup was wasted because ZML was "two steps better" than XML. TEI is an XML application after all. > and no, i _won't_ do it for you, just to "prove" that i can... 
Until now you have only proven that you don't know what you are talking about. -- Marcello Perathoner webmaster@gutenberg.org From shalesller at writeme.com Wed Oct 20 10:15:40 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Oct 20 11:01:37 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041020171540.523E14BDA9@ws1-1.us4.outblaze.com> Steve Thomas writes: > That's not the point. People don't go to PG thinking, "hmmm, I > wonder if they have any XML files". They go looking for a book. > If you want the text of a particular book, you'll use it > whatever format it comes in, so long as you have the software to > handle that format. Nobody "needs" XML or PDF. They "need" the > words of the book. Formats are secondary. What if they have a page reference to the standard (or only) edition of the book? Then they "need" the page numbers. What if they have a speech synthesizer smart enough to do multiple languages, but they need the languages marked? Then they "need" language tagging. What if they "need" to process a table? Then they "need" a system that doesn't ASCII-format tables. (And, BTW, a speech synthesizer that just skips accented letters is just lame. Removing the accents could be done in one line of Perl or a dozen lines of Fortran.) > Could it be better to put the PG effort into getting plain text > editions out, and leave it to others to do the extra conversion > to XML etc.? This is a model that has worked really very well > for quite a few years, without complaint from any but a few > tech-enthusiasts. No, it doesn't work real well. The value of XML is in what it includes that plain text doesn't, and a lot of that is lost in the plain text version. You need the original book to fix that. Even with the original book, it can be a pain, whereas it's trivial to keep page numbers (for example) in the original processing. 
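The accent-removal step mentioned above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl or Fortran named in the message; the function name strip_accents is hypothetical, and the approach (Unicode decomposition, then dropping combining marks) is an assumption about one reasonable way to do it, not a description of any tool used by PG.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed.

    Hypothetical helper: decompose each character (NFD) so that
    e.g. 'é' becomes 'e' plus a combining acute accent, then drop
    every combining mark.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("Élève à l'école, über Öl"))
```

Letters that have no Unicode decomposition (for example ø or ß) pass through unchanged, so a lossy plain-ASCII edition would still need a separate replacement table for those.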
From Gutenberg9443 at aol.com Wed Oct 20 11:03:25 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 11:03:46 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: In a message dated 10/20/2004 12:54:25 AM Mountain Standard Time, tb@baechler.net writes: I would like to see PG eventually go to xml not because I particularly like the format but because the new DAISY standard for digital talking books for the blind uses a form of xml. Thank you for this input. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/080243af/attachment.html From shalesller at writeme.com Wed Oct 20 10:21:03 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Oct 20 11:06:40 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020172103.2DE244BDA9@ws1-1.us4.outblaze.com> "David A. Desrosiers" writes: > > a big shame, since i.e. still has -- what -- 93% of all surfers? > > And decreasing every day. > > Users aren't using MSIE because it is the superior product, > they're using it because they have no idea there are significantly more > secure, functional, compliant browser alternatives out there, and > because it came with their pee-cee, with a nice convenient icon right > on their desktop. And nothing's going to fix that in the foreseeable future. And what about us who surf the net through the services of a library, and have no option on which browser to use? I really think you all are dismissing IE and Lynx too quickly. We can't just support Mozilla, now or in the future. From shalesller at writeme.com Wed Oct 20 10:26:09 2004 From: shalesller at writeme.com (D.
Starner) Date: Wed Oct 20 11:10:53 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041020172609.C9E684BDAB@ws1-1.us4.outblaze.com> Gutenberg9443@aol.com writes: > So I'm talking about what is necessary > IN ENGLISH. Then let's not act like we're doing this for third-world countries, since many of them don't speak English. If we were doing this for third-world countries, we should be doing a lot more Spanish and French and Arabic and a bunch of other languages that we generally totally ignore. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Bowerbird at aol.com Wed Oct 20 11:12:13 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 11:12:29 2004 Subject: [gutvol-d] re: at a minimum, plain text and HTML Message-ID: <1e0.2cd9670c.2ea8047d@aol.com> jim said: > I no longer believe it is worth spending my time on, > until somebody (else!) solves the issues I've just laid out. thanks to jim for saying what needs to be said. now one specific point about one of the paragraphs in his post... > we need a process, using open-source, cross-platform tools > -- the standarder the better -- to convert that XML > into, at a minimum, plain text and HTML. > Other formats are welcome but optional. if you're willing to settle for plain-text and .html, then doing the files in plain-text and refining your text2html converter is _far_ more cost-effective. if -- sometime down the line -- the move to x.m.l. really is inevitable (and not just hyped to be that), you will find that your text2html converter can be improved so that it will convert to x.m.l., and you will have saved yourself an enormous tagging job... -bowerbird From hacker at gnu-designs.com Wed Oct 20 11:14:36 2004 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Wed Oct 20 11:15:05 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <159.4250e0be.2ea80031@aol.com> References: <159.4250e0be.2ea80031@aol.com> Message-ID: > there are a lot of things that users "don't know". > but unless you're willing to actually inform them -- which can be a > _tremendously_ difficult job -- then you must accept that if you > want to be of service to them, then you have to work within their > limitations. Speak for yourself, but I've successfully gotten 14 local businesses to switch completely over to Firefox on Windows for their primary browser, in the last 60 days. This includes every workstation running a browser with Internet connectivity in all 14 businesses. I'm doing my part. Are you? >> and because it came with their pee-cee, >> with a nice convenient icon right on their desktop. > monopolies suck, don't they... Actually, they don't even cross my radar, ever. > feel free to fight the power, but know that 85% of those 93% won't > be bothered to follow you... Nor do I care. I only care for the ones who are willing to make their lives, and the lives of others better. For the users who refuse to learn, to adapt, and to grow, they can stagnate and stay in their own nice warm puddle. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Gutenberg9443 at aol.com Wed Oct 20 11:19:47 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 11:19:59 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <15a.4173d96b.2ea80643@aol.com> In a message dated 10/20/2004 4:41:26 AM Mountain Standard Time, marcello@perathoner.de writes: Well same limitations for PDF. It hasn't stopped people from buying paper books. They've da** well stopped ME from buying paper books. I CAN'T READ THE BLOODY THINGS! I have about 800 paperback books right beside my bed that I CANNOT READ IN BED and most of them are in no other format. 
I read them in the living room, using a magnifying glass when necessary. Most of my hardcover books I can still read in the living room, but I can't read them in bed either. I'm to the point that I would far rather read a hundred-year-old book on screen than a brand new one on paper, even if it's a topic in which I am extremely interested. I'm really not interested in conversion from XTM or XTL or whatever it is, if you're expecting the reader to do the conversion. Back to my third-world schoolmaster with his donated 486 and a slow CD reader--if we send him a CD of PG books in English he can read them and he can use them to teach his students English, which will greatly improve their chances of finding decent work when they are adults. But he can do this only if the books are in TXT format. Please. I am not trying to start a flame war. I detest flame wars. I am simply returning, again and again, to Michael Hart's original vision. No matter what ELSE we do to the texts, we are betraying what makes PG special if we require everybody to have this program or that program which probably won't run on most obsolete or obsolescent computers. All this other stuff sounds grand. I wish I could understand it. But I can't. Neither can 99.9999999% of the other people who use PG. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/8424b090/attachment-0001.html From Gutenberg9443 at aol.com Wed Oct 20 11:28:13 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 11:28:30 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <29.642cce53.2ea8083d@aol.com> In a message dated 10/20/2004 9:48:33 AM Mountain Standard Time, joshua@hutchinson.net writes: The original page breaks were necessitated by the size of paper the publisher used.
There is almost never a functional meaning to the page breaks in a book (except things like chapter breaks, which are easily marked up with horizontal rules or something to that effect). Speaking as a writer, I strongly disagree. I often use page breaks as a transition, and most other fiction writers do the same thing. (That's where I learned it.) To keep page breaks in a TXT version, simply insert # # # at the left margin where the page break belongs. That way TXT isn't confused or confusing, and the reader can see that as a page break. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/a77536f1/attachment.html From hacker at gnu-designs.com Wed Oct 20 11:31:15 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 11:32:06 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020172103.2DE244BDA9@ws1-1.us4.outblaze.com> References: <20041020172103.2DE244BDA9@ws1-1.us4.outblaze.com> Message-ID: > I really think you all are dismissing IE and Lynx too quickly. We > can't just support Mozilla, now or in the future. There is this myth, and you just confirmed it again, supported by 100% of the people who hear that supporting MSIE is not a wise decision, that for some reason, that is interpreted as "won't" support MSIE. I see this in web development circles all the time, when I explain that I develop against the standards, and in Mozilla, and I test in 13 browsers, including MSIE. I _always_ get people who come back with "Why don't you support MSIE?". Apparently logic and clear thought aren't among their better traits. For some reason, the notion that I develop in Mozilla, using standards, somehow means I am not making code that would work in MSIE. Nothing could be farther from the truth. Just because I support Mozilla, does not mean I do NOT support MSIE.
That being said, if my code works in 13 browsers, and fails in MSIE, my code is not the problem. I do, however, refuse to add "hacks" to get MSIE to do what it should be doing anyway... following the standards. If the code works in MSIE, and breaks in Mozilla, MSIE is the problem. If the code works in Mozilla, and breaks in MSIE, MSIE is the problem. Which brings me to a great quote I found which is related to this exact issue of Microsoft intentionally ignoring the published standards: "Microsoft properly asserts that OpenOffice is not 100% compatible with their product. Microsoft, however, has apparently decided not to support the OpenOffice formats either, for which they have no excuse: the standards for OpenOffice documents are publicly available, whereas Microsoft makes it a habit to sue people for reverse engineering their own formats." Anyway, I don't want to turn this into a browser religious war. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Wed Oct 20 11:31:55 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 11:32:40 2004 Subject: [gutvol-d] Why Bowerbird is a genius Message-ID: <12c.4e9a7208.2ea8091b@aol.com> marcello said: > Malicious tongues would argue that malicious tongues would, would they? but _your_tongue_ wouldn't, would it? > Malicious tongues would argue that > you want to keep beta-testers from > using their own ZML texts once they have helped me locate all the bugs in the program, i'll actively _want_ beta-testers to run their own z.m.l. texts. until then, though, i don't want somebody reporting a "bug" in the program that is _actually_ due to their improper z.m.l. you have to do things in the correct order, that's all. people also have to keep in mind that the rules of z.m.l. are also "in-progress" at the same time, being continually refined, as conditions require, so there are many open parameters here. use of a constrained set of texts is the logical course of action. 
> and discovering what a useless piece of crap your software is. well, much better to learn that early from the beta copy than having to wait all that time for the release version, don't you think? :+) -bowerbird p.s. one definition: "condone... 2. to give tacit approval to: _by_his_silence,_he_seemed_to_condone_their_behavior_." -- page 278, random house webster's college dictionary... From joshua at hutchinson.net Wed Oct 20 11:41:00 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 11:41:09 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020184101.0F2849E980@ws6-2.us4.outblaze.com> I can hate IE and still support it. Honest! What I do, personally, is create a minimum functionality that will work in all browsers. Then, if I see a need that goes beyond that, and only some of the browsers support it, fine. As long as it doesn't degrade the minimum functionality in all browsers, I'm willing to add it. That is what the page numbers markup currently does. It hides the page numbers as the minimum, default behavior, but if you have a browser that supports it, you can see those page numbers appear. Similarly with poetry. It has features that allow the browser to rewrap nicely if there is a long line, if the necessary CSS support is there ... but if not, it still displays the poem with its normal indents, it just doesn't rewrap nicely for you. If I try something and it dies on one of the browsers, I take it back out or find another compatible way. In my case, at the very least, if you find something I've worked on that does NOT degrade gracefully in Lynx, etc.... Let me know. I consider that a bug in my work. Josh ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Subject: re: Re: [gutvol-d] Re: aspects of a well-done e-book Date: Wed, 20 Oct 2004 09:21:03 -0800 > > "David A. Desrosiers" writes: > > > > a big shame, since i.e. still has -- what -- 93% of all surfers?
> > > > And decreasing every day. > > > > Users aren't using MSIE because it is the superior product, > > they're using it because they have no idea there are significantly more > > secure, functional, compliant browser alternatives out there, and > > because it came with their pee-cee, with a nice convenient icon right > > on their desktop. > > And nothing's going to fix that in the foreseeable future. And what about > us who surf the net through the services of a library, and have no option > on which browser to use? > > I really think you all are dismissing IE and Lynx too quickly. We can't > just support Mozilla, now or in the future. From marcello at perathoner.de Wed Oct 20 11:43:14 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 11:43:23 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <190.3192affa.2ea7ebd5@aol.com> References: <190.3192affa.2ea7ebd5@aol.com> Message-ID: <4176B1C2.3010007@perathoner.de> Bowerbird@aol.com wrote: > "we make all your base." You should definitely learn to get your quotes right. all your base are belong to us http://www.catb.org/%7Eesr/jargon/html/A/all-your-base-are-belong-to-us.html -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Oct 20 11:46:04 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 11:46:14 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041020184605.DB96B2F95F@ws6-3.us4.outblaze.com> Let me assure you. NONE of us (not even bowerbird) is expecting Joe Sixpack to have the conversion tools loaded on his computer.
Whether the eventual process will have the whitewashers creating the TXT and HTML files from XML or the server building them on the fly, no one wants that burden to fall on the reader. There will always be plain text files available. Even the most hardened XML-phile among us isn't going to take that away. (Even if they refuse to ever actually read a text in plain-ascii format! ;) ) Josh ----- Original Message ----- From: Gutenberg9443@aol.com Please. I am not trying to start a flame war. I detest flame wars. I am simply returning, again and again, to Michael Hart's original vision. No matter what ELSE we do to the texts, we are betraying what makes PG special if we require everybody to have this program or that program which probably won't run on most obsolete or obsolescent computers. From Bowerbird at aol.com Wed Oct 20 11:46:45 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 11:46:59 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <55.645b1253.2ea80c95@aol.com> the hacker said: > Nor do I care. I only care for the ones who are > willing to make their lives, and the lives of others better. > For the users who refuse to learn, to adapt, and to grow, > they can stagnate and stay in their own nice warm puddle. fine. i have no problem with that. none whatsoever. take whatever attitude you want to, that's what i do. but michael hart's attitude -- which is the one that has made project gutenberg the most successful e-library in all cyberspace -- is to serve the trailing-edge user... and now _you_ -- like so many other e-book initiatives before, and i specifically include myself in that group -- find yourself coming here to use michael's e-text files for your own purpose. kind of ironic, isn't it? 
-bowerbird From joshua at hutchinson.net Wed Oct 20 11:49:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 11:50:00 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020184952.01B2AEDC4F@ws6-1.us4.outblaze.com> Are you sure we're talking about the same thing? By page break, I mean when you get to the bottom of the physical piece of paper. What you describe sounds more to me like what I usually refer to as a thought break ... a little white space or a graphic symbol between sections of text to indicate a scene transition or time passing, etc. Typically, in PG texts, those are marked with 5 asterisks (functionally equivalent to your # # #). Josh ----- Original Message ----- From: Gutenberg9443@aol.com Speaking as a writer, I strongly disagree. I often use page breaks as a transition, and most other fiction writers do the same thing. (That's where I learned it.) To keep page breaks in a TXT version, simply insert # # # at the left margin where the page break belongs. That way TXT isn't confused or confusing, and the reader can see that as a page break. Anne From Bowerbird at aol.com Wed Oct 20 11:56:32 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 11:56:48 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <197.31921c87.2ea80ee0@aol.com> marcello said: > You should definitely learn to get your quotes right. > all your base are belong to us except _that_ doesn't correspond to what ingram said... which is precisely why i reworked the phrase. and whether you rework gibberish "correctly" or not seems to be rather beside the point, not? at any rate, have a nice day, marcello. :+) -bowerbird From hacker at gnu-designs.com Wed Oct 20 11:57:02 2004 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Wed Oct 20 11:58:07 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <55.645b1253.2ea80c95@aol.com> References: <55.645b1253.2ea80c95@aol.com> Message-ID: >> Nor do I care. I only care for the ones who are >> willing to make their lives, and the lives of others better. >> For the users who refuse to learn, to adapt, and to grow, >> they can stagnate and stay in their own nice warm puddle. > and now _you_ -- like so many other e-book initiatives before, and i > specifically include myself in that group -- find yourself coming > here to use michael's e-text files for your own purpose. > kind of ironic, isn't it? ..only in the sense that you've taken my words completely out of context, and twisted them to suit a discussion that wasn't even part of the original reply. I'm rapidly tiring of this, and it's a waste of my time, as well as the time of others. If we're not moving forward, we're not moving, and that is never a wise thing to continue to expend effort upon. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Wed Oct 20 12:01:07 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 12:01:26 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <1da.2e1f99d3.2ea80ff3@aol.com> starner said: > The value of XML is in what it includes that plain text doesn't, > and a lot of that is lost in the plain text version. lost? or just not currently included? or even deliberately thrown out? consider carefully what your language implies, it might constrain you... meanwhile, i will point out once again that nobody has challenged my contention that i can represent all important book features using z.m.l. plain text, when formatted wisely, is far more powerful than you know.
-bowerbird From Bowerbird at aol.com Wed Oct 20 12:09:26 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 12:09:49 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <1f2.13d38da.2ea811e6@aol.com> joshua said: > NONE of us (not even bowerbird) is expecting Joe Sixpack > to have the conversion tools loaded on his computer. i expect that joe sixpack won't _need_ a "conversion tool" because my viewer will give him more e-book functionality from z.m.l. plain-text than any other format/viewer gives him. and if it happens he _does_ need a conversion for a good reason, then i'll build that conversion routine into my program for him... so i _do_ expect that he'll have a conversion tool on his computer, but i _also_ expect that he'll never find any good reason to use it... (in the long run, anyway. but until my viewer is ported to the pda, and to web-servers, an .html converter will probably be necessary.) -bowerbird From joshua at hutchinson.net Wed Oct 20 12:17:37 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 12:17:50 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041020191737.29D554F462@ws6-5.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > meanwhile, i will point out once again that nobody has challenged my > contention that i can represent all important book features using z.m.l. > plain text, when formatted wisely, is far more powerful than you know. Repeat after me... ZML ... IS ... NOT ... PLAIN ... TEXT! From Bowerbird at aol.com Wed Oct 20 12:18:39 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 12:18:56 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <1e2.2ca102ae.2ea8140f@aol.com> the hacker said: > only in the sense that you've taken my words completely out of context, and twisted them to suit a discussion that wasn't even part of the original reply. 
you said -- as clearly as it can be said -- that your attitude is to leave the trailing edge behind. i only said that's not what michael's attitude is. and it is a historical fact that it has been michael -- and _not_ all the people with your attitude -- that has created the best library in cyberspace so far. if you don't give him -- and his attitude -- that credit, you will (like so many others) fail to learn from history. > I'm rapidly tiring of this, and its a waste of my time, > as well as the time of others. you're a quick study, david. it took me 6 months to figure that out. you've gotten the message in one week. :+) > If we're not moving forward, we're not moving this place hasn't been able to move forward on x.m.l. in 3 years. if you read the archives, you'll see all these threads are re-runs. > and that is never a wise thing to continue to expend effort upon. righto. that's why i'm leaving shortly. i just came back for one last hurrah, to set the record straight one more time that "i told you so, and did it repeatedly..." -bowerbird From Bowerbird at aol.com Wed Oct 20 12:25:44 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 12:25:58 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <1dc.2e9d3a95.2ea815b8@aol.com> josh said: > ZML ... IS ... NOT ... PLAIN ... TEXT! of course it is. x.m.l. and h.t.m.l. and s.g.m.l. are too. it all reduces down to 1s and 0s. it's just that some formats are more or less readable in that state.
-bowerbird From marcello at perathoner.de Wed Oct 20 12:35:11 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 12:35:20 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020173528.GB3366@panix.com> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> Message-ID: <4176BDEF.7050008@perathoner.de> Jim Tinsley wrote: > Nobody has an objection to valid TEI texts, but valid TEI texts alone > _are not enough_. An XML file that cannot be read (by an actual human) > is as useful as a lock with no key. Not so. Having a TEI text posted would enable third-party developers to come up with their own converter solutions even if we didn't get very far with ours. There are a lot of people around who already convert the text files into other formats. Their jobs would get much easier. > I really no longer give any headroom at all to the approach "Post XML > Now Because That Is The One True Way And We'll Figure Out How To Read > It Later." If for no other reason, then because the most important > part of the WW job is to check the texts before posting, and if we > can't read it, we can't find the errors, and if we can't find the > errors, we can't fix 'em. A TEI text is basically a text file. So you can read it in any editor. If you use emacs you can also validate the TEI file against the DTD without leaving the editor. A perfectly valid TEI file with no spelling errors should be good enough to post. What you expect from us TEI developers is that we produce the 150% perfect solution before you even consider starting to post files. That is not the way software development works. And this attitude is in my opinion the main cause why we have gotten nowhere with TEI in the last 3 years. Let's start now with a version 0.0.1 of the TEI process. Of course at some later time we'll have to do all the posted files over again. Probably more than once.
But it's better than sitting here and playing with bowerbird because we are bored. > Next, we need a process, using open-source, cross-platform tools -- > the standarder the better -- to convert that XML into, at a minimum, > plain text and HTML. Other formats are welcome but optional. That > process must work for _all_ teixlite files, not just ones that are > specially cooked, using constraints not specified within the chosen > DTD. Here's where we hit the rocks today. TEI defines a standard way to extend the DTD. I used this standard way to extend the TEI DTD into what I called PGTEI. This is still a perfectly valid TEI DTD according to the TEI specs. > I don't want to imply specific means from which this process is to be > constructed. Obviously XSLT is one possible approach, but I certainly > do not want to imply limitations on what that process should use. The > only things we must have -- both for our own internal practical > purposes and for the use of future readers -- is that it should work > reliably on _all_ texts that conform to the XML DTD chosen, be open > source, and be cross-platform. A reader needs to be able to tweak the > transform and re-run on her own desktop. You misunderstand what a DTD is. It only gives you syntactic correctness. I can cook up a perfectly valid XHTML file which is semantically bogus:
[nested XHTML example; the tags were stripped in archiving, leaving only the element text: 1 / 1.1 / 1.1.1 / ...]
This is valid HTML (didn't bother to check) but will not render well. You cannot build a conversion tool that will produce good results on all syntactically valid TEI files, just as you cannot build a browser that will make sense out of semantically bogus HTML files. Furthermore TEI is geared towards marking up existing texts, so scholars can study the text without having to get the physical book. It is not so good as a master format for print processing. That's why I had to add some more tags and attributes to my DTD. (Which doesn't make any text that uses my DTD less standard, because TEI is expressly designed to be extensible. But I'm repeating myself.) > And just re-reading that last, when I say "must work reliably on ALL > texts" I do not mean to imply that the same XSLT must be used for all > texts, though obviously that would be of benefit, if we can manage it. So why not start posting texts marked up in PGTEI, which will by definition work well in my conversion chain? And at the same time start posting Jeroen's texts, which will convert fine in his chain? This way we could both start putting up an automatic online conversion chain. (The guy who did this already in Java has somehow vanished, so I think we have to start over again.) To start with, I will act as interim Post-Processor for people wanting to post PGTEI and pass on to you only the perfectly good ones. You'll just have to stick in the etext number where I put 5 asterisks. I claim the .pgtei file extension; Jeroen can claim whatever extension he sees fit for his files. So we can have both an alice30.pgtei and an alice30.jtei.
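To make the distinction in this message concrete, here is a minimal Python sketch of markup that parses cleanly but is structurally dubious. The sample document and the `heading_jumps` rule are illustrative assumptions and not part of TEI, PGTEI, or any PG tool. Note also that the standard-library parser used here only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed markup whose headings skip levels.
SAMPLE = """<body>
  <h1>Chapter 1</h1>
  <h4>A heading three levels too deep</h4>
  <p>Some text.</p>
  <h2>Section 1.1</h2>
</body>"""


def heading_jumps(xml_text):
    """Return (previous_level, level) pairs where a heading skips a level.

    This is an invented semantic rule, used only to show that a parser
    can accept a document that a human reader would call broken.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    jumps = []
    previous = 0
    for element in root.iter():  # document order
        tag = element.tag
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            if level > previous + 1:
                jumps.append((previous, level))
            previous = level
    return jumps


if __name__ == "__main__":
    print(heading_jumps(SAMPLE))
```

The parse succeeds, so a purely syntactic check would pass this file; only the extra rule notices the jump from h1 to h4. Any real conversion chain would need its own, probably much longer, list of such rules.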
-- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Wed Oct 20 12:47:51 2004 From: jon at noring.name (Jon Noring) Date: Wed Oct 20 12:48:21 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <4176BDEF.7050008@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> Message-ID: <103540655640.20041020134751@noring.name> Marcello wrote: > TEI defines a standard way to extend the DTD. I used this standard way > to extend the TEI DTD into what I called PGTEI. This still is a > perfectly valid TEI DTD according to the TEI specs. I probably missed it from one of your prior messages, but do you have your PGTEI documented anywhere? Have you put together an actual Schema/DTD which can be used to validate documents for validity to PGTEI? And a list of your custom vocabulary extensions? Also, another question to ask is if it is documented anywhere how Jeroen's version of TEI compares with your PGTEI? Thanks! Jon From marcello at perathoner.de Wed Oct 20 12:52:26 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 12:52:39 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <1e2.2ca102ae.2ea8140f@aol.com> References: <1e2.2ca102ae.2ea8140f@aol.com> Message-ID: <4176C1FA.6080706@perathoner.de> Bowerbird@aol.com wrote: > righto. that's why i'm leaving shortly. Promises. Promises. 
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Oct 20 12:55:49 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 20 12:55:59 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <103540655640.20041020134751@noring.name> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <103540655640.20041020134751@noring.name> Message-ID: <4176C2C5.6070801@perathoner.de> Jon Noring wrote: > I probably missed it from one of your prior messages, but do you have > your PGTEI documented anywhere? Have you put together an actual > Schema/DTD which can be used to validate documents for validity to > PGTEI? And a list of your custom vocabulary extensions? Start here: http://www.gutenberg.org/tei/ > Also, another question to ask is if it is documented anywhere how > Jeroen's version of TEI compares with your PGTEI? No. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Oct 20 12:56:29 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 12:56:45 2004 Subject: [gutvol-d] sudden surge in demand Message-ID: <8b.17ea1fb8.2ea81ced@aol.com> due to the sudden surge in demand, i've uploaded a new version of my beta-test viewer-program... it can be found in the files section of the yahoogroups listserve for the beta-test, which is under the name of "zml_talk". you can join that beta-test by subscribing via e-mail: zml_talk-subscribe@yahoogroups.com this upload is just "the daily build", and has _not_ been reviewed, so it should be considered as such... people who want to run a version _known_ to be stable should hold off on this. such a version will go up soon. additionally, no reports are necessary on this version, since a few of the features have yet to be implemented. but if you'd like to see the development of the program since the version uploaded on 7/27, it's there for you... 
sorry it's taken so long, but you know how things are. time flies when you're squashing bugs... -bowerbird From Bowerbird at aol.com Wed Oct 20 13:15:18 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 13:15:44 2004 Subject: [gutvol-d] Posting TEI Message-ID: <5b.5b84bd22.2ea82156@aol.com> marcello said: > A TEI text is basically a text file. right. if you could tell josh this, i'd appreciate it. > So you can read it in any editor. "read" might not be the best word-choice there. "view" is more appropriate, i would think, since the markup will often obscure the content itself. > If you use emacs you can also > validate the TEI file against the DTD > without leaving the editor. that's a nice thing to be able to do. but the .tei might be _valid_ and still not do what it's supposed to do, accurately reflect the underlying structure and therefore result in a correct rendering... because i know you're all curious, with the z.m.l. authoring-tool, people will be able to see exactly what the viewer-program will display, in a window alongside their edit window. if it doesn't look right in the display window, you make changes in the edit window until it does. the rules are so simple that this is very easy to do. > A perfectly valid TEI file with no spelling errors > should be good enough to post. only if it's marked up _correctly_, though, right? > So we can have bith an alice30.pgtei and an alice30.jtei. that sounds like loads of fun... twice as much, at least! -bowerbird From hacker at gnu-designs.com Wed Oct 20 13:27:40 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 13:29:08 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <1e2.2ca102ae.2ea8140f@aol.com> References: <1e2.2ca102ae.2ea8140f@aol.com> Message-ID: > you said -- as clearly as it can be said -- that your attitude is to > leave the trailing edge behind. 
I said no such thing, and now I know why you seem to have such a loyal following of people "supporting" your efforts here. Try spending a little more time learning why people hold the opinions and convictions they have about this project, and a little less time rewording what they've said to suit your next argument to counter them with. I think you'll find a great deal more exists if you tempt people with wine than vinegar. > this place hasn't been able to move forward on x.m.l. in 3 years. if you > read the archives, you'll see all these threads are re-runs. I believe the term you meant to use there was XML, not x.m.l. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From joshua at hutchinson.net Wed Oct 20 13:29:30 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 13:29:41 2004 Subject: [gutvol-d] Posting TEI Message-ID: <20041020202930.5295E2F8F4@ws6-3.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > > marcello said: > > A TEI text is basically a text file. > > right. if you could tell josh this, i'd appreciate it. > Text file and PLAIN text file are two different things. XML/HTML/XHTML/TEI/TEI-Lite/ZML ... those are all text files. None of them are PLAIN text files, which is what you always seem to advocate. ZML (Zero Markup Language) is ... wait for it ... a MARKUP LANGUAGE. That means it is just like XML/HTML, etc. Except that the rest are open standards with open source utilities available. ZML is you just wasting our time. Josh From joshua at hutchinson.net Wed Oct 20 13:34:14 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 13:34:23 2004 Subject: [gutvol-d] Posting TEI Message-ID: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > Jim Tinsley wrote: > > > Nobody has an objection to valid TEI texts, but valid TEI texts alone > > _are not enough_. 
An XML file that cannot be read (by an actual human) > > is as useful as a lock with no key. > > Not so. Having a TEI text posted would enable third-party developers to > come up with their own converter solutions even if we didn't get very far > with ours. There are a lot of people around who already convert the text > files into other formats. Their jobs would get much easier. I'm hoping Jim (or someone else) can clear up something for me. If I create a TEI document, use it to create a regular 8-bit ASCII file and a valid HTML file, then submit all three to the whitewashers ... will they post all three (assuming the ASCII file clears GutCheck and the HTML clears the W3C validator)? If not, why not? If yes, why can't this be the "incremental" development that Marcello alluded to? Please, this is not attacking anybody's stance. I'm really just trying to understand the positions/policies here. Josh From jtinsley at pobox.com Wed Oct 20 13:59:34 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Wed Oct 20 14:00:58 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <4176BDEF.7050008@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> Message-ID: <20041020205934.GA22445@panix.com> On Wed, Oct 20, 2004 at 09:35:11PM +0200, Marcello Perathoner wrote: >Jim Tinsley wrote: > >>Nobody has an objection to valid TEI texts, but valid TEI texts alone >>_are not enough_. An XML file that cannot be read (by an actual human) >>is as useful as a lock with no key. > >Not so. Having a TEI text posted would enable third-party developers to >come up with their own converter solutions even if we didn't get very far >with ours. There are a lot of people around who already convert the text >files into other formats. Their jobs would get much easier.
> I really do not mean to be disrespectful when I -- speaking for myself -- say that I'm not interested in spending my time making developers' jobs easier. That's not what I'm here for. We have text, and HTML, both proven and well-supported formats that we know how to work with and for which we know there is a demand. I'll stick to those until we can see a way clear through to making successful XML. >>I really no longer give any headroom at all to the approach "Post XML >>Now Because That Is The One True Way And We'll Figure Out How To Read >>It Later." If for no other reason, then because the most important >>part of the WW job is to check the texts before posting, and if we >>can't read it, we can't find the errors, and if we can't find the >>errors, we can't fix 'em. > >A TEI text is basically a text file. So you can read it in any editor. >If you use emacs you can also validate the TEI file against the DTD >without leaving the editor. > >A perfectly valid TEI file with no spelling errors should be good enough >to post. Correct spelling is necessary but not sufficient. I don't know about other people, but I most commonly find errors by skimming the text. I can't do that with XML. Also, the validity of the XML gives me no comfort at all that, say, paragraphs are sensibly separated. I can check that with text or HTML to a high degree of accuracy, because I can read them naturally in a viewer program. There are many such types of problems that I can detect by eye quite quickly -- provided I am seeing the text laid out in a natural way. > >What you expect from us TEI developers is that we produce the 150% >perfect solution before you even consider starting to post files. That >is not the way software development works. > Not 150%, surely! :-) And it may not be the way software development works, but then we're not a software development project. HTML already works. TeX already works. I've spent enough of my hours trying to get XML to work; I now leave that to others.
>And this attitude is in my opinion the main cause why we have gotten >nowhere with TEI in the last 3 years. > >Lets start now with a version 0.0.1 of the TEI process. Of course at >some later time we'll have to do all the posted files over again. >Probably more than once. But its better than sitting here and playing >with bowerbird . . . or vice-versa? :-) . . . >because we are bored. > Anyway, I disagree with your substantive point above. I say that until we have (or SOMEBODY has) a . . . . OK, a 90% solution, we should not post. > >>Next, we need a process, using open-source, cross-platform tools -- >>the standarder the better -- to convert that XML into, at a minimum, >>plain text and HTML. Other formats are welcome but optional. That >>process must work for _all_ teixlite files, not just ones that are >>specially cooked, using constraints not specified within the chosen >>DTD. Here's where we hit the rocks today. > >TEI defines a standard way to extend the DTD. I used this standard way >to extend the TEI DTD into what I called PGTEI. This still is a >perfectly valid TEI DTD according to the TEI specs. > > >>I don't want to imply specific means from which this process is to be >>constructed. Obviously XSLT is one possible approach, but I certainly >>do not want to imply limitations on what that process should use. The >>only things we must have -- both for our own internal practical >>purposes and for the use of future readers -- is that it should work >>reliably on _all_ texts that conform to the XML DTD chosen, be open >>source, and be cross-platform. A reader needs to be able to tweak the >>transform and re-run on her own desktop. > >You misunderstand what a DTD is. It just gives you syntactical >correctness. I can cook up a perfectly valid XHTML file which is >semantically bogus: > >
[nested XHTML example; the tags were stripped in archiving, leaving only the element text: 1 / 1.1 / 1.1.1 / ...]
> >This is valid HTML (didn't bother to check) but will render not so well. > >You cannot build a conversion tool that will produce good results on all >syntactically valid TEI files, like you cannot build a browser that will >make sense out of semantically bogus HTML files. I think one of us is not understanding the other, or perhaps both. I'm pretty sure I did not misunderstand what a DTD is. I do understand that an XML file that is valid just means that it is syntactically correct. This is actually the same point I made above: the fact that the XML is valid does not mean that paragraph breaks are in the right place -- which is one of the reasons why I must be able to convert it to something I can read in order to check it. I certainly do not require a conversion tool that will correct misplacement of paragraph marks (though it would be nice! :-) -- I just require that the process for, say, teixlite will work reliably on all teixlite files; that it will produce syntactically valid HTML, and, I suppose you might reasonably say "syntactically valid" text. Actually, now that I say that, I recall a case where syntactically valid XML made invalid HTML through a bug. Anyway, that's not the problem. If the process we agree for teixlite is, say, run it through Saxon, then I expect to be able to run all teixlite files through Saxon, and not have a submitter say "oh, no, you must use Xalan for this file, and not just any Xalan, but one with my patch in it." I have no objection to requiring, say, a patched version of Saxon, but if so I expect that patched version to be stable, to work for all teixlite files submitted, to be open-source, and to be cross-platform. > >Furthermore TEI is geared towards marking up existent texts, so scholars >can study the text without having to get the physical book. It is not so >good as a master format for print processing. That's why I had to add >some more tags and attributes to my DTD. 
(Which doesn't make any text >that uses my DTD less standard, because TEI is expressly designed to be >extensible. But I'm repeating myself.) > >>And just re-reading that last, when I say "must work reliably on ALL >>texts" I do not mean to imply that the same XSLT must be used for all >>texts, though obviously that would be of benefit, if we can manage it. > >So why not start posting texts marked up in PGTEI, which will by >definition work well in my conversion chain? > I think we were very close to that a year and a half ago. I had a request in to you to fix the "blockquote" thing, Greg had laid down the requirements for the license. And if anyone has followed up any of that, they didn't copy me on it. Does anyone apart from you favor using PGTEI? In principle, of course, it doesn't matter, but in practice, we really couldn't cope with multiple XSLT conversion methods all happening at the same time. Your chain was, at least, rather difficult to implement. I haven't checked to see whether it still is. Can it be implemented on a Mac? on Win32? Is there a stable tarball somewhere? You see, we appear to differ very fundamentally on one point. It's my lock and key analogy again. I do not want to start down the road of producing posted files from an XML if the transform, will be, for any reason, not repeatable in a year's time, or five, or ten. I do not want to start down the road of producing posted files from XML if an end-user who wants to -- on whatever platform -- cannot replicate the process. I think that you don't care about this, or at least, it's not a priority for you, but it is one for me. >And at the same time start posting Jeroens texts, which will convert >fine in his chain? > What we said last year still holds: we need somebody -- who is not me, not any of us WWs -- to create the process. The one that I defined in my earlier posting today. When we've got that, stable and documented, or at least understood, I really think we can proceed. 
But _I_, at least, have not got the time to spend experimenting, and I _know_ that David Widger doesn't. >This way we could both start putting up an automatic online conversion >chain. (The guy who did this already in Java has somehow vanished, so I >think we have to start over again.) > >For the start I will act as interim Post-Processor for people wanting to >post PGTEI and pass on to you only the perfectly good ones. You'll just >have to stick in the etext number where I put 5 asterisks. > No; I, at least, don't want to work with an experimental process in which each text is an exception. I want a process in which the text comes in, I add the header, I run the conversion process and I check the resulting files. If we can't get to that point, I don't, as I said before, want to spend time on it. If _you_ can do this, then there is no reason, given a stable process, why _I_ can't. When somebody gets to this point, please let me know. >I claim the .pgtei file extension, Jeroen can claim what extension he >sees fit for his files. So we can have bith an alice30.pgtei and an >alice30.jtei. > Why can't we just name them .xml? I see no reason to invent extensions. _Is_ there one? Not that it matters much, just curious why you would think this a good idea. jim From jtinsley at pobox.com Wed Oct 20 14:09:12 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Wed Oct 20 14:09:22 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com> References: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com> Message-ID: <20041020210912.GB22445@panix.com> On Wed, Oct 20, 2004 at 03:34:14PM -0500, Joshua Hutchinson wrote: > > >I'm hoping Jim (or someone else) can clear up something for me. If I create a TEI document, use it to create a regular 8-bit ASCII file and valid HTML file, then submit all three to the whitewashers ... will they post all three (assuming the ASCII file clears GutCheck and the HTML clears the W3C validator)? > No. 
Not today, and, I hope, never. >If not, why not? > This is exactly what was starting to happen, and what we backed away from. In the scenario you quote, where you create the HTML and text from the XML, how do I check the XML? Take your word for it that you didn't change anything when creating the HTML? If you could create the HTML, why can't I? What happened in a few cases was that I spent many hours checking each of the three files separately, and if I find a markup error in the HTML, how do I relate that back to the XML, and . . . it was just a nightmare. Not a good way to go. I think we were all clear on this much: the XML way forward is to develop a reliable conversion method that the WWs can use to produce the other files. I really, honestly, do think that until we've got that (and why shouldn't we have it?? what's so unreasonable about it?) we should hold off. Which is what we agreed. A moratorium. That has lasted a lot longer than any of us would have believed at the time, because despite the apparent reasonableness -- to me, at least -- of the request, we still ain't got it. jim From jonathan_ingram at yahoo.com Wed Oct 20 14:14:31 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Wed Oct 20 14:14:42 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020205934.GA22445@panix.com> Message-ID: <20041020211431.57061.qmail@web41709.mail.yahoo.com> --- Jim Tinsley wrote: > Correct spelling is necessary but not sufficient. I don't know about > other people, but I most commonly find errors by skimming the text. > I can't do that with XML. As my post earlier on today indicates, this isn't true. Assume that PG starts accepting some TEI-related schema. All you need is a relatively simple CSS stylesheet, and you can open the XML and view it perfectly directly. See http://faculty.washington.edu/dillon/xml/ for some examples where you can view styled (XML-conformant) TEI directly in your browser, with no intermediate transformations required. 
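As a rough illustration of the XML-plus-CSS viewing approach described above, the following Python sketch attaches a stylesheet to an XML file using the standard `xml-stylesheet` processing instruction, which is the mechanism browsers use to style raw XML. The function name `add_css_stylesheet` and the file name `tei.css` are hypothetical, and the stylesheet itself would still have to be written for whichever TEI variant is chosen.

```python
def add_css_stylesheet(xml_text, href):
    """Return xml_text with an xml-stylesheet processing instruction added.

    The instruction is placed after the XML declaration if there is one,
    otherwise at the very top. href is assumed to be a plain file name
    or URL that needs no escaping.
    """
    pi = '<?xml-stylesheet type="text/css" href="%s"?>' % href
    stripped = xml_text.lstrip()
    if stripped.startswith("<?xml "):
        # Keep the declaration first; the instruction must follow it.
        end = stripped.index("?>") + 2
        return stripped[:end] + "\n" + pi + stripped[end:]
    return pi + "\n" + stripped


if __name__ == "__main__":
    document = '<?xml version="1.0"?>\n<TEI.2><text/></TEI.2>'
    print(add_css_stylesheet(document, "tei.css"))
```

With a suitable `tei.css` alongside the file, a browser that supports CSS for XML can display the text laid out for skimming, with no conversion step in between.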
-- Jon Ingram From shalesller at writeme.com Wed Oct 20 13:41:26 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Oct 20 14:15:12 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com> "David A. Desrosiers" writes: > Just because I support Mozilla, does not mean I do NOT support > MSIE. That being said, if my code works in 13 browsers, and fails in > MSIE, my code is not the problem. I do, however, refuse to add "hacks" > to get MSIE to do what it should be doing anyway... following the > standards. If your job is to build a castle in a swamp, you don't get to blame the fact that the castle sinks into the swamp on the fact that the swamp doesn't follow standards. In real life, MSIE is a defacto standard, and not fixing your code to work with it is being an ideological pain in the ass that doesn't like to work with reality. I don't have an option which browser to use. Many other people just don't care enough about computers and your causes to switch. Let's support our users as they come, and not ignore them because they aren't interested in computers. From jonathan_ingram at yahoo.com Wed Oct 20 14:24:13 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Wed Oct 20 14:24:24 2004 Subject: XML won't eat your children (was Re: [gutvol-d] jeroen's even-handed analysis) In-Reply-To: <4176A271.2010105@perathoner.de> Message-ID: <20041020212413.8205.qmail@web41708.mail.yahoo.com> --- Marcello Perathoner wrote: > Jonathan Ingram wrote: > > > and one of the best ways to *fail* to > > change their mind is to plonk 1400 pages of documentation in front of them > and > > say 'here's what you should be using, > > Then don't do that.
> > You don't plonk the IBM PC Technical Reference Manual (5000 pages) in > front of your secretary if you want her to type a few pages in M$-Word. > You just give her a "Word for Dummies" book and that is all she needs. > She don't need to know about the difference between AGP and PCI-X bus. You're quite right. I let the current confrontational 'vibe' of this mailing list get the better of me. Sorry. The point I was trying to make is that there are many people, myself included, who need to be given real arguments in favour of using something like TEI, and who won't accept that TEI does things the right way just because it's been around for a while :). There's quite a few people like me at DP, and I imagine there are quite a few more reading gutvol-d. As I've convinced myself that, at least in the areas I've investigated, TEI's methods seem quite sensible, I'm more open to 'trusting' the rest of it... and I thought some people would be interested in joining me on this journey. As gutvol-d is being a little too confrontational for me at the moment, I'll probably go back to exhibiting my enthusiasm in the more congenial atmosphere of DP. -- Jon Ingram _______________________________ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com From shalesller at writeme.com Wed Oct 20 13:54:51 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Oct 20 14:29:15 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041020205451.E250A4BDA9@ws1-1.us4.outblaze.com> "Joshua Hutchinson" writes: > In my case, at the very least, if you find something > I've worked on that does NOT degrade gracefully in > Lynx, etc.... Let me know. I consider that a bug in my work. Part of my problem is that I'd rather have it just work in Lynx, IE, etc. instead of degrading gracefully. 
In the long run, I'd like to see us generate HTML from the XML that just works in Lynx and IE by hard-coding the CSS stuff in where possible, possibly even producing a special text-browser HTML; all of this, of course, alongside the HTML for standards-compliant browsers (which we probably ought to call Netscape 6+, Mozilla and most other non-IE browsers, instead of assuming our client base knows or cares about standards-compliance). Of course, CSS is the best option right now. (It's a little off-topic, but are you still up for doing the first Early English Text Society HTML edition?) From jtinsley at pobox.com Wed Oct 20 14:32:06 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Wed Oct 20 14:32:17 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020211431.57061.qmail@web41709.mail.yahoo.com> References: <20041020205934.GA22445@panix.com> <20041020211431.57061.qmail@web41709.mail.yahoo.com> Message-ID: <20041020213206.GA10983@panix.com> On Wed, Oct 20, 2004 at 02:14:31PM -0700, Jonathan Ingram wrote: >--- Jim Tinsley wrote: >> Correct spelling is necessary but not sufficient. I don't know about >> other people, but I most commonly find errors by skimming the text. >> I can't do that with XML. > >As my post earlier on today indicates, this isn't true. > If I may nit-pick, I think it more correct to say that it isn't _always_ true. That is, it is not true when a CSS stylesheet exists that works with the XML. Jeroen provided XML like this, which I thought was very good indeed. For any of you who haven't seen it, please point your browsers to http://www.gutenberg.org/dirs/1/1/3/3/11335/11335-x/11335-x.xml which is an absolute pleasure to read. (Well, if you're a geek, that is, and if you ain't, whatcha doin' here?? :-) I said before, and I say again, that where such an XML is provided, HTML is probably redundant.
("Probably" because a significant use of HTML is as input to PDA readers like, say, Mobipocket, and I'm not sure if they would swallow this XML without requiring a Heimlich.) I know of no CSS for Marcello's PGTEI. Perhaps one could be crafted for it. >Assume that PG starts accepting some TEI-related schema. All you need is a >relatively simple CSS stylesheet, and you can open the XML and view it >perfectly directly. > >See >http://faculty.washington.edu/dillon/xml/ >for some examples where you can view styled (XML-conformant) TEI directly in >your browser, with no intermediate transformations required. > It does still leave the plain-text question hanging, but I do think that XML+CSS is a Good Thing, even if the XML is also destined to go through XSLT. jim From hacker at gnu-designs.com Wed Oct 20 14:36:46 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 14:38:08 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com> References: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com> Message-ID: > In real life, MSIE is a defacto standard, and not fixing your code to > work with it is being an ideological pain in the ass that doesn't like > to work with reality. Exactly. It's a good thing my code works in everything, from PDA to full-blown xinerama-enabled desktop with browser, without any changes or hacks or workarounds required. But I agree with part of your statement. There are thousands of sites out there that don't take the same level of care that I take with my code, to ensure this overly-pedantic level of compatibility. > I don't have an option which browser to use. Many other people just > don't care enough about computers and your causes to switch. These aren't "my causes". If users want a richer Internet browsing experience, they'll explore the alternatives, or they won't.
If they want to reduce their exposure to malicious content, they'll explore the other alternatives, or they won't. If companies want to reduce the number of technical support calls and man hours required to support MSIE, they'll switch to a standards-compliant browser, or they won't. The choice is theirs. > Let's support our users as they come, and not ignore them because they > aren't interested in computers. I support several thousand users in various capacities and across many dozens of projects, including my own. As a mentor, educator, and student myself, it is part of my process to present all of the possible choices to solve a particular problem to the user, and let them decide. Just saying that you can't support suggesting the alternatives because Microsoft has a larger percentage of their file manager in use on the desktop environment isn't fair to the end-user. But this is getting way off topic, into the realm of religious wars about "Which editor is best?" (vi, of course ;), or browser wars. Let's get back to focusing on the issues related to PG and making the project and ancillary support tools and formats better and better. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From shalesller at writeme.com Wed Oct 20 14:12:16 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Oct 20 14:43:25 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041020211216.7ECBE4BDA9@ws1-1.us4.outblaze.com> Gutenberg9443@aol.com writes: > Lead, follow, or get out of the way. Can you supply a way > to do this? If so, do it. We can. XML, among other things, is a simple way to do this. > But all the same, I'd rather that readers have that version > than no version at all of the book. How many people can't read the Latin-1 version? How many people read that version when they could read the Latin-1 version just fine?
People have sent back editions with the accents re-added despite the fact that we already had an edition with accents, so there is evidence that having both editions is bad. > The principle of the greatest good for the greatest number > doesn't mean let's throw out the lesser numbers. How dare we be so provincial as to ignore the EBCDIC users. There's thousands of character sets in the world--what gives us the right to ignore the "lesser numbers" of non-ASCII users? -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From jtinsley at pobox.com Wed Oct 20 14:50:41 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Wed Oct 20 14:50:57 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020213206.GA10983@panix.com> References: <20041020205934.GA22445@panix.com> <20041020211431.57061.qmail@web41709.mail.yahoo.com> <20041020213206.GA10983@panix.com> Message-ID: <20041020215041.GA3631@panix.com> I was just reading over my last posting, hoping I wasn't the one sending bad vibes to Jonathan, who is exactly the kind of person we _need_ in a discussion like this, when I came across something else that Marcello said, that I didn't comment on first time round: >Lets start now with a version 0.0.1 of the TEI process. Of course at >some later time we'll have to do all the posted files over again. Now, please don't take this as a policy statement or anything, but I really, really HATE doing anything KNOWING that it's wrong and will have to be done again. I mean, bone-deep HATE it. Factor that in however you will. An argument against setting up an experiment in a production environment, or a personal foible? I report -- you decide! 
:-) jim From Gutenberg9443 at aol.com Wed Oct 20 15:13:04 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 15:14:38 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <1e3.2c7b180e.2ea83cf0@aol.com> In a message dated 10/20/2004 12:43:31 PM Mountain Standard Time, shalesller@writeme.com writes: we should be doing a lot more Spanish and French and Arabic and a bunch of other languages that we generally totally ignore. We don't ignore them. We beg, plead, and implore for them. But we don't get them. I sent a personal letter to the King of Saudi Arabia explaining what we are doing and telling him that we would greatly appreciate both books in Arabic and Arabic books that are translated into English. In case you don't know it, some of the most important books of exploration and history in the middle ages happen to be in Arabic. So are some seminal mathematical books, along with a good many other books. His Majesty's staff ignored me. I sent a similar letter to the Saudi Aramco Oil Company. I got similar results, despite the fact that Saudi Aramco World is one of the best National-Geographic type magazines in print. Every month when our copy arrives my husband reads it first on the grounds that he's the historian in the family; then I get it the second he lays it down. We WANT all these other texts. Obviously what goes for English does not necessarily go for other languages, so quit badgering me about that. Now, have you got any bright ideas where we can GET those books in other languages? If so, get them, and the sooner the better. I assure that they will be posted as soon as their copyright status is determined. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/d5617341/attachment.html From Gutenberg9443 at aol.com Wed Oct 20 15:15:15 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 15:15:45 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <67.35a9d1d7.2ea83d73@aol.com> In a message dated 10/20/2004 12:50:16 PM Mountain Standard Time, joshua@hutchinson.net writes: Are you sure we're talking about the same thing? By page break, I mean when you get to the bottom of the physical piece of paper. What you describe sounds more to me like what I usually refer to as a thought break ... a little white space or a graphic symbol between sections of text to indicate a scene transition or time passing, etc. Typically, in PG texts, those are marked with 5 asterisks (functionally equivalent to your # # #). Okay. The name of those things is page break. Obviously in computerese the word page break has a different meaning, as I should remember myself considering how many times I have inserted a page break into something. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041020/d05c51b4/attachment.html From Bowerbird at aol.com Wed Oct 20 15:16:32 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 15:16:47 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <8b.17ed6e78.2ea83dc0@aol.com> the hacker said: > I said no such thing it's silly to argue about this. this is what you said: > For the users who refuse to learn, to adapt, and to grow, > they can stagnate and stay in their own nice warm puddle. you also said this: > if my code works in 13 browsers, and fails in MSIE, > my code is not the problem. I do, however, > refuse to add "hacks" to get MSIE to do what > it should be doing anyway... following the standards. those aren't the types of things michael hart would say. 
there's no reason to get all up in a huff. i'm not telling you that you need to change. on most days, i share your feelings. and i have said often that every e-book-related innovation will come sniffing for a chance to work with this library, and needs to prove its ability to handle it to earn its stripes. so i'm not faulting you for being here. that's why i'm here. but what you've said here is _not_ what michael would say. so the simple fact is that yours is _not_ the attitude that has guided project gutenberg from where it started to today. the mission here has been to do _whatever_it_might_take_ for those e-texts to be available to the maximum audience. among the many things that that has meant is to work with the _trailing_ edge, not the _leading_ edge, of technology. and that strategy hasn't caused it to "stagnate", but rather what has caused it to grow into the biggest cyber-library... -bowerbird From Bowerbird at aol.com Wed Oct 20 15:22:58 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 15:23:11 2004 Subject: [gutvol-d] re: news flash, josh is psychic too Message-ID: <12a.4e63e09c.2ea83f42@aol.com> josh said: > ZML (Zero Markup Language) is ... wait for it ... a MARKUP LANGUAGE. you seem to be psychic today too. must be the rain... a plain-text file and a z.m.l. file are so similar that -- even using your newfound psychic powers, josh -- you probably can't tell them apart. look at alice30.txt. is it a plain-text file? or is it a z.m.l. file? you tell me. the important points about z.m.l. that make it relevant are that: 1) it is extremely simple, so simple anyone can use it, and 2) it does the job that needs to be done. now, let us discuss x.m.l. in that context... 
-bowerbird From Gutenberg9443 at aol.com Wed Oct 20 15:34:14 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Wed Oct 20 15:34:32 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <194.308d19da.2ea841e6@aol.com> In a message dated 10/20/2004 3:43:48 PM Mountain Standard Time, shalesller@writeme.com writes: > The principle of the greatest good for the greatest number > doesn't mean let's throw out the lesser numbers. How dare we be so provincial as to ignore the EBCDIC users. There's thousands of character sets in the world--what gives us the right to ignore the "lesser numbers" of non-ASCII users? Excuse me, I thought that was exactly what I said. Doesn't "doesn't mean let's throw out the lesser numbers" mean the same as "what gives us the right to ignore the lesser numbers"? I don't want to ignore anybody. And I'm crawling back into the woodwork. Every time I start posting I wind up in a flame war, which is the last thing on earth I want. Anne Egads. I'm crawling From sly at victoria.tc.ca Wed Oct 20 15:40:15 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Oct 20 15:40:26 2004 Subject: [gutvol-d] Languages in PG In-Reply-To: <1e3.2c7b180e.2ea83cf0@aol.com> References: <1e3.2c7b180e.2ea83cf0@aol.com> Message-ID: > In a message dated 10/20/2004 12:43:31 PM Mountain Standard Time, > shalesller@writeme.com writes: > > we should be doing a lot more Spanish > and French and Arabic and a bunch of other languages that we generally > totally ignore. In reply, Anne, Gutenberg9443@aol.com wrote: > We don't ignore them. We beg, plead, and implore for them. But we don't get > them. More in the same vein...
Perhaps for a little reminder, check out this faq: http://gutenberg.net/faq/G-15 I've been contributing a few French-Canadian books to PG myself, by reformatting some already online elsewhere. I've also done the same with German texts in the past. I find it goes a good deal slower when I'm not too familiar with the language in question, because I'm afraid of letting obvious mistakes get through... Also, the numbers below (taken from the catalog) show that, although PG's non-english content can certainly be expanded, it is not insignificant: French (367) German (307) Finnish (85) Chinese (69) Spanish (59) Italian (36) I just scanned through a list of the titles posted in the last seven days, and a quick count gave me 23 in languages other than English. That doesn't seem to me to be "totally ignoring" Andrew From hacker at gnu-designs.com Wed Oct 20 15:45:07 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 15:46:10 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <8b.17ed6e78.2ea83dc0@aol.com> References: <8b.17ed6e78.2ea83dc0@aol.com> Message-ID: > it's silly to argue about this. this is what you said: >> For the users who refuse to learn, to adapt, and to grow, >> they can stagnate and stay in their own nice warm puddle. You wrongly asserted that I voted to "leave the trailing edge behind". I said no such thing. Users make their own choice to learn or not to learn. The results of that choice are their own, and I have nothing to do with it. I can only do my part to educate and mentor as necessary. End of discussion on this point. > you also said this: >> if my code works in 13 browsers, and fails in MSIE, >> my code is not the problem. I do, however, >> refuse to add "hacks" to get MSIE to do what >> it should be doing anyway... following the standards. > those aren't the types of things michael hart would say. Well that isn't surprising. I'm not Michael Hart. 
> so the simple fact is that yours is _not_ the attitude that has guided > project gutenberg from where it started to today. You're comparing how I treat content delivered for an audience using primarily web browsers (i.e. webpages) with PG etexts, which are not being viewed in a web browser. My ideas and beliefs on web development are quite different from my ideas and beliefs about how to best engineer a scalable electronic book format. Please don't mix my words up like this. This is the third time you've done it, and each time, you've strategically taken my words out of context to try to suit your own bend in the discussion. > among the many things that that has meant is to work with the _trailing_ > edge, not the _leading_ edge, of technology. and that strategy hasn't > caused it to "stagnate", but rather what has caused it to grow into the > biggest cyber-library... My goal is to provide PG etexts (as well as those from about a dozen other places) in a format for everyone who can read, regardless of platform, reader, language, file format, and also to include those who cannot read at all. I never claimed that my interest in the PG project was in direct alignment with Michael Hart, or anyone else on this list for that matter. I fully expect that my beliefs will intersect a lot of the beliefs of others, and contradict those of yet other members here. Such is life, and it is through this combination of agreement and disagreement, that actual action gets done. Nothing would evolve, if there weren't others who thought their own beliefs were better than the beliefs of others, and that they were committed to pursuing them until completion, despite strong objection from others. There's no point in continuing this further. I'm done. David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Wed Oct 20 16:03:36 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 16:03:54 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <1e1.2d1fafad.2ea848c8@aol.com> the hacker said: > You wrongly asserted that I voted to > "leave the trailing edge behind". I said no such thing. > Users make their own choice to learn or not to learn. the trailing edge _is_ largely composed of people who -- as you put it -- "refuse to learn, to adapt, and to grow". (ironically, some of them actually want to read books!) further, telling them you don't care if they "stagnate and stay in their own nice warm puddle" is leaving them behind. you can try to fancy it up with a spin, but that's what it is. and the fact of the matter is that a whole lot of people are _stuck_ with machines that simply will not run a browser that is "standards-compatible", because they don't believe -- rightly or wrongly -- that they can afford a newer one, perhaps because this month they're trying to decide whether they want to spend their last dollars keeping warm or eating. there are people right here on this listserve who've told you that they don't control the browser that their machine runs, probably because they are using a computer in their library that billy g. installed there precisely to extend his monopoly. or maybe, once again ironically, it was a computer that was put into a school when microsoft fulfilled the terms of the court judgment, thereby (amazingly) extending the monopoly. or maybe, even more likely, they're in a school that hasn't had any budget to buy new computers since the ones they got in '94. there's a whole big world out there, with lots of people in it... myself, i'm running a mac g3, circa 1998, with os8.1. can you tell me a standards-compliant browser to use? 
-bowerbird From jgruber at tampabay.rr.com Wed Oct 20 16:09:09 2004 From: jgruber at tampabay.rr.com (Joseph R. Gruber) Date: Wed Oct 20 16:08:46 2004 Subject: [gutvol-d] Why Bowerbird is a genius In-Reply-To: Message-ID: <200410202308.i9KN8SB9009523@ms-smtp-04.tampabay.rr.com> Not raw ascii glory -- hex...get it right. And now I don't end all my programs by ctrl+alt+del but ones that lock up I do (eg: your program). Also, why announce your program for beta testing and then say don't worry about reporting bugs since you have a new version coming out soon anyway...what a waste of time. Oh and you can put your ZML in .txt's and package them with an installer. You can then guarantee the txts are properly "formatted". Joseph -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com Sent: Wednesday, October 20, 2004 12:26 PM To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com Subject: re: RE: Re: [gutvol-d] Why Bowerbird is a genius gruber said: > Run the program and then Ctrl+Alt+Del it. > Ooops...there goes a "never-crash". control-alt-delete? is that how you end all your programs? :+) i recommend you try the "quit" button instead, or choose "quit" or "exit" under the file menu. but if that's a bug, which is entirely possible -- to be expected in fact -- in a beta-version, i'll fix it. but please take the bug-reports to the beta-test listserve, so they'll be logged. but don't bother with doing that _now_. as your version is almost 3 months old. a new version will be out very soon, and you should wait to test that one instead... > P.S. Why are you hiding the etexts in your code > instead of making them separate .txt's? first things first. priorities. and control of the degrees of freedom for enhanced troubleshooting the first objective is to get the program solid. to focus on that, it's wise to use content that i _know_ has been correctly formatted in z.m.l. 
once the app is acting correctly and is stable, i'll turn to texts that might be marked up wrong, confident that if i get unexpected behavior, it's due to an incorrect text or a defective z.m.l. rule, in all likelihood, and not some bug in the program. but the e-texts aren't really "hidden" right now. if you scrutinize any version of the program in a file-viewer, you will see the e-texts inside, "hiding" in plain sight in their raw-ascii glory. they can be recovered, in full, easily, any time. -bowerbird _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From jgruber at tampabay.rr.com Wed Oct 20 16:13:49 2004 From: jgruber at tampabay.rr.com (Joseph R. Gruber) Date: Wed Oct 20 16:13:25 2004 Subject: [gutvol-d] sudden surge in demand In-Reply-To: <8b.17ea1fb8.2ea81ced@aol.com> Message-ID: <200410202313.i9KND7T6008392@ms-smtp-03.tampabay.rr.com> For those who don't want to give your email to this (I'll hold off on what I really want to say) -- you can get the latest version at: http://www.josephgruber.com/pudding1020-exe.zip It's virus free (but if you don't trust it feel free to give your info to this....nm. ;) Joseph -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Bowerbird@aol.com Sent: Wednesday, October 20, 2004 3:56 PM To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com Subject: [gutvol-d] sudden surge in demand due to the sudden surge in demand, i've uploaded a new version of my beta-test viewer-program... it can be found in the files section of the yahoogroups listserve for the beta-test, which is under the name of "zml_talk". you can join that beta-test by subscribing via e-mail: zml_talk-subscribe@yahoogroups.com this upload is just "the daily build", and has _not_ been reviewed, so it should be considered as such... people who want to run a version _known_ to be stable should hold off on this. 
such a version will go up soon. additionally, no reports are necessary on this version, since a few of the features have yet to be implemented. but if you'd like to see the development of the program since the version uploaded on 7/27, it's there for you... sorry it's taken so long, but you know how things are. time flies when you're squashing bugs... -bowerbird From sly at victoria.tc.ca Wed Oct 20 16:30:06 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Oct 20 16:30:18 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020215041.GA3631@panix.com> References: <20041020205934.GA22445@panix.com> <20041020211431.57061.qmail@web41709.mail.yahoo.com> <20041020213206.GA10983@panix.com> <20041020215041.GA3631@panix.com> Message-ID: I've read almost everything that's been sent to the gutvol-d list in the recent burst of messages. I think it may be worthwhile trying to place everything that's been said in the larger perspective... Throughout much of its history, PG as an organization has been open to posting texts with formatting details or additional file formats done as volunteers wished to contribute them. Some examples of closed, proprietary formats (such as .prc and .lit) can even be found. This freedom has led to a wonderful array of inconsistencies and differences of approach which are probably most fully realized only by those who try to analyze, or convert large portions of the PG collection. (At least a few people involved in these recent discussions fall into that category.) I would argue that if we go about posting various people's implementations of markup using XML, we risk forming an increasingly incompatible jumble of formats.
Andrew From Bowerbird at aol.com Wed Oct 20 16:38:01 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 16:38:17 2004 Subject: [gutvol-d] press releases and puke on a reporter's shoes Message-ID: <158.419ce2ec.2ea850d9@aol.com> marcello said: > 1. 14 Feb 2003: You announce you will > code an open source ebook reader. yes. that was a big joke. :+) i was pulling jon noring's leg. read posts on that listserve. everyone knew i am anti-xml. everyone knew i am anti-oeb. anyone who's stupid enough to think _i_ would be _serious_ in announcing an effort to write an oeb viewer is _really_ stupid. nonetheless, i _would_ have gone ahead with the project if anyone would have responded. as long as _some_ programmers were willing to puzzle through the difficulty of figuring out o.e.b., i'd happily advise them on the u.i. but no one showed up, so it died. that's not unusual. there are many open-source projects on sourceforge that have died with one contributor. heck, there are more than a couple _directly_ for project gutenberg -- creating viewer-programs -- that have died early on the vine... my guess is that jon noring still has less than 3 programmers involved -- jon, care to comment on that? -- in his open-source openreader thing, and david rothman has been flogging openreader _incessantly_ on his blog, even relaying a specific request for mac programmers to join in and help. nonetheless, i would _still_ go ahead with _my_ open-source o.e.b. project, if any programmers were to turn up. do you know any? let's go to work! the more viewers we have, the better. on second thought, have 'em join jon's effort. the fewer open-source projects that _fail_, the better off we will be, in the long run... see, i've prodded jon noring for _years_ to get a viewer-program for his beloved o.e.b. it's _silly_ to propose a "standard format" and then not have any tools that support it! (you need a viewer _and_ an authoring-tool!) 
i don't know if this joke was the thing that actually got jon to get to work on the task, but if it was, then i am sure glad i did it. (he's a hard worker. if he directed his energy in a productive way, he might do a good job.) actually, it was probably the full-on review i wrote in response to jon's o.e.b. puff piece on ebookweb.com that was the real motivation, if there was anything specific that _i_ did. but whatever got him to pay some attention to the point i'd been making for many years, it was "a good thing", as jon would put it. > 2. 19 Oct 2004: You have nothing to show. _that_ project has "nothing to show". my own viewer-program, which has _never_ been open-source, and probably never will be, not until the open-source community can match it, is ready for beta-testing. have i given the address that people can use to join that beta-test listserve? yes, i do believe i have. > 3. You retroactively declare the announcement to be a joke. it was a joke from the time the post was a gleam in my eye. :+) anyone who knows me knows that i do not do press releases. the mere _thought_ is funny. i puke on the shoes of the press. > 4. You think that did save your face. > Think again. i'm thinking all the time, marcello, all the time... > So you admit you were lying to the press how outrageous! it's _their_ job to lie to _me_! :+) what was i thinking? oh yeah, i know. that "press release" never got any farther than the listserve where i "released" it, which -- if i remember correctly -- was populated by about two posters at the time, jon noring and me. hence, the joke... -bowerbird From stephen.thomas at adelaide.edu.au Wed Oct 20 17:02:37 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Oct 20 17:02:59 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> Message-ID: <4176FC9D.6000409@adelaide.edu.au> David A. 
Desrosiers wrote: > >> I can guarantee that CSS file is in the PG directory. I can't >> guarantee that Joe Sixpack will download that when he grabs the HTML >> file. > This is one of those problems with no easy answer. If you want the user to be able to download your book to read offline, then you've got to also make sure the user downloads the style sheet that goes with it. [If you only expect them to read online, it doesn't matter.] My own solution is simply to make a zip file for downloading, which includes both the html page(s) and style sheet. I use the same style sheet for all books, but actually copy it to each ebook's directory, so there are currently around 850 copies of the same style sheet. But it is trivial to update them all from the master. It still uses more space, but the alternative, having all the html files link to a single css doesn't allow for zipping and downloading. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From stephen.thomas at adelaide.edu.au Wed Oct 20 17:02:47 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Oct 20 17:03:05 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> Message-ID: <4176FCA7.5060809@adelaide.edu.au> David A. Desrosiers wrote: > > You can't translate a book into something read in a web browser, and > retain the same functionality. The whole point of a scrollbar is to > remove that constraint. Yeay! Something I've been saying for years. The "e" in ebook gives us opportunities that don't exist in print, so let's use them. > > Though I agree, unnessarily-long webpages (scrolling down for > hundreds of pages) are a pain, but the alternative is much more painful. Reading a book with hundreds of pages is painful. 
I don't see why scrolling is any more painful than turning pages. (The Mobipocket reader for Palm also has an auto scroll option which just scrolls the text slowly by, which could be a nice feature in browsers.) One advantage of print is the ease of bookmarking a spot -- something that can't be done easily on most ebooks, although I'm working on a simple HTML solution. I also now provide a single HTML file version and a multi-page version of my ebooks. Usually the multi-page version splits the work into chapters (or whatever is the major division for the work). The multi-page version was mainly intended to make online reading easier -- there's less to download for each chapter. It also means that Google is more likely to index the content -- they have, I think, a 100k limit per file. But most browsers can easily accomodate the complete, single-file version of the average work, up to a MB or so. Something like Don Quixote is a bit more of a problem as a single file, being large in text size and also carrying many illustrations, making the total download many megabytes. Something that large really needs to be split. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From Bowerbird at aol.com Wed Oct 20 17:15:09 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 17:15:25 2004 Subject: [gutvol-d] Why Bowerbird is a genius Message-ID: <154.41f2c1d2.2ea8598d@aol.com> joseph said: > Not raw ascii glory -- hex...get it right. have you looked? maybe it's different on the p.c. side, but on my mac, when i pull either one of the versions into a file-viewer, and scroll down, the text is right there. > Also, why announce your program for beta testing > and then say don't worry about reporting bugs since > you have a new version coming out soon anyway... 
> what a waste of time. you seemed very eager to have the program, and to spread it around, even posting it on your own site, so i figured that kind of devotion deserved the newest version i could provide. and once i have gotten all the features to where _i_ want them, in a few days or so, _then_ i will be ready for beta-testers to send me reports on that version. until then, anything they report might well be something that i already intend to fix anyway... but then again, you don't really care anyway, do you? nonetheless, i will respond to you, at least until we have posted as many messages with the "genius" header as we posted with the "kook" header, which was a lot... -bowerbird From hacker at gnu-designs.com Wed Oct 20 17:18:16 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 17:19:13 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <4176FC9D.6000409@adelaide.edu.au> References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> <4176FC9D.6000409@adelaide.edu.au> Message-ID: > I use the same style sheet for all books, but actually copy it to each > ebook's directory, so there are currently around 850 copies of the same > style sheet. But it is trivial to update them all from the master. That seems like a horrible waste of inodes. I feel this pain, because I ran out of inodes on one of my arrays working on some PG works, even though I had 50GiB of space free on the drive. I had to reformat with more inodes to work around the problem. > It still uses more space, but the alternative, having all the html files > link to a single css doesn't allow for zipping and downloading. Here's an easy solution: In each .zip, you include a copy of the stylesheet, the same stylesheet you include with every copy... 
except, when you unzip the works, they go into a structure like this:

    Gutenberg/
    |-- books
    |   |-- Book_One.xml
    |   `-- Book_Two.xml
    `-- styles
        `-- Gutenberg.css

Every .zip that you unzip into there, will overwrite Gutenberg.css with the copy that you duplicate inside each .zip file, and the .xml (or .html or text or whatever) versions of the books go into a separate subdir. In your .xml files, you use the standard clause or simply point your style declaration to ../styles/Gutenberg.css. This is exactly how it works on the Web in general, for very similar projects. Did that make sense? David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hacker at gnu-designs.com Wed Oct 20 17:28:51 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 17:30:14 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <4176FCA7.5060809@adelaide.edu.au> References: <20041020154800.BF87FEDC5F@ws6-1.us4.outblaze.com> <4176FCA7.5060809@adelaide.edu.au> Message-ID: > Reading a book with hundreds of pages is painful. I don't see why > scrolling is any more painful than turning pages. (The Mobipocket reader > for Palm also has an auto scroll option which just scrolls the text > slowly by, which could be a nice feature in browsers.) We've had that in Plucker for quite some time also (and Plucker's format is openly documented, unlike MobiPocket's format). Related to that, you CAN have autoscroll in your browser (again, making the assumption that you're using a standards-compliant browser). http://autoscroll.mozdev.org/ > One advantage of print is the ease of bookmarking a spot -- something > that can't be done easily on most ebooks, although I'm working on a > simple HTML solution. We've got bookmarking, and we're adding cross-document bookmarks and interlinking in our next version. We've been thinking about these (and other similar problems and solutions) for quite awhile now.
> I also now provide a single HTML file version and a multi-page version > of my ebooks. Usually the multi-page version splits the work into > chapters (or whatever is the major division for the work). I do the same for my HOWTO documents, sourced from SGML. One call each with with jade or sgmltools will generate the multi-document version of HTML or the single-document version. I run that through hindent and tidy for a few passes, and out comes properly-validated XHTML (mostly). You can see what one of those kinds of preparations looks like over here. This particular work is only HTML4.0 Transitional, and not fully validated yet, but you can see what I did with the stylesheet and general output of the SGML: http://faqs.gnu-designs.com/pokerfaq/ The mobile version is over here (with screenshots): http://plkr.org/news/46 > The multi-page version was mainly intended to make online reading easier > -- there's less to download for each chapter. It also means that Google > is more likely to index the content -- they have, I think, a 100k limit > per file. Funny you mention that. I've been doing some SEO work on my HTML version of the 9/11 Commission Report, and the original chapters I converted were 100+k and more, many of them into the 200k and 300k range. I took some time to split those up into their own subchapters. You can see THAT work over here: http://911.gnu-designs.com/ I put a ton of hand-editing and automated work into this particular effort. With over 7,000 downloads of the mobile formats I've created from that work, it seems to be quite popular. It is this same level of quality that I am striving for with PG works I convert. > Something that large really needs to be split. We agree. David A. 
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Wed Oct 20 17:32:59 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 17:33:14 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <75.360fea32.2ea85dbb@aol.com> stephen said: > One advantage of print is the ease of bookmarking a spot -- > something that can't be done easily on most ebooks, depends on the viewer-program. > although I'm working on a simple HTML solution. ok. :+) > most browsers can easily accomodate the complete, > single-file version of the average work, up to a MB or so. that download is downright painful if you're on dial-up. and if there are images involved, it gets even worse. and -- at least on some browsers, not naming any names -- when c.s.s. is used, the formatting doesn't seem to get done until the whole file is downloaded, which is a huge handicap. (and every time you resize the window, you have to wait again.) > Something like Don Quixote is a bit more of a problem > as a single file, being large in text size and also carrying > many illustrations, making the total download many megabytes. > Something that large really needs to be split. or downloaded as a zip and read offline. and even then, it can take a while for a browser to load and display it. if you want, i can do some actual timed trials on my mac; i suspect that the results might surprise you. in general, i've found speed in computers is very easy to get used to; it's difficult to remember how slow an old computer was, unless you actually fire it up and deal with it once again. when you do, it'll usually dumbfound you how slow it was, and you wonder how you ever got anything done at that pace. same with broadband. put a bunch of techies back on dialup, and they would understand why the masses are slow to adopt -- the systems the techies are building are simply unusable... 
-bowerbird From joshua at hutchinson.net Wed Oct 20 17:46:40 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 17:46:37 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020213206.GA10983@panix.com> References: <20041020205934.GA22445@panix.com> <20041020211431.57061.qmail@web41709.mail.yahoo.com> <20041020213206.GA10983@panix.com> Message-ID: <417706F0.4070207@hutchinson.net> Jim Tinsley wrote: >On Wed, Oct 20, 2004 at 02:14:31PM -0700, Jonathan Ingram wrote: > > >>--- Jim Tinsley wrote: >> >> >Jeroen provided XML like this, which I thought was very >good indeed. For any of you who haven't seen it, please >point your browsers to http://www.gutenberg.org/dirs/1/1/3/3/11335/11335-x/11335-x.xml >which is an absolute pleasure to read. (Well, if you're >a geek, that is, and if you ain't, whatcha doin. here?? :-) > > First off, let me say that ... is a beautiful e-text. I really like the look and thanks to Jeroen for producing it and Jim for point it out! And next, let me make a modest proposal. Jon (in the DP forums) is making some progress toward a XML/CSS standard of sorts. I'm going to be watching closely (and helping as much as I can). One of the things I'm going to be pushing for is TEI-Lite compliance as much as possible. Since Marcello has his PGTEI document guidelines on the web site, I'll be looking through that for ideas and such. I'll be going over this with Jon when I can, but my early idea is that we work on a couple of DP e-texts (the two of us have TONS to choose from!) and improve the XML markup standard enough for basic work. In a few weeks or so, I'd like to get a few projects posted to PG that use XML (TEI) with a CSS style sheet in place of the normal HTML that we always produce on our projects. The normal text file will of course be created. Once we have a canon of TEI to work with, hopefully the developers out there can start working on tools to help produce HTML or TEXT or PDF directly from the master. 
It seems to me that the XML/CSS process is the best method to incrementally approach a XML master. Marcello has a point that if we wait until we have a 100% solution, we may never get there... But a XML/CSS process is doable now and it gets us closer. Now... everyone let me know where my logic fails. (Everyone but bowerbird... don't even bother to respond, please... I'm trying to actually get something going besides a diverting flame war!) Josh From joshua at hutchinson.net Wed Oct 20 17:51:51 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 17:51:48 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: References: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com> Message-ID: <41770827.4070100@hutchinson.net> David A. Desrosiers wrote: > But this is getting way off topic, into the realm of religious > wars about "Which editor is best?" (vi, of course ;), or browser wars. > Let's get back to focusing on the issues related to PG and making the > project and ancillary support tools and formats better and better. Bah... Winguts rules VI! (Ok, you have to be a DP geek to know the winguts editor... but trust me, it rules! ;) ) Josh From joshua at hutchinson.net Wed Oct 20 17:58:18 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 17:58:15 2004 Subject: [gutvol-d] jeroen's even-handed analysis In-Reply-To: <194.308d19da.2ea841e6@aol.com> References: <194.308d19da.2ea841e6@aol.com> Message-ID: <417709AA.7040201@hutchinson.net> Gutenberg9443@aol.com wrote: > > And I'm crawling back into the woodwork. Every time I start posting I > wind up in a flame war, which is the last thing on earth I want. > > Anne > Come on back out, Anne. It's just that everyone gets so worked up when bowerbird is around that they start snapping at everything. Plus, each of us has his or her own little pet issues (for instance, mine is wanting a master document format ... 
David's is character set support and LYNX level browser support ... Jon loves his CSS markup). When our pet issues come up, sometimes we have trouble seeing the forest for the trees or realizing the tone of our words. David and I have argued plenty of times on these subjects and probably will plenty of times more.... but when I stop to think about it, we actually agree on most things. We are just passionate and text communication doesn't always handle passion well. Anyways, no one is trying to chase you away. Please come back out to play! Josh From joshua at hutchinson.net Wed Oct 20 18:01:02 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 20 18:00:59 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: References: <8b.17ed6e78.2ea83dc0@aol.com> Message-ID: <41770A4E.9030305@hutchinson.net> David A. Desrosiers wrote: > There's no point in continuing this further. I'm done. You know, I think everyone that has a conversation with bowerbird says this at some point. Heck, I've said it half a dozen times or more ... and then he says something so aggravating that I can't help but respond. If nothing else, and I've said this before, bowerbird is a very good troll. Josh From hacker at gnu-designs.com Wed Oct 20 18:11:02 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Wed Oct 20 18:12:15 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <41770A4E.9030305@hutchinson.net> References: <8b.17ed6e78.2ea83dc0@aol.com> <41770A4E.9030305@hutchinson.net> Message-ID: > If nothing else, and I've said this before, bowerbird is a very good > troll. Every bridge has its troll. David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From servalan at ar.com.au Wed Oct 20 18:18:35 2004 From: servalan at ar.com.au (Pauline) Date: Wed Oct 20 18:20:19 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <41770827.4070100@hutchinson.net> References: <20041020204126.7947B4BDA9@ws1-1.us4.outblaze.com> <41770827.4070100@hutchinson.net> Message-ID: <41770E6B.4050904@ar.com.au> Joshua Hutchinson wrote: > Bah... Winguts rules VI! Yup. Huge time saver for processing texts: ascii, HTML, Unicode, Foo-AutoFormat8001... Get da gooeyguts here: http://mywebpages.comcast.net/thundergnat/guiguts.html & the best software support I've ever encountered. More info & guiguts help/discussion in the DP Forums. You'll need to register at DP to view the relevant Forums. Cheers, P -- Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." JabberID: servalan@jabber.org Jabber? - http://www.jabber.org/about/overview.php From Bowerbird at aol.com Wed Oct 20 18:26:51 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 18:27:11 2004 Subject: [gutvol-d] Posting TEI Message-ID: <1de.2c209486.2ea86a5b@aol.com> joshua said: > (Everyone but bowerbird... > don't even bother to respond, please... > I'm trying to actually get something going i'm all in favor of actually getting something going. as far as not "bothering" to respond, if you'll do the same for me, it's a deal. -bowerbird From jon at noring.name Wed Oct 20 18:38:57 2004 From: jon at noring.name (Jon Noring) Date: Wed Oct 20 18:39:22 2004 Subject: [gutvol-d] press releases and puke on a reporter's shoes In-Reply-To: <158.419ce2ec.2ea850d9@aol.com> References: <158.419ce2ec.2ea850d9@aol.com> Message-ID: <91561720843.20041020193857@noring.name> Bowerbird wrote: > marcello said: >> 1. 14 Feb 2003: You announce you will >> code an open source ebook reader. > yes. that was a big joke. :+) > i was pulling jon noring's leg. Hark, I hear my name! 
> my guess is that jon noring still has > less than 3 programmers involved > -- jon, care to comment on that? -- Yes. Plan A for the OpenReader 1.0 code base (there is a Plan B) is 75% complete (and in good shape.) Because of a request not to discuss it in any more detail, I can't say any more on this except that the people working on it are really sharp and active in the XML and CSS worlds (the founder actively serves on the W3C CSS working group), with a *proven* track record in creating a real-pudding, honest-to-god, and excellent XML+CSS-based product. They know their document rendering stuff as well as anyone. We are seeking support to finish the last 25% of the job, since they are a commercial outfit with professional programmers and an investment in the code base they do have, so that is holding things up, but the needed support is small, and the final codebase will be donated and released under an open source license under the control of the Consortium (discussed below). It's a much better approach than kludging something together from scratch since the codebase we are starting from is of very high commercial quality, fast, compact, and supports many of the advanced features we need such as SVG and advanced font-handling (not to mention probably the best CSS parser in the world, and of course fairly complex XML document handling capabilities suitable for OEBPS.) It is also fully cross-platform (it is primarily developed for Linux but already portable to Windows, thus it will easily port to Windows and Mac OS X, both desktop and mobile flavors. Support for legacy Mac and Palm is detailed at our web site (for the Plucker developers reading this, I'd like to chat with you!) However, there is more than just issuing the open source code base. 
We also need to intelligently hammer out the OpenReader encapsulation format spec (which is intended for more than just ebooks, such as encapsulating web sites to compete with Microsoft's proprietary MHT format), and most importantly the OpenReader conformance requirements, so anyone else building their own OpenReader browser will not deviate too far from the vision (we will encourage competitive OpenReader browsers -- Mozilla, Opera and Safari folk are all capable of building their own OpenReader versions, although it won't be trivial for them since they will need to add SVG support and higher typographic rendering capability including "paged" display which at present they don't do for web browsers.) We will balance out the need for following strict conformance rules in order to use the name 'OpenReader' with a desire not to stifle innovation. I believe we will reach a proper balance. In addition, it is important to establish a Consortium, which is simply an organized group of various key players in the ebook and digital publication worlds who want OR to succeed (and are dedicated to both open source and open standards in the digital publication industry) since they will take advantage of it in some way which will benefit them (either profit-wise for profit companies, and for non-profits it will further their goals.) The Consortium (comprising the members, and not any one individual such as yours truly) will hold the IP to the OpenReader trademark so as to enforce conformance requirements, and to maintain and improve the specifications via established Technical Working Groups, either working under OpenReader or maybe under some other umbrella organization (I've been offered the DAISY-NISO umbrella, for example.) 
So a lot of effort is going on behind-the-scenes to build the needed relationships and interest in the Consortium, and we've had a great increase in interest in the last couple weeks, with some fairly *big* names in the ebook universe deciding to throw their name behind the OpenReader vision. I don't believe we are at "critical mass" yet, but we are definitely getting a lot closer. Will the little train make it over the hill? -- we'll see. Of course, anyone reading this, whether representing a company or organization, or simply an interested individual, who wishes to publicly state their support/endorsement for OpenReader (with no other obligation asked for), please contact me in private. We are preparing a supporters/endorsers web page showing the logos of companies/ organizations with links, and the names/affiliations of individuals. > in his open-source openreader thing, > and david rothman has been flogging > openreader _incessantly_ on his blog, > even relaying a specific request for > mac programmers to join in and help. Yes, we want OpenReader to be a community developed and maintained effort. Community is the key to success in this instance, in my opinion, since the ebook and digital publication realms are essentially commercially and organizationally oriented (authors, publishers, retailers, accessibility activists, librarians/archivists, etc. -- but we will not forget to give ebook buyers and readers a say in the process, unlike past standardization efforts in the ebook realm which ignored them.) For example, refer to the Digital Radio Mondiale effort ( http://www.drm.org/ -- yes, they use 'DRM' as their acronym!) for an archetype of a fairly successful community effort to establish an international, open standard for shortwave (and BCB/AM) digital radio. Notice how they formed their "Consortium" -- getting buy-in from a large number of companies/organizations who are working together for mutual interest. 
Jon Noring http://www.openreader.org/ From Bowerbird at aol.com Wed Oct 20 18:56:09 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 18:56:30 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <148.366aff1b.2ea87139@aol.com> joshua said: > Come on back out, Anne. It's just that > everyone gets so worked up when bowerbird is around > that they start snapping at everything. that deal, of course, includes talking about me in posts that you address to other people too... -bowerbird From Bowerbird at aol.com Wed Oct 20 19:02:54 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 19:03:13 2004 Subject: [gutvol-d] press releases and puke on a reporter's shoes Message-ID: <42.5ae9eb0a.2ea872ce@aol.com> that's all wonderful to hear, jon, congratulations! the world needs a good e-book viewer-program. (now don't forget the authoring-tool as well!) :+) -bowerbird From Bowerbird at aol.com Wed Oct 20 19:03:54 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 20 19:04:10 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <1dc.2ea74971.2ea8730a@aol.com> joshua said: > If nothing else, and I've said this before, bowerbird is a very good troll. just can't help but blame me for your own bad behavior, can you... -bowerbird From ke at gnu.franken.de Wed Oct 20 20:00:14 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Wed Oct 20 20:31:00 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: <20041020184605.DB96B2F95F@ws6-3.us4.outblaze.com> (Joshua Hutchinson's message of "Wed, 20 Oct 2004 13:46:04 -0500") References: <20041020184605.DB96B2F95F@ws6-3.us4.outblaze.com> Message-ID: "Joshua Hutchinson" writes: > There will always be plain text files available. You do more harm than good with these "plain text files". At least, please adjust your filename conventions: foo.zip -> foo-7.zip foo-8.zip -> foo.zip Otherwise you can fool the innocent downloader easily. 
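The renaming proposed above can be sketched in a few lines of Python. This is only an illustration of the stated convention (plain name becomes -7, -8 name becomes the plain name); the function names `proposed_name` and `plan_renames` are made up for the example, and it assumes the current scheme has exactly those two variants per text.

```python
def proposed_name(name: str) -> str:
    """Map a current archive name to its name under the proposed convention."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        return name  # no extension: leave untouched
    if stem.endswith("-8"):
        return f"{stem[:-2]}.{ext}"  # foo-8.zip -> foo.zip
    if stem.endswith("-7"):
        return name  # already follows the proposed convention
    return f"{stem}-7.{ext}"  # foo.zip -> foo-7.zip


def plan_renames(names):
    """Return (old, new) pairs ordered so that no rename clobbers a pending file."""
    pairs = [(n, proposed_name(n)) for n in names if proposed_name(n) != n]
    # The plain name has to move out of the way first, because the -8 file
    # is about to take it over. sorted() is stable, so input order is kept
    # within each group.
    return sorted(pairs, key=lambda p: p[0].rpartition(".")[0].endswith("-8"))


for old, new in plan_renames(["foo-8.zip", "foo.zip"]):
    print(f"{old} -> {new}")
```

The ordering matters more than the mapping itself: renaming foo-8.zip first would overwrite the 7-bit file before it had been moved.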
And stop wasting bandwidth: delete all .txt files if a .zip file is available. -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From lofstrom at lava.net Wed Oct 20 20:49:51 2004 From: lofstrom at lava.net (Karen Lofstrom) Date: Wed Oct 20 20:50:08 2004 Subject: [gutvol-d] Aside on old computers In-Reply-To: <4175EF86.8060108@adelaide.edu.au> References: <4175B419.6030301@adelaide.edu.au> <4175EF86.8060108@adelaide.edu.au> Message-ID: On Wed, 20 Oct 2004, Steve Thomas wrote: > [I can't believe that people still think they're doing good by > shipping old 486's to Africa -- but apparently its true. I > recently donated some old Pentium II's to a charity, and they > couldn't believe their luck.] My Linux users group installs thin client computer labs for schools. We happily accept PIIs, but turn down 486s. We use PIIs and PIIIs as thin clients, removing the hard drives and installing bootable NIC cards, and connect them to a fast server running K12LTSP Linux. We can create a usable 30 client computer lab for $3000 or so, since the clients are all donations. Currently we're preparing clients for our first foreign lab, to be run by Peace Corps Volunteers in Western Samoa. If that works -- then perhaps Africa :) -- Karen Lofstrom Zora on DP From tb at baechler.net Thu Oct 21 00:13:59 2004 From: tb at baechler.net (Tony Baechler) Date: Thu Oct 21 00:13:07 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041020184101.0F2849E980@ws6-2.us4.outblaze.com> Message-ID: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> At 01:41 PM 10/20/2004 -0500, you wrote: >That is what the page numbers markup currently does. It hides the page >numbers for those the minimum, default behavior, but if you have a browser >that supports it, you can see those page numbers appear. Similarly with >poetry. It has features that allow the browser to rewrap nicely if there >is a long line, if the necessary CSS support is there ... 
but if not, it >still displays the poem with its normal indents, it just doesn't rewrap >nicely for you. OK, but I have a question. I regularly use Lynx because of convenience. I prefer plain text but I will sometimes use lynx to convert html when necessary. Let's say that I do, in fact, want the page numbers. How am I supposed to get them if my browser doesn't support it? Lynx doesn't do css as far as I know, so what you're saying is that page numbers will always be hidden from me unless I want to look at the raw html source. Because plain text is, among other things, removing the markup from html, wouldn't that also eliminate the page numbers? I can use IE and it is accessible to the blind, but according to what you said IE hides different styles anyway. So, unless I misunderstood the above completely, some information will always be inaccessible to me. Right? Please don't tell me to use Mozilla or some other browser. At some point they will probably be accessible, but not for now. They are working on it but aren't there yet. I would like to repeat that I still prefer plain text and normally I wouldn't even care about line indents or page numbers. However, it would be nice to at least be able to access the information if I have a need for it. From tb at baechler.net Thu Oct 21 00:27:51 2004 From: tb at baechler.net (Tony Baechler) Date: Thu Oct 21 00:26:59 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis In-Reply-To: <20041020170436.532794BDAA@ws1-1.us4.outblaze.com> Message-ID: <5.2.0.9.0.20041021002504.025972f0@snoopy2.trkhosting.com> At 09:04 AM 10/20/2004 -0800, you wrote: >I'm not spending as much time as I do with PG for him. I seriously >doubt that he's interested in Ossian in Germany or Selections >from Early Middle English. My target user is a scholar, whether >a kid in high school, or a college student or professor or other >person who may not have or may not be interested in waiting on >interlibrary loan. 
I would like to briefly comment on this. As I've said here before, I am blind. Yes, it is possible to get books in Braille or on cassette. Until recently, electronic books were very hard to come by and PG was really the only major producer of them. When I was in high school, often teachers wouldn't bother to get books in Braille in time. I would often find out literally the day before that I needed to produce a book out of thin air. PG saved my butt many times. I was able to keep up because David Price had posted the works of Dickens and others had posted the works of Twain. I would have never made it through English otherwise. For that I will always be grateful! From jonathan_ingram at yahoo.com Thu Oct 21 01:02:54 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Thu Oct 21 01:03:12 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> Message-ID: <20041021080254.30607.qmail@web41727.mail.yahoo.com> --- Tony Baechler wrote: > OK, but I have a question. I regularly use Lynx because of convenience. I > prefer plain text but I will sometimes use lynx to convert html when > necessary. Let's say that I do, in fact, want the page numbers. How am I > supposed to get them if my browser doesn't support it? The markup I'm currently using for page numbers will not display them on non-CSS-capable browsers -- and by default won't display on CSS capable browsers either unless you change the stylesheet / switch to an alternate stylesheet. It would be possible to use a different markup, which wouldn't display page numbers on CSS-capable browsers (which can hide sections of HTML), but would always display them on non-CSS-capable browsers. As text-mode browsers are the main example of non-CSS browsers in use today, the former markup made more sense to use, as it replicates the behaviour exhibited by the text-only edition (which doesn't record page numbers).
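To make the page-number question concrete, here is a rough Python sketch of pulling page markers out of the HTML when flattening it to text. It is not the markup actually in use: it assumes, purely for illustration, that each page break carries an empty named anchor such as `<a name="page12"></a>`, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser


class PageNumberText(HTMLParser):
    """Flatten HTML to text, turning page anchors into visible [p. N] markers."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumed convention: <a name="pageN"></a> marks the start of page N.
        name = dict(attrs).get("name") or ""
        if tag == "a" and name.startswith("page"):
            self.parts.append(f"[p. {name[len('page'):]}] ")

    def handle_data(self, data):
        # Ordinary text is passed through unchanged.
        self.parts.append(data)


def text_with_pages(html: str) -> str:
    parser = PageNumberText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(text_with_pages('<p><a name="page12"></a>It was a dark night.</p>'))
```

A converter along these lines would let a reader who wants page numbers in a text edition get them without reading the raw source, whatever the browser does with the stylesheet.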
Both markup styles allow you to navigate to a particular page number, even in non-CSS browsers, by using named anchors (i.e. append #pageXXX to the end of the URL). As you say, the information is in the source file, but currently inaccessible to you. One of the ways to solve this problem is to switch to a relatively standard master document format, such as TEI, combined with flexible tools that could convert the source to other editions such as HTML or text, while allowing us to choose how much of the preserved information to include, and also how that information is encoded. You could then easily generate for yourself a 'with page numbers' text edition of the document you're interested in. -- Jon Ingram From jonathan_ingram at yahoo.com Thu Oct 21 01:09:22 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Thu Oct 21 01:09:41 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <417706F0.4070207@hutchinson.net> Message-ID: <20041021080922.31094.qmail@web41721.mail.yahoo.com> --- Joshua Hutchinson wrote: > I'll be going over this with Jon when I can, but my early idea is that > we work on a couple of DP e-texts (the two of us have TONS to choose > from!) and improve the XML markup standard enough for basic work. In a > few weeks or so, I'd like to get a few projects posted to PG that use > XML (TEI) with a CSS style sheet in place of the normal HTML that we > always produce on our projects. The normal text file will of course be > created. Once we have a canon of TEI to work with, hopefully the > developers out there can start working on tools to help produce HTML or > TEXT or PDF directly from the master.
Just to put people's minds at rest, I don't believe we should post XML+CSS without (at the very least) an HTML edition -- certainly not until we have agreement on a common base of XML to use, and well tested tools to convert from this to (at the minimum) an HTML edition that displays acceptably on a wide range of browsers. Even if we do end up using a 'nonstandard' XML markup at DP, I agree with Joshua that we should try as hard as possible to ensure it can be converted easily to TEI (derivatives of which seem to be in favour around here). People at PG will not see the DP-internal markup, only our output, which will conform to the standards we will hopefully agree on at some point :). -- Jon Ingram From Bowerbird at aol.com Thu Oct 21 02:14:47 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 02:15:12 2004 Subject: [gutvol-d] 126 messages in one day Message-ID: <1f4.144353d.2ea8d807@aol.com> by my count, there were 126 messages yesterday. there were probably some _months_ in the last half year when this listserve did not total that many messages... there are a certain number of messages that i _will_ post before leaving here. if y'all like this intense level of traffic, then just keep on responding like you have been. we can turn each one of my topics into a long-drawn-out thread, if that's really what you want to do. we can. on the other hand, if you'd prefer a more sedate experience, i suggest you just let me post my messages and move on... whatever you want to do is quite alright with me... but let's look at my "aspects of a well-done e-book" thread; that generated 36 replies, and the only one that was really pursuing the topic was david starner's, and his message was... well, go and re-read it, if you want, and evaluate it yourself. suffice it to say that y'all still don't have a standard for that.
it might be nice if we could have a _useful_ discussion. as for the let's-get-x.m.l.-going crowd, perhaps what you need is your own _separate_listserve_ to coordinate your efforts, one where i am not allowed, so you can really be "productive". good idea, eh? -bowerbird From marcello at perathoner.de Thu Oct 21 02:42:16 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 02:42:39 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020210912.GB22445@panix.com> References: <20041020203414.88526EDE60@ws6-1.us4.outblaze.com> <20041020210912.GB22445@panix.com> Message-ID: <41778478.3060609@perathoner.de> Jim Tinsley wrote: > That has lasted a lot longer than any of us would have believed > at the time, because despite the apparent reasonableness -- > to me, at least -- of the request, we still ain't got it. Maybe the request is just that: reasonable to *you* and nobody else. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Oct 21 03:10:18 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 03:10:43 2004 Subject: [gutvol-d] Posting TEI Message-ID: <1e1.2d2ba8d4.2ea8e50a@aol.com> marcello said: > Maybe the request is just that: > reasonable to *you* and nobody else. yeah, right... -bowerbird From marcello at perathoner.de Thu Oct 21 03:31:59 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 03:32:29 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020205934.GA22445@panix.com> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> Message-ID: <4177901F.7010006@perathoner.de> Jim Tinsley wrote: > I really do not mean to be disrepectful when I -- speaking for myself -- > say that I'm not interested in spending my time making developers' jobs > easier. That's not what I'm here for. 
On one hand you wonder why we developers are not able to come up with a solution; on the other hand you are not disposed to move back one inch from your developer-unfriendly position. Your policy of not posting TEI files is at present the main roadblock. It's like requesting the final release of the product before allowing a beta test. I have been doing other (hopefully useful) work and have not looked at the TEI code for about a year now because I don't see a way to get it to work with this `moratorium' in place. > We have text, and HTML, both > proven and well-supported formats that we know how to work with and for > which we know there is a demand. I'll stick to those until we can see > a way clear through to making successful XML. You sure know how to work with PLAIN ALL CAPS ASCII TEXT FILES but that's not a reason to shun all progress since. > Correct spelling is necessary but not sufficient. I don't know about > other people, but I most commonly find errors by skimming the text. > I can't do that with XML. After a few weeks you'll skim thru TEI like you skim thru plain text. (Use an editor that highlights the tags and use a low contrast color for the tags.) > And it may not be the way software development works, but then we're not > a software development project. But you depend on software. DP is 250,000 lines of code. If it was not for software you wouldn't have much to do. > that's not the problem. If the process we agree for teixlite is, say, run > it through Saxon, then I expect to be able to run all teixlite files > through Saxon, and not have a submitter say "oh, no, you must use Xalan for > this file, and not just any Xalan, but one with my patch in it." You have to use PGTEI stylesheets to convert PGTEI text. You can use them with any XSLT 1.0 compliant processor. > You see, we appear to differ very fundamentally on one point. It's > my lock and key analogy again.
> I do not want to start down the road
> of producing posted files from an XML if the transform, will be, for
> any reason, not repeatable in a year's time, or five, or ten.

This amounts to the same as: never start at all.

Remember: the first files were uppercase ascii. We *had* to do them over again. We *are* doing all pre-10K texts over again. We *will* have to do the TEI files over again, maybe more than once. That's only being realistic.

> I do
> not want to start down the road of producing posted files from XML
> if an end-user who wants to -- on whatever platform -- cannot
> replicate the process.

Then you should also post all the scanned pages so a user can redo the OCR on her platform if she wants to.

I think we can postpone this, because the user can grab the converted files. And if converting at home is an issue with him, hey!, the tools are Free Software. He can change them until they work on his platform and submit the patches to us.

>>For the start I will act as interim Post-Processor for people wanting to
>>post PGTEI and pass on to you only the perfectly good ones. You'll just
>>have to stick in the etext number where I put 5 asterisks.
>
> No; I, at least, don't want to work with an experimental process in which
> each text is an exception.

Is there some qualifying exam to become a whitewasher? I ask, because by now I'm so desperate that I'm quite willing to become a whitewasher myself just to see some TEI texts posted.

> Why can't we just name them .xml? I see no reason to invent extensions.
> _Is_ there one? Not that it matters much, just curious why you would
> think this a good idea.

Because there ain't such a thing as an XML file. XML is just a framework for building applications. XHTML is an XML application, SVG is an XML application, TEI is an XML application, OpenOffice file format is an XML application ...
Labelling a file .xml is like labelling a Word file .bytes -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Oct 21 03:59:48 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 04:00:19 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020213206.GA10983@panix.com> References: <20041020205934.GA22445@panix.com> <20041020211431.57061.qmail@web41709.mail.yahoo.com> <20041020213206.GA10983@panix.com> Message-ID: <417796A4.8000703@perathoner.de> Jim Tinsley wrote: > I know of no CSS for Marcello's PGTEI. Perhaps one could > be crafted for it. It already works pretty well with Jeroens XSL: http://www.gutenberg.org/tei/examples/css/lmiss.xml I had to replace all named entities (like —) with numeric ones. I did that manually, so maybe I got some of them wrong. All quotation signs are missing because I replace quotation signs with and Jeroen does not. But this should be very easy to add to Jeroens XSL. -- Marcello Perathoner webmaster@gutenberg.org From cweyant at twcny.rr.com Thu Oct 21 04:44:40 2004 From: cweyant at twcny.rr.com (Curtis A. Weyant) Date: Thu Oct 21 04:41:12 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: References: <20041020121613.C3F9D9E96A@ws6-2.us4.outblaze.com> Message-ID: <4177A128.10803@twcny.rr.com> David A. Desrosiers wrote: > Or, more correctly, by going to View -> Use Style, because there is > no such selector in Mozilla or "Mozilla-based browsers" in the lower > left-hand corner. At least not on my Unix, Linux and Windows versions of > Mozilla (all current). Firefox has an icon as JHutch describes. Curtis. From cweyant at twcny.rr.com Thu Oct 21 04:52:03 2004 From: cweyant at twcny.rr.com (Curtis A. Weyant) Date: Thu Oct 21 04:48:35 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: References: <146.364dc5ec.2ea7eddb@aol.com> Message-ID: <4177A2E3.9080703@twcny.rr.com> David A. 
Desrosiers wrote: > Users aren't using MSIE because it is the superior product, they're > using it because they have no idea there are significanly more secure, > functional, compliant browser alternatives out there, and because it > came with their pee-cee, with a nice convenient icon right on their > desktop. This is why I installed Firefox on my mom's computer for her. I then deleted the IE icon and changed the Firefox icon to read "Internet" -- and I have no doubt she has not noticed a single change. ;O) Curtis. From marcello at perathoner.de Thu Oct 21 04:49:30 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 04:49:55 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <8b.17ed6e78.2ea83dc0@aol.com> References: <8b.17ed6e78.2ea83dc0@aol.com> Message-ID: <4177A24A.40000@perathoner.de> Bowerbird@aol.com wrote: > those aren't the types of things michael hart would say. That ain't no Hank Williams song! > among the many things that that has meant is to work with > the _trailing_ edge, not the _leading_ edge, of technology. > and that strategy hasn't caused it to "stagnate", but rather > what has caused it to grow into the biggest cyber-library... Have you got any data to sustain your theory, like, uhm, a representative poll of pg user population? Or are you bumbling along without a clue as usual, causally linking to facts that, for all *you* know, may be completely unrelated. 
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Oct 21 05:00:38 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 05:00:42 2004 Subject: [gutvol-d] press releases and puke on a reporter's shoes In-Reply-To: <158.419ce2ec.2ea850d9@aol.com> References: <158.419ce2ec.2ea850d9@aol.com> Message-ID: <4177A4E6.8060805@perathoner.de> Bowerbird@aol.com wrote: > anyone who's stupid enough to > think _i_ would be _serious_ > in announcing an effort to write > an oeb viewer is _really_ stupid. Hey! Why be narrow-minded about this? I think everybody who thinks that you'll ever be _serious_ is really stupid. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Thu Oct 21 05:09:19 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 05:09:19 2004 Subject: [gutvol-d] Re: jeroen's even-handed analysis Message-ID: <20041021120919.105919E97A@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Karl Eichwalder > > "Joshua Hutchinson" writes: > > > There will always be plain text files available. > > You do more harm than good with these "plain text files". At least, > please adjust your filename conventions: > > foo.zip -> foo-7.zip > foo-8.zip -> foo.zip > > Otherwise you can fool the innocent downloader easily. And stop wasting > bandwidth: delete all .txt files if a .zip file is available. > Personally, I lump 8-bit and 7-bit text files together when I say plain text files. I wouldn't shed a single tear if we did away with the 7-bit text files from here on out. Then again, I personally, wouldn't shed a tear if we did away with text files completely, but that is a personal preference. 7bit files aren't any easier to read than 8bit files and the lost information (accents, etc) can be significant. As far as the having everything .zip ... that's being a little anal, don'tcha think? 
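
To make the point about lost information concrete, here is a minimal sketch in Python of what a naive 8-bit to 7-bit reduction does to accented text. This is an illustration only, not the procedure PG actually uses to prepare its 7-bit files; the function name to_seven_bit and the sample strings are made up for the example.

```python
import unicodedata


def to_seven_bit(text: str) -> str:
    """Reduce text to 7-bit ASCII (hypothetical helper, for illustration).

    Accented letters are decomposed into base letter + combining mark,
    and then everything outside ASCII is dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for sample in ("Les Misérables", "Götterdämmerung", "Straße"):
        # The accents disappear, and a letter with no ASCII base form
        # (such as the German sharp s) vanishes altogether.
        print(sample, "->", to_seven_bit(sample))
```

Once the marks are gone there is no way to get them back from the 7-bit file alone, which is why the 8-bit file is the safer one to treat as the default.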
Josh From marcello at perathoner.de Thu Oct 21 05:10:43 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 05:10:46 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: References: <8b.17ed6e78.2ea83dc0@aol.com> <41770A4E.9030305@hutchinson.net> Message-ID: <4177A743.6020108@perathoner.de> David A. Desrosiers wrote: >> If nothing else, and I've said this before, bowerbird is a very good >> troll. > > Every bridge has its troll. How did you get rid of the troll? I seem to recall, by having the bear come after you. Maybe we could lure another troll onto this list and have the two of them fight each other until they've both had it. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Thu Oct 21 05:15:12 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 05:15:14 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041021121512.D434A4F4AB@ws6-5.us4.outblaze.com> Well, the page numbers is something I looked into having in the non-CSS versions (LYNX). However, it ended up being unworkable. Basically, since you can't move them to the side in LYNX, they appear right in the middle of the text. And then, if you didn't know already what it was, it looked like it was part of the sentence you were reading and didn't make a whole lot of sense. This was the best compromise I could come up with. A minimum functionality (the functionality offered by the vast majority of HTML texts we offer) with a few added benefits for the browsers that can support them. And IE can show the page numbers, it just doesn't do it by default. Since IE doesn't have on the fly style switching, it requires modifying the HTML doc manually. Definitely something I wish I could avoid, but since I can't, I tried to default the layout to the least obtrusive. 
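
For anyone who wants the page numbers in a text dump without a CSS-capable browser, the idea can be sketched with Python's standard html.parser module. This is only a sketch under assumptions: it supposes the page markers are spans carrying a class named "pagenum", which is a hypothetical name chosen for the example and not necessarily what the posted HTML files use.

```python
from html.parser import HTMLParser


class PageNumberText(HTMLParser):
    """Collect the text of an HTML fragment, optionally keeping page markers.

    Assumes (hypothetically) that page numbers look like
    <span class="pagenum">12</span>.
    """

    def __init__(self, keep_page_numbers: bool):
        super().__init__()
        self.keep_page_numbers = keep_page_numbers
        self.parts = []
        self.span_stack = []  # one flag per open <span>: is it a page marker?

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.span_stack.append("pagenum" in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            self.span_stack.pop()

    def handle_data(self, data):
        if any(self.span_stack):
            # Inside a page marker: bracket it so it cannot be mistaken
            # for part of the sentence, or drop it entirely.
            if self.keep_page_numbers:
                self.parts.append(f"[p. {data.strip()}] ")
        else:
            self.parts.append(data)


def html_to_text(html: str, keep_page_numbers: bool = False) -> str:
    parser = PageNumberText(keep_page_numbers)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    snippet = ('<p>It was the best of times, '
               '<span class="pagenum">12</span>it was the worst of times.</p>')
    print(html_to_text(snippet))
    print(html_to_text(snippet, keep_page_numbers=True))
```

Bracketing the number is one way around the problem described above, where a bare number in the middle of a line reads as if it were part of the sentence.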
Josh ----- Original Message ----- From: Tony Baechler To: Project Gutenberg Volunteer Discussion Subject: re: Re: [gutvol-d] Re: aspects of a well-done e-book Date: Thu, 21 Oct 2004 00:13:59 -0700 > > At 01:41 PM 10/20/2004 -0500, you wrote: > >That is what the page numbers markup currently does. It hides the page > >numbers for those the minimum, default behavior, but if you have a browser > >that supports it, you can see those page numbers appear. Similarly with > >poetry. It has features that allow the browser to rewrap nicely if there > >is a long line, if the necessary CSS support is there ... but if not, it > >still displays the poem with its normal indents, it just doesn't rewrap > >nicely for you. > > > OK, but I have a question. I regularly use Lynx because of convenience. I > prefer plain text but I will sometimes use lynx to convert html when > necessary. Let's say that I do, in fact, want the page numbers. How am I > supposed to get them if my browser doesn't support it? Lynx doesn't do css > as far as I know, so what you're saying is that page numbers will always be > hidden from me unless I want to look at the raw html source. Because plain > text is, among other things, removing the markup from html, wouldn't that > also eliminate the page numbers? I can use IE and it is accessible to the > blind, but according to what you said IE hides different styles > anyway. So, unless I misunderstood the above completely, some information > will always be inaccessible to me. Right? Please don't tell me to use > Mozilla or some other browser. At some point they will probably be > accessible, but not for now. They are working on it but aren't there yet. > > I would like to repeat that I still prefer plain text and normally I > wouldn't even care about line indents or page numbers. However, it would be > nice to at least be able to access the information if I have a need for it. 
> > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joshua at hutchinson.net Thu Oct 21 05:22:48 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 05:22:50 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041021122248.8451EEDD67@ws6-1.us4.outblaze.com> ----- Original Message ----- From: "Curtis A. Weyant" > > This is why I installed Firefox on my mom's computer for her. I then > deleted the IE icon and changed the Firefox icon to read "Internet" -- > and I have no doubt she has not noticed a single change. > Heh! I'm not the only one that does "stealth" fixes on relatives' computers! ;) Josh From hacker at gnu-designs.com Thu Oct 21 06:21:28 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Oct 21 06:22:11 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041021080922.31094.qmail@web41721.mail.yahoo.com> References: <20041021080922.31094.qmail@web41721.mail.yahoo.com> Message-ID: > Even if we do end-up using a 'nonstandard' XML markup at DP, I agree > with Joshua that we should try as hard as possible to ensure it can be > converted easily to TEI (derivatives of which seem to be in favour > around here). People at PG will not see the DP-internal markup, only our > output, which will conform to the standards we will hopefully agree on > at some point :). It doesn't matter if it is "non-standard XML" (of course, there is no such thing, as long as it is a well-formed XML document). Once the format is in something like XML, we are all free to create our own output from that base, including "correcting" the XML to output a different form of XML from it. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hacker at gnu-designs.com Thu Oct 21 06:26:23 2004 From: hacker at gnu-designs.com (David A. 
Desrosiers) Date: Thu Oct 21 06:27:10 2004 Subject: [gutvol-d] 126 messages in one day In-Reply-To: <1f4.144353d.2ea8d807@aol.com> References: <1f4.144353d.2ea8d807@aol.com> Message-ID: > there were probably some _months_ in the last half year when this > listserve did not total that many messages... Excuse me, "Listserve" is the trademarked name of a product owned and created by L-Soft International, Inc. We shouldn't be referring to the PG lists as a list run by that product, because well, it isn't. This is a mailing list run by a product called Mailman. I may be being pedantic here, but since we should all be aware of certain copyright and trademark issues as we continue to work on PG and other similar products, the distinction is an important one to make. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hacker at gnu-designs.com Thu Oct 21 06:31:18 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Oct 21 06:32:11 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <4177A2E3.9080703@twcny.rr.com> References: <146.364dc5ec.2ea7eddb@aol.com> <4177A2E3.9080703@twcny.rr.com> Message-ID: > This is why I installed Firefox on my mom's computer for her. I then > deleted the IE icon and changed the Firefox icon to read "Internet" -- > and I have no doubt she has not noticed a single change. I've done something similar for some local businesses here as well, except the IE icon itself, launches FireFox. You also have to remember to set the user's default browser to FireFox, or IE will always be used, and you have to make sure you change the shortcut that the one in the Start bar points to (the one next to the desktop icon, etc. to the right of the Start bar.) After that, I installed about 6 extensions that help them with their daily work, and to compel them to stay on FireFox, and I installed the IE FireFox theme, so it "looks" identical. 
So far, no complaints, and one company even said they can't believe they were using MSIE before for all of their Inter and Intranet work, given the huge number of nice features in FireFox that are lacking in MSIE. (Little trivia fact: The last version of MSIE was released over two years ago). This also included the dramatic reduction in their support calls after installing FireFox, because they didn't have to handle sending out a tech to deal with viruses, trojans, spyware, popups, and other things that MSIE seems rife to proliferate. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From SCREAMING.xml.queen at gmail.com Thu Oct 21 06:36:33 2004 From: SCREAMING.xml.queen at gmail.com (name>XML Queen Marcello Perathoner wrote: > Maybe we could lure another troll onto this list and have the two of them fight each other until they've both had it. You rang, babycakes? I am XML Queen Letitia! Hear me ROAR!! You've been doing things all wrong for the last 30 years. :+?/> x/M/l is the Way, the Truth, and the Light. There's no point what-so-fricken-ever in giving people choice. FEED the Democratic citizens of Gutenbergia XmL and they will use it with my viewer PROGRAM which will dazzle them. I have written ROUTINES in LOGO which will transform your worthless TXT into XML, XSLT, XQML, and XBB38GGOBS. You cannot have these routines unless you bow down to me becuase otherwise YOU are not worthy. It will be fun for you to redo work already done ;+?/> I have an ally in this list I SEE. BOWERBIRD is very evanGELical, but she consistently mis-labels xML (I know z and x are very close on the keyboard). I am a competitive interpretive yogalates-funk artist AND I KNOW BETTER THAN YOU GEEKS. Fear me, Letita. 
-- twajs From tb at baechler.net Thu Oct 21 06:46:03 2004 From: tb at baechler.net (Tony Baechler) Date: Thu Oct 21 06:44:58 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <20041021080254.30607.qmail@web41727.mail.yahoo.com> References: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> Message-ID: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> At 01:02 AM 10/21/2004 -0700, you wrote: >As you say, the information is in the source file, but currently inaccessible >to you. One of the ways to solve this problem is to switch to a relatively >standard master document format, such as TEI, combined with flexible tools >that >could convert the source to other editions such as HTML or text, while >allowing >us to choose how much of the preserved information, and to also choose how >that >information was encoded. You could then easily generate for yourself a 'with >page numbers' text edition of the document you're interested in. So, does this mean that I now not only have to download the master xml file, the css, and a set of conversion tools? You must be kidding, right? If it came to that, I would rather have the plain text and forget the page numbers. It is already inconvenient to use "lynx -dump -nolist filename.htm." Why in the world would I want to run it through a conversion tool and still have to do that anyway? OK, so a plain text file can be output directly from the xml. I still have to go through at least one extra conversion step that I wouldn't have to otherwise. I had a look at sgml just to see how hard it would be to get plain text. What a royal pain! I gave up when it kept complaining about some file missing when I was using their samples. From joshua at hutchinson.net Thu Oct 21 06:51:02 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 06:51:06 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book Message-ID: <20041021135103.0B17A1097DB@ws6-4.us4.outblaze.com> No, no, no. 
We NEVER expect you (as a consumer) to have to convert files from the master XML format to format du jour. However, if you want something non-standard (like a text file that *includes* source page numbers), *then* you may have to convert it yourself. If you want an HTML with pink backgrounds and purple text, you can convert a copy yourself (if you have the tools and the know-how) but we won't have such a version pre-made on the download page.

Basically, we will never have LESS available than we do now. The XML initiative is about giving more options above and beyond the baseline we already have (as well as making life easier on the text preparers and maintainers).

Josh

----- Original Message -----
From: Tony Baechler
To: Project Gutenberg Volunteer Discussion
Subject: re: Re: [gutvol-d] Re: aspects of a well-done e-book
Date: Thu, 21 Oct 2004 06:46:03 -0700

>
> At 01:02 AM 10/21/2004 -0700, you wrote:
> >As you say, the information is in the source file, but currently inaccessible
> >to you. One of the ways to solve this problem is to switch to a relatively
> >standard master document format, such as TEI, combined with flexible tools
> >that
> >could convert the source to other editions such as HTML or text, while
> >allowing
> >us to choose how much of the preserved information, and to also choose how
> >that
> >information was encoded. You could then easily generate for yourself a 'with
> >page numbers' text edition of the document you're interested in.
>
>
> So, does this mean that I now not only have to download the master xml
> file, the css, and a set of conversion tools? You must be kidding,
> right? If it came to that, I would rather have the plain text and forget
> the page numbers. It is already inconvenient to use "lynx -dump -nolist
> filename.htm." Why in the world would I want to run it through a
> conversion tool and still have to do that anyway? OK, so a plain text file
> can be output directly from the xml.
I still have to go through at least > one extra conversion step that I wouldn't have to otherwise. I had a look > at sgml just to see how hard it would be to get plain text. What a royal > pain! I gave up when it kept complaining about some file missing when I > was using their samples. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Thu Oct 21 07:06:08 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 07:06:13 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020215041.GA3631@panix.com> References: <20041020205934.GA22445@panix.com> <20041020211431.57061.qmail@web41709.mail.yahoo.com> <20041020213206.GA10983@panix.com> <20041020215041.GA3631@panix.com> Message-ID: <4177C250.3020600@perathoner.de> Jim Tinsley wrote: >>Lets start now with a version 0.0.1 of the TEI process. Of course at >>some later time we'll have to do all the posted files over again. > > Now, please don't take this as a policy statement or > anything, but I really, really HATE doing anything > KNOWING that it's wrong and will have to be done again. > I mean, bone-deep HATE it. *Why* is it wrong? If, having to redo something is an indication of premature start, we shouldn't have posted the first 10.000 books, because we have to repost them all. I have been architecturing and programming for 20 years now and I cannot remember one single instance I got it 100% right the first time. 
-- Marcello Perathoner webmaster@gutenberg.org From jonathan_ingram at yahoo.com Thu Oct 21 07:12:44 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Thu Oct 21 07:12:47 2004 Subject: [gutvol-d] Re: aspects of a well-done e-book In-Reply-To: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> Message-ID: <20041021141244.32137.qmail@web41701.mail.yahoo.com> --- Tony Baechler wrote: > So, does this mean that I now not only have to download the master xml > file, the css, and a set of conversion tools? If you wanted material in plaintext format which wasn't in the plaintext edition provided already, yes. > If it came to that, I would rather have the plain text and forget the > page numbers. If you want the mainstream plain text edition, then you download the mainstream plain text edition. If you want to create your own edition with extra material, then at the moment you're completely out of luck, as there's no way for you to generate it. With a transformable master document, then you get the chance to get your hands on this information in the format you want, for the first time. -- Jon Ingram __________________________________ Do you Yahoo!? Yahoo! Mail Address AutoComplete - You start. We finish. 
http://promotions.yahoo.com/new_mail From gbnewby at pglaf.org Thu Oct 21 08:02:27 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Oct 21 08:02:28 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <4177901F.7010006@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de> Message-ID: <20041021150227.GA17442@pglaf.org> On Thu, Oct 21, 2004 at 12:31:59PM +0200, Marcello Perathoner wrote: > Jim Tinsley wrote: > > >I really do not mean to be disrepectful when I -- speaking for myself -- > >say that I'm not interested in spending my time making developers' jobs > >easier. That's not what I'm here for. > > One one hand you wonder why we developers are not able to come up with a > solution on the other hand you are not disposed to get one inch back on > your developer-unfriendly position. > > Your policy of not posting TEI files is at present the main roadblock. > > It's like requesting the final release of the product before allowing a > beta test. > > I have been doing other (hopefully useful) work and have not looked at > the TEI code for about a year now because I don't see a way to get it to > work with this `moratorium' in place. I don't understand this limitation, so will rephrase what we're waiting for. It was among the first messages in this thread. ** What we want is an automatic means of generating canonical ** documents from an XML master. The minimums are: XML --> HTML and XML --> text (yes, it's ok to go via HTML) Displaying XML directly in a browser is not a requirement, but is nice to have. There are a few subsidiary requirements, like incorporating the header materials in a sanely marked up way (trivial with teixlite.dtd, but not unambiguous). Both Marcello and Jeroen have demonstrated techniques for these, but neither is quite ready. 
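
The two minimum conversions named above (XML --> text and XML --> HTML from one master) can be sketched in a few lines. This is a toy under stated assumptions, not the PGTEI or Jeroen's conversion: those are XSLT stylesheets according to this thread, while the sketch below uses Python's standard xml.etree.ElementTree as a stand-in, and the tiny master document and the function names to_text and to_html are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up, heavily simplified TEI-style master (assumption for this sketch).
MASTER = """<TEI.2>
  <teiHeader><fileDesc><titleStmt>
    <title>A Sample Text</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>First paragraph.</p>
    <p>Second <hi>emphasised</hi> paragraph.</p>
  </body></text>
</TEI.2>"""


def _title_and_paragraphs(master_xml: str):
    """Pull the title and the paragraph texts out of the master."""
    root = ET.fromstring(master_xml)
    title = root.findtext("teiHeader/fileDesc/titleStmt/title", default="")
    paragraphs = [
        " ".join("".join(p.itertext()).split())  # flatten inline markup
        for p in root.iter("p")
    ]
    return " ".join(title.split()), paragraphs


def to_text(master_xml: str) -> str:
    """Plain-text edition: title, then paragraphs separated by blank lines."""
    title, paragraphs = _title_and_paragraphs(master_xml)
    return "\n\n".join([title] + paragraphs)


def to_html(master_xml: str) -> str:
    """Bare-bones HTML edition generated from the same master."""
    title, paragraphs = _title_and_paragraphs(master_xml)
    lines = [f"<h1>{escape(title)}</h1>"]
    lines += [f"<p>{escape(p)}</p>" for p in paragraphs]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_text(MASTER))
    print(to_html(MASTER))
```

Both outputs come from the same parse of the same file, so a correction made once in the master shows up in every edition regenerated from it.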

We do have several XML documents online, we also have this list (also gutvol-p), and there are a couple of demonstration pages. Your claim that we need to start posting more stuff in XML in order to achieve the ** goal above does not make sense to me. I do not see the logic.

I'm personally not strongly opposed to doing all sorts of experimentation, and do NOT feel the urge to get it right from the start. I also am certain that there is not going to be a one-size-fits-all technical solution for all of our content.

I've asked both Marcello & Jeroen for updates & ideas in the past months. Maybe they did not get my messages.

My belief is that there is a definite commitment at PG (including DP) in creating XML masters. I also believe that TEI-lite encoding will work well for the majority of our content.

From my point of view, I'd rather see the gutvol-d group of highly motivated & talented individuals focused on solving the remaining challenges for the solutions that Marcello and Jeroen have in place already.

Arguing about whether we'll use XML is a waste of time: we will. The challenges before us are primarily technical, not policy.

-- Greg

PS: No, I have not read every message in the threads over the past few days. If there's another solution somewhere, I hope someone can point it out to me.

From brad at chenla.org Wed Oct 20 00:25:18 2004
From: brad at chenla.org (Brad Collins)
Date: Thu Oct 21 08:03:32 2004
Subject: [gutvol-d] jeroen's even-handed analysis
In-Reply-To: <4175B419.6030301@adelaide.edu.au> (Steve Thomas's message of "Wed, 20 Oct 2004 10:10:57 +0930")
References: <4175B419.6030301@adelaide.edu.au>
Message-ID:

Ack! This is a looong post.... and I'd promised myself I wouldn't get dragged into this flame-fest :(

Steve Thomas writes:

> OK, you've somewhat overstated the case, and I think by now we'd all
> agree that "8-bit" characters are important.
> But it is a shame that
> most of the geeks -- no offence, I count myself as one -- on this
> list, immediately skipped your main point to whine about the need for
> accents and foreign scripts. You guys can't seem to see the wood for
> the trees.
>

You're right, it's not just about accents, and it's not just about consistently converting texts into different formats, though these are both important issues in their own right.

This aside, it's you who have it backwards. You keep talking about the end-use of the text, which is opening up a file and reading it. But it's far from being this simple.

XML is not meant for humans, it is meant for software. The XML will be converted to plain-text, HTML and PDF for humans but mostly the XML will be used by applications humans need to find texts and determine if they are worth reading in the first place.

If you have a small library with 10,000 books in it, and the library is shelved roughly by category you can easily get to know it just by glancing over the spines. You could even have a rough list that breaks down the books by title, author and category.

But if you have 100,000 or a 1,000,000 books in your library your job of finding things becomes a lot more difficult. Keyword searching ala Google will never cut it.

Google gives you a means of finding your car keys -- you know what you are looking for and you ask it to look for places which it thinks might have them.

Ask Google for a list of the works by Charles Dickens and you will get a list of web pages it thinks has lists of Charles Dickens' works.

Ask the LOC Catalog the same question and it will return you a list of items in their catalog which claim to have been written by Charles Dickens. But this list would be huge because of duplicate editions of individual works. A Christmas Carol alone turns up a couple hundred items.

But what if you could ask this same question and it would return a list of works (not web pages, or different editions) by Charles Dickens organized in any way you want?

But this is not a good example. Can you ask for a list of all the characters in Great Expectations? Can you search for all contemporary obituaries of Charles Dickens?

To build applications which answer these types of questions requires more than a good cataloging system (though the FRBR approach goes a long ways in this regard): you need the table of contents of each work (a TOC is a description of the structure of a text) and you need to have a good index of what is in the text. A back of book index is more than just a matter of keywords, it is a form of semantic markup. It maps concepts, people, places and events to the text itself.

By combining the catalog metadata, the table of contents, and a good quality index we have the basic tools for finding a book and determining if it is worth reading.

We do this today in libraries but it is a slow laborious task which requires you going to a catalog looking for possible candidates, then retrieving each candidate and scanning its TOC, preface, dust-jacket blurb or introduction or index to determine if it's worth reading. Traditional libraries are restricted by the physical medium that books are published in.

But if you could pull all of these elements together into a consistent framework, you would have a remarkable resource which would transform an archive of books into a repository of knowledge which is far more valuable and powerful than the sum of its parts.

Semantic markup like TEI is needed not only for creating this kind of library, but for creating services which will be needed as the amount of information on the Net grows beyond what even monster search services like Google can handle.

You talk about missing the forest for the trees but you forget that a large part of the forest is a tangled root system deep underground which the end user will never see. Without that root system the forest will die. Structured, semantic markup and rich cataloging are the root system of a library.

Anyone who says -- I don't care about the technical stuff just give me what I want, doesn't understand that it's the technical stuff which enables them to get the stuff they want.

Is this hard work? Hell yes, and it should be. Understanding, evaluating and making sense of the world around us is the most difficult thing humans do. But saying that it's not worth doing because it's hard is simply pathetic.

Look at works like the OED. Would they have been created if their attitude was, oh, it's too hard to build a dictionary based on historical principles and I don't read the quotes much anyway, so just give me a list of words.

Even if you don't read the quotes, the unabridged OED and the unabridged Webster's, or Century Dictionary were used to create brilliant concise works like Merriam-Webster's Collegiate, or the Concise Oxford English Dictionary. The OED and the massive collection of research and material that was created to write it is the root system for all dictionaries Oxford produces.

The more important question we should be asking is, what is the role that PG and even DP should be playing in all of this?

It's reasonable to ask that PG produce basic structured markup which shows the basic structure and important elements in each text. This is no more difficult than HTML.

I believe that a new group needs to be established who will then take the simple TEI produced by PG and DP and then do more complex cataloging, indexing and semantic markup which will then be sent back to PG to be released as new editions.

The TEI documentation (which is 1,400 pages -- not 14,000 as Bowerbird exaggerated) recommends that markup be done in several passes.
Start with simple structural markup (which, as I said, is about the same as HTML), and then pass it on to another team which can do a second more detailed pass, and so on until it's complete. In this way you have a means of creating texts which will be gradually woven into the library but everyone will be using a consistent and interoperable format which can be as simple or as complex as anyone requires. If everything is in basic TEI-Lite, it will be easy for smaller specialized groups to come along to do this additional markup. A group could form around a single author like Mark Twain, or around a category of works like mathematics. Then it will be easy for them to donate back their work to PG, making the texts richer, rather than their work becoming a separate branch of the texts which isn't interoperable with the PG editions. Plain-text, HTML and PDF can't do this because they are display formats for human consumption. Each has its uses and its markets. TEI is used for the root system which needs to be grown, tended and cared for as the forest grows, even if 90% of people aren't even aware it's there or don't understand that the applications they depend on to find any particular tree in the forest and see if it's the tree they need wouldn't work without it. To understand where I'm coming from on all of this, I should mention (plug plug plug) that I've been working on just such a system (http://www.chenla.org) which is divided into two parts. The first is the Burr Metadata Framework (BMF) which is meant to be sort of a Wiki markup for both integration of and export to TEI and MARC. The second part is the Librarium which uses BMF to integrate the catalog with the works in a library. We have recently put up our first experimental record (an authority record for Charles Dickens) which has been converted into HTML and plain text. Conversion to TEI and MARC is coming.
Taken together, the system can be used to integrate library catalogs with books and other texts and reference works all together with authority data for persons and groups, geographic locations, events and concepts. We don't intend to be a service for the general public but rather to create a catalog and content for use in other libraries and web sites. The site is hosted at ibiblio. In the next few weeks we should have enough documentation and another 30 or more records (which we call Burrs) online to make a general announcement of the project. I'm still ironing out some bugs in the version control software and still need to do a lot of work to complete a general introduction to the design but it's all getting there. At the moment we can convert BMF to Emacs-Wiki format which I then use to publish to Blosxom which delivers basic HTML. BMF was designed with conversion to TEI in mind, though this might seem hard to believe when you look at the BMF source the first time (there is a link to a pretty-print version of the Dickens source). So what's in it for PG? The Librarium will be developing detailed authority and bibliographic records for all PG material and it's hoped that PG can eventually draw on our catalog material for its own authority records and catalog. This should be a help both for books already in PG's collection and for copyright clearance for new books, and should free up resources for putting out more books, with better metadata. b/ -- Brad Collins , Bangkok, Thailand From joshua at hutchinson.net Thu Oct 21 08:18:16 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 08:18:20 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <20041021151816.691302F975@ws6-3.us4.outblaze.com> ----- Original Message ----- From: Brad Collins Thank you, Brad. That was probably the best essay I've read on WHY XML markup is such a good idea. I've got a good 40-50 years left in this lifetime.
I want to get to the digital e-text nirvana by then. XML seems to be the best first step. Josh From nwolcott2 at kreative.net Thu Oct 21 08:42:46 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Thu Oct 21 08:47:29 2004 Subject: [gutvol-d] POD update. Message-ID: <000f01c4b784$b30609e0$4f9495ce@net> This is probably of most interest to DP'ers, but this is the only discussion site. Recently several new self-publishing sites have emerged with zero up-front publishing costs. Two I have looked at are www.lulu.com and www.cafepress.com. There are 2 Jules Verne books on Cafe. Costs are $7 + .03/page for Cafe and $4 + .02/page for Lulu. Print resolution and gray scale images are not covered on the web sites. These prices are comparable to xerox costs. I checked out the shipping: it was $4 flat on Cafe, and $2.46 media mail from Lulu for an 8 1/2 by 11 300 p book. Journey to the Interior of the Earth was 300 pages, so it might include some illustrations. No preview was available. Might be on Amazon.com, says Boulder Pier Press. Check back with me if you get any more info on these or better no-cost-upfront services. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041021/2b19d8cb/attachment.html From ke at gnu.franken.de Thu Oct 21 08:21:54 2004 From: ke at gnu.franken.de (Karl Eichwalder) Date: Thu Oct 21 09:31:50 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <20041021150227.GA17442@pglaf.org> (Greg Newby's message of "Thu, 21 Oct 2004 08:02:27 -0700") References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org> Message-ID: Greg Newby writes: > Your claim that we need to start posting more stuff in > XML in order to achieve the ** goal above does not make sense > to me. I do not see the logic. I am very interested in XML files. Please post them even if they look useless to you. -- | ,__o | _-\_<, http://www.gnu.franken.de/ke/ | (*)/'(*) From Bowerbird at aol.com Thu Oct 21 09:33:37 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 09:33:49 2004 Subject: [gutvol-d] press releases and puke on a reporter's shoes Message-ID: <1e2.2cbee04e.2ea93ee1@aol.com> marcello said: > I think everybody who thinks that > you'll ever be _serious_ is really stupid. and i think any reasonable outsider reading these threads will be able to clearly see that i contribute a lot of thought, while all you do here is come on and badger me continuously. -bowerbird From joshua at hutchinson.net Thu Oct 21 09:40:44 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 09:40:50 2004 Subject: [gutvol-d] Re: barriers to XML posting Message-ID: <20041021164044.47AE29E9A9@ws6-2.us4.outblaze.com> Honestly, I see an easy compromise here. As long as a conformant TEXT file and a conformant HTML file show up with the XML file, I say post all three. 
Granted, right now we don't have a method for the WW'ers to verify the XML file is valid, so if you want to put a disclaimer to that effect in the file ... fine. But this way, the XML folks can get the catalog of texts they want and the process will be able to make incremental steps forward. Josh ----- Original Message ----- From: Karl Eichwalder To: gutvol-d@lists.pglaf.org Subject: [gutvol-d] Re: barriers to XML posting Date: Thu, 21 Oct 2004 17:21:54 +0200 > > Greg Newby writes: > > > Your claim that we need to start posting more stuff in > > XML in order to achieve the ** goal above does not make sense > > to me. I do not see the logic. > > I am very interested in XML files. Please post them even if they look > useless to you. > > -- > | ,__o > | _-\_<, > http://www.gnu.franken.de/ke/ | (*)/'(*) > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joshua at hutchinson.net Thu Oct 21 09:46:05 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 09:46:10 2004 Subject: [gutvol-d] press releases and puke on a reporter's shoes Message-ID: <20041021164605.1A47FEDE98@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > > marcello said: > > I think everybody who thinks that > > you'll ever be _serious_ is really stupid. > > and i think any reasonable outsider reading these threads > will be able to clearly see that i contribute a lot of thought, > while all you do here is come on and badger me continuously. > Yep ... That's why *so* many people have been jumping to your defense lately! You're the poor little victim of us big mean bullies! 
Josh From holden.mcgroin at dsl.pipex.com Thu Oct 21 09:49:00 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Thu Oct 21 09:48:17 2004 Subject: [gutvol-d] aspects of a well-done e-book In-Reply-To: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> Message-ID: <4177E87C.90101@dsl.pipex.com> > So, does this mean that I now not only have to download the master xml > file, the css, and a set of conversion tools? You must be kidding, > right? If it came to that, I would rather have the plain text and > forget the page numbers. It is already inconvenient to use "lynx -dump > -nolist filename.htm." Why in the world would I want to run it through > a conversion tool and still have to do that anyway? OK, so a plain text > file can be output directly from the xml. I still have to go through at > least one extra conversion step that I wouldn't have to otherwise. Why? The whole idea behind PG moving to XML is not to complicate things, it's to give more flexibility while retaining simplicity. How about this situation: PG files are, by default, coded in XML. All other formats are then automatically generated from that XML format. There would still be TXT versions, there would still be HTML versions. Getting those would be no harder than it is for you to retrieve a TXT file now. All this conversion stuff should be done by the PG back-end, not the end-user (why make a human do a machine's job?). That way, instead of manually preparing every different format like what goes on at PG for the most part now, we could make every format available with only the effort of creating a super-format from which every other format could be derived and a set of tools which could automatically generate other formats from the super-format. 
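The "super-format plus converters" idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny master document, the element names (text, head, p) and the function names are all made up for the example, and a real TEI master has a header, far more elements, and would more likely be converted with XSLT than with hand-written Python.

```python
import html
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified master file; not real PGTEI.
MASTER = """<text>
  <head>A Sample Title</head>
  <p>First paragraph of the book.</p>
  <p>Second paragraph, with an &amp; in it.</p>
</text>"""


def to_text(root):
    """Render the master as plain text wrapped at 70 columns."""
    blocks = [root.findtext("head", default="").upper()]
    blocks += [textwrap.fill(p.text or "", width=70) for p in root.iter("p")]
    return "\n\n".join(blocks) + "\n"


def to_html(root):
    """Render the master as a bare-bones HTML fragment."""
    parts = ["<h1>%s</h1>" % html.escape(root.findtext("head", default=""))]
    parts += ["<p>%s</p>" % html.escape(p.text or "") for p in root.iter("p")]
    return "\n".join(parts) + "\n"


# One entry per output format; a new format means one new function here.
CONVERTERS = {"txt": to_text, "html": to_html}


def convert_all(master_xml):
    """Parse the master once and return {format name: rendered string}."""
    root = ET.fromstring(master_xml)
    return {name: fn(root) for name, fn in CONVERTERS.items()}


if __name__ == "__main__":
    for name, output in convert_all(MASTER).items():
        print("--- %s ---" % name)
        print(output)
```

The point of the registry is the one made in the message: supporting another format is a matter of writing one more converter, and nothing has to be re-keyed by hand.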
If someone wants the entire PG library to be available in some obscure format, then it could be if they can just write a converter which outputs that format. Cheers, Holden From marcello at perathoner.de Thu Oct 21 09:54:35 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 09:54:43 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <20041021150227.GA17442@pglaf.org> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org> Message-ID: <4177E9CB.7080200@perathoner.de> Greg Newby wrote: > I don't understand this limitation, so will rephrase what > we're waiting for. It was among the first messages in this thread. > > ** What we want is an automatic means of generating canonical > ** documents from an XML master. > > The minimums are: > XML --> HTML > and XML --> text (yes, it's ok to go via HTML) You already got that. There are 2 different ways to do this, both of them mature enough for beta testing. The roadblock is that a 100% correct and complete solution was requested by Jim before he considered starting to post TEI texts. Now, we don't have a toolchain for the whitewashers that is equivalent to the one already in place for TXT and HTML files. That's why I volunteered to act as "interim" whitewasher: to manually go thru the steps needed to post a TEI file and derivative formats, to understand how this toolchain needs to be built. I will only take a few texts (maybe a dozen) from a few selected sources Some of the objections raised by Jim will not go away real soon. He says he cannot skim thru a TEI file like thru a TXT or HTML file. But there are at present no readers that accept TEI as native file format. If we had to build that first (Jon Noring is trying), we will likely never start posting. 
I feel Jim is raising artificial objections he knows we cannot overcome. If he doesn't want to learn TEI and he doesn't feel like proofing a TEI text in emacs, fine. But then, he should step aside and let other people do this work. Now for another thing. Jim fears that we will end up with a lot of files marked up in differing TEI dialects. OTOH, the moratorium has actively encouraged this. People being eager to try TEI and there being no official place to post TEI files, everybody has posted the files they have marked up in a different place. I have been working on my dialect, Jeroen on his and DP is cooking up another one. There is no central "clearing house" where we can see the other guys work. I don't say it would be impossible for me to obtain a glimpse of the TEI texts the folks at DP are working on, it would just be much easier if I could get them from the archive. At this point we need to set a signal that the TEI era has started. We don't need more discussion about whether TEI is the right language, I think we are all agreed on that. pgxml.org is dead and ZML is good for laughs. What we need now is to compare notes, all who have been doing TEI and get to an agreement of which dialect to use. That can be best reached if we all post samples of our work and try to run the other guys markup thru our XSL etc. etc. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Oct 21 10:35:55 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 10:36:11 2004 Subject: [gutvol-d] re: poor little victim Message-ID: joshua said: > You're the poor little victim of us big mean bullies! i'm no "poor little victim". i am the one toying with all of you. i wasn't "complaining" about marcello's badgering -- i find it amusing he shoots himself in the foot, and the last thing i will do is go whining to a moderator -- i was simply remarking on it as a mere observation, confident that an objective outsider would share it... 
your little circle of friends deludes itself if y'all think that the lurkers can't see that... -bowerbird From Bowerbird at aol.com Thu Oct 21 11:00:02 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 11:00:18 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <1b8.4414d73.2ea95322@aol.com> the hacker said: > Every bridge has its troll. you're now a full-fledged member of the group, david. _welcome_... -bowerbird From Gutenberg9443 at aol.com Thu Oct 21 12:35:48 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Thu Oct 21 12:36:14 2004 Subject: [gutvol-d] jeroen's even-handed analysis Message-ID: <104.53658a2d.2ea96994@aol.com> In a message dated 10/20/2004 6:58:19 PM Mountain Standard Time, joshua@hutchinson.net writes: Anyways, no one is trying to chase you away. Please come back out to play! Thank you. I continue to think that PG is one of the most important weapons of freedom, and if that is naive, so be it. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041021/69b0d379/attachment.html From Gutenberg9443 at aol.com Thu Oct 21 12:36:47 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Thu Oct 21 12:36:59 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: In a message dated 10/20/2004 7:12:28 PM Mountain Standard Time, hacker@gnu-designs.com writes: Every bridge has its troll. So where are the billy goats gruff? THEY know how to deal with trolls. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041021/cdef92ea/attachment.html From joshua at hutchinson.net Thu Oct 21 12:38:31 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 12:38:41 2004 Subject: [gutvol-d] A layman's critique of PGTEI Message-ID: <20041021193831.7050C109723@ws6-4.us4.outblaze.com> First of all, please take this as it is intended, namely my experiences while attempting to convert a short and sweet text to the PGTEI format found here (http://www.gutenberg.org/tei/). It is my hope that this will lead to some improvements in the process. I also apologize for the large size of this message. I felt it was necessary to get all the information in place in the email so no one was making assumptions about info that was alluded to but not present. Ok, I decided to start at the beginning ... namely our first e-text, the US Declaration of Independence (original: http://www.gutenberg.org/etext/1). The upside is that it is short and very simple. The first thing I did was grab the standard PGTEI header from the documentation (http://www.gutenberg.org/tei/doc/pg-guide.html#toc_12). This spaghetti is a lot easier than it looks. Further, I definitely see how this could be easily generated by filling out a web form (somewhat similar to what I understand is done right now when DP submits a text to the whitewashers). It contains information like the original creation date, who wrote it, Library of Congress subject classifications, who converted it to TEI (me), etc. (The PGTEI encoded file is attached at the end of this message for reference purposes.) The nice thing here is that the PG header and footer information is auto-generated by the and sections. The Declaration file in PG has a small foreword from Michael. I felt that should not be marked as if it were part of the main document. Luckily, TEI-Lite documentation provided a solution.
(http://www.tei-c.org/Lite/teiu5_en.html#h52) You can mark up some text in the header section with a type of foreword. EXAMPLE: This is an example foreword section. Note: Paragraphs have to be surrounded by <p></p> markup, just like HTML. This shouldn't be difficult for anyone trying to tackle this... It certainly felt natural enough for me. Next, I added the actual Declaration text in the section. Again, all the paragraphs needed to be wrapped in <p></p>. I also ran into one problem with the & character. Instead of looking up the escape code, I went lazy and just converted it to 'and'. This would also be required for an HTML edition, so I don't consider that a big deal. That was it on creating the PGTEI markup. Total time, even with looking things up, maybe 20 minutes of my time. And, no, I haven't done this before, so this is coming into it raw. Next step was to use the validator on the page (http://www.gutenberg.org/tei/services/tei-online). It complained about one typo on my part and the & I mentioned before. The errors are NOT very friendly, but anyone familiar with the W3C validator should be able to puzzle it out. Next, I had it create a text file. This went very well. The resulting file looked pretty good to me. I didn't run it through GutCheck, but nothing jumped out at me as being problematic. Granted, this was a very simple text, so there are probably limitations in this conversion that I just haven't run into yet. Lastly, I had it create an HTML file. There are two problems I encountered here. One, cosmetic and fixable by changing the CSS, isn't that big a deal. The second is more of a deal breaker, but still fixable, I'd imagine.

1) The CSS specifies more of a printed page style than a web based style. For instance, all the paragraphs have no blank line between them and have a first line indent, just like a printed page. However, to me, this was a bit jarring, since it isn't the format I'm used to on the web. Again, this is mostly cosmetic and easily changeable.

2) The resulting HTML, while rendering fine in the browsers I have here, is NOT valid HTML. The file specifies HTML 4.01 strict, but there were 13 warnings/errors when I used W3C's validator on it. I didn't check real closely, but it looked like some of them were perfectly valid under HTML 4.01 transitional, and the others are fixable.
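The two manual steps described earlier in this message (wrapping every paragraph in paragraph tags and dealing with the & character) are mechanical enough to script. The following is a minimal sketch, not part of the PGTEI tools: the function name and the sample string are made up, and it assumes paragraphs in the plain-text source are separated by blank lines. It escapes & as &amp; instead of rewriting it to 'and', using the standard library's xml.sax.saxutils.escape.

```python
from xml.sax.saxutils import escape


def paragraphs_to_p(plain_text):
    """Split on blank lines, escape &, <, >, and wrap each paragraph in <p>."""
    paragraphs = []
    for block in plain_text.split("\n\n"):
        # Collapse hard line breaks inside a paragraph into single spaces.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append("<p>%s</p>" % escape(joined))
    return "\n\n".join(paragraphs)


# Made-up sample text, only to show the & being escaped.
SAMPLE = "Smith & Jones met\nat noon.\n\nSecond paragraph."

if __name__ == "__main__":
    print(paragraphs_to_p(SAMPLE))
```

Output from something like this would still need the header and the rest of the markup added by hand, and it should be run through the validator like any other file.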
The XSLT conversion process can probably be tweaked by someone knowledgable in that area to eliminate the validation errors. *** Well, that's my quick personal experiment. My question for the experts: Can the HTML validation problem be easily fixed? I'd also like to request a change to the CSS used, but that is a personal preference and something to really worry about after the show-stoppers are fixed. My next experiment will choose a text with some other stuff like poetry in it, so that I can see what more complexity does to the whole process. Josh **** Attached Declaration PGTEI file: The Declaration of Independence Edition 12 October 2004 Project Gutenberg www.gutenberg.org October 2004 1 when

This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included online at www.gutenberg.org/license

unknown
Library of Congress Classification American JK: Political science: Political inst. and pub. Admin.: United States Government United States December, 1971 Michael S. Hart Project Gutenberg Edition 12 October 2004 Joshua Hutchinson TEI markup

The United States Declaration of Independence was the first Etext released by Project Gutenberg, early in 1971. The title was stored in an emailed instruction set which required a tape or diskpack be hand mounted for retrieval. The diskpack was the size of a large cake in a cake carrier, cost $1500, and contained 5 megabytes, of which this file took 1-2%. Two tape backups were kept plus one on paper tape. The 10,000 files we hope to have online by the end of 2001 should take about 1-2% of a comparably priced drive in 2001.

This file was never copyrighted, Sharewared, etc., and is thus for all to use and copy in any manner they choose. Please feel free to make your own edition using this as a base.

In my research for creating this transcription of our first Etext, I have come across enough discrepancies [even within that official documentation provided by the United States] to conclude that even "facsimiles" of the Declaration of Independence are NOT going to be all the same as the original, nor of other "facsimiles." There is a plethora of variations in capitalization, punctuation, and, even where names appear on the documents [which names I have left out].

The resulting document has several misspellings removed from those parchment "facsimiles" I used back in 1971, and which I should not be able to easily find at this time, including "Brittain."

The Declaration of Independence of The United States of America

When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume, among the Powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty, and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shown, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security. --Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.

He has refused his Assent to Laws, the most wholesome and necessary for the public good.

He has forbidden his Governors to pass Laws of immediate and pressing importance, unless suspended in their operation till his Assent should be obtained; and when so suspended, he has utterly neglected to attend to them.

He has refused to pass other Laws for the accommodation of large districts of people, unless those people would relinquish the right of Representation in the Legislature, a right inestimable to them and formidable to tyrants only.

He has called together legislative bodies at places unusual, uncomfortable, and distant from the depository of their Public Records, for the sole purpose of fatiguing them into compliance with his measures.

He has dissolved Representative Houses repeatedly, for opposing with manly firmness his invasions on the rights of the people.

He has refused for a long time, after such dissolutions, to cause others to be elected; whereby the Legislative Powers, incapable of Annihilation, have returned to the People at large for their exercise; the State remaining in the mean time exposed to all the dangers of invasion from without, and convulsions within.

He has endeavoured to prevent the population of these States; for that purpose obstructing the Laws of Naturalization of Foreigners; refusing to pass others to encourage their migration hither, and raising the conditions of new Appropriations of Lands.

He has obstructed the Administration of Justice, by refusing his Assent to Laws for establishing Judiciary Powers.

He has made judges dependent on his Will alone, for the tenure of their offices, and the amount and payment of their salaries.

He has erected a multitude of New Offices, and sent hither swarms of Officers to harass our People, and eat out their substance.

He has kept among us, in times of peace, Standing Armies without the Consent of our legislatures.

He has affected to render the Military independent of and superior to the Civil Power.

He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended legislation:

For quartering large bodies of armed troops among us:

For protecting them, by a mock Trial, from Punishment for any Murders which they should commit on the Inhabitants of these States:

For cutting off our Trade with all parts of the world:

For imposing taxes on us without our Consent:

For depriving us, in many cases, of the benefits of Trial by Jury:

For transporting us beyond Seas to be tried for pretended offences:

For abolishing the free System of English Laws in a neighbouring Province, establishing therein an Arbitrary government, and enlarging its Boundaries so as to render it at once an example and fit instrument for introducing the same absolute rule into these Colonies:

For taking away our Charters, abolishing our most valuable Laws, and altering fundamentally the Forms of our Governments:

For suspending our own Legislatures, and declaring themselves invested with Power to legislate for us in all cases whatsoever.

He has abdicated Government here, by declaring us out of his Protection and waging War against us.

He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people.

He is at this time transporting large armies of foreign mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty and perfidy scarcely paralleled in the most barbarous ages, and totally unworthy of the Head of a civilized nation.

He has constrained our fellow Citizens taken Captive on the high Seas to bear Arms against their Country, to become the executioners of their friends and Brethren, or to fall themselves by their Hands.

He has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions.

In every stage of these Oppressions We have Petitioned for Redress in the most humble terms: Our repeated Petitions have been answered only by repeated injury. A Prince, whose character is thus marked by every act which may define a Tyrant, is unfit to be the ruler of a free People.

Nor have We been wanting in attention to our British brethren. We have warned them from time to time of attempts by their legislature to extend an unwarrantable jurisdiction over us. We have reminded them of the circumstances of our emigration and settlement here. We have appealed to their native justice and magnanimity, and we have conjured them by the ties of our common kindred to disavow these usurpations, which would inevitably interrupt our connections and correspondence. They too have been deaf to the voice of justice and of consanguinity. We must, therefore, acquiesce in the necessity, which denounces our Separation, and hold them, as we hold the rest of mankind, Enemies in War, in Peace Friends.

We, therefore, the Representatives of the United States of America, in General Congress, Assembled, appealing to the Supreme Judge of the world for the rectitude of our intentions, do, in the Name, and by the Authority of the good People of these Colonies, solemnly publish and declare, That these United Colonies are, and of Right ought to be Free and Independent States; that they are Absolved from all Allegiance to the British Crown, and that all political connection between them and the State of Great Britain, is and ought to be totally dissolved; and that as Free and Independent States, they have full Power to levy War, conclude Peace, contract Alliances, establish Commerce, and to do all other Acts and Things which Independent States may of right do. And for the support of this Declaration, with a firm reliance on the Protection of Divine Providence, we mutually pledge to each other our Lives, our Fortunes and our sacred Honor.

From Bowerbird at aol.com Thu Oct 21 13:00:46 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 13:01:00 2004 Subject: [gutvol-d] A layman's critique of PGTEI Message-ID: joshua said: > My next experiment will choose a text > with some other stuff like poetry in it, > so that I can see what more complexity > does to the whole process. when you have worked your way up to it, i suggest you do the test-suite i created. when you can mark _that_ up in a way that makes everyone here happy, you'll have gone a long way towards meeting jim's criteria... and when you can convert it to all formats, you will have gone _all_ the way... (of course, that still won't get the e-texts actually marked up, but that's minor, right?) -bowerbird From hacker at gnu-designs.com Thu Oct 21 13:04:10 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Oct 21 13:05:27 2004 Subject: [gutvol-d] A layman's critique of PGTEI In-Reply-To: <20041021193831.7050C109723@ws6-4.us4.outblaze.com> References: <20041021193831.7050C109723@ws6-4.us4.outblaze.com> Message-ID: > First of all, please take this as it is intended, namely my experiences > while attempting to convert a short and sweet text to the PGTEI format > found here (http://www.gutenberg.org/tei/). It is my hope that this > will lead to some improvements in the process. I went through every link there, and could not find a reference to download any sort of tool or set of tools that purports to convert the TEI format to other formats. Where did you find the converter to use locally? The docs link to an "online" converter, which no longer exists at that link, apparently. This one is dead: http://www.gutenberg.net/testing/gnutenberg/tei-online.php This one (linked from the front page) is not: http://www.gutenberg.org/tei/services/tei-online I don't see where the code to this online converter, or any converter that works with TEI for that matter, is documented, referenced, or linked to. 
Did I miss the link on one of the other pages? It looks like we're all back to square one... reinventing all of our own wheels from scratch. David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jeroen at bohol.ph Thu Oct 21 13:18:03 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Thu Oct 21 13:17:43 2004 Subject: [gutvol-d] Posting TEI In-Reply-To: <20041020213206.GA10983@panix.com> References: <20041020205934.GA22445@panix.com><20041020211431.57061.qmail@web41709.mail.yahoo.com> <20041020213206.GA10983@panix.com> Message-ID: <4178197B.7060804@bohol.ph> Jim Tinsley wrote: > If I may nit-pick, I think it more correct to say that it > >isn't _always_ true. That is, it is not true when there >exists a CSS that works with the XML. > >Jeroen provided XML like this, which I thought was very >good indeed. For any of you who haven't seen it, please >point your browsers to http://www.gutenberg.org/dirs/1/1/3/3/11335/11335-x/11335-x.xml >which is an absolute pleasure to read. (Well, if you're >a geek, that is, and if you ain't, whatcha doin. here?? :-) > >I said before, and I say again, that where such an XML is >provided, HTML is probably redundant. ("Probably" because >a significant use of HTML is as input to PDA readers like, >say, Mobipocket, and I'm not sure if they would swallow >this XML without requiring a Heimlich.) > >I know of no CSS for Marcello's PGTEI. Perhaps one could >be crafted for it. > > > One additional note: before the XML of this text is rendered in your browser, it is fed through an XSLT stylesheet, which turns it into HTML, and then CSS is applied to that HTML. The entire process is done for you by your browser. The XML follows TEILite, and validates on a validating parser; the HTML should validate on a validating HTML parser. Marcello's PGTEI is close enough to TEI that this will probably give very decent results on his files too.
He basically added a few small extensions to TEILite, which are "documented" in his well-commented DTD or XSLT sheets. (But that is stuff for specialists really) Jeroen. From jeroen at bohol.ph Thu Oct 21 13:33:34 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Thu Oct 21 13:33:10 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <4177E9CB.7080200@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de><20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de><20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de><20041021150227.GA17442@pglaf.org> <4177E9CB.7080200@perathoner.de> Message-ID: <41781D1E.5010700@bohol.ph> > > People being eager to try TEI and there being no official place to > post TEI files, everybody has posted the files they have marked up in > a different place. I have been working on my dialect, Jeroen on his > and DP is cooking up another one. There is no central "clearing house" > where we can see the other guys' work. I don't say it would be > impossible for me to obtain a glimpse of the TEI texts the folks at DP > are working on, it would just be much easier if I could get them from > the archive. Personally, I try to stick as closely to TEILite as possible. I can add extensions to it, but then can easily produce XSLT to pull out those extensions before posting. I think a few of Marcello's extensions for PGTEI are not needed, as elements exist to encode the same information, or alternative mechanisms can be devised within TEILite -- but even if you stick to pure TEILite, you will need to agree on conventions. For example, I leave in quotation marks (as I have numerous old works that deal with these in a very irregular way, turning them to and would be difficult). Marcello leaves them out.
We can fix an XSLT to re-supply them, and even an XSLT to supply them only if they are removed (given we agree on a standard way of documenting this fact) -- and that is what you need if you're working on a certain project using TEI -- a gentle introduction and some guidelines. A few very nice ones are on the Net. If people wish, I can set up a website with TEI versions of _all_ my posted texts, both in my original master SGML, and converted to XML. Gives everybody something to experiment with. Jeroen. From joshua at hutchinson.net Thu Oct 21 13:58:02 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 21 13:58:11 2004 Subject: [gutvol-d] A layman's critique of PGTEI Message-ID: <20041021205802.255382F8EB@ws6-3.us4.outblaze.com> I used the online converter. I'm baby-stepping my way into this. Marcello's converter was the easiest starting point for me. I didn't check into it any closer, but it looks like there is a link to a .zip file containing Marcello's XSLT stylesheets. And I don't think we're all reinventing the wheel... Other than Marcello's stuff, I don't see ANYbody's wheel out there. I'm working with what I have available. I figure if a TEI newb like me can get something working reliably, we're getting somewhere. Josh
From hacker at gnu-designs.com Thu Oct 21 14:06:43 2004 From: hacker at gnu-designs.com (David A. Desrosiers) Date: Thu Oct 21 14:07:27 2004 Subject: [gutvol-d] A layman's critique of PGTEI In-Reply-To: <20041021205802.255382F8EB@ws6-3.us4.outblaze.com> References: <20041021205802.255382F8EB@ws6-3.us4.outblaze.com> Message-ID: > And I don't think we're all reinventing the wheel... Other than Marcello's > stuff, I don't see ANYbody's wheel out there. I'm working with what I > have available. You've nailed the problem dead-on. Nobody is providing any tools or converters for this, and hence, everyone is forced to reinvent their own version of the wheel. And because they had to do it themselves, they don't care to release it (or are embarrassed because of the quality of their own code), compounding the problem for everyone else. There is a real, documented psychology behind this, and I think this stumbling block is really causing a lot of fracturing amongst the rest of the potential contributors out there, myself included. David A.
Desrosiers desrod@gnu-designs.com http://gnu-designs.com From Bowerbird at aol.com Thu Oct 21 14:29:49 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 14:30:01 2004 Subject: [gutvol-d] 126 messages in one day Message-ID: <155.41991e6b.2ea9844d@aol.com> david said: > Excuse me, "Listserve" is the trademarked name > of a product owned and created by L-Soft International, Inc. right. and amazon has a patent on one-click web purchasing. and various companies now "own" pieces of the human genome. if l-soft wants to come after me, they know my e-mail address. -bowerbird From Bowerbird at aol.com Thu Oct 21 14:37:33 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 14:37:51 2004 Subject: [gutvol-d] Re: barriers to XML posting Message-ID: <1e2.2cc70258.2ea9861d@aol.com> joshua said: > Honestly, I see an easy compromise here. oh-oh, something smells bad here... :+) > As long as a conformant TEXT file > and a conformant HTML file > show up with the XML file, > I say post all three. well then i'm sure glad you don't _have_ a say... > Granted, right now we don't have a method > for the WW'ers to verify the XML file is valid all these minor details, eh? > so if you want to put a disclaimer to that effect in the file ... fine. a disclaimer? that's your "compromise". yeah, right. for those on the outside, who wonder why this fuss is being made whether an .xml file can be "posted", it's because the x.m.l. people want the imprimatur of being "official". why is that so important? because that's how the x.m.l. ponzi game is being played these days. people are adopting x.m.l. not because they think it's the best route, but rather because they've been "convinced" that it is "inevitable". even though -- in too many situations -- it just plain doesn't work, this "inevitability" makes people shrug and say, "ok, give me some." after all, you don't want to miss out on the ground floor, do you? 
they will tell you time and time again how there are "so many tools" for dealing with x.m.l., how x.m.l. is gonna be able to do conversions for whatever format anyone wants, but they can't even demonstrate a simple ability to convert out a text file and an .html version now. and when you call them on that, they whine about how unreasonable you are being, and how unfair it is to expect "150% perfection". bull. it is _far_ more sensible -- especially in, as jim delicately put it, a "production environment rather than an experimental one" -- to make the process _work_ before you put it in play, the x.m.l. people don't want to be bothered with that "technicality" before the fact. that's something that someone will figure out "later". yeah, right. if x.m.l. gets the stamp of approval here, what's the motivation for x.m.l. experts to come make it work? after all, there's no money in it for them here. they're off being high-paid consultants, telling the next mark, "look, even project gutenberg is using x.m.l. now too." as marcello puts it: > At this point we need to set a signal that the TEI era has started. he's not interested in actually making t.e.i. _work_ in reality -- he tried, and got a grand total of two simple e-texts done -- he just wants to "set a signal" that the "era has started" here... -bowerbird From marcello at perathoner.de Thu Oct 21 15:05:02 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 15:05:15 2004 Subject: [gutvol-d] A layman's critique of PGTEI In-Reply-To: References: <20041021193831.7050C109723@ws6-4.us4.outblaze.com> Message-ID: <4178328E.8050502@perathoner.de> David A. Desrosiers wrote: > I don't see where the code to this online converter, or any > converter that works with TEI for that matter, is documented, > referenced, or linked to. The link is on this page. 
http://www.gutenberg.org/tei/ http://www.gutenberg.org/tei/src/gnutenberg-press-0.0.2.tgz -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Oct 21 15:21:54 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 15:22:08 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <145.3687ad7b.2ea99082@aol.com> marcello said: > Have you got any data to sustain your theory, like, uhm, > a representative poll of pg user population? first of all, let me say that i think it would be _great_ to have some way to survey project gutenberg users! might open a lot of eyes around here. (or probably not.) i've talked to a lot of people about project gutenberg, and discussed it on a lot of listserves over the years. since i'm not involved in it, and because i'm curious, i learn what people think; i hear the good and the bad. and you know what the biggest eye-opener was for me? it was last december, at the 10,000th-e-text gathering, when michael gave a talk at the berkeley public library. many of the attendees seemed to be just learning about the project; they were excited, but had some questions. the most frequent one revolved on how to read the e-texts. in other words, these people didn't even know how to open and read a text-file, or an .html file. and when greg had to try and explain a "zip" file, their eyes started to glaze over. we overestimate -- and that word is an understatement -- the sophistication of the audience far too frequently here. so i think it'd be great to open a communication pipeline between us and them, so we got to know them a bit better... anyway... my view on the factors that have made project gutenberg a success is based on listening to what michael himself says, coupled with some very-long-term observation of the many different e-book projects that have _failed_ along the way... the more they depended on tech not yet in the mainstream, the faster they plummeted. 
the more they depended on having the newest hardware, the faster they plummeted. the more they depended on special knowledge by the user, the faster they plummeted. the more they cluttered up the text with extraneous stuff -- including everything from proprietary formats to d.r.m. -- the faster they plummeted. yet michael's project -- which michael himself considers to be successful exactly because he stripped down to basics -- maintained its ground and grew just as he predicted it would. and today, most computer owners have grown totally weary of the need to constantly update, to buy new hardware and/or install complex new software. they are digging their heels in, deciding to make do with what they have. p.c. sales have been in _decline_ for years, after a decade-plus of yearly increases. and the situation will only get worse as things go from here. software has always grown the must-upgrade pie in the past, but there's just no money in it now, so the decline will spiral. when billy g. talks to his shareholders these days, he doesn't talk about his software; nope, he talks about his i.p. patents. "innovation" used to be his buzzword (albeit a very big lie), but now it's "licensing" (and this one we can surely believe). when the 800-pound gorilla decides to get in your way, beware. an absence of reasons to upgrade will make users dig in deeper. they'll live with what they've got, and we'll have live with that. and this is _not_ -- as some techies would have you believe -- because they are stupid, or don't want to "grow", it's because they don't want to always have to be updating their computer. just like some people like to tinker with their car -- great! -- but other people just want to get in it and drive somewhere... i'm from the mac side, where the mantra has always been "it just works", so maybe i'm biased, but it just _amazes_ me how much time the rest of the world spends _fiddling_. but guess what? you've used up all of the users' patience. 
if what you give 'em won't work without fiddling, forget it. and i'll kick this up to another level of abstraction as well. the reason _books_ -- paper ones -- have been so successful is because they are utterly and completely _simple_ to use. a child can learn how to use a book. the more difficulty that you tack onto electronic-books, the more you buck history... books have also been an instrument that let people _rise_. it's part of the philosophy underlying public library systems. when you move books out to a place where only new machines can access them, you're making a very bad political decision. instead of books closing the gap between the rich and the poor, they become another wedge making that gap bigger and bigger... e-texts "too cheap to meter" is michael's bedrock philosophy, and moving this library to a technological methodology that won't run on the trailing-edge -- let alone one that cannot even prove itself to _work_ -- does that philosophy grave disservice. in conclusion... what this means is that the new imperative is to "make it work" using the existing infrastructure, or you too will plummet, fast. and _that_ means that if you want to introduce some innovation, the obligation should be on you to prove that it works, first... the end-user is tired of being your guinea pig... -bowerbird From marcello at perathoner.de Thu Oct 21 15:43:55 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 15:44:07 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <1e2.2cc70258.2ea9861d@aol.com> References: <1e2.2cc70258.2ea9861d@aol.com> Message-ID: <41783BAB.2020901@perathoner.de> Bowerbird@aol.com wrote: > as marcello puts it: > >> At this point we need to set a signal that the TEI era has started. > > he's not interested in actually making t.e.i. _work_ in reality > -- he tried, and got a grand total of two simple e-texts done -- Bzzzzt. Wrong. But thank you for playing. He has *25* titles marked up in 3 languages. 
Ranging from Alice (illustrated), to Life on the Mississippi (tables and footnotes), Faust and Wallenstein (plays), Deutschland. Ein Wintermärchen (lyrics) and a technical manual about, guess what? PGTEI. Go to http://www.gnutenberg.de/search/titles/results/ and eat crow, Bowerbird. -- Marcello Perathoner webmaster@gutenberg.org From shalesller at writeme.com Thu Oct 21 15:39:32 2004 From: shalesller at writeme.com (D. Starner) Date: Thu Oct 21 15:51:31 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <20041021223932.16F954BDA9@ws1-1.us4.outblaze.com> Bowerbird@aol.com writes: > what this means is that the new imperative is to "make it work" > using the existing infrastructure, or you too will plummet, fast. > > and _that_ means that if you want to introduce some innovation, > the obligation should be on you to prove that it works, first... So does that mean you're going to stop pushing ZML on us? Or are you going to argue over the definition of the word "is" again? From Bowerbird at aol.com Thu Oct 21 15:59:13 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 15:59:39 2004 Subject: [gutvol-d] Re: barriers to XML posting Message-ID: <12a.4e8737b7.2ea99941@aol.com> marcello said: > He has *25* titles marked up in 3 languages. have you run out the .html and .txt versions, and put them online, so i can evaluate them? > Go to > http://www.gnutenberg.de/search/titles/results/ > and eat crow, Bowerbird. actually, i'm tired of being a guinea pig, so i won't be going anywhere today, thanks. if you say you've got 25 titles done, i'll believe it. but you could have 250 done and it wouldn't change my point. let me know when you get to 2,500, marked up and converted to .html and plain-text. because that number will impress me.
-bowerbird From marcello at perathoner.de Thu Oct 21 15:59:55 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 16:00:08 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <145.3687ad7b.2ea99082@aol.com> References: <145.3687ad7b.2ea99082@aol.com> Message-ID: <41783F6B.1080500@perathoner.de> Bowerbird@aol.com wrote: > i've talked to a lot of people about project gutenberg, > and discussed it on a lot of listserves over the years. Dissed and cussed, yes, but not discussed. > the most frequent one revolved on how to read the e-texts. > in other words, these people didn't even know how to open > and read a text-file, or an .html file. and when greg had to > try and explain a "zip" file, their eyes started to glaze over. So its better to give them a reader that freezes on them the first time they use it and takes their whole machine down if they press ctrl-alt-del. > the reason _books_ -- paper ones -- have been so successful > is because they are utterly and completely _simple_ to use. Thats what you think: take a look here if your decrepit Macintrash can handle videos: http://homepages.nyu.edu/~mz34/helpdesk.WMV -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Thu Oct 21 16:03:35 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 16:03:47 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <145.3687ad7b.2ea99082@aol.com> References: <145.3687ad7b.2ea99082@aol.com> Message-ID: <41784047.8060203@perathoner.de> Bowerbird@aol.com wrote: > what this means is that the new imperative is to "make it work" > using the existing infrastructure, or you too will plummet, fast. I heard your program still crashes after printing the headline ... But, of course, the advice you are meting out is for other people to follow, not for you. 
-- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Oct 21 16:23:41 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 16:24:02 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <9e.176f512e.2ea99efd@aol.com> mar cello said: > Dissed and cussed, yes, but not discussed. actually, i think you would be hard-pressed to find even one place where i have said something bad about project gutenberg. i give it very high praise. even on my upcoming blog, where i fully intend to speak frankly about some bad decisions that i think are being made here by some people around michael, i will continue to say good things about the library, and especially about michael. as many people recall, you tarking naugshlocks here have often accused me of "kissing michael's ass". no one's ever said that about me, not in regard to anyone, but i do think michael is a genius, so i'll continue to say good things about him and his library. but some _other_ people might not get such rosy treatment... > So its better to give them a reader that > freezes on them the first time they use it and > takes their whole machine down if they press ctrl-alt-del. beta-test software can do that sometimes. but i've had not one report of my program doing that, if that's what you are trying to imply here, marcello. if the tester can't report or replicate the crash, and describe the conditions of its occurrence too, i'll believe it's a problem unique to their machine. most likely with their windows operating system. maybe you've heard there are a few bugs in that... > Thats what you think: take a look here > if your decrepit Macintrash can handle videos: you think you're cool insulting someone because their machine is old, don't you? but yeah, my mac can "handle videos" just fine, thank you. so can the authoring-tools that i _program_ on this old mac. 
of course, they also work fine on the circa-1989 mac that i used before this one, thanks to a thing called quicktime, have you heard of that?, so i guess it's not too surprising... -bowerbird From Bowerbird at aol.com Thu Oct 21 16:28:47 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 16:29:06 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: <156.420709c0.2ea9a02f@aol.com> marcello said: > I heard your program still crashes after printing the headline ... i've heard not a single report of that, confirmed or unconfirmed. but if you can report that, take it to my beta-test listserve. the people signed up to this listserve don't want to hear it... -bowerbird From marcello at perathoner.de Thu Oct 21 17:28:08 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 21 17:28:23 2004 Subject: [gutvol-d] Best of Bowerbird Message-ID: <41785418.4020306@perathoner.de> New "Best of Bowerbird" fansite at: http://www.gnutenberg.de/bowerbird/ -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Oct 21 17:39:20 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 17:39:52 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand Message-ID: starner said: > So does that mean you're going to stop pushing ZML on us? pushing? that's a rather loaded term, don't you think? and inaccurate too. z.m.l. is a solution to problems with the library. i will be the person who implements that solution. i came here to _show_ you that solution, so if y'all were _smart_ enough to see it as such, y'all could help me with that implementation... but i didn't count any chickens before they were hatched, so it is no loss to me that y'all are unable to see clearly. i was always willing to implement the solution myself, and i still am. 
what _i_ got out of the whole deal was a much more detailed picture of how to make it happen, by virtue of having explained it some six ways to sunday. meanwhile, same rule as day one: the proof is in the pudding. > Or are you going to argue over the definition of the word "is" again? what? -bowerbird From jtinsley at pobox.com Thu Oct 21 18:20:30 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Thu Oct 21 18:20:45 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <4177E9CB.7080200@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org> <4177E9CB.7080200@perathoner.de> Message-ID: <20041022012030.GA23907@panix.com> On Thu, 21 Oct 2004 18:54:35 +0200, Marcello Perathoner wrote: >I feel Jim is raising artificial objections he knows we cannot overcome. >If he doesn't want to learn TEI and he doesn't feel like proofing a TEI >text in emacs, fine. But then, he should step aside and let other people >do this work. I find this very offensive. I came home, and was reading happily enough through the threads until this. I differ with you quite profoundly about the implementation of XML, and, I'm sure, several other issues. But my opinions are honest, and based on what I believe is best for PG as a whole. I do not "raise artificial objections" -- these are the expectations I have had for XML as far back as I can remember, and they are expectations regularly assumed, if not met, by people who evangelize XML. I "learned TEI" (not all of it, of course) with the hope of using it in PG, in late 2001/early 2002, and I marked up my first book in XML in February, 2002, which was long before I ever heard your name. If you can't accept that I am debating these issues in good faith, there is no point in continuing this discussion. 
jim From cannona at fireantproductions.com Thu Oct 21 18:29:56 2004 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu Oct 21 18:31:18 2004 Subject: [gutvol-d] Best of Bowerbird In-Reply-To: <41785418.4020306@perathoner.de> References: <41785418.4020306@perathoner.de> Message-ID: <6.1.2.0.0.20041021202349.01c217f8@mail.fireantproductions.com> I was just wondering today which text would be deemed worthy of becoming #15000, now that we're close. I'll be looking forward to seeing it posted. http://www.gutenberg.org/dirs/1/5/0/0/15000/15000.zml At 07:28 PM 10/21/2004, you wrote: >New "Best of Bowerbird" fansite at: > > http://www.gnutenberg.de/bowerbird/ > > > >-- >Marcello Perathoner >webmaster@gutenberg.org > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d -- E-mail: cannona@fireantproductions.com Skype: cannona MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.) From Bowerbird at aol.com Thu Oct 21 18:50:21 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 21 18:50:41 2004 Subject: [gutvol-d] barriers to XML posting Message-ID: <1e2.2cced801.2ea9c15d@aol.com> > If you can't accept that I am debating these issues in good faith, > there is no point in continuing this discussion. oh c'mon, jim, "good faith"? from marcello? yeah, i got a good laugh out of that one... :+) -bowerbird From Gutenberg9443 at aol.com Thu Oct 21 22:05:19 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Thu Oct 21 22:05:39 2004 Subject: [gutvol-d] English and request Message-ID: <193.31ace50b.2ea9ef0f@aol.com> I don't know what European language is spoken in Somalia; in the past parts of it have been colonized by British, French, and Italians. I met a Somali refugee today, one of a good many that our Church has "adopted." 
Catholic charities are also working with some of the Somalis in Salt Lake City, and I'm sure that other churches I don't know about are doing the same. In this small group, there is one adult male, who speaks broken English; two young adult women, one of whom speaks broken English and the other of whom speaks only a few words; an older adult woman who speaks very little English; and four children. They are all Muslims, and their first language is Somali. The man got a job about two days after he got here; the young adult women are both starting work tomorrow, even though one of them had a miscarriage only yesterday. The grandmother will be caring for the children while the other adults are working. They all want desperately to learn English. PG has a lot of children's books, and I can prepare a CD of them. But T and I don't have a spare computer. Does ANYBODY have an extra laptop or notebook that could be given to them? If so, let me know, and I'll find out whether it should be sent directly to them or to me to get to them. I think the grandmother could learn a lot more English in a hurry if they had a computer and if the two adults who speak reasonably good English would read and translate. Uh . . . what was that about people in Africa don't need books in English? We gave them some paper reference books, but we long since gave all our children's books to the shelter for battered women and their children. We'll probably be able to get them some books through the library's book sale, but having a whole CD of children's books and a way to read them would help them so much. I am a scholar. I have written reference books, and I am very grateful to those who post scholarly material, especially Pepys's diaries, which are some of the most fascinating books I have ever read. But to my mind, it is these people, and others in unfortunate situations or locations, to whom PG should be mainly aiming. 
Scholars are going to get their books one way or another, if they have to hitchhike to the closest good library. I grew up in a town without a library; when I went to visit my grandmother she knew that the first place she had to take me was the library. What a blessing it would have been to me to have computers and PG then! Anne From brad at chenla.org Thu Oct 21 23:34:32 2004 From: brad at chenla.org (Brad Collins) Date: Thu Oct 21 23:36:19 2004 Subject: [gutvol-d] Aside on old computers In-Reply-To: (Karen Lofstrom's message of "Wed, 20 Oct 2004 17:49:51 -1000 (HST)") References: <4175B419.6030301@adelaide.edu.au> <4175EF86.8060108@adelaide.edu.au> Message-ID: Karen Lofstrom writes: > On Wed, 20 Oct 2004, Steve Thomas wrote: > >> [I can't believe that people still think they're doing good by >> shipping old 486's to Africa -- but apparently its true. I >> recently donated some old Pentium II's to a charity, and they >> couldn't believe their luck.] > > My Linux users group installs thin client computer labs for schools. We > happily accept PIIs, but turn down 486s. We use PIIs and PIIIs as thin > clients, removing the hard drives and installing bootable NIC cards, and > connect them to a fast server running K12LTSP Linux. We can create a > usable 30 client computer lab for $3000 or so, since the clients are all > donations. > I can't speak for Africa, but I have spent the last 14 years living in the deepest parts of China, Laos, and Cambodia. In the last 7 years I have not seen anything older than a PII except in a few old government systems and ancient bank computer networks running OS/2. And I'm not talking about the big cities like Vientiane, I'm talking about villages which barely have electricity and other odd corners with flakey old generators grumbling in blackened, soot-covered back sheds.
Just because the electricity is only on for a few hours a day doesn't mean that people don't have access to okay technology. Hell, I've seen rice farmers along the Mekong River using picture phones to send pictures of babies to relatives in Bangkok. The third world ain't always as backward as people in the first world think. There are large areas that are that bad, but then ebooks will not be an option for them until they have bridges connecting them to settled areas, or proper water, bottled gas for cooking.... b/ -- Brad Collins , Bangkok, Thailand From tb at baechler.net Fri Oct 22 01:14:40 2004 From: tb at baechler.net (Tony Baechler) Date: Fri Oct 22 01:13:45 2004 Subject: [gutvol-d] aspects of a well-done e-book In-Reply-To: <4177E87C.90101@dsl.pipex.com> References: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> Message-ID: <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com> At 05:49 PM 10/21/2004 +0100, you wrote: >>So, does this mean that I now not only have to download the master xml >>file, the css, and a set of conversion tools? You must be kidding, >>right? If it came to that, I would rather have the plain text and forget >>the page numbers. It is already inconvenient to use "lynx -dump -nolist >>filename.htm." Why in the world would I want to run it through a >>conversion tool and still have to do that anyway? OK, so a plain text >>file can be output directly from the xml. I still have to go through at >>least one extra conversion step that I wouldn't have to otherwise. > >Why? The whole idea behind PG moving to XML is not to complicate things, >it's to give more flexibility while retaining simplicity. How about this >situation: Apparently context was lost here. The "why" is that, according to what Joshua was saying, the page numbers are not available anywhere in the plain text because they would look ugly. 
OK, I understand that and I myself might not even want them most of the time. However, if I decide that for a particular file I want them, I have to go to the master xml document and do my own conversion. The PG supplied plain text won't help me, and the html won't work correctly in Lynx or IE. Therefore, I have to redo the conversion to get the information I want in the plain text file or whatever other format. This does not seem simpler to me. From traverso at dm.unipi.it Fri Oct 22 01:33:31 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Oct 22 01:33:55 2004 Subject: [gutvol-d] Aside on old computers In-Reply-To: (message from Brad Collins on Fri, 22 Oct 2004 13:34:32 +0700) References: <4175B419.6030301@adelaide.edu.au> <4175EF86.8060108@adelaide.edu.au> Message-ID: <200410220833.i9M8XVW4016806@posso.dm.unipi.it> >>>>> "Brad" == Brad Collins writes: Brad> Karen Lofstrom writes: >> On Wed, 20 Oct 2004, Steve Thomas wrote: >> >>> [I can't believe that people still think they're doing good by >>> shipping old 486's to Africa -- but apparently its true. I >>> recently donated some old Pentium II's to a charity, and they >>> couldn't believe their luck.] >> My Linux users group installs thin client computer labs for >> schools. We happily accept PIIs, but turn down 486s. We use >> PIIs and PIIIs as thin clients, removing the hard drives and >> installing bootable NIC cards, and connect them to a fast >> server running K12LTSP Linux. We can create a usable 30 client >> computer lab for $3000 or so, since the clients are all >> donations. >> Brad> I can't speak for Africa, but I have spent the last 14 years Brad> living in the deepest parts of China, Laos, and Cambodia. Brad> In the last 7 years I have not seen anything older than a Brad> PII except in a few old government systems and ancient bank Brad> computer networks running OS/2. 
Brad> And I'm not talking about the big cities like Vientiene, I'm Brad> talking about villages which barely have electricity and Brad> other odd corners with flakey old generators grumbling in Brad> blackend soot covered back sheds. When I started an EU-financed international research project on symbolic computation, some 12 years ago, the computers that we were using were 486s. And they were running Linux (Slackware) and X, and I was able to run TeX and view high-quality output. I am still using and developing the software that we wrote in this project. Something must be wrong if, 12 years on, what was good for a half-million-dollar research project isn't even good for a forest village. Not only this, but also the following generation of processors (Pentium-I). Carlo From traverso at dm.unipi.it Fri Oct 22 02:06:53 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Oct 22 02:07:16 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <41783BAB.2020901@perathoner.de> (message from Marcello Perathoner on Fri, 22 Oct 2004 00:43:55 +0200) References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de> Message-ID: <200410220906.i9M96rMS019592@posso.dm.unipi.it> The problem is how to have beta-testing AND respect the PG tradition of posting only definitive stuff. Would this be useful? I might offer web space, computing and bandwidth to post XML, convert it to txt and html and whatever else, and submit the result to whitewashing. You will be able to have installed all the software to handle the conversion, and have submissions converted by automatic procedures. This might be seen as a beta-test of xml whitewashing procedures. I am at most neutral to xml (I recognize its unavoidability, but I complain about the trend; I would prefer a more human-friendly markup). So it will not be pro-XML biased. And I am authorized to whitewash, so this can be seen as making my whitewashing in public.
The posting-and-converting should be automatic: a web interface to submit a zip/tar.gz/tar.bz2 file, semi-automatic unzipping and conversion, poster and site administrator OK to make the posting public. Then the whitewashing could start WITHOUT corrections: if anything in the result is wrong, then one should repeat the submission. Whether the post will be XML + converted, or converted only, will be PG's choice. The posts, complete with XML, will remain indefinitely on the test site. Of course, an additional line will be included to warn that the file is not an official PG file but only an intermediate working file. But except for this line, everything should be identical to a PG file, header and footer, PG number and filename included. Drawback: the server is located in Italy, so I cannot do it for non-EU clearable items. You'll have to submit clearance for death+70 (with procedures to be decided, but a copy of a LOC authority record or an encyclopaedia article will of course be enough). Carlo From marcello at perathoner.de Fri Oct 22 03:15:59 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 03:16:21 2004 Subject: [gutvol-d] English and request In-Reply-To: <193.31ace50b.2ea9ef0f@aol.com> References: <193.31ace50b.2ea9ef0f@aol.com> Message-ID: <4178DDDF.4010307@perathoner.de> Gutenberg9443@aol.com wrote: > Uh . . . what was that about people in Africa don't need books in English? You said those people were refugees. You said they came to the US and wanted to learn English. That is very clear. But I fail to see how this could possibly imply that people who live in Africa wanted to learn English.
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Oct 22 03:22:50 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 03:23:15 2004 Subject: [gutvol-d] aspects of a well-done e-book In-Reply-To: <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com> Message-ID: <4178DF7A.6000808@perathoner.de> Tony Baechler wrote: > according to what > Joshua was saying, the page numbers are not available anywhere in the > plain text because they would look ugly. > However, if I decide > that for a particular file I want them, I have to go to the master xml > document and do my own conversion. > This does not seem simpler to me. That may not be simple but is still better than what you have now: if the txt file happens to lack the page numbers there is no way you could get them short of redoing the book. In TEI, it may be not quite simple to set up, but you *can* do it. (And you can do a lot more.) In my eyes that's a big advantage. Of course, XML is not and has never claimed to be the solution to all the world's problems. ZML is :-) -- Marcello Perathoner webmaster@gutenberg.org From stephen.thomas at adelaide.edu.au Fri Oct 22 03:47:10 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Fri Oct 22 03:47:40 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <200410220906.i9M96rMS019592@posso.dm.unipi.it> References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de> <200410220906.i9M96rMS019592@posso.dm.unipi.it> Message-ID: <4178E52E.2030107@adelaide.edu.au> A question (possibly better put over on the DP list): Is it possible to OCR a scan directly to XML? Or is the output from OCR always going to be text?
If the first, then we need two processes -- one to deal with new scans (OCR to XML), one to deal with existing plain texts (to convert them to XML). But if the output of OCR is still going to be plain text, then we can use the same process to convert both existing and new books to XML. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From marcello at perathoner.de Fri Oct 22 03:49:34 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 03:49:56 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <200410220906.i9M96rMS019592@posso.dm.unipi.it> References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de> <200410220906.i9M96rMS019592@posso.dm.unipi.it> Message-ID: <4178E5BE.9010805@perathoner.de> Carlo Traverso wrote: > The problem is how to have beta-testing AND respect PG tradition of > posting only definitive stuff. I believe the PG policy is (or at least, has been at some point) to encourage the posting of preliminary material. From the PG header: Please note: neither this list nor its contents are final till midnight of the last day of the month of any such announcement. The official release date of all Project Gutenberg Etexts is at Midnight, Central Time, of the last day of the stated month. A preliminary version may often be posted for suggestion, comment and editing by those who wish to do so. To be sure you have an up to date first edition [xxxxx10x.xxx] please check file sizes in the first week of the next month. That is exactly what we want to do: post a preliminary version for suggestion, comment and editing. I don't understand why this is not possible for a TEI file. 
> I might offer web space, computing and bandwidth to post XML, convert > it to txt and html and what else, and submit the result to > whitewashing. Thank you. As for the server I can also offer one located in Germany, so the same limitations apply. But this is sooo tedious! We have to replicate the exact setup of gutenberg.org *and* pglaf.org to get reliable results from the beta-test. Example: my servers are all debian and have perl 5.8 whereas ibiblio is redhat enterprise with perl 5.6. This has often before given me headache because programs that ran at home, misteriously failed at ibiblio. -- Marcello Perathoner webmaster@gutenberg.org From holden.mcgroin at dsl.pipex.com Fri Oct 22 03:54:23 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Fri Oct 22 03:53:45 2004 Subject: [gutvol-d] aspects of a well-done e-book In-Reply-To: <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> <5.2.0.9.0.20041021000757.02562720@snoopy2.trkhosting.com> <5.2.0.9.0.20041021064220.0200b760@snoopy2.trkhosting.com> <5.2.0.9.0.20041022011051.02005570@snoopy2.trkhosting.com> Message-ID: <4178E6DF.9020101@dsl.pipex.com> Tony Baechler wrote: >>> So, does this mean that I now not only have to download the master >>> xml file, the css, and a set of conversion tools? You must be >>> kidding, right? If it came to that, I would rather have the plain >>> text and forget the page numbers. It is already inconvenient to use >>> "lynx -dump -nolist filename.htm." Why in the world would I want to >>> run it through a conversion tool and still have to do that anyway? >>> OK, so a plain text file can be output directly from the xml. I >>> still have to go through at least one extra conversion step that I >>> wouldn't have to otherwise. >> >> >> Why? The whole idea behind PG moving to XML is not to complicate >> things, it's to give more flexibility while retaining simplicity. 
How >> about this situation: > > Apparently context was lost here. The "why" is that, according to what > Joshua was saying, the page numbers are not available anywhere in the > plain text because they would look ugly. OK, I understand that and I > myself might not even want them most of the time. However, if I decide > that for a particular file I want them, I have to go to the master xml > document and do my own conversion. The PG supplied plain text won't > help me, and the html won't work correctly in Lynx or IE. Therefore, I > have to redo the conversion to get the information I want in the plain > text file or whatever other format. This does not seem simpler to me. Why must _you_ do it? If the information's available, then it would be TRIVIAL to add an option to the TXT or HTML converter which says "check here if you want page numbers included." We're really arguing over features in a system which hasn't been built yet, where even the form of the system isn't even set yet. _I_ can envision a system where we have the standard TXT and HTML files generated in the same format as we have them now but where there's a simple web page where you can configure the version you want. Want Page Numbers? Tick a box. Want each chapter in a separate file? Tick a box. So, whereas before, you had to have the standard TXT or HTML versions because that was all that was available, now we can actually talk about making customised versions as people want them. Maybe the settings could even be stored as a Cookie so you choose which settings you want once then every time you look at a text on PG, the text will be created as _you_ like it. We can only do cool stuff like this _because_ we're creating this new super-format which contains information far beyond what was previously available in the TXT and HTML versions. 
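The "tick a box for page numbers" option described above could be sketched roughly as follows. This is a minimal illustration in Python, not part of any existing PG converter. It assumes the master file records page breaks as TEI-style `<pb n="..."/>` elements; the function name `tei_to_text`, the `page_numbers` flag, and the sample text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one page break recorded as <pb n="2"/> (an assumption).
SAMPLE = """<text><body>
<p>It was a dark and stormy night;<pb n="2"/> the rain fell in torrents.</p>
<p>Second paragraph.</p>
</body></text>"""


def tei_to_text(xml_source, page_numbers=False):
    """Render each <p> as one plain-text paragraph.

    With page_numbers=True every <pb n="..."/> is shown as "[p. N]";
    otherwise page breaks are silently dropped.
    """
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iter("p"):
        parts = [p.text or ""]
        for child in p:
            if child.tag == "pb":
                if page_numbers:
                    parts.append(" [p. %s] " % child.get("n", "?"))
            else:
                # Any other inline element: keep its text, drop the tag.
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        # Collapse runs of whitespace left over from the markup.
        paragraphs.append(" ".join("".join(parts).split()))
    return "\n\n".join(paragraphs)
```

Calling `tei_to_text(SAMPLE)` gives the text as it is posted today, while `tei_to_text(SAMPLE, page_numbers=True)` keeps the page markers; the point is only that the choice can be made at conversion time because the master file still holds the information.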
Cheers, Holden From marcello at perathoner.de Fri Oct 22 03:57:00 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 03:57:23 2004 Subject: [gutvol-d] A layman's critique of PGTEI In-Reply-To: <20041021193831.7050C109723@ws6-4.us4.outblaze.com> References: <20041021193831.7050C109723@ws6-4.us4.outblaze.com> Message-ID: <4178E77C.2010309@perathoner.de> Joshua Hutchinson wrote: > Well, that's my quick personal experiment. My question for the > experts: Can the HTML validation problem be easily fixed? At present there is a known bug in the title page. The converter produces a H1 inside a SPAN. (The SPAN should be a DIV or the H1 should be dropped.) This is a thing i would like to postpone until we are agreed about how to format a title page. There are far too many ways you can do that according to the specs. Supporting them all will be very difficult. I didn't get any other warnings running your example thru the validator. Can you mail me the output of the validator with "show code" enabled? > I'd also > like to request a change to the CSS used, but that is a personal > preference and something to really worry about after the > show-stoppers are fixed. The CSS is in an external file. You can make your own. -- Marcello Perathoner webmaster@gutenberg.org From traverso at dm.unipi.it Fri Oct 22 04:20:58 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Oct 22 04:21:23 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <4178E5BE.9010805@perathoner.de> (message from Marcello Perathoner on Fri, 22 Oct 2004 12:49:34 +0200) References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de> <200410220906.i9M96rMS019592@posso.dm.unipi.it> <4178E5BE.9010805@perathoner.de> Message-ID: <200410221120.i9MBKwwx022743@posso.dm.unipi.it> >>>>> "Marcello" == Marcello Perathoner writes: Marcello> Example: my servers are all debian and have perl 5.8 Marcello> whereas ibiblio is redhat enterprise with perl 5.6. 
This Marcello> has often before given me headache because programs that Marcello> ran at home, misteriously failed at ibiblio. That's one of the points. The conversion tools are mature when they are independent of the exact version of the software that you have. And having a "neutral" site for testing is one of the important points: you cannot rely on your own configuration. PG has to rely on tools that are stable, not on the bleeding edge. Carlo From marcello at perathoner.de Fri Oct 22 05:24:53 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 05:25:18 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <200410221120.i9MBKwwx022743@posso.dm.unipi.it> References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de> <200410220906.i9M96rMS019592@posso.dm.unipi.it> <4178E5BE.9010805@perathoner.de> <200410221120.i9MBKwwx022743@posso.dm.unipi.it> Message-ID: <4178FC15.1020303@perathoner.de> Carlo Traverso wrote: > That's one of the points. The conversion tools are mature when they > are independent on the exact version of the software that you > have. I was referring to the scripts that run the catalog. > PG has to rely on tools that are stable, not on bleeding edge. This is open source development. We don't have enough resources to test the tools everywhere before releasing. We need bug reports and patches from the people out there. I don't even have a Winsloth machine ... The mentality of "everything has to be perfect before we start" doesn't work. Linus didn't post Linux when it was ready, he posted it when it was no more than a filesystem with a bit of memory management attached. Tim Berners-Lee didn't start with XHTML 1.1. He started with what he had and refined it later. Michael Hart didn't wait till he got a computer that understood lower case. He started with upper case only and fixed that later. Success stories. Not an argument, but maybe an illustration.
-- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Fri Oct 22 05:45:37 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 05:46:00 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: <20041022124537.B625AEDE84@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Tony Baechler > > Apparently context was lost here. The "why" is that, according to what > Joshua was saying, the page numbers are not available anywhere in the plain > text because they would look ugly. OK, I understand that and I myself > might not even want them most of the time. However, if I decide that for a > particular file I want them, I have to go to the master xml document and do > my own conversion. The PG supplied plain text won't help me, and the html > won't work correctly in Lynx or IE. Therefore, I have to redo the > conversion to get the information I want in the plain text file or whatever > other format. This does not seem simpler to me. > You're right, converting your own plain text is not simpler. Right now, if you grab the plain text file of any project in the collection, it won't have page numbers. Right now, if you grab a few select HTML files in the collection, they have an option to show page numbers. In the future, if you want what we have now, nothing will change. You'll grab the text file and it won't have page numbers. In the future, if you want something more/different than what we have now ... you can get it, but it requires an extra step. Right now, you can't get it, period. That's the advancement that XML promises. I really want people to understand that the move to XML master documents will NOT take away anything from what we have now. It will give us more options beyond that basic setup. Josh PS In the distant future, I foresee a web page where you can customize what options you want and the server generates the file to your specifications on the fly. That's down the road, though.
From joshua at hutchinson.net Fri Oct 22 05:54:43 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 05:55:06 2004 Subject: [gutvol-d] Re: barriers to XML posting Message-ID: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Steve Thomas > > A question (possibly better put over on the DP list): > > Is it possible to OCR a scan directly to XML? Or is the output > from OCR always going to be text? > That is a very DP-related question, but I'll answer here as best as I understand the future plans (and let others correct me where needed). The plan at DP is to move from the current 2 round proofing model to a (probably) 4 round proofing/markup model. The content provider will take the scans and OCR them normally. That part doesn't change. Then, there are 2 rounds of proofing that concentrate on typos, spelling, etc. Very similar to the 2 rounds we have now. Then, there are 2 MORE rounds of markup. Here is where all the markup like poetry, italics/bold, footnotes, chapter headings, thoughtbreaks, etc, etc is done. Then, when the final result gets out of 4 rounds, it is nicely marked up (in theory) XML. The post-processor does his/her normal magic, combining all the pages, running validators on it, etc. As far as the OCR process, we currently run some pre-processors on text to fix common scannos, etc. I'd be surprised if those pre-processors didn't improve/change as the XML world emerges at DP. Josh From joshua at hutchinson.net Fri Oct 22 06:03:11 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 06:03:34 2004 Subject: [gutvol-d] A layman's critique of PGTEI Message-ID: <20041022130311.8ACE8EDCC0@ws6-1.us4.outblaze.com> My apologies. I ran both Tidy and the Validator on the file (as is my normal HTML procedure, so it is second nature). Tidy reported 13 warnings. The W3C validator showed just one (which you mention below). I misspoke when I said the errors were all from the validator.
Now, from your comments, it looks like the next thing to do is decide on a standard title page. I personally, don't have a problem with the format you have. It is clean and easy to read. There are some things, as I mentioned before, that I would change on a CSS level, but generally, I like the layout well enough. Josh ----- Original Message ----- From: Marcello Perathoner > > At present there is a known bug in the title page. The converter > produces a H1 inside a SPAN. (The SPAN should be a DIV or the H1 should > be dropped.) > > This is a thing i would like to postpone until we are agreed about how > to format a title page. There are far too many ways you can do that > according to the specs. Supporting them all will be very difficult. > > I didn't get any other warnings running your example thru the validator. > Can you mail me the output of the validator with "show code" enabled? > > > > I'd also > > like to request a change to the CSS used, but that is a personal > > preference and something to really worry about after the > > show-stoppers are fixed. > > The CSS is in an external file. You can make your own. 
> > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Fri Oct 22 06:41:18 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 06:41:47 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <20041022012030.GA23907@panix.com> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org> <4177E9CB.7080200@perathoner.de> <20041022012030.GA23907@panix.com> Message-ID: <41790DFE.3070606@perathoner.de> Jim Tinsley wrote: >>I feel Jim is raising artificial objections he knows we cannot overcome. >>If he doesn't want to learn TEI and he doesn't feel like proofing a TEI >>text in emacs, fine. But then, he should step aside and let other people >>do this work. > > I find this very offensive. > > I came home, and was reading happily enough through the threads until > this. I am sorry if I spoilt your evening and I apologize for that. I said "I feel" and that's the truth. Maybe it's just my fault. > these are the expectations I have had for > XML as far back as I can remember, and they are expectations regularly > assumed, if not met, by people who evangelize XML. Some of your expectation cannot be met. Some would imply an enourmous expense of time on the developers part to save relatively little time on your part. > I "learned TEI" > (not all of it, of course) with the hope of using it in PG, in late > 2001/early 2002, and I marked up my first book in XML in February, > 2002, which was long before I ever heard your name. I have marked up 25 books, prose, lyrics and plays. And I transformed all of them successfully to HTML, TXT, PDF and PalmDoc. 
That was a year ago. I could have done more but I felt that it was better to go public with what I had, to get comments and suggestions from other people. I thought if PG posted some of those files I would get comments. Since then I have been waiting. I think I have done my part. My files are done better than many I see posted. Even if we had to fix them later, the philosophy of PG did at some point expressly allow the posting of preliminary files. I cannot see why this simple request should cause so much trouble and fear today. These are some of your expectations that you should reconsider: > I really no longer give any headroom at all to the approach "Post XML > Now Because That Is The One True Way And We'll Figure Out How To Read > It Later." If for no other reason, then because the most important > part of the WW job is to check the texts before posting, and if we > can't read it, we can't find the errors, and if we can't find the > errors, we can't fix 'em. You can read a TEI file in an editor. You can spell-check it. You can validate it. You can find the errors. The process is just a bit different from what you have now, and will always be until there crop up some native TEI readers. > That > process must work for _all_ teixlite files, not just ones that are > specially cooked, using constraints not specified within the chosen > DTD. Here's where we hit the rocks today. Impossible. There are things you cannot specify in a DTD but still must be followed to get a semantically correct file. (This holds for every XML application not just for PGTEI.) You always have to obey some extra rules besides validity. These are put down in the PGTEI guide. > The > only things we must have -- both for our own internal practical > purposes and for the use of future readers -- is that it should work > reliably on _all_ texts that conform to the XML DTD chosen, be open > source, and be cross-platform. 
A reader needs to be able to tweak the > transform and re-run on her own desktop. Same as above. The DTD is not strict enough (RelaxNG will be better, but it's still early). There will always be valid TEI files that do not transform to `correct' output files. I don't see why it is necessary for the conversion tools to run on everybody's desktop before we can start posting files. If the tools run on pglaf.org and gutenberg.org that is more than enough for a start. The tools can be fixed later. That won't make posted valid TEI files invalid. -- Marcello Perathoner webmaster@gutenberg.org From juliet.sutherland at verizon.net Fri Oct 22 06:49:41 2004 From: juliet.sutherland at verizon.net (Juliet Sutherland) Date: Fri Oct 22 06:50:05 2004 Subject: [gutvol-d] Re: barriers to XML posting References: <1e2.2cc70258.2ea9861d@aol.com> <41783BAB.2020901@perathoner.de> <200410220906.i9M96rMS019592@posso.dm.unipi.it> Message-ID: <02b701c4b83d$fb0cc350$6501a8c0@Unicorn> If Carlo or someone else is willing to help with administering it, we can provide webspace, computing, and bandwidth on either the PGDP server or our test server to be used for this same purpose. Being located in the US, we would be following the same copyright rules as PG. We would also be happy to keep XML versions of any projects until PG is ready to accept them. JulietS ----- Original Message ----- From: "Carlo Traverso" To: Sent: Friday, October 22, 2004 5:06 AM Subject: Re: [gutvol-d] Re: barriers to XML posting > > The problem is how to have beta-testing AND respect PG tradition of > posting only definitive stuff. > > > Would this be useful? > > > I might offer web space, computing and bandwidth to post XML, convert > it to txt and html and what else, and submit the result to > whitewashing. > > You will be able to have installed all the software to handle the > conversion, and have submissions converted by automatic procedures. > > This might be seen as a beta-test of xml whitewashing procedures.
I am > at most neutral to xml (I recognize its unavoidability, but I complain > the trend, I would prefer a more human-friendly markup). So it will > not be pro-XML biased. And I am authorized to whitewashing, so this > can be seen as making my whitewashing in public. > > The posting-and-converting should be automatic: a web interface to > submit a zip/tar.gz/tar.bz2 file, semi-automatic unzipping and > conversion, poster and site administrator OK to make the posting > public. Then the whitewashing could start WITHOUT corrections: if > anything in the result is wrong, then one should repeat the > submission. If the post will be XML + converted, or converted only, > will be PG choice. The posts, complete of XML, will remain > indefinitely on the test site. Of course, an additional line will be > included to warn that the file is not an official PG file but only an > intermediate working file. But except for this line, everything should > be identical to a PG file, header and footer, PG number and filename > included. > > Drawback: the server is located in Italy, so I cannot do it for non-EU > clearable items. You'll have to submit clearance for death+70 (with > procedures to decide, but a copy of a LOC authority record or an > encyclopaedia article will of course be enough). > > > > Carlo > From marcello at perathoner.de Fri Oct 22 06:52:18 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 06:52:43 2004 Subject: [gutvol-d] A layman's critique of PGTEI In-Reply-To: <20041022130311.8ACE8EDCC0@ws6-1.us4.outblaze.com> References: <20041022130311.8ACE8EDCC0@ws6-1.us4.outblaze.com> Message-ID: <41791092.3010907@perathoner.de> Joshua Hutchinson wrote: > Now, from your comments, it looks like the next thing to do is decide > on a standard title page.
I personally, don't have a problem with > the format you have. It is clean and easy to read. The title page you see is automatically generated from the teiHeader inside the transformation. You can also have a custom title page if you replace the with your own
like this:
Hamster Hooey and his Gooey Kablooie

by

Maple Syrup

bla bla

I didn't get very far implementing custom title pages because I have always worked from the PG text only. I never did see any real title page of a PG book and have no notion of how funny they get. If you can spare the time, it would help immensely if you could grab some representative scanned title pages at DP and put them up somewhere for everybody to see so we could discuss how to mark them up. -- Marcello Perathoner webmaster@gutenberg.org From juliet.sutherland at verizon.net Fri Oct 22 07:19:14 2004 From: juliet.sutherland at verizon.net (Juliet Sutherland) Date: Fri Oct 22 07:19:37 2004 Subject: [gutvol-d] Re: barriers to XML posting References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com> Message-ID: <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> At DP, most of us use ABBYY Finereader (versions ranging from 5.0 to 7.0) to do the OCR work. It does not currently have an option to save the result as XML, though I suppose they might well implement something like that eventually. Also, for proofreading purposes, it is much easier to work with material that does not yet have all the XML tags, etc. We have always planned to have formating rounds, and, in fact, they are currently in active development and I hope they will be in place by the end of the year. I expect that the nature of the formatting rounds will change with time. My hope, however, is that in most cases, even the people working in the formating rounds will not have to see all the verbosity that goes with XML and that can be represeted unambiguously in a more reader friendly way. Paragraph markers are an example that springs easily to mind. They can be added automatically later and it would be a serious waste of volunteer time to have to type those in in place of the blank line that we currently use. That case is trivially obvious, but most others may not be. One of the things that we will have to work out through experience is exactly what kinds of markup happen at which stage of the process. 
When one really gets into the details, there are a staggering number of them. A formating/markup issue that we've struggled with recently is teaching people how to know when to include a period inside italics and when not to. And then getting them to do it correctly. Yes, I know this doesn't matter for html, and won't matter for XML, but it does matter for plain text versions that use underscores to mark italics. I mention it only as an example of the tiddly, little, significant details that must be worked out in a day-to-day production environment. But back to my point. I expect that we will end up with a combination of automatic tools and manual intervention. Exactly what will happen where in the process remains to be determined. We'll try something, which inevitably won't be the right thing, and we will proceed with incremental changes until we end up with a system that works reasonably well. Ideally the output of that system will be an XML master file that can then be used to generate versions in whatever form any individual user requests. And answering another request from another message: I have LOTS of scanned title pages. Where would you like them? JulietS DP Site Admin ----- Original Message ----- From: "Joshua Hutchinson" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, October 22, 2004 8:54 AM Subject: Re: [gutvol-d] Re: barriers to XML posting ----- Original Message ----- From: Steve Thomas > > A question (possibly better put over on the DP list): > > Is it possible to OCR a scan directly to XML? Or is the output > from OCR always going to be text? > That is a very DP related question, but I'll answer here as best as I understand the future plans (and let others correct me where needed). The plan at DP is to move from the current 2 round proofing model to a (probably) 4 round proofing/markup model. The content provider will take the scans and OCR them normally. That part doesn't change. 
Then, there are 2 rounds of proofing that concentrate on typos, spelling, etc. Very similar to the 2 rounds we have now. Then, there are 2 MORE rounds of markup. Here is where all the markup like poetry, italics/bold, footnotes, chapter headings, thoughtbreaks, etc, etc are done. Then, when the final result gets out of 4 rounds, it is nicely marked up (in theory) XML. The post-processor does his/her normal magic, combining all the pages, running validators on it, etc. As far as the OCR process, we currently run some pre-processors on text to fix common scannos, etc. I'd be surprised if those pre-processors didn't improve/change as the XML world emerges at DP. Josh _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Fri Oct 22 07:21:22 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 07:21:46 2004 Subject: [gutvol-d] barriers to XML posting Message-ID: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > Jim Tinsley wrote: > > > That > > process must work for _all_ teixlite files, not just ones that are > > specially cooked, using constraints not specified within the chosen > > DTD. Here's where we hit the rocks today. > > Impossible. There are things you cannot specify in a DTD but still must > be followed to get a semantically correct file. (This holds for every > XML application not just for PGTEI.) You always have to obey some extra > rules besides validity. These are put down in the PGTEI guide. > Hmm... Maybe I misunderstand here. If a file comes in, marked up in TEI-Lite and we cannot transform it with our standard process, it seems to me either the DTD we've chosen is incomplete or the TEI markup has a bug. 
Now, if a new text needs a feature not in our current DTD (am I using
the terminology right here?), I'm not against modifying the DTD standard
to include it, but there would need to be some procedure to do it so
that it gets "reviewed" by others first.

Or, maybe there is a way to define new elements that are outside the
standard DTD within the XML submission file itself? Again, I'm trying to
learn this as I go, so if my question is stupid, I apologize in advance.

> > The
> > only things we must have -- both for our own internal practical
> > purposes and for the use of future readers -- is that it should work
> > reliably on _all_ texts that conform to the XML DTD chosen, be open
> > source, and be cross-platform. A reader needs to be able to tweak the
> > transform and re-run on her own desktop.
>
> Same as above. The DTD is not strict enough (RelaxNG will be better, but
> it's still early). There will always be valid TEI files that do not
> transform to `correct' output files.
>
> I don't see why it is necessary for the conversion tools to run on
> everybody's desktop before we can start posting files. If the tools run
> on pglaf.org and gutenberg.org that is more than enough for a start. The
> tools can be fixed later. That won't make posted valid TEI files invalid.

If we have the tools on the server and available for use, that is
sufficient for me. But I also think that all the files (DTD, XSLT, and
whatever else) should always be available for download for the
industrious person that DOES want to run it on their own machine.
Josh From joshua at hutchinson.net Fri Oct 22 07:32:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 07:33:13 2004 Subject: [gutvol-d] A layman's critique of PGTEI Message-ID: <20041022143249.9E4889E945@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > If you can spare the time, it would help immensely if you could grab > some representative scanned title pages at DP and put them up somewhere > for everybody to see so we could discuss how to mark them up. > Hey, this I can do! I'll try to get a good cross section of project types. American Missionary (periodical) - http://www.pgdp.net/projects/projectID3f1ea8bfa6d0c/227.png Manhood Perfectly Restored (non-fiction pamphelt) - http://www.pgdp.net/projects/projectID4173613f31c06/003.png Mike Flannery On Duty and Off (novel fiction) - http://www.pgdp.net/projects/projectID4154ff24abb42/002.png The History of Woman Suffrage (non-fiction) - http://www.pgdp.net/projects/projectID403a76a8ebb0f/0001.png Josh From Bowerbird at aol.com Fri Oct 22 08:13:53 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 22 08:14:23 2004 Subject: [gutvol-d] Aside on old computers Message-ID: <9a.178ae7fc.2eaa7db1@aol.com> carlo said: > There should be something wrong > if in 12 years what was good for > a half-million-dollar research project > isn't even good for a forest village. well, that technology _did_ move _very_ fast in those 12 years, so i don't believe one could always say that unequivocally -- for instance, there's no reason to make the third world wait 12 years to get the cell-phones we have now, when we can give them to them immediately -- but nonetheless, it _is_ true that 12-year-old computers _can_ display an electronic-book just _fine_, if we use its resources _wisely_, rather than imposing bloatware on it instead... it's possible to build a state-of-the-art viewer that runs under windows95. i know, i've done it. 
-bowerbird From marcello at perathoner.de Fri Oct 22 08:38:07 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 08:38:35 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com> <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> Message-ID: <4179295F.1010702@perathoner.de> Juliet Sutherland wrote: > And answering another request from another message: I have LOTS of > scanned title pages. Where would you like them? Anywhere I can get them. Or, if you prefer, zip and mail them to me. I can put them up at PG. -- Marcello Perathoner webmaster@gutenberg.org From Gutenberg9443 at aol.com Fri Oct 22 08:39:07 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Oct 22 08:39:38 2004 Subject: [gutvol-d] English and request Message-ID: <155.41b2bdc1.2eaa839b@aol.com> In a message dated 10/22/2004 4:16:12 AM Mountain Standard Time, marcello@perathoner.de writes: >>But I fail to see how this could possibly imply that >>people who live in >>Africa wanted to learn English. English is becoming, if you will pardon a rather risible expression, the lingua franca of the business world. Most diplomacy is done in French and English. Of course someone who expects his/her offspring to remain in exactly their present location and circumstances has no need to learn, or to teach their offspring, any languages other than those spoken locally. But many small African countries have several different local languages--Ghana comes to mind at once, with three languages and many dialects of those languages. English is the official language there, because it's the only way that the country can get its business done when people of one cultural group can't even talk with people of another cultural group five miles away. Let's look at another small country, not in Africa. 
This is a quotation from the online version of World Book Encyclopedia: "New Guineans speak more than 700 languages. Because of the number of languages, many people cannot communicate with neighbors who live only a short distance away. A growing number of eastern New Guineans speak Pidgin English, or Tok Pisin, as a second language. This lingua franca, or common language, enables speakers of different tongues to communicate with one another. In the west, many people speak Malay as a second language." I have read elsewhere that those 700 languages involve 48 different language families. Tok Pisin works, but it is too awkward to use for anything more than local conversation. "Belly belonga me walk about too much" is an awkward way of saying "I have an upset stomach," and "big feller you punch him teeth him cry" doesn't immediately make me think of a piano. Malay is much better, but it still isn't a language that will allow somebody to get into the worldwide market. Afghanistan has three languages. Most middle and upper class Afghans also speak Farsi. I don't remember how many languages India has, but it's a lot. Most Americans do not understand that many Europeans routinely speak several languages, and do not realize that learning another language, or two or three other languages, should ideally start in infancy. Ideally, everybody worldwide would learn at least French and English in addition to their own local languages. In the real world, that's not going to happen. I have found that from speaking English and Spanish and having a working knowledge of Latin and linguistics, I can read fairly well in Portuguese and Italian. I miss a lot of words, but I can get the gist of what I'm reading. I'm not up to reading a French textbook, but I can usually wade through an article in Le Figaro. I'm hopeless in German, even if I recognize the root words, because I do not comprehend the way German is put together. 
I am not trying to be insular and I am not insulting anybody else's language. I hate to see any language die, because just about every language is able to express at least one thing that other languages can't. For reasons I do not understand, ancient Egyptian translates better into German than into French or English; therefore Egyptologists must learn German. I have to limp along on English translations, realizing that they are inadequate. I am certainly not saying that everybody in the world has to learn English. What I am saying is that, like it or not, it is one of the languages one must have to progress very far in life. Therefore, I think that books should be made available in English to as close as possible to everybody. I would be overjoyed if as many books were available in other languages, especially French, as are available in English. I hope this clarifies my position. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041022/ddf54f0b/attachment.html From scott_bulkmail at productarchitect.com Fri Oct 22 09:00:34 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Fri Oct 22 09:05:52 2004 Subject: [gutvol-d] scanned title pages In-Reply-To: <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com> <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> Message-ID: >And answering another request from another message: I have LOTS of scanned title pages. Where would you like them? If there is diskspace + bandwidth to host and serve them, I think it would be useful to post every title page somewhere. Or, at least to start with the top 100 or 1000 or some reasonable subset. For example, I recently did a massive review of the "catalog" data (posted to GUTCAT, alas with minimal response). With so many inconsistencies between various sources, I would like to be able to reference the original. 
-- Scott
Practical Software Innovation (tm), http://ProductArchitect.com/

From marcello at perathoner.de Fri Oct 22 09:09:25 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 09:09:52 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>
References: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com>
Message-ID: <417930B5.5040907@perathoner.de>

Joshua Hutchinson wrote:

> Hmm... Maybe I misunderstand here. If a file comes in, marked up in
> TEI-Lite and we cannot transform it with our standard process, it
> seems to me either the DTD we've chosen is incomplete or the TEI
> markup has a bug.

Consider the following examples.

A DTD-based validator can catch this:
01 Jan 2004
because a date has no business inside an address. But not this:
Chicago 2830 North Clark Curl Up and Dye Beauty Salon
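
To make the distinction concrete in rough terms: a DTD-style validator checks which elements may appear inside which, and never looks at whether the text in them makes sense. The Python below is not a real DTD validator, only a sketch of that idea. The `ALLOWED_CHILDREN` table, the element names (`address`, `addrLine`, `date`), and both sample fragments are assumptions made up for illustration, since the markup of the two examples is not preserved here.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each parent may contain.
ALLOWED_CHILDREN = {
    "address": {"addrLine"},
    "addrLine": set(),
    "date": set(),
}


def structural_errors(xml_text):
    """Return a list of nesting violations; text content is never inspected."""
    errors = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


# A date inside an address breaks the content model, so it is caught.
bad_structure = "<address><date>01 Jan 2004</date></address>"

# Every element is allowed where it stands, so nothing is reported, even if
# the content has been put in the wrong places.
bad_meaning = (
    "<address>"
    "<addrLine>Chicago</addrLine>"
    "<addrLine>2830 North Clark</addrLine>"
    "<addrLine>Curl Up and Dye Beauty Salon</addrLine>"
    "</address>"
)

if __name__ == "__main__":
    print(structural_errors(bad_structure))
    print(structural_errors(bad_meaning))
```

Under these assumptions the first fragment yields one error and the second yields an empty list, which is the gap that extra written rules (such as a markup guide) have to cover.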
The validator cannot know that the markup is all wrong. Of course this
will _transform_ all right.

> Now, if a new text needs a feature not in our current DTD (am I using
> the terminology right here), I'm not against modifying the DTD
> standard to include it, but there would need to be some procedure to
> do it so that it gets "reviewed" by others first.

TEI has a well-documented interface for exactly this purpose. Experience
has shown that not even the full TEI can accommodate all cases. So, if
you need to mark up something completely new, e.g. the message you just
got from an alien civilization, you can expand the TEI DTD and still
conform to the TEI standard.

> Or, maybe there is a way to define new elements that are outside the
> standard DTD within the XML submission file itself? Again, I'm
> trying to learn this as I go, so if my question is stupid, I
> apologize in advance.

No. All you can define inside an XML file is the DTD (or other schema)
you want to use and entities like &myentity;

Of course you can use a DTD that defines some stuff and then includes
the standard TEI DTD. But, as said above, there is a better way to do
that in TEI.

> If we have the tools on the server and available for use, that is
> sufficient for me. But I also think that all the files (DTD, XSLT,
> and whatever else) should always be available for download for the
> industrious person that DOES want to run it on their own machine.

Already done. Start here: http://www.gutenberg.org/tei/

-- Marcello Perathoner webmaster@gutenberg.org

From Bowerbird at aol.com Fri Oct 22 09:25:11 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 09:25:47 2004
Subject: [gutvol-d] aspects of a well-done e-book
Message-ID: <1f4.15d8e76.2eaa8e67@aol.com>

marcello said:
> That may not be simple but is still better than what you have now:
> if the txt file happens to lack the page numbers
> there is no way you could get them short of redoing the book.
you wouldn't need to "redo the book" to insert page number information. and if the .html file had that information, you could do it automatically. and if page-number information _was_ included in the text-file, i would support it in my viewer-program. so tony, or any other user, could simply toggle its display on or off, by choosing a menu-item. it's ridiculous to have users go through all the difficulty of doing a conversion to access such a simple and basic piece of information. y'all should step back and look at yourselves for even suggesting it. and why should the text-file "happen to lack the page numbers" in the first place? that kind of terminology makes it sound so "accidental". distributed proofreaders retains page-number information through all of its processes, because, get this, they find it's useful to them. but then they drop it from the final product! why? don't they realize that someone else might find it useful? of course they do, that's why people have started _retaining_ it (i almost said "including" it, but that's the same type of error) in the .html versions. but nonetheless, it is still dropped from the text-file. just like the information about the names of image-files and their correct placement. it's as if there was a conscious attempt to make the text-files as useless as possible. and what will the users at large think when they are informed that this policy is in place? i don't know for sure, but i'm gonna find out. -bowerbird From Bowerbird at aol.com Fri Oct 22 09:30:52 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 22 09:31:25 2004 Subject: [gutvol-d] Re: barriers to XML posting Message-ID: marcello said: > But this is sooo tedious! We have to replicate the exact setup of > gutenberg.org *and* pglaf.org to get reliable results from the beta-test. > Example: my servers are all debian and have perl 5.8 > whereas ibiblio is redhat enterprise with perl 5.6. 
> This has often before given me headache because
> programs that ran at home, mysteriously failed at ibiblio.

if it can't even span 2 versions of linux
which run perl that is .2 versions apart,
that means the process is very fragile;
not nearly as robust as x.m.l. advocates
always make it sound in their sales pitch.

-bowerbird

From Bowerbird at aol.com Fri Oct 22 09:34:16 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 09:34:48 2004
Subject: [gutvol-d] aspects of a well-done e-book
Message-ID: <8e.181c91ac.2eaa9088@aol.com>

holden said:
> So, whereas before, you had to have the standard TXT or HTML versions
> because that was all that was available, now we can actually talk about
> making customised versions as people want them.

so every time a person wants to switch an option,
they have to go and do the conversion over again?

do you really not see why that won't appeal to them?

-bowerbird

From joshua at hutchinson.net Fri Oct 22 09:38:42 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 09:39:09 2004
Subject: [gutvol-d] aspects of a well-done e-book
Message-ID: <20041022163842.776FD2F8CB@ws6-3.us4.outblaze.com>

----- Original Message -----
From: Bowerbird@aol.com
>
> holden said:
> > So, whereas before, you had to have the standard TXT or HTML versions
> > because that was all that was available, now we can actually talk about
> > making customised versions as people want them.
>
> so every time a person wants to switch an option,
> they have to go and do the conversion over again?
>
> do you really not see why that won't appeal to them?
>

As opposed to not having an option at all, like we have now?

So, they can either have exactly what they have now... or if they want
to, they can have more with a little extra effort. You're right, we
shouldn't give them anything new.

Oh, and don't tell me the reader program should do it.
The reader program will never be the same for every reader, even if you
do actually produce a working program. The only way to provide this
information that is platform/reader program independent is to somehow
put that into the source in a standard format that multiple reader
programs will support. So far, XML is the *only* format anyone has
suggested that will allow that.

Josh

From marcello at perathoner.de Fri Oct 22 09:44:25 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 09:44:53 2004
Subject: [gutvol-d] aspects of a well-done e-book
In-Reply-To: <1f4.15d8e76.2eaa8e67@aol.com>
References: <1f4.15d8e76.2eaa8e67@aol.com>
Message-ID: <417938E9.7030307@perathoner.de>

Bowerbird@aol.com wrote:

> and if page-number information _was_ included in the text-file,
> i would support it in my viewer-program. so tony, or any other user,
> could simply toggle its display on or off, by choosing a menu-item.

You are being narrow-minded about this.

What if the user wanted to see *only* the page numbers? Your reader does
not support that. But with a simple XSL transformation the user can
easily strip all except the page numbers from the TEI master file.

-- Marcello Perathoner webmaster@gutenberg.org

From shalesller at writeme.com Fri Oct 22 10:03:29 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Oct 22 10:03:59 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <20041022170330.0589F4BDAA@ws1-1.us4.outblaze.com>

Steve Thomas writes:

> Is it possible to OCR a scan directly to XML? Or is the output
> from OCR always going to be text?

We don't usually scan to text; we scan to RTF, and guiprep extracts
some of the markup and converts it to lightly marked up text. guiprep
could certainly convert the RTF to XML if we wanted, but DP plans
to separate the markup and proofing rounds.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Bowerbird at aol.com Fri Oct 22 10:10:52 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 22 10:11:28 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: joshua said: > As opposed to not having an option at all, like we have now? except there's a _much_ easier way to give them the option. it's the same way you do it with .html -- put it into the file in a way that enables them to turn it on and off as they wish. if you're not imaginative enough to figure out how to do that, then just sit on your hands until you see how i accomplish it. if you ain't gonna put the information in the file, no viewer on the surface of the planet can put it in. but that's _your_ fault, not the fault of the program. and it shows you don't have the user's interest at heart. > Oh, and don't tell me the reader program should do it. telling _you_ is an exercise in futility. but when i tell other people that a reader-program should do it, and give them one which actually does it, _they_ will understand. and then _they_ will start telling you to include that information. i've tried speaking to "the powers that be" directly, and found them nonresponsive, so i'll route stuff through the _users_ from now on. -bowerbird From Bowerbird at aol.com Fri Oct 22 10:18:06 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 22 10:18:38 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: marcello said: > You are being narrow-minded about this. that's an _incredibly_ stupid thing to say. > What if the user wanted to see *only* the page numbers. > Your reader does not support that. well, my program doesn't support original-page-numbers _at_all_ yet, we're talking about what i _will_ implement, so i don't see how you can assert anything on the matter. 
i'm not sure i understand the request, anyway -- you want to see _only_ the page numbers, nothing else?, i don't see any utility in that, you would have to explain it, but since the user can control the color of the text, they'd just match it to the background color to make it "disappear" -- but if it is something that readers will want, i _will_ support it. -bowerbird From jonathan_ingram at yahoo.com Fri Oct 22 10:19:11 2004 From: jonathan_ingram at yahoo.com (Jonathan Ingram) Date: Fri Oct 22 10:19:37 2004 Subject: [gutvol-d] aspects of a well-done e-book In-Reply-To: Message-ID: <20041022171911.82526.qmail@web41721.mail.yahoo.com> --- Bowerbird@aol.com wrote: > joshua said: > > As opposed to not having an option at all, like we have now? > > except there's a _much_ easier way to give them the option. > > it's the same way you do it with .html -- put it into the file > in a way that enables them to turn it on and off as they wish. That's what we do, and the reader is any Mozilla derivative. If you don't wish to use a Mozilla derivative, then when we switch to a TEI-based master format, you can generate files to used in other readers, such as any text editor, and which will contain as much of the information from the original as you require. -- Jon Ingram _______________________________ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com From marcello at perathoner.de Fri Oct 22 10:19:14 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 10:19:46 2004 Subject: [gutvol-d] Re: barriers to XML posting In-Reply-To: References: Message-ID: <41794112.9050309@perathoner.de> Bowerbird@aol.com wrote: > if it can't even span 2 versions of linux > which run perl that is .2 versions apart, > that means the process is very fragile; > not nearly as robust as x.m.l. advocates > always make it sound in their sales pitch. Bumbling along without a clue, as usual? 
perl has nothing to do with XML and the software I was speaking about
drives the catalog.

-- Visit the Bowerbird Fansite: www.gnutenberg.de/bowerbird/

From jonathan_ingram at yahoo.com Fri Oct 22 10:22:49 2004
From: jonathan_ingram at yahoo.com (Jonathan Ingram)
Date: Fri Oct 22 10:23:17 2004
Subject: [gutvol-d] aspects of a well-done e-book
In-Reply-To:
Message-ID: <20041022172249.81135.qmail@web41727.mail.yahoo.com>

--- Bowerbird@aol.com wrote:
> i'm not sure i understand the request, anyway --
> you want to see _only_ the page numbers, nothing else?,
> i don't see any utility in that,

One of the advantages of using a standard, structured, and
well-supported format for marking our texts is that we can do things
with them that you don't see the utility of.

-- Jon Ingram

From Bowerbird at aol.com Fri Oct 22 10:28:25 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:28:58 2004
Subject: [gutvol-d] Re: barriers to XML posting
Message-ID: <1b9.463d0db.2eaa9d39@aol.com>

marcello said:
> Bumbling along without a clue, as usual?

your process is _fragile_.

and everybody can see that.

the person who would be
the emperor's tailor is naked.

-bowerbird

From marcello at perathoner.de Fri Oct 22 10:29:45 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 10:30:15 2004
Subject: [gutvol-d] aspects of a well-done e-book
In-Reply-To:
References:
Message-ID: <41794389.9080702@perathoner.de>

Bowerbird@aol.com wrote:

> if you ain't gonna put the information in the file,
> no viewer on the surface of the planet can put it in.
> but that's _your_ fault, not the fault of the program.
> and it shows you don't have the user's interest at heart.

Assuming a worm has eaten himself thru a fat and juicy word in a book.
How do you mark that up in ZML?

This is how we do it in TEI:

Bwerbird

And you?
Will there be a menu item to show / hide wormholes? Or are you gonna
sacrifice the user's interest over an inadequacy of your reader program?

-- Marcello Perathoner webmaster@gutenberg.org

From Bowerbird at aol.com Fri Oct 22 10:32:56 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 10:33:35 2004
Subject: [gutvol-d] aspects of a well-done e-book
Message-ID: <9a.178fd9b0.2eaa9e48@aol.com>

ingram said:
> One of the advantages of using
> a standard, structured, and well-supported format
> for marking our texts is that we can do things with them
> that you don't see the utility of.

if users want something, i'll provide it,
even if i _cannot_ see the utility of it...

but perhaps _you_ can explain to me the utility
of showing page-numbers and nothing else?
because, for the _life_ of me, it escapes me right now.

i've only thought about it for 5 minutes,
and maybe something will come to me
just as soon as i hit "send" on this, but...

-bowerbird

From jon at noring.name Fri Oct 22 10:35:32 2004
From: jon at noring.name (Jon Noring)
Date: Fri Oct 22 10:36:06 2004
Subject: [gutvol-d] Re: barriers to XML posting
In-Reply-To: <20041022170330.0589F4BDAA@ws1-1.us4.outblaze.com>
References: <20041022170330.0589F4BDAA@ws1-1.us4.outblaze.com>
Message-ID: <108705515875.20041022113532@noring.name>

D. Starner wrote:

> Steve Thomas writes:
>> Is it possible to OCR a scan directly to XML? Or is the output
>> from OCR always going to be text?

> We don't usually scan to text; we scan to RTF, and guiprep extracts
> some of the markup and converts it to lightly marked up text. guiprep
> could certainly convert the RTF to XML if we wanted, but DP plans
> to separate the markup and proofing rounds.

It is certainly possible to OCR directly to "XML", but it won't be very
useful XML.
It is nigh impossible to train an OCR program, unless we get breakthroughs with AI so we can build machines with human intelligence, to unambiguously recognize and markup the *structure* and *semantics* of documents and textual content (such as using the TEI vocabulary designed for this purpose.) Thus, there must be substantial human interaction to determine what any chunk of text represents (structurally/semantically). Of course, if the goal is simply to "clone" the original printed text's visual presentation, then forget the above. But then the resulting cloned text is a lot less useful for repurposing, for accessibility and for other advanced purposes. Jon Noring From Bowerbird at aol.com Fri Oct 22 10:46:02 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 22 10:46:37 2004 Subject: [gutvol-d] aspects of a well-done e-book Message-ID: <1b9.4659080.2eaaa15a@aol.com> marcello said: > Assiming a worm has eaten himself > thru a fat and juicy word in a book. > How do you mark that up in ZML ? with an annotation. or with a form of emphasis you'd label as "wormhole". (how is this _currently_ indicated in the e-texts? oh, never mind, don't answer that, nobody cares...) besides, this is the kind of question you should be asking on the beta-test listserve for the z.m.l. viewer-program... people on _this_ list don't need to read about the simple implementational details of z.m.l., they just need to know that it's a good way to escape from the complexity of x.m.l. markup, a way that gives powerful e-book functionality, and yet still resonates with the plain-text files they are familiar with from project gutenberg... 
-bowerbird From hart at pglaf.org Fri Oct 22 10:53:31 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Oct 22 10:53:32 2004 Subject: [gutvol-d] Languages in PG In-Reply-To: References: <1e3.2c7b180e.2ea83cf0@aol.com> Message-ID: Don't forget all the languages available at pgcc.net Michael From joshua at hutchinson.net Fri Oct 22 10:53:33 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 10:54:00 2004 Subject: [gutvol-d] barriers to XML posting Message-ID: <20041022175333.EC53EEDF76@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > Joshua Hutchinson wrote: > > > Hmm... Maybe I misunderstand here. If a file comes in, marked up in > > TEI-Lite and we cannot transform it with our standard process, it > > seems to me either the DTD we've chosen is incomplete or the TEI > > markup has a bug. > > Consider following examples. > > A DTD-based validator can catch this: > >
> 01 Jan 2004 >
> > because a date has no business inside an address. > > But not this: > >
> Chicago > 2830 North Clark > Curl Up and Dye Beauty Salon >
> > The validator cannot know that the markup is all wrong. Of course this > will _transform_ all right. > > Ok, I am learning here, honest. But here's another dumb question in the meantime. Shouldn't a TEI-Lite validator flag your second example as wrong, too? Looking over the TEI-Lite documentation, you could markup that information, but in a slightly different format.
Chicago 2830 North Clark Curl Up and Dye Beauty Salon
Now, putting aside the fact that I doubt I'd ever bother to mark up an address to that exacting of a detail ;) ... Am I understanding the role of the validator properly in that it should choke on the first, since it doesn't, as far as I can tell, conform to TEI-Lite? > No. All you can define inside an XML file is the DTD (or other schema) > you want to use and entities like &myentity; > > Of course you can use a DTD that defines some stuff and then includes > the standard TEI DTD. But, as said above, there is a better way to do > that in TEI. That seems acceptable to me. For instance, to continue you example above, if you wanted to add , , and to the
markup, then you could put those elements in your personally DTD, which calls the TEI-Lite DTD, and then the validator should be able to parse it as acceptable code, right? But the question then becomes, will the standard transform be able to handle the new code in your DTD? If it just ignores what it doesn't understand, that would be acceptable, I'd think. But if the new tags cause the transform to choke, then we'd have a problem. Josh From marcello at perathoner.de Fri Oct 22 10:53:36 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 22 10:54:04 2004 Subject: [gutvol-d] aspects of a well-done e-book In-Reply-To: <9a.178fd9b0.2eaa9e48@aol.com> References: <9a.178fd9b0.2eaa9e48@aol.com> Message-ID: <41794920.5050109@perathoner.de> Bowerbird@aol.com wrote: > but perhaps _you_ can explain to me > the utility of showing page-numbers > and nothing else? because, for the > _life_ of me, it escapes me right now. YHBT. YHL. HAND. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Fri Oct 22 11:02:57 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 22 11:03:24 2004 Subject: [gutvol-d] Languages in PG Message-ID: <20041022180257.C0C05EDF79@ws6-1.us4.outblaze.com> That isn't PG, though, so it doesn't really apply to a discussion about the numbers of languages we have support for. 
Josh

From marcello at perathoner.de Fri Oct 22 11:05:45 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Oct 22 11:06:14 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <20041022175333.EC53EEDF76@ws6-1.us4.outblaze.com>
References: <20041022175333.EC53EEDF76@ws6-1.us4.outblaze.com>
Message-ID: <41794BF9.6060503@perathoner.de>

Joshua Hutchinson wrote:

> Shouldn't a TEI-Lite validator flag your second example as wrong,
> too? Looking over the TEI-Lite documentation, you could markup that
> information, but in a slightly different format.

That was just a general example. It was not meant to be specific to TEI.

> Now, putting aside the fact that I doubt I'd ever bother to mark up
> an address to that exacting of a detail ;) ... Am I understanding the
> role of the validator properly in that it should choke on the first,
> since it doesn't, as far as I can tell, conform to TEI-Lite?

It will choke if you validate my example against the TEI DTD. But I could write a SOMETHING DTD that validates that example all right.

> That seems acceptable to me. For instance, to continue you example
> above, if you wanted to add , , and to the
>
markup, then you could put those elements in your > personally DTD, which calls the TEI-Lite DTD, and then the validator > should be able to parse it as acceptable code, right? Yes. > But the question then becomes, will the standard transform be able to > handle the new code in your DTD? If it just ignores what it doesn't > understand, that would be acceptable, I'd think. But if the new tags > cause the transform to choke, then we'd have a problem. A standard TEI transform will simply ignore all tags he doesn't recognize. Just like a HTML browser does. -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Fri Oct 22 11:10:03 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Oct 22 11:10:05 2004 Subject: [gutvol-d] scanned title pages In-Reply-To: References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com> <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> Message-ID: <20041022181003.GB24510@pglaf.org> On Fri, Oct 22, 2004 at 12:00:34PM -0400, Scott Lawton wrote: > >And answering another request from another message: I have LOTS of scanned title pages. Where would you like them? > > If there is diskspace + bandwidth to host and serve them, I think it would be useful to post every title page somewhere. Or, at least to start with the top 100 or 1000 or some reasonable subset. > > For example, I recently did a massive review of the "catalog" data (posted to GUTCAT, alas with minimal response). With so many inconsistencies between various sources, I would like to be able to reference the original. I do have all of the title pages & verso pages submitted electronically. This is thousands and thousands of images. Our new copyright system makes it relatively easy to just find one online (though I have not made this feature available - but it's easy). I think this will be a good method for the future. The older system is not as easy, but I still have the images. Just email me if you need images for a particular item. 
If you'd rather, I could package up the older clearances (pre-August '04 or so) and get them to you. It's probably < 2GB total.

N.B., this stuff is not suitable for public redistribution with our eBooks. Many scans are not very high quality. Some are, and it would be fine with me to make them publicly available somewhere.

I don't have much opinion about including these with the eBooks themselves - that's something for the producer to decide. Most title & verso pages are pretty boring, though, so probably are not worth including as part of an eBook.

-- Greg

From joshua at hutchinson.net Fri Oct 22 11:23:58 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 11:24:26 2004
Subject: [gutvol-d] barriers to XML posting
Message-ID: <20041022182358.9FF3A4F4E0@ws6-5.us4.outblaze.com>

Excellent! Your answers are exactly what I was hoping they would be! (Does this mean I'm starting to get my brain wrapped around this stuff? ;) )

I think the next step(s) should be making some XML, running it through the transforms, critiquing the output and improving the transform/CSS until we have a workable process, yes?

I'll try to take a look at some of the XML you and Jeroen have done up to this point and see what the transform on the server does with them. We've already identified a need for a "standard" title page format, so that'll be my first area I'll look at.

Josh

From Bowerbird at aol.com Fri Oct 22 11:29:59 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Oct 22 11:30:30 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front of your eyes)
Message-ID: <12d.4debe45c.2eaaaba7@aol.com>

noring said:
> It is certainly possible to OCR directly to "XML",
> but it won't be very useful XML.

this is an important point here, folks, pay attention.

although lots and lots of the time, the x.m.l. advocates talk about x.m.l. as if it were uniformly high-quality -- which it needs to be to spit out all those conversions, or magically transform into a different breed of x.m.l., two qualities that are often discussed as "automatic" -- the fact of the matter is that x.m.l. markup can be awful, and serve no real purpose that is of use to us... you should look at the x.m.l. that some apps churn out.

so even after a decision is made as to which brand of t.e.i. to use (such as tei-lite), the real hard decisions about how to implement a markup strategy still exist. and after those are answered, the even more difficult decisions about how to actually implement the strategy will rear their ugly heads. it's gonna be a long road, folks.
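The validator exchange earlier in this thread (a date inside an address is rejected, while mislabelled address parts pass) can be sketched in a few lines. This is a toy stand-in for a DTD content model, not a real DTD validator, and the element names and the ALLOWED_CHILDREN table are assumptions made up for illustration; they are not taken from TEI or TEI-Lite.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each parent may contain.
# The element names are illustrative assumptions, not taken from TEI.
ALLOWED_CHILDREN = {
    "address": {"street", "city", "name"},
    "street": set(),
    "city": set(),
    "name": set(),
    "date": set(),
}


def structural_errors(xml_text):
    """Return a list of parent/child combinations the content model forbids."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


# A date inside an address: the structure itself is wrong, so it is caught.
DATE_IN_ADDRESS = "<address><date>01 Jan 2004</date></address>"

# Every tag is allowed where it appears, but the labels do not fit the text.
MISLABELLED = (
    "<address>"
    "<street>Chicago</street>"
    "<city>2830 North Clark</city>"
    "<name>Curl Up and Dye Beauty Salon</name>"
    "</address>"
)

print(structural_errors(DATE_IN_ADDRESS))
print(structural_errors(MISLABELLED))
```

The first call reports one error and the second reports none, even though the second document labels a city as a street. A check of this kind only knows which tags may nest inside which; whether the text inside a tag matches the tag's meaning still takes a human reader.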
> It is nigh impossible to train an OCR program, unless > we get breakthroughs with AI so we can build machines > with human intelligence, to unambiguously recognize > and markup the *structure* and *semantics* of > documents and textual content (such as using > the TEI vocabulary designed for this purpose.) if you're waiting for "artificial intelligence" to come through, you're gonna be waiting for a really long time. > Thus, there must be substantial human interaction to determine > what any chunk of text represents (structurally/semantically). that is the common understanding. it is also wrong. it might (or might not) be true of the _semantic_ nature of a "chunk". (but we can put that matter aside, because _that_ issue is gonna be difficult enough even when you have _humans_ work on it directly.) but it is _definitely_ not true for the _structural_ role of a chunk. in any book that was prepared by a professional typographer, _presentation_ *is* _structure_, because that is _exactly_ what a good typographer does, uses _presentation_ to show _structure_. that's why humans don't have trouble figuring out a book's structure. i don't blame people for telling you this. they don't know any better. but the mere fact that they don't know any better does _not_ mean that you have to believe what they tell you. because they are wrong. when they tell you the only way you can have that information is to have humans encode it in a complex markup system, don't believe it. they are wrong, and their mistake will waste _tons_ of your labor. their emperor is naked, and you must tell them they need to go away. you can get that information, easily. it's right in front of your eyes. of course, when they willy-nilly flatten the o.c.r. results of a book to plain text, they throw away most of that valuable information. but _even_then_, it's possible to ascertain most all of its structure. for an obvious example, people recognize headers because they are big and bold. 
strip away fontsize and styling, it gets more difficult. nonetheless, if you're smart, you can still locate headers accurately. you can even write computer routines that will do it for you. fast. i know, because i've written 'em. other people could write 'em too. i repeat: in a well-laid-out book, presentation _is_ structure. and that is the message i have been communicating here for a year. but nobody here seemed to want to believe it. your advance notice period has expired now, so i will go and tell the rest of the world... -bowerbird From hart at pglaf.org Fri Oct 22 12:00:58 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Oct 22 12:01:00 2004 Subject: [gutvol-d] re: coming to michael doorstep with hat in hand In-Reply-To: <4177A743.6020108@perathoner.de> References: <8b.17ed6e78.2ea83dc0@aol.com> <41770A4E.9030305@hutchinson.net> <4177A743.6020108@perathoner.de> Message-ID: I'm just going to as all parties concerned to dial it down a few notches. From hart at pglaf.org Fri Oct 22 12:08:27 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Oct 22 12:08:27 2004 Subject: [gutvol-d] Languages in PG In-Reply-To: <20041022180257.C0C05EDF79@ws6-1.us4.outblaze.com> References: <20041022180257.C0C05EDF79@ws6-1.us4.outblaze.com> Message-ID: On Fri, 22 Oct 2004, Joshua Hutchinson wrote: > That isn't PG, though, so it doesn't really apply to a discussion about the > numbers of languages we have support for. Just making sure people know there are over 100 languages available there for the taking when/if they want them. 
mh

From ke at gnu.franken.de Fri Oct 22 12:33:49 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Oct 22 13:30:56 2004
Subject: [gutvol-d] Re: Languages in PG
In-Reply-To: (Andrew Sly's message of "Wed, 20 Oct 2004 15:40:15 -0700 (PDT)")
References: <1e3.2c7b180e.2ea83cf0@aol.com>
Message-ID:

Andrew Sly writes:

> Also, the numbers below (taken from the catalog) show that,
> although PG's non-english content can certainly be expanded,
> it is not insignificant:

> French (367)
> German (307)
> Finnish (85)
> Chinese (69)
> Spanish (59)
> Italian (36)

Not too bad. German is "slow" because many good texts are available elsewhere. It starts with http://gutenberg.spiegel.de; continues with sites dedicated to special authors like Karl May, Arno Schmidt, Novalis, or Georg Simmel; and does not end with digitizing projects located at Universities (Göttingen, Trier, München, Bielefeld, Innsbruck). Especially the Austrian project (alo - austrian literature online: http://www.literature.at/) is very interesting even if they seem to offer only PDF "for free".

More German texts are tracked at http://www.litlinks.it

-- 
                              |      ,__o
                              |    _-\_<,
http://www.gnu.franken.de/ke/ |   (*)/'(*)

From joshua at hutchinson.net Fri Oct 22 13:38:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 22 13:38:50 2004
Subject: [gutvol-d] Re: Languages in PG
Message-ID: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com>

Interesting... Can any of those sites be raided for content to bolster our German titles? (I can't read German, so my checking directly wouldn't do me any good!)
Josh

From jon at noring.name Fri Oct 22 14:20:47 2004
From: jon at noring.name (Jon Noring)
Date: Fri Oct 22 14:21:53 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front of your eyes)
In-Reply-To: <12d.4debe45c.2eaaaba7@aol.com>
References: <12d.4debe45c.2eaaaba7@aol.com>
Message-ID: <84719031375.20041022152047@noring.name>

Bowerbird wrote:
> Jon Noring said:
>> It is certainly possible to OCR directly to "XML", but it won't be
>> very useful XML.

> although lots and lots of the time, the x.m.l. advocates
> talk about x.m.l.
as if it were uniformly high-quality -- > which it needs to be to spit out all those conversions, > or magically transform into a different breed of x.m.l., > two qualities that are often discussed as "automatic" > -- the fact of the matter is that x.m.l. markup can be > awful, and serve no real purpose that is of use to us... Definitely. The key is to use the right markup vocabulary and apply it consistently. Any system representing document structure (such as ZML) must be "right and sufficient" and be applied consistently. Those who understand and speak of XML, they know that XML is not in and of itself a specific markup vocabulary, it is a rule-set or framework on how to apply markup to textual content. There are an infinity of markup vocabularies, and a good markup vocabulary depends upon the purpose of the markup. XML is used for both database and publishing applications, and there are many extraordinarily successful applications of XML. One of the most recent applications of XML which a lot of people recognize and use is RSS, used for blog feeds and the like. XHTML is used a lot on the Internet, and is no more complex (in fact it is simpler in many ways) than legacy HTML. ZML is an example of a "regularized plain text" system to represent certain important textual document structures in a way which is fully machine-readable. I could easily create an XML-based markup vocabulary clone of the ZML system to represent the same identical structures. >> Thus, there must be substantial human interaction to determine >> what any chunk of text represents (structurally/semantically). > in any book that was prepared by a professional typographer, > _presentation_ *is* _structure_, because that is _exactly_ what > a good typographer does, uses _presentation_ to show _structure_. > that's why humans don't have trouble figuring out a book's structure. Definitely. 
But what we require is to be able to machine-read and machine-process the structure and semantics of a textual document. That humans can figure this out at a glance from a high-typographic-quality presentation does not automatically mean it is easy for machines to do likewise. It is also not easy to codify because visual presentation is "fuzzy" (pun not intended), sometimes relying on surrounding context to precisely define the document structure.

We have to remember that there are a lot of variances in conventions (both historically and geographically) used for typographic layouts to visually represent structure and semantics. Not only that, in some cases they don't even follow conventions, especially when there are oddities in the content where no convention has been firmly established. And as previously noted, sometimes the context must be factored in to fully ascertain structure and semantics.

The "Gedanken" test I use for the minimum requirements of machine-readable markup (or a system such as ZML) for textual documents is whether a text-to-speech engine is potentially capable of communicating the structure and semantics of the content to a blind listener (who is unfamiliar with any print conventions -- they've never heard the terms 'italic' or 'bold') so they can, in real-time (i.e., a one-time linear audio presentation), gain the same level of comprehension as a sighted person (familiar with typographic conventions) would in reading a high-quality print version of the text. Pass this test, and the markup will likely be pretty good for just about any purpose in addition to accessibility.

Is ZML or another type of "regularized plain text" (or the XML-based ZML markup vocabulary analog) sufficient to pass this test?

> when they tell you the only way you can have that information is to
> have humans encode it in a complex markup system, don't believe it.
The system only needs to be as complicated as needed to represent the needed document structures and content semantics in a machine-readable way such that it passes the test described above. The $64,000 question therefore is what structure and semantics needs to be represented in a machine-readable way, and to what degree of precision. Maybe ZML (and its markup analog) is sufficient, maybe it isn't. I interpret from those here who have first-hand experience handling large numbers of the various types of texts in Project Gutenberg, that ZML (or any other type of "regularized plain text" system) does not have sufficient granularity to pass the "test." Of course, we can argue whether the test as I describe above is too strict, or maybe not even on-target. But keep in mind this is what the *accessibility community* wants in machine-readable textual documents, and what they are working towards in their activities -- they've wholeheartedly embraced XML-based approaches, for example. To wave one's hand in dismissal and say they are being unrealistic or stupid, or that they don't really matter in our decision-making, is a pretty bigoted and "blind" position (pun intended) to take -- it is also stupid since meeting their needs for structure and semantics has many other benefits as well. I might ask a few text-to-speech experts I know at DAISY to look at the ZML system and tell me if it has sufficient structural granularity for high-quality text-to-speech purposes. As far as I am concerned, if they come back and say "no it doesn't", then I would recommend that PG should not consider ZML for its Master format, but maybe consider ZML for its plain text output versions. > for an obvious example, people recognize headers because they are > big and bold. strip away fontsize and styling, it gets more > difficult. nonetheless, if you're smart, you can still locate > headers accurately. you can even write computer routines that will > do it for you. fast. 
i know, because i've written 'em. other
> people could write 'em too.

Bold lines which appear by themselves in the flow of text are sometimes used for structures other than headers. There are many other similar weirdities involved with italicized text, indented text, etc., that we see in visual layouts of texts. Context often has to be considered to discern unambiguously what structure a visual cue represents. For example, one convention often used is that the names of ships are italicized. Thus, if a machine is to tell the name of a ship apart from linguistically emphasized text, it has to look at the context.

> i repeat: in a well-laid-out book, presentation _is_ structure.

No, I'd say it is more accurate to say "for reading by eyesight, structure is represented by visual presentation cues." Remember, there are different types of presentation of text, not only visual. To focus on visual as the only form of presentation that matters is being very short-sighted (pun intended.)

The only time we must give up and focus only on the visual is when visual presentation is an important and integral part of the content itself, such as "poetry as art" and similar avant-garde things. (Here SVG is of especial appeal, so we have an XML-based solution for this as well.)

> and that is the message i have been communicating here for a year.
> but nobody here seemed to want to believe it. your advance notice
> period has expired now, so i will go and tell the rest of the world...

And I've stated the core question to answer is: "Is ZML (or any other system of regularized plain text) sufficient to represent document structure and semantics for Project Gutenberg Master texts?" I assume Bowerbird is saying "yes", and many others here are saying "No". I answer the question with a "No".
Amusingly, Networker, a very insightful ebook expert who often posts to The eBook Community, calls ZML a type of ITF, "Impoverished Text Format", to indicate ZML has insufficient granularity -- it is "impoverished". Jon Noring From jtinsley at pobox.com Fri Oct 22 16:51:23 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Oct 22 16:51:58 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <41790DFE.3070606@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> <20041020173528.GB3366@panix.com> <4176BDEF.7050008@perathoner.de> <20041020205934.GA22445@panix.com> <4177901F.7010006@perathoner.de> <20041021150227.GA17442@pglaf.org> <4177E9CB.7080200@perathoner.de> <20041022012030.GA23907@panix.com> <41790DFE.3070606@perathoner.de> Message-ID: <20041022235123.GD27926@panix.com> On Fri, Oct 22, 2004 at 03:41:18PM +0200, Marcello Perathoner wrote: >Jim Tinsley wrote: > >>>I feel Jim is raising artificial objections he knows we cannot overcome. >>>If he doesn't want to learn TEI and he doesn't feel like proofing a TEI >>>text in emacs, fine. But then, he should step aside and let other people >>>do this work. >> >>I find this very offensive. >> >>I came home, and was reading happily enough through the threads until >>this. > >I am sorry if I spoilt your evening and I apologize for that. You didn't spoil my evening; just my participation in the thread. Having been so accused of evilly blocking the righteous progress of destiny because of my own hidden agenda and neuroses, it's hard to say constructive things. But I will say one more thing: if you read what I actually _said_ you will realize that almost everyone in this thread -- I would think -- could fairly easily create an XML and transform that meets the criteria I laid down. I could myself. So could you, or Jeroen, for sure. Josh and Jon, no problem. Anyone I've left out? Of course, none of us could do it for ALL texts. Not yet. 
But it doesn't need to be done for all texts; that was explicitly stated. If somebody wants to set up a standard that works for prose texts containing Title, Author, Chapter Heads, Paragraphs, Verses, Letter Headings and Signatures -- plus emphasis and languages, and try to work with that for a while, that would do. And which of us could NOT do that with just Xalan or Saxon, a simple XSLT, and quite a limited HTML-to-text converter? Of course, it wouldn't handle Alice. It wouldn't handle footnotes or tables. But for books that don't need these features it would work fine. There would be some details to work out in how the PG header works with them, and maybe the XML file itself should contain a description of how the HTML and text formats were derived, so that when we fixed the texts we would know how to remake them, or that some future reader could re-do the transform to their own tastes. And it would be good if, having got all that straight, we could set it up and document it as a standard so that other people wouldn't need to reinvent that wheel. It may be limited, but nobody said that we have to have a standard to cover all cases before tackling any. And then the people who are interested could go on to add more features, enlarging the standard. I'm surprised, after last year, that nobody has done this already. I'm surprised that you and Jeroen, who, in your different ways, had the best shot at XML didn't get together on it. Certainly, Greg has been asking you both about it. It _would_ be nice if we had a few people working together on it, so we get a shared understanding and consensus. Frankly, what is going to happen is that a few people at DP are going to forge a workable standard between them. Others will take it up, and then everyone will be doing it, so personally I'm just waiting for it to happen. 
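As a rough idea of how small such a prose-only starting point could be, here is a minimal sketch in Python rather than XSLT. The tag names (book, title, author, chapter, head, p, emph) and the HTML_TAGS mapping are assumptions invented for this illustration; they are not TEI and not an agreed PG standard. Tags the mapping does not know keep their text but get no markup, which is roughly what the built-in rules of an XSLT processor do with unmatched elements.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical tag-to-HTML mapping for a tiny prose-only vocabulary.
# These names are assumptions for illustration, not TEI or a PG standard.
HTML_TAGS = {
    "title": "h1",
    "author": "h2",
    "head": "h3",
    "p": "p",
    "emph": "em",
}


def to_html(elem):
    """Render elem recursively; unrecognized tags keep their text but get no markup."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))
    inner = "".join(parts)
    html_tag = HTML_TAGS.get(elem.tag)
    return f"<{html_tag}>{inner}</{html_tag}>" if html_tag else inner


SAMPLE = (
    "<book><title>A Tale</title><author>A. Writer</author>"
    "<chapter><head>Chapter I</head>"
    "<p>It was a <emph>dark</emph> night, said <speaker>Tom</speaker>.</p>"
    "</chapter></book>"
)

print(to_html(ET.fromstring(SAMPLE)))
```

The speaker tag in the sample is not in the mapping, so its text comes through as plain words inside the paragraph. Footnotes, tables and verse layout are deliberately out of scope here, matching the limited feature list above.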
jim From Bowerbird at aol.com Fri Oct 22 17:39:57 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 22 17:40:36 2004 Subject: [gutvol-d] presentation *is* structure (it's right in front of your eyes) Message-ID: thank you jon, for weighing in... *** jon said: > Those who understand and speak of XML, they know that > XML is not in and of itself a specific markup vocabulary, those who _know_ x.m.l. do know that, right. but some of the people who _speak_ of x.m.l. do _not_ seem to know it, and they gloss over all the difficulties without much comprehension. they think as long as something is "in x.m.l.", it's gonna have all these magical properties, when the truth of the matter is that you must put a lot of sweat into it to get most of them, sometimes more sweat than they're even worth. > there are many extraordinarily successful applications of XML. > One of the most recent applications of XML which a lot of people > recognize and use is RSS, used for blog feeds and the like. on that you are correct. any time that you want to exchange data between incongruent applications, x.m.l. _can_ be a good solution. (it's not _necessarily_ good, a lot of complications can occur that mess things up regardless, but the _potential_ is certainly there.) but even on this "successful" use in the case of r.s.s. and blog feeds, there is -- as i am sure you know -- a great deal of "controversy" concerning whether r.s.s. is the best way of doing it, or "atom" is... and there are additional controversies about _which_ r.s.s. version is the _best_ one. and even when all those things get sorted out, what bloggers might find is they have simply reinvented the wheel previously known as an announcement listserve, where a missive is sent out to a group of subscribers and simultaneously added to a cumulative website, in which case a whole lot of work was done for no real good reason. but hey, as long as everyone had fun along the way, i guess that's ok. 
> ZML is an example of a "regularized plain text" system > to represent certain important textual document structures > in a way which is fully machine-readable. I could easily create > an XML-based markup vocabulary clone of the ZML system > to represent the same identical structures. you say that often. but you've never really told us what the point is. even if it's possible to represent a simple system in a complex one, nothing is gained. you've only lost the benefit you had of simplicity. and indeed, that's my essence: use the most simple system possible. > Definitely. But what we require is to be able to machine-read and > machine-process the structure and semantics of a textual document. right, and my "machine" (i.e., app) can read and process the structure. (and we really need to handle "structure" and "semantics" separately, because semantics is a _lot_ more complex, and much too thorny to just toss off so casually. but i'll have more to say on that later...) > Even if humans can figure this out by a simple visual glance of > the content in a high-typographic-quality presentation, does > not automatically mean it is easy for machines to do likewise. let's put aside the question of how "easy" it is for a machine to do it. what i have said here, and will say elsewhere, is my routines _can_. and when i release the proof, other people will know that it's possible, and they'll then be able to write their own routines that can do it too. then everyone will wonder why they thought it was so difficult before. > It is also not easy to codify because visual presentation is "fuzzy" > (pun not intended), sometimes relying on surrounding context > to precisely define the document structure. well, you can go on and on about all the reasons why it is difficult. but once people are doing it, routinely, those "reasons" won't matter. 
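For readers wondering what a layout-only routine of this kind might look like, here is a small sketch. It is not the routines mentioned in this thread, which have not been published here; the rules (blank lines on both sides, short length, no closing punctuation, all capitals or a word like "chapter") are assumptions chosen for illustration, and the names looks_like_header and find_headers are hypothetical.

```python
import re

# Assumed cue: a heading may start with one of these words plus a number or name.
CHAPTER_WORD = re.compile(r"^(chapter|book|part)\s+\S+", re.IGNORECASE)


def looks_like_header(line, prev_blank, next_blank):
    """Guess whether a plain-text line is a heading, using layout cues only."""
    text = line.strip()
    if not text or not (prev_blank and next_blank):
        return False
    if len(text) > 60 or text.endswith((".", ",", ";", ":")):
        return False
    letters = [c for c in text if c.isalpha()]
    all_caps = bool(letters) and all(c.isupper() for c in letters)
    return all_caps or bool(CHAPTER_WORD.match(text))


def find_headers(text):
    """Return the lines of text that the heuristic flags as headings."""
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i == len(lines) - 1 or not lines[i + 1].strip()
        if looks_like_header(line, prev_blank, next_blank):
            found.append(line.strip())
    return found


SAMPLE = """THE VOYAGE

Chapter 1

We sailed at dawn.
The wind was fair.

HELP

A short cry, and then silence.
"""

print(find_headers(SAMPLE))
```

On this sample the heuristic returns THE VOYAGE, Chapter 1 and HELP. The first two are real headings; HELP is a shouted word set on its own line, so it is a false positive. That is the kind of case where surrounding context, or a human, has to settle what the layout means.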
> We have to remember that there are a lot of variances in conventions > (both historically and geographically) used for typographic layouts > to visually represent structure and semantics. so someone will modify their routines to work with those conventions. > Not only that, in some cases they don't even follow conventions, > especially when there are oddities in the content where > no convention has been firmly established. "oddities" are only "oddities" until someone figures out their pattern. because if there is no pattern, then nobody understood the structure in the first place, so there's no way to mark it up using _any_ system. > And as previously noted, sometimes the context must be > factored in to fully ascertain structure and semantics. ok, _now_ you're finally getting into the "semantic" part. if the only way you can understand how to mark up the text is to actually _understand_ the content, that is _semantic_. and yes, you need a high level of "intelligence" -- either human or artificial, and the artificial kind ain't here yet -- to do that markup, which means that you need humans to do it, and that's why it's costly. and even if you've got a lot of volunteer labor to throw at the task, it might not be enough, because this job is also _complex_ to boot. so you can't just use any volunteers, they have to be highly skilled. and to top it all off, it's time-consuming, so it's even more costly. that's why there are very high costs to doing semantic markup, much higher than the costs of (even manual) structural markup. and you know what the real kicker is? even though the _costs_ are sky-high, the _benefits_ of semantic markup ain't that great. certainly not from the standpoint of the average reader, anyway. (some scholars might make out, if you coded what they want.) hey, it's great that the machine can now tell you with certainty that the reason "new york times" has been rendered in italics is because it's a newspaper. but the reader _already_knew_that_. 
the writer made it clear in the course of setting the context. i will get to more examples down below, but you get the drift... > The "Gedanken" test I use for the minimum requirements > of machine-readable markup (or system such as ZML) > for textual documents is if a text-to-speech engine > is potentially capable of communicating the > structure and semantics of the content to a blind listener > (who is unfamiliar with any print conventions -- > they've never heard the terms 'italic' or 'bold') i doubt you'd find a blind person who's never heard those terms. but go on... > so they can, in real-time (i.e., a one-time linear audio presentation), > gain the same level of comprehension as a sighted person > (familar with typographic conventions) would in reading > a high-quality print version of the text. Pass this test, and > the markup will likely be pretty good for just about any purpose > in addition to accessibility. not only will a text-to-speech engine be "potentially capable" of communicating the content to a blind person, i actually intend to build such an engine right into my viewer-program. whether or not it delivers the _semantics_ of the content is wholly dependent on whether you put that information _into_ the file in the first place. and -- of course -- that's true of _any_ markup system. but z.m.l. will have a way to put it in, yes, and if you do, then there'll be a way to get it out as well. you'll have to specify exactly _how_ the text-to-speech engine should vocalize this info. but any way you can do it, i can too. > Is ZML or other type of "regularized plain text" > (or the XML-based ZML markup vocabulary analog) > sufficient to pass this test? yes. that's what i've been saying all along. that's what the test-suite is all about, baby. > The system only needs to be as complicated as needed to > represent the needed document structures and content semantics in > a machine-readable way such that it passes the test described above. 
if you can do it, i can too. > The $64,000 question therefore is > what structure and semantics needs to > be represented in a machine-readable way, > and to what degree of precision. different people will require different degrees of "precision". my target-population is the one michael has always targeted. > Maybe ZML (and its markup analog) is sufficient, maybe it isn't. of course, we can say that about any system, can't we... ;+) > I interpret from those here who have > first-hand experience handling large numbers of > the various types of texts in Project Gutenberg, > that ZML (or any other type of "regularized plain text" system) > does not have sufficient granularity to pass the "test." well, that's how i read the feelings of everyone here who has chimed in so far on the matter, except myself and maybe a couple of other people in varying degrees. but i note once again, for the record, that no one has yet given me a list of "hard e-texts" that they think might give my z.m.l. a run for its money on difficulty. so we really don't have an answer to that yet, do we? > Of course, we can argue whether the test > as I describe above is too strict, or maybe not even on-target. well, my primary aim is sighted people, so your test is not "on-target", but that's ok, i understand what your point is. i should note, however, that blind people seem to me to be the most delighted group of users that project gutenberg has, and are probably the people _most_ appreciative of plain text. all this in spite of the fact that there is _no_ semantic markup -- and very little structural markup either -- in the e-texts. no, it appears the magic formula for _that_ has been simple -- get everything else _out_of_the_way_ of the words themselves. i will let you think about that... 
> But keep in mind this is what the *accessibility community* > wants in machine-readable textual documents, and > what they are working towards in their activities -- they've > wholeheartedly embraced XML-based approaches, for example. they've been misled to believe the promises just like everyone else. > To wave one's hand in dismissal it is dishonest to try to imply i am "waving my hand in dismissal". please don't do that. > and say they are being unrealistic or stupid, i, of course, have never said anything like that. don't say that i have. please don't do that. > or that they don't really matter in our decision-making, it is unseemly of you to put those kind of words in _my_ mouth. please don't do that. > is a pretty bigoted and "blind" position (pun intended) to take which is what makes it so distasteful. so just stop it. please don't do that. > -- it is also stupid since meeting their needs for > structure and semantics has many other benefits as well. enough, jon. please don't do that. > I might ask a few text-to-speech experts I know at DAISY > to look at the ZML system and tell me if it has > sufficient structural granularity for > high-quality text-to-speech purposes. the judgement of bureaucrats doesn't impress me. i'll listen to the reports of blind users themselves. > As far as I am concerned, if they come back and say > "no it doesn't", then I would recommend that > PG should not consider ZML for its Master format i'm not seeking your endorsement, jon, so please feel free to make any recommendation to project gutenberg that you want concerning what they should consider for their master format. > but maybe consider ZML for its plain text output versions. whatever. > Bold lines which appear by themselves in the flow of text > are sometimes used for structures other than headers. my routines are not so brain-dead as to be confused by that. but thanks for enlightening me. 
> There are many other similar weirdities involved with > italicized text, indented text, etc., that we see in > visual layouts of texts. please do let me know about any mistakes that my routines make on any e-text in the library if you review my program, as i am sure there are "weirdities" i've not yet come across. > Context is often important to consider to > unambiguously discern structure for a visual cue. > For example, one convention often used is that > the names of ships is to be italicized. Thus, > if a machine is to discern the name of a ship from > linguistically emphasized text, it has to look at the context. that's a very good example, jon, so i'll discuss it a bit. my approach is to have the o.c.r. program _retain_text_styling_. so if the ship-name was italicized in the original book, it would continue to be italicized in the o.c.r. text (assuming recognition), and that would carry through all the editing to the final version. unless the person creating the digital version were to indicate that those italics represented a ship-name, they would remain as simple italics, and an end-user would be on her own to know why. _just_like_she's_on_her_own_when_she_reads_a_paper-book_. you might consider it to be some huge problem that the reader doesn't know _exactly_why_ something is being italicized, but i don't think it is, because they virtually always figure it out... even a blind reader can figure it out. heck, even in the e-texts with the italics stripped out, the blind reader can figure it out. if you asked any of those readers -- sighted or blind -- how much money they would pay to have that information supplied, to assess how much _value_ they place on it, they would laugh in your face. and that's _all_ you need to know about _that_ cost-benefit ratio. in the _rare_ case where that information _might_ be valuable, i have ways to mark it. and as soon as you show me those cases, and show me exactly how your x.m.l. 
markup provides a solution, i will be quite happy to show you exactly how i would do it too. > No, I'd say it is more accurate to say "for reading by eyesight, > structure is represented by visual presentation cues." you're talking more about _output_ here. whereas i am talking about _input_ instead. i'm talking about how to examine the p-book -- specifically, the o.c.r. that results from it -- to automatically determine the structure of the text. that structured text can then be rendered visually (on-screen or paper) or via text-to-speech. when i talk about "presentation", i'm talking about the p-book that we work with as our original source. however, in an aside, i've never even heard this _discussed_ yet, not here or anywhere else for that matter, but the time has come where we can expect to start seeing (or should i say "hearing") books that have been "input" using voice-recognition technology. in other words, the age of scanning might come to an abrupt end, or taper off significantly, when people start creating e-books by reading a book aloud into a voice-recognition system. they are remarkably improved these days, according to everything i read, plus their cost might fall _considerably_ in the near future too, and the number of people who might be willing to "enter" a book in this manner is probably far greater than those willing to scan. of course, it will take a new kind of software program to "fix" the transcription errors that will occur using this input method, but maybe that's already a part of these systems, i don't know... not making any predictions here, just keeping my eye open for it. what this might mean for blind people, i don't even have to say... > Remember, there are different types of presentation of text, > not only visual. the mac has had text-to-speech for well over a decade now, jon, right in the system. i've already put it in some of my e-book apps. 
> To focus on visual as the only form of presentation that matters > is being very short-sighted (pun intended.) good pun, if there can be said to be such a thing... ;+) but making the point to me is totally unnecessary. > And I've stated the core question to answer is: > "Is ZML (or any other system of regularized plain text) > sufficient to represent document structure and semantics > for Project Gutenberg Master texts?" that _is_ the right question. > I assume Bowerbird is saying "yes" there's no reason to "assume" that i am saying "yes". i've actually _said_it_, over and over and over again. and built a test-suite to prove it. > and many others here are saying "No". well, most everyone who has spoken up has said "no". (dale and maybe james have given a limp "perhaps".) and there might be some lurkers who i have convinced. but by and large, all the loudmouths have loudly said "no". > I answer the question with a "No". well, thanks for putting yourself firmly on the record jon. again. > Amusingly, Networker, a very insightful ebook expert who > often posts to The eBook Community, calls ZML a type of ITF, > "Impoverished Text Format", to indicate ZML has > insufficient granularity -- it is "impoverished". well, heck, jon, if the only thing i'd ever heard about z.m.l. was the one-sided "descriptions" you've given it over there, i would think that it sounded like a ludicrous idea too. networker will come around when he sees the real thing. everyone will. after all, the proof _is_ in the pudding... -bowerbird From hart at pglaf.org Fri Oct 22 17:49:02 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Oct 22 17:49:05 2004 Subject: [gutvol-d] Re: Languages in PG In-Reply-To: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com> References: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com> Message-ID: On Fri, 22 Oct 2004, Joshua Hutchinson wrote: > Interesting... Can any of those sites be raided for content to bolster our > German titles? 
(I can't read German, so my checking directly wouldn't do me any good!)
>
> Josh

I forwarded this directly to our German Team leader to check for us.

Also, anyone interested might also want to take a look at Gunther Hille's
for the Gutenberg Projekt-DE

mh

[DE is the German abbr. for Germany]

> ----- Original Message -----
> From: Karl Eichwalder
> To: gutvol-d@lists.pglaf.org
> Subject: [gutvol-d] Re: Languages in PG
> Date: Fri, 22 Oct 2004 21:33:49 +0200
>
>> Andrew Sly writes:
>>
>>> Also, the numbers below (taken from the catalog) show that,
>>> although PG's non-english content can certainly be expanded,
>>> it is not insignificant:
>>> French (367)
>>> German (307)
>>> Finnish (85)
>>> Chinese (69)
>>> Spanish (59)
>>> Italian (36)
>>
>> Not too bad. German is "slow" because many good texts are available
>> elsewhere. It starts with http://gutenberg.spiegel.de; continues with
>> sites dedicated to special authors like Karl May, Arno Schmidt, Novalis,
>> or Georg Simmel; and does not end with digitizing projects located at
>> universities (Göttingen, Trier, München, Bielefeld, Innsbruck).
>> Especially the Austrian project (alo - austrian literature online:
>> http://www.literature.at/) is very interesting even if it seems to offer
>> only PDF "for free".
>>
>> More German texts are tracked at http://www.litlinks.it

From ke at gnu.franken.de Fri Oct 22 19:51:52 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Fri Oct 22 20:31:03 2004
Subject: [gutvol-d] Re: Languages in PG
In-Reply-To: (Michael Hart's message of "Fri, 22 Oct 2004 17:49:02 -0700 (PDT)")
References: <20041022203821.40B09109AB9@ws6-4.us4.outblaze.com>
Message-ID:

Michael Hart writes:

> On Fri, 22 Oct 2004, Joshua Hutchinson wrote:
>
>> Interesting... Can any of those sites be raided for content to bolster
>> our German titles?

At least, you can use their texts for comparison purposes. Some of them are "hidden" behind web interfaces (frames/javascript) and highly fragmented...

> I forwarded this directly to our German Team leader to check for us.
>
> Also, anyone interested might also want to take a look at Gunther
> Hille's for the Gutenberg Projekt-DE

Gutenberg-DE is now to be found at http://gutenberg.spiegel.de.

--
http://www.gnu.franken.de/ke/

From cweyant at twcny.rr.com Sat Oct 23 05:33:36 2004
From: cweyant at twcny.rr.com (Curtis A. Weyant)
Date: Sat Oct 23 05:29:41 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417930B5.5040907@perathoner.de>
References: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com> <417930B5.5040907@perathoner.de>
Message-ID: <417A4FA0.3080206@twcny.rr.com>

Marcello Perathoner wrote:

> No. All you can define inside an XML file is the DTD (or other schema)
> you want to use and entities like &myentity;

That's not true.
You can define a full DTD (or a subset of one) within the XML document itself if you want. The W3C gives the following example in the XML 1.0 (3rd ed.) spec:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>

This is a fully valid and well-formed XML file with the DTD defined in the DOCTYPE header instead of in a separate DTD file.

Of course, while you _can_ do that, it's probably not the best way.

Curtis.

From marcello at perathoner.de Sat Oct 23 05:44:02 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 05:44:47 2004
Subject: [gutvol-d] presentation *is* structure (it's right in front of your eyes)
In-Reply-To:
References:
Message-ID: <417A5212.4080700@perathoner.de>

Bowerbird@aol.com wrote:

> and indeed, that's my essence: use the most simple system possible.

Use the most simple tool that does the job, but don't use a simpler one.

If you had done any research on ebooks before self-proclaiming yourself demi-god, you might have noticed that your toy markup language is woefully underpowered.

You don't even handle the very first page of Sherlock Holmes. Mark this up in ZML. Note that "Being a reprint" is a subtitle to "Part I" and not a paragraph. Same goes for "Mr. Sherlock Holmes". Note also that "John H. Watson" is emphasized, although it's the only part of the title that's not italic.

---
PART I.

_Being a reprint from the reminiscences of_
JOHN H. WATSON, M.D.,
_late of the Army Medical Department._

CHAPTER I.

MR. SHERLOCK HOLMES.

IN the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army. ...
---

> but i note once again, for the record, that no one has
> yet given me a list of "hard e-texts" that they think
> might give my z.m.l. a run for its money on difficulty.
> so we really don't have an answer to that yet, do we?

How about doing your homework yourself? The world at large was not created to do your bidding.
Go, find a slew of difficult texts, mark them up, fix your program and show us what you can do. But, please, stop whining about us not doing your work.

> of course, it will take a new kind of software program to "fix"
> the transcription errors that will occur using this input method,
> but maybe that's already a part of these systems, i don't know...

Again, researching your stuff before starting a colossal handwave is out of the question.

--
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de Sat Oct 23 06:01:12 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 06:01:57 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417A4FA0.3080206@twcny.rr.com>
References: <20041022142122.16EF9109A3E@ws6-4.us4.outblaze.com> <417930B5.5040907@perathoner.de> <417A4FA0.3080206@twcny.rr.com>
Message-ID: <417A5618.5030809@perathoner.de>

Curtis A. Weyant wrote:

> That's not true. You can define a full DTD (or a subset of one) within
> the XML document itself if you want. The W3C gives the following example
> in the XML 1.0 (3rd ed.) spec:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <!DOCTYPE greeting [
>   <!ELEMENT greeting (#PCDATA)>
> ]>
> <greeting>Hello, world!</greeting>

You are right. This way you could introduce some personal tags into a document and slip them past the validator.

> Of course, while you _can_ do that, it's probably not the best way.

It could become difficult to track which translator goes with which files. It is easier if you just reference one out of a known set of DTDs.

--
Marcello Perathoner
webmaster@gutenberg.org

From jlinden at projectgutenberg.ca Sat Oct 23 11:13:15 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Sat Oct 23 11:19:07 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <4177E9CB.7080200@perathoner.de>
Message-ID:

Two points to mention before you read my actual reply:

1) My apologies for the delay in responding.
A configuration issue has caused all my email to bounce from pglaf.org's mail server for the past couple weeks, and only yesterday was I able to get it rectified. I will be posting many delayed replies for the next few days.

2) While the original email that I'm replying to here was written by Marcello, my replies are not directed at him personally, but rather, at everyone in the community.

----------------------------

Of the many assumptions being made on the list these days, here are three of the most erroneous -- and they all came in a single paragraph.

> We don't need more discussion about whether TEI is the right language, I
> think we are all agreed on that.

Not everyone agrees with the use of TEI. I'm not even going to begin my arguments again -- there would be no point. Some of us, such as myself, foresee issues, based on our own experiences, with using TEI (or a variant) for PG work. I'm resigned to the fact that it may be the only way to get XML into PG at all, so I'll just deal with the issues on my own time at that point.

Simply put, TEI is one of the most verbose markup vocabularies available, and using it for PG is going to turn off a LOT of people to XML. A simpler, more concise vocabulary would be less intimidating!

> pgxml.org is dead

PGXML is not dead at all. Just like other XML stuff in PG, it's been on hiatus, mostly for two reasons: 1) lack of agreement in the community, and 2) lack of personal time to work on it. I will be meeting with the other co-founder of pgxml.org (Ben Crowder) in November, after which time, we hope to present a definitive plan for pgxml.org to the community.

> and ZML is good for laughs.

You can laugh at ZML all you want, but from the examples and personal discussion with Bowerbird, I have learned that ZML is not at all what most people think it is. From the examples that I have seen, ZML is basically PG vanilla text format, but cleaned up and normalized.
----------------------------

The entire rest of this email is a rant, so please feel free to skip it. You have been given the choice!

----------------------------

Maybe if you learned to listen to other people, you'd not make such erroneous assumptions. Maybe, just maybe, other people do have a clue, and you aren't the only one that knows something.

Yes, I have blocked Bowerbird from joining the PGXML list, but he is the _only_ person that I've blocked. The only reason for this is because of other people's reactions to him, not because of Bowerbird himself. While he can be irritating and annoying, Bowerbird does have a clue about some of the issues we have to deal with in PG work, particularly in converting to other formats, etc.

If you don't like his attitude, ignore his posts. You can at least try to extract the useful information that he does give from the flame wars it often comes in. This way, you might actually LEARN something.

More mud is slung on this list than on ANY other list that I'm subscribed to, but I have to admit, I'm only subscribed to about 200 active lists, so I may be missing the mud-slinging ones.

If you don't like the way something is being done in PG, don't throw a hissy-fit. Get off your arse and do something about it, or sit down and shut up, and let other people do what they think should be done.

I've made no secret of my personal opinions of PG:

1) the website is a disgrace
2) the archive is poorly organized
3) the catalog system is a hack job done by unqualified people
4) the PG text format is extremely disgusting
5) PG makes volunteers work uphill to get anything done
6) the lack of quality in our content offsets any gain from it

Just because I have the opinions doesn't automatically mean I A) know what I'm talking about, or B) have an alternative solution.
Some of the biggest technological innovations came not from people who had a better idea, but from people who knew the current idea wasn't that good, and were open-minded when the better idea came along.

As a whole, I find that PG is not a very open-minded community.

As a community, we regularly discourage people from volunteering, mostly because we don't support them well. At times, not only do we not support them, but we actively, and publicly, bash their skulls.

We lie to the general public about PG on a regular basis.

When posting ebooks, we ignore the wishes of the volunteers who made the texts.

We don't even provide well-suited tools for the volunteers to use to improve PG, because, oh my god, maybe the tool isn't 100% open-source! Maybe the tool has been offered to PG on a perpetual right to use for PG status, but oh, lordy, that's just not good enough.

We regularly tell some of our hardest working volunteers that they are full of crap (more or less). True, it's usually a paragraph-long description of what they are doing "wrong", but it's basically telling them their work is unwanted.

Do you realize that out of over 300 librarians I've talked to personally, the first thing that came to mind for "ebooks" was the University of Virginia's eText Library? PG simply is not a mover or a shaker in the ebook world, regardless of the hocus pocus you might hear to the contrary.

----------------------------

And, to all these things I've said, I'm a victim of some, but a perpetrator of others. I'm just as guilty as you are. I have no soapbox, only a conscience and a desire to change things.

----------------------------

So, dangit all to heck, we don't have to be like this. PG _could_ be great. PG SHOULD be great.
-- James From marcello at perathoner.de Sat Oct 23 12:39:10 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 23 12:40:02 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: References: Message-ID: <417AB35E.6010202@perathoner.de> James Linden wrote: >>pgxml.org is dead > > PGXML is not dead at all. I stand corrected. Sometime ago I noticed the domain was expired, but somebody re-registered it just yesterday. But I gather it wasn't you. Who is this ? > $ whois pgxml.org Domain ID:D105038884-LROR Domain Name:PGXML.ORG Created On:22-Oct-2004 20:47:27 UTC Last Updated On:22-Oct-2004 20:47:48 UTC Expiration Date:22-Oct-2005 20:47:27 UTC Sponsoring Registrar:Go Daddy Software, Inc. (R91-LROR) Status:TRANSFER PROHIBITED Registrant ID:GODA-08593143 Registrant Name:Registration Private Registrant Organization:Domains by Proxy, Inc. Registrant Street1:15111 N Hayden Rd., Suite 160 Registrant Street2:PMB353 Registrant Street3: Registrant City:Scottsdale Registrant State/Province:Arizona Registrant Postal Code:85260 Registrant Country:US Registrant Phone:+1.4806242599 Registrant Phone Ext.: Registrant FAX: Registrant FAX Ext.: Registrant Email:PGXML.ORG@domainsbyproxy.com ... -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Sat Oct 23 12:56:32 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Oct 23 12:57:26 2004 Subject: [gutvol-d] re: the king of vaporware Message-ID: <1e.3685a94c.2eac1170@aol.com> james said: > More mud is slung on this list than on ANY other list > that I'm subscribed to, but I have to admit, > I'm only subscribed to about 200 active lists, > so I may be missing the mud-slinging ones. yeah, it's tough to keep up. :+) and those are just the listserves. don't forget the forums... like the one over at distributed proofreaders... 
why just the other day (maybe yesterday?), two minutes after you posted a message where you mentioned pgxml.org, someone -- who must not know that kodekrash is you -- said, "last i heard, pgxml was being run by james linden, the king of vaporware..." i guess he didn't know that's supposed to be _my_ title... :+) anyway, like i said, it's tough to keep up... ;+) -bowerbird From joshua at hutchinson.net Sat Oct 23 13:04:54 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Oct 23 13:05:19 2004 Subject: [gutvol-d] re: the king of vaporware In-Reply-To: <1e.3685a94c.2eac1170@aol.com> References: <1e.3685a94c.2eac1170@aol.com> Message-ID: <417AB966.8010204@hutchinson.net> Bowerbird@aol.com wrote: >said, "last i heard, pgxml was being run >by james linden, the king of vaporware..." > >i guess he didn't know that's supposed to be >_my_ title... :+) > > Nah... You're the clown prince of vaporware. The king position is still open. While I haven't seen much beyond talk from James, he is usually very informed and reasonable. I don't always agree with him, but I respect *his* opinions. Josh From joshua at hutchinson.net Sat Oct 23 13:09:40 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Oct 23 13:10:03 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: References: Message-ID: <417ABA84.80705@hutchinson.net> James Linden wrote: There are two things I wanted to say to this: One was that other than bowerbird, the rest of the discussions, while sometimes passionate, are not typically mean spirited. We just have people who feel passionate about their particular vision of what PG should be (as it looks like you feel as well, from your post). In the end, most of the people who write the most respect each other and each other's opinions. And discussion of the topics at hand is the only method we have of moving forward. 
The second is that while a more simple XML markup, like what you loosely described as PGXML, sounds wonderful on the surface ... it requires, once again, largely reinventing the wheel AND not being compatible with a standard that seems to be gaining momentum out there. Granted, the very nature of XML makes converting from a home-grown markup to TEI a possibility, but removing the need to convert would seem to be the wiser path.

Josh

From marcello at perathoner.de Sat Oct 23 13:29:10 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat Oct 23 13:30:00 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To:
References:
Message-ID: <417ABF16.40203@perathoner.de>

James Linden wrote:

> You can laugh at ZML all you want, but from the examples and personal
> discussion with Bowerbird, I have learned that ZML is not at all what most
> people think it is. From the examples that I have seen, ZML is basically PG
> vanilla text format, but cleaned up and normalized.

Because Bowerbird is more enamoured of hearing himself talk than of doing any research whatsoever, ZML doesn't address the simplest issues of text markup.

Bowerbird just somehow found the DP rules for formatting proofed texts and amplified them with some rather sub-optimal ad hoc extensions, like the use of tabs for marking centered text etc.

Altogether ZML is not much better (more likely worse) than what DP is outputting right now.

> Yes, I have blocked Bowerbird from joining the PGXML list, but he is the
> _only_ person that I've blocked. The only reason for this is because of
> other people's reactions to him, not because of Bowerbird himself.

That is an original way of seeing things ...

> I've made no secret of my personal opinions of PG:
>
> 1) the website is a disgrace
> 3) the catalog system is a hack job done by unqualified people

That's what I've fixed in the last year.

> 4) the PG text format is extremely disgusting

That's what I tried to fix.
But ran against 5)

> 5) PG makes volunteers work uphill to get anything done

--
Marcello Perathoner
webmaster@gutenberg.org

From jlinden at projectgutenberg.ca Sat Oct 23 13:43:16 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Sat Oct 23 13:45:02 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417AB35E.6010202@perathoner.de>
Message-ID:

Oh geez... I'm going to ask Ben what happened. :-(

Ok, so as far as I know pgxml.org doesn't belong to us anymore, but that doesn't make it dead, just the domain will have to change.

-- James

From jlinden at projectgutenberg.ca Sat Oct 23 13:48:14 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Sat Oct 23 13:50:00 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417ABF16.40203@perathoner.de>
Message-ID:

> > I've made no secret of my personal opinions of PG:
> >
> > 1) the website is a disgrace
> > 3) the catalog system is a hack job done by unqualified people
>
> That's what I've fixed in the last year.

Yes, I have to admit that you've been working VERY hard on the site, but in my own opinion, it is still not up to par. That's not really any fault of your's tho.

I wouldn't call your new catalog system "fixed", but it IS better than the old CGI one. :-)

> > 4) the PG text format is extremely disgusting
>
> That's what I tried to fix. But ran against 5)
>
> > 5) PG makes volunteers work uphill to get anything done

Hey, I'm with you there. :-|

-- James

From scott_bulkmail at productarchitect.com Sat Oct 23 14:20:24 2004
From: scott_bulkmail at productarchitect.com (Scott Lawton)
Date: Sat Oct 23 14:32:13 2004
Subject: [gutvol-d] barriers to XML posting
In-Reply-To: <417ABA84.80705@hutchinson.net>
References: <417ABA84.80705@hutchinson.net>
Message-ID:

>The second is that while a more simple XML markup, like what you loosely described as PGXML, sounds wonderful on the surface ... it requires, once again, largely reinvented the wheel

"Reinventing the wheel" is often something to be avoided, but I'm not sure it's a compelling issue here.

First, there are other models to use, e.g. XHTML.

Second, the most important standard is XML itself. That's what enables an incredible variety of tools and platforms; the specific DTD is much less important. (In fact, XML's designers made sure it was useful even without a DTD.)
Third, TEI was created for a very different world: scholarly publishing. If PG's markup was going to be done by paid experts, TEI would probably be the best choice. But I'm not convinced it's appropriate for a volunteer organization. XML can be much simpler than HTML, yet TEI is (IMHO) more complex not less. I just finished converting The Wonderful World of Oz to PGTEI. (I'll post it on Classicosm.com once I have a chance to write up my impressions.) During my learning process, I came across an interesting comparison of Shakespeare marked up using TEI and an "ad hoc" markup used by Jon Bosak (a key inventor of XML). Though the comparison was done by a TEI advocate, I think Jon's is a much better model for our purpose. http://www.tei-c.org.uk/Sample_Manuals/mueller-main.htm A very gentle introduction to the TEI (the comparison is near the end -- look for the garish background colors) >Granted, the very nature of XML makes converting from a home-grown markup to TEI a possibility, removing the need to convert would seem to be the wiser path. The whole point of a master format is that PG is going to convert to other useful formats. If TEI is useful in and of itself, that can be just another conversion. -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From scott_bulkmail at productarchitect.com Sat Oct 23 14:30:09 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Sat Oct 23 14:32:19 2004 Subject: [gutvol-d] scanned title pages In-Reply-To: <20041022181003.GB24510@pglaf.org> References: <20041022125443.7BD719E88D@ws6-2.us4.outblaze.com> <02ed01c4b842$1c31baf0$6501a8c0@Unicorn> <20041022181003.GB24510@pglaf.org> Message-ID: Greg typed: >I do have all of the title pages & verso pages submitted >electronically. This is thousands and thousands of images. >Just email me if you need images for a particular item. If >you'd rather, I could package up the older clearances (pre-August '04 >or so) and get them to you. 
It's probably < 2GB total. Thanks much for the offer. I've moved beyond cataloging onto the main part of my project. If I ever revisit the entire catalog, it would be great to have a DVD of all the title+verso pages (if they don't get posted somewhere in the meantime). Meanwhile, I may indeed make some one-off requests. >N.B., this stuff is not suitable for public redistribution >with our eBooks. Many scans are not very high quality. Some >are, and it would be fine with me to make them publicly >available somewhere. I don't have much opinion about >including these with the eBooks themselves - that's something >for the producer to decide. Most title & verso pages are pretty >boring, though, so probably are not worth including as part >of an eBook. I completely agree that there's no reason to include these in the default .zip file for the HTML or any other edition/format. I just think it's important to make them available somewhere people can find them, e.g. for librarians, scholars or other catalogers. -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From shalesller at writeme.com Sat Oct 23 16:50:34 2004 From: shalesller at writeme.com (D. Starner) Date: Sat Oct 23 16:51:29 2004 Subject: [gutvol-d] barriers to XML posting Message-ID: <20041023235034.B91FC4BDA9@ws1-1.us4.outblaze.com> Scott Lawton writes: > During my learning process, I came across an interesting > comparison of Shakespeare marked up using TEI and an "ad > hoc" markup used by Jon Bosak (a key inventor of XML). > Though the comparison was done by a TEI advocate, I think > Jon's is a much better model for our purpose. > > http://www.tei-c.org.uk/Sample_Manuals/mueller-main.htm > A very gentle introduction to the TEI > (the comparison is near the end -- look for the garish background colors) I'd disagree. The bibliographic information in Jon's is insufficient; it doesn't even give us an author, much less the rest of the information we need to have computer readable.
Once we've dumped the lines in the TEI, the main difference is the line numbers (which are optional in TEI and necessary for some books) and the capacity to do split metric lines, which is again necessary for some books. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From shalesller at writeme.com Sat Oct 23 22:44:46 2004 From: shalesller at writeme.com (D. Starner) Date: Sat Oct 23 22:45:48 2004 Subject: [gutvol-d] barriers to XML posting Message-ID: <20041024054446.5B63A4BDA9@ws1-1.us4.outblaze.com> "James Linden" writes: > Yes, I have blocked Bowerbird from joining the PGXML list, but he is the > _only_ person that I've blocked. The only reason for this is because of > other people's reactions to him, not because of Bowerbird himself. Just like we inoculate against polio, not because of polio itself, but because of other people's reactions to it? > You can at > least try to extract the useful information that he does give from the flame > wars they often come in. This way, you might actually LEARN something. My stress level is high enough without listening to a self-centered egomaniac who has repeatedly stated his unwillingness to work with me. > More mud is slung on this list than on ANY other list that I'm subscribed > to, but I have to admit, I'm only subscribed to about 200 active lists, so I > may be missing the mud-slinging ones. I haven't seen that at all. It really doesn't compare to debian-devel and other lists I've been on at all. > I've made no secret of my personal opinions of PG: > > 1) the website is a disgrace > 2) the archive is poorly organized > 3) the catalog system is a hack job done by unqualified people > 4) the PG text format is extremely disgusting > 5) PG makes volunteers work uphill to get anything done > 6) the lack of quality in our content offsets any gain from it [More wild criticisms of PG] What makes you think this is at all constructive?
You use emotionally charged words--"disgrace", "hack job", "disgusting"--and criticize almost everything about PG. I think it would be much more constructive in the future to deal with one issue at a time. > We don't even provide well-suited tools for the volunteers to use to > improve PG, because, oh my god, maybe the tool isn't 100% open-source! Maybe > the tool has been offered to PG on a perpetual right to use for PG status, > but oh, lordy, that's just not good enough. I have no idea what the context was on this, and that would be terribly helpful. I'm sceptical to the idea that PG would turn down a great improvement on our current tools merely because they aren't open source. However, open source is about the flexibility to get the job done. There was a non-web based frontend to DP, but it had to be abandoned because the author disappeared and nobody had the source to fix it as the site changed. Having an open-source program means we can fix it, we can port it, and we don't have to worry about whether we're using it for "PG status" or for Rastko or for our private project. That's a valuable thing, and something that PG should push for when possible. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From gbnewby at pglaf.org Sun Oct 24 11:28:57 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Oct 24 11:28:59 2004 Subject: [gutvol-d] barriers to XML posting In-Reply-To: <20041024054446.5B63A4BDA9@ws1-1.us4.outblaze.com> References: <20041024054446.5B63A4BDA9@ws1-1.us4.outblaze.com> Message-ID: <20041024182857.GA28975@pglaf.org> On Sat, Oct 23, 2004 at 09:44:46PM -0800, D. Starner wrote: > "James Linden" writes: ... > > We don't even provide well-suited tools for the volunteers to use to > > improve PG, because, oh my god, maybe the tool isn't 100% open-source! 
Maybe > > the tool has been offered to PG on a perpetual right to use for PG status, > > but oh, lordy, that's just not good enough. > > I have no idea what the context was on this, and that would be terribly > helpful. I'm sceptical to the idea that PG would turn down a great > improvement on our current tools merely because they aren't open source. > However, open source is about the flexibility to get the job done. > There was a non-web based frontend to DP, but it had to be abandoned > because the author disappeared and nobody had the source to fix it as > the site changed. Having an open-source program means we can fix it, > we can port it, and we don't have to worry about whether we're using it > for "PG status" or for Rastko or for our private project. That's a > valuable thing, and something that PG should push for when possible. I've never heard or seen a "party line" on open source from/for PG. Yes, it's preferable for the reasons David mentions. Yes, it's often free (which matters a lot when we want volunteers to get their own copy). Yes, there is a conceptual alignment between PG's efforts to enhance the public domain and many open source philosophies. But we're essentially pragmatists, using the tools we have available to do the best job we can do. If people have tools to offer, they can/should offer them. There is a full range of tools in use at PG, and lots of stuff we developed ourselves. There's always room for more tools that people might be able to use. -- Greg From joshua at hutchinson.net Mon Oct 25 07:03:27 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Oct 25 07:03:30 2004 Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website Message-ID: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com> Continuing my evaluation of Marcello's PGTEI setup on the gutenberg website (http://www.gutenberg.org/tei/)... I used the same Declaration of Independence file I used last week to comment on the XML markup itself.
This time I've converted that XML file to HTML and TEXT using the online services section. Below are the bulleted items that *I* believe need some improvement. If anyone wants to duplicate my conversions, see my post from last week that contained the XML I used (or send me a quick e-mail and I'll forward the file on). Josh *** HTML conversion items: 1 - First thing that jumps out is the need for bigger left and right margins. This is a simple CSS change. Currently, DP has *mostly* standardized on 10% margins on the left and right. This gives some nice white space for easier reading and gives room for things like original source page numbers and sidenotes to be put in the margin area. 2 - If the author field is left blank, the conversion shouldn't put a "by" out there all by itself. Both the HTML and the TEXT version have this dangling word. 3 - The publication and edition date are both being printed, but it isn't clear which is which. Maybe put an "Original publication date:" label before the date itself? 4 - Since the title, author, etc. is already listed in the first few lines, the second listing below the gutenberg disclaimer line is redundant. Also, in that same spot, the language code is printed, which is nice, but I would suggest changing the format slightly. Namely, put the language code in brackets after the written out language. e.g.: English-United States [en-us] For most of us normal humans, the language codes are not intuitive. 5 - In the CONTENTS section, if there are no footnotes/endnotes, don't list a NOTES section. 6 - Use standard HTML paragraph spacing. Right now, the CSS specifies no blank line between paragraphs and an indent to the beginning of each paragraph. While this matches the original paper source, for me at least, it is jarring to read on a computer screen. This type of formatting would make perfect sense in the PDF conversion, since that one is geared for printing on paper.
7 - Need a horizontal rule (75% width seems right to me) between the CONTENTS section and the first section of the text. Right now, they run together. 8 - Need horizontal rule between major divisions of the text. Currently, the large type header gives a visual indication, but I don't believe it is enough. 9 - No need for the extra horizontal rule to mark off the FOOTNOTES section if there is no footnote section in that text. Currently, this situation makes for two horizontal rules in a row in a text with no footnotes. *** TEXT conversion items: 1 - It lists "The Project Gutenberg EBook of" twice. 2 - Has a dangling "by" line even when no author is specified. 3 - Same redundant title/author info as in the HTML conversion. 4 - Notes section appears whether there are any footnotes or not. From joshua at hutchinson.net Mon Oct 25 07:21:13 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Oct 25 07:21:16 2004 Subject: [gutvol-d] Request for a PGTEI web form developer... Message-ID: <20041025142113.2CA442F8FE@ws6-3.us4.outblaze.com> The hardest part of creating a PGTEI text right now is the header information. It is simply confusing to parse for a human and is very dense in information. This will be the section that most quickly kills people's enthusiasm for the format. However, this section is very important because there is SO MUCH good information stored there. And it is information that needs to be in XML because it is exactly the type of information most likely to be accessed by a computer indexing program or something similar. That said, I am asking for someone with good web developing skills to step up to the plate and create a web page that allows a human to enter information in a nicely formatted web form and then spits out the PGTEI-compliant header to be included in the beginning of a converted text. I am very willing to help test such a beast and provide whatever help I can, but my HTML skills don't extend beyond the normal layout variety.
I've never played with forms, much less a backend that can create output from the web form input. Thank you in advance to any willing to take this on! Josh From marcello at perathoner.de Mon Oct 25 07:36:06 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Oct 25 07:36:10 2004 Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website In-Reply-To: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com> References: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com> Message-ID: <417D0F56.4080405@perathoner.de> Joshua Hutchinson wrote: > 1 - First thing that jumps out is the need for bigger left and right > margins. This is a simple CSS change. Currently, DP has *mostly* > standardized on 10% margins on the left and right. This gives some > nice white space for easier reading and gives room for things like > original source page numbers and sidenotes to be put in the margin > area. OTOH I like to read texts in a small (horizontally) browser window so I can put a shell window and the browser window on one screen. The shell is usually compiling something or doing boring work. If the shell stumbles over something I can immediately switch over, correct and switch back to my reading. Big margins in the browser window would definitely be a major annoyance. I think, the CSS provided is just an example. Everybody here has enough skills to build a CSS he/she likes. For the end user we may consider an "alternate stylesheet" model where she may switch between a set of predefined ones. > 6 - Use standard HTML paragraph spacing. Same as above. > 7 - Need a horizontal rule (75% width seems right to me) between the > CONTENTS section and the first section of the text. Right now, they > run together. > > 8 - Need horizontal rule between major divisions of the text. > Currently, the large type header gives a visual indication, but I > don't believe it is enough. 
Use the rend="newpage" or rend="newdoublepage" attribute on a div, front, back element like eg.:
This will start a new page on paginated media and put a rule on HTML. -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Mon Oct 25 07:45:22 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Oct 25 07:45:26 2004 Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website Message-ID: <20041025144522.560E1109764@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > Joshua Hutchinson wrote: > > > 1 - First thing that jumps out is the need for bigger left and right > > margins. This is a simple CSS change. Currently, DP has *mostly* > > standardized on 10% margins on the left and right. This gives some > > nice white space for easier reading and gives room for things like > > original source page numbers and sidenotes to be put in the margin > > area. > > OTOH I like to read texts in a small (horizontally) browser window so I > can put a shell window and the browser window on one screen. The shell > is usually compiling something or doing boring work. If the shell > stumbles over something I can immediately switch over, correct and > switch back to my reading. > > Big margins in the browser window would definitely be a major annoyance. > > I think, the CSS provided is just an example. Everybody here has enough > skills to build a CSS he/she likes. For the end user we may consider an > "alternate stylesheet" model where she may switch between a set of > predefined ones. > That is why a 10% margin is used instead of a fixed value. On a small window, the margin almost disappears, while on a "normal" sized window, it provides the white space. Also, while I agree CSS changes are the fix for these, what I am trying to do here is help to create a "standard" conversion that is workable. We don't really want the volunteer to HAVE to create their own CSS. 
We want them to have the ability to, of course, if they want to, but the standard conversion should have a baseline that is the best we can do. NOTE: Many of my suggestions are my personal opinion, such as the margins, and part of the purpose here is to get conflicting opinions for others. So, coming to a baseline style consensus is also a dual objective here. > > > 6 - Use standard HTML paragraph spacing. > > Same as above. > Which should be our baseline style, though? If other people like the printer style paragraphs better, that's fine. This is, again, my opinion here. > > > 7 - Need a horizontal rule (75% width seems right to me) between the > > CONTENTS section and the first section of the text. Right now, they > > run together. > > > > 8 - Need horizontal rule between major divisions of the text. > > Currently, the large type header gives a visual indication, but I > > don't believe it is enough. > > Use the rend="newpage" or rend="newdoublepage" attribute on a div, > front, back element like eg.: > >
> > This will start a new page on paginated media and put a rule on HTML. > Cool, I learned something here. That takes care of that concern (at least to my mind, it does). Josh From joel at oneporpoise.com Mon Oct 25 08:11:53 2004 From: joel at oneporpoise.com (Joel A. Erickson) Date: Mon Oct 25 08:30:08 2004 Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website References: <20041025144522.560E1109764@ws6-4.us4.outblaze.com> Message-ID: <003001c4baa4$f604a940$6501a8c0@JOEL> Joshua Hutchinson wrote: > NOTE: Many of my suggestions are my personal opinion, such as > the margins, and part of the purpose here is to get conflicting opinions > for others. So, coming to a baseline style consensus is also a dual > objective here. I vote for 10% margins and indenting paragraphs with a space above about 1/2 to 3/4ths the height of a standard line. From jon at noring.name Mon Oct 25 09:50:22 2004 From: jon at noring.name (Jon Noring) Date: Mon Oct 25 09:50:32 2004 Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website In-Reply-To: <417D0F56.4080405@perathoner.de> References: <20041025140327.BFBD24F490@ws6-5.us4.outblaze.com> <417D0F56.4080405@perathoner.de> Message-ID: <48961996703.20041025105022@noring.name> Marcello wrote: > Joshua Hutchinson wrote: >> 1 - First thing that jumps out is the need for bigger left and right >> margins. This is a simple CSS change. Currently, DP has *mostly* >> standardized on 10% margins on the left and right. This gives some >> nice white space for easier reading and gives room for things like >> original source page numbers and sidenotes to be put in the margin >> area. > ... I think, the CSS provided is just an example. Everybody here has > enough skills to build a CSS he/she likes. For the end user we may > consider an "alternate stylesheet" model where she may switch > between a set of predefined ones. 
The beauty of transforming "standardized" TEI documents into XHTML [see note at end] is that, when done right (with no presentational markup), the XHTML for all the documents will itself be uniform and standardized, thus amenable to swappable CSS style sheets which can be applied to almost the whole collection, if not all of it. Of course, the documents will also be reasonably accessible since accessibility is enhanced by this approach. A favorite site of mine which demonstrates the power of swappable CSS is "CSS Zen Garden", http://www.csszengarden.com/ , which essentially uses the same, high quality (and accessible) document, and invites anyone to submit their own CSS style sheet -- hundreds of style sheets have been submitted so far from many web designers/artists/enthusiasts. It's amazing to see the variation of complex styling which can be applied to such a simple document (try viewing the base document without CSS -- images are separate from the document and also swappable in CSS Zen Garden.) Certainly, how PG would enable style sheet swapping may be different than how CSS Zen Garden does it, but that's beside the point. The important point is that it can be done, and will be an exciting addition to PG by allowing readers to "have it their way" rather than "having it our way." We will not have to argue on whether we want 10% or 20% margins, etc. This will also entice many to submit their own CSS designs for people to use. But it all starts with the Master markup being done *right*. Jon Noring [Note referenced above: This indicates that there should be NO presentational markup in the source TEI-conforming documents -- to take a pure structural/semantic approach to markup. About XHTML, the documents spit out from XSLT should be XHTML 1.1, or at least the content markup itself between ... be valid to XHTML 1.1. I suppose we could also offer a "legacy", pre-styled, non-CSS HTML for those running really old and crusty, non-CSS browsers.] 
From joshua at hutchinson.net Mon Oct 25 10:23:43 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Oct 25 10:23:51 2004 Subject: [gutvol-d] Critique of automated conversion of PGTEI on PG website Message-ID: <20041025172343.43DD410976D@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Jon Noring > Marcello wrote: > > Joshua Hutchinson wrote: > > Certainly, how PG would enable style sheet swapping may be different > than how CSS Zen Garden does it, but that's beside the point. The > important point is that it can be done, and will be an exciting > addition to PG by allowing readers to "have it their way" rather than > "having it our way." We will not have to argue on whether we want 10% > or 20% margins, etc. This will also entice many to submit their own > CSS designs for people to use. But it all starts with the Master > markup being done *right*. > I agree that the CSS provides a powerful and easy way to have different formats. However, we need a "standard" format as a base. The default style is the one that Joe Sixpack, for instance, will see when he clicks on the HTML doc link at the main PG website. We can then have the ability to "swap" CSS at the click of a button, but that functionality is somewhere down the road. We need a functional style in place now for this to move forward (back to Marcello's baby steps). Some of the issues I brought up are purely presentational and fall under the CSS heading. Some are functional (ie, the extra PG header at the beginning of the TEXT version). I think both need to be addressed and a "fix" decided on before XML can move forward. Currently, I'm working on marking up The Hunting of the Snark in PGTEI to see what further issues are introduced by poetry markup. As expected, there are a few and they are largely presentational in nature (CSS), but they need to be addressed to make sure that the XML itself is sufficient to the task of handling the content. 
Josh From joshua at hutchinson.net Tue Oct 26 06:33:04 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Oct 26 06:33:08 2004 Subject: [gutvol-d] Final PGTEI run-thru for a while... Message-ID: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> This e-mail concludes the "common" items I want to check in PGTEI. With the items in my previous tests and the ones today, I could mark up 95%+ of what I see in DP. So, my next step is to start trying to understand how the transforms work and see what *I* can do to improve things. Expect this to be slow going, folks, I'm an old English major who likes computers, so I have to puzzle things out as I go! :) **** For this one, I created a hodge-podge of stuff from the beginning of The Hunting of the Snark and my own gibberish additions to get to all the features I wanted to test. The XML file is attached at the end for anyone wanting to reproduce my tests. As before, the conversion was done with Marcello's online transforms at http://www.gutenberg.org/tei/services/tei-online. **** Table markup: 1 - XML was very straightforward. It is similar to HTML table markup, just with slightly different tag words. <row> instead of <tr>. <cell> instead of <td>. All in all, the more human friendly tags in the XML are easier to parse than the HTML. Under the HTML conversion, the tables came out well. No complaints there. Under the TEXT conversion, the small table came out well. However, when I used longer data items in the second table, the TEXT conversion did not do so well. Basically, the text conversion does not try to line wrap the table cells at all, so the table grew to be extremely wide. This one is a bit of a show stopper as far as automated conversion is concerned. Granted, the tables could be manually edited, but that hurts the whole reason for using a master document format. ** Footnote markup: Again, no real complaints on the footnotes/endnotes. It is pretty straightforward once you read the formatting rules.
The nice thing is that the conversion process handles moving the notes to their proper location for you. However, I did have one question that I couldn't find the answer to. How would you handle sidenotes? It looked like you could put a place="left" (or "right") in the <note> tag, but PGTEI doesn't support that. Is that even the right semantic tag for a sidenote? TEXT conversion had one glitch. For some reason, the footnote listing at the end of the text did not put a number 1 in front of the first footnote. The second footnote was labelled with a 2 correctly. This problem was not present in the HTML conversion. ** Page number markup: No complaints. I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern. ** Blockquotes: I wanted to mark up a blockquote example, but I didn't see how. Anyone out there know how to handle a blockquote within a text? ** Poetry markup: I had the most notes for poetry, so I left it for last in the markup. 1 - How should we mark up poetry indents? In HTML, I use &nbsp;&nbsp; to put two spaces for indents on the text.... *edit* I just found in Marcello's guide that he suggests using   as a quad indent. Works for me, unless someone has a different suggestion. 2 - It was unclear to me at first that a poetry fragment still needed <lg> around it. <l>, which marks off one line of poetry, is insufficient, because the poem line would still be treated as inline in the sentence with just <l>. Putting <lg> around it set it off on its own line. 3 - If I understand the markup right, <lg> represents a portion of the poem, such as a single stanza. To represent the whole poem in one structural element, you need a higher level tag. Would work ok here? Or is there some poem tag I'm missing? HTML results - Poetry is not marked off well. The poems are flush with the left margin. Adding a larger margin around the poem will help it appear distinct from the prose text around it. Also, the paragraph indenting is affected by poetry.
Since the conversion only indents a paragraph if the previous line was the end of a paragraph, it doesn't indent after a poem. This is taken care of if we revert to standard HTML paragraph spacing. **** Josh **** source.xml-- ============ The Hunting of the Snark Lewis Carroll Edition 12 March 1992 Project Gutenberg www.gutenberg.org March 1992 snark12

This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included online at www.gutenberg.org/license

THE MILLENNIUM FULCRUM EDITION 1.2
Library of Congress Classification British *** March 1992 unknown Project Gutenberg Edition October 2004 Joshua Hutchinson TEI markup
THE HUNTING OF THE SNARK an Agony in Eight Fits Lewis Carroll THE MILLENNIUM FULCRUM EDITION 1.2
PREFACE

If—and the thing is wildly possible—the charge of writing nonsense were ever brought against the author of this brief but instructive poem, it would be based, I feel convinced, on the line (in p.4)

"Then the bowsprit got mixed with the rudder sometimes." This is an example footnote.

In view of this painful possibility, I will not (as I might) appeal indignantly to my other writings as a proof that I am incapable of such a deed: I will not (as I might) point to the strong moral purpose of this poem itself, to the arithmetical principles so cautiously inculcated in it, or to its noble teachings in Natural History— I will take the more prosaic course of simply explaining how it happened.

The Bellman, who was almost morbidly sensitive about appearances, used to have the bowsprit unshipped once or twice a week to be revarnished, and it more than once happened, when the time came for replacing it, that no one on board could remember which end of the ship it belonged to. They knew it was not of the slightest use to appeal to the Bellman about it — he would only refer to his Naval Code, and read out in pathetic tones Admiralty Instructions which none of them had ever been able to understand — so it generally ended in its being fastened on, anyhow, across the rudder. The helmsman used to stand by with tears in his eyes; he knew it was all wrong, but alas! Rule 42 of the Code, "No one shall speak to the Man at the Helm," had been completed by the Bellman himself with the words "and the Man at the Helm shall speak to no one." So remonstrance was impossible, and no steering could be done till the next varnishing day. During these bewildering intervals the ship usually sailed backwards.

As this poem is to some extent connected with the lay of the Jabberwock, let me take this opportunity of answering a question that has often been asked me, how to pronounce "slithy toves." The "i" in "slithy" is long, as in "writhe"; and "toves" is pronounced so as to rhyme with "groves." Again, the first "o" in "borogoves" is pronounced like the "o" in "borrow." I have heard people try to give it the sound of the "o" in "worry." Such is Human Perversity.

This also seems a fitting occasion to notice the other hard words in that poem. Humpty-Dumpty's theory, of two meanings packed into one word like a portmanteau, seems to me the right explanation for all.

For instance, take the two words "fuming" and "furious." Make up your mind that you will say both words, but leave it unsettled which you will say first. Now open your mouth and speak. If your thoughts incline ever so little towards "fuming," you will say "fuming-furious;" if they turn, by even a hair's breadth, towards "furious," you will say "furious-fuming;" but if you have the rarest of gifts, a perfectly balanced mind, you will say "frumious."

Supposing that, when Pistol uttered the well-known words —

"Under which king, Bezonian? Speak or die!"

This is, hopefully, an example of a multi-line footnote.

Here is where the second line of the footnote should be.

Justice Shallow had felt certain that it was either William or Richard, but had not been able to settle which, so that he could not possibly say either name before the other, can it be doubted that, rather than die, he would have gasped out "Rilchiam!"

Fit the First
THE LANDING

"Just the place for a Snark!" the Bellman cried,
As he landed his crew with care;
Supporting each man on the top of the tide
By a finger entwined in his hair.

"Just the place for a Snark! I have said it twice:
That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
What I tell you three times is true."

The crew was complete: it included a Boots —
A maker of Bonnets and Hoods —
A Barrister, brought to arrange their disputes —
And a Broker, to value their goods.
Example of a Table

Column 1 Heading | Column 2 Heading
Column 1 Data | Column 2 Data

Column 1 Heading - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG | Column 2 Heading - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG
Column 1 Data - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG | Column 2 Data - REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY LONG
THE END

From jon at noring.name Tue Oct 26 08:39:13 2004
From: jon at noring.name (Jon Noring)
Date: Tue Oct 26 08:39:24 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
In-Reply-To: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com>
Message-ID: <141044128062.20041026093913@noring.name>

Joshua wrote:

> This e-mail concludes the "common" items I want to check in PGTEI.
>
> ...
>
> 1 - How should we markup poetry indents? In HTML, I use
> &nbsp;&nbsp; to put two spaces for indents on the text.... *edit* I
> just found in Marcello's guide that he suggests using &emsp; as a quad
> indent. Works for me, unless someone has a different suggestion.

As we've discussed (and argued) before, it is my belief that, except where typography is integral to the poem itself ("poetry as visual art"), that poetry should be marked up in a structural, not presentational, sense. This means text characters should NEVER be used for visual layout purposes -- characters should be used only for representing textual content. Using text characters for layout mucks up usability, repurposeability, CSS styling, and accessibility.

Use XSL*, CSS or other styling language to effect the desired output. End-users will now have more ability to tailor the verse to their particular reading devices. Of course, a non-parsed comment could be added to the markup explaining how the original was typeset for those wishing to try to duplicate the original layout (but then, that's one purpose for having access to the original page scans.)

Why some here are so enamored with needlessly duplicating the layout of verse in markup is beyond me -- especially when the original page scans are now preserved. I see no one here saying if the original text had indented paragraphs, that we must use a tab or spaces at the start of each paragraph in markup to duplicate that.
Wherever the typography is used to help the end-user identify the structure of the poem, that is automatically amenable to structural markup (even if it has to be customized for some really weird poem.) Only when the typography *is* the poem itself does one resort to presentational markup, and here SVG makes the most sense.

In a project I'm working on, the 1001 Arabian Nights by Sir Richard F. Burton, there are literally thousands of "quatrains" spread throughout the work. Burton, or the typesetter, chose to present these quatrains in an unusual way, no doubt simply to save paper since the following format makes each quatrain much more compact, and with thousands of quatrains in 6000+ pages, this could mean a lot fewer pages and substantially lower printing costs.

Here's an example of how a quatrain is typeset in the source:

The blear-eyed scapes the pits * Wherein the lynx-eyed fall:
A word the wise man slays * And saves the natural:
The Moslem fails of food * The Kafir feasts in hall:
What art or act is man's? * God's will obligeth all!

It is clear that the layout used in this example has nothing to do with the quatrain itself (the original being Arabic and very likely formatted in a totally different way.)

In XHTML, here's how I have chosen to structure it (as you see, the '*' character seen above is not reproduced since its purpose in the original is for typographic layout only -- it is not part of the content of the verse, just as page numbers are not part of the content of a work):

The blear-eyed scapes the pits

Wherein the lynx-eyed fall:

A word the wise man slays

And saves the natural:

The Moslem fails of food

The Kafir feasts in hall:

What art or act is man's?

God's will obligeth all!
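
The same separation of content from layout can be sketched outside XHTML. What follows is only an illustration in Python, not the markup used in the Burton project: the couplet model and the names `QUATRAIN`, `render_compact` and `render_stacked` are assumptions made up for this example. The half-lines are stored without the '*' separator, and each layout is produced by a separate rendering step.

```python
from typing import List, Tuple

# Hypothetical structural model (not the actual XHTML from the message):
# a quatrain is a list of couplets, each a (first half-line, second half-line)
# pair. The '*' from the printed page is layout, so it is not stored here.
Couplet = Tuple[str, str]

QUATRAIN: List[Couplet] = [
    ("The blear-eyed scapes the pits", "Wherein the lynx-eyed fall:"),
    ("A word the wise man slays", "And saves the natural:"),
    ("The Moslem fails of food", "The Kafir feasts in hall:"),
    ("What art or act is man's?", "God's will obligeth all!"),
]


def render_compact(quatrain: List[Couplet], separator: str = " * ") -> str:
    """Space-saving print layout: both half-lines on one output line."""
    return "\n".join(first + separator + second for first, second in quatrain)


def render_stacked(quatrain: List[Couplet], indent: str = "") -> str:
    """One half-line per output line; the indent is a rendering choice."""
    lines = []
    for first, second in quatrain:
        lines.append(first)
        lines.append(indent + second)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_compact(QUATRAIN))
    print()
    print(render_stacked(QUATRAIN, indent="    "))
```

Because the separator and the indent are parameters of the renderers, changing the layout never touches the stored text, which is the property being argued for here.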

With XSLT, if I wanted, the above could be transformed into the original format Burton used in print, or it could be output in the more traditional ABABABAB form of most 19th century Western poetry, with no loss in comprehension of the quatrain itself.

There is nothing sacred about the typographic layout of *most* poetry I've seen, pretty as it might be in the printed source -- it simply extends the various typographic conventions used for ordinary prose to aid in understanding the "voiceability" of the verse and how the verses relate to each other. Only when we get to the "poetry as visual art" craze we see a lot in 20th century poetry (and, as a few have noted, in older works) do we need to preserve the exact layout. As just noted, SVG is certainly an intriguing way to do this layout preservation.

(This is not the only possible markup scheme, but works for my purposes. I suggest PG study a more generalized structural markup scheme for verse -- study maybe 100 random works containing verse and see if for at least 90 of them some sort of general markup scheme can be developed which, when converted to XHTML, allows a single CSS style sheet to reasonably display the poetry as originally typeset. It would not surprise me if such a 90% generalized markup scheme is possible: a sort of "Poetry Markup Language" -- the other 10% would be covered by customized extensions, and for "poetry as visual art" by SVG.)

Jon Noring

From joshua at hutchinson.net Tue Oct 26 08:57:08 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 08:57:13 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
Message-ID: <20041026155708.248B0EDDD0@ws6-1.us4.outblaze.com>

----- Original Message -----
From: Jon Noring
To: gutvol-d@lists.pglaf.org
Subject: Re: [gutvol-d] Final PGTEI run-thru for a while...
Date: Tue, 26 Oct 2004 09:39:13 -0600

>
> Joshua wrote:
>
> > This e-mail concludes the "common" items I want to check in PGTEI.
> >
> > ...
> >
> > 1 - How should we markup poetry indents? In HTML, I use
> > &nbsp;&nbsp; to put two spaces for indents on the text.... *edit* I
> > just found in Marcello's guide that he suggests using &emsp; as a quad
> > indent. Works for me, unless someone has a different suggestion.
>
> As we've discussed (and argued) before, it is my belief that, except
> where typography is integral to the poem itself ("poetry as visual
> art"), that poetry should be marked up in a structural, not
> presentational, sense. This means text characters should NEVER be used
> for visual layout purposes -- characters should be used only for
> representing textual content. Using text characters for layout mucks
> up usability, repurposeability, CSS styling, and accessibility.
>
> Use XSL*, CSS or other styling language to effect the desired output.
> End-users will now have more ability to tailor the verse to their
> particular reading devices. Of course, a non-parsed comment could be
> added to the markup explaining how the original was typeset for those
> wishing to try to duplicate the original layout (but then, that's one
> purpose for having access to the original page scans.)
>

The big difference here is that the indent spacing in lines of poetry is NOT just presentational. It can and has been argued that the spacing is INTENTIONAL and STRUCTURAL to the poem. Hence, the addition of the leading spaces, either through &nbsp; (non-breaking space) or &emsp; (quad space).

Your example of the layout of the quatrains is purely presentational. They do not provide any structural meaning to the poem, whereas poetry indentions often DO provide structural meaning.

I'll let David argue the case further if necessary, as he's been the biggest proponent of poetry indents as structural vs presentational in the DP forum discussions on the subject.

Josh

From jon at noring.name Tue Oct 26 09:21:06 2004
From: jon at noring.name (Jon Noring)
Date: Tue Oct 26 09:21:23 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
In-Reply-To: <20041026155708.248B0EDDD0@ws6-1.us4.outblaze.com>
References: <20041026155708.248B0EDDD0@ws6-1.us4.outblaze.com>
Message-ID: <301046640781.20041026102106@noring.name>

Josh wrote:

> Jon Noring wrote:

>> As we've discussed (and argued) before, it is my belief that, except
>> where typography is integral to the poem itself ("poetry as visual
>> art"), that poetry should be marked up in a structural, not
>> presentational, sense. This means text characters should NEVER be used
>> for visual layout purposes -- characters should be used only for
>> representing textual content. Using text characters for layout mucks
>> up usability, repurposeability, CSS styling, and accessibility.
>>
>> Use XSL*, CSS or other styling language to effect the desired output.
>> End-users will now have more ability to tailor the verse to their
>> particular reading devices. Of course, a non-parsed comment could be
>> added to the markup explaining how the original was typeset for those
>> wishing to try to duplicate the original layout (but then, that's one
>> purpose for having access to the original page scans.)

> The big difference here is that the indent spacing in lines of
> poetry is NOT just presentational. It can and has been argued that
> the spacing is INTENTIONAL and STRUCTURAL to the poem. Hence, the
> addition of the leading spaces, either through &nbsp; (non-breaking
> space) or &emsp; (quad space).

How is this different than indenting the first line of a paragraph to communicate to the reader "this is a new paragraph"?

All the poetry I've seen (and of course I've not seen it all), uses indentations of various types to communicate to the reader the *structure* of the poem, thus the poetry is amenable to structural markup provided the granularity of the markup is sufficient. CSS can be used to reproduce the original typographic layout if desired.
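
To make the claim concrete, here is a minimal Python sketch of how structurally labelled verse lines plus a separate stylesheet could reproduce an indented layout. Everything in it is an assumption for illustration: the rhyme-group labels, the `rhyme-a`/`rhyme-b` class names, the 2em indent, and the helper names `to_css` and `to_html` do not come from any message in this thread or from the TEI guidelines.

```python
from html import escape

# Hypothetical structural labels: each verse line is tagged with the rhyme
# group it belongs to, not with how far it was indented in print.
STANZA = [
    ("a", '"Just the place for a Snark!" the Bellman cried,'),
    ("b", "As he landed his crew with care;"),
    ("a", "Supporting each man on the top of the tide"),
    ("b", "By a finger entwined in his hair."),
]

# One possible stylesheet (an assumption): the indent is decided here,
# so changing the layout never requires editing the text itself.
STYLESHEET = {"a": "0em", "b": "2em"}


def to_css(sheet):
    """Emit one CSS rule per structural label."""
    return "\n".join(
        f".rhyme-{label} {{ margin-left: {indent}; }}"
        for label, indent in sorted(sheet.items())
    )


def to_html(stanza):
    """Emit the stanza with class names only; no spacing characters."""
    rows = [
        f'  <div class="rhyme-{label}">{escape(text)}</div>'
        for label, text in stanza
    ]
    return '<div class="stanza">\n' + "\n".join(rows) + "\n</div>"


if __name__ == "__main__":
    print(to_css(STYLESHEET))
    print(to_html(STANZA))
```

Swapping `STYLESHEET` for a mapping with no indent at all changes the rendering without touching the verse text, which is the trade-off under discussion.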
I also assert that except for "poetry as visual art", any indentation intentionally added by the poet was done simply to assist the reader in understanding the structure -- this is comparable to an author insisting that new paragraphs indent the first line.

> Your example of the layout of the quatrains is purely presentational.
> They do not provide any structural meaning to the poem, whereas
> poetry indentions often DO provide structural meaning.

To the contrary -- the presentation used in Burton's quatrains does communicate the underlying structure of quatrains, which is essentially ABABABAB.

Jon Noring

From joshua at hutchinson.net Tue Oct 26 09:50:30 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 09:50:38 2004
Subject: [gutvol-d] Final PGTEI run-thru for a while...
Message-ID: <20041026165030.B76192F94C@ws6-3.us4.outblaze.com>

----- Original Message -----
From: Jon Noring

Fundamentally, it comes down to whether the indentation is structural (it provides some author intended meaning to the poem) or simply presentational (the typesetter put those indents there to make everything "pretty").

First argument for structural ...

The author can use indention differently for different poems within the same book. That, to me, means that the indention pattern has some meaning for that author. Eg: It was intentional that the third and fifth lines are indented and the seventh and ninth lines were double indented. Yet, in the next poem, every other line is indented equally. That is an intentional indentation and implies meaning to it.

Second argument for structural ...

In the Chicago Manual of Style (reference I used is here: http://nutsandbolts.washcoll.edu/chicago.html), it does include an indent in the quoted poem it gives as the first example. Since the style guide thinks that indention is important and should be preserved, I think that argues for the preservation in our texts as well.
Argument for using &nbsp; (non-breaking space) or &emsp; (quad space) ...

The full TEI spec (much less TEI-Lite) does not seem to have a tag element to control indents in a poem. Hence, the use of spaces to specify those indents. With the added benefit of being XML format independent. If our text is converted to some other flavor of XML, the indents are kept intact. Whereas if we tried to create a custom tag element to control indention, that tag element (and hence the information) would most likely be lost if the text was converted to some other flavor of XML.

Josh

PS After reading the full TEI spec section on poetry, I'm not sure at all anymore on how to tag an entire poem as a single entity. ie, if you have two separate poems in a book of poetry, what element(s) do you use to mark one poem as separate from the other? <lg> markup seems to indicate "a stanza, refrain, verse paragraph, etc." (http://www.tei-c.org/P4X/VE.html). It mentions using <div> to mark off sections of a long poem into Cantos or Books... Should I use the same <div> markup for setting off individual poems as separate entities?

From joshua at hutchinson.net Tue Oct 26 10:12:42 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 10:12:49 2004
Subject: [gutvol-d] Found an indention example in TEI...
Message-ID: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>

Google is such a wonderful thing... :)

I did some further digging and found an example (of the exact poem I was using for an example, no less) from the Electronic Text Center at the UofVirginia. They use a simple rend attribute within the <l> tag to mark an indent.

Anyone have a problem with that? It keeps the indention information, but does it within the XML tag element.

The only possible issue I see is that it doesn't allow for variable length indents.

Josh

****

Lewis Carroll's The Hunting of the Snark

Fit the First: THE LANDING

"Just the place for a Snark!" the Bellman cried,
As he landed his crew with care;
Supporting each man on the top of the tide
By a finger entwined in his hair.

"Just the place for a Snark! I have said it twice:
That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
What I tell you three times is true."

[ETC....]

From sly at victoria.tc.ca Tue Oct 26 10:31:28 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Oct 26 10:31:36 2004
Subject: [gutvol-d] Found an indention example in TEI...
In-Reply-To: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
References: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
Message-ID:

Some recent texts coming from dp have had css used for indicating indentation in verse. They use class selectors such as i2, i4, i6 to indicate various levels of indentation. Could something similar be adopted in this case?

Andrew

On Tue, 26 Oct 2004, Joshua Hutchinson wrote:

> Google is such a wonderful thing... :)
>
> I did some further digging and found an example (of the exact poem I was using for an example, no less) from the Electronic Text Center at the UofVirginia. They use a simple rend attribute within the <l> tag to mark an indent.
>
> Anyone have a problem with that? It keeps the indention information, but does it within the XML tag element.
>
> The only possible issue I see is that it doesn't allow for variable length indents.
>
> Josh
>
> ****
>
> Lewis Carroll's The Hunting of the Snark
>
> Fit the First: THE LANDING
>
> "Just the place for a Snark!" the Bellman cried,
> As he landed his crew with care;
> Supporting each man on the top of the tide
> By a finger entwined in his hair.
>
> "Just the place for a Snark! I have said it twice:
> That alone should encourage the crew.
> Just the place for a Snark! I have said it thrice:
> What I tell you three times is true."
>
> [ETC....]
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From dlainson at sympatico.ca Tue Oct 26 10:32:06 2004
From: dlainson at sympatico.ca (dlainson@sympatico.ca)
Date: Tue Oct 26 10:32:34 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
Message-ID: <417E51D6.16261.2CC444@localhost>

Hello

Here's a letter (which I'm apparently breaking some US law by forwarding, but I'll take the risk) which I find disturbing. Seems that "Project Gutenberg established PGA to permit the illegal downloading of works". Of this I wasn't aware. As a big contributor to PGA it concerns me personally, as well as setting a very dangerous precedent.

Does one country have the right to dictate to another what a website can contain when it falls within the law of the host country, and can they force some sort of restrictions on the downloading of material?

Don.
------- Forwarded message follows -------
From: "Col Choat"
To: "Don Lainson"
Subject: FW: Copyright Infringement of Gone With the Wind
Date sent: Tue, 26 Oct 2004 09:36:48 +1000

-----Original Message-----
From: Gonzalez, Dalgis [mailto:dgonzalez@fkkslaw.com]On Behalf Of Selz, Thomas
Sent: Tuesday, 26 October 2004 6:29 AM
To: colc@gutenberg.net.au
Cc: Paul Anderson Sr. (E-mail); Paul Anderson Jr. (E-mail); Thomas Hal Clarke (E-mail); Thomas Hal Clarke (E-mail 2); Selz, Thomas
Subject: Copyright Infringement of Gone With the Wind

October 25, 2004

Certified Mail-
Return receipt Requested

Project Gutenberg
405 West Elm Street
Urbana, IL 61801

By e-mail (colc@gutenberg.net.au)

Project Gutenberg of Australia

Re: Copyright Infringement of Gone With the Wind

To Whom It May Concern:

We represent the Stephens Mitchell Trusts (the "Trusts"), the owner of the copyright to the book, Gone With The Wind ("GWTW"). There are copyright provisions around the world, including, without limitation, the United States Copyright Act, 17 U.S.C. §101 et. seq, which grant the Trusts, as copyright owner, the exclusive right to reproduce and distribute GWTW in the United States and elsewhere.

It has come to our attention that Project Gutenberg's affiliate, Project Gutenberg of Australia ("PGA"), is publishing GWTW in electronic book form on its web site located at www.gutenberg.net.au (the "Web Site"). The Web Site states that PGA "produces etexts in accordance with Australian law" and that the books available on its site are in the public domain in Australia. While the Web Site warns that some of its ebooks may still be protected by copyright in the U.S. and suggests that U.S. users check U.S. copyright laws or visit Project Gutenberg's U.S. web site for its list of public domain works, there is nothing to prevent any U.S. user from simply downloading GWTW from the Web Site. Indeed, we were able to do so easily.
It appears to us that Project Gutenberg established PGA to permit the illegal downloading of works that are still subject to copyright protection in the U.S. and elsewhere. Project Gutenberg's and PGA's willful, knowing and unauthorized distribution of GWTW to users in the U.S. and elsewhere where copyright protection remains available is a blatant violation of our client's rights under applicable statutes and common law. Please be advised that Project Gutenberg and PGA are subject to U.S. copyright law and to jurisdiction in the U.S. for their infringing activities through applicable jurisdiction statutes governing the commission of acts of infringement that either occur in the U.S. or have an effect in the U.S.

On behalf of the Trusts, we hereby demand that Project Gutenberg and/or PGA confirm to us within five (5) days of receipt of this letter that you have removed GWTW from the Web Site entirely or that you have taken all necessary steps to prevent the downloading of GWTW in all places in which it is protected by copyright.

Please be advised that if we have not received confirmation of your willingness to comply with the foregoing demands, we will take all appropriate steps to protect and enforce our clients' rights.

This demand is without prejudice to all of the Trusts' rights and remedies in this matter, both legal and equitable, all of which are specifically and expressly reserved.

Very truly yours,

Thomas D. Selz

cc: Paul H. Anderson, Sr., Esq.
Paul Anderson, Jr., Esq.
Thomas Hal Clarke, Jr., Esq.

Dalgis E. Gonzalez
FrankfurtKurnit Klein & Selz, PC
488 Madison Avenue
New York, New York 10022
Tel: (212) 980-0120 x6735
Fax: (212) 593-9175
E-mail: dgonzalez@fkkslaw.com

This e-mail and any attached files are intended solely for the use of the individual or entity to which this mail is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law.
Any use, disclosure, copying or distribution of this e-mail or the attached files by anyone other than the intended recipient is strictly prohibited. If you have received this e-mail in error, please notify the sender by reply e-mail or collect call to (212) 980-0120 and delete this e-mail and attached files from your system. Thank you.

------- End of forwarded message -------

Don Lainson
dlainson@sympatico.ca

From joshua at hutchinson.net Tue Oct 26 10:53:37 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Tue Oct 26 10:53:45 2004
Subject: [gutvol-d] Found an indention example in TEI...
Message-ID: <20041026175337.E9330109774@ws6-4.us4.outblaze.com>

I found another example of using the rend attribute.

As he landed his crew with care;
As he landed his crew with care;
etc.

The indent level would indicate number of tab stop indentions for that line. Not in the TEI spec itself, but seems to be widely used modification/addition.

Josh

----- Original Message -----
From: Andrew Sly
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Found an indention example in TEI...
Date: Tue, 26 Oct 2004 10:31:28 -0700 (PDT)

>
> Some recent texts coming from dp have had css used for indicating
> indentation in verse. They use class selectors such as i2, i4, i6
> to indicate various levels of indentation. Could something similar
> be adopted in this case?
>
> Andrew
>
> On Tue, 26 Oct 2004, Joshua Hutchinson wrote:
>
> > As he landed his crew with care;

From Bowerbird at aol.com Tue Oct 26 11:05:41 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Oct 26 11:05:55 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
Message-ID: <1e0.2d85fdac.2eafebf5@aol.com>

frankfurtkurnit klein & selz said:
> Tel: (212) 980-0120 x6735

michael, whenever you want us to call these guys, you just let us know...
:+)

-bowerbird

From jeroen at bohol.ph Tue Oct 26 11:39:48 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Oct 26 11:39:41 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
In-Reply-To: <417E51D6.16261.2CC444@localhost>
References: <417E51D6.16261.2CC444@localhost>
Message-ID: <417E99F4.5000605@bohol.ph>

I would reply with something along these lines.

1. The copyright of the work has expired according to Australian law.

2. PG of A is not subject to US copyright law, as it has no activities in the US. It is the responsibility of US visitors to our website to comply with US law, similar to a US visitor visiting Australia, and buying a printed copy of GWTW and bringing it to the US, or buying the same by postal order.

3. Nothing in Australian law requires PG of A to prevent access to public domain materials for visitors apparently from outside Australia. Furthermore, no reliable means exists to determine the geographical location of a visitor to our website. Even if we would be able to implement such a mechanism, it would be easy to circumvent, using proxy servers.

4. The downloading for personal use and study, as is facilitated by the PG of Oz website, may, in many jurisdictions, constitute fair use of the work by the visitor, and hence, the downloading is not necessarily illegal in third countries, even if the work is still under copyright.

5. Project Gutenberg of Australia is legally an entirely independent organisation from PG of US, and PG of US cannot be held liable for any actions of PG of A. It has not been set up for the express purpose of evading US law, but as an independent sister organisation to allow Australian volunteers to distribute works in the public domain in Au. Nobody of PG US has any hand in establishing or running PG of A.

Hence, PG US lacks the means to comply with your request, and PG of A is fully within its rights to behave as it does. Any litigation will be fully without merit.
We advise you that any such litigation will be accompanied by a campaign on our side to increase public awareness of the overly long duration of copyright, and the highly disputable way such extensions have been bought in the US.

I think Michael has dealt with issues like these a few times, so may have a letter ready in the correct legalese...

Jeroen Hellingman

From cannona at fireantproductions.com Tue Oct 26 11:40:50 2004
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Tue Oct 26 11:42:16 2004
Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind
In-Reply-To: <417E51D6.16261.2CC444@localhost>
References: <417E51D6.16261.2CC444@localhost>
Message-ID: <6.1.2.0.0.20041026132146.01ff6c30@mail.fireantproductions.com>

Cute letter. It's built upon a foundation of faulty suppositions, and imaginative stretches of logic, but it's cute. :)

I'll be interested (if the powers that be decide to share) to see the official response (if any).

Sincerely
Aaron Cannon

At 12:32 PM 10/26/2004, you wrote:
> Hello
>
> Here's a letter (which I'm apparently breaking some US law by
> forwarding, but I'll take the risk) which I find disturbing. Seems
> that "Project Gutenberg established PGA to permit the illegal
> downloading of works". Of this I wasn't aware. As a big contributor
> to PGA it concerns me personally, as well as setting a very dangerous
> precedent.
>
> Does one country have the right to dictate to another what a website
> can contain when it falls within the law of the host country, and can
> they force some sort of restrictions on the downloading of material?
>
> Don.

--
E-mail: cannona@fireantproductions.com
Skype: cannona
MSN Messenger: cannona@hotmail.com (Do not send E-mail to the hotmail address.)

From marcello at perathoner.de Tue Oct 26 13:42:13 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue Oct 26 13:42:26 2004
Subject: [gutvol-d] Found an indention example in TEI...
In-Reply-To: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
References: <20041026171242.B5572EDE67@ws6-1.us4.outblaze.com>
Message-ID: <417EB6A5.5060805@perathoner.de>

Joshua Hutchinson wrote:

> "Just the place for a Snark!" the Bellman cried,
> As he landed his crew with care;

Indenting is difficult to handle because it is such a hybrid structural / presentational stuff.

There are at least 3 types of indent:

1) indent of a block(quote)
2) indent of the first line of a paragraph
3) indent of a verse line

Let's just consider 3)

Ways of purely presentational tagging:

&emsp;

simple, robust, standard-conforming and already implemented.
found in the TEI spec but limited because just one level. for X = 1, 2, 3 ugly, makes me want to puke. Negative indents ? better. Compatible with TEI spec. Falls back to one indent if more than one is not supported by XSLT. still better. most elegant but not so standard. Needs new attribute. Structural tagging: There was a young lady of Riga, Who smiled as she rode on a tiger; They returned from the ride With the lady inside, And the smile on the face of the tiger. This is all just off of the top of my head. Once we have figured out what we want, I can start implementing. -- Marcello Perathoner webmaster@gutenberg.org From nihil_obstat at mindspring.com Tue Oct 26 13:45:03 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Tue Oct 26 13:45:14 2004 Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind Message-ID: <11773933.1098823504193.JavaMail.root@wamui09.slb.atl.earthlink.net> Have they claimed that the cab driver who hit Margaret Mitchell at 13th and Peachtree in 1949 was also working for the vast P.G. conspiracy? Perhaps if she had known how it would have affected her copyright term in Australia, she would have been more careful crossing the street. Has anyone looked into rewriting the Gutenberg License to prohibit inherited copyright holders and their attorneys from downloading PG works? --------------------------- Dennis McCarthy nihil_obstat@mindspring.com From joshua at hutchinson.net Tue Oct 26 14:02:21 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Oct 26 14:02:30 2004 Subject: [gutvol-d] Found an indention example in TEI... Message-ID: <20041026210221.6A880109711@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > Joshua Hutchinson wrote: > > > "Just the place for a Snark!" the Bellman cried, > > As he landed his crew with care; > I think I understood everything but this line. > > for X = 1, 2, 3 > > ugly, makes me want to puke. Negative indents ? > Negative indents?
1, 2, 3 would seem to indicate the number of indents to apply, which it seems to me would be a positive indent number. I'm missing something here, I know it. > > > better. Compatible with TEI spec. Falls back to one indent > if more than one is not supported by XSLT. > I don't really like it. Too verbose and not as intuitive (IMO). > > > still better. Related to the negative indent comment above, I'm sure, but I'm still not connecting the dots. > > > > most elegant but not so standard. Needs new attribute. > > The "not so standard" part is the only problem I have. But I could go either way if you feel strongly about it. > Structural tagging: > > > There was a young lady of Riga, > Who smiled as she rode on a tiger; > > They returned from the ride > With the lady inside, > > And the smile on the face of the tiger. > > Looks fine to me (as long as the rend attribute is possible in the tag, too, if necessary). Josh From stephen.thomas at adelaide.edu.au Tue Oct 26 17:33:01 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Tue Oct 26 17:33:20 2004 Subject: [gutvol-d] Found an indention example in TEI... In-Reply-To: <20041026175337.E9330109774@ws6-4.us4.outblaze.com> References: <20041026175337.E9330109774@ws6-4.us4.outblaze.com> Message-ID: <417EECBD.7060107@adelaide.edu.au> As it says in the "Snark": "What's the good of Mercator's North Poles and Equators, Tropics, Zones, and Meridian Lines?" So the Bellman would cry: and the crew would reply "They are merely conventional signs! Ditto markup: merely conventional signs. ;-) The discussion is fascinating. About nine months back, I got seriously interested in TEI, and was looking at converting all my ebooks to TEI. Among a number of stumbling blocks I encountered was this question of what to do with poetry. Probably, I lack sufficient energy or interest, or possibly time -- always a great excuse. But I regret to say that I gave up at this point.
But during my "research" into the poetry question, I wondered aloud on the TEI list whether there were identified verse structures which could/should be used in markup. E.g. sonnets and limericks seem to have a generally accepted layout, so maybe there were other forms too. Unfortunately, possibly because I didn't pay attention in school, I am rather ignorant about such things. Unfortunately, no one else on that list seemed to know either. Now, someone here just posted an example which began: ... and for TEI's purposes, that's probably enough. (Although TEI has the rend attribute, TEI is actually pretty weak on the presentational side -- not just my opinion, but that of many experts on the TEI list.) Unfortunately, it is not possible to define a CSS style which will translate "limerick" into the desired presentation. In my HTML, I've used the em-space entity to indent lines where necessary. It's the easy way out, I know, but somehow I can't stomach the mess that results from etc. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From gbnewby at pglaf.org Wed Oct 27 01:34:41 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Oct 27 01:34:44 2004 Subject: [gutvol-d] (Fwd) FW: Copyright Infringement of Gone With the Wind In-Reply-To: <417E51D6.16261.2CC444@localhost> References: <417E51D6.16261.2CC444@localhost> Message-ID: <20041027083441.GB5668@pglaf.org> On Tue, Oct 26, 2004 at 01:32:06PM -0400, dlainson@sympatico.ca wrote: > > Hello > > Here's a letter (which I'm apparently breaking some US law by > forwarding, but I'll take the risk) which I find disturbing. Seems > that "Project Gutenberg established PGA to permit the illegal > downloading of works". Of this I wasn't aware. 
As a big contributor > to PGA it concerns me personally, as well as setting a very dangerous > precedent. Folks, it's safe to assume that the people who sent the letter (or other folks who might send other letters) could access the gutvol-d list or archives. So, in the interest of not helping them to think of new ways to harass us, I won't send a lot of detail on this particular case, or the ones like it. Suffice to say that, as others have commented, these folks are incorrect in many things. The PG response to such threats is to tell them this (politely), and mention that we have done extensive legal research over the years (in consultation with numerous lawyers) to support our notions. If *they* know of laws or legal precedents to the contrary, we would be very happy to hear of them and will seek to comply, as we do with all other laws. We also offer to help them, by providing information about these laws in the copies of the eBook(s) in question that we distribute, and to help further by writing letters to infringers they can identify. In short, we point out the errors in their requests, assumptions, claims, etc. and put the ball back in their court. This has been an effective strategy over the years. But of course, it's effective mostly because we're right, and legal -- PG's diligence in copyright procedures gives us a strong moral & practical ground to stand on. > Does one country have the right to dictate to another what a website > can contain when it falls within the law of the host country, and can > they force some sort of restrictions on the downloading of material? The short answer is, no. We are not aware of anything like this, and have looked extensively, and consulted with many legal experts. There are definitely some vague points and unknowns, and eventually there might be treaties etc. that address some of these issues. -- Greg Dr. Gregory B. 
Newby Chief Executive and Director Project Gutenberg Literary Archive Foundation http://gutenberg.net A 501(c)(3) not-for-profit organization with EIN 64-6221541 gbnewby@pglaf.org From marcello at perathoner.de Wed Oct 27 02:25:13 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 27 02:25:36 2004 Subject: [gutvol-d] Found an indention example in TEI... In-Reply-To: <20041026210221.6A880109711@ws6-4.us4.outblaze.com> References: <20041026210221.6A880109711@ws6-4.us4.outblaze.com> Message-ID: <417F6979.2030200@perathoner.de> Joshua Hutchinson wrote: > Negative indents? 1, 2, 3 would seem to indicate number of indents > to apply, which would seem to me would be a positive indent number. > I'm missing something here, I know it. What if you want a line to stick out stanza stanza like this? stanza stanza Solutions: like this or like this > > or > > > downloading of works". Of this I wasn't aware. As a big contributor > > to PGA it concerns me personally, as well as setting a very dangerous > > precedent. > > Folks, it's safe to assume that the people who sent the letter (or > other folks who might send other letters) could access the gutvol-d > list or archives. So, in the interest of not helping them to think of > new ways to harass us, I won't send a lot of detail on this particular > case, or the ones like it. Saw this story on slashdot, as well as Michael's Part I of today's newsletter. So much for keeping this under our hats :-) One quick factoid: None of PG (of US) has received any contact from the Mitchell lawyers (not to Michael's house, despite the statements in the letter, not to our business office in Utah, nor to my house in Fairbanks, which is the corporate business address of record). So, it's a little premature to make any sort of response. 
In case you were wondering, one of the first things we do when we get such letters is confirm that the people involved are who they say they are, and have some sort of legal relationship to the texts. In this case, we only have one forwarded email, which could be fake or from parties without legal standing. Thus, it's premature to even feel confident that the letter/complaint is real, or that (if it is real) the people who sent it are legitimate. -- Greg PS: I probably won't have time to post to /. today [my day job is calling], but people can feel free to repost my comments or extracts. There aren't any secrets here, but I do urge a modicum of discretion since so few facts are known. > Suffice to say that, as others have commented, these folks are > incorrect in many things. The PG response to such threats is to tell > them this (politely), and mention that we have done extensive legal > research over the years (in consultation with numerous lawyers) to > support our notions. If *they* know of laws or legal precedents to > the contrary, we would be very happy to hear of them and will seek to > comply, as we do with all other laws. > > We also offer to help them, by providing information about these laws > in the copies of the eBook(s) in question that we distribute, and to > help further by writing letters to infringers they can identify. In > short, we point out the errors in their requests, assumptions, claims, > etc. and put the ball back in their court. > > This has been an effective strategy over the years. But of course, > it's effective mostly because we're right, and legal -- PG's diligence > in copyright procedures gives us a strong moral & practical ground to > stand on. > > > Does one country have the right to dictate to another what a website > > can contain when it falls within the law of the host country, and can > > they force some sort of restrictions on the downloading of material? > > The short answer is, no. 
We are not aware of anything like this, and > have looked extensively, and consulted with many legal experts. There > are definitely some vague points and unknowns, and eventually there > might be treaties etc. that address some of these issues. > -- Greg > > Dr. Gregory B. Newby > Chief Executive and Director > Project Gutenberg Literary Archive Foundation http://gutenberg.net > A 501(c)(3) not-for-profit organization with EIN 64-6221541 > gbnewby@pglaf.org > From scott_bulkmail at productarchitect.com Wed Oct 27 11:25:00 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Wed Oct 27 11:25:54 2004 Subject: [gutvol-d] Final PGTEI... blockquote In-Reply-To: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> Message-ID: >Blockquotes: > >I wanted to markup a blockquote example, but I didn't see how. Anyone out there know how to handle a blockquote with a text? Perhaps:
(not very intuitive, eh?) It's mentioned briefly in Marcello's docs, once you know what to look for ("wider margins"). Also, search for these in Marcello's alice.tei and lmiss.tei examples. The "q" one will add quote marks, unless supressed via the appropriate attribute. I couldn't find it in the TEI Lite docs (though I assume it's there somewhere). I did find it in Section 4.3 of "Bare Bones TEI" http://www.tei-c.org/Vault/Bare/ -- which also suggests that rend="block" is equivalent. (I didn't find independent confirmation.) -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From scott_bulkmail at productarchitect.com Wed Oct 27 11:25:15 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Wed Oct 27 11:25:58 2004 Subject: [gutvol-d] Final PGTEI... page numbers In-Reply-To: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> Message-ID: >Page number markup: > >No complaints. I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern. I didn't see an explicit way to mark the original page numbers. Perhaps as a marginal note? 27 -- Scott Practical Software Innovation (tm), http://ProductArchitect.com/ From joshua at hutchinson.net Wed Oct 27 11:33:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 27 11:34:05 2004 Subject: [gutvol-d] Final PGTEI... page numbers Message-ID: <20041027183357.8AFEE4F4A7@ws6-5.us4.outblaze.com> Where x is the original source page number. Here is a tentative TEI2HTML transform for it... [pg ] (Thanks to Marcello for cleaning up my ugly first attempt at the transform.) The CSS will need a span.pagenum defined to put the [pg x] markup in the side margin. Josh ----- Original Message ----- From: Scott Lawton To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Final PGTEI... 
page numbers Date: Wed, 27 Oct 2004 14:25:15 -0400 > > >Page number markup: > > > >No complaints. I'll be looking into a transform that will place the numbers in the margin, but that is a secondary concern. > > I didn't see an explicit way to mark the original page numbers. Perhaps as a marginal note? > > 27 > -- > > Scott > > Practical Software Innovation (tm), http://ProductArchitect.com/ > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joshua at hutchinson.net Wed Oct 27 11:40:58 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 27 11:41:06 2004 Subject: [gutvol-d] Final PGTEI... blockquote Message-ID: <20041027184058.DB3E8EDCF0@ws6-1.us4.outblaze.com> I think I'd prefer the rend="block" over "display" just because it is a little more intuitive. Thanks for tracking that down Scott. I also found how to handle sidenotes since I wrote that original message. It is actually used in Marcello's PGTEI guide (though not talked about specifically... it is used to provide some info on another topic). If anyone is wondering, I am keeping track of all this markup. I gonna see about putting together a webpage with the ongoing notes. Josh ----- Original Message ----- From: Scott Lawton To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] Final PGTEI... blockquote Date: Wed, 27 Oct 2004 14:25:00 -0400 > > >Blockquotes: > > > >I wanted to markup a blockquote example, but I didn't see how. Anyone out there know how to handle a blockquote with a text? > > Perhaps: >
> > > (not very intuitive, eh?) > > It's mentioned briefly in Marcello's docs, once you know what to look for ("wider margins"). > > Also, search for these in Marcello's alice.tei and lmiss.tei examples. The "q" one will add quote marks, unless supressed via the appropriate attribute. > > I couldn't find it in the TEI Lite docs (though I assume it's there somewhere). I did find it in Section 4.3 of "Bare Bones TEI" http://www.tei-c.org/Vault/Bare/ -- which also suggests that rend="block" is equivalent. (I didn't find independent confirmation.) > -- > > Scott > > Practical Software Innovation (tm), http://ProductArchitect.com/ > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Wed Oct 27 13:45:11 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Oct 27 13:45:23 2004 Subject: [gutvol-d] Final PGTEI... blockquote In-Reply-To: <20041027184058.DB3E8EDCF0@ws6-1.us4.outblaze.com> References: <20041027184058.DB3E8EDCF0@ws6-1.us4.outblaze.com> Message-ID: <418008D7.8040702@perathoner.de> Joshua Hutchinson wrote: > I think I'd prefer the rend="block" over "display" just because it is > a little more intuitive. Thanks for tracking that down Scott. "display" is different from "block". A display is a chunk of text set off (displayed) from the rest usually by enlarging the left and right margins and inserting some top and bottom margins. A block is just a plain old block with nothing special to distinguish it from any block fore or aft. The HTML tag
<blockquote> is just a bit misnamed. > I also found how to handle sidenotes since I wrote that original > message. > > This works only in HTML for now. Ideas on how to display it in TXT without wasting a lot of space? -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Wed Oct 27 13:55:05 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 27 13:55:14 2004 Subject: [gutvol-d] Final PGTEI... blockquote Message-ID: <20041027205505.B98BA2F959@ws6-3.us4.outblaze.com> ----- Original Message ----- From: Marcello Perathoner > > Joshua Hutchinson wrote: > > > I think I'd prefer the rend="block" over "display" just because it is > > a little more intuitive. Thanks for tracking that down Scott. > > "display" is different from "block". > > A display is a chunk of text set off (displayed) from the rest usually > by enlarging the left and right margins and inserting some top and > bottom margins. > > A block is just a plain old block with nothing special to distinguish it > from any block fore or aft. > > The HTML tag
<blockquote> is just a bit misnamed. > The barebones guide didn't seem to make a distinction between the two, but if "display" seems more correct to you, I'm willing to use it. > > I also found how to handle sidenotes since I wrote that original > > message. > > > > > > This works only in HTML for now. Ideas on how to display it in TXT > without wasting a lot of space? > In a text file, I would say one of two ways. 1 - Easy way, just treat it like a footnote/endnote and stick it at the end. 2 - Slightly better way (pulled from how DP texts do it) ... Move the note to before the paragraph it is part of and mark it with a [Sidenote: blah blah] markup. Both methods lose a little fidelity, since the Sidenote is not printed exactly right by the text it refers to, like it would in the original. But method two keeps it fairly close, and context should allow the reader to easily tell the part of the paragraph it refers to. Method one would allow the marker to appear near its original source location, but the information is now not in the same eye region. The user must click to the notes section to see the information, which is commonly meant to be more accessible/more important than a typical footnote. Josh From Bowerbird at aol.com Wed Oct 27 15:26:11 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 27 15:26:35 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes Message-ID: <1a5.29beadb0.2eb17a83@aol.com> joshua said: > Both methods lose a little fidelity, since the Sidenote > is not printed exactly right by the text it refers to, > like it would in the original. But method two keeps it > fairly close, and context should allow the reader > to easily tell the part of the paragraph it refers to. > Method one would allow the marker to appear near its original > source location, but the information is now > not in the same eye region.
The user must click to > the notes section to see the information, which is commonly > meant to be more accessible/more important than a typical footnote. is it unreasonable to want to view _all_ notes -- sidenotes, footnotes, _and_ endnotes _too_ -- right there close to the context where they apply? i think not. in print form, it cannot be done, of course. (not by the printer, anyway, although readers can do a pretty good job of using their hands to hold both pages, and switch between them.) hotlinks between a note and its referent can enable a person to "switch" in a similar way. but you're still looking at either one page or the other, when you want to look at _both_. and in the electronic arena, we can easily go that step better, so why not take advantage? in my viewer-program, all notes are stored in an end-note section, but any note can be "popped up" by a user just by clicking on its note-indicator. so on the left half of the screen, they have the body of the text, and on the right-half they have the end-note section in a scrollable edit-field. (actually the whole file, but it's auto-positioned at the appropriate note in the end-note section.) this lets them see each note in the context of the notes that surround it -- which can be very useful when the author has used ibids and op cits. in addition, if the user double-clicks inside the scrolling-field on the number of _another_ note, the display on the left-hand side jumps to show the page that has the text that calls _that_ note. so even though the notes are collected together in a place that is removed from their referents in the _file_, the viewer-program brings them together in a way giving users maximum power, letting them see text and note at the same time, and navigate easily amongst all of the notes. as i experiment with this system, i'm quite happy i've achieved a very good solution to the problem, and i consider myself to be "done" working on it... 
(at least until i consider what to do when printing.) but if you can think of any other capability i should add to it, please suggest it, i would love to hear it! and let the programmers of _your_ favorite viewer -- whoever they are -- know that _you_ would enjoy this ability to view _all_ notes just like sidenotes, simultaneously with the text that references them, so you would appreciate it if they would program that. let those programmers know that you'll be willing to format notes however they require in order to provide this feature, but you definitely _want_ the capability. and if you're not on speaking terms with the people who are programming your viewer-tools, _why_not_? -bowerbird From joshua at hutchinson.net Wed Oct 27 16:35:33 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Oct 27 16:35:02 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes In-Reply-To: <1a5.29beadb0.2eb17a83@aol.com> References: <1a5.29beadb0.2eb17a83@aol.com> Message-ID: <418030C5.7070505@hutchinson.net> Yes, genius, we have that ability, too, in the HTML. We were talking about the plain text version which is reader program agnostic. Josh Bowerbird@aol.com wrote: >joshua said: > > >> Both methods lose a little fidelity, since the Sidenote >> is not printed exactly right by the text it refers to, >> like it would in the original. But method two keeps it >> fairly close, and context should allow the reader >> to easily tell the part of the paragraph it refers to. >> Method one would allow the marker to appear near its original >> source location, but the information is now >> not in the same eye region. The user must click to >> the notes section to see the information, which is commonly >> meant to be more accessible/more important than a typical footnote. >> >> > > >is it unreasonable to want to view _all_ notes >-- sidenotes, footnotes, _and_ endnotes _too_ -- >right there close to the context where they apply? > >i think not. 
> >in print form, it cannot be done, of course. >(not by the printer, anyway, although readers >can do a pretty good job of using their hands >to hold both pages, and switch between them.) > >hotlinks between a note and its referent can >enable a person to "switch" in a similar way. > >but you're still looking at either one page or >the other, when you want to look at _both_. > >and in the electronic arena, we can easily go >that step better, so why not take advantage? > >in my viewer-program, all notes are stored in an >end-note section, but any note can be "popped up" >by a user just by clicking on its note-indicator. > >so on the left half of the screen, they have the >body of the text, and on the right-half they have >the end-note section in a scrollable edit-field. >(actually the whole file, but it's auto-positioned >at the appropriate note in the end-note section.) > >this lets them see each note in the context of >the notes that surround it -- which can be very >useful when the author has used ibids and op cits. > >in addition, if the user double-clicks inside the >scrolling-field on the number of _another_ note, >the display on the left-hand side jumps to show >the page that has the text that calls _that_ note. > >so even though the notes are collected together >in a place that is removed from their referents >in the _file_, the viewer-program brings them >together in a way giving users maximum power, >letting them see text and note at the same time, >and navigate easily amongst all of the notes. > >as i experiment with this system, i'm quite happy >i've achieved a very good solution to the problem, >and i consider myself to be "done" working on it... >(at least until i consider what to do when printing.) > >but if you can think of any other capability i should >add to it, please suggest it, i would love to hear it! 
> >and let the programmers of _your_ favorite viewer >-- whoever they are -- know that _you_ would enjoy >this ability to view _all_ notes just like sidenotes, >simultaneously with the text that references them, >so you would appreciate it if they would program that. > >let those programmers know that you'll be willing to >format notes however they require in order to provide >this feature, but you definitely _want_ the capability. > >and if you're not on speaking terms with the people >who are programming your viewer-tools, _why_not_? > >-bowerbird >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From Bowerbird at aol.com Wed Oct 27 16:48:42 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Oct 27 16:48:58 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes Message-ID: joshua said: > Yes, genius, we have that ability, too, in the HTML. do you now? then why don't i see more .html versions prepared this way? read what i wrote, carefully, and then show me some e-texts that have an .html version that can match those capabilities... and what's with the "genius" comment? are you being snide? > We were talking about the plain text version > which is reader program agnostic. my viewer-program takes plain-text files. you can be as "agnostic" as you care to be, but if you don't serve the readers, who are you serving? -bowerbird From brad at chenla.org Wed Oct 27 21:02:32 2004 From: brad at chenla.org (Brad Collins) Date: Wed Oct 27 21:04:47 2004 Subject: [gutvol-d] Final PGTEI... page numbers In-Reply-To: (Scott Lawton's message of "Wed, 27 Oct 2004 14:25:15 -0400") References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> Message-ID: Scott Lawton writes: >>Page number markup: >> >>No complaints. I'll be looking into a transform that will place the >>numbers in the margin, but that is a secondary concern. 
> > I didn't see an explicit way to mark the original page numbers. > Perhaps as a marginal note? > > 27 > -- Page numbers are put in the `pb' pagebreak element.
,----[ TEI Manual: 6.9.3 Milestone Tags ]
| <pb> - marks the boundary between one page of a text and the next
| in a standard reference system.
|
| `ed' (edition) indicates the edition or version in which the page
| break is located at this point
|
| <lb> - marks the start of a new (typographic) line in some edition
| or version of a text.
|
| `ed' (edition) indicates the edition or version in which the line
| break is located at this point
|
| <cb> - marks the boundary between one column of a text and the next
| in a standard reference system.
|
| `ed' (edition) indicates the edition or version in which the column
| break is located at this point
`----
There is no need for a `place' attribute, you can use rend="margin" instead. But this is confusing because it's saying that the page breaks in the original edition were in the margin. And if so, which margin, left or right? Presentational markup should be used to indicate how the original was marked up. Instructions for how something should be displayed should be done using CSS or XSLT. I'm using an EETS edition of The Merlin as a development text because it has a running analysis in the left margin, footnotes, and indicates the page breaks in the original manuscript. So in the electronic edition I need to indicate two different sets of page breaks, one for the original manuscript and another for the page breaks in the EETS edition. This can easily be done using the edition `ed' attribute. Learning TEI is like learning Emacs or Unix-like systems. It's a gradual process of incremental epiphanies. TEI is a large and complex spec and takes some time to digest. More than once over the past couple of years I have quickly looked up something in TEI and thought that it was silly and then came up with my own alternate solution.
However, most of the time, after putting my hack into practice I found it didn't work as I expected and finally understood why TEI had done things the way they had. I've come to respect TEI more and more as a mature body of experience which I am trusting more and more. If something seems stupid or awkward I now try to stop and step back and assume that there is a good chance I don't understand the design before trying to cobble together my own solution. Detractors of XML on this list have brought up the fact that the TEI manual is 1400 pages long as a negative. Why? This shows that TEI is well documented. As a general rule, the more documentation that is available for a spec the more mature and useful the standard and the easier it is to learn and implement. I remember a sig file from someone back in the early 90's that went something like, "documentation is a sign of failure". This is somewhat true for simple end-user applications, but it certainly isn't true for things like computer languages and markup languages. b/ -- Brad Collins, Bangkok, Thailand From Bowerbird at aol.com Thu Oct 28 01:44:37 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 28 01:45:01 2004 Subject: [gutvol-d] Final PGTEI... page numbers Message-ID: <65.36dc4587.2eb20b75@aol.com> brad said: > Detractors of XML on this list have brought up > the fact that the TEI manual is 1400 pages long > as a negative. Why? actually, it was jeroen who initially mentioned that fact. but since you asked, the reason this is seen as a "negative" is because we think that precious few of the volunteers who have traditionally shouldered the effort of creating e-texts will continue to do so if an understanding of those 1400 pages of t.e.i. documentation were to become a prerequisite. but maybe now distributed proofreaders has enough people on-board that they feel less uncomfortable taking that risk...
or maybe not, as their tentative plan thus far involves adding two "markup" rounds (at least, and maybe more) to their existing two "proofing" rounds, so as to minimize the number of people who need to be concerned with markup. > This shows that TEI is well documented. um, well, yes, i guess it does. although _more_ documentation is not _always_ a good sign of _better_ documentation, is it? > As a general rule, the more documentation that is available > for a spec the more mature and useful the standard > and the easier it is to learn and implement. i'm not quite so sure i agree with that "general rule", brad... i think it would be just as possible -- and more compelling -- to formulate a "general rule" that the more documentation that a spec needs, the more complex it is, which means that it is _harder_ to "learn and implement"... i'm not afraid of documentation. indeed, quite to the contrary, i'm one of those rare people who often prefers reading it _first_, because if you can stomach it, it'll save you lots of fiddling time. and i'm a word geek too. so i find the massive t.e.i. documentation -- and indeed the whole framework itself -- to be a remarkable and fascinating piece of work. it is mind-boggling to witness how _complex_ and _variegated_ a comprehensive examination of text can become, once you pour a foundation and start building a building. on the other hand, i can equally admire a system that boils things down to their essence, and creates great benefits with few costs. if all the volunteers contributing their efforts to project gutenberg were word geeks as willing to throw themselves into a devotion of documentation, like you and me, brad, it might not matter whether we went with the complex system or one that is a lot more easy. but given that they probably aren't, we should think very carefully before committing them to a world with a high degree of difficulty. as you put it, the learning of a complex system like t.e.i. 
is often "a gradual process of incremental epiphanies". can we _survive_ the situation where thousands of volunteers are put through that? with perhaps many becoming alienated in the course of doing so? unless i miss my guess, just the last few days of "how do we do this?" posts on this listserve have tried the patience of most subscribers... (which leads me to suggest that perhaps there is another listserve that is more appropriate for that, where the markup geeks can go?) -bowerbird From brad at chenla.org Thu Oct 28 06:57:42 2004 From: brad at chenla.org (Brad Collins) Date: Thu Oct 28 06:59:25 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes In-Reply-To: <418030C5.7070505@hutchinson.net> (Joshua Hutchinson's message of "Wed, 27 Oct 2004 19:35:33 -0400") References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> Message-ID: The side note is not always the same as an end note or foot note. Take the following attempt at an ascii version of the first page of the EETS edition of the Romance of Merlin. It's the best I could do using a proportional font. Note: This example is 80 columns--some mail programs might mangle this. If it looks like it's a mess, try re-sizing your window larger. If it still doesn't look right then your mailer has inserted hard line-breaks.

-- Begin --
                     The Romance of Merlin.

                           ---------

                           CHAPTER I
                               .
          CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

Fvll wrothe and angry was the Deuell, whan that oure lorde        [Fol 1a.]
hadde ben in helle, and had take oute Adam and Eve, and           Anger of the
other at his plesier; and whan the fendes sien that, they hadden  Devil against
right grete feer and gret merveile; thei assembleden to-gedir,    our Lord.
and seiden, "What is he this thus vs supprisith and dis-
troyeth, in so moche that our strengthes ne nought ellis that we  Assembly of
haue may nought with-holde hym, nor again hym stonde in no        the fiends
diffence; but that he doth all that hym lyketh, we ne trowe not   and their dis-
that eny man myght be bore of woman, but that he sholde ben       cussion.
oures, and he that thus vs distroyeth, how is he born in whom we
[did]1 knowe non erthely delyte." Than ansuerde anothir fende
and seide, "He this hath distroyed that which we wende sholde
haue be mooste oure a-vaile. Remembre ye not how the prophetes    The prophets
seiden, how that god shulde come in to erthe for to saue the      said that God
synners of Adam and Eve, and we yeden bysily a-boute theym        should come
that so seiden, and dide them moste turment of eny othir pepill,  on earth to
and it semed by their [feire]1 semblant, that it greved hem but   save sinners.
litill or nought, but they comforted hem that weren synners, and
seide that oon sholde come, which sholde delyuer hem out of
tharldome and disese.

1 Illegible

                               1
-- End --

First let's ignore the running analysis and do a simple markup of the main body of the passage: ---
The Romance of Merlin. CHAPTER I CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

Fvll wrothe and angry was the Deuell, whan that oure lorde hadde ben in helle, and had take oute Adam and Eve, and other at his plesier; and whan the fendes sien that, they hadden right grete feer and gret merveile; thei assembleden to-gedir, and seiden, "What is he this thus vs supprisith and dis-troyeth, in so moche that our strengthes ne nought ellis that we haue may nought with-holde hym, nor again hym stonde in no diffence; but that he doth all that hym lyketh, we ne trowe not that eny man myght be bore of woman, but that he sholde ben oures, and he that thus vs distroyeth, how is he born in whom we did knowe non erthely delyte." Than ansuerde anothir fende and seide, "He this hath distroyed that which we wende sholde haue be mooste oure a-vaile. Remembre ye not how the prophetes seiden, how that god shulde come in to erthe for to saue the synners of Adam and Eve, and we yeden bysily a-boute theym that so seiden, and dide them moste turment of eny othir pepill, and it semed by their feire semblant, that it greved hem but litill or nought, but they comforted hem that weren synners, and seide that oon sholde come, which sholde delyuer hem out of tharldome and disese.

... Except for the tag this is all basic TEI-Lite. We have replaced the page break note and the page break with <pb> tags which indicate which edition they came from (the original folio manuscript or the EETS edition). We have also marked up the `Illegible' text with the <unclear> tag, whose `resp' (responsibility) attribute indicates that the original editor of the EETS edition, Henry B. Wheatley, was responsible for indicating that the marked text was unclear. Now, what about the running analysis? Can we use TEI to mark this up as well? Yes. ---
The Romance of Merlin. CHAPTER I CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

Fvll wrothe and angry was the Deuell, whan that oure lorde hadde ben in helle, and had take oute Adam and Eve, and other at his plesier; and whan the fendes sien that, they hadden right grete feer and gret merveile; thei assembleden to-gedir, and seiden, "What is he this thus vs supprisith and dis- troyeth, in so moche that our strengthes ne nought ellis that we haue may nought with-holde hym, nor again hym stonde in no diffence; but that he doth all that hym lyketh, we ne trowe not that eny man myght be bore of woman, but that he sholde ben oures, and he that thus vs distroyeth, how is he born in whom we did knowe non erthely delyte." Than ansuerde anothir fende and seide, "He this hath distroyed that which we wende sholde haue be mooste oure a-vaile. Remembre ye not how the prophetes seiden, how that god shulde come in to erthe for to saue the synners of Adam and Eve, and we yeden bysily a-boute theym that so seiden, and dide them moste turment of eny othir pepill, and it semed by their feire semblant, that it greved hem but litill or nought, but they comforted hem that weren synners, and seide that oon sholde come, which sholde delyuer hem out of tharldome and disese.

... We've broken the paragraph into sections, each with an id that is used to link textual analysis using <interp> tags, which are collected in an `interpGrp' group tag. An <interpGrp> can be created for the whole chapter or paragraph by paragraph. Obviously this solution uses full-blown TEI, not just the TEI-Lite subset, but it does work. I would suggest that in the case of works like The Romance of Merlin, PG should create two editions. The first would be a clean reference edition of the original text, and the second an edition with all of the textual analysis and notes from the Victorian edition. So using FRBR Entities the resulting texts would look something like this (I think, this is very confusing and I could have screwed this up).

W Romance of Merlin (circa 1450-1460).
E ... Text of Middle Eng. Translation of Middle French Suite De Merlin.
M ...... Original MS. Transcription.
I ......... Manuscript (University Library, Cambridge University)
E ...... Merlin A Prose Romance (Edited By Henry B. Wheatley,
         Introduction by Edward Mead, 1899)
M ......... EETS ed. London, 1899.
M ......... PG TEI Master ed based on EETS ed. 2004.
F ............... PG Plain Text Ed.
F ............... PG HTML Ed.
E ...... Text Only Electronic Edition (TEI Ed. 2004).
M ............ PG TEI Master Ed.
F ............... PG Plain Text Ed.
F ............... PG HTML Ed.

W ==> Work
E ==> Expression
M ==> Manifestation
F ==> Format
I ==> Item/Instance

The Text-Only TEI version could then be used as a base reference text by anyone to create new annotated editions. b/ -- Brad Collins , Bangkok, Thailand From jon at noring.name Thu Oct 28 07:24:14 2004 From: jon at noring.name (Jon Noring) Date: Thu Oct 28 07:24:26 2004 Subject: [gutvol-d] Final PGTEI...
page numbers In-Reply-To: <65.36dc4587.2eb20b75@aol.com> References: <65.36dc4587.2eb20b75@aol.com> Message-ID: <501212428078.20041028082414@noring.name> Bowerbird said: > brad said: >> Detractors of XML on this list have brought up the fact that the >> TEI manual is 1400 pages long as a negative. Why? > but since you asked, the reason this is seen as a "negative" > is because we think that precious few of the volunteers who > have traditionally shouldered the effort of creating e-texts > will continue to do so if an understanding of those 1400 pages > of t.e.i. documentation were to become a prerequisite. > > but maybe now distributed proofreaders has enough people > on-board that they feel less uncomfortable taking that risk... The "1400" pages are for the full-blown TEI spec, which includes some pretty obscure stuff. Interspersed within it are long and (to me) fascinating general discourses on the structure of textual documents, with copious examples. In essence, it is probably one of the better "textbooks" ever written on this topic even if it is only there to support the description of the TEI markup. > or maybe not, as their tentative plan thus far involves > adding two "markup" rounds (at least, and maybe more) > to their existing two "proofing" rounds, so as to minimize > the number of people who need to be concerned with markup. Essentially yes. Distributed Proofreaders' longer-term vision, as I understand it (and Juliet can correct me where I'm off anywhere in this message), is to settle upon some subset of TEI to apply to all documents (either use TEI-Lite or some other comparable subset -- for the occasional oddball document the more extended TEI will be used in "manual" mode.)
In addition, for most of DP's volunteers, the markup will be "under-the-hood" and largely invisible -- most of the volunteer work anyway is for copyediting the text (correcting OCR errors), not markup insertion, so no need to require these volunteers to learn the gory details of TEI. Only the most experienced and interested of the DP volunteers, who do the final cleanup/finishing stages, will actually play with the markup itself. > as you put it, the learning of a complex system like t.e.i. is often > "a gradual process of incremental epiphanies". can we _survive_ > the situation where thousands of volunteers are put through that? > with perhaps many becoming alienated in the course of doing so? Well as I noted above, DP, where the action is for large-scale production of e-texts (they are now the actual engine which drives PG's growth), does not plan to inflict TEI on the general first-level volunteers (this is what I inferred from my talks a while back with Charles.) With regards to the specifics of the markup which DP will eventually use (likely a subset of TEI as previously noted), that will ultimately be determined by them based on compatibility with the production interface as well as what works best for the various uses (note the plural) of the texts. [Aside: the DP-produced XML Master texts will certainly be used for many purposes, all of which instill requirements on the markup specification, and which must be considered -- this is the biggest missing area not being discussed on gutvol-*. The most exciting of these is where the DPXML texts will be archived into a special library-like repository which allows a very high-level of end-user interface and customizability to the collection (e.g., bookmarking, annotation, interlinking within the repository and to other content repositories, blogging, etc. -- all things several associates and I are now working on.) 
Of course, the other uses are to generate portable digital formats as the end-user wants, higher-quality text-to-speech capability, and Michael Hart's dream of language translation. These, too, guide the nature of the Master markup vocabulary. Of course, there must be library-compatible and properly designed catalog, metadata, and identifier information for each e-text in the repository. And where they exist, the original page scans of the source documents will also be available and interlinked with the XML versions. Brewster Kahle at the Internet Archive will *gladly* archive the page scans for DP/PG. I envision that most of the earlier portion of the PG collection, which contains most of the classics, will be redone by DP from source documents to ensure proper metadata collection, uniformity and conformity with the rest of the DPXML texts and to have the page scans available. Once DP gets into major production with many more volunteers, redoing the earlier texts won't be a big deal -- it needs to eventually be done anyway, in my view.] I would think and hope that DP will convene a formalized working group of the various experts and enthusiasts here and elsewhere to hammer out the DP Markup Specification based on requirements gathering and analysis, which is the proper way to do this. The DPMWG will have a more formalized and committed leadership structure, with weekly teleconference calls. From my standards working group experience, it's amazing how much stuff gets done during weekly teleconferences and the occasional face-to-face meeting (biannual or annual), while written listserv exchanges in a group like gutvol-* usually end up going around and around in circles. I expect it won't take that long to hammer out the "beta" of the DP Markup Vocabulary when the working group is organized properly and committed to generating and then resolving the various requirements. I would even ask someone like C.
Michael Sperberg-McQueen to be an advisor to the working group (his brother Roger Sperberg and I have worked closely together on various projects in the past.) I would think that DP's vision to include TEI in its next generation system so as to do *large-scale* production of e-texts (possibly up to a few hundred *per day* to begin the process of one million texts in a decade or two) will greatly excite the TEI community and we will attract some pretty smart and dedicated working group members to add to the several already here. Volunteerism is not only for the "rank and file" (those who will do the basic copyediting), but also includes those who are more technically minded and understand the markup issues as they relate to the production environment. Jon Noring From joshua at hutchinson.net Thu Oct 28 07:36:51 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 28 07:36:55 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes Message-ID: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com> That does work, but why not use the already existing markup? Then, you don't have to do the extra work of segmenting your paragraph for the same result. For an HTML edition, it already works in our transform. The text version we just have to decide HOW to handle it, then code up the transform. My version of your example:
The Romance of Merlin. CHAPTER I CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

Anger of the Devil against our Lord. Fvll wrothe and angry was the Deuell, whan that oure lorde hadde ben in helle, and had take oute Adam and Eve, and other at his plesier; and whan the fendes sien that, they hadden right grete feer and gret merveile; Assembly of the fiends and their discussion. thei assembleden to-gedir, and seiden, "What is he this thus vs supprisith and dis-troyeth, in so moche that our strengthes ne nought ellis that we haue may nought with-holde hym, nor again hym stonde in no diffence; but that he doth all that hym lyketh, we ne trowe not that eny man myght be bore of woman, but that he sholde ben oures, and he that thus vs distroyeth, how is he born in whom we did knowe non erthely delyte." Than ansuerde anothir fende and seide, "He this hath distroyed that which we wende sholde haue be mooste oure a-vaile. The prophets said that God should come on earth to save sinners. Remembre ye not how the prophetes seiden, how that god shulde come in to erthe for to saue the synners of Adam and Eve, and we yeden bysily a-boute theym that so seiden, and dide them moste turment of eny othir pepill, and it semed by their feire semblant, that it greved hem but litill or nought, but they comforted hem that weren synners, and seide that oon sholde come, which sholde delyuer hem out of tharldome and disese.

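For the text version, here is one possible way to handle it, as a minimal sketch and not the actual PG transform. It assumes the sidenotes are inline note elements with place="margin" (an assumption, since the tags of the example above did not survive in the list archive) and renders each one in square brackets in the running text. The function name and the "[Sidenote: ...]" convention are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment: the element and attribute names are assumptions,
# because the markup of the example was lost in the list archive.
SAMPLE = (
    '<p><note place="margin">Anger of the Devil against our Lord.</note> '
    'Fvll wrothe and angry was the Deuell, whan that oure lorde hadde ben '
    'in helle; <note place="margin">Assembly of the fiends and their '
    'discussion.</note> thei assembleden to-gedir.</p>'
)

def paragraph_to_text(p):
    """Flatten one <p>, rendering margin notes inline as [Sidenote: ...]."""
    parts = [p.text or ""]
    for child in p:
        body = "".join(child.itertext())
        if child.tag == "note" and child.get("place") == "margin":
            # Collapse internal whitespace so the note stays on one run.
            parts.append("[Sidenote: " + " ".join(body.split()) + "]")
        else:
            parts.append(body)
        # Text following the child element belongs to the paragraph.
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())

text = paragraph_to_text(ET.fromstring(SAMPLE))
print(text)
```

Keeping the notes inline at the point where they attach means the text output needs no segmenting of the paragraph; whether brackets, indentation, or something else is used is only a presentation decision in the transform.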
Josh
From joshua at hutchinson.net Thu Oct 28 07:58:18 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 28 07:58:23 2004 Subject: [gutvol-d] Final PGTEI... page numbers Message-ID: <20041028145818.F30DF9E98F@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Jon Noring > I just wanted to address the DP/PG TEI Working Subcommittee idea. Probably not going to happen. Things tend to happen like this around DP: One or two people see something irritating they want to scratch badly enough to start working on it themselves. Then, they come up with a workable implementation that is put out as a "test" of some sort. It gets refined and used more and more until it becomes a de facto standard. (See our current proofing guidelines at DP for a prime example of this process.) In this case, TEI was talked about but no one wanted to "scratch the itch" enough to take it on as their baby. I'm finally irritated enough by multiple formats that I've decided to get something going. With Marcello's invaluable technical expertise, I'm trying to keep the ball rolling. It's slowed down by the fact that I am NOT a TEI/XML expert, but I am making some progress. I've already learned a ton in the last week ... enough to be impressed by the TEI spec and by Marcello's work on the transforms so far. My goal is to have a working XML->HTML and XML->TEXT conversion for 90% of the texts that go through DP sometime before Christmas. The caveat here is that I'm still learning and while it looks doable to me now, I may learn something tomorrow that makes me revise my estimate. So, for now, I guess I'm the "unofficial" working committee (well, Marcello, too, since I keep bugging the poor guy constantly and he's still nice enough to respond to my e-mails).
Others have provided very helpful pointers and advice, too, but I'm hoping to push this past just talking about it and into actually having something that works at some level. (Right now, the XML -> HTML conversion is "almost" there ... the XML -> TEXT conversion needs more work.) Josh From scott_bulkmail at productarchitect.com Thu Oct 28 08:03:37 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Thu Oct 28 08:11:58 2004 Subject: [gutvol-d] PGTEI and more Message-ID: My feedback on PGTEI is too long for email, so I posted it here: http://Classicosm.com/xml/feedbackonpgtei.html Sections include: PGTEI Vocabulary PGTEI Examples PGTEI Documentation Generated HTML Default CSS Generated PDF Generated Text PGText to PGTEI I also put together a (rough draft!) Quick Reference table, including a comparison to XHTML http://Classicosm.com/xml/pgteiquickreference.html To help explore alternatives to TEI, I critique a side-by-side comparison to a dedicated vocabulary for plays: http://Classicosm.com/xml/tei-vs-play.html Feedback welcome!!! -- Cheers, Scott S. Lawton http://Classicosm.com/ - Classic Books http://ProductArchitect.com/ - consulting From scott_bulkmail at productarchitect.com Thu Oct 28 08:03:42 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Thu Oct 28 08:12:02 2004 Subject: [gutvol-d] Final PGTEI... page numbers In-Reply-To: References: <20041026133304.C1FB04F46F@ws6-5.us4.outblaze.com> Message-ID: >Presentational markup should be used to indicate how the original was >marked up. Aha! That wasn't clear to me since I've been approaching TEI as a "master" format, whereas it was really designed to describe existing texts (which is fine; that's also something I hope is part of PG's XML solution). > Instructions for how something should be displayed should >be done using CSS or XSLT. Agreed. (Though I include all transformation methods here, not just XSLT.) 
>I've come to respect TEI more and more as a mature body of >experience which I am trusting more and more. If something seems >stupid or awkward I now try to stop and step back and assume that >there is a good chance I don't understand the design before trying to >cobble together my own solution. I think that's a good approach with things like TEI, XHTML, etc. A bunch of very smart people spent quite a bit of time on them. Three caveats:

1. there are still aspects that are *truly* awkward, e.g. rend="display" to indent (though I welcome a good explanation)
2. the design goals for TEI (or any other particular solution) may not match PG's design goals
3. different people work differently, so there's often no one "best" answer (e.g. some people love XSLT, some hate it)

-- Cheers, Scott S. Lawton http://Classicosm.com/ - Classic Books http://ProductArchitect.com/ - consulting From joshua at hutchinson.net Thu Oct 28 08:38:05 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 28 08:38:09 2004 Subject: [gutvol-d] PGTEI notes webpage Message-ID: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com> I have almost no notes on the page, but I do have a link to the current test.xml I'm using and the resulting html and text from the online conversion. If you're at all interested in on-going progress, you can see it here. http://home.alltel.net/hutch2000/test.html Josh From jon at noring.name Thu Oct 28 09:13:24 2004 From: jon at noring.name (Jon Noring) Date: Thu Oct 28 09:15:11 2004 Subject: (And a note on organization/Board) Re: [gutvol-d] Final PGTEI... page numbers In-Reply-To: <20041028145818.F30DF9E98F@ws6-2.us4.outblaze.com> References: <20041028145818.F30DF9E98F@ws6-2.us4.outblaze.com> Message-ID: <1201218978515.20041028101324@noring.name> Josh wrote: > Jon Noring: > I just wanted to address the DP/PG TEI Working Subcommittee idea. > > Probably not going to happen.
Things tend happen like this around DP: > > One or two people see something irritating they want to scratch > badly enough to start working on it themselves. Then, they come up > with a workable implementation that is put out as a "test" of some > sort. It gets refined and used more and more until it becomes a > defacto standard. (See our current proofing guidelines at DP for a > prime example of this process.) Well, that is the way things are currently done in DP. But if Charles and Juliet decide it is time to formalize some of the next generation system development, they will make it happen. The option to create a formal Working Group is always an option, and recommended at some stage, even if it is to simply "finish" what is currently being done by the various people individually hammering away at it and doing an excellent job (such as you and Marcello, among others.) Such a formal Working Group can attract some pretty sharp minds in the TEI, text conversion, and other related communities to contribute their time and energy and informed insights, and this has **many other tangible benefits** to the goals of the DP and PG projects besides just coming up with a workable TEI subset DP can use in its future activities: it is important not to ignore the human and social networking element in the equation, something which techno-geeks tend to overlook. For example, this gets "buy-in" to the DP/PG vision by many interested communities (it now becomes "their" project), and their many connections will greatly benefit DP and PG in its various activities, such as greatly improving the chances of Foundation and similar funding to help move DP's and PG's activities to the next level of production, quality and wider acceptance. DP and PG are volunteer activities -- it is best to do what is necessary to get the largest number of the sharpest volunteer minds, individual and organizational. 
Formalizing the various processes will help with attracting these volunteers -- they tend not to join movements which have no centralized authority and which don't try to forge close working relationships with many related communities. (PG is essentially rudderless in leadership by design, and does little effort to reach out to other well-known organizations to form strategic partnerships -- it acts as if the rest of the world does not exist. For example, has PG tried to form a close working alliance with DAISY so as to plugin with the accessibility community and to mobilize its help?) (Note that Mozilla is now exploding on the scene and making a huge impact with Firefox by competing directly with IE, and this is partly because it is coming together in a more formal, organized way -- the Mozilla Foundation -- with leadership which recognizes that even volunteer, open source projects which aspire to greatness need to be well-organized and to "network" closely with various recognized communities -- to play with the proverbial "Big Boys". I know the anarchist-oriented geeks here do not accept this assessment. For info on who serves on the Board of Directors of the Mozilla Foundation: http://www.mozilla.org/foundation/ . It includes Mitch Kapor, as the Chair.) For starters, why doesn't the PG Board of Trustees include some of the top names in the etext, library and digital archiving, accessibility, and public domain advocacy communities, all of whom support the purpose and goals of PG and DP? Why isn't Brewster Kahle, for example, on the Board? Why isn't there a representative of ALA on the Board? Why isn't George Kerscher of DAISY, or someone of his caliber in the accessibility community, on the Board? What about Larry Lessig or John Perry Barlow or Cory Doctorow? Having such a distinguished Board will open up all kinds of doors for PG including funding opportunities -- and this can be done without compromising any of the goals and vision of PG. 
Personally, I believe it will *attract* many more enthusiastic volunteers, too, and create new excitement. And the DP-produced texts will now become more important to many other organizations since they now have a more personal stake in the work product. Success breeds success; momentum is built. Anyway, this is getting off of the main topic of gutvol-d... Jon Noring From joshua at hutchinson.net Thu Oct 28 09:25:09 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 28 09:25:14 2004 Subject: [gutvol-d] PGTEI and more Message-ID: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Scott Lawton > > My feedback on PGTEI is too long for email, so I posted it here: > http://Classicosm.com/xml/feedbackonpgtei.html > > Feedback welcome!!! <quote> is used in an example but apparently isn't part of TEI Lite (it's not in Appendix A). What's the story? It is part of the full TEI spec. Thanks for pointing it out. I meant to have it in my test.xml, but I forgot. The test.xml should have <quote> for blockquotes (and will on the next update.) TEI-Lite is the starting point, but we will probably pull in other stuff from the full spec where we need it. ** q: in cases where the quotation marks don't balance, it may be difficult to automatically convert quotation marks to the appropriate q.../q form, and time consuming to manually proof. Accordingly, I suggest this step be left as optional. I actually agree here. I prefer using " instead of <q>. Can any of the experts explain why this is a "bad idea"? ** pgHeader looks like it contains information that should be described in teiHeader (though I'm new to TEI so may be wrong). alice.tei and lmiss.tei both contain pgHeader; the generated PGTEI does not. Assuming I understand this part right ... The teiHeader contains all the information. pgHeader is the call out to the part that takes the info in teiHeader and formats into a standard display header when you convert to HTML or TEXT.
Marcello is probably the guy to explain it more fully. ** Having separate index tags for TOC, PDF and PDB strikes me as unnecessary and prone to error. Shouldn't the TOC one suffice for all? In fact, the tag itself seems redundant. Shouldn't the head itself suffice? (If TEI requires it, that's another example of where I think TEI is too complex.) Well, the reason they are separate is for the occasion where you have a header, but you don't want that header to appear in the Table of Contents. HTML requires an anchor and

markup both ... this is the TEI equivalent. As for the multiple index entries, I wondered about the need myself, but I haven't gotten around to asking Marcello about it (or digging through documentation to try to understand the need). ** alice.tei: reg="Carroll, Lewis" should use the complete "authority" form, which I believe is "Carroll, Lewis, 1832-1898". Note that unlike the PG website, there are no parens around the dates. Here's an illustration of paren usage: "Baum, L. Frank (Lyman Frank), 1856-1919". I'm hoping consistency in format will be achieved when we have 1) some examples in place and 2) a web form for generating the, admittedly confusing, teiHeader section. ** There appear to be two validation errors, e.g. in the link_outPGTEI documentation: Error (7/117): must not contain block level elements like

. Error (379/1): The start tag for

can't be found. Marcello knows about these and they will be fixed. ** In the documentation, why is "Versprich mir, Heinrich" repeated in the output, the second time in white? This one confused me for a minute, too... Then I realized, it is the only way a HTML browser will be able to space over the right amount. In effect, Marcello is trying to make the text invisible. There may be a better way to hide the spacing text, but I haven't given it much thought yet. It works now, if not in an "elegant solution" manner. ** The lack of space between paragraphs goes against Web conventions. (It's fine as an option but a poor choice for the default.) Agreed. I promise it will be changed. ** Thanks again for your analysis! Josh From Bowerbird at aol.com Thu Oct 28 09:46:28 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Oct 28 09:46:53 2004 Subject: [gutvol-d] Final PGTEI... page numbers Message-ID: <85.192a0367.2eb27c64@aol.com> jon noring said: > for most of DP's volunteers, the markup will be > "under-the-hood" and largely invisible -- most of the > volunteer work anyway is for copyediting the text > (correcting OCR errors), not markup insertion, > so no need to require these volunteers to learn > the gory details of TEI. actually, that represents a poor understanding of the work-flow at distributed proofreaders, under the current system anyway, where the proofers are using a clumsy system of pseudo-markup. > Only the most experienced and interested of the DP volunteers, > who do the final cleanup/finishing stages, will actually > play with the markup itself. well, now you're talking about the system that will be created, and what that will ultimately look like has not yet been decided. the way you've put it here is, to some degree, what is desired, but there's some question about whether proofers can do their job prior to the introduction of any markup at all. 
of course, there is _also_ some question about how easy it will be to do the proofing if any obtrusive markup is "inflicted" on the text prior to proofing. further, at present, "proofing" -- the act of catching and correcting errors, either in the text or in the formatting -- happens right up until the end of the text's processing, and i think the finding will be that obtrusive markup, whenever it occurs, will short-circuit that. whether the early rounds can be improved to the point that this "short-circuiting" causes no problems is yet another open question. > Aside: the DP-produced XML Master texts will certainly be > used for many purposes, all of which instill requirements on > the markup specification, and which must be considered -- > this is the biggest missing area not being discussed on gutvol-*. well, the discussion _here_ carries absolutely no weight at all. if you want to know what d.p. is going to do, you'll have to go over to their forums, where they're batting about these issues right now. (look under the "everything but distributed proofreaders" section, which is an odd place to put such a discussion, wouldn't you think?) > The most exciting of these is where the DPXML texts will be > archived into a special library-like repository which allows > a very high-level of end-user interface and customizability > to the collection (e.g., bookmarking, annotation, interlinking > within the repository and to other content repositories, blogging, > etc. -- all things several associates and I are now working on. sounds like you're off and running. perhaps you could teach people here how to crawl first.
-bowerbird From jeroen at bohol.ph Thu Oct 28 12:13:17 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Thu Oct 28 12:13:00 2004 Subject: [gutvol-d] draft TEI conventions and larger example file In-Reply-To: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com> References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com> Message-ID: <418144CD.7070707@bohol.ph> As I promised already some time ago, I've prepared a draft TEI Lite conventions document, and a zip file with a TEI encoded file, following that. The end result of the TEI to HTML transform can already be seen in PG: http://www.gutenberg.net/etext/10772 To get the guidelines in PDF format: http://www.bohol.ph/PG/TEI-PG-Guidelines-0.1.pdf in open office format: http://www.bohol.ph/PG/TEI-PG-Guidelines-0.1.sxw To get the sample file: http://www.bohol.ph/PG/IncaLand.zip I hope to be adding more examples soon, and to update the guidelines, among other things with which items are optional and which will be required. I hope to keep the instructions within 30 pages. Jeroen Hellingman. From marcello at perathoner.de Thu Oct 28 12:19:18 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 28 12:19:27 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: References: Message-ID: <41814636.9080904@perathoner.de> Scott Lawton wrote: > Section 18: I strongly recommend omitting the requirement that TEX > and NROFF characters must be escaped. (As far as I can tell, that's > not part of TEI.) It may well be a useful optional feature; perhaps > it could be turned on by including a specific processing instruction. That limitation will go away once XSLT can handle entities or once I rewrite the XSLT transform into perl. > quote is used in an example but apparently isn't part of TEI Lite > (it's not in Appendix A). What's the story? <quote> is to be used when you quote some written source, <q> when you quote direct speech.
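[Editorial illustration, not part of the original thread.] The distinction above between quoting a written source and marking direct speech is what lets a converter choose the quote marks itself. What follows is only a minimal sketch in Python, assuming a fragment in which direct speech is wrapped in q elements; the function name render, the QUOTE_MARKS table and the sample sentence are all made up for this example, and it is not the PGTEI transform, which the thread describes as XSLT.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration: English-style marks, double outside, single inside.
QUOTE_MARKS = [("\u201c", "\u201d"), ("\u2018", "\u2019")]


def render(elem, depth=0):
    """Flatten a TEI-like fragment to plain text, replacing each <q> with
    quote marks that alternate according to nesting depth."""
    is_q = elem.tag == "q"
    open_mark, close_mark = QUOTE_MARKS[depth % 2] if is_q else ("", "")
    # Only a <q> element increases the nesting depth for its children.
    inner_depth = depth + 1 if is_q else depth
    parts = [open_mark, elem.text or ""]
    for child in elem:
        parts.append(render(child, inner_depth))
        parts.append(child.tail or "")  # text following the child element
    parts.append(close_mark)
    return "".join(parts)


# Hypothetical sample sentence, not taken from any PG text.
sample = "<p><q>He said <q>no</q> twice,</q> she told me.</p>"
text = render(ET.fromstring(sample))
print(text)
```

Because the XML parser rejects an unclosed q element, unbalanced quotations show up as parse errors rather than as stray marks in the output, which is roughly the "debugging" benefit claimed for q later in this message.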
> q: in cases where the quotation marks don't balance, it may be > difficult to automatically convert quotation marks to the appropriate > q.../q form, and time consuming to manually proof. Accordingly, I > suggest this step be left as optional. It is optional. Using <q> will debug your quotes automatically, something that is near impossible to program any other way. Using <q> will also get the prettiest quote signs the output format can display. > langUsage: I suggest the standard should be to omit the content of > the tag (e.g. "British", which is probably more useful as "British > English" or "English (British)"). This information should be > generated to ensure consistency. (They appear in the generated PGTEI > and in alice.tei, but not in lmiss.tei.) You have to include only the languages you actually use in the text. The converter includes some more because it is easier to delete than to add and if you declare too many it doesn't hurt. > pgHeader looks like it's contains information that should be > described in teiHeader (though I'm new to TEI so may be wrong). > alice.tei and lmiss.tei both contain pgHeader; the generated PGTEI > does not. PGTEI Examples pgHeader is a hack that can be removed once we agree on how to insert all that information in the teiHeader. > Having separate index tags for TOC, PDF and PDB strikes me as > unnecessary and prone to error. Shouldn't the TOC one suffice for > all? Some formats have limitations, e.g. PalmDoc bookmarks have a maximum of 16 characters. PDF bookmarks have to use iso-8859-1 chars. Moreover you don't always want the full heading to appear in the contents. > In the documentation, why is "Versprich mir, Heinrich" repeated in > the output, the second time in white? Because there is no other way to properly indent a continuation line in HTML. If you can figure out one (that does not use tables or javascript!) I'd like to hear. > Are the heuristics from things like GutenMark included? That would > seem to be quite valuable. No.
It's just a quick perl hack. Better take GutenMark and make it output TEI instead of HTML. But that is a job for the author of GutenMark. Better bug him :-) -- Marcello Perathoner webmaster@gutenberg.org From brad at chenla.org Thu Oct 28 12:45:38 2004 From: brad at chenla.org (Brad Collins) Date: Thu Oct 28 12:47:21 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes In-Reply-To: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com> (Joshua Hutchinson's message of "Thu, 28 Oct 2004 09:36:51 -0500") References: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com> Message-ID: "Joshua Hutchinson" writes: > That does work, but why not use the already existing <note > place="margin"> markup? Then, you don't have to do the extra work > of segmenting your paragraph for the same result. For an HTML > edition, it already works in our transform. The text version we > just have to decide HOW to handle it, then code up the transform. > I'm not very happy with segmenting either... but I am trying to fit the concept of a running analysis into a larger framework for scaling texts. To understand what I am talking about it helps to think of a book in terms of the 3D Modeling concept of LOD (Level of Detail). A 3D computer model is made up of polygons. The more polygons you have, the more detailed the model. Models used in big Hollywood films may have millions of polygons in a model. This allows you to create believable virtual characters like Gollum, or (sadly) Jar Jar Binks. But all of those polygons are expensive to render. And if you have a shot with Gollum in the distance, you will be spending enormous amounts of resources to draw polygons that can't be seen. To deal with this, LOD is used to reduce the number of polygons in a model the farther away it gets and then increase them again as the model gets closer. When you see a book from far away, you may only see the title on the spine on the shelf.
When you get closer you take the book off the shelf and read a synopsis of the contents of the book on the dust jacket. Get closer still and you see a table of contents. Closer again you turn to a chapter and there might be a summary of the chapter at the beginning. Then, in the case of works like the Merlin, there is a running analysis which provides a paragraph by paragraph summary. Then, finally you get as close as you can and are confronted by the body of text itself. So, in this way you can see the running analysis as a way of zooming in or scaling the text. In the XML tests I was doing two years ago on the Merlin I used this concept to progressively zoom in on the book from a single title in a list, to a brief synopsis to the detailed synopsis to a table of contents to a chapter summary to the running analysis. With this kind of a structural approach to summaries and descriptions it was easy to create some very powerful browsing interfaces and indexing mechanisms. I haven't been happy with what I'd done before with the running analysis so my last post was working notes towards finding a way of incorporating summaries at different scales into a text rather than a proposal for PG. Your note approach is fine for providing a presentational means of adding in a running analysis, but it doesn't tell us the span of text that each note describes. This is why TEI offers the <seg> and <interp> approach. I _DO_ agree that this is overkill for PG texts. When I went to market today, I was struck by the fact that the egg stalls receive eggs from the farms in plastic flats. The eggs arrive in an organized structured way. The egg sellers then proceed to pile them into piles in bins by price. This helps to muddle the difference between the eggs when you, the customer, are picking through them. The structure and organization provided by the flats was counter productive for the egg sellers because it made it more difficult to unload bum eggs.
In the same way, excessive structure in a marked up text makes it more difficult to transform into simpler formats. It would be very awkward to map the notes in the interp tags to the segs in the text. Your notes approach is a lot easier. But I like the idea of having a base reference text which others can use to overlay their own annotations. The `resp' attribute is good at indicating who has annotated what in a text, so that you could easily toggle between annotations from different sources, or strip them out altogether. The EETS version of the Merlin is a base text which Wheatley has overlaid all sorts of information. It's a good idea to keep the markup mechanisms for overlaid annotations separate from the base text that is being annotated. This is a larger issue and goal than simply providing electronic editions of books, and is beyond what PG is about. But it is worth keeping these ideas in the back of your mind, if for no other reason than to remember that reading a book from cover to cover is not the only or even the most common way that books are used. b/ -- Brad Collins , Bangkok, Thailand From joshua at hutchinson.net Thu Oct 28 13:04:53 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Oct 28 13:05:02 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes Message-ID: <20041028200453.19AACEDCA3@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Brad Collins vs using > OK, the gist I got was that you want to be able to tie the information from the margins to specific sections (a beginning and ending point) rather than tie to a single spot and assume it applies to some indeterminate length ahead. I can see where <seg> allows that. Is that much fidelity in the original? For instance, the <seg> marks you applied ... is there something in the original that says this segment stops *right here*?
I ask because while the beginning of the <seg> seems clear, in the text I'm picturing in my head, the end of the segment would be kind of fuzzy. (Unless the text used a marker or line break or something at the same spot you are using breaks.) If it is fuzzy, it seems the <note> markup provides the same "here forward to some point" fidelity in information. (I think if I had a scan of the original, I'd probably more fully understand what you were trying to do.) Josh From jon at noring.name Thu Oct 28 13:27:23 2004 From: jon at noring.name (Jon Noring) Date: Thu Oct 28 13:28:05 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes In-Reply-To: References: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com> Message-ID: <1881234217531.20041028142723@noring.name> Brad wrote: > Josh wrote: >> That does work, but why not use the already existing <note >> place="margin"> markup? Then, you don't have to do the extra work >> of segmenting your paragraph for the same result. For an HTML >> edition, it already works in our transform. The text version we >> just have to decide HOW to handle it, then code up the transform. > I'm not very happy with segmenting either... but I am trying to fit > the concept of a running analysis into a larger framework for scaling > texts. > > [snip of example] > > Your note approach is fine for providing a presentational means of > adding in a running analysis, but it doesn't tell us the span of text > that each note describes. This is why TEI offers the <seg> and > <interp> approach... > > In the same way, excessive structure in a marked up text makes it more > difficult to transform into simpler formats. It would be very awkward > to map the notes in the interp tags to the segs in the text. Your > notes approach is a lot easier... > > The EETS version of the Merlin is a base text which Wheatley has > overlaid all sorts of information.
It's a good idea to keep the > markup mechanisms for overlaid annotations separate from the base > text that is being annotated. > > This is a larger issue and goal than simply providing electronic > editions of books, and is beyond what PG is about. But it is worth > keeping these ideas in the back of your mind, if for no other reason > than to remember that reading a book from cover to cover is not the > only or even the most common way that books are used. We have two issues as I see them here: 1) Notes, sidebars, running analysis, and other types of "out-of-spine" chunks of texts, as found in the original source work. Which chunk of text in the main, "in-spine" text flow each out-of-spine chunk applies to may not be explicitly marked in the source text. Rather it must be figured out by contextual analysis. Obviously, these "out-of-spine" chunks are important to be kept with the Master document format, whatever that may be. 2) Bookmarks, annotations, running commentary, references to and from other digital text works, etc., which are added on by third parties. This is the exciting aspect to make digital texts very useful, as I've previously noted. It is important to keep this stuff separate from the Master document format. Assuming the Master digital texts are XML documents, item (2) can be implemented using the various related W3C specifications of XLink/XPath/XPointer. For example, it is possible with the full XPointer specification to define an exact chunk of text within an XML document. From the discussion of implementing item (1) within TEI, it appears there's more than one way to do it, with segmenting allowing one to specify the exact range of in-spine text which any out-of-spine chunk applies to. Just some general observations without any suggestions.
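[Editorial illustration, not part of the original thread.] The idea in item (2), third-party annotations kept apart from the master text and tied to an exact span of it, can be sketched without any XML machinery. The following is a minimal Python sketch under stated assumptions: it uses plain character offsets rather than XPointer, and the Annotation class, the helper names, the resp values and the sample sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: stored apart from the base text it describes."""
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered
    resp: str   # who is responsible for the annotation
    body: str   # the annotation itself


def covered_text(base, ann):
    """Return the exact span of the base text an annotation applies to."""
    return base[ann.start:ann.end]


def by_resp(annotations, resp):
    """Keep only the annotations from one source, leaving the base text untouched."""
    return [a for a in annotations if a.resp == resp]


# Made-up sample text and annotations, for illustration only.
base_text = "Merlin came to the king. The king heard him gladly."
annotations = [
    Annotation(0, 24, "editor", "Merlin arrives at court."),
    Annotation(25, 51, "reader-1", "Note the king's welcome."),
]

for ann in by_resp(annotations, "editor"):
    print(ann.body, "->", covered_text(base_text, ann))
```

Character offsets break as soon as the base text is corrected, which is one reason a real system would more likely address spans through the document structure, as the XPointer suggestion above implies.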
Jon Noring (p.s., I use the terms "out-of-spine" and "in-spine" loosely based upon the Open eBook Publication Structure, which defines these constructs so ebook reading systems can implement more advanced ways to present "out-of-spine" content. As Bowerbird noted, such "out-of-spine" stuff can be presented in more innovative ways than which is allowed in print, and even in HTML. For example, OEBPS suggests popups to present out-of-spine content, which the Microsoft Reader system implements (but which is largely unknown.) The biggest mistake which the creators of HTML made is not to include a (or more generically-named) tag, which can define some chunk of inline text as being "out-of-spine", and thereby be presented in a popup window or similar innovative fashion. Of course, this would have added significant complexity to the early browsers such as Mosaic, and thus probably explains why this feature was not implemented. But this lack of vision for such a powerful feature is still regrettable. OpenReader definitely plans on making this a major feature, thus one reason we're interested in native recognition of TEI documents.) From marcello at perathoner.de Thu Oct 28 13:52:10 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Oct 28 13:52:25 2004 Subject: [gutvol-d] Final PGTEI... page numbers In-Reply-To: <501212428078.20041028082414@noring.name> References: <65.36dc4587.2eb20b75@aol.com> <501212428078.20041028082414@noring.name> Message-ID: <41815BFA.2000402@perathoner.de> Jon Noring wrote: > I would think and hope that DP will convene a formalized working group > of the various experts and enthusiasts here and elsewhere to hammer > out the DP Markup Specification based on requirements gathering and > analysis, which is the proper way to do this. I think design-by-committee is the wrong way to go about this. Experimenting markup with more and more complicated books and refining the specs along the way seems to me far more promising. 
But that's the Cathedral vs. the Bazaar discussion again. To see a particularly disgusting example of design by committee just look at XSLT. > The DPMWG will have a > more formalized and committed leadership structure, with weekly > teleconference calls. From my standards working group experience, it's > amazing how much stuff gets done during weekly teleconferences and the > occasional face-to-face meeting (biannual or annual), while written > listserv exchanges in a group like gutvol-* usually ends up going > around and around in circles. Teleconferencing will essentially shut out all non-US-based people via the prohibitive costs or via the language barrier. Non native English speakers like me may have a better standing in a written discussion channel. > I would even ask someone like C. Michael Sperberg-McQueen to be an > advisor to the working group I don't know if the TEI people could advise us much. What we need is not advice about the use of TEI as markup language but about the use of TEI as master format for automatic rendition into a wide variety of output formats. There is the tei-presentation list for this sort of thing but traffic there has been very light. The only person who could really help is Sebastian Rahtz. -- Marcello Perathoner webmaster@gutenberg.org From shalesller at writeme.com Thu Oct 28 15:07:35 2004 From: shalesller at writeme.com (D. Starner) Date: Thu Oct 28 15:08:07 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes Message-ID: <20041028220735.784F56EEF6@ws1-5.us4.outblaze.com> "Joshua Hutchinson" writes: > (I think if I had a scan of the original, I'd probably more > fully understand what you were trying to do.) Take a look at any of the EETS works by PM EETS (that is, all of them except the book titled Early Middle English). They've got good examples of where you'd want to attach a note to a stretch of text.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From shalesller at writeme.com Thu Oct 28 16:06:39 2004 From: shalesller at writeme.com (D. Starner) Date: Thu Oct 28 16:06:51 2004 Subject: [gutvol-d] draft TEI conventions and larger example file Message-ID: <20041028230639.1A4AA4BE64@ws1-1.us4.outblaze.com> I have a few comments on the draft guidelines. It'd be nice to have page numbers printed on the pages. A letter size PDF would be useful, but the margins on this one seem generous enough to print on letter-sized paper. DP probably will not be preserving the long-s, and I think it a little unrealistic to expect most of PG's XML documents to preserve it. Also, the description is incorrect; in English, it's used everywhere except at the end of the word, and it was used until about 1800, making it used in the 18th century. It's always used in Fraktur; are we going to preserve that? Counting that, it was used until the middle of the 20th century. It's probably too minor for this document, but several German documents I've seen use a non-ligatured long-s/s combination for the eszett, while not using the long-s elsewhere. Even at the most pedantic, it's arguable whether this should be encoded with the long-s. There should be an option to preserve running headers where they encode information not found elsewhere. I think we should go with standards on the languages section; that is, RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish (how does this differ from sco?) in the draft, and I doubt anything would choke on it today. #6 is a bad idea, especially as 3 letter 639 codes sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad or x-gaddang is a better idea. What happened to emph? All I see is rend.
Likewise, I'd rather see foreign do italics and let you mark it with rend="none" if needed, as that would match how most books do it, and give a guideline to when to use foreign. I partially marked up Japanese Literature, and eventually decided not to mark up all the non-italics Japanese words used in running English text, like names of plants and such. I think there should be a comment to mark up running foreign text and italicized foreign words, but to avoid single words, like the names of plants and foods, in running text if not italicized. From jon at noring.name Thu Oct 28 17:30:18 2004 From: jon at noring.name (Jon Noring) Date: Thu Oct 28 17:30:41 2004 Subject: [gutvol-d] Working Group comments Message-ID: <151248792125.20041028183018@noring.name> Marcello wrote: >Jon Noring wrote: >> I would think and hope that DP will convene a formalized working group >> of the various experts and enthusiasts here and elsewhere to hammer >> out the DP Markup Specification based on requirements gathering and >> analysis, which is the proper way to do this. > I think design-by-committee is the wrong way to go about this. > Experimenting markup with more and more complicated books and refining > the specs along the way seems to me far more promising. Experimentation can be done *while* the working group is working on the specification, in an iterative process. It is important that experimentation itself is guided by the requirements established by the working group in looking at the bigger picture which can only be done by formalized collaboration between the members representing the various stakeholders interested in this. Of course, the most important player is DP, which has to implement the spec into their work flow, and they will certainly have their own laundry list of requirements.
Anyway, you and others have already made a great head start on the experimental side, as well as establishing the first proposed "beta" specification, PGTEI. The working group will not need to start from scratch, but will be able to build upon the proverbial shoulders of giants. And to address the issue that the working group will likely not get it perfect the first time (it won't) -- in an informal chat I had with Charles a while back, one strategy he brought up is to come up with a "version 1.0" of the DP Markup spec, and implement that in the DP workflow. Then, as production rolls along, continue to update the spec as necessary to handle problem texts. The issue of compatibility (forward/backward) between versions of DP Markup spec will need to be addressed, and may be solved by keeping version 1.0 simpler (a smaller number of tags) and then add tags as needed over time. We don't want to remove tags as time goes on (since we may have finished texts using the deprecated/removed tags), but rather add support over time as needed. > But that's the Cathedral vs. the Bazaar discussion again. Yep. > To see a particularly disgusting example of design by committee just > look at XSLT. And to see another particularly disgusting example of design by committee, look at SGML and XML. And the worst of them all: TEI. Nothing good ever comes out of committees / formalized working groups. >> The DPMWG will have a >> more formalized and committed leadership structure, with weekly >> teleconference calls. From my standards working group experience, it's >> amazing how much stuff gets done during weekly teleconferences and the >> occasional face-to-face meeting (biannual or annual), while written >> listserv exchanges in a group like gutvol-* usually ends up going >> around and around in circles. > Teleconferencing will essentially shut out all non-us based people > via the prohibitive costs or via the language barrier. 
Non native > English speakers like me may have a better standing in a written > discussion channel. This is of course an issue. There are now very inexpensive and free ways to hold a teleconference via VoIP or similar. In addition, all teleconference meetings, typically 1.0 to 1.5 hours in length, will be scribed and written Minutes produced. In-between teleconferences we can discuss details on a group forum. It is possible to hold the teleconferences at a time which is convenient for those in North/South America and those in Europe -- it gets tougher to find a good time when there are those in Australia/East Asia to include along with Europe/Americas, but this can be worked out somehow. From firsthand experience it is amazing how much more effective technical working groups are when they can at least meet by phone on a regular basis. It is a social and psychological thing which enhances working together and group creativity, which I've yet to see in text-only working relationships. At first I was skeptical (being a strong introvert myself who has always worked "online"), but came around as I observed the importance of human interaction by voice and in person for *technical working groups*. It's amazing, really. >> I would even ask someone like C. Michael Sperberg-McQueen to be an >> advisor to the working group > I don't know if the TEI people could advise us much. I think they will be very helpful -- I know a few others who are very knowledgeable about TEI in an XML-based publishing environment who can help establish requirements. A couple people I know are world-class at XSLT/XSL-FO (I think one of them served a while back on the original XSL working group at W3C.) A close business acquaintance presently serves on the CSS3 Working Group at W3C, and is working on high-quality presentation using XML+CSS in his business. Having varied views on the DP Markup vocabulary is important.
> What we need is not advice about the use of TEI as markup language but > about the use of TEI as master format for automatic rendition into a > wide variety of output formats. There is the tei-presentation list for > this sort of thing but traffic there has been very light. Yes, definitely! But note that other groups will also be helpful by providing requirements, including end-user requirements. Having sharp tech people from the accessibility, the library/archive, the ebook publishing, and related communities, will contribute to the bigger picture by providing requirements that they see from their unique perspectives (e.g., metadata from the librarian types.) If we want DP-TNG (TNG: The Next Generation) texts to be good for a wide range of uses, then we have to have people representing the various user groups to have their say in the global design of DP Markup. And I need to re-emphasize the importance to the embracement and success of PG/DP by involving the various user groups with PG/DP, and the inclusive working group approach is one of several strategies to achieve this "buy in". > The only person who could really help is Sebastian Rahtz. Yes, definitely. I recall he and I chatted in private email a couple years ago (for what I don't remember). I'm not sure if he is subscribed to this forum. He is definitely very sharp and would make a great addition to the working group if he is able to participate in some capacity, even if only as an advisor. 
Jon Noring From bkeir at pgdp.net Thu Oct 28 20:49:32 2004 From: bkeir at pgdp.net (bkeir@pgdp.net) Date: Thu Oct 28 20:49:49 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> Message-ID: <50877.203.12.144.232.1099021772.squirrel@203.12.144.232> > q: in cases where the quotation marks don't balance, it may be > difficult to automatically convert quotation marks to the appropriate > q.../q form, and time consuming to manually proof. Accordingly, I > suggest this step be left as optional. > > I actually agree here. I prefer using " instead of . Can any of the > experts explain why this is a "bad idea"? Presumably is meant to top and tail a quotation, making it possible to extract quotations from within a work if desired. However I'd be worried about going to because of the possible ambiguities in quotations of multiple paragraphs, and the dangers of these being retransformed to " incorrectly for the text versions. "We often find at DP that people brought up on reading only contemporary works, which rarely quote several paragraphs at a time, incorrectly expect that each paragraph of a quotation needs a closing quote mark. "People who have read a lot of 19th century books are well aware that correct usage is that while each paragraph in a quoted passage starts with a quotation mark, only the final paragraph in a quoted passage gets a closing one. "Like this." From scott_bulkmail at productarchitect.com Thu Oct 28 21:47:00 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Thu Oct 28 21:51:27 2004 Subject: [gutvol-d] "Chapter n" as title vs. something else In-Reply-To: References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> Message-ID: I'd like to address a different issue raised by Brad's example. 
It may even be a typo of sorts or just a quick-and-dirty sample that's not representative -- but I've seen it elsewhere and think it should be covered in docs and perhaps verification suites. > CHAPTER I >. > CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. >
> The Romance of Merlin.
> CHAPTER I
> CONSULTATION OF DEVILS, AND BIRTH OF
> MERLIN.

Using the plain meaning of the terms (rather than any special TEI meaning), it's clear that "CONSULTATION..." is the chapter title. In this particular book, the chapter number appears on the previous line, as a roman numeral, preceded by the word "CHAPTER" in all caps. That's worth recording so that we can reproduce the original, but I don't think the above is the best way to do it.

I'm going to suggest some alternatives that seem more logical; perhaps TEI experts can "translate" these into valid TEI (or suggest extensions that are TEI-like).

First, let's take a simpler case: a chapter that starts with just the bare title:

CONSULTATION OF DEVILS, AND BIRTH OF MERLIN.

I think the markup here can be very simple:
CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. I don't think any TYPE attribute is required; that's clear from context. Now, let's add "CHAPTER I". It's sort of a label that precedes the actual chapter title (much like "Figure" or such for certain illustrations); that gives us:
CHAPTER I CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. NOTE: when automatically extracting chapter titles, it's important to get the first unadorned , i.e. skip . And, AFAIK, no "index" tag is required. Since the original example is the first chapter, it has an additional (and common) complication: the book title appears first. Well, that description suggests:
The Romance of Merlin. CHAPTER I CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. Thoughts? -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From marcello at perathoner.de Fri Oct 29 04:00:19 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Oct 29 04:00:45 2004 Subject: [gutvol-d] "Chapter n" as title vs. something else In-Reply-To: References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> Message-ID: <418222C3.10402@perathoner.de> Scott Lawton wrote: > Since the original example is the first chapter, it has an additional (and common) complication: the book title appears first. Well, that description suggests: > >
> The Romance of Merlin. > CHAPTER I > CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > Thoughts? The book title is at a different level from a chapter title so it gets its own div. If you find multiple chapter titles, you decide which is the main one and which are subtitles.
The Romance of Merlin
Chapter I Consultations of Devils, and Birth of Merlin

In PGTEI the attribute defaults to the contents of the next element. This will give you "Consultations ..." in the TOC instead of "Chapter I". If you want it different just use or something like this.

--
Marcello Perathoner webmaster@gutenberg.org

From brad at chenla.org Fri Oct 29 06:22:22 2004
From: brad at chenla.org (Brad Collins)
Date: Fri Oct 29 06:23:58 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To: <418222C3.10402@perathoner.de> (Marcello Perathoner's message of "Fri, 29 Oct 2004 13:00:19 +0200")
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> <418222C3.10402@perathoner.de>
Message-ID:

Marcello Perathoner writes:

> The book title is at a different level from a chapter title so it gets
> its own div. If you find multiple chapter titles, you decide which is
> the main one and which are subtitles.

In the case of this specific example, the title of the book is included on the first page of the first chapter, which I wasn't quite sure how to mark up. There is no need to include it in an electronic edition. The title page in its own div is the correct way to go.

The markup for the head elements was off the top of my head; I'm glad someone caught that and pointed it out.

Cheers,
b/

--
Brad Collins , Bangkok, Thailand

From joshua at hutchinson.net Fri Oct 29 08:17:21 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 29 08:17:26 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041029151721.737234F4F8@ws6-5.us4.outblaze.com>

All of this is very good stuff. But I hope you don't mind if most of it is pushed back to the second iteration of PGTEI.

My personal thoughts are to get a "standard" in place that handles what DP would normally label Easy through Normal difficulty, in Latin-1 compatible texts.
Then, once we have that in place, move on the stuff that gives DP fits on a regular basis, like fraktur, long-s, non-Latin-1 texts (granted DP-Europe handles most of those now). I definitely want to see the issues you bring up addressed. I'm just trying to set some realistic boundaries on what we can address on an incremental basis. Josh ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] draft TEI conventions and larger example file Date: Thu, 28 Oct 2004 15:06:39 -0800 > > I have a few comments on the draft guidelines. It'd be nice to have page > numbers printed on the pages. A letter size PDF would be useful, but the > margins on this one seem generous enough to print on letter-sized paper. > > DP probably will not be preserving the long-s, and I think it a little > unrealistic to expect most of PG's XML documents to preserve it. Also, > the description is incorrect; in English, it's used everywhere except > at the end of the word, and it was used until about 1800, making it > used in the 18th century. > > It's always used in Fraktur; are we going to preserve that? Counting > that, it was used until the middle of the 20th century. It's probably > too minor for this document, but several German documents I've seen > use a non-ligatured long-s/s combination for the eszett, while not > using the long-s elsewhere. Even at the most pedantic, it's arguable > whether this should be encoded with the long-s. > > There should be an option to preserve running headers where they encode > information not found elsewhere. > > I think we should go with standards on the languages section; that is, > RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found > in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish > (how does this differ from sco?) in the draft, and I doubt anything would > choke on it today. 
#6 is a bad idea, especially as 3 letter 639 codes
> sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad
> or x-gaddang is a better idea.
>
> What happened to emph? All I see is rend. Likewise, I'd rather see foreign
> do italics and let you mark it with rend="none" if needed, as that would
> match how most books do it, and give a guideline to when to use foreign.
>
> I partially marked up Japanese Literature, and eventually decided not to
> mark up all the non-italics Japanese words used in running English text,
> like names of plants and such. I think a comment to mark up running
> foreign text and italicized foreign words, but avoid single words, like
> the names of plants and foods, in running text if not italicized.

From joshua at hutchinson.net Fri Oct 29 08:26:43 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Oct 29 08:26:48 2004
Subject: [gutvol-d] PGTEI and more
Message-ID: <20041029152643.91D679E987@ws6-2.us4.outblaze.com>

----- Original Message -----
From: bkeir@pgdp.net
>
> > q: in cases where the quotation marks don't balance, it may be
> > difficult to automatically convert quotation marks to the appropriate
> > q.../q form, and time consuming to manually proof. Accordingly, I
> > suggest this step be left as optional.
> >
> > I actually agree here. I prefer using " instead of <q>. Can any of the
> > experts explain why this is a "bad idea"?
>
>
> Presumably <q> is meant to top and tail a quotation, making it
> possible to extract quotations from within a work if desired.
> > However I'd be worried about going to because of the possible > ambiguities in quotations of multiple paragraphs, and the dangers of these > being retransformed to " incorrectly for the text versions. > > > > "We often find at DP that people brought up on reading only contemporary > works, which rarely quote several paragraphs at a time, incorrectly expect > that each paragraph of a quotation needs a closing quote mark. > > "People who have read a lot of 19th century books are well aware that > correct usage is that while each paragraph in a quoted passage starts with > a quotation mark, only the final paragraph in a quoted passage gets a > closing one. > > "Like this." > There *is* a way to tell TEI to not print a closing quote, but it is rather cumbersome. I think markup is acceptable if used, but should be optional at best. Josh From joshua at hutchinson.net Fri Oct 29 14:59:11 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Oct 29 14:58:32 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes In-Reply-To: References: Message-ID: <4182BD2F.9060701@hutchinson.net> Bowerbird@aol.com wrote: >joshua said: > > >> Yes, genius, we have that ability, too, in the HTML. >> >> > >do you now? >then why don't i see more .html versions prepared this way? >read what i wrote, carefully, and then show me some e-texts >that have an .html version that can match those capabilities... > > > Most of the recent HTML texts from DP use that sort of thing. >and what's with the "genius" comment? are you being snide? > > > > Caught that, did you, genius? >> We were talking about the plain text version >> which is reader program agnostic. >> >> > >my viewer-program takes plain-text files. > >you can be as "agnostic" as you care to be, but >if you don't serve the readers, who are you serving? > > If the only thing that does what you want is a program that doesn't exist, how is that serving the readers? 
Josh From Bowerbird at aol.com Fri Oct 29 16:24:36 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Oct 29 16:24:57 2004 Subject: [gutvol-d] on the question of sidenotes, footnotes, and end-notes Message-ID: <1e0.2de5534a.2eb42b34@aol.com> joshua said: > Most of the recent HTML texts from DP use that sort of thing. nope. you didn't read my post carefully. show me the .html e-books from d.p. where the reader can see the body of the text _and_ the note _simultaneously_... > Caught that, did you, genius? yep. just making sure everyone else did too. there's this myth going around -- and you're the biggest proponent of it -- that _i_ am the rude one around here... > If the only thing that does what you want > is a program that doesn't exist, > how is that serving the readers? another falsehood you are working hard to propagate... the newest beta of my program will go up this evening. and i expect that it'll come out of beta very soon now... -bowerbird From Jeroen.Hellingman at kabelfoon.nl Fri Oct 29 04:28:42 2004 From: Jeroen.Hellingman at kabelfoon.nl (Jeroen.Hellingman@kabelfoon.nl) Date: Fri Oct 29 17:21:00 2004 Subject: [gutvol-d] draft TEI conventions and larger example file Message-ID: <20041029112842.21153556E7@betazoid.kabelfoon.nl> Op 29-10-2004 01:06, schreef jij: Thanks for your comments. > I have a few comments on the draft guidelines. It'd be nice to have page > numbers printed on the pages. A letter size PDF would be useful, but the > margins on this one seem generous enough to print on letter-sized paper. I follow the ISO standard A4 size, which is an international standard, and offers several benefits to letter, which is only used in the US. I will keep margins generous though. > DP probably will not be preserving the long-s, and I think it a little > unrealistic to expect most of PG's XML documents to preserve it. 
Also, > the description is incorrect; in English, it's used everywhere except > at the end of the word, and it was used until about 1800, making it > used in the 18th century. Agreed. I will make this optional, and indeed, expect most people to drop it. I normally keep it. > It's always used in Fraktur; are we going to preserve that? Counting > that, it was used until the middle of the 20th century. It's probably > too minor for this document, but several German documents I've seen > use a non-ligatured long-s/s combination for the eszett, while not > using the long-s elsewhere. Even at the most pedantic, it's arguable > whether this should be encoded with the long-s. I would suggest, let the person who prepares the text decide. > There should be an option to preserve running headers where they encode > information not found elsewhere. I did this once, in an easy, but non TEI fashion, the formal method is to use tags. > I think we should go with standards on the languages section; that is, > RFC 3066 or its successor in draft. That is, #1, #2, #3, #8 with #5 found > in the draft. #4 and #7 can be encoded as en-x-1800 and en-x-Scottish > (how does this differ from sco?) in the draft, and I doubt anything would > choke on it today. #6 is a bad idea, especially as 3 letter 639 codes > sometimes overlap with SIL codes; if you need to encode Gaddang, phi-x-SIL-gad > or x-gaddang is a better idea. Isn't sco gaelic? I agree, and will adjust the guidelines (and the texts and tools I have) > What happened to emph? All I see is rend. Likewise, I'd rather see foreign > do italics and let you mark it with rend="none" if needed, as that would > match how most books do it, and give a guideline to when to use foreign. This is due to the way text are produced from printed sources, and it is often difficult to establish the reason a word is in italics. It could be because it is considered foreign, or for some other reason. 
I use foreign exclusively as a holder for the lang attribute. If you use the lang attribute consistently, you can use that to isolate fragments in a certain language.

> I partially marked up Japanese Literature, and eventually decided not to
> mark up all the non-italics Japanese words used in running English text,
> like names of plants and such. I think a comment to mark up running
> foreign text and italicized foreign words, but avoid single words, like
> the names of plants and foods, in running text if not italicized.

The decision to mark up words as foreign is sometimes difficult, and can impose a lot of work in such cases. I normally do this, as it helps much in spell checking. However, I would say, it is not required.

Jeroen.

From sly at victoria.tc.ca Fri Oct 29 23:55:46 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Oct 29 23:56:07 2004
Subject: [gutvol-d] Sidney L. Gulick
In-Reply-To: <1881234217531.20041028142723@noring.name>
References: <20041028143651.B7BA01097DB@ws6-4.us4.outblaze.com> <1881234217531.20041028142723@noring.name>
Message-ID:

And now for a slight break from the XML/TEI discussion...

One nice thing about helping to add additional information to the PG catalog is that sometimes I end up learning about some person who I never would have run across otherwise.

The most recent example is Sidney L. Gulick.

As I read a little more about this man, I thought that this is someone I would not hesitate to call an American hero. He dedicated his life to international friendship and understanding, most notably instigating a program whereby over 12,000 "friendship dolls" were made by Americans and sent to Japanese schools.

I've put together a Wikipedia article about him and linked to it from his author record in the PG catalog.
http://en.wikipedia.org/wiki/Sidney_Gulick From stephen.thomas at adelaide.edu.au Sat Oct 30 00:29:38 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Sat Oct 30 00:30:02 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> Message-ID: <418342E2.5080906@adelaide.edu.au> Joshua Hutchinson wrote: > > quote is used in an example but apparently isn't part of TEI > Lite (it's not in link_outAppendix A). What's the story? The common advice seems to be to use to enclose quoted speech *inline*, and use for quoting larger blocks of text. The P4 TEI manual was a bit vague on this, but that seems to be a sensible convention worth using. > > It is part of the full TEI spec. Thanks for pointing it out. > I meant to have it in my test.xml, but I forgot. The > test.xml should have for blockquotes > (and will on the next update.) As I understand this (from an earlier post), 'rend="display"' is supposed to mean that the block should be indented (rather like the HTML blockquote). This seems like a very poor choice of terms to me. CSS has a "display" property, which can take values such as "inline", "block", and -- crucially -- "none". "display:none" is used where you don't want the content displayed at all. So using this rend="display" seems likely to result in confusion. In any case, the choice is poor because it does not convey the information desired. If you use on its own without rend="display", does that indicate you don't want to display the content? Or that you don't want to indent it? I personally don't see any need to use rend here. If you are quoting a passage from some other work, then enclose it in .. . That's enough. When someone comes to present this (e.g. in an HTML version), the most natural thing would be to convert the tag to blockquote. The rend is redundant. 
> > q: in cases where the quotation marks don't balance, it may
> be difficult to automatically convert quotation marks to the
> appropriate q.../q form, and time consuming to manually
> proof. Accordingly, I suggest this step be left as optional.
>
> I actually agree here. I prefer using " instead of <q>. Can
> any of the experts explain why this is a "bad idea"?

This was thrashed out at great length almost a year ago. Basically, while purists will see enormous merit in using <q> instead of quote marks, the practical approach is to stick with the quote marks, due to reasons outlined by another poster. (The terminating quote question with multi-paragraph quotes.)

There's also nothing *wrong* with using this:

"Hello," she said.

at least it's not disallowed in TEI. I believe there's a place in the TEI header to indicate which practice you are using in the text.

--
Stephen Thomas, Senior Systems Analyst, Adelaide University Library
ADELAIDE UNIVERSITY SA 5005 AUSTRALIA
Tel: +61 8 8303 5190 Fax: +61 8 8303 4369
Email: stephen.thomas@adelaide.edu.au
URL: http://staff.library.adelaide.edu.au/~sthomas/

From stephen.thomas at adelaide.edu.au Sat Oct 30 00:39:00 2004
From: stephen.thomas at adelaide.edu.au (Steve Thomas)
Date: Sat Oct 30 00:39:21 2004
Subject: [gutvol-d] "Chapter n" as title vs. something else
In-Reply-To:
References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net>
Message-ID: <41834514.5040201@adelaide.edu.au>

All you need is this:
CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. The "rendering agent" can then, if desired, use the type and n attributes to generate the additional "Chapter 1" heading. Steve Scott Lawton wrote: > I'd like to address a different issue raised by Brad's example. It may even be a typo of sorts or just a quick-and-dirty sample that's not representative -- but I've seen it elsewhere and think it should be covered in docs and perhaps verification suites. > > >> CHAPTER I >>. >> CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > > >>
>>The Romance of Merlin. >>CHAPTER I >>CONSULTATION OF DEVILS, AND BIRTH OF >>MERLIN. > > > Using the plain meaning of the terms (rather than any special TEI meaning), it's clear that "CONSULTATION..." is the chapter title. In this particular book, the chapter number appears on the previous line, as a roman numeral, preceeded by the word "CHAPTER" in all caps. That's worth recording so that we can reproduce the original, but I don't think the above is the best way to do it. > > I'm going to suggest some alternatives that seem more logical; perhaps TEI experts can "translate" these into valid TEI (or suggest extensions that are TEI-like). > > First, let's take a simpler case; a chapter that starts with just the bare title: > > CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > I think the markup here can be very simple: > >
> CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > I don't think any TYPE attribute is required; that's clear from context. > > Now, let's add "CHAPTER I". It's sort of a label that precedes the actual chapter title (much like "Figure" or such for certain illustrations); that gives us: > >
> CHAPTER I > CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > NOTE: when automatically extracting chapter titles, it's important to get the first unadorned , i.e. skip . And, AFAIK, no "index" tag is required. > > Since the original example is the first chapter, it has an additional (and common) complication: the book title appears first. Well, that description suggests: > >
> The Romance of Merlin. > CHAPTER I > CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > Thoughts? -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From marcello at perathoner.de Sat Oct 30 03:00:49 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 03:01:13 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <418342E2.5080906@adelaide.edu.au> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> Message-ID: <41836651.3040107@perathoner.de> Steve Thomas wrote: > The common advice seems to be to use to enclose quoted speech > *inline*, and use for quoting larger blocks of text. The P4 TEI > manual was a bit vague on this, but that seems to be a sensible > convention worth using. That would be presentational markup and very against the TEI specs. The specs are very detailed on this: 6.3.3 Quotation This section discusses the following elements, all of which are often rendered by the use of quotation marks: * contains a quotation or apparent quotation ? a representation of speech or thought marked as being quoted from someone else (whether in fact quoted or not); in narrative, the words are usually those of of a character or speaker; in dictionaries, q may be used to mark real or contrived examples of usage. * contains a phrase or passage attributed by the narrator or author to some agency external to the text. * A quotation from some other document, together with a bibliographic reference to its source. * contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics. One form of presentational variation found particularly frequently in written and printed texts is the use of quotation marks. 
As with the typographic variations discussed in the preceding section, it is generally helpful to separate the encoding of the underlying textual feature (for example, a quotation or a piece of direct speech) from the encoding of its rendering (for example, the use of a particular style of quotation marks). The most common and important use of quotation marks is, of course, to mark quotation, by which we mean simply any part of the text attributed by the author or narrator to some agency other than the narrative voice. Typical examples include passages cited from other works, for which the element may be used, and words or phrases attributed to other voices within the current work, for which the element may be used. If this distinction between intra-textual and inter-textual voices cannot be made reliably, or is not of interest, then all quoted matter may simply be marked using the tag. The editorial policy in this respect should be stated in the encoding description of the TEI Header. The element is used for cases where the author or narrator distances him or herself from the words in question without however attributing them to any other voice in particular. http://www.tei-c.org/P4X/CO.html#COHQQ > As I understand this (from an earlier post), 'rend="display"' is > supposed to mean that the block should be indented (rather like the HTML > blockquote). > > This seems like a very poor choice of terms to me. CSS has a "display" > property, which can take values such as "inline", "block", and -- > crucially -- "none". "display:none" is used where you don't want the > content displayed at all. > > So using this rend="display" seems likely to result in confusion. > > In any case, the choice is poor because it does not convey the > information desired. If you use on its own without > rend="display", does that indicate you don't want to display the > content? Or that you don't want to indent it? 
"These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines." -- http://www.tei-c.org/P4X/ref-GLOBAL.html Thus we are perfectly right in making up a convention of our own. But TEI is not CSS. Although CSS and the rend attribute are both purely presentational we should not mix TEI and CSS conventions. The "display" choice may be poor but it is exactly the same choice Sebastian Rahtz made in his stylesheets. Look at the code in: http://www.tei-c.org/Stylesheets/P4/html/teihtml-misc.xsl While not dictated by TEI specs, using rend="display" makes our convention compatible with Sebastian's stylesheets. Also, using would be a still poorer choice because the rend attribute is global and can be used on all TEI elements.
is perfectly valid TEI and it would be quite counter-intuitive to have it set a display margin around the block, whereas
makes quite clear what you want.

> This was thrashed out at great length almost a year ago. Basically,
> while purists will see enormous merit in using <q> instead of quote
> marks, the practical approach is to stick with the quote marks, due to
> reasons outlined by another poster. (The terminating quote question with
> multi-paragraph quotes.)

Using <q> has advantages:

 - automatically finds quotation mark errors
 - renderer can use prettiest quote in output format, e.g. plain ugly apostrophe in TXT and pretty typographical quotes in PDF.
 - automatically extract quotes from text

and disadvantages:

 - more work

The argument about the terminating quote character in multi-paragraph quotes is moot since there is a way to deal with it:

He said: Blah.

And blah.

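As a rough illustration of the rendering step under discussion, the sketch below shows how a renderer might supply the quotation marks for a quotation that runs over several paragraphs: every paragraph gets an opening mark, and only the last one gets a closing mark. It is written in Python purely for illustration; the function name and its parameters are made up for this example and are not part of PGTEI, the TEI stylesheets, or any existing tool.

```python
def render_quoted_paragraphs(paragraphs, open_mark='"', close_mark='"'):
    """Render a multi-paragraph quotation in the traditional printed style.

    Hypothetical helper: each paragraph starts with an opening quotation
    mark, but only the final paragraph ends with a closing one.
    """
    rendered = []
    last = len(paragraphs) - 1
    for i, text in enumerate(paragraphs):
        # Only the final paragraph of the quotation is closed.
        closing = close_mark if i == last else ""
        rendered.append(f"{open_mark}{text}{closing}")
    return rendered


if __name__ == "__main__":
    for line in render_quoted_paragraphs(["Blah.", "And blah."]):
        print(line)
```

A renderer built this way can also swap in typographical quotes for PDF output by passing different open_mark and close_mark values, which is the second advantage listed above.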
--
Marcello Perathoner webmaster@gutenberg.org

From traverso at dm.unipi.it Sat Oct 30 03:03:51 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sat Oct 30 03:04:15 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <418342E2.5080906@adelaide.edu.au> (message from Steve Thomas on Sat, 30 Oct 2004 16:59:38 +0930)
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au>
Message-ID: <200410301003.i9UA3p43031055@posso.dm.unipi.it>

It is usual, in French typography (and in French typewriting too, btw) to include a half-width, non-breaking space before "broken punctuation", i.e. [:;!?].

Some typesetting engines (e.g. TeX through the \frenchspacing declaration, and LaTeX through the \usepackage[francais]{babel} header), implement this convention. So the TeX source should not contain these spaces, which will be included by the rendering engine. Putting in and out these spaces can of course be automated.

What should be done to encode correctly a French text in TEI, and what is (should be) done by the text rendering engine? For French text in ISO-Latin it is customary to include a full non-breaking space; in Unicode half-width spaces should be used.

Similar conventions apply for em-dashes; here however spaces can be broken, so half-width (breaking) spaces can be used instead.
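A minimal sketch of how putting these spaces in and taking them out could be automated, in Python. The function names are hypothetical, and the choice of U+202F NARROW NO-BREAK SPACE for Unicode output and U+00A0 NO-BREAK SPACE for Latin-1 output is an assumption based on the convention described above, not something PGTEI or TEI prescribes. The pattern is deliberately naive: it would also touch colons in URLs or times, so a real tool would need to restrict it to running French text.

```python
import re

NBSP = "\u00a0"         # NO-BREAK SPACE, available in ISO Latin-1
NARROW_NBSP = "\u202f"  # NARROW NO-BREAK SPACE, Unicode only (assumed choice)

# Any run of ordinary or no-break spaces followed by one of : ; ! ?
# The lookbehind requires a preceding character that is neither whitespace
# nor one of these marks, so "?!" is treated as a single cluster.
_HIGH_PUNCT = re.compile(r"(?<=[^\s:;!?])[ \u00a0\u202f]*([:;!?])")


def add_french_spacing(text, unicode_output=True):
    """Insert the conventional no-break space before : ; ! ? (rendering side)."""
    space = NARROW_NBSP if unicode_output else NBSP
    return _HIGH_PUNCT.sub(lambda m: space + m.group(1), text)


def strip_french_spacing(text):
    """Remove any space before : ; ! ? (for the encoded master text)."""
    return _HIGH_PUNCT.sub(r"\1", text)


if __name__ == "__main__":
    print(repr(add_french_spacing("Il dit: non; vraiment?")))
    print(repr(strip_french_spacing("Quoi\u00a0?")))
```

Keeping the master text free of these spaces and adding them at render time means the same source can produce both a Latin-1 text file and a Unicode or typeset version.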
Carlo

From traverso at dm.unipi.it Sat Oct 30 03:34:49 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Sat Oct 30 03:35:14 2004
Subject: [gutvol-d] PGTEI and more
In-Reply-To: <41836651.3040107@perathoner.de> (message from Marcello Perathoner on Sat, 30 Oct 2004 12:00:49 +0200)
References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de>
Message-ID: <200410301034.i9UAYnHO024145@posso.dm.unipi.it>

>>>>> "Marcello" == Marcello Perathoner writes:

Marcello> Steve Thomas wrote:
>> The common advice seems to be to use <q> to enclose quoted
>> speech *inline*, and use <quote> for quoting larger blocks of
>> text. The P4 TEI manual was a bit vague on this, but that seems
>> to be a sensible convention worth using.

Marcello> That would be presentational markup and very against the
Marcello> TEI specs. The specs are very detailed on this:

If TEI has to be used only semantically, then it is inadequate for PG needs. PG markup has to contain presentational elements, in such a way that one can obtain presentations "faithful to the original". A PG-TEI encoded text should allow one to call a transform to a presentation form with an "original" formatting specification, allowing one to recover whatever was in the original (as well as other specifications allowing one to change it). This might include (referring to quotations) the possibility of rendering a quoted section with running quotation marks at the start of each line.

One should never forget that presentation IS semantic: this is evident with heavily formatted poetry (Mallarmé's "Un coup de dés jamais n'abolira le hasard" is a quite extreme case), but in some form or another it is always true.
Carlo From jeroen at bohol.ph Sat Oct 30 04:47:13 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sat Oct 30 04:47:07 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <41836651.3040107@perathoner.de> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com><418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> Message-ID: <41837F41.2000302@bohol.ph> Marcello Perathoner wrote: > Steve Thomas wrote: > >> The common advice seems to be to use to enclose quoted speech >> *inline*, and use for quoting larger blocks of text. The P4 >> TEI manual was a bit vague on this, but that seems to be a sensible >> convention worth using. > > > That would be presentational markup and very against the TEI specs. > The specs are very detailed on this: > I do not agree with this, especially not in the context of pre-existing books, for a number of reasons. 0. TEI is highly flexible, and prescribes fairly little. You choose what elements you wish to mark up and which not. 1. Quotations do not nest well with paragraphs. TEI (or XML) do not provide mechanisms to properly represent overlapping hierarchies. Older books can be quite difficult to mark up this way, as closing marks are often missing, etc. (I can provide examples) 2. Quotation marks can be considered part of the content, and thus should be retained. Adding elements to these parts is fully optional, and I would only provide these if I have a good reason to do so, as indicated in Marcello's mail. (and I would add, if you would like to create an aural style sheet, and have parts spoken by different voices, they also make sense, just as providing expantions of abbreviations, etc.!) 3. Adding to all quotations (even with help of a script) is labour intensive, and adds little value. > > > > The argument about the terminating quote character in multi-paragraph > quotes is moot since there is a way to deal with it: > >

He said: Blah.

> >

And blah.

> And you will need a very smart renderer to correctly supply them, leave the quotation marks intact (inside or outside the ) or provide cumbersome rend attributes. > > > From jeroen at bohol.ph Sat Oct 30 04:49:25 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sat Oct 30 04:49:17 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <200410301003.i9UA3p43031055@posso.dm.unipi.it> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com><418342E2.5080906@adelaide.edu.au> <200410301003.i9UA3p43031055@posso.dm.unipi.it> Message-ID: <41837FC5.60208@bohol.ph> Carlo Traverso wrote: >It is usual, in freench typography (and in french typewriting too, >btw) to include an half-width, non-breaking space before "broken >punctiation", i.e. [:;!?]. > > > My own go would be to ignore it in the encoded version, and let the rendering process deal with it. Jeroen. PS. The dutch story about Pisa is now in PP. Hope to post it somewhere next week -- and ofcourse will prepare a TEI version. From marcello at perathoner.de Sat Oct 30 04:49:21 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 04:49:47 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <200410301034.i9UAYnHO024145@posso.dm.unipi.it> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> Message-ID: <41837FC1.8080704@perathoner.de> Carlo Traverso wrote: > Marcello> Steve Thomas wrote: > > >> The common advice seems to be to use to enclose quoted > >> speech *inline*, and use for quoting larger blocks of > >> text. The P4 TEI manual was a bit vague on this, but that seems > >> to be a sensible convention worth using. > > Marcello> That would be presentational markup and very against the > Marcello> TEI specs. The specs are very detailed on this: > > If TEI has to be used only semantically, then it is inadequate for PG > needs. 
PG markup has to contain presentational elements, in such a way > that one can obtain presentations "faithful to the original". I didn't say that. I said that using and to markup inline and block quotes respectively was wrong. In TEI all of the presentational stuff should be done with the rend attribute. As to the "faithful to the original" debate: Most people are far too much enamoured of exactly replicating the one edition of the text they happen to work on. (I can understand people wanting to faithfully replicate a Shakespeare First Folio, but not the books PG usually produces.) Most of the presentational attributes of any edition of a text are just whims of the publisher. Who cares if the authors name was printed in Zapf Chancery Slanted 17,4 pt gold embossed with 0.1em of extra inter-character spacing added? If you get a different edition of the same work the authors name will be printed in a very different font. The best guess is to just encode that this is the authors name. > One should never forget that presentation IS semantic: this is evident > with heavily formatted poetry, (Mallarme's "Un coup de des jamais > n'abolira le hazard" is a quite extreme case) but in some form or > another it is always true. That is a half truth at the best. Presentation encodes semantics, but it is a lossy encoding. The same presentational attribute "italics" can encode a wide range of semantic features like "emphasis", "foreign word", "name", etc. If presentation could losslessly encode semantics, and an accepted standard existed how to do this, a program could recover the semantics from the presentation and mark up a text all by itself. But then, if a program can guess, why mark up at all? This is Bowerbirds ZML approach. What Bowerbird does not understand is that there are far too many semantic features to make a presentational encoding reversible. 
(Technically Bowerbird is farther off the rocker still: he says that ASCII TXT can encode all semantics in the world, which is even sillier than to say that typography can.) Mathematically speaking: Let PRE be the set of all presentational attributes that can reasonably be distinguished by human eye, and SEM be the set of all semantics. Then there is no bijective function PRE = f (SEM) Thus we can say "presentation hints at semantics" but not "presentation IS semantic". -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sat Oct 30 05:35:12 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 05:35:16 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <41837F41.2000302@bohol.ph> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com><418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <41837F41.2000302@bohol.ph> Message-ID: <41838A80.7050808@perathoner.de> Jeroen Hellingman wrote: > 0. TEI is highly flexible, and prescribes fairly little. You choose what > elements you wish to mark up and which not. Yes. But *if* you mark up you have to use the right element. Using for all displayed quotes is wrong. > 1. Quotations do not nest well with paragraphs. TEI (or XML) do not > provide mechanisms to properly represent overlapping hierarchies. Older > books can be quite difficult to mark up this way, as closing marks are > often missing, etc. (I can provide examples) I have marked up a lot of books with multi-paragraph quotations. I also have a script that replaces most quotation signs with and even gets right most of the time. I found a lot of quotation mark errors in PG texts this way. > 2. Quotation marks can be considered part of the content, and thus > should be retained. Adding elements to these parts is fully > optional, and I would only provide these if I have a good reason to do > so, as indicated in Marcello's mail. 
(and I would add, if you would like > to create an aural style sheet, and have parts spoken by different > voices, they also make sense, just as providing expantions of > abbreviations, etc.!) 1. Quotation marks are just presentational markup for "this is a quote", no more than italic is presentational markup for "this is emphasized". You should retain the underlying semantic feature not the presentation. 2. Replacing quotation signs with will actually preserve them *better*. Unless you replace all apostroph chars with the correct lsquo and rsquo characters or entities, almost every output will look nearer to the original if the renderer can insert the correct unicode lsquo rsquo glyphs. (Note: its difficult for a renderer to guess from context if it should render apos as apos, lsquo or rsquo, but it is easy to transform and .) But *if* you replace apos with lsquo and rsquo you may as well replace it with and . But of course all this discussion is moot, because my converter supports both ways and you can do as you like. > 3. Adding to all quotations (even with help of a script) is labour > intensive, and adds little value. Not at all. The script finds most of these. The validator finds some more. Then you make a last pass in the editor with a regexp search. (Of course doing Mark Twain will take a little longer.) -- Marcello Perathoner webmaster@gutenberg.org From brad at chenla.org Sat Oct 30 06:12:51 2004 From: brad at chenla.org (Brad Collins) Date: Sat Oct 30 06:14:35 2004 Subject: [gutvol-d] "Chapter n" as title vs. something else In-Reply-To: <41834514.5040201@adelaide.edu.au> (Steve Thomas's message of "Sat, 30 Oct 2004 17:09:00 +0930") References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> <41834514.5040201@adelaide.edu.au> Message-ID: Steve Thomas writes: > ALl you need is this: > >
> CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. > > The "rendering agent" can then, if desired, use the type and n > attributes to generate the additional "Chapter 1" heading. > > > Steve This should work, at least on the vast majority of modern texts. And I agree that the purpose of marking up a text is to mark up the content of the text, not duplicate the original layout or typography. PG is producing electronic editions, not electronic facsimiles of an original. This is no problem in texts like A Christmas Carol where Chapters are called `staves', but what if the text uses an alternate spelling for the word chapter, or only uses numbers or spells out the number into words? For example:

  1
  i.
  Chapter One
  CHAPTER ONE
  chapter 1.
  Chap 1.
  CH I
  First Chapter

The type attribute should generally use enumerated values so that processing software can understand that all of these different forms of the concept `chapter' are the same. We could normalize all headings, no matter what the original was, but I would prefer to keep the original. In rare cases it reflects authorial intent or is a stylistic element in the overall flow of the work. For A Christmas Carol I would rather use Scott's approach:
STAVE ONE. MARLEY'S GHOST. Rather than:
In this way processing software would understand that a stave is a chapter when it looks up a reference in another work which points to Chapter 1, page 4 in the Carol. I also think that the type of label should be stated clearly. There might be many kinds of labels in a complex document. ------ Ack! Before sending this I had a look through the TEI manual and found this example: Libro Primo Which is somewhere between Scott's idea and and Steve's. If you used this for the Carol it might look like this:
MARLEY'S GHOST. This preserves the type value as an enumerated value, but using the `n' value as a text string rather than an integer makes it more difficult for processing agents to understand the structure of the text. I would prefer that the `n' value be an integer and use the head/label approach. I believe that as a general rule attribute values should be used for items which help process a text, or clarify the meaning of a text, rather than for any part of the text which is displayed. The spec defines the datatype for `n' as CDATA. So `Stave 1' is a legal value, but I would seriously consider making the value more restrictive. All in all I think I still like Scott's approach but I'm still open to any better suggestions. BTW: This has been a fantastic discussion and has helped me clarify a lot of details in using TEI which I hadn't completely worked out before. This is very difficult stuff folks, and b/ -- Brad Collins , Bangkok, Thailand From marcello at perathoner.de Sat Oct 30 06:22:36 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 06:22:42 2004 Subject: [gutvol-d] draft TEI conventions and larger example file In-Reply-To: <418144CD.7070707@bohol.ph> References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com> <418144CD.7070707@bohol.ph> Message-ID: <4183959C.6010806@perathoner.de> Jeroen Hellingman wrote: > As I promised already some time ago, I've prepared a draft TEI Lite > conventions document, 1. The main point of incompatibility with my proposal is the lack of support for plain
. I support both,
and . I think you should also. 2. The rend attributes are ill-chosen and need reworking. rend is a global attribute and can be used on all TEI elements. It is counter-intuitive to make the effect dependent on the element.
floats the picture to the left

makes a ragged-right paragraph better use

3. The URLs have to be changed. www.gutenberg.org/css/ is already taken for the site CSS and I don't want to mix those with the book CSS. www.gutenberg.org/xslt/ and www.gutenberg.org/dtd/ are off the main directory. I try to keep the number of subdirectories in the main directory to a minimum. Proposal: one directory off the main with a hierarchy to accommodate all XSLT stuff by different people.

  www.gutenberg.org/tei/
  www.gutenberg.org/tei/jeroen/
  www.gutenberg.org/tei/jeroen/css/
  www.gutenberg.org/tei/jeroen/dtd/
  www.gutenberg.org/tei/jeroen/xslt/
  www.gutenberg.org/tei/marcello/
  www.gutenberg.org/tei/marcello/css
  www.gutenberg.org/tei/marcello/dtd
  www.gutenberg.org/tei/marcello/xslt
  etc.

The prefix www.gutenberg.org/tei/jeroen/ should also be used for all your namespaces. Is this ok? -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Sat Oct 30 06:43:24 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Oct 30 06:42:35 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <418342E2.5080906@adelaide.edu.au> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> Message-ID: <41839A7C.8090500@hutchinson.net> Steve Thomas wrote: > > As I understand this (from an earlier post), 'rend="display"' is > supposed to mean that the block should be indented (rather like the > HTML blockquote). > > This seems like a very poor choice of terms to me. CSS has a "display" > property, which can take values such as "inline", "block", and -- > crucially -- "none". "display:none" is used where you don't want the > content displayed at all. > > So using this rend="display" seems likely to result in confusion. > > In any case, the choice is poor because it does not convey the > information desired. If you use <quote> on its own without > rend="display", does that indicate you don't want to display the > content? Or that you don't want to indent it? > > I personally don't see any need to use rend here.
If you are quoting a > passage from some other work, then enclose it in .. . > That's enough. When someone comes to present this (e.g. in an HTML > version), the most natural thing would be to convert the tag to > blockquote. The rend is redundant. You know... Thank you, Steve. When I read this, I had a "duh!" moment and slapped my head. You are absolutely right. *should* just result in a blockquote when converted to HTML. The rend=display is redundant here. Josh From brad at chenla.org Sat Oct 30 06:41:35 2004 From: brad at chenla.org (Brad Collins) Date: Sat Oct 30 06:43:17 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <41837FC1.8080704@perathoner.de> (Marcello Perathoner's message of "Sat, 30 Oct 2004 13:49:21 +0200") References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> <41837FC1.8080704@perathoner.de> Message-ID: Marcello Perathoner writes: > Carlo Traverso wrote: > >> Marcello> Steve Thomas wrote: >> >> The common advice seems to be to use to enclose quoted >> >> speech *inline*, and use for quoting larger blocks of >> >> text. The P4 TEI manual was a bit vague on this, but that seems >> >> to be a sensible convention worth using. >> Marcello> That would be presentational markup and very against >> the >> Marcello> TEI specs. The specs are very detailed on this: >> If TEI has to be used only semantically, then it is inadequate for >> PG >> needs. PG markup has to contain presentational elements, in such a way >> that one can obtain presentations "faithful to the original". > Marcello of course is completely correct, but that doesn't mean that Steve is wrong.... Einstein didn't invalidate Newton, he refined Newton. That's how progressive passes of markup should work. A lot of people are coming to TEI from an HTML background. It's the 'ol when the only tool you have is a hammer, everything begins to look like a nail. 
And in a way, as a general rule you could say that <q> is for inline and <quote> is for block quotes. And many times you'd be right, even though many times it would be for the wrong reasons. Steve has voiced a sort of first-pass rule of thumb. It's a bit like the tag in TEI which isn't terribly semantic when used as a first pass general markup tag. I would love to see a defined first-pass set of markup tags which would be as easy as HTML to learn and apply. This would help enormously in early stages of markup which could then be done by folks who haven't spent long lonely hours poring over the TEI manual and then testing chunks of code in nxml-mode (an XML editing mode in Emacs). b/ Who is bloody thankful the sun just went down after a blistering day in the big shitty.... sometimes I wish I could afford air-con. And the hot season is still yet to come. -- Brad Collins , Bangkok, Thailand From marcello at perathoner.de Sat Oct 30 06:49:04 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 06:49:09 2004 Subject: [gutvol-d] "Chapter n" as title vs. something else In-Reply-To: References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> <41834514.5040201@adelaide.edu.au> Message-ID: <41839BD0.2010405@perathoner.de> Brad Collins wrote: > For A Christmas Carol I would rather use Scott's approach: > >

> STAVE ONE. > MARLEY?S GHOST. > > Rather than: > >
You'll also have to consider XPath queries. In a couple of years we'll likely put all of the PG TEI files into a giant XML database. No more files. You'll retrieve a book with an XPath query like (simplified):

  /org/gutenberg/etext/12345

You'll get the book title(s) with

  /org/gutenberg/etext/12345//titleStmt/title

and the title of the first chapter with

  /org/gutenberg/etext/12345//div[@type="chapter"][@n=1]/head

Of course this will only work if the first chapter always has attribute type="chapter" and attribute n=1 and not n="I" or n="Chapter 1" or n="Chapter I" ... -- Marcello Perathoner webmaster@gutenberg.org From jeroen at bohol.ph Sat Oct 30 06:52:24 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sat Oct 30 06:51:27 2004 Subject: [gutvol-d] draft TEI conventions and larger example file In-Reply-To: <4183959C.6010806@perathoner.de> References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com><418144CD.7070707@bohol.ph> <4183959C.6010806@perathoner.de> Message-ID: <41839C98.4000702@bohol.ph> Marcello Perathoner wrote: > 1. The main point of incompatibility with my proposal is the lack of > support for plain
. I support both,
and . I think you > should also. It means some extra programming from my side. I would like to see the various people working on this issue converge to a single standard. > 2. The rend attributes are ill-chosen and need reworking. > > rend is a global attribute and can be used on all TEI elements. It is > counter-intuitive to make the effect dependent on the element. > >
floats the picture to the left >

makes a ragged-right paragraph > > better use > >

>

> I agree the rendition ladder approach is much better. The "simple" rend attributes are actually quick hacks. I've looked in your code, and the generic code for rend attributes you use is a much better way to deal with it. Quite some work needs to be done though before rendition ladders are fully supported. > > 3. The urls have to be changed. > > www.gutenberg.org/css/ is already taken for the site css and I don't > want to mix those with the book css. > > www.gutenberg.org/xslt/ and www.gutenberg.org/dtd/ are off the main > directory. I try to keep the number of subdirectories in the main > directory to a minimum. > > Proposal: one directory off the main with a hierarchy to accomodate > all xslt stuff by different people. > > www.gutenberg.org/tei/ > www.gutenberg.org/tei/jeroen/ > www.gutenberg.org/tei/jeroen/css/ > www.gutenberg.org/tei/jeroen/dtd/ > www.gutenberg.org/tei/jeroen/xslt/ > www.gutenberg.org/tei/marcello/ > www.gutenberg.org/tei/marcello/css > www.gutenberg.org/tei/marcello/dtd > www.gutenberg.org/tei/marcello/xslt > etc. > > The prefix www.gutenberg.org/tei/jeroen/ should also be used for all > your namespaces. > That sounds like a good proposal, how do others think about it, especially if books in the books hierarchy start referencing to these things? Currently, they are rather self-contained things, with all required stuff in one place. Doing this will basically require us to keep things in the generic directories downwards compatible with texts posted before. 
Jeroen From brad at chenla.org Sat Oct 30 06:59:22 2004 From: brad at chenla.org (Brad Collins) Date: Sat Oct 30 07:01:05 2004 Subject: [gutvol-d] draft TEI conventions and larger example file In-Reply-To: <4183959C.6010806@perathoner.de> (Marcello Perathoner's message of "Sat, 30 Oct 2004 15:22:36 +0200") References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com> <418144CD.7070707@bohol.ph> <4183959C.6010806@perathoner.de> Message-ID: Marcello Perathoner writes: > Jeroen Hellingman wrote: > > 2. The rend attributes are ill-chosen and need reworking. > > rend is a global attribute and can be used on all TEI elements. It is > counter-intuitive to make the effect dependent on the element. > >

floats the picture to the left >

makes a ragged-right paragraph > > better use > >

>

Should `rend' then be a means of passing CSS to a processor? I see a lot of people using the `rend' attribute as a means of dumping in presentational instructions, when it should be used as a means of describing the original: ,----[ TEI Manual: Global Attributes ] | rend (rendition or presentation) indicates how the element in question | was rendered or presented in the source text. | Datatype: CDATA | Values: any string of characters; if the typographic rendition | of a text is to be systematically recorded, a | systematic set of values for the rend attribute should | be defined. | Default: #IMPLIED `---- I would suggest that PG define a set of enumerated values for `rend' which then can be mapped to CSS. That is more restrictive and requires changing the datatype. Any thoughts? b/ -- Brad Collins , Bangkok, Thailand From jon at noring.name Sat Oct 30 07:57:30 2004 From: jon at noring.name (Jon Noring) Date: Sat Oct 30 07:57:44 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <200410301034.i9UAYnHO024145@posso.dm.unipi.it> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> Message-ID: <1181387224250.20041030085730@noring.name> Carlo wrote: > Marcello wrote >> Steve Thomas wrote: >>> The common advice seems to be to use to enclose quoted >>> speech *inline*, and use for quoting larger blocks of >>> text. The P4 TEI manual was a bit vague on this, but that seems >>> to be a sensible convention worth using. >> That would be presentational markup and very against the >> TEI specs. The specs are very detailed on this: > If TEI has to be used only semantically, then it is inadequate for PG > needs. PG markup has to contain presentational elements, in such a way > that one can obtain presentations "faithful to the original". 
Is this a requirement that it be possible *without some manual work* to regenerate the typographic layout of the source document? And what impact does this attempt to be 'faithful to the original' have on accessibility and non-visual uses of the PG texts? > A PG-TEI encoded text should allow to call a transform to a > presentation form with an "original" formatting specification, > allowing to recover whatever was in the original, (as well as other > specifications allowing to change it). This might include, (referring > to quotations), the possibility of rendering a quoted section with > running quotation marks at the start of each line. This implies, for example, that "long-s" characters, common in pre-19th century English texts, should be preserved (e.g., use the Unicode character equivalent). For modern usage someone can later transform all Unicode "long-s" characters to the ordinary "s". But to do it the other way around is more difficult. (Yes, a special character is not usually a "presentation" issue, but in this case it has become a modern presentation issue.) > One should never forget that presentation IS semantic: this is evident > with heavily formatted poetry, (Mallarme's "Un coup de des jamais > n'abolira le hazard" is a quite extreme case) but in some form or > another it is always true. I disagree with this in a general sense. Presentation is most used to communicate document structure and sometimes the semantics of particular chunks of content (e.g., "this is a foreign phrase".) In a few cases visual layout becomes part of content itself ("poetry as visual art"). In these rare cases I believe that SVG should be used since there are facilities in SVG for accessibility, and SVG will truly get it exactly right all the time. SVG is XML-based, too. 
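A minimal Python sketch of the long-s point made earlier in this message, assuming the source text stores the long s as U+017F (LATIN SMALL LETTER LONG S). The function name is hypothetical. It illustrates why the modernizing direction is trivial while the reverse is not: once converted, spellings with and without the long s become indistinguishable.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S


def modernize_long_s(text: str) -> str:
    """Replace every long s with an ordinary 's' (hypothetical helper)."""
    return text.replace(LONG_S, "s")


if __name__ == "__main__":
    original = "Congre\u017fs"   # long s kept in the master text
    modern = "Congress"          # modern spelling
    # Both spellings collapse to the same output, so the original
    # distribution of long s cannot be recovered from the result alone.
    print(modernize_long_s(original) == modernize_long_s(modern))
```

Going the other way would need period-specific spelling rules or a human proofreader, which is the argument for keeping the long s in the master text.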
Jon Noring From marcello at perathoner.de Sat Oct 30 08:27:19 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 08:27:26 2004 Subject: [gutvol-d] draft TEI conventions and larger example file In-Reply-To: References: <20041028153805.353A32F97A@ws6-3.us4.outblaze.com> <418144CD.7070707@bohol.ph> <4183959C.6010806@perathoner.de> Message-ID: <4183B2D7.6010109@perathoner.de> Brad Collins wrote: >>

>>

> > Should `rend' then be a means of passing CSS to a processor? Not necessarily CSS. I used those values as example. > I see a lot of people using the `rend' attribute as a means of > dumping in presentational instructions, when it should be used as a > means of describing the original: Of course, you should use rend judiciously, and only to preserve the rendition of the original when you feel it needs recording. If we are going to define a set of values we will most probably end up with something resembling CSS very much. -- Marcello Perathoner webmaster@gutenberg.org From sly at victoria.tc.ca Sat Oct 30 09:40:42 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Oct 30 09:40:53 2004 Subject: [gutvol-d] "Chapter n" as title vs. something else In-Reply-To: References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> <41834514.5040201@adelaide.edu.au> Message-ID: H. G. Wells texts very often use: Chapter the First Chapter the Second Chapter the Third etc. Andrew On Sat, 30 Oct 2004, Brad Collins wrote: > This is no problem in texts like A Christmas Carol where Chapters are > called `staves', but what if the text uses an alternate spelling for > the word chapter, or only uses numbers or spells out the number into > words? For example. > > 1 > i. > Chapter One > CHAPTER ONE > chapter 1. > Chap 1. 
> CH I > First Chapter > From gbnewby at pglaf.org Sat Oct 30 11:20:32 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Oct 30 11:20:33 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <1181387224250.20041030085730@noring.name> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> <1181387224250.20041030085730@noring.name> Message-ID: <20041030182032.GA7344@pglaf.org> On Sat, Oct 30, 2004 at 08:57:30AM -0600, Jon Noring wrote: > Carlo wrote: > > If TEI has to be used only semantically, then it is inadequate for PG > > needs. PG markup has to contain presentational elements, in such a way > > that one can obtain presentations "faithful to the original". > > Is this a requirement that it be possible *without some manual work* > to regenerate the typographic layout of the source document? I have not heard this as a requirement. Of course, some eBook producers might believe it's valuable, and they are welcome to prepare their work to be "typographyically correct" (whatever that might mean to them). However, it *is* a requirement to automatically regenerate plain text and HTML (perhaps other formats as desired) from the XML. -- Greg From jon at noring.name Sat Oct 30 11:43:12 2004 From: jon at noring.name (Jon Noring) Date: Sat Oct 30 11:43:30 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: <20041030182032.GA7344@pglaf.org> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> <1181387224250.20041030085730@noring.name> <20041030182032.GA7344@pglaf.org> Message-ID: <251400766265.20041030124312@noring.name> Greg Newby wrote: > Jon Noring wrote: >> Carlo wrote: >>> If TEI has to be used only semantically, then it is inadequate for PG >>> needs. 
PG markup has to contain presentational elements, in such a way >>> that one can obtain presentations "faithful to the original". >> Is this a requirement that it be possible *without some manual work* >> to regenerate the typographic layout of the source document? > I have not heard this as a requirement. Of course, some eBook > producers might believe it's valuable, and they are welcome to prepare > their work to be "typographyically correct" (whatever that might mean > to them). My question was more rhetorical than inquisitive. The discussion shows there are different views on the issue of what we preserve, in a presentational sense, of the original source document. For me, only rarely must the typographic layout be reproduced in some manner (such as "poetry as visual art" and a few other rarities as have been brought out here). And for this, I recommend using SVG rather than trying to use presentational markup plus CSS to effect the desired result in the digital text version. I've previously commented on eschewing tabs and spaces for poetry/verse used to preserve visual indentation (my view is to use structural or semantic markup instead -- and where poetry moves into the visual art realm, then use SVG). Whether to preserve the "long-s" or not is more problematic, since where do we draw the line? For example, if we have an old Russian text, do we transliterate the character set to Latin? Of course we don't. Isn't the "long-s" a variant character in use at the time of publication? It is easy to auto-convert the Unicode equivalent of the 'long-s' character to an ordinary 's' (as it is for the German Eszett), but going the other way is much more difficult. > However, it *is* a requirement to automatically regenerate plain text > and HTML (perhaps other formats as desired) from the XML. Definitely! Both repurposability and accessibility are vital.
My view is that, as much as possible, make the final master digital content as agnostic with respect to presentation type as possible. And in the rare instances this is not possible, then use SVG, which when done right allows much better accessibility and repurposeability. If enough agree here, we might want to begin discussing how to integrate islands of SVG within the TEI framework. Jon Noring From scott_bulkmail at productarchitect.com Sat Oct 30 15:33:03 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Sat Oct 30 15:36:34 2004 Subject: [gutvol-d] capture original presentation? In-Reply-To: <251400766265.20041030124312@noring.name> References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> <1181387224250.20041030085730@noring.name> <20041030182032.GA7344@pglaf.org> <251400766265.20041030124312@noring.name> Message-ID: I've taken the liberty of starting a new thread since I think this issue is important. It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup. It's fine if others don't place any or much importance on that goal, but I hope they will still contribute TEI/markup knowledge so that this choice is supported. Here, I think we can have our cake and eat it too. >Jon Noring typed: > >My view >is that, as much as possible, make the final master digital content as >agnostic with respect to presentation type as possible. And in the >rare instances this is not possible, then use SVG I think there's a better middle ground here. Yes, SVG is useful in "extreme" cases, but I don't think it addresses the primary use case. My suggestion is that structural markup is *required*, and additional presentational markup is *optional*. For those who want an agnostic master file, just ignore the presentational markup -- i.e. 
we have to design the XML so that the presentation is clearly distinct from structure. Paraphrasing what Brad said to me in an earlier thread, the "rend" attribute describes the original presentation but doesn't enforce any specific output presentation. Here's an example where SVG is clearly overkill: --Introduction-- 1. The Cyclone 2. The Council with the Munchkins Structural markup and regeneration would yield: Introduction 1. The Cyclone 2. The Council with the Munchkins That's perfectly reasonable, and may suffice for most people. I just want there to be a way for those who think it's worth the effort to capture the former presentation in the master file. For example: Introduction Or, using Marcelo's index tag: Introduction (In both cases, -- should probably be — and there may be a better solution than hardcoding the leading spaces.) -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From scott_bulkmail at productarchitect.com Sat Oct 30 15:34:58 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Sat Oct 30 15:36:39 2004 Subject: [gutvol-d] "Chapter n" as title vs. something else In-Reply-To: <418222C3.10402@perathoner.de> References: <1a5.29beadb0.2eb17a83@aol.com> <418030C5.7070505@hutchinson.net> <418222C3.10402@perathoner.de> Message-ID: >>

>> The Romance of Merlin. >> CHAPTER I >> CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. >> >>Thoughts? > >The book title is at a different level from a chapter title so it gets its own div. If you find multiple chapter titles, you decide which is the main one and which are subtitles. > >
> The Romance of Merlin > >
> Chapter I > > Consultations of Devils, and Birth of Merlin I want to make sure that I understand how
fits into the big picture. Is the following correct? ... table of contents, introduction ...
The Romance of Merlin
Chapter I ... or type="label" Consultations of Devils, and Birth of Merlin If so: 1. I agree that it's consistent, and may be the best TEI-centric solution 2. it introduces a level of hierarchy that some may find confusing Whether #2 is important depends in part on who will be doing the most markup: volunteers who have some HTML experience vs. volunteers who are already TEI savvy (or don't mind the additional complexity). > Chapter I I much prefer type=label. "Chapter I" is not a subhead according to the plain meaning of the term. Also, unlike a true subhead, it may be something that some people want to strip out or translate or standardize. -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From shalesller at writeme.com Sat Oct 30 14:04:40 2004 From: shalesller at writeme.com (D. Starner) Date: Sat Oct 30 16:06:37 2004 Subject: [gutvol-d] PGTEI and more Message-ID: <20041030210440.B0B1B4C0CF@ws1-1.us4.outblaze.com> Jon Noring writes: > Whether to preserve the "long-s" or not is more problematic, since > where do we draw the line? For example, if we have an old Russian > text, do we transliterate the character set to Latin? Of course we > don't. Yes, because that would be stupid and useless. The real question is if we have an old Russian text, do we convert the fitas and other letters that are strictly redundant and hence abandoned in modern Russian to their modern Russian equivalent? Most English editions convert the long-s, and I believe most Russian editions convert the old letters to the modern equivalents. Or how about the o with e above that was the written form of o-umlaut? Do we preserve that or convert it to the modern form? There's one book in DP that preserves it in the Unicode edition because umlauts were used in a brief section, but I don't know that it was more important than a change to Fraktur or bold. 
Again, modern German that was originally printed with an o-e above is converted to umlauts when reprinted; the e above was encoded for people who wanted to use it with middle German to contrast with modern German. > Isn't the use of a "long-s" part of a variant character used at > the time of publication? It is easy to auto-convert the Unicode > equivalent of the 'long-s' character to an ordinary 's' (as it is for > the German ess-tsett), but going the other way is much more difficult. It depends. In English, most long-s usage is trivial to convert: if an s is not at the end of a word, it's a long-s. Sometimes a long-s s combination is seen for ss, but that's generally consistent within one work. In German, it takes a dictionary lookup, and there are one or two minor examples, comparable to the ones about Polish and polish in English, where it can't be automatically converted. But one question should be our readers. There are a lot of well-educated people who aren't familiar with the long-s; are we doing more good in keeping what's more a detail of the typography than of the spelling, at the cost of some of our readers? The vast majority of the editions I've seen that reprint pre-1800 English works, or German works originally printed in Fraktur, in modern fonts do not use the long-s, even when preserving original spelling. From gbnewby at pglaf.org Sat Oct 30 17:57:26 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Oct 30 17:57:27 2004 Subject: [gutvol-d] capture original presentation? 
In-Reply-To: References: <20041028162509.73285109A27@ws6-4.us4.outblaze.com> <418342E2.5080906@adelaide.edu.au> <41836651.3040107@perathoner.de> <200410301034.i9UAYnHO024145@posso.dm.unipi.it> <1181387224250.20041030085730@noring.name> <20041030182032.GA7344@pglaf.org> <251400766265.20041030124312@noring.name> Message-ID: <20041031005726.GA14737@pglaf.org> On Sat, Oct 30, 2004 at 06:33:03PM -0400, Scott Lawton wrote: > I've taken the liberty of starting a new thread since I think this issue is important. > > It's clear that some people (myself included) would like to capture more information about presentation than would be done if the goal were ONLY semantic markup. Just a quick note related to this, and my apologies if it turned up in the thread already and I missed it: We're planning to include the scanned page images along with eBooks. In fact, this is part of the intent with the new directory structure for the PG servers (the /1/0/8/0/... structure). We haven't done any (or many, anyway) because we're still trying to figure out how to best name the page files, and how to link them on a page-by-page basis into the (marked up?) eBooks. Jim Tinsley drafted some general guidelines for the image files themselves, but linking them to the eBooks is something we need to figure out still. (BTW, the Million Books project at archive.org uses djvu for this purpose. It's not bad, but I like our intended solution of XML markup much better. Plus, of course, the MBP is mostly working with relatively poor quality proofreading. For PG, the text has taken the main emphasis, not the appearance.) My notion is that the PGTEI and TEI lite solutions I've been reading about in this list will be easily adaptable to including links to specific page image files, so I've not mentioned it until now. But since it's related to your desire for preservation of the actual appearance of the scanned page, I figured I'd type it up now. 
That accomplished, please continue with your further thoughts - preserving appearance is definitely something that is frequently desired. -- Greg From scott_bulkmail at productarchitect.com Sat Oct 30 18:11:54 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Sat Oct 30 18:12:49 2004 Subject: [gutvol-d] PGTEI and more Message-ID: >>langUsage: I suggest the standard should be to omit the content of >>the tag (e.g. "British", which is probably more useful as "British >>English" or "English (British)"). This information should be >>generated to ensure consistency. (They appear in the generated PGTEI >>and in alice.tei, but not in lmiss.tei.) > >You have to include only the languages you actually use in the text. What about the content of the tag? i.e. which is correct? # lmiss.tei British # alice.tei I think the first is much better. Given the second, it will be extra work to enforce a consistent word or phrase. >The converter includes some more because it is easier to delete than to add and if you declare too many it doesn't hurt. I agree that it's easier to delete; hence my suggestion to include a note. Actually, all languages except the main one should be able to be determined programmatically, right? Just extract and dedup lang= attributes. We certainly don't want to include languages that aren't used; no point in bothering with all this XML if we're just going to populate it with wrong data. >>Having separate index tags for TOC, PDF and PDB strikes me as >>unnecessary and prone to error. Shouldn't the TOC one suffice for >>all? > >Some formats have limitations. eg. PamlDoc bookmarks have a maximum of 16 characters. PDF bookmarks have to use iso-8859-1 chars. Moreover you don't always want the full to appear in the contents. So, the PalmDoc and PDF headers can be generated to conform to those limitations. I don't see the benefit of including these extra tags for every chapter of every document in the PG collection! -- Cheers, Scott S. 
Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From marcello at perathoner.de Sat Oct 30 18:37:56 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Oct 30 18:38:12 2004 Subject: [gutvol-d] PGTEI and more In-Reply-To: References: Message-ID: <418441F4.4030708@perathoner.de> Scott Lawton wrote: > What about the content of the tag? i.e. which is correct? > > # lmiss.tei > British # alice.tei Both work. The content of the tag does not matter. The lang attribute is an IDREF. If you say lang="fr" then you must have an element somewhere in your TEI with an id of "fr"; otherwise it will not validate. The langUsage section is just a bin to hold those elements. >> Some formats have limitations. eg. PamlDoc bookmarks have a maximum >> of 16 characters. PDF bookmarks have to use iso-8859-1 chars. >> Moreover you don't always want the full to appear in the >> contents. > > So, the PalmDoc and PDF headers can be generated to conform to those > limitations. I don't see the benefit of including these extra tags > for every chapter of every document in the PG collection! How do you go about condensing a longer title into 16 characters? There is no algorithm that can do that nearly as well as a human. A human will always choose to include the most important part. CONSULTATION OF DEVILS, AND BIRTH OF MERLIN. => Birth of Merlin -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Sun Oct 31 01:18:28 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Oct 31 01:18:51 2004 Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise) Message-ID: <1b9.54e4965.2eb607e4@aol.com> john said: > Please picture this scenario: > > I'm a volunteer who has scanned a public-domain book and > wants to make it available through the PG distribution mechanism > (free of charge, available until the Internet collapses under the weight of > spam and next-generation pornography, yadda, yadda, yadda). 
> > Today, if I can convert this book to plain text (according to > some stated formatting conventions), I may submit the book. > If I'm ambitious, I can create an HTML version, which presents > the same information, but allows "real" formatting rather than > _italic_ and *bold*. > > In the background, however, there is this Whole New World(tm) > of semantic tagging, which presumably will allow the book to > make snacks and provide entertainment during the reading process. > But, for me, as a volunteer, who spends a considerable amount of time > working on books, but enjoys actually finishing one and seeing it posted, > I can't get my arms around the benefits. > > Except for recognizing the acronyms, I am agnostic to > XML/ZML/TEI/ABC/EIEIO. > > Could someone please explain the benefit of semantic tagging > and why it won't horribly lengthen the amount of time required > to produce an eBook? > > Thank you. well? -bowerbird From marcello at perathoner.de Sun Oct 31 03:57:19 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Oct 31 03:57:46 2004 Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise) In-Reply-To: <1b9.54e4965.2eb607e4@aol.com> References: <1b9.54e4965.2eb607e4@aol.com> Message-ID: <4184D31F.4030204@perathoner.de> Bowerbird@aol.com wrote: >> Could someone please explain the benefit of semantic tagging >> and why it won't horribly lengthen the amount of time required >> to produce an eBook? > > well? This has already been discussed at great length. He can go to the archives to read it up. -- Marcello Perathoner webmaster@gutenberg.org From jeroen at bohol.ph Sun Oct 31 07:46:36 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sun Oct 31 07:46:28 2004 Subject: [gutvol-d] I'm sorry but I don't get it... 
(the reprise) In-Reply-To: <4184D31F.4030204@perathoner.de> References: <1b9.54e4965.2eb607e4@aol.com> <4184D31F.4030204@perathoner.de> Message-ID: <418508DC.4080109@bohol.ph> Marcello Perathoner wrote: > Bowerbird@aol.com wrote: > >>> Could someone please explain the benefit of semantic tagging and >>> why it won't horribly lengthen the amount of time required to >>> produce an eBook? >> You'll need about one hour to add very basic level TEI tagging to a simple work, such as a novel. For scientific works with loads of tables, footnotes, foreign citations, and numerous cross references, it can take several days, but they will be increasingly required to be able to handle such works at all. The learning curve for basic TEI is not too steep, and can be learned as easy as HTML in a few hours, then as you encounter more difficult constructs, you can gradually absorb more of the stuff. For books requiring special things, we will probably end up having specialists. Important in this stage is that we will have tools available such that people can easily validate what they are doing. Jeroen. From hyphen at hyphenologist.co.uk Sun Oct 31 08:19:21 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Oct 31 08:19:38 2004 Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise) In-Reply-To: <418508DC.4080109@bohol.ph> References: <1b9.54e4965.2eb607e4@aol.com> <4184D31F.4030204@perathoner.de> <418508DC.4080109@bohol.ph> Message-ID: On Sun, 31 Oct 2004 16:46:36 +0100, Jeroen Hellingman wrote: | Marcello Perathoner wrote: | | > Bowerbird@aol.com wrote: | > | >>> Could someone please explain the benefit of semantic tagging and | >>> why it won't horribly lengthen the amount of time required to | >>> produce an eBook? | >> | | You'll need about one hour to add very basic level TEI tagging to a | simple work, such as a novel. 
For scientific works with loads of tables, | footnotes, foreign citations, and numerous cross references, it can take | several days, but they will be increasingly required to be able to | handle such works at all. | | The learning curve for basic TEI is not too steep, and can be learned as | easy as HTML in a few hours, then as you encounter more difficult | constructs, you can gradually absorb more of the stuff. For books | requiring special things, we will probably end up having specialists. | | Important in this stage is that we will have tools available such that | people can easily validate what they are doing. The last time I marked up a text by hand was, hmmmm, 19 years ago using nroff, and a great pain in the **** it was. Then someone invented WYSIWYG (What You See Is What You Get), and producing properly laid out text became a doddle. Surely you are not suggesting that we go back to the Dark Ages, nay Neolithic times? Which Windoze WYSIWYG application produces TEI tagging, to whatever standard PG proposes? -- Dave F From jeroen at bohol.ph Sun Oct 31 09:08:31 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sun Oct 31 09:08:00 2004 Subject: [gutvol-d] I'm sorry but I don't get it... (the reprise) In-Reply-To: References: <1b9.54e4965.2eb607e4@aol.com> <4184D31F.4030204@perathoner.de><418508DC.4080109@bohol.ph> Message-ID: <41851C0F.1020203@bohol.ph> Dave Fawthrop wrote: > >Which Windoze WYSIWYG application produces TEI tagging, to whatever >standard PG proposes? > > > OpenOffice can do so with some customization... Jeroen.
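On the point raised in this thread about volunteers needing tools to validate what they are doing: the smallest useful piece of such tooling is probably a well-formedness check. The following is only a sketch using Python's standard-library parser. The name `check_well_formed` is hypothetical, and the check covers well-formedness only, not validity against a TEI DTD, which would need a validating parser.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return (True, None) if xml_text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser's message includes the line and column of the problem.
        return False, str(err)
    return True, None


if __name__ == "__main__":
    print(check_well_formed("<div><head>Chapter I</head><p>Text</p></div>"))
    print(check_well_formed("<div><p>Text</div>"))  # unclosed <p>
```

A check like this catches unterminated tags before a file is posted, without altering the text in any way.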
From jmdyck at ibiblio.org Sun Oct 31 18:18:24 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Sun Oct 31 18:18:41 2004 Subject: [gutvol-d] test #5; please ignore Message-ID: <41859CF0.9667CAA5@ibiblio.org> From gbnewby at pglaf.org Sun Oct 31 20:31:33 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Oct 31 20:31:34 2004 Subject: [gutvol-d] pglaf.org settings might block some messages Message-ID: <20041101043133.GA26281@pglaf.org> I've heard in the past 2 weeks from two people who discovered that their mail server settings resulted in pglaf.org bouncing their messages to gutvol-d@lists.pglaf.org (not necessarily through fault of their own). If you've sent messages to pglaf.org, but are not sure if they got through, you might want to send a test message or check the list archives (http://lists.pglaf.org). You can set your own list settings to get your own messages, and/or an acknowledgement message. The setting we use is to enforce the standard that mail servers (aka MTAs) must have reverse DNS entries (also known as PTRs). Several of the big ISPs, such as AOL, enforce this rule. Others don't. Mail from addresses with no corresponding PTR is (at pglaf.org) well over 99.9% likely to be spam. Over 10,000 such messages are blocked per week on the pglaf.org server using this single rule. The postfix server at pglaf.org has had this setting since the spring, and while I'm sure it has bounced some legitimate messages, it's only recently that anyone has brought any false positives to my attention. -- Greg From brad at chenla.org Sun Oct 31 21:43:41 2004 From: brad at chenla.org (Brad Collins) Date: Sun Oct 31 21:45:32 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? Message-ID: I took a look at the source for the recent handsome re-release of PG's edition of A Christmas Carol (46-h). The code is a bit old,

<p> tags are not terminated, and the formatting could be tidied up a bit to make it more readable. For example, the first paragraph looked like this:

<p> Marley was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge’s name was good upon ’Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail. I ran the file through HTML-Tidy which turned it into this:

<p>Marley was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge's name was good upon 'Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail.</p>

It took about ten seconds to open the file, run it through Tidy, and save it. This resulted in a file which is consistent, standards-compliant, and far easier to read and process. Unclosed tags in HTML are an artifact of SGML; they can confuse some browsers and processing software, and limit what you can do with CSS. I suggest that all PG HTML files be run through Tidy before being released. If anyone wants the tidy'd version, let me know. b/ -- Brad Collins , Bangkok, Thailand From gbnewby at pglaf.org Sun Oct 31 22:27:04 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Oct 31 22:27:06 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? In-Reply-To: References: Message-ID: <20041101062704.GA28837@pglaf.org> On Mon, Nov 01, 2004 at 12:43:41PM +0700, Brad Collins wrote: > > I took a look at the source for the recent handsome re-release of > PG's edition of A Christmas Carol (46-h). > > The code is bit old,

tags are not terminated and the formating > could be formated a bit better to make it more readable. ... Strangely, this title doesn't have the usual filename mask in GUTINDEX.ALL. I'm cc'ing George to see about adding this. The answer, as you saw, is that the file is old and therefore predates our current procedures. Lacking a /p doesn't prevent a file from passing the validator at w3c, except for the most recent HTML versions, so this file could probably still pass today. Anyway: cleaning up HTML is definitely welcome. When we update a file, these days, we also move it into the new directory structure (the post-10K naming scheme), so this would be /4/46/46h.htm rather than /etext91/xmas10h.htm or whatever. We also add a new header, and apply it to all other files for this eBook. In short, it's more involved than just fixing the file. David Widger has updated hundreds of titles, and we would welcome anyone else with desires to work on this task. Personally, I would not mind waiting until we also have good XML procedures in place, so that we could kill two birds with one stone (actually, more than one stone, since it's more work). Finally, let me mention that we usually also run gutcheck and find/fix many other errors in a typical older title. I hope this helps explain. I didn't mention any limitations of Tidy, but of course like any tool you need to make sure it doesn't accidentally do greater harm than it solves. Really finally: send updated files (or URLs) to errata AT pglaf.org , even if you didn't do all of the above. Thanks! -- Greg > For example, the first paragraph looked like this: > >

> Marley was dead: to begin with. There is no > doubt whatever about that. The register of his burial was signed by > the clergyman, the clerk, the undertaker, and the chief > mourner. Scrooge signed it: and Scrooge’s name was good upon > ’Change, for anything he chose to put his hand to. Old Marley > was as dead as a door-nail. > > I ran the file through HTML-Tidy which turned it into this: > >

Marley was dead: to begin with. > There is no doubt whatever about that. The register of his burial > was signed by the clergyman, the clerk, the undertaker, and the > chief mourner. Scrooge signed it: and Scrooge's name was good > upon 'Change, for anything he chose to put his hand to. Old > Marley was as dead as a door-nail.

> > It took about ten seconds to open the, file run the file through tidy > and save it. This resulted in a file which is consistent, standards > compliant and far easier to read and process. > > Open tags in HTML are an artifact of SGML which can confuse some > browsers, processing software and limit what you can do with CSS. > > I suggest that all PG html files be run through Tidy before being > released. > > If anyone wants the tidy'd version let me know. > > b/ > > -- > Brad Collins , Bangkok, Thailand > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From sly at victoria.tc.ca Sun Oct 31 22:33:34 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Oct 31 22:33:51 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? In-Reply-To: References: Message-ID: A possible argument against using tidy as you mention is that it can have side effects the user does not intend. In the example you gave below, it appears to have replaced the numberic character entities the volunteer wanted to put in. Also, in the tidy executable I have, results are not always reliable in the contents of a "pre" tag; I have seen Tidy remove blank lines from within them before. When I'm preparing an html and plain text file for PG, I almost always do so in a way which has all the line endings in the same place, which makes it much easier for anyone in the future making corrections etc... I use tidy to check html, but not to produce a final version. I've just checked, and the file in question, while not necessarily the way I would have marked it up, _is_ valid HTML 4.01 Transitional, which matches what is required to add it to PG. Andrew On Mon, 1 Nov 2004, Brad Collins wrote: > > I took a look at the source for the recent handsome re-release of > PG's edition of A Christmas Carol (46-h). > > The code is bit old,

tags are not terminated and the formating > could be formated a bit better to make it more readable. > > For example, the first paragraph looked like this: > >

> Marley was dead: to begin with. There is no > doubt whatever about that. The register of his burial was signed by > the clergyman, the clerk, the undertaker, and the chief > mourner. Scrooge signed it: and Scrooge’s name was good upon > ’Change, for anything he chose to put his hand to. Old Marley > was as dead as a door-nail. > > I ran the file through HTML-Tidy which turned it into this: > >

Marley was dead: to begin with. > There is no doubt whatever about that. The register of his burial > was signed by the clergyman, the clerk, the undertaker, and the > chief mourner. Scrooge signed it: and Scrooge's name was good > upon 'Change, for anything he chose to put his hand to. Old > Marley was as dead as a door-nail.

> > It took about ten seconds to open the, file run the file through tidy > and save it. This resulted in a file which is consistent, standards > compliant and far easier to read and process. > > Open tags in HTML are an artifact of SGML which can confuse some > browsers, processing software and limit what you can do with CSS. > > I suggest that all PG html files be run through Tidy before being > released. > From jtinsley at pobox.com Sun Oct 31 22:39:23 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Oct 31 22:39:41 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? In-Reply-To: References: Message-ID: <20041101063923.GD6833@panix.com> On Sun, Oct 31, 2004 at 10:33:34PM -0800, Andrew Sly wrote: > >I use tidy to check html, but not to produce a final version. > Me too. jim From sly at victoria.tc.ca Sun Oct 31 22:47:31 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Oct 31 22:47:49 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? In-Reply-To: <20041101062704.GA28837@pglaf.org> References: <20041101062704.GA28837@pglaf.org> Message-ID: Just to clear up any confusion.... Our text of A Christmas Carol has already been moved into the new directory structure. It's "base directory" can be found at: http://www.gutenberg.org/dirs/4/46/ When a file is reposted, the Gutindex line is modified, so as to no longer include the filename mask (which no longer serves a purpose in finding the files). Also, a note that not all files that are fixed up are automatically put into the new directory structure. For instance, our text of "The Count of Monte Cristo" was updated in the past day, and is still in its old location: http://www.gutenberg.org/etext/1184 Andrew On Sun, 31 Oct 2004, Greg Newby wrote: > On Mon, Nov 01, 2004 at 12:43:41PM +0700, Brad Collins wrote: > > > > I took a look at the source for the recent handsome re-release of > > PG's edition of A Christmas Carol (46-h). > > > > The code is bit old,

tags are not terminated and the formating > > could be formated a bit better to make it more readable. > ... > > Strangely, this title doesn't have the usual filename mask > in GUTINDEX.ALL. I'm cc'ing George to see about adding this. > > The answer, as you saw, is that the file is old and therefore > predates our current procedures. Lacking a /p doesn't prevent > a file from passing the validator at w3c, except for the > most recent HTML versions, so this file could probably still > pass today. > > Anyway: cleaning up HTML is definitely welcome. When we update > a file, these days, we also move it into the new directory > structure (the post-10K naming scheme), so this would be /4/46/46h.htm > rather than /etext91/xmas10h.htm or whatever. > > We also add a new header, and apply it to all other files for > this eBook. In short, it's more involved than just fixing the > file. > > David Widger has updated hundreds of titles, and we would welcome > anyone else with desires to work on this task. Personally, I would > not mind waiting until we also have good XML procedures in place, > so that we could kill two birds with one stone (actually, more than > one stone, since it's more work). > > Finally, let me mention that we usually also run gutcheck and > find/fix many other errors in a typical older title. > > I hope this helps explain. I didn't mention any limitations > of Tidy, but of course like any tool you need to make sure it > doesn't accidentally do greater harm than it solves. > > Really finally: send updated files (or URLs) to > errata AT pglaf.org , even if you didn't do all of the > above. Thanks! > -- Greg > > > For example, the first paragraph looked like this: > > > >

> > Marley was dead: to begin with. There is no > > doubt whatever about that. The register of his burial was signed by > > the clergyman, the clerk, the undertaker, and the chief > > mourner. Scrooge signed it: and Scrooge’s name was good upon > > ’Change, for anything he chose to put his hand to. Old Marley > > was as dead as a door-nail. > > > > I ran the file through HTML-Tidy which turned it into this: > > > >

Marley was dead: to begin with. > > There is no doubt whatever about that. The register of his burial > > was signed by the clergyman, the clerk, the undertaker, and the > > chief mourner. Scrooge signed it: and Scrooge's name was good > > upon 'Change, for anything he chose to put his hand to. Old > > Marley was as dead as a door-nail.

> > > > It took about ten seconds to open the, file run the file through tidy > > and save it. This resulted in a file which is consistent, standards > > compliant and far easier to read and process. > > > > Open tags in HTML are an artifact of SGML which can confuse some > > browsers, processing software and limit what you can do with CSS. > > > > I suggest that all PG html files be run through Tidy before being > > released. > > > > If anyone wants the tidy'd version let me know. > > > > b/ > > > > -- > > Brad Collins , Bangkok, Thailand > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jtinsley at pobox.com Sun Oct 31 22:49:27 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Oct 31 22:49:46 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? In-Reply-To: <20041101062704.GA28837@pglaf.org> References: <20041101062704.GA28837@pglaf.org> Message-ID: <20041101064927.GE6833@panix.com> On Sun, Oct 31, 2004 at 10:27:04PM -0800, Greg Newby wrote: > >Strangely, this title doesn't have the usual filename mask >in GUTINDEX.ALL. I'm cc'ing George to see about adding this. > The file has been reposted into the new filesystem, and its GUTINDEX entry is A Christmas Carol, A Ghost Story of Christmas, by Charles Dickens 46 which is correct. >Really finally: send updated files (or URLs) to >errata AT pglaf.org , even if you didn't do all of the >above. Thanks! Please, please do not. There are a few cases when it is better to send a whole replacement file, but in my experience they amount to no more than 1% of all cases. Mostly, sending a whole file just causes a lot of unnecessary work and confusion. And please don't send a replacement HTML just because you like your HTML coded in a different style. 
The W3C sets quite enough standards, thank you: if everyone starts insisting on their own home-grown standards in addition, we'll never get anything useful done, even if open warfare does not break out between the various DIY standards-setters! And please, whatever you do, never send a file that has been put through any kind of automatic converter, tidier, rewrapper, or anything that programmatically alters the text. If you really feel that something like that needs to be used, then you really need to re-proof the whole thing. jim From gbnewby at pglaf.org Sun Oct 31 22:52:14 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Oct 31 22:52:16 2004 Subject: [gutvol-d] PG Policy for Releasing HTML? In-Reply-To: References: <20041101062704.GA28837@pglaf.org> Message-ID: <20041101065214.GA29779@pglaf.org> On Sun, Oct 31, 2004 at 10:47:31PM -0800, Andrew Sly wrote: > > Just to clear up any confusion.... > > Our text of A Christmas Carol has already been moved into the > new directory structure. It's "base directory" can be found at: > http://www.gutenberg.org/dirs/4/46/ Duh! Sorry... It's probably time for me to stop typing now, and get some rest. -- Greg From hyphen at hyphenologist.co.uk Sun Oct 31 22:59:48 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Oct 31 23:00:20 2004 Subject: [gutvol-d] pglaf.org settings might block some messages In-Reply-To: <20041101043133.GA26281@pglaf.org> References: <20041101043133.GA26281@pglaf.org> Message-ID: On Sun, 31 Oct 2004 20:31:33 -0800, Greg Newby wrote: | I've heard in the past 2 weeks from two people who discovered that | their mail server settings resulted in pglaf.org bouncing their | messages to gutvol-d@lists.pglaf.org (not necessarily through fault of | their own). | | If you've sent messages to pglaf.org, but are not sure if they got | through, you might want to send a test message or check the list | archives (http://lists.pglaf.org). 
You can set your own list settings | to get your own messages, and/or an acknowledgement message. | | The setting we use is to enforce the standard that mail servers (aka | MTAs) must have reverse DNS entries (also known as PTRs). Several of | the big ISPs, such as AOL, enforce this rule. Others don't. Mail | from addresses with no corresponding PTR is (at pglaf.org) well over | 99.9% likely to be spam. Over 10,000 such messages are blocked | per week on the pglaf.org server using this single rule. | | The postfix server at pglaf.org has had this setting since the spring, | and while I'm sure it has bounced some legitimate messages, it's only | recently that anyone has brought any false positives to my attention. | -- Greg I am blocked and use BTConnect, BT is one of the largest ISPs in the UK -- Dave Fawthrop 8 Cooper Grove, Shelf, Halifax, HX3 7RF, UK, Tel/F/A +44(0)1274 691092. H 01274 677161 M: +44(0)7720455248 From gbnewby at pglaf.org Sun Oct 31 23:10:16 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Oct 31 23:10:17 2004 Subject: [gutvol-d] pglaf.org settings might block some messages In-Reply-To: References: <20041101043133.GA26281@pglaf.org> Message-ID: <20041101071016.GA30421@pglaf.org> On Mon, Nov 01, 2004 at 06:59:48AM +0000, Dave Fawthrop wrote: > On Sun, 31 Oct 2004 20:31:33 -0800, Greg Newby wrote: > > | I've heard in the past 2 weeks from two people who discovered that > | their mail server settings resulted in pglaf.org bouncing their > | messages to gutvol-d@lists.pglaf.org (not necessarily through fault of > | their own). > | > | If you've sent messages to pglaf.org, but are not sure if they got > | through, you might want to send a test message or check the list > | archives (http://lists.pglaf.org). You can set your own list settings > | to get your own messages, and/or an acknowledgement message. > | > | The setting we use is to enforce the standard that mail servers (aka > | MTAs) must have reverse DNS entries (also known as PTRs). 
Several of > | the big ISPs, such as AOL, enforce this rule. Others don't. Mail > | from addresses with no corresponding PTR is (at pglaf.org) well over > | 99.9% likely to be spam. Over 10,000 such messages are blocked > | per week on the pglaf.org server using this single rule. > | > | The postfix server at pglaf.org has had this setting since the spring, > | and while I'm sure it has bounced some legitimate messages, it's only > | recently that anyone has brought any false positives to my attention. > | -- Greg > > I am blocked and use BTConnect, BT is one of the largest ISPs in the UK You don't seem to be blocked. Your message arrived. Do you have bounced messages? If so, please send them and I'll try to diagnose. -- gbn
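
The rule described in this thread (accept mail only from servers whose IP address has a reverse DNS entry, i.e. a PTR record) can be illustrated with a small sketch. This is not the pglaf.org configuration: the thread does not say which Postfix restriction is used, and the function names `lookup_ptr` and `should_accept` are hypothetical names chosen for this example. It only shows the general idea of a PTR check, using Python's standard `socket.gethostbyaddr`.

```python
import socket
from typing import Callable, Optional


def lookup_ptr(ip: str) -> Optional[str]:
    """Return the reverse DNS (PTR) hostname for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        # No PTR record, or the lookup failed.
        return None


def should_accept(
    client_ip: str,
    resolver: Callable[[str], Optional[str]] = lookup_ptr,
) -> bool:
    """Hypothetical policy check: accept only clients that have a PTR record.

    The resolver is injectable so the policy can be exercised without
    touching the network.
    """
    return resolver(client_ip) is not None
```

To check a real address, calling `should_accept("192.0.2.1")` performs a live DNS lookup and the result depends on the resolver in use. A real mail server would typically reject at SMTP time rather than after delivery, and some setups also require that the PTR hostname resolves back to the same IP; that stricter check is not shown here.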