From Bowerbird at aol.com Wed Nov 1 15:50:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov 1 15:50:09 2006
Subject: [gutvol-d] gvd061101 -- the niceties of book typography
Message-ID: <4ad.3840eb24.327a8cac@aol.com>

here's your next issue in our "open-source" project, "babelfish".

first, a little more background on our sample book, "my antonia".

you can download an .html version from project gutenberg at...

oh wait... there _is_ no .html version from project gutenberg, just the plain-text version. gee, that's too bad, isn't it? that means all the people who would like an .html version are out of luck. and without an .html version, we cannot easily get automatic conversions to the various e-book formats. even the offered plucker conversion will likely be a straight text-dump...

a straight text-dump is _not_ an "electronic-book", not in my book.

oh well, you _can_ download it in .html from manybooks.net:
> http://manybooks.net/titles/catherwietext95myant11.html

indeed, the "custom .html" lets you specify certain parameters, like the fonts used (ok, you get a choice of 5, but it _is_ a choice), size (10pt-16pt), leading (1x, 1.25x, 1.5x, 1.75x), justification, and various indentation and margin parameters. quite nifty! it'd be neat if project gutenberg offered something like this... (i should add i was unable to get a working download from this.)

i give huge props to matthew mcclintock, who runs manybooks.net. since the sidelining of blackmask.com, he is the go-to guy for those people who want to find p.g. e-texts in the various handheld formats. he's put a converter on his site that can export out to all these formats:
> PDF
> PDF Large Print
> eReader
> Doc
> Plucker
> iSilo
> zTXT
> Rocketbook
> iPod Notes
> Sony
> TCR
> iRex iLiad PDF
> Custom PDF
> Custom HTML
> RTF
> Newton
> Mobipocket

wow. that's very impressive. it seems that we don't even have to build any conversion capability at all, we just have to feed matthew some files! i appreciate all the hard work that he's done in providing such a service!

there is a glitch, though. and it's the same one we experienced above: a straight text-dump is _not_ an "electronic-book", not in my book, and some manybooks.net conversions are simply a straight text-dump. (that's not matthew's fault, i'm just stating it as a pure observation.)

i won't dwell on this, i'll just give a few examples. i downloaded the regular .pdf version, so you can download that if you want to look at the exact thing that i saw when i wrote this...

here's the "titlepage" of "my antonia", as shown in the .pdf:
> http://snowy.arsc.alaska.edu/bowerbird/misc/anttitle.jpg

ouch. not very pretty. the original titlepage looked like this:
> http://snowy.arsc.alaska.edu/bowerbird/myant/myantf003.png

that's what we expect a title page to look like.

and here's the scan of page #181 of the book:
> http://snowy.arsc.alaska.edu/bowerbird/myant/myantp181.png

but here's what that same page looks like in the .pdf.
> http://snowy.arsc.alaska.edu/bowerbird/misc/weevil.jpg

not only is the new chapter not at the head of a page, neither is it bigger or bold like we expect of headers. even worse, the poem in the epigraph is not just unformatted, but it is even incorrectly wrapped...

typographical niceties like these have been the hallmark of paper-books for over 100 years, so it's embarrassing when our newfangled e-books fail to clear that standard. and many people report that it is a huge turn-off to them. (and it's hard to tell 'em not to be so picky, because frankly, when something falls so far below expectations, it _is_ bad.)

nor are some of the _advances_ that we expect of e-books present here (e.g., there is no hotlinked table of contents).

in these areas, we want our open-source project to do better. we want it to be able to make a first-rate e-book in .html form, to the extent that such an animal is possible, so that the various converter-programs have optimal input for best possible output.
(in this regard, one of the first things i do to a p.g. e-text is to strip off the header and footer. sorry, chaps, but they're ugly. and besides, the very first item in an e-book file should be _the_title_of_the_book_, and the next should be the author. again, sorry, but that's just the way that it should be, period.)

***

before today's exercise, let me remind people once again that...

...i am a beginner with perl...

my code is _not_ something you should emulate. (kids, do not try this at home. you might get hurt.)

my formatting of that code, especially, is "unusual", and will not look very familiar to most perl people. so be it. i hate those stupid curly braces. hate 'em.

i'll repeat: i'm a beginner with perl.

moreover, _that_is_the_point_. (cue the ring of a bell here.)

you don't need to do anything more than copy sample code out of a programming primer to get some good functionality, _providing_ that the file-format of your e-book is dirt-simple.

if your format is complex, like docbook or .tei or x.m.l., then you're gonna need a sophisticated programmer to get _any_ functionality out of your e-texts, and it'll be slow in coming...

simple is better.

so i thank my "critics" who characterized my perl as elementary. they've done a better job of making my point than i could have...

***

for your reference:
> http://www.greatamericannovel.com/myant/myantp123.html

you will remember that i am still looking for a contribution to our open-source thing, in the form of c.s.s., but here goes anyway...

so today's assignment is: churn out the code for that page 123.

> #!/usr/bin/perl
> use CGI::Carp qw(fatalsToBrowser);
>
> ########## read the file in...
> $filename="/home2/yoursiteinfohere/public_html/myant/myant-lf.zml";
> open (inf,"$filename") or print "that file was not available...\n";
> read (inf,$thebook,2222222); close inf;
>
> ########## changes made here include the c.s.s. stylesheet...
> print "content-type: text/html\n\n";
> print ''; print "\n"; print "\n";
> print 'my antonia!';
> print "\n";
> print '';
> print "\n"; print ''; print "\n";
> print "\n";
>
> ########## ok, do page 123...
> ########## first the navigation headers...
> print ' '; print "\n";
> print 'p123.png print 'src="http://snowy.arsc.alaska.edu/bowerbird/myant/myantp123.png" />'; print "\n";
> print ''; print "\n";
> print "\n"; print ''; print "\n";
> print ' p122 _'; print "\n";
> print ' -chap- _'; print "\n";
> print ' toc-1 _'; print "\n";
> print ' p123w _'; print "\n";
> print ' toc-2 _'; print "\n";
> print ' +chap+ _'; print "\n";
> print ' p124 '; print "\n";
> print "\n";
>
> ########## split the pages and pull out the correct one...
> @thepage=split("{{",$thebook); foreach $thepage (@thepage) {
> $pp++; if ($pp eq "148") {
> ########## now split out the lines on that page
> @oneline=split("\n",$thepage); $maxline=@oneline; $maxminustwo=$maxline-2;
>
> ########## output the runhead...
> print ''; print ""; print '------ '; print "{{";
> print $oneline[0]; print ""; print '';
>
> ########## and all the lines on the page...
> foreach $oneline (@oneline) {
> $nn++; if ($nn ne "1" and $nn ne "2" and $nn < $maxminustwo) {
> print $oneline;
> if ($oneline ne "") {print ''; print "\n";}
> if ($oneline eq "") {print ''; print "\n";}
> }}}}
>
> ########## then the pagenumber...
> print '[[123]]'; print "\n";
> print ''; print "\n";
>
> ########## now repeat the navigational links...
> print ' p122 _';
> print "\n";
> print ' -chap- _';
> print "\n";
> print ' toc-1 _';
> print "\n";
> print ' p123w _';
> print "\n";
> print ' toc-2 _'; print "\n";
> print ' ';
> print ' +chap+ _'; print "\n";
> print ' p124 ';
> print "\n";
> print ''; print "\n";
> print "\n";
>
> ########## now put in the error-reporting form...
> print ' print 'action="http://www.greatamericannovel.com/scgi-bin/appendcomment.pl">';
> print "\n";
> print 'name'; print "\n";
> print 'e-mail'; print "\n";
> print 'bad'; print "\n";
> print 'new'; print "\n";
> print ''; print "\n";
> print '';
> print "\n";
> print ' or '; print "\n";
> print ''; print "\n";
> print "\n";

you can see the results of this code by running this script:
> http://www.greatamericannovel.com/scgi-bin/babelfish10.pl

there are a number of things to notice about this particular routine, all of which will be dealt with in further detail in coming days...

first, i've reworked the .html so as to make use of a .css stylesheet. (this lets me indent paragraphs, instead of using blank lines between. it also allows me to have a proportional-spaced font, not that dreadful monospaced font that is the default whenever you use the "pre" tag. and of course the c.s.s. will help us in the future, on the pages which have various structural features that we will want to display properly.)

second, i have included some links to help the user with navigation, with one set of them at the top, and an identical set at the bottom...

third, i've pulled in the scan of the page, for easy comparison. this is necessary when we want to do "continuous proofreading".

fourth, i've added a form that readers can use to report errors, another essential aspect of our "continuous proofing" system...

i was going to add in each of these things on a separate day, but i figured you could absorb the shock of all of them at once.

still, at the heart of this routine, we're displaying the text of a page, something we had already worked out previously. and indeed, this routine to display a page is the main "engine" in an e-book program.

as to this code, it does a good job of presenting one page, #123. the links to the surrounding pages (like page 122 and 124) are hardwired, however, so tomorrow's exercise will require that we turn them into variables, so that this routine will be able to present _any_ page in the book, not just page 123...

go ahead, feel free to have a pass at modifying this routine. after all, that's the point of open-source, that people can just jump in and join the coding fun any time they want to!

***

now, for some other commentary...

***

oh geez, part 4...
in other news today, some people are unhappy with the iliad e-book-machine, because it takes 40 seconds to boot up, and you have to shut it down if you're not reading because otherwise the battery will run down...

it ends up that people consider this slow boot-up time to be very "unpaperlike", which is the main claim to fame that e-ink has been bragging about.

that's not all, either, since a relatively slow page-turning time is another liability... and we won't even talk about a price that is over $700.

our good friend david rothman has this to say:
> Shortcomings like this should long have been solved

only an idiot would have had the expectation that an early version of this product would be _free_ of such "shortcomings" as this one...

and only a _pure_ idiot would have led other people on, in terms of creating that stupid expectation in them...

and only the most _extreme_ of pure idiots would then lash out at the product-maker for failing to live up to the unreasonable expectations that the idiot had created.

the unmitigated bile of unrealized hype can be very nasty.

-bowerbird

p.s. above, i commented on the lack of formatting on an epigraph. as you can see, by referring to my version of that same page,
> http://www.greatamericannovel.com/myant/myantp181.html
i have chosen to format the poem differently than it was formatted in the paper-book, which is my prerogative as a re-publisher...

From jon at noring.name Wed Nov 1 18:11:09 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov 1 18:17:49 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
Message-ID: <73794793.20061101191109@noring.name>

Everyone,

For quite a while I've been wondering if line-by-line proofing of OCR text will result in more accurate results, with higher efficiency compared to the side-by-side page editing used in the initial proofing stage at Distributed Proofreaders.

The difficulty I have with page-by-page editing is that the OCR text and the original page scans are big blocks that sit side-by-side, and I find during proofing that I have to move my eyes back and forth, which to me is fairly tiring as I try to realign my view -- it definitely slows me up and I always sense I may be missing something.

Now, if instead we had the following in our proofing window display:

original scan line   --> development. He is always able to raise capi-
OCR text/edit window --> developmenl, He is always.able to ra6e capi-

[Of course the original scan line is an actual image of the line, not ASCII text as shown above. It is scaled as close as possible to the OCR text line below which is user-editable. And the OCR text example is something I made up, so don't criticize the choice of OCR errors! Certainly the standard PG/DP scripts can be run to remove some to most of the OCR errors before the line-by-line human proofing stage.]

This alignment allows me to do a vertical comparison, which I think may make it easier to spot any OCR errors. It should, at least for some people, increase the speed and accuracy of proofing. Well, that's the hypothesis at least.

Now, certainly it will be argued that the proofer should be able to see the entire page scan, such as for context, simple pleasure, and to see if there were errors in generating the page image line. I agree! But the original page can certainly be displayed to the proofer in a separate window or to the side. So it is possible to have both (as well as offer both proofing methods -- there are definitely pages with odd text layouts where page-level proofing may be more appropriate.)
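The line-by-line layout described above can be sketched in a few lines of Perl, the language this list has been trading code in. This is only a rough illustration, not anything DP or an OCR package actually ships: it assumes the OCR step can hand back, for every text line, the pixel rows that line occupies in the page scan. The record fields (`text`, `top`, `bottom`), the file name `page.png`, the page width, and the `proofing_rows` helper are all made up for the example, and the second sample line is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-line OCR output: the text of each line plus the pixel
# rows (top, bottom) it occupies in the page scan. Real OCR engines may
# expose this differently, or not at all; these field names are invented.
my @ocr_lines = (
    { text => 'developmenl, He is always.able to ra6e capi-', top => 120, bottom => 152 },
    { text => '(placeholder for the next line of the page)',  top => 156, bottom => 188 },
);

# Build one proofing row per OCR line: which strip of the scan to show,
# and the OCR text to place in the edit field directly beneath it.
sub proofing_rows {
    my ($scan_file, $page_width, @lines) = @_;
    my @rows;
    for my $line (@lines) {
        my $height = $line->{bottom} - $line->{top};
        push @rows, {
            scan => $scan_file,
            crop => [ 0, $line->{top}, $page_width, $height ],   # x, y, width, height
            edit => $line->{text},
        };
    }
    return @rows;
}

my @rows = proofing_rows('page.png', 1200, @ocr_lines);
for my $row (@rows) {
    printf "scan strip: %s at x=%d y=%d w=%d h=%d\n", $row->{scan}, @{ $row->{crop} };
    printf "edit field: %s\n\n", $row->{edit};
}
```

Each row pairs a crop rectangle with its editable text, so an interface could stack the image strip directly above the text field. Whether the line positions can be extracted reliably is the open question; the sketch simply takes them as given.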
So, asking the proofing mavens here, has this been tested? What are the fatal flaws in this? I can't help but think that this has already been thought of and discarded by Charles Franks when he started DP. But then, technology has changed the last few years, and maybe this idea may again be considered.

Jon

From grythumn at gmail.com Wed Nov 1 18:41:05 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Wed Nov 1 18:47:24 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
In-Reply-To: <73794793.20061101191109@noring.name>
References: <73794793.20061101191109@noring.name>
Message-ID: <15cfa2a50611011841x5abc4bacxa00c2b468827825c@mail.gmail.com>

I've used similar techniques when single-proofing in an OCR program, and the trouble is one often needs to zoom out for context.. plus the fact that we'd need to extract character or line position information from the OCR engines to automate matching the text to the image.

However, what you've asked for can be manually approximated using the horizontal interface at DP.. enlarge the font size in the text window, and increase the zoom level in the image area. You'll get three or four lines of text, one above the other.

R C

On 11/1/06, Jon Noring wrote:
> Everyone,
>
> For quite a while I've been wondering if line-by-line proofing of
> OCR text will result in more accurate results, with higher efficiency
> compared to the side-by-side page editing used in the initial
> proofing stage at Distributed Proofreaders.

From schultzk at uni-trier.de Thu Nov 2 00:05:28 2006
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Thu Nov 2 00:05:35 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <45477F3D.70205@perathoner.de>
References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de>
Message-ID: <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>

Hi Marcello,

I will ask my question again: Do you know what you are doing?

Why don't you take your comments to another list, please.

You are worse than a kindergarten kid.

Personally, I do not know any of the systems, but from what I have heard and seen they are all primitive, and none do the JOB or will.

I know what it takes to do the job! It is my profession: Linguistics.

Have you heard of SGML? If that is too complex (not complicated) then use XML. But the problem is not the format, but getting the formatting done. According to my analysis so far, automatic formatting can only be done to a max of 80%. The rest has to be proofed manually.

In the early days of PG I discussed the matter with Michael Hart, that plain ASCII is not enough. Today, computers have advanced and computing power is abundant. My opinion is that PG should start using a markup language from the start. Sure, the scanning and especially the proofing of the text will take a little longer, but the benefits are far greater.

The markup should contain:
  Chapter
  section,
  Character formatting,
  PG Header,
  picture, sound, etc
tags.

Hey, XML can do all that. All we need is a common XML template. One format! A known structure, a few filters and voila: a neat package, exactly what everybody is trying to create. If you scan into Word, and use a few macros (or one big one) you can get 90-95% of the mark-up done. Now add 10% more time for proofing and you guys and gals have all you will ever need.

regards
Keith.

On 31.10.2006 at 17:52, Marcello Perathoner wrote:

> David A. Desrosiers wrote:
>
>> It's obvious from reading the snippets, that it is indeed copied
>> out of a rudimentary Perl primer, and not touched by anyone who
>> has a strong grasp of the power of the language at hand.
>
> He's a baby that makes poo in the chamberpot for the first time and
> thinks his parents are watching him because they want poo.
>
>> Exactly what is it you are trying to prove with this anyway? We
>> know how to write parsers that can chew up and spit out a Gutenberg
>> etext into other formats, I don't think that's the core of the
>> problem here.
>
> He's just inventing warm water (and trying to get credit for it).
>
> This parser is online. It converts any PG text into a plucker
> database. And it is open source and written in gasp! python. We have
> served 130,000 plucker texts in October this way. The only guy who
> hasn't noticed yet is him who notices everything.
>
> There are a few other PG parsers around like GutenMark and my PG to
> TEI converter. All of them are open source and working today. So it's
> only natural that you-know-who will hold his non-working
> at-the-rate-its-going-never-to-be-released zml parser against them,
> just for the fun of causing confusion. Ever wondered who pays him to
> fuzz and fudge?
>
> --
> Marcello Perathoner
> webmaster@gutenberg.org
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From hyphen at hyphenologist.co.uk Thu Nov 2 00:55:49 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Thu Nov 2 00:56:22 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
Message-ID:

On Thu, 2 Nov 2006 09:05:28 +0100, "Schultz Keith J." wrote:

| Hey, XML can do all that. All we need is a common XML template. One
| format! A known structure, a few filters and voila: a neat package,
| exactly what everybody is trying to create. If you scan into Word,
| and use a few macros (or one big one) you can get 90-95% of the
| mark-up done. Now add 10% more time for proofing and you guys and
| gals have all you will ever need.

ROTFLMAO

When you learn to format things in plain text someone might listen.

--
Dave Fawthrop
For Yorkshire Dialect
http://www.gutenberg.org/author/John_Hartley
http://www.gutenberg.org/author/F_W_Moorman
19,000 free e-books at Project Gutenberg! http://www.gutenberg.org

From marcello at perathoner.de Thu Nov 2 03:54:07 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov 2 03:54:11 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de>
Message-ID: <4549DC5F.8010002@perathoner.de>

Schultz Keith J. wrote:

> Hey, XML can do all that. All we need is a common XML template. One
> format! A known structure, a few filters and voila: a neat package,
> exactly what everybody is trying to create.

This is the reason why we have to shut up BB: People reading this list will think that the most vociferous person represents the consensus in PG research. Not so. BB just pesters everybody who doesn't want to hear with *his* at best half-baked ideas about text representation and delivery. Nobody takes BB seriously, and you shouldn't either.

The state of PG research is:

Consensus has been reached about using a subset of TEI as master format for PG texts (since PGXML seems to be dead). Which subset is still being discussed.

There are at least 2 different working toolchains to convert subsets of TEI to end user formats. Files produced with these toolchains have been posted.

Of course, everything is still in active research and can change a lot. But nobody seriously considers using anything other than TEI or XML as master format.

--
Marcello Perathoner
webmaster@gutenberg.org

From mattsen at arvig.net Thu Nov 2 03:57:26 2006
From: mattsen at arvig.net (Chuck MATTSEN)
Date: Thu Nov 2 04:11:57 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <4549DC5F.8010002@perathoner.de>
References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de>
Message-ID:

On Thu, 02 Nov 2006 05:54:07 -0600, Marcello Perathoner wrote:

> This is the reason why we have to shut up BB: People reading this list
> will think that the most vociferous person represents the consensus in
> PG research. Not so. BB just pesters everybody who doesn't want to hear
> with *his* at best half-baked ideas about text representation and
> delivery. Nobody takes BB seriously, and you shouldn't either.

Oh, I dunno ... I think any thinking person reading the list will quickly be able to discern the intent behind, and value of, any frequent flyer.
:-)

--
Chuck Mattsen (Mahnomen, MN)
mattsen@arvig.net

From joshua at hutchinson.net Thu Nov 2 05:29:55 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov 2 05:30:02 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
Message-ID: <15656271.1162474195305.JavaMail.?@fh1038.dia.cp.net>

At first blush, it seems like a small return on investment. The programming required (as well as the difference in how scans/ocr are prepared) would be very significant, while the increase in quality would be minuscule. DP gets very good results with their current method and I think a better return on the programming investment would be to implement a "roundless" system, where each page is proofed again and again until a certain "confidence" level is reached. Easy pages may only be seen a couple times, while a particularly nasty page might get seen by scores of people. (See DP forums for lengthy discussions of how this system might work.)

But, as always, the bottleneck is developer time. We ALWAYS have more work than we have volunteers to do it.

Josh

From joshua at hutchinson.net Thu Nov 2 05:33:18 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov 2 05:33:21 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <24690987.1162474398658.JavaMail.?@fh1038.dia.cp.net>

Are you sure you meant to address those comments to Marcello? What you are talking about is what Marcello has done. bowerbird is the kindergarten kid you seem to be talking about...

Josh

From bill at williamtozier.com Thu Nov 2 05:36:41 2006
From: bill at williamtozier.com (William Tozier)
Date: Thu Nov 2 05:36:52 2006
Subject: [gutvol-d] Line by line proofing of OCR text?
In-Reply-To: <73794793.20061101191109@noring.name>
References: <73794793.20061101191109@noring.name>
Message-ID: <680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com>

On Nov 1, 2006, at 9:11 PM, Jon Noring wrote:

> So, asking the proofing mavens here, has this been tested? What are
> the fatal flaws in this? I can't help but think that this has already
> been thought of and discarded by Charles Franks when he started DP.
> But then, technology has changed the last few years, and maybe this
> idea may again be considered.

Far from being flawed, it's how I proof as well. As another respondent already pointed out, the DP proofing interface can be restructured to do something like this.

Unfortunately, the diversity of proofers' abilities, habits and interface preferences makes it hard to standardize this sort of thing. Even fonts differ from platform to platform; if we're not working in Flash or some other typographically fixed standard, this sort of thing is sunk.

I'd say, though, that more important than presenting single lines to the reader, the act of forcing the gaze of the proofer to *follow* lines is what you're really looking for. When proofing, I always ensure that the insertion cursor in the page's text field touches every character -- essentially I click before the first letter, and right-arrow through the entire text.
Not least because this spell-checks every word (client-side, on my Mac), but also because the result is a word-by-word serial visit to every portion of the page. Even without Flash, we could imagine a number of interface elements that do this sort of thing: Something that serially highlights every word, two per second; an audible reader; a requirement that the cursor visit each letter before a page is considered done. When I was a professional proofreader in a large academic printer, there were a number of old tried-and-true tricks we were taught: reading the text backwards, reading it aloud to a partner complete with punctuation, &c. But they all boiled down to getting the reader to look at the typeset page as a proofer, not a reader. Slowing them down to the point where their eyes' habits were no longer comfortable, and they saw more of everything. Prohibiting saccades, among other things, and allowing them pay attention to short- and medium-scale textual patterns at the same time. There are nearsighted little old ladies and 24-inch monitor-users among us at DP, and their ability to customize the interface and the presentation of the work is probably much more a boon than a threat: it invites more people to work. What we might consider is changing what that work is, to make it more obvious that it is not the kind of reading they're used to. ----- Bill Tozier AIM: vaguery@mac.com blog: http://williamtozier.com/slurry plazes: http://beta.plazes.com/user/BillTozier skype: vaguery "Nature, however picturesque, never yet made a poet of a dullard." --Hjalmar Hjorth Boyesen From desrod at gnu-designs.com Thu Nov 2 06:20:32 2006 From: desrod at gnu-designs.com (David A. 
Desrosiers) Date: Thu Nov 2 06:20:54 2006 Subject: [gutvol-d] gvd061101 -- the niceties of book typography In-Reply-To: <4ad.3840eb24.327a8cac@aol.com> References: <4ad.3840eb24.327a8cac@aol.com> Message-ID: <1162477232.10976.36.camel@localhost.localdomain> On Wed, 2006-11-01 at 18:50 -0500, Bowerbird@aol.com wrote: > ...i am a beginner with perl... ^^^^^^^^ You spelled "dangerous" wrong. ;) -- David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jon at noring.name Thu Nov 2 07:54:35 2006 From: jon at noring.name (Jon Noring) Date: Thu Nov 2 07:54:54 2006 Subject: [gutvol-d] Line by line proofing of OCR text? In-Reply-To: <680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com> References: <73794793.20061101191109@noring.name> <680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com> Message-ID: <187839134.20061102085435@noring.name> I'll answer both Joshua and Bill in this message... Joshua Hutchinson wrote: > At first blush, it seems like a small return on investment. The > programming required (as well as the difference in how scans/ocr are > prepared) would be very significant, while the increase in quality > would be minuscule. DP gets very good results with their current > method and I think a better return on the programming investment would > be to implement a "roundless" system, where each page is proofed again > and again until a certain "confidence" level is reached. Easy pages may > only be seen a couple times, while a particularly nasty page might get > seen by scores of people. (See DP forums for lengthy discussions of how > this system might work.) A roundless approach definitely is smarter.
Compare a page edit with the prior edit, and when one does not see any new corrections, maybe twice or three times in a row, there's high confidence the page has been proofed to zero errors. Since it seems like the real bottleneck at present in DP (at least this is my understanding) is the latter stages, not the initial proofing, then there should be no loss in throughput by implementing this page edit comparison to get, hopefully, very high accuracy. > But, as always, the bottleneck is developer time. We ALWAYS have more > work than we have volunteers to do it. Yep, this is one of the Laws of the Universe: There will never be enough developers to do the job as one wants. Bill Tozier wrote: > I'd say, though, that more important than presenting single lines to > the reader, the act of forcing the gaze of the proofer to *follow* > lines is what you're really looking for. Yes, this is definitely one of the problems I have with the current system, knowing where one is on both the original page scan and the text edit box. It requires effort for the mere mortal to realign oneself as one goes back and forth between the original and the proofed text. This realignment is, for a mere mortal like me at least, pretty uncomfortable and quite inefficient. For those with photographic memories (I am not of this elite), the page-by-page approach probably works well for them. So, yes, everyone is different in their abilities and preferences to proof pages. I think the line-by-line approach should at least be experimented with, and I'll look into doing so. > When proofing, I always ensure that the insertion cursor in the > page's text field touches every character -- essentially I click > before the first letter, and right-arrow through the entire text. Not > least because this spell-checks every word (client-side, on my Mac), > but also because the result is a word-by-word serial visit to every > portion of the page. 
Even without Flash, we could imagine a number of > interface elements that do this sort of thing: Something that > serially highlights every word, two per second; an audible reader; a > requirement that the cursor visit each letter before a page is > considered done. Interesting. > When I was a professional proofreader in a large academic printer, > there were a number of old tried-and-true tricks we were taught: > reading the text backwards, reading it aloud to a partner complete > with punctuation, &c. But they all boiled down to getting the reader > to look at the typeset page as a proofer, not a reader. Again interesting. The line-by-line approach definitely forces this naturally, because usually there's little interesting content-wise in a single line to distract -- it also eliminates reading since one is doing a vertical comparison, rather than horizontal. > There are nearsighted little old ladies and 24-inch monitor-users > among us at DP, and their ability to customize the interface and the > presentation of the work is probably much more a boon than a threat: > it invites more people to work. What we might consider is changing > what that work is, to make it more obvious that it is not the kind of > reading they're used to. One thing I like with the line-by-line system is that it might even allow proofing on limited hardware, like PDAs. Here we might not even allow the proofer to make any edits -- but simply to flag whether the text is right or not. (Hmmm, this is interesting). If the line gets flagged two or three times as needing no edits, we assume it is proofed to zero errors. If flagged as having an error, then someone else can actually do the edit. I surmise that with the quality of OCR today, plus the PG/DP tools to pre-process an OCR text, in an *average* book the percentage of lines with errors will be fairly low (less than 10% ???). Anyway...
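The flag-only scheme above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the names LineRecord, record_flag, record_edit and is_done are hypothetical, and the threshold of three clean flags in a row is just one reading of "2-3 times". A line counts as done once it has collected enough consecutive "no error" flags since its last edit; an "error" flag resets the count and parks the line until someone edits it.

```python
from dataclasses import dataclass, field


@dataclass
class LineRecord:
    """Proofing state for one OCR line (hypothetical structure)."""
    text: str
    clean_streak: int = 0        # consecutive "no error" flags since last edit
    needs_edit: bool = False     # set when a proofer flags an error
    history: list = field(default_factory=list)  # every verdict, in order


def is_done(line, required_clean=3):
    """A line is done when nobody is waiting to edit it and enough
    proofers in a row have flagged it as clean."""
    return not line.needs_edit and line.clean_streak >= required_clean


def record_flag(line, has_error, required_clean=3):
    """Record one proofer's verdict; return True once the line is done."""
    line.history.append(has_error)
    if has_error:
        line.clean_streak = 0
        line.needs_edit = True
    else:
        line.clean_streak += 1
    return is_done(line, required_clean)


def record_edit(line, new_text):
    """Apply a correction; the corrected line must earn its streak again."""
    line.text = new_text
    line.needs_edit = False
    line.clean_streak = 0
```

The same counting rule would work for whole pages in a roundless system; only the unit being flagged changes.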
***** Now, it is my understanding that most advanced OCR packages can produce an XML document of the raw OCR text, and the XML data includes the bounding box information (the coordinates on the original page scan where a word occurs) and line information. (I'm sure what I just described is well-known among most of the PG/DP OCR experts, but I'm sharing it with the others here who may not be aware.) For example, here's a link which Branko Collin posted a few months ago in a comment to the TeleRead blog. It points to one of these XML documents, produced by DJVU OCR: http://ia201107.eu.archive.org/2/items/englishbookbindings00davenuoft/englishbookbindings00davenuoft_djvuxml.xml (depending upon one's browser, you may have to look at the source to see the bare document.) This XML document contains all the raw OCR text associated with each scanned page in the DJVU book. Here's a snippet of the markup from somewhere in the middle, for "page 36": XXVIII GENERAL INTRODUCTION the eighteenth century a new grace was added by the inlaying of a leather of a second colour. For each line, we can easily determine the top line and bottom line coordinates so the "strip" of the page scan associated with the line can be displayed (as well as where the first word in the line starts and where the final word ends -- useful for alignment of the strip with the editable text.) [We have a knotty problem if changes are made to the text in a line, in rewriting the edits back into the XML (I won't explain why.) So we only use the XML bounding box information to give us the coordinates of the 'strip' in the image associated with a line, but we won't update the original XML document. We might produce a different XML doc with the edited results, though, viz. the eighteenth century a new grace was added by the inlaying of a leather of a second colour. ... 
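The strip lookup described above can be sketched as follows. This is a minimal sketch, assuming the OCR XML exposes LINE elements that contain WORD elements carrying a coords attribute of comma-separated pixel values. The element names, the SAMPLE markup and its coordinates are illustrative assumptions, not copied from the DJVU file linked above, and line_strips is a hypothetical helper. Because the order of the coordinates may differ between OCR packages, the code takes the minimum and maximum of each axis rather than trusting a fixed order.

```python
import xml.etree.ElementTree as ET

# Invented sample markup: two lines, coordinates made up for illustration.
SAMPLE = """<PAGE>
  <LINE>
    <WORD coords="100,220,160,190">the</WORD>
    <WORD coords="170,222,330,188">eighteenth</WORD>
  </LINE>
  <LINE>
    <WORD coords="100,270,210,240">inlaying</WORD>
  </LINE>
</PAGE>"""


def line_strips(xml_text):
    """Return one (left, top, right, bottom, text) tuple per LINE element.

    The box is the union of the word boxes on that line, i.e. the
    'strip' of the page scan to display next to the editable text.
    """
    root = ET.fromstring(xml_text)
    strips = []
    for line in root.iter("LINE"):
        xs, ys, words = [], [], []
        for word in line.iter("WORD"):
            # Use only the first four values; ignore any extra ones.
            x1, y1, x2, y2 = (int(v) for v in word.get("coords").split(",")[:4])
            xs += [x1, x2]
            ys += [y1, y2]
            words.append((word.text or "").strip())
        if xs:  # skip lines with no words
            strips.append((min(xs), min(ys), max(xs), max(ys), " ".join(words)))
    return strips


for strip in line_strips(SAMPLE):
    print(strip)
```

The original XML is only read, never rewritten, which matches the plan of leaving the OCR document untouched and storing edited text elsewhere.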
Jon Noring From marcello at perathoner.de Thu Nov 2 08:55:46 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Nov 2 08:55:51 2006 Subject: [gutvol-d] Line by line proofing of OCR text? In-Reply-To: <187839134.20061102085435@noring.name> References: <73794793.20061101191109@noring.name> <680E209C-05DF-476C-B31F-F47D08FB88BD@williamtozier.com> <187839134.20061102085435@noring.name> Message-ID: <454A2312.7080700@perathoner.de> Jon Noring wrote: > Yes, this is definitely one of the problems I have with the current > system, knowing where one is on both the original page scan and the > text edit box. It requires effort for the mere mortal to realign > oneself as one goes back and forth between the original and the > proofed text. This realignment is, for a mere mortal like me at least, > pretty uncomfortable and quite inefficient. The quick fix would be to implement a function that puts a horizontal ruler on the image window if you click on it. (And scrolls the window so the ruler is in the vertical middle.) A few lines of JavaScript will do that. Firefox will even support opacity so you can highlight a portion of text. > > the > eighteenth Why not break the whole text down into words and use it as captcha (http://en.wikipedia.org/wiki/Captcha) for the PG website? Everybody who wants to download a file has to decipher a word. Haha, only serious. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Thu Nov 2 11:49:02 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Nov 2 11:49:09 2006 Subject: [gutvol-d] gvd061102 -- thoughts on a thursday Message-ID: today i'll wait for someone else to contribute to babelfish, our little open-source project here... but i have some other thoughts... *** jon noring said: > I can't help but think that this has already been thought of > and discarded by Charles Franks when he started DP. 
well, that's probably because charles floated this very idea at the meetings held in san francisco for the 10,000th e-text... so that's where you "got" the idea, jon. i've tested the method, and yes, it would work just fine, except there's no need to do line-by-line proofing at _all_ these days, so finding a "better" way to go about doing it is irrelevant... besides, i'm not sure how this would fit in the d.p. interface, what with the slicing of each scan into dozens of files... as for the coordinates of each line or word... although it's simple enough to get that coordinate information from an o.c.r. program, it is also very simple to write a routine that collects the information just by examining the actual scan. a screenshot of output from such a routine can be seen here: > http://snowy.arsc.alaska.edu/bowerbird/misc/line-determination.jpg the number to the left of each line gives its topmost pixel, while the number to its right indicates the height of its bottom pixels. considering that the setting of this type was a _manual_ process, the leading is amazingly consistent throughout, as you'll notice. those typesetters really had their craft down... i use this routine to highlight a line -- as shown in the graphic -- where a possible error might exist. my program also selects the questionable text -- in the editfield displayed next to the scan -- because automating this boring manual work of doing a correction makes the process go much more quickly. the proofer's attention is drawn to the red-highlighted line on the scan so they can read that, and then focus on the text-in-question to correct it when necessary. *** jon said: > A roundless approach definitely is smarter. gee, when both jon and josh agree with me, i figure it won't be long before things change. unfortunately "not long" is not the same thing as "soon" over in the land of distributed proofreaders. meanwhile, any more reaction to the duguid article? 
heck, noring's little "idea" about a proofing wrinkle has pulled more commentary than duguid's piece... so it's a good thing duguid took his piece to the public, instead of letting it get buried by taking it to d.p. alone. *** jon said: > Yep, this is one of the Laws of the Universe: > There will never be enough developers to > do the job as one wants. you should try the "open-source community", where there are scads of programmers who will happily do your programming for free... *** marcello said: > Nobody takes BB seriously wishful thinking! the .tei folks have been touting their "solution" for 5 years now, and nothing has yet materialized. and -- in the 3 years i've been on this listserve -- the size of the library doubled to 20,000 e-texts. when i've mirrored the whole thing in z.m.l. format, and can maintain the entire library in my spare time, while p.g. is still trying to figure out what kind of .tei they're gonna settle on, and then goes begging for the expertise needed to maintain that complex format, let alone get any useful functionality out of it, we'll see who takes whom seriously... -bowerbird From jon at noring.name Thu Nov 2 12:21:11 2006 From: jon at noring.name (Jon Noring) Date: Thu Nov 2 12:21:31 2006 Subject: [gutvol-d] gvd061102 -- thoughts on a thursday In-Reply-To: References: Message-ID: <1455561431.20061102132111@noring.name> jon noring said: >> I can't help but think that this has already been thought of >> and discarded by Charles Franks when he started DP. > well, that's probably because charles floated this very idea at > the meetings held in san francisco for the 10,000th e-text... > so that's where you "got" the idea, jon. Is that the meeting held at the Internet Archive which you and I attended?
I don't remember Charles mentioning this technique, nor again when I met him in Las Vegas a few months later. So if he did, it has bounced around in my subconscious for a while and only now is emerging as I see a need for it. Charles (if you're still there), and Juliet, was the line-by-line editing method mentioned at the PG/IA bash? > you should try the "open-source community", > where there are scads of programmers who > will happily do your programming for free... Agreed, but even there, there's never enough volunteers to do all that is often needed. Jon From Bowerbird at aol.com Thu Nov 2 14:44:25 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Nov 2 14:44:31 2006 Subject: [gutvol-d] sometimes i read the funniest things Message-ID: sometimes i read the funniest things. :+) like when carlo said this, about me, over on the d.p. boards: > he regurarly googles himself, and might come back if he > is named in an open forum. This one is currently not open, > hence it is not indexed, but it is better to edit the posts > with the name anyway. don't be silly, carlo. i haven't done a vanity search in ages, mostly because "bowerbird" turns up too many false alarms. (it seems there are some birds called that, in australia and new guinea. who knew?) ;+) you really think i care that i'm mentioned over there? especially since the mentions are uniformly asinine? i read the d.p. boards to learn stuff about digitizing. i like to do my homework. *** for instance, laura said this, yesterday: > For what it's worth, this is how Wikipedia handles equations > on the pages where the need exists. Any mathematical equation > is enclosed in tags, and the rest of the page is dealt with > normally. The server gives the reader a .png of the result, > when the page is viewed. that's interesting. wikipedia serves up a graphic of an equation. some of you people should go tell them how inadequate that is. 
*** and here's what josh said just the other day: > PGTEI is extremely useful for one big reason. > You create one master document and then > the system creates the other files automatically. of course, the only two formats they are outputting to -- even when they _do_ create an occasional .tei text -- are .pdf and .html, which is straightforward enough that it can be done with the much-easier-to-maintain z.m.l. because of this, the "multiple-formats" rationale will not fly here, because i'll shoot it down, but over on the d.p. boards, they're still giving it as the reason volunteers need to love .tei. *** and carlo, speaking of you, here's another thing you just said: > The guiprep user guide that you quoted gives the wrong > impression that you have to use rtf files. This is necessary > only if you extract italics and bold markup, that should be > avoided. For manual pre-processing txt is much better. you're actually _recommending_ that people save o.c.r. to plain-ascii, which means all the formatting has to be reapplied by volunteers. man, how laughable and backward is _that_? like i've said before, it's a good thing those volunteers don't realize how much of their time and energy you are _wasting_, or they wouldn't stick around. it's 2006, we've got full-on word-processing on a web-page, and distributed proofreaders still strips everything back to raw ascii... and tells people they have to do .tei markup to get .html and .pdf. so hey, i'd have _plenty_ to talk about if i _was_ on the d.p. boards. *** however, lucy's right, i have much better things to do than waste my time and energy over there on the d.p. boards. but if i _do_ decide it might be fun to come and start in again, y'all will be blessed with 256 more posts -- on top of the 256 posts i've made so far, which evidently made an impression, since you're still talking about me -- before i call another halt, so yeah, maybe you better not mention me.
(believe me, elisa, i don't need to go anywhere near 32,768 posts; 512 will be _more_ than sufficient to be remembered a long time. especially since i've now started delivering the pudding...) -bowerbird From cannona at fireantproductions.com Thu Nov 2 17:27:47 2006 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu Nov 2 17:27:56 2006 Subject: [gutvol-d] sometimes i read the funniest things References: Message-ID: <003901c6fee7$495ba0c0$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bowerbird wrote: for instance, laura said this, yesterday: > For what it's worth, this is how Wikipedia handles equations > on the pages where the need exists. Any mathematical equation > is enclosed in tags, and the rest of the page is dealt with > normally. The server gives the reader a .png of the result, > when the page is viewed. that's interesting. wikipedia serves up a graphic of an equation. some of you people should go tell them how inadequate that is. Hmmm... Let's think about this. Wikipedia renders latex into images and serves those images by default. Your program just displays the images and leaves it up to the ebook producer to render/scan/draw them. Wikipedia lets you change the default behavior and have the latex sent to the browser directly. Your program gives the user no choice in the matter. So, once again, the only thing that's inadequate is your zml. Aaron Cannon - -- Skype: cannona MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers.
iD8DBQFFSpsbI7J99hVZuJcRAvp+AJoCozh5nXEGTvEIB1HpDIsyQRVNGACeIJZh VVctWUw0IOAsoDVHRo3/yGI= =iCTH -----END PGP SIGNATURE----- From schultzk at uni-trier.de Fri Nov 3 00:33:52 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri Nov 3 00:33:58 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <4549DC5F.8010002@perathoner.de> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> Message-ID: <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> Hi Marcello, IF YOU DO NOT take Bowerbird seriously, then why do you even make the effort to be absolutely nasty to him? If he is not to be taken seriously THEN IGNORE him and, as with all so-called pests, he will go away. First off, his method is not any better or worse than your so-called TEI! As you said it is not finished nor anywhere near it. The TEI movement has been around for at least 5 years. As far as I am concerned it is vaporware. So if Bowerbird wants to get something going let him do it. THE ACTUAL PROBLEM PG has is that there is no official specification of an official format or markup!! If there was one it would only take at the max 2 months to get something working. BUT, for the past 10 years all I see is bits and pieces and a lot of discussion. Since the beginning of PG I have discussed this with Michael Hart and asked him for a change in concept, so that PG uses something other than PLAIN VANILLA ASCII as the BASE format for PG. To do conversion after scanning and especially proofing IS NOT the way to GO! It is a waste of resources. Furthermore, everybody wants to reinvent the wheel. Michael Hart has to step in and allow for a change. We can always still make plain vanilla texts available. It is not hard for the people doing the scanning and proofing to learn an official structure and use the tools that we can create.
How about we all sit down and discuss an official format, specify it, and get the tools implemented? Instead of everybody doing their own thing!! The implementation will naturally have to be multi-platform and able to be web-based, too. We will have to use freely available programming tools and languages. On 02.11.2006 at 12:54, Marcello Perathoner wrote: > Schultz Keith J. wrote: > >> Hey XML can do all that. All we need is a common xml template. >> One >> format! a known structure >> a few filters and voila. a neat package exactly what everybody is >> trying to create. > > This is the reason why we have to shut up BB: People reading this list > will think that the most vociferous person represents the consensus in > PG research. Not so. BB just pesters everybody who doesn't want to hear > with *his* at best half-baked ideas about text representation and > delivery. Nobody takes BB seriously, and you shouldn't either. > > > The state of PG research is: > > Consensus has been reached about using a subset of TEI as master > format > for PG texts (since PGXML seems to be dead). Which subset is still > being > discussed. > > There are at least 2 different working toolchains to convert > subsets of > TEI to end user formats. Files produced with these toolchains have > been > posted. > > Of course, everything is still in active research and can change a > lot. > But nobody seriously considers using anything other than TEI or XML as > master format. > > > > -- > Marcello Perathoner > webmaster@gutenberg.org From schultzk at uni-trier.de Fri Nov 3 00:45:06 2006 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Fri Nov 3 00:45:10 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <24690987.1162474398658.JavaMail.?@fh1038.dia.cp.net> References: <24690987.1162474398658.JavaMail.?@fh1038.dia.cp.net> Message-ID: Hi Josh, Yes, I did and still direct that question at him. It has to do with his remarks and a post he made about two weeks ago, as well as the posts he continues to make. Bowerbird is cynical. He has his ways and ideas. Not all are bad. Just like his points. But, he has not gotten rude before somebody else has. Just because you do not like someone, do not jump on him every time you see him. The proper way is to ignore him and not scream at him!!! regards Keith. On 02.11.2006 at 14:33, joshua@hutchinson.net wrote: > Are you sure you meant to address those comments to Marcello? > > What you are talking about is what Marcello has done. bowerbird is > the kindergarten kid you seem to be talking about... > > Josh > >> ----Original Message---- >> From: schultzk@uni-trier.de >> Date: Nov 2, 2006 3:05 >> To: "Project Gutenberg Volunteer Discussion" org> >> Subj: Re: [gutvol-d] gvd061030 -- let's get it started in here >> >> Hi Marcello, >> >> I will ask my question again: Do you know what you are doing? >> >> >> Why don't you take your comments to another list, please. >> >> You are worse than a kindergarten kid. >> >> Personally, I do not know any of the systems, but from what I >> have heard and seen they are all primitive and none do the JOB, nor will they. >> [snip, snip] From Bowerbird at aol.com Fri Nov 3 02:05:38 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Nov 3 02:05:42 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here Message-ID: keith said: > he has not gotten rude before somebody else has. thanks for noticing. :+) and actually, i hope i haven't gotten rude even then.
i _will_ admit to heavy sarcasm, but that at least has an element of humor in it, which my detractors lack, so they are left with nothing but meanspiritedness... and i regret that i bring that out in them, i do. and if i didn't have a message that i firmly believe needs to be heard here, then i would probably let their bullying carry the day, and go away. but i have something to say. -bowerbird From marcello at perathoner.de Fri Nov 3 03:47:50 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 03:47:53 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> Message-ID: <454B2C66.5020100@perathoner.de> Schultz Keith J. wrote: > First off, his method is not any better or worse > than your so-called TEI! As you said it is not finished nor > anywhere near it. The TEI movement has been around for at least > 5 years. As far as I am concerned it is vaporware. So if Bowerbird > wants to get something going let him do it. zml is worse than TEI because it cannot represent text features that are widely used in the library. If a simple text format could do that, major universities would use it. They all use TEI instead. But if you mention that to BB, he just tells you to do his homework for him. The TEI markup language is long since ready to use, it is undergoing its 5th revision right now. What is still in development are the tools to convert TEI to PG end-user formats. But this doesn't stop you from marking up any text using the full TEI right now.
> THE ACTUAL PROBLEM PG has that there is no official specification > to an official format or markup!! If there was one it would only take > at the max 2 months to get something working. BUT, for the past 10 > years all I see is bits and pieces and a lot of discussion. Your judgement of development time is very poor. BB is working nearly 4 years on his very easy to do zml thingie and has nothing to show. But even if he knew how to program, he couldn't do anything in 2 months. > How about we all sitting done and discussing an official format and > specifing it and > geting the tools implemented? Instead of everybody doing there own > thing !! > The implementation will naturally have to be multi-platform and able > to be > web-based, too. We will have to use freely availible programming > tools and languages. Design by committee will never work in PG. It is against Michael Hart's expressed policy to step down and give directions. So, if you want an "official" format, the way to go is to create it yourself and make it so good it will blow your opponents' formats away. Maybe BB will team up with you. Good luck! -- Marcello Perathoner webmaster@gutenberg.org From schultzk at uni-trier.de Fri Nov 3 04:38:00 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri Nov 3 04:38:08 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <454B2C66.5020100@perathoner.de> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> <454B2C66.5020100@perathoner.de> Message-ID: Hi Marcello, Am 03.11.2006 um 12:47 schrieb Marcello Perathoner: > Schultz Keith J. wrote: > >> First off, his method is not any better than anything or worse >> than your socalled TEI ! As you said it is not finished nor >> anywhere near it. The TEI movement has been around for at least >> 5 years. 
As far as I am concerned it is vaporware. So if >> Bowerbird >> wants to get something going let him do it. > > zml is worse than TEI because it cannot represent text features > that are > widely used in the library. If a simple text format could do that, > major > universities would use it. They all use TEI instead. The only place I have heard about TEI is here. I also do not care to know where it is used, so do not even think about telling me where. Whether zml or TEI is better I personally could not care less. IMHO neither is worth the effort. There are already better systems out there that work and have tools for them. As I have said time and time again, people like to reinvent the wheel to get a so-called simpler method, while ending up with a system that has serious shortcomings. > > But if you mention that to BB, he just tells you to do his homework > for him. To quote a friend: "One million flies can not err!! Eat Shit!!" This quote is NOT directed at BB. > > The TEI markup language is long since ready to use, it is > undergoing its > 5th revision right now. What is still in development are the tools to > convert TEI to PG end-user formats. But this doesn't stop you from > marking up any text using the full TEI right now. > > >> THE ACTUAL PROBLEM PG has is that there is no official specification >> of an official format or markup!! If there was one it would >> only take >> at the max 2 months to get something working. BUT, for the >> past 10 >> years all I see is bits and pieces and a lot of discussion. > > Your judgement of development time is very poor. BB is working > nearly 4 > years on his very easy to do zml thingie and has nothing to show. But > even if he knew how to program, he couldn't do anything in 2 months. I was NOT talking about Bowerbird!!!! I was talking about PG in general !!! > > >> How about we all sit down and discuss an official >> format, >> specify it, and >> get the tools implemented?
Instead of everybody doing their >> own >> thing !! >> The implementation will naturally have to be multi-platform >> and able >> to be >> web-based, too. We will have to use freely available programming >> tools and languages. > > Design by committee will never work in PG. It is against Michael > Hart's > expressed policy to step down and give directions. It would, if he allowed it. It works everywhere else in the world!!! > > So, if you want an "official" format, the way to go is to create it > yourself and make it so good it will blow your opponents' formats > away. > Maybe BB will team up with you. Good luck! Nice contradiction here !! I, Keith J. Schultz, can make and force an official PG format on everybody ???!!! Come on, Marcello. I get the feeling you have a big ego problem. I would have already set up a working system 10 years ago, if Michael would have allowed it. But, I will not start doing anything unless it gets an official go-ahead and will be used by PG officially. My time is too precious to waste on anything else. regards Keith. From marcello at perathoner.de Fri Nov 3 04:48:20 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 04:48:24 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: References: Message-ID: <454B3A94.3000301@perathoner.de> Bowerbird@aol.com wrote: > but i have something to say. New *updated* edition! Illustrated! Get it now! Run! "The Proof is in the Poo" The Collected Sayings of Bowerbird.
As HTML: http://www.gnutenberg.de/bowerbird/ Also available as TEI master: http://www.gnutenberg.de/bowerbird/poo.tei -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 3 05:33:10 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 05:33:13 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> <454B2C66.5020100@perathoner.de> Message-ID: <454B4516.3070605@perathoner.de> Schultz Keith J. wrote: > I know what it takes to do the job! It is my profession: Linguistics. but he also wrote: > The only place I heard about TEI is here. "The TEI was originally sponsored by the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC). Major support has been received from the U.S. National Endowment for the Humanities (NEH), the European Community, the Mellon Foundation, and the Social Sciences and Humanities Research Council of Canada." http://www.tei-c.org/ A list of 134 projects using TEI can be found here: http://www.tei-c.org/Applications/ > I would have already set up a working system 10 years ago, if Michael > would have allowed it. But, I will not start doing anything unless > it gets an official go-ahead and will be used by PG officially. I'm sure you would! Especially with the great competence you have already shown in your very own professional field. Bottom line: You never will get PG to officially endorse any one format. Your only chance is to create a format that is so much better than any other format, that PG and DP volunteers don't want to use anything else. At that point it will automatically become the "official" format.
Tip: don't go to DP and piss off everybody in sight like BB did. DP people will have a very loud say in this question because it is they who create the books. -- Marcello Perathoner webmaster@gutenberg.org From hyphen at hyphenologist.co.uk Fri Nov 3 06:15:42 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 3 06:16:17 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <454B3A94.3000301@perathoner.de> References: <454B3A94.3000301@perathoner.de> Message-ID: <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> On Fri, 03 Nov 2006 13:48:20 +0100, Marcello Perathoner wrote: | http://www.gnutenberg.de/bowerbird/poo.tei >>> The XML page cannot be displayed Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later. -------------------------------------------------------------------------------- The system cannot locate the object specified. Error processing resource 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent... %TEI.extensions.ent; <<< So modern I cannot view it with the latest IE 6.0.2900.2180 What use is something which *ordinary* people cannot read? -- Dave Fawthrop From joshua at hutchinson.net Fri Nov 3 06:31:13 2006 From: joshua at hutchinson.net (joshua@hutchinson.net) Date: Fri Nov 3 06:31:24 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here Message-ID: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> >----Original Message---- >From: marcello@perathoner.de > >Schultz Keith J. wrote: > >> I would have already set up a working system 10 years ago, if Michael >> would have allowed it. But, I will not start doing anything unless >> it gets an official go-ahead and will be used by PG officially. > > >Bottom line: You never will get PG to officially endorse any one format. > >Your only chance is to create a format that is so much better than any >other format, that PG and DP volunteers don't want to use anything else.
>At that point it will automatically become the "official" format. > Marcello is right. Greg and Michael have both said, in private communications and in public messages, that they will not dictate direction in PG. Michael, especially, likes the "throw it against the wall and see if it sticks" method of management. While it can be frustrating at times, because a more decisive leadership can often "make things happen," this is not something that is likely to change. Ever. So, we have to plan with that in mind. That said, the problem with TEI as a master format isn't that the format itself isn't ready. It isn't even really the toolchain that converts TEI to other formats (and it is more formats than bb gives us credit for ... HTML, PDF, UTF-8, Latin-1, and US-ASCII are directly created, then a background server script creates a plucker document after posting ... so 6 formats are created from one TEI master), the problem is during the earlier steps. Creating the TEI doc itself. We have a good set of guidelines, but no tools specifically designed to help in that process. Personally, I use the GuiGuts editor developed by thundergnat over at DP (a wonderful little perl-based editor!) and a series of Regular Expressions that do the heavy lifting. But I still spend quite a bit of time manually tweaking the results. It's nice that once I'm done and it validates in TEI, I *know* the results will validate in HTML, create a good PDF and follow PG guidelines in the text documents, but it'd be nicer to have a better interface/toolset for creating the TEI. Unfortunately, it is a bit of a chicken and egg thing. Until I can make TEI more popular with folks, the developers don't make the tools. And until I have the tools, I can't get enough people to use it to reach critical mass. I can create texts (and do) in TEI, but I don't have the skills to make tools for helping in the creation of said TEI docs. As Jon Noring agreed earlier, there are NEVER enough developers to go around! 
;) Josh From desrod at gnu-designs.com Fri Nov 3 06:39:29 2006 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Fri Nov 3 06:40:18 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> References: <454B3A94.3000301@perathoner.de> <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> Message-ID: <1162564769.18628.11.camel@localhost.localdomain> On Fri, 2006-11-03 at 14:15 +0000, Dave Fawthrop wrote: > What use is something which *ordinary* people can not read? "ordinary" people use the HTML version, and those with the proper TEI environment set up use the TEI version. -- David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From hart at pglaf.org Fri Nov 3 08:37:14 2006 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 3 08:37:16 2006 Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> Message-ID: Neither Joshua Hutchinson, Marcello Perathoner, nor Keith Schultz are correct, and, in two of these three cases, this is part of a long-running pattern, so it is likely an intentional error. Refutations applied to each comment below: On Fri, 3 Nov 2006, joshua@hutchinson.net wrote: >> ----Original Message---- From: marcello@perathoner.de >> >> Schultz Keith J. wrote: >> >>> I would have already set up a working system 10 years ago, if Michael >>> would have allowed it. But, I will not start doing anything unless >>> it gets an official go-ahead and will be used by PG officially. 1. Project Gutenberg encourages all such system setups and always has. 2.
You are now, and always have been, welcome to your own directories on the official Project Gutenberg servers to work with. Greg Newby will be only too glad to set up any such services you like, and even to help recruit volunteers to help you. >> Bottom line: You never will get PG to officially endorse any one format. To the extent of an exclusive endorsement that would disallow other formats that is most likely true, but if you don't have faith in your own format, then you can't expect anyone else to have it. There will be no government-sponsored religions here. >> Your only chance is to create a format that is so much better than any other >> format, that PG and DP volunteers don't want to use anything else. At that >> point it will automatically become the "official" format. As it should be. > Marcello is right. Greg and Michael have both said, in private communications > and in public messages, that they will not dictate direction in PG. We will not dictate one direction at the expense of other similar efforts. This is not a race to create the official exclusive Project Gutenberg format. We will present eBooks in lots of formats, particularly those requested by those who read our books. > Michael, especially, likes the "throw it against the wall and see if > it sticks" method of management. The actual quotation is: "We encourage you to run your ideas up the flagpole and see who salutes." If you can't get anyone to use your format, we are hardly going to force your ideas through anyone's alimentary canal, either. > While it can be frustrating at times, because a more decisive leadership can > often "make things happen," this is not something that is likely to change. > Ever. So, we have to plan with that in mind. "Make things happen" is exactly what Project Gutenberg encourages. What you seem to want is for someone else to "make things happen" for you. We'll help, but we won't do it for you. And we won't declare you or your format the "official" winner.
There will always be room for improvements. Project Gutenberg is a dynamic process to maximize the eBook potential, not a static system to be once achieved and then left as a fossil record. "Make Things Happen" Don't wait for someone else to tell you that your idea has happened. > Unfortunately, it is a bit of a chicken and egg thing. Until I can > make TEI more popular with folks, the developers don't make the tools. > And until I have the tools, I can't get enough people to use it to > reach critical mass. I can create texts (and do) in TEI, but I don't > have the skills to make tools for helping in the creation of said TEI docs. Just start by posting your examples and pointing to them. That's how YouTube, MySpace, Google, Yahoo, and Project Gutenberg started. Don't expect to start at the end; it helps to start at the beginning. > As Jon Noring agreed earlier, there are NEVER enough developers to go > around! ;) > > Josh Or, there are too many developers creating not enough example eBooks to generate any interest. Over the 10 years mentioned at the top of this commentary, if you had created just one eBook per month for any particular new style of format then you would have a collection of well over 100 eBooks to demonstrate. Without such an initial collection, it's hard to expect anyone to come. "Build it, and they will come." Thanks!!! Give the world eBooks in 2006!!! Michael S.
Hart Founder Project Gutenberg Blog at http://hart.pglaf.org From marcello at perathoner.de Fri Nov 3 08:51:30 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 08:51:33 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> References: <454B3A94.3000301@perathoner.de> <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> Message-ID: <454B7392.1060201@perathoner.de> Dave Fawthrop wrote: > On Fri, 03 Nov 2006 13:48:20 +0100, Marcello Perathoner > wrote: > > | http://www.gnutenberg.de/bowerbird/poo.tei > The XML page cannot be displayed > Cannot view XML input using style sheet. Please correct the error and then > click the Refresh button, or try again later. > > > -------------------------------------------------------------------------------- > > The system cannot locate the object specified. Error processing resource > 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent... > > %TEI.extensions.ent; > <<< > > So modern I can not view it with latest IE 6.0.2900.2180 > > What use is something which *ordinary* people can not read? The error is in your browser. The object is served as "text/plain". That means: It is to be displayed to the user without further ado. IE has no business trying to interpret it in any way. $ wget -S http://www.gnutenberg.de/bowerbird/poo.tei --17:26:32-- http://www.gnutenberg.de/bowerbird/poo.tei => `poo.tei' Resolving localhost... 127.0.0.1 Connecting to localhost|127.0.0.1|:8118... connected. Proxy request sent, awaiting response... 
HTTP/1.1 200 OK Date: Fri, 03 Nov 2006 16:26:32 GMT Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1 Last-Modified: Fri, 03 Nov 2006 12:50:56 GMT ETag: "27400a-7c6c-2fe12c00" Accept-Ranges: bytes Content-Length: 31852 Content-Type: text/plain; charset=utf-8 Connection: close Length: 31,852 (31K) [text/plain] 100%[====================================>] 31,852 123.50K/s 17:26:32 (123.38 KB/s) - `poo.tei' saved [31852/31852] $ -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Fri Nov 3 08:58:09 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Nov 3 08:58:14 2006 Subject: [gutvol-d] The Proof is in the Poo Message-ID: marcello said: > Also available as TEI master: > http://www.gnutenberg.de/bowerbird/poo.tei ok, that's kind of funny... :+) -bowerbird From Bowerbird at aol.com Fri Nov 3 10:10:54 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Nov 3 10:11:04 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here Message-ID: josh said: > it is more formats than bb gives us credit for ... > HTML, PDF, UTF-8, Latin-1, and US-ASCII are directly created surely you don't count different encodings as different "formats". > then a background server script creates a plucker document > after posting ... so 6 formats are created from one TEI master well, at least plucker is an honest-to-goodness e-book format, but creating plucker from .html isn't all that earth-shaking, is it? plucker can pull down any set of associated pages on a site, no? > the problem is during the earlier steps. > Creating the TEI doc itself. right. applying the markup. that's always the hard part. the part that invokes costs -- human time and energy. (not to mention expertise.)
and if the only reason for paying these costs is to get the benefit of multiple formats, and that benefit can be obtained by use of simpler (and thus less costly) means, then there's no sense in applying the markup. you're gonna have to come up with different benefits if you really want to be convincing. follow the lead that david gave a few weeks back, based on network effects. > We have a good set of guidelines, but no tools > specifically designed to help in that process. and here the question of costs raises its ugly head again. a complex format requires complex tools to deal with it, and such tools are difficult and time-consuming to create. and that's precisely why you don't have the tools you need. > Personally, I use the GuiGuts editor developed by thundergnat > over at DP (a wonderful little perl-based editor!) and a series of > Regular Expressions that do the heavy lifting. i encourage everyone to take a good hard look at guiguts and delve into regular expressions to see if that approach will work for you. it might. or it might not. you will know. > But I still spend quite a bit of time manually tweaking the results. right. more time and energy and expertise. all to get out a .pdf and an .html version. which i can also return, with a simple .zml file, which can be edited together by a 4th-grader. and yes, i haven't yet demonstrated those capabilities, not fully, but i have begun the process of doing so, and will be continuing. (actually, i did show output from my process some time ago, for "alice in wonderland", but it was just shouted down and then forgotten about, the modus operandi of my detractors.) *** marcello said: > d.p. is currently averaging about 2000 books a year. that's a healthy number. it makes me happy to see it. since p.g. took 10 years to get its first 10,000 books (and 20 years before that to get its first _100_ books), i have the historical perspective to appreciate 2000/year.
yet still, google's contract with the university of california calls for u.c. to provide 3000 books per _day_ for scanning. and who knows how many more are being done each day from michigan and stanford and all of the other partners? so let us hope we can open up more digitization channels than d.p. to handle this huge avalanche of scanned books, or p.g. will become little more than a former waystation... yet the .tei move challenges even the dedicated technoids over at d.p. so how will "ordinary people" be able to deal? and if we can't use the input from "ordinary people", then how will we be able to keep up with the deluge? we won't. so you're being squeezed in two directions. at the same time that you need to make things easier, to get more volunteers, you're moving to a system that makes digitization _harder_... something's got to give... *** michael said: > "Build it, and they will come." as it should be. i want a nice little contest, me alone against the .tei gang. i've said it before, and i'll say it again: i know i have a winning hand. and i'm not gonna show it until the other people at the table _fold_, or throw all their money in the pot. i want the .tei people to invest _lots_ of their precious time and energy in their format, so they see with their own eyes, and understand from their own experience, that the cost-benefit ratio cannot be justified. depending on how stubborn they are, they might have to spend a lot of time and energy before they get that realization. but that's putting your money where your mouth is. when they have put all their money in the pot, i'll show y'all _my_ hand... which is not to say that i'm not willing to throw a few cards down now. so... i'm sorry i got suckered -- even just a little bit -- back into this "argument", because we're into pudding time now, my friends, where you will begin to find the proof. more on "babelfish" starting monday... 
-bowerbird From joshua at hutchinson.net Fri Nov 3 10:28:33 2006 From: joshua at hutchinson.net (joshua@hutchinson.net) Date: Fri Nov 3 10:29:00 2006 Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here Message-ID: <85688.1162578513975.JavaMail.?@fh1063.dia.cp.net> >----Original Message---- >From: hart@pglaf.org > >Neither Joshua Hutchinson, Marcello Perathoner, nor Keith Schultz are correct, >and, in two of these three cases, this is part of a long-running pattern, so >it is likely an intentional error. It looks like you were taking my comments as disparaging you or the way PG does things. I apologize. It wasn't meant that way. There are, however, a few things I want to clear up, so I'll add comments to your comments. > >Refutations applied to each comment below: > >2. You are now, and always have been, welcome to your own directories >on the official Project Gutenberg servers to work with. Greg Newby >will be only too glad to set up any such services you like, and even >to help recruit volunteers to help you. > Yes, this is absolutely true. I've had a directory on the PG server for a couple years now (as long as I've been working actively on the PGTEI stuff). Greg set me up, almost no questions asked, and has given me lots of space to work. I've since started using the same account for audio posting duties, so it isn't *just* for PGTEI anymore, but that is what it was originally intended for. > >> Marcello is right. Greg and Michael have both said, in private communications >> and in public messages, that they will not dictate direction in PG. > >We will not dictate one direction at the expense of other similar efforts. > >This is not a race to create the official exclusive Project Gutenberg format.
> >We will present eBooks in lots of formats, particularly those request by >those to read our books. > We just said the same thing, Michael. You make it sound like I am looking for official backing. I'm not. I was explaining to Keith *why* such backing is never likely to be forthcoming. Personally, I can see why a benevolent dictatorship style of managing has its appeal for Keith. Things, right or wrong, tend to get done. But the more chaotic and *democratic* approach that PG has always followed allows for a survival of the fittest approach to new ideas. This has its merits too. My intention, however, was not to argue or hint at a preference for either, but rather explain why (in my opinion) things work the way they do. > >> Michael, especially, likes the "throw it against the wall and see if >> it sticks" method of management. > >The actual quotation is: > >"We encourage you to run your ideas up the flagpole and see who salutes." > *sings* You say tomato, I say tomato ... Ahem, sorry, my voice stinks. ;) > >What you seem to want is for someone else to "make things happen" for you. > Not at all. As I explained above, my comments were not a whine or call for help from above (and I apologize if I was unclear in that regard), but rather an explanation of how PG works. > >> Unfortunately, it is a bit of a chicken and egg thing. Until I can >> make TEI more popular with folks, the developers don't make the tools. >> And until I have the tools, I can't get enough people to use it to >> reach critical mass. I can create texts (and do) in TEI, but I don't >> have the skills to make tools for helping in the creation of said TEI docs. > >Just start by posting your examples and pointing to them. > I do. A lot of them. In fact ... > >Over the 10 years mentioned at the top of this commentary, if you had >created just one eBook per month for any particular new style of format >then you would have a collection of well over 100 eBooks to demonstrate. 
> I've been doing this for just over 2 years and have 81 PGTEI encoded documents posted to the PG archive. I work my butt off, in fact. However, I'm not a good evangelist. I'll be the first to admit it. I've tried to make the information known in a low-key manner and some people will come to me with questions because of it, but I've found the lack of utilities is the biggest stumbling block for new people. They like the potential, but the tools to help them get there are not robust. Yet. > >"Build it, and they will come." > Yes. I agree. The problem is that I can't build the tools. I can do examples, I can clarify guidelines and I can provide advice and feedback (all of which I do), but I can't do much more than basic scripting in the way of tools. (I created a script for David Widger that helps him quickly run a TEI doc through the TEI toolchain to create the posting files, then zip them up in a file ready for posting, but that is the extent of my abilities). Marcello, who does wonderful work on the PGTEI toolchain, also has lots of other demands on his volunteer time since he is our webmaster, too (not to mention his life outside PG). I can't rely on him to provide tools for editing/generating the TEI as well. Hence, my original "chicken and egg" comment. Josh From lee at novomail.net Fri Nov 3 10:31:00 2006 From: lee at novomail.net (Lee Passey) Date: Fri Nov 3 10:31:10 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> Message-ID: <454B8AE4.4000603@novomail.net> joshua@hutchinson.net wrote: > ... the problem is during the earlier steps. Creating the TEI doc > itself. We have a good set of guidelines, but no tools specifically > designed to help in that process. Is this really true? I don't mean about the tools, because there clearly are no good tools to help with TEI doc creation (although if Mr. 
Noring's push to create an e-book-oriented XML editor ever comes to fruition, that could help). I mean about the good set of guidelines. The last time I looked (admittedly about a year ago) I couldn't find any. Can you provide a link? How were these guidelines created? Are they merely a pointer to the TEI documentation, which is actually more obtuse than BB's ZML "specification"? Are they a result of discussion and consensus, and is that discussion archived somewhere? Or did you and Marcello simply put something together and say "here are the guidelines?" (I'm not saying that's a bad thing; consensus on the Internet is almost impossible to achieve, guidelines by experts and stake-holders have more relevance than popular opinion, and almost any guidelines, no matter how derived, are better than no guidelines at all.) Just curious. From joshua at hutchinson.net Fri Nov 3 10:42:40 2006 From: joshua at hutchinson.net (joshua@hutchinson.net) Date: Fri Nov 3 10:42:44 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here Message-ID: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net> We do have guidelines. I suppose whether they are good or not is in the eye of the beholder. ;) What we have is here: http://pgtei.pglaf.org/marcello/0.4/ (Warning: Due to the migration Greg is currently doing to a newer and faster pglaf server, the page is down.) It is deliberately structured like the TEI documentation and does contain many links back to their much more extensive documentation. There have been changes made, both due to my feedback and due to others that have tried to use it. However, it could definitely be improved and fleshed out. As far as community feedback, that tends to happen more in the DP forums where folks are more actively putting together and talking about new etexts.
Josh >----Original Message---- >From: lee@novomail.net >Date: Nov 3, 2006 13:31 >To: "Project Gutenberg Volunteer Discussion" >Subj: Re: [gutvol-d] gvd061030 -- let's get it started in here > >joshua@hutchinson.net wrote: > >> ... the problem is during the earlier steps. Creating the TEI doc >> itself. We have a good set of guidelines, but no tools specifically >> designed to help in that process. > > >Is this really true? I don't mean about the tools, because there clearly >are no good tools to help with TEI doc creation (although if Mr. >Noring's push to create an e-book-oriented XML editor ever comes to >fruition, that could help). I mean about the good set of guidelines. The >last time I looked (admittedly about a year ago) I couldn't find any. Can >you provide a link? > >How were these guidelines created? Are they merely a pointer to the TEI >documentation, which is actually more obtuse than BB's ZML >"specification"? Are they a result of discussion and consensus, and is >that discussion archived somewhere? Or did you and Marcello simply put >something together and say "here are the guidelines?" (I'm not saying >that's a bad thing; consensus on the Internet is almost impossible to >achieve, guidelines by experts and stake-holders have more relevance >than popular opinion, and almost any guidelines, no matter how derived, >are better than no guidelines at all.) > >Just curious. > From marcello at perathoner.de Fri Nov 3 11:23:36 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 11:23:39 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <454B8AE4.4000603@novomail.net> References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> <454B8AE4.4000603@novomail.net> Message-ID: <454B9738.4050603@perathoner.de> Lee Passey wrote: > Is this really true?
I don't mean about the tools, because there clearly > are no good tools to help with TEI doc creation Sebastian Rahtz of Oxford University Computing Services has created a set of stylesheets to enable OpenOffice to load and save TEI documents. I never checked them out though. > did you and Marcello simply put > something together and say "here are the guidelines?" Yes. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 3 12:28:51 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 12:28:55 2006 Subject: [gutvol-d] Challenge In-Reply-To: References: Message-ID: <454BA683.9030303@perathoner.de> Bowerbird@aol.com wrote: > and if the only reason for paying these costs is to get > the benefit of multiple formats, and that benefit can be > obtained by use of simpler (and thus less costly) means, > then there's no sense in applying the markup. Strawman! Nobody is disputing that *if* you can get there cheaper you should do it. We are just saying that ZML cannot get us there. Therefore it is immaterial how cheap it is. All you have "demonstrated" until today is, that you can select *one* text that is simple enough to make ZML look good. Instead you should demonstrate that ZML is mighty enough to handle *every* text in the library. Say: footnotes, endnotes, tables, lists, titles, subtitles, images, equations, I propose this contest: I will select one PG text for you to mark up in ZML with all textual features preserved. You will select one PG text for me to mark up in TEI with all textual features preserved. In one week we will present our results online with the end-user formats we generated, the markup source and the open-source source code of all tools used in the process. If the end-user formats cannot be regenerated using only the master format and the tools provided in source code, the entrant is disqualified. 
This will assure that both of us operate with the tools available today on a real-world text of some difficulty and that the tools used are freely available today to everybody else. Chicken out? > a complex format requires complex tools to deal with it, > and such tools are difficult and time-consuming to create. > and that's precisely why you don't have the tools you need. Misrepresentation of facts! I have all the editors I need. I have emacs and I have nxml-mode which makes applying TEI tags a breeze and validates while I type so I don't have to waste time looking for markup errors. I have vi, I have OpenOffice and a couple commercial ones. All those handle TEI. All you have is some generic text editor that helps nothing to get your format right. And nobody is going to develop an editor that understands ZML for you. I have all the libraries I need to read, transform and write my format. For all programming languages I might choose. Good and time-tested ones. All open-source. You have not one library for your format. Even if you write one in perl, you still have none in java. If you write one in java you still have none in python. ... I have all the support I need. There is a dedicated mailing list for TEI chock full of knowledgeable people who want to help. These people all work in humanities and if I have a question eg. concerning TEI and Thai script I'll get an answer. You have no support at all. If a problem surfaces your only choice is to resort to the time-tested method of denying there is a problem at all. 
-- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Fri Nov 3 13:00:26 2006 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 3 13:00:28 2006 Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <85688.1162578513975.JavaMail.?@fh1063.dia.cp.net> References: <85688.1162578513975.JavaMail.?@fh1063.dia.cp.net> Message-ID: On Fri, 3 Nov 2006, joshua@hutchinson.net wrote: > >> ----Original Message---- >> From: hart@pglaf.org >> >> Neither Joshua Hutchinson, Marcello Perathoner, nor Keith Schultz are correct, >> and, in two of these three cases, this is part of a long running pattern, so >> it is likely an intentional error. > > It looks like you were taking my comments as disparaging against you > or the way PG does things. I responded to the content only, of all three voices. Perhaps I should have included Jon Noring, but he was only a 3rd party, so I didn't. > I apologize. It wasn't meant that way. Since I responded to content only, no need to apologize. > There are, however, a few things I want to clear up, so I'll add > comments to your comments. > >> >> Refutations applied to each comment below: > >> >> 2. You are now, and always have been, welcome to your own directories >> on the official Project Gutenberg servers to work with. Greg Newby >> will be only too glad to set up any such services you like, and even >> to help recruit volunteers to help you. >> > > Yes, this is absolutely true. I've had a directory on the PG server > for a couple years now (as long as I've been working actively on the > PGTEI stuff). Greg set me up, almost no questions asked, and has given me > lots of space to work. I've since started using the same account for > audio posting duties, so it isn't *just* for PGTEI anymore, but that is > what it was originally intended for. However, there is no reason you couldn't get other directories for other purposes than just your original purpose. > >> >>> Marcello is right.
Greg and Michael have both said, in private communications >>> and in public messages, that they will not dictate direction in PG. >> >> We will not dictate one direction at the expense of other similar efforts. >> >> This is not a race to create the official exclusive Project Gutenberg format. >> >> We will present eBooks in lots of formats, particularly those requested by >> those who read our books. >> > > We just said the same thing, Michael. You make it sound like I am > looking for official backing. I'm not. I was explaining to Keith > *why* such backing is never likely to be forthcoming. Again, I was responding to all three voices, no need to apologize for what someone else said. > Personally, I can see why a benevolent dictatorship style of managing > has its appeal for Keith. Things, right or wrong, tend to get done. Benevolent dictatorships can be very effective, and have been; even inside PG there are subgroups that are very disciplined, and that is fine with me, as long as they don't try to take over the entire Project Gutenberg structure, which it seems they try to do every year or so. > But the more chaotic and *democratic* approach that PG has always > followed allows for a survival of the fittest approach to new ideas. > This has its merits too. No reason you can't have both, and should not have both, when we have the whole world as potential volunteers. > My intention, however, was not to argue or hint at a preference for either, > but rather explain why (in my opinion) things work the way they do. However, there is no reason we can't have both. . .which would require, by structural definition, that the very highest level accept both, then the sub-levels can define themselves as they please. . .but with a top level without such flexibility, this cannot happen, other than by a volunteer revolt. >>> Michael, especially, likes the "throw it against the wall and see if >>> it sticks" method of management.
>> >> The actual quotation is: >> >> "We encourage you to run your ideas up the flagpole and see who salutes." >> > > *sings* You say tomato, I say tomato ... If you are going to use quotes, you should actually be quoting what was said, or identify some other source. Since you used my name, you are responsible for using my quotation, unless otherwise specified, and then it is still a red herring. > Ahem, sorry, my voice stinks. ;) [comment withheld] >> >> What you seem to want is for someone else to "make things happen" for you. >> > > Not at all. As I explained above, my comments were not a whine or > call for help from above (and I apologize if I was unclear in that > regard), but rather an explanation of how PG works. Not all I wrote was in response to comments of one person, as mentioned in the first line. And how PG works is to allow pretty much everyone to do their own things. What bothers some here is that PG does NOT work by creating bosses. >>> Unfortunately, it is a bit of a chicken and egg thing. Until I can >>> make TEI more popular with folks, the developers don't make the tools. >>> And until I have the tools, I can't get enough people to use it to >>> reach critical mass. I can create texts (and do) in TEI, but I don't >>> have the skills to make tools for helping in the creation of said >>> TEI docs. >> >> Just start by posting your examples and pointing to them. >> > > I do. A lot of them. In fact ... > >> >> Over the 10 years mentioned at the top of this commentary, if you had >> created just one eBook per month for any particular new style of format >> then you would have a collection of well over 100 eBooks to demonstrate. >> > > I've been doing this for just over 2 years and have 81 PGTEI encoded > documents posted to the PG archive. I work my butt off, in fact. > However, I'm not a good evangelist. I'll be the first to admit it. Greg, and I, and the Newsletter editors would be only too happy to help here.
> I've tried to make the information known in a low-key manner and some > people will come to me with questions because of it, but I've found the > lack of utilities is the biggest stumbling block for new people. They > like the potential, but the tools to help them get there are not robust. Yet. Perhaps we could be less low key on your behalf than you would be. >> >> "Build it, and they will come." >> > > Yes. I agree. The problem is that I can't build the tools. I can do > examples, I can clarify guidelines and I can provide advice and > feedback (all of which I do), but I can't do much more than basic > scripting in the way of tools. (I created a script for David Widger > that helps him quickly run a TEI doc through the TEI toolchain to > create the posting files, then zip them up in a file ready for posting, > but that is the extent of my abilities). This is why we should make an issue of promoting your project and getting volunteers to help further the tool development! > Marcello, who does wonderful work on the PGTEI toolchain, also has > lots of other demands on his volunteer time since he is our webmaster, > too (not to mention his life outside PG). I can't rely on him to > provide tools for editing/generating the TEI as well. Hence, my > original "chicken and egg" comment. Hence my comments on doing some PR for you. > > Josh > Michael From Bowerbird at aol.com Fri Nov 3 14:26:12 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Nov 3 14:26:29 2006 Subject: [gutvol-d] Challenge Message-ID: marcello said: > The error is in your browser. standard technoid answer: it's _your_ fault. man i'm glad i'm from the mac world, where the mantra is "...it just works..." *** marcello said: > A list of 134 projects using TEI can be found here: > http://www.tei-c.org/Applications/ and guess what... in terms of societal mindshare as a cyberspace library, project gutenberg kicks the ass out of _all_ of them, not just individually, but _combined_.
and that's because, instead of the goop of tag soup, michael hart had the vision to give people _the_words_. and -- in the invention of "remixing" before anyone had even coined the term for it -- people took those words and they _ran_ with them. that's the reason why project gutenberg, with no money, ended up outcompeting all of those well-funded projects, which continue to cost money while providing few benefits. and now you want project gutenberg to mimic _their_ methods? you're not the solution. you're the problem. *** marcello said: > Nobody is disputing that *if* you can get there cheaper > you should do it. ok, good, i'm glad we can agree on economics 101 anyway. > We are just saying that ZML cannot get us there. since "there" is a relatively ambiguous place, i'm not sure how you can say that so casually. i was talking about .html and .pdf conversions. i can pull that off. that and a whole lot more... > Therefore it is immaterial how cheap it is. it's not quite so easy to cast cost-benefit in those terms. you have to specify the exact benefits and the exact costs, and compare it to other combinations, and then decide that. it's _never_ "immaterial" how cheap something is, unless it returns absolutely _no_ benefits... > All you have "demonstrated" until today is, that you can > select *one* text that is simple enough to make ZML look good. don't be silly. i didn't select that text. jon noring did. and anyway, i've had several other books up, for some time now. here's "books and culture", by hamilton wright mabie: > http://www.greatamericannovel.com/mabie/mabiec001.html i didn't select this one either; it was google's first example book. and here's "the secret garden", by frances hodgson burnett: > http://www.greatamericannovel.com/sgfhb/sgfhbc001.html i didn't "select" this one, either, but i can't remember why i have it; it might have been compare-independent-digitizations research. 
and here's "the open library", by brewster kahle: > http://www.greatamericannovel.com/tolbk/tolbkp001.html i did this for reasons that should be obvious, namely its relevance. and then of course there's the "alice in wonderland": > http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.html this one has been up for over a year now (2005/09/20). now, it's certainly true that all of these books were fairly simple. but here's a newsflash: there are tons and tons of simple books. most of the books in the library already are relatively simple books. of course, even a simple book becomes complicated in a complex format. so, hey, if you think you can sell your complex format, go right ahead... but i'd suggest you get cracking, because in just a couple weeks from now, after i've shown people here how a simple format can handle simple books, and maybe a good percentage of the complicated ones as well, it's gonna be a lot harder for you to pull off that con job. > Instead you should demonstrate that ZML is mighty enough > to handle *every* text in the library. Say: > footnotes, endnotes, tables, lists, titles, subtitles, images, equations, everything in your list is in my test-suite (with equations as images): > http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.zml > http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.html by the way, even though aaron's latest post was just the same old schtick, it reminded me that since tex is just ascii (right?), there's no reason that i can't bundle it in. so just save your tex file(s) alongside the main text, and my viewer will send them to the tex processor of the user's choice. this -- as i have told you -- is a method that can be used for _any_ type of auxiliary file(s) that you might want to bundle inside of your e-book. so they wouldn't even have to be ascii files; any file-format would do... 
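A minimal sketch of the bundling idea described above, assuming auxiliary files simply sit in the same directory as the main text and are matched to an external program by file extension. Nothing here comes from any actual ZML viewer: the handler table, the function names, and the command templates (including the choice of `pdflatex`) are hypothetical and only illustrate the dispatch step.

```python
import shlex
from pathlib import Path

# Hypothetical user preferences: file extension -> command template.
# These entries are illustrative assumptions, not part of any real viewer.
DEFAULT_HANDLERS = {
    ".tex": "pdflatex {path}",
    ".png": "open {path}",
}


def find_auxiliary_files(main_text, handlers=DEFAULT_HANDLERS):
    """Return files next to the main text whose extension has a handler."""
    main = Path(main_text)
    return sorted(
        p for p in main.parent.iterdir()
        if p.is_file() and p != main and p.suffix.lower() in handlers
    )


def build_command(aux_file, handlers=DEFAULT_HANDLERS):
    """Build the argument list for the user's chosen program (does not run it)."""
    template = handlers[Path(aux_file).suffix.lower()]
    # Split the template first, then substitute the path, so a path that
    # contains spaces still ends up as a single argument.
    return [part.format(path=str(aux_file)) for part in shlex.split(template)]
```

The command is only built, not executed; a caller that trusts its handler table could pass the resulting list to `subprocess.run`. Files with no registered handler are skipped rather than guessed at.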
> I propose this contest: > I will select one PG text for you to mark up in ZML > with all textual features preserved. > You will select one PG text for me to mark up in TEI > with all textual features preserved. one text? right up above you wanted me to handle _every_ text in the library. now you want to construe the contest in terms of _one_text_? sorry, the contest has already been framed: it's the entire library. whenever you're ready to start making .tei texts in earnest, do it. i'm ready to start matching you on a text-for-text basis any time. i'll maintain my .zml mirror of the library by my widdle wonesome. you can get all your .tei buddies to help you maintain your mirror. and the decision as to the winner will be made by _the_people_... the same people who have voted project gutenberg #1 so far... and hey, if you wanna point me to a certain text now, i'll take a look, and see how .zml would approach it. i'll remind people that i have been asking for this kind of pointer for some time now. you should even give me a half-dozen such pointers, because i'm sure you will point to something esoteric, and i'll have to wave off a few of 'em, since i've always said there will be some texts that .zml can't do... and if you want me to point you to one text, it'll be my test-suite. i will likewise remind people that i've been asking to see that done in .tei for well over a year already now, and nobody ever did it... > and the open-source source code of all tools used in the process you really wanna see the cards in my hand before you throw all of your money in the pot, don't you? i don't blame you. why waste your time away, eh? unfortunately for you, that's not how the game is played. if you were totally confident that you were gonna win, you'd be as disinterested in my toolset as i am in yours. (emacs and those linux apps only a technoid could love; yeah, right, you're gonna make those palatable to people. here's another newsflash: writely, zoho, jotspot, pbwiki...) 
> nobody is going to develop an editor that understands ZML for you. "nobody" doesn't have to do that "for me", because i've already done it. i'm not a mere script kiddie; i can make honest-to-goodness applications. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061103/586b1187/attachment-0001.html From marcello at perathoner.de Fri Nov 3 15:02:05 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 3 15:02:09 2006 Subject: [gutvol-d] Challenge In-Reply-To: References: Message-ID: <454BCA6D.3040903@perathoner.de> Bowerbird@aol.com wrote: >> I propose this contest: >> I will select one PG text for you to mark up in ZML >> with all textual features preserved. >> You will select one PG text for me to mark up in TEI >> with all textual features preserved. > > sorry, the contest has already been framed: it's the entire library. Chickenheart! >> and the open-source source code of all tools used in the process > > unfortunately for you, that's not how the game is played. Nothing to show after 4 years of hype? That's too bad. -- Marcello Perathoner webmaster@gutenberg.org From jeroen.mailinglist at bohol.ph Fri Nov 3 14:57:35 2006 From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account)) Date: Fri Nov 3 15:15:20 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> Message-ID: <454BC95F.90807@bohol.ph> Schultz Keith J. wrote: > > First off, his method is not any better than anything or worse > than your socalled TEI ! As you said it is not finished nor > anywhere near it. The TEI movement has been around for at least > 5 years. 
As far as I am concerned it is vaporware. I have been using TEI for 8 years. I have posted over 200 ebooks produced using TEI. My tools are available on-line, and the TEI master files can be had for the asking. The reason I haven't posted the TEI files themselves is simply because I haven't taken enough care to make my tools ready for prime time. TEI isn't vaporware, but a format that is actually and widely used. However, since it is fairly flexible and very rich, and since it approaches text from the semantic side, not the visual-appearance side, tools for it are not always up to the task of generating good-looking or even acceptable rendered results in a multitude of formats, such as HTML, PDF, plain text, etc. With increasing complexity and features of texts, some programming and fine-tuning of processes will be required... Jeroen. From Bowerbird at aol.com Fri Nov 3 16:13:41 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Nov 3 16:13:53 2006 Subject: [gutvol-d] Challenge Message-ID: marcello said: > Chickenheart! um, did i miss a pointer to an e-text somewhere? and is my pointer to my test-suite being ignored? > Nothing to show after 4 years of hype? That's too bad. well, i'm not showing my source code to you, nope. but i invite you to join the open-source project here. and i'm still ready to match you text-for-text, any time, whenever you think you're ready to start getting started. i'm gonna make sure you've wasted a _lot_ of your time before i show other people that you're wasting your time. (but i'll make sure they see it before they waste _their_ time.) meanwhile, every day, more and more e-texts are posted that are amazingly close to being totally-finished .zml files. -bowerbird
From gbnewby at pglaf.org Fri Nov 3 17:34:59 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Nov 3 17:35:01 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net> Message-ID: <20061104013459.GA21049@pglaf.org> On Fri, Nov 03, 2006 at 06:42:40PM +0000, joshua@hutchinson.net wrote: > We do have guidelines. I suppose whether they are good or not is in > the eye of the beholder. ;) > > What we have is here: http://pgtei.pglaf.org/marcello/0.4/ > > (Warning: Due to the migration Greg is currently doing to a newer and > faster pglaf server, the page is down.) Sorry, I didn't realize...fixed now. I put it into place a week or two ago, but our free nameserver host (xname.org) had a major DDoS last week, and I moved our DNS to a new host. Next, I'll make our entries active on *both* hosts :) Let me know if anything still seems amiss. -- Greg From gbnewby at pglaf.org Fri Nov 3 17:35:50 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Nov 3 17:35:50 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: Message-ID: <20061104013550.GA21170@pglaf.org> On Fri, Nov 03, 2006 at 01:38:00PM +0100, Schultz Keith J. wrote: > ... > I would have already set up a working system 10 years ago, if Michael > would have allowed it. But, I will not start doing anything unless > it gets an official go ahead and will be used by PG officially. > My time is too precious to waste on anything else. Keith, here is your official "go ahead." As Michael responded, we'll use essentially any reasonable format. So if you have something in mind, go for it. If you were thinking that your official format would exclude other formats, that's a different issue. I don't think that will happen too soon... Officially yours, -- Greg Dr. Gregory B.
Newby Chief Executive and Director Project Gutenberg Literary Archive Foundation http://gutenberg.net A 501(c)(3) not-for-profit organization with EIN 64-6221541 gbnewby@pglaf.org From joey at joeysmith.com Fri Nov 3 18:09:23 2006 From: joey at joeysmith.com (joey) Date: Fri Nov 3 18:11:39 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: References: Message-ID: <20061104020923.GA9638@joeysmith.com> On Fri, Nov 03, 2006 at 01:10:54PM -0500, Bowerbird@aol.com wrote: > i want a nice little contest, me alone against the .tei gang. Based on what you've told us about available "ZML marked up etexts" and a little bit of Google-mojo (see http://tinyurl.com/y6kfhl), it would appear you're about 100 books behind. From cannona at fireantproductions.com Fri Nov 3 18:48:59 2006 From: cannona at fireantproductions.com (Aaron Cannon) Date: Fri Nov 3 18:51:12 2006 Subject: [gutvol-d] Challenge References: Message-ID: <00a801c6ffbc$1666acc0$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bowerbird wrote: > > everything in your list is in my test-suite (with equations as images): >> http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.zml >> http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.html > > by the way, even though aaron's latest post was just the same old schtick, > it reminded me that since tex is just ascii (right?), there's no reason > that > i can't bundle it in. so just save your tex file(s) alongside the main > text, > and my viewer will send them to the tex processor of the user's choice. > this -- as i have told you -- is a method that can be used for _any_ type > of auxiliary file(s) that you might want to bundle inside of your e-book. > so they wouldn't even have to be ascii files; any file-format would do... On an earlier occasion, he also wrote: "> Sure we do. We use TeX (or pseudo-TeX fragments). 
and that's why that's what i'll probably do as well, when the time comes that i feel that it's necessary, because that's my modus operandi, to utilize the existing conventions, to best leverage current work." "but for now, i'm not at all worried about this 'problem'." So, I'm curious. Can we expect support for equations some time before the death of the universe? What other plain-as-day modifications to ZML can we lead you to? Don't worry, those were rhetorical questions. I wouldn't want to trick you into accidentally answering me after you promised not to. Aaron Cannon - -- Skype: cannona MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFTAAgI7J99hVZuJcRAmzXAKChO19RmWqe0yvFI/WlOC1IWutu0gCgqZdN aICB3d+EUkmgMf2mBDiQxic= =XLMv -----END PGP SIGNATURE----- From schultzk at uni-trier.de Sat Nov 4 11:28:28 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Sat Nov 4 11:28:38 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <454B4516.3070605@perathoner.de> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> <454B2C66.5020100@perathoner.de> <454B4516.3070605@perathoner.de> Message-ID: <3FEB672A-57D7-4F44-972C-7D8F78BFE843@uni-trier.de> Hi Marcello, On 03.11.2006 at 14:33, Marcello Perathoner wrote: > Schultz Keith J. wrote: > >> I know what it takes to do the job! It is my profession: >> Linguistics. > > but he also wrote: > >> The only place I heard about TEI is here. > > "The TEI was originally sponsored by the Association of Computers > in the > Humanities (ACH), the Association for Computational Linguistics (ACL), > and the Association of Literary and Linguistic Computing (ALLC).
Major > support has been received from the U.S. National Endowment for the > Humanities (NEH), the European Community, the Mellon Foundation, > and the > Social Science and Humanities Research Council of Canada." > > http://www.tei-c.org/ A lot of things get sponsored in the academic world, but that does not mean they get very far in the WORLD, and it still does not mean that TEI is useful in linguistics!! Like I said, there are much better systems around for doing work in linguistics. > > > A list of 134 projects using TEI can be found here: > > http://www.tei-c.org/Applications/ > > >> I would have already set up a working system 10 years ago, if >> Michael >> would have allowed it. But, I will not start doing anything >> unless >> it gets an official go ahead and will be used by PG officially. > > I'm sure you would! Especially with the great competence you have > already shown in your very own professional field. > > Bottom line: You never will get PG to officially endorse any one > format. That is why you still haven't gotten your tools finished, or had someone like me step in and program them for you in no time flat. > > Your only chance is to create a format that is so much better than any > other format, that PG and DP volunteers don't want to use anything > else. > At that point it will automatically become the "official" format. I am not on an ego trip and do not reinvent the wheel. > > Tip: don't go to DP and piss off everybody in sight like BB did. DP > people will have a very loud say in this question because it's they > that > create the books. > If I go to DP, I would look at their concept and specifications and ask how I can help, or offer and discuss better ways of doing things. As far as TEI and zml are concerned, if you have a specific question on how to implement something -- just ask. Keith From schultzk at uni-trier.de Sat Nov 4 11:46:23 2006 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov 4 11:46:32 2006 Subject: !@!Re: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: References: <12192433.1162564273211.JavaMail.?@fh1063.dia.cp.net> Message-ID: <20ED1141-1350-4E89-9DC5-C23043E04F4C@uni-trier.de> Hi Michael, Glad you stepped in. I have to disagree with you on one point, though. If PG had a minimalistic markup as its base format, then all would benefit from that. It would be easier if, in the scanning and proofing process, information about type, chapters, paragraphs, footnotes, and graphics were preserved!! With this information, Plain Vanilla Texts, TEI, zml, html, etc. can be easily created! I am not saying one is better than the other, or that if one wishes to develop something new he should not. The exact opposite is the case. But if one has to re-edit the files from scratch, we are wasting valuable resources!! THIS IS WHY I SAY PG NEEDS A BETTER BASE FORMAT THAN PVT. regards Keith On 03.11.2006 at 17:37, Michael Hart wrote: > > Neither Joshua Hutchinson, Marcello Perathoner, nor Keith Schultz are > correct, > and, in two of these three cases, this is part of a long running > pattern, so > it is likely an intentional error. > > Refutations applied to each comment below: > > > On Fri, 3 Nov 2006, joshua@hutchinson.net wrote: > >>> ----Original Message---- From: marcello@perathoner.de >>> >>> Schultz Keith J. wrote: >>> >>>> I would have already set up a working system 10 years ago, >>>> if Michael >>>> would have allowed it. But, I will not start doing anything >>>> unless >>>> it gets an official go ahead and will be used by PG officially. > > 1. Project Gutenberg encourages all such system setups and always > has. > > 2. You are now, and always have been, welcome to your own directories > on the official Project Gutenberg servers to work with. Greg Newby > will be only too glad to set up any such services you like, and even > to help recruit volunteers to help you.
> > >>> Bottom line: You never will get PG to officially endorse any one >>> format. > > To the extent of an exclusive endorsement that would disallow > other formats > that is most likely true, but if you don't have faith in your > own format, > then you can't expect anyone else to have it. > > There will be no government sponsored religions here. > > >>> Your only chance is to create a format that is so much better >>> than any other format, that PG and DP volunteers don't want to >>> use anything else. At that point it will automatically become the >>> "official" format. > > As it should be. > > > >> Marcello is right. Greg and Michael have both said, in private >> communications >> and in public messages, that they will not dictate direction in PG. > > We will not dictate one direction at the expense of other similar > efforts. > > This is not a race to create the official exclusive Project > Gutenberg format. > > We will present eBooks in lots of formats, particularly those > requested by > those who read our books. > > >> Michael, especially, likes the "throw it against the wall and see >> if it sticks" method of management. > > The actual quotation is: > > "We encourage you to run your ideas up the flagpole and see who > salutes." > > If you can't get anyone to use your format, we are hardly going to > force > your ideas through anyone's alimentary canal, either. > > >> While it can be frustrating at times, because a more decisive >> leadership can often "make things happen," this is not something >> that is likely to change. Ever. So, we have to plan with that in >> mind. > > > "Make things happen" is exactly what Project Gutenberg encourages. > > What you seem to want is for someone else to "make things happen" > for you. > > We'll help, but we won't do it for you. > > And we won't declare you or your format the "official" winner. > > There will always be room for improvements.
> > Project Gutenberg is a dynamic process to maximize the eBook > potential, > not a static system to be once achieved and then left as a fossil > record. > > "Make Things Happen" > > Don't wait for someone else to tell you that your idea has happened. > > > >> Unfortunately, it is a bit of a chicken and egg thing. Until I can >> make TEI more popular with folks, the developers don't make the >> tools. >> And until I have the tools, I can't get enough people to use it to >> reach critical mass. I can create texts (and do) in TEI, but I don't >> have the skills to make tools for helping in the creation of said >> TEI docs. > > Just start by posting your examples and pointing to them. > > That's how YouTube, MySpace, Google, Yahoo, and Project Gutenberg > started. > > Don't expect to start at the end, it helps to start at the beginning. > > >> As Jon Noring agreed earlier, there are NEVER enough developers to go >> around! ;) >> >> Josh > > > Or, there are too many developers creating not enough example eBooks > to generate any interest. > > Over the 10 years mentioned at the top of this commentary, if you had > created just one eBook per month for any particular new style of > format > then you would have a collection of well over 100 eBooks to > demonstrate. > > Without such an initial collection, it's hard to expect anyone to > come. > > "Build it, and they will come." > > > > Thanks!!! > > Give the world eBooks in 2006!!! > > Michael S. Hart > Founder > Project Gutenberg > > Blog at http://hart.pglaf.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Sat Nov 4 11:52:00 2006 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Sat Nov 4 11:52:10 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <454B7392.1060201@perathoner.de> References: <454B3A94.3000301@perathoner.de> <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> <454B7392.1060201@perathoner.de> Message-ID: <6E736CFF-8A2B-493C-BB02-DE7F24D44860@uni-trier.de> Hi There, Just loaded it in Camino; it came up fine. I also displayed the source: it is XML, and IE thinks it is XML, so it tries to interpret it as XML! BTW, there is no mention of "text/plain" in your source. Sorry, whose mistake? Keith. On 03.11.2006 at 17:51, Marcello Perathoner wrote: > Dave Fawthrop wrote: >> On Fri, 03 Nov 2006 13:48:20 +0100, Marcello Perathoner >> wrote: >> >> | http://www.gnutenberg.de/bowerbird/poo.tei >> The XML page cannot be displayed >> Cannot view XML input using style sheet. Please correct the error >> and then >> click the Refresh button, or try again later. >> >> >> --------------------------------------------------------------------- >> ----------- >> >> The system cannot locate the object specified. Error processing >> resource >> 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent... >> >> %TEI.extensions.ent; >> <<< >> >> So modern I can not view it with latest IE 6.0.2900.2180 >> >> What use is something which *ordinary* people can not read? > > The error is in your browser. The object is served as "text/plain". > That > means: It is to be displayed to the user without further ado. IE > has no > business trying to interpret it in any way. > > > $ wget -S http://www.gnutenberg.de/bowerbird/poo.tei > --17:26:32-- http://www.gnutenberg.de/bowerbird/poo.tei > => `poo.tei' > Resolving localhost... 127.0.0.1 > Connecting to localhost|127.0.0.1|:8118... connected. > Proxy request sent, awaiting response...
> HTTP/1.1 200 OK > Date: Fri, 03 Nov 2006 16:26:32 GMT > Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1 > Last-Modified: Fri, 03 Nov 2006 12:50:56 GMT > ETag: "27400a-7c6c-2fe12c00" > Accept-Ranges: bytes > Content-Length: 31852 > Content-Type: text/plain; charset=utf-8 > Connection: close > Length: 31,852 (31K) [text/plain] > > 100%[====================================>] 31,852 123.50K/s > > 17:26:32 (123.38 KB/s) - `poo.tei' saved [31852/31852] > > $ > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Sat Nov 4 12:06:44 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Sat Nov 4 12:06:52 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <454BC95F.90807@bohol.ph> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> <454BC95F.90807@bohol.ph> Message-ID: <41FAA4BF-8415-4C4B-8235-A564955CC64C@uni-trier.de> Hi Jereon, I was not talking about TEI being used by people, but by PG itsself. Marcello has said they do not have the tools!! Keith Am 03.11.2006 um 23:57 schrieb Jeroen Hellingman (Mailing List Account): > Schultz Keith J. wrote: >> >> First off, his method is not any better than anything or worse >> than your socalled TEI ! As you said it is not finished nor >> anywhere near it. The TEI movement has been around for at least >> 5 years. As far as I am concerned it is vaporware. > I have been using TEI for 8 years. I have posted over 200 ebooks > produced using TEI. > My tools are available on-line, and the TEI master files can be had > for > the asking. 
The reason > I haven't posted the TEI themselves is simply because I haven't taken > enough care to > make my tools ready for prime time. TEI isn't vaporware, but a format > that is actually and > widely used, however, since it is fairly flexible and very rich, and > since it approaches text from the semantic, > not the visual appearance edge, tools for it are not always up-to-task > to generate a good-looking > or even acceptable rendered results in a multitude of formats, such as > HTML, PDF, plain text, etc. > With increasing complexity and features of texts, some programming and > fine-tuning of processes > will be required... > > Jeroen. > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Sat Nov 4 12:15:56 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Sat Nov 4 12:16:04 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <20061104013550.GA21170@pglaf.org> References: <20061104013550.GA21170@pglaf.org> Message-ID: <91C6CECB-0128-4730-B1B0-51CEE6E0B158@uni-trier.de> Hi Greg, What I am talking about is a base format which would require that all PG texts be scanned and process to contain information on chapter, pages, paragraphs, type, etc.. It would not exclude conversion to other formats, but greatly simplify the process. Also, the format could contain information to be imported the databases use by PG-Website. As I am only one person, who may not see all implecation or aspects others would be needed. Also, it would have to be fully endorsed by PG. If PG is interested in developing such a standard I am your man. regards Keith. Am 04.11.2006 um 02:35 schrieb Greg Newby: > On Fri, Nov 03, 2006 at 01:38:00PM +0100, Schultz Keith J. wrote: >> ... >> I would have already set up a working system 10 years ago, if >> Micheal >> would have allowed it. 
But, I will not start doing anything unless >> it gets an official go ahead and will be used by PG officially. >> My time is to precious to waste on anything else. > > Keith, here is your official "go ahead." > As Michael responded, we'll use essentially any reasonable format. > So if you have something in mind, go for it. > > If you were thinking that your official format would exclude other > formats, that's a different issue. I don't think that will happen > too soon... > > Officially yours, > -- Greg > > Dr. Gregory B. Newby > Chief Executive and Director > Project Gutenberg Literary Archive Foundation http://gutenberg.net > A 501(c)(3) not-for-profit organization with EIN 64-6221541 > gbnewby@pglaf.org > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From marcello at perathoner.de Sat Nov 4 12:42:26 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 4 12:42:31 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <3FEB672A-57D7-4F44-972C-7D8F78BFE843@uni-trier.de> References: <1162248212.5857.1.camel@localhost.localdomain> <45477F3D.70205@perathoner.de> <064A2694-3938-46C7-810D-4651CDCABDC6@uni-trier.de> <4549DC5F.8010002@perathoner.de> <73E053C8-26F7-487B-A072-5D711A6B18C0@uni-trier.de> <454B2C66.5020100@perathoner.de> <454B4516.3070605@perathoner.de> <3FEB672A-57D7-4F44-972C-7D8F78BFE843@uni-trier.de> Message-ID: <454CFB32.2080308@perathoner.de> Schultz Keith J. wrote: > A lot of things get sponsered in the acedemic world, but that does > not mean it gets very far in the WORLD and it still does not mean > that TEI is useful in linguistics!!? Like I said there are a lot better > systems around for doing work in linguistics. It would be much more helpful if you did say *which* systems are better than TEI. 
> That is why you still have gotten your tools finished, or have > someone like me step and program it for you in no time flat. Your words are big enough. Now if you could show me some of your big deeds maybe I could start to take you seriously. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sat Nov 4 12:46:12 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 4 12:46:15 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <6E736CFF-8A2B-493C-BB02-DE7F24D44860@uni-trier.de> References: <454B3A94.3000301@perathoner.de> <0bjmk2d51jmp5gbqnj8u5qajcfrhsgg9k8@4ax.com> <454B7392.1060201@perathoner.de> <6E736CFF-8A2B-493C-BB02-DE7F24D44860@uni-trier.de> Message-ID: <454CFC14.1000307@perathoner.de> Schultz Keith J. wrote: > Just loaded it in camino came up fine. I also displayed the source > it xml and IE thinks it is xml so it tries to interpret it as xml ! > BTW. no mention in your source about "text/plain" The web server tells you that in the "Content-Type" field of the HTTP response header. To see the header say this: wget -S http://www.gnutenberg.de/bowerbird/poo.tei -- Marcello Perathoner webmaster@gutenberg.org From joshua at hutchinson.net Sat Nov 4 19:14:22 2006 From: joshua at hutchinson.net (joshua@hutchinson.net) Date: Sat Nov 4 19:14:24 2006 Subject: [gutvol-d] The Proof is in the Poo Message-ID: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> The problem is that IE is not following the "proper" web standards. The server sends information that the file is text/plain. Which means IE should treat it like a plain text file and display it accordingly, ignoring any and all markup. IE, which as we all know, is notorious for playing fast and loose with the w3c specs. In this case, it ignores the server and just looks directly at the file and tries to parse it alone, which, as you can see from the results, it is unable to do properly. 
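To make the Content-Type point concrete, here is a minimal Python sketch. It is not anything the PG server or any browser actually runs. It parses a subset of the response headers from the `wget -S` output quoted in this thread and picks a display mode from the declared media type alone, without looking at the body. The function name `display_mode` and the mode labels are assumptions made up for illustration.

```python
from email.parser import Parser

# A subset of the response headers from the wget -S output quoted in this
# thread. The "HTTP/1.1 200 OK" status line is left out because the parser
# below expects header fields only.
RAW_HEADERS = """\
Date: Fri, 03 Nov 2006 16:26:32 GMT
Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1
Content-Length: 31852
Content-Type: text/plain; charset=utf-8
Connection: close
"""


def display_mode(raw_headers: str) -> str:
    """Choose how to show a response using only its declared media type."""
    msg = Parser().parsestr(raw_headers, headersonly=True)
    # get_content_type() returns the lowercased type/subtype with the
    # parameters (such as charset) dropped, e.g. "text/plain".
    media_type = msg.get_content_type()
    if media_type == "text/plain":
        return "show-as-text"
    if media_type in ("text/xml", "application/xml") or media_type.endswith("+xml"):
        return "parse-as-xml"
    return "other"


print(display_mode(RAW_HEADERS))
```

A client that follows the header takes the first branch for poo.tei and never attempts XML parsing. The complaint in this thread is that IE inspects the content instead.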
The gist is, the TEI files are not meant to be parse by a web browser, so the fact that they DON'T display properly basically means everything is working according to design. Josh >----Original Message---- >From: schultzk@uni-trier.de >Date: Nov 4, 2006 14:52 >To: "Project Gutenberg Volunteer Discussion" >Subj: Re: [gutvol-d] The Proof is in the Poo > >Hi There, > > Just loaded it in camino came up fine. I also displayed the source > it xml and IE thinks it is xml so it tries to interpret it as xml ! > BTW. no mention in your source about "text/plain" > > Sorry, whose mistake? > > Keith. > >Am 03.11.2006 um 17:51 schrieb Marcello Perathoner: > >> Dave Fawthrop wrote: >>> On Fri, 03 Nov 2006 13:48:20 +0100, Marcello Perathoner >>> wrote: >>> >>> | http://www.gnutenberg.de/bowerbird/poo.tei >>> The XML page cannot be displayed >>> Cannot view XML input using style sheet. Please correct the error >>> and then >>> click the Refresh button, or try again later. >>> >>> >>> --------------------------------------------------------------------- >>> ----------- >>> >>> The system cannot locate the object specified. Error processing >>> resource >>> 'http://www.tei-c.org/P4X/DTD/pgtei-extensions.ent... >>> >>> %TEI.extensions.ent; >>> <<< >>> >>> So modern I can not view it with latest IE 6.0.2900.2180 >>> >>> What use is something which *ordinary* people can not read? >> >> The error is in your browser. The object is served as "text/plain". >> That >> means: It is to be displayed to the user without further ado. IE >> has no >> business trying to interpret it in any way. >> >> >> $ wget -S http://www.gnutenberg.de/bowerbird/poo.tei >> --17:26:32-- http://www.gnutenberg.de/bowerbird/poo.tei >> => `poo.tei' >> Resolving localhost... 127.0.0.1 >> Connecting to localhost|127.0.0.1|:8118... connected. >> Proxy request sent, awaiting response... 
>> HTTP/1.1 200 OK >> Date: Fri, 03 Nov 2006 16:26:32 GMT >> Server: Apache/2.0.55 (Debian) mod_jk/1.2.18 PHP/5.1.6-1 >> Last-Modified: Fri, 03 Nov 2006 12:50:56 GMT >> ETag: "27400a-7c6c-2fe12c00" >> Accept-Ranges: bytes >> Content-Length: 31852 >> Content-Type: text/plain; charset=utf-8 >> Connection: close >> Length: 31,852 (31K) [text/plain] >> >> 100%[====================================>] 31,852 123.50K/s >> >> 17:26:32 (123.38 KB/s) - `poo.tei' saved [31852/31852] >> >> $ >> >> -- >> Marcello Perathoner >> webmaster@gutenberg.org >> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > From sam.bretheim at gmail.com Sat Nov 4 22:14:16 2006 From: sam.bretheim at gmail.com (Sam Bretheim) Date: Sat Nov 4 22:14:45 2006 Subject: [gutvol-d] TEI rendering in Web browsers In-Reply-To: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> Message-ID: <454D8138.6040307@gmail.com> joshua@hutchinson.net wrote: > The gist is, the TEI files are not meant to be parse by a web browser, > so the fact that they DON'T display properly basically means everything > is working according to design. > It's worth mentioning that modern Web browsers are quite capable of displaying TEI reasonably well, though some work on the relevant TEI and XSL stylesheets is necessary before they're ready to be widely used. For instance, if the author had inserted the following near the beginning of the document, it would have rendered quite tolerably in recent versions of Firefox/Mozilla/Camino, Konqueror/Safari/OmniWeb, Opera, and iCab. (IE and Amaya have trouble with some of the code in this CSS file; I'll try to figure out how to make them display TEI properly.) 
<?xml-stylesheet type="text/css" href="http://www.shinparam.org/Sam/Projects/TEI-CSS/prettynovel.css"?>
Here are two books I'm in the midst of proofing and marking up, both of which look fairly good when viewed with that CSS stylesheet: http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml From joshua at hutchinson.net Sun Nov 5 05:29:17 2006 From: joshua at hutchinson.net (joshua@hutchinson.net) Date: Sun Nov 5 05:29:20 2006 Subject: [gutvol-d] TEI rendering in Web browsers Message-ID: <12274920.1162733357578.JavaMail.?@fh1039.dia.cp.net> Yep, you *can* make tei render. And maybe someday it is something we will work on, but right now, the design decision is that the tei is master file and NOT an end-user document. (And some of the elements in some tei elements will never render directly in a web document with just a bit of css due to its dynamic nature) Josh >----Original Message---- >From: sam.bretheim@gmail.com >Date: Nov 5, 2006 1:14 >To: "Project Gutenberg Volunteer Discussion" >Subj: [gutvol-d] TEI rendering in Web browsers > >joshua@hutchinson.net wrote: >> The gist is, the TEI files are not meant to be parse by a web browser, >> so the fact that they DON'T display properly basically means everything >> is working according to design. >> > >It's worth mentioning that modern Web browsers are quite capable of >displaying TEI reasonably well, though some work on the relevant TEI and >XSL stylesheets is necessary before they're ready to be widely used. > >For instance, if the author had inserted the following near the >beginning of the document, it would have rendered quite tolerably in >recent versions of Firefox/Mozilla/Camino, Konqueror/Safari/OmniWeb, >Opera, and iCab. (IE and Amaya have trouble with some of the code in >this CSS file; I'll try to figure out how to make them display TEI >properly.) > ><?xml-stylesheet type="text/css" href="http://www.shinparam.org/Sam/Projects/TEI-CSS/prettynovel.css"?>
> > > >Here are two books I'm in the midst of proofing and marking up, both of >which look fairly good when viewed with that CSS stylesheet: > >http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml > >http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft. xml >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > From desrod at gnu-designs.com Sun Nov 5 06:08:56 2006 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Sun Nov 5 06:09:36 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> Message-ID: > The problem is that IE is not following the "proper" web standards. > The server sends information that the file is text/plain. Which > means IE should treat it like a plain text file and display it > accordingly, ignoring any and all markup. Does PG have TEI as a registered mime type? http://www.iana.org/assignments/media-types/ In our own space, Plucker has one[1].. and it might further adoption if TEI had one registered as well, so webserver maintainers could set the proper AddType directive and serve TEI documents with a mime type that browsers (even broken ones like MSIE) could render properly. [1] http://www.iana.org/assignments/media-types/application/prs.plucker David A. Desrosiers desrod@gnu-designs.com http://gnu-designs.com From jon at noring.name Sun Nov 5 06:48:16 2006 From: jon at noring.name (Jon Noring) Date: Sun Nov 5 06:46:28 2006 Subject: [gutvol-d] TEI rendering in Web browsers In-Reply-To: <12274920.1162733357578.JavaMail.?@fh1039.dia.cp.net> References: <12274920.1162733357578.JavaMail.?@fh1039.dia.cp.net> Message-ID: <865311856.20061105074816@noring.name> Joshua wrote: > Yep, you *can* make tei render. 
And maybe someday it is something we > will work on, but right now, the design decision is that the tei is > master file and NOT an end-user document. (And some of the elements in > some tei elements will never render directly in a web document with > just a bit of css due to its dynamic nature) As Joshua notes, it is certainly possible to make TEI, and most any XML vocabulary, render using CSS. However, TEI may include structures which HTML never had to support, so if these structures are used in TEI markup, it becomes more difficult to properly render that document in web browsers. The prime example of such a TEI construct is the <note> tag, where a note can be embedded right within the main flow of the book. For a couple TEI examples of using the <note> element, see: http://www.tei-c.org/P4X/ref-NOTE.html In direct TEI rendering we want the note to be extracted from the main flow and rendered separately. Unfortunately the HTML vocabulary never included this construct, so web browsers never have had to natively develop something to handle inline notes. Thus, without special CSS, the inline note simply merges with the main text when rendered. Not pretty. Here's a demo of how an inline TEI <note> might be rendered using CSS (my aesthetic skill with CSS sucks royally, but you get the picture): http://www.windspun.com/demoxml/demonote.xml The above demo works best in Opera (Opera has the best and widest CSS conformance, mainly because the Opera CTO, Hakon Lie, is the co-inventor of CSS), but also renders o.k. in Firefox (looks pretty good in the recent Firefox 2.0). The demo will NOT work in IE6 -- I haven't yet tried the new IE7. Most processing with TEI these days is to use it as a master format, and then use XSLT to convert it to XHTML optimized for web browsers. Things like <note> are yanked and moved somewhere else during the transformation.
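As a rough illustration of that "yank and move" step, here is a minimal Python sketch that uses the standard library's ElementTree in place of XSLT, a substitution made only to keep the example short. It pulls every note element out of a paragraph, leaves a numbered marker in its place, and collects the note text separately. The sample markup, the function name `extract_notes`, and the `(n)` marker style are assumptions for illustration, not the actual PGTEI converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample paragraph with one note embedded in the main flow.
SAMPLE = (
    "<p>When I was a boy, there was but one permanent ambition among "
    "my comrades in our village<note>Hannibal, Missouri.</note> on the "
    "west bank of the Mississippi River.</p>"
)


def extract_notes(paragraph_xml):
    """Return (main_text, notes) with each <note> replaced by a marker.

    Only direct children of the paragraph are examined; a note nested
    inside another inline element would stay in the main text.
    """
    root = ET.fromstring(paragraph_xml)
    notes = []
    pieces = [root.text or ""]
    for child in root:
        if child.tag == "note":
            notes.append("".join(child.itertext()).strip())
            pieces.append("(%d)" % len(notes))
        else:
            # Keep the text of any other inline element in the main flow.
            pieces.append("".join(child.itertext()))
        # Text that follows the child element belongs to the paragraph.
        pieces.append(child.tail or "")
    return "".join(pieces), notes


main_text, notes = extract_notes(SAMPLE)
print(main_text)
for number, note in enumerate(notes, start=1):
    print("(%d) %s" % (number, note))
```

Once the notes sit in their own list, where they are finally rendered (page foot, section end, margin) becomes a decision for the output stage rather than something the source markup has to fix.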
It is the intent of the OpenReader format to someday natively support TEI, or some "ebook subset" of TEI (been in touch with Sebastian Rahtz on this). Since OpenReader supports the OEBPS out-of-spine feature, all OpenReader reading clients will already have a built-in means to natively handle the TEI <note> and do so in innovative ways not requiring special user-provided CSS. Jon Noring From jon at noring.name Sun Nov 5 07:14:34 2006 From: jon at noring.name (Jon Noring) Date: Sun Nov 5 07:12:43 2006 Subject: [gutvol-d] TEI rendering in Web browsers In-Reply-To: <454D8138.6040307@gmail.com> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> Message-ID: <1912936803.20061105081434@noring.name> Sam Bretheim wrote: > It's worth mentioning that modern Web browsers are quite capable of > displaying TEI reasonably well, though some work on the relevant TEI and > XSL stylesheets is necessary before they're ready to be widely > used.... > > Here are two books I'm in the midst of proofing and marking up, both of > which look fairly good when viewed with that CSS stylesheet: > > http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml > > http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml Cool! Good work! Referring to my prior message a few minutes ago, where I showed how CSS may be used to handle TEI <note> used inline, I notice from looking at your CSS commentary that PGTEI does not support <note> used inline, which is understandable... Jon From jon at noring.name Sun Nov 5 08:20:17 2006 From: jon at noring.name (Jon Noring) Date: Sun Nov 5 08:18:28 2006 Subject: [gutvol-d] Oops, got it wrong...
(was TEI rendering in Web browsers) In-Reply-To: <1912936803.20061105081434@noring.name> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name> Message-ID: <558740280.20061105092017@noring.name> Jon Noring previously wrote: > Referring to my prior message a few minutes ago... I notice from > looking at [Sam's] CSS commentary that PGTEI does not support <note> > used inline, which is understandable... Oops, got this totally wrong. Sorry. I just referred to Marcello's PGTEI guide (PG text #20000) and misinterpreted what Sam meant by non-support for inline notes. He does not refer to the placement of the note in the PGTEI document (a note certainly can be placed inline at the point of reference), but rather to how it is to be dealt with in conversion and rendering. Here's a PGTEI markup example from Marcello's PGTEI guide: *********************************************************************

When I was a boy, there was but one permanent ambition among my comrades in our village

<note>
Hannibal, Missouri.
</note>

on the west bank of the Mississippi River. That was, to be a steamboatman. ...

Will be rendered as: When I was a boy, there was but one permanent ambition among my comrades in our village(3) on the west bank of the Mississippi River. That was, to be a steamboatman. ... ********************************************************************* Indeed in PGTEI the note can be placed inline in the paragraph at the point of reference. Btw, aren't there a few old books where notes do appear inline at the point of reference? Jon From Bowerbird at aol.com Sun Nov 5 09:53:52 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Nov 5 09:54:13 2006 Subject: [gutvol-d] gvd061104 -- another commentary on consistency Message-ID: keith said: > If I go to DP. I would look at their concept and specifications and > ask how I can help or offer and discuss better ways of doing things. and when you do that, you will "piss off everybody in sight", as marcello so delicately put it... unless you're meek and obsequious, in which case you will be tolerated... and ignored... *** keith said: > If PG had as a base format > a minimalistic markup > then all would benefit from that. just to let you know, keith, p.g. _does_ have "a minimalistic markup", at least "officially"... for instance, the rule is that four blank lines should precede a header, and two follow it. the problem -- especially bad in the past -- is they are too casual about quality-control, so the e-texts are _inconsistent_ in regard to their adherence to this minimalistic markup... so, to use our example, you will find _some_ headers occasionally have _three_ blank lines preceding them, or only one following them... _if_ the e-texts _were_ consistent, it would be good, because then tools could be created to bring about a wide variety of benefits for users. again with the headers, it would be easy as pie to create a _library-wide_ "table of contents" -- each header hotlinked to its respective chapter. i've already demonstrated how to make that work. z.m.l. 
merely codifies p.g.'s "minimalistic markup" -- adding a few extensions to it, where required -- which is why so many e-texts are "z.m.l.-ready" right out of the box. the "extension" that was necessary for headers was to specify that different "levels" of headers can be indicated by using _more_than_four_ blank lines before the header, with more blank lines indicated a higher priority of the heading... what i'd say to you, keith, is that you already have what you need -- at least "officially" -- but you'll find when you get "on the ground" (inside the e-texts), the "official" standard is not being applied consistently, unfortunately. that's what i've found. and when i traced it, that's what _every_ programmer has found. this is why i would be very happy if all of the e-texts were marked up in .tei. not because i think .tei is the answer -- it's not -- but because i would finally be getting _consistent_ e-texts... so i wish josh would get to work on that .tei stuff. (of course, taking on the complexity of .tei so you can get the easy benefit of _consistency_ is overkill, but as long as it's _somebody_else_ paying the cost in terms of their time and energy, _i_ don't mind...) *** on an encouraging note, the level of consistency _has_ been increasing in recent years, due to d.p. (for the text versions, anyway. now it's the .html versions that are the rat's nest of inconsistencies.) but, on a discouraging note, there are still _huge_ holes in the text versions that are being created... the first is that image-names are not being included. the _caption_ for the image is included, but the actual _name_ of the image-file is not. i have _requested_ that the image-file-name be included, but this request was _turned_down_. indeed, turned down _repeatedly_. it's as if the "powers that be" are _deliberately_ trying to make the text-file as impotent as possible. astounding. another problem is that linebreak and pagebreak information is routinely tossed out of the text files. 
again, it's tragic this information is being discarded. the pagebreak information, at least, is being retained in the .html version, at least by _some_ postprocessors. and of course the .html versions have the information on the filenames of the graphics that were in the book. thus i've written routines that scour the .html file to get this information and restore it to the text version. but it is a shame that work has to be done just to restore the information that distributed proofreaders tossed out; still, if they're gonna be _stubborn_ about their stupidity, at least it's good that the information _can_ be recovered. the information about _linebreaks_, for instance, is gone, and cannot be recovered. what that means, ultimately, is that all the books that are in the p.g. library will have to be subjected to o.c.r. again, this time _retaining_ the linebreaks. (thank _goodness_ that google is doing all the scanning, so we can be sure that all of the books in p.g. will be scanned, eventually...) then the p.g. version can be used to _proof_ that new o.c.r. after that, though, the p.g. version will simply be discarded. and the d.p. volunteers, who _thought_ they were working against the backdrop of all time, will be quite saddened (and perhaps angered as well) to learn that they had been misled. all because they didn't save linebreak information. sad, eh? they thought linebreaks were "superfluous" in an e-book. well, maybe they are, but since it will be simple for people in the future to toss the linebreaks _if_they_want_to_do_so_, why not keep 'em, in case those future people _want_ them? besides, in the near future, when we're still making the shift between paper-books and their electronic cousins, anything that helps maintain synergy between the two (like linebreaks) is something that we cannot afford to be tossing out casually. i'm still waiting for the postprocessor who is brave enough to make the decision to keep the linebreaks in the books they do. 
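The recovery step mentioned above, scouring the .html version for the image file names that the text version lacks, can be sketched roughly as follows. This is not bowerbird's actual routine, which is not shown in the thread. It is a minimal Python example using the standard library's `html.parser`, and the class name `ImageCollector`, the file names, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect (file name, alt text) pairs for every <img> in a document."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by
        # HTMLParser's default handle_startendtag().
        if tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src"), attributes.get("alt", "")))


# Hypothetical fragment standing in for a PG .html file.
SAMPLE_HTML = """
<p>Some text.</p>
<img src="images/illus01.jpg" alt="Frontispiece">
<p>More text.</p>
<img src="images/illus02.png" alt="The old mill" />
"""

collector = ImageCollector()
collector.feed(SAMPLE_HTML)
for src, alt in collector.images:
    print(src, "--", alt)
```

Pairing each file name with its caption in the text version would still need a matching step, for instance on the alt text. That step is left out because the thread does not say how the captions are marked.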
(in that vein, big congratulations to chuck greif, who recently posted #19703, whose chapter-headings link _back_ to the table-of-contents, which i've recommended for a long time.) the other main shortcoming in the text versions these days is a failure to indicate which lines should _not_ be rewrapped... this is a terrible problem that causes the most havoc with the conversions to other formats, since poetry, lists, and tables -- even tables of contents and tables of illustrations -- and sundry other stuff (such as address blocks) get mangled... this problem could be easily solved with a dirt-simple rule. the one i use in z.m.l. is that any line with leading whitespace (a space or a tab) is not to be wrapped. so all a person has to do is put a leading space on such lines. this little rule makes a huge difference by making sure lines aren't incorrectly wrapped, and programmers can write simple routines to do rewrapping... for books where _nothing_ is to be rewrapped (e.g., poetry), you can just do a global change at the end of all your editing to change every newline into a newline followed by a space... also, this rule nicely delineates a _block_, such as a table, because all of the lines in the block are indented, while the empty lines above and below them set off the block... again, just one more demonstration where a very simple rule -- easy enough for an average 4th-grader to grok and use -- can give us tremendous power if we only leverage it correctly... but of course, all the good rules in the world don't matter much if you ain't willing to spend the time to make sure you follow 'em. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061105/b5db8b06/attachment.html From gbnewby at pglaf.org Sun Nov 5 11:39:41 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Nov 5 11:39:42 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <91C6CECB-0128-4730-B1B0-51CEE6E0B158@uni-trier.de> Message-ID: <20061105193940.GA5654@pglaf.org> On Sat, Nov 04, 2006 at 09:15:56PM +0100, Schultz Keith J. wrote: > Hi Greg, > > What I am talking about is a base format which would require > that all PG texts be scanned and process to contain information > on chapter, pages, paragraphs, type, etc.. Your use of the phrase of "require that all PG texts..." indicates you're not paying attention. What you are asking for, will not be granted. What is granted, to you or anyone, is in my email below. Both what will and will not be granted has already been stated by many people in this discussion thread, and is in the About articles I mentioned. It sounds like you need to find or start another project to meet your goals (also described in the About articles, and similarly encouraged). -- Greg > It would not exclude conversion to other formats, but greatly > simplify > the process. Also, the format could contain information to be > imported > the databases use by PG-Website. > > As I am only one person, who may not see all implecation or aspects > others would be needed. Also, it would have to be fully endorsed by > PG. > If PG is interested in developing such a standard I am your man. > > regards > Keith. > > Am 04.11.2006 um 02:35 schrieb Greg Newby: > > >On Fri, Nov 03, 2006 at 01:38:00PM +0100, Schultz Keith J. wrote: > >>... > >> I would have already set up a working system 10 years ago, if > >>Micheal > >> would have allowed it. But, I will not start doing anything unless > >> it gets an official go ahead and will be used by PG officially. > >> My time is to precious to waste on anything else. 
> > > >Keith, here is your official "go ahead." > >As Michael responded, we'll use essentially any reasonable format. > >So if you have something in mind, go for it. > > > >If you were thinking that your official format would exclude other > >formats, that's a different issue. I don't think that will happen > >too soon... > > > >Officially yours, > > -- Greg > > > >Dr. Gregory B. Newby > >Chief Executive and Director > >Project Gutenberg Literary Archive Foundation http://gutenberg.net > >A 501(c)(3) not-for-profit organization with EIN 64-6221541 > >gbnewby@pglaf.org > >_______________________________________________ > >gutvol-d mailing list > >gutvol-d@lists.pglaf.org > >http://lists.pglaf.org/listinfo.cgi/gutvol-d From traverso at dm.unipi.it Sun Nov 5 12:19:38 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sun Nov 5 12:15:03 2006 Subject: [gutvol-d] Oops, got it wrong... (was TEI rendering in Web browsers) In-Reply-To: <558740280.20061105092017@noring.name> (message from Jon Noring on Sun, 5 Nov 2006 09:20:17 -0700) References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name> <558740280.20061105092017@noring.name> Message-ID: <200611052019.kA5KJc414884@pico.dm.unipi.it> >>>>> "Jon" == Jon Noring writes: Jon> Btw, aren't there a few old books where notes do appear Jon> inline at the point of reference? Many old books have side notes in the margins, not exactly inline, although on the same line. I have never seen one with notes exactly inline. But when it is inline, how to you tell that it is a note and not an incidental remark? Carlo From jon at noring.name Sun Nov 5 13:26:42 2006 From: jon at noring.name (Jon Noring) Date: Sun Nov 5 13:24:52 2006 Subject: [gutvol-d] Oops, got it wrong... 
(was TEI rendering in Web browsers) In-Reply-To: <200611052019.kA5KJc414884@pico.dm.unipi.it> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name> <558740280.20061105092017@noring.name> <200611052019.kA5KJc414884@pico.dm.unipi.it> Message-ID: <556830562.20061105142642@noring.name> Carlo wrote: > Many old books have side notes in the margins, not exactly inline, > although on the same line. I have never seen one with notes exactly > inline. > > But when it is inline, how do you tell that it is a note and not an > incidental remark? Good point. Contentwise, the distinction between a note and an incidental remark may sometimes be difficult to discern. It depends upon a bunch of factors, such as if the snippet of text is typographically distinguished from the surrounding text, or if it was inserted by the editor (not the author) and noted thusly, etc. In some cases it indeed can be murky whether or not the inline snippet can be "reassigned" to be a note, footnote, endnote, etc. I suppose if the book contains lots of referenced notes, but then has this inline snippet, the snippet should not be considered a note but instead simply part of the main text. But what if otherwise the book has no other notes? It is a case-by-case basis, I suppose. I assume the decision by PGTEI not to support "inline" among the attribute values for type of <note> is based on a lot of experience with real-world texts already worked on by PG and DP, which includes a lot of older books wherein we find all kinds of difficult conventions no longer in use today. Jon From marcello at perathoner.de Sun Nov 5 13:52:10 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Nov 5 13:52:15 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> Message-ID: <454E5D0A.90904@perathoner.de> David A.
Desrosiers wrote: > Does PG have TEI as a registered mime type? TEI is a format maintained by the TEI Consortium: http://www.tei-c.org I can ask them what their plans are ... -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sun Nov 5 14:02:13 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Nov 5 14:02:24 2006 Subject: [gutvol-d] TEI rendering in Web browsers In-Reply-To: <1912936803.20061105081434@noring.name> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name> Message-ID: <454E5F65.2080909@perathoner.de> Jon Noring wrote: > I notice from > looking at your CSS commentary that PGTEI does not support used > inline, which is understandable... PGTEI supports inlined elements and can render them as run-in, page footnote, any section endnote, book endnote or page margin note. -- Marcello Perathoner webmaster@gutenberg.org From sam.bretheim at gmail.com Sun Nov 5 14:14:23 2006 From: sam.bretheim at gmail.com (Sam Bretheim) Date: Sun Nov 5 14:15:29 2006 Subject: [gutvol-d] TEI rendering in Web browsers In-Reply-To: <454E5F65.2080909@perathoner.de> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name> <454E5F65.2080909@perathoner.de> Message-ID: <454E623F.3010107@gmail.com> Marcello Perathoner wrote: > Jon Noring wrote: > > >> I notice from >> looking at your CSS commentary that PGTEI does not support used >> inline, which is understandable... >> > > PGTEI supports inlined elements and can render them as run-in, > page footnote, any section endnote, book endnote or page margin note. > From the PGTEI 0.4 guide's section on the element: "The *place* attribute supports only the values of *foot*, *end* and *margin*." Jon's comment was referring to , not the general notion of inlined elements. 
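The restriction quoted from the PGTEI 0.4 guide can be checked mechanically. What follows is only a minimal Python sketch. It assumes the element under discussion is TEI's note element (the element name is missing from the archived text), it uses a made-up sample document, and the function name is hypothetical rather than part of any PGTEI tool.

```python
import xml.etree.ElementTree as ET
from typing import List

# The three values the PGTEI 0.4 guide is quoted as supporting.
ALLOWED_PLACES = {"foot", "end", "margin"}


def unsupported_note_places(tei_xml: str) -> List[str]:
    """Return every place value on a <note> that is outside ALLOWED_PLACES.

    Assumption: notes are un-namespaced <note> elements, as in the
    made-up sample below. Notes without a place attribute are skipped.
    """
    root = ET.fromstring(tei_xml)
    bad = []
    for note in root.iter("note"):
        place = note.get("place")
        if place is not None and place not in ALLOWED_PLACES:
            bad.append(place)
    return bad


# Made-up sample: one supported footnote and one inline note.
SAMPLE = """<text><body>
<p>Main text<note place="foot">A footnote.</note> continues
<note place="inline">An inline remark.</note>.</p>
</body></text>"""

if __name__ == "__main__":
    print(unsupported_note_places(SAMPLE))
```

A check like this only flags the attribute value; deciding whether an inline snippet really is a note remains the case-by-case judgment described earlier in the thread.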
From marcello at perathoner.de Sun Nov 5 14:39:07 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Nov 5 14:39:12 2006 Subject: [gutvol-d] TEI rendering in Web browsers In-Reply-To: <454E623F.3010107@gmail.com> References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net> <454D8138.6040307@gmail.com> <1912936803.20061105081434@noring.name> <454E5F65.2080909@perathoner.de> <454E623F.3010107@gmail.com> Message-ID: <454E680B.9010305@perathoner.de> Sam Bretheim wrote: >> PGTEI supports inlined elements and can render them as run-in, >> page footnote, any section endnote, book endnote or page margin note. >> > > From the PGTEI 0.4 guide's section on the element: > > "The *place* attribute supports only the values of *foot*, *end* and > *margin*." > > Jon's comment was referring to , not the general > notion of inlined elements. You can collect endnotes at any given point in your text. If you place someplace in your text it will output all notes in the section identified by id. In the newest version (to be released) a will just be converted into a with no special formatting attached. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Sun Nov 5 15:15:28 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Nov 5 15:15:36 2006 Subject: [gutvol-d] gvd061105 -- to mimic the competitors left back in the dust Message-ID: sam said: > modern Web browsers are quite capable > of displaying TEI reasonably well, though > some work on the relevant TEI and XSL stylesheets > is necessary before they're ready to be widely used. ... > http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml > http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml first, nice work, on both of them. the headers are good -- big, and bold, and start on a new page when printed to .pdf... i woulda liked a hotlinked table-of-contents, as a bare minimum.
and the dictionary just cries out for hotlinks in the "see:" entries and elsewhere, but as i said, overall, they look nice. *** second, we don't need a 1.3-meg example, or a 2.3-meg one either. how about you mark up my nice little 17k test-suite, eh? > http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.zml there are also interesting philosophical issues that are raised by the dictionary example, but i won't jump in to that can of worms quite yet... (the lead question, if you want to start preparing: is a 2.3-meg scrolling-field the best way to display a dictionary, where most people, most of the time, likely want to view just one or a few specified terms? that segues to a critical look at browser as e-book.) *** third, examination of your markup shows a very high level of expertise. congratulations on attaining that. how much time are you willing to volunteer to p.g.? the answer will be telling. we can't expect volunteers to develop expertise like this. conversely, we can't expect people who have such expertise to volunteer much time. so a big question is this: exactly who does the markup? i'm guessing you'll agree that botched markup is worse than none at all. but botched markup is all you will get from 89% of the people even willing to take on the task. so how much time would you have to supervise people? and do you have a generous tolerance for incompetence? another big question: where is the demand for markup? do average readers really need it? i seriously doubt it... even if it came for free (and it doesn't), who needs it? *** fourth, the ability to load .tei directly into the browser is where the .tei needs to head, because it is pointless to spend the resources to create a high-quality format and then convert that to lesser-and-dumber formats... but that conflict is telling too, because project gutenberg is targeted squarely and directly at "the ordinary people", so the expectation of "modern browsers" cannot be made. 
to the contrary, to hit that target, we are forced to assume _trailing-edge_machinery_ that only runs _old_software_, or (at best) apps we make that execute on low resources. yeah, i know, that's no fun, is it? we like to be all modern, play with the fast, new, shiny toys, don't we? sure we do... ...but... 3 out of 4 computers are still running internet explorer 6. yeah, it's a piece-of-crap browser, you'll get no argument from me on that, i threw that shit program away long ago, along with just about every other microsoft app i ever had. but you and i both know that, if that 75% penetration for ie6 changes substantially in the future, the gains will go to ie7, because the people running ie6 now are microsoft zombies. (so, do you know if ie7 will be better? i know what i'd bet...) and there's absolutely no question that the number one reason project gutenberg is known as the premiere cyberspace library is that it made itself available to every computer on the planet. (and yes, that does in fact mean even the microsoft zombies, because if you don't have them, you don't have critical mass.) it's the guys that required the high-end equipment that got left in the dust, back at the starting gate, never to catch up... so i'll ask the question again: do you want to mimic _them_? -bowerbird p.s. of course, even when .tei displays in the trailing browsers, which will be what, about 5-10 years from now, that won't be an _end_ point, as much as it is the time to _start_ reflecting on the philosophical questions hinted at in an earlier point; but hey, let's hope we start asking those questions sooner... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061105/194b4f57/attachment.html From jon at noring.name Sun Nov 5 15:33:50 2006 From: jon at noring.name (Jon Noring) Date: Sun Nov 5 15:32:01 2006 Subject: [gutvol-d] Oops, forgot to mention hypertext linking and embedding images (was gvd061105...) In-Reply-To: References: Message-ID: <291263687.20061105163350@noring.name> Bowerbird wrote: > sam said: >> modern Web browsers are quite capable of displaying TEI reasonably >> well, though some work on the relevant TEI and XSL stylesheets is >> necessary before they're ready to be widely used.... >> >> http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml >> http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml > i woulda liked a hotlinked table-of-contents, > as a bare minimum. and the dictionary just > cries out for hotlinks in the "see:" entries and > elsewhere, but as i said, overall, they look nice. Ah yes, in addition to inline notes, I forgot to mention the problems in getting links and embedded images to work when natively rendering TEI documents in web browsers. (Linking is needed for a hypertext- enabled table of contents as Bowerbird is suggesting.) Unfortunately, to do both, generically-speaking, requires XLink. And unfortunately, only Firefox recognizes some XLink (why Opera still doesn't implement XLink is a mystery -- and forget IE). And unfortunately again, Firefox 1.x (haven't checked 2.0) will only enable hypertext links and not embed images and multimedia. Here's a demo of XLink-based linking to try in Firefox: http://www.windspun.com/demoxml/demolink.xml All in all, web browsers still have a little ways to go to be able to natively render the full-power of TEI or any similar XML vocabulary. That's why TEI is mostly used for "mastering" which is then transformed into other XML vocabularies as needed for specific applications, such as rendering in browsers.
Jon Noring From sam.bretheim at gmail.com Sun Nov 5 16:48:05 2006 From: sam.bretheim at gmail.com (Sam Bretheim) Date: Sun Nov 5 17:12:47 2006 Subject: [gutvol-d] Oops, forgot to mention hypertext linking and embedding images (was gvd061105...) In-Reply-To: <291263687.20061105163350@noring.name> References: <291263687.20061105163350@noring.name> Message-ID: <454E8645.7010306@gmail.com> Jon Noring wrote: > > Ah yes, in addition to inline notes, I forgot to mention the problems > in getting links and embedded images to work when natively rendering > TEI documents in web browsers. (Linking is needed for a hypertext- > enabled table of contents as Bowerbird is suggesting.) > > Unfortunately, to do both, generically-speaking, requires XLink. And > unfortunately, only Firefox recognizes some XLink (why Opera still > doesn't implement XLink is a mystery -- and forget IE). And > unfortunately again, Firefox 1.x (haven't checked 2.0) will only > enable hypertext links and not embed images and multimedia. > XLink isn't actually the only way to accomplish that: you can get links, images, and a number of other things by importing the XHTML namespace into your XML document. (Of course, browser compatibility is limited, and you'd still need some XSL to transform images from the bizarre entity construct that regular TEI uses, or even the more sane PGTEI image syntax.) I've put up a quickie example at: http://shinparam.org/Sam/Projects/TEI-CSS/tei-xmlns-demo.xml I'm certainly not advocating that in-browser TEI is the One True Way way to distribute eTexts; it's just a trick that I found useful for previewing files while authoring them. From bruce at zuhause.org Sun Nov 5 17:07:33 2006 From: bruce at zuhause.org (Bruce Albrecht) Date: Sun Nov 5 17:35:43 2006 Subject: [gutvol-d] PG disclaimer obsolete? Message-ID: <17742.35541.262189.857767@celery.zuhause.org> And now for something completely different. 
One of the things that has irked me for some time is the following paragraph of the disclaimer: Project Gutenberg-tm eBooks are often created from several printed editions, all of which are confirmed as Public Domain in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition. All of Distributed Proofreader projects where I have been the content provider have been from specific editions, and most, if not all of them, have HTML editions which preserve page breaks. Is there any way to have this paragraph removed from the boilerplate, at least by request of the submitter? From jon at noring.name Sun Nov 5 18:46:20 2006 From: jon at noring.name (Jon Noring) Date: Sun Nov 5 18:44:32 2006 Subject: [gutvol-d] PG disclaimer obsolete? In-Reply-To: <17742.35541.262189.857767@celery.zuhause.org> References: <17742.35541.262189.857767@celery.zuhause.org> Message-ID: <1626627843.20061105194620@noring.name> Bruce Albrecht wrote: > One of the things that has irked me for some time is the following > paragraph of the disclaimer: Well, this issue has irked a lot of people. The number of rabid supporters of this policy of "source mystery" is very few, maybe even one or two. I have my theories as to the origin of this policy, but will refrain since hopefully all the PG "mystery source" texts will be properly redone in the future. > Project Gutenberg-tm eBooks are often created from several printed > editions, all of which are confirmed as Public Domain in the U.S. > unless a copyright notice is included. Thus, we do not necessarily > keep eBooks in compliance with any particular paper edition. > > All of Distributed Proofreader projects where I have been the content > provider have been from specific editions, and most, if not all of > them, have HTML editions which preserve page breaks. Is there any way > to have this paragraph removed from the boilerplate, at least by > request of the submitter? 
Well, definitely if a text comes from a single source, then that *should* be specifically noted in the text, and to mention which text it comes from. Also called "source metadata". I think one has to take a consumer's advocate view of this: Would you, as a consumer: 1) *prefer* the etext you get is a faithful reproduction of a known source? or 2) Are you content knowing the PG text you read may be: a) a composite from several sources, none of which are given, and b) that there may have been editing done on the text, the extent of which is not stated, nor do you even know who did the editing or on what basis editing was done. Unfortunately, many of the most popular texts in the PG corpus were put together before DP came on the scene. This is why several of us have urged DP to redo each of these works using a known and reasonably authoritative source. I'm all for DP doing all those obscure texts (like 19th century pigeon recipe cookbooks ), but the most popular works need to be put on a sound footing. If DP doesn't, someone else will, and these new texts will be outside of PG. Of course, as Greg just noted, PG encourages outside projects to digitize the public domain. We're all in this together. Jon Noring From joshua at hutchinson.net Sun Nov 5 18:49:11 2006 From: joshua at hutchinson.net (joshua@hutchinson.net) Date: Sun Nov 5 18:49:18 2006 Subject: [gutvol-d] The Proof is in the Poo Message-ID: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> >----Original Message---- >From: desrod@gnu-designs.com > > Does PG have TEI as a registered mime type? > > http://www.iana.org/assignments/media-types/ > > In our own space, Plucker has one[1].. and it might further >adoption if TEI had one registered as well, so webserver maintainers >could set the proper AddType directive and serve TEI documents with a >mime type that browsers (even broken ones like MSIE) could render >properly. 
> > The problem is that a browser CAN'T render a TEI document correctly (see Jon Noring's posts on how close it can come in certain browsers). So having it SAY it is a TEI document gives no benefit. We have it specifically say a TEI document is plain text hoping the browser will do what web standards say it should do and treat it as plain text. Unfortunately, IE thinks it knows better and proceeds to fail miserably. :) Josh From traverso at dm.unipi.it Sun Nov 5 21:19:46 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sun Nov 5 21:15:07 2006 Subject: [gutvol-d] PG disclaimer obsolete? In-Reply-To: <17742.35541.262189.857767@celery.zuhause.org> (message from Bruce Albrecht on Sun, 5 Nov 2006 19:07:33 -0600) References: <17742.35541.262189.857767@celery.zuhause.org> Message-ID: <200611060519.kA65Jkd04782@pico.dm.unipi.it> >>>>> "Bruce" == Bruce Albrecht writes: Bruce> And now for something completely different. Bruce> One of the things that has irked me for some time is the Bruce> following paragraph of the disclaimer: Bruce> Project Gutenberg-tm eBooks are often created from several Bruce> printed editions, all of which are confirmed as Public Bruce> Domain in the U.S. unless a copyright notice is included. Bruce> Thus, we do not necessarily keep eBooks in compliance with Bruce> any particular paper edition. Bruce> All of Distributed Proofreader projects where I have been Bruce> the content provider have been from specific editions, and Bruce> most, if not all of them, have HTML editions which preserve Bruce> page breaks. Is there any way to have this paragraph Bruce> removed from the boilerplate, at least by request of the Bruce> submitter? A custom boilerplate is impossible, since it may be changed when the PG standard changes and an ebook is revised. But a small change of wording might be adopted: instead of "often" say "sometimes". Then the source informations (that now are no longer removed from the text) might be sufficient. 
Although including metadata stating the source would be much much better. And I believe that the sentence as it stands is a big lie: how many PG books have a filed clearance for more than one source, and these have been used for the preparation? No more than 0.1% I believe, and this is not "often". Carlo From cannona at fireantproductions.com Sun Nov 5 21:25:31 2006 From: cannona at fireantproductions.com (Aaron Cannon) Date: Sun Nov 5 21:25:36 2006 Subject: [gutvol-d] PG disclaimer obsolete? References: <17742.35541.262189.857767@celery.zuhause.org> <200611060519.kA65Jkd04782@pico.dm.unipi.it> Message-ID: <002401c70163$fc2e20a0$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 You beat me to it. However, I would personally suggest the following wording: In some rare cases, Project Gutenberg-tm eBooks are created from several printed editions, all of which are confirmed as Public Domain in the U.S. Sincerely Aaron Cannon - -- Skype: cannona MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail address.) - ----- Original Message ----- From: "Carlo Traverso" To: Cc: Sent: Sunday, November 05, 2006 11:19 PM Subject: Re: [gutvol-d] PG disclaimer obsolete? >>>>>> "Bruce" == Bruce Albrecht writes: > > Bruce> And now for something completely different. > > Bruce> One of the things that has irked me for some time is the > Bruce> following paragraph of the disclaimer: > > Bruce> Project Gutenberg-tm eBooks are often created from several > Bruce> printed editions, all of which are confirmed as Public > Bruce> Domain in the U.S. unless a copyright notice is included. > Bruce> Thus, we do not necessarily keep eBooks in compliance with > Bruce> any particular paper edition. > > Bruce> All of Distributed Proofreader projects where I have been > Bruce> the content provider have been from specific editions, and > Bruce> most, if not all of them, have HTML editions which preserve > Bruce> page breaks. 
Is there any way to have this paragraph > Bruce> removed from the boilerplate, at least by request of the > Bruce> submitter? > > A custom boilerplate is impossible, since it may be changed when the > PG standard changes and an ebook is revised. But a small change of > wording might be adopted: instead of "often" say "sometimes". Then the > source informations (that now are no longer removed from the text) > might be sufficient. Although including metadata stating the source > would be much much better. > > And I believe that the sentence as it stands is a big lie: how many PG > books have a filed clearance for more than one source, and these have > been used for the preparation? No more than 0.1% I believe, and this > is not "often". > > Carlo > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFTsdOI7J99hVZuJcRAs1ZAJ9bcKO/31fTLKwjLF3PVV7Vo4rCbwCgnp3o KRx9RNW9Mqh23XUj65E4ls0= =AxKX -----END PGP SIGNATURE----- From schultzk at uni-trier.de Mon Nov 6 01:05:50 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon Nov 6 01:05:57 2006 Subject: [gutvol-d] gvd061104 -- another commentary on consistency In-Reply-To: References: Message-ID: <69A833B0-CFB2-415E-ABC6-DEFCD0F8384B@uni-trier.de> Hi BB, PG has a recommended format! I would not call it mark-up per se. On 05.11.2006 at 18:53, Bowerbird@aol.com wrote: > keith said: > [snip, snip] > keith said: > > If PG had as a base format > > a minimalistic markup > > then all would benefit from that. > > just to let you know, keith, p.g. _does_ have > "a minimalistic markup", at least "officially"... > > for instance, the rule is that four blank lines > should precede a header, and two follow it.
[snip, snip] > > the problem -- especially bad in the past -- > is they are too casual about quality-control, > so the e-texts are _inconsistent_ in regard to > their adherence to this minimalistic markup... That is why I wish for a base format with mark-up. regards, Keith. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/eb8bd381/attachment.html From schultzk at uni-trier.de Mon Nov 6 01:15:30 2006 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon Nov 6 01:15:36 2006 Subject: [gutvol-d] gvd061030 -- let's get it started in here In-Reply-To: <20061105193940.GA5654@pglaf.org> References: <20061105193940.GA5654@pglaf.org> Message-ID: Hi Greg, I will make a proposal and explain the advantages and possible use. I ask only to reflect on the concept. Then decide if it is feasible and, if accepted, maybe have it implemented. I do know and understand perfectly well that I may be just barking up a tree for most people. But, it has been more than 5 years since I last tried. regards Keith. On 05.11.2006 at 20:39, Greg Newby wrote: > On Sat, Nov 04, 2006 at 09:15:56PM +0100, Schultz Keith J. wrote: >> Hi Greg, >> >> What I am talking about is a base format which would require >> that all PG texts be scanned and processed to contain information >> on chapter, pages, paragraphs, type, etc.. > > Your use of the phrase of "require that all PG texts..." indicates > you're not paying attention. > > What you are asking for, will not be granted. What is granted, to you > or anyone, is in my email below. Both what will and will not be > granted > has already been stated by many people in this discussion thread, > and is > in the About articles I mentioned. > > It sounds like you need to find or start another project to meet your > goals (also described in the About articles, and similarly > encouraged).
> [snip, snip] From prosfilaes at gmail.com Mon Nov 6 05:08:26 2006 From: prosfilaes at gmail.com (David Starner) Date: Mon Nov 6 05:08:31 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> Message-ID: <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> On 11/5/06, joshua@hutchinson.net wrote: > > > >----Original Message---- > >From: desrod@gnu-designs.com > > > > Does PG have TEI as a registered mime type? > > > > http://www.iana.org/assignments/media-types/ > > > > In our own space, Plucker has one[1].. and it might further > >adoption if TEI had one registered as well, so webserver maintainers > >could set the proper AddType directive and serve TEI documents with > a > >mime type that browsers (even broken ones like MSIE) could render > >properly. > > > > > > The problem is that a browser CAN'T render a TEI document correctly > (see Jon Noring's posts on how close it can come in certain browsers). > So having it SAY it is a TEI document gives no benefit. That's not true. Saying that it is a TEI document means that most browsers would say that I don't know what this is and it would offer to load it in another application. Anybody who actually wants to view the TEI version probably has a better program to view it in; if all else fails, Wordpad is about as good a tool as opening it up in the browser. When a TEI viewer is written, the browser can be told to open it in the viewer, and it shouldn't be that hard to make it a browser plugin and view it in browser just like a PDF document. 
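The effect of the AddType suggestion quoted above can be sketched outside any particular web server. What follows is only a minimal Python illustration of mapping a file name to the Content-Type a server might send. The .tei extension and the type name application/tei+xml are assumptions made for the example, and content_type_for is a hypothetical helper, not part of any PG software.

```python
import mimetypes

# Assumption for illustration: TEI masters carry a ".tei" extension and
# are labelled "application/tei+xml". Both choices are hypothetical here.
mimetypes.add_type("application/tei+xml", ".tei")


def content_type_for(filename: str) -> str:
    """Pick the Content-Type header value for a file name.

    Falls back to application/octet-stream, which typically makes a
    browser offer to save the file or hand it to another application
    instead of trying to render it.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    for name in ("12345.tei", "12345.txt", "12345.zzzunknown"):
        print(name, "->", content_type_for(name))
```

Changing the one add_type line to map .tei to text/plain would reproduce the other behavior discussed in the thread, where the browser simply displays the markup as text.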
From marcello at perathoner.de Mon Nov 6 05:39:18 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Nov 6 05:39:23 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> Message-ID: <454F3B06.3010802@perathoner.de> David Starner wrote: > That's not true. Saying that it is a TEI document means that most > browsers would say that I don't know what this is and it would offer > to load it in another application. Anybody who actually wants to view > the TEI version probably has a better program to view it in; if all > else fails, Wordpad is about as good a tool as opening it up in the > browser. When a TEI viewer is written, the browser can be told to open > it in the viewer, and it shouldn't be that hard to make it a browser > plugin and view it in browser just like a PDF document. I have just ascertained on the TEI list that there is no registered IANA MIME type for TEI documents. We thus have following alternatives: text/plain: browser displays doc. User might save and open in app if she wants. application/octet-stream: browser prompts user to execute or save doc. There are no native TEI viewers yet, so user will probably open in editor, gaining nothing. application/tei+xml: and don't we all love breaking standards for no good reason? 
-- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Mon Nov 6 06:35:59 2006 From: jon at noring.name (Jon Noring) Date: Mon Nov 6 06:41:17 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> Message-ID: <1110693392.20061106073559@noring.name> David Starner wrote: > Joshua wrote: >> The problem is that a browser CAN'T render a TEI document correctly >> (see Jon Noring's posts on how close it can come in certain >> browsers). So having it SAY it is a TEI document gives no benefit. > That's not true. Saying that it is a TEI document means that most > browsers would say that I don't know what this is and it would offer > to load it in another application. Anybody who actually wants to > view the TEI version probably has a better program to view it in; if > all else fails, Wordpad is about as good a tool as opening it up in > the browser. When a TEI viewer is written, the browser can be told > to open it in the viewer, and it shouldn't be that hard to make it a > browser plugin and view it in browser just like a PDF document. I'm intrigued with the possibility of using a browser plug-in to view a full-blown PGTEI document. The plug-in would do something so that notes placed inline, embedded images, and hypertext links would work as we want them to. I know nothing about browser plug-ins, so I'm not certain this is possible to do. We should not feel compelled to provide omni-browser support -- Firefox and/or Opera are sufficient as a start. I'm pessimistic about IE6/7 anyway because of its lack of full CSS standards support. An alternative to a plug-in would be to adapt an open-source browser, like Firefox, to render PGTEI documents. Again, I'm not sure of the amount of time it would take to adapt the code. 
Handling links and embedded images should be fairly straightforward since it may only require an internal transformation in the DOM, but handling notes placed inline may be a little more knotty. One would like to yank and present each note placed inline (regardless of what the attribute type says) into a separate browser window (in popup fashion) like Microsoft Reader's pagelet feature. That would be very innovative. Assuming the OpenReader format gets traction, we plan to provide some form of native TEI support in the future. Since OpenReader supports the out-of-spine feature, the ability to yank an inline note and place it into a popup or elsewhere will already be there. Jon Noring From lee at novomail.net Mon Nov 6 06:57:52 2006 From: lee at novomail.net (Lee Passey) Date: Mon Nov 6 06:57:21 2006 Subject: [gutvol-d] PG disclaimer obsolete? In-Reply-To: <1626627843.20061105194620@noring.name> References: <17742.35541.262189.857767@celery.zuhause.org> <1626627843.20061105194620@noring.name> Message-ID: <454F4D70.1000001@novomail.net> Jon Noring wrote: > I think one has to take a consumer's advocate view of this: Would > you, as a consumer: > > 1) *prefer* the etext you get is a faithful reproduction of a known > source? > > or > > 2) Are you content knowing the PG text you read may be: > > a) a composite from several sources, none of which are given, and > > b) that there may have been editing done on the text, the extent of > which is not stated, nor do you even know who did the editing or on > what basis editing was done. As a consumer, I frankly don't care, so long as there is no deception. That is, I don't want an e-text to be labeled as "a faithful reproduction of a known source" unless it is. And this should be true whether the label is explicit, as in your example #1 or implicit, as in calling a collection a "literary archive," which implies some attempt to preserve the state of a document. 
In addition to the language referred to by the original poster, the standard PG boilerplate also contains language attempting to exercise some legal control over the distribution of its documents. This control is based on the law of trademark. The sine qua non of trademark law is that practices which would tend to cause confusion in the minds of consumers as to the origin of a product are prohibited. Thus you cannot create silverware and label it "Revere," you cannot fill oil cans with oil and label them "Quaker State," and you cannot create an arbitrary file called "Alice's Adventures in Wonderland" and label it "Project Gutenberg." These prohibitions are not designed so much to protect a manufacturer or tradesman from unfair competition as much as they are designed to protect consumers from deceptive business practices. Because PG e-texts are ostensibly in the public domain, Project Gutenberg cannot control the use of the text itself, but it /can/ control use of the Project Gutenberg trademark, for example, by requiring certain disclaimers on an e-text if it is labeled as coming from Project Gutenberg. Among other things, the Project Gutenberg license requires that if you discover errors in a Project Gutenberg e-text, and correct them, you must strip the file of all references to Project Gutenberg if you wish to distribute the corrected file. As Mr. Hart and Mr. Newby have made abundantly clear over the past few days, the leadership of Project Gutenberg is committed to avoiding the establishment of /any/ standards for Project Gutenberg e-texts. Because the use of certain words on the PG web site (such as "literary archive" and "guidelines") may imply the existence of a standard I think it is important that /every/ PG e-text continue to contain the disclaimer that no attempt has been made to assure that any e-text conforms to any particular existing version or edition. Indeed, the disclaimer should probably be made stronger, more in line with Mr.
Noring's option number 2, above. Of course, as Project Gutenberg is devoid of standards, there is nothing which would prevent a contributer from adding his or her own information to any text, in addition to the standard Project Gutenberg boilerplate, claiming conformance to a specific edition or making other assurances of quality. But only by the proper use of the Project Gutenberg trademark and disclaimers, can consumers be assured that a Project Gutenberg e-text may contain unintentional or intentional errors or omissions and is unreliable for any purpose other than casual reading. From marcello at perathoner.de Mon Nov 6 07:00:43 2006 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Nov 6 07:00:48 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <1110693392.20061106073559@noring.name> References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> <1110693392.20061106073559@noring.name> Message-ID: <454F4E1B.5010108@perathoner.de> Jon Noring wrote: > I know nothing about browser plug-ins, so I'm not certain this is > possible to do. We should not feel compelled to provide omni-browser > support -- Firefox and/or Opera are sufficient as a start. I'm > pessimistic about IE6/7 anyway because of its lack of full CSS > standards support. A browser plug-in is an independent peace of code that just happens to display inside the browser window. It does not need any support on behalf of the browser's rendering engine although it may elect to use some. To build this on a reasonable cross-platform (IE / FF / Opera) basis will take a good programmer as little as 3 or 4 years (BB: 2500 years). As I see no earthly reason why anybody might want his high-end browser to display the TEI version, when the HTML version displays right out-of-the-box on every browser, I will invest no time in doing any of this. 
-- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Mon Nov 6 07:16:46 2006 From: jon at noring.name (Jon Noring) Date: Mon Nov 6 07:15:01 2006 Subject: [gutvol-d] PG disclaimer obsolete? In-Reply-To: <454F4D70.1000001@novomail.net> References: <17742.35541.262189.857767@celery.zuhause.org> <1626627843.20061105194620@noring.name> <454F4D70.1000001@novomail.net> Message-ID: <11554190.20061106081646@noring.name> Lee Passey wrote: > Of course, as Project Gutenberg is devoid of standards, there is > nothing which would prevent a contributer from adding his or her own > information to any text, in addition to the standard Project > Gutenberg boilerplate, claiming conformance to a specific edition or > making other assurances of quality. But only by the proper use of > the Project Gutenberg trademark and disclaimers, can consumers be > assured that a Project Gutenberg e-text may contain unintentional or > intentional errors or omissions and is unreliable for any purpose > other than casual reading. Agreed! Jon Noring From jon at noring.name Mon Nov 6 07:25:49 2006 From: jon at noring.name (Jon Noring) Date: Mon Nov 6 07:24:03 2006 Subject: [gutvol-d] The Proof is in the Poo In-Reply-To: <454F3B06.3010802@perathoner.de> References: <7356866.1162781351278.JavaMail.?@fh1040.dia.cp.net> <6d99d1fd0611060508h7b8ed0c7ob66c2e11fdbccbe9@mail.gmail.com> <454F3B06.3010802@perathoner.de> Message-ID: <685910051.20061106082549@noring.name> Marcello wrote: > I have just ascertained on the TEI list that there is no registered IANA > MIME type for TEI documents. We thus have following alternatives: > > > text/plain: browser displays doc. User might save and open in app if she > wants. > > application/octet-stream: browser prompts user to execute or save doc. > There are no native TEI viewers yet, so user will probably open in > editor, gaining nothing. > > application/tei+xml: and don't we all love breaking standards for no > good reason? 
If one wants to "roll their own", maybe for temporary use until one
secures IANA registration, there is another syntax, e.g.,

   application/x-pgtei+xml

or simply

   application/x-tei+xml

Also, there's a move away from the above to a "dot" kind of structure
for mimetypes. I wrote up a summary of this for the IDPF OCF working
group, and will dig it out if anyone is interested.

Jon

From hart at pglaf.org Mon Nov 6 11:01:51 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Nov 6 11:01:53 2006
Subject: [gutvol-d] PG disclaimer obsolete?
In-Reply-To: <17742.35541.262189.857767@celery.zuhause.org>
References: <17742.35541.262189.857767@celery.zuhause.org>
Message-ID:

No reason not to indicate that any particular eBook was intended to be
an exact copy of a certain paper edition, though you should be prepared
for various other editions to be added to the mix in new efforts in the
future, so make some kind of statement that you want at least one eBook
to remain faithful to the original.

You can decide yourself if you want [sic] to indicate canonical errors.

Michael S. Hart
Founder
Project Gutenberg

On Sun, 5 Nov 2006, Bruce Albrecht wrote:

> And now for something completely different.
>
> One of the things that has irked me for some time is the following
> paragraph of the disclaimer:
>
> Project Gutenberg-tm eBooks are often created from several printed
> editions, all of which are confirmed as Public Domain in the U.S.
> unless a copyright notice is included. Thus, we do not necessarily
> keep eBooks in compliance with any particular paper edition.
>
> All of Distributed Proofreader projects where I have been the content
> provider have been from specific editions, and most, if not all of
> them, have HTML editions which preserve page breaks. Is there any way
> to have this paragraph removed from the boilerplate, at least by
> request of the submitter?
> _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Mon Nov 6 12:13:16 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 6 12:13:35 2006 Subject: [gutvol-d] gvd061106 -- the story of the little red hen Message-ID: <594.41c6177d.3280f15c@aol.com> keith said: > PG has a recomended format! > I would not call it mark-up per-se. i would call it "zen markup". :+) *** marcello said: > There are no native TEI viewers yet "yet"? you mean none of those 134 .tei projects has seen fit to create a viewer-program for their major format? surprising, eh? *** jon said: > I'm intrigued with the possibility > of using a browser plug-in to > view a full-blown PGTEI document. well, if none of those 134 major projects can do it for you, remember you can always use the open-source world for programmers! *** lee said: > As a consumer, I frankly don't care jon, you can't even get your buddy lee to say that he cares about your cause... give it up, dude, it was too old long ago. and frankly, if "consumers" _did_ object, then project gutenberg would have never become the premiere cyberspace library... at least duguid was able to come up with a _good_example_. you have yet to do that... *** a seasick page: > http://www.pgdp.org/ols/tools/display.php?nextpage=111.png& book=400b970ba22f4 how can a scan be twisted one way at the top, and another way at the bottom? that's weird! *** so, let's ruminate for a little bit on "open-source", ok? to any dedicated fan of cooperation and collaboration, open-source has to be one of the neatest concepts ever. the idea is that a whole lot of people will come together, each of them making a small contribution, and that these small individual contributions cumulate and coalesce into a bigger, social good to which each contributor has access. "many hands make light work" is the applicable proverb... 
since each contributor has made only a small contribution, none of them was excessively inconvenienced, yet they now have use of something none could have created all alone... given the great benefits against a backdrop of small cost, it's almost as if people are getting "something for nothing". (it's more like "something for _almost_ nothing", but still...) of course, into an idyllic situation like this, you know there must be an evil presence which will surely introduce itself. in this case, it's people who make use of the end-product without contributing anything themselves to its creation... in a phrase, these people are "taking advantage". there's many different ways that people who _do_ contribute can think about people who are "taking advantage" of them... one might be that you made your contribution as a _gift_ and therefore don't care if people "take advantage" of what you did. indeed, you might even _want_ people to "take advantage" of it. this is generally the orientation of most open-source software... a second reaction might be that you try to prevent other people from "taking advantage" of the social good, by controlling access. public laws against homeless people sleeping in a park are just one example of this approach. (otherwise we'd _all_ sleep there!) this approach doesn't really work too well with software, however, because if you can get access to a program, then you can use it... a third reaction might be that you just don't care about the people who are "taking advantage" of the social good without contributing. like a shopowner who knows that there's gonna be _some_amount_ of shoplifting s/he can't prevent, and therefore simply _accepts_ it. in terms of software, this approach is probably like "donation-ware", where the programmer _asks_ for a "donation", but accepts the fact that most people are gonna use the program without paying for it... 
other attitudes can be adopted about people who "take advantage without contributing", but you get the point -- it's an open issue... for another take on the matter, consider the story of the little red hen. i've appended it to this e-mail. you can find it as p.g. e-text #18735. as a brief recap, the little red hen has the idea to bake some bread. she goes around to all the other animals to solicit their help in the various tasks necessary to the process of baking the bread, including the growing and harvesting of the wheat, and so on... at every stage, all of the other animals are "too busy" to help out. but once the bread is baked, all the other animals want to eat it... but the little red hen says "nope", because they didn't help her with any of the intermediate steps. they just wanted the end-benefit of _eating_ the bread, without paying any of the cost of _producing_ it. sometimes, from some people here, i feel they think of "open-source" as a magical way they can get other people to do their work for them. like the other animals in the story of the little red hen, they don't want to do any of the work to _produce_ the social good, they just want to use it after somebody else has done all the work to create it. moreover, these people don't seem to understand that a key concept of open-source is that _many_people_contribute_, with none of them required to do more than a little bit. if it's _one_ person, doing all the work, that's a different phenomenon. and if the other people who are taking advantage of that one person are using his/her work in direct opposition to the _core_philosophy_ of the person who has done the work, then that's _entirely_different_. so, like the little red hen, i've decided i'm not going to let those people "take advantage" of my work. *** thus, until i get a little bit of _help_, i've decided not to share the _code_ that i'm writing. 
i'll still let y'all see the _output_ -- because, after all,
as you'll remember, it's pudding-time -- but if you want to
eat the bread you'll have to help make it...

(i guess that would make this _bread_pudding_, which i love!)

the last assignment in our "babelfish" open-source project
-- hey, we've got the biblical "bread and fishes" going now! --
was to generalize the page-writing code to produce any page.
for an example, output page 83.

you can see the results of this code by running the new script:
> http://www.greatamericannovel.com/scgi-bin/babelfish11.pl

ok, nothing remarkable there in the output that we didn't see
from babelfish10.pl. but at least the underlying code is more
versatile now...

***

so now that we've got general code that can produce any page,
we're able to take the project to the next level, and make our
script start to behave like something that deserves the label of
"an electronic-book viewer-program".

specifically, we need to have the script navigate the pages of
the e-book, based on feedback it collects from the reader...

from any one page, there are a number of pages to which
the reader will typically want to navigate. these are:

1. the next page.
2. the previous page.
3. the page prior to this one where a chapter starts.
4. the page following this one where a chapter starts.
5. the table-of-contents page.
6. other table-of-contents pages, if there are others.
7. auxiliary views on the page, if there are any.

in addition, of course, you'll want to give the user the ability
to enter _any_ pagenumber and jump directly to that page...

assignment: let the user determine what page will be displayed.

you can see the results of this code by running the new script:
> http://www.greatamericannovel.com/scgi-bin/babelfish12.pl

basically, because we had to produce an "html form", to collect
responses from the end-user, this is a rather significant addition
to the code we've produced before.
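
The navigation targets listed above can be sketched in a few lines. This
is not the babelfish perl code, which is not shown anywhere in this
thread; it is only a minimal python illustration of the idea. Every name
in it (nav_targets, clamp_page, chapter_starts, toc_page) is a
hypothetical one chosen for the example, and it assumes pages are plain
integers and that the chapter-start pages are available as a sorted
list. Items 6 and 7 of the list (extra contents pages, auxiliary views)
are left out.

```python
from bisect import bisect_left, bisect_right


def clamp_page(requested, first_page, last_page):
    """Force a user-entered page number into the valid range."""
    return max(first_page, min(requested, last_page))


def nav_targets(page, first_page, last_page, chapter_starts, toc_page):
    """Return the pages a reader can jump to from `page`.

    chapter_starts is assumed to be a sorted list of the page numbers
    on which a chapter begins. A target is None when no such page exists.
    """
    # number of chapter starts strictly before the current page
    before = bisect_left(chapter_starts, page)
    # index of the first chapter start strictly after the current page
    after = bisect_right(chapter_starts, page)
    return {
        "next": page + 1 if page < last_page else None,
        "previous": page - 1 if page > first_page else None,
        "chapter_before": chapter_starts[before - 1] if before > 0 else None,
        "chapter_after": (
            chapter_starts[after] if after < len(chapter_starts) else None
        ),
        "contents": toc_page,
    }
```

For instance, with made-up chapter starts on pages 3, 20, 45, 83 and 120
of a 200-page book, nav_targets(83, 1, 200, [3, 20, 45, 83, 120], 2)
gives 84 and 82 as the next and previous pages, and 45 and 120 as the
neighboring chapter starts. A page number typed in by the reader would
go through clamp_page first, so that an out-of-range request lands on
the first or last page instead of failing.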
still, it only added about 100 lines of code to the 100 lines that we had thus far, so we're still under 200 lines _total_... but yet we've got an impressive e-book engine already. indeed, the only major shortcoming is the absence of a book-wide "search" capability (and that's coming soon.) and again, we produced it from a _plain-text_.zml_file_, with a perl script that was composed by a perl _beginner_ -- one who admits that his code is "hacked together" and who has been described as "dangerous" by david desrod. (if you only knew, david, if you only knew...) notice as well that -- when you use the buttons on top -- the script generates these pages _on-the-fly_, meaning you don't even have to _store_ them if you don't want to. of course, to make sure they get in the search engines, you'll probably _want_ to have them exist as static pages, at least until they get crawled. but you wouldn't _have_to_ store them, if you didn't want to, as you can generate 'em. (the error-reporting feature on these pages puts another wrinkle in the dynamic/static decision, but it's not one that can't be dealt with, if you decide to go the dynamic route.) *** anyway, to recap, this is what a crude white-hat "hacker" can do if you hand him a dirt-simple plain-text format... *** since marcello has been especially contemptuous of my "mad perl coding skillz", i invite you to compare my hack with his online viewer-program for this "my antonia" book: > http://www.gutenberg.org/etext/242 click on "read online", at the top of the page. as usual, i like to jump to p.123, to get in the middle of a book. when i did that, and chose "next page" for page 124, i came to a page that had a major header -- the one for "book 4" -- at the very bottom of the page. kinda funny. and the headers aren't big or bold like headers should be. and of course there's no table-of-contents, or _anything_ that gives users a notion about the structure of the book. and what _is_ the deal with this "pagination" anyway? 
first of all, it doesn't match any existing pagination, so what good is it really? all it does is make it impossible to search the entire book for a keyword. the reason to chunk a book into pages is to avoid scrolling, but these "pages" have so much text that you must scroll anyway. so this is the worst of both worlds. i dunno, marcello, seems to me like you don't have much ground to stand on in terms of being critical. *** anyway, more pudding later. but if you want to see the secret recipes, you'll need to start collaborating! -bowerbird p.s. here's .zml of the little red hen, from p.g. #18735: The Little Red Hen by Florence White Williams A Little Red Hen lived in a barnyard. She spent almost all of her time walking about the barnyard in her picketty-pecketty fashion, scratching everywhere for worms. She dearly loved fat, delicious worms and felt they were absolutely necessary to the health of her children. As often as she found a worm she would call "Chuck-chuck-chuck!" to her chickies. When they were gathered about her, she would distribute choice morsels of her tid-bit. A busy little body was she! A cat usually napped lazily in the barn door, not even bothering herself to scare the rat who ran here and there as he pleased. And as for the pig who lived in the sty -- he did not care what happened so long as he could eat and grow fat. One day the Little Red Hen found a Seed. It was a Wheat Seed, but the Little Red Hen was so accustomed to bugs and worms that she supposed this to be some new and perhaps very delicious kind of meat. She bit it gently and found that it resembled a worm in no way whatsoever as to taste although because it was long and slender, a Little Red Hen might easily be fooled by its appearance. Carrying it about, she made many inquiries as to what it might be. She found it was a Wheat Seed and that, if planted, it would grow up and when ripe it could be made into flour and then into bread. 
When she discovered that, she knew it ought to be planted. She was so busy hunting food for herself and her family that, naturally, she thought she ought not to take time to plant it. So she thought of the Pig -- upon whom time must hang heavily and of the Cat who had nothing to do, and of the great fat Rat with his idle hours, and she called loudly: "Who will plant the Seed?" But the Pig said, "Not I," and the Cat said, "Not I," and the Rat said, "Not I." "Well, then," said the Little Red Hen, "I will." And she did. Then she went on with her daily duties through the long summer days, scratching for worms and feeding her chicks, while the Pig grew fat, and the Cat grew fat, and the Rat grew fat, and the Wheat grew tall and ready for harvest. So one day the Little Red Hen chanced to notice how large the Wheat was and that the grain was ripe, so she ran about calling briskly: "Who will cut the Wheat?" The Pig said, "Not I," the Cat said, "Not I," and the Rat said, "Not I." "Well, then," said the Little Red Hen, "I will." And she did. She got the sickle from among the farmer's tools in the barn and proceeded to cut off all of the big plant of Wheat. On the ground lay the nicely cut Wheat, ready to be gathered and threshed, but the newest and yellowest and downiest of Mrs. Hen's chicks set up a "peep-peep-peeping" in their most vigorous fashion, proclaiming to the world at large, but most particularly to their mother, that she was neglecting them. Poor Little Red Hen! She felt quite bewildered and hardly knew where to turn. Her attention was sorely divided between her duty to her children and her duty to the Wheat, for which she felt responsible. So, again, in a very hopeful tone, she called out, "Who will thresh the Wheat?" But the Pig, with a grunt, said, "Not I," and the Cat, with a meow, said, "Not I," and the Rat, with a squeak, said, "Not I." So the Little Red Hen, looking, it must be admitted, rather discouraged, said, "Well, I will, then." And she did. 
Of course, she had to feed her babies first, though, and when she had gotten them all to sleep for their afternoon nap, she went out and threshed the Wheat. Then she called out: "Who will carry the Wheat to the mill to be ground?" Turning their backs with snippy glee, that Pig said, "Not I," and that Cat said, "Not I," and that Rat said, "Not I." So the good Little Red Hen could do nothing but say, "I will then." And she did. Carrying the sack of Wheat, she trudged off to the distant mill. There she ordered the Wheat ground into beautiful white flour. When the miller brought her the flour she walked slowly back all the way to her own barnyard in her own picketty-pecketty fashion. She even managed, in spite of her load, to catch a nice juicy worm now and then and had one left for the babies when she reached them. Those cunning little fluff-balls were _so_ glad to see their mother. For the first time, they really appreciated her. After this really strenuous day Mrs. Hen retired to her slumbers earlier than usual -- indeed, before the colors came into the sky to herald the setting of the sun, her usual bedtime hour. She would have liked to sleep late in the morning, but her chicks, joining in the morning chorus of the hen yard, drove away all hopes of such a luxury. Even as she sleepily half opened one eye, the thought came to her that today that Wheat must, somehow, be made into bread. She was not in the habit of making bread, although, of course, anyone can make it if he or she follows the recipe with care, and she knew perfectly well that she could do it if necessary. So after her children were fed and made sweet and fresh for the day, she hunted up the Pig, the Cat and the Rat. Still confident that they would surely help her some day she sang out, "Who will make the bread?" Alas for the Little Red Hen! Once more her hopes were dashed! For the Pig said, "Not I," the Cat said, "Not I," and the Rat said, "Not I." So the Little Red Hen said once more, "I will then." 
And she did. Feeling that she might have known all the time that she would have to do it all herself, she went and put on a fresh apron and spotless cook's cap. First of all she set the dough, as was proper. When it was time she brought out the moulding board and the baking tins, moulded the bread, divided it into loaves, and put them in the oven to bake. All the while the Cat sat lazily by, giggling and chuckling. And close at hand the vain Rat powdered his nose and admired himself in a mirror. In the distance could be heard the long-drawn snores of the dozing Pig. At last the great moment arrived. A delicious odor was wafted upon the autumn breeze. Everywhere the barnyard citizens sniffed the air with delight. The Red Hen ambled in her picketty-pecketty way toward the source of all this excitement. Although she appeared to be perfectly calm, in reality she could only with difficulty restrain an impulse to dance and sing, for had she not done all the work on this wonderful bread? Small wonder that she was the most excited person in the barnyard! She did not know whether the bread would be fit to eat, but -- joy of joys! -- when the lovely brown loaves came out of the oven, they were done to perfection. Then, probably because she had acquired the habit, the Red Hen called: "Who will eat the Bread?" All the animals in the barnyard were watching hungrily and smacking their lips in anticipation, and the Pig said, "I will," the Cat said, "I will," the Rat said, "I will." But the Little Red Hen said, "No, you won't. I will." And she did. --- the end --- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/101ff2b3/attachment-0001.html From jon at noring.name Mon Nov 6 12:38:26 2006 From: jon at noring.name (Jon Noring) Date: Mon Nov 6 12:42:46 2006 Subject: [gutvol-d] PG disclaimer obsolete? 
In-Reply-To: References: <17742.35541.262189.857767@celery.zuhause.org> Message-ID: <558816798.20061106133826@noring.name> Michael wrote: > No reason not to indicate that any particular was intended to be > an exact copy of a certain paper edition, though you should be > prepared for various other editions to be added to the mix in > new efforts in the future, so make some kind of statement that > you want at least one eBook to remain faithful to the original. Well, this is great! I hope that texts submitted which fall under the category of "faithful/accurate to a particular source edition which is given in the text" will have stricter requirements as to updating/ editing, namely that edits will only be to correct transcription errors with reference to the original source (hopefully the scans will be around to "backup" the digital version, almost like gold backing up paper money, if one wants some weak analogy.) Certainly if someone wants to produce a derivative edition from the source text, such as correcting errors in the original source, that's fine, too, so long as it is identified as such. In fact, my OpenReader version of "My Antonia" includes corrections of errors in the original first edition (the error corrections come from both Jose Menendez and from the Cather Project): http://www.openreader.org/myantonia/My_Antonia_OpenReader_1.0_03-Oct-2006.zip In this version, I go through pains in the metadata to state the general nature of the changes, and that the original authentic source is available which includes markup indicating the exact errors found and what the corrections should be. A future OpenReader version will include both texts in the distribution (as two "user sets") so the reader can choose either the more scholarly, accurate-to-the-1st-edition version (including errors which are highlighted), or the "corrected" edition for casual reading. It's all a matter of identifying for each text what it exactly is, not what it may be. 
I have no problem with derivative/composite texts so long as they
describe what they are and hopefully state what sources were used in
their production, and give some general comments on what changes were
made.

The argument that many books from major publishers are themselves some
sort of edited versions of older works, that they as a rule don't state
much about the sources or what they did, and that we thus don't have to
either, is a poor argument. Do we really want PG to emulate the thinking
of the "top down" companies (as Michael would describe them)? Shouldn't
PG be better than commercial publishers and record right in the text
whatever metadata it can about the source(s), transcription, and editing
process? After all, it is easy to include that data in the text, so why
not?

Certainly PG has a point in not overly burdening those who transcribe
texts and submit them -- on the other hand, a few *recommendations*, not
requirements, which don't overly burden the volunteers are not bad,
either -- simply state why it is a good idea to do this and that, and
most of the volunteers will be happy to oblige. In fact, many volunteers
will appreciate that PG cares about quality.

Jon

From jon at noring.name Mon Nov 6 12:47:37 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov 6 12:45:49 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
In-Reply-To: <594.41c6177d.3280f15c@aol.com>
References: <594.41c6177d.3280f15c@aol.com>
Message-ID: <1009313713.20061106134737@noring.name>

Bowerbird wrote:
> Lee said:
>> As a consumer, I frankly don't care

> jon, you can't even get your buddy lee
> to say that he cares about your cause...
> give it up, dude, it was too old long ago.
>
> and frankly, if "consumers" _did_ object,
> then project gutenberg would have never
> become the premiere cyberspace library...
>
> at least duguid was able to come up with a
> _good_example_. you have yet to do that...

LOL, well, I guess you don't know Lee as I do.
Reread the last paragraph in his message -- it is brilliant. Here, I'll repeat what Lee wrote: > Of course, as Project Gutenberg is devoid of standards, there is > nothing which would prevent a contributer from adding his or her own > information to any text, in addition to the standard Project > Gutenberg boilerplate, claiming conformance to a specific edition or > making other assurances of quality. But only by the proper use of > the Project Gutenberg trademark and disclaimers, can consumers be > assured that a Project Gutenberg e-text may contain unintentional or > intentional errors or omissions and is unreliable for any purpose > other than casual reading. What more can I say about this except that I agree! Jon From jon at noring.name Mon Nov 6 13:05:58 2006 From: jon at noring.name (Jon Noring) Date: Mon Nov 6 13:04:09 2006 Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for... In-Reply-To: <1009313713.20061106134737@noring.name> References: <594.41c6177d.3280f15c@aol.com> <1009313713.20061106134737@noring.name> Message-ID: <1582116939.20061106140558@noring.name> I previously wrote: > Lee wrote: >> Of course, as Project Gutenberg is devoid of standards, there is >> nothing which would prevent a contributer from adding his or her own >> information to any text, in addition to the standard Project >> Gutenberg boilerplate, claiming conformance to a specific edition or >> making other assurances of quality. But only by the proper use of >> the Project Gutenberg trademark and disclaimers, can consumers be >> assured that a Project Gutenberg e-text may contain unintentional or >> intentional errors or omissions and is unreliable for any purpose >> other than casual reading. > What more can I say about this except that I agree! Hmmm, I guess I should have also asked the obvious questions: Is everybody cool with Lee's conclusion? 
Should PG's collection become known as the collection that can't be
relied upon for anything but casual reading?

Jon

From Bowerbird at aol.com Mon Nov 6 13:05:41 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov 6 13:05:48 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID:

jon quoted lee, who said:
> a Project Gutenberg e-text may contain unintentional
> or intentional errors or omissions and is unreliable for
> any purpose other than casual reading.

probably true of almost every book in almost every bookstore.

nobody seems to mind. none of us is perfect either.

and since michael's only target was "casual reading", it's fine...

***

but more to the point, out of that whole post, _this_ was the
only thing to which you responded? that makes me smile, jon.
i assure you that there is more there for you to think about...

-bowerbird

From Bowerbird at aol.com Mon Nov 6 13:14:41 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov 6 13:14:53 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
Message-ID:

jon said:
> Is everybody cool with Lee's conclusion?

i think when you are the top dog,
there is no shortage of mutts who
will try to take a cheap shot at you.

if you can't be "cool" toward that,
you don't deserve to be top dog...

***

having said all that, i'm in favor of
doing a clean-up of all the e-texts.

but distributed proofreaders seems
more inclined to produce new books
than clean up all of the older e-texts
(including ones that _they_ produced).

since d.p. lays loud claim to now being
the biggest digitizer in the land of p.g.,
i'd say the burden is on their shoulders.

if they will not do the job, why should i?
-bowerbird

From jon at noring.name Mon Nov 6 13:19:27 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov 6 13:17:40 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
In-Reply-To:
References:
Message-ID: <168768566.20061106141927@noring.name>

Bowerbird wrote:
> jon quoted lee, who said:
>> a Project Gutenberg e-text may contain unintentional
>> or intentional errors or omissions and is unreliable for
>> any purpose other than casual reading.

> probably true of almost every book in almost every bookstore.

Yes, but we are not talking about paper books sold by commercial
interests. We are talking about the PG collection which has completely
different goals.

> nobody seems to mind. none of us is perfect either.

Who is 'nobody'? Many people do mind, and for good reasons. Most of
those people are not here in gutvol-d.

> and since michael's only target was "casual reading", it's fine...

That's not his vision. His vision is to get the books to every person
on every continent in the world. A big vision like this also requires
great responsibility with respect to the integrity of the texts.

> but more to the point, out of that whole post, _this_ was the
> only thing to which you responded? that makes me smile, jon.
> i assure you that there is more there for you to think about...

You seem to be happy with Lee's conclusion in his last paragraph, am I
right?

Jon

From jon at noring.name Mon Nov 6 13:21:45 2006
From: jon at noring.name (Jon Noring)
Date: Mon Nov 6 13:19:57 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
In-Reply-To:
References:
Message-ID: <456909141.20061106142145@noring.name>

Bowerbird wrote:
> jon said:
>> Is everybody cool with Lee's conclusion?
> i think when you are the top dog,
> there is no shortage of mutts who
> will try to take a cheap shot at you.
>
> if you can't be "cool" toward that,
> you don't deserve to be top dog...

Again, do you accept Lee's conclusion? If not, with which point do you
disagree?

Jon

From Bowerbird at aol.com  Mon Nov  6 14:59:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Mon Nov  6 14:59:58 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: <37e.7163b54b.3281185b@aol.com>

jon said:
> we are not talking about paper books
> sold by commercial interests.

no, we're talking about e-books that are created by volunteers
for cost-free use by anyone who can get a hold of them...

this cost-free use even extends to people like me -- and you,
presumably -- who want to improve them in various ways...

the message i have for those volunteers -- and thus the main
reason i am here -- is that there is an easier way for _them_
to do the work that _they_ do. i am trying to _save_ them
some of their time and energy.

the message you seem to want to deliver is that they are doing
things the "wrong" way, and you want them to do it differently,
and expend even _more_ time and energy to do it the way that
_you'd_ prefer they do it...

at any rate...

i am gonna do the work -- myself! -- to convert the e-texts
to .zml format...

i suggest you should do the work yourself in bringing the e-texts
up to the standard that _you_ seem to think they should have.

after all, p.g. has done all the _hard_ work of actually _digitizing_
the books, so all you have to do now is make sure they actually
conform to some hard-copy paper version out there...

take it and run with it, jon...

> Who is 'nobody'?

00.01% of the population. (1 out of 10,000.)

(and heck, most people out there don't give
a tinker's damn about books in the first place...)

> His vision is to get the books to every person
> on every continent in the world.
A big vision > like this also requires great responsibility > with respect to the integrity of the texts. i'm happy to let _michael_ define his "responsibilities". more to the point, i don't define "integrity to the text" as a slavish reproduction of some paper-copy version. indeed, i think you're a bit nuts for even suggesting it, since publishers routinely rip authorial intent to shreds. and moreover, since -- thanks to google -- the scans _will_ be available for any p-book that we might want, we can always ascertain what that p-book looked like, if we need to. so i feel no call to slavishly reproduce it; my interest is maximizing the usefulness of the e-book. > You seem to be happy with Lee's conclusion > in his last paragraph, am I right? in case i wasn't perfectly clear, i think it's a cheap shot. so it doesn't surprise me one bit you'd repeat it. twice... further, i expect to see your friend david rothman echo it over on his teleblahg within the next day or two... -bowerbird p.s. and yes, folks, i know jon is trying to divert attention from babelfish12, but my pudding ain't going away now... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061106/5954b0fe/attachment-0001.html From Catenacci at Ieee.Org Mon Nov 6 15:26:18 2006 From: Catenacci at Ieee.Org (Onorio Catenacci) Date: Mon Nov 6 15:32:35 2006 Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for... In-Reply-To: <456909141.20061106142145@noring.name> References: <456909141.20061106142145@noring.name> Message-ID: On 11/6/06, Jon Noring wrote: > Bowerbird wrote: > > jon said: > > >> Is everybody cool with Lee's conclusion? > > > i think when you are the top dog, > > there is no shortage of mutts who > > will try to take a cheap shot at you. > > > > if you can't be "cool" toward that, > > you don't deserve to be top dog... > > Again, do you accept Lee's conclusion? 
> If not, with which point do you
> disagree?

Jon,

Just to be sure I'm clear: "casual reading" implies good enough to
read and follow but not good enough for a scholarly dissertation? :-)

I may be off-base but it seems to me that PG is much like Wikipedia;
that is, take it for a good place to start but if technical accuracy
is a major concern, double check with other sources.

I guess I'm saying, yes, I accept Lee's conclusion.

--
Onorio

From Bowerbird at aol.com  Tue Nov  7 00:49:19 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 00:49:28 2006
Subject: [gutvol-d] gvd061107 -- i got a little bit ahead of myself
Message-ID: <3bd.1145eab1.3281a28f@aol.com>

oops! i got a little bit ahead of myself in yesterday's post...

i said:
> from any one page, there are a number of pages to which
> the reader will typically want to navigate. these are:
> 1. the next page.
> 2. the previous page.
> 3. the page prior to this one where a chapter starts.
> 4. the page following this one where a chapter starts.
> 5. the table-of-contents page.
> 6. other table-of-contents pages, if there are others.
> 7. auxiliary views on the page, if there are any.

i also said:
> notice as well that -- when you use the buttons on top --
> the script generates these pages _on-the-fly_, meaning
> you don't even have to _store_ them if you don't want to.

while all that is true, the actual code that _implements_ those
particular features and which thus forms the solid core of our
e-book engine wasn't there yesterday. it's in _today's_ script.

you can see the results of this code by running the new script:
> http://www.greatamericannovel.com/scgi-bin/babelfish14.pl

_now_ you can compare this script with marcello's online reader.

***

oh, marcello, i'm sure you'll be _happy_ to know that the .html
for at least _some_ of these pages actually _validate_! oh wow!

so ok, kids, let's put those famous open-source eyeballs to work.
review all the pages, to see which ones are rendered incorrectly...

and marcello, for extra credit, you can tell us about any pages that
_fail_ validation! we'll be sure to give those pages extra attention!

***

that's it for today, since i wanna make sure you have enough time
to rouse all those evil right-wing republican assholes out of office...

-bowerbird

p.s. the term "assholes" is _not_ meant to disparage those evangelicals
who have had affairs with gay "escorts" and buy-but-do-not-use meth,
as some of my best friends (fortunately) are gay, and some of my other
best friends (unfortunately) are tweakers. (i like drugs, but meth is bad.)

From Bowerbird at aol.com  Tue Nov  7 09:20:05 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov  7 09:20:10 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
Message-ID: 

i said:
> further, i expect to see your friend david rothman echo it
> over on his teleblahg within the next day or two...

bingo.
> http://www.teleread.org/blog/?p=5757

-bowerbird

p.s. of course, david might have done it just to yank my chain.

From jon at noring.name  Tue Nov  7 11:28:34 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 11:26:48 2006
Subject: [gutvol-d] gvd061106 -- the story of the little red hen
In-Reply-To: 
References: 
Message-ID: <125222089.20061107122834@noring.name>

Bowerbird wrote:
> Bowerbird wrote:

>> further, i expect to see your friend david rothman echo it
>> over on his teleblahg within the next day or two...

> bingo.
>
> http://www.teleread.org/blog/?p=5757
>
> -bowerbird
>
> p.s. of course, david might have done it just to yank my chain.
Nope, he did it because Lee's comment was excellent. We, frankly, don't have much time these days to try to yank your chain -- we have more important fish to fry and pudding to make. Maybe when things settle down we will return to make your life miserable. Jon From Bowerbird at aol.com Tue Nov 7 12:16:49 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Nov 7 12:16:54 2006 Subject: [gutvol-d] gvd061106 -- the story of the little red hen Message-ID: jon said: > Nope, he did it because Lee's comment was excellent. well, it was an excellent cheap shot, i will give you that... but it's good to know your position on project gutenberg. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061107/4dc61164/attachment.html From lee at novomail.net Tue Nov 7 15:20:44 2006 From: lee at novomail.net (Lee Passey) Date: Tue Nov 7 15:19:13 2006 Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for... In-Reply-To: References: <456909141.20061106142145@noring.name> Message-ID: <455114CC.3060203@novomail.net> Onorio Catenacci wrote: [snip] > Just to be sure I'm clear: "casual reading" implies good enough to > read and follow but not good enough for a scholarly dissertation? :-) This is not a bad summation of my position, although, as usual, the devil is in the details. You have posited two extremes: 1. good enough to read and follow (to which I would add "given a modicum of effort") and 2. good enough for a scholarly dissertation. Now I would agree that the vast majority of Project Gutenberg e-texts are probably good enough to read and follow given a modicum of effort. And I suspect that you would agree that the vast majority (and perhaps even the totality) of Project Gutenberg e-texts are inadequate for a scholarly dissertation. But what about those situations which fall between the extremes? 
The fundamental problem is that Project Gutenberg is totally lacking in standards, so it is impossible to judge how well any given e-text matches any given use. And we probably don't have enough data to determine how well Project Gutenberg e-texts, on the whole, satisfy any external standards. But personally, I find that generally Project Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2. are inadequate for assigned reading at a high-school level; AND 3. are inadequate for inclusion in any public or school library; AND 4. are inadequate for any type of automated data processing; AND 5. are mostly inadequate for /effortless/ reading. I suspect that there /may/ be some gems in the PG corpus which are adequate for any or all of the above uses, but again, because Project Gutenberg has no standards, it is virtually impossible to identify those e-texts except on a case-by-case basis. Given the lack of indications of quality, I must accept as a default position that Project Gutenberg e-texts are good enough to read and follow by a human being (and not a computer) given a modicum of effort; but nothing more. -- Nothing of significance below this line. From Bowerbird at aol.com Tue Nov 7 15:48:04 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Nov 7 15:48:15 2006 Subject: [gutvol-d] let's take stock Message-ID: p.g. e-texts are good enough to sell to libraries. how do i know this? because a company called "netlibrary" is already making sales of various p.g. e-texts to libraries. jon and lee and rothman know this too, and they want a piece of that action, so they have to find a way to make p.g. e-texts look bad. and that's what you're seeing here... 
they can't point to many badly-flawed e-texts,
but it doesn't matter if they stick to honest facts;
the only thing that counts is if they can make up
a story that is "believable" enough to fool some
"investors" into forking over some cash to them,
so that they can "digitize major books correctly".

the story they are spinning to the investors is that
they will be able to displace the netlibrary sales by
positioning their e-books as "faithful to the original",
while condemning p.g. e-texts as "ridden with errors",
which of course is going to be anathema to librarians.

again, it's a cheap shot, so they can line their pockets...

the sad thing is, they probably _will_ find some suckers.

-bowerbird

From donovan at abs.net  Tue Nov  7 16:40:06 2006
From: donovan at abs.net (D Garcia)
Date: Tue Nov  7 16:40:30 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
In-Reply-To: <455114CC.3060203@novomail.net>
References: <455114CC.3060203@novomail.net>
Message-ID: <200611071940.06539.donovan@abs.net>

On Tuesday 07 November 2006 06:20 pm, Lee Passey wrote:
[heinous snippage]
> But personally, I find that generally Project
> Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2.
> are inadequate for assigned reading at a high-school level; AND 3. are
> inadequate for inclusion in any public or school library; AND 4. are
> inadequate for any type of automated data processing; AND 5. are mostly
> inadequate for /effortless/ reading.

While some of the above may be true for some of the PG texts, you're using an
awfully broad brush there. See below.
> I suspect that there /may/ be some gems in the PG corpus which are
> adequate for any or all of the above uses, but again, because Project
> Gutenberg has no standards, it is virtually impossible to identify those
> e-texts except on a case-by-case basis. Given the lack of indications of
> quality, I must accept as a default position that Project Gutenberg
> e-texts are good enough to read and follow by a human being (and not a
> computer) given a modicum of effort; but nothing more.

Project Gutenberg does have standards (albeit minimal), but PG is not the
source of the texts. The volunteers are. Distributed Proofreaders is a major
(perhaps "the major") source of PG texts, and DP has standards. PG benefits
from these standards, in spite of their essentially passive role as
repository. I'm even willing to bet that many (or even most) of the PG texts
produced by DP are on average the best of the lot. (Except for the ones that
aren't.) *grin*

From vze3rknp at verizon.net  Tue Nov  7 16:47:04 2006
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Tue Nov  7 16:49:18 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
In-Reply-To: <455114CC.3060203@novomail.net>
References: <456909141.20061106142145@noring.name> <455114CC.3060203@novomail.net>
Message-ID: <45512908.9070805@verizon.net>

Lee Passey wrote:

> The fundamental problem is that Project Gutenberg is totally lacking
> in standards, so it is impossible to judge how well any given e-text
> matches any given use. And we probably don't have enough data to
> determine how well Project Gutenberg e-texts, on the whole, satisfy
> any external standards. But personally, I find that generally Project
> Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND
> 2. are inadequate for assigned reading at a high-school level; AND 3.
> are inadequate for inclusion in any public or school library; AND 4.
> are inadequate for any type of automated data processing; AND 5. are > mostly inadequate for /effortless/ reading. The accuracy of the text being produced by Distributed Proofreaders is now very high. At least as good as that of the original printers and usually better. Almost all DP books are produced in nicely laid out, html formats, many of them xhtml 1.0 strict, which I'm told can be easily turned into XML for those who need that for scholarly uses. Whether the editions prepared by DP are the ones that would be chosen for a scholarly dissertation is another question, but all edition information that appears in the book itself is included so that the reader can make that judgement. The early DP texts were not up to the current standards, but are reasonably error free. PG's standards may not be explicitly understood by outsiders (they do exist, as enforced by the white-washers, but they can be quite flexible in many areas) but DP's are reasonably clearly stated in the documentation associated with the site. It disturbs me to see people continuing to bash PG for the quality of its earlier works without acknowledging that more recent material is significantly better. Two examples, chosen from ones posted to PG (from DP) in the last couple of days are The Cliff Ruins of Canyon de Chelly, Arizona by Cosmos Mindeleff from the series of publications produced by the Bureau of American Ethnology of the Smithsonian and The History of Rome, Books 01 to 08, by Titus Livius . Whether or not this translation of Titus Livius is the "best" one, certainly the quality of these ebooks in terms of transcription quality, presentation, etc. is easily good enough for the purposes you cite. Juliet Sutherland Distributed Proofreaders From jon at noring.name Tue Nov 7 17:07:51 2006 From: jon at noring.name (Jon Noring) Date: Tue Nov 7 17:06:08 2006 Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for... 
In-Reply-To: <45512908.9070805@verizon.net> References: <456909141.20061106142145@noring.name> <455114CC.3060203@novomail.net> <45512908.9070805@verizon.net> Message-ID: <387594685.20061107180751@noring.name> Juliet wrote: > It disturbs me to see people continuing to bash PG for the quality of > its earlier works without acknowledging that more recent material is > significantly better. Two examples, chosen from ones posted to PG (from > DP) in the last couple of days are The Cliff Ruins of Canyon de Chelly, > Arizona by Cosmos Mindeleff from > the series of publications produced by the Bureau of American Ethnology > of the Smithsonian and The History of Rome, Books 01 to 08, by Titus > Livius . Whether or not this > translation of Titus Livius is the "best" one, certainly the quality of > these ebooks in terms of transcription quality, presentation, etc. is > easily good enough for the purposes you cite. The problem is that, as the saying goes, a rotten apple spoils the whole barrel. We know the quality of the DP product is very good, while PG itself admits the loosey-goosey nature of the pre-DP corpus. So once the DP stuff goes into the same barrel, it gets the PG label and boilerplate. Jon From jon at noring.name Tue Nov 7 17:32:51 2006 From: jon at noring.name (Jon Noring) Date: Tue Nov 7 17:31:10 2006 Subject: [gutvol-d] let's take stock In-Reply-To: References: Message-ID: <1339485088.20061107183251@noring.name> Bowerbird wrote: > p.g. e-texts are good enough to sell to libraries. > > how do i know this? > > because a company called "netlibrary" is already > making sales of various p.g. e-texts to libraries. The marketability of netLibrary is not the public domain texts it contains, but the copyrighted works. The corpus numbers (last I heard, which was a while ago) about 70,000 texts. Next? 
> they can't point to many badly-flawed e-texts, > but it doesn't matter if they stick to honest facts; > the only thing that counts is if they can make up > a story that is "believable" enough to fool some > "investors" into forking over some cash to them, > so that they can "digitize major books correctly". If a PG text does not state what the source is, that alone is a major flaw. And PG itself admits its books may be composites with editings done at the whim of the compiler(s), without alerting the reader whether it is a composite or not, and with no stated commitment to faithful reproduction. This is right in the PG BOILERPLATE. And this is a major flaw. Those are the honest facts. I don't need to talk about textual errors. In fact, not knowing the source for books which have multiple Manifestations (which are most of the great classics) means that truly knowing the textual errors is indeterminable (and yes, I know about your super-power "toolz" to hunt down and kill textual errors, but textual errors go beyond just OCR or key entry errors.) > the story they are spinning to the investors is that > they will be able to displace the netlibrary sales by > positioning their e-books as "faithful to the original", > while condemning p.g. e-texts as "ridden with errors", > which of course is going to be anathema to librarians. > > again, it's a cheap shot, so they can line their pockets... > > the sad thing is, they probably _will_ find some suckers. Shhhh, you are under NDA. You're not supposed to be telling the world about our "sooper sekrit" plan to steal the Public Domain and forever lock it up under the control of Halliburton and the Trilateral Commission. And in our nefarious plans we only need to point out that the pre-DP portion of the collection is simply untrustworthy for the reasons cited above, which PG itself admits. In fact, the PG BOILERPLATE is all that we need to seize total control of the Public Domain and earn billions of dollars. 
Jon

From jon at noring.name  Tue Nov  7 17:45:18 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 17:43:35 2006
Subject: Re: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
In-Reply-To: <200611071940.06539.donovan@abs.net>
References: <455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
Message-ID: <665744627.20061107184518@noring.name>

D. Garcia wrote:

> Project Gutenberg does have standards (albeit minimal), but PG is not the
> source of the texts. The volunteers are. Distributed Proofreaders is a major
> (perhaps "the major") source of PG texts, and DP has standards. PG benefits
> from these standards, in spite of their essentially passive role as
> repository. I'm even willing to bet that many (or even most) of the PG texts
> produced by DP are on average the best of the lot. (Except for the ones that
> aren't.) *grin*

Agreed! DP does great work, and I love the dedication of those there.
And no doubt many of the lone PG contributors also do great work, but
the question is how do we know?

The most important thing is DP's commitment to faithful reproduction
of a known source work, and to preserve the source information. They
also do much of their work in a public setting. Certainly the DP
process can be improved, but the issue has more to do with a
commitment to improve the quality of the work product and address the
considerations of authenticity, faithfulness, trustworthiness, and all
those good things.

Maybe Lee's comment painted with too wide a brush by including the DP
stuff (I know he did not intend that). But note that Lee's comment was
on the PG trademark and its Boilerplate, which is slapped onto all DP
texts when incorporated into the PG collection. This is the "wide
brush" we really need to be concerned about.
Jon From jon at noring.name Tue Nov 7 17:58:09 2006 From: jon at noring.name (Jon Noring) Date: Tue Nov 7 17:56:26 2006 Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for... In-Reply-To: <455114CC.3060203@novomail.net> References: <456909141.20061106142145@noring.name> <455114CC.3060203@novomail.net> Message-ID: <22911606.20061107185809@noring.name> Lee Passey wrote: > The fundamental problem is that Project Gutenberg is totally lacking in > standards, so it is impossible to judge how well any given e-text > matches any given use. And we probably don't have enough data to > determine how well Project Gutenberg e-texts, on the whole, satisfy any > external standards. But personally, I find that generally Project > Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2. > are inadequate for assigned reading at a high-school level; AND 3. are > inadequate for inclusion in any public or school library; AND 4. are > inadequate for any type of automated data processing; AND 5. are mostly > inadequate for /effortless/ reading. > > I suspect that there /may/ be some gems in the PG corpus which are > adequate for any or all of the above uses, but again, because Project > Gutenberg has no standards, it is virtually impossible to identify those > e-texts except on a case-by-case basis. Given the lack of indications of > quality, I must accept as a default position that Project Gutenberg > e-texts are good enough to read and follow by a human being (and not a > computer) given a modicum of effort; but nothing more. Good point. Of course, with a nod to Juliet who is right to be concerned, Lee's statement paints a broad brush to include the good work done by DP. 
But again note that for the average consumer, who doesn't know what we know, the DP stuff gets mixed in with all the other stuff, with a common boilerplate added which doesn't exactly instill warm fuzzies ("this text may be a composite...") Another important point is that, by and large, the most popular texts in the PG collection are the well-known classics, most of which were not done by DP, and done a while back when things were much worse. (The source(s) not known, ASCIIization of extended Latin characters, etc.) I suspect that Lee's experience has been skewed, as has most other readers, by this more popular and older portion of the PG collection. The stuff DP is doing now is mostly much lesser known material (which needs to be properly digitized, of course) that will always be less popular than the classics. Jon Noring From Bowerbird at aol.com Tue Nov 7 18:21:19 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Nov 7 18:21:31 2006 Subject: [gutvol-d] juliet responds Message-ID: juliet said: > The accuracy of the text being produced > by Distributed Proofreaders is now very high. from one extreme to the other, we swing. > The early DP texts were not > up to the current standards, > but are reasonably error free. "reasonably" being a very slippery word here. it would be nice to know, moreover, _when_ d.p. texts crossed the line into "the current standards". and what plans exist to revisit the "early texts" and improve them "up to the current standards". > It disturbs me to see people continuing to bash PG > for the quality of its earlier works without acknowledging > that more recent material is significantly better. the line between "early" and "more recent" texts doesn't mesh so neatly along the _quality_ grid. neither does the d.p./other segregation line... some of the early e-texts were quite excellent. and many of the early ones done by d.p. are not. (and the line between "early" and "recent' is unclear.) 
also, though it probably shouldn't need to be said,
given the advances over the years in o.c.r. quality,
many of the books that are done _outside_ of d.p.
these days are also of excellent, error-free quality.

> Two examples, chosen from ones posted to PG
> (from DP) in the last couple of days

make no mistake about what i'm saying, there are
some very good e-books coming out of d.p. now...

i haven't looked closely at some recent examples
to see what i can find, but given juliet's confidence,
perhaps it's time for me to do another assessment.

(oh, and keep in mind what i said the other day:
that all p.g. e-texts -- including the d.p. ones --
will eventually be tossed, because you discarded
the _linebreak_ information. even on the e-texts
that you assure us were based on a single source,
without the linebreaks in place, it is so difficult to
_verify_ it, that we have no choice but to do it over.)

***

noring said:
> And PG itself admits its books may be composites

"admits"? as if this is some kind of confession?

> This is right in the PG BOILERPLATE.
> And this is a major flaw.

no, it isn't. just because it's not what _you_ would do,
it's not a "flaw", let alone "a major flaw".

but of course, you keep trying to _spin_ it that way, and
maybe some naive fools will come to believe you.

what a cheap shot!

but hey, the bush administration has proven repeatedly
that fear-mongering is the modus operandi of the day...

-bowerbird

From lee at novomail.net  Tue Nov  7 19:55:18 2006
From: lee at novomail.net (Lee Passey)
Date: Tue Nov  7 19:54:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
In-Reply-To: <200611071940.06539.donovan@abs.net> References: <455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net> Message-ID: <45515526.4000708@novomail.net> D Garcia wrote: [more major snippage] > While some of the above may be true for some of the PG texts, you're > using an awfully broad brush there. See below. You are absolutely correct. But without some means of discrimination, a broad brush is the only one I have. Going out to the Project Gutenberg web site, I can't seem to find any way to perform a search that would return /only/ those e-texts produced by Distributed Proofreaders. Picking an e-text at random, say http://www.gutenberg.org/etext/19725, I see nothing that would indicate that it was produced by Distributed Proofreaders. Looking at the text itself, I see a sentence (in a
 
<pre> block) that states "Produced by Ted Garvin, Taavi Kalju and the Online 
Distributed Proofreading Team at http://www.pgdp.net," so as a human I 
could probably safely conclude that it came from DP, but it doesn't look 
like it's in a format where I could come up with an automated way to, 
say, download all of the PG corpus and build an index of only the DP texts.
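
For what it is worth, the kind of index described above could be roughed
out along the following lines. This is only a sketch under loud
assumptions: that every DP-produced file carries a "Produced by ..."
credit naming the Distributed Proofreading Team or pgdp.net, as the one
file quoted above does, and that the credit sits near the top of the
file. The function names and sample strings are hypothetical, and a real
corpus would very likely need more patterns than this one.

```python
import re

# Assumption: DP credits resemble the one quoted above, i.e. a
# "Produced by ..." sentence that names the Distributed Proofreading
# Team or pgdp.net within a few hundred characters.
CREDIT_RE = re.compile(
    r"Produced by\b.{0,300}?(?:Distributed Proofread\w*|pgdp\.net)",
    re.IGNORECASE | re.DOTALL,
)

def looks_like_dp_text(text, window=5000):
    """Return True if a DP-style credit appears in the first `window` chars."""
    return CREDIT_RE.search(text[:window]) is not None

def index_dp_texts(texts):
    """Given {identifier: full_text}, return the sorted identifiers whose
    text appears to carry a DP credit."""
    return sorted(key for key, body in texts.items() if looks_like_dp_text(body))
```

A free-text heuristic like this can only guess; it says nothing about
whether a file was altered after DP produced it.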

Later in that same file it says "Project Gutenberg-tm eBooks are often 
created from several printed editions, all of which are confirmed as 
Public Domain in the U.S. unless a copyright notice is included.  Thus, 
we do not necessarily keep eBooks in compliance with any particular 
paper edition." So even if the original e-text was produced consistent 
with DP's published standards I have no way of knowing whether it was 
altered in some way that would make it /inconsistent/ with those standards.

> > I suspect that there /may/ be some gems in the PG corpus which are
> > adequate for any or all of the above uses, but again, because
> > Project Gutenberg has no standards, it is virtually impossible to
> > identify those e-texts except on a case-by-case basis. Given the
> > lack of indications of quality, I must accept as a default position
> > that Project Gutenberg e-texts are good enough to read and follow
> > by a human being (and not a computer) given a modicum of effort;
> > but nothing more.
>
>  Project Gutenberg does have standards (albeit minimal), but PG is not
>  the source of the texts. The volunteers are. Distributed Proofreaders
>  is a major (perhaps "the major") source of PG texts, and DP has
>  standards. PG benefits from these standards, in spite of their
>  essentially passive role as repository. I'm even willing to bet that
>  many (or even most) of the PG texts produced by DP are on average the
>  best of the lot. (Except for the ones that aren't.) *grin*

There is no doubt in /my/ mind that the DP texts are the best of the 
lot. And without getting into any discussion here about whether DP 
standards are sufficient for most purposes, my observation is that the 
DP standards are consistently evolving for the better. Merely having 
stated standards, no matter how trivial, is a major indication of 
quality. I am quite looking forward to the time when DP begins the task 
of re-scanning and re-creating the most popular public domain books 
which make up the early part of the PG corpus. If Project Gutenberg has 
any data which may indicate the most popular downloads over the past 4 
or 5 years, I'm sure that would be very useful.

But how do I find DP e-texts, and when I've found them how do I know 
they still meet the standard? A simple commitment from the Powers That 
Be at Project Gutenberg that e-texts produced by Distributed 
Proofreaders will not be altered without a change log or revision 
control would be a /major/ advance. But that is a degree of control 
which seems inconsistent with the stated PG preference for anarchy.

Now, while all this may sound like a criticism of Project Gutenberg, it 
is not. I am merely trying to illustrate the reality of the situation. 
Project Gutenberg and its directors can choose to be whatever they want. 
Apparently, what PG wants is to be a producer of mediocre PVA e-texts. 
There is nothing wrong with that decision, and I believe it fills a very 
real market niche. Project Gutenberg's support for other projects, with 
other standards (such as PGTEI) is admirable. I have never seen a 
comment from Mr. Hart or Mr. Newby, reiterating PG's stand on freedom 
from rules, that did not also include an invitation to start a new 
project with different standards, frequently with the explicit support 
of Project Gutenberg. The problem is not that PG refuses to accept 
proposed quality standards, it is that those proposing the changes 
refuse to accept the invitation.

So, to reiterate the point that started this thread: I don't care what 
Project Gutenberg chooses to be; I only hope that it does not hold 
itself out to be something it is not, and the current disclaimer is an 
important part in helping consumers understand what they are, and are 
not, getting when they download a Project Gutenberg e-text.

From jon at noring.name  Tue Nov  7 21:15:40 2006
From: jon at noring.name (Jon Noring)
Date: Tue Nov  7 21:14:03 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <45515526.4000708@novomail.net>
References: 
	
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net>
Message-ID: <127476658.20061107221540@noring.name>

Lee wrote:

> ... So even if the original e-text was produced consistent with DP's
> published standards I have no way of knowing whether it was altered
> in some way that would make it /inconsistent/ with those standards.

This is a good point which I had not thought of before. Is there an
agreement of some sort between PG and DP that DP-provided texts will
not be altered in the PG archive except with the cooperation and
consent of DP?

And in the bigger picture, if I submit a text to PG, will my
specific requirements regarding the alterability of the text in the
PG archive be respected? Are there any policies to dictate this, or
does that copy of the etext become the property of PG to do with as
it pleases (a form of no-strings-attached donation)?


> ... I am quite looking forward to the time when DP begins the task
> of re-scanning and re-creating the most popular public domain books 
> which make up the early part of the PG corpus. If Project Gutenberg has
> any data which may indicate the most popular downloads over the past 4
> or 5 years, I'm sure that would be very useful.

For this to happen, DP would have to commit some percentage of their
work flow to a re-make project of the most popular pre-DP works. I
hope Juliet will discuss the feasibility (as well as interest in the
ranks of the DP volunteers) of this possibility.


> Now, while all this may sound like a criticism of Project Gutenberg, it
> is not. I am merely trying to illustrate the reality of the situation.
> ... I have never seen a comment from Mr. Hart or Mr. Newby,
> reiterating PG's stand on freedom from rules, that did not also
> include an invitation to start a new project with different
> standards, frequently with the explicit support of Project
> Gutenberg. The problem is not that PG refuses to accept proposed
> quality standards, it is that those proposing the changes refuse to
> accept the invitation.

The obvious question is whether PG's enthusiastic encouragement to others
to start new public domain digitization projects is only for projects
that will submit their work product primarily to PG, or whether the
encouragement is truly ecumenical. I hope it is the latter...


> So, to reiterate the point that started this thread: I don't care what
> Project Gutenberg chooses to be; I only hope that it does not hold 
> itself out to be something it is not, and the current disclaimer is an
> important part in helping consumers understand what they are, and are 
> not, getting when they download a Project Gutenberg e-text.

Agreed.

This has been my main point -- to communicate the status of the PG
collection from various consumer perspectives, and this is not
restricted to just casual individual readers. Different perspectives
allow readers, as consumers, to make informed decisions as to how
and with what they will spend their time (time is just as valuable as
money).

The non-scientific poll I conducted a while back on TeBC suggested
that once readers are informed of the deficiencies in many of the PG
texts (which Mr. Passey has wonderfully summarized), a significant
number do express varying levels of concern -- and in my opinion
they should because they are spending something valuable, their time,
to read them.

One of these days I plan to rerun the poll at TeleRead with improved
wording, for which I will ask for feedback before the poll is started.
The goal is two-fold: 1) to bring the issue to the consciousness of
the ebook reading community, and 2) to gauge within a reasonable
level of certainty how that community views the issue.

Jon Noring


From sam.bretheim at gmail.com  Tue Nov  7 22:39:10 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Tue Nov  7 22:39:55 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <45515526.4000708@novomail.net>
References: 		<455114CC.3060203@novomail.net>	<200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net>
Message-ID: <45517B8E.10203@gmail.com>

Lee Passey wrote:
> You are absolutely correct. But without some means of discrimination, 
> a broad brush is the only one I have. Going out to the Project 
> Gutenberg web site, I can't seem to find any way to perform a search 
> that would return /only/ those e-texts produced by Distributed 
> Proofreaders.

http://pgdp.net/c/list_etexts.php?x=g&sort=5

It would be easy to add this data as a flag in the PG catalog, but being 
produced by DP isn't an automatic rubber stamp on quality: just like PG 
as a whole, DP took some time to arrive at good standards and practices, 
and a number of the early (2-round) DP texts are not up to DP's present 
quality level.

> Picking an e-text at random, say http://www.gutenberg.org/etext/19725, 
> I see nothing that would indicate that it was produced by Distributed 
> Proofreaders. Looking at the text itself, I see a sentence (in a 
> block) that states "Produced by Ted Garvin, Taavi Kalju and the Online 
> Distributed Proofreading Team at http://www.pgdp.net," so as a human I 
> could probably safely conclude that it came from DP, but it doesn't 
> look like it's in a format where I could come up with an automated way 
> to, say, download all of the PG corpus and build an index of only the 
> DP texts.

I have a script that fuses the PG standard RDF catalog, the legacy 
GUTINDEX file, the PG archive file listing, and the DP done list to 
produce a number of metadata products and reports, and it has a 
command-line flag to include only texts that did / didn't come from DP.  
I'm hesitant to just bundle the script into a front-end program because 
it downloads tens of megabytes of data each time it updates its catalog, 
and I've got far too much going on at the moment to work on it, but I'll 
try to set up a site to host it and its output sometime this year.
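
A minimal sketch of the kind of include/exclude flag described above, in
Python. Everything in it is an assumption made for illustration and not the
actual script: the `--source` option name, the `filter_catalog` helper, and
the data shapes (a dict of e-text number to title, plus a set of e-text
numbers taken from the DP done list).

```python
import argparse


def filter_catalog(catalog, dp_ids, source="all"):
    """Return the catalog entries selected by `source`.

    catalog: dict mapping e-text number -> title (hypothetical shape)
    dp_ids:  set of e-text numbers that appear on the DP done list
    source:  "all", "dp" (only DP texts) or "non-dp" (everything else)
    """
    if source == "dp":
        return {n: t for n, t in catalog.items() if n in dp_ids}
    if source == "non-dp":
        return {n: t for n, t in catalog.items() if n not in dp_ids}
    return dict(catalog)


def parse_args(argv=None):
    """Parse the (hypothetical) command line; argv=None means sys.argv."""
    parser = argparse.ArgumentParser(
        description="List catalog entries, optionally by producer.")
    parser.add_argument("--source", choices=["all", "dp", "non-dp"],
                        default="all",
                        help="which texts to include in the report")
    return parser.parse_args(argv)
```

With placeholder data, `filter_catalog({1: "a", 2: "b"}, {2}, "dp")` keeps
only entry 2, and `"non-dp"` keeps only entry 1.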

> So even if the original e-text was produced consistent with DP's 
> published standards I have no way of knowing whether it was altered in 
> some way that would make it /inconsistent/ with those standards.

I don't think you need to worry about the PG Errata team making work for 
themselves by arbitrarily changing things in DP texts.  Since they 
aren't leaping to defend themselves, I'll speak up: they do check 
suggested errata against the source pages, and when it comes to DP texts 
I believe they rely on DP's archived page scans for reference.
From Bowerbird at aol.com  Wed Nov  8 04:01:35 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 04:01:43 2006
Subject: [gutvol-d] let's close this up
Message-ID: 

lee said:
>    But without some means of discrimination, 
>    a broad brush is the only one I have. 

well then be a decent human being and
find "some means of discrimination", lee!

but don't just pop off, making cheap shots...


>    Going out to the Project Gutenberg web site, 
>    I can't seem to find any way to perform 
>    a search that would return /only/ those 
>    e-texts produced by Distributed Proofreaders.

all you needed to do was ask.

i can make you a list of such texts, if you want.
and you can spread it far and wide, if you want.

or you can do it yourself; just download the first 10k
of every e-text and search for the word "distributed".
if you find it, then that text was digitized by d.p.
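
(a minimal sketch of that scan, in python.   it assumes the e-texts have
already been downloaded as .txt files under some local directory; the
function names and the directory layout are hypothetical, and only the
"first 10k" and the word "distributed" come from the suggestion above.)

```python
from pathlib import Path

HEAD_BYTES = 10_000        # "the first 10k" of each e-text
MARKER = b"distributed"    # the word to look for, compared in lowercase


def mentions_dp(head: bytes) -> bool:
    """True if the first HEAD_BYTES bytes contain the marker (any case)."""
    return MARKER in head[:HEAD_BYTES].lower()


def dp_texts(root: str) -> list[str]:
    """Names of the .txt files under `root` whose head mentions the marker."""
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open("rb") as f:
            # read only the head of the file, never the whole text
            if mentions_dp(f.read(HEAD_BYTES)):
                hits.append(path.name)
    return hits
```

(a match only shows that the word appears in the head of the file, so a
longer marker such as b"distributed proofread" may be a safer choice.)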

and i'd guess that the d.p. people could help you too.



>    Picking an e-text at random, 
>    say http://www.gutenberg.org/etext/19725, 
>    I see nothing that would indicate that it was 
>    produced by Distributed Proofreaders. 
>    Looking at the text itself, I see a sentence (in a block) 
>    that states "Produced by Ted Garvin, Taavi Kalju and the 
>    Online Distributed Proofreading Team at http://www.pgdp.net,"

bingo.   "distributed".


>    so as a human I could probably safely conclude that it came from DP, 
>    but it doesn't look like it's in a format where I could come up with 
>    an automated way to, say, download all of the PG corpus and 
>    build an index of only the DP texts.

it's a simple scraping task, lee.   really.

your computer can do it a lot faster than you.


>    even if the original e-text was produced consistent 
>    with DP's published standards I have no way of knowing 
>    whether it was altered in some way that would make it 
>    /inconsistent/ with those standards.

it's seemed to me that the postprocessors and project managers
get pretty attached to their books, and would notice if something
like that were to start happening, and complain about it _loudly_,
so the absence of such loud complaints would lead me to believe
that something like that is not happening.   and -- in general --
i think project gutenberg leans too far in the _opposite_ direction;
namely, they are too willing to let the volunteers do it _however_
the volunteers want, even when maybe it's not really the best way.


>    There is no doubt in /my/ mind that the DP texts are the best of the lot.

well then maybe you need to get out a little more, lee.

i can give you more if you want, but for starters, read this:
>    http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-09-30,3

jose menendez compared a digitization he did with the one
for the same book that was done by distributed proofreaders.

he found that his version had _zero_ errors in it, while the d.p. one
had _50_ errors in it.   and that was counting an entire missing page
as _1_ error, even though a good many people seemed to believe that
_every_missing_word_ from that page should be counted as an error.
(i thought that was kind of ridiculous.   to me, it was simply one error.
sure, it was a _major_ error, you have to agree.   but still, just 1 error.)

no matter how you count, though, the comparison was still _stark_.
jose -- all by himself -- had outperformed d.p. in stunning fashion.

not only that, but his version was finished long before the d.p. one,
even though they both started when google released the scan-set...

more astonishingly, as i noted a week later in a follow-up post:
>    http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-10-05,4
the missing page had been fixed, but not any of the other 49 errors!

they had a perfect text to compare against, and they _still_ blew it!

as i put it in that follow-up, "the lights are on, but nobody's home..."

nor is this an anomaly.   there are _lots_ of p.g. e-texts that were 
produced by one person that are nice models of high accuracy...

and conversely, there was an estimate a while back, from jim tinsley --
who, as the main whitewasher at the time was the most likely to know
-- that the average d.p.-posted e-text probably had about 50 errors.
(notice how closely that number dovetails with what jose had found.)

jim considered this to be roughly equivalent to the non-d.p. e-texts.

in other words, in his opinion, there was little difference between the
quality of the non-d.p. e-texts and the d.p. ones.   this might not be
too surprising, once you realize that he probably considered it to be
the job of the whitewashers to take the error-rate down to "about 50".

(whether it was more work to attain this rate for the d.p. texts versus
the non-d.p. texts, jim wasn't clear.   i would assume that, because of
the longer-term relationship with the d.p. folks, it would've become
easier to work with them, attain more efficient communication, etc.)


>    my observation is that the DP standards
>    are consistently evolving for the better. 

well, their quality certainly is.   jim's post estimating 50 errors/book
was a wake-up call to d.p. about their quality, and they put into place
a number of changes meant to improve their error-rate, and it worked.

basically, it involved putting additional eyeballs on each book.
whether the benefit of fewer errors offsets the increased cost
of more human time and energy is an unanswered question, but
the fact is the error-rate _is_ at least somewhat better these days.

before then, however, there were rather embarrassing shortcomings;
there would be errors in posted texts that wouldn't pass a spellcheck!

(that's right.   one of the changes was the requirement of a spellcheck
by the postprocessor.   up until then, spellcheck had been "optional".
of such stuff are d.p. standards "consistently evolving for the better".)


>    I am quite looking forward to the time when DP begins the task of 
>    re-scanning and re-creating the most popular public domain books 
>    which make up the early part of the PG corpus.

i've been waiting for that since p.g. hit the 10,000 e-text mark.
i thought once michael's long-sought goal had been realized,
there would be a reshifting of attention to quality-control ideals.

instead, there was just a continuing rush to hit the 20,000 mark.

once again, i'm hoping when _that_ mark is hit, there will be a shift.

but this time, i'm not holding my breath...


>    A simple commitment from the Powers That Be at Project Gutenberg 
>    that e-texts produced by Distributed Proofreaders will not be altered 
>    without a change log or revision control would be a /major/ advance. 

i long ago said changelogs are an essential aspect of any e-library.
again, nada.   so my advice to you is to not hold your breath either...

but i think your underlying concern here -- that the e-texts are _fine_
when d.p. turns them over, and that then the project gutenberg side
is gonna turn all nefarious -- is disingenuous and unfair all around...

the general tone, especially from noring, is that p.g. acts in bad faith.
and frankly, that's offensive to me.   everyone here is a volunteer, and
they seem to have very big hearts, and even though i see ineptitude,
i do believe that _everyone_ from p.g. or d.p. is acting in good faith...
(i'm not sure they're all using their heads, but that's a different issue.)


>    Apparently, what PG wants is to be a producer of mediocre PVA e-texts. 

for those of you who aren't hip to lee-speak and his abbreviations,
please let me inform you that "p.v.a." stands for "plain vanilla ascii".

given the way most people think about project gutenberg e-texts,
"plain vanilla ascii" seems like a fair description of the uni-sized,
unstyled files, often viewed with a nonproportionally-spaced font.

however, if you've been following my "babelfish" messages lately,
where i'm using simple perl scripts to turn a plain-vanilla ascii file
into a nicely-formatted, high-quality set of .html pages, you know
that this blinkered way of thinking about "p.v.a." files is outdated...

if your "p.v.a." files are _mediocre_, it's because you're not letting 'em
live up to the full potential they can attain from zen markup language.


>    I have never seen a comment from Mr. Hart or Mr. Newby, 
>    reiterating PG's stand on freedom from rules, that did not also 
>    include an invitation to start a new project with different standards, 
>    frequently with the explicit support of Project Gutenberg. 
>    The problem is not that PG refuses to accept proposed quality standards, 
>    it is that those proposing the changes refuse to accept the invitation.

wow, that's the most even-handed thing you've said in this whole exchange.

jon, are you paying attention to lee?


>    So, to reiterate the point that started this thread: I don't care what 
>    Project Gutenberg chooses to be; I only hope that it does not hold 
>    itself out to be something it is not, and the current disclaimer is an 
>    important part in helping consumers understand what they are, and 
>    are not, getting when they download a Project Gutenberg e-text.

project gutenberg does _not_ "hold itself out to be something it is not".
and i don't think you need to worry that it ever will.   hart's got integrity.

and bruce albrecht's point (that the boilerplate does not seem to apply
to a good many of the e-texts that are being produced these days) is,
i would think, something that could be taken into new consideration.

even if the language stays as it is, though, i don't think it's misleading.
as has been pointed out, there is ample indication in the file itself that
points to the source-text.   people are smart enough to figure it out...

also, if there _is_ new consideration about this language, then i'd think
that you'll need to ponder the other side of the teeter-totter, because
if you're gonna say that an e-text is "a faithful rendition of" its p-book,
you might then incur upon yourself some kind of good-faith obligation
to mount the scans, so end-users can verify your statement themselves.
otherwise, you're still essentially saying "trust me", and furthermore you
have elevated the degree of "trust" to a much higher and different level.

(and i'll point out that d.p. has been terribly reluctant to mount scans,
perhaps because it will immediately make their errors _discoverable_.)

also, i think you would need to document, _meticulously_, the changes
that you have made from the original.   (and to a much greater degree,
and in a more formal way, than such changes are currently elucidated.
plus, your inadvertent errors would then become "violations of trust".)

and even with _all_that_, you're still gonna have to cover your ass with
"we're doing the best we can, as unpaid volunteers", which is _exactly_
what p.g. has said all along, so i'm not sure you've added much value...
it's not like you're gonna accept any fiduciary responsibility, are you?

so i don't know if d.p. really wants to jump in this stream quite yet...
it might carry them to some places that they'd really rather not go...

***

in closing -- and just because this fact rarely seems to get mentioned --
let me remind people that the legal advice that project gutenberg got,
back in the early days, was that it should _strip_identifying_information_,
like the original publisher, date of publication, copyright statement, etc.
the legal situation of the public-domain wasn't always cut-and-dried.

it was only in december of 2003, during the 10,000th e-text celebration,
that this legal advice changed.   before then, none of us were eager to put
our ass on the line for legal damages that p.g. might have made itself liable
for, and i know _i_ wasn't, so i really don't think we can complain about it.
the legal situation of the public-domain _still_ ain't so cut-and-dried
that rich estates ain't willing to hire big lawyers to try to intimidate p.g.
how many of you would put your savings into a safety escrow for p.g.?

-bowerbird
From greg at durendal.org  Wed Nov  8 04:29:39 2006
From: greg at durendal.org (Greg Weeks)
Date: Wed Nov  8 05:00:07 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <127476658.20061107221540@noring.name>
References: 
	
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net> <127476658.20061107221540@noring.name>
Message-ID: 

On Tue, 7 Nov 2006, Jon Noring wrote:

> For this to happen, DP would have to commit some percentage of their
> work flow to a re-make project of the most popular pre-DP works. I
> hope Juliet will discuss the feasibility (as well as interest in the
> ranks of the DP volunteers) of this possibility.

Not true. All you need is somebody willing to volunteer to CP and PM the 
content. No commitment is needed by anyone else.

-- 
Greg Weeks
http://durendal.org:8080/greg/

From joshua at hutchinson.net  Wed Nov  8 05:46:30 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 06:05:45 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <33431707.1162993590632.JavaMail.?@fh1037.dia.cp.net>

Hell, someone scan in the "classics" they want redone and I'll start 
running them through DP myself.  I'll even mark 'em up in TEI when it 
is all done so we have a consistent markup and a built-in changelog 
system for 'em.

Josh

>----Original Message----
>From: greg@durendal.org
>Date: Nov 8, 2006 7:29 
>To: "Jon Noring", "Project Gutenberg Volunteer Discussion"
>Subj: Re: [gutvol-d] The interesting comment by Lee and what the PG collection will become known for...
>
>On Tue, 7 Nov 2006, Jon Noring wrote:
>
>> For this to happen, DP would have to commit some percentage of their
>> work flow to a re-make project of the most popular pre-DP works. I
>> hope Juliet will discuss the feasibility (as well as interest in the
>> ranks of the DP volunteers) of this possibility.
>
>Not true. All you need is somebody willing to volunteer to CP and PM the 
>content. No commitment is needed by anyone else.
>
>-- 
>Greg Weeks
>http://durendal.org:8080/greg/


From bruce at zuhause.org  Wed Nov  8 06:06:46 2006
From: bruce at zuhause.org (Bruce Albrecht)
Date: Wed Nov  8 06:06:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <22911606.20061107185809@noring.name>
References: 
	<456909141.20061106142145@noring.name>
	
	<455114CC.3060203@novomail.net>
	<22911606.20061107185809@noring.name>
Message-ID: <17745.58486.373181.347740@celery.zuhause.org>

Jon Noring writes:
 > But again note that for the average consumer, who doesn't know what we
 > know, the DP stuff gets mixed in with all the other stuff, with a
 > common boilerplate added which doesn't exactly instill warm fuzzies
 > ("this text may be a composite...")

Which is why I brought it up in the first place.  If the submitter
attests that this statement is not true, then why include it in the
disclaimer?  I know that with very few exceptions, it is not true for
DP works in PG (and in those cases it's mentioned in the transcriber's
notes), which is why I'd like to see an option to not have that
particular disclaimer in *my* books.  My books, and nearly all of the
rest of the DP output include the original publisher and publication
or copyright information, and it has not been stripped by the
whitewashers.
From jon at noring.name  Wed Nov  8 06:56:39 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 06:55:03 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <17745.58486.373181.347740@celery.zuhause.org>
References: 
	<456909141.20061106142145@noring.name>
	
	<455114CC.3060203@novomail.net> <22911606.20061107185809@noring.name>
	<17745.58486.373181.347740@celery.zuhause.org>
Message-ID: <7310670187.20061108075639@noring.name>

Bruce wrote:
> Jon Noring writes:

>> But again note that for the average consumer, who doesn't know what we
>> know, the DP stuff gets mixed in with all the other stuff, with a
>> common boilerplate added which doesn't exactly instill warm fuzzies
>> ("this text may be a composite...")

> Which is why I brought it up in the first place.  If the submitter
> attests that this statement is not true, then why include it in the
> disclaimer?  I know that with very few exceptions, it is not true for
> DP works in PG (and in those cases it's mentioned in the transcriber's
> notes), which is why I'd like to see an option to not have that
> particular disclaimer in *my* books.  My books, and nearly all of the
> rest of the DP output include the original publisher and publication
> or copyright information, and it has not been stripped by the
> whitewashers.

Good point.

I am confused by your last sentence; does PG strip out any information
from the DP files before inclusion in the PG collection? This would
include metadata information.

Jon

From joshua at hutchinson.net  Wed Nov  8 07:08:24 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 07:08:30 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>



>----Original Message----
>From: jon@noring.name
>
>I am confused by your last sentence; does PG strip out any information
>from the DP files before inclusion in the PG collection? This would
>include metadata information.
>

Once upon a time, they used to.  Now, if a text comes through with 
that meta data, it stays.

Josh

PS The stripping hasn't happened for quite some time (ie, years).
From jon at noring.name  Wed Nov  8 07:36:24 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 07:34:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <33431707.1162993590632.JavaMail.?@fh1037.dia.cp.net>
References: <33431707.1162993590632.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <65889479.20061108083624@noring.name>

Josh wrote:

> Hell, someone scan in the "classics" they want redone and I'll start
> running them through DP myself.  I'll even mark 'em up in TEI when it 
> is all done so we have a consistent markup and a built-in changelog 
> system for 'em.

Well, one classic you could start with is "My Antonia" by Willa Cather.
A while back I purchased the 2nd printing of the 1918 First Edition,
chopped it, and scanned it at 600 dpi (with derivatives at lower
resolution.) See http://www.openreader.org/myantonia/

Independently, Jose Menendez (who had feedback from Bowerbird) and I
(with the help of the Cather project and Lori), produced high quality
text versions. I believe the error rate is zero based on back and forth
machine comparisons between Jose's and my versions (well mostly I
compared mine to his!) I have produced a version totally faithful to the
original 1918 edition, including marking up the errors in that original
book (error source: Jose and the Cather project) and included the line
breaks (at the exact point where the linebreak occurred, including at
the end-of-line-hyphen.)
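
A machine comparison of this kind can be sketched in a few lines of Python
with the standard difflib module. This is only an illustration under stated
assumptions, not the procedure Jose or I actually used: the function names
are hypothetical, the texts are compared word by word, and differences are
counted two ways (once per contiguous differing block, and once per word),
since the two counts can disagree sharply when a whole page is missing.

```python
import difflib


def differing_blocks(text_a: str, text_b: str):
    """Return (tag, words_a, words_b) for every stretch where the texts differ."""
    a, b = text_a.split(), text_b.split()
    # autojunk=False: do not let difflib discard frequent words on long texts
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def count_errors(text_a: str, text_b: str):
    """Count differences per block and per word; returns (by_block, by_word)."""
    blocks = differing_blocks(text_a, text_b)
    by_block = len(blocks)
    by_word = sum(max(len(wa), len(wb)) for _, wa, wb in blocks)
    return by_block, by_word
```

For example, `count_errors("a b c d e f g", "a b f g")` returns `(1, 3)`:
one missing stretch, three missing words.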

Now one may ask why should DP "redo" this book? Here are my reasons:

1) It is a classic!

2) Unless Jose or someone else resubmitted "My Antonia" recently to PG,
   the version that is there is one that desperately needs replacing.

3) The existence of a probably zero error, faithful transcription can
   serve as a useful benchmark to see how well the DP process does in
   finding OCR errors. Machine comparisons can be made.

4) The DP process will add trustworthiness to the final work product.



Jon Noring

From jon at noring.name  Wed Nov  8 07:46:25 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 07:44:46 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: 
References: 
	
	<455114CC.3060203@novomail.net> <200611071940.06539.donovan@abs.net>
	<45515526.4000708@novomail.net> <127476658.20061107221540@noring.name>
	
Message-ID: <1224947266.20061108084625@noring.name>

Greg wrote:
> Jon Noring wrote:

>> For this to happen, DP would have to commit some percentage of their
>> work flow to a re-make project of the most popular pre-DP works. I
>> hope Juliet will discuss the feasibility (as well as interest in the
>> ranks of the DP volunteers) of this possibility.

> Not true. All you need is somebody willing to volunteer to CP and PM the
> content. No commitment is needed by anyone else.

Well, I do understand this.

However, if the DP volunteer "leadership" publicly discusses this, and
then comes to a collective decision that everyone should consider
doing a 'classic' every once in a while, that would go a long way
towards redoing the classics.

In addition, the leadership should discuss, and if it makes any sense,
*suggest* certain procedures to go through with the classics, such as

1) Before picking an edition to transcribe, talk to one or two experts
   on the Work (including published bibliographies) to get their feedback
   as to the differences between editions and what they recommend should
   be digitized first. It might be their suggestion is to digitize two or
   more different editions due to various types of differences.

   Note that unlike many of the more obscure books DP now does, many of the
   classics (especially those which are translations) have been issued in
   multiple editions at different times from different publishers with
   different levels of editing. Thus it makes sense to digitize the
   edition(s) which are considered reasonably authoritative or
   recommended by the scholars and enthusiasts.

2) Do high quality scans of these books. Consider for mastering
   purposes 600 dpi full color of the text pages, 1200 dpi for any
   illustrations (which can be submitted to IA. Heck, burn them on DVD and
   I'll take a copy.) For OCR, of course, the master scans can be resampled
   as needed. (Note, if the book is chopped, I'll gladly take and
   store away the original pages so the book may be retrieved as
   needed, or if pages need rescanning.)

   Of course, make sure the scan sets (either the master or reasonable
   derivatives) are publicly available right away -- e.g., submit the page
   scans not only to IA but to PG the same time the finished text is.
   Btw, the master text itself should also reference the page scans so
   they may be correlated as has been done with my version of My
   Antonia at http://www.openreader.org/myantonia/

3) Yes, the classics should be mastered in PGTEI. And I do suggest, as
   Bowerbird suggests, that line break information be preserved. This
   is useful information to aid future error correction, plus the
   Internet Archive/OCA may find it useful for their proposed study on
   the quality of OCR engines (something Brewster has talked to DP and
   me about in the past.)
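
On the line-break point in item 3: if the master text keeps the printed
line breaks, a reflowed reading copy can still be derived from it
mechanically, while each master line stays comparable to the same line in
the page scan. A minimal Python sketch follows. The `reflow` name is
hypothetical, and the code assumes that every end-of-line hyphen is a soft
hyphen; a real tool would need a rule or a word list to keep genuine hyphens
such as the one in "well-known".

```python
def reflow(lines):
    """Join printed lines into one string, rejoining end-of-line hyphenations.

    Assumption: a trailing "-" always marks a word split across two lines.
    """
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue                    # skip blank lines
        if out.endswith("-"):
            out = out[:-1] + line       # drop the hyphen, glue the halves
        elif out:
            out += " " + line           # ordinary line break becomes a space
        else:
            out = line
    return out
```

For example, `reflow(["the pro-", "cess of scan-", "ning"])` gives
`"the process of scanning"`.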


Just a few thoughts.

Jon

From jon at noring.name  Wed Nov  8 07:48:47 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 07:47:09 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <1884299832.20061108084847@noring.name>

Joshua wrote:
> Jon asked:

>> I am confused by your last sentence; does PG strip out any
>> information from the DP files before inclusion in the PG
>> collection? This would include metadata information.

> Once upon a time, they used to.  Now, if a text comes through with 
> that meta data, it stays.

Did any metadata stripping happen with the first DP texts submitted to
PG? And if so, has that been corrected since?

Jon

From jon at noring.name  Wed Nov  8 08:32:13 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 08:30:38 2006
Subject: [gutvol-d] let's close this up
In-Reply-To: 
References: 
Message-ID: <817072195.20061108093213@noring.name>

Jon wrote:
> lee said:

>> But without some means of discrimination, a broad brush is the
>> only one I have.

>  well then be a decent human being and
>  find "some means of discrimination", lee!
>  
>  but don't just pop off, making cheap shots...

Btw, Bowerbird you forgot to add "in my opinion":

"but don't just pop off, making (in my opinion) cheap shots..."

I don't see what Lee said as a cheap shot, but a very well reasoned
argument. I think you know it was a very strong argument, so you
simply label it a "cheap shot". That is a very emotionally-laden
phrase meant not for rational discourse, but tarring and feathering.


>  and bruce albrecht's point (that the boilerplate does not seem to apply
>  to a good many of the e-texts that are being produced these days) is,
>  i would think, something that could be taken into new consideration.

It proves that it is valuable to discuss the integrity, authenticity,
trustworthiness, etc., etc. of the PG texts. It helps people see the
various issues, and suggest improvements.


>  (and i'll point out that d.p. has been terribly reluctant to mount scans,
>  perhaps because it will immediately make their errors _discoverable_.)

Good thing you added "perhaps" -- even so you are making a serious
charge that DP is intentionally hiding the scans for reasons of
embarrassment. Do you have any evidence, or are you just fishing? I
wonder how the DP folk will take your charge?

About the old scan sets, I too hope DP will soon release them. From
what I gather, it simply is a matter of someone's time to straighten
them out (and it will require a lot of time.) I do know one thing, it
is *their* intention to release them to the public. DP is simply a
volunteer project with essentially zero bucks that has submitted over
half of the texts to PG. They are now overwhelmed by their own
success.

Your seeming undercurrent of obsession against DP is disturbing. I
would think that Michael considers DP a rousing success. Even if the
transcription error rate of the DP texts is only the same as that of
texts submitted by individuals (on average), I would surmise that
Michael considers that good enough. DP has literally doubled PG's
collection in a very short time. They are definitely doing something
right.

Jon Noring



From sly at victoria.tc.ca  Wed Nov  8 08:56:49 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov  8 08:56:54 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
Message-ID: 


I'm not sure that I would use the word "metadata" here.
To me, that seems to imply something that is structured,
that can be extracted and/or processed by a computer.

In this case, we are just talking about transcription
of items, such as a publisher's name, which appear on
a title page.

Andrew

On Wed, 8 Nov 2006, joshua@hutchinson.net wrote:

> >From: jon@noring.name
> >
> >I am confused by your last sentence; does PG strip out any
> information
> >from the DP files before inclusion in the PG collection? This would
> >include metadata information.
> >
>
> Once upon a time, they used to.  Now, if a text comes through with
> that meta data, it stays.
>
> Josh
>
> PS The stripping hasn't happened for quite some time (ie, years).
From jon at noring.name  Wed Nov  8 09:12:03 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 09:10:31 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: 
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>
	
Message-ID: <803308369.20061108101203@noring.name>

Andrew wrote in reply to my use of the word metadata:

> I'm not sure that I would use the word "metadata" here.
> To me, that seems to imply something that is structured,
> that can be extracted and/or processed by a computer.

Understandable. In the broadest sense, metadata is simply information
(or data) about data, with no structuring implied.

What does PGTEI do with respect to metadata?


Jon Noring


From joshua at hutchinson.net  Wed Nov  8 09:15:49 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 09:16:11 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>



>----Original Message----
>From: jon@noring.name
>
>What does PGTEI do with respect to metadata?
>

Metadata is stored in the <teiHeader> section of the document.

See http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html#toc22 for an 
example from the guidelines.

Josh
From jon at noring.name  Wed Nov  8 09:27:55 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 09:26:17 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
References: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <895085912.20061108102755@noring.name>

Joshua wrote:
> Jon asked:

>> What does PGTEI do with respect to metadata?

> Metadata is stored in the <teiHeader> section of the document.
>
> See http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html#toc22 for an 
> example from the guidelines.

Oops, I should have qualified my original request with what metadata
is recorded. I'm very interested in the source metadata, which the
above metadata does not include (since it is a PGTEI remake of an
older pre-DP work.)

I would assume that the PGTEI metadata header will include source
metadata when that is known?

Jon

From marcello at perathoner.de  Wed Nov  8 09:36:41 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov  8 09:36:49 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <803308369.20061108101203@noring.name>
References: <4427813.1162998504969.JavaMail.?@fh1037.dia.cp.net>	
	<803308369.20061108101203@noring.name>
Message-ID: <455215A9.9050601@perathoner.de>

Jon Noring wrote:


> What does PGTEI do with respect to metadata?

Look here:

  http://www.tei-c.org/P4X/HD.html

and also here:

  http://www.tei-c.org/Lite/U5-header.html


You can store in the PGTEI header anything that is valid in TEI.

Not everything will be evaluated while generating the user formats but
nothing will be lost for future generations (and later implementations).


-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Nov  8 09:42:18 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 09:42:25 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
Message-ID: <15127149.1163007738524.JavaMail.?@fh1037.dia.cp.net>



>----Original Message----
>From: jon@noring.name
>
>I would assume that the PGTEI metadata header will include source
>metadata when that is known?
>

Here is an example from a fairly recent one I did:

 
  
  The Cathedral Church of Hereford
  A. Hugh Fisher
  
   London
   George Bell and Sons
   1898
  
  
 

Josh

PS If you want lots of wordy documentation, here is a link to the TEI-Lite
doc area:  http://www.tei-c.org/Lite/U5-header.html
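
The tags in the header example above did not survive in this archive; only
the values remain. As a rough sketch of what such a source description might
look like, the Python below rebuilds one from those values. The element names
(sourceDesc, bibl, imprint, pubPlace, publisher, date) come from the general
TEI header vocabulary and are an assumption here, not a copy of the markup
that was actually posted. The function name build_source_desc is made up for
illustration.

```python
import xml.etree.ElementTree as ET


def build_source_desc(title, author, place, publisher, year):
    """Build a TEI-style <sourceDesc> describing the printed source."""
    source_desc = ET.Element("sourceDesc")
    bibl = ET.SubElement(source_desc, "bibl")  # assumed wrapper element
    ET.SubElement(bibl, "title").text = title
    ET.SubElement(bibl, "author").text = author
    imprint = ET.SubElement(bibl, "imprint")  # assumed grouping element
    ET.SubElement(imprint, "pubPlace").text = place
    ET.SubElement(imprint, "publisher").text = publisher
    ET.SubElement(imprint, "date").text = year
    return source_desc


# Values taken from the example in the message above.
source = build_source_desc(
    "The Cathedral Church of Hereford",
    "A. Hugh Fisher",
    "London",
    "George Bell and Sons",
    "1898",
)
ET.indent(source, space=" ")  # requires Python 3.9 or later
source_xml = ET.tostring(source, encoding="unicode")
print(source_xml)
```

Building the fragment with an XML library rather than by string pasting at
least guarantees that what comes out is well-formed, whatever element names
the project finally settles on.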

From sly at victoria.tc.ca  Wed Nov  8 09:57:04 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov  8 09:57:10 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
References: <16192932.1163006149445.JavaMail.?@fh1037.dia.cp.net>
Message-ID: 


An interesting point to make is that TEI makes provision for
two separate sections which at first look similar (based on
content). One for a description of the source, and one for
a description of the current transcription.

Although there can easily be overlap between these two,
it still makes a lot of sense to keep them separate, because
there is information that is quite distinctly applicable
to each.

For example, some bits of information that may describe
the original book would be publisher, lccn, number of
pages, perhaps a title statement including a typo, etc.

And some bits of information specific to the PG text would
be release date, PG number, etc.

Andrew

On Wed, 8 Nov 2006, joshua@hutchinson.net wrote:

>
>
> >----Original Message----
> >From: jon@noring.name
> >
> >What does PGTEI do with respect to metadata?
> >
>
> Metadata is stored in the <teiHeader> section of the document.
>
> See http://pgtei.pglaf.org/marcello/0.4/doc/20000-h.html#toc22 for an
> example from the guidelines.
>
> Josh
From kth at srv.net  Wed Nov  8 09:36:01 2006
From: kth at srv.net (Kevin Handy)
Date: Wed Nov  8 10:06:47 2006
Subject: [gutvol-d] let's close this up
In-Reply-To: <817072195.20061108093213@noring.name>
References: 
	<817072195.20061108093213@noring.name>
Message-ID: <45521581.8080906@srv.net>

Jon Noring wrote:

>Good thing you added "perhaps" -- even so you are making a serious
>charge that DP is intentionally hiding the scans for reasons of
>embarrassment. Do you have any evidence, or are you just fishing? I
>wonder how the DP folk will take your charge?
>  
>
I believe one major problem is finding enough disk space
to hold the scans. The images for a single simple b&w book are
likely to be in the 10 MB range. This increases greatly if the
images are color or greyscale. 10,000(+) books will need serious
amounts of disk space, likely terabytes, and also enough
network bandwidth to serve this amount of data.
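
To put rough numbers on the estimate above, here is a back-of-the-envelope
calculation in Python. The book count and the 10 MB per b&w book come from
the message; the 20x multiplier for greyscale or color scans is purely a
hypothetical figure chosen for illustration, and decimal units are assumed.

```python
def collection_size_gb(books: int, mb_per_book: float) -> float:
    """Total size in gigabytes, using decimal units (1 GB = 1000 MB)."""
    return books * mb_per_book / 1000


BOOKS = 10_000          # from the estimate above
BW_MB_PER_BOOK = 10     # from the estimate above
GREY_MULTIPLIER = 20    # hypothetical: greyscale/color taken as 20x larger

bw_gb = collection_size_gb(BOOKS, BW_MB_PER_BOOK)
grey_gb = collection_size_gb(BOOKS, BW_MB_PER_BOOK * GREY_MULTIPLIER)

print(f"b&w scans:       {bw_gb:,.0f} GB")
print(f"greyscale/color: {grey_gb / 1000:,.1f} TB")
```

Under these assumptions the b&w scans alone come to about 100 GB, and it is
the greyscale or color sets that push the total into terabyte territory.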

From lee at novomail.net  Wed Nov  8 10:08:29 2006
From: lee at novomail.net (Lee Passey)
Date: Wed Nov  8 10:06:58 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <454D8138.6040307@gmail.com>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com>
Message-ID: <45521D1D.10801@novomail.net>

Sam Bretheim wrote:

> joshua@hutchinson.net wrote:
> 
>> The gist is, the TEI files are not meant to be parsed by a web browser, 
>> so the fact that they DON'T display properly basically means 
>> everything is working according to design.
> 
> It's worth mentioning that modern Web browsers are quite capable of 
> displaying TEI reasonably well, though some work on the relevant TEI and 
> XSL stylesheets is necessary before they're ready to be widely used.

Indeed, it has been well known for some time that many major browsers 
are capable of displaying TEI reasonably well. See 
http://lists.pglaf.org/private.cgi/gutvol-d/2005-September/003135.html.

If they are constructed correctly, there is no technical reason why TEI 
files cannot be used directly in web browsers. Web browsers cannot 
display 100% of the richness captured by TEI, but the display would 
still be better than the simplified ASCII text version. And there is no 
reason why a single TEI file could not be created that would satisfy the 
needs of both direct rendering and transformations via an XSL script. It 
is not done simply because those involved in the PGTEI project have 
chosen not to do it.

> For instance, if the author had inserted the following near the 
> beginning of the document, it would have rendered quite tolerably in 
> recent versions of Firefox/Mozilla/Camino, Konqueror/Safari/OmniWeb, 
> Opera, and iCab.  (IE and Amaya have trouble with some of the code in 
> this CSS file; I'll try to figure out how to make them display TEI 
> properly.)

FWIW, I tested both your documents and my documents in IE7 and the 
result is basically identical to the behavior of IE6. Opening your 
stylesheet in Microsoft Visual Studio dotnet, with validation turned on 
indicated a number of errors of the type "'content' is not a known CSS 
property name." My suspicion is that Microsoft has simply not yet gotten 
beyond support for CSS1 ('content' is a new addition in CSS2), so your 
document may look good in IE if you can figure out how to style using 
only CSS1 selectors.

> <?xml-stylesheet type="text/css"
>  href="http://www.shinparam.org/Sam/Projects/TEI-CSS/prettynovel.css"?>

My own preference is for /two/ CSS declarations: one which refers to a 
generic CSS file which would make /all/ PGTEI books look acceptable (I 
have suggested PGTEI.CSS) and a second where an end user could place all 
his personal preferences as overrides (I suggest PGTEI-USER.CSS) (I 
/always/ include statements which will replace serifed fonts with 
sans-serifed fonts). If the second stylesheet does not exist, User 
Agents will simply ignore it and use only the first, generic stylesheet.
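
As a sketch of the two-declaration idea, the Python below inserts a pair of
xml-stylesheet processing instructions after the XML declaration of a
document. The file names PGTEI.CSS and PGTEI-USER.CSS are the ones suggested
above; the function name add_stylesheets and the tiny sample document are
made up for illustration, and the hrefs are assumed to contain no quote
characters.

```python
def add_stylesheets(xml_text: str, hrefs: list[str]) -> str:
    """Insert one xml-stylesheet instruction per href, in the given order."""
    instructions = "".join(
        f'<?xml-stylesheet type="text/css" href="{href}"?>\n' for href in hrefs
    )
    if xml_text.startswith("<?xml "):
        # Keep the XML declaration first; stylesheet instructions follow it.
        end = xml_text.index("?>") + 2
        declaration, rest = xml_text[:end], xml_text[end:].lstrip("\n")
        return declaration + "\n" + instructions + rest
    return instructions + xml_text


doc = '<?xml version="1.0"?>\n<TEI.2/>\n'
# Generic stylesheet first, user overrides second so they come later.
styled = add_stylesheets(doc, ["PGTEI.CSS", "PGTEI-USER.CSS"])
print(styled)
```

Listing the user file second is deliberate: the intent is that its rules
come later and so can override the generic ones.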

> Here are two books I'm in the midst of proofing and marking up, both of 
> which look fairly good when viewed with that CSS stylesheet:
> 
> http://shinparam.org/Sam/Projects/TEI-CSS/Bronte-Shirley-draft.xml
> 
> http://shinparam.org/Sam/Projects/TEI-CSS/Roberts-Bookbinding-draft.xml

These are really great! Hopefully they will become worthwhile additions 
to PG's TEI collection. When they do I hope you will insist that those 
constructs which make the documents viewable in a web browser must be 
retained.

-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Wed Nov  8 10:07:14 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 10:07:32 2006
Subject: [gutvol-d] re: let's close this up
Message-ID: <3fc.59fc8c9.328376d2@aol.com>

jon, i ain't gonna go 'round your merry-go-round again.
not here, anyway, and not now.   maybe elsewhere, later...

i do my homework, on d.p. and everything else i talk about,
so if anyone questions what i say, i'll be happy to back it up.
plus i know a cheap shot when i see it.   so do other people.

as for appreciating d.p., i appreciate them as much as you do.
and that's despite the fact that i grok their flaws better than you.
and i love them enough to actually _tell_ them about those flaws,
rather than slather them in a slobbery wad of adoring fangirl goo.

as for the d.p. scans, here's some help for you on your homework:
>    http://www.pgdp.org/ols/tools/
congratulations to donovan on whipping them into shape...
as soon as d.p. is ready to release them, i'm ready to scrape.

***

josh said:
>   The stripping hasn't happened for quite some time (ie, years).

3 years ago, next month, to be precise.

(y'all really _should_ know your history.)

as i said, the decision was made at the
december, 2003, celebration of 10,000.

jon was there, making his standard pitch,
so i'm not sure why he's so surprised now
to find that this "meta-data" was "stripped"
as a routine matter of policy until that time.
(oh, i get it.   his "shock" now is the faux kind.)

juliet and charlez were arguing for retention.
_i_ even said that -- in cases where an e-text
was generated on the basis of a single p-book
-- that the meta-data could/should be kept...

but, you know, none of _us_ are _lawyers_, eh?

so hey, how much would _you_ listen to us, eh?

it was only the presence of new legal advice
that persuaded michael and greg to change the
longstanding policy of dropping identifying data.

and it's probably only proper to recognize too
that it was _only_ because brewster kahle was
_seeking_ a legal case to set a good precedent
that made steve harris, p.g.'s pro bono lawyer
-- and working for kahle as well at the time --
believe he could give such advice _and_ make it
stick in court, with brewster's money behind him.

michael or greg should feel free to correct me if
i'm wrong about this, but it was clear at the time.
brewster was unafraid of an unambiguous case...

so you've got brewster's backing to thank for this.

-bowerbird
From Bowerbird at aol.com  Wed Nov  8 10:16:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 10:17:59 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <4be.a867516.32837919@aol.com>

lee said:
>    Web browsers cannot display 
>    100% of the richness captured by TEI, 
>    but the display would still be 
>    better than the simplified ASCII text version.

bigger costs to code the document into .tei, with
no benefits greater than those delivered by .zml.

let's stop with the ascii strawman.

-bowerbird
From jon at noring.name  Wed Nov  8 10:37:28 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 10:35:49 2006
Subject: [gutvol-d] Kahle versus Gonzales on orphan works soon to be heard
Message-ID: <1011748809.20061108113728@noring.name>

Nov. 13 to be specific. For more info, see:

http://www.archive.org/iathreads/post-view.php?id=76756

From j.boelaert at skynet.be  Wed Nov  8 10:40:56 2006
From: j.boelaert at skynet.be (Johan Boelaert)
Date: Wed Nov  8 11:04:05 2006
Subject: [gutvol-d] Project Gutenberg
Message-ID: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be>

Hello

Every now and then I visit Project Gutenberg Europe, hoping something has
changed to improve it. But today also, it hasn't. 

On the homepage (http://pge.rastko.net/), we read: "Project Gutenberg Europe
is follower of Project Gutenberg philosophy, focused primarily on digitizing
European cultures, under European copyright laws."  If this were true, it
would be an excellent opportunity to publish European books online, which
can't be published in Gutenberg U.S., because of the much more severe
copyright laws over there.

But alas, on the page "Copyright HOWTO",
(http://pge.rastko.net/howto/copyright-howto) we read: "Project Gutenberg's
copyright rules are for Project Gutenberg eBooks, and apply only to works we
release on our main servers. Project Gutenberg is US-based, follows US
rules, and the only servers on which the collection is directly maintained
are in the US." If you try to ask for a clearance, you are redirected to
Gutenberg US.

Is there any possibility to improve this site?

 

Greetings from Belgium, Europe (copyright: life +70)

Johan

 

From Bowerbird at aol.com  Wed Nov  8 12:11:07 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 12:11:23 2006
Subject: [gutvol-d] gvd061108 -- one preliminary thought on duguid
Message-ID: 

the article by paul duguid that was published
in the influential "first monday" series recently
was quite insightful on a number of dimensions.

i've written up a good many observations on it,
but i think i will begin with a very simple one...

duguid analyzes the proposition that open-source
_software_development_ might not necessarily apply
-- philosophically -- to other _distributed_ processes.

he specifically includes project gutenberg in his paper,
as well as gracenote (which collects musical meta-data),
and the big kahuna of distributed work today, wikipedia.

at the most basic of levels, duguid argues that software 
has a fundamental check of its worthiness, in the form
of whether the compiler will actually _execute_ the code.

errors that would cause the compiler to abort the code
are quickly identified on that basis, without any analysis.

as duguid puts it:
>    solutions must compile and run. 
>    Hence, while Open Source software has relied heavily 
>    on peer production and to a lesser extent on peer review, 
>    for quality, it relies as heavily though perhaps less obviously, 
>    on the chip and the compiler as ultimate arbiters. 
>    These two both identify problems with the code 
>    and reject inadequate solutions

duguid surmises that other types of distributed projects
-- especially the three that are under his microscope --
do not have this immediate and telling check for errors...

in a letter she sent to duguid responding to his article,
juliet from distributed proofreaders made this claim:
>    By successive revision, in comparing text to image 
>    3-5 times for each page, and then by applying 
>    standard tools, checks, etc (quite parallel, in a way, 
>    to compiling code) DP achieves transcription accuracy

you really should read the entirety of juliet's letter.   it's at:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=22839&start=0

but for right now, i want to focus in on her statement that d.p.'s
"standard checks" are "quite parallel, in a way, to compiling code".

in my experience, both as a person who executes source code
and as a person who has examined the d.p. output very closely,
this statement strains credulity, and even borders on ridiculous.

and i will give an example right here to back up that assessment.

i'm using the e-text entitled "the hawaiian romance of laieikawai":
>    http://www.gutenberg.org/etext/13603

as you can see, this e-text is #13603, which means it is in the
"later half" of the p.g. library.   the files are dated 2004/10/5.
this e-text was included in the bundle prepared to celebrate
the 5,000th e-text created by d.p., so it's in the "later half" of
all e-texts digitized by d.p. as well.   (their total now is 9,320.)

this file predates the switch from a 2-round to a 4-round system,
but i'm not going to focus on _proofing_ errors right now, merely
on mistakes that dispute juliet's "similar to compiling code" claim.

i've appended to this post the first line of each footnote, in order.

there are all kinds of comments i could make (and will make later)
about the footnote numbering, but again, remember the focus here.

to instantiate the kind of checks that are similar to "chip and compiler",
these numbers should be prepended or appended with another string, 
so each separate one becomes _uniquely_identified_ within the book.
i like to use page-number, or the chapter-number or chapter-name,
but even a consecutive bookwide series of numbers will do fine, like:

>    [Footnote 1-01: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2-02: Daggett calls the story "a supernatural folklore legend of
>    [Footnote 3-04: The changes introduced by these editors have not been
>    [Footnote 4-05: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5-06: The most important of these chants translated from the
>    [Footnote 1-07: Bastian In Samoanische Schöpfungssage (p. 8) says:
>    [Footnote 2-08: Lesson says of the Polynesian groups (I, 378): "On sait ...
>    [Footnote 3-09: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,
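
A relabeling pass like the one described above could be sketched in Python
roughly as follows. This is only an illustration: it assumes every footnote
opens with the literal text "[Footnote N:" and uses a plain bookwide counter
as the added string. The function name relabel_footnotes is hypothetical.

```python
import re

FOOTNOTE = re.compile(r"\[Footnote (\d+):")


def relabel_footnotes(text: str) -> str:
    """Append a bookwide serial number to every numeric footnote label."""
    serial = 0

    def add_serial(match: re.Match) -> str:
        nonlocal serial
        serial += 1
        # Keep the printed number, then add a two-digit running serial.
        return f"[Footnote {match.group(1)}-{serial:02d}:"

    return FOOTNOTE.sub(add_serial, text)


sample = (
    "[Footnote 1: Compare the Fijian story quoted by Thomson (p. 6).]\n"
    "[Footnote 2: Daggett calls the story \"a supernatural folklore legend of\n"
    "[Footnote 1: Bastian In Samoanische Schöpfungssage (p. 8) says:\n"
)
relabeled = relabel_footnotes(sample)
print(relabeled)
```

After this pass the two footnotes that were both printed as "1" carry
different labels, so an anchor can point at exactly one of them.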

but the problems go even deeper than a failure to relabel the footnotes
to make them unambiguous.   they cut to the fundamental cornerstone of 
_actually_having_the_correct_numbers_in_the_e-text_in_the_first_place_.

look at the lines marked with "xxxxxx" and you'll see that:
a)   footnote 24 is incorrectly labeled as footnote 21.
b)   footnote 26 is incorrectly labeled as footnote 28.
c)   footnote 76 is incorrectly labeled as footnote 78.
d)   footnote 2 (toward the bottom) is incorrectly labeled as footnote 3.

these mistakes were located with a basic consistency-checking program.
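
For the curious, a check of this kind might look something like the Python
sketch below. It is not the program that was actually used. It assumes each
footnote opens with "[Footnote N:", that numbering may legitimately restart
at 1 in a new section, and that after a bad label the expected number was the
one intended. The function name check_footnote_sequence is hypothetical.

```python
import re

FOOTNOTE = re.compile(r"\[Footnote (\d+):")


def check_footnote_sequence(lines):
    """Return (line_number, found, expected) for each out-of-sequence label."""
    problems = []
    previous = None
    for line_number, line in enumerate(lines, start=1):
        match = FOOTNOTE.search(line)
        if match is None:
            continue
        found = int(match.group(1))
        # Accept the first label, a restart at 1, or the next number in line.
        if previous is None or found == 1 or found == previous + 1:
            previous = found
        else:
            expected = previous + 1
            problems.append((line_number, found, expected))
            previous = expected  # assume the expected number was intended
    return problems


lines = [
    "[Footnote 22: In the song the girl is likened to the lovely _lehua_,",
    "[Footnote 23: No other intoxicating liquor save _awa_ was known to the",
    "[Footnote 21: In the Hawaiian form of checkers, called _konane_, the",
    "[Footnote 25: The _malo_ is a loin cloth 3 or 4 yards long and a foot",
    "[Footnote 28: In Hawaiian warfare, the biggest boaster was the best man,",
    "[Footnote 27: The idiomatic passages \"_aohe puko momona o Kohala_,\"",
    "[Footnote 28: This boast of downing an antagonist with a single blow is",
]
problems = check_footnote_sequence(lines)
print(problems)
```

On these seven lines the sketch reports the label 21 where 24 was expected
and the label 28 where 26 was expected, and nothing else.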

if errors this _blatant_ -- which could've been identified as easily by d.p.
as they were by me, simply by using a program as elementary as mine --
were allowed to pass through the quality-control checks, what confidence
can we have that there are not _many_ more, and far more subtle in nature?

nor were these the only errors, even in the easy task of labeling footnotes.

take a look at these lines, again in order, this time from
the _body_ of the e-text:
>    became Kekalukaluokewa's, and he portioned out the land[61] and set up
>    Then quickly he went to consult his sister, to Malio.[62]
>    they ask you what long waves you surf on say on the _Huia_.[63] If they
>    over to the coast where Kumukahi[64] swims in the billows, then this is
>    xxxxxx          get the foster child of Kapukaihaoa, Laielohelohe,[66] who is like
>    took his foster child's umbilical cord[66] and wore it about his neck.
>    snails[67] singing, then do you two meet apart from the assembly.
>    "The marriage of the chiefs! The marriage of the chiefs!"[68]
>    back and stopped at their sources, no water flowed into the sea.[69]
>    After this the seer took Laieikawai's skirt[70] and laid it down on the
>    Kaeloikamalama's neck.[71]
>    Tahiti."[72]
>    Moanalihaikawaokele and Laukieleula."[73]
>    xxxxxx          Then that bird[71] drooped its wings down and its body remained aloft,
>    month bad weather closes down,[75] when the storm clears, there I am
>    place, and they resemble evil spirits in their nature.[76]

as you will notice, the anchor for footnote #65 is incorrectly labeled as "66",
and the anchor for footnote #74 is incorrectly labeled as "71".   embarrassing,
that something as simple as a sequential number-set could contain such flaws.

again, there are _6_ errors in this e-text, just in the footnote numbering alone!

and that's a series of _sequential_ numbers, and thus easily checked by machine.
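
One way such a machine check on the body anchors might be written is
sketched below in Python. It simply collects every bracketed number, then
reports numbers that occur more than once and numbers missing from the run.
It assumes that anything of the form "[N]" in the body is a footnote anchor,
which will not hold for every book. The function name audit_anchors is
hypothetical.

```python
import re
from collections import Counter

ANCHOR = re.compile(r"\[(\d+)\]")


def audit_anchors(body: str):
    """Return (duplicated, missing) anchor numbers found in the body text."""
    numbers = [int(n) for n in ANCHOR.findall(body)]
    if not numbers:
        return [], []
    counts = Counter(numbers)
    duplicated = sorted(n for n, count in counts.items() if count > 1)
    full_run = set(range(min(numbers), max(numbers) + 1))
    missing = sorted(full_run - set(numbers))
    return duplicated, missing


body = """
became Kekalukaluokewa's, and he portioned out the land[61] and set up
Then quickly he went to consult his sister, to Malio.[62]
they ask you what long waves you surf on say on the _Huia_.[63] If they
over to the coast where Kumukahi[64] swims in the billows, then this is
get the foster child of Kapukaihaoa, Laielohelohe,[66] who is like
took his foster child's umbilical cord[66] and wore it about his neck.
snails[67] singing, then do you two meet apart from the assembly.
"The marriage of the chiefs! The marriage of the chiefs!"[68]
back and stopped at their sources, no water flowed into the sea.[69]
After this the seer took Laieikawai's skirt[70] and laid it down on the
Kaeloikamalama's neck.[71]
Tahiti."[72]
Moanalihaikawaokele and Laukieleula."[73]
Then that bird[71] drooped its wings down and its body remained aloft,
month bad weather closes down,[75] when the storm clears, there I am
place, and they resemble evil spirits in their nature.[76]
"""
duplicated, missing = audit_anchors(body)
print("duplicated:", duplicated)
print("missing:   ", missing)
```

On the lines quoted above this flags 66 and 71 as duplicated and 65 and 74
as missing, which matches the two mislabeled anchors.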

further, it's a check that _should_ have been in place many, many years ago, by
both distributed proofreaders _and_ project gutenberg proper.   yet this e-text
-- with these _obvious_ errors -- was posted two years ago, and never updated.

i will have more to say about duguid later.   but for now, one thing is certain...

the checks that are done are _clearly_very_far_ from being "chip/compiler" checks.

-bowerbird

>    xxxxxx          [Footnote A: The titles of chapters are added for
>    [Footnote 1: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2: Daggett calls the story "a supernatural folklore legend of
>    [Footnote 3: The changes introduced by these editors have not been
>    [Footnote 4: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5: The most important of these chants translated from the
>    [Footnote 1: Bastian In Samoanische Schöpfungssage (p. 8) says:
>    [Footnote 2: Lesson says of the Polynesian groups (I, 378): "On sait ...
>    [Footnote 3: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,
>    [Footnote 4: Lesson (II, 190) enumerates eleven small islands, covering
>    [Footnote 5: _Kahiki_, in Hawaiian chants, is the term used to designate
>    [Footnote 6: Lesson, II, 152.]
>    [Footnote 7: Ibid., 170.]
>    [Footnote 8: Ibid., 178.]
>    [Footnote 1: In the Polynesian picture of the universe the wall of
>    [Footnote 2: The Rarotongan world of spirits is an underworld. (See
>    [Footnote 3: White, I, chart; Gill, Myths and Songs, pp. 3, 4; Ellis,
>    [Footnote 4: Gill says of the Hervey Islanders (p. 17 of notes): "The
>    [Footnote 5: Bastian, Samoanische Schöpfungs-Sage; Ellis, I, 321; White,
>    [Footnote 6: Moerenhout translates (I, 419): "He was, _Taaroa_ (Kanaloa)
>    [Footnote 7: Moerenhout, I, 423: "_Taaroa_ slept with the woman called
>    [Footnote 8: Grey, pp. 38-45; Krämer, Samoa Inseln, pp. 395-400; Fison,
>    [Footnote 9: In Fornander's collection of origin chants the Hawaiian
>    [Footnote 1: Mariner, II, 103; Turner, Nineteen Tears in Polynesia, pp.
>    [Footnote 2: When a Polynesian invokes a god he prays to the spirit of
>    [Footnote 3: Bird-bodied gods of low grade in the theogony of the
>    [Footnote 4: With the stories quoted from Fornander may be compared such
>    [Footnote 1: Grey, pp. 1-15; White, I, 46; Baessler, Neue Südsee-Bilder,
>    [Footnote 2: Compare Krämer's Samoan story (in Samoa Inseln, p. 413) of
>    [Footnote 3: Krämer, Samoa Inseln, pp. 44, 115; Fison, pp. 16,
>    [Footnote 1: As such Paliuli occurs in other Hawaiian folk tales:
>    [Footnote 2: The gods Kane and Kanaloa, who live in the mountains of
>    [Footnote 3: Although the earthly paradise has the same location in both
>    [Footnote 4: First generation: Waka, Kihanuilulumoku,
>    [Footnote 1: J.A. Macculloch (in Childhood of Fiction, p. 2) says,
>    [Footnote 2: Moerenhout, II, 4, 265.]
>    [Footnote 3: Gracia (p. 47) says that the taboo consists in the
>    [Footnote 4: Compare Krämer, Samoa Inseln, p. 31; Stair, p. 75; Turner,
>    [Footnote 5: In certain groups inheritance descends on the mother's side
>    [Footnote 6: Krämer (p. 32 et seq.) tells us that in Samoa the daughter
>    [Footnote 7: Rivers, I, 374; Malo, p. 80.
>    [Footnote 8: Keaulumoku's description of a Hawaiian chief (Islander,
>    [Footnote 9: Stair, p. 220; Gracia, p. 59; Alexander, History, chap. IV;
>    [Footnote 10: Gracia, p. 46; Mariner, II, 87, 101, 125; Gill, Myths and
>    [Footnote 11: Malo, p. 69.]
>    [Footnote 12: Ellis (III, 36) describes the art of medicine in
>    [Footnote 1: Jarves says: "Songs and chants were common among all
>    [Footnote 2: Moerenhout, I, 411.]
>    [Footnote 3: Andrews, Islander, 1875, p. 35; Emerson, Unwritten
>    [Footnote 4: In Fornander's story of _Lonoikamakahiki_, the chief
>    [Footnote 5: Compare with Ellis, I, 286, and Williams and Calvert, I,
>    [Footnote 6: Gill, Myths and Songs, pp. 268 et seq.]
>    [Footnote 7: See Fornander's stories of _Lonoikamakahiki, Halemano_, and
>    [Footnote 1: In the Hawaiian Annual, 1890, Alexander translates some notes
>    [Footnote 2: Moerenhout (I, 501-507) says that the Areois society in
>    [Footnote 3: Emerson, Unwritten Literature, p. 24 (note).]
>    [Footnote 4: This is well illustrated in Fornander's story of
>    [Footnote 5: Thomson says that the Fijians differ from the Polynesians
>    [Footnote 1: Turner, Samoa, p. 220.]
>    [Footnote 2: Ibid.; Moerenhout, I, 407-410.]
>    [Footnote 3: Turner, Samoa, pp. 216-221; Williams and Calvert, I, p.
>    [Footnote 4: Williams and Calvert, I, 118.]
>    [Footnote 5: Moerenhout, II, 146.]
>    [Footnote 1: See Moerenhout, II, 210; Jarves, p. 34; Alexander in
>    [Footnote 2: Fison, p. 100.]
>    [Footnote 1: The following examples are taken from the Laieikawai, where
>    [Footnote 2: In the course of the story of _Laieikawai_ occur more than
>    [Footnote 1: _Kuakoa_, iv, No. 31, translated also in _Hawaiian Annual_,
>    [Footnote 1: Title pages.
>    [Footnote 1: For the translation of Haleole's foreword, which is in a
>    [Footnote 1: Haleole uses the foreign form for wife, _wahine mare_,
>    [Footnote 2: The chief's vow, _olelo paa_, or "fixed word," to slay all
>    [Footnote 3: The phrase _nalo no hoi na wahi huna_, which means literally
>    [Footnote 4: Prenatal infanticide, _omilomilo_, was practiced in various
>    [Footnote 5: The _manini_ (_Tenthis sandvicensis_, Street) is a
>    [Footnote 6: The month _Ikuwa_ is variously placed in the calendar year.
>    [Footnote 7: The adoption by their grandparents and hiding away of the
>    [Footnote 8: The _iako_ of a canoe are the two arched sticks which hold
>    [Footnote 9: The verb _hookuiia_ means literally "cause to be pierced"
>    [Footnote 10: Hawaiian challenge stories bring out a strongly felt
>    [Footnote 11: In his invocation the man recognizes the two classes of
>    [Footnote 12: With this judgment of beauty should be compared
>    [Footnote 13: The building of a _heiau_, or temple, was a common means
>    [Footnote 14: The nights of Kane and of Lono follow each other on the
>    [Footnote 15: By _kahoaka_ the Hawaiians designate "the spirit or soul
>    [Footnote 16: The feathers of the _oo_ bird (_Moho nobilis_), with which
>    [Footnote 17: The reference to the temple of Pahauna is one of a number
>    [Footnote 18: The whole treatment of the Kauakahialii episode suggests an
>    [Footnote 19: These are all wood birds, in which form Gill tells us (Myths
>    [Footnote 20: _Moaulanuiakea_ means literally "Great-broad-red-cock,"
>    [Footnote 21: Compare Gill's story of the first god, Watea, who dreams
>    [Footnote 22: In the song the girl is likened to the lovely _lehua_,
>    [Footnote 23: No other intoxicating liquor save _awa_ was known to the
>    xxxxxx          [Footnote 21: In the Hawaiian form of checkers, called _konane_, the
>    25              [Footnote 25: The _malo_ is a loin cloth 3 or 4 yards long and a foot
>    xxxxxx          [Footnote 28: In Hawaiian warfare, the biggest boaster was the best man,
>    27              [Footnote 27: The idiomatic passages "_aohe puko momona o Kohala_,"
>    [Footnote 28: This boast of downing an antagonist with a single blow is
>    [Footnote 29: Shaking hands was of foreign introduction and marks one of
>    [Footnote 30: Famous Hawaiian boxing teachers kept master strokes in
>    [Footnote 31: Few similes are used in the story. This figure of the
>    [Footnote 32: The Polynesians, like the ancient Hebrews, practiced
>    [Footnote 33: The gods invoked by Aiwohikupua are not translated with
>    [Footnote 34: The _laau palau_, literally "wood-that-cuts," which Wise
>    [Footnote 35: The Hawaiian cloak or _kihei_ is a large square, 2 yards
>    [Footnote 36: The meaning of the idiomatic boast _he lala kamahele no ka
>    [Footnote 37: The _puloulou_ is said to have been introduced by Paao
>    [Footnote 38: Long life was the Polynesian idea of divine blessing. Of
>    [Footnote 39: Chickens were a valuable part of a chief's wealth, since
>    [Footnote 40: Mr. Meheula suggested to me this translation of the
>    [Footnote 41: A peculiarly close family relation between brother and
>    [Footnote 42: For the translation of this dialogue I am indebted, to the
>    [Footnote 43: To express the interrelation between brothers and sisters
>    [Footnote 44: The line translated "Fed upon the fruit of sin" contains
>    [Footnote 45: This _ti_-leaf trumpet is constructed from the thin, dry,
>    [Footnote 46: In the story of _Kapuaokaoheloai_ we read that the
>    [Footnote 47: A strict taboo between man and woman forbade eating
>    [Footnote 48: The place of surf riding in Hawaiian song and story
>    [Footnote 49: _Honi_, to kiss, means to "touch" or "smell," and
>    [Footnote 50: The abrupt entrance of the great _moo_, as of its
>    [Footnote 51: The _ieie_ vine and the sweet-scented fern are, like the
>    [Footnote 52: The fight between two _kupua_, one in lizard form, the
>    [Footnote 53: The season for the bird catcher, _kanaka kia manu_, lay
>    [Footnote 54: For the cloud sign compare the story of Kualii's battles
>    [Footnote 55: Of Hawaiians at prayer Dibble says: "The people were in
>    [Footnote 56: The three mountain domes of Hawaii rise from 13,000 to
>    [Footnote 57: The games of _kilu_ and _ume_, which furnished the popular
>    [Footnote 58: In the story of Kauakahialii, his home at Pihanakalani is
>    [Footnote 59: The Hawaiian custom of group marriages between brothers or
>    [Footnote 60: The Hawaiian flute is believed to be of ancient origin. It
>    [Footnote 61: At the accession of a new chief in Hawaii the land is
>    [Footnote 62: The names of Malio and Halaaniani are still to be found in
>    [Footnote 63: The _huia_ is a specially high wave formed by the meeting
>    [Footnote 64: Kumukahi is a bold cape of black lava on the extreme
>    [Footnote 65: The name of Laieikawai occurs in no old chants with which
>    [Footnote 66: To preserve the umbilical cord in order to lengthen the
>    [Footnote 67: More than 470 species of land snails of a single genus,
>    [Footnote 68: This incident is unsatisfactorily treated. We never know
>    [Footnote 69: This episode of the storm is another inconsistency in the
>    [Footnote 70: The _pa-u_ is a woman's main garment, and consists of five
>    [Footnote 71: In mythical quest stories the hero or heroine seeks, by
>    [Footnote 72: According to the old Polynesian system of age groups, the
>    [Footnote 73: The name Laukieleula means "Red-kiele-leaf." The kiele,
>    [Footnote 74: The story of the slaying of Halulu in the legend of
>    [Footnote 75: The divine approach marked by thunder and lightning,
>    xxxxxx          [Footnote 75: Kaonohiokala, Mr. Emerson tells me, is the name of one of
>    [Footnote 1: Compare Westervelt's Gods and Ghosts, p. 66.]
>    [Footnote 1: The rock called Kaneaukai, "Man-floating-on-the-sea," on
>    [Footnote 1: See _Kamapuaa_, where the same feat is described.]
>    [Footnote 1: Compare the fishhook Pahuhu in _Nihoalaki_; the _leho_
>    [Footnote 1: Compare _Kalelealuaka_.]
>    [Footnote 1: This is not the Olopana of Hawaii.]
>    [Footnote 1: This is only a fragment of the very popular story of the
>    [Footnote 2: Rev. A.O. Forbes's version of this story is printed in
>    [Footnote 1: See Daggett's account, who places Moikeha's role in the
>    [Footnote 1: Kaulu meets the wizard Makalii in rat form and kills him by
>    xxxxxx          [Footnote 3: Daggett tells the story of _Hua_, priest of Maui.]
>    [Footnote 1: This story Fornander calls "the most famous in Hawaiian
>    [Footnote 1: One of the most popular heroes of the Puna, Kau, and Kona
>    [Footnote 1: Mr. Stokes found on the rocks at Kahaluu, near the _heiau_
>    [Footnote 1: This story is much amplified by Mrs. Nakuina in Thrum, p.
>    [Footnote 1: See Thrum, p. 43.]
>    [Footnote 1: Daggett tells this story.]
>    [Footnote 1: Gill tells this same story from the Hervey group. Myths and
From Bowerbird at aol.com  Wed Nov  8 12:17:59 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Wed Nov  8 12:18:08 2006
Subject: [gutvol-d] gvd061108 -- one preliminary thought on duguid
Message-ID: 

the article by paul duguid that was published
in the influential "first monday" series recently
was quite insightful on a number of dimensions.

i've written up a good many observations on it,
but i think i will begin with a very simple one...

duguid analyzes the proposition that the open-source
_software_development_ model might not necessarily apply
-- philosophically -- to other _distributed_ processes.

he specifically includes project gutenberg in his paper,
as well as gracenote (which collects musical meta-data),
and the big kahuna of distributed work today, wikipedia.

at the most basic of levels, duguid argues that software
has a fundamental check of its worthiness, in the form
of whether the code will actually compile and _execute_.

errors that would cause the compiler to reject the code
are quickly identified on that basis, without any analysis.

as duguid puts it:
>    solutions must compile and run. 
>    Hence, while Open Source software has relied heavily 
>    on peer production and to a lesser extent on peer review, 
>    for quality, it relies as heavily though perhaps less obviously, 
>    on the chip and the compiler as ultimate arbiters. 
>    These two both identify problems with the code 
>    and reject inadequate solutions

duguid surmises that other types of distributed projects
-- especially the three that are under his microscope --
do not have this immediate and telling check for errors...

in a letter she sent to duguid responding to his article,
juliet from distributed proofreaders made this claim:
>    By successive revision, in comparing text to image 
>    3-5 times for each page, and then by applying 
>    standard tools, checks, etc (quite parallel, in a way, 
>    to compiling code) DP achieves transcription accuracy

you really should read the entirety of juliet's letter.   it's at:
>    http://www.pgdp.net/phpBB2/viewtopic.php?t=22839&start=0

but for right now, i want to focus in on her statement that d.p.'s
"standard checks" are "quite parallel, in a way, to compiling code".

in my experience, both as a person who executes source code
and as a person who has examined the d.p. output very closely,
this statement strains credulity, and even borders on ridiculous.

and i will give an example right here to back up that assessment.

i'm using the e-text entitled "the hawaiian romance of laieikawai":
>    http://www.gutenberg.org/etext/13603

as you can see, this e-text is #13603, which means it is in the
"later half" of the p.g. library.   the files are dated 2004/10/5.
this e-text was included in the bundle prepared to celebrate
the 5,000th e-text created by d.p., so it's in the "later half" of
all e-texts digitized by d.p. as well.   (their total now is 9,320.)

this file predates the switch from a 2-round to a 4-round system,
but i'm not going to focus on _proofing_ errors right now, merely
on mistakes that dispute juliet's "similar to compiling code" claim.

i've appended to this post the first line of each footnote, in order.

there are all kinds of comments i could make (and will make later)
about the footnote numbering, but again, remember the focus here.

to instantiate the kind of checks that are similar to "chip and compiler",
these numbers should be prepended or appended with another string, 
so each separate one becomes _uniquely_identified_ within the book.
i like to use page-number, or the chapter-number or chapter-name,
but even a consecutive bookwide series of numbers will do fine, like:

>    [Footnote 1-01: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2-02: Daggett calls the story "a supernatural folklore legend of
>    [Footnote 3-03: The changes introduced by these editors have not been
>    [Footnote 4-04: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5-05: The most important of these chants translated from the
>    [Footnote 1-06: Bastian In Samoanische Schöpfungssage (p. 8) says:
>    [Footnote 2-07: Lesson says of the Polynesian groups (I, 378): "On sait ...
>    [Footnote 3-08: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,
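
relabeling of this kind is easy to automate. what follows is a minimal python sketch, assuming the footnotes are marked "[Footnote N:" as in this e-text; the function name and the two-digit bookwide serial are illustrative choices, not an existing tool.

```python
import re

# Matches the opening of a footnote as it appears in the e-text,
# e.g. "[Footnote 12:" or "[Footnote A:".
FOOTNOTE_RE = re.compile(r"\[Footnote (\w+):")

def relabel_footnotes(lines):
    """Append a bookwide serial number to every footnote label.

    "[Footnote 2:" becomes "[Footnote 2-07:" if it is the seventh
    footnote in the book, so each label is unique within the book.
    """
    serial = 0
    relabeled = []
    for line in lines:
        match = FOOTNOTE_RE.search(line)
        if match:
            serial += 1
            new_label = f"[Footnote {match.group(1)}-{serial:02d}:"
            line = line[:match.start()] + new_label + line[match.end():]
        relabeled.append(line)
    return relabeled

if __name__ == "__main__":
    demo = [
        "[Footnote 1: first note of chapter one",
        "[Footnote 2: second note of chapter one",
        "[Footnote 1: first note of chapter two",
    ]
    for line in relabel_footnotes(demo):
        print(line)
```

the same idea works with a page-number or chapter-name prefix instead of a serial; the only requirement is that no two labels in the book end up identical.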

but the problems go even deeper than a failure to relabel the footnotes
to make them unambiguous.   they cut to the fundamental cornerstone of 
_actually_having_the_correct_numbers_in_the_e-text_in_the_first_place_.

look at the lines marked with "xxxxxx" and you'll see that:
a)   footnote 24 is incorrectly labeled as footnote 21.
b)   footnote 26 is incorrectly labeled as footnote 28.
c)   footnote 76 is incorrectly labeled as footnote 75.
d)   footnote 2 (toward the bottom) is incorrectly labeled as footnote 3.

these mistakes were located with a basic consistency-checking program.
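
a check of this kind takes only a few lines. the following is a minimal python sketch, not the actual program used here; the function name, and the rule that a label of "1" is treated as a legitimate restart for a new section, are assumptions made for the example.

```python
import re

# Matches the opening of a footnote, e.g. "[Footnote 12:" or "[Footnote A:".
FOOTNOTE_RE = re.compile(r"\[Footnote (\w+):")

def check_footnote_sequence(lines):
    """Return (line_index, found_label, expected_number) for each suspect label.

    Footnotes are expected to count 1, 2, 3, ... and may restart at 1
    when a new section begins. Anything else is reported.
    """
    problems = []
    expected = 1
    for index, line in enumerate(lines):
        match = FOOTNOTE_RE.search(line)
        if match is None:
            continue  # not the first line of a footnote
        label = match.group(1)
        if label.isdigit() and int(label) == expected:
            expected += 1      # label is in sequence
        elif label == "1":
            expected = 2       # numbering restarts with a new section
        else:
            problems.append((index, label, expected))
            expected += 1      # assume the label is wrong, not the count
    return problems

if __name__ == "__main__":
    demo = [
        "[Footnote 1: ...",
        "[Footnote 2: ...",
        "[Footnote 4: ...",   # should have been 3
        "[Footnote 4: ...",
        "[Footnote 1: ...",   # new section
    ]
    for index, found, expected in check_footnote_sequence(demo):
        print(f"line {index}: found {found}, expected {expected}")
```

because the expected count keeps advancing after a mismatch, one bad label is reported once and the labels after it are not dragged down with it.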

if errors this _blatant_ -- which could've been identified as easily by d.p.
as they were by me, simply by using a program as elementary as mine --
were allowed to pass through the quality-control checks, what confidence
can we have that there are not _many_ more, and far more subtle in nature?

nor were these the only errors, even in the easy task of labeling footnotes.

take a look at these lines, again in order, this time from the _body_ of the e-text:
>    became Kekalukaluokewa's, and he portioned out the land[61] and set up
>    Then quickly he went to consult his sister, to Malio.[62]
>    they ask you what long waves you surf on say on the _Huia_.[63] If they
>    over to the coast where Kumukahi[64] swims in the billows, then this is
>    xxxxxx          get the foster child of Kapukaihaoa, Laielohelohe,[66] who is like
>    took his foster child's umbilical cord[66] and wore it about his neck.
>    snails[67] singing, then do you two meet apart from the assembly.
>    "The marriage of the chiefs! The marriage of the chiefs!"[68]
>    back and stopped at their sources, no water flowed into the sea.[69]
>    After this the seer took Laieikawai's skirt[70] and laid it down on the
>    Kaeloikamalama's neck.[71]
>    Tahiti."[72]
>    Moanalihaikawaokele and Laukieleula."[73]
>    xxxxxx          Then that bird[71] drooped its wings down and its body remained aloft,
>    month bad weather closes down,[75] when the storm clears, there I am
>    place, and they resemble evil spirits in their nature.[76]

as you will notice, the anchor for footnote #65 is incorrectly labeled as "66",
and the anchor for footnote #74 is incorrectly labeled as "71".   embarrassing,
that something as simple as a sequential number-set could contain such flaws.

again, there are _6_ errors in this e-text, just in the footnote numbering alone!

and that's a series of _sequential_ numbers, and thus easily checked by machine.
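
the anchors in the body can be checked the same way. here is a minimal python sketch, assuming anchors are written "[N]" and that the lines passed in belong to a single run of numbering; the function name and the "first" parameter are hypothetical, chosen for this example.

```python
import re

# Matches a footnote anchor in running text, e.g. "land[61]".
ANCHOR_RE = re.compile(r"\[(\d+)\]")

def check_anchor_sequence(body_lines, first=1):
    """Return (line_index, found, expected) for each out-of-sequence anchor.

    Anchors are expected to count upward by one, starting at `first`.
    """
    problems = []
    expected = first
    for index, line in enumerate(body_lines):
        for match in ANCHOR_RE.finditer(line):
            found = int(match.group(1))
            if found != expected:
                problems.append((index, found, expected))
            expected += 1  # advance regardless, so one bad anchor is reported once
    return problems

if __name__ == "__main__":
    excerpt = [
        "became Kekalukaluokewa's, and he portioned out the land[61] and set up",
        "Then quickly he went to consult his sister, to Malio.[62]",
        "they ask you what long waves you surf on say on the _Huia_.[63] If they",
        "over to the coast where Kumukahi[64] swims in the billows, then this is",
        "get the foster child of Kapukaihaoa, Laielohelohe,[66] who is like",
        "took his foster child's umbilical cord[66] and wore it about his neck.",
    ]
    for index, found, expected in check_anchor_sequence(excerpt, first=61):
        print(f"line {index}: found [{found}], expected [{expected}]")
```

a fuller check would also pair each anchor with its footnote, so that a label wrong in both places would still be caught.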

further, it's a check that _should_ have been in place many, many years ago, by
both distributed proofreaders _and_ project gutenberg proper.   yet this e-text
-- with these _obvious_ errors -- was posted two years ago, and never updated.

i will have more to say about duguid later.   but for now, one thing is certain...

the checks that are done are _clearly_very_far_ from being "chip/compiler" checks.

-bowerbird

>    xxxxxx          [Footnote A: The titles of chapters are added for
>    [Footnote 1: Compare the Fijian story quoted by Thomson (p. 6).]
>    [Footnote 2: Daggett calls the story "a supernatural folklore legend of
>    [Footnote 3: The changes introduced by these editors have not been
>    [Footnote 4: Dr. N. B. Emerson's rendering of the myth of _Pele and
>    [Footnote 5: The most important of these chants translated from the
>    [Footnote 1: Bastian In Samoanische Schöpfungssage (p. 8) says:
>    [Footnote 2: Lesson says of the Polynesian groups (I, 378): "On sait ...
>    [Footnote 3: Compare: Stair, Old Samoa, p. 271; White, I, 176; Fison,
>    [Footnote 4: Lesson (II, 190) enumerates eleven small islands, covering
>    [Footnote 5: _Kahiki_, in Hawaiian chants, is the term used to designate
>    [Footnote 6: Lesson, II, 152.]
>    [Footnote 7: Ibid., 170.]
>    [Footnote 8: Ibid., 178.]
>    [Footnote 1: In the Polynesian picture of the universe the wall of
>    [Footnote 2: The Rarotongan world of spirits is an underworld. (See
>    [Footnote 3: White, I, chart; Gill, Myths and Songs, pp. 3, 4; Ellis,
>    [Footnote 4: Gill says of the Hervey Islanders (p. 17 of notes): "The
>    [Footnote 5: Bastian, Samoanische Schöpfungs-Sage; Ellis, I, 321; White,
>    [Footnote 6: Moerenhout translates (I, 419): "He was, _Taaroa_ (Kanaloa)
>    [Footnote 7: Moerenhout, I, 423: "_Taaroa_ slept with the woman called
>    [Footnote 8: Grey, pp. 38-45; Krämer, Samoa Inseln, pp. 395-400; Fison,
>    [Footnote 9: In Fornander's collection of origin chants the Hawaiian
>    [Footnote 1: Mariner, II, 103; Turner, Nineteen Tears in Polynesia, pp.
>    [Footnote 2: When a Polynesian invokes a god he prays to the spirit of
>    [Footnote 3: Bird-bodied gods of low grade in the theogony of the
>    [Footnote 4: With the stories quoted from Fornander may be compared such
>    [Footnote 1: Grey, pp. 1-15; White, I, 46; Baessler, Neue Südsee-Bilder,
>    [Footnote 2: Compare Krämer's Samoan story (in Samoa Inseln, p. 413) of
>    [Footnote 3: Krämer, Samoa Inseln, pp. 44, 115; Fison, pp. 16,
>    [Footnote 1: As such Paliuli occurs in other Hawaiian folk tales:
>    [Footnote 2: The gods Kane and Kanaloa, who live in the mountains of
>    [Footnote 3: Although the earthly paradise has the same location in both
>    [Footnote 4: First generation: Waka, Kihanuilulumoku,
>    [Footnote 1: J.A. Macculloch (in Childhood of Fiction, p. 2) says,
>    [Footnote 2: Moerenhout, II, 4, 265.]
>    [Footnote 3: Gracia (p. 47) says that the taboo consists in the
>    [Footnote 4: Compare Krämer, Samoa Inseln, p. 31; Stair, p. 75; Turner,
>    [Footnote 5: In certain groups inheritance descends on the mother's side
>    [Footnote 6: Krämer (p. 32 et seq.) tells us that in Samoa the daughter
>    [Footnote 7: Rivers, I, 374; Malo, p. 80.
>    [Footnote 8: Keaulumoku's description of a Hawaiian chief (Islander,
>    [Footnote 9: Stair, p. 220; Gracia, p. 59; Alexander, History, chap. IV;
>    [Footnote 10: Gracia, p. 46; Mariner, II, 87, 101, 125; Gill, Myths and
>    [Footnote 11: Malo, p. 69.]
>    [Footnote 12: Ellis (III, 36) describes the art of medicine in
>    [Footnote 1: Jarves says: "Songs and chants were common among all
>    [Footnote 2: Moerenhout, I, 411.]
>    [Footnote 3: Andrews, Islander, 1875, p. 35; Emerson, Unwritten
>    [Footnote 4: In Fornander's story of _Lonoikamakahiki_, the chief
>    [Footnote 5: Compare with Ellis, I, 286, and Williams and Calvert, I,
>    [Footnote 6: Gill, Myths and Songs, pp. 268 et seq.]
>    [Footnote 7: See Fornander's stories of _Lonoikamakahiki, Halemano_, and
>    [Footnote 1: In the Hawaiian Annual, 1890, Alexander translates some notes
>    [Footnote 2: Moerenhout (I, 501-507) says that the Areois society in
>    [Footnote 3: Emerson, Unwritten Literature, p. 24 (note).]
>    [Footnote 4: This is well illustrated in Fornander's story of
>    [Footnote 5: Thomson says that the Fijians differ from the Polynesians
>    [Footnote 1: Turner, Samoa, p. 220.]
>    [Footnote 2: Ibid.; Moerenhout, I, 407-410.]
>    [Footnote 3: Turner, Samoa, pp. 216-221; Williams and Calvert, I, p.
>    [Footnote 4: Williams and Calvert, I, 118.]
>    [Footnote 5: Moerenhout, II, 146.]
>    [Footnote 1: See Moerenhout, II, 210; Jarves, p. 34; Alexander in
>    [Footnote 2: Fison, p. 100.]
>    [Footnote 1: The following examples are taken from the Laieikawai, where
>    [Footnote 2: In the course of the story of _Laieikawai_ occur more than
>    [Footnote 1: _Kuakoa_, iv, No. 31, translated also in _Hawaiian Annual_,
>    [Footnote 1: Title pages.
>    [Footnote 1: For the translation of Haleole's foreword, which is in a
>    [Footnote 1: Haleole uses the foreign form for wife, _wahine mare_,
>    [Footnote 2: The chief's vow, _olelo paa_, or "fixed word," to slay all
>    [Footnote 3: The phrase _nalo no hoi na wahi huna_, which means literally
>    [Footnote 4: Prenatal infanticide, _omilomilo_, was practiced in various
>    [Footnote 5: The _manini_ (_Tenthis sandvicensis_, Street) is a
>    [Footnote 6: The month _Ikuwa_ is variously placed in the calendar year.
>    [Footnote 7: The adoption by their grandparents and hiding away of the
>    [Footnote 8: The _iako_ of a canoe are the two arched sticks which hold
>    [Footnote 9: The verb _hookuiia_ means literally "cause to be pierced"
>    [Footnote 10: Hawaiian challenge stories bring out a strongly felt
>    [Footnote 11: In his invocation the man recognizes the two classes of
>    [Footnote 12: With this judgment of beauty should be compared
>    [Footnote 13: The building of a _heiau_, or temple, was a common means
>    [Footnote 14: The nights of Kane and of Lono follow each other on the
>    [Footnote 15: By _kahoaka_ the Hawaiians designate "the spirit or soul
>    [Footnote 16: The feathers of the _oo_ bird (_Moho nobilis_), with which
>    [Footnote 17: The reference to the temple of Pahauna is one of a number
>    [Footnote 18: The whole treatment of the Kauakahialii episode suggests an
>    [Footnote 19: These are all wood birds, in which form Gill tells us (Myths
>    [Footnote 20: _Moaulanuiakea_ means literally "Great-broad-red-cock,"
>    [Footnote 21: Compare Gill's story of the first god, Watea, who dreams
>    [Footnote 22: In the song the girl is likened to the lovely _lehua_,
>    [Footnote 23: No other intoxicating liquor save _awa_ was known to the
>    xxxxxx          [Footnote 21: In the Hawaiian form of checkers, called _konane_, the
>    25              [Footnote 25: The _malo_ is a loin cloth 3 or 4 yards long and a foot
>    xxxxxx          [Footnote 28: In Hawaiian warfare, the biggest boaster was the best man,
>    27              [Footnote 27: The idiomatic passages "_aohe puko momona o Kohala_,"
>    [Footnote 28: This boast of downing an antagonist with a single blow is
>    [Footnote 29: Shaking hands was of foreign introduction and marks one of
>    [Footnote 30: Famous Hawaiian boxing teachers kept master strokes in
>    [Footnote 31: Few similes are used in the story. This figure of the
>    [Footnote 32: The Polynesians, like the ancient Hebrews, practiced
>    [Footnote 33: The gods invoked by Aiwohikupua are not translated with
>    [Footnote 34: The _laau palau_, literally "wood-that-cuts," which Wise
>    [Footnote 35: The Hawaiian cloak or _kihei_ is a large square, 2 yards
>    [Footnote 36: The meaning of the idiomatic boast _he lala kamahele no ka
>    [Footnote 37: The _puloulou_ is said to have been introduced by Paao
>    [Footnote 38: Long life was the Polynesian idea of divine blessing. Of
>    [Footnote 39: Chickens were a valuable part of a chief's wealth, since
>    [Footnote 40: Mr. Meheula suggested to me this translation of the
>    [Footnote 41: A peculiarly close family relation between brother and
>    [Footnote 42: For the translation of this dialogue I am indebted, to the
>    [Footnote 43: To express the interrelation between brothers and sisters
>    [Footnote 44: The line translated "Fed upon the fruit of sin" contains
>    [Footnote 45: This _ti_-leaf trumpet is constructed from the thin, dry,
>    [Footnote 46: In the story of _Kapuaokaoheloai_ we read that the
>    [Footnote 47: A strict taboo between man and woman forbade eating
>    [Footnote 48: The place of surf riding in Hawaiian song and story
>    [Footnote 49: _Honi_, to kiss, means to "touch" or "smell," and
>    [Footnote 50: The abrupt entrance of the great _moo_, as of its
>    [Footnote 51: The _ieie_ vine and the sweet-scented fern are, like the
>    [Footnote 52: The fight between two _kupua_, one in lizard form, the
>    [Footnote 53: The season for the bird catcher, _kanaka kia manu_, lay
>    [Footnote 54: For the cloud sign compare the story of Kualii's battles
>    [Footnote 55: Of Hawaiians at prayer Dibble says: "The people were in
>    [Footnote 56: The three mountain domes of Hawaii rise from 13,000 to
>    [Footnote 57: The games of _kilu_ and _ume_, which furnished the popular
>    [Footnote 58: In the story of Kauakahialii, his home at Pihanakalani is
>    [Footnote 59: The Hawaiian custom of group marriages between brothers or
>    [Footnote 60: The Hawaiian flute is believed to be of ancient origin. It
>    [Footnote 61: At the accession of a new chief in Hawaii the land is
>    [Footnote 62: The names of Malio and Halaaniani are still to be found in
>    [Footnote 63: The _huia_ is a specially high wave formed by the meeting
>    [Footnote 64: Kumukahi is a bold cape of black lava on the extreme
>    [Footnote 65: The name of Laieikawai occurs in no old chants with which
>    [Footnote 66: To preserve the umbilical cord in order to lengthen the
>    [Footnote 67: More than 470 species of land snails of a single genus,
>    [Footnote 68: This incident is unsatisfactorily treated. We never know
>    [Footnote 69: This episode of the storm is another inconsistency in the
>    [Footnote 70: The _pa-u_ is a woman's main garment, and consists of five
>    [Footnote 71: In mythical quest stories the hero or heroine seeks, by
>    [Footnote 72: According to the old Polynesian system of age groups, the
>    [Footnote 73: The name Laukieleula means "Red-kiele-leaf." The kiele,
>    [Footnote 74: The story of the slaying of Halulu in the legend of
>    [Footnote 75: The divine approach marked by thunder and lightning,
>    xxxxxx          [Footnote 75: Kaonohiokala, Mr. Emerson tells me, is the name of one of
>    [Footnote 1: Compare Westervelt's Gods and Ghosts, p. 66.]
>    [Footnote 1: The rock called Kaneaukai, "Man-floating-on-the-sea," on
>    [Footnote 1: See _Kamapuaa_, where the same feat is described.]
>    [Footnote 1: Compare the fishhook Pahuhu in _Nihoalaki_; the _leho_
>    [Footnote 1: Compare _Kalelealuaka_.]
>    [Footnote 1: This is not the Olopana of Hawaii.]
>    [Footnote 1: This is only a fragment of the very popular story of the
>    [Footnote 2: Rev. A.O. Forbes's version of this story is printed in
>    [Footnote 1: See Daggett's account, who places Moikeha's role in the
>    [Footnote 1: Kaulu meets the wizard Makalii in rat form and kills him by
>    xxxxxx          [Footnote 3: Daggett tells the story of _Hua_, priest of Maui.]
>    [Footnote 1: This story Fornander calls "the most famous in Hawaiian
>    [Footnote 1: One of the most popular heroes of the Puna, Kau, and Kona
>    [Footnote 1: Mr. Stokes found on the rocks at Kahaluu, near the _heiau_
>    [Footnote 1: This story is much amplified by Mrs. Nakuina in Thrum, p.
>    [Footnote 1: See Thrum, p. 43.]
>    [Footnote 1: Daggett tells this story.]
>    [Footnote 1: Gill tells this same story from the Hervey group. Myths and
From phil at thalasson.com  Wed Nov  8 12:25:33 2006
From: phil at thalasson.com (Philip Baker)
Date: Wed Nov  8 12:28:48 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: <455114CC.3060203@novomail.net>
Message-ID: 

In article <455114CC.3060203@novomail.net>, Lee Passey
 writes
>Onorio Catenacci wrote:
>
>[snip]
>
>> Just to be sure I'm clear: "casual reading" implies good enough to
>> read and follow but not good enough for a scholarly dissertation?  :-)
>
>This is not a bad summation of my position, although, as usual, the 
>devil is in the details. You have posited two extremes: 1. good enough 
>to read and follow (to which I would add "given a modicum of effort") 
>and 2. good enough for a scholarly dissertation.
>
>Now I would agree that the vast majority of Project Gutenberg e-texts 
>are probably good enough to read and follow given a modicum of effort. 
>And I suspect that you would agree that the vast majority (and perhaps 
>even the totality) of Project Gutenberg e-texts are inadequate for a 
>scholarly dissertation. But what about those situations which fall 
>between the extremes?
>
>The fundamental problem is that Project Gutenberg is totally lacking in 
>standards, so it is impossible to judge how well any given e-text 
>matches any given use. And we probably don't have enough data to 
>determine how well Project Gutenberg e-texts, on the whole, satisfy any 
>external standards. But personally, I find that generally Project 
>Gutenberg e-texts 1. are inadequate for a scholarly dissertation; AND 2. 
>are inadequate for assigned reading at a high-school level; AND 3. are 
>inadequate for inclusion in any public or school library; AND 4. are 
>inadequate for any type of automated data processing; AND 5. are mostly 
>inadequate for /effortless/ reading.
>
>I suspect that there /may/ be some gems in the PG corpus which are 
>adequate for any or all of the above uses, but again, because Project 
>Gutenberg has no standards, it is virtually impossible to identify those 
>e-texts except on a case-by-case basis. Given the lack of indications of 
>quality, I must accept as a default position that Project Gutenberg 
>e-texts are good enough to read and follow by a human being (and not a 
>computer) given a modicum of effort; but nothing more.
>


But what paper book publishers have explicitly stated standards? I don't
remember seeing any, but I commonly find fairly obvious errors in paper
books from all sorts of publishers all the way from large publishers
with a fair degree of prestige like Oxford University Press to tiny
publishers that have only published a handful of books. And when do
paper book editions of 'classic' out of copyright works meticulously
present the sources for their text? 

And looking outside North America and Europe the situation is even
worse. What standards are going to apply to a bootleg Chinese
translation of a Harry Potter book where each page has a different
translator?

And then there are audio books. The typical commercial audio book novel
has about one third of the content of the print edition. Do you want
every audio book sold to come with a paper book complete text with the
audio book omissions and changes marked up?

You have set up some rather ambiguous standards for ebooks but how many
paper books meet those standards? Do you just assume that most do?

-- 
Philip Baker
From jon at noring.name  Wed Nov  8 12:32:37 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 12:30:58 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <754776753.20061108133237@noring.name>

A private communication by Joshua (who said I may share this here)
casually mentioned that the earliest DP texts submitted to PG were
stripped of their source information, and they remain that way today.
(This was before PG's policy change regarding source information as
outlined by BBI.)

If this is true, at least in the general sense, this indicates to me
a need for DP to resubmit those texts to PG. Are there plans to do
this, and how difficult would it be?

Jon

From jmdyck at ibiblio.org  Wed Nov  8 13:19:37 2006
From: jmdyck at ibiblio.org (Michael Dyck)
Date: Wed Nov  8 13:19:41 2006
Subject: [gutvol-d] DP's contribution to PG
In-Reply-To: <817072195.20061108093213@noring.name>
References: 
	<817072195.20061108093213@noring.name>
Message-ID: <455249E9.9080608@ibiblio.org>

Jon Noring wrote:
> DP ... has submitted over half of the texts to PG.

On the face of it, pgdp.net has contributed 9,321 of PG-US's
19,695 etexts, so only 47.3% so far.

At recent rates, it'll take about a year for that to get up to 50%.

-Michael

From jon at noring.name  Wed Nov  8 13:44:40 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 13:43:02 2006
Subject: [gutvol-d] The interesting comment by Lee and what the PG
	collection will become known for...
In-Reply-To: 
References: <455114CC.3060203@novomail.net> 
Message-ID: <324722079.20061108144440@noring.name>

Philip wrote:

> But what paper book publishers have explicitly stated standards? I don't
> remember seeing any, but I commonly find fairly obvious errors in paper
> books from all sorts of publishers all the way from large publishers
> with a fair degree of prestige like Oxford University Press to tiny
> publishers that have only published a handful of books. And when do
> paper book editions of 'classic' out of copyright works meticulously
> present the sources for their text?

Very interesting question.

I was composing a long reply differentiating what PG is doing
from what paper book publishers do, the filtering mechanisms at the
scholarly level, the concern a major publisher has about avoiding shoddy
work (if they seek quality), etc. But I decided not to discuss that
here, and instead ask:

"Shouldn't PG be better than commercial paper book publishers when it
comes to doing things right?"

and

"Why should a bottom-up organization like PG blindly emulate the
practices of top-down publishers who only care about making money?"

The question (and this is not a criticism) also reminds me of the
teenager telling their parents "well, everyone else is doing it."
That sort of argument wears thin on any parent who knows (or should
know) what's best for their children.

Now, obviously, one can argue if the source materials which are
transcribed themselves were done shoddily, then why should we care
about faithful reproduction, warts and all. Well, let's parse that:

1) If there is only one edition of a work, that *is* the canonical
   expression of the author, no matter how badly the publisher mangled
   it from authorial intent. That's *all* we have to go by unless the
   original author's manuscript happens to turn up (which infrequently
   happens.)

2) If a work exists in several editions, then here a case-by-case
   bibliographic analysis should be done to determine which source
   should be used (if there is a choice.) This may include scholarly
   input by those who've studied the Work and its various editions,
   and looking at other things like if it is a first edition, if a
   subsequent edition was published by the same publishers, etc.

Of course, even if a source used has flaws, why would PG or anyone
else attempt to "fix" them (without marking what fixes were made) and
potentially add more problems?

Jon

From prosfilaes at gmail.com  Wed Nov  8 13:55:08 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Nov  8 13:55:12 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <754776753.20061108133237@noring.name>
References: <754776753.20061108133237@noring.name>
Message-ID: <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>

On 11/8/06, Jon Noring wrote:
> If this is true, at least in the general sense, this indicates to me
> a need for DP to resubmit those texts to PG. Are there plans to do
> this, and how difficult would it be?

There's been discussions, but errata people have been really resistant
to changes that just add source information.
From prosfilaes at gmail.com  Wed Nov  8 14:01:20 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Nov  8 14:01:23 2006
Subject: [gutvol-d] let's close this up
In-Reply-To: <45521581.8080906@srv.net>
References: 
	<817072195.20061108093213@noring.name> <45521581.8080906@srv.net>
Message-ID: <6d99d1fd0611081401g196f13beg4265b2066be47128@mail.gmail.com>

On 11/8/06, Kevin Handy wrote:
> I believe one major problem is finding enough disk space
> to hold the scans.

I think Archive.org is happy to handle the scans. Some have already been
uploaded there, and they've been more than happy to handle all the
scans they can get.
From sam.bretheim at gmail.com  Wed Nov  8 14:30:29 2006
From: sam.bretheim at gmail.com (Sam Bretheim)
Date: Wed Nov  8 14:31:18 2006
Subject: [gutvol-d] The interesting comment by Lee and what the
	PG	collection will become known for...
In-Reply-To: <324722079.20061108144440@noring.name>
References: <455114CC.3060203@novomail.net> 
	<324722079.20061108144440@noring.name>
Message-ID: <45525A85.4070803@gmail.com>

Jon Noring wrote:
> Now, obviously, one can argue if the source materials which are
> transcribed themselves were done shoddily, then why should we care
> about faithful reproduction, warts and all. Well, let's parse that:
>   

One of TEI's unique advantages is that it lets us preserve information 
about both the source document used for transcription and the author's 
presumed intent.  For instance, consider this sentence, authentically 
transcribed from a DP book ("French and oriental love in a harem") that 
we believe was typeset by aliens:

    "I am langhing at all those stories abont yonr harems which yon 
still make np for me jnst as yon nsed to do for that idiot Hadidje."

In TEI, we can mark this up as:

    
    <choice>
        <sic>"I am langhing at all those stories abont yonr harems which 
yon still make np for me jnst as yon nsed to do for that idiot 
Hadidje."</sic>
        <corr resp="...">"I am laughing at all those stories about your 
harems which you still make up for me just as you used to do for that 
idiot Hadidje."</corr>
    </choice>
    

... where the "resp" attribute indicates the PG proofreader who 
suggested the correction. If it had just been one word, we could have 
tagged just that word (or even just the one incorrect letter):

    "I am <choice><sic>langhing</sic><corr>laughing</corr></choice> at all 
those stories about your harems which you still make up for me just as 
you used to do for that idiot Hadidje."

However, that particular book is so awful that marking all the necessary 
corrections would be insane.  (The typesetter was mostly uninterested in 
fine distinctions like u/n, h/b, e/c/o, etc.)  For situations like that, 
the TEI header has a plethora of fields for describing the editorial 
intent of the people who made the digital version.

TEI also has similar inline mechanisms for indicating differences 
between editions; any number of editions of a text can be combined into 
a single TEI copy that preserves all of the information in the original 
books.
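
The idea above, keeping the printed reading and the corrected reading side by side, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the TEI P5 element names choice, sic and corr, leaves out the TEI namespace for brevity, and the resp value "#proofer" and the helper name reading() are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names follow TEI P5 (choice/sic/corr),
# the namespace is omitted and the resp value is made up for illustration.
FRAGMENT = (
    '<p>"I am <choice><sic>langhing</sic>'
    '<corr resp="#proofer">laughing</corr></choice>'
    ' at all those stories."</p>'
)


def reading(elem, prefer):
    """Flatten elem to text, keeping only the preferred child of each <choice>.

    prefer is "sic" (text as printed) or "corr" (text as corrected).
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            chosen = child.find(prefer)
            if chosen is not None:
                parts.append(reading(chosen, prefer))
        else:
            parts.append(reading(child, prefer))
        # Text following the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
as_printed = reading(root, "sic")
as_corrected = reading(root, "corr")
print(as_printed)
print(as_corrected)
```

One master file then yields either a faithful transcription or a corrected reading text, depending on which child of each choice is selected.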

From joshua at hutchinson.net  Wed Nov  8 17:34:58 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 17:35:02 2006
Subject: [gutvol-d] let's close this up
Message-ID: <10954230.1163036098985.JavaMail.?@fh1063.dia.cp.net>


>----Original Message----
>From: prosfilaes@gmail.com
>On 11/8/06, Kevin Handy wrote:
>> I believe one major problem is finding enough disk space
>> to hold the scans.
>
>I think Archive.org is happy to handle the scans. Some have already
>been uploaded there, and they've been more than happy to handle all
>the scans they can get.

PG will take scans, too, now.  I've been uploading image scans with my 
texts for a couple of months.  Sometimes the files are large enough that 
the upload window times out, so other arrangements are sometimes 
necessary.

In fact, if people want, I'd be happy to handle the image uploads (I 
have upload status for audio files, and just uploading page images isn't 
stretching my abilities much).  You can upload texts normally and, once 
a text is posted, give me a location where the image files are located 
and I can upload those.

If you want to do something like this, give me a holler.  I'd 
appreciate it if you renumbered the files yourself so that each file 
matches the page number from the book, so that I don't have to download 
the files, renumber them and re-upload them to the pglaf server.

Please follow the numbering proposal Marcello posted a while back 
(quote below):

***Numbering / Naming of page files***

A book usually contains 2 page number sequences, a roman one followed by
an arabic one. We considered the cover pages as yet another sequence.

A filename for a single-page png file MUST follow this pattern:

  <prefix><number>.png

The prefix for the cover pages is: "c".

The prefix for the roman pages is: "f".

The prefix for the arabic pages is: "p".

If there are more page number sequences in the book, they MUST be
handled in a similar fashion, using an arbitrary free letter.

The <number> is the true page number as seen on the physical page
(or inferred from the previous / next pages) expressed in arabic
numerals and left-padded with zeroes to a length of 4 digits.

For blank pages there should be no file and the page number should be
skipped. Optionally an image saying: "This page is blank in the
original." may be inserted. Missing pages MUST be replaced by an image
saying: "This page is missing."

If present, front cover, back cover and spine MUST be named as follows:

  front cover outside: c0001.png
  front cover inside:  c0002.png
  back cover inside:   c0003.png
  back cover outside:  c0004.png
  spine:               c0005.png


Example of file naming:

  front cover      c0001.png
  back cover       c0004.png
  spine            c0005.png

  i title page     f0001.png
  ii title verso   f0002.png
  iii dedication   f0003.png
  iv is blank
  v contents       f0005.png

  page 1           p0001.png
  page 2           p0002.png
  image on page 2  p0002-1.png
  image on page 2  p0002-2.png
  page 3           p0003.png
  page 4 is blank
  page 5           p0005.png
  ...              ...
  page 9999        p9999.png

***
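
For anyone renumbering by script, the naming rules quoted above can be sketched as a small Python helper. This is a minimal sketch, not an official PG or DP tool: the function names, the sequence labels ("cover", "roman", "arabic") and the validation pattern are my own assumptions; only the prefixes, the 4-digit padding and the "-N" image suffix come from the proposal.

```python
import re

# Prefix letters as given in the proposal.
PREFIXES = {"cover": "c", "roman": "f", "arabic": "p"}

# Assumed shape of a valid name: one lowercase letter, four digits,
# and an optional "-N" suffix for the N-th image on that page.
NAME_RE = re.compile(r"^[a-z]\d{4}(-\d+)?\.png$")

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a roman page number such as 'iv' to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (iv = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def page_filename(sequence, number, image=None):
    """Build a file name such as 'p0002.png' or 'p0002-1.png'."""
    if not 1 <= number <= 9999:
        raise ValueError("page number must fit in four digits")
    name = f"{PREFIXES[sequence]}{number:04d}"
    if image is not None:
        name += f"-{image}"
    return name + ".png"


print(page_filename("roman", roman_to_int("iii")))  # dedication page
print(page_filename("arabic", 2, image=1))          # first image on page 2
```

Blank pages simply get no file, so a script only needs to skip those numbers rather than renumber around them.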

If anyone wants to take me up on this, please do.  I think having page 
images in the PG archives is a "good thing" and just posting image 
files isn't that time intensive (well, the upload/download part can be, 
but that can be easily automated!)

Josh
From joshua at hutchinson.net  Wed Nov  8 17:44:00 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Wed Nov  8 17:44:02 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>

>----Original Message----
>From: prosfilaes@gmail.com
>On 11/8/06, Jon Noring wrote:
>> If this is true, at least in the general sense, this indicates to me
>> a need for DP to resubmit those texts to PG. Are there plans to do
>> this, and how difficult would it be?
>
>There's been discussions, but errata people have been really resistant
>to changes that just add source information.

I think it is more that the errata folks are already overworked and 
this seems a small return on their invested time.

Perhaps if someone "trustworthy" on the DP side were to oversee the 
process so that the errata folks could just take the fixed file(s) and 
push it to the server, it wouldn't be a big deal.

I leave it as an exercise to those more interested in this to organize 
and quantify who is "trustworthy" enough to oversee this.

Josh
From donovan at abs.net  Wed Nov  8 18:19:55 2006
From: donovan at abs.net (D Garcia)
Date: Wed Nov  8 18:20:19 2006
Subject: [gutvol-d] DP's contribution to PG
In-Reply-To: <455249E9.9080608@ibiblio.org>
References: 
	<817072195.20061108093213@noring.name>
	<455249E9.9080608@ibiblio.org>
Message-ID: <200611082119.56116.donovan@abs.net>

On Wednesday 08 November 2006 04:19 pm, Michael Dyck wrote:
> Jon Noring wrote:
> > DP ... has submitted over half of the texts to PG.
>
> On the face of it, pgdp.net has contributed 9,321 of PG-US's
> 19,695 etexts, so only 47.3% so far.

I'm betting that's not excluding the audio books.
What's it look like with those excluded?
From jon at noring.name  Wed Nov  8 19:27:19 2006
From: jon at noring.name (Jon Noring)
Date: Wed Nov  8 19:25:46 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <24296608.20061108202719@noring.name>

Joshua wrote:

> There's been discussions, but errata people have been really
> resistant to changes that just add source information.
>
> [snip]
>
> Perhaps if someone "trustworthy" on the DP side were to oversee the
> process so that the errata folks could just take the fixed file(s)
> and push it to the server, it wouldn't be a big deal.
>
> I leave it as an exercise to those more interested in this to organize
> and quantify who is "trustworthy" enough to oversee this.

I first thought that all one had to do is to resubmit the DP versions,
which include the source information, to PG to replace the ones in the
PG catalog.

But as I see this discussion unfold, I sense that the DP versions at
PG may have been corrected (at the PG-side), and that the corrections
do not flow back to DP. Is this right? (If so, this is a little bit
troubling since I think DP should hold the "master" and do any
corrections, but I'll defer commenting further on this point.)

So from my limited vantage point, I see the following:

1) If a DP-provided text has never been corrected after it was
   stripped of its source info, DP simply resubmits the text as it
   originally was submitted.

2) If a DP-provided text had corrections, then maybe the insertion
   of the source info should be done by DP.


Anyway, I'm sort of operating in the dark on this particular issue, so
what sayeth the DP and PG folk?

Jon Noring


From joshua at hutchinson.net  Thu Nov  9 05:06:42 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 05:06:46 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: <5438407.1163077602820.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: jon@noring.name
>
>I first thought that all one had to do is to resubmit the DP versions,
>which include the source information, to PG to replace the ones in the
>PG catalog.
>

DP doesn't keep a "master" file.  In some cases, we've probably got 
something very close to the final product posted to PG, but ultimately, 
PG is the archive that holds the final product.

That being said, buried within old backups, all the metadata is THERE 
at DP.  It would be a matter of re-adding that information to the 
file(s) on PG's archive.  Something non-trivial in scope.

Josh

PS If any of the DP admins are reading, please feel free to correct 
any mistakes or omissions.
From lee at novomail.net  Thu Nov  9 07:54:17 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 07:53:41 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net>
References: <31978970.1162579360874.JavaMail.?@fh1063.dia.cp.net>
Message-ID: <45534F29.7080309@novomail.net>

joshua@hutchinson.net wrote:
> However, it could definitely be improved and fleshed out.
>
> As far as community feedback, that tends to happen more in the DP 
> forums where folks are more actively putting together and talking about 
> new etexts.
>   

So, I've made my first pass through the document, and I have a few 
comments. It would appear that most of the discussion on this topic 
occurs in the DP forums, so could you tell me which one would be the 
most appropriate? Or perhaps a thread subject to search for?

If most of this work is occurring under the aegis of Distributed 
Proofreaders, and knowing as we do that Project Gutenberg eschews 
standards, wouldn't it be better to call this project DPTEI, and perhaps 
move the documentation and tools to the DP servers?

From lee at novomail.net  Thu Nov  9 07:59:35 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 07:58:57 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <5438407.1163077602820.JavaMail.?@fh1037.dia.cp.net>
References: <5438407.1163077602820.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <45535067.1030008@novomail.net>

joshua@hutchinson.net wrote:
> > ----Original Message---- From: jon@noring.name
> >
> > I first thought that all one had to do is to resubmit the DP
> > versions, which include the source information, to PG to replace
> > the ones in the PG catalog.
>
>  DP doesn't keep a "master" file.  In some cases, we've probably got
>  something very close to the final product posted to PG, but
>  ultimately, PG is the archive that holds the final product.

If this is true, I find it very troubling (I'm the kind of person who 
finds any loss of meaningful data troubling). I would think that the 
Internet Archive would be very interested in maintaining DP output. Is 
there any reason that DP would not be willing to publish to PG and IA 
simultaneously? Is there any reason that IA would not be willing to 
accept DP output?

From joshua at hutchinson.net  Thu Nov  9 08:14:57 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 08:15:16 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: <9017688.1163088897747.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: lee@novomail.net
>
>So, I've made my first pass through the document, and I have a few 
>comments. It would appear that most of the discussion on this topic 
>occurs in the DP forums, so could you tell me which one would be the 
>most appropriate? Or perhaps a thread subject to search for?
>
>If most of this work is occurring under the aegis of Distributed 
>Proofreaders, and knowing as we do that Project Gutenberg eschews 
>standards, wouldn't it be better to call this project DPTEI, and 
>perhaps move the documentation and tools to the DP servers?
>
>

I apologize.  I gave the impression that feedback should *only* occur 
at DP.  It was an off-hand comment just pointing out that most of the 
feedback has historically gone on there (as I'm sure most would agree, 
attempts in gutvol-d have a tendency to get side-tracked by a certain 
bird).  I'm very active here and very willing to hold any and all 
discussions on the topic here.

There haven't been any recent PGTEI discussions at DP, so resurrecting 
one of those old threads probably wouldn't be all that helpful.

Also, while *I* have a large involvement at DP, Marcello does not (he 
stops by occasionally to explain things from a technical level, but he 
doesn't spend a large portion of his life there like I do ;)  So 
renaming this to DPTEI would be inappropriate, I believe.

Sounds like you have some ideas.  Let 'er rip.

Josh
From sly at victoria.tc.ca  Thu Nov  9 08:52:05 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Nov  9 08:52:11 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <24296608.20061108202719@noring.name>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
	<24296608.20061108202719@noring.name>
Message-ID: 


Hmm.... I had an excellent little write-up somewhere that
Jim Tinsley did a while back, describing what a white-washer
does with _all_ incoming texts. (Not all files coming out
of DP post-processing can be posted just as they are.)

For now, you could try reading through:

http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ

V.16. How does a text get produced?


Thanks,
Andrew

On Wed, 8 Nov 2006, Jon Noring wrote:

> I first thought that all one had to do is to resubmit the DP versions,
> which include the source information, to PG to replace the ones in the
> PG catalog.
>
> But as I see this discussion unfold, I sense that the DP versions at
> PG may have been corrected (at the PG-side), and that the corrections
> do not flow back to DP. Is this right? (If so, this is a little bit
> troubling since I think DP should hold the "master" and do any
> corrections, but I'll defer commenting further on this point.)
>
> So from my limited vantage point, I see the following:
>
> 1) If a DP-provided text has never been corrected after it was
>    stripped of its source info, DP simply resubmits the text as it
>    originally was submitted.
>
> 2) If a DP-provided text had corrections, then maybe the insertion
>    of the source info should be done by DP.
>
>
> Anyway, I'm sort of operating in the dark on this particular issue, so
> what sayeth the DP and PG folk?
>
> Jon Noring
>
>
From Bowerbird at aol.com  Thu Nov  9 09:22:36 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 09:23:03 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
Message-ID: 

lee said:
>    wouldn't it be better to call this project DPTEI, and perhaps 
>    move the documentation and tools to the DP servers?

lee said:
>    I would think that the Internet Archive would
>    be very interested in maintaining DP output. 
>    Is there any reason that DP would not be willing 
>    to publish to PG and IA simultaneously?

i seem to detect a pattern here.

but maybe it's just me...

meanwhile, that banner-ad atop the p.g. site sure does
pull in a huge flock of new volunteers to d.p., doesn't it?

***

josh said:
>   attempts in gutvol-d have a tendency 
>    to get side-tracked by a certain bird

that's an ironic thing to say, when a quick glance at
the subject-header reveals my thread was hijacked
into a discussion about .tei...               :+)

-bowerbird
From marcello at perathoner.de  Thu Nov  9 09:47:34 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 09:47:45 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <45521D1D.10801@novomail.net>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>
	<45521D1D.10801@novomail.net>
Message-ID: <455369B6.3070203@perathoner.de>

Lee Passey wrote:

> If they are constructed correctly, there is no technical reason why TEI
> files cannot be used directly in web browsers. Web browsers cannot
> display 100% of the richness captured by TEI, but the display would
> still be better than the simplified ASCII text version. And there is no
> reason why a single TEI file could not be created that would satisfy the
> needs of both direct rendering and transformations via an XSL script. It
> is not done simply because those involved in the PGTEI project have
> chosen not to do it.

And they chose not to do it for several good reasons:

1. PGTEI is designed as a high fidelity conversion chain. We support
most elements in TEI lite and many CSS styling attributes on every
element. The output quality of PGTEI is meant to be vastly superior to
anything you can attain with XSLT transforms only.

2. It is virtually impossible to produce up-to-spec "plain vanilla" and
PDF output thru XSLT only. (Try line rewrapping in XSLT as exercise.) So
there is no way around external tools for TXT (nroff) and PDF (LaTeX)
output.

3. Many utility functions built into the PGTEI conversion chain cannot
work without external tools, e.g. embedded SVG graphics conversion to
PNG, embedded LaTeX and AmsTeX equation conversion to PNG, scaling and
thumbnailing of images thru ImageMagick, automatic validation with
HTML Tidy, ...

4. The ability to display TEI files directly in a browser is a
geek-feature that very few people could use to any advantage while it
will confuse end-users with their browser's inadequacies and the choice
between two different browser-enabled formats.

5. PGTEI is designed to have one master format and many end-user
formats. While the master format is not designed to be consumed by
end-users nothing prevents them doing so. Anybody is free to extend
their PGTEI master file with any valid CSS or XSLT files they might
fancy. Caveat emptor: you'll have to write those CSS and XSLT
stylesheets yourself and maintain them while PGTEI evolves.
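
To make the line-rewrapping remark in point 2 concrete: in a general-purpose language, rewrapping is a few lines of standard-library code, which is the kind of step that falls to external tools rather than XSLT. The sketch below is only an illustration in Python, not the PGTEI tool chain (which, per the text above, uses nroff for TXT output); the function name and the 72-column width are assumptions.

```python
import textwrap


def rewrap(paragraph, width=72):
    """Re-flow one paragraph so that no line exceeds `width` columns."""
    # Collapse the existing line breaks and runs of spaces first,
    # then let textwrap choose the new break points.
    normalized = " ".join(paragraph.split())
    return textwrap.fill(normalized, width=width)


sample = (
    "It is virtually impossible to produce up-to-spec plain vanilla\n"
    "output thru XSLT only,   so there is no way around\n"
    "external tools for the TXT output."
)
wrapped = rewrap(sample, width=40)
print(wrapped)
```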


OTOH I didn't hear a single compelling reason for the PGTEI masters to
be displayed in a browser. As you said: "there is no technical reason
why TEI files cannot be used directly in web browsers". But you didn't
say if there is a practical reason why we *should* do it. At the very
best we can call it a solution in quest of a problem.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Thu Nov  9 10:33:00 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  9 10:31:25 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <455369B6.3070203@perathoner.de>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>
	<454D8138.6040307@gmail.com> <45521D1D.10801@novomail.net>
	<455369B6.3070203@perathoner.de>
Message-ID: <389479139.20061109113300@noring.name>

Marcello wrote:
> Lee Passey wrote:

>> If they are constructed correctly, there is no technical reason why
>> TEI files cannot be used directly in web browsers. Web browsers
>> cannot display 100% of the richness captured by TEI, but the
>> display would still be better than the simplified ASCII text
>> version. And there is no reason why a single TEI file could not be
>> created that would satisfy the needs of both direct rendering and
>> transformations via an XSL script. It is not done simply because
>> those involved in the PGTEI project have chosen not to do it.

> And they chose not to do it for several good reasons:

Before commenting on a couple of your individual points, let me say
that your reply is an excellent summary of the issues. Kudos...


> 3. Many utility functions built into the PGTEI conversion chain cannot
> work without external tools, e.g. embedded SVG graphics conversion to
> PNG, embedded LaTeX and AmsTeX equation conversion to PNG, scaling and
> thumbnailing of images thru ImageMagick, automatic validation with
> HTML Tidy, ...

Are there any plans to actively or passively support embedded MathML,
either in lieu of, or as an option to, LaTeX?

Four reasons:

1) Presentational MathML is round-trippable to LaTeX, so I understand.
   So no loss there.

2) Content MathML allows many (but not all) mathematics expressions to
   be recognizable by mathematics software for processing (e.g.,
   analytical and numerical solving, graphing, etc.)

3) SVG and MathML have close ties, and likely over time will get closer.

4) MathML is XML, LaTeX is not.

This is not to say LaTeX is evil -- it is not -- but that MathML
(especially content MathML) should be allowed to have a role in PGTEI.


> OTOH I didn't hear a single compelling reason for the PGTEI masters to
> be displayed in a browser. As you said: "there is no technical reason
> why TEI files cannot be used directly in web browsers". But you didn't
> say if there is a practical reason why we *should* do it. At the very
> best we can call it a solution in quest of a problem.

Good point.

The only conceivable thing I can think of is visualization, to make
sure markup is applied properly. Here one doesn't have to worry about
the inline , image embedding, hypertext links issues. The CSS
for visualization would be optimized not for final presentation, but
to simply help the publication author see if the markup was applied
properly.

Like in XHTML, validation to a DTD does not prove the markup was
properly applied to the content for publishing purposes.

Jon Noring


From joshua at hutchinson.net  Thu Nov  9 10:49:45 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 10:49:47 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <32862065.1163098185440.JavaMail.?@fh1037.dia.cp.net>

>----Original Message----
>From: jon@noring.name
>
>Are there any plans to actively or passively support embedded MathML,
>either in lieu of, or as an option to, LaTeX?
>


MathML is supported.  Due to it only working in HTML docs and not PDF, 
I've never used it.

From the guidelines:



Attribute notation can take the following values:

tex
    In PDF output mode this will pipe the contents of the element
    directly through to the TeX processor.

    In HTML output mode the contents of the element will be passed to
    an instance of TeX and converted to an image. The resulting image
    file is inserted into the HTML file.

    In all other output modes it will be ignored.

mathml
    In HTML output mode the contents of the element will be inserted
    literally into the HTML file.

    In all other output modes it will be ignored.

svg
    In HTML and PDF output modes the SVG contents of the element will
    be converted to an image and inserted into the file.

    In all other output modes it will be ignored.
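
Read as a lookup table, the guideline excerpt above comes down to something like the following Python sketch. It only models the behaviour as quoted and is not PGTEI code: the table, the function name and the mode labels ("pdf", "html", "txt") are hypothetical, and the element in question is presumably TEI's formula element, whose name is missing from the excerpt as it appears here.

```python
# Hypothetical model of the quoted rules: (notation, output mode) -> action.
# Any combination not listed is ignored, as the guidelines state.
ACTIONS = {
    ("tex", "pdf"): "pipe contents through to the TeX processor",
    ("tex", "html"): "render with TeX to an image and insert the image",
    ("mathml", "html"): "insert contents literally",
    ("svg", "html"): "convert to an image and insert the image",
    ("svg", "pdf"): "convert to an image and insert the image",
}


def action_for(notation, output_mode):
    """Return what the quoted rules say happens for this combination."""
    return ACTIONS.get((notation.lower(), output_mode.lower()), "ignore")


for mode in ("html", "pdf", "txt"):
    print(mode, "->", action_for("mathml", mode))
```

Laid out this way, the reason for avoiding MathML is visible at a glance: it is the only notation with no PDF entry.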
From cannona at fireantproductions.com  Thu Nov  9 11:16:03 2006
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu Nov  9 11:16:45 2006
Subject: [gutvol-d] TEI rendering in Web browsers
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net><454D8138.6040307@gmail.com>
	<45521D1D.10801@novomail.net><455369B6.3070203@perathoner.de>
	<389479139.20061109113300@noring.name>
Message-ID: <004101c70433$95b4b640$0300a8c0@blackbox>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jon Noring wrote:

> Are there any plans to actively or passively support embedded MathML,
> either in lieu of, or as an option to, LaTeX?
>
> Four reasons:
>
> 1) Presentational MathML is round-trippable to LaTeX, so I understand.
>   So no loss there.
>
> 2) Content MathML allows many (but not all) mathematics expressions to
>   be recognizable by mathematics software for processing (e.g.,
>   analytical and numerical solving, graphing, etc.)
>
> 3) SVG and MathML have close ties, and likely over time will get closer.
>
> 4) MathML is XML, LaTeX is not.
>
> This is not to say LaTeX is evil -- it is not -- but that MathML
> (especially content MathML) should be allowed to have a role in PGTEI.
>


I would add that MathML is slightly more accessible, simply because a
plugin exists that allows MathML formulas to be read correctly by a
screen reader.  To my knowledge no such piece of software exists for
LaTeX.  This is no problem for me personally, as I have learned enough
LaTeX to get by.  But for the average blind user, MathML is better.

On the other hand, many more people know how to write LaTeX than MathML.
This is due, in large part, to the length of time it has been around, as
well as its wide acceptance by the academic community.

Aaron Cannon

- --
Skype: cannona
MSN/Windows Messenger: cannona@hotmail.com (don't send email to the hotmail
address.)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
Comment: Key available from all major key servers.

iD8DBQFFU36YI7J99hVZuJcRAhGIAKDfp/44YN8HtOvtQLxWKQX6BnKX8ACdHCix
ysyaoyGkhNIeXwdby6+WGfY=
=Qmdk
-----END PGP SIGNATURE-----

From gbnewby at pglaf.org  Thu Nov  9 11:17:15 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Nov  9 11:17:19 2006
Subject: Seeking whitewashers,
	errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source
	info?)
In-Reply-To: <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
References: <754776753.20061108133237@noring.name>
	<6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com>
Message-ID: <20061109191715.GC26863@mail.pglaf.org>

On Wed, Nov 08, 2006 at 03:55:08PM -0600, David Starner wrote:
> On 11/8/06, Jon Noring wrote:
> >If this is true, at least in the general sense, this indicates to me
> >a need for DP to resubmit those texts to PG. Are there plans to do
> >this, and how difficult would it be?
> 
> There's been discussions, but errata people have been really resistant
> to changes that just add source information.

We are quite short-staffed on the errata handlers and whitewashers.
Jim and Brett have had very little time to spend, leaving
most posting & errata to Tonya, David & Joe.  Joshua has stepped
up quite a bit, but is focused mainly on audio eBooks from Librivox,
and a few other projects you've seen mentioned on gutvol-d.  (There
are tons of other things going on, of course, but I wanted to
send an email about the current labor shortage for the posting/errata).

Mostly we're just trying to save the errata reports, because we don't
have enough staff to act on them.  Many will not get attention until
there are people willing to investigate and apply the changes.

Interested in working with errata reports or whitewashing?  Some
approximate qualifications:
 - commit at least 5 hours per week to labor (plus another
   1-2 for keeping track of email)

 - have outstanding detail-orientation

 - able to work with plain text files, among other formats (i.e., 
   you cannot do this work just within MS Word or Dreamweaver)

 - sufficient network bandwidth to download some pretty big files,
   including multi-MB emails -- OR, sufficient Unix/Linux chops to
   use remote online resources

 - extreme diplomatic skills, for interacting with people whose
   opinions might differ from yours, or who have made some errors
   and need gentle correction and guidance.

As to the question of whether DP can re-submit texts, with
source info added, page scans, etc.: yes, certainly.  In fact,
we could consider some sort of DP process that is more automated
than a "regular" error report or bug fix.

Personally, I'd rather see a little more "value added" than just
source information, for a reposted eBook.  But if someone is
willing to do the work, it's fine to have such an update.

  -- Greg

From sly at victoria.tc.ca  Thu Nov  9 11:24:08 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Nov  9 11:24:12 2006
Subject: [gutvol-d] Christmas-related texts in PG
Message-ID: 


Would anyone be interested in helping to create a list of
Christmas related texts in the collection?

I've already got a start by just searching for texts
with the word "Christmas" in the title.

Any individual suggestions would be welcome.

Would there be a good place to ask about this on
the DP forums?

Andrew
From gbnewby at pglaf.org  Thu Nov  9 11:03:09 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Thu Nov  9 11:34:48 2006
Subject: [gutvol-d] DP's contribution to PG
In-Reply-To: <200611082119.56116.donovan@abs.net>
References: 
	<817072195.20061108093213@noring.name>
	<455249E9.9080608@ibiblio.org> <200611082119.56116.donovan@abs.net>
Message-ID: <20061109190309.GA26863@mail.pglaf.org>

On Wed, Nov 08, 2006 at 09:19:55PM -0500, D Garcia wrote:
> On Wednesday 08 November 2006 04:19 pm, Michael Dyck wrote:
> > Jon Noring wrote:
> > > DP ... has submitted over half of the texts to PG.
> >
> > On the face of it, pgdp.net has contributed 9,321 of PG-US's
> > 19,695 etexts, so only 47.3% so far.
> 
> I'm betting that's not excluding the audio books.
> What's it look like with those excluded?

There are fewer than 700 audio books...  that leaves DP
still just under 50%.
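
The arithmetic behind "just under 50%" can be checked with a short
sketch. It uses only the figures quoted in this thread, treats
"fewer than 700" as an upper bound of 700, and assumes (this is an
assumption, not something stated above) that none of the audio books
came through DP.

```python
# Figures quoted in this thread; the audio-book count is only an upper bound.
dp_etexts = 9321        # etexts contributed via pgdp.net
pg_total = 19695        # all PG-US etexts
audio_books_max = 700   # "fewer than 700" audio books (assumed non-DP)


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


with_audio = share(dp_etexts, pg_total)
without_audio = share(dp_etexts, pg_total - audio_books_max)

print(with_audio)     # 47.3
print(without_audio)  # 49.1
```

Because 700 is an upper bound on the audio books, the second figure
is an upper bound on DP's share under that assumption.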

People interested in "where do eBooks come from" can
subscribe to the posted list (http://lists.pglaf.org).

We still get quite a few from all over....  from the
whitewasher's point of view, the DP eBooks are generally
much more "conformant" to our processing stream than
those from other sources.  That said, there are some
individuals doing just as high-quality work as DP, but
working on their own.

  -- Greg
From grythumn at gmail.com  Thu Nov  9 11:50:45 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Thu Nov  9 11:50:57 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: 
References: 
Message-ID: <15cfa2a50611091150u291bb11cwa54839e0b7dbe897@mail.gmail.com>

On 11/9/06, Andrew Sly  wrote:
>
> Would anyone be interested in helping to create a list of
> Christmas related texts in the collection?
>
> I've already got a start by just searching for texts
> with the word "Christmas" in the title.
>
> Any individual suggestions would be welcome.
>
> Would there be a good place to ask about this on
> the DP forums?

A project search on Special Day: Christmas will find some:
http://www.pgdp.net/c/tools/project_manager/projectmgr.php?show=search&title=&author=&language%5B%5D=&special_day%5B%5D=Christmas&projectid=&project_manager=&checkedoutby=&n_results_per_page=300

A keyword search on the project comments and forums might work better,
but I don't think it is possible at the moment...

Ask in General, I'd say, or the PM forums.

R C
From marcello at perathoner.de  Thu Nov  9 12:08:07 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 12:08:16 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <389479139.20061109113300@noring.name>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>
	<45521D1D.10801@novomail.net>	<455369B6.3070203@perathoner.de>
	<389479139.20061109113300@noring.name>
Message-ID: <45538AA7.70502@perathoner.de>

Jon Noring wrote:

> Are there any plans to actively or passively support embedded MathML,
> either in lieu of, or as an option to, LaTeX?

Is there some stand-alone open source tool (eg. like ImageMagick for
SVG) that converts MathML expressions into PNG (or LaTeX) ?


> The only conceivable thing I can think of is visualization, to make
> sure markup is applied properly.

Simply build your own CSS stylesheet (or steal one somewhere) and paste
it into the TEI file. But no reason to leave it there after proofing.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Nov  9 12:09:43 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 12:09:51 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <32862065.1163098185440.JavaMail.?@fh1037.dia.cp.net>
References: <32862065.1163098185440.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <45538B07.60401@perathoner.de>

joshua@hutchinson.net wrote:

> MathML is supported.  Due to it only working in HTML docs and not PDF, 
> I've never used it.

The browser has to support it too. Meaning: use Firefox or appropriate
plug-in.



-- 
Marcello Perathoner
webmaster@gutenberg.org

From sly at victoria.tc.ca  Thu Nov  9 12:24:57 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Thu Nov  9 12:25:01 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: <24296608.20061108202719@noring.name>
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
	<24296608.20061108202719@noring.name>
Message-ID: 



And I just ran into one more example of a thing I've
seen a few times before: an anonymous text that
is in the DP system with the publisher listed as
"author". This is then (hopefully) corrected as
it enters the PG collection.

It is not too uncommon for what we're calling
"metadata" to be revised just before or shortly
after posting. (I do try to review all the messages
sent to posted, and frequently make a pain of
myself to white-washers, sending messages saying
"this doesn't look quite right".)

Andrew


On Wed, 8 Nov 2006, Jon Noring wrote:

> Joshua wrote:
>
> >
> > Perhaps if someone "trustworthy" on the DP side were to oversee the
> > process so that the errata folks could just take the fixed file(s)
> > and push it to the server, it wouldn't be a big deal.
From lee at novomail.net  Thu Nov  9 12:38:20 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 12:36:33 2006
Subject: [gutvol-d] gvd061030 -- let's get it started in here
In-Reply-To: 
References: 
Message-ID: <455391BC.1050700@novomail.net>

Bowerbird@aol.com wrote:
>    lee said:
>>   wouldn't it be better to call this project DPTEI, and perhaps
>>   move the documentation and tools to the DP servers?
> 
> lee said:
>>   I would think that the Internet Archive would
>>   be very interested in maintaining DP output.
>>   Is there any reason that DP would not be willing
>>   to publish to PG and IA simultaneously?
> 
> i seem to detect a pattern here.
> 
> but maybe it's just me...

What pattern do you think you see here? I'd be happy to tell you whether 
or not you're correct.

-- 
Nothing of significance below this line.

From lee at novomail.net  Thu Nov  9 12:38:28 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 12:36:42 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <455369B6.3070203@perathoner.de>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<45521D1D.10801@novomail.net>
	<455369B6.3070203@perathoner.de>
Message-ID: <455391C4.9030407@novomail.net>

Marcello Perathoner wrote:

[snip]

> And they chose not to do it for several good reasons:

[specious arguments snipped]

Knowing that you are already deeply emotionally invested in the current 
TEI process, I recognize the futility of trying to counter any of your 
arguments.

Could you at least furnish us with your XSL scripts so we could use them 
as a starting point for our own transformations?

-- 
Nothing of significance below this line.

From joshua at hutchinson.net  Thu Nov  9 13:00:11 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 13:00:13 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>


>----Original Message----
>From: lee@novomail.net
>
>> And they chose not to do it for several good reasons:
>
>[specious arguments snipped]
>
>Knowing that you are already deeply emotionally invested in the
>current TEI process, I recognize the futility of trying to counter
>any of your arguments.
>
>Could you at least furnish us with your XSL scripts so we could use
>them as a starting point for our own transformations?
>

Maybe I'm too close to the issue, myself, but I thought Marcello's 
reasons were very well thought out.  I'd love to see counter arguments, 
so could you lay them out, for my sake?

That said, the conversion scripts are all available here:
http://pgtei.pglaf.org/marcello/0.4/src/gnutenberg-press-0.4.tgz

Josh
From hiddengreen at gmail.com  Thu Nov  9 13:45:06 2006
From: hiddengreen at gmail.com (Cori)
Date: Thu Nov  9 13:51:49 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: 
References: 
Message-ID: <910fee4a0611091345v4afe69d6y8a53edaf4ff19117@mail.gmail.com>

Much as I hate volunteering other people for things ;) ... Librivoxers
would probably have some time for discussing this too ... I know we're
working on various Chrimbo-related projects there already.
http://librivox.org/forum/

Perhaps "Volunteers Wanted: Other Projects" would be a good place for
a question, and there's still lots of time before Christmas, so some
new text suggestions might get all-singing, all-dancing audio versions
in place in the next 6 weeks.  I suspect a number of readers will have
been looking at the topical book possibilities beyond the usual "A
Christmas Carol" (and literal carol-singing, too!)

Cori


On 09/11/06, Andrew Sly  wrote:
>
> Would anyone be interested in helping to create a list of
> Christmas related texts in the collection?
>
> I've already got a start by just searching for texts
> with the word "Christmas" in the title.
>
> Any individual suggestions would be welcome.
>
> Would there be a good place to ask about this on
> the DP forums?
>
> Andrew
>


-- 
To Posterity - and Beyond!
From Bowerbird at aol.com  Thu Nov  9 14:32:06 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 14:32:12 2006
Subject: [gutvol-d] re: in praise of david widger
Message-ID: 

greg said:
>    there are some individuals doing 
>    just as high-quality work as DP, 
>    but working on their own.

in this regard, david widger deserves
a special shoutout, maybe a medal...

not only has he done a lot of books
on his own, with _excellent_ quality,
but david is also a whitewasher who
handles a ton of other submissions.

in addition, he often includes a little
"explanatory overview" in his books,
and not infrequently a collection of
interesting passages as well.

as a special bonus, in e-mail messages
to the posted list, he provides a one-line
weather report from his home in england.

david is a fine british gentleman...

here's a link to an overview of dr. widger's library:
>    http://www.gutenberg.net.au/widger/home.html

-bowerbird
From marcello at perathoner.de  Thu Nov  9 14:42:53 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 14:43:01 2006
Subject: [gutvol-d] re: in praise of david widger
In-Reply-To: 
References: 
Message-ID: <4553AEED.5090905@perathoner.de>

Bowerbird@aol.com wrote:

> in this regard, david widger deserves
> a special shoutout, maybe a medal...
...
> in addition, he often includes a little
> "explanatory overview" in his books,
> and not infrequently a collection of
> interesting passages as well.
> 
> as a special bonus, in e-mail messages
> to the posted list, he provides a one-line
> weather report from his home in england.

Oooops. Both Davids just dropped into one pudding ... :-)



-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Nov  9 14:46:31 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 14:46:43 2006
Subject: [gutvol-d] Christmas-related texts in PG
Message-ID: 

here's what richard seltzer of samizdat.com lists under "christmas":
>    Mr. Bamboo and the Honorable Little God -- a Christmas Story by Fannie Macaulay
>    Christmas, Its Origin, Celebration, and Significance by Robert Haven Schauffler
>    Christmas in Ritual and Tradition by Clement A. Miles
>    The Christmas Kalends of Provence by Thomas Janvier
>    Christmas Sermon by Robert Louis Stevenson, 1900
>    A Christmas Story by Samuel W. Francis
>    Christmas Tales and Christmas Verse by Eugene Field, 1912
>    Holiday Stories for Young People, compiled by Margaret Sangster
>    Holidays at the Grange by Emily Mayer Higgins
>    In the Yule-Log Glow: Christmas Tales Round the World, edited by Harrison Morris
>    Little Book for Christmas by Cyrus Townsend Brady
>    Nisby's Christmas by Jacob August Riis
>    On Christmas Day in the Evening by Grace Richmond
>    St. Nicholas [magazine]
>    Trifles for the Christmas Holidays by H.S. Armstrong, 1869
>    Twas the Night Before Christmas by Clement Moore, 1912
>    Yule-tide in Many Lands by Mary Pringle and Clara Urann

as with a good many categories, richard's work will give you a head-start...

-bowerbird
From marcello at perathoner.de  Thu Nov  9 14:54:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 14:54:17 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <455391C4.9030407@novomail.net>
References: <9715964.1162696462487.JavaMail.?@fh1040.dia.cp.net>	<454D8138.6040307@gmail.com>	<45521D1D.10801@novomail.net>	<455369B6.3070203@perathoner.de>
	<455391C4.9030407@novomail.net>
Message-ID: <4553B192.10005@perathoner.de>

Lee Passey wrote:

> [specious arguments snipped]

I can see that snipping those arguments is much easier than refuting them.


> Could you at least furnish us with your XSL scripts so we could use them
> as a starting point for our own transformations?

Everything PGTEI is, and always has been, GPLed and online.

Start here:

  http://pgtei.pglaf.org/marcello/0.4/



-- 
Marcello Perathoner
webmaster@gutenberg.org

From Bowerbird at aol.com  Thu Nov  9 15:19:38 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 15:19:59 2006
Subject: [gutvol-d] re: gvd061108 -- one preliminary thought on duguid
Message-ID: 

Skipped content of type multipart/alternative-------------- next part --------------
An embedded message was scrubbed...
From: Morasch@aol.com
Subject: Fwd: [gutvol-d] gvd061108 -- one preliminary thought on duguid
Date: Thu, 9 Nov 2006 18:03:56 EST
Size: 46776
Url: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061109/a01afb2f/attachment-0001.mht
From Bowerbird at aol.com  Thu Nov  9 15:29:03 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 15:29:12 2006
Subject: [gutvol-d] re: in praise of david widger
Message-ID: 

marcello said:
>    Oooops. Both Davids just dropped into one pudding ... :-)

oh, that's right, i've blended david widger
and david price into one "super-david"...

well, they'll have to _share_ that medal,
because i'm not buying two...           ;+)

-bowerbird
From lee at novomail.net  Thu Nov  9 16:02:58 2006
From: lee at novomail.net (Lee Passey)
Date: Thu Nov  9 16:01:17 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
Message-ID: <4553C1B2.2060900@novomail.net>

joshua@hutchinson.net wrote:

> Maybe I'm too close to the issue, myself, but I thought Marcello's 
> reasons were very well thought out.  I'd love to see counter arguments, 
> so for my sake, maybe?

Marcello's arguments are not necessarily wrong or inaccurate; they are 
simply "non-responsive" (in lawyer-speak). That is to say, they don't 
really respond to the question.

I postulate that it is possible to create TEI-encoded files which can be 
used with all of the existing TEI tools you and Marcello have created 
AND can be used directly in browsers with CSS. In other words, we don't 
have to make a choice between the two rendering processes, we can have 
both simultaneously.

Now if you look at Marcello's arguments 1-3, you will see that while 
they are all valid (and in my opinion persuasive) arguments in favor of 
a robust TEI encoding for Project Gutenberg e-texts, none of them even 
address the issue of dual-use TEI files. And nowhere does he suggest, 
let alone demonstrate, that building CSS support into the TEI master 
files would in any way hinder the creation of other file formats using 
the existing tools.

Argument number 5 is not really an argument at all; it is merely a 
restatement of the conclusion: "We're not going to make /our/ TEI files 
CSS-compatible, but you're welcome to write an XSL script (or any other 
script for that matter -- BB would probably favor Perl at the moment) 
that will convert PGTEI to CSS-compatible TEI." In other words, all it 
does is give us permission to do that which we know we already have the 
right and capability to do (assuming, of course, that the TEI master 
files will always be available without post-processing).

Argument number 4 is the only valid argument of the 5 (this doesn't mean 
I agree with the conclusion, simply that I recognize that it raises 
valid issues). It can be summarized as:

a. nobody wants this capability, so any time we expend implementing 
CSS-enabled TEI files is wasted (this can also be reformulated as 
"nobody will want anything but ZML files, so any time we expend 
implementing PGTEI files is wasted"); and

b. if we provide CSS-enabled TEI files we will unduly confuse those 
people who can't read the caveat "TEI support in Internet Explorer and 
older web browsers is limited; if you can't read this file you should 
download the HTML version instead."

I don't find either of these arguments particularly persuasive. We know 
that at least 3 people (myself, Sam Bretheim and Keith Schultz) out of a 
fairly small sample (the gutvol-d mailing list) would like to see 
CSS-enabled TEI. Perhaps I flatter myself, but I think that sample is 
somewhat significant. I also choose to believe (totally without hard 
evidence) that other people would want it too, if they were to discover 
it. Further, from a purely cost/benefit analysis, the cost of 
implementing the solution (addition of two standard <?xml-stylesheet?>
lines and perhaps some refinement in the PGTEI guidelines, which are 
still undergoing evolution) is virtually zero. Even if the benefit 
derived is very small, the benefit/cost ratio still makes it worth doing.

Likewise, I don't find the "confused end user" argument very persuasive. 
Anyone who 1. finds Project Gutenberg, and 2. finds the TEI files 
offered by Project Gutenberg, has got to have a certain amount of 
technical sophistication. For those who don't know what TEI or XML is, 
even if they attempt to download the files into their browser, if it 
doesn't display well they will simply pick a different version. And if 
users are confused by TEI files that render only 90% of the TEI markup, 
think how confused they will be by TEI files that render 0% of the 
markup. Really, the only way to avoid end user confusion is to not offer 
them TEI files at all.

Project Gutenberg is a volunteer-driven organization; it offers 
downloads "as-is" and doesn't provide technical support. If an end user 
can't figure out that TEI files may not be appropriate for them, it's 
not like the technical support phones are going to start ringing.

Now the really fun thing about CSS is that because the content and the 
style is separated, I can discover my own style preferences and apply 
them to all PGTEI files if only the files themselves will contain a 
reference to a well-known style sheet. Personally I am completely 
unimpressed by the HTML created by Marcello's HTML XSL script. At this 
point, my only option is to create a /new/ XSL script to transform the 
TEI into a file matching my tastes, and to run that script every time I 
download a file, which does me no good if my own computer with my own 
XSLT engine and XSL scripts is not available. But with CSS-enabled TEI 
I could simply maintain my own CSS stylesheet in the local file system 
and I could download a new file and it would just work.

Personally, I think Mr. Bretheim's version of Shirley is /awesome/ -- 
but I'm not a fan of serifed fonts. And on my system the "Fantasy" font 
family is mapped to a deck of cards, so the title of Chapter 1 of 
_Shirley_ is queen of hearts, five of spades, nine of clubs, nine of 
spades, seven of clubs, nine of spades, etc. However, if the stylesheet 
declaration in that file referred first to PGTEI.CSS and then 
PGTEI-USER.CSS I could use PGTEI-USER.CSS to override any or all of the 
default choices with my own. I wouldn't have to create my own master CSS 
file, and I wouldn't have to edit the downloaded TEI to reference the 
standard CSS file. As a matter of fact, I wouldn't even have to create a 
personal CSS file at all: I could just surf the net and find alternative 
CSS files that other people had created and simply select the one I 
liked the best.
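
As a minimal sketch of the idea above, assuming the "stylesheet
declaration" means standard xml-stylesheet processing instructions,
the following Python adds two of them to a TEI file. The function
name add_stylesheet_pis is hypothetical, and the stylesheet file
names are simply the ones mentioned in this message, not an agreed
PGTEI convention. The expectation that a later sheet overrides an
earlier one rests on the normal CSS cascade in browsers that honor
several such instructions.

```python
import re

# Hypothetical stylesheet names taken from the message above; the exact
# file names and locations are assumptions, not an agreed PGTEI convention.
STYLESHEETS = ["PGTEI.CSS", "PGTEI-USER.CSS"]


def add_stylesheet_pis(tei_text, hrefs=STYLESHEETS):
    """Insert one xml-stylesheet processing instruction per href.

    The instructions go right after the XML declaration if there is one,
    otherwise at the very top of the document.
    """
    pis = "".join(
        '<?xml-stylesheet type="text/css" href="%s"?>\n' % href for href in hrefs
    )
    match = re.match(r"\s*<\?xml\s[^>]*\?>\s*", tei_text)
    if match:
        # Keep the declaration on its own line, then add the instructions.
        head = tei_text[:match.end()].rstrip() + "\n"
        return head + pis + tei_text[match.end():]
    return pis + tei_text


sample = '<?xml version="1.0"?>\n<TEI.2><text><body><p>Hello</p></body></text></TEI.2>\n'
result = add_stylesheet_pis(sample)
print(result)
```

The rest of the file is left untouched, so a tool chain that ignores
unknown processing instructions should see the same TEI content as
before.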



I believe that empowering end users is always a good thing, even if it 
requires a little more effort or may create a little confusion. I 
certainly wouldn't suggest this if it would hinder in any way the 
automated tools used to transform PGTEI into other useful formats, but 
so far there has been no suggestion that it would.

Now to be quite honest, I'm not suggesting that you or Marcello change 
your position on this point. I have made the suggestion before, and I'm 
not the type of person who enjoys beating his head against the wall. The 
Project Gutenberg way (PGTAO?) has always been that if you don't like 
the way something is being done, just redo it differently; that is the 
approach I am attempting to take. Indeed, the only request I have of the 
PGTEI project is to not claim that CSS-enabled TEI /cannot/ be done, or 
that it /should not/ be done (how I choose to waste my time is my 
business). If there is a group here that wants to explore how to create 
e-texts using CSS-enabled TEI I would think that such an effort should 
be encouraged.

> That said, the conversion scripts are all available here:
> http://pgtei.pglaf.org/marcello/0.4/src/gnutenberg-press-0.4.tgz

Thanks.


-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Thu Nov  9 16:17:39 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Thu Nov  9 16:17:50 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <24b.1211b237.32851f23@aol.com>

lee said:
>    Project Gutenberg is a volunteer-driven organization; it offers
>    downloads "as-is" and doesn't provide technical support.   If an
>    end user can't figure out that TEI files may not be appropriate for
>    them, it's not like the technical support phones are going to start
>    ringing.

um...   as far as i know, e-mail to project gutenberg from end-users
_is_ answered, at least sometimes anyway, even if "technical support"
is not advertised per se.   and the questions asked can sometimes be
_basic_, as in "what does it mean to 'unzip' a file?"   just so you know...

anyway, until some way is found to speed the creation of .tei files,
it's putting the cart before the horse to argue about their display,
especially since the time when even _trailing-edge_ browsers will
be able to display .tei is at least 5 years down the line, maybe 10...

and by that time, markup will be all washed up...

-bowerbird
From jon at noring.name  Thu Nov  9 17:21:21 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  9 17:26:53 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553C1B2.2060900@novomail.net>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
	<4553C1B2.2060900@novomail.net>
Message-ID: <224991111.20061109182121@noring.name>

Lee wrote:
> joshua@hutchinson.net wrote:

>> Maybe I'm too close to the issue, myself, but I thought Marcello's 
>> reasons were very well thought out.  I'd love to see counter
>> arguments, so for my sake, maybe?

Marcello's answer definitely summarizes his thoughts, and I agree they
were pretty well presented. But I also see Lee's point.

I'll comment on a couple points Lee made.


> I postulate that it is possible to create TEI-encoded files which
> can be used with all of the existing TEI tools you and Marcello have
> created AND can be used directly in browsers with CSS. In other
> words, we don't have to make a choice between the two rendering
> processes, we can have both simultaneously.

This I agree with, but it requires knowing what "constraints" must be
followed when utilizing PGTEI. A sort of usage "subset."

I've noted three major areas of difficulty:

1) Handling of hypertext links.

2) Handling of embedded images.

3) Handling of <note> applied inline within a block of text (such as
   a paragraph.)


#1 and #2 might be handleable by using the XHTML namespaced elements
of <a> and <img> instead of the TEI equivalents. But now we are no
longer in pure TEI. As far as I know PGTEI has not been extended to
support these (am I right?)

I've demonstrated that #1 is achievable (but only in Mozilla/Firefox)
using XLink, but again the stuff we have to put into the TEI
document renders it no longer pure TEI (it is not sufficient to put
the stuff into the DTD.) Now, if PGTEI allows the extension to XLink,
then #1 (and #2 in principle, but not currently supported in browsers)
can be handled (interesting to see how difficult it would be to add
XLink support for images in Mozilla/Firefox.)
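
To illustrate what "putting XLink into the TEI document" might
involve, here is a minimal sketch that copies a link target into
XLink simple-link attributes. The XLink namespace URI is the
standard one; the source attribute name "url", the <xref> element in
the sample, and the function name add_simple_xlinks are assumptions
for illustration only, not something PGTEI is known to define.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def add_simple_xlinks(root, url_attr="url"):
    """Copy each element's URL attribute into XLink simple-link attributes.

    The source attribute name ("url") is an assumption for illustration;
    substitute whatever attribute the TEI files actually use.
    """
    count = 0
    for element in root.iter():
        target = element.get(url_attr)
        if target is not None:
            element.set("{%s}type" % XLINK_NS, "simple")
            element.set("{%s}href" % XLINK_NS, target)
            count += 1
    return count


sample = ET.fromstring('<p>See <xref url="http://www.gutenberg.org/">PG</xref>.</p>')
changed = add_simple_xlinks(sample)
print(changed)
print(ET.tostring(sample, encoding="unicode"))
```

The original attribute is kept, so existing tools that read it are
unaffected; whether the result still validates against a given DTD
is a separate question.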

#3 is difficult and limited to handle in CSS. As I've demonstrated,
one can "float" the <note> and move it elsewhere in the proximity
of the paragraph it came from, but there are limitations (such as
not supported in IE6 -- I just tried IE7 and it does do something,
but still not pretty.)

#3 is solvable for all browsers if one simply constrains PGTEI
authoring to not put <note> inline at the point of reference, but
again this is limiting.

And finally, there is always the possibility of adding a plugin to
one of the browsers to handle one or more of the above, but I'm
not sure how feasible that is vis-a-vis developer time -- Marcello
believes it to be a major effort -- it probably depends upon what
one wants to do.

Of course, if we simply say our subset won't support inline <note>,
images and hypertext links, then applying CSS to TEI is eminently
doable. (There may be a couple other TEI constructs that give
problems as well -- some uses of table in TEI may not map correctly
using CSS display properties for table elements.)


> I don't find either of these arguments particularly persuasive. We know
> that at least 3 people (myself, Sam Bretheim and Keith Schultz) out of a
> fairly small sample (the gutvol-d mailing list) would like to see 
> CSS-enabled TEI.

Add me to the list!


> Perhaps I flatter myself, but I think that sample is 
> somewhat significant. I also choose to believe (totally without hard 
> evidence) that other people would want it too, if they were to discover
> it. Further, from a purely cost/benefit analysis, the cost of 
> implementing the solution (addition of two standard <?xml-stylesheet?>
> lines and perhaps some refinement in the PGTEI guidelines, which are 
> still undergoing evolution) is virtually zero. Even if the benefit 
> derived is very small, the benefit/cost ratio still makes it worth doing.

The only thing I can see that PGTEI could do is to extend PGTEI to
allow adding XLink for defining hypertext linking and image embedding
(mostly boilerplate additions) -- this will at least make hypertext
links workable in Firefox, and maybe images if someone will submit
some code to Firefox which maps the XLink into the equivalent <img>.

The <note> placed inline is a knotty topic. I personally *like*, for
mastering reasons, putting the <note> inline at the point of first
reference. But it makes it hell to do something with it in a browser
(especially if the same note is referenced more than once in a book!)
And then consider the impracticality of handling a note inside of a
note, which occurs in scholarly books. Someone mentioned a book with
three levels of notes: a note inside a note inside a note. Fun.
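
One way to keep notes inline in the master and still give a browser
something simpler is a preprocessing pass. The sketch below is only
an illustration under stated assumptions: the element is taken to be
<note>, and the <ref>, target, id and div type="notes" names, like
the function name move_notes_to_end, are made up for the example
rather than drawn from any PGTEI guideline. Notes nested inside
notes are moved out as well.

```python
import xml.etree.ElementTree as ET


def move_notes_to_end(root):
    """Replace each inline <note> with a numbered <ref> and collect the
    notes, in document order, in a <div type="notes"> appended to root.

    The ref/target/id names and the notes div are illustrative
    assumptions, not part of any PGTEI guideline.
    """
    collected = []

    def visit(parent):
        for index, child in enumerate(list(parent)):
            if child.tag == "note":
                number = len(collected) + 1
                ref = ET.Element("ref", {"target": "#note%d" % number})
                ref.text = "[%d]" % number
                ref.tail = child.tail      # keep the text that followed the note
                child.tail = None
                child.set("id", "note%d" % number)
                parent[index] = ref        # swap the note for its reference
                collected.append(child)
            visit(child)                   # also catches notes nested in notes

    visit(root)
    if collected:
        notes_div = ET.SubElement(root, "div", {"type": "notes"})
        notes_div.extend(collected)
    return root


sample = ET.fromstring(
    "<body><p>First claim<note>Outer<note>Inner</note> note.</note>"
    " and more.</p></body>"
)
move_notes_to_end(sample)
print(ET.tostring(sample, encoding="unicode"))
```

This does not address a single note referenced from several places;
that case would need the master to mark repeated references
explicitly.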


> ... If there is a group here that wants to explore how to create
> e-texts using CSS-enabled TEI I would think that such an effort should
> be encouraged.

Well, I'll be glad to join that discussion. Maybe a separate group?

I know there are people in the TEI community who have explored this,
so it goes beyond PG. Sebastian Rahtz and I have privately
communicated in building an "ebook-oriented" TEI subset with various
element/attribute constraints. Nothing has come of this, but he did
express interest.


Jon Noring


(p.s, here's an early attempt at using CSS to visualize TEI
documents: http://math.ut.ee/~kaarel/NLP/TEI/visualization/
Not exactly for ebook use, but may help with the project.

Also see: http://www.tei-c.org.uk/Stylesheets/  )


From joshua at hutchinson.net  Thu Nov  9 18:00:07 2006
From: joshua at hutchinson.net (joshua@hutchinson.net)
Date: Thu Nov  9 18:00:15 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>

>----Original Message----
>From: lee@novomail.net
>
>I postulate that it is possible to create TEI-encoded files which can
>be used with all of the existing TEI tools you and Marcello have
>created AND can be used directly in browsers with CSS. In other words,
>we don't have to make a choice between the two rendering processes, we
>can have both simultaneously.

Yes, it is possible.

No, I don't believe it is all that useful.  (I'll explain in detail 
below)

>And nowhere does he suggest, let alone demonstrate, that building
>CSS support into the TEI master files would in any way hinder the
>creation of other file formats using the existing tools.
>

Nope, wouldn't affect it in the least.  But that isn't where the 
problem with it lies, imo.

>
>a. nobody wants this capability, so any time we expend implementing 
>CSS-enabled TEI files is wasted

I would interpret it slightly differently.  No one gets any benefit 
from this capability that isn't already present, in a MUCH better way, 
in other formats (HTML/PDF).

>
>b. if we provide CSS-enabled TEI files we will unduly confuse those
>people who can't read the caveat "TEI support in Internet Explorer
>and older web browsers is limited; if you can't read this file you
>should download the HTML version instead."
>

This is actually a HUGE thing.  If you had seen all the confused 
emails/comments I received just about text index files for audio books, 
you'd see that most of the folks that used this resource don't want to 
have to learn anything beyond, "click" and it loads.  If it loads a 
little, but the feature set is limited due to limited web browser 
capability ... people won't know that anything better is available and 
assume that text is just crap.  At least if the TEI file won't load at 
all, they will understand that something is wrong (and we can funnel 
them away from the TEI just by laying out the catalog screens to 
provide emphasis to other files).

>We know that at least 3 people (myself, Sam Bretheim and Keith
>Schultz) out of a fairly small sample (the gutvol-d mailing list)
>would like to see CSS-enabled TEI.

It isn't that I don't think there is a significant number that 
would like it, rather that it is a better use of time and energy to 
make the HTML or PDF files better.


>Further, from a purely cost/benefit analysis, the cost of
>implementing the solution (addition of two standard
><?xml-stylesheet?> lines and perhaps some refinement in the PGTEI
>guidelines, which are still undergoing evolution) is virtually zero.
>Even if the benefit derived is very small, the benefit/cost ratio
>still makes it worth doing.
>

Ok, now this is where I have to disagree 100%.  That much of an 
addition *would* make it render, but it would be a severely handicapped 
version.  For instance, footnotes would not look very good.  CSS is 
just not good at pulling information out of the text flow and placing 
it somewhere else.  Some data structures/formatting are of a dynamic 
nature, too.  For instance, table of contents, indexes, math equations, 
etc are created by the XSL and associated scripts.  In a CSS-enabled 
TEI file, that stuff would either be non-existent or basically 
gibberish (ie, LaTeX equations are gibberish in their native form to 
99.9% of the world).  

This is the biggest reason I have for not wanting CSS-enabled TEI 
files.  There is a lot of stuff that is not possible to display 
properly.
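
To make the footnote and table-of-contents point concrete, here is a minimal sketch in Python of the kind of restructuring a transform has to do and a stylesheet alone cannot: it builds a heading list and moves footnotes out of the text flow. The element names (`div`, `head`, `note`) follow common TEI usage, but the sample fragment and the function names are hypothetical; this is not the PGTEI XSL, just an illustration of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment for illustration only; not a real PGTEI file.
SAMPLE = """
<body>
  <div><head>Chapter I</head>
    <p>First paragraph.<note place="foot">A footnote.</note> More text.</p>
  </div>
  <div><head>Chapter II</head>
    <p>Second paragraph.<note place="foot">Another footnote.</note></p>
  </div>
</body>
""".strip()


def build_toc(root):
    """Return the text of every <head> element in document order."""
    return [head.text for head in root.iter("head")]


def extract_footnotes(root):
    """Replace each <note> with a numbered marker and return the note texts."""
    notes = []
    # Snapshot the element list first, because the tree is modified below.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "note":
                continue
            notes.append(child.text or "")
            marker = "[%d]" % len(notes)
            tail = child.tail or ""
            # Splice the marker and the note's trailing text into what precedes it.
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + marker + tail
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + marker + tail
            parent.remove(child)
    return notes


root = ET.fromstring(SAMPLE)
toc = build_toc(root)
footnotes = extract_footnotes(root)
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]
```

After this runs, `toc` holds the chapter headings, `footnotes` holds the note texts in order, and each paragraph carries only a `[n]` marker where the note used to sit. A real transform would also emit the collected notes at the end of the chapter or book.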

>Anyone who 1. finds Project Gutenberg, and 2. finds the TEI files 
>offered by Project Gutenberg, has got to have a certain amount of 
>technical sophistication. 

This one actually made me chuckle.  You've obviously never been in 
charge of answering the PG email!  ;)

>For those who don't know what TEI or XML is, 
>even if the attempt to download the files into their browser if it 
>doesn't display well they will simply pick a different version. 

No they won't.  They will either assume 1) PG's file is crap or 2) 
they did something wrong and slink away.  

Some people will try other things.  Most will give up.

>Really, the only way to avoid end user confusion is to not offer 
>them TEI files at all.

There is something to that statement.  It is one of the reasons I am 
glad that TEI usually lists at or near the bottom in the PG catalog 
pages.  They have to pass up a lot of other GOOD files to pick the one 
that doesn't work like that.

>Personally I am completely 
>unimpressed by the HTML created by Marcello's HTML XSL script. 

Honestly, this is the feedback I'm most interested in from your 
message.  *What* isn't good?  *What* can we improve?  (And keep in 
mind, the system allows for a LOT of CSS styling that can be specified 
in the master document that I don't use.  I'm one that likes simple 
clean layouts, but you can get crazy).

>At this 
>point, my only option is to create a /new/ XSL script to transform the 
>TEI into a file matching my tastes, and to run that script every time I 
>download a file. But with CSS-enabled TEI I could simply 
>maintain my own CSS stylesheet in the local file system and I could 
>download a new file and it would just work.

Actually, if you're up to that, you're also up to adding a personalized 
CSS to the HTML document, which would probably be easier.  But to each 
his own.  Everyone *is* free to do what they want with the files; that 
is the great thing about them being public domain files! :)
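
As a rough illustration of the "personalized CSS" route, the sketch below appends a stylesheet link to the head of an XHTML document using only the Python standard library. The file name `my-style.css`, the function name, and the sample document are assumptions made for the example. A real PG HTML file that carries a DOCTYPE and named entities such as `&nbsp;` would need a more forgiving parser than ElementTree.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def add_user_stylesheet(xhtml_text, css_href):
    """Return the document with a <link rel="stylesheet"> appended to <head>."""
    # Serialize XHTML elements without an "ns0:" prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        raise ValueError("document has no <head> element")
    link = ET.SubElement(head, "{%s}link" % XHTML_NS)
    link.set("rel", "stylesheet")
    link.set("type", "text/css")
    link.set("href", css_href)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal document standing in for a downloaded e-text.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title></head>"
    "<body><p>Text.</p></body></html>"
)
result = add_user_stylesheet(SAMPLE, "my-style.css")
```

Because the link is added as the last child of the head, rules in the personal stylesheet come later in source order than an embedded style block and so should win ties of equal specificity.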

>As a matter of fact, I wouldn't even have to create a 
>personal CSS file at all: I could just surf the net and find alternative 
>CSS files that other people had created and simply select the one I 
>liked the best.

This is actually a feature that we've talked about a lot for the HTML 
files.  But it is something that we put on the back-burner.  Just felt 
other things had a higher priority.

>
>

I assume you mean an XHTML file that references an external CSS file?  
We had that in older versions, but PG whitewashers did not like 
separate CSS files, so we moved to an integrated file approach.

>Indeed, the only request I have of the 
>PGTEI project is to not claim that CSS-enabled TEI /cannot/ be done, or 
>that it /should not/ be done (how I choose to waste my time is my 
>business).

Let me say in no uncertain terms.  It CAN be done.  And I DON'T want 
anyone to think it shouldn't be done by others.  I just don't believe 
it should be done by PG scripts.  For those, like you Lee, that *can* 
work with TEI files directly, the files are available.  

Seriously, I'm concerned with folks like my grandmother, who is 87 
years old. She would be horribly confused by a CSS-enabled TEI file 
that didn't display a Table of Contents or had footnotes splashed in 
the middle of the text, etc.

>If there is a group here that wants to explore how to create 
>e-texts using CSS-enabled TEI I would think that such an effort should 
>be encouraged.
>

Go for it.  There is benefit to it.  Just not as a general download 
option at this time.  Maybe we can revisit this when browsers are 
better at it and I'll have a completely different opinion.

Josh

PS  I am serious about wanting your feedback on the XHTML we generate 
and ideas for ways to improve the default CSS that is in those files.
From marcello at perathoner.de  Thu Nov  9 18:03:24 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 18:03:32 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553C1B2.2060900@novomail.net>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
	<4553C1B2.2060900@novomail.net>
Message-ID: <4553DDEC.7010007@perathoner.de>

Lee Passey wrote:

> Argument number 5 is not really an argument at all; it is merely a
> restatement of the conclusion: "We're not going to make /our/ TEI files
> CSS-compatible, but you're welcome to write an XSL script (or any other
> script for that matter -- BB would probably favor Perl at the moment)
> that will convert PGTEI to CSS-compatible TEI."

If you think you can make CSS work with PGTEI as it is, you have nothing
to complain about. Just start working on it.

If you think you need changes to PGTEI to make it work with CSS, then I
will answer you that in my opinion the play isn't worth the candle
(unless the changes are absolutely trivial like inserting a link into
the header).


PGTEI is a very thin layer of additional standardization over standard
TEI. Virtually the only thing that PGTEI defines is how to interpret the
"rend" attribute on TEI elements. This attribute's content was
intentionally left undefined in the TEI standard because it was felt
that the needs of different markup projects were too far apart to make a
standardization of this attribute desirable or viable.

This very thin layer borrows from other well-known standards whenever
possible. PGTEI uses CSS2/3-compatible values for the "rend" attribute.
Thus it should not be difficult to process the PGTEI "rend" attribute
into an XML "style" attribute that any CSS-capable browser can understand.
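
A minimal sketch of that rend-to-style step, in Python rather than XSLT: it assumes the "rend" values are already valid CSS declarations, which is how the paragraph above describes them, and simply moves them into a "style" attribute. The sample markup and the function name are hypothetical, and whether a given browser honors "style" on a non-XHTML element is a separate question this sketch does not settle.

```python
import xml.etree.ElementTree as ET


def rend_to_style(root):
    """Move each element's "rend" value into a "style" attribute, in place."""
    for element in root.iter():
        # Assumption: the rend value is a CSS declaration list as-is.
        rend = element.attrib.pop("rend", None)
        if rend is not None:
            element.set("style", rend)
    return root


# Hypothetical fragment; the rend values here are invented for illustration.
SAMPLE = '<p rend="text-align: center">A <hi rend="font-style: italic">word</hi>.</p>'
root = rend_to_style(ET.fromstring(SAMPLE))
converted = ET.tostring(root, encoding="unicode")
```

Elements without a "rend" attribute pass through untouched, so the function can be run over a whole document.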


> a. nobody wants this capability, so any time we expend implementing
> CSS-enabled TEI files is wasted

Moreover: if we try to implement a kitchen sink where anybody can stuff
their dirty plates in at will, we risk failure. One of the more
important responsibilities of the software architect is complexity
management. Any added feature -- especially a feature that implements a
completely new usage model -- will make the project more complex, thus
increasing the risk of failure. It needs to be a very important feature
to make me change the design goals at this late stage.


> b. if we provide CSS-enabled TEI files we will unduly confuse those
> people who can't read the caveat "TEI support in Internet Explorer and
> older web browsers is limited; if you can't read this file you should
> download the HTML version instead."

"Can't read this file" is relative. The user may not be able to tell the
level of degradation she is just experiencing.


> I don't find either of these arguments particularly persuasive. We know
> that at least 3 people (myself, Sam Bretheim and Keith Schultz) out of a
> fairly small sample (the gutvol-d mailing list) would like to see
> CSS-enabled TEI.

Then I suggest those 3 people put their heads together and just do it.


> Further, from a purely cost/benefit analysis, the cost of
> implementing the solution (addition of two standard 
> lines and perhaps some refinement in the PGTEI guidelines, which are
> still undergoing evolution) is virtually zero. Even if the benefit
> derived is very small, the benefit/cost ratio still makes it worth doing.

If you have already done a cost/benefit analysis, then please state
*exactly* what "refinements" you want, and I'll tell you if they fit in.


> Likewise, I don't find the "confused end user" argument very persuasive.
> Anyone who 1. finds Project Gutenberg, and 2. finds the TEI files
> offered by Project Gutenberg, has got to have a certain amount of
> technical sophistication.

Anybody with enough sophistication to type "free books" into Google will
find PG. If this person then just clicks on "TEI" and gets an
indifferently rendered version, he might not be able to tell that he's
viewing a degraded version.


> For those who don't know what TEI or XML is,
> even if the attempt to download the files into their browser if it
> doesn't display well they will simply pick a different version. And if
> users are confused by TEI files that render only 90% of the TEI markup,
> think how confused they will be by TEI files that render 0% of the
> markup. Really, the only way to avoid end user confusion is to not offer
> them TEI files at all.

Again. 0% is a far better choice than 90%. Users may not be able to tell
that they are getting 90%.

Users will recognize that a TEI source file dumped into the browser is
not what they want and seek alternatives, but they might not be able to
tell, e.g., that their browser just dropped all footnotes into the bit
bucket because of a buggy XSLT implementation.


> Personally, I think Mr. Bretheim's version of Shirley is /awesome/ --
> but I'm not a fan of serifed fonts. And on my system the "Fantasy" font
> family is mapped to a deck of cards, so the title of Chapter 1 of
> _Shirley_ is queen of hearts, five of spades, nine of clubs, nine of
> spades, seven of clubs, nine of spades, etc.

And now imagine Aunt Tillie confronted with this problem and then say
again that what you propose is user-friendly.



-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Thu Nov  9 19:05:10 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Thu Nov  9 19:05:18 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <224991111.20061109182121@noring.name>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>	<4553C1B2.2060900@novomail.net>
	<224991111.20061109182121@noring.name>
Message-ID: <4553EC66.9070202@perathoner.de>

Jon Noring wrote:

> I've noted three major areas of difficulty:
> 
> 1) Handling of hypertext links.
> 
> 2) Handling of embedded images.
> 
> 3) Handling of  applied inline within a block of text (such as
>    a paragraph.)

Ever notice the different semantics of <p> in TEI and HTML? A <p> in TEI
might contain a lot of elements a <p> in HTML is not allowed to contain:

The scrap of paper read: GXLKDS Holmes chuckled.

Is perfectly valid TEI. Now transform this naively into XHTML and you get:

The scrap of paper read:

GXLKDS
Holmes chuckled.

Perfectly invalid XHTML. The browser will notice this and chicken out
into "quirks" HTML. It will close the <p>s for you and will render like this:

The scrap of paper read:

GXLKDS

Holmes chuckled.

Losing the "fancy" formatting on the rest of the snippet. Did I mention a
<p> in TEI can contain tables and lists?


> #1 and #2 might be handleable by using the XHTML namespaced elements
> of and instead of the TEI equivalents. But now we are no
> longer in pure TEI. As far as I know PGTEI has not been extended to
> support these (am I right?)

will not map gracefully to any XLink construct because
may contain and . Why not use browser XSLT to map
to ?


> I've demonstrated that #1 is achievable (but only in Mozilla/Firefox)
> using XLink, but again the stuff we have to put into the TEI
> document renders it no longer pure TEI (it is not sufficient to put
> the stuff into the DTD.) Now, if PGTEI allows the extension to XLink,
> then #1 (and #2 in principle, but not currently supported in browsers)
> can be handled (interesting to see how difficult it would be to add
> XLink support for images in Mozilla/Firefox.)

Basically you are suggesting to make PGTEI incompatible to TEI and to
throw in all the complexity of XLink just to maybe make it displayable
in Mozilla (and nowhere else)?


> #3 is solvable for all browsers if one simply constrains PGTEI
> authoring to not put inline at the point of reference, but
> again this is limiting.

So you support the feature by forbidding its use?


> Of course, if we simply say our subset won't support inline ,
> images and hypertext links, then applying CSS to TEI is eminently
> doable.

Lets simply say PGTEI doesn't support direct rendering in browsers.


-- 
Marcello Perathoner
webmaster@gutenberg.org

From jon at noring.name  Thu Nov  9 19:20:07 2006
From: jon at noring.name (Jon Noring)
Date: Thu Nov  9 19:20:18 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <4553EC66.9070202@perathoner.de>
References: <20740354.1163106011315.JavaMail.?@fh1037.dia.cp.net>
	<4553C1B2.2060900@novomail.net>
	<224991111.20061109182121@noring.name>
	<4553EC66.9070202@perathoner.de>
Message-ID: <1641943390.20061109202007@noring.name>

Marcello wrote:
> Jon Noring wrote:

>> Of course, if we simply say our subset won't support inline ,
>> images and hypertext links, then applying CSS to TEI is eminently
>> doable.

> Lets simply say PGTEI doesn't support direct rendering in browsers.

I overall agree with you, for the reasons you cite plus the ones
I brought up (e.g, difficulty of the links, images, , etc.).
In addition, since PGTEI is simply TEI with a few added things and
constraints, it is pretty free-form as TEI is free-form (and as HTML is
pretty free-form), making it impossible to develop a library of
standardized CSS style sheets that apply to all PGTEI documents.

The only thing that can be done, and this is probably outside of PGTEI,
is to settle upon a type of "TEI-lighter" (not necessarily the same as
"TEI-Lite") which is so well constrained that it is possible to develop
a standardized set of style sheets for rendering, either for the TEI
vocabulary subset, or of the XHTML transform.

But then this "TEI-Lighter" will likely not be able to represent a
certain percentage of the texts PG and DP digitize. One thing I've
learned, as I know all the DPers and PGers have, is that the diversity
of structures used in old documents is very broad.

Jon Noring

From tb at baechler.net  Fri Nov 10 01:12:42 2006
From: tb at baechler.net (Tony Baechler)
Date: Fri Nov 10 01:34:15 2006
Subject: [gutvol-d] Christmas-related texts in PG
In-Reply-To: 
References: 
Message-ID: <20061110091219.IXWX6331.dukecmmtar02.coxmail.com@Tony.baechler.net>

At 11:24 AM 11/9/06 -0800, you wrote:
>Would anyone be interested in helping to create a list of
>Christmas related texts in the collection?

Yes, I've already done a lot of this.  Here is my list of them.  To my
knowledge, this includes all of them to date although I could be
missing a few.  If there is a Christmas story included, it's listed
even though the rest of the text may not have anything to do with
Christmas or holidays.  Also there are others not listed because I've
already read them, such as _A Kidnapped Santa Claus_ by Baum, etc.  If
you need exact titles, please ask.  This was what I found with grep, so
my apologies for the lack of decent formatting but this at least gives
you a list to work from.

7abgh10.txt:Release Date: August, 2005 [EBook #8694]
7loui10.txt:Release Date: February, 2005 [EBook #7425]
cbcst10.txt:Release Date: February, 2004 [EBook #5061]
pg10813.txt:Release Date: January 23, 2004 [EBook #10813]
pg11014.txt:Release Date: February 10, 2004 [EBook #11014]
pg12881.txt:Release Date: July 11, 2004 [EBook #12881]
pg12974.txt:Release Date: July 21, 2004 [EBook #12974]
pg13158.txt:Release Date: August 10, 2004 [EBook #13158]
pg13213.txt:Release Date: August 18, 2004 [EBook #13213]
pg14572.txt:Release Date: January 3, 2005 [EBook #14572]
pg14624.txt:Release Date: January 6, 2005 [EBook #14624]
pg14629.txt:Release Date: January 7, 2005 [EBook #14629]
pg14667.txt:Release Date: January 11, 2005 [EBook #14667]
pg15034.txt:Release Date: February 13, 2005 [EBook #15034]
pg15044.txt:Release Date: February 14, 2005 [EBook #15044]
pg15078.txt:Release Date: February 16, 2005 [EBook #15078]
pg15343.txt:Release Date: March 12, 2005 [EBook #15343]
pg15552.txt:Release Date: April 5, 2005 [EBook #15552]
pg15709.txt:Release Date: April 25, 2005 [EBook #15709]
pg16498.txt:Release Date: August 9, 2005 [EBook #16498]
pg16648.txt:Release Date: September 4, 2005 [EBook #16648]
pg17006.txt:Release Date: November 5, 2005 [EBook #17006]
pg17456.txt:Release Date: January 4, 2006 [EBook #17456]
pg17510.txt:Release Date: January 13, 2006 [EBook #17510]
pg17562.txt:Release Date: January 21, 2006 [EBook #17562]
pg17630.txt:Release Date: January 29, 2006 [EBook #17630]
pg17743.txt:Release Date: February 10, 2006 [EBook #17743]
pg17770.txt:Release Date: February 16, 2006 [EBook #17770]
pg17937.txt:Release Date: March 6, 2006 [EBook #17937]
pg18570.txt:Release Date: June 12, 2006 [EBook #18570]
pg18720.txt:Release Date: June 29, 2006 [EBook #18720]
pg18725.txt:Release Date: July 1, 2006 [EBook #18725]
pg18770.txt:Release Date: July 6, 2006 [EBook #18770]
pg19014.txt:Release Date: August 9, 2006 [EBook #19014]
pg19084.txt:Release Date: August 20, 2006 [EBook #19084]
pg19098.txt:Release Date: August 21, 2006 [EBook #19098]
pg19337.txt:Release Date: September 20, 2006 [EBook #19337]
pg19384.txt:Release Date: September 26, 2006 [EBook #19384]
pg19587.txt:Release Date: October 19, 2006 [EBook #19587]
pg2556.txt:Release Date: May 18, 2006 [EBook #2556]
pg2597.txt:Release Date: May 21, 2006 [EBook #2597]
pg2731.txt:Release Date: May 25, 2006 [EBook #2731]
pg46.txt:Release Date: August 11, 2004 [EBook #46]
stngc10.txt:Release Date: April, 2004 [EBook #5403]
tlrco10.txt:Release Date: August, 2004 [EBook #6373]


-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.14.0/525 - Release Date: 11/9/06

From prosfilaes at gmail.com  Fri Nov 10 05:45:58 2006
From: prosfilaes at gmail.com (David Starner)
Date: Fri Nov 10 05:46:04 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
In-Reply-To: 
References: <21300953.1163036640831.JavaMail.?@fh1063.dia.cp.net>
	<24296608.20061108202719@noring.name>
Message-ID: <6d99d1fd0611100545h31e8e617i5a85f3f7e9960887@mail.gmail.com>

On 11/9/06, Andrew Sly wrote:
> And I just ran into one more example of a thing I've
> seen a few times before. Where an anonymous text
> is in the DP system with the publisher listed as
> "author". This is then (hopefully) corrected as
> it enters the PG collection.

As a project manager at DP, I've never regarded the author and title
fields as real metadata, rather more as tools to keep the system from
spitting out the entire series at once and to communicate information
of interest to the proofer.

Slightly offtopic, but today's nightmare in metadata was my Chippeway
(and Chipeway) grammars, which I had loaded into the system under
Chippewayan. Of course, as is obvious to everyone, Chippewayan is
completely unrelated to Chippeway, which is known to the system (using
the ISO-639-2 list of names) as Ojibwa. At least we caught it before
we sent it on to DP that way.

From lee at novomail.net  Fri Nov 10 09:03:37 2006
From: lee at novomail.net (Lee Passey)
Date: Fri Nov 10 09:01:59 2006
Subject: [gutvol-d] TEI rendering in Web browsers
In-Reply-To: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net>
Message-ID: <4554B0E9.7080601@novomail.net>

joshua@hutchinson.net wrote:

[snip]

> Yes, it is possible.
>
> No, I don't believe it is all that useful.  (I'll explain in detail
> below)

[snip]

> It isn't that I don't think there isn't a significant number that
> would like it, rather that it is a better use of time and energy to
> make the HTML or PDF files better.

Hey, you asked. I'm just pointing out that there are no technical
issues; the only real issue is the cost/benefit of actually doing it,
and on that point you and I will simply have to agree to disagree.

[snip]

> I assume you mean a XHTML file that references an external CSS file?
> We had that in older versions, but PG whitewashers did not like
> separate CSS files, so we moved to an integrated file approach.

I want to expand on this so we know that we're both on the same page.

I am assuming from what you say here that you are talking about a
situation where each e-text has its own external CSS file. That is, file
19725.html contains a link to 19725.css and frank15.html contains a link
to frank15.css. To me, this method makes no sense, and I can understand
why PG whitewashers would be opposed. If there is a one-to-one
correspondence between a content file and a stylesheet file, the two
ought to be merged into a single file, for file maintenance reasons if
nothing else.

When it comes to HTML files I have only a single rule on which I am
unwilling to compromise: /you don't get to dictate how an e-text looks
on my system!/ Now it takes a lot of work to strip out someone else's
ill-conceived notion of presentation and replace it with my own,
particularly if you do really stupid things like Microsoft Word does and
repeatedly attach the same style to every paragraph. A process where I
don't have to edit /every file/ I download, sometimes non-trivially, is
desirable.

So in support of my #1 rule I suggest three sub rules:

a. No style rules should be included in any XML content file (of any
vocabulary) except via CSS declarations;

b. every XML content file should reference a single, well-known CSS file
that contains all the style declarations used by a certain class of file
(e.g. all Project Gutenberg HTML files); and

c. every XML content file should reference a single CSS file with a
well-known file name but which is reserved exclusively for end-user use.

In this scenario, Project Gutenberg would maintain one, count them,
/one/, CSS file (per XML vocabulary). Every XML content file would be
designed to have /adequate/ styling using only the default presentation
for that vocabulary (I'm thinking primarily XHTML here) without the
inclusion of /any/ CSS, but would include links to include the base CSS
file and the user CSS file if they exist on the local file system.

Thus, IMHO, content file + content specific CSS file == bad, content
file + universal CSS file == good.

>> Indeed, the only request I have of the PGTEI project is to not
>> claim that CSS-enabled TEI /cannot/ be done, or that it /should
>> not/ be done (how I choose to waste my time is my business).
>
> Let me say in no uncertain terms.  It CAN be done.  And I DON'T want
> anyone to think it shouldn't be done by others.  I just don't
> believe it should be done by PG[TEI] scripts.

This is what I have always believed, and have stated many times.
I'm not asking you to adopt my belief system any more than I am willing
to adopt yours. I am simply exploring the possibility of creating a new
set of scripts which compete with yours, and which, given Project
Gutenberg's historical even-handed approach to innovation, I'm sure it
would be willing to host.

-- 
Nothing of significance below this line.

From Bowerbird at aol.com  Fri Nov 10 09:05:47 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 09:05:55 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info?
Message-ID: 

david said:
>   As a project manager at DP, I've never regarded
>   the author and title fields as real metadata

in .zml, the author and title are considered as
_part_of_the_data_, not as "metadata"...

indeed, they are considered to be a very _important_
part of the data, so important that the title is
expected to be the first thing listed in the book.

the author is usually the second thing,
expect in cases where something else
supersedes it in importance -- such as
the name of the editor of an anthology.

anything else that follows in importance,
but which is still important enough to list,
is listed next to that in this first section...

of course, this kind of _transparency_
is the hallmark of .zml.

which is not to say that you cannot have
a "metadata" section in a .zml file. you can...
you can have any kind of section you want.
just start a section, and label it accordingly.

the philosophy is that if it is important enough
information about the file to be saved elsewhere,
it's important enough to be put into the file itself,
and done so in a way that it can be easily harvested
from the file itself, and thus regenerated any time...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/6adf09fd/attachment-0001.html

From Bowerbird at aol.com  Fri Nov 10 09:16:57 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 09:17:07 2006
Subject: [gutvol-d] TEI rendering in Web browsers
Message-ID: 

hey folks, here's a suggestion for consideration:
take the .tei negotiations to a different listserve!

as an incentive for you, i promise i would _not_ join
that list, so you will have no one making fun of you.

as the .tei efforts move forward, it's likely that you will
have a boatload of .tei "experts" coming here and saying
"what if you did it _this_ way instead?", and the
second-guessing will drive everyone crazy, especially people
who don't like swimming through the acronym soup.

so please do us the big favor, eh?

think of it as your own little technoid play-pen, where
you can impress your peers with your spiffy tech-talk...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/d1342f5f/attachment.html

From Bowerbird at aol.com  Fri Nov 10 09:18:37 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 09:18:48 2006
Subject: [gutvol-d] Earliest DP texts stripped of source info? (typo flame)
Message-ID: 

i said:
>   the author is usually the second thing,
>   expect in cases where something else

haha, i said "expect" where i meant "except".

what an idiot i am!     :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/f89f4bc3/attachment.html

From Bowerbird at aol.com  Fri Nov 10 12:47:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 12:47:51 2006
Subject: [gutvol-d] gvd06110 -- first-pass on a 2-up text-template for "babelfish"
Message-ID: <3f8.2dc78637.32863f72@aol.com>

here was my first-pass on a 2-up text-template for "babelfish":
>   http://www.greatamericannovel.com/scgi-bin/twocol.html

and marcello, it validates!

it's in xhtml 1.0 strict, and i seem to remember
some problem mentioned with that, but anyway...

you might remember the point of the .html version
-- aside from the obvious of putting it on the web --
is to enable conversions to formats like mobipocket,
rocketbook, plucker, palm ereader, .lit, and such, so
eventually i'll probably have to notch this back so
that it will work with all of those converters, but
for now i'm satisfied with getting the look down...

it's fragile, but seems to be working based on these screenshots:
>   http://www.greatamericannovel.com/scgi-bin/twocolcamino.jpg
>   http://www.greatamericannovel.com/scgi-bin/twocolsafari.jpg
>   http://www.greatamericannovel.com/scgi-bin/twocolie5.jpg

if anyone has any feedback, let me know...

thanks! have a good weekend!

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/5a5fe966/attachment.html

From Catenacci at Ieee.Org  Fri Nov 10 12:58:20 2006
From: Catenacci at Ieee.Org (Onorio Catenacci)
Date: Fri Nov 10 12:58:59 2006
Subject: [gutvol-d] gvd06110 -- first-pass on a 2-up text-template for "babelfish"
In-Reply-To: <3f8.2dc78637.32863f72@aol.com>
References: <3f8.2dc78637.32863f72@aol.com>
Message-ID: 

On 11/10/06, Bowerbird@aol.com wrote:
> here was my first-pass on a 2-up text-template for "babelfish":
> > http://www.greatamericannovel.com/scgi-bin/twocol.html
>
> and marcello, it validates!
>
> it's in xhtml 1.0 strict, and i seem to remember
> some problem mentioned with that, but anyway...
>
> you might remember the point of the .html version
> -- aside from the obvious of putting it on the web --
> is to enable conversions to formats like mobipocket,
> rocketbook, plucker, palm ereader, .lit, and such, so
> eventually i'll probably have to notch this back so
> that it will work with all of those converters, but
> for now i'm satisfied with getting the look down...
>
> it's fragile, but seems to be working based on these screenshots:
>
> > http://www.greatamericannovel.com/scgi-bin/twocolcamino.jpg
>
> > http://www.greatamericannovel.com/scgi-bin/twocolsafari.jpg
>
> > http://www.greatamericannovel.com/scgi-bin/twocolie5.jpg
>
> if anyone has any feedback, let me know...
>
> thanks! have a good weekend!

You may want to look at http://www.csszengarden.com.  I'm pretty sure
it's possible to get a two-column layout like what you're looking for
without having to embed <br> tags.

Just a thought.

--
Onorio

From Bowerbird at aol.com  Fri Nov 10 13:40:36 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Nov 10 13:40:43 2006
Subject: [gutvol-d] gvd06110 -- first-pass on a 2-up text-template for "babelfish"
Message-ID: 

onorio said:
>   You may want to look at http://www.csszengarden.com.

can you be more specific?   :+)

>   I'm pretty sure it's possible
>   to get a two-column layout
>   like what you're looking for
>   without having to embed <br> tags.

oh, ok, well yes, that would be good feedback, onorio.

except i'm not using the break tags to create the columns.

they were just in there because -- up to this point --
i've been maintaining the linebreaks from the p-book, to make
the comparison with the page-scan maximally easy to do...

but you're right that since this 2-up text-template is
essentially the next step _beyond_ that, the original
linebreaks might be thought of as unnecessary.

so here's an example where they are taken out:
>   http://www.greatamericannovel.com/scgi-bin/twocolnobr.html

this example also changes to book-style _indented paragraphs
from the blank-line-as-paragraph-separator model used before.
and, as you'd expect, the original end-line hyphenates are joined.

further changes would be the segue over to curly "smart" quotes
and proper typographic em-dashes, but i didn't bother with that.

of course, you could also argue from this standpoint that now
the original _pagebreaks_ are unnecessary as well, and you'd be
right about that too. and indeed one of my future examples will
go on to take _that_ step as well, but we're just doing one step
at a time here now...    :+)

-bowerbird

p.s. but, if you're interested, if a reader opts to abandon
original pagebreaks, then the text-chunking mechanism becomes
how much text can be displayed on one screen, given the current
window-size and their chosen text-size. this starts to get
complicated, to the point that we might need to switch up from
perl to java or flash or something. unless, of course, we opt
for the simple solution and just display a whole section -- or
even the whole book -- in a single column and force the person
to scroll the thing. however, any respectable e-book designer
would rather throw up than resort to that inferior interface
ugliness, because proper paging increases readability immensely.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20061110/1dda3ba4/attachment.html From ke at gnu.franken.de Fri Nov 10 23:50:59 2006 From: ke at gnu.franken.de (Karl Eichwalder) Date: Fri Nov 10 23:51:42 2006 Subject: Seeking whitewashers, errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source info?) In-Reply-To: <20061109191715.GC26863@mail.pglaf.org> (Greg Newby's message of "Thu\, 9 Nov 2006 11\:17\:15 -0800") References: <754776753.20061108133237@noring.name> <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com> <20061109191715.GC26863@mail.pglaf.org> Message-ID: Greg Newby writes: > Mostly we're just trying to save the errata reports, because we don't > have enough staff to act on them. Many will not get attention until > there are people willing to investigate and apply the changes. That's a good thing. It would be nice if you could publish the reports as add-ons to the projects. I hope you never ever apply reports without careful checking. Often, words that look wrong are actually written that way by the author. -- http://www.gnu.franken.de/ke/ | ,__o | _-\_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C From ke at gnu.franken.de Fri Nov 10 23:38:49 2006 From: ke at gnu.franken.de (Karl Eichwalder) Date: Fri Nov 10 23:56:28 2006 Subject: [gutvol-d] Re: Project Gutenberg In-Reply-To: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be> (Johan Boelaert's message of "Wed\, 8 Nov 2006 19\:40\:56 +0100") References: <200611081841.kA8If0LW015376@outmx025.isp.belgacom.be> Message-ID: "Johan Boelaert" writes: > Every now and then I visit Project Gutenberg Europe, hoping something has > changed to improve it. But today also, it hasn't. > > On the homepage (http://pge.rastko.net/), we read: "Project Gutenberg Europe > is follower of Project Gutenberg philosophy, focused primarily on digitizing > European cultures, under European copyright laws." 
If this were true, it > would be an excellent opportunity to publish European books online, which > can't be published in Gutenberg U.S., because of the much more severe > copyright laws over there. Finding these books from pge does not seem that easy, but for all I know, the 350+ books already done at pg-eu are listed in the PG catalog. I can go for the "gold" books listed at http://dp.rastko.net/ or simply browse here: http://pge.rastko.net/dirs/pge/ At dp.rastko.net, I am pushing Friedrich Gundolf's "Romantiker" essays, which were published in two books in 1930/32, through the rounds. Gundolf died in 1931. -- http://www.gnu.franken.de/ke/ | ,__o | _-\_<, | (*)/'(*) Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C From sly at victoria.tc.ca Sat Nov 11 00:13:48 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Nov 11 00:13:51 2006 Subject: Seeking whitewashers, errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source info?) In-Reply-To: References: <754776753.20061108133237@noring.name> <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com> <20061109191715.GC26863@mail.pglaf.org> Message-ID: Karl: Yes, that's very true. That's why the ideal person for helping would be someone very detail-oriented who has some idea of the wide variety of texts that can be found in PG. Also some patience... I remember Jim Tinsley saying once that about 50% of the "suggested corrections" he received were actually incorrect. From the catalog point of view, I remember that before the catalog had its current form, I got more than a few emails saying "Why do you have Winston Churchill's dates wrong?", to which I would explain each time that there was an American author called Winston Churchill, who was not the same person as the British politician. 
Another thing that can easily happen is that something that looks like it will be a simple, quick fix leads to a text that would take many hours of work to get up to the standard we expect from PG texts today. Andrew On Sat, 11 Nov 2006, Karl Eichwalder wrote: > Greg Newby writes: > > > Mostly we're just trying to save the errata reports, because we don't > > have enough staff to act on them. Many will not get attention until > > there are people willing to investigate and apply the changes. > > That's a good thing. It would be nice if you could publish the reports > as add-ons to the projects. > > I hope you never ever apply reports without careful checking. Often, > words that look wrong are actually written that way by the author. > > From hyphen at hyphenologist.co.uk Sat Nov 11 00:35:55 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Nov 11 00:36:06 2006 Subject: Seeking whitewashers, errata handlers (Re: [gutvol-d] Earliest DP texts stripped of source info?) In-Reply-To: References: <754776753.20061108133237@noring.name> <6d99d1fd0611081355i73507b7bhfd0e7583c5fa7438@mail.gmail.com> <20061109191715.GC26863@mail.pglaf.org> Message-ID: On Sat, 11 Nov 2006 00:13:48 -0800 (PST), Andrew Sly wrote: |I remember Jim |Tinsley saying once that about 50% of the "suggested |corrections" he received were actually incorrect. Also 50% of suggested errors by the whitewashers are actually incorrect or errors in the paper copy. "Hiding to nothing" or "Rock and Hard Place" spring to mind :-( -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain From Bowerbird at aol.com Sat Nov 11 11:44:36 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Nov 11 11:44:43 2006 Subject: [gutvol-d] gvd061111 -- erecting more dynamic intralinear tensions Message-ID: <473.8f46abd.32878224@aol.com> i'm going on a very dangerous mission. they asked for volunteers. 
it's very dangerous, but someone has to do it. i'll write you the instant i get back. -bowerbird From lee at novomail.net Sat Nov 11 15:25:31 2006 From: lee at novomail.net (Lee Passey) Date: Sat Nov 11 15:24:52 2006 Subject: [gutvol-d] TEI to XHTML transformations In-Reply-To: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net> References: <12481777.1163124007435.JavaMail.?@fh1038.dia.cp.net> Message-ID: <45565BEB.9020003@novomail.net> joshua@hutchinson.net wrote: > Honestly, this is the feedback I'm most interested in from your > message. *What* isn't good? *What* can we improve? (And keep in > mind, the system allows for a LOT of CSS styling that can be > specified in the master document that I don't use. I'm one that > likes simple clean layouts, but you can get crazy) For this analysis I started with two documents. The first is _Alice in Wonderland_ (GutenText #11), available through Marcello's examples page at http://pgtei.pglaf.org/marcello/0.4/examples/alice/ (which, BTW, is not available via the regular download process at gutenberg.org). I also chose to examine _The Siouan Indians_ (GutenText #19628) on the assumption that, as a very recent TEI document, it would more likely reflect the most recent evolution of the PGTEI XSL scripts. This assumption may have been unnecessary; if the XHTML files are generated dynamically using XSLT each time they are downloaded, then an end user will /always/ get the result of the most recent iteration of the transformation tools no matter when the master TEI file was generated. But at least it does no harm. This reply may seem a bit disjointed as I am composing it as I examine the files, and I am recording my notes as I encounter things. 
Having downloaded the generated XHTML files I ran them through HTML Tidy to regularize the presentation for human analysis and to convert the native UTF-8 encoding to ASCII. Older versions of Tidy complained about "nested emphasis" when encountering the construct "..." In these cases it is more elegant to simply combine the two elements into a single element when they are co-terminous (and I vaguely recall that I may have added code to Tidy to do that in a subsequent release, but I could be wrong). In any case, the construct, while inelegant, is not incorrect, and given the limitations of XSL transformations it is probably not objectionable. The next thing I noticed when examining the XHTML file is that /every/ <p> tag is qualified with both a "class" attribute and a "style" attribute. As I think I have made clear by now, this is totally unacceptable. The amount of effort required to replace all of these style attributes with what they should really be ("style='margin: 0; text-indent: 2em'") is simply too great. On closer examination you will see that every <p> element is also qualified with a "class='tei-p'" attribute, and the