From joey at joeysmith.com Thu Feb 1 16:37:42 2007
From: joey at joeysmith.com (joey)
Date: Thu, 1 Feb 2007 17:37:42 -0700
Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To:
References:
Message-ID: <20070202003742.GA25119@joeysmith.com>

> 1. find a "magic" character, one that's _not_ in the file.
>
> 2. change all carriagereturn+linefeed to the magic one.
>
> 3. change all carriagereturns to the magic character.
>
> 4. change all linefeeds to the magic character as well.
> (and yes, you really need to do all three of these,
> with the carriagereturn+linefeed one done first,
> because there actually exist some e-texts that
> contain _multiple_ types of newlines in them...)

Is there any good reason not to unify these to one particular style of newline across the board? "\r\n" seems a likely candidate since the other platforms are generally more lenient towards Windows line-endings than Windows is toward theirs, in my (albeit limited) experience.

From j.hagerson at comcast.net Sat Feb 3 07:19:41 2007
From: j.hagerson at comcast.net (John Hagerson)
Date: Sat, 3 Feb 2007 09:19:41 -0600
Subject: [gutvol-d] "Books Posted or Updated" page not updated
Message-ID: <020a01c747a6$ba52f2d0$1f12fea9@sarek>

It was brought to my attention that the "Books Posted or Updated Since" page http://www.gutenberg.org/browse/recent/last1 has not been updated since February 1.

Has something broken?

From marcello at perathoner.de Sat Feb 3 12:51:16 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 03 Feb 2007 21:51:16 +0100
Subject: [gutvol-d] "Books Posted or Updated" page not updated
In-Reply-To: <020a01c747a6$ba52f2d0$1f12fea9@sarek>
References: <020a01c747a6$ba52f2d0$1f12fea9@sarek>
Message-ID: <45C4F5C4.8080302@perathoner.de>

John Hagerson wrote:

> It was brought to my attention that the "Books Posted or Updated Since" page
> http://www.gutenberg.org/browse/recent/last1 has not been updated since
> February 1.
>
> Has something broken?

Seems so ... some php driver is broken.

We are migrating to a new database server because ibiblio needs the rack space our old server is occupying. This will also give us more cycles.

I expect to get this fixed in the next few days.

--
Marcello Perathoner
webmaster at gutenberg.org

From robert_marquardt at gmx.de Sat Feb 3 21:05:31 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Sun, 04 Feb 2007 06:05:31 +0100
Subject: [gutvol-d] "Books Posted or Updated" page not updated
In-Reply-To: <45C4F5C4.8080302@perathoner.de>
References: <020a01c747a6$ba52f2d0$1f12fea9@sarek> <45C4F5C4.8080302@perathoner.de>
Message-ID:

On Sat, 03 Feb 2007 21:51:16 +0100, you wrote:

>We are migrating to a new database server because ibiblio needs the rack
>space our old server is occupying. This will also give us more cycles.

I will download the complete installation bzip2. More backup cannot hurt.

--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From marcello at perathoner.de Sun Feb 4 12:02:34 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 04 Feb 2007 21:02:34 +0100
Subject: [gutvol-d] "Books Posted or Updated" page not updated
In-Reply-To:
References: <020a01c747a6$ba52f2d0$1f12fea9@sarek> <45C4F5C4.8080302@perathoner.de>
Message-ID: <45C63BDA.6070600@perathoner.de>

Robert Marquardt wrote:

> On Sat, 03 Feb 2007 21:51:16 +0100, you wrote:
>
>> We are migrating to a new database server because ibiblio needs the rack
>> space our old server is occupying. This will also give us more cycles.
>
> I will download the complete installation bzip2. More backup cannot hurt.

We are not losing any data. (In fact we currently have 2 copies of the database running on different servers.)
I'm just unable to update any one of them :-)

--
Marcello Perathoner
webmaster at gutenberg.org

From joey at joeysmith.com Mon Feb 5 22:52:11 2007
From: joey at joeysmith.com (joey)
Date: Mon, 5 Feb 2007 23:52:11 -0700
Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <20070202003742.GA25119@joeysmith.com>
References: <20070202003742.GA25119@joeysmith.com>
Message-ID: <20070206065211.GA12616@joeysmith.com>

On Thu, Feb 01, 2007 at 05:37:42PM -0700, joey wrote:

> Is there any good reason not to unify these to one particular style
> of newline across the board? "\r\n" seems a likely candidate since
> the other platforms are generally more lenient towards Windows
> line-endings than Windows is toward theirs, in my (albeit limited)
> experience.

I have identified 52 files in the current library that exhibit the problem of mixed newline styles. Can anyone give me a reason not to fix these?

From robert_marquardt at gmx.de Tue Feb 6 10:52:21 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Tue, 06 Feb 2007 19:52:21 +0100
Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <20070206065211.GA12616@joeysmith.com>
References: <20070202003742.GA25119@joeysmith.com> <20070206065211.GA12616@joeysmith.com>
Message-ID: <1hjhs2loj1mv2aaa1ebe4jfqku1ohvijbb@4ax.com>

On Mon, 5 Feb 2007 23:52:11 -0700, you wrote:

>I have identified 52 files in the current library that exhibit the problem
>of mixed newline styles. Can anyone give me a reason not to fix these?

My opinion is "fix it if it is broken".

--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 6 11:03:50 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Tue, 06 Feb 2007 20:03:50 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
Message-ID: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>

The project is going well.
I have downloaded all SF books in all formats (150 MB) and i now work on creating the HTML page for easy navigation of the files.

What i need for the CD is the font the "Project Gutenberg" logo in the website is written with. A nice Windows PG icon would be nice also.

I want to create a CD label to print on the CD (we have a CD copy station in our company for that). Northern quadrant "Project Gutenberg" in the correct font. East and west the images with text as seen on the SF Bookshelf. South two lines "Science Fiction" in a futuristic font (i have several already) and "Bookshelf" in Times. Alternatively replacing "Bookshelf" with an image of a bookshelf.

I think this is a simple design which can be easily adapted for further Bookshelf CDs. Maybe it is using too many fonts, but that i will decide later.

--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From greg at durendal.org Tue Feb 6 11:03:25 2007
From: greg at durendal.org (Greg Weeks)
Date: Tue, 6 Feb 2007 14:03:25 -0500 (EST)
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID:

On Tue, 6 Feb 2007, Robert Marquardt wrote:

> The project is going well. I have downloaded all SF books in all formats (150 MB) and i now work on creating the HTML
> page for easy navigation of the files.

Did you catch the new one that posted yesterday? "Highways in Hiding" by George O. Smith.

--
Greg Weeks
http://durendal.org:8080/greg/

From robert_marquardt at gmx.de Tue Feb 6 11:42:47 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Tue, 06 Feb 2007 20:42:47 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To:
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID:

On Tue, 6 Feb 2007 14:03:25 -0500 (EST), you wrote:

>On Tue, 6 Feb 2007, Robert Marquardt wrote:
>
>> The project is going well.
>> I have downloaded all SF books in all formats (150 MB) and i now work on creating the HTML
>> page for easy navigation of the files.
>
>Did you catch the new one that posted yesterday? "Highways in Hiding" by
>George O. Smith.

Of course! Also added to the SF Bookshelf pages.
Please expedite the remaining Pipers. I want them all on the CD.

--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From desrod at gnu-designs.com Tue Feb 6 11:33:22 2007
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Tue, 06 Feb 2007 14:33:22 -0500
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID: <1170790402.21136.9.camel@localhost.localdomain>

On Tue, 2007-02-06 at 20:03 +0100, Robert Marquardt wrote:

> The project is going well. I have downloaded all SF books in all
> formats (150 MB) and i now work on creating the HTML
> page for easy navigation of the files.

Were these the SF books already in PG? Or some other external books?

--
David A. Desrosiers
desrod at gnu-designs.com
Skype username: setuid
http://gnu-designs.com

"The palest ink is better than the most retentive memory." - Old Chinese Proverb

From hyphen at hyphenologist.co.uk Tue Feb 6 12:08:53 2007
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue, 06 Feb 2007 20:08:53 +0000
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID: <8umhs254qdk84nrop960kfvnefsihp69tf@4ax.com>

On Tue, 06 Feb 2007 20:03:50 +0100, Robert Marquardt wrote:

|!The project is going well. I have downloaded all SF books in all formats (150 MB) and i now work on creating the HTML
|!page for easy navigation of the files.

Just a point on content!

At the local Library I have picked up a copy of "The Oxford Book of Science Fiction Stories", Tom Shippey, Oxford University Press 1992, ISBN 0-19-280381-6. An anthology containing the following three and many later works.

This contains three short stories which PG *could* contain, but which a quick search of the on line catalogue fails to turn up.

The Land Ironclads (1903) H G Wells, c20 pages
Finis (1906) Frank L Pollack, c11 pages
As Easy as ABC, Rudyard Kipling (1912) c26 Pages

Having read all three they are all worthy of inclusion in the CD.

Perhaps someone else could do a better check on the PG catalogue, and if they have not been done chase down the original publication with a view to making them into e-text.
--
Dave Fawthrop

From marcello at perathoner.de Tue Feb 6 13:53:58 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 06 Feb 2007 22:53:58 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID: <45C8F8F6.5030303@perathoner.de>

Robert Marquardt wrote:

> What i need for the CD is the font the
> "Project Gutenberg" logo in the website is written with. A nice
> Windows PG icon would be nice also.

http://www.dafont.com/gutenberg-textura.font

> Northern quadrant "Project Gutenberg" in the correct font. East and
> west the images with text as seen on the SF Bookshelf. South two
> lines "Science Fiction" in a futuristic font (i have several already)
> and "Bookshelf" in Times. Alternatively replacing "Bookshelf" with an
> image of a bookshelf.

Yuck! Don't use too many fonts. A rule of thumb is max. 2 different fonts and max. 5 different sizes on one page. Italic and bold count as new font.

How about just Textura and Futura?

--
Marcello Perathoner
webmaster at gutenberg.org

From greg at durendal.org Tue Feb 6 14:41:13 2007
From: greg at durendal.org (Greg Weeks)
Date: Tue, 6 Feb 2007 17:41:13 -0500 (EST)
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To:
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID:

On Tue, 6 Feb 2007, Robert Marquardt wrote:

> Of course! Also added to the SF Bookshelf pages.
> Please expedite the remaining Pipers. I want them all on the CD.

I wish I could. They are in PPers hands.
(and have been for a long time)

--
Greg Weeks
http://durendal.org:8080/greg/

From sly at victoria.tc.ca Tue Feb 6 14:48:45 2007
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue, 6 Feb 2007 14:48:45 -0800 (PST)
Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <1hjhs2loj1mv2aaa1ebe4jfqku1ohvijbb@4ax.com>
References: <20070202003742.GA25119@joeysmith.com> <20070206065211.GA12616@joeysmith.com> <1hjhs2loj1mv2aaa1ebe4jfqku1ohvijbb@4ax.com>
Message-ID:

It is perhaps worth noting that quite often in this kind of circumstance, an attempt to make similar changes to a number of files has resulted in unintended side-effects or problems which then take twice as long to try to deal with. (or even are not found until years later.)

Personally, I am a fan of the "Work with one text at a time" approach.

Andrew

On Tue, 6 Feb 2007, Robert Marquardt wrote:

> On Mon, 5 Feb 2007 23:52:11 -0700, you wrote:
>
> >I have identified 52 files in the current library that exhibit the problem
> >of mixed newline styles. Can anyone give me a reason not to fix these?
>
> My opinion is "fix it if it is broken".
>

From Bowerbird at aol.com Tue Feb 6 15:15:15 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Feb 2007 18:15:15 EST
Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text
Message-ID:

andrew said:
> It is perhaps worth noting that
> quite often in this kind of circumstance,
> an attempt to make similar changes to
> a number of files has resulted in unintended
> side-effects or problems which then take
> twice as long to try to deal with.
> (or even are not found until years later.)

could you elaborate on those experiences please? i'd like to see if i might be able to learn from them, and -- if so -- what it is exactly that i would learn.

> Personally, I am a fan of the
> "Work with one text at a time" approach.
not only do i think that "approach" is unnecessarily limiting, i fail to see how it applies in this situation.

i can see absolutely _no_ purpose that is served by having a file with mixed newline characters... furthermore, the fix is simple and straightforward.

if you can imagine any unanticipated consequences, i would love to see such an imagination in action... as it is, the comment just seems superstitious to me.

indeed, i believe the _real_ issue in this matter is to ask just exactly how this mixed-newline glitch _ever_happened_ in the first place, with an aim to changing whatever workflow allowed this problem.

-bowerbird

From jon at noring.name Tue Feb 6 15:42:44 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 6 Feb 2007 16:42:44 -0700
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To:
References:
Message-ID: <1344868262.20070206164244@noring.name>

Bowerbird wrote:

> indeed, i believe the _real_ issue in this matter is
> to ask just exactly how this mixed-newline glitch
> _ever_happened_ in the first place, with an aim to
> changing whatever workflow allowed this problem.

I assume we are talking about mixing of different "new line breaks" in the same text file?

Who knows the reason. If we are talking about non-DP etexts, which I assume is the case, then asking the question of "how it ever happened" is a silly question, since we know how this happened: PG established no standardization in text formatting.

No standardization, means in reality anything goes. We don't want to make it difficult on the PG volunteers, do we?, by asking them to follow a short list of workflow product requirements...

Now to fix that particular problem of normalizing "new line" breaks in all the PG text files, that should be pretty straightforward with a PHP script.
The more complicated issue is to gather the texts and then return the new-line-normalized texts back to their homes, maybe with their edition numbers incremented. Now that's a bookkeeping issue which I can't address not knowing the intimate details of the PG database.

Jon Noring

From marcello at perathoner.de Tue Feb 6 16:10:30 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 07 Feb 2007 01:10:30 +0100
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <1344868262.20070206164244@noring.name>
References: <1344868262.20070206164244@noring.name>
Message-ID: <45C918F6.7010901@perathoner.de>

Jon Noring wrote:

> Now to fix that particular problem of normalizing "new line" breaks in
> all the PG text files, that should be pretty straightforward with a
> PHP script. The more complicated issue is to gather the texts and then
> return the new-line-normalized texts back to their homes, maybe with
> their edition numbers incremented.

Look! No hands!

    cd /public/ftp/pub/docs/books/gutenberg
    cat list-of-files | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'

or, better, do all files:

    cd /public/ftp/pub/docs/books/gutenberg
    find . -name '*.txt' | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'

--
Marcello Perathoner
webmaster at gutenberg.org

From jon at noring.name Tue Feb 6 16:25:40 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 6 Feb 2007 17:25:40 -0700
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <45C918F6.7010901@perathoner.de>
References: <1344868262.20070206164244@noring.name> <45C918F6.7010901@perathoner.de>
Message-ID: <1242775140.20070206172540@noring.name>

Marcello wrote:

> Jon Noring wrote:
>> Now to fix that particular problem of normalizing "new line" breaks in
>> all the PG text files, that should be pretty straightforward with a
>> PHP script.
>> The more complicated issue is to gather the texts and then
>> return the new-line-normalized texts back to their homes, maybe with
>> their edition numbers incremented.
> Look! No hands!
>
> cd /public/ftp/pub/docs/books/gutenberg
> cat list-of-files | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'
>
> or, better, do all files:
>
> cd /public/ftp/pub/docs/books/gutenberg
> find . -name '*.txt' | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'

Touché!

Jon

From Bowerbird at aol.com Tue Feb 6 16:29:37 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Feb 2007 19:29:37 EST
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
Message-ID:

jon said:
> No standardization, means in reality anything goes.

aren't you getting tired of squeaking that toy?

-bowerbird

From jon at noring.name Tue Feb 6 16:46:26 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 6 Feb 2007 17:46:26 -0700
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To:
References:
Message-ID: <07874618.20070206174626@noring.name>

Bowerbird wrote:

> jon said:
>> No standardization, means in reality anything goes.
> aren't you getting tired of squeaking that toy?

And aren't you tired of using this ploy?

Jon

From Bowerbird at aol.com Tue Feb 6 17:26:17 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Feb 2007 20:26:17 EST
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
Message-ID:

jon noring said:
> No standardization, means in reality anything goes.

there _are_ standards. go read 'em. it's definitely not "anything goes".

-bowerbird

From jon at noring.name Tue Feb 6 17:50:32 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 6 Feb 2007 18:50:32 -0700
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To:
References:
Message-ID: <1528808409.20070206185032@noring.name>

Bowerbird wrote:

> jon noring said:
>> No standardization, means in reality anything goes.
> there _are_ standards. go read 'em. it's definitely not "anything goes".

References? Others besides myself are probably interested.

Jon

From Bowerbird at aol.com Tue Feb 6 17:59:48 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Feb 2007 20:59:48 EST
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
Message-ID:

jon noring said:
> References?

geez! all the whining you've done, and you've never read the f.a.q.?

> http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ

-bowerbird

From jon at noring.name Tue Feb 6 18:04:33 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 6 Feb 2007 19:04:33 -0700
Subject: [gutvol-d] "Web 2.0 ... The Machine is Us/ing Us" (Digital Text, TNG)
Message-ID: <1256577413.20070206190433@noring.name>

The powerful and must see video by Michael Wesch shows the future of digital text where it becomes more than just words on "digital" paper:

http://www.youtube.com/watch?v=6gmP4nk0EOE&eurl=

XML forms a key underpinning of this new tomorrow.

Jon Noring

From jon at noring.name Tue Feb 6 18:08:54 2007
From: jon at noring.name (Jon Noring)
Date: Tue, 6 Feb 2007 19:08:54 -0700
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To:
References:
Message-ID: <38477852.20070206190854@noring.name>

Bowerbird wrote:

> jon noring said:
>> References?
> geez! all the whining you've done, and you've never read the f.a.q.?
> http://www.gutenberg.org/wiki/Gutenberg:Volunteers%27_FAQ

Oh, I thought you had something new.

Hmmm, I glanced at it again to see if it changed. Same ol' "do what feels good so long as it doesn't hurt someone else":

"V.10. Do I have to produce in plain ASCII text?

"Certainly not if it doesn't make sense. To take an extreme example, if you're working in Japanese or Arabic, or creating audio files, there is no point in trying to reproduce that in ASCII!

"Where the text can largely be expressed in ASCII, we do want to post an ASCII version, even if it is somewhat degraded compared to the original. However, we will post your file in as many open formats as you want to create, so that your original work is available for those who have the software to read it."

Geez, with Unicode and UTF-8 and UTF-16 encodings becoming the norm, this "requirement" is absolutely Byzantine.

I won't delve into the other "requirements", but nowhere do I see any requirement about normalization of even plain texts.

Jon Noring

From Bowerbird at aol.com Tue Feb 6 20:25:02 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Feb 2007 23:25:02 EST
Subject: [gutvol-d] "Web 2.0 ... The Machine is Us/ing Us" (Digital Text, TNG)
Message-ID:

jon noring said:
> XML forms a key underpinning of this new tomorrow.

nah. anything you can do with x.m.l., i can do easier without it.

-bowerbird

From robert_marquardt at gmx.de Tue Feb 6 20:33:23 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 07 Feb 2007 05:33:23 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To:
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com>
Message-ID: <8jlis292inea3gbulm4209im2af65meejb@4ax.com>

On Tue, 6 Feb 2007 17:41:13 -0500 (EST), you wrote:

>I wish I could. They are in PPers hands. (and have been for a long time)

I will make a polite request on the DP forum. Polite requests can take you far :-)

--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 6 20:38:15 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 07 Feb 2007 05:38:15 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <8umhs254qdk84nrop960kfvnefsihp69tf@4ax.com>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <8umhs254qdk84nrop960kfvnefsihp69tf@4ax.com>
Message-ID:

On Tue, 06 Feb 2007 20:08:53 +0000, you wrote:

>Having read all three they are all worthy of inclusion in the CD.
>
>Perhaps someone else could do a better check on the PG catalogue, and if
>they have not been done chase down the original publication with a view to
>making them into e-text.

I will check if they are on PG. If not they will take some time so they will probably not make it to the CD. I want to complete the CD this month.

The CD will be updated so this is not much of a problem. The stories of Campbell and E E Smith are in the DP queue. As soon as they are available i can easily create a new image.
--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 6 20:42:11 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 07 Feb 2007 05:42:11 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <45C8F8F6.5030303@perathoner.de>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <45C8F8F6.5030303@perathoner.de>
Message-ID: <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com>

On Tue, 06 Feb 2007 22:53:58 +0100, you wrote:

>Yuck! Don't use too many fonts. A rule of thumb is max. 2 different
>fonts and max. 5 different sizes on one page. Italic and bold count as
>new font.

Yes i know. This is why i think a bookshelf image is better than the text "Bookshelf".

Where can i upload images or PDFs so you can criticise the design? Can i use the PG Wiki for that?

>How about just Textura and Futura?

We will see. The design should be adaptable for other bookshelves.

--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 6 20:43:16 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 07 Feb 2007 05:43:16 +0100
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <1170790402.21136.9.camel@localhost.localdomain>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <1170790402.21136.9.camel@localhost.localdomain>
Message-ID:

On Tue, 06 Feb 2007 14:33:22 -0500, you wrote:

>Were these the SF books already in PG? Or some other external books?

Strictly the PG books. No audio files or we would be well over a gigabyte.
--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From joey at joeysmith.com Tue Feb 6 23:23:09 2007
From: joey at joeysmith.com (joey)
Date: Wed, 7 Feb 2007 00:23:09 -0700
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <45C918F6.7010901@perathoner.de>
References: <1344868262.20070206164244@noring.name> <45C918F6.7010901@perathoner.de>
Message-ID: <20070207072309.GA23797@joeysmith.com>

On Wed, Feb 07, 2007 at 01:10:30AM +0100, Marcello Perathoner wrote:

> cd /public/ftp/pub/docs/books/gutenberg
> find . -name '*.txt' | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'

Marcello:
More or less exactly what I was going to do. Does this mean you've already done that on the production version of the library?

From marcello at perathoner.de Wed Feb 7 04:14:59 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 07 Feb 2007 13:14:59 +0100
Subject: [gutvol-d] "Web 2.0 ... The Machine is Us/ing Us" (Digital Text, TNG)
In-Reply-To:
References:
Message-ID: <45C9C2C3.4020402@perathoner.de>

Bowerbird at aol.com wrote:

> nah. anything you can do with x.m.l., i can do easier without it.

Boy! You really should have shown that to Bill Gates. He'd have bought your ZML instead of developing XPS.

Oh! It just occurred to me, you just want Bill to waste a bit more of his time with XML? Can't wait to see Bill's face when you finally release ZML and kill off XPS (and PDF too) in one big swoop.

--
Marcello Perathoner
webmaster at gutenberg.org

From desrod at gnu-designs.com Wed Feb 7 04:13:42 2007
From: desrod at gnu-designs.com (David A. Desrosiers)
Date: Wed, 07 Feb 2007 07:13:42 -0500
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <45C918F6.7010901@perathoner.de>
References: <1344868262.20070206164244@noring.name> <45C918F6.7010901@perathoner.de>
Message-ID: <1170850422.5718.9.camel@localhost.localdomain>

> Look! No hands!
>
> cd /public/ftp/pub/docs/books/gutenberg
> cat list-of-files | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'
>
> or, better, do all files:
>
> cd /public/ftp/pub/docs/books/gutenberg
> find . -name '*.txt' | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'

In the interest of TMTOWTDI and minimizing errors on various platforms, I find that the suggestions in this Perlmonks node might be helpful.

http://www.perlmonks.org/index.pl?node_id=595426

--
David A. Desrosiers
desrod at gnu-designs.com
Skype username: setuid
http://gnu-designs.com

"The palest ink is better than the most retentive memory." - Old Chinese Proverb

From marcello at perathoner.de Wed Feb 7 04:22:12 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 07 Feb 2007 13:22:12 +0100
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <20070207072309.GA23797@joeysmith.com>
References: <1344868262.20070206164244@noring.name> <45C918F6.7010901@perathoner.de> <20070207072309.GA23797@joeysmith.com>
Message-ID: <45C9C474.2060406@perathoner.de>

joey wrote:

> On Wed, Feb 07, 2007 at 01:10:30AM +0100, Marcello Perathoner wrote:
>> cd /public/ftp/pub/docs/books/gutenberg
>> find . -name '*.txt' | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'
>
> Marcello:
> More or less exactly what I was going to do. Does this mean you've already done that
> on the production version of the library?

No. This is a WWers job.

The bureaucratically correct way to do this is to copy the files on pglaf.org, fix them with perl, and re-push them. That way they will be fixed in the other repositories (Internet Archive) too.
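
For anyone who would rather look before rewriting anything in place, here is a minimal sketch of the same substitution as the Perl one-liner quoted above, written in Python rather than Perl so it can be checked on a sample first. The names `newline_styles` and `normalize_newlines` and the sample bytes are invented for illustration; nothing here is part of any PG tooling, and the choice of "\r\n" as the default target simply mirrors the one-liner, not a settled policy.

```python
import re

# Matches one newline sequence. "\r\n" is listed first so that a CR+LF
# pair is consumed as a single line ending rather than as two.
NEWLINE = re.compile(rb"\r\n|\r|\n")


def newline_styles(data):
    """Return the set of distinct newline sequences found in the bytes."""
    return set(NEWLINE.findall(data))


def normalize_newlines(data, target=b"\r\n"):
    """Return a copy of the bytes with every newline replaced by target."""
    # A callable replacement keeps target from being parsed as a template.
    return NEWLINE.sub(lambda match: target, data)


# Hypothetical sample containing all three styles in one "file".
sample = b"one\r\ntwo\nthree\rfour"
styles = newline_styles(sample)      # more than one entry means "mixed"
fixed = normalize_newlines(sample)   # every line ending becomes CR+LF
```

Working on bytes read with `open(path, "rb")` sidesteps any question about the character encoding of a given e-text, and a file is "mixed" exactly when `newline_styles` returns more than one entry, so the check can be run across the library without modifying a single file.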
-- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Wed Feb 7 04:23:43 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed, 07 Feb 2007 13:23:43 +0100 Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD In-Reply-To: <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com> References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <45C8F8F6.5030303@perathoner.de> <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com> Message-ID: <45C9C4CF.8040304@perathoner.de> Robert Marquardt wrote: > Where can i upload images or PDFs so you can critise the design? > Can i use the PG Wiki for that? Of course. Start a discussion on a subpage of the SF bookshelf or on your user page. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Wed Feb 7 05:08:27 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 08:08:27 EST Subject: [gutvol-d] =?iso-8859-1?q?howto=3A_unwrap_the_paragraphs_in_a_pro?= =?iso-8859-1?q?ject_gutenberg=A0_e-text?= Message-ID: don't overthink things. joey tells us a mere 52 files needed editing. do the edit and push the change to those 52 e-texts. all standard operating policy... you changelog it as newline standardization, a diagnosis that diff analysis quickly confirms. end of concern, nothing to worry about, aflack! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/ffc680a6/attachment.htm From marcello at perathoner.de Wed Feb 7 05:13:50 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed, 07 Feb 2007 14:13:50 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <1170850422.5718.9.camel@localhost.localdomain> References: <1344868262.20070206164244@noring.name> <45C918F6.7010901@perathoner.de> <1170850422.5718.9.camel@localhost.localdomain> Message-ID: <45C9D08E.8000501@perathoner.de> David A. 
Desrosiers wrote:

>> find . -name '*.txt' | xargs perl -pi -e 's/(\r\n|\n|\r)/\r\n/g'
>
> In the interest of TMTOWTDI and minimizing errors on various platforms,
> I find that the suggestions in this Perlmonks node might be helpful.
>
> http://www.perlmonks.org/index.pl?node_id=595426

Don't even think of running my code on that glorified game console boot loader!

The Right Thing would be to replace everything with just "\n". It makes no sense to still use the character combination "\r\n" that was introduced solely because the common teletype device in use 40 years ago was too slow to do a carriage return in the time of one character sent at 110 baud.

--
Marcello Perathoner
webmaster at gutenberg.org

From Bowerbird at AOL.COM Wed Feb 7 05:46:00 2007
From: Bowerbird at AOL.COM (Bowerbird at AOL.COM)
Date: Wed, 7 Feb 2007 08:46:00 EST
Subject: [gutvol-d] =?iso-8859-1?q?howto=3A_unwrap_the_paragraphs_in_a_pro?= =?iso-8859-1?q?ject_gutenberg=A0_e-text?=
Message-ID:

david said:
> In the interest of TMTOWTDI and minimizing errors on various platforms

adroitly put...

> I find that the suggestions in this Perlmonks node might be helpful.

...and perlmonks i like. so i went there looking to learn something.

sorry, learned nothing. they're talking about how to work with data you read in. well, yes, i can assure you, having just learned that lesson -- which is how i graduated out of perl first grade now -- thoroughly, you _do_ wanna normalize the newlines when your program is dealing with chunks of text. it's too hard to code defensively against all three types of newlines, so you normalize and then program against the one newline.

but the discussion here now is about how you _store_ the text, with what kind of linebreak. one suggestion was for the [cr+lf] combo, but i myself favor [linefeed]. to me, it's simple and it makes sense, so it makes sense.

first -- two characters is stupid, one is better anyway; and as you're on a linux server, you use linux newline. end of story.
that's all she wrote. take it to the bank...

(and you know when marcello and bowerbird agree...)

so run it on the whole library; don't even changelog it. "if it was different than this, it was wrong, just forget it."

-bowerbird

From greg at durendal.org Wed Feb 7 05:23:24 2007
From: greg at durendal.org (Greg Weeks)
Date: Wed, 7 Feb 2007 08:23:24 -0500 (EST)
Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD
In-Reply-To: <8jlis292inea3gbulm4209im2af65meejb@4ax.com>
References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <8jlis292inea3gbulm4209im2af65meejb@4ax.com>
Message-ID:

On Wed, 7 Feb 2007, Robert Marquardt wrote:

> On Tue, 6 Feb 2007 17:41:13 -0500 (EST), you wrote:
>
>> I wish I could. They are in PPers hands. (and have been for a long time)
>
> I will make a polite request on the DP forum.

Polite requests can take you far :-) I did pass on your request by private PM also.

--
Greg Weeks
http://durendal.org:8080/greg/

From jon at noring.name Wed Feb 7 07:23:07 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 7 Feb 2007 08:23:07 -0700
Subject: [gutvol-d] =?windows-1252?q?howto=3A_unwrap_the_paragraphs_in_a_p?= =?windows-1252?q?roject_gutenberg=A0_e-text?=
In-Reply-To:
References:
Message-ID: <521676807.20070207082307@noring.name>

First, to summarize what I write below, if PG is to settle upon one "newline" standard for plain text files, it should be "CR+LF" since otherwise Windows users using the default Notepad (which is the majority of all users in the western world) will find PG texts essentially unreadable...

Bowerbird wrote:
> but the discussion here now is about how you _store_
> the text, with what kind of linebreak. one suggestion
> was for the [cr+lf] combo, but i myself favor [linefeed].
> to me, it's simple and it makes sense, so it makes sense.
For those of us who lived back in the days of line printers, a carriage return simply returned the carriage, while a line feed advanced the line. Both were required to create a "newline". I'm not saying we should mimic what was done years and years ago, but there is a logic behind the CR+LF characters. To me, the most important thing is if the text document is readable using Windows Notepad. I know Bowerbird hates Windows with a passion, but the reality is that the vast majority of authors and publishers use Windows systems, and if they were to process text (whether plain like ZML or marked up like XML), many will use the venerable Notepad (which is now Unicode compliant.) So I did an experiment where I generated an ASCII text file using only a LF for a newline character, and Notepad for Windows XP did not create a newline at the spot where the LF character occurred (only the recognized "box" character). If PG was to settle upon one standard for newlines, even though using only LF to indicate a newline is "politically correct" (since this is what Linux uses, and as a matter of fact so does Mac OS X, but not previous versions of Mac OS -- see Wikipedia's "Newline" article), I think that CR+LF makes more sense for the time being. We have to remember that the average user does not understand subtle issues such as "newline characters". The vast majority use Windows, and the vast majority of them have Notepad as their default text editor, and if they open up a PG *.txt file on their system, and see no line breaks (only boxes), that renders the plain text unreadable on their system. These users just expect it "to work." I don't believe Michael Hart would want this. So settle upon CR+LF for the present, or provide two text versions and explain which version should be used for direct reading depending upon the user's system. Now, shall we talk about UTF-8 (for western texts) versus ISO-8859-*? 
Jon Noring

From jon at noring.name Wed Feb 7 07:41:35 2007
From: jon at noring.name (Jon Noring)
Date: Wed, 7 Feb 2007 08:41:35 -0700
Subject: [gutvol-d] "Web 2.0 ... The Machine is Us/ing Us" (Digital Text, TNG)
In-Reply-To: <45C9C2C3.4020402@perathoner.de>
References: <45C9C2C3.4020402@perathoner.de>
Message-ID: <224575929.20070207084135@noring.name>

Marcello wrote:
> Bowerbird at aol.com wrote:
>> nah. anything you can do with x.m.l., i can do easier without it.
> Boy! You really should have shown that to Bill Gates. He'd have bought
> your ZML instead of developing XPS.
>
> Oh! It just occurred to me, you just want Bill to waste a bit more of his
> time with XML?
>
> Can't wait to see Bill's face when you finally release ZML and kill off
> XPS (and PDF too) in one big swoop.

Another touché.

When one looks at the XML landscape, one sees that XML has become so ubiquitous, it is amazing. This is because XML can do so many things. Industry has settled upon XML in a big way. In banking, billions of dollars *every day* are exchanged using XML to represent the data in some parts of the chain.

In addition, as that YouTube video shows, we have to consider etexts of books as more than just linear etexts which mimic the ink-on-paper world. These etexts are actually data which will enable a whole host of digital interactivity with texts -- to integrate these etexts into everyday human life in ways that could not be done with ink-on-paper.

The original PG "paradigm" simply emulated ink-on-paper for narrative reading purposes. But today we see many more ways that etexts can be used which increase their value. These new ways add requirements which were not considered important in 1985-95 when PG matured. The sand is shifting and hopefully PG will evolve to meet these new opportunities.
I see XML playing a major role in this new paradigm, as does Marcello, as does DP, as does Greg, as does most everyone else interested in digitizing the Public Domain, except for one person who decries XML for any purpose, despite its resounding success. Reminds me of the saying: "The dog barks, but the parade marches on." Jon Noring From jon at noring.name Wed Feb 7 08:42:30 2007 From: jon at noring.name (Jon Noring) Date: Wed, 7 Feb 2007 09:42:30 -0700 Subject: [gutvol-d] Wikipedia article on "newline" in text files Message-ID: <949038773.20070207094230@noring.name> As part of this discussion about what to use for "newline" characters in PG's plain text files, refer to the Wikipedia article: http://en.wikipedia.org/wiki/Newline Lots of good insights as to why things are the way they are, and the limitations of whatever "newline markup" is chosen. If a significant number of people use Windows Notepad as the default way to read PG texts, then this strongly supports using CR+LF as the normal way to represent newlines in all PG plain text documents, at least for the immediate future. It is always possible to run a Perl or PHP script (as Marcello demonstrated) to change the default newline characters to LF. Jon Noring From desrod at gnu-designs.com Wed Feb 7 08:46:50 2007 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Wed, 07 Feb 2007 11:46:50 -0500 Subject: [gutvol-d] Wikipedia article on "newline" in text files In-Reply-To: <949038773.20070207094230@noring.name> References: <949038773.20070207094230@noring.name> Message-ID: <1170866810.5718.13.camel@localhost.localdomain> On Wed, 2007-02-07 at 09:42 -0700, Jon Noring wrote: > If a significant number of people use Windows Notepad as the default > way to read PG texts, then this strongly supports using CR+LF as the > normal way to represent newlines in all PG plain text documents, at > least for the immediate future. 
It is always possible to run a Perl or > PHP script (as Marcello demonstrated) to change the default newline > characters to LF. And ironically, even DOS edit and Wordpad DTRT.. (Do The Right Thing). Notepad is the odd cousin here and probably shouldn't be the standard. -- David A. Desrosiers desrod at gnu-designs.com Skype username: setuid http://gnu-designs.com ?The palest ink is better than the most retentive memory.? - Old Chinese Proverb -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/bd98ec1c/attachment.pgp From traverso at dm.unipi.it Wed Feb 7 09:25:12 2007 From: traverso at dm.unipi.it (Carlo Traverso) Date: Wed, 7 Feb 2007 18:25:12 +0100 Subject: [gutvol-d] Wikipedia article on "newline" in text files In-Reply-To: <1170866810.5718.13.camel@localhost.localdomain> (desrod@gnu-designs.com) References: <949038773.20070207094230@noring.name> <1170866810.5718.13.camel@localhost.localdomain> Message-ID: <200702071725.l17HPCE22600@pico.dm.unipi.it> There is another good reason to stick to CRLF: if we convert all the library to another standard, all the txt files and the corresponding zips are changed, with much joy of all the mirrors that suddenly have to refresh simultaneously the whole archive downloading some 100GB each. And of course of the server having to serve everything. And of course every serious application is not bothered by the varying newlines and can handle them seamlessly, so the change would only be for the sake of the change. 
Carlo From robert_marquardt at gmx.de Wed Feb 7 10:02:07 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Wed, 07 Feb 2007 19:02:07 +0100 Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD In-Reply-To: <45C9C4CF.8040304@perathoner.de> References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <45C8F8F6.5030303@perathoner.de> <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com> <45C9C4CF.8040304@perathoner.de> Message-ID: <405ks25jal42tbi8kg91ee87bcjfou8lvu@4ax.com> On Wed, 07 Feb 2007 13:23:43 +0100, you wrote: >Of course. Start a discussion on a subpage of the SF bookshelf or on >your user page. See my talk page. -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From robert_marquardt at gmx.de Wed Feb 7 10:17:49 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Wed, 07 Feb 2007 19:17:49 +0100 Subject: [gutvol-d] what you cannot do with ebooks Message-ID: http://www.sublackwell.co.uk/bookcut/index.htm definitely breathtaking -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From ricardofdiogo at gmail.com Wed Feb 7 10:26:38 2007 From: ricardofdiogo at gmail.com (Ricardo F Diogo) Date: Wed, 7 Feb 2007 18:26:38 +0000 Subject: [gutvol-d] what you cannot do with ebooks In-Reply-To: References: Message-ID: <9c6138c50702071026y7f1a79bdjb9dd5b3152847092@mail.gmail.com> You have to read Italo Calvino's (1923-1985) _If on a winter's night a traveler_. By the end you'll notice why. 2007/2/7, Robert Marquardt : > > http://www.sublackwell.co.uk/bookcut/index.htm > > definitely breathtaking > -- > Robert Marquardt (Team JEDI) http://delphi-jedi.org > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/8b4c4f1e/attachment.htm

From sly at victoria.tc.ca Wed Feb 7 10:37:30 2007
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed, 7 Feb 2007 10:37:30 -0800 (PST)
Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To:
References:
Message-ID:

bb said:
>could you elaborate on those experiences please?
>i'd like to see if i might be able to learn from them,
>and -- if so -- what it is exactly that i would learn.

Ok, I'll try.

One of my little peeves has to do with representation of emdashes. The usual PG standard for English-language files is to represent them in plain text as two dashes--with no space on either side.

I have seen in texts a small, but still significant, number of places where an emdash is used as the end of a sentence in place of a period to indicate interrupted or trailing off speech-- As I have just done here, there is a space after this. However, I have seen a decent number of times where someone runs an automatic check over the file and then "fixes" the spacing around the dashes.

===

Jon Noring wrote:
>First, to summarize what I write below, if PG is to settle upon one
>"newline" standard for plain text files, it should be "CR+LF" since

Jon, you must have been around here long enough to know that that is already the case. I would say that anything else is an unintentional anomaly.

===

In the case of the new line characters under discussion, I would suggest that one single command line to change them all could have unintended side effects. In some places, an extra LF should perhaps be removed altogether, and in some places, perhaps, it would indicate another blank line.

So I would look more closely at the context of individual instances before making changes.
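One way to look at individual instances before changing anything is to count which newline styles a file actually contains and note where the odd ones sit. What follows is a rough sketch under stated assumptions, not an existing PG tool: `newline_report` and the shape of what it returns are invented for this example, and "odd" simply means "different from the most common style in that file".

```python
import re
from collections import Counter

# CR+LF must be tried before a lone CR so a pair counts as one newline.
NEWLINE = re.compile(rb"\r\n|\r|\n")
NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def newline_report(data: bytes):
    # Hypothetical helper for illustration only.
    # Returns (counts, odd): counts maps "CRLF"/"CR"/"LF" to how often
    # each style occurs; odd lists (line_number, style) for every
    # newline that differs from the file's most common style.
    found = [NAMES[match.group()] for match in NEWLINE.finditer(data)]
    counts = Counter(found)
    if not counts:
        return counts, []
    majority = counts.most_common(1)[0][0]
    # The i-th newline in the file terminates line number i.
    odd = [(line, style)
           for line, style in enumerate(found, start=1)
           if style != majority]
    return counts, odd
```

A file whose `odd` list comes back empty is already consistent; anything else is a candidate for a human look at the listed lines before a global substitution is run.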
Andrew From Bowerbird at aol.com Wed Feb 7 11:07:23 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 14:07:23 EST Subject: [gutvol-d] Wikipedia article on "newline" in text files Message-ID: carlo said: > There is another good reason to stick to CRLF: > if we convert all the library to another standard, > all the txt files and the corresponding zips are changed, > with much joy of all the mirrors that suddenly have to > refresh simultaneously the whole archive downloading > some 100GB each. And of course of the server having to > serve everything. you're taking this a bit further than had been discussed so far, and then arguing that this advanced step would be a bad idea... maybe it would, but the point is that no one suggested _that_... marcello showed how it could be done, with a one-line script, but he explicitly said that he hadn't done it, that it was a ww job. and i specifically said this only applies to joey's 52 files, and that those files -- and only those files -- should now be pushed new. the question is, if we normalize a file to a consistent newline, which one should it be? maybe the answer should be "cr/lf". maybe "cr". maybe "lf". and perhaps the answer should be "it doesn't matter one bit, just so long as it's entirely consistent in each individual file." or heck, maybe the answer should be "it doesn't matter at all, even if the newlines _within_ one file aren't even consistent." (which would mean we'd just leave those 52 files as they are.) but that's the question under concern, not a wholesale change. all that having been said, however, from _my_ best perspective, it wouldn't be a bad thing if such a wholesale change was made. and yes, i believe you could make that wholesale change and then _restore_ the timedate stamp to each file to what it was previously, so there would be no "unnecessary" thrashing of the mirrors. 
so this is just another one of those "superstitious" arguments that we can't do action x because of reason y, when reason y is tweakable. > And of course every serious application is not bothered by > the varying newlines and can handle them seamlessly, so > the change would only be for the sake of the change. well, first, yes, every serious application does normalize newlines. and it will continue to do so into the future, as a consistency check, even if the project gutenberg e-texts are normalized to a standard. most especially us mac programmers have programmed defensively in regard to newlines for over two decades now, and i'd be guessing that many of us normalize any data to our own "cr" newline character. linux programmers probably normalize to their "lf" newline character. sometimes i even adopt some weird character (not present in the file) and normalize to _that_ as my newline character, just to wash my code of any presumption of what the newline character is, for better clarity... ergo, any wholesale change would _not_ be done for the programmers. it would be done because someone realizes that there should be some _consistency_ across the library, and acts to bring about that consistency. you seem to be laboring under the impression that the library has a consistent standard now -- cr/lf -- and that's simply not the case. some files use one newline, others use another. and, as i'd reported, and joey's list confirms, there are even files that _mix_ their newlines. (my impression at the time was someone had edited a cr/lf file with a linux editor, so the edited portion was "missing" all its "cr" characters.) is this the worst inconsistency in the library. heck no, not by a longshot. but it's _one_ inconsistency, and a policy to remove the inconsistencies could do worse than to start with the elimination of such a basic one... in the meantime, please make sure those 52 files get done right away... 
normalize the newlines to anything you want, but please normalize 'em. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/84ba0305/attachment.htm From Bowerbird at aol.com Wed Feb 7 11:27:43 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 14:27:43 EST Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: andrew said: > One of my little peeves has to do with representation of emdashes. funny you should say that, because it's also one of mine. > The usual PG standard for English-language files is to represent > them in plain text as two dashes--with no space on either side. which i disagree with, because of an extremely good (bad?) reason. when you do that, you essentially splice two different words together. (when, typically, the dash is supposed to mean they are _separated_.) that means your line rewrapping code needs to rip them apart, and that's asking that rewrap function to do more than it should have to. typesetters avoid this problem typically by placing the smallest of spaces on either side of the em-dash, so a paragraphing routine will break there if either one of them is the "best" place to break. since we don't have "tiny" spaces in an ascii file, we need to use regular ones, but that's ok, because -- in practice -- an em-dash _will_ have a space on each side. > I have seen in texts, a small, but still significant number of places > where an emdash is used as the end of a sentence in place of a period > to indicate interrupted or trailing off speech-- As I have just done here, > there is a space after this. However, I have seen a decent number > of times where someone runs an automatic check over the file and > then "fixes" the spacing around the dashes. 
if you mean they remove the space after the em-dash, and thereby run the two sentences together, i firmly agree that they should not. so yes, i think it's pretty clear that such a global change not be made. but that's an easily foreseeable undesired consequence. can you name some global change -- _across_documents_ -- that caused some undesired consequence that had not been anticipated? > Jon, you must have been around here long enough to know > that that is already the case. I would say that anything else > is an unintentional anomoly. it's _not_ the case that all the e-texts use cr/lf to represent a newline. if that is the _policy_, then there are a _lot_ of "unintentional anomalies". *** > In the case of the new line characters under discussion, I would suggest > that one single command line to change them all could have unintended > side effects. In some places, an extra LF should perhaps be removed > altogether, and in some places, perhaps, it would indicate another > blank line. > So I would look more closely at the context of individual instances > before making changes. well, i _have_ "looked more closely at the context of individual instances" -- remember, i am the person who first pointed out these instances -- and i say they _can_ be changed globally without undesired consequence. so andrew, i suggest you take a close look at the changes made in those 52 files, and tell us whether you can find anything of which i was unaware. seems like a golden opportunity to _prove_ me wrong. (or prove you wrong.) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/34b85e04/attachment.htm From Bowerbird at aol.com Wed Feb 7 11:45:41 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 14:45:41 EST Subject: [gutvol-d] =?iso-8859-1?q?howto=3A_unwrap_the_paragraphs_in_a_pro?= =?iso-8859-1?q?ject_gutenberg=A0_e-text?= Message-ID: david said: > Did you actually READ the replies? > Specifically those made by daveorg there? > He gives a detailed reason > why using \r\n is bad in a Perl regex. do you mean the suggestion to use the numeric codes? that's a strategy i use a lot, as i find the "escape-character" stuff to be unnecessarily bewildering too much of the time. by the way, the actual question that was asked there is somewhat nonstandard, as the person had files where blank lines were represented with a solitary linefeed while nonblank lines were followed by a cr/lf combo. so the specific regex needed to solve _that_ problem was not the type of one that applies to our situation... on the other hand, the right workflow would solve both our problems and the one that provoked the question: 1. globally change cr/lf to magic character. 2. globally change cr to magic character. 3. globally change lf to magic character. 4. change magic character to desired newline. and yes, the numeric codes solve a lot of confusion -- on _my_ part -- so it doesn't surprise me a bit that it also can save some confusion for the machine. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/ccf0ffd1/attachment-0001.htm From Bowerbird at aol.com Wed Feb 7 11:51:57 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 14:51:57 EST Subject: [gutvol-d] "Web 2.0 ... The Machine is Us/ing Us" (Digital Text, TNG) Message-ID: to think a _bloated_ file-format like x.m.l. gives us _flexibility_ is pure plain _silly_. programs enable us. 
file-formats are a boring excuse for technoids to pontificate.

-bowerbird

From Bowerbird at aol.com Wed Feb 7 12:14:26 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 7 Feb 2007 15:14:26 EST
Subject: [gutvol-d] =?windows-1252?q?howto=3A_unwrap_the_paragraphs_in_a_p?= =?windows-1252?q?roject_gutenberg=A0_e-text?=
Message-ID:

jon noring said:
> there is a logic behind the CR+LF characters.

the "logic" of windows. make us eat puke. notepad? ha!

-bowerbird

p.s. love how your p.c. munged the subject. how appropriate.

From marcello at perathoner.de Wed Feb 7 12:42:32 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 07 Feb 2007 21:42:32 +0100
Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text
In-Reply-To: <521676807.20070207082307@noring.name>
References: <521676807.20070207082307@noring.name>
Message-ID: <45CA39B8.8030405@perathoner.de>

Jon Noring wrote:

> I'm not saying we should mimic what was done years and years ago, but
> there is a logic behind the CR+LF characters.

There's a logic behind war too ...

Don't let the circumstance that one glorified boot loader was designed by monkeys influence your judgement.

> To me, the most important thing is if the text document is readable
> using Windows Notepad.

To me the most important thing is that the text displays in a browser. All browsers handle LF just fine.

Did you do any research whatsoever to corroborate your claim that a substantial percentage of PG users switch from the browser to notepad to read their texts?
-- Marcello Perathoner webmaster at gutenberg.org From jon at noring.name Wed Feb 7 12:59:54 2007 From: jon at noring.name (Jon Noring) Date: Wed, 7 Feb 2007 13:59:54 -0700 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CA39B8.8030405@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> Message-ID: <178078053.20070207135954@noring.name> Marcello wrote: > Jon Noring wrote: >> To me, the most important thing is if the text document is readable >> using Windows Notepad. > To me the most important thing is that the text displays in a browser. > All browsers handle LF just fine. Yes, on Windows: WordPad, Word, and web browsers recognize LF-only as a newline. Only Notepad is the exception. > Did you any research whatsoever to corroborate your claim that a > substantial percentage of PG users switch from the browser to notepad to > read their texts? Nope, just a guess. When someone downloads a *.txt file into Windows (which is 80-85% of all personal computers, not including handhelds), and then clicks on it directly in File Explorer, the Windows default application for text is Notepad (it is also the default "source" viewer for Internet Explorer.) Since the vast majority of Windows users don't tweak their default application settings for things like text files, I am confident that locally read PG plain text files are largely read using Notepad. So the question becomes how common it is for Windows users to download and read the plain text versions? Jon Noring From jon at noring.name Wed Feb 7 13:18:03 2007 From: jon at noring.name (Jon Noring) Date: Wed, 7 Feb 2007 14:18:03 -0700 Subject: [gutvol-d] =?windows-1252?q?howto=3A_unwrap_the_paragraphs_in_a_p?= =?windows-1252?q?roject_gutenberg=A0_e-text?= In-Reply-To: References: Message-ID: <1356736137.20070207141803@noring.name> Bowerbird wrote: > jon noring said: >>?? there is a logic behind the CR+LF characters. 
> the "logic" of windows. make us eat puke. notepad? ha!

*shrug*. As noted in my prior message, we have the following facts:

1) Windows defaults to Notepad to read all local *.txt files. (And IE, for viewing the source of all web documents, also defaults to Notepad, so this impacts on viewing the source of HTML formatted PG texts, too.)

2) For PC-based computers, Windows is the overwhelmingly dominant OS (80%?).

3) Most Windows users don't reset the default viewer for *.txt files. Most don't even know how.

Draw your own conclusion. But all I know is that Michael Hart, knowing these numbers, would take the pragmatic route and say that for now CR+LF is preferable to define newlines since this allows the most users to properly view the PG text files on virtually all platforms and applications. And Michael is big on local viewing of text files!

That's why I am cc'ing Michael on this since it is a topic that intersects with his interest in maximum platform and application support for properly rendering PG plain text files.

I'd rather see, for the long-term, that all text files (plain and XML) use only LF to define newlines, since that is what is supported in *nix and Mac OS X. But for the shorter term we have to recognize that Notepad still plays an important enough role in the reading of PG texts to stick with CR+LF.

> p.s. love how your p.c. munged the subject. how appropriate.

Not sure what happened. I looked at the other messages I sent out with the same header, and the Subject: was not munged. I did nothing different, and my email client is set for plain text. I simply hit "reply" to another message -- don't remember which one otherwise I'd check it out.
Jon Noring From Bowerbird at aol.com Wed Feb 7 13:23:34 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 16:23:34 EST Subject: [gutvol-d] =?windows-1252?q?howto=3A_unwrap_the_paragraphs_in_a_p?= =?windows-1252?q?roject_gutenberg=A0_e-text?= Message-ID: jon noring said: > the pragmatic route and say that for now CR+LF is preferable comical, Anita it?, how a monopoly makes eating puke "pragmatic"... ok, so let's eat puke! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/6fee47bb/attachment.htm From Bowerbird at aol.com Wed Feb 7 13:27:30 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 16:27:30 EST Subject: [gutvol-d] =?windows-1252?q?howto=3A_unwrap_the_paragraphs_in_a_p?= =?windows-1252?q?roject_gutenberg=A0_e-text?= Message-ID: i said: > comical, Anita it? my spellchecker changed "ain't" to "anita". silly spellchecker, you're supposed to _fix_ mistakes, not _cause_ them yourself... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/a0797c2b/attachment.htm From Bowerbird at aol.com Wed Feb 7 13:32:49 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 16:32:49 EST Subject: [gutvol-d] choosing a light-markup system Message-ID: i've been making pudding lately, so i thought i would serve some. i created a "test-suite" to show features needed in any p.g. system... it represents the types of structures that are found in the p.g. library. which means it's great material for a test of light markup systems... there are 3 horses in this race: markdown, textile, and z.m.l. 
the leader in the light markup arena is one called "markdown" -- google says 790,000 hits for "markdown, text, format" -- which is described at this u.r.l.: > http://daringfireball.net/projects/markdown/ you can try it out at its sandbox, charmingly called a "dingus": > http://daringfireball.net/projects/markdown/dingus also strong in the field is "textile" -- google actually reports more hits (1,200,000) on a search for "textile text format" -- which you can try out in their explanatory sandbox at: > http://www.textism.com/tools/textile/ so let's look at the utility and performance of the 3 systems, ok?, using that test-suite i prepared from project gutenberg e-texts... here are the 3 "input" files (open them in different tabs or windows): > http://www.greatamericannovel.com/zen/suite-textile.txt > http://www.greatamericannovel.com/zen/suite-markdown.txt > http://www.greatamericannovel.com/zen/suite-zml.txt and here are the 3 "output" files -- html for the web -- respectively: > http://www.greatamericannovel.com/zen/suite-textile.html > http://www.greatamericannovel.com/zen/suite-markdown.html > http://www.greatamericannovel.com/zen/suite-zml.html first, one small note about procedure... there _might_ be shortcomings in my textile and markdown input. those are due to the fact that i worked them only well enough for a simple comparison; so any fault is mine. i can assure you they are both good systems, capable enough of doing what p.g. needs. (i improved their .html output files to cover those shortcomings, so you will not get out the exact same .html files as i've shown. but again, given experience with these systems, i could modify the input text files so they would indeed produce the same .html.) but now, let's take a close look at those input files, ok? view the input text-files side-by-side, and you'll see both the textile and markup files have "gunk" in them -- and a considerable amount. it might be less obtrusive than heavy markup, but it's still obtrusive. 
even worse, it takes _work_ to insert that pseudo-markup into a file. on the other hand, the z.m.l. file has remarkably little "gunk" in it... (i could find ways to quantify this, but i think that it's clearly visible.) so on the input side, z.m.l. has what i consider to be a big advantage, even against the leading "lightweight markup systems" (and certainly against any heavy-markup system)... but wait!, there's more!, because... ...that advantage is _multiplied_ by the quality of the .html output. the textile and markdown .html versions are nicely serviceable, true, but the .zml makes .html with several touches of higher functionality. (if you want, i'd be happy to catalog them. but for now, i'll send this.) and when you add in the benefits of high-quality .rtf and .pdf output, the z.m.l. format -- in my humble opinion -- is the runaway winner... so go ahead, taste the pudding. see what you think. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/0ad56029/attachment.htm From desrod at gnu-designs.com Wed Feb 7 14:08:48 2007 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Wed, 07 Feb 2007 17:08:48 -0500 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CA39B8.8030405@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> Message-ID: <1170886128.5718.15.camel@localhost.localdomain> On Wed, 2007-02-07 at 21:42 +0100, Marcello Perathoner wrote: > To me the most important thing is that the text displays in a browser. > All browsers handle LF just fine. That's because, as you know, anything served in a web browser is one long line.. the line breaks, carriage returns and other elements are there just as a convenience to us humans reading the source.. but the actual protocol serves it all up as one big, long line. -- David A. 
Desrosiers desrod at gnu-designs.com Skype username: setuid http://gnu-designs.com "The palest ink is better than the most retentive memory." - Old Chinese Proverb -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/acdc3742/attachment.pgp From traverso at dm.unipi.it Wed Feb 7 16:07:49 2007 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu, 8 Feb 2007 01:07:49 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <178078053.20070207135954@noring.name> (message from Jon Noring on Wed, 7 Feb 2007 13:59:54 -0700) References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> Message-ID: <200702080007.l1807nn11274@pico.dm.unipi.it> Since I put gutvol-d in the same mail folder as gweekly, this happened to me just in the middle of our discussion on line ends: > Subject: [gweekly] Project Gutenberg Weekly Newsletter -- Week #05-2007 > Sender: gweekly-bounces at lists.pglaf.org > X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.0.1 > > pgweekly_2007_02_07.txt > The Project Gutenberg Weekly Newsletter for Wednesday, February 7, 2007 > ****eBooks Readable by Both Humans and Computers since July 4, 1971**** > > NOTE: Best viewed with a fixed-width font, i.e. Courier New. > Windows NotePad is a good a program to use for viewing. 
> From jon at noring.name Wed Feb 7 16:10:04 2007 From: jon at noring.name (Jon Noring) Date: Wed, 7 Feb 2007 17:10:04 -0700 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <200702080007.l1807nn11274@pico.dm.unipi.it> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> Message-ID: <83702198.20070207171004@noring.name> Carlo Traverso wrote: > Since I put gutvol-d in the same mail folder as gweekly, this happened > to me just in the middle of our discussion on line ends: >> NOTE: Best viewed with a fixed-width font, i.e. Courier New. >> Windows NotePad is a good a program to use for viewing. Well, it looks like PG still recommends Windows users use Notepad for the local viewing of PG plain texts. This means that PG plain texts should still use CR+LF for newlines, at least until Notepad sees the light, or there's a decision that Notepad is no longer recommended to locally view PG plain texts. Bowerbird, do you suggest that PG should recommend users not to use any Windows-based system to read PG texts? Jon Noring From Bowerbird at aol.com Wed Feb 7 20:34:01 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 7 Feb 2007 23:34:01 EST Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: jon noring said: > Bowerbird, do you suggest that PG should recommend users > not to use any Windows-based system to read PG texts? i recommend windows users "upgrade" to vista immediately; they'll get lost in complexity fog and we can forget about 'em... -bowerbird p.s. x.p. people should use wordpad -- far better than notepad. p.p.s. the newsletter suggestion was for reading the newsletter... p.p.p.s. i really fell off the one-liner bandwagon on this post! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070207/76533bc9/attachment.htm From Gutenberg9443 at aol.com Wed Feb 7 22:24:11 2007 From: Gutenberg9443 at aol.com (Gutenberg9443 at aol.com) Date: Thu, 8 Feb 2007 01:24:11 EST Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg... Message-ID: I forgot to mention that I always keep my word processing program set at "view page" so that paragraphs don't appear to go on forever. Sorry I didn't get into this discussion earlier. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/a92a3ace/attachment.htm From schultzk at uni-trier.de Thu Feb 8 01:15:57 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Thu, 8 Feb 2007 10:15:57 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <1170886128.5718.15.camel@localhost.localdomain> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <1170886128.5718.15.camel@localhost.localdomain> Message-ID: Hi All, Speaking of protocol (http/html etc): The crs and lfs are to be considered as white space !! (over simplified) No such thing as a line, just a data stream!! It is fun reading that you all mix fact fiction and purposely misinterpretate. Keith. Am 07.02.2007 um 23:08 schrieb David A. Desrosiers: > On Wed, 2007-02-07 at 21:42 +0100, Marcello Perathoner wrote: >> To me the most important thing is that the text displays in a >> browser. >> All browsers handle LF just fine. > > That's because, as you know, anything served in a web browser is one > long line.. the line breaks, carriage returns and other elements are > there just as a convenience to us humans reading the source.. but the > actual protocol serves it all up as one big, long line. > > -- > David A. 
Desrosiers > desrod at gnu-designs.com > Skype username: setuid > http://gnu-designs.com > "The palest ink is better than the most retentive memory." > - Old Chinese Proverb > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Thu Feb 8 01:20:04 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Thu, 8 Feb 2007 10:20:04 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <83702198.20070207171004@noring.name> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> Message-ID: Come Guys, Who really cares what is suggested for Windows or whatever !! Fact is that PG states that the texts are to have the cr + lf line ending. It is the definition of PG. Keith. Am 08.02.2007 um 01:10 schrieb Jon Noring: > Carlo Traverso wrote: > >> Since I put gutvol-d in the same mail folder as gweekly, this >> happened >> to me just in the middle of our discussion on line ends: > >>> NOTE: Best viewed with a fixed-width font, i.e. Courier New. >>> Windows NotePad is a good a program to use for viewing. > > Well, it looks like PG still recommends Windows users use Notepad > for the local viewing of PG plain texts. > > This means that PG plain texts should still use CR+LF for newlines, at > least until Notepad sees the light, or there's a decision that Notepad > is no longer recommended to locally view PG plain texts. > > Bowerbird, do you suggest that PG should recommend users not to use > any Windows-based system to read PG texts? 
> > Jon Noring > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Thu Feb 8 01:04:17 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Thu, 8 Feb 2007 10:04:17 +0100 Subject: [gutvol-d] =?iso-8859-1?q?howto=3A_unwrap_the_paragraphs_in_a_pro?= =?iso-8859-1?q?ject_gutenberg=A0_e-text?= In-Reply-To: References: Message-ID: <8F72C47C-B8BF-4F48-8218-AF151481959A@uni-trier.de> Tch, Tch, Tch, You should know better than that!! You have been around long enough that the cr+lf goes way back to before windows! I agree though that it has lost its meaning and windows has kept it alive beyond it usefulness!!! ;-)) Just for the record: Logic and Windows is a contradiction in itself ! Keith. Am 07.02.2007 um 21:14 schrieb Bowerbird at aol.com: > jon noring said: > > there is a logic behind the CR+LF characters. > > the "logic" of windows. make us eat puke. notepad? ha! > > -bowerbird > > p.s. love how your p.c. munged the subject. how appropriate. > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/5eec5031/attachment.htm From marcello at perathoner.de Thu Feb 8 03:53:54 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 12:53:54 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <83702198.20070207171004@noring.name> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> Message-ID: <45CB0F52.20000@perathoner.de> Jon Noring wrote: >>> NOTE: Best viewed with a fixed-width font, i.e. Courier New. >>> Windows NotePad is a good a program to use for viewing. > > Well, it looks like PG still recommends Windows users use Notepad > for the local viewing of PG plain texts. No. It looks like Michael still recommends Notepad for reading the newsletter. -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Thu Feb 8 03:59:29 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 12:59:29 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> Message-ID: <45CB10A1.3010608@perathoner.de> Schultz Keith J. wrote: > Fact is that PG states that the texts are to have the cr + lf line > ending. Could you cite chapter and verse pls? 
-- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Thu Feb 8 05:20:00 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 14:20:00 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <83702198.20070207171004@noring.name> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> Message-ID: <45CB2380.8090208@perathoner.de> Jon Noring wrote: > This means that PG plain texts should still use CR+LF for newlines, > at least until Notepad sees the light, or there's a decision that > Notepad is no longer recommended to locally view PG plain texts. I don't want to denigrate Windows users (Note for Windows users: "denigrate" means: put down) but it is stupid for a "literary archive" to encode texts following ephemeral conventions of the day. From the Volunteer's FAQ: > This section of the FAQ goes into great detail about all kinds of > formatting questions. However, looked at from a higher level, the > only real issue is that we want to render texts clearly, with > formatting that reflects the original, so that readers of the plain > text format can read them easily, and people converting them to other > formats can do so reliably. Now, some will argue that "read them easily" is more important than "converting them to other formats" because millions of people want to read and just a few want to convert. But this is a short-sighted argument. If there were a reliable way to convert to other formats, these conversions would long since have been implemented at PG, giving readers a vast choice of different formats for every PG text. 
The short-sighted decision to make a presentational format the main PG format has, on one hand made it possible to read PG texts on the ubiquitous Notepad, on the other hand made it *impossible* to port the texts to anything else. Empirical proof: many programmers have tried and none succeeded. PG's impact on the world would be much greater if PG would offer a format that allows reliable conversion and the toolchain to do that conversion online at PG or locally by any user or commercial venture. Everybody is complaining about young people not reading enough gadda gadda gadda, but why should they read, if reading means staying at home and staring at fixed fonts in Notepad? When they can easily hang around town and watch videos on their iPods? The future of PG is to get the books onto the devices people carry around, like cellphones, PDAs, iPods, car navigation systems, gameboys, PlayStation Portables, etc. Hand on the torch ... -- Marcello Perathoner webmaster at gutenberg.org From prosfilaes at gmail.com Thu Feb 8 05:30:44 2007 From: prosfilaes at gmail.com (David Starner) Date: Thu, 8 Feb 2007 07:30:44 -0600 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: <6d99d1fd0702080530t6f96f89ck99e1d83d2410f78d@mail.gmail.com> On 2/7/07, Jon Noring wrote: > I'd rather see, for the long-term, that all text files (plain and XML) > use only LF to define newlines, since that is what is supported in *nix > and Mac OS X. But for the shorter term we have to recognize that Notepad > still plays an important enough role in the reading of PG texts to stick > with CR+LF. I'd rather see us stick with CR+LF. A CR+LF is likely to look fine on a program designed to handle LF newlines; the worst case common scenarios is that there's some sort of mark at the end of lines, or that the text is double spaced. 
Even in the future, CR+LF files are going to be common enough that all text viewers used by users not savvy enough to fix line ends is going to read CR+LF. As for XML, I think enforcing any one line-end system is pointless. It doesn't matter whether a XML file has CR+LF line ends or LF line ends, and it's probably not a great idea to open the XML file in Notepad anyway. From prosfilaes at gmail.com Thu Feb 8 05:43:01 2007 From: prosfilaes at gmail.com (David Starner) Date: Thu, 8 Feb 2007 07:43:01 -0600 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB2380.8090208@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> Message-ID: <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> On 2/8/07, Marcello Perathoner wrote: > Jon Noring wrote: > > > This means that PG plain texts should still use CR+LF for newlines, > > at least until Notepad sees the light, or there's a decision that > > Notepad is no longer recommended to locally view PG plain texts. > > I don't want to denigrate Windows users (Note for Windows users: > "denigrate" means: put down) but it is stupid for a "literary > archive" to encode texts following ephemeral conventions of the day. Thanks, Marcello, for that point of friendliness towards all our volunteers, no matter what system they may be using. This is a single completely arbitrary choice. The only real difference is whether Windows users (that is, the majority) can read the text files in the default tool on their system. Or that would be the only real difference, if this was a de novo choice. In fact, our choices are, have an inconsistent archive, change a few files to get a consistent CR+LF collection, or change twenty thousand files to get a consistent LF collection. 
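[Editorial illustration, not part of the original message.] As a rough sketch of what making the collection consistent would involve, the Python below rewrites a byte string to a single newline style. It is not PG tooling: the function name `normalize_newlines` and the CR+LF default are assumptions made for this example. It folds CR+LF first and lone CR second, since some e-texts mix several newline types and folding in the wrong order would double-space them.

```python
def normalize_newlines(data: bytes, target: bytes = b"\r\n") -> bytes:
    """Return data with every CR+LF, lone CR, or lone LF replaced by target."""
    # Fold CR+LF first so the pair is not counted as two line breaks.
    unified = data.replace(b"\r\n", b"\n")
    # Then fold any remaining lone CR (old Mac style) to LF.
    unified = unified.replace(b"\r", b"\n")
    # Finally expand the single internal form to the requested style.
    return unified.replace(b"\n", target)


if __name__ == "__main__":
    mixed = b"one\r\ntwo\rthree\nfour"
    print(normalize_newlines(mixed))         # CR+LF everywhere
    print(normalize_newlines(mixed, b"\n"))  # LF everywhere
```

Working on bytes rather than decoded text avoids having to know each file's character encoding, which seems the safer choice for an archive with mixed encodings; passing `b"\n"` as the target gives the LF-only variant instead.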
> The short-sighted decision to make a presentational format the main PG > format has, on one hand made it possible to read PG texts on the > ubiquitous Notepad, on the other hand made it *impossible* to port the > texts to anything else. Empirical proof: many programmers have tried and > none succeeded. Which has absolutely nothing to do with CR+LF versus LF. From jon at noring.name Thu Feb 8 06:18:35 2007 From: jon at noring.name (Jon Noring) Date: Thu, 8 Feb 2007 07:18:35 -0700 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB2380.8090208@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> Message-ID: <1909657568.20070208071835@noring.name> Marcello wrote: > PG's impact on the world would be much greater if PG would offer a > format that allows reliable conversion and the toolchain to do that > conversion online at PG or locally by any user or commercial venture. No argument from me. I've noted many times that PG's approach should be to build a "digital master", which itself need not be directly readable (but should be "understandable" in its native form), and then from that derive the "format of the week." That is, digital mastering should be decoupled from the reading editions. (To make it clear, this is where PG is heading by the work of Marcello and DP, but there's a huge amount of legacy PG content which follows the old paradigm. And PG itself has not committed to the digital master approach.) Basing the digital master on TEI is a smart move. Still supporting ASCII plain text as a digital master is no longer a "good thing". If one is to support plain text as some sort of master, even ZML, at least require UTF-8 or UTF-16. Unicode support is no longer an issue on most platforms, thus limiting oneself to ASCII is ludicrous. 
We no longer live in 1995 when this was a major issue. (I remember arguing with Michael Hart way back in 1996 on the eBook-List about the need to preserve the non-ASCII characters found in most Public Domain texts; the approach back then, and still "approved" in the PG guidelines today, is to allow ASCIIzation, which is no longer needed nor should it even be tolerated. This issue is now resolved.) > Everybody is complaining about young people not reading enough gadda > gadda gadda, but why should they read, if reading means staying at home > and staring at fixed fonts in Notepad? When they can easily hang around > town and watch videos on their iPods? Hey, again no argument from me. But as long as PG supports plain text intended for native reading, then the Notepad issue rears its ugly head. Now if PG says we'll go with LF for newlines, for both plain texts and XML documents, then it has to explicitly address the issue of Notepad, such as saying "sorry, if you want to natively read this plain text file in Windows, use either WordPad or your web browser, or some third party text editor -- the default Notepad won't work because..." One benefit of a rich digital master system (like TEI with full Unicode support) is to allow end-users who want plain text to have it their way vis-a-vis newline characters. (Other choices include text encoding, such as ASCII, ISO-8859 and even EBCDIC, with possible losses of character fidelity, and UTF-8/16/32; another option can be text line-lengths and whether lines are even broken at all within paragraphs.) I'm thinking of something akin to the TEI "pizza chef" (not the same thing, but I like being able to "order" your text based on a checklist.) > The future of PG is to get the books onto the devices people carry > around, like cellphones, PDAs, iPods, car navigation systems, gameboys, > PlayStation Portables, etc. 
It's also, in my opinion, to enable a host of features beyond simple narrative reading, which is the limitation with the plain text approach. Again, PG should not be stuck in "1990-think". Jon Noring From jon at noring.name Thu Feb 8 06:28:22 2007 From: jon at noring.name (Jon Noring) Date: Thu, 8 Feb 2007 07:28:22 -0700 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CB0F52.20000@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> Message-ID: <1203301105.20070208072822@noring.name> Marcello wrote: > Jon Noring wrote: >>>> NOTE: Best viewed with a fixed-width font, i.e. Courier New. >>>> Windows NotePad is a good a program to use for viewing. >> Well, it looks like PG still recommends Windows users use Notepad >> for the local viewing of PG plain texts. > No. It looks like Michael still recommends Notepad for reading the > newsletter. I stand corrected. However, if Michael recommends this for his newsletter, would he also recommend (or suggest) using Notepad for reading PG plain texts on Windows systems? Were PG to institute a policy that all plain texts in the PG archive are to have LF (and not CR+LF) to identify new lines, this would preclude the use of Windows Notepad to read them. Hopefully Michael and/or Greg will weigh in on this topic since it does suggest something to discuss in the guidelines. (If not, PG may find more and more plain texts submitted to the archive which use LF and thus will be unreadable in Notepad -- not exactly a desired result so long as plain texts still form the core of the PG collection.) 
Jon Noring From marcello at perathoner.de Thu Feb 8 07:34:06 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 16:34:06 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> Message-ID: <45CB42EE.5010407@perathoner.de> David Starner wrote: > The only real difference > is whether Windows users (that is, the majority) can read the text > files in the default tool on their system. If you put the horse before the cart you'll find that users are reading on their PCs because it is impossible to read the texts anywhere else. In 2006 1 billion cell phones were sold but only 209 million PCs. (From Steve Jobs's keynote about the iPhone.) Basically there are 5 times as many cell phones around as PCs. Windows users are a "majority" only because arbitrary PG format choices prevent owners of other digital equipment from using the texts. -- Marcello Perathoner webmaster at gutenberg.org From jon at noring.name Thu Feb 8 07:40:49 2007 From: jon at noring.name (Jon Noring) Date: Thu, 8 Feb 2007 08:40:49 -0700 Subject: [gutvol-d] "Get Fuzzy" on Project Gutenberg (sort of) Message-ID: <1606825416.20070208084049@noring.name> http://www.comics.com/comics/getfuzzy/archive/getfuzzy-20070208.html Well, it is a stretch to tie the above cartoon strip to PG, but funny nevertheless. 
Jon From grythumn at gmail.com Thu Feb 8 08:45:21 2007 From: grythumn at gmail.com (Robert Cicconetti) Date: Thu, 8 Feb 2007 11:45:21 -0500 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB42EE.5010407@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> <45CB42EE.5010407@perathoner.de> Message-ID: <15cfa2a50702080845m41312954y103498f4c225861e@mail.gmail.com> On 2/8/07, Marcello Perathoner wrote: > If you put the horse before the cart you'll find that users are reading > on their PCs because it is impossible to read the texts anywhere else. > > In 2006 1 billion cell phones were sold but only 209 million PCs. (From > Steve Jobs's keynote about the iPhone.) Basically there are 5 times as > many cell phones around as PCs. > > Windows users are a "majority" only because arbitrary PG format choices > prevent owners of other digital equipment from using the texts. You're oversimplifying. Of those 1 billion cell phones, how many 1) Have more than 256Kb of free storage onboard or provisions for external cards, 2) Have widely available computer interfaces or internet connectivity, 3) Have a display bigger than a postage stamp, 4) Have or even allow for, integral or 3rd party text reading software? I think you'll find the number of cell phones sold, suited to reading more than a paragraph or so, is actually quite small. Sure, in theory you could break an etext down to SMS and send it to a majority of phones (expensive in the US, at least) but it is not exactly practical, whether you use a CRLF or just LF. 
R C From prosfilaes at gmail.com Thu Feb 8 08:52:17 2007 From: prosfilaes at gmail.com (David Starner) Date: Thu, 8 Feb 2007 10:52:17 -0600 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB42EE.5010407@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> <45CB42EE.5010407@perathoner.de> Message-ID: <6d99d1fd0702080852o4e95c883wc55b81af6a73fae7@mail.gmail.com> On 2/8/07, Marcello Perathoner wrote: > David Starner wrote: > > > The only real difference > > is whether Windows users (that is, the majority) can read the text > > files in the default tool on their system. > > If you put the horse before the cart you'll find that users are reading > on their PCs because it is impossible to read the texts anywhere else. That's not true; it is possible to read the texts many other places. I've seen sites offer Gutenberg files for many different PDAs and e-book readers, and seen many people explain how to convert Gutenberg etexts for such and such device. If people don't have texts available for certain devices, I'm going to bet that it's because they either don't care or the system is so locked down as to be unusable. > In 2006 1 billion cell phones were sold but only 209 million PCs. (From > Steve Jobs's keynote about the iPhone.) Basically there are 5 times as > many cell phones around as PCs. Cell phones are unsharable and at this point in time, people are getting new cell phones more frequently than computers, since computers have matured enough that they don't have to be replaced as often and computers don't have to be replaced every time you go to a new company. So there probably aren't 5 times as many cell phones around as PCs. 
And while we're talking about random pieces of digital equipment, what about a DVD or VCD player? I certainly expect more of them around than computers. Why is the cell phone market so much more important than the DVD market? > Windows users are a "majority" only because arbitrary PG format choices > prevent owners of other digital equipment from using the texts. Nonsense. Any digital equipment that is used to display text can import a text or html file, and people have made tools to import PG texts to pretty much all of those tools. TEI-Lite may get prettier conversions, but it won't fundamentally change what you can convert to. Why can't you read a PG file on a DVD player? It's fairly trivial to turn a text file into something that will play on just about any DVD player, and they're much more common than e-book readers and palm pilots that people keep playing with, and at least as common as PCs. Apparently, demand for text matters just as much as the simple amount of equipment out there. The reason why you can't use a PG file on a cell phone or a DVD player is because they aren't designed for reading and people don't want to read on them. A cell phone is low resolution, low contrast and hard to read from. A DVD player, on the other hand, only has the problem that it's low resolution and has poor controls. 
From hart at pglaf.org Thu Feb 8 09:26:53 2007 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 Feb 2007 09:26:53 -0800 (PST) Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <6d99d1fd0702080852o4e95c883wc55b81af6a73fae7@mail.gmail.com> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> <45CB42EE.5010407@perathoner.de> <6d99d1fd0702080852o4e95c883wc55b81af6a73fae7@mail.gmail.com> Message-ID: A few additions, emendations, corrections, to these thoughts and numbers. Cell phones are important because more people have them, and not only do they have them, they have them WITH them so much more of the time, thus, the eBooks on them are much more available. As for whether people would USE eBooks on cell phones, this is very much a generational thing with a generation of people who grew up with GameBoys thinking that the windows on cell phones to the virtual world are just the right size. However, even a decade ago the Nokia 9000 had a screen four time larger, and a full qwerty keyboard, Web interface, etc., not to mention the eNV, iPhone, and other 2007 products with larger screens. Should I continue? I have whole series of articles on this subject I could send. . . . Project Gutenberg eBooks, pretty much all of them, are available for the new cell phones, not to mention those with Web browsers, email, etc, and and iPod program to read them was out only one week into iPod history. Obviously these people think eBooks on cell phones and iPods have value. 
As for the numbers of cell phones, we are at the most rapid growth curve of all cell phone history right now, and should pass that mid-point when 50% of the potential market is saturated sometime this year: with about 3 billion active cell phone accounts, as compared to 1 billion Internet, though that, too, should be a little larger by the end of 2007. Presuming only HALF the new cell phones each year go to new customers it will be 4 billion just two years later, and then on to 5 billion. I was guessing that the trend would slow down more and more during the period, and thus perhaps take a year longer to 5 billion, and several years more to get to 6 billion, which will probably be pretty much a limit to these rapid growth curves, so watch for some SERIOUS competition and shakeout, it could be as bad as "The Dot Com Bust." However, by the time this happens, the vast majority of people will have cell phones, but the majority of people will still not have computers of the laptop or desktop variety. Hence we should target cell phones, simply because they are there in an already greater number that will continue to outpace computer growth to the point where saturation takes place. Question: Does anyone have similar predictions for computers of normal laptop or desktop varieties? What is the the projected saturation, the point where growth slows down simply because most of the people who had a desire to have a computer already have one? I have always predicted the merging of technologies, and continue now-- the iPhone is just the beginning, and cheaper models will follow as the iPod clones are now available for $50, video iPod, 4G, FM tuner, AND an eBook reader. . .all came standard. Do you think the manufacturers would put in an eBook reader if there is nobody interested in reading eBooks on them? 
I predict a future with more clamshell devices, with larger screens and keyboards on the inside, more functions, etc., until what you have will literally be pocket computers of all sizes, shapes, colors, etc. BTW, if you go see a REAL eNV, you will notice that the TV commercials, such as they are, appear to have midgets using them, or perhaps persons with just incredibly small hands, or maybe they did it in CGI, but this gizmo just doesn't have a very big internal screen, or anything else. If you want to see where this all started, look up the Nokia 9000, then perhaps take a look at the movie "The Saint," where Val Kilmer uses one quite a bit, without any special effects to make it look bigger, then-- remember that this movie was released in 1997, probably shot in 1996. Not only will the iPhone sell, but it will spawn more and more products just as the iPod did. The next couple years are going to see more growth in cell phone stuff, more than ever before. . . . Michael On Thu, 8 Feb 2007, David Starner wrote: > On 2/8/07, Marcello Perathoner wrote: >> David Starner wrote: >> >>> The only real difference >>> is whether Windows users (that is, the majority) can read the text >>> files in the default tool on their system. >> >> If you put the horse before the cart you'll find that users are reading >> on their PCs because it is impossible to read the texts anywhere else. > > That's not true; it is possible to read the texts many other places. > I've seen sites offer Gutenberg files for many different PDAs and > e-book readers, and seen many people explain how to convert Gutenberg > etexts for such and such device. If people don't have texts available > for certain devices, I'm going to bet that it's because they either > don't care or the system is so locked down as to be unusable. > >> In 2006 1 billion cell phones were sold but only 209 million PCs. (From >> Steve Jobs's keynote about the iPhone.) Basically there are 5 times as >> many cell phones around as PCs. 
> > Cell phones are unsharable and at this point in time, people are > getting new cell phones more frequently than computers, since > computers have matured enough that they don't have to be replaced as > often and computers don't have to be replaced every time you go to a > new company. So there probably aren't 5 times as many cell phones > around as PCs. > > And while we're talking about random pieces of digital equipment, what > about a DVD or VCD player? I certainly expect more of them around than > computers. Why is the cell phone market so much more important than > the DVD market? > >> Windows users are a "majority" only because arbitrary PG format choices >> prevent owners of other digital equipment from using the texts. > > Nonsense. Any digital equipment that is used to display text can > import a text or html file, and people have made tools to import PG > texts to pretty much all of those tools. TEI-Lite may get prettier > conversions, but it won't fundamentally change what you can convert > to. > > Why can't you read a PG file on a DVD player? It's fairly trivial to > turn a text file into something that will play on just about any DVD > player, and they're much more common than e-book readers and palm > pilots that people keep playing with, and at least as common as PCs. > Apparently, demand for text matters just as much as the simple amount > of equipment out there. > > The reason why you can't use a PG file on a cell phone or a DVD player > is because they aren't designed for reading and people don't want to > read on them. A cell phone is low resolution, low contrast and hard to > read from. A DVD player, on the other hand, only has the problem that > it's low resolution and has poor controls. 
> _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Thu Feb 8 09:38:01 2007 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 Feb 2007 09:38:01 -0800 (PST) Subject: [gutvol-d] !@! MORE Re: Plain Text, Hand on the Torch In-Reply-To: <15cfa2a50702080845m41312954y103498f4c225861e@mail.gmail.com> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> <45CB42EE.5010407@perathoner.de> <15cfa2a50702080845m41312954y103498f4c225861e@mail.gmail.com> Message-ID: A few more comments: With RAMsticks now so cheap that I just bought a bunch at $10-$15 per gig, there is no reason you can't do the same with your cell phones, presuming, of course, that you get a phone with a USB port, which most of the iPodish ones have to have to be iPod compatible. Of course, some will want more than 1 gig, but that's not really an issue, given how quickly RAMsticks are growing. There are 32 and 64 gig versions most of us would find too expensive, but by the time the cell phone growth curve flattens out, these will be much more affordable. As far as software availability, this was not an issue for the iPods where the first text reading program that would allow PG eBook reading was there only one week after release. As someone mentioned before, yes, there will be some systems intentionally made hard to do this with, and unless growth in other features makes up for this, they will join the dinosaurs. 
Dedicated systems, such as dedicated word processors, etc., were born with a very limited lifespan, just as the iPod functions have been included for use in so many other products now, billions more than actual iPods sold by Apple and credited to Steve Jobs. The same will happen with the iPhone, unless he really messes it up. Michael On Thu, 8 Feb 2007, Robert Cicconetti wrote: > On 2/8/07, Marcello Perathoner wrote: >> If you put the horse before the cart you'll find that users are reading >> on their PCs because it is impossible to read the texts anywhere else. >> >> In 2006 1 billion cell phones were sold but only 209 million PCs. (From >> Steve Jobs's keynote about the iPhone.) Basically there are 5 times as >> many cell phones around as PCs. >> >> Windows users are a "majority" only because arbitrary PG format choices >> prevent owners of other digital equipment from using the texts. > > You're oversimplifying. Of those 1 billion cell phones, how many 1) > Have more than 256Kb of free storage onboard or provisions for > external cards, 2) Have widely available computer interfaces or > internet connectivity, 3) Have a display bigger than a postage stamp, > 4) Have or even allow for, integral or 3rd party text reading > software? > > I think you'll find the number of cell phones sold, suited to reading > more than a paragraph or so, is actually quite small. Sure, in theory > you could break an etext down to SMS and send it to a majority of > phones (expensive in the US, at least) but it is not exactly > practical, whether you use a CRLF or just LF. > > R C > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Thu Feb 8 09:48:11 2007 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 Feb 2007 09:48:11 -0800 (PST) Subject: [gutvol-d] !@! 
Just One More Thing Re: Plain Text, Hand on the Torch In-Reply-To: <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> Message-ID: A computer user who only uses the defaults can be compared to the person who buys a 21 speed bicycle for $500 and then only uses it in whatever gear it was in when they bought it. . . . You probably would NOT believe the numbers of highly placed in an era of punditry who continually complain about their default font as if Project Gutenberg chose it for them. Any volunteers for a short FAQ on how to choose your own fonts?!? Please!!! Thanks!!! Give the world eBooks in 2007!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org On Thu, 8 Feb 2007, David Starner wrote: > On 2/8/07, Marcello Perathoner wrote: >> Jon Noring wrote: >> >>> This means that PG plain texts should still use CR+LF for newlines, >>> at least until Notepad sees the light, or there's a decision that >>> Notepad is no longer recommended to locally view PG plain texts. >> >> I don't want to denigrate Windows users (Note for Windows users: >> "denigrate" means: put down) but it is stupid for a "literary >> archive" to encode texts following ephemeral conventions of the day. > > Thanks, Marcello, for that point of friendliness towards all our > volunteers, no matter what system they may be using. > > This is a single completely arbitrary choice. The only real difference > is whether Windows users (that is, the majority) can read the text > files in the default tool on their system. Or that would be the only > real difference, if this was a de novo choice. 
In fact, our choices > are, have an inconsistent archive, change a few files to get a > consistent CR+LF collection, or change twenty thousand files to get a > consistent LF collection. > >> The short-sighted decision to make a presentational format the main PG >> format has, on one hand made it possible to read PG texts on the >> ubiquitous Notepad, on the other hand made it *impossible* to port the >> texts to anything else. Empirical proof: many programmers have tried and >> none succeeded. > > Which has absolutely nothing to do with CR+LF versus LF. > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From robert_marquardt at gmx.de Thu Feb 8 10:03:41 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Thu, 08 Feb 2007 19:03:41 +0100 Subject: [gutvol-d] !@! MORE Re: Plain Text, Hand on the Torch In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> <45CB42EE.5010407@perathoner.de> <15cfa2a50702080845m41312954y103498f4c225861e@mail.gmail.com> Message-ID: <5bpms2h4mkkjfvilqq6kmn1as3vmbdp41c@4ax.com> On Thu, 8 Feb 2007 09:38:01 -0800 (PST), you wrote: >With RAMsticks now so cheap that I just bought a bunch at $10-$15 per gig, >there is no reason you can't do the same with your cell phones, presuming, >of course, that you get a phone with a USB port, which most of the iPodish >ones have to have to be iPod compatible. I just bought a 1 Gig SD card for 11 Euro. SD cards would cover most of the PDAs. 
-- Robert Marquardt (Team JEDI) http://delphi-jedi.org From joey at joeysmith.com Thu Feb 8 10:10:50 2007 From: joey at joeysmith.com (Joey Smith) Date: Thu, 08 Feb 2007 11:10:50 -0700 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: Message-ID: <45CB67AA.6090708@joeysmith.com> Bowerbird at aol.com wrote: > on the other hand, the right workflow would solve both > our problems and the one that provoked the question: > 1. globally change cr/lf to magic character. > 2. globally change cr to magic character. > 3. globally change lf to magic character. > 4. change magic character to desired newline. It's not clear to me what the value is of going to an intermediate magic character. Marcello has already demonstrated a perfectly reasonable example of how to replace all newlines of any type with the desired newline. From cannona at fireantproductions.com Thu Feb 8 10:18:44 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 8 Feb 2007 12:18:44 -0600 Subject: [gutvol-d] !@! MORE Re: Plain Text, Hand on the Torch References: <521676807.20070207082307@noring.name><45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name><200702080007.l1807nn11274@pico.dm.unipi.it><83702198.20070207171004@noring.name><45CB2380.8090208@perathoner.de><6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com><45CB42EE.5010407@perathoner.de><15cfa2a50702080845m41312954y103498f4c225861e@mail.gmail.com> Message-ID: <001901c74bad$ae167040$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Many of the phones that do not have usb ports do support mmc or other memory cards that are also falling in price along with the ram sticks. - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) - ----- Original Message ----- From: "Michael Hart" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, February 08, 2007 11:38 AM Subject: [gutvol-d] !@! 
MORE Re: Plain Text, Hand on the Torch > > A few more comments: > > With RAMsticks now so cheap that I just bought a bunch at $10-$15 per gig, > there is no reason you can't do the same with your cell phones, presuming, > of course, that you get a phone with a USB port, which most of the iPodish > ones have to have to be iPod compatible. > > Of course, some will want more than 1 gig, but that's not really an issue, > given how quickly RAMsticks are growing. There are 32 and 64 gig versions > most of us would find too expensive, but by the time the cell phone growth > curve flattens out, these will be much more affordable. > > As far as software availability, this was not an issue for the iPods where > the first text reading program that would allow PG eBook reading was there > only one week after release. As someone mentioned before, yes, there will > be some systems intentionally made hard to do this with, and unless growth > in other features makes up for this, they will join the dinosaurs. > > Dedicated systems, such as dedicated word processors, etc., were borh with > a very limited lifespan, just as the iPod functions have been included for > use in so many other products now, billions more than actual iPods sold by > Apple and credited to Steve Jobs. > > The same will happen with the iPhone, unless he really messes it up. > > Michael > > On Thu, 8 Feb 2007, Robert Cicconetti wrote: > >> On 2/8/07, Marcello Perathoner wrote: >>> If you put the horse before the cart you'll find that users are reading >>> on their PCs because it is impossible to read the texts anywhere else. >>> >>> In 2006 1 billion cell phones were sold but only 209 million PCs. (From >>> Steve Jobs's keynote about the iPhone.) Basically there are 5 times as >>> many cell phones around as PCs. >>> >>> Windows users are a "majority" only because arbitrary PG format choices >>> prevent owners of other digital equipment from using the texts. >> >> You're oversimplifying. 
Of those 1 billion cell phones, how many 1) >> Have more than 256Kb of free storage onboard or provisions for >> external cards, 2) Have widely available computer interfaces or >> internet connectivity, 3) Have a display bigger than a postage stamp, >> 4) Have or even allow for, integral or 3rd party text reading >> software? >> >> I think you'll find the number of cell phones sold, suited to reading >> more than a paragraph or so, is actually quite small. Sure, in theory >> you could break an etext down to SMS and send it to a majority of >> phones (expensive in the US, at least) but it is not exactly >> practical, whether you use a CRLF or just LF. >> >> R C >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d >> > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFy2m0I7J99hVZuJcRAgSeAJ9ff0s59nuixU2RfqubdaFhjru8HwCgzajq P43HazRvqCYQqVILhRosHLw= =hA8l -----END PGP SIGNATURE----- From Bowerbird at aol.com Thu Feb 8 11:18:04 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 14:18:04 EST Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: jon noring said: > Were PG to institute a policy that all plain texts in the PG archive > are to have LF (and not CR+LF) to identify new lines it seems to need to be repeated that no one suggested that. we merely said that _if_ there was standardization, what _we_ would suggest -- in our opinion -- should _be_ the standard. but it's interesting to see how the idea strikes such a nerve. notepad is a crap app, but look how loudly people scream if they think you might be taking it away from them! it's funny. 
*** marcello said: > PG's impact on the world would be much greater if PG would offer a > format that allows reliable conversion and the toolchain to do that > conversion online at PG or locally by any user or commercial venture. like i said, it's pretty rare when marcello and i agree. so pay attention. and if i may append something, it would be this: > and the impact will be even greater if the format is simple enough > that a 4th-grader could understand it, and maintain files using it... at any rate, let's see if z.m.l. or t.e.i. gets there first, ok? *** keith said: > The crs and lfs are to be considered as white space !! (over simplified) actually, that _is_ "over-simplified". they reposition the pen, which might not leave any ink on the page, but the effect is far from "invisible". > No such thing as a line, just a data stream!! yes, that's a better way of saying the same thing. > It is fun reading that you all mix fact and fiction and purposely misinterpret. gotta take your amusement where you find it, i guess... ;+) > Who really cares what is suggested for Windows or whatever !! > Fact is that PG states that the texts are to have the cr + lf line ending. > It is the definition of PG. ok, then fix the files that don't use cr/lf. but be consistent. > You should know better than that!! You have been around long enough > that the cr+lf goes way back to before windows! ah yes, the mainframe lineprinters, i remember them well... ;+) > I agree though that it has lost its meaning > and windows has kept it alive beyond its usefulness!!! ;-)) the only reason we're talking about it is because of the windows monopoly. and let us not forget that there _are_ viewer-apps on the windows platform which handle line-endings transparently, and do it well. except for notepad. which billyg, in his wisdom, still sees fit to make the _default_ .txt viewer-app. 
*** david said: > "sorry, if you want to natively read this plain text file in Windows, > use either WordPad or your web browser, or some third party text editor > -- the default Notepad won't work because..." hey, that would be a good start! of course, you'd tell them _how_ to change their default as well. it's really not all that difficult to do, and i imagine a good number would thank you for stepping them up, not just in regard to project gutenberg e-texts, but _all_ their .txt files. but even better would be to give them a cross-platform kick-ass viewer-app geared specifically to project gutenberg e-texts and suggest they use _that_... *** david said: > In fact, our choices are, have an inconsistent archive, > change a few files to get a consistent CR+LF collection, > or change twenty thousand files to get a consistent LF collection. there are more than "a few files" which don't use cr/lf newlines. perhaps someone with access to the actual files can run a 2-liner: 1. for each file in the library, 2. if (num(cr)>0 or num(lf)>0) and num(cr)<>num(lf) then spit out name. that way we can know the actual numbers that we are talking about here... i mean, really, people seem to be having a heart-attack at the idea of an e-text that wouldn't have cr/lf line-endings, seemingly oblivious to the fact that there are probably hundreds of those texts, right now. *** david said: > I'm thinking of something akin to the TEI "pizza chef" > (not the same thing, but I like being able to > "order" your text based on a checklist.) i've already accomplished this in large part... > http://www.greatamericannovel.com/scgi-bin/babelfish16.pl ...and i wrote most of that code right here, in public, on this list... remember, it was the "open-source" project to which nobody -- except me -- made any contributions. rather curious, eh? 
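A minimal sketch of the "2-liner" proposed above, written in Python (a language choice made for this sketch, not something the thread specifies). It assumes the archive is a directory tree of `.txt` files; the function names and the `root` argument are hypothetical. Rather than comparing total CR and LF counts, it counts CR+LF pairs, lone CRs, and lone LFs separately, which is a slightly stricter test than the one stated in the post. The `normalize_newlines` helper shows one way to unify all three newline styles in a single pass without an intermediate "magic" character; it is an illustration, not any list member's actual code.

```python
import re
from pathlib import Path


def newline_counts(data: bytes) -> dict:
    """Count CR+LF pairs, lone CRs, and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CR+LF pair contributes one CR and one LF to the raw totals,
        # so subtracting the pair count leaves only the lone characters.
        "cr": data.count(b"\r") - crlf,
        "lf": data.count(b"\n") - crlf,
    }


def is_pure_crlf(data: bytes) -> bool:
    """True when the data contains no lone CR and no lone LF."""
    counts = newline_counts(data)
    return counts["cr"] == 0 and counts["lf"] == 0


def normalize_newlines(data: bytes, newline: bytes = b"\r\n") -> bytes:
    """Replace every newline style with one style in a single pass.

    The alternation tries CR+LF first, so a pair is never split in two.
    """
    return re.sub(rb"\r\n|\r|\n", newline, data)


def report(root: str) -> None:
    """Print the path of every .txt file under root that is not pure CR+LF."""
    for path in sorted(Path(root).rglob("*.txt")):
        if not is_pure_crlf(path.read_bytes()):
            print(path)
```

Calling `report()` on a local copy of the collection would print the files that mix or avoid CR+LF, which gives the actual count the post asks for.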
*** david said: > Even in the future, CR+LF files are going to be common enough that > all text viewers used by users not savvy enough to fix line ends > is going to read CR+LF. and jon said: > Unicode support is no longer an issue on most platforms, thus > limiting oneself to ASCII is ludicrous. We no longer live in 1995 > when this was a major issue. funny, isn't it?, that "some people" argue that we should expect the viewer-apps to handle _unicode_, when many currently do not, and _at_the_same_time_ say that we cannot expect viewer-apps to standardize line-endings, when almost all of them currently do? the hypocrisy of that double-standard is extremely telling... -bowerbird From jon at noring.name Thu Feb 8 12:21:08 2007 From: jon at noring.name (Jon Noring) Date: Thu, 8 Feb 2007 13:21:08 -0700 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: Message-ID: <117138269.20070208132108@noring.name> Bowerbird wrote: > notepad is a crap app, but look how loudly people scream if > they think you might be taking it away from them! it's funny. Actually, as a simple etext reader, Notepad is acceptable and doesn't get in the way of reading. Hmmm, testing it out now, I can adjust the window aspect ratio, select word wrap (if needed), and select the font. I can see why Michael Hart recommends Notepad for his newsletter. Not quite at the level of WordPad, but usable and can present texts in a pretty acceptable manner... > marcello said: >> PG's impact on the world would be much greater if PG would offer a >> format that allows reliable conversion and the toolchain to do that >> conversion online at PG or locally by any user or commercial venture. > like i said, it's pretty rare when marcello and i agree. so pay attention. You forgot to add me and others to the list. 
Many of us agree on the need for a "digital master". But we disagree on what that digital master should be. Marcello and I believe it should be a rich XML-based vocabulary (he supports TEI). Bowerbird prefers a normalized plain text (ZML). >> and the impact will be even greater if the format is simple enough >> that a 4th-grader could understand it, and maintain files using it... > at any rate, let's see if z.m.l. or t.e.i. gets there first, ok? I'm amazed you haven't finished the ZML project already! Of course super-simple plain text should get there first, including for conversion, since one has few document structures and text semantics to have to deal with (especially with regards to styling.) But simplicity is not the issue. Meeting requirements is. Most here in gutvol-d, DP, etc., collectively understand the need for a richer description of the document structure and text semantics, as well as to provide the necessary hooks for various document metadata, to enable linking, etc. This is much tougher and non-standard to do with plain text, while with markup it is a snap. So plain text is no longer simple when one wants to do something more elaborate with it. Markup has significant advantages since it allows describing of the content (data) without it being the content/data itself. (As I've noted before, it appears ZML has an XML analog, allowing round-tripping between the two. Thus the issue is if ZML has sufficient definition to meet a host of needs and requirements. Bowerbird has not demonstrated that; him saying that "it will" is not sufficient. The default position here is XML, thus those proposing non-XML solutions have to state why their approach meets both requirements and confers other advantages. I've not seen the list yet. Again, stating "it will" without elaboration is not a sufficient argument -- it is a non-argument. So point-by-point, how does ZML meet the requirements for a "digital master" standard?) 
> the only reason we're talking about it is because of the windows > monopoly. Nevertheless, when it comes to personal computers, that's what most people have: Windows boxes. Love it, hate it. That's the reality. So far PG has decided to support people who have Windows boxes. > and let us not forget that there _are_ viewer-apps on the windows platform > which handle line-endings transparently, and do it well. except for notepad. > which billyg, in his wisdom, still sees fit to make the _default_ .txt viewer-app. Agreed with this. There's WordPad and IE as the other "built-in" text viewers. So WordPad could be recommended, but then PG has to *explicitly* recommend it. Most Windows users, the unwashed masses, simply use the computer out of the box, and the default text file viewer is Notepad, not WordPad. And these people really don't know how to change that -- yet these are the people PG is trying to reach. >> "sorry, if you want to natively read this plain text file in Windows, >> use either WordPad or your web browser, or some third party text editor >> -- the default Notepad won't work because..." > > hey, that would be a good start! of course, you'd tell them _how_ > to change their default as well. it's really not all that difficult to do, > and i imagine a good number would thank you for stepping them up, > not just in regard to project gutenberg e-texts, but _all_ their .txt files. Yes, but this requires PG to make a recommendation, which is something I notice PG tries to avoid as much as possible. > but even better would be to give them a cross-platform kick-ass viewer-app > geared specifically to project gutenberg e-texts and suggest they use _that_... "kick-ass" is nice, but "kick-ass" has to be demonstrated in a version 1.0 application, not a preliminary alpha. Funny that no one has said much about the viewer-app you have demonstrated -- have you tried to find out why people have not said much about it? A user/ergonomic survey of sorts? 
> i mean, really, people seem to be having a heart-attack at the idea of > an e-text that wouldn't have cr/lf line-endings, seemingly oblivious > to the fact that there are probably hundreds of those texts, right now. Probably. But then this is PG, and pretty much anything goes. Reminds me of the late 60's culture, when I was still a wee teen. >> Unicode support is no longer an issue on most platforms, thus >> limiting oneself to ASCII is ludicrous. We no longer live in 1995 >> when this was a major issue. > > funny, isn't it?, that "some people" argue that we should expect > the viewer-apps to handle _unicode_, when many currently do not, > and _at_the_same_time_ say that we cannot expect viewer-apps > to standardize line-endings, when almost all of them currently do? Actually, *Notepad* handles Unicode. That butt-ugly, lame, gawd-awfully stupid, backwards, Windows app foisted on people by BillG to enslave mankind. Yes, Notepad should recognize lone LF as newlines, but by god you give it a UTF-8 text encoding using all kinds of non-ASCII characters, and it *renders it nicely* (the default font has a quite wide glyph support.) Your statement is wrong. Most applications today on the three major personal computing platforms support Unicode and most of them provide a large range of glyphs for the odd-ball characters, and for particular character sets it's possible to install the needed glyphs. Your anti-Unicode slant is hard to understand. > the hypocrisy of that double-standard is extremely telling... You are still talking up ASCII, which conforms with UTF-8. Yet the whole world, including Mac OS X and *nix, has moved to Unicode support. Why you continue to worship ASCII is beyond me (and it does conform to UTF-8!) When that gawdawful Notepad supports UTF-8, one knows that Unicode has arrived. 
Btw, a great "kick-ass" Unicode tool for Windows is BabelPad: http://www.babelstone.co.uk/Software/BabelPad.html I use it and highly recommend it for text encoding issues and editing in a multi-character-set environment. Now about Macs, see: http://www.alanwood.net/unicode/utilities_editors_macosx.html Plenty of Unicode-capable text editors available. No excuse anymore to restrict oneself to ASCII. ASCII-limited text tools are rapidly becoming a thing of the past, and restricting oneself to it for "simplicity-sake" is quaint, but short-sighted and Luddite. Jon Noring From prosfilaes at gmail.com Thu Feb 8 12:34:20 2007 From: prosfilaes at gmail.com (David Starner) Date: Thu, 8 Feb 2007 14:34:20 -0600 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: Message-ID: <6d99d1fd0702081234r46d61c80k98a2728f9d5f8e6b@mail.gmail.com> On 2/8/07, Bowerbird at aol.com wrote: > but even better would be to give them a cross-platform kick-ass viewer-app > geared specifically to project gutenberg e-texts and suggest they use _that_... Why? Project Gutenberg has always been towards making texts for anyone, not just for people who use specific applications that only exist for some platforms. > david said: > > Even in the future, CR+LF files are going to be common enough that > > all text viewers used by users not savvy enough to fix line ends > > is going to read CR+LF. > > and jon said: > > Unicode support is no longer an issue on most platforms, thus > > limiting oneself to ASCII is ludicrous. We no longer live in 1995 > > when this was a major issue. > > funny, isn't it?, that "some people" argue that we should expect > the viewer-apps to handle _unicode_, when many currently do not, > and _at_the_same_time_ say that we cannot expect viewer-apps > to standardize line-endings, when almost all of them currently do? It doesn't matter if almost all viewer-apps standardize line-endings, if the one that doesn't is horribly common. 
Furthermore, using Unicode enables us to do things better, but unstandardized line-endings don't make anything easier. From Bowerbird at aol.com Thu Feb 8 12:37:35 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 15:37:35 EST Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: jon said: > understand the need for a richer description of the document structure _show_me_ something that you can do which i cannot... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/4d7f9318/attachment.htm From ian at babcockbrown.com Thu Feb 8 12:30:31 2007 From: ian at babcockbrown.com (Ian Stoba) Date: Thu, 8 Feb 2007 12:30:31 -0800 Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: As a brief respite from the discussion of Notepad as an ebook reader, I was wondering if anyone here had begun investigating putting together an ebook reader program to run on the XO laptops that are being developed by the One Laptop Per Child group (http:// www.laptop.org). The devices run Linux and use a UI system called Sugar. The preferred development language is Python. With the emphasis on low cost and openness of the system, it seems like a perfect match with PG. Besides, if OLPC really takes off, this could be what the next billion computer users come to think of as standard instead of Windows or any other system running today. This email message may contain information that is confidential and proprietary to Babcock & Brown or a third party. If you are not the intended recipient, please contact the sender and destroy the original and any copies of the original message. Babcock & Brown takes measures to protect the content of its communications. However, Babcock & Brown cannot guarantee that email messages will not be intercepted by third parties or that email messages will be free of errors or viruses. 
If you do not wish to receive any further e-mail from Babcock & Brown, please send an email to opt-out at babcockbrown.com. From Bowerbird at aol.com Thu Feb 8 12:55:05 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 15:55:05 EST Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: david said: > It doesn't matter if almost all viewer-apps standardize line-endings, > if the one that doesn't is horribly common. except that "the one that doesn't" is easily replaced with a better default. project gutenberg should do its users the _favor_ of moving them forward. which is _also_ why it should provide 'em with a kick-ass viewer-application. and by the way, i'm not opposed to utf8. it's probably the best we've got. and i can certainly support it with my tools, so that's a complete non-issue. and if we want to standardize on cr/lf to support backward apps like notepad, _fine_. but then let's also make sure we support the other backward apps that don't have unicode capability. the argument either works both ways, or not at all, you can't pull it out when it supports your desires and put it away when it doesn't. -bowerbird p.s. and this notion that unicode is omnipresent now and working just fine is _bunk_. and if people keep putting out that lie, i will start dragging in examples -- one after another in a seemingly endless stream -- to prove the ugly truth... it's just another case of the hypocrisy you maintain about the end-users, which sometimes presents them as total blathering idiots who aren't smart enough to try wordpad when notepad doesn't give them a good display, and other times you present them as able to figure out how to make unicode work on their machines... so you have some mighty strange end-users. on the one hand, they are dumber than any end-users i know; but on the other hand, far more technically proficient. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/1adf88a6/attachment.htm From Bowerbird at aol.com Thu Feb 8 13:03:27 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 16:03:27 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: ian said: > I was wondering if anyone here had begun investigating > putting together an ebook reader program to run on the XO laptops > that are being developed by the One Laptop Per Child group as soon as they figure out what they want their file-format to be, yeah, i'll have a reader-program for them... > With the emphasis on low cost and openness of the system, > it seems like a perfect match with PG. indeed, they'll be putting a lot of p.g. e-texts on their machines. > Besides, if OLPC really takes off, this could be what > the next billion computer users come to think of as standard > instead of Windows or any other system running today. the o.s. is geared specifically toward educational purposes, and might not lend itself well for use outside that context, but they are doing significant rethinking that could ramify... -bowerbird From marcello at perathoner.de Thu Feb 8 13:04:51 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 22:04:51 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <6d99d1fd0702081234r46d61c80k98a2728f9d5f8e6b@mail.gmail.com> References: <6d99d1fd0702081234r46d61c80k98a2728f9d5f8e6b@mail.gmail.com> Message-ID: <45CB9073.1020501@perathoner.de> David Starner wrote: > Why? Project Gutenberg has always been towards making texts for > anyone, not just for people who use specific applications that only > exist for some platforms. And yet you just said we should format our texts for one specific application on one specific platform.
A bit contradictory ... "Making text for anyone" today means: making text for cell phones. There are more cell phones around than anything else and many of them could handle ebooks. > It doesn't matter if almost all viewer-apps standardize line-endings, > if the one that doesn't is horribly common. Furthermore, using Unicode > enables us to do things better, but unstandardized line-endings don't > make anything easier. Which standard are you talking about? ISO? ANSI? W3C? IETF? Or are you talking about the "we won't fix it because by breaking standards we can make more money" M$-standard? -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Thu Feb 8 13:07:13 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 22:07:13 +0100 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <45CB9101.2050907@perathoner.de> Ian Stoba wrote: > As a brief respite from the discussion of Notepad as an ebook reader, > I was wondering if anyone here had begun investigating putting > together an ebook reader program to run on the XO laptops that are > being developed by the One Laptop Per Child group (http:// > www.laptop.org). The devices run Linux and use a UI system called > Sugar. The preferred development language is Python. "less" is a standard program on linux and handles LF quite well. -- Marcello Perathoner webmaster at gutenberg.org From phil at thalasson.com Thu Feb 8 12:38:07 2007 From: phil at thalasson.com (Philip Baker) Date: Thu, 8 Feb 2007 20:38:07 +0000 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CA39B8.8030405@perathoner.de> Message-ID: In article <45CA39B8.8030405 at perathoner.de>, Marcello Perathoner writes >Did you any research whatsoever to corroborate your claim that a >substantial percentage of PG users switch from the browser to notepad to >read their texts? 
> > A little research shows that CRLF is the Internet line end standard. I would prefer it otherwise but there it is. -- Philip Baker From ian at babcockbrown.com Thu Feb 8 14:11:59 2007 From: ian at babcockbrown.com (Ian Stoba) Date: Thu, 8 Feb 2007 14:11:59 -0800 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: <45CB9101.2050907@perathoner.de> References: <45CB9101.2050907@perathoner.de> Message-ID: I'm familiar with both less and more as paginated text readers and use them daily. I was thinking more along the lines of an application that would take advantage of the XO's built in mesh networking to download Gutenberg ebooks as well as display them. The XO will have a web browser which will certainly work for this purpose, but a simple reader application following the Sugar UI guidelines might be useful as well. One interesting feature of the Sugar UI is that instead of text-based menus the UI uses glyphs and icons. The idea is to simplify the interface to make it easier for children unfamiliar with computers to use, which also sidesteps the need for localization. On Feb 8, 2007, at 1:07 PM, Marcello Perathoner wrote: > Ian Stoba wrote: > >> As a brief respite from the discussion of Notepad as an ebook reader, >> I was wondering if anyone here had begun investigating putting >> together an ebook reader program to run on the XO laptops that are >> being developed by the One Laptop Per Child group (http:// >> www.laptop.org). The devices run Linux and use a UI system called >> Sugar. The preferred development language is Python. > > "less" is a standard program on linux and handles LF quite well. > > > > -- > Marcello Perathoner > webmaster at gutenberg.org
From Bowerbird at aol.com Thu Feb 8 14:22:32 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 17:22:32 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: the olpc people want more than just a viewer-app. and even in terms of their viewer-app, they want much more than you get in a simple text-viewer. they reasonably expect text styling and niceties like headers, lists, footnotes, tables, references, external and internal links, the whole ball'o'wax... they want a format they can use across-the-board, from their own documentation to use by the kids... which means they _also_ need an authoring tool... and since the thrust of the pedagogical philosophy is "constructivistic", where the classroom of kids is seen as being a responsible agent in their education, this authoring tool also needs to be _collaborative_. the expectation is that the kids will write _together_, in group projects, creating material _cooperatively_... so the bar for their authoring tool is set very high... but in terms of their format, it's a no-markup variant, of course, since it has to be easy enough for the kids. they call their version "crossmark". a description is here: > http://dev.laptop.org/git.do?p=users/krstic/docformat;a=blob;h=beb82c6fb55aec5ef1c959ad587bc6f23a915989;hb=0684915bef67c372a011764ce9c2c0c537684fd8;f=crossmark-spec.txt -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/b02982c4/attachment.htm From marcello at perathoner.de Thu Feb 8 14:24:02 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu, 08 Feb 2007 23:24:02 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: Message-ID: <45CBA302.8060301@perathoner.de> Philip Baker wrote: > In article <45CA39B8.8030405 at perathoner.de>, Marcello Perathoner > writes >> Did you any research whatsoever to corroborate your claim that a >> substantial percentage of PG users switch from the browser to notepad to >> read their texts? >> >> > A little research shows that CRLF is the Internet line end standard. I > would prefer it otherwise but there it is. It is the standard for HTTP request and response headers and mail headers but we are talking about files on disks. There's no standard for that unless you equate "industry standard" (ie. what M$ does) with "standard". -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Thu Feb 8 14:28:37 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 17:28:37 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: ian said: > I was thinking more along the lines of an application > that would take advantage of the XO's built in mesh networking > to download Gutenberg ebooks as well as display them. exactly. but downloading -- to simply read -- is pretty easy. it's when you start using that mesh network to _write_ -- collaboratively, with a kid on the other side of the room, at the same time -- that things start to get "interesting"... so the authoring-side of the app is thornier than the viewer-side. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/6548eb91/attachment.htm From prosfilaes at gmail.com Thu Feb 8 14:48:07 2007 From: prosfilaes at gmail.com (David Starner) Date: Thu, 8 Feb 2007 16:48:07 -0600 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB9073.1020501@perathoner.de> References: <6d99d1fd0702081234r46d61c80k98a2728f9d5f8e6b@mail.gmail.com> <45CB9073.1020501@perathoner.de> Message-ID: <6d99d1fd0702081448r21f07d92x6ea9b64e70281d98@mail.gmail.com> On 2/8/07, Marcello Perathoner wrote: > David Starner wrote: > > > Why? Project Gutenberg has always been towards making texts for > > anyone, not just for people who use specific applications that only > > exist for some platforms. > > And yet you just said we should format our texts for one specific > application on one specific platform. A bit contradictory ... No. I just said that given the choice between formatting our texts for everyone, and formatting our texts for everyone but Windows users, we should choose the first. > "Making text for anyone" today means: making text for cell phones. There > are more cell phones around than anything else and many of them could > handle ebooks. Again, there are many DVD players around, and _all_ of them can handle ebooks. Why aren't we formatting for them? Furthermore, there's no evidence that people want to read etexts on cell phones. Most people buy cell phones for phone calls, and cell phone screens aren't any good for reading on. > > It doesn't matter if almost all viewer-apps standardize line-endings, > > if the one that doesn't is horribly common. Furthermore, using Unicode > > enables us to do things better, but unstandardized line-endings don't > > make anything easier. > > Which standard are you talking about? ISO? ANSI? W3C? IETF? I'm talking about PG standardizing its line-endings to be useful for the people who want to read the books. 
> Or are you talking about the "we won't fix it because by breaking > standards we can make more money" M$-standard? It doesn't matter what Microsoft does. We want to be useful to our users, which means using the line-ending that all our users can work with. Anything else is just being petty and silly. From cannona at fireantproductions.com Thu Feb 8 14:53:11 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 8 Feb 2007 16:53:11 -0600 Subject: [gutvol-d] Sugar-based reader for OLPC? References: Message-ID: <004701c74bd4$07c85d30$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 They may already have something that can view the texts. I know that they are working on an ebook viewer for text books and such. Perhaps we can take advantage of that, rather than starting fresh. Actually, now that I think about it, I believe they may be planning on harvesting several titles from the library, but I'm not certain. Anyway, wouldn't be a bad to contact them and get the official word. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) - ----- Original Message ----- From: "Ian Stoba" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, February 08, 2007 2:30 PM Subject: [gutvol-d] Sugar-based reader for OLPC? > As a brief respite from the discussion of Notepad as an ebook reader, > I was wondering if anyone here had begun investigating putting > together an ebook reader program to run on the XO laptops that are > being developed by the One Laptop Per Child group (http:// > www.laptop.org). The devices run Linux and use a UI system called > Sugar. The preferred development language is Python. > > With the emphasis on low cost and openness of the system, it seems > like a perfect match with PG. Besides, if OLPC really takes off, this > could be what the next billion computer users come to think of as > standard instead of Windows or any other system running today. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFy6oLI7J99hVZuJcRAueEAJ4wyfu1WvbmxTjffUq98pgEFW3PmgCgwqGO fGrP2hj/JMPV54wUJPiGdwo= =n2IO -----END PGP SIGNATURE----- From prosfilaes at gmail.com Thu Feb 8 14:57:05 2007 From: prosfilaes at gmail.com (David Starner) Date: Thu, 8 Feb 2007 16:57:05 -0600 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: Message-ID: <6d99d1fd0702081457s1fc2359q363dd2c8df6b3a90@mail.gmail.com> On 2/8/07, Bowerbird at aol.com wrote: > david said: > > It doesn't matter if almost all viewer-apps standardize line-endings, > > if the one that doesn't is horribly common. > > except that "the one that doesn't" is easily replaced with a better default. > project gutenberg should do its users the _favor_ of moving them forward. > which is _also_ why it should provide 'em with a kick-ass > viewer-application. Forcing someone to change because you think it's better isn't a favor. > and by the way, i'm not opposed to utf8. it's probably the best we've got. > and i can certainly support it with my tools, so that's a complete > non-issue. I love that.
You _can_. Once again, you ramble about what you will be able to do, instead of actually having a working system that compares to what we have. > and if we want to standardize on cr/lf to support backward apps like > notepad, > _fine_. but then let's also make sure we support the other backward apps > that > don't have unicode capability. >the argument either works both ways, or not > at all, No, it doesn't. We can support one of the most common text reading applications that comes with new computers in the world in a way that hurts us not at all without being forced to support obsolete text editors in a way that seriously limits us. > p.s. and this notion that unicode is omnipresent now and working just fine > is > _bunk_. I realize that you don't have capital letters yet on your system, but the rest of us have gone a bit beyond that stage. From Bowerbird at aol.com Thu Feb 8 15:03:50 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 18:03:50 EST Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: david said: > No. I just said that given the choice between formatting our texts for > everyone, and formatting our texts for everyone but Windows users, > we should choose the first. just so people know, the cr/lf pairing puts an ugly linefeed "character" at the start of each line in mac programs that don't auto-convert them. mac users are smart enough we just global-change them away, but if it's your impression that cr/lf works "for everyone", you're wrong... windows users should find it as easy to change each lf to a cr/lf pair. (i am assuming that notepad has a change facility, and that it allows people to enter some kind of code to indicate the control characters, but as i reflect back on my memory, i'm not quite so sure about that.) all of this is not to say that i would be _opposed_ to a cr/lf standard. as long as there is consistency in the library, i'm fine with anything... 
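For anyone following the line-ending argument in this thread, here is a minimal Python sketch of unifying mixed newlines into one style. It is an illustration only, not a tool anyone on the list posted; the function name `normalize_newlines` and the default target of CR+LF are assumptions made for the example. The one real constraint is the order: CR+LF pairs are collapsed first, so that a pair is not counted as two line ends (a naive "change each lf to cr/lf" would turn an existing CR+LF into CR+CR+LF).

```python
def normalize_newlines(text: str, target: str = "\r\n") -> str:
    """Return text with every CR+LF, lone CR, and lone LF replaced by target.

    Hypothetical helper for illustration; the name and the default
    target are assumptions, not anything specified in the thread.
    """
    # Step 1: collapse CR+LF pairs first, so a pair counts as one line end.
    unified = text.replace("\r\n", "\n")
    # Step 2: any CR still present is a lone (old Mac style) line end.
    unified = unified.replace("\r", "\n")
    # Step 3: every line end is now a single LF; emit the chosen style.
    return unified.replace("\n", target)


if __name__ == "__main__":
    mixed = "line one\r\nline two\rline three\nline four"
    print(repr(normalize_newlines(mixed)))        # everything as CR+LF
    print(repr(normalize_newlines(mixed, "\n")))  # everything as LF
```

When the text comes from a file, opening it with `open(path, newline="")` keeps Python from translating line endings before the function sees them.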
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/d716127e/attachment.htm From Bowerbird at aol.com Thu Feb 8 15:07:23 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 18:07:23 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: aaron said: > Perhaps we can take advantage of that, rather than starting fresh. nobody here has indicated any desire to do any work anyway -- do you remember i posted the story of the little red hen? -- so what exactly would you "take advantage of"? or "start fresh" on? > Anyway, wouldn't be a bad to contact them and get the official word. they've been in touch with michael for some time now. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/899682ad/attachment.htm From cannona at fireantproductions.com Thu Feb 8 15:21:23 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 8 Feb 2007 17:21:23 -0600 Subject: [gutvol-d] Sugar-based reader for OLPC? References: Message-ID: <00c701c74bd7$dd502f20$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - ----- Original Message ----- From: To: ; Sent: Thursday, February 08, 2007 5:07 PM Subject: Re: [gutvol-d] Sugar-based reader for OLPC? > aaron said: >> Perhaps we can take advantage of that, rather than starting fresh. > > nobody here has indicated any desire to do any work anyway > -- do you remember i posted the story of the little red hen? -- > so what exactly would you "take advantage of"? or "start fresh" on? Funny thing about reading. If you start at the top and go to the bottom, with out skipping sentences at random, it makes things much easier to understand. Let me quote the sentence that you skipped. 
It will tell you exactly what I propose interested persons take advantage of: "I know that they are working on an ebook viewer for text books and such." Then I said: Perhaps we can take advantage of that..." To what could the "that" possibly be refering? Hmmm, perhaps it's the subject of the previous sentence. > > >> Anyway, wouldn't be a bad to contact them and get the official word. > > they've been in touch with michael for some time now. Good to hear. Perhaps he can give us an update. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFy7B6I7J99hVZuJcRAu8YAJ9HKybCKZr3DwTChCYHBmPjOAcziQCfRSlf 78YuZsmC6KyG7fszQ9cn+r8= =KZSk -----END PGP SIGNATURE----- From Bowerbird at aol.com Thu Feb 8 15:22:28 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 18:22:28 EST Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: david said: > Forcing someone to change because you think it's better isn't a favor. enabling people who use inferior software certainly isn't doing them a favor. if there is anyone here still using notepad, change your default to wordpad. do it now. you will not regret it. indeed, you'll thank us for doing you a favor. > Once again, you ramble about what you will be able to do, instead of > actually having a working system that compares to what we have. you make me laugh. tell us about your "working system", david. tell us all about it. and i'll go over to the d.p. forums and collect the threads there discussing glitches people have with unicode... and answer me this, people. how many of you have a unicode font installed as your default _right_now_. i'd love to see _that_ poll... to repeat, i can support unicode. but english e-texts don't need it. 
and when you impose something unnecessary on people that actually will degrade their experience -- which those unicode characters do, in viewer-programs that don't support unicode -- that is a disservice. on the one hand, you say "let users use any viewer-program they want". but then you turn right around and say, "but it has to support unicode". i don't see how you maintain such a fractured stance with a straight face. i say we give people a viewer-app that handles all line-endings well, and which handles unicode, and we tell 'em to throw away crap apps. there's no inconsistency here on this end. and it's not "forcing them" when the viewer-program you give them is a huge improvement, and they are the first people to tell you that they thank you for the program. > We can support one of the most common text reading applications > that comes with new computers in the world in a way that hurts us > not at all if you think you're hurt "not at all" when people read p.g. e-texts in an inferior application like notepad -- which gives them almost _none_ of the capabilities that a full-on e-book program would give them, thereby seriously hampering their true understanding of e-books -- you're wrong. dead wrong. -bowerbird From Bowerbird at aol.com Thu Feb 8 15:28:49 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 18:28:49 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: aaron said: > "I know that they are working on an ebook viewer for text books and such." > Then I said: > Perhaps we can take advantage of that..." > To what could the "that" possibly be refering? > Hmmm, perhaps it's the subject of the previous sentence. right. i got that. but since -- when i started an "open-source" project to build an actual e-book viewer-program for p.g.
e-texts, right here on this list -- _nobody_contributed_anything_, i don't think there's one bit of interest here in a viewer-app. indeed, david is -- just today -- expressing hostility to the thought that p.g. would supply a viewer-app to users. so, i ask again, just exactly what would you "take advantage of"? -bowerbird p.s. it's also the case that they don't even have a viewer-app, not yet, they haven't even finalized their file-format yet, but all of that is beside the point. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/c4def333/attachment.htm From Bowerbird at aol.com Thu Feb 8 15:32:50 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 18:32:50 EST Subject: [gutvol-d] babelfish, once again Message-ID: once again, here's a link to the last version i did of babelfish: > http://www.greatamericannovel.com/scgi-bin/babelfish16.pl just in case anyone really _is_ interested in a viewer-program... if i remember correctly -- it was a while ago -- that version is broken in one way or another, so _do_ send me error-reports, either backchannel or frontchannel. feature-requests and/or comments and/or any other reaction is equally welcomed, ok? but i know you people couldn't care less about viewer-apps... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070208/818c6afd/attachment.htm From cannona at fireantproductions.com Thu Feb 8 16:13:23 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 8 Feb 2007 18:13:23 -0600 Subject: [gutvol-d] Sugar-based reader for OLPC? References: Message-ID: <00e301c74bdf$37fb3a30$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Just because no one is interested in your viewer application, does not mean that they are not interested in all viewer applications. 
Anyway, the point is moot because we can probably take advantage of the viewer application already being built by the OLPC folks to display our texts. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) - ----- Original Message ----- From: To: ; Sent: Thursday, February 08, 2007 5:28 PM Subject: Re: [gutvol-d] Sugar-based reader for OLPC? aaron said: > "I know that they are working on an ebook viewer for text books and such." > Then I said: > Perhaps we can take advantage of that..." > To what could the "that" possibly be refering? > Hmmm, perhaps it's the subject of the previous sentence. right. i got that. but since -- when i started an "open-source" project to build an actual e-book viewer-program for p.g. e-texts, right here on this list -- _nobody_contributed_anything_, i don't think there's one bit of interest here in a viewer-app. indeed, david is -- just today -- expressing hostility to the thought that p.g. would supply a viewer-app to users. so, i ask again, just exactly what would you "take advantage of"? - -bowerbird p.s. it's also the case that they don't even have a viewer-app, not yet, they haven't even finalized their file-format yet, but all of that is beside the point. - -------------------------------------------------------------------------------- > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. 
iD8DBQFFy7zRI7J99hVZuJcRArVVAJ9vHukvql1HyoNOrdAV3YiXWnMgJgCfW/Mf u2CBO28CdIwwqZ5z2QG9/kc= =Xfh9 -----END PGP SIGNATURE----- From cannona at fireantproductions.com Thu Feb 8 16:33:55 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 8 Feb 2007 18:33:55 -0600 Subject: [gutvol-d] babelfish, once again References: Message-ID: <010501c74be2$0c02a320$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bowerbird wrote: > but i know you people couldn't care less about viewer-apps... Flawed logic, like usual. It's like me telling everyone on this list to go volunteer for Sexaholics Anonymous http://www.sa.org/ and when they don't I say, "I know you people couldn't care less about nonprofit organizations." Just because no one cares about your half-baked substandard excuse for a viewer application, that doesn't necessarily mean that no one cares about them in general. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFy8GQI7J99hVZuJcRApgDAKDDdM1QTVe8cfybQL4hjPA+k+urggCeLppI JmRjxIkHaY/iwmwGOGiS/JY= =g74p -----END PGP SIGNATURE----- From Bowerbird at aol.com Thu Feb 8 16:48:23 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Feb 2007 19:48:23 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: aaron said: > Just because no one is interested in your viewer application, > does not mean that they are not interested in all viewer applications. when given the chance to build their _own_ viewer-application, people here declined the opportunity. spin that how you want. > Anyway, the point is moot because we can probably > take advantage of the viewer application already being > built by the OLPC folks to display our texts. the viewer-program they're building is for their "crossmark" format. until p.g.
e-texts are converted to that format, it won't work that well. -bowerbird From hart at pglaf.org Thu Feb 8 17:02:30 2007 From: hart at pglaf.org (Michael Hart) Date: Thu, 8 Feb 2007 17:02:30 -0800 (PST) Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: <00c701c74bd7$dd502f20$0300a8c0@blackbox> References: <00c701c74bd7$dd502f20$0300a8c0@blackbox> Message-ID: Not too much to update about the OLPC as of yet. I do know the code to get beneath the Sugar shell, if anyone gets one and wants to do Linux things, but it will be a while before anyone but a developer can get them, not to mention that I'm not sure they have yet settled on a final design. I CAN tell you that they don't come with the original crank any more, or even the pull cord, a la starting a lawnmower, though these are available separately. Michael On Thu, 8 Feb 2007, Aaron Cannon wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > - ----- Original Message ----- > From: > To: ; > Sent: Thursday, February 08, 2007 5:07 PM > Subject: Re: [gutvol-d] Sugar-based reader for OLPC? > > >> aaron said: >>> Perhaps we can take advantage of that, rather than starting fresh. >> >> nobody here has indicated any desire to do any work anyway >> -- do you remember i posted the story of the little red hen? -- >> so what exactly would you "take advantage of"? or "start fresh" on? > > Funny thing about reading. If you start at the top and go to the bottom, > with out skipping sentences at random, it makes things much easier to > understand. Let me quote the sentence that you skipped. It will tell you > exactly what I propose interested persons take advantage of: > > "I know that they are working on an ebook viewer for text books and such." > > Then I said: > > Perhaps we can take advantage of that..."
> > To what could the "that" possibly be refering? Hmmm, perhaps it's the > subject of the previous sentence. > > >> >> >>> Anyway, wouldn't be a bad to contact them and get the official word. >> >> they've been in touch with michael for some time now. > > Good to hear. Perhaps he can give us an update. > > Aaron > > > - -- > Skype: cannona > MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail > address.) > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 > Comment: Key available from all major key servers. > > iD8DBQFFy7B6I7J99hVZuJcRAu8YAJ9HKybCKZr3DwTChCYHBmPjOAcziQCfRSlf > 78YuZsmC6KyG7fszQ9cn+r8= > =KZSk > -----END PGP SIGNATURE----- > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon at noring.name Thu Feb 8 17:11:56 2007 From: jon at noring.name (Jon Noring) Date: Thu, 8 Feb 2007 18:11:56 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <1684270218.20070208181156@noring.name> Bowerbird wrote: > the viewer-program they're building is for their "crossmark" format. > until p.g. e-texts are converted to that format, it won't work that well. It's a quite interesting "markup" vocabulary. Of course, the one downside is that a text structured like this makes it very difficult (not impossible) to build stable pointers to. In XML, one can add an "id" to elements, providing a hook to attach a pointer -- even with minor edits of the text the id's act as islands of stability to maintain third-party links even when there are edits to the document. (Such pointers may be used for things like annotations, bookmarks, inter-publication references, etc.) So the ease of editability makes it more difficult to provide interactivity with the texts. 
I do see that a crossmark formatted document could be converted to the BookX vocabulary a few of us are working on: http://www.bookx.org/ , which is intended to be a relatively simple XML vocabulary for marking up simpler type of books, such as most linear fiction and other narrative types of works. We'll see if crossmark has any legs. As an alternative for an "easy" authoring platform, there's the "Sophie" system associated with the if:book folk. Sophie is much more multimedia oriented, and for educational use may be preferable. It is also XML under the hood, but the user never sees the pointy brackets. This allows using standardized parsers and the like. Jon Noring From lee at novomail.net Thu Feb 8 17:38:14 2007 From: lee at novomail.net (Lee Passey) Date: Thu, 08 Feb 2007 18:38:14 -0700 Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <20070202003742.GA25119@joeysmith.com> References: <20070202003742.GA25119@joeysmith.com> Message-ID: <45CBD086.9090408@novomail.net> joey wrote: >> 1. find a "magic" character, one that's _not_ in the file. >> >> 2. change all carriagereturn+linefeed to the magic one. >> >> 3. change all carriagereturns to the magic character. >> >> 4. change all linefeeds to the magic character as well. >> (and yes, you really need to do all three of these, >> with the carriagereturn+linefeed one done first, >> because there actually exist some e-texts that >> contain _multiple_ types of newlines in them...) >> > > Is there any good reason not to unify these to one particular style > of newline across the board? "\r\n" seems a likely candidate since > the other platforms are generally more lenient towards Windows > line-endings than Windows is toward theirs, in my (albeit limited) > experience. There is a reason; whether it is a good one is a matter of perspective. 
Project Gutenberg has historically been overtly hostile to any requirement/standard/recommendation/suggestion which might conceivably inconvenience any volunteer in creating a "plain-vanilla" transcription of any written work in the public-domain. In the context of your question, "unification" means "standardization" and "standardization" implies enforcement -- which the Powers That Be at PG refuse to do. You are certainly free to advocate your position as vigorously as you wish, and are encouraged to do so. On the other hand, Linux bigots are free to continue to use "\n" and Mac bigots are free to continue to use "\r". (Windows bigots typically don't know what they're using, so they'll continue to use whatever is put there by the text editor they happen to be using). Welcome to the world of Project Gutenberg anarchy. The whole debate over the relative merits of "\n" vs. "\r" vs. "\r\n", rational and irrational, is just so many wasted electrons, because even if unanimity could be achieved it would have absolutely zero chance of being adopted. From cannona at fireantproductions.com Thu Feb 8 18:03:00 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Thu, 8 Feb 2007 20:03:00 -0600 Subject: [gutvol-d] Sugar-based reader for OLPC? References: Message-ID: <190701c74bee$79a42a50$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bowerbird wrote: when given the chance to build their _own_ viewer-application, people here declined the opportunity. spin that how you want. In case anyone has forgotten, here's some excerpts from Bowerbird's original message on his little programming project that went know where. On Wednesday, October 18, 2006 at 5:11 AM Bowerbird wrote: "ok, lee, so as i said the other day, i want you to waste your time reinventing the wheel, by writing a program i've already written (which you seem to be fond of insinuating is mere vapor)... ... we can make it open-source -- i'll direct you, and you'll program. 
so each day, i'll give you a little assignment for a routine to write - -- i'll even give you the pseudo-code for it -- and then when you come back with the routine finished, we'll go on to the next one... ... so this is your input file, the .zml "master version" that generates others. ... and yes, this _is_ "my antonia", digitized by jose menendez, jon noring, and a flock of others. under my direction, you'll write a program that will turn this nifty .zml version of the book into some solid .html files. then we can convert that .html to a wide variety of our e-book formats. i will also show you how to write routines to get a nice .pdf of the text. all from a measly "lightweight" file in z.m.l. -- "zero markup language", the "virtually invisible" markup that's 2 steps more advanced than x.m.l." Our own viewer application indeed. The wonderous opportunity you were offering us was to build a zml viewer, a viewer for a useless format that no one but you seems to want to bother with. Spin that how you want. > Anyway, the point is moot because we can probably > take advantage of the viewer application already being > built by the OLPC folks to display our texts. the viewer-program they're building is for their "crossmark" format. until p.g. e-texts are converted to that format, it won't work that well. True, but the viewer application either exists or will soon, so that is something we can take advantage of. Whether we do so by using their format, or by extending their application to support the display of the current books in the library, it's at least somewhere to start from. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. 
iD8DBQFFy9ZqI7J99hVZuJcRAsf6AJ9xd+qKptgsAFYbt8En7NfWK3XxuACg3KI6 go66jbdalbHcv39PL5Mg8hg= =Ve0N -----END PGP SIGNATURE----- From editor at pg-news.org Thu Feb 8 18:04:13 2007 From: editor at pg-news.org (Mike Cook) Date: Thu, 8 Feb 2007 15:04:13 -1100 Subject: [gutvol-d] unwrap the paragraphs -- babelfish -- Plain Text, Hand on the Torch Message-ID: <003a01c74bee$9adc58a0$473c9a0a@laptop> I've been following these discussions for a while and wanted to add my two-penny's >> NOTE: Best viewed with a fixed-width font, i.e. Courier New. >> Windows Notepad is a good a program to use for viewing. With regards the newsletter recommendation of using Windows Notepad, that was me. Although Michael Hart has been reading the newsletters and forwarding a number of thoughts, he has never disagreed with that statement. As already mentioned by several people, a high percentage of computer users use Windows.
I believed that even people who don't use Windows would know what kind of program is being recommended. It seemed like a good choice. -- There's been talk about end users being stupid and also about getting people to change their default applications. Let's face it, in today's world most users are not very computer literate...when it comes to it, most don't want to know nor care if they do, people just want to do task X or Y and then get on with their lives. I've noticed that more and more people are going over to Mac's, why? Because they are uber easy to use...are they not designed to make the end users experience as easy and enjoyable as possible? (Personally, I could not change over to a Mac.) >> mac users are smart enough we just global-change them away Hmmm, I know many people who use Mac and this statement certainly does not apply to them! It's really not that easy to get people to change their default apps. How long has Mozilla been trying to get people to leave IE, yet the FireFox market share is still small compared to IE. We have to remember that it's not just about what is better or worse. Why do some 3D artist use 3D Studio Max and others Maya - some musicians use Logic Audio, some Cubase - certain webmasters use DreamWeaver, others GoLive. Myself, I use HTML-Kit. These and other apps are all as good as each other yet some people prefer one more than the next. It's the same reason why I couldn't use a Mac. It just doesn't sit right in my mind. I use certain apps because I like the way they work not because they are necessarily the best. We have to choose a workflow that is right for each of us. I still use a Windows box because Linux doesn't do everything I want...I'm sure it will eventually, just not yet. We live in a world of choices so should not PG try and cover most of those? You'll never cover every base but if using CR+LF covers most, then is this not the best choice? 
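For reference, the recipe discussed in this thread (replace CR+LF, then bare CR, then bare LF with a placeholder, then write out one chosen newline) can be sketched in a few lines. This is only an illustration, not anyone's actual tool: the function name unify_newlines and the choice of a Unicode private-use character as the "magic" placeholder are assumptions made for this example.

```python
def unify_newlines(text, newline="\r\n"):
    """Rewrite every CR+LF, bare CR, and bare LF in text as `newline`."""
    # Step 1: find a "magic" character that is not in the text.
    # Assumption for this sketch: a private-use code point is a safe choice.
    magic = next(chr(c) for c in range(0xE000, 0xF900) if chr(c) not in text)
    # Step 2: CR+LF goes first, so a pair is not counted as two newlines.
    text = text.replace("\r\n", magic)
    # Steps 3 and 4: whatever CR or LF remains is a newline on its own.
    text = text.replace("\r", magic)
    text = text.replace("\n", magic)
    # Finally, turn the placeholder into the one newline style we want.
    return text.replace(magic, newline)


if __name__ == "__main__":
    mixed = "line one\r\nline two\rline three\nline four"
    print(repr(unify_newlines(mixed)))        # CR+LF everywhere
    print(repr(unify_newlines(mixed, "\n")))  # LF everywhere
```

Passing "\r\n" gives a file that opens cleanly in Notepad; passing "\n" gives the Unix convention. Two consecutive newlines (a paragraph break) stay two newlines either way.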
-- babelfish16.pl - I didn't spend much time using it but first impressions are that the interface is not very pretty. All those clunky buttons and drop-down boxes. I would rather use Notepad to be honest. In the past I've tried reading eBooks on my laptop, not all that enjoyable really. For the past two years I've been reading on my PDA, still not the best option but I did prefer it over the laptop....even though it only holds 40-50 words per page. Very recently I purchased a Sony Reader (PRS-500). Yes, there are a number of issues Sony will need to rectify but for me this type of device is the best option we have, reading on this is a real joy. It is certainly the closest thing to a [pocket] paper book experience around. When I read a book (novel) it is a linear experience, I start at page one and read each page in turn until I reach the end (A 'Manual' is of course a different beast altogether). Is this not what most PG texts are, novels/stories and are therefore read in this linear fashion? Mike Cook Editor, PG-News http://www.pg-news.org From schultzk at uni-trier.de Fri Feb 9 00:38:11 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri, 9 Feb 2007 09:38:11 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB2380.8090208@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> Message-ID: Hi Marcello, First off I like to reply to another message(unusual I know) I can not give a cite for where that PG text files have to have cr+lf line endings. But, from discussions from way, way back I am most certain that they were to hav cr+lf endings. Personally, I could care less what line ending PG uses, as a computer geek it does not matter I can handle it! As we all know any archive, data base, or format has a specification! 
Or, for that matter, conventions. They are a necessity. This is also true in natural language.

I have argued for decades that PG should adopt some kind of markup, at least as a base format. It should be simple and minimal: for example, a line ending, a paragraph mark, and a few others, in my opinion. Furthermore, to avoid the ongoing flame wars, it should not be an offspring of an existing system, though it may resemble one. Yes, PG has a line ending and a paragraph mark (two consecutive line endings), but as the wars going on here show, there should be more. Another important prerequisite of such a format is that it does not infringe on the abilities of ongoing projects for PG.

As you can see, we agree. As a matter of fact, everybody agrees that PG needs a better file format. Once we have this format, it is not a problem to create the standard plain vanilla etexts that PG is so well known for.

The question is how do we get Mr. Hart to agree with us so that PG gets a truly usable BASE FORMAT, so that everybody can get the most out of PG texts.

My suggestion would be a tagged markup having, as a must:

    Line ending
    Paragraph
    Footnote
    Chapter
    Escape (for when you need to have the tags verbatim)
    Header (for the PG part)

As candy for the aestheticians:

    Page
    Bold
    Italics

As I see it, such a markup would aid all projects. Also, it would not be too much work for those contributing texts to PG to add these in. Naturally, the question remains what to do about the older texts in PG. Well, we can only use what we have: line endings and paragraph marks. A relatively easy task.

You may ask why I do not do it. It is not PG policy, and my ego is not so big that I need a monument. If and when PG officially adopts such a markup, I will gladly help. I am not willing to waste my time in a niche that is not officially supported.

I also do not see a problem or conflict with other projects ongoing with PG.
They all should be able to easily convert they format into the above mentioned format, for others to use, which would benefit the other projects. The lack of a decent base format for PG is its biggest problem!! I wish Mr. Hart would finally realize this fact and allow for a new official base format for PG. The Plain Vanilla Etexts will not go away, but PG will become a more valuable resource for free etexts, and the contributors can spend more time with there text and systems instead of their war against each other. It is such a waste of resources. regards Keith. Am 08.02.2007 um 14:20 schrieb Marcello Perathoner: > Jon Noring wrote: > >> This means that PG plain texts should still use CR+LF for newlines, >> at least until Notepad sees the light, or there's a decision that >> Notepad is no longer recommended to locally view PG plain texts. > > I don't want to denigrate Windows users (Note for Windows users: > "denigrate" means: put down) but it is stupid for a "literary > archive" to encode texts following ephemeral conventions of the day. > >> From the Volunteer's FAQ: > >> This section of the FAQ goes into great detail about all kinds of >> formatting questions. However, looked at from a higher level, the >> only real issue is that we want to render texts clearly, with >> formatting that reflects the original, so that readers of the plain >> text format can read them easily, and people converting them to other >> formats can do so reliably. > > Now, some will argue that "read them easily" is more important than > "converting them to other formats" because millions of people want to > read and just a few want to convert. But this is a short-sighted > argument. > > If there would be a reliable way to convert to other formats, these > conversion would have been long since implemented at PG giving > readers a > vast choice of different formats for every PG text. 
> > The short-sighted decision to make a presentational format the main PG > format has, on one hand made it possible to read PG texts on the > ubiquitous Notepad, on the other hand made it *impossible* to port the > texts to anything else. Empirical proof: many programmers have > tried and > none succeeded. > > PG's impact on the world would be much greater if PG would offer a > format that allows reliable conversion and the toolchain to do that > conversion online at PG or locally by any user or commercial venture. > > > Everybody is complaining about young people not reading enough gadda > gadda gadda, but why should they read, if reading means staying at > home > and staring at fixed fonts in Notepad? When they can easily hang > around > town and watch videos on their iPods? > > The future of PG is to get the books onto the devices people carry > around, like cellphones, PDAs, iPods, car navigation systems, > gameboys, > PlayStation Portables, etc. > > > Hand on the torch ... > > > -- > Marcello Perathoner > webmaster at gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Fri Feb 9 00:49:55 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri, 9 Feb 2007 09:49:55 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <1203301105.20070208072822@noring.name> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> Message-ID: <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> Am 08.02.2007 um 15:28 schrieb Jon Noring: > Marcello wrote: >> Jon Noring wrote: > >>>>> NOTE: Best viewed with a fixed-width font, i.e. Courier New. 
>>>>> Windows NotePad is a good a program to use for viewing. > >>> Well, it looks like PG still recommends Windows users use Notepad >>> for the local viewing of PG plain texts. > >> No. It looks like Michael still recommends Notepad for reading the >> newsletter. > > I stand corrected. > > However, if Michael recommends this for his newsletter, would he also > recommend (or suggest) using Notepad for reading PG plain texts on > Windows systems? > Yes, windows is the most popular system. > Were PG to institute a policy that all plain texts in the PG archive > are to have LF (and not CR+LF) to identify new lines, this would > preclude the use of Windows Notepad to read them. Hopefully Michael > and/or Greg will weigh in on this topic since it does suggest > something to discuss in the guidelines. (If not, PG may find more and > more plain texts submitted to the archive which use LF and thus will > be unreadable in Notepad -- not exactly a desired result so long as > plain texts still form the core of the PG collection.) One reason for having Cr+lf as a line ending is that pratically any system will find a line ending. Most modern(!) programs generally ignore the extra character today and display the lines correctly. This was not true in the past and one would have this ugly character at the end of a line. Keith. From schultzk at uni-trier.de Fri Feb 9 00:51:40 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri, 9 Feb 2007 09:51:40 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <45CB42EE.5010407@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> <45CB42EE.5010407@perathoner.de> Message-ID: Excuse me, People use their cell phones for readning PG etexts!! grin Keith. 
Am 08.02.2007 um 16:34 schrieb Marcello Perathoner: > David Starner wrote: > >> The only real difference >> is whether Windows users (that is, the majority) can read the text >> files in the default tool on their system. > > If you put the horse before the cart you'll find that users are > reading > on their PCs because it is impossible to read the texts anywhere else. > > In 2006 1 billion cell phones were sold but only 209 million PCs. > (From > Steve Jobs's keynote about the iPhone.) Basically there are 5 times as > many cell phones around as PCs. > > Windows users are a "majority" only because arbitrary PG format > choices > prevent owners of other digital equipment from using the texts. > > > > > -- > Marcello Perathoner > webmaster at gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Fri Feb 9 01:02:35 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Fri, 9 Feb 2007 10:02:35 +0100 Subject: [gutvol-d] !@! Just One More Thing Re: Plain Text, Hand on the Torch In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB2380.8090208@perathoner.de> <6d99d1fd0702080543o37007fd5kee47846394408cb4@mail.gmail.com> Message-ID: <4BA2D95B-B895-4B25-BC1E-367E862AC0F9@uni-trier.de> Am 08.02.2007 um 18:48 schrieb Michael Hart: > > A computer user who only uses the defaults can be compared to the > person who buys a 21 speed bicycle for $500 and then only uses it > in whatever gear it was in when they bought it. . . . I can not agree with more. As a side note and probaly more to our world of computers. How many people use Word just a better typewriter. They do not use the formating templates for paragraph, etc. See it all the time here at the university. 
>
> You probably would NOT believe the numbers of highly placed in an
> era of punditry who continually complain about their default font
> as if Project Gutenberg chose it for them.
>
> Any volunteers for a short FAQ on how to choose your own fonts?!?

I would love to do it, but who would want to read a 50-page FAQ describing the ramifications of choosing a font?

Keith.

From Bowerbird at aol.com Fri Feb 9 03:41:58 2007
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 9 Feb 2007 06:41:58 EST
Subject: [gutvol-d] Plain Text, Hand on the Torch
Message-ID:

keith said:
> The question is how do we get Mr. Hart to agree with us
> so that PG gets a truly usable BASE FORMAT,
> so that everybody can get the most out of PG texts.

keith, keith, keith, you were doing so good right up until then.

you think "mr. hart" is who needs to have his head turned around?

what a laugh! :+)

you're _wrong_, buddy, and i mean 180-degrees the wrong way.

michael hart is the visionary here with his insistence on plain text.

if he wouldn't have demanded a plain-text copy of every e-text,
i probably wouldn't have even realized that plain-text copies have
embedded within 'em, most of the info we need to know about 'em.
(which by the way, goes far beyond your basic list, keith.)

and i would have never took that idea, run with it, and made it work.

and now the world is gonna have "a truly usable base format",
one "so that everybody can get the most out of p.g. texts", and
on top of all that, it's simple to learn too. a 4th-grader can do it.

and all because little old me saw pure insight in michael's eyes...

so, you script kiddies can just sit back and watch, while i show you
-- with working code -- that michael hart is one true genius here.
and that his format can kick the ass of your format, here to sunday.

-bowerbird

From hart at pglaf.org Fri Feb 9 03:51:04 2007
From: hart at pglaf.org (Michael Hart)
Date: Fri, 9 Feb 2007 03:51:04 -0800 (PST)
Subject: [gutvol-d] Plain Text, Hand on the Torch
In-Reply-To:
References:
Message-ID:

It's not so much what Mr. Bowerbird says, as that this was a request for Project Gutenberg to pick, as "official," any one particular format over all the others.

I prefer to encourage everyone to develop formats they enjoy without trying to enforce their decisions on others, but rather to provide a range of formats, just as there are a range of book sizes, styles, appearances, etc.

We are not going to anoint any one format as the only "official" format, though I am sure from time to time new formats will be invented and have certain times in greater vogue than others.

Michael

On Fri, 9 Feb 2007, Bowerbird at aol.com wrote:

> keith said:
>> The question is how do we get Mr. Hart to agree with us
>> so that PG gets a truly usable BASE FORMAT,
>> so that everybody can get the most out of PG texts.
>
> keith, keith, keith, you were doing so good right up until then.
>
> you think "mr. hart" is who needs to have his head turned around?
>
> what a laugh! :+)
>
> you're _wrong_, buddy, and i mean 180-degrees the wrong way.
>
> michael hart is the visionary here with his insistence on plain text.
>
> if he wouldn't have demanded a plain-text copy of every e-text,
> i probably wouldn't have even realized that plain-text copies have
> embedded within 'em, most of the info we need to know about 'em.
> (which by the way, goes far beyond your basic list, keith.)
>
> and i would have never took that idea, run with it, and made it work.
>
> and now the world is gonna have "a truly usable base format",
> one "so that everybody can get the most out of p.g. texts", and
> on top of all that, it's simple to learn too. a 4th-grader can do it.
> > and all because little old me saw pure insight in michael's eyes... > > so, you script kiddies can just sit back and watch, while i show you > -- with working code -- that michael hart is one true genius here. > and that his format can kick the ass of your format, here to sunday. > > -bowerbird > From hart at pglaf.org Fri Feb 9 03:58:20 2007 From: hart at pglaf.org (Michael Hart) Date: Fri, 9 Feb 2007 03:58:20 -0800 (PST) Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> Message-ID: On Fri, 9 Feb 2007, Schultz Keith J. wrote: > > Am 08.02.2007 um 15:28 schrieb Jon Noring: > >> Marcello wrote: >>> Jon Noring wrote: >> >>>>>> NOTE: Best viewed with a fixed-width font, i.e. Courier New. >>>>>> Windows NotePad is a good a program to use for viewing. >> >>>> Well, it looks like PG still recommends Windows users use Notepad >>>> for the local viewing of PG plain texts. >> >>> No. It looks like Michael still recommends Notepad for reading the >>> newsletter. >> >> I stand corrected. >> >> However, if Michael recommends this for his newsletter, would he also >> recommend (or suggest) using Notepad for reading PG plain texts on >> Windows systems? >> > Yes, windows is the most popular system. > >> Were PG to institute a policy that all plain texts in the PG archive >> are to have LF (and not CR+LF) to identify new lines, this would >> preclude the use of Windows Notepad to read them. Hopefully Michael >> and/or Greg will weigh in on this topic since it does suggest >> something to discuss in the guidelines. 
(If not, PG may find more and >> more plain texts submitted to the archive which use LF and thus will >> be unreadable in Notepad -- not exactly a desired result so long as >> plain texts still form the core of the PG collection.) > One reason for having Cr+lf as a line ending is that pratically > any system will find a line ending. Most modern(!) programs generally > ignore the extra character today and display the lines correctly. > This was not true in the past and one would have this ugly character at > the end of a line. > > Keith. I can't tell you how many files I have received that lose their margination when passing through various email systems, file conversion programs, etc. Right now I am proofreading a book for someonw who uses an editing program none of my friends seem to know much, and while I can sometimes see lines as true lines in the email, but not after the file is saved. It would be NICE if every program simply had an option to save the lines as lines with cr/lf if wanted, and vice versa. Of course, there are outboard programs that can change cr/lf to either cr or lf in just one second, providing you know which you want, and also vice versa. Once a reader figures out what they want, they can convert directories of files to their own specifications in literally seconds, so why force any of this on them? Michael From greg at durendal.org Fri Feb 9 05:54:15 2007 From: greg at durendal.org (Greg Weeks) Date: Fri, 9 Feb 2007 08:54:15 -0500 (EST) Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: <00c701c74bd7$dd502f20$0300a8c0@blackbox> Message-ID: On Thu, 8 Feb 2007, Michael Hart wrote: > Not too much to update about the OLPC as of yet. > > I do know the code to get beneath the Sugar shell, > if anyone gets one and wants to do Linus things, > but it will be a while before anyone but a developer > can get them, not to mention tat I'm not sure they > have yet settled on a final design. 
I CAN tell you > that they don't come with the original crank any more, > or even the pull cord, a la starting a lawnmower, > though these are available separately. I had a chance to talk to one of the people doing software development for the OLPC last night. The crank was removed from the base unit because the stress it put on the case caused them to fail. It was moved to an external power brick where they could build it stronger without increasing cost/size/weight of the base laptop. The second gen samples are amazingly durable. He participated when they took one of the samples and played football with it for 10 minutes, picked it up off the floor and booted it up. -- Greg Weeks http://durendal.org:8080/greg/ From Catenacci at Ieee.Org Fri Feb 9 07:38:06 2007 From: Catenacci at Ieee.Org (Onorio Catenacci) Date: Fri, 9 Feb 2007 10:38:06 -0500 Subject: [gutvol-d] Scan Of Time Capsule Book Message-ID: Hi all, Saw this item on Boing Boing And I looked at the various versions of the actual text they offer on the website. It looks like the TXT format was scanned and never cleaned up at all. Anyway considering the interests of volunteers for Project Gutenberg, I thought this link might be of interest to some of the folks following this mailing list. -- Onorio From Bowerbird at aol.com Fri Feb 9 10:34:46 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 13:34:46 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: greg said: > I had a chance to talk to one of the people doing software development > for the OLPC last night. The crank was removed from the base unit > because the stress it put on the case caused them to fail. It was > moved to an external power brick where they could build it stronger > without increasing cost/size/weight of the base laptop. i had a chance to hear nicholas negroponte talk last night. 
he said that he considers the crank to have been an immense success -- in spite of the fact that it had all kinds of problems, including inefficiency -- because it created a vivid memory that communicated an _extremely_ important concept, namely that this device is powered at a human level. i forget the unit of measurement, but the olpc machine runs in 2 units, while the typical laptop requires 25-30, meaning an energy requirement that has been slashed 90%, to the point where human energy can run it. they've also developed solar panels -- a $10 panel will charge one olpc -- and a lawnmower-style pull-cord charger, and a generator that can be placed on a bicycle so the kids' pedal-power is converted into a charge... -bowerbird p.s. the other two items where the olpc machine is _different_from_ a mere "stripped-down" regular laptop is the _mesh_network_ and a dual-mode screen which works well in sunlight _and_ in the dark. From Morasch at aol.com Fri Feb 9 10:52:48 2007 From: Morasch at aol.com (Morasch at aol.com) Date: Fri, 9 Feb 2007 13:52:48 EST Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: joey said: > It's not clear to me what the value is of > going to an intermediate magic character. if you know what line-ending you want to end up with, you can just change the other two to the one you desire. but if i had written it up that way, i would've had to use a bunch of "if your desired line-ending is..." type clauses, and it would have messed with the simplicity of the recipe. > Marcello has already demonstrated > a perfectly reasonable example of > how to replace all newlines of any type > with the desired newline. well, first of all, my recipe came before his one-liner.
second of all, he knew what line-ending he wanted, and he only showed the code for that particular one. third of all, a perl one-liner is fine for the technoids, but what most real people want is a clear-cut recipe they run in a word-processor, no scripting required. fourth of all, writing it up the way i did allowed me to share the knowledge that some files have mixed newlines. it was good of you to pick up on that little hint, joey, and to feel the responsibility to correct that problem. by the way, joey, can you modify your own script just a bit to have it report how many files utilize a cr as their line-ending, and how many use an lf? also, the actual _list_ of e-text numbers for those 52 files with _mixed_ line-endings would be great. -bowerbird From greg at durendal.org Fri Feb 9 10:54:55 2007 From: greg at durendal.org (Greg Weeks) Date: Fri, 9 Feb 2007 13:54:55 -0500 (EST) Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: On Fri, 9 Feb 2007, Bowerbird at aol.com wrote: > p.s. the other two items where the olpc machine is _different_from_ > a mere "stripped-down" regular laptop is the _mesh_network_ and > a dual-mode screen which works well in sunlight _and_ in the dark. The screen is very important. It's also far tougher than almost any other screen manufactured. It's really two LCD screens on top of each other. A lower res color with a high res 800x600 monochrome on top of it. A really neat design. -- Greg Weeks http://durendal.org:8080/greg/ From Bowerbird at aol.com Fri Feb 9 11:50:54 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 14:50:54 EST Subject: [gutvol-d] Sugar-based reader for OLPC?
Message-ID: jon noring said: > Of course, the one downside is that a text structured like this makes > it very difficult (not impossible) to build stable pointers to. In XML, > one can add an "id" to elements, providing a hook to attach a pointer as usual, jon, you attribute magic to x.m.l. that is available elsewhere; many plain-text markup systems enable inclusion of an element "id". my system is less accommodating in one respect, in that you cannot include an arbitrary "id" at will, but it is also far _more_ accommodating on the other hand, in that it _automatically_ generates many "id" for you, specifically those that have to do with section-headers. so, for instance: > http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.html#chapter10dashdashhotlinks that u.r.l. will take a person directly to the test-suite chapter on hotlinks, and i didn't have to do anything to make that id-based hotlink happen. here's that same link in another copy of the document located elsewhere: > http://www.greatamericannovel.com/zen/suite-zml.html#chapter10dashdashhotlinks so what you're saying is "very difficult" has already been accomplished. and if i decide that i want to give people the ability to arbitrarily designate certain parts of the book with an "id" element, i can easily add that capability. markdown and textile already have it; it's a known and working functionality. but to my mind, the necessity to code these "id" elements is symptomatic of my overall philosophical problem with heavy markup -- it seems unnecessary. why should any author have to code each piece of their file for it to be linkable? why should i -- as someone who wants to link to a specific piece of a file -- have to _depend_on_ the author having coded that piece as being linkable? in short, this style of linking is simply too quaint. there's a better way to do it. and that better way is already demonstrated to us, if we look around a bit. 
for instance, a while back, i made an offer to the distributed proofreader people. they didn't act on it sufficiently quickly, so yesterday i informed them it had expired. you can see this by following this link: > http://www.pgdp.net/phpBB2/viewtopic.php?p=284440&highlight=the+offer+has+now+expired#284440 you're taken to the exact message thanks to an "id" element, but _then_, the actual words are highlighted because they were placed into the u.r.l.; change those words, and watch the highlighted text change accordingly. now, that highlighting magic is due to a nice little bit of .php scripting. it would be simple to change it, so instead of highlighting those words, a _search_ was executed so the person would be taken directly to them. and _that_ is the future of linking -- a simple "jump to this phrase" mode, executed by the browser, in the event a phrase is appended to any u.r.l. that method -- like any other linking method -- will "break" at times, yes, so maybe there's a way to harden it against that. or maybe we wouldn't _want_ to harden against that, because if the text has been changed, then perhaps it is only appropriate that your link should break. these are thorny details that bear on linking philosophy and implementation, but the point for our purpose right now is to realize that we're still innovating the web... today's forms of linking might well seem gross and primitive in ten years... and it's _certainly_ not the case that we need heavy markup to do linking. that's just more noring f.u.d. -bowerbird From Bowerbird at aol.com Fri Feb 9 11:52:25 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 14:52:25 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: jon noring said: > We'll see if crossmark has any legs.
um, it'll be going out to over 9 million kids in the next year or two. so i think it will have plenty of "legs"... -bowerbird From jon at noring.name Fri Feb 9 12:51:14 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 13:51:14 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <13010029302.20070209135114@noring.name> Bowerbird wrote: > jon noring said: >> We'll see if crossmark has any legs. > um, it'll be going out to over 9 million kids in the next year or two. > > so i think it will have plenty of "legs"... True. But what I meant by my comment is if crossmark will be sufficient for their needs. As noted before, when I started looking at how to build an interactive system (including referencing, citation and annotation) around that non-XML "markup" vocabulary, it began to look hairy, especially for texts which are not fixed but will morph, such as educational texts. So the question is will that vocabulary meet their needs in the future, or will they have to modify it accordingly? What will be its limitations? The lack of markup suitable to hang id's on (or to use something akin to XPointer where one can "count" tags), makes it much harder to create links to texts which will undergo morphing over time. Jon Noring (p.s., I was peripherally involved with OSoft's talks with the OLPC folk regarding dotReader, so I know some behind-the-scenes info, although it was a while ago. Looking on things, I see that if 'crossmark' is not suitable for all applications, then if:books "Sophie" may be more suitable. Sophie is more sophisticated, is focused more on multimedia, and under-the-hood is XML which, in principle at least, should have more link stability as publications morph over time.
When one looks at the requirements for using etexts in interactive, collaborative environments, allowing annotation, bookmarking, highlighting, referencing/citation, integration with social networking, addition of multimedia, etc., etc., this brings out the deficiencies in systems based on "plain text" formats. 'crossmark' is certainly a little more than just plain text, but it still shares a lot of its design with plain text with regards to identification of document structure and text semantics. From jon at noring.name Fri Feb 9 13:06:38 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 14:06:38 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <1188975646.20070209140638@noring.name> Bowerbird wrote: > jon noring said: >> Of course, the one downside is that a text structured like this makes >> it very difficult (not impossible) to build stable pointers to. In XML, >> one can add an "id" to elements, providing a hook to attach a pointer > my system is less accommodating in one respect, in that you cannot > include an arbitrary "id" at will, but it is also far _more_ accommodating > on the other hand, in that it _automatically_ generates many "id" for you, > specifically those that have to do with section-headers. so, for instance: >> http://snowy.arsc.alaska.edu/bowerbird/test-suite/test-suite.html#chapter10dashdashhotlinks > that u.r.l. will take a person directly to the test-suite chapter on hotlinks, > and i didn't have to do anything to make that id-based hotlink happen. In XML one can certainly build third-party applications using XPointer to point to any spot in an XML document without having to call up the document author and ask them to add an 'id'. > here's that same link in another copy of the document located elsewhere: >> http://www.greatamericannovel.com/zen/suite-zml.html#chapter10dashdashhotlinks > > so what you're saying is "very difficult" has already been accomplished.
But you are pointing to pre-defined structures. A third party application, such as for text-annotation, may want to point to a paragraph, or to a phrase or word within that paragraph. In plain text, there are limited hooks to hook onto, and those hooks tend to be more unstable should the text undergo some minor updating. (I see you cover that below...) > and if i decide that i want to give people the ability to arbitrarily designate > certain parts of the book with an "id" element, i can easily add that capability. > markdown and textile already have it; it's a known and working functionality. But this is problematic in three ways. 1) Adding an "id" in the flow of the text no longer makes it "plain text". 2) Even if one built a "separate table" to enable pointers, it still relies upon the source text not morphing much if at all. 3) One oftentimes doesn't want a third-party application to edit the source text at all. So pointing is all that is allowed. > that method -- like any other linking method -- will "break" at times, yes, > so maybe there's a way to harden it against that. or maybe we wouldn't > _want_ to harden against that, because if the text has been changed, then > perhaps it is only appropriate that your link should break. these are thorny > details that bear on linking philosophy and implementation, but the point > for our purpose right now is to realize that we're still innovating the web... > today's forms of linking might well seem gross and primitive in ten years... I agree with this. The Reading 2.0 conference last March discussed alternatives for identifying, which is a component of pointing. > and it's _certainly_ not the case that we need heavy markup to do linking. > that's just more noring f.u.d. Or Bowerbird f.u.d.? The point is that plain text creates more problems for interactivity than would XML. One reason is that in XML, document structure and text semantics are assigned not by the content itself, but by characters outside of content.
In plain text, one relies upon content characters, such as spaces, tabs, EOL characters (the "white space") in order to communicate structure. This leads to documents which are a lot more brittle and touchy, plus the fact that one is limited in what structures and text semantics can be unambiguously identified. Jon From marcello at perathoner.de Fri Feb 9 14:05:41 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri, 09 Feb 2007 23:05:41 +0100 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <45CCF035.5060101@perathoner.de> Bowerbird at aol.com wrote: > but to my mind, the necessity to code these "id" elements is symptomatic of > my overall philosophical problem with heavy markup -- it seems unnecessary. > why should any author have to code each piece of their file for it to be > linkable? Your philosophical problem would have dissolved if you had educated yourself about XPath / XPointer. http://www.example.org/doc.tei# xpointer(//div[@type="chapter"][7]/p[2]/ range-to(//div[@type="chapter"][7]/p[5])) (put all in one line) selects the 2nd to the 5th para in the 7th chapter of doc.tei without any need for an id in the text. You can use this to cite other XML documents down to the granularity of their markup. If you are a linguist and mark up down to phoneme level you can cite a single or a group of phonemes if you want. Because this is an XML standard it will work with any XML document, be it TEI, XHTML, DocBook, XPS, etc. You see, you don't have to reinvent the wheel because XML comes with batteries included. -- Marcello Perathoner webmaster at gutenberg.org From cannona at fireantproductions.com Fri Feb 9 14:24:46 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Fri, 9 Feb 2007 16:24:46 -0600 Subject: [gutvol-d] Sugar-based reader for OLPC? 
References: <13010029302.20070209135114@noring.name> Message-ID: <001f01c74c99$455f9df0$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think the idea that they will be stuck with crossmark in its current form is not exactly correct. From what I understand, the laptops will be capable of receiving updated software, so one of those updates could easily include extensions to the crossmark format. But I agree with Jon, it will be interesting to see how it evolves over time, if it doesn't get replaced entirely. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) - ----- Original Message ----- From: "Jon Noring" To: Sent: Friday, February 09, 2007 2:51 PM Subject: Re: [gutvol-d] Sugar-based reader for OLPC? Bowerbird wrote: > jon noring said: >> We'll see if crossmark has any legs. > um, it'll be going out to over 9 million kids in the next year or two. > > so i think it will have plenty of "legs"... True. But what I meant by my comment is if crossmark will be sufficient for their needs. As noted before, when I started looking at how to build an interactive system (including referencing, citation and annotation) around that non-XML "markup" vocabulary, it began to look hairy, especially for texts which are not fixed but will morph, such as educational texts. So the question is will that vocabulary meet their needs in the future, or will they have to modify it accordingly? What will be its limitations? The lack of markup suitable to hang id's on (or to use something akin to XPointer where one can "count" tags), makes it much harder to create links to texts which will undergo morphing over time. Jon Noring (p.s., I was peripherally involved with OSoft's talks with the OLPC folk regarding dotReader, so I know some behind-the-scenes info, although it was a while ago.
Looking on things, I see that if 'crossmark' is not suitable for all applications, then if:books "Sophie" may be more suitable. Sophie is more sophisticated, is focused more on multimedia, and under-the-hood is XML which, in principle at least, should have more link stability as publications morph over time. When one looks at the requirements for using etexts in interactive, collaborative environments, allowing annotation, bookmarking, highlighting, referencing/citation, integration with social networking, addition of multimedia, etc., etc., this brings out the deficiencies in systems based on "plain text" formats. 'crossmark' is certainly a little more than just plain text, but it still shares a lot of its design with plain text with regards to identification of document structure and text semantics. _______________________________________________ gutvol-d mailing list gutvol-d at lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFFzPT3I7J99hVZuJcRAvxNAJ0WvtgYPL0FLhGZ2f8sqyt+WHM4bACdFg+C ionP+dWNvoRZb6vheuVDjow= =17G4 -----END PGP SIGNATURE----- From jon at noring.name Fri Feb 9 14:26:57 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 15:26:57 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: <45CCF035.5060101@perathoner.de> References: <45CCF035.5060101@perathoner.de> Message-ID: <1657102868.20070209152657@noring.name> Marcello wrote: > Bowerbird at aol.com wrote: >> but to my mind, the necessity to code these "id" elements is symptomatic of >> my overall philosophical problem with heavy markup -- it seems unnecessary. >> why should any author have to code each piece of their file for it to be >> linkable? > Your philosophical problem would have dissolved if you had educated > yourself about XPath / XPointer. 
> > http://www.example.org/doc.tei# > xpointer(//div[@type="chapter"][7]/p[2]/ > range-to(//div[@type="chapter"][7]/p[5])) > > (put all in one line) selects the 2nd to the 5th para in the 7th chapter > of doc.tei without any need for an id in the text. > > You can use this to cite other XML documents down to the granularity of > their markup. If you are a linguist and mark up down to phoneme level > you can cite a single or a group of phonemes if you want. > > Because this is an XML standard it will work with any XML document, be > it TEI, XHTML, DocBook, XPS, etc. > > You see, you don't have to reinvent the wheel because XML comes with > batteries included. Another touché! An important consideration is to use, whenever possible, established standards. Bowerbird has so painted himself into the "XML is so utterly evil" corner, that he ignores the fact that XML is, day by day, becoming more and more ubiquitous, and is found in places one would not expect, such as banking. Is XML the best for many of these applications? Possibly not, but it is an open standard that works for a large number of applications. It has a huge number of people and top organizations backing it. It now has a huge tool base, and it is a unifying framework tying together normally disparate entities. XML itself has problems, but then every standard in the universe has problems because we live in an imperfect world.
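For what it is worth, here is a minimal sketch of the selection that the quoted XPointer expression describes (paragraphs 2 through 5 of the 7th chapter, with no ids in the text). It assumes Python's standard xml.etree.ElementTree rather than a real XPointer processor, and a toy document whose chapters are sibling div elements with no namespace (real TEI files are namespaced). The function name select_paragraphs and the sample data are hypothetical, made up only for illustration.

```python
import xml.etree.ElementTree as ET


def select_paragraphs(root, chapter_no, first, last):
    """Return the <p> children numbered first..last (1-based, inclusive)
    of the chapter_no-th <div type="chapter"> in document order.

    Assumption: chapters are not nested inside one another, so
    "document order" and "position among siblings" agree.
    """
    chapters = [d for d in root.iter("div") if d.get("type") == "chapter"]
    chapter = chapters[chapter_no - 1]
    paragraphs = chapter.findall("p")  # direct <p> children only
    return paragraphs[first - 1:last]


# Hypothetical toy document: 7 chapters with 6 paragraphs each.
doc = "<text>" + "".join(
    '<div type="chapter">'
    + "".join(f"<p>c{c}p{p}</p>" for p in range(1, 7))
    + "</div>"
    for c in range(1, 8)
) + "</text>"

root = ET.fromstring(doc)
selected = [p.text for p in select_paragraphs(root, 7, 2, 5)]
print(selected)
```

The point of the sketch is only that the addressing is positional: nothing in the source document had to be tagged with an id beforehand.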
Jon From jon at noring.name Fri Feb 9 14:48:39 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 15:48:39 -0700 Subject: [gutvol-d] TeleRead articles on PG's "Tarzan of the Apes" Message-ID: <1368470988.20070209154839@noring.name> A few here might be interested in two articles in today's TeleRead about PG's "Tarzan of the Apes" etext: http://www.teleread.org/blog/?p=6164 http://www.teleread.org/blog/?p=6168 Jon From Bowerbird at aol.com Fri Feb 9 17:50:57 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 20:50:57 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: jon noring said: > When one looks at the requirements for using etexts in interactive, > collaborative environments, allowing annotation, bookmarking, > highlighting, referencing/citation, integration with social networking, > addition of multimedia, etc., etc., this brings out the deficiencies > in systems based on "plain text" formats. you're utterly and completely wrong. and i'll prove it with pudding. -bowerbird From Bowerbird at aol.com Fri Feb 9 18:28:49 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 21:28:49 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: aaron said: > one of those updates could easily include > extensions to the crossmark format you have a fundamental misunderstanding of the project. the crossmark format might _well_ "evolve" over time, but it won't be because of "extensions" that are "received" as "software" that was being "downloaded" as an "update". no, in _stark_ contrast, if the format "evolves", it will be because the kids themselves saw fit to improve upon it.
it's hard for "top-down" westerners to understand that the olpc is a tool in service to a pedagogical philosophy that the students learn in collaboration with each other. this is one of the main reasons negroponte is refusing to allow the machines to "dribble out", especially in the west (and especially not to gadget freaks who just want to play with the latest toy, or see how it works as an e-book reader, and who whine when they're told their money is no good, like the spoiled-rotten first-world ugly-americans they are). instead, he wants the machines placed into an infrastructure truly dedicated to this revolutionary educational methodology. (he also says -- rightly -- that as long as the machine has zero presence in the commercial world, there'll be no gray market.) the open-source ethic permeates the project. the expectation is that the kids will soon be modifying the actual _programs_! (and if they're doing that, then modifying a file-format is easy.) and this expectation is one they are actively putting into play. the keyboard has a specific _key_ dedicated to "view source"... (negroponte also bragged that they'd "lost" the caps-lock key. "it might seem disproportionate to create a $25-million project just to get rid of the caps-lock key, but you do what you have to," he joked. and yeah, the crowd got a good laugh out of that one.) it is also expected the kids will generate _course_materials_ and eventually end up sharing them throughout their nation. so there is an assumption of self-reliability that is unmistakable; it's not a project where decisions will be handed down from above. a good deal of care and attention has gone into making a tool that engenders the communication and collaboration necessary to pull off these expectations, and if it works -- even a little bit -- i'm certain that it will be tremendously stimulating to those kids... -bowerbird
From Bowerbird at aol.com Fri Feb 9 18:31:27 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 21:31:27 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: jon noring said: > XML is, day by day, becoming more and more ubiquitous trends come. and trends go. there is nothing as certain as change. nothing. the more experience people have with x.m.l., the more they'll want to ditch it. -bowerbird From Bowerbird at aol.com Fri Feb 9 18:36:26 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 21:36:26 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: jon noring said: > plain text creates more problems for interactivity than would XML. my plain-text system will be doing backflips while your x.m.l. is still comatose. -bowerbird From Bowerbird at aol.com Fri Feb 9 18:37:42 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Feb 2007 21:37:42 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: marcello said: > Your philosophical problem would have dissolved > if you had educated yourself about XPath / XPointer. which one, xpath or xpointer? aren't they different? > http://www.example.org/doc.tei# > xpointer(//div[@type="chapter"][7]/p[2]/ > range-to(//div[@type="chapter"][7]/p[5])) > (put all in one line) selects > the 2nd to the 5th para in the 7th chapter of doc.tei > without any need for an id in the text. and now tell me which browsers support this... yeah, that's what i thought.
but you're making my point for me, which is that manual markup of "id" elements will soon become unnecessary... even the markup technoids agree that it's unnecessary, and woefully inadequate on top of that. ya know, marcello, all this agreement between us is starting to grate on my nerves... how about yours? -bowerbird From jon at noring.name Fri Feb 9 18:41:57 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 19:41:57 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <785130834.20070209194157@noring.name> Bowerbird wrote: > jon noring said: >> XML is, day by day, becoming more and more ubiquitous > trends come. and trends go. there is nothing as certain as change. nothing. Agreed on this. So far I see nothing on the horizon which is poised to replace XML for a large number of applications now using XML. All I see is continued growth and expansion as people become familiar with XML and the huge tool and application sets that have been built to process XML and associated technologies. And you forget that there are two basic areas that XML operates in: digitized textual content, and data structuring. (Some might say all applications of XML, even for PG texts, is ultimately data structuring.) The biggest application of XML by far is for data applications. > the more experience people have with x.m.l., the more they'll want to ditch it. Is your comment attributable to noted experts in the field, or simply your opinion? Opinions are fine, I have a million of them myself, but it is good to preface what you wrote with "I believe that the more experience...". Without it, I assume you are referring to the positions of noted authorities in the areas of digital texts and databases?
So, I welcome some sort of references sufficient for those of us interested to look up their positions. Jon From jon at noring.name Fri Feb 9 18:51:19 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 19:51:19 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <1205879424.20070209195119@noring.name> Bowerbird wrote: > but you're making my point for me, which is that manual > markup of "id" elements will soon become unnecessary... Who said that adding 'id' is always manual? In the BookX project I'm working on, it is expected that the script to check a BookX document for conformity will also add in 'id' values that don't already exist automatically. Push a button, and they are identified. (And note these *will* work in all browsers as fragment identifiers.) And it is interesting with the group I worked with for testing out BookX, where they were working with the XML, pointy brackets and all, in epcEdit. Within a few minutes they had it mastered. They don't have to worry about number of lines, or indentation so many characters, or tabs, or all the funky layout crap in ZML. The tool did a lot of the work for them. So my experience is that with the right vocabulary, small publishers will quickly learn, and with a nice tool like epcEdit, that they can build an XML document just as easily as a plain text file, and BookX already defines a larger number of structures than ZML ever can, and it can be extended. So there, you want simplicity? Jon From jon at noring.name Fri Feb 9 18:53:07 2007 From: jon at noring.name (Jon Noring) Date: Fri, 9 Feb 2007 19:53:07 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <914236729.20070209195307@noring.name> Bowerbird wrote: > jon noring said: >> plain text creates more problems for interactivity than would XML. > my plain-text system will be doing backflips while your x.m.l. is still comatose.
Next thing you'll probably demand is that we pull our pants down and compare our manhoods. Jon From robert_marquardt at gmx.de Sat Feb 10 08:21:40 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Sat, 10 Feb 2007 17:21:40 +0100 Subject: [gutvol-d] [gutvol-p] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <20070202003742.GA25119@joeysmith.com> References: <20070202003742.GA25119@joeysmith.com> Message-ID: <58srs29jk0h1rttbh3r6f5cq82qdqhsbel@4ax.com> On Thu, 1 Feb 2007 17:37:42 -0700, you wrote: >Is there any good reason not to unify these to one particular style >of newline across the board? "\r\n" seems a likely candidate since >the other platforms are generally more lenient towards Windows >line-endings than Windows is toward theirs, in my (albeit limited) >experience. I just ran across "lani10.txt" aka #2509 "The Lani People". I had to convert it to \r\n for my SF CD. I think we should change the books to \r\n line ends if these are real inconsistencies and not intentional formatting. This is not about formatting but about correcting inconsistencies. We have many more of them. With the inconsistencies corrected, the formatting question can be discussed and decided (or not). If it is decided and a global change of the files is needed, then it can be done by a program, and the program will not stumble over the inconsistencies. -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From marcello at perathoner.de Sat Feb 10 11:48:44 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat, 10 Feb 2007 20:48:44 +0100 Subject: [gutvol-d] Sugar-based reader for OLPC?
In-Reply-To: References: Message-ID: <45CE219C.8000205@perathoner.de> Bowerbird at aol.com wrote: > this is one of the main reasons negroponte is refusing to > allow the machines to "dribble out", especially in the west > (and especially not to gadget freaks who just want to play > with the latest toy, or see how it works as an e-book reader, If this is true, it is a very stupid idea. PG is not going to support this format if a machine doesn't "dribble out" into my hands for testing. And if they wait for Bowerbird to implement this: One Laptop per Octogenarian. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Sat Feb 10 11:51:02 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 Feb 2007 14:51:02 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: jon noring said: > Next thing you'll probably demand is that we > pull our pants down and compare our manhoods. um... it's one thing to make comments as unprofessional as that. (and this isn't the first time you've made that specific one...) but i think it's particularly unseemly to try and make it appear that _i_ would be the one saying it. that's dishonest. so don't do that again, jon. -bowerbird From marcello at perathoner.de Sat Feb 10 11:54:50 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat, 10 Feb 2007 20:54:50 +0100 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <45CE230A.2060507@perathoner.de> Bowerbird at aol.com wrote: > which one, xpath or xpointer? aren't they different? One uses the other. > and now tell me which browsers support this... http://xpointerlib.mozdev.org/ > but you're making my point for me, which is that manual > markup of "id" elements will soon become unnecessary... Which browsers are going to support ZML?
-- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Sat Feb 10 11:59:09 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 Feb 2007 14:59:09 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: marcello said: > If this is true, it is a very stupid idea. > PG is not going to support this format > if a machine doesn't "dribble out" > into my hands for testing. so you're the person who decides what formats p.g. will "support"? and you're saying you'll convert the library to their format? but only if they put one of their machines "in your hands" for "testing"? well, i'm sure they'll be glad to have the situation defined so clearly. -bowerbird p.s. but hey, if i could, i'd pay $100 for a machine for you, marcello, just so you'd convert the whole p.g. library to their crossmark format, because it's just a short distance from crossmark to my z.m.l. format... i haven't been looking for any help converting the p.g. library to z.m.l., but if someone would do it for me for $100, i'd be overjoyed to pay it... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070210/186d79eb/attachment-0001.htm From Bowerbird at aol.com Sat Feb 10 12:06:43 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 Feb 2007 15:06:43 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: marcello said: > http://xpointerlib.mozdev.org/ typical. when you ask a technoid for an example, they send you to a bureaucratic spec-sheet. it's the traditional "snow-the-user" tactic. let me know when i can use links like the "examples" you furnished earlier, marcello. oh, and when that day comes, i will tell you that i was right when i said a better form of linking would be forthcoming, so we didn't need to use primitive forms like "id" tagging. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070210/42f2360b/attachment.htm From Bowerbird at aol.com Sat Feb 10 12:20:15 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 Feb 2007 15:20:15 EST Subject: [gutvol-d] TeleRead articles on PG's "Tarzan of the Apes" Message-ID: jon noring said: > TeleRead about PG's "Tarzan of the Apes" etext: in a nutshell, the p.g. e-text apparently wasn't based on the first edition. -bowerbird p.s. it _appears_ to be based on a 1984 edition, but we do not know whether or not that 1984 edition was based on some earlier edition, since -- like most p-books -- it offers the reader zero provenance... which means the p.g. version _could_ be based on an earlier edition, the same one as the 1984 edition. or maybe not. but who really cares? there's no question that "many modern editions" match the p.g. edition. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070210/4786dac9/attachment.htm From lee at novomail.net Sat Feb 10 13:23:07 2007 From: lee at novomail.net (Lee Passey) Date: Sat, 10 Feb 2007 14:23:07 -0700 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> Message-ID: <45CE37BB.6010102@novomail.net> Michael Hart wrote: > Once a reader figures out what they want, they can convert directories > of files to their own specifications in literally seconds, so why force > any of this on them? 
> > Michael So, why is it that it's not OK to force a reader to deal with multiple variants of "\r\n" or "\n" or "\r", but it /is/ OK to force them to deal with unnecessary line endings in the first place? From jon at noring.name Sat Feb 10 14:28:13 2007 From: jon at noring.name (Jon Noring) Date: Sat, 10 Feb 2007 15:28:13 -0700 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <1321921947.20070210152813@noring.name> Bowerbird wrote: > jon noring said: >> Next thing you'll probably demand is that we >> pull our pants down and compare our manhoods. > it's one thing to make comments as unprofessional as that. Just replying to your unprofessionalism in this whole discussion. I believe others saw the irony in my comment. > (and this isn't the first time you've made that specific one...) Nope, and it is fitting considering your general bragging tone. > but i think it's particularly unseemly to try and make it appear > that _i_ would be the one saying it. that's dishonest. > > so don't do that again, jon. I've learned from a master at it named Bowerbird. Jon From marcello at perathoner.de Sat Feb 10 14:28:23 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat, 10 Feb 2007 23:28:23 +0100 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: References: Message-ID: <45CE4707.1050906@perathoner.de> Bowerbird at aol.com wrote: > so you're the person who decides what formats p.g. will "support"? Almost ... I'm the only one who knows how the compiler works :-) > and you're saying you'll convert the library to their format? but > only if they put one of their machines "in your hands" for "testing"? I'm saying that PGTEI will only convert to open source formats I can get to play with, like plucker on my cell phone (and I bought that myself). So if I can't buy one I don't see how I'm going to write a converter for it. But maybe you want to step in and write one over the weekend without having ever seen the hardware.
> p.s. but hey, if i could, i'd pay $100 for a machine for you, marcello, You cannot afford $100? Better get a job. -- Marcello Perathoner webmaster at gutenberg.org From desrod at gnu-designs.com Sat Feb 10 14:42:52 2007 From: desrod at gnu-designs.com (David A. Desrosiers) Date: Sat, 10 Feb 2007 17:42:52 -0500 Subject: [gutvol-d] Sugar-based reader for OLPC? In-Reply-To: <45CE4707.1050906@perathoner.de> References: <45CE4707.1050906@perathoner.de> Message-ID: <1171147372.7043.9.camel@localhost.localdomain> On Sat, 2007-02-10 at 23:28 +0100, Marcello Perathoner wrote: > I'm saying that PGTEI will only convert to open source formats I can > get to play with, like plucker on my cell phone (and I bought that > myself). I hope you mean you bought the cellphone, not Plucker :) (not that we mind that our little project is generating revenue for anyone, just that they remain within the terms of the license binding Plucker if they do so :) -- David A. Desrosiers desrod at gnu-designs.com Skype username: setuid http://gnu-designs.com "The palest ink is better than the most retentive memory." - Old Chinese Proverb From Bowerbird at aol.com Sat Feb 10 15:10:59 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 Feb 2007 18:10:59 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: marcello said: > So if I can't buy one I don't see how I'm going to write a converter i can understand the crossmark format fine, even without a machine. but maybe that's because it's very similar to my zen markup language. > But maybe you want to step in and write one over the weekend > without having ever seen the hardware.
as soon as their format is finalized, yes, i will do _exactly_ that... and since i've done the work of programming routines that can identify and auto-correct the inconsistencies in the p.g. e-texts, my conversion will be a whole lot better than yours would be... > Better get a job. nah, i think i'll pass on that... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070210/e6df5109/attachment.htm From Bowerbird at aol.com Sat Feb 10 17:32:10 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 10 Feb 2007 20:32:10 EST Subject: [gutvol-d] Sugar-based reader for OLPC? Message-ID: i said: > but hey, if i could, i'd pay $100 for a machine for you, marcello unfortunately, they aren't making any sales to individuals... and i'm sure not gonna give you the machine they give me! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070210/e0320a85/attachment.htm From hart at pglaf.org Sat Feb 10 19:20:18 2007 From: hart at pglaf.org (Michael Hart) Date: Sat, 10 Feb 2007 19:20:18 -0800 (PST) Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CE37BB.6010102@novomail.net> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> Message-ID: On Sat, 10 Feb 2007, Lee Passey wrote: > Michael Hart wrote: >> Once a reader figures out what they want, they can convert directories >> of files to their own specifications in literally seconds, so why force >> any of this on them? 
>> >> Michael > > So, why is it that it's not OK to force a reader to deal with multiple > variants of "\r\n" or "\n" or "\r", but it /is/ OK to force them to deal > with unnecessary line endings in the first place? Sorry, I am not involved with the first issue you raised, but as to the second, we used both cr AND lf because virtually ALL the readers would read things with one or the other AND both, and we didn't want to leave anyone out. It's quite a different thing to get extra line endings than no line endings, not to mention how little effort it ever took to convert from cr/lf to either just cr or to lf, if someone really wanted to change the way it looked. Having eBooks that ALL can read is the most important factor, making various alterations to tailor them to various tastes opens up a dozen other cans of worms. Michael From marcello at perathoner.de Sun Feb 11 12:06:29 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 11 Feb 2007 21:06:29 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> Message-ID: <45CF7745.4060401@perathoner.de> Michael Hart wrote: > Having eBooks that ALL can read is the most important factor, > making various alterations to tailor them to various tastes > opens up a dozen other cans of worms. The most important factor is not to produce books people *can* read, but books people actually *want* to read. 
Not books you could *theoretically* read on a 386 class machine if you were a night watchman in a technology museum or otherwise able by hook or crook to come by such a relic of the past, but books that look appealing on the devices people use every day, like cell phones, PDAs, iPods etc. Outside your ivory tower the world has changed. The OLPC machine runs Linux and has a graphical interface. Charities don't accept anything less than Pentium class machines. And, make no mistake, HTML is here to stay. If HTML was to go away the Internet Archive would become a gazillion petabytes of rubbish. Don't you think governments will salvage the world heritage set in HTML? The time has come to change PG policy. Require a unicode HTML version and make the plain vanilla ascii version optional. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Sun Feb 11 13:09:24 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 11 Feb 2007 16:09:24 EST Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: michael- i'll soon be mounting a large-scale demonstration showing the superiority of your traditional format (with small modifications) in creating a cyberlibrary -- .html on the web, in unicode, with the range of capabilities people expect from electronic-books, including functionality to quickly make corrections, and an ability to smoothly create alternate formats. so i'd suggest you resist pressures to change that. (as if you needed help from me to do that.) :+) i'll prove to the world that you were right all along. -bowerbird p.s. by the way, to make the process easy for people, i've put up a web-page that will unwrap a p.g. e-text: > http://snowy.arsc.alaska.edu/bowerbird/unwrap.pl i'm becoming a regular old script-kiddie these days! -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070211/a4f32da2/attachment.htm From robert_marquardt at gmx.de Sun Feb 11 20:43:19 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Mon, 12 Feb 2007 05:43:19 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CF7745.4060401@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> Message-ID: <2lrvs2t1rsj2tnj02g2c5jhk2eruae4ubd@4ax.com> On Sun, 11 Feb 2007 21:06:29 +0100, you wrote: >Outside your ivory tower the world has changed. The OLPC machine runs >Linux and has a graphical interface. Charities don't accept anything >less than pentium class machines. This is a bit impolite, but true. >And, make no mistake, HTML is here to stay. If HTML was to go away the >Internet Archive would become a gazillion petabytes of rubbish. Don't >you think governments will salvage the world heritage set in HTML? I think an XML or SGML DTD is what we need. After all SGML has been developed for this task. >The time has come to change PG policy. Require a unicode HTML version >and make the plain vanilla ascii version optional. I think we are still far from that goal. first we need to work on the foundations. What most of the texts still lack is metadata which structures the texts. HTML is not good enough for all the aspects of that. i currently discuss with andrew Sly in private about tools for fixing the inconsistencies of the PG data. I think i will drag this discussion in the open. What we definitely need is a tool to handle the whole lifecycle of a document in the PG database. 
In fact I think we need something on the level of Eclipse targeted to XML documents. -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From schultzk at uni-trier.de Sun Feb 11 23:45:59 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon, 12 Feb 2007 08:45:59 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CF7745.4060401@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> Message-ID: On 11.02.2007 at 21:06, Marcello Perathoner wrote: > Michael Hart wrote: > >> Having eBooks that ALL can read is the most important factor, >> making various alterations to tailor them to various tastes >> opens up a dozen other cans of worms. > > The most important factor is not to produce books people *can* > read, but > books people actually *want* to read. > > Not books you could *theoretically* read on a 386 class machine if you > were a night watchman in a technology museum or otherwise able by hook > or crook to get by such a relic of the past, but books that look > appealing on the devices people use every day, like cell phones, PDAs, > iPods etc. > > Outside your ivory tower the world has changed. The OLPC machine runs > Linux and has a graphical interface. Charities don't accept anything > less than pentium class machines. In all of the above you do have a point, but a requirement for HTML just causes wars over whether XML, ZML, etc. is better. What PG needs is what I call a base format, one that easily facilitates other formats. PG needs a base format to fill the needs of the contributors to PG, not an existing format. > > And, make no mistake, HTML is here to stay.
If HTML was to go away the > Internet Archive would become a gazillion petabytes of rubbish. Don't > you think governments will salvage the world heritage set in HTML? > > > The time has come to change PG policy. Require a unicode HTML version > and make the plain vanilla ascii version optional. PG policy should change. Yet, to their own format and serve the other formats as HTML, PDF, etc. regards Keith. From schultzk at uni-trier.de Sun Feb 11 23:54:15 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon, 12 Feb 2007 08:54:15 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45CE37BB.6010102@novomail.net> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> Message-ID: Am 10.02.2007 um 22:23 schrieb Lee Passey: > Michael Hart wrote: >> Once a reader figures out what they want, they can convert >> directories >> of files to their own specifications in literally seconds, so why >> force >> any of this on them? >> >> Michael > > So, why is it that it's not OK to force a reader to deal with multiple > variants of "\r\n" or "\n" or "\r", but it /is/ OK to force them to > deal > with unnecessary line endings in the first place? Actually, it has nothing to do with the line ending. It is just that PG (Mr Hart) is not willing to change policy and go with more modern times. The normal user does not even know about line endings and does not care. They use what is available. The discussion going on is with "developers". The PG conventions were O.K. when PG started, but 25 years later they are far from adequate and acceptable !! This can easily be seen from the reoccuring discusions here in this forum. 
regards Keith. From schultzk at uni-trier.de Mon Feb 12 00:28:09 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon, 12 Feb 2007 09:28:09 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> Message-ID: Am 11.02.2007 um 04:20 schrieb Michael Hart: > > On Sat, 10 Feb 2007, Lee Passey wrote: > >> Michael Hart wrote: >>> Once a reader figures out what they want, they can convert >>> directories >>> of files to their own specifications in literally seconds, so why >>> force >>> any of this on them? >>> >>> Michael >> >> So, why is it that it's not OK to force a reader to deal with >> multiple >> variants of "\r\n" or "\n" or "\r", but it /is/ OK to force them >> to deal >> with unnecessary line endings in the first place? > > Sorry, I am not involved with the first issue you raised, but as to > the > second, we used both cr AND lf because virtually ALL the readers would > read things with one or the other AND both, and we didn't want to > leave > anyone out. It's quite a different thing to get extra line endings > than no line endings, not to mention how little effort it ever took > to convert from cr/lf to either just cr or to lf, if someone really > wanted to change the way it looked. > > Having eBooks that ALL can read is the most important factor, > making various alterations to tailor them to various tastes > opens up a dozen other cans of worms. I can not agree more. Yet, as you well know any system offering data that can be universially used must be appropriately formated to serve ALL. The current PG Plain Vanilla Text is not up to this job! 
Of course we can develop tools for this. But this type of processing is NO TRIVIAL task, as some believe. From what I have seen, the different projects and contributors have not yet developed a system that fully works, in ALL the years they have been working on this problem. I dare say it is even decades. The complexity of NLP (natural language processing) is also beyond the scope of these contributors, in my opinion. Please do not get me wrong. They have done great work and are making progress. I have several times started and restarted the task of developing a converter/processing system for PG, but soon came to the conclusion that the task is futile, due to: 1) the inconsistencies in PG texts (actually this is a smaller problem and could be solved by manual editing) 2) the nature of the language processing involved in automating the processing of PG texts: a) handling most inconsistencies b) the need to develop a parser to handle reformatting (recognition of features of the text that hint at formatting) c) the system would work for just one language (see b) d) the nature of PG etexts: information is lost that can hardly be retrieved by analysis. I am not saying we can not make something cute and readable out of PG etexts, but the effort involved is a waste of resources. If PG changed its policy and endorsed its OWN format with minimal markup, points 1 and 2 would be moot. Furthermore, everybody working with PG etexts would benefit and could finish their projects in very little time. The entire discussion here is reminiscent of the discussions over databases and formats. It is also why most systems today can still handle tab-delimited files!! regards Keith. From schultzk at uni-trier.de Mon Feb 12 00:37:18 2007 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Mon, 12 Feb 2007 09:37:18 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> Message-ID: <0BAD92C4-FBF2-4D22-BB06-19758296F324@uni-trier.de> Am 09.02.2007 um 12:58 schrieb Michael Hart: > > On Fri, 9 Feb 2007, Schultz Keith J. wrote: > >> [snip, snip ... snip] >> One reason for having Cr+lf as a line ending is that pratically >> any system will find a line ending. Most modern(!) programs >> generally >> ignore the extra character today and display the lines correctly. >> This was not true in the past and one would have this ugly >> character at >> the end of a line. >> >> Keith. > > I can't tell you how many files I have received that lose their > margination > when passing through various email systems, file conversion > programs, etc. > > Right now I am proofreading a book for someonw who uses an editing > program > none of my friends seem to know much, and while I can sometimes see > lines > as true lines in the email, but not after the file is saved. Thanks, for the proof that PG needs its own standard/format that all contributors can expect!!! > > It would be NICE if every program simply had an option to save the > lines > as lines with cr/lf if wanted, and vice versa. When Apple and Microsoft finally fuse this will likely happen! ;-))) > > Of course, there are outboard programs that can change cr/lf to either > cr or lf in just one second, providing you know which you want, and > also vice versa. > > Once a reader figures out what they want, they can convert directories > of files to their own specifications in literally seconds, so why > force > any of this on them? 
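The conversion described in the quoted paragraph above, together with the ordering raised at the start of this thread (replace carriagereturn+linefeed first, then lone carriagereturns, then linefeeds), can be sketched in a few lines of Python. This is only a minimal illustration, not an existing PG tool: the names `normalize_newlines` and `convert_directory` are hypothetical, and it assumes the e-texts are plain `.txt` files in an ASCII-compatible encoding (ASCII, Latin-1, or UTF-8) that can safely be handled as raw bytes.

```python
from pathlib import Path


def normalize_newlines(data: bytes, target: bytes = b"\r\n") -> bytes:
    """Return data with every CRLF, lone CR, and lone LF replaced by target."""
    # Collapse CRLF first, so its CR and LF are not counted as two line ends.
    data = data.replace(b"\r\n", b"\n")
    # Any CR still present is a lone (old Mac style) line end.
    data = data.replace(b"\r", b"\n")
    # Every line end is now a single LF; expand to the requested style.
    if target != b"\n":
        data = data.replace(b"\n", target)
    return data


def convert_directory(root: str, target: bytes = b"\r\n") -> int:
    """Rewrite every *.txt file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.txt"):
        original = path.read_bytes()
        converted = normalize_newlines(original, target)
        if converted != original:
            path.write_bytes(converted)
            changed += 1
    return changed
```

A file that mixes all three styles comes out with one style only, for example `normalize_newlines(b"a\r\nb\rc\nd")` yields `b"a\r\nb\r\nc\r\nd"`. Going through a single LF as the intermediate form plays the role of the "magic" character in the original recipe, so a file that has already been converted is left unchanged by a second pass.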
But this IS THE PROBLEM. Readers generally have neither the tools nor the know-how. Please do not forget we are geeks. regards Keith. From schultzk at uni-trier.de Mon Feb 12 00:47:17 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon, 12 Feb 2007 09:47:17 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: Message-ID: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> I am not asking for a particular format for distributing etexts! Just a base format so that the different projects have something better to work with. It is not supposed to be the (w)holy grail. As I have described time and time again, we can still have plain texts. Practically all projects I know of concerning language processing use an intermediate format to facilitate the processing needed to achieve the goals of the research. PG needs an intermediate base format to facilitate the production of etexts in all shapes, sizes, and formats. Keith. On 09.02.2007 at 12:51, Michael Hart wrote: > > It's not so much what Mr. Bowerbird says, as that this was a request > for Project Gutenberg to pick, as "official," any one particular > format > over all the others. I prefer to encourage everyone to develop > formats > they enjoy without trying to enforce their decisions on others, rather > to provide a range of formats, just as there are a range of book > sizes, > styles, appearances, etc. > > We are not going to anoint any one format as the only "official" > format, > though I am sure from time to time new formats will be invented and > have > certain times in greater vogue than others. > > Michael > > > On Fri, 9 Feb 2007, Bowerbird at aol.com wrote: > >> keith said: >>> The question is how do we get Mr. Hart to agree with us >>> so that PG gets a truly usable BASE FORMAT, >>> so that everybody can get the most out of PG texts. >> >> keith, keith, keith, you were doing so good right up until then. >> >> you think "mr. hart" is who needs to have his head turned around?
>> >> what a laugh! :+) >> >> you're _wrong_, buddy, and i mean 180-degrees the wrong way. >> >> michael hart is the visionary here with his insistence on plain text. >> >> if he wouldn't have demanded a plain-text copy of every e-text, >> i probably wouldn't have even realized that plain-text copies have >> embedded within 'em, most of the info we need to know about 'em. >> (which by the way, goes far beyond your basic list, keith.) >> >> and i would have never took that idea, run with it, and made it work. >> >> and now the world is gonna have "a truly usable base format", >> one "so that everybody can get the most out of p.g. texts", and >> on top of all that, it's simple to learn too. a 4th-grader can >> do it. >> >> and all because little old me saw pure insight in michael's eyes... >> >> so, you script kiddies can just sit back and watch, while i show you >> -- with working code -- that michael hart is one true genius here. >> and that his format can kick the ass of your format, here to sunday. >> >> -bowerbird > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Mon Feb 12 01:04:56 2007 From: schultzk at uni-trier.de (Schultz Keith J.) Date: Mon, 12 Feb 2007 10:04:56 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: Message-ID: <1132C394-BA80-4F00-A0C6-5DF28DC17955@uni-trier.de> Am 09.02.2007 um 12:41 schrieb Bowerbird at aol.com: > keith said: > > The question is how do we get Mr. Hart to agree with us > > so that PG gets a truely usuable BASE FORMAT, > > so that everbody can get the most out of PG texts. > > keith, keith, keith, you were doing so good right up until then. Thank you very much. What a great compliment coming from you. You must be getting old and tried. > > you think "mr. hart" is who needs to have his head turned around? > > what a laugh! 
:+) > > you're _wrong_, buddy, and i mean 180-degrees the wrong way. > > michael hart is the visionary here with his insistence on plain text. Yes, he has a vision. A very good one. So did Galileo, Newton, and Einstein. That does not mean, though, that their wisdom, knowledge and vision cannot be improved. > > if he wouldn't have demanded a plain-text copy of every e-text, > i probably wouldn't have even realized that plain-text copies have > embedded within 'em, most of the info we need to know about 'em. > (which by the way, goes far beyond your basic list, keith.) Really? How did you figure that out all by yourself? Thank you for the insight. But, to tell you the truth: you know practically nothing about natural language processing. The information is not in the text, but in the minds of each and every person who uses the text. This information is called extralinguistic information! Your system has this information coded into it by you! > > and i would have never took that idea, run with it, and made it work. > > and now the world is gonna have "a truly usable base format", > one "so that everybody can get the most out of p.g. texts", and > on top of all that, it's simple to learn too. a 4th-grader can do it. I can teach any 4th-grader to do complex math and physics, but I am not bragging about it. > > and all because little old me saw pure insight in michael's eyes... > > so, you script kiddies can just sit back and watch, while i show you > -- with working code -- that michael hart is one true genius here. > and that his format can kick the ass of your format, here to sunday. Have you ever thought about contacting MIT? They would love to have your insight. They have a couple of projects in NLP and AI that have failed just because of the problem you just solved!!!!
> > -bowerbird From Bowerbird at aol.com Mon Feb 12 01:09:22 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 04:09:22 EST Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: keith said: > the requirement for HTML just causes wars XML, ZML, etc is better. > What PG needs is what I call a base format that easily facilitates over formats. keith, the main reason there have been "wars" here over the years is because i wanted to see how much credibility my opponents would be willing to wager. plus i was hoping they'd waste a lot of their life trying to make the x.m.l. work. it ended up they were willing to bet all their credibility. (which they will lose.) sadly, they seemed unwilling to waste much of their time and energy on x.m.l. (they probably realized it was a shell game, and hoped to snooker volunteers; heck, even _i_ would like the library in x.m.l. if someone else did all the work!) now, though, the time for talking is over. it's time for _proof_ via _pudding_... i am about to show you -- beyond the point at which anyone can continue to doubt the truth of the matter -- that what i've been saying all along is correct. michael's format gives the best bang for the buck to do what needs to be done. in retrospect, maybe i should have done it much sooner. but i was having fun. -bowerbird
From klofstrom at gmail.com Mon Feb 12 01:13:15 2007 From: klofstrom at gmail.com (Karen Lofstrom) Date: Sun, 11 Feb 2007 23:13:15 -1000 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> Message-ID: <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> I'm not sure how it's going, but Joshua Hutchinson at DP is working on DP-TEI, a subset of TEI that can be used as a base for all sorts of file conversions. TEI is XML-compliant, so there shouldn't be any problems there. Insisting on outdated file formats is like keeping your data on 8-inch floppies. Remember those? Any computer file system has to be continually upgraded. I have 20-year-old files on my computer, files that have been through a number of upgrades. I think they started on a CP/M machine. If I'd insisted on keeping them in their original format -- they'd be useless. -- Karen Lofstrom old fart From joshua at hutchinson.net Mon Feb 12 06:30:16 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Mon, 12 Feb 2007 14:30:16 +0000 (UTC) Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: <4017733.1171290616395.JavaMail.?@fh1038.dia.cp.net> PGTEI is developed by Marcello. I'm the main DP evangelist. ;) As far as status, it works well. A TEI master can be converted automatically to UTF-8, ISO-8859-1 and US-ASCII text files, HTML and PDF. So far, roughly 100 books have been posted to PG's archives using PGTEI as the master format. Once the learning curve is out of the way, a TEI master is roughly as much work as an HTML edition. The biggest problem is a lack of automation tools. A *lot* of DP's in-house shorthand maps to TEI markup, but we don't have the great tools that the HTML process has (e.g., GuiGuts, GutAxe, GutCutter, etc.).
If anyone is interested in helping to develop such tools, let me know. I make a great test subject! :) If anyone wants to create a PGTEI master, let me know. David Widger will post PGTEI files, but I usually handle the first pass check to make sure everything is valid and "looks ok" before sending the files to David, who can create the entire suite of file types with a single commandline. Josh >----Original Message---- >From: klofstrom at gmail.com >Date: Feb 12, 2007 4:13 >To: "Project Gutenberg Volunteer Discussion" >Subj: Re: [gutvol-d] Plain Text, Hand on the Torch > >I'm not sure how it's going, but Joshua Hutchinson at DP is working on >DP-TEI, a subset of TEI that can be used as a base for all sorts of >file conversions. TEI is XML-compliant, so there shouldn't be any >problems there. > >Insisting on outdated file formats is like keeping your data on 8- inch >floppies. Remember those? Any computer file system has to be >continually upgraded. I have 20-year-old files on my computer, files >that have been through a number of upgrades. I think they started on a >CPM machine. If I'd insisted on keeping them in their original format >-- they'd be useless. > >-- >Karen Lofstrom >old fart >_______________________________________________ >gutvol-d mailing list >gutvol-d at lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > From gpdimonderose at hotmail.it Mon Feb 12 07:46:11 2007 From: gpdimonderose at hotmail.it (giacinto plescia) Date: Mon, 12 Feb 2007 16:46:11 +0100 Subject: [gutvol-d] 4097acc3e1b8f7ce8958bf98332c98bbb7982ddc Message-ID: An HTML attachment was scrubbed... 
From sly at victoria.tc.ca Mon Feb 12 08:28:20 2007 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon, 12 Feb 2007 08:28:20 -0800 (PST) Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> Message-ID: Karen: What do you think of the argument that there are 20-year-old files in the PG collection which are just as easy for anyone to use now as when they were made? Andrew On Sun, 11 Feb 2007, Karen Lofstrom wrote: > Insisting on outdated file formats is like keeping your data on 8-inch > floppies. Remember those? Any computer file system has to be > continually upgraded. I have 20-year-old files on my computer, files > that have been through a number of upgrades. I think they started on a > CPM machine. If I'd insisted on keeping them in their original format > -- they'd be useless. > > From hart at pglaf.org Mon Feb 12 08:52:02 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 08:52:02 -0800 (PST) Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> Message-ID: The argument being proposed here is to reduce the number of formats rather than to increase them. However, the fewer the formats, then the fewer people we can reach. Project Gutenberg encourages everyone to try their own formats, but is not going to pick one format above and beyond all the others and say "This is THE official Project Gutenberg Format."
People have tried this for years, and we always encourage them with our complete cooperation, volunteering to provide all the space for their project, servers to distribute their results, and even to get their requests for volunteers out on their behalf. However, what they are so often seeking is some way to "take over," and become annointed as the ONLY accepted format, and what they are really asking for is for someone to wipe out all competition. The only competition is out there in the real world where you would be competing to get your eBook downloaded. Thanks!!! Give the world eBooks in 2007!!! Michael S. Hart Founder Project Gutenberg Blog at http://hart.pglaf.org On Mon, 12 Feb 2007, Andrew Sly wrote: > Karen: > > What do you think of the argument that there are 20-year-old > files in the PG collection which are just as easy for anyone > to use now as when they were made? > > Andrew > > On Sun, 11 Feb 2007, Karen Lofstrom wrote: > >> Insisting on outdated file formats is like keeping your data on 8-inch >> floppies. Remember those? Any computer file system has to be >> continually upgraded. I have 20-year-old files on my computer, files >> that have been through a number of upgrades. I think they started on a >> CPM machine. If I'd insisted on keeping them in their original format >> -- they'd be useless. >> >> > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Mon Feb 12 09:11:32 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 12:11:32 EST Subject: [gutvol-d] =?iso-8859-1?q?!=40!Re=3A=A0_Plain_Text=2C_Hand_on_the?= =?iso-8859-1?q?_Torch?= Message-ID: michael said: > we always encourage them with our complete cooperation, > volunteering to provide all the space for their project, > servers to distribute their results yes. they made this offer to me, and i accepted it. 
-bowerbird From jon at noring.name Mon Feb 12 09:12:53 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 10:12:53 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter Message-ID: <847204396.20070212101253@noring.name> Bowerbird noted: > sadly, they seemed unwilling to waste much of their time and energy on x.m.l. > (they probably realized it was a shell game, and hoped to snooker volunteers; > heck, even _i_ would like the library in x.m.l. if someone else did all the > work!) Well, as noted, the DP folk are instituting PGTEI, and it has always been the intent of Charles, Juliet, et al, to evolve their workflow to the XML "digital master" approach. Already 100 texts are finished in PGTEI as Joshua just noted. Older texts can be recast into PGTEI as time goes on. > it's time for _proof_ via _pudding_... > i am about to show you -- beyond the point at which anyone can continue to > doubt the truth of the matter -- that what i've been saying all along is > correct. Actually, some of the things you've said over the last couple years are spot on. In fact, Bowerbird, you may not realize this, but despite my criticisms of your project, I've supported many of your positions more than you realize, probably more than anyone else here, and I've even given advice (in an implied manner -- see below) that would have advanced your cause and possibly benefitted PG. The core areas of agreement: 1) PG should use a single master, which allows for unambiguous machine identification of fundamental document structures, from which everything else can be derived. 2) The masters should be text encoded (glad to hear you want to support Unicode encodings!)
The ZML text format is actually equivalent roundtrip-wise to a fairly simple XML vocabulary (I've said this many times), a notch or two below the XML-based BookX format I'm working on for representing documents and their structures and text semantics (see http://www.bookx.org/ ). And two to three notches below PGTEI. So it is clear that ZML texts can be converted to other formats quite readily and with consistent results once the scripts are written. And as you've noted, others have taken this approach with lightly marked-up texts that are quite close to ZML. > michael's format gives the best bang for the buck > to do what needs to be done. Actually, Michael never instituted regularized plain text as a format. (At least regularized to a degree to allow for unambiguous machine identification of document structures and some text semantics.) What you are saying, though, is that taking plain text (which Michael has touted) and strictly regularizing the formatting, so as to allow for unambiguous machine identification of document structures, will make it easier to get consistent conversion to various other formats (e.g., reflowable types) as well as to be natively readable with some bells and whistles. Laudable goal, and one which I support in principle. "Regularized" plain text is definitely better than "ad hoc" plain text. No argument from anyone on this! > in retrospect, maybe i should have done it much sooner. > but i was having fun. Actually, you made the mistake of focusing on the viewer-app. Had you instead focused on conversion tools from ZML to "regularized" XHTML, for example, that would have made a much heavier impact. And it would also have been a lot easier. Why do I say this? 1) You will plug-in with existing reading systems which everyone is familiar with and using. Once ZML becomes an accepted part of the landscape, then a native ZML reading application can be built.
(Heck, even for this I'd use a stripped down Mozilla engine or similar web browser code-base since ZML can be internally mapped on the fly to a simple XHTML doc -- and use standardized CSS style sheets, see below.) 2) XHTML is the precursor to a lot of other ebook formats including PDF. (Btw, you should also be able to convert ZML to a "regularized" TEI.) 3) One important advantage about ZML (which is also an advantage for XML-based markup vocabularies that strictly focus on pre-defined document structures and text semantics) is that one can build a library of standardized CSS style sheets for the XHTML equivalent. This is a huge advantage in that it allows people to take ZML and during the conversion to "regularized" XHTML for both native reading and for producing derivative formats (e.g., LIT, Mobipocket, Plucker, even PDF), to be able to get consistent results without themselves having to be CSS gurus. Just take the CSS Zen Garden approach and use someone else's styling -- no need to adapt CSS to one's XHTML formatting since ZML supports the same "regularized" and "standardized" XHTML. Imagine a repository with hundreds of "Z(XHT)ML" CSS style sheets. Perfect rice everytime. (And a native ZML "viewer app" built upon a browser engine can use the same CSS repository for end-user selection of styling.) Now where we differ, Bowerbird, is that you believe ZML is sufficient to identify the needed document structures and text semantics for most if not all books, while I do not. I believe XML confers some other advantages as well. And there is the "metadata" issue that you've *apparently* not yet addressed (which BookX and PGTEI do address -- I believe a ZML document needs to have the ability to include some "machine-readable" metadata info, such as source info, identifiers, etc. -- look at Dublin Core for the basics. I also believe there should be a prolog in the ZML document saying this is a "ZML 1.0" document. 
Now if you've already addressed the metadata issue, I've missed it.) But in my reckoning ZML can be made to work for a significant enough percentage of books to be looked at more closely by the PG folk -- for some books it will probably be more than sufficient even by my fairly strict requirements (well, solve the metadata thing...) After all, as I've said many times, including above, ZML is logically equivalent to an XML document using a pre-defined set of elements focusing on document structure and text semantics. I see it being, logically, a "subset" of BookX and of PGTEI, for example. (Well, close, there is the metadata issue.) There are certainly printed books/texts which will be sufficiently represented by ZML (where we differ is in the percentage of those books which ZML can sufficiently represent.) *** Anyway, I'd be happy to advise you on a "standardized" XHTML to use for the ZML to XHTML conversion allowing for standardization of CSS style sheets à la CSS Zen Garden. Or, look at the "BookXHTML" to get ideas of how to standardize the XHTML equivalent. I suggest using a "subset" of "BookXHTML" to produce your "z(xht)ml" format. (I suspect you'll turn down my offer...) So, it must gall you that I appear to be your top ZML supporter in the PG and DP communities. Jon Noring From jon at noring.name Mon Feb 12 09:29:59 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 10:29:59 -0700 Subject: [gutvol-d] "Introducing the Book" (funny video) Message-ID: <1243109227.20070212102959@noring.name> Its relevance to ebooks is more than obvious: http://www.youtube.com/watch?v=eRjVeRbhtRU (Link from Peter Brantley.)
Jon From Bowerbird at aol.com Mon Feb 12 09:36:30 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 12:36:30 EST Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter Message-ID: jon noring said: > There are certainly printed books/texts > which will be sufficiently represented by ZML um, it's a little too late to backpedal now, jon. all your credibility is already in the poker pot. > (where we differ is in the percentage of those books > which ZML can sufficiently represent.) i say 98.5%, which would mean 19,700 out of 20,000 books. what percentage do you say? put your number on the record. of the 300 i have trouble with, x.m.l. will have trouble with 150. (does anyone have a count of the _unique_books_ -- or text of any kind -- in the p.g. library, eliminating all the duplicates and non-textual matter like movies, music, genome files, etc.? i'm estimating that that count will run about 10-12 thousand.) > I'd be happy to advise you on a "standardized" XHTML to use i'll pass on that. but thanks for starting my week with a big laugh. :+) -bowerbird From robert_marquardt at gmx.de Mon Feb 12 09:56:02 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Mon, 12 Feb 2007 18:56:02 +0100 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007 08:52:02 -0800 (PST), you wrote: > >The argument being proposed here is to reduce the number of formats >rather than to increase them. I do not read the discussion that way. The proposal is to have a master format which can be rendered to the other formats. TEI seems the way to go for that.
The individual files can then be created on demand in about the same way as the Plucker format is currently rendered. The real problem lies in the environment for the developer, aka the text provider. It needs something like the Eclipse IDE (Integrated Development Environment). In fact Eclipse is a framework which could be targeted to PG book creation. It is a major task though. This would allow the text provider to apply all the current separate tools interactively and repeatedly. -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From marcello at perathoner.de Mon Feb 12 09:53:23 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 18:53:23 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> Message-ID: <45D0A993.4000109@perathoner.de> Schultz Keith J. wrote: > What PG needs is what I call is a base format that > easily facilitates over formats. PG needs a base format to > fill the needs of the contributors to PG, not a existing > format. Are you proposing to replace the old proprietary isolated used-by-nobody-except-PG "plain vanilla ascii" format with a new proprietary isolated used-by-nobody-except-PG format?
-- Marcello Perathoner webmaster at gutenberg.org From klofstrom at gmail.com Mon Feb 12 10:02:31 2007 From: klofstrom at gmail.com (Karen Lofstrom) Date: Mon, 12 Feb 2007 08:02:31 -1000 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> Message-ID: <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> On 2/12/07, Andrew Sly wrote: > What do you think of the argument that there are 20-year-old > files in the PG collection which are just as easy for anyone > to use now as when they were made? You mean, "as broken as when they were made," right? Inability to handle many accented characters is broken. For my purposes, plain text files are only a starting point; I have to convert them to something I can read on my PDA. That's why, whenever possible, I download my ebooks from manybooks.net, which has a nice clean interface and has done the file conversion for me. That's where I send ebook newbies, telling them that the PG interface is user-hostile. -- Karen Lofstrom From marcello at perathoner.de Mon Feb 12 10:08:25 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 19:08:25 +0100 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> Message-ID: <45D0AD19.6030000@perathoner.de> Michael Hart wrote: > Project Gutenberg encourages everyone to try their own formats, but > is not going to pick one format above and beyond all the others and > say "This is THE offical Project Gutenberg Format." But PG *requires* the submission of a "plain vanilla ascii" version along with any other format chosen by the user. As long as that requirement is not voided, "plain vanilla ascii" is a format "picked above and beyond all the others". 
What about posting a text format without the "plain vanilla ascii" counterpart? Can do or no can do? -- Marcello Perathoner webmaster at gutenberg.org From joshua at hutchinson.net Mon Feb 12 10:11:17 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Mon, 12 Feb 2007 18:11:17 +0000 (UTC) Subject: [gutvol-d] Plain Text, Hand on the Torch Message-ID: <24927336.1171303877119.JavaMail.?@fh1038.dia.cp.net> >----Original Message---- >From: sly at victoria.tc.ca > >Karen: > >What do you think of the argument that there are 20-year-old >files in the PG collection which are just as easy for anyone >to use now as when they were made? > I'm gonna answer this too... It isn't that the old files are any harder to use. It's that people have been trained to expect more now. I've shown ascii text files to people unfamiliar with PG and they almost always go "Oh, that's cool," and then move on to the next thing. When I show them HTML or PDF files, they go ... "Oh, that's cool," and start clicking/scrolling around and sometimes bookmarking it to come back later. People expect nicer formatting nowadays. Good or bad, like it or not, but people expect more. Josh From grythumn at gmail.com Mon Feb 12 10:15:24 2007 From: grythumn at gmail.com (Robert Cicconetti) Date: Mon, 12 Feb 2007 13:15:24 -0500 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <45D0AD19.6030000@perathoner.de> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> Message-ID: <15cfa2a50702121015h5d050301we1025a6715274332@mail.gmail.com> On 2/12/07, Marcello Perathoner wrote: > Michael Hart wrote: > > > Project Gutenberg encourages everyone to try their own formats, but > > is not going to pick one format above and beyond all the others and > > say "This is THE offical Project Gutenberg Format." 
> > But PG *requires* the submission of a "plain vanilla ascii" version > along with any other format chosen by the user. As long as that > requirement is not voided, "plain vanilla ascii" is a format "picked > above and beyond all the others". Eh? I seem to recall a number of books getting unicode-only releases (where it does not map well to plain ascii), and a few with HTML-only releases (The popular Hand Shadows is an example of the latter.) R C From marcello at perathoner.de Mon Feb 12 10:19:29 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 19:19:29 +0100 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: <847204396.20070212101253@noring.name> References: <847204396.20070212101253@noring.name> Message-ID: <45D0AFB1.8040701@perathoner.de> Jon Noring wrote: > But in my reckoning ZML can be made to work for a significant enough > percentage of books to be looked at more closely by the PG folk -- > for some books it will probably be more than sufficient even by my > fairly strict requirements (well, solve the metadata thing...) But where's the point? If a text is so simple that you can faithfully represent it in ZML, you can automatically process it into TEI with the tools in place *today*. If a text is more complex than ZML can represent, you can still automatically process it into TEI and apply the complex bits by hand. No need to have more than one master format around. TEI is free, very well documented, tried and tested by scholars from all over the world, the tools are open sourced and available today. ZML is proprietary, poorly documented, completely untried and untested, the source of the tools will cost six figures and, at the rate he's going, will be unavailable for at least 25 years. 
-- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Mon Feb 12 10:36:36 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 13:36:36 EST Subject: [gutvol-d] =?iso-8859-1?q?!=40!Re=3A=A0_Plain_Text=2C_Hand_on_the?= =?iso-8859-1?q?_Torch?= Message-ID: robert said: > The proposal is to have a master format which can be rendered to the other formats. michael is saying "go ahead and create a mirror of the library in your master format." *** karen said: > I download my ebooks from manybooks.net, which has a nice clean interface > and has done the file conversion for me. manybooks.net is cool. their "conversions" leave much to be desired in the realm of typography, though. they're just doing a straight plain-text dump. it's too unappealing, in my opinion. *** marcello said: > TEI is free, very well documented, tried and tested by scholars from > all over the world, the tools are open sourced and available today. sounds like you should be making progress then. so what's the hold-up? *** marcello said: > But PG *requires* the submission of a "plain vanilla ascii" version > along with any other format chosen by the user. > As long as that requirement is not voided, "plain vanilla ascii" > is a format "picked above and beyond all the others". and thank goodness for that! it's what made the p.g. library great! it's the reason places like blackmask and samizdat and manybooks come to project gutenberg as the primary source for their offerings. *** josh said: > People expect nicer formatting nowadays and they _deserve_ nicer formatting than a plain-ascii text-file! but that plain-ascii text-file can serve as the input to a converter that makes it look nicer and gives it full-on e-book capabilities... -bowerbird
From marcello at perathoner.de Mon Feb 12 10:37:10 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 19:37:10 +0100 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <15cfa2a50702121015h5d050301we1025a6715274332@mail.gmail.com> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> <15cfa2a50702121015h5d050301we1025a6715274332@mail.gmail.com> Message-ID: <45D0B3D6.6020709@perathoner.de> Robert Cicconetti wrote: > Eh? I seem to recall a number of books getting unicode-only releases > (where it does not map well to plain ascii), and a few with HTML-only > releases (The popular Hand Shadows is an example of the latter.) "FAQ V.10. Do I have to produce in plain ASCII text? "Certainly not if it doesn't make sense. To take an extreme example, if you're working in Japanese or Arabic, or creating audio files, there is no point in trying to reproduce that in ASCII! "Where the text can largely be expressed in ASCII, we do want to post an ASCII version, even if it is somewhat degraded compared to the original. However, we will post your file in as many open formats as you want to create, so that your original work is available for those who have the software to read it." "Where the text can largely be expressed in ASCII" is a rather elastic definition and has produced many hilarious postings, e.g., German texts with letters from the German alphabet replaced with similar-looking letters from the ASCII alphabet.
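To make the "somewhat degraded" ASCII version concrete, here is a minimal Python sketch. It is an illustration only, not a tool PG or DP is known to use; the names GERMAN_ASCII, strip_accents and transliterate_german are hypothetical. It contrasts the "similar-looking letter" substitution mocked above with the conventional German fallback spellings (ae, oe, ue, ss).

```python
import unicodedata

# Hypothetical mapping table: the conventional German ASCII fallbacks.
GERMAN_ASCII = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def strip_accents(text):
    """Naive degradation: decompose each character, then drop every
    non-ASCII code point. Umlauted vowels become the bare vowel and
    a letter such as 'ß' silently disappears."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def transliterate_german(text):
    """Apply the conventional fallback spellings first, then strip
    whatever non-ASCII characters remain."""
    for src, dst in GERMAN_ASCII.items():
        text = text.replace(src, dst)
    return strip_accents(text)

if __name__ == "__main__":
    for word in ["schön", "Größe", "Müller"]:
        print(word, "->", strip_accents(word), "|", transliterate_german(word))
```

The naive route turns "schön" (beautiful) into "schon" (already), a different word, which is the kind of result being laughed at here; the table-based route gives "schoen". Either way the conversion loses information, so it only makes sense as a derived edition, never as the master.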
-- Marcello Perathoner webmaster at gutenberg.org From jon at noring.name Mon Feb 12 10:45:31 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 11:45:31 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: <45D0AFB1.8040701@perathoner.de> References: <847204396.20070212101253@noring.name> <45D0AFB1.8040701@perathoner.de> Message-ID: <70903579.20070212114531@noring.name> Marcello wrote: > Jon Noring wrote: >> But in my reckoning ZML can be made to work for a significant enough >> percentage of books to be looked at more closely by the PG folk -- >> for some books it will probably be more than sufficient even by my >> fairly strict requirements (well, solve the metadata thing...) > If a text is so simple that you can faithfully represent it in ZML, you > can automatically process it into TEI with the tools in place *today*. > > If a text is more complex than ZML can represent, you can still > automatically process it into TEI and apply the complex bits by hand. > > No need to have more than one master format around. Hey, I agree, and I think I made it clear that an XML-based vocabulary is superior. In many respects, my prior message was simply to convey that ZML is logically not that much different from the XML-based approach in terms of usability and repurposeability -- to strip away some of the mythos Bowerbird is trying to create around ZML by showing the logical equivalence. The default position of those in PG (such as Greg Newby, who is the PGLAF Executive Director) is that the future is XML-based mastering, and so those who promote non-XML solutions have to show that theirs is sufficient (for structuring texts) *plus* conferring other advantages. At least I am open to the possibilities of ZML, if for no other reason than to learn from it, which will benefit everyone.
If Bowerbird is willing to spend all that time developing ZML, then no one is stopping him, and I think we might even learn a few things from his effort that will benefit the more universal and fully extensible XML-based approaches to digital mastering. > TEI is free, very well documented, tried and tested by scholars from all > over the world, the tools are open sourced and available today. Again, agreed. > ZML is proprietary, poorly documented, completely untried and untested, > the source of the tools will cost six figures and, at the rate he's > going, will be unavailable for at least 25 years. Well, actually, had he simply written a ZML --> Z(XHT)ML script two years ago (which I know he could do), then he'd at least be somewhere today. But he went off into the "viewer-app" approach. He says we are wasting our time continuing to develop the XML approach, but his "viewer app" has turned out to be a true waste of his time. I guess hindsight is 20-20. Anyway, my interest in ZML is actually the reverse of Bowerbird's. So long as there's a need/requirement for a plain text reading edition, I see ZML as possibly being the output format from the master (a slave if you will) and not itself being the master format. Jon From hart at pglaf.org Mon Feb 12 10:54:34 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 10:54:34 -0800 (PST) Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <24927336.1171303877119.JavaMail.?@fh1038.dia.cp.net> References: <24927336.1171303877119.JavaMail.?@fh1038.dia.cp.net> Message-ID: Nobody minds the inclusion of HTML, Unicode, or any other formats. It's only when someone comes up and asks for theirs to be the ONLY format. mh On Mon, 12 Feb 2007, joshua at hutchinson.net wrote: > > >> ----Original Message---- >> From: sly at victoria.tc.ca >> >> Karen: >> >> What do you think of the argument that there are 20-year-old >> files in the PG collection which are just as easy for anyone >> to use now as when they were made? 
>> > > I'm gonna answer this too... > > It isn't that the old files are any harder to use. It's that people > have been trained to expect more now. > > I've shown ascii text files to people unfamiliar with PG and they > almost always go "Oh, that's cool," and then move on to the next thing. > > When I show them HTML or PDF files, they go ... "Oh, that's cool," and > start clicking/scrolling around and sometimes bookmarking it to come > back later. > > People expect nicer formatting nowadays. Good or bad, like it or not, > but people expect more. > > Josh > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Mon Feb 12 11:00:40 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 11:00:40 -0800 (PST) Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <45D0AD19.6030000@perathoner.de> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> Message-ID: On occasion there have been requests by eBook creators to ONLY present their work in a certain format, usually to preserve an accent group or something on that order. Greg and I tell them that WE won't convert their file, but the file eventually will probably be converted not only into great numbers of other formats, but also great numbers of languages, and that when they are, we will probably post the files: this seems to be OK with those making such requests. By 2021 I fully expect to see a number of formats nothing like what we have today in terms of compatibility, and also I quite expect that machine translation will be up to the point we had with OCR a decade ago. It won't be perfect, but the reader is likely going to be able to read 99% of the book in a manner of enough understandability to know what the book is about. 
Michael On Mon, 12 Feb 2007, Marcello Perathoner wrote: > Michael Hart wrote: > >> Project Gutenberg encourages everyone to try their own formats, but >> is not going to pick one format above and beyond all the others and >> say "This is THE official Project Gutenberg Format." > > But PG *requires* the submission of a "plain vanilla ascii" version > along with any other format chosen by the user. As long as that > requirement is not voided, "plain vanilla ascii" is a format "picked > above and beyond all the others". > > What about posting a text format without the "plain vanilla ascii" > counterpart? > > Can do or no can do? > > > -- > Marcello Perathoner > webmaster at gutenberg.org > From hart at pglaf.org Mon Feb 12 11:03:24 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 11:03:24 -0800 (PST) Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> Message-ID: This is exactly why we want easy to convert formats. If Project Gutenberg eBooks were ONLY available in .pdf or whatever, Karen would probably NOT be able to get her PDA files from manybooks.net, or a variety of other sites that provide for various hardware/software. The whole idea here is to OPEN more doors, not to close more doors. More open formats equal more open doors. Michael On Mon, 12 Feb 2007, Karen Lofstrom wrote: > On 2/12/07, Andrew Sly wrote: > >> What do you think of the argument that there are 20-year-old >> files in the PG collection which are just as easy for anyone >> to use now as when they were made? > > You mean, "as broken as when they were made," right? Inability to > handle many accented characters is broken. 
> > For my purposes, plain text files are only a starting point; I have to > convert them to something I can read on my PDA. That's why, whenever > possible, I download my ebooks from manybooks.net, which has a nice > clean interface and has done the file conversion for me. That's where > I send ebook newbies, telling them that the PG interface is > user-hostile. > > -- > Karen Lofstrom From jon at noring.name Mon Feb 12 11:05:02 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 12:05:02 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: References: Message-ID: <1084486371.20070212120502@noring.name> Bowerbird wrote: > jon noring said: >> There are certainly printed books/texts >> which will be sufficiently represented by ZML > um, it's a little too late to backpedal now, jon. > all your credibility is already in the poker pot. I knew you were going to say this, which is basically trying to rewrite history, which is what you always try to do. But in the past two years I have said the above comment several times. I'm not backpedaling, but simply restating what I've always said. The problem is that a few books, or some books, is not the same as all books. PG/DP should not be interested in a mastering approach that sufficiently works for only a minority of books. My point is clear, so to call me "backpedaling" is simply disingenuous and basically a veiled ad-hominem attack. > i say 98.5%, which would mean 19,700 out of 20,000 books. > what percentage do you say? put your number on the record. Much less than 50%. And remember this is not based on what *you* believe to be sufficient, but what the PG/DP folk believe is sufficient. If you define sufficiency by your standards, you'll always win. Very convenient ploy. 
> (does anyone have a count of the _unique_books_ -- or text > of any kind -- in the p.g. library, eliminating all the duplicates > and non-textual matter like movies, music, genome files, etc.? > i'm estimating that that count will run about 10-12 thousand.) No idea. The PG metadata is unfortunately so poor that one can't just push a button and know the answer. > i'll pass on that. but thanks for starting my week with a big laugh. :+) I figured you'd pass on my offer. In fact I predicted it, in the comment which you intentionally left out: >> (I suspect you'll turn down my offer...) It's fun toying with you since you are so predictable. Jon From hart at pglaf.org Mon Feb 12 11:05:34 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 11:05:34 -0800 (PST) Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45D0A993.4000109@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> Message-ID: On Mon, 12 Feb 2007, Marcello Perathoner wrote: > Schultz Keith J. wrote: > >> What PG needs is what I call is a base format that >> easily facilitates other formats. PG needs a base format to >> fill the needs of the contributors to PG, not an existing >> format. > > Are you proposing to replace the old proprietary isolated > used-by-nobody-except-PG "plain vanilla ascii" format with a new > proprietary isolated used-by-nobody-except-PG format? > > > -- > Marcello Perathoner > webmaster at gutenberg.org Except, of course, that billions of "plain vanilla ascii" emails are sent and received every single day. . . . 
Not to mention the incredibly large portions of many web pages that simply contain an entire article in plain text surrounded by a few title words, ads, etc. Michael From Bowerbird at aol.com Mon Feb 12 12:15:22 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 15:15:22 EST Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter Message-ID: jon noring said: > but his "viewer app" has turned out to be a true waste of his time. someday soon you will learn what a truly ridiculous statement that is... -bowerbird From marcello at perathoner.de Mon Feb 12 12:25:40 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 21:25:40 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> Message-ID: <45D0CD44.7080006@perathoner.de> Michael Hart wrote: > Except, of course, that billions of "plain vanilla ascii" emails > are sent and received every single day. . . . Except of course that all emails in all languages except English are *not* encoded in ascii. > Not to mention the incredibly large portions of many web pages > that simply contain an entire article in plain text surrounded > by a few title words, ads, etc. Are you saying that plain text is more common on the net than HTML? 
-- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Mon Feb 12 12:28:04 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 15:28:04 EST Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter Message-ID: jon noring said: > Much less than 50%. very stark difference there, between 98.5% and "much less than 50%". won't be hard to tell who was right and who lost all their credibility... > remember this is not based on what *you* believe to be sufficient, > but what the PG/DP folk believe is sufficient. i'll deliver more than pgtei delivers. (if pgtei delivers anything, that is.) and i'll do it for a much smaller price, and on a much faster timetable... and since my library will be a separate mirror from p.g. "proper", it'll be quite easy for everyone to tell the difference between 'em. we'll see which one that places like manybooks use in the future. so let's leave all the "debates" behind, and see who serves pudding. -bowerbird From marcello at perathoner.de Mon Feb 12 12:29:34 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 21:29:34 +0100 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: References: Message-ID: <45D0CE2E.3010609@perathoner.de> Bowerbird at aol.com wrote: > someday soon you will learn what a truly ridiculous statement that is... For suitably large values of 'soon'. 
-- Marcello Perathoner webmaster at gutenberg.org From jon at noring.name Mon Feb 12 12:33:23 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 13:33:23 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: References: Message-ID: <1749308120.20070212133323@noring.name> Bowerbird wrote: > jon noring said: > and since my library will be a separate mirror from p.g. "proper", > it'll be quite easy for everyone to tell the difference between 'em. > we'll see which one that places like manybooks use in the future. > > so let's leave all the "debates" behind, and see who serves pudding. Glad to see you're focusing on delivering formats from your "master" format to use on existing "viewer apps", rather than trying to build your own viewer app. And manybooks is a good place. Jon From jon at noring.name Mon Feb 12 12:36:44 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 13:36:44 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: References: Message-ID: <1598256904.20070212133644@noring.name> Bowerbird wrote: > jon noring said: >> but his "viewer app" has turned out to be a true waste of his time. > someday soon you will learn what a truly ridiculous statement that is... Someday! Well, if you didn't have me around to motivate you so you can "show me up" and make me lose all credibility (in whose eyes?), you wouldn't even be as far along as you are. I've said *all along* that there's merit to your ZML idea. I just don't agree with you as to what that merit is. 
Jon From hart at pglaf.org Mon Feb 12 12:42:19 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 12:42:19 -0800 (PST) Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45D0CD44.7080006@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> Message-ID: On Mon, 12 Feb 2007, Marcello Perathoner wrote: > Michael Hart wrote: > >> Except, of course, that billions of "plain vanilla ascii" emails >> are sent and received every single day. . . . > > Except of course that all emails in all languages except English are > *not* encoded in ascii. > > >> Not to mention the incredibly large portions of many web pages >> that simply contain an entire article in plain text surrounded >> by a few title words, ads, etc. > > Are you saying that plain text is more common on the net than HTML? Technically speaking, which is not what I meant, the amount of actual HTML versus other things just embedded in an HTML shell, yes, probably < 50%. And, as I mentioned earlier, we are working on doing many more languages, for all our books, not just some, so "plain text" need not be "American" as in the "A" in ASCII, but just plain text to the readers and writers. 
mh From marcello at perathoner.de Mon Feb 12 13:21:54 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 12 Feb 2007 22:21:54 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> Message-ID: <45D0DA72.60901@perathoner.de> Michael Hart wrote: >> Are you saying that plain text is more common on the net than HTML? > Technically speaking, which is not what I meant, the amount of actual HTML > versus other things just embedded in an HTML shell, yes, probably < 50%. Wriggling out sideways with a strawman again? When was the last time you actually answered a question? -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Mon Feb 12 14:27:25 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 17:27:25 EST Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text Message-ID: marcello said: > Wriggling out sideways with a strawman again? > When was the last time you actually answered a question? michael, i'm wondering why you take this abuse. are "webmasters" _really_ that difficult to find? -bowerbird
From Bowerbird at aol.com Mon Feb 12 14:29:16 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 17:29:16 EST Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter Message-ID: jon noring said: > Glad to see you're focusing on delivering formats > from your "master" format to use on existing "viewer apps", > rather than trying to build your own viewer app. "rather than"? what rubbish. it was always the mission to have an .html capacity, since that's what people need to put stuff on the web. and as long as there are handheld machines that can't access the web directly, there's a niche for other formats. but for most offline use, people will use my viewer-app. > Well, if you didn't have me around to motivate you > so you can "show me up" and make me lose all credibility > (in whose eyes?), everyone who has read your posts to this listserve for years, right up to and including the "much less than 50%" one today. > you wouldn't even be as far along as you are. ha. you really stretch to find _some_ reason to exist, don't you? you have had zero effect on my thinking over the many long years, jon, which is only slightly less than the impact i have had on yours. > I've said *all along* that there's merit to your ZML idea. > I just don't agree with you as to what that merit is. you don't have the foggiest idea how to evaluate technical merit. that's why you're an "e-book expert" who cannot even do the simple task of turning your listserve archives into an e-book... -bowerbird
From jon at noring.name Mon Feb 12 14:41:49 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 15:41:49 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: References: Message-ID: <153692897.20070212154149@noring.name> Bowerbird wrote: [a bunch of angry diatribe] > that's why you're an "e-book expert" who cannot even do the > simple task of turning your listserve archives into an e-book... Well, I do have to answer this in that the primary focus of getting the archives is to get them online in a way specific to reading in a forum mode. So the first attempt has been to get them imported to Google groups, which my contact at Google replied and said they could not do in the way I had hoped (mostly to do with proper time stamping.) There are some avenues I plan to investigate based on feedback I've received. Just need the time to focus on it, which won't happen for a while. Certainly the archives could be compressed into an "ebook" of some sort and distributed that way. Any ideas for a format? Anyway, in compressed form, with the headers stripped, I may be able to squeeze the archive into about 12-15 megs, which is on the large side for a distributable ebook. Web mode makes more sense since the collection is, by and large, topic-based and not a narrative read. I take your comment about my technical inabilities as simply a stab based on anger. Smoke a joint or something. Relax, man, there are more serious issues facing mankind to worry about. Jon From jon at noring.name Mon Feb 12 14:52:31 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 15:52:31 -0700 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: Message-ID: <77293650.20070212155231@noring.name> Bowerbird wrote: > marcello said: >> Wriggling out sideways with a strawman again? 
>> When was the last time you actually answered a question? > michael, i'm wondering why you take this abuse. Because Michael believes in free speech and airing of all sides of an issue? I seem to recall both Michael and Greg have shown infinite patience with a certain someone (besides myself I suppose), despite the private pleas of several, if not a dozen, people in this group who'd like that person tarred, de-feathered, and banned. Ah, brings back memories ... I know what the administrators of forums have to put up with regarding difficult, anarcho-leaning people. > are "webmasters" _really_ that difficult to find? Are you offering to take over the job? I'm sure you'd do great! Jon From Bowerbird at aol.com Mon Feb 12 14:58:25 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Feb 2007 17:58:25 EST Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter Message-ID: jon noring said: > Certainly the archives could be compressed into an "ebook" of some > sort and distributed that way. Any ideas for a format? i suggest you spend several years applying heavy markup to it all. :+) > Anyway, in compressed form, with the headers stripped, > I may be able to squeeze the archive in about 12-15 megs, > which is on the large size for a distributable ebook. tell that to hacker-david, who put the imdb into plucker format. (or was it wikipedia? i can never remember. something big...) > Web mode makes more sense since the collection is, > by and large, topic-based and not a narrative read. hunh? a one-time 15-meg download is quite easy with broadband -- it's about 1/4 the size of the last "update" for microsoft-office -- so why wouldn't a person want the whole thing on their hard-drive? much better that way than getting bits and pieces squirted at you... > I take your comment about my technical inabilities as simply > a stab based on anger. anger? who's angry? 
i am taking great delight in mocking you and your "technical inabilities"... my smile is _quite_ wide... :+) but i guess we'll have to bring it to a close, because now we aren't even bothering with any pretense of being on-topic... -bowerbird From jon at noring.name Mon Feb 12 17:18:07 2007 From: jon at noring.name (Jon Noring) Date: Mon, 12 Feb 2007 18:18:07 -0700 Subject: [gutvol-d] Support for ZML from (apparently) the number one ZML supporter In-Reply-To: References: Message-ID: <1072971673.20070212181807@noring.name> Bowerbird wrote: > but i guess we'll have to bring it to a close, because now we > aren't even bothering with any pretense of being on-topic... Agreed. From sly at victoria.tc.ca Mon Feb 12 17:36:37 2007 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon, 12 Feb 2007 17:36:37 -0800 (PST) Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007, Karen Lofstrom wrote: > On 2/12/07, Andrew Sly wrote: > > > What do you think of the argument that there are 20-year-old > > files in the PG collection which are just as easy for anyone > > to use now as when they were made? > > You mean, "as broken as when they were made," right? Inability to > handle many accented characters is broken. > I was thinking more of file format than character encoding. I find too often those two concepts are confused. I was thinking of the argument that plain text files can still be viewed and otherwise utilized on just about any computer you could find. 
Andrew From klofstrom at gmail.com Mon Feb 12 18:03:54 2007 From: klofstrom at gmail.com (Karen Lofstrom) Date: Mon, 12 Feb 2007 16:03:54 -1000 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> Message-ID: <1e8e65080702121803h531a4ab3mfe33d4a2084e22eb@mail.gmail.com> On 2/12/07, Andrew Sly wrote: > I was thinking of the argument that plain text files can > still be viewed and otherwise utilized on just about > any computer you could find. The desirability of having a plain text version available doesn't imply that this should be the BASE version. As long as you have a base version, living on your server, that can generate a plain text version, you're OK. Also, the plain text is there, even in XML or TEI markup. The markup is nothing but formatting codes added to the text. It would be trivial to strip those out, if we assume: 1) Civilization has fallen 2) You have a 486 running on electricity from a windmill 3) You have a CD drive 4) You have a CD with an ebook library, coded in XML or TEI or HTML Just display the files and strip out the markup codes. Easier than reading a palimpsest. Of course, for real proven durability, we'd want the clay tablet version. In a zillion tablets. Output of your HP Clay Printer-Baker (TM). 
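As a rough illustration of the "strip out the markup codes" step described above, here is a minimal Python sketch. It assumes the marked-up file is HTML or reasonably HTML-like XML such as TEI; the names `MarkupStripper` and `strip_markup` and the sample string are hypothetical and do not come from any PG tool.

```python
from html.parser import HTMLParser


class MarkupStripper(HTMLParser):
    """Collects only the character data and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &eacute; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for the text found between tags
        self.chunks.append(data)


def strip_markup(marked_up):
    """Return the text of a marked-up string with all tags removed."""
    parser = MarkupStripper()
    parser.feed(marked_up)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Hypothetical TEI-flavoured sample, made up for this sketch
sample = '<p>Call me <hi rend="italic">Ishmael</hi>.</p>'
print(strip_markup(sample))
```

This recovers only the raw character data. It keeps text from any header elements and does nothing about rewrapping paragraphs or showing emphasis, tables, or footnotes in plain text, which is where a real converter would need more work.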
-- Karen Lofstrom From robert_marquardt at gmx.de Mon Feb 12 20:35:07 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Tue, 13 Feb 2007 05:35:07 +0100 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <45D0AD19.6030000@perathoner.de> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> Message-ID: <7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com> On Mon, 12 Feb 2007 19:08:25 +0100, you wrote: >But PG *requires* the submission of a "plain vanilla ascii" version >along with any other format chosen by the user. As long as that >requirement is not voided, "plain vanilla ascii" is a format "picked >above and beyond all the others". What is bad with a required fallback option? -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From hart at pglaf.org Mon Feb 12 21:22:15 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 21:22:15 -0800 (PST) Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <1e8e65080702121803h531a4ab3mfe33d4a2084e22eb@mail.gmail.com> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> <1e8e65080702121803h531a4ab3mfe33d4a2084e22eb@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007, Karen Lofstrom wrote: > On 2/12/07, Andrew Sly wrote: > >> I was thinking of the argument that plain text files can >> still be viewed and otherwise utilized on just about >> any computer you could find. > > The desirability of having a plain text version available doesn't > imply that this should be the BASE version. As long as you have a base > version, living on your server, that can generate a plain text > version. you're OK. Let's be sure the plain text IS generated and on file, not just that it CAN be. . . . And any other formats you feel are equally important. 
After all, terabytes are getting extremely cheap. > > Also, the plain text is there, even in XML or TEI markup. The markup > is nothing but formatting codes added to the text. It would be trivial > to strip those out, if we assume: > > 1) Civilization has fallen > 2) You have a 486 running on electricity from a windmill > 3) You have a CD drive > 4) You have a CD with an ebook library, coded in XML or TEI or HTML > > Just display the files and strip out the markup codes. Easier than > reading a palimpsest. Each and every proposed format says conversion from their format to all other formats is trivial. Each time I simply ask for the trivial to be done. Each time the trivial turns into the quadrivial, and nothing happens. > Of course, for real proven durability, we'd want the clay tablet > version. In a zillion tablets. Output of your HP Clay Printer-Baker > (TM). I think you missed the last step, don't you really mean "Written In Stone" ??? > Karen Lofstrom Michael S. Hart, who doesn't even have a printer. From hart at pglaf.org Mon Feb 12 21:24:18 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 21:24:18 -0800 (PST) Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007, Andrew Sly wrote: > > > On Mon, 12 Feb 2007, Karen Lofstrom wrote: > >> On 2/12/07, Andrew Sly wrote: >> >>> What do you think of the argument that there are 20-year-old >>> files in the PG collection which are just as easy for anyone >>> to use now as when they were made? >> >> You mean, "as broken as when they were made," right? Inability to >> handle many accented characters is broken. >> > > I was thinking more of file format than character encoding. > I find too often those two concepts are confused. 
> > I was thinking of the argument that plain text files can > still be viewed and otherwise utilized on just about > any computer you could find. > > Andrew to put this more plainly than some are wont to do: what other format could you have chosen that would have worked as well or better? not to dodge the point that one could be created now. just do it, post all the PG books in it, and be done. we'll give you the space, publicity, etc. mh From hart at pglaf.org Mon Feb 12 21:34:09 2007 From: hart at pglaf.org (Michael Hart) Date: Mon, 12 Feb 2007 21:34:09 -0800 (PST) Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45D0DA72.60901@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> <45D0DA72.60901@perathoner.de> Message-ID: On Mon, 12 Feb 2007, Marcello Perathoner wrote: > Michael Hart wrote: > >>> Are you saying that plain text is more common on the net than HTML? > >> Technically speaking, which is not what I meant, the amount of actual HTML >> versus other things just embedded in an HTML shell, yes, probably < 50%. > > Wriggling out sideways with a strawman again? > > When was the last time you actually answered a question? > > > Marcello Perathoner > webmaster at gutenberg.org This is not a point _I_ was trying to make, it is a point YOU were trying to make. How does it feel to find you have hooked yourself, and are now hoisted by your own petard? Michael From vlsimpson at gmail.com Mon Feb 12 21:41:16 2007 From: vlsimpson at gmail.com (V. L. 
Simpson) Date: Mon, 12 Feb 2007 23:41:16 -0600 Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD In-Reply-To: <405ks25jal42tbi8kg91ee87bcjfou8lvu@4ax.com> References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <45C8F8F6.5030303@perathoner.de> <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com> <45C9C4CF.8040304@perathoner.de> <405ks25jal42tbi8kg91ee87bcjfou8lvu@4ax.com> Message-ID: A heads-up for your SF Bookshelf Project. I uploaded Galaxy Primes for review at Distributed Proofreaders and Campbell's Blackstar Passes and Smith's Triplanetary will be up Very Soon(TM). Unfortunately, as a Post Processor, I don't have upload privileges to Gutenberg yet so posting to PG is at the mercy of our verifiers. Anyway, just to let you know and thanks for pulling the PG SF together. vls From klofstrom at gmail.com Mon Feb 12 21:45:47 2007 From: klofstrom at gmail.com (Karen Lofstrom) Date: Mon, 12 Feb 2007 19:45:47 -1000 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> <45D0DA72.60901@perathoner.de> Message-ID: <1e8e65080702122145j7c00342fi6a8ba0316fdd688a@mail.gmail.com> Perhaps we could agree that even if we establish a new standard for a base version, from which other versions can be generated (whether this is XML, PGTEI, or HTML -- that's another discussion), that it should be *immediately* capable of generating a plain text version. Michael is worried that we will shift to a new base and that the promised plain-text converter will be vaporware. Since software projects are known for vaporware, this is a reasonable fear. If we can agree on this, then we can consider the choice of base version. When that's decided, we consider how best to implement the new standard. It will probably require some website redesign and some changes in the DP workflow, as well as being premised on working conversion software. 
This should be done slowly and carefully. -- Karen Lofstrom From robert_marquardt at gmx.de Mon Feb 12 23:34:31 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Tue, 13 Feb 2007 08:34:31 +0100 Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD In-Reply-To: References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <45C8F8F6.5030303@perathoner.de> <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com> <45C9C4CF.8040304@perathoner.de> <405ks25jal42tbi8kg91ee87bcjfou8lvu@4ax.com> Message-ID: <20070213073431.225210@gmx.net> > I uploaded Galaxy Primes for review at Distributed Proofreaders and > Campbell's Blackstar Passes and Smith's Triplanetary will be up Very > Soon(TM). Quite amusing that i want the Pipers expedited and all other SF in the queue is expedited instead :-) I take what i get and i will release this month no matter what. After all the CD can be updated as often as needed. As soon as this CD is out i will challenge the others to do better :-) The Christmas bookshelf is an obvious choice. Crime would be better, but i think there are not enough books linked yet. Detective Fiction is more popular than SF. -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From schultzk at uni-trier.de Mon Feb 12 23:54:38 2007 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 13 Feb 2007 08:54:38 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45D0A993.4000109@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> Message-ID: In essence, yes. With a Big BUT ... PG offers their texts in many different formats which are/can be automatically generated from this format!! A BIG difference. Keith. On 12.02.2007 at 18:53, Marcello Perathoner wrote: > Schultz Keith J. wrote: > >> What PG needs is what I call a base format that >> easily facilitates other formats. PG needs a base format to >> fill the needs of the contributors to PG, not an existing >> format. > > Are you proposing to replace the old proprietary isolated > used-by-nobody-except-PG "plain vanilla ascii" format with a new > proprietary isolated used-by-nobody-except-PG format? > > > -- > Marcello Perathoner > webmaster at gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Mon Feb 12 23:58:00 2007 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 13 Feb 2007 08:58:00 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <45D0CD44.7080006@perathoner.de> References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> Message-ID: On 12.02.2007 at 21:25, Marcello Perathoner wrote: > Michael Hart wrote: > >> Except, of course, that billions of "plain vanilla ascii" emails >> are sent and received every single day. . . . > > Except of course that all emails in all languages except English are > *not* encoded in ascii. Most are actually HTML!??! > > >> Not to mention the incredibly large portions of many web pages >> that simply contain an entire article in plain text surrounded >> by a few title words, ads, etc. > > Are you saying that plain text is more common on the net than HTML? Boo! ;--)) HTML is plain text !! think think. Sorry I could not help myself. > > > > -- > Marcello Perathoner > webmaster at gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From schultzk at uni-trier.de Tue Feb 13 00:04:23 2007 From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Tue, 13 Feb 2007 09:04:23 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <1e8e65080702122145j7c00342fi6a8ba0316fdd688a@mail.gmail.com> References: <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> <45D0DA72.60901@perathoner.de> <1e8e65080702122145j7c00342fi6a8ba0316fdd688a@mail.gmail.com> Message-ID: <6B386D12-DDFD-4ACB-9A79-BD7326F143EB@uni-trier.de> I am glad somebody seems to agree. No matter what the base version will be it should be no problem to create a plain vanilla version. I have never actually understood why PG does not endorse or create its own. Keith. On 13.02.2007 at 06:45, Karen Lofstrom wrote: > Perhaps we could agree that even if we establish a new standard for a > base version, from which other versions can be generated (whether this > is XML, PGTEI, or HTML -- that's another discussion), that it should > be *immediately* capable of generating a plain text version. Michael > is worried that we will shift to a new base and that the promised > plain-text converter will be vaporware. Since software projects are > known for vaporware, this is a reasonable fear. > > If we can agree on this, then we can consider the choice of base > version. When that's decided, we consider how best to implement the > new standard. It will probably require some website redesign and some > changes in the DP workflow, as well as being premised on working > conversion software. This should be done slowly and carefully. > > -- > Karen Lofstrom > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Tue Feb 13 00:21:35 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 03:21:35 EST Subject: [gutvol-d] noring's disgusting attempt to smear p.g. Message-ID: i'm disgusted by jon noring's latest attempt to smear p.g.
> http://www.teleread.org/blog/?p=6174 -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070213/f79b4623/attachment.htm From greg at durendal.org Tue Feb 13 03:35:06 2007 From: greg at durendal.org (Greg Weeks) Date: Tue, 13 Feb 2007 06:35:06 -0500 (EST) Subject: [gutvol-d] Some help needed for my PG Sciene Fiction Bookshelf CD In-Reply-To: <20070213073431.225210@gmx.net> References: <5ojhs2hn0250lasdgftdhkn8j7nrmp3g1t@4ax.com> <45C8F8F6.5030303@perathoner.de> <9vlis2punsr22ruetm3kck77ngh6vga9cd@4ax.com> <45C9C4CF.8040304@perathoner.de> <405ks25jal42tbi8kg91ee87bcjfou8lvu@4ax.com> <20070213073431.225210@gmx.net> Message-ID: On Tue, 13 Feb 2007, Robert Marquardt wrote: > Quite amusing that i want the Pipers expedited and all other SF in the > queue is expedited insted :-) vls doesn't have any of the Pipers. Two of the Piper shorts have posted to available to PPV also. All I need to do now is round up a PPVer to finish them off. -- Greg Weeks http://durendal.org:8080/greg/ From joshua at hutchinson.net Tue Feb 13 05:52:25 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Tue, 13 Feb 2007 13:52:25 +0000 (UTC) Subject: [gutvol-d] noring's disgusting attempt to smear p.g. Message-ID: <30093340.1171374745968.JavaMail.?@fh1039.dia.cp.net> I must have read a different one than you. This one didn't smear PG at all. It pointed out a problem Jon perceives with some of our older texts. Mainly that there is no way to verify their accuracy due to policies we used to follow. He proposes a way to fix it. In other words, nothing to see here other than a bird squawking at the sky. Josh ----Original Message---- From: Bowerbird at aol.com Date: Feb 13, 2007 3:21 To: , Subj: [gutvol-d] noring&#39;s disgusting attempt to smear p.g. i'm disgusted by jon noring's latest attempt to smear p.g. 
> http://www.teleread.org/blog/?p=6174 -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070213/fd75189f/attachment.htm From joshua at hutchinson.net Tue Feb 13 05:55:15 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Tue, 13 Feb 2007 13:55:15 +0000 (UTC) Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch Message-ID: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> NOTE: Mistakenly hit reply instead of reply all. Meant this to go out to the list, not just Michael. Sorry about that. >----Original Message---- >From: hart at pglaf.org > >Each and every proposed format says conversion from their format >to all other formats is trivial. > >Each time I simply ask for the trivial to be done. > >Each time the trivial turns into the quadrivial, and nothing happens. > Just wanted to point out that this statement is absolutely, 100% wrong. The PGTEI tool chain generates three different txt files (UTF-8, Latin- 1, US-ASCII), a PDF file and a HTML file from one single command. tei The PGLAF.ORG server will churn for a minute or so, and then spit out a nice little .zip file ready for pushing to the main archive. It will even handle the occasion of having original page scan images available and zip those into the upload file as well. *** The problem with using a master format is NOT the available formats we can post to the archive. It is the learning curve and lack of really good tools for the novice. Josh From cannona at fireantproductions.com Tue Feb 13 07:01:27 2007 From: cannona at fireantproductions.com (Aaron Cannon) Date: Tue, 13 Feb 2007 09:01:27 -0600 Subject: [gutvol-d] noring's disgusting attempt to smear p.g. References: Message-ID: <00e001c74f81$1c776270$0300a8c0@blackbox> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Uh-oh! Jon is being constructive again. We can't have that! 
:) Personally, I think it is a great idea and I believe it will add to, not take away from, Project Gutenberg. I also hope DP gets involved. Aaron - -- Skype: cannona MSN/Windows Messenger: cannona at hotmail.com (don't send email to the hotmail address.) - ----- Original Message ----- From: To: ; Sent: Tuesday, February 13, 2007 2:21 AM Subject: [gutvol-d] noring's disgusting attempt to smear p.g. > i'm disgusted by jon noring's latest attempt to smear p.g. > >> http://www.teleread.org/blog/?p=6174 > > -bowerbird > - -------------------------------------------------------------------------------- > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 Comment: Key available from all major key servers. iD8DBQFF0dTrI7J99hVZuJcRAtI2AKDjfucDSIgXwTGzIy2BYoTj0kaApACgnhTU MIwdQHIYEU148yn7oZOQ9YU= =45lx -----END PGP SIGNATURE----- From Bowerbird at aol.com Tue Feb 13 08:35:19 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 11:35:19 EST Subject: [gutvol-d] noring's disgusting attempt to smear p.g. Message-ID: josh said: > I must have read a different one than you. > This one didn't smear PG at all. it's always nice to have multiple perspectives. -bowerbird From jon at noring.name Tue Feb 13 08:46:00 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 09:46:00 -0700 Subject: [gutvol-d] noring's disgusting attempt to smear p.g. In-Reply-To: References: Message-ID: <196381540.20070213094600@noring.name> Bowerbird wrote: > josh said: >> I must have read a different one than you. >> This one didn't smear PG at all. > it's always nice to have multiple perspectives. Agreed!
From jon at noring.name Tue Feb 13 09:33:28 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 10:33:28 -0700 Subject: [gutvol-d] Link to "Digital Text Masters" blog article In-Reply-To: <00e001c74f81$1c776270$0300a8c0@blackbox> References: <00e001c74f81$1c776270$0300a8c0@blackbox> Message-ID: <16010441326.20070213103328@noring.name> I appreciate Aaron's and Josh's positive comments on the "Digital Text Masters" (DTM) proposal. Since many people don't read Bowerbird's messages, let me repost the URL to my original article plus the URL to a followup comment to an informative and supportive comment from Karen Lofstrom: http://www.teleread.org/blog/?p=6174 http://www.teleread.org/blog/?p=6174#comment-224843 My intent in posting this article was two-fold: 1) Get the ideas out in a more public and (hopefully) coherent fashion. Frankly, I am overwhelmed with, like, five different initiatives that are sucking up my time (DTM, OpenReader, BookX, the "generic" Container, and the "Universal Annotation Standard".) I also have some other irons in the fire. Maybe I have a touch of ADD. With my new responsibilities as head of development for DigitalPulp Publishing (we are exploring various potential projects), I have to begin resolving these initiatives. Either make them happen somehow, or at least get the ideas out there in public -- to plant some seeds which may or may not grow, and then move on. I am serious about DTM, but it will only happen if a critical mass of core group people comes together. DTM is not intended to be Jon Noring's project -- I don't care if my name is ever associated with it -- I am not motivated by creating a legacy. I simply believe DTM is eminently needed and will greatly benefit our culture. The short-term goal is to put together a viable governance (a Board) along with good management. That leaves me out as Exec. Dir. 
since that is not my strength, nor do I have the time to run it anyway, but I will certainly help out part-time and as one of the Board members and occasional XML wonk. 2) Find out who is interested and see if we can get this thing actually launched in some form. To be clear: those who step forward saying they are interested are not yet committing themselves. Let's have a show of hands, and if there's enough there, then let's have a teleconference to talk it over as a group. I certainly plan to post the URL to the article in various places where it may draw interest from people not currently associated with PG/DP. As noted before, there are innovative funding/revenue possibilities. Let me finally note that although in the past some of the ideas in DTM were presented to Juliet at DPP (mostly as part of the LibraryCity proposal), I have not recently been in contact with Juliet so there is not yet DP participation in (nor support of) this proposed project. Furthermore, to be clearer, I said the following in the comment to Karen (URL above): "So our vision for DTM, should it get launched, is much more comprehensive and wide-ranging. It certainly could take advantage of the DP system and if DP wants to be involved. (I don't want to be presumptuous here -- Juliet will understandably require that there be meat and real potential in the DTM proposal for her to commit any official DP mindshare to it.)" I look forward to your feedback. And if you are interested in working with the DTM project in any capacity (from occasional advisor to Exec. Dir.), let me know in private email. I would like a show of hands to gauge interest and support.
Jon Noring From traverso at dm.unipi.it Tue Feb 13 10:03:17 2007 From: traverso at dm.unipi.it (Carlo Traverso) Date: Tue, 13 Feb 2007 19:03:17 +0100 Subject: [gutvol-d] Link to "Digital Text Masters" blog article In-Reply-To: <16010441326.20070213103328@noring.name> (message from Jon Noring on Tue, 13 Feb 2007 10:33:28 -0700) References: <00e001c74f81$1c776270$0300a8c0@blackbox> <16010441326.20070213103328@noring.name> Message-ID: <200702131803.l1DI3Hd18831@pico.dm.unipi.it> >>>>> "Jon" == Jon Noring writes: Jon> "So our vision for DTM, should it get launched, is much Jon> more comprehensive and wide-ranging. It certainly could take Jon> advantage of the DP system and if DP wants to be involved. (I Jon> don't want to be presumptuous here -- Juliet will Jon> understandably require that there be meat and real potential Jon> in the DTM proposal for her to commit any official DP Jon> mindshare to it.)" I think that DP might require AT LEAST to have its name correctly spelled. Telerad> Fortunately, Digital Proofreaders (DP) is dedicated to this Telerad> very goal, Carlo From joshua at hutchinson.net Tue Feb 13 09:52:36 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Tue, 13 Feb 2007 17:52:36 +0000 (UTC) Subject: [gutvol-d] Link to "Digital Text Masters" blog article Message-ID: <18551241.1171389156240.JavaMail.?@fh1039.dia.cp.net> >----Original Message---- >From: traverso at dm.unipi.it > >I think that DP might require AT LEAST to have its name correctly spelled. > > Telerad> Fortunately, Digital Proofreaders (DP) is dedicated to this > Telerad> very goal, > Digital Proofreaders is the little known sister project to the Distributed Proofreaders we all know and love. Digital Proofreaders is currently proofing pi out to a million digits....
Josh From jon at noring.name Tue Feb 13 09:53:15 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 10:53:15 -0700 Subject: [gutvol-d] Link to "Digital Text Masters" blog article In-Reply-To: <200702131803.l1DI3Hd18831@pico.dm.unipi.it> References: <00e001c74f81$1c776270$0300a8c0@blackbox> <16010441326.20070213103328@noring.name> <200702131803.l1DI3Hd18831@pico.dm.unipi.it> Message-ID: <1148685158.20070213105315@noring.name> Carlo wrote: > I think that DP might require AT LEAST to have its name correctly spelled. Telerad>> Fortunately, Digital Proofreaders (DP) is dedicated to this Telerad>> very goal, Wow, what a bad gaff. thanks! I'll make the correction. Btw, it is TeleRead, not Telerad. Jon From jon at noring.name Tue Feb 13 09:56:42 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 10:56:42 -0700 Subject: [gutvol-d] Link to "Digital Text Masters" blog article In-Reply-To: <18551241.1171389156240.JavaMail.?@fh1039.dia.cp.net> References: <18551241.1171389156240.JavaMail.?@fh1039.dia.cp.net> Message-ID: <15910018639.20070213105642@noring.name> Josh wrote: > Carl wrote: >> I think that DP might require AT LEAST to have its name correctly >> spelled. > Digital Proofreaders is the little known sister project to the > Distributed Proofreaders we all know and love. Digital Proofreaders is > currently proofing pi out to a million digits.... Laugh. Made the correction in the blog article. Jon From Bowerbird at aol.com Tue Feb 13 09:57:27 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 12:57:27 EST Subject: [gutvol-d] a cold hard look at some .tei lies Message-ID: karen said: > The markup is nothing but formatting codes added to the text. > It would be trivial to strip those out it would be nice if that was true. but it's not. not by a long shot. so you've just shown some ignorance here -- ignorance that is easily revealed as such. (and i'm not blaming you for that ignorance! 
what you are saying is something that many x.m.l. advocates spout, which is most likely how you were misled, but it's simply untrue.) let's look at #20485, a recent .tei-based e-text. if you look at the .tei file, the first thing you'll see is that there is much relevant information stored inside angle-brackets, including the p.g. header and styling info (e.g., on italics, bold, small caps, and centering instructions, just at a quick glance). if you stripped it out, you'd be doing a disservice... furthermore, even outside the angle-brackets, some of the text would need to be converted, because it's got those squirrely .html entities -- like ’ and — -- which will not do... but it gets still worse. because some of those angle-bracket tags even _replaced_ characters, specifically in the form of the quotation-marks, so you would need to do the back-replacement. and it gets even worse still. since the whitespace is "not significant" within a .tei heavy-markup file, carriage-returns and spaces might well have been added or deleted without the appropriate concern, with those edits going undetected, so that when the markup is "stripped away", the resultant "plain-text" file will be deficient in this regard, in unknown ways. take all these factors together, and your plain-text file after "stripping out the tags" would be very bad. some of the problems are avoidable. some are not. but they put the lie to the claim that "stripping away" the markup will give you the kind of plain-ascii text that is the hallmark of the project gutenberg library... so please, let's put that b.s. to bed, once and forever. *** and, by the way, just as a small aside, with this e-text -- which i'd guess is not atypical of those being done -- the idea that this is _truly_ .tei markup is quite amusing. not only is this markup most assuredly _not_ "semantic", heck, it's not even "structural" in nature. it is _shockingly_ presentational -- and with a strong print-bias to boot... 
come along with me while i take a look-see, ok? i see tags that are geared toward fontsize, like these... > [hi rend="font-size: 100%"] J. STORER CLOUSTON > [hi rend="font-size: 125%"] INTRODUCTORY. > [hi rend="font-size: 150%"] THE LUNATIC AT LARGE. i see tags that are targeted at print-based styling, like these... > [hi rend="font-style: italic"] A NOVEL > [hi rend="font-style: italic"] aliases > [hi rend="font-style: italic"] The dog! [/hi] [/q] cried > [hi rend="font-style: italic"] We [/hi] had made a slip, ...and print-based typography, like these... > [hi rend="font-variant: small-caps"] GEORGE TWIDDEL > [hi rend="font-variant: small-caps"] Thomas Billson ...and still more print-based typography, like these... > [titlePage rend="page-break-before: right; text-align: center"] > [trailer rend="text-align: center; font-size: 75%"] THE END. i see a clumsy workaround to produce indentation... > [l rend="margin-left: 2"] She isn’t my misses, > [l rend="margin-left: 6"] Can’t you twig, dear boys, ...and right-justification... > [p rend="text-align: right"] Timothy Watson > [p rend="text-align: right"] GEORGE TWIDDEL > [p rend="text-align: right"] Thomas Billson i see "thoughtbreaks" -- 5 stars! -- parading as nonsense "milestones"... > [milestone unit="tb" rend="stars: 5"/] > [milestone unit="tb" rend="stars: 5"/] and of course -- in still another indication this will not always be an easy conversion-- if you get the encoding wrong, you get jewels like... > [hi rend="font-style: italic"] vis-?-vis > [hi rend="font-variant: small-caps"] A. ?. F. indeed, the _only_ markup i found that is "semantic" in nature was... > [lg type="ditty" rend="display"] > [lg type="ditty" rend="display"] > [lg type="ditty" rend="display"] ...which will indeed come in handy to any future scholars who might want to examine the p.g. library for their "ditty" research. it's good of the .tei people to be so thoughtful to mark those up. 
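The gap between deleting the tags and converting them can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample fragment is made up, loosely modelled on the [hi] and [q] snippets quoted above and written with real angle brackets; mapping q to straight double quotes and italic hi to _underscores_ is an assumed plain-text convention, not what the PGTEI tool chain actually does. Entities, verse indentation and the other rend values are not handled here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment, not taken from etext #20485.
SAMPLE = (
    '<p><q><hi rend="font-style: italic">The dog!</hi></q>\n'
    '   cried   the Baron.</p>'
)


def naive_strip(xml_text):
    """Delete every tag and collapse whitespace.

    The quotation marks that <q> stood for, and the emphasis
    carried by <hi>, are silently lost.
    """
    return " ".join(re.sub(r"<[^>]+>", "", xml_text).split())


def render(elem):
    """Render an element recursively, turning tags back into characters."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child))
        parts.append(child.tail or "")
    inner = "".join(parts)
    if elem.tag == "q":
        # Assumed convention: <q> replaced a pair of quotation marks.
        return '"' + inner + '"'
    if elem.tag == "hi" and "italic" in elem.get("rend", ""):
        # Assumed convention: italics shown as _underscores_.
        return "_" + inner + "_"
    return inner


def to_plain(xml_text):
    """Parse the fragment and emit one whitespace-normalised line."""
    return " ".join(render(ET.fromstring(xml_text)).split())


if __name__ == "__main__":
    print(naive_strip(SAMPLE))
    print(to_plain(SAMPLE))
```

Running it prints the tag-deleted line first and the converted line second; the difference between the two is the information that was stored inside the angle brackets.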
all in all, this "markup" shows a _fundamental_misunderstanding_ of the _actual_potential_ of .tei, were it to be done correctly; thus, when this file is examined by .tei experts, they will _laugh_ at it... ...and then throw it in the trash... -bowerbird From hart at pglaf.org Tue Feb 13 10:26:59 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 Feb 2007 10:26:59 -0800 (PST) Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch In-Reply-To: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> Message-ID: My statement COULD EASILY BE 100% WRONG, as Mr. Hutchinson says below, if he would simply DO what he SAYS is so simple and trivial. The fact that this has NOT been done for the Project Gutenberg library simply indicates that it is not as simple and trivial as stated. We would be MORE than happy to house all these various formats to start with, against the times when the conversion programs might not be online for whatever reasons. With terabytes getting so cheap, this is getting much easier. Remember, one of the great reasons for Project Gutenberg's success is that we encourage hundreds and thousands of people to house the eBooks all over the world, not just have one central location. Michael On Tue, 13 Feb 2007, joshua at hutchinson.net wrote: > NOTE: Mistakenly hit reply instead of reply all. Meant this to go out > to the list, not just Michael. Sorry about that. > > >> ----Original Message---- >> From: hart at pglaf.org >> >> Each and every proposed format says conversion from their format >> to all other formats is trivial. >> >> Each time I simply ask for the trivial to be done. >> >> Each time the trivial turns into the quadrivial, and nothing happens.
>> > > Just wanted to point out that this statement is absolutely, 100% > wrong. > > The PGTEI tool chain generates three different txt files (UTF-8, > Latin- > 1, US-ASCII), a PDF file and a HTML file from one single command. > > tei > > The PGLAF.ORG server will churn for a minute or so, and then spit out > a nice little .zip file ready for pushing to the main archive. It > will > even handle the occasion of having original page scan images > available > and zip those into the upload file as well. > > *** > > The problem with using a master format is NOT the available formats > we > can post to the archive. It is the learning curve and lack of really > good tools for the novice. > > Josh > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Tue Feb 13 10:28:28 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 Feb 2007 10:28:28 -0800 (PST) Subject: [gutvol-d] noring's disgusting attempt to smear p.g. In-Reply-To: <30093340.1171374745968.JavaMail.?@fh1039.dia.cp.net> References: <30093340.1171374745968.JavaMail.?@fh1039.dia.cp.net> Message-ID: On Tue, 13 Feb 2007, joshua at hutchinson.net wrote: > I must have read a different one than you. This one didn't smear PG at all. > > It pointed out a problem Jon perceives with some of our older texts. Mainly > that there is no way to verify their accuracy due to policies we used to > follow. > > He proposes a way to fix it. > > In other words, nothing to see here other than a bird squawking at the sky. > > Josh As I recall, Jon Noring also proposed removing all the old files while his proposed program was in the works. mh > > > > > > > ----Original Message---- > > From: Bowerbird at aol.com > > Date: Feb 13, 2007 3:21 > > To: , > > Subj: [gutvol-d] noring&#39;s disgusting attempt to smear p.g. > > > > i'm disgusted by jon noring's latest attempt to smear p.g. 
> > > >> http://www.teleread.org/blog/?p=6174 > > > > -bowerbird > > > > From joshua at hutchinson.net Tue Feb 13 10:47:58 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Tue, 13 Feb 2007 18:47:58 +0000 (UTC) Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch Message-ID: <23349977.1171392478847.JavaMail.?@fh1039.dia.cp.net> Michael, please, please, please READ what you respond to. I'll quote the relevant sentences to make it plainer. >>> Each and every proposed format says conversion from their format >>> to all other formats is trivial. This means going from TEI as the master document to all the other formats is trivial, right? >>> Each time the trivial turns into the quadrivial, This is the WRONG I referred to. The conversion from PGTEI to the other formats *is* trivial. It is one line typed in by the whitewasher: tei You then upload a zip file to the push directory. One command is about as trivial as it gets! >>> and nothing happens. >My statement COULD EASILY BE 100% WRONG, as Mr. Hutchinson says below, >if he would simply DO what he SAYS is so simple and trivial. It not only is trivial, but it happens ALL THE TIME! Try paying attention to what goes on in PG once in a while, Michael. There are many PGTEI texts in the archive as I type. More get added all the time. As much as I would love to see PG have a master document format, PGTEI isn't ready. But not because the conversion isn't trivial. But rather the CREATION of the master document isn't trivial. It isn't exactly hard, but the tools aren't there yet for newbie document creators to hit the ground running. Instead of rudely dismissing other people's comments, you really might try having a constructive conversation, Michael. I've tried to be polite in my comments up until this one, but your dismissive and condescending tone has really irked me this time. 
Josh From robert_marquardt at gmx.de Tue Feb 13 11:00:39 2007 From: robert_marquardt at gmx.de (Robert Marquardt) Date: Tue, 13 Feb 2007 20:00:39 +0100 Subject: [gutvol-d] Plucker generator on website Message-ID: It seems that images do not get included in the ebooks when generating from the HTML. Is that intentional? I consider regenerating the ebooks for the SF CD on my computer to include the images. It will increase the size considerably, but i think it is worth the effort. 60 of the 150 books are HTML with images. Another 44 HTML without images. -- Robert Marquardt (Team JEDI) http://delphi-jedi.org From marcello at perathoner.de Tue Feb 13 10:59:23 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 19:59:23 +0100 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> <7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com> Message-ID: <45D20A8B.8060706@perathoner.de> Robert Marquardt wrote: > On Mon, 12 Feb 2007 19:08:25 +0100, you wrote: > >> But PG *requires* the submission of a "plain vanilla ascii" version >> along with any other format chosen by the user. As long as that >> requirement is not voided, "plain vanilla ascii" is a format "picked >> above and beyond all the others". > > What is bad with a required fallback option? That it doubles the work of producing an ebook in any useful format. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Tue Feb 13 11:06:46 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 14:06:46 EST Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch Message-ID: josh says "it's trivial." 11 minutes later, marcello says "it doubles the work." you guys gotta get your stories straight.
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070213/1f125463/attachment.htm From Bowerbird at aol.com Tue Feb 13 11:09:54 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 14:09:54 EST Subject: [gutvol-d] here's to different perspectives Message-ID: well, since we all seem to like different perspectives, here's my take on a few issues being bantered about... *** first, i am surprised -- and glad -- that the matter of viewer-programs has come up, and that y'all seem to be interested in 'em. on the other hand, i'm dismayed -- ok, not really, it just seemed like a fun word to use -- that some of you appear to think that i have abandoned my own viewer-programmer. that's simply _not_ true... so, in a few days, i'll make available the newest version of my program, just so y'all can take a good look at it... i dusted it off yesterday, and i must say it looks _good_. contrary to what you have been told, far too many times, my viewer-program _does_ exist; indeed, a beta has been available for downloading for over two-and-a-half years! the "beta" labeling was an inside joke -- thanks google! -- but the joke is getting old now, so i'll remove that badge. this is the first released update in 20 months, as i have been busy working on other parts of the toolchain, but that work is now approaching solid ground as well, so i expect to be rolling out a model soon which covers that entire toolchain. so you can expect me to start discussing my viewer more, along with a variety of related topics, as i gear all this up... *** josh just mentioned the auto-conversions from .tei to .pdf and .xhtml. as part of my coverage of the entire toolchain, i'll be showing my auto-conversions as well, so people can judge for themselves which ones they like better, and why... 
manybooks.net has been mentioned here recently, and they find that .pdf is consistently their most-downloaded format, by a _huge_ margin. while i do expect that most readers will come to prefer my viewer-app instead, i'm happy to give them a .pdf solution if that's what they continue to choose. besides, for the sony reader, .pdf is one of the main formats it supports. but personally, i still feel that .pdf stinks, except in the rare case where the pagesize happens to be the same as the window-size. as for an .html version, we'll need that, obviously, for web display. (some people can't download files, for one reason or another, so the web will be their only access-point, meaning we will offer it.) *** the other issue that's come up recently is jon noring's "criticism" -- whether it's "constructive" seems to be a matter of opinion -- that the provenance information was stripped from early e-texts. i think jon's post on the teleblahg was a smear job because jon didn't tell people the real reason for this early policy decision -- namely that the p.g. lawyer specifically advised michael to do that. certainly, when you're a no-budget entity like project gutenberg, you can't afford to take legal risks that might cost you huge fines. and please, spare us from all the sanctimonious crap about how public-domain law "clearly" gives p.g. the "right" to include that... because you know who has been the leader -- _for_35_years_ -- in pushing the envelope on the public use of the public domain? that's right, michael hart, and his little baby -- project gutenberg. before some of you were even _born_ yet, he was walking the walk. and to this day, if you can find _any_ entity that's mentioned together more often in conjunction with "public domain" than p.g., let us know. so if the law is "clear" on public domain now, _give_michael_credit_. because he's the one who did the work to carve out that legal clarity. and i'll pass on the "distributed proofreaders does it right" stuff too. 
because right up until the policy was changed -- and i was sitting in the room when it happened -- when brewster kahle said "we will pay for your legal defense and fines if anyone challenges on this matter", d.p. stripped out the provenance info when they sent books to p.g. too. because they could never afford to defend themselves in court _either_. so they'd be stripping away that info to this very day, if not for brewster. and that's why this post of noring's -- which says the very same things he has said repeatedly in posts all over cyberspace -- is a smear-job. it ignores the legal-reality environment under which p.g. has operated. and it fails to give michael and project gutenberg the credit it deserves. and let us note p.g. has managed to _survive_ in our litigious society; crikey, the fact that it has _never_ been taken to court -- never, ever -- is a huge win for the global community dedicated to the public domain. michael hart is a _hero_. and let us not forget that without p.g., there would be no d.p. and without d.p., none of the johnny-come-lately slashdotter weenies who want to tell michael how to run his project would even _be_here_. *** so that handles my perspective on why the noring post is a smear-job. (if you've got some other perspective, fine, i'd love to hear it. really.) but what about -- as josh put it -- the "problem" of provenance?, and the issue of e-texts that were cobbled together from various editions? well, i'm happy to give my perspective on those in a later post. and i will. but right now, i've gotta go do some polish work on my viewer-program... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070213/5921f16e/attachment.htm From joshua at hutchinson.net Tue Feb 13 11:16:39 2007 From: joshua at hutchinson.net (joshua at hutchinson.net) Date: Tue, 13 Feb 2007 19:16:39 +0000 (UTC) Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch Message-ID: <28848788.1171394199237.JavaMail.?@fh1039.dia.cp.net> Oh, come on. We weren't talking about the same thing, much less in the same discussion thread! PGTEI -> txt ... trivial. (my comment) Manually creating multiple formats, one of which is the txt file Marcello dislikes ... double the work. (Marcello's comment). ----Original Message---- From: Bowerbird at aol.com Date: Feb 13, 2007 14:06 To: , Subj: Re: [gutvol-d] !@!Re: Plain Text, Hand on the Torch josh says "it's trivial." 11 minutes later, marcello says "it doubles the work." you guys gotta get your stories straight. -bowerbird From jon at noring.name Tue Feb 13 11:20:30 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 12:20:30 -0700 Subject: [gutvol-d] a cold hard look at some .tei lies In-Reply-To: References: Message-ID: <1253406907.20070213122030@noring.name> Bowerbird wrote: > karen said: >> The markup is nothing but formatting codes added to the text. >> It would be trivial to strip those out > furthermore, even outside the angle-brackets, > some of the text would need to be converted, > because it's got those squirrely .html entities -- > like &rsquo; and &mdash; -- which will not do... Easy, run it through BabelPad, push a button, and it converts the mnemonic character entities to whatever one wants, including native encoding as UTF-8. > and it gets even worse still.
since the whitespace > is "not significant" within a .tei heavy-markup file, > carriage-returns and spaces might well have been > added or deleted without the appropriate concern, > with those edits going undetected, so that when the > markup is "stripped away", the resultant "plain-text" > file will be deficient in this regard, in unknown ways. White space can easily be normalized. > take all these factors together, and your plain-text > file after "stripping out the tags" would be very bad. Agreed. A script needs to be built to convert XML markup to regularized plain text, but then this has always been known. > not only is this markup most assuredly _not_ "semantic", > heck, it's not even "structural" in nature. it is _shockingly_ > presentational -- and with a strong print-bias to boot... Now this I agree with. I believe that all XML markup must be structural and semantic. This is possible with TEI since TEI, like HTML, is very flexible. > all in all, this "markup" shows a _fundamental_misunderstanding_ > of the _actual_potential_ of .tei, were it to be done correctly; thus, > when this file is examined by .tei experts, they will _laugh_ at it... Again, agreed on your first point. The TEI experts, though, may not necessarily laugh at it, because there are times when TEI is used to record the typography of the work in addition to the document structure and text semantics. There is definitely a seduction in trying to mark up the "master" with original typographical information. Those working at DP with PGTEI are definitely enamored with typography. I agree typography can be art in and of itself, so I understand its lure. But in my opinion it goes too far when the typographical markup is used in lieu of, rather than to augment, a full accounting of document structure and text semantics, which should always come first.
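A minimal sketch of the kind of markup-to-plain-text script mentioned above, in Python. It assumes nothing about the actual PGTEI toolchain: the list of block-level tags is a hypothetical choice made for illustration, and the function name is invented here. It treats those closing tags as paragraph boundaries, drops the remaining tags, decodes mnemonic entities such as &rsquo; and &mdash;, and collapses runs of whitespace. It does not answer the objection that line breaks added or lost inside the markup cannot be recovered afterwards.

```python
import html
import re

# Hypothetical set of block-level elements; a real TEI file may use others.
BLOCK_TAGS = ("p", "head", "l", "lg", "div")


def markup_to_plain_text(markup: str) -> str:
    """Strip tags, decode entities, and normalize whitespace."""
    # Turn the closing tag of each assumed block element into a blank line.
    closing = r"</(?:%s)\s*>" % "|".join(BLOCK_TAGS)
    text = re.sub(closing, "\n\n", markup)
    # Drop every remaining tag (opening tags and inline markup).
    text = re.sub(r"<[^>]+>", "", text)
    # Decode named and numeric character references to Unicode.
    text = html.unescape(text)
    # Collapse whitespace inside each paragraph; discard empty paragraphs.
    paragraphs = (" ".join(p.split()) for p in re.split(r"\n\s*\n", text))
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = (
        "<div><head>Chapter I</head>\n"
        "<p>It&rsquo;s   a\n   test &mdash; really.</p>\n"
        "<p>Second   one.</p></div>"
    )
    print(markup_to_plain_text(sample))
```

The regular-expression approach is chosen only to keep the sketch short; a real converter would more likely walk the parsed XML tree so that element semantics (poetry lines, headings, notes) can drive the plain-text layout.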
My observation is that nearly all the textual content is independent of the typography (in rare cases visual presentation *is* content, which is then handleable using SVG.) In addition, there is metadata masquerading as content, but which should not be considered as content. For example, the title page information is primarily metadata all typographically-prettied up (although it oftentimes has an epigraph as part of it which is part of content.) So in my opinion there is no need to elaborately reproduce the title page information as it was in the original, at least in the primary master. (I almost view a title page as an "illustration", and it can certainly be treated that way in ebook versions.) (Btw, there is special markup just for marking up title pages. Or, SVG can be used to produce a "title page" layout for those who just *gotta* have the title page typography exactly reproduced. This title page reproduction in XML can be done as an adjunct to the digital text master.) The master XML document must focus on the textual content from a presentation-agnostic requirement to be maximally repurposeable. My rule of thumb is text-to-speech: what information is important to preserve for text-to-speech and associated navigation by the blind. Using this rule of thumb is very enlightening as it reveals what is really important in the books we digitally master. Does a blind person care if a certain piece of text is centered and in 12 pt. Garamond? If a father reads a book to his young child, what information does the father communicate to his child? I doubt he'll say "btw, this line of poetry is indented more than the others, and it says ..." -- no, he'll just read the line of poetry. This leaves out nearly all typographical layout information (there are case-by-case exceptions.) Instead we have the bare text plus document structures and special text semantics that should be recorded. 
(Of course, a reminder that the original page scans allow those interested in the typography to have it at their fingertips, so we are not losing typography, just putting it into its proper perspective.) ***** For an idea of what I believe is more proper document structure and semantic markup, refer to the BookX version of "My Antonia": http://www.openreader.org/myantonia/BookX/myantonia-bookx.xml (Yes, the CSS associated with this XML document is pretty cheesy and bad, but look at the source markup; that's what is important. Use Opera or Firefox for direct viewing of the XML in a browser, though.) What I have in mind for Digital Text Masters is to use BookX and/or TEI (in a similar fashion), following the approach of the BookX example. Note that nowhere is the original "title page" reproduced. The focus is on the real textual content and its fundamental structure and semantics, not typography. Focus on typography, and the text is a lot less usable. If someone wants to create an SVG version of the original title page to exactly record in XML every minute detail of the typography, that will certainly be welcome, but otherwise it is not necessary to communicate the *Work* of "My Antonia" to the user (the page scans will always be there for those interested in the original typography). Btw, in the digital text master (which the BookX version of My Antonia is not, though it could be with some changes), I'd add original page breaks and exact line breaks, among other info. Jon Noring From Bowerbird at aol.com Tue Feb 13 11:39:43 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 14:39:43 EST Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch Message-ID: josh said: > Oh, come on. We weren't talking about the same thing yes, you were, the creation of the .txt version that michael requires. > much less in the same discussion thread! the subject on both posts was "plain text, hand on the torch".
> PGTEI -> txt ... trivial. (my comment) i see _your_ point. > Manually creating multiple formats, one of which is the txt file > Marcello dislikes ... double the work. (Marcello's comment). i don't see the need for creating multiple formats "manually". was marcello seriously suggesting that? i sincerely doubt it... the truth of the matter, of course, is that a plain-text version is easy to create. o.c.r. output is very close to a plain-text version. (indeed, in one recent test, i found that between o.c.r. and final-text, a mere 57 lines -- out of some 12,500 -- had changed. remarkable.) no, it is the _.tei_ version that takes a lot of work to create. (especially, as you have said yourself, josh, without tools.) and the big "benefit" of that .tei version? according to many -- and this is a position you have propagated widely, josh -- it is that .tei gives us a "trivial" conversion to .pdf and .html... ok, that's good. (depending on how good the conversion is.) but as i will demonstrate clearly and unequivocally very soon -- actually, i've _already_ demonstrated it, but some people prefer to stick their head in the sand so as not to witness it -- we can get the same "trivial" conversion from a z.m.l. e-text, which takes a _lot_ less work to create. and that's precisely how i will use z.m.l. to outcompete .tei... (plus i have a sneaking suspicion my conversions will be better.) of course, if you told people over at distributed proofreaders that they could get all the same benefits as .tei without paying the high cost of creating .tei, you couldn't sucker any volunteers. could you? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070213/6682559d/attachment.htm From Bowerbird at aol.com Tue Feb 13 11:51:26 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 14:51:26 EST Subject: [gutvol-d] a cold hard look at some .tei lies Message-ID: jon said: > run it through BabelPad now we need a special program to do a "trivial" job. > White space can easily be normalized. "normalizing" won't replace the eliminated newlines, or eliminate the ones that were added in error. sorry. > A script needs to be built > to convert XML markup to regularized plain text, > but then this has always been known. karen didn't know it just yesterday. > Now this I agree with. > I believe that all XML markup > must be structural and semantic. > It is possible to do with TEI since > TEI, like HTML, is very flexible. one of the reasons i never mentioned this before is because i knew it would send jon on this tirade. :+) i also wanted to see how long d.p. would go on fooling themselves that they were "doing .tei..." > In addition, there is metadata masquerading as content, > but which should not be considered as content. > For example, the title page information is primarily > metadata all typographically-prettied up > (although it oftentimes has an epigraph > as part of it which is part of content.) gosh, jon, you jump right into the silliness, don't you? :+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20070213/cbb179f1/attachment.htm From desrod at gnu-designs.com Tue Feb 13 11:56:24 2007 From: desrod at gnu-designs.com (David A. 
Desrosiers) Date: Tue, 13 Feb 2007 14:56:24 -0500 Subject: [gutvol-d] Plucker generator on website In-Reply-To: References: Message-ID: <1171396584.4975.53.camel@localhost.localdomain> On Tue, 2007-02-13 at 20:00 +0100, Robert Marquardt wrote: > > I consider regenerating the ebooks for the SF CD on my computer to > include the images. It will increase the size considerably, but i > think it is worth the effort. 60 of the 150 books are HTML with > images. Another 44 HTML without images. There is another uber-secret, clandestine effort to address the Plucker/PG situation with the auto-generated books, so don't put too much effort into cleaning it up just yet... -- David A. Desrosiers desrod at gnu-designs.com Skype username: setuid http://gnu-designs.com "The palest ink is better than the most retentive memory." - Old Chinese Proverb From marcello at perathoner.de Tue Feb 13 12:01:19 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:01:19 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: Message-ID: <45D2190F.1080106@perathoner.de> Bowerbird at aol.com wrote: > michael, i'm wondering why you take this abuse. > > are "webmasters" _really_ that difficult to find? Just a bit harder than visionary leaders.
-- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue Feb 13 12:07:18 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:07:18 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> <45D0DA72.60901@perathoner.de> Message-ID: <45D21A76.5000902@perathoner.de> Michael Hart wrote: >>>> Are you saying that plain text is more common on the net than HTML? >>> Technically speaking, which is not what I meant, the amount of actual HTML >>> versus other things just embedded in an HTML shell, yes, probably < 50%. >> Wriggling out sideways with a strawman again? >> >> When was the last time you actually answered a question? > > This is not a point _I_ was trying to make, > it is a point YOU were trying to make. > > How does it feel to find you have hooked yourself, > and are now hoisted by your own petard? I made the point that HTML, besides its many other advantages, is much better future-proof than "plain vanilla text" because there is much more HTML around than "plain vanilla text" and because the world heritage of a whole generation is expressed in HTML. You answered that there are pictures and other things too on the net. Your strategy is never to give a direct answer. Your strategy is never to say something tangible. Your "answers" to questions are merely long-winded reiterations of the commonplace. Sorry, I just cannot take you seriously any more. 
-- Marcello Perathoner webmaster at gutenberg.org From jon at noring.name Tue Feb 13 12:10:04 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 13:10:04 -0700 Subject: [gutvol-d] here's to different perspectives In-Reply-To: References: Message-ID: <628992182.20070213131004@noring.name> Bowerbird wrote: > i think jon's post on the teleblahg was a smear job because jon > didn't tell people the real reason for this early policy decision -- > namely that the p.g. lawyer specifically advised michael to do that. > > certainly, when you're a no-budget entity like project gutenberg, > you can't afford to take legal risks that might cost you huge fines. *shrug* If anyone is doing a * job, it is you, Bowerbird. (Using another emotionally-laden and empty word, huh?) Anyway, regardless of the reason, not having provenance information, plus no commitment by PG to create faithful masters of known source documents, is a downside to the core part of the collection. And that's all I said and needed to say in the article, a statement of what I believe to be fact. It doesn't matter one whit how the collection got to be in that situation. But let's talk about it a little anyway. That PG now has no trouble allowing provenance info (and they are really no bigger money-wise), and has a rigorous policy of copyright clearance to make sure only pre-1923 printings are used, means that they could have, and probably should have, instituted this process back in the 1992-95 time frame or whenever. A commitment to rigor, of exactly following a conservative interpretation of the law, is the best legal defense. That they chose not to do so, or to get second and third legal opinions, indicates to me at least that deep down they did not at the time view provenance as that important. If they had, they would have done something about it, knowing Michael's tenacity and drive. Fine, we all make decisions which have unintended consequences.
They have corrected the requirements by now, but there's a massive legacy of the Top 500 or Top 1000 works without known provenance plus other problems. So the solution is to fix the collection. DTM is one proposal to help fix the collection. Now to comment on Michael Hart's reply this morning. Note that in my article I did not call for the older PG texts to be removed pending remastering. I do recall making that comment a while back, mainly because of my various concerns, but that is not my position today. It would certainly be disruptive. Jon Noring From marcello at perathoner.de Tue Feb 13 12:36:18 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:36:18 +0100 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> Message-ID: <45D22142.6070806@perathoner.de> Michael Hart wrote: > what other format could you have chosen that would > have worked as well or better? No other format in 1971. No other format in 1980. In 1994 a better format would have been TEI 3.0. In 2000 HTML became an international standard, ISO/IEC 15445:2000. > not to dodge the point that one could be created now. The point not to dodge is what PG should do about its outdated standard format. > just do it, post all the PG books in it, and be done. > > we'll give you the space, publicity, etc. A fork should always be the last resort, but if PG stays firmly committed to the technology of the 80's, we who want to get ebooks to the people of today will have no other choice. ibiblio will be glad to host "ebooks TNG".
PG has to speedily address these points:
- one computer-readable master format
- Unicode as default encoding
- multiple licenses
- one true international PG
- central database for all affiliated countries
- decentralized download servers with user redirection
- cataloging of new books as soon as acquired
- posting of page images as soon as scanned
- posting of text as soon as OCRed
- continuous upgrading of text with new proofs
- online error reporting system with page images
... maybe more. -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue Feb 13 12:42:49 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:42:49 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: <1e8e65080702122145j7c00342fi6a8ba0316fdd688a@mail.gmail.com> References: <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> <45D0DA72.60901@perathoner.de> <1e8e65080702122145j7c00342fi6a8ba0316fdd688a@mail.gmail.com> Message-ID: <45D222C9.8020605@perathoner.de> Karen Lofstrom wrote: > Michael > is worried that we will shift to a new base and that the promised > plain-text converter will be vaporware. The plain text converter already exists and has existed since at least 2003. > This should be done slowly and carefully. 4 years to notice that a converter already exists ...
-- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue Feb 13 12:45:23 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:45:23 +0100 Subject: [gutvol-d] howto: unwrap the paragraphs in a project gutenberg e-text In-Reply-To: References: <521676807.20070207082307@noring.name> <45CA39B8.8030405@perathoner.de> <178078053.20070207135954@noring.name> <200702080007.l1807nn11274@pico.dm.unipi.it> <83702198.20070207171004@noring.name> <45CB0F52.20000@perathoner.de> <1203301105.20070208072822@noring.name> <00415913-4B5C-496D-847B-25B5C4427973@uni-trier.de> <45CE37BB.6010102@novomail.net> <45CF7745.4060401@perathoner.de> <45D0A993.4000109@perathoner.de> <45D0CD44.7080006@perathoner.de> Message-ID: <45D22363.3010102@perathoner.de> Schultz Keith J. wrote: >> Are you saying that plain text is more common on the net than HTML? > Boo! ;--)) HTML is plain text !! think think. > Sorry I could not help my self. Plain text has mimetype: text/plain HTML has mimetype: text/html Now explain, if it is the same, why 2 mime types? -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue Feb 13 12:55:34 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:55:34 +0100 Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch In-Reply-To: References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> Message-ID: <45D225C6.70604@perathoner.de> Michael Hart wrote: > The fact the this has NOT been done for the Project Gutenberg library > simply indicates that it is not as simply and trivial as stated. This HAS been done for PG. The conversion toolchain is available at pglaf.org and about 100 books (with autogenerated plain text version) have already been posted. 
-- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue Feb 13 12:59:43 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 21:59:43 +0100 Subject: [gutvol-d] Plucker generator on website In-Reply-To: References: Message-ID: <45D226BF.4090708@perathoner.de> Robert Marquardt wrote: > It seems that images do not get included into the ebooks when > generation from the HTML. Is that intentional? True. This is intentional and documented. But you can download the plucker distiller and do the conversion yourself. (The plucker distiller is also "powering" the pg site.) -- Marcello Perathoner webmaster at gutenberg.org From jon at noring.name Tue Feb 13 12:59:46 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 13:59:46 -0700 Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch In-Reply-To: <45D225C6.70604@perathoner.de> References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> <45D225C6.70604@perathoner.de> Message-ID: <1504198311.20070213135946@noring.name> Marcello wrote: > Michael Hart wrote: >> The fact the this has NOT been done for the Project Gutenberg library >> simply indicates that it is not as simply and trivial as stated. > This HAS been done for PG. > > The conversion toolchain is available at pglaf.org and about 100 books > (with autogenerated plain text version) have already been posted. Cool! A list of the 100 books, or at least some of them, with links, would be nice to post here. Jon From marcello at perathoner.de Tue Feb 13 13:05:07 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 22:05:07 +0100 Subject: [gutvol-d] Plucker generator on website In-Reply-To: <1171396584.4975.53.camel@localhost.localdomain> References: <1171396584.4975.53.camel@localhost.localdomain> Message-ID: <45D22803.5070705@perathoner.de> David A. 
Desrosiers wrote: > There is another uber-secret, clandestine effort to address the > Plucker/PG situation with the auto-generated books, so don't put too > much effort into cleaning it up just yet... Why don't I know anything about this secret thing? And what situation is there to address? -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Tue Feb 13 13:07:54 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 22:07:54 +0100 Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch In-Reply-To: <1504198311.20070213135946@noring.name> References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> <45D225C6.70604@perathoner.de> <1504198311.20070213135946@noring.name> Message-ID: <45D228AA.1040008@perathoner.de> Jon Noring wrote: >> The conversion toolchain is available at pglaf.org and about 100 books >> (with autogenerated plain text version) have already been posted. > > Cool! > > A list of the 100 books, or at least some of them, with links, would > be nice to post here. http://www.gutenberg.org/catalog/world/results?filetype=tei -- Marcello Perathoner webmaster at gutenberg.org From johnson.leonard at gmail.com Tue Feb 13 13:09:47 2007 From: johnson.leonard at gmail.com (Leonard Johnson) Date: Tue, 13 Feb 2007 16:09:47 -0500 Subject: [gutvol-d] Portable library builds books on spot Message-ID: <748ba8e50702131309w1a0595f5j4597eaba9d02ae4e@mail.gmail.com> For anyone interested, here is a link to an article in the Salt Lake Tribune describing one practical use of Project Gutenberg books. I have a text copy if the link is not accessible, although it is probably copyrighted.
http://www.sltrib.com/news/ci_5210030 Len Johnson From jon at noring.name Tue Feb 13 13:17:59 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 14:17:59 -0700 Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch In-Reply-To: <45D228AA.1040008@perathoner.de> References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> <45D225C6.70604@perathoner.de> <1504198311.20070213135946@noring.name> <45D228AA.1040008@perathoner.de> Message-ID: <1648504928.20070213141759@noring.name> Marcello wrote: > Jon Noring wrote: >>> The conversion toolchain is available at pglaf.org and about 100 books >>> (with autogenerated plain text version) have already been posted. >> Cool! >> >> A list of the 100 books, or at least some of them, with links, would >> be nice to post here. > http://www.gutenberg.org/catalog/world/results?filetype=tei Thanks. Been looking at one random text among the list: http://www.gutenberg.org/etext/17756 Now, to clarify -- the Plucker, the HTML, and the various plain text encodings were push-button auto-generated from the master TEI version? I assume this, and if so, everything looks very good! Even the TEI markup is quite acceptable (from my strict structural/semantic perspective). Well done! Jon From Bowerbird at aol.com Tue Feb 13 13:35:41 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 16:35:41 EST Subject: [gutvol-d] here's to different perspectives Message-ID: jon noring said: > It doesn't matter one whit how the collection got to be in that situation. no, it doesn't matter "one whit" if you're doing a smear-job. but the _reason_ for the situation matters a whole bunch -- especially if it's a darn good reason, like ensuring that your little baby will survive in a hostile, litigious society -- if you want to be fair to a man who is a public-domain hero, the greatest public-domain champion of the last 35 years... > But let's talk about it a little anyway. um, no, let's not. 
if you want to provide provenance information for the early e-texts, you're more than welcome to do the work... i _respect_ kenneth fuchs, who did the work to determine that the p.g. "tarzan of the apes" was based on the ballantine version. he put together a very useful display of the differences manifested: > http://www.erblist.com/erbmania/novels/toafuchs.html and i don't know about you, but when i look at the changes made, they look like pretty straightforward edits to me. there's very little -- if anything -- that warrants a charged word like "bowdlerized", which was used in the _heading_ of the article on the teleblog site. (in an ironic twist, fuchs spelled it as "bowlderization" on his site.) besides, it was ballantine that did that editing, not a p.g. volunteer. the p.g. version might well be _100%_ "authentic" to the ballantine. this book was done by judith boss, an asset to the public domain, a woman who did excellent work on many of the early p.g. e-texts, and who _deserves_many_thanks_, not this backhanded backbiting. do we _really_ want to pick on her now because she used a 1980s version of the book instead of trying to dig up a first-edition 1914? heck, even kenneth fuchs didn't do that; as he puts it on his web-page: > An original McClurg edition was not available, but other texts based > on the McClurg first edition printing plates were used for this project: so let's cut judith boss a break, eh? because i'll tell you, even though (as i said), i respect kenneth fuchs, i'd be honored to buy judith boss dinner any time; i respect her tons. but i don't respect you and your bellyaching, jon. heck, if you would have started in on this job when you first began bringing up this issue, every e-text would have provenance by now. and i would be thanking you for doing it, and saying i respect you. but instead, all you've done is whine. and do smear-jobs. 
and propose the formation of new groups -- with "innovative funding and revenue models" -- that will digitize some "500 to 1000" books. get a grip, jon. google scans that many books before lunch, every day -- day in and day out... so you get no respect from me, jon. not a bit. -bowerbird From jon at noring.name Tue Feb 13 13:43:50 2007 From: jon at noring.name (Jon Noring) Date: Tue, 13 Feb 2007 14:43:50 -0700 Subject: [gutvol-d] here's to different perspectives In-Reply-To: References: Message-ID: <392343123.20070213144350@noring.name> Bowerbird angrily wrote: > this book was done by judith boss, an asset to the public domain, > a woman who did excellent work on many of the early p.g. e-texts, > and who _deserves_many_thanks_, not this backhanded backbiting. > > do we _really_ want to pick on her now because she used a 1980s > version of the book instead of trying to dig up a first-edition 1914? Did I mention her at all? > and propose the formation of new groups -- > with "innovative funding and revenue models" > -- that will digitize some "500 to 1000" books. The books that probably account for the majority of downloads. > get a grip, jon. google scans that many books > before lunch, every day -- day in and day out... *shrug* The top 500 or top 1000 classics account for the majority of use of the public domain in education, and by casual readers. Thus these 500 or 1000 represent, by default, the core of the public domain. As noted in my article, I'm all for the digitization of the millions of books out there. > so you get no respect from me, jon. not a bit. I'm not asking for respect from you. In fact, I'd be embarrassed to have your respect.
Jon From jmdyck at ibiblio.org Tue Feb 13 14:15:47 2007 From: jmdyck at ibiblio.org (Michael Dyck) Date: Tue, 13 Feb 2007 14:15:47 -0800 Subject: [gutvol-d] entities mentioned with "public domain" In-Reply-To: References: Message-ID: <45D23893.8070507@ibiblio.org> [Just going off on a minor tangent, thus the change of subject.] Bowerbird at aol.com wrote: > > and to this day, if you can find _any_ entity that's mentioned together > more often in conjunction with "public domain" than p.g., let us know. I suppose it depends on how one defines "entity" and "mentioned together in conjunction with", but I took the most readily useable definition, and did some googling. Here are the hit counts:

  about 9,400,000 for "public domain" "internet"
  about 5,700,000 for "public domain" "wikipedia"
  about 2,340,000 for "public domain" "google"
  about 1,240,000 for "public domain" "creative commons"
  about 1,210,000 for "public domain" "yahoo"
  about 1,190,000 for "public domain" "microsoft"
  about 1,150,000 for "public domain" "youtube"
  about   890,000 for "public domain" "disney"
  about   819,000 for "public domain" "project gutenberg" <---
  about   729,000 for "public domain" "world wide web"
  about   439,000 for "public domain" "michael hart"
  about   384,000 for "public domain" "internet archive"
  about   236,000 for "public domain" "copyright office"
  about   228,000 for "public domain" "lawrence lessig"
  about   176,000 for "public domain" "prelinger archives"
  about   124,000 for "public domain" "united states congress"
  about   121,000 for "public domain" "cory doctorow"
  about    72,600 for "public domain" "distributed proofreaders"
  about    37,600 for "public domain" "open content alliance"

(My guess is that "internet", "wikipedia", "google", "yahoo", "microsoft", and "youtube" would still be near the top if you replaced "public domain" with just about anything.) Not conclusive, but an amusing way to kill some time.
-Michael From Bowerbird at aol.com Tue Feb 13 14:24:56 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 17:24:56 EST Subject: [gutvol-d] here's to different perspectives Message-ID: jon noring said: > Thus these 500 or 1000 represent, by default, > the core of the public domain. so just go do them! it shouldn't take you long at all, even if you have to do every one of 'em by yourself... and with the help of all the people that you _report_ agree with you on this matter, it should be a _snap_. > I'm not asking for respect from you. In fact, > I'd be embarrassed to have your respect. well yes, and i knew that revelation wouldn't surprise you; it won't surprise anyone else to know the feeling is mutual. -bowerbird From marcello at perathoner.de Tue Feb 13 14:25:17 2007 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 13 Feb 2007 23:25:17 +0100 Subject: [gutvol-d] Fw: Plain Text, Hand on the Torch In-Reply-To: <1648504928.20070213141759@noring.name> References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> <45D225C6.70604@perathoner.de> <1504198311.20070213135946@noring.name> <45D228AA.1040008@perathoner.de> <1648504928.20070213141759@noring.name> Message-ID: <45D23ACD.2090205@perathoner.de> Jon Noring wrote: > Now, to clarify -- the Plucker, the HTML, and the various plain text > encodings were push-button auto-generated from the master TEI version? Yes. And if the TEI master changes another push of the button will rebuild all dependent formats. And the button is even optional ... just so the WWers have something to push. Could be easily done with a cron job.
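The "rebuild when the master changes" idea could be checked by a script that cron runs on a schedule. What follows is only a rough sketch under stated assumptions, not the actual PGTEI toolchain: the function name stale_outputs and the file names in the comments are hypothetical, and the test is a plain modification-time comparison of the kind make performs.

```python
from pathlib import Path


def stale_outputs(master: Path, outputs: list[Path]) -> list[Path]:
    """Return the derived files that are missing or older than the master.

    `master` stands in for the TEI source; `outputs` are the formats
    generated from it (hypothetical names such as book.txt, book.html).
    """
    master_mtime = master.stat().st_mtime
    stale = []
    for out in outputs:
        # A derived file needs rebuilding if it does not exist yet
        # or was last written before the master was last modified.
        if not out.exists() or out.stat().st_mtime < master_mtime:
            stale.append(out)
    return stale


# Hypothetical use from a scheduled job:
#   todo = stale_outputs(Path("book.tei"), [Path("book.txt"), Path("book.html")])
#   ...then hand each entry in `todo` to whatever converter is in use.
```

An empty result means every derived format is at least as new as the master, so a scheduled run would have nothing to do.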
-- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Tue Feb 13 14:41:08 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 17:41:08 EST Subject: [gutvol-d] entities mentioned with "public domain" Message-ID: j. michael said: > Not conclusive, but an amusing way to kill some time. i was hoping i could induce someone else to do that task. :+) > I suppose it depends on how one defines "entity" and > "mentioned together in conjunction with", but I took > the most readily useable definition, and did some googling. with "google", "yahoo", "youtube", and even "microsoft" and "disney" (for crying out loud!) listed above "project gutenberg", i'd say your definition probably could not be considered optimal. :+) interesting that "creative commons" is on that list as well, since a key component is that their active claim to copyright legalities is necessary for the strict licensing requirements to be activated. people seem to have a very sloppy understanding of fine points, but i suppose that it's not surprising to discover that... again... funny, michael hart has half as many hits as project gutenberg. i guess it's not hyperbole to say that p.g. really _is_ his baby... -bowerbird p.s. and i guess it's a little bit sad that "open content alliance" is _so_ far down the list. shows what the absence of a p.r. budget can do to your visibility in the age of pagerank...
From hart at pglaf.org Tue Feb 13 15:08:31 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 Feb 2007 15:08:31 -0800 (PST) Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch In-Reply-To: <45D20A8B.8060706@perathoner.de> References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> <7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com> <45D20A8B.8060706@perathoner.de> Message-ID: On Tue, 13 Feb 2007, Marcello Perathoner wrote: > Robert Marquardt wrote: >> On Mon, 12 Feb 2007 19:08:25 +0100, you wrote: >> >>> But PG *requires* the submission of a "plain vanilla ascii" version >>> along with any other format chosen by the user. As long as that >>> requirement is not voided, "plain vanilla ascii" is a format "picked >>> above and beyond all the others". >> >> What is bad with a required fallback option? > > That it doubles the work of producing an ebook in any useful format. Riiight. . . . Marcello would have you believe BOTH that plain text doubles the amount of work from making a markup version AND that the plain text is NOT a format of any use. Michael > > > -- > Marcello Perathoner > webmaster at gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Tue Feb 13 15:12:10 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 Feb 2007 15:12:10 -0800 (PST) Subject: [gutvol-d] entities mentioned with "public domain" In-Reply-To: <45D23893.8070507@ibiblio.org> References: <45D23893.8070507@ibiblio.org> Message-ID: Actually, this list is pretty impressive, esp. when you consider the billions of dollars of press given to each of the others and PG has never spent a total of a million, even counting when paid for by yours truly.
mh On Tue, 13 Feb 2007, Michael Dyck wrote: > [Just going off on a minor tangent, thus the change of subject.] > > Bowerbird at aol.com wrote: >> >> and to this day, if you can find _any_ entity that's mentioned together >> more often in conjunction with "public domain" than p.g., let us know. > > I suppose it depends on how one defines "entity" and "mentioned > together in conjunction with", but I took the most readily useable > definition, and did some googling. Here are the hit counts:
>
>   about 9,400,000 for "public domain" "internet"
>   about 5,700,000 for "public domain" "wikipedia"
>   about 2,340,000 for "public domain" "google"
>   about 1,240,000 for "public domain" "creative commons"
>   about 1,210,000 for "public domain" "yahoo"
>   about 1,190,000 for "public domain" "microsoft"
>   about 1,150,000 for "public domain" "youtube"
>   about   890,000 for "public domain" "disney"
>   about   819,000 for "public domain" "project gutenberg" <---
>   about   729,000 for "public domain" "world wide web"
>   about   439,000 for "public domain" "michael hart"
>   about   384,000 for "public domain" "internet archive"
>   about   236,000 for "public domain" "copyright office"
>   about   228,000 for "public domain" "lawrence lessig"
>   about   176,000 for "public domain" "prelinger archives"
>   about   124,000 for "public domain" "united states congress"
>   about   121,000 for "public domain" "cory doctorow"
>   about    72,600 for "public domain" "distributed proofreaders"
>   about    37,600 for "public domain" "open content alliance"
>
> (My guess is that "internet", "wikipedia", "google", "yahoo", > "microsoft", and "youtube" would still be near the top if you replaced > "public domain" with just about anything.) > > Not conclusive, but an amusing way to kill some time.
> > -Michael > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From lee at novomail.net Tue Feb 13 16:22:51 2007 From: lee at novomail.net (Lee Passey) Date: Tue, 13 Feb 2007 17:22:51 -0700 Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de><1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com><45D0AD19.6030000@perathoner.de><7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com><45D20A8B.8060706@perathoner.de> Message-ID: <016f01c74fce$98667fe0$41030201@landesk.com> ----- Original Message ----- From: "Michael Hart" To: "Project Gutenberg Volunteer Discussion" Sent: Tuesday, February 13, 2007 4:08 PM Subject: Re: [gutvol-d] !@!Re: Plain Text, Hand on the Torch > > On Tue, 13 Feb 2007, Marcello Perathoner wrote: > >> Robert Marquardt wrote: >>> On Mon, 12 Feb 2007 19:08:25 +0100, you wrote: >>> >>>> But PG *requires* the submission of a "plain vanilla ascii" version >>>> along with any other format chosen by the user. As long as that >>>> requirement is not voided, "plain vanilla ascii" is a format "picked >>>> above and beyond all the others". >>> >>> What is bad with a required fallback option? >> >> That it doubles the work of producing an ebook in any useful format. > > Riiight. . . . Marcello would have you believe BOTH that plain text > doubles the amount of work from making a markup version AND that the > plain text is NOT a format of any use. 1. I think Mr. Perathoner's statement is best interpreted as "if you create a 'plain vanilla ascii' text version of a work, and then attempt to create a marked-up version of the same work, either building upon the earlier work or through independant creation, the work is effectively doubled, as the plain text version offers little usefulness which can be leveraged." 
It is by now obvious that if you start with a marked up version subsequent creation of a degraded ascii version is trivial -- which is one reason why a marked-up text is desirable as the base/original/preferred version, and the 'plain vanilla ascii' version should be an optional/derived version. 2. Even if both of Mr. Perathoner's statements are true at face value, they are neither inconsistent nor contradictory. Effort expended in creation is no indication of value; days, months and even years can be expended in the creation of a useless product. Indeed, it is likely that the fact that 'plain vanilla ascii' is a useless format is /why/ it requires twice as much time to create both formats: creation of a 'plain vanilla ascii' version does little to advance the creation of a richly formatted version. From Bowerbird at aol.com Tue Feb 13 16:36:20 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 19:36:20 EST Subject: [gutvol-d] Portable library builds books on spot Message-ID: leonard said: > For anyone interested, I am providing a link to > an article in the Salt Lake Tribune concerning > one way that Project Gutenberg books are being > used in practical ways. I have a text copy if the link > is not accessible, although it probably is copyrighted. > http://www.sltrib.com/news/ci_5210030 gosh, what a nice thing to bring up on a day like today... it's a good reminder of the value of project gutenberg... there are some beautiful people doing beautiful things with the fruits of the labor of thousands of volunteers... i downloaded their .pdf version of "alice in wonderland" and learned they're using gutenmark as their converter. so their books are not as good-looking as they could be, but that probably doesn't matter much to the schoolkids who just got handed their favorite book, free of charge. very uplifting. thanks for the link. -bowerbird
From Bowerbird at aol.com Tue Feb 13 16:48:12 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 19:48:12 EST Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch Message-ID: lee said: > creation of a 'plain vanilla ascii' version does little > to advance the creation of a richly formatted version. from beauty to gobbledygook, in the blink of an eye. here are links to some plain-text z.m.l. files: > http://www.greatamericannovel.com/myant/myant.txt > http://www.greatamericannovel.com/mabie/mabie.txt > http://www.greatamericannovel.com/sgfhb/sgfhb.txt > http://www.greatamericannovel.com/tolbk/tolbk.txt > http://www.greatamericannovel.com/ahmmw/ahmmw.txt here are the "richly-formatted" sets of .html they auto-generated: > http://www.greatamericannovel.com/myant/myantp123.html > http://www.greatamericannovel.com/mabie/mabiep123.html > http://www.greatamericannovel.com/sgfhb/sgfhbp123.html > http://www.greatamericannovel.com/tolbk/tolbkp023.html > http://www.greatamericannovel.com/ahmmw/ahmmwp023.html the z.m.l. files have a _very_strong_resemblance_ to the file that is created when you do o.c.r. on the scans of a p-book. it is ridiculous to even _attempt_ to make the argument that it is difficult (in any sense) to create a "richly formatted" text. -bowerbird
From phil at thalasson.com Tue Feb 13 17:23:15 2007 From: phil at thalasson.com (Philip Baker) Date: Wed, 14 Feb 2007 01:23:15 +0000 Subject: [gutvol-d] Plain Text, Hand on the Torch In-Reply-To: <1e8e65080702121803h531a4ab3mfe33d4a2084e22eb@mail.gmail.com> Message-ID: In article <1e8e65080702121803h531a4ab3mfe33d4a2084e22eb at mail.gmail.com> , Karen Lofstrom writes >On 2/12/07, Andrew Sly wrote: > >> I was thinking of the argument that plain text files can >> still be viewed and otherwise utilized on just about >> any computer you could find. > >The desirability of having a plain text version available doesn't >imply that this should be the BASE version. As long as you have a base >version, living on your server, that can generate a plain text >version. you're OK. > >Also, the plain text is there, even in XML or TEI markup. The markup >is nothing but formatting codes added to the text. It would be trivial >to strip those out, if we assume: > >1) Civilization has fallen >2) You have a 486 running on electricity from a windmill >3) You have a CD drive >4) You have a CD with an ebook library, coded in XML or TEI or HTML > >Just display the files and strip out the markup codes. Easier than >reading a palimpsest. > >Of course, for real proven durability, we'd want the clay tablet >version. In a zillion tablets. Output of your HP Clay Printer-Baker >(TM). > (1) implies not(2), not(3), and not(4) But a little more seriously, I increasingly get the feeling that all this discussion is arguing about the solution to a non-problem. That it is based on that compulsive urge to tidy things up when they are already tidy enough and when there are more important things to do.
-- Philip Baker From bubblegirl at optusnet.com.au Tue Feb 13 07:24:10 2007 From: bubblegirl at optusnet.com.au (Season BubbleGirl - BubbleGirl.net) Date: Wed, 14 Feb 2007 01:54:10 +1030 Subject: [gutvol-d] noring's disgusting attempt to smear p.g. References: <00e001c74f81$1c776270$0300a8c0@blackbox> Message-ID: <002d01c74f83$03bbfc30$0a01a8c0@bubblegirl> I never really read these messages, nor respond, but had to after reading that article. Some of the DPs are getting involved in reading through the old works. I, for one, was reading the old texts looking for errors, which were then reported to PG. Unfortunately, I was too careful and I think I annoyed the person there by reporting errors I wasn't sure about. So I gave on up that. However, it doesn't mean other DPs can't do it -- just use a dictionary before reporting words you don't know! :) The only thing missing to this system is a list of the works to be assessed and a list to sign off which you are reading. Then, when the books is completed, a way to cross the title off the list. A little HTML organisation in DP itself, more than likely, to ensure we aren't re-checking books and make more use of our time. S. Season BubbleGirl International author and motivational icon hailing from Australia. Writer of A Doggy Diary (Ingram/Baker & Taylor, 0-9766-2213-0), Music Mash and autobiography, Absolute Individual: Life In a Bubble (Zeus Publications, 1-922-1837-7; Ingram/Baker & Taylor, 1-5968-2040-3) www.bubblegirl.net - where individuality truly shines ----- Original Message ----- From: "Aaron Cannon" To: "Project Gutenberg Volunteer Discussion" Sent: Wednesday, February 14, 2007 1:31 AM Subject: Re: [gutvol-d] noring's disgusting attempt to smear p.g. > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Uh-oh! Jon is being constructive again. We can't have that! :) > > Personally, I think it is a great idea and I believe it will add to, not > take away from, Project Gutenberg. I also hope DP gets involved. 
> > Aaron > > > - -- > Skype: cannona > MSN/Windows Messenger: cannona at hotmail.com (don't send email to the > hotmail > address.) > - ----- Original Message ----- > From: > To: ; > Sent: Tuesday, February 13, 2007 2:21 AM > Subject: [gutvol-d] noring's disgusting attempt to smear p.g. > > >> i'm disgusted by jon noring's latest attempt to smear p.g. >> >>> http://www.teleread.org/blog/?p=6174 >> >> -bowerbird >> > > > - -------------------------------------------------------------------------------- > > >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d >> > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959 > Comment: Key available from all major key servers. > > iD8DBQFF0dTrI7J99hVZuJcRAtI2AKDjfucDSIgXwTGzIy2BYoTj0kaApACgnhTU > MIwdQHIYEU148yn7oZOQ9YU= > =45lx > -----END PGP SIGNATURE----- > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From gbnewby at pglaf.org Tue Feb 13 19:26:18 2007 From: gbnewby at pglaf.org (Greg Newby) Date: Tue, 13 Feb 2007 19:26:18 -0800 Subject: [gutvol-d] Building your own PG (Re: Fw: Plain Text, Hand on the Torch) In-Reply-To: References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> Message-ID: <20070214032618.GB23621@mail.pglaf.org> On Tue, Feb 13, 2007 at 10:26:59AM -0800, Michael Hart wrote: > > > My statement COULD EASILY BE 100% WRONG, as Mr. Hutchinson says below, > if he would simply DO what he SAYS is so simple and trivial. > > The fact the this has NOT been done for the Project Gutenberg library > simply indicates that it is not as simply and trivial as stated. > > We would be MORE than happy to house all these various formats to > start with, against the times when the conversion programs might > not be online for whatever reasons. Definitely. 
Just to augment this: I can give anyone (and the Bird can attest :) big-time disk space & download locations for their own products, testing, etc. It doesn't need to be "ready for prime time" or otherwise suitable for adding to the "main" PG collection. Michael and I often talk about PG as being a collection of collections...with a goal of putting a library in everyone's pocket (or on their hard drive...whatever). Making it easier for everyone to create their own formats, for their own libraries, is a major goal. I'm not against having some guidelines & procedures for any collection, including the main PG one. But at the same time, I'm 100% ready to facilitate additional collections, as well as making it easy for interested people to set up their own procedures. -- Greg > With terabytes getting so cheap, this is getting much easier. > > Remember, one of the great reasons for Project Gutenberg's success > is that we encourage hundreds and thousands of people to house the > eBooks all over the world, not just have one central location. > > Michael > > > On Tue, 13 Feb 2007, joshua at hutchinson.net wrote: > > > NOTE: Mistakenly hit reply instead of reply all. Meant this to go out > > to the list, not just Michael. Sorry about that. > > > > > >> ----Original Message---- > >> From: hart at pglaf.org > >> > >> Each and every proposed format says conversion from their format > >> to all other formats is trivial. > >> > >> Each time I simply ask for the trivial to be done. > >> > >> Each time the trivial turns into the quadrivial, and nothing happens. > >> > > > > Just wanted to point out that this statement is absolutely, 100% > > wrong. > > > > The PGTEI tool chain generates three different txt files (UTF-8, > > Latin- > > 1, US-ASCII), a PDF file and a HTML file from one single command. > > > > tei > > > > The PGLAF.ORG server will churn for a minute or so, and then spit out > > a nice little .zip file ready for pushing to the main archive. 
It > > will > > even handle the occasion of having original page scan images > > available > > and zip those into the upload file as well. > > > > *** > > > > The problem with using a master format is NOT the available formats > > we > > can post to the archive. It is the learning curve and lack of really > > good tools for the novice. > > > > Josh > > > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d at lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Bowerbird at aol.com Tue Feb 13 19:56:59 2007 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 13 Feb 2007 22:56:59 EST Subject: [gutvol-d] Building your own PG (Re: Fw: Plain Text, Hand on the Torch) Message-ID: greg said: > I can give anyone (and the Bird can attest :) i have already done so, but confirm it again... :+) -bowerbird From hart at pglaf.org Tue Feb 13 20:04:48 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 Feb 2007 20:04:48 -0800 (PST) Subject: [gutvol-d] Building your own PG (Re: Fw: Plain Text, Hand on the Torch) In-Reply-To: <20070214032618.GB23621@mail.pglaf.org> References: <16225668.1171374915620.JavaMail.?@fh1039.dia.cp.net> <20070214032618.GB23621@mail.pglaf.org> Message-ID: Wonderful!!! On Tue, 13 Feb 2007, Greg Newby wrote: > On Tue, Feb 13, 2007 at 10:26:59AM -0800, Michael Hart wrote: >> >> >> My statement COULD EASILY BE 100% WRONG, as Mr. Hutchinson says below, >> if he would simply DO what he SAYS is so simple and trivial.
>> >> The fact the this has NOT been done for the Project Gutenberg library >> simply indicates that it is not as simply and trivial as stated. >> >> We would be MORE than happy to house all these various formats to >> start with, against the times when the conversion programs might >> not be online for whatever reasons. > > Definitely. > > Just to augment this: I can give anyone (and the Bird can attest :) > big-time disk space & download locations for their own products, > testing, etc. It doesn't need to be "ready for prime time" or > otherwise suitable for adding to the "main" PG collection. > > Michael and I often talk about PG as being a collection of > collections...with a goal of putting a library in everyone's > pocket (or on their hard drive...whatever). Making it easier > for everyone to create their own formats, for their own libraries, > is a major goal. > > I'm not against having some guidelines & procedures for any > collection, including the main PG one. But at the same time, > I'm 100% ready to facilitate additional collections, as well as > making it easy for interested people to set up their own procedures. > > -- Greg > > >> With terabytes getting so cheap, this is getting much easier. >> >> Remember, one of the great reasons for Project Gutenberg's success >> is that we encourage hundreds and thousands of people to house the >> eBooks all over the world, not just have one central location. >> >> Michael >> >> >> On Tue, 13 Feb 2007, joshua at hutchinson.net wrote: >> >>> NOTE: Mistakenly hit reply instead of reply all. Meant this to go out >>> to the list, not just Michael. Sorry about that. >>> >>> >>>> ----Original Message---- >>>> From: hart at pglaf.org >>>> >>>> Each and every proposed format says conversion from their format >>>> to all other formats is trivial. >>>> >>>> Each time I simply ask for the trivial to be done. >>>> >>>> Each time the trivial turns into the quadrivial, and nothing happens. 
>>>> >>> >>> Just wanted to point out that this statement is absolutely, 100% >>> wrong. >>> >>> The PGTEI tool chain generates three different txt files (UTF-8, >>> Latin- >>> 1, US-ASCII), a PDF file and a HTML file from one single command. >>> >>> tei >>> >>> The PGLAF.ORG server will churn for a minute or so, and then spit out >>> a nice little .zip file ready for pushing to the main archive. It >>> will >>> even handle the occasion of having original page scan images >>> available >>> and zip those into the upload file as well. >>> >>> *** >>> >>> The problem with using a master format is NOT the available formats >>> we >>> can post to the archive. It is the learning curve and lack of really >>> good tools for the novice. >>> >>> Josh >>> >>> >>> _______________________________________________ >>> gutvol-d mailing list >>> gutvol-d at lists.pglaf.org >>> http://lists.pglaf.org/listinfo.cgi/gutvol-d >>> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From hart at pglaf.org Tue Feb 13 20:13:38 2007 From: hart at pglaf.org (Michael Hart) Date: Tue, 13 Feb 2007 20:13:38 -0800 (PST) Subject: [gutvol-d] noring's disgusting attempt to smear p.g. In-Reply-To: <002d01c74f83$03bbfc30$0a01a8c0@bubblegirl> References: <00e001c74f81$1c776270$0300a8c0@blackbox> <002d01c74f83$03bbfc30$0a01a8c0@bubblegirl> Message-ID: On Wed, 14 Feb 2007, Season BubbleGirl - BubbleGirl.net wrote: > I never really read these messages, nor respond, but had to after reading > that article. > > Some of the DPs are getting involved in reading through the old works. I, > for one, was reading the old texts looking for errors, which were then > reported to PG. 
Unfortunately, I was too careful and I think I annoyed the > person there by reporting errors I wasn't sure about. So I gave on up that. I've been told that half the errors reported end up the editing room floor. > However, it doesn't mean other DPs can't do it -- just use a dictionary > before reporting words you don't know! :) The only thing missing to this > system is a list of the works to be assessed and a list to sign off which > you are reading. Then, when the books is completed, a way to cross the title > off the list. A little HTML organisation in DP itself, more than likely, to > ensure we aren't re-checking books and make more use of our time. Yes, putting a note inside the book, perhaps even just one line, stating the history and date of each proofreading might be very nice. > > S. > Thanks!!! Michael > > Season BubbleGirl > > International author and motivational icon hailing from Australia. > > Writer of A Doggy Diary (Ingram/Baker & Taylor, 0-9766-2213-0), Music Mash > and autobiography, Absolute Individual: Life In a Bubble (Zeus Publications, > 1-922-1837-7; Ingram/Baker & Taylor, 1-5968-2040-3) > > www.bubblegirl.net - where individuality truly shines > ----- Original Message ----- > From: "Aaron Cannon" > To: "Project Gutenberg Volunteer Discussion" > Sent: Wednesday, February 14, 2007 1:31 AM > Subject: Re: [gutvol-d] noring's disgusting attempt to smear p.g. > > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Uh-oh! Jon is being constructive again. We can't have that! :) >> >> Personally, I think it is a great idea and I believe it will add to, not >> take away from, Project Gutenberg. I also hope DP gets involved. >> >> Aaron >> >> >> - -- >> Skype: cannona >> MSN/Windows Messenger: cannona at hotmail.com (don't send email to the >> hotmail >> address.) >> - ----- Original Message ----- >> From: >> To: ; >> Sent: Tuesday, February 13, 2007 2:21 AM >> Subject: [gutvol-d] noring's disgusting attempt to smear p.g. 
>>
>>
>>> i'm disgusted by jon noring's latest attempt to smear p.g.
>>>
>>>> http://www.teleread.org/blog/?p=6174
>>>
>>> -bowerbird
>>>
>>
>>
>> - --------------------------------------------------------------------------------
>>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.3 (MingW32) - GPGrelay v0.959
>> Comment: Key available from all major key servers.
>>
>> iD8DBQFF0dTrI7J99hVZuJcRAtI2AKDjfucDSIgXwTGzIy2BYoTj0kaApACgnhTU
>> MIwdQHIYEU148yn7oZOQ9YU=
>> =45lx
>> -----END PGP SIGNATURE-----
>>

From hart at pglaf.org Tue Feb 13 20:22:20 2007
From: hart at pglaf.org (Michael Hart)
Date: Tue, 13 Feb 2007 20:22:20 -0800 (PST)
Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch
In-Reply-To: <016f01c74fce$98667fe0$41030201@landesk.com>
References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de><1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com><45D0AD19.6030000@perathoner.de><7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com><45D20A8B.8060706@perathoner.de> <016f01c74fce$98667fe0$41030201@landesk.com>
Message-ID:

On Tue, 13 Feb 2007, Lee Passey wrote:

> ----- Original Message -----
> From: "Michael Hart"
> To: "Project Gutenberg Volunteer Discussion"
> Sent: Tuesday, February 13, 2007 4:08 PM
> Subject: Re: [gutvol-d] !@!Re: Plain Text, Hand on the Torch
>
>
>>
>> On Tue, 13 Feb 2007, Marcello Perathoner wrote:
>>
>>> Robert Marquardt wrote:
>>>> On Mon, 12 Feb 2007 19:08:25 +0100, you wrote:
>>>>
>>>>> But PG *requires* the submission of a "plain vanilla ascii" version along
>>>>> with any other format chosen by the user. As long as that requirement is
>>>>> not voided, "plain vanilla ascii" is a format "picked above and beyond
>>>>> all the others".
>>>>
>>>> What is bad with a required fallback option?
>>>
>>> That it doubles the work of producing an ebook in any useful format.
>>
>> Riiight. . . . Marcello would have you believe BOTH that plain text doubles
>> the amount of work from making a markup version AND that the plain text is
>> NOT a format of any use.
>
> 1. I think Mr. Perathoner's statement is best interpreted as "if you create
> a 'plain vanilla ascii' text version of a work, and then attempt to create a
> marked-up version of the same work, either building upon the earlier work or
> through independent creation, the work is effectively doubled, as the plain
> text version offers little usefulness which can be leveraged." It is by now
> obvious that if you start with a marked-up version, subsequent creation of a
> degraded ascii version is trivial -- which is one reason why a marked-up text
> is desirable as the base/original/preferred version, and the 'plain vanilla
> ascii' version should be an optional/derived version.
>
> 2. Even if both of Mr. Perathoner's statements are true at face value, they
> are neither inconsistent nor contradictory. Effort expended in creation is no
> indication of value; days, months and even years can be expended in the
> creation of a useless product. Indeed, it is likely that the fact that 'plain
> vanilla ascii' is a useless format is /why/ it requires twice as much time to
> create both formats: creation of a 'plain vanilla ascii' version does little
> to advance the creation of a richly formatted version.

This once again brings forward the question that, after all these years,
and a dozen supposed efforts to do just that, why any of the teams with
such aspirations have not just gone ahead and done this on their own?
After all, John Mizzi converted all 20,000 to his Mobilebooks cell format,
and the only response from Mr. Perathoner has been to ignore Mr. Mizzi's
requests that his efforts be treated on a par with any other volunteer's.

There are more such efforts that have converted or repackaged PG eBooks,
pretty much the whole lot of them, to various formats, and only passing
nods or comments come from this group.

If individuals operating out of their homes such as John can do it, and
do it for all 20,000, it should be a motivation for all of us!!!

Thanks!!!

Give the world eBooks in 2007!!!

Michael S. Hart
Founder
Project Gutenberg

Blog at http://hart.pglaf.org

From robert_marquardt at gmx.de Tue Feb 13 21:27:28 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 14 Feb 2007 06:27:28 +0100
Subject: [gutvol-d] Plain Text, Hand on the Torch
In-Reply-To: <45D22142.6070806@perathoner.de>
References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <1e8e65080702121002w4b514c57l7cbc10722b554ac4@mail.gmail.com> <45D22142.6070806@perathoner.de>
Message-ID:

On Tue, 13 Feb 2007 21:36:18 +0100, you wrote:

>A fork should always be the last resort, but if PG stays firmly
>committed to the technology of the 80's, we who want to get ebooks to
>the people of today will have no other choice. ibiblio will be glad to
>host "ebooks TNG".

Do not be frustrated. A fork is not the way to go yet.
This discussion is just normal project politics. It will cool down and
the problems will be addressed. The simple trick is to go forward.

>PG has to speedily address these points:
>
> - one computer readable master format

So let's define one (eh, we have one already :) and simply start converting
the books to it, i.e. add a TEI version to as many books as possible and see
if it can be converted to the existing versions without loss.

> - unicode as default encoding

Something like UTF-8 for the master format should be good enough for most
books. I would prefer an option to have UTF-8, UTF-16 and HTML entities for
the encoding.

> - multiple licenses

Yep, the Creative Commons licenses need to be handled.

> - one true international PG
> - central database for all affiliated countries

High politics. This will take time. It is needed though.

> - decentral download servers with user redirection

A simple tech problem on the surface, but it involves politics.

> - cataloging of new books as soon as acquired

This refers to the metadata about the books, which is in a sad state.

> - posting of page images as soon as scanned
> - posting of text as soon as OCRed
> - continuous upgrading of text with new proofs

This is binding in DP, which is politics. Needs much groundwork and may not
be desirable.

> - online error reporting system with page images

A forum would be a good start to get a community.
--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 13 21:33:00 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 14 Feb 2007 06:33:00 +0100
Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch
In-Reply-To:
References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de><1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com><45D0AD19.6030000@perathoner.de><7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com><45D20A8B.8060706@perathoner.de> <016f01c74fce$98667fe0$41030201@landesk.com>
Message-ID: <2h75t2h45m70488a22vueqv4bg1sfgt9ao@4ax.com>

On Tue, 13 Feb 2007 20:22:20 -0800 (PST), you wrote:

>This once again brings forward the question that, after all these years,
>and a dozen supposed efforts to do just that, why any of the teams with
>such aspirations have not just gone ahead and done this on their own?

This is the way to go. Especially for Marcello.
I am involved in several Open Source projects and I found the only working
way is to simply do it and have the others follow.
The SF CD is an example. I did not ask permission.
I bet all those efforts have been bogged down in a discussion like this one.
--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 13 21:50:05 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 14 Feb 2007 06:50:05 +0100
Subject: [gutvol-d] Plucker generator on website
In-Reply-To: <45D226BF.4090708@perathoner.de>
References: <45D226BF.4090708@perathoner.de>
Message-ID: <8h85t2h2km5i8415q1jqebdu2us1fua1j0@4ax.com>

On Tue, 13 Feb 2007 21:59:43 +0100, you wrote:

>Robert Marquardt wrote:
>
>> It seems that images do not get included into the ebooks when
>> generating from the HTML. Is that intentional?
>
>True. This is intentional and documented. But you can download the
>plucker distiller and do the conversion yourself. (The plucker distiller
>is also "powering" the pg site.)

I will do that. I am just a bit hampered by not knowing enough about PG yet,
i.e. I am a newbie. Where is it documented?
I have downloaded the Plucker distiller already and I will check it today.
I am a bit sick so I have time to sit at the computer.
--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From robert_marquardt at gmx.de Tue Feb 13 21:53:23 2007
From: robert_marquardt at gmx.de (Robert Marquardt)
Date: Wed, 14 Feb 2007 06:53:23 +0100
Subject: [gutvol-d] entities mentioned with "public domain"
In-Reply-To: <45D23893.8070507@ibiblio.org>
References: <45D23893.8070507@ibiblio.org>
Message-ID:

On Tue, 13 Feb 2007 14:15:47 -0800, you wrote:

>about 1,190,000 for "public domain" "microsoft"
>about 890,000 for "public domain" "disney"

Does not really astonish me. The search catches "for public domain" and
"against public domain" equally well.
--
Robert Marquardt (Team JEDI) http://delphi-jedi.org

From marcello at perathoner.de Tue Feb 13 23:00:10 2007
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 14 Feb 2007 08:00:10 +0100
Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch
In-Reply-To:
References: <130BB9A1-F3F5-46F8-B6A5-B82053A777FF@uni-trier.de> <1e8e65080702120113t2ea8c40cs3b949ebad6eb447@mail.gmail.com> <45D0AD19.6030000@perathoner.de> <7tf2t21knomk73uidbj9oh9utuoiuq9m1r@4ax.com> <45D20A8B.8060706@perathoner.de>
Message-ID: <45D2B37A.1010508@perathoner.de>

Michael Hart wrote:

> Riiight. . . . Marcello would have you believe BOTH that plain text
> doubles the amount of work from making a markup version AND that the
> plain text is NOT a format of any use.

Not of any use if there already is a marked-up version, be it in HTML,
TEI or any other free and widely deployed format.

--
Marcello Perathoner
webmaster at gutenberg.org

From schultzk at uni-trier.de Wed Feb 14 00:22:26 2007
From: schultzk at uni-trier.de (Schultz Keith J.)
Date: Wed, 14 Feb 2007 09:22:26 +0100
Subject: [gutvol-d] !@!Re: Plain Text, Hand on the Torch
In-Reply-To:
References:
Message-ID:

Sometimes the world is crazy and seemingly contradicts itself:

1)
2) Trivial and double the work.
3)