From jmdyck at ibiblio.org Mon Nov 1 00:16:45 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Mon Nov 1 00:17:06 2004 Subject: [gutvol-d] The release of PG etext #7000 Message-ID: <4185F0ED.D061AF37@ibiblio.org> For those of you who wish to celebrate the release of PG etext #7000 ("The Kalevala"), today would appear to be the day: http://www.gutenberg.org/etext/7000 -Michael Dyck From Bowerbird at aol.com Mon Nov 1 09:34:56 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 1 09:35:09 2004 Subject: [gutvol-d] linking page-scans to the text Message-ID: <8b.18fe1d2c.2eb7cdc0@aol.com> greg said: > We're planning to include the scanned page images > along with eBooks. In fact, this is part of the intent > with the new directory structure for the PG servers > (the /1/0/8/0/... structure). > > We haven't done any (or many, anyway) because > we're still trying to figure out how to best name the page files, > and how to link them on a page-by-page basis into the > (marked up?) eBooks. Jim Tinsley drafted some general guidelines > for the image files themselves, but linking them to the eBooks > is something we need to figure out still. > > (BTW, the Million Books project at archive.org uses djvu > for this purpose. It's not bad, but I like our intended solution > of XML markup much better. Plus, of course, the MBP is mostly > working with relatively poor quality proofreading. For PG, > the text has taken the main emphasis, not the appearance.) > > My notion is that the PGTEI and TEI lite solutions I've been > reading about in this list will be easily adaptable to > including links to specific page image files, so I've > not mentioned it until now. sometimes i feel like i'm talking to a wall... 
greg, i can give you this capability _right_now_, with your plain-text files (i.e., the whole library), if you would only make it your policy to: (1) include page-break information in the files, and (2) use a sensible and consistent naming standard; neither of these is difficult to realize in the slightest. (if you need some input on them, i'll be happy to give it.) if you'd like to see a demo program that does this -- using the page-scans and text-files over at d.p. -- say so publicly (before thursday) and i'll put one up. or continue delaying, it makes no difference to me... -bowerbird From traverso at dm.unipi.it Mon Nov 1 10:53:45 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Nov 1 10:53:54 2004 Subject: [gutvol-d] (no subject) Message-ID: <200411011853.iA1IrjMD018781@posso.dm.unipi.it> Bowerbird> greg said: >> We're planning to include the scanned page images along with >> eBooks. In fact, this is part of the intent with the new >> directory structure for the PG servers (the /1/0/8/0/... >> structure). >> >> We haven't done any (or many, anyway) because we're still >> trying to figure out how to best name the page files, and how >> to link them on a page-by-page basis into the (marked up?) >> eBooks. Jim Tinsley drafted some general guidelines for the >> image files themselves, but linking them to the eBooks is >> something we need to figure out still. >> Bowerbird> greg, i can give you this capability _right_now_, with Bowerbird> your plain-text files (i.e., the whole library), if you Bowerbird> would only make it your policy to: (1) include Bowerbird> page-break information in the files, How do you include the information in the files if it has been removed? This can at best be valid for future production. And moreover, how do you find the correct page when some material (e.g. the footnotes) has been moved, and the page contents are no longer consecutive? 
I have a solution to both problems for DP-produced books, using the files output by DP before the post-processing stage; these files correspond to individual pages of the original book, and you can find the image corresponding to a fragment of text through a grep on the DP file. The concept has been implemented recently by a student, and a test of 300 recently posted PG ebooks should be publicly available before the end of this week. This is part of a system for ebook maintenance (a user can submit a proposed correction to a text through a web page, after consulting the original images, and an administrator can later accept or reject the proposals and automatically obtain a corrected version). Carlo From hmacdougall at stny.rr.com Mon Nov 1 11:05:42 2004 From: hmacdougall at stny.rr.com (Hugh MacDougall) Date: Mon Nov 1 11:05:40 2004 Subject: [gutvol-d] Page Breaks References: <200411011853.iA1IrjMD018781@posso.dm.unipi.it> Message-ID: <00b301c4c045$cae75090$331a1842@Hugh> I don't often enter this discussion, though I have put a number of items by James Fenimore Cooper and Susan Fenimore Cooper on Gutenberg. Today I generally put them on the James Fenimore Cooper Society website, in html, in part because of my frustration with italics and foreign accents. However, on page breaks, I have for some time (I'm my own webmaster) adopted the practice, in putting books (not short articles) on our website, of inserting the page numbers of the original in {curly brackets}, which I generally don't use for other purposes. This not only identifies the page of the original one is reading (helpful both for checking and for bibliographic reference) but also, because it is surrounded by {curly brackets}, is easy to search for without finding other material. Anyhow, it's a thought.
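[Editorial sketch: Hugh's {curly-bracket} convention is also easy for software to exploit. A minimal illustration in Python; the helper name and sample text are invented, not part of any PG tool.]

```python
import re

# Hugh's convention: the original page number appears in the text as {123}.
# split_by_page is a hypothetical helper, not anything PG actually ships.
PAGE_MARKER = re.compile(r"\{(\d+)\}")

def split_by_page(text):
    """Split an etext on {page-number} markers.

    Returns (page_number, text) pairs; text before the first marker
    gets page number None.
    """
    pieces = PAGE_MARKER.split(text)
    pages = []
    if pieces[0].strip():
        pages.append((None, pieces[0]))
    for i in range(1, len(pieces) - 1, 2):
        pages.append((int(pieces[i]), pieces[i + 1]))
    return pages

sample = "Front matter {1} First page text. {2} Second page text."
print(split_by_page(sample))
```

Because the braces are rarely used for anything else, the same pattern can strip the markers out for readers who do not want them.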
Hugh MacDougall, Secretary/Treasurer James Fenimore Cooper Society 8 Lake Street, Cooperstown, NY 13326-1016 http://www.oneonta.edu/external/cooper From Bowerbird at aol.com Mon Nov 1 11:48:59 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 1 11:49:09 2004 Subject: [gutvol-d] re: talking to the walls Message-ID: <25.5149e49d.2eb7ed2b@aol.com> carlo said: > How do you include the information in the files > if it has been removed? go back to a source and get it, that's how. where applicable, information about page-breaks can be obtained from the d.p. proofed text-files; it's a simple matter of matching up image-scans with the text they contained. (see the page that contains a table of the scans with their text-files.) that's why i offered a demo using that specifically. (nonetheless, it's positively _criminal_ that we should even have to do _anything_ to re-gain this information, since it was _willfully_ discarded. when is this bad practice going to be halted?) for books not done by distributed proofreaders, it's as easy as loading the text-file into my viewer and clicking on each word that starts a new page as you get that information by viewing a paper-copy. (my viewer will then save an updated copy of the file.) this process can be facilitated by setting the leading so the lines-per-page is equivalent to the paper-copy, making the task almost trivially easy (but still useful!). > And moreover, how do you find the correct page > when some material (e.g. the footnotes) has been moved, > and the page contents are no longer consecutive? footnotes are easy. (my viewer displays them on the page where they are called anyway, so there's no problem there.) and if you point me to some examples of the other "material" that is moved, i'll be happy to tell you how i'd deal with that. > I have a solution of both problems for DP-produced books > using the files output by DP before the post-processing stage; right. 
> these files correspond to individual pages of the original book, > and you can find the image corresponding to a fragment of text > through a grep on the DP-file. that's one way of doing it. but why not run the process systematically, one time, restoring the page-break information in the text-files, and incorporating the ability to grab the image-scans -- automatically and simply -- using that information. i'm sure you know that the eyes of most users glaze over when you start talking about "grep". besides, what needs to be done is to _thoroughly_incorporate_ the error-reporting process _into_ the end-user's reading-experience, so as to maximize the eyeballs of all the people reading the e-texts. it's just a shame that -- at the same time readers are condemning the e-texts because "they are full of errors" -- practically _nothing_ is being done to harness their ability to _catch_ and _report_ errors. > The concept has been implemented recently by a student, > and a test of 300 recently posted PG ebooks should be > publicly available before the end of this week. This is > a part of a system for ebook maintenance (an user can > submit a proposal of correction of a text through a web page, > after consulting the original images, and an administrator later > can accept - or reject - the proposals and obtain automatically > a corrected version). sounds like a process i described in great detail months ago here. i'm glad somebody is programming it for you guys, because i'll be leaving here shortly. but i intend to write the app anyway, because users who want to grab content from the million-book-project will need it to turn those scans into nicely-proofed and formatted text... 
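[Editorial sketch: the one-time restoration pass described above could match the first distinctive line of each per-page DP file against the posted etext and record where each page begins. The file contents and matching strategy here are assumptions for illustration, not DP's actual workflow.]

```python
def restore_page_breaks(etext, pages):
    """Locate where each per-page DP text file begins in the posted etext.

    Returns (page_index, offset) pairs; a page whose first line was
    changed during post-processing is reported with offset -1.
    """
    offsets = []
    search_from = 0
    for i, page in enumerate(pages):
        # Match on the first non-blank line of the page.
        first_line = next((ln.strip() for ln in page.splitlines() if ln.strip()), "")
        pos = etext.find(first_line, search_from) if first_line else -1
        offsets.append((i, pos))
        if pos >= 0:
            search_from = pos + len(first_line)
    return offsets

# Tiny invented example; real DP page files are much larger.
etext = "CHAPTER I\nIt was a dark night.\nThe rain fell.\nCHAPTER II\nMorning came."
pages = ["CHAPTER I\nIt was a dark night.", "The rain fell.\nCHAPTER II", "Morning came."]
print(restore_page_breaks(etext, pages))
```

The recorded offsets are exactly the page-break information needed to link each span of text back to its image scan, with no grep required of the end user.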
-bowerbird From Bowerbird at aol.com Mon Nov 1 12:09:20 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Nov 1 12:09:42 2004 Subject: [gutvol-d] Page Breaks Message-ID: <96.18f280d1.2eb7f1f0@aol.com> hugh said: > However, on page breaks, I have for some time (I'm my own webmaster) > adopted the practice, in putting books (not short articles) on our website, > of inserting the page numbers of the original in {curly brackets} which I > generally don't use for other purposes. This not only identifies the page > from the original one is reading (helpful both for checking and for > bibliographic reference) and, because it is surrounded by {curly brackets} > is easy to search for without finding other materials. that's a pretty good strategy. it's still fairly obtrusive on the reading experience -- we need to recognize most users don't want to see this info -- but i could live with that method if it became the policy. what i would do, with my viewer-program, is to "vanish" them; that is, i'd display them only if the user specified to show them, and even then, i'd move them out to the margins to be discreet... what i suggest, instead, for a standard for project gutenberg, would be to use some of the under-32 ascii-characters to indicate the various types of page-breaks, so they would be invisible to an ordinary end-user with an ordinary text-viewer, while a savvy viewer-program would be able to discern them and show them to the occasional reader that might want them. also -- and i'm sorry i haven't mentioned this up until now -- i think it's very important that these page-break indicators not gunk up the text. again, for the average user, they will be a nuisance in most cases, and we need to minimize that nuisance. for instance, consider the current practice in the .html versions coming out of distributed proofreaders with page-number info. 
even when the page-number display is moved out to the margin, with c.s.s., they are still there right in the middle of the text! so when a person selects a range of text including a page-number, and copies it out of the browser-window, boom, that page-number is sitting right in the middle of it. and it's a hassle to get rid of it. (and, as far as i know, that's the case even when you have elected to "turn off" display of the page-numbers, but i could be wrong on that.) one of the basic aspects of project gutenberg e-texts has always been that you could easily copy out the text and repurpose it, and i believe that is an important asset to protect... -bowerbird From bkeir at pgdp.net Mon Nov 1 20:24:43 2004 From: bkeir at pgdp.net (bkeir@pgdp.net) Date: Mon Nov 1 20:25:00 2004 Subject: [gutvol-d] pglaf.org settings might block some messages In-Reply-To: <20041101071016.GA30421@pglaf.org> References: <20041101043133.GA26281@pglaf.org> <20041101071016.GA30421@pglaf.org> Message-ID: <21029.203.11.112.2.1099369483.squirrel@203.11.112.2> I had repeated bounces of the following message, as described. This was sent to catalog twice and help once... Hi Sorry, I know this isn't the correct address, but this mail has bounced twice now from catalog AT pglaf.org My original message was: Hi Perhaps Long, William Joseph (1866 - 1952) http://www.gutenberg.net/catalog/world/authrec?fk_authors=744 and Long, William J. (1866 - 1952) http://www.gutenberg.net/catalog/world/authrec?fk_authors=3505 are the same person? Cheers! Bill Here's the second bounce report: Your message did not reach some or all of the intended recipients. Subject: FW: Duplicate author? 
Sent: 27/09/2004 12:51 PM The following recipient(s) could not be reached: 'catalog@pglaf.org' on 29/09/2004 12:51 PM The message was undeliverable because the recipient specified in the recipient postal address was not known at this address The MTS-ID of the original message is: c=AU;a= ;p=Matrikon;l=EXCHANGE-NCS-040927025037Z-16573 From sly at victoria.tc.ca Tue Nov 2 10:43:17 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Nov 2 10:43:26 2004 Subject: [gutvol-d] pglaf.org settings might block some messages In-Reply-To: <21029.203.11.112.2.1099369483.squirrel@203.11.112.2> References: <20041101043133.GA26281@pglaf.org> <20041101071016.GA30421@pglaf.org> <21029.203.11.112.2.1099369483.squirrel@203.11.112.2> Message-ID: From the catalog point of view, this has been settled, but it does bring up a point... I've seen in Wikipedia and a few other places URLs constructed like the two below, using an "author number". After amalgamating the two author records below, one of them will no longer link to William Joseph Long. So this is just a warning that URLs formed like this are not necessarily permanent. Andrew On Tue, 2 Nov 2004 bkeir@pgdp.net wrote: > I had repeated bounces of the following message, as described. This was > sent to catalog twice and help once... > > > Hi > > Sorry, I know this isn't the correct address, but this mail has bounced > twice now from catalog AT pglaf.org > > My original message was: > > Hi > > Perhaps > > Long, William Joseph (1866 - 1952) > > http://www.gutenberg.net/catalog/world/authrec?fk_authors=744 > > and > > Long, William J. (1866 - 1952) > > http://www.gutenberg.net/catalog/world/authrec?fk_authors=3505 > > are the same person? > > Cheers! > > Bill > > Here's the second bounce report: > > > Your message did not reach some or all of the intended recipients. > > Subject: FW: Duplicate author?
> Sent: 27/09/2004 12:51 PM > > The following recipient(s) could not be reached: > > 'catalog@pglaf.org' on 29/09/2004 12:51 PM > The message was undeliverable because the recipient specified > in the recipient postal address was not known at this address > The MTS-ID of the original message is: c=AU;a= > ;p=Matrikon;l=EXCHANGE-NCS-040927025037Z-16573 > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joel at oneporpoise.com Tue Nov 2 11:06:45 2004 From: joel at oneporpoise.com (Joel A. Erickson) Date: Tue Nov 2 11:20:41 2004 Subject: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] References: <20041101043133.GA26281@pglaf.org><20041101071016.GA30421@pglaf.org><21029.203.11.112.2.1099369483.squirrel@203.11.112.2> Message-ID: <000601c4c10f$189d2ac0$6601a8c0@JOEL> I'm assuming numbers are used because it's easier on the programming side. But wouldn't it be easier for the users if it were the name instead? It probably has its downsides, but if author lookup was based on a name, then when the name was modified, the system could just look for the closest match(es). Or, I suppose, the author number could be forwarded, 3505 -> 744. On a side note, the cookies for personalizing the PG skin seem to terminate rather quickly. Usually they last less than a day, it seems. Has anyone else noticed this? Perhaps I should ask Marcello. Joel ----- Original Message ----- From: "Andrew Sly" To: "Project Gutenberg Volunteer Discussion" Sent: Tuesday, November 02, 2004 10:43 AM Subject: Re: [gutvol-d] pglaf.org settings might block some messages > > > From the catalog point of view, this has been settled, but it does bring > up a point... > > I've seen in wikipedia and a few other places URLs constructed like the > two below, using an "author number".
After amalgamating the two > author records below, one of them will no longer link to > William Joseph Long. > > So this is just a warning that URLs formed like this are not necessarily > permanant. > > Andrew > > On Tue, 2 Nov 2004 bkeir@pgdp.net wrote: > >> I had repeated bounces of the following message, as described. This was >> sent to catalog twice and help once... From joshua at hutchinson.net Tue Nov 2 11:35:27 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Tue Nov 2 11:35:39 2004 Subject: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] Message-ID: <20041102193528.14AD99E775@ws6-2.us4.outblaze.com> But what happens when you have two different authors
that share the same name? It will happen (if it hasn't already). Using a forward from a deprecated author number is probably not a bad idea... Josh ----- Original Message ----- From: "Joel A. Erickson" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] Date: Tue, 2 Nov 2004 11:06:45 -0800 > > I'm assuming numbers are used because it's easier on the programming side. > But wouldn't it be easier for the users if it were the name instead? It > probably has its downsides, but if author lookup was based on a name, then > when the name was modified, the system could just look for the closest > match(es). Or, I suppose, the author number could be forwarded, 3505 -> 744. > > On a side note, the cookies for personalizing the PG skin seem to terminate > rather quickly. Usually they last less than a day, it seems. Has anyone else > noticed this? Perhaps I should ask Marcello. > > Joel > > On Tue, 2 Nov 2004 bkeir@pgdp.net wrote: > > >> I had repeated bounces of the following message, as described. This was > >> sent to catalog twice and help once...
From hyphen at hyphenologist.co.uk Tue Nov 2 12:13:39 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Nov 2 12:13:59 2004 Subject: [gutvol-d] Test from Dave F In-Reply-To: <41768369.6050204@perathoner.de> References: <20041020135750.11303.qmail@web41728.mail.yahoo.com> <41768369.6050204@perathoner.de> Message-ID: Test -- Dave F From joshua at hutchinson.net Tue Nov 2 13:26:05 2004 From: joshua at hutchinson.net (Joshua
Hutchinson) Date: Tue Nov 2 13:26:13 2004 Subject: [gutvol-d] PG TEI Message-ID: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> Good afternoon, everyone! I have a few things I want to try to get a consensus on as to HOW we want to handle some aspects of the PG TEI master document. Some of the questions will pertain to specific people (such as the automatic inclusion of the PG header/footer) and some will pertain to everyone interested in PG TEI. The TEI master I am using for the basis of this discussion is available at http://home.alltel.net/hutch2000/sunny/start.xml. So, without further ado! 1 - Currently, Marcello's online converter (TEI -> HTML) automatically adds a PG standard header and footer. (http://home.alltel.net/hutch2000/sunny/sunny.html) It looks nicer (to my eyes anyway) than the monospaced header and footer that the whitewashers currently use. However, is this a "bad thing" in the eyes of the whitewashers? As near as I can tell, the only piece of information that will need to be manually added by the whitewashers is the EBook number that is assigned to this text. If this is placed in the TEI master, then it is automatically put into the HTML version when it is run through the TEI -> HTML converter. If this is an "ok thing" but needs some work ... what needs to be changed? Jim, you're a vocal whitewasher! Rip this apart! (This question also includes suggestions for style improvements to the header/footer, too.) *** 2 - The version I have posted above has two rather significant CSS changes from the style used in Marcello's converter. A) The margins have been set to 10% whitespace on the right and left. This is a fairly arbitrary number arrived at because it is the "de facto" standard at DP. Suggestions/comments? B) The paragraph markup has been changed back to HTML standard. Marcello's original style more closely resembles TeX formatting, where there is no white space between paragraphs and each paragraph is indented.
This was jarring to me, hence the change. Again, suggestions/comments? The rest of the style is as Marcello's converter made it. It is a bit verbose by some people's standards (almost everything has a class attribute), but this can be a very good thing because it now allows CSS to affect the layout/look of nearly every aspect of the document. *** 3 - The TEI master uses rend="indent" markup in the poetry. This validates fine, but currently the TEI -> HTML converter basically ignores the indent markup. What I want to address here is how we want to have those indents converted. TEI master markup:

<lg>
<l>"I thank the goodness and the grace</l>
<l rend="indent">That on my birth have smiled,</l>
<l>And made me in these Christian days</l>
<l rend="indent">A happy English child."</l>
</lg>

Option #1 - Convert the rend="indent" markup to & emsp ; & emsp ; (remove spaces for use). Pro: Degrades gracefully on non-CSS enabled browsers like Lynx. Con: Treats the indent as content. Option #2 - Convert the rend="indent" markup to its CSS equivalent (my mind is going blank right now or I'd give an example). Option #3 - Any other ideas on how to handle this? *** 4 - I used <q rend="display"> markup for blockquotes. This looks fine to me. However, in previous discussions, some people did not like the rend="display" for this purpose. As far as I am concerned, it works and doesn't seem to be a problem, but I'm willing to hear opposing arguments. *** 5 - I used <lb/> to indicate a blank line of text (commonly called a thoughtbreak over at DP). Marcello's documentation indicates this isn't what it is truly meant for, though. Anyone see a problem with this implementation? Or see an improvement we should use instead? *** 6 - This work has a small example of drama markup. It is very simple markup (verse with no partial lines), but it seems to work well. I don't have any problems with it, but I also know that my experience with drama markup is extremely limited. Any suggestions/concerns?
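[Editorial sketch for item 3: the "CSS markup equivalent" would be for the converter to emit a class that a stylesheet can indent, instead of literal spaces. This is a toy illustration, not Marcello's converter; the output markup and class name are invented.]

```python
import re

# Toy mapping from a TEI verse line to HTML carrying a CSS class.
# A stylesheet rule such as  .indent { margin-left: 2em; }  then does
# the indenting, so no spaces get baked into the content.
def line_to_html(tei_line):
    m = re.match(r'<l(?:\s+rend="indent")?>(.*)</l>$', tei_line.strip())
    if not m:
        raise ValueError("not a simple <l> element: %r" % tei_line)
    cls = ' class="indent"' if 'rend="indent"' in tei_line else ''
    return "<div%s>%s</div>" % (cls, m.group(1))

print(line_to_html('<l rend="indent">That on my birth have smiled,</l>'))
```

The trade-off against Option #1 is the same one discussed above: the class keeps the indent out of the content, at the cost of graceless degradation in non-CSS browsers.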
*** 7 - The only other thing I can remember that was at all out of the ordinary with this text was the retention of small caps. I used the rend="sc" markup and it worked just as I expected it to in the TEI -> HTML converter. Any suggestions/comments/improvements? *** I'm sure I'll remember something on the way home tonight that I forgot to mention, but that's what I can think of right now for discussion. I'm looking forward to everyone's input. Josh From marcello at perathoner.de Tue Nov 2 13:46:12 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Nov 2 13:46:24 2004 Subject: [gutvol-d] author lookup [was: pglaf.org settings might block some messages] In-Reply-To: <000601c4c10f$189d2ac0$6601a8c0@JOEL> References: <20041101043133.GA26281@pglaf.org><20041101071016.GA30421@pglaf.org><21029.203.11.112.2.1099369483.squirrel@203.11.112.2> <000601c4c10f$189d2ac0$6601a8c0@JOEL> Message-ID: <41880024.1020002@perathoner.de> Joel A. Erickson wrote: > I'm assuming numbers are used because it's easier on the programming > side. But wouldn't it be easier for the users if it was the name > instead. It probably has its downsides, but if author lookup was based > on a name, then when the name was modified, the system could just look > for the closest match(es). Or, I suppose, the author number could be > forwarded, 3505 -> 744. The canonical url for linking to an author is http://www.gutenberg.org/author/Mark_Twain This is described in http://www.gutenberg.org/howto-link > On a side note, the cookies for personalizing the PG skin seem to > teminate rather quickly. Usually they last less than a day, it seems. > Has anyone else noticed this? Perhaps I should ask Marcello. They should terminate a year from issue. Maybe your browser limits the duration to one session? 
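[Editorial sketch: Joel's earlier suggestion of forwarding retired author numbers (3505 -> 744) amounts to keeping a redirect table alongside the catalog. The table and resolver below are hypothetical, not the actual catalog code.]

```python
# Deprecated author numbers map to the surviving record, so old URLs
# like .../authrec?fk_authors=3505 can still resolve after a merge.
FORWARDS = {3505: 744}  # William J. Long -> William Joseph Long

def resolve_author(author_id, forwards=FORWARDS, max_hops=10):
    """Follow forwards, handling chains produced by repeated merges."""
    for _ in range(max_hops):
        if author_id not in forwards:
            return author_id
        author_id = forwards[author_id]
    raise ValueError("forwarding loop for author %d" % author_id)

print(resolve_author(3505))
print(resolve_author(744))
```

A bounded hop count guards against an accidental cycle if two records are ever merged into each other.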
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Tue Nov 2 13:58:51 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Nov 2 13:59:02 2004 Subject: [gutvol-d] PG TEI In-Reply-To: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> References: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> Message-ID: <4188031B.5020205@perathoner.de> Joshua Hutchinson wrote: > The rest of the style is as Marcello's converter made it. It is a > bit verbose by some people's standards (almost everything has a class > attribute), but this can be a very good thing because it now allows > CSS to affect the layout/look of nearly every aspects of the > document. Everything has a class attribute because this way you can use the generated html in a web site -- e.g. for an online reader -- and the book and site style will not clash. TODO: all generated styles should have the same prefix: pgtei. > 3 - The TEI master uses rend="indent" markup in the poetry. This > validates fine, but currently the TEI -> HTML converter basically > ignores the indent markup. What I want to address here is how we > want to have those indents converted. I'm working on implementing indent and a few other rend attribute gimmicks. It will understand rend="indent" and rend="indent(n)" where n can be any positive or negative number. > 5 - I used <lb/> to indicate a blank line of text (commonly called a > thoughtbreak over at DP). Marcello's documentation indicates this > isn't what it is truly meant for, though. Anyone see a problem with > this implementation? Or see an improvement we should use instead? <lb/> is meant to record line breaks present in a certain edition, not to output them. To get a thought break, enclose both "thoughts" in <div>s.
<div>
  <head>1.</head>
  <p>...</p>
</div>
<div>
  <p>...</p>
</div>
-- Marcello Perathoner webmaster@gutenberg.org From joel at oneporpoise.com Tue Nov 2 19:25:55 2004 From: joel at oneporpoise.com (Joel A. Erickson) Date: Tue Nov 2 19:25:50 2004 Subject: [gutvol-d] author lookup References: <20041101043133.GA26281@pglaf.org><20041101071016.GA30421@pglaf.org><21029.203.11.112.2.1099369483.squirrel@203.11.112.2> <000601c4c10f$189d2ac0$6601a8c0@JOEL> <41880024.1020002@perathoner.de> Message-ID: <001901c4c154$d44ebc80$6601a8c0@JOEL> Marcello Perathoner wrote: > The canonical url for linking to an author is > > http://www.gutenberg.org/author/Mark_Twain But why couldn't that lead to the author record, instead of search results? From a user point of view, since http://www.gutenberg.org/etext/12345 leads to the etext record 12345, according to some reasoning the Mark Twain link should lead to the Mark Twain author record. If there is no direct match, then it should be forwarded to the search results. Joel From scott_bulkmail at productarchitect.com Tue Nov 2 20:02:28 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Tue Nov 2 21:37:10 2004 Subject: [gutvol-d] PG TEI In-Reply-To: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> References: <20041102212605.3B1482F8B6@ws6-3.us4.outblaze.com> Message-ID: >1 - Currently, Marcello's online converter (TEI -> HTML) automatically adds a PG standard header and footer. (http://home.alltel.net/hutch2000/sunny/sunny.html) It looks nicer (to my eyes anyway) than the monospaced header and footer that the whitewashers currently use. One advantage of monospaced: it clearly distinguishes the long PG footer from the book's content. One could instead use a smaller size and sans serif font. (Personally, I would prefer omitting the license and just including a link.) Also, as noted in http://classicosm.com/xml/feedbackonpgtei.html: In the PG license, section numbers such as "1.A."
should appear on the same line as the text that follows -- per the original and to avoid wasting space. > A) The margins have been set to 10% whitespace on the right and left. This is a fairly arbitrary number arrived at because it is the "defacto" standard at DP. Suggestions/comments? Looks good to me. > B) The paragraph markup has been changed back to HTML standard. As you say, it's the HTML standard and thus appropriate for the default CSS. > The rest of the style is as Marcello's converter made it. It is a bit verbose by some people's standards (almost everything has a class attribute), but this can be a very good thing because it now allows CSS to affect the layout/look of nearly every aspects of the document. A few notes based on a quick look: - class=dgp does seem to be overused. - span class="hi" style="font-variant: small-caps;" is a bit much; how about span class="smallCaps"? I also hate that the HTML is wrapped at 78 (or whatever) chars. I suppose few people will edit the output, but it seems like a wasteful throwback. Don't people have editors that wrap text??? >3 - The TEI master uses rend="indent" markup in the poetry. This validates fine, but currently the TEI -> HTML converter basically ignores the indent markup. What I want to address here is how we want to have those indents converted. > > TEI master markup: > > >"I thank the goodness and the grace >That on my birth have smiled, >And made me in these Christian days >A happy English child." > > > Option #1 - Convert the rend="indent" markup to & emsp ; & emsp ; (remove spaces for use). Pro: Degrades gracefully on non-CSS enabled browsers like Lynx. Con: Treats the indent as content. I think the XHTML version should be completely modern, e.g. here's one way to indent using CSS: .indent {margin-left:40px; margin-right:40px} There are benefits to an "old fashioned HTML" version, but let's make that a different file, probably 4.01 transitional. >4 - I used <q rend="display"> markup for blockquotes. This looks fine to me.
However, in previous discussions, some people did not like the rend="display" for this purpose. As far as I am concerned, it works and doesn't seem to be a problem, but I'm willing to hear opposing arguments. The issue as I understand it: q is for words spoken, quote is for text attributed to an outside source. Either may occur inline or set off in an indented block. So, a long "speech" by a character should (I think) be . I think the TEI tags and explanation are confusing, but that's perhaps a different issue. >5 - I used to indicate a blank line of text (commonly called a thoughtbreak over at DP). Marcello's documentation indicates this isn't what it is truly meant for, though. Anyone see a problem with this implementation? Or see an improvement we should use instead? Marcello suggested that a closing and opening div creates a blank line; I'm not convinced that's a good idea in general. ===== Misc. questions: * The following looks like a (minor) error: Letter XVII

LETTER XVII.

The latter looks redundant. * Were the italics in the original here?

Andover, May 30, 1854.

* Does the original really have several pages with no paragraph breaks? -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From joshua at hutchinson.net Wed Nov 3 05:30:07 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 3 05:30:11 2004 Subject: [gutvol-d] PG TEI Message-ID: <20041103133007.BB1C04F441@ws6-5.us4.outblaze.com> ----- Original Message ----- From: Scott Lawton > > > Also, as noted in http://classicosm.com/xml/feedbackonpgtei.html: In the PG license, section numbers such as "1.A." should appear on the same line as the text that follows -- per the original and to avoid wasting space. > I agree. I'll work on an updated footer to forward on to Marcello. > > > A few notes based on a quick look: > - class=dgp does seem to be overused. > - span class="hi" style="font-variant: small-caps;" is a bit much; how about span class="smallCaps"? > You've got a point. I'll add a todo item to go through the default style and make it more intuitive (ie change the class names where appropriate) and create classes for things like small caps. > I also hate that the HTML is wrapped at 78 (or whatever) chars. I suppose few people will edit the output, but it seems like a wasteful throwback. Don't people have editors that wrap text??? > That you can blame on me. I used Tidy to rewrap everything because of the lots and lots of playing I did with the source code. Marcello's converter leaves the line breaks that were in the original XML source in place. > > Option #1 - Convert the rend="indent" markup to & emsp ; & emsp ; (remove spaces for use). Pro: Degrades gracefully on non-CSS enabled browsers like Lynx. Con: Treats the indent as content. > > I think the XHTML version should be completely modern, e.g. 
here's one way to indent using CSS: > .indent {margin-left:40px; margin-right:40px} > > There are benefits to an "old fashioned HTML" version, but let's make that a different file, probably 4.01 transitional. > I'm hoping for more discussion on this. I've had fairly heated discussion at DP on it. > > > > >5 - I used to indicate a blank line of text (commonly called a thoughtbreak over at DP). Marcello's documentation indicates this isn't what it is truly meant for, though. Anyone see a problem with this implementation? Or see an improvement we should use instead? > > Marcello suggested that a closing and opening div creates a blank line; I'm not convinced that's a good idea in general. > I don't like it definitely from the point of view of having to create the markup. > ===== > > Misc. questions: > > * The following looks like a (minor) error: > Letter XVII >

LETTER XVII.

> > The latter looks redundant. > My bad. Must have done a copy/paste into the head tag instead of a cut/paste there. > * Were the italics in the original here? > >

Andover, May 30, 1854.

> I went by the text provided by DP on this. I didn't check closely to the original on this type of thing. > * Does the original really have several pages with no paragraph breaks? Yep. Makes for easy reading, huh? ;) Josh From Bowerbird at aol.com Wed Nov 3 12:33:31 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 3 12:33:48 2004 Subject: [gutvol-d] thoroughly depressed, and off to collect thoughts in blog Message-ID: well, i am thoroughly depressed by the specter of four more years of george w. anyway, i will be heading out of here shortly -- off to collect my thoughts and criticisms in a blog, instead of attempting to share them with you here, to be free of the incessant swirl of noise and flack that my detractors here seem to love to throw around me -- so does anyone have any questions for me before i leave? -bowerbird From nwolcott2 at kreative.net Thu Nov 4 11:30:12 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Thu Nov 4 13:21:27 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders Message-ID: <006201c4c2b4$2d233380$0e9495ce@net> A new team Jules Verne 2005 has been set up at DP to help get Verne online by the 100th anniversary in March 2005. DP members can either join the team or read and post messages without joining at http://www.pgdp.net/c/stats/teams/tdetail.php?tid=353 Team members can assist with a number of activities connected with the project. Bi-lingual persons are especially needed for the french texts, and clarification of obscure references. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041104/5f3c8921/attachment.html From jeroen at bohol.ph Thu Nov 4 14:34:50 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Thu Nov 4 14:33:48 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders In-Reply-To: <006201c4c2b4$2d233380$0e9495ce@net> References: <006201c4c2b4$2d233380$0e9495ce@net> Message-ID: <418AAE8A.5040107@bohol.ph> Norm Wolcott wrote: > A new team Jules Verne 2005 has been set up at DP to help get Verne > online by the 100th anniversary in March 2005. DP members can either > join the team or read and post messages without joining at > http://www.pgdp.net/c/stats/teams/tdetail.php?tid=353 > We are also planning to do a number of works of Jules Verne in Dutch translations, hopefully to go on the wave... Jeroen Hellingman. From hyphen at hyphenologist.co.uk Thu Nov 4 19:07:02 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Nov 4 19:07:33 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders In-Reply-To: <006201c4c2b4$2d233380$0e9495ce@net> References: <006201c4c2b4$2d233380$0e9495ce@net> Message-ID: On Thu, 4 Nov 2004 14:30:12 -0500, "Norm Wolcott" wrote: | This is a multi-part message in MIME format. | | --===============1330105761== | Content-Type: multipart/alternative; | boundary="----=_NextPart_000_0053_01C4C27A.CB04BAE0" | | This is a multi-part message in MIME format. | | ------=_NextPart_000_0053_01C4C27A.CB04BAE0 And so looks a shambles on Agent set to a80. -- Dave F From shalesller at writeme.com Thu Nov 4 21:20:49 2004 From: shalesller at writeme.com (D. Starner) Date: Thu Nov 4 23:36:48 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders Message-ID: <20041105052049.923344BE64@ws1-1.us4.outblaze.com> Dave Fawthrop writes: > And so looks a shambles on Agent set to a80. MIME has been a standard - an RFC - for quite some time now. I'm not sure that mail agents that don't support that should still be a big concern. 
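[Editor's sketch] Since MIME multipart messages keep coming up in this thread, here is a small stdlib sketch of what a well-formed multipart/alternative message looks like; the addresses and body text are invented for illustration. The point is that a compliant message always carries a text/plain part that a plain-text-only agent can fall back to.

```python
# Sketch: a well-formed multipart/alternative message via Python's stdlib.
# Addresses and body text are invented for illustration.
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "[gutvol-d] example"
msg["From"] = "sender@example.org"
msg["To"] = "gutvol-d@lists.pglaf.org"
msg.set_content("Plain-text body, readable everywhere.")  # text/plain part
msg.add_alternative("<p>HTML body for capable clients.</p>", subtype="html")

# A text-only agent can always fall back to the text/plain alternative:
plain = msg.get_body(preferencelist=("plain",)).get_content()
```

A reader that ignores MIME structure entirely, of course, still sees the boundary lines and both parts as raw text, which is exactly the "shambles" described above.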
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From nwolcott2 at kreative.net Fri Nov 5 06:13:27 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:43 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders References: <006201c4c2b4$2d233380$0e9495ce@net> <418AAE8A.5040107@bohol.ph> Message-ID: <006501c4c343$d5fefea0$2d9495ce@net> If you can send copyright clearance and a single file text and html that would speed things up immeasurably! nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Jeroen Hellingman" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, November 04, 2004 5:34 PM Subject: Re: [gutvol-d] New Jules Verne Team at Dist Proofreaders > Norm Wolcott wrote: > > > A new team Jules Verne 2005 has been set up at DP to help get Verne > > online by the 100th anniversary in March 2005. DP members can either > > join the team or read and post messages without joining at > > http://www.pgdp.net/c/stats/teams/tdetail.php?tid=353 > > > > We are also planning to do a number of works of Jules Verne in Dutch > translations, hopefully to go on the wave... > > Jeroen Hellingman. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Fri Nov 5 06:15:05 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:43 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders References: <006201c4c2b4$2d233380$0e9495ce@net> Message-ID: <006601c4c343$d6c86920$2d9495ce@net> What do you think happened to the message? 
nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Dave Fawthrop" To: "Project Gutenberg Volunteer Discussion" Sent: Thursday, November 04, 2004 10:07 PM Subject: Re: [gutvol-d] New Jules Verne Team at Dist Proofreaders > On Thu, 4 Nov 2004 14:30:12 -0500, "Norm Wolcott" > wrote: > > | This is a multi-part message in MIME format. > | > | --===============1330105761== > | Content-Type: multipart/alternative; > | boundary="----=_NextPart_000_0053_01C4C27A.CB04BAE0" > | > | This is a multi-part message in MIME format. > | > | ------=_NextPart_000_0053_01C4C27A.CB04BAE0 > > And so looks a shambles on Agent set to a80. > > -- > Dave F > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Fri Nov 5 06:16:39 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:45 2004 Subject: [gutvol-d] New Jules Verne Team at Dist Proofreaders References: <20041105052049.923344BE64@ws1-1.us4.outblaze.com> Message-ID: <006701c4c343$d79974c0$2d9495ce@net> The message went to other sites ok. looks like none of my posts are getting through why? nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 05, 2004 12:20 AM Subject: Re: [gutvol-d] New Jules Verne Team at Dist Proofreaders > Dave Fawthrop writes: > > > And so looks a shambles on Agent set to a80. > > MIME has been a standard - an RFC - for quite some time > now. I'm not sure that mail agents that don't support that > should still be a big concern. 
> -- > ___________________________________________________________ > Sign-up for Ads Free at Mail.com > http://promo.mail.com/adsfreejump.htm > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Fri Nov 5 06:28:27 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 5 06:29:53 2004 Subject: [gutvol-d] Test message why not going through Message-ID: <006901c4c343$dcf845e0$2d9495ce@net> This is a test message which gets scrambled somehow on the way why? nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041105/df54f5c5/attachment.html From hyphen at hyphenologist.co.uk Fri Nov 5 08:05:24 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 5 08:05:46 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <006901c4c343$dcf845e0$2d9495ce@net> References: <006901c4c343$dcf845e0$2d9495ce@net> Message-ID: <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> On Fri, 5 Nov 2004 09:28:27 -0500, "Norm Wolcott" wrote: | This is a test message which gets scrambled somehow on the way why? | | nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest Because it had an attachment, and so gets deleted by spam filters? -- Dave F From nwolcott2 at kreative.net Sat Nov 6 07:30:51 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 6 08:12:11 2004 Subject: [gutvol-d] Test message why not going through References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> Message-ID: <00b101c4c41b$505c3dc0$5b9495ce@net> The message had no attachments. the Mime attachments must have been automatically generated as the message was sent. I have sent other test messages, none arrive. 
I believe someone has removed me from the listserve. I received 3 messages saying my messages have been received at lists dot pglaf dot org. But none of them have shown up in my mailbox, nor have they been returned to me by a bot. Who is in charge of the listserve now that g newby has moved? nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Dave Fawthrop" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 05, 2004 11:05 AM Subject: Re: [gutvol-d] Test message why not going through > On Fri, 5 Nov 2004 09:28:27 -0500, "Norm Wolcott" > wrote: > > | This is a test message which gets scrambled somehow on the way why? > | > | nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest > > Because it had an attachment, and so gets deleted by spam filters? > > -- > Dave F > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Sat Nov 6 10:32:26 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 6 10:32:35 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <00b101c4c41b$505c3dc0$5b9495ce@net> References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> <00b101c4c41b$505c3dc0$5b9495ce@net> Message-ID: <418D18BA.1080707@perathoner.de> Norm Wolcott wrote: > I have sent other test messages, none arrive. I believe someone has removed > me from the listserve. I received 3 messages saying my messages have been > received at lists dot pglaf dot org. But none of them have shown up in my > mailbox, nor have they been returned to me by a bot. Go to lists.pglaf.org and change your settings. There is an option you must set if you want to get your own messages. 
-- Marcello Perathoner webmaster@gutenberg.org From hyphen at hyphenologist.co.uk Sat Nov 6 11:06:32 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Nov 6 11:06:52 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <418D18BA.1080707@perathoner.de> References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> <00b101c4c41b$505c3dc0$5b9495ce@net> <418D18BA.1080707@perathoner.de> Message-ID: On Sat, 06 Nov 2004 19:32:26 +0100, Marcello Perathoner wrote: | Norm Wolcott wrote: | | > I have sent other test messages, none arrive. I believe someone has removed | > me from the listserve. I recieved 3 messages saying my messages have been | > received at lists dot pglaf dot org . But none of them have showed up in my | > mailbox, nor have the been returned to me by a bot. | | Go to lists.pglaf.org and change your settings. | | There is an option you must set if you want to get your own messages. Now that is a totally *daft* idea. -- Dave F From hyphen at hyphenologist.co.uk Sat Nov 6 11:05:15 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sat Nov 6 11:07:49 2004 Subject: [gutvol-d] Test message why not going through In-Reply-To: <00b101c4c41b$505c3dc0$5b9495ce@net> References: <006901c4c343$dcf845e0$2d9495ce@net> <419no09ge87dkrogjmb2t77t31u7d6e8n9@4ax.com> <00b101c4c41b$505c3dc0$5b9495ce@net> Message-ID: On Sat, 6 Nov 2004 10:30:51 -0500, "Norm Wolcott" wrote: | The message had no attachments. the Mime attachments must have been | automatically generated as the message was sent. When it got to me it had an *html* attachment. Had my spam trap system not caught it as something interesting and put it in my Gutenberg Directory, It would have gone straight into trash with all the other html rubbish. Had I seen it in the inbox, I would have punched "delete" on seeing the html bit without reading the subject line. 400 spam per day here. :-( Only plain text emails get through reliably. 
-- Dave F From sly at victoria.tc.ca Tue Nov 9 10:23:05 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Nov 9 10:23:12 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: == Resent message; It was bounced the first time == On Tue, 9 Nov 2004, Steve Thomas wrote: > This was all well and good, and eventually we ended up with > around 3,800 records for PG titles in our catalogue. > > However, the advent of DP put paid to all that. The volume of > works appearing each month very quickly overwhelmed me, and I > was forced to abandon the effort, so that an unfortunate side > effect of DP was that I could no longer add MARC records to our > catalogue. I believe something like this is also faced by John Mark Ockerbloom, who maintains the Online Books page. He has cataloged a large portion of PG, as well as thousands of online books from other sources. However, as you say, one person cannot keep up with the increasing number of old books being digitized. > I believe that recent changes and enhancements to the PG archive > may make a similar effort possible once more. First, I am told > that there is now an XML file of the PG database, and that this > contains much more and better detail than the old GUTINDEX list. I would qualify this with a "yes, but..." Yes, this does exist (see the link Greg gave, or here's a link directly to the compressed rdf file: http://www.gutenberg.org/feeds/catalog.rdf.bz2) But, as is PG custom, it has its own inconsistencies. All new records are generated automatically from information in the headers of newly posted files (and this is not always accurate). Many older records were copied from the old catalog from promo.net, which sometimes had "interesting" variations. Many records have additional information such as subject headings, LOC classifications, and sometimes other material of bibliographical interest in a "notes" field. But many records have only very basic information. 
Additional information is generally added when one of the volunteers who has write access to the catalog takes an interest in looking it up. So this happens somewhat irregularly. Taken all together, the PG online catalog does present plenty of information that can help people interact with the collection in meaningful ways; but it may make professional librarians roll their eyes. > Second, PG now has a neater way of accessing texts, > using a simple URL like http://www.gutenberg.org/etext/1234 > Previously, one could only link directly to the individual files > in the archive, and this complicated matters, since every title > has at least two files (.txt and .zip) and often there are > multiple versions and formats. Yes. In my own opinion, the ability to do this is perhaps the best thing to have happened for PG in the last year. This provides a much more ideal way to link to a PG title from any place such as newsgroups, websites, catalogs, whatever. (Thanks Marcello!) This also makes it easier to present selections from PG, organized by whatever criteria you choose. (eg, Marcello's list of "Top 100" downloads, my list of Canadiana.) All of this only encourages more exposure for PG, and a greater chance that some computer user will come across (perhaps by accident) a PG text that interests him. > Of course, one has to ask whether the effort of creating and > *maintaining* catalogue records for PG is worth while. We live > in the age of Google, and it is a lament frequently heard from > librarians that the user is more often likely to search the 'net > with Google than to use the Library catalogue. I believe the effort is worth while. Good cataloging can lead to a user finding an item of interest that may have been missed otherwise. And yes, google does index the PG "bibrec" pages, so any additional work done in cataloging could possibly lead to a text being found from someone searching with google. 
> However, redundancy is no bad thing with information, and the > more ways of getting at it the better -- so long as those ways > remain accurate. So I believe many libraries would welcome the > chance to load marc records pointing at PG texts -- provided > that they can be sure the record contents are accurate and the > links remain so. At this point in time, I would say a good deal of manual tweaking would be needed to get a result that would be somewhat satisfactory for librarians. Links should not be a problem, as the canonical URLs discussed above show every sign of being much more permanent than most. Andrew From marcello at perathoner.de Tue Nov 9 11:06:20 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue Nov 9 11:06:34 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: Message-ID: <4191152C.9080702@perathoner.de> Andrew Sly wrote: > Taken all together, the PG online catalog does present plently > of information that can help people interact with the collection > in meaningful ways; but it may make professional librarians > roll their eyes. The design philosophy of the catalog database is: To help people find a book they may want to read. That includes both, people who already know which book they want and people who want a suggestion. The catalog database was not designed to be a tool for professionals. But this doesn't mean that I'm not willing to add some functions to help them out, so long as those functions don't get in the way of the primary functionality. Producing MARC records out of existing catalog entries seems to be a pretty forward thing. Importing other people's MARC into our database will be much hairier. 
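[Editor's sketch] The claim that producing MARC records out of existing catalog entries is straightforward can be illustrated in a few lines. The field tags follow MARC 21 conventions (100 = main author entry, 245 = title statement, 856 = electronic location), but the entry dict, the indicator values, and the text serialization below are illustrative assumptions, not PG's actual export code.

```python
# Sketch: serializing a PG catalog entry as a MARC-mnemonic text record.
# MARC 21 tags: 100 = main author, 245 = title, 856 = electronic location.
# The entry dict and indicators are illustrative, not PG's export code.
def catalog_to_marc_text(entry: dict) -> str:
    lines = [
        f"=100  1\\$a{entry['author']}",  # "\" marks a blank indicator
        f"=245  10$a{entry['title']}",
        f"=856  40$uhttp://www.gutenberg.org/etext/{entry['etext']}",
    ]
    return "\n".join(lines)

record = catalog_to_marc_text({
    "author": "Lönnrot, Elias, 1802-1884.",
    "title": "The Kalevala",
    "etext": 7000,
})
```

The reverse direction is, as noted, much hairier: importing arbitrary MARC means mapping free-form cataloguing back onto the catalog's own schema.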
-- Marcello Perathoner webmaster@gutenberg.org From aakman at csufresno.edu Tue Nov 9 11:26:21 2004 From: aakman at csufresno.edu (Alev Akman) Date: Tue Nov 9 11:26:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <4191152C.9080702@perathoner.de> References: <4191152C.9080702@perathoner.de> Message-ID: <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> At 11:06 AM 11/9/2004, you wrote: >Andrew Sly wrote: > >>Taken all together, the PG online catalog does present plently >>of information that can help people interact with the collection >>in meaningful ways; but it may make professional librarians >>roll their eyes. > >The design philosophy of the catalog database is: > > To help people find a book they may want to read. > >That includes both, people who already know which book they want and >people who want a suggestion. > >The catalog database was not designed to be a tool for professionals. But >this doesn't mean that I'm not willing to add some functions to help them >out, so long as those functions don't get in the way of the primary >functionality. > >Producing MARC records out of existing catalog entries seems to be a >pretty forward thing. Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole thing would be in place by now. On the other hand, PG database may not be capable of the Z39.50 imports but there are many MANY (if not all!) library cataloging software packages that will do it in a short time. The advantage of importing from the existing catalog entries is that we have our pick of what fits our needs for especially the subject fields. Of course there is always work to edit and customize them for the PG user database. I don't see why we can't have a commercial software to do most of the work and keep the existing catalog as a backup. 
And for the record, I have been involved in the PG cataloging effort for more than six years and anyone who says I am not interested in it any more is clearly not aware of the full facts. It may be quite disappointing when one's years of volunteer efforts have been deleted with the "new improvements"! Alev. an "official" librarian > Importing other people's MARC into our database will be much hairier. > > > >-- >Marcello Perathoner >webmaster@gutenberg.org > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 -------------- next part -------------- --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From lynne at rhodesresearch.biz Tue Nov 9 18:11:28 2004 From: lynne at rhodesresearch.biz (Lynne Anne Rhodes) Date: Tue Nov 9 18:10:21 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> Message-ID: <200411091911.28485.lynne@rhodesresearch.biz> I'm new around here so please forgive me if I go over old ground. I have subscribed to the RSS Recently Posted or Updated feeds and it is truely amazing to see the way the entries roll in every night. However, it is frustrating to see when one of the entries is opened up there is little information apart from the author (with or without dates) and a title. In most cases I have no idea what the book is about and whether I am interested in it. I, and I am sure many others would love to see a bit more detail such as the original date of publication and a brief synopsis of the work. 
Obviously to enter such information day after day with such a rush of material is far beyond the resources of a small group of volunteers, however dedicated. Would it not be possible to devise a distributed cataloguing system following the model of DP? For each book "in the frame" a form would be provided with spaces for the required items. When these were completed (and checked) the data would then be transferred, in an agreed format--MARC or otherwise--to a file held within the books directory tree. In many cases this information is provided at the time of proofreading and then it seems to be lost. Obviously some of the information might be easy to complete, such as book or serial. However, other fields might need research, such as key dates, author bio, etc. Also a meaningful synopsis would mean most likely reading the text or abstracting a portion from another work. I could also see that multilingual versions might be needed. I would think there are many who would rise to the challenge of helping in such an endeavour, Lynne On Tuesday 09 November 2004 12:26 pm, Alev Akman wrote: > At 11:06 AM 11/9/2004, you wrote: > >Andrew Sly wrote: > >>Taken all together, the PG online catalog does present plenty > >>of information that can help people interact with the collection > >>in meaningful ways; but it may make professional librarians > >>roll their eyes. > > > >The design philosophy of the catalog database is: > > > > To help people find a book they may want to read. > > > >That includes both, people who already know which book they want and > >people who want a suggestion. > > > >The catalog database was not designed to be a tool for professionals. But > >this doesn't mean that I'm not willing to add some functions to help them > >out, so long as those functions don't get in the way of the primary > >functionality. > > > >Producing MARC records out of existing catalog entries seems to be a > >pretty forward thing. 
> > Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole > thing would be in place by now. > > On the other hand, PG database may not be capable of the Z39.50 imports but > there are many MANY (if not all!) library cataloging software packages that > will do it in a short time. The advantage of importing from the existing > catalog entries is that we have our pick of what fits our needs for > especially the subject fields. Of course there is always work to edit and > customize them for the PG user database. > > I don't see why we can't have a commercial software to do most of the work > and keep the existing catalog as a backup. > > And for the record, I have been involved in the PG cataloging effort for > more than six years and anyone who says I am not interested in it any more > is clearly not aware of the full facts. It may be quite disappointing when > one's years of volunteer efforts have been deleted with the "new > improvements"! > > Alev. > an "official" librarian > > > Importing other people's MARC into our database will be much hairier. > > > > > > > >-- > >Marcello Perathoner > >webmaster@gutenberg.org > > > >_______________________________________________ > >gutvol-d mailing list > >gutvol-d@lists.pglaf.org > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > > > >--- > >Incoming mail is certified Virus Free. > >Checked by AVG anti-virus system (http://www.grisoft.com). > >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From shalesller at writeme.com Tue Nov 9 15:07:38 2004 From: shalesller at writeme.com (D. Starner) Date: Tue Nov 9 20:33:48 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> Marcello Perathoner writes: > The design philosophy of the catalog database is: > > To help people find a book they may want to read. It does a pretty horrid job at that, then. If you don't know what you're looking for, it's very hard to find it. 
One step might be making the list of LoC classifications available, so you can scroll down to the list of histories. When I'm looking for something to read, I often look for a list of science-fiction or mysteries. Being in a college library, I miss the spine stickers loudly identifying the genre of the fiction. PG's catalog has nothing in that direction. Another thing I will do is to browse the stacks. I guess if the LoC classifications are available, that would be possible. The thing I would honestly like is the Amazon-style "if you liked this, you might like ...". I don't mean to be harsh in this email, but I'm having a real hard time believing your statement, because the catalog so badly sucks at it. Not that most of the library catalogs I've dealt with have been good at it, but it's never been stated as a design philosophy for them. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From sly at victoria.tc.ca Tue Nov 9 21:23:16 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue Nov 9 21:23:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> Message-ID: On Tue, 9 Nov 2004, D. Starner wrote: > > When I'm looking for something to read, I often look > for a list of science-fiction or mysteries. Being in > a college library, I miss the spine stickers loudly > identifying the genre of the fiction. PG's catalog > has nothing in that direction. If it helps, I've assembled a small list of PG books that would fall under the heading of science fiction. I haven't done anything with it yet, as I feel it's rather on the small side, and surely misses many of the examples which we have. Another catagory that could be of interest to some is cook books, of which there are now quite a decent number in PG. 
Andrew

From shalesller at writeme.com  Tue Nov 9 19:46:32 2004
From: shalesller at writeme.com (D. Starner)
Date: Tue Nov 9 21:27:07 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041110034632.5655F164005@ws1-4.us4.outblaze.com>

 writes:

> Isn't sco gaelic?

No. sco has the name "Scots"; gd has the name "Scottish Gaelic".
Since they're distinct codes, sco must be the Germanic language.

> I use foreign exclusively as a holder for the lang attribute.

My problem with that is that it means there's no way to transform a
document such that the foreign words are marked differently from
the emphasized words, or not marked at all.
-- 
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From traverso at dm.unipi.it  Tue Nov 9 22:08:18 2004
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Tue Nov 9 22:08:36 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <200411091911.28485.lynne@rhodesresearch.biz> (message from Lynne Anne Rhodes on Tue, 9 Nov 2004 19:11:28 -0700)
References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz>
Message-ID: <200411100608.iAA68I6P016938@posso.dm.unipi.it>

>>>>> "Lynne" == Lynne Anne Rhodes writes:

    Lynne> I, and I am sure many others would love to see a bit more
    Lynne> detail such as the original date of publication and a brief
    Lynne> synopsis of the work. Obviously to enter such information
    Lynne> day after day with such a rush of material is far beyond
    Lynne> the resources of a small group of volunteers, however,
    Lynne> dedicated.

DP would be delighted to preserve these data. Most books that pass
through DP are accompanied by a small HTML page that describes the
author, the book, etc.; and the data on the original book are preserved
in proofreading, and often deleted in post-processing.
We have also discussed keeping a catalogue of our books, with this
kind of additional information. One of the problems is copyright: most
of the info on the authors is taken from sources that would not survive
a clearance procedure (i.e. it is raided from other sites). So this
cannot be integrated with the PG catalogue; but it might build the core
of an added-value site that maintains a PG catalogue adding information
and classification data. The PG catalogue remains authoritative and
terse, but you can get additional features. Exactly as with many
etexts, for which sites exist that add formats for PG ebooks.

The first step, however, is to have better PG records, and a method to
avoid losing information from DP to the PG catalogue.

Carlo

From Bowerbird at aol.com  Tue Nov 9 23:42:11 2004
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Nov 9 23:42:35 2004
Subject: [gutvol-d] [BP] The Future of eBooks
Message-ID: <6EBFB319.08D194C6.023039A8@aol.com>

david starner said:
> It does a pretty horrid job at that

hey, before i skip on out of here,
i get to agree with david for once.

the catalog just ain't gonna help someone
know what kind of book they'd like to read.
(and a marc record won't help them either.)

that's a job that collaborative filtering
will eventually do much better than anything
you can do in the form of a catalog of any type.
(and the collaborative filtering that amazon uses
is absolutely primitive compared to what it could be.)

just make e-texts. clean, consistent, clear e-texts.

do that, and the rest will take care of itself...

-bowerbird

From marcello at perathoner.de  Wed Nov 10 01:12:39 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 10 01:13:01 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com>
References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com>
Message-ID: <4191DB87.7020904@perathoner.de>

D. Starner wrote:

> It does a pretty horrid job at that, then.
> If you don't know what you're looking for, it's very hard
> to find it. One step might be making the list of
> LoC classifications available, so you can scroll down
> to the list of histories.

We already have LoC class as a search criterion. What we lack is the
data.

Are you volunteering to type the data in?

-- 
Marcello Perathoner
webmaster@gutenberg.org

From marcello at perathoner.de  Wed Nov 10 02:29:14 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 10 02:29:36 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu>
References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu>
Message-ID: <4191ED7A.3020203@perathoner.de>

Alev Akman wrote:

> Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole
> thing would be in place by now.

Nobody requested that feature before. And, to be exact, nobody is
requesting that feature now. It's just that some of us *think* that
libraries could use that.

As a rule, I'm not putting work into features that maybe nobody will use.

> On the other hand, PG database may not be capable of the Z39.50 imports
> but there are many MANY (if not all!) library cataloging software
> packages that will do it in a short time. The advantage of importing
> from the existing catalog entries is that we have our pick of what fits
> our needs for especially the subject fields. Of course there is always
> work to edit and customize them for the PG user database.
>
> I don't see why we can't have a commercial software to do most of the
> work and keep the existing catalog as a backup.

- Does it provide web access for users?
- For catalogers?
- How much will an unlimited worldwide public access license cost?
- Will it run on Linux/Apache?
- Will it manage our files?
- Will it provide download links for the files?
- Do we get the source code to adapt it to our particular needs?
I think any commercial library-use-oriented catalog software will fall
far short of what we have now. We don't need so much of a catalog
system. What we need is a web shop system à la Amazon. But I have my
doubts they will give us theirs.

The problems with MARC are:

- the standard is not free.
- the records are not free.
- the technology is obsolete.

I don't know what the copyright status of the LoC MARC records is. They
are a US government agency, so they should be free. But do we know?

To request a MARC record I have to implement an obscure Z39.50
protocol. And I get back a record full of numeric codes that I have to
look up before knowing what they are. Why can't I simply post an HTTP
request and get an XML/RDF answer?

Which MARC record should we import for a book? If you search through
the LoC catalog you'll find many examples of works that have got
different MARC subject classifications for the different copies held by
the LoC.

LoC class codes have shifted semantically over the years. What was XY
in 1970 will not necessarily be XY in 2000. So you'll have to keep the
LoC class code, the year the classification was made and the list of
class codes that was authoritative in that year. Of course the same
goes for Dewey etc.

> And for the record, I have been involved in the PG cataloging effort for
> more than six years and anyone who says I am not interested in it any
> more is clearly not aware of the full facts.

I didn't say that. I said Greg and I wanted to get you as manager of
the catalog team but last time I mailed Greg about it he said he got no
answer from you. Your last post on this list was on 3/18.

> It may be quite
> disappointing when one's years of volunteer efforts have been deleted
> with the "new improvements"!

I don't know of any data that has willfully been deleted. Please give
an example.
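To make the wished-for alternative concrete, here is a minimal Python sketch of consuming an XML/RDF answer for one catalog record. No such endpoint is assumed to exist; the response is an inline sample (etext #7000, "The Kalevala") rather than a live HTTP fetch, and the Dublin Core fields shown are illustrative.

```python
# Sketch: parse a hypothetical XML/RDF catalog answer with the standard
# library. A real client would fetch this over HTTP; here the response
# is a canned string so the example is self-contained.
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"

SAMPLE_RESPONSE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.gutenberg.org/etext/7000">
    <dc:title>The Kalevala</dc:title>
    <dc:creator>Lonnrot, Elias</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

def parse_record(xml_text):
    """Extract Dublin Core title/creator from one RDF/XML record."""
    desc = ET.fromstring(xml_text).find(RDF + "Description")
    return {"title": desc.findtext(DC + "title"),
            "creator": desc.findtext(DC + "creator")}

record = parse_record(SAMPLE_RESPONSE)
print(record["creator"], "-", record["title"])
```

Compared with a binary MARC record fetched over Z39.50, everything here is human-readable and needs nothing beyond an XML parser.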
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Nov 10 02:33:48 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 02:34:11 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> Message-ID: <4191EE8C.7030109@perathoner.de> Andrew Sly wrote: > If it helps, I've assembled a small list of PG books that > would fall under the heading of science fiction. > I haven't done anything with it yet, as I feel it's > rather on the small side, and surely misses many of > the examples which we have. > > Another catagory that could be of interest to some > is cook books, of which there are now quite a decent number > in PG. You could add a subject "Science Fiction" or "Cooking" entry to all those books. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Nov 10 02:38:47 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 02:39:10 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <200411100608.iAA68I6P016938@posso.dm.unipi.it> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz> <200411100608.iAA68I6P016938@posso.dm.unipi.it> Message-ID: <4191EFB7.6080203@perathoner.de> Carlo Traverso wrote: > The first step however is to have better PG records, and a method to > avoid losing information from DP to the PG catalogue. If you put a complete ... somewhere in the files, maybe at the back where it won't hurt much, I can easily pick it out and parse it into the database. Of course it has to stay in the file after being posted. What is happening now is that I parse the tiny header at the top of the file and I get just what's there. 
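The "tiny header at the top of the file" lends itself to a simple key-value scan. A sketch, with sample header lines that are illustrative rather than copied from any particular etext:

```python
# Sketch: collect "Key: value" metadata lines from the top of a
# plain-text etext. The header layout below is illustrative; real
# files may use different keys, so the scan is a loose heuristic.
SAMPLE_TOP = """\
Title: The Kalevala

Author: Lonnrot, Elias

Release Date: November, 2004  [EBook #7000]

Language: Finnish
"""

def parse_pg_header(text, max_lines=50):
    """Return a dict of "Key: value" pairs found in the first lines."""
    meta = {}
    for line in text.splitlines()[:max_lines]:
        key, sep, value = line.partition(": ")
        # Accept short, Title-Cased keys only, to skip ordinary prose.
        if sep and key and key == key.title() and len(key.split()) <= 2:
            meta[key] = value.strip()
    return meta

header = parse_pg_header(SAMPLE_TOP)
print(header["Title"], "/", header["Language"])
```

Anything not in the header simply never makes it into the dict, which matches the "I get just what's there" situation described above.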
-- Marcello Perathoner webmaster@gutenberg.org From M.J.Farmer at bham.ac.uk Wed Nov 10 03:02:16 2004 From: M.J.Farmer at bham.ac.uk (Malcolm Farmer) Date: Wed Nov 10 03:35:56 2004 Subject: [gutvol-d] Re: catalog data In-Reply-To: <4191EE8C.7030109@perathoner.de> References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> <4191EE8C.7030109@perathoner.de> Message-ID: <4191F538.5030201@bham.ac.uk> Marcello Perathoner wrote: > Andrew Sly wrote: > >> If it helps, I've assembled a small list of PG books that >> would fall under the heading of science fiction. >> I haven't done anything with it yet, as I feel it's >> rather on the small side, and surely misses many of >> the examples which we have. >> >> Another catagory that could be of interest to some >> is cook books, of which there are now quite a decent number >> in PG. > > > You could add a subject "Science Fiction" or "Cooking" entry to all > those books. Is there a simple process for doing this? For historical novels, there's a book in PG, "A Guide to the Best Historical Novels and Tales" (#1359) which lists hundreds of such, also listing their time and place settings: an ever-increasing number of the titles are in PG, so it would make an interesting project for someone with write access to the catalog data to go through the listing and add this classification to those titles. At present, there are only six titles in the catalog categorised as historical fiction. Distributed Proofreaders projects are classified under various headings (history, cooking, children's etc. ) when they're first started: it may be worth working out with DP a way of passing that data on to the catalog when the work is submitted. That only covers DP books, and probably doesn't match proper library classifications, but it should help in giving some information to the prospective reader. 
From marcello at perathoner.de  Wed Nov 10 04:06:52 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed Nov 10 04:07:19 2004
Subject: [gutvol-d] Re: catalog data
In-Reply-To: <4191F538.5030201@bham.ac.uk>
References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> <4191EE8C.7030109@perathoner.de> <4191F538.5030201@bham.ac.uk>
Message-ID: <4192045C.4070908@perathoner.de>

Malcolm Farmer wrote:

>> You could add a subject "Science Fiction" or "Cooking" entry to all
>> those books.
>
> Is there a simple process for doing this?

First you have to agree with Andrew on the subject headings you want to
tackle. Then you can build an ASCII list like this:

Subject: Cooking
1234
2345
3456

Subject: Science Fiction
7777
8888
9999

The numbers are the etext numbers. I will then import that data into
the database. That's the easiest way to get a *lot* of data into the
catalog.

-- 
Marcello Perathoner
webmaster@gutenberg.org

From joshua at hutchinson.net  Wed Nov 10 05:25:48 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Nov 10 05:25:52 2004
Subject: [gutvol-d] [BP] The Future of eBooks
Message-ID: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com>

I've heard this suggestion before, but I think it bears repeating...

This sounds an awful lot like a Wikipedia entry. I think a loose
partnership between us and Wikipedia would be useful here. We have the
book, and a link could be made from the catalog page to that book's
entry in Wikipedia.

Then, we have a place to put all sorts of information about the book,
the author, where it was published, the historical context it was
conceived in, ... just about anything someone wants to add.

Wikipedia's hyperlinked nature would also allow someone looking up
information on, say, 20,000 Leagues Under the Sea, to get information
on all sorts of related Jules Verne books and even perhaps other
Science Fiction books in the PG collection ... provided someone takes
the time to enter the information.
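Picking up Marcello's ASCII subject-list proposal from a few messages back: the format is simple enough that the import side can be sketched mechanically. A minimal parser, assuming exactly the layout of his example:

```python
# Sketch: parse the "Subject: ..." / etext-number list format into a
# mapping ready for a database import. Layout per Marcello's example.
def parse_subject_list(text):
    """Map each subject heading to the etext numbers listed under it."""
    subjects = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Subject:"):
            current = line[len("Subject:"):].strip()
            subjects.setdefault(current, [])
        elif line.isdigit() and current is not None:
            subjects[current].append(int(line))
    return subjects

sample = """\
Subject: Cooking
1234
2345
3456

Subject: Science Fiction
7777
8888
9999
"""

imported = parse_subject_list(sample)
print(imported)
```

Blank lines are ignored, so volunteers can space the lists however they like; only "Subject:" lines and bare numbers carry meaning.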
This doesn't solve any catalog problems we may have, but it does
address some of the concerns raised by Lynne. And, the only change
needed on our side would be a link to the Wikipedia article on each
book (something that could be implemented piecemeal as someone makes a
Wikipedia article available).

Thoughts?

Josh

----- Original Message -----
From: Lynne Anne Rhodes
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] [BP] The Future of eBooks
Date: Tue, 9 Nov 2004 19:11:28 -0700

> I'm new around here so please forgive me if I go over old ground.
>
> I have subscribed to the RSS Recently Posted or Updated feeds and it is truly
> amazing to see the way the entries roll in every night. However, it is
> frustrating to see when one of the entries is opened up there is little
> information apart from the author (with or without dates) and a title. In
> most cases I have no idea what the book is about and whether I am interested
> in it.
>
> I, and I am sure many others, would love to see a bit more detail such as the
> original date of publication and a brief synopsis of the work. Obviously to
> enter such information day after day with such a rush of material is far
> beyond the resources of a small group of volunteers, however dedicated.
>
> Would it not be possible to devise a distributed cataloguing system following
> along the model of DP? For each book "in the frame" a form would be provided
> with spaces for the required items. When these were completed (and checked)
> the data would then be transferred, in an agreed format--MARC or
> otherwise--to a file held within the book's directory tree. In many cases
> this information is provided at the time of proofreading and then it seems to
> be lost.
>
> Obviously some of the information might be easy to complete, such as book or
> serial. However other fields might need research, such as key dates, author
> bio etc.
Also a meaningful synopsis would mean most likely reading the text > or abstracting a portion from another work. I could also see that > multilingual versions might be needed. I would think there are many who would > rise to the challenge of helping in such an endevour, > > Lynne > > > > On Tuesday 09 November 2004 12:26 pm, Alev Akman wrote: > > At 11:06 AM 11/9/2004, you wrote: > > >Andrew Sly wrote: > > >>Taken all together, the PG online catalog does present plently > > >>of information that can help people interact with the collection > > >>in meaningful ways; but it may make professional librarians > > >>roll their eyes. > > > > > >The design philosophy of the catalog database is: > > > > > > To help people find a book they may want to read. > > > > > >That includes both, people who already know which book they want and > > >people who want a suggestion. > > > > > >The catalog database was not designed to be a tool for professionals. But > > >this doesn't mean that I'm not willing to add some functions to help them > > >out, so long as those functions don't get in the way of the primary > > >functionality. > > > > > >Producing MARC records out of existing catalog entries seems to be a > > >pretty forward thing. > > > > Obviously it is not an _easy_ pretty forward thing! Otherwise, the whole > > thing would be in place by now. > > > > On the other hand, PG database may not be capable of the Z39.50 imports but > > there are many MANY (if not all!) library cataloging software packages that > > will do it in a short time. The advantage of importing from the existing > > catalog entries is that we have our pick of what fits our needs for > > especially the subject fields. Of course there is always work to edit and > > customize them for the PG user database. > > > > I don't see why we can't have a commercial software to do most of the work > > and keep the existing catalog as a backup. 
> > > > And for the record, I have been involved in the PG cataloging effort for > > more than six years and anyone who says I am not interested in it any more > > is clearly not aware of the full facts. It may be quite disappointing when > > one's years of volunteer efforts have been deleted with the "new > > improvements"! > > > > Alev. > > an "official" librarian > > > > > Importing other people's MARC into our database will be much hairier. > > > > > > > > > > > >-- > > >Marcello Perathoner > > >webmaster@gutenberg.org > > > > > >_______________________________________________ > > >gutvol-d mailing list > > >gutvol-d@lists.pglaf.org > > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > > > > > > > > >--- > > >Incoming mail is certified Virus Free. > > >Checked by AVG anti-virus system (http://www.grisoft.com). > > >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Wed Nov 10 05:42:26 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 05:42:30 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> References: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> Message-ID: <41921AC2.20107@perathoner.de> Joshua Hutchinson wrote: > This sounds an awful lot like a Wikipedia entry. I think a loose > partnership between us and wikipedia would be useful here. We have > the book and a link could be made from the catalog page to that books > entry in Wikipedia. We already have many links into wikipedia from the author pages. I could implement that functionality for the bibrec pages too. Still somebody has to enter all the links... 
-- Marcello Perathoner webmaster@gutenberg.org From M.J.Farmer at bham.ac.uk Wed Nov 10 05:48:22 2004 From: M.J.Farmer at bham.ac.uk (Malcolm Farmer) Date: Wed Nov 10 06:20:46 2004 Subject: [gutvol-d] Re: catalog data In-Reply-To: <4192045C.4070908@perathoner.de> References: <20041109230738.8CB1F4BE64@ws1-1.us4.outblaze.com> <4191EE8C.7030109@perathoner.de> <4191F538.5030201@bham.ac.uk> <4192045C.4070908@perathoner.de> Message-ID: <41921C26.9090100@bham.ac.uk> Marcello Perathoner wrote: > Malcolm Farmer wrote: > >>> You could add a subject "Science Fiction" or "Cooking" entry to all >>> those books. >> >> >> Is there a simple process for doing this? > > > First you have to agree with Andrew on the subject headings you want > to tackle. Then you can build an ASCII-list like this: > > > Subject: Cooking > 1234 > 2345 > 3456 > [snip] > The numbers are the etext numbers. I will then import that data into > the database. That's the easiest way to get a *lot* of data into the > catalog. Oh, right then. It really *is* simple. In that case I'd be happy to volunteer to look up the numbers for the historical fiction texts listed in the bibliography I mentioned. That won't cover every book in this category (post-1900 works, for example, will be missing), but it should considerably expand that category's listing. From sly at victoria.tc.ca Wed Nov 10 07:14:54 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Nov 10 07:14:59 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> References: <20041110132548.EC56D9E7E7@ws6-2.us4.outblaze.com> Message-ID: On Wed, 10 Nov 2004, Joshua Hutchinson wrote: > This sounds an awful lot like a Wikipedia entry. I think a loose partnership between us and wikipedia would be useful here. We have the book and a link could be made from the catalog page to that books entry in Wikipedia. 
> Then, we have a place to put all sorts of information about the book,
> the author, where it was published, the historical context it was
> conceived in, ... just about anything someone wants to add.

Yes, I think that in some ways Project Gutenberg and Wikipedia can
complement each other very well. As I see it, PG is about preserving
the original content of the printed material, and Wikipedia appears to
be an ideal place for all that extra information that we may have.

As someone (I believe Carlo) has mentioned, very often the people
involved in the scanning and digitizing of texts have more knowledge
about the author, the text itself, etc., which could be passed on to
either the PG online catalog or Wikipedia, as appropriate.

In the last few months, I have added countless links between Wikipedia
and the PG online catalog, sometimes creating new Wikipedia articles
for authors I think worthy of mention. However, it's still only a small
portion of what could be done. Anyone else interested?

Andrew

From Jeroen.Hellingman at kabelfoon.nl  Wed Nov 10 08:11:52 2004
From: Jeroen.Hellingman at kabelfoon.nl (Jeroen.Hellingman@kabelfoon.nl)
Date: Wed Nov 10 08:11:59 2004
Subject: [gutvol-d] draft TEI conventions and larger example file
Message-ID: <20041110161152.95CC555639@betazoid.kabelfoon.nl>

On 10-11-2004 04:46, you wrote:

> writes:
>
> > I use foreign exclusively as a holder for the lang attribute.
>
> My problem with that is that it means there's no way to transform a
> document such that the foreign words are marked differently from
> the emphasized words, or not marked at all.

Well, the issue of course is that you first have to tag foreign words
as foreign, and emphasized words as emphasized; now what if a foreign
word is emphasized? The original typography will not always allow you
to distinguish those cases. That is the core reason for me using
foreign purely as a holder for the lang attribute rather than as
something semantic: I don't always know the intended semantics.
However, it is fairly easy to, for example, color all German words in a text green, based on the value of the lang attribute, or to print in roman all italic words in a certain language. Jeroen From tb at baechler.net Wed Nov 10 08:27:22 2004 From: tb at baechler.net (Tony Baechler) Date: Wed Nov 10 08:26:13 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <4191152C.9080702@perathoner.de> References: Message-ID: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> At 08:06 PM 11/9/2004 +0100, you wrote: >The design philosophy of the catalog database is: > > To help people find a book they may want to read. > >That includes both, people who already know which book they want and >people who want a suggestion. Hello list. Sorry if I seem to be complaining, but I must say that I find the current PG catalog to be mostly useless. I should qualify that. I can easily search through GUTINDEX.ALL to find a certain title or author. I've found that grep works great for that. However, there are no clues anywhere that tell me what a book is about, whether it's mystery, drama, nonfiction or something else, or even a basic subject classification. I admit that some of this might be found by using the search form or the gutenberg.org/etext1234 url, but from the standpoint of a user who is in a hurry and just wants something to read it's still inconvenient. Let's pick a random example of something which has been recently discussed. http://gutenberg.org/etext/1473 First, the link for in-depth information takes you to the volunteer pages. This is misleading since it looks like I would be able to find more information on the book. More than once I have followed that link only to find myself in the wrong place and I had to go back in my browser. Second, let's look at the subject. All it says is "fiction." OK, but about what? What category of fiction? 
While bookshare.org has a catalog not designed for professionals either, most books have a synopsis and are sorted by category. I have a possible suggestion for solving part of this. Put something in the newsletter asking people who read PG etexts to write summaries of them and categorize them. Somehow create a form which only allows books to be reviewed or summarized, maybe like a wiki but more confined. Someone would still manually approve the summary ("good" isn't helpful) and add it to the catalog. That would at least give the end user some idea of what a book is about first. Just for clarity, I would suggest that this summary, synopsis, categorization etc. would show up on the etext/1234 page and be added to the rdf feed but not appear in GUTINDEX.ALL. From marcello at perathoner.de Wed Nov 10 08:44:37 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 08:44:44 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> Message-ID: <41924575.2030708@perathoner.de> Tony Baechler wrote: > Second, let's look at the subject. All it says is "fiction." > OK, but about what? What category of fiction? While bookshare.org has > a catalog not designed for professionals either, most books have a > synopsis and are sorted by category. Everybody is complaining about the missing subject information. Complaining won't help. Stepping up and volunteering to enter the data would help. 
-- 
Marcello Perathoner
webmaster@gutenberg.org

From aakman at csufresno.edu  Wed Nov 10 08:59:20 2004
From: aakman at csufresno.edu (Alev Akman)
Date: Wed Nov 10 08:59:25 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <41924575.2030708@perathoner.de>
References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de>
Message-ID: <6.1.2.0.2.20041110085132.08917318@zimmer.csufresno.edu>

At 08:44 AM 11/10/2004, you wrote:
>Tony Baechler wrote:
>
>>Second, let's look at the subject. All it says is "fiction."
>>OK, but about what? What category of fiction? While bookshare.org has a
>>catalog not designed for professionals either, most books have a synopsis
>>and are sorted by category.
>
>Everybody is complaining about the missing subject information.
>
>Complaining won't help. Stepping up and volunteering to enter the data
>would help.

Maybe if the computer people stuck to "computering" and listened to how
the library world does it? After all, the library systems and
conventions have been in place for a while.

And, Marcello, my dear, don't give me that line about not having been
on the list since 3/18. Just because I don't believe in the diarrhea of
the mouth like some people we know : ) does not mean I do not care!

It would be good if the people who know the technical side would listen
to library requirements (whether _they_ think MARC records are needed,
or not!) once in a while. Otherwise, PG will be sentenced to being a
whoever, whatever kind of project.

Alev.

>-- 
>Marcello Perathoner
>webmaster@gutenberg.org
>
>_______________________________________________
>gutvol-d mailing list
>gutvol-d@lists.pglaf.org
>http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
>---
>Incoming mail is certified Virus Free.
>Checked by AVG anti-virus system (http://www.grisoft.com).
>Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004

From joshua at hutchinson.net  Wed Nov 10 09:09:52 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Wed Nov 10 09:09:57 2004
Subject: [gutvol-d] [BP] The Future of eBooks
Message-ID: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com>

----- Original Message -----
From: Alev Akman
>
> Maybe if the computer people stuck to "computering" and listened to how the
> library world does it? After all, the library systems and conventions have
> been in place for a while.

Great! Answer my earlier question then. What fields should be mandatory
for us and which fields should be optional?

i.e.,

Author, title, Original publisher = mandatory.

Optional? Author birth/death dates? Which printing of the original
source we derived from? Others?

I'm not a librarian. I need someone knowledgeable to answer these
questions.

Josh

PS If we define good teiHeader information for each work, it becomes a
much simpler task for Marcello's cataloging scripts to find all sorts
of fun information for the reader.

From sly at victoria.tc.ca  Wed Nov 10 09:24:14 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov 10 09:24:21 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com>
References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com>
Message-ID: 

Something very similar to this has been attempted before, with rather
dismal results. Hardly anyone seemed interested in writing a little
synopsis (or "blurb").

On a few records in the online catalog, you will see a link labeled
"Reviews" which contains these. Many of them are actually only brief
excerpts from the text in question.

Andrew

On Wed, 10 Nov 2004, Tony Baechler wrote:

> I have a possible suggestion for solving part of this.
> Put something in the newsletter asking people who read PG etexts to
> write summaries of them and categorize them. Somehow create a form
> which only allows books to be reviewed or summarized, maybe like a
> wiki but more confined. Someone would still manually approve the
> summary ("good" isn't helpful) and add it to the catalog. That would
> at least give the end user some idea of what a book is about first.
> Just for clarity, I would suggest that this summary, synopsis,
> categorization etc. would show up on the etext/1234 page and be added
> to the rdf feed but not appear in GUTINDEX.ALL.

From sly at victoria.tc.ca  Wed Nov 10 09:32:23 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Wed Nov 10 09:32:30 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <41924575.2030708@perathoner.de>
References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de>
Message-ID: 

On Wed, 10 Nov 2004, Marcello Perathoner wrote:

> Everybody is complaining about the missing subject information.
>
> Complaining won't help. Stepping up and volunteering to enter the data
> would help.

I don't believe we are ready. There is right now no agreement about
what form this data would take, or what standard to try to comply with.
If various volunteers all get to enter their own idea of what
categories and subject headings appeal to them, we will end up with a
mish-mash of conflicting and overlapping data.

I am no expert here, but I have read enough to know that doing subject
cataloging _well_ is more involved than most people realise.
Andrew

From brad at chenla.org  Wed Nov 10 09:56:33 2004
From: brad at chenla.org (Brad Collins)
Date: Wed Nov 10 09:58:29 2004
Subject: [gutvol-d] [BP] The Future of eBooks
In-Reply-To: <6EBFB319.08D194C6.023039A8@aol.com> (Bowerbird@aol.com's message of "Wed, 10 Nov 2004 02:42:11 -0500")
References: <6EBFB319.08D194C6.023039A8@aol.com>
Message-ID: 

Bowerbird@aol.com writes:

> david starner said:
> the catalog just ain't gonna help someone
> know what kind of book they'd like to read.
> (and a marc record won't help them either.)
>
> that's a job that collaborative filtering
> will eventually do much better than anything
> you can do in the form of a catalog of any type.

But this won't be of any help to brick-and-mortar libraries who want to
integrate PG etexts into their existing catalogs. MARC is the best way
to accomplish this. This would also let PG offer a Z39.50 gateway to
the catalog, which would be very cool.

I like the distributed cataloging idea, but it's not the same as DP or
Wikipedia, which are brilliant at making it as simple and easy to
contribute as possible. Cataloging is not simple and it's not easy, and
if it's not correct and consistent it will result in a mess which will
do more harm than good.

That said, there are a number of steps in the process that can't be
easily automated but can be done in a distributed environment by people
fairly easily; these steps have to be identified, and then a mechanism
provided for people to contribute.

The catalog as it stands represents a lot more effort than a lot of
people realize. I hope people keep that in mind when they slam the
existing catalog.
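For a feel of what the MARC suggestion involves at the data level, here is a sketch of a record as MARC-style tagged fields. The tags follow common MARC 21 usage (100 personal name, 245 title, 856 electronic location), but this is a readable mock-up, not the ISO 2709 serialization a real Z39.50 gateway would exchange.

```python
# Sketch: a PG etext as MARC-style tagged fields. Tags 100, 245 and
# 856 are the usual MARC 21 fields for personal name, title and
# electronic location; subfield codes are shown in "$x value" form.
# This is a mock-up for readability, not a conforming binary record.
def format_field(tag, subfields):
    """Render one field as: TAG $a value $b value ..."""
    return tag + " " + " ".join(
        "$%s %s" % (code, value) for code, value in subfields)

record = [
    ("100", [("a", "Lonnrot, Elias")]),
    ("245", [("a", "The Kalevala")]),
    ("856", [("u", "http://www.gutenberg.org/etext/7000")]),
]

for tag, subs in record:
    print(format_field(tag, subs))
```

Even this toy form shows why correctness matters: a library system will index on these field numbers, so an inconsistent record pollutes the importing catalog.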
b/ -- Brad Collins , Bangkok, Thailand From Bowerbird at aol.com Wed Nov 10 10:10:16 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 10:10:25 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <128.4f68b436.2ec3b388@aol.com> i offered, many months ago, to shepherd a system that would provide a wiki for every e-text, which would contain all the info about that e-text (in a part of the wiki that was "locked") such as its change-log, and also allow users to write summaries and comments, make error-reports, and hold discussions. nobody provided the disk-space i would've needed to make it work. steve sakoman built a system of his own to do something similar. nobody provided him with the support he needed to make it work. now it appears that the gutvol-d merry-go-round has once again come to this spot as it runs around in circles. at this time, though, i'm heading out of here, because nothing seems to get done here, other than throwing more and more inconsistent e-texts on the pile. (and i might suggest, humbly, that a pause in _that_ machinery to rethink and plan might be a very good idea at this point in time.) all the good suggestions being made now have been made before, but the process for implementing them seems to be badly broken. if you want to fix _anything_, you will need to fix _that_ first... a wiki for each e-text is still a good idea, but someone else will have to step up and take responsibility for it.
-bowerbird From gbnewby at pglaf.org Wed Nov 10 10:28:28 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 10:28:30 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> References: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> Message-ID: <20041110182828.GA27968@pglaf.org> On Wed, Nov 10, 2004 at 12:09:52PM -0500, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: Alev Akman > > > > Maybe if the computer people stuck to "computering" and listened to how the > > library world does it? After all, the library systems and conventions have > > been in place for a while. > > > Great! Answer my earlier question then. What fields should be mandatory for us and which fields should be optional? (I don't know) > ie, > > Author, title, Original publisher = mandatory. Quick note: "Author" isn't the only term. The categories are: Author Annotator Commentator Compiler Editor Illustrator Translator Unknown role These are the fields we use for copyright clearances & the online catalog. I think they match the MARC format too. They're used somewhat unevenly in the current eBook metadata field. -- Greg > Optional? Author birth/death dates? Which printing of the original source we derived from? Others? > > I'm not a librarian. I need someone knowledgeable to answer these questions. > > Josh > > PS If we define good teiHeader information for each work, it becomes a much simpler task for Marcello's cataloging scripts to find all sorts of fun information for the reader. Yes, this is the intent. Although the details are a little elusive right now, I think that including the authoritative catalog information in the XML file makes a lot of sense. The cataloging scripts are already ready for this. -- Greg From shalesller at writeme.com Wed Nov 10 10:28:53 2004 From: shalesller at writeme.com (D.
Starner) Date: Wed Nov 10 10:29:04 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Andrew Sly writes: > Something very similar to this has been attempted before, with > rather dismal results. Hardly anyone seemed interested in writing > a little synopsis (or "blurb") What do you mean "has been attempted before"? If you mean the newsletter, you've got less than a week to write it up, and last time I submitted something to the newsletter, it got dropped into the void. If you're interested in synopses, then express that interest and say where we can direct the results, and I'm sure people will respond. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From Bowerbird at aol.com Wed Nov 10 10:32:56 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 10:33:08 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <9e.191bcb1d.2ec3b8d8@aol.com> brad said: > But this won't be of any help to brick and mortar libraries > who want to integrate PG etexts into their existing catalogs. why is this a priority of any kind? but perhaps i don't understand. just precisely what would it mean for a "brick and mortar library" to "integrate" this e-library into their catalog? that if i walk into the place and go to the catalog to look for a book, it will tell me that it's available online? d'uh, next time i'll stay home, and search google. why should a brick-and-mortar library want this? i thought the goal here was to create a global library, one that is available 24/7 from anyplace in the world, with millions of books that are never "unavailable" because they are "checked out" or "mis-shelved" or "awaiting reshelving" or "going through re-binding" or because "this branch has never had a copy of that book, sorry, you'll have to go to the main library downtown." am i the one who's not seeing things clearly?
or are you? > The catalog as it stands represents a lot more effort > than a lot of people realize. I hope people keep that in mind > when they slam the existing catalog. i agree. conversely, there's something perverse about giving people "credit" for working in a way that is clearly not efficient or productive... i could spend hours and hours and hours making a flyer, for instance, telling people how wonderful project gutenberg is, a flyer that would produce little effect out in the world. would you pat me on the back? or would you suggest instead that there is a better use for my energy? i humbly and respectfully suggest there is a better use for your energy. -bowerbird From gbnewby at pglaf.org Wed Nov 10 10:36:10 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 10:36:11 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <41924575.2030708@perathoner.de> References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de> Message-ID: <20041110183610.GB27968@pglaf.org> On Wed, Nov 10, 2004 at 05:44:37PM +0100, Marcello Perathoner wrote: > Tony Baechler wrote: > > >Second, let's look at the subject. All it says is "fiction." > >OK, but about what? What category of fiction? While bookshare.org has > >a catalog not designed for professionals either, most books have a > >synopsis and are sorted by category. > > Everybody is complaining about the missing subject information. > > Complaining won't help. Stepping up and volunteering to enter the data > would help. It's a little more complicated than that. I'll send a few messages more about this in a few minutes. The basic story is that the FIRST approach to cataloging our stuff will be "copy" cataloging. This includes adding subject terms, as well as regularizing the titles, authors and other data. This involves finding an existing catalog record in MARC format via OCLC or similar resources. 
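For anyone unfamiliar with how such existing records are actually retrieved, a Z39.50 lookup against the Library of Congress with the yaz-client tool (which Karl Eichwalder points to later in this digest) might look roughly like the session below. The endpoint, database name, and query are illustrative examples from this era, not guaranteed to be current:

```
$ yaz-client
Z> open tcp:z3950.loc.gov:7090/voyager
Z> format usmarc
Z> find @attr 1=4 "lady of the lake"
Z> show 1
Z> quit
```

Here "@attr 1=4" selects a title search in the Bib-1 attribute set, and "format usmarc" asks the server to return MARC 21 records, which can then be saved and edited locally before being shipped back into the PG catalog.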
Alev thinks this is possible for the majority of our works, even the very obscure ones and non-US items. The SECOND approach will be original cataloging, to create a record from scratch (or based on existing info like author records). This is something we'd like to do only when necessary. In either case, adding a new record requires looking at consistency with other records and other uses of the subject information, because these things tend to change over time. My view is that we will be able to get a corps of "distributed catalogers" to work on the first approach, though just as with distributed proofreaders, there will probably be different levels at which people feel comfortable/confident/competent in creating or changing records. ** I'll send some further info about how this could get underway. ** At some point soon, though, let's move this to the "gutcat" ** list. http://lists.pglaf.org to join -- Greg From ke at suse.de Wed Nov 10 04:00:28 2004 From: ke at suse.de (Karl Eichwalder) Date: Wed Nov 10 10:37:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <4191ED7A.3020203@perathoner.de> (Marcello Perathoner's message of "Wed, 10 Nov 2004 11:29:14 +0100") References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <4191ED7A.3020203@perathoner.de> Message-ID: Marcello Perathoner writes: > To request a MARC record I have to implement an obscure Z39.50 > protocol. You can use yaz-client as it comes with the YAZ toolkit (http://www.indexdata.dk/yaz/). Index Data also offers a database system: http://www.indexdata.dk/zebra/ (GPL). -- Key fingerprint = B2A3 AF2F CFC8 40B1 67EA 475A 5903 A21B 06EB 882E From shalesller at writeme.com Wed Nov 10 10:49:25 2004 From: shalesller at writeme.com (D. 
Starner) Date: Wed Nov 10 10:49:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110184925.013A24BE65@ws1-1.us4.outblaze.com> "Joshua Hutchinson" writes: > This sounds an awful lot like a Wikipedia entry. I > think a loose partnership between us and wikipedia > would be useful here. We have the book and a link > could be made from the catalog page to that books > entry in Wikipedia. Then, we have a place to put all > sorts of information about the book, the author, > where it was published, the historical context it was > conceived in, ... just about anything someone wants to add. But does every single book in the catalog deserve a Wikipedia entry? And a lot of details wanted are about the specific edition, when it was published and what not, that would never fit for a Wikipedia entry. That's a lot of short entries we'd be adding to Wikipedia. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From shalesller at writeme.com Wed Nov 10 11:02:11 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 11:02:18 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110190211.BC8A14BE64@ws1-1.us4.outblaze.com> Brad Collins writes: > Cataloging is not simple and it's not easy, and if it's not correct > and consistent it will result in a mess which will do more harm than > good. I've heard that about producing an ebook and about producing an operating system. I don't buy it. An incomplete list of science fiction books still helps. The fact that some of the computer books in the library are sorted under 510 and some under 000, like in the professionally catalogued Oklahoma State Library does not cause the roof to fall in. The worst thing a catalog can do is force you to try and handle things without it, which is what you'd be forced to do if you had no catalog. 
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From sly at victoria.tc.ca Wed Nov 10 11:02:29 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Wed Nov 10 11:02:36 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> References: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Message-ID: On Wed, 10 Nov 2004, D. Starner wrote: > Andrew Sly writes: > > > Something very similar to this has been attempted before, with > > rather dismal results. Hardly anyone seemed interested in writing > > a little synopsis (or "blurb") > > What do you mean "has been attempted before"? If you mean the newsletter, > you've got less than a week to write it up, and last time I submitted > something to the newsletter, it got dropped into the void. If you're > interested in synopses, then express that interest and where we can > direct the result to, and I'm sure people will respond. I mean that this has been tried before. (admittedly, it was a while ago, late in 2000) And, after the initial contributions, the interest died out. If enough people would like to contribute a brief synopsis for texts in the collection, we already have a place in the catalog where they can go. (although I don't know about the mechanics behind it) When I tried to make a few of these myself, I found that writing a good brief synopsis of a novel was harder than I would have thought. Andrew From gbnewby at pglaf.org Wed Nov 10 11:04:24 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 11:04:25 2004 Subject: [gutvol-d] MARC to the catalog Message-ID: <20041110190424.GA29073@pglaf.org> Here's some information to try to get subject cataloging moving forward. As you've seen, Alev (who cataloged our first 3500 or so books) has stepped up to try to help shape this project.
Andrew Sly has also stepped up, and has already been doing a lot of editing of existing catalog data. (I'm sending this to gutvol-d, but hope we can soon take this conversation to gutcat@lists.pglaf.org. Visit http://lists.pglaf.org to subscribe) One of our goals is to get proper subject headings into the Project Gutenberg catalog. ("Proper" means that they come from the Library of Congress Subject Headings corpus or similar authoritative source, and were generated by librarians or similarly clued-in people.) Currently, less than 1/4 of the Project Gutenberg collection has subject headings. Furthermore, the names we use for authors and titles are not always consistent. There are other limitations with the current catalog data, too. This message is partially to let people know how I think we'd like to start, and partially to ask Marcello and others (like Steve Thomas) to look at what it will take to move things forward. The basic scenario is that the easiest way to get authoritative catalog data (including subject headings) for our holdings is to find existing library catalog entries. There are some great resources and software for doing this, and a data interchange format called MARC. MARC stands for Machine Readable Cataloging, and it actually has a few variations. Essentially, it's delimited fields with data about an item: author, title, etc. (Many, many fields and subfields - most of which are not needed for a particular item.) ** What I'd like to enable is import of MARC records to the catalog, to update, augment or replace existing catalog entry for a particular item. This is harder than it might sound, for a variety of reasons. I'm appending a few MARC records from some recently released PG titles that Alev was able to find (yes, she found existing catalog records for these items, even though some are obscure and non-English). I don't want to over-specify how I think the workflow should happen. I think that's still to be determined. 
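As a concrete picture of the "delimited fields" structure described above, here is a toy encoder/decoder for the MARC transmission format (ISO 2709): a 24-byte leader, a directory of 12-byte entries, then the data fields themselves. This is a simplified sketch for illustration only (the leader uses placeholder status values, and real work would use an established MARC library, not hand-rolled code like this):

```python
FT, ST, RT = "\x1e", "\x1f", "\x1d"  # field, subfield and record terminators

def encode(fields):
    """fields: list of (tag, content) pairs; content uses ST before subfield codes."""
    data = "".join(content + FT for _, content in fields)
    directory, pos = "", 0
    for tag, content in fields:
        length = len(content) + 1              # +1 for the field terminator
        directory += f"{tag}{length:04d}{pos:05d}"
        pos += length
    base = 24 + len(directory) + 1             # leader + directory + terminator
    record_len = base + len(data) + 1          # + record terminator
    # Leader: record length, placeholder status/type codes, base address, '4500' map.
    leader = f"{record_len:05d}nam  22{base:05d}   4500"
    return leader + directory + FT + data + RT

def decode(record):
    base = int(record[12:17])                  # base address of data, from the leader
    directory = record[24:base - 1]            # 12-byte entries: tag, length, start
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields
```

The point is only that MARC's physical layout is mechanical: once a record is parsed into (tag, content) pairs, comparing or merging it with an existing PG catalog entry is ordinary data processing.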
But the overall flow needs to be somewhat circular: librarians need to import existing PG catalog records, preferably in MARC format, to existing software. (Alev has a couple of programs for this; PGLAF can probably acquire software for other folks who'd like to work a lot on this activity.) Then, updated records would need to be shipped back into the catalog. Below are some MARC records, also the listing of info in the PG catalog and our clearance records (which are incomplete - though we usually do have the page scans sent for clearance) -- Greg One format: Title Author Date Publisher Note Punch, or, The London Charivari. Punch (London, England) 1841 Published for the proprietors by R. Bryant "No. 1. For the week ending July 17, 1941. Price Threepence."--At head of cover. Lippincott's magazine of popular literature and science (none) 1871; 1871-1880 J.B. Lippincott and Co. Edited in Philadelphia for the first seventeen years by John Foster Kirk, Lippincott's Magazine published many notable English and American writers including Henry James, Oscar Wilde, Amelie Rive, Conan Doyle, and Rudyard Kipling. In addition to long and short fiction, there was much literary criticism and many book reviews and illustrated travel articles. Although the contents were of high quality, competition with popular New York magazines eventually caused Lippincott's to be sold in 1914 to McBride, Nast and Company who moved it to New York and changed the name to McBride's Magazine. After a short time, however, it was merged with Scribner's; Title from caption.; Microfilm. The Lady of the Lake Scott, Walter; Rolfe, W. J. 1922 Houghton Mifflin company (none) The authoritative life of General William Booth Railton, George Scott 1912; c1912 Hodder & Stoughton, George H. Doran company (none) Camp and trail Hornibrook, Isabel Katherine 1897 Lothrop publishing company (none) The Outdoor Girls in army service, or, Doing their bit for the soldier boys Hope, Laura Lee. 
1918; c1918 Grosset & Dunlap (none) Grace Harlowe's second year at Overton College Flower, Jessie Graham. 1914; c1914 Henry Altemus (none) Les trois mousquetaires Dumas, Alexandre; Le Courrier des ?tats-Unis 1846 P. Gaillardet At head of title: Semaine litt?raire du Courrier des ?tats-Unis. George Borrow Thomas, Edward. 1912 Chapman & Hall, ltd. "Bibliography of George Borrow": p.[323]-333. Another: 00796nam 2200217 a 45M0001001300000003000400013005001700017008004100034040001300075043001200088050002000100130002800120245003700148246002100185260006500206300003300271500008500304650004000389650003900429852011000468 NYUb11968217 NYU 19990310183125.0 990310s1841 enka 000 0 eng d aNNUcNNU ae-uk-en 4aAP101b.P8 1841 0 aPunch (London, England) 10aPunch, or, The London Charivari. 30aLondon Charivari aLondon :bPublished for the proprietors by R. Bryant,c1841. a14, [2] p. :bill. ;c30 cm. a"No. 1. For the week ending July 17, 1941. Price Threepence."--At head of cover. 0aEnglish wit and humorvPeriodicals. 0aPopular literaturezGreat Britain. aNNUbNYUbBobstbSpecColhAP101i.P8 1841712081221mNon-circulatingpN10964924t1yAvailable3no.15no.1 02256cas 2200373 a 45M0001001300000003000400013005001700017007001400034008004100048010001600089035002600105035001800131040002700149042000800176090005100184245007400235246002600309260005700335300002800392310001200420362004900432500069900481500002401180533015201204760004201356776006001398780006301458785002601521830005401547866007901601950001001680998010101690852009101791 NYUb10726168 NYU 19940713181853.0 hduafu---buca 890810d18711880miuuu p a 0uuua0eng d asn 85060910 a(CStRLIN)NYUG89-S4496 aGLIS007261686 aOAkUcOAkUdNdMHdNNU alcd i06/03/93 Th10/26/92 Th09/19/89 Th08/10/89 T 00aLippincott's magazine of popular literature and scienceh[microform]. 14aLippincott's magazine aPhiladelphia :bJ.B. Lippincott and Co.,c1871-1880. a20 v. :bill. ;c28 cm. aMonthly 0 aVol. 7, no. 1 (Jan. 1871)-v. 26 (Dec. 1880). 
aEdited in Philadelphia for the first seventeen years by John Foster Kirk, Lippincott's Magazine published many notable English and American writers including Henry James, Oscar Wilde, Amelie Rive, Conan Doyle, and Rudyard Kipling. In addition to long and short fiction, there was much literary criticism and many book reviews and illustrated travel articles. Although the contents were of high quality, competition with popular New York magazines eventually caused Lippincott's to be sold in 1914 to McBride, Nast and Company who moved it to New York and changed the name to McBride's Magazine. After a short time, however, it was merged with Scribner'szCf. American periodicals, 1741-1900. aTitle from caption. aMicrofilm.bAnn Arbor, Mich. :cXerox University Microfilms,d1972.e8 microfilm reels ; 4 in., 35 mm.f(American periodicals, 1850-1900 ; 317-324) 0 tAmerican periodical series, 1850-1900 1 tLippincott's magazine of popular literature and science 00tLippincott's magazine of literature, science and education 00tLippincott's magazine 0aAmerican periodical series, 1850-1900 ;v317-324. lBobst Microform dFilm 277 APS III R317-322e8908f0g5hj7-26k1871-1880 lBMICR a06/03/93tcs9110nNNUwDCLCSF8999097Sd08/10/89cMJDbSKHi930603h921026h890919h890810lNYUG aNNUbNYUbBobstbMicroform711635364mNon-circulatingpN10396809yAvailable5N10396809 00953cam 22002531 4500001000800000005001700008008004100025035002100066906004500087010001700132035001900149040001800168050002100186100003700207245009700244250002200341260006300363300005600426490004300482650005200525700005200577985002100629991004900650 9668953 20031210181225.0 830715s1922 msuab 000 0 eng 9(DLC) 25005333 a7bcbccoclcrplduencipf19gy-gencatlg a 25005333 a(OCoLC)9706316 aDLCcMsJdDLC 00aPR5308b.A1 1922 1 aScott, Walter,cSir,d1771-1832. 14aThe Lady of the Lake,cby Sir Walter Scott, Bart.; edited with notes by William J. Rolfe ... aRev. and enl. ed. aBoston,aNew York [etc.]bHoughton Mifflin companyc[1922] axvi, 272, [2] p. incl. 
front., illus., map.c17 cm. 0 a[Riverside literature series,vno. 53] 0aLady of the Lake (Legendary character)vPoetry. 1 aRolfe, W. J.q(William James),d1827-1910,eed. eOCLC REPLACEMENT bc-GenCollhPR5308i.A1 1922tCopy 1wOCLCREP 00614nam 2200157I 4500001000800000008004100008010001300049035001500062050001800077100003900095245014900134260006800283300005300351600003200404610002000436 1828178 830316s1912 nyucf 00010beng a13000924 a0313-23760 0 aBX9743.B7bR3 1 aRailton, George Scott,d1849-1913. 04aThe authoritative life of General William Booth,bfounder of the Salvation army,cby G. S. Railton ... with a preface by General Bramwell Booth. aNew York,bHodder & Stoughton, George H. Doran companyc[c1912] a7 p. l., 331 p.bfront., ports., facsim.c20 cm. 10aBooth, William,d1829-1912. 20aSalvation Army. 00595nam 2200181u 4500001000800000005001700008008004100025035002100066906004500087010001700132040001900149050001600168100006000184245002100244260004800265300004600313991005400359 5859494 00000000000000.0 810904s1897 mauf j 000 0 eng 9(DLC) 04016828 a0bcbccpremunvduencipf19gy-gencatlg a 04016828 aDLCcCarPdDLC 00aPZ7.H784bC 1 aHornibrook, Isabel Katherine,d1859- [from old catalog] 10aCamp and trail; aBoston,bLothrop publishing companyc[1897] a2 p.bl., 5-305 p. front. plates.c20 cm. bc-GenCollhPZ7.H784iCp00024749368tCopy 1wPREM 00478nam 2200145Ia 4500001000900000005001700009008004100026040002300067090002200090100002100112245010200133260004200235300002900277490002600306 10004676 19880111095007.0 831012s1918 nyua j 00011 eng d aNGUcNGUdm/cdBGU aPS3515.O585bO846 10aHope, Laura Lee. 14aThe Outdoor Girls in army service, or, Doing their bit for the soldier boys /cby Laura Lee Hope. 0 aNew York :bGrosset & Dunlap,cc1918. a212 p. :bill. ;c20 cm. 0 aOutdoor girls series. 
00497nam 2200157Ii 4500001000800000005001700008008004100025040001800066090002100084100002700105245007900132260004300211300002900254490003000283830002600313 2810423 19880329145958.0 770317s1914 xx j 00011 eng d aMNLcMNLdBGU aPS3511.L78bG758 10aFlower, Jessie Graham. 10aGrace Harlowe's second year at Overton College /cby Jessie Graham Flower. 0 aPhiladelphia :bHenry Altemus,cc1914. a248 p. :bill. ;c19 cm. 1 aThe college girls series. 0aCollege girls series. 00765nam 22002291 4500001000800000005001700008008004100025035002100066906004500087010001700132035002000149040001900169050002100188100003400209245005100243260003700294300001900331500007100350700004400421985002100465991004900486 9603832 19980421190136.0 850703s1846 nyu 000 1 fre 9(DLC) 03029683 a7bcbccoclcrplduencipf19gy-gencatlg a 03029683 a(OCoLC)12231807 aDLCcNBuUdDLC 00aPQ2228b.A1 1846 1 aDumas, Alexandre,d1802-1870. 14aLes trois mousquetaires,cpar Alexandre Dumas. aNew York,bP. Gaillardet,c1846. a268 p.c26 cm. aAt head of title: Semaine litt?eraire du Courrier des ?Etats-Unis. 2 aLe Courrier des ?Etats-Unis,cNew York. eOCLC REPLACEMENT bc-GenCollhPQ2228i.A1 1846tCopy 1wOCLCREP 00568nam 2200169I 4500001000800000005001700008008004100025010001300066040002400079050001400103100002000117245006200137260004200199300006900241500005000310600003800360 4929310 19880421065446.0 790504s1912 enkcfh b 00110 eng a13012350 aDLCcAMHdm.c.dm/c 0 aPR4156.T5 10aThomas, Edward. 10aGeorge Borrow,bthe man and his books,cby Edward Thomas. 0 aLondon,bChapman & Hall, ltd.,c1912. axi, 333, viii p., 1 ?.bfront., plates, ports., facsims.c23 cm. a"Bibliography of George Borrow": p.[323]-333. 10aBorrow, George Henry,d1803-1881. The above are for these entries: 1. Celsissimus (German) http://www.gutenberg.org/etext/13953 gbn0403071608: Arthur Achleitner, Celsissimus (german). user@host. 1902p. 3/21/2004. ok. (that's a cleared2.gbn clearance line) 2. 
The Pocket George Borrow http://www.gutenberg.org/etext/13957 OK 20041030020123thomas The Pocket George Borrow Edward Thomas 1912:c 3. Les trois mousquetaires (French) http://www.gutenberg.org/etext/13951 OK 20041019125907dumas Les trois mousquetaires Alexandre Dumas 1844:p 4. Grace Harlowe's Second Year at Overton College http://www.gutenberg.org/etext/6858 gbn520: Grace Harlowe's Second Year at Overton College, Jessie Graham Flower user@host. 1914c. 9/13/2002. ok. (that's a cleared.gbn , really old clearance line) 5. The Outdoor Girls in Army Service, Or, doing their bit for the soldier boys http://www.gutenberg.org/etext/7494 gbn560g: The Outdoor Girls in Army Service, Laura Lee Hope. user@host. 1918c. 9/10/2002. ok. gbn568: Laura Lee Hope, The Outdoor Girls in Army Service. user@host. 1918c. 9/13/2002. ok. (cleared twice, but looks like the same edition) 6. Camp and Trail, A Story of the Maine Woods http://www.gutenberg.org/etext/13946 OK 20040825223614hornibrook Camp and Trail Isabel Hornibrook 1897:c 7. The Authoritative Life of General William Booth http://www.gutenberg.org/etext/13958 gbn0403190519: G[eorge] S[cott] Railton, The Authoritative Life of General William Booth. user@host. 1912c. 3/23/2004. ok. 8. The Lady of the Lake http://www.gutenberg.org/etext/3011 The Lady of the Lake Walter Scott J. C. Byers 11/23/99 ok 82-83c (this cleared line is from Michael's Xeroxes) 9. Lippincott's Magazine of Popular Literature and Science, Vol. XVII. No. 101. May, 1876. http://www.gutenberg.org/etext/13956 gbn0405261819: various, Lippincott's Magazine v. 17 Jan-June 1875. user@host. 1876c. 5/26/2004. ok. OK 20040808141522various Lippincott's Magazine v. 17 Jan-Jun 1876 various 1876:c (two clearances for this, too. We often clear entire year-long or multi-year volumes for periodicals based on a single TP&V scan) 10. Punch, or the London Charivari, Vol. 152, June 27, 1917 1917 Almanack http://www.gutenberg.org/etext/13954 gbn0402060846: Various, Punch - Vol. 152.. 
user@host. 1917p. 2/6/2004. ok. And, just for fun: Title Author Date Publisher Note Editor Call Number Corporate Author Description Edition Illustrator ISBN ISSN Language LC Call Number Main Series Subject Heading Gone with the wind Mitchell, Margaret; Herman Finkelstein Collection (Library of Congress); Alfred Whital Stern Collection of Lincolniana (Library of Congress) 1936 Macmillan "Published May 1936"--Verso of t.p. Actual publication of the 1st ed. was delayed to June 30, 1936. Cf. Gone with the wind as book and film / Richard Harwell. c1983. P. [xv].; LC copy has dust jacket. Newspaper clipping from the Parade section, Oct. 31, 1976 and magazine clipping from Publishers weekly, Sept. 6, 1976 on author laid in.; Source: Gift of Herman Finkelstein, Dec. 30, 1980. (none) PS3525.I972 (none) 1037 p. 22 cm. (none) (none) (none) (none) eng (none) (none) Women From aakman at csufresno.edu Wed Nov 10 11:14:19 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 11:14:28 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> References: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110092605.02074690@zimmer.csufresno.edu> Joshua, I think the minimum mandatory fields should include: Author (including birth/death dates) Title (and/or Uniform Title) Subtitle (when it exists) Editor/Translator/Illustrator Date the book was published Physical properties of the "original print work" like number of pages, size of the book, illustrations, etc. Notes (Contents for collections, for example) Call numbers (LC/Dewey) Subjects Genre (that's where the Mystery, Historical Fiction, etc would come in) That's what I can think of now. Does the list help? Alev. At 09:09 AM 11/10/2004, you wrote: >----- Original Message ----- >From: Alev Akman > > > > Maybe if the computer people stuck to "computering" and listened to how > the > > library world does it? 
After all, the library systems and conventions have > > been in place for a while. > > >Great! Answer my earlier question then. What fields should be mandatory >for us and which fields should be optional? > >ie, > >Author, title, Original publisher = mandatory. > >Optional? Author birth/death dates? Which printing of the original >source we derived from? Others? > >I'm not a librarian. I need someone knowledgeable to answer these questions. > >Josh > >PS If we define good teiHeader information for each work, it becomes a >much simpler task for Marcello's cataloging scripts to find all sorts of >fun information for the reader. >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 -------------- next part -------------- --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From joshua at hutchinson.net Wed Nov 10 11:15:46 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 10 11:15:53 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110191546.5ED692F8F7@ws6-3.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > > a wiki for each e-text is still a good idea, but someone else will have to > step up and take responsibility for it. As with everything else around here, someone has to step up. You act like you tried before... You didn't. You basically said, someone set up the wiki, someone decide this is how we are doing it, and then you would "shepherd" it. No one is going to be able to wave a magic wand to make things happen. If no one feels strongly enough about something to take control and make it happen ... it ain't gonna happen.
Josh From shalesller at writeme.com Wed Nov 10 11:17:58 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 11:18:27 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110191758.73D9C4BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > Physical properties of the "original print work" like number of pages, size > of the book, illustrations, etc. How can this be mandatory? We've got a few composite books, that don't have a single print analogue, and many books where it would be hard or arbitrary to find an edition to get this information from. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From joshua at hutchinson.net Wed Nov 10 11:22:33 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 10 11:22:40 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110192233.0332C4F550@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "D. Starner" > "Joshua Hutchinson" writes: > > > This sounds an awful lot like a Wikipedia entry. I > > think a loose partnership between us and wikipedia > > would be useful here. We have the book and a link > > could be made from the catalog page to that books > > entry in Wikipedia. Then, we have a place to put all > > sorts of information about the book, the author, > > where it was published, the historical context it was > > conceived in, ... just about anything someone wants to add. > > But does every single book in the catalog deserve a Wikipedia > entry? And a lot of details wanted are about the specific > edition, when it was published and what not, that would never > fit for a Wikipedia entry. That's a lot of short entries we'd > be adding to Wikipedia. I was thinking more of the summary/synopsis, author info, genre, etc. The stuff that everyone was saying we need in order to better use the collection. Author, publication date, etc belongs in MARC records or teiHeaders. 
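As a rough illustration of where fields like these could live in a teiHeader, here is a hedged sketch populated from the Lady of the Lake record quoted earlier in this digest. The element choices are generic TEI, not a statement of what PGTEI actually prescribes:

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Lady of the Lake</title>
      <author>Scott, Walter, Sir, 1771-1832</author>
      <editor>Rolfe, W. J. (William James), 1827-1910</editor>
    </titleStmt>
    <publicationStmt>
      <publisher>Project Gutenberg</publisher>
    </publicationStmt>
    <sourceDesc>
      <bibl>
        <publisher>Houghton Mifflin company</publisher>
        <date>1922</date>
        <extent>xvi, 272 p. : ill., map ; 17 cm</extent>
      </bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <textClass>
      <keywords scheme="LCSH">
        <term>Lady of the Lake (Legendary character) -- Poetry</term>
      </keywords>
      <classCode scheme="LCC">PR5308.A1 1922</classCode>
    </textClass>
  </profileDesc>
</teiHeader>
```

Mapping elements like these to and from MARC fields (245, 100, 260, 650 and so on) is mechanical enough that scripts could do most of the work either direction.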
As far as I know, neither really has a mechanism for holding things like a summary or review information (I suppose since TEI is XML based, that functionality could be added, but I don't think that is the proper place for it). Josh From aakman at csufresno.edu Wed Nov 10 11:26:13 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 11:26:21 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110191758.73D9C4BE64@ws1-1.us4.outblaze.com> References: <20041110191758.73D9C4BE64@ws1-1.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110111917.08b7e760@zimmer.csufresno.edu> At 11:17 AM 11/10/2004, you wrote: >Alev Akman writes: > > > Physical properties of the "original print work" like number of pages, > size > > of the book, illustrations, etc. > >How can this be mandatory? We've got a few composite books that don't have >a single print analogue, and many books where it would be hard or arbitrary >to find an edition to get this information from. I was speaking for our future records. I am aware that some of our files are even compilations of various editions. Hopefully we are getting away from works obtained that way, maybe even redoing them. If we want dependable works, we should be able to prove our source. No more chickening out for copyright reasons. Part of the reason PG does not have the power it should with the libraries is that the information required for citation purposes is a mishmash. We should be ready to present the best replica of the work at hand and the information that goes with it. Alev. >-- >___________________________________________________________ >Sign-up for Ads Free at Mail.com >http://promo.mail.com/adsfreejump.htm > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com).
From Bowerbird at aol.com Wed Nov 10 11:28:59 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 11:29:14 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <193.328d8437.2ec3c5fb@aol.com> josh said: > As with everything else around here, someone has to step up. > You act like you tried before... You didn't. i not only "tried", i _did_ "step up". i developed a whole system to use. all i required was a place to put it. steve even took things a step further, and put up his system in his own space. and nobody went there to support him or got anyone else to go there to do it. (go there, right now, and you'll see it's true.) maybe that's because you are all talk and no action. or maybe it's because there is no demand from users. (certainly not to justify a huge expenditure of effort.) either way, there's no future in _this_ "future of e-books". so i'll be making my way out of here... -bowerbird From joshua at hutchinson.net Wed Nov 10 11:31:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 10 11:32:00 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110193152.1E5479E829@ws6-2.us4.outblaze.com> Thank you! One quick question jumped out at me. You mention Illustrations. Do you just mean a total number of illustrations or a list of illustrations in the work? Total will be easy... a list of illustrations is a lot of work!
:) Josh ----- Original Message ----- From: Alev Akman To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] [BP] The Future of eBooks Date: Wed, 10 Nov 2004 11:14:19 -0800 > > Joshua, > > I think the minimum mandatory fields should include: > > Author (including birth/death dates) > Title (and/or Uniform Title) > Subtitle (when it exists) > Editor/Translator/Illustrator > Date the book was published > Physical properties of the "original print work" like number of pages, size > of the book, illustrations, etc. > Notes (Contents for collections, for example) > Call numbers (LC/Dewey) > Subjects > Genre (that's where the Mystery, Historical Fiction, etc would come in) > > That's what I can think of now. Does the list help? > > Alev. > > > At 09:09 AM 11/10/2004, you wrote: > > > >----- Original Message ----- > >From: Alev Akman > > > > > > Maybe if the computer people stuck to "computering" and listened to how > > the > > > library world does it? After all, the library sytems and conventions have > > > been in place for a while. > > > > > >Great! Answer my earlier question then. What fields should be mandatory > >for our and which fields should be optional? > > > >ie, > > > >Author, title, Original publisher = mandatory. > > > >Optional? Author birth/death dates? Which printing of the original > >source we derived from? Others? > > > >I'm not a librarian. I need someone knowledgeable to answer these questions. > > > >Josh > > > >PS If we define good teiHeader information for each work, it becomes a > >much simpler task for Marcello's cataloging scripts to find all sorts of > >fun information for the reader. > >_______________________________________________ > >gutvol-d mailing list > >gutvol-d@lists.pglaf.org > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > >--- > >Incoming mail is certified Virus Free. > >Checked by AVG anti-virus system (http://www.grisoft.com). 
From marcello at perathoner.de Wed Nov 10 11:33:01 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 11:33:12 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <4191ED7A.3020203@perathoner.de> Message-ID: <41926CED.1080900@perathoner.de> Karl Eichwalder wrote: > You can use yaz-client as it comes with the YAZ toolkit > (http://www.indexdata.dk/yaz/). Ok. I got this far. For all of you who wondered what a MARC record looks like, here is an example:

000 01109cam 2200277 a 4500
001 708964
005 19980710092633.8
008 970604s1997 inuab b 001 0 eng
035 $9(DLC) 97023698
906 $a7$bcbc$corignew$d1$eocip$f19$gy-gencatlg
955 $apc16 to ja00 06-04-97; jd25 06-05-97; jd99 06-05-97; jd11 06-06-97;aa05 06-10-97; CIP ver. pv08 11-05-97
010 $a 97023698
020 $a0253333490 (alk. paper)
040 $aDLC$cDLC$dDLC
050 00 $aQE862.D5$bC697 1997
082 00 $a567.9$221
245 04 $aThe complete dinosaur /$cedited by James O. Farlow and M.K. Brett-Surman ; art editor, Robert F. Walters.
260 $aBloomington :$bIndiana University Press,$cc1997.
300 $axi, 752 p. :$bill. (some col.), maps ;$c26 cm.
504 $aIncludes bibliographical references and index.
650 0 $aDinosaurs.
700 1 $aFarlow, James Orville.
700 2 $aBrett-Surman, M. K.,$d1950-
920 $a**LC HAS REQ'D # OF SHELF COPIES**
991 $bc-GenColl$hQE862.D5$iC697 1997$tCopy 1$wBOOKS
991 $br-SciRR$hQE862.D5$iC697 1997$tCopy 1$wGenBib bi 98-003434

The first problem is: how do we relate existing and new books to LoC MARC records.
Meaning: we have to find out the Control Number (001) or the LoC Control Number (010) of every book we have. We need a few volunteers to build a list: etext-number => Control Number. Then we can import that list into the database. -- Marcello Perathoner webmaster@gutenberg.org From aakman at csufresno.edu Wed Nov 10 11:41:57 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 11:42:08 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110193152.1E5479E829@ws6-2.us4.outblaze.com> References: <20041110193152.1E5479E829@ws6-2.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110114008.08af1420@zimmer.csufresno.edu> What I meant was whether the book has any illustrations. Usually the catalog record also indicates if any or all are in color. Never the number/list of illustrations. Alev. At 11:31 AM 11/10/2004, you wrote: >Thank you! > >One quick question jumped out at me. > >You mention Illustrations. Do you just mean a total number of >illustrations or a list of illustrations in the work? Total will be >easy... a list of illustrations is a lot of work! :) > >Josh > >----- Original Message ----- >From: Alev Akman >To: Project Gutenberg Volunteer Discussion >Subject: Re: [gutvol-d] [BP] The Future of eBooks >Date: Wed, 10 Nov 2004 11:14:19 -0800 > > > > > Joshua, > > > > I think the minimum mandatory fields should include: > > > > Author (including birth/death dates) > > Title (and/or Uniform Title) > > Subtitle (when it exists) > > Editor/Translator/Illustrator > > Date the book was published > > Physical properties of the "original print work" like number of pages, > size > > of the book, illustrations, etc. > > Notes (Contents for collections, for example) > > Call numbers (LC/Dewey) > > Subjects > > Genre (that's where the Mystery, Historical Fiction, etc would come in) > > > > That's what I can think of now. Does the list help? > > > > Alev. 
> > > > > > At 09:09 AM 11/10/2004, you wrote: > > > > > > >----- Original Message ----- > > >From: Alev Akman > > > > > > > > Maybe if the computer people stuck to "computering" and listened to > how > > > the > > > > library world does it? After all, the library sytems and > conventions have > > > > been in place for a while. > > > > > > > > >Great! Answer my earlier question then. What fields should be mandatory > > >for our and which fields should be optional? > > > > > >ie, > > > > > >Author, title, Original publisher = mandatory. > > > > > >Optional? Author birth/death dates? Which printing of the original > > >source we derived from? Others? > > > > > >I'm not a librarian. I need someone knowledgeable to answer these > questions. > > > > > >Josh > > > > > >PS If we define good teiHeader information for each work, it becomes a > > >much simpler task for Marcello's cataloging scripts to find all sorts of > > >fun information for the reader. > > >_______________________________________________ > > >gutvol-d mailing list > > >gutvol-d@lists.pglaf.org > > >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > > >--- > > >Incoming mail is certified Virus Free. > > >Checked by AVG anti-virus system (http://www.grisoft.com). > > >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 > > > > > > > > > > > --- > > Outgoing mail is certified Virus Free. > > Checked by AVG anti-virus system (http://www.grisoft.com). > > Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 > > > > > > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). 
From marcello at perathoner.de Wed Nov 10 12:01:07 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 12:01:16 2004 Subject: [gutvol-d] MARC to the catalog In-Reply-To: <20041110190424.GA29073@pglaf.org> References: <20041110190424.GA29073@pglaf.org> Message-ID: <41927383.9030308@perathoner.de> Greg Newby wrote: > I don't want to over-specify how I think the workflow should > happen. I think that's still to be determined. But the overall > flow needs to be somewhat circular: librarians need to import > existing PG catalog records, preferably in MARC format, to > existing software. (Alev has a couple of programs for this; PGLAF > can probably acquire software for other folks who'd like to > work a lot on this activity.) Then, updated records would need > to be shipped back into the catalog. I think an easier solution would be to build an ASCII list containing the etext-number and the LoC Call Number for all etexts we have. We would then import the LoC Call Number into a field in the database. The catalog software could then update a number of fields (Subject, LoC Class, Unified Title) automatically from the LoC database (TODO Check copyright status of LoC database !!!) Then we could do a manual pass over the database with the MARC record at hand and fix the author / coauthor attributions, link into wikipedia if an article exists, add summaries etc.
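[Editorial sketch] The "ASCII list" import Marcello describes would only need a few lines of scripting. This is a minimal sketch under assumptions the thread does not fix: a tab-separated file of etext-number and LoC number, and a hypothetical helper name `parse_etext_list`. It also sets aside etext numbers that map to more than one LoC number, since Greg's follow-up notes that matching is often ambiguous and needs a human pass:

```python
def parse_etext_list(text):
    """Parse 'etext-number<TAB>loc-number' lines into a mapping.

    Blank lines and '#' comments are skipped.  An etext number that
    appears with two different LoC numbers is treated as ambiguous
    and left for a librarian to resolve by hand.
    """
    mapping = {}
    ambiguous = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        etext, _, loc = line.partition("\t")
        if etext in mapping and mapping[etext] != loc:
            ambiguous.add(etext)
        else:
            mapping[etext] = loc
    # Drop the ambiguous entries from the clean mapping entirely;
    # they go on the "still unmatched" review list instead.
    for etext in ambiguous:
        del mapping[etext]
    return mapping, sorted(ambiguous)

if __name__ == "__main__":
    sample = "7000\t97023698\n7001\t98001234\n7000\t99005678\n"
    clean, needs_review = parse_etext_list(sample)
    print(clean)         # only the unambiguous matches
    print(needs_review)  # etext numbers a librarian must check
```

The clean mapping could then be bulk-imported into the catalog database, while the review list becomes the periodic "still unmatched books" report Marcello mentions later in the thread.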
-- Marcello Perathoner webmaster@gutenberg.org From brad at chenla.org Wed Nov 10 12:37:53 2004 From: brad at chenla.org (Brad Collins) Date: Wed Nov 10 12:39:58 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <9e.191bcb1d.2ec3b8d8@aol.com> (Bowerbird@aol.com's message of "Wed, 10 Nov 2004 13:32:56 EST") References: <9e.191bcb1d.2ec3b8d8@aol.com> Message-ID: Bowerbird@aol.com writes: > brad said: >> But this won't be of any help to brick and mortar libraries >> who want to integrate PG etexts into their existing catalogs. > > why is this a priority of any kind? > > but perhaps i don't understand. just precisely what would it mean for a > "brick and mortar library" to "integrate" this e-library into their catalog? > > that if i walk into the place and go to the catalog to look for a book, > it will tell me that it's available online? d'uh, next time i'll stay home, > and search google. why should a brick-and-mortar library want this? > Are you kidding? Check any of the major online catalogs. They all try to integrate records for electronic works. The ISBD spends a very large amount of spec-space on how to format records for electronic-format works. I'm sorry, I live in SE Asia, and in the libraries I've visited in Hong Kong, Thailand, Malaysia, Japan, Singapore and mainland China--even the smaller libraries have terminals which provide an electronic catalog. The rows of wooden drawers with paper catalog cards are still there but most people use the computer catalog. I haven't been back to the States much in the last 15 years but when I have I always spend time in some university library. They all use online catalogs. A growing number of brick and mortar libraries are now adding etexts to their collections. Sometimes they only provide links to websites but often they are local copies of the etexts which correspond to their catalog entry.
> i thought the goal here was to create a global library, one that is > available 24/7 from anyplace in the world, with millions of books > that are never "unavailable" because they are "checked out" or > "mis-shelved" or "awaiting reshelving" or "going through re-binding" > or because "this branch has never had a copy of that book, sorry, > you'll have to go to the main library downtown." > The goal here is to create etexts which can be used anywhere--in your home, in a high school, in a public library as well as over the Internet. A remote library in northern India may not be able to afford the bandwidth to download PG texts. But they can provide access to a CDROM collection of the PG texts. Librarians would love to be able to say "all copies have been checked out, but the etext is available in pdf, html and plain text" > am i the one who's not seeing things clearly? or are you? > I think what I'm seeing and saying is very clear. MARC is the present standard for the vast majority of bibliographic data for libraries. Libraries fill a very real need for their communities which will change over time but will not vanish because of the Internet. I'd like to hear one good reason why the catalog shouldn't be available in as many different formats as is needed for everyone to find and access PG texts? b/ -- Brad Collins , Bangkok, Thailand From gbnewby at pglaf.org Wed Nov 10 12:52:46 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 12:52:47 2004 Subject: [gutvol-d] MARC to the catalog In-Reply-To: <41927383.9030308@perathoner.de> References: <20041110190424.GA29073@pglaf.org> <41927383.9030308@perathoner.de> Message-ID: <20041110205246.GA457@pglaf.org> On Wed, Nov 10, 2004 at 09:01:07PM +0100, Marcello Perathoner wrote: > Greg Newby wrote: > > >I don't want to over-specify how I think the workflow should > >happen. I think that's still to be determined. 
But the overall > >flow needs to be somewhat circular: librarians need to import > >existing PG catalog records, preferably in MARC format, to > >existing software. (Alev has a couple of programs for this; PGLAF > >can probably acquire software for other folks who'd like to > >work a lot on this activity.) Then, updated records would need > >to be shipped back into the catalog. > > I think an easier solution would be to build an ASCII list containing > the etext-number and the LoC Call Number for all etexts we have. > > We would then import the LoC Call Number into a field in the database. > > The catalog software could then update a number of fields (Subject, LoC > Class, Unified Title) automatically from the LoC database (TODO Check > copyright status of LoC database !!!) > > Then we could do a manual pass over the database with the MARC record at > hand and fix the author / coauthor attributions, link into wikipedia if > an article exists, add summaries etc. I like this idea, but am concerned that there will still need to be human oversight. Just importing records will only work if there are unambiguous matches, and it seems that matching is often ambiguous. >From doing lots of copyright clearances, I know that many items are not in the LoC database (most of our non-English is not in there). But this would be a good start, and there are other national library catalogs that offer Z39.50 access to their records. -- Greg From shalesller at writeme.com Wed Nov 10 12:56:24 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 12:56:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110205624.5FAD64BE64@ws1-1.us4.outblaze.com> "Joshua Hutchinson" writes: > I was thinking more of the summary/synopsis, author info, genre, etc. Genre must be in our system to be useful; we want to search on that. Author birth and death is part of the MARC info. 
But still, is Wikipedia the right place for an article on every book ever written and every author who ever wrote a book? We should collaborate with them on the encyclopedia-worthy entries, but "The Influence of Old Norse Literature on English" isn't really encyclopedia-worthy, and neither is its author, Nordby, who wrote but one book. However, it might be worth extracting the author information from the book for PG's author page. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From hart at pglaf.org Wed Nov 10 12:57:29 2004 From: hart at pglaf.org (Michael Hart) Date: Wed Nov 10 12:57:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110184925.013A24BE65@ws1-1.us4.outblaze.com> References: <20041110184925.013A24BE65@ws1-1.us4.outblaze.com> Message-ID: Let's change the subject header to PG catalog so I can find the relevant messages easily. Thanks! Michael From shalesller at writeme.com Wed Nov 10 13:15:35 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 13:15:44 2004 Subject: [gutvol-d] MARC to the catalog Message-ID: <20041110211535.0B2644BE64@ws1-1.us4.outblaze.com> Marcello Perathoner writes: > TODO Check > copyright status of LoC database !!!) It was created by employees of the US federal government in the course of their work for the government. It's public domain. 
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From marcello at perathoner.de Wed Nov 10 13:47:55 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 13:48:06 2004 Subject: [gutvol-d] MARC to the catalog In-Reply-To: <20041110205246.GA457@pglaf.org> References: <20041110190424.GA29073@pglaf.org> <41927383.9030308@perathoner.de> <20041110205246.GA457@pglaf.org> Message-ID: <41928C8B.3010600@perathoner.de> Greg Newby wrote: >>The catalog software could then update a number of fields (Subject, LoC >>Class, Unified Title) automatically from the LoC database (TODO Check >>copyright status of LoC database !!!) > I like this idea, but am concerned that there will still need > to be human oversight. Just importing records will only work > if there are unambiguous matches, and it seems that matching > is often ambiguous. We can start to match the easy ones and leave the hard ones to our librarians. We can periodically output a list of still unmatched books. The fields I propose to import (Subject, LoC, Unified Title) should not be ambiguous. It doesn't matter which edition of a work we match. OTOH for new books still in the DP queue it might be wiser to match the exact edition down to the format and number of pages and the coffee stain on page 42. -- Marcello Perathoner webmaster@gutenberg.org From shalesller at writeme.com Wed Nov 10 13:51:37 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 13:51:54 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041110215137.69DFD4BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > >How can this be mandatory? We've got a few composite books, that don't have > >a single print analogue, and many books where it would be hard or arbitrary > >to find an edition to get this information from. > > I was speaking for our future records. 
I am aware that some of our files > are even compilations of various additions. Hopefully we are getting away > from works obtained that way, maybe even redoing them. If we want > dependable works, we should be able to prove our source. No more chickening > out for copyright reasons. It still can't be mandatory until we're willing to seek out editions for every single book we've done, either the exact same edition or be willing to update the text to an edition we can find. More towards what I was thinking, there are books that are compilations of several printed books. PG recently posted The Fifteen Comforts of Matrimony: Responses from Men and The Fifteen Comforts of Matrimony: Responses from Women. We could break those down into the individual 8 page pamphlets, but PG has generally discouraged that. Likewise, if I can ever find the papers, a book of short stories by Edna St. V. Millay will go to DP, consisting of several stories written for magazines but never collected as a book. There's no reason for us to break that up, either. If you enter "Het Esperanto" into the LoC catalog, you get "[Language, languages, and writing pamphlets]./1819-1947/51 items". So this is a practice that LoC engages in, in its own way. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From aakman at csufresno.edu Wed Nov 10 14:20:51 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 14:21:05 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110215137.69DFD4BE64@ws1-1.us4.outblaze.com> References: <20041110215137.69DFD4BE64@ws1-1.us4.outblaze.com> Message-ID: <6.1.2.0.2.20041110141155.08ce5598@zimmer.csufresno.edu> At 01:51 PM 11/10/2004, you wrote: >Alev Akman writes: > > > >How can this be mandatory?
We've got a few composite books, that don't > have > > >a single print analogue, and many books where it would be hard or > arbitrary > > >to find an edition to get this information from. > > > > I was speaking for our future records. I am aware that some of our files > > are even compilations of various additions. Hopefully we are getting away > > from works obtained that way, maybe even redoing them. If we want > > dependable works, we should be able to prove our source. No more > chickening > > out for copyright reasons. > >It still can't be mandatory until we're willing to seek out editions for >every single book we've done, either the exact same edition or be willing to >update the text to an edition we can find. > >More towards what I was thinking, there are books that are compilations of >several printed books. PG recently posted The Fifteen Comforts of Matrimony: >Responses from Men and The Fifteen Comforts of Matrimony: Responses from >Women. >We could break those down into the individual 8 page pamphlets, but PG has >generally discouraged that. Likewise, if I can ever find the papers, a book >of short stories by Edna St. V. Millay will go to DP, consistenting of several >stories written for magazines but never collected as a book. There's no reason >for us to break that up, either. Here's the tabbed text format of the title you mentioned: The Fifteen comforts of matrimony. La Sale, Antoine de 1795 [by Isaiah Thomas] and sold at the Worcester bookstore. Translation of: Les quinze joyes de mariage, sometimes attributed to Antoine de La Sale.; Printer's name supplied by Evans.; Evans; Microform version available in the Readex Early American Imprints series.; Electronic text and image data.; EvansDigital. (none) 144 p. ill. 16 cm. (12mo) (none) 16 cm. (12mo) (none) (none) (none) (none) (none) (none) eng (none) (none) 144 p. Printed at Worcester, Massachusetts Marriage.; Women; Women in literature. With an addition of three comforts more. 
: Wherein the various miscarriages of the wedded state, and the miserable consequences of rash and inconsiderate marriages are laid open and detected. (none) (none) (none) Obviously, fields shown as (none) would be shown as blank in the database. If necessary, we can have a catalog entry for the combined works but give the information for the individual works within the Notes field, indicating the titles, dates, whatever. And, here's the MARC version:

041 1 $a eng$h fre
245 04$a The Fifteen comforts of matrimony.$h [electronic resource] :$b With an addition of three comforts more. : Wherein the various miscarriages of the wedded state, and the miserable consequences of rash and inconsiderate marriages are laid open and detected.
260 $a Printed at Worcester, Massachusetts, :$b [by Isaiah Thomas] and sold at the Worcester bookstore.,$c 1795.
300 $a 144 p. :$b ill. ;$c 16 cm. (12mo)
500 $a Translation of: Les quinze joyes de mariage, sometimes attributed to Antoine de La Sale.
500 $a Printer's name supplied by Evans.
510 4 $a Evans$c 28948.
530 $a Microform version available in the Readex Early American Imprints series.
533 $a Electronic text and image data.$b [Chester, Vt. :$c Readex, a division of Newsbank, Inc.,$d 2002-2004.$e Includes files in TIFF, GIF and PDF formats with inclusion of keyword searchable text.$f (Early American imprints. First series ; no. 28948)
590 $a EvansDigital.
650 0$a Marriage.
650 0$a Women $v Anecdotes.
650 0$a Women in literature.
655 7$a ebooks$2 local.
655 7$a eresource$2 local.
655 7$a Facetiae.$2 rbgenr.
690 $a Evans Digital Edition.
690 $a Early American Imprints.
700 1 $a La Sale, Antoine de ,$d b. 1388?
752 $a United States$b Massachusetts$d Worcester.
830 0$a Early American imprints.$n First series ;$v no. 28948.
856 41$u http://libproxy.unm.edu/login?url=http://opac.newsbank.com/select/evans/28948 $y Click here $z to access Evans Digital Edition

You see also where we would attach our files? On field 856? Yup, it is doable.
You think a record like that where we can harvest what we need is useful? Alev. >If you enter "Het Esperanto" into the LoC catalog, you get >"[Language, languages, and writing pamphlets]./1819-1947/51 items". So this >is a practice that LoC engages in, in its own way. > >-- >___________________________________________________________ >Sign-up for Ads Free at Mail.com >http://promo.mail.com/adsfreejump.htm > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 -------------- next part -------------- --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.783 / Virus Database: 529 - Release Date: 10/25/2004 From Bowerbird at aol.com Wed Nov 10 15:22:26 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 15:22:42 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <190.32b05033.2ec3fcb2@aol.com> brad said: > Are you kidding? no. are you? > Check any of the major online catalogs. > They all try to integrate records for electronic works. yeah. but so what? i don't get the impression that the people who are looking for e-texts go to a library's catalog to try and find 'em... > The rows of wooden drawers with paper catalog cards are > still there but most people use the computer catalog. um, yes, i quite realize that. as far as i know, the "rows of wooden drawers" were effectively replaced a decade or more ago. they're still kind of quaint, though, don't you think? in spite of their digitized equivalents, however, google (and the others) are the new "card catalog". 
so i would think it would be _far_ more productive to implement a strategy that leverages the search engines (because, honestly, the system you have now does not), instead of plays into all of the antiquated systems. heck, i'd even like to see a decent system on the website. the one there now is good if you know the title and/or the author-name. that's a start. but it doesn't go very far, not in _recommending_ a book to a reader... > A growing number of brick and mortar libraries > are now adding etexts to their collections. > Sometimes they only provide links to websites > but often they are local copies of the etexts > which correspond to their catalog entry. well, that's all very nice. and if they did all of the work to integrate this e-library into their system, i'd say "thanks". but i don't see much purpose in doing that work myself, not when a whole host of other capabilities would be _far_ more useful, in my eyes, like full-text search... if getting to the patrons of a specific brick-and-mortar library could _only_ be done by getting myself in that specific catalog, it might be a completely different story. but when those patrons can just as easily use their computer to find my e-texts in google as to find them in the library's catalog, i don't see much difference. besides, show me one good system -- from any library out there! -- that helps a person find a book that they will like. show me! please! i said it before, and i'll say it again. collaborative filtering does this. spend time _productively_, building a collaborative filtering system. people don't decide what to read based on the info in a marc record. > A remote library in northern India may not be able to > afford the bandwidth to download PG texts. But they can > provide access to a CDROM collection of the PG texts. ok, except now you're talking about something different. putting a c.d. 
of the e-texts into a brick-and-mortar library -- or indeed, in every residence in india with a computer -- is a great idea. but that has little to do with marc records. > Librarians would love to be able to say > "all copies have been checked out, > but the etext is available in pdf, html and plain text" you'd think so. but michael reports he has encountered resistance. say what you will, i don't think "the absence of marc records" is why. (and when librarians finally decide they _do_ want a copy of the c.d., an absence of those marc records will be of zero consequence to them.) nonetheless, i'm sure their intelligent patrons will be able to find a copy. online. using google. to download. and burn copies for their neighbors. > I'd like to hear one good reason why the catalog shouldn't be > available in as many different formats as is needed > for everyone to find and access PG texts? the best reason of all is because no one seems to want to do the work. most of you say "it's a good idea" (although i haven't heard one single compelling reason _why_), but very few have done anything about it. (kudos to andrew. and what is his experience? "it's hard work!") now, what is your counterargument to that? yeah, that's what i thought. -bowerbird From marcello at perathoner.de Wed Nov 10 15:38:25 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 15:38:40 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> References: <20041110170952.3ECBCEDC5D@ws6-1.us4.outblaze.com> Message-ID: <4192A671.4010002@perathoner.de> Joshua Hutchinson wrote: > What fields should be mandatory for our and which fields > should be optional? Start with something like this: Note that the sourceDesc should always accurately describe the physical source. If you collect items from more than one phys. source all sources should be listed. 
Also if you split one physical source into multiple etexts, the source should appear in all etexts. Note the DP project number and the LoC Call Number. The LCCN should be given in the sourceDesc if it matches the physical source exactly. It should be given in the publicationStmt if it matches a different edition of the same work. Note: in the HTML file you'll have to enclose the teiHeader in comments and make sure you replace all occurrences of -- inside the header with — (a double hyphen is not allowed inside a comment). You don't have to provide the and stuff if you don't want to. I really just need the LCCN and can pull the rest from the LoC database. An absolutely minimal header should look like: DP Project Number goes here Same work LCCN goes here. Only exact source LCCN goes here. A full header should look like: Common sense Paine, Thomas (1737-1809) Project Gutenberg DP Project Number goes here Same work LCCN goes here. First PG Edition Brief notes on the text. Foner, Philip S. The collected writings of Thomas Paine New York Citadel Press January 1945 19 pp. Only exact source LCCN goes here. Library of Congress Subject Headings Library of Congress Classification Library of Congress Call Number Distributed Proofreaders Project Number Date text was created goes here: 1774 English. Political science United States — Politics and government — Revolution, 1775-1783 JC 177 January 2004 Joshua Hutchinson Tonya Allen Distributed Proofreaders Team Scanned and proofed it. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Wed Nov 10 15:47:28 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed Nov 10 15:47:40 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <190.32b05033.2ec3fcb2@aol.com> References: <190.32b05033.2ec3fcb2@aol.com> Message-ID: <4192A890.6040201@perathoner.de> Bowerbird@aol.com wrote: > besides, show me one good system -- from any library out there! -- > that helps a person find a book that they will like. show me! please!
I think you should program one over the weekend in BASIC to keep up your record of phenomenal software success stories. > i said it before, and i'll say it again. collaborative filtering does this. > spend time _productively_, building a collaborative filtering system. Oh, no! He learned a new buzzword. He'll never let us hear the end of it. -- Marcello Perathoner webmaster@gutenberg.org From Bowerbird at aol.com Wed Nov 10 16:25:25 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Wed Nov 10 16:25:45 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <1da.2f1c0458.2ec40b75@aol.com> marcello said: > I think you should program one over the weekend in BASIC > to keep up your record of phenomenal software success stories. i'm busy this weekend. but i'll get to it next month, i promise... but it's appropriate that -- on my way out -- you'd try to smear me because i use _basic_ -- goodness gracious -- since that was exactly what you did when i first appeared. you also thought it was fashionable to try and ridicule me because i'm on a.o.l., so do try and work that in now too... because you can bet money that when i start up my blog and tell the world what a bunch of idiots are running the show here, i will make fun of you because you're a bunch of script kiddies who think that you'll rule the world because you know reg-ex. > Oh, no! He learned a new buzzword. > He'll never let us hear the end of it. actually, i've served my one-year sentence here, much of it in solitary, so i'll be leaving very shortly. oh, i learned that "buzzword" a long, long time ago. as far as "finding" things goes, it's a silver bullet... and if anyone wants to see it applied to books, the guy at alexlit was doing it many years ago. might wanna talk to him about it... -bowerbird From shalesller at writeme.com Wed Nov 10 17:21:38 2004 From: shalesller at writeme.com (D. 
Starner) Date: Wed Nov 10 17:21:52 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041111012138.989774BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > > At 01:51 PM 11/10/2004, you wrote: > > >More towards what I was thinking, there are books that are compilations of > >several printed books. PG recently posted The Fifteen Comforts of Matrimony: > >Responses from Men and The Fifteen Comforts of Matrimony: Responses from > >Women. > > Here's the tabbed text format of the title you mentioned: > > The Fifteen comforts of matrimony. La Sale, Antoine de 1795 [by > Isaiah Thomas] and sold at the Worcester bookstore. But it's not. A book that included part of the current PG title was published and printed then; but the PG title also includes various other publications that weren't combined with it in book form. (It's probably a moot point, but that's not the text; that part of PG text was printed in 1706 in England. It may be a reprint, but it could be a separate translation or a different text trading on the popularity of the first.) -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From gbnewby at pglaf.org Wed Nov 10 17:30:08 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Wed Nov 10 17:30:10 2004 Subject: [gutvol-d] iPod transformation to try Message-ID: <20041111013008.GA10145@pglaf.org> I'm forwarding the information below in the hopes that some folks with iPods could check this out. Provided it works, I'll link it in our PDA "howto" and maybe we'll put a note about it in the newsletters. Please Cc any feedback to Dan, as well as gutvol-d. Thanks! Greg ----- Forwarded message from Dan Duris ----- From: Dan Duris To: Greg Newby Subject: Re: Converting Project Gutenberg's books for read on iPod Date: Thu, 11 Nov 2004 02:10:34 +0100 iPod eBook Creator allows you to convert any text file to notes (note files for iPod).
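[A split-and-link conversion of the kind Dan describes can be sketched in a few lines. This is only an illustration: the 4,000-byte note budget, the file naming, and the anchor-style links are my assumptions, not details of the actual service.]

```python
def split_into_notes(text, limit=4000, stem="book"):
    """Split `text` into note bodies of at most `limit` bytes,
    breaking on whitespace, and chain the resulting note files
    together with prev/next links appended to each body."""
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if current and len(candidate.encode("utf-8")) > limit:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)

    notes = []
    for i, body in enumerate(chunks):
        links = []
        if i > 0:
            links.append('<a href="%s%03d.txt">prev</a>' % (stem, i - 1))
        if i < len(chunks) - 1:
            links.append('<a href="%s%03d.txt">next</a>' % (stem, i + 1))
        notes.append(("%s%03d.txt" % (stem, i), body + "\n" + " ".join(links)))
    return notes
```

[Each note stays under the byte budget for its text, and the trailing links let a reader page forward and back through the chunks the way the message describes.]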
You can then read almost any book, magazine or message available in plain text format on your iPod. It's a free online service that basically splits large text files into small ones (due to the iPod's limitation on note files). It also links the notes, so you can easily browse them the same way as turning pages in a book. URL: http://www.ambience.sk/ipod-ebook-creator/ipod-book-notes-text-conversion.php Any comments are welcome at: dusoft@staznosti.sk (I am not subscribed to this mailing list) From stephen.thomas at adelaide.edu.au Wed Nov 10 17:51:24 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 10 17:51:46 2004 Subject: [gutvol-d] PG Catalog In-Reply-To: <4191EFB7.6080203@perathoner.de> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz> <200411100608.iAA68I6P016938@posso.dm.unipi.it> <4191EFB7.6080203@perathoner.de> Message-ID: <4192C59C.4080203@adelaide.edu.au> The central problem, if I've understood all the posts, is that the catalog entry is generated from the final header, which as we all know omits lots of detail which the volunteers have. Would it be possible to add manual cataloguing to the posting workflow? By which I mean, when a person (whitewasher?) posts a new text, they also edit the catalog to add whatever level of detail for the work is to hand. I understand that we don't want to add to the whitewasher's workload, but -- thanks to Marcello's web interface -- it is really quite easy to add to a catalog entry, so probably not a great deal of work in comparison to the work they already do. Of course, once all the TEI stuff is in place, this won't be necessary, but in the meantime ... Steve Marcello Perathoner wrote: > Carlo Traverso wrote: > >> The first step however is to have better PG records, and a method to >> avoid losing information from DP to the PG catalogue. > > > If you put a complete ...
somewhere in the > files, maybe at the back where it won't hurt much, I can easily pick it > out and parse it into the database. Of course it has to stay in the file > after being posted. > > What is happening now is that I parse the tiny header at the top of the > file and I get just what's there. > > > -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From stephen.thomas at adelaide.edu.au Wed Nov 10 17:51:34 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 10 17:51:52 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de> Message-ID: <4192C5A6.3020006@adelaide.edu.au> Andrew Sly wrote: > > I don't believe we are ready. There is right now no agreement > about what form this data would take, or what standard to try > to comply with. > > If various volunteers all get to enter their own idea of what > categories and subject headings appeal to them, we will end up > with a mish-mash of conflicting and overlapping data. > > I am no expert here, but I have read enough to know that > doing subject cataloging _well_ is more involved than most > people realise. Yes indeed. Library systems use what's known as an authority file for subject headings (and also for authors). This lists only headings that are "authorised" -- e.g. for LCSH, conform to the LCSH standards. Now, PG is *never* going to have such a file (it would be huge) and I don't think it should -- LCSH is famously arcane and often seems rather arbitrary. (Although there are teams of librarians working day and night in a dark tower somewhere making sure that only the "correct" terms are used.
;-) Ideally though, there should be some guidelines about what terms should be used in the subject field, otherwise it will be less than useful. For example, if we are going to apply the term "Fiction" to some works of fiction, then it should be applied to all. Otherwise, its usefulness as a search term is diminished. The key problem is one of scale. Do you limit the field to a short list of valid terms ("fiction", "history", ...) and risk them being too broad to be useful, or do you allow a longer list with greater precision, and risk the list being too long to be manageable? Sorry, I don't have an answer to that. Needs debate. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From aakman at csufresno.edu Wed Nov 10 18:39:54 2004 From: aakman at csufresno.edu (Alev Akman) Date: Wed Nov 10 18:40:10 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <276b28274fcb.274fcb276b28@cvip.net> ----- Original Message ----- From: "D. Starner" Date: Wednesday, November 10, 2004 5:21 pm Subject: Re: [gutvol-d] [BP] The Future of eBooks > Alev Akman writes: > > > > > At 01:51 PM 11/10/2004, you wrote: > > > > >More towards what I was thinking, there are books that are > compilations of > > >several printed books. PG recently posted The Fifteen Comforts > of Matrimony: > > >Responses from Men and The Fifteen Comforts of Matrimony: > Responses from > > >Women. > > > > Here's the tabbed text format of the title you mentioned: > > > > The Fifteen comforts of matrimony. La Sale, Antoine de 1795 [by > > Isaiah Thomas] and sold at the Worcester bookstore. > > But it's not. A book that included part of the current PG title > was published > and printed then; but the PG title also includes various other > publications that weren't combined with it in book form.
I was just trying to make a point and give an example. I had no way of knowing what the publication date for the title in question was. Did I? : ) Alev. > > (It's probably a moot point, but that's not the text; that part of > PG text was > printed in 1706 in England. It may be a reprint, but it could be a > separate > translation or a different text trading on the popularity of the > first.) From shalesller at writeme.com Wed Nov 10 20:01:10 2004 From: shalesller at writeme.com (D. Starner) Date: Wed Nov 10 20:01:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks Message-ID: <20041111040110.D9C664BE64@ws1-1.us4.outblaze.com> Alev Akman writes: > > But it's not. A book that included part of the current PG title > > was published > > and printed then; but the PG title also includes various other > > publications that weren't combined with it in book form. > > I was just trying to make a point and give an example. I had no way of knowing what the publication date for the title in question was. Did I? : ) What point? It's obvious we're at cross purposes. The publication date had nothing to do with it. My point was that that data was wrong, because the PG book was a unique compilation of several original volumes; the fact that the publication date was wrong was a red herring.
From sly at victoria.tc.ca Wed Nov 10 23:59:57 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 11 00:00:16 2004 Subject: [gutvol-d] PG Catalog In-Reply-To: <4192C59C.4080203@adelaide.edu.au> References: <4191152C.9080702@perathoner.de> <6.1.2.0.2.20041109111430.04badf98@zimmer.csufresno.edu> <200411091911.28485.lynne@rhodesresearch.biz> <200411100608.iAA68I6P016938@posso.dm.unipi.it> <4191EFB7.6080203@perathoner.de> <4192C59C.4080203@adelaide.edu.au> Message-ID: Hi Steve. The problem with this proposition is that at the time a whitewasher is working on the final posting of a text, there is no catalog record to edit yet. New records are only generated once a day, when the directories are automatically scanned to find any new files. Also, I do have the impression that the whitewashers would rather not deal with cataloging issues. (where a small change can suddenly require further following up in order to keep the catalog somewhat consistent, deal with further issues, etc.) As the closest thing we have to a "Catalog content supervisor" I will volunteer to work with additional information if we can find some way to get it to me--preferably via catalog[at]pglaf.org--from the people producing the texts. And I must add here that simply having a tei template in place will not remove the advisability of still manually looking through every record. With the amount of less-than-ideal modifications that can creep in when just dealing with a Title and Author, I can only think I would see more if more fields are included. Andrew On Thu, 11 Nov 2004, Steve Thomas wrote: > The central problem, if I've understood all the posts, is that > the catalog entry is generated from the final header, which as > we all know omits lots of detail which the volunteers have. > > Would it be possible to add manual cataloguing to the posting > workflow?
By which I mean, when a person (whitewasher?) posts a > new text, they also edit the catalog to add whatever level of > detail for the work is to hand. > > I understand that we don't want to add to the whitewasher's > workload, but -- thanks to Marcello's web interface -- it is > really quite easy to add to a catalog entry, so probably not a > great deal of work in comparison to the work they already do. > > Of course, once all the TEI stuff is in place, this won't be > necessary, but in the meantime ... > > > Steve From gld199 at yahoo.com Thu Nov 11 07:07:56 2004 From: gld199 at yahoo.com (Gemma Dearing) Date: Thu Nov 11 07:08:02 2004 Subject: [gutvol-d] browse books published in a specific year? Message-ID: <20041111150756.26633.qmail@web50803.mail.yahoo.com> Hi, I want to read books which were published in a specific range of years (1800-1820); is there any way I can browse/search for this? I realise that specific dates might not always be known, but even approximate date ranges would help a lot. TIA, Gemma. From scott_bulkmail at productarchitect.com Thu Nov 11 08:17:19 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Thu Nov 11 08:19:37 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: References: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Message-ID: >If enough people would like to contribute a brief synopsis for texts >in the collection, we already have a place in the catalog they can >go.
(although I don't know about the mechanics behind it) What we don't have are links on the book's Web page which say: Add a summary Add a review When/if this gets implemented, I strongly recommend that the person's contribution is posted automatically and immediately. People want to see an immediate benefit from their effort (however modest), rather than remembering to check back to see if their voice was heard. To minimize spam, the software could email a copy to a gut-comments-verification list (or some such), and any authorized person could go in and delete/edit if needed. Note that I'm inverting the usual process: instead of requiring every contribution to be approved, only require extra effort in the rare case of spam or other problem. FWIW, I also think a simple registration system would be fine, e.g. verify that a commenter is a member of any gut* list, or just do a simple round-trip to verify that they are supplying a valid email address. >When I tried to make a few of these myself, I found that >writing a good brief synopsis of a novel was harder than I >would have thought. True, but we shouldn't let that stand in the way of easy cases. Sometimes it's enough to copy or excerpt the preface. For example: 13032 The Book of Noodles From the Preface: My design has been to bring together, from widely scattered sources, many of which are probably unknown or inaccessible to ordinary readers, the best of this class of humorous narratives, in their oldest existing Buddhist and Greek forms as well as in the forms in which they are current among the people in the present day. It will, perhaps, be thought by some that a portion of what is here presented might have been omitted without great loss; but my aim has been not only to compile an amusing story-book, but to illustrate to some extent the migrations of popular fictions from country to country. -- Cheers, Scott S.
Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From marcello at perathoner.de Thu Nov 11 08:35:51 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Thu Nov 11 08:35:57 2004 Subject: [gutvol-d] browse books published in a specific year? In-Reply-To: <20041111150756.26633.qmail@web50803.mail.yahoo.com> References: <20041111150756.26633.qmail@web50803.mail.yahoo.com> Message-ID: <419394E7.9060607@perathoner.de> Gemma Dearing wrote: > I want to read books which were published in a > specific range of years (1800-1820); is there any way > I can browse/search for this? Not in the online catalog. There are many dates that would come in useful if we knew them: - date the text was created - date the text was first published - date the text was first published in the USA - date the edition was first published - date the last contributor died -- Marcello Perathoner webmaster@gutenberg.org From traverso at dm.unipi.it Thu Nov 11 09:27:55 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Nov 11 09:28:07 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: (message from Scott Lawton on Thu, 11 Nov 2004 11:17:19 -0500) References: <20041110182853.9E2444BE64@ws1-1.us4.outblaze.com> Message-ID: <200411111727.iABHRtHL008551@posso.dm.unipi.it> >>>>> "Scott" == Scott Lawton writes: >> If enough people would like to contribute a brief synopsis for >> texts in the collection, we already have a place in the catalog >> they can go. (although I don't know about the mechanics behind >> it) Scott> What we don't have are links on the book's Web page which Scott> say: Scott> Add a summary Scott> Add a review Scott> the person's contribution is posted automatically and Scott> immediately. People want to see an immediate benefit from Scott> their effort (however modest), rather than remembering to Scott> check back to see if their voice was heard. 
I see a risk in this: the easiest way to add a summary is to grab one somewhere on the net, or from a book cover, and paste it. But it may be copyrighted, and PG would risk being sued over it. Not that such a suit would necessarily succeed, but some publisher might use it as an excuse to try to shut down PG. This should be handled by another web site, completely separate, and not itself an interesting target. Access to that other site from the book catalogue page could then be arranged in a legally safe way. Carlo From sly at victoria.tc.ca Thu Nov 11 13:23:42 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 11 13:23:51 2004 Subject: [gutvol-d] Sources for PG texts Message-ID: On Thu, 11 Nov 2004, Jon Noring wrote: > Let me clarify (again) below, what I wrote in a separate message. You > may still reject it, but PG's past carelessness and looseness leads to > legitimate questions about the accuracy and acceptance of the pre-DP- > era texts. "Corrupt" may be a strong word (and inaccurate), but not > placing "textual integrity" as #1 (including the perception of textual > integrity) is simply wrong. Note that perceptions are just as real as > reality itself. I would be careful about making a distinction between pre and post DP-era texts. Since the creation of DP, there have been countless texts added to the PG collection from other sources, without page images saved, or indication of exact editions used. Project Gutenberg has always been open to accepting texts from any source, as long as they can be copyright-cleared. I don't see any likelihood of this changing.
Andrew From sly at victoria.tc.ca Thu Nov 11 14:51:04 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 11 14:51:15 2004 Subject: [gutvol-d] Cataloging In-Reply-To: <4192C5A6.3020006@adelaide.edu.au> References: <5.2.0.9.0.20041110081643.01fb6b10@snoopy2.trkhosting.com> <41924575.2030708@perathoner.de> <4192C5A6.3020006@adelaide.edu.au> Message-ID: (Yes, there is a mailing list for discussing cataloging issues, but it seems to have very little traffic, and I feel I may have a better chance of sharing my ideas with people here.) On Thu, 11 Nov 2004, Steve Thomas wrote: > The key problem is one of scale. Do you limit the field to a > short list of valid terms ("fiction", "history", ...) and risk > them being too broad to be useful, or do you allow a longer list > with greater precision, and risk the list being too long to be > manageable? > > Sorry, I don't have an answer to that. Needs debate. I don't have an answer either. So I'll ask a question: Is it possible to have both large and small scale? Here is one possible way this could be approached: In the recent discussion on this list regarding cataloging, I've seen mention of different things that I might label genre, form and subject. Genre would be examples such as Science Fiction, Mystery, Historical Fiction, etc. Form would be examples such as novel, essays, drama, poetry, short stories, etc., as Steve mentioned is coded in the MARC 008 field. Subject would be the subject headings one could find in a traditional library's catalog. For example: Legends--British Columbia--Vancouver We already have some examples creeping into the PG catalog of trying to cover all of these in the Subject field. (i.e. a collection of poems with "Subject: Poetry". This should be used for a book which is _about_ poetry, not one which merely contains poetry.)
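[Andrew's three divisions could live in separate fields of a catalog record rather than being crowded into a single Subject field. A minimal sketch; the field names and the tiny controlled vocabularies here are illustrative assumptions, not a proposal for actual PG practice:]

```python
from dataclasses import dataclass, field

# Illustrative controlled vocabularies; a real list would need
# the debate Steve asks for.
GENRES = {"Science Fiction", "Mystery", "Historical Fiction"}
FORMS = {"novel", "essays", "drama", "poetry", "short stories"}

@dataclass
class CatalogRecord:
    title: str
    author: str
    genre: set = field(default_factory=set)      # e.g. "Mystery"
    form: set = field(default_factory=set)       # e.g. "poetry"
    subject: list = field(default_factory=list)  # LCSH-style strings

    def validate(self):
        # A collection of poems gets form "poetry"; subject "Poetry"
        # stays reserved for books *about* poetry.
        unknown = (self.genre - GENRES) | (self.form - FORMS)
        if unknown:
            raise ValueError("uncontrolled terms: %r" % unknown)

rec = CatalogRecord("Example Title", "Example Author",
                    form={"poetry"},
                    subject=["Legends--British Columbia--Vancouver"])
rec.validate()  # passes: "poetry" is in the controlled form list
```

[Keeping genre and form as small controlled lists while leaving subject free-form is one way to have both the "small scale" and the "large scale" at once.]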
All three of these divisions could really be of great use to people using the catalog; however, having enough volunteer effort to have them consistently entered is of course a sticking point. Andrew From gbnewby at pglaf.org Thu Nov 11 18:24:13 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Nov 11 18:24:13 2004 Subject: Did this get slipped in without discussion? (was Re: [gutvol-d] Cleaning up messes) In-Reply-To: References: Message-ID: <20041112022413.GB8242@pglaf.org> (I redirected to gutvol-d@lists.pglaf.org. Who sent this to Lyris @ listserv.unc.edu? That server is broken, the list there is defunct. I have been trying to delete the list there for months, but the software is perpetually non-responsive) This will probably be my last response to Jon. Clearly, some people can't take "yes, go for it!" as an answer. Jon wants to tell other people how they should do things, but is unwilling to make things happen himself. He insists there is a "right" way of doing things, and belittles the efforts of those who don't fit his notions. My view is that Jon will not be content until all the people working on PG are ousted, in favor of his preferred organization, governance, fundraising, production rules, and collection guidelines. This is not going to happen anytime soon, and other than being critical of the status quo, Jon has contributed nothing towards making it happen anyway. Instead, Jon has repeatedly been offered the ability -- with support and encouragement -- to create the organization or content he so strongly desires. Some people can't take "yes" for an answer, or are not content with the ability to control their own domain without controlling others. A few more comments: On Thu, Nov 11, 2004 at 01:35:04PM -0700, Jon Noring wrote: > Greg Newby wrote: > > > I have only a few brief things to say about this. 
Jon, and other > > interested persons, are very much welcome to start their own projects, > > sub-projects or related activities to pursue this agenda, or other > > agendas. We (the messy ones) will provide encouragement and support. > > > > We have pretty extensive wording on this philosophy and encouragement > > in the "FAQ" items Michael and I wrote, online at > > > > http://gutenberg.org/about > > I urge all the PG people reading this to read Michael Hart's statement > of the principles of PG governance given in > > http://www.gutenberg.org/about/faq1 > > (Notice the date of it from June, and edited in October, after much of > the discussion about the organization and governance of PG. As far as > I know, this was silently put up without any announcement to the > group.) > > Was this statement of principles run by the actual owners of PG, the > thousands and thousands of volunteers who have donated their untold > hours of time to further Project Gutenberg? Did they get a chance to > discuss and approve of this statement? There were announcements with requests for feedback in about 6 *months* of weekly & monthly newsletters, with advance copies going back to around May. There was a posting to the front page of gutenberg.org, for months and months. There were at least a few mentions on gutvol-d. > Or is PG a "benevolent" dictatorship, where the volunteers-at-large > are not given any real say? You know better. > So much for democracy and decentralization, where "less is more." > (Orwell?) I see PG primarily as a meritocracy. Always, the pattern is to enable, empower, support and encourage those who want to do things to further the mission - or related activities. The people who do the most are the most active in shaping policy and future direction. Your insinuation that there are central power brokers who are insulated from the many people who are contributing is inconsistent with how things -- *all* things -- get done. > Who owns Project Gutenberg, anyway? 
Until that is clarified, nothing can be resolved. You know the answer to this, too. You are simply trying to stir up discontent and create an "us vs. them" atmosphere. For those who, unlike Jon, don't know: visit http://gutenberg.org/fundraising for a quick rundown. An even quicker rundown: - Michael created Project Gutenberg, and owns the trademarked name, "Project Gutenberg" - PGLAF was formed in 2001 as the legal entity that operates Project Gutenberg - PGLAF has four board members, including me. I'm also the CEO. - I am a volunteer for PGLAF, and have worked with PG since 1992. The extent to which Michael, or I, or PGLAF, has sway over the daily activities of PG is limited. Set direction: yes. Control some of the technologies: to some extent. Get people to do stuff: only as they agree & desire. The ability of Jon or anyone else to take leadership and make things happen is just as strong as mine, or anyone's. Flinging mud because so few people subscribe to your view of reality is certainly not going to create progress towards your goals. > > Finally and most importantly, I utterly reject Jon's accusation that > > the lack of source matter or other metadata (or formatting, or > > anything else) makes the Project Gutenberg content of today or > > yesterday "corrupt." > > Let me clarify (again) below, what I wrote in a separate message. You > may still reject it, but PG's past carelessness and looseness leads to > legitimate questions about the accuracy and acceptance of the pre-DP- > era texts. "Corrupt" may be a strong word (and inaccurate), but not > placing "textual integrity" as #1 (including the perception of textual > integrity) is simply wrong. Note that perceptions are just as real as > reality itself. > > I take it PG's official position, then, is that PG will continue > with the policy of not requiring the source information to be included > in the metadata associated with each PG text? If this policy is to > continue, why? Yes. There is very little that is required, and as the FAQs mentioned above say quite clearly, we intend to keep it that way. > If this policy has been changed, then that calls into > question those texts where the pedigree is unknown. Question them all you want. Or don't even read them. But if you want to fix them, get started, rather than talking about carelessness, inaccuracy, lack of textual integrity, etc. As I mentioned, I'm tired of saying "yes" to you, and then having you argue about it. You have all the freedom you could possibly want to do things your way. What you cannot have is control of the past or present of PG. > I'm pretty certain that the vast, overwhelming majority of PG > volunteers who do take a position either way on this issue want the > full source information to be included in the metadata. > > As I said in the followup clarification: > > "A clarification... > > "Note that certainly any third-party can attempt to verify the > authenticity of a PG text even if the source information is not known > and no scans are given. However, not giving the source work (and not > making the scans immediately available), the third-party has a much > more difficult time in verifying the text. You are envisioning frustrated scholars and others who care about such things. Those are not our target audience, and never have been. While it's likely that some such scholars have "turned off" to PG, I can tell you that there are close to zero requests for such pedigree information that come in on a monthly basis. In short, you are trying to portray your pet peeve as a universal truth, desired by all. First, again (and as stated in literally *every* PG header, for decades): we do not try to keep our books in accord with any particular print edition. We are not catering to scholars who care about particular dead trees sources. Second, I do not accept your idea that this is a major impediment to use and acceptance by scholars, or anyone else.
This is pure speculation on your part (regardless of whether it's backed up by a few personal stories), and counter-indicated by the uses and support requests we hear about. Finally, and perhaps most importantly from your point of view, I'm still saying "yes," not "no." I will be perfectly happy, overjoyed even, to have better tracking of source information, richer markup, and available scans for more of our eBooks. I expect that part of our cataloging discussion outcomes will be better facilities for doing this -- as will the outcomes of the PGTEI markup. But as I keep saying, (a) the lack of pedigree, scans, etc. is not going to stop us from adding submitted eBooks; (b) people who want to retroactively work on existing eBooks are welcome to do so. > "PG, by identifying the source document, *and* providing scans, adds a > lot of credibility (and greater usability) to the digitized texts it > produces and distributes. This action effectively says: "We are proud > of our work, and stand behind it fully. We even provide you, the user, > with full information about its pedigree, and the original page scans > are available for your use and easy verification." Once again, you are belittling the efforts of everyone who created these works. Did you ever hear the story about flies, honey & vinegar? > "Of course, it also aids in copyright clearance having the original > scans and full source information available. Scholars and researchers, > too, will now find the collection to be sufficiently authoritative for > their purposes, where now it is NOT. If PG wishes to become Big League, > it has to begin playing Big League ball." Your view of Big League ball for eBooks seems to include the following: - stating that all the work of past & current volunteers is crap. Or was it just "corrupt," or "careless?" Or "loose" and "messy?"
- dictating that all new content from all sources must include pedigree information and scans, and may only remain true to the printed dead trees edition - accepting only complete markup allowing for re-creation of the original printed word In my final words, I again encourage you to start your own effort to make such things happen. Use the PG mailing lists & newsletter to solicit like-minded participants. Work with DP to spin off your own projects there, or your own independent DP-like effort. Play in the big league. Cater to scholars. Include only the works you think pass muster. Build your own constituency. Meanwhile, you might want to review the documents in http://gutenberg.org/about and see again why your efforts to belittle past efforts or pursue your agenda to restrict current activities are rejected. -- Greg From Bowerbird at aol.com Thu Nov 11 20:00:52 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Nov 11 20:01:11 2004 Subject: [gutvol-d] a few questions that i don't know the answer to Message-ID: <9d.529136af.2ec58f74@aol.com> here are a few questions before i skip on out of here... first, for jon, and/or any other c.s.s. supporters out there: my finding is that, at least using internet explorer, when i copy c.s.s.-indented text (such as that in a block-quote) out of the browser-window, the indentation gets lost... (i'm not sure whether other browsers are able to retain the indentation.) do you have a solution to this problem? second, for greg. people over at distributed proofreaders have reported that the f.a.q. here at project gutenberg do not state that styled text (specifically, italics and bold) be marked with underbars and asterisks in the text files. the understanding i have from you is that this has become the official policy of project gutenberg. if that's not the case, would you please inform people here? and if it _is_ the case, when you next update the f.a.q., could you include this policy? thank you.
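(for the record, here's a minimal sketch of how the convention i'm describing could be handled mechanically -- the function name and regexes are mine, illustrative only, not any official p.g. or d.p. tool:)

```python
import re

def styled_to_html(text):
    """convert the plain-text styling convention under discussion
    (_underbars_ for italics, *asterisks* for bold) into html tags.
    an illustrative sketch only -- not an official pg/dp tool."""
    text = re.sub(r'_([^_]+)_', r'<i>\1</i>', text)    # _word_ -> <i>word</i>
    text = re.sub(r'\*([^*]+)\*', r'<b>\1</b>', text)  # *word* -> <b>word</b>
    return text

print(styled_to_html("a _truly_ *bold* claim"))
# a <i>truly</i> <b>bold</b> claim
```

the point being: if the convention were actually stated in the f.a.q., a dozen lines like these would let any downstream tool recover the styling from the plain-text files.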
the next questions involve long-standing and often-repeated requests that i have made for other changes to p.g. policy. having received no satisfaction, in spite of the reasonableness of these requests, i will pursue a strategy of lobbying for them to a wider audience, but i make them again here for the record. 1. could you _please_ strive for consistency in your e-books? 2. could you please ensure the policy on styled text is upheld? 3. could you please start including graphic-file-names in your plain-text versions, so my viewer-app knows what to display where? 4. could you please start including page-break information in your plain-text versions, so my viewer-app can use original page-breaks for those end-users that might desire that capability? 5. could you please start including line-break information in your plain-text versions, for the same reason? my documentation on zen markup language (z.m.l.) will demonstrate how you can incorporate these requests into your plain-text files... i'll continue to monitor this listserve until the end of the year, so you can respond here, or backchannel, whichever you prefer. not that i expect a response, since i've never gotten one so far. david widger and jim tinsley, i salute you for all the hard work you do. j. michael, thanks for being the one person who always treated me fairly here. and carlo, your record was almost as good; and as i've told you backchannel, you are one of the very few people here who quite consistently demonstrates a solid grasp on the problems and has good ideas about how to build the solutions to those problems, so keep on asserting yourself, because these people really need you. michael hart, if there's anything i can do for you, let me know. 
-bowerbird From lofstrom at lava.net Thu Nov 11 20:45:02 2004 From: lofstrom at lava.net (Karen Lofstrom) Date: Thu Nov 11 20:45:18 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041112022413.GB8242@pglaf.org> References: <20041112022413.GB8242@pglaf.org> Message-ID: On Thu, 11 Nov 2004, Greg Newby wrote: > You are envisioning frustrated scholars and others who care > about such things. Those are not our target audience, and never > have been. While it's likely that some such scholars have "turned off" > to PG, I can tell you that there are close to zero requests for > such pedigree information that come in on a monthly basis. Well, yes, because scholars AREN'T using PG, that's why you don't get any requests. At DP, we're processing things that no one but a scholar will ever read. Ever. I'm proofreading one of Canon Sells' books about Islam. No one who is interested in current, up-to-date information is going to read this book. It's antiquated. However, some scholar working on a book re "history of Western perceptions of Islam" might be thrilled to get access to an old out-of-print work. If he/she feels the work is reliable, that is. If you don't want to cater to scholars, you're throwing away much of DP's work. -- Karen Lofstrom {Zora on DP} From traverso at dm.unipi.it Thu Nov 11 21:37:34 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Nov 11 21:37:53 2004 Subject: [gutvol-d] PG audience In-Reply-To: (message from Karen Lofstrom on Thu, 11 Nov 2004 18:45:02 -1000 (HST)) References: <20041112022413.GB8242@pglaf.org> Message-ID: <200411120537.iAC5bYHk016230@posso.dm.unipi.it> One of the problems is that, until recently, the whitewashers removed the information on the origin of the book, like the date of the edition, publisher, etc.; and the change in policy has not been sufficiently advertised, so some people (even at DP) remove the information to conform to the perceived PG policy.
We should at least change the official policy to recommend including the full information on the sources (as well as information on e.g. page numbers when it is useful, e.g. when there is an index or cross-references by pages, or when the origin is a standard reference). I believe that PG has space for everything: combined editions, abridged editions (provided they are stated to be abridged editions...), scholarly editions. Which is which should, however, be stated, and be accessible through the catalogue. Cataloguing work may be distributed. I am sure that at DP a cataloguing step done by specialized volunteers might be added, and probably extended to non-DP submissions. The same team might be willing to update the existing items, starting from past DP contributions but extending to the other PG items. But please let us start to have sound cataloguing procedures for the future. For example, PG should have a separate whitewashing step for the catalogue (that might be done by a separate team, the competences required being different). Carlo From gbnewby at pglaf.org Thu Nov 11 23:33:29 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Nov 11 23:33:31 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) In-Reply-To: <9d.529136af.2ec58f74@aol.com> References: <9d.529136af.2ec58f74@aol.com> Message-ID: <20041112073329.GA18841@pglaf.org> On Thu, Nov 11, 2004 at 11:00:52PM -0500, Bowerbird@aol.com wrote: > second, for greg. people over at distributed proofreaders > have reported that the f.a.q. here at project gutenberg > do not state that styled text (specifically, italics and bold) > be marked with underbars and asterisks in the text files. > the understanding i have from you is that this has become > the official policy of project gutenberg. if that's not the case, > would you please inform people here? and if it _is_ the case, > when you next update the f.a.q., could you include this policy? > thank you.
Jim maintains the FAQ, and DP has their own style guides that sometimes vary for different texts. So, I'm not really the right guy to ask. I don't think there was agreement on how to handle bold & italics, but I do think everyone I heard from agreed it should be indicated somehow in plain text. So, I don't think there is an official policy on handling bold & italics in plain text files. But if DP has an official policy I'm unaware of, then it should probably be reflected in the FAQ as a recommendation. Sorry I don't know the current state on this, but perhaps Jim or some of the DP project managers can contribute the latest thinking. -- Greg From Bowerbird at aol.com Thu Nov 11 23:41:53 2004 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Nov 11 23:42:14 2004 Subject: [gutvol-d] goodbye Message-ID: <1d1.2be96420.2ec5c341@aol.com> nearly exactly one year ago, i came on the gutvol listserves, challenging the gutvol-p markup wonks as all-talk/no-action, noting that in spite of years of discussion, they hadn't made any of the promises of a heavy-markup strategy materialize, and indeed hadn't even gotten many of the e-texts marked up! a year later, after an exchange of literally hundreds of messages -- maybe thousands, i don't know, i wasn't really counting them -- many of them inflammatory, the situation is essentially the same. i tried to wake people up to a simpler way that was still effective. but nobody here was willing to hear it. in fact, i was "moderated", blamed for the flaming that my detractors used to victimize me. i don't play the victim role though, so i'm off now, to work on my own. i'll drop by again in another year, and see if y'all have made any progress by then, or if you're still stuck in the same old merry-go-round circles... in the meantime, i'll start up a blog, telling the world all the things that i think you're doing wrong. because i think they deserve to learn them. 
i tried to tell you all this personally, as friends, here on your own lists, but you weren't willing to listen. so now i'm going public. i tell you this so you know i'm not going "behind your backs". heck, you might want to read the blog yourself; maybe it'll help you see what you've been missing. here's a message i posted a while back. you can call it "21 steps to happiness". ------------------------------------------------------------------------------ here's a little _overview_ to help you get your bearings, at least in regard to _my_ work, _my_ viewer-program, _my_ format, _my_ markup system, and _my_ philosophy. 1. the e-texts -- as they are now -- must be regularized. 2. i can write programs to do most of that automatically. 3. the results need to be checked for quality control, and 4. some missing information will need to be re-inserted. 5. once that is done, the files will be _finished_, in that 6. my viewer will present them as high-powered e-books. 7. users can push a button to create high-end .html files, 8. or save text as an .rtf file, or print out to paper or .pdf, 9. in a way that gives 'em customized high-quality output. 10. my program will do text-to-speech, and screenshots, 11. and let people explore the project gutenberg library, 12. and easily report errors they encounter in any e-text. 13. those error-correction reports will be automatically 14. routed to a system that presents all the material, so 15. a human only has to say "yes" to approve the mod, and 16. change-logs will be updated and a notice distributed. 17. this e-text standardization and ease of handling will 18. nurture a flowering of synergistic uses of the library 19. by an array of creative and imaginative programmers 20. that will engender a book-driven revolution in thought. 21. and everyone will live happily ever after. the end. 
------------------------------------------------------------------------------ in the absence of a more compelling vision from anyone else, i now depart... -bowerbird From bill at truthdb.org Fri Nov 12 00:04:48 2004 From: bill at truthdb.org (bill jenness) Date: Fri Nov 12 00:05:31 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <200411111727.iABHRtHL008551@posso.dm.unipi.it> References: <200411111727.iABHRtHL008551@posso.dm.unipi.it> Message-ID: <32924.134.117.137.186.1100246688.squirrel@134.117.137.186> >>>>>> "Scott" == Scott Lawton >>>>>> writes: > > >> If enough people would like to contribute a brief synopsis for > >> texts in the collection, we already have a place in the catalog > >> they can go. (although I don't know about the mechanics behind > >> it) > > Scott> What we don't have are links on the book's Web page which > Scott> say: > Scott> Add a summary > Scott> Add a review > > > Scott> the person's contribution is posted automatically and > Scott> immediately. People want to see an immediate benefit from > Scott> their effort (however modest), rather than remembering to > Scott> check back to see if their voice was heard. > > I see a risk in this: the easiest way to add a summary is to grab one > somewhere in the net, or in a book cover, and paste it. But it may be > copyrighted, and PG will risk being sued for this. Not because of > this, but some publisher might try to shut down PG with this excuse. > > This should be made by another web site, completely separated, and not > being an interesting target. The way of accessing this other web site > from the book catalogue page might be done in a legally safe way. > > Carlo > > I think the way to go is to have a pg wiki linked to the catalog page where the users could input reviews, literary commentary, author biographical details, and etc. This would allow DP and other producers to concentrate on producing and not get bogged down with researching extraneous useful facts. 
I am certain there are some open source wikis available that could be adapted. Perhaps the documentation side could be set up as a separate foundation. From hyphen at hyphenologist.co.uk Fri Nov 12 00:33:10 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 12 00:33:49 2004 Subject: [gutvol-d] Sources for PG texts In-Reply-To: References: Message-ID: <44t8p05ohkte7iufgtckqnr3kbikevmkcn@4ax.com> On Thu, 11 Nov 2004 13:23:42 -0800 (PST), Andrew Sly wrote: | | | On Thu, 11 Nov 2004, Jon Noring wrote: | | > Let me clarify (again) below, what I wrote in a separate message. You | > may still reject it, but PG's past carelessness and looseness leads to | > legitimate questions about the accuracy and acceptance of the pre-DP- | > era texts. "Corrupt" may be a strong word (and inaccurate), but not | > placing "textual integrity" as #1 (including the perception of textual | > integrity) is simply wrong. Note that perceptions are just as real as | > reality itself. | | | I would be careful about making a distinction between pre and post | DP-era texts. Since the creation of DP, there have been countless | texts added to the PG collection from other sources, without page | images saved, or indication of exact editions used. | | Project Gutenberg has always been open to accepting texts from | any source, as long as they can be copyright-cleared. I don't | see any likelihood of this changing. So, all you volunteers who have specialized interests and want Out of Copyright books about your special interest available forever: get working. -- Dave F From jon at noring.name Fri Nov 12 01:23:49 2004 From: jon at noring.name (Jon Noring) Date: Fri Nov 12 01:24:24 2004 Subject: [gutvol-d] My spanking, and my reply In-Reply-To: <20041112022413.GB8242@pglaf.org> References: <20041112022413.GB8242@pglaf.org> Message-ID: <109886230796.20041112022349@noring.name> Greg wrote: > Jon wrote: Wow, my backside is really sore from the spanking Greg just administered to me.
Some of the spanking was deserved, but some of it was not, imho. More on that later, but first I'd like to give some thoughts on the problems with the gutvol-d list and archive before answering several of Greg's comments. (Walking slowly...) > (I redirected to gutvol-d@lists.pglaf.org. Who sent this > to Lyris @ listserv.unc.edu? That server is broken, the list > there is defunct. I have been trying to delete the list there > for months, but the software is perpetually non-responsive) I'm not sure. I think I directed all my replies to the right place. Btw, I tried to search the gutvol-d archives with regards to the FAQ0 and FAQ1 issue (that is, how much was it really discussed on the lists as Greg said it was?), and noticed that indeed the archive appears broken -- everything before August is gone. James Linden told me that the older archives may be lost for good, at least the Lyris version. Did anyone here keep their own copy of the gutvol-d (and I suppose other gut*) archives? I've kept full backup archives of the several dozen mailing lists I've run since 1992 (by simply collecting all the emails sent out in plain text unix mbox format), but not lists I don't run, thinking that those who administer them do as I do and create redundant backups in a universal plain text format (as Michael would approve!) Since I've lately been sticking my nose into various affairs here that some think I should not, I may as well do it one more time and give another opinion that the various Gutenberg lists be moved to YahooGroups (with 2-3 people designated as backup archivists in unix mbox format -- I'll gladly volunteer to be one of the backup archivists since I already do that for over twenty lists I run and co-administer.) Why YahooGroups and not some listserv software running on PG's own server? 1) I've had experience running various listserv software since 1992, and I find a lot of time is saved when someone else does it for me as YahooGroups does.
2) YahooGroups is actually very good and reliable, and since so many people now subscribe to one or more YG lists, it's easy to subscribe to one more. My decision to move The eBook Community, now with over 2400 subscribers, to YahooGroups in 1999 has proven to be the right decision. With a custom listserv run by PG, it's just another list I have to separately subscribe to, and if I have to change my email address, it's another separate service I have to contend with. YahooGroups consolidates all my subscribed lists into an easy, manageable form that no other listserv software comes close to in power and convenience. 3) YahooGroups includes other useful services, such as a Files Area and facilitated YahooIM Chat. 4) It's free! It doesn't take up any diskspace or bandwidth on the local server. (There are the insufferable ads, though, but these are easily ignored.) 5) It is possible to extract plain text (with full headers) for every posted message to any YahooGroup. 6) It archives messages for quite a while back. The eBook Community presently has 21289 messages available in the online archive dating from 1999 -- I don't know when YahooGroups will begin lopping off the oldest ones to save space, but it hasn't yet. I have separate archives for the mirrored unix mbox archive. 7) It has a web access for those who prefer that over receiving email. 8) Administration by the moderators is a breeze. ********************************************************************** O.k., to address some of the issues Greg brings up. He is certainly angry with several of my comments today. As noted above, some of it I deserved, either in what I said or how I said something... > My view is that Jon will not be content until all the people > working on PG are ousted, in favor of his preferred organization, > governance, fundraising, production rules, and collection > guidelines. 
This is not going to happen anytime soon, and > other than being critical of the status quo, Jon has contributed > nothing towards making it happen anyway. Instead, Jon has > repeatedly been offered the ability -- with support and encouragement -- > to create the organization or content he so strongly desires. There are several related points I'd like to address here, since Greg brings up a couple I didn't really want to talk about (who otherwise cares about my motivation for being here and for what I've brought up recently?): ***** My motivation is certainly not to "take over" PG and build a dictatorship, and to kick out the old guard. Those who know me know that I'm the opposite and in fact fear the same things Michael does with respect to proprietary interests trying to defang the growth of a robust and fully available digital public domain. The OpenReader Project, which I co-founded, clearly shows my focus on open standards, open source, and creating an ebook future founded on these principles. In personality type, I am definitely a Fighting Idealist, for better and for worse. I am definitely not very politically savvy and not very diplomatic with my words, again for better and oftentimes for worse. For example, I commented earlier today, in response to a message Juliet posted, that maybe DP should consider a policy that if they don't get unencumbered page scans to put freely online (because some group is anal about their beloved source document of a public domain work), then they should not accept that situation and work around it. Who's the idealist here? (referring to PG's FAQ0 or FAQ1.) (But DP has their way of doing things and policies, which is fine. I greatly admire DP for what they have accomplished, are now doing, and fully support their vision for going to the next-level with an XML-based system. Juliet is doing an extraordinary job and has not been thanked enough for what she and her volunteers have accomplished, which borders on the remarkable. 
I am working with Juliet and Charles (who's currently on "sabbatical") to help them, as I can, with the organizational challenges in their wish to move to the next level, both in XML implementation, and in increasing their capacity to meet the challenges for the intriguing "Million Digital Texts Project.") I make no bones about having strong feelings based on the bigger picture as I see it -- and I honestly believe my vision is even bigger than Michael's. I don't believe the ad-hoc, everyone-does-it-their-own-way approach for producing etexts is sufficient any more to accomplish this Big Vision, and in fact will work against the Big Vision. Greg no doubt disagrees with me as FAQ0/1/3 outlines, but so be it -- history will be the ultimate arbiter of our differing world views. I see how inadequate the current PG collection is for the future. This evaluation is based upon three different ventures I've been involved with since 1999 (including one now in development) where this Big Vision has been, and is now being researched, by some really sharp technical people who are nailing down the many architectural and technical requirements. There are many more subtle requirements than one would at first imagine -- I'm only now beginning to understand them in a holistic sense -- and they reflect themselves all the way back to the fundamental structure of the texts themselves, and the associated metadata/catalog information. I see millions of high-quality, uniform digital texts, both public domain and Creative Commons, in a single repository which allows people to access them, annotate them, and link them together and with other texts and with other types of multimedia content in other repositories in very powerful ways that would take too long to describe here. That's one reason I state the master texts must be in well-structured XML, since that will enable the advanced features this repository will have. Properly done XML also confers many other benefits too numerous to mention here.
Both DP and PG have blessed the right XML approach (e.g., as exemplified by Marcello's PGTEI), which is very encouraging. But there's more. For reasons I won't go into here (again for brevity's sake), this Big Vision also sets slightly more stringent requirements on both metadata and cataloging than is currently done in PG, and it's the spinning wheels of the current discussion on metadata and cataloging that led to my posts this afternoon out of sheer frustration. I see no *requirements* mentioned, and no vision as to *what* the metadata/catalog information is to be used for. How can one fix the metadata requirements without a discussion of what the metadata will be used for, and useful for? It is frustrating to see all this ad-hoc activity happening with no guidance as to the who, what, when, where, why and to what extent -- the purpose of the metadata -- being resolved based on general requirements, which in turn are derived from the full and detailed vision (which is NOT given in the FAQs) of why PG exists and what it produces. Certainly I could try to force my way further into the discussion (more than I have now) and try to provide answers to these questions, but then I'll just become another voice to add to the ad-hoc cacophony we now have where the one who produces something first wins, even if it ends up not meeting the full long-term goals. This is the result of the FAQ0 and FAQ1 philosophy, which does not always give the results one hopes for. To get resolution on tough issues it is oftentimes necessary for the leadership to take charge and to firmly guide discussion to logically resolve what must be done. In some ways, it may be that the "leadership" simply doesn't have the time (because it is voluntary) to formalize the process to force a structured approach to fast decision-making and buy-in to the result. Understandable, but sad.
What I fear the most, and this I've expressed to Brewster Kahle (whom I meet again next week about Project Gramophone) and to JD Lasica (who's launching the ourmedia project and I'm assisting with the metadata/cataloging side) is that many people will develop these wonderful repositories of digital content (I'm also working on Project Gramophone/Sound Preserve to transfer and archive millions of old sound recordings), with billions of digital objects, which simply won't and can't "talk" with each other, because everyone is "doing their own thing" PG-style. Wheeee, the late 60's all over again. Let me give a small example to illustrate just a corner of what the world could be like if everything is done properly: Imagine someone creating a video for ourmedia where someone is playing the piano, say "Take the 'A' Train", composed by Billy Strayhorn and which became Duke Ellington's theme song. We would want to be able to allow the viewer to link, if they so choose, with the song lyric repository, with various wikipedia entries, and to Sound Preserve to bring up orchestral recordings of "Take The 'A' Train" by Duke Ellington and others. We'd also like to link to the Project Gutenberg collection for any works, such as Duke Ellington's book "Music is My Mistress" (assuming PG got permission to add it, likely not.) And of course we'd allow the end-user to join special communities built around any particular topic connected with that song -- such as Ellington communities, jazz communities, Strayhorn communities, etc. Doing all of this (and a lot more) confers a few added requirements, especially with regards to metadata information (text has the redeeming grace that it is fairly easy to dig out some information by full text searching -- but not standardized subject matter fields! -- but it is much harder with video and audio, so the metadata and cataloging requirements for video and audio will likely be more stringent and extensive.)
PG's self-enforced isolation, because of its seeming fear of working with the Big Boys (which is somewhat understandable), is working against PG in various ways in seeing the bigger picture of how the text production activities it is catalyzing will mesh with this much bigger, more wonderful world. But if the various repositories don't do it right from the start, including Project Gutenberg, and they end up with millions and billions of digital objects *not done right*, then the interlinkage will be much more difficult and nowhere near as powerful and useful as it could be. It will be essentially impossible to fix after the fact. JD Lasica now recognizes this and is supporting somewhat expanded metadata standards to assure inter-repository linkage, but I don't see the PG "leadership" seeing this, nor am I confident it can because of the FAQ0/1/3 constraints. Note how PG is having difficulty fixing the metadata and catalog info for a *measly* 10,000 or so texts. Imagine having a million of them *not done right* (especially with regards to metadata and catalog information requiring human input -- for some digital objects, if the data is not collected right at the start, it will be impossible to figure it out much later, even with human intervention. So much for the power of our digital future.) (Part of the Big Vision calls for aiding integration using James Linden's very interesting "Open Genesis" concept, currently under development. James is probably not yet ready to discuss this, but it is best described as the "Semantic Web Done Right From the Start." The requirements Open Genesis confers upon digital content repositories are surprisingly quite minimal -- but it is needed to have a standardized framework to improve inter-repository and inter-object linking. Marcello's effort to bring RDF into the mix is laudable and will certainly aid more robust intra- and inter-repository linking.)
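To make the metadata point concrete, here is a rough sketch (mine, not Marcello's actual schema -- the element choices and all the field values are illustrative assumptions) of what a per-text Dublin Core record carrying source-edition information might look like, generated with nothing but the Python standard library:

```python
# A rough sketch of a per-text Dublin Core record with a source field.
# The element choices and sample values are illustrative assumptions,
# not PG's actual catalog schema.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def make_record(title, creator, source_edition):
    # Build a minimal <record> with dc:title, dc:creator, and dc:source
    # (the "pedigree" field argued for above).
    rec = ET.Element("record")
    for tag, value in (("title", title),
                       ("creator", creator),
                       ("source", source_edition)):
        ET.SubElement(rec, f"{{{DC}}}{tag}").text = value
    return ET.tostring(rec, encoding="unicode")

print(make_record("Example Title", "Example Author",
                  "Hypothetical Press, London, 1898 (2nd edition)"))
```

The point is not this particular serialization; it is that a required dc:source field (or equivalent) costs almost nothing to record at production time, and is nearly impossible to reconstruct after the fact.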
I'd love to see PG take the lead to make this happen for the text side of the house, and that's my motivation in pressing a lot of issues here to the point where I may become persona non grata, but it won't happen until PG realizes that it needs to confer more requirements on the texts and metadata it catalyzes and collects from the many volunteers (outside of DP, which is doing things mostly right by my reckoning), as well as to more actively work with other repositories -- to become a part of the bigger world rather than isolating itself as it seems to. It needs one or two full-time people -- this costs some $$$ -- this requires a somewhat higher level of organization and maybe a slightly different governance to even be given this $$$ (or to develop some ongoing revenue stream.) And if it wants to play a major role in the "Million Digitized Texts Project" (should it get successfully launched), it *has* to change its governance and how it interacts with the world at large. Frankly, the FAQ0 and FAQ1 documents are actually quite hostile by implying the world at large is somehow evil and out to get PG. Yes, some parts of the world at large are hostile to PG and wish it gone, but not all of them. The wisdom is to associate with your friends and those who share the same vision, not drive them away by painting everyone with the same "evil" brush. If you don't believe FAQ0 and FAQ1 send this message to those in various outside groups, I suggest the wording of FAQ0/1 be looked at again for what it doesn't say but should say. For example, there's little in there about building close strategic partnerships with other like-minded organizations, and working together on common standards and common goals. Nothing is mentioned there about joining standards and other types of organizations so as to promote PG's interests.
PG has become disturbingly xenophobic in orientation -- it acts as if the rest of the world either does not exist or does exist and is evil, and as if magic will always automatically happen if you simply let everyone do their own thing. Magic does happen often, but magic can also run out. To answer Greg's "I don't take Yes for an answer" (which is, interestingly enough, the phrase William Safire used in today's New York Times to describe Arafat's 1999 refusal of unbelievable concessions by the Israelis), let me say that I am working hard on the vision. I'm coordinating with ourmedia, with Project Gramophone (now called Sound Preserve), and working with another venture dedicated to tying this all together and to launching the "Million Digitized Texts Project." Will we succeed in at least launching MDTP? Maybe. Maybe not. But I am taking Greg's "Yes for an answer" to heart and I am working on it as I envision it -- it's just that it is not restricted to the closed world of PG, so that's why it seems somewhat out of lockstep with what is going on here. But if we do succeed in launching MDTP and the Bigger Vision it will be a part of, and if PG wants to play a *major* role with MDTP -- and I'd certainly welcome PG and its "leader volunteers" to jump onboard for many obvious reasons -- PG will have to change in certain ways simply to work as a major player with the MDTP project. If PG decides it would rather not change its governance and focus by raising its text and metadata standards (which really are not that demanding), then that's totally understandable -- PG could still play a role, but it would essentially be peripheral, and the parade may end up marching by it. ***** On another point, if I expressed wording reflecting hostility to those who have contributed texts to the PG collection over the years, this was not my intent, and I apologize for it. 
I've typed in whole books by hand, and then laboriously proofed them, marked them up, and converted them into ebooks, so I am familiar firsthand with this labor of love. Some of the books being talked about here -- the very difficult 17th/18th century texts -- are a remarkable achievement to digitize (and to mark up as well). It amazes me the commitment many people here have to digitizing texts. My comments were directed at the leadership for not following what I believe are slightly more stringent policies with regard to metadata and text formatting requirements (some of which are understandable given where things were in the early 1990s). I'm a firm believer in the principle of "the buck stops here". That is, if there are problems, it is the responsibility of the PG leadership due to their prior decisions and established system. It may be unfair at times, since it is impossible to accurately predict the future and to develop the right approach to meet that future (e.g., Michael Hart's early allergy to including source information in texts appeared to be a protection mechanism against copyright infringement claims). But nevertheless it is up to the leadership to take responsibility, adjust accordingly, and to pro-actively "fix it". Maybe some of the problems are best solved by the ad-hoc, hands-off approach as given in FAQ0/1/3, but I don't believe all problems with the PG Collection will be solved by this approach, especially when looking at the useful linkage of the PG collection with other content repositories as outlined above, which requires an integrated approach and working cooperatively with other groups. ***** On a point related to what I wrote earlier, I'm troubled by this view that PG's collection should be focused toward a particular use niche, rather than designed to be useful for just about every use. 
As I've analyzed things, the added requirements to make PG digital texts useful not only for general reading, but for scholarship and research (plus linking to other repositories), are so few that ignoring them is downright puzzling. What is needed? Well, require that the source info be included in the metadata -- that's the major one. The next one is to work hard to acquire and preserve page scans. There are likely a few other requirements which are even less burdensome. The vast majority of the effort to produce digital texts from paper copy is to scan (or type in) the book and then proofread it. The rest of the added work to make the texts more useful is minuscule by comparison in time and effort. This reminds me of a Minnesota-Norwegian joke about the Norwegian who tried to swim across a lake -- when he got 95% of the way to the other side, he decided he couldn't make it, and swam back. It's ludicrous not to make that extra 5% effort, and elevate the PG collection to a significantly higher plane of usefulness, quality, and digital integrity (talked about next). This is especially tragic given the hundreds of thousands of hours already devoted to the PG collection, when that extra 5% (if that) would have made a significant improvement. ***** And about digital integrity, I stick to my position that anything which PG requires to increase the digital integrity of the text itself relative to the original source is a Good Thing (tm). Certainly deviations from the source must be allowed, such as correcting some obvious typesetting errors. (As an aside, has PG established a uniform policy for what types of edits/corrections in the digital text are allowed? Or is this again one of those FAQ0/1 "let's not interfere with anyone" type of things?) But what I mean by digital integrity has to do with the faithfulness, or more importantly, the perception of faithfulness, of the *meaning* of the text to the original source. 
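[Editorial note: the source-info requirement described above is small enough to check mechanically. The sketch below is purely illustrative -- the field names and the `check_source_metadata` helper are hypothetical, and do not correspond to any actual PG or DP metadata schema.]

```python
# Hypothetical source-edition fields a catalog record should carry
# before a text is accepted into the collection.
REQUIRED_SOURCE_FIELDS = ("title", "author", "publisher", "pub_place", "pub_year")

def check_source_metadata(record):
    """Return the list of missing or empty source-edition fields."""
    return [f for f in REQUIRED_SOURCE_FIELDS if not record.get(f)]

record = {
    "title": "The Kalevala",
    "author": "Elias Lonnrot (comp.)",
    "publisher": "",          # stripped during whitewashing
    "pub_place": "",
    "pub_year": "1888",
}
missing = check_source_metadata(record)
# missing == ["publisher", "pub_place"] -- the record would be flagged
```

Such a check costs nothing at upload time, which is the "extra 5%" argument in miniature.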
It's a legitimate question to ask whether those involved in producing digital texts took more liberties with the text than they should have. This is not a trivial issue when we look at history, where censorship is the norm. Certainly, as Greg pointed out, the source texts themselves may have been grossly edited contrary to the author's original intent (if it was not the first edition, for example), but we must not add to this problem in any way (instead, let's also do the first edition!) In addition, I believe one intent of PG is to assist with the effort to assure the digital texts will survive into the distant future, to hopefully survive wars, revolutions, totalitarianism, digital "book burnings", etc. As the centuries roll by, the issue of digital integrity becomes more and more important for the integrity of the information being passed on to future generations. That is why I believe it is necessary for PG to establish policies for new texts, and to begin working on upgrading some of the existing texts at the appropriate time, to standardize the digital integrity requirements as much as possible, and more importantly to acquire and preserve the original page scans whenever possible. Having the original page scans available side-by-side with the digital texts also benefits everyone (and the Big Vision) by resolving any difficulties in the presentation of the digital texts (we all know how weird some texts are), and by fighting against claims of copyright infringement. Contrary to Michael Hart's early policy of hiding the pedigree of digital texts, having the page scans available, so long as our copyright clearance procedure is sufficient, actually strengthens PG against claims of copyright infringement. ***** As a final note, I do agree with several who responded today about my call for redoing the older PG texts, saying we should wait until DP moves to the next-generation XML-based system before redoing these texts. I definitely agree as I think about it. 
What I think could be done, however, is to prepare for this eventuality by 1) flagging those texts we'd like to redo someday, 2) searching for higher-quality source books which will give us *unencumbered* page scans, and then 3) filing those page scans away in the archive for later conversion to digital text at the appropriate time. There's nothing wrong with decoupling the scanning stage from the proofreading stage. No doubt my answers will not satisfy everyone, and may not satisfy anyone. But after my spanking, I needed to reply, and in one case apologize. Jon Noring From traverso at dm.unipi.it Fri Nov 12 03:41:04 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Nov 12 03:41:30 2004 Subject: [gutvol-d] My spanking , and my reply In-Reply-To: <109886230796.20041112022349@noring.name> (message from Jon Noring on Fri, 12 Nov 2004 02:23:49 -0700) References: <20041112022413.GB8242@pglaf.org> <109886230796.20041112022349@noring.name> Message-ID: <200411121141.iACBf4iN022809@posso.dm.unipi.it> I have kept all PG mail since I subscribed in September 2001. It needs to be sorted into the different lists, might contain some extraneous items, and might miss something. If somebody wants them to reconstruct the archives, I'll be glad to contribute them. I'm not making them immediately available, since I would first have to check that nothing private is contained there, as my filtering is not always accurate. I dislike YahooGroups; I by far prefer a pglaf-based mailman. 
Carlo From jmk at his.com Fri Nov 12 04:04:25 2004 From: jmk at his.com (Janet Kegg) Date: Fri Nov 12 04:04:46 2004 Subject: [gutvol-d] PG audience In-Reply-To: <200411120537.iAC5bYHk016230@posso.dm.unipi.it> References: <20041112022413.GB8242@pglaf.org> <200411120537.iAC5bYHk016230@posso.dm.unipi.it> Message-ID: On Fri, 12 Nov 2004 06:37:34 +0100, you wrote: > >One of the problems is that, until recently, the whitewashers removed >the information on the origin of the book, like date of the edition, >publisher, etc; and the change in policy has not been sufficiently >advertised, so some people (even at DP) remove the information to >conform to the perceived PG policy. Until recently? I've been regularly including publisher, place of publication, and date in DP books I've uploaded to PG. Except in a few cases earlier this year, all but the date have been deleted by the WW. This has been mildly bugging me for a while--since I do see other new PG eBooks with publisher information included. And as long as I'm delurking, I'll mention that my DP projects include project comments with quoted biographical info on the author (from Web sources, and usually other Web links). Would it be somehow useful if I included the URL to the DP project page in the comments section of the upload form? -- Janet Kegg From marcello at perathoner.de Fri Nov 12 05:03:31 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:03:34 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <20041112022413.GB8242@pglaf.org> Message-ID: <4194B4A3.3050305@perathoner.de> Karen Lofstrom wrote: > At DP, we're processing things that no one but a scholar will ever read. > Ever. I'm proofreading one of Canon Sells' books about Islam. No one who > is interested in current, up-to-date information is going to read this > book. It's antiquated. The Koran makes the Top 20 of our downloads and is much older. 
> However, some scholar working on a book re > "history of Western perceptions of Islam" might be thrilled to get access > to an old out-of-print work. If he/she feels the work is reliable, that > is. The problem lieth not within PG. It lieth within Academia. Academia has to adapt its methods and processes to the new world, where information resources are ephemeral. If you cite a dead-tree edition of something, you are quite confident that the cited text stays put. It won't change its wording or glide from the cited page into the next, etc. If you cite an electronic resource, you have no such confidence. How do you make sure that the text at the URL you cite will not be edited or removed? You cannot. How do you make sure the medium you cite will still be readable in some years? In a hundred years, reading a CDROM may be harder than it was to read the Rosetta Stone. > If you don't want to cater to scholars, you're throwing away much of DP's > work. It's not our problem. Any amount of catering will not do away with Academia's perceived "limitations" of electronic media. The best value for Academia (and the least work for us) would be just to include the page scans. Any transcription you make will fall short of the requirements of some scholar. I think we should use our time producing more books for a general audience rather than producing Academia-certified editions of them. -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 12 05:10:35 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:10:39 2004 Subject: [gutvol-d] goodbye In-Reply-To: <1d1.2be96420.2ec5c341@aol.com> References: <1d1.2be96420.2ec5c341@aol.com> Message-ID: <4194B64B.3010301@perathoner.de> Bowerbird@aol.com wrote: > in the meantime, i'll start up a blog, telling the world all the things that > i think you're doing wrong. That will help. But don't let your departure be delayed by poor little me. 
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 12 05:20:25 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:20:29 2004 Subject: [gutvol-d] [BP] The Future of eBooks In-Reply-To: <32924.134.117.137.186.1100246688.squirrel@134.117.137.186> References: <200411111727.iABHRtHL008551@posso.dm.unipi.it> <32924.134.117.137.186.1100246688.squirrel@134.117.137.186> Message-ID: <4194B899.7000907@perathoner.de> bill jenness wrote: > I think the way to go is to have a pg wiki linked to the catalog page > where the users could input reviews, literary commentary, author > biographical details, etc. I think you underestimate the maintenance work that goes into a wiki. Please go over to Wikipedia and read the Talk pages for some controversial topic, e.g. Israel vs. the Islamic World. Or read the vote pages where competing groups try to get the other group's pages removed by vote. I sure don't want to spend my day inside the wiki admin page for "The Koran", "The Communist Manifesto", or other works with high controversial potential. > This would allow DP and other producers to > concentrate on producing and not get bogged down with researching > extraneous useful facts. Do you know for a fact that they are bogged down? > I am certain there are some open source wikis > available that could be adapted. Perhaps the documentation side could be > set up as a separate foundation. Go ahead. Get your wiki started. If you reach critical mass we'll implement links from the bibrec pages to your wiki. 
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Fri Nov 12 05:23:42 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 12 05:23:45 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <20041112022413.GB8242@pglaf.org> <200411120537.iAC5bYHk016230@posso.dm.unipi.it> Message-ID: <4194B95E.1060607@perathoner.de> Janet Kegg wrote: > Would it be somehow useful if I > include the url to the DP project page in the comments section of the > upload form? It would be useful to include the dp project number in some form. We have a discussion ongoing with Joshua on how to achieve this. -- Marcello Perathoner webmaster@gutenberg.org From bkeir at pgdp.net Fri Nov 12 05:57:58 2004 From: bkeir at pgdp.net (bkeir@pgdp.net) Date: Fri Nov 12 05:58:02 2004 Subject: [gutvol-d] Scholarly acceptance Message-ID: <36207.203.12.144.232.1100267878.squirrel@203.12.144.232> The many discussions I've had with academics about PG and DP point to their unshakable distrust, sight unseen, of the quality of work done by "unqualified" volunteer/amateurs. "You mean you let ANYONE do your proofreading??!?!?" is both a question I was asked, and a fair summary of their attitude of incredulity. The open-source, distributed but computer-linked volunteer paradigm is still too new in the world for its strengths, and the quality of its productions, to be trusted by the average academic. Give it a few decades, and population replacement MAY change this. 
From hart at pglaf.org Fri Nov 12 06:10:51 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 12 06:10:54 2004 Subject: [gutvol-d] Scholarly acceptance In-Reply-To: <36207.203.12.144.232.1100267878.squirrel@203.12.144.232> References: <36207.203.12.144.232.1100267878.squirrel@203.12.144.232> Message-ID: On Sat, 13 Nov 2004 bkeir@pgdp.net wrote: > The many discussions I've had with academics about PG and DP point to > their unshakable distrust, sight unseen, of the quality of work done by > "unqualified" volunteer/amateurs. > > "You mean you let ANYONE do your proofreading??!?!?" is both a question I > was asked, and a fair summary of their attitude of incredulity. > > The open-source, distributed but computer-linked volunteer paradigm is > still too new in the world for its strengths, and the quality of its > productions, to be trusted by the average academic. Give it a few decades, > and population replacement MAY change this. On the other hand, scholars and librarians around the world have also said just the opposite, remarking VERY positively about our collections of Robert Louis Stevenson, Charles Dickens, and many others. The truth is that there will always be those who can't abide anything "not invented here." This ranges from messages we have received insisting that ONLY the sender's favorite edition should be used, and that all others should be denied a place in ANY eBook library. On the other hand, there is always the Darwinian approach: those who do not use eBooks simply won't be able to keep up with those who do. This might be one of the best reasons for NOT giving them each eBook as an exact copy of a particular paper edition. I've also heard that many of those who complain actually use our eBooks in secret, and ONLY want the provenance so they can steal them without giving credit where credit is due. 
Apparently they feel they can't actually take them publicly, because they don't want to give credit to Project Gutenberg, but if they know which paper edition we used, they can bypass giving us any credit. Somehow this reminds me of Napoleon, in "Animal Farm". . . . Michael From hart at pglaf.org Fri Nov 12 06:23:29 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 12 06:23:31 2004 Subject: [gutvol-d] PG audience In-Reply-To: <4194B4A3.3050305@perathoner.de> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: On Fri, 12 Nov 2004, Marcello Perathoner wrote: > Karen Lofstrom wrote: > >> At DP, we're processing things that no one but a scholar will ever read. >> Ever. I'm proofreading one of Canon Sells' books about Islam. No one who >> is interested in current, up-to-date information is going to read this >> book. It's antiquated. > > The Koran makes the Top 20 of our downloads and is much older. > >> However, some scholar working on a book re >> "history of Western perceptions of Islam" might be thrilled to get access >> to an old out-of-print work. If he/she feels the work is reliable, that >> is. > > The problem lieth not within PG. It lieth within Academia. > > Academia has to adapt its methods and processes to the new world where > information resources are ephemeral. Actually, Project Gutenberg eBooks have proven much less ephemeral than paper books published in the same period, as all of the Project Gutenberg eBooks have been available continuously from their first day of release, while most paper books from over 5 years ago are no longer in print. > If you cite a dead tree edition of something you are quite confident that the > cited text stays put. It wont change its wording or glide from the cited page > into the next etc. But only if you find the exact same paper edition. > If you cite an electronic resource you have no such confidence. 
How do you > make sure that the text at the url you cite will not be edited or removed? > You cannot. Actually, it's pretty easy to find all the original Project Gutenberg eBooks, as well as the newer versions, because so many places keep them, usually in the thousands for any of our eBooks that have been out for even a week. > How do you make sure the medium you cite will still be readable > in some years? In a hundred years reading a CDROM may be harder than it was > to read the rosetta stone. There are SO many copies of each Project Gutenberg eBook out there that the question of a particular medium becomes irrelevant. . .when you download a copy of Huck Finn, you never know at your end whether it is stored on a CDROM, DVD, RAID, Terabrick, or even a floppy. Most of you don't realize that less than 20 years ago our eBooks were available from my BBS, and that the entire BBS ran on hi-density floppy drives. The fact that the eBooks are independent of the medium, and of hardware or software requirements, in "Unlimited Distribution" is what makes them last longer than anything else on the entire Internet. Where else can you find files that were originally posted 33 years ago? Michael From hart at pglaf.org Fri Nov 12 06:31:23 2004 From: hart at pglaf.org (Michael Hart) Date: Fri Nov 12 06:31:25 2004 Subject: [gutvol-d] PG audience In-Reply-To: <4194B4A3.3050305@perathoner.de> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: On Fri, 12 Nov 2004, Marcello Perathoner wrote: >> Karen Lofstrom wrote: > The problem lieth not within PG. It lieth within Academia. I must agree. Academia is perhaps the worst when it comes to the "not invented here" syndrome. . .and it pays the price by lagging behind. > >> If you don't want to cater to scholars, you're throwing away much of DP's >> work. > > Its not our problem. Any amount of catering will not do away with Academias > perceived "limitations" of electronic media. 
That is, until they take over the eBooks, and claim them as their own. >> If you don't want to cater to scholars, >> you're throwing away much of DP's work. If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist. The real value of the work lies in making it available to the masses, not to the scholars. If we can increase literacy by even 10%, we make more difference than if we cater to the scholars. > The best value for Academia (and the least work for us) would be just to > include the page scans. Any transcription you make will fall short of the > requirements of some scholar. I think we should use our time for producing > more books for a general audience instead than producing Academia-certified > editions of them. Hear Hear! Michael From tb at baechler.net Fri Nov 12 07:11:46 2004 From: tb at baechler.net (Tony Baechler) Date: Fri Nov 12 07:10:06 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> At 06:23 AM 11/12/2004 -0800, you wrote: >Actually, it's pretty easy to find all the original Project Gutenberg eBooks, >as well as the newer versions, because so many places keep them, usually in >the thousands for any of our eBooks that have been out for even a week. Hello. Actually, I've had a hard time finding any of the very early editions of PG files. There are some old files in the etext90 directory, but not edition 10 of the first several ebooks. I would be interested to find the very first edition of when10.txt or whatever it was called as MH posted it. Even the old GUTINDEX.* files have been removed, with the earliest being GUTINDEX.96 when it used to be GUTINDEX.90. 
From nwolcott2 at kreative.net Fri Nov 12 07:08:31 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 12 07:26:02 2004 Subject: [gutvol-d] Gone with the wind is "Gone with the wind" Message-ID: <006d01c4c8cb$e4d38440$069595ce@net> Sayonara. Apparently all versions of GWTW have disappeared from the net. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041112/77ed0a7f/attachment.html From nwolcott2 at kreative.net Fri Nov 12 07:22:08 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Fri Nov 12 07:26:05 2004 Subject: [gutvol-d] Perfection Message-ID: <006e01c4c8cb$e5c621a0$069595ce@net> Instead of worrying about perfection, we would be better advised to fix the many texts which are or have become unreadable. It is also uncomfortable, when there are several translations of a work with the same title and an anonymous translator, to have the publisher routinely or randomly removed. Also there are many DOS texts with accents that are hence unreadable. Any code page should be acceptable? Maybe, but. . . Also, although there are explicit directions for submitting a text, there is apparently no explicit provision for correcting or updating one, even one I contributed. Also, at random apparently, a little preamble I have added to help the reader identify the text or its possible shortcomings is removed. Although many texts have no unique provenance, as MH has advised, that is no reason for removing any hint of provenance when one is supplied by a contributor. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest keeping the inkpots full. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041112/d52a5c13/attachment.html From mbuch at mcsp.com Fri Nov 12 07:48:18 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Fri Nov 12 07:46:27 2004 Subject: [gutvol-d] Perfection In-Reply-To: <006e01c4c8cb$e5c621a0$069595ce@net> Message-ID: And herein lies some of the problem. I'm a college professor, and I recently earned my PhD. I would have had a hard time getting a text past my professors without being able to document who published it. I would have a hard time making a citation to a document with no pages. I would be very annoyed with a student who just pointed to something on the net that had no provenance whatsoever -- even many pieces of ephemera have provenance. I don't think this is a matter of fuddy-duddy professors who just don't understand how wonderful e-books are; I think the very concept of e-books as it now stands, while excellent for casual readers or people who simply want to educate themselves, is deeply flawed. When I am citing a text, I cannot refer to a vague document. I need to know EXACTLY when the original was published, who published it, and where, since there are variant texts out there. Even a single word change that might have occurred in the copying process could change the meaning of a vital sentence. PG is wonderful -- but as a student and a teacher, I don't think that most cybertexts provide the citability that is so important for academics. If PG was the only source in the world for vital texts, that would be one thing -- but it isn't. I love PG, and I send students to it all the time -- but only for the purpose of reading. I would not send a student to a PG text in order to make a citation. I have no way of knowing where many of the texts came from, whether the edition copied was a variant on the original, what page the information appeared on in the original copy, or anything else. In the social sciences and liberal arts, these things are very important. 
It is the soul of how we check for plagiarism, understand the history of a work, and make specific references. PG is great for when I want to read a Tom Swift book or understand the human genome -- but it doesn't help me if I need to explain the migration in the ideas of Franz Boas over time and through editions of his works, or examine the changes between editions of Dust Tracks on a Road. -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Norm Wolcott Sent: Friday, November 12, 2004 10:22 AM To: Project Gutenberg Volunteer Discussion Cc: Norm Wolcott Subject: [gutvol-d] Perfection Instead of worrying about perfection, we would be better advised to fix the many texts which are or have become unreadable. It is also uncomfortable, when there are several translations of a work with the same title and an anonymous translator to havve the publisher routinely or randomly removed. Also there are many DOS texts with accents that are hence unreadable. Any code page should be acceptable? maybe but. . . Also although there are explicit directions for submitting a text, correcting one or updataing one, even one I contributed, has apparently no explicit provision. Also, at random apparently, a little preamble I have added to help the reader identify the text or its possible shortcomings is removed. Although many texts shave no unique provenance as MH has advised, but that is no reason for removing any hint of preovenance when one is supplied by a contributor. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest keeping the inkpots full. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041112/3467f280/attachment.html From mbuch at mcsp.com Fri Nov 12 08:12:39 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Fri Nov 12 08:10:47 2004 Subject: [gutvol-d] PG audience In-Reply-To: Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Hart Sent: Friday, November 12, 2004 9:31 AM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] PG audience On Fri, 12 Nov 2004, Marcello Perathoner wrote: >> Karen Lofstrom wrote: > The problem lieth not within PG. It lieth within Academia. I must agree. Academia is perhaps the worst when it comes to the "not invented here" syndrome. . .and it pays the price by lagging behind. Sometimes it's not a matter of lagging behind. Academia has different needs and goals than the casual reader. I'm an academic, and I will use PG with undergrads -- but tell them to go to paper books for citations. Why? Because provenance is important in citation. My students tend to think everything on the net is 'true' -- they don't understand that books on the net may or may not reflect scholarly knowledge or acceptance. And often the divisions are too large for useful citation -- the page is not only a piece of paper. It's a unit of citation. Page 193 in the 3rd edition of a particular book by a particular publisher is page 193 in every copy, and contains a finite number of words. Chapter 23 may have a finite number of words, but how do I find the sentence I want to cite? Plus, the edition used on PG might not be the standard -- it might be a variant. Variant problems are crucial when trying to read poetry and literature for scholarly purposes. Chefs aren't 'lagging behind' just because most of them still chop food by hand instead of using Cuisinarts. They can control the texture and shape of what they cook much better by using an old-fashioned blade. 
On the other hand, electric mixers are much more efficient for making cakes and can do a better job than a person beating eggs and butter by hand -- which is why pastry chefs use machines most of the time. > >> If you don't want to cater to scholars, you're throwing away much of DP's >> work. > > Its not our problem. Any amount of catering will not do away with Academias > perceived "limitations" of electronic media. That is, until they take over the eBooks, and claim them as their own. We probably won't, unless we can find ways of making exact facsimile scans of books with page numbers, citations, illustrations, and so on. Are musicians silly because they choose to play instruments instead of having machines do all the work? No. Machines, no matter how good they are, don't have the same warmth that physical instruments have. Even if one day they do, I doubt all the instruments in the world will be thrown away. Why do you care whether academics cite PG? You seem to think they should come to you -- did you ever think we have this thing called a 'page' that acts as a standard unit of knowledge, and that when we cite something, we need that page to stay reasonably stable? And it does, even with the vagaries of publishing. PG is great, but most of the books you publish aren't the sorts of things that would be useful to a grad student anyway -- or even an undergrad, most of the time. For people who want a book on the go, who are looking for an out-of-print book for nostalgia's sake, for people who need to change print size for readability, PG is perfect. But it's not very useful for citations, any more than TV science programs are. I do think that dedicated proofers can do a great deal, and should be applauded. They can have exactitude. But that's not the problem. The problem is provenance. If you wanted academics to accept you, you would have to provide that, and maybe have experts on particular books vet them. 
>> If you don't want to cater to scholars,
>> you're throwing away much of DP's work.

If we cater to scholars, we are only expanding the "digital divide," so to speak. Our goal is to provide a large viable library to all, not just to the scholars, who represent less than 1% of the people, and are often very elitist.

I agree, and I'm a scholar. Stop worrying about what we think. PG has shown me books I couldn't enjoy otherwise. Scholars don't read scholarly books all the time, and they have places to go for that.

The real value of the work lies in making it available to the masses, not to the scholars. If we can increase literacy by even 10%, we make more difference than if we cater to the scholars.

> The best value for Academia (and the least work for us) would be just to
> include the page scans. Any transcription you make will fall short of the
> requirements of some scholar. I think we should use our time for producing
> more books for a general audience instead of producing Academia-certified
> editions of them.

Hear, hear!

I agree- but I would love to see page scans. I don't think that most casual readers (and by that I even include 'serious' readers who do not use written material for citation) understand why pagination is so important to scholars. That's fine. But please stop assuming that we're all Luddites just because PG is pretty much useless to us academically. Hey- professional basketball players sometimes play one-on-one for fun; that doesn't mean they have to take such play seriously for it to have value.
Michael _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From joshua at hutchinson.net Fri Nov 12 08:28:33 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 08:28:38 2004 Subject: [gutvol-d] a few questions that i don't know the answer to Message-ID: <20041112162833.3C91E9E793@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Bowerbird@aol.com > > > the next questions involve long-standing and often-repeated > requests that i have made for other changes to p.g. policy. > having received no satisfaction, in spite of the reasonableness > of these requests, i will pursue a strategy of lobbying for them > to a wider audience, but i make them again here for the record. > > 1. could you _please_ strive for consistency in your e-books? We do. We don't always succeed, but we do strive for it. > > 2. could you please ensure the policy on styled text is upheld? > The whitewashers most definitely do. As far as the style goes (beyond _ and *, we don't specify much). > 3. could you please start including graphic-file-names in your > plain-text versions, so my viewer-app knows what to display where? > Nope. Plain-text means ... plain text, nothing more. > 4. could you please start including page-break information in your > plain-text versions, so my viewer-app can use original page-breaks > for those end-users that might desire that capability? > Once again, nope. Text doesn't provide a way to have this information in the file without it being jarringly placed in the middle of the flow of the text. Other formats can provide this, but plain text cannot. > 5. could you please start including line-break information in your > plain-text versions, for the same reason? > Nope (see answer for the above). This would be even more invasive to the reading experience, because you'd have a character or markup of some kind all over the place. 
And plain text has to be able to be fully used in any old text editor/reader. We can't assume the reader program will hide the line break information.

> my documentation on zen markup language (z.m.l.) will demonstrate
> how you can incorporate these requests into your plain-text files...

Sigh. And once they are marked up, they are no longer plain text files. They are z.m.l. files. If you want to convert the entire catalog to z.m.l., feel free. No one will use them, but feel free.

Josh

From jon at noring.name Fri Nov 12 08:32:44 2004
From: jon at noring.name (Jon Noring)
Date: Fri Nov 12 08:32:54 2004
Subject: [gutvol-d] PG audience
In-Reply-To: <4194B4A3.3050305@perathoner.de>
References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de>
Message-ID: <111911965953.20041112093244@noring.name>

Marcello wrote:
> Karen Lofstrom wrote:

> If you cite a dead tree edition of something you are quite confident
> that the cited text stays put. It won't change its wording or glide from
> the cited page into the next etc.
>
> If you cite an electronic resource you have no such confidence. How do
> you make sure that the text at the url you cite will not be edited or
> removed? You cannot. How do you make sure the medium you cite will still
> be readable in some years? In a hundred years reading a CDROM may be
> harder than it was to read the Rosetta Stone.

Actually, this issue can be dealt with using hash functions. Once a digital document is finished and archived, simply calculate a hash value for it (or the set of files the work comprises.) Use a published, open-standards hashing algorithm -- there are many out there to choose from. It's also possible to use digital signatures in some manner, but I'll let the experts in this area discuss this possibility.

Textual integrity is definitely an issue, and it goes beyond just keeping academics happy -- it is germane to the perceived integrity of the entire collection of texts by society-at-large.
By keeping the page scans along with the digital texts, we are, in effect, telling the users of the digital texts that we fully stand by the textual integrity of the collection, that we did not pull any fast ones, and that it can be trusted. We are putting our reputation on the line.

With using digital hashes and digital signatures, and redundant/mirrored text repositories, we go a long way towards assuring the collection maintains its integrity. As others have noted, some dictator or totalitarian regime in the future may break into one of the repositories and start tweaking texts. So long as the whole world does not revert to totalitarianism (where then we have much bigger problems than the integrity of texts), then with a properly designed repository it will always be possible to restore the original digital texts from a clean, untouched digital repository. Hopefully individuals will also keep digital texts lying around, but again here we also need to keep in mind individuals can also tweak the texts, thus the use of hashing/digital signatures is still needed.

>> If you don't want to cater to scholars, you're throwing away much of DP's
>> work.

> It's not our problem. Any amount of catering will not do away with
> Academia's perceived "limitations" of electronic media.

I don't have such a pessimistic view of academia. Yes, academics are strange birds. But as the old generation dies, and a new generation arises, familiar with accessing digital information, they will embrace digital media with a fervor. PG can certainly make its texts "academia friendly", or at least reasonably so. The incremental effort (delta-t) to do the few more things to make PG texts more academia-friendly is pretty small compared to the overall time it takes to scan/type/OCR/proof a text. And many of these added things have other small benefits outside of academia itself, benefits for other user groups of PG texts.
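The hashing scheme Jon describes above can be sketched in a few lines. This is a minimal illustration, not anything PG actually runs; the function name and file layout are invented. It computes one SHA-256 digest (a published, open-standards algorithm) over the set of files a work comprises, in a fixed order, so the digest can be published once and later recomputed to detect any tampering:

```python
import hashlib

def digest_of_work(paths):
    """Compute a single SHA-256 digest over a set of files, in sorted
    order, so an archived work can later be verified byte-for-byte."""
    h = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as f:
            # Read in chunks so large scan files don't need to fit in memory.
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
    return h.hexdigest()
```

Recomputing the digest on a mirror and comparing it to the published value is then enough to show the text "stays put" in Marcello's sense, even though the medium is electronic.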
> The best value for Academia (and the least work for us) would be just to
> include the page scans. Any transcription you make will fall short of
> the requirements of some scholar. I think we should use our time for
> producing more books for a general audience instead of producing
> Academia-certified editions of them.

It behooves PG to at least reasonably reach out to the requirements of "academia" (which is not as monolithic as implied) in markup and metadata, and include the original page scans for every work. That's all that can be done and should be done.

Making the page scans available has purposes beyond just keeping academics happy. For example, someone may wish to issue a retypeset print edition of some work using the XML-based PG texts. Having the original page scans there to verify document structure and layout oddities will be useful to those doing final proofing of the output typography.

And as noted above, having the original page scans available to future generations is a further protection of the textual integrity of the digital text. It also has the side-benefit of being a digital preservation of the original source, and this alone is a very powerful argument to keep the page scans as an honored and integral part of the PG collection -- it will greatly add value and purpose to the PG collection. Disk space and bandwidth are no longer an issue (well, no longer the major, show-stopper issue they were a decade ago.)

It mystifies me why the original page scans are treated by some here as some sort of waste product, meant to be flushed down the toilet when done, or that we don't need to preserve them, or need to have access to them (I'm still surprised to hear that the scans for some of the DP texts are not available to the public because of licensing issues.)
Jon Noring

From joshua at hutchinson.net Fri Nov 12 09:00:17 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Nov 12 09:00:22 2004
Subject: [gutvol-d] PG audience
Message-ID: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com>

----- Original Message -----
From: Marcello Perathoner
>
> Janet Kegg wrote:
>
> > Would it be somehow useful if I
> > include the url to the DP project page in the comments section of the
> > upload form?
>
> It would be useful to include the dp project number in some form. We
> have a discussion ongoing with Joshua on how to achieve this.

Yep, pretty easy to implement in the teiHeader. And then it can be used by Marcello's script to link back to the DP project comments fairly easily. At this point, I figure it will be in there in the final specs.

Josh

From joshua at hutchinson.net Fri Nov 12 09:06:02 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Nov 12 09:06:07 2004
Subject: [gutvol-d] Scholarly acceptance
Message-ID: <20041112170602.983BB109901@ws6-4.us4.outblaze.com>

----- Original Message -----
From: Michael Hart
>
> On the other hand, there is always the Darwinian approach:
>
> Those who do not use eBooks simply won't be able to keep up with
> those who do.
>
> This might be one of the best reasons for NOT giving them each
> eBook as an exact copy of a particular paper edition.

Ok, I'm not following the mental jump from point A to point B. How does people being savvy with eBooks lead to not putting good bibliographic information in place in the book? Dead tree editions will usually have information on how the text was obtained. Why not ours?
Josh From joshua at hutchinson.net Fri Nov 12 09:20:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 09:20:55 2004 Subject: [gutvol-d] PG audience Message-ID: <20041112172049.3FA5EEDC4E@ws6-1.us4.outblaze.com> ----- Original Message ----- From: Jon Noring > > It mystifies me why the original page scans are treated by some here > as some sort of waste product, meant to be flushed down the toilet > when done, or that we don't need to preserve them, or need to have > access to them (I'm still surprised to hear that the scans for some of > the DP texts are not available to the public because of licensing > issues.) > Just to clarify a little... DP produces a large quantity of work based on scans produced by other organizations. At different points in our history, we couldn't have kept up with the proofers any other way. With some organizations, we are able to automate the "harvesting" of images with utilities. Some, though, are willing to work with us, sending us the files or providing an easy access path. In return, we agree (if they ask) not to post the images for use other than as proofing sources. The credit lines always have a reference back to the original site for the images. Basically, if we wanted to be rude, we could post the images any way we wanted to. The copyright laws certainly seem to be in our favor if we did. But these sites are going out of their way to help us, so we return the favor by routing people that want to see the images (the fruits of their labor) back to them. Josh From shalesller at writeme.com Fri Nov 12 09:49:44 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 12 09:49:56 2004 Subject: [gutvol-d] Perfection Message-ID: <20041112174944.77DE74BE64@ws1-1.us4.outblaze.com> "Norm Wolcott" writes: > Also there are many DOS texts with accents that are hence > unreadable. Any code page should be acceptable? maybe but. . . Can you point to one? 
I was under the impression that all the texts in old DOS codepages were updated to use Latin-1.

-- ___________________________________________________________
Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm

From marcello at perathoner.de Fri Nov 12 10:31:51 2004
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Nov 12 10:31:59 2004
Subject: [gutvol-d] PG audience
In-Reply-To:
References:
Message-ID: <41950197.2020707@perathoner.de>

Her Serene Highness wrote:
> Why do you care whether academics cite PG? You seem to think they should
> come to you- did you ever think we have this thing called a 'page' that acts
> as a standard unit of knowledge, and that when we cite something, we need
> that page to stay reasonably stable?

Did it ever occur to you that the "page" as "standard unit of knowledge" is a purely arbitrary thing? The standard unit of knowledge depends on the information technology of the epoch. It first was the "cave wall", then became the "clay tablet", then it became the "scroll", then it became the "page" and today it is the "internet resource". I can google any cited phrase on the net in a few keystrokes' time. OTOH to verify a quotation, it may take months until I get my hands on a physical copy of some random obscure book.

--
Marcello Perathoner
webmaster@gutenberg.org

From gbnewby at pglaf.org Fri Nov 12 10:46:48 2004
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri Nov 12 10:46:49 2004
Subject: [gutvol-d] Perfection
In-Reply-To:
References: <006e01c4c8cb$e5c621a0$069595ce@net>
Message-ID: <20041112184648.GC3160@pglaf.org>

On Fri, Nov 12, 2004 at 10:48:18AM -0500, Her Serene Highness wrote:
> And herein lies some of the problem. I'm a college professor, and I
> recently earned my PhD. I would have had a hard time getting a text past
> my professors without being able to document who published it. I would have
> a hard time making a citation to a document with no pages.
> I would be very
> annoyed with a student who just pointed to something on the net that had no
> provenance whatsoever- even many pieces of ephemera have provenance. I
> don't think this is a matter of fuddy-duddy professors who just don't
> understand how wonderful e-books are; I think the very concept of e-books
> as it now stands, while excellent for casual readers or people who simply
> want to educate themselves, is deeply flawed. When I am citing a text, I
> cannot refer to a vague document. I need to know EXACTLY when the original
> was published, who published it, and where, since there are variant texts
> out there. Even a single word change that might have occurred in the
> copying process could change the meaning of a vital sentence. PG is
> wonderful- but as a student and a teacher, I don't think that most
> cybertexts provide the citability that is so important for academics. If PG
> was the only source in the world for vital texts, that would be one thing-
> but it isn't.
>...

My Ph.D. in Information Transfer is from 1993. I've taught Internet stuff and a whole lot of other things since 1988. I went to college in 1983, and never left, holding faculty positions since 1991 - in short, I'm very much a professional academic. Here are some of my experiences related to electronic texts:

- I *have* entirely electronic articles cited in my academic vita (http://petascale.org/vita.html). Nobody (none of my deans, etc.) has even raised an eyebrow. Today, like always, peer review and the reputation of the publication are what matters, not whether it was printed.

- I have refused paper submissions of any assignments from my students for years (http://petascale.org/paperless.html), including master's theses and doctoral dissertations. Again, this is just not a problem. At the end of the degree process, we (the committee) sign a piece of paper and the student submits copies of the printed document to the library.
Then, a PDF or similar goes to various archives and Web pages, and is available for widespread free access.

- I was recently appointed Editor of the standards document series in the Global Grid Forum (http://www.ggf.org), which publishes an all-electronic document series modeled after the RFC series published by the IETF (which is much older, and is essentially the standards that define the Internet).

- Every citation format (APA, MLA, Chicago, etc.) specifies how to cite documents which are not printed. For the most part, they distinguish between ephemeral stuff like email messages and more permanent stuff like online journal articles. This is still difficult, and many people cite inappropriate items as though they were published documents rather than things like personal communication, changeable Web pages, etc. But it's certainly done, and it's done in journal articles (print & electronic), standards documents, books, newspaper articles, etc. Here's one of many good pages describing electronic citation: http://owl.english.purdue.edu/handouts/research/r_docelectric.html

In short, I'm happy to say that my experience is completely different than yours. Moreover, unlike you, I seem to have specific documents, citations and processes to back up my impressions, while you haven't provided any.

Certainly it's the case that some academic fields rely more on the exact words of a particular printed item. Hermeneutics is an example, and some others of the historical, classic & humanities disciplines. But to dismiss "academics" as being unable to deal with online content (as the subject/object of research, as support for research, or as the published outcome of research) is certainly an overstatement, and inconsistent with the experiences of me and my academic peers.
-- Greg From gbnewby at pglaf.org Fri Nov 12 10:54:00 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Fri Nov 12 10:54:02 2004 Subject: [gutvol-d] PG audience In-Reply-To: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> References: <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> Message-ID: <20041112185400.GD3160@pglaf.org> On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote: > At 06:23 AM 11/12/2004 -0800, you wrote: > > >Actually, it's pretty easy to find all the original Project Gutenberg > >eBooks, > >as well as the newer versions, because so many places keep them, usually in > >the thousands for any of our eBooks that have been out for even a week. > > Hello. Actually, I've had a hard time finding any of the very early > editions of PG files. There are some old files in the etext90 directory, > but not edition 10 of the first several ebooks. I would be interested to > find the very first edition of when10.txt or whatever it was called as MH > posted it. Even the old GUTINDEX.* files have been removed, with the > earliest being GUTINDEX.96 when it used to be GUTINDEX.90. Michael might have some of the older files. There are a few sources, like old Walnut Creek CDs, that might also be able to help. These days, we essentially never delete anything (not strictly true, but close enough... and we run a no-delete mirror for when mistakes happen). But in the past, Michael would remove older files. This was largely due to space constraints on the hosting servers. As for the GUTINDEX* files, we don't keep older files around, since they are essentially always updated weekly. I can see the reason for interest in looking back through older files, though - maybe we'll start doing this in a new subdirectory. Note that the GUTINDEX files have been through many iterations. Michael used to maintain them, then I did, and now George Davis does. 
The filenames have changed, and so has the format. For the most part, this has been simply to accommodate the changing nature of the publications, enhanced metadata (like contents listings), and other pragmatics.

Unrelated story: I needed to print GUTINDEX.ALL the other day (as part of an affidavit for another legal case I'm helping with, where we once again show there are "significant non-infringing uses" for online content). It's about 550 pages. Whew! I hope that's the only time in this decade anyone needs to print it.

--
Greg

From sly at victoria.tc.ca Fri Nov 12 10:53:40 2004
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Nov 12 10:57:48 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <006e01c4c8cb$e5c621a0$069595ce@net>
References: <006e01c4c8cb$e5c621a0$069595ce@net>
Message-ID:

On Fri, 12 Nov 2004, Norm Wolcott wrote:
> randomly removed. Also there are many DOS texts with accents that are
> hence unreadable. Any code page should be acceptable? maybe but. . .

We have a couple people who are fixing up and reposting older files (which is often more involved than simply changing character encoding). A little while ago, I heard over 400 etexts had been reposted, so it's more than that by now. Are you volunteering to help?

Andrew

From mbuch at mcsp.com Fri Nov 12 13:58:52 2004
From: mbuch at mcsp.com (Her Serene Highness)
Date: Fri Nov 12 13:57:07 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <20041112184648.GC3160@pglaf.org>
Message-ID:

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Greg Newby
Sent: Friday, November 12, 2004 1:47 PM
To: Project Gutenberg Volunteer Discussion
Subject: Re: [gutvol-d] Perfection

On Fri, Nov 12, 2004 at 10:48:18AM -0500, Her Serene Highness wrote:
> And herein lies some of the problem. I'm a college professor, and I
> recently earned my PhD. I would have had a hard time getting a text past
> my professors without being able to document who published it.
I would have > a hard time making a citation to a document with no pages. I would be very > annoyed with a student who just pointed to something on the net that had no > provenance whatsoever- even many pieces of ephemera have provenance. I > don't think this is a matter of fuddy-duddy professors who just don't > understand how wonderful e-books are; I think the very concept of e-books > as it now stands, while excellent for casual readers or people who simply > want to educate themselves, is deeply flawed. When I am citing a text, I > cannot refer to a vague document. I need to know EXACTLY when the original > was published, who published it, and where, since there are variant texts > out there. Even a single word change that might have occurred in the > copying process could change the meaning of a vital sentence. PG is > wonderful- but as a student and a teacher, I don't think that most > cybertexts provide the citability that is so important for academics. If PG > was the only source in the world for vital texts, that would be one thing- > but it isn't. >... My Ph.D. in Information Transfer is from 1993. I've taught Internet stuff and a whole lot of other things since 1988. I went to college in 1983, and never left, holding faculty positions since 1991 - in short, I'm very much a professional academic. Here are some of my experiences related to electronic texts: - I *have* entirely electronic articles cited in my academic vita (http://petascale.org/vita.html). Nobody (none of my deans, etc.) has even raised an eyebrow. Today, like always, peer review and the reputation of the publication are what matters, not whether it was printed. Agreed. I have no reason to doubt you. However- you did say that your work is in Information Transfer, right? 
Do you think there might be a teensy bit of difference between a reference by an Information Transfer academic that is from an electronic journal and was published for other academics in that and related fields, and a citation of, say, Emily Dickinson's poetry without information as to when the book it was taken from was published- considering that it is now known that many earlier copies of Dickinson used incorrect punctuation because previous editors messed around with them?

I'd have no problem accepting or using a citation of the US Census online- I've done it. I've used citations of NYS divorce and sexual offense law from online sources- no problem. All of those are frequently updated. But a citation of an out of print book in anthropology, English literature, the hard sciences, et al., which might very well not be correct in its information- that will be problematic.

I would be very happy to see Boas online. Eventually I hope to track down an out of copyright version of his writings and scan it for PG. I'd like to do the same with Zora Neale Hurston, Ruth Benedict, and quite a few other people. However- and this is the big 'however'- while these texts would be useful for casual and serious non-academic readers, and even for many academic readers as a point of reference, their usefulness would be seriously impaired without info as to who originally published the books and when. Boas' works vary according to edition- therefore, knowing which edition you are reading can matter if you are doing research on his theories. If I were doing online research in a general fashion on the history of anthropology, it wouldn't matter. If I were writing a scholarly work, it would. It would also matter if there was no pagination. Again- I'm not talking about materials produced in the past twenty years. I'm talking about historical materials. They are not entirely electronic.

Another example- I'm tutoring a 15 year old about the incidents that led up to WW2.
We go online and find the Treaty of Versailles. He can cite it- not only is it a well-known document (making it easy to check for errors and lacunae), but each section of the treaty is numbered. It's easy for him to refer to Article 15 in a paper, and easy for a teacher to find the section in an online document. I would encourage him to use it in class, and to do an internet citation- no problem. But if he was to try to cite Winston Churchill's autobiography from an online site (not that it's online) or Mein Kampf (which probably is), he'd run up against a problem. In chapter 5 there might be a very quotable sentence- but what my student doesn't know is that this sentence was changed in later editions. And there's no page number- does he tell his teacher to read the entire chapter to find a sentence that won't be there in a later edition?

The last time I looked at PG (a few weeks ago) I found it very easy to read books if I wanted to read the whole text. If I wanted to find chapters or pages I had hard luck- I had to scan through whole documents.

You don't have to believe me. Just find this quote. It's from The Koran. "And thou takest vengeance on us only because we have believed on the signs of our Lord when they came to us. Lord! pour out constancy upon us, and cause us to die Muslims." It's in Sura VII. I have no doubt that you'll find it- but it will take you quite a while to do so with no page numbers and no way to go to each section separately. As a teacher, I don't have time to read half The Koran (that's a hint, by the way) to find this one quote on PG. I can however find websites that will make the search much easier for me, and will provide some info on the translation. After all, I have no idea who JM Rodwell was, or whether his translation of The Koran is the definitive English version, or why his translation was chosen- other than that his book was out of copyright. From my point of view, that's a red flag itself.
If this translation is so superb, why isn't it still being used- or is it?

to the library. Then, a PDF or similar goes to various archives and Web pages, and is available for widespread free access.

- I was recently appointed Editor of the standards document series in the Global Grid Forum (http://www.ggf.org), which publishes an all-electronic document series modeled after the RFC series published by the IETF (which is much older, and is essentially the standards that define the Internet).

- Every citation format (APA, MLA, Chicago, etc.) specifies how to cite documents which are not printed. For the most part, they distinguish between ephemeral stuff like email messages and more permanent stuff like online journal articles. This is still difficult, and many people cite inappropriate items as though they were published documents rather than things like personal communication, changeable Web pages, etc. But it's certainly done, and it's done in journal articles (print & electronic), standards documents, books, newspaper articles, etc. Here's one of many good pages describing electronic citation: http://owl.english.purdue.edu/handouts/research/r_docelectric.html

I'm aware of that. As I stated above, I've used electronic citations, even when professors raised eyebrows. But you are not dealing with my particular statements, which have nothing to do with the citation of contemporary documents and ephemera, or with copies of documents that make searches for particular passages much easier for readers and writers. I was very specific in my criticism- and since you have a degree in Information Transfer and have taught Library Science, it ought to be of concern to you, too. But being an expert in Information Transfer is not the same thing as doing research using out of print documents. Your business is making them readable and accessible, which is important.
From where I stand that is important too, but less important than being able to consistently find passages, and checking to see the differences according to editions. Nietzsche's work, for instance, was butchered by his sister. There are conflicting copies of his work floating around. When his works were copied for Project Gutenberg, did someone go for an out of copyright copy that is definitive, or one that his sister chopped up? Did that matter, or was it just more important to get a copy up?

Cattle ranchers, butchers, and chefs all deal with meat. That doesn't make a chef an expert on cattle feed or a butcher an expert on how to best prepare beef in orange sauce. We may both be involved in academia, but our concerns regarding information technology might be very different- that doesn't mean that one or both of us are idiots, or that I'm a Luddite, or that you're a geek with no appreciation for what's inside the books you put up.

support for research, or as the published outcome of research) is certainly an overstatement, and inconsistent with the experiences of me and my academic peers.

--
Greg

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From joshua at hutchinson.net Fri Nov 12 14:14:26 2004
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Fri Nov 12 14:14:36 2004
Subject: [gutvol-d] Perfection
Message-ID: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com>

----- Original Message -----
From: "Her Serene Highness"
>
> You don't have to believe me. Just find this quote. It's from The Koran.
> "And thou takest vengeance on us only because we have believed on the signs
> of our Lord when they came to us. Lord! pour out constancy upon us, and
> cause us to die Muslims." It's in Sura VII. I have no doubt that you'll
> find it- but it will take you quite a while to do so with no page numbers
> and no way to go to each section separately.
Not trying to be a smart-### here, but I tried your example.

Time to open PG's website and search for Koran... ~25 seconds.

Clicked on the Koran link, it downloaded quickly (thanks to a T1 here at work! ;) ~ 5 seconds.

Control-F, paste in the first four words from your quote ("And thou takest vengeance), hit return ... first hit was the right one. 5 seconds at most.

Total time from reading your paragraph to reading the passage from the Koran... 35 seconds.

This is why electronic citation is so much BETTER. If someone points me to the file they cited from, a quick search will turn it up in seconds, as opposed to finding the book, flipping to the page and skimming down through the text to find the quoted material.

And just so you know I did find the material in the Koran...

I will surely cut off your hands and feet on opposite sides; then will I have you all crucified." They said, "Verily, to our Lord do we return; And thou takest vengeance on us only because we have believed on the signs of our Lord when they came to us. Lord! pour out constancy upon us, and cause us to die Muslims." Then said the chiefs of Pharaoh's people-"Wilt thou let Moses and his people go to spread disorders in our land, and desert thee and thy gods?" He said, "We will cause their male children to be slain and preserve their females alive: and verily we shall be masters over them."

****

Ah, the oneness of religion! Christianity is built upon the foundation of Judaism ... Islam references both ... Yet religious reasons are given for so much fighting.

Josh

From jlinden at projectgutenberg.ca Fri Nov 12 12:18:46 2004
From: jlinden at projectgutenberg.ca (James Linden)
Date: Fri Nov 12 14:18:45 2004
Subject: [gutvol-d] Perfection
In-Reply-To:
Message-ID:

A simple implementation of the id="" HTML attribute would solve the issues regarding quoting a particular sentence or paragraph...
for example: http://kodekrash.com/project/btw_ufs.html#p191 -- will put you right at a paragraph talking about learning multiplication before cube roots (in Booker T Washington's autobiography).

If we had decent master versions of the texts, such features would be child's play... I _will_not_ go into the "master versions" rant again tho.

-- James

From kris at transitory.org Fri Nov 12 15:24:14 2004
From: kris at transitory.org (kris foster)
Date: Fri Nov 12 15:24:25 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com>
References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com>
Message-ID: <20041112182216.Y99646@krweb.net>

> This is why electronic citation is so much BETTER. If someone points me
> to the file they cited from, a quick search will turn it up in seconds
> as opposed to finding the book, flipping to the page and skimming down
> through the text to find the quoted material.

this is a dangerous reliance on a transitory medium. electronic citation is merely more convenient.

--kris

From shalesller at writeme.com Fri Nov 12 15:45:27 2004
From: shalesller at writeme.com (D. Starner)
Date: Fri Nov 12 15:45:40 2004
Subject: [gutvol-d] Perfection
Message-ID: <20041112234527.EF05E4BE64@ws1-1.us4.outblaze.com>

Let me note, I had no way of telling Greg's comments apart from yours except for context. Perhaps you relied on some HTML thing; please don't do so. I'm not going to argue the wisdom of HTML email, but HTML email that does not degrade nicely to plain text is going to look awful to many of the receivers.

"Her Serene Highness" writes:
> But a
> citation of an out of print book in anthropology, English literature, the
> hard sciences, et al, which might very well not be correct in its
> information- that will be problematic.

But this has nothing to do with etexts; this has to do with older books.

> > I would be very happy to see Boas online.
Eventually I hope to track down
> an out of copyright version of his writings and scan it for PG.

It'll be a long time, unless you move to Canada. The last of his works are in copyright for another 7 years in the EU and 33 years in the US. The Bureau of American Ethnology volumes are being worked on up to 1930 (since it's a US government publication) and I believe that includes some work by Boas.

> In chapter 5
> there might be a very quotable sentence- but what my student doesn't know is
> that this sentence was changed in later editions. And there's no page
> number- does he tell his teacher to read the entire chapter to find a
> sentence that won't be there in a later edition?

What is he supposed to do, give a page reference to one of a dozen editions that might be very hard for the teacher to find? With etexts, you know that your recipient has access to the same edition you have. And as someone else pointed out, if you quote the sentence, the context can be found in seconds.

> After all, I
> have no idea who JM Rodwell was, or whether his translation of The Koran is
> the definitive English version, or why his translation was chosen- other
> than that his book was out of copyright. From my point of view, that's a red
> flag itself. If this translation is so superb, why isn't it still being
> used- or is it?

And how do I know that if I pull it off the library shelves? My college library has a half dozen different translations of the Koran; how am I to know which are in use? As for the reason it's not being used, I would suggest that the fact that academics like to retranslate everything every decade might be an explanation. My class used a modern translation of the Iliad, but that doesn't mean that in several hundred years of English translation of the work that's now public domain, there's not one competent, even superb translation.

> Nietzsche's work for instance, was butchered by his sister. There are
> conflicting copies of his work floating around.
When his works were copied > for Project Gutenberg, did someone go for an out of copyright copy that is > definitive, or one that his sister chopped up? Did that matter, or was it > just more important to get a copy up? I doubt that the people who scanned it were aware of the differences. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From jmdyck at ibiblio.org Fri Nov 12 15:54:30 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Fri Nov 12 15:54:51 2004 Subject: [gutvol-d] PG audience References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: <41954D36.7B9503D9@ibiblio.org> Michael Hart wrote: > > If we cater to scholars, we are only expanding the "digital divide," > so to speak. Our goal is to provide a large viable library to all, > not just to the scholars, who represent less than 1% of the people, > and are often very elitist. I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman. Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars. -Michael From jmdyck at ibiblio.org Fri Nov 12 15:59:48 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Fri Nov 12 16:00:08 2004 Subject: [gutvol-d] increasing literacy References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> Message-ID: <41954E74.4EFB64DE@ibiblio.org> Michael Hart wrote: > > If we can increase literacy by even 10%, > we make more difference than if we cater > to the scholars. We could make even more difference by doing both! Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels? 
-Michael From jon at noring.name Fri Nov 12 16:22:13 2004 From: jon at noring.name (Jon Noring) Date: Fri Nov 12 16:22:31 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41954D36.7B9503D9@ibiblio.org> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954D36.7B9503D9@ibiblio.org> Message-ID: <184940134937.20041112172213@noring.name> Michael Dyck wrote: > Michael Hart wrote: >> If we cater to scholars, we are only expanding the "digital divide," >> so to speak. Our goal is to provide a large viable library to all, >> not just to the scholars, who represent less than 1% of the people, >> and are often very elitist. > I don't think anyone is advocating providing the PG library "just to the > scholars", so that's a strawman. > > Instead, some people simply want to make PG texts more useful to > scholars than they currently are, and I think we can do that without > making them less useful or less available to non-scholars. Agreed. It is possible to come up with a "happy medium" set of baseline requirements which will make the PG texts useful for many purposes. Those who wish to make particular texts even more useful than the baseline for a particular user group simply add more stuff. XML makes it quite easy to extend the features -- just add markup to the content and to the metadata fields. A possibly useful exercise is to categorize the various uses and user groups, and then determine what are the most important features each user group especially desires/needs. Without thinking about it for more than 30 seconds, here's a partial list of different user groups. No doubt this list can be expanded and much better described/subcategorized. But it's a start to further discussion if enough here deem it of interest. 
1) Personal interest readers
2) Scholars and researchers
3) Students (K-12 and post-secondary)
4) Professional and vocational

Jon Noring

From mbuch at mcsp.com Fri Nov 12 16:37:30 2004
From: mbuch at mcsp.com (Her Serene Highness)
Date: Fri Nov 12 16:35:49 2004
Subject: [gutvol-d] Perfection
In-Reply-To: <20041112234527.EF05E4BE64@ws1-1.us4.outblaze.com>
Message-ID:

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of D. Starner
Sent: Friday, November 12, 2004 6:45 PM
To: Project Gutenberg Volunteer Discussion
Subject: RE: [gutvol-d] Perfection

Let me note, I had no way of telling Greg's comments apart from yours except for context. Perhaps you relied on some HTML thing; please don't do so. I'm not going to argue the wisdom of HTML email, but HTML email that does not degrade nicely to plain text is going to look awful to many of the receivers.

**Michele here. I'll clarify it for you. I didn't use HTML- I thought those arrow thingies would show up, and they didn't.**

"Her Serene Highness" writes:
> But a
> citation of an out of print book in anthropology, English literature, the
> hard sciences, et al, which might very well not be correct in its
> information- that will be problematic.

But this has nothing to do with etexts; this has to do with older books.

** In some cases it does have to do with older books. But we aren't dealing with new books. We're dealing with old ones. We're also dealing with the problem of not having the master texts.**

> > I would be very happy to see Boas online. Eventually I hope to track down
> an out of copyright version of his writings and scan it for PG.

It'll be a long time, unless you move to Canada. The last of his works are in copyright for another 7 years in the EU and 33 years in the US. The Bureau of American Ethnology volumes are being worked on up to 1930 (since it's a US government publication) and I believe that includes some work by Boas.
**I'm young enough that I'm willing to wait, and for all I know I may end up in Canada. But that's not the point. I would like to see Boas available to everyone. And to some extent he is- in paper. An ebook of his work isn't 'better' as someone said- it's different.**

> In chapter 5
> there might be a very quotable sentence- but what my student doesn't know is
> that this sentence was changed in later editions. And there's no page
> number- does he tell his teacher to read the entire chapter to find a
> sentence that won't be there in a later edition?

What is he supposed to do, give a page reference to one of a dozen editions that might be very hard for the teacher to find? With etexts, you know that your recipient has access to the same edition you have. And as someone else pointed out, if you quote the sentence, the context can be found in seconds.

**Why not? It's done all the time. Students and scholars have cited rare books that were impossible to find before- I remember citing a rare book that contained the concordat between the Vatican and Germany for a grad class years ago, and information on the Black Star line of Marcus Garvey while still in high school. Why did my professors accept my citations? Because they could be tracked down. It wasn't impossible to find the originals- just difficult. The former one was located in Bobst Library at NYU and the latter was in the NY Public Library's Schomberg Collection. I can find both of them more easily now, because both libraries have their catalogues online. That means I can find the cites and then go look at the actual books. Since there is no physical book with PG that an outsider can hold, it would be nice to have a master scan of the text. PG isn't meant to be a master text- it's a repository for copies. But copies come from somewhere.

'The context can be found in seconds'. Uh huh. The context of what? The context of a no longer accepted version of an original text? The context of a book that is out of date?
I looked at the front end of the Koran. From the Translator's note, he (I assume it was a he) made the translation sometime in the 19th century- or the early 20th. I can tell, because he used the word 'Mohammedan', and because I now know PG uses books out of copyright, and because the language and other signs pointed to it being from the 19th century. But other than as a work of literature, I'd have problems using it- like if I were comparing 19th century versions of Arabic texts, because I'm not even sure it was written in the 19th century.**

> After all, I
> have no idea who JM Rodwell was, or whether his translation of The Koran is
> the definitive English version, or why his translation was chosen- other
> than that his book was out of copyright. From my point of view, that's a red
> flag itself. If this translation is so superb, why isn't it still being
> used- or is it?

And how do I know that if I pull it off the library shelves? My college library has a half dozen different translations of the Koran; how am I to know which are in use?

**How? Easy. You look at other books about Koranic translations and see if they refer to this one- and guess what? You can't do that online. Which means you have to go to a library anyway. Online isn't BETTER. It's different.

By the way- in a library, I can tell if a book is a reprint. If it was reprinted, chances are someone thought it was good enough to put out for sale again. I can tell that without even picking up any other books on the shelf- it's called 'looking at the publishing date and the edition number'. It's an old trick, but you knew that already. Most school libraries don't keep first editions of definitive books on shelves- they're too valuable. A first edition with value- one that is considered important- would be kept in back.
I can tell things like that from a card catalogue- PG is a library without one, in the sense of having the kinds of basic info that card catalogues (even electronic ones) have.**

As for the reason it's not being used, I would suggest that the fact that academics like to retranslate everything every decade might be an explanation. My class used a modern translation of the Iliad, but that doesn't mean that in several hundred years of English translation of the work that's now public domain, there's not one competent, even superb translation.

> Nietzsche's work for instance, was butchered by his sister. There are
> conflicting copies of his work floating around. When his works were copied
> for Project Gutenberg, did someone go for an out of copyright copy that is
> definitive, or one that his sister chopped up? Did that matter, or was it
> just more important to get a copy up?

I doubt that the people who scanned it were aware of the differences.

**That's my point. They did a great job copying it. People all around the world can read it and learn from it. That gives it worth. But is it worth a whole lot to someone doing work on Nietzsche, and on how his ideas were changed by his editors? You hear that sound? It's the sound of someone starting their car to go to a physical library. Anyone who is really interested in his philosophy- or even a college student trying to do a decent paper- has no way of knowing which edition this is or where it came from. A few added words on the front end (original publisher, original publishing date, number of pages, edition number) and that problem would be gone. No academic worth his or her salt would be able to seriously dispute it. If the original index were included (if there was one- authors often do that themselves, too) and the bibliography too- well then you've got yourself a book. I could print it out and share it with my friends- after all, most people don't read whole books online.
Control F is only useful if I'm in front of a machine. If I want to read a Tom Swift book to my kids at a chapter a night, I'm not going to do it from a laptop or park little Johnny's bed next to my desk. Maybe other people here take their T1 connections to the beach with them or on the subway, but I don't. When I want to cite something to my students, I print out something at home and take it with me, maybe with highlights all over the paper. That's the funny thing about books- even casual readers like writing in the margins.

I happen to love PG- but it will be in ideal form when it has hyperlinks to other books and to the notes I type up, when I can print it out and have it paginated, when I can tell if I'm reading a facsimile of a first edition. I know that's a lot, but a girl can dream. And some sites are doing that kind of thing with individual books already- but their scope isn't as large as PG's. PG's scope is what makes it valuable, but I wouldn't use it for scholarly work.

One person made the comment that PG shouldn't try to anticipate what scholars want- it should let scholars discover it and let them say what they need. I just did, and most of what I'm hearing is that I have to learn to adapt to PG, when there are perfectly good college libraries out there. There is no reason for scholars to embrace a site that doesn't even meet up with basic MLA guidelines for books. After all, that's the business you are in- not original websites, but books.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From mbuch at mcsp.com Fri Nov 12 16:47:34 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Fri Nov 12 16:45:53 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41954D36.7B9503D9@ibiblio.org> Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Dyck Sent: Friday, November 12, 2004 6:55 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] PG audience Michael Hart wrote: > > If we cater to scholars, we are only expanding the "digital divide," > so to speak. Our goal is to provide a large viable library to all, > not just to the scholars, who represent less than 1% of the people, > and are often very elitist. I don't think anyone is advocating providing the PG library "just to the scholars", so that's a strawman. Instead, some people simply want to make PG texts more useful to scholars than they currently are, and I think we can do that without making them less useful or less available to non-scholars. -Michael **I have all kinds of books on my shelf- first edition anthro texts, humor books, cook books. Each one of them has a publisher and info on the publishing date. If PG is a publishing house for out-of-copyright books, fine. But it's supposedly a book repository. If it's a repository of books that were actually published in the real world, why are the original paginations, illustrations and figures, maps, indexes and bibliographies, and publication dates such a problem? 
If I want to be taken seriously as an engineer but I use my own terminology for basic engineering terms or just refuse to use them at all, why should I get shirty if engineers with college degrees don't take me seriously?

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From mbuch at mcsp.com Fri Nov 12 16:49:04 2004
From: mbuch at mcsp.com (Her Serene Highness)
Date: Fri Nov 12 16:47:20 2004
Subject: [gutvol-d] increasing literacy
In-Reply-To: <41954E74.4EFB64DE@ibiblio.org>
Message-ID:

Illiterates rarely use computers for reading. PG would be useful after a person became literate, i.e., able to read. Even the children's books on PG are a bit too advanced for a person who is non-literate. Having taught reading, I can say it would not be the first place I would turn- it's too text-heavy, for one thing.

-----Original Message-----
From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Dyck
Sent: Friday, November 12, 2004 7:00 PM
To: Project Gutenberg Volunteer Discussion
Subject: [gutvol-d] increasing literacy

Michael Hart wrote:
>
> If we can increase literacy by even 10%,
> we make more difference than if we cater
> to the scholars.

We could make even more difference by doing both! Setting that aside, do we have any data (or even anecdotal evidence) re the effect of Project Gutenberg on literacy levels?

-Michael

_______________________________________________
gutvol-d mailing list
gutvol-d@lists.pglaf.org
http://lists.pglaf.org/listinfo.cgi/gutvol-d

From jeroen at bohol.ph Fri Nov 12 17:30:05 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Fri Nov 12 17:30:01 2004
Subject: [gutvol-d] Gone with the wind i s "Gone with the wind"
In-Reply-To: <006d01c4c8cb$e4d38440$069595ce@net>
References: <006d01c4c8cb$e4d38440$069595ce@net>
Message-ID: <4195639D.5060600@bohol.ph>

Norm Wolcott wrote:
> Sayonara.
Apparently all versions of GWTW have disappeared from the net.
>
> nwolcott2@post.harvard.edu Friar
> Wolcott, Gutenberg Abbey, Sherwood Forrest

Try the wayback machine, www.archive.org

Jeroen.

From jon at noring.name Fri Nov 12 17:51:02 2004
From: jon at noring.name (Jon Noring)
Date: Fri Nov 12 17:51:22 2004
Subject: [gutvol-d] Perfection
In-Reply-To:
References:
Message-ID: <189945464046.20041112185102@noring.name>

Michele "Her Serene Highness" wrote:

> [snip of excellent comments]
>
> **Why not? It's done all the time. Students and scholars have cited rare
> books that were impossible to find before- I remember citing a rare book that
> contained the concordat between the Vatican and Germany for a grad class
> years ago, and information on the Black Star line of Marcus Garvey while
> still in high school. Why did my professors accept my citations? Because
> they could be tracked down. It wasn't impossible to find the originals-
> just difficult. The former one was located in Bobst Library at NYU and the
> latter was in the NY Public Library's Schomberg Collection. I can find both
> of them more easily now, because both libraries have their catalogues
> online. That means I can find the cites and then go look at the actual
> books. Since there is no physical book with PG that an outsider can hold,
> it would be nice to have a master scan of the text. PG isn't meant to be a
> master text- it's a repository for copies. But copies come from somewhere.

The above comment suggests two basic requirements PG should embrace for all texts:

1) The original source (or sources for composite works) is fully identified and described in the metadata using accepted library cataloging standards, and that these fields are searchable.

2) The original page scans also exist in the database, linked to and from the digital text version (easy to do in XML -- TEI has markup for this purpose.)
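[To make requirement 2 concrete: TEI records where each page of the source began with an empty <pb/> (page-break) milestone, whose facs attribute can carry a pointer to the scan. A minimal sketch, with a made-up scan-file naming scheme -- as noted earlier in the thread, PG had not yet settled on one:]

```python
# Sketch of TEI page-break milestones linking transcription to scans.
# The facs filenames below are hypothetical, not a PG convention.
import xml.etree.ElementTree as ET

TEI_FRAGMENT = """
<div>
  <pb n="190" facs="7000-p0190.png"/>
  <p>...text of page 190...</p>
  <pb n="191" facs="7000-p0191.png"/>
  <p>...text of page 191...</p>
</div>
"""

def page_to_scan(xml_text):
    """Map printed page number -> scan image file, from <pb/> milestones."""
    root = ET.fromstring(xml_text)
    return {pb.get("n"): pb.get("facs") for pb in root.iter("pb")}

print(page_to_scan(TEI_FRAGMENT))
```

[Because <pb/> is an empty milestone rather than a container, it can be dropped into an existing transcription without disturbing the paragraph markup, which is what makes page-by-page linking cheap to retrofit.]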
> I happen to love PG- but it will be in ideal form when it has hyperlinks to
> other books and to the notes I type up, when I can print it out and have it
> paginated, when I can tell if I'm reading a facsimile of a first edition. I
> know that's a lot, but a girl can dream. And some sites are doing that kind
> of thing with individual books already- but their scope isn't as large as
> PG's. PG's scope is what makes it valuable, but I wouldn't use it for
> scholarly work.

The ability to annotate, reference and interlink texts within a digital text repository is a very powerful feature. The fundamental architecture of the "PG Library System" should include this as a future possibility. To me, this is even more exciting than some of the other things being considered, such as language translation.

The requirements associated with these features strongly point to formatting all PG master texts in XML. W3C's XPointer can be used to address both spots and ranges within an XML document using several schemes (both W3C defined and custom schemes within the XPointer Framework.) The most common and most robust/persistent scheme is the well-known fragment identifier. But there's also a scheme to point to a particular element (tag) in a document which does not have an 'id', as well as to point to a spot within content (this scheme is still in Draft form -- it is not a W3C Recommendation.)

So long as the XML document remains unchanged (and for the fragment identifier scheme where the 'id's are kept unchanged even if changes are made to the document), the XPointer addresses will still work. (The term used here is "persistence".)

One problem area, which gets into Identifiers, is how to address the XML document itself -- can it be addressed "standalone", or must it be addressed only when it resides within a repository (such as the PG Library)?
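[The persistence difference between those two addressing schemes can be shown with a toy example. The document and ids below are invented, and real XPointer processing is more involved than this hand-rolled lookup -- this is just the idea:]

```python
# Two XPointer-style addressing schemes, resolved by hand against a toy
# document: the shorthand fragment identifier (matches an xml:id) and an
# element()-style child-index path. Ids and content are hypothetical.
import xml.etree.ElementTree as ET

DOC = """
<text>
  <body>
    <div>
      <p xml:id="p191">First paragraph.</p>
      <p>Second paragraph, no id.</p>
    </div>
  </body>
</text>
"""

# ElementTree stores xml:id under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def by_fragment_id(root, frag):
    """Shorthand pointer: #p191 -> the element whose xml:id matches."""
    for el in root.iter():
        if el.get(XML_ID) == frag:
            return el
    return None

def by_element_scheme(root, steps):
    """element() scheme: a child-index path like element(/1/1/2),
    counting child elements from 1 at each step."""
    el = root
    for i in steps:
        el = list(el)[i - 1]
    return el

root = ET.fromstring(DOC)
print(by_fragment_id(root, "p191").text)        # survives edits that keep ids
print(by_element_scheme(root, (1, 1, 2)).text)  # breaks if structure changes
```

[The comments mark Jon's persistence point: the id-based pointer keeps working as long as ids are preserved, while the positional path silently points somewhere else the moment an element is inserted or removed.]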
If the XML document can be addressed standalone, apart from the repository, then obviously it must internally contain an identifier, the same one used to identify it within the repository and which forms part of the URI reference. It was an interesting exercise last year when the Open eBook Forum's Publication Structure Working Group spent three months studying how to reference and interlink OEBPS Publications, and how to address particular spots and ranges within particular XML documents within a Publication (OEBPS allows multiple documents to comprise one Publication.) Of course, complicating things, which may be less of an issue for PG, is that we wanted the linkability to persist even when the OEBPS Publication is converted to something else, provided the converted format can contain the relevant internal pointers. In this study, Identifiers became a Significant Issue (tm). PG will need to come up with a viable identifier system and specialized URI syntax for using XLink. For many of you, the above is probably all Greek. But if one wants to enable annotation, referencing, and text interlinking within the PG Library system, then this will put constraints and requirements that need to be considered. One workable solution is where all the texts are in XML, and one uses these cool technologies called XPointer and XLink to enable these features. Fortunately, it appears the "powers who are" have decided upon moving someday to XML for the PG Master Texts. > One person made the comment that PG shouldn't try to anticipate what > scholars want- it should let scholars discover it and let them say what they > need. I just did, and most of what I'm hearing is that I have to learn to > adapt to PG, when there are perfectly good college libraries out there. > There is no reason for scholars to embrace a site that doesn't even meet up > with basic MLA guidelines for books. After all, that's the business you are > in- not original websites, but books. 
Michele's point is that before PG makes any substantive decisions, it needs to decide upon which user groups it would like its texts to target (the more the better in my opinion), and then ask the experts in those groups to submit requirements. This should be done *before*, not after, matters have been decided and the next-gen (or next-version) PG system is ready to be built. As I've said before, I believe it possible to come up with a set of basic requirements for all PG texts which will reasonably meet the needs for most, if not all, groups we identify (maybe by the "80-20" rule, at the minimum.) By designing the system to be extensible for particular special needs, then it will be able to fill in where the basic requirements don't. A summary rehash: If one considers that PG texts are not to be solely standalone (which is the traditional view), but rather are components of a dynamic and powerful repository (where the whole is greater than the sum of the parts), then this creates specific requirements which simultaneously impacts upon the areas of format, metadata/identifiers, database structure, user interface design, to name a few. A holistic approach is definitely necessary to assure that whatever is decided for one area will not cause problems in another area. Thinking holistically, factoring in the long-term vision of what we want the PG Library to do and to be fifty years from now (and I don't believe this is being discussed enough), is important. 
Jon Noring From jon at noring.name Fri Nov 12 18:08:03 2004 From: jon at noring.name (Jon Noring) Date: Fri Nov 12 18:08:22 2004 Subject: [gutvol-d] Oops, forgot accessibility (was PG audience) In-Reply-To: <184940134937.20041112172213@noring.name> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954D36.7B9503D9@ibiblio.org> <184940134937.20041112172213@noring.name> Message-ID: <187946484406.20041112190803@noring.name> I wrote: > Without thinking about it for more than 30 seconds, here's a partial > list of different user groups. No doubt this list can be expanded and > much better described/subcategorized. But it's a start to further > discussion if enough here deem it of interest. > > 1) Personal interest readers > 2) Scholars and researchers > 3) Students (K-12 and post-secondary) > 4) Professional and vocational Geez, I forgot one of the most important user groups of all: 5) Readers with special needs (blind, dyslexic, etc.) Note that there's a strong movement to require that K-12 and public post-secondary educational materials be highly accessible, to be offered in accessible formats. In the U.S., for textual materials this will very likely be mandated as the XML-based NIMAS specification (which in turn is derived from the DAISY Digital Talking Book specification.) If we want PG texts to be legally used in the classroom setting, which I think is an *opportunity*, not a *burden*, then we definitely need to assess how the "master" XML Schema settled upon (probably through DP) will be compatible with NIMAS by XSLT or other conversion method. It should be pretty easy to conform most if not all PG Master texts to the NIMAS requirements, since from what I understand the PG Master text Schema will likely be a subset of TEI. 
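[As a rough illustration of the structural transform such an XSLT conversion would perform, here is a toy Python equivalent: walk nested TEI-like <div> elements and emit the flat list of navigation points (title plus depth) that a DAISY-style table of contents needs. The element names are illustrative, not a fixed schema:]

```python
# Toy stand-in for an XSLT pass that derives DAISY-style navigation
# points from structural markup. Element names (<div>, <head>) follow
# TEI convention but the document itself is invented.
import xml.etree.ElementTree as ET

DOC = """
<body>
  <div><head>Chapter I</head>
    <div><head>Section 1</head></div>
    <div><head>Section 2</head></div>
  </div>
  <div><head>Chapter II</head></div>
</body>
"""

def nav_points(el, depth=0):
    """Return (level, title) pairs for every headed <div>, in order."""
    points = []
    for child in el:
        if child.tag == "div":
            head = child.find("head")
            if head is not None:
                points.append((depth + 1, head.text))
            points.extend(nav_points(child, depth + 1))
    return points

for level, title in nav_points(ET.fromstring(DOC)):
    print("  " * (level - 1) + title)
```

[This is exactly the kind of "verbal menu" structure a blind reader navigates by, which is why consistent structural markup in the master format matters more for accessibility than any visual styling.]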
I strongly suggest that before any XML-based vocabulary be decided upon as the "master" PG format, that we consult with the technical folk at DAISY, RFB&D, CAST, etc., to assure we aren't overlooking something or doing something which would make accessibility more difficult. As a heads up -- they love good navigational aids in the markup and in external metadata (imagine being blind -- having multiple verbal menus to access the texts in different ways is important!) We might even be able to solicit the help of the accessibility community to add navigational markup to selected PG texts. Jon Noring From jtinsley at pobox.com Fri Nov 12 18:30:28 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Nov 12 18:30:42 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) Message-ID: <20041113023028.GA8211@panix.com> On Thu, 11 Nov 2004 23:33:29 -0800, Greg Newby wrote: >On Thu, Nov 11, 2004 at 11:00:52PM -0500, Bowerbird@aol.com wrote: >> second, for greg. people over at distributed proofreaders >> have reported that the f.a.q. here at project gutenberg >> do not state that styled text (specifically, italics and bold) >> be marked with underbars and asterisks in the text files. >> the understanding i have from you is that this has become >> the official policy of project gutenberg. if that's not the case, >> would you please inform people here? and if it _is_ the case, >> when you next update the f.a.q., could you include this policy? >> thank you. > >Jim maintains the FAQ, and DP has their own style guides that >sometimes vary for different texts. So, I'm not really the right guy >to ask. I don't think there was agreement on how to handle bold >& italics, but I do think everyone I heard from agreed it should be >indicated somehow in plain text. > >So, I don't think there is an official policy on handling >bold & italics in plain text files. 
But if DP has an official >policy I'm unaware of, then it should probably be reflected >in the FAQ as a recommendation. > >Sorry I don't know the current state on this, but perhaps >Jim or some of the DP project managers can contribute the >latest thinking. Italics is well covered at http://gutenberg.net/faq/V-94 http://gutenberg.net/faq/V-95 About three years or so ago, 'most everyone settled on _underscores_ for italics, with a few holdouts for /slants/. CAPITALS, of course, are still represented in a lot of older texts, but I haven't seen anyone using them in a new text for quite some time. Compared to italics, bold as a method of emphasizing text, as opposed to bold as an incidental property of a heading, is relatively rare. Where bold does need to be rendered in plain text, the current most common usage (from DP) is *bold text*. There are times when it is appropriate to signify bold, but I have seen some texts coming from DP where it has been used unnecessarily -- mostly to indicate a sub-heading or chapter title in the book. In such a case, where a chapter title is clearly a chapter title and on a line by itself, there really is no need to mark it in the plain text version as having been bold face in the original. I think this practice comes from people pre-marking the text for later conversion to HTML, rather than any intent to clutter the plain text. jim From servalan at ar.com.au Fri Nov 12 11:56:49 2004 From: servalan at ar.com.au (Pauline) Date: Fri Nov 12 18:32:43 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> Message-ID: <41951581.5060303@ar.com.au> Joshua Hutchinson wrote: > ----- Original Message ----- > From: Marcello Perathoner >It would be useful to include the dp project number in some form. We >>have a discussion ongoing with Joshua on how to achieve this. >> > > > Yep, pretty easy to implement in the teiHeader. 
And then it can be used by Marcello's script to link back to the DP project comments fairly easily. At this point, I figure it will be in there in the final specs. Please note there is no 1 to 1 correspondence between the DP projectID & an etext number. Multi-volume works/works which are split at DP for ease of processing appear with a single etext number in PG, while having multiple DP projectIDs. DP does record the PG etext number for each projectID once the work has been posted. DP requires a user be logged in to be able to view the Project Comments pages at present. Hence, I do not see the wisdom of linking to the internal DP projectID from the PG database. It would be great to capture the bio. info in some of the DP Project Comments pages in PG; for some of the projects I post-process, the DP Project Manager has been adding information to existing wikipedia entries. e.g. http://en.wikipedia.org/wiki/Kermit_Roosevelt Cheers, P -- Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." From joshua at hutchinson.net Fri Nov 12 19:06:56 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:06:56 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41951581.5060303@ar.com.au> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> <41951581.5060303@ar.com.au> Message-ID: <41957A50.2050809@hutchinson.net> Pauline wrote: > Joshua Hutchinson wrote: > >> ----- Original Message ----- >> From: Marcello Perathoner > >>> It would be useful to include the dp project number in some form. We >>> have a discussion ongoing with Joshua on how to achieve this. >>> >> >> >> Yep, pretty easy to implement in the teiHeader. And then it can be >> used by Marcello's script to link back to the DP project comments >> fairly easily. At this point, I figure it will be in there in the >> final specs. > > > Please note there is no 1 to 1 correspondence between the DP projectID > & an etext number.
> > Multi-volume works/works which are split at DP for ease of processing, > appear with a single etext number in PG, while having multiple DP > projectIDs. > Not exactly. The only things that don't have a one to one equivalent are the beginner books. Something that spans multiple volumes is usually (always?) posted as multiple etexts. There may be an omnibus posting, though, too. In those cases, we'd probably link to the projectID used for the first part going through DP. > DP does record the PG etext number for each projectID once the work > has been posted. > > DP requires a user be logged in to be able to view the Project > Comments pages at present. > > i.e. I do not see the wisdom of linking to the internal DP projectID > from the PG database. > Whether the link back to DP would be useful or not I leave to others to decide. However, when we get to the point ... someday ... of providing the original page scans (for those that we can), I wouldn't be surprised to see the projectID used as the value to tie back to them. So, let's put the projectID in there for now. It definitely doesn't hurt anything. > It would be great to capture the bio. info in some of the DP Project > Comments pages in PG; for some of the projects I post-process, the DP > Project Manager has been adding information to existing wikipedia > entries. e.g. > http://en.wikipedia.org/wiki/Kermit_Roosevelt > > Cheers, > P From j.hagerson at comcast.net Fri Nov 12 19:19:13 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Fri Nov 12 19:19:45 2004 Subject: [gutvol-d] Linking back to DP to get page scans [Was: PG audience] Message-ID: <007f01c4c92f$900159b0$6401a8c0@enterprise> Joshua Hutchinson wrote: >Whether the link back to DP would be useful or not I leave to others to >decide. However, when we get to the point ... someday ... of providing >the original page scans (for those that we can), I wouldn't be surprised >to see the projectID used as the value to tie back to them.
So, let's >put the projectID in there for now. It definitely doesn't hurt anything. While it won't hurt anything, a DP ID might not have any significance for linking to page scans. DP periodically archives finished projects and takes them off line. If we would need to bring scans back on line, it would make more sense to me to restore them to a directory named with the PG eBook number and not the hexadecimal alphabet soup ID that DP uses. From joshua at hutchinson.net Fri Nov 12 19:22:55 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:23:02 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41954D36.7B9503D9@ibiblio.org> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954D36.7B9503D9@ibiblio.org> Message-ID: <41957E0F.2020906@hutchinson.net> Michael Dyck wrote: >Instead, some people simply want to make PG texts more useful to >scholars than they currently are, and I think we can do that without >making them less useful or less available to non-scholars. > > > AMEN! From joshua at hutchinson.net Fri Nov 12 19:34:52 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:34:50 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) In-Reply-To: <20041113023028.GA8211@panix.com> References: <20041113023028.GA8211@panix.com> Message-ID: <419580DC.2070705@hutchinson.net> Jim Tinsley wrote: > >Where bold does need to be rendered in plain text, the current >most common usage (from DP) is *bold text*. There are times when >it is appropriate to signify bold, but I have seen some texts >coming from DP where it has been used unnecessarily -- mostly >to indicate a sub-heading or chapter title in the book. In >such a case, where a chapter title is clearly a chapter title >and on a line by itself, there really is no need to mark it in >the plain text version as having been bold face in the original. 
>I think this practice comes from people pre-marking the text for >later conversion to HTML, rather than any intent to clutter the >plain text. > > > Actually, it is probably there from the OCR pre-processing and was never removed through all the rounds of proofing and post-processing... why I feel this is important enough of a distinction that I needed to make a post about ... I have no idea. I'm going to bed. Josh From joshua at hutchinson.net Fri Nov 12 19:37:24 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 12 19:37:27 2004 Subject: [gutvol-d] Linking back to DP to get page scans [Was: PG audience] In-Reply-To: <007f01c4c92f$900159b0$6401a8c0@enterprise> References: <007f01c4c92f$900159b0$6401a8c0@enterprise> Message-ID: <41958174.40500@hutchinson.net> John Hagerson wrote: >Joshua Hutchinson wrote: > > >>Whether the link back to DP would be useful or not I leave to others to >>decide. However, when we get to the point ... someday ... of providing >>the original page scans (for those that we can), I wouldn't be surprised >>to see the projectID used as the value to tie back to them. So, let's >>put the projectID in there for now. It definitely doesn't hurt anything. >> >> > >While it won't hurt anything, a DP ID might not have any significance for >linking to page scans. DP periodically archives finished projects and takes >them off line. If we would need to bring scans back on line, it would make >more sense to me to restore them to a directory named with the PG eBook >number and not the hexadecimal alphabet soup ID that DP uses. > > > True ... but this way we know which text was which projectID. Otherwise, how do you know that the archive pngs in projectID747364873 should go into PG etext 10576? Josh (Yeah, I made up those numbers ... 
they ain't valid) From jtinsley at pobox.com Fri Nov 12 20:16:21 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Nov 12 20:16:37 2004 Subject: [gutvol-d] Linking back to DP to get page scans [Was: PG audience] In-Reply-To: <007f01c4c92f$900159b0$6401a8c0@enterprise> References: <007f01c4c92f$900159b0$6401a8c0@enterprise> Message-ID: <20041113041621.GB8211@panix.com> On Fri, Nov 12, 2004 at 09:19:13PM -0600, John Hagerson wrote: >Joshua Hutchinson wrote: >>Whether the link back to DP would be useful or not I leave to others to >>decide. However, when we get to the point ... someday ... of providing >>the original page scans (for those that we can), I wouldn't be surprised >>to see the projectID used as the value to tie back to them. So, let's >>put the projectID in there for now. It definitely doesn't hurt anything. > >While it won't hurt anything, a DP ID might not have any significance for >linking to page scans. DP periodically archives finished projects and takes >them off line. If we would need to bring scans back on line, it would make >more sense to me to restore them to a directory named with the PG eBook >number and not the hexadecimal alphabet soup ID that DP uses. I have posted, I think, three books with their page scans already, which I could do because I was working directly with the producer. We have a provisional protocol for this, which is to put them into 12345/page-images/12345-page-images.zip. You can see http://gutenberg.net/faq/S-21 for more detail. This protocol may, and probably will, evolve as we get more cases. Speaking for myself, I'd like to see more cases (though not an inundation! :-) fairly soon. So if you're producing a book now, whether from DP or elsewhere, and you are free to submit the images, let me know. The page images, where posted, should definitely be stored in the PG archive rather than linked-to; otherwise they won't get mirrored. 
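For what it's worth, the provisional protocol above is mechanical enough to sketch in a few lines of Python. The digit-splitting rule is my guess from the /1/0/8/0/... example earlier in this thread, and the function name is made up, so treat this as an illustration of the convention, not a PG tool:

```python
def page_images_path(etext_no: int) -> str:
    """Build the provisional page-scan location Jim describes
    (12345/page-images/12345-page-images.zip), prefixed with my
    reading of the new digit-per-directory server layout
    (/1/0/8/0/...): every digit of the etext number except the
    last becomes a directory. Assumes a multi-digit number."""
    digits = str(etext_no)
    tree = "/".join(digits[:-1])
    return f"{tree}/{digits}/page-images/{digits}-page-images.zip"

print(page_images_path(12345))
# -> 1/2/3/4/12345/page-images/12345-page-images.zip
```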
jim From shalesller at writeme.com Fri Nov 12 21:03:58 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 12 21:04:15 2004 Subject: [gutvol-d] Perfection Message-ID: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> "Her Serene Highness" writes: > David Starner writes: > > What is he supposed to do, give a page reference to one of a dozen editions > > that might be very hard for the teacher to find? With etexts, you know > > that your recipient has access to the same edition you have. And as someone > > else pointed out, if you quote the sentence, the context can be found in > > seconds. > > **Why not? It's done all the time. Students and scholars have cited rare > books that are impossible to find before- I remember citing a rare book that > contained the concordat between the Vatican and Germany for a grad class > years ago, and information on the Black Star line of Marcus Garvey while > still in high school. Why did my professors accept my citations? Because > they could be tracked down. One of the methods of mathematical proof is proof by uncheckable citation. "This lemma is proved in the January 1822 volume of the Bohemian Mathematical Journal, pages 12-43." If the volume is in some library half-way across the country, nobody is going to take the time to check a cite in some student's paper. If the teacher is never going to check the cite, what's the point? And if he's going to find the one copy in the nation and order it via ILL, what's so hard about searching through an online document? > But other than as a work of literature, i'd have problems > using it- like if I were comparing 19th century versions of Arabic texts, > because I'm not even sure it was written in the 19th century.** Anyone born in 1980 or later would know quite quickly, just like I do. It was translated in 1861 and reprinted in 1971 as part of the Everyman's Library, and has been frequently reprinted.
It has a second edition, in 1871; assuming the Everyman's Library's text was taken from the second edition, you can quickly check to see whether the PG edition is the first edition or the second edition. Google is your friend. So are the LoC catalogs, but watch out because they frequently have authors split under two headings, one of them marked as being from the old catalog. > **How? Easy. You look at other books about Koranic translations and see if > they refer to this one- and guess what? You can't do that online. Which > means you have to go to a library anyway. Online isn't BETTER. It's > different. Or I could do a search online and find out that Rodwell's translation is considered inferior by some because he wasn't a Muslim, but is probably one of the better public-domain ones. I also find "All the prominent translations of the Quran have each been the product of a single individual, so there is no translation which truly reflects the collective and opposing thoughts of a range of scholars. Such a large-scale collaborative effort would most likely be required to establish any one translation as most authoritative. Since this has not yet happened, there is no translation of the Qur'an as widely accepted (for example) as the New Revised Standard Version of the Bible. "As a result, individual English-speaking Muslims tend to have their own personal favourites. Indeed, those who read more than one translation often develop a fondness for different aspects of each. For example, the renowned scholar Annemarie Schimmel, author of dozens of books on Islam and formerly professor of Islam at Harvard University, favoured the translation of Arthur John Arberry for beauty of expression, and that of Marmaduke Pickthall for literal rendering of Arabic phrases." which are from , and which conveniently have links to the authors so I can find their credentials. > By the way- in a library, I can tell if a book is a reprint.
But what you can't tell is if it was reprinted, if all you have is the original. A quick search through the LoC's online catalogs should give you a pretty reasonable guess as to whether it was reprinted or not. > I could print it > out and share it with my friends- after all, most people don't read whole > books online. Control F is only useful if I'm in front of a machine. If I > want to read a Tom Swift book to my kids at a chapter a night, I'm not going > to do it from a laptop or park little Johnny's bed next to my desk. No, but you aren't doing scholarly work with Tom Swift. And again, your generation doesn't read whole books online, but mine does. > when I can print it out and have it > paginated, My printer _always_ paginates documents. If you're dealing with an old dot-matrix, you have to paginate it manually, but the paper is usually prescored for separation. (-: But seriously, I can't imagine why you'd want that. The original pages were designed for the original machine, and the change in fonts and typesetting, which is unavoidable, will change where the page breaks would naturally fall even if you strive to keep everything the same as the original. It would be much better to put page numbers in the margins and let the physical breaks fall where they may. Accept that page numbers have become free of the physical form of the book. I think that we should retain source information, but you don't understand and accept the power of the tools at your fingertips. Much of the context about a book can be resolved in a google search or a search of the appropriate library catalog online. Stop and think before you hit that print button; ink is expensive, you know. A lot of things don't need printing out. Try emailing things to people, and letting them print it out if they want to. Ebooks don't need to dance to be useful. Even well-stocked libraries don't have many of the books we do, and even if you have to send for the hardcopy, having the book at hand is useful.
It vastly simplifies searching and especially concordance building. Online books are better in many ways, not just different. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From brad at chenla.org Fri Nov 12 21:46:26 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 12 21:48:26 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41950197.2020707@perathoner.de> (Marcello Perathoner's message of "Fri, 12 Nov 2004 19:31:51 +0100") References: <41950197.2020707@perathoner.de> Message-ID: This is a real polarizing issue, with many academics believing that they are the anointed guardians of literature and recorded knowledge. They feel threatened by groups like PG and DP which have by-passed their institutional traditions. Many academics today feel threatened by etexts in the same way that the clergy felt threatened by the printing press. I asked for a copy of the TEI source for Bradford's History of Plymouth Plantation last month from some academic group. They asked me to submit a formal request which would explain what I would use the text for! There are any number of academic etext repositories which block people from accessing public domain material because of `copyright issues'. Worse is how many university presses are making IP land grabs worthy of the RIAA and MPAA. There are a number of books which are now only available in astonishingly expensive editions. The OED is an example of this. Oxford has pumped a huge amount of money into the dictionary, but the dictionary has also been built with an enormous amount of volunteer help. There are no libraries anywhere near where I live in Bangkok with a copy of the OED which I can use. Since I don't have a credit card, I can't get access to the online edition even if I had the money to pay for it.
The academic priesthood feels that their power base and institutional purpose for existence is threatened, so they are circling the wagons and giving the world good reason to threaten them. On the other hand, academics _are_ often the only people preserving a lot of man's older and mostly forgotten knowledge and placing it in context so that it can be understood today. Academics feel horrified when they hear people say, I don't care about all that stuff, just give me books. This is the same horror that geeks experience when they hear people say with pride that they can't program their VCR and never will. Being proud of being ignorant is something that I have never understood and never will, but I think that what Joe Sixpack is saying is that geeks and scholars should do their job and shouldn't bother him with the details. He's only concerned with the resulting text or software, not the process of how it was created. In a sense he's right. That's our job, and we shouldn't try to force the end-user to understand the larger or technical issues involved in doing our job. The great unwashed masses have no idea how much work is involved in doing our jobs and sometimes believe that we're making things far more difficult and complex than they really are. As Neal Stephenson said, most people want a mediated experience like you get from Disney. They don't want to see or deal with the enormous complexity behind it all. I believe that we should think more like special effects artists, who believe the best effects, and the ones that they are most proud of, are the ones that no one realises are effects in the first place. Many academic editions are so burdened with analysis and annotation that they get in the way of the text itself. Electronic editions can hide the glorified and sanctified academic Cliff Notes but make them easily accessible if you need or want them. Personally I like it both ways.
Sometimes I want to work at a text and really study it, and all the scholarly apparatus is a godsend. But other times I just want to read a story, and leave the stuff I don't understand for another time. The great promise of the computer age has been to provide tools which allow the average person with no experience or skills to do the work that required highly skilled workers using specialized professional equipment. Desktop publishing in the 80's is a great example. As soon as laser printers and colour monitors became cheap enough, everyone thought that a secretary who could barely use Wordstar could do the work of a team of professional graphic artists and typesetters. Visual Basic was touted as being a language that could be mastered by the average person and produce applications of the same quality as apps written in C by experienced programmers. Right. Apple is now pushing the dream that anyone with an Apple and a good video camera can be the next Stanley Kubrick with less than US$20K in hardware and software. The barrier of entry and access to the tools for the next Stanley Kubrick is now much lower, but that doesn't mean your Aunt Cindy is going to be making the next Full Metal Jacket in the corner of her family room on her iMac. People like Bowerbird (who I suspect is still here, despite giving his formal swan song) want to reduce the complexity behind the scenes to something as simple as what the end-user sees. The thing is that at first glance it really doesn't look like it's too difficult. And the plethora of cheap, professional quality tools available through chain stores makes it seem, at first, not to be too difficult. This has had the negative side-effect of giving Joe Sixpack the illusion that all of this stuff is a lot easier than it is, and giving the impression that professionals who have spent decades studying and honing their craft are just full of crap and making things more difficult than they have to be.
I suspect that over the next decade, institutions will be re-cast and professionals will re-establish themselves so that their education, experience and skills will be respected. But for those of us in the trenches during the transition it won't be easy and it won't be pretty. b/ -- Brad Collins , Bangkok, Thailand From shalesller at writeme.com Fri Nov 12 21:59:47 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 12 22:00:04 2004 Subject: [gutvol-d] PG audience Message-ID: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Brad Collins writes: > The OED is an example of this. Oxford has pumped a huge amount of > money into the dictionary, but the dictionary has also been built > with an enormous amount of volunteer help. There are no libraries > anywhere near where I live in Bangkok with a copy of the OED which I > can use. Since I don't have a credit card, I can't get access to the > online edition even if I had the money to pay for it. And I understand that despite how much it costs, it has never turned a profit in the history of its existence. Oxford keeps people working on it because of its importance, not as a profit-making venture. From traverso at dm.unipi.it Fri Nov 12 22:16:38 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Nov 12 22:16:59 2004 Subject: [gutvol-d] Perfection In-Reply-To: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> (shalesller@writeme.com) References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> Message-ID: <200411130616.iAD6GcSm004979@posso.dm.unipi.it> There is a reason to preserve page numbers in ebooks. While correct academic quotations of an electronic version can perfectly well be made without page numbers, the same is not true of retrieving information quoted somewhere else (in an old paper edition).
So for example if an existing book contains a sentence like "This topic is discussed in book aaaaa in pages xxx-yyy" (the exact edition is quoted in a reference), how do you easily find the exact range of pages, without page numbers? The same happens when a book has an index: the index item is often not found literally in the text, and a page number is a handy way to find the reference. Of course, the index can be improved (in an *ML edition) with cross-links, but transforming an index into a cross-linked version is a lot of work, and has to be done by an expert, while reading a page to find a reference is much less work, and can be done by a (relatively) inexpert reader. Some just answer: then do an HTML or a TEI edition. This I don't want: I cannot, and I do not want to learn; I prefer working with text, and doing more texts. And I prefer using text instead of *ML. Moreover, if I keep page numbers, conversion to *ML with page numbers will be much easier than having to retrieve the numbers from the images. Some say: page numbers are ugly in txt. They are the same people who want to have an *ML version, so why do they bother? Please take the txt version with numbers, do your *ML and leave the txt alone. Of course, having page numbers in Tom Swift might be too much. But at least if a book has an index, I believe that page numbers might be useful, even in txt, and we should recommend keeping the information. Carlo From sly at victoria.tc.ca Fri Nov 12 23:22:59 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Fri Nov 12 23:23:18 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41951581.5060303@ar.com.au> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> <41951581.5060303@ar.com.au> Message-ID: On Sat, 13 Nov 2004, Pauline wrote: > It would be great to capture the bio. info in some of the DP Project > Comments pages in PG; for some of the projects I post-process, the DP > Project Manager has been adding information to existing wikipedia > entries. e.g.
> http://en.wikipedia.org/wiki/Kermit_Roosevelt I've just added that link to the author record for Kermit Roosevelt in the PG online catalog, as well as his birth and death dates. If you have any other authors represented in PG who have articles about them on wikipedia, let me know... Andrew From brad at chenla.org Fri Nov 12 23:37:53 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 12 23:39:59 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> (D. Starner's message of "Fri, 12 Nov 2004 21:59:47 -0800") References: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Message-ID: "D. Starner" writes: > Brad Collins writes: > >> The OED is an example of this. Oxford has pumped a huge amount of >> money into the dictionary, but the dictionary has also been built >> with an enormous amount of volunteer help. There are no libraries >> anywhere near where I live in Bangkok with a copy of the OED which I >> can use. Since I don't have a credit card, I can't get access to the >> online edition even if I had the money to pay for it. > > And I understand that despite how much it costs, it has never turned a > profit in the history of its existence. Oxford keeps people working on > it because of its importance, not as a profit-making venture. > Good point -- The bills have to be paid by _someone_. But does that factor in profits from other dictionaries like the COD (Concise Oxford Dictionary)? The OED is the baseline for all of the Oxford dictionaries, just as Merriam-Webster's unabridged Third International is for the rest of theirs. The COD or the MW Collegiate would not be what they are without their monster unprofitable cousins. I read somewhere that the COD has been one of the top-selling books in the UK every year for quite some time (that could be wrong though). And it might well be that even with this other revenue the whole venture might still be short of a profit.
But if they are working on it because of its importance and not for profit, then why make it so expensive? They _want_ to make a profit from it and they are trying. Fair enough. If the OED is only available in institutions which can afford it, it will eventually be replaced by another, just as Britannica is losing ground to Wikipedia. Wikipedia still has a ways to go (perhaps not in quantity but in quality) but the writing is on the wall. More than any other type of intellectual work, every dictionary and encyclopedia is built on the backs of those that come before it. And so it goes. b/ -- Brad Collins , Bangkok, Thailand From tb at baechler.net Sat Nov 13 00:07:16 2004 From: tb at baechler.net (Tony Baechler) Date: Sat Nov 13 00:05:50 2004 Subject: [gutvol-d] PG audience In-Reply-To: <20041112185400.GD3160@pglaf.org> References: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> Message-ID: <5.2.0.9.0.20041113000326.02004ae0@snoopy2.trkhosting.com> At 10:54 AM 11/12/2004 -0800, you wrote: >On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote: > > At 06:23 AM 11/12/2004 -0800, you wrote: > > > > >Actually, it's pretty easy to find all the original Project Gutenberg > > >eBooks, > > >as well as the newer versions, because so many places keep them, > usually in > > >the thousands for any of our eBooks that have been out for even a week. > > > > Hello. Actually, I've had a hard time finding any of the very early > > editions of PG files. There are some old files in the etext90 directory, > > but not edition 10 of the first several ebooks. I would be interested to > > find the very first edition of when10.txt or whatever it was called as MH > > posted it. Even the old GUTINDEX.* files have been removed, with the > > earliest being GUTINDEX.96 when it used to be GUTINDEX.90.
> >Michael might have some of the older files. There are a few >sources, like old Walnut Creek CDs, that might also be able >to help. I do not have every old Walnut Creek CD ever published, but I do have one and it does not have any of the older files either. I first started using PG in 1995 and even then the very early files from 1971-89 were not generally available. The oldest file, at least the one with the oldest PG header that I am aware of, is plboss10.zip. I'm not sure if edition 10 is still available but I have it. From david at newmannotes.com Sat Nov 13 01:30:44 2004 From: david at newmannotes.com (David Newman) Date: Sat Nov 13 01:30:44 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: <20041113050417.9F29A8C914@pglaf.org> References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: As a credentialed conflict avoider, I've been loath to stick my head into this fray. Indeed, this battle about meeting the needs of academia appears to be waged at times with an ideological fervor to rival that of the recent US election. It seems to me that the fervency with which people approach this issue has made it difficult in some cases for the arguments to follow a path towards resolution. It is perhaps also complicated by the wide assortment of changes being proposed to remedy the perceived problems. Some arguments for change suggest that PG should direct its energies towards making its library suitable for scholars by including more information in the files, particularly pagination and provenance, presumably packaged with XML. I have no problem with including such information. However, I don't think it should be required of all texts, nor do I believe that it really solves the scholarship issue. Including page scans _would_, to the degree that a solution is possible, and requires approximately 0% extra work for most of our valiant volunteers. And, PG has made it clear that this is acceptable, and has already done so for some projects.
I feel that Marcello gave the most persuasive and concise summary of the situation, and I didn't notice any overt disagreement. Marcello Perathoner wrote: >The best value for Academia (and the least work for us) would be just to >include the page scans. Any transcription you make will fall short of >the requirements of some scholar. I think we should use our time for >producing more books for a general audience instead of producing >Academia-certified editions of them. HSH's comments justify such an approach. Her Serene Highness wrote: >I need to know EXACTLY when the original >was published, who published it, and where, since there are variant texts >out there. Even a single word change that might have occurred in the >copying process could change the meaning of a vital sentence. Of course, there is a simple, if unsatisfactory, answer to all these questions for PG texts: they were published by PG, on the PG website, and each file states when it was published. Each work we publish is the "PG variant" of that text. As an academic, I find it dishonest and unhelpful for a scholar to cite a physical volume when the volume they consulted is an electronic edition. It is virtually impossible to guarantee that "even a single word change" was not introduced in the transcription process. Even with DP's careful processes, I would not wager that most of our books enter PG completely error free (or correction free, for that matter.) Page scans allow for an additional layer of safety for any scholar concerned about the adherence to a given print edition, though a certain level of trust in the provider is still required. Thus, while I hope that PG's holdings are as accurate as possible, it would also be my hope that scholars using PG would cite PG. Evidently this is not always the case.
Michael Hart wrote: >I've also heard that many of those who complain, actually use our >eBooks in secret, and ONLY want the provenance so they can steal >them without giving credit where credit is due. This suggests to me two things. 1) We can include page scans and information about provenance, _when available_, with the files so that academics can feel confident in the reliability of those PG holdings. Not so that the original sources can be dishonestly cited, but to provide the necessary data for certain scholars to confidently cite PG's edition. We can point to this in our documentation to enhance our scholarly credibility. 2) We can prominently suggest an appropriate style of citation of works in PG's holdings. (I've seen this done with other digital collections.) Perhaps if the citation style also takes into account the original source, some otherwise reluctant scholars would be appeased. Is this something we can all agree on? -- David Newman www.davidnewman.info From traverso at dm.unipi.it Sat Nov 13 02:55:32 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sat Nov 13 02:55:56 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: (message from David Newman on Sat, 13 Nov 2004 01:30:44 -0800) References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: <200411131055.iADAtWUG012033@posso.dm.unipi.it> David's solution is perfectly OK for me. It is sufficient that PG does not discourage keeping the extra information (it did until recently). The volunteers will do the rest. An important improvement would be to be able to go easily from the text to the corresponding page scan. Just having the two separately is fine, but having them linked is better; going from image to txt is easy (search), but the converse is often hard. There are of course different solutions. All require preserving the page information in some form; including page numbers in the source is just one method.
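The "page numbers in the source" method needs very little machinery. A minimal sketch follows, assuming a hypothetical `[pg N]` marker convention and an invented scan-file naming scheme (PG had settled on neither at the time): once markers survive in the plain text, mapping any position in the text back to its page scan is a simple lookup.

```python
import bisect
import re

# "[pg N]" is an assumed marker convention, not a PG standard.
PAGE_MARKER = re.compile(r"\[pg (\d+)\]")

def index_pages(text):
    """Return sorted (offset, page_number) pairs, one per marker."""
    return [(m.start(), int(m.group(1))) for m in PAGE_MARKER.finditer(text)]

def page_for_offset(pages, offset):
    """Find the page whose marker most recently precedes `offset`."""
    starts = [start for start, _ in pages]
    i = bisect.bisect_right(starts, offset) - 1
    return pages[i][1] if i >= 0 else None

def scan_filename(page, etext_dir="1/0/8/0"):
    """Map a page number to a scan file (naming scheme is invented)."""
    return "%s/page-%04d.png" % (etext_dir, page)

sample = "[pg 1]It was a dark night.[pg 2]The rain fell.[pg 3]Dawn came."
pages = index_pages(sample)
print(page_for_offset(pages, sample.find("rain")))  # page containing "rain"
print(scan_filename(page_for_offset(pages, sample.find("rain"))))
```

Going the other way (image to text) stays easy, as Carlo notes: read the scan and search the text; the marker index is only needed for the hard direction.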
Another remark, on page scans obtained from other sources: one of these sources, the one that I mostly use, and that has originated hundreds and probably thousands of PG books, is the French national library, http://gallica.bnf.fr. I have received (by email) a rather broad permission to use everything on the site to produce ebooks for DP and PG, and related sites (I have used the permission for LiberLiber and DP-EU). It might be possible to renegotiate the permission, but that might result in a restriction of the terms. But I believe that the original permission could cover giving the user the possibility of checking an individual page for comparison, though not of mirroring their files once the transcription is completed; those files can very well be obtained from the origin. The French national library is not expected to die or to become unavailable; and even in that case we have the image files. Carlo From marcello at perathoner.de Sat Nov 13 05:41:56 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 05:41:58 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41951581.5060303@ar.com.au> References: <20041112170017.93D4C2F902@ws6-3.us4.outblaze.com> <41951581.5060303@ar.com.au> Message-ID: <41960F24.7010900@perathoner.de> Pauline wrote: > Please note there is no 1 to 1 correspondence between the DP projectID & > an etext number. Then put *all* DP project IDs into the resulting files. Or put the same project ID into multiple etexts. > i.e. I do not see the wisdom of linking to the internal DP projectID > from the PG database. DP could offer access to their database through remote procedure calls or whatever so I could pull the data into the catalog.
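The relationship Pauline and Marcello are describing is many-to-many: several DP project IDs can feed one etext, and one project ID can feed several etexts. A minimal sketch of such a catalog follows; the class name and the project IDs are invented for illustration, and DP's real identifiers and database layout are not shown here.

```python
from collections import defaultdict

class ProjectCatalog:
    """Toy catalog linking etext numbers and DP project IDs, many-to-many."""

    def __init__(self):
        self.by_etext = defaultdict(set)    # etext number -> DP project IDs
        self.by_project = defaultdict(set)  # DP project ID -> etext numbers

    def link(self, etext, project_id):
        # Record the pair in both directions so lookups work either way.
        self.by_etext[etext].add(project_id)
        self.by_project[project_id].add(etext)

catalog = ProjectCatalog()
catalog.link(7000, "projectID0001")  # invented IDs
catalog.link(7000, "projectID0002")  # two DP projects fed one etext
catalog.link(7001, "projectID0002")  # one DP project fed two etexts

print(sorted(catalog.by_etext[7000]))
print(sorted(catalog.by_project["projectID0002"]))
```

Whether the data is pulled over RPC or exported in bulk, storing both directions of the mapping is what makes "put *all* project IDs into the resulting files" workable.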
-- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sat Nov 13 06:01:59 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 06:02:03 2004 Subject: [gutvol-d] Perfection In-Reply-To: <20041112182216.Y99646@krweb.net> References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com> <20041112182216.Y99646@krweb.net> Message-ID: <419613D7.4080907@perathoner.de> kris foster wrote: > this is a dangerous reliance on a transitory medium. electronic > citation is merely more convenient. What makes medium permanence a value per se? Academia has developed its traditions around a medium (papyrus, paper) that is permanent. Not the other way around. If the medium they had used was impermanent, the methods and traditions of Academia would be different today. Medium permanence can be a big disadvantage too. The scholars in the middle ages relied blindly on Aristotle. Scientific method in the middle ages amounted to finding out what Aristotle said about some subject, and that was that. Doing one's own research was not deemed a scientific method. Of course, Aristotle said that "wood swims and metal sinks" and that "heavier items fall faster than lighter ones". -- Marcello Perathoner webmaster@gutenberg.org From marcello at perathoner.de Sat Nov 13 06:42:13 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 06:42:19 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Message-ID: <41961D45.1080901@perathoner.de> Brad Collins wrote: > Wikipedia still has a ways to go (perhaps not in quantity but in > quality) but the writing is on the wall. Renowned German computer magazine c't (issue 2004/21 pg. 132ff) tested the following German encyclopaedias: - MS Encarta 2005 Professional (DVD) - Brockhaus 2005 Premium (DVD) - Wikipedia (internet) Wikipedia got the best score (3.6) in the "contents" category.
(Brockhaus: 3.3, Encarta 3.1) The "contents" test consisted in having domain specialists review 66 articles in 22 different subjects. This is only the German version of Wikipedia with 136,000 articles at the time of testing. English Wikipedia has now approx 400,000 articles. -- Marcello Perathoner webmaster@gutenberg.org From jon at noring.name Sat Nov 13 08:22:18 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 08:22:46 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: <161997739609.20041113092218@noring.name> Michael Hart supposedly wrote: > I've also heard that many of those who complain, actually use our > eBooks in secret, and ONLY want the provenance so they can steal > them without giving credit where credit is due. Michael, Michael, Michael -- you're grasping at straws trying to justify not providing the provenance of any PG works. Let's analyze what you wrote. Here's a portion of the Boilerplate, the "TOU" (Terms of Use) from a 2004 text. (from ftp://sailor.gutenberg.org/pub/gutenberg/etext04/ge71v10.txt ) ********************************************************************** DISTRIBUTION UNDER "PROJECT GUTENBERG-tm" You may distribute copies of this eBook electronically, or by disk, book or any other medium if you either delete this "Small Print!" and all other references to Project Gutenberg, or: [Snip of various restrictions when the small print is kept, including paying a 20% royalty if profits are made from the work.] 
********************************************************************** Parsing the Distribution Notice, we see two and only two scenarios (due to the "either/or" construct): 1) Anyone may distribute copies of this eBook electronically, or by *any other medium* (implying conversion), without *any* stated restrictions whatsoever so long as the "small print" and any references to the name "Project Gutenberg" are removed (that is, no one will know for sure the work came from the PG Library.) 2) Follow the restrictions if the "small print" and/or "Project Gutenberg" is mentioned as the source. Michael, PG *is allowing* people to use PG's texts "in secret" (whatever that means), and is welcoming them to be used that way! You are essentially saying they should not use them this way, in contradiction to PG's own small print. There are several strong arguments for including the provenance of PG texts in metadata, which have been discussed ad nauseam here by quite a few sharp people. For example, the leaders and several volunteers of DP have stated it is a good thing to do (and I believe some of them have said it should be a basic requirement.) What I suggest PG does is to take a poll, using an independent forum, of the various volunteers and users of PG texts, and *ask them* the following question: Should PG include the full details of the source document(s) used to produce every PG text? Suppose 2/3 of the respondents said 'yes'. Would PG honor the wishes of the volunteers and users by then requiring the source information be included in the text's metadata? Note that over the years the PG volunteers have put in hundreds of thousands of hours (and maybe over a million hours) of time to help build the PG Library collection. They have a valid claim of "moral co-ownership" of what has been produced, and should collectively have their say in the future development of the collection.
(One reason why I believe PGLAF should become a member-"owned" organization -- currently its Articles of Incorporation state it has no members.) Michael, I'd like to hear from you *all* the reasons you have for why the provenance of PG texts should not be included in the metadata. How does providing the provenance harm the goals of PG? I have yet to hear a cogent and logical argument. Why not write a specific FAQ on this topic? Jon Noring From hart at pglaf.org Sat Nov 13 08:30:11 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 08:30:13 2004 Subject: [gutvol-d] Perfection In-Reply-To: <419613D7.4080907@perathoner.de> References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com> <20041112182216.Y99646@krweb.net> <419613D7.4080907@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Marcello Perathoner wrote: > kris foster wrote: > >> this is a dangerous reliance on a transitory medium. electronic citation >> is merely more convenient. > > > What makes medium permanence a value per se? > > Academia has developed its traditions around a medium (papyrus, paper) that > is permanent. Not the other way around. If the medium they had used was > impermanent the methods and traditions of Academia would be different today. If you visit any library archive, you might be surprised at the preservation problems they are having on an ever-increasing scale. A decade or so ago, the Library of Congress just completely gave up on trying to keep much of their newspaper collection, and decided to microfilm what they could and to sell off the rest of those archives before they completely fell apart. I bought several volumes from about a century ago, of the New York Herald, just so I could have an additional perspective on the era. Of course, much of the most interesting parts wouldn't be referenced, as they are advertising. . .such as the first New York apartments that included cooking facilities.
Michael From jon at noring.name Sat Nov 13 08:39:51 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 08:40:02 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: <161997739609.20041113092218@noring.name> References: <20041113050417.9F29A8C914@pglaf.org> <161997739609.20041113092218@noring.name> Message-ID: <32998793031.20041113093951@noring.name> I asked the following "poll" question: > Should PG include the full details of the source document(s) used > to produce every PG text? Some other recent messages seem to imply that when a new text submitted to PG includes full source metadata, that info will now be kept in, where formerly it was stripped out. Is this true? If this is true, this is definitely good news. Since DP appears to include this data, the vast majority of new and future PG texts should now include this info. When the early PG texts are redone by DP at some future time, the world will now have the data available. (If I misread something, however, someone correct me.) But we are still faced with the issue of whether PG should require provenance metadata when the work is transcribed from a "paper/ink" original. Another question is whether it should attempt, wherever possible, to reinsert the data into existing PG texts. (E.g., contact the submitters and ask them if they kept that data. DP has submitted thousands of texts -- they no doubt have the source information.)
Jon Noring From hart at pglaf.org Sat Nov 13 08:55:04 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 08:55:05 2004 Subject: [gutvol-d] PG audience In-Reply-To: <5.2.0.9.0.20041113000326.02004ae0@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> <4194B4A3.3050305@perathoner.de> <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <5.2.0.9.0.20041112070730.01f60120@snoopy2.trkhosting.com> <5.2.0.9.0.20041113000326.02004ae0@snoopy2.trkhosting.com> Message-ID: On Sat, 13 Nov 2004, Tony Baechler wrote: > At 10:54 AM 11/12/2004 -0800, you wrote: >> On Fri, Nov 12, 2004 at 07:11:46AM -0800, Tony Baechler wrote: >> > At 06:23 AM 11/12/2004 -0800, you wrote: >> > >> > >Actually, it's pretty easy to find all the original Project Gutenberg >> > >eBooks, >> > >as well as the newer versions, because so many places keep them, >> usually in >> > >the thousands for any of our eBooks that have been out for even a week. >> > >> > Hello. Actually, I've had a hard time finding any of the very early >> > editions of PG files. There are some old files in the etext90 >> directory, >> > but not edition 10 of the first several ebooks. I would be interested >> to >> > find the very first edition of when10.txt or whatever it was called as >> MH >> > posted it. Even the old GUTINDEX.* files have been removed, with the >> > earliest being GUTINDEX.96 when it used to be GUTINDEX.90. >> >> Michael might have some of the older files. There are a few >> sources, like old Walnut Creek CDs, that might also be able to help. I could look through my old CD and floppy eBook collections if this is truly important, but you should be advised that the originals of all the earliest eBooks were ALL IN CAPS, and with limited punctuation, since they were typed in on TeleType 33 machines.
It would be fun to see if anyone could change them back to the originals, and if the blogosphere that caught Dan Rather could possibly check all the punctuation marks to prove that such a document COULD have been typed on a TeleType 33. Of course, I still have mine here in the basement, and might be able to fake it better than anyone could disprove. However, the whole idea of finding the original files doesn't mean a lot to me. . .but I think the first file was just named "when". . .without any number or any extension. [However, that could have been changed by the system administrators when they moved it to 9-track tape. . .which was done by file location, as I recall, rather than by file name. i.e., give me the file that starts at 1240 feet on tape number 1642. . . . That was the kind of instruction we received back in 1971 when someone wanted the Declaration of Independence.] > I do not have every old Walnut Creek CD ever published, but I do have one and > it does not have any of the older files either. I first started using PG in > 1995 and even then the very early files from 1971-89 were not generally > available. The oldest file, at least as far as PG headers go, that > I am aware of is plboss10.zip. I'm not sure if edition 10 is > still available but I have it. I probably still have copies of the first one. . .I think it was an odd green color. . .but, again, it's only of sentimental value as a collector's item, as far as I am concerned. I wonder if they will appear 100 years from now on "Antiques Roadshow"?
;-) From hart at pglaf.org Sat Nov 13 09:01:22 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:01:24 2004 Subject: !@!Re: [gutvol-d] PG audience In-Reply-To: References: <20041113055947.910244BE64@ws1-1.us4.outblaze.com> Message-ID: Speaking of the OED, Greg Newby and I were just discussing it a week or two ago, and he is willing to do the first few pages if anyone has access to an edition that correctly states the date of the first volume as 1888. . .just photocopy the title page/verso and the first couple pages and send to him, to get the ball rolling. Not ALL of the original OED is in the public domain in the U.S., by the way. . .only those volumes published before 1923: NEW DICTIONARY OF THE ENGLISH LANGUAGE BASED ON HISTORICAL PRINCIPLES also known as "The Oxford English Dictionary" Copyright dates for the first edition are:

  v.1    : 1888
  v.2    : 1893
  v.3    : 1897
  v.4    : 1901
  v.5    : 1901
  v.6    : 1908
  v.7    : 1909
  v.8    : 1914
  v.9i   : 1919
  v.9ii  : 1919  <<< Last Volume We Can Do In The US!
  v.10i  : 1926
  v.10ii : 1928
  Supplement : 1933

******* The completion of each individual portion was as follows: [I am not sure if the copyright dates concur exactly] [This is something for the scholars and lawyers to fight about]

  AB    : 1888
  C     : 1893
  D     : 1897
  E     : 1893
  F     : 1897
  G     : 1900
  H     : 1899
  IJK   : 1901
  M     : 1908
  N     : 1907
  O     : 1904
  P     : 1909
  Q     : 1902
  R-RE  : 1905
  RE-RY : 1910
  S-SH  : 1914
  SI-SQ : 1915
  ST    : 1919
  SU-SZ : 1919
  T     : 1915
  U     : 1926
  V     : 1920
  W-WE  : 1923
  WH-WO : 1927
  WO-WY : 1927
  XYZ   : 1921

Michael On Sat, 13 Nov 2004, Brad Collins wrote: > "D. Starner" writes: > >> Brad Collins writes: >> >>> The OED is an example of this. Oxford has pumped a huge amount of >>> money into the dictionary, but the dictionary has also been built >>> with an enormous amount of volunteer help. There are no libraries >>> anywhere near where I live in Bangkok with a copy of the OED which I >>> can use. Since I don't have a credit card, I can't get access to the >>> online edition even if I had the money to pay for it.
>> >> And I understand that despite how much it costs, it has never turned a >> profit in the history of its existence. Oxford keeps people working on >> it because of its importance, not as a profit making venture. >> > > Good point -- > > The bills have to be paid by _someone_. But does that factor in > profits from other dictionaries like the COD (Concise Oxford > Dictionary)? The OED is the baseline for all of the Oxford > Dictionaries, just as Merriam Webster does with their unabridged third > international and the rest. > > COD or the MW Collegiate would not be what they are without their > monster unprofitable cousins. > > I read somewhere that the COD has been one of the top selling books in > UK every year for quite some time (that could be wrong though). And > it might well be that even with this other revenue the whole venture > might still be short of a profit. > > But if they are working on it because of its importance and not for > profit then why make it so expensive? They _want_ to make a profit > from it and they are trying. Fair enough. > > If the OED is only available in institutions which can afford it, it will > eventually be replaced by another, just as Britannica is losing > ground to Wikipedia. > > Wikipedia still has a ways to go (perhaps not in quantity but in > quality) but the writing is on the wall. More than any other type of > intellectual work, every dictionary and encyclopedia is built on the > backs of those that come before it. > > And so it goes.
> > b/ > > > -- > Brad Collins , Bangkok, Thailand > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon at noring.name Sat Nov 13 09:03:36 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 09:04:19 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: <171000217734.20041113100336@noring.name> Brad wrote: > I asked for a copy of the TEI source for Bradford's History of > Plymouth Plantation last month from some academic group. They asked > me to submit a formal request which would explain what I would use > the text for! Interesting. I happen to have a copy of the 1898 printing of `Bradford's History "Of Plimoth Plantation."' My wife's maternal ancestry goes back to colonial Massachusetts, and I think one of her ancestors is mentioned in the book (Degory Priest). If this book has not yet been scanned by anyone affiliated with PG/DP, I'll gladly offer our copy for scanning so long as whatever is used to scan it will not damage the binding (probably can't use a flat bed scanner), and that the scans *will* be made available online for free, even before the work is converted to XML. The book, including index, has over 550 pages, so it is pretty massive. A fascinating work, btw, and one I hope will be scanned and converted to TEI by PG/DP. Jon Noring From hart at pglaf.org Sat Nov 13 09:08:28 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:08:30 2004 Subject: [gutvol-d] Perfection In-Reply-To: <200411130616.iAD6GcSm004979@posso.dm.unipi.it> References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> Message-ID: Question: How much harder is it to make an eBook set up to answer all these scholarly and reference questions, than just to read? 
Michael From marcello at perathoner.de Sat Nov 13 09:25:19 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 09:25:27 2004 Subject: [gutvol-d] Perfection In-Reply-To: References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> Message-ID: <4196437F.9080905@perathoner.de> Michael Hart wrote: > How much harder is it to make an eBook set up to answer all > these scholarly and reference questions, than just to read? Providing source information and page numbers is easy. So is providing the page scans. Of course: page scans != ebook. Marking up a book to satisfy most scholarly requirements is more work than I would care for, short of being paid to do it. -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Sat Nov 13 09:27:39 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:27:41 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Brad Collins wrote: > > This is a real polarizing issue, with many academics believing that > they are the anointed guardians of literature and recorded > knowledge. They feel threatened by groups like PG and DP which have > by-passed their institutional traditions. Many academics today feel > threatened by etexts in the same way that the clergy felt threatened > by the printing press. And this is one of the reasons they won't accept eBooks, even when I bring them to them personally, free of charge. [Oh, they take them, but they won't allow them in libraries.] They still want to be "A Big Fish In A Small Pond." They don't realize that the walls of academia have been penetrated by the virtual world. . .for them to try to stop eBooks is like James Watson's efforts to stop Craig Venter from mapping DNA, or even his efforts to stop the model building Crick 50 years ago.
mh From hart at pglaf.org Sat Nov 13 09:28:32 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:28:34 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Brad Collins wrote: > I asked for a copy of the TEI source for Bradford's History of > Plymouth Plantation last month from some academic group. They asked > me to submit a formal request which would explain what I would use > the text for! Try getting one from the Oxford Text Archive, hee hee! Presuming they are still in operation, and still use the same user agreement. . . . mh From jon at noring.name Sat Nov 13 09:38:13 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 09:38:25 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: <821002295031.20041113103813@noring.name> Brad wrote: > This is a real polarizing issue, with many academics believing that > they are the anointed guardians of literature and recorded > knowledge. They feel threatened by groups like PG and DP which have > by-passed their institutional traditions. Many academics today feel > threatened by etexts in the same way that the clergy felt threatened > by the printing press. > > I asked for a copy of the TEI source for Bradford's History of > Plymouth Plantation last month from some academic group. They asked > me to submit a formal request which would explain what I would use > the text for! > > [snip of excellent comments] I totally agree that academia (in a general sense, there are notable individual exceptions) is overly protective (to a neurotic degree) of their collections of Public Domain materials and digital derivatives thereof, and should not be. This does not mean, then, that PG and other like-minded digital text repositories should therefore choose not to build their text libraries to have a *reasonable* level of quality for academics and scholars.
Rather, what better way to stick it to them than to compete with them on their own turf! Doing this will also raise the consciousness among many, including our politicians, of the value of free and open documents. It might even lead politicians in progressive states to pass laws requiring their state-run colleges and universities to scan their holdings of public domain works and place them online for free and unencumbered use. After all, many of the "academics" are being paid by taxpayer money, as are many of the archives/repositories they run, thus they are ultimately beholden to the public which pays them, and which is the moral owner of the Public Domain. I'm glad that Michael, this morning, made a call to digitize the OED. Despite my heavy criticisms regarding how PG is run, and what its basic requirements should be, I'm fully in support of its Prime Directive in that (in my words): "All public domain texts, both scans and cleaned-up etexts, should be made, and must be made, freely available in digital form to the world without restriction or encumbrance." It pains me when I see publicly-funded academic digital repositories not allowing free and unrestricted access to any work whose source is from the Public Domain. Even if it cost someone $$$ to scan and mark up the work, the results should be open to the Public. After all, it is the Public who owns the Public Domain, thus it has the moral right to demand how any digital derivatives of the Public Domain should be used. Jon Noring (p.s., I wonder if some States have an "open documents" law on their books that could be applied to their universities and colleges, and which could be used to force them to open up their digital scans and digital derivatives of public domain works in their collections? I may bring this up with Brewster when I meet with him next week. Thoughts?)
From jon at noring.name Sat Nov 13 09:47:26 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 09:47:39 2004 Subject: [gutvol-d] Perfection In-Reply-To: <4196437F.9080905@perathoner.de> References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> <4196437F.9080905@perathoner.de> Message-ID: <1611002848015.20041113104726@noring.name> Marcello wrote: > Michael Hart wrote: >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? > Providing source information and page numbers is easy. So is providing > the page scans. Of course: page scans != ebook. > > Marking up a book to satisfy most scholarly requirements is more work > than I would care for, short of being paid to do it. 1) There are *reasonable* basic requirements, which are not onerous at all, that can be made to make the PG corpus of texts much more useful to academia and scholars. Here are a few that come to mind: a) Provide full catalog info for the source of the digital text. b) Provide the complete set of page scans. (I'm still of the opinion this should be a requirement, with the allowance that scans need not be provided under several defined circumstances.) c) In markup in the Master copy, add markers (plus maybe XLinks) to page breaks found in the source. 2) Any 21st century digital repository of texts should allow users to annotate, reference, and interlink the texts. This can be done without altering the texts themselves. Thus, the digital repository will do things that no traditional academic library of atomic-based artifacts can do. In this way, scholars themselves will improve the texts to meet their needs -- we need not do everything for them if we give them the tools to do it themselves.
Jon Noring From mbuch at mcsp.com Sat Nov 13 09:52:06 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 09:50:19 2004 Subject: [gutvol-d] Scholarly use of PG In-Reply-To: Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of David Newman Sent: Saturday, November 13, 2004 4:31 AM To: gutvol-d@lists.pglaf.org Subject: [gutvol-d] Scholarly use of PG As a credentialed conflict avoider, I've been loath to stick my head into this fray. Indeed, this battle about meeting the needs of academia appears to be waged at times with an ideological fervor to rival that of the recent US election. It seems to me that the fervency with which people approach this issue has made it difficult in some cases for the arguments to follow a path towards resolution. It is perhaps also complicated by the wide assortment of changes being proposed to remedy the perceived problems. Some arguments for change suggest that PG should direct its energies towards making its library suitable for scholars by including more information in the files, particularly pagination and provenance, presumably packaged with XML. I have no problem with including such information. However, I don't think it should be required of all texts, nor do I believe that it really solves the scholarship issue. Including page scans _would_, to the degree that a solution is possible, and requires approximately 0% extra work for most of our valiant volunteers. And, PG has made it clear that this is acceptable, and has already done so for some projects. I feel that Marcello gave the most persuasive and concise summary of the situation, and I didn't notice any overt disagreement. Marcello Perathoner wrote: >The best value for Academia (and the least work for us) would be just to >include the page scans. Any transcription you make will fall short of >the requirements of some scholar.
I think we should use our time for >producing more books for a general audience rather than producing >Academia-certified editions of them. HSH's comments justify such an approach. Her Serene Highness wrote: >I need to know EXACTLY when the original >was published, who published it, and where, since there are variant texts >out there. Even a single word change that might have occurred in the >copying process could change the meaning of a vital sentence. Of course, there is a simple, if unsatisfactory, answer to all these questions for PG texts: they were published by PG, on the PG website, and each file states when it was published. Each work we publish is the "PG variant" of that text. As an academic, I find it dishonest and unhelpful for a scholar to cite a physical volume when the volume they consulted is an electronic edition. It is virtually impossible to guarantee that "even a single word change" was not introduced in the transcription process. Even with DP's careful processes, I would not wager that most of our books enter PG completely error free (or correction free, for that matter). ** I would find it dishonest also. I think it is very important for people to give correct citations. However- and this is a big however- PG is not 'publishing' books. It's copying them. There is no PG publishing house that is making decisions on whether something is worth publishing or not. PG acts as a repository- a library. Paper publishers cannot guarantee that each word on the written page is exactly as written by the author. However, with books that are well known or historically important, scholars can often compare published texts with author's notes in order to see the variants. Many of the books on PG are obscure. We are given the name of a book and an author, but there is no book to be looked at. If these texts are important- and I would argue that many obscure texts are, if only for historical reasons- it is important to have copies of the scans. 
In some cases, PG may be the only place where someone can find particular texts. Textual clues do not live only in words. A book comes alive in typeface, and in word placement on a page. James Joyce didn't just write words to be read- he placed them on pages in ways that told the reader how to interpret them. Taking a book out of context- the context of the page- when that book was written prior to the computer revolution is like ignoring how many paintings were paired with their frames by the painters themselves. Saving a book while divorcing it from its index, illustrations, typefont, and so on is not 'saving' it. It's a decontextualization. A perfect example would be movie remakes. There are many different versions of 'A Christmas Carol', several of them in modern dress. Many of them use pretty much the same exact script. Does that make them the same? Why do people prefer even an old, scratched-up and faded copy with Alistair Sim to a nice shiny new version, even if the new film is a shot-for-shot remake? A film is more than actors spouting lines. Film is every aspect that goes into it, even beyond what Dickens thought up. There are times when we want nothing more than the words of Dickens, and there are times when we want the thrill of seeing characters come to life before us in front of our physical eyes. A book may be perfectly good reading material- but an ebook printed in Courier (which is very hard to read), perhaps missing its original illustrations, without an index that shows the manner in which the author's or editor's mind worked- is no longer the original book. As a scholar I like working from original materials. An original material may be on a computer screen- that's fine by me. An original material might be enhanced by being online- many versions of The Bible are, for instance, and I received great joy recently while reading what was essentially a book that gave a key to Silverlock- it worked better online than it ever could have on paper. 
But PG is not publishing or storing original texts. It's working with old ones. I recall the cry that vinyl was going the way of the dinosaur- yet it has not. In fact, the MP3 player is the new vinyl- for the first time in years, there are cost effective '45s', courtesy of Napster and other companies. I can hear snippets of a song before buying, just as my mother once did in record shops. However, Napster technology is not better in the long run than a record- CDs and computer memory degrade at an alarming rate. Books aren't dead either, and people who think books are about finding passages in less than 25 seconds are missing the point of why people read- in the same way that people who drink coffee to get revved up often don't understand why tea drinkers make elaborate ceremonies around a caffeinated beverage. People read because they want a total experience- computers don't feel like paper. They don't smell. The text is usually flat and more difficult to read. Some of this will change over time- but not all of it, thank the Lord. I want books to be available to the public in ways that they have never been before, and so I support PG. But it doesn't have the credibility of a real library or publishing house, because it doesn't publish (copying things and leaving out some of the vitals doesn't constitute publishing in most people's minds, or at least not in a good way, no matter what info techies might want to think) and it doesn't store (libraries don't cut the covers and publishing info off their books to make more room on the shelves, they include books of criticism, and they have technologies for cross referencing- they also have people called librarians who can help people refine their interests and find books that might be of use to them. So do bookstores. Even Barnes and Noble, to some extent). I think some people here want to store books. That's nice, as far as it goes. 
Whether they understand how people use books or why- well, I seriously doubt that some people here have thought about that. It's like MS Word, which ignores that people write more complex things than business letters. Its vocabulary and understanding of grammar are seriously stunted, and it's hellish for anyone who wants to edit anything longer than two pages. Does it process words efficiently? Yes. But it's a fucking bad word processor and has none of the grace of WordPerfect. That most people are fine with it shows how few people actually write or edit for the joy of doing so, which is fine- but its incompatibility with WP and vice versa makes life tough on those who do.*** Page scans allow for an additional layer of safety for any scholar concerned about the adherence to a given print edition, though a certain level of trust in the provider is still required. Thus, while I hope that PG's holdings are as accurate as possible, it would also be my hope that scholars using PG would cite PG. Evidently this is not always the case. Michael Hart wrote: >I've also heard that many of those who complain, actually use our >eBooks in secret, and ONLY want the provenance so they can steal >them without giving credit where credit is due. This suggests to me two things. 1) We can include page scans and information about provenance, _when available_, with the files so that academics can feel confident in the reliability of those PG holdings. Not so that the original sources can be dishonestly cited, but to provide the necessary data for certain scholars to confidently cite PG's edition. We can point to this in our documentation to enhance our scholarly credibility. 2) We can prominently suggest an appropriate style of citation of works in PG's holdings. (I've seen this done with other digital collections.) Perhaps if the citation style also takes into account the original source, some otherwise reluctant scholars would be appeased. Is this something we can all agree on? 
-- David Newman www.davidnewman.info _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From hart at pglaf.org Sat Nov 13 09:50:34 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:50:36 2004 Subject: [gutvol-d] increasing literacy In-Reply-To: References: Message-ID: On Fri, 12 Nov 2004, Her Serene Highness wrote: > Illiterates rarely use computers for reading. PG would be useful after a > person became literate, i.e., able to read. Even the children's books on PG > are a bit too advanced for a person who is non-literate. Having taught > reading, it would not be the first place I would turn- it's too text-heavy, > for one thing. Given that most computers today can read eBooks out loud, it's a perfect way to learn how to read, except that you might end up talking like Stephen Hawking. . . . However, given the number of people who learned English and other languages over the short-wave radio, I think there is a real future for this. Obviously it's not "The Young Lady's Illustrated Primer" as in "The Diamond Age," by Neal Stephenson, but it's a start. I have received a number of emails from people who were starting to learn English who found our eBooks very useful. Michael From hart at pglaf.org Sat Nov 13 09:53:22 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:53:23 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: Message-ID: On Fri, 12 Nov 2004, Her Serene Highness wrote: > > > -----Original Message----- > From: gutvol-d-bounces@lists.pglaf.org > [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Dyck > Sent: Friday, November 12, 2004 6:55 PM > To: Project Gutenberg Volunteer Discussion > Subject: Re: [gutvol-d] PG audience > > > Michael Hart wrote: >> >> If we cater to scholars, we are only expanding the "digital divide," >> so to speak. 
Our goal is to provide a large viable library to all, >> not just to the scholars, who represent less than 1% of the people, >> and are often very elitist. > > I don't think anyone is advocating providing the PG library "just to the > scholars", so that's a straw man. I'm worried that making the eBooks acceptable to scholars may take more effort than simply creating them did, and that then the scholars, libraries, etc., may still opt not to use them or to encourage others to use them. I'm working up a feasibility study on this now; let me know if you have a library/librarian/scholar who is willing to try out a few dozen eBooks with these additional features. Michael S. Hart From hart at pglaf.org Sat Nov 13 09:57:02 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 09:57:04 2004 Subject: [gutvol-d] increasing literacy In-Reply-To: <41954E74.4EFB64DE@ibiblio.org> References: <20041112022413.GB8242@pglaf.org> <4194B4A3.3050305@perathoner.de> <41954E74.4EFB64DE@ibiblio.org> Message-ID: On Fri, 12 Nov 2004, Michael Dyck wrote: > Michael Hart wrote: >> >> If we can increase literacy by even 10%, >> we make more difference than if we cater >> to the scholars. > > We could make even more difference by doing both! > > Setting that aside, do we have any data (or even anecdotal evidence) > re the effect of Project Gutenberg on literacy levels? Lots of schools and home schoolers have sent me messages asking and thanking us for the PG eBooks. . .enough to realize that it is no longer just a dream for these to be used in schooling. As for libraries, I get fewer of these messages from them, but still find that things are getting started there. mh From mbuch at mcsp.com Sat Nov 13 10:11:08 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 10:09:23 2004 Subject: [gutvol-d] Perfection In-Reply-To: <1611002848015.20041113104726@noring.name> Message-ID: This is what I want, too. I want cyber texts to be MORE useful, not less. 
When libraries went to electronic catalogues, Info geeks cheered- they made libraries efficient. They should have been shot. What they did was throw out the original cards, which had been marked up by librarians and scholars, and which provided clues as to which books were worth reading. The people who cheered did not love books- they loved information. Knowledge and information are very different- knowledge takes time. When people thumb through things, they discover new things- hypertext links can help them do this. Several of you here are academics. Academics who give and process info are not the same as researchers- you don't have the same needs. Research takes time and requires facts on a level that number and word-crunching don't. And Michael- I think you are brilliant in many ways, but you don't even want to provide the amount of information required of a junior high school student writing a social studies paper, let alone a scholar- and I think that's a shame. I shudder to think what you believe scholars do, and why, if you love books so much, you have so high an antipathy for them. Getting books on the web is more than a numbers game. It's about preserving something of value. What I'm seeing here among some people is a mentality akin to the early archaeologists, who completely destroyed sites in their rush to get trophies for their museums. They were bad scientists and little more than barbarians. Destroying books in order to reach the new numerical goal is not a good thing- it's very, very bad. Michele (yeah, I have a name) -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Jon Noring Sent: Saturday, November 13, 2004 12:47 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] Perfection Marcello wrote: > Michael Hart wrote: >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? 
Jon Noring _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From mbuch at mcsp.com Sat Nov 13 10:11:11 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 10:09:26 2004 Subject: [gutvol-d] increasing literacy In-Reply-To: Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Michael Hart Sent: Saturday, November 13, 2004 12:57 PM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] increasing literacy On Fri, 12 Nov 2004, Michael Dyck wrote: > Michael Hart wrote: >> >> If we can increase literacy by even 10%, >> we make more difference than if we cater >> to the scholars. > > We could make even more difference by doing both! > > Setting that aside, do we have any data (or even anecdotal evidence) > re the effect of Project Gutenberg on literacy levels? Lots of schools and home schoolers have sent me messages asking and thanking us for the PG eBooks. . .enough to realize that it is no longer just a dream for these to be used in schooling. ** Still, that's not the same as increasing literacy. That's facilitating literacy. Did they say they became smarter or better read? The books are good for schooling- but they could be a hell of a lot better. Having taught every level of school except for elementary (and I've tutored in that), I still say it wouldn't take much to add info that would push PG forward into classroom acceptability.** As for libraries, I get fewer of these messages from them, but still find that things are getting started there. **How do you find that? Are librarians saying that, or are you? Are they using other textual sites? Why? Do they suggest improvements? 
What are they?** mh _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From marcello at perathoner.de Sat Nov 13 10:39:12 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 13 10:39:21 2004 Subject: [gutvol-d] PG audience In-Reply-To: References: <41950197.2020707@perathoner.de> Message-ID: <419654D0.3080204@perathoner.de> Michael Hart wrote: > They don't realize that the walls of academia have been penetrated > by the virtual world. . .for them to try to stop eBooks is like > James Watson's efforts to stop Craig Venter from mapping DNA, > or even his efforts to stop the model building Crick 50 years ago. Well, well, capitalism *has* to be good for something. So let's praise capitalism for kicking the clerics in the *** and freeing information from the imprisonment in monasteries ... before we start kicking capitalism in the *** for making information a proprietary article. -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Sat Nov 13 10:41:15 2004 From: hart at pglaf.org (Michael Hart) Date: Sat Nov 13 10:41:16 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: References: Message-ID: On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote: > This is what I want, too. I want cyber texts to be MORE useful, not less. > > When libraries went to electronic catalogues, Info geeks cheered- they made > libraries efficient. They should have been shot. What they did was throw > out the original cards, which had been marked up by librarians and scholars, > and which provided clues as to which books were worth reading. The people > who cheered did not love books- they loved information. Knowledge and > information are very different- knowledge takes time. When people thumb through > things, they discover new things- hypertext links can help them do this. 
I must admit that I, too, was surprised that ye olde carde catalogues were tossed out like babies with the bathwater. > Several of you here are academics. Academics who give and process info are > not the same as researchers- you don't have the same needs. Research takes > time and requires facts on a level that number and word-crunching don't. More on research below. > And Michael- I think you are brilliant in many ways, but you don't even want > to provide the amount of information required of a junior high school > student writing a social studies paper, let alone a scholar- and I think > that's a shame. I shudder to think what you believe scholars do, and why, > if you love books so much, you have so high an antipathy for them. It's not that I don't believe in this kind of information, it's that I didn't want to provide a different Project Gutenberg eBook for each and every single paper edition out there, and then have to keep canonical errors [sic] in them for all time. I wanted to create a "critical edition" that combined corrections and items from various editions, and we have always supplied the necessary information for citing our eBooks on request, which has apparently never caused any problem either for student or teacher. > Getting books on the web is more than a numbers game. It's about preserving > something of value. What I'm seeing here among some people is a mentality akin > to the early archaeologists, who completely destroyed sites in their rush to > get trophies for their museums. They were bad scientists and little more > than barbarians. Destroying books in order to reach the new numerical goal is > not a good thing- it's very, very bad. Being a pioneer is different than being a researcher, unless you are Indiana Jones, that is, but even he, if you will recall, had his most important work[s] taken away from him repeatedly by both those above and below him on the Darwinian ladder. 
Me, I'm a pioneer, not a researcher, and I fully warned everyone year after year that I am NOT a cataloguer, and that once we passed 10,000 books this would become a very obvious problem. However, libraries carry all sorts of materials that don't come with cataloging information, such as records, CDs, DVDs, pamphlets, paintings, etc. Doubly, however, I am doing some feasibility studies on providing MARC records, and could use some help. Michael Hart From gbnewby at pglaf.org Sat Nov 13 11:09:01 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Nov 13 11:09:04 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: References: <20041113050417.9F29A8C914@pglaf.org> Message-ID: <20041113190901.GA5711@pglaf.org> On Sat, Nov 13, 2004 at 01:30:44AM -0800, David Newman wrote: > Marcello Perathoner wrote: > >The best value for Academia (and the least work for us) would be just to > >include the page scans. Any transcription you make will fall short of > >the requirements of some scholar. I think we should use our time for > >producing more books for a general audience rather than producing > >Academia-certified editions of them. It occurred to me that some people might think that page scans are forbidden or not welcome. While it's true that we don't have many (any?) eBooks with full page scans, we *are* willing & able & ready to take them. Jim Tinsley did a 'howto' on the page scan naming convention (that is the hard part - so people know what they're called and where to find them). The post-10K directory structure, created over a year ago, includes the notion of a subdir for scans. DP has been invited to submit scans along with their texts. Maybe this word has not gone out sufficiently. Like with the XML markup discussion, the question is not "if" but "how." The first folks to submit scans with their submitted eBooks will need to do some extra work to help figure out the best way to do it. The posting team will need to keep track of the large files involved. 
If someone has the scans for a completed eBook, now would be a good time to work on getting them online. My estimate from early 2004 was that this would have grown the PG collection by an extra terabyte or so if we did it all through 2004. We haven't, and so this growth hasn't happened. But other than needing to deal with the extra space (which is trivial for a small number of eBooks, but could be challenging for our mirrors and main distribution servers when done en masse), there's no impediment I know of to moving forward. -- Greg From ciesiels at bigpond.net.au Sat Nov 13 11:14:42 2004 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Sat Nov 13 11:15:52 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <20041113190901.GA5711@pglaf.org> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> Message-ID: <41965D22.3050405@bigpond.net.au> Greg Newby wrote: >DP has been invited to submit scans along with their texts. > http://gutenberg.net/faq/S-21 says: "Page images submitted to Distributed Proofreaders are automatically saved, and, while not publicly available today, will probably become so in the future." I took this to mean that there is no point in submitting page scans of DP projects to PG. Is that right? Mike From gbnewby at pglaf.org Sat Nov 13 11:26:17 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sat Nov 13 11:26:18 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <41965D22.3050405@bigpond.net.au> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> <41965D22.3050405@bigpond.net.au> Message-ID: <20041113192617.GB6386@pglaf.org> On Sun, Nov 14, 2004 at 06:14:42AM +1100, Michael Ciesielski wrote: > Greg Newby wrote: > > >DP has been invited to submit scans along with their texts. 
> > > http://gutenberg.net/faq/S-21 says: > > "Page images submitted to Distributed Proofreaders are automatically > saved, and, while not publicly available today, will probably become so > in the future." > > I took this to mean that there is no point in submitting page scans of > DP projects to PG. Is that right? > > Mike Jim Tinsley had sent a note in this thread about this too, that I hadn't yet seen when I wrote my reply. No, it's not right. Yes, we are ready to accept page scans as part of completed eBooks from DP or other sources. Jim said he's already done this for 3 eBooks. I hope Jim can find the little 'howto' he wrote about the file names & formats, otherwise I can dive into my email archive to seek it out. Getting the process tuned will take a few tries, and require some patience from everyone involved, but the intent for quite some time (over a year at least) has been to move forward with scans. -- Greg From jon at noring.name Sat Nov 13 12:06:22 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 12:06:40 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <20041113190901.GA5711@pglaf.org> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> Message-ID: <221011183281.20041113130622@noring.name> Greg wrote: > It occurred to me that some people might think that page > scans are forbidden or not welcome. While it's true that > we don't have many (any?) eBooks with full page scans, > we *are* willing & able & ready to take them. This is excellent news! Yes, I think people were uncertain about how welcome page scans were by PG. (Whether PG should require page scans be submitted along with texts, with certain exceptions given, is a different issue.) Obviously, if the page scans existed for all the 10,000+ PG texts, the collection of scans would occupy a lot of space, but surprisingly not as much as one might think, at least by today's hardware standards. 
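The back-of-envelope storage arithmetic discussed in this thread can be reproduced with a short sketch. All figures here are the thread's own assumptions (roughly 300 scanned pages per book, roughly 60 KB per losslessly compressed page image), not measurements:

```python
# Rough storage estimate for archived page scans.
# Assumed averages, taken from the discussion, not measured:
KB_PER_PAGE = 60        # size of one compressed page scan, in kilobytes
PAGES_PER_BOOK = 300    # source pages per book

def scan_storage_gb(num_books, kb_per_page=KB_PER_PAGE, pages=PAGES_PER_BOOK):
    """Estimated storage in gigabytes for num_books' worth of page scans."""
    total_kb = num_books * pages * kb_per_page
    return total_kb / 1_000_000  # using decimal units: 1 GB = 1,000,000 KB

print(scan_storage_gb(15_000))     # 270.0 GB -- "a little under 300 gigabytes"
print(scan_storage_gb(1_000_000))  # 18000.0 GB, i.e. 18 TB -- "approximately 20 terabytes"
```

At 18 TB, a million books' worth of scans would indeed be about one fifth of a 100 TB "rack" (ten racks to the petabyte), matching the estimate given below.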
Assuming we have 15,000 texts, each of which has an average of 300 source pages (which may be a high estimate -- anyone?), and each page scan occupies about 60k (using an efficient lossless compression scheme -- this may also be a high estimate -- anyone?), this works out to a little under 300 gigabytes (about 270 GB). (My son recently bought two 200G hard drives for $100 each. There are 300G drives available, and it seems like year after year hard disk capacities continue to increase, while $/gig continues to drop.) I know Brewster Kahle at the Internet Archive will also be happy to receive file copies of these page scans and tuck them away into his archive (which is redundantly mirrored) for preservation and open online access. Of course, with one million scanned books, we are now talking about significant space, approximately 20 terabytes (using the assumptions above). But this is 1/5 of Brewster's "rack" (where 10 racks make a petabyte) and again I know he'll be thrilled to store these away for safekeeping and open access. (PG should also store these scans itself and find others throughout the world willing to store them on hard disk, tape, etc., to assure redundant storage and preservation.) It would not surprise me to see in a few years high quality, durable, random access, compact, and very cheap storage in the ten to twenty terabyte range per unit -- enough to hold the original page scans for one million books. We then can start thinking about one billion books. So storage and access should NOT be an issue with regard to acquiring the original page scans for the PG Library. Jon Noring From jmdyck at ibiblio.org Sat Nov 13 12:42:56 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Sat Nov 13 12:43:21 2004 Subject: !@!@!RE: [gutvol-d] Perfection References: Message-ID: <419671D0.863F7CE1@ibiblio.org> Michael Hart wrote: > > On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] 
wrote: "Her Serene Highness" is Michele, but given her email address, I doubt her last name is Dyck. Mine is, though. Michael Hart: > > ... I didn't want to provide a different Project Gutenberg eBook > for each and every single paper edition out there, and then have > to keep canonical errors [sic] in them for all time. You say "didn't". Do you still feel this way? > I wanted to create a "critical edition" that combined corrections > and items from various editions, I'm curious: How many such amalgams has PG produced? What was the latest? > and we have always supplied the necessary information for citing > our eBooks on request, But that's not apparent to someone reading a PG eBook, I think. E.g., the PG boilerplate doesn't have a sentence like: To find out what printed edition(s) this eBook was created from, send a request to someone@pglaf.org. -Michael From nwolcott2 at kreative.net Sat Nov 13 13:21:14 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 13 13:21:38 2004 Subject: [gutvol-d] Perfection References: <20041112234527.EF05E4BE64@ws1-1.us4.outblaze.com> Message-ID: <00c201c4c9c6$bc5f8260$b79495ce@net> I'm working on the 1616 translation of Suetonius by Philemon Holland, still regarded by some as the best translation. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "D. Starner" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 12, 2004 6:45 PM Subject: RE: [gutvol-d] Perfection > Let me note, I had no way of telling Greg's comments apart > from yours except for context. Perhaps you relied on some > HTML thing; please don't do so. I'm not going to argue the > wisdom of HTML email, but HTML email that does not degrade > nicely to plain text is going to look awful to many of the > receivers. 
> > "Her Serene Highness" writes: > > > But a > > citation of an out of print book in anthropology, English literature, the > > hard sciences, et al, which might very well not be correct in its > > information- that will be problematic. > > But this has nothing to do with etexts; this has to do with older books. > > > > > I would be very happy to see Boas online. Eventually I hope to track down > > an out of copyright version of his writings and scan it for PG. > > It'll be a long time, unless you move to Canada. The last of his > works are under copyright for another 7 years in the EU and 33 years in > the US. The Bureau of American Ethnology volumes are being worked on > up to 1930 (since it's a US government publication) and I believe that > includes some work by Boas. > > > In chapter 5 > > there might be a very quotable sentence- but what my student doesn't know is > > that this sentence was changed in later editions. And there's no page > > number- does he tell his teacher to read the entire chapter to find a > > sentence that won't be there in a later edition? > > What is he supposed to do, give a page reference to one of a dozen editions > that might be very hard for the teacher to find? With etexts, you know > that your recipient has access to the same edition you have. And as someone > else pointed out, if you quote the sentence, the context can be found in > seconds. > > > After all, I > > have no idea who JM Rodwell was, or whether his translation of The Koran is > > the definitive English version, or why his translation was chosen- other > > than that his book was out of copyright. From my point of view, that's a red > > flag itself. If this translation is so superb, why isn't it still being > > used- or is it? > > And how do I know that if I pull it off the library shelves? My college library > has a half dozen different translations of the Koran; how am I to know which > are in use? 
> > As for the reason it's not being used, I would suggest that the fact that > academics like to retranslate everything every decade might be an explanation. > My class used a modern translation of the Iliad, but that doesn't mean that > in several hundred years of English translation of the work that's now public > domain, there's not one competent, even superb translation. > > > Nietzsche's work, for instance, was butchered by his sister. There are > > conflicting copies of his work floating around. When his works were copied > > for Project Gutenberg, did someone go for an out of copyright copy that is > > definitive, or one that his sister chopped up? Did that matter, or was it > > just more important to get a copy up? > > I doubt that the people who scanned it were aware of the differences. > -- > ___________________________________________________________ > Sign-up for Ads Free at Mail.com > http://promo.mail.com/adsfreejump.htm > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Gutenberg9443 at aol.com Sat Nov 13 13:24:19 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 13:24:30 2004 Subject: !@!@!RE: [gutvol-d] Perfection Message-ID: <14.384d0968.2ec7d583@aol.com> In a message dated 11/13/2004 1:43:34 PM Mountain Standard Time, jmdyck@ibiblio.org writes: I wanted to create a "critical edition" that combined corrections > and items from various editions, I'm curious: How many such amalgams has PG produced? What was the latest? I don't know how many others, but my version of SWISS FAMILY ROBINSON is such an amalgam. Anne -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/3d02184a/attachment.html From Gutenberg9443 at aol.com Sat Nov 13 13:38:19 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 13:38:30 2004 Subject: [gutvol-d] Perfection Message-ID: In a message dated 11/13/2004 2:21:47 PM Mountain Standard Time, nwolcott2@kreative.net writes: I'm working on the 1616 translation of Suetonius by Philemon Holland, still regarded by some as the best translation. Alexander Thomson did the translation already posted. Obviously this is one of the cases in which the name of the translator is essential. It does bug me when an etext doesn't give me the original pub date, although I am able to look most of them up. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/01e810df/attachment.html From shalesller at writeme.com Sat Nov 13 14:36:54 2004 From: shalesller at writeme.com (D. Starner) Date: Sat Nov 13 14:37:05 2004 Subject: [gutvol-d] Scholarly use of PG Message-ID: <20041113223654.137CA4BE64@ws1-1.us4.outblaze.com> "Her Serene Highness" writes: > However- and this is a big however- PG is not > 'publishing' books. It's copying them. There is no PG publishing house that > is making decisions on whether something is worth publishing or not. Yes, there is. The people of PG as a whole decide whether something is worth publishing or not. There are many books that the people of PG have decided they aren't worth publishing for right now, because they are too hard to scan, too hard to process, too expensive or too hard to get a copy of, etc. > Textual clues do not live only in words. A book comes alive in typeface, > and in word placement on a page. I've seen a grand total of two modern books that used the long-s and weren't facsimile copies, and one of those only used the long-s to replicate the original title pages, not the original text. 
Several of the facsimile editions have been upfront about the fact that they do what they do for cost reasons, not because it's better. > Saving > a book while divorcing it from its index, illustrations, typefont, and so on > is not 'saving' it. It's a decontextualization. If you want an original copy, go find it. But every new publication decontextualizes the books. Somehow Beowulf readers are willing to deal with editions that look nothing like the original, that lack its illustrations, its typeface. I've handled books printed in Germany in the mid-18th century, and it's an experience in some ways. But that doesn't mean that I insist that reprints be printed in old-style German fonts on rag paper. > A book may be perfectly good reading material- but an ebook printed > in Courier (which is very hard to read), Then don't print it in Courier. That choice is left to you. > perhaps missing its original > illustrations, without an index that shows the manner in which the author's > or editor's mind worked- is no longer the original book. No, it's not. We don't have matter copiers to replicate the original book. > I recall the cry that vinyl was going the way of the dinosaur- yet it has > not. In fact, the MP3 player is the new vinyl- Vinyl has gone the way of the dinosaur; I think it's down to less than 0.5% of new material sold, and that in a few limited genres. The MP3 may be the "new vinyl", but it's not vinyl. > However, Napster technology is not better in the long run > than a record- CDs and computer memory degrade at an alarming rate. Records are degraded the instant they're pressed, are impossible to copy, and degrade while playing. Napster made backups on a million computers in a few days. You can manually make backups easily, and take them in the car or while jogging. You seem like you're looking for reasons to attack new technology.
It has its faults, but I think the complete superseding of records by CDs is good evidence that CDs are overall better than records. > Books > aren't dead either, and people who think books are about finding passages in > less than 25 seconds are missing the point of why people read That was a challenge you made. > People read because they want a total experience - computers don't feel like > paper. They don't smell. The text is usually flat and more difficult to > read. Some of this will change over time- but not all of it, thank the Lord. That's absurd. People read for a million reasons; there is no one point of why people read. Some readings are for entertainment, some readings are because there's nothing else to do on a long trip, some are of whole books for detailed information, some are of one page for a little piece of information. Yes, some people prefer to read on paper for those things that will never be replicated on computer, but we aren't going into libraries and taking books off the shelves. > (copying things and > leaving out some of the vitals doesn't constitute publishing in most > people's minds, or at least not in a good way, no matter what info techies > might want to think) My library has dozens of little brown books in a series called "Handy Literal Translations", where they took older translations, dumped most of the plays or speeches, and published them in a handy portable form. Or how about the Augustan Reprint Society, which frequently reprinted only the introduction, or select essays from various volumes? > libraries don't cut the covers and > publishing info off their books to make more room on the shelves, Right, they put them on microfilm and throw the newspapers away. My mother has a signed Mark Twain that the library was getting rid of, and someone on DP bought boxes of books at the Sydney University bookfest at $5 apiece. Don't tell me that libraries don't get rid of books.
As for the covers, in many academic libraries, a number of books have had their covers removed and replaced with a library binding. I rarely see dust jackets on library books, especially not in university libraries. Decontextualization galore. > they > include books of criticism, How many books of criticism do you usually find in a library of 10,000 to 15,000 books? It's not like we don't include books of criticism, it's just that we don't have many yet. > Whether they understand how people use books or why- well, I > seriously doubt that some people here have thought about that. Right, all these people who love books so much that they would spend their volunteer time working on scanning them and proofing them don't know how people use books or why. They just love books from a distance; they don't actually use them. Furthermore, I don't think you understand how people use ebooks or why. You spend a lot of time in criticism, but a lot of it is just wrong. You told us we couldn't find a quote in a large body of text, you tell us that typeface is important when no printed book cares, you complain that an ebook in Courier is hard to read, which is a bit like saying that it's hard to read this book because it's upside down. > its incompatibility with WP and vice versa makes > life tough on those who do. Have you ever tried learning Word? Your gripes sound like someone who learned WordPerfect and never bothered to learn Word. May I ask again that you conform to the standard email quoting conventions and trim irrelevant text that you aren't replying to? There are rules which have developed over time for the ease of communication via email, which may at some points be arbitrary, but everyone adhering to the standard facilitates communication. Your ignoring of these rules makes me feel that it's less of a communication and more your demanding that the computer world bend entirely to conform to your little world.
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From vze3rknp at verizon.net Sat Nov 13 15:41:05 2004 From: vze3rknp at verizon.net (Juliet Sutherland) Date: Sat Nov 13 15:41:16 2004 Subject: [gutvol-d] PG audience In-Reply-To: <171000217734.20041113100336@noring.name> References: <41950197.2020707@perathoner.de> <171000217734.20041113100336@noring.name> Message-ID: <41969B91.6060104@verizon.net> I scanned that book and put it through DP around the 4th of July. It is still waiting for someone to decide to post-process it. JulietS Jon Noring wrote: >Brad wrote: > > > >>I asked for a copy of the TEI source for Bradford's History of >>Plymouth Plantation last month from some academic group. They asked >>me to submit a formal request which would explain what I would use >>the text for! >> >> > >Interesting. > >I happen to have a copy of the 1898 printing of `Bradford's History >"Of Plimoth Plantation."' My wife's maternal ancestry goes back to >colonial Massachusetts, and I think one of her ancestors is mentioned >in the book (Degory Priest). > >If this book has not yet been scanned by anyone affiliated with PG/DP, >I'll gladly offer our copy for scanning so long as whatever is used to >scan it will not damage the binding (probably can't use a flat bed >scanner), and that the scans *will* be made available online for free, >even before the work is converted to XML. > >The book, including index, has over 550 pages, so it is pretty massive. > >A fascinating work, btw, and one I hope will be scanned and converted >to TEI by PG/DP. 
> >Jon Noring > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > From jon at noring.name Sat Nov 13 15:48:35 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 15:49:23 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: References: Message-ID: <1131024517078.20041113164835@noring.name> Michael Hart wrote: > Michele wrote: >> And Michael- I think you are brilliant in many ways, but you don't even want >> to provide the amount of information required of a junior high school >> student writing a social studies paper, let alone a scholar- and I think >> that's a shame. I shudder to think what you believe scholars do, and why, >> if you love books so much, you have so high an antipathy for them. > It's not that I don't believe in this kind of information, it's that I > didn't want to provide a different Project Gutenberg eBook for each and > every single paper edition out there, and then have to keep canonical > errors [sic] in them for all time. > > I wanted to create a "critical edition" that combined corrections > and items from various editions, and we have always supplied the > necessary information for citing our eBooks on request, which has > apparently never caused any problem either for student or teacher. Now I think this is getting us to the core of the various issues being discussed of late. In the early days of PG, when disk space was ultra-expensive (and removable storage was of limited capacity), when volunteers were few, and when the Internet did not yet exist (and when it came into being for the ordinary Joe in the late 1980's with very slow modem access), the idea of PG focusing on producing a "critical edition" of important public domain works for casual reading made a whole lot of sense. However, I believe things have changed so much that this focus needs to be reevaluated.
Let's look at the situation today, and tomorrow: (o) Disk space is getting so cheap and of such high capacity that we can now consider it economical for text repositories to hold the high-density original page scan images for *one million books*. When the texts are in high-quality XML, we can hold *billions* of textual works, with no problem. In a decade, we can begin talking about *trillions* of textual works (big and small). There's no longer an issue of which published edition to pick to "represent" a particular Work -- we can have them all online. (o) More and more people have high-speed access to the Internet, allowing fast downloading of books, as well as enabling the technologies to mobilize large numbers of avid volunteers to produce high-quality texts (eventually in XML markup) using Internet-enabled systems such as Distributed Proofreaders. And tomorrow? Here's what I see: (o) We will see Distributed Proofreaders greatly improve in both quality of production (high quality XML output) as well as much greater capacity. It will also be "clonable" by other groups dealing with specific types of publications. I believe we'll see over 1000 major books PER DAY being completed by DP and its various "clones" throughout the world, not to mention innumerable texts of other types. That's a thousand book-length works PER DAY worldwide. Thus, the need for "critical" editions based on technical limitations is no longer an issue. Many works were only issued once anyway, so the etext version *is* the critical edition, but some works were issued in various editions over time -- all of them can now be scanned and placed side-by-side online. Let the end-user decide which one to access, based on their own investigation or by the recommendations of others (advanced systems can be set up to aid in selection -- PG itself can recommend which version the reader should consider first.) 
It is thus important to preserve the full source information, since end-users will need to know that information, to know what they are getting. If an earlier, more faithful version of the Work is not in the PG system (how would they know unless the versions of the Work already in the system have complete source information?), they can suggest which edition to convert through DP. Ultimately, I hope that PG will cover almost all first and early editions of important works. Another aspect of this issue is submissions of works to PG which are based on original Public Domain works, but which have been substantially modified by the submitter acting as editor, in essence creating a new edition of the Work. For example, my publishing company's version of Sir Richard F. Burton's "Kama Sutra of Vatsyayana", first published in the 1880's, has been significantly edited and modified -- but not expunged in any way -- no content has been removed, but content has been moved around to aid with logical organization, plus I've added several annotations to clarify things which Burton inexplicably did not. The publisher intro to this book makes clear what changes were made to the text. For submissions such as this, PG should certainly accept such altered and composite works, but it is important that the metadata state clearly that this is an "altered" work from the source, or something to that effect, as well as stating what public domain source(s) were used to create the work. (Ideally, PG would have these source works in the PG Library, with the original page scans and the faithful etext versions alongside, so the user of the altered/composite etext will be able to determine, if they want, the alterations which were made to create it.) In summary, I believe PG is making a big mistake going down the road of being a "gatekeeper" or "original publisher" of some sort.
It should concentrate on what it does best: locate/acquire, copyright clear, and place online Public Domain (and Creative Commons) texts in high-quality form. Let others do the vetting and recommendations for what should be read. Let PG make it ALL available for free to everyone, everywhere and at all times. Jon Noring From jon at noring.name Sat Nov 13 15:55:33 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 15:56:08 2004 Subject: [gutvol-d] PG audience In-Reply-To: <41969B91.6060104@verizon.net> References: <41950197.2020707@perathoner.de> <171000217734.20041113100336@noring.name> <41969B91.6060104@verizon.net> Message-ID: <1241024934156.20041113165533@noring.name> Juliet wrote: > Jon wrote: >> I happen to have a copy of the 1898 printing of `Bradford's History >> "Of Plimoth Plantation."' My wife's maternal ancestry goes back to >> colonial Massachusetts, and I think one of her ancestors is mentioned >> in the book (Degory Priest). > I scanned that book and put it through DP around the 4th of July. It is > still waiting for someone to decide to post-process it. Great to hear! Thanks for replying. Jon From nwolcott2 at kreative.net Sat Nov 13 14:45:18 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 13 16:03:18 2004 Subject: [gutvol-d] Gone with the wind i s "Gone with the wind" References: <006d01c4c8cb$e4d38440$069595ce@net> <4195639D.5060600@bohol.ph> Message-ID: <00da01c4c9dd$50e11460$b79495ce@net> When an item is removed from a collection, it is apparently removed from all of the archive as well. Hence no GWTW on the archive. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Jeroen Hellingman" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 12, 2004 8:30 PM Subject: Re: [gutvol-d] Gone with the wind i s "Gone with the wind" > Norm Wolcott wrote: > > > Sayonara. Apparently all versions of GWTW have disappeared from the net. 
> > > > nwolcott2@post.harvard.edu Friar > > Wolcott, Gutenberg Abbey, Sherwood Forrest > > > Try the wayback machine, www.archive.org > > Jeroen. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From nwolcott2 at kreative.net Sat Nov 13 16:02:51 2004 From: nwolcott2 at kreative.net (Norm Wolcott) Date: Sat Nov 13 16:03:21 2004 Subject: [gutvol-d] Gone with the wind i s "Gone with the wind" References: <006d01c4c8cb$e4d38440$069595ce@net> <4195639D.5060600@bohol.ph> Message-ID: <00db01c4c9dd$51c46f80$b79495ce@net> The volume seems to have been deleted in August 2003. There are some fragments remaining, 1.67/2.24 MB. I believe that when removed from a site they are also removed from the archives. Also many zip files have been removed. Could be that the earlier defective versions somehow did not get removed. nwolcott2@post.harvard.edu Friar Wolcott, Gutenberg Abbey, Sherwood Forrest ----- Original Message ----- From: "Jeroen Hellingman" To: "Project Gutenberg Volunteer Discussion" Sent: Friday, November 12, 2004 8:30 PM Subject: Re: [gutvol-d] Gone with the wind i s "Gone with the wind" > Norm Wolcott wrote: > > > Sayonara. Apparently all versions of GWTW have disappeared from the net. > > > > nwolcott2@post.harvard.edu Friar > > Wolcott, Gutenberg Abbey, Sherwood Forrest > > > Try the wayback machine, www.archive.org > > Jeroen. > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From j.hagerson at comcast.net Sat Nov 13 16:16:31 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sat Nov 13 16:16:52 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] Message-ID: <009301c4c9df$349c3940$6401a8c0@enterprise> #define SOMEWHAT_PEEVED 1 I'm a student (or a researcher). I'm writing a paper. I'm planning to cite an e-book.
Why don't I just DOWNLOAD the eBook to my computer and append it to my paper? Then, it doesn't make any difference what happens to the eBook left in the ether. TEACHER, HERE IS THE COPY *I* USED. I'm just a lowly volunteer slogging my way through DP producing books. If Jon Noring wants to go off and create "Distributed Definitive Editions," I don't see anything stopping him. I, personally, am not at all interested in that effort, but I'm just a lowly peasant (an *anybody* doing proofreading). What am I missing? #define SOMEWHAT_PEEVED 0 John From Gutenberg9443 at aol.com Sat Nov 13 17:22:12 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 17:22:29 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed Message-ID: <102.53edde11.2ec80d44@aol.com> In a message dated 11/13/2004 3:37:14 PM Mountain Standard Time, shalesller@writeme.com writes: >>You >>told us we couldn't find a quote in a large body of text, >>you tell us >>that typeface is important when no printed book cares, >>you complain >>that an ebook in Courier is hard to read, which is a bit >>like saying >>that it's hard to read this book because it's upside >>down. Right on. I loathe Courier, so I don't read my ebooks in Courier; I read them on the computer in Times New Roman, and I read them on my ebook reader in Arial, because that's the default. I could change it if I wanted to. As for books to read in bed . . . well, good people, I have wonderful news for you. A version of the Rocket has been revived. FictionWise now has one available for $99 and is actively working on newer versions. 
Go here for more information: _eBookwise_ (http://www.ebookwise.com/) As for me, why would I (or anybody else) reading in bed want to read a paper book, which has only one right-side-up and doesn't care what position I'm in and how uncomfortable I am trying to read it in that position, when I have a lovely little device that will change its orientation so that if I want to hold it in my left hand or in my right hand or longwise with either side up, it agreeably makes its buttons available to whatever hand I want to use, and it will let me decide which of four positions is "up" right now? Oh yes, and it's backlit, so I don't have to keep an overhead light on. Believe me, you read this kind of book in bed for one hour and you will NEVER want to go back to tree books for reading in bed. (Back to the quotation I started with, I CAN read my book upside-down, because upside-down becomes right-side-up at the click of a button.) As yet, eBookWise does not sell a program for turning your own material into .rb format, but you go to the site below and download the second program, the one that supports USB or serial port, and you can turn your Gutenberg books into .rb books. Then use your eBookWise librarian to import the .rb books and put them into your eBookWise reader. _Rocket eBook Site_ (http://www.rocket-ebook.com/Readers/Software/) In fact, I will make a rash promise that I probably will live to regret: when you buy your eBookWise ebook reader, let me know what five PG books you want the most and I'll convert them myself, though they're probably already at Blackmask in .rb format. You can also get .rb books at _Phoenix-Library - A worldwide multiformat ebook library._ (http://www.phoenix-library.org/) Among its other offerings, it has an excellent selection of translating dictionaries from and to several different languages.
These are not searchable; instead, when you start to read a book that you know is likely to have words in different languages, you also load the different language, and then whenever you come to a word you don't know you tell your reader what language it is and to look it up, and most of the time the word will be present. I keep French-English and Spanish-English on my reader at all times, and add other dictionaries (which I keep in my ebook library) when I need them. So--you want a good look at the future of ebooks? Brothers and sisters, it's here. Of course technology will improve. It is the job of technology to improve. But every time it does, there will be a sufficient span of time for PG and its descendants to change the filing system into one that can remain readable. Someone used the example of Beowulf. Uh, yeah. That's a good one. I can't possibly read it in its original language, but I can read it in my original language whenever I want it. Some people find Chaucer unreadable. I don't, but I'm glad it's available in modern English for people who can read it only that way. After all, you can't understand Shakespeare unless you read him in the original Klingon. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/d4cc0412/attachment.html From Gutenberg9443 at aol.com Sat Nov 13 17:24:31 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 13 17:24:51 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] Message-ID: In a message dated 11/13/2004 5:16:55 PM Mountain Standard Time, j.hagerson@comcast.net writes: Why don't I just DOWNLOAD the eBook to my computer and append it to my paper? Then, it doesn't make any difference what happens to the eBook left in the ether. TEACHER, HERE IS THE COPY *I* USED. 
Jon, as one who has graded many college papers, I beg and plead and implore that you provide the teacher the complete URL, not the ebook itself. However, this is a great idea. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/2feb653c/attachment-0001.html From jtinsley at pobox.com Sat Nov 13 18:35:23 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sat Nov 13 18:35:38 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In-Reply-To: <009301c4c9df$349c3940$6401a8c0@enterprise> References: <009301c4c9df$349c3940$6401a8c0@enterprise> Message-ID: <20041114023523.GA9326@panix.com> On Sat, Nov 13, 2004 at 06:16:31PM -0600, John Hagerson wrote: >#define SOMEWHAT_PEEVED 1 >I'm a student (or a researcher). I'm writing a paper. I'm planning to cite >an e-book. > >Why don't I just DOWNLOAD the eBook to my computer and append it to my >paper? Then, it doesn't make any difference what happens to the eBook left >in the ether. TEACHER, HERE IS THE COPY *I* USED. > >I'm just a lowly volunteer slogging my way through DP producing books. If >Jon Noring wants to go off and create "Distributed Definitive Editions," I >don't see anything stopping him. I, personally, am not at all interested in >that effort, but I'm just a lowly peasant (an *anybody* doing proofreading). > >What am I missing? >#define SOMEWHAT_PEEVED 0 > > I'm afraid, John, that you are missing a great deal, as am I. We are both rather lowly, and obviously can't grasp the Big Picture that is so clear to those unencumbered by experience of making etexts, and who have lots of time to contemplate the Big Picture since they aren't spending their time actually doing useful work for PG. 
Until we evolve enough to understand their mighty cogitations, or at least enough to be considered suitable for a post as VP of Marketing, I think we should ignore them, and get back to our humble tasks, while our betters decide what we should be doing. jim From jtinsley at pobox.com Sat Nov 13 19:00:09 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sat Nov 13 19:00:27 2004 Subject: Marking bold & italics in .txt (was Re: [gutvol-d] a few questions that i don't know the answer to) Message-ID: <20041114030009.GA21613@panix.com> On Fri, 12 Nov 2004 22:34:52 -0500, Joshua Hutchinson wrote: >Jim Tinsley wrote: > >> >>Where bold does need to be rendered in plain text, the current >>most common usage (from DP) is *bold text*. There are times when >>it is appropriate to signify bold, but I have seen some texts >>coming from DP where it has been used unnecessarily -- mostly >>to indicate a sub-heading or chapter title in the book. In >>such a case, where a chapter title is clearly a chapter title >>and on a line by itself, there really is no need to mark it in >>the plain text version as having been bold face in the original. >>I think this practice comes from people pre-marking the text for >>later conversion to HTML, rather than any intent to clutter the >>plain text. >> >> >> >Actually, it is probably there from the OCR pre-processing and was never >removed through all the rounds of proofing and post-processing... why I >feel this is important enough of a distinction that I needed to make a >post about ... I have no idea. Well, at least you cleared up a mystery for me! Thanks! And since I have now been asked twice within the last couple of weeks about bold, I guess that's Frequently enough to go into the FAQ. It's now on the list for the next update. jim From jon at noring.name Sat Nov 13 19:35:06 2004 From: jon at noring.name (Jon Noring) Date: Sat Nov 13 19:35:26 2004 Subject: [gutvol-d] Page Scans versus Real eText? 
Message-ID: <211038107671.20041113203506@noring.name> [I posted the following to The eBook Community. But clearly most of the real experts in digitizing texts are found right here on gutvol-d, so I'm reposting the message here. I'm especially curious about the economics associated with commercially doing what Michael Hart and PG have done for years and years.] There are essentially two ways to digitize and place online textual works which exist only on paper. This applies, for example, to older public domain books. 1) Digitally scan the publication, and place the resultant page scan images online as the final product. Optionally, these page scans can be OCR'd to produce a raw (uncorrected) searchable index to search for page scans that may be of interest to the user. Additionally, the scanned images can be "packed" into a PDF document for online distribution and viewing, and for offline printing. 2) The publication is converted into real digital text, using either OCR or keying in by hand to produce the raw text. Then, significant human effort is expended to proofread and correct the digital text for any transcribing/OCR and other errors. The resultant cleaned-up text can either be kept in plain text form (traditional Project Gutenberg text), or marked up into XML documents using some markup vocabulary. (Optionally, the original page scans can be kept alongside the cleaned-up/marked-up text, thereby accruing whatever advantages the first method gives.) Clearly, digital text is superior in many respects to page scan images. The biggest downside is the need to do the laborious human proofing. Online proofing systems such as Distributed Proofreaders have made proofing much easier to do, mobilizing many willing volunteer proofers and providing a convenient Internet interface.
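[A minimal sketch of the raw searchable index described under option (1). This is a hypothetical illustration, not an actual PG or Million Books tool; `build_page_index` and `find_pages` are invented names, and a real system would feed them uncorrected OCR output from the page scans rather than ready-made strings.]

```python
def build_page_index(pages):
    """Build a word -> page-numbers index from raw, uncorrected OCR text.

    `pages` is an iterable of (page_number, raw_ocr_text) pairs.  Since the
    text is never proofread, the index only needs to be good enough to
    point a reader at page scan images of possible interest.
    """
    index = {}
    for page_no, raw_text in pages:
        for word in raw_text.lower().split():
            index.setdefault(word, set()).add(page_no)
    return index


def find_pages(index, word):
    """Return the sorted page numbers whose raw OCR text contains `word`."""
    return sorted(index.get(word.lower(), ()))
```

[A hit leads the reader to the page image itself, so OCR errors cost only recall, never fidelity -- which is exactly the trade-off option (1) makes by skipping human proofing.]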
However, in discussions with various people on this topic I've not been able to explain, in a cogent and compelling way, all the reasons why the additional effort should be expended to produce high-quality digital text. Some of these people believe that putting the scanned page images online is more than sufficient. So, this is an "Ask TeBC" request for better arguments to use. I hope, too, that it catalyzes interesting discussion on the various aspects associated with the general issue of getting our printed heritage online, which is obviously an ebook-related topic. And this not only applies to books, but to periodicals, newspapers, and many other types of historical documents. ***** Another related question: If I have a typical printed book (say a 300 page fictional work), and I hire a commercial company to convert it into a clean, high-quality digital text with XML markup (e.g., using TEI-Lite), how much would it cost? In the U.S.? Overseas (such as in India)? Anyone know? ***** Jon Noring From mbuch at mcsp.com Sat Nov 13 20:33:31 2004 From: mbuch at mcsp.com (Her Serene Highness) Date: Sat Nov 13 20:31:57 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to taketo bed In-Reply-To: <102.53edde11.2ec80d44@aol.com> Message-ID: -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Gutenberg9443@aol.com Sent: Saturday, November 13, 2004 8:22 PM To: gutvol-d@lists.pglaf.org Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to taketo bed In a message dated 11/13/2004 3:37:14 PM Mountain Standard Time, shalesller@writeme.com writes: >>You >>told us we couldn't find a quote in a large body of text, >>you tell us >>that typeface is important when no printed book cares, >>you complain >>that an ebook in Courier is hard to read, which is a bit >>like saying >>that it's hard to read this book because it's upside >>down. Right on. 
I loathe Courier, so I don't read my ebooks in Courier; I read them on the computer in Times New Roman, and I read them on my ebook reader in Arial, because that's the default. I could change it if I wanted to. As for books to read in bed . . . well, good people, I have wonderful news for you. A version of the Rocket has been revived. FictionWise now has one available for $99 and is actively working on newer versions. Go here for more information: >> Again- the people on this list are computer savvy. My mother isn't. My students are limited. None of them have ebook readers. They are not going to spend $99 to read a book, especially one that isn't new. And lots of people read paper books in bed- and would prefer one over an ebook that will cost them $99 to read, and has a limited number of options. I can go to my local library and get any number of books for free, and curl up in bed with them. I can order even more through interlibrary loan, and get them in a matter of days. I can get new ones and old ones, of my own choice, in a variety of editions- not just what someone has chosen to put on the net. I can get books with illustrations in color or black and white. A book like Alice in Wonderland can be read in versions that contain illos by Rackham, Tenniel, or more modern illustrators- and the books are usually large enough for me to share with a child. For instance, I love children's books- I was in a store today and read through several by Chris Van Allsburg. I'm sorry I didn't get to the store earlier- he was signing books. I would have bought one for him to sign. I don't think he signs books on machines. Why do I like paper? I don't need batteries or electricity. All I need is sunlight or a candle. I can pack a book in plastic and come back 20 years later to find it in working order- the technology won't have changed. In 100 years I'll be able to read it too, in many cases. I can't read books I once downloaded to 5" floppies.
Soon, I won't be able to read books downloaded to 3" floppies. I can't load books to some earlier versions of ebooks. I can however read paper books without having to change the typeface myself, and I can carry them with me anywhere. I don't read twenty books at once. I read one. And I can glance ahead, go back, look at the page next to the one I'm reading, put the book down next to another book and compare the information in the two without having to spend $198, and a whole lot of other things. In NYC where I live, we are in love with technology. We have one of the oldest subway systems. WiFi is very popular among the upper middle classes. We pretty much all carry cellular phones and use them constantly. We live for our iPods and mp3 players. We are wired to the max- and in the subway, on the street and in cafes, we read paper books and paper magazines and paper tabloids. And we don't even have to change the type or pay a class-separating $99 for the right to read. When my local Barnes and Noble was selling ebooks here a few years ago, people in this high tech city, in the shadow of what was then Silicon Alley- very few sold. I look forward to the OQO and some of Sony's new products, but pay $99 to have the right to read old Tom Swift books with no pictures and not even the enticing smell that old books have? When the machine you talk about can't carry the texts I actually need on a regular basis, because I'm an academic? When pretty much all the fun books I love are only in paper form, and have pictures and other temptations to boot?? Good Lord, woman, what on earth do you read? Are you honestly saying that every book you will ever want to read is on a computer? That every book you love is usually out of print and copyright, or is on the level of John Grisham? Are you truly saying that you think the best version of Treasure Island is on a machine, and not between the pages of a book with color illustrations by NC Wyeth?
Are you saying that you never look at art books, cookbooks, science books? That you only read popular literature and authors who have been dead for about a century? That the latest information about Africa or Asia was written in 1910? I have a couple of first editions by anthropologists. None of the ones I have are online. They smell musty. I know that somewhere along the line, another anthropologist loved those books like I love them. Not just the words- the books. They have marginalia. The fact that they are marked up makes me love them all the more. One day I will die and someone else will love my books and see the comments I made. They will know what I read- not only the book they will hold, but other books I mentioned in the marginalia. I will be putting a message in a bottle that will turn up in the future. I have other books that are used copies from academic bookstores- the margins told me what Professor So-and-so thought was important to his students. The notes helped me get through grad school. I made my own notes, sold the books, and passed them on. I cannot pass on a $99 machine. The individual books help the people who need them. A machine can only be held by one person and the data can be lost. I have a cookbook that belonged to my mother-in-law, now senile, that she gave to my husband, now dead. It has notes from all three of us, and stains from our cooking. I sometimes take it to bed with me, to check recipes the day before a holiday meal. You must think I'm mad to love a physical book that I will pass down to some relative of mine, who will know, from which pages are stained the most, what the best recipes are. There are thumb-prints all over it, and it smells vaguely of milk- it holds my favorite quiche recipe. I have another book that is made up of xeroxes and has illustrations on how exactly to prepare certain medieval recipes. The illustrations are important to me. I can hold that and take it to bed, too.
When NY lost the Twin Towers a few years ago, I thought of what I would take with me if we were ever bombed and I could get out. My computer was not on the list. I thought of things I could use without a battery or any outside power. In an emergency, I could use my cookbook and read Alice in Wonderland, so I would take them along with a pot and some matches. So--you want a good look at the future of ebooks? Brothers and sisters, it's here. Of course technology will improve. It is the job of technology to improve. But every time it does, there will be a sufficient span of time for PG and its descendants to change the filing system into one that can remain readable. << Someone used the example of Beowulf. Uh, yeah. That's a good one. I can't possibly read it in its original language, but I can read it in my original language whenever I want it. Some people find Chaucer unreadable. I don't, but I'm glad it's available in modern English for people who can read it only that way. >>It's available in modern English in book form. A good modern translation is by Seamus Heaney. The original text is reprinted, for those of us who want to check accuracy. << After all, you can't understand Shakespeare unless you read him in the original Klingon. >>You can- if you're educated. Plenty of people understand Shakespeare. Even high school students. People in Italy can read Shakespeare. Tiny children can also- they could at the beginning of this century. My badly-educated, at-risk high school students were able to understand Shakespeare. If you don't, that says more about you than it does about early modern English. And some of us can even parse Beowulf- with a two-language version (which is how it's usually printed) the average person can read an amazing amount of it in the original, or at least grasp it. Maybe if you stopped reading Star Trek novels as literature, you'd realize you read Shakespeare's language pretty much every day.
His turns of phrase are used all the time, and can be understood by people of all economic levels who have the desire to read and learn- even people who cannot afford $99 ebooks to read Stephen King novels (not that Stephen King is bad, but there's more to reading than that).<< Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041113/e56fa2a7/attachment-0001.html From sly at victoria.tc.ca Sun Nov 14 00:14:11 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sun Nov 14 00:14:30 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: <419671D0.863F7CE1@ibiblio.org> References: <419671D0.863F7CE1@ibiblio.org> Message-ID: On Sat, 13 Nov 2004, Michael Dyck wrote: > > I'm curious: How many such amalgams has PG produced? > What was the latest? > There is no effort to count them, so I doubt you could get a reliable number. One I remember clearly doing was "Roughing it in the Bush" by Susanna Moodie. I used as my basis a text online at another site, which, curiously enough, had a very scholarly citation of exactly what it used as its source, although it still had a large number of evident transcription errors. (Also, it said that it was based on the 1852 first edition. Although, learning about the publishing history of this text, I found that there were varying forms of the first edition, as corrections were being made to the plates _during_ the printing process.) Also, on the topic of amalgams, it may help to realize that sometimes corrections are made to a PG text using a different edition than was originally used.
Andrew From lofstrom at lava.net Sun Nov 14 00:56:23 2004 From: lofstrom at lava.net (Karen Lofstrom) Date: Sun Nov 14 00:56:43 2004 Subject: [gutvol-d] I'm sorry, I don't get it In-Reply-To: <20041114023523.GA9326@panix.com> References: <009301c4c9df$349c3940$6401a8c0@enterprise> <20041114023523.GA9326@panix.com> Message-ID: On Sat, 13 Nov 2004, Jim Tinsley wrote: > I'm afraid, John, that you are missing a great deal, as am I. We are > both rather lowly, and obviously can't grasp the Big Picture that is > so clear to those unencumbered by experience of making etexts, and who > have lots of time to contemplate the Big Picture since they aren't > spending their time actually doing useful work for PG. I'm one of the people who wants scholar-friendly editions and I've proofread my 14K pages over the last year and a half. Not to mention a little scanning and post-processing. -- Karen Lofstrom (Zora on DP) From jeroen at bohol.ph Sun Nov 14 01:59:09 2004 From: jeroen at bohol.ph (Jeroen Hellingman) Date: Sun Nov 14 01:58:52 2004 Subject: [gutvol-d] Gone with the wind is "Gone with the wind" In-Reply-To: <00db01c4c9dd$51c46f80$b79495ce@net> References: <006d01c4c8cb$e4d38440$069595ce@net> <4195639D.5060600@bohol.ph> <00db01c4c9dd$51c46f80$b79495ce@net> Message-ID: <41972C6D.5010209@bohol.ph> Norm Wolcott wrote: >The volume seems to have been deleted in August 2003. There are some >fragments remaining, 1.67/2.24 MB. I believe that when removed from a site >they are also removed from the archives. Also many zip files have been >removed. Could be that the earlier defective versions somehow did not get >removed. > > I got a version from there, don't try the zip files, but the txt files (1.0 version) are still there. I managed to get one complete... Now looking into having a Philippines based website set-up. Problem is, most Philippine hosting providers rent their servers in the U.S., very few have servers actually located in the Philippines.
Also, a single book is not worth having a separate server at about $12 per month, and we need to discuss copyright policies with the hosting provider beforehand. Of course I have a lot of Philippine related works to be added to the collection, but to date they all happily can reside on the PG US server. (That might start once we want to tackle more recent Philippine PD works) In Holland, Bits of Freedom recently placed a long PD copy of one of Multatuli's works on-line, and then, using a yahoo (!) mail account, a fake lawyer under the name "Droogstoppel" (which should ring a bell with everybody who has read his most famous book) sent a serious-looking cease-and-desist letter to a large number of providers. Most complied immediately, even without notifying the owner, or verifying the copyright status (which was in fact clearly explained with the copy itself). Only one gave the correct answer that the work was public domain, and the request could not be honored. Jeroen. From j.hagerson at comcast.net Sun Nov 14 05:12:09 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sun Nov 14 05:12:30 2004 Subject: [gutvol-d] What do scholars want? Message-ID: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> From the perspective of a peasant. It appears that the most important thing that scholars want is immutability. A dead tree copy of a book can't be changed, so they can go on endlessly about which dead tree copy is "better" than any other dead tree copy (I know where all of the errors are, and you don't, so there!). Even though, eventually, dead tree copies wear out, are burned up in fires, are carelessly discarded, or sold off to make space, etc., etc., they don't change. Therefore, an electronic copy is unacceptable because: 1. Maybe it is not the exact representation of a dead tree copy. This is entirely unacceptable because "my" dead tree copy is better than all of the others. 2. Its URL might change and then I couldn't find it. 3.
Worse yet, the URL doesn't change but the text does. (See point 1.) It appears that we need to modify the PG web site to include checksum and CRC data on each of our files to provide a mechanism of verifying that they have not been nefariously modified after download, so "my" electronic copy can be judged the same as "your" electronic copy. I fall back to my earlier point: What would be better when you're submitting research than to include a copy of every item of source material? This is not done with dead trees because we do not have a mechanism to instantly create an exact duplicate of a given piece of material for free in the dead tree world. Such a mechanism does exist in the electronic world. When academia wakes up to this fact, maybe their negativity toward electronic copies will lessen somewhat. From tb at baechler.net Sun Nov 14 07:13:53 2004 From: tb at baechler.net (Tony Baechler) Date: Sun Nov 14 07:12:10 2004 Subject: [gutvol-d] What do scholars want? In-Reply-To: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> Message-ID: <5.2.0.9.0.20041114071022.01fe1c50@snoopy2.trkhosting.com> At 07:12 AM 11/14/2004 -0600, you wrote: >It appears that we need to modify the PG web site to include checksum and >CRC data on each of our files to provide a mechanism of verifying that they >have not been nefariously modified after download, so "my" electronic copy >can be judged the same as "your" electronic copy. Yes, but even CRC, hash or md5 values can be forged. All someone would need to do is somehow compromise the PG server. That has happened with a main Debian and gnu server already. How would we make sure that the hashes are real? One solution is gpg signatures, but then someone needs to download and install gpg, a tool to verify the hash, plus the actual text file. The average user won't know how to do this and wouldn't even if they could. 
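[The checksum idea being debated here can be sketched in a few lines. A hedged illustration only: Python is assumed, the demo file is a stand-in for a downloaded eBook, and the published-hash comparison at the end is hypothetical, not an actual PG interface.]

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large eBooks need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-in for a downloaded text file (contents are illustrative only).
with open("demo_etext.txt", "wb") as f:
    f.write(b"This eBook is for the use of anyone anywhere.\n")

local_hash = file_sha256("demo_etext.txt")
# A reader would compare local_hash against a value published alongside
# the file (hypothetical), e.g.:  ok = (local_hash == published_hash)
```

As the discussion notes, this only relocates the trust problem: the published hash itself must come from a server the reader trusts.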
Not to mention that the hash and signature process would have to be done every time one byte is changed in the original, such as for correcting errors. From hart at pglaf.org Sun Nov 14 07:37:52 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 07:37:55 2004 Subject: !@!Re: [gutvol-d] What do scholars want? In-Reply-To: <5.2.0.9.0.20041114071022.01fe1c50@snoopy2.trkhosting.com> References: <5.2.0.9.0.20041114071022.01fe1c50@snoopy2.trkhosting.com> Message-ID: On Sun, 14 Nov 2004, Tony Baechler wrote: > At 07:12 AM 11/14/2004 -0600, you wrote: >> It appears that we need to modify the PG web site to include checksum and >> CRC data on each of our files to provide a mechanism of verifying that >> they >> have not been nefariously modified after download, so "my" electronic copy >> can be judged the same as "your" electronic copy. > > Yes, but even CRC, hash or md5 values can be forged. All someone would need > to do is somehow compromise the PG server. That has happened with a main > Debian and gnu server already. How would we make sure that the hashes are > real? One solution is gpg signatures, but then someone needs to download and > install gpg, a tool to verify the hash, plus the actual text file. The > average user won't know how to do this and wouldn't even if they could. Not > to mention that the hash and signature process would have to be done every > time one byte is changed in the original, such as for correcting errors. Nothing more is needed for this than "compare." This has been discussed widely over the years, and the simple and easy solution, for those who really want to test the files, is simply to get a few copies of the eBook in question from some different sources and test them with any of the various "file compare" programs that come with virtually all operating systems. Thus, even if just one ";" were changed to a ":" it would show up immediately, something that a careful proofreader might still miss. 
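[The "compare" approach described above can be sketched with Python's standard difflib standing in for the OS-level file-compare utilities mentioned; the two copies below are hypothetical strings, not real PG files.]

```python
import difflib

def compare_copies(text_a, text_b):
    """Return unified-diff lines between two copies of the same eBook."""
    return list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile="copy_from_source_a", tofile="copy_from_source_b",
        lineterm=""))

# Two copies of "the same" text, differing in one punctuation mark.
copy_a = "To be, or not to be; that is the question."
copy_b = "To be, or not to be: that is the question."

diffs = compare_copies(copy_a, copy_b)
# Any output at all means the copies differ; identical copies yield nothing.
```

Even the single ";" changed to ":" surfaces as a -/+ line pair, which is exactly the property being relied on here.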
This totally avoids the possibility, raised above, of forged CRCs or hashes, and eliminates the need for any extra work on eBook preparation. Anyone can run the tests themselves, without a reliance on outside authorities to tell them if one eBook edition is any different than another, and exactly how different it is. Simple, fast and effective, the way the entire eBook process should be. Michael S. Hart From marcello at perathoner.de Sun Nov 14 07:42:01 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun Nov 14 07:42:06 2004 Subject: [gutvol-d] What do scholars want? In-Reply-To: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> References: <00a301c4ca4b$8e657bf0$6401a8c0@enterprise> Message-ID: <41977CC9.7040406@perathoner.de> John Hagerson wrote: > It appears that we need to modify the PG web site to include checksum and > CRC data on each of our files to provide a mechanism of verifying that they > have not been nefariously modified after download, so "my" electronic copy > can be judged the same as "your" electronic copy. We already have hashes for all our files. That's the way KaZaa and other P2P networks work. We keep more hashes for every file than you may want to know about: md5, sha1, kazaa, ed2k and tigertree. If you go to the bibrec page and hover over the P2P link you can see them. (or copy the link into an editor) But I still don't understand what good it will do you if you know the hash of the "original" file? -- Marcello Perathoner webmaster@gutenberg.org From hart at pglaf.org Sun Nov 14 09:39:46 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 09:39:48 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed In-Reply-To: References: Message-ID: Where did this idea that the $99 for the Fictionwise Rocketbook is only for a license to read out of date books come from? Can someone tell us the real deal about this? Thanks!
mh From hart at pglaf.org Sun Nov 14 09:54:44 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 09:54:45 2004 Subject: [gutvol-d] Re: [ebook-community] Ask The eBook Community: Page Scans versus Real eText? In-Reply-To: <4196D9ED.4BC8D6F7@hidden-knowledge.com> References: <931037570875.20041113202609@noring.name> <4196D9ED.4BC8D6F7@hidden-knowledge.com> Message-ID: On Sat, 13 Nov 2004, Michael Ward wrote: > > Jon, two major reasons to convert from page scans: > > 1. File size, text vs. jpeg. > 2. Access to the -content- [semantics, meanings, words] > a. Search > b. Indexing > c. Quoting > d. Footnoting > e. Links > > > Michael Ward > Hidden Knowledge Let's not forget the following: 3. Changing the -content- a. Correcting typos and other errors b. Adding footnotes, comments, prefaces, intros, epilogues from other public domain sources; or your own. c. Machine Translation d. Inserting missing lines, paragraphs, etc. From hart at pglaf.org Sun Nov 14 10:02:39 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 10:02:41 2004 Subject: [gutvol-d] Page Scans versus Real eText? In-Reply-To: <211038107671.20041113203506@noring.name> References: <211038107671.20041113203506@noring.name> Message-ID: Replying to both lists. . .mh On Sat, 13 Nov 2004, Jon Noring wrote: > [I posted the following to The eBook Community. But clearly most of > the real experts in digitizing texts are found right here on gutvol-d, > so I'm reposting the message here. > > I'm especially curious about the economics associated with commercially > doing what Michael Hart and PG have done for years and years.] My guess is that by the time there is a serious commercial eBook industry, say to the point where the next David Letterman and Jay Leno are joking about eBooks then, the way they were about the Web 5-10 years ago, that most of the public domain books will already have been scanned, OCRed, placed online, and will be going through translations into the various languages of the world.
If not most, then certainly most of the ones that were easy to find and of general interest. But still, at the rate things have been going over the past 15 years, another 15 years should put us only a few years from this goal, well within sight, before the commercial eBook industry is developed enough to be part of cultural awareness. Michael S. Hart Project Gutenberg From hart at pglaf.org Sun Nov 14 10:22:33 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 10:22:35 2004 Subject: [gutvol-d] Re: [ebook-community] Ask The eBook Community: Page Scans versus Real eText? In-Reply-To: References: <931037570875.20041113202609@noring.name> <4196D9ED.4BC8D6F7@hidden-knowledge.com> Message-ID: oh. . .I forgot perhaps the most important, e. Putting the books in your own favorite font, character size, margination. From hart at pglaf.org Sun Nov 14 10:26:03 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 10:26:06 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In-Reply-To: <009301c4c9df$349c3940$6401a8c0@enterprise> References: <009301c4c9df$349c3940$6401a8c0@enterprise> Message-ID: On Sat, 13 Nov 2004, John Hagerson wrote: > #define SOMEWHAT_PEEVED 1 > I'm a student (or a researcher). I'm writing a paper. I'm planning to cite > an e-book. > > Why don't I just DOWNLOAD the eBook to my computer and append it to my > paper? Then, it doesn't make any difference what happens to the eBook left > in the ether. TEACHER, HERE IS THE COPY *I* USED. Personally, I like that idea, but for those who don't WANT a footnote the size of an entire book, perhaps putting in a separate file, or disk, would be better. . .though _I_ would be tempted just to put it on my OWN little web page, with my OWN URL: then I don't have to worry about some other problems, such as someone changing the filename, dirname, URL, or deleting the entire book, directory, or site. . .etc. 
Michael From holden.mcgroin at dsl.pipex.com Sun Nov 14 10:52:19 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Sun Nov 14 10:52:30 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In-Reply-To: <009301c4c9df$349c3940$6401a8c0@enterprise> References: <009301c4c9df$349c3940$6401a8c0@enterprise> Message-ID: <4197A963.3050204@dsl.pipex.com> John Hagerson wrote: > #define SOMEWHAT_PEEVED 1 > I'm a student (or a researcher). I'm writing a paper. I'm planning to cite > an e-book. > > Why don't I just DOWNLOAD the eBook to my computer and append it to my > paper? Then, it doesn't make any difference what happens to the eBook left > in the ether. TEACHER, HERE IS THE COPY *I* USED. I'm just curious here but which journal would be willing to publish the full text of all references cited? Certainly none of those I've had papers in. Cheers, Holden From Gutenberg9443 at aol.com Sun Nov 14 10:58:28 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Nov 14 10:58:37 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] Message-ID: <1e.386f7476.2ec904d4@aol.com> In a message dated 11/14/2004 11:52:40 AM Mountain Standard Time, holden.mcgroin@dsl.pipex.com writes: >>I'm just curious here but which journal would be willing >>to publish the >>full text of all references cited? Certainly none of those >>I've had >>papers in. None. That's why Michael's suggestion--store your references on a personal Website and put a URL in your bib--is better than this suggestion. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041114/5fc22fe2/attachment.html From hart at pglaf.org Sun Nov 14 11:07:31 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:07:33 2004 Subject: [gutvol-d] PG Is Not A Xerox Machine, Yet [was Perfection] In-Reply-To: <1131024517078.20041113164835@noring.name> References: <1131024517078.20041113164835@noring.name> Message-ID: re: Jon Noring reply to "Re: !@!@!RE: [gutvol-d] Perfection: Jon brings up several points that are between the past and the future, and obviously he has some differing points of view as to when each of these events might be placed on the calendar. The obvious point right now is whether Project Gutenberg should be doing several possible editions of each eBook, or should be comparing several different editions and creating our own edition that we hope will eventually be better than any of the previous paper editions. Jon says we should be doing separate editions, due to advances in disk space, download speed, and the time when Distributed Proofing will be doing 1,000 eBooks per day. * If we presume this is going at a rate of about 10 per day [we are at just about 11 per day in reality] and that this rate should be doubling at Moore's Law rates, then we would have this scenario:

Bks/Day   Years   Date
10          0     2004
20-40       3     2007
80-160      6     2010
320-640     9     2013
1K+        10     2014+

I agree that when all of these have been integrated into the world of 75% to 90% of even our own portion of the Internet for several years [enough time to do our first eBooks of most of these books] then it will certainly be time to start including variant editions, as we have already done with some of the great works such as those of Shakespeare, Dante, the Bible, etc. In fact, my own estimate of the time we will have 1,000,000 eBooks certainly lies within the realm of Jon's suggested 1,000 per day.
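[The doubling arithmetic above can be reproduced directly. A sketch only: the 10-per-day starting rate comes from the message, and the 18-month doubling period is an assumption chosen to match the larger figures in the table.]

```python
def projected_rate(start_rate, years, doubling_period=1.5):
    """Books/day after `years`, doubling every `doubling_period` years."""
    return start_rate * 2 ** (years / doubling_period)

# Starting from about 10 books/day in 2004:
schedule = {2004 + y: projected_rate(10, y) for y in (0, 3, 6, 9)}
# 2004: 10, 2007: 40, 2010: 160, 2013: 640 -- crossing 1,000/day
# roughly ten years out, i.e. around 2014.
```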
By that time, we will probably be finding it harder and harder to track down all the editions we have yet to do, and it will be a matter of very good timing to start in on creating all variants of the editions Jon wants us to have. Hopefully by this time, OCR will be so accurate that the dream of simply using it as one would use a xerox machine, will be closer to reality. In the interim, perhaps we can simply make available various eBook editions that do and don't include any corrections of typos, missing words, lines, paragraphs, etc. This, along with preservation of the original scans, should allow for a timely revision of any and all eBooks we produce. With the aid of various "diff" and "compare" programs, editors can even proofread the same eBook into the various composite or non-composite editions Jon suggests we should have. Anyone who wishes to volunteer to assist Jon in his efforts should let us know, and we will work up a listserver and other support for this effort. Michael S. Hart P.S. The day should eventually come when such efforts are no longer required at the human level, and Jon can simply scan and OCR each separate edition with a sufficient level of accuracy that it could either stand immediately on its own, or do so with only a small amount of human intervention. . .less effort than it may take to work from a previous scan of a different paper variant. From jtinsley at pobox.com Sun Nov 14 11:40:32 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Nov 14 11:40:41 2004 Subject: [gutvol-d] I'm sorry, I don't get it In-Reply-To: References: <009301c4c9df$349c3940$6401a8c0@enterprise> <20041114023523.GA9326@panix.com> Message-ID: <20041114194032.GA17627@panix.com> On Sat, 13 Nov 2004 22:56:23 -1000 (HST), Karen Lofstrom wrote: > >On Sat, 13 Nov 2004, Jim Tinsley wrote: > >> I'm afraid, John, that you are missing a great deal, as am I.
We are >> both rather lowly, and obviously can't grasp the Big Picture that is >> so clear to those unencumbered by experience of making etexts, and who >> have lots of time to contemplate the Big Picture since they aren't >> spending their time actually doing useful work for PG. > >I'm one of the people who wants scholar-friendly editions and I've >proofread my 14K pages over the last year and half. Not to mention a >little scanning and post-processing. Yes, you have, and by virtue of that you're due a better answer. But the answer is one you've heard before, so I'm not sure how much good repeating it is going to do. The answer is not PG-specific; it's common wisdom: "If you want something done, do it yourself", or, extrapolating from advice to writers: "Show, don't tell." There's also "The journey of a thousand miles starts with a single footstep", and "Better to light one candle than curse the darkness". Take your pick. Nothing in PG ever happened because someone said that other people should do things. When Michael typed in that first text, it wasn't because some friend bemoaned the lack of e-texts and said that Someone Should Do Something About It. When Charles Franks created DP, it wasn't because anyone nagged him. We're in the middle of trying to hammer out a good XML solution, so that issue is a bit hot, but it didn't get as far as it has, nor will it get resolved, because anyone insisted others should work for their agenda; it got as far as it has because a few people who believed in it as a way forward actually rolled their sleeves up and did the work. And they are the ones who are going to make it happen. I could give many examples of smaller ways in which PG changed, and all of them had some person or persons behind them, who actually did the work, because they thought the work should be done. Often, people think they've got a good idea. Sometimes they work toward it. 
Of those who work toward it, some give up, some discover that it wasn't a good idea after all, and some show that it was a good idea, and find others to join them in the work. Very often, somebody identifies a need, but identifying that need doesn't cause progress all by itself. Charles thought, back in 2000, that page images would become desirable in the future. This wasn't revolutionary -- people had been talking about it in PG for some time -- but the difference was that he _did it_. And so PG will have, someday, when we work out the mechanisms, all or most of the page scans that went through DP. I thought, a year or so ago, that page scans were about to become practical, and we built it into the new filesystem structure, and I worked with a couple of producers to get samples posted. Now we've got some posted, and there are going to be more. If you like that idea, work towards it! If you don't, ignore it. I'm personally not convinced that _I_ should spend the hours of my life pursuing your agenda. If you want to convince me that I should, show (don't tell) me what such a "scholar-friendly" program and its output would look like; because you surely are not going to convince me any other way*. [Footnote *: And that by itself might not be enough either; I'd want to believe that academics really would _use_ it. And BTW, I really wish it were possible to ask Dorothy Parker to define the difference between an academic and a scholar. I bet it'd be a good one! :-) ] If I wanted to do something towards a "scholar-friendly" PG, I'd first draw up a list of specifications for a candidate text, and then compare them against Charlz's writings about an OLS, against "competitor" sites like Bartleby and Perseus (and the OTA, if I felt in need of a laugh!) and against any statements I could find made by academics, and revise my list accordingly. 
I would then seek out and collar one or more academics, and ask them what would persuade them to use etexts in their work -- say, as the prescribed edition of a class text. I might well have made what I felt was a good example of what they would want to show them, so they could criticize by comparison -- blind laundry-lists of requirements are often unhelpful. I would incorporate what I thought best from that experience into a prototype "scholarly e-text". I would show this to anyone who would look, and several people whose eyes I'd have to tape open for the experience, in PG or outside, and ask for comments and for other people to join me. Around that time, I would start to get an idea whether I was on the right track or not. Given that we have 14,000 texts available, and that starting from any one of them massively cuts the work needed to produce some form of whatever you think of as a "scholarly e-text", it is obscenely frustrating to hear people exhort _me_ to work toward _their_ goals, when all the material is there for them to do it themselves. But you, at least, have helped towards getting those 14,000 books posted, and I therefore credit you with some standing, and grasp of what is involved. If you want to work toward your goal, you'll have to convince me by showing, but you won't have to tape my eyes open. :-) jim From hart at pglaf.org Sun Nov 14 11:40:45 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:40:47 2004 Subject: !@!@!RE: [gutvol-d] Perfection In-Reply-To: <419671D0.863F7CE1@ibiblio.org> References: <419671D0.863F7CE1@ibiblio.org> Message-ID: On Sat, 13 Nov 2004, Michael Dyck wrote: > Michael Hart wrote: >> >> On Sat, 13 Nov 2004, Her Serene Highness [Michele Dyck?] wrote: > > "Her Serene Highness" is Michele, but given her email address, > I doubt her last name is Dyck. Mine is, though. OK, then I'm still a little in the dark, as we have one other "Her Serene Highness" who has contributed as well. . . . > > Michael Hart: >> >> ...
I didn't want to provide a different Project Gutenberg eBook >> for each and every single paper edition out there, and then have >> to keep canonical errors [sic] in them for all time. > > You say "didn't". Do you still feel this way? Eventually, when OCR is about as good as xeroxing, then it shouldn't be much effort to scan multiple editions. See previous note w/ xerox in header. >> I wanted to create a "critical edition" that combined corrections >> and items from various editions, > > I'm curious: How many such amalgams has PG produced? > What was the latest? Couldn't tell you, but every time a new proofer sends in errors, it's more likely some were researched from a different edition. >> and we have always supplied the necessary information for citing >> our eBooks on request, > > But that's not apparent to someone reading a PG eBook, I think. > E.g., the PG boilerplate doesn't have a sentence like: > To find out what printed edition(s) this eBook was > created from, send a request to someone@pglaf.org. Usually they just send an email asking how to cite, and I send: Bibliographic information comes from any full record displayed by either the Project Gutenberg Search Engine (http://promo.net/cgi-promo/pg/t9.cgi) or the Project Gutenberg Catalog Browser (http://promo.net/cgi-promo/pg/cat.cgi). For an example, if you use Canterbury Tales from our collection, you'll get the following card information:

AUTHOR: Chaucer, Geoffrey, circa 1340-1400
AKA:
ADD. AUTHOR: Purves, D. Laing, Editor
--
TITLE: Canterbury Tales, and Other Poems
SUBJECT:
LOC CLASS: PR
--
NOTES:
LANGUAGE: English
-
DOWNLOAD: cbtls10.txt - 1.62 MB
          cbtls10.zip - 641 KB

Chaucer, Geoffrey, circa 1340-1400. - 2000. - Canterbury Tales, and Other Poems
- Urbana, Illinois (USA): Project Gutenberg. Etext #2383.
- First Release: Nov 2000 - ID:2862

Where the last three lines should be your bibliographic information. Hope this helps, So nice to hear from you!! Michael S. Hart Project Gutenberg "*Ask Dr.
Internet*" Executive Coordinator "*Internet User ~#100*" From hart at pglaf.org Sun Nov 14 11:43:08 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:43:09 2004 Subject: [gutvol-d] Page scans (go for it!) In-Reply-To: <221011183281.20041113130622@noring.name> References: <20041113050417.9F29A8C914@pglaf.org> <20041113190901.GA5711@pglaf.org> <221011183281.20041113130622@noring.name> Message-ID: I thought we had a plan to save all page scans nearly a year ago. Greg told me he thought that Charles Franks had them, but both are on the road/vacation right now, so I'm not sure how to check. We'll see. Michael From hart at pglaf.org Sun Nov 14 11:45:42 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:45:44 2004 Subject: [gutvol-d] PG audience In-Reply-To: <419654D0.3080204@perathoner.de> References: <41950197.2020707@perathoner.de> <419654D0.3080204@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> They don't realize that the walls of academia have been penetrated >> by the virtual world. . .for them to try to stop eBooks is like >> James Watson's efforts to stop Craig Venter from mapping DNA, >> or even his efforts to stop the model building Crick 50 years ago. > > Well, well, capitalism *has* to be good for something. > > So let's praise capitalism for kicking the clerics in the *** and freeing > information from the imprisonment in monasteries ... before we start kicking > capitalism in the *** for making information a proprietary article. I'm not sure ANY of the above was done via capitalism. . . . Certainly not Watson, Crick, Venter. . .or PG eBooks. . . .
;-) From hart at pglaf.org Sun Nov 14 11:49:35 2004 From: hart at pglaf.org (Michael Hart) Date: Sun Nov 14 11:49:37 2004 Subject: [gutvol-d] Perfection In-Reply-To: <4196437F.9080905@perathoner.de> References: <20041113050358.471844BE64@ws1-1.us4.outblaze.com> <200411130616.iAD6GcSm004979@posso.dm.unipi.it> <4196437F.9080905@perathoner.de> Message-ID: On Sat, 13 Nov 2004, Marcello Perathoner wrote: > Michael Hart wrote: > >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? > > Providing source information and page numbers is easy. So is providing > the page scans. Of course: page scans != ebook. > > Marking up a book to satisfy most scholarly requirements is more work than I > would care for, short of being paid to do it. A big bug/feature for me is page numbers, bold, italic, underscore, etc.; I would prefer an eBook without them. . .they are just too distracting; I just want to read the CONTENT not the FORM. I have heard people mention that creating both kinds of eBooks should be easy from one session, but I'm not sure if anyone is DOING it. BTW, bold, italic, etc., also mess up a lot of search/quotes. Michael From shalesller at writeme.com Sun Nov 14 12:56:27 2004 From: shalesller at writeme.com (D. Starner) Date: Sun Nov 14 12:56:38 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed Message-ID: <20041114205627.DB4B84BE64@ws1-1.us4.outblaze.com> "Her Serene Highness" writes: > Are you honestly saying that every book you will ever want > to read is on a computer?? Why are you so violent about this? Why can't you understand that no one here is planning on torching the libraries, that ebooks and paper books aren't exclusive? > You must think I'm mad to love a physical book that > I will pass down to some relative of mine, Why do you assume that? > My computer was not on the list. And neither were most of your books.
Your books are, of course, naturally inferior. Under even moderate environmental conditions, they will fade away in a hundred years, a few hundred years at the best. Even in libraries they yellow and fade. They don't have the right smell, they don't feel right in the hand. That's why they will never supersede stone tablets. >It's available in modern English in book form. It's available in modern English in ebook form, too. The original text is also available in ebook form, both from Project Gutenberg. >> You can't understand Shakespeare until you read him in the original Klingon. >You can- if you're educated. Bah! The poorly translated English versions are but mere shadows of the originals in the Warrior's Tongue! Your human-biased education merely blinds you to that fact! > Maybe if you stopped reading Star Trek novels as literature, you'd > realize you read Shakespeare's language pretty much every day. Perhaps if you started reading Star Trek novels, you would realize that reading doesn't have to be serious, and that our dreams aren't circumscribed by the concepts of the Elizabethans and Victorians, that there is a wonderful future ahead of us, but it may require letting go of our death-grip on the things of the past. We may have a home on the moon or Mars sometime in the near future, but if we do, the library will be composed of ebooks, not paper books. In a world where every pound costs hundreds or thousands of dollars to move, ebooks are a godsend. -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From j.hagerson at comcast.net Sun Nov 14 14:22:40 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sun Nov 14 14:23:10 2004 Subject: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.]
In-Reply-To: <1e.386f7476.2ec904d4@aol.com> Message-ID: <00ab01c4ca98$7bffadd0$6401a8c0@enterprise> The journals of the future, in the unlimited storage world that has been postulated, will be delighted to publish articles with the complete text of every cited source. -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Gutenberg9443@aol.com Sent: Sunday, November 14, 2004 12:58 PM To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] I'm sorry, I don't get it [Was: Perfection, PG Audience, etc.] In a message dated 11/14/2004 11:52:40 AM Mountain Standard Time, holden.mcgroin@dsl.pipex.com writes: >>I'm just curious here but which journal would be willing >>to publish the >>full text of all references cited? Certainly none of those >>I've had >>papers in. None. That's why Michael's suggestion--store your references on a personal Website and put a URL in your bib--is better than this suggestion. Anne From kris at transitory.org Sun Nov 14 21:24:53 2004 From: kris at transitory.org (kris foster) Date: Sun Nov 14 21:25:09 2004 Subject: [gutvol-d] Perfection In-Reply-To: <419613D7.4080907@perathoner.de> References: <20041112221426.87CDD10997C@ws6-4.us4.outblaze.com> <20041112182216.Y99646@krweb.net> <419613D7.4080907@perathoner.de> Message-ID: <20041115000421.W99646@krweb.net> > What makes medium permanence a value per se ? (I agree with the remainder of your reply) My argument is that the transience of electronic media runs deeper than that of books (hopefully it's safe to leave stone tablets out of this). Beyond bit rot or pages fading, ebooks can be altered more easily than books, both intentionally and unintentionally. The sources of the texts -- PG mirrors and publishers -- may perish, yet only a publisher's book remains. And to be a little silly, the internet has shown it can survive for several decades; books have been proven for hundreds of years.
It was demonstrated in the message I replied to how quickly and easily ebooks can be used to find quotations. An electronic citation then becomes little more than a convenience and an advertisement, which will likely have a shorter life span than the paper itself. Are people ready to put their academic necks on the line? To be constructive, how are the PG mirrors monitored to ensure consistency today? --kris > Academia has developed its traditions around a medium (papyrus, paper) that > is permanent. Not the other way around. If the medium they had used was > impermanent the methods and traditions of Academia would be different today. > > > Medium permanence can be a big disadvantage too. The scholars in the middle > ages relied blindly on Aristotle. Scientific method in the middle ages > amounted to finding out what Aristotle said about some subject, and that was > that. Doing one's own research was not deemed a scientific method. > > Of course, Aristotle said that "wood swims and metal sinks" and that "heavier > items fall faster than lighter ones". > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From joshua at hutchinson.net Mon Nov 15 05:25:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 15 05:25:52 2004 Subject: [gutvol-d] Perfection Message-ID: <20041115132549.C606D9E751@ws6-2.us4.outblaze.com> ----- Original Message ----- From: Michael Hart > > Question: > > How much harder is it to make an eBook set up to answer all > these scholarly and reference questions, than just to read? > > Michael > As far as the ones produced at DP... negligible. A few seconds to a few minutes' time to include the information. (We basically already have it, just needs some formatting applied).
Josh From joshua at hutchinson.net Mon Nov 15 06:01:28 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 15 06:01:30 2004 Subject: [gutvol-d] future of ebooks + Scholarly use of PG + books to take to bed Message-ID: <20041115140128.4F9F14F491@ws6-5.us4.outblaze.com> ----- Original Message ----- From: "Her Serene Highness" > > After all, you can't understand Shakespeare unless you read him in the > original Klingon. > > >>You can- if you're educated. Plenty of people understand Shakespeare. > Even high school students. People in Italy can read Shakespeare. Tiny > children can also- they could at the beginning of this century. My > badly-educated, at-risk high school students were able to understand > Shakespeare. If you don't, that says more about you than it does about > early modern English. And some of us can even parse Beowulf- with a > two-language version (which is how it's usually printed) the average person > can read an amazing amount of it in the original, or at least grasp it. > Maybe if you stopped reading Star Trek novels as literature, you'd realize > you read Shakespeare's language pretty much every day. His turns of phrase > are used all the time, and can be understood by people of all economic levels > who have the desire to read and learn- even people who cannot afford $99 > ebooks to read Stephen King novels (not that Stephen King is bad, but > there's more to reading than that).<< > First of all, Anne's comment was an off-the-cuff, tongue-in-cheek ... JOKE. Second, I wouldn't make fun of Star Trek fans. They tend to be better educated and have read more things like Shakespeare than the average Joe (I don't have the link to the source of the information, but I remember reading it somewhere). Third, your comments are really making you sound like an academic elitist. And I don't think you are or mean to be.
Josh From j.hagerson at comcast.net Mon Nov 15 06:26:05 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Mon Nov 15 06:26:22 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? Message-ID: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> I am using wget to download books from www.gutenberg.org. The process is stuck on etext04 in what appears to be a futile effort to download index.html. The file must have been there last night, because I didn't have this problem. Could the appropriate person please look into this? Thank you very much. From gbnewby at pglaf.org Mon Nov 15 07:35:03 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Nov 15 07:35:05 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> References: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> Message-ID: <20041115153503.GA13757@pglaf.org> On Mon, Nov 15, 2004 at 08:26:05AM -0600, John Hagerson wrote: > I am using wget to download books from www.gutenberg.org. The process is > stuck on etext04 in what appears to be a futile effort to download > index.html. > > The file must have been there last night, because I didn't have this > problem. > > Could the appropriate person please look into this? There's no index.html currently. -- gbn From hart at pglaf.org Mon Nov 15 08:52:43 2004 From: hart at pglaf.org (Michael Hart) Date: Mon Nov 15 08:52:46 2004 Subject: [gutvol-d] Perfection In-Reply-To: <20041115132549.C606D9E751@ws6-2.us4.outblaze.com> References: <20041115132549.C606D9E751@ws6-2.us4.outblaze.com> Message-ID: On Mon, 15 Nov 2004, Joshua Hutchinson wrote: > > ----- Original Message ----- > From: Michael Hart >> >> Question: >> >> How much harder is it to make an eBook set up to answer all >> these scholarly and reference questions, than just to read? >> >> Michael >> > > As far as the ones produced at DP... negligible. A few seconds to a few > minutes' time to include the information.
(We basically already have it, just > needs some formatting applied). > > Josh > Then let's run a few dozen of these up the flagpole, and see what happens. . . . Michael From marcello at perathoner.de Mon Nov 15 09:13:06 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon Nov 15 09:13:17 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> References: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> Message-ID: <4198E3A2.6080504@perathoner.de> John Hagerson wrote: > I am using wget to download books from www.gutenberg.org. The process is > stuck on etext04 in what appears to be a futile effort to download > index.html. The indexes are auto-generated on the fly by Apache. If the load on the fileservers is too high, the connection times out before a full directory listing can be retrieved. You should not harvest at peak hours anyway. -- Marcello Perathoner webmaster@gutenberg.org From gbnewby at pglaf.org Mon Nov 15 09:45:08 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Nov 15 09:45:10 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <4198E3A2.6080504@perathoner.de> References: <00be01c4cb1f$0dae7be0$6401a8c0@enterprise> <4198E3A2.6080504@perathoner.de> Message-ID: <20041115174508.GB17511@pglaf.org> On Mon, Nov 15, 2004 at 06:13:06PM +0100, Marcello Perathoner wrote: > John Hagerson wrote: > > >I am using wget to download books from www.gutenberg.org. The process is > >stuck on etext04 in what appears to be a futile effort to download > >index.html. > > The indexes are auto-generated on the fly by Apache. > > If the load on the fileservers is too high the connection times out > before a full directory listing can be retrieved. > > You should not harvest at peak hours anyway. One more thing (or two): - you can't get the big directories via FTP. Use HTTP. (The FTP servers stop after 2K items). - Don't use HTTP, use rsync.
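[As a sketch of the kind of filtered rsync pull Greg describes: the commands below run against a local scratch tree, because the real PG server and rsync module name come from the mirroring HOWTO, not from this example. "src/" stands in for that remote source.]

```shell
# Build a tiny scratch tree to mirror from.
mkdir -p src/etext04 dst
touch src/etext04/12345.zip src/etext04/12345.txt src/etext04/12345-h.htm
# rsync filter rules: keep directories and .zip files, drop everything else.
# The same --include/--exclude chain works against a remote SERVER::MODULE.
rsync -a --include='*/' --include='*.zip' --exclude='*' src/ dst/
ls dst/etext04/   # only 12345.zip is copied
```

The rule order matters: rsync applies the first matching pattern, so `--include='*/'` lets it descend into directories, `--include='*.zip'` keeps the archives, and the final `--exclude='*'` drops everything unmatched.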
See the mirroring HOWTO at gutenberg.org/howto for more info (yes, you can use rsync to just get particular directories, filename extensions, etc.). But if things are still weird, send something we can replicate and we'll help fix it! -- gbn From Gutenberg9443 at aol.com Mon Nov 15 09:58:38 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 15 09:58:48 2004 Subject: [gutvol-d] future of ebooks + books to take to bed Message-ID: In a message dated 11/15/2004 7:02:02 AM Mountain Standard Time, joshua@hutchinson.net writes: Second, I wouldn't make fun of Star Trek fans. They tend to be higher educated and have read more things like Shakespeare than the average Joe (I don't have the link to source of the information, but I remember reading it somewhere). I'm going to ramble about for a while here, but I am getting back to the topic eventually. So please put up with me. My experience as a writing teacher, middle school through university, has been that the more fantasy the student reads, the better the student's vocabulary is. Fantasy is the only popular genre, so far as I know, that revels in vocabulary. Also, a person who plays Dungeons and Dragons is likely to know more about comparative mythology than anybody else out of graduate school. A good player of D&D reads extensively. I don't play Dungeons and Dragons, but my son does. He's schizophrenic, and his mind flies off in all directions. When he was a child, if I told him to clean his room, he would sit down on the floor and cry, because he couldn't break "clean your room" into its component tasks. But if I told him, "Put your books away, put your clothes away, put your toys away, make your bed, and vacuum your floor," he could do all that. When he first got into D&D, he spent a lot of time at the kitchen table drawing dungeons on graph paper. I kept him supplied with graph paper. 
Later, as I happened to be driving from Fort Worth to Dallas, he spent the entire trip cross-examining me on comparative mythology. I got most of the questions right. At that time he was in middle school and I had an MA. Twenty-five years of D&D later, most of them as an advanced Dungeon-master, he is director of parking lot security at a major stock-car race. By the time he got through examining the overall situation, deploying his personnel optimally, and keeping an eye on all his personnel and everything that happened in his jurisdiction, the race track director said that he (my son) had done the best job of policing the parking lot that he (race track director) had ever seen, and my son was instantly signed to bring his crew back the next year. Playing a much-maligned game, and reading much-maligned "junk" genre fiction, taught him sequence, analysis, and synthesis. I don't call any books except pornography junk. Even if a kid is only reading Sweet Valley High, at least the kid is READING. My opinion is that there is one main reason why many kids nowadays don't have the respect for the written word that kids several generations ago had: they don't have time to read. Our youngest daughter, about halfway through seventh grade, began begging to be homeschooled. My husband and I vetoed it, until the end of the year. At that time I gave her a few formal and informal tests and was absolutely appalled. She had learned nothing, despite making decent grades. We immediately granted her request, and we had to back her up to third-grade math and have her work forward. One day the weather was thoroughly icky, and she was in her room. She came to me and said, "Mom, a funny thing just happened." When I asked what it was, she said, "Well, I thought I had read just a few pages, but then I found that I was at the end of the book, and then I looked at the clock and I had been reading an hour." I said, "Congratulations, my child. You have learned to read." 
Of course she indignantly pointed out that she had been reading since the first grade. I said, "No, you haven't been reading. You've been sounding out words, and that was taking so much of your mental energy that you didn't have time to concentrate on what the words meant." How many kids, today, have an hour--or half an hour--or even fifteen minutes--of uninterrupted reading time? This problem can't be solved by the schools; the answer has to come in the homes. The one-eyed monster in the living room has an off switch; it even has an electrical cord that can be unplugged. Once I got so sick of my children arguing about it that I put the television in the attic for three months. They still argued, but now it was over which one of them got to play the piano first. Their misbehaviors got more interesting; I remember once telling Liz that she absolutely could not read the Bible any more until she had finished washing the dishes, and then thinking how happy other parents would be to have the problems I had. Those three months broke the addiction, and they watched TV after that only rarely and for something in particular that they were following--not sitcoms and soap operas. Computers are dandy, but a kid who is addicted to the computer must be required to spend at least half an hour a day reading a book of his or her choice ON THE COMPUTER. This way, the child learns that reading and computers aren't irreconcilable. As a volunteer online tutor, I have many students asking me where they can find such-and-such a book online. If it is public domain, I look it up--preferably on PG--and give the student a link to it. But often the student is asking for a book that is still in copyright, and I have to explain that one has to go to a REAL library for that book. So this is what I mean when I say that when these kids are adults, about ten years from now, they are going to demand computerized books and they are going to get computerized books.
Somebody last week mentioned Luddites; I am aware that many people my age (61) are Luddites about computers, but my state--Utah--has the highest percentage of "wired" households of any state in the Union. I do not think that a person who refuses to think about reading computerized books is a Luddite, but I do think that person is not well informed. I think that if that person would try out a Rocket for a week, preferably a week which included several days in bed for flu or recovery from surgery or something like that, that person would never go back to paper books for anything that was available electronically. But notice that I said "think." I could be wrong. I live in a rather small house--definitely too small to be running three businesses from. But given technology that exists RIGHT NOW, everything in the Library of Congress would fit into my house. Everything in the Salt Lake City Library and the Salt Lake County Library and the University of Utah libraries would fit into one bookcase in my office. There are books that, for very good reason, I own in both silicon and dead tree formats. But when my grandchildren are the age I am now, they will think that having all those dead tree books around is a stupid, space-wasting, fire hazard. So what I'm getting at is this: We can't possibly guess the future of ebooks. It's bigger than any of us think it is. Even the best science fiction writers never guessed how we would use computers, and we're still on the edge of that, also. It is really absurd to worry about errors made ten years, even five years, ago. Resolve not to make those mistakes again, but go forward, not back. Don't try to figure out who made the mistakes. That doesn't matter. Go forward, not back. The first telegraph message, on 24 May, 1844, said, "What hath God wrought?" Now, 160 years later, we gripe if television from Mars or Ganymede is a little fuzzy. I don't want to offend the atheists on this ML, but--God is still wrighting. 
We're part of that process. I won't live to see PG's 60th birthday, but some of you will. What, by then, will God have wrought, using our hands to do the work? But we'll never get it done if we spend all our time squabbling about what somebody should have done five or ten or fifteen years ago. Go forward, not back. My husband has instructed me not to get into any more flame wars because they upset me too much. So I'm going back into watching status, where I spend most of my time anyway. But, good people--and you are good people, all of you, because you wouldn't be pouring your heart and mind and time into this work if you weren't--stop looking behind you. The action is in front of you. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041115/0d953c36/attachment-0001.html From stephen.thomas at adelaide.edu.au Mon Nov 15 16:10:10 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Mon Nov 15 16:10:24 2004 Subject: [gutvol-d] future of ebooks + books to take to bed In-Reply-To: References: Message-ID: <41994562.9020801@adelaide.edu.au> Gutenberg9443@aol.com wrote: > ... Go forward, not back. Well said, Anne, and thank you for your many salient points. One thing jumped out at me: > .... But often the > student is asking for a book that is still in copyright, and I have to > explain that one has to go to a REAL library for that book. I believe that PG now has the right to be called a library -- even a REAL library. It fulfills the major criteria for a library: it has a large collection of books, and it has a catalog (online) through which patrons may locate items in the collection. True, there are deficiencies, but you'll find similar in any library -- even the Library of Congress is somewhat less than perfect. 
;-) My own library, at the University of Adelaide, still has tens of thousands of brief catalogue records (out of 1.5M) -- we're cleaning them up as funds permit, but it's a twenty-year project. We're all just doing the best we can with the resources available, which is all anyone can ask. So perhaps we should start referring to "The Project Gutenberg Library" instead of simply "Project Gutenberg"? Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From j.hagerson at comcast.net Mon Nov 15 17:21:55 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Mon Nov 15 17:22:18 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <4198E3A2.6080504@perathoner.de> Message-ID: <00ca01c4cb7a$ac6d7640$6401a8c0@enterprise> Well, not knowing what to do, I went to the Robots Readme on the Gutenberg.org web site and copied the wget command listed under the heading "Getting All EBook Files." I started this process on Sunday evening, at the end of a cable modem. Little did I realize that more than 24 hours later, the process would still be running. In a private message, I was told to use rsync. OK.
If rsync is the preferred method, then why is wget presented as the example? It appears that I'm storing a bunch of index.html files that are redundant if I use rsync. I guess I can clean them up at my leisure. However, again the web page says "keep the html files" to make re-roboting faster. Well, I'll be a mirror site for all of the ZIP and HTML files, anyway. Please post suggestions here or pm me. Thank you. -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner Sent: Monday, November 15, 2004 11:13 AM To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] [etext04|etext05]/index.html missing? John Hagerson wrote: > I am using wget to download books from www.gutenberg.org. The process is > stuck on etext04 in what appears to be a futile effort to download > index.html. The indexes are auto-generated on the fly by Apache. If the load on the fileservers is too high the connection times out before a full directory listing can be retrieved. You should not harvest at peak hours anyway. -- Marcello Perathoner webmaster@gutenberg.org _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From Gutenberg9443 at aol.com Mon Nov 15 22:00:09 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 15 22:00:28 2004 Subject: [gutvol-d] future of ebooks + books to take to bed Message-ID: <1a3.2bc1f095.2ecaf169@aol.com> In a message dated 11/15/2004 5:10:38 PM Mountain Standard Time, stephen.thomas@adelaide.edu.au writes: So perhaps we should start referring to "The Project Gutenberg Library" instead of simply "Project Gutenberg"? I usually describe it as the world's free public library. Of course, I meant that I have to send the kids to a dead tree library. You're quite right. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041116/6bb19ec6/attachment.html From j.hagerson at comcast.net Wed Nov 17 11:06:24 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Wed Nov 17 11:06:42 2004 Subject: [gutvol-d] PG content on Guntella? Please test. Message-ID: <000701c4ccd8$8bc4a8a0$6401a8c0@enterprise> I believe I'm set up to provide ZIP and HTML files of PG content published prior to 13-NOV-2004 through the Gnutella network. If anyone would care to test this hypothesis and tell me if you can successfully access a file, I would like to know. Thank you very much. John Hagerson (my IP address is 24.14.124.xxx to know if the file came from me) From joshua at hutchinson.net Wed Nov 17 11:19:20 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Wed Nov 17 11:19:34 2004 Subject: [gutvol-d] PG content on Guntella? Please test. Message-ID: <20041117191920.237059E899@ws6-2.us4.outblaze.com> There was some effort a while back to support P2P. *searching* Here is the PG webpage: http://www.gutenberg.org/howto/p2p-howto I never played with it to tell you if it worked at all (I'm at work now and can't test it), but it looks like we've been seeding the p2p networks already. Josh ----- Original Message ----- From: "John Hagerson" To: "'Project Gutenberg Volunteer Discussion'" Subject: [gutvol-d] PG content on Guntella? Please test. Date: Wed, 17 Nov 2004 13:06:24 -0600 > > I believe I'm set up to provide ZIP and HTML files of PG content published > prior to 13-NOV-2004 through the Gnutella network. > > If anyone would care to test this hypothesis and tell me if you can > successfully access a file, I would like to know. > > Thank you very much. 
> > John Hagerson (my IP address is 24.14.124.xxx to know if the file came from > me) > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From stephen.thomas at adelaide.edu.au Wed Nov 17 19:15:19 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 17 19:15:40 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: References: Message-ID: <419C13C7.5070904@adelaide.edu.au> Brad Collins wrote: > Steve/ > > I tried to send this to the list but it bounced but you will likely > be about the only person interested so here it is... Surely that's not true! I'm sure many on this list are just thrilled by discussions about MARC. ;-) > > Steve Thomas writes: > > >>Feedback welcomed. > > > First of all, your script taught me a lot about both MARC and > Perl. VERY GOOD WORK! Thanks. > > I have been working the last few days on porting your script to elisp > and I noticed the following problem with Audio Book records: > > RDF > > > &pg; > The Seven Poor Travellers > Dickens, Charles (1812-1870) > en > Audio Book, computer-generated > 2006-01-01 > Copyrighted work. See license inside work. > > > MARC > > LDR 00560cam 22001573a 4500 > 005 20041111153800.0 > 008 060101s2006||||xxu|||||s|||||000 f eng d > 100 1 |aDickens, Charles,|d1812-1870 > 245 14|aThe Seven Poor Travellers |h[electronic resource] /|cby Charles Dickens > 260 |bProject Gutenberg Literary Archive Foundation,|c2006 > 500 |aProject Gutenberg > 506 |aFreely available. > 516 |aElectronic text > 830 0|aProject Gutenberg|v9737 > 856 40|uhttp://www.gutenberg.org/etext/9737 > 856 42|uhttp://www.gutenberg.org/license|zLicense First, I see you are using a prior version of the script/output. 
The latest version now produces this: LDR 00626cam a22002053a 4500 000 9737 003 PGUSA 005 20041115162032.0 008 060101s2006||||xxu|||||s|||||000 | eng d 040 |aPGUSA|beng 042 |adc 100 1 |aDickens, Charles,|d1812-1870 245 14|aThe Seven Poor Travellers |h[electronic resource] /|cby Charles Dickens 260 |bProject Gutenberg,|c2006 500 |aProject Gutenberg 506 |aFreely available. 516 |acomputer-generated Audio Book 830 0|aProject Gutenberg|v9737 856 40|uhttp://www.gutenberg.org/etext/9737 856 42|uhttp://www.gutenberg.org/license|3Rights I can see immediately that I need to add something for the copyrighted works. Probably an addition to the 506 note. > > I am far from being fluent in MARC, but from what I've seen I would > tend to say that the value for the 245 h subfield should be `sound > recording' and I am still not sure about the 516 field for electronic > file types. You'll see that the 516 now reflects what's in the PG catalog. The 245 h subfield value used is a generic term for the medium of the item, and this is commonly used for any kind of electronic resource. The term 'sound recording' is used for things like LP (and I guess CD) records. The major intent is to distinguish this item from other media, e.g. paper. > > Does MARC have a list of defined enumerated values for these > subfields? > > I have a few other questions: > > I'm also still not clear on why a 500 field is needed. The 500 field is a > general note, so why would a note with a value of `Project Gutenberg' > be helpful? Not sure about this one. But using the 830 field (Series statement) requires either a 490 or 500 (general note). So I'm just following the MARC spec here. Some things I just don't ask about. ;-) > > Second, I would suggest making the 260 field a bit more ISBD-ish. > > 260 |a-Urbana: |bProject Gutenberg, |c2006. > > or at least: > > 260 |aUrbana, |bProject Gutenberg, |c2006. Yes. You'll see I'm now using just 'Project Gutenberg' for the publisher name -- after a comment from Greg.
The a subfield can be used for place of publication, but ... I'm not sure what that is. Is it still Urbana (I thought PG had long since moved from there)? Is it the business address of PGLAF? Is it the home town of ibiblio? In the end, it seemed easiest to omit that. > > Which leads us to the question of how should the publisher name be > formatted? > > In a sense, each PG -- Aussie, Germany, EU, Canada etc. -- would have a > different city they were published in, located in the country they > were from. Even if (as they are in this case) they are separate legal > entities, the city should be enough to identify PG USA. There should > be a publisher authority record this points to. How should this be > handled? The catalog only includes items from "PGUSA". If other countries wanted to use the script to build MARC for their collections, then we can easily modify the script to change the publisher name. Thanks for the feedback! Steve
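As a rough illustration only, the record layout Steve's script prints can be sketched in a few lines of Python. This is NOT the actual Perl script, and it emits the human-readable listing shown above, not real ISO 2709 MARC; the function name and parameters are hypothetical.

```python
# Hypothetical sketch (not Steve's Perl script): assemble display-form
# MARC lines like those quoted above from a few PG catalog values.

def pg_marc_display(etext_no, title, author_sort, dates, author, year):
    """Return display lines for one PG etext record (illustrative only)."""
    return [
        f"100 1 |a{author_sort},|d{dates}",
        # 245 indicators '14': title added entry, 4 nonfiling chars ("The ");
        # a real generator would compute the nonfiling count per title.
        f"245 14|a{title} |h[electronic resource] /|cby {author}",
        f"260 |bProject Gutenberg,|c{year}",
        "500 |aProject Gutenberg",
        "506 |aFreely available.",
        f"830 0|aProject Gutenberg|v{etext_no}",
        f"856 40|uhttp://www.gutenberg.org/etext/{etext_no}",
        "856 42|uhttp://www.gutenberg.org/license|3Rights",
    ]

for line in pg_marc_display(9737, "The Seven Poor Travellers",
                            "Dickens, Charles", "1812-1870",
                            "Charles Dickens", 2006):
    print(line)
```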
From gbnewby at pglaf.org Thu Nov 18 04:57:58 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Thu Nov 18 04:58:00 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: <419C13C7.5070904@adelaide.edu.au> References: <419C13C7.5070904@adelaide.edu.au> Message-ID: <20041118125758.GA5939@pglaf.org> On Thu, Nov 18, 2004 at 01:45:19PM +1030, Steve Thomas wrote: >... > > > >Second, I would suggest making the 260 field a bit more ISBD-ish. > > > >260 |a-Urbana: |bProject Gutenberg, |c2006. > > > >or at least: > > > >260 |aUrbana, |bProject Gutenberg, |c2006. > > Yes. You'll see I'm now using just 'Project Gutenberg' for the > publisher name -- after coment from Greg. The a subfield can be > used for place of publication, but ... I'm not sure what that > is. Is it still Urbana (I thought PG had long since moved from > there)? Is it the business address of PGLAF? Is it the home town > of ibiblio? In the end, it seemed easiest to omit that. I always used Urbana because it's the historical home, and of course PG still has a presence there (i.e., Michael). Legally speaking, the PGLAF organizational home is wherever I live (funny, I know), unless the PGLAF board decides otherwise. But I don't like using this as a publication location, since I might move. Chapel Hill would be reasonable, since that's where iBiblio is, but PG has no "real" organization there. Salt Lake City is where the business office is, but overall I still prefer Urbana as the "publication location" for PG. There is no 100% accurate place to list. -- Greg From holden.mcgroin at dsl.pipex.com Thu Nov 18 13:43:06 2004 From: holden.mcgroin at dsl.pipex.com (Holden McGroin) Date: Thu Nov 18 13:43:19 2004 Subject: [gutvol-d] LoC Public Domain Newspaper Archive Message-ID: <419D176A.2010709@dsl.pipex.com> Hi all! CNN is reporting that the U.S. 
Library of Congress is trying to digitize and make available over the web 30 million pages from public domain newspapers :-) http://www.cnn.com/2004/TECH/internet/11/17/oldnewspapers.ap/index.html Cheers, Holden From brad at chenla.org Thu Nov 18 17:58:55 2004 From: brad at chenla.org (Brad Collins) Date: Thu Nov 18 18:00:57 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: <20041118125758.GA5939@pglaf.org> (Greg Newby's message of "Thu, 18 Nov 2004 04:57:58 -0800") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: Greg Newby writes: >> Yes. You'll see I'm now using just 'Project Gutenberg' for the >> publisher name -- after coment from Greg. The a subfield can be >> used for place of publication, but ... I'm not sure what that >> is. Is it still Urbana (I thought PG had long since moved from >> there)? Is it the business address of PGLAF? Is it the home town >> of ibiblio? In the end, it seemed easiest to omit that. > > I always used Urbana because it's the historical home, and of course > PG still has a presence there (i.e., Michael). > [snip] > There is no 100% accurate place to list. Since the place of publication is important for determining copyright restrictions in some cases, I think it would be better to include a place of publication. This has bothered me for some time. I've always wondered how to handle virtual organizations which don't really have a place of publication in the conventional sense like PG or the Apache Group. So I did a little digging in the ISBD specs and found the following: ,----[ ISBD(ER) 4.1.13 ] | 4.1.13 When a place of publication, production or distribution does | not appear anywhere in the item, the name of the known city or town | is supplied in square brackets. If the city or town is uncertain, or | unknown, the name of the probable city or town followed by a | question mark is supplied in square brackets. e.g. 
| | - [Paris] | - [Prague?] `---- ,----[ ISBD(ER) 4.1.14 ] | 4.1.14 When the name of a city or town cannot be given, the name of | the state, province or country is given, according to the same | stipulations as are applicable to the names of cities or towns. | e.g. | | - Canada | Editorial comment: Known as place of publication; | appears in prescribed source. `---- Since PG doesn't explicitly state that the place of publication is in the States in etexts (is that right?), this would suggest something like: - [USA]: Project Gutenberg, 2004. or (I prefer) - [Urbana]: Project Gutenberg, 2004. in BMF this might look like: published : ‐ $pl[[USA]]: $pb[Project Gutenberg], $dt[2004] or more verbose BMF (bxids only for example): published : ‐ $pl[$d:bxid://geo:IKE8-5510 $l:[USA]]: $pb[$d:bxid://aut:JIQ6-7286 $l:Project Gutenberg], $dt[$v:2004-10-12 $l:2004] BMF subfields used: (For complete list of subfields see: http://192.168.0.103/cgi-bin/bmf.cgi/Reference/SubfieldQuickRef.html) pl place name d defined-by l label pb publisher name dt inclusive dates v value -- in dt it should be an ISO 8601 formatted date b/ -- Brad Collins , Bangkok, Thailand From brad at chenla.org Thu Nov 18 18:08:19 2004 From: brad at chenla.org (Brad Collins) Date: Thu Nov 18 18:10:21 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: <20041118125758.GA5939@pglaf.org> (Greg Newby's message of "Thu, 18 Nov 2004 04:57:58 -0800") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: My last post in reply to Greg included a chunk of rather raw notes I took on the subject yesterday. I might as well send along the rest of the notes which are all exploring issues with the nitty-gritty details of manifestation entity records for PG texts. You can ignore the BMF stuff. Take this all as food for thought rather than specific suggestions for PG.
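The ISBD(ER) supplied-place rules quoted above (4.1.13/4.1.14) are mechanical enough to sketch in code. This minimal Python illustration is mine, not part of BMF or any cataloging library; the function and argument names are hypothetical.

```python
# Hypothetical sketch of ISBD(ER) 4.1.13/4.1.14: when a place of
# publication does not appear in the item, supply it in square brackets,
# with a trailing "?" when the place is only probable.

def supplied_place(place, in_item=False, probable=False):
    if in_item:
        return place            # transcribed as it appears in the item
    if probable:
        return f"[{place}?]"    # e.g. [Prague?]
    return f"[{place}]"         # e.g. [Paris], or a country like [USA]

def publication_area(place, publisher, year, **kwargs):
    """Render a publication statement in the style shown above."""
    return f"{supplied_place(place, **kwargs)}: {publisher}, {year}."

print(publication_area("Urbana", "Project Gutenberg", 2004))
print(publication_area("Prague", "Example Press", 1898, probable=True))
```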
** Series ,----[ ISBD(ER) 6.6.1 ] | 6.6.1 The numbering of the item within a series or sub-series is | given in the terms in which it appears in the item. Standard | abbreviations may be used. Arabic numerals are substituted for other | numerals or spelled-out numbers. e.g. | | - (Multimedia learning series ; vol. 2) | - (Visit Canada series ; vol. C) | - (Computer simulation games ; module 5) | - (BTS research report ; 2) `---- Steve's script gives us: 830 0|aProject Gutenberg|v9737 But the ISBD suggests something like this: (Project Gutenberg etext ; no. 8654) 830 0|a(Project Gutenberg etext ; |vno. 9737 or BMF: series : ($a[Project Gutenberg etext] ; no. $vol[9737]) ** Material Designation ,----[ ISBD(ER) Appendix C ] | **General material designation:** | Electronic resource | | **Resource designations with "electronic" in the designations:** | Electronic data | Electronic font data | Electronic image data | Electronic numeric data | Electronic census data | Electronic survey data | Electronic representational data | Electronic map data | Electronic sound data | Electronic text data | Electronic bibliographic database(s) | Electronic document(s) (e.g. letters, articles) Electronic journal(s) | Electronic newsletter(s) `---- For PG, this would then suggest changing the more general Electronic Resource to the more specific: Electronic document Electronic sound data The reason I am suggesting this is that all of the examples I have seen using `Electronic resource' are for things like interactive CD-ROMs, and dynamic Web sites. These are not specifically electronic texts, documents or sound recordings. The distinction is small and certainly the general `Electronic Resource' works, but I wanted to find out if there were more specific enumerated values for material designation.... ** Mode of Access Since we haven't gotten around to working on Instance/Item entities yet, this is a bit premature. Access fields are not used in Manifestation entities.
But reading through the ISBD and MARC specs got me thinking about the issue. I must say that I don't like the ISBD(ER) mode of access field. Mode of access: Internet via World Wide Web. URL: http://muse.jhu.edu/journals/callaloo/. This is needlessly verbose and redundant. Another example in the spec is a bit better. Mode of access: Internet. URL: http://mitpress.mit.edu/CityofBits/. But it's not much better. ,----[ ISBD(ER) 7.5.2 Notes relating to mode of access] | | Mode of access shall be recorded in a note for all remote access | electronic resources. | | Mode of access is given as the second note following the System | requirements note (see 7.5.1), if given, and is preceded by "Mode of | access" (or its equivalent in another language and/or script). In | the absence of a system requirements note, mode of access is given | as the first note. e.g. | | - Mode of access: Lexis system. Requires subscription to | Mead Data Central, Inc. | - Mode of access: World Wide Web. URL: http://www.un.org | - Mode of access: Internet via ftp://ftp.nevada.edu | - Mode of access: Gopher://gopher.peabody.yale.edu | - Mode of access: Computer university network | - Mode of access: Mikenet `---- On the whole, MARC and ISBD are a bit clumsy when it comes to networked resources--the records are basically electronic catalog cards. Numbers 2, 3 and 4 are all network addresses, which have a URL pointing to the resource. I can understand putting a label indicating the type of network protocol but the examples are all screwed up mixing descriptive labels for the protocol with the type of network. Better would be something like the following: - access: Lexis [dialup network]: Note: Requires subscription to Mead Data Central, Inc.
- access: Project Gutenberg (WWW site): URL: http://projectgutenberg.org - access: Project Gutenberg (FTP mirror): URL: ftp://ftp.ibiblio.org - access: Internet (FTP site): URL: ftp://ftp.nevada.edu - access: Internet (Gopher site): URL: gopher://gopher.peabody.yale.edu - access: UCLA (university intranet): URL: http://libary.ucla.edu:2080 Note: Requires university network account. - access: Mikenet (private local area network) in BMF access: - $a[$typ:dialup $l:Lexis (dialup network)]: Note: $not[Requires subscription to Mead Data Central, Inc.] - $a[$typ:www $l:Project Gutenberg (WWW site)]: URL: $url[http://projectgutenberg.org] - $a[$typ:ftp $l:Project Gutenberg (FTP mirror)]: URL: $url[ftp://ftp.ibiblio.org] - $a[$typ:ftp $l:Internet (FTP site)]: URL: $url[ftp://ftp.nevada.edu] - $a[$typ:gopher $l:Internet (Gopher site)]: URL: $url[gopher://gopher.peabody.yale.edu] - $a[$typ:intranet $l:UCLA (university intranet)]: URL: $url[http://libary.ucla.edu:2080] Note: Requires university network account. - $a[$typ:lan $l:Mikenet] ($not[private local area network]) Now what about MARC? Steve's script produces: 856 40|uhttp://www.gutenberg.org/etext/9737 856 42|uhttp://www.gutenberg.org/license|3Rights and the spec sez... ,----[ MARC: 856 Electronic Location and Access ] | Field 856 contains the information needed to locate and access an | electronic resource. The field may be used in a bibliographic record | for a resource when that resource or a subset of it is available | electronically. In addition, it may be used to locate and access an | electronic version of a non-electronic resource described in the | bibliographic record or a related electronic resource. `---- This breaks down to: *** Indicators First: 4 HTTP Second: 0 Resource 2 Related Resource *** Subfields $u URI (do they make a distinction between URI and URL?) $3 Materials specified.
-- Brad Collins , Bangkok, Thailand From sly at victoria.tc.ca Thu Nov 18 21:53:50 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Nov 18 21:54:13 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: I question if the use of a series number as suggested below is an ideal approach. I believe it is intended for a smaller number of items which are intentionally published as a series. I'd suggest that the closest thing to PG etext numbers in a traditional research library would be accession numbers (as commonly used for microforms) Andrew On Fri, 19 Nov 2004, Brad Collins wrote: > > ** Series > > ,----[ ISBD(ER) 6.6.1 ] > | 6.6.1 The numbering of the item within a series or sub-series is > | given in the terms in which it appears in the item. Standard > | abbreviations may be used. Arabic numerals are substituted for other > | numerals or spelled-out numbers. e.g. > | > | - (Multimedia learning series ; vol. 2) > | - (Visit Canada series ; vol. C) > | - (Computer simulation games ; module 5) > | - (BTS research report ; 2) > `---- > > Steve's script gives us: > > 830 0|aProject Gutenberg|v9737 > > But the ISBD suggests something like this: > > (Project Gutenberg etext ; no. 8654) > > 830 0|a(Project Gutenberg etext ; |vno. 9737 > > or BMF: > > series : ($a[Project Gutenberg etext] ; no. $vol[9737]) > From brad at chenla.org Thu Nov 18 23:49:23 2004 From: brad at chenla.org (Brad Collins) Date: Thu Nov 18 23:51:42 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: (Andrew Sly's message of "Thu, 18 Nov 2004 21:53:50 -0800 (PST)") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: Andrew Sly writes: > I question if the use of a series number as suggested below is an > ideal approach. > > I believe it is intended for a smaller number of items which are > intentionally published as a series.
> > I'd suggest that the closest thing to PG etext numbers in a traditional > research library would be accession numbers (as commonly used for > microforms) I used Series because The Early English Text Society publications are cataloged as a series and this was the closest thing I have found to the PG etext numbers. This is from the LOC: Series: Early English Text Society (Series). Original series ; 10, [etc.] 830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.] I understand that the PG etext numbers are not a conscious planned series but I still think it works.... b/ -- Brad Collins , Bangkok, Thailand From vze3rknp at verizon.net Fri Nov 19 06:02:50 2004 From: vze3rknp at verizon.net (Juliet Sutherland) Date: Fri Nov 19 06:02:40 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: <419DFD0A.7020504@verizon.net> This is far from my area of expertise, but I do know that we are putting books into PG that come from several types of what I think of as "series". One kind is a group of books by one author (e.g. The Bobbsey Twins Series) and the other kind is a group of books, each by different authors, that are intended to go together (e.g. the English Men of Letters biographies). I'd think that we would want to have a way to represent each of these in the PG catalog. JulietS Brad Collins wrote: >Andrew Sly writes: > > > >>I question if the use of a series number as suggested below is an >>ideal approach. >> >>I believe it is intended for a smaller number of items which are >>intentionally published as a series. >> >>I'd suggest that the closest thing to PG etext numbers in a traditional >>research library would be accession numbers (as commonly used for >>microforms) >> >> > >I used Series because The Early English Text Society publications are >cataloged as a series and this was the closest thing I have found to >the PG etext numbers.
> >This is from the LOC: > >Series: Early English Text Society (Series). Original series ; 10, [etc.] > >830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.] > >I understand that the PG etext numbers are not a conscious planned >series but I still think it works.... > >b/ > > > From brad at chenla.org Fri Nov 19 06:43:42 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 19 06:45:56 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: <419DFD0A.7020504@verizon.net> (Juliet Sutherland's message of "Fri, 19 Nov 2004 09:02:50 -0500") References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> <419DFD0A.7020504@verizon.net> Message-ID: Juliet Sutherland writes: > This is far from my area of expertise, but I do know that we are > putting books into PG that come from several types of what I think of > as "series". One kind is a group of books by one author (e.g. The > Bobbsey Twins Series) and the other kind is a group of books, each by > different authors, that are intended to go together (e.g. the English > Men of Letters biographies). I'd think that we would want to have a > way to represent each of these in the PG catalog. > And you are correct -- and this is why MARC has a number of different ways of dealing with the issue (and I am not the person to explain them) but as far as I can see they are not mutually exclusive. Fields can be repeated (MARC 830 is repeatable) and there is no reason why there aren't series within series. Is there a better way to do this? Was the LOC example I used wrong? b/ -- Brad Collins , Bangkok, Thailand From sly at victoria.tc.ca Fri Nov 19 09:16:16 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Fri Nov 19 09:16:22 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> <419DFD0A.7020504@verizon.net> Message-ID: Hi Brad.
I looked for an example of an accession number for microfiche, as used in a marc record, and found the following example: (The accession number is 05000, found in fields 490 and 830, pretty much as you had suggested.) 000 00858nam 2200181 a 450 001 571327 008 810528c19801898enka b 00011 eng 0 020 __ |a 0665050003 (Positive copy) 035 __ |a (CaOOCIHM)81603284X 035 __ |9 ACN8054TS 040 __ |a CaOOCIHM |b eng 100 10 |a Allen, Grant, |d 1848-1899 245 13 |a An African millionaire |h [microform] : |b episodes in the life of the illustrious Colonel Clay / |c by Grant Allen. 260 0_ |a London : |b G. Richards, |c 1898. 300 __ |a 4 microfiches (183 fr.) : |b ill. 490 1_ |a CIHM/ICMH Microfiche series = CIHM/ICMH collection de microfiches ; |v no. 05000 533 __ |a Filmed from a copy of the original publication held by the Izaak Walton Killam Mmemorial Library, Dalhousie University. |b Ottawa : |c Canadian Institute for Historical Microreproductions, |d 1980. 830 _0 |a CIHM/ICMH Microfiche series ; |v no. 05000 Thanks, Andrew From shalesller at writeme.com Fri Nov 19 10:05:40 2004 From: shalesller at writeme.com (D. Starner) Date: Fri Nov 19 10:05:50 2004 Subject: [gutvol-d] Re: PG catalog - MARC Message-ID: <20041119180540.5D2BC4BDAA@ws1-1.us4.outblaze.com> "Brad Collins" writes: > I used Series because The Early English Text Society publications are > cataloged as a series and this was the closest thing I have found to > the PG etext numbers. > > This is from the LOC: > > Series: Early English Text Society (Series). Original series ; 10, [etc.] > > 830 _0 |a Early English Text Society (Series). |p Original series ; |v 10, [etc.] So what happens when we do the EETS books? There's one in the PPVing queue at DP, and more being proofed and waiting for PPers. 
-- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm From joshua at hutchinson.net Fri Nov 19 10:18:10 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 19 10:18:17 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) Message-ID: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> Attached at the bottom is a rough draft of a teiHeader spec. I basically wrote it as an example teiHeader with comments scattered all over to explain things. Since I'm not a cataloging expert (heck, I don't even qualify as a catalog neophyte), I'm relying heavily on what is in Marcello's documentation and the original TEI documentation. I basically just picked out stuff that looked important and relevant based on stuff I've done before. I'm sure I've missed some things. Please take a look through this and point out items that aren't covered and you think should be. Also, if anything is unclear, let me know. I'll try to explain it better (and update the explanation in the spec). Remember, the goal is to be able to grab information from the teiHeader in each etext to generate cataloging information (this ties in nicely with the ongoing MARC discussions around here). Josh PS The next document will be the actual markup spec rough draft. This one will probably be delayed until after the US Thanksgiving holidays (I don't think well with 5 pounds of turkey digesting in my tummy!) -------------- next part -------------- Project Gutenberg TEI Header Specification by Joshua Hutchinson This document will provide a "dummy" teiHeader, marking each line as either MANDATORY, RECOMMENDED or OPTIONAL. For fuller descriptions of each part of the teiHeader, please see the original TEILite documentation for the Electronic Title Page () located here (http://www.tei-c.org/Lite/teiu5_en.html#U5-header).
<-- MANDATORY SECTION --> <-- MANDATORY SECTION --> The Title of the EText <-- MANDATORY SECTION --> FirstName LastName <-- MANDATORY SECTION --> Illustrator, Editor, etc. FirstName LastName <-- MANDATORY SECTION (if it exists for this text) --> <-- Multiple entries are allowed. For instance, co-authors for a text would result in multiple entries, one for each author. --> <-- OPTIONAL --> First edition <-- OPTIONAL --> <-- This is information specifically about the PG edition. For instance, in the past, a major update of a text would often result in a number increment of a text's file name. This information could be tracked here. I'm not certain that this type of information is captured anymore. --> <-- MANDATORY SECTION --> Project Gutenberg <-- MANDATORY SECTION --> November, 2004 <-- MANDATORY SECTION --> 12345 <-- MANDATORY SECTION --> <-- This information pertains to the posting of the etext in PG archives. The field won't ever change. (Should it be something else, like PGLAF?) The field is the month and year it was posted. The field is the file number assigned to the text by the whitewashers before posting to the archive. (NOTE: PG could also list itself as a distributor instead of a publisher.) Possible future addition here: There has been sporadic talk in the past of getting ISBN numbers for PG posted works. This would go here as another field. 
--> <-- MANDATORY SECTION --> A short description of the etext (i.e., The first folio of Shakespeare, prepared by Charlton Hinman) <-- OPTIONAL --> <-- MANDATORY SECTION --> <-- MANDATORY SECTION --> The Title of the Source Text <-- MANDATORY SECTION --> FirstName LastName <-- MANDATORY SECTION --> Illustrator, Editor, etc.FirstName LastName <-- MANDATORY SECTION (if it exists for this text) --> <-- MANDATORY SECTION --> Original Source Publisher <-- MANDATORY SECTION --> January, 1922 <-- MANDATORY SECTION --> Place of Publication <-- MANDATORY SECTION --> <-- The observant will notice that the fields from the beginning of are duplicated in the section. The information will not necessarily be identical, though. The information refers to our etext, while the refers back to the original source document. --> <-- OPTIONAL --> The etext was produced by the Distributed Proofreaders at http://www.pgdp.net. <-- OPTIONAL --> <-- This section is optional. In fact, for the example given, it is largely redundant because DP will be given credit in the section detailed below. --> <-- MANDATORY SECTION --> <-- MANDATORY SECTION --> English (United States) Written out language title <-- RECOMMENDED SECTION --> KEYWORD <-- This section is fairly straight forward, just listing the languages used in the etext and some search keywords. --> <-- MANDATORY SECTION --> November 2004 Scans provided by Cornell University Joshua Hutchinson Juliet Sutherland Distributed Proofreaders Etext created November 2005 Jim Tinsley Fixed missing chapter headers and minor typos <-- This section will be added to each time something is done to the text, so we have a running record of changes. The order should be in the order the changes were done, the original creation first and each additional change listed in order below. Because some volunteers wish to remain anonymous, it is perfectly acceptable to simply list ANONYMOUS in a name line. 
--> From marcello at perathoner.de Fri Nov 19 10:46:03 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri Nov 19 10:46:12 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> Message-ID: <419E3F6B.2050902@perathoner.de> Joshua Hutchinson wrote: > The Title of the EText <-- MANDATORY SECTION --> We should provide a non-standard attribute of "nonfiling". This is the number of chars to remove from the start of title before sorting it. <title nonfiling="4">The Tempest</title> <title nonfiling="2">A Midsummer Nights Dream</title> This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right. > <-- OPTIONAL --> > First edition <-- OPTIONAL --> > I think the edition number is not maintained any more. I don't see any of them in the new file system. > <-- MANDATORY SECTION --> > Project Gutenberg <-- MANDATORY SECTION --> > November, 2004 <-- MANDATORY SECTION --> > 12345 <-- MANDATORY SECTION --> > The date should also mention the day. We are not using the date for filing any more. Is this the date of first publication or updated with each new edition? > <-- RECOMMENDED SECTION --> > > > KEYWORD > > > This needs some more thought as the keywords should come out of some authority list. In that case the authority must be specified.
> > November 2004 > > Scans provided by Cornell University > Joshua Hutchinson > Juliet Sutherland > Distributed Proofreaders > > Etext created > Better separate scanning and proofing: 2003 Cornell University Scanned the source November 2004 Joshua Hutchinson Juliet Sutherland Distributed Proofreaders Etext created -- Marcello Perathoner webmaster@gutenberg.org From scott_bulkmail at productarchitect.com Fri Nov 19 11:30:33 2004 From: scott_bulkmail at productarchitect.com (Scott Lawton) Date: Fri Nov 19 11:31:22 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <419E3F6B.2050902@perathoner.de> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> Message-ID: >> The Title of the EText <-- MANDATORY SECTION --> > >We should provide an non-standard attribute of "nonfiling". This is the number of chars to remove from the start of title before sorting it. > > The Tempest > > A Midsummer Nights Dream > >This is an extension to TEI but very useful for the catalog software. It avoids unsightly titles like: "Tempest, The" and still sorts right. I think that should be done (or not) by the cataloging software rather than hardcoded into each file. It's an easy thing to miss, i.e. to be done inconsistently. And, since it's not part of non-PG TEI, there's no other software in the outside world that looks for it. (I may have made this point before, but if so I can't find it in my archives.) -- Cheers, Scott S. Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting From krausyaoj at ameritech.net Fri Nov 19 11:48:52 2004 From: krausyaoj at ameritech.net (Jeffrey Kraus-yao) Date: Fri Nov 19 11:48:08 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: Message-ID: <001401c4ce70$cbe4b940$0402a8c0@p3> Another option for the title is to use a file-as attribute. 
The Tempest A Midsummer Nights Dream While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Scott Lawton Sent: 19 November, 2004 13:31 To: Project Gutenberg Volunteer Discussion Subject: Re: [gutvol-d] TEI Header Spec (rough draft) >> The Title of the EText <-- MANDATORY SECTION --> > >We should provide an non-standard attribute of "nonfiling". This is the >number of chars to remove from the start of title before sorting it. > > The Tempest > > A Midsummer Nights Dream > >This is an extension to TEI but very useful for the catalog software. >It avoids unsightly titles like: "Tempest, The" and still sorts right. I think that should be done (or not) by the cataloging software rather than hardcoded into each file. It's an easy thing to miss, i.e. to be done inconsistently. And, since it's not part of non-PG TEI, there's no other software in the outside world that looks for it. (I may have made this point before, but if so I can't find it in my archives.) -- Cheers, Scott S. 
Lawton http://Classicosm.com/ - classic books http://ProductArchitect.com/ - consulting _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d From traverso at dm.unipi.it Fri Nov 19 14:12:11 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Fri Nov 19 14:12:25 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: (message from Scott Lawton on Fri, 19 Nov 2004 14:30:33 -0500) References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> Message-ID: <200411192212.iAJMCBdV012178@posso.dm.unipi.it> >>>>> "Scott" == Scott Lawton writes: >>> The Title of the EText <-- MANDATORY SECTION >>> --> >> We should provide an non-standard attribute of >> "nonfiling". This is the number of chars to remove from the >> start of title before sorting it. >> >> The Tempest >> >> A Midsummer Nights Dream >> >> This is an extension to TEI but very useful for the catalog >> software. It avoids unsightly titles like: "Tempest, The" and >> still sorts right. Scott> I think that should be done (or not) by the cataloging Scott> software rather than hardcoded into each file. How can the software guess what is filing and what not? "As Farpas" and "As you like it", "As" is filing or not? Here the language might decide, but I think that it is possible in the same language for the same word to be filing or non-filing (surely it is in Italian if you disregard accents). However, relying on character count is very fragile, especially in a context in which whitespace is considered irrelevant. I have often seen braces used in sorting software: {The} Tempest, {A} Midsummer Nights Dream: characters in braces and whitespace are discarded for the purpose of sorting, braces are discarded for the purpose of printing. Of course it is possible to achieve the same result, much more verbosely, with angled brackets.... 
<nonfiling>The</nonfiling> Tempest (a side remark: a non-filing part is not always separated by space: {L'}Inferno) Carlo From brad at chenla.org Fri Nov 19 19:05:51 2004 From: brad at chenla.org (Brad Collins) Date: Fri Nov 19 19:07:56 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: <20041119180540.5D2BC4BDAA@ws1-1.us4.outblaze.com> (D. Starner's message of "Fri, 19 Nov 2004 10:05:40 -0800") References: <20041119180540.5D2BC4BDAA@ws1-1.us4.outblaze.com> Message-ID: "D. Starner" writes: > So what happens when we do the EETS books? There's one in the PPVing > queue at DP, and more being proofed and waiting for PPers. Since the PG edition is distinct from the EETS edition this won't be a problem. Any reference to the EETS series number would be in a note indicating the source used for the PG edition. It might look something like this (please excuse my shaky ISBD) mainTitle : Vices and virtues; being a soul's confession of its sins with reason's description of the virtues. A middle-English dialogue of about 1200 A.D. [electronic document] / Edited by F. Holthausen. responsibility : Holthausen, Ferdinand, 1860-1956. ed. published : - [Urbana]: Project Gutenberg, 2006. series : Project Gutenberg ; etext no. 55031 source : Text based on: Vices and Virtues / ed. by Dr. F Holthausen. EETS Original Series ; no. 89, 159 - London: Kegan Paul, Trench, Trübner, 1888. or it could be included in a 500 note field and an 830 labeled as a Variant Series as was done in a record for a reprint of the book in the LOC catalog. notes : v. 1 first published 1888, v. 2 first published 1921. variantSeries : Early English Text Society. Publications. Original series ; no. 89, 159 Actually I like this better. b/ who just now realised it's Saturday morning... 
-- Brad Collins , Bangkok, Thailand From joshua at hutchinson.net Fri Nov 19 20:15:03 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Nov 19 20:15:04 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <419E3F6B.2050902@perathoner.de> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> Message-ID: <419EC4C7.7080307@hutchinson.net> Marcello Perathoner wrote: > Joshua Hutchinson wrote: > >> The Title of the EText <-- MANDATORY SECTION --> > > > We should provide an non-standard attribute of "nonfiling". This is > the number of chars to remove from the start of title before sorting it. > > The Tempest > > A Midsummer Nights Dream > > This is an extension to TEI but very useful for the catalog software. > It avoids unsightly titles like: "Tempest, The" and still sorts right. > > I see the need for this... But I think I like Jeffrey's method a little better. (From another post) The Tempest A Midsummer Nights Dream While this may not be included in the TEI standard, it is part of the OEB standard, http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm >> <-- MANDATORY SECTION --> >> Project Gutenberg <-- MANDATORY SECTION >> --> >> November, 2004 <-- MANDATORY >> SECTION --> >> 12345 <-- MANDATORY SECTION --> >> > > > The date should also mention the day. We are not using the date for > filing any more. > Fair enough. Will change it. > Is this the date of first publication or updated with each new edition ? > > First publication. Subsequent updates will be documented at the end. >> <-- RECOMMENDED SECTION --> >> >> >> KEYWORD >> >> >> > > > This needs some more thought as the keywords should come out of some > authority list. In that case the authority must be specified. > > This is where the catalog folks need to step in. 
:) >> >> November 2004 >> >> Scans provided by Cornell University >> Joshua Hutchinson >> Juliet Sutherland >> Distributed Proofreaders >> >> Etext created >> > > > Better separate scanning and proofing: > > > 2003 > > Cornell University > > Scanned the source > > > November 2004 > > Joshua Hutchinson > Juliet Sutherland > Distributed Proofreaders > > Etext created > > > > Ok, we can separate that information out... Will update. Josh From stephen.thomas at adelaide.edu.au Fri Nov 19 21:29:06 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Fri Nov 19 21:29:28 2004 Subject: [gutvol-d] Re: PG catalog - MARC In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: <419ED622.5060904@adelaide.edu.au> Brad Collins wrote: > Since the place of publication is important for determining copyright > restrictions in some cases, I think it would be better to include a > place of publication. > > This has bothered me for some time. I've always wondered how to > handle virtual organizations which don't really have a place of > publication in the conventional sense like PG or the Apache Group. I think the ISBD recommends using "s.l." where the place is unknown or indeterminate. (Initials for "sine loco". See http://www.ifla.org/VII/s13/pubs/isbd3.htm#18 section 4.1.15) However, this does not help the copyright question. MARC does provide the 506 field ("Restrictions on Access note") for copyright notices etc. Right now, I'm just putting "Freely available" in here (or the copyright statement for copyrighted works). But we could use a more detailed statement here. E.g. we should as a minimum say "Freely available in the USA. May be subject to copyright in other locations." We could also place the license url in this field (subfield u) rather than the 856. Regarding the series statement -- I'm not wedded to the use of 830 for "Project Gutenberg". It just seemed an appropriate way to include the PG number. 
One typical use of 830 in library catalogues is to be able to index works by series name. So this would allow (in this case) for a search on series name "Project Gutenberg" to list all the works in the collection. However, with currently almost 14,000 titles, maybe this isn't a worthwhile goal. "Project Gutenberg" should also be available as keywords in any library catalog search, if one needed to limit a search to just PG works. We could always expand the 500 General Note to include more detail about PG, including the item number. (500 can be whatever we want, and you can have as many 500 notes as you need.) Also, the item number is present in field 001 -- although that probably won't be visible to the general user of a library catalog, so including it in the 500 note is useful (and again makes the number usable in a keyword search). So if you want to reserve the 830 for particular series within PG (e.g. EET) then that's fine with me. Steve -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From marcello at perathoner.de Sat Nov 20 01:46:10 2004 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat Nov 20 01:46:33 2004 Subject: [gutvol-d] TEI Header Spec (rough draft) In-Reply-To: <419EC4C7.7080307@hutchinson.net> References: <20041119181810.AC5522F9EF@ws6-3.us4.outblaze.com> <419E3F6B.2050902@perathoner.de> <419EC4C7.7080307@hutchinson.net> Message-ID: <419F1262.5080408@perathoner.de> Joshua Hutchinson wrote: > I see the need for this... But I think I like Jeffrey's method a little > better. 
(From another post) > > The Tempest > A Midsummer Nights > Dream > > While this may not be included in the TEI standard, it is part of the > OEB standard, > http://www.openebook.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm My method is part of the MARC standard and is already implemented in the catalog database. -- Marcello Perathoner webmaster@gutenberg.org From j.hagerson at comcast.net Sat Nov 20 09:09:42 2004 From: j.hagerson at comcast.net (John Hagerson) Date: Sat Nov 20 09:10:57 2004 Subject: [gutvol-d] Problems running W3 validator from XP Message-ID: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> My efforts to use the W3 HTML validator fail every time because the MIME type is text. In the past, I was able to validate files. Other than chucking my operating system, do you have any suggestions as to how I can address this problem? Thank you very much. From Gutenberg9443 at aol.com Sat Nov 20 09:32:23 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sat Nov 20 09:32:35 2004 Subject: [gutvol-d] documenting etexts Message-ID: <13e.6b4b973.2ed0d9a7@aol.com> Since this topic came up, I have given it a lot of thought. I think I have the answer. We do not have to know the specific page number if we're quoting the Bible or Shakespeare. That can be carried over to other texts as well. (I hope my underlining shows up in all email.) Bib entry: Richardson, Samuel. Pamela, or Virtue Rewarded. orig. pub. 1740-1741. n.p.: Project Gutenberg, n.d. footnote or endnote: Richardson. Pamela. Section IV, Letter VII, par. 4. Would not this serve most purposes? Please discuss this WITHOUT FLAMING. The world has flames enough without them showing up here. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041120/9e0d6dc0/attachment.html From jmdyck at ibiblio.org Sat Nov 20 10:29:00 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Sat Nov 20 10:30:01 2004 Subject: [gutvol-d] documenting etexts References: <13e.6b4b973.2ed0d9a7@aol.com> Message-ID: <419F8CEC.8B69BB26@ibiblio.org> Gutenberg9443@aol.com wrote: > > We do not have to know the specific page > number if we're quoting the Bible or Shakespeare. > That can be carried over to other texts as well. (I > hope my underlining shows up in all email.) > > Bib entry: > > Richardson, Samuel. Pamela, or Virtue Rewarded. > orig. pub. 1740-1741. n.p.: Project > Gutenberg, n.d. > > footnote or endnote: > > Richardson. Pamela. Section IV, Letter VII, par. 4. > > Would not this serve most purposes? Theoretically, perhaps, but I think it has some practical shortcomings. 1) It assumes that the person making the reference and the people looking up the reference all agree on how to count paragraphs. Usually it's straightforward, but if the source has display quotes, poetry (with stanzas), epigraphs, footnotes, etc, people will probably make different assumptions about how to count them. 2) Some cases would require you to count a lot of paragraphs. Consider a chapter in a novel, with lots of conversational dialogue. The number of paragraphs could easily get into the hundreds. A reference like "Chapter 5, par. 157" might be rather discouraging. (The Bible, and some editions of Shakespeare, avoid these problems by putting the numbering system explicitly in the text.) -Michael From joshua at hutchinson.net Sat Nov 20 13:08:11 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Nov 20 13:08:07 2004 Subject: [gutvol-d] Problems running W3 validator from XP In-Reply-To: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> References: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> Message-ID: <419FB23B.5000906@hutchinson.net> Hmm... 
Are you pointing it to a file on a server or just browsing for the local .HTML file and uploading it? If it is a server, then there is something set wrong on your server. If it is a local file, there should BE a MIME type. Josh John Hagerson wrote: >My efforts to use the W3 HTML validator fail every time because the MIME >type is text. In the past, I was able to validate files. > >Other than chucking my operating system, do you have any suggestions as to >how I can address this problem? > >Thank you very much. > > >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From joshua at hutchinson.net Sat Nov 20 13:09:31 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Sat Nov 20 13:09:21 2004 Subject: [gutvol-d] Problems running W3 validator from XP In-Reply-To: <419FB23B.5000906@hutchinson.net> References: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> <419FB23B.5000906@hutchinson.net> Message-ID: <419FB28B.5030108@hutchinson.net> Oops, that last sentence should read, "If it is a local file, there should NOT be a MIME type." Joshua Hutchinson wrote: > Hmm... Are you pointing it to a file on a server or just browsing for > the local .HTML file and uploading it? If it is a server, then there > is something set wrong on your server. If it is a local file, there > should BE a MIME type. > > Josh > > John Hagerson wrote: > >> My efforts to use the W3 HTML validator fail every time because the MIME >> type is text. In the past, I was able to validate files. >> >> Other than chucking my operating system, do you have any suggestions >> as to >> how I can address this problem? >> >> Thank you very much. 
>> >> >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d@lists.pglaf.org >> http://lists.pglaf.org/listinfo.cgi/gutvol-d >> >> >> > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From servalan at ar.com.au Sat Nov 20 13:18:10 2004 From: servalan at ar.com.au (Pauline) Date: Sat Nov 20 13:45:52 2004 Subject: [gutvol-d] Problems running W3 validator from XP In-Reply-To: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> References: <002901c4cf23$bda4b6b0$6401a8c0@enterprise> Message-ID: <419FB492.8010809@ar.com.au> John Hagerson wrote: > My efforts to use the W3 HTML validator fail every time because the MIME > type is text. In the past, I was able to validate files. > > Other than chucking my operating system, do you have any suggestions as to > how I can address this problem? This is a known problem with XP Service Pack 2 & IE. Details here: http://www.webmasterworld.com/forum21/8867.htm & more places if you google: http://www.google.com.au/search?q=validate+w3c+xp+sp2+IE&btnG=Search&hl=en & discussed at DP here: http://www.pgdp.net/phpBB2/viewtopic.php?p=88608&highlight=validate+xp#88608 As a permanent fix - use a real browser :) Download one here: http://www.mozilla.org/ I'm a recent convert from Mozilla for browsing & email to Firefox for browsing & Thunderbird for email. I hope this helps, P -- Distributed Proofreaders: http://www.pgdp.net "Preserving history one page at a time." From stephen.thomas at adelaide.edu.au Sat Nov 20 16:00:33 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Sat Nov 20 16:00:49 2004 Subject: [gutvol-d] documenting etexts In-Reply-To: <13e.6b4b973.2ed0d9a7@aol.com> References: <13e.6b4b973.2ed0d9a7@aol.com> Message-ID: <419FDAA1.10405@adelaide.edu.au> As Michael Dyck has pointed out, there are problems with citing paragraph numbers. 
With an HTML version, it would be quite possible to add an anchor to the start of every paragraph, so that a citation might simply provide a URL to the exact paragraph. E.g. Richardson. _Pamela_. Section IV, Letter VII, http://www.gutenberg.org/dirs/etext04/pam1w10.htm#p456 (which unfortunately doesn't exist -- but follows Anne's example.) (One would expect that anyone citing a PG work would provide the link to the exact version that they'd used.) With a plain text version, it's simply not possible to give an exact citation, for the reasons that Michael mentioned. However, citation is about citing sources, as best one can, so that Richardson. _Pamela_. Section IV, Letter VII, http://www.gutenberg.org/dirs/etext04/pam1w10.txt would be perfectly acceptable as a citation -- leaving of course the matter of *finding* the exact point in the text to the reader. Given that the reader can use the URL to obtain the text, and then use Find to search for the phrase in question, with less trouble than locating a phrase on a particular printed page, this seems to me to be a perfectly adequate form of citation. (Especially as any reader will be able to easily obtain the PG text, which can't be said for many print citations -- if you can't lay your hands on the print edition, it doesn't matter how closely the thing is cited!) So my advice is -- don't sweat it, cite as best you can and consider the advantages over the disadvantages. Steve Gutenberg9443@aol.com wrote: > Since this topic came up, I have given it a > lot of thought. I think I have the answer. > > We do not have to know the specific page > number if we're quoting the Bible or Shakespeare. > That can be carried over to other texts as well. (I > hope my underlining shows up in all email.) > > Bib entry: > > Richardson, Samuel. _Pamela, or Virtue Rewarded_. > orig. pub. 1740-1741. n.p.: Project > Gutenberg, n.d. > > footnote or endnote: > > Richardson. _Pamela_. Section IV, Letter VII, par. 4. 
> > Would not this serve most purposes? > > Please discuss this WITHOUT FLAMING. > The world has flames enough without them > showing up here. > > Anne > > > ------------------------------------------------------------------------ > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d -- Stephen Thomas, Senior Systems Analyst, Adelaide University Library ADELAIDE UNIVERSITY SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ From traverso at dm.unipi.it Sat Nov 20 11:54:41 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Sun Nov 21 01:07:12 2004 Subject: [gutvol-d] documenting etexts In-Reply-To: <13e.6b4b973.2ed0d9a7@aol.com> (Gutenberg9443@aol.com) References: <13e.6b4b973.2ed0d9a7@aol.com> Message-ID: <200411201954.iAKJsfI2004973@posso.dm.unipi.it> >>>>> "Anne" == Gutenberg9443 writes: Anne> Since this topic came up, I have given it a lot of Anne> thought. I think I have the answer. Anne> We do not have to know the specific page number if we're Anne> quoting the Bible or Shakespeare. That can be carried over Anne> to other texts as well. (I hope my underlining shows up in Anne> all email.) Anne> Bib entry: Anne> Richardson, Samuel. Pamela, or Virtue Rewarded. Anne> orig. pub. 1740-1741. n.p.: Project Gutenberg, n.d. Anne> footnote or endnote: Anne> Richardson. Pamela. Section IV, Letter VII, par. 4. Anne> Would not this serve most purposes? Anne> Please discuss this WITHOUT FLAMING. The world has flames Anne> enough without them showing up here. I have two objections, that I would like to know how you would solve: - existing books quote other books through pages. If you want to find in a book a discussion that is quoted by page, how are you going to find it, if you don't have page numbers? 
Of course, if you have an exact quotation you can search for it, but assume that you have just a description, or maybe a translation. - assume that you want to quote a book that you have only in a paper edition; to quote it in your style, you need to manually count the paragraphs, both when quoting and when checking a quotation; wouldn't the standard way of quoting pages of a reference edition (usually, the only edition) be better? Of course, you said: Anne> Would not this serve most purposes? Yes, most maybe, but not all. And most is not enough. You said also Anne> to other texts as well. (I hope my underlining shows up in Anne> all email.) No, it doesn't. Here too you are assuming that other people use the same tools that you use. A good tool is one that adapts itself to an unknown situation, and does not make assumptions. Discarding page numbers in reference works makes assumptions about other people's working methods; the result is a less flexible tool. Carlo From gbnewby at pglaf.org Sun Nov 21 14:06:37 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Nov 21 14:06:39 2004 Subject: [gutvol-d] Re: PG catalog - MARC -- problem with encoding for Audio Books In-Reply-To: References: <419C13C7.5070904@adelaide.edu.au> <20041118125758.GA5939@pglaf.org> Message-ID: <20041121220637.GB24601@pglaf.org> On Fri, Nov 19, 2004 at 08:58:55AM +0700, Brad Collins wrote: > Greg Newby writes: > >> Yes. You'll see I'm now using just 'Project Gutenberg' for the > >> publisher name -- after comment from Greg. The a subfield can be > >> used for place of publication, but ... I'm not sure what that > >> is. Is it still Urbana (I thought PG had long since moved from > >> there)? Is it the business address of PGLAF? Is it the home town > >> of ibiblio? In the end, it seemed easiest to omit that. > > > > I always used Urbana because it's the historical home, and of course > > PG still has a presence there (i.e., Michael). > > > [snip] > > There is no 100% accurate place to list. 
> > > Since the place of publication is important for determining copyright > restrictions in some cases, I think it would be better to include a > place of publication. I definitely agree. I left the below for context, but wanted to mention my favorite is: [Urbana, Illinois]: Project Gutenberg, 2004. Note, I added the state, since there are many Urbanas. Urbana is as accurate as we are likely to get. -- Greg > This has bothered me for some time. I've always wondered how to > handle virtual organizations which don't really have a place of > publication in the conventional sense like PG or the Apache Group. > > So I did a little digging in the ISBD specs and found the following: > > ,----[ ISBD(ER) 4.1.13 ] > | 4.1.13 When a place of publication, production or distribution does > | not appear anywhere in the item, the name of the known city or town > | is supplied in square brackets. If the city or town is uncertain, or > | unknown, the name of the probable city or town followed by a > | question mark is supplied in square brackets. e.g. > | > | - [Paris] > | - [Prague?] > `---- > > ,----[ ISBD(ER) 4.1.14 ] > | 4.1.14 When the name of a city or town cannot be given, the name of > | the state, province or country is given, according to the same > | stipulations as are applicable to the names of cities or towns. > | e.g. > | > | - Canada > | Editorial comment: Known as place of publication; > | appears in prescribed source. > `---- > > Since PG doesn't explicitly state that the place of publication is in > the States in etexts, (is that right?) this would suggest something > like: > > - [USA]: Project Gutenberg, 2004. > > or (I prefer) > > - [Urbana]: Project Gutenberg, 2004. 
> > in BMF this might look like: > > published : ‐ $pl[[USA]]: $pb[Project Gutenberg], $dt[2004] > > or more verbose BMF (bxids only for example): > > published : ‐ $pl[$d:bxid://geo:IKE8-5510 $l:[USA]]: > $pb[$d:bxid://aut:JIQ6-7286 $l:Project Gutenberg], > $dt[$v:2004-10-12 $l:2004] > > BMF subfields used: > (For complete list of subfields see: > http://192.168.0.103/cgi-bin/bmf.cgi/Reference/SubfieldQuickRef.html) > > pl place name > d defined-by > l label > pb publisher name > dt inclusive dates > v value-- in dt it should be a iso8601 formated date > > > b/ > > > -- > Brad Collins , Bangkok, Thailand > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From gbnewby at pglaf.org Sun Nov 21 14:18:32 2004 From: gbnewby at pglaf.org (Greg Newby) Date: Sun Nov 21 14:18:34 2004 Subject: [gutvol-d] [etext04|etext05]/index.html missing? In-Reply-To: <00ca01c4cb7a$ac6d7640$6401a8c0@enterprise> References: <4198E3A2.6080504@perathoner.de> <00ca01c4cb7a$ac6d7640$6401a8c0@enterprise> Message-ID: <20041121221832.GF24601@pglaf.org> On Mon, Nov 15, 2004 at 07:21:55PM -0600, John Hagerson wrote: > Well, not knowing what to do, I went to the Robots Readme on the > Gutenberg.org web site and copied the wget command listed under the heading > "Getting All EBook Files." I started this process on Sunday evening, at the > end of a cable modem. Little did I realize that more than 24 hours later, > the process would still be running. > > In a private message, I was told to use rsync. OK. If rsync is the preferred > method, then why is wget presented as the example? > > It appears that I'm storing a bunch of index.html files that are redundant > if I use rsync. I guess I can clean them up at my leisure. However, again > the web page says "keep the html files" to make re-roboting faster. > > Well, I'll be a mirror site for all of the ZIP and HTML files, anyway. 
> > Please post suggestions here or pm me. Thank you. John, please see the mirroring HOWTO at http://gutenberg.org/howto Mirroring the entire site is different from harvesting a few directories or sets of files. The "index.html" is created by the remote server, to simply list the files in a directory - you are right that it's transient/temporary/imaginary. Note that a 256Kbit DSL modem will take about 6 days to download the entire PG collection (it's 140GB). We do not recommend DSL or cable modems for setting up mirrors, and generally don't list them in our mirror list. -- Greg > -----Original Message----- > From: gutvol-d-bounces@lists.pglaf.org > [mailto:gutvol-d-bounces@lists.pglaf.org] On Behalf Of Marcello Perathoner > Sent: Monday, November 15, 2004 11:13 AM > To: Project Gutenberg Volunteer Discussion > Subject: Re: [gutvol-d] [etext04|etext05]/index.html missing? > > John Hagerson wrote: > > > I am using wget to download books from www.gutenberg.org. The process is > > stuck on etext04 in what appears to be a futile effort to download > > index.html. > > The indexes are auto-generated on the fly by Apache. > > If the load on the fileservers is too high the connection times out > before a full directory listing can be retrieved. > > You should not harvest at peak hours anyway. 
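[For readers following the wget-vs-rsync question above: the rsync approach amounts to a single command along these lines. This is a sketch only -- the host and module name below are placeholders, not values confirmed in this thread; the real ones are in the mirroring HOWTO Greg points to.]

```shell
# Sketch: mirror a PG-style rsync module into a local directory.
# "rsync.example.org::gutenberg" is a placeholder host::module pair;
# consult http://gutenberg.org/howto for the actual rsync server.
rsync -av --delete \
      --exclude 'index.html' \
      rsync.example.org::gutenberg /srv/pg-mirror/
# -a          preserve timestamps/permissions (archive mode)
# -v          show files as they transfer
# --delete    remove local files that have vanished upstream
# --exclude   skip the auto-generated directory listings that
#             wget kept fetching (and that time out under load)
```

Unlike the wget robot, rsync only transfers files that changed since the last run, which is why it is the preferred method for keeping a mirror current.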
> > > > -- > Marcello Perathoner > webmaster@gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From Gutenberg9443 at aol.com Sun Nov 21 17:02:04 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Sun Nov 21 17:02:23 2004 Subject: [gutvol-d] documenting etexts Message-ID: <19e.2bb96cf4.2ed2948c@aol.com> In a message dated 11/21/2004 2:07:20 AM Mountain Standard Time, traverso@dm.unipi.it writes: >>Here too you are assuming that other people use the >>same tools that you use. A good tool is one that adapts >>itself to an >>unknown situation, and does not make assumptions. >>Discarding page >>numbers in reference works makes assumptions on >>other people's working >methods; the result is a less flexible tool. I didn't assume. I asked. Some ISPs carry formatting over and some don't. I have no idea what ISP someone I don't know is using or what that ISP might do with formatting. A good tool is one that can be used for at least 50 things besides the one it is designed for. You can use two bricks to kill a fly, if you can figure out how to make the fly stay on the bottom brick long enough for you to clap the top brick on top of it. You can use bricks and boards to make a bookcase. You can make a street out of bricks. Go away and think of 47 other uses for bricks. Then talk to me about tools. I am not recommending discarding page numbers in reference books. I am suggesting that the majority of books already posted do not necessitate going back and redoing to insert the page numbers. 
It's like those old Tom Swift books I was recently accused of reading in preference to anything else: if I want to do a learned paper or book on the Stratemeyer syndicate--I think but am not sure, and it is not necessary for anybody to inform me, that it included Tom Swift; I know it included Nancy Drew and the Hardy Boys--I will have to go somewhere that I can use the tree book versions, and even then I'll have to be careful, because I know that Nancy Drew and the Hardy Boys were rewritten umpteen times, often with no more than the title saved from edition to edition and no indication in the front matter as to what version this one was. But do enough people want to write learned papers on Tom Swift, or Tarzan, or Elsie Dinsmore, or The Wizard of Oz, for it to be reasonable for me to demand that all the Tom Swift, Tarzan, Elsie Dinsmore, and Wizard of Oz books be pulled down until somebody has time to rescan them and keep all the page numbers this time? I don't think so. By the way, since so many people seem to know better than I do what I'm reading at present, I'll save them the trouble of guessing. I have finally laid my hands on a copy of Isabella Beeton's 1865 book on household management--University of Adelaide has posted it--and I'm reading it because I think that it is appropriate Sabbath Day reading, and yes I know different religions have different "Sabbath Days" but I'm referring to my own religion's. (I specify this because I was once head of a very small--three person--department which happened to include a Muslim, a Christian, and a fellow who wasn't interested in religion. So I set schedules up so that I was always off Sunday, which I wanted, and Saki was always off Saturday, which he wanted, and Pat was always off in the middle of the week, which he wanted. So my boss's boss turned it all around so that none of us had the days off we wanted, because I could not get it through his head that we were all happy with the schedule I had arranged.) 
I stopped reading the book long enough to send a message to my brothers inquiring how to slice a garfish and how many axes would be necessary. Mrs. Beeton gives instructions for how to cook the garfish but she begins by saying that it is necessary to begin by slicing the garfish. Last time (okay, the only time) I ever saw a garfish, one of my brothers tried to behead it and broke an axe. By the way, Mrs. Beeton does not number pages. She numbers recipes. So her table of contents and her index get a reader to the right place no matter what form the text is in. Three cheers for Mrs. Beeton! The previous book was Pamela; the next ones will be A. Merritt's The Moon Pool and The Metal Monster. I'm perfectly furious that one of the A. Merritt books I've been seeking has turned up on FictionWise and I have to PAY for it and I don't have the money. Shall I report on what books I read after The Metal Monster? Actually I was kind of thinking about calling a whole lot of state capitals and explaining that we need to redo our registrations and asking how I need to go about doing it, but if it's necessary for me to give book reports I can do that instead. Also I'm sort of busy reading ancient Egyptian medical books in preparation for a novel I'm writing that includes Luke the Physician, but I couldn't get them online because the English versions are still in copyright and I can't read hieroglyphics, which doesn't matter because they aren't on line in hieroglyphics either, so I had to get them through ILL. What have I ever done to you to make you want to bite my head off every time I post? I can't help being autistic. I was born autistic. You can help being a walking, talking, grouch box. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041121/b5a17e8f/attachment.html From Gutenberg9443 at aol.com Mon Nov 22 06:38:11 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 22 06:42:42 2004 Subject: [gutvol-d] documenting etexts (without flaming) Message-ID: <9.37cba20c.2ed353d3@aol.com> In a message dated 11/21/2004 2:07:20 AM Mountain Standard Time, traverso@dm.unipi.it writes: >>- assume that you want to quote a book that you have >>only in a paper >> edition; to quote it in your style, you need to manually >>count the >> paragraphs, both when quoting and when checking a quotation; >> wouldn't the standard way of quoting pages of a >>reference edition >> (usually, the only edition) be better? >>Of course, you said: >> Anne> Would not this serve most purposes? >>Yes, most maybe, but not all. And most is not enough. >>You said also >> Anne> to other texts as well. (I hope my underlining shows up in >> Anne> all email.) >>No, it doesn't. Here too you are assuming that other >>people use the >>same tools that you use. A good tool is one that adapts itself to an >>unknown situation, and does not make assumptions. >>Discarding page >>numbers in reference works makes assumptions on >>other people's working >>methods; the result is a less flexible tool. There's another problem here that YOU are missing, and it is this: PG does not have control of all etexts. Whether the total number of free etexts online is 40,000, as I estimate, or 100,000, as Michael estimates, the fact remains that PG does not have all etexts, or a majority of etexts, or even a plurality of etexts. I keep track of every etext site I hear of, and I check all of them out. Only those few that post page scans, and there are very few of them, make original page numbers available. 
If I am looking for a book and I can find only page scans of it, I won't download it unless I desperately want it and can't find it anywhere else, because I don't like to fiddle around with putting the pages together to read. Some years ago I wanted a specific edition of the Qur'an, and had to download it sura by sura. It took me a lot more hours to put it together than I wanted to expend on that task. So a documentation method that works only for page scans and/or full texts that include page numbers is unusable for more texts than it is usable for. Also, I don't want to say that all, or even most, reference books come in only one edition. My Oxford Guide to American Literature is fifth edition, and I'm almost certain there's now a sixth edition available. My Granger's Index to Poetry is eighth edition and I think it is two editions old; I know it is at least one. My Larousse English/German dictionary is dated 2000 and MIGHT be current, except for the fact that it uses the "new" German spelling, and I think I read online that the "old" spelling is back in use. Most astronomy, physics, biology, geography, and geology texts are out of date by the time they roll off the press, and by the time they make their way online they are so hideously out of date that anyone relying on them would be in trouble. Of the solutions proposed, the one I like best is the suggestion that the person doing the paper could include with it the URL of, or a link to, the specific reference book used, and to make sure it doesn't change, that person should put the source on his or her own Website and link to it there. But even THAT won't work for purchased ebooks. I think we'll probably flounder around for another ten to twenty years before a workable permanent solution is devised. But all the flaming and/or condescension in the world isn't going to help a bit. I apologize for my flaming yesterday. I try not to blow up but sometimes I do it anyway. 
Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041122/983092bc/attachment-0001.html From joshua at hutchinson.net Mon Nov 22 11:05:49 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 22 11:05:56 2004 Subject: [gutvol-d] documenting etexts Message-ID: <20041122190549.B40EC109939@ws6-4.us4.outblaze.com> ----- Original Message ----- From: Gutenberg9443@aol.com > > In a message dated 11/21/2004 2:07:20 AM Mountain Standard Time, > traverso@dm.unipi.it writes: > > >>Here too you are assuming that other people use the > >>same tools that you use. A good tool is one that adapts >>itself to an > >>unknown situation, and does not make assumptions. >>Discarding page > >>numbers in reference works makes assumptions on >>other people's working > >methods; the result is a less flexible tool. > > > > I didn't assume. I asked. Some ISPs carry formatting over and some don't. I > have no idea what ISP someone I don't know is using or what that ISP might do > with formatting. Actually, the ISP has nothing to do with it... What shows up is dependent on what program you are using to read the e-mail. Just a quick FYI. > > A good tool is one that can be used for at least 50 things besides the one > it is designed for. You can use two bricks to kill a fly, if you can figure out > how to make the fly stay on the bottom brick long enough for you to clap the > top brick on top of it. You can use bricks and boards to make a bookcase. > You can make a street out of bricks. Go away and think of 47 other uses for > bricks. Then talk to me about tools. > > I am not recommending discarding page numbers in reference books. I am > suggesting that the majority of books already posted do not necessitate going back > and redoing to insert the page numbers. > Well, you can kill that same fly by running over it with a semi truck ... 
but that doesn't make either one a GOOD tool for the job. Carlos' point (which was worded nicely despite your reaction to it) is that you have to create a system that works despite not knowing the exact environment it will be used in. This is why so many people had problems with bowerbird's ZML viewer. It required everyone to be using a specific reader program that you simply cannot guarantee will be in use. > It's like those old Tom Swift books I was recently accused of reading Unless I missed a message somewhere ... someone was using the Tom Swifts as an example of a type of book for a particular point. It was not a listing of what you read or don't read. > > But do enough people want to write learned papers on Tom Swift, or Tarzan, > or Elsie Dinsmore, or The Wizard of Oz, for it to be reasonable for me to > demand that all the Tom Swift, Tarzan, Elsie Dinsmore, and Wizard of Oz books to > be pulled down until somebody has time to rescan them and keep all the page > numbers this time? > No one has ever said that (unless, again, I missed a message). Many people have said that they will need to be redone at some future point to put that information back in. (Jon Noring is the biggest proponent of this.) > By the way, since so many people seem to know better than I do what I'm > reading at present, I'll save them the trouble of guessing. This type of wording is what starts flame wars. And it is coming from your side. Please calm down a little here. No one has tried to start a flame war, but I can see people getting defensive in reply to your recent messages and it will lead to some things being said that probably shouldn't be. > > What have I ever done to you to make you want to bite my head off every time > I post? I can't help being autistic. I was born autistic. You can help being > a walking, talking, grouch box. > > Anne > All I can tell you, Anne, is that Carlos did NOT bite your head off. Rather, he explained the fallacies he saw in your argument. 
Carlos is actually one of the more even tempered folks around here. He won't hold back on pointing out things he disagrees with, but I've never seen him be a "walking, talking, grouch box." Josh From Gutenberg9443 at aol.com Mon Nov 22 12:21:10 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 22 12:21:26 2004 Subject: [gutvol-d] documenting etexts Message-ID: <45.1bdb2f1d.2ed3a436@aol.com> In a message dated 11/22/2004 12:06:08 PM Mountain Standard Time, joshua@hutchinson.net writes: Rather, he explained the fallacies he saw in your argument. I have no objections to having fallacies pointed out; however, I had made no assumptions. I had made a SUGGESTION and ASKED FOR COMMENT. I had thought about the situation for some time before I was ready to put forth the suggestion. Therefore, condescendingly telling me I had made incorrect assumptions was maddening. I shall now explain why I almost never make assumptions. When I first became a crime scene technician, my boss would never allow me to say a substance was blood. I had to say "a red fluid which appeared to be blood." Even if somebody is lying on the floor with a shotgun blast through his chest, he is lying in a pool of "a red fluid which appears to be blood." I couldn't understand why I had to do this, until the day that my boss and I were trailing an injured murderer down an alley by the places he had stopped to bleed. The last blood spatter was in the middle of a blind alley with no doors opening onto it and a wall too high for an injured person to climb. This made no sense at all to us. There was nowhere for him to go from there. Nevertheless, a sample was taken from each splotch. When the lab report came back, we learned that the last spatter was brake fluid. We had lost him on the street, at the end of the alley, where he apparently got into a car. I don't KNOW that he got into a car. He might have gotten into a truck or onto a motorcycle or bicycle. 
He might have gotten into a flying saucer. It APPEARED that he had gotten into a car. I cannot ASSUME what he did. I wasn't there. I didn't see it. Therefore I rarely make assumptions. I asked whether my suggestion would work. I have no problem at all with being told that it would not work. It was the condescending attitude, in this case and in the "old Tom Swift books" post, that was like a red flag to a bull. And you have apparently missed some posts, because this is the third time in less than a month that Carlo has dropped on me like a ton of lead, assuming I have assumptions that I do not have. The first two times I laboriously explained what I was saying and why and how I had not meant what he assumed I meant. This time he got me on a day when I was ill and already crabby, and I bit back. I wished that I had not done so two seconds after I sent it, but I couldn't unsend it. As to ISPs and programs, you explained without acting as if I had an IQ of minus thirty. Thank you. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041122/6b52e387/attachment.html From joshua at hutchinson.net Mon Nov 22 13:03:17 2004 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Mon Nov 22 13:03:27 2004 Subject: [gutvol-d] documenting etexts Message-ID: <20041122210317.6FEEE4F52D@ws6-5.us4.outblaze.com> Something just occurred to me here ... Did you realize that Carlos (while very articulate in English) is not a native English speaker? Maybe that explains why you are reading more into what Carlos is writing than I am getting from it. Also, it is not out of line to interpret "it should underline" as an assumption of sorts. 
Josh ----- Original Message ----- From: Gutenberg9443@aol.com To: gutvol-d@lists.pglaf.org Subject: Re: [gutvol-d] documenting etexts Date: Mon, 22 Nov 2004 15:21:10 EST > > > In a message dated 11/22/2004 12:06:08 PM Mountain Standard Time, > joshua@hutchinson.net writes: > > Rather, he explained the fallacies he saw in your argument. > > > I have no objections to having fallacies pointed out; > however, I had made no assumptions. I had > made a SUGGESTION and ASKED FOR COMMENT. > I had thought about the situation for some time before > I was ready to put forth the suggestion. Therefore, condescendingly telling > me I had made > incorrect assumptions was maddening. I shall now > explain why I almost never make assumptions. > > When I first became a crime scene technician, > my boss would never allow me to say a substance > was blood. I had to say "a red fluid which > appeared to be blood." Even if somebody is > lying on the floor with a shotgun blast through > his chest, he is lying in a pool of "a red > fluid which appears to be blood." I couldn't > understand why I had to do this, until the day > that my boss and I were trailing an injured murderer > down an alley by the places he had stopped > to bleed. The last blood spatter was in the > middle of a blind alley with no doors opening onto > it and a wall too high for an injured person to > climb. This made no sense at all to us. There > was nowhere for him to go from there. Nevertheless, > a sample was taken from each splotch. When > the lab report came back, we learned that the last > spatter was brake fluid. We had lost him on the street, > at the end of the alley, where he apparently got into a car. > > I don't KNOW that he got into a car. He might > have gotten into a truck or onto a motorcycle > or bicycle. He might have gotten into a flying > saucer. It APPEARED that he had gotten into > a car. I cannot ASSUME what he did. I wasn't > there. I didn't see it. > > Therefore I rarely make assumptions. 
> > I asked whether my suggestion would work. I > have no problem at all with being told that it would > not work. It was the condescending attitude, in this > case and in the "old Tom Swift books," post, that > was like a red flag to a bull. And you have apparently > missed some posts, because this is the > third time in less than a month that Carlo > has dropped on me like a ton of lead, > assuming I have assumptions that I do > not have. The first two times I > laboriously explained what I was saying > and why and how I had not meant what > he assumed I meant. This time he got > me on a day when I was ill and already > crabby, and I bit back. I wished that I > had not done so two seconds after I > sent it, but I couldn't unsend it. > > As to ISPs and programs, you explained > without acting as if I had an IQ of minus > thirty. Thank you. > > Anne > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From traverso at dm.unipi.it Mon Nov 22 13:39:49 2004 From: traverso at dm.unipi.it (Carlo Traverso) Date: Mon Nov 22 13:40:20 2004 Subject: [gutvol-d] documenting etexts In-Reply-To: <45.1bdb2f1d.2ed3a436@aol.com> (Gutenberg9443@aol.com) References: <45.1bdb2f1d.2ed3a436@aol.com> Message-ID: <200411222139.iAMLdn9W024656@posso.dm.unipi.it> Dear Anne, I apologize if you have felt any animosity directed towards you. None was meant. As you know, I am not a native speaker of English. Maybe it is just a misunderstanding. You say: > And you have apparently > missed some posts, because this is the > third time in less than a month that Carlo > has dropped on me like a ton of lead, > assuming I have assumptions that I do > not have. The first two times I > laboriously explained what I was saying > and why and how I had not meant what > he assumed I meant. 
Sorry, I must have missed them too; I don't remember having answered posts of yours recently, and I have not found one in the last year of gutvol-d. Of course I often disagree with you, our points of view are different. Another coincidence might have worked against my post: I had answered much earlier, but the mail was delayed by the mailer, and arrived when the post was already answered (you can check in the header: I answered on Saturday, and it arrived late on Sunday). So my post seemed to be insisting on a point that was already discussed enough. I would not have sent it on Sunday. Carlo From Gutenberg9443 at aol.com Mon Nov 22 15:27:36 2004 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Mon Nov 22 15:27:54 2004 Subject: [gutvol-d] documenting etexts Message-ID: <144.39631f99.2ed3cfe8@aol.com> In a message dated 11/22/2004 2:03:48 PM Mountain Standard Time, joshua@hutchinson.net writes: Something just occured to me here ... Did you realize that Carlos (while very articulate in English) is not a native english speaker? Yes, I know that. Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20041122/eaff83fa/attachment.html From jmdyck at ibiblio.org Wed Nov 24 11:58:59 2004 From: jmdyck at ibiblio.org (Michael Dyck) Date: Wed Nov 24 11:59:24 2004 Subject: [gutvol-d] "We're still keeping up with Moore's Law!" Message-ID: <41A4E803.39FDF18A@ibiblio.org> In today's gweekly Pt 1, Michael Hart wrote: > > We're still keeping up with Moore's Law! > Moore's Law 18 month percentage = 115% > Moore's Law 12 month percentage = 67% I don't understand how these percentages were calculated. 18 months ago (May 24, 2003), the "TOTAL COUNT" (incl PG Australia) was about 8044 (interpolating between 8021 on May 21 and 8075 on May 28), so doubling every 18 months would land us at 2 * 8044 = 16088 today. Today's actual count is 14,484, or 90% of the predicted amount. 
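The 18-month figure is easy to double-check mechanically; here is a minimal sketch in Python, using only the interpolated counts quoted above (the 12-month check below works the same way, with a growth factor of 2^(12/18)):

```python
# Sanity-check the 18-month Moore's-Law projection.
# Both counts are the interpolated totals quoted in this post.
count_then = 8044    # total count on May 24, 2003 (interpolated)
count_now = 14484    # total count on Nov 24, 2004

predicted = 2 * count_then           # "doubling every 18 months"
pct_of_prediction = count_now / predicted * 100

print(predicted)                 # 16088
print(round(pct_of_prediction))  # 90
```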
12 months ago (Nov 24, 2003), the total count was about 10,517 (interpolating between 10,396 on Nov 19 and 10,565 on Nov 26). Doubling every 18 months means multiplying by 2^(12/18) = 1.5874 every 12 months, which would predict a count of 16695 today. Our actual count of 14,484 is 87% of that. So where do 115% and 67% come from? -Michael From stephen.thomas at adelaide.edu.au Wed Nov 24 17:52:53 2004 From: stephen.thomas at adelaide.edu.au (Steve Thomas) Date: Wed Nov 24 17:53:14 2004 Subject: [gutvol-d] Linking to page images Message-ID: <41A53AF5.9020208@adelaide.edu.au> Regarding the question of whether and how we should provide links to page scans, here's one idea which may have potential: Etext number 10072, "English Housewifery Exemplified", by Elizabeth Moxon, was produced from scans from Biblioteca de la Universitat de Barcelona (it says so at the top of the text). It happens that Biblioteca de la Universitat de Barcelona makes their scans available on the net, for free (apparently -- my Spanish is non-existent; might even be Catalan?). So I've added a note for this text in the catalog, which reads: " Produced from page images available from Biblioteca de la Universitat de Barcelona, at http://www.bib.ub.es/grewe/showbook.pl?gw58 " thus providing a link to the actual page scans for anyone interested. Is this a model to be followed? Of course, this raises the question of who is to add this information to the catalogue -- the link was not provided in the etext; I had to go look for it. Also, the link is not clickable. Ideally, you'd want the link to be active, but there's no place for this in the present catalog design. 
Steve -- Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Phone: +61 8 830 35190 Fax: +61 8 830 34369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M ----------------------------------------------------------- This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From hart at pglaf.org Thu Nov 25 08:24:53 2004 From: hart at pglaf.org (Michael Hart) Date: Thu Nov 25 08:24:55 2004 Subject: [gutvol-d] "We're still keeping up with Moore's Law!" In-Reply-To: <41A4E803.39FDF18A@ibiblio.org> References: <41A4E803.39FDF18A@ibiblio.org> Message-ID: On Wed, 24 Nov 2004, Michael Dyck wrote: > In today's gweekly Pt 1, Michael Hart wrote: >> >> We're still keeping up with Moore's Law! >> Moore's Law 18 month percentage = 115% >> Moore's Law 12 month percentage = 67% > > I don't understand how these percentages were calculated. > > 18 months ago (May 24, 2003), the "TOTAL COUNT" (incl PG Australia) was > about 8044 (interpolating between 8021 on May 21 and 8075 on May 28), > so doubling every 18 months would land us at 2 * 8044 = 16088 today. > Today's actual count is 14,484, or 90% of the predicted amount. > > 12 months ago (Nov 24, 2003), the total count was about 10,517 > (interpolating between 10,396 on Nov 19 and 10,565 on Nov 26). > Doubling every 18 months means multiplying by 2^(12/18) = 1.5874 > every 12 months, which would predict a count of 16695 today. 
> Our actual count of 14,484 is 87% of that. > > So where do 115% and 67% come from? These come from one of Brett's programs. . .I've asked him to check on them a few times, as I agree with you that the figures don't look right. I forwarded him this to encourage some rechecking. ;-) Happy Thanksgiving! Michael From nihil_obstat at mindspring.com Fri Nov 26 10:09:53 2004 From: nihil_obstat at mindspring.com (Dennis McCarthy) Date: Fri Nov 26 10:10:03 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? Message-ID: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> A technical question: Exactly what characters make up 7-bit ascii? I presume it is 128 (2 to the 7th power). So logically any character I can generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is kosher in a 7-bit ASCII text. Specifically I want to know if I can use "|" (the character made by hitting Shift+backslash on a standard US keyboard, or Alt+0124). Generally, are the following (Alt+0000 thro' Alt+0127) always okay? ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  --------------------------- Dennis McCarthy nihil_obstat@mindspring.com From jtinsley at pobox.com Fri Nov 26 10:25:34 2004 From: jtinsley at pobox.com (Jim Tinsley) Date: Fri Nov 26 10:25:41 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: <20041126182534.GB8717@panix.com> On Fri, Nov 26, 2004 at 01:09:53PM -0500, Dennis McCarthy wrote: > >A technical question: > >Exactly what characters make up 7-bit ascii? I presume it is 128 (2 to the 7th power). So logically any character I can generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is kosher in a 7-bit ASCII text. 
> >Specifically I want to know if I can us "|" (the character made by hitting Shift+backslash on a standard US keyboard, or Alt+0124). > >Generally, are the following (Alt+0000 thro' Alt+0127) always okay? > ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  > Depends what you mean by "okay". Anything in the range 32 (space) through 126 (tilde) is definitely OK, and common. Your specific character, 124, is commonly used by people who want to create a box-like layout. Below 32, chars 10 and 13 (LF and CR) are definitely necessary at the end of every line, but character 9 (Tab) is specifically discouraged, because of the undefined effect it may have on different viewing or editing programs. Other characters below space (32) . . . well, I imagine someone could come up with a useful reason to use one or more of them, in some special situation, but I can't think of one right now. Ditto 127, whose only reason for existence is to delete another character. jim From hyphen at hyphenologist.co.uk Fri Nov 26 11:28:14 2004 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Fri Nov 26 11:28:34 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: On Fri, 26 Nov 2004 13:09:53 -0500 (GMT-05:00), Dennis McCarthy wrote: | | A technical question: | | Exactly what characters make up 7-bit ascii? I presume it is 128 (2 to the 7th power). So logically any character I can generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is kosher in a 7-bit ASCII text. | | Specifically I want to know if I can us "|" (the character made by hitting Shift+backslash on a standard US keyboard, or Alt+0124). | | Generally, are the following (Alt+0000 thro' Alt+0127) always okay? 
| ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  Not everyone used Windoze there are some 500 8 bit character sets in use, but many are obsolete/obsolescent. See http://www.asciitable.com/ Decimal 0 to 31 are control characters and unusable in text. They may be many other things besides control characters. Decimal 32 is space. Decimal 33 to 126 are usable in 7 bit ASCII text as listed in the URL Decimal 127 is unusable. Decimal 128 and above may be absolutely anything, to use these one must state which of the 500 character sets you are using. One persons standard sends the next person insane :-( -- Dave F From stephen.thomas at adelaide.edu.au Fri Nov 26 18:51:01 2004 From: stephen.thomas at adelaide.edu.au (Stephen Thomas) Date: Fri Nov 26 18:52:35 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: <1101523861.41a7eb95a9b5e@pandani.services.adelaide.edu.au> If you have a standard US keyboard, then any of the keys you can hit (excluding the function keys), with or without the shift key, are ASCII. (ASCII also includes "control" characters, 0000 thru 0031, which you won't see, and won't want to enter anyway.) The "|" (vertical bar or pipe) is certainly legit ASCII. Steve Quoting Dennis McCarthy : > > A technical question: > > Exactly what characters make up 7-bit ascii? I presume it is > 128 (2 to the 7th power). So logically any character I can > generate by typing Alt+0000 thro' Alt+0127 (in MS Windows) is > kosher in a 7-bit ASCII text. > > Specifically I want to know if I can us "|" (the character > made by hitting Shift+backslash on a standard US keyboard, or > Alt+0124). 
> > Generally, are the following (Alt+0000 thro' Alt+0127) always > okay? > ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < > = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ > \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z > { | } ~  > > > > --------------------------- > Dennis McCarthy > nihil_obstat@mindspring.com > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > Stephen Thomas, Senior Systems Analyst, University of Adelaide Library UNIVERSITY OF ADELAIDE SA 5005 AUSTRALIA Tel: +61 8 8303 5190 Fax: +61 8 8303 4369 Email: stephen.thomas@adelaide.edu.au URL: http://staff.library.adelaide.edu.au/~sthomas/ CRICOS Provider Number 00123M From sly at victoria.tc.ca Sat Nov 27 00:34:59 2004 From: sly at victoria.tc.ca (Andrew Sly) Date: Sat Nov 27 00:35:22 2004 Subject: [gutvol-d] 7-bit ASCII, how many characters? In-Reply-To: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net> Message-ID: On Fri, 26 Nov 2004, Dennis McCarthy wrote: > Exactly what characters make up 7-bit ascii? > > Generally, are the following (Alt+0000 thro' Alt+0127) always okay? One thing I might add to this discussion is a matter of semantics. 
The character _`_ does belong to ASCII (96 decimal, 60 hex). However, it is a spacing grave accent mark, not an opening single quote, though you may sometimes see it used that way in text files.

A while ago, I had the address of a web page which explained in detail why ASCII 96 should not be used as an opening single quote, but I can't find it now.

Andrew

From ke at gnu.franken.de Sun Nov 28 06:17:26 2004
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Sun Nov 28 07:39:35 2004
Subject: [gutvol-d] Re: 7-bit ASCII, how many characters?
In-Reply-To: (Andrew Sly's message of "Sat, 27 Nov 2004 00:34:59 -0800 (PST)")
References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net>
Message-ID:

Andrew Sly writes:

> A while ago, I had the address of a web page which explained
> in detail why ASCII-96 should not be used as an opening single
> quote, but I can't find it now.

Search for "Markus" and "Kuhn" and "quote":

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

--
http://www.gnu.franken.de/ke/ | ,__o
                              | _-\_<,
                              | (*)/'(*)
Key fingerprint = F138 B28F B7ED E0AC 1AB4 AA7F C90A 35C3 E9D0 5D1C

From hart at pglaf.org Sun Nov 28 07:43:49 2004
From: hart at pglaf.org (Michael Hart)
Date: Sun Nov 28 07:43:51 2004
Subject: [gutvol-d] 7-bit ASCII, how many characters?
In-Reply-To:
References: <27742996.1101492594159.JavaMail.root@wamui07.slb.atl.earthlink.net>
Message-ID:

On Sat, 27 Nov 2004, Andrew Sly wrote:

> On Fri, 26 Nov 2004, Dennis McCarthy wrote:
>
>> Exactly what characters make up 7-bit ascii?
>>
>> Generally, are the following (Alt+0000 thro' Alt+0127) always okay?
>
> One thing I might add to this discussion is a matter of semantics.
> The character _`_ does belong to ascii (96-decimal, 60-hex)
> However it is a spacing grave accent mark, and not an opening
> single quote, which you may sometimes see it used for in
> text files.
>
> A while ago, I had the address of a web page which explained
> in detail why ASCII-96 should not be used as an opening single
> quote, but I can't find it now.

Since French doesn't really USE the ` with the spacing, other than in cases where we usually would use _`_ or " ' " etc., it is really somewhat of a moot point.

In addition, if this really had been intended to be a French accent grave, why is the "_" between it and the "^", which could be the French accent circonflexe. . . not to mention the lack of an accent aigu, etc. . . .

87 57 W
88 58 X
89 59 Y
90 5A Z
91 5B [
92 5C \
93 5D ]
94 5E ^ <<<
95 5F _
96 60 ` <<<
97 61 a
98 62 b
99 63 c

Michael

From shalesller at writeme.com Sun Nov 28 17:04:10 2004
From: shalesller at writeme.com (D. Starner)
Date: Sun Nov 28 17:04:22 2004
Subject: [gutvol-d] 7-bit ASCII, how many characters?
Message-ID: <20041129010410.E14A14BDAA@ws1-1.us4.outblaze.com>

"Michael Hart" writes:

> On Sat, 27 Nov 2004, Andrew Sly wrote:
> > A while ago, I had the address of a web page which explained
> > in detail why ASCII-96 should not be used as an opening single
> > quote, but I can't find it now.

It's ; basically, the grave and the Latin-1/Unicode acute accent (U+00B4) should be balanced, and are in most modern fonts.

> In addition, if this really had been intended to be a French
> accent grave, why is the "_" between it and the "^", which
> could be the French accent circonflexe. . . not to mention the
> lack of an accent aigu, etc. . . .

There's no particular reason for the sorting, but the ' originally leaned right, which not only made `quotes' look right, but made it possible for it to be used as an acute accent; both were designed to be backspaced over the character. If you used the " as diaeresis and , as cedilla, you had German and all the Romance languages handled. The use of backspace in this manner disappeared after ASCII was standardized, making the ASCII collection a little unusual.
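[Editorial aside: the distinction Andrew and David are making is visible directly in Unicode's character names: ASCII 96 is formally a grave accent, while the typographic single quotes live outside ASCII entirely. A small Python illustration; the function name and the crude one-pass replacement are mine, not from the thread:]

```python
import unicodedata

# ASCII 96 is a spacing grave accent, not an opening quote:
print(unicodedata.name("\u0060"))  # GRAVE ACCENT
# The real typographic single quotes are outside 7-bit ASCII:
print(unicodedata.name("\u2018"))  # LEFT SINGLE QUOTATION MARK
print(unicodedata.name("\u2019"))  # RIGHT SINGLE QUOTATION MARK

def upgrade_grave_quotes(text: str) -> str:
    """Rewrite the old `like this' typewriter convention to real
    curly quotes. Crude: every ` and ' is rewritten, so apostrophes
    get curled too (they map to U+2019 anyway)."""
    return text.replace("`", "\u2018").replace("'", "\u2019")

print(upgrade_grave_quotes("`quotes'"))  # ‘quotes’
```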
--
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm

From jon at noring.name Mon Nov 29 18:32:58 2004
From: jon at noring.name (Jon Noring)
Date: Mon Nov 29 18:33:41 2004
Subject: [gutvol-d] On quote-like marks...
Message-ID: <180519400796.20041129193258@noring.name>

Regarding the recent discussion about ASCII and the single/double quote marks (and what to use), I have my two cents to add (and those here who are much more expert at character sets and Unicode than I am will undoubtedly be able to add to this.)

The situation regarding single and double quote-like marks is even more complicated than it has been presented so far. It has an impact on the future expanded use of PG texts as envisioned by Michael Hart and others, such as text-to-speech and language conversion. So I believe it needs to be dealt with in a more standardized fashion (that is, don't simply use the straight keyboard ' and " for everything under the sun.)

Quote-like marks are used for multiple purposes in texts -- especially single quote-like marks. And then there are the "curly" types of marks used in typographical presentation.

Here's a (probably) partial list of their multiple uses:

1) For marking up quotations (other conventions are also used)
2) Word contractions (e.g., "we're" for "we are")
3) Possessives ("the Emperor's crown")
4) Non-breaking character modifiers (see below)
5) Minutes and seconds of time and arc (50d3'25")
6) Feet and inches unit indicator (She is 5'7" tall)
7) Other mathematical symbol and unit measurement uses

Item (4) is particularly interesting since I'm working on cleaning up Burton's "1001 Arabian Nights Tales", and in it there are many Arabic names where, when Burton converted to Latin script, single quote-like marks were inserted to indicate a type of non-breaking character modifier for pronunciation purposes. For example: Ja'afar.
This semantically differs from the apostrophes used for contractions/possessives -- or at least is semantically different enough (imho) that it warrants differentiation in character encoding/entities.

In the XML markup of the Arabian Nights, I've chosen to use the following Unicode character conventions to keep everything straight. It's not what I necessarily propose PG/DP do, but it indicates one possible approach. Since at present I do not enclose quotations in <q> elements (for example), I keep in the quotation marks (double and single) to identify quotations. In the Arabian Nights I find some odd quotation passages, a couple of which start in the middle of one paragraph and end in the middle of another paragraph later within a story, so adding <q>...</q> would result in non-well-formed XML (I could use the "mile marker" approach as defined in TEI, but for the Arabian Nights have chosen not to.)

1) For quotations using double quote marks, I use the Unicode
   left-double quote mark for the beginning, and the right-double
   quote mark for the ending: “ and ”, respectively.

   (The "curly" quotes -- for those who don't like curly quotes for
   reading, it is trivial to convert them to straight keyboard quote
   marks, but going the other way is more difficult to do.)

2) For quotations using single quote marks, I use the Unicode
   left-single quote mark for the beginning, and the right-single
   quote mark for the ending: ‘ and ’, respectively.

3) For the non-breaking character modifier as described above, e.g.
   for "Ja'afar", I use the Unicode character specific for this
   purpose: ʼ

4) For word contractions and possessives I use the ordinary lower-
   ASCII single straight quote mark: ' (For later presentational
   purposes this character can always be converted to the right-single
   "curly" quote mark.)
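[Editorial aside: Jon's point that curly-to-straight conversion is trivial while the reverse is not can be sketched as a one-way translation table. The table contents and function name are mine; the character choices follow the conventions he describes:]

```python
# One-way table: Unicode quote-like marks down to the closest 7-bit
# ASCII character. Going the other way (straight back to curly) would
# need context, which is why the master file keeps the distinctions.
TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe (the Ja'afar case)
})

def downgrade_quotes(text: str) -> str:
    """Produce a plain-ASCII rendering of the quote-like marks."""
    return text.translate(TO_ASCII)

assert downgrade_quotes("\u201cJa\u02bcafar\u2019s lamp\u201d") == "\"Ja'afar's lamp\""
```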
For use of ' and " for minutes and seconds of arc, and feet/inches, there are special Unicode code points for these (I don't see this usage in the Arabian Nights footnotes, but maybe I'll encounter it somewhere, not having finished the 5000+ footnotes.)

If one is working with plain text files (not XML), the above Unicode characters can be encoded at the bit level using UTF-8 or UTF-16 encoding.

Jon Noring

From jeroen at bohol.ph Tue Nov 30 14:05:43 2004
From: jeroen at bohol.ph (Jeroen Hellingman)
Date: Tue Nov 30 14:04:42 2004
Subject: [gutvol-d] On quote-like marks...
In-Reply-To: <180519400796.20041129193258@noring.name>
References: <180519400796.20041129193258@noring.name>
Message-ID: <41ACEEB7.8080403@bohol.ph>

When I prepare TEI versions of my texts, I normally use the following:

“ ‘ ” ’ for quotation marks (in English; LOTE has even more variants)
' for the apostrophe, including those used in the possessive, as they are the same.
′ ″ for minutes and seconds (and even ‴ for triple primes)

For works using Arabic, I also use &ayn;, etc., to represent those Arabic letters, if they are thus represented in Roman script. I map these entities to Unicode for HTML versions, and to the nearest ASCII equivalents in plain vanilla. I avoid <q>...</q>, for the reasons you mention.

If you need help with validation / transforms, etc., drop me a note...

Jeroen.
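[Editorial aside: Jeroen's "entity to Unicode for HTML, nearest ASCII for plain vanilla" step can be sketched as a two-column table. The entity names follow his message, but the specific Unicode and ASCII choices here, particularly for &ayn; and &tprime;, are my assumptions, not his actual toolchain:]

```python
# Entity -> (Unicode form for HTML output, nearest ASCII for plain
# vanilla text). A sketch of the mapping step only.
ENTITY_FORMS = {
    "ldquo":  ("\u201c", '"'),    # left double quote
    "rdquo":  ("\u201d", '"'),    # right double quote
    "lsquo":  ("\u2018", "'"),    # left single quote
    "rsquo":  ("\u2019", "'"),    # right single quote
    "prime":  ("\u2032", "'"),    # minutes
    "Prime":  ("\u2033", '"'),    # seconds
    "tprime": ("\u2034", "'''"),  # triple prime
    "ayn":    ("\u02bf", "'"),    # romanized Arabic 'ayn (assumed)
}

def expand_entities(text: str, ascii_only: bool = False) -> str:
    """Replace &name; references with one of the two target forms."""
    column = 1 if ascii_only else 0
    for name, forms in ENTITY_FORMS.items():
        text = text.replace("&%s;" % name, forms[column])
    return text

assert expand_entities("&ldquo;Ja&ayn;far&rdquo;", ascii_only=True) == '"Ja\'far"'
```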
From stuart at ww1aviationlinks.cjb.net Tue Nov 30 17:29:08 2004
From: stuart at ww1aviationlinks.cjb.net (stuart)
Date: Tue Nov 30 17:29:22 2004
Subject: [gutvol-d] request for input (first timer here)
Message-ID: <20041130172908.237d8b64.stuart@ww1aviationlinks.cjb.net>

I am starting on a project to convert Jane's All The World's Aircraft 1919 to an ebook; any suggestions are welcome. If interested, take a look at http://ww1aviationlinks.cjb.net/janes/index.html