From Bowerbird at aol.com Thu Jan 5 13:42:08 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jan 5 13:42:27 2006 Subject: [gutvol-d] 36,000 and counting Message-ID: <82.35f5d394.30eeecb0@aol.com> distributed proofreaders just hit 36,000 registered users. congratulations! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060105/35e97f53/attachment.html From charlzzf at heritagewifi.com Thu Jan 5 18:39:25 2006 From: charlzzf at heritagewifi.com (Charles Franks) Date: Thu Jan 5 18:47:23 2006 Subject: [gutvol-d] 36,000 and counting In-Reply-To: <82.35f5d394.30eeecb0@aol.com> Message-ID: Actually, due to a previous programmer removing inactive usernames from the database the count is much higher...37,437 at the moment. The way to figure out the correct number is to go to the forums and at the bottom of the main page and hover over the "The newest registered user is" link and look for the &u=(some number). That number will be the 'correct' number of users. Apparently their code for the "We have 36001 registered users" line actually counts lines in the user table versus looking at the highest userid number in use. Thanks though! Charles Franks Founder, Distributed Proofreaders -----Original Message----- From: gutvol-d-bounces@lists.pglaf.org [mailto:gutvol-d-bounces@lists.pglaf.org]On Behalf Of Bowerbird@aol.com Sent: Thursday, January 05, 2006 2:42 PM To: gutvol-d@lists.pglaf.org; Bowerbird@aol.com Subject: [gutvol-d] 36,000 and counting distributed proofreaders just hit 36,000 registered users. congratulations! -bowerbird From Bowerbird at aol.com Thu Jan 5 19:32:34 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Thu Jan 5 19:32:55 2006 Subject: [gutvol-d] 36,000 and counting Message-ID: <1c5.37e058be.30ef3ed2@aol.com> charles said: > Thanks though! no, thank _you_. 
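Charles's explanation above -- the forum's "registered users" line counts rows in the user table, while the &u= number on the "newest registered user" link reflects the highest user id ever assigned -- can be sketched in a few lines of SQL. This is only an illustration under assumed table and column names (the actual forum schema is not shown in the thread):

```python
# Sketch of the user-count discrepancy: deleting inactive accounts
# shrinks the row count, but the highest user id keeps every account
# ever registered. Table/column names here are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " username TEXT)"
)

# Register ten users, then purge three "inactive" accounts.
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [(f"user{i}",) for i in range(1, 11)],
)
conn.execute("DELETE FROM users WHERE user_id IN (2, 5, 9)")

# What the "We have N registered users" line computes:
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
# What the &u= number on the newest-user link reflects:
highest_id = conn.execute("SELECT MAX(user_id) FROM users").fetchone()[0]

print(row_count, highest_id)  # 7 10
```

With AUTOINCREMENT, SQLite never reuses a deleted id, so the two numbers diverge exactly as described: seven rows remain, but the newest user is still number ten.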
it is a remarkable achievement to have motivated so many to enlist in the cause, with a good number (in the hundreds!) devoting _significant_ time and energy -- 10-40 hours a week! -- from busy lives... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060105/83a49dd9/attachment.html From jon at noring.name Thu Jan 5 20:10:32 2006 From: jon at noring.name (Jon Noring) Date: Thu Jan 5 20:10:53 2006 Subject: [gutvol-d] 36,000 and counting In-Reply-To: References: <82.35f5d394.30eeecb0@aol.com> Message-ID: <612299901.20060105211032@noring.name> Charles wrote: > Bowerbird wrote: >> distributed proofreaders just hit 36,000 registered users. > Actually, due to a previous programmer removing inactive usernames from the > database the count is much higher...37,437 at the moment. The way to figure > out the correct number is to go to the forums and at the bottom of the main > page and hover over the "The newest registered user is" link and look for > the &u=(some number). That number will be the 'correct' number of users. > > Apparently their code for the "We have 36001 registered users" line actually > counts lines in the user table versus looking at the highest userid number > in use. If the number of users was 1/10 of that, it would be a remarkable achievement. But a number rapidly approaching 40,000 is hard to fathom. A remarkable achievement! Kudos to Charles for founding DP, and for Juliet and many of the others who keep the system going. And of course a lot of praise to all the ordinary folk who, page-by-page, proof the scan sets. (I need to revisit DP and do a few pages myself.) Maybe it's time to hold an annual DP picnic. Considering the number of people, it probably needs to be a potluck. Can you imagine having to buy and prepare 40,000 hot dogs? 
Jon From tstowell at chattanooga.net Sat Jan 7 13:49:58 2006 From: tstowell at chattanooga.net (Tim Stowell) Date: Sat Jan 7 16:59:29 2006 Subject: [gutvol-d] 36,000 and counting In-Reply-To: <612299901.20060105211032@noring.name> References: <82.35f5d394.30eeecb0@aol.com> Message-ID: <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> At 09:10 PM 1/5/06 -0700, Jon wrote: >Charles wrote: >A remarkable achievement! Kudos to Charles for founding DP, and for >Juliet and many of the others who keep the system going. And of course >a lot of praise to all the ordinary folk who, page-by-page, proof the >scan sets. (I need to revisit DP and do a few pages myself.) > >Maybe it's time to hold an annual DP picnic. Considering the number of >people, it probably needs to be a potluck. Can you imagine having to >buy and prepare 40,000 hot dogs? > >Jon What is DP? Tim no hot dogs thanks From ajhaines at shaw.ca Sat Jan 7 17:28:14 2006 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sat Jan 7 17:28:25 2006 Subject: [gutvol-d] 36,000 and counting References: <82.35f5d394.30eeecb0@aol.com> <3.0.5.32.20060107164958.0254d100@mail.chattanooga.net> Message-ID: <000301c613f2$cbe0ab70$6401a8c0@ahainesp2600> Distributed Proofreaders - http://www.pgdp.net/c/default.php ----- Original Message ----- From: "Tim Stowell" To: "Project Gutenberg Volunteer Discussion" Sent: Saturday, January 07, 2006 1:49 PM Subject: Re: [gutvol-d] 36,000 and counting > At 09:10 PM 1/5/06 -0700, Jon wrote: >>Charles wrote: >>A remarkable achievement! Kudos to Charles for founding DP, and for >>Juliet and many of the others who keep the system going. And of course >>a lot of praise to all the ordinary folk who, page-by-page, proof the >>scan sets. (I need to revisit DP and do a few pages myself.) >> >>Maybe it's time to hold an annual DP picnic. Considering the number of >>people, it probably needs to be a potluck. Can you imagine having to >>buy and prepare 40,000 hot dogs? >> >>Jon > > What is DP? 
> > Tim > no hot dogs thanks > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jon at noring.name Mon Jan 9 12:38:20 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 12:45:19 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov Message-ID: <1675677963.20060109133820@noring.name> Everyone, In doing some research on ebook history, which naturally will prominently include Project Gutenberg because of the obvious impact PG and Michael Hart has had on etexts and ebooks, I'm trying to reconstruct a fact database which includes the who, what, when, where, why and how of the various seminal events. So, I've just created a specialized Yahoo Group to collect/archive the snippets of facts that come up: http://groups.yahoo.com/group/ebook-history/ You're welcome to join and post any information you know on ebook history. Especially wanted is the pre-1990 period: commercial, academic, and public (free). We should collect this information before it is lost to the mists of time. ***** Anyway, in doing research on what is found in the early Google Groups archive on PG, Bowerbird recommended that I dig through textfiles.com, which appears to have archived a large number of ASCII texts that existed on various BBS systems of the 1980's and early 1990's (its coverage/completeness is unknown, however.) So, focusing on the first "modern" classic book that PG issued, "Alice's Adventures in Wonderland" (text #11, officially released 01 Jan 1991), I dug through textfiles.com to see what they had. The oldest copy I found there was "alice11.txt", Millenium Fulcrum Edition 1.1, dated 1990 (by copyright claim.) See: http://www.textfiles.com/etext/FICTION/alice11.txt That was issued in the very early era when PG was still affiliated with Duncan Research. 
As an aside, it is interesting to note the huge differences in that early text's boilerplate compared with the present boilerplate. PG has evolved quite a bit in handling the legal aspects (particularly copyright) of its texts, which is to be fully expected. So this text is a nice historical reminder of how far PG has come in the last 16 years. I was hoping, though, to find a much older text version of "Alice". Bowerbird stated his belief that Michael Hart keypunched Alice a lot earlier than 1990/91, but so far I have not found that version, assuming it was distributed and ended up in some BBS or online archive. So, Michael, if you're reading this, did you keypunch "Alice" well before 1990, where did you distribute it, and does a copy exist somewhere? That would truly be a historical work, especially if it was "digiscribed" in the 1970's or early 80's (BBS systems began maturing in the mid- to late-1980's.) A question for others reading this: where else should I search for information on digitized books placed on BBS in the 1980's? Were there others besides Michael who "digiscribed" public domain books and texts in the 1980's and placed them online? (I plan to dig through more of the textfiles area to see what book texts are dated in the 1980's, if any, and who did them.) ***** Another interesting thing I discovered in my research -- and which some of you undoubtedly know about -- is the "Brown Corpus": http://en.wikipedia.org/wiki/Brown_Corpus In the late-1960's, the partial/full texts of a variety of 500 works published in 1961 were keypunched for computer use (a maximum of 2000 words for each work), totalling a little over 1,000,000 words. The purpose was solely for lexicostatistics and not for direct reading. For this purpose the Brown Corpus is quite famous (enough to rate its own wikipedia article.) 
Only a few years later, in 1971 Michael Hart keypunched into a computer "The Declaration of Independence" for the purpose of electronic "distribution" and direct reading by others, so Michael is, as far as is now known, the first person documented to experiment with electronic distribution of readable, published digital texts. (I plan to contact the Brown Corpus people, if any are still alive, to see if there were experiments at Brown, or elsewhere, on this in the 1960's.) But nevertheless, to see major portions of published texts and books being keypunched and processed by computers in the 1960's is truly remarkable. ***** Another really cool thing, I found a Usenet message from 1987 which, in turn, is a fairly comprehensive description of an Atlantic Monthly article written by Vannevar Bush in July 1945, entitled "As We May Think". It is beyond amazing the insights Vannevar Bush had relevant to ebooks, to elibraries (like PG's) and the role of individuals and volunteers. Again, some of you have probably read Vannevar Bush's article, but for those who haven't... Usenet summary: http://groups.google.com/group/comp.sys.mac.hypercard/msg/660f72a6e3b5f7a2?hl=en& And the actual article is reproduced here: http://www.theatlantic.com/doc/194507/bush ***** Finally, Mark Bernstein, the founder of Eastgate, which in 1987 issued several contemporary hypertext fiction ebooks on floppy disk and CD-ROM, mentioned to me Asimov's mid-50's book "Foundation" where ebooks are implicit. Has anyone read this book and can comment on Asimov's 1950's vision for ebooks? Thanks! Jon Noring From sly at victoria.tc.ca Mon Jan 9 13:38:01 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Mon Jan 9 13:38:33 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: <1675677963.20060109133820@noring.name> References: <1675677963.20060109133820@noring.name> Message-ID: Thanks for providing the links, fascinating reading. 
In answer to one of your questions, the website:

http://www.aston.ac.uk/lss/english/02_msc/02_diss/mward.jsp

mentions using alice10.txt, as well as a few other early PG texts, from the Walnut Creek CD ROM.

Any chance of finding one of those still floating around somewhere?

Andrew

On Mon, 9 Jan 2006, Jon Noring wrote:

> Everyone,
>
> In doing some research on ebook history, which naturally will
> prominently include Project Gutenberg because of the obvious impact PG
> and Michael Hart has had on etexts and ebooks, I'm trying to
> reconstruct a fact database which includes the who, what, when, where,
> why and how of the various seminal events.
> [snip]

From joshua at hutchinson.net Mon Jan 9 13:56:17 2006
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Mon Jan 9 13:56:47 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar
Message-ID: <20060109215617.70460EE901@ws6-1.us4.outblaze.com>

----- Original Message -----
From: "Jon Noring"

> Finally, Mark Bernstein, the founder of Eastgate, which in 1987 issued
> several contemporary hypertext fiction ebooks on floppy disk and
> CD-ROM, mentioned to me Asimov's mid-50's book "Foundation" where
> ebooks are implicit. Has anyone read this book and can comment on
> Asimov's 1950's vision for ebooks?
>
> Thanks!
>
> Jon Noring

Asimov mentions "film-books" in many of his works, but it is unclear whether they are film strips of text running through a special reader, some kind of interactive medium, or perhaps a "moving pictures" version of books. I think he leaves it deliberately vague. NOTE: I'm pulling this from memory, as I don't have any of my Asimov books here at work to flip through.
Josh From jon at noring.name Mon Jan 9 14:11:34 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 14:12:10 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: References: <1675677963.20060109133820@noring.name> Message-ID: <6132705.20060109151134@noring.name> Andrew wrote: > Thanks for providing the links, fascinating reading. You're welcome. I found my quickie search to yield fascinating stuff. Unfortunately, I have little time these days to pursue the level of research I'd like to (which involves talking to a lot of the old-timers by phone, digging through obscure archives, even doing some library research to get paper copies of old articles and books that I can't get online.) > In answer to one of your questions, > > The website: > http://www.aston.ac.uk/lss/english/02_msc/02_diss/mward.jsp > > Mentions using alice10.txt, as well as a few other early > PG texts, from the the Walnut Creek CD ROM. > > Any chance of finding one of those still floating around > somewhere? Interesting. It is unknown whether this is the original one (version 1.0) which appeared before the version 1.1 edition linked in my prior message. This might have been version 10.0, for example (Alice is currently at version 30). Until we find it, we won't know for sure. (The date of the WC CD-ROM is 1997, and I would surmise they would have kept up with the latest PG texts, but then maybe not.) The ultimate authority on whether Michael Hart keyed-in and/or released "Alice" well before 1990 is Michael himself. Hopefully he will reply and clarify matters. Better yet, to point out which BBS had archived it, so we can try to locate a copy with a timestamp. 
Jon From Bowerbird at aol.com Mon Jan 9 14:35:52 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 9 14:36:28 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov Message-ID: <2d0.161067f.30f43f48@aol.com> andrew said: > a few other early PG texts, from the Walnut Creek CD ROM. > Any chance of finding one of those still floating around somewhere? i did some searching for a walnut-creek c.d. a long time back, but got nowhere. but maybe an ebay expert could help you... however, mr. noring already has solid evidence that predates that cd-rom. what he's looking for is even _earlier_ evidence... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060109/a927b75a/attachment.html From Bowerbird at aol.com Mon Jan 9 14:46:44 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 9 14:47:29 2006 Subject: [gutvol-d] my impression Message-ID: <1fc.103cdcc2.30f441d4@aol.com> my impression is that mr. noring would like to knock michael hart down a few notches, and that's why he's doing his "historical research"... still, those of us who were promoting e-books back in the '80s know who was leading the pack. it was michael hart. and not only was he not getting any _credit_ then, he was actually derided as something of a kook... which he _is_, of course, but the kooks are often the people who end up transforming our world... :+) so to try and strip him of his credit here, now that we finally have come to realize his genius, well... it is downright cruel. small-minded and cruel... with due respect to the dreamers who came before and handed to us the _idea_ of electronic-books, including alan kaye, h.g. wells, and douglas adams, there is no question who _invented_ the e-book, by virtue of sitting down and actually entering one: it's michael hart... 
as one of the greatest inventors who ever lived said, only 1% is inspiration, the other 99% is perspiration. plus michael hart gave us something even better -- the concept of "unlimited distribution" of e-books... compared to the _commercial_ e-book efforts, which somehow noring wants on equal footing, look how many more riches _that_ idea gave us. -bowerbird p.s. along these lines, consider this from alan kaye: > "We're running on fumes technologically today," > he says. "The sad truth is that 20 years or so of > commercialization have almost completely missed > the point of what personal computing is about." -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060109/035b7817/attachment.html From gbnewby at pglaf.org Mon Jan 9 15:38:15 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Jan 9 15:38:17 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: <1675677963.20060109133820@noring.name> References: <1675677963.20060109133820@noring.name> Message-ID: <20060109233815.GB21426@pglaf.org> > ... > I was hoping, though, to find a much older text version of "Alice". > Bowerbird stated his belief that Michael Hart keypunched Alice a lot > earlier than 1990/91, but so far I have not found that version, > assuming it was distributed and ended up in some BBS or online > archive. I got my first copy of Alice via email in about 1988, or possibly a little earlier (it was no later than June of 1988, because I got it while still at SUNY Albany). Unfortunately the file I got is lost, among some other old emails on 9-track tapes that didn't make one of my institutional transitions. It was the same (if earlier) Millenium Fulcrum edition that was in the PG collection, though at that time I had never heard of PG. (When I arrived at UIUC in 1991, I read a local newspaper article about Michael & PG and looked him up. 
The rest, as they say, is history!). -- Greg From jon at noring.name Mon Jan 9 15:48:35 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 15:49:15 2006 Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov In-Reply-To: <20060109233815.GB21426@pglaf.org> References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> Message-ID: <80308709.20060109164835@noring.name> Greg wrote: >> I was hoping, though, to find a much older text version of "Alice". >> Bowerbird stated his belief that Michael Hart keypunched Alice a lot >> earlier than 1990/91, but so far I have not found that version, >> assuming it was distributed and ended up in some BBS or online >> archive. > I got my first copy of Alice via email in about 1988, or possibly > a little earlier (it was no later than June of 1988, because > I got it while still at SUNY Albany). Thanks! This is useful information. I'll re-search in Google and see if someone archived this older version. The version 1.1 I found is dated (copyrighted) 1990, so I assume the version you had may have been the original version 1.0. You might ask Michael to think back when he first placed Alice online. It's a very historical fact. Besides the KJV of the Bible, did Michael post to the Internet before 1990 any other classic works? (I'm not referring to the nine political documents which form PG text #1-9, but more recognized book works like "Alice".) > Unfortunately the file I got is lost, among some other old > emails on 9-track tapes that didn't make one of my institutional > transitions. Understood. I saved very little from the late 1980's when I first got online (Usenet and various BBS). > It was the same (if earlier) Millenium Fulcrum edition that > was in the PG collection, though at that time I had never > heard of PG. Was the early GUTNBERG mailing list archived as well? When was that list formed? The first mention I see of GUTNBERG is January 1990. 
I don't believe Google has it archived, unfortunately. That would be a treasure trove of information on the early days of PG, and probably will mention, in passing, things that happened before GUTNBERG was started.

Thanks again.

Jon

From gbnewby at pglaf.org Mon Jan 9 16:22:56 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Mon Jan 9 16:22:56 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <80308709.20060109164835@noring.name>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name>
Message-ID: <20060110002256.GA27181@pglaf.org>

On Mon, Jan 09, 2006 at 04:48:35PM -0700, Jon Noring wrote:
> Was the early GUTNBERG mailing list archived as well? When was that
> list formed? The first mention I see of GUTNBERG is January 1990. I
> don't believe Google has it archived, unfortunately. That would be a
> treasure trove of information on the early days of PG, and probably
> will mention, in passing, things that happened before GUTNBERG was
> started.

I don't think it was.

I believe our first automated list was on a LISTSERV run at uiuc (listserv.cso.uiuc.edu). That was from about 1992 or so through 1997 or early 1998. Then we moved to UNC, which used similar software: listserv.unc.edu. I do have the archives from that period, somewhere. It was only in 2003 or so that we moved to mailman on lists.pglaf.org.

The number & makeup of lists changed over that time. Originally, the main (or only) purpose was for a monthly newsletter or similar, plus announcements of new titles.

I am not aware of any archives we got from the UIUC LISTSERV lists. I'm sure some folks have their own personal copies, though.
-- Greg From jon at noring.name Mon Jan 9 16:47:51 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 9 16:48:30 2006 Subject: [gutvol-d] my impression In-Reply-To: <1fc.103cdcc2.30f441d4@aol.com> References: <1fc.103cdcc2.30f441d4@aol.com> Message-ID: <237975147.20060109174751@noring.name> Bowerbird wrote: > my impression is that mr. noring would like to > knock michael hart down a few notches, and > that's why he's doing his "historical research"... The important thing is to gather the facts as best as can be done from primary sources, and the recollections of those who lived then. Let the facts speak for themselves. I welcome, and want, Michael Hart, for example, to provide specific information of the 1971 to 1989-90 period, such as what texts he keyed in and distributed on various networks (the fledgling Internet and BBS), besides the nine "political" ones which form PG texts #1-9 (like the Declaration of Independence -- not that I have anything against these texts, but they are not book length works.) I hope that a few PGers will take an interest in this evolving project and submit their tidbits of ebook lore and history (and hopefully references) to the ebook-history group. Bowerbird, you are welcome to add your own knowledge of the old-days. Anyway, if I'm guilty of anything, it's that I want to get to the bottom of the truth, so everything can be put into its proper and correct perspective, whatever that may turn out to be. > still, those of us who were promoting e-books > back in the '80s know who was leading the pack. > it was michael hart. What I'm still having trouble finding is any pre-1989 references to Michael Hart in relation to his text activities. On Usenet, it is a total blank (according to Google groups, which has archived Usenet and Bitnet back before 1985.) 
The first mention I've found of "Project Gutenberg" on Usenet is a message posted 15 Jan 1990, which contains a few messages written by Michael Hart on 20 Dec 1989 talking about Project Gutenberg, and that it is #1 in a series: http://groups.google.com/group/soc.culture.esperanto/msg/5e7ebc23a2866f1b?dmode=source&hl=en People like Bowerbird who were plying the BBS back in the late 80's may vaguely remember some things, but I'd like to know specifics. Were there newspaper and magazine articles covering MH? Were any of his texts, besides the short political ones that form PG texts #1-9, being distributed around BBS and ftp sites in the late 1980's? I simply see a dearth of information. Even Michael Hart's various bios, including his wikipedia, are fuzzy on what he did between 1971 and 1988/9 other than to say that technology was not quite there to do anything major, including with volunteers. When we see the first mention of PG in 1990, PG rapidly grew after that, and the rest is history. I can only surmise that even if Michael were thinking about launching PG (in a "let's get thousands of volunteers to type in books" way) in the mid-80's, he did not do so -- he waited for technology to hit a critical point. This is understandable. Paraphrasing Ecclesiastes: "there's a time and place for everything under the sun" -- and at least from the evidence I've seen, he really didn't go gangbusters on the volunteer-driven Project Gutenberg vision until very late in 1989. Yet I know there were commercial ebook projects in the 1986 time frame, and I think even before. A Turkish scholar digitized the complete works of a famous Turkish poet in 1986. Lots and lots of stuff, but nothing about Michael Hart in the same mid- to late- 80's time frame. Help me, people, let me know what's missing... Would love to get the original GUTNBERG Bitnet list archive, if it exists before 1990. That should be placed online (if not already). 
> so to try and strip him of his credit here, now that
> we finally have come to realize his genius, well...
> it is downright cruel. small-minded and cruel...

So you are saying the history of ebooks should not be studied in full because it might clarify everyone's actual role in the history of the ebook?

As I noted, I welcome *everyone* to submit ebook lore tidbits, especially for the pre-1990 period. I'm not really planning to write the actual history, but to help collect the bits and pieces for either a historian to write, or a *group* of us to write for the wikipedia. The truth is out there, as Mulder would say. What really happened is what should be documented. No more, no less.

> with due respect to the dreamers who came before
> and handed to us the _idea_ of electronic-books,
> including alan kaye, h.g. wells, and douglas adams,
> there is no question who _invented_ the e-book,
> by virtue of sitting down and actually entering one:
> it's michael hart...

He may have been the first to experiment with placing a short text on a computer for the purpose that others may electronically access it for reading. But I'm not so sure of that. With Brown University keypunching in over 1,000,000 words in the mid-1960's, plus the various types of pioneering electronic text research they and others were doing at the time, someone might have experimented with this, and possibly even written about it in journal articles. That it is not reported may be due to them not seeing the importance of it right away (it was the 1960's -- even by 1971 visualization hardware on computers had greatly improved -- my wife worked in the early computer hardware days in an image processing lab.) When I worked as a research associate at the University of Minnesota, I investigated a lot of odd things that never got published, but on rare occasion have shared them 20 years after the fact with researchers who *were* interested.
I believe Michael Hart when he says he did what he did back in 1971. But did he publish it? When was that information first made "public"? And this is not proof that he was the first to experiment with it. The only thing we are sure of is that today, no one else has stepped forward to claim having done it earlier. That's why I'm hoping to talk to those involved with the Brown research in the 1960's and 70's since they probably know a lot of interesting tidbits of who did what around the world in the late 60's and early 70's regarding electronic text research and ideas related to what we today define to be an "ebook". ***** It is interesting that in my search, the first use of the phrase "Project Gutenberg" as found in Google Groups was in 1987 by the Atari Corporation! I just talked with the guy, Art Morgan, who led the project at Atari at the time using that project name. He wrote, in part, "...I used "Project Gutenberg" as the internal code name for the Atari SLM804 Laser Printer. We (Atari) never trademarked the name, and used it for briefings to user groups and the press shortly before the product launch of the Atari desktop publishing system. Since we didn't commercially use the name, Mr. Hart probably had no knowledge that Atari was using it. That is, unless he owned an Atari ST and was plugged into the Atari user community at the time." Mr. Morgan also seemed to imply that he never even heard of Michael Hart's Project Gutenberg even up to today (have another email in with him to clarify that. Definitely it does not seem like there was any communication between Atari and Michael at that time.) > plus michael hart gave us something even better -- > the concept of "unlimited distribution" of e-books... The biggest contribution Michael Hart gave is that he promoted with a zealousness unmatched by anyone else the need to digitize public domain books and distribute them for free to everyone and anyone, and organizing volunteers to help make this a reality. 
This is his legacy and place in the ebook universe, and what a wonderful legacy it is. I currently believe that in public discourse Barry Shein and his "KiloMonkeys" (later OBI) proposal (from Sept 1989) beat MH to the punch in the public airing of the idea which includes volunteerism (subject to change as new evidence surfaces.) But Michael Hart made it happen. (To be fair to Barry, he was diverted in running the world.std.com ISP, while Michael threw himself full-time into PG, which is necessary to run a network of volunteers, so that's why the Online Book Initiative never gained the same traction as PG did in the early 1990's. Michael probably has more interesting info to share about Barry Shein and his KiloMonkeys proposal in 1989. Maybe Michael did publicly propose the PG idea earlier than Barry Shein's "KiloMonkeys", but I've not found any mention in the Google database, nor in any of the bios on Michael.) > compared to the _commercial_ e-book efforts, > which somehow noring wants on equal footing, > look how many more riches _that_ idea gave us. All aspects of digital publications, both copyrighted and public domain, are important when gathering the history of ebooks and digital publications. It's a complex, multi-faceted area with many players. I do believe when the final history is written, it will be very much like the automobile in complexity, seminal events and individuals. Jon Noring (p.s., doing a quick check on Google looking for the archive of the GUTNBERG list, bit.listserv.gutnberg -- Google groups has 717 messages for this group, and the oldest, a cross-post to the rec.arts.books, dated 17 July 1990, is a request for an online copy of the "Taming of the Shrew". So, again, I somehow believe this group, intended for use by the volunteers, was not around before 1990 or so. But let me know if it was!) 
From jon at noring.name Mon Jan 9 16:52:29 2006
From: jon at noring.name (Jon Noring)
Date: Mon Jan 9 16:53:07 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <20060110002256.GA27181@pglaf.org>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name> <20060110002256.GA27181@pglaf.org>
Message-ID: <369937765.20060109175229@noring.name>

Greg wrote:
> Jon Noring wrote:
>> Was the early GUTNBERG mailing list archived as well? When was that
>> list formed? The first mention I see of GUTNBERG is January 1990. I
>> don't believe Google has it archived, unfortunately. That would be a
>> treasure trove of information on the early days of PG, and probably
>> will mention, in passing, things that happened before GUTNBERG was
>> started.
>
> I don't think it was.
>
> I believe our first automated list was on a LISTSERV run
> at uiuc (listserv.cso.uiuc.edu). That was from about
> 1992 or so through 1997 or early 1998.

Oh well. As noted in another message I just posted, a quick Google Groups search shows that it has archived 717 messages for the group bit.listserv.gutnberg. The oldest is from July 1990, and the rest start in 1991. I'm sure the early GUTNBERG list would be fascinating to follow, and would itself contain historical information. Do you know when the GUTNBERG list was actually started?

> I am not aware of any archives we got from the UIUC
> LISTSERV lists. I'm sure some folks have their own
> personal copies, though.

Well, hopefully someone saved the early GUTNBERG archive. If so, be sure to let Greg know.
Jon

From greg at durendal.org Mon Jan 9 16:34:22 2006
From: greg at durendal.org (Greg Weeks)
Date: Mon Jan 9 17:00:38 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID:

On Mon, 9 Jan 2006, Andrew Sly wrote:
> Mentions using alice10.txt, as well as a few other early
> PG texts, from the Walnut Creek CD ROM.

I believe I have a copy of this, but I'm not sure what vintage. I'll have to look.

-- Greg Weeks http://durendal.org:8080/greg/

From jon at noring.name Mon Jan 9 19:05:41 2006
From: jon at noring.name (Jon Noring)
Date: Mon Jan 9 19:06:22 2006
Subject: [gutvol-d] Further comments by the Atari "Project Gutenberg" team leader
Message-ID: <11610376783.20060109200541@noring.name>

Everyone,

In a prior message this afternoon I noted that the first mention of the phrase "Project Gutenberg" in Google Groups appeared in 1987, and had nothing to do with the PG we know. It was used internally by the Atari Corporation to describe a new laser printer.

I asked Art Morgan, who headed up that project, and whose name is associated with the 1987 message, to clarify what he wrote, and why he chose the name "Project Gutenberg", which I know everyone here will relate to. With his permission, he said:

"Yes, I personally came up with the name since I was team lead on the SLM804 project. We didn't assign the product a model number until fairly late in its development. I had to give a "sneak peek" talk on it, so I dubbed it Project Gutenberg. In 1987, Apple had the only true desktop publishing system around, way before HP started selling laser printers for PCs and commoditized them. Unfortunately, Apple's laser printer was a computer in itself, and they had to charge a premium for this redundancy.

"Atari's CEO, Jack Tramiel, gave us the edict to create a laser printer "for the masses, not the classes".
I came up with the idea to have the host system perform all the RIP (raster image processing) functions of the printer, and just "pump" the final bitmap image of the page to the printer. This would require only a "dumb" laser printer engine to get the job done. The Atari ST was the perfect printer host - it was based on the Motorola 68000 and had gobs of memory - exactly the platform found in Apple's laser printers.

"It was easy to talk to Adobe to port their code to the ST since it was developed on the same 68000 platform. But, to lower the costs further, we went with a PostScript clone from Imagen, and a printer engine from TEC (not Canon). Anyway, we should have patented the whole lot because NeXT later used the host-based laser printer idea for their system, and now Dell offers one for PCs.

"Why did I pick Gutenberg? Gutenberg's moveable type technology enabled the printing of low-cost bibles and brought the word of God to everyone, just as Atari's unique RIP technology brought desktop publishing to the masses ..."

(In a followup reply, Art mentioned:)

"Sorry to admit it, but I didn't know about Michael Hart until you told me about him. He sounds like quite a visionary fellow - like Ted Nelson or Alan Kay.

"Feel free to use my text - I'm honored and flattered! Take care!"

For those interested...

Jon

From tb at baechler.net Mon Jan 9 23:58:31 2006
From: tb at baechler.net (Tony Baechler)
Date: Tue Jan 10 00:04:41 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID: <7.0.1.0.2.20060109235503.037acc60@baechler.net>

Hello. I have the following. Write me off list if interested. I think as many of the early files as possible should be saved. I don't have alice10 but it might be floating around online somewhere.
etext90: ALL11.ZIP ALL7011.ZIP BILL11.ZIP CONST11.ZIP GETTY11.ZIP JFK11.ZIP KJV10.ZIP LIBER11.ZIP LINC111.ZIP LINC211.ZIP MAYFL11.ZIP WHEN11.ZIP

etext91: AESOP10.ZIP AESOP11.ZIP ALICE30.ZIP feder16.zip HISONG12.ZIP LGLASS18.ZIP lglass19.zip moby.zip MOBYNO.ZIP PETER15A.ZIP PETER16.ZIP PLBOSS10.ZIP ROGET13.ZIP ROGET13A.ZIP roget14.zip ROGET14A.ZIP roget15a.zip SNARK12.ZIP WORLD12.ZIP

The plboss10.zip is particularly interesting because it contains a brief note from Judy Boss but none of the usual PG headers. As far as the standard PG headers go, this seems to be the oldest. It came from my Walnut Creek CD-ROM. I am not sure about the other titles, but I used to have an older copy of Alice around somewhere.

From tb at baechler.net Tue Jan 10 00:07:36 2006
From: tb at baechler.net (Tony Baechler)
Date: Tue Jan 10 00:07:05 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <1675677963.20060109133820@noring.name>
References: <1675677963.20060109133820@noring.name>
Message-ID: <7.0.1.0.2.20060110000514.037d8eb0@baechler.net>

Hello. One other source of old books is OBI, or the Online Book Initiative. PG borrowed some books from them, including Moby. Also there was Wiretap. OBI used to be at ftp://world.std.com/ but I think it's long gone. However, it was reasonably famous for its time, so it might be archived somewhere. I remember browsing it when PG was much smaller, even in 1996 or so. Also, ftp.ibiblio.org has quite a few old articles and such.

From sly at victoria.tc.ca Tue Jan 10 00:36:57 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Tue Jan 10 00:37:37 2006
Subject: [gutvol-d] Pre-1990 ebook history
In-Reply-To: <80308709.20060109164835@noring.name>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name>
Message-ID:

Jon: Here's a very promising lead for you to follow up.
Check out the notes at the beginning of Paradise Lost:
http://www.gutenberg.org/etext/26

"This etext was originally created in 1964-1965 according to Dr. Joseph Raben of Queens College, NY..."

On closer look, "edition 12" appears to be the only one right now in the main PG archive:
http://www.gutenberg.org/dirs/etext92/plrabn12.txt

However, a search for "plrabn10.txt" and "plrabn11.txt" finds some sites that still have them.

Andrew

From gbnewby at pglaf.org Tue Jan 10 00:52:19 2006
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue Jan 10 00:52:20 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <7.0.1.0.2.20060110000514.037d8eb0@baechler.net>
References: <1675677963.20060109133820@noring.name> <7.0.1.0.2.20060110000514.037d8eb0@baechler.net>
Message-ID: <20060110085219.GA1275@pglaf.org>

On Tue, Jan 10, 2006 at 12:07:36AM -0800, Tony Baechler wrote:
> Hello. One other source of old books is OBI, or the Online Books
> Initiative. PG borrowed some books from them including Moby. Also
> there was Wiretap. OBI used to be at ftp://world.std.com/ but I
> think it's long gone. However, it was reasonably famous for its time
> so it might be archived somewhere. I remember browsing it when PG
> was much smaller, even in 1996 or so. Also ftp.ibiblio.org has quite
> a bit of old articles and such.

Spies in the wire: http://wiretap.area.com/Gopher/

I have a local copy from a few years ago, but it looks about the same. When Wiretap was active, they took some stuff from PG, and vice-versa. Most of the eBook content from Wiretap is now in textfiles.org, I think.
-- Greg

From greg at durendal.org Tue Jan 10 04:39:14 2006
From: greg at durendal.org (Greg Weeks)
Date: Tue Jan 10 05:00:04 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID:

On Mon, 9 Jan 2006, Greg Weeks wrote:
> On Mon, 9 Jan 2006, Andrew Sly wrote:
>
>> Mentions using alice10.txt, as well as a few other early
>> PG texts, from the Walnut Creek CD ROM.
>
> I believe I have a copy of this, but I'm not sure what vintage. I'll have to look.

The one I have is dated 1992. It probably doesn't have any of the older stuff on it. I also found my copy of "The Library of the Future".

-- Greg Weeks http://durendal.org:8080/greg/

From radicks at bellsouth.net Mon Jan 9 17:51:56 2006
From: radicks at bellsouth.net (Dick Adicks)
Date: Tue Jan 10 07:07:34 2006
Subject: [gutvol-d] my impression
Message-ID:

See the introduction to PG etext #26, Paradise Lost, identified there as "the oldest etext known to Project Gutenberg (ca. 1964-1965)".

Dick Adicks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060109/2bad90a2/attachment.html

From hart at pglaf.org Tue Jan 10 07:46:32 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 10 07:46:33 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <2d0.161067f.30f43f48@aol.com>
References: <2d0.161067f.30f43f48@aol.com>
Message-ID:

On Mon, 9 Jan 2006 Bowerbird@aol.com wrote:
> andrew said:
>> a few other early PG texts, from the Walnut Creek CD ROM.
>> Any chance of finding one of those still floating around somewhere?
>
> i did some searching for a walnut-creek c.d. a long time back,
> but got nowhere. but maybe an ebay expert could help you...
>
> however, mr. noring already has solid evidence that predates
> that cd-rom.
> what he's looking for is even _earlier_ evidence...
>
> -bowerbird

I still have some of the old Simtel and Walnut Creek CDROMS, and even some of the floppies that predated those. . .somewhere.

Michael

From hart at pglaf.org Tue Jan 10 08:01:59 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 10 08:02:01 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID:

On Tue, 10 Jan 2006, Greg Weeks wrote:
> On Mon, 9 Jan 2006, Greg Weeks wrote:
>
>> On Mon, 9 Jan 2006, Andrew Sly wrote:
>>
>>> Mentions using alice10.txt, as well as a few other early
>>> PG texts, from the Walnut Creek CD ROM.
>>
>> I believe I have a copy of this, but I'm not sure what vintage. I'll have
>> to look.
>
> The one I have is dated 1992. It probably doesn't have any of the older stuff
> on it. I also found my copy of "The Library of the Future"

Alice was possibly the first widely distributed eBook, but my recollection doesn't seem to match everyone else's.

I recall completing it in 1988, and many of the files contain a notice of the 1988 Millennium Fulcrum Edition. . .that was me, thinking outside the box of the then current limitations of Project Gutenberg, which had been mostly a History of Democracy sort of thing in the 1970's.

Some people, including our CEO Greg Newby, tell me that I released Alice a few years earlier than that, perhaps in 1984, as Greg says he first saw it around 1985 and that's how he first became aware of me and PG.

This IS possible, since I was a BBS Sysop in 1984-1985 and did put a LOT of the early Project Gutenberg works on the Champaign County Computer Club BBS during those years for free download, which might have included the earliest versions of Alice.
My own recollection is that I did Alice no earlier than 1985, after I moved into this house in September, because I seem to recall doing the typing and proofreading at this very desk on an early incarnation of this same computer system.

60,900,000 hits for "e-book" OR ebook OR ebooks.
60,900,000 hits for bomb.
Give eBooks in 2006!!!

Michael S. Hart
Founder
Project Gutenberg

From jon at noring.name Tue Jan 10 08:10:19 2006
From: jon at noring.name (Jon Noring)
Date: Tue Jan 10 08:10:34 2006
Subject: [gutvol-d] Intro from PG text #26 (source text from 1964-5)
In-Reply-To:
References:
Message-ID: <1291928104.20060110091019@noring.name>

Dick wrote:
> See the introduction to PG etext #26, Paradise Lost, identified
> there as "the oldest etext known to Project Gutenberg (ca. 1964-1965)"

Wow!

Here's the Intro of that text detailing the source of the original etext. Undoubtedly Michael Hart wrote this introduction, since it is nicely right-justified. A comment and question below.

*******************************************************************

Introduction (one page)

This etext was originally created in 1964-1965 according to Dr. Joseph Raben of Queens College, NY, to whom it is attributed by Project Gutenberg. We had heard of this etext for years but it was not until 1991 that we actually managed to track it down to a specific location, and then it took months to convince people to let us have a copy, then more months for them actually to do the copying and get it to us. Then another month to convert to something we could massage with our favorite 486 in DOS. After that it was only a matter of days to get it into this shape you will see below. The original was, of course, in CAPS only, and so were all the other etexts of the 60's and early 70's.
Don't let anyone fool you into thinking any etext with both upper and lower case is an original; all those original Project Gutenberg etexts were also in upper case and were translated or rewritten many times to get them into their current condition. They have been worked on by many people throughout the world.

In the course of our searches for Professor Raben and his etext we were never able to determine where copies were or which of a variety of editions he may have used as a source. We did get a little information here and there, but even after we received a copy of the etext we were unwilling to release it without first determining that it was in fact Public Domain and finding Raben to verify this and get his permission. Interestingly enough, in a totally unrelated action to our searches for him, the professor subscribed to the Project Gutenberg listserver and we happened, by accident, to notice his name. (We don't really look at every subscription request as the computers usually handle them.) The etext was then properly identified, copyright analyzed, and the current edition prepared.

To give you an estimation of the difference in the original and what we have today: the original was probably entered on cards commonly known at the time as "IBM cards" (Do Not Fold, Spindle or Mutilate) and probably took in excess of 100,000 of them. A single card could hold 80 characters (hence 80 characters is an accepted standard for so many computer margins), and the entire original edition we received in all caps was over 800,000 chars in length, including line enumeration, symbols for caps and the punctuation marks, etc., since they were not available keyboard characters at the time (probably the keyboards operated at baud rates of around 113, meaning the typists had to type slowly for the keyboard to keep up).

*******************************************************************

Am I right to assume that this etext was originally punched in for lexical (text) analysis?
That time frame corresponds to when the Brown Corpus was started.

What other complete texts of books were rumored (or known) to be "digitized" (such as it is, on punch cards) in the 1960's and early 70's?

Thanks.

Jon

From jon at noring.name Tue Jan 10 08:20:00 2006
From: jon at noring.name (Jon Noring)
Date: Tue Jan 10 08:20:11 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <1675677963.20060109133820@noring.name>
Message-ID: <641093536.20060110092000@noring.name>

Michael Hart wrote:
> Alice was possibly the first widely distributed eBook, but my recollection
> doesn't seem to match everyone else's.
>
> I recall completing it in 1988, and many of the files contain a notice
> of the 1988 Millennium Fulcrum Edition. . .that was me, thinking outside
> the box of the then current limitations of Project Gutenberg, which had
> been mostly a History of Democracy sort of thing in the 1970's.
>
> Some people, including our CEO Greg Newby, tell me that I released Alice
> a few years earlier than that, perhaps in 1984, as Greg says he first saw
> it around 1985 and that's how he first became aware of me and PG.
>
> This IS possible, since I was a BBS Sysop in 1984-1985 and did put a LOT
> of the early Project Gutenberg works on the Champaign County Computer Club
> BBS during those years for free download, which might have included the
> earliest versions of Alice. My own recollection is that I did Alice
> no earlier than 1985, after I moved into this house in September,
> because I seem to recall doing the typing and proofreading at this
> very desk on an early incarnation of this same computer system.

Thanks for the details, Michael. Very good historical information. I've cross-posted this reply to the ebook-history list so it may be preserved.

What other books, besides the "history of democracy type texts", do you recall placing on the Champaign County Computer Club BBS in 1984-85?
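[As an aside, the card arithmetic in the Paradise Lost introduction quoted earlier in this thread can be sanity-checked in a few lines of Python. This is a rough sketch: the 80-characters-per-card capacity and the 800,000-character total are the intro's own figures, while the effective rate of a 113-baud line (roughly 11 characters per second, assuming about 10 bits per character) is an assumption.]

```python
# Back-of-the-envelope check of the punch-card figures in the
# Paradise Lost (PG etext #26) introduction. Capacity and text-size
# numbers come from the intro; the ~11 cps figure for a 113-baud
# line (assuming ~10 bits per character) is an assumption.

chars_per_card = 80          # one "IBM card" held 80 characters
total_chars = 800_000        # intro: "over 800,000 chars in length"
chars_per_second = 113 / 10  # ~11.3 cps at 113 baud

cards_needed = total_chars / chars_per_card
entry_hours = total_chars / chars_per_second / 3600

print(f"minimum cards at full 80-column use: {cards_needed:,.0f}")
print(f"raw transmission time at 113 baud: {entry_hours:.0f} hours")
```

[Straight division gives on the order of 10,000 fully packed cards and roughly 20 hours of raw keying time; punching one line of verse per card, as was common, would land in the same ten-thousand range for a poem of roughly ten thousand lines, noticeably below the intro's "in excess of 100,000" estimate.]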
Jon

From hart at pglaf.org Tue Jan 10 08:33:24 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 10 08:33:25 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To: <20060110002256.GA27181@pglaf.org>
References: <1675677963.20060109133820@noring.name> <20060109233815.GB21426@pglaf.org> <80308709.20060109164835@noring.name> <20060110002256.GA27181@pglaf.org>
Message-ID:

On Mon, 9 Jan 2006, Greg Newby wrote:
> On Mon, Jan 09, 2006 at 04:48:35PM -0700, Jon Noring wrote:
>> Was the early GUTNBERG mailing list archived as well? When was that
>> list formed? The first mention I see of GUTNBERG is January 1990. I
>> don't believe Google has it archived, unfortunately. That would be a
>> treasure trove of information on the early days of PG, and probably
>> will mention, in passing, things that happened before GUTNBERG was
>> started.
>
> I don't think it was.
>
> I believe our first automated list was on a LISTSERV run
> at uiuc (listserv.cso.uiuc.edu). That was from about
> 1992 or so through 1997 or early 1998.
>
> Then, we moved to UNC, which used similar software,
> listserv.unc.edu. I do have the archives from that
> period, somewhere.
>
> It was only in 2003 or so that we moved to mailman
> on lists.pglaf.org
>
> The number & makeup of lists changed over that time.
> Originally, the main (or only) purpose was for a
> monthly newsletter or similar, plus announcements
> of new titles.
>
> I am not aware of any archives we got from the UIUC
> LISTSERV lists. I'm sure some folks have their own
> personal copies, though.
> -- Greg

I think the first PG Newsletters went out from one of the IBM mainframes at the UI in 1988 or 1989.

We moved several times. . .I remember at least two: vmd.cso.uiuc.edu and vme.cso.uiuc.edu before we moved to the UNIX machines. I'm not sure if we ever ran from vmc.cso.uiuc.edu.
Michael

From schultzk at uni-trier.de Wed Jan 11 00:22:13 2006
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Wed Jan 11 01:05:50 2006
Subject: [gutvol-d] Intro from PG text #26 (source text from 1964-5)
In-Reply-To: <1291928104.20060110091019@noring.name>
References: <1291928104.20060110091019@noring.name>
Message-ID:

Hi Everybody,

I hate to disappoint everybody, but there are even older "etexts" than this! Though I have to admit that back then they were not called etexts. They were called corpora. They were not stored on disks or other mass storage systems, but on punch cards and such.

"Ebooks" have been around since the mid 80s. They were programs that were dedicated to one book and its display. I have "The Hitchhiker's Guide to the Galaxy" somewhere in a box. Anybody remember the Apple Newton (also mid 80s)? It was also what would be termed an ebook today. As a matter of fact, I used to read the first PG etexts on my Newton.

Just my 2 Euro cents worth,

Keith.

On 10.01.2006 at 17:10, Jon Noring wrote:
> Dick wrote:
>
>> See the introduction to PG etext #26, Paradise Lost, identified
>> there as "the oldest etext known to Project Gutenberg (ca. 1964-1965)"
>
> Wow!
>
> Here's the Intro of that text detailing the source of the original
> etext. Undoubtedly Michael Hart wrote this introduction since it is
> nicely right-justified. A comment and question below.
>
> *******************************************************************
>
> Introduction (one page)
>
> This etext was originally created in 1964-1965 according to Dr.
> Joseph Raben of Queens College, NY, to whom it is attributed by
> Project Gutenberg. We had heard of this etext for years but it
> was not until 1991 that we actually managed to track it down to
> a specific location, and then it took months to convince people
> to let us have a copy, then more months for them actually to do
> the copying and get it to us.
> Then another month to convert to
> something we could massage with our favorite 486 in DOS. After
> that it was only a matter of days to get it into this shape you
> will see below. The original was, of course, in CAPS only, and
> so were all the other etexts of the 60's and early 70's. Don't
> let anyone fool you into thinking any etext with both upper and
> lower case is an original; all those original Project Gutenberg
> etexts were also in upper case and were translated or rewritten
> many times to get them into their current condition. They have
> been worked on by many people throughout the world.
>
> In the course of our searches for Professor Raben and his etext
> we were never able to determine where copies were or which of a
> variety of editions he may have used as a source. We did get a
> little information here and there, but even after we received a
> copy of the etext we were unwilling to release it without first
> determining that it was in fact Public Domain and finding Raben
> to verify this and get his permission. Interestingly enough, in
> a totally unrelated action to our searches for him, the professor
> subscribed to the Project Gutenberg listserver and we happened,
> by accident, to notice his name. (We don't really look at every
> subscription request as the computers usually handle them.) The
> etext was then properly identified, copyright analyzed, and the
> current edition prepared.
>
> To give you an estimation of the difference in the original and
> what we have today: the original was probably entered on cards
> commonly known at the time as "IBM cards" (Do Not Fold, Spindle
> or Mutilate) and probably took in excess of 100,000 of them.
> A single card could hold 80 characters (hence 80 characters is an
> accepted standard for so many computer margins), and the entire
> original edition we received in all caps was over 800,000 chars
> in length, including line enumeration, symbols for caps and the
> punctuation marks, etc., since they were not available keyboard
> characters at the time (probably the keyboards operated at baud
> rates of around 113, meaning the typists had to type slowly for
> the keyboard to keep up).
>
> *******************************************************************
>
> Am I right to assume that this etext was originally punched in for
> lexical (text) analysis? That time frame corresponds to when the Brown
> Corpus was started.
>
> What other complete texts of books were rumored (or known to be)
> "digitized" (such as it is on punch cards) in the 1960's and early
> 70's?
>
> Thanks.
>
> Jon
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d

From prosfilaes at gmail.com Wed Jan 11 02:36:33 2006
From: prosfilaes at gmail.com (David Starner)
Date: Wed Jan 11 02:36:55 2006
Subject: [gutvol-d] Intro from PG text #26 (source text from 1964-5)
In-Reply-To:
References: <1291928104.20060110091019@noring.name>
Message-ID: <6d99d1fd0601110236y65437f09te5cdccb79e25869d@mail.gmail.com>

On 1/11/06, Keith J. Schultz wrote:
> Hi Everybody,
>
> I hate to disappoint everybody, but there are even older
> "etexts" than this! Though I have to admit that back then they
> were not called etexts. They were called corpora. They were not
> stored on disks or other mass storage systems, but on punch cards
> and such.

According to the intro, etext #26 was probably entered on cards in 64-65. The Brown Corpus (I'm guessing 1962, since the Wikipedia article doesn't really say) didn't really include etexts, since it was 2,000-word samples, not entire texts.
Given the memory size and cost of early computers, and the fact that Wikipedia says the "Brown Corpus pioneered the field of corpus linguistics", I'd like some evidence that there were older etexts.

From tb at baechler.net Wed Jan 11 02:46:46 2006
From: tb at baechler.net (Tony Baechler)
Date: Wed Jan 11 02:45:55 2006
Subject: [gutvol-d] Early ebook history info wanted; "Alice"; Brown Corpus; Vannevar Bush; Asimov
In-Reply-To:
References: <2d0.161067f.30f43f48@aol.com>
Message-ID: <7.0.1.0.2.20060111024416.031ca6e0@baechler.net>

Hello all, and especially Michael. I am very interested in old Simtel CDs. I am especially looking for Simtel-20 archives, which would be from the early 1990's. Also it would be nice to find an old mirror of oak.oakland.edu, which had the early PC-SIG, PC-BLUE, COMUG and large CP/M collections. If anyone has any PC-SIG CDs, especially edition 12 or earlier, I am interested. Contact me off list since this is off topic.

At 07:46 AM 1/10/2006, you wrote:
> I still have some of the old Simtel and Walnut Creek CDROMS,
> and even some of the floppies that predated those. . .somewhere.
>
> Michael

From radicks at bellsouth.net Wed Jan 11 07:44:05 2006
From: radicks at bellsouth.net (Dick Adicks)
Date: Wed Jan 11 07:44:11 2006
Subject: [gutvol-d] Early ebook history info wanted
Message-ID:

When I was teaching at Georgia Tech from 1965 to 1968, Professor William Mullen in the English department there was transferring the Episcopal psalter to IBM cards, using ALGOL. Bill inspired me to take an introductory course in that computer language so that I could initiate a similar project, but I went no further with it. I don't know what became of his work, but he had accumulated many boxes of cards, with one line to every card.

Dick Adicks

A small group of thoughtful people could change the world. Indeed, it's the only thing that ever has. --Margaret Mead

. . . if vicious people are united and form a power, honest people must do the same.
--Leo Tolstoy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060111/ce25442f/attachment.html

From traverso at dm.unipi.it Wed Jan 11 11:22:56 2006
From: traverso at dm.unipi.it (Carlo Traverso)
Date: Wed Jan 11 11:10:44 2006
Subject: [gutvol-d] Early ebook history info wanted
In-Reply-To: (message from Dick Adicks on Wed, 11 Jan 2006 10:44:05 -0500)
References:
Message-ID: <200601111922.k0BJMuA21996@pico.dm.unipi.it>

In Pisa, there is an institute of computational linguistics, http://www.ilc.cnr.it/ which originated from a research group of the national university computing center in 1967.

I think that there had been work even earlier, probably 1965, and I remember that they began with the input of Dante's Commedia and the creation of concordances. I can retrieve more accurate information if needed.

Carlo Traverso

From robsuth at robsuth.plus.com Thu Jan 12 08:50:18 2006
From: robsuth at robsuth.plus.com (Robert Sutherland)
Date: Thu Jan 12 11:15:10 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>

Just to bring my enquiry of 8 June 05 up to date, I continued my email/online enquiries as far as I could, but found no product among the DVD portable readers that would deal with text, nor was any of the manufacturers interested in producing one. Apart from the French reader Cybook (which I think is too expensive for the task and market, although otherwise for the most part ideal) there seems still to be no special ebook reading device available in the UK or the EU generally: laptops are still too big and heavy, and PDAs still have far too small a screen. My enquiries confirmed my strong impression that there are protectionist interests holding this back, presumably in the interests of proprietary issues of ebooks.
That is really rather to bury the head in the sand, especially now that Google are to set up their Library - if PG and Google are going to be fully useful, a new device is inevitable. Can PG and Google not take a hand themselves to enable production of something they undoubtedly will need?

Robert Sutherland
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060112/74e87244/attachment.html

From hart at pglaf.org Thu Jan 12 12:00:35 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jan 12 12:00:37 2006
Subject: !@!Re: [gutvol-d] Ebook reading devices
In-Reply-To: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
References: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
Message-ID:

On Thu, 12 Jan 2006, Robert Sutherland wrote:
> Just to bring my enquiry of 8 June 05 up to date, I continued my email/online
> enquiries as far as I could, but found no product among the DVD portable
> readers that would deal with text, nor was any of the manufacturers
> interested in producing one. Apart from the French reader Cybook (which I
> think is too expensive for the task and market although otherwise for the
> most part ideal) there seems still to be no special ebook reading device
> available in UK or the EU generally: laptops are still too big and heavy,

Don't they have "notebook" computers that weigh hardly more than books?

> and PDAs still have far too small a screen

I've seen products such as Blackberrys and Treos that seem to have screens at least twice as large as most PDAs; perhaps one of those would be better.

> My enquiries confirmed my strong impression that there are protectionist
> interests holding this back, presumably in the interests of proprietary
> issues of ebooks.
Yes, it is all too obvious that virtually all the multibillion-dollar participants, from Google to Yahoo to Amazon to HarperCollins, and even to The Library of Congress, do not seem to have "easy access" in mind, but that is merely because they have no concept other than traditional "business plans."

I think that just as Google came up with a different kind of business plan based on a free product, others will do so with free eBooks, or eBooks so inexpensive that no one will worry about the cost.

> That is really rather to bury the head in the sand, especially now that
> Google are to set up their Library - if PG and Google are going to be fully
> useful a new device is inevitable. Can PG and Google not take a hand
> themselves to enable production of something they undoubtedly will need?

Need?

But first, I can't put Google and PG in the same group, for several reasons. Google is worth over $100 billion; PG is barely worth an account sheet.

Google, after 13 months of high-visibility press releases, still has not taken over more than a few percent of the eBook marketplace, and I still haven't seen anything new from Yahoo, Amazon, HarperCollins, or even The Library of Congress that makes me think they will be responsible for a million eBooks between the lot of them before a million eBooks are made simply and easily available by people beneath their radar.

Back to need. . . .

Of course, iPods and cellphones don't have large screens, but that didn't stop people from reading eBooks on them, and even Apple has to admit that they are selling 10 times as many iPods as computers. 1.25 million Apple computers sold in the last quarter. 14 million iPods. But that barely scratches the surface of cellphone sales, which are in the range of 1 billion per year versus only 100 million computers.

People are going to use what they have.
I just don't see a market for that dedicated eBook reader we have heard talk about for ever so long, particularly at prices we must say are equal to that of a cheap computer and usually filled with stuff to keep us from doing much with free eBooks.

Michael

From hart at pglaf.org Thu Jan 12 14:29:34 2006
From: hart at pglaf.org (Michael Hart)
Date: Thu Jan 12 14:29:36 2006
Subject: [gutvol-d] Early ebook history info wanted
In-Reply-To: <200601111922.k0BJMuA21996@pico.dm.unipi.it>
References: <200601111922.k0BJMuA21996@pico.dm.unipi.it>
Message-ID: 

On Wed, 11 Jan 2006, Carlo Traverso wrote:
>
> In Pisa, there is an institute of computational linguistics,
> http://www.ilc.cnr.it/ which originated from a research group of the
> national university computing center in 1967.
>
> I think that there has been work even earlier, probably 1965, and I
> remember that they began with the input of Dante's Commedia and the
> creation of concordances. I can retrieve more accurate information if
> needed.

Our first Webmaster, Pietro di Miceli, is Italian, living in Rome, and I think he heard of this but that it was so walled off from any public consumption that no one he knew could get to it, so it pretty much remained in the area of rumor.
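For anyone who hasn't run into the term, a concordance is simply an index from every word of a text to the passages in which it occurs. A minimal sketch in Python -- purely an illustration, with nothing of the Pisa group's actual software in it:

```python
import re
from collections import defaultdict

def concordance(text, width=20):
    """Map each word (lowercased) to the contexts in which it appears."""
    index = defaultdict(list)
    for m in re.finditer(r"[A-Za-z']+", text):
        left = max(0, m.start() - width)
        # Keep a snippet of surrounding text as the "context" for this occurrence.
        index[m.group().lower()].append(text[left:m.end() + width])
    return dict(sorted(index.items()))

opening = "Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura"
for word, contexts in concordance(opening).items():
    print(word, "->", contexts)
```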
Michael

From imaclean at gmail.com Fri Jan 13 00:54:27 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Fri Jan 13 01:57:47 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
References: <6.2.3.4.1.20060112164044.02ce58d0@mail.plus.net>
Message-ID: <3156339d0601130054p1cf6d6f9o4e8c1926528ce805@mail.gmail.com>

Have you seen the just announced ebook reader from Sony using a high res e-ink display?:

http://blogs.reuters.com/2006/01/04/glimpse-at-new-sony-reader/
http://products.sel.sony.com/pa/prs/reader_features.html

They claim to have done away with the ridiculous restrictions built into their previous reader (the Librie) and it *should* be able to read txt, pdf and html, although these will need to be converted into Sony's proprietary BBeB format before being transferred to the device.

Another product out soon that will use the same e-ink technology is the iRex ER0100:

http://www.epaper.org.uk/index.php?option=com_content&task=view&id=56&Itemid=2

which apparently will be able to display PDF, XHTML, or TXT without conversion.

Both these devices will probably be as expensive as the Cybook but with much higher readability of e-ink displays.

Ian

On 1/13/06, Robert Sutherland wrote:
> Just to bring my enquiry of 8 June 05 uptodate, I continued my
> email/online enquiries as far as I could, but found no product among the DVD
> portable readers that would deal with text, nor was any of the manufacturers
> interested in producing one. Apart from the French reader Cybook (which I
> think is too expensive for the task and market although otherwise for the
> most part ideal) there seems still to be no special ebook reading device
> available in UK or the EU generally: laptops are still too big and heavy,
> and PDAs still have far too small a screen My enquiries confirmed my strong
> impression that there are protectionist interests holding this back,
> presumably in the interests of proprietary issues of ebooks.
That is really > rather to bury the head in the sand, especially now that Google are to set > up their Library - if PG and Google are going to be fully useful a new > device is inevitable. Can PG and Google not take a hand themselves to enable > production of something they undoubtedly will need? > > Robert Sutherland > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > From joshua at hutchinson.net Fri Jan 13 06:44:02 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Fri Jan 13 06:44:06 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <20060113144402.B17F6EE36C@ws6-1.us4.outblaze.com> Anyone heard if the BBeB format is open/documented? And even better, if anyone has created an open source converter? If these things take off, it would be nice to have the ability to generate files for them from our collection. If there is an open source converter, there is a chance we could do such a thing right on the server. Josh ----- Original Message ----- From: "Ian MacLean" To: "Project Gutenberg Volunteer Discussion" Subject: Re: [gutvol-d] Ebook reading devices Date: Fri, 13 Jan 2006 17:54:27 +0900 > > Have you seen the just announced ebook reader from Sony using a high > res e-ink display ?: > > http://blogs.reuters.com/2006/01/04/glimpse-at-new-sony-reader/ > http://products.sel.sony.com/pa/prs/reader_features.html > > They claim to have done away with the ridculous restrictions built > into their previous reader ( the libre ) and it *should* be able to > ready txt and pdf, html although these will need to be converted into > Sony's proprietry BBeB format before being transferred to the device. 
> > Another product out soon that will use the same e-ink technology is > the irex ER0100 : > http://www.epaper.org.uk/index.php?option=com_content&task=view&id=56&Itemid=2 > > which apparently will be able to display PDF, XHTML, or TXT without conversion. > > Both these devices will probably be as expensive as the Cybook but > with much higher readability of e-ink displays. > > Ian > > > On 1/13/06, Robert Sutherland wrote: > > Just to bring my enquiry of 8 June 05 uptodate, I continued my > > email/online enquiries as far as I could, but found no product among the DVD > > portable readers that would deal with text, nor was any of the manufacturers > > interested in producing one. Apart from the French reader Cybook (which I > > think is too expensive for the task and market although otherwise for the > > most part ideal) there seems still to be no special ebook reading device > > available in UK or the EU generally: laptops are still too big and heavy, > > and PDAs still have far too small a screen My enquiries confirmed my strong > > impression that there are protectionist interests holding this back, > > presumably in the interests of proprietary issues of ebooks. That is really > > rather to bury the head in the sand, especially now that Google are to set > > up their Library - if PG and Google are going to be fully useful a new > > device is inevitable. Can PG and Google not take a hand themselves to enable > > production of something they undoubtedly will need? 
> > > > Robert Sutherland > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d@lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d From jon at noring.name Fri Jan 13 07:32:33 2006 From: jon at noring.name (Jon Noring) Date: Fri Jan 13 07:32:49 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <20060113144402.B17F6EE36C@ws6-1.us4.outblaze.com> References: <20060113144402.B17F6EE36C@ws6-1.us4.outblaze.com> Message-ID: <1754496216.20060113083233@noring.name> Joshua asked: > Anyone heard if the BBeB format is open/documented? And even > better, if anyone has created an open source converter? If these > things take off, it would be nice to have the ability to generate > files for them from our collection. If there is an open source > converter, there is a chance we could do such a thing right on the server. My best understanding from following the Librie list, talking with the "librie guy", and a few snippets of news releases, is that the BBeB Xylog DTD/schema/spec is still unpublished, but that Sony plans to publish (and maybe release as an "open standard") the format. Looking at the incomplete Xylog schema used in the Librie which has been reverse engineered, as well as a couple of Xylog XML documents, has revealed some interesting tidbits: 1) It's an all-in-one XML document -- everything is dumped inside a single document, including images, metadata, etc. (I vaguely remember Microsoft trying to patent this idea. Anyone know?) 2) All the examples I've seen are text-encoded in UTF-16. This means either that UTF-16 is supported (along with hopefully UTF-8), or that it is required. 
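As a quick aside on point 2 (my own illustration in Python, not anything taken from the Librie list), the size trade-off between the two encodings is easy to demonstrate:

```python
# Encoded sizes of Han text vs ASCII text in UTF-8 and UTF-16 (little-endian, no BOM).
han = "\u66f8\u7c4d"   # two Han characters
latin = "books"        # five ASCII characters

print(len(han.encode("utf-8")))        # 6: Han characters take 3 bytes each in UTF-8
print(len(han.encode("utf-16-le")))    # 4: but only 2 bytes each in UTF-16
print(len(latin.encode("utf-8")))      # 5: ASCII is 1 byte per character in UTF-8
print(len(latin.encode("utf-16-le")))  # 10: but 2 bytes per character in UTF-16
```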
This makes sense for the Japanese origin of the format where, I gather, UTF-16 is more efficient than UTF-8 when encoding Han characters and such. 3) It does NOT use CSS -- rather it uses its own styling scheme which does not appear to completely map to CSS (or the mapping is very complex). The core model may not be the same as the CSS box model. It is troubling that they chose their own styling language rather than fully embrace some subset of CSS. Part of this may stem from the core layout model, which is faintly reminiscent of PDF. 4) The document structure is dirt simple. There are two types of "text blocks" supported, which are sort of analogous to a
<div> box. Within a text block one can have one or more <p> (paragraphs), and there is a small supported set of inline tags. There does not appear, but I'm not certain (I can only go by what I've seen so far), to be support for defined structures such as tables, lists, blockquotes, and even headers. All these things have to be fitted within the text block/paragraph. This appears to make accessibility more difficult since there's no predefined semantics one can assign to the various structures (which could include, I suppose, sidebars and stuff), so those using text-to-speech may have to figure out what's what without any machine-recognizable cues.

Definitely, the Xylog vocabulary is not suitable for use as a "master" format for etexts. It's more of a derivative format for primarily visual presentation purposes.

Anyway, these are my impressions from incomplete information. Once the BBeB Xylog schema is published, we'll know for sure. And it is possible the schema used for the U.S. Sony may be updated from the one used in the Librie.

It is sad that they ignored established standards (such as HTML, OEBPS, CSS, TEI, etc.) and decided to roll their own. And so far I don't see any innovations that make it better for representing digital publications. I see it as a step backwards. (To be fair, the motivation for developing it may have been to minimize hardware resource requirements, so for that it may be innovative, but I see no other advantages, not even in document conversion.)

We'll see... YMMV.

Jon

From Bowerbird at aol.com Fri Jan 13 09:36:04 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jan 13 09:36:17 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: 

ian said:
> Have you seen the just announced ebook reader
> from Sony using a high res e-ink display ?

i might pay $350 for a handheld with web access.
but for a machine that's isolated from cyberspace?
um, no thanks. next?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060113/33ab2c07/attachment.html

From marcello at perathoner.de Fri Jan 13 14:34:49 2006
From: marcello at perathoner.de (Marcello Perathoner)
Date: Fri Jan 13 14:34:47 2006
Subject: [gutvol-d] PG Website Report for 2005
Message-ID: <43C82B09.2050205@perathoner.de>

User base and server load

The PG website is ranked at position 3,842 by Alexa (Dec 31, 2005). Note that Alexa is an Amazon company and thus bookish people are overrepresented in the Alexa user base. While the absolute Alexa rank value is skewed in our favor, the trend Alexa gives is quite objective.

Since Jan 1, 2005:

- 2.5 times as many people are using PG and
- PG is serving 2 times as many web pages.

To accommodate the increased load at better response times the site was redesigned for more static pages and better caching. Rarely used features like skins were dropped in favor of better performance.

Where do new users come from?

Most visitors find us thru search engines. But 10% come from wikipedia. The catalog team has been active in editing appropriate wikipedia articles to point to our ebooks. If you want more people to come to PG, help us editing more wikipedia articles.

Dec 2005

Visitors   %       Referrer URL
236011     24.20%  www.google.com/
97795      10.03%  en.wikipedia.org/
78527      8.05%   search.yahoo.com/
34796      3.57%   www.promo.net/
28997      2.97%   www.google.co.uk/
24810      2.54%   www.stumbleupon.com/
20355      2.09%   www.google.ca/
16598      1.70%   search.msn.com/
15289      1.57%   www.google.de/
13714      1.41%   www.google.co.in/

Search Terms

Visitors found us searching for these terms:

Dec 2005

Visitors   Search Term
24923      project gutenberg
18297      gutenberg
9391       gutenberg project
9176       free ebooks
6964       ebooks
4303       e books
4059       free e books
3807       free books
3489       online books
3348       project gutenburg

While most of these people knew beforehand what they were looking for, some of them found us by searching for "free ebooks" or similar generic terms.
I have rewritten our main page to push our ranking for the "free ebooks" search term. We are also well positioned in google for the following generic search terms:

Dec 2005

Pos.  Search Term
2     free ebooks
2     free books
5     ebooks
15    books

Some other sites rank better than PG because they have "ebook" in the domain name. This way all links to those sites must contain the target word "ebook". If you want to help push our rank, set links in your web pages pointing to the PG main page with the link text "free ebooks" like this: My favorite <a href="http://www.gutenberg.org">free ebooks</a> site.

Pretty Pictures

Snapshots of Alexa statistics on Dec 31, 2005:

Pageviews: http://www.gutenberg.org/internal/reports/2005/alexa.6yp.png
Reach: http://www.gutenberg.org/internal/reports/2005/alexa.6yr.png
Rank: http://www.gutenberg.org/internal/reports/2005/alexa.6yt.png

User: internal
Pass: books

-- 
Marcello Perathoner webmaster@gutenberg.org

From imaclean at gmail.com Fri Jan 13 21:07:13 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Fri Jan 13 21:07:33 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: References: Message-ID: <3156339d0601132107g41a89d08j9b2be0b92c9bb62@mail.gmail.com>

On 1/14/06, Bowerbird@aol.com wrote:
> ian said:
> > Have you seen the just announced ebook reader
> > from Sony using a high res e-ink display ?
>
> i might pay $350 for a handheld with web access.
> but for a machine that's isolated from cyberspace?
> um, no thanks. next?

Fair enough - then the iRex device might be a better bet. And less proprietary.

http://www.cryptonomicon.net/msh/2005/12/eink-based-ebook-reader-to-ship-in.html
http://www.irextechnologies.com/downloads/Productleaflet-Iliad.pdf

"According to iRex, the Illiad will come with a digitizer and stylus allowing the user to input comments on digital documents. It directly supports PDF, XHTML and Text (Unicode?) formats. Like traditional eBook readers, the device will connect to a host PC via USB.
Unlike traditional eBook readers, however, the unit will also sport WiFi and wired ethernet network interfaces."

Ian

From Bowerbird at aol.com Fri Jan 13 21:53:06 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Fri Jan 13 21:53:32 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: <2e5.8ddc4c.30f9ebc2@aol.com>

ian said:
> Fair enough - then the iRex device
> might be a better bet. And less proprietry.

might be. except i heard it will be $300-$500.
with no deep pockets to subsidize and back it.
way too much for a dumb terminal in this age...
next?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060114/938d4868/attachment.html

From hyphen at hyphenologist.co.uk Fri Jan 13 23:13:35 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Fri Jan 13 23:14:05 2006
Subject: [gutvol-d] PG Website Report for 2005
In-Reply-To: <43C82B09.2050205@perathoner.de>
References: <43C82B09.2050205@perathoner.de>
Message-ID: 

On Fri, 13 Jan 2006 23:34:49 +0100, Marcello Perathoner wrote:

>Most visitors find us thru search engines. But 10% come from wikipedia.
>The catalog team has been active in editing appropriate wikipedia
>articles to point to our ebooks. If you want more people to come to PG,
>help us editing more wikipedia articles.

How easy would it be to add my Yorkshire dialect authors to Wikipedia?

-- 
Dave Fawthrop
17,000 free e-books at Project Gutenberg! http://www.gutenberg.net
For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From sly at victoria.tc.ca Fri Jan 13 23:27:25 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Fri Jan 13 23:27:56 2006
Subject: [gutvol-d] PG Website Report for 2005
In-Reply-To: References: <43C82B09.2050205@perathoner.de>
Message-ID: 

Hi Dave

On Sat, 14 Jan 2006, Dave Fawthrop wrote:
>
> How easy would it be to add my Yorkshire dialect authors to Wikipedia?
>
Fairly easy, as long as you have the information to put in. Wikipedia is very much a learn-as-you-go type of environment, so you might want to just take a look at a few other author biographies and then create an initial article. The easiest might be to just create what's called a "stub" article to start with, then flesh it out. I could try to anticipate any of a number of different questions you might have, but perhaps I'll just say feel free to ask if you have any. An alternate solution would be to send to me what information you have, and I could rework it when I find the time (and inclination) :)

Andrew

From imaclean at gmail.com Sat Jan 14 08:20:09 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Sat Jan 14 08:20:15 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: <2e5.8ddc4c.30f9ebc2@aol.com>
References: <2e5.8ddc4c.30f9ebc2@aol.com>
Message-ID: <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>

On 1/14/06, Bowerbird@aol.com wrote:
> ian said:
> > Fair enough - then the iRex device
> > might be a better bet. And less proprietry.
>
> might be. except i heard it will be $300-$500.
> with no deep pockets to subsidize and back it.

hmm iRex is a spin off of Phillips - thats fairly deep pockets.

> way too much for a dumb terminal in this age...

Sure its quite expensive - but its only the 2nd or third e-ink based device out there and that is of course its biggest selling point - the readability of the screen. And besides - an ipod is $300 and it only plays music ...
From hart at pglaf.org Sat Jan 14 11:06:59 2006
From: hart at pglaf.org (Michael Hart)
Date: Sat Jan 14 11:07:01 2006
Subject: [gutvol-d] Ebook reading devices
In-Reply-To: <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>
References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>
Message-ID: 

On Sun, 15 Jan 2006, Ian MacLean wrote:

> On 1/14/06, Bowerbird@aol.com wrote:
>> ian said:
>> > Fair enough - then the iRex device
>> > might be a better bet. And less proprietry.
>>
>> might be. except i heard it will be $300-$500.
>> with no deep pockets to subsidize and back it.
>
> hmm iRex is a spin off of Phillips - thats fairly deep pockets.

>> way too much for a dumb terminal in this age...
>
> Sure its quite expensive - but its only the 2nd or third e-ink based
> device out there and that is of course its biggest selling point - the
> readability of the screen.

"The Medium Is The Massage!"

> And besides - an ipod is $300 and it only plays music ...

But it plays a LOT of music!!!

eBook readers should certainly be able to hold as many books as MP3 players hold tunes!!!

Not to mention that iPods will do eBooks, right from the 1st week they were ever on the market, but I'll bet that the new eBook readers won't do iTunes. . .at least for now. . . . ;-)

Michael

From Bowerbird at aol.com Sat Jan 14 11:27:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Sat Jan 14 11:27:33 2006
Subject: [gutvol-d] Ebook reading devices
Message-ID: <6d.53d0baa4.30faaa9b@aol.com>

ian said:
> hmm iRex is a spin off of Phillips - thats fairly deep pockets.

oh, right. why do you think they didn't put the big name on it?

ian said:
> And besides - an ipod is $300 and it only plays music ...

"only"?

i'd guess people are willing to pay a lot more for music than for books. i don't have figures on music, but the number of books read annually by the average american is _one_.
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060114/d9b7cb68/attachment.html From mattsen at arvig.net Sat Jan 14 11:09:30 2006 From: mattsen at arvig.net (Chuck MATTSEN) Date: Sat Jan 14 11:28:49 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> Message-ID: On Sat, 14 Jan 2006 13:06:59 -0600, Michael Hart wrote: > "The Medium Is The Massage!" Thank god, someone who quotes that one correctly... :-) -- Chuck Mattsen (Mahnomen, MN) mattsen@arvig.net http://eot.com/~mattsen/mtsearch.htm From hart at pglaf.org Sat Jan 14 12:06:16 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Jan 14 12:06:17 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <6d.53d0baa4.30faaa9b@aol.com> References: <6d.53d0baa4.30faaa9b@aol.com> Message-ID: On Sat, 14 Jan 2006 Bowerbird@aol.com wrote: > ian said: >> hmm iRex is a spin off of Phillips - thats fairly deep pockets. > > oh, right. why do you think they didn't put the big name on it? > > > ian said: >> And besides - an ipod is $300 and it only plays music ... > > "only"? > > i'd guess people are willing to pay a lot more for music > than for books. iTunes alone is closing in on a million sales already. We've already seen the first million selling download. > i don't have figures on music, but the > number of books read annually by the average american > is _one_. Where can I look up things like that? 
> > -bowerbird > mh From hart at pglaf.org Sat Jan 14 12:07:38 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Jan 14 12:07:40 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> Message-ID: On Sat, 14 Jan 2006, Chuck MATTSEN wrote: > On Sat, 14 Jan 2006 13:06:59 -0600, Michael Hart wrote: > >> "The Medium Is The Massage!" > > Thank god, someone who quotes that one correctly... :-) Well, there WERE censored editions [Texas?] that required the book to be "The Medium Is The Message" or so I have been told. mh From Bowerbird at aol.com Sat Jan 14 12:14:12 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Jan 14 12:14:28 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <1f0.4a97b395.30fab594@aol.com> michael said: > Where can I look up things like that? i think it was in the recent report on reading out of the n.e.a. which was a hatchet job, so i'm not sure how trustworthy it is, so... but i think it's no longer a major secret that americans don't really read a lot... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060114/f132568a/attachment.html From walter.van.holst at xs4all.nl Sat Jan 14 12:49:09 2006 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Sat Jan 14 12:49:19 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> Message-ID: <43C963C5.4080400@xs4all.nl> Ian MacLean wrote: > >> might be. except i heard it will be $300-$500. >> with no deep pockets to subsidize and back it. >> >> > >hmm iRex is a spin off of Phillips - thats fairly deep pockets. 
> >
How much of a pipe dream would it be to try to get iRex to introduce a version with a 1 GB CF2 card included with a selection from Gutenberg? I mean, having to pay about 400 E for an eReader that already includes a few thousand classics changes the equation quite a bit.

Regards, Walter

From sly at victoria.tc.ca Sun Jan 15 00:23:27 2006
From: sly at victoria.tc.ca (Andrew Sly)
Date: Sun Jan 15 00:23:49 2006
Subject: [gutvol-d] Chinese names in PG catalog
In-Reply-To: References: <43C82B09.2050205@perathoner.de>
Message-ID: 

I've put this message here, rather than on the low-traffic catalogers list, so as to reach more people--see request for help at the end.

I've been editing the author headings for Chinese names in the catalog over the last few days, in an effort to get closer to some kind of consistency. You can see some of the results by looking towards the bottom of the pages:

http://www.gutenberg.org/browse/authors/other.html
http://www.gutenberg.org/browse/languages/zh

For anyone interested in the details, I'm aiming to have the main form using Pinyin romanization, with no tone marks, as used at the Library of Congress, since their Pinyin conversion (of which day 1 was Oct. 1, 2000). Then I'd like to have the names in Chinese characters as a secondary form, if possible. I'll also include other romanized forms if they seem wide-spread enough.

However, I've about reached the limit of what I can do. (and there are no guarantees that I have not made any blatant errors, although I've tried to be careful). So this is a request for anyone more familiar with the language who might be able to help check what I have done and do the same for the remaining Chinese authors...

Andrew

From hyphen at hyphenologist.co.uk Mon Jan 16 09:13:12 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Mon Jan 16 09:20:11 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: <43C963C5.4080400@xs4all.nl> References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: I am working on Yorkshire dialect poems and text, by John Hartley etext No 17472 and have previously done some of F W Moorman's 3232, 2888 work. There never was and never will be grammar or dictionaries for Yorkshire dialect, and there were *many* variations extant in late 19th/early 20th centuries. I was brought up in the West Riding and am doing another book about North Riding dialect, only 100km away and find it difficult to understand. Conventionally there are three variations for the three Yorkshire Ridings extant at the present day. My mother a teacher in the 1920s could detect several variations in a single *town*. Think about English before Dr Johnson, or American before Noah Webster. I am told by the whitewashers that it is *essential* that all text for PG pass guiguts. Because this assumes that the language scanned is American it gives 90% plus false positive errors, on my books, which is totally unsatisfactory for any piece of test software. Is there a language free version of Guiguts? -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From hiddengreen at gmail.com Mon Jan 16 10:45:25 2006 From: hiddengreen at gmail.com (Cori) Date: Mon Jan 16 10:53:10 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: <910fee4a0601161045t71d44beci1f5476bbc079bc3a@mail.gmail.com> Hullo Dave, and all. > Is there a language free version of Guiguts? 
I'm guessing you mean language-free version of Gutcheck, since Guiguts (one of the custom-written eText processors used at Distributed Proofreaders) is essentially language-free (its interface is in English, but it copes with all sorts of odd characters in other languages.) > I am told by the whitewashers that it is *essential* that all text for PG > pass guiguts. Because this assumes that the language scanned is American > it gives 90% plus false positive errors, on my books, which is totally > unsatisfactory for any piece of test software. This is just my thought, so I expect a WWer will reply shortly and far more authoritatively. But Gutcheck flags are warnings, not necessarily errors. It *is* necessary to *check* all of them, but unnecessary to *fix* all of them. For example, in a "quoted sentence ending in a footnote marker,"[1] ... Gutcheck will grouse about unspaced quotes, whereas obviously this is quite fine. However, in other places in the text, the"spacing of quotes might well have gone astray," and that would be a fixable error. Some warnings, such as for non-ASCII characters, may be rather redundant in a Latin-1 or UTF-8 file. I use Guiguts to check the Character Counts present (to make sure there aren't any unexpected characters) and then turn off this warning for Gutcheck with a clear conscience. As long as the check is done at an appropriate point in processing, **fully**, the Gutcheck warnings are duplicating what you already know about the file. Hope this helps - it's non-official, but informed through many, many cheery hours of Gutchecking :) Cori From jtinsley at pobox.com Mon Jan 16 12:54:37 2006 From: jtinsley at pobox.com (Jim Tinsley) Date: Mon Jan 16 12:55:12 2006 Subject: [gutvol-d] Language free version of guiguts? 
In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: <20060116205437.GA3423@panix.com> On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop wrote: >I am told by the whitewashers that it is *essential* that all text for PG >pass guiguts. Because this assumes that the language scanned is American >it gives 90% plus false positive errors, on my books, which is totally >unsatisfactory for any piece of test software. > >Is there a language free version of Guiguts? I'm not quite sure which question you're asking, and about which checking tool, but I think there is some confusion somewhere, of emphasis if not of fact, and I'm continually surprised by people who don't know the origins of really quite recent procedures I remember vividly, and I've had several threads recently about this general subject of checking, so please bear with me while I regurgitate history. I hope you'll find a satisfactory answer in here somewhere. Anybody can use any programs they like to make texts, and different people do use different tools, according to their own needs or the needs of the individual texts. Considering that we get French and German and Esperanto and Chinese texts, not to mention older English, there is no one-size-fits-all solution for language. Once, there were no checking tools at all, except for spellcheckers built into Word Perfect and Word, which is what most people used, and I could tell you some stories about having to convert those! David Price and Martin Ward and I made checkers that we used for ourselves. There may have been others, but those are the ones I'm aware of. Everything else was Mark One Eyeball. I had done a lot of cleaning-up work on a lot of texts for various people, and I would then send those on to Michael for posting. They would commonly take hours of work each. In self-defense, I wrote a checker I (later) renamed to gutcheck. 
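(To give a flavor of the kind of thing such a checker looks for -- this is a toy sketch of just one check, the unspaced quote Cori mentioned earlier, and not gutcheck's actual code:)

```python
import re

def find_unspaced_quotes(line):
    """Return 1-based columns where a double quote is jammed between word characters."""
    return [m.start() + 2 for m in re.finditer(r'\w"\w', line)]

print(find_unspaced_quotes('the"spacing of quotes might well have gone astray,"'))  # [4]
print(find_unspaced_quotes('"quoted sentence ending in a footnote marker,"[1]'))    # []
```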
When the WWs were formed in 2001, I brought gutcheck with me, and we all used it to find errors quickly in incoming texts. It was still standard, at that time, for gutcheck to find anything up to 50 or 100 errors in a typical incoming text. Checking and fixing could still take hours, and often involve long threads with the submitter. Up till then, there was really no difference between DP and Other texts, though because the people who mostly submitted from DP were experienced, and because DP favored simple texts (! yes, it's true), they were easier than the usual. When DP hit Slashdot, in late 2002, I was still posting the majority of texts, and both the quantity and quality of texts coming from DP went nuts. And so did I. To put it mildly, I got mediaeval on peoples' asses about the quality of incoming texts. I still wince when I remember some of the things I said then. But the point is that the few WWs couldn't possibly handle the amount of work now being spewed at us. What happened next was a kind of arms race between submitters and WWs. Submitters didn't want to have their texts bounced, or go through a long re-checking thread, so they adopted the checking tools we used to ensure that we wouldn't easily find errors. (Which, in a way was kind of a bad thing. It used to be that I knew that gutcheck would find about _half_ of the errors in an incoming text, but if the submitter had used gutcheck, I would find none, but would have no idea how many more I had to look for. I used to have lots of fun when I found a new check to add but hadn't released the new version yet. Heh. Anyway...) The most significant feature of DP, I often think, is that because of the need for multiple people to work on the same text, new information and methods propagate and are assimilated much faster there than elsewhere. In March 2003, Charlz set up the PPV system to meet the new pressures. 
New producers/PPs would have their file checked by more experienced people, who have come to do, at least for DP, most of the work that the WWs did pre-Slashdot. I burned out, and had to go away on an extended business trip anyhow. David Widger started actively WWing other peoples' submissions, and between the new PPV system and David, things became stable again, but at a higher volume than before. A couple of months later, Steve Schulze (thundergnat) responded to the need for people who couldn't easily work with command-line tools to use gutcheck, and wrote GuiGuts, which uses gutcheck to create a list of things to check, and does a whole lot of other things as well, in a GUI. It has become the standard "Swiss Army Knife" for preparing texts in DP. I will be forever grateful to him for saving me from having to write a cross-platform GUI for gutcheck! :-) And GuiGuts and gutcheck have accreted features ever since. If you have GuiGuts, then you have gutcheck, since Steve bundles it with GuiGuts -- and you also have a large number of other tools that may or may not be useful for the particular text you're working on. There are many other checkers available as well, and I'd love to ramble on about them, but this is too long already, and it doesn't bear on your question. This is how it comes -- by evolution, not by fiat -- that incoming texts are checked with _several_ tools, according to what seems appropriate for the text, but most commonly with gutcheck and/or GuiGuts. Of course, we don't catch all the errors, but we mostly don't have to spend hours on each one anymore either. With texts from DP, we know that usually two people have gone through more-or-less the same list of checks that we do, so mostly we don't find much that needs querying. But still we give each one a once-over. Now, _which_ tools are going to get used by a WW will depend on the person and the text. "Text-checking" (scannos, letter-combinations, etc.) 
in gutcheck is pretty useless outside "normal" modern English prose, because of the false positives. You can switch it off by using the -t switch from the command line. Or, running through GuiGuts, in Fixup/Gutcheck options, just tick the -t option to disable. But there are also other checks like scannos and regexes in GuiGuts that may give a lot of false positives when run against a text heavy in dialect. So when you say "pass GuiGuts", I don't know exactly what you mean. The things that GuiGuts and gutcheck (and the various other checkers) note are _queries_, not pass/fail items. If the author wrote "beear", then that's what he wrote. Some functions (but I couldn't offhand give you a list of which) in GuiGuts may query it, and so might gutcheck, or GutAxe, or gutspell, or check-punct, or whatever. In fact, I'm surprised you got a comment about it at all, unless there were real errors in the text that could have been caught by the commonest of checks used today. Getting into discussion threads with submitters is a HUGE burner of time that, for the most part, the WWs don't have, so we don't start one except when we must. It's still a bit of an arms race between the producers and the checkers, whether those are WWs or PPVs. It doesn't matter whether you use one tool or another, so long as the result is at least good enough that whoever checks your file won't find any problems. I had a thread with a submitter recently in which I bounced a text, saying that I had spent 18 minutes to find the first error, and the submitter asked what I do and I said something like "Well, I run the standard checks, and I look at those and call up any extra checks I think might apply and I actually _read_ paragraphs from the text for about half an hour, and if I can't find any problems in that time, I consider it goes clean," and he said "OK, then next time, I just have to hold you off for 12 more minutes! :-)" The thing about this particular arms race is that it is beneficial. 
Because the producers are always trying to get it past the checkers clean, and the checkers are always trying to catch something wrong in the incoming texts, the overall quality level goes relentlessly up. If every checker could spend hours and hours on every text, it would go up more, but as many people on this list know, checking is hard and tiresome work, and people who are willing and experienced and good at it are always in demand, and there are always more texts coming in -- which is a GOOD thing! -- so we have to accept that there is only so much we can do in any given case. jim (Now tell me that all you wanted was the -t switch. :-)

From Bowerbird at aol.com Mon Jan 16 13:16:47 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 16 13:17:22 2006 Subject: [gutvol-d] Language free version of guiguts? Message-ID: <254.4cc4a1d.30fd673f@aol.com> seems like you should welcome new tools... anyway, when are scans going to go online? until the general public can compare e-texts to the page-scans, you simply aren't using the best "checker" at your disposal -- all of their eyeballs. "debugging is parallelizable" (a.k.a. distributable), but hey, _only_ if you actually _set_it_up_ that way... of course, if you _want_ to do all the checking yourself... -bowerbird

From darrenburnhill at hotmail.com Tue Jan 17 04:15:27 2006 From: darrenburnhill at hotmail.com (Darren Burnhill) Date: Tue Jan 17 04:16:13 2006 Subject: [gutvol-d] Yorkshire Dialect In-Reply-To: <20060116200003.40CF68C28F@pglaf.org> Message-ID: Hi, > There never was and never will be grammar or dictionaries for Yorkshire > dialect, Forgive me for pointing you toward things you already know. There are a few others (A Grammar of the Dialect of Windhill by Joseph Wright, etc.), but Folklore and Customs of the North Riding of Yorkshire by Richard Blakeborough is probably right up your street as it contains a substantial glossary. I do have a (reprint) copy that I will get round to OCRing, but I'm working my way through the 'Old Yorkshire' series first. Meanwhile there are a few copies in the Bradford system; http://www.biskit.yorks.com/ This may also be of interest: http://www.yorksj.ac.uk/dialect/ IME the dialectal variants still exist to this day. From sunny Shipley ;)

From hyphen at hyphenologist.co.uk Tue Jan 17 04:53:44 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Jan 17 04:54:40 2006 Subject: [gutvol-d] Yorkshire Dialect In-Reply-To: References: <20060116200003.40CF68C28F@pglaf.org> Message-ID: On Tue, 17 Jan 2006 12:15:27 +0000, "Darren Burnhill" wrote: |Hi, | |> There never was and never will be grammar or dictionaries for Yorkshire |> dialect, | |Forgive me for pointing you toward things you already know; | |There are a few others (A grammar of the dialect of Windhill by Joseph |Wright, etc.), but Folklore and Customs of the North Riding of Yorkshire by |Richard Blakeborough is probably right up your street as it contains a |substantial glossary. As I explained, but you snipped, there are massive differences between the dialects of the three Ridings. :-( not to mention the changes with time.
I would not consider any of the existing glossaries definitive. I generally use Kellett's The Yorkshire Dictionary. |I do have a (reprint) copy that I will get round to |OCRing, but I'm working my way through the 'Old Yorkshire' series first. |Meanwhile there are a few copies in the Bradford system; |http://www.biskit.yorks.com/ Great, let me know what else you are considering doing, so we do not start the same things. I have almost finished Ben Preston's "Dialect and other poems". I have partially done Yorksher Puddin' by John Hartley, and Yorkshire Folk Talk by Morris, harvested from http://www.genuki.org.uk/big/eng/YKS/Misc/Books/FolkTalk/ I hope eventually to do all John Hartley's work, but I doubt I will ever finish it. |From sunny Shipley ;) From windy Shelf ;-) -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From hyphen at hyphenologist.co.uk Tue Jan 17 06:13:09 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Tue Jan 17 06:14:23 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <254.4cc4a1d.30fd673f@aol.com> References: <254.4cc4a1d.30fd673f@aol.com> Message-ID: On Mon, 16 Jan 2006 16:16:47 EST, Bowerbird@aol.com wrote: | |seems like you should welcome new tools... If they work on my projects |anyway, when are scans going to go online? When I get tools which work half way OK, and can get the submission process sorted to my satisfaction. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From Bowerbird at aol.com Tue Jan 17 09:57:58 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Tue Jan 17 09:58:08 2006 Subject: [gutvol-d] Language free version of guiguts? Message-ID: <9d.6f74a718.30fe8a26@aol.com> dave said: > If they work on my projects ...
> When I get tools which work half way OK, and can > get the submission process sorted to my satisfaction. actually, dave, my post was directed at project gutenberg. :+) but i'm glad you're thinking along similar lines... -bowerbird

From hyphen at hyphenologist.co.uk Wed Jan 18 18:55:17 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Wed Jan 18 18:55:32 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <254.4cc4a1d.30fd673f@aol.com> Message-ID: On Tue, 17 Jan 2006 14:13:09 +0000, Dave Fawthrop wrote: | |Finally guiguts is as it stands unusable on my texts. No doubt I will find |other equally drastic problems

I have now played with gutcheck a bit more and worked out which of the specifically *American* tests are not true in Yorkshire dialect, or indeed in *English* (which translates to the Queen's English in American). Without the code, or indeed a working knowledge of Perl, these are based purely on the gutcheck false errors found. My computer languages are C, Fortran 77, Basic, some Pascal, and a little Cobol.

1. Gutcheck objects to words consisting of one or more consonants without a vowel. These are common in Yorkshire dialect: Th', T', etc. etc.; the apostrophe indicates missing letters.

2. Gutcheck objects to words of three vowels or more; eea and similar words occur in Yorkshire dialect.

3. Gutcheck gets its knickers completely in a twist on single quotes ('). It assumes that the single quote is speech, whereas in English (the Queen's English in American) single quotes are uncommon and double quotes indicate speech. Hartley especially uses double quotes for speech. It misinterprets apostrophes indicating missing unsounded letters as single quotes.

4.
Gutcheck also gets its knickers in a twist about double quotes and objects to " " I have not worked out why, but no doubt it will give me more sleepless nights; the reason will come to me at about 3 am :-(

5. Gutcheck also assumes that a line end is a paragraph end, which is not true in poetry, even American poetry. Speech commonly spans lines in all poetry, many lines in the poetry I work on, and may well span stanzas in Hartley's work.

No doubt I will find other systematic errors in the way Gutcheck works on my text. Sorry about the threading. Agent is sending things by email which I had meant to send to the list. :-( -- Dave Fawthrop "Intelligent Design?" my knees say *not*. "Intelligent Design?" my back says *not*. More like "Incompetent design". Sig (C) Copyright Public Domain

From donovan at abs.net Wed Jan 18 19:36:27 2006 From: donovan at abs.net (D Garcia) Date: Wed Jan 18 20:14:28 2006 Subject: [dp-pg] Re: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <254.4cc4a1d.30fd673f@aol.com> Message-ID: <200601182236.27702.donovan@abs.net> On Wednesday 18 January 2006 09:55 pm, Dave Fawthrop wrote: > On Tue, 17 Jan 2006 14:13:09 +0000, Dave Fawthrop > > wrote: > |Finally guiguts is as it stands unusable on my texts. No doubt I will find > |other equally drastic problems > > I have now played with gutcheck a bit more and worked out what some of the > specifically *American* tests are which are not true in Yorkshire dialect > are, or indeed *English* which translates to Queens English in American. > Without the code or indeed a working knowledge of Perl, these are based > purely on the gutcheck false errors found. My computer languages are C > Fortran77, Basic, some Pascal, and a little Cobol. Dave, perhaps you're labouring under a misapprehension here. Your comments seem to indicate some confusion. Jim Tinsley's gutcheck is pretty much 100% plain vanilla C code.
Steve (thundergnat's) guiguts is 100% perl, with the ability to act as a front end interface to external programs such as gutcheck. Source code to both is readily available and included in the downloads last I checked. I'm not sure what you expect either of those developers to do about your situation. It seems to me that you are trying to use a hammer when what you really need is a 4mm Torx driver. Obviously, both programs were developed with the intent of checking the most common texts submitted to PG--English. Others with specialised needs (such as decrufting poorly OCR'ed Fraktur, or old long-ess texts) have developed their own specialised tools for those specific purposes. They had the subject matter expertise and the technical skills to implement these. Quick Google searches will reveal other similar tools directed at their niches. I completely support the preservation of strongly localized texts such as those you are working with. Have you considered applying your skills in C and Yorkshire to create a customised version of gutcheck for your needs? From hyphen at hyphenologist.co.uk Thu Jan 19 01:32:28 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Thu Jan 19 01:32:42 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <20060119011933.GA16170@panix.com> References: <20060119011933.GA16170@panix.com> Message-ID: <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> On Wed, 18 Jan 2006 20:19:33 -0500, Jim Tinsley wrote: |On Wed, 18 Jan 2006 11:44:46 +0000, Dave Fawthrop wrote: | |>On Mon, 16 Jan 2006 15:54:37 -0500, Jim Tinsley |>wrote: |> |>|On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop wrote: |>| |>| |>|>I am told by the whitewashers that it is *essential* that all text for PG |>|>pass guiguts. Because this assumes that the language scanned is American |>|>it gives 90% plus false positive errors, on my books, which is totally |>|>unsatisfactory for any piece of test software. |>|> |>|>Is there a language free version of Guiguts? 
|>| |>|I'm not quite sure which question you're asking, and about which |>|checking tool, but I think there is some confusion somewhere, of |>|emphasis if not of fact, and I'm continually surprised by people who |>|don't know the origins of really quite recent procedures I remember |>|vividly, and I've had several threads recently about this general |>|subject of checking, so please bear with me while I regurgitate |>|history. I hope you'll find a satisfactory answer in here somewhere. |> |>I could only find one tool which shows on my Win XP computer that is |>guiguts. This as far as I can ascertain has various subroutines which |>are very badly tied together, and in no way at all follow the Windoze |>interface. |> |>|Anybody can use any programs they like to make texts, and different |>|people do use different tools, according to their own needs or the |>|needs of the individual texts. Considering that we get French and |>|German and Esperanto and Chinese texts, not to mention older English, |>|there is no one-size-fits-all solution for language. |> |>To get things past whitewashers one apparently must use this, or things get |>rejected. Your assertion is therefore clearly theoretically correct, but |>in reality absolutely wrong | |Now, this I can flatly deny. There is no such thing as a standard without a test to show that the standard has been met. Schools teach to the exam, which some think wrong, but it happened to me and judging by the media brouhaha still happens in the UK. I was an engineer, and there a draughtsman who failed to put a test (tolerance) on anything in a drawing was exposed to public ridicule. If such a drawing got onto the shop floor the production departments would deliberately fail to follow any reasonable tolerance. | I can think of half-a-dozen people |offhand, regular producers, who don't use gutcheck in any form. |They don't need to. Their quality standards are high enough that |it won't find any real errors.
I do run it on their texts, as a |matter of form, but I know in advance what the result will be. |For all I know, there are others, equally good, but I don't know |that they don't use gutcheck because the subject never comes up. |Most of us, of course, are not that good. You have just admitted that gutcheck is the standard on PG. | |Bill Flis, who wrote the GutWrench package, uses his own checkers |exclusively, and I know equally well that I won't find any errors |that can sanely be caught by automation in his texts either. You can |find them, if you're interested, at http://www.pgdp.net/tools/GW.zip | |>|Once, there were no checking tools at all, except for spellcheckers |>|built into Word Perfect and Word, which is what most people used, and |>|I could tell you some stories about having to convert those! |>| |>|David Price and Martin Ward and I made checkers that we used for |>|ourselves. There may have been others, but those are the ones I'm |>|aware of. Everything else was Mark One Eyeball. |>| |>|I had done a lot of cleaning-up work on a lot of texts for various |>|people, and I would then send those on to Michael for posting. They |>|would commonly take hours of work each. In self-defense, I wrote a |>|checker I (later) renamed to gutcheck. When the WWs were formed in |>|2001, I brought gutcheck with me, and we all used it to find errors |>|quickly in incoming texts. |> |>But gutcheck gives 90% plus false positive errors, many hundreds on my |>texts in Yorkshire Dialect, mostly poems. It enforces the American |>language, and American punctuation conventions. It objects to most |>Yorkshire abbreviated words such as t' which occur dozens of times |>in the poems I work on. It also objects to non standard punctuation |>which occur in my texts as an example "? whereas American convention |>apparently is ?" . |> |>Writing as one who has designed, written and sold language software for |>some 20 years (see my web site). 
The *first* stage in the design of any |>software involving language is how other languages will be treated. This |>is usually done by putting all the features of one language, in a specific |>data structure(s) and/or subroutine(s) which can be used or not as |>required. |> |>All I asked for was a copy of gutcheck with the features specific to |>American removed which should be a very short editing and recompiling job. | |I'm not sure how you define "American", but ALL gutcheck features are |language-specific, one way or another. You really appreciate this when |checking Hebrew or Tagalog! Even the relatively familiar French, |German and Spanish have various punctuation features quite |incompatible with gutcheck's assumptions. I'm talking with various |LOTE producers about language-specific versions, but have not yet |decided to take any action. Then gutcheck should be modified to have versions for many languages. If you read the Subject of this thread, you will find: "Language free version of guiguts?" | |>Worse the only way to view output is on a screen. Copy does not work so it |>is impossible to copy the output to a text file and edit the repeated false |>positives out of the list. It is totally unacceptable to distribute a GUI |>program where the standard Copy and Paste functions do not work |> |>Worse still and absolutely ***unforgivable*** in any GUI program the |>settings places the settings file on ***THE DESKTOP***. Deleting it loses |>all settings. |> | |I can't comment on GuiGuts. As a command-line guy, I don't use it all |that much, except sometimes, when I find some specific feature |invaluable. If you want to comment, the appropriate place is in the |GuiGuts thread of the Tool Development forum at DP, which Steve reads |and answers questions and requests in. |http://www.pgdp.net/phpBB2/viewforum.php?f=13 I have asked the question here. I do not do forums. 
|>|Up till then, there was really no difference between DP and Other |>|texts, though because the people who mostly submitted from DP were |>|experienced, and because DP favored simple texts |> |>DP is by its nature not suitable for my texts, because the language is as |>different from American as say French. A non-Tyke (Yorkshireman), as has |>been shown in the past, has extreme difficulty understanding the text. | |Well, considering that they regularly do several languages, I doubt if |Yorkshire dialect would stand out much. Right now, in round 1, I find: |English, German (math, with LaTeX), Finnish, French with Scots, Middle |English, Middle French, Portuguese, English with Ancient Greek, |Spanish, Italian, Dutch, German, English with Breton, French, Tagalog, |Latin, and I just know there's some Esperanto around somewhere. I know |they've also done Irish (sean-litriú?), because I had a hell of a time |finding all the correct characters for the UTF-8 version (and I'm |still not convinced about Tironian-et). Of course, if you want real |variety, you need to hit the European DP. | |>|And GuiGuts and gutcheck have accreted features ever since. If you |>|have GuiGuts, then you have gutcheck, since Steve bundles it with |>|GuiGuts -- and you also have a large number of other tools that may |>|or may not be useful for the particular text you're working on. |>| |>|There are many other checkers available as well, and I'd love to |>|ramble on about them, but this is too long already, and it doesn't |>|bear on your question. |> |>|This is how it comes -- by evolution, not by fiat -- |> |>Untrue! |>I am *forced* to use guiguts/gutcheck by the Whitewashers. | |I say again: not everyone does. Just eradicate all mistakes and nobody |will ever know what you used. | |>Gutcheck does not work on Windoze. | |It runs in a Win32 command prompt, but it doesn't have a GUI on any |platform.
"You have to be joking MAN" | |>| that incoming |>|texts are checked with _several_ tools, according to what seems |>|appropriate for the text, but most commonly with gutcheck and/or |>|GuiGuts. |> |> |>Finally guiguts is as it stands unusable on my texts. No doubt I will find |>other equally drastic problems |> |>As all my work goes on my own web site, and gets copied from there onto |>many other sites, PG is just a nice add on and could be ditched if it were |>to take too much effort. |> |>The text which WW objected to so strongly has been on my site for a couple |>of years, and absolutely *nobody* has noticed the ?errors? People read it |>for the dialect, not the punctuation. I have however had several |>appreciative emails. | |Well, I'm very familiar with that condition, but that's a whole |'nother argument. A text does not have to be perfect to be valuable. |We have many older texts, especially, that have many errors. That |doesn't make them useless. I handle most of the errata reports for PG, |and nearly all of then express appreciation for the availability of |the text, along with their handful of reported errors. I may find |another hundred or so problems when I check the text out, but these |readers never noticed them. Two million downloads a month, with (I |estimate) about one million errors among 17,000 books, and we get |about one errata report per day. | |And there are many people who do want to make etexts but don't want |to live within the constraints of PG -- some don't want the |quality-checking, some complain that we don't quality-check enough, |some don't want to work in plain text, some don't want to go through |the clearance procedures, and so on. | |We have 40 to 60 submitted texts in the average week, and three WWs |active to take them at the moment. If everything in an incoming text |is perfect, one of us will spend about an hour on it. Plus a load of |time on other activities. 
We can't accommodate everyone on everything, |and there is no doubt that the quality gets higher as time goes on, |because of the processing that we do. This is what we have to do, |to keep the operation moving and the quality high. Not everyone is |going to be happy with the process. Some will choose not to send their |texts to PG. I'm sorry about that. | |>|(Now tell me that all you wanted was the -t switch. :-) |> |>I am not going back to the bad old Unix days, when each program had to be |>learned individually. Come back Bill Gates. All is forgiven. | |Well, I say again, if you don't want to use it, you don't have to; |not everyone does, and especially not everyone does for all texts. |It's essentially a collection of regexes, selected to give, on |average, the best results for the most common type of PG files. |Many DPers who work on other types of texts just put together their own set |of regexes, and run them through GuiGuts or GutWrench or from a *nix |command line, whichever they prefer.

I do not do windoze programming. You are essentially saying that a non-programmer cannot work for PG. :-( Did you really mean that? You have agreed with me above that gutcheck is the standard which must be passed to get texts posted. I am just trying to find a version of that standard which will run on my machine, with the text. As I understand it, the answer to a perfectly reasonable request (see Subject) from PG was: ****************** ***GET STUFFED.*** ****************** I will look for a workaround. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/

From traverso at dm.unipi.it Thu Jan 19 02:21:45 2006 From: traverso at dm.unipi.it (Carlo Traverso) Date: Thu Jan 19 02:08:27 2006 Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> (message from Dave Fawthrop on Thu, 19 Jan 2006 09:32:28 +0000) References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> Message-ID: <200601191021.k0JALjx02910@pico.dm.unipi.it>

Some time ago I did some work to build a multilingual form of gutcheck (and I still think that it is a very reasonable aim), but I stopped when Jim refused the very idea that this should be done. I am still using my (now obsolete) version of gutcheck with the French customization. My idea was that some constants (for example, the list of vowels and the list of strings suspicious inside a word), instead of being hard-wired in the code, should be defined in header files included at compile time. If you want, I can try to update my version, and discuss extensions to other languages. Carlo Traverso

From sly at victoria.tc.ca Thu Jan 19 02:58:16 2006 From: sly at victoria.tc.ca (Andrew Sly) Date: Thu Jan 19 02:58:23 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> Message-ID:

First, some general musing here... It's interesting to see how as the number of people involved with PG in one way or another keeps growing, so does a general misunderstanding about the nature of the project. Somehow, the impression of some people is that it all runs like clockwork, and all little ambiguities are swiftly and efficiently dealt with. I can understand that when someone sees the sheer amount of what has been accomplished so far, it can be easy to assume that "of course, _this_ has been done--it wouldn't make sense otherwise." Realistically, the processes that are in place grew up over time, with volunteers doing their best to deal with the demands of the moment and string something together that would work.
And it is not static either; it keeps changing. I'm not kidding when I say that the few people who do the majority of the back-end stuff that keeps PG growing have a backlog of years of PG-related tasks to tackle. So, on the specific topic at hand... On Thu, 19 Jan 2006, Dave Fawthrop wrote: > You have just admitted that gutcheck is the standard on PG. Yes, it is used a lot. Often it's a very useful tool (sometimes even for non-English texts.) However, I would not call it a standard in the sense of being a "test" that a given text has to "pass" (such as a test for valid markup on an HTML file). Rather, it is a tool which just about every text being added to the collection is run through, as a way of 1) assessing the over-all level of the text, and 2) guarding against last-minute gremlins that do unexpected things to a text (and yes, interesting things do happen sometimes.) I have submitted some German and French texts to PG which I have reformatted from other sources, and, as expected, a run through gutcheck resulted in many places being questioned that were just fine in the given languages. So, if I thought it needed, I just added a note when submitting the texts that "gutcheck flags a lot of false positives on this one." It looks like the source for gutcheck is available at http://gutcheck.sourceforge.net/ if you are interested in modifying it for your own uses. (If you are just dealing with one or two texts, it might not be worth the bother, but if you foresee working through lots of Yorkshire text, it could be more worthwhile.) ...... So, will the conditions I discussed above change? Well, PG is certainly more organized in some ways than it used to be, and I could see it going further in that direction. However, I don't realistically see it ceasing to be run by volunteers, which does set some of the tone. I'm not pretending that I think PG is perfect here.
Like anyone else who is involved, I have my own issues (one of my pet peeves is if the stated character encoding in the header does not match what is actually in the text), but I know they will not likely be dealt with unless I go ahead and try to work on them. I've found a good approach is building consensus with others. From the cataloging point of view, I've regularly had help from native speakers of various languages (Finnish and Tagalog spring to mind) which has helped me to make bibliographic data more precise than I ever could have managed on my own. As well, I've occasionally sent queries to the reference desks of libraries in many corners of the world. If I can get it organized, I'm hoping to make a sub-project where I can target a few Wikipedia users who have indicated they have fluency in both English and Chinese, and give them a way to help improve the consistency of the author data for our Chinese texts. I'd better stop now, before I meander off-topic too much... But I hope this has helped somewhat. And thanks for caring about Project Gutenberg. :) Andrew

From blondeel at clipper.ens.fr Thu Jan 19 02:25:47 2006 From: blondeel at clipper.ens.fr (Sebastien Blondeel) Date: Thu Jan 19 02:58:26 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <200601191021.k0JALjx02910@pico.dm.unipi.it> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> Message-ID: <20060119102547.GA10893@clipper.ens.fr>

I have developed programs to help me proof faster/better. I work mainly in French but they seem to work well in other Latin-alphabet languages (I tried them a little in English, Spanish).
http://www.pgdp.net/phpBB2/viewtopic.php?p=158673#158673 (get in touch with me if you want to give them a try; the CVS-committed version is not the very latest one) I use them to do R1/R2, P1/P2, and, recently, P0, that is to say quick preparation of OCR'd texts before publication on PGDP Int'l. I define language-related things (constants, suffixes, prefixes). Right now, as I am apparently the only user and developer of these programs, there are many special cases for French. But it could be easy to add things for other languages. As an example, a French rule is: the word is accepted if it starts with "j'" and continues with a vowel and the rest is an accepted word. For example: "j'aime" (I love) is accepted because "aime" (love) is. "j'arbre" (I tree) is accepted because "arbre" (tree) is. This means nothing, of course, but a proofer is bound to spot that: it is not a scanno (and not likely to happen in OCR anyway). Kicking some grammatical checks in would be the next step. Right now the programs are just working on a syntactical basis. I have a list of French words with all their possible grammatical natures (noun / adjective / conjugated verb for this tense and this person...) but unfortunately it was published by ABU under a restrictive license which makes it difficult for me to repackage and reuse. The free list of words I found in Debian packages is very incomplete (it is missing many passé simple conjugated verbs, and most if not all imparfait du subjonctif forms...) In English we could for example decide "X's" is accepted if "X" is (and "X" does not finish with an "s"). I am planning to think and develop or reuse things to do PM later on, probably focusing more or less on producing XML TEI. From joshua at hutchinson.net Thu Jan 19 05:32:36 2006 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu Jan 19 05:32:38 2006 Subject: [gutvol-d] Language free version of guiguts?
Message-ID: <20060119133236.3E2F19E851@ws6-2.us4.outblaze.com> > > I do not do windoze programming. > You are essentially saying that a non programmer can work for PG. :-( > Did you really mean that? > > You have agreed with me above that gutcheck is the standard which must be > passed to get. I am just trying to find a version of that standard which > will run on my machine, with the text > Well, I'll try to be nice even though you are being very confrontational... No one but you ever said any of the above. 1 - GutCheck is *not* required. Jim, who is probably the end-all on the subject since he wrote the thing, flat out said it is not required. 2 - GuiGuts and GutCheck are *not* the same thing. GuiGuts is a text editor written in Perl. GutCheck is a text checker written in C. You can run GutCheck from GuiGuts (among many, many other things). 3 - Asking for GuiGuts support here is a waste of time. The developer of GuiGuts isn't here; he is on the DP forums. Which you've flat out refused to go to. Fine, just don't expect help for a Dell laptop when you call HP tech support, either. 4 - No one said you had to work/create tools for PG. But if you want a tool for something that doesn't currently exist, you either create it yourself or do without (or wait until someone else needs it and decides to do the work you don't want to do). Personally, I've done all three over the years. 5 - *AND MOST IMPORTANT!* GutCheck is not a test that must be passed. It is better thought of as a checker that will flag things that are wrong more often than right. You should run it, because it *will* help you find mistakes that exist in your text. You, as an intelligent, thinking human being, must check each item and verify whether it is correct. Sometimes it is right. Sometimes it is wrong. If the system was 100% infallible, we wouldn't need humans anywhere in the process.
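The elision rule Sebastien describes earlier in this thread can be sketched roughly as follows. This is only an illustrative sketch: the tiny lexicon and the `accepts` function name are invented here, and a real checker would load a full French wordlist.

```python
# Sketch of the rule: a token such as "j'aime" is accepted when it starts
# with "j'", continues with a vowel, and the remainder is itself an
# accepted word. The lexicon below is a stand-in for a real wordlist.

VOWELS = set("aeiouyàâéèêëîïôùû")

def accepts(word, lexicon):
    """Accept a word directly, or as a j'-elision of an accepted word."""
    word = word.lower()
    if word in lexicon:
        return True
    if word.startswith("j'"):
        rest = word[2:]
        return bool(rest) and rest[0] in VOWELS and rest in lexicon
    return False

lexicon = {"aime", "arbre", "porte"}
print(accepts("j'aime", lexicon))   # True
print(accepts("j'arbre", lexicon))  # True -- nonsense, but a proofer spots that
print(accepts("j'porte", lexicon))  # False -- "p" is not a vowel
```

The same shape would cover the English "'s" idea mentioned in the thread: strip the suffix and look up what remains.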
Josh From Gutenberg9443 at aol.com Fri Jan 20 15:31:58 2006 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Jan 20 15:32:04 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <1c8.39607e27.3102ccee@aol.com> In a message dated 1/12/2006 12:15:48 P.M. Mountain Standard Time, robsuth@robsuth.plus.com writes: enquiries as far as I could, but found no product among the DVD portable readers that would deal with text, nor was any of the manufacturers interested in producing one. Apart from the French reader Cybook (which I think is too expensive for the task and market although otherwise for the most part ideal) there seems still to be no special ebook reading device available in UK or the EU generally: laptops are still too big and heavy, and PDAs still have far too small a screen My enquiries confirmed my strong impression that there are protectionist interests holding this back, presumably in the interests of proprietary issues of ebooks. I suggest you go to eBookWise and FictionWise. Although they do not deal with DVD portable readers, they are seeking input into what people need and want, and would be very glad to hear from you. My husband, two of my daughters, several of my friends, and I all use the eBookWise reader which you can find for sale for about $100 at eBookWise.com. By using extra memory cards, you can build immense libraries, or you can use only the memory built into the device and build an immense library on your computer, realizing that you'll have to download often. I have six memory cards and would like to have about sixty more, but they aren't free. Anne Wingate -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060120/21ebeb56/attachment.html From Gutenberg9443 at aol.com Fri Jan 20 15:40:06 2006 From: Gutenberg9443 at aol.com (Gutenberg9443@aol.com) Date: Fri Jan 20 15:40:12 2006 Subject: [gutvol-d] Ebook reading devices Message-ID: <22b.51ffa60.3102ced6@aol.com> In a message dated 1/13/2006 10:07:27 P.M. Mountain Standard Time, imaclean@gmail.com writes: It directly supports PDF, XHTML and Text (Unicode?) formats. An ebook that will support PDF is a good thing, but I wouldn't want it if it will support ONLY these formats. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060120/7132f2fb/attachment.html From bubblegirl at optusnet.com.au Fri Jan 20 15:55:25 2006 From: bubblegirl at optusnet.com.au (Season BubbleGirl - bubblegirl.net) Date: Fri Jan 20 19:09:21 2006 Subject: [gutvol-d] Ebook reading devices References: <1c8.39607e27.3102ccee@aol.com> Message-ID: <001801c61e1c$fced3740$0a01a8c0@bubblegirl> If you want a portable reading device, you can get a PDA and then use a screen magnifier. They are very good and magnify the typical screen by 5x. It's still portable, but a lot easier to read. I use one for everything. Here is my review of one: "This product is called a Screen Magnifier. Officeonthegogo.com are the only makers and retailers of this part, but I think it's worth it. At a price of $29.95US, you can view everything twice the size. How? Designers made a cradle-type structure that slots onto the back of the Pocket PC. A magnifying plate then hovers over the front of it on long arms. Users read and work through this plate, which does not actually touch the screen. It's a lightweight, easily handled product, compared to a normal magnifier you could drop and break. For Palm users, they customise them to the right size.
Because Office On The GoGo make them, they can alter them to the customers' needs. There are two types of stands the magnifying plate is put on. One being a backpack for the PDA, the other a proper stand. The backpack, as mentioned above, surrounds the PDA shell, where the stand props up the PDA for those using clip-on or infra-red keyboard. (I know you're wondering about keyboards - stay tuned for that information in an article coming soon). If you want both of these stands you can buy the "Magnifico Combo" for $39.95US. Tell Mike Sirius that Bubble Girl sent you. " PDA Patrol by Season BubbleGirl Hope this helps! Season BubbleGirl www.bubblegirl.net Where individuality truly shines! Author of A Doggy Diary and the coming autobiography, Life in a Bubble Creator of Music Mash, PDA Patrol, and other free literature at bubblegirl.net ----- Original Message ----- From: Gutenberg9443@aol.com To: gutvol-d@lists.pglaf.org Sent: Saturday, January 21, 2006 10:01 AM Subject: Re: [gutvol-d] Ebook reading devices In a message dated 1/12/2006 12:15:48 P.M. Mountain Standard Time, robsuth@robsuth.plus.com writes: enquiries as far as I could, but found no product among the DVD portable readers that would deal with text, nor was any of the manufacturers interested in producing one. Apart from the French reader Cybook (which I think is too expensive for the task and market although otherwise for the most part ideal) there seems still to be no special ebook reading device available in UK or the EU generally: laptops are still too big and heavy, and PDAs still have far too small a screen My enquiries confirmed my strong impression that there are protectionist interests holding this back, presumably in the interests of proprietary issues of ebooks. I suggest you go to eBookWise and FictionWise. Although they do not deal with DVD portable readers, they are seeking input into what people need and want, and would be very glad to hear from you. 
My husband, two of my daughters, several of my friends, and I all use the eBookWise reader which you can find for sale for about $100 at eBookWise.com. By using extra memory cards, you can build immense libraries, or you can use only the memory built into the device and build an immense library on your computer, realizing that you'll have to download often. I have six memory cards and would like to have about sixty more, but they aren't free. Anne Wingate ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d@lists.pglaf.org http://lists.pglaf.org/listinfo.cgi/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060121/3ab0b4f9/attachment.html From robsuth at robsuth.plus.com Fri Jan 20 17:58:48 2006 From: robsuth at robsuth.plus.com (Robert Sutherland) Date: Fri Jan 20 19:30:45 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <22b.51ffa60.3102ced6@aol.com> References: <22b.51ffa60.3102ced6@aol.com> Message-ID: <6.2.3.4.1.20060121015639.02ce3010@mail.plus.net> I entirely agree - I believe the minimum should be .txt, .rtf, .htm & .pdf, and of course that still leaves the proprietary formats if one is likely to want to use them. But at least conversion from other formats to .pdf is usually straightforward. Robert Sutherland ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ At 23:40 20-01-06, you wrote: >In a message dated 1/13/2006 10:07:27 P.M. Mountain Standard Time, >imaclean@gmail.com writes: >It directly >supports PDF, XHTML and Text (Unicode?) formats. > >An ebook that will support PDF is a good thing, but I wouldn't want >it if it will support ONLY these formats. >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > >No virus found in this incoming message. 
>Checked by AVG Free Edition. >Version: 7.1.375 / Virus Database: 267.14.21/236 - Release Date: 20-01-06 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060121/d259e112/attachment.html From robsuth at robsuth.plus.com Fri Jan 20 18:20:21 2006 From: robsuth at robsuth.plus.com (Robert Sutherland) Date: Fri Jan 20 19:30:48 2006 Subject: [gutvol-d] Ebook reading devices In-Reply-To: <1c8.39607e27.3102ccee@aol.com> References: <1c8.39607e27.3102ccee@aol.com> Message-ID: <6.2.3.4.1.20060121015910.02ce3740@mail.plus.net> Ah, But!! I live furth of the USA and Canada (in Scotland), and those readers are not available here (or anywhere in the EU, as far as I can make out). No one can tell me why - the providers all talk about incompatibilities, but don't explain what they are; others say the market is too small to justify it, which is plainly nonsense. However, I pin my hopes on the iLiad, which Philips (Netherlands) are likely to bring out in April, but the price is not yet announced. In the meantime I hump my ancient ThinkPad up and down to bed! The DVD player inquiry was just the tail end of an earlier search - in case one could be found that ran text - their screens are mostly big enough, they are much lighter than laptops and some are very cheap. But alas, none does text! The aggravation of the US/Canada restriction is that it is highly probable that, whatever the incompatibility is, it could be overcome by a simple adapter or by quite simple software of some kind. It is incomprehensible to me that the providers of these devices have not made some kind of modification which would open up this potential market to them - what is the population of the European Union? And India and China, even more: a huge market waiting. Robert Sutherland ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ At 23:31 20-01-06, you wrote: >In a message dated 1/12/2006 12:15:48 P.M.
Mountain Standard Time, >robsuth@robsuth.plus.com writes: >enquiries as far as I could, but found no product among the DVD >portable readers that would deal with text, nor was any of the >manufacturers interested in producing one. Apart from the French >reader Cybook (which I think is too expensive for the task and >market although otherwise for the most part ideal) there seems still >to be no special ebook reading device available in UK or the EU >generally: laptops are still too big and heavy, and PDAs still have >far too small a screen My enquiries confirmed my strong impression >that there are protectionist interests holding this back, presumably >in the interests of proprietary issues of ebooks. > >I suggest you go to eBookWise and FictionWise. Although they do not >deal with DVD portable readers, they are seeking input into what >people need and want, and would be very glad to hear from you. > >My husband, two of my daughters, several of my friends, and I all >use the eBookWise reader which you can find for sale for about $100 >at eBookWise.com. By using extra memory cards, you can build immense >libraries, or you can use only the memory built into the device and >build an immense library on your computer, realizing that you'll >have to download often. I have six memory cards and would like to >have about sixty more, but they aren't free. > >Anne Wingate >_______________________________________________ >gutvol-d mailing list >gutvol-d@lists.pglaf.org >http://lists.pglaf.org/listinfo.cgi/gutvol-d > >No virus found in this incoming message. >Checked by AVG Free Edition. >Version: 7.1.375 / Virus Database: 267.14.21/236 - Release Date: 20-01-06 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060121/62c9c6b0/attachment-0001.html From jtinsley at pobox.com Sun Jan 22 10:28:33 2006 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Jan 22 11:42:33 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <200601191021.k0JALjx02910@pico.dm.unipi.it> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> Message-ID: <20060122182833.GA25329@panix.com> On Thu, 19 Jan 2006 11:21:45 +0100, Carlo Traverso wrote: > >I have times ago done some work to build a multilingual form of >gutcheck, (and I still think that it is a very reasonable aim) but I >stopped when Jim refused the very idea that this should be done. "Opinions about Tolstoy and his work differ, but on one point there surely might be unanimity. A writer of world-wide reputation should be at least allowed to know how to spell his own name. Why should any one insist on spelling it "Tolstoi" (with one, two or three dots over the "i"), when he himself writes it "Tolstoy"? The only reason I have ever heard suggested is, that in England and America such outlandish views are attributed to him, that an outlandish spelling is desirable to match those views." Love that quote. From Louise Maude's Translator's Preface to "Resurrection". I really must re-scan that, if only to capture the image of Tolstoy's signature -- with a "y" -- above those words. I'm not a writer of world-wide reputation, of course, but I've recently heard such outlandish views attributed to me that I'm beginning to think of signing myself "Jim Tinslei", with, possibly, three decadent dots over the "i". So, did I ever "refuse the very idea that this should be done"? The society that is one of the referents of "Project Gutenberg", as I understand it -- and I'm not at all sure that I do -- is a pretty good model of a Libertarian society. 
It's even better than a Real Life Libertarian society, since anyone can opt out; try doing that next time your local tax-collector sends you a letter. People do (a) what they want to do and (b) what they think should be done strongly enough that they're willing to spend the hours of their lives doing it. Some people also do (c) what Other People want them to do. Occasionally, or usually. PPVs and WWs spend a lot of time on projects in which they, personally, have no interest. Toolmakers try to accommodate the people who use their tools. DP admins solve problems. Gravediggers move difficult projects along. Like that. The complex society that exists today in and around PG would all fall apart if some people didn't offer themselves as something of a "public utility" in some limited sphere. So I'm used to the idea that people write to me out of the blue asking for help or advice, just as I ask other "public utilities" within the project for help and advice. But in such cases I, or they, are free to refuse, or do something different. Project Gutenberg, however you define it, doesn't sign anybody's paycheck, or make anybody do anything at all. Neither do Michael or Greg as individuals. In fact, those two worthies would walk a country mile in tight shoes to assure you -- gesticulatingly -- that they have about as much influence over what I do as Uri Geller's daily horoscope has over the shape of Reese Witherspoon's toenails. What's more, having the experience of being a public utility in PG yourself, you know this better than most, which is why, when I dug your original email on the subject in July 2002 out of the dumpster of my archives, I was a little annoyed all over again that you had copied Michael and Greg on it. 'Sfunny: I didn't remember the thread, but when I saw the e-mail, I did remember that little sting of annoyance at the assumption that either of them had anything at all to do with my decisions about gutcheck. 
Which is, of course, as nothing to the annoyance gutcheck has inflicted over the years on various producers, so I guess, karmically, I have it comin'. People who "grew up" in DP were "born" with others looking over their shoulders, and so expect their homework to be corrected unmercifully. Everybody there has fully internalized the knowledge that they, and everyone else, makes mistakes, or, as Juliet more correctly and insightfully remarked, _overlooks_ mistakes. Your own recent excellent work on quantifying that will be invaluable in several ways. Producers who had been making certain kinds of mistakes for years without being aware of it, or having anyone correct them, though, fully appreciated the pun in the name. It is no fun at all having these things pointed out to you for the first time by someone else. Dave's comments are really quite temperate compared to many of the love-notes I received back in 2000-2002. My favorite was "DON'T YOU DARE RUN THIS THING OVER MY OLD TEXTS!!" I was more than somewhat sick myself when I first exposed some of my old work to jeebies, and saw the full extent of my own heebieness, but at least in that case, nobody else saw my shame, and I had no-one to be annoyed at but myself. I mention my annoyance because on re-reading what I wrote in response to your proposal, it does jump off the screen at me, and I apologize belatedly for that. It doesn't, however, have any bearing on my decisions then or now. What you actually proposed was that you should carve up gutcheck into separate files, dealing with separate languages. If there had ever been a day when I decided to sit down and write gutcheck, that's what I might well have done from Day One, but there never was such a day. To me, it's just a handy platform into which I can plug checks that I find useful. 
As I said at the time, and so often before and since, I don't actually think that the language-specific typo-checking functions should really be in there at all; every text needs a spellcheck, and for texts that have been spellchecked, these functions are only a source of false positives. That's why I added the -t switch when I started sending it out to other people. For me, they were handy as a quick way of getting a hint whether an incoming file had been spellchecked or not. Unfortunately, some producers lulled into a false sense of security by not seeing typos flagged in gutcheck didn't do the spellcheck, which was a problem I had to address around that time, but it seems to be resolved now. That's what I said, and that's what I still believe. I do think that punctuation checks for LOTE are an appropriate add-in, but as a devout monoglot, I'm in no position to define them. I don't have the experience of finding certain error-patterns by hand in LOTE texts. People have suggested specific changes like this from time to time, and I have usually incorporated them, where they don't cause problems somewhere else. A few days ago, I asked any PPVs who want certain punctuation type checks (or removal of existing checks) for LOTE to define some for me. We'll see what comes out of that. Until I see what the requested checks are, I'm not going to decide how to make the changes. I'm certainly not going to refactor the code, or commit myself to working with somebody else's refactored code, in advance of knowing in what way it needs to be changed. Reading over old emails is weird; it brings back context. You wrote to me when I was just setting up the SF site, to get it installed before I released the FAQ and to give the Software Site a permanent link. Up until then, people had got gutcheck directly from me, and often asked for individualized versions, which I mostly made for them. If the checks seemed good by my usual tests, I added them to "my" gutcheck as well. 
That was the way it worked in that era. I looked forward, at that time, to getting the damthing OUT, so that people could do their own customizing, and I would be free. Free!! Bwahahahah!! Heh. I wished you well in your own customization, and I still do. The volume of LOTE is much greater than it was then, and maybe somebody working in that area (those areas?) will do their own thing. Great! Maybe they'll ask me to customize some specific checks. Very occasionally, people do that still. Maybe some PPVs in specific languages will get together and suggest a coherent agenda to make gutcheck (or some variant thereof) friendly to those languages. I hope they do. They haven't yet. Until that happens, I have more than enough things I want to do, and think should be done, not to spend my limited PG time chasing Other People to tell me things they want me to do . . . or, for that matter, self-indulging in writing long posts to the vandalized wasteland that was once a productive resource for people making etexts. I really have been very lazy since the Christmas break. Back to the grindstone. jim From Bowerbird at aol.com Sun Jan 22 12:21:35 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sun Jan 22 12:21:40 2006 Subject: [gutvol-d] Language free version of guiguts? Message-ID: anyway, it's a good thing command-line programs are back "in" these days... thus gutcheck was able to just "skip over" that messy graphical-user-interface period, thank heaven... -bowerbird p.s. and no, i am _not_ suggesting that it was anyone's _individual_ responsibility or failure. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060122/3f793c38/attachment.html From hyphen at hyphenologist.co.uk Sun Jan 22 13:16:08 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Jan 22 13:16:20 2006 Subject: [gutvol-d] Language free version of guiguts? 
In-Reply-To: References: Message-ID: <8ft7t19c2cileed5otbp4bvctgqk18948u@4ax.com> On Sun, 22 Jan 2006 15:21:35 EST, Bowerbird@aol.com wrote: |anyway, it's a good thing |command-line programs |are back "in" these days... Not with me. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From hyphen at hyphenologist.co.uk Sun Jan 22 13:26:24 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Sun Jan 22 13:26:37 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <20060122182833.GA25329@panix.com> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> <20060122182833.GA25329@panix.com> Message-ID: On Sun, 22 Jan 2006 13:28:33 -0500, Jim Tinsley wrote: |What you actually proposed was that you should carve up gutcheck into |separate files, dealing with separate languages. If there had ever |been a day when I decided to sit down and write gutcheck, that's what |I might well have done from Day One, but there never was such a day. |To me, it's just a handy platform into which I can plug checks that I |find useful. IMO with the advent of huge memory in even the entry level computers, All tests should be in the one program, and the different language versions should be handled by simple switches/radio buttons, as with the various sorts of angle brackets ATM. OK the switches will inevitably become complex and difficult. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From jtinsley at pobox.com Sun Jan 22 13:39:44 2006 From: jtinsley at pobox.com (Jim Tinsley) Date: Sun Jan 22 13:39:46 2006 Subject: [gutvol-d] Language free version of guiguts? 
In-Reply-To: References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> <20060122182833.GA25329@panix.com> Message-ID: <20060122213944.GA21967@panix.com> On Sun, Jan 22, 2006 at 09:26:24PM +0000, Dave Fawthrop wrote: > >IMO with the advent of huge memory in even the entry level computers, All >tests should be in the one program, and the different language versions >should be handled by simple switches/radio buttons, as with the various >sorts of angle brackets ATM. OK the switches will inevitably become >complex and difficult. You're right, of course. I _think_ you might even go one better. I've used the occurrence of 50 instances of something recognizable as the English word "the" as an indicator that a file is (at least partly) in English, and a high number of certain types of characters to suggest that the file is in ISO-8859 or UTF-8, and a high number of strings within <> to indicate some flavor of *ML. I suspect that a similar technique might be useful in multilingual checkers in general, and if I wrote one I would certainly consider it. jim From hyphen at hyphenologist.co.uk Mon Jan 23 00:12:02 2006 From: hyphen at hyphenologist.co.uk (Dave Fawthrop) Date: Mon Jan 23 00:12:16 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: <20060122213944.GA21967@panix.com> References: <20060119011933.GA16170@panix.com> <0alus1ppjor5sk9vd7n843fer4r1411v9m@4ax.com> <200601191021.k0JALjx02910@pico.dm.unipi.it> <20060122182833.GA25329@panix.com> <20060122213944.GA21967@panix.com> Message-ID: On Sun, 22 Jan 2006 16:39:44 -0500, Jim Tinsley wrote: |On Sun, Jan 22, 2006 at 09:26:24PM +0000, Dave Fawthrop wrote: |> |>IMO with the advent of huge memory in even the entry level computers, All |>tests should be in the one program, and the different language versions |>should be handled by simple switches/radio buttons, as with the various |>sorts of angle brackets ATM. 
OK the switches will inevitably become |>complex and difficult. | |You're right, of course. | |I _think_ you might even go one better. I've used the occurrence |of 50 instances of something recognizable as the English word "the" |as an indicator that a file is (at least partly) in English, and |a high number of certain types of characters to suggest that the |file is in ISO-8859 or UTF-8, and a high number of strings within |<> to indicate some flavor of *ML. | |I suspect that a similar technique might be useful in multilingual |checkers in general, and if I wrote one I would certainly consider it. There has been a lot of academic work on detecting language by counting frequently used short words. All languages have a different set of frequently used short words. IIRC it is not particularly accurate, and naturally falls down on text in two or more languages; I have a book in Yorkshire and English on my desk ATM. IMO, asking the user which language he/she is using would be easier and more reliable. -- Dave Fawthrop 17,000 free e-books at Project Gutenberg! http://www.gutenberg.net For Yorkshire Dialect go to www.hyphenologist.co.uk/songs/ From kouhia at nic.funet.fi Wed Jan 25 11:39:46 2006 From: kouhia at nic.funet.fi (Juhana Sadeharju) Date: Wed Jan 25 12:43:58 2006 Subject: [gutvol-d] Re: Ebook reading devices Message-ID: The most important property of such a reader would be that there is no need to install any upload software on the computer. Only then could I go to a public library, browse, and download free ebooks to the reader. If the device has a USB cable, then the device should be seen by the computer as a USB/portable disk. The books saved to this USB/portable disk must then also be readable in the reader. Note, some MP3 players are seen as a USB/portable disk, but the music files saved to the disk are not playable. Some proprietary formats may have been reverse-engineered, but that is illegal in the USA and Europe.
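The short-word detection technique Jim and Dave discuss above might look something like the following sketch. The marker-word sets and the 50-hit threshold are illustrative only, not taken from gutcheck or any actual checker.

```python
# Sketch: count frequent short function words per candidate language and
# guess the language with the most hits. Below a minimum hit count, give
# up -- as Dave notes, asking the user is more reliable, and mixed-language
# texts defeat this kind of counting anyway.

import re
from collections import Counter

MARKERS = {
    "en": {"the", "and", "of", "to", "in"},
    "fr": {"le", "la", "les", "de", "et"},
    "de": {"der", "die", "das", "und", "ist"},
}

def guess_language(text, min_hits=50):
    """Return a language code, or None if no language scores enough hits."""
    words = Counter(re.findall(r"[a-zàâéèêëîïôùûäöüß']+", text.lower()))
    scores = {lang: sum(words[w] for w in markers)
              for lang, markers in MARKERS.items()}
    lang, hits = max(scores.items(), key=lambda kv: kv[1])
    return lang if hits >= min_hits else None

print(guess_language("the cat and the dog sat in the sun " * 20))  # en
print(guess_language("short sample"))  # None -- too little evidence
```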
In Europe, one is not even allowed to tell what unofficial software could be used to convert the text to the proprietary format. And reverse engineering would not help at all, because one cannot install anything in public libraries. Does eBookWise require a software installation? Sony devices are known for requiring a software installation. Note, the software-installation requirement hits digital cameras even harder. Most popular cameras require a software installation, and therefore one is not able, e.g., to transfer one's photos to safety, to home, via a public library, an internet cafe, or a hotel. Think about travelling and the possibility that your camera gets stolen or otherwise damaged, or that you run short of memory cards. Juhana -- http://music.columbia.edu/mailman/listinfo/linux-graphics-dev for developers of open source graphics software From imaclean at gmail.com Wed Jan 25 19:16:42 2006 From: imaclean at gmail.com (Ian MacLean) Date: Wed Jan 25 19:16:46 2006 Subject: [gutvol-d] Re: Ebook reading devices In-Reply-To: References: Message-ID: <3156339d0601251916x63e7b46ai884ecadac1f7c0fc@mail.gmail.com> On 1/26/06, Juhana Sadeharju wrote: > > The most important property of such a reader would be that > there is no need to install any upload software on the > computer. Only then could I go to a public library, browse, > and download free ebooks to the reader. If the device has > a USB cable, then the device should be seen by the > computer as a USB/portable disk. The books saved to this > USB/portable disk must then also be readable in the reader. > Both the Sony Reader and the iRex have an SD card slot. So you would only need to connect a card reader to the library computer to download. Whether the reader supports the format you choose to copy to the SD card is another matter.
Ian From Bowerbird at aol.com Fri Jan 27 12:22:51 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Fri Jan 27 12:23:07 2006 Subject: [gutvol-d] blah blah blog Message-ID: <90.6e2d8c3d.310bdb1b@aol.com> as promised, i'm morphing myself to my blah blah blog. don't want to talk about you behind your back, though... > http://journals.aol.com/bowerbird/bowerbirdseyeview -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060127/ae920118/attachment.html From hart at pglaf.org Sat Jan 28 10:17:08 2006 From: hart at pglaf.org (Michael Hart) Date: Sat Jan 28 10:17:12 2006 Subject: [gutvol-d] blah blah blog In-Reply-To: <90.6e2d8c3d.310bdb1b@aol.com> References: <90.6e2d8c3d.310bdb1b@aol.com> Message-ID: On Fri, 27 Jan 2006 Bowerbird@aol.com wrote: > as promised, i'm morphing myself to my blah blah blog. > don't want to talk about you behind your back, though... > >> http://journals.aol.com/bowerbird/bowerbirdseyeview > > -bowerbird > In re: to the top note and your reply: I read the PG eBooks with all sorts of plain text viewers and have no problems with inconsistencies, much less in the various browsers that have a wider range of options. Michael From Bowerbird at aol.com Sat Jan 28 12:00:23 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Sat Jan 28 12:00:27 2006 Subject: [gutvol-d] blah blah blog Message-ID: <202.11261918.310d2757@aol.com> michael said: > I read the PG eBooks with all sorts of plain text viewers > and have no problems with inconsistencies, much less in > the various browsers that have a wider range of options. the inconsistencies are ones that a person "wouldn't notice", but which trip up any automated processing by a program... an obvious example would be that most section-headings (e.g., chapter headings) are preceded by four blank lines, but the occasional one might have three, or five, instead... 
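Bowerbird's own routines were never posted, but the blank-line heading heuristic being described is simple enough to sketch. This is a hypothetical illustration, assuming the convention that a heading is any non-blank line preceded by at least three blank lines:

```python
def find_headings(text, min_blanks=3):
    """Return (line_number, blank_run, text) for every non-blank line
    preceded by at least `min_blanks` blank lines."""
    headings = []
    blanks = 0
    for i, line in enumerate(text.splitlines()):
        if not line.strip():
            blanks += 1
            continue
        if blanks >= min_blanks:
            # record how many blank lines preceded this heading
            headings.append((i, blanks, line.strip()))
        blanks = 0
    return headings
```

When the e-text is consistent, the `blank_run` count can be mapped directly onto a heading level; when the count wobbles between three, four, and five, this simple pass silently misfiles headings, which is exactly the fragility being complained about.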
nobody would claim that, in terms of a human reader, this inconsistency is meaningful -- it's not -- but when it comes to a program analyzing the file, it might make a difference... if there's only one level of header, then 3 or 4 or 5 blank lines might be equally good at signaling that there is a new section. but if a book has three different levels of headers, as some do, then you could use 5 blank lines to indicate the major sections, 4 to indicate regular sections, and 3 to indicate the subsections. if the number of blank lines isn't consistent, the program has to become much more sophisticated (and thus prone to failure) to try and determine the _actual_ level of each header. another example involves lines which should not be rewrapped, such as the lines in a table, or the lines in a letter's address-block. if these are consistently prefaced with one or more leading spaces, then a rewrap routine is easy to _write_ and easy to _comprehend_, and a programmer can spend time on more productive pursuits that add value and functionality, not ones that just resolve inconsistencies. lots of programmers have _started_ programs for the p.g. library. the vast majority of them have given up before long, in frustration. the inconsistencies in the formatting are the main source of difficulty. someday someone will set up a shadow version of the p.g. library where all the inconsistencies are resolved, and you will see then how much value is added by the ingenuity of programmers who are able to take consistent formatting of the e-texts for granted... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060128/0c77c7d4/attachment.html

From hart at pglaf.org Mon Jan 30 13:34:09 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Jan 30 13:34:11 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <202.11261918.310d2757@aol.com>
References: <202.11261918.310d2757@aol.com>
Message-ID: 

So, what you are telling me here is that while a human can muddle
through OK, it takes a computer to really mess things up.

On Sat, 28 Jan 2006 Bowerbird@aol.com wrote:

> michael said:
>> I read the PG eBooks with all sorts of plain text viewers
>> and have no problems with inconsistencies, much less in
>> the various browsers that have a wider range of options.
>
> the inconsistencies are ones that a person "wouldn't notice",
> but which trip up any automated processing by a program...
>
> an obvious example would be that most section-headings
> (e.g., chapter headings) are preceded by four blank lines,
> but the occasional one might have three, or five, instead...
>
> nobody would claim that, in terms of a human reader, this
> inconsistency is meaningful -- it's not -- but when it comes
> to a program analyzing the file, it might make a difference...
>
> if there's only one level of header, then 3 or 4 or 5 blank lines
> might be equally good at signaling that there is a new section.
>
> but if a book has three different levels of headers, as some do,
> then you could use 5 blank lines to indicate the major sections,
> 4 to indicate regular sections, and 3 to indicate the subsections.
>
> if the number of blank lines isn't consistent, the program has to
> become much more sophisticated (and thus prone to failure) to
> try and determine the _actual_ level of each header.
>
> another example involves lines which should not be rewrapped,
> such as the lines in a table, or the lines in a letter's address-block.
> if these are consistently prefaced with one or more leading spaces,
> then a rewrap routine is easy to _write_ and easy to _comprehend_,
> and a programmer can spend time on more productive pursuits that
> add value and functionality, not ones that just resolve inconsistencies.
>
> lots of programmers have _started_ programs for the p.g. library.
> the vast majority of them have given up before long, in frustration.
> the inconsistencies in the formatting are the main source of difficulty.
>
> someday someone will set up a shadow version of the p.g. library
> where all the inconsistencies are resolved, and you will see then
> how much value is added by the ingenuity of programmers who
> are able to take consistent formatting of the e-texts for granted...
>
> -bowerbird
>

From hart at pglaf.org Mon Jan 30 14:16:31 2006
From: hart at pglaf.org (Michael Hart)
Date: Mon Jan 30 14:16:33 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <43DE8E8E.5090900@novomail.net>
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net>
Message-ID: 

On Mon, 30 Jan 2006, Lee Passey wrote:

> Michael Hart wrote:
>
>>
>> So, what you are telling me here is that while a human can muddle
>> through OK,
>> it takes a computer to really mess things up.
>
>
> I think what he is saying is that the human brain is a highly capable,
> general purpose computing device, highly capable of resolving ambiguity,
> whereas computers and their associated software are still rather primitive
> devices. I believe that at some point in the future computers will be capable
> of resolving all the ambiguities inherent in PG e-texts, but that day is not
> yet here, and until then the software is going to require some human help.

Still, I don't see why the computer has to make all those decisions.

Can't it just lay there, out of the process, and just let me read?
;-)

From lee at novomail.net Mon Jan 30 14:09:18 2006
From: lee at novomail.net (Lee Passey)
Date: Mon Jan 30 14:45:33 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: 
References: <202.11261918.310d2757@aol.com>
Message-ID: <43DE8E8E.5090900@novomail.net>

Michael Hart wrote:

>
> So, what you are telling me here is that while a human can muddle
> through OK,
> it takes a computer to really mess things up.

I think what he is saying is that the human brain is a highly capable,
general purpose computing device, highly capable of resolving ambiguity,
whereas computers and their associated software are still rather primitive
devices. I believe that at some point in the future computers will be
capable of resolving all the ambiguities inherent in PG e-texts, but that
day is not yet here, and until then the software is going to require some
human help.

From lee at novomail.net Mon Jan 30 14:54:23 2006
From: lee at novomail.net (Lee Passey)
Date: Mon Jan 30 14:54:24 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: 
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net>
Message-ID: <43DE991F.5070801@novomail.net>

Michael Hart wrote:

>
> On Mon, 30 Jan 2006, Lee Passey wrote:
>
>> Michael Hart wrote:
>>
>>>
>>> So, what you are telling me here is that while a human can muddle
>>> through OK,
>>> it takes a computer to really mess things up.
>>
>>
>>
>> I think what he is saying is that the human brain is a highly
>> capable, general purpose computing device, highly capable of
>> resolving ambiguity, whereas computers and their associated software
>> are still rather primitive devices. I believe that at some point in
>> the future computers will be capable of resolving all the ambiguities
>> inherent in PG e-texts, but that day is not yet here, and until then
>> the software is going to require some human help.
>
>
> Still, I don't see why the computer has to make all those decisions.
> > Can't it just lay there, out of the process, and just let me read? > > ;-) It can, but it can also do more. Personally, my reading experience is improved if new chapters always start at the top of the screen, and if chapter and section headings are rendered in a way that makes it _obvious_ that they are chapter and section headings. When I read I like to become so engrossed that I don't have to stop and think about the mechanics of the layout. I _can_ do so, I just don't _want_ to. Obviously, Project Gutenberg e-texts are insufficient for me, just as they are adequate for you. But it is a fallacy to assume that because they are sufficient for you that they are sufficient for everyone. From jon at noring.name Mon Jan 30 14:56:26 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 30 14:56:26 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? Message-ID: <913752685.20060130155626@noring.name> Everyone, Even though D.H. Lawrence's "Lady Chatterly's Lover" was published in 1928, some of the publishing details in the following web site indicate the possibility it may be public domain in the U.S.: http://web.ukonline.co.uk/rananim/lawrence/lcl.html Anyone have more knowledge on the U.S. public domain status of this work? Jon From Bowerbird at aol.com Mon Jan 30 15:21:16 2006 From: Bowerbird at aol.com (Bowerbird@aol.com) Date: Mon Jan 30 15:21:25 2006 Subject: [gutvol-d] blah blah blog Message-ID: <2cc.27f269c.310ff96c@aol.com> lee said: > I believe that at some point in the future computers will be > capable of resolving all the ambiguities inherent in PG e-texts, > but that day is not yet here, and until then the software is > going to require some human help. well, my computer routines can already resolve _most_ ambiguities, as i pointed out in the blog entry. i've even offered to do this for p.g., providing p.g. makes a commitment to stay consistent in the future... 
but there seemed to me no recognition for the need for consistency, let alone any desire to make a commitment to attain it... *** michael said: > Can't it just lay there, out of the process, and just let me read? sure, it could. but sometimes -- in the course of reading, and even outside of it -- there are things you want to know, or find out, or have the computer do for you, or figure out for you. take my headings example, for instance. it's nice if the computer can figure out the headings for you, because then it can produce a nice "table of contents" menu, which you can look at to get a quick overview on the e-text, or use to jump directly to the "dance of the quadrille" chapter (even if you hadn't yet known that there was such a chapter)... or, as lee has remarked, if the program knows the headers, it can do nice things like start a chapter on a new screen, and make the header big and bold. these nice touches are... nice. or take my text-wrapping example, as yet another instance. sometimes you want to reflow the text to a narrow window. if the non-rewrap lines have been clearly indicated -- e.g., with the leading space that i suggested -- then the computer can do the rewrap for you quickly and easily, with no errors... (and this goes beyond "a nice touch" into basic functionality.) consistency also goes a _long_ way in making _conversions_. surely you must realize that many of your e-texts are finding their highest popularity as a result of some sort of conversion, whether over at blackmask.com or manytexts.com or wherever. i think it behooves you to groom your texts for such conversions, whether you do those conversions or someone else does them... (heck, the time that has been spent doing .html conversions alone could have been reduced _significantly_ by improving consistency and then depending upon automatic .html generation by computer. and then all of the .html versions would have been consistent too!) 
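The rewrap convention described above is simple enough to sketch: flush-left prose reflows, while any line that begins with whitespace (tables, verse, address blocks) passes through untouched. This is a hypothetical illustration of the idea, not anything PG or bowerbird actually ships:

```python
import textwrap

def rewrap(text, width=60):
    """Reflow flush-left prose to `width`, passing through blank lines
    and any line that starts with whitespace (the no-rewrap marker)."""
    out, para = [], []

    def flush():
        # emit the accumulated paragraph, reflowed to the target width
        if para:
            out.extend(textwrap.wrap(" ".join(para), width))
            para.clear()

    for line in text.splitlines():
        if not line:
            flush()
            out.append("")
        elif line[0].isspace():
            flush()
            out.append(line)  # protected: leading space means "do not touch"
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)
```

One leading space is all the markup such a routine needs, which is the point being made: the convention costs the transcriber almost nothing, and the consuming code stays trivial.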
unless you have a _commitment_ to removing the inconsistencies, though, you won't understand the depth of the processes required. for instance, staying with the rewrap example, you wouldn't realize that if you're creating a book of poetry, where _none_ of the lines should be rewrapped, you should preface _all_ lines with a space. other examples abound. if footnotes are marked consistently, the computer can be programmed to treat them appropriately. a "table of illustrations" can become a more-useful hotlinked list, or the launching point for a slideshow of all the book's pictures... basically, it means thinking in terms of the _library_, not the _book_. consistency is the foundation that can allow _all_kinds_ of neat new features, limited only by the imaginations of programmers. the reason you haven't seen this exhibited already is because programmers have been flustered by massive inconsistencies. even your own p.g. people can't produce the best possible tools to help them do their jobs because of all the parsing difficulties. *** so yeah, the computer could just lay there. but why not put it to work? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/d01e4bd2/attachment.html From grythumn at gmail.com Mon Jan 30 15:29:54 2006 From: grythumn at gmail.com (Robert Cicconetti) Date: Mon Jan 30 15:36:08 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <913752685.20060130155626@noring.name> References: <913752685.20060130155626@noring.name> Message-ID: <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> AFAICT, it's still under copyright in the US. Lawrence was British (which rules out Rule 5), and it was still under copyright (LIFE+70) in the UK in 1996, so GATT prevents Rule 6. 1930+95=2025 for the US. 
http://www.gutenberg.org/howto/copyright-howto

It appears to be clearable in LIFE+50 and LIFE+70 countries, though.

Anyone want to chip in with expansions or corrections?

R C

On 1/30/06, Jon Noring wrote:
>
> Everyone,
>
> Even though D.H. Lawrence's "Lady Chatterly's Lover" was published in
> 1928, some of the publishing details in the following web site indicate
> the possibility it may be public domain in the U.S.:
>
> http://web.ukonline.co.uk/rananim/lawrence/lcl.html
>
>
> Anyone have more knowledge on the U.S. public domain status of this
> work?
>
> Jon
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d@lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/ece74dc0/attachment.html

From grythumn at gmail.com Mon Jan 30 15:57:01 2006
From: grythumn at gmail.com (Robert Cicconetti)
Date: Mon Jan 30 15:57:07 2006
Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.?
In-Reply-To: <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com>
References: <913752685.20060130155626@noring.name>
 <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com>
Message-ID: <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com>

Actually, that would be 1928+95=2023, provided you find one of the
first few printings. Sorry. Still not PD in the US for quite some time.

R C

On 1/30/06, Robert Cicconetti wrote:
>
> AFAICT, it's still under copyright in the US. Lawrence was British (which
> rules out Rule 5), and it was still under copyright (LIFE+70) in the UK in
> 1996, so GATT prevents Rule 6. 1930+95=2025 for the US.
>
> http://www.gutenberg.org/howto/copyright-howto
>
> It appears to be clearable in LIFE+50 and LIFE+70 countries, though.
>
> Anyone want to chip in with expansions or corrections?
> > R C > > On 1/30/06, Jon Noring wrote: > > > > Everyone, > > > > Even though D.H. Lawrence's "Lady Chatterly's Lover" was published in > > 1928, some of the publishing details in the following web site indicate > > the possibility it may be public domain in the U.S.: > > > > http://web.ukonline.co.uk/rananim/lawrence/lcl.html > > > > > > Anyone have more knowledge on the U.S. public domain status of this > > work? > > > > Jon > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d@lists.pglaf.org > > http://lists.pglaf.org/listinfo.cgi/gutvol-d > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/3ddffcc8/attachment-0001.html From jon at noring.name Mon Jan 30 17:12:43 2006 From: jon at noring.name (Jon Noring) Date: Mon Jan 30 17:12:45 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> References: <913752685.20060130155626@noring.name> <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> Message-ID: <13372441.20060130181243@noring.name> Robert Cicconetti wrote: > Actually, that would 1928+95=2023, provided you find one of the > first few printings. Sorry. Still not PD in the US for quite some time. However, if one reads the link I provided previously, see: http://web.ukonline.co.uk/rananim/lawrence/lcl.html It says: "...As it was published privately [in 1928], no copyright was issued, so the novel was free game to any and all who wished to pirate their own editions, and the pirates could sell them for as much as possible." So, given this, is the original text public domain in the U.S.? If not, who is the "rights holder"? 
Jon From grythumn at gmail.com Mon Jan 30 19:39:17 2006 From: grythumn at gmail.com (Robert Cicconetti) Date: Mon Jan 30 19:39:22 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <13372441.20060130181243@noring.name> References: <913752685.20060130155626@noring.name> <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> <13372441.20060130181243@noring.name> Message-ID: <15cfa2a50601301939q7a7acd10y6b3561bdae3dbeee@mail.gmail.com> On 1/30/06, Jon Noring wrote: > > Robert Cicconetti wrote: > > > Actually, that would 1928+95=2023, provided you find one of the > > first few printings. Sorry. Still not PD in the US for quite some time. > > However, if one reads the link I provided previously, see: > > http://web.ukonline.co.uk/rananim/lawrence/lcl.html > > It says: > > "...As it was published privately [in 1928], no copyright was issued, > so the novel was free game to any and all who wished to pirate their > own editions, and the pirates could sell them for as much as possible." > > So, given this, is the original text public domain in the U.S.? If > not, who is the "rights holder"? > Then you'll have to establish that it wasn't under copyright in the UK in 1996 (Instead of using the blanket Life+70), and further search for renewals under US law for a Rule 6 clearance. An article would not suffice. If you wish to try to clear the book, feel free. It's not something that I'm particularly interested in. I would expect it to take a lot of time and effort both on your part and for Mr. Newby (who as we all know has gobs of free time). R C -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20060130/af069358/attachment.html From gbnewby at pglaf.org Mon Jan 30 21:44:52 2006 From: gbnewby at pglaf.org (Greg Newby) Date: Mon Jan 30 21:44:54 2006 Subject: [gutvol-d] "Lady Chatterly's Lover" public domain in the U.S.? In-Reply-To: <15cfa2a50601301939q7a7acd10y6b3561bdae3dbeee@mail.gmail.com> References: <913752685.20060130155626@noring.name> <15cfa2a50601301529s22d9772bje78f49a792185c14@mail.gmail.com> <15cfa2a50601301557u1dd66479jfd0b520307b523af@mail.gmail.com> <13372441.20060130181243@noring.name> <15cfa2a50601301939q7a7acd10y6b3561bdae3dbeee@mail.gmail.com> Message-ID: <20060131054452.GA1424@pglaf.org> On Mon, Jan 30, 2006 at 10:39:17PM -0500, Robert Cicconetti wrote: > On 1/30/06, Jon Noring wrote: > > > > Robert Cicconetti wrote: > > > > > Actually, that would 1928+95=2023, provided you find one of the > > > first few printings. Sorry. Still not PD in the US for quite some time. > > > > However, if one reads the link I provided previously, see: > > > > http://web.ukonline.co.uk/rananim/lawrence/lcl.html > > > > It says: > > > > "...As it was published privately [in 1928], no copyright was issued, > > so the novel was free game to any and all who wished to pirate their > > own editions, and the pirates could sell them for as much as possible." > > > > So, given this, is the original text public domain in the U.S.? If > > not, who is the "rights holder"? > > > > Then you'll have to establish that it wasn't under copyright in the UK in > 1996 (Instead of using the blanket Life+70), and further search for renewals > under US law for a Rule 6 clearance. An article would not suffice. > > If you wish to try to clear the book, feel free. It's not something that I'm > particularly interested in. I would expect it to take a lot of time and > effort both on your part and for Mr. Newby (who as we all know has gobs of > free time). 
> R C

I don't see any indication that this isn't covered by GATT, as
described in our Rule 6 "howto" at
http://www.gutenberg.org/howto/copyright-howto

No "registration" for copyright was/is required in the UK nor, indeed,
anywhere...it's a US invention that didn't catch on much elsewhere,
and in fact is no longer required in the US.

The GATT actually rolled back copyrights (or created enforcement
opportunities, anyway) for items previously published in the US
written by non-US authors. The Grove Press was one of several
publishers that specialized in these types of items. From what I
understand, it was legal then (or at least not very enforceable, if
illegal), but GATT made things crystal clear.

Our "Rule 6" howto is a highly distilled version of GATT prepared &
endorsed by some of our volunteer legal experts...

As mentioned earlier, this should be copyright-free in life+70
countries, as well as life+50.

BTW, the unpublished (or largely undistributed) manuscripts mentioned
might be even worse off! See our copyright howto for the dates. But
the limited distribution of those might be enough to not treat them
as unpublished manuscripts.

To clear this item as public domain, we'd need confirmation from one
of our legal experts (which I won't ask for, based on what I've seen
so far), or a letter from a qualified lawyer, or something similar
(like a letter from the Librarian of Congress, or Lawrence's estate).
Yes, a pretty high barrier.

Unless, of course, I'm missing something...
-- Greg

From hyphen at hyphenologist.co.uk Tue Jan 31 06:55:55 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Jan 31 07:00:06 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: 
References: <2e5.8ddc4c.30f9ebc2@aol.com>
 <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com>
 <43C963C5.4080400@xs4all.nl>
Message-ID: 

On Mon, 16 Jan 2006 17:13:12 +0000, Dave Fawthrop wrote:

|I am working on Yorkshire dialect poems and text, by John Hartley etext No
|17472 and have previously done some of F W Moorman's 3232, 2888 work.

I have just finished another of Hartley's dialect books, Hartley's
Yorkshire Ditties Second Series, and put it through Gutcheck. It is
a tiny book, 4 1/2 ins * 6 1/2 ins and only 143 pages, 3062 lines of
PG etext. Gutcheck throws 526 errors, all of which are wrong except
about 10 that are trying to correct errors in the original text. It
only found about a dozen real errors.

No more comment required.

-- 
Dave Fawthrop
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From bzg at altern.org Tue Jan 31 05:44:50 2006
From: bzg at altern.org (Bastien)
Date: Tue Jan 31 07:08:21 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <2cc.27f269c.310ff96c@aol.com> (Bowerbird@aol.com's message of
 "Mon, 30 Jan 2006 18:21:16 EST")
References: <2cc.27f269c.310ff96c@aol.com>
Message-ID: <87fyn43419.fsf@tallis.ilo.ucl.ac.uk>

Hi,

Bowerbird@aol.com writes:

> well, my computer routines can already resolve _most_ ambiguities,

Can we see the code? Can we use/test/improve it?

> whether over at blackmask.com or manytexts.com or wherever.

I can't find manytexts.com. Typo?

> so yeah, the computer could just lay there. but why not put it to
> work?

I wish I could put my computer to work testing your routines. Maybe I
missed a link, but I can't find any relevant entry in the archives.
Cheers,

-- 
Bastien

From matthew at mc.clintock.com Tue Jan 31 07:33:48 2006
From: matthew at mc.clintock.com (Matthew McClintock)
Date: Tue Jan 31 07:55:15 2006
Subject: [gutvol-d] blah blah blog
In-Reply-To: <87fyn43419.fsf@tallis.ilo.ucl.ac.uk>
References: <2cc.27f269c.310ff96c@aol.com>
 <87fyn43419.fsf@tallis.ilo.ucl.ac.uk>
Message-ID: <03BD8B4A-9204-417F-A35B-5FE27938DE5E@mc.clintock.com>

Maybe he means http://manybooks.net ? Sorry, I don't know the context
- I filter his posts, and only see everyone else's replies.

-Matt

On Jan 31, 2006, at 7:44 AM, Bastien wrote:

> I can't find manytexts.com. Typo ?

-- 
http://mc.clintock.com
http://manybooks.net
--

From hart at pglaf.org Tue Jan 31 08:00:02 2006
From: hart at pglaf.org (Michael Hart)
Date: Tue Jan 31 08:00:05 2006
Subject: !@!Re: [gutvol-d] blah blah blog
In-Reply-To: <43DE991F.5070801@novomail.net>
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net>
 <43DE991F.5070801@novomail.net>
Message-ID: 

On Mon, 30 Jan 2006, Lee Passey wrote:

> Michael Hart wrote:
>
>>
>> On Mon, 30 Jan 2006, Lee Passey wrote:
>>
>>> Michael Hart wrote:
>>>
>>>>
>>>> So, what you are telling me here is that while a human can muddle
>>>> through OK,
>>>> it takes a computer to really mess things up.
>>>
>>>
>>>
>>> I think what he is saying is that the human brain is a highly capable,
>>> general purpose computing device, highly capable of resolving ambiguity,
>>> whereas computers and their associated software are still rather
>>> primitive devices. I believe that at some point in the future computers
>>> will be capable of resolving all the ambiguities inherent in PG e-texts,
>>> but that day is not yet here, and until then the software is going to
>>> require some human help.
>>
>>
>> Still, I don't see why the computer has to make all those decisions.
>>
>> Can't it just lay there, out of the process, and just let me read?
>>
>> ;-)
>
>
> It can, but it can also do more.
> Personally, my reading experience is
> improved if new chapters always start at the top of the screen, and if
> chapter and section headings are rendered in a way that makes it _obvious_
> that they are chapter and section headings. When I read I like to become so
> engrossed that I don't have to stop and think about the mechanics of the
> layout. I _can_ do so, I just don't _want_ to.
>
> Obviously, Project Gutenberg e-texts are insufficient for me, just as they
> are adequate for you. But it is a fallacy to assume that because they are
> sufficient for you that they are sufficient for everyone.

Obviously this is a conversation from those who are demanding something
in terms of eBook preparation that they do not demand of paper books.

I've spoken with librarians who would prefer that all books be made out
of the same kinds of materials, the same kinds of paper, bindings, with
them all cut to the same size. . .just think how much that would help a
library with shelving, cart design, drop off slots, mailing boxes, etc.

Then again, I've spoken to library patrons who would prefer that all of
the libraries buy the same edition of the same books and shelve all the
books in exactly the same manner, so they can walk into ANY library and
just grab the first red book on the left and know what it will be.

Once again the major point is that most of the work has already been in
the system for you before you came along, and it is up to you to "take
matters into your own hands," as one put it, and do the minuscule work
that is required to make the books completely consistent with your own
philosophy of how eBooks should be created.

There's nothing wrong with what such people are asking, other than that
they are asking someone else to do it for them, free of charge.

"An Unfunded Mandate" as the politicians often refer to such things.
Those of us who have been on this list, and others, for very long
these days have no trouble remembering any number of people who have
had variety upon variety of requests that Project Gutenberg should be
run in such a fashion as to meet with their demands.

The response is always to invite the creation of some examples, along
a suggested pathway for future efforts, to be accompanied by your
request for others to assist you in making such future efforts.

It's one thing to ask for help with something you are doing, whether
it be a dozen examples per month or per year, until you finally get
all of the volunteers you need to make things happen the way you
would like.

It's totally something else to insist that rules should be made to
make others do things your way, whether they want to or not,
especially when they are already doing most of what you want.

Of course, there is a middle ground: write up the rules/suggestions
in such a way that anyone creating eBooks is likely to find them
before an eBook is created, and provide them with encouragement
through examples.

I don't think anyone would have an objection to each of the
participant elements in these discussions having a URL posted to link
to suggestions they have about eBook creation standards, as long as
all such suggestion files come complete with an ever increasing set
of examples; after all, if YOU are not convinced enough of your own
suggestions to carry them a short way every so often, how can you
expect others to carry them every single time they make an eBook?

The preferred solution at Project Gutenberg is to lead by example;
make your preferences known and provide a continuing set of examples.

Otherwise how will anyone know what you are encouraging them to do,
and the reasons you have for your requests?

In terms of making eBooks the way you want them to be:

"It is better to light a candle than to curse the darkness."

Provide examples.
Describe how these examples are better than previous examples, and how the previous examples are better than those that came before them. Make sure you describe the evolution of eBooks as a process, a process that can be improved from time to time. Then, if people like your new eBooks better than the old ones, you are very likely to find a dozen volunteers to help you provide even better eBook collections that will help you find even more volunteers. Give the world eBooks in 2006!!! Michael S. Hart Founder Project Gutenberg From ciesiels at bigpond.net.au Tue Jan 31 07:18:08 2006 From: ciesiels at bigpond.net.au (Michael Ciesielski) Date: Tue Jan 31 09:48:40 2006 Subject: [gutvol-d] Comments on "blah blah blog" In-Reply-To: <2cc.27f269c.310ff96c@aol.com> References: <2cc.27f269c.310ff96c@aol.com> Message-ID: <43DF7FB0.9050200@bigpond.net.au> Please keep comments on issues raised in bowerbird's "blah blah blog" off this list. There appears to be a facility for comments on the blog itself, or I'm sure bowerbird would be happy to hear from you directly by email. Thank you! From hiddengreen at gmail.com Tue Jan 31 10:23:04 2006 From: hiddengreen at gmail.com (Cori) Date: Tue Jan 31 10:23:12 2006 Subject: [gutvol-d] Language free version of guiguts? In-Reply-To: References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> Message-ID: <910fee4a0601311023h7a826156ke08a5e683f40e983@mail.gmail.com> On 1/31/06, Dave Fawthrop wrote: > I have just finished another of Hartley's Dialect Books, Hartley's > Yorkshire ditties Second Series and put it through Gutcheck. It is a tiny > book, 4 1/2 ins * 6 1/2 Ins and only 143 pages 3062 lines of PG Etext. > Gutcheck throws 526 errors. All of which are wrong, except about 10 are > trying to correct errors in the original text. It only found about a > dozen real errors. > > No more comment required. Indeed :) Catching a dozen real errors is a definite win! 
Plus, though it doesn't sound like it happened for you, I find that checking the false errors gives me a different view of the text, and I thus occasionally spot other things in it (usually sneaky scannos of the lie/he, ago/age type.) Thanks again for Gutcheck, Jim!

Cori

From hyphen at hyphenologist.co.uk Tue Jan 31 11:32:57 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Jan 31 11:48:22 2006
Subject: [gutvol-d] Anyone recommend a good free WYSIWYG html editor to PG standards
Message-ID:

Just turning my Hartley books into html. The generate-html function of guiguts did not do a bad job of generating html, but the poetry needs a lot of tweaking. It expects me to insert flags by hand :-( I gave up doing that in the bad old nroff days, way back in (OMG) 1986.

Anyone recommend a good free windoze html editor to PG standards? I normally use an old copy of M$ Front Page for html.

--
Dave Fawthrop
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From hyphen at hyphenologist.co.uk Tue Jan 31 11:49:13 2006
From: hyphen at hyphenologist.co.uk (Dave Fawthrop)
Date: Tue Jan 31 11:58:38 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To: <910fee4a0601311130i65126631l97a9c6ef51c6ab4a@mail.gmail.com>
References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> <910fee4a0601311023h7a826156ke08a5e683f40e983@mail.gmail.com> <910fee4a0601311130i65126631l97a9c6ef51c6ab4a@mail.gmail.com>
Message-ID:

On Tue, 31 Jan 2006 19:30:55 +0000, Cori wrote:
|On 1/31/06, Dave Fawthrop wrote:
|>
|Indeed :) Catching a dozen real errors is a definite win!
|>
|> 12 in 500-plus is a resounding failure.
|
|I think there might be a misunderstanding about the purpose of
|Gutcheck ...
if a text checking tool was provided that never gave me
|any false errors, I'd be convinced that it wasn't catching all it
|should be. Spellcheckers, or the barrage of regex checks that DP has
|developed, all flag up false positives on my books, but they couldn't
|be made much more effective without personalising them to each and
|every text -- which would take more time than just clicking through
|the false alarms..? The point of all these checks is to (hopefully)
|be over-sensitive to problems, rather than under-sensitive (thus
|leaving errors.)
|
|Or have I missed something in turn..? Do you have text checking tools
|that only ever signal real errors..? Can they be shared..?

With my other hat on, I write "intelligent" language software. Low 90% correct is very bad; above 99% correct is acceptable. For a voluntary organisation I would accept 50% correct.

--
Dave Fawthrop
"Intelligent Design?" my knees say *not*.
"Intelligent Design?" my back says *not*.
More like "Incompetent design".
Sig (C) Copyright Public Domain

From prosfilaes at gmail.com Tue Jan 31 12:04:30 2006
From: prosfilaes at gmail.com (David Starner)
Date: Tue Jan 31 12:04:33 2006
Subject: !@!Re: [gutvol-d] blah blah blog
In-Reply-To:
References: <202.11261918.310d2757@aol.com> <43DE8E8E.5090900@novomail.net> <43DE991F.5070801@novomail.net>
Message-ID: <6d99d1fd0601311204g15258afcxe6e276df9543eed8@mail.gmail.com>

On 1/31/06, Michael Hart wrote:
> Once again the major point is that most of the work has already been in
> the system for you before you came along, and it is up to you to, "Take
> matters into your own hands," as one put it, and do the minuscule works
> that are required to make the books completely consistent with your own
> philosophy of how eBooks should be created.
>
> There's nothing wrong with what such people are asking, other than that
> they are asking someone else to do it for them, free of charge.
>
> "An Unfunded Mandate" as the politicians often refer to such things.
Please don't bite people because they don't work the way you do. All he said is that Gutenberg books aren't laid out optimally for him. There's nothing wrong with someone discussing how things could be better, in their opinion. He didn't ask anyone to do anything for him.

From Bowerbird at aol.com Tue Jan 31 13:34:43 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 13:34:50 2006
Subject: !@!Re: [gutvol-d] blah blah blog
Message-ID: <12b.6d4ce464.311131f3@aol.com>

mikey said:
> Please keep comments on issues
> raised in bowerbird's "blah blah blog"
> off this list.

sorry to disturb your sleep, mikey. but you can go back to bed now...

***

bastien said:
> Can we see the code?

no.

> Can we use/test/improve it?

i'll release an app sooner or later that you can use. as far as the routines themselves, creating them took a _persistent_ application of common-sense and a lot of elbow-grease. that's all. which means you can generate them yourself, if you really want. i'll also be willing to give you some pointers if you should happen to hit a wall at any point, once you have shown me you are willing to take on the job.

or, if you prefer, you can take the shortcut, and buy my source code. the price is in the 6 figures. i can't afford to give away something that valuable. if you can, i suggest you buy it and give it away...

but really, what i want you to take away is that you _can_ do this, and it's really not even all that _hard_. for instance, i've talked about using the number of preceding blank lines as an indicator of heading level. assuming that such consistency has been maintained, even a beginning programmer could write the code to ascertain the structure of a book. care to try it? write it in pseudo-code first, and then in a language, any language with which you are comfortable, be it perl or python or ruby or basic or whatever you like.
it might be better to take it over to gutvol-p, so we don't put a coding session in all the gutvol-d boxes, but if you show me you're willing to do some work, i'll be more than happy to show you what will pay off. if you're not, then i'll refer you to the fables of aesop, specifically the one about the little red hen...

***

michael said:
> There's nothing wrong with
> what such people are asking,
> other than that they are asking
> someone else to do it for them,
> free of charge.

i know that michael is talking about "such people" as a general class, and i'm fairly confident he doesn't lump me in that class. but other people here might. so let me be absolutely clear here for them...

i'm not asking anyone to do anything "for" me. and i am more than willing to do the job myself. i said so, directly, on my blog, and i repeat it here. what i am doing is suggesting you make these changes for _yourselves_ and for _your_readers_, simply because consistency in the e-texts leads to greater functionality... my experience is that this greater functionality would benefit you with increased efficiency in the preparation of the e-texts, and benefit your readers in greater total usability of the e-texts.

> Those of us who have been on this list, and others,
> for very long these days have no trouble remember
> any number of people who have had variety upon variety
> of requests that Project Gutenberg should be run in
> such a fashion as to meet with their demands.

again, i make no "demands". just giving you tips. for your own good.

> The response is always to invite the creation of some examples,
> along a suggested pathway for future efforts, to be accompanied by
> your request for others to assist you in making such future efforts.

it is also the case that i _am_ putting examples online. i've already pointed to some, and more will come soon.
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.zml
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.html
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01b.pdf

other examples will be "my antonia", "books and culture", "free culture", "the universe or nothing", "the secret garden", and a handful of books by cory doctorow, for starters. as z.m.l. requires just a smattering of changes to the markup-free version of an e-text, hundreds of examples will follow before long, and thousands once my processes are refined... if i don't lose interest, i might have most of the library converted in a year.

from these plain-text .zml "masters" will emerge automatic .html versions -- from which a plethora of other formats will be able to be generated -- and automatic creation of .pdf versions according to user specifications... other sweetnesses might follow too, like ipod versions and p.s.p. versions. and last but not least, the z.m.l. viewer-app will create a kick-ass powerful high-functionality electronic-book experience using these .zml "masters". enough so that you'll wonder why you ever thought you needed "markup". (there's more -- scansets and banana-cream -- but that's enough for now.)

> Describe how these examples are better than previous examples

i have done, and will continue to do so, as appropriate times arrive...

-bowerbird

From Bowerbird at aol.com Tue Jan 31 14:59:23 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 14:59:33 2006
Subject: [gutvol-d] blah blah blog
Message-ID:

a concrete example might help...
here's the table of contents from "free culture" by lawrence lessig, in zen markup language format, generated automatically from a simple straightforward analysis in about one-half of a second...

even though there are 3 levels of headers, they are very clear, indicated by varying indentation (which represents, at the headers themselves, a varying number of preceding blank lines, of course.) text-structures even more complex than the one shown in this outline can be communicated easily by the number of preceding blank lines -- _if_ the rule is followed _consistently_ -- and grokked by routines consisting of just a few lines of dirt-simple code...

by the way, just to say something "obvious" that lee probably had not considered before, one of the many ways my routines determine the headers in a digitized text is to look for a "table of contents" section -- usually toward the start of the file, and usually marked with "contents" or "table of contents" as a header -- and then examine that section quite carefully. it ends up doing a very good job of telling you what specific phrases "might be" header-lines. and if you're cleaning up the o.c.r. of a p-book, for instance, there are usually _page-numbers_ there too, telling you what _page_ each header is on. pretty handy, eh?

indeed, in the .pdf of this book, which you can download at http://www.lessig.org, you will see that the page-numbers _are_ there, and chapter 11, chimera, for instance, starts on page 177. like i said, if you know what a header is likely to be, and on what page it is located, it's fairly easy to find. indeed, people have been using the "table of contents" for precisely that reason for several hundred years now.

this is just one of the reasons why it ain't that hard to write routines to ascertain the headers in a book. like i said, it sounds very obvious when you hear it. but have you ever heard anyone say it here before?
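The blank-line heuristic described in this thread can indeed be sketched in a few lines of Python. One caveat: the posts never pin down exactly how many blank lines mark each level, so the "two or more blanks means a heading" threshold below is an assumption, not a z.m.l. rule.

```python
def outline(text):
    """Collect candidate headings from plain text using the blank-line
    convention: a non-blank line preceded by two or more blank lines is
    treated as a heading, and the blank count hints at its level (more
    preceding blanks = higher-level heading).  The threshold of two is
    an assumption; adjust it to match the convention actually used.
    Returns a list of (preceding_blank_count, heading_text) pairs."""
    headings = []
    blanks = 0
    for line in text.splitlines():
        if line.strip():
            if blanks >= 2:
                headings.append((blanks, line.strip()))
            blanks = 0  # reset after any non-blank line
        else:
            blanks += 1
    return headings
```

Run on a chapter opening formatted this way, `outline` reports the chapter title (four preceding blanks) above its section titles (two preceding blanks), which is the whole "beginning programmer" exercise bowerbird proposes.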
-bowerbird

---------------------------------------------

TABLE OF CONTENTS

 Free Culture
    Table of Contents
    License
    Publisher Page
    Library of Congress Cataloging
    Dedication
    Preface
    Introduction
    'Piracy'
       Chapter 1: Creators
       Chapter 2: "Mere Copyists"
       Chapter 3: Catalogs
       Chapter 4: "Pirates"
          Film
          Recorded Music
          Radio
          Cable TV
       Chapter 5: "Piracy"
          Piracy I
          Piracy II
    'Property'
       Chapter 6: Founders
       Chapter 7: Recorders
       Chapter 8: Transformers
       Chapter 9: Collectors
       Chapter 10: "Property"
          Why Hollywood Is Right
          Beginnings
          Law: Duration
          Law: Scope
          Law and Architecture: Reach
          Architecture and Law: Force
          Market: Concentration
          Together
    Puzzles
       Chapter 11: Chimera
       Chapter 12: Harms
          Constraining Creators
          Constraining Innovators
          Corrupting Citizens
    Balances
       Chapter 13: Eldred I
       Chapter 14: Eldred II
    Conclusion
    Afterword
       Us, Now
          Rebuilding Freedoms Previously Presumed: Examples
          Rebuilding Free Culture: One Idea
       Them, Soon
          More Formalities
          Shorter Terms
          Free Use Vs. Fair Use
          Liberate the Music -- Again
          Fire Lots of Lawyers
    Footnotes
    Hyperlinks
    Acknowledgments
    Index
    About the Author
    Jacket
    Typos Corrected
    Permissions
    The Dead-Tree Hardback Version of this Work

zero markup language -- z.m.l. -- the future of electronic-books

---------------------------------------------

p.s. extra points for everyone who realized that -- since the lines in the table of contents section are not to be rewrapped -- that is the reason that all are prefaced with at least one leading space...
From Bowerbird at aol.com Tue Jan 31 15:32:46 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 15:32:54 2006
Subject: [gutvol-d] coold nothav e
Message-ID: <1c4.390ca2f5.31114d9e@aol.com>

mihceal siad:
> So, what you are telling me hre, ist hat
> while a human can muddle through ok,
> it takes a computer to really maess things up.

i coold nothav esaidit bettter myyyself. :+)

-bowrebrid

From prosfilaes at gmail.com Tue Jan 31 16:27:13 2006
From: prosfilaes at gmail.com (David Starner)
Date: Tue Jan 31 16:27:16 2006
Subject: [gutvol-d] Language free version of guiguts?
In-Reply-To:
References: <2e5.8ddc4c.30f9ebc2@aol.com> <3156339d0601140820kf0ff17dsf5894499b1d8237e@mail.gmail.com> <43C963C5.4080400@xs4all.nl> <910fee4a0601311023h7a826156ke08a5e683f40e983@mail.gmail.com> <910fee4a0601311130i65126631l97a9c6ef51c6ab4a@mail.gmail.com>
Message-ID: <6d99d1fd0601311627m4059cef9xf9dccb3ea4870627@mail.gmail.com>

On 1/31/06, Dave Fawthrop wrote:
> 12 in 500plus is a resounding failure.

Then don't use it. For many users, 12 out of 500 is good enough to make the program useful. No program is ever going to handle dialect well, because dialect doesn't follow the normal rules.

> With my other hat on I write "intelligent" language software, Low 90%
> correct is very bad, above 99% correct is acceptable. For a voluntary
> organisation I would be accept 50% correct.

Intelligent language software is too broad; you write code to automatically hyphenate words. A concrete problem like that is significantly easier than a problem like "find errors in this text document". 50% of the errata sent by humans to errata@pglaf.org are wrong; how do you expect a computer to do better?
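The precision trade-off being argued in this thread is easy to make concrete with a toy line checker in the spirit of gutcheck. To be clear, the rules and the scanno list below are stand-ins of my own invention, not gutcheck's actual checks; the point is only that deliberately over-sensitive rules will flag real words like "lie" alongside real errors.

```python
import re

# Hypothetical OCR confusions ("scannos") of the kind mentioned above.
# This tiny table is illustrative, not a real scanno database.
SCANNOS = {"lie": "he", "ago": "age", "arid": "and"}

def check_line(line):
    """Return (rule, message) warnings for one line of text.
    Over-sensitive by design: it flags possibilities, not certainties,
    which is why a human must review every warning it emits."""
    warnings = []
    if re.search(r"\s[,;.!?]", line):
        warnings.append(("spacey-punct", "space before punctuation"))
    if line.count('"') % 2:
        warnings.append(("unbalanced-quote", "odd number of double quotes"))
    for scanno, intended in SCANNOS.items():
        if re.search(r"\b%s\b" % scanno, line):
            warnings.append(("scanno",
                             "'%s' may be a scanno for '%s'" % (scanno, intended)))
    return warnings
```

On the line `He told a lie , she said "go` this flags three things, and only two are genuine problems: "lie" is a legitimate word here, which is exactly the kind of false positive the thread is weighing against the errors such checks do catch.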
From Morasch at aol.com Tue Jan 31 17:02:40 2006
From: Morasch at aol.com (Morasch@aol.com)
Date: Tue Jan 31 17:02:48 2006
Subject: [gutvol-d] Language free version of guiguts?
Message-ID: <2ea.9b2402.311162b0@aol.com>

it's fair to say that gutcheck is an excellent piece of software. it's also fair to say that it returns far too many false positives. it's also fair to say that we _could_ have hoped that gutcheck's status as "open source" would have led it to be _improved_ far more often than it has been, given its level of importance. it's also fair to say that it wouldn't be that hard to improve it, if someone _decided_ to, as the problems are not intractable. it's also fair to say that if nobody improves it, it won't be improved.

all those things are fair to say.

-bowerbird

From Bowerbird at aol.com Tue Jan 31 17:10:05 2006
From: Bowerbird at aol.com (Bowerbird@aol.com)
Date: Tue Jan 31 17:10:13 2006
Subject: [gutvol-d] Language free version of guiguts?
Message-ID: <1ef.4b0b5db7.3111646d@aol.com>

i said:
> it's fair to say

yep, that's me, and not some rogue impostor, just mailed from my girlfriend's account, so you know.

-bowerbird

From imaclean at gmail.com Tue Jan 31 21:20:57 2006
From: imaclean at gmail.com (Ian MacLean)
Date: Tue Jan 31 21:21:00 2006
Subject: !@!Re: [gutvol-d] blah blah blog
In-Reply-To: <12b.6d4ce464.311131f3@aol.com>
References: <12b.6d4ce464.311131f3@aol.com>
Message-ID: <3156339d0601312120l3d6060cao114047a55d42f13b@mail.gmail.com>

On 2/1/06, Bowerbird@aol.com wrote:
> bastien said:
> > Can we see the code?
>
> no.

We'll just have to take your word for how good these "routines" are then.
> > Can we use/test/improve it?
>
> i'll release an app sooner or later that you can use.
> or, if you prefer, you can take the shortcut, and
> buy my source code. the price is in the 6 figures.
> i can't afford to give away something that valuable.
> if you can, i suggest you buy it and give it away...

You're kidding, right? Your code might be fantastic, but you're deluding yourself if you think anyone will pay you six figures for a text-processing app.

> what i am doing is suggesting you make these changes
> for _yourselves_ and for _your_readers_, simply because
> consistency in the e-texts leads to greater functionality...
> my experience is that this greater functionality would benefit
> you with increased efficiency in the preparation of the e-texts,
> and benefit your readers in greater total usability of the e-texts.

This I agree with, although with an all-volunteer project it's hard to define and enforce the format. Tools like gutcheck are a start along this road.

> it is also the case that i _am_ putting examples online.
> i've already pointed to some, and more will come soon.
>
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.zml
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01.pdf
> http://snowy.arsc.alaska.edu/bowerbird/alice01/alice01/alice01b.pdf

I assume the pdf there is generated from the zml? Via what mechanism? A single conversion step, or via something like LaTeX? Any reason not to go with one of the existing plain-text markup languages that already have conversion tools? reST, asciidoc and others. Is there a format description for z.m.l.? Do you have existing tools/parsers for z.m.l. with freely available code? Or is that another 6 figures? :)

> from these plain-text .zml "masters" will emerge automatic .html versions
> -- from which a plethora of other formats will be able to be generated --
> and automatic creation of .pdf versions according to user specifications...
And how is this different from what gutenmark does? Apart from hopefully fewer errors in the html or latex output?

> other sweetnesses might follow too, like ipod versions and p.s.p. versions.
> and last but not least, the z.m.l. viewer-app will create a kick-ass
> powerful high-functionality electronic-book experience

Surely this is completely orthogonal to the choice of markup language.

> using these .zml "masters".
> enough so that you'll wonder why you ever thought you needed "markup".

Uh -- just because you use whitespace instead of tags to indicate headings doesn't mean it's not markup. The very fact that you are pushing for a consistent format for conversion means that it *is* markup -- oh, and the fact that you've named it Zen *markup* language :)

Ian
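Ian's closing point can be made concrete: a convention like "n preceding blank lines marks a heading" converts mechanically to explicit tags, which is precisely what makes it markup. A sketch of such a converter follows; the mapping of blank counts to heading levels (4 blanks to h1, 3 to h2, 2 to h3) is an assumption of mine, since no public z.m.l. specification pins it down.

```python
def blank_lines_to_html(text):
    """Convert text using the blank-line heading convention into minimal
    HTML: a line preceded by n >= 2 blank lines becomes a heading tag,
    everything else a paragraph.  The 4->h1, 3->h2, 2->h3 mapping is an
    assumed convention, not a documented z.m.l. rule."""
    out = []
    blanks = 0
    for line in text.splitlines():
        if not line.strip():
            blanks += 1
            continue
        if blanks >= 2:
            level = max(1, 5 - blanks)  # 4 blanks -> h1, 3 -> h2, 2 -> h3
            out.append("<h%d>%s</h%d>" % (level, line.strip(), level))
        else:
            out.append("<p>%s</p>" % line.strip())
        blanks = 0  # reset the count after emitting any element
    return "\n".join(out)
```

That the converter is this short supports both sides of the argument: the whitespace convention really is trivial to parse, and it really is a markup language, just one whose tags are invisible.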