From Bowerbird at aol.com Sun Jun 1 12:03:46 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 1 Jun 2008 15:03:46 EDT Subject: [gutvol-d] good tools that do all the work (the short version) Message-ID: i've written a longer version of this post, which i might or might not send later, but here's the short version. *** jon richfield does make some very good points about wanting to have good tools that do all the work -- or at least as much of it as it is possible for tools to do. jon notes that he gets .html output from his o.c.r.-app. the problem with that .html is that you can't maintain it. even maintaining one such file can be a chore, but when you must maintain tens of thousands, it gets impossible. likewise with all the hand-crafted .html coming from d.p. it takes far too long to determine the unique fingerprint of each one, so you'll know what you need to do to fix it. you can be sure the world will "progress" to _something_ different -- be it .html6, xhtml3, the "semantic web" or something we haven't even anticipated at this early date -- and when it does, the p.g. .html files will be abandoned. it'll be easier to "start from scratch" (i.e., from the .txt files, or maybe even google's o.c.r.) than to convert that .html... _creation_ is just the first step. _maintenance_ is long-term. -bowerbird ************** Get trade secrets for amazing burgers. Watch "Cooking with Tyler Florence" on AOL Food. (http://food.aol.com/tyler-florence?video=4& ?NCID=aolfod00030000000002) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080601/e7e7a6d3/attachment.htm From wainwright1000 at gmail.com Mon Jun 2 01:57:57 2008 From: wainwright1000 at gmail.com (Andrew Wainwright) Date: Mon, 2 Jun 2008 10:57:57 +0200 Subject: [gutvol-d] Why stick with PG?
And other Gothic digressions Al and BB Message-ID: <927a0ec40806020157u3d78cd04pc048133d7a4fd5ee@mail.gmail.com> On May 31, 2008, Bowerbird at aol.com wrote: > here's a good example -- the story of pocahontas: > > http://www.gutenberg.org/files/24487/24487.txt > > it has lots of stuff like this: > > She was a child of nature, and the birds trusted her > > and came at her call. She knew their songs, and > > where they built their nests. So she roamed the woods, > > and learned the ways of all the wild things, and > > grew to be a care-free maiden. > > > > [Illustration] > > notice that "[illustration]" notation? really something, isn't it? > what good does it do? tells the user they're missing a picture. > doesn't do a single thing to tell them _where_ they can find it. > > even if they were to navigate to the folder where the pictures are: > > http://www.gutenberg.org/files/24487/24487-h/images/ > it doesn't tell them the _name_ of the file containing that picture. > > by looking at the .html version, i can tell you that the picture is here: > > http://www.gutenberg.org/files/24487/24487-h/images/i005-1.jpg > > is it that hard to imagine that if you put that u.r.l. in the .txt file, > a .txt-file viewer-program could fetch it from there and display it, > right there at that spot in the text? > > it isn't that hard for _me_ to imagine. > > so the person who prepared this book could've added a lot of value > to the .txt file by including that information. and make no mistake, > they _had_ that info, since they needed it to make the .html version. > but they _took_an_extra_step_of_work_ to discard it from the .txt file. > As someone who produces ebooks similar in structure to the one mentioned, could you please let me know how I find the eBook number, before I produce the ebook? (It's always part of illustration URLs.) As far as I know, the ebook number is only assigned at whitewashing time. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/e2c3e9e0/attachment.htm From prosfilaes at gmail.com Mon Jun 2 03:37:10 2008 From: prosfilaes at gmail.com (David Starner) Date: Mon, 2 Jun 2008 06:37:10 -0400 Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB In-Reply-To: References: Message-ID: <6d99d1fd0806020337m28c991c0hbdc5dd11b41ad744@mail.gmail.com> On Sat, May 31, 2008 at 3:10 AM, wrote: >> [Illustration] > > notice that "[illustration]" notation? There's no [illustration] notation there. > really something, isn't it? > what good does it do? tells the user they're missing a picture. > doesn't do a single thing to tell them _where_ they can find it. True. In most cases it should probably be omitted. > is it that hard to imagine that if you put that u.r.l. in the .txt file, > a .txt-file viewer-program could fetch it from there and display it, > right there at that spot in the text? No plain text viewer could do it, because then it would no longer be a plain text viewer. Furthermore, there's no real need to worry about what can be imagined; we should worry about what's useful to most of our users, who aren't and won't be using such a tool. > second, "reading" isn't the only thing people will do with these files. > they will be remixing them, and we make that unnecessarily difficult > whenever we fail to include all of the relevant information in the file... No, for many purposes, if you include extraneous data in the file, it makes it harder to remix. If you want a corpus, you want all that image information in the trash, not messing up your corpus. From Bowerbird at aol.com Mon Jun 2 10:24:04 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 2 Jun 2008 13:24:04 EDT Subject: [gutvol-d] Why stick with PG? 
And other Gothic digressions Al and BB Message-ID: andrew said: > As someone who produces ebooks similar in structure to the one > mentioned, could you please let me know how I find the eBook number, > before I produce the ebook? (It's always part of illustration URLs.) > As far as I know, the ebook number is only assigned at whitewashing time. yeah, that's kind of silly how they keep you in the dark, isn't it? at the very least, they should let you use a "placeholder" string that would be automatically converted to the e-text number... at any rate, since the images for any particular e-text _are_ in a specific location, you don't need to put the full u.r.l. in the e-text; just put the filename. oh, and thanks for asking a good question... the assumption my viewer-program makes is that any images are in the same folder as the .txt file, but it's also smart enough to look in a subfolder named "images" if there happens to be one of those... (this rule applies whether the e-text is located on the web or offline.) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/20fde545/attachment.htm From richfield at telkomsa.net Mon Jun 2 06:06:03 2008 From: richfield at telkomsa.net (Jon Richfield) Date: Mon, 02 Jun 2008 15:06:03 +0200 Subject: [gutvol-d] Concerning the finger and those Gothic digressions... Message-ID: <4843F03B.30904@telkomsa.net> BB, you have been BBing too long. It clearly has been eroding your mind into too-well-worn channels. When the only thing that happens when one speaks is that one's fingers get squashed, every topic looks like a hammer. I wasn't giving anyone any fingers, much less any ells, elbows or arms.
The topic in question is of tenuous interest to me, whatever its intrinsic merit. If I had no text input but HTML, and nothing but text readers and a graphics displayer to look at the pics with, it would be a minor problem to turn it into reasonably useful TXT plus graphic files, so I am not terribly sensitive to the needs of more vulnerable parties. Mea maxima culpa of course. But giving people TXT plus pic references seems barely more useful than giving them HTML. You have of course inspected HTML source? Give or take a few tags, and ignoring really elaborate formatting like tables, multiple columns, indenting and so on, it looks almost suspiciously like text, doesn't it? OTOH, if a user cannot handle graphic formats, then it hardly matters whether the graphics accompany TXT or HTML. If I condemned everyone to obscure PDF source instead, I might feel a greater sense of guilt. Or am I missing summat? Being, as I am, an unfrocked biologist, I am not much moved to join in the merry romp of partisans for rival formats and their associated software, obvious though the eventual value of a universally homologated and accepted superior notation may be. Analogously, I am a keen reformer of spelling and English: in principle I out-Shaw Shaw any day, but in practice I go through the world rapping the knuckles of perpetrators of errors and infelicities in terms of Onions and Fowlers and I writhe whenever I catch myself in similar sin. You see, until everyone learns how much better it would be to listen to me, all reform is futile, so it is better to nurture and conserve such merits as the language retains for as long as infer does not mean imply, any more than if means whether. Similarly, when the hurly-burly's done, when all have accepted a nice new convention, feel welcome to wake me with a kiss after the requisite hundred years. All I was saying was that a particular notation is convenient to me, both for reading and preparation, and is widely usable for others.
Silly of me of course, but what is new about that? All the best, Jon From Bowerbird at aol.com Mon Jun 2 10:46:46 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 2 Jun 2008 13:46:46 EDT Subject: [gutvol-d] one month of nothing been done on a roundless system for d.p. Message-ID: it's been a month since piggy updated the "confidence in page" wiki over at d.p. i understand he got a new job that's keeping him busy... meanwhile, rfrank has been continuing with his experiments, but he seems to work a bit more privately. and that's fine, but it's also easy to get the impression that no one at d.p. really _cares_ about implementing a roundless system. (rfrank seems to be hung up on _parallel_proofing_, and it doesn't seem to me that he's considered the extra costs involved therein. as usual, p1 proofers are seen as plentiful, so there's little effort made to conserve them as a resource.) rfrank also does programming, so there's a very real chance that he will be guided by the data to learn the importance of preprocessing, so i'm optimistic about that, because then d.p. will get worthy tools... but again, it's quite sad "the powers that be" don't support this better. meanwhile, much time and energy is being spent _circumventing_ the present workflow -- what with p1->p1 becoming very popular, and f1->f1 experiments being done now -- which is a total waste, because if that time and energy were being spent on _roundless_ experimentation, it could be leading to a _much_ better outcome... ("skips" in the workflow mean some pages get too little attention.) as usual, their myopic focus is on "getting this book here done now", instead of a _smarter_ investment of time and energy in determining the best way to get the most books done in the near and far _future_. they're "too busy" using their shovels to fire up the bulldozer nearby... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/199bf685/attachment.htm From Bowerbird at aol.com Mon Jun 2 11:38:19 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 2 Jun 2008 14:38:19 EDT Subject: [gutvol-d] Concerning the finger and those Gothic digressions... Message-ID: jon richfield said: > giving people TXT plus pic references seems > barely more useful than giving them HTML. well, jon, that's where i can tell you unequivocally that _you_are_wrong_. from the standpoint of reworking an e-text, it's _much_ easier to start with clean -- and non-deficient -- text than to try and rework an .html file. in fact, if you _do_ have only an .html file to work with, the best course is to turn it into clean non-deficient text, so that you _can_ start it all anew, working from scratch... take it from me. i've done it. in fact, just to give you an example of such a reworking, i redid the "pocahontas" e-text i used earlier as an example. you can find my version here: > http://z-m-l.com/pgmirror/88/pocah.html for comparison, see the original .html version: > http://www.gutenberg.org/files/24487/24487-h/24487-h.htm i like my version better, because it retains the look-and-feel landscape orientation of the original book. i also prefer how my table-of-contents links work better. (plus, you can click on a picture to "turn the page", which brings the next spread up in a focused way, to avoid scrolling.) you might prefer mine, or you might prefer the other. the point is _some_ people _will_ want to rework your e-texts, for an infinite variety of reasons, and there are simple things you can do to make it easy for 'em. it was pretty easy for me to rework this text. _except_ for the fact that i had to puzzle out which graphic went on each page.
(thankfully, the graphic file-names reflected their pagenumber, so even that was pretty easy to figure out, for the most part...) it would have been more difficult for me to figure out how to rework the _.html_ to get the precise look-and-feel i wanted. first i would've had to determine what the original producer did, and then decide how to make his system do what i wanted to do. believe me, it was easier to work from scratch. but if you wanna provide a little "pudding" as some "proof" for your side of the argument, jon, _you_ could rework that .html... show the world that it's not really as "difficult" as i make it out. i'm sure if you try, you will begin to see what i'm talking about. plus, now that my .txt version is clean _and_ non-deficient: > http://z-m-l.com/pgmirror/88/pocah.zml i can generate new versions -- .html, .pdf, whatever -- easily! > You have of course inspected HTML source? um, yeah, i sure have... why, just yesterday, i looked at the .html source for a few of the e-texts you did for p.g. (we can talk about those, if you'd like...) :+) > Give or take a few tags, and > ignoring really elaborate formatting like tables, > multiple columns, indenting and so on, > it looks almost suspiciously like text, doesn't it? you know, if you ignore the mane, and the coloring, and the lack of stripes, a lion looks "suspiciously" like a tiger. but if what you want is a tiger, it's best to start with a tiger, not a lion. so when i only have an .html file, i load it into a browser, and then i do a copy-and-paste into my word-processor. that's the easiest way to get back to the text, and just text. for the information of the mac people out there, safari is the best browser to use when you do this, because it will _retain_ the bulk of the formatting, which is important... still, there usually remains much formatting to be re-done. (you failed to mention the one i hate most -- translating those ampersand-entities back into their .txt equivalents.)
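(that copy-and-paste recovery step -- strip the tags, keep the text, translate the ampersand-entities back -- can be sketched with nothing but stdlib python; this is a rough illustration under those assumptions, not any actual tool mentioned in this thread:)

```python
# rough sketch: recover plain text from an .html e-text --
# drop the tags, skip script/style, turn block-level tags into
# line breaks, and let the parser translate entities like &amp;
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content, skipping script/style, unescaping entities."""
    SKIP = {"script", "style", "head"}

    def __init__(self):
        # convert_charrefs=True translates "&amp;" -> "&", "&#233;" -> "é"
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("p", "br", "div", "h1", "h2", "h3", "li", "tr"):
            self.parts.append("\n")  # block-level tags become line breaks

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def html_to_txt(markup: str) -> str:
    p = TextExtractor()
    p.feed(markup)
    p.close()
    return "".join(p.parts).strip()

print(html_to_txt("<p>She was a child of nature, &amp; the birds trusted her</p>"))
```

(a real reworking would still need the manual cleanup bowerbird describes -- re-wrapping lines, restoring _emphasis_ markers, and so on -- but the tag-stripping and entity-translation parts are mechanical.)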
> OTOH, if a user cannot handle graphic formats, then it > hardly matters whether the graphics accompany TXT or HTML. i don't know of any users these days who cannot handle graphics. yet p.g. still largely removes info about the graphics in the .txt files. i should note that there are a few smart producers out there who do indeed include the filenames of the graphics in their .txt files. i wish i'd kept track of _their_ names so i could laud them publicly. > and I writhe I whenever I catch myself in similar sin. it's best not to talk about mistakes, because then you'll make one. :+) > All I was saying was that a particular notation is convenient to me, > both for reading and preparation, and is widely usable for others. > Silly of me of course, but what is new about that? well, of course, you know that's not "silly". it's very astute. and i was merely trying to inform you that you could give other people -- who want to rework your e-texts -- some _help_ by including the graphic filenames in your .txt files... i also said that, if this was "too much work" -- or whatever -- that you didn't _want_ to do it, that that was fine, too, because i will be reworking the p.g. files to put this information back in. so surely i'm not out of line for making such a simple suggestion. am i? all the best, -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/627c62ca/attachment-0001.htm From Bowerbird at aol.com Mon Jun 2 11:51:44 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 2 Jun 2008 14:51:44 EDT Subject: [gutvol-d] spam spam spam spam spam spam spam Message-ID: i see josh and david in my spam folder.
if they've made any valid points to which you would like to see me respond, rephrase them in your own words, please, backchannel or frontchannel, and i'll respond frontchannel. otherwise, i will just assume they aren't talking about me... no sense being paranoid, is there?, especially about fleas... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/c3d8c1a6/attachment.htm From Bowerbird at aol.com Mon Jun 2 14:47:08 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 2 Jun 2008 17:47:08 EDT Subject: [gutvol-d] e-book viewer-programs for p.g. e-texts Message-ID: speaking of e-book viewer-programs for p.g., here's one: > http://voluminous.wooji-juice.com/blog this is for the mac only -- and only mac-os 10.5 to boot -- but i mention it because just today, the guy made a post to his blog about determining where new chapters begin: > http://voluminous.wooji-juice.com/blog/1-0-5-chapter-and-verse.html he notes that _inconsistencies_ in the e-text formatting make this task more difficult than it would be otherwise. just wait until he tries to figure out _footnotes_... ;+) *** nor is this guy the only one using p.g. e-texts as content... there's also blackmask.com and manybooks.net, of course, and feedbooks.com, which creates some nice-looking books. but there is also ybook, and ubook, and fbreader as well... > http://www.spacejock.com/yBook.html > http://www.gowerpoint.com/ > http://www.fbreader.org/ this program for the nokia handhelds browses the p.g. catalog: > http://www.elisanet.fi/ptvirtan/software/gutenbrowse/index.html it then lets you download the e-texts to be displayed by fbreader... and here are some lesser-known programs...
> http://guten.sourceforge.net/ > http://gutenpy.sourceforge.net/#about > http://pybookreader.narod.ru/download.html > http://jbook.sourceforge.net/ > http://pyge.sourceforge.net/ and of course there are also the programs aimed at the ipod: > http://ebookhood.com/ipod-ebook-creator > http://www.tomsci.com/book2pod/ > http://pod2go.en.softonic.com/ > http://www.macupdate.com/info.php/id/16915/podreader > http://homepage.mac.com/applelover/text2ipodx/text2ipodx.html > http://burtcom.com/lex/#Anchor-iPoDoc-49575 > http://www.ipodebookmaker.com/ > http://www.iamlarge.com/ i suspect the list of programs available for ipod reading will grow exponentially once developers take to the s.d.k., and i look forward with delightful expectation to what steve will reveal about the iphone's future at w.w.d.c. next month... moreover, with technologies like adobe-air and ms-silverlight popping up all over the place, not to mention google-gears and yahoo's just-announced in-browser plug-in capabilities, there will be more and more developers taking on the job of creating e-book viewer-programs. it's a shame that each one will have to suffer through the same discovery of the hassles caused by inconsistencies that resulted in the blog entry above. but of course, that just means that they'll be very, very happy once they find out that i have produced a _consistent_ corpus. speaking of adobe-air, here's a newish viewer-app based on it: > http://members.cox.net/dean-mckee/ -bowerbird p.s. by the way, in doing the google-work for this message, i came across the original o.l.p.c. interest in the p.g. corpus: > http://dev.laptop.org/wiki/EBookViewerFormatSpec the date on my edit of the page shows it was 22 months ago. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080602/72db23f2/attachment.htm From richfield at telkomsa.net Tue Jun 3 01:40:43 2008 From: richfield at telkomsa.net (Jon Richfield) Date: Tue, 03 Jun 2008 10:40:43 +0200 Subject: [gutvol-d] Concerning the finger and those Gothic digressions... Message-ID: <4845038B.8030809@telkomsa.net> Hi again BB, I'm not sure why we are errr... in discussion. Well-meaning friends do accuse me of a tendency towards obscurity in both articulation and verbalisation, but who listens to well-meaning friends anyway? Certainly not everyone in this forum, no? In any case, I cannot remember what I said to leave you unable to assimilate my intimations. Do I sleep? do I dream? Do I wonder and doubt? Are things what they seem? Or is visions about? What did I say to give the impression that I thought the media I used or the format I adopted were perfect, or even satisfactory? Did I not indicate that once everyone had elected to use something superior I would change unhesitatingly? And have I not lived up to that commitment so far? Even to a more generous commitment to change once such an improvement became generally accepted as an accessible alternative to our current tools? Haven't I, BB? Haven't I? And what appreciation do I get for it? A lecture on lions and tigers? Barely escaping the bears! Oh my? Someday when you are in a duly palaeotaxonomic mood, we must sit down and have a good long discussion on the interchangeability of species in the genus Panthera, in the strata Pleistocene to Recent. Fascinating stuff of course, but currently I am busy. Part of what I am busy on (just part, not being in the same category as David W. nor even aspiring to anything like it) is capturing material that I value and commend to others and that may well no longer be there once the eventual, the ineffably, unassailably, perfect, system is established. Nothing I have seen here so far persuades me that its advent is imminent. 
Count me among the deafer, more shortsighted, even curmudgeonly, among the shepherds watching the flocks. I am the one who goes on with the sheep while the enlightened charge off to worship. Silly, limited me... The best is enemy to the good. (Lan' o' Goshen! My text is so creative today. I cannot help thinking that if Heaven had not made me a lunatic my peculiar talent might have made me an entertaining writer!) So I do not aspire to the best, BB, not till it is served predigested by stronger, more corrosive entrails than mine own. (In the interests of my appetite, it might be well to omit identifying those energetic entrails before feeding me their proceeds, but don't say that I do not show willing!) I also cannot help thinking that if I had insisted on selecting one crusade among those in contention, and nailed my baby knickers to its mast, I should have done less good than I have done by submitting my few miserable and internally inferior scans so far, not even to mention the fact that this is but one of my fields of activity. Do I realise that we are building an ever-accumulating task for future tidiers of our archives? Certainly. Do I regret it? As a systems consultant of decades' standing, definitely. Do I none the less think that that is at least better than accumulating an increasingly Augean agglomeration of Psocopteran-riddled, oxidation-browned paper? Decidedly. Rest assured that by the time everyone has seen the future in the blinding illumination of your insights and achievements, I shall have produced no more than a few more items and shall subsequently join the congregation around the shrine that I had overlooked in my grubbing and plodding. I might even assist in the manual conversion of the works that I had perpetrated in forms inaccessible to the tools of that future. But for now I shall neither encumber nor oppose you. That which lies there bleeding is my heart. That which you smell there burning is my zeal. Go well, good luck, and enjoy. 
Jon From hart at pglaf.org Tue Jun 3 10:02:49 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 3 Jun 2008 10:02:49 -0700 (PDT) Subject: [gutvol-d] Concerning the finger and those Gothic digressions... In-Reply-To: References: Message-ID: Any time technology sufficiently advances, you come to a point where starting things from scratch via the new technology gets easier and easier, until finally it is easier to start from scratch than to rebuild ye olde stuff. I first said this long long ago. . . . Michael On Mon, 2 Jun 2008, Bowerbird at aol.com wrote: > jon richfield said: >> giving people TXT plus pic references seems >> barely more useful than giving them HTML. > > well, jon, that's where i can tell you unequivocally that > _you_are_wrong_. > > from the standpoint of reworking an e-text, it's _much_ > easier to start with clean -- and non-deficient -- text > than to try and rework an .html file. > > take it from me. i've done it. > > [...] From Bowerbird at aol.com Tue Jun 3 10:49:02 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 3 Jun 2008 13:49:02 EDT Subject: [gutvol-d] Concerning the finger and those Gothic digressions... Message-ID: jon said: > I'm not sure why we are errr... in discussion. well, i made a simple point, and then you responded. and kept responding. even though you claim you are uninterested in the topic. i am _quite_ interested in the general topic of revealing the inherent power of the .txt files, so that's why i keep responding. but yes, i'm making general points intended at the whole list... i'm talking to you personally because -- otherwise -- when one speaks to an "abstract" entity like "the list as a whole", one can become too removed from reality. so i like to interact with an actual human -- like you -- to stay grounded, rather than pontificate at large. :+) (although, as we all know, i can -- and do -- do both.) > Well-meaning friends do accuse me of a tendency > towards obscurity in both articulation and verbalisation nah. they're just jealous of your poetic inclination... ;+) > In any case, I cannot remember what I said > to leave you unable to assimilate my intimations. well, as i said, it's really not about you. it's a general point. and that general point is that _some_ people -- and you might well be one of them -- are trying to _cripple_ the .txt versions, either intentionally or because they can't see the inherent power. "the .txt files cannot include illustrations," they'll tell you. bullshit! _of_course_ they can. they include illustrations the very same way that the .html files include illustrations -- by listing their filename so that a viewer-program (in the case of .html, that'd be a browser) can show them at the point in the text where they should be shown. > Do I sleep? do I dream? Do I wonder and doubt? > Are things what they seem? Or is visions about? see?
poetry. don't be ashamed of that, jon. poetry is beautiful. > What did I say to give the impression that I thought the media I used > or the format I adopted were perfect, or even satisfactory? are you interested in making your formats _more_useful?_ (remember that "you" means every single digitizer on this list.) if so, i just told you one dirt-simple way that you can do that. if not, just ignore my suggestion. like i said, i'll fix your files. what is so difficult to understand about what _i_ have just said? > Did I not indicate that once everyone had elected to use > something superior I would change unhesitatingly? you can even cling to your old ways _after_ everyone else has changed to "something superior", and i'll _still_ fix your files... nobody is asking you to change, jon. (or anyone else, either.) i just suggested a way that you _could_ make your .txt version more useful to people out in the real world, if you _wanted_ to. that's all. > And have I not lived up to that commitment so far? again, it's not really about _you_, not you personally, jon, so i haven't really been keeping track. nor am i wont to... > Even to a more generous commitment to change > once such an improvement became generally accepted > as an accessible alternative to our current tools? Haven't I, BB? again, i don't really care if you change, jon. not in the slightest... > Haven't I? And what appreciation do I get for it? > A lecture on lions and tigers? Barely escaping the bears! Oh my? you have to admit it was funny of you to say that -- once you "look past" the markup -- an .html file looks "suspiciously like" a text file. as they say on the playground, if my aunt had balls, she'd be my uncle. > Someday when you are in a duly palaeotaxonomic mood, > we must sit down and have a good long discussion on > the interchangeability of species in the genus Panthera, > in the strata Pleistocene to Recent.
we'll have to make sure there is wifi, so i can consult wikipedia, and look up some of those big words you're throwing around... > Fascinating stuff of course, but currently I am busy. me too. or rather, i should say that my computers are busy, churning away on p.g. e-texts #10000-#25555+change... step one is stripping off the headers and footers, and _boy_, even that minor change makes it seem more like a _library_, since the book-titles now pop to the top of the display and remind you that these are indeed _books_ that you're viewing. sometimes i forget what a headache that legalese gives me... > Part of what I am busy on (just part, not being in the same > category as David W. nor even aspiring to anything like it) there's almost nobody in the widger category. if only we could make a widget out of david widger, none of us would have to digitize again. > is capturing material that I value and commend to others > and that may well no longer be there once the eventual, > the ineffably, unassailably, perfect, system is established. and for that, jon, i give to you my heartfelt thanks for doing it... i'm not sure why you think this involves some "perfect" system, let alone one that is "ineffably, unassailably" perfect, when all i have suggested is the dirt-simple recommendation that digitizers include the filename of the graphics file in their .txt versions, but this seems a small price to bear (see, you did get the bear after all) to elicit your wonderful poetry... > Nothing I have seen here so far persuades me that its advent > is imminent. nah, perfection is as elusive as ever... :+) > Count me among the deafer, more shortsighted, even curmudgeonly, > among the shepherds watching the flocks. I am the one who goes on > with the sheep while the enlightened charge off to worship. > Silly, limited me... on the one hand you call yourself "curmudgeonly", and on the other hand you are crying. decide! because curmudgeons don't cry. it's against the code.
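the header-and-footer stripping bowerbird mentions can be sketched in a few lines of python. this is a minimal sketch, not his actual program, and it assumes the conventional "*** START OF" / "*** END OF" sentinel lines; real e-texts vary in exact wording, so a production version would need looser matching.

```python
# Rough sketch of stripping the PG legalese header and footer from a
# plain-text e-text, keeping only the book body. The "*** START OF" /
# "*** END OF" sentinels are assumed; actual marker wording varies.
def strip_pg_boilerplate(text):
    """Return only the book body, dropping the PG header and footer."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1          # body begins after the START marker
        elif line.startswith("*** END OF"):
            end = i                # body ends before the END marker
            break
    return "\n".join(lines[start:end]).strip()
```

run over a whole mirror, this is the "minor change" that makes the book-title pop to the top of the display instead of the license text.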
> The best is enemy to the good. (Lan' o' Goshen! > My text is so creative today. I cannot help thinking that > if Heaven had not made me a lunatic my peculiar talent > might have made me an entertaining writer!) well, _i_ am entertained, for sure. but since your "well-meaning" friends don't seem to grok your charm, do not quit the day-job... > So I do not aspire to the best, BB well, _that_ is unfortunate. but nobody is perfect, because, as i said, perfection is as elusive as ever... (how come nobody ever notes that "the good is enemy to the best"? and if these two are _really_ fighting, whose side do you want to be on?) > I also cannot help thinking that if I had insisted on > selecting one crusade among those in contention, > and nailed my baby knickers to its mast, > I should have done less good than I have done by > submitting my few miserable and internally inferior scans so far oh, hey, that reminds me. you didn't submit your _scans_ to p.g. for the books that you digitized. could you, please? that way, when people remix your work in the future, they will be able to see how the text looked in the original book, and that will be a tremendous aid to them. so be a pal, ok? (but hey, if it's too much work for you to do that, no sweat. i am sure google will get around to scanning your books.) > Do I realise that we are building an ever-accumulating > task for future tidiers of our archives? Certainly. see, that's where you're wrong, jon. there will be _no_ "future tidiers" of these archives, because the job has already become too immense... oh, it might still take some time for the "present tidiers" to realize they've created a mess they cannot get out of, but at _some_ point down the line, that realization will be full-blown, and they will run screaming from the building. you've seen on the nightly news about those government projects where they spent hundreds of millions of dollars on a computer system that just plain flat-out doesn't work?
and they can't get it to work? so they just have to eat the loss? it'll be the same thing here. minus, of course, the small matter of hundreds of millions of dollars. oh, and please, whoever is thinking of doing it right now, don't respond by saying "the text files will still be here"... they will, but since they are inconsistent, they won't help. and also, you've changed the linebreaks, so each text-file can no longer be associated with a specific google scan-set (if it ever could), so the future will have very little use for it... > Do I regret it? As a systems consultant of decades' standing, > definitely. oh, i see the problem... you've become inured to the problem... you are one of the purveyors of those big complex systems that don't work, so you think that's "the natural order" of such things. > Do I none the less think that that is at least better than > accumulating an increasingly Augean agglomeration of > Psocopteran-riddled, oxidation-browned paper? Decidedly. well, yes and no. in spite of the fact i have immersed my life in digitization, i do still have a deep and profound love for paper. but of course i can certainly see your point as well. and besides, since the process of digitizing some books has kept you busy, and therefore saved the world from at least _some_ damage that might've occurred from more "systems consulting" by you, i'd say that your hobby has had some good effects... ;+) > Rest assured that by the time everyone has seen the future > in the blinding illumination of your insights and achievements, > I shall have produced no more than a few more items well, you know, we each do what we can. no one can expect more... ;+) > and shall subsequently join the congregation around the > shrine that I had overlooked in my grubbing and plodding. you mean someone is going to put up a _shrine_ to the idea of putting the graphics filenames in the .txt versions? _who_knew?_ i mean, seriously, who knew? i thought it was such a simple thing.
it's probably marcello. that guy is such a cad. have you seen the fansite that he made for me? so nice to see i have a stalker. > I might even assist in the manual conversion of the works that > I had perpetrated in forms inaccessible to the tools of that future. oh, we ain't going to do that work _manually_, my friend, not at all. the program is already written to do most of the work automatically. it pulls in the .html version, locates the "img" tags, gets the filename from the "src" component, and then finds the equivalent location in the .txt file, and plops the filename there. yes, there is an occasional idiosyncratic glitch, but overall it works pretty smoothly in most cases. so this task isn't really a big deal. > But for now I shall neither encumber nor oppose you. like i said, it's not really about you. so do whatever you like, jon. > That which lies there bleeding is my heart. i'd expect nothing less from a poet... > That which you smell there burning is my zeal. no, actually, that's the smell of the burrito i had for breakfast. you'd think i woulda learned by now those things give me gas. go well, good luck, and enjoy. -bowerbird From hart at pglaf.org Wed Jun 4 09:28:12 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 4 Jun 2008 09:28:12 -0700 (PDT) Subject: [gutvol-d] can't see the forest for the trees In-Reply-To: References: Message-ID: On Fri, 30 May 2008, Bowerbird at aol.com wrote: > michael said: >> I would certainly like to mirror your mirror, if that's ok. > > you know that public-domain means you don't need my ok. > > plus i'd consider it to be an honor if you did that.
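the img-harvesting step bowerbird describes above (pull in the .html, locate the "img" tags, grab each "src") can be sketched with the standard-library HTML parser. this is a guess at the approach, not his actual program, and the i005-1.jpg filename is only an illustration.

```python
# Sketch: collect the src attribute of every <img> tag in an .html
# version of an e-text, so the filenames can be inserted back into
# the .txt version. Not bowerbird's actual program; just the idea.
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Record the src of each <img> tag encountered."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # handle_startendtag falls back to this, so <img .../> works too
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def img_sources(html_text):
    parser = ImgSrcCollector()
    parser.feed(html_text)
    return parser.sources
```

the remaining (and genuinely fiddly) half of the job, finding "the equivalent location in the .txt file," is the part where the occasional idiosyncratic glitch would live.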
> > -bowerbird > Please keep us posted!!! And if you decide you need any volunteers. . .or want. . . . me > From gbnewby at pglaf.org Wed Jun 4 10:43:16 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Wed, 4 Jun 2008 10:43:16 -0700 Subject: [gutvol-d] Why stick with PG? And other Gothic digressions Al and BB In-Reply-To: <927a0ec40806020157u3d78cd04pc048133d7a4fd5ee@mail.gmail.com> References: <927a0ec40806020157u3d78cd04pc048133d7a4fd5ee@mail.gmail.com> Message-ID: <20080604174316.GA16067@mail.pglaf.org> On Mon, Jun 02, 2008 at 10:57:57AM +0200, Andrew Wainwright wrote: > ... > > As someone who produces ebooks similar in structure to the one mentioned, > could you please let me know how I find the eBook number, before I produce > the ebook? (It's always part of illustration URLs.) As far as I know, the > ebook number is only assigned at whitewashing time. You are right that we generally don't pre-assign eBook numbers. The way it works in illustration URLs is that the URLs are relative... so within the book you have something like: <img src="images/image01.png" alt="image text"> which your Web browser will build a full URL out of. So if it's eBook #23456, this would be prepended: http://www.gutenberg.org/files/23456/23456-h/images/image01.png or similar... so, eBook producers don't need to know the eBook # for this to work. In fact [and this is by design], relative URLs mean that the eBook will be properly viewable on any Web or FTP site, or even a local directory. -- Greg From Bowerbird at aol.com Wed Jun 4 11:18:01 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 4 Jun 2008 14:18:01 EDT Subject: [gutvol-d] can't see the forest for the trees Message-ID: michael said: > Please keep us posted!!! oh, i'm not going anywhere...
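greg's relative-URL resolution above can be demonstrated with python's standard library. the 23456-h.htm base filename follows PG's usual naming convention but is assumed here for illustration, as is the local file path.

```python
# Demonstrating Greg's point: a relative src like "images/image01.png"
# resolves against wherever the .html file happens to live, so the
# eBook number never needs to appear inside the book itself.
from urllib.parse import urljoin

# served from gutenberg.org as eBook #23456:
web_base = "http://www.gutenberg.org/files/23456/23456-h/23456-h.htm"
print(urljoin(web_base, "images/image01.png"))
# -> http://www.gutenberg.org/files/23456/23456-h/images/image01.png

# the same relative path also works for a local copy (path assumed):
local_base = "file:///home/reader/23456-h/23456-h.htm"
print(urljoin(local_base, "images/image01.png"))
# -> file:///home/reader/23456-h/images/image01.png
```

this is exactly why the .html version is viewable on any web or FTP site, or a local directory, without edits.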
i'm staying right here, chatting in the lobby of the project gutenberg library. so you'll hear all about it... i have to say that i don't think you realize the danger in forking your library. i've been reluctant to do that, but just can't wait on this any longer. and if you're not worried about it, it's silly for me to be worried about it. but still, i don't think you fully understand the fallout... and realistically, your people will not back-incorporate my version of the library, because it is a repudiation of their carelessness about inconsistencies in the library, and they know it as well as i do and everyone else does. i'm just sayin'... -bowerbird From Bowerbird at aol.com Wed Jun 4 13:32:31 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 4 Jun 2008 16:32:31 EDT Subject: [gutvol-d] e-book viewer-programs for p.g. e-texts Message-ID: wouldn't you know it, just a few hours after i wrote up this post on viewer-programs for p.g. e-texts, i learned of a new entrant. "stanza" is now available in beta form from "lexcycle": > http://www.lexcycle.com it promises to read a staggering number of formats... this dude seems ambitious. the website looks really nice. and he's already written up a wikipedia entry for the app! it's mac only, at present, though it looks like he wants to port it... functionally, it won't handle any fancy formatting or graphics yet, but like i said, this guy seems very ambitious. look for this to grow.
"stanza" reminds me of another mac-only e-book viewer-app which i'd forgotten to mention the first time around -- "tofu": > http://amarsagoo.info/tofu/index.shtml -bowerbird From Bowerbird at aol.com Thu Jun 5 12:54:37 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 5 Jun 2008 15:54:37 EDT Subject: [gutvol-d] cory scores another first Message-ID: this just came over on cory doctorow's announcement-list... > Remember four weeks ago when I told you > that my young adult novel Little Brother > made the New York Times bestseller list? > Well, I've just heard from my publisher that > it's about to go into its *fourth week* on the list, > having climbed to position *eight*! Color me ecstatic! > My sincere thanks to all of you who talked about the book, > gave it to your friends, sent it to teachers and librarians, > and downloaded it -- you all helped make this > the first-ever Creative Commons-licensed novel > to get on the NYT list! giving away free copies doesn't seem to hurt cory's sales, that's for sure... -bowerbird From Bowerbird at aol.com Thu Jun 5 16:25:24 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 5 Jun 2008 19:25:24 EDT Subject: [gutvol-d] most recent research demo-app Message-ID: today's research demo-app takes any p.g.
e-text number and downloads the graphic-files for that book and runs a slideshow. e-mail me (telling me your o.s.) if you'd like a copy... -bowerbird From ebooks at ibiblio.org Fri Jun 6 00:54:13 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 06 Jun 2008 03:54:13 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <4848ED25.1040202@ibiblio.org> Sorry for the very late reply, but my internet time has been limited for the past few weeks. On May 10, 2008, Bowerbird wrote: > sometimes the backbiting on this list gets _extremely_ > amusing... :+) I find the misinformation, ignorance, jumping to conclusions, and faulty logic more amusing. :) > the original f.a.q. from michigan tells the numbers about michigan... [snip] > voila, we have the 7-million number. [snip] > voila, we have the 6-year timeframe... > > > there were 5 libraries involved in the project at the outset. > my guess -- back then, and even now today -- would be > that if they intended to scan 7 million umichigan books in > 6 years, they intended to scan _at_least_ another 7 million > from the other 4 libraries in that same amount of time, so > i'd say the implicit promise was to do 14 million in 6 years, > and i don't think you can call that an unreasonable position, > either then or now. I'd not only call it "unreasonable," I'd call it silly, especially since both the "New York Times" and ABC News reported on Dec. 14, 2004 that it could take 10 years to finish 15 million books.
:) > since -- after 3 years -- they've only scanned _1_ million books > from umichigan, then it is _completely_ fair to say that they are > "behind schedule" at umichigan. of course, since many libraries > (dozens?) have joined the project since its onset, i'd guess the > schedule was altered somewhere along the line, and that's fine. > i'm convinced they're working on it, and working hard, so fine... You're assuming that the number of books the University of Michigan has placed online equals the number scanned, and it would be completely fair to say that your assumption is completely wrong. Take a look at the main MBooks page: http://www.lib.umich.edu/mdp/ Note the "Scanning Schedule" about half-way down the page: "Currently scanning from the Hatcher Graduate Library. More details about the Hatcher Scanning Schedule at Michigan. This summer we expect to scan Dentistry Library titles, Taubman Medical Library monographs and selected sections of the Undergraduate Library, before we return to the Graduate Library scanning in the Fall "Library materials have been scanned from the Buhr Remote Shelving Facility, the Social Work Library, and the Art, Architecture and Engineering Library." Now take another look at the UMich FAQ you like so much: http://www.lib.umich.edu/staff/google/public/faq.pdf Note these two FAQs from page 2 of the PDF: "Q. 7: What collections in the library will be digitized? A: Most of the University Library's bound print collections will be digitized (see Question 10 below for exceptions), beginning with all volumes in the Buhr Shelving Facility." "Q. 9: In what order will the different libraries be scanned, and will the project include new acquisitions? A: A timetable and strategy for digitizing volumes in locations other than Buhr will be developed over time. We are currently focusing on the 2.5 million volumes in Buhr; consequently, newly acquired materials are not factored into the conversion process. 
As we move into other libraries, we will formulate strategies for taking new acquisitions into account." When I first saw that scanning schedule, I was tempted to assume that it meant Google had finished scanning all 2.5 million volumes in the Buhr Shelving Facility. But rather than just assume, I decided to check with someone who would know. So I emailed John Wilkin, an associate librarian who has been working with Google at the University of Michigan. At first, because of Google's love of secrecy, Wilkin only told me that they've finished the Social Work Library, and the Art, Architecture and Engineering Library and that they've done "a considerable portion" of Buhr. But when I told him you'd said that Google had only scanned 1 million books at UMich and is "behind schedule," he sent me this reply: "I think you can satisfy him by saying that, according to me (and you can quote me), we've got well in excess of 1m online here at UM and have digitized more than twice that." And thanks to something Paul Courant, University Librarian and Dean of Libraries at the University of Michigan, posted on his blog recently, we can calculate an estimate of how many UMich books have been posted and scanned. In this May 31st blog post: "Microsoft Exits the Mass Digitization Business" http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/ Courant wrote, "In the meantime, the University of Michigan Library now has well over a million digitized books in its catalogue, with the number growing by thousands every day." Now, the announcement of the "millionth book" was posted on or before February 2nd. ("Last Update: 08:30 PM EST on Saturday, February 02, 2008".) http://www.lib.umich.edu/news/millionth.html Let's be conservative in our calculations, so let's assume that books have only been posted online on normal workdays (5 days per week). There have been 87 full workdays since February 2nd. (89 weekdays minus the Presidents' Day and Memorial Day holidays.) 
Now, Paul Courant said that the number is "growing by thousands every day." "Thousands" (plural) implies a minimum of 2,000 each day. 87 * 2,000 = 174,000. Add that to 1 million, and we're up to 1.174 million UMich books online. Now, since John Wilkin told me that they've "digitized more than twice" the number they have online, let's calculate a range: 1,174,000 * 2.0 = 2,348,000 1,174,000 * 2.1 = 2,465,400 1,174,000 * 2.2 = 2,582,800 1,174,000 * 2.3 = 2,700,200 1,174,000 * 2.4 = 2,817,600 1,174,000 * 2.5 = 2,935,000 So even using conservative assumptions, we're looking at a minimum of about 2.4 million books scanned at UMich so far--and perhaps many more. :) > i _do_ wish that -- 3 years into it -- they would be a little bit > further along than 1 million out of 7 million umichigan books, > because that makes it look like this could take 20 years total... > but, you know, i'm not paying their bills, so what say do i have? It looks like your wish has been granted. :) And to see how fast Google is racing through the stacks, take a look at the additional details about the current scanning: http://www.lib.umich.edu/grad/mdpprogress.html "Beginning on Tuesday, February 19, 2008, scanning in the Graduate Library started with the collections on the 3rd, 4th and 5th floors of Hatcher South. With approximately 250,000 volumes on each floor, it will take several months to digitize this part of the Graduate Library." Only "several months" to digitize about 750,000 volumes! Jose Menendez P.S. I'm surprised that no one has mentioned on gutvol-d before now that Microsoft was quitting its book scanning operation. From Bowerbird at aol.com Fri Jun 6 01:32:01 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 6 Jun 2008 04:32:01 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: jose said: > blah blah blah dodge distortion factoid blah blah blah ... time for me to go to bed... -bowerbird p.s. tell john wilkin "hi" for me. 
he never answers me when i try to ask him a question directly on his blog... From hart at pglaf.org Fri Jun 6 09:15:38 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 6 Jun 2008 09:15:38 -0700 (PDT) Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: <4848ED25.1040202@ibiblio.org> References: <4848ED25.1040202@ibiblio.org> Message-ID: On Fri, 6 Jun 2008, Jose Menendez wrote: > Sorry for the very late reply, but my internet time has been > limited for the past few weeks. > > On May 10, 2008, Bowerbird wrote: > >> sometimes the backbiting on this list gets _extremely_ amusing... >> :+) > > > I find the misinformation, ignorance, jumping to conclusions, and > faulty logic more amusing. :) Sad. . .but amusing. > > >> the original f.a.q. from michigan tells the numbers about >> michigan... > > [snip] > >> voila, we have the 7-million number. > > [snip] > >> voila, we have the 6-year timeframe... >> >> >> there were 5 libraries involved in the project at the outset. my >> guess -- back then, and even now today -- would be that if they >> intended to scan 7 million umichigan books in 6 years, they >> intended to scan _at_least_ another 7 million from the other 4 >> libraries in that same amount of time, so i'd say the implicit >> promise was to do 14 million in 6 years, and i don't think you >> can call that an unreasonable position, either then or now. > > I'd not only call it "unreasonable," I'd call it silly, especially > since both the "New York Times" and ABC News reported on Dec. 14, > 2004 that it could take 10 years to finish 15 million books.
:) Once again I feel I must point out that giving an estimate of: 10 years for 15 million books is hardly a contradiction of 6 years for 10 million books. In fact, given the usual discrepancies between "Peter and Paul," I'd say these were pretty close. Of course, in the original December 14, 2004 media frenzy, there were no references to half the books scanned being a secret library not available to the public. . . all the references _I_ heard were about a great library an entire world could use, and nothing about the majority going to some secret private library only insiders could use and an entirely different, much more limited public library. [snip] > Courant wrote, "In the meantime, the University of Michigan > Library now has well over a million digitized books in its > catalogue, with the number growing by thousands every day." > > Now, the announcement of the "millionth book" was posted on or > before February 2nd. ("Last Update: 08:30 PM EST on Saturday, > February 02, 2008".) Now THIS is what they were all talking about on 12/14/04 and it is what the public can presumably use. 3 years and a fraction for the first million public eBooks. Let's suppose they double production in the next equal timeframe -- that is, triple the total. That would be: 6 years and two fractions for the first 3 million public books. Which could happen, not to mention the other libraries. We'll just have to wait to see what happens. The real question, of course, is "how useful will they be?" Will there be too many that are just raw scans? Will there be too many that are just raw OCR? How many will be full text files of 99.975%+ accuracy? > P.S. I'm surprised that no one has mentioned on gutvol-d before now that Microsoft was quitting its book scanning operation. Perhaps no one here actually believed Microsoft was serious about doing eBooks in the first place. Perhaps some of the people here realized that Microsoft was not going to be happy about losing the Yahoo!
deal, and combined it with the fact that Yahoo! is also a major supporter of the same OCA [Open Content Alliance] that Microsoft was trying to get into, in perhaps yet another kind of takeover bid. Quite possibly Microsoft found, as did the Federal Spook Agency and Co., that Brewster Kahle, who runs the OCA, is not quite as easy a person to manipulate as they had assumed. In any of these cases the real proof is in the pudding, as is a similar case with Amazon's Kindle and Sony's Reader. . .and The Google Book Search. . .how many people download how many books? Amazon is reluctant to admit they have hardly sold any Kindles, just as Sony won't admit The Sony Reader isn't selling, even at reduced pricing. The pundits aren't even estimating much over 50,000 sales for a reader of any brand, and that is nothing in a world with more than a billion computers and over 3 billion cellphones and who knows what other devices for reading eBooks. mh From Bowerbird at aol.com Fri Jun 6 09:29:43 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 6 Jun 2008 12:29:43 EDT Subject: [gutvol-d] mockingbirds Message-ID: over on the d.p. forums, dkretz recommends this: > http://www.ted.com/talks/view/id/161 it's a talk at t.e.d. by a lexicographer. she's cute, funny, and smart. highly recommended. also cute, funny, and smart, and highly recommended as well, is the video from that same session, by my performance poet friend rives: > http://www.ted.com/talks/view/id/108 and yeah, that second guy giving him a bear-hug afterward actually _is_ al gore... -bowerbird
From Bowerbird at aol.com Fri Jun 6 10:28:23 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 6 Jun 2008 13:28:23 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: michael said: > Once again I feel I must point out > that giving an estimate of: > 10 years for 15 million books > is hardly a contradiction of > 6 years for 10 million books. i'm not sure why you feel the need to "point out" simple arithmetic like that... i'd say that whoever would fall for such a dodge isn't worth bothering about. jose's just trying to yank your chain. -bowerbird From julio.reis at tintazul.com.pt Fri Jun 6 10:41:06 2008 From: julio.reis at tintazul.com.pt (Júlio Reis) Date: Fri, 06 Jun 2008 18:41:06 +0100 Subject: [gutvol-d] cory scores another first In-Reply-To: References: Message-ID: <1212774066.6961.113.camel@abetarda> > giving away free copies doesn't seem to hurt cory's sales, > that's for sure... And I like his reasoning for licensing under Creative Commons... the thing about a paradigm shift in printed books... on how giving away books raises his profile, so that if *selling* books becomes the past, he'll still earn a good living by being invited to do lectures. So, he's laying his eggs in all baskets. Or, to use the nomenclature from 'Down and Out,' he's "raising his Whuffie." Nice book, and dailylit.com is great too. Júlio.
From marcello at perathoner.de Fri Jun 6 11:40:41 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri, 06 Jun 2008 20:40:41 +0200 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: <4848ED25.1040202@ibiblio.org> Message-ID: <484984A9.20000@perathoner.de> Michael Hart wrote: > In any of these cases the real proof is in the pudding, Michael Hart and Bowerbird announced today they founded Distributed Proofeaters. The goal of Distributed Proofeaters is to find the proof that was allegedly hidden in a pudding (by terrorists). Bowerbird wrote a pudding reader program that is almost in beta stage. It is estimated they will eat about 10 million puddings in 6 years. -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Fri Jun 6 13:07:19 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 6 Jun 2008 16:07:19 EDT Subject: [gutvol-d] a new type of inconsistency Message-ID: it must be tough to keep coming up with new inconsistencies in the e-texts. but thanks to the ingenuity of many volunteers, and the willing cooperation of the whitewashers, project gutenberg continues with its important mission. take e-text #25536 as an example: > http://www.gutenberg.org/files/25536/ the images for an e-text are stored in the "images" subdirectory located in the html folder -- 25536-h/ in this case -- which is all well and good... except that the producer of this e-text decided to link to the page-images. now, that's a great idea -- indeed, i've suggested it often in the past. but... but for this text, the page-scans are in the "images" subdirectory, which is inconsistent with the way the page-scan images have always been handled, i.e., in a "#####-page-images" subdirectory. see, for instance, this e-text: > http://www.gutenberg.org/files/22144/ just another wrinkle that needs to be smoothed out for a consistent library... -bowerbird From Bowerbird at aol.com Sun Jun 8 11:50:39 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 8 Jun 2008 14:50:39 EDT Subject: [gutvol-d] "but when she came there, the cupboard was bare..." Message-ID: here's an observation: > http://z-m-l.com/misc/was_bare.html on the left is the original. on the right is the version from distributed proofreaders. i'd say there's a large amount of disutility in removing text from pictures like this... but i'd be open to a discussion... -bowerbird From tb at baechler.net Sun Jun 8 12:15:31 2008 From: tb at baechler.net (Tony Baechler) Date: Sun, 8 Jun 2008 12:15:31 -0700 Subject: [gutvol-d] Preprints: Hart/Newby presentation Message-ID: <20080608191531.GA25191@investigative.net> Hello, I didn't see any contact email address on the Preprints site, so I'm asking here. I saw the Hart and Newby presentation from HOPE 6 in the form of an .iso file. What exactly is needed to make this ready for PG? All the site said was that it would be more effective in a compressed format, but that's vague. I'm assuming that we would at least want an mp3 audio file and an mpeg or mp4 video, but what else would be necessary? I can convert uncompressed audio and video to compressed formats, but some idea of what's wanted would be helpful.
If someone else is already working on this, that's fine with me, just let me know. Thanks. From gbnewby at pglaf.org Sun Jun 8 14:05:26 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Sun, 8 Jun 2008 14:05:26 -0700 Subject: [gutvol-d] Preprints: Hart/Newby presentation In-Reply-To: <20080608191531.GA25191@investigative.net> References: <20080608191531.GA25191@investigative.net> Message-ID: <20080608210526.GA24061@mail.pglaf.org> On Sun, Jun 08, 2008 at 12:15:31PM -0700, Tony Baechler wrote: > Hello, > > I didn't see any contact email address on the Preprints site, so I'm > asking here. I saw the Hart and Newby presentation from HOPE 6 in the > form of an .iso file. What exactly is needed to make this ready for PG? > All the site said was that it would be more effective in a compressed > format, but that's vague. I'm assuming that we would at least want an > mp3 audio file and an mpeg or mp4 video, but what else would be > necessary? I can convert uncompressed audio and video to compressed > formats, but some idea of what's wanted would be helpful. If someone > else is already working on this, that's fine with me, just let me know. I'm the contact. There isn't anything precise I had in mind.. preprints are where I put "raw" material [with no particular definition] that needs effort to be added to the main PG collection. You're right that we'd like MP3 or similar audio, and MP4 or similar audio+video, extracted. If you want to do different versions [ogg, etc.] that's fine. I don't think anyone else is working on this. Michael and I are planning a presentation at the next conference in the same series, www.thelasthope.org Thanks! -- Greg From Bowerbird at aol.com Sun Jun 8 23:36:29 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 9 Jun 2008 02:36:29 EDT Subject: [gutvol-d] c'mon steve Message-ID: c'mon steve, let's push the envelope even further. v2 of the iphone is nice, but hardly revolutionary. give us a new machine that blows the doors off... 
please. thanks. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080609/0f8e6000/attachment.htm From lee at novomail.net Mon Jun 9 15:48:09 2008 From: lee at novomail.net (Lee Passey) Date: Mon, 09 Jun 2008 16:48:09 -0600 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: References: Message-ID: <484DB329.8060802@novomail.net> I apologize, but I simply cannot let this go without comment: Bowerbird at aol.com wrote: [snip] > the problem with that .html is that you can't maintain it. > even maintaining one such file can be a chore, but when > you must maintain tens of thousands, it gets impossible. As a general rule, bowerbird's comments are, while inflammatory, mostly correct. As he has pointed out, the biggest flaw in the PG corpus, even more serious than the loss of metadata, is that it is totally lacking in standards (he uses the word consistency) and that Mr. Hart is thoroughly committed to maintaining this chaos as an institutional objective. On this one particular issue, however, he is utterly, totally wrong. It is the big lie that he hopes we will eventually accept simply because he repeats it over and over. The problem with bowerbird's s.m.l. is that you can't maintain it. HTML is a well-documented, standard markup language with dozens, if not hundreds, of tools that can be used to display and manipulate it. The notion that the display of HTML files is restricted to web-browsers is simply naïve. s.m.l. is subtle, incomplete and ambiguous. It is, of course, an attempt to create a markup language and is far more than the Plain Vanilla Text (or Impoverished Text Format) that Mr. Hart advocates.
It is utterly inconceivable to me that anyone could possibly claim that HTML is difficult to maintain whereas s.m.l. is not. As we have seen, it is certainly possible to abuse any markup language, and many of the HTML files now in the PG archive are evidence of this. But even the worst of these files are easier to modify, update and maintain than /any/ s.m.l. file. I realize that no one here really lends any credence to bowerbird's attempt to create Yet Another Markup Language; but every once in a while I think it is appropriate to call a spade a spade, and an irrational conclusion an irrational conclusion. From marcello at perathoner.de Tue Jun 10 04:23:21 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 10 Jun 2008 13:23:21 +0200 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <484DB329.8060802@novomail.net> References: <484DB329.8060802@novomail.net> Message-ID: <484E6429.7040504@perathoner.de> Lee Passey wrote: > As a general rule, bowerbird's comments are, while inflammatory, mostly > correct. I strongly disagree. He's a braggard. Big mouth. No teeth. All BB did in 5 years was to propose one simple solution for all our problems, and that solution is wrong. It is wrong because his pet format doesn't scale to the level of text detail most of us want to capture. It is wrong because his pet format also is fundamentally flawed for reasons I detailed long ago but which BB never had the pluck to reply to. (ie. his use of non-printing characters for formatting and his reliance on counting empty lines for text division markup.) > As he has pointed out, the biggest flaw in the PG corpus, even > more serious than the loss of metadata, is that it is totally lacking in > standards (he uses the word consistency) and that Mr. Hart is thoroughly > committed to maintaining this chaos as an institutional objective. I wouldn't say that. Basically, once an idea formed in MH's head, it is not amendable by fact. 
Fact is: that DP, doing exactly the opposite of what MH recommends, produced more books in 8 years than PG in 37 years. There is no way on earth that this simple fact will convince MH that organized ebook production is the way to go. Fortunately the people at DP didn't listen to what MH said but simply set up an environment for organized ebook production and started turning out books. The solution to your problem is: take your ebook standard of choice and start converting the library. If it does any good, people will notice and jump to it. -- Marcello Perathoner webmaster at gutenberg.org From hart at pglaf.org Tue Jun 10 08:58:04 2008 From: hart at pglaf.org (Michael Hart) Date: Tue, 10 Jun 2008 08:58:04 -0700 (PDT) Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <484E6429.7040504@perathoner.de> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> Message-ID: I suppose the real question is whether any more than a small handful of people believe the message I reply to below. If not, perhaps it would be best simply to ignore messages a person such as this sends to the list. How many people think such replies are really necessary? And, by the way, it's more like 38 years, but who would make the presumptive call that the author in question pays a more than lip-service attention to what he is saying. . . . Hopefully most of you will agree that replying is just waste of time after waste of time after waste of time, and forgive me if I take some of my own advice and wait until people say they need a refutation to all this garbage. . . . . The message below will have you believe that DP did all this without either permission or encouragement from me. This can be filed with most of the rest of that author. 
As I have stated in reply to these same accusations before, I personally went to Las Vegas, the then home town of DP's founder, Charles Franks, where we met in person and worked out all the details he had in mind, but the author below-- sadly to say--was not in attendance. A latecomer. In addition, he would have you believe that DP has created much more than half of all the eBooks at PG sites. According to his own numbers, it is, and has been for some time, about half of the total listed in the Newsletters. Which is no mean feat and should not reflect upon DP other than in the most positive manner; it is simply a numerical, and nothing more, observation on the strategies and tactics used for years by the author whose name appears below. On Tue, 10 Jun 2008, Marcello Perathoner wrote: > Lee Passey wrote: > >> As a general rule, bowerbird's comments are, while inflammatory, mostly >> correct. > > I strongly disagree. He's a braggard. Big mouth. No teeth. > > All BB did in 5 years was to propose one simple solution for all our > problems, and that solution is wrong. > > It is wrong because his pet format doesn't scale to the level of text > detail most of us want to capture. > > It is wrong because his pet format also is fundamentally flawed for > reasons I detailed long ago but which BB never had the pluck to reply > to. (ie. his use of non-printing characters for formatting and his > reliance on counting empty lines for text division markup.) > > >> As he has pointed out, the biggest flaw in the PG corpus, even >> more serious than the loss of metadata, is that it is totally lacking in >> standards (he uses the word consistency) and that Mr. Hart is thoroughly >> committed to maintaining this chaos as an institutional objective. > > I wouldn't say that. Basically, once an idea formed in MH's head, it is > not amendable by fact. Fact is: that DP, doing exactly the opposite of > what MH recommends, produced more books in 8 years than PG in 37 years.
> There is no way on earth that this simple fact will convince MH that > organized ebook production is the way to go. > > Fortunately the people at DP didn't listen to what MH said but simply > set up an environment for organized ebook production and started turning > out books. > > The solution to your problem is: take your ebook standard of choice and > start converting the library. If it does any good, people will notice > and jump to it. > > > > > -- > Marcello Perathoner > webmaster at gutenberg.org > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From Bowerbird at aol.com Tue Jun 10 10:38:15 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 10 Jun 2008 13:38:15 EDT Subject: [gutvol-d] good tools that do all the work (the short version) Message-ID: marcello is in my spam folder, of course, since his signal-to-noise ratio is so close to zero that there's no reason to even try to discern the number... lee has, on some occasions, said a few things that were worth reading, but i had to put him in my spam folder too, because he comes from a position where he hates project gutenberg, while i come from a position of _love_. so even when lee _thinks_ that i might agree with him, he's badly mistaken. and in addition to our motivational differences, there are just a lot of times where lee is misguided. this latest post of his would be one such instance. but i've passed the point where i'm jumping in to say "no, you are wrong..." so that post could have stayed in my spam folder and i wouldn't have cared. now that michael has fished it out and waved it in front of me, i will say that lee doesn't seem to have a clue. i wonder if he's actually _looked_ at any of the d.p. .html books. and i'm quite sure he hasn't _tracked_ them over time. 
if he had, he'd know that the various producers have used a _wide_variety_ of methods in creating all those .html files. and _that_ is what makes them difficult (to the point of impossibility) to maintain. and the fact that .html is "a standard" doesn't really fix that problem. anyone who wants to convert those files is going to have to go into each one individually and _grok_ it, first of all, and then _apply_an_upgrade_ more-or-less manually. difficult. and it becomes more and more difficult the longer that the task is put off... with z.m.l., on the other hand, there's only one way to get a desired effect. so all the files will be _consistent_, so they can be treated _programmatically_. that is, i just write the program that does the upgrade, and run it across all the files. once the program is written, most of the work is done, no matter how many files i run it against. so this infrastructure scales extremely well. lee can say whatever he will, but what he says won't make it one bit easier for anyone to maintain the big hairy mess that has become the p.g. corpus. after all, if it was easy to upgrade those files, they'd all be .tei files by now... and what he says won't make it one bit harder for me to maintain my mirror. which, by the way, is coming along just fine, in case anyone was wondering. so michael, i'd recommend you divert those fellows to your spam folder... -bowerbird ************** Vote for your city's best dining and nightlife. City's Best 2008. (http://citysbest.aol.com?ncid=aolacg00050000000102) -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080610/e8cd2f45/attachment.htm From lee at novomail.net Tue Jun 10 10:45:08 2008 From: lee at novomail.net (Lee Passey) Date: Tue, 10 Jun 2008 11:45:08 -0600 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <484E6429.7040504@perathoner.de> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> Message-ID: <484EBDA4.4030402@novomail.net> Marcello Perathoner wrote: > The solution to your problem is: take your ebook standard of choice and > start converting the library. If it does any good, people will notice > and jump to it. Well, I don't really have a problem; but your suggestion is a good one. It is, of course, a little more complex than this. You do need to start by selecting a markup standard, and creating a markup tutorial that goes beyond the bare syntax requirements. Then you need to start building a library using that markup language; but the Project Gutenberg archive has been so sloppily created you can't use it as a starting point. Typographical markup, provenance, references and other such metadata have been irretrievably lost, so you pretty much have to start over. Now, because consistency within the archive is also important (if you want a standard, you should also want an archive where everything satisfies that standard) you can't use the PG archive because it has no standards. Even if the new, improved files make their way back to PG you would want a place where they could be stored in their pristine state. So, PG has been useful how? Please, please, PLEASE do not think that I'm suggesting that PG should become relevant or useful; I'm simply pointing out that it is not, and that attempts to make it relevant or useful will simply be futile. 
From marcello at perathoner.de Tue Jun 10 10:54:56 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 10 Jun 2008 19:54:56 +0200 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> Message-ID: <484EBFF0.2040108@perathoner.de> Michael Hart wrote: > The message below will have you believe that DP did all this > without either permission or encouragement from me. I didn't know anybody needed your permission to do ebooks. > As I have stated in reply to these same accusations before, > I personally went to Las Vegas, the then home town of DP's > founder, Charles Franks, where we met in person and worked > out all the details he had in mind, but the author below-- > sadly to say--was not in attendance. A latecomer. The accounts I heard about this meeting were somewhat different. But it often happens that persons perceive the same situation in a different manner. > In addition, he would have you believe that DP has created > much more than half of all the eBooks at PG sites. I said DP produced more books than PG. Not *much* more. As of today and according to DP they have completed and posted 13,342 books. As of PG, today we have posted #25755. 25755 - 13342 ======= 12413 The 13,342 books posted thru DP are more than the 12,413 books posted thru other channels. About a thousand more. And they did it in 8 years, not in 38. And this simple arithmetic proves that you create fewer ebooks by "offer[ing] as many freedoms to our volunteers as possible" and by not being "very bossy about what our volunteers should do". (http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart) It shows instead how you can create more books by providing guidance and a productive environment to volunteers. It proves that by requiring strict guidelines you actually go faster than by requiring none.
If there shall be a PG II, then that title belongs to DP. -- Marcello Perathoner webmaster at gutenberg.org From rburkey2005 at earthlink.net Tue Jun 10 10:58:43 2008 From: rburkey2005 at earthlink.net (Ron Burkey) Date: Tue, 10 Jun 2008 12:58:43 -0500 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <484EBDA4.4030402@novomail.net> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> <484EBDA4.4030402@novomail.net> Message-ID: <1213120723.28993.3.camel@software1.heads-up.local> On Tue, 2008-06-10 at 11:45 -0600, Lee Passey wrote: > Marcello Perathoner wrote: > > > The solution to your problem is: take your ebook standard of choice and > > start converting the library. If it does any good, people will notice > > and jump to it. > > Well, I don't really have a problem; but your suggestion is a good one. > > It is, of course, a little more complex than this. You do need to start > by selecting a markup standard, and creating a markup tutorial that goes > beyond the bare syntax requirements. > > Then you need to start building a library using that markup language; > but the Project Gutenberg archive has been so sloppily created you can't > use it as a starting point. Typographical markup, provenance, references > and other such metadata have been irretrievably lost, so you pretty much > have to start over. > That's throwing out the baby with the bath-water. You need to distinguish between the perfect case and a practically-achievable case. Step 1: Choose a standard. Make sure it's flexible enough to handle unknown data, such as provenance. Step 2: Get some texts into that format. Step 3: Hope other people notice and jump on board. Step 4: *Then* worry about the do-overs and bemoan the fact that there need to be any do-overs (when it really could have been avoided). 
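Marcello's subtraction earlier in the thread is simple enough to replay mechanically. A minimal sketch, using only the June 2008 counts quoted in his message:

```python
# Figures quoted in Marcello's message (June 2008 snapshot).
pg_total = 25755    # highest PG etext number posted so far
dp_posted = 13342   # pgdp.net projects with "posted to PG" status

other = pg_total - dp_posted
print(other)                 # 12413: books posted through other channels

# His claim: DP's share exceeds the rest by "about a thousand".
print(dp_posted - other)     # 929
```

The gap of 929 is what he rounds to "about a thousand more"; Michael Dyck's follow-up later in the thread refines the DP count itself.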
From Bowerbird at aol.com Tue Jun 10 11:58:09 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 10 Jun 2008 14:58:09 EDT Subject: [gutvol-d] the wonder of widger, cleaning up the library Message-ID: david widger has cleaned up gibbon's "decline and fall". there were 2 versions of this book in the p.g. library -- one of them text-only and the other .html-only -- with both formats encompassing several p.g. e-texts, since this is a book that was published in 6 volumes... as david widger put it: > Both of these sets have been recently completely > reproofed with correction of several thousand errors. this is david widger commenting on "several thousand errors", folks, not someone with a "grudge" against project gutenberg. it's just a _fact_ that many of the e-texts are plagued by errors. anyone who disputes this message has their head in the sand. they need to be cleaned... thanks to david for doing that job... rather than adding more books to the pile, the best thing for d.p. (and independent digitizers) to do at this time would be to find and fix the errors in the existing e-texts... > History of the Decline and Fall of the Roman Empire > http://www.gutenberg.org/etext/731 > History of the Decline and Fall of the Roman Empire > http://www.gutenberg.org/etext/890 > The History Of The Decline And Fall Of The Roman Empire > http://www.gutenberg.org/etext/25717 (and yes, the inconsistency in the titles did make me laugh.) thanks again to david, for doing what needs to be done... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080610/bc2777a2/attachment-0001.htm From bzg at altern.org Tue Jun 10 20:52:40 2008 From: bzg at altern.org (Bastien) Date: Wed, 11 Jun 2008 05:52:40 +0200 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <484DB329.8060802@novomail.net> (Lee Passey's message of "Mon, 09 Jun 2008 16:48:09 -0600") References: <484DB329.8060802@novomail.net> Message-ID: <87od68typj.fsf@bzg.ath.cx> Lee Passey writes: > I realize that no one here really lends any credence to bowerbird's > attempt to create Yet Another Markup Language; but every once in a > while I think it is appropriate to call a spade a spade, and an > irrational conclusion an irrational conclusion. And? -- Bastien From ebooks at ibiblio.org Tue Jun 10 22:59:22 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Wed, 11 Jun 2008 01:59:22 -0400 Subject: [gutvol-d] cyberlibrary numbers In-Reply-To: References: Message-ID: <484F69BA.2030702@ibiblio.org> On June 6, 2008, Bowerbird wrote: > jose said: > > blah blah blah dodge distortion factoid blah blah blah Actually, that would make a pretty good description of some of your posts, especially if you add "jump to erroneous conclusions blah blah blah grandiose claims blah blah blah." :) You know, Bowerbird, you disappoint me. I thought you'd be jumping for joy after finding out that Google has scanned more than twice as many books at the University of Michigan as you thought they had. ;) > p.s. tell john wilkin "hi" for me. he never answers me > when i try to ask him a question directly on his blog... 
It seems that Paul Courant doesn't answer you when you address him on his blog either: http://paulcourant.net/2008/04/26/john-wilkin-and-others-on-openness-and-its-opposites/#comments http://paulcourant.net/2008/05/31/microsoft-exits-the-mass-digitization-business/#comments I can't speak for either one of them, but if I had to hazard a guess, perhaps the reason they ignore you is that they think you ought to do your homework. For instance, you might start by reading UM's cooperative agreement with Google. It's been available on the UM website since June 2005. In fact, David Carter posted a link to a PDF version of the agreement in this Book People post on June 16, 2005: "Text of Michigan/Google agreement" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-06-16,12 Two days later, J Flenner also posted a link to the agreement in this post: "Details Revealed on Google Library Project at U. Michigan" http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2005&post=2005-06-18,2 Here are links to it in both HTML and PDF formats: http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html http://www.lib.umich.edu/mdp/um-google-cooperative-agreement.pdf You might want to start with this paragraph: "4.4.1 Use of U of M Digital Copy on U of M Website. U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered on U of M's website. U of M shall implement technological measures (e.g., through use of the robots.txt protocol) to restrict automated access to any portion of the U of M Digital Copy or the portions of the U of M website on which any portion of the U of M Digital Copy is available.
U of M shall also make reasonable efforts (including but not limited to restrictions placed in Terms of Use for the U of M website) to prevent third parties from (a) downloading or otherwise obtaining any portion of the U of M Digital Copy for commercial purposes, (b) redistributing any portions of the U of M Digital Copy, or (c) automated and systematic downloading from its website image files from the U of M Digital Copy. U of M shall restrict access to the U of M Digital Copy to those persons having a need to access such materials and shall also cooperate in good faith with Google to mutually develop methods and systems for ensuring that the substantial portions of the U of M Digital Copy are not downloaded from the services offered on U of M's website or otherwise disseminated to the public at large." Did you notice the part about preventing "third parties from ... (b) redistributing any portions of the U of M Digital Copy, or (c) automated and systematic downloading from its website image files from the U of M Digital Copy"? That would apply to you, third party, er, Bowerbird. Perhaps that's why they ignore you when you talk about scraping and re-mounting their scans and OCR. Well, if I can ever help you with your homework in the future, feel free to let me know. My tutoring rates are very reasonable. ;) Jose Menendez From ebooks at ibiblio.org Tue Jun 10 23:18:06 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Wed, 11 Jun 2008 02:18:06 -0400 Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: References: <4848ED25.1040202@ibiblio.org> Message-ID: <484F6E1E.3090206@ibiblio.org> I thought this deserved its own thread. On June 6, 2008, I wrote: > P.S. I'm surprised that no one has mentioned on gutvol-d before > now that Microsoft was quitting its book scanning operation. The same day Michael Hart replied: > Perhaps no one here actually believed Microsoft was serious about > doing eBooks in the first place. 
> > Perhaps some of the people here realized that Microsoft was not > going to be happy about losing the Yahoo! deal, and combined it > with the fact that Yahoo! is also a major supporter of the same > OCA [Open Content Alliance] that Microsoft was trying to get in > perhaps yet another kind of takeover bid. Well, if you'd done your homework, Michael, you would have realized that Microsoft was not only serious, but it was responsible for the *overwhelming majority* of books scanned by the OCA. You also would have realized that Yahoo is NOT a "major supporter" of the OCA. Indeed, in terms of financial support, Yahoo WAS a minuscule supporter of the OCA compared to Microsoft. (Notice I switched to the past tense. There's a reason for that.) Want some proof? Looking at the Internet Archive's Text Archive page: http://www.archive.org/details/texts I see "202,578 items" for the American Libraries sub-collection and "119,424 items" for the Canadian Libraries. Adding them together, we get a total of 322,002. (Those numbers are updated regularly as more books are put online, so people may see higher numbers later.) Now how many of those 322,000+ books were thanks to Yahoo and how many were thanks to Microsoft? Let's start with "major supporter" Yahoo: http://www.archive.org/details/yahoo_books I hope you're sitting down when you read this, Michael, because Yahoo contributed the staggering total of "1,075 items." That's not a typo; I didn't leave out a few digits. Only 1,075! If we divide 1,075 by 322,002, we get 0.00334. So only 0.334% of the books the OCA has scanned and put online from American and Canadian libraries are thanks to "major supporter" Yahoo! Now let's see how Microsoft did: http://www.archive.org/details/msn_books The total for Microsoft is "288,518 items." (Since the OCA is still scanning books with funds contributed by Microsoft, this total keeps getting updated too.) 288,518 divided by 322,002 equals 0.896. 
So 89.6% of the books the OCA has scanned and put online from American and Canadian libraries are thanks to the company "perhaps no one here actually believed ... was serious." If you're still tempted to cling to the notion that Microsoft wasn't serious and that Yahoo is a "major supporter" of the OCA, let's take a look at Brewster Kahle's own announcement on May 26 about Microsoft's decision: http://www.archive.org/iathreads/post-view.php?id=194217 It's not too long, so I'm going to quote the whole thing, with a few comments from me enclosed in brackets []. "The Internet Archive operates 13 scanning centers in great libraries, digitizing 1000 books a day. This scanning is financially supported by libraries, foundations, and the Microsoft Corporation. Today, Microsoft has announced that it will ramp down their investment in this area. We very much appreciate their efforts and funding in book scanning over the last 3 years. As a result, over 300,000 books are publicly available on the archive.org site that would not otherwise be." [Note that Brewster didn't mention ANY financial support from Yahoo. See why I switched to the past tense and said that "Yahoo WAS a minuscule supporter of the OCA"?] "To their credit, they said they are taking off any contractual restrictions on the public domain books and letting us keep the equipment that they funded. This is extremely important because it can allow those of us in the public sphere to leverage what they helped build. Keeping the public domain materials public domain is where we all wanted to be. Getting a books scanning process in place is also a major accomplishment. Thank you Microsoft." [Note the mention of "contractual restrictions." I'll get to those a little later.] "Funding for the time being is secure, but going forward we will need to replace the Microsoft funding. Microsoft has always encourage the Open Content Alliance to work in parallel in case this day arrived. 
Lets work together, quickly, to build on the existing momentum. All ideas welcome. "Onward to a completely public library system!" Did you notice, Michael, that Brewster didn't say anything about Yahoo helping to replace the Microsoft funding? > Quite possibly Microsoft found, as did the Federal Spook Agency > and Co., that Brewster Kahal, who runs the OCA, is not quite an > easy to maniputulate person as they had assumed. But apparently his surname is easy to misspell. ;) Seriously, do you recall the "contractual restrictions" I pointed out in Brewster's announcement? If you look again at the Microsoft page at the Internet Archive: http://www.archive.org/details/msn_books you'll see a box labeled "Rights" on the left side of the page. Here's what it says: "Books scanned before November 1, 2006 are under OCA principles, thereafter they are available for non-commercial use and may not appear in commercial services. Please contact info at archive.org or Microsoft about bulk access." Brewster Kahle may not be "easy to maniputulate [sic]," but he didn't stop Microsoft from violating the OCA principles it had agreed to. Jose Menendez From jmdyck at ibiblio.org Wed Jun 11 01:41:11 2008 From: jmdyck at ibiblio.org (Michael Dyck) Date: Wed, 11 Jun 2008 01:41:11 -0700 Subject: [gutvol-d] PGDP's contribution to PG, numerically speaking In-Reply-To: <484EBFF0.2040108@perathoner.de> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> <484EBFF0.2040108@perathoner.de> Message-ID: <484F8FA7.2000205@ibiblio.org> Marcello Perathoner wrote: > > As of today and according to DP they have completed and posted 13,342 books. > > As of PG, today we have posted #25755. > > 25755 - > 13342 > ======= > 12413 The 13,342 number at pgdp.net is the number of its 'projects' that have "posted to PG" status. However, it isn't the case that 1 pgdp.net project equals 1 PG etext. 
Large books are often split into multiple projects and then merged into a single posting -- sometimes (mostly in the past) the multiple projects would each get the "posted to PG" status, so contributing more than one to the number in question. Instead, to get the number of PG texts that were contributed by pgdp.net, the best approximation is the number in the upper right of most pages at the site, currently 12,897. (This is the number of distinct PG etext numbers that we have recorded for our posted projects.) Also, I'm guessing that PG's "reserved count" is still about 40, so when #25755 is posted, that means PG (USA) has about 25715 books. So the correct calculation is something more like: 25715 PG USA total - 12897 # from pgdp.net ----- 12818 # from elsewhere So pgdp.net still accounts for more than half, but it's pretty close. The last time I did that calculation (a couple months ago, I think), PGDP's contribution was just under half, so we must have crossed the equator sometime since then. Mind you, if you set aside audio and video files and only look at texts per se, PGDP's contribution has been more than half for quite a while. Someone else can do that calculation if they want. Of course, if you consider the larger interpretations of "Project Gutenberg", PGDP's fraction thereof will be less. -Michael From paulmaas at airpost.net Wed Jun 11 07:16:51 2008 From: paulmaas at airpost.net (Paul Maas) Date: Wed, 11 Jun 2008 07:16:51 -0700 Subject: [gutvol-d] PGDP's contribution to PG, numerically speaking In-Reply-To: <484F8FA7.2000205@ibiblio.org> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> <484EBFF0.2040108@perathoner.de> <484F8FA7.2000205@ibiblio.org> Message-ID: <1213193811.18491.1257910599@webmail.messagingengine.com> What can be said is that DP is the #1 contributor of electronic texts to PG. No one else even comes close. Congratulations to DP! 
pm On Wed, 11 Jun 2008 01:41:11 -0700, "Michael Dyck" said: > Marcello Perathoner wrote: > > > > As of today and according to DP they have completed and posted 13,342 books. > > > > As of PG, today we have posted #25755. > > > > 25755 - > > 13342 > > ======= > > 12413 > > The 13,342 number at pgdp.net is the number of its 'projects' that have > "posted to PG" status. However, it isn't the case that 1 pgdp.net > project equals 1 PG etext. Large books are often split into multiple > projects and then merged into a single posting -- sometimes (mostly in > the past) the multiple projects would each get the "posted to PG" > status, so contributing more than one to the number in question. > > Instead, to get the number of PG texts that were contributed by > pgdp.net, the best approximation is the number in the upper right of > most pages at the site, currently 12,897. (This is the number of > distinct PG etext numbers that we have recorded for our posted projects.) > > Also, I'm guessing that PG's "reserved count" is still about 40, so when > #25755 is posted, that means PG (USA) has about 25715 books. So the > correct calculation is something more like: > > 25715 PG USA total > - 12897 # from pgdp.net > ----- > 12818 # from elsewhere > > So pgdp.net still accounts for more than half, but it's pretty close. > The last time I did that calculation (a couple months ago, I think), > PGDP's contribution was just under half, so we must have crossed the > equator sometime since then. > > Mind you, if you set aside audio and video files and only look at texts > per se, PGDP's contribution has been more than half for quite a while. > Someone else can do that calculation if they want. > > Of course, if you consider the larger interpretations of "Project > Gutenberg", PGDP's fraction thereof will be less. 
-- Paul Maas paulmaas at airpost.net -- http://www.fastmail.fm - Choose from over 50 domains or use your own From hart at pglaf.org Wed Jun 11 09:53:38 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 11 Jun 2008 09:53:38 -0700 (PDT) Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <484EBFF0.2040108@perathoner.de> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> <484EBFF0.2040108@perathoner.de> Message-ID: Once again the author below would have you believe 25,xxx are all the Project Gutenberg eBooks in existence without giving any credit to PG of Australia with 1640, or Canada with ~100, or Europe with ~500, which are enough by themselves to counter his argument of "simple arithmetic" as listed below. NOT to even mention PrePrints with 387 or to mention his dreaded Nemesis he named indirectly below, the dreaded site at: http://www.gutenberg.cc with 75,000+. Since he wasn't there at the Las Vegas meeting, he can do only the most indirect kibitzing from the sidelines and the only way to make sure to keep his remarks in a proper perspective is to save his messages, rather than just the easier alternative of deleting them, and returning them a while later when they will impact his face with the same, or actually reversed, impact he intended. It would appear that his major goal is to cause strife in our midst here, and since no one replied that they needed any assistance in refuting his rants and raves, we'll see just how well he manages without any feedback for a bit. However, the more he rants and raves, the more he proves, again and again, that freedom of speech is strong here at Project Gutenberg. 
And, just to make the "simple arithmetic" even more so: Current Totals 25,755 Project Gutenberg Under US Copyright Law 1,640 Project Gutenberg Of Australia 504 Project Gutenberg of Europe 138 Project Gutenberg of Canada [through May] ====== 28,037 Grand Total Not to mention some worthwhile titles at PrePrints: 387 Project Gutenberg PrePrints or, since he mentioned the dreaded II below, we should, but won't, add in the 75,000+ eBooks donated by those entirely outside the Project Gutenberg environment from around the world, but who don't have any distribution. And it would certainly be too much to consider a first Project Gutenberg spin-off, Project Runeberg, from way before DP's time, or Project Wittenberg, or. . . . On Tue, 10 Jun 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> The message below will have you believe that DP did all this >> without either permission or encouragement from me. > > I didn't know anybody needed your permission to do ebooks. > > >> As I have stated in reply to these same accusations before, >> I personally went to Las Vegas, the then home town of DP's >> founder, Charles Franks, where we met in person and worked >> out all the details he had in mind, but the author below-- >> sadly to say--was not in attendance. A latecomer. > > The accounts I heard about this meeting where somehow different. > But it often happens that persons percieve the same situation in a > different manner. > > >> In addition, he would have you believe that DP has created >> much more than half of all the eBooks at PG sites. > > I said DP produced more books than PG. Not *much* more. > > As of today and according to DP they have completed and posted > 13,342 books. > > As of PG, today we have posted #25755. > > 25755 - > 13342 > ======= > 12413 > > The 13,342 books posted thru DP are more than the 12,413 books > posted thru other channels. About a thousand more. And they did it > in 8 years, not in 38. 
> > And this simple arithmetic proves that you create less ebooks by > "offer[ing] as many freedoms to our volunteers as possible" and by > not being "very bossy about what our volunteers should do". > (http://www.gutenberg.org/wiki/Gutenberg:Project_Gutenberg_Mission_Statement_by_Michael_Hart) > > It shows instead how you can create more books by providing > guidance and a productive environment to volunteers. It proves > that by requiring strict guidelines you actually go faster than by > requiring none. > > If there shall be a PG II, then that title belongs to DP. > > > -- > Marcello Perathoner > webmaster at gutenberg.org > From hart at pglaf.org Wed Jun 11 10:14:34 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 11 Jun 2008 10:14:34 -0700 (PDT) Subject: [gutvol-d] "Simple Arithmetic" In-Reply-To: References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> <484EBFF0.2040108@perathoner.de> Message-ID: 25,755 Project Gutenberg Under US Copyright Law 1,640 Project Gutenberg Of Australia 504 Project Gutenberg of Europe 138 Project Gutenberg of Canada [through May] ====== 28,037 Grand Total NOT mentioning so many more. 13,342 as stated in Marcello's "simple arithmetic" It's really a shame that these figures are so egocentric, US-centric, etc., as to derail the topic from what SHOULD be the real statement at hand. That Distributed Proofreaders is doing WONDERFUL WORK!!! And the misleading part about 8 years versus 38, well-- it's not quite "simple arithmetic" when you get into an example of such rapid growth curves, yet I think anyone here realizes that any such fast growth function yields much more in recent years than earlier years. Of course, we perhaps have to take into account that it is outside that author's timeframe of experience to say just how easy or hard it was starting those 38 years. Well, 37.95, or so. Let's see just how much growth there has been in the same 8-year period of flash RAM, for example??? No. . .not here, not now. 
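[Editor's note: Hart's growth-curve point can be made concrete with a minimal sketch. The steady 40% annual growth rate below is an assumption chosen purely for illustration, not a figure from this thread: with compounding output, the last 8 years of a 38-year run dominate the cumulative total, which is why comparing "8 years vs. 38 years" of raw counts says little by itself.]

```python
# Minimal sketch of the growth-curve argument.
# ASSUMPTION: a steady 40% annual growth rate, chosen only for
# illustration; it is not a figure taken from this thread.

def cumulative(rate, years):
    """Total output after `years` of compounding growth,
    starting from 1 unit of output in the first year."""
    total, annual = 0.0, 1.0
    for _ in range(years):
        total += annual
        annual *= 1 + rate
    return total

total_38 = cumulative(0.40, 38)
first_30 = cumulative(0.40, 30)
share_last_8 = (total_38 - first_30) / total_38
print(f"share of a 38-year total produced in the last 8 years: {share_last_8:.0%}")
```

Under that assumed rate, the last 8 years account for over 90% of the 38-year total, so a newer project producing "about a thousand more" books than the rest of an older one is roughly what a fast growth curve predicts, not evidence about methods on its own.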
This should really be a time for CONGRATULATIONS TO DP! Regardless of that author's inabilities to say it well, without improper "simple arithmetic" to detract from a GOOD SOLID "WELL DONE!!!!!!!" and MANY THANKS TO THE DISTRIBUTED PROOFREADERS!!!!!!! Michael S. Hart Founder Project Gutenberg From Bowerbird at aol.com Wed Jun 11 10:14:37 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 11 Jun 2008 13:14:37 EDT Subject: [gutvol-d] PGDP's contribution to PG, numerically speaking Message-ID: i'd say that, for some time now, 3/4 of the books that are being posted were digitized by distributed proofreaders... indeed, at this point, i'd estimate it conservatively at 4/5. and i wouldn't be surprised if it were to jump up to 9/10. or higher. these days, on the "independent" side, we have david widger, and al haines, and the chinese students, and not much more. there are literally _thousands_ of people working over at d.p. and ever since the p.g. website has had a banner that directs people to d.p., they've had a constant supply of volunteers... so if you thought d.p. is "independent" of p.g., think again. before those banners, they were worried about their churn. even now, _we_ should be worried about their burnout rate, because they are destroying a _huge_ number of digitizers. even the ones they keep are being stunted at the p1 phase. besides, a number of the people at d.p. are working there _because_ it feeds p.g. if it didn't, they'd work elsewhere... now... if only d.p. were _efficient_, they could be turning out a _lot_ more e-texts than they are. as it is, their inefficiency has them stalled out at a couple thousand e-texts per year. in comparison, google scans that many books before lunch. the d.p. number _could_ go up, but if you look more closely, you might well discover that it's because they're now doing many more _children's_books_, which have barely any text. they're putting a lot more time into the _illustrations_ now. 
that's not a bad thing. the emphasis on quantity is stupid. it's also the case that they are now mounting more scan-sets. again, lowers the quantity, but it's the right thing to be doing. to repeat, any emphasis on quantity is stupid. especially since that's a game that d.p. will lose when a more-efficient system appears on the scene... i suggest instead that people celebrate the fact that d.p. has crystallized a community of thousands for the purpose of digitizing the public domain... that's beautiful... -bowerbird ************** Vote for your city's best dining and nightlife. City's Best 2008. (http://citysbest.aol.com?ncid=aolacg00050000000102) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/84ea1d63/attachment.htm From Bowerbird at aol.com Wed Jun 11 10:25:21 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 11 Jun 2008 13:25:21 EDT Subject: [gutvol-d] cyberlibrary numbers Message-ID: jose said: > You know, Bowerbird, you disappoint me. I thought you'd be > jumping for joy after finding out that Google has scanned > more than twice as many books at the University of Michigan > as you thought they had. ;) as _i_ "thought" they had? the university itself _announced_ that they've scanned one million. made a big deal about it. and they continue to use _that_ number. why -- privately -- they told you something different, i don't know. or care. it might be that they have _scanned_ 2.5 million already, but have only _processed_ 1 million. if that's the case, then they really need to work on their processing, because it's a bottleneck. at any rate, i've said that i don't _care_ if they are behind schedule. as long as i think they're working hard on it, i'm happy with them... so i'm happy. but i think it's absolutely clear that if they only have a million public-facing e-books right now, they're behind schedule. 
and you can throw up a big smokescreen, but it's still perfectly clear. *** of course, if i were to have said that they _weren't_ behind schedule, you would have thrown up a big smokescreen to say that they _were_. because you don't really care about _the_truth_ much at all, jose, you just care about disputing whatever that i say. or michael says. and because of that, jose, _you_disappoint_me_. enough that it's to the point where i'm going to have to put you in my spam folder. criminey sakes, if i said your wife was good-looking, you would say she's ugly, just to dispute me. if i said the sky was blue, you'd argue. > It seems that Paul Courant doesn't answer you > when you address him on his blog either: that's right. i've got about 5 posts in to him without 1 reply yet... > I can't speak for either other one of them, but if I had to > hazard a guess, perhaps the reason they ignore you is that > they think you ought to do your homework. For instance, > you might start by reading UM's cooperative agreement with > Google. It's been available on the UM website since June 2005. don't be an ass, jose. of course i've read that contract. in fact, i even referred to various parts of it earlier in this very thread... > You might want to start with this paragraph: oh yes, i've read that paragraph too. indeed, that very paragraph was mentioned earlier over on courant's blog, before i had posted. and that paragraph certainly seems to say that umichigan _must_ try to thwart automated downloads. but yet john wilkin _insisted_ that their material is _free_. that's _why_ brewster challenged him, and carl malmud followed with a question that pinpointed the issue. on the one hand, they say their e-books are _free_, but on the other, you're not allowed to harvest them en masse. that's a contradiction. 
unlike carl, who posed the mass-harvest question as "hypothetical", i've informed them that i have very _real_ intentions to mass-scrape their public-domain books, to see how they resolve the contradiction. i have given the google project -- and umichigan in particular -- a good deal of support across cyberspace, precisely _because_ john wilkin has said -- loudly and clearly, from the beginning -- that they would make the public-domain material freely available. now they are trying to take that back, by saying that we can only "look" at the material, that we can't actually _download_ it ourselves. that's _bullshit_, and i'm going to call them on it, and do it publicly... of course, i'm sure you knew all this, and you're just kicking up dust, which is why -- from now on -- you're going in my spam folder, jose. if people want to believe the disinformation you spout out, they can... -bowerbird ************** Vote for your city's best dining and nightlife. City's Best 2008. (http://citysbest.aol.com?ncid=aolacg00050000000102) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/4baa3896/attachment.htm From Bowerbird at aol.com Wed Jun 11 10:44:29 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 11 Jun 2008 13:44:29 EDT Subject: [gutvol-d] cyberlibrary numbers -- one more question Message-ID: one more question, jose... i know you're retired now, but when you were working, were you a lawyer? the way you kick up dust, and blow smokescreens, well... well, it makes me think that you were a lawyer. is that right? -bowerbird ************** Vote for your city's best dining and nightlife. City's Best 2008. (http://citysbest.aol.com?ncid=aolacg00050000000102) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/f14a2c93/attachment.htm From Bowerbird at aol.com Wed Jun 11 11:05:10 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 11 Jun 2008 14:05:10 EDT Subject: [gutvol-d] good tools that do all the work (the short version) Message-ID: michael said: > his major goal is to cause strife in our midst i keep wondering why you have this person _inside_ your organization, minding your website. certainly web-jockeys can't be that hard to find. indeed, it seems to me that there are a lot of vultures around here, just waiting for you to die, so they can feast on the p.g. carcass and turn it into the opposite of what you intended, which made it great... of course, maybe once you're gone, and they take over, and p.g. fails, that will be the ultimate proof that your approach was the correct one. then someone else will come in and pick up the pieces and restore it... crafty, michael, crafty... :+) -bowerbird ************** Vote for your city's best dining and nightlife. City's Best 2008. (http://citysbest.aol.com?ncid=aolacg00050000000102) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080611/db091d3e/attachment.htm From hart at pglaf.org Wed Jun 11 11:05:28 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 11 Jun 2008 11:05:28 -0700 (PDT) Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: <484F6E1E.3090206@ibiblio.org> References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> Message-ID: Several more contradictions to refute: 1. Microsoft was not 100% responsible for 300K titles-- via the various Internet Archive / Open Content Alliance efforts that have been going on. Not to mention several years of effort without Microsoft that produced 100K. 1a. 
Let's not forget the previous lesson in which we had just learned to watch out for claims that /recent years/ are more important than /previous years/ logarithmically. 1b. Thus, if Microsoft were "seriously interested" in an eBook operation, they would certainly have done more. 1c. Given the sheer size of Microsoft, this can only, per that size ratio, be seen as "putting a toe in the water" at the very most. 1d. Hence the term "minuscule" is more appropriate here. Given the serious orders of magnitude difference between Microsoft and Yahoo, it was Microsoft's interest in this world of eBooks that was statistically "minuscule." If MS were at all seriously interested in eBooks, it had every opportunity and resource to go toe to toe with the Google effort, but never really applied itself to an effort more than about 10% of what Google laid out day 1 of their announcements. Thus it was never part of my expectations that Microsoft even WANTED to become a major player in the eBook world, and hence I was apparently less surprised than most of a world of punditry to see them go. . .since I wasn't sure they had ever even arrived to the tune of more than a toe in the door, or in the water, so to speak. And this doesn't even address "finished eBooks" of kinds we are used to discussing here. When one of the "giants" of our industry puts a million, or two, or three million eBooks into the system, even on a commercial basis, then, and only then will I take them at all seriously. . .and the longer they wait, the more, and more, and more eBooks it will take. Why? Because there will already be so many more, as per those lessons I've been mentioning about growth curves. There are already millions of eBooks out there without a dependence on any one source. Thus it's no big deal for Microsoft to assist in scanned outputs of 300,000, particularly if one considers larger sizing when it comes to giants. 
It's hard to say that a hundred billion dollar company's interest is serious when they spend 1% of 1% of it on it as in the case of the subject under discussion. When you are talking about giants worth a hundred billion, two hundred billion, etc., this is really more like just a toenail in the water, not even a toe. 1% of 100 billion is 1 billion 1% of 1 billion is 10 million. Let's say 300,000 books at 333 1/3 pages each. That's 100 million pages. . .100,000,000 At 10 cents a page, that's $10 million. However, that would have to be multiplied for each 100 billion of company value. Google is worth some $200+ billion. Microsoft is worth some $300+ billion. [End of last year] search: "microsoft is valued at" billion Thus 300K books at 333 1/3 pages each does not qualify, as it is only half of 1% of 1%. Google, on the other hand, was not only much more serious in the public relations aspect of eBooks but it was also more serious in the amount accomplished. However, as stated elsewhere, I do not predict fullness of what the public was led to expect anytime soon or in the predicted ranges given on December 14, 2004. On Wed, 11 Jun 2008, Jose Menendez wrote: > I thought this deserved its own thread. > > > On June 6, 2008, I wrote: > >> P.S. I'm surprised that no one has mentioned on gutvol-d before >> now that Microsoft was quitting its book scanning operation. > > > The same day Michael Hart replied: > >> Perhaps no one here actually believed Microsoft was serious about >> doing eBooks in the first place. >> >> Perhaps some of the people here realized that Microsoft was not >> going to be happy about losing the Yahoo! deal, and combined it >> with the fact that Yahoo! is also a major supporter of the same >> OCA [Open Content Alliance] that Microsoft was trying to get in >> perhaps yet another kind of takeover bid. 
> > > Well, if you'd done your homework, Michael, you would have realized > that Microsoft was not only serious, but it was responsible for the > *overwhelming majority* of books scanned by the OCA. You also would > have realized that Yahoo is NOT a "major supporter" of the OCA. > Indeed, in terms of financial support, Yahoo WAS a minuscule supporter > of the OCA compared to Microsoft. (Notice I switched to the past > tense. There's a reason for that.) Want some proof? > > Looking at the Internet Archive's Text Archive page: > > http://www.archive.org/details/texts > > I see "202,578 items" for the American Libraries sub-collection and > "119,424 items" for the Canadian Libraries. Adding them together, we > get a total of 322,002. > > (Those numbers are updated regularly as more books are put online, so > people may see higher numbers later.) > > Now how many of those 322,000+ books were thanks to Yahoo and how many > were thanks to Microsoft? Let's start with "major supporter" Yahoo: > > http://www.archive.org/details/yahoo_books > > I hope you're sitting down when you read this, Michael, because Yahoo > contributed the staggering total of "1,075 items." That's not a typo; > I didn't leave out a few digits. Only 1,075! If we divide 1,075 by > 322,002, we get 0.00334. So only 0.334% of the books the OCA has > scanned and put online from American and Canadian libraries are thanks > to "major supporter" Yahoo! > > Now let's see how Microsoft did: > > http://www.archive.org/details/msn_books > > The total for Microsoft is "288,518 items." (Since the OCA is still > scanning books with funds contributed by Microsoft, this total keeps > getting updated too.) > > 288,518 divided by 322,002 equals 0.896. So 89.6% of the books the OCA > has scanned and put online from American and Canadian libraries are > thanks to the company "perhaps no one here actually believed ... was > serious." 
> > If you're still tempted to cling to the notion that Microsoft wasn't > serious and that Yahoo is a "major supporter" of the OCA, let's take a > look at Brewster Kahle's own announcement on May 26 about Microsoft's > decision: > > http://www.archive.org/iathreads/post-view.php?id=194217 > > It's not too long, so I'm going to quote the whole thing, with a few > comments from me enclosed in brackets []. > > > "The Internet Archive operates 13 scanning centers in great libraries, > digitizing 1000 books a day. This scanning is financially supported by > libraries, foundations, and the Microsoft Corporation. Today, > Microsoft has announced that it will ramp down their investment in > this area. We very much appreciate their efforts and funding in book > scanning over the last 3 years. As a result, over 300,000 books are > publicly available on the archive.org site that would not otherwise be." > > [Note that Brewster didn't mention ANY financial support from Yahoo. > See why I switched to the past tense and said that "Yahoo WAS a > minuscule supporter of the OCA"?] > > "To their credit, they said they are taking off any contractual > restrictions on the public domain books and letting us keep the > equipment that they funded. This is extremely important because it can > allow those of us in the public sphere to leverage what they helped > build. Keeping the public domain materials public domain is where we > all wanted to be. Getting a books scanning process in place is also a > major accomplishment. Thank you Microsoft." > > [Note the mention of "contractual restrictions." I'll get to those a > little later.] > > "Funding for the time being is secure, but going forward we will need > to replace the Microsoft funding. Microsoft has always encourage the > Open Content Alliance to work in parallel in case this day arrived. > Lets work together, quickly, to build on the existing momentum. All > ideas welcome. > > "Onward to a completely public library system!" 
> > > Did you notice, Michael, that Brewster didn't say anything about Yahoo > helping to replace the Microsoft funding? > > >> Quite possibly Microsoft found, as did the Federal Spook Agency >> and Co., that Brewster Kahal, who runs the OCA, is not quite an >> easy to maniputulate person as they had assumed. > > > But apparently his surname is easy to misspell. ;) Seriously, do you > recall the "contractual restrictions" I pointed out in Brewster's > announcement? If you look again at the Microsoft page at the Internet > Archive: > > http://www.archive.org/details/msn_books > > you'll see a box labeled "Rights" on the left side of the page. Here's > what it says: > > > "Books scanned before November 1, 2006 are under OCA principles, > thereafter they are available for non-commercial use and may not > appear in commercial services. Please contact info at archive.org or > Microsoft about bulk access." > > > Brewster Kahle may not be "easy to maniputulate [sic]," but he didn't > stop Microsoft from violating the OCA principles it had agreed to. > > > Jose Menendez > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From jmdyck at ibiblio.org Wed Jun 11 13:24:43 2008 From: jmdyck at ibiblio.org (Michael Dyck) Date: Wed, 11 Jun 2008 13:24:43 -0700 Subject: [gutvol-d] age of PG In-Reply-To: References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> Message-ID: <4850348B.1060908@ibiblio.org> Marcello Perathoner wrote: > DP ... produced more books in 8 years than PG in 37 years. and then Michael Hart wrote: > > And, by the way, it's more like 38 years, but who would make > the presumptive call that the author in question pays a more > than lip-service attention to what he is saying. . . . 
and > Of course, we perhaps have to take into account that it > is outside that author's timeframe of experience to say > just how easy or hard it was starting those 38 years. > > Well, 37.95, or so. Sources I've found indicate that PG started on July 4, 1971, which means that it's very close to 37 years old (36.94 if you like). So it seems that, on this point, Marcello was quite correct, and Michael not so much. See, e.g., http://www.gutenberg.org/newsletter/archive/PGWeekly_2008_01_23.txt -Michael Dyck From hart at pglaf.org Wed Jun 11 18:30:36 2008 From: hart at pglaf.org (Michael Hart) Date: Wed, 11 Jun 2008 18:30:36 -0700 (PDT) Subject: [gutvol-d] age of PG In-Reply-To: <4850348B.1060908@ibiblio.org> References: <484DB329.8060802@novomail.net> <484E6429.7040504@perathoner.de> <4850348B.1060908@ibiblio.org> Message-ID: Sorry, my bad "simple arithmetic". . .or got stuck on an olde typo and never corrected it. . . . We will START our 38th year on July 4. Well, technically, July 5th, as it was after midnight. Michael On Wed, 11 Jun 2008, Michael Dyck wrote: > Marcello Perathoner wrote: >> DP ... produced more books in 8 years than PG in 37 years. > > and then Michael Hart wrote: >> >> And, by the way, it's more like 38 years, but who would make >> the presumptive call that the author in question pays a more >> than lip-service attention to what he is saying. . . . > > and > >> Of course, we perhaps have to take into account that it >> is outside that author's timeframe of experience to say >> just how easy or hard it was starting those 38 years. >> >> Well, 37.95, or so. > > Sources I've found indicate that PG started on July 4, 1971, which > means that it's very close to 37 years old (36.94 if you like). So it > seems that, on this point, Marcello was quite correct, and Michael not > so much. 
> > See, e.g., > http://www.gutenberg.org/newsletter/archive/PGWeekly_2008_01_23.txt > > -Michael Dyck > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From ebooks at ibiblio.org Thu Jun 12 22:04:47 2008 From: ebooks at ibiblio.org (Jose Menendez) Date: Fri, 13 Jun 2008 01:04:47 -0400 Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> Message-ID: <4851FFEF.3010701@ibiblio.org> On June 11, 2008, Michael Hart wrote: > Several more contradiction to refute: But you didn't "refute" anything. You just spouted more rhetoric and unsubstantiated assertions. > 1. Microsoft was not 100% responsible for 300K titles-- > via the various Internet Archive / Open Content Alliance > efforts that have been going on. Not to mention several > years of effort without Microsoft that produced 100K. I never said that Microsoft was "100% responsible," did I? I wrote "288,518 divided by 322,002 equals 0.896. So 89.6% of the books the OCA has scanned and put online from American and Canadian libraries are thanks to the company 'perhaps no one here actually believed ... was serious.'" 89.6% of 322,002 is not the same as 100%, is it? As for this line of yours, "Not to mention several years of effort without Microsoft that produced 100K," you would have been better off not mentioning it, because it's false. Here's a link to a "Wall Street Journal" article from Nov. 9, 2005: "Building an Online Library, One Volume at a Time" http://online.wsj.com/public/article/SB113111987803688478-VNpw62xi_JA4avE8cxOZf0pf_nM_20061109.html?mod=blogs And here's an excerpt: "The Internet Archive's effort to get books online is still in its early stages. In the little more than a year since the group started scanning books, it has digitized just 2,800 books, at a cost of about $108,250. 
Funding has come largely from libraries that have paid to have their texts digitized. Work will likely speed up now that Microsoft and Yahoo are on board; both companies joined the effort in October...." Let's see. 100K - 2,800? Congratulations, Michael, you were only off by 97,200! :) By the way, Jim Tinsley posted a link to that "Wall Street Journal" article here on the gutvol-d list back on Nov. 12, 2005: http://lists.pglaf.org/private.cgi/gutvol-d/2005-November/003526.html And Bowerbird also posted a link to that article on the Book People list back on Sept. 4, 2006: http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-09-04,7 So you've had opportunities to read it. > 1d. Hence the term "minuscule" is more appropriate here. > Given the serious orders of magnitude difference between > Microsoft and Yahoo, it was Microsoft's interest in this > world of eBooks that was statistically "minuscule." According to Yahoo! Finance, Microsoft's current "market cap" (market capitalization) is $263.01 billion, and its enterprise value is $228.55 billion. http://finance.yahoo.com/q/ks?s=MSFT Yahoo's current market cap is $32.36 billion, and its enterprise value is $33.95 billion. http://finance.yahoo.com/q/ks?s=YHOO Market cap ratio: 263.01/32.36 = 8.13 Enterprise value ratio: 228.55/33.95 = 6.73 So, in both market capitalization and enterprise value, there isn't even one order of magnitude difference between Microsoft and Yahoo. If we look at their book totals for the OCA, however, we will find "serious orders of magnitude difference." Microsoft's total is now up to 290,123. http://www.archive.org/details/msn_books Yahoo's total is still the piddling 1,075. http://www.archive.org/details/yahoo_books 290,123/1,075 = 269.9 So, despite your rhetoric and false assertions, Michael, Yahoo is still the one whose support for the OCA was "minuscule." > Let's say 300,000 books at 333 1/3 pages each. > > That's 100 million pages. . 
.100,000,000 > > At 10 cents a page, that's $10 million. It's funny you didn't do the same sort of calculation for Yahoo's OCA contribution. Let's say 1,200 books (I'm rounding up Yahoo's 1,075 total to get nice even results) at 333 1/3 pages each. That's 400,000 pages. At 10 cents a page, that's $40,000. Hmmm.... "Minuscule" may have been too generous an adjective for Yahoo's support of the OCA. > Google is worth some $200+ billion. > > Microsoft is worth some $300+ billion. > > [End of last year] > > search: > > "microsoft is valued at" billion The first thing that struck me here was that you didn't give a value for Yahoo, which would have instantly exposed your false claim of "serious orders of magnitude difference between Microsoft and Yahoo." The second thing that struck me here was how you used the wrong verb tense, after criticizing Josh Hutchinson the other day about his tenses. The "end of last year" is not the present; it's in the past. You should have said, "Google was worth ..." and "Microsoft was worth ..." They're definitely worth less now. The third thing that struck me here was your suggestion to search for "'microsoft is valued at' billion." Here's a tip for you, Michael: There are web sites that people can use to look up detailed financial information about companies. For instance, there's Yahoo! Finance (http://finance.yahoo.com/), which I used earlier to look up the value of Microsoft and Yahoo. If we look up Google's "key statistics," http://finance.yahoo.com/q/ks?s=GOOG we'll see that its current market cap is $173.68 billion, and its enterprise value is $159.11 billion. Jose Menendez From Bowerbird at aol.com Thu Jun 12 23:01:04 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 13 Jun 2008 02:01:04 EDT Subject: [gutvol-d] microsoft digitization Message-ID: i'd like to thank microsoft for the $5 million they generously kicked in to digitize books, before their recent decision to exit the scene. 
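Jose's ratio arithmetic above is easy to reproduce; this sketch uses the figures quoted in the thread (June 2008 snapshots of market capitalization and OCA book counts — they are not current values) and `log10` to make the "orders of magnitude" dispute concrete:

```python
import math

# Figures as quoted in the thread (June 2008 snapshots, USD / book counts).
msft_cap, yhoo_cap = 263.01e9, 32.36e9
msft_books, yhoo_books = 290123, 1075

cap_ratio = msft_cap / yhoo_cap      # company-value ratio
book_ratio = msft_books / yhoo_books # OCA book-count ratio

# floor(log10(ratio)) counts whole orders of magnitude between the two.
print(round(cap_ratio, 2), math.floor(math.log10(cap_ratio)))    # 8.13 0
print(round(book_ratio, 1), math.floor(math.log10(book_ratio)))  # 269.9 2
```

As the output shows, the company valuations differ by less than one order of magnitude, while the book totals differ by more than two.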
that's a lot more money than i have, for sure! but i trust it won't strain bill's retirement plans. -bowerbird From hart at pglaf.org Fri Jun 13 09:58:47 2008 From: hart at pglaf.org (Michael Hart) Date: Fri, 13 Jun 2008 09:58:47 -0700 (PDT) Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: <4851FFEF.3010701@ibiblio.org> References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> <4851FFEF.3010701@ibiblio.org> Message-ID: It would be nice if you could keep your tenses straight, then perhaps people would at least TRY to believe you in the future. . .but based on your past. . .hardly. . . . As for Yahoo value, that's YOUR bailiwick. I was talking about Microsoft. I only mention Yahoo as an aside, but have no interest in once again doing your homework for you. You seem to lack a working knowledge of what it means to be responsible, either in MS's case here, or your own. Couldn't you at least PRETEND to consider what your rants and raves will look like years down the road? Taking my quotes out of context won't get you anywhere, you can always quote from a later date with 20/20 hindsight, but no one will be impressed, and you will only make them aware YOU didn't have better figures either, way back when. As this goes on and on, I understand "1984" and rewriting history all the more. . .so I guess I have to make sure it goes as well as possible before I am gone to make it just all the more obvious what you will be doing afterwards. "Morituri te salutamus." Meanwhile, the name the several of you are making for yourselves is nothing I would rely on in the future.
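The per-page cost estimates argued over in this thread reduce to a few lines of arithmetic; a sketch using the thread's own round assumptions (333 1/3 pages per book, 10 cents per page — these are the discussion's figures, not audited costs, and the helper name is just for illustration). Integer arithmetic with the exact fraction 1000/3 avoids floating-point noise:

```python
# The thread's assumptions: 333 1/3 pages per book, 10 cents per page.
PAGES_NUM, PAGES_DEN = 1000, 3   # 333 1/3 expressed as an exact fraction
CENTS_PER_PAGE = 10

def scan_cost_dollars(books):
    """Estimated digitization cost in whole dollars for a given book count."""
    pages = books * PAGES_NUM // PAGES_DEN
    return pages * CENTS_PER_PAGE // 100

print(scan_cost_dollars(300000))  # 10000000 -- the $10 million Microsoft-scale estimate
print(scan_cost_dollars(1200))    # 40000 -- the $40,000 Yahoo-scale estimate
```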
On Fri, 13 Jun 2008, Jose Menendez wrote: > On June 11, 2008, Michael Hart wrote: > >> Several more contradiction to refute: > > > But you didn't "refute" anything. You just spouted more rhetoric and > unsubstantiated assertions. > > >> 1. Microsoft was not 100% responsible for 300K titles-- >> via the various Internet Archive / Open Content Alliance >> efforts that have been going on. Not to mention several >> years of effort without Microsoft that produced 100K. > > > I never said that Microsoft was "100% responsible," did I? I wrote > > > "288,518 divided by 322,002 equals 0.896. So 89.6% of the books the > OCA has scanned and put online from American and Canadian libraries > are thanks to the company 'perhaps no one here actually believed ... > was serious.'" > > > 89.6% of 322,002 is not the same as 100%, is it? > > As for this line of yours, "Not to mention several years of effort > without Microsoft that produced 100K," you would have been better off > not mentioning it, because it's false. Here's a link to a "Wall Street > Journal" article from Nov. 9, 2005: > > "Building an Online Library, One Volume at a Time" > http://online.wsj.com/public/article/SB113111987803688478-VNpw62xi_JA4avE8cxOZf0pf_nM_20061109.html?mod=blogs > > And here's an excerpt: > > > "The Internet Archive's effort to get books online is still in its > early stages. In the little more than a year since the group started > scanning books, it has digitized just 2,800 books, at a cost of about > $108,250. Funding has come largely from libraries that have paid to > have their texts digitized. Work will likely speed up now that > Microsoft and Yahoo are on board; both companies joined the effort in > October...." > > > Let's see. 100K - 2,800? Congratulations, Michael, you were only off > by 97,200! :) > > By the way, Jim Tinsley posted a link to that "Wall Street Journal" > article here on the gutvol-d list back on Nov. 
12, 2005: > > http://lists.pglaf.org/private.cgi/gutvol-d/2005-November/003526.html > > And Bowerbird also posted a link to that article on the Book People > list back on Sept. 4, 2006: > > http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-09-04,7 > > So you've had opportunities to read it. > > >> 1d. Hence the term "minuscule" is more appropriate here. >> Given the serious orders of magnitude difference between >> Microsoft and Yahoo, it was Microsoft's interest in this >> world of eBooks that was statistically "minuscule." > > > According to Yahoo! Finance, Microsoft's current "market cap" (market > capitalization) is $263.01 billion, and its enterprise value is > $228.55 billion. > > http://finance.yahoo.com/q/ks?s=MSFT > > Yahoo's current market cap is $32.36 billion, and its enterprise value > is $33.95 billion. > > http://finance.yahoo.com/q/ks?s=YHOO > > Market cap ratio: 263.01/32.36 = 8.13 > > Enterprise value ratio: 228.55/33.95 = 6.73 > > So, in both market capitalization and enterprise value, there isn't > even one order of magnitude difference between Microsoft and Yahoo. If > we look at their book totals for the OCA, however, we will find > "serious orders of magnitude difference." Microsoft's total is now up > to 290,123. > > http://www.archive.org/details/msn_books > > Yahoo's total is still the piddling 1,075. > > http://www.archive.org/details/yahoo_books > > 290,123/1,075 = 269.9 > > So, despite your rhetoric and false assertions, Michael, Yahoo is > still the one whose support for the OCA was "minuscule." > > >> Let's say 300,000 books at 333 1/3 pages each. >> >> That's 100 million pages. . .100,000,000 >> >> At 10 cents a page, that's $10 million. > > > It's funny you didn't do the same sort of calculation for Yahoo's OCA > contribution. > > Let's say 1,200 books (I'm rounding up Yahoo's 1,075 total to get nice > even results) at 333 1/3 pages each. > > That's 400,000 pages. > > At 10 cents a page, that's $40,000. 
> > Hmmm.... "Minuscule" may have been too generous an adjective for > Yahoo's support of the OCA. > > >> Google is worth some $200+ billion. >> >> Microsoft is worth some $300+ billion. >> >> [End of last year] >> >> search: >> >> "microsoft is valued at" billion > > > The first thing that struck me here was that you didn't give a value > for Yahoo, which would have instantly exposed your false claim of > "serious orders of magnitude difference between Microsoft and Yahoo." > > The second thing that struck me here was how you used the wrong verb > tense, after criticizing Josh Hutchinson the other day about his > tenses. The "end of last year" is not the present; it's in the past. > You should have said, "Google was worth ..." and "Microsoft was worth > ..." They're definitely worth less now. > > The third thing that struck me here was your suggestion to search for > "'microsoft is valued at' billion." Here's a tip for you, Michael: > There are web sites that people can use to look up detailed financial > information about companies. For instance, there's Yahoo! Finance > (http://finance.yahoo.com/), which I used earlier to look up the value > of Microsoft and Yahoo. If we look up Google's "key statistics," > > http://finance.yahoo.com/q/ks?s=GOOG > > we'll see that its current market cap is $173.68 billion, and its > enterprise value is $159.11 billion. 
> > > Jose Menendez > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/listinfo.cgi/gutvol-d > From marcello at perathoner.de Fri Jun 13 10:43:28 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Fri, 13 Jun 2008 19:43:28 +0200 Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> <4851FFEF.3010701@ibiblio.org> Message-ID: <4852B1C0.1060809@perathoner.de> Michael Hart wrote: > It would be nice if you could keep your tenses straight, ... but then got his Latin all tangled up ... > "Morituri te salutamus." ... and his mathematics is even worse. http://en.wikipedia.org/wiki/Ave_Caesar_morituri_te_salutant -- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Fri Jun 13 12:35:14 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 13 Jun 2008 15:35:14 EDT Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) Message-ID: michael said: > "Morituri te salutamus." i hate latin. and greek. but i love google, and the dictionary: > http://www.merriam-webster.com/dictionary/morituri%20te%20salutamus "we (or those) who are about to die salute thee." this is what gladiators are reputed to have said, as a salute to the emperor before their battles... latin is dead, michael. but you still have spunk... but if you do go before me (which is no certainty), i'll save your library from the invading technoids... and the light markup revolution is the future. so if i go before i've saved your library, then _someone_else_will_... -bowerbird p.s. try this one: > http://penelope.uchicago.edu/Thayer/E/Roman/Texts/secondary/journals/TAPA/70/Morituri_Te_Salutamus*.html p.p.s. 
or this one: > http://en.wikipedia.org/wiki/For_Those_About_to_Rock_We_Salute_You which includes things like this: > The title track's popularity was such that > in every live concert AC/DC has done thereafter, > the song is performed as an encore and is > always accompanied by firing cannons on stage. and > On Nintendo's website, the ad for the > Wii version of Guitar Hero III: Legends of Rock states > "For those about to rock, Wii salute you". which, i guess, takes the cake... -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080613/e2d7e212/attachment.htm From Bowerbird at aol.com Fri Jun 13 14:11:12 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 13 Jun 2008 17:11:12 EDT Subject: [gutvol-d] take a look at an old book Message-ID: there's old, and then there is _old_, as in incunabula: > http://www.kottke.org/08/06/hypnerotomachia-poliphili -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080613/d4b0195f/attachment.htm From donovan at abs.net Fri Jun 13 16:24:16 2008 From: donovan at abs.net (D Garcia) Date: Fri, 13 Jun 2008 19:24:16 -0400 Subject: [gutvol-d] take a look at an old book In-Reply-To: References: Message-ID: <200806131924.16749.donovan@abs.net> On Friday 13 June 2008 17:11, Bowerbird at aol.com wrote: > there's old, and then there is _old_, as in incunabula: > > http://www.kottke.org/08/06/hypnerotomachia-poliphili > > -bowerbird And for those of you who weren't already aware of the existence of this book, see PG # 18459, posted 2006-05-27 and produced by Distributed Proofreaders from a facsimile reprint. From hart at pglaf.org Sat Jun 14 14:34:44 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 14 Jun 2008 14:34:44 -0700 (PDT) Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: <4852EB86.6060507@perathoner.de> References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> <4851FFEF.3010701@ibiblio.org> <4852B1C0.1060809@perathoner.de> <4852EB86.6060507@perathoner.de> Message-ID: The author below STILL hasn't done his homework. . .sadly to say, but par for his tour of the course. 1. The way this salute is intended is to give an added motivation to the gladiators to WIN. . .then they have NOT saluted, as their words were only true IF THEY DIED. We who are NOT about to die do NOT salute you!!! 2. There were gladiators for centuries before Caesar. . .duh!!! So how could Caesar's version be the original???. . .duh!!! 3. This is what happens when someone rewrites history. "To the victors belong the spoils" and just ONE of the spoils is being able to rewrite history to one's liking. . . . The author below could quote searches that indicate at a ten to one ratio, or so, that his quotation is the correct one.
But that is just a memento of the FACT that Julius Caesar rewrote history so damned thoroughly that even our historians quote his "Hail Caesar" ten times as often as what the salute actually was for hundreds of years before Caesar came along. 4. "The fault lies NOT in the stars, dear Marcello, the fault lies in ourselves. . . ." History, even when it is there to be read, is most often left to the interpretations of others. . .sadly to say. It's all too obvious. . .doesn't always need interpreters. On Fri, 13 Jun 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> "amus" is "we". . .not "ant". . . . >> >> For those of you who never actually took Latin. > > Oh, how I wish! I actually "was taken" to Latin much against my > will. > > >> _I_ for one, at least, have not crowned YOU Caesar. >> >> "We who are about to die salute you" >> >> Is different than: >> >> "Those who are about to die salute you." > > "We who are about to die salute you" is wrong, because everybody > was saluting but not everybody that saluted was going to die. > > "Those who are about to die salute you" is right because everybody > who died, had saluted. > > Check the locus classicus, the Life of Claudius by Suetonius, > 21.6. : > > http://penelope.uchicago.edu/Thayer/L/Roman/Texts/Suetonius/12Caesars/Claudius*.html#21.6 > > But you seem to prefer getting your quotes wrong, (or quoting > people that got their quotes wrong) because you can't be bothered > doing your homework. > > >> But, then, truly, YOU, have always insisted on being a Caesar >> in a world where there are no Caesars allowed. . . . > > Last time it was Stalin, this time it is Caesar, who will it be > next? Napoleon? Hitler? Bush? 
> > > > -- > Marcello Perathoner > webmaster at gutenberg.org > From marcello at perathoner.de Sat Jun 14 15:19:27 2008 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 15 Jun 2008 00:19:27 +0200 Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> <4851FFEF.3010701@ibiblio.org> <4852B1C0.1060809@perathoner.de> <4852EB86.6060507@perathoner.de> Message-ID: <485443EF.30006@perathoner.de> Michael Hart wrote: > The author below STILL hasn't done his homework. . .sadly to say, > but par for his tour of the course. It is bad manners to post a private mail you received without asking the sender first. Also, I have a name, and your "the author below" affectation is childish at best. Also, why frontchannel this discussion again after we held it private for a while? > But that is just a memento of the FACT that Julius Caesar > rewrote history so damned thoroughly that even our historians > quote his "Hail Caesar" ten times as often as what the salute > actually was for hundreds of years before Caesar came along. How could Julius Caesar have rewritten history that happened a hundred years after he was dead? I did give you a link to the only classical mention in Latin of the phrase you (mis-)quoted. Here it is again: http://penelope.uchicago.edu/Thayer/L/Roman/Texts/Suetonius/12Caesars/Claudius*.html#21.6 ---- Life of Claudius by Suetonius, 21.6. If you had bothered to check the reference I gave you, you would easily have spotted the fact that Suetonius was talking about Tiberius Claudius Caesar (10 BC - AD 54) and not about Gaius Julius Caesar (100 BC - 44 BC). You ranted about the *wrong* Caesar! Embarrassing, Michael, embarrassing.
-- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Sat Jun 14 15:21:18 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 14 Jun 2008 18:21:18 EDT Subject: [gutvol-d] microsoft digitization Message-ID: i said: > that's a lot more money than i have, for sure! and even if i did have $5 million dollars, or $25 million, i probably wouldn't donate much of it to digitize books. nope, i'd probably spend it on hookers and blow... :+) -bowerbird From grythumn at gmail.com Sat Jun 14 15:34:18 2008 From: grythumn at gmail.com (Robert Cicconetti) Date: Sat, 14 Jun 2008 18:34:18 -0400 Subject: [gutvol-d] take a look at an old book In-Reply-To: <200806131924.16749.donovan@abs.net> References: <200806131924.16749.donovan@abs.net> Message-ID: <15cfa2a50806141534x36b6a361p712a69fed055baac@mail.gmail.com> On Fri, Jun 13, 2008 at 7:24 PM, D Garcia wrote: > On Friday 13 June 2008 17:11, Bowerbird at aol.com wrote: >> there's old, and then there is _old_, as in incunabula: >> > http://www.kottke.org/08/06/hypnerotomachia-poliphili >> >> -bowerbird > > And for those of you who weren't already aware of the existence of this book, > see PG # 18459, posted 2006-05-27 and produced by Distributed Proofreaders > from a facsimile reprint. Just as a note, that edition is a facsimile reprint of an early (~1592) partial English translation of the Latin/Italian original (which is much prettier, FWIW). The first complete English translation was published relatively recently (1999).
R C (PM of said project) From hart at pglaf.org Sat Jun 14 15:37:48 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 14 Jun 2008 15:37:48 -0700 (PDT) Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: <485443EF.30006@perathoner.de> References: <4848ED25.1040202@ibiblio.org> <484F6E1E.3090206@ibiblio.org> <4851FFEF.3010701@ibiblio.org> <4852B1C0.1060809@perathoner.de> <4852EB86.6060507@perathoner.de> <485443EF.30006@perathoner.de> Message-ID: On Sun, 15 Jun 2008, Marcello Perathoner wrote: > Michael Hart wrote: > >> The author below STILL hasn't done his homework. . .sadly to >> say, >> but par for his tour of the course. > > It is bad manners to post a private mail you received without > asking the sender first. It is equally bad manners to send an ex parte reply to a public message. . .please do not do so again. If _I_ did that, it was in error, either that the message to you arrived and the message to the list did not, or operator error, but it was not intentional. > Also, I have a name, and your "the author below" affectation is > childish at best. This is to insure everyone knows that I am not making this case as an ad hominem scenario, but just to keep things honest. Using your name would make it too personal. > Also, why frontchannel this discussion again after we held it > private for a while? I am not trying to make this a private discussion, and I thought all our messages were going through the list. > > >> But that is just a memento of the FACT that Julius Caesar >> rewrote history so damned thoroughly that even our historians >> quote his "Hail Caesar" ten times as often as what the salute >> actually was for hundreds of years before Caesar came along. > > How could Julius Caesar have rewritten history that happened a > hundred years after he was dead? "before he was dead". . .don't you READ before your reply??? How do you expect ANYONE to take you seriously???
There are some 22,000 links to the search I did, and they date back to 4th Century BC, which is way before any of the Caesars' time. > I did give you a link to the only classical mention in Latin of > the phrase you (mis-)quoted. Here it is again: > > http://penelope.uchicago.edu/Thayer/L/Roman/Texts/Suetonius/12Caesars/Claudius*.html#21.6 > > ---- Life of Claudius by Suetonius, 21.6. > > > If you had bothered to check the reference I gave you, you would > easily have spotted the fact that Suetonius was talking about > Tiberius Claudius Caesar (10 BC - AD 54) and not about Gaius > Julius Caesar (100 BC - 44 BC). > > You ranted about the *wrong* Caesar! > > Embarrassing, Michael, embarrassing. That's the WHOLE POINT. . .AND /YOU/ MISSED IT. . . . There were NOT any "Caesars" before Julius. . . . Thus ALL references to "Caesars" are hundreds of years AFTER the origins of gladiators and their salutes. Oh, you also missed the point about their motivation, as ONLY those who were about to die were saluting Caesar. Repeat: Please don't send me private replies to public messages, they will always go back to the list. > > > -- > Marcello Perathoner > webmaster at gutenberg.org > From hart at pglaf.org Sat Jun 14 15:44:15 2008 From: hart at pglaf.org (Michael Hart) Date: Sat, 14 Jun 2008 15:44:15 -0700 (PDT) Subject: [gutvol-d] Leaving This List Message-ID: I am going to do a trial run of not answering the various rants and raves from the well-known and even lesser-known tag team flame warriors on this list, and let Greg Newby, our CEO, advise me when I should reply. Obviously the recent events have not been constructive to any degree, and I only reply for the sake of making sure, to whatever degree I can, that honesty and accuracy are a value someone is trying to keep alive. No one has sent me any messages stating that there is any real need to refute this handful of pretenders, so I will simply await requests from our general population or Greg Newby. .
.unless I have a day that is too boring. . . . Michael From gbnewby at pglaf.org Sat Jun 14 16:33:00 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Sat, 14 Jun 2008 16:33:00 -0700 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: References: Message-ID: <20080614233300.GH23938@mail.pglaf.org> On Wed, Jun 11, 2008 at 02:05:10PM -0400, Bowerbird at aol.com wrote: > michael said: > > his major goal is to cause strife in our midst > > i keep wondering why you have this person _inside_ your organization, > minding your website. certainly web-jockeys can't be that hard to find. Marcello has done, and continues to do, outstandingly good volunteer service in maintaining the gutenberg.org Web site. That has no bearing on his freedom to express himself on the gutvol-d list, any more than anyone else. For Michael, or me, to seek to turn down major contributions due to a disagreement, argument, etc. would be pretty inconsistent with the overall management (or "non-management") of PG. > indeed, it seems to me that there are a lot of vultures around here, > just waiting for you to die, so they can feast on the p.g. carcass and > turn it into the opposite of what you intended which made it great... There are a variety of reasons why "taking over" isn't so easily done, with or without Michael's involvement. The most important, I think, is that taking over really means doing a lot of work to create something new, or taking the current PG & augmenting it...evidently, there aren't too many folks ready to do that, even WITH Michael's encouragement. Some have, though. Thus, we have: PGDP [pgdp.net] PGCC [gutenberg.us] plus national sites like PG Canada Plus those who decided the PG way wasn't their way, and did something pretty different. It's not hubris to put archive.org in that group. > of course, maybe once you're gone, and they take over, and p.g. fails, > that will be the ultimate proof that your approach was the correct one.
> then someone else will come in and pick up the pieces and restore it... > crafty, michael, crafty... :+) PG as a collection is pretty resilient...hard to make it go away, by design. Things like the Web site, catalog, and other metadata are also pretty resilient, though take a somewhat delicate infrastructure to maintain. Things like mailing lists are transient, not mission critical. I don't know what you mean by failure. The work that's been done, is done -- the fruits of that labor are available, and will remain available. I can think of various things that might indicate the end of PG as it is now [for example, having no new content to add to the collection], but how is that failure for PG? -- Greg From Bowerbird at aol.com Sat Jun 14 18:15:37 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 14 Jun 2008 21:15:37 EDT Subject: [gutvol-d] good tools that do all the work (the short version) Message-ID: greg said: > Marcello has done, and continues to do, outstandingly good > volunteer service in maintaining the gutenberg.org Web site. i'm glad to hear it. :+) > That has no bearing on his freedom to express himself > on the gutvol-d list, any more than anyone else. i'm glad he expresses himself too. even if i don't find anything worthwhile when he does. (and heck, he's been in my spam folder for a while now, so maybe his more-recent posts have been nicely-reasoned pieces of logic.) still... it's pretty clear what he thinks about michael... and i wouldn't give my house-keys to a sworn enemy. (but, you know, maybe that's just me.) i also think it's quite ironic that michael seems to get a lot more respect from the outside world than he gets right here on the p.g. listserve. but i guess that's what they say about prophets... > I don't know what you mean by failure. i could describe some scenarios, but that would be conjecture.
the files you have now -- and the ones you will gain in the future -- will remain available, so if you consider that collection to be "success", now and into the future, then no, there's no way that p.g. can "fail"... of course, using such a definition, the university of virginia collection would also be considered a "success". but i wouldn't wanna be them... so, how do _you_ define "success" and "failure"? -bowerbird From grythumn at gmail.com Sat Jun 14 22:22:52 2008 From: grythumn at gmail.com (Robert Cicconetti) Date: Sun, 15 Jun 2008 01:22:52 -0400 Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) In-Reply-To: References: <4851FFEF.3010701@ibiblio.org> <4852B1C0.1060809@perathoner.de> <4852EB86.6060507@perathoner.de> <485443EF.30006@perathoner.de> Message-ID: <15cfa2a50806142222y74d148deqfe76ae1b4446bfc@mail.gmail.com> On Sat, Jun 14, 2008 at 6:37 PM, Michael Hart wrote: >> It is bad manners to post a private mail you received without >> asking the sender first. > > It is equally bad manners to send an ex parte reply to a public > message. . .please do not do so again. It is polite practice in most online communities to take discussions off list when they go offtopic, or degenerate into flame wars*, in respect for the time of others on the list not directly involved. http://www.albion.com/netiquette/rule7.html http://www.dtcc.edu/cs/rfc1855.html#3 This thread ceased being amusing quite a while ago. R C (* excluding forums set aside for flaming, of course.)
From Bowerbird at aol.com Sun Jun 15 01:18:32 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 15 Jun 2008 04:18:32 EDT Subject: [gutvol-d] Microsoft quits mass book digitization (was cyberlibrary numbers) Message-ID: robert said: > It is polite practice in most online communities > to take discussions off list when they go offtopic, > or degenerate into flame wars* oh my goodness, we're talking about "polite practice" now? what a kick! as if this "discussion" has _ever_ been "polite"... that's a knee-slapper if i ever heard one... > This thread ceased being amusing quite a while ago. well, hey, robert, you just changed that! :+) michael's position on this question was very straightforward: microsoft was never all that serious about book digitization, so there's no reason to mourn now that they have opted out. michael predicted at the start that they wouldn't stick around. now maybe someone came up with a well-reasoned argument against his positions, and i don't know about it because they're in my spam folder, in which case i hope someone will repeat it... but i'd guess that instead it was just the typical run-of-the-mill gamut of insults thrown at michael in the hope that the lurkers would fail to sort the barrage to determine that _nothing_stuck_. as far as _i_ am concerned, microsoft corrupted the "purity" of the o.c.a., all for a positively _tiny_ amount of money in the big scheme of things (you know, the arena where they made a bid to buy yahoo for _$44_billion_), so i'm _glad_ they've left our little neighborhood, as now brewster can go back to being true to his basic philosophy... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080615/1871690f/attachment.htm From walter.van.holst at xs4all.nl Sun Jun 15 05:16:31 2008 From: walter.van.holst at xs4all.nl (Walter van Holst) Date: Sun, 15 Jun 2008 14:16:31 +0200 Subject: [gutvol-d] good tools that do all the work (the short version) In-Reply-To: <20080614233300.GH23938@mail.pglaf.org> References: <20080614233300.GH23938@mail.pglaf.org> Message-ID: <4855081F.2040309@xs4all.nl> Greg Newby wrote: > I don't know what you mean by failure. The work that's been done, is > done -- the fruits of that labor are available, and will remain > available. I can think of various things that might indicate the end of > PG as it is now [for example, having no new content to add to the > collection], but how is that failure for PG? And that touches exactly the point of contention. PG, like about any other open content/open source/free software project that has produced anything of interest, cannot fail in the sense of being completely in vain as long as what has been produced is still accessible. PG does, however, fail in the sense that it doesn't even grab relatively low-hanging fruit. It is a failure in the sense of unfulfilled potential, or at least potential fulfilled much more slowly than possible. The useless flamewars on this will keep raging as long as the participants of this list do not acknowledge that there is more than one failure mode and that PG is in the second one. Even someone as misguided in matters of engineering as Leslie "Bowerbird Intelligentleman" Hanson seems at least able to make some effort to grasp said low-hanging fruit. However sympathetic the Zen-like way of running PG may be, letting thousands of flowers bloom does not rule out at least expressing a slight preference for quality control and structure.
There is a vast spectrum between the "we're not going to do any form of quality control other than copyright clearance" approach currently taken and a borderline fascist insistence on strict adherence to TEI-formatting as the other extreme. It is a pity and a shame that all this has to deteriorate into a clash of massive egos every time this comes up. As others have said already, this did indeed cease to be entertaining quite a while ago. Regards, Walter From Bowerbird at aol.com Sun Jun 15 08:44:01 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 15 Jun 2008 11:44:01 EDT Subject: [gutvol-d] good tools that do all the work (the short version) Message-ID: walter said: > Leslie "Bowerbird Intelligentleman" Hanson do please leave my girlfriend out of the discussion. thanks. :+) otherwise, relatively well-put, walter... the main place where you are off-base is that, on your "vast spectrum", it is possible to grab all kinds of fruit (not just the "low-hanging" variety) by moving just a _smidgen_ away from the "zen-like" side, without any need at all to go anywhere near the "fascist" end, and once you see me making that happen, you'll realize i wasn't "misguided", but right on-target... and once greg realizes the huge benefits that become available from the small bump that consistency adds to the cost side of the equation, he too will come to see that p.g. in its current state really is a "failure". a very fortunate, happy "failure", to be sure, but a failure nonetheless... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080615/fa3b45bd/attachment.htm From hart at pglaf.org Sun Jun 15 11:11:51 2008 From: hart at pglaf.org (Michael Hart) Date: Sun, 15 Jun 2008 11:11:51 -0700 (PDT) Subject: [gutvol-d] Subject: !@!
Re: Microsoft quits mass book digitization. . . . Message-ID: Please, as a matter of decent practice on this list, at least ANNOUNCE that you are speaking off list, and send that announcement BOTH in a message to the list AND in a message to the person you are emailing, to make sure they know. On Sun, 15 Jun 2008, Robert Cicconetti wrote: > On Sat, Jun 14, 2008 at 6:37 PM, Michael Hart wrote: >>> It is bad manners to post a private mail you received without >>> asking the sender first. >> >> It is equally bad manners to send an ex parte reply to a public >> message. . .please do not do so again. > > It is polite practice in most online communities to take discussions > off list when they go offtopic, or degenerate into flame wars*, in > respect for the time of others on the list not directly involved. > > http://www.albion.com/netiquette/rule7.html > http://www.dtcc.edu/cs/rfc1855.html#3 > > This thread ceased being amusing quite a while ago. > > R C > (* excluding forums set aside for flaming, of course.) > From tb at baechler.net Sun Jun 15 12:12:58 2008 From: tb at baechler.net (Tony Baechler) Date: Sun, 15 Jun 2008 12:12:58 -0700 Subject: [gutvol-d] Preprints: Hart/Newby HOPE 6 presentation uploaded Message-ID: <20080615191258.GA30136@investigative.net> All, I wouldn't normally post this type of announcement to the list, but I saw that the Wilson SF books from Preprints were recently posted, so I thought I would announce here to avoid duplication of effort. I have processed and uploaded the HOPE number 6 presentation with Michael Hart and Greg Newby. It's on the pglaf.org server, but who knows when it will be posted. If anyone wants the files before they're officially posted, ask here and I'll contact you off list. The mp4 file is fairly huge, but the audio files aren't that big. Again, this announcement is informational only, to avoid duplicated effort.
From julio.reis at tintazul.com.pt Sun Jun 15 14:46:11 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Sun, 15 Jun 2008 22:46:11 +0100 Subject: [gutvol-d] No (more) flames In-Reply-To: References: Message-ID: <1213566371.7712.165.camel@abetarda> Robert Cicconetti said, > It is polite practice in most online communities to take > discussions > off list when they go offtopic, or degenerate into flame > wars*, in > respect for the time of others on the list not directly > involved. and > This thread ceased being amusing quite a while ago. That's how I feel. I subscribe to this list because I'm a volunteer for PG, mostly working in DP, inviting others to look at PG, and working on the PG catalog. What I get from this list mostly is that Marcello and Michael hate each other, for a reason or twelve; that Bowerbird hates DP and most people there (and here) hate him, for a reason or one hundred; and that's it. Not much that really interests Gutenberg volunteers like myself. And yet, this list is called gutvol-d. I'm interested in ebooks, not in power struggles, or in name-calling, or in why DP treats its vols so badly, or in why that other markup is so much better than this one. So I ask myself: why do I still subscribe to this list? The answer is, I expect the discussions to change. I don't know what to say. The most positive thing that happened over here lately was Michael deciding not to reply to it anymore. A bit sad, really. My question is -- will everybody else stop the off-topic stuff? Júlio. PS -- Or how about starting gutrant-d just to debate the bad leaders and the bad markup and the bad treatment of volunteers and other bad stuff?
oh please. you seem to be very bad at figuring out the truth. so here, let me spell it out for you, very clearly... 1. i don't "hate" distributed proofreaders. to the direct contrary, i love it. thousands of volunteers digitizing the public domain, how can you not? what i _do_ "hate" is the fact that too much time and energy from these thousands of volunteers is _wasted_ by a tremendously bad workflow... i've detailed the numerous problems with this workflow _many_ times, yet the d.p. "powers that be" just drag their feet on making corrections, mostly because they don't like to have their authority challenged, at all, let alone by a vocal critic who can muster up the power of logic like i can, and who'll do the work of building up a mountain of supportive evidence. that's why they silenced me on their own forums. that way, as time went on, they've been able to implement many of the changes that i'd suggested, but without "conceding" that i was correct. i'll document these changes later on. it should be very clear to you, and everyone else, that i spend a lot of time researching and writing my posts. i'm willing to give that time for the cause precisely because i do _love_ the volunteers for project gutenberg and d.p. for something i don't care about (university of virginia?), i spend zero time. 2. most of the people at distributed proofreaders don't even _know_ me, let alone "hate" me. of the ones that _do_ know me, many of them realize my intentions come from a good heart. even the ones who won't grant that have come to learn (painfully) they cannot mount an argument against me. you can't find one case where i've given d.p. bad advice... not a single one. 3. most of the people _here_ on this listserve do not "hate" me either... they don't like all the flack, but the vast majority of them clearly know (and will tell you) that my antagonists are responsible for that, not me... i'm glad that you are speaking up to say you are tired of all the flack. 
i'm tired of it too. i've been tired of it for a very, very, very long time... so i hope the people who are _responsible_ for the flack get the message. we want it to stop. we want to have quiet, rational discussions on this list. > gutrant-d if you think i write "rants", you're not reading the evidence i provide... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080615/dae7ba3f/attachment.htm From grythumn at gmail.com Sun Jun 15 16:57:10 2008 From: grythumn at gmail.com (Robert Cicconetti) Date: Sun, 15 Jun 2008 19:57:10 -0400 Subject: [gutvol-d] New question (Was No (more) flames) Message-ID: <15cfa2a50806151657v31a7975dy2b34f2abbd7be38a@mail.gmail.com> On Sun, Jun 15, 2008 at 5:46 PM, Júlio Reis wrote: > Robert Cicconetti said, >> It is polite practice in most online communities to take >> discussions >> off list when they go offtopic, or degenerate into flame >> wars*, in >> respect for the time of others on the list not directly >> involved. > > and > >> This thread ceased being amusing quite a while ago. > > That's how I feel. > [...] > I'm interested in ebooks, not in power struggles, or in name-calling, or > in why DP treats its vols so badly, or in why that other markup is so > much better than this one. So I ask myself: why do I still subscribe to > this list? The answer is, I expect the discussions to change. Well, let me pull out a real question that I've been working on.. I have a clearance on most of the OED. I'm trying to figure out what 'final format' to shoot for, as this is going to require a lot of markup not standard for DP, and I'll probably have to devise a simplified or condensed form for the formatting rounds. My top candidates right now are 1) A flavor of TEI (leaning towards freedict standards), or 2) XDXF.
It's clear from looking at the text that semantic markup is going to be easier than presentational for this project, as many of the style differences are quite subtle. Does anyone have any experiences or recommendations to share? R C From gbnewby at pglaf.org Mon Jun 16 10:51:46 2008 From: gbnewby at pglaf.org (Greg Newby) Date: Mon, 16 Jun 2008 10:51:46 -0700 Subject: [gutvol-d] OED Message-ID: <20080616175146.GA23026@mail.pglaf.org> On Sun, Jun 15, 2008 at 5:46 PM, Robert Cicconetti wrote: > >Well, let me pull out a real question that I've been working on.. I >have a clearance on most of the OED. I'm trying to figure out what >'final format' to shoot for, as this is going to require a lot of >markup not standard for DP, and I'll probably have to devise a >simplified or condensed form for the formatting rounds. My top >candidates right now are 1) A flavor of TEI (leaning towards freedict >standards), or 2) XDXF. > >It's clear from looking at the text that semantic markup is going to >be easier than presentational for this project, as many of the style >differences are quite subtle. Does anyone have any experiences or >recommendations to share? Robert, I don't really know the answer...the OED is immensely complex, as you know..lots of typography, fonts, etc. But I wanted to say: GO FOR IT! This is a massive project, and really, really important. Having it be machine readable will be a wonderful contribution. -- Greg From Bowerbird at aol.com Mon Jun 16 11:13:55 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 16 Jun 2008 14:13:55 EDT Subject: [gutvol-d] 6 weeks of nothing been done Message-ID: it's now been 6 weeks since the "confidence in page" page was last updated over on the distributed proofreaders wiki. basically, the person who was working on the task has been swallowed up by real-world responsibilities, meaning that this particular wild goose chase has come to a complete halt.
so the fixed-round workflow remains ensconced. this is a shame, because it impacts both quantity _and_ quality. quantity is dampened because some pages are being handled _too_many_times_. they were easy enough to be perfected early, and the subsequent rounds are simply wasted energy. and quality is hurt as well, because some pages don't get seen _enough_ times, so flaws remain because of insufficient views. so even though "the d.p. powers that be" _agree_ that they need a roundless workflow, they aren't doing anything to bring it about. (which means they don't really grasp the importance of it after all, because if they did, they would _work_ to make it a high priority.) what _is_ happening is that people are doing all kinds of ad-hoc, one-off "round juggling" -- repeating p1, skipping rounds, etc., which helps only a little bit and is mostly just a big waste of time that is tolerated because it vents the volunteers' frustration and impatience with the broken nature of the fixed-round system... and meanwhile, the answer to "how can we be confident that a specific page needs no more proofing?" is as clear as it ever was, namely, when it undergoes _n_ iterations without any changes, where _n_ can be any number you want it to be, i suggest _2_... if they just implemented this simple solution, they'd see it works. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed...
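the "n iterations without any changes" rule described above is simple enough to state in code. here is a minimal python sketch of that stopping criterion -- an illustration only, not d.p. code; the function name and the idea of passing a list of page versions are assumptions made for the example:

```python
def page_is_done(versions, n=2):
    """Roundless stopping criterion: a page is finished once it has
    survived n consecutive proofing passes without any change.

    versions: the page text after each pass, oldest first.
    With n=2, the last three stored versions must be identical,
    i.e. the last two passes each returned the text unchanged.
    """
    if len(versions) < n + 1:
        return False  # not enough passes yet to demonstrate stability
    tail = versions[-(n + 1):]
    return all(v == tail[0] for v in tail)
```

with this in place, a page would leave the pool as soon as two proofers in a row found nothing to fix, however early that happens, while a troublesome page would keep circulating until it finally stabilizes.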
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080616/7989f25b/attachment.htm From julio.reis at tintazul.com.pt Tue Jun 17 03:15:25 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Tue, 17 Jun 2008 11:15:25 +0100 Subject: [gutvol-d] No (more) flames In-Reply-To: References: Message-ID: <1213697725.6578.49.camel@abetarda> Bowerbird said, > i'm glad that you are speaking up to say you are tired of all the > flack. i'm tired of it too. i've been tired of it for a very, very, > very long time... I may not be very good at figuring out the truth :D but I don't dispute people's feelings. So you're tired of all the flak... I'm with you; just note that sentences such as this one actually *attract* flak: > oh please. you seem to be very bad at figuring out the truth. And your following remark sounds a tad patronizing, which might attract even *more* flak: > so here, let me spell it out for you, very clearly... Take care not to fan the flames. :) Anyway, good to know you love DP, honest I couldn't tell from your posts here. You say the DP bosses treat their proofers badly -- but it is *you* who make them feel bad for being treated like this or like that. So spread that love around and improve the condition for ebook volunteers somewhere; if your mojo won't work at DP, help somewhere else. Do something creative. I've seen z-m-l.com which might or might not be a good idea; try other stuff, somewhere else. I wish there was real competition to DP, and to Gutenberg; even because if competition to DP would increase the throughput of free texts, then it's really not competition; same for another huge free ebook library not really being competition to Gut, but working towards the same goal: free texts. Júlio.
From Bowerbird at aol.com Tue Jun 17 04:39:51 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 17 Jun 2008 07:39:51 EDT Subject: [gutvol-d] No (more) flames Message-ID: julio said: > just note that sentences such as this one actually *attract* flak: as do sentences like: > Bowerbird hates DP and most people there (and here) hate him c'est la vie... > Take care not to fan the flames. :) i take care not to get burned. after that, let the flames burn what they will... > Anyway, good to know you love DP, > honest I couldn't tell from your posts here. seriously? people go to great lengths to defend the people they love. > You say the DP bosses treat their proofers badly -- > but it is *you* who make them feel bad for being treated like this well, yes, and that is a delicious conundrum. when a person you love has taken up with someone who treats them badly -- and we all know that this happens all the time -- do you say something? or do you let them suffer in silence? it's always a difficult question to answer, and the answer, it seems to me, often depends upon just how bad the treatment is... and when you can document that the treatment is _very_ bad, you squawk... especially when you can easily make that documentation, and clearly show the treatment is _very_ bad, you squawk... so i'm squawkin'... > So spread that love around and > improve the condition for ebook volunteers somewhere; well, i'm really trying to decide how i can do that, julio, i really am... > if your mojo won't work at DP, help somewhere else. yeah, i'm not above thinking that. i'm really not... but there is something... something very insidious... about _forking_... it's not good for collaborative projects... it's really not... so i have a large degree of reluctance when it comes to forking... > Do something creative. > I've seen z-m-l.com which might or might not be a good idea; well, in case you can't decide, it's an excellent idea... and as i fork the p.g. 
library, you'll come to realize that... (but again, i fork with extreme reluctance...) > try other stuff, somewhere else. no need to. the test bed has already been seeded... > I wish there was real competition to DP, and to Gutenberg; well, again, i'm not sure, because forking is not a healthy thing... > even because if competition to DP would > increase the throughput of free texts, > then it's really not competition; > same for another huge free ebook library > not really being competition to Gut, but > working towards the same goal: free texts. well yeah, it's not that something else is "competition". because -- as you've noted -- it's pretty much all complementary... online resources are _not_limited_, so they do not have a zero-sum relationship with one another, so they can't cannibalize each other, so the situation of "competition" doesn't really exist in their world... this is a big part of michael's overall philosophy, and it does work... but forking _is_ nonetheless a real danger, because it splits resources... it disturbs synergy, and synergy is the be-all and end-all of collaboration. i could've mounted an alternative to distributed proofreaders long ago. (at one point, i announced one, named "committed proofreaders", but..) but the thing is, i don't really want to create and nurture a _community_, which is what you would have to do, if you wanted to create another d.p. the technology is one thing, but the ass-kissing aspect is quite another... and again, the disruption of synergy is all-important. but yes, my frustration has grown to the point where i am _willing_ to entertain the thought of doing forking... and you'll see that manifesting in my posts in the immediate future... -bowerbird ************** Gas prices getting you down? Search AOL Autos for fuel-efficient used cars. (http://autos.aol.com/used?ncid=aolaut00050000000007) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080617/9442b7c8/attachment.htm From Bowerbird at aol.com Tue Jun 17 12:37:40 2008 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 17 Jun 2008 15:37:40 EDT Subject: [gutvol-d] the great preprocessing escapade Message-ID: wow. where do i begin? one of the weakest of the weak links in the d.p. workflow chain is "preprocessing". this step actually happens _after_ o.c.r., but _before_ the first proofing round, so d.p. calls it "preprocessing". i have made rational arguments -- long and hard -- that d.p. could help their quantity _and_ their quality by improving their preprocessing routines. as usual, the "powers that be" from d.p. have pretended to ignore me, tried to attack my credibility, and implemented some of my suggestions in a roundabout way so they wouldn't have to give me credit. (i do not need credit, but very often the roundabout implementations get details wrong.) and mostly, they just continue on with shoddy preprocessing... you can see where i've talked about this topic on the d.p. forums by searching messages for "preprocessing" or "pre-processing". here's a typical thread, this one appearing in january of 2007: > http://www.pgdp.net/phpBB2/viewtopic.php?t=24634 this thread is notable because i shared a useful hint with them, on the topic of how to fix "spacey quotes" -- quotemarks that have a space on both sides, which is not uncommon from o.c.r.: > the secret of fixing doublequotes > is counting them within-paragraph. now, i'll grant you that that "secret" is _pretty_obvious_ once you hear it, but i can assure you _most_ people hadn't thought of it, including programmers of the current d.p. preprocessing tools. indeed, dkretz responded by saying: > Of course. Thanks, I hadn't thought of that. i followed up with some elaboration, and he included it in his clean-up apps, and i believe he has had much success with it. but dkretz is about the only person who seemed to be listening. 
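the within-paragraph counting trick can be made concrete. this is a minimal python illustration of the idea only -- not dkretz's or rfrank's actual code: count the doublequotes already seen in the paragraph, and a spacey quote preceded by an even count must be an opener (snug it against the following word), while an odd count makes it a closer (snug it against the preceding word):

```python
def fix_spacey_quotes(paragraph):
    """Fix doublequotes that o.c.r. left with a space on both sides.

    The trick is counting quotemarks within the paragraph: an even
    number of quotes seen so far means a spacey quote is an opening
    quote (drop the space after it); an odd number means it is a
    closing quote (drop the space before it).
    """
    out = []
    seen = 0  # quotemarks encountered so far in this paragraph
    i = 0
    while i < len(paragraph):
        ch = paragraph[i]
        if ch == '"':
            spacey = (i > 0 and paragraph[i - 1] == ' '
                      and i + 1 < len(paragraph) and paragraph[i + 1] == ' ')
            if spacey:
                if seen % 2 == 0:
                    out.append('"')   # opening quote...
                    i += 1            # ...so skip the space after it
                else:
                    if out and out[-1] == ' ':
                        out.pop()     # closing quote: drop the space before
                    out.append('"')
            else:
                out.append(ch)
            seen += 1
        else:
            out.append(ch)
        i += 1
    return ''.join(out)
```

this handles the both-sides-spacey case the thread describes; quotes already snug on one side, and paragraphs with an odd total quote count (dialogue continued across paragraphs), would need extra rules.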
indeed, that thread is a good illustration of how i have helped d.p. -- or tried to, anyway -- and how they've ignored my suggestions. you'll find a ton of good advice from me, just in that one thread... it also illustrates how d.p. has intentionally set obstacles in my path. for instance, you can read how i offered to correct their _database_ -- i.e., the text for _all_of_their_projects -- in one fell swoop, but they refused to give me a copy of that database -- a 1.5-gig file -- and suggested instead that i download each of the 18,000+ e-texts _individually_, over the course of a good number of days. stupidity! and then they objected to _that_ because it would "strain their server". so this is how ridiculous the d.p. "powers that be" have treated me... so -- needless to say -- they've been stuck with lousy preprocessing. and thus, i was cheered last december, when rfrank (roger frank) announced he was working on an improved preprocessing tool... > http://www.pgdp.net/phpBB2/viewtopic.php?t=30903 you'll even see that, in regard to fixing spacey quotes, dkretz repeated my suggestion about the secret of doing it correctly: > http://www.pgdp.net/phpBB2/viewtopic.php?p=403153#403153 that thread ended after just 2 days. but over the past 6 months, rfrank has been using cpprep, to good effect, from what i can tell. a search for "cpprep" turns up lots of instances where it was used, and the reports of the fixes made show it saves work for proofers. *** so i expected this to be an upbeat message about how d.p. improved. however... when i looked at some of the early cpprep results, i was disappointed to find that it was simply _marking_ the spacey doublequotes, and not actually _fixing_ 'em. but i figured that, over time, they would observe that the fixes the program _would_have_ auto-applied were _correct_, in the vast majority of cases, and thus could be made with confidence, and they would flip the automatic-fix switch. 
in looking at more recent cpprep output, however, i find that not only have they not flipped the auto-fix switch, but have actually _regressed_ to where they are marking both types of spacey-quotes the same way! that is, both the probable-open and probable-close spacey-quotes are being marked with an asterisk in front of them, indicating the tool has become _less_ certain of its ability to discern them, and not more so. i haven't confirmed this on a number of files, because it's so hard to know just exactly which files were treated in which way, but the one file where i found these results is clearly a very recent one, and did have cpprep on it: > http://www.pgdp.net/c/project.php?id=projectID481cf3f654893 > http://www.pgdp.net/c/project.php?id=projectID481cf42c2d365 there are 2 projects listed, because this was a parallel-p1 experiment. in case you are interested, the project just released from the p2 queue: > http://www.pgdp.net/c/project.php?id=projectID4836c3cc2a3f5 i will detail the results of this experiment in a later post, but for now, i'll note that this method of marking spacey-quotes (with an asterisk) actually seems to be _counter-productive_... there are several cases where there was a difference between the two parallel p1 proofings on the spacey-quotes, where one proofing was _downright_wrong_. it wasn't just _missed_, it was acted upon and turned into an _error_... (there might be more where both p1 proofings made the same error.) that's not good. furthermore, several of the other parallel-differences indicate that the preprocessing that is being done is fairly lame. i make this judgment even though -- in the project comments -- rfrank said that the project "has undergone significant preprocessing". that's really unfortunate, because it indicates that d.p. still has a very long road to travel before they have attained an adequate preprocessing tool. in the meantime, proofers are having to make corrections the machine could be fixing... 
i certainly would have hoped that 6 months of work from rfrank would have resulted in a lot more progress toward a good preprocessing tool. -bowerbird ************** Gas prices getting you down? Search AOL Autos for fuel-efficient used cars. (http://autos.aol.com/used?ncid=aolaut00050000000007) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080617/b284ff78/attachment.htm From julio.reis at tintazul.com.pt Tue Jun 17 15:25:01 2008 From: julio.reis at tintazul.com.pt (=?ISO-8859-1?Q?J=FAlio?= Reis) Date: Tue, 17 Jun 2008 23:25:01 +0100 Subject: [gutvol-d] US public domain for works published abroad Message-ID: <1213741501.14935.50.camel@abetarda> Hmmm... guys, probably everyone's got here before I did. And yet -- http://www.copyright.cornell.edu/public_domain/ -- is this reliable? Under "Works Published Outside the U.S. by Foreign Nationals or U.S. Citizens Living Abroad," see line: * Date of Publication: 1923 through 1977 * Conditions: Published without compliance with US formalities, and in the public domain in its home country as of 1 January 1996 * Copyright Term in the United States: In the public domain Footnote 11 reveals that "US formalities include the requirement that a formal notice of copyright be included in the work; registration, renewal, and deposit of copies in the Copyright Office; and the manufacture of the work in the US." It follows that Gutenberg *could* clear a few more books... Rule 10? :) Published outside the US (not in the US within 30 days) *and* from 1923 to 1977 *and* (no formal notice of copyright *or* no deposit with US Copyright Office *or* not made in the USA) *and* PD in country of publication as of 1 Jan 1996 = Cleared. Rule 10-PT would clear books published in Portugal of authors dead on or before 1925; not much more, but a bit more (already enough to clear something in my library.) 
Rule 10-BR would clear books published in Brazil of authors dead on or before 1935; and so on and so forth. So? Is it worth PG's effort? Júlio. From prosfilaes at gmail.com Tue Jun 17 16:45:29 2008 From: prosfilaes at gmail.com (David Starner) Date: Tue, 17 Jun 2008 19:45:29 -0400 Subject: [gutvol-d] US public domain for works published abroad In-Reply-To: <1213741501.14935.50.camel@abetarda> References: <1213741501.14935.50.camel@abetarda> Message-ID: <6d99d1fd0806171645r31dd4cbbq6d9bcdd4aa7a9ac0@mail.gmail.com> On Tue, Jun 17, 2008 at 6:25 PM, Júlio Reis wrote: > So? Is it worth PG's effort? I think it's been mentioned as possible before. But there are a couple complexities. First, it basically requires a Rule 6 check, since it's very hard to tell whether a book was published in the US within 30 days or not. The other big issue is "in the public domain in its home country as of 1 January 1996". What, exactly, is a reliable source for the state of the public domain in Portugal in 1996? Especially one in English, so our copyright clearers can verify it? Given continuing changes in copyright law around the world, and the fact that things just aren't as simple as life+70 in so many cases I know of, this is somewhat difficult. (One detail: 1996 only applies to Berne convention countries; Vietnam is later, and other countries may have a start date in the 21st century, when we start to retroactively recognize Iraqi and Iranian copyrights.) Personally, if you are interested, I think your best bet is to take the book you want, establish very carefully that it is in the public domain under these rules, and try to get a copyright clearance. It'll force them to look at the situation, and if they accept it, they'll have a precedent.
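Júlio's proposed "Rule 10" reduces to a boolean formula, which can be written out directly. The sketch below only encodes the formula as stated in the thread; the function and parameter names are invented for the example, it is not an actual PG clearance rule, and it is certainly not legal advice:

```python
def rule_10_eligible(pub_year, published_outside_us, in_us_within_30_days,
                     had_us_copyright_notice, deposited_with_us_office,
                     manufactured_in_us, pd_in_home_country_1996):
    """Encode the proposed Rule 10 test from the thread.

    Per David Starner's caveats, each input is itself hard to verify,
    and the 1 January 1996 date applies only to Berne-convention
    countries; others have later restoration dates.
    """
    # "US formalities" per the Cornell chart's footnote 11: any one
    # formality being missed is enough.
    no_us_formalities = (not had_us_copyright_notice
                         or not deposited_with_us_office
                         or not manufactured_in_us)
    return (published_outside_us
            and not in_us_within_30_days
            and 1923 <= pub_year <= 1977
            and no_us_formalities
            and pd_in_home_country_1996)
```

For example, a 1930 Lisbon edition with no US copyright notice, whose author's term had expired in Portugal by 1996, would pass; the same book also published in the US within 30 days would not.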
If PG does start clearing books this way, I wouldn't be surprised if they make a list of countries to work from, where they feel confident about the state of copyright law From sly at victoria.tc.ca Tue Jun 17 18:10:39 2008 From: sly at victoria.tc.ca (Andrew Sly) Date: Tue, 17 Jun 2008 18:10:39 -0700 (PDT) Subject: [gutvol-d] US public domain for works published abroad In-Reply-To: <1213741501.14935.50.camel@abetarda> References: <1213741501.14935.50.camel@abetarda> Message-ID: I have thought the same thing about works published in the former Soviet Union. But trying to approach the actual copyright laws involved gets confusing. Andrew On Tue, 17 Jun 2008, Júlio Reis wrote: > Hmmm... guys, probably everyone's got here before I did. And yet -- > > http://www.copyright.cornell.edu/public_domain/ -- is this reliable? > > Under "Works Published Outside the U.S. by Foreign Nationals or U.S. > Citizens Living Abroad," see line: > > * Date of Publication: 1923 through 1977 > * Conditions: Published without compliance with US formalities, and in > the public domain in its home country as of 1 January 1996 > * Copyright Term in the United States: In the public domain > > Footnote 11 reveals that "US formalities include the requirement that a > formal notice of copyright be included in the work; registration, > renewal, and deposit of copies in the Copyright Office; and the > manufacture of the work in the US." > > It follows that Gutenberg *could* clear a few more books... Rule 10? :) > Published outside the US (not in the US within 30 days) *and* from 1923 > to 1977 *and* (no formal notice of copyright *or* no deposit with US > Copyright Office *or* not made in the USA) *and* PD in country of > publication as of 1 Jan 1996 = Cleared. > > Rule 10-PT would clear books published in Portugal of authors dead on or > before 1925; not much more, but a bit more (already enough to clear > something in my library.)
>
> Rule 10-BR would clear books published in Brazil of authors dead on or
> before 1935; and so on and so forth.
>
> So? Is it worth PG's effort?
>
> Júlio.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/listinfo.cgi/gutvol-d
>

From walter.van.holst at xs4all.nl Tue Jun 17 22:06:46 2008
From: walter.van.holst at xs4all.nl (Walter van Holst)
Date: Wed, 18 Jun 2008 07:06:46 +0200
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <6d99d1fd0806171645r31dd4cbbq6d9bcdd4aa7a9ac0@mail.gmail.com>
References: <1213741501.14935.50.camel@abetarda> <6d99d1fd0806171645r31dd4cbbq6d9bcdd4aa7a9ac0@mail.gmail.com>
Message-ID: <485897E6.3060008@xs4all.nl>

David Starner wrote:
> days or not. The other big issue is "in the public domain in its home
> country as of 1 January 1996". What, exactly, is a reliable source for
> the state of the public domain in Portugal in 1996? Especially one in
> English, so our copyright clearers can verify it? Given continuing

That would be the Portuguese copyright statutes as of that date, which are not very likely to be available in English and unlikely to be available online. It nonetheless would be very useful to compile a set of EU member state copyright statuses as of 1 January 1995 and 1 January 1996. The first set because in some EU member states 1 January 1995 is the marker date for retroactive extension from life+50 to life+70, the second for this rule 10 clearance.

Regards,

Walter

From Bowerbird at aol.com Wed Jun 18 11:20:49 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 18 Jun 2008 14:20:49 EDT
Subject: [gutvol-d] what dave did
Message-ID:

seth godin has an interesting post:
> http://sethgodin.typepad.com/seths_blog/2008/06/what-dave-just.html

-bowerbird
From Bowerbird at aol.com Wed Jun 18 12:57:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 18 Jun 2008 15:57:04 EDT
Subject: [gutvol-d] across the mesa
Message-ID:

ok, here are the results for the parallel-p1 book that i talked about yesterday, "across the mesa".

as usual, rfrank did good work here. the scans are fairly straight and very clear, and the quality of the o.c.r. is very good. projects like this are a joy to work on...

and again, as usual, the p1 proofers have done a very good job as well. there were just 234 points of difference between the parallel proofings... on a file with over 9,200 lines -- a book of 318 pages -- that's excellent...

and the typical results continue on, because roughly 85% of those 234 diffs could -- and should -- have been resolved with some good preprocessing before _any_ of this text even went in front of the proofers... sad but true...

you can tell the preprocessing was inferior because of some obvious gaffes, like a line that begins with a semicolon. that doesn't happen in real books. the fact that such a line showed up in the diff results between the proofings means, on the good news front, that one of the proofings caught this error. of course, on the bad news front, it means that one of them also missed it, but as long as one of the proofings caught it, our awareness of it is there...

still, this is the kind of thing that the computer can locate, so why not use it?

so, in sum, i'm glad a sharp cookie like rfrank is working on programming a good preprocessing tool for d.p. i just wish he were making better progress.
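[editor's note: the comparison of parallel proofings described above is easy to mechanize. here is a minimal sketch using python's standard difflib -- the function names are invented for illustration, not taken from any d.p. tool -- that lists the points of difference between two proofings and flags lines no real book contains, such as a line starting with a semicolon:]

```python
import difflib
import re

def proofing_diffs(text_a, text_b):
    """Return (line_a, line_b) pairs where the two proofings disagree."""
    a, b = text_a.splitlines(), text_b.splitlines()
    diffs = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # zip truncates if the two sides replaced unequal line
            # counts; good enough for a rough diff report
            diffs.extend(zip(a[i1:i2], b[j1:j2]))
    return diffs

def suspicious_lines(text):
    """Flag lines that real books don't have, e.g. starting with ';'."""
    return [ln for ln in text.splitlines() if re.match(r"^\s*[;:,]", ln)]
```

[run over two proofers' outputs, proofing_diffs gives the "points of difference" count directly, and suspicious_lines is the kind of check that could run before the text ever reaches a proofer.]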
***

i've processed this project as z.m.l., and uploaded it to my website:
> http://z-m-l.com/go/amesa/amesap123.html
> http://z-m-l.com/go/amesa/amesa.zml

-bowerbird

From jeroen.mailinglist at bohol.ph Wed Jun 18 14:06:35 2008
From: jeroen.mailinglist at bohol.ph (Jeroen Hellingman (Mailing List Account))
Date: Wed, 18 Jun 2008 23:06:35 +0200
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To: <1213741501.14935.50.camel@abetarda>
References: <1213741501.14935.50.camel@abetarda>
Message-ID: <485978DB.8040204@bohol.ph>

Hi Julio,

I think in almost all cases, it would be easier to go through the PG-Canada (or PG-Philippines, but I have not yet made that operational for life+50) route, which works under a life+50 regime.

Jeroen.

Júlio Reis wrote:
> Hmmm... guys, probably everyone's got here before I did. And yet --
>
> http://www.copyright.cornell.edu/public_domain/ -- is this reliable?
>
> Under "Works Published Outside the U.S. by Foreign Nationals or U.S.
> Citizens Living Abroad," see line:
>
> * Date of Publication: 1923 through 1977
> * Conditions: Published without compliance with US formalities, and in
> the public domain in its home country as of 1 January 1996
> * Copyright Term in the United States: In the public domain
>

From Bowerbird at aol.com Wed Jun 18 17:29:24 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 18 Jun 2008 20:29:24 EDT
Subject: [gutvol-d] kid rock rocks kids
Message-ID:

atlantic records went to kid rock, who's on their label, telling him that he needed to say something publicly against downloading, because "people are stealing from us and stealing from you"...
"wait a second, you've been stealing from the artists for years," he responded, "but now you want me to stand up for you?"

instead, he started spreading the opposite message: "i was telling kids -- download it illegally, i don't care. i want you to hear my music so i can play live."

-bowerbird

From Bowerbird at aol.com Thu Jun 19 09:28:47 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 19 Jun 2008 12:28:47 EDT
Subject: [gutvol-d] across the mesa
Message-ID:

"across the mesa" flew through p2. 48 hours, in about 10 sessions. once again, we have solid proof of just how good the p1 proofers are, especially when they get two cracks at a book, like this parallel-p1.

p2 made just _20_ corrections, on 17 pages, in this 318-page book, and 7 of those 20 were concerning blank lines at the top of a page... that's pretty impressive. it means that -- even _without_ the p2 -- the parallel-p1 took the book far past the 1-error-every-10-pages standard i have proposed for my "continuous proofreading" method.

but the story doesn't end there... because when we take a closer look at the 20 errors that p2 fixed, we find that every single one could've been fixed in preprocessing. that's right, _every_single_one_! 20 out of 20. 100%. i've listed them below, because i'm sure some bloomin' idiot out there doesn't believe me.

by the way, the reg-ex i've listed here turned up 2 more errors which were missed by both rounds of p1 plus the p2 proofers...

on page 106:
> "Game's up, Pachuca." he said, shortly. "You're
> "Game's up, Pachuca!" he said, shortly. "You're

on page 294:
> "It's useless, of course," grunted Scott "They'll
> "It's useless, of course," grunted Scott. "They'll

***

as usual, i'm amazed/appalled by the huge disutility of having human beings search for errors by comparing word-for-word, when the computer can find them much more easily. it's a big waste. i mean, sure, if you want the humans to search word-for-word for things that the computer _cannot_ find, that's fine. but first take care of all the errors that the computer _can_ find by itself. c'mon, people, open your eyes...

-bowerbird

p1> flat and stilling, to a region of small hills and valleys;
p2> flat and stifling, to a region of small hills and valleys;
bb> auto-detectable by spell-check

p1> did not think you would stay with the Senora Morgan."
p2> did not think you would stay with the Señora Morgan."
bb> auto-detectable by spell-check

p1> "I wouldn't call that queer," replied Scott
p2> "I wouldn't call that queer," replied Scott.
bb> auto-detectable by [:lowercase:][:whitespace:]\"[:uppercase:]

p1> are you?".
p2> are you?"
bb> auto-detectable by punctuation-check

p1> Li back on one of them to-night"
p2> Li back on one of them to-night."
bb> auto-detectable by paragraph-termination-check

p1> thought
p2> thought.
bb> auto-detectable by paragraph-termination-check

p1> and one of the candidates for the next presidency----"said
p2> and one of the candidates for the next presidency----" said
bb> auto-detectable by doublequote-check

p1> said Scott, thoughtfully. "Or break it"
p2> said Scott, thoughtfully. "Or break it."
bb> auto-detectable by paragraph-termination-check

p1> here in half a minute if I don't"
p2> here in half a minute if I don't."
bb> auto-detectable by paragraph-termination-check

p1> "Mr. Hellick got flend--Mrs. Conlad." said Li,
p2> "Mr. Hellick got flend--Mrs. Conlad," said Li,
bb> auto-detectable by \.\" [:lowercase:]

p1> Angel Gonzales. a large, brutal-looking man, his face
p2> Angel Gonzales, a large, brutal-looking man, his face
bb> auto-detectable by \.\ [:lowercase:]

p1> "Men, and horses and plunder--oh, much plunder1"
p2> "Men, and horses and plunder--oh, much plunder!"
bb> auto-detectable by alpha/numeric-check

p1> a yarn, Mr. Penhallow. and then you've got to help me
p2> a yarn, Mr. Penhallow, and then you've got to help me
bb> auto-detectable by \.\ [:lowercase:]

***

new paragraphs on pagebreaks should be auto-detected...

pb> Emma," said the girl. "But Mr. Adams has been telling
pb> Just bring over a couple of blankets, will you, Mrs.
pb> "Wearin' of the Green" and the "Long, Long Trail."
pb> "You see, I've got mining in my blood. My grandfather
pb> Anyhow, Mrs. Conrad married her Englishman and
pb> Scott, whose impatience and irritation made speech unendurable.
pb> Pachuca--apart from the raid, at least, he thinks he

From Bowerbird at aol.com Thu Jun 19 14:02:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 19 Jun 2008 17:02:08 EDT
Subject: [gutvol-d] art in the cloud and on glossy stock
Message-ID:

derek powazek -- a writer/geek out of the bay area -- has been associated with some neat stuff, including this:

> MagCloud enables you to publish your own magazines.
> All you have to do is upload a PDF
> and we'll take care of the rest:
> printing, mailing, subscription management, and more.
> http://magcloud.com/

print-on-demand magazines. what will they think of next? :+)

-bowerbird
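[editor's note: the "auto-detectable" checks listed in the across-the-mesa post above can be sketched as ordinary regular expressions. this is a rough illustration in python -- the patterns and names approximate the checks bowerbird describes, and are not his actual code -- of how a preprocessing pass could flag such paragraphs before any proofer sees them:]

```python
import re

# Approximations of the checks named in the post; each produces
# some false positives, so hits go to a human, not an auto-fix.
CHECKS = [
    # '," replied Scott "I ...' -- a quote opens right after a
    # lowercase word, suggesting missing punctuation before it
    ("quote-after-lowercase", re.compile(r'[a-z]\s"[A-Z]')),
    # 'Conlad." said Li' -- period before a closing quote followed
    # by a lowercase word, where a comma was probably intended
    ("period-quote-lowercase", re.compile(r'\."\s+[a-z]')),
    # 'Gonzales. a large' -- sentence break followed by lowercase
    ("period-then-lowercase", re.compile(r'\.\s+[a-z]')),
    # 'plunder1' -- a digit glued to letters (OCR for '!' or 'l')
    ("alpha/numeric", re.compile(r'[A-Za-z]\d|\d[A-Za-z]')),
]

def flag_paragraphs(text):
    """Return (check-name, paragraph) pairs needing human review."""
    hits = []
    for para in text.split("\n\n"):
        flat = " ".join(para.split())
        if not flat:
            continue
        for name, rx in CHECKS:
            if rx.search(flat):
                hits.append((name, flat))
        # paragraph-termination-check: prose should end in terminal
        # punctuation, optionally followed by a closing quote
        if not re.search(r'[.!?;:]["\']*$', flat) and not flat.endswith("--"):
            hits.append(("paragraph-termination", flat))
    return hits
```

[fed the p1 text of the examples above, this flags 'replied Scott "I', '." said Li', and 'much plunder1' without any human comparing word-for-word.]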
From Bowerbird at aol.com Thu Jun 19 14:26:04 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 19 Jun 2008 17:26:04 EDT
Subject: [gutvol-d] art in the cloud and on glossy stock
Message-ID:

oh yeah, the thing i forgot to tell you is that magcloud.com is backed by hp labs... derek explains why this is important:

> there are other print-on-demand companies out there, but
> MagCloud is the only one designed specifically for magazines.
> And it's the only one created by HP, the company that makes
> the Indigo printers that power the print-on-demand industry.

in other words, hp labs has a good incentive to _make_this_happen_. they actually _want_ it to work. how many "tests" have we seen done by some entity that we weren't all too convinced wanted it to succeed? too many. so it's nice to see that shoe on the other foot for a change.

derek has more insightful things to say, so go read what he wrote:
> http://powazek.com/posts/984

-bowerbird

From julio.reis at tintazul.com.pt Thu Jun 19 14:45:37 2008
From: julio.reis at tintazul.com.pt (Júlio Reis)
Date: Thu, 19 Jun 2008 22:45:37 +0100
Subject: [gutvol-d] US public domain for works published abroad
In-Reply-To:
References:
Message-ID: <1213911940.8125.30.camel@abetarda>

Thanks for your answers... they all contributed, but Jeroen's reply drives in the last nail: why complicate?
The goal behind my question was not to publish ebooks in PG-USA, but rather to have *somewhere* where I can publish all books in the public domain in Portugal and Brazil. PG-Canada is that place: we're life+70, they're life+50, so if it's free here, we can publish it there.

Plus I've just PMed+PPed the first book in Portuguese in PGDP-Canada. A sonnet book by one of our greatest poets, published in the 1930s.

Júlio.

> I think in almost all cases, it would be easier to go through
> the PG-Canada (or PG-Philippines, but I have not yet made that
> operational for life+50) route, which works under a life+50
> regime.

From Bowerbird at aol.com Fri Jun 20 13:29:42 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 20 Jun 2008 16:29:42 EDT
Subject: [gutvol-d] across the mesa -- the conclusion
Message-ID:

let's put it in stark terms.

after my full-on preprocessing of "across the mesa", there were just 11 errors that human proofers found. an annotated list is appended... 6 were found by one parallel p1 proofing, 5 by the other.

this is the kind of accuracy you can expect when a book is:
1) carefully scanned so the scan-set is clean,
2) subjected to o.c.r. with a good o.c.r. app, and
3) preprocessed correctly...

all 11 of these errors would've been caught by engaged readers, so the need for human proofers on this book is highly questionable.

-bowerbird

======================================================================
== a> He tried, several tunes, but the door held maddeningly. =======
== b> He tried, several times, but the door held maddeningly. =======
== d> ===================^^====================== tunes vs. times* ==
======================================================================
== a> in the middle of the night, too." Mrs. Van Zandt ==============
== b> in the middle of the night, too," Mrs. Van Zandt ==============
== d> ===============================^====== [period]* vs. [comma] ==
======================================================================
== a> down here," Scott was riding with his knee around =============
== b> down here." Scott was riding with his knee around =============
== d> =========^============================ [comma] vs. [period]* ==
======================================================================
== a> some shickens -- netting else left." ==========================
== b> some shickens -- notting else left." ==========================
== d> ==================^==================== netting vs. notting* ==
======================================================================
== a> up the community. Herrick. I want you to know Bob =============
== b> up the community. Herrick, I want you to know Bob =============
== d> =========================^============ [period] vs. [comma]* ==
======================================================================
== a> "Could you ride. Henry, do you think? You and =================
== b> "Could you ride, Henry, do you think? You and =================
== d> ===============^====================== [period] vs. [comma]* ==
======================================================================
== a> "But, Henry, I can't stand it! And I look so! I ===============
== b> "But. Henry, I can't stand it! And I look so! I ===============
== d> ====^================================= [comma]* vs. [period] ==
======================================================================
== a> "'Twa'n't much. I took my time. You see, the ==================
== b> "Twa'n't much. I took my time. You see, the ===================
== d> =^=============================== singlequote* vs. (missing) ==
======================================================================
== a> of him. Get out of my way, Hard." =============================
== b> of him. Get out of my way. Hard." =============================
== d> =========================^============ [comma]* vs. [period] ==
======================================================================
== a> "Oh, laugh if you want to," said Polly, indulgently. ==========
== b> "Oh, laugh if you want to," said Polly, indulgently, ==========
== d> ======================== [period]* vs. [comma] ====^===========
======================================================================
== a> against the bandits which have nourished so long ==============
== b> against the bandits which have flourished so long =============
== d> ===================^^================= nourish vs. flourish* ==
======================================================================
===================== 2 [period]* vs. [comma] ========================
===================== 2 [period] vs. [comma]* ========================
===================== 2 [comma]* vs. [period] ========================
===================== 1 [comma] vs. [period]* ========================
======================================================================
===================== singlequote* vs. (missing) =====================
======================================================================
===================== tunes vs. times* ===============================
===================== netting vs. notting* ===========================
===================== nourish vs. flourish* ==========================
======================================================================
From Bowerbird at aol.com Mon Jun 23 11:19:59 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 23 Jun 2008 14:19:59 EDT
Subject: [gutvol-d] roger's girls -- 001
Message-ID:

roger frank recently had 3 titles posted that include the word "girls" in the title, so i'm calling them "roger's girls", and will be presenting some reports on them:

> 25870 -- A World of Girls, by L. T. Meade
> 25872 -- Girls of the Forest, by L. T. Meade
> 25873 -- The Motor Girls on Crystal Bay, by Margaret Penrose

-bowerbird

From Bowerbird at aol.com Mon Jun 23 11:32:56 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 23 Jun 2008 14:32:56 EDT
Subject: [gutvol-d] english espresso
Message-ID:

blackwell -- a 60-store bookstore-chain in england -- is putting the print-on-demand espresso book machine into their shops. the machine can print one million titles, 600,000 from a partnership with lightning source, with the rest of them being public-domain. (finally, a number that we can count on as having eliminated the duplicates.)

vince gunn, the c.e.o. of blackwell, said:

> From a retailer's point of view, even allowing for the
> first-generation technology and publisher challenges,
> this is a fantastic opportunity -- sell to demand with
> no risk to inventory and an opportunity to create
> incremental revenue streams for ourselves and publishers.

wow. a bookseller with a brain. who knew?

> http://thebookseller.com/news/61423-blackwell-brews-up-espresso.html

-bowerbird
From Bowerbird at aol.com Wed Jun 25 12:40:59 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 25 Jun 2008 15:40:59 EDT
Subject: [gutvol-d] how little they know
Message-ID:

sometimes it's just _staggering_ how little they know over at d.p.

consider this post, from roger frank:
> http://www.pgdp.net/phpBB2/viewtopic.php?t=33945

it starts out like this:

> The first time was an accident. Two PMs were running the same book
> through DP and it wasn't caught until after P1. Out of curiosity,
> the results were compared. It appeared that both P1 proofers,
> in parallel, found most of the same errors, but each of them found
> some that the other didn't. Letting a single merged copy go through P2
> showed that together, they caught nearly all the errors. Was this just luck?

does roger _really_ think that this was "the first time" for parallel proofing? seriously? does he have no knowledge that this methodology has a _very_ long history, going back to the earliest days of _keypunching_ -- where it was called "double-punching" -- and likely was known even before then?

and does he really not know that parallel proofing has proven itself already? does he really not know that i've pointed out that it could be usefully applied at distributed proofreaders, and have been making that observation for _years_ now? evidently not, because he thinks it's "time to explore this scientifically"...

and gee, i _know_ that he read at least _some_ of this d.p. forum thread:
> http://www.pgdp.net/phpBB2/viewtopic.php?t=33945

where i discussed "revolutionary o.c.r. proofing", a methodology based on _comparing_versions_, because he commented on the very first page.
again, roger is doing some good work:
> http://fadedpage.com//ppgen-doc.htm

but his failure to do research -- failure to even go "next door" here to the p.g. listserve so he'd know what's being said here -- is very disappointing... (and those d.p. people who _are_ present, and are not telling roger that he should be here, are doing him a disservice, and are not his friends...)

and, oh yeah, as i've said here _several_ times, in discussing the various parallel p1 experiments i've analyzed, the _real_question_ is not "whether" parallel proofing "works" -- because we _know_ that it works -- but rather _whether_ it's _more_cost-effective_ than 2 _serial_ rounds of p1 proofing.

of course, at d.p., the "cost" of a p1 proofing is _literally_ next to nothing, so i guess that _any_ benefit will be "cost-effective" from that standpoint...

-bowerbird

From Bowerbird at aol.com Wed Jun 25 16:18:16 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 25 Jun 2008 19:18:16 EDT
Subject: [gutvol-d] how little we all know
Message-ID:

does it make it sound nicer if i include myself, by using "we"?

because if you want to know what _my_ research tells me, it's that docs.google.com is the future of online documents, including books.

wysiwyg has proven that people will choose it if they can. 25 years of desktop experience shows that, very clearly. even though it's hardly the best workflow methodology, wysiwyg is strongly preferred over dealing inside formats. this includes .html, but also includes .zml -- my format -- and "least markup" -- rfrank's format -- and .xml, .epub, .wikimarkup, .docbook, .rtf, .txt, .pdf, and whatever else...
behind the scenes, google can convert to whatever format, but the face it presents to the user is a wysiwyg interface... even though we might prefer a more semantic approach, wysiwyg is going to be the interface that we're stuck with.

the project gutenberg library could be put in google-docs, in its entirety, formatted and styled quite nicely, with the ability for any authorized person to make corrections to it. any of the possible conversions that an end-user needed could be requested and received, by that user, on the fly...

there's very little reason not to do it this way...

-bowerbird

From Bowerbird at aol.com Thu Jun 26 10:51:22 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 26 Jun 2008 13:51:22 EDT
Subject: [gutvol-d] roger's girls -- 002
Message-ID:

i took a look at 3 digitizations done by roger frank (rfrank):
> world of girls
> girls of the forest
> motor girls at crystal bay

i thought these projects would be a little more interesting than they turned out to be. although they've just been posted, they all entered d.p. prior to december of last year, when rfrank began his foray into preprocessing. so all of these books were inadequately prepared, a la the bad old days over at d.p. (which, for most content providers, are still going on)...

the absence of good preprocessing means that these books contained hundreds and hundreds of errors for p1 to correct, with as many as 500 of them being spacey-quotes, which -- as i have mentioned previously -- can be fixed _automatically_ by the computer, and thus shouldn't be subjected to humans...
anyway, what i found with these books is the usual pattern -- you remember, the pattern that we've come to expect -- the pattern that seems to capture a "common-sense" take, which is that p1 fixes most of the errors, p2 gets most of the remaining ones, and p3 comes in and does clean-up. again, this is the pattern you get on page after page, in book after book, day after day, over in d.p.-land...

here are the number of lines changed, by round:

title --------- lines/ _p1 / _p2 / _p3 / _f1 / _f2
world girls --- 10,000/ 700 / 150 / 079 / 193 / 010
girls forest -- 10,000/ 350 / 075 / 016 / 036 / 020
motor girls --- 07,000/ 300 / 100 / 009 / 204 / ---

due to the inadequate preprocessing, p1 made hundreds of fixes, then p2 made about 20%-33% as many corrections as p1 had made. the amazing thing is the small number of changes made by p3, from about half as many as p2 down to _one_tenth_ as many... likewise, while f1 made as many as 200 changes, f2 made 10-20.

the reason that the very small changes made by p3 and f2 are _significant_ is that p3 and f2 are the current _big_bottlenecks_ in the workflow at distributed proofreaders. there are very few volunteers working in p3 and f2, relative to the other "lower-ranking" stations, so it's simply unreasonable (some might say "stupid") to expect that p3 and f2 can handle all of the material that's being generated by the earlier rounds. ergo, bottleneck.

so "girls of the forest" was held up waiting for p3 to change _16_ lines, and then held up again waiting for f2 to change _20_ lines. and this on a book that had 10,000 lines in it! thus, many books are waiting a _long_ time for these "high-level" rounds, where _virtually_nothing_ happens to them. it's amazing!
when one considers all of the various resources being wasted in _testing_ volunteers for p3 and f2, and the ill will generated when people "fail" to pass these tests, as well as all the energy being squandered in "round-skipping" and the like, it's simply astounding that this tiered system hasn't already been scrapped. but then again, the "powers that be" at d.p. do _not_ like to admit they were wrong. so look for them to patch over these problems.

anyway, if anyone wants to see my output for this research, say so, and i'll post it. otherwise, there isn't much to see here, so move on.

i will remark on one coincidence, however. "world of girls" was the 13,000th e-text done by distributed proofreaders. on the one hand, hearty congratulations to the dedicated volunteers making it happen! a big round number like that is indeed a good reason for celebration. on the other hand, it's extremely disappointing that, 13,000 e-texts in, d.p. still hasn't learned the value of doing a good job of preprocessing. the "leadership" that allows this travesty to hobble their volunteers is guilty of a serious failure to wisely utilize the labor being contributed in good faith. and they've been told this, repeatedly, and they ignore it, which further compounds their culpability. it's time to fix this problem.

-bowerbird

From Bowerbird at aol.com Fri Jun 27 13:48:08 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 27 Jun 2008 16:48:08 EDT
Subject: [gutvol-d] the mountain
Message-ID:

over on the d.p. forums, rfrank (roger frank) announced this morning that he is running yet another parallel-proofing experiment. oh boy...
unfortunately, he's put the cart before the horse, because he _still_ isn't doing the type of decent preprocessing that d.p. should be doing. which means he's still expecting the human volunteers to find and fix errors that the _computer_ is much better at finding and helping us fix.

so what i've done is to show people what a well-preprocessed version of this file would look like. you can find it up on my website already:
> http://z-m-l.com/go/mount/mount.zml

so, no, it doesn't take long to do the preprocessing right. not long at all.

and of course, once the preprocessing has been done, then the book is ready for "continuous proofreading", so i put those files up as well. here are a bunch of various pages in the book:
> http://z-m-l.com/go/mount/mountp001.html
> http://z-m-l.com/go/mount/mountp123.html
> http://z-m-l.com/go/mount/mountp234.html
> http://z-m-l.com/go/mount/mountp345.html

luckily, rfrank arranged this so the pagenumber/filenames matched, so i didn't have to do any renaming of these files... it sure makes it more _convenient_ when the content-provider does that from the very start...

***

over here:
> http://www.pgdp.net/phpBB2/viewtopic.php?p=467791#467791

roger says this:
> I know the concept of parallel or redundant processing has
> been around and it's been applied effectively to many things.
> I'm trying to learn how it applies to proofing.
> "What kinds of errors are typically missed by both proofers?"

save your time, roger. i've looked at lots and lots and lots of books, and there's no rhyme or reason for why proofers miss what they do. sometimes they catch errors that you'd think would be very elusive. and at other times they can miss what is right in front of their nose. the only thing you can count on is that, once something that you've missed is pointed out to you, you'll wonder how you _ever_ missed it, because it will stick out like a sore thumb.
the smartest course of action is to catch as much as you can with the computer, in advance, and then just hope the humans catch the rest.

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080627/26418bfe/attachment.htm

From Bowerbird at aol.com Sat Jun 28 18:39:13 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 28 Jun 2008 21:39:13 EDT
Subject: [gutvol-d] the mountain
Message-ID:

i decided to move "blood mountain" along a bit, so it should be pretty much finished right now...

all the people who claim that i don't know how to do this shit are invited to find the flaws in my work.

> http://z-m-l.com/go/mount/mount.zml
> http://z-m-l.com/go/mount/mountp001.html

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080628/37e3276e/attachment.htm

From Bowerbird at aol.com Mon Jun 30 22:35:00 2008
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 1 Jul 2008 01:35:00 EDT
Subject: [gutvol-d] continued confusion over at distributed proofreaders
Message-ID:

the question of whether o.c.r. can be "too perfect" has come up at d.p.

> http://www.pgdp.net/phpBB2/viewtopic.php?p=468492#468492

someone is actually considering _leaving_in_ the runheads on a book, thus forcing the proofers to remove them _manually_ instead. sheesh! because otherwise the pages would be "too perfect", and the proofers would "become bored and perhaps miss something". this is hogwash...

***

rfrank (roger frank) said:

> If page after page goes by, does a proofer's attention fade?
> I believe it does.
everyone is entitled to an "opinion", of course. but why not test this?

_my_ opinion is that the proofer's attention will not _necessarily_ fade. we've seen plenty of cases where a proofer doing page after page of clean text finds an isolated error, one that the earlier proofers -- who _made_ those pages so clean -- had missed... and the fact that those earlier proofers _missed_ an error shows that, even with their attention sufficiently "engaged", it was still possible for them to miss an error...

i know that, from my own personal perspective, i have spotted typos in books published by big publishers, when i wasn't even looking for 'em, because i _expected_ that the book was error-free. typos just stick out.

but we _know_, from the _evidence_ of _several_earlier_experiments_, that p1 proofers -- in their second pass, thus on much cleaner text -- perform _just_as_well_ (and sometimes even better) than p3 proofers... this has been demonstrated, consistently, on many different occasions. so it's a pity that rfrank wasn't paying attention to those experiments...

> Does a proofer get satisfaction in finding and making a correction?
> Again, I believe that is true.

well, sure, that causes satisfaction. but do they still have that same sense of satisfaction when the computer could have found that very same error, _immediately_, 100% of the time?

i would think that being the _engineer_ of that computer-aided process would be _far_ more satisfying, and take _much_ less human labor. and _i_ believe that proofers who _miss_ those errors, after they've spent literally hours and hours proofreading a book, only to have the computer find them _instantly_, are bound to be quite disappointed in themselves...

> It's not proofing, but its relevant:
> I've put several books in smoothreading
> and have gotten the comments
> "I wish you had given me something to find."

i don't think that _is_ relevant. (and "its" should be "it's".)
if people are smoothreading a book _only_ to find errors, then they're wasting their time. they should only smoothread books if/when they are actually _interested_ in reading the content...

> I regularly pre-process beyond what guiprep does.
> However, some things that I could pre-process correctly
> prehaps 98% of the time, I'll leave in the text because of
> the cost of finding and fixing a mis-correction. This has
> been discussed somewhere in the wayback, for example,
> in adjusting spacing around quote marks.

ok, this is just wack. (and "prehaps" should be "perhaps".) and not just because i laid out the _correct_ argument, juliet laid out the _incorrect_ one, and rfrank chose the _incorrect_ argument, but because he then _generalized_ the incorrect argument.

> in the newcomer's only books I've started, I've observed that
> the number of corrections made by P2 dropped significantly
> once I started pre-processing.

that's because, once you start giving the p1 people cleaner text, they are _far_ more likely to find _all_ of the errors on a page... when you give them _dirty_ text, they will find _most_ of the errors, but they will leave a good number of errors as well... and this -- all by itself -- largely refutes the question at the top...

we want to move the text as close to perfection as early as possible. ideally, we would make it perfect in preprocessing, and then have the first p1 pass be the first no-change confirmation that it _was_ perfect, and the second p1 pass be the second no-change verification of that, in which case (in my opinion) we would be able to certify it as perfect.

> For a while I was marking suspects with small x marks
> before I realized that the mark wasn't showing up in guiguts,
> depending on the font selected. What good was that?

it could be a _lot_ of good, so guiguts should be improved to take full advantage of this. why let a tool hold you back?
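that no-change-confirmation idea is simple enough to state as code. here's a python sketch -- the function name and the two-clean-passes threshold are my own illustrative choices, not any workflow d.p. actually runs:

```python
def certify(passes, required_clean=2):
    """given one text per proofing pass (pass 0 = the preprocessed text),
    return True when the trailing `required_clean` passes each made
    no changes -- i.e., the text can be certified as perfect."""
    if len(passes) < required_clean + 1:
        return False
    clean = 0
    for earlier, later in zip(passes, passes[1:]):
        if earlier == later:
            clean += 1       # another no-change confirmation
        else:
            clean = 0        # any change resets the count
    return clean >= required_clean
```

so a page whose p1 and second-p1 passes both come back unchanged is certified, while any page that gets touched starts its confirmation count over.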
> Then I switched to asterisks, but some new proofers
> thought they had to leave every asterisk in place
> (ala a proofer's note) even though the instructions were that
> it was a warning in an area to be scrutinized closely
> and then removed.

so use a tilde (~) or some other character that won't confuse them.

> I don't have statistical data, but I know preprocessing makes for
> fewer P2 diffs on Newcomers Only P1s work.

that's because p1 got it right, so there was nothing for p2 to change.

> But that does not mean that
> the book is any better at the end of P2
> than if there were more errors to start with
> and both the P1 and the P2 had more to do.

but it _does_ mean p2 could've been bypassed, for many pages. or that the second pass could have been done by (plentiful) p1, instead of the (far less plentiful) p2, thus conserving resources. and it _certainly_ means that those pages -- on which p2 made no changes -- are _much_more_likely_ to be able to skip p3...

-bowerbird

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.pglaf.org/private.cgi/gutvol-d/attachments/20080701/6cc7cea1/attachment.htm