From Bowerbird at aol.com Thu Apr 1 00:48:01 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 03:48:01 EDT
Subject: [gutvol-d] the ipad and project gutenberg
Message-ID: <8ec0b.712bbbb8.38e5a9b1@aol.com>

oh boy, the ipad is almost here...

gonna change the world, you watch.

nobody has mentioned here yet how apple scooped up the p.g. corpus...

i wonder if they gave p.g. any money?

i read apple is using the .epub files.

oh yeah, now _that_ is a good idea... to put those crappy files in front of a user-base that cares about quality. yeah, that's a _really_ good idea, yep.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From schultzk at uni-trier.de Thu Apr 1 05:04:36 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu, 1 Apr 2010 14:04:36 +0200
Subject: [gutvol-d] Re: here's my collaborative proofing system you can look at
In-Reply-To: <4BB38407.1040402@novomail.net>
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net>
Message-ID:

On 31.03.2010 at 19:19, Lee Passey wrote:

> I would argue that one of the most important lessons of the internet is that there is practically no such thing as an "average" user. One of the beauties of well-constructed HTML is that it accommodates itself to /all/ users, not just the mythical "average" user.
>
> If you ever come across a web site where you have a right/left scroll bar (and there are many) you know you have encountered a web designer who's stuck in the desktop publishing world.

I would very much disagree here. Because, as you say, there is no such thing as the standard user. Or a standard font or size, for that matter. Sure, you can define that certain fonts and sizes are to be used. Yet does the user have them, or want to use them?

I agree with you insofar as the general design of a page should not require a scroll bar. Yet it should show up if the setup a user has makes it necessary.
My main machine is a 17" MacBook with the resolution pushed all the way up. At times I zoom pages or change the default sizes, and then the pages become unusable because scroll bars do not pop up if the page is not designed very intelligently.

As far as proofing is concerned, it is actually in the desktop domain and a web-based domain. So what is wrong with having scroll bars? They become more important in editing, especially if you have multiple views in one window with different content. You resize the views to accommodate your needs and use the scroll bars when you need to reach parts you rarely need or use.

I can remember when it was said you need to design your page for 640x480 resolution because other resolutions were not used that much. Or, what you still find: this page requires XXXX to display, or to display properly. Now that is poor design.

regards
Keith.

From Bowerbird at aol.com Thu Apr 1 09:16:13 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 12:16:13 EDT
Subject: [gutvol-d] Re: huge hole of missing months in the listserve archives
Message-ID:

missing one month (april of 2009), but otherwise here are the archives:
> http://z-m-l.com/gutvold/

-bowerbird

From Bowerbird at aol.com Thu Apr 1 09:35:16 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 12:35:16 EDT
Subject: [gutvol-d] Re: jim replies to his e-mail
Message-ID:

jim said:
> You ask us to review something that was broken

it wasn't "broken". you were just doing it wrong.

so i politely told you how to do it right, that is:
> make your window wider
and/or
> make your text bigger
without ever making it explicit you were doing it wrong, even though the solutions should've been obvious to you.
(perhaps that's why you appear angry?, because the answer to your allegation of a "problem" was so darned obvious?)

but you _insisted_ on saying "no, it's _broken_", so i had to become more explicit that, no indeed, jim, that "bug" was _not_ a "bug", but was indeed a feature waiting to happen.

i have a cinema screen myself. did you really think that i hadn't seen the very exact same thing that you saw?

> and then when we tell you that it is broken
> you say you have something hidden in your back pocket
> which isn't broken and that make *US* look stupid?

jim, you made _yourself_ look stupid. and you _persist._

> Again, post your software, including your source code,
> and then let us talk about it. Until then you are just
> wasting everybody's time.

do you not realize what it sounds like when you just repeat the things you said before that were rejected way back then?

> If you want to know why nobody is interested in
> using your code: just keep it up.

oh, i know full well why few people here give me feedback.

it's because i defeated many of 'em in bloody hand-to-hand.

back when they thought _they_ had the upper-hand, they were standing in line to post messages to this listserve... look at the archives! dozens of messages a day, day after day.

but eventually i defeated each and every one of them, soundly.

and they haven't forgotten it, either...

but hey, i don't mind if they hold a grudge. because i do too. and i've proven i have better aim than all of them combined...

> At the very least please note that you are wasting time
> that I could be using to make PG books.

jim says i'm wasting his time. how's _that_ for irony? :+)

oh well, i guess april fool's day came a bit early for him...

-bowerbird

From Bowerbird at aol.com Thu Apr 1 09:46:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 12:46:40 EDT
Subject: [gutvol-d] the importance of the scientific method
Message-ID:

in the past, i have suggested that rfrank would do well to incorporate more scientific method into his experiments.

but alas...

here's a quote from him in one of his forum threads:
> I've always hoped that some of what we seem
> to conclude will be compelling enough that
> the people at DP or DPC or any of the others
> might bake it into their future releases.

there are a lot of "compelling" stories that one can dream up that have very little basis in reality...

the scientific method is how we filter out those "compelling" stories from _the_truth_, and data is the currency of the realm in scientific experiments.

-bowerbird

From joshua at hutchinson.net Thu Apr 1 14:29:51 2010
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu, 1 Apr 2010 21:29:51 +0000 (GMT)
Subject: [gutvol-d] Re: here's my collaborative proofing system you can look at
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net>
Message-ID: <871711989.124272.1270157391343.JavaMail.mail@webmail07>

An HTML attachment was scrubbed...
URL:

From Bowerbird at aol.com Thu Apr 1 14:46:43 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 17:46:43 EDT
Subject: [gutvol-d] a horizontal proofing interface, line by line
Message-ID:

here's a take on a horizontal proofing interface, where the scan is sliced up into individual lines:
> http://z-m-l.com/go/lines/pagelines.html

might be good if you use an iphone, for instance.

-bowerbird

From schultzk at uni-trier.de Fri Apr 2 04:35:18 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 2 Apr 2010 13:35:18 +0200
Subject: [gutvol-d] Re: here's my collaborative proofing system you can look at
In-Reply-To: <871711989.124272.1270157391343.JavaMail.mail@webmail07>
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net> <871711989.124272.1270157391343.JavaMail.mail@webmail07>
Message-ID: <1F5E0F71-30E9-46B2-9317-CB04195D5A32@uni-trier.de>

Hi Everybody, Joshua,

Uhmmmm! I do not think anybody would do proofing or editing on a mobile device, that is, phones or PDAs. NetBooks most likely, and probably the iPad.

Whether a browser supports a standard generally depends on the browser and not the device it is running on.

regards
Keith.

On 01.04.2010 at 23:29, Joshua Hutchinson wrote:

> I've had some luck using the CSS max-width
>
> You can tell an img tag to max-width: 75% and it will resize the image to fit that much of the available area. It'll even dynamically resize when you stretch your browser window.
>
> Downside: Some browsers *suck* at resizing images. IE6 does not support max-width, though pretty much everyone else does on the desktop (haven't tested it with any mobile devices, though).
>
> Josh
>
> On Mar 31, 2010, Lee Passey wrote:
>
> > If anyone has any suggestions as to how to dynamically resize images,
> > I'm all ears, because this is one of the problems I'm going to need to
> > resolve for my own co-operative proofing demonstration.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d

From schultzk at uni-trier.de Fri Apr 2 04:38:18 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 2 Apr 2010 13:38:18 +0200
Subject: [gutvol-d] Re: jim replies to his e-mail
In-Reply-To:
References:
Message-ID: <338FEF56-DCA7-4620-A763-0EFDC50685C3@uni-trier.de>

Hi BB,

I used the link given and I personally did not like the layout or interface, but then again it's just me.

To be honest, I could not make heads or tails of what I was seeing or exactly what to do.

regards
Keith.

On 01.04.2010 at 18:35, Bowerbird at aol.com wrote:

> jim said:
> > You ask us to review something that was broken
>
> it wasn't "broken". you were just doing it wrong.
>
> so i politely told you how to do it right, that is:
> > make your window wider
> and/or
> > make your text bigger

From jimad at msn.com Fri Apr 2 09:05:09 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 09:05:09 -0700
Subject: [gutvol-d] [SPAM] RE: a horizontal proofing interface, line by line
In-Reply-To:
References:
Message-ID:

> http://z-m-l.com/go/lines/pagelines.html

I think this option is not bad, although I suspect not every proofer will have the patience to take it "line-by-line."

Do you have software to automatically slice the bitmap and align the txt? Or are you slicing the bitmap by hand?

From vze3rknp at verizon.net Fri Apr 2 09:16:17 2010
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Fri, 02 Apr 2010 12:16:17 -0400
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To:
References:
Message-ID: <4BB61851.9010301@verizon.net>

On 4/1/2010 5:46 PM, Bowerbird at aol.com wrote:
> here's a take on a horizontal proofing interface,
> where the scan is sliced up into individual lines:
>
> > http://z-m-l.com/go/lines/pagelines.html
>
> might be good if you use an iphone, for instance.

If I were implementing DP again from scratch, I would almost certainly use an interface like this for at least part of the proofing interface/process.
There are lots of advantages to putting the text very close to the image like that for close checking of individual characters. I would also provide a full page interface, much like the current DP one, since some things are easier to see when looking at the entire page at once.

Implementing this kind of line-by-line interface efficiently is only possible if one has word boundary information (actually, the line boundary) from the OCR. That kind of information was not available when DP started and to add it into the current DP would be so much effort as to make it out of the question.

JulietS

From jimad at msn.com Fri Apr 2 09:18:53 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 09:18:53 -0700
Subject: [gutvol-d] Re: here's my collaborative proofing system you can look at
In-Reply-To: <1F5E0F71-30E9-46B2-9317-CB04195D5A32@uni-trier.de>
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net> <871711989.124272.1270157391343.JavaMail.mail@webmail07> <1F5E0F71-30E9-46B2-9317-CB04195D5A32@uni-trier.de>
Message-ID:

>I do not think anybody would do proofing or editing on a mobile device, that is, phones or PDAs. NetBooks most likely, and probably the iPad.

It would be nice to at least be able to SR on anything that allows input and have a standardized way to flag an error.

From dakretz at gmail.com Fri Apr 2 10:12:56 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 2 Apr 2010 11:12:56 -0600
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <4BB61851.9010301@verizon.net>
References: <4BB61851.9010301@verizon.net>
Message-ID:

I would rather have a scrolling semi-transparent gun-slit overlay on the image that would synchronize with the cursor in the text. I can't proof with so little context.

On Fri, Apr 2, 2010 at 10:16 AM, Juliet Sutherland wrote:

> On 4/1/2010 5:46 PM, Bowerbird at aol.com wrote:
>
> here's a take on a horizontal proofing interface,
> where the scan is sliced up into individual lines:
>
> > http://z-m-l.com/go/lines/pagelines.html
>
> might be good if you use an iphone, for instance.
>
> If I were implementing DP again from scratch, I would almost certainly use
> an interface like this for at least part of the proofing interface/process.
> There are lots of advantages to putting the text very close to the image
> like that for close checking of individual characters. I would also provide
> a full page interface, much like the current DP one, since some things are
> easier to see when looking at the entire page at once.
>
> Implementing this kind of line-by-line interface efficiently is only
> possible if one has word boundary information (actually, the line boundary)
> from the OCR. That kind of information was not available when DP started and
> to add it into the current DP would be so much effort as to make it out of
> the question.
>
> JulietS

From jimad at msn.com Fri Apr 2 10:16:28 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 10:16:28 -0700
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <4BB61851.9010301@verizon.net>
References: <4BB61851.9010301@verizon.net>
Message-ID:

>Implementing this kind of line-by-line interface efficiently is only possible if one has word boundary information (actually, the line boundary) from the OCR. That kind of information was not available when DP started and to add it into the current DP would be so much effort as to make it out of the question.

I've been thinking about the possibility of this kind of interleaved bitmap/txt at least as an option for SR? You can presumably flag the linebreaks inside the png in a non-intrusive way. It would require generating SR in a file format that can mix bitmap and txt, for example either html or rtf.

If the linebreaks are coded non-intrusively inside the png then it is back-compatible with what you have now. Then you just need separate optional page editor software, and you have given the proofers the option of doing it line-by-line.

From schultzk at uni-trier.de Fri Apr 2 10:34:54 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 2 Apr 2010 19:34:54 +0200
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To:
References: <4BB61851.9010301@verizon.net>
Message-ID: <92684726-10E3-42C5-964A-DB00BB91E457@uni-trier.de>

You can hide a lot of information in an image. There are algorithms for encrypting messages in images. Sorry, I just cannot think of the name of the method.

regards
Keith.

On 02.04.2010 at 19:16, Jim Adcock wrote:

> If the linebreaks are coded non-intrusively inside the png then it is
> back-compatible with what you have now. Then you just need separate
> optional page editor software, and you have given the proofers the option of
> doing it line-by-line.

From Bowerbird at aol.com Fri Apr 2 11:25:35 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 2 Apr 2010 14:25:35 EDT
Subject: [gutvol-d] Re: jim replies to his e-mail
Message-ID:

keith said:
> I used the link given and I personally did not like
> the layout or interface, but then again it's just me.

saying you didn't like it is different from saying that "it's broken". but not that much more informative...

> To be honest I could not make heads or tails
> of what I was seeing or exactly what to do.

ok, that's a little bit more clear. or maybe it's not.

i think most people would know that you make fixes in the edit-field, so as to match the scan. but maybe you need to be familiar with some other proofing systems in order to use that experience...

at any rate, yes, training will certainly be available.

thanks for the feedback. :+)

-bowerbird

From Bowerbird at aol.com Fri Apr 2 11:28:36 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 2 Apr 2010 14:28:36 EDT
Subject: [gutvol-d] re: a horizontal proofing interface, line by line
Message-ID:

jim said:
> Do you have software to automatically
> slice the bitmap and align the txt?
> Or are you slicing the bitmap by hand?

i don't do anything by hand, except pheed my phat phace, wave to my neighbors as i drive the car around our streets, pet my cat and dog, and pleasure myself and loved ones...

***

juliet said:
> Implementing this kind of line-by-line interface
> efficiently is only possible if one has
> word boundary information (actually, the line boundary)
> from the OCR. That kind of information was not available
> when DP started and to add it into the current DP would
> be so much effort as to make it out of the question.

my software doesn't need any such information from o.c.r.

***

dakretz said:
> I would rather have a scrolling semi-transparent
> gun-slit overlay on the image that would synchronize with
> the cursor in the text. I can't proof with so little context.

i'm confused. the whole page of context is there, both in the form of the (sliced) image and the (sliced) text...

but please, take the text and images i've provided and provide a demo of the system that you have described.

> http://z-m-l.com/go/sitka/sitkap002.txt
> http://z-m-l.com/go/sitka/sitkap002.png
> http://z-m-l.com/go/lines

-bowerbird
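
[The thread never says how the scan is actually sliced into lines without OCR coordinates. One common way to do it, offered here only as a guess and not as bowerbird's method, is a horizontal projection profile: count the ink in each pixel row of a binarized page and treat each run of inked rows as one text line. The function names, the 0/1 pixel representation, and the pairing rule below are all assumptions made for illustration.]

```python
from typing import List, Sequence, Tuple

Band = Tuple[int, int]  # (top_row, bottom_row), bottom exclusive


def find_text_lines(pixels: Sequence[Sequence[int]],
                    min_ink: int = 1,
                    min_height: int = 1) -> List[Band]:
    """Find horizontal bands of ink in a binarized page.

    `pixels` is a 2-D grid of 0 (background) and 1 (ink), one inner
    sequence per pixel row. A row counts as inked when it holds at
    least `min_ink` ink pixels; bands shorter than `min_height` rows
    are dropped as specks.
    """
    bands: List[Band] = []
    start = None
    for y, row in enumerate(pixels):
        inked = sum(row) >= min_ink
        if inked and start is None:
            start = y                      # a band begins
        elif not inked and start is not None:
            if y - start >= min_height:    # a band ends on a blank row
                bands.append((start, y))
            start = None
    if start is not None and len(pixels) - start >= min_height:
        bands.append((start, len(pixels)))  # band runs to the page bottom
    return bands


def pair_bands_with_text(bands: Sequence[Band],
                         text_lines: Sequence[str]) -> List[Tuple[Band, str]]:
    """Pair image bands with non-blank text lines, in order.

    Hypothetical alignment rule: only pair when the counts agree,
    otherwise refuse rather than guess.
    """
    lines = [ln for ln in text_lines if ln.strip()]
    if len(lines) != len(bands):
        raise ValueError("band count and text line count differ")
    return list(zip(bands, lines))
```

[On a real scan the grid would come from thresholding the page image, and skewed pages, descenders touching the next line, or illustrations would all need more care than this sketch gives them.]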

From jimad at msn.com Fri Apr 2 11:47:07 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 11:47:07 -0700
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To:
References:
Message-ID:

>> Do you have software to automatically
>> slice the bitmap and align the txt?
>> Or are you slicing the bitmap by hand?
>
>i don't do anything by hand...

OK, suggest you keep working in this direction as I think it would make a contribution.

From jimad at msn.com Fri Apr 2 16:26:06 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 2 Apr 2010 16:26:06 -0700
Subject: [gutvol-d] Latin / HTML entities Cheat Sheet
In-Reply-To: <4BB61851.9010301@verizon.net>
References: <4BB61851.9010301@verizon.net>
Message-ID:

Finally got around to making myself a cheat sheet re Latin / HTML entities:

http://www.freekindlebooks.org/Dev/charmap.html

From jimad at msn.com Fri Apr 2 18:28:43 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 2 Apr 2010 18:28:43 -0700
Subject: [gutvol-d] Kindle on iPad
In-Reply-To:
References:
Message-ID:

Amazon has announced Kindle on iPad:

http://www.amazon.com/kindleforipad

Which, depending on your point of view might mean:

"Oh Good, now I can also read my Amazon books on my iPad."

Or

"Oh Good, now I can proof both ePub and Mobi on iPad"

Or

"O. S., Amazon is *already* capitulating in the race against Apple!"

Competition is good, right? ;-)

Personally, having a reader app for the iPad which looks less goofy than that which Apple is proposing looks to me like a good thing. Tempting to get one for SR'ing where one wants to do small edits on the fly.

From Bowerbird at aol.com Sat Apr 3 12:22:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 3 Apr 2010 15:22:51 EDT
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
Message-ID: <78177.23fb8c5e.38e8ef8b@aol.com>

jim said:
> suggest you keep working in this direction
> as I think it would make a contribution.

well, yes and no.

it's good for doing a word-by-word proofing... but, as i've shown, that methodology is overkill, at least when it is practiced over an entire book that was properly and aggressively preprocessed, as upwards of 90% of the lines are already perfect.

it is good at pulling out the other 10% of the lines, however, and subjecting them to a closer scrutiny.

plus it does have utility in doing a _comparison_ between two files, as is shown in this screenshot:
> http://z-m-l.com/go/sitka/findlines.png

that screenshot also gives you some hints about how i go about slicing the lines, although that's not really too difficult to figure out on your own.

-bowerbird

p.s. the pinkish lines in that screenshot are lines that have a diff between the two different versions. the numbers by the asterisks are a simple counter for the number of diffs cumulative to that point...

From joyce.b.wilson at sbcglobal.net Mon Apr 5 05:52:13 2010
From: joyce.b.wilson at sbcglobal.net (Joyce Wilson)
Date: Mon, 05 Apr 2010 07:52:13 -0500
Subject: [gutvol-d] Full text catalog search
Message-ID: <4BB9DCFD.3090008@sbcglobal.net>

The "Full Text" search option here is broken, and has been for at least a year. If the thinking is that it doesn't need to be fixed because some of the alternative searches at the bottom of this page are acceptable substitutes, then it makes sense to remove it as an option on the "Advanced Search" page. To have the option there but never find any results with it is just confusing.

Regards,
Joyce

From Bowerbird at aol.com Mon Apr 5 16:27:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 Apr 2010 19:27:51 EDT
Subject: [gutvol-d] the sitka book from fadedpage is now posted
Message-ID: <3ad60.6f52c518.38ebcbf7@aol.com>

the sitka book, which was produced over at fadedpage.com, was posted over the weekend, so i will finish my version and do a comparison on it.

the fadedpage/p.g. version is here:
> http://www.gutenberg.org/files/31862/31862.txt

i can, however, already report that the posted version _does_ contain a few errors.

from page 19 in the original p-book:
> http://z-m-l.com/go/sitka/sitkap019.png

"yukali" is incorrectly entered as "yuhali", and "prazdnik" is incorrectly given as "prasdnik". in addition, both words are italicized in the original, but unitalicized in the p.g. version, for a grand total of 4 errors on this single page.

(troubling is the fact that "yukali" had another occurrence in the book, and the similarity of these two non-dictionary words in the book's word-list _should've_ triggered closer examination.)

ironically, there was a note in the forum about this page, saying that it was marked as "done" incorrectly after _no_ proofing, due to an inadvertent click on a wrong button, a note which also then pointed out these changes which needed to be done, but evidently rfrank missed that note. and with nobody else doing that page, the errors persisted.

plus there are a few other errors i have already noticed...

the word "sheetkah", on page 25, is missing its italics:
> http://z-m-l.com/go/sitka/sitkap025.png

and the period after "december 14th" should be a comma, in the footnote (#25) which was on page 80 in the p-book:
> http://z-m-l.com/go/sitka/sitkap080.png

so there are some things worth pointing out about all of this...

the first is that, by _my_ standard of one-error-per-10-pages, these 6 errors are _not_ a damning indictment, not nearly so...

in and of themselves, these errors are trivial.
don't forget that.

of course, you don't like to concentrate four errors on one page. but stuff happens. so in terms of this particular case, for _me_, i don't think it's a big deal.

however, in terms of the _workflow_, the fact that a page with 4 errors can float through the system without _anyone_ having had a "second look" at it -- or, indeed, even a _first_ look -- is not a good sign, not a good sign at all.

moreover, according to clear results of a poll i ran over at d.p., the majority of people over there believe that 5 errors in a whole _book_ is right at the maximum that they are willing to tolerate as "acceptable". even worse, some of them -- in a defiant act of wishful thinking -- actually have convinced themselves that they really do _attain_ that level of accuracy with the books they do!

now, clearly that's ridiculous, and they're just fooling themselves; they don't actually _know_ how many errors are in their books, so they just let themselves believe that there aren't any errors there.

but in spite of this break from reality, their _expressed_desire_ is that they release books that have 5 or fewer errors in them...

viewed from that perspective, this performance -- with 4 errors on a single page and (at least) 6 in a posted book -- is _bad_...

-bowerbird

p.s. i'd point to the note that was left about that page, except that rfrank seems to take down the thread for a book once it is posted... once again, this data is important for a proper analysis of the test.

From Bowerbird at aol.com Tue Apr 6 15:48:25 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Apr 2010 18:48:25 EDT
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
Message-ID: <53001.fb9fc78.38ed1439@aol.com>

rfrank has responded, over on the fadedpage forums, to the note i sent yesterday on the posted "sitka" book.

as i might have expected, he fell on his sword and took full blame for the errors that were present...

in some part, that's correct, because some errors _had_ been reported, and he simply failed to check the forum.

but that's not really the proper take-away message here.

a proofer can miss things, and still hit the "done" button, just because they honestly felt they'd caught everything. even the best proofer, with an accurate self-perception. 2 of the 6 errors that i reported were cases just like that.

if the only person who's gonna serve as "back-up" for that proofer is the post-processor, that puts too much stress on the post-processor. that's exactly what d.p. has done, and that's why they have so few volunteers for that task...

rfrank is willing to take on that stress, which is why he has fallen on his sword. but that's the wrong approach to take.

rfrank is also improving his tools, so they catch those glitches. that's good, and it's part of the reason i reported these errors. plus he'd already improved his workflow, to catch _italics_...

his improved workflow might work, _if_ the o.c.r. recognizes the styling correctly. but on any books with heavy formatting, going back and reinserting the formatting might be a real pain.

a better approach, in my view, would be the one that rfrank started with, when he put up his site, which is to encourage the volunteers to do _both_ the proofing and the formatting. it's really _not_ that difficult to do both these tasks together.

this is especially true if you give people a _formatted_display_, because then the obtrusive markup is cleared from the screen, and it's replaced by a rendering that resembles the actual scan.

i demonstrated this technique with my own proofing site, and showed the additional strength that questionable words can be highlighted in a different color, maximizing the value of a flag.

***

rfrank said:
> But the most important thing I've concluded is that
> the majority of reportable errors in Sitka are chargeable
> to the post-processor (me) and not the roundless system.

see, there's the "experimenter bias" that i was talking about; he'd rather take the blame himself than blame his system...

i believe in the roundless system too. i believe in it so much, so strongly, that i believe the evidence can stand up for itself.

> We don't have nearly enough participation by a realistic
> cross-section of typical users here at fadedpage to
> conclude anything based on real science or real statistics.
> All I can say is that I am liking the roundless system more and
> more. It seems to be doing its part well and continues to improve.

there isn't a "cross-section" of "typical" proofers over at fadedpage, it's true. the proofers there are probably much better than average.

and, as shown, even these better-than-average proofers can _miss_ errors.

we're all human. we make mistakes. we need to be checked.

-bowerbird

From gbuchana at teksavvy.com Tue Apr 6 18:20:37 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Tue, 06 Apr 2010 21:20:37 -0400
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
In-Reply-To: <53001.fb9fc78.38ed1439@aol.com>
References: <53001.fb9fc78.38ed1439@aol.com>
Message-ID: <4BBBDDE5.9090309@teksavvy.com>

On 06-Apr-2010 18:48, Bowerbird at aol.com wrote:
> a proofer can miss things, and still hit the "done" button,

What would happen if the proofing system occasionally *inserted* an error into the page and then double-checked that the known error had been found and fixed? eg: find a correctly spelled word with "m" in it and change to "rn". Choose from amongst a list of 100 similar things.
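
[A minimal sketch of the seeding idea just described, not something DP or fadedpage is known to run. The confusion table, the function names, and the three-way verdict are assumptions for illustration; the post suggests a list of about 100 substitutions, and this sketch does not check that the chosen word was correctly spelled to begin with. The "pass" verdict corresponds to the test the post goes on to state: the fake error is fixed and nothing else changed.]

```python
import random
from typing import Optional, Tuple

# Hypothetical OCR-style confusions: correct text -> plausible misreading.
CONFUSIONS = {"m": "rn", "cl": "d", "li": "h"}


def seed_error(text: str,
               rng: random.Random) -> Optional[Tuple[str, int, str, str]]:
    """Plant one fake error in `text`.

    Returns (seeded_text, position, original_fragment, planted_fragment),
    or None when no fragment from CONFUSIONS occurs in the text.
    """
    candidates = [
        (pos, src)
        for src in CONFUSIONS
        for pos in range(len(text))
        if text.startswith(src, pos)
    ]
    if not candidates:
        return None
    pos, src = rng.choice(candidates)
    dst = CONFUSIONS[src]
    seeded = text[:pos] + dst + text[pos + len(src):]
    return seeded, pos, src, dst


def judge(original: str, seeded: str, proofed: str) -> str:
    """Classify what the proofer handed back.

    "pass"   - the page matches the pre-seed text exactly
    "missed" - the page came back untouched, planted error and all
    "review" - something else changed; a person has to look, since the
               proofer may have fixed a genuine error on the page
    """
    if proofed == original:
        return "pass"
    if proofed == seeded:
        return "missed"
    return "review"
```

[The "review" case matters because a page usually still contains real errors; treating every extra change as a failure would punish exactly the proofing the system wants.]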

It might be a little paternalistic towards the proofreader, but would give the automated system some basis for judging whether the proofreader had *actually* proofed the page or not. It might also help to keep proofers paying attention.

The final test for correctness is then that
(1) the fake error is found and fixed, and
(2) nothing else changed.

I haven't been paying much attention to this thread, so apologies if you've all covered this ground already.

============================================================
Gardner Buchanan
Ottawa, ON FreeBSD: Where you want to go. Today.

From dakretz at gmail.com Tue Apr 6 19:03:17 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 6 Apr 2010 19:03:17 -0700
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
In-Reply-To: <4BBBDDE5.9090309@teksavvy.com>
References: <53001.fb9fc78.38ed1439@aol.com> <4BBBDDE5.9090309@teksavvy.com>
Message-ID:

This comes up all the time.

a. It's socially unacceptable by acclamation.
b. All you prove is that the proofer did or didn't catch an error you already knew about.

On Tue, Apr 6, 2010 at 6:20 PM, Gardner Buchanan wrote:

> On 06-Apr-2010 18:48, Bowerbird at aol.com wrote:
>
>> a proofer can miss things, and still hit the "done" button,
>>
>
> What would happen if the proofing system occasionally
> *inserted* an error into the page and the double-checked
> that the known error had been found and fixed? eg: find
> a correctly spelled word with "m" in it and change to "rn".
> Choose from amongst a list of 100 similar things.
>
> It might be a little paternalistic towards the proofreader,
> but would give the automated system some basis for judging
> whether the proofreader had *actually* proofed the the page
> or not. It might also help to keep proofers paying attention.
>
> The final test for correctness is then that
> (1) the fake error is found and fixed, and
> (2) nothing else changed.
> > I haven't been paying much attention to this thread, so > apologies if you've all covered this ground already. > > ============================================================ > Gardner Buchanan > Ottawa, ON FreeBSD: Where you want to go. Today. > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Tue Apr 6 19:49:27 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 6 Apr 2010 22:49:27 EDT Subject: [gutvol-d] Re: the sitka book from fadedpage is now posted Message-ID: <42bf5.35a96207.38ed4cb7@aol.com> dakretz said: > This comes up all the time. it has been suggested before, yes... > a. It's socially unacceptable by acclamation. ...and is usually deemed socially unacceptable, true... but i'm not sure that that couldn't be turned around, assuming people are genuinely looking to be tested, in a sincere desire to improve. it is a bit repugnant, in my opinion... but so is the secretive collection of data on proofing accuracy that rfrank is doing now, especially since he's not revealing that to _anyone_, including the very people he is collecting data on... > b. All you prove is that the proofer did or > didn't catch an error you already knew about. well, i don't necessarily agree with that. i'd believe your ability to catch the introduced errors would be highly correlated with your general overall accuracy. so if you value that metric, there's one good purpose. (but, for the record, i believe that metric is valueless.) further, the ability to detect the error _immediately_, and show it to the proofer on-the-spot might well be the very best feedback necessary to get their attention, an argument which hasn't been fully considered before. so there _could_ be some real value in this technique. 
now, i certainly wouldn't do such error-injection on a proofer without their express approval, because it is entirely too _sneaky_ when you are doing it that way, and jeopardizes the trust-relationship, which is vital. but if a proofer _asked_ for it, i think it would be ok... i never subscribed to the "i need to have some errors to keep myself from getting bored" philosophy... but for a proofer who does, this could well be an answer. bottom line, though, i just don't think it's necessary... -bowerbird From jimad at msn.com Wed Apr 7 12:15:00 2010 From: jimad at msn.com (Jim Adcock) Date: Wed, 7 Apr 2010 12:15:00 -0700 Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted In-Reply-To: <53001.fb9fc78.38ed1439@aol.com> References: <53001.fb9fc78.38ed1439@aol.com> Message-ID: >if the only person who's gonna serve as "back-up" for that proofer is the post-processor, that puts too much stress on the post-processor. that's exactly what d.p. has done, and that's why they have so few volunteers for that task... Agreed that PP at DP is stressful, in part because of the high standards expected there of PPs -- don't see how one can meet their expectations without at least doing a SR pass oneself, which supposedly isn't required. But to me a more fundamental part of the problem is that it is relatively easy to start a book at DP and then expect someone else to fix the "problems" at the other end, leading a PP to stare at a potential PP book and say "why in g--- name would someone have started this book project in the first place???" Again, it's pretty easy to find a book project where one can pretty easily predict it will be read 1000 times more than some other book. Which one would you rather PP? A book that gets read 1000 times a year, or a book that gets read 1 time a year? If one can "solo" a compelling book oneself, or "PP" a drab book at DP, which would you choose?
The other problem is if you PP a book you haven't lived with through the entire process then you "start from scratch" with your knowledge of the book, its author, its proofing problems, etc., and getting up to speed IMHO is almost as painful as doing the whole project "solo" in the first place. Don't get me wrong, I *love* the feeling of having others proof one's choice of books at DP and say "wow, this is a really cool book to be proofing!" From jimad at msn.com Wed Apr 7 12:19:52 2010 From: jimad at msn.com (Jim Adcock) Date: Wed, 7 Apr 2010 12:19:52 -0700 Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted In-Reply-To: <4BBBDDE5.9090309@teksavvy.com> References: <53001.fb9fc78.38ed1439@aol.com> <4BBBDDE5.9090309@teksavvy.com> Message-ID: >What would happen if the proofing system occasionally *inserted* an error into the page and then double-checked that the known error had been found and fixed? eg: find a correctly spelled word with "m" in it and change to "rn". Choose from amongst a list of 100 similar things. The problem I see is that it would be hard for the system to "model" the kinds of errors that remain unseen in the book, thus it would train proofers to look for the wrong things. Also, if the system can introduce errors, it had better know how to take them back out. And it's not fair to introduce errors on pages that a particular individual has already proofed, for example if for a given page I P2 and PP then I do NOT want to have to go into "paranoid mode" and look during PP for errors introduced after I had already P2'ed. Proofing *already* leads too much to the feeling that one is chasing one's tail, going around in circles, and "didn't I fix that one already!"
From jimad at msn.com Wed Apr 7 12:25:02 2010 From: jimad at msn.com (Jim Adcock) Date: Wed, 7 Apr 2010 12:25:02 -0700 Subject: [gutvol-d] Re: the sitka book from fadedpage is now posted In-Reply-To: <42bf5.35a96207.38ed4cb7@aol.com> References: <42bf5.35a96207.38ed4cb7@aol.com> Message-ID: >but i'm not sure that that couldn't be turned around, assuming people are genuinely looking to be tested, in a sincere desire to improve. it is a bit repugnant, in my opinion... Don't see why this should be more repugnant than the other testings and scorings that DP does on people? Except maybe the assumption is that now one can always "hump it" for 50 pages and increase one's score enough to qualify for the next level -- and then slack back off into "cruise mode?" From Bowerbird at aol.com Thu Apr 8 01:42:07 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 8 Apr 2010 04:42:07 EDT Subject: [gutvol-d] how to create a spellcheck workflow Message-ID: <462cd.1b65839.38eef0df@aol.com> ok, well gee... i had gotten the firm impression that rfrank had read the d.p. forum thread where the development of their "wordcheck" was discussed (thoroughly -- 30 pages). d.p. calls it "wordcheck", but it's basically spellcheck. so why did i think he had read it? because he incorporated several of the things which i had suggested (to no good effect) in that thread... i won't bother reciting all of those particulars, since he might well have come up with the ideas himself, and nobody really cares anyway, not even me... but i mention it because now i have been convinced that roger must _not_ have read that thread, based on some current discussions on his fadedpage site... or, if he did read it, he didn't "get it", but given those current discussions, he might now be more receptive. 
so i will run through a quick little "how-to" refresher on how to design and build a spellcheck functionality, and incorporate it into the overall workflow of a book, so roger has the benefit of my wisdom here... ;+) as you'll see, this will hit a very wide variety of topics. note that what we're talking about here is _primarily_ executed during _preprocessing_, but there are some follow-on thoughts that apply to the proofing as well. *** 0. set o.c.r. parameters correctly. don't dehyphenate! name your files wisely. look both ways before crossing. 1. the first thing you should do with your o.c.r. results is to make the few global changes you can do _blindly_. the very first one is to strip trailing spaces from all lines. another will be to replace two spaces with a single space. yet another is to change a linebreak-doublequote-space combination to eliminate the space, since it's superfluous. likewise, change all cases of space-doublequote-linebreak. you get the drift. oh, by the way, don't do what d.p. does! _retain_ the runheads, and pagenumbers; you need 'em... 2. the next thing you should do is clean up the runheads. this has little bearing on our general "spellcheck" topic, but i include it here because it's always your second step. 3. the third thing on your list is to fix all paragraphing... again, not much to do with spellcheck, but it _is_ step #3. 4. now we can focus quite specifically on spellcheck stuff. we will take the o.c.r. and run it through a program i wrote which pulls out all the words _not_ present in its dictionary. i think rfrank has his own program that does the same thing, or something similar enough. i'll make mine available too. this is the first draft of your "bad-words" list for this book. (note this is not the same as how d.p. defines "bad words".) in this regard, use a good dictionary in this first check here. (this is something that rfrank hasn't done correctly thus far.) 
the dictionary i use is quite good, and it can be found here: > http://z-m-l.com/go/regulardictionary.txt just so you have a feel for the output from this program, i have posted it for the "sitka" book we've been discussing: > http://z-m-l.com/go/sitka/sitka-reversedictionary.txt that output was generated in 5 seconds, so it's pretty fast... 5. you'll see, on viewing your list of supposed "bad-words", that a bunch are not "bad-words" at all. some of 'em will be character-names, or jargon specific to your particular book. some will be hyphenated fragments, some compound-words. you can delete these words from the list now, if you want, but you shouldn't necessarily feel a great need to do that. if you're looking at the list of words from the "sitka" book, you will also notice that i separated the initial-caps words from all-lowercase ones. there is a good reason for that. due to proper names, _most_ initial-cap words are correct, whereas most of the all-lowercase words are _incorrect_... separating the lists makes it easier to focus your attention. 6. your dictionary-check program should also spit out the _frequency_ of each bad-word. you'll use that information to cull some words from this list of "bad-words". this you _will_ want to do, most definitely, so _sort_ on frequency. i fought tooth-and-nail on this with d.p. (i lost, of course), but you can take this to the bank that as long as there are a mere 4-plus occurrences of a specific string in the o.c.r., you can (i.e., should) delete it from this list of "bad-words". yes, some of the words _might_indeed_ be bad, but unless you're positive of that, you should delete 'em from the list. and yes, this means that those words will _not_ be flagged. but trust me, if there's 4 or more occurrences of a scanno, your proofers will find at least _one_ . 
and whenever they find _one_ "bad-word" that wasn't on that "bad-word" list, you will automatically search the rest of the book, and thus find those _other_ occurrences as well, so you can fix them. i did _not_ include frequency information in my "sitka" list, because i didn't want to make everything so bloody obvious, and because i want you to discover the importance of that frequency data for yourself, so it burns itself in your brain. 7. when you narrow your focus to the words that are _not_ in the dictionary, and which occur only two or three times in the book, you'll find you can be very productive fixing errors. for many of the words, it'll be obvious what they should be... building a tool that will take you _immediately_ to each word, plus show the scan alongside, will turn you into a _machine,_ an awesome and devastatingly efficient error-fixing machine. for this very first pass, i recommend you look only at words you're confident are scanning errors. (they're easy to spot.) on your next pass, you can look at more questionable words. also pay attention to words with several variants that'll thus sort next to each other. (see the asterisks in the "sitka" list.) it'll almost certainly be the case that one variant is a scanno. (97 times out of 100, it is the one with fewer occurrences.) also, in a system where you're gonna have proofers doing a word-by-word proofing, don't even bother to look at words which look kinda reasonable. and don't ever bother to view any words with only _one_ occurrence. those will be flagged, and it's no less efficient to have the proofer see if it's correct than to have _you_ see if it's correct. you wanna be efficient in preprocessing. efficiency is _the_point_ of preprocessing! 8. the next thing you want to look at are compound-words. my tool also separates compound-words to their own listing. 
as per usual, perusal of the compound-words will show that some are obviously correct, others are obviously incorrect, and a bunch where a judgment can only happen with a scan. the other thing about compound-words, which you will want your tools to handle, is a check against the rest of the book, to find any other instances of the compound where the parts are separated by a space (two words) or joined (one word)... that information will help you decide how to treat the word. 9. the other thing you're gonna check is end-line hyphenates. remember that i told you _not_ to have the o.c.r. rejoin them. my philosophy is to _retain_ the end-line hyphenates through my final product, but i'm not arguing for that position _here_. you can rejoin the end-line hyphenates if you want to do that. just don't do it until _after_ all of the proofing is done, because having original linebreaks makes a page much easier to proof. and besides, in determining whether or not the rejoined word contains a dash or not, we need to have uncompromised data. if your o.c.r. program destroys that data (by joining the word in the way that _its_ dictionary dictates), then you might just be doing a disservice to the way that the _book_ did things... for your spellcheck, however, you can ignore all of that stuff. internal to your spellcheck tool, rejoin the end-line hyphenate by eliminating the linebreak, and test the resultant compound. if it passes, fine. if not, try again with the dash removed too. if it _still_ doesn't pass, flag _both_ portions of the compound. (note that if you are using _my_ dictionary, mentioned above, it contains no compounds, so you would skip that first check.) 10. so at this point in time, you have a great "bad-words" list. that is half your battle. (that's right, just _half_.) so now you will make your "good-words" list. to do this, just run the text through your dictionary-checker using your "bad-words" list as the dictionary. 
thus, the output will be all of the words in your text which are _not_ included on your "bad-words" list. (or you can just use my tool, which can also create this list.) from now on, this "good-words" list will be your dictionary... got that? you're not using the huge dictionary file any more. you'll use the much-more-compact "good-words" list instead. so now you have a "bad-words" list and a "good-words" list, which -- taken together -- comprise all words in your book. there are other jobs that you'll do during preprocessing, but this is all the spellcheck work that's needed during that stage. 11. so now we will move on from preprocessing to proofing, which takes us to "flagging" -- highlighting possible errors... a word should be flagged if it appears in the "bad-words" list. a word _might_ be flagged if it's not on the "good-words" list, perhaps in a different way; for instance, yellow instead of red. notice that this is a slightly more nuanced way to do flagging. rather than flag _everything_ that _might_ be wrong, we are gonna flag _only_ the things we really _suspect_ are wrong... underflagging is better than overflagging, because too many flags makes us complacent; we start to check only the flags. it's impossible for your mind to ignore the fact that most of the flags are not actually errors, so it comes to expect that if something is _not_ flagged, it certainly won't be an error. but if you underflag, and the proofer spots an error that was _not_ flagged, it primes them to be attentive to everything... 12. the other thing that's extremely important here is that, when the book is finished, we should have resolved all flags. every word in the book will be there on the "good-word" list, and the "bad-word" list will have shrunk until it disappeared. 
that is, every bad-word will have been checked, and if it was "ok", it will have been moved to the "good-word" list, and if it was not "ok", it would have been _changed_, which will also remove it from the bad-words list. the words do _not_ have to be physically removed from the "bad-words" list, but if we check every word in the book on the "good-words" list, and find it fine, then the "bad-words" list will be eliminated. this complete "good-words" list is _useful,_ because we can run the full book against spellcheck at any time, and it will come out totally clean. so we do that check periodically, so we know that we haven't compromised the book's accuracy. oh, and just so you'll know, it's quite easy to write the code that does this check. you simply sort the words in the book, eliminating duplicates; then you sort the "good-words" list (if it's not already sorted), and eliminate its duplicates too (shouldn't be any); then the 2 outputs should be _identical._ this lets us envision the proofing process as movement of all words on the "bad-words" list to the "good-words" list. (put that image in your head; the visualization has utility.) to help facilitate that movement, you need to make it easy for proofers to put words on the "good-words" list, which is why -- on my proofing site -- i let them add all the words for a single page to the "good-words" list with one button-click. (another option is a button-click for each individual word.) the flip-side is that, in order to have a page be considered as "finished", all flagged words on that page _must_ be cleared. remember, words move from "bad-words" to "good-words". that's good enough for now. any questions on this? ;+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
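The list-building steps of the spellcheck workflow above (1, 4, 6, 10 and 12) could be sketched roughly as follows. This is an illustration under stated assumptions, not the tool described in the message: the function names, the word pattern, and the lowercase dictionary lookup are all hypothetical choices, and only the cut-off of 4 occurrences comes from the text. The handling of runheads, paragraphing, compound-words and end-line hyphenates is left out.

```python
import re
from collections import Counter

# Assumed definition of a "word": letters and apostrophes only.
WORD = re.compile(r"[A-Za-z']+")

def blind_cleanup(text):
    """Step 1: global changes that can be made without looking."""
    lines = [line.rstrip(" ") for line in text.split("\n")]  # trailing spaces
    text = "\n".join(lines)
    text = re.sub(r" {2,}", " ", text)   # double (or longer) spaces -> one
    text = re.sub(r'\n" ', '\n"', text)  # linebreak-doublequote-space
    text = re.sub(r' "\n', '"\n', text)  # space-doublequote-linebreak
    return text

def bad_words(text, dictionary, cull_at=4):
    """Steps 4 and 6: words missing from the dictionary, with counts.

    Words occurring cull_at or more times are dropped from the list,
    on the theory that proofers will catch at least one occurrence.
    """
    counts = Counter(WORD.findall(text))
    return {word: n for word, n in counts.items()
            if word.lower() not in dictionary and n < cull_at}

def good_words(text, bad):
    """Step 10: every word in the text that is not on the bad-words list."""
    return {word for word in WORD.findall(text) if word not in bad}

def is_clean(text, good):
    """Step 12: the book's unique words must equal the good-words list."""
    return sorted(set(WORD.findall(text))) == sorted(set(good))
```

In this sketch a page counts as finished only once `is_clean` holds, which happens when each flagged word has either been corrected in the text or added to the good-words set.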
URL: From gbuchana at teksavvy.com Thu Apr 8 16:30:33 2010 From: gbuchana at teksavvy.com (Gardner Buchanan) Date: Thu, 08 Apr 2010 19:30:33 -0400 Subject: [gutvol-d] Re: how to create a spellcheck workflow In-Reply-To: <462cd.1b65839.38eef0df@aol.com> References: <462cd.1b65839.38eef0df@aol.com> Message-ID: <4BBE6719.2070708@teksavvy.com> On 08-Apr-2010 04:42, Bowerbird at aol.com wrote: > _retain_ the runheads, and pagenumbers; you need 'em... > I imagine you'll eventually explain, but what use is preserving or fixing the page headings? If they can be mechanically fixed, they were not worth much. Do you include or exclude running headings in your word-count dictionary analysis? Does it matter? ============================================================ Gardner Buchanan Ottawa, ON FreeBSD: Where you want to go. Today. From Bowerbird at aol.com Thu Apr 8 21:56:16 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Apr 2010 00:56:16 EDT Subject: [gutvol-d] why isn't distributed proofreaders helping fadedpage? Message-ID: <88407.6b6f18ae.38f00d70@aol.com> why isn't distributed proofreaders helping fadedpage? the site needs proofers, and d.p. has an _excess_, to the degree that they are actively attempting to stunt the work being done by the p1 volunteers... so why not send some people over to fadedpage? rfrank has done a lot for distributed proofreaders. and fadedpage is showing d.p. how they can make a roundless system work, and that it _does_ work... so why is d.p. being so stingy? help your brother! -bowerbird p.s. and to start, some of you people who are now working at fadedpage can occasionally make a post in a forum at d.p. inviting people to try fadedpage... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bowerbird at aol.com Thu Apr 8 22:08:45 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Apr 2010 01:08:45 EDT Subject: [gutvol-d] Re: how to create a spellcheck workflow Message-ID: <8888c.1468b5c0.38f0105d@aol.com> gardner said: > what use is preserving or fixing the page headings? > If they can be mechanically fixed, they were not worth much. well, to take your second sentence first, the runheads cannot _always_ be mechanically fixed. sometimes they contain text that describes the current point in the chapter, like an outline, and aren't just limited to a boring recitation of title and author. and returning to the question, runheads are useful since they help you keep your bearings in the book. even when they are nothing but title/author, they help keep recto/verso straight... plus -- especially when you're working on multiple projects -- it's useful to have the reminders about which book you're in... because without 'em, after a time, the pages all look the same. > Do you include or exclude running headings in > your word-count dictionary analysis?? Does it matter? never paid much attention, so i guess it doesn't matter much. but runheads are definitely excluded from search operations. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Fri Apr 9 12:25:18 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Apr 2010 15:25:18 EDT Subject: [gutvol-d] so let's talk about my collaborative proofreading site Message-ID: <25a83.14c958b4.38f0d91e@aol.com> ok, so let's talk about my collaborative proofreading site... yeah, that's a real knee-slapper, isn't it? :+) ok, i'll talk, and y'all can just sit and listen. or go watch t.v. *** to see what i'm talking about, you can visit this u.r.l.: > http://z-m-l.com/go/sitka/editr.pl when you go there, you're proofing the "sitka" book... unlike other proofing systems, pages are not "assigned" to you. 
you can go and proof any page you wanna proof. even go back the next day and proof it again, if you like. you'll see a row of buttons at the top-left of the screen, which includes one that says "go", and "prev", and "next"... navigating the pages... "prev" and "next" are what you'd expect. they take you to the previous page or the next page in the book, obviously. at the end of that row of buttons, you'll see an edit-field that has a number in it. (or perhaps a letter and numbers.) that number is the pagenumber. (and, ergo, the filename.) if you want to jump to a specific page, put its number in the edit-field, and then click the "go" button. boom, you're there. the ability to navigate the book at will, and to proof any page you like, can be extremely powerful once you learn to use it. certifying a page as clean... there's also an "ok" button. that's what you will click when you've proofed a page and found nothing at all to change. by clicking "ok", you indicate you certify the page is clean. searching the book for a string... next to "ok" is another edit-field and a "find" button next to it. this field lets you enter a search-term, and then when you click "find", a screen appears that lists pages with that search-term... the line containing the word is shown, with a link to its page. if you want to visit any of these pages, you can open them in a separate tab/window. and it's fine to open a bunch at one time. for instance, say you want to check the sitka chapter-headers. enter the term "chapter" in the edit-field, and then click "find", and you will get a list that includes the table of contents page as well as each of the nine pages where each chapter starts... (red lines are a case-insensitive hit; black are case-sensitive.) there are many other ways you can use the search functionality, but we'll save the discussion about all of those for a later time... to return from the search-results page back to a proofing page, just use the "back" button in your browser. 
you'll have to clear the search-term out of the edit-field, or it'll do the search again. (i should make it so it only does the search when you click the "find" button, but that's not the way it works now, sorry folks.) feel the power with the "command" field... the edit-field where you enter your search-term also serves as a "command" field. i'll eventually have a number of commands that you can issue, but for now there's just a couple of them... showmap... the first command is "showmap". enter that, and click "find". the program will show the "map" of files comprising the book. each one is a clickable link, so -- as before -- you can open any pages you like in separate tabs and proof them just fine... ("listcat", short for "list catalog", is a synonym that works too.) concat... another command is "concat". enter that, and click "find", and the program will concatenate all the text files into one, and put them on a web-page for you. this will allow you to look at the entire book, save it to your machine, and so on... you can also use the browser's "find" command, so "concat" is useful when you need more context to a "find" operation than the single line output from the native find command... showcustom... a third command is "showcustom". this command spits out the "custom dictionary" that has been created for the book. *** that's enough for now. give you a little toy to play with over the weekend, if you like. we'll discuss more stuff next week. anyway, thanks for the little chat. so, how was the t.v. show? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Fri Apr 9 14:44:30 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 9 Apr 2010 17:44:30 EDT Subject: [gutvol-d] my report on that sitka posting Message-ID: ok folks, here's my book report on the "sitka" book which was done over at fadedpage, and posted to p.g. 
it's #31862 if you want to go and take a look at it: > http://www.gutenberg.org/files/31862/31862.txt *** i'll discuss the high points in the body of this message; appended you'll find the data documenting these points... *** 1. ...stealth scanno... the first thing to note is that a sharp-eyed proofer caught a stealth scanno. my system, which relies on spellcheck heavily, is susceptible to stealth scannos, so i'm always interested in their frequency, and whether or not they will cause a serious problem in comprehension. in this book, there was 1 stealth scanno, so they were infrequent here. further, it wasn't a bad one... a "heeded" was misrecognized as "needed", so no big deal. *** 2. ...publisher corrections made; here's a list of 15... rfrank made his digitization philosophy very clear here, as he made a number of _corrections_ to the original book. and i agree! i'm firmly against the "transcription" model, which holds that we should merely reproduce the print-book. i maintain we are _republishing_ the book, and therefore i strongly believe we need to correct any errors we find. and like me, roger doesn't bother with a "list of changes", either. i don't think it's necessary to make such a list. they're boring. just make the change and keep on moving... however, there is one aspect to note about this, which is that it means rfrank sets a higher standard for his work, and if we find errors in the p-book which he should've found, then he is now accountable for having missed those errors. *** 3. ...more fixes, requiring research outside the book... to his credit, rfrank went out of his way to check things, so as to make sure that he was _only_ fixing actual errors, and not introducing bad changes. sure, it's not too hard to look up a sailor's name on wikipedia to spell it right, or to check that a word means "an edict from the czar", but it takes time, and it improves the output, so give kudos... *** 4. ...he even did "corrections" i might not have done...
as is often the case when you adopt a "re-publication" philosophy as opposed to "transcription", some of the changes that you make might not get universal agreement. but that comes with the territory, so you roll with it. *** 5. ...and some i _certainly_ wouldn't have done... and you continue rolling, even when disagreement is thick. *** 6. ...and yet he didn't do some that i woulda done... everybody has their own opinions... :+) i won't count these as "errors". but _i_would_ do them differently... *** 7. ...and i disagree on how he did some formatting... once again, everybody's got an opinion... for the first 2 cases shown, i put a comma after the period, just so the following lowercase letter wouldn't throw a flag every time i did that check. the third case is a bit more complicated... i believe that the formatting rules used by rfrank -- and d.p. in general -- are slightly misguided... first of all, they are arbitrary, with wiggle-room, which means they're hard to understand and interpret. i think rules that are firmly grounded work better... second, in many situations the rules instruct us to put the formatting toggles _inside_ the word itself (at least inside the punctuation which surrounds it). in this case, for instance, the _italicized_ word is placed inside some _non-italicized_ parentheses. so an italic letter butts a non-italic parenthesis. but _none_ of the rendering engines we have today (like browsers and e-book apps) are capable enough to make that look reasonably good, let alone nice. indeed, most of the time it looks absolutely dreadful; a shift from non-italic to italic (and back) is ugly. so even when it doesn't "make sense" grammatically, my recommendation is to italicize the entire string, it looks better, so that's what i do; you should too. *** 8. ...the previous errors... ok, now we get down to the brass tacks. were there any errors in the posted book? well, yeah. i've already pointed out 6. they are repeated below, to refresh memory. 
but there are more errors... *** 9. ...there are hyphenation differences... rfrank has clearly shown that he is more than willing to make changes to the book, so as to correct errors, and for consistency. for instance, he changed several cases of the name "wrangel", consistent to "wrangell". so if we now _find_ errors or inconsistencies, even if they were present in the original book, we'll hold him accountable for not having made the proper corrections. it's only fair. and yes, the book had several hyphenation inconsistencies. rfrank may have found and fixed some, i don't know, but i _do_ know that there were some that he did _not_ fix... do you like the word "sealion"? or is "sea-lion" better? i like the second. the book was split 50/50. rfrank did 3 of them one way, then left the final 1 the other way... i'd call the error-count 2, but you can call it 1, or 3. there were 2 cases of "far-off" and 1 case of "far off", so i'd go with "far-off", and raise the error-count by 1. there was 1 "far-seeing" and 1 "far seeing". ironically, they're both on the same page. talk about short-sighted. call 1 of 'em an error. there was 1 "guest-house" and 1 "guesthouse", so 1 error. and 1 "ice-houses" and 1 "icehouses", so again, 1 error... so let's say 6 hyphenation errors, due to inconsistency... *** 10. ...and the other consistency errors... i show 11 other consistency errors, all on names... feel free to argue about any of them. if you can find someone who's willing to argue them, that is. so far, 17 new errors... *** 11. ...and finally, the last 2 errors... there was 1 case of some missed formatting, as shown clearly in the scan for this page: > http://z-m-l.com/go/sitka/sitkap025.png _barabaras_, barabaras, ^^^^^^^^^^ it's humorous to note that this missed italic was on the same page as the missed italic i reported before. but nobody said anything about "barabaras" earlier... 
last but not least, 1 misspelled word: "gooch-heat", which should have been "gooch-haet" -- wolf-house... yes, it was printed wrong, but it shoulda been caught. so 2 more, for a sum of 19 new errors, plus the original 6, making a grand total of 25. that's twice my usual standard -- 1 error per 10 pages -- but since _most_ of these errors were also present in the p-book (along with many more which were corrected by rfrank), i'd call this a good digitization. *** in conclusion... many people will say that most of these errors are trivial. i wouldn't argue with them. at the same time, i'm compulsive enough to fix the trivial. i don't want any errors in my books. even "trivial" ones... i'm guessing that roger will want to fix these errors too. i'm also guessing that roger will say most of these errors can be laid at the feet of the postprocessor, so that his "roundless" system has passed this test with flying colors. and, in a sense, he's right, in that most of these errors _should_ be located within the realm of the postprocessor. (or -- to once again stress my model -- the preprocessor.) but to give the postprocessor the _luxury_ of enough time to do the job of catching errors like these, we must have the proofers perform as many duties as they possibly can. when a page can conceivably be seen by only _one_ proofer, then a postprocessor simply can't have enough faith that each page is correct, and will inevitably perform checks they would not do if they believed every page was solid... if you're worried about simple things like missing italics, you just won't have the focus to look for more subtle stuff. use your proofers to make sure every page is rock-solid. give the postprocessor the luxury to _polish_ the book... -bowerbird p.s. here's a "bonus error", thrown in for good measure. "blockhouse" isn't capitalized elsewhere in the book, and if we did capitalize it here, we'd capitalize "upper" too. > near the site of the upper blockhouse. 
Her successor,
> near the site of the upper Blockhouse. Her successor,
> ===========================^^^^^^^^^^^^^^^^^^^^^^^^^^

===============================================================

1. ...stealth scanno...

> and was needed. Captain A. Holmes A'Court,
> and was heeded. Captain A. Holmes A'Court,
> ========^=================================

2. ...publisher corrections made; here's a list of 15...

> Narative of a Voyage Round the World.
> Narrative of a Voyage Round the World.
> ===^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> pupil.'
> pupil.'"
> =======^

> an air of prosperity prevaded the place.
> an air of prosperity pervaded the place.
> ======================^^================

> Her successor, the sceond Princess
> Her successor, the second Princess
> ====================^^============

> in these waters bought skins for mere trifles, some for
> in these waters, bought skins for mere trifles, some for
> ===============^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> bath house
> a bath house
> ^^^^^=^^^^

> island but
> islands but
> ======^^^^

> ungents, combine to make
> unguents, combine to make
> ===^^^^^^^^^^^^^^^^^^^^^

> Hudson Bay Co. the Russian ships that sailed
> Hudson's Bay Co. the Russian ships that sailed
> ======^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> of the educational institution of Sitka.
> of the educational institutions of Sitka.
> ========================^^^^^^^

> The liquid refreshments serve to him
> The liquid refreshments served to him
> =============================^^^^^^^

> the villianous liquor called "hoochinoo"
> the villainous liquor called "hoochinoo"
> ========^^==============================

> island to the broad Pacific. What were the thoughts
> islands to the broad Pacific. What were the thoughts
> ======^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> or "Ranche," there is choice of two streets,
> or "Ranche," there is a choice of two streets,
> ======================^^^^^^^^^^^^^^^^^^^^^^

> and three who names I do not know.
> and three whose names I do not know.
> =============^^^^^^^^^^^^^^^^^^^^^

3. ...more fixes, requiring research outside the book...

> Priest Vasili Michaeloff Ocueredin,
> Priest Vasili Michaeloff Ocheredin,
> ===========================^=======

> of the harvest. Her captain, Entienne Marchand,
> of the harvest. Her captain, Etienne Marchand,
> ==============================^^^^=^^^^^^^^^^^

> ukaze,
> ukase,
> ===^==

4. ...he even did "corrections" i might not have done...

> all the Russians to the Republic
> all the Russias to the Republic
> ==============^^^^^^^^^^^^^^^^^

> of the American troops, Gen. Jeff C. Davis,
> of the American troops, Gen. Jefferson C. Davis,
> =================================^^^^^^^^^^

> is called Kosters Trail. The first
> is called Koster's Trail. The first
> ================^^^^^^^^^^^^^^^^^^

5. ...and some i _certainly_ wouldn't have done...

> articles, carved with totemic design
> articles, carved with totemic designs
> ====================================

6. ...and yet he didn't do some that i woulda done...

> --The Author
> The Author
> ^^^^^^^^^^^

> in the Bering Sea,
> in Bering Sea
> ===^^^^^^^^^^

7. ...and i disagree on how he did some formatting...

> to the Hudson's Bay Co., the Russian ships that
> to the Hudson's Bay Co. the Russian ships that
> =======================^^^^^^^^^^^^^^^^^^^^^^^

> Total, 400. Ib., p. 52.
> Total, 400. Ib. p. 52.
> ===============^^^^^^^

> of New Archangel _(Novo Arkangelsk),_
> of New Archangel (_Novo Arkangelsk_,)
> =================^^===============^^^

8. ...the previous errors...

> _yukali_ (dried salmon),
> yuhali (dried salmon),
> ^^^^^^^^^^^^^^^^^^^^^^

> _prazdnik_ (holiday)
> prasdnik (holiday)
> ^^^^^^^^^^^^^^^^^^

> _Sheetkah_,
> Sheetkah,
> ^^^=^^^^^

> Seattle Intelligencer, December 14th, 1868;
> Seattle Intelligencer, December 14th. 1868;
> ======================^^^^^^

9. ...there are hyphenation differences...

sea-lions (#19, but at line-end)
sealion (#29)
sealion (#48)
sea-lion (#101)

> and sea-lion meat from Kodiak, and
> and sealion meat from Kodiak, and
> =======^^^^^^^^^^^^^^^^^^^^^^^^^^

> adorned with sea-lion heads
> adorned with sealion heads
> ================^^^^^^^^^^

far off (#35)
far-off (#42)
far-off (#80)

> in the far-off possession of the Czar.
> in the far off possession of the Czar.
> ==========^===========================

far seeing (#16)
far-seeing (#16)
(yes, inconsistent on the same page)

> wealthiest and most far-seeing of the leaders
> wealthiest and most far seeing of the leaders
> =======================^^^^^^^^^^^^^^^^^^^^^^

> being entertained in the guest-house were
...but yet...
> went to the guesthouse of the kwan. All the

> The ice-houses were near the outlet of
...but yet...
> icehouses was laden on the ship 250 tons, and

10. ...and the other consistency errors...

> in Chicagof Island, sent his ship's boat
> in Chichagoff Island, sent his ship's boat
> =======^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> Admiral Chicagof, Minister
> Admiral Chichagof, Minister
> ============^^^^^^^^^^^^^^

> and Chicagof." (A Voyage Round the World, Lisianski, p. 235.)]
> and Chichagof." (A Voyage Round the World, Lisianski, p. 235.)]
> ========^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> of Golden California. Captain Hagemeister came to relieve him,
> of Golden California. Captain Hagmeister came to relieve him,
> =================================^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> of St. Michael
> of St. Michaels
> ==============

> stir the lust for vengeance. The Keeksitties,
> stir the lust for vengeance. The Keeksittis,
> ==========================================^^

> [Footnote 19: Globokoe Lake was sounded to
> [Footnote 19: Golobokoe Lake was sounded to
> ===============^^^^^^^^^^^^^^^^^^^^^^^^^^^

> Prince Dmitri Maksoutoff, Dec. 2, 1863, to Oct. 18, 1867.]
> Prince Dmitri Maksoutof, Dec. 2, 1863, to Oct. 18, 1867.]
> =======================^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> Annahootz, the friendly Kokwanton war chief,
> Annahootz, the friendly Kokwantan war chief,
> ===============================^============

for these two, we'd have to research which variant is correct:

...in one place, we have one spelling...
> the mouth of the Indian River, or _Kolosh Ryeku_.
...but in another place it's spelled differently...
> the river, known as the _Kolosh Ryeka_, by the Russians

...in one place, we have one spelling...
> Captain Leontius Andreanovich Hagemeister
...but in another place it's spelled differently...
> Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24, 1818.

...getting now down to some smaller points...

...for consistency, and clarity, i'd change these lines...
> Total, 400, Ib. p. 52.
> their own account. Id. Vol. 2, p. 38.
...i would use "ibid" in both, replacing the "ib." and "id."...
...but i wouldn't call a "failure" to do this "an error"...

...for consistency, i would also change this reference...
> Russian American Archives. Corr. Vol. I, p. 275.
...so that it matched with this reference...
> Russian American Archives, Correspondence, Vol. II, No. 108.
...but i wouldn't count a "failure" to do so as "an error"...

...and, to finish off this consistency section...
...i'd do research to find out how these archives are named...
> Russian American Archives. Corr. Vol. I, p. 275.
> their own account. Id. Vol. 2, p. 38.
...because in the top line, it's "vol. i", roman-style...
...while in the bottom line, it's "vol. 2", arabic-style...
...obviously, one of these versions is incorrect...

...but so much for the trivial obsessive-compulsive points...

11. ...and finally, the last 2 errors...

_barabaras_,
barabaras,
^^^^^^^^^^

gooch-heat
gooch-haet
========^^

From cannona at fireantproductions.com Sat Apr 10 20:45:39 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat, 10 Apr 2010 22:45:39 -0500
Subject: [gutvol-d] Project Gutenberg DVD Release Candidate now available for testing
Message-ID:

Hi all.

Finally, I have a release candidate of the new Gutenberg DVD available for testing. Currently, it is only available via BitTorrent. Please download and test, and let me know if you find any bugs. At this point, I'm not looking for suggestions on which new titles to add, but any other feedback is welcome and appreciated.

I would not yet recommend distribution of this DVD image for any purpose other than testing.

Please do not seed this torrent after April 30, because by then, we should have an official version available.

This image is an .iso for a dvd-9, so you will either need a way to open or mount .iso files, or a dual layer DVD burner.

You can download the torrent via the following magnet link:
magnet:?xt=urn:btih:JB4PMPIXNTYMFCUZABAYYTX56JWQINPI

or if your client doesn't speak magnet links, you can download the torrent file from:
http://www.fireantproductions.com/pgdvd201004-rc1.torrent

Thanks in advance for all the feedback.

Aaron Cannon

From cannona at fireantproductions.com Sun Apr 11 06:14:09 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun, 11 Apr 2010 08:14:09 -0500
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now available for testing
In-Reply-To:
References:
Message-ID:

There was a problem with the first torrent I uploaded. It did not work in some older clients. I have fixed it, so if you downloaded the torrent, you may wish to redownload it.

Thanks.

Aaron

On 4/10/10, Aaron Cannon wrote:
> Hi all.
>
> Finally, I have a release candidate of the new Gutenberg DVD available
> for testing. Currently, it is only available via BitTorrent. Please
> download and test, and let me know if you find any bugs.
At this > point, I'm not looking for suggestions on which new titles to add, but > any other feedback is welcome and appreciated. > > I would not yet recommend distribution of this DVD image for any > purpose other than testing. > > Please do not seed this torrent after April 30, because by then, we > should have an official version available. > > This image is an .iso for a dvd-9, so you will either need a way to > open or mount .iso files, or a dual layer DVD burner. > > You can download the torrent via the following magnet link: > magnet:?xt=urn:btih:JB4PMPIXNTYMFCUZABAYYTX56JWQINPI > > or if your client doesn't speak magnet links, you can download the > torrent file from: > http://www.fireantproductions.com/pgdvd201004-rc1.torrent > > Thanks in advance for all the feedback. > > Aaron Cannon > From gbnewby at pglaf.org Sun Apr 11 23:42:13 2010 From: gbnewby at pglaf.org (Greg Newby) Date: Sun, 11 Apr 2010 23:42:13 -0700 Subject: [gutvol-d] Re: Newby/Hart at Illinois symposium April 15-16 In-Reply-To: <20100316021403.GA26102@pglaf.org> References: <20100316021403.GA26102@pglaf.org> Message-ID: <20100412064213.GA11084@pglaf.org> We just heard this will be webcast. I do not know whether recordings will be made available later. http://go.illinois.edu/50years The schedule is at the conference site. Michael and I are scheduled for 1:30-3pm (CDT) Thursday April 15. http://50years.lis.illinois.edu/ -- Greg On Mon, Mar 15, 2010 at 07:14:03PM -0700, Greg Newby wrote: > For those in the region, this might be of interest: > http://50years.lis.illinois.edu/ > > PGLAF CEO Greg Newby will join PG founder Michael Hart > at a symposium on the U. Illinois campus. Registration > is free but limited. The panel with Michael & Greg is > scheduled for Thursday April 15 from 1:30-3pm. 
> > -- Greg From Bowerbird at aol.com Mon Apr 12 01:20:25 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Apr 2010 04:20:25 EDT Subject: [gutvol-d] more notes from the merry-go-round Message-ID: <6576c.204495ee.38f431c9@aol.com> gosh i love a merry-go-round... :+( *** here's a few notes, repeated too many times already, but whatchagonnado? if this doesn't make sense to you, that's probably because it wasn't intended for you, so pay no mind. *** any heavy bracket markup like [i]italics[/i] is obtrusive, and that's why proofers complain about it, justifiably... use light markup, like _italics_, and they won't complain. better yet, have them do the proofing with an .html field -- using a text-area field only when editing is required -- and you can use _real_ italics (and highlight it with color). you can also use _color_ to flag your questionable words, which makes the flagging a whole lot harder to "miss"... *** spacey double-quotes are _easy_ to solve if you pay heed to the _paragraphs_. ergo... pay heed to the paragraphs! yes, paragraphs cross page-boundaries... but so what? you start with the first file, and keep track of paragraphs while you proceed to the second, and the third, and so on. there's a ton of redundancy in the quotes -- open, close, open, close, open, close. make use of that redundancy... it's not rocket-science; it's not even difficult programming. get over the mind-blockage you have on this topic. *** and once you _do_ get over that mind-blockage, you just might see that books that use _single-quotes_ for dialog aren't really all that different. "but", you are sputtering, "yes they are, because contractions cause big difficulties!" poppycock. here's a file with a list of contractions (among other stuff): > http://z-m-l.com/customdictionary.txt that file's been up since june of 2007. use that list intelligently to control those contractions... 
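As a rough illustration of the paragraph-based pairing of double-quotes described above (a sketch only, not bowerbird's actual code): within one paragraph the quotes are assumed to alternate open, close, open, close, so a "spacey" quote can be reattached by its position in that sequence. The function name `fix_spacey_quotes` is hypothetical, and the paragraph is assumed to have been reassembled across page boundaries already.

```python
def fix_spacey_quotes(paragraph):
    """Reattach floating double-quotes using open/close alternation.

    Returns (fixed_text, balanced). balanced is False when the paragraph
    holds an odd number of quotes, e.g. speech that continues into the
    next paragraph, and probably deserves a human look.
    """
    pieces = paragraph.split('"')
    balanced = (len(pieces) - 1) % 2 == 0
    fixed = pieces[0]
    for index, piece in enumerate(pieces[1:]):
        if index % 2 == 0:
            # opening quote: it should hug the word that follows it
            if piece[:1].isspace():
                piece = piece.lstrip()
                if fixed and not fixed[-1].isspace():
                    fixed += " "
            fixed += '"' + piece
        else:
            # closing quote: it should hug the word that precedes it
            if fixed[-1:].isspace():
                fixed = fixed.rstrip()
                if piece and not piece[0].isspace():
                    piece = " " + piece
            fixed += '"' + piece
    return fixed, balanced


if __name__ == "__main__":
    print(fix_spacey_quotes('he said, " hello there, " and left.'))
```

Quotes that are already attached correctly pass through unchanged; only the ones with whitespace on the wrong side are moved.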
possessives also use the single-quote, but they're easy to deal with as well; just do a little thinking about them. don't give up so easily. _try._ you'll find it's not as hard as you thought. and if you actually honestly _try_, and hit a wall anyway, show me your actual honest efforts, and i will help you... *** "probable markup on this page"? are you purposely trying to be vague? how about listing the italicized words, _specifically_? geez! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Mon Apr 12 14:26:55 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 12 Apr 2010 17:26:55 EDT Subject: [gutvol-d] spell-check functionality and the sitka book Message-ID: <1f6e.3591e780.38f4ea1f@aol.com> in discussing how to bring about a spell-check functionality, i talked about the "bad-words" list as a part of that workflow. i posted the "bad-words" list from the o.c.r. for the sitka book. > http://z-m-l.com/go/sitka/sitka-reversedictionary.txt i talked about how the proofing process can be envisioned as movement from the "bad-words" list to the "good-words" list. now that rfrank has posted the "final" version at p.g., we can do the same "dictionary-check" procedure on his finished book: > http://z-m-l.com/go/sitka/sitka-reverseposted.txt you'll see i have introduced blank lines into these files, so as to coordinate their lines so they can be merged into one file. this will allow us to see how each word traversed the process. the merged file is here: > http://z-m-l.com/go/sitka/sitka-reverse-review.html *** go ahead and take a quick look at the words... one thing you might notice is that a number of the words are marked with asterisks. those are the ones that are suspicious, in that they appear to be _variants_ of each other, which might be a good indication of either (1) an o.c.r. misrecognition, or (2) an inconsistency in the p-book which should be corrected... 
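One way the asterisks might be placed automatically, sketched under assumptions: sort the book's "bad-words" and compare each word only with its neighbor in the sorted list, flagging pairs that are nearly identical. The name `flag_variants` and the 0.75 threshold are illustrative choices, and the similarity measure is Python's standard `difflib.SequenceMatcher`, not whatever was used to build the posted files.

```python
from difflib import SequenceMatcher


def flag_variants(words, threshold=0.75):
    """Return [(word, is_suspicious)] in sorted order.

    A word is suspicious when it is very similar to the word sorted
    immediately before or after it, which is how variants such as
    Hagemeister / Hagmeister end up side by side in a sorted list.
    """
    ordered = sorted(set(words), key=str.lower)
    flagged = set()
    for left, right in zip(ordered, ordered[1:]):
        similarity = SequenceMatcher(None, left.lower(), right.lower()).ratio()
        if similarity >= threshold:
            flagged.update((left, right))
    return [(word, word in flagged) for word in ordered]


if __name__ == "__main__":
    sample = ["Hagmeister", "Golovin", "Globokoe", "Katle",
              "Golofnin", "Hagemeister", "Golobokoe"]
    for word, suspicious in flag_variants(sample):
        print(word + ("**************" if suspicious else ""))
```

A flag only means "go check the scan": as with any similarity test, some flagged pairs will turn out to be two legitimately different words.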
(some people use levenshtein edit-distance to find these words. that's nice, but a plain old review of a sorted list works well too.) i checked those variants against the p-book, and many of them were indeed errors. (the ones marked "ok" at the end were right.) that's how i found many of these consistency errors rfrank missed. oh, by the way, there is still at least one more of those errors that i didn't mention earlier... so if anyone wants to go looking for it... ok, so let's go on to look at the list in other ways... *** for each individual word, we're gonna see how it was handled... the first group are some words that had garbage characters. those words didn't have any direct equivalent in the final file, at least none that were close enough to be sorted similarly... once we get into the lowercase words, we get some matching. and that continues as we get into the words with an initial cap, and on into the compound-words, and then into the numbers... focus on the initial-cap words -- which are primarily names -- and you'll see that the vast majority of these were recognized correctly, in that they persisted through to the final version... about 85% of the initial-cap words were recognized correctly. the same is true of the compound-words, with few exceptions. and most numbers seem to have been recognized correctly too. but lowercase words are more of a mixed bag. some were right, it's true, but a relatively high percentage of them were incorrect, as evidenced by the fact they were changed, one way or another. only about half the lowercase words were recognized correctly. you might remember that i had predicted precisely this pattern. most lowercase o.c.r. words that are in the "bad-words" list are generally misrecognitions, while most initial-cap words are not. indeed, the percentage of correct lowercase words in this o.c.r. 
was much higher than is normal, because this was not typical o.c.r., in the sense that the scans were clean, but also because rfrank probably did some preprocessing on the raw o.c.r. text, which we can tell because it had very few garbage characters...

so what we see is that roughly 75% of these words were _correct_, despite the fact they were "bad-words" (i.e., not in the dictionary). they weren't in the dictionary, but they were "good" in this book...

that so many "bad-words" can be correct and unique to the book is why it's important to use a "custom" book-specific dictionary.

only about 25% of the "bad-words" were actually misrecognitions.

to the extent that you can narrow down your _flagging_ of the "bad-words" to the ones that are _really_ bad, you can relieve your proofers of a _lot_ of unnecessary flags, which is _good_, because false flags sap the attention of proofers unnecessarily.

once you've done this analysis a number of times, like i have, you'll come to recognize that it is a very important analysis...

-bowerbird

p.s. hey, dkretz, thanks for the shoutout! but one correction!

i wasn't "baited" into "crossing a line" that got me banned at d.p. i never get "baited" into _anything_. i always know what i'm doing.

and i've been banned from enough places that i know how it works, so it wasn't that i "crossed a line". again, i know what i'm doing...

no, if i get banned from somewhere, it's something i _anticipated_, and after a consideration of that outcome, decided it didn't matter.

which is _not_ to say that i _like_ to get banned, or that i _try_ to, but is _rather_ to say that i won't allow myself to be banned _unless_ i have decided that i don't really care whether i'm banned or not...

as for "crossing a line", there's no need for it. even though people will generally say that i broke some technical rule and that is why i was banned, the truth of the matter is that one only gets banned if one pisses off the person with the power to push the ban button.
it has nothing at all to do with "the rules"... it's just raw emotion...

oh, and let me say one more thing; it's _nice_ that d.p. still lets me come to their forums and read them. if they tried to prevent that, i _could_ get around it, but it's a hassle. so i thank them for that...

From gbnewby at pglaf.org Tue Apr 13 00:32:53 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue, 13 Apr 2010 00:32:53 -0700
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now available for testing
In-Reply-To:
References:
Message-ID: <20100413073253.GA1262@pglaf.org>

On Sat, Apr 10, 2010 at 10:45:39PM -0500, Aaron Cannon wrote:
> Hi all.
>
> Finally, I have a release candidate of the new Gutenberg DVD available
> for testing. Currently, it is only available via BitTorrent. Please
> download and test, and let me know if you find any bugs. At this
> point, I'm not looking for suggestions on which new titles to add, but
> any other feedback is welcome and appreciated.

Thanks, Aaron. This is beautifully done - congratulations!

Has a link checker been run on this? I quickly found a missing file via file:///PGDVD_2010_04_RC1/etext/3002.html and wonder whether some filetypes or other content might not have made it.

I am really impressed at the number of titles!

I have temporarily put the ISO here:
http://pglaf.org/PGDVD201004-RC1/

...but don't try to download via HTTP unless you have a fast 'net connection. Use the .torrent instead.

And you can browse the disc contents here (fast connection not needed):
http://pglaf.org/PGDVD201004-RC1/content/

Thanks again! I will confirm this burns onto a DL DVD.

-- Greg

> I would not yet recommend distribution of this DVD image for any
> purpose other than testing.
>
> Please do not seed this torrent after April 30, because by then, we
> should have an official version available.
>
> This image is an .iso for a dvd-9, so you will either need a way to
> open or mount .iso files, or a dual layer DVD burner.
>
> You can download the torrent via the following magnet link:
> magnet:?xt=urn:btih:JB4PMPIXNTYMFCUZABAYYTX56JWQINPI
>
> or if your client doesn't speak magnet links, you can download the
> torrent file from:
> http://www.fireantproductions.com/pgdvd201004-rc1.torrent
>
> Thanks in advance for all the feedback.
>
> Aaron Cannon

From greg at durendal.org Tue Apr 13 06:04:31 2010
From: greg at durendal.org (Greg Weeks)
Date: Tue, 13 Apr 2010 09:04:31 -0400 (EDT)
Subject: [gutvol-d] [SPAM] PGDP down?
Message-ID:

Is PGDP down, or is something with just my link?

--
Greg Weeks
http://durendal.org:8080/greg/

From sankarrukku at gmail.com Tue Apr 13 07:34:57 2010
From: sankarrukku at gmail.com (Sankar Viswanathan)
Date: Tue, 13 Apr 2010 20:04:57 +0530
Subject: [gutvol-d] {Disarmed} Re: [SPAM] PGDP down?
In-Reply-To:
References:
Message-ID:

Yes. It is down.

Sankar

On Tue, Apr 13, 2010 at 6:34 PM, Greg Weeks wrote:
>
> Is PGDP down, or is something with just my link?
>
> --
> Greg Weeks
> http://durendal.org:8080/greg/

--
Sankar
Service to Humanity is Service to God

From dakretz at gmail.com Tue Apr 13 07:36:45 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 13 Apr 2010 07:36:45 -0700
Subject: [gutvol-d] {Disarmed} Re: [SPAM] PGDP down?
In-Reply-To:
References:
Message-ID:

It is down.

On Tue, Apr 13, 2010 at 6:04 AM, Greg Weeks wrote:
>
> Is PGDP down, or is something with just my link?
>
> --
> Greg Weeks
> http://durendal.org:8080/greg/

From donovan at abs.net Tue Apr 13 10:05:53 2010
From: donovan at abs.net (D Garcia)
Date: Tue, 13 Apr 2010 12:05:53 -0500
Subject: [gutvol-d] Re: PGDP down?
In-Reply-To:
References:
Message-ID: <201004131305.54424.donovan@abs.net>

> Is PGDP down, or is something with just my link?

At o'dark-thirty EDT, pgdp.net suffered a kernel panic during backup and hard-locked instead of self-rebooting as it is configured to normally do.

The hosting company has power-cycled the machine for us, and I am monitoring remotely while integrity checking runs on our filesystems. The system will continue to be unavailable for several hours while this and other validations take place.

David (donovan)

From Bowerbird at aol.com Tue Apr 13 13:28:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 13 Apr 2010 16:28:51 EDT
Subject: [gutvol-d] so let's talk about my collaborative proofreading site, part 2
Message-ID: <2ab43.7bb18f9.38f62e03@aol.com>

here's more info on my collaborative proofreading site...

***

to see what we're talking about, you can visit this u.r.l.:
> http://z-m-l.com/go/sitka/editr.pl

***

we talked about 4 main topics last week:
> navigating the pages...
> certifying a page as clean...
> searching the book for a string...
> feel the power with the "command" field...

under the 4th topic -- the command field -- we discussed 3 of the commands you can issue:
> showmap...
> concat...
> showcustom...

today we'll discuss a few more commands...

***

blubberbaby...

you'll remember that i also discussed how you can implement spellcheck functionality in your workflow.
one key to this is creating a custom dictionary for each specific book that you are digitizing, one that contains the words unique to that particular book...

at first, you'll have a "bad-words" list, which contains low-frequency words not found in a regular dictionary.

the other list is the "good-words" list, which contains high-frequency words plus those in a regular dictionary.

the process of correcting the book is one of _moving_ items on the "bad-words" list to the "good-words" list, either by certifying o.c.r. did recognize them correctly, or by correcting the misrecognition to what it should be. (or, in the case of an error by the publisher, correcting it.)

what is handy, for this process, is knowing which pages have words that are contained on the "bad-words" list.

you _could_ navigate through each of the pages, to see which ones have flagged words, which are shown in red.

but why not have the computer just tell us what they are?

voila the next command, christened "blubberbaby", to honor alaska, for this sitka book.

enter "blubberbaby" in the search-field and click "find", and in a little while -- it's not yet optimized, so it's about 20 seconds -- you will be shown a page that includes all of the pages that have words which are still on the "bad-words" list...

from that display-page, you can use the links there to open a number of these pages -- each in its own tab -- and work on them to deal with all of the flagged words.

questionable words should be handled in preprocessing, for the most part, so if the workflow is designed correctly, you won't need to use this "blubberbaby" command often. but it's useful to have it, so you can do the check if desired.

and if questionable words were not fixed in preprocessing, then you'll find "blubberbaby" to be even more important.

***

pairsearch...

you'll remember when i was discussing _inconsistencies_ in the sitka book that i used the "bad-words" list to find possible problems.
specifically, when two variants of a word (usually a name) came up sorted next to each other, it was easy to spot 'em and tell that they needed checking.

here are a few of them, so you can see what i mean...

> Globokoe**************
> Golobokoe**************
> ...
> Golofnin**************
> Golovin**************
> ...
> Hagemeister**************
> Hagmeister**************

it's pretty obvious that these _might_ be inconsistencies...

not all of them are. for instance, "golofnin" and "golovin" were -- apparently -- the names of two different people. but the others were errors made by the original printer, errors that coulda been caught (i caught 'em) and fixed.

what you have to do, though, to check these pairs out, is go to the actual pages where they appear, and read the text, so as to determine the correct course of action.

now, with the search capability, it's fairly easy to do that. you just enter each term, and then click on the links to open up the pages where that term appears. fairly easy.

but that can get a bit tiresome if you have a lot to check.

so i programmed this "pairsearch" command to help out.

you enter the command "pairsearch", followed by pairs of terms that you want to search for, and the program presents the relevant pages to help you make a decision.

so, for instance, for the three pairs above, you'd enter:
> pairsearch hagemeister hagmeister golofnin golovin globokoe golobokoe

the search-terms can be separated by spaces or line-ends.

the output from that search is appended to this message. the lines are long, and will likely wrap, so it's also here:
> http://z-m-l.com/go/sitka/pairsearch-output.html

the pagenames aren't linked now, but eventually will be.

this "pairsearch" command can be extremely useful in resolving inconsistencies within the book, both those introduced by o.c.r. and those by the original publisher.

one more note...
remember that publishers back in the old days didn't have the wonderful tools that we now have at our disposal, so it's no wonder that they had some problems when it came to words like "globokoe" and "golobokoe", or russian names. i'm sure if i had to use the primitive tools they had back then, i'd be making 3 times as many errors as they made, or more... *** end-page-hyphenates d.p. has proofers mark end-page-hyphenates with an asterisk. i'm not sure why they feel that's necessary. the computer can find end-page-hyphenates just fine. here's a routine to do it. put the command "end-page-hyphenates" in the search-field, and then click "find", and you'll get a list of where they occur. the list has links for both pages, containing both fragments... for this book, you'll get this: > sitkap002.txt ... and ... sitkap003.txt > sitkap007.txt ... and ... sitkap008.txt > sitkap018.txt ... and ... sitkap019.txt > sitkap019.txt ... and ... sitkap020.txt > sitkap021.txt ... and ... sitkap022.txt > sitkap027.txt ... and ... sitkap028.txt > sitkap043.txt ... and ... sitkap044.txt > sitkap051.txt ... and ... sitkap052.txt > sitkap077.txt ... and ... sitkap078.txt > sitkap079.txt ... and ... sitkap080.txt > sitkap083.txt ... and ... sitkap084.txt > sitkap087.txt ... and ... sitkap088.txt > sitkap102.txt ... and ... sitkap103.txt those pagenames are clickable, and take you to that page... there isn't a lot of reason you need to check those fragments, since the computer will also rejoin 'em if you unwrap the text. but if i didn't include this functionality, you know _someone_ would say "yeah, but your system doesn't do _this_, does it?" so now i can say, "well yes, as a matter of fact, it _does_..." *** so, we've added "blubberbaby" and "pairsearch" commands, as well as "end-page-hyphenates"; that's enough for today. by now, you should have a pretty good feel on how we will continue to implement functionalities as they are needed... 
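For readers who want to see the idea behind "end-page-hyphenates" in code, here is a minimal sketch. It is not the routine running on the site; the function name, the list-of-tuples input, and the assumption that the last token of a page's text is body text (no trailing footnote or page number) are all illustrative.

```python
def end_page_hyphenates(pages):
    """Find words split by a hyphen across a page boundary.

    pages: ordered list of (page_name, page_text) tuples.
    Returns a list of (page_name, next_page_name, first_half, second_half).
    """
    hits = []
    for (name, text), (next_name, next_text) in zip(pages, pages[1:]):
        words = text.split()
        next_words = next_text.split()
        if not words or not next_words:
            continue  # blank page on either side of the boundary
        last = words[-1]
        # a single trailing hyphen marks a split word; "--" is a dash
        if last.endswith("-") and not last.endswith("--"):
            hits.append((name, next_name, last, next_words[0]))
    return hits


if __name__ == "__main__":
    demo = [
        ("page002.txt", "the inlet at Ozer-"),
        ("page003.txt", "skoe Redoubt and the lake"),
        ("page004.txt", "a page that ends normally."),
    ]
    for name, next_name, first, second in end_page_hyphenates(demo):
        print(name, "... and ...", next_name, ":", first, second)
```

Whether the two fragments should be rejoined with or without the hyphen is a separate decision, which is one reason to list both halves.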
we'll discuss more stuff as i get it put into place... -bowerbird p.s. here's the output from the "pairsearch" command above: > .....here it is, in order of appearance in the book: > > globokoe ... sitkap002.txt ... the inlet at Ozerskoe Redoubt and Globokoe (Deep) Lake; the island-studded > hagemeister ... sitkap006.txt ... ngland English Francisco Georgeson Hagemeister Jamestown Kashavaroffs Katle > hagemeister ... sitkap009.txt ... g instructions previously given to Hagemeister, instructing him to find the > golofnin ... sitkap032.txt ... r the command of Captain Vasili M. Golofnin, who was widely known for his a > golofnin ... sitkap034.txt ... stant, nor one doctor's pupil.'?? Golofnin soon left Sitka to return to St > hagmeister ... sitkap042.txt ... ills of Golden California. Captain Hagmeister came to re- lieve him, and in > golofnin ... sitkap045.txt ... to trade with the Kolosh [45-1] Golofnin, Voyage of the Sloop "Kamchatka > golofnin ... sitkap060.txt ... ccording to the account of Captain Golofnin, it was an establishment well b > golovin ... sitkap072.txt ... erica, by Captain-Lieutenant P. N. Golovin, pp. 72-73. [[72]] > globokoe ... sitkap072.txt ... other at the Ozer- skoe Redoubt on Globokoef[72-2] (Deep) Lake, ground the > golobokoe ... sitkap072.txt ... f the present improvement. [72-2] Golobokoe Lake was sounded to a depth cf > hagemeister ... sitkap075.txt ... nuary 11, 1818. Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24, > globokoe ... sitkap105.txt ... mountainside. The Redoubt and the Globokoe Lake.-- Southwest from Sitka ab > globokoe ... sitkap106.txt ... re in the rocky wall which divided Globokoe, or Deep Lake, from the sea, an > > > .....and sorted, by search-term: > > globokoe ... sitkap002.txt ... the inlet at Ozerskoe Redoubt and Globokoe (Deep) Lake; the island-studded > globokoe ... sitkap072.txt ... other at the Ozer- skoe Redoubt on Globokoef[72-2] (Deep) Lake, ground the > globokoe ... sitkap105.txt ... 
mountainside. The Redoubt and the Globokoe Lake.-- Southwest from Sitka ab > globokoe ... sitkap106.txt ... re in the rocky wall which divided Globokoe, or Deep Lake, from the sea, an > > golobokoe ... sitkap072.txt ... f the present improvement. [72-2] Golobokoe Lake was sounded to a depth cf > > golofnin ... sitkap032.txt ... r the command of Captain Vasili M. Golofnin, who was widely known for his a > golofnin ... sitkap034.txt ... stant, nor one doctor's pupil.'?? Golofnin soon left Sitka to return to St > golofnin ... sitkap045.txt ... to trade with the Kolosh [45-1] Golofnin, Voyage of the Sloop "Kamchatka > golofnin ... sitkap060.txt ... ccording to the account of Captain Golofnin, it was an establishment well b > > golovin ... sitkap072.txt ... erica, by Captain-Lieutenant P. N. Golovin, pp. 72-73. [[72]] > > hagemeister ... sitkap006.txt ... ngland English Francisco Georgeson Hagemeister Jamestown Kashavaroffs Katle > hagemeister ... sitkap009.txt ... g instructions previously given to Hagemeister, instructing him to find the > hagemeister ... sitkap075.txt ... nuary 11, 1818. Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24, > > hagmeister ... sitkap042.txt ... ills of Golden California. Captain Hagmeister came to re- lieve him, and in > > > .....and sorted again, this time in the order in which they were entered: > > hagemeister ... sitkap006.txt ... ngland English Francisco Georgeson Hagemeister Jamestown Kashavaroffs Katle > hagemeister ... sitkap009.txt ... g instructions previously given to Hagemeister, instructing him to find the > hagemeister ... sitkap075.txt ... nuary 11, 1818. Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24, > > hagmeister ... sitkap042.txt ... ills of Golden California. Captain Hagmeister came to re- lieve him, and in > > > golofnin ... sitkap032.txt ... r the command of Captain Vasili M. Golofnin, who was widely known for his a > golofnin ... sitkap034.txt ... stant, nor one doctor's pupil.'?? 
Golofnin soon left Sitka to return to St > golofnin ... sitkap045.txt ... to trade with the Kolosh [45-1] Golofnin, Voyage of the Sloop "Kamchatka > golofnin ... sitkap060.txt ... ccording to the account of Captain Golofnin, it was an establishment well b > > golovin ... sitkap072.txt ... erica, by Captain-Lieutenant P. N. Golovin, pp. 72-73. [[72]] > > > globokoe ... sitkap002.txt ... the inlet at Ozerskoe Redoubt and Globokoe (Deep) Lake; the island-studded > globokoe ... sitkap072.txt ... other at the Ozer- skoe Redoubt on Globokoef[72-2] (Deep) Lake, ground the > globokoe ... sitkap105.txt ... mountainside. The Redoubt and the Globokoe Lake.-- Southwest from Sitka ab > globokoe ... sitkap106.txt ... re in the rocky wall which divided Globokoe, or Deep Lake, from the sea, an > > golobokoe ... sitkap072.txt ... f the present improvement. [72-2] Golobokoe Lake was sounded to a depth cf > > > --30-- -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at durendal.org Wed Apr 14 04:44:17 2010 From: greg at durendal.org (Greg Weeks) Date: Wed, 14 Apr 2010 07:44:17 -0400 (EDT) Subject: [gutvol-d] [SPAM] Re: slightly off topic, first post, scanning In-Reply-To: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> Message-ID: On Sat, 13 Feb 2010, Sparr wrote: > Remnants of glue left on the edge of a page (from the removed spine) > get stuck to the inside of the scanner or feeder, ruining the scans of > subsequent pages until the scanner is cleaned. I use a knife to cut the binding off rather than try to separate the pages. A plough knife is actually made for this. I've had pretty good results with a standard construction razor knife. -- Greg Weeks http://durendal.org:8080/greg/ From hart at pglaf.org Wed Apr 14 06:17:35 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 14 Apr 2010 06:17:35 -0700 (PDT) Subject: [gutvol-d] !@! 
#17135 Twas The Night Before Christmas Message-ID: ***** This file should be named 17135-h.htm or 17135-h.zip ***** This and all associated files of various formats will be found in: http://www.gutenberg.org/1/7/1/3/17135/ Produced by Janet Blenkinship, Suzanne Shell and the Online Distributed Proofreading Team at http://www.pgdp.net Where the first characters of stanza were "illuminated" they were eliminated and never replaced in the htm. version. From richfield at telkomsa.net Wed Apr 14 07:42:47 2010 From: richfield at telkomsa.net (Jon Richfield) Date: Wed, 14 Apr 2010 16:42:47 +0200 Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning In-Reply-To: References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> Message-ID: <4BC5D467.4000407@telkomsa.net> FWIW, Since I bought myself a digital camera for general use, plus a copy of Omniscan, my scanner has been pretty well idle. As it happens, the camera is a 12 megapixel model, but for most purposes I find it better to set it to 8 MP or even less. Also, for most books I set the mode to black and white. It is best for mass input either to get a tripod as well, or to buy some sort of cheap plastic stand and mutilate it into a camera stand. I have been using a kindergarten table into the top of which I cut a camera-shaped hole with a hobby knife and due caution. Avoid buying a cheerfully coloured stand, because if you happen to need colour shots it can seriously affect the picture. The best is translucent white or grey, or possibly transparent. Grey or black are not too bad if illumination is no problem. Then it is just a matter of setting manual focus and clicking away till done. The table is very light and firm and I have had no problems with unsteadiness. Obviously one chooses a suitable surface to work on, so that glue and similar pollutants are not a consideration. There are of course umpteen variations on the theme. You might prefer stands and clips to hold the objects erect.
You might buy a second-hand camera economically, but do make sure that it will take a suitable memory module, the larger the better. SD cards are very good, especially if you have a USB reader attachment. I got one pretty cheap. The main regret is that I didn't get a mains adapter to power the camera while one was still available. As it stands I simply use rechargeable NiMH batteries of the right size. Remember: the power burden is much heavier than in most other photographic activities. There are some definite advantages over the scanner, even though modern scanners are remarkably good. Fewer moving parts for one. (Once you have the camera set up, it is only the button and the shutter that move!) Unless you have a scanner with an automatic feed, the speed is better too, plus, there are few books that you need mutilate to photograph them. Another luxury, though I have not in practice needed it, is that the camera can be set to various degrees of resolution. For most purposes very modest resolution is far more than adequate, but if you should need more than you can get from a single shot, then set it up to take only part of a page at a time, and you can magnify your material till the limiting factor is not the camera, but the quality of the printing. Is my choice unusual in any way? Jon On 2010/04/14 13:44 PM, Greg Weeks wrote: > On Sat, 13 Feb 2010, Sparr wrote: > >> Remnants of glue left on the edge of a page (from the removed spine) >> get stuck to the inside of the scanner or feeder, ruining the scans of >> subsequent pages until the scanner is cleaned. > > I use a knife to cut the binding off rather than try to separate the > pages. A plough knife is actually made for this. I've had pretty good > results with a standard construction razor knife.
> From ajhaines at shaw.ca Wed Apr 14 09:12:55 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Wed, 14 Apr 2010 09:12:55 -0700 Subject: [gutvol-d] Re: PDF-files References: <7527194b0912050812s17b817f5i81e0398905f15c68@mail.gmail.com> Message-ID: <1EA29AAF01F04633AE18B692A7AD561D@alp2400> Fernando, PDF files are welcome, as long as they're part of a complete submission package. At a minimum, a submission must include a plain text file (either ASCII, ISO/Latin1, or UTF8, as required by the book's language). If the source book has illustrations or other graphical content, you can prepare an HTML file. If you wish to also submit a PDF file, it should be generated from either your text or HTML file, not simply downloaded from, for example, Internet Archive (http://www.archive.org/details/americana), and included with the other submission files. More information can be found in PG's various FAQ's at http://www.gutenberg.org/wiki/Category:FAQ Al Haines Project Gutenberg ----- Original Message ----- From: Fernando Maia Jr. To: Project Gutenberg Volunteer Discussion Sent: Saturday, December 05, 2009 9:12 AM Subject: [gutvol-d] PDF-files [Sorry, I forgot to change the subject.] Hello, volunteers! I'm new here and I have a doubt. Would it be interesting for PG if it would have more PDF-files? I've searched for this information everywhere and I haven't found an answer yet, so that I decided to ask about it here. Sorry for possible mistakes (English isn't my native language). Thanks in advance, Fernando ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d at lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ajhaines at shaw.ca Wed Apr 14 09:26:24 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Wed, 14 Apr 2010 09:26:24 -0700 Subject: [gutvol-d] Re: Question about Scanned Books References: <1266262684-sup-4204@zion> Message-ID: Michael, as long as the book can be proven to have been published before 1923, it's in the public domain in the U.S., and is eligible for addition to the Project Gutenberg collection. (For the past few years, Archive.org and Google Books have been the source for most PG submissions, as produced by Distributed Proofreaders and other submitters.) A paper copy is not necessary. For most books in Internet Archive, there's an "All Files: HTTP" link in the "View the book" box, at the left. That gives access to all formats in which the book is available--GIF, PDF, TIF, etc. Before beginning work on any book, you should check that it's not already in Project Gutenberg, and not being worked on by someone else by checking David Price's In-progress list at http://www.dprice48.freeserve.co.uk/GutIP.html. If you haven't prepared an ebook for submission to PG, you should read its various FAQ's at http://www.gutenberg.org/wiki/Category:FAQ. Section 7 of the Volunteers' FAQ is especially important. Al Haines Project Gutenberg ----- Original Message ----- From: "Michael McDermott" To: "gutvol-d" Sent: Monday, February 15, 2010 12:38 PM Subject: [gutvol-d] Question about Scanned Books > Archive.org has many DJVU files of books that have lapsed into the public > domain. Would it comply with PG's guidelines to take one of these volumes > (the one I was thinking of has a copyright date of 1915 and can be found > at http://www.archive.org/details/worksmartinluth00spaegoog)? 
> > The important elements here are: > > 1) I do not have a copy of the paper edition > 2) This is a scan of a work that, by all appearances, qualifies having > been published in the US before 1923 > 3) Was digitized by Google > > Would this work or would its ancestry cause problems? > -- > Michael McDermott > www.mad-computer-scientist.com > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From ajhaines at shaw.ca Wed Apr 14 09:58:35 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Wed, 14 Apr 2010 09:58:35 -0700 Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas References: Message-ID: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> Actually, it's fairly common practice that if a paragraph/verse starts with some kind of graphical/illuminated character, the actual character it stands for is not included in the HTML version. ----- Original Message ----- From: "Michael S. Hart" To: "The gutvol-d Mailing List" Sent: Wednesday, April 14, 2010 6:17 AM Subject: [gutvol-d] !@! #17135 Twas The Night Before Christmas > > ***** This file should be named 17135-h.htm or 17135-h.zip ***** > This and all associated files of various formats will be found in: > http://www.gutenberg.org/1/7/1/3/17135/ > > Produced by Janet Blenkinship, Suzanne Shell and the Online > Distributed Proofreading Team at http://www.pgdp.net > > > Where the first characters of stanza were "illuminated" > they were eliminated and never replaced in the htm. version. > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From marcello at perathoner.de Wed Apr 14 11:09:07 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed, 14 Apr 2010 20:09:07 +0200 Subject: [gutvol-d] Re: !@! 
#17135 Twas The Night Before Christmas In-Reply-To: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> Message-ID: <4BC604C3.2000804@perathoner.de> Al Haines (shaw) wrote: > Actually, it's fairly common practice that if a paragraph/verse starts > with some kind of graphical/illuminated character, the actual character > it stands for is not included in the HTML version. And that makes the HTML pretty useless for further processing like conversion to mobile formats. It should be made a requirement that the stream of non-markup-characters be identical in all versions of an ebook: lynx --dump should produce a text that wdiffs equal with the text version. -- Marcello Perathoner webmaster at gutenberg.org From jhowse at nf.sympatico.ca Wed Apr 14 11:20:45 2010 From: jhowse at nf.sympatico.ca (Jeannie Howse) Date: Wed, 14 Apr 2010 15:50:45 -0230 Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: <4BC604C3.2000804@perathoner.de> References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> <4BC604C3.2000804@perathoner.de> Message-ID: <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca> At 03:39 PM 14/04/2010, you wrote: >Al Haines (shaw) wrote: > >>Actually, it's fairly common practice that if a paragraph/verse >>starts with some kind of graphical/illuminated character, the >>actual character it stands for is not included in the HTML version. > >And that makes the HTML pretty useless for further processing like >conversion to mobile formats. > >It should be made a requirement that the stream of >non-markup-characters be identical in all versions of an ebook: > > lynx --dump > >should produce a text that wdiffs equal with the text version. and it does. stripping this:

 The children were nestled all snug in their beds,
While visions of sugar-plums danced in their heads;
And mamma in her kerchief, and I in my cap,
Had just settled our brains for a long winter's nap,

gives you this: The children were nestled all snug in their beds, While visions of sugar-plums danced in their heads; And mamma in her kerchief, and I in my cap, Had just settled our brains for a long winter's nap, JHowse ================================================================================ "Turning a Picture into a thousand words"Preserving History One Page at a Time!! Celebrating more than 17,350 books posted to Project Gutenberg! Join Project Gutenberg's Distributed Proofreaders http://www.pgdp.net/c/ ================================================================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From azkar0 at gmail.com Wed Apr 14 11:25:06 2010 From: azkar0 at gmail.com (Scott Olson) Date: Wed, 14 Apr 2010 12:25:06 -0600 Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca> References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> <4BC604C3.2000804@perathoner.de> <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca> Message-ID: On Wed, Apr 14, 2010 at 12:20 PM, Jeannie Howse wrote: > > and it does. > Except where some unfortunately placed white space gives you stuff like: A mid the many celebrations.. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at durendal.org Wed Apr 14 13:17:16 2010 From: greg at durendal.org (Greg Weeks) Date: Wed, 14 Apr 2010 16:17:16 -0400 (EDT) Subject: [gutvol-d] [SPAM] Re: Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: <4BC604C3.2000804@perathoner.de> References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> <4BC604C3.2000804@perathoner.de> Message-ID: On Wed, 14 Apr 2010, Marcello Perathoner wrote: > And that makes the HTML pretty useless for further processing like conversion > to mobile formats. I've bitched about this before at DP and it got me nowhere. 
I didn't really jump up and down either though. -- Greg Weeks http://durendal.org:8080/greg/ From hart at pglaf.org Wed Apr 14 14:10:19 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 14 Apr 2010 14:10:19 -0700 (PDT) Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca> References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> <4BC604C3.2000804@perathoner.de> <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca> Message-ID: If the between-stanza illustrations are so easily included, then why not the illuminated characters? 6 of one, half a dozen of the other. . .eh? Me. . .I would include both the plain ascii letter AND the graphic letter. "Fairly common practice" = "UNfairly common practice". . . . . On Wed, 14 Apr 2010, Jeannie Howse wrote: > At 03:39 PM 14/04/2010, you wrote: > Al Haines (shaw) wrote: > > Actually, it's fairly common practice that if a > paragraph/verse starts with some kind of > graphical/illuminated character, the actual > character it stands for is not included in the > HTML version. > > > And that makes the HTML pretty useless for further processing > like conversion to mobile formats. > > It should be made a requirement that the stream of > non-markup-characters be identical in all versions of an > ebook: > > lynx --dump > > should produce a text that wdiffs equal with the text version. > > > and it does. stripping this: > >

> The children were nestled all snug in their beds,
> While visions of sugar-plums danced in their heads;
> And mamma in her kerchief, and I in my cap,
> Had just settled our brains for a long winter's nap,

> > gives you this: > > The children were nestled all snug in their beds, > While visions of sugar-plums danced in their heads; > And mamma in her kerchief, and I in my cap, > Had just settled our brains for a long winter's nap, > > JHowse > > > ========================================================================== > ====== > "Turning a Picture into a thousand words"Preserving History One Page at a > Time!! > Celebrating more than 17,350 books posted to Project Gutenberg! > Join Project Gutenberg's Distributed Proofreaders http://www.pgdp.net/c/ > ========================================================================== > ====== > > > > From dakretz at gmail.com Wed Apr 14 14:58:21 2010 From: dakretz at gmail.com (don kretz) Date: Wed, 14 Apr 2010 14:58:21 -0700 Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> <4BC604C3.2000804@perathoner.de> <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca> Message-ID: PG texts seem to be distributed to readers by a number of different channels. In a sense, PG has become the dominant wholesaler with a number of retailers. And they also provide direct distribution. Source texts are provided to PG by DP (with trivial exceptions) in two formats: plain text and HTML. But PG and other mediators distribute ebooks in a variety of different formats; and given the variety of devices, readers are requiring a number of other formats. This will if anything be increasingly true. But all these ebook formats must somehow be derived, through one or more transformation processes, from one or the other of the two originals. Here are my naive, uninformed perceptions of the trends of what's happening among four different segments: Untransformed plain-text, transformed plain-text, untransformed HTML, and transformed HTML. 1. 
The number of readers who read ebooks using the original plain-text versions, distributed directly or indirectly, are a significant but declining proportion of the whole. 2. The number of readers who read ebooks using the original HTML versions, distributed directly or indirectly, are a significant proportion of the whole, not declining as rapidly, but still declining (because they require a real browser and a large-enough screen to read them with any level of fidelity.) 3. Some proportion of readers are reading ebooks derived from plain-text versions but transformed using some kind of software to infer formatting. I suspect this proportion is declining as well, but it's hard to do and the readers are increasingly expecting more from ebooks from their increasingly sophisticated devices. 4. So that leaves the rest, who are reading ebooks derived from the original HTML versions. My suspicion is that the majority of ebooks are already provided this way, and (especially with the increasing acceptance of de jure and de facto sub-html standards,) this will only increase. How accurate is this assessment? Based on the distribution among the quartiles, should PG and DP make any changes in the way ebooks are prepared and supplied? From Bowerbird at aol.com Wed Apr 14 15:42:20 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 14 Apr 2010 18:42:20 EDT Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas Message-ID: <30c96.47708041.38f79ecc@aol.com> dakretz said: > How accurate is this assessment? it's half-assed accurate. mostly because it's looking at the p.g. corpus from the standpoint of its two major file-types. but we'll want to look at it from the perspective of the users who will access it, and how... (the answer to that is mobile, mobile, and mobile.) moreover, some of your points, even as given, are wrong... > 1. original plain-text.
significant but declining strongly.

no. significant and _increasing._ (a rising tide lifts all boats.)

> 2. original .html files. significant but declining slightly.

no. significant and _increasing._ (same tide, different boat.)

> 3. plain-text derivatives. declining.

dead wrong. this is the segment that is increasing fastest. most of the places that make derivatives use the plain-text.

> 4. .html derivatives. significant and increasingly so.

well, not quite "dead-wrong", but still wrong nonetheless. the .html files have far too little consistency to be used in a systematic creation of derivatives, not without glitches...

some places use the .html file, but then "fall back" to the plain-text version if they see problems with the derivative. but most can't spend that much energy on quality-control, so they've resigned themselves to using the plain-text files.

which is not that big of a sacrifice, to be perfectly honest... indeed, the system giving the most consistently best results is the iphone viewer-app "eucalyptus", which utilizes _only_ the plain-text files; his converter is giving very good output.

and, to help get people's heads on, and completely straight, it's good to do the reminder that many of the .html files are the result of a straight-out conversion of the plain-text file. and these files, because they're machine-generated, _are_ consistent enough to be used in a systematic conversion...

it's the "hand-crafted" ones that cause all of the problems, which is something that i first pointed out many years ago. when problems with the auto-generated .html files do occur, it's usually due to an underlying glitch in the plain-text file.

so auto-conversion of plain-text is the best way to proceed. and i've maintained for 7 years now that such a conversion is not just _possible_, but our best course of action to follow...

for several years after i started, i left my argument unproven, just to see who would jump at the bait and try to dispute it...
after destroying all that opposition, i have since proven that it is indeed possible to use a plain-text file as your "master". why y'all continue to ignore this proof, i simply do not know. but i'll keep making the case, until all of you can see it clearly. -bowerbird From cannona at fireantproductions.com Wed Apr 14 16:46:43 2010 From: cannona at fireantproductions.com (Aaron Cannon) Date: Wed, 14 Apr 2010 18:46:43 -0500 Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now available for testing In-Reply-To: References: <20100413073253.GA1262@pglaf.org> <4BC5241A.2040409@teksavvy.com> Message-ID: Hi all. Thanks to all those who downloaded and sent me feedback. I figured out what was causing the broken links. Turns out that when I was creating an ISO file, all the "-"s in filenames were being changed to "_"s, so as to meet the ISO9660 standard, which doesn't permit filenames to contain "-". So, 3533-h.zip was being changed to 3533_h.zip. So, I am having to change the HTML to deal with this. I should have Release Candidate 2 ready for upload tomorrow. I'll keep everyone posted. Thanks. Aaron On 4/13/10, Aaron Cannon wrote: > It looks like the HTML was changed somehow from my local copy. For > instance, my local copy has no content directory. All of the numbered > directories are in the root, though come to think of it, it probably > would have been better to create a content directory. Nevertheless, I > believe that it must have been altered to serve the books from a > content directory when it was placed on the web server. So, to answer > your concern, the link works on the real ISO. > > Also, I don't think I want to mess with the books we've already got on > there. It won't really make much difference in size, and I don't want > to add books at this point, so I guess I just don't see any compelling > reason to mess with it. Thanks for the suggestions though!
> > Thanks a lot for looking. I really appreciate the feedback! > > If you find anything else, please let me know. Also, Maybe Greg can > comment on the 404 error. > > Thanks. > > Aaron > > > > On 4/13/10, Gardner Buchanan wrote: >> Hi Aaron, >> >> Similarly, I have found that this link does not work: >> >> http://pglaf.org/PGDVD201004-RC1/content/3/5/3/3533/3533-h.zip >> >> followed from here: >> http://pglaf.org/PGDVD201004-RC1/content/etext/3533.html >> >> If you have to muck with things, I also noticed that the following >> section includes the separate volume 1 and 2 of _the Attache_ as >> well as the omnibus. Likely only the two volumes of the omnibus >> are really needed. >> >> Canada -- Social life and customs -- Fiction >> >> * Sunshine Sketches of a Little Town (English) By Leacock, Stephen, >> 1869-1944 >> * The Attaché; or, Sam Slick in England -- Complete (English) By >> Haliburton, Thomas Chandler, 1796-1865 >> * The Attaché; or, Sam Slick in England -- Volume 01 (English) By >> Haliburton, Thomas Chandler, 1796-1865 >> * The Attaché; or, Sam Slick in England -- Volume 02 (English) By >> Haliburton, Thomas Chandler, 1796-1865 >> >> >> On 13-Apr-2010 03:32, Greg Newby wrote: >>> >>> Has a link checker been run on this? I quickly found a missing >>> file via file:///PGDVD_2010_04_RC1/etext/3002.html >>> and wonder whether some filetypes or other content might not >>> have made it. >> >> ============================================================ >> Gardner Buchanan >> Ottawa, ON FreeBSD: Where you want to go. Today. >> > From gbuchana at teksavvy.com Wed Apr 14 16:57:44 2010 From: gbuchana at teksavvy.com (Gardner Buchanan) Date: Wed, 14 Apr 2010 19:57:44 -0400 Subject: [gutvol-d] Re: Question about Scanned Books In-Reply-To: References: <1266262684-sup-4204@zion> Message-ID: <4BC65678.7030105@teksavvy.com> This part of Al's advice should indeed be taken seriously for those contemplating a solo project.
I have twice been caught out having worked on a project for which I have had a valid prior clearance only to find that someone has done a duplicate project. I am deeply grateful to David for the work he does on the in-progress list, but I find it difficult to use and in any event one depends on others using it effectively too, and you rely on them to make some effort to follow up with the prior clearance holder as well. My experience is that this is not a reliable string of assumptions. I have been advocating off and on for a more accurate and up to date in-progress mechanism that would be driven from the core information that forms the PG clearance database. The process I apply when deciding to do a project is: (1) Search for the work online in the "usual" places: - king kong http://www.kingkong.demon.co.uk/ngcoba/ngcoba.htm - online books page http://onlinebooks.library.upenn.edu/new.html - PG's catalogue Search using the author and title. Find the author and title on a large catalogue like the LOC or AMICUS and also search for variant names and such. (2) Go to the PG-DP, DP-Europe and DP-Canada sites and search the forums for any talk about my proposed project. (3) Look in David's list. (4) Obtain a PG clearance. If a book I am working on is something that might be relevant to PG Canada, I *also* obtain a clearance from them. In the case of parallel clearances I inform both parties that this is going on. (5) Look *again* in David's list to see that my project appears. (6) Start work on my project. This may seem somewhat paranoid, but I don't intend *again* to be caught-out this way. On 14-Apr-2010 12:26, Al Haines (shaw) wrote: > Before beginning work on any book, you should check that it's not > already in Project Gutenberg, and not being worked on by someone else by > checking David Price's In-progress list at > http://www.dprice48.freeserve.co.uk/GutIP.html. 
============================================================
Gardner Buchanan
Ottawa, ON                      FreeBSD: Where you want to go. Today.

From gbuchana at teksavvy.com Wed Apr 14 17:38:27 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Wed, 14 Apr 2010 20:38:27 -0400
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <30c96.47708041.38f79ecc@aol.com>
References: <30c96.47708041.38f79ecc@aol.com>
Message-ID: <4BC66003.80201@teksavvy.com>

While I agree more with BB's conjecture than with Don's, I have seen
no real statistical evidence on either side.

My own experience, which is very old now, is of encountering
titles in Palm-compatible formats that had manifestly been derived
mechanically from the PG plain-text versions. This is just
an anecdotal point, but it matches BB's "eucalyptus" data
point.

This doesn't seem too hard to research though. For grins I
rummaged around for e-book versions of something I am familiar
with. I found two separate conversions of _Sunshine Sketches_
by Leacock. Despite the existence of a nice HTML version by
David Widger, both the PDF and HTML versions I found were based
on the PG text version, using the text version of the TOC and
having the double-hyphen version of em-dashes.

So there's two more random data points in BB's column.

On 14-Apr-2010 18:42, Bowerbird at aol.com wrote:
> dakretz said:
> > How accurate is this assessment?
>
> it's half-assed accurate.
>

============================================================
Gardner Buchanan
Ottawa, ON                      FreeBSD: Where you want to go. Today.

From greg at durendal.org Wed Apr 14 17:48:50 2010
From: greg at durendal.org (Greg Weeks)
Date: Wed, 14 Apr 2010 20:48:50 -0400 (EDT)
Subject: [gutvol-d] [SPAM] Re: Re: !@!
#17135 Twas The Night Before Christmas In-Reply-To: <4BC66003.80201@teksavvy.com> References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com> Message-ID: On Wed, 14 Apr 2010, Gardner Buchanan wrote: > While I agree more with BB's conjecture, than Don's I have seen > no real statistical evidence on either side. > > My own experience, which is very old now, is of encountering > titles in Palm-compatible formats that had manifestly been derived > mechanically from the PG plain-text versions. This is just > an anecdotal point, but it matches BB's "eucalyptus" data > point. Another couple of anecdotal points. There are two paper publishers I've worked with a bit. Not recently, but a couple of years ago. Both had scripts to take the plain text and allow them to typeset in a couple of hours. They didn't use the html ever because it threw too many exceptions that required hand input to resolve, and therefore took a lot longer to get typeset. The proofread after to make sure nothing got messed up took longer. -- Greg Weeks http://durendal.org:8080/greg/ From dakretz at gmail.com Wed Apr 14 17:53:34 2010 From: dakretz at gmail.com (don kretz) Date: Wed, 14 Apr 2010 17:53:34 -0700 Subject: [gutvol-d] {Disarmed} Re: [SPAM] Re: Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com> Message-ID: Does anyone know of any epublisher other than PG that *does* distribute the html we provide? Don On Wed, Apr 14, 2010 at 5:48 PM, Greg Weeks wrote: > On Wed, 14 Apr 2010, Gardner Buchanan wrote: > > While I agree more with BB's conjecture, than Don's I have seen >> no real statistical evidence on either side. >> >> My own experience, which is very old now, is of encountering >> titles in Palm-compatible formats that had manifestly been derived >> mechanically from the PG plain-text versions. This is just >> an anecdotal point, but it matches BB's "eucalyptus" data >> point. 
>> > > Another couple of anecdotal points. There are two paper publishers I've > worked with a bit. Not recently, but a couple of years ago. Both had scripts > to take the plain text and allow them to typeset in a couple of hours. They > didn't use the html ever because it threw too many exceptions that required > hand input to resolve, and therefore took a lot longer to get typeset. The > proofread after to make sure nothing got messed up took longer. > > > -- > Greg Weeks > http://durendal.org:8080/greg/ > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gbuchana at teksavvy.com Wed Apr 14 18:07:12 2010 From: gbuchana at teksavvy.com (Gardner Buchanan) Date: Wed, 14 Apr 2010 21:07:12 -0400 Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning In-Reply-To: <4BC5D467.4000407@telkomsa.net> References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> <4BC5D467.4000407@telkomsa.net> Message-ID: <4BC666C0.9040300@teksavvy.com> Hi Jon, Nope, I think you're part of a pretty popular movement there. There's a whole cottage industry of building home-made book scanners that consist of a jig to hold the book and a pair of digital cameras positioned to capture the two facing pages. Look at http://www.diybookscanner.org/ Personally, I still use a flatbed, but that's because I'm a Luddite. On 14-Apr-2010 10:42, Jon Richfield wrote: > FWIW, Since I bought myself a digital camera for general use, plus a > copy of Omniscan, my scanner has been pretty well idle. [...] > > Is my choice unusual in any way? > ============================================================ Gardner Buchanan Ottawa, ON FreeBSD: Where you want to go. Today. 
From dakretz at gmail.com Wed Apr 14 18:09:39 2010 From: dakretz at gmail.com (don kretz) Date: Wed, 14 Apr 2010 18:09:39 -0700 Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning In-Reply-To: <4BC666C0.9040300@teksavvy.com> References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> <4BC5D467.4000407@telkomsa.net> <4BC666C0.9040300@teksavvy.com> Message-ID: I bought a crappy digital camera to use "most of the time". It didn't even come with a manual - just a url to download it. But it did have instructions on how to scan a book. Don On Wed, Apr 14, 2010 at 6:07 PM, Gardner Buchanan wrote: > Hi Jon, > > Nope, I think you're part of a pretty popular movement there. > There's a whole cottage industry of building home-made > book scanners that consist of a jig to hold the book and > a pair of digital cameras positioned to capture the two > facing pages. Look at http://www.diybookscanner.org/ > > Personally, I still use a flatbed, but that's because > I'm a Luddite. > > > On 14-Apr-2010 10:42, Jon Richfield wrote: > >> FWIW, Since I bought myself a digital camera for general use, plus a >> copy of Omniscan, my scanner has been pretty well idle. >> > [...] > > >> Is my choice unusual in any way? >> >> > > ============================================================ > Gardner Buchanan > Ottawa, ON FreeBSD: Where you want to go. Today. > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From sparr0 at gmail.com Wed Apr 14 18:25:31 2010
From: sparr0 at gmail.com (Sparr)
Date: Wed, 14 Apr 2010 21:25:31 -0400
Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning
In-Reply-To: <4BC666C0.9040300@teksavvy.com>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> <4BC5D467.4000407@telkomsa.net> <4BC666C0.9040300@teksavvy.com>
Message-ID:

I have a few hundred thousand pages to scan, so a diy camera-style
book scanner isn't appropriate, nor is a flatbed scanner. Thanks for
the ideas, though.

On Wed, Apr 14, 2010 at 10:42 AM, Jon Richfield wrote:
> FWIW, Since I bought myself a digital camera for general use, plus a copy of
> Omniscan, my scanner has been pretty well idle.

On Wed, Apr 14, 2010 at 9:07 PM, Gardner Buchanan wrote:
> Nope, I think you're part of a pretty popular movement there.
> There's a whole cottage industry of building home-made
> book scanners that consist of a jig to hold the book and
> a pair of digital cameras positioned to capture the two
> facing pages. Look at http://www.diybookscanner.org/
>
> Personally, I still use a flatbed, but that's because
> I'm a Luddite.

From Bowerbird at aol.com Wed Apr 14 20:22:48 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 14 Apr 2010 23:22:48 EDT
Subject: [gutvol-d] [SPAM] re: !@! #17135 Twas The Night Before Christmas
Message-ID: <41239.6b00c426.38f7e088@aol.com>

gardner said:
> So there's two more random data points in BB's column.

thanks, gardner.

but there's no need to be "random". the big sites using
p.g. e-books are very easy to find, because they are...
um... big.

the first, and foremost, of course, was "black mask".
david moynihan had the scripts down to a science,
and even offered them to p.g. at one point in time,
an offer that was declined, for some stupid reason
that was not _just_ stupid, but asininely ridiculous.
(and no, i don't even know what that reason was.)

next up is "manybooks".
matthew's converters are not nearly as good as david's,
but he seems to have many loyal users who pick up their
books from him, probably because he's always been
very good about supporting the widest possible array
of machinery...

both of these providers have been using books from p.g.
going back for many years, so the fact that they were using
the plain-text versions might be due to their history...
that is, they might just be in a rut...

but the newest big provider on the block nowadays is
"feedbooks". and as far as i can tell, hadrien also uses
the plain-text version as his "starter" version... makes
sense, since he stores the text in a database, so it does
no good to have somebody else's markup.

the number of files being downloaded from these sites these days
-- thanks to the kindle/iphone/ipad trio -- is downright _stunning_.
hadrien at feedbooks says he's running at about 75,000 downloads
every _day_, with 2.5 million in march. several individual books
have over 10k downloads, some 20k, and one 30k...

***

dakretz said:
> Does anyone know of any epublisher other than PG
> that *does* distribute the html we provide?

well, in a way. apple grabs the .epubs from here.

if i'm not mistaken, marcello uses the .html file,
if one exists. but if not, he uses the plain-text...

i'm quite sure the .epub files are full of ugliness.

and mike cook and his "epubbooks" site might use
the .html file as his "starter", you'd have to ask him.
he hasn't done books "en masse", though, and thus
hasn't had to face the problems from inconsistencies.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From richfield at telkomsa.net Thu Apr 15 01:17:15 2010
From: richfield at telkomsa.net (Jon Richfield)
Date: Thu, 15 Apr 2010 10:17:15 +0200
Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning
In-Reply-To:
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> <4BC5D467.4000407@telkomsa.net> <4BC666C0.9040300@teksavvy.com>
Message-ID: <4BC6CB8B.1090005@telkomsa.net>

Thanks Don,

I suppose I could have done the research to find that out myself, but it
never occurred to me to do so. What you say shows that the concept is by
now very routine. Hardly surprising; in university libraries and even
surreptitiously in bookshops, energetic anti-Luddites are busy snapping
away at books and articles.

Gardner,

Thanks for the URL. Many of the devices it illustrates certainly are
impressive, but all demand more elaborate mechanisms than I have
hitherto so much as considered. I am not saying that this disqualifies
them from reasonable consideration, and certainly if I am working in
poor light it is necessary to scrounge a reading light, but so far I
have managed acceptably with a hand-held camera for a few pages at a
time for corrections or bits of newsprint etc, and the mutilated
polystyrene table for a stand. I have occasionally used a tripod, which
can be more suitable for some purposes. Another trick that is useful for
more portable requirements is to use a stick such as a walking stick as a
means of steadying the camera, a sort of monopodal tripod. It enables
one to keep the camera steady enough for most purposes, plus controlling
the distance well enough to use manual focus, which conserves battery
power, gives more consistent results and increases speed.

If I had to do it again, I probably would have chosen something longer;
my little table is only 40 cm high, and 60-80 cm would give less
distortion and more even focus.
It is no problem for little paperbacks, but large pages are not so good, so I have to work out something to raise the level. I haven't done much scanning lately, but soon I may consider carving up an inverted dustbin or something if I can't find a higher small table. One thing I have not yet found is anything that I can use as a non-reflective overlay to flatten the pages without degrading the image or causing reflections. Picture glass doesn't work, unless there is a new grade that I don't know of. Something I have not yet got round to obtaining or jury-rigging, is one of those nice little cable-attached plungers, or better, a foot pedal for taking the snaps. After a few hundred pages, groping for the button is a nuisance. Sparr, >I have a few hundred thousand pages to scan, so a diy camera-style book scanner isn't appropriate, nor is a flatbed scanner. Thanks for the ideas, though.< You are welcome. I did wonder. Then I assume that you are using a mechanical feed scanner. If so it is simply a matter of guillotining or otherwise amputating the gluey bits. Someone once said something like: "If you can neither avoid it nor fix it, don't worry about it; it isn't a problem; it's reality." Or as they said long ago, "What can't be cured must be endured." Now, I don't know your circumstances, so everything I say is highly context sensitive, and please don't bite me if I tell you obviosities that have nothing to do with your needs and constraints (not to mention tastes, as Gardner instanced.) BUT if the material cannot reasonably be chopped or automatically handled, then it might be time to reconsider. How many pages per second . . . AVERAGE, INCLUDING dealing with jams and messes . . . does your automated glue-hating system read clean? If you cannot comfortably produce properly readable, OCRable pages at better than one per second, then you had better think of a few hundred thousand seconds. 
One or a half per second is in any case what a camera with a system like mine could give you, once you are up to speed. I have occasionally torn glued pages apart for photographic work, but for me that was no problem, so guillotining and trimming did not come into it. At eight hours per day, you should be able to capture more than 100000 pages per 5-day week. It certainly is not nice, but it beats a "faster" system that does not work, or at least does not work faster. Just thoughts, together with the thought: "Sooner you than me!" ;-) Go well folks, Jon On 2010/04/15 03:09 AM, don kretz wrote: > I bought a crappy digital camera to use > "most of the time". It didn't even come with > a manual - just a url to download it. > > But it did have instructions on how to > scan a book. > > Don > > On Wed, Apr 14, 2010 at 6:07 PM, Gardner Buchanan > > wrote: > > Hi Jon, > > Nope, I think you're part of a pretty popular movement there. > There's a whole cottage industry of building home-made > book scanners that consist of a jig to hold the book and > a pair of digital cameras positioned to capture the two > facing pages. Look at http://www.diybookscanner.org/ > > Personally, I still use a flatbed, but that's because > I'm a Luddite. > > > On 14-Apr-2010 10:42, Jon Richfield wrote: > > FWIW, Since I bought myself a digital camera for general use, > plus a > copy of Omniscan, my scanner has been pretty well idle. > > [...] > > > Is my choice unusual in any way? > > > > ============================================================ > Gardner Buchanan > > Ottawa, ON FreeBSD: Where you want to go. Today.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mrcdh58 at gmail.com Thu Apr 15 02:03:47 2010
From: mrcdh58 at gmail.com (Marc D'Hooghe)
Date: Thu, 15 Apr 2010 11:03:47 +0200
Subject: [gutvol-d] freeliterature.org
Message-ID:

Hi there,

A couple of weeks ago I started a new site in support of PG:
http://www.freeliterature.org. Two goals: to spread information about
free e-books and literature on the web (extensive link list), and to
offer the possibility of helping to produce e-texts by proofreading.
You can download the scans of a book of your choice, and the text to
proof is sent on demand.

Enjoy.

Marc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ricardofdiogo at gmail.com Thu Apr 15 06:29:01 2010
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Thu, 15 Apr 2010 14:29:01 +0100
Subject: [gutvol-d] Re: PDF-files
In-Reply-To: <1EA29AAF01F04633AE18B692A7AD561D@alp2400>
References: <7527194b0912050812s17b817f5i81e0398905f15c68@mail.gmail.com> <1EA29AAF01F04633AE18B692A7AD561D@alp2400>
Message-ID:

Or else http://www.gutenberg.org/wiki/Category:PT_PergFreq

Ricardo

From Bowerbird at aol.com Thu Apr 15 16:09:22 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 15 Apr 2010 19:09:22 EDT
Subject: [gutvol-d] html/css expert advice sought
Message-ID: <33ee1.72d80d3e.38f8f6a2@aol.com>

here's an example of the code i'm using for multiple columns:
> http://z-m-l.com/go/2-column-good-xml.html
> http://z-m-l.com/go/3-column-good-xml.html

if anyone has advice on how to improve that code, i'm all ears...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cannona at fireantproductions.com Thu Apr 15 16:43:14 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu, 15 Apr 2010 18:43:14 -0500
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now available for testing
In-Reply-To:
References: <20100413073253.GA1262@pglaf.org> <4BC5241A.2040409@teksavvy.com>
Message-ID:

Hi all.

I fixed the reported bugs and the second release candidate is now
available for download. If you already downloaded the first DVD, you
can download a binary delta which can be used to patch the old one.
You will need xdelta3 for this. On Ubuntu you can just do:

    sudo apt-get install xdelta3

Then use the following commands. First, change to the directory where
the .iso is located, then do:

    wget http://www.fireantproductions.com/delta.bin
    xdelta3 -d -s pgdvd201004-rc1.iso delta.bin pgdvd042010-rc2.iso
    rm delta.bin

and optionally

    rm pgdvd201004-rc1.iso

If you haven't downloaded the previous version, or if the delta patch
doesn't work, you can find the new torrent at
http://www.fireantproductions.com/pgdvd042010-rc2.torrent

The md5sum for the new ISO is
B1EA6C8C15BB2EE84126227017F56310

Thanks.

Aaron

On 4/14/10, Aaron Cannon wrote:
> Hi all.
>
> Thanks to all those who downloaded and sent me feedback.
>
> I figured out what was causing the broken links. Turns out that when
> I was creating an ISO file, all the "-"s in filenames were being
> changed to "_"s, so as to meet the ISO9660 standard, which doesn't
> permit filenames to contain "-". So, 3533-h.zip was being changed to
> 3533_h.zip. So, I am having to change the HTML to deal with this.
>
> I should have Release Candidate 2 ready for upload tomorrow. I'll
> keep everyone posted.
>
> Thanks.
>
> Aaron
>
> On 4/13/10, Aaron Cannon wrote:
>> It looks like the HTML was changed somehow from my local copy. For
>> instance, my local copy has no content directory.
All of the numbered
>> directories are in the root, though come to think of it, it probably
>> would have been better to create a content directory. Nevertheless, I
>> believe that it must have been altered to serve the books from a
>> content directory when it was placed on the web server. So, to answer
>> your concern, the link works on the real ISO.
>>
>> Also, I don't think I want to mess with the books we've already got on
>> there. It won't really make much difference in size, and I don't want
>> to add books at this point, so I guess I just don't see any compelling
>> reason to mess with it. Thanks for the suggestions though!
>>
>> Thanks a lot for looking. I really appreciate the feedback!
>>
>> If you find anything else, please let me know. Also, Maybe Greg can
>> comment on the 404 error.
>>
>> Thanks.
>>
>> Aaron
>>
>>
>>
>> On 4/13/10, Gardner Buchanan wrote:
>>> Hi Aaron,
>>>
>>> Similarly, I have found that this link does not work:
>>>
>>> http://pglaf.org/PGDVD201004-RC1/content/3/5/3/3533/3533-h.zip
>>>
>>> followed from here:
>>> http://pglaf.org/PGDVD201004-RC1/content/etext/3533.html
>>>
>>> If you have to muck with things, I also noticed that the following
>>> section includes the separate volume 1 and 2 of _the Attache_ as
>>> well as the omnibus. Likely only the two volumes of the omnibus
>>> are really needed.
>>>
>>> Canada -- Social life and customs -- Fiction
>>>
>>> * Sunshine Sketches of a Little Town (English) By Leacock, Stephen,
>>> 1869-1944
>>> * The Attaché; or, Sam Slick in England - Complete (English) By
>>> Haliburton, Thomas Chandler, 1796-1865
>>> * The Attaché; or, Sam Slick in England - Volume 01 (English) By
>>> Haliburton, Thomas Chandler, 1796-1865
>>> * The Attaché; or, Sam Slick in England - Volume 02 (English) By
>>> Haliburton, Thomas Chandler, 1796-1865
>>>
>>>
>>> On 13-Apr-2010 03:32, Greg Newby wrote:
>>>>
>>>> Has a link checker been run on this?
I quickly found a missing >>>> file via file:///PGDVD_2010_04_RC1/etext/3002.html >>>> and wonder whether some filetypes or other content might not >>>> have made it. >>> >>> ============================================================ >>> Gardner Buchanan >>> Ottawa, ON FreeBSD: Where you want to go. Today. >>> >> > From prosfilaes at gmail.com Thu Apr 15 20:09:19 2010 From: prosfilaes at gmail.com (David Starner) Date: Thu, 15 Apr 2010 23:09:19 -0400 Subject: [gutvol-d] Re: !@!!@!!@!Re: Re: so what is so important about pagination? In-Reply-To: <1266938880-sup-4545@zion> References: <1b8ef.3a619508.38b47a10@aol.com> <1266938880-sup-4545@zion> Message-ID: On Tue, Feb 23, 2010 at 2:22 PM, Michael McDermott wrote: > Readers do > not care about the original pages. There have been many editions of Twain or > Shakespeare. But a lot of books aren't Twain or Shakespeare. Most non-fiction books are littered with page numbers that probably should be converted to hyperlinks, but that's a lot of work. And non-fiction books reference page numbers in other books. -- Kie ekzistas vivo, ekzistas espero. From Bowerbird at aol.com Fri Apr 16 11:58:11 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 16 Apr 2010 14:58:11 EDT Subject: [gutvol-d] so let's talk about my collaborative proofreading site, part 3 Message-ID: <261a.60807b81.38fa0d43@aol.com> here's more info on my collaborative proofreading site... *** to see what we're talking about, you can visit this u.r.l.: > http://z-m-l.com/go/sitka/editr.pl *** we've talked about 4 main topics: > navigating the pages... > certifying a page as clean... > searching the book for a string... > feel the power with the "command" field... under the 4th topic -- the command field -- we've discussed these commands you can issue: > showmap... > concat... > showcustom... > blubberbaby... > pairsearch... > end-page-hyphenates... today we'll discuss a few more commands... > copyfootnotes... > movefootnotes... 
> show-end-line-hyphenates... *** copyfootnotes... some e-book formats want the footnotes collected together into their own section (a la "endnotes")... to accomplish this, enter "copyfootnotes" into the search-field, and then click the "find" button... all of the footnotes will be presented on a screen, so you can copy them en masse. this command leaves the footnotes unmolested on their pages... *** movefootnotes... movefootnotes is another command that does the same as "copyfootnotes", except "movefootnotes" also deletes each footnote from its original page... i'll note that neither of these commands should be used until proofing has been completely finished. until that time, you want to leave the footnote in the one place where it can be most easily proofed, which is right there on that page, next to the scan. for the moment, i have disabled this command... once i've programmed the "mass revert" ability, to reverse any sabotage effort, i will reinstate it. *** show-end-line-hyphenates... you'll probably recall that i encourage people to _retain_ original linebreaks from the paper-book, expressly including all the end-line-hyphenates... this makes it much easier to do proofing, as even distributed proofreaders and project gutenberg acknowledge when it comes to _them_ proofing. (so why they rewrap their text before giving it to other people is a bit disingenuous; but i digress.) at any rate, one slight problem with this approach is that the hyphenated fragments often do _not_ pass spellcheck, and thus are unnecessarily flagged. for instance, you might have the first part of a frag- ment on the top line, and the second on the bottom, and neither "frag-" nor "ment" will pass spellcheck. this command helps you solve that little problem. "show-end-line-hyphenates" will list all of them, as you might expect, but it does a little bit more. first, it tests if the rejoined form passes spellcheck. 
if so, then it gives you both fragments, so that you can include them in the book's custom dictionary... this command also surveys the full book to see how many times the rejoined form appears in it -- with hyphen, without it, and as two words -- and informs you of the counts, which is good info. i restored all of the end-line-hyphenates on many pages within the "sitka" book, and you can observe the output from "show-end-line-hyphenates" here: > http://z-m-l.com/go/sitka/hyphenates-output.html *** while we're on the subject of end-line-hyphenates, i should briefly address one of the thorny matters... i've always maintained that users should be able to unwrap the text themselves, any time they wanted. indeed, i've said we should give them tools to do it. even more than _that,_ i've _provided_ such a tool: > http://z-m-l.com/go/unwrap.pl in most cases, an end-line-hyphenate is _easy_ to resolve. you eliminate the dash and then bring up the first string from the next line and concatenate it. simple enough. the glitch happens when it was a _compound_word_ -- i.e., a word that includes a dash in it _normally._ in word-processing parlance, this is known as the difference between a "hard" and a "soft" hyphen... so, in order to indicate to the unwrap routine that any particular dash at the end of a line is a "hard" hyphen, to be retained, we need to give it some kind of marker. i've decided -- tentative to testing for problems -- this marker will be the "~" character, after the dash. you can see cases in the sitka book where this happens: > http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap007 > http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap019 > http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap093 > http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094 > http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094 > http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap107 the lines from those 6 cases are listed here, respectively: > sions in America. 
The sails of ships from far-~ > off Kronstadt on the Baltic brought Russian > during the winter the hunters took 40 sea-~ > lions, and in the spring many seals were > of ancient Venice. The picturesque, dark-~ > skinned Thlingit women sit at the doors of > Russian fur warehouse. Next is the three-~ > story building used for courthouse and jail, > and later of the U.S. Marines from the Man-~ > of-War which was stationed here. East of > sea. Eastward crest after crest of glacier-~ > capped peaks rise for a hundred miles, so when these are unwrapped, the words "far-off" and "sea-lion" and "dark-skinned" and "three-story" and "man-of-war" and "glacier-capped" will now be rendered as they should be -- as compound words... *** based on my long observation, i'd say dehyphenation is one of the most _inelegant_ aspects of the d.p. system... first of all, it causes unnecessary work for the proofers, because it's more difficult to proof when the linebreaks have been disturbed in any way. even though the effect is relatively small when it's just on end-line-hyphenates, it still cumulates. (and the dictum against "unclothed" em-dashes at line-ends adds to this cumulative effect.) this shifting of original linebreaks causes line-lengths to become uneven, introducing a variety of problems in that some routines that _could_ be written to help process the text depend on line-lengths, and thus are sabotaged when we change the line-lengths arbitrarily. second, dehyphenation itself is work, because proofers (who do not have access to any book-wide information) have to make a judgment about whether the hyphen is to be retained or not, which is fraught with ambiguity... this leads to diffs, which chew even more proofer time. indeed, in the "perpetual" projects, we saw cases where one proofer would take out a hyphen, and another one would put it back with an asterisk (meaning "check it"). and then the third proofer would take out the asterisk! 
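
[Editorial aside, not part of the original post.] The unwrap rule described earlier in this message (a plain end-line dash is a "soft" hyphen and is dropped, while a dash followed by "~" is a "hard" hyphen and is kept) can be sketched roughly as follows. This is only an illustration under stated assumptions and is not the code behind unwrap.pl: the function name `unwrap`, the blank-line paragraph rule, and the handling of a trailing "--" are all assumptions made for the example.

```python
def unwrap(lines):
    """Join hard-wrapped lines into paragraphs, one string per paragraph.

    Assumed conventions (hypothetical, modelled on the description above):
      - a blank line ends a paragraph
      - a line ending in "-~" carries a hard hyphen: keep the dash, drop "~"
      - a line ending in "--" is a double-hyphen dash: join with no space
      - a line ending in a single "-" carries a soft hyphen: drop the dash
      - any other line is joined to the next with a single space
    """
    paragraphs = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # blank line: close the paragraph in progress, if any
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith("-~"):
            # hard hyphen: remove only the "~" marker
            current = current[:-1] + line
        elif current.endswith("--"):
            # dash written as two hyphens: keep it, add no space
            current = current + line
        elif current.endswith("-"):
            # soft hyphen: remove the dash and rejoin the word
            current = current[:-1] + line
        else:
            current = current + " " + line
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    sample = [
        "and later of the U.S. Marines from the Man-~",
        "of-War which was stationed here. East of",
        "",
        "you might have the first part of a frag-",
        "ment on the top line",
    ]
    for paragraph in unwrap(sample):
        print(paragraph)
```

The "-~" test has to come before the plain "-" test, since otherwise the marker line would never match a trailing dash at all and would be joined with a space. A real tool would also need a policy for dashes at page boundaries, which this sketch ignores.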
and of course, if a proofer makes a bad decision, that pollutes the text, which can lead to more bad decisions. decisions on all end-line-hyphenates should be made during preprocessing. then if the proofers challenge any of the decisions, the postprocessor can decide that. that's the only sensible workflow. and this "show-end-line-hyphenates" command shows that it is indeed possible to handle end-line-hyphenates in a manner that is simple, yet adequately sophisticated. *** so those are our 3 new commands for the weekend... more later... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmcdermott at mad-computer-scientist.com Fri Apr 16 12:03:49 2010 From: mmcdermott at mad-computer-scientist.com (Michael McDermott) Date: Fri, 16 Apr 2010 14:03:49 -0500 Subject: [gutvol-d] Typesetting Message-ID: <1271444206-sup-9976@zion> Like many I'm sure (all right, I'm not really sure), I like ebooks/etexts but do not like to read them on a computer screen. This is largely, no doubt, because I work with a computer all day anyway--a book should be a place to get away from it all for a little. The natural thing to do is to print the text out and read it. The question then is: how do we typeset it? The first thing I looked at was GutenMark. I was a little disappointed when I tried it on _Gods and Fighting Men_ by Lady Augusta Gregory. The LaTeX it generated was invalid. Then I took the HTML version of said book and ran it through HTML2PS. The results were serviceable, but looked like, well, a printed web page. a2ps worked in the most rudimentary sense. The font was still a fixed width font, the paragraphs were not reformatted, so there was a lot of unused space on the right hand side of the page. One of the last two would, of course, do in a pinch, but I was wondering whether anyone else here had any ideas/recipes on how to automatically or mostly-automatically typeset a PG etext for printing. 
-- Michael McDermott www.mad-computer-scientist.com From Bowerbird at aol.com Fri Apr 16 14:36:41 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 16 Apr 2010 17:36:41 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <3426.7f772dcd.38fa3269@aol.com> michael said: > Like many I'm sure (all right, I'm not really sure), > I like ebooks/etexts but do not like to read them > on a computer screen. there are many people like you. most of them are old. which is to say that very few young people feel that way. it does _not_ mean that all "old" people agree with you; many oldsters are very comfortable reading on screens. > This is largely, no doubt, because I work with > a computer all day anyway--a book should be > a place to get away from it all for a little. i'm sorry you hate your job... ;+) and maybe that's part of the problem. perhaps you don't really hate reading off a screen, you just hate doing it at a computer while you're sitting at a desk. in which case an ipad might be a very nice solution... (it _would_ save some trees. or maybe just one tree. but every tree we save is one more tree on the earth.) > The natural thing to do is to > print the text out and read it. > The question then is: > how do we typeset it? boy, you _are_ old, aren't you? :+) "typesetting" is such a quaint term, charming and cute. even "desktop publishing" now seems badly outdated. > One of the last two would, of course, do in a pinch, > but I was wondering whether anyone else here > had any ideas/recipes on how to automatically or > mostly-automatically typeset a PG etext for printing. well, yeah. but what are your expectations? what are your demands? if you were to do the job for an individual e-text, perhaps like the one you mentioned, what changes would you make? let's start with ripping out the legalese and go from there... you talked about unwrapping paragraphs. you'd do that? (were they too long for you, or too short for you, or what?) 
of course you don't want a monospaced font, but which fonts would you settle for? times new roman? helvetica? or do you need an ability to use any font on your machine? what about paragraphing? block paragraphs, or indentation? do you want full-justification, or is ragged-right acceptable? hyphenation, or not? if you could have the original linebreaks, complete with the original end-of-line-hyphenates, would you? how about chapter-headings? page-top? recto? double-truck? curly-quotes? typographic em-dashes? footnotes or endnotes? runheads? do you want pagenumbers? if so, printed where? what pagesize would you prefer? 8.5*11? or 5.5*8.5 for 2-up? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmcdermott at mad-computer-scientist.com Fri Apr 16 15:37:05 2010 From: mmcdermott at mad-computer-scientist.com (Michael McDermott) Date: Fri, 16 Apr 2010 17:37:05 -0500 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <3426.7f772dcd.38fa3269@aol.com> References: <3426.7f772dcd.38fa3269@aol.com> Message-ID: <1271457420-sup-4620@zion> Excerpts from Bowerbird's message of Fri Apr 16 16:36:41 -0500 2010: > michael said: > > Like many I'm sure (all right, I'm not really sure), > > I like ebooks/etexts but do not like to read them > > on a computer screen. > > there are many people like you. most of them are old. > > which is to say that very few young people feel that way. > > it does _not_ mean that all "old" people agree with you; > many oldsters are very comfortable reading on screens. > > > This is largely, no doubt, because I work with > > a computer all day anyway--a book should be > > a place to get away from it all for a little. > > i'm sorry you hate your job... ;+) > > and maybe that's part of the problem. perhaps you > don't really hate reading off a screen, you just hate > doing it at a computer while you're sitting at a desk. > > in which case an ipad might be a very nice solution... > (it _would_ save some trees. 
or maybe just one tree. > but every tree we save is one more tree on the earth.) > > > The natural thing to do is to > > print the text out and read it. > > The question then is: > > how do we typeset it? > > boy, you _are_ old, aren't you? :+) > > "typesetting" is such a quaint term, charming and cute. > > even "desktop publishing" now seems badly outdated. > > > One of the last two would, of course, do in a pinch, > > but I was wondering whether anyone else here > > had any ideas/recipes on how to automatically or > > mostly-automatically typeset a PG etext for printing. > > well, yeah. > > but what are your expectations? what are your demands? > > if you were to do the job for an individual e-text, perhaps > like the one you mentioned, what changes would you make? > > let's start with ripping out the legalese and go from there... > > you talked about unwrapping paragraphs. you'd do that? > (were they too long for you, or too short for you, or what?) > > of course you don't want a monospaced font, but which > fonts would you settle for? times new roman? helvetica? > or do you need an ability to use any font on your machine? > > what about paragraphing? block paragraphs, or indentation? > > do you want full-justification, or is ragged-right acceptable? > > hyphenation, or not? if you could have the original linebreaks, > complete with the original end-of-line-hyphenates, would you? > > how about chapter-headings? page-top? recto? double-truck? > > curly-quotes? typographic em-dashes? footnotes or endnotes? > > runheads? do you want pagenumbers? if so, printed where? > > what pagesize would you prefer? 8.5*11? or 5.5*8.5 for 2-up? 
> > -bowerbird -- Michael McDermott www.mad-computer-scientist.com From mmcdermott at mad-computer-scientist.com Fri Apr 16 16:04:22 2010 From: mmcdermott at mad-computer-scientist.com (Michael McDermott) Date: Fri, 16 Apr 2010 18:04:22 -0500 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <1271457420-sup-4620@zion> References: <3426.7f772dcd.38fa3269@aol.com> <1271457420-sup-4620@zion> Message-ID: <1271457428-sup-4270@zion> > > there are many people like you. most of them are old. I'm in my early 20s. > > i'm sorry you hate your job... ;+) > > and maybe that's part of the problem. perhaps you > > don't really hate reading off a screen, you just hate > > doing it at a computer while you're sitting at a desk. Well, your psychoanalyzing is interesting (sarc.), but I like what I do. I do not like eyestrain and I like the variety that print media provides. > > "typesetting" is such a quaint term, charming and cute. An old term to be sure, but I like it. "Desktop publishing" was a lame term, even when it was in vogue. > > boy, you _are_ old, aren't you? :+) No. There are a finite number of options: a computer screen (a blackberry screen is just a small computer screen), an eink screen (which would be a good compromise if I had the spare cash), or print. I'm trying to move away from 1, 2 is impractical for the time being, and that brings us to 3. > > but what are your expectations? what are your demands? I have no demands, per se. It was a question. Googling did not turn up anything convenient. The only real option would be to convert each text manually into LaTeX or some lightweight format like asciidoc (my personal favorite). Largely, I am looking to see if anyone else has a solution to a problem before I break out an interpreter/compiler and get cracking on my own. Nitpicking aside, you raise a valid point. What do I want? * Automatic or mostly automatic. This is all done by running a single command or with some slight configuration changes to said command. 
* Font family selection. I don't personally care about picking an exact font, but font family select ala CSS would be nice, with a reasonable default of the Roman variety.
* Paragraph lines should run to the end of the printed page--be that margins or whatnot.
* On screen, I like block paragraphs, but in print indented ones. Optimally, this would be user-settable.
* Page size I would want to set, but 2 pages printed on an 8.5x11 sheet in practice.
* I care little about hyphenation vs wrapping, but I would want the text conformed to the print media, not verbatim of the original edition. This is, after all, one of the advantages of an etext--the ability to reflow the content as desired.
* Page numbers, of course.
* Curly quotes do not matter one way or another to me.
* em-dashes would be preferable.
* Footnotes and endnotes should be included, of course.

-Michael

--
Michael McDermott
www.mad-computer-scientist.com

From jimad at msn.com Fri Apr 16 16:05:53 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 16:05:53 -0700
Subject: [gutvol-d] [SPAM] RE: Re: !@!!@!!@!Re: Re: so what is so important about pagination?
In-Reply-To: <1266938880-sup-4545@zion>
References: <1b8ef.3a619508.38b47a10@aol.com> <1266938880-sup-4545@zion>
Message-ID:

....are you interested in text preservation or in manuscript preservation?

PG & DP, while they do good work for society, don't actually do either of those things. What they do is transcription of a book into ASCII or something close to ASCII -- even when transcribing into HTML or ISO. The end result is usually something that is readable and recognizable as being somehow more-or-less related to what the original author wrote and the original author published.

Is it "correct"? Of course not -- one cannot talk about "correctness" when something is 1) intended to be readable by today's audience, and 2) has been transcribed into something that is a small subset of what was available to publishers even by the 1700s 3) the chosen subset is primarily dictated by what can be easily input from a standard IBM chicklet keyboard and more-or-less OCR'ed by standard OCR software 4) a subset of punctuation and simplified punctuation rules have been adopted in practice which differ somewhat from that which obviously the author and publisher put in their books.

One might be tempted to say that what PG & DP actually do is "word preservation" but actually they don't even really do that either. It's really re-interpretation and republishing from one format -- on paper by professional publishers a long long time ago -- into another format: either a PG-specific non-re-flowable electronic format built around "teletype" standards of the early 1970s, "ASCII, 70 chars more-or-less per line" similar to AP wire format, or HTML for lowest-common-denominator browsers -- said constraint being in practice more likely the HTML to EPUB and/or HTML to MOBI converter routines and the limitations of EPUB and/or MOBI stand-alone reader hardware -- and doing so in a way that might actually be read by one target audience or another on said devices.

Are these efforts successful?
I think so -- for example when I see a friend of mine has bought a new iPad and is happily reading a text I produced for PG prior to the iPad's announcement, and my friend didn't even realize that I wrote it in HTML and PG published it -- because of course Apple strips out the PG header and transcriber acknowledgements before converting it to Apple DRM'ed EPUB and redistributing it as "Apple's own free book available only from the Apple iPad Store"!

[ Thank You Jobs -- who's "1984" now??? ]

From jimad at msn.com Fri Apr 16 16:24:23 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 16:24:23 -0700
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <52E0A8172A5F417986C50DAFE5D1132A@alp2400>
References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400>
Message-ID:

Even more spectacular than the "illuminated letter" problem [which is bad enough][and which I would hope most transcribers would avoid nowadays by choosing NOT to include GIFs for illuminated letters and other trivial "printer's art"] you also have texts where the transcriber has chosen to leave some text in GIF-only mode, and/or other text in GIF mode AND OCR'ed mode, such that the MOBI and EPUB versions may have 0, 1 or 2 copies of a particular entire paragraph of text. And/or the HTML was written in a non-linear form, in which case the MOBI and EPUB versions may have 0, 1, 2 or N copies of any particular passage in the text. And captions on images may be retained in the image, included in the HTML, and/or included in the alt-tag, meaning that a particular user with a particular reading device may see or hear the image caption 0, 1, 2 or 3 times.
From lee at novomail.net Fri Apr 16 16:49:33 2010
From: lee at novomail.net (Lee Passey)
Date: Fri, 16 Apr 2010 17:49:33 -0600
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <1271444206-sup-9976@zion>
References: <1271444206-sup-9976@zion>
Message-ID: <4BC8F78D.1000707@novomail.net>

On 4/16/2010 1:03 PM, Michael McDermott wrote:

[snip]

> One of the last two would, of course, do in a pinch, but I was wondering
> whether anyone else here had any ideas/recipes on how to automatically
> or mostly-automatically typeset a PG etext for printing.

As bowerbird is unfailingly quick to point out, automatic processing of any document file relies on the file being regularized in such a way that any transformation you wish to make is unambiguously identifiable. If it is not possible to unambiguously identify a transformation you wish to make, the file must include unambiguously identifiable meta-information (information that is not part of the primary data) that identifies the transformation (this kind of meta-information is commonly known as "markup").

Project Gutenberg requires no textual regularization of any kind for its impoverished text files, and therefore these files are extremely difficult to automatically transform. Of course, there are some conventions which have evolved, some of which are used more regularly than others. Thus, if you are content with the italicization of text set off by underscores (_) you will probably be successful with this transformation more than 90% of the time. On the other hand, if you want to start chapters on a new page, you will probably be successful with that transformation less than 50% of the time.

The degree of success you have will depend to a large extent on the degree of transformation you want to achieve; if you are content to simply print the file as is, changing only the font face (don't try to change the font size, or you will run into reflowing problems) you can probably achieve 99+% success.
If you want to make a PG file look like an ordinary paperback, certainly less than 50%. (This is, of course, assuming you are using "off-the-shelf" tools. If you're comfortable with scripting languages you could no doubt do better). Your degree of success will also depend on the age of the PG file you want to transform. As time has gone on, and conventions have evolved, later texts are more "regular" than earlier texts; good luck converting _Pride and Prejudice_. You will probably have the most success by using the HTML version of a file, when it can be found (I do not believe that the majority of texts at Project Gutenberg are yet available in HTML versions); this is because while PG HTML texts are still not completely consistent in their use of markup, they are probably /more/ consistent than the impoverished text files. I am assuming you used html2ps or html2pdf version 2.0.43 available from http://www.tufat.com/s_html2ps_html2pdf.htm, and that you have completely read the documentation (BTW, I have not). According to the website, html2pdf almost completely supports CSS version 2, and the media parameters values of CSS3. Were it I (and it will not be, because I am completely happy reading HTML on my mobile device, and because I find PDF to be the one format which is actually worse than PG impoverished text format) I would find a css style sheet which has most of the features I like then use that with the PG HTML files and html2pdf. The resulting PDF can then be printed using Acrobat Reader or equivalent (if you are committed to the destruction of the environment). I suspect that html2pdf will not consume a style sheet unless it is referenced by the html document itself, and DP/PG has been highly resistant to the notion of adding a reference to a generic style sheet in every HTML file, so you will probably have to edit each file to add "" to the section of each HTML file, but I would think that would fall under the category of "semi-automated." 
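The "semi-automated" edit described above could be scripted along these lines. This is only a minimal sketch: the archive scrubbed the markup quoted in the message, so the sketch assumes the missing reference is an ordinary `<link rel="stylesheet">` element placed in the document's `<head>`, and it does not verify how html2pdf actually consumes style sheets. The function name `add_stylesheet_link` is hypothetical.

```python
import re


def add_stylesheet_link(html: str, href: str) -> str:
    """Insert a <link rel="stylesheet"> element just before </head>.

    Assumption: the stylesheet reference belongs in the <head> section.
    The text is returned unchanged if it has no </head> tag or if it
    already mentions href anywhere.
    """
    if href in html:
        return html
    link = f'<link rel="stylesheet" type="text/css" href="{href}">\n'
    # Case-insensitive match; count=1 so only the first </head> is touched.
    # A function is used as the replacement so that backslashes in href
    # are never interpreted as regex escapes.
    return re.sub(r"(?i)(</head\s*>)", lambda m: link + m.group(1), html, count=1)
```

Applied to each downloaded HTML file in turn (read the file, call the function with the path or URL of the preferred style sheet, write the result back), this keeps the per-book manual work down to choosing the style sheet.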
If you cannot find an HTML version of the text you want (be sure to look outside of PG, as there are many other sources) you might want to try bowerbird's ZML2HTML converter; I suspect it may work about 75% of the time to get basic HTML out of PG impoverished text.

FWIW, the style sheet I typically use for reading HTML files can be found at http://www.ebookcooperative.com/ebook.css.

From jimad at msn.com Fri Apr 16 16:52:28 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 16:52:28 -0700
Subject: [gutvol-d] [SPAM] RE: {Disarmed} Re: [SPAM] Re: Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To:
References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com>
Message-ID:

>Does anyone know of any epublisher other than PG that *does* distribute the html we provide?

Not sure exactly what you are asking, but Apple for example takes the PG html, strips out the PG legalese and acknowledgment of the volunteers, converts it to EPUB with DRM, and redistributes it "free" [where "Free" in this case means being only able to get the book in DRM form and only being able to get it directly from the Steve Jobs iPad monopoly]. One knows they are not working from the txt versions of the files because the Apple redistributions contain chars and formatting found only in the HTML versions.

FreeKindleBooks redistributes in HTML form converted to MOBI and retaining all the PG legalese and requirements.

Mobileread has volunteers who take the HTML, usually heavily reformat it, strip it, and republish in MOBI and EPUB formats while cackling about how much better their versions are!

Many other sites appear to "down-convert" to a least-common-denominator ASCII format before "up-converting" back to HTML, MOBI, EPUB, etc. Presumably they are working from an ASCII version of an old DVD distribution - getting "working" EPUB and MOBI from the HTML formats tends to be "non-trivial", not to mention that some sites republish in say two dozen different formats.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Fri Apr 16 16:54:03 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 16 Apr 2010 19:54:03 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <1f3b1.3bbc6acd.38fa529b@aol.com> michael said: > I'm in my early 20s. ok. age is a state of mind. > I like what I do. > I do not like eyestrain you get eyestrain from your screens? really? sincerely, you need to get better equipment. there's no good reason for eyestrain any more. > and I like the variety that print media provides. ok. i find the web has 30 times more variety, including most stuff you can find in print, but everybody has their own sense of taste. > An old term to be sure, but I like it. i like it too. it's charming. and cute. :+) > The only real option would be to > convert each text manually into LaTeX > or some lightweight format like asciidoc > (my personal favorite). if you read the archives, you'll see that i have been the major cheerleader here for light-markup since way back in 2003. > Largely, I am looking to see > if anyone else has a solution i have a solution. it's sitting on my hard-drive. you might be the person who springs it free... > before I break out an interpreter/compiler > and get cracking on my own. i encourage you to get cracking on your own... i'd love to have someone to compare notes with. i would offer you all kinds of advice in your work, probably (to be frank) whether you want it or not. > Nitpicking aside, you raise a valid point. we're just having fun... don't take it too seriously. but that wasn't nitpicking. i was running through the checklist that you will eventually have to make, if you want to offer your program to anybody else. (which you might or might not wanna do, i dunno.) > What do I want? yes please, do tell... > * Automatic or mostly automatic. makes sense. fully automatic just isn't possible, not across the full p.g. 
library, but "mostly automatic" is within range... > * Font family selection. I don't personally > care about picking an exact font i ask about times new roman and helvetica in particular because some aspects of my solution can only use those. but if we take another approach, you can use any font... > * Paragraph lines should run to the end of the printed page ok, but are lines as they're wrapped in a typical p.g. e-text too short, or too long? in other words, what's the measure? (that's the typographic term for the length of your lines, i.e., the pagesize minus your margins.) this will depend on the fontsize, which i forgot to ask about. > * On screen, I like block paragraphs, but in print indented ones. ok. > Optimally, this would be user-settable. optimally, _everything_ is user-settable. > * Page size I would want to set, but > 2 pages printed on an 8.5x11 sheet in practice. that's right. > * I care little about hyphenation vs wrapping, but > I would want the text conformed to the print media, > not verbatim of the original edition. that's a bit ironic, since the original edition _was_ text that was made to conform to the print media. so there is a certain bit of contradiction in there... but i'll let it pass. > This is, after all, one of the advantages of an etext > --the ability to reflow the content as desired. well, yes, of course. but once you've printed it out, you've lost that ability-to-reflow. so does it matter? (never mind, it's just another philosophical question.) > * Page numbers, of course. of course, of course. > * Curly quotes do not matter one way or another to me. ok. > * em-dashes would be preferable. ok. > * Footnotes and endnotes should be included, of course. yes, of course they should be included. i was asking if you had a preference for one over the other, because it can get very hairy to do footnotes in a rewrapped e-text. it's easier to do endnotes. but if you _want_ footnotes... 
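Several of the choices in this exchange (unwrapped paragraphs, a chosen measure, indented paragraphs, typographic em-dashes) can be sketched in a few lines of Python. This is a rough illustration rather than anyone's actual tool: it assumes paragraphs are separated by blank lines, that "--" always stands for an em-dash, and that the measure can be approximated as a character count, which only holds loosely for proportional fonts. The name `reflow` and its parameters are hypothetical.

```python
import re
import textwrap


def reflow(text: str, measure: int = 60, indent: str = "    ") -> str:
    """Unwrap hard-wrapped paragraphs and re-wrap them to `measure` columns.

    Assumptions: blank lines separate paragraphs, and every "--" in the
    source is meant as an em-dash.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        joined = " ".join(para.split())          # undo the hard line breaks
        joined = joined.replace("--", "\u2014")  # typographic em-dash
        # Print-style paragraphs: first line indented, no blank line between.
        out.append(textwrap.fill(joined, width=measure, initial_indent=indent))
    return "\n".join(out)
```

Footnotes, chapter headings, and running heads are deliberately left out; as the message notes, those are where a rewrapped text gets hairy.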
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Fri Apr 16 17:18:12 2010 From: jimad at msn.com (James Adcock) Date: Fri, 16 Apr 2010 17:18:12 -0700 Subject: [gutvol-d] Re: html/css expert advice sought In-Reply-To: <33ee1.72d80d3e.38f8f6a2@aol.com> References: <33ee1.72d80d3e.38f8f6a2@aol.com> Message-ID: Not an expert html coder, but 2-column "works" on my 1280 display in both IE and Firefox, whereas 3-column does bad things with the page image. >if anyone has advice on how to improve that code, i'm all ears... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Fri Apr 16 17:36:40 2010 From: jimad at msn.com (James Adcock) Date: Fri, 16 Apr 2010 17:36:40 -0700 Subject: [gutvol-d] [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: <261a.60807b81.38fa0d43@aol.com> References: <261a.60807b81.38fa0d43@aol.com> Message-ID: Agreed that mindlessly changing hyphens to check-hyphens is one way some P3 automatons introduce more damage than they're worth. There is a general problem in DP that proofers introduce a check-hyphen when they mean "I really don't like the fact that the original book had a hyphen there." Well, too bad. If the original book had a hyphen there then the two options are: 1) Join with hyphen, or 2) Join without hyphen "Throw the hyphen away because I do not like it" is not an option. Typical example is something like: ..school- teacher. Where the two plausible answers could be: .school-teacher. or .schoolteacher. and of course the proofer automaton changes this to .school-*teacher. meaning "gee I wish the author had written this as:" .school teacher. which of course is not an option. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jimad at msn.com Fri Apr 16 17:47:24 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 17:47:24 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <3426.7f772dcd.38fa3269@aol.com>
References: <3426.7f772dcd.38fa3269@aol.com>
Message-ID:

>in which case an ipad might be a very nice solution... (it _would_ save some trees. or maybe just one tree. but every tree we save is one more tree on the earth.)

iPad is a very nice solution if

a) you want to let Steve Jobs decide what you get to read, where you can download it from, what reader app you get to read it with, and how much you pay for it.

b) It's not important for you to acknowledge where your books are coming from, who did them for you, and to be allowed to redistribute those books to your friends.

c) You don't mind reading your books through a "screen door"

Good ebook readers allow:

YOU to easily get a book from any free site YOU choose - NOT Steve Jobs!

Choose from a variety of fonts.

Choose from a variety of font sizes - and easily change those font sizes over the course of a day if your eyes begin to tire.

Choose how big the margins are - how many chars or words YOU want per line of text.

Don't know about you, but I have absolutely no desire to have Steve Jobs censor my reading materials - nor to censor the reading app that I use to read those books - nor to monopolize the distribution channel!

iPad is a HUGE step BACKWARDS as far as I can tell!

From klofstrom at gmail.com Fri Apr 16 17:57:03 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 16 Apr 2010 14:57:03 -1000
Subject: [gutvol-d] Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3
In-Reply-To:
References: <261a.60807b81.38fa0d43@aol.com>
Message-ID:

On Fri, Apr 16, 2010 at 2:36 PM, James Adcock wrote:

> There is a general problem in DP that proofers introduce a check-hyphen when they mean "I really don't like the fact that the original book had a hyphen there."

And you know this HOW?

What the asterisk means is, "I don't know how the author usually spells this, whether closed (schoolteacher) or hyphenated (school-teacher). This hyphen comes at the end of a line, so I don't know whether to drop it or keep it. I'll put an asterisk there, so the PPer can check the usage in the rest of the text, to see what spelling the author usually uses, hyphenated or closed."

That's ALL it means. Uncertainty about the author's preferred spelling.

I know, as a professional copyeditor, that the open/hyphenated/closed continuum is extremely mutable, that words have changed over time (to-day becomes today), and that at any one time, different authors and different publishing houses may make different choices. (Copyeditor or copy-editor is just one example; you'll find it both ways.)

Sometimes newbie proofers over-asterisk. The same word may occur on the same page with the author's preferred spelling prominently on display. But the newbie is afraid of making a judgment call and asterisks anyway. No big deal. Better to be too careful than to drop the hyphen and rejoin words that should be hyphenated rather than closed up.

At times this list seems to function just as Encyclopedia Dramatica does for Wikipedia; all the malcontents gather and mutter about THEM over THERE doing it WRONG and THEY didn't listen to ME.
--
Karen Lofstrom

From jimad at msn.com Fri Apr 16 17:57:34 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 17:57:34 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <4BC8F78D.1000707@novomail.net>
References: <1271444206-sup-9976@zion> <4BC8F78D.1000707@novomail.net>
Message-ID:

>...DP/PG has been highly resistant to the notion of adding a reference to a generic style sheet in every HTML file...

Anything more than the simplest uses of CSS tends to break the conversion of HTML into EPUB and MOBI that can be successfully used by most ebook readers -- not to mention older browsers.

From jimad at msn.com Fri Apr 16 18:12:51 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 18:12:51 -0700
Subject: [gutvol-d] [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3
In-Reply-To:
References: <261a.60807b81.38fa0d43@aol.com>
Message-ID:

>And you know this HOW?

Because 1) I have seen some P3s change EVERY hyphen to a check-hyphen. 2) As a PP I have attempted to "fix" check-hyphens and to do so one has to try to understand what it was that the P3 was complaining about. I've emailed some and said "what were you thinking?" and they say "oops, you're right, I was basically thinking that I wished the hyphen wasn't there."

Lots of people put a check-hyphen in there when they are feeling "uncomfortable." Feeling "uncomfortable" isn't *sufficient* reason to put a check-hyphen in -- because if you do so then you make the PP uncomfortable too -- who has no recourse except to

a) ignore the check-hyphen.

b) waste copious amounts of time trying to double check the hyphen against the author's published corpus

c) write the proofer an email and hope some day they will respond honestly and tell you what they were thinking, if they were thinking, when they entered the check-hyphen.

Many authors put hyphens in places which today feel uncomfortable to our modern tastes.
That is not enough reason to insert a check-hyphen. A check-hyphen is basically a punt to the PP -- who is no better placed to resolve the issue.

From klofstrom at gmail.com Fri Apr 16 18:19:46 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 16 Apr 2010 15:19:46 -1000
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3
In-Reply-To:
References: <261a.60807b81.38fa0d43@aol.com>
Message-ID:

On Fri, Apr 16, 2010 at 3:12 PM, James Adcock wrote:

> Because 1) I have seen some P3s change EVERY hyphen to a check-hyphen. 2) As a PP I have attempted to "fix" check-hyphens and to do so one has to try to understand what it was that the P3 was complaining about. I've emailed some and said "what were you thinking?" and they say "oops, you're right, I was basically thinking that I wished the hyphen wasn't there."

Those are folks who don't understand the rules. You ran into some bad P3ers; that doesn't mean that all of us in P3 are like that.

You could have *corrected* the misconceptions, rather than deciding that we're all idiots.

--
Karen Lofstrom
not an idiot
(at least in THIS area)

From prosfilaes at gmail.com Fri Apr 16 18:39:00 2010
From: prosfilaes at gmail.com (David Starner)
Date: Fri, 16 Apr 2010 21:39:00 -0400
Subject: [gutvol-d] Re: [SPAM] RE: Re: !@!!@!!@!Re: Re: so what is so important about pagination?
In-Reply-To: References: <1b8ef.3a619508.38b47a10@aol.com> <1266938880-sup-4545@zion> Message-ID: On Fri, Apr 16, 2010 at 7:05 PM, James Adcock wrote: > Of course not -- one cannot talk about "correctness" when something is 1) intended to be readable by today's audience, and 2) has been transcribed into something that is a small subset of what was available to publishers even by the 1700s 3) the chosen subset is primarily dictated by what can be easily input from a standard IBM chicklet keyboard and more-or-less OCR'ed by standard OCR software 4) a subset of punctuation and simplified punctuation rules have been adopted in practice which differ somewhat from that which obviously the author and publisher put in their books. One can always talk about correctness; it comes in many different levels and varieties. Just because the New Testament was written in Greek, doesn't mean we can't call an English translation wrong where the Gospel of John starts: "Send David all your money in small unmarked bills." I rather like that translation, but objectively speaking, it doesn't represent the original Greek in any way, shape or form. -- Kie ekzistas vivo, ekzistas espero. From dakretz at gmail.com Fri Apr 16 20:26:20 2010 From: dakretz at gmail.com (don kretz) Date: Fri, 16 Apr 2010 20:26:20 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: It's also possible that the P3er was responding appropriately by responding mindlessly to a process that encourages and rewards mindlessness. (Saying that more diplomatically, a system that encourages rote memorization and application of universal rules rather than thoughtful consideration of the text in the light of the available context.) What's ironic is that the second easiest way to handle it is to let the postprocessor (or is that post-processor? let's say post-*processor. See what I mean?) 
use the available tools to simply list all the cases where hyphenated and dehyphenated versions of the same word appear in the text, check a page image, see which was actually used (I bet it's the most frequent), and fix 'em all at a stroke. The first easiest way is to do this before posting the project in the first place. Then let the instructions say those dreaded DP words: "It doesn't matter,", reducing the cognitive distinctions and requirements between new proofers and old proofers. Somehow this concept is always a non-starter unfortunately, especially among the old proofers who get to write the rules. On Fri, Apr 16, 2010 at 6:19 PM, Karen Lofstrom wrote: > On Fri, Apr 16, 2010 at 3:12 PM, James Adcock wrote: > > > Because 1) I have seen some P3s change EVERY hyphen to a check-hyphen. 2) > As a PP I have attempted to "fix" check-hyphens and to do so one has to try > to understand what it was that the P3 was complaining about. I've emailed > some and said "what were you thinking?" and they say "oops, you're right, I > was basically thinking that I wished the hyphen wasn't there." > > Those are folks who don't understand the rules. You ran into some bad > P3ers; that doesn't mean that all of us in P3 are like that. > > You could have *corrected* the misconceptions, rather than deciding > that we're all idiots. > > -- > Karen Lofstrom > not an idiot > (at least in THIS area) > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... 
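The variant listing don describes above (find every word that appears both hyphenated and closed up, then check the page images) can be sketched in a few lines of Python. This is only an illustration, not guiprep's actual code: the function name hyphenation_conflicts and the word pattern are assumptions, and end-of-line hyphens would have to be rejoined before running it. The most frequent spelling is only a hint, as don's "I bet" suggests; the page image still decides.

```python
import re
from collections import Counter, defaultdict

# Assumed word pattern: runs of ASCII letters, optionally joined by single hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def hyphenation_conflicts(text):
    """Map each de-hyphenated word to a Counter of the spellings seen for it,
    keeping only words that occur with more than one hyphenation."""
    variants = defaultdict(Counter)
    for word in WORD_RE.findall(text):
        lowered = word.lower()
        # Group "to-day" and "today" under the same key, "today".
        variants[lowered.replace("-", "")][lowered] += 1
    return {key: counts for key, counts in variants.items() if len(counts) > 1}
```

Called on the concatenated text of a whole project, this returns a small dictionary that a content provider or post-processor could review word by word, using counts.most_common() to see which spelling dominates before looking at a page image.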
URL: From klofstrom at gmail.com Fri Apr 16 20:49:19 2010 From: klofstrom at gmail.com (Karen Lofstrom) Date: Fri, 16 Apr 2010 17:49:19 -1000 Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: On Fri, Apr 16, 2010 at 5:26 PM, don kretz wrote: > The first easiest way is to do this before posting the project in the first place. That sounds like a good idea. Why don't we add it as a step in the preparation process? -- Karen Lofstrom From dakretz at gmail.com Fri Apr 16 21:03:41 2010 From: dakretz at gmail.com (don kretz) Date: Fri, 16 Apr 2010 21:03:41 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: Ummm ... I think it is. It's a standard feature of guiprep. It's *almost* a required activity for the Content Provider. But one that the guidelines suggest that the proofer (who doesn't know this has probably happened) will nullify. Here is the applicable Proofing Guideline: Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure if the author hyphenated it or not, leave the hyphen, put an * after it, and join the word together like this: to-*day. The asterisk will bring it to the attention of the post-processor, who has access to all the pages and can determine how the author typically wrote this word. Now an only mildly conservative reading of that suggests that just about any word that could possibly be hyphenated should be "-*"ed unless there's another example showing the "right way" on the very same page. There's certainly no moderating language encouraging the proofer to do anything else. Especially considering the possible calumny if they should do the wrong thing. 
And it says right there to leave it for the PPer if it's not obvious to you in the context of the one page available to you at the time. In fact, if the CPer has done what most CPers do, and left provably hyphenated words hyphenated and closed up the rest, the Guideline actually would lead the proofer to undo it all. On Fri, Apr 16, 2010 at 8:49 PM, Karen Lofstrom wrote: > On Fri, Apr 16, 2010 at 5:26 PM, don kretz wrote: > > > The first easiest way is to do this before posting the project in the > first place. > > That sounds like a good idea. Why don't we add it as a step in the > preparation process? > > -- > Karen Lofstrom > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dakretz at gmail.com Fri Apr 16 21:19:38 2010 From: dakretz at gmail.com (don kretz) Date: Fri, 16 Apr 2010 21:19:38 -0700 Subject: [gutvol-d] Re: [SPAM] RE: {Disarmed} Re: [SPAM] Re: Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com> Message-ID: I did some checking too. The conclusion I provisionally have arrived at is that there are relatively few beneficiaries from our expectations for an increasingly elegant HTML version of each project which also is one of the major drags on the post-processing stage and a major contributor in the increasing residency period of projects on DP. It appears to me that the only people who enjoy the full pleasure of our finest work are a.) those who read the whole thing online at PG, and b) those who personally download the HTML version and install it locally so they can read it with a device (probably a PC full-width screen (including laptops and similar.) Which would be - what - 10% or less? 
In fact, it appears that secondary distributors treat the removal of all or part of the HTML as part of their value-add. Don On Fri, Apr 16, 2010 at 4:52 PM, James Adcock wrote: > >Does anyone know of any epublisher other than PG that *does* distribute > the html we provide? > > > > Not sure exactly what you are asking but Apple for example takes the PG > html, strips out the PG legalese and acknowledgment of the volunteers, > converts it to EPUB with DRM, and redistributes it "free" [where "Free" in > this case means being only able to get the book in DRM form and only being > able to get it directly from the Steve Jobs iPad monopoly] One knows they > are not working from the txt versions of the files because the Apple > redistributions contain chars and formatting found only in the HTML > versions. FreeKindleBooks redistributes in HTML form converted to MOBI and > retaining all the PG legalese and requirements. Mobileread has volunteers > which take the HTML usually heavily reformat it, strip it, and republish in > MOBI and EPUB formats while cackling about how much better their versions > are! Many other sites appear to "down-convert" to a > least-common-denominator ASCII format before "up-converting" back to HTML, > MOBI, EPUB, etc. Presumably they are working from an ASCII version of an old > DVD distribution -- getting "working" EPUB and MOBI from the HTML formats > tends to be "non-trivial", not to mention that some sites republish in say > two dozen different formats. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From klofstrom at gmail.com Fri Apr 16 21:24:29 2010 From: klofstrom at gmail.com (Karen Lofstrom) Date: Fri, 16 Apr 2010 18:24:29 -1000 Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: On Fri, Apr 16, 2010 at 6:03 PM, don kretz wrote: > Here is the applicable Proofing Guideline: > Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure if the author hyphenated it or not, leave the hyphen, put an * after it, and join the word together like this: to-*day. Ah, badly-written guideline. What it doesn't spell out is that there's ambiguity ONLY when words are hyphenated at the end of a line. I can see how someone would misread that guideline and add asterisks before every dang hyphen. The more so if the proofer weren't familiar with 18th and 19th century spellings and lacked any sense of how spellings might have changed. I have been proofing for nearly seven years now, so I suppose some things seem clear to me that might be opaque to a less-experienced proofer. You're also assuming that the proofer is doing only one page at a time. Many of us P3ers tend to do many pages in the same book, so begin to have some sense of what spellings the author uses. It might make sense for the project comments to include a list of words that the au hyphenates that might be problematic. A note to the effect that au uses to-day and to-morrow might alleviate some anxiety and asterisks. 
-- Karen Lofstrom From dakretz at gmail.com Fri Apr 16 21:36:01 2010 From: dakretz at gmail.com (don kretz) Date: Fri, 16 Apr 2010 21:36:01 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: Ah, badly-written guideline. What it doesn't spell out is that there's > ambiguity ONLY when words are hyphenated at the end of a line. I can > see how someone would misread that guideline and add asterisks before > every dang hyphen. The more so if the proofer weren't familiar with > 18th and 19th century spellings and lacked any sense of how spellings > might have changed. > > I have been proofing for nearly seven years now, so I suppose some > things seem clear to me that might be opaque to a less-experienced > proofer. > > You're also assuming that the proofer is doing only one page at a > time. Many of us P3ers tend to do many pages in the same book, so > begin to have some sense of what spellings the author uses. > > It might make sense for the project comments to include a list of > words that the au hyphenates that might be problematic. A note to the > effect that au uses to-day and to-morrow might alleviate some anxiety > and asterisks. > > -- > Yup. Or you could say that all the hyphenated words have already been checked once, they will all be checked again in post-processing, and it doesn't matter. :) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gbnewby at pglaf.org Fri Apr 16 22:21:23 2010 From: gbnewby at pglaf.org (Greg Newby) Date: Fri, 16 Apr 2010 22:21:23 -0700 Subject: [gutvol-d] Hyphenation (Re: Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3) In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: <20100417052123.GA26290@pglaf.org> Why is it that people will have a long thread about hyphenation, yet not edit the darned screwed-up subject line to be clear and readable? I'm sorry that our pglaf spam filter tags some stuff as spam, but it doesn't mean we need to carry the tag forever! Lovingly, Greg On Fri, Apr 16, 2010 at 09:36:01PM -0700, don kretz wrote: > Ah, badly-written guideline. What it doesn't spell out is that there's > > > ambiguity ONLY when words are hyphenated at the end of a line. I can > > see how someone would misread that guideline and add asterisks before > > every dang hyphen. The more so if the proofer weren't familiar with > > 18th and 19th century spellings and lacked any sense of how spellings > > might have changed. > > > > I have been proofing for nearly seven years now, so I suppose some > > things seem clear to me that might be opaque to a less-experienced > > proofer. > > > > You're also assuming that the proofer is doing only one page at a > > time. Many of us P3ers tend to do many pages in the same book, so > > begin to have some sense of what spellings the author uses. > > > > It might make sense for the project comments to include a list of > > words that the au hyphenates that might be problematic. A note to the > > effect that au uses to-day and to-morrow might alleviate some anxiety > > and asterisks. > > > > -- > > > > Yup. Or you could say that all the hyphenated words have already been > checked once, they will all be checked again in post-processing, and it > doesn't matter. 
:) > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From richfield at telkomsa.net Fri Apr 16 23:55:38 2010 From: richfield at telkomsa.net (Jon Richfield) Date: Sat, 17 Apr 2010 08:55:38 +0200 Subject: [gutvol-d] Re: Eyestrain (Was Typesetting) In-Reply-To: <1271444206-sup-9976@zion> References: <1271444206-sup-9976@zion> Message-ID: <4BC95B6A.7000002@telkomsa.net> My history of screen experience goes back some 44 years, which is longer than we have had TV in South Africa. More than half of that period at work (and since the very early eighties home as well) was spent on screens of various qualities and functionalities, everything from 8080s and 8600s with delusions of grandeur, to large mainframes and the whole bang shoot in between, and everything from 300 bits (no, not bytes) per second (not necessarily baud) to crwth-knows-what now. My point? Apart from my decrepitude and the fact that I now have taken to wearing glasses while on line, that eyestrain never figured. I could not understand what the problem was with friends who complained of it (and there were plenty). Then a year or two after I got into PC work I realised that if I got involved in an exciting interactive game (not always if I was the player if things got really exciting), I soon got eyestrain! Now, what follows is not the remark of your friendly corner-shop ophthalmologist, and as far as I can make out my experience, while not unique is not shared by the majority of users, but I think it is of potential use to some people. In all my computer experience I have been emotionally comfortable with hardware, software, and their logic and theory of operation. 
Whereas many people lean forward when working at the screen, I lounge back, working with my eyes, not actually focussed on infinity (though I think it is a disgrace that our screens do not yet routinely and economically support that) but certainly focussed well past the tip of my cute little snout. In short, I am relaxed, *and so are my eyes*! But obviously I am doing something different with my eyes when playing games. The screens are the actual same screens. I usually am sitting in the same attitude, etc. so dust from the screen isn't a factor. People have suggested all sorts of things, such as that when excited my pupils are more distended or my blink rate is lower. Maybe some of those factors are true, but what it feels like to me (subjectively, I haven't been in a position to test this) is that my ciliary muscles get tired. So??? So, unless your screen or lighting is really lousy, or your typeface, colour, layout, size etc. really unsuited to your needs, if screen fatigue is a problem, maybe what you need is some well-managed relaxation exercises. If what knackers your eyes is games, I am sure you can do the arithmetic! (No, don't mind ME! this is my sympathetic look! ;-) ) Cheers, Jon From hart at pglaf.org Sat Apr 17 02:16:39 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sat, 17 Apr 2010 02:16:39 -0700 (PDT) Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> Message-ID: I have no objection to having both the Illuminated GIF file and the ASCII equivalent character. I see these as just fine, with no impediment to either reading or searching or quoting, other, of course, than any artifact of the GIF file, which is usually not really much of a problem when I cut and paste. As for the MOBI, EPUB, etc., formats, as long as it's easy from the average reader's POV, it should be acceptable. 
Michael On Fri, 16 Apr 2010, James Adcock wrote: > Even more spectacular than the "illuminated letter" problem [which is bad > enough][and which I would hope most transcribers would avoid nowadays by > choosing NOT to include GIFs for illuminated letters and other trivial > "printers art"] you also have texts when the transcriber has chosen to leave > some text in GIF only mode, and/or other text in GIF mode AND OCR'ed mode, > such that the MOBI and EPUB versions may have 0, 1 or 2 copies of a > particular entire paragraph of text. And/or the HTML was written in a > non-linear form in which case the MOBI and EPUB versions may have 0, 1, 2 or > N copies of any particular passage in the text. And captions on images may > be retained in the image, included in the HTML, and/or included in the > alt-tag meaning that a particular user with a particular reading device may > see or hear the image caption 0, 1, 2 or 3 times. > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From marcello at perathoner.de Sat Apr 17 03:28:10 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Sat, 17 Apr 2010 12:28:10 +0200 Subject: [gutvol-d] Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> Message-ID: <4BC98D3A.6080908@perathoner.de> James Adcock wrote: > iPad is a very nice solution if > > a) you want to let Steve Jobs decide what you get to read, where you can > download it from, what reader app you get to read it with, and how much you pay > for it. With Stanza you can download directly from PG and many other free publishers. > iPad is a HUGE step BACKWARDS as far as I can tell! Apple systems have always been more closed than the alternatives. If you don't like closed systems, buy an Android tablet instead. 
-- Marcello Perathoner webmaster at gutenberg.org From Bowerbird at aol.com Sat Apr 17 11:13:30 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 17 Apr 2010 14:13:30 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <12881.668b4509.38fb544a@aol.com> michael said: > Gods and Fighting Men ok, just so we can all get "on the same page", i've run out a first draft of this book as a .pdf. > http://z-m-l.com/misc/14465-take5.pdf it's got some problems, notably with orphans, including more than one page with one word, but that's ok for the time being. michael, how would this .pdf fit your needs? (you'll need to print a few pages to evaluate.) what, if anything, would need to be changed? -bowerbird p.s. why can't people pick a _short_ book for demo purposes? long books clog the works... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Sat Apr 17 14:56:51 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 14:56:51 -0700 Subject: [gutvol-d] [SPAM] RE: Re: !@! #17135 Twas The Night Before Christmas In-Reply-To: References: <52E0A8172A5F417986C50DAFE5D1132A@alp2400> Message-ID: >I have no objection to having both the Illuminated GIF file and the ASCII equivalent character. I see these as just fine, with no impediment to either reading or searching or quoting, other, of course, that any artifact of the GIF file usually not really much of a problem when I cut and paste. >As for the MOBI, EPUB, etc., formats, as long as it's easy from the average reader's POV, it should be acceptable. OK, then someone needs to think this through and come up with standards and expectations, because what is happening now is "not working." 
Again, it is not infrequently the case in one or more of the file formats that PG is distributing that a particular item of text is showing up 0, 1, 2, or 3 times, where the "right answer" is once -- or maybe twice -- if as you suggest one accepts redundancy in the case of illuminated letters. As you suggest, probably the simplest answer is that if someone wants to put in illuminated letters they also include the plain-text version of the letter, and then presumably one should NOT include an alt-tag on the "illustration" [when it is actually just an illuminated letter]. What one *ought* to do for a no-illustration distribution given a "real" illustration with an alt-tag is yet another matter that needs to be thought out. I also suggest it would be nice if we had a naming convention for illuminated letters or some such equivalent, such that the file format conversion software, and/or other software, can tell whether a particular HTML "really" has illustrations, or if it just contains illuminated letters. For example in the text in question, when I ask PG for the MOBI version with *no images* this is what I currently get (which is not quite what one would hope for!) ... Saying her Prayers T was the night before Christmas, when all through the house Not a creature was stirring, not even a mouse; The stocking were hung by the chimney with care In hopes that St. Nicholas soon would be there; Sleeping Mouse Stocking in the Fireplace The children were nestled all snug in their beds, While visions of sugar-plums danced in their heads; And mamma in her kerchief, and I in my cap, Had just settled our brains for a long winter's nap, The children were nestled When out on the lawn there arose such a clatter, I sprang from the bed to see what was the matter .... 
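To make the naming-convention idea in the message above concrete, here is a minimal Python sketch of how conversion software might tell illuminated letters from "real" illustrations. Everything specific in it is hypothetical: no such convention exists at PG as far as this thread shows, and the "dropcap_" filename prefix and the function names are invented purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical convention: illuminated-letter images are named "dropcap_<letter>.<ext>".
DROPCAP_PREFIX = "dropcap_"


class ImageClassifier(HTMLParser):
    """Collect <img> tags, split into illuminated letters and real illustrations."""

    def __init__(self):
        super().__init__()
        self.dropcaps = []        # list of src values
        self.illustrations = []   # list of (src, alt) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        filename = src.rsplit("/", 1)[-1].lower()
        if filename.startswith(DROPCAP_PREFIX):
            self.dropcaps.append(src)
        else:
            self.illustrations.append((src, attr.get("alt") or ""))


def classify_images(html):
    """Return (dropcaps, illustrations) found in an HTML string."""
    parser = ImageClassifier()
    parser.feed(html)
    parser.close()
    return parser.dropcaps, parser.illustrations


def has_real_illustrations(html):
    """True if the HTML contains any image that is not an illuminated letter."""
    return bool(classify_images(html)[1])
```

With something like this, a no-images conversion could drop illuminated-letter images silently (the plain-text letter is already in the HTML) and decide separately what to do with the alt text of the real illustrations, which is the open question raised above.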
From jimad at msn.com Sat Apr 17 14:59:14 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 14:59:14 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: >You could have *corrected* the misconceptions, rather than deciding that we're all idiots. I did correct the misconceptions and I did not decide that "we" are all idiots. I have stated repeatedly that I found extremely competent and dedicated volunteers at all levels of DP -- and the converse. From jimad at msn.com Sat Apr 17 15:15:33 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 15:15:33 -0700 Subject: [gutvol-d] [SPAM] RE: Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: >It might make sense for the project comments to include a list of words that the au hyphenates that might be problematic. A note to the effect that au uses to-day and to-morrow might alleviate some anxiety and asterisks. I have tried leaving project comments and the P1s and the P2s tend to read and follow the project comments whereas the P3s ignore them and undo the good work of the P1s and P2s. From jimad at msn.com Sat Apr 17 15:25:40 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 15:25:40 -0700 Subject: [gutvol-d] Re: Eyestrain (Was Typesetting) In-Reply-To: <4BC95B6A.7000002@telkomsa.net> References: <1271444206-sup-9976@zion> <4BC95B6A.7000002@telkomsa.net> Message-ID: The way people use their eyes, the ways people read, the capabilities of their eyes, and their brains to process information, vary widely, and in ways you cannot imagine unless you personally have run into problems and have noticed that you have them. 
In the simplest almost universal case people start experiencing eyestrain around age 40 requiring the use of compensating visual orthotics. Age 40 also seems to be about the age of greatest denial ;-) From jimad at msn.com Sat Apr 17 15:32:43 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 15:32:43 -0700 Subject: [gutvol-d] [SPAM] RE: Re: Typesetting In-Reply-To: <4BC98D3A.6080908@perathoner.de> References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >With Stanza you can download directly from PG and many other free publishers. Sorry, but are you saying that you are actually currently running Stanza on an iPad, that you have tested this, and that it works? From what I can see they only have an iPod version, which yes will run on iPad -- and create a blurry simulation of an iPod on your iPad. From greg at durendal.org Sat Apr 17 15:46:24 2010 From: greg at durendal.org (Greg Weeks) Date: Sat, 17 Apr 2010 18:46:24 -0400 (EDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: On Sat, 17 Apr 2010, Jim Adcock wrote: >> It might make sense for the project comments to include a list of > words that the au hyphenates that might be problematic. A note to the > effect that au uses to-day and to-morrow might alleviate some anxiety > and asterisks. > > I have tried leaving project comments and the P1s and the P2s tend to read > and follow the project comments whereas the P3s ignore them and undo the > good work of the P1s and P2s. What I found was that there's a contingent in all rounds that seem to have never read the project comments. I don't know if they never read them, or just forgot them in the throes of proofing. 
-- Greg Weeks http://durendal.org:8080/greg/ From klofstrom at gmail.com Sat Apr 17 15:47:22 2010 From: klofstrom at gmail.com (Karen Lofstrom) Date: Sat, 17 Apr 2010 12:47:22 -1000 Subject: [gutvol-d] Dim view of P3ers Message-ID: On Sat, Apr 17, 2010 at 12:15 PM, Jim Adcock wrote: > I have tried leaving project comments and the P1s and the P2s tend to read and follow the project comments whereas the P3s ignore them and undo the good work of the P1s and P2s. And earlier Jim wrote: > I have stated repeatedly that I found found extremely competent and dedicated volunteers at all levels of DP -- and the converse. Bizarre. In one post you're drawing back from blanket accusations and in the next, you repeat them. Jim, I don't understand WHY you feel impelled to keep throwing stones at DP. You don't like the way we do things, you've left ... it's all behind you, right? But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG. I suppose I ought to just killfile you, as I have Bowerbird. -- Karen Lofstrom From hart at pglaf.org Sat Apr 17 17:07:11 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sat, 17 Apr 2010 17:07:11 -0700 (PDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: On Sat, 17 Apr 2010, Jim Adcock wrote: > >With Stanza you can download directly from PG and many other free > publishers. > > Sorry, but are you saying that you are actually currently running Stanza on > an iPad, that you have tested this, and that it works? From what I can see > they only have an iPod version, which yes will run on iPad -- and create a > blurry simulation of an iPod on your iPad. 
iPads have their own iBooks App and if you search for "Project Gutenberg" and various titles what you get seems very much not to be what you call a "blurry simulation of an iPod on your iPad." I suggest that instead of taking Aristotle's thought processing to try a way of figuring out what an iPad looks like without looking at a real one of these gizmos, you just find one and actually look at it or, the next best thing, look at the online demonstrations or ask someone who is trying one out to do some experimentation for you. In addition, you can also find a nice App from the people at Wattpad that also has a rather nice rendering of the Project Gutenberg eBooks on iPad. Given that eBook Apps surpassed game Apps on the iPod a while back, and, no, I don't know exactly when that was or if games took back the crown or eBooks kept the lead, but given that, I must presume eBook Apps will have a decent life on the iPad. I've tried out several reading experiences on the iPad and all seem quite easy to read from and the Apps store makes it quite obvious which App has been written specifically for the iPad and which for combinations. I'm sure all the iPod reader outfits that are still in production will be releasing iPad products that take full advantage of the 768 x 1024 res-- which works so well that you don't think at all about resolution and size becomes the only factor you will probably worry about. However, I should state in advance that I am sure people will find things to worry about, all sorts of things that seem just fine to nearly everyone else. Personally, I'm just waiting to see what comes down the pike from persons who want to turn iPads into iMacs or whatever, and then start Apps Stores of various and sundry varieties, just like they did with iPhones, iPods & pretty much everything else in the computing world. Heck, the iPod was not out but a week when the first eBook reader was out to let people read our Project Gutenberg eBooks and others on it. 
I'm sure there will be dozens, if not hundreds, of iPad eBook readers. From hart at pglaf.org Sat Apr 17 17:11:13 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sat, 17 Apr 2010 17:11:13 -0700 (PDT) Subject: [gutvol-d] Re: Eyestrain (Was Typesetting) In-Reply-To: References: <1271444206-sup-9976@zion> <4BC95B6A.7000002@telkomsa.net> Message-ID: On Sat, 17 Apr 2010, Jim Adcock wrote: > The way people use their eyes, the ways people read, the capabilities of > their eyes, and their brains to process information, vary widely, and in > ways you cannot imagine unless you personally have run into problems and > have noticed that you have them. In the simplest almost universal case > people start experiencing eyestrain around age 40 requiring the use of > compensating visual orthotics. Age 40 also seems to be about the age of > greatest denial ;-) I could read the OED Microprint edition without decent lighting until 42. After that it was all downhill so fast I never really tried it any more-- with or without glasses, but would use the provided Bausch & Lomb reader. Today I use $1 glasses with all my computers. . .I just buy every grade of magnification and leave each with the computer it works best with. I'll know I'm in trouble if/when I move to the 3x range. . .hee hee! From hart at pglaf.org Sat Apr 17 17:15:30 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sat, 17 Apr 2010 17:15:30 -0700 (PDT) Subject: [gutvol-d] Dim View: WAS Re: [SPAM] RE: Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: On Sat, 17 Apr 2010, Jim Adcock wrote: > >It might make sense for the project comments to include a list of > words that the au hyphenates that might be problematic. A note to the > effect that au uses to-day and to-morrow might alleviate some anxiety > and asterisks. 
> > I have tried leaving project comments and the P1s and the P2s tend to read > and follow the project comments whereas the P3s ignore them and undo the > good work of the P1s and P2s. Somehow in the context of the handful of messages Jim Adcock sent, and even in the context of this message, this does not seem to be a dim view. . .with two plusses and one minus. Of course, it would all work out better if the minus came first-- From hart at pglaf.org Sat Apr 17 17:16:51 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sat, 17 Apr 2010 17:16:51 -0700 (PDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my collaborative proofreading site, part 3 In-Reply-To: References: <261a.60807b81.38fa0d43@aol.com> Message-ID: On Sat, 17 Apr 2010, Jim Adcock wrote: > >You could have *corrected* the misconceptions, rather than deciding > that we're all idiots. [Talk about a "Dim View". . . .] > I did correct the misconceptions and I did not decide that "we" are all > idiots. I have stated repeatedly that I found extremely competent and > dedicated volunteers at all levels of DP -- and the converse. From Bowerbird at aol.com Sat Apr 17 18:14:46 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sat, 17 Apr 2010 21:14:46 EDT Subject: [gutvol-d] Re: Typesetting (not really, but nobody seems to read subject-headers) Message-ID: michael said: > Of course, it would all work out better if the minus came first-- hey, that's an idea. have p3 proof first, followed by p2, then p1. *** the p3 proofers asterisk the end-line-hyphenates because that's the one course of action guaranteed not to be wrong. moreover, it's the _only_ one that carries that promise. *** michael said: > there will be dozens, if not hundreds, of iPad eBook readers. 
and 3/4 of them will claim to support the .epub format, but yet no two of them will do it in the exact same way... but hey, aren't y'all glad that we have a _standard_? i am! *** um, and jim is right about one thing. there's no stanza on ipad. unless amazon changes its mind, and reverses its current stand. *** as for eyestrain, if you have it, explore the various solutions! because it _is_ possible for you to get rid of it, in most cases. it might mean buying better equipment, but not necessarily; the solution might be free, and easy, and improve your life... *** oh, and that .pdf i created? no comments? whatsamatter? is something like that too _tangible_ to be discussed here? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmcdermott at mad-computer-scientist.com Sat Apr 17 18:49:45 2010 From: mmcdermott at mad-computer-scientist.com (Michaelu McDermott) Date: Sat, 17 Apr 2010 20:49:45 -0500 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <12881.668b4509.38fb544a@aol.com> References: <12881.668b4509.38fb544a@aol.com> Message-ID: <1271554991-sup-4922@zion> > i've run out a first draft of this book as a .pdf. I won't be able to run off some pages until tomorrow, but the PDF looks quite good. Offhand, though, the page numbers look like they drop low enough that they _could_ be out of the printable area. > p.s. why can't people pick a _short_ book for > demo purposes? long books clog the works... Aww, come on. That wouldn't be any fun, now would it? :) Seriously, though shorter works are poor representatives of the problem at hand. Picking the Declaration of Independence, or TS Eliot's the Wasteland, would be too simple to print up as is and ignore the issues, load up in a word processor, or manually mark up. 
-Michael Excerpts from Bowerbird's message of Sat Apr 17 13:13:30 -0500 2010: > michael said: > > Gods and Fighting Men > > ok, just so we can all get "on the same page", > i've run out a first draft of this book as a .pdf. > > > http://z-m-l.com/misc/14465-take5.pdf > > it's got some problems, notably with orphans, > including more than one page with one word, > but that's ok for the time being. > > michael, how would this .pdf fit your needs? > (you'll need to print a few pages to evaluate.) > what, if anything, would need to be changed? > > -bowerbird > > p.s. why can't people pick a _short_ book for > demo purposes? long books clog the works... -- Michael McDermott www.mad-computer-scientist.com From hart at pglaf.org Sat Apr 17 20:18:11 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sat, 17 Apr 2010 20:18:11 -0700 (PDT) Subject: [gutvol-d] Eyestrain In-Reply-To: References: Message-ID: Try reading white on black, or charcoal, or any such mixtures. Get the contrast where you like it, try lots of fonts, sizes, refresh rates, etc. From jimad at msn.com Sat Apr 17 23:27:49 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 23:27:49 -0700 Subject: [gutvol-d] Re: Dim view of P3ers In-Reply-To: References: Message-ID: >Jim, I don't understand WHY you feel impelled to keep throwing stones at DP. You don't like the way we do things, you've left ... it's all behind you, right? But no, you have to join the grouch group here at PG and repeatedly attack the organization that is providing the overwhelming majority of the texts submitted to PG. If you read my comments carefully I think you will find that I try to speak truthfully to what works at DP and at PG and what doesn't work so that we all can try to fix it and make a better contribution to the world. In the business world this would be called "continuous improvement." PG'ers at least seem to be able to generally acknowledge what works and what doesn't work. 
In DP-land if you don't drink the koolaid and declare it tasty then you fall constantly under attack. If there are problems with how P3 works -- and there are -- one would think DP would want to face up to that and work to improve it -- just as in PG-land the lack of standards are causing texts to be distributed to users frequently missing or duplicating letters and words and in some cases whole paragraphs. I could say "gosh let's ignore this because DP and PG are all volunteers and their hearts are in the right places and I wouldn't want to hurt anyone's feelings" but that wouldn't change the facts: DP wastes a lot of volunteer time and in general makes things more painful than need be due to aged tools and approaches. And PG distributes a lot of stuff that ends up appearing "broken" to end users because of the standards chosen -- and/or the lack thereof. From jimad at msn.com Sat Apr 17 23:47:08 2010 From: jimad at msn.com (Jim Adcock) Date: Sat, 17 Apr 2010 23:47:08 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >> Sorry, but are you saying that you are actually currently running Stanza on >> an iPad, that you have tested this, and that it works? From what I can see >> they only have an iPod version, which yes will run on iPad -- and create a >> blurry simulation of an iPod on your iPad. > >iPads have their own iBooks App and if you search for "Project Gutenberg" >and various titles what you get seems very much not to be what you call a >"blurry simulation of an iPod on your iPad." > >I suggest that instead of taking Artistotle's thought processing to try a >way of figuring out what an iPad looks like without looking at a real one >of these gizmos that instead you just find one and actually look at it or >the next best thing, look at the online demonstrations or ask someone who >is trying one out to do some experimentation for you. 
I have done all these things. I went to an apple store and played with an iPad as soon as they came out and was underwhelmed. I compared it to an iPod and decided that if I was going to consider either one probably the iPod made more sense to me. A friend has bought an iPod and we spent an evening playing with it trying to get PG books directly to it without passing through the Steve Jobs filter. For example in the web browser we tried downloading an ePub format book from PG and Apple blocks this whereas in comparison Kindle supports it -- as do PC browsers. We downloaded and installed Stanza and it showed up as a blurry simulation of an iPod within the iPad. Again, I am asking a serious question: Are you saying that you are actually currently running Stanza on an iPad, that you have tested this, and that it works? Because I have tested it and for me it didn't work, but rather showed up as a blurry simulation of an iPod on the iPad. There are also discussions on the web about how Steve Jobs required Stanza to take out features that allowed Stanza users to share non-DRM books with friends. If you have found "good" ways to get PG directly to iPad how about discussing them in detail, what you did to have success, rather than flaming me -- because I have tried and what I have seen to date is not very encouraging. If you own an iPad and have had luck directly loading a PG book from PG onto your iPad and can read it then please share with us how because that will certainly affect my purchase decision -- or lack thereof. Yes one can use the apple ibooks app to read copies of PG books redistributed by Apple where Apple has stripped the PG legalese and acknowledgements - at least the first 20,000 titles, the most recent stuff doesn't seem to be there. I have already said this in previous emails. 
From jimad at msn.com Sun Apr 18 00:21:43 2010 From: jimad at msn.com (James Adcock) Date: Sun, 18 Apr 2010 00:21:43 -0700 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <12881.668b4509.38fb544a@aol.com> References: <12881.668b4509.38fb544a@aol.com> Message-ID: > http://z-m-l.com/misc/14465-take5.pdf First time I tried downloading this it didn't work. Tried it again later from a different computer and it worked. Tried printing out the first 10 pages. My printer reported that the document requested C5 page size - but the C series is an envelope size? I would have expected A4 or US "Letter" size. First Page title appears to print off center to the left. Contents in an unusually small font. Page numbers in an unusually large font. Ragged Right is an unusual convention for a PDF document. Body font seems to be unusually small. Line length of approx 70 chars seems unusually long for a book-like format. Most books use about 50 chars per line of text because doing so makes the book more readable. From richfield at telkomsa.net Sun Apr 18 00:48:54 2010 From: richfield at telkomsa.net (Jon Richfield) Date: Sun, 18 Apr 2010 09:48:54 +0200 Subject: [gutvol-d] Re: Eyestrain (Was Typesetting) In-Reply-To: References: <1271444206-sup-9976@zion> <4BC95B6A.7000002@telkomsa.net> Message-ID: <4BCAB966.3010805@telkomsa.net> Yes, I agree with both. I never was very comfortable with OEDMP at any age, but could read it in good light at a pinch till about 50 (can't remember exactly; memory going along with other virtues. Used to have senior moments. Now have junior moments. Not yet in my pants, but no doubt that too is on the way.)
Now, as it happens, I am (primarily) an unfrocked biologist and have discovered that the strongest "readers" I can find, (+3.5 to 4 if I am lucky) though useless for proper reading (my current prescription is +2.5) make very useful visual aids for field work and are perfect for OEDMP reading; far better than the rather good magnifier supplied with the books. BTW, in case anyone else in the forum still reads and enjoys books, paper books (a medium that needs redesign, and I am just the man to do it!) might be interested in a useful expedient that I happened across. My OEDMP came in a box/shelf with magnifier and two slots, one for each tome. As the designers of the package obviously had experience of what happened to large volumes that got manhandled by their bindings, they had a neat expedient: behind each tome a strip of tough, transparent plastic was fastened to the upper back corner of the slot, and hung down to the bottom, passing thence to the front, where it emerged as a tab below each volume. To get the volume out without brutalising it, you simply pulled at the matching tab. The volume then emerged a few inches without damage or inconvenient scrabbling, and could then be picked up in a civilised, nondestructive mode. Now, after some 40 years or so, (can't remember exactly; memory going along with other virtues. Used to have senior moments. Now have junior moments Not yet in my pants, but no doubt that too is on the way.) those strips of polyester or whatever (I omitted to burn a bit, so I am uncertain; it might have been plasticised PVC or something (can't remember exactly; memory going along with other virtues. Used to have senior moments. Now have junior moments Not yet in my pants, but no doubt that too is on the way.) ) began to go nonfunctional and their connections failed. So I removed them. Then an idea struck as my gathering senility went on strike for a while. 
Some idiot was lining a dam with plastic in the near neighbourhood and offcuts of 2mm-thick black HDPE were lying around as though waste were a virtue. I had liberated a square metre or two and cut two strips to fit where the transparent plastic had gone. Unlike the original, my inserts were much stiffer and I applied some brutal folding to make it turn the corner, but had no need to fasten it at the top back corner. It works amazingly, smoothly and cleanly, and it is harmless to book, cabinet and reader. Two moving parts, including the book. Its only shortcoming for general use on broad shelves is that one needs strips that roughly correspond to the widths of the matching books. One could design shelves and attachments to overcome that (very minor) problem, but I seldom have such a need, so I let it go. Old age and all that. Cheers, Jon On 2010/04/18 02:11 AM, Michael S. Hart wrote: > On Sat, 17 Apr 2010, Jim Adcock wrote: > > >> The way people use their eyes, the ways people read, the capabilities of >> their eyes, and their brains to process information, vary widely, and in >> ways you cannot imagine unless you personally have run into problems and >> have noticed that you have them. In the simplest almost universal case >> people start experiencing eyestrain around age 40 requiring the use of >> compensating visual orthotics. Age 40 also seems to be about the age of >> greatest denial ;-) >> > I could read the OED Microprint edition without decent lighting until 42. > > After that it was all downhill so fast I never really tried it any more-- > with or without glasses, but would use the provided Bausch& Lomb reader. > > Today I use $1 glasses with all my computers. . .I just buy ever grade of > magnification and leave each with the computer it works best with. > > I'll know I'm in trouble if/when I move to the 3x range. . .hee hee! 
From hart at pglaf.org Sun Apr 18 02:47:05 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sun, 18 Apr 2010 02:47:05 -0700 (PDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: On Sat, 17 Apr 2010, Jim Adcock wrote: > >> Sorry, but are you saying that you are actually currently running Stanza on > >> an iPad, that you have tested this, and that it works? From what I can see > >> they only have an iPod version, which yes will run on iPad -- and create a > >> blurry simulation of an iPod on your iPad. > > > >iPads have their own iBooks App and if you search for "Project Gutenberg" > >and various titles what you get seems very much not to be what you call a > >"blurry simulation of an iPod on your iPad." > > > >I suggest that instead of taking Aristotle's thought processing to try a > >way of figuring out what an iPad looks like without looking at a real one > >of these gizmos that instead you just find one and actually look at it or > >the next best thing, look at the online demonstrations or ask someone who > >is trying one out to do some experimentation for you. > > I have done all these things. I went to an apple store and played with an > iPad as soon as they came out and was underwhelmed. I compared it to an > iPod and decided that if I was going to consider either one probably the > iPod made more sense to me. A friend has bought an iPod and we spent an > evening playing with it trying to get PG books directly to it without > passing through the Steve Jobs filter. For example in the web browser we > tried downloading an ePub format book from PG and Apple blocks this whereas > in comparison Kindle supports it -- as do PC browsers.
We downloaded and > installed Stanza and it showed up as a blurry simulation of an iPod within > the iPad. Somewhere in the previous paragraph you seem to have switched from talking about "A friend has bought an iPod and we spent an evening playing with it...", to "it showed up as a blurry simulation of an iPod within the iPad", with, it would appear, no switch of topic from iPod to iPad. Was there a typo in "friend has bought an iPod" where you meant "iPad"? Or did I miss something else that indicated a change from iPod to iPad? > Again, I am asking a serious question: Are you saying that you are actually > currently running Stanza on an iPad, that you have tested this, and that it > works? Because I have tested it and for me it didn't work, but rather showed > up as a blurry simulation of an iPod on the iPad. There are also > discussions on the web about how Steve Jobs required Stanza to take out > features that allowed Stanza users to share non-DRM books with friends. I didn't mention Stanza at all, so how can you be asking me "a serious question: Are you saying you are actually running Stanza on an iPad?" Perhaps you can restate this and also enlighten us on the feature that is missing, where it is and how to use it on the other Stanza version[s]. > If you have found "good" ways to get PG directly to iPad how about > discussing them in detail, what you did to have success, I told you. . .I used the iBooks App that popped up at first turn on, and I also used the Wattpad App. If you don't like those, you might try Goodreader Lite, though I am not sure of the details; I haven't tried it yet. > rather than flaming me -- Flaming you? After all the previous harshness, you accuse ME of flaming you? Is that because I asked if you didn't try iBooks and Wattpad? Neither of which product mentions did you reply to, nor even "Thanks, but no thanks for the suggestion." Not to mention attacking me for something I said about Stanza, when I didn't even mention Stanza.
Please. . .lighten up. . .I'm on your side. . .and trying to help. > because I have tried and what I have seen to date is not very > encouraging. If you own an iPad and have had luck directly loading a PG book > from PG onto your iPad and can read it then please share with us how because > that will certainly affect my purchase decision -- or lack thereof. > Yes one can use the apple ibooks app to read copies of PG books > redistributed by Apple where Apple has stripped the PG legalese and > acknowledgements - at least the first 20,000 titles, the most recent stuff > doesn't seem to be there. I have already said this in previous emails. Yet this is a case of "had luck directly loading a PG book," though not "from PG onto your iPad" but "can read it". . . . Personally, I don't care where anyone gets our books from, just as long as we get them out to people. As for the more recent titles, yes, most people "start at the beginning, and continue on until they get to the end." However, I am guessing even if/when they catch up, there will still be some delay, as is true for any numbers of other sites that have relayed our books from us to others in format variety, or other change, that gives them a certain appeal beyond our own formats. Some of these hand out nearly as many as we do from our largest sites. Our goal is the most eBooks to the most people. All of these people are helping us do this, and we don't pay them anything. In a very real sense Apple, Amazon, et al, work for Project Gutenberg. 
From marcello at perathoner.de Sun Apr 18 03:37:55 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 18 Apr 2010 12:37:55 +0200 Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: <4BCAE103.2030105@perathoner.de> Jim Adcock wrote: >> With Stanza you can download directly from PG and many other free > publishers. > > Sorry, but are you saying that you are actually currently running Stanza on > an iPad, that you have tested this, and that it works? From what I can see > they only have an iPod version, which yes will run on iPad -- and create a > blurry simulation of an iPod on your iPad. I didn't because Apple sent me no iPad and I never bought from Apple in my life nor will I unless they radically change their business model. By Lexcycle's own claim Stanza is compatible with the iPad: http://itunes.apple.com/us/app/stanza/id284956128?mt=8 I run Stanza on a Touch and download dozens of PG ePubs every day. Just point a new 'Book Source' at m.gutenberg.org (Don't publish this url because we have not enough server horsepower behind it yet.) -- Marcello Perathoner webmaster at gutenberg.org From marcello at perathoner.de Sun Apr 18 04:23:06 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 18 Apr 2010 13:23:06 +0200 Subject: [gutvol-d] DP output is technically obsolete In-Reply-To: References: Message-ID: <4BCAEB9A.2040105@perathoner.de> Karen Lofstrom wrote: > But no, you have to join the grouch group here at > PG and repeatedly attack the organization that is providing the > overwhelming majority of the texts submitted to PG. Quantity, yes ... Let's talk *quality* instead.
The problem is not that some PPers are incompetent; the problem is that the whole DP output is technically obsolete: DP is producing 'HTML Facsimiles for the Desktop' while it should be producing eBooks. Which do you think is more useful? A book you can only read at home on your desktop or a book you can read everywhere on your phone? Ironically much of PPing clogs the queues while lessening the value of the books. DP output renders ugly on all devices except desktop-sized screens. DP HTML is almost as hard to convert to other formats as PG plain text. DP has to enforce some standard that greatly simplifies the output. -- Marcello Perathoner webmaster at gutenberg.org From hart at pglaf.org Sun Apr 18 07:11:05 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sun, 18 Apr 2010 07:11:05 -0700 (PDT) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCAEB9A.2040105@perathoner.de> References: <4BCAEB9A.2040105@perathoner.de> Message-ID: Hear! Hear! On Sun, 18 Apr 2010, Marcello Perathoner wrote: > Karen Lofstrom wrote: > > > But no, you have to join the grouch group here at > > PG and repeatedly attack the organization that is providing the overwhelming > > majority of the texts submitted to PG. > > Quantity, yes ... Let's talk *quality* instead. > > The problem is not that some PPers are incompetent; the problem is that the > whole DP output is technically obsolete: > > DP is producing 'HTML Facsimiles for the Desktop' while it should be producing > eBooks. > > Which do you think is more useful? A book you can only read at home on your > desktop or a book you can read everywhere on your phone? > > Ironically much of PPing clogs the queues while lessening the value of the > books. > > DP output renders ugly on all devices except desktop-sized screens. > > DP HTML is almost as hard to convert to other formats as PG plain text. > > DP has to enforce some standard that greatly simplifies the output.
From prosfilaes at gmail.com Sun Apr 18 07:24:57 2010 From: prosfilaes at gmail.com (David Starner) Date: Sun, 18 Apr 2010 10:24:57 -0400 Subject: [gutvol-d] Re: Dim view of P3ers In-Reply-To: References: Message-ID: On Sun, Apr 18, 2010 at 2:27 AM, Jim Adcock wrote: > In the > business world this would be called "continuous improvement." Jim, in the business world, your complaint about the fact that the business wasn't working on your preferred projects would annoy the hell out of your coworkers the eighth time they heard it, just like here. -- Kie ekzistas vivo, ekzistas espero. From traverso at posso.dm.unipi.it Sun Apr 18 08:05:09 2010 From: traverso at posso.dm.unipi.it (Carlo Traverso) Date: Sun, 18 Apr 2010 17:05:09 +0200 (CEST) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCAEB9A.2040105@perathoner.de> (message from Marcello Perathoner on Sun, 18 Apr 2010 13:23:06 +0200) References: <4BCAEB9A.2040105@perathoner.de> Message-ID: <20100418150509.8A3501008D@cardano.dm.unipi.it> >>>>> "Marcello" == Marcello Perathoner writes: Marcello> Karen Lofstrom wrote: >> But no, you have to join the grouch group here at PG and >> repeatedly attack the organization that is providing the >> overwhelming majority of the texts submitted to PG. Marcello> Quantity, yes ... Let's talk *quality* instead. Marcello> The problem is not that some PPers are incompetent, the Marcello> problem is that the whole DP output is technically Marcello> obsolete: Marcello> DP is producing 'HTML Facsimiles for the Desktop' while Marcello> it should be producing eBooks. Marcello> Which do you think is more useful? A book you can only Marcello> read at home on your desktop or a book you can read Marcello> everywhere on your phone? Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)?
If so, one can target Epub; otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this might be extremely challenging. Carlo From dakretz at gmail.com Sun Apr 18 09:20:03 2010 From: dakretz at gmail.com (don kretz) Date: Sun, 18 Apr 2010 09:20:03 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> Message-ID: It really doesn't matter what DP targets as long as it's capable of identifying, completely and unambiguously, the requisite syntactic elements. But we have no agreed list, not even an ad hoc functional one, of what those are. Instead our focus is on subjective elegance of appearance rather than on objective clarity and completeness. "Good work" has come to be associated with "looks pretty and makes the PPer feel good," plus the ability to pass two sets of incompletely documented and sometimes inconsistent automated tests - the postprocessor tools and the whitewashers' tools - neither of which was intended to consider syntactic rigor and accuracy. Interestingly, we seem to have instinctively inferred the need for this: the HTML texts often include some basic form of it (or more accurately an ad hoc collection of basic forms) in the CSS stylesheets. It seems to me that we need the "what" before we worry about the "how". Don On Sun, Apr 18, 2010 at 8:05 AM, Carlo Traverso wrote: > >>>>> "Marcello" == Marcello Perathoner writes: > > Marcello> Karen Lofstrom wrote: > > >> But no, you have to join the grouch group here at PG and > >> repeatedly attack the organization that is providing the > >> overwhelming majority of the texts submitted to PG. > > Marcello> Quantity, yes ... Let's talk *quality* instead.
> > Marcello> The problem is not that some PPers are incompetent, the > Marcello> problem is that the whole DP output is technically > Marcello> obsolete: > > Marcello> DP is producing 'HTML Facsimiles for the Desktop' while > Marcello> it should be producing eBooks. > > Marcello> Which do you think is more useful? A book you can only > Marcello> read at home on your desktop or a book you can read > Marcello> everywhere on your phone? > > Is PG ready to accept Epub as submission format? (i.e. one submits a > valid epub from which the other formats are derived)? If so, one can > target Epub, otherwise at best one is forced to submit HTML or txt > that converts not-too-badly with current PG tools, and this might be > extremely challenging. > > Carlo From ajhaines at shaw.ca Sun Apr 18 09:29:58 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sun, 18 Apr 2010 09:29:58 -0700 Subject: [gutvol-d] Reporting errors in PG files (was Dim view of P3ers) Message-ID: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> Jim Adcock wrote: >just as in PG-land the lack of standards are causing >texts to be distributed to users frequently missing or duplicating letters >and words and in some cases whole paragraphs. Errors in PG's files should be reported to the Errata system: errata2010_AT_pglaf.org Error reports should be as specific as possible. Mention the etext number, the line number(s), the line(s) of text in question, and the proposed correction(s) to each. If there are many errors, feel free to download and correct the existing files, and send them to the above address. (Don't re-wrap; don't touch the PG header or footer.)
If you feel that a text can be fixed only by a complete re-do (maybe it's missing the illustrations, the index, or whatever), feel free to download a scanset, get a copyright clearance, and have at it. When the new fileset is submitted through the normal process, mention the text number that it's an update/correction/replacement for. The original producer's credit will be added to yours, the original etext will be archived, and the new version posted (under the original etext number). Simply complaining about errors isn't useful, nor are general complaints, especially concerning older texts, such as "italics aren't shown" or "all-caps are used for italics, not underscores". Al From marcello at perathoner.de Sun Apr 18 09:35:36 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Sun, 18 Apr 2010 18:35:36 +0200 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> Message-ID: <4BCB34D8.8090908@perathoner.de> Carlo Traverso wrote: >>>>>> "Marcello" == Marcello Perathoner writes: > > Marcello> Karen Lofstrom wrote: > > >> But no, you have to join the grouch group here at PG and > >> repeatedly attack the organization that is providing the > >> overwhelming majority of the texts submitted to PG. > > Marcello> Quantity, yes ... Let's talk *quality* instead. > > Marcello> The problem is not that some PPers are incompetent, the > Marcello> problem is that the whole DP output is technically > Marcello> obsolete: > > Marcello> DP is producing 'HTML Facsimiles for the Desktop' while > Marcello> it should be producing eBooks. > > Marcello> Which do you think is more useful? A book you can only > Marcello> read at home on your desktop or a book you can read > Marcello> everywhere on your phone? > > Is PG ready to accept Epub as submission format? (i.e.
one submits a > valid epub from which the other formats are derived)? If so, one can > target Epub, otherwise at best one is forced to submit HTML or txt > that converts not-too-badly with current PG tools, and this might be > extremely challenging. That is not the problem. You can botch ePub as easily as you can HTML. (In fact, ePub is only HTML + some metadata.) You should produce HTML that is *semantically* correct and degrades gracefully. I.e., if you remove all CSS it should still make sense. The most prominent offenders are non-semantic headers, preformatted text, positioning, floating and ornaments. -- Marcello Perathoner webmaster at gutenberg.org From gbnewby at pglaf.org Sun Apr 18 10:05:36 2010 From: gbnewby at pglaf.org (Greg Newby) Date: Sun, 18 Apr 2010 10:05:36 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> Message-ID: <20100418170536.GA22578@pglaf.org> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso wrote: > Is PG ready to accept Epub as submission format? (i.e. one submits a > valid epub from which the other formats are derived)? If so, one can > target Epub, otherwise at best one is forced to submit HTML or txt > that converts not-too-badly with current PG tools, and this might be > extremely challenging. > > Carlo I don't think we're ready for this except in rare cases where ePub is the best format for display for a particular item (we just released a book where PDF was the best format, believe it or not). The challenge is that when books are fixed, someone (typically the whitewasher, seldom the original submitter) needs to regenerate all the files from that book. Since there is not yet any standard processing stream to generate static ePub files, this makes it hard for fixes (to HTML & text) to be applied to ePubs.
I would, of course, love to see something become our "standard" conversion tool, usable by anyone. Right now, the closest for PG is Marcello's software to build the cached ePub files. It's wonderful and functional, but is it ready for all envisioned purposes? I think not, due at least in part to shortcomings of the input HTML. ALL that said, maybe I am too hung up on automated or semi-automated methods. It *is* the case that an ePub can yield plain HTML, which could be edited and zipped up into a new ePub (without too much trouble). Is there enough benefit in such ePubs? Are there good examples of hand-crafted ePubs (or automated ones, but using different software than is used on the gutenberg.org server) that are far superior to the alternatives? Having a single master format, from which all subsidiary formats can be derived, has been a long-time goal. This has not yet been viable for most titles, despite valiant (and productive) efforts with HTML and TeX. From everything I've seen about ePub, adding static ePub files to the collection would be a net increase in the effort needed to apply fixes (i.e., it would be one MORE format to deal with by hand, not a generated format that would be very little extra work to generate). There are lots of people involved in creating, managing and fixing eBook files, and there is certainly room for any experiments that people can think of. My response isn't intended to quell such effort, rather to state that given the current state of things, I don't think ePub is a great candidate for a new static file format for the PG collection.
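[Editorial sketch, not part of Greg's message.] The remark above that an ePub "can yield plain HTML, which could be edited and zipped up into a new ePub" can be illustrated roughly as follows. This is only a sketch under stated assumptions: the function names unpack_epub and repack_epub are hypothetical, nothing here is PG's or Marcello's actual tooling, and the one detail not taken from the thread (that the "mimetype" entry should come first in the archive and be stored uncompressed) comes from the EPUB container format rather than from this discussion.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, work_dir):
    """Extract every file in the ePub (an ordinary zip archive) into work_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(work_dir)


def repack_epub(work_dir, epub_path):
    """Zip work_dir back into an ePub after the HTML inside has been edited."""
    root = Path(work_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Assumption from the EPUB container format: "mimetype" goes first
        # and is stored without compression.
        zf.write(root / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

The output file should be written outside work_dir so that the new archive is not swept up into itself. Note that this only covers the mechanical round trip; it does nothing about the harder problem raised above of keeping hand-edited ePubs in step with fixes made to the HTML and text versions.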
-- Greg From dakretz at gmail.com Sun Apr 18 10:07:35 2010 From: dakretz at gmail.com (don kretz) Date: Sun, 18 Apr 2010 10:07:35 -0700 Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) In-Reply-To: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> Message-ID: It seems to me that error identification, reporting, verification and repair would be a lot easier if PG provided easily-accessible on-line access to the page images, and a form to provide the required information, and least for point cases. Then the reporting person could just find the page, check the image you're going to use for verification, and narrow things down for processing. On Sun, Apr 18, 2010 at 9:29 AM, Al Haines (shaw) wrote: > Jim Adcock wrote: > > just as in PG-land the lack of standards are causing >> texts to be distributed to users frequently missing or duplicating letters >> and words and in some cases whole paragraphs. >> > > Errors in PG's files should be reported to the Errata system: > errata2010_AT_pglaf.org > > Error reports should be as specific as possible. Mention the etext number, > the line number(s), the line(s) of text in question, and the proposed > correction(s) to each. If there are many errors, feel free to download and > correct the existing files, and send them to the above address. (Don't > re-wrap; don't touch the PG header or footer.) > > If you feel that a text can be fixed only by a complete re-do (maybe it's > missing the illustrations, the index, or whatever), feel free to download a > scanset, get a copyright clearance, and have at it. When the new fileset is > submitted through the normal process, mention the text number that it's an > update/correction/replacement for. The original producer's credit will be > added to yours, the original etext will be archived, and the new version > posted (under the original etext number). 
> > Simply complaining about errors isn't useful, nor are general complaints, > especially concerning older texts, such as "italics aren't shown" or > "all-caps are used for italics, not underscores". > > Al > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gbnewby at pglaf.org Sun Apr 18 10:29:18 2010 From: gbnewby at pglaf.org (Greg Newby) Date: Sun, 18 Apr 2010 10:29:18 -0700 Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) In-Reply-To: References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> Message-ID: <20100418172918.GA24296@pglaf.org> On Sun, Apr 18, 2010 at 10:07:35AM -0700, don kretz wrote: > It seems to me that error identification, reporting, verification and repair > would be > a lot easier if PG provided easily-accessible on-line access to the page > images, We post 'em when we get 'em. There is guidance for the file naming convention on images. Mostly we do not get page images. In the case of DP, a few people have provided page images after the eBooks were posted. But this does not seem to be a part of the regular DP processing chain. > and a form to provide the required information, and least for point cases. A form... maybe. I am not sure this would make things any easier to fix (for the fixers -- there are only three people who regularly apply fixes -- Al is one of them, so his views carry more weight than mine!). But it might make it easier for people to report errata. > Then the reporting person could just find the page, check the image you're > going > to use for verification, and narrow things down for processing. Sure. Only some errors require checking page images, but it would be nice to have them. It would be nice to have them for numerous purposes to which our readers might put them. 
-- Greg > On Sun, Apr 18, 2010 at 9:29 AM, Al Haines (shaw) wrote: > > > Jim Adcock wrote: > > > > just as in PG-land the lack of standards are causing > >> texts to be distributed to users frequently missing or duplicating letters > >> and words and in some cases whole paragraphs. > >> > > > > Errors in PG's files should be reported to the Errata system: > > errata2010_AT_pglaf.org > > > > Error reports should be as specific as possible. Mention the etext number, > > the line number(s), the line(s) of text in question, and the proposed > > correction(s) to each. If there are many errors, feel free to download and > > correct the existing files, and send them to the above address. (Don't > > re-wrap; don't touch the PG header or footer.) > > > > If you feel that a text can be fixed only by a complete re-do (maybe it's > > missing the illustrations, the index, or whatever), feel free to download a > > scanset, get a copyright clearance, and have at it. When the new fileset is > > submitted through the normal process, mention the text number that it's an > > update/correction/replacement for. The original producer's credit will be > > added to yours, the original etext will be archived, and the new version > > posted (under the original etext number). > > > > Simply complaining about errors isn't useful, nor are general complaints, > > especially concerning older texts, such as "italics aren't shown" or > > "all-caps are used for italics, not underscores". 
> > > > Al > > > > > > _______________________________________________ > > gutvol-d mailing list > > gutvol-d at lists.pglaf.org > > http://lists.pglaf.org/mailman/listinfo/gutvol-d > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From ajhaines at shaw.ca Sun Apr 18 10:58:06 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sun, 18 Apr 2010 10:58:06 -0700 Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> Message-ID: <64E735529F244281A37EF3F097BFCBA5@alp2400> The only page scans PG has are those that may have been submitted by the preparer. (Joshua Hutchinson has submitted many scansets of DP productions.) Some DP submitters provide page scans linked to page numbers in the HTML version, but this is rare. (I don't think I've ever seen a scanset from an independent producer.) The Whitewashers, a.k.a. the Errata Team, simply aren't equipped to find, download, and process pagescans for the submissions they handle. Any questions/policy concerning making pagescans mandatory, e.g. the cost/amount of the increased drive space needed, I leave to Greg/Michael. An errata submission webform would be useful. (Some emailed errata reports are sadly lacking in detail.) Maybe sometime when Greg has a student intern? ----- Original Message ----- From: don kretz To: Project Gutenberg Volunteer Discussion Sent: Sunday, April 18, 2010 10:07 AM Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) It seems to me that error identification, reporting, verification and repair would be a lot easier if PG provided easily-accessible on-line access to the page images, and a form to provide the required information, and least for point cases. 
Then the reporting person could just find the page, check the image you're going to use for verification, and narrow things down for processing. On Sun, Apr 18, 2010 at 9:29 AM, Al Haines (shaw) wrote: Jim Adcock wrote: just as in PG-land the lack of standards are causing texts to be distributed to users frequently missing or duplicating letters and words and in some cases whole paragraphs. Errors in PG's files should be reported to the Errata system: errata2010_AT_pglaf.org Error reports should be as specific as possible. Mention the etext number, the line number(s), the line(s) of text in question, and the proposed correction(s) to each. If there are many errors, feel free to download and correct the existing files, and send them to the above address. (Don't re-wrap; don't touch the PG header or footer.) If you feel that a text can be fixed only by a complete re-do (maybe it's missing the illustrations, the index, or whatever), feel free to download a scanset, get a copyright clearance, and have at it. When the new fileset is submitted through the normal process, mention the text number that it's an update/correction/replacement for. The original producer's credit will be added to yours, the original etext will be archived, and the new version posted (under the original etext number). Simply complaining about errors isn't useful, nor are general complaints, especially concerning older texts, such as "italics aren't shown" or "all-caps are used for italics, not underscores". Al _______________________________________________ gutvol-d mailing list gutvol-d at lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d ------------------------------------------------------------------------------ _______________________________________________ gutvol-d mailing list gutvol-d at lists.pglaf.org http://lists.pglaf.org/mailman/listinfo/gutvol-d -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ajhaines at shaw.ca Sun Apr 18 11:12:36 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sun, 18 Apr 2010 11:12:36 -0700 Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> <20100418172918.GA24296@pglaf.org> Message-ID: <5EDA62DFBF564575AAFA1CD75287E15D@alp2400> Greg said: > A form... maybe. I am not sure this would make things any easier to > fix (for the fixers -- there are only three people who regularly apply > fixes -- Al is one of them, so his views carry more weight than mine!). > But it might make it easier for people to report errata. A webform would (hopefully) make reporting more consistent, possibly with such mandatory fields as etext number, title, and author. (Yes, the occasional report arrives with none of them, only a pre-10K filename, which has to be tracked down in the gutindex files to find the etext number.) However, the current volume of errata reports (several/week, if that), probably doesn't make the work of creating such a form worth while. And, agreed--it wouldn't help the actual correction process. ----- Original Message ----- From: "Greg Newby" To: "Project Gutenberg Volunteer Discussion" Sent: Sunday, April 18, 2010 10:29 AM Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) > On Sun, Apr 18, 2010 at 10:07:35AM -0700, don kretz wrote: >> It seems to me that error identification, reporting, verification and >> repair >> would be >> a lot easier if PG provided easily-accessible on-line access to the page >> images, > > We post 'em when we get 'em. There is guidance for the file naming > convention on images. > > Mostly we do not get page images. In the case of DP, a few people > have provided page images after the eBooks were posted. But this does > not seem to be a part of the regular DP processing chain. > >> and a form to provide the required information, and least for point >> cases. > > A form... maybe. 
I am not sure this would make things any easier to > fix (for the fixers -- there are only three people who regularly apply > fixes -- Al is one of them, so his views carry more weight than mine!). > But it might make it easier for people to report errata. > >> Then the reporting person could just find the page, check the image >> you're >> going >> to use for verification, and narrow things down for processing. > > Sure. Only some errors require checking page images, but it would > be nice to have them. It would be nice to have them for numerous > purposes to which our readers might put them. > -- Greg > >> On Sun, Apr 18, 2010 at 9:29 AM, Al Haines (shaw) >> wrote: >> >> > Jim Adcock wrote: >> > >> > just as in PG-land the lack of standards are causing >> >> texts to be distributed to users frequently missing or duplicating >> >> letters >> >> and words and in some cases whole paragraphs. >> >> >> > >> > Errors in PG's files should be reported to the Errata system: >> > errata2010_AT_pglaf.org >> > >> > Error reports should be as specific as possible. Mention the etext >> > number, >> > the line number(s), the line(s) of text in question, and the proposed >> > correction(s) to each. If there are many errors, feel free to download >> > and >> > correct the existing files, and send them to the above address. (Don't >> > re-wrap; don't touch the PG header or footer.) >> > >> > If you feel that a text can be fixed only by a complete re-do (maybe >> > it's >> > missing the illustrations, the index, or whatever), feel free to >> > download a >> > scanset, get a copyright clearance, and have at it. When the new >> > fileset is >> > submitted through the normal process, mention the text number that it's >> > an >> > update/correction/replacement for. The original producer's credit will >> > be >> > added to yours, the original etext will be archived, and the new >> > version >> > posted (under the original etext number). 
>> > >> > Simply complaining about errors isn't useful, nor are general >> > complaints, >> > especially concerning older texts, such as "italics aren't shown" or >> > "all-caps are used for italics, not underscores". >> > >> > Al >> > >> > >> > _______________________________________________ >> > gutvol-d mailing list >> > gutvol-d at lists.pglaf.org >> > http://lists.pglaf.org/mailman/listinfo/gutvol-d >> > > >> _______________________________________________ >> gutvol-d mailing list >> gutvol-d at lists.pglaf.org >> http://lists.pglaf.org/mailman/listinfo/gutvol-d > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From Bowerbird at aol.com Sun Apr 18 12:26:53 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 18 Apr 2010 15:26:53 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: <10bbb.720a5730.38fcb6fd@aol.com> boy oh boy, it must really be _painful_ to marcello that he has to now be saying the very same things that -- when i said them here for years and years -- he constantly disagreed with, and called me names. assholes really hate to admit that they were wrong... you know he had to resist it for a long time, but still, eventually, one just cannot dispute the truth, can one? *** greg said: > Having a single master format, from which > all subsidiary formats can be derived, has been > a long-time goal.? This has not yet been viable > for most titles, despite valiant (and productive) > efforts with HTML and TeX. unfortunately, if you've been _ignoring_ what has happened here over the years -- as dr. newby has, evidently -- then you might not realize that i have already proven z.m.l. can be that "master format"... that's right, greg! while you've supposedly been "looking for" such a format, i've been right here, shoving it up under your nose. do you see it now? 
-bowerbird From Bowerbird at aol.com Sun Apr 18 12:34:36 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 18 Apr 2010 15:34:36 EDT Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) Message-ID: <110e5.659b7916.38fcb8cc@aol.com> greg said: > There is guidance for the file naming convention on images. evidently, as part of his general ignorance of what goes on here, dr. newby missed my recent devastating critique of this "convention". -bowerbird From Bowerbird at aol.com Sun Apr 18 12:46:55 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 18 Apr 2010 15:46:55 EDT Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) Message-ID: <11976.78437e8e.38fcbbaf@aol.com> al said: > An errata submission webform would be useful. > (Some emailed errata reports are sadly lacking in detail.) > Maybe sometime when Greg has a student intern? my gawd. what a bunch of idiots we have in charge here. who needs a "student intern" to create a darn web-form? i devised a whole error-reporting system for you to use. it's up on my site right now. *** as for the page-images, we have equivalent stupidity... the page-images for the books that d.p. processes are sitting at d.p. for the entire time that the book is there. p.g. could scrape them with the greatest of ease, _if_ it truly wanted them. why should anyone be forced to "submit" them? do we _enjoy_ wasting people's time? look... if y'all are too busy to do things correctly, fine. but at least empower somebody else to do the job, ok? because otherwise your stupidity starts to look _willful_. -bowerbird
From Morasch at aol.com Sun Apr 18 13:06:47 2010 From: Morasch at aol.com (Morasch at aol.com) Date: Sun, 18 Apr 2010 16:06:47 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <40a9b.162dd4d7.38fcc057@aol.com> michael mcd said: > the PDF looks quite good. Offhand, though, > the page numbers look like they drop low enough > that they _could_ be out of the printable area. you're right. so i fixed that. download it again... > http://z-m-l.com/misc/14465-take5.pdf > shorter works are poor representatives of the problem still, you don't need 350 pages to cover the waterfront, especially when 315 of them have nothing happening... *** jim said: > First time I tried downloading this it didn't work. Tried > it again later from a different computer and it worked. don't know what to tell you, jim. > Tried printing out the first 10 pages. > My printer reported that the document requested C5 page size > -- but the C series is an envelope size? > I would have expected A4 or US "Letter" size. the pagesize is 5.5*8.5; that's what michael wanted. eventually, how you will print it out will depend on what you intend to do with it in terms of _binding_. for this preview, you have 2 convenient options... you can print it out 2-up, on letter-size, using the "layout" method you should find in the print dialog. for enhanced realism, slice pages down the middle. or you can print it out on 5.5*8.5, which is available at most any office-supplies stores, in my experience. we'll discuss printing and binding more, at a later time. > First Page title appears to print off center to the left. looks pretty dead-on centered to me, at least in the .pdf. did you print to 5.5*8.5 paper? if so, it should be right... > Contents in an unusually small font correct... my preference is for the contents section to be shown on 1 page, 2 pages max, so i had to cramp the font. i woulda reworked it manually if i wanted to take the time.
reworking entails moving the chapters to the first page of each _book_ section, so the contents section up front just contains the entries relating to the _parts_ and the _books._ it's issues like these that get into the nitty-gritty questions on how automated you want the whole process to become. the easiest solution would be to run the table of contents over to 3 pages, or 4, or whatever it happens to be... but that approach doesn't produce a lot of satisfaction for me. so the question for me is, "how hard is it to automate what i would _really_ like to do in various situations like these?" since i was doing this by hand, to get on the same page with michael, i was willing to do a little manual massage. > Page numbers in an unusually large font the text-editor i used to create this .pdf does it that way, and as far as i can see, there's no way that i can control it. but that's not the way my program does it. so it's not something we really need to worry about... > Ragged Right is an unusual convention for a PDF document michael expressed no preference; ragged-right was easier. > Body font seems to be unusually small. i think so too. it's 10-point times new roman, i believe. (yep, that's it, checked.) but michael has the young eyes. what i was doing, in case it wasn't totally clear to people, was retaining the existing linebreaks from the p.g. e-text. in order to get (reasonable) half-inch margins on each side, i had to reduce point-size down to what _i_ feel is too small. ...but i wasn't doing this for me; i was doing it for michael... > Line length of approx 70 chars seems unusually long for a > book-like format.? Most books use about 50 chars per line > of text because doing so makes the book more readable. so you didn't notice that i was using the existing p.g. linebreaks. there were several tip-offs. first and foremost, the first line of each paragraph is too long, because of the indent i introduced. 
second, the lines are more ragged than they should be, because line-length decisions are made on a monospace character count, not on a metric based on the line's proportionally-spaced width. (_any_ proportionally-spaced metric is better than monospacing, since the proportions are highly correlated across various fonts.) and thirdly, like i said, and you said, the line-lengths are too long. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From gbnewby at pglaf.org Sun Apr 18 13:18:16 2010 From: gbnewby at pglaf.org (Greg Newby) Date: Sun, 18 Apr 2010 13:18:16 -0700 Subject: [gutvol-d] Seeking current/past bookshelf maintainers Message-ID: <20100418201816.GB30665@pglaf.org> We used to have a bookshelf@ alias, but I've lost track of who this is supposed to go to. If it's you, or you know who is (was?) maintaining the bookshelf area of the gutenberg.org wikispace, please drop me a note. This page is all we have for contact info, but it's currently a dead end: http://www.gutenberg.org/wiki/Gutenberg:Bookshelf_Contributions Thanks in advance. -- Greg (a.k.a. one of the idiots in charge) From ricardofdiogo at gmail.com Sun Apr 18 14:44:52 2010 From: ricardofdiogo at gmail.com (Ricardo F Diogo) Date: Sun, 18 Apr 2010 22:44:52 +0100 Subject: [gutvol-d] Re: Seeking current/past bookshelf maintainers In-Reply-To: <20100418201816.GB30665@pglaf.org> References: <20100418201816.GB30665@pglaf.org> Message-ID: It was Robert Marquardt, who died in Dec 2007. From hart at pglaf.org Sun Apr 18 15:17:07 2010 From: hart at pglaf.org (Michael S. Hart) Date: Sun, 18 Apr 2010 15:17:07 -0700 (PDT) Subject: [gutvol-d] Re: Dim view of P3ers In-Reply-To: References: Message-ID: David Starner, if you only would be willing to take your own advice. So much of what you say here, and I've said it before, is complaint, without you providing any hope of solution. As I have said before-- there is a word for this, but it is not used in polite conversation. 
If only you took EITHER your own advice OR your signature block: "Kie ekzistas vivo, ekzistas espero." at all seriously, then we would be glad to hear from you; however, it turns out that all too much of what you say goes to /dev/null or the various other killfiles people use to filter you out. Now. . .please. . .give some hope. . .or you will most certainly see the result of using vinegar rather than honey to get what you want-- presuming you really do want things to get/work better. Please. . .take a lesson from your own words. . . . You once said something like: As an honest person I am willing to learn from my mistakes. . . . Please do. . . . On Sun, 18 Apr 2010, David Starner wrote: > On Sun, Apr 18, 2010 at 2:27 AM, Jim Adcock wrote: > > In the > > business world this would be called "continuous improvement." > > Jim, in the business world, your complaint about the fact that the business > wasn't working on your preferred projects would annoy the hell out of > your coworkers the eighth time they heard it, just like here. > > -- > Kie ekzistas vivo, ekzistas espero. > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From Bowerbird at aol.com Sun Apr 18 15:54:58 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 18 Apr 2010 18:54:58 EDT Subject: [gutvol-d] Re: the idiots in charge Message-ID: <41f08.7e064389.38fce7c2@aol.com> greg said: > -- Greg (a.k.a. one of the idiots in charge) well, good, at least you have a sense of humor about it. :+) look, i apologize if i have criticized anyone unduly, as my intentions are not to hurt anyone's feelings... we've discussed some of these topics over and over, and even though some solutions are rather _obvious_ and we seem to have people willing to implement 'em, nonetheless _nothing_ ever seems to get done.
_ever._ and when the topics come up again, and again and again, the people in charge act as if dialog has never been held. so the only conclusion that seems plausible out here in volunteer-land is that nobody at the top is _listening_... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From traverso at posso.dm.unipi.it Sun Apr 18 19:18:56 2010 From: traverso at posso.dm.unipi.it (Carlo Traverso) Date: Mon, 19 Apr 2010 04:18:56 +0200 (CEST) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <20100418170536.GA22578@pglaf.org> (message from Greg Newby on Sun, 18 Apr 2010 10:05:36 -0700) References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> Message-ID: <20100419021856.6C702100B0@cardano.dm.unipi.it> >>>>> "Greg" == Greg Newby writes: Greg> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso Greg> wrote: >> Is PG ready to accept Epub as submission format? (i.e. one >> submits a valid epub from which the other formats are derived)? >> If so, one can target Epub, otherwise at best one is forced to >> submit HTML or txt that converts not-too-badly with current PG >> tools, and this migh be extremely challenging. >> >> Carlo Greg> I don't think we're ready for this except in rare cases Greg> where ePub is the best format for display for a particular Greg> item (we just released a book where PDF was the best format, Greg> believe it or not). Greg> The challenge is that when books are fixed, someone Greg> (typically the whitewasher, seldom the original submitter) Greg> needs to regenerate all the files from that book. Greg> Since there is not yet any standard processing stream to Greg> generate static ePub files, this makes it hard for fixes (to Greg> HTML & text) to be applied to ePubs. Greg> I would, of course, love to see something become our Greg> "standard" conversion tool, usable by anyone. 
Right now, Greg> the closest for PG is Marcello's software to build the Greg> cached ePub files. It's wonderful and functional, but is it Greg> ready for all envisioned purposes? I think not, due at Greg> least in part to shortcomings of the input HTML. That's the whole point of my proposal. Starting with hand-crafted HTML we are likely to end with poor ePub, since the inference of metadata might be wrong, and many features of HTML need to be tuned to ePub and might not turn out correct; While obtaining reasonable HTML from ePub is just unzipping and discarding metadata. Maybe it will be harder to have "nicely handcrafted" HTML, but we have to give the best available product in the standard format that most users are likely to use (and of course a reasonable product in every other format). To maintain ePub (to correct typos) one has to unzip the ePub, correct the HTML and re-zip. Another issue is to automate the creation of txt from HTML. Currently, the output of w3m -dump (or links -dump, or lynx -dump etc.) is pretty good for txt, except that font changes (mainly, underscores for italics) are lost. It shouldn't be difficult to pre-process the HTML to show the underscores for italics, in such a way that one obtains a reasonable PG txt file. This might work better from the HTML generated from epub (in which the HTML is more constrained) than for handcrafted HTML. It might be a bit more challenging to downgrade from UTF-8 (as generated by -dump) to iso-8859-1 or to ASCII, for example to handle the unicode characters that are used to draw tables, but this might be very well automated too. This is on my side an offer to work towards the production of a toolchain along these lines, if it is not discarded a priori. 
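The two automation steps proposed above (mark italics with underscores before the HTML goes through w3m -dump or a similar dumper, then downgrade the UTF-8 result to iso-8859-1) might be sketched as follows. This is only an illustration under stated assumptions: the names italics_to_underscores, downgrade and FALLBACK are hypothetical, the regular expression handles only plain <i> and <em> tags, and the box-drawing rule is a rough approximation that turns table rules into -, | and +.

```python
import re
import unicodedata

# Hypothetical replacement table for a few common non-Latin-1 characters.
FALLBACK = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
            "\u2014": "--"}


def italics_to_underscores(html):
    """Replace <i>/<em> open and close tags with PG-style underscores."""
    return re.sub(r"</?(?:i|em)\b[^>]*>", "_", html, flags=re.IGNORECASE)


def downgrade(text, encoding="iso-8859-1"):
    """Replace characters the target encoding cannot represent."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)  # representable, keep as is
            continue
        except UnicodeEncodeError:
            pass
        name = unicodedata.name(ch, "")
        if name.startswith("BOX DRAWINGS"):
            # Corners and junctions have " AND " in their Unicode names.
            if " AND " in name:
                out.append("+")
            elif "HORIZONTAL" in name:
                out.append("-")
            elif "VERTICAL" in name:
                out.append("|")
            else:
                out.append("+")
        else:
            out.append(FALLBACK.get(ch, "?"))
    return "".join(out)
```

The pre-processed HTML would then be handed to the dumper, and downgrade applied to its output; anything that ends up as "?" would need a human decision, so a real toolchain would probably want to report those positions instead of replacing them silently.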
Carlo From Bowerbird at aol.com Sun Apr 18 19:40:33 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 18 Apr 2010 22:40:33 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: <1fdf6.4812f09b.38fd1ca1@aol.com> carlo said: > Another issue is to automate the creation of txt from HTML. why do it backwards? when it's done correctly, the .txt file can create the .html... and an xhtml file, if that's what you want. and your .epub. plus it can generate whatever kind of .pdf you might want... don't you realize how stupid you sound when you say this? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcello at perathoner.de Mon Apr 19 02:15:00 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 19 Apr 2010 11:15:00 +0200 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <20100419021856.6C702100B0@cardano.dm.unipi.it> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> Message-ID: <4BCC1F14.1090801@perathoner.de> Carlo Traverso wrote: >>>>>> "Greg" == Greg Newby writes: > > Greg> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso > Greg> wrote: > >> Is PG ready to accept Epub as submission format? (i.e. one > >> submits a valid epub from which the other formats are derived)? > >> If so, one can target Epub, otherwise at best one is forced to > >> submit HTML or txt that converts not-too-badly with current PG > >> tools, and this migh be extremely challenging. > >> > >> Carlo > > Greg> I don't think we're ready for this except in rare cases > Greg> where ePub is the best format for display for a particular > Greg> item (we just released a book where PDF was the best format, > Greg> believe it or not). 
> > Greg> The challenge is that when books are fixed, someone > Greg> (typically the whitewasher, seldom the original submitter) > Greg> needs to regenerate all the files from that book. > > Greg> Since there is not yet any standard processing stream to > Greg> generate static ePub files, this makes it hard for fixes (to > Greg> HTML & text) to be applied to ePubs. > > Greg> I would, of course, love to see something become our > Greg> "standard" conversion tool, usable by anyone. Right now, > Greg> the closest for PG is Marcello's software to build the > Greg> cached ePub files. It's wonderful and functional, but is it > Greg> ready for all envisioned purposes? I think not, due at > Greg> least in part to shortcomings of the input HTML. > > That's the whole point of my proposal. Starting with hand-crafted HTML > we are likely to end with poor ePub, since the inference of metadata > might be wrong, and many features of HTML need to be tuned to ePub and > might not turn out correct; And what about users who download the HTML to view on a mobile device? You must produce better HTML not for the sake of ePub but for the sake of universal usability. The metadata come directly from the PG database and are updated whenever the PG database changes. That makes our metadata far more consistent than your proposal would. > While obtaining reasonable HTML from ePub > is just unzipping and discarding metadata. ePub HTML is often split into chapters, which may leave you with 50+ files after unzipping, which you then have to merge manually. > This is on my side an offer to work towards the production of a > toolchain along these lines, if it is not discarded a priori. Before that can happen, a major 'paradigm shift' has to happen at DP. At DP the PPers enjoy pushing their pet preferences down the readers' throats: "What *I* See Is What You Get."
And most PP time is spent in weaving those personal preference deep into the markup so as to make the markup pretty useless for anything but desktop devices with lots of screen, lots of cycles and lots of RAM. What the PPers should do is to produce light semantic markup that lets the user choose the presentation and device: "Get It The Way You Want." The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted. -- Marcello Perathoner webmaster at gutenberg.org From jimad at msn.com Mon Apr 19 06:45:02 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 06:45:02 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >Was there are typo in "friend has bought an iPod" where you meant "iPad"?, or did I miss something else that indicated changes from iPod to iPad? Yes Typo Sorry iPad iPod sometimes (always?) Apple is too clever for its own good. >I didn't mention Stanza at all, so how can you be asking me "a serious question: Are you saying you are actually running Stanza on an iPad?" You trashed me for playing mind games whereas I was actually describing my actual experience with an iPad. 
I bragged to my friend who had bought an iPad how you could get these wonderful PG books on the iPad for free and then proceeded to try to show him "all I know" about the subject -- and of course nothing I tried to show him about reading these wonderful PG books on the iPad actually worked in practice -- except we discovered that Apple has ported at least a subset of PG books to the iBooks App stripping out the PG header and acknowledgements in the process and "locked" the books to the iBooks Applet which I think is actually a pretty bad reader app when you get right down to it not even allowing one to set the margins....but it does contain fluff such as animated page turns which is cute for about the first five pages. >Perhaps you can restate this and also enlighten us on the feature that is missing, where it and how to use it on the other Stanza version[s]. As BB said Lexcycle which is now owned by Amazon doesn't appear to be releasing a copy of Stanza for the iPad. On the contrary when you download Stanza for the iPad from the Apple Store what you get is a copy of Stanza for the iPod which shows up within the iPod simulator build into the iPad. The text of that iPod simulator has been "zoomed in on" without even substituting a higher-rez version of the text resulting in a very blurry read. The controversy about the Apple "censorship" of Stanza can be found here: http://www.google.com/search?q=Apple+Stanza+USB For example quote "Lexcycle's Stanza e-book reader for the iPhone and iPod touch has been stripped of USB book sharing, at the request of Apple...." where "at the request of Apple" means "if you don't do what we say you can't distribute your app via the Apple Store which is the only way to distribute your app." >> If you have found "good" ways to get PG directly to iPad how about >> discussing them in detail, what you did to have success, > >I told you. . .I used the iBooks App that popped up at first turn on, >and I also used the Wattpad App. 
Sorry, but I don't know about the Wattpad App but I'm pretty sure the iBooks App doesn't allow one to directly load PG books from the PG site. Or have you discovered something I didn't discover? Why do I care? I want to be able to read what I want to read, and I want to be able to use the internet and wifi to do so to get what I want to read where I want to get it. I don't want to send my $500 to Steve Jobs in order to *test* whether or not he has locked down the iPad so much I cannot read what I want to read. Nook is worthless, for example -- too locked down. Has wifi which could be great -- but B&N doesn't actually let you use that which you have paid for. Kindle has weak and slow whispernet/AT&T connection which is troublesome here in the 'burbs, also PG seems to be leaning towards ePub instead of MOBI, which begs the question of long-term viability of MOBI -- but ePub in turn has problems of dueling distributors and incompatible DRM schemes.... iPad says it allows you to transfer books via USB and the iTunes, but if you have to plug in a USB cable then nook and Kindle have the same capabilities so then why bother putting in wifi in the first place? If you don't let the purchaser use it? >Is that because I asked if you didn't try iBooks and Wattpad? How would I try these things without sending my $500 to Jobs for the privilege of *testing* his offering? You can't download this stuff at the Apple Store. What I really need to know is if any PG person has succeeding in directly transferring books of their choice from an internet site of their choice using wifi. >Personally, I don't care where anyone gets our books from, just as long as we get them out to people. I care because I would like to be able to use iPad or whatever to read books in development, say for example SR from DP or my own efforts. 
And I don't want to wait an extra year or two for PG to make a new DVD distribution to go out to Apple or whoever so that they can stick their own DRM scheme on that PG effort or reduce it all down to txt before turning it back into HTML and from there into ePub or MOBI -- to choose a few common examples. >In a very real sense Apple, Amazon, et al, work for Project Gutenberg. I would certainly disagree with this statement if they stick DRM on a PG effort, or if they work to prevent redistribution of PG books among friends. If they do these things then they are working AGAINST PG -- and using your own books to do so. From hart at pglaf.org Mon Apr 19 06:57:26 2010 From: hart at pglaf.org (Michael S. Hart) Date: Mon, 19 Apr 2010 06:57:26 -0700 (PDT) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC1F14.1090801@perathoner.de> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> Message-ID: Worthy of a second look: Marcello Perathoner said: [re: eBooks for cellphones, etc] Before that can happen a major 'paradigm shift' has to happen at DP. At DP the PPers enjoy to push their pet preferences down the readers throat: "What *I* See Is What You Get." And most PP time is spent in weaving those personal preference deep into the markup so as to make the markup pretty useless for anything but desktop devices with lots of screen, lots of cycles and lots of RAM. What the PPers should do is to produce light semantic markup that lets the user choose the presentation and device: "Get It The Way You Want." The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted.
From jimad at msn.com Mon Apr 19 07:25:23 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 07:25:23 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: <4BCAE103.2030105@perathoner.de> References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> <4BCAE103.2030105@perathoner.de> Message-ID: > m.gutenberg.org Wow. Now that *is* a step in the right direction! Hope we-all will be able to talk about it soon. From hart at pglaf.org Mon Apr 19 07:50:25 2010 From: hart at pglaf.org (Michael S. Hart) Date: Mon, 19 Apr 2010 07:50:25 -0700 (PDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: Starting with your last comment first: What DRM is put on PG files? I thought the DRM was in the reader program, not the files. In general: you said you had hands on experience with and iPad, but couldn't find anything that looked good, gave Stanza example. Here is my suggestion: The next time you get your hands on an iPad, or even ask friends to try it for you, just do the little search they have and try a few obvious things like "books" "ebooks" and similar things. You'll get a handful of free or "lite" ereaders programs, and in some cases they will be for the iPad, some for the iPod, and you can compare them yourself to your heart's content, and give your conclusions here, please. My own conclusions were that all the iPad programs are readable. Black on white, or white on black. If the "Accessibility" black and white reversal doesn't work, that means the program has that under control via it's own commands. You also complained about not getting directly from PG and DRM. As I have said so many times, I don't care who redistributes PG, from Tea Party people to Sarah Palin or Tina Fey. . .period. 
If they put our books on their sites, or go the other way, great difference from my POV, unless they censor out some books, but I am not sure that is a valid reason even then to stop them. There are right and left wing physical libraries. . .who cares? More below: On Mon, 19 Apr 2010, Jim Adcock wrote: > >Was there are typo in "friend has bought an iPod" where you meant "iPad"?, > or did I miss something else that indicated changes from iPod to iPad? > > Yes Typo Sorry iPad iPod sometimes (always?) Apple is too clever for its own > good. So are some of the people who post here. . . . > >I didn't mention Stanza at all, so how can you be asking me "a serious > question: Are you saying you are actually running Stanza on an iPad?" You still haven't made any point about Stanza, nor answered my question. You say you are serious, and then you are all bent out of shape in both your questions and your answers, and then you blame me and Apple. . . . "The fault lies not the stars, the fault lies in ourselves." > You trashed me for playing mind games No, you trashed yourself, if you insist on calling it that, by writing something that was very incomplete, inconclusive and confusing. . . . The solution is just to lighten up and try again, not to accuse worlds of "flaming" and "trashing" you. Just make your point[s] as best you can, and move on. An apology for when you have been confusing is also appropriate, with no need to blame me or Apple. > whereas I was actually describing my actual experience with an iPad. Let's just say your "actual experience with an iPad" could have used a little more explanation, perhaps a little more experience. I just searched for "ebook" downloaded programs, and searched in those for "Project Gutenberg." I didn't expect to find a list of 30,000 titles on first try, any more than I expect to get the books at pglaf.org or gutenberg.org or .cc on the first try, even after lots of practice, certainly NOT first time. 
> I bragged to my friend who had bought an iPad how you could get these > wonderful PG books on the iPad for free and then proceeded to try to show > him "all I know" about the subject -- and of course nothing I tried to show > him about reading these wonderful PG books on the iPad actually worked in It worked for me, but then I gave it a few tries. However, the first two both worked, as did all the others made for iPad, though I have not tried each one in great detail, but enough to bring up books I know I typed in. Keep trying. . . . I really hate to say this, as you'll probably accuse me of flame/trashing but it sounds as if you have spent more time complaining here than in the actual testing of the product. I'm sure you know that Apple wants to control how files get to iPads. However, it certainly appears that at least several of the programs I was testing have their own ways of getting our "Alice in Wonderland" example. [Big Snip, will address later, if we get a few requests for it] > >Is that because I asked if you didn't try iBooks and Wattpad? > > How would I try these things without sending my $500 to Jobs for the > privilege of *testing* his offering? My apologies, perhaps I have this all backwards, as I thought I had it backwards when you swapped "iPad" for "iPod" or vice versa: I thought you already had managed to "try these things without sending my $500 to Jobs for the privilege of *testing* his offering. . . . Did you, or did you not, make that trial run with an iPad? If you did, then I made suggestions for how to get what you want. Or at least what you SAY you want, but I'm not sure any longer. If you did not take a test drive. . . . Well, in either case I suggest more test driving, and searching for "ebook" and "book" and the like and downloading their programs. > You can't download this stuff at the Apple Store. Then how did I manage to download them from the store? I just tapped on "Apps" and did my little searches. . . . 
Isn't that the way you're supposed to? Am I really missing something here about your experience??? If so, I apologize, and am willing to start again, but I strongly suggest a visit to where you can play with an iPad again and try, hopefully successfully, some of the suggestions I already made. > What I really need to know is if any PG person has succeeding in directly > transferring books of their choice from an internet site of their choice > using wifi. You can do this with Goodreader. There is a free Goodreader Lite for the iPad/iPhone, that let's grab 5 books at a time. . .I tried it. . .it works. > >Personally, I don't care where anyone gets our books from, just as long as > >we get them out to people. > > I care because I would like to be able to use iPad or whatever to read books > in development, say for example SR from DP or my own efforts. And I don't > want to wait an extra year or two for PG to make a new DVD distribution to > go out to Apple or whoever so that they can stick their own DRM scheme on > that PG effort or reduce it all down to txt before turning it back into HTML > and from there into ePub or MOBI -- to choose a few common examples. I agree that it's a pain to have to wait for books in progress. However, Goodreader should let you download those if you find them. > >In a very real sense Apple, Amazon, et al, work for Project Gutenberg. > > I would certainly disagree with this statement if they stick DRM on a PG > effort, or if they work to prevent redistribution of PG books among friends. > If they do these things then they are working AGAINST PG -- and using your > own books to do so. You insist on saying that someone who is filling this glass has left it half empty just because it is not overflowing to the whole world. Anything that gets more people to read more books is a positive even if it is a little too high on the hog for most or is not into files- sharing on your scale. 
However, Wattpad, Goodreader, iBooks, and other do provide relief. Times will change, people will jailbreak iPads and all. . . . > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From jimad at msn.com Mon Apr 19 07:53:06 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 07:53:06 -0700 Subject: [gutvol-d] Re: Dim view of P3ers In-Reply-To: References: Message-ID: >Jim, in the business world, your complaint about the fact the business wasn't working on your preferred projects would annoy the hell out of your coworkers the eighth time they heard it, just like here. That is probably a true statement: When one talks about things being "broken" and open for possible improvement the response is almost always universally scorn and derision. Only when an organization falls into acute duress is it usually open to considering change -- if then. The US auto industry being perhaps a current, but weak, example. Stating that I have a dim view of P3ers is probably overstating the case. What I am sure I have a dim view of is: Query-hyphen and especially the rote overuse of it by some P3ers. The rote removal of whitespace on both sides of m-dash even when that is clearly not author intent. Some P3ers who are clearly just SR'ing without looking at the page images. Punting "bugs" down field under the assumption that *someone else* is going to fix them. Not having a clear point in the process when the "proofing" phase is supposedly done. Taking 3+ years to create a text, or not finishing a text that has had considerable volunteer time and effort invested in it. Designing a process where *no one* is allowed to take responsibility for a text. Distributing texts that have less than 1 or more than 1 copy of some portion of an author's text. 
Distributing "risen to the public domain" texts under DRM Preventing friends and fellow citizens from sharing texts "risen to the public domain" Otherwise claiming or enforcing restrictions on the sharing and redistribution of texts "risen to the public domain" Creating texts that cannot be used as widely as possible on a great variety of differing reader machines including addressing issues of "accessibility" Demoware -- Sorry if any of these statements are controversial -- I don't think they should be! From jimad at msn.com Mon Apr 19 08:00:46 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 08:00:46 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> Message-ID: >Is PG ready to accept Epub as submission format? (i.e. one submits a valid epub from which the other formats are derived)? If so, one can target Epub, otherwise at best one is forced to submit HTML or txt that converts not-too-badly with current PG tools, and this migh be extremely challenging. It would be nice to have a portable version of the current tools, so that transcribers can see how their HTML is going to "officially" translate into ePub and MOBI prior to submission. I tried porting the tools, but got bogged down by the amount of stuff which wouldn't port easily. From jimad at msn.com Mon Apr 19 08:21:59 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 08:21:59 -0700 Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers) In-Reply-To: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400> Message-ID: >Errors in PG's files should be reported to the Errata system: errata2010_AT_pglaf.org Not sure how that's going to help when the problems are pretty systematic? 
>If you feel that a text can be fixed only by a complete re-do (maybe it's missing the illustrations, the index, or whatever), feel free to download a scanset, get a copyright clearance, and have at it. I'm doing one such right now but I'm apprehensive of the flame-fest that will ensue if one namely me actually tries to redo an old text. But, I guess I'm willing to throw my body on the fire and see what happens *next*... >Simply complaining about errors isn't useful, nor are general complaints, especially concerning older texts, such as "italics aren't shown" or "all-caps are used for italics, not underscores". The more general problem is that texts continue to be created that are generally not readable with fidelity by many users on many different machines. Typical problem, as others have mentioned, lies in the choice HTML coding techniques used, and a preference for visual cuteness on one or another HTML machine rather than on fidelity on a wide variety of HTML and HTML derived machines -- including issues of "accessibility." From jimad at msn.com Mon Apr 19 08:43:57 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 08:43:57 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC1F14.1090801@perathoner.de> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> Message-ID: >The PPers will have to relinquish their power of God -- or have it wrested from their hands -- and very strict guidelines will have to be put into place as to what markup is accepted. I'm not sure that the PPers in question understand the damage they are doing. A first step would be not to force changes but at least let people know what problems they are creating and how NOT to cause them. There are some people at DP who care about these issues -- and obviously others who do not. 
Obviously it's very hard to tell people to try to minimize their use of CSS.... From tunelera at yahoo.com Mon Apr 19 08:47:17 2010 From: tunelera at yahoo.com (Julia C. Miller) Date: Mon, 19 Apr 2010 10:47:17 -0500 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> Message-ID: <4BCC7B05.5020506@yahoo.com> In order for a "paradigm shift" to happen at DP, PG has to define what is and is not acceptable in the HTML and spell it out so that DP can put it into practice. I took another look at the PG HTML FAQ and it does not say anything that might be used as a guide to improving HTML output. It would also be extremely helpful to have a way to preview the different output formats so we can test our finished HTML and make sure it works properly not only as HTML but also as the source for the other formats. I (for one) am happy to modify the way I do things -- as long as someone explains what should/shouldn't be done and why. I am not a computer professional (and neither are many or most of the PPers at DP) and don't have the time or background to track down the current thinking on how to code HTML. But I don't have a problem modifying my practices to end up with a better end product. Perhaps some of the time that is spent ranting about DP's work flow and DP's output could be better put to use creating more informative FAQs or even guidelines that DPers can use to create output that fits into the current thinking about acceptable HTML and/or other formats. On 4/19/2010 8:57 AM, Michael S. Hart wrote: > Worthy of a second look: > > > Marcello Perathoner said: [re: eBooks for cellphones, etc] > > > Before that can happen a major 'paradigm shift' has to happen at DP.
> > At DP the PPers enjoy to push their pet preferences down the readers throat: > "What *I* See Is What You Get." And most PP time is spent in weaving those > personal preference deep into the markup so as to make the markup pretty > useless for anything but desktop devices with lots of screen, lots of cycles > and lots of RAM. > > What the PPers should do is to produce light semantic markup that lets the > user choose the presentation and device: "Get It The Way You Want." > > The PPers will have to relinquish their power of God -- or have it wrested > from their hands -- and very strict guidelines will have to be put into place > as to what markup is accepted. From prosfilaes at gmail.com Mon Apr 19 09:13:34 2010 From: prosfilaes at gmail.com (David Starner) Date: Mon, 19 Apr 2010 12:13:34 -0400 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC1F14.1090801@perathoner.de> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> Message-ID: On Mon, Apr 19, 2010 at 5:15 AM, Marcello Perathoner wrote: > At DP the PPers enjoy to push their pet preferences down the readers throat: > "What *I* See Is What You Get." And most PP time is spent in weaving those > personal preference deep into the markup so as to make the markup pretty > useless for anything but desktop devices with lots of screen, lots of cycles > and lots of RAM. You know we might have TEI-Lite now if you hadn't tried to push your pet preferences about what the generated HTML must look like on all DP projects, especially when you had the audacity to call it standard when it clearly wasn't.
-- Kie ekzistas vivo, ekzistas espero.
From dakretz at gmail.com Mon Apr 19 09:29:40 2010 From: dakretz at gmail.com (don kretz) Date: Mon, 19 Apr 2010 09:29:40 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <1fdf6.4812f09b.38fd1ca1@aol.com> References: <1fdf6.4812f09b.38fd1ca1@aol.com> Message-ID:

Hypothesis: A good paradigm for proofing and marking up a book is an outline.

Several assumptions help this to work.

1. Without any exceptions I can think of, any comprehensible printed text can be completely, unambiguously outlined. We know from experience it works. Any XML document, including an XHTML document, complies by definition. It's not just a good idea, it's the law.

2. An outline is easy to define and easy to understand. Conceptually, it's simply a regular hierarchical structure, with every syntactic element completely embedded within another below a simple sequential list of top-level elements.

3. Any syntactic element can be structurally identified as one of three types.
   a.) A section.
   b.) A sequence of characters.
   c.) A position offset from the start (of the text, and/or of an element.)
   We know from experience that this works. Any HTML element can be bound by one of only two types: a <div> or a <span>. What we need to do is to associate logical divs and spans with syntax.

==================================================

Benefits: A book that has been outlined is probably simultaneously easier to build, to read, to comprehend, to verify visually, to verify grammatically with software, and to transform into ebook markups than any other format. And structurally, it's self-validating.

Low barrier to entry. Anyone can proof with confidence from the start, with a brief introduction and a list of syntax elements.

==================================================

Proofing interface. Notice that the proofing representation can be entirely separate from the serialized representation - i.e. how it's stored in a file for instance.

What might it look like? We have lots of history for this - there are not many ways to represent language that are more universal than an outline. Almost all of us come pre-trained.

Say the convention is to start an element with a newline, a plus sign, and a syntax tag, on a line by themselves. Paragraphs are so common that they can just start with, say, two blank lines. An element's content continues with indented content. An element ends with the start of another element at the same indentation level, two blank lines (another paragraph), or outdented content.

+chapter
    +chapter-heading
        Chapter The First


    It was a dark and windy ...

I think I'll play with this a bit and see how far it goes. Is anyone familiar with other attempts in this line?

From dakretz at gmail.com Mon Apr 19 09:33:30 2010 From: dakretz at gmail.com (don kretz) Date: Mon, 19 Apr 2010 09:33:30 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <1fdf6.4812f09b.38fd1ca1@aol.com> Message-ID:

Oh, and ... Yes, indenting every line of text is a bitch. So don't do it.
If the tagging is done properly (probably an adaptation of the example), software can indent it automatically. From lee at novomail.net Mon Apr 19 09:34:09 2010 From: lee at novomail.net (Lee Passey) Date: Mon, 19 Apr 2010 10:34:09 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC7B05.5020506@yahoo.com> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> Message-ID: <4BCC8601.3020408@novomail.net> On 4/19/2010 9:47 AM, Julia C. Miller wrote: > In order for a "paradigm shift" to happen at DP, PG has to define what > is and is not acceptable in the HTML and spell it out so that DP can put > it into practice. I took another look at the PG HTML FAQ and it does not > say anything that might be used as a guide to improving HTML output. The odds of this happening are about equivalent to that of having porcine aviators; Mr. Hart is diametrically opposed to standards of any kind for PG. However, PG creating an HTML standard is in fact unnecessary. According to Mr. Hart (although somewhat disputed by Mr. Haines) PG will accept just about anything it is given. Thus, DP could establish its own HTML guidelines with the assurance that they would be acceptable to PG. Non-conforming HTML could still make its way into the PG corpus from other sources, but at least the DP work-product would be consistent. > It would also be extremely helpful to have a way to preview the > different output formats so we can test our finished HTML and make sure > it works properly not only as HTML but also as the source for the other > formats. This could be so difficult as to be nigh on impossible.
For example, as most here know, the ".epub" format is actually just a zip file containing (among other things) the XHTML version of the document. How that document is displayed does not rely at all on the nature of the document's markup, but almost exclusively on the capabilities of the reading device's software. The .epub readers based on JavaScript (such as Monocle) will probably display the text with as much richness as the hosting browser software would, whereas standalone .epub readers (such as µBook) will only display what the software designers felt was important, and probably will not support CSS at all. No one viewer can tell you if the markup is satisfactory, because with .epub the markup is only part of the story. On the other hand, if DP were to establish HTML guidelines and requirements (requirements for a baseline, guidelines for enhancements) I would be happy to code up a program which would test for conformance to those guidelines. I couldn't give you a picture, but I could give you a thousand words. > I (for one) am happy to modify the way I do things -- as long as someone > explains what should/shouldn't be done and why. I am not a computer > professional (and neither are many or most of the PPers at DP) and don't > have the time or background to track down the current thinking on how to > code HTML. But I don't have a problem modifying my practices to end up > with a better end product. Adding HTML markup to a document (or modifying that which is already there) is nowhere near as difficult as many would have you believe. Check out http://web.archive.org/web/20080327044926/gutenberg.hwg.org/tutorials.html and http://www.dysfunctionals.org/~networker/HTMLeBooks.html. But you are correct, having a document like one of these which is DP-sanctioned would simplify a PPer's life dramatically.
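The point that an .epub is just a zip file holding XHTML documents can be checked with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names and the demo archive layout (OEBPS/chapter01.xhtml and so on) are hypothetical, and archive order is not guaranteed to be reading order, since a real reader follows the spine in the package (.opf) file.

```python
import io
import zipfile

# File suffixes treated as content documents (an assumption of this sketch).
XHTML_SUFFIXES = (".xhtml", ".html", ".htm")


def list_content_documents(epub_file):
    """Return the names of the (X)HTML members of an .epub, in archive order.

    Archive order is NOT necessarily reading order; the authoritative order
    is the <spine> of the package (.opf) file, which this sketch ignores.
    """
    with zipfile.ZipFile(epub_file) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(XHTML_SUFFIXES)]


def build_demo_epub():
    """Build a tiny, hypothetical .epub-like zip in memory for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/chapter01.xhtml",
                         "<html><body><p>One</p></body></html>")
        archive.writestr("OEBPS/chapter02.xhtml",
                         "<html><body><p>Two</p></body></html>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_content_documents(build_demo_epub()):
        print(name)
```

Pointing list_content_documents at a real .epub path instead of the demo buffer shows how many chapter files would have to be merged to get a single HTML file back.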
> Perhaps some of the time that is spent ranting about DP's work flow and > DP's output could be better put to use creating more informative FAQs or > even guidelines that DPers can use to create output that fits into the > current thinking about acceptable HTML and/or other formats. Many have tried (among them Mr. Hutchinson and Mr. Perathoner). But without organizational buy-in those FAQs and guidelines will go nowhere--fast. Unfortunately, there appears to be no one left at DP with the clout to say, "this is our first draft of HTML guidelines. Comments and discussion is welcome, but by the end of the year some sort of guidelines /will/ be adopted." As near as I can tell, the ranters rant not because DP's work flows are, shall we say, sub-optimal, or because the FAQs and guidelines have not been written, but because none of the Powers That Be at DP seem to be willing to do anything about it. These kinds of decisions cannot be made by consensus. Somebody needs to step up to the plate. Mr. Adcock seems to still have enough respect for DP that he believes it can be improved. I do not. I would love for someone to prove me wrong. From dakretz at gmail.com Mon Apr 19 09:45:29 2010 From: dakretz at gmail.com (don kretz) Date: Mon, 19 Apr 2010 09:45:29 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC8601.3020408@novomail.net> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC8601.3020408@novomail.net> Message-ID: Also, I see maybe 3 or 4 elements that should be identified in-line using conventions we already have. Italics, boldface, small-caps (although these are often micro-headings), ... One opportunity I think would be break out embedded quotes and make them visually obvious. And their boundaries checkable. 
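The outline convention proposed earlier in this thread (a plus sign and a syntax tag on a line by themselves, content indented beneath, an element closed by outdented content) is simple enough to parse mechanically. The following is a rough sketch only, not anyone's actual tool: the function name, the dict layout and the SAMPLE text are assumptions made for illustration, and the "two blank lines start a paragraph" rule is deliberately left out, so blank lines are simply skipped.

```python
SAMPLE = """\
+chapter
    +chapter-heading
        Chapter The First
    It was a dark and windy ...
"""


def parse_outline(text):
    """Parse '+tag' outline text into nested dicts.

    Each element is {"tag": name, "children": [...]}; children are either
    nested elements or plain text lines. Nesting is decided purely by
    indentation: a line belongs to the nearest open element whose '+tag'
    line is indented less than the line itself.
    """
    root = {"tag": "document", "children": []}
    stack = [(-1, root)]  # (indentation of the '+tag' line, element)
    for raw in text.splitlines():
        if not raw.strip():
            continue  # paragraph rule not implemented in this sketch
        indent = len(raw) - len(raw.lstrip(" "))
        content = raw.strip()
        # Close every open element this line is not indented under.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if content.startswith("+"):
            element = {"tag": content[1:], "children": []}
            parent["children"].append(element)
            stack.append((indent, element))
        else:
            parent["children"].append(content)
    return root


if __name__ == "__main__":
    import pprint
    pprint.pprint(parse_outline(SAMPLE))
```

Because the structure comes from indentation alone, software could also go the other way and re-indent text from the tags, which is the labor-saving point made in the follow-up message.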
From ke at gnu.franken.de Mon Apr 19 09:57:05 2010 From: ke at gnu.franken.de (Karl Eichwalder) Date: Mon, 19 Apr 2010 18:57:05 +0200 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: (David Starner's message of "Mon, 19 Apr 2010 12:13:34 -0400") References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> Message-ID: David Starner writes: > You know we might have TEI-Lite now if you hadn't tried to push your > pet preferences about what the generated HTML must look like on all DP > projects, especially when you had the audacity to call it standard > when it clearly wasn't. At least, tidy seems to be happy with it and you can embed your own CSS fragments. And, finally, it's much better than all these handcrafted HTML exercises that are mostly just a waste of time. -- Karl Eichwalder From jimad at msn.com Mon Apr 19 10:07:28 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 10:07:28 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >What DRM is put on PG files? I thought the DRM was in the reader program, not the files. DRM on books in my experience is typically implemented as a device-specific encryption such that even if you move an ebook file to a different machine you own, that machine cannot read the file. A hidden key, say a "serial number", on a particular hardware device is used as a decryption key to allow decryption of the file encrypted specifically for that device.
Thus for example if one buys an in-copyright book from Amazon and you physically copy that ebook file from one Kindle you own to another Kindle you own the second Kindle still will not be able to read the ebook file for the first Kindle. While Amazon will typically allow you to read one purchased book on six devices simultaneously -- including on Kindle for the PC, Kindle for the Mac, Kindle for the Blackberry, etc, each of those ebook files has to be downloaded separately from Amazon because each comes with a unique device-specific encryption. Not everything from Amazon needs to have DRM, the publisher who uploads to Amazon has the right to specify "I don't want DRM on my book." Further, the encryption schemes are typically owned proprietary to a particular company or consortium and discussing or distributing information about those encryption schemes is against the law. And thus ePub specs for example doesn't include description of a ePub specific DRM scheme rather each distributor of ePub files can implement their own proprietary and mutually incompatible DRM scheme such that owning multiple ePub devices is not sufficient to ensure that one can purchase an ePub book for one device and read it on another device. And if you have an ePub library of purchased books that you read on your blackberry or what have you and now you want to move that library to your new iPad well too bad because its probably not going to work. Nor can you resell your ebooks to someone else on eBay when you are done with them. Even without DRM as far as I know all storage on iPad is currently tied to a particular app so even if you have a non-DRM "PG" book under Apples' iBooks applet you can't say "Gee let me open that up in Stanza because Stanza offers a better ebook reader" -- you can't do that because the iPad ties the book file to the particular reader applet. 
If Apple were to allow book transfer via USB then god forbid you could at least move non-DRM books from one iPad reader applet to a different iPad reader applet! >The next time you get your hands on an iPad, or even ask friends to try it for you, just do the little search they have and try a few obvious things like "books" "ebooks" and similar things. Sorry, perhaps "friends" was too strong a word but I thought what I have been asking here is if anyone in PG or DP land has actually found an applet that will allow them to directly download a free book from the PG website or other websites using wifi and read it in a manner that makes you happy. Or is it necessary, as in the case of Apple's own iBooks applet, to *always* tie the distribution path of the applet to the applet itself? This may seem like a strange question except that Apple already HAS shut down distribution of books by USB except via the iTunes monopoly. >As I have said so many times, I don't care who redistributes PG, from Tea Party people to Sarah Palin or Tina Fey. . .period. Again, I acknowledge that *you* don't care, but I do: I want to be able to get books and publications directly from a variety of web sites via wifi, and I don't want the applet nor the device to tell me where I can get MOBI or ePub books from anymore than I would want a web browser tell what HTML I am allowed read from what HTML sites. This is the ebook version of "net neutrality" as opposed to buying a device from Big Brother and letting Big Brother then tell you that you can only use that device to buy MORE product from Big Brother. Even Big Bill allows me to install a large variety of reader apps on my netbook say, right click on some ebook I see anywhere on the internet, and that book automagically opens in my choice of reader app. >An apology for when you have been confusing is also appropriate, with no need to blame me or Apple. 
I apologize for having difficulty unambiguously discussing terminology that Apple chooses to be deliberately ambiguous as a cute marketing device. >It worked for me, but then I gave it a few tries. However, the first two both worked, as did all the others made for iPad, though I have not tried each one in great detail, but enough to bring up books I know I typed in. You and I have differing ideas of what "it worked" means. For example on a Kindle, which again is also not the most "unlocked" device in the world, I can web browse to www.gutenberg.org, click on a MOBI title there, and it "works", or I can go to FreeKindleBooks, or to Feedbooks, etc -- my choice of publisher -- and if it's a "free book" I can get it -- it works. I can't get it if it's a "for pay" book because Amazon has locked up that distribution channel -- which is not a good thing. As opposed to Nook where none of this works at all except the direct "for pay" path from B&N. >I really hate to say this, as you'll probably accuse me of flame/trashing but it sounds as if you have spent more time complaining here than in the actual testing of the product. We spent about four hours playing with the iPad and trying to get it to do what we wanted it to do -- namely direct access to free ebooks on particular websites on the internet. In that amount of time I was already writing software to freely distribute books on the Kindle when I got my first Kindle in Dec 2007. >I'm sure you know that Apple wants to control how files get to iPads. Yes, the only question is just how badly "locked down" their device is in the matter -- and whether or not they will take steps again in the future to force an increase in that "lock down". Again, what I want is the ebook version of "net neutrality" -- I want to have an ebook reader applet which is independent of ebook publisher. I don't want to have to acquire and use a different ebook reader applet for each book I want to purchase -- or acquire freely on the internet.
There are WAY more internet sites offering interesting books and other publications than there are organizations willing to write applets for the iPad! >I thought you already had managed to "try these things without sending my $500 to Jobs for the privilege of *testing* his offering. . . . Sure, I borrowed a friend and his iPad for four hours of lack of success trying various approaches after which he ran away screaming.... >Then how did I manage to download them from the store? Sorry, again more Apple cuteness: there is the "Apple Store" virtual on the internet, and there is the "Apple Store" bricks and mortar at the Mall. I can download software from the virtual Apple Store to my desktop, but then I don't have a physical iPad to test it on. Or I can go to the Mall where they have a physical iPad, but then I don't have permission to download and install applets from the virtual Apple Store. And I've used up my friendship for right now with the "bricks and mortar" friend who has a "bricks and mortar" iPad... >You can do this with Goodreader. OK, good suggestion -- their website looks promising; I will dig into it more -- thanks! >Times will change, people will jailbreak iPads and all. . . . I am hoping that the future OS in the works for iPad may make things less restrictive. Not personally interested in hacking anything to get increased access. Hacking to my taste is incompatible with creating texts for PG....
From jimad at msn.com Mon Apr 19 10:26:22 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 10:26:22 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC8601.3020408@novomail.net> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC8601.3020408@novomail.net> Message-ID: >> It would also be extremely helpful to have a way to preview the >> different output formats so we can test our finished HTML and make sure >> it works properly not only as HTML but also as the source for the other >> formats. > >This could be so difficult as to be nigh on impossible. For example, as >most here know, the ".epub" format is actually just a zip file >containing (among other things) the XHTML version of the document.... Sorry, but I've looked and tried to port Marcello's HTML->epub code and its anything but that simple. (But I am not an experienced Python coder) Again, to my mind a "preview" need simply be a portable version of Marcello's code so that we can do our own HTML to ePub conversion (and from there to MOBI) and run it on the variety of ePub and MOBI reader devices and software we already own, so that we have *some* idea of the problems that the particular HTML is going to run into on various portable devices. And I am sure there are any number of people who are willing to preview a DP candidate release on the hardware they own in order to find what problems there are to be found -- most of us are pretty passionate about our choice of hardware and would like very much for DP/PG to produce ebooks that actually work on our hardware investments! 
PS: I already do make preview versions of my HTML on ePub and MOBI -- it's just that the HTML->ePub and HTML->MOBI conversion software I have is not identical to Marcello's and thus the formatting ends up different than the "official" version. From marcello at perathoner.de Mon Apr 19 10:35:52 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Mon, 19 Apr 2010 19:35:52 +0200 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC7B05.5020506@yahoo.com> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> Message-ID: <4BCC9478.5040204@perathoner.de> Julia C. Miller wrote: > In order for a "paradigm shift" to happen at DP, PG has to define what is and is > not acceptable in the HTML and spell it out so that DP can put it into practice. It would be much better if DP did that. > It would also be extremely helpful to have a way to preview the different output > formats so we can test our finished HTML and make sure it works properly not > only as HTML but also as the source for the other formats. Roger Frank has the converter and did extensive testing on it. > I (for one) am happy to modify the way I do things -- as long as someone > explains what should/shouldn't be done and why. I am not a computer professional > (and neither are many or most of the PPers at DP) and don't have the time or > background to track down the current thinking on how to code HTML. But I don't > have a problem modifying my practices to end up with a better end product. Go to the DP wiki and search for 'ePub'. I don't know the exact url because the site is down.
-- Marcello Perathoner webmaster at gutenberg.org From lee at novomail.net Mon Apr 19 10:38:47 2010 From: lee at novomail.net (Lee Passey) Date: Mon, 19 Apr 2010 11:38:47 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> Message-ID: <4BCC9527.3000103@novomail.net> On 4/19/2010 9:00 AM, Jim Adcock wrote: [snip] > It would be nice to have a portable version of the current tools, so > that transcribers can see how their HTML is going to "officially" > translate into ePub and MOBI prior to submission. I tried porting > the tools, but got bogged down by the amount of stuff which wouldn't > port easily. Only half of this proposal is possible: the .mobi half. As others have pointed out recently, .epub is not really an e-book format. For reasons both technical and practical, most people agree that HTML is the preferred markup for creating e-books. The primary drawback to HTML is that it is inherently a multi-file solution; the HTML file is distinct from the image files, CSS files, font files, etc. Moreover, if you had multiple HTML files that made up the book (and sometimes there are good technical reasons for doing so) you needed yet another metafile that described how the different files related to each other. After about a year of wrangling, in September 2006 the IDPF officially released the "Open Container Format," which specified how a collection of HTML files and the other files on which they depend would be included in a ZIP archive. The specification recommends using the file extension ".epub" to identify files that are OCF containers. In other words, an ".epub" file is just a ".zip" file with a few additional metadata files added. Software that purports to "convert" HTML to .epub should not do /anything/ to the source file, except perhaps to insure that it is valid XHTML (for older HTML files). 
There is no need to validate an .epub conversion, as no conversion should have occurred. If a rendered .epub document does not look exactly like the same collection of files rendered by a browser from the file system, it is the fault of the .epub rendering software, not the "conversion." Mobipocket, on the other hand, is a different ball of worms. The original Mobipocket reader (which, I understand, became the basis for the Kindle software) used a subset of HTML markup, and in a few instances changed the meaning of tags (
does not create a Horizontal Rule, but starts a new page in the user agent). It did not recognize all of the named entities, and did not support CSS at all. A Mobipocket PRC file was simply this almost-HTML compressed using Rick Bram's PalmDOC compression scheme (which was actually quite elegant in its simplicity). The later ".mobi" format was the same almost-HTML file compressed across the entire package using Huffman encoding instead. It produces a somewhat small file; the contents of the archive are identical to those in the ".prc" format. Mobipocket Publisher (which I assume is still what is used to create Kindle files) claimed that Mobipocket files supported CSS. In fact what happened was that Mobipocket Publisher would load a CSS file if it were specified in the source HTML, and would convert all the style attributes and computed CSS to the almost-HTML the Mobipocket reader recognized. Thus, a style like "style='font-size: larger';" might be converted to "", but a style like "style='margin-left: 10em';" was simply discarded, because the Mobipocket almost-HTML did not recognize any way to change margin sizes. If you wanted to test the Mobipocket conversion, I would think the way to do that would be to extract the modified HTML from the Mobipocket file, and then write whatever kind of tests you needed to be sure the conversion was correct. I have some 'C' code hanging around to extract HTML from ".mobi" files; if you want it, I could send it to you. From dakretz at gmail.com Mon Apr 19 11:01:08 2010 From: dakretz at gmail.com (don kretz) Date: Mon, 19 Apr 2010 11:01:08 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC9527.3000103@novomail.net> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: The primary drawback to HTML is that it is inherently a multi-file solution; I'd say that's far from the primary drawback. 
Much more substantial drawbacks are that it is presentational, not syntactic; and even if you make it even more complex with syntactic information (or don't for that matter) the proofers will never (nor should they) proof in that format. For DP's purposes, for actually doing the work, HTML is a non-starter - but so is any other equally complex (I'd say any XML-based) representation. What we have in there already ( etc.) is the locus of major headaches and an ongoing error-trap. From Bowerbird at aol.com Mon Apr 19 11:02:47 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 Apr 2010 14:02:47 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: <81321.79a934f1.38fdf4c7@aol.com> tunelera said: > Perhaps some of the time that is spent ranting > about DP's work flow and DP's output could be > better put to use creating more informative FAQs > or even guidelines that DPers can use to create > output that fits into the current thinking about > acceptable HTML and/or other formats. my word, it must be "convenient" to simply _ignore_ all the work that i've done here in the last six years. to reiterate, i solved this problem a long time ago... *** michael said: > Worthy of a second look: > Marcello Perathoner said: hey, go ahead and look a second time if you like, but marcello is rarely worth the effort... to some extent, he's on the right track. then again, to that exact same extent, i've said the same thing, over and over, again and again, for years and years. i'm also smart enough to know that postprocessors at d.p. will not go for this approach. they _want_ to make it look pretty. that's why they do what they do. so you will never get them to strip down their .html. but that doesn't matter... if you jigger the workflow so it will create a text-file which has semantic rigor -- e.g., one in z.m.l.
format -- you can use _that_ as "the master file" and still let the postprocessors play for as long as they like in fancy-markup disneyland. plus you will make the d.p. workflow more efficient. -bowerbird p.s. i'm also smart enough to know it is impossible to "target" .epub at this time, because of the huge inconsistencies in the way that it gets rendered by the various apps out there. if you focus on adobe's, you're gonna break your file for other viewer-apps, and vice-versa. it will take _years_ to gain stability. the .epub scene was mired in hype since its start... p.p.s. and if _you_ were smart enough, you'd know rfrank already tried to target .epub, but he gave up. d.p. is down right now, so i can't quote you the u.r.l., but use the forum search and you can find it easily... -------------- next part -------------- An HTML attachment was scrubbed... URL: From hart at pglaf.org Mon Apr 19 11:06:57 2010 From: hart at pglaf.org (Michael S. Hart) Date: Mon, 19 Apr 2010 11:06:57 -0700 (PDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: On Mon, 19 Apr 2010, Jim Adcock wrote: Apple has assured me over and over there is no DRM on our files. If you have any evidence to the opposite, we'd love to hear it. More below: > > >What DRM is put on PG files? I thought the DRM was in the reader program, > not the files. > > DRM on books in my experience is typically implemented as a device-specific > encryption such that even if you move an ebook file to a different machine > you own that machine cannot read the file. A hidden key say "serial number" > on a particular hardware device is used as a decryption key to allow > decryption of the file encrypted specifically for that device. 
Thus for > example if one buys an in-copyright book from Amazon and you physically copy > that ebook file from one Kindle you own to another Kindle you own the second > Kindle still will not be able to read the ebook file for the first Kindle. > While Amazon will typically allow you to read one purchased book on six > devices simultaneously -- including on Kindle for the PC, Kindle for the > Mac, Kindle for the Blackberry, etc, each of those ebook files has to be > downloaded separately from Amazon because each comes with a unique > device-specific encryption. Not everything from Amazon needs to have DRM, > the publisher who uploads to Amazon has the right to specify "I don't want > DRM on my book." Further, the encryption schemes are typically owned > proprietary to a particular company or consortium and discussing or > distributing information about those encryption schemes is against the law. > And thus ePub specs for example doesn't include description of a ePub > specific DRM scheme rather each distributor of ePub files can implement > their own proprietary and mutually incompatible DRM scheme such that owning > multiple ePub devices is not sufficient to ensure that one can purchase an > ePub book for one device and read it on another device. And if you have an > ePub library of purchased books that you read on your blackberry or what > have you and now you want to move that library to your new iPad well too bad > because its probably not going to work. Nor can you resell your ebooks to > someone else on eBay when you are done with them. > > Even without DRM as far as I know all storage on iPad is currently tied to a > particular app so even if you have a non-DRM "PG" book under Apples' iBooks > applet you can't say "Gee let me open that up in Stanza because Stanza > offers a better ebook reader" -- you can't do that because the iPad ties the > book file to the particular reader applet. 
If Apple were to allow book > transfer via USB then god forbid you could at least move non-DRM books from > one iPad reader applet to a different iPad reader applet! > > >The next time you get your hands on an iPad, or even ask friends > to try it for you, just do the little search they have and try a > few obvious things like "books" "ebooks" and similar things. > > Sorry, perhaps "friends" was too strong a word but I thought what I have > been asking here is if anyone in PG or DP land has actually found an applet > that will allow them to directly download a free book from the PG website or > other websites using wifi and read it in a manner that makes you happy. Or > is it necessary, as in the case of Apple's own iBooks applet, to *always* > tie the distribution path of the applet to the applet itself? This may seem > like a strange question except that Apple already HAS shut down distribution > of books by USB except via the iTunes monopoly. Until you have at least tried the examples I went and found for you, that allowed ME to read AND download directly from gutenberg.org.... I have nothing further to offer you on this subject. You are leading me to believe I was correct in the extreme when that thought came to me that you are spending more time complaining about all this than actually doing your own research. Please. . .get out there and do something between your messages so I or we don't have the feeling this is a totally useless exercise from your experimental labs. > >As I have said so many times, I don't care who redistributes PG, > from Tea Party people to Sarah Palin or Tina Fey. . .period. > > Again, I acknowledge that *you* don't care, but I do: I want to be able to > get books and publications directly from a variety of web sites via wifi, This does not remove any from your "variety of web sites via wifi. I told you which Apps you could use, and you pretend I never said it. 
Please go back and read it all again, do your homework, and prepare for a real conversation. You are NOT conversing here, you are not sharing the wealth. Look up the roots of communicate. > and I don't want the applet nor the device to tell me where I can get MOBI > or ePub books from anymore than I would want a web browser tell what HTML I I read all sorts of stuff with Safari and Opera Lite, which is your problem? Please experiment and cite your specific examples that we can recreate. > am allowed read from what HTML sites. This is the ebook version of "net > neutrality" as opposed to buying a device from Big Brother and letting Big > Brother then tell you that you can only use that device to buy MORE product > from Big Brother. Even Big Bill allows me to install a large variety of > reader apps on my netbook say, right click on some ebook I see anywhere on > the internet, and that book automagically opens in my choice of reader app. And you never tried to install those reader apps I mentioned. . . . So what right have you to complain? That they didn't install themselves? I'm sure you would be complaining even more if they did. That they didn't SAVE the files by themselves? Again, I am sure you would be complaining even more if they did. What is it you want?!?!?!? You haven't SAID you want anything I haven't found for you. Yet you have refused to acknowledge those efforts. No thanks means no thanks. > >An apology for when you have been confusing is also appropriate, with > no need to blame me or Apple. > > I apologize for having difficultly unambiguously discussing terminology that > Appple chooses to be deliberately ambiguous as a cute marketing device. Is that a "non-denial denial?" > >It worked for me, but then I gave it a few tries. However, the first two > both worked, as did all the others made for iPad, though I have not tried > each one in great detail, but enough to bring up books I know I typed in. 
> > You are I have differing ideas of what "it worked" means. For example on a > Kindle, which again is also not the most "unlocked" device in the world, I > can web browse to www.gutenberg.org, click on a MOBI title there, and it Are you saying you sent to gutenberg.org and tried this without success? Are you telling us what program you used in that effort? Are asking us to try this for you? Still, I'm not sure why one brand should support another proprietary format, but I'll try it if you ask, after you thank me for my previous efforts from your previous questions. > "works", or I can go to FreeKindleBooks, or to Feedbooks, etc -- my choice > of publisher -- and if it's a "free book" I can get it -- it works. I can't > get it if it's a "for pay" book because Amazon has locked up that > distribution channel -- which is not a good thing. As opposed to Nook where > none of this works at all except the direct "for pay" path from B&N. You need to be more specific with your requests and challenges. You say what I mean by "it works for me" is not what you mean for you, but you are not specific about what it is you really want. Does every program do everything the way I want? No. Can I manage to get the results I want? Yes? Are you willing to do what it takes to get what you want? ??? > >I really hate to say this, as you'll probably accuse me of flame/trashing > but it sounds as if you have spent more time complaining here than in the > actual testing of the product. > > We spent about four hours playing with the iPad and trying to get it to do > what we wanted it to do -- namely direct access to free ebooks on particular > websites on the internet. You still have refused to name what programs you tried on what sites, and what you tried to do with them. You also have refused to comment on the programs I have suggested already. You are not encouraging me, or anyone else, to try to help you further. 
> In that amount of time I was already writing software to freely distribute > books on the Kindle when I got my first Kindle Dec 2007. Tell me, honestly, have to asked Apple for the documentation on how to write for the iPad? There were lots of people working for months on Apps before it came out. > >I'm sure you know that Apple wants to control how files get to iPads. > > Yes, the only question is just how badly "locked down" their device is in Ah. . .now we come to the point!!! It's not the "programs" or the "ebooks" you are complaining about, it's "how badly 'locked down` their device is". . .!!! See??? You weren't complaining just about DRM on eBooks. . . . You seem to have a whole pile of axes to grind, and whenever anyone shows you how one axe will do what you want, you switch to another, and another: books, programs, the device itself, Microsoft, etc. I did ALL the things you said. You haven't even gone back and tried ONE of them. Yet you continue to act as if you are right in there testing. > the matter -- and whether or not they will take steps again in the future to > force an increase in that "lock down". Since I haven't been able to get you to state specifics on the present, I am certainly NOT going to engage in a purely hypothetical discussion, on this, or perhaps even other such things you may include. Let's deal with reality before dealing with the other stuff, ok? > Again, what I want is the ebook version of "net neutrality" -- I want to > have an ebook reader applet which is independent of ebook publisher. I don't > want to have to acquire and use a different ebook reader applet for each > book I want to purchase -- or acquire freely on the internet. There are WAY > more internet sites offering interesting books and other publications that > there are organizations willing to write applets for the iPad! Again, _I_ have surfed to "normal" eBook sites and grabbed eBooks on an iPad. I even told you about it. . . . 
I'm not sure you are paying much attention, and this is very likely to be my last message to you on this subject, perhaps for quite some time. > >I thought you already had managed to "try these things without sending > my $500 to Jobs for the privilege of *testing* his offering. . . . > > Sure, I borrowed a friend and his iPad for four hours of lack of success > trying various approaches after which he ran away screaming.... Just because YOU can't drive a certain car, doesn't make it undriveable. Four hours??? And you never managed to download ONE eReader App??? Something is seriously wrong here. Either with the experimentation or the reporting of it, or both. > >Then how did I manage to download them from the store? > > Sorry again more Apple cuteness, there is the "Apple Store" virtual on the > internet, and there is the "Apple Store" bricks and mortar at the Mall. I > can download software from the virtual Apple Store to my desktop, but then I > don't have a physical iPad to test it on. You said you had one for four hours. You never managed to do what most people do in the first ten minutes? Including myself? > Or I can go to the Mall where they have a physical iPad, but then I don't > have permission to download and install applets from the virtual Apple > Store. And I've used up my friendship for right now with the "bricks and > mortar" friend who has a "bricks and mortar" iPad... You are saying they won't let you test these features at the mall??? I have a strong feeling you didn't ask them for very much help. > >You can do this with Goodreader. > > OK, good suggestion -- their website looks promising I will dig into it more > -- thanks! I did mention Goodreader earlier, did I not? And Wattpad? If this pretty much one line reply had been forthcoming back then, this just might have been a totally different conversation. Think about honey vs vinegar. . .eh? > >Times will change, people will jailbreak iPads and all. . . . 
> > I am hoping that the future OS in the works for iPad may make things less > restrictive. Not personally interested in hacking anything to get increased > access. Hacking to my taste is incompatible with creating texts for PG.... And just how do you think most of the great apps in history got started??? > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From lee at novomail.net Mon Apr 19 11:07:50 2010 From: lee at novomail.net (Lee Passey) Date: Mon, 19 Apr 2010 12:07:50 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC8601.3020408@novomail.net> Message-ID: <4BCC9BF6.8040409@novomail.net> On 4/19/2010 11:26 AM, Jim Adcock wrote: [snip] > PS: I already to make preview versions of my HTML on ePub and MOBI -- > its just that the HTML->ePub and HTML->MOBI conversion software I > have is not identical to Marcello's and thus the formatting ends up > different than the "official" version. If true, this is troubling. Because .epub is just a ZIP file, you should be able to open the archive in your favorite tool (WinZip, WinRar, 7-Zip, PowerArchiver, whatever) or use unzip on the command line and extract all the files. The HTML file(s) should be identical to whatever the source was. If they differ, the differences had better be harmless (making the source valid XHTML, for example). If they /do/ differ in substantive ways, Marcello should revisit his "publishing" code. It is possible, however, that if an .epub file looks different when rendered than the source HTML, the archive contains a default stylesheet that alters the appearance.
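For anyone who would rather script that check than open a GUI zip tool, here is a minimal sketch in Python using only the standard-library zipfile module. It lists the members of an archive, flags any stylesheets and markup files, and reports whether "mimetype" comes first and is stored uncompressed. The function names and the tiny sample archive (member names such as content.opf, chapter1.xhtml and default.css) are assumptions made up for illustration; they are not taken from any PG file, and the sample is not a complete, valid publication.

```python
import io
import zipfile

MARKUP_SUFFIXES = (".html", ".htm", ".xhtml")


def inspect_epub(source):
    """Report what an .epub (ZIP) archive contains, without rendering it.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        infos = archive.infolist()  # members in the order they were added
    names = [info.filename for info in infos]
    first = infos[0] if infos else None
    mimetype_first = first is not None and first.filename == "mimetype"
    return {
        "names": names,
        "mimetype_first": mimetype_first,
        # Only meaningful when "mimetype" really is the first member.
        "mimetype_stored": mimetype_first
        and first.compress_type == zipfile.ZIP_STORED,
        "stylesheets": [n for n in names if n.lower().endswith(".css")],
        "markup": [n for n in names if n.lower().endswith(MARKUP_SUFFIXES)],
    }


def build_sample_epub():
    """Build a tiny, hypothetical archive in memory for demonstration.

    Member names and contents are placeholders, not a valid publication.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # "mimetype" goes in first and is left uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("content.opf", "<package/>")
        archive.writestr("chapter1.xhtml", "<html/>")
        archive.writestr("default.css", "p { margin: 0 }")
    return buffer.getvalue()


if __name__ == "__main__":
    report = inspect_epub(io.BytesIO(build_sample_epub()))
    for key, value in report.items():
        print(key, "=", value)
```

To try it on a real download, pass the path of the .epub to inspect_epub() in place of the in-memory sample. Any entry under "stylesheets" that has no counterpart in the source HTML would be a candidate for the kind of default stylesheet mentioned above.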
BTW, to create a valid .epub file, start by creating an .opf file which describes the publication. One extracted from an existing .epub file should give you a good example of what is necessary. Then create a container.xml file that references the .opf file you created. Put this file in a subdirectory called "META-INF". Lastly capture the mimetype file from an existing .epub. Now, add "mimetype" to a zip file, *without compression*. Then add the .opf file, the content XHTML file(s), and META-INF/container.xml. Rename the file so that it ends in ".epub", and voilà, you have a valid .epub file. Of course other files can be added as well (such as font files and stylesheets), but they are just gilding the lily. The actual paths of the various files are irrelevant except for the container.xml file, which *must* be in the META-INF/ folder (and of course the paths to the files must be correctly recorded in the .opf file). I think it is only polite to add the .opf file to the archive second, and to leave it uncompressed, but that is fairly uncommon. The OCF specification requires that the mimetype file be the first file in the archive (so it can always be found at a specific byte offset), but I know of no .epub reader that actually enforces this requirement. From jimad at msn.com Mon Apr 19 11:25:43 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 11:25:43 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC9527.3000103@novomail.net> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: >In other words, an ".epub" file is just a ".zip" file with a few additional metadata files added. Software that purports to "convert" HTML to .epub should not do /anything/ to the source file, except perhaps to insure that it is valid XHTML (for older HTML files). There is no need to validate an .epub conversion, as no conversion should have occurred.
If a rendered .epub document does not look exactly like the same collection of files rendered by a browser from the file system, it is the fault of the .epub rendering software, not the "conversion." You make an interesting thesis, which, rare in the case of DP/PG arguments, is eminently testable. I have done so, and you clearly have not. Take a PG HTML zip file, say "76" for the sake of completeness. Download it, and unpack it on your computer. Take a PG epub "zip" file, say pg76.epub for concreteness. Download it, and unpack it on your computer. Now, look at the contents. Do they have the same HTML files? No they do not. Do they have the same number of HTML files? No they do not. Are the contents of the HTML files identical? No they are not. For the sake of completeness, open the first HTML file of each. Do the files RENDER the same on your browser when you actually TRY them to see if your thesis is correct? No they do not RENDER the same. It is an interesting thesis that PG epub files are "just" a zipped version of the PG HTML files -- but it is an easily demonstrably false thesis. Marcello's epub software does more than "just" pack the HTML files into an epub package. Ask him for a copy of his converter software, and see what the conversion actually entails. And/or ask Marcello what conversions he actually does to move from the HTML version to the epub version. Thus again, I suggest that it would be a good idea to have a portable version of Marcello's epub conversion software that we could use for testing on our local machines. Given a portable version of the epub conversion software, going to mobi is easy using the same Amazon/Mobipocket provided epub->mobi conversion software that Marcello is already using. From hart at pglaf.org Mon Apr 19 11:30:30 2010 From: hart at pglaf.org (Michael S.
Hart) Date: Mon, 19 Apr 2010 11:30:30 -0700 (PDT) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: If only Jim would have been as thorough, and polite, about the iPad. mh
From lee at novomail.net Mon Apr 19 12:05:17 2010 From: lee at novomail.net (Lee Passey) Date: Mon, 19 Apr 2010 13:05:17 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: <4BCCA96D.4030803@novomail.net> On 4/19/2010 12:30 PM, Michael S. Hart wrote: > > If only Jim would have been as thorough, and polite, about the iPad. > > mh Hmmm, I thought he was ... From jimad at msn.com Mon Apr 19 12:14:14 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 12:14:14 -0700 Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >Apple has assured me over and over there is no DRM on our files. And I said it wouldn't matter if they have DRM on your files or not as long as they prevent you from moving the files from applet to applet, and/or prevent you from sharing the files with your friends, because then they have accomplished the same goals as DRM without actually implementing the DRM.
>Until you have at least tried the examples I went and found for you, that allowed ME to read AND download directly from gutenberg.org.... Well, I just spent about an hour reading all the manuals for goodreader -- the applet you most recommended -- and it talks about supporting PDF not epub nor mobi and it talks about why when you try to read a big book it crashes, and how if you want to set up wifi to talk to your computer then that kills wifi to the internet, etc. So forgive me if I am not impressed. I also looked up everything else on the iTunes store listed under "ebooks" or "books" and those apps are even weaker. So clearly we are living on non-parallel planets! >I read all sorts of stuff with Safari and Opera Lite, which is your problem? I told you the first thing I tried doing at the Apple bricks and mortor store was to go to PG in the Safari Browser, clicked on an epub link, and it says "Sorry downloading that file type is not allowed." >Please experiment and cite your specific examples that we can recreate. Please use your Safari browser, go to PG, pick an epub link, click on it, and report back what happens on the iPad. When I tried this at the Apple Brick and Mortor store iPad says "I'm sorry but I can't do that Hal". For comparison, I go on my desktop using IE or Mozilla, click on an epub or mobi link and that book opens automatically in its appropriate ebook reader, just the same as clicking on a PDF file causes that file to open in Adobe Reader. Or clicking on a djvu file opens it in a LizardTech djvu ebook reader. For comparison, on Kindle I go to PG, I click on a mobi link and it says "Do You Want to Download This Book?" I say "Yes" that book shows up in my Kindle bookshelf, where I click on the book and read it any time I want. >And you never tried to install those reader apps I mentioned. . . . >So what right have you to complain? 
I complain because every time you suggest something where I have to spend my $500 up front only to determine that indeed what I said doesn't work doesn't work. If I spend the $500 and sure enough it doesn't work are *you* going to offer me my money back??? Sure I know that iPad has Safari that can read HTML but I don't want to read HTML. I want to read ePub or Mobi on a decent ebook reader which will allow me to set things like font sizes and margins. >What is it you want?!?!?!? >You haven't SAID you want anything I haven't found for you. >Yet you have refused to acknowledge those efforts. I have checked them out and at least according to their own documentation they don't work. What I want is a slate like device with wifi where I can download epubs and mobis from the internet or from my intranet, read them, perhaps lightly edit or annotate them, and I want to be able to do so as seamlessly and as painlessly as from my netbook -- given that a slate is simply a netbook minus the keyboard. >Are you saying you sent to gutenberg.org and tried this without success? Yes. >Are you telling us what program you used in that effort? Safari >Are you willing to do what it takes to get what you want? I already have done so three different ways: 1) Using a desktop. 2) Using a netbook. 3) Using a Kindle. The question then is NOT whether I can find iPad "workaround" to get to some subset of what someone might be doing somewhere in the ebook world. The question is whether or not there is some iPad reader app that allows at least as good and as complete an experience as I am already experiencing via 1) 2) 3) above. 1) has the problem that its not portable. 2) has the problem that it has a keyboard that gets in the way. 3) has the problem that it has slow and unreliable whispernet rather than fast and reliable wifi. Is iPad better? Presumable not, or you would not keep emphasizing work-arounds. Perhaps when HP comes out with the Slate it will be "unlocked." Perhaps not. 
But I'm not going to pay $500 for the privilege of hack work-arounds! >You still have refused to name what programs you tried on what sites, and what you tried to do with them. I think I've told you, actually. When I say I used the web browser, I think its pretty obvious that the web browser on iPad is Safari? I told you we used iBooks, because we both discussed the PG limitations of what is there. I told you we tried Stanza, because I told you about the large blurry iPod simulator that brought up. I told you I spent an hour reading the Goodreader documentation about crashes and having to reconfigure ones computer and router to either support reading from the internet or from a local computer, and having to reconfigure to switch between the two... >Tell me, honestly, have to asked Apple for the documentation on how to write for the iPad? I have researched the issue of developing for Apple, yes, and was turned off having to pay subscription fees up front. Even Big Bill doesn't require that. >It's not the "programs" or the "ebooks" you are complaining about, it's "how badly 'locked down` their device is". . .!!! Same thing since they lock the books to the programs... >You haven't even gone back and tried ONE of them. Again, how would I test them more than I have already tested them without spending my $500 up front? >Let's deal with reality before dealing with the other stuff, ok? The reality is that people had bought iPods using Stanza and expecting to be able to share books and Apple took this away from them. Same "1984" kind of deal as the student who had purchased "1984" for their Kindle, was relying on that to do his homework, and without warning Amazon took off the purchased book without permission. >Four hours??? And you never managed to download ONE eReader App??? Sure we did, I told you we downloaded Stanza. >I have a strong feeling you didn't ask them for very much help. There wasn't much help to be had, truth be told. 
I will go back and see if they will allow me to install Goodreader, since that is your top suggestion. >I did mention Goodreader earlier, did I not? Perhaps, but you didn't mention that it could download directly from any particular website, in fact you have said repeatedly you don't care if it can download from any particular website. >And just how do think most of the great apps in history got started??? Most of them got started somewhere where a mere say-so from Steve Jobs isn't enough to get them *stopped!!!* From Bowerbird at aol.com Mon Apr 19 12:18:38 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 Apr 2010 15:18:38 EDT Subject: [gutvol-d] the blind men and the .epub file-format Message-ID: <8ce40.74ecf6f2.38fe068e@aol.com> it's sad y'all know so much, yet so little, all at the same time. lee is correct. but he doesn't know what he's talking about. jim is correct. but he doesn't know what he's talking about. as lee says, an .epub file is just some (x)html files zipped up. as jim says, the .epub files at p.g. often differ from the .html. what neither one seems to know is that marcello's converter doesn't always use the .html; sometimes it uses the .txt file. i don't know the particulars, but it probably has something to do with the nature of the specifics within the .html file... -bowerbird From Bowerbird at aol.com Mon Apr 19 12:29:15 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 Apr 2010 15:29:15 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <470.723ebfcf.38fe090b@aol.com> jim said: > I want to read ePub or Mobi on a decent ebook reader > which will allow me to set things like font sizes and margins. try ibisreader. > http://ibisreader.com you can try it on your desktop machine to see if you like it. and it's on the ipad. plus it is under _active_ development.
i don't know if "eucalyptus" is ipad-native yet, but when it is, i would definitely recommend that as a worthwhile viewer-app. (it has shortcomings, but from a reading perspective, it's fine.) -bowerbird From jimad at msn.com Mon Apr 19 12:48:46 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 19 Apr 2010 12:48:46 -0700 Subject: [gutvol-d] Re: the blind men and the .epub file-format In-Reply-To: <8ce40.74ecf6f2.38fe068e@aol.com> References: <8ce40.74ecf6f2.38fe068e@aol.com> Message-ID: >what neither one seems to know is that marcello's converter doesn't always use the .html; sometimes it uses the .txt file. i don't know the particulars, but it probably has something to do with the nature of the specifics within the .html file... Not sure what part of the elephant you've grabbed hold of, but if you looked at the example in question it would be obvious that your answer isn't. From lee at novomail.net Mon Apr 19 13:18:17 2010 From: lee at novomail.net (Lee Passey) Date: Mon, 19 Apr 2010 14:18:17 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: <4BCCBA89.2040103@novomail.net> On 4/19/2010 12:25 PM, Jim Adcock wrote: >> In other words, an ".epub" file is just a ".zip" file with a few > additional metadata files added. Software that purports to "convert" > HTML to .epub should not do /anything/ to the source file, except > perhaps to insure that it is valid XHTML (for older HTML files). There > is no need to validate an .epub conversion, as no conversion should have > occurred. If a rendered .epub document does not look exactly like the > same collection of files rendered by a browser from the file system, it > is the fault of the .epub rendering software, not the "conversion."
> > You make an interesting thesis, which, rare in the case of DP/PG arguments, > is eminently testable. I have done so, and you clearly have not. Take a PG > HTML zip file, say "76" for the sake of completeness. Download it, and > unpack it on your computer. Take a PG epub "zip" file, say pg76.epub for > concreteness. Download it, and unpack it on your computer. > > Now, look at the contents. > > Do they have the same HTML files? Yes, they do. The file names have been altered, but the content is virtually the same. [snip] > Do they have the same number of HTML files? Yes they do. Each has eight parts plus the godawful and legally unnecessary PG header (Apple is doing the world a favor by stripping it away). [snip] > Are the contents of the HTML files identical? > > No they are not. No, they are not. Mr. Perathoner's files 1.) have been converted from ISO-8859 to Unicode/UTF-8; 2.) have had their internal style sheets extracted into external style sheets; 3.) have had links added to a "center contents pages" stylesheet and a generic "pgepub" stylesheet; 4.) have had "id" attributes added for use by .epub user agents for navigation; and 5.) have had all the internal links changed to match the file paths inside his archive. All of these steps, except #3, are harmless and do not affect the presentation of the content. Indeed, with the exception of centering the tables they are probably all desirable things to do. > For the sake of completeness, open the first HTML file of each. Do the > files RENDER the same on your browser when you actually TRY them to see if > your thesis is correct? > > No they do not RENDER the same. First of all, it is your thesis not mine. I rarely, if ever, download files from PG; instead I get them from some other source where the quality of the files has more importance. But you are correct, with an unaltered archive they do /not/ render the same.
However, if you delete the "pgepub.css" file, or delete its contents, they /do/ render the same with the exception of the centered tables of contents. If you delete all the odd numbered .css files, then they /do/ render identically. This is, of course, exactly why embedding style information inside an HTML file is a bad thing (you can't change the presentation without editing the HTML) and including a link to a generic stylesheet is a good thing (just find the stylesheet you like, copy it over the top of the generic one, and voilà, your book, your way). All of this can be accomplished by using a visual zip tool, and without ever having to edit a file (other than your zipper). Although we definitely need to talk Mr. Perathoner out of adding a link to a "center me" style sheet. > It is an interesting thesis that PG epub files are "just" a zipped version > of the PG HTML files -- but it is an easily demonstrably false thesis. I never said that /PG/ .epub files are just a zipped version of /PG/ HTML files; I said that technically conforming .epub files are just zipped versions of their source HTML files. It is certainly possible to take an HTML file, alter it, and make an .epub file from the newly altered file. Personally, I would view that as a flaw in the conversion software, though, and independent of the issue of .epub encapsulation. > Marcello's epub software does more than "just" pack the HTML files into an > epub package. Ask him for a copy of his converter software, and see what > the conversion actually entails. And/or ask Marcello what conversions he > actually does to move from the HTML version to the epub version. True. Apparently, Mr. Perathoner's software extracts embedded CSS information and moves it to an external style sheet (as it should), creates a "
" around the tables of contents and illustrations, with a corresponding style sheet that centers the contents (which it should not), and adds a link to a generic "pgepub" style sheet (as it should), in addition to altering names for navigation purposes. Now apparently, your complaint is not that PG HTML does not make good .epub files, or that including a generic stylesheet "breaks" the ".epub", but that you don't like the .epub generator that Mr. Perathoner wrote. That complaint, with which I sympathize, needs to be directed to him individually; it cannot, however, be generalized to /all/ .epub files, only those created by his software. > Thus again, I suggest that it would be a good idea to have a portable > version of Marcello's epub conversion software that we could use for testing > on our local machines. Given a portable version of the epub conversion > software going to mobi is easy using the same Amazon/Mobipocket provided > epub->mobi conversion software that Marcello is already using. From Bowerbird at aol.com Mon Apr 19 13:49:32 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 19 Apr 2010 16:49:32 EDT Subject: [gutvol-d] Re: the blind men and the .epub file-format Message-ID: <6fba.24226b0f.38fe1bdc@aol.com> jim said: > Not sure what part of the elephant you've grabbed hold of, > but if you looked at the example in question it would be > obvious that your answer isn't. do you really think i'm going to let myself get wrestled into having a discussion with the blind men about the elephant? if so, think again. i said as much as i can in support of you -- you're half-right. so is lee... if you want to figure out the specifics from there, you're welcome to do so. but you'll be doing that without me. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From hart at pglaf.org Mon Apr 19 14:37:38 2010 From: hart at pglaf.org (Michael S. 
Hart) Date: Mon, 19 Apr 2010 14:37:38 -0700 (PDT) Subject: [gutvol-d] Re: [SPAM] RE: Re: Typesetting In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: On Mon, 19 Apr 2010, Jim Adcock wrote: > > >Apple has assured me over and over there is no DRM on our files. > > And I said it wouldn't matter if they have DRM on your files or not as long > as they prevent you from moving the files from applet to applet, and/or > prevent you from sharing the files with your friends, because then they have > accomplished the same goals as DRM without actually implementing the DRM. This is not where you started, and I doubt it's where either you or iPad end. > >Until you have at least tried the examples I went and found for you, > that allowed ME to read AND download directly from gutenberg.org.... > > Well, I just spent about an hour reading all the manuals for goodreader -- > the applet you most recommended -- and it talks about supporting PDF not > epub nor mobi and it talks about why when you try to read a big book it Is The Bible a big enough book for you? Shall I test that for you? Once again, I repeat, I don't care about any one particular format, so please stop pretending this is a valid topic with me, instead, I ask you to take that to a different subject header so I do not have to keep getting bonked over the head with it, however cartoonish. > crashes, and how if you want to set up wifi to talk to your computer then > that kills wifi to the internet, etc. And just how many systems do you have, or know of, that wifi to two wifi spots at the same time??? I have never even tried this, so I am very interested. I hope it isn't just one more dead herring dragged across this pathway-- a pathway that still continues to weave madly across times zones. > So forgive me if I am not impressed. I also looked up everything else on the > iTunes store listed under "ebooks" or "books" and those apps are even > weaker. 
So clearly we are living on non-parallel planets! Sorry, reading the manual and looking up Apps does not qualify you with any actual experience on the subject. Not to mention that YOU did not mention, once again, the actual NAME of the products you are claiming so much expertise about. I'm willing to bet you can't even name the handful I downloaded or much less the entire list available. This would indicate you don't actually know the options available for a selection of programs, not to mention those that are not spelled out in detail in what you say you have been reading. One of my favorite quotes of all time comes to mind here: "Don't Confuse The Map With The Territory." > >I read all sorts of stuff with Safari and Opera Lite, which is your > problem? > > I told you the first thing I tried doing at the Apple bricks and mortor > store was to go to PG in the Safari Browser, clicked on an epub link, and it > says "Sorry downloading that file type is not allowed." And you didn't go any farther with your experimentation? I tried other options, got other results. However, I will try what you said, as well. > >Please experiment and cite your specific examples that we can recreate. > > Please use your Safari browser, go to PG, pick an epub link, click on it, > and report back what happens on the iPad. When I tried this at the Apple > Brick and Mortor store iPad says "I'm sorry but I can't do that Hal". I have no desire to go to any Apple Brick and Mortar store, but I will try from general wifi hookups. Nevertheless, if, at the end of all this, you send me a list of questions, comments, complaints, etc., I will try to go get you answers from Apple. Fair enough? > For comparison, I go on my desktop using IE or Mozilla, click on an epub or > mobi link and that book opens automatically in its appropriate ebook reader, > just the same as clicking on a PDF file causes that file to open in Adobe > Reader. Or clicking on a djvu file opens it in a LizardTech djvu ebook > reader. 
> For comparison, on Kindle I go to PG, I click on a mobi link and it says "Do > You Want to Download This Book?" I say "Yes" that book shows up in my Kindle > bookshelf, where I click on the book and read it any time I want. And hasn't Apple made it totally obvious you can't do with with an iPad??? Yet you continue to complain that they have Apple when you want Orange??? Yet, I have found plenty of ways to get that kind of end result. No matter what your reading of manuals and reports might have said. BTW, I haven't been able to make the iPad crash yet, even with The Bible. > >And you never tried to install those reader apps I mentioned. . . . > >So what right have you to complain? About the below: Now you resort to putting words in my mouth, like in high school days. I have never tried to "suggest something where [you] have to spend [your] $500 up front only to. . ." particularly when you have made it obvious it it the case that your mind is closed to a variety of options. > I complain because every time you suggest something where I have to spend my > $500 up front only to determine that indeed what I said doesn't work doesn't > work. If I spend the $500 and sure enough it doesn't work are *you* going > to offer me my money back??? Sure I know that iPad has Safari that can read > HTML but I don't want to read HTML. I want to read ePub or Mobi on a decent > ebook reader which will allow me to set things like font sizes and margins. You never figured out how "to set things like font sizes and margins?" I'm beginning to wonder just what you did with your four hours. . . . > >What is it you want?!?!?!? > >You haven't SAID you want anything I haven't found for you. > >Yet you have refused to acknowledge those efforts. > > I have checked them out and at least according to their own documentation No, you haven't. . .not what most people mean. . .you never went back to see how reality compares with the docs. > they don't work. 
What I want is a slate like device with wifi where I can > download epubs and mobis from the internet or from my intranet, read them, > perhaps lightly edit or annotate them, and I want to be able to do so as > seamlessly and as painlessly as from my netbook -- given that a slate is > simply a netbook minus the keyboard. Ah, now, at this late stage, you have added that you want to edit eBooks on the iPad. > >Are you saying you sent to gutenberg.org and tried this without success? > > Yes. > > >Are you telling us what program you used in that effort? > > Safari > > >Are you willing to do what it takes to get what you want? > > I already have done so three different ways: > > 1) Using a desktop. > 2) Using a netbook. Not any different in this respect, just padding your bibliography. > 3) Using a Kindle. If you spent the same four hours' worth on a Kindle, and liked it so much you were already programming with it, I can't imagine why you are having this conversation at all. Unless it is just to moan and complain in front of an audience to somehow "get even" with Apple for being. . .well. . .Apple. You like Microsoft and Kindle. . .go. . .Bon Voyage!!! > The question then is NOT whether I can find iPad "workaround" to get to some > subset of what someone might be doing somewhere in the ebook world. The Sorry, but that is pretty much the entire essence of running computers. I'm betting you have just forgotten the steep learning curve you climbed to get to know the ones you now say you like. I'll bet you ranted and raved about them just like you are doing now!!! I did. ;-) > question is whether or not there is some iPad reader app that allows at > least as good and as complete an experience as I am already experiencing via > 1) 2) 3) above. 1) has the problem that its not portable. 2) has the problem > that it has a keyboard that gets in the way. 3) has the problem that it has > slow and unreliable whispernet rather than fast and reliable wifi. Is iPad > better? 
Presumably not, or you would not keep emphasizing work-arounds. > Perhaps when HP comes out with the Slate it will be "unlocked." Perhaps > not. But I'm not going to pay $500 for the privilege of hack work-arounds! Sure you are! You do it every time you buy a computer, or somebody pays for one for you to use. It's all built on that sort of thing. Get used to it. Don't ever look under the hood, you will be terribly disappointed by the plethora of "hacks and workarounds" that make every bit of this work. > >You still have refused to name what programs you tried on what sites, > and what you tried to do with them. > > I think I've told you, actually. When I say I used the web browser, I think > it's pretty obvious that the web browser on iPad is Safari? I told you we > used iBooks, because we both discussed the PG limitations of what is there. At first you denied using iBooks at all, don't you remember??? > I told you we tried Stanza, because I told you about the large blurry iPod Actually, you spoke of that as if it were a hypothetical, so there was quite literally no way to know whether you had actually tried it, or were reporting once again what various manuals and reviews told you. > simulator that brought up. I told you I spent an hour reading the > Goodreader documentation about crashes and having to reconfigure one's > computer and router to either support reading from the internet or from a > local computer, and having to reconfigure to switch between the two... Same with your cell phone and most other such devices. However, if all your systems use plain 802.11n or 802.11g, there should be no problem; it certainly hasn't been for me. > > >Tell me, honestly, have you asked Apple for the documentation on how to > write for the iPad? > > I have researched the issue of developing for Apple, yes, and was turned off > by having to pay subscription fees up front. Even Big Bill doesn't require > that. So, you are admitting you never asked for what you didn't get.
"You Never Know What You Might Get If You Don't Ask For It." > >It's not the "programs" or the "ebooks" you are complaining about, > it's "how badly 'locked down' their device is". . .!!! > > Same thing since they lock the books to the programs... No, it's not the same thing. Learn to speak specifically when you say such things, ask such questions, etc. Otherwise you are just wasting a lot of people's time. > >You haven't even gone back and tried ONE of them. > > Again, how would I test them more than I have already tested them without > spending my $500 up front? Gee, I would think that obvious to someone who already did it once before. > >Let's deal with reality before dealing with the other stuff, ok? > > The reality is that people had bought iPods using Stanza and expecting to be > able to share books and Apple took this away from them. Same "1984" kind of > deal as the student who had purchased "1984" for their Kindle, was relying > on that to do his homework, and without warning Amazon took off the > purchased book without permission. You seem to be bringing up something new, and of great interest. I'm sure we'd all like to hear more about this!!!!!!! > >Four hours??? And you never managed to download ONE eReader App??? > > Sure we did, I told you we downloaded Stanza. For the iPad specifically! Now, just above, you said you were using iBooks, doesn't that count? > >I have a strong feeling you didn't ask them for very much help. > > There wasn't much help to be had, truth be told. I will go back and see if > they will allow me to install Goodreader, since that is your top suggestion. You keep short-changing Wattpad, which I think I mentioned first. When at the App Store, just search for "ebooks" and "books" etc. How many times have I said that??? > >I did mention Goodreader earlier, did I not?
> > Perhaps, but you didn't mention that it could download directly from any > particular website, in fact you have said repeatedly you don't care if it > can download from any particular website. You download it from the App Store. . . . > >And just how do you think most of the great apps in history got started??? > > Most of them got started somewhere where a mere say-so from Steve Jobs isn't > enough to get them *stopped!!!* No, they just worked around their current version of Steve Jobs, such as working around IBM, then Apple, then Microsoft, and Intel, and AMD, Sony and all the rest. . . . From lee at novomail.net Mon Apr 19 14:51:13 2010 From: lee at novomail.net (Lee Passey) Date: Mon, 19 Apr 2010 15:51:13 -0600 Subject: [gutvol-d] Re: the blind men and the .epub file-format In-Reply-To: <8ce40.74ecf6f2.38fe068e@aol.com> References: <8ce40.74ecf6f2.38fe068e@aol.com> Message-ID: <4BCCD051.7060508@novomail.net> On 4/19/2010 1:18 PM, Bowerbird at aol.com wrote: [snip] > what neither one seems to know is that marcello's converter > doesn't always use the .html; sometimes it uses the .txt file. > i don't know the particulars, but it probably has something > to do with the nature of the specifics within the .html file... bb is correct when he suggests that sometimes the .epub file is two generations removed from the Impoverished Text file. If there is no hand-crafted HTML file, there is an option to download a computer-generated HTML file.
If you were to download an .epub file for one of these texts for which only ITF is stored (I used _War of the Worlds_, etext 35) you would see that the internal HTML differs from the computer-generated HTML only by the fact that the computer-generated HTML contains the metadata in elements whereas the .epub contains the metadata in the content.opf file, and by the fact that the .epub file contains a link to "pgepub.css" whereas the computer-generated HTML does not (why not? what harm would it do? For that matter, why not leave the metadata in the HTML file as well?). Presumably .epub generation is a linked process whereby ITF is converted to HTML, which is then encapsulated in the OCF. Because native HTML is relatively uncommon at PG, I would guess that most .epub files start the process as ITF. From hart at pglaf.org Mon Apr 19 17:35:43 2010 From: hart at pglaf.org (Michael S. Hart) Date: Mon, 19 Apr 2010 17:35:43 -0700 (PDT) Subject: [gutvol-d] Dim View In-Reply-To: References: <8ce40.74ecf6f2.38fe068e@aol.com> Message-ID: I, myself, am taking a dim view of some of these conversations, in re: that I am perhaps taking them more seriously than deserved. So, unless there are some requests for further comments, I intend my future comments to be more limited in seriousness and scope. mh From marcello at perathoner.de Tue Apr 20 04:10:28 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 Apr 2010 13:10:28 +0200 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCCBA89.2040103@novomail.net> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> <4BCCBA89.2040103@novomail.net> Message-ID: <4BCD8BA4.6070803@perathoner.de> Lee Passey wrote: > creates a "
" around the tables of contents and > illustrations, with a corresponding style sheet that centers the > contents (which it should not), HTML Tidy does that. Direct your complaints to the w3c. -- Marcello Perathoner webmaster at gutenberg.org From tunelera at yahoo.com Tue Apr 20 07:55:37 2010 From: tunelera at yahoo.com (Julia C. Miller) Date: Tue, 20 Apr 2010 09:55:37 -0500 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCC9478.5040204@perathoner.de> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de> Message-ID: <4BCDC069.2050608@yahoo.com> On 4/19/2010 12:35 PM, Marcello Perathoner wrote: > Julia C. Miller wrote: > >> In order for a "paradigm shift" to happen at DP, PG has to define >> what is and is not acceptable in the HTML and spell it out so that DP >> can put it into practice. > > It would be much better if DP did that. > So after DP goes through the time and effort to define the standards to upload to PG, people from PG can say "No, that's not what we want"? > >> It would also be extremely helpful to have a way to preview the >> different output formats so we can test our finished HTML and make >> sure it works properly not only as HTML but also as the source for >> the other formats. > > Roger Frank has the converter and did extensive testing on it. > Yes, Roger has the converter and his discussion of the changes that need to be made so the conversion to ePub works properly was very helpful. I used what I learned in that thread in the last 8 books that I have uploaded. But I am working on books right now that I know will not convert properly (based on what I have learned from Roger's discussion). I would like to be able to preview, change the coding and preview again until I find a satisfactory solution. 
From marcello at perathoner.de Tue Apr 20 08:26:44 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Tue, 20 Apr 2010 17:26:44 +0200 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCDC069.2050608@yahoo.com> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de> <4BCDC069.2050608@yahoo.com> Message-ID: <4BCDC7B4.9080605@perathoner.de> Julia C. Miller wrote: > So after DP goes through the time and effort to define the standards to > upload to PG, people from PG can say "No, that's not what we want"? I don't see any danger of that as long as the new standards are more restrictive than the old ones. And they'd have to be a lot more restrictive to be worth the trouble of implementing them. > Yes, Roger has the converter and his discussion of the changes that need > to be made so the conversion to ePub works properly was very helpful. I > used what I learned in that thread in the last 8 books that I have > uploaded. But I am working on books right now that I know will not > convert properly (based on what I have learned from Roger's discussion). > I would like to be able to preview, change the coding and preview again > until I find a satisfactory solution. The sources are online ... But with me being a 100% linux shop and ibiblio being a 100% linux shop and with 99% of you wanting windows software, somebody has to take the time and port it. OTOH the converter is just one link in the chain. You'd also have to test the ePub on every reader out there. It's much easier to forget about fancy formatting and use only the simplest HTML constructs.
-- Marcello Perathoner webmaster at gutenberg.org From dakretz at gmail.com Tue Apr 20 10:38:01 2010 From: dakretz at gmail.com (don kretz) Date: Tue, 20 Apr 2010 10:38:01 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCDC7B4.9080605@perathoner.de> References: <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de> <4BCDC069.2050608@yahoo.com> <4BCDC7B4.9080605@perathoner.de> Message-ID: Roger is also a 100% linux shop. Well, that may not be entirely true - he probably uses Solaris and various unices. On Tue, Apr 20, 2010 at 8:26 AM, Marcello Perathoner wrote: > Julia C. Miller wrote: > > So after DP goes through the time and effort to define the standards to >> upload to PG, people from PG can say "No, that's not what we want"? >> > > I don't see any danger of that as long as the new standards are more > restrictive than the old ones. And they'd have to be a lot more restrictive > to be worth the trouble of implementing them. > > > > Yes, Roger has the converter and his discussion of the changes that need >> to be made so the conversion to ePub works properly was very helpful. I used >> what I learned in that thread in the last 8 books that I have uploaded. But >> I am working on books right now that I know will not convert properly (based >> on what I have learned from Roger's discussion). I would like to be able to >> preview, change the coding and preview again until I find a satisfactory >> solution. >> > > The sources are online ... But me being a 100% linux shop and ibiblio being > a 100% linux shop and with 99% of you wanting a windows software somebody > has to take the time and port it. > > OTOH the converter is just one link in the chain. You'd also have to test > the ePub on every reader out there.
> > It's much easier to forget about fancy formatting and use only the simplest > HTML constructs. > > > > -- > Marcello Perathoner > webmaster at gutenberg.org From jimad at msn.com Tue Apr 20 11:41:02 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 11:41:02 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: >If only Jim would have been as thorough, and polite, about the iPad. I don't know what your problem is, but per your suggestions I went back to the Apple "brick and mortar" store yesterday, and spent an additional three+ hours researching all the iPad suggestions you made. None of them "worked" as you suggested. None of them allow one to read ebook files, either ePub or MOBI, directly via wifi access to the internet. None of them allow access to the most recent books in ePub format released on PG. What almost all of them do is allow access to an internet server tied to that particular applet that allows one to read some subset of the PG offerings in some degraded form, typically a slightly spruced up "pretty print" version of a text file. This can simply be checked by searching for one of the latest PG books, in which case you will find that NONE of these applets offer the latest PG books. Unlike most full-function web browsers on desktops or even netbooks, if one uses the provided Safari web browser to click on an ePub or MOBI file Safari says "You cannot download that file." You CAN use Safari to open a PDF file, which makes iPad useful for reading Google Books PDF "photocopies" of books. Just not useful for what PG offers.
Also even IF you use Safari to open a PDF file, apparently Safari does not retain a copy of that book because when I run Safari again it redownloads that PDF file from scratch. Why does one care? Well, that's slow, and it means that this doesn't work in "airplane mode", i.e. you cannot read the PDF file on an airplane via Safari with the wifi turned off. The ability to actually download and save book files is a fundamental feature of "real" book readers, and in the design of "real" ebook formats such as ePub and MOBI, such that when you download a book, you HAVE that book, and can then read it wherever and whenever you choose without requiring wifi or a wired connection. What the Apple manuals say (finally released yesterday and read last night) is that one is allowed to transfer free book files in ePub format via USB cable from your desktop to your iPad via iTunes. You download a free ePub book to your desktop. From there you transfer it to the iTunes software. Then you hook up your iPad using a USB cable. Then you sync to iTunes. Then you safely unplug your iPad. Then you open iBooks. Then you find the new book in the iBooks "shelf" which you can then finally click on and start reading. As opposed to one click on a link to a free ePub or MOBI book, say at PG, using a netbook browser, which downloads the file, stores it, opens the reader app and you are up and reading. 1 second versus 10 minutes of hassle factor. Further, the iPad manuals say that Apple has *permanently* given Apple applets priority over other applets for the file types that the iPad supports. I.e., the ePub type is "hardwired" to the iBooks applet such that if you transfer an ePub file via the long-winded iTunes USB "sync" process then one can only read that ePub using the iBooks app -- which is a pretty weak app compared to other ePub and MOBI readers if one has made the comparison. [Imagine if Mickeysoft "hardwired" the HTML file type to the IE browser and allowed no other browser choice!
Can you say "Monopoly"? I knew you could.] What CAN iPad do? It can reasonably present paid books from Apple on iBooks (not the greatest reader app, but not too horrible either). It can reasonably present a free subset of PG's offerings repackaged as-if they came from Apple on iBooks. It can reasonably present paid books from Amazon via Kindle for iPad. It can reasonably present a free subset of PG's offerings repackaged as-if they came from Amazon via Kindle for iPad. If you have already bought books for a Kindle then Kindle for iPad will also allow you to read them for no additional cost on iPad. It can reasonably present PDF books and documents via Safari as long as you have an active wifi connection. It can store and allow you to read free ePub and other common document formats that you have transferred to iPad using the slow and cumbersome Desktop/USB/iTunes path. [At least the documentation claims this -- I cannot test it in the Apple store because they don't have USB to desktop set up.] Is this all good or bad? It depends on what you want to do. If you simply want to be a passive consumer of content, similar to watching TV from your cable provider, then maybe it's fine. If you want to be a CREATOR of content, such as someone who helps DP, SRs books from DP, "solos" books for PG, etc, then it's a pretty weak offering -- IMHO you would be much better off putting up with the hassles of a netbook which DOES allow one to quickly and painlessly transfer content using wifi. And if you are a reader omnivore like I am, then you will probably rapidly get sick of Jobs's monopolistic restrictions constantly getting in the way of your ability to quickly and easily download What you want from Where you want, reading it with Whatever reader applet YOU damned well choose -- NOT Steve Jobs! Other reasonable approaches: Wait for the HP Slate and see how cobbled-up its touch abilities are. At least it offers a REAL operating system -- why couldn't Apple have offered OS X on iPad ???
Buy a netbook and put up with the keyboard hassles. Buy a Kindle and put up with the crappy web browser and slow-and-unreliable "whispernet" AT&T connection -- at least you get a good built-in reader app and good screen technology. Buy a low-cost generic reader such as Libre Pro. Buy an iPod and at least you're admitting you are reading on a cellphone and at least you are actually getting a cellphone--with the resulting compromises in space, speed, and OS. Wait and see if the next version of the OS for iPad is less compromised. From jimad at msn.com Tue Apr 20 12:01:12 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 12:01:12 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCCBA89.2040103@novomail.net> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> <4BCCBA89.2040103@novomail.net> Message-ID: >Now apparently, your complaint is not that PG HTML does not make good .epub files, or that including a generic stylesheet "breaks" the ".epub", but that you don't like the .epub generator that Mr. Perathoner wrote. That complaint, with which I sympathize, needs to be directed to him individually; it cannot, however, be generalized to /all/ .epub files, only those created by his software. First, it should be obvious to all that PG ePub is NOT simply HTML repackaged and compressed, in that PG ePub is offered in two flavors, with and without "illustrations", and if those "illustrations" are illuminated caps then that is going to have at least SOME impact on the ePub files generated and the enjoyment or lack thereof of the end reader! My *complaint* rather was that YOU said it was not necessary to have access to Marcello's converter because I could easily create my own ePub files to see what my HTML would look like as an ePub. Which was clearly false.
My *suggestion* after *others* at PG complained that DP keeps turning out HTML which breaks when turned into PG ePub files was that maybe PG ought to offer Marcello's converter software in a portable form (I tried porting it but can't get it to work) so that DP authors (PPs) can actually TRY the ePub format as part of their content development process, and perhaps IF they saw for themselves that they were making choices in their HTML cutesiness that are causing the ebook reader's experience to fail THEN perhaps they would make better choices. BUT, currently the only way to see how the ePub or MOBI is going to turn out is to submit the completed HTML to PG for posting, at which point in time it's way too late to make more reasoned HTML design tradeoffs. From jimad at msn.com Tue Apr 20 12:15:35 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 12:15:35 -0700 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <470.723ebfcf.38fe090b@aol.com> References: <470.723ebfcf.38fe090b@aol.com> Message-ID: Sigh. Do any of you guys know what an eBook is or an eBook reader??? Ibis is yet another hack workaround, in this case offering a low-quality rendering of an ePub from a list they maintain on their website, rendered into HTML, displayed while you are attached to the internet via cable or wifi. It doesn't allow you to download the book, nor does it allow you to download a book from a location you choose, but rather always from Feedbook. It doesn't allow you to choose font, or font size, or margins. It doesn't allow you to read on an airplane or on a beach or anything else an ebook reader allows. It doesn't contain all the PG catalog and certainly not any of the recent titles, which 30 seconds of testing will easily demonstrate. Eucalyptus, from their website, says iPod not iPad and it says they work from ASCII format not ePub nor MOBI so they are not even working from eBook files. >try ibisreader. >i don't know if "eucalyptus" is ipad-native yet, but when it is.
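A side note on the preview problem raised in this thread: an .epub is a zip container, so what a converter actually put inside one can be inspected on any desktop, without a reader device. The sketch below is illustrative only. The function names `inspect_epub` and `build_sample_epub` are hypothetical, the sample file is built in memory rather than downloaded from PG, and the file names content.opf and pgepub.css are taken from Lee's description rather than checked against PG's current output. It does not run or replace Marcello's converter, and it says nothing about how a given reader will render the book.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OCF container file and the OPF package file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_sample_epub():
    """Build a tiny stand-in .epub in memory (hypothetical contents)."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:title>Sample Title</dc:title></metadata></package>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("content.opf", opf)
        zf.writestr("pgepub.css", "body { margin: 1em; }")
        zf.writestr("book.html", "<html><body><p>text</p></body></html>")
    buf.seek(0)
    return buf


def inspect_epub(source):
    """Return the OPF path, the dc:title, and the CSS files in an .epub.

    `source` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        # The OCF container file names the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.get("full-path")
        # The metadata lives in the OPF file rather than in the HTML.
        opf = ET.fromstring(zf.read(opf_path))
        title = opf.findtext("opf:metadata/dc:title", namespaces=NS)
        css = sorted(n for n in zf.namelist() if n.endswith(".css"))
    return {"opf": opf_path, "title": title, "css": css}


if __name__ == "__main__":
    print(inspect_epub(build_sample_epub()))
```

Passing the path of a downloaded .epub to `inspect_epub` instead of the in-memory sample should work the same way, assuming the file follows the standard container layout.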
From dakretz at gmail.com Tue Apr 20 12:35:52 2010 From: dakretz at gmail.com (don kretz) Date: Tue, 20 Apr 2010 12:35:52 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> <4BCCBA89.2040103@novomail.net> Message-ID: Summary of the situation (as it seems to me). DP is currently taking too long to produce texts that are either less (plain-text) or more (DP-style HTML) than the supply chain is able to convey to the end-readers to deliver the experience intended. Once DP delivers their content in one or both of those formats, it's for all purposes stuck at PG (nicely symmetrical with how it had previously been stuck at DP) because while DP had the raw materials but no finished goods, PG has the finished goods but no raw materials. So for whatever purpose (quality improvement, error correction, evolving requirements) PG's products grow stale. What can DP do in the reasonable short-term future that would be low risk and low effort? The first most obvious to me is to start getting serious about passing along the raw materials. Upload in as complete form as possible the matching image and text files so future modification and adaptation is possible. There's no loss to DP by doing so; and the risk is that over time they are quite capable of losing track of them.
From jimad at msn.com Tue Apr 20 13:31:36 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 13:31:36 -0700 Subject: [gutvol-d] EBook formats on iPad via wifi In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: [Changed the header per Michael's request] >Once again, I repeat, I don't care about any one particular format, so please stop pretending this is a valid topic with me, But then why do you keep offering advice to me that you know will not work for me because I *am* interested in eBook formats and eBook content development such as DP does which is HTML rendered into ePub and MOBI? PG doesn't even typically support PDF so how does the fact that Safari happens to kind of support reading documents in PDF have anything to do with PG? >And just how many systems do you have, or know of, that wifi to two wifi spots at the same time??? My generic $300 netbook has absolutely no problem whatsoever fresh out of the box using wifi to transfer files either from the internet or to/from the publicly shared folders of all my locally networked computers. I never have to change a wifi setting just to get a file from a differing location! >Sorry, reading the manual and looking up Apps does not qualify you with any actual experience on the subject. OK I spent another 3+ hours at the Apple Store (bricks and mortar) last night trying all the "books" apps there including all of your suggestions and none of them do anything "reasonable" like actually allowing one to transfer an ePub or MOBI file from a chosen location on the internet onto the iPad and allowing one to read that eBook there. >Not to mention that YOU did not mention, once again, the actual NAME of the products you are claiming so much expertise about. I tried all the ones you mentioned previously plus all the ones searchable via the App Store searching on "book" or "ebooks" as you suggested previously.
>Nevertheless, if, at the end of all this, you send me a list of questions, comments, complaints, etc., I will try to go get you answers from Apple. The question, complaint, comments, etc. would be the same as from the start, namely: "If in fact one can do so on iPad, how does one use iPad to download a free eBook in ePub or MOBI format via wifi from an internet site that I choose, storing that ePub or MOBI eBook on the iPad, and then get iPad to open that eBook for reading at this and/or a later date -- when I may or may not have an internet connection?" This is quite possible from Kindle (whispernet), desktops, laptops, and netbooks, for example, so I don't think it's an unreasonable expectation. >And hasn't Apple made it totally obvious you can't do that with an iPad??? Then why in gawd's name would anyone want to get an iPad as an ebook reader? >Ah, now, at this late stage, you have added that you want to edit eBooks on the iPad. I can live without it. What makes iPad uninteresting to me is if one cannot use wifi to download an ePub or MOBI file from a location of my own choosing. I can't edit eBooks on the Kindle, for example, but the Kindle does allow me to bookmark problem spots in the text I am SR'ing, and then I can transfer the bookmarks to my desktop, use that info to locate the problems in the text under development, and fix them there. >If you spent the same four hours' worth on a Kindle, and liked it so much you were already programming with it, I can't imagine why you are having this conversation at all. I think if you have been following these conversations at all over the preceding months you would know that I am not in love with any particular ebook reader, which is why I am still on the lookout for something that would work better. Apple hyped how much better iPad would be, so I tried it, and found that in fact it consists of demoware. >Unless it is just to moan and complain in front of an audience to somehow "get even" with Apple for being. . .well. .
.Apple. I have to admit I have not spent much time on Apple desktops or laptops but I cannot believe that Apple could possibly be *this* restrictive on their desktops or laptops or they would be out of business. The question is not then whether or not I like Apple, but rather whether or not iPad offers anything new and interesting in terms of an eBook Reader. You claimed it did. I tried it and it doesn't work. If iPad is that restrictive, then I don't like it. I also don't like nook for the same reason -- namely nook has a wifi but doesn't let the customers use it for anything except buying books from B&N. Why should I pay for a "feature" I am not allowed to use? Does that mean I hate B&N? No, if I want to buy a book or magazine I still go to B&N -- I just don't spend my money on a nook designed to lock me into only being able to spend more money on a nook. Do I "love" Kindle -- no, it has a crappy web browser, is slow to open PDF files, has the lousy slow AT&T "whispernet" connection etc. Yet even with these restrictions I CAN get things done with Kindle, whereas iPad successfully blocks everything I try to do. >At first you denied using iBooks at all, don't you remember??? No. Quote me when. >> I told you we tried Stanza, because I told you about the large blurry iPod >Actually, you spoke of that as if it were a hypothetical... I don't think I did. Quote me when. >So, you are admitting you never asked for what you didn't get. Strange. Why would I ask for the privilege of paying a subscription fee to develop apps for a device that doesn't work? >You seem to be bringing up something new, and of great interest. I have talked about it before on this same forum so it is not new and flamed Amazon for their stupidity then just as I am flaming Apple for their stupidity now. Search on "1984 Amazon" if anyone is interested in the "1984" Amazon Kindle act of stupidity. 
Read http://manuals.info.apple.com/en_US/iPad_User_Guide.pdf re iTunes syncing if you want to read about Apple's act of stupidity. >Now, just above, you said you were using iBooks, doesn't that count? iBooks was on the iPad already, so no, I didn't download it. When I went back to the Apple store again last night at your suggestion they repeated that I was not allowed to download apps and that if I tried to do so it would not work. I waited till they were not looking, tried downloading apps, and eventually figured out how to get the app downloads to work. The apps *themselves* once downloaded however do not allow downloading of free ePub and MOBI books from a website of my choosing, storing those on the iPad for reading later, so the apps you ask me to install don't do anything interesting or useful to me. >You keep short-changing Wattpad, which I think I mentioned first. I did download Wattpad and it is simply yet another app that ties to one particular server on the internet downloading a subset of PG books lightly reformatted from ASCII plaintext. >When at the App Store, just search for "ebooks" and "books" etc. >How many times have I said that??? I did that, tried everything, again the apps out there all connect to a private server on the internet downloading a subset of PG books lightly reformatted from ASCII plaintext. iBooks is a bit better in that they take PG ePub, hack it to represent it as-if it comes from Apple, and redistribute it from their servers. This also means that they only serve up a subset of PG works, and it means that it is not useful for content development, such as SR from DP. Kindle for iPad is a bit better in that they again take PG books, hack them to represent them as-if they come from Amazon, and redistribute them from their servers -- but do it on a better reader app than iBooks. Which again means that they only serve up a subset of PG works, and it means that it is not useful for content development, such as SR from DP.
> (Re Goodreader) You download it from the Apps Store. . . .

But IT in turn cannot download ePub or MOBI books from a general location on the internet.

> No, they just worked around their current version of Steve Jobs, such as working around IBM, then Apple, then Microsoft, and Intel, and ADM, Sony and all the rest. . . .

Thinking back in time, I think this was a somewhat true statement when app distribution was via computer stores. Since the internet has caught on, I haven't had problems distributing content or apps to whoever I want. The internet has a problem in that searching is via Google, and Google in turn has its own monopolistic practices, such as refusing to return a search "hit" on small websites even if you search on the exact name of that website -- unless one sends copious advertising dollars to Google.

From jimad at msn.com Tue Apr 20 13:43:56 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 13:43:56 -0700
Subject: [gutvol-d] Re: the blind men and the .epub file-format
In-Reply-To: <4BCCD051.7060508@novomail.net>
References: <8ce40.74ecf6f2.38fe068e@aol.com> <4BCCD051.7060508@novomail.net>
Message-ID:

>Because native HTML is relative uncommon at PG, I would guess that most .epub files start the process as ITF.

Please don't guess, but rather check it out. For example, of the books posted in the last 24 hours, 15 out of 17 came with native HTML. Playing around with Advanced Search, it reports 21786 books in HTML native format versus 20828 in text format. I.e., going back over the entire history of PG, about 2/3rds of the books have HTML native format.

From Bowerbird at aol.com Tue Apr 20 14:16:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 20 Apr 2010 17:16:40 EDT
Subject: [gutvol-d] summary of the recent discussion of e-books on the ipad
Message-ID: <12108.203769e3.38ff73b8@aol.com>

if you wonder if the recent flurry between jim and michael needs to be read, let me save you the time and trouble...
according to michael, who doesn't care about the format of the books just as long as he can get to the content inside, it's pretty easy to read e-books on the ipad, especially when one has an always-on connection to the web, in which case one can access many sites, using many viewer-programs, specifically including the native safari web-browser... according to jim, who wants his e-books in .epub or .mobi, getting e-books on the ipad can be a large pain in the ass... jim also complains about the closed nature of the ipad, and prefers his netbook, albeit hasn't reported on whether or not the added weight of the netbook hampers his use of the unit. much drama and misunderstandings and unaddressed points were also part of the recent exchange of e-mails, but i think i've boiled the important aspects. if you have any questions, please feel free to ask them... -bowerbird p.s. if anyone actually _owns_ an ipad, would you report in? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Tue Apr 20 14:26:38 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 Apr 2010 17:26:38 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <12ba6.1c940b.38ff760e@aol.com> according to its website, ibisreader can: 1) fetch an .epub from any website, and 2) download it so it can be read offline... jim claims neither of these things is true. i have no dog in this fight. -bowerbird p.s. as i said, _when_ eucalyptus is ipad-native, i recommend it. the fact that it uses p.g. .txt files, instead of .epub files, is a _feature_, not a _bug_... and anyone who makes a claim that a p.g. .txt file is "not an e-book file" is a full-on bloomin' idiot. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bowerbird at aol.com Tue Apr 20 14:29:23 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 Apr 2010 17:29:23 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: <12ead.2be71cd9.38ff76b3@aol.com> dakretz said: > Summary of the situation (as it seems to me). > DP is currently taking too long to produce texts > that are are either less (plain-text) > or more (DP-style HTML) > than the supply chain is able to convey to the > end-readers to deliver the experience intended. "less" and "more" are value-laden terms. _and_ incorrect. please re-summarize... > Upload in as complete form as possible > the matching image and text files > so future modification and adaptation is possible. > There's no loss to DP by doing so oh dear boy. (just an expression. don is an old man. heck, he might even be _a_grandfather_ by this time.) but don, how much more personal denigration are you going to have to endure at the hands of d.p. apologists before you realize that you do not grok their mindset? let me 'splain it to you... d.p. has, in their hands, the scans from all the books that have gone through their system. so they _could_ have pushed them to project gutenberg at any time... indeed, charlz originally intended that d.p. itself would mount the scans. he called it the "online library system", and at one point in time, it actually came into existence. (it's probably still there, with some 6,000 scan-sets in it.) why hasn't it been maintained? well, _i_ happen to think that it's pretty obvious. but maybe that's because of what i do with those scans: i use them to point to unequivocal evidence of _errors_ in the "final product" emerging from the d.p. workflow. and that's what other people might do with them, too... does d.p. want us unequivocally pointing out their errors? no. ergo, they are keeping their scans to themselves... the myth of d.p. accuracy is one that keeps d.p. going... 
the powers-that-be over there do not want to put that myth up against _any_ solid evidence to the contrary... and it's not that hard to understand, either. rfrank was eager to see the results of my check on the "sitka" book, at least when he thought that check would be _positive_. but when it was less than flattering, he clammed right up. it's hard for some people to admit they make mistakes... even if they can do it in a "general" way, when it comes to close-eyed examination of specifics, they're uninterested, and might even go to great lengths to suppress evidence...

> and the risk is that over time
> they are quite capable of losing track of them.

they have the scans firmly in their grasp now, and they wish to retain control, so they simply are not worried about "losing track of them"...

-bowerbird

From jimad at msn.com Tue Apr 20 14:57:40 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 14:57:40 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCDC7B4.9080605@perathoner.de>
References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de> <4BCDC069.2050608@yahoo.com> <4BCDC7B4.9080605@perathoner.de>
Message-ID:

>Its much easier to forget about fancy formatting and use only the simplest HTML constructs.

I think for the most part people are, after reviewing the recent submissions. It seems like there are only a few commonly repeated mistakes PPs make that confound ePub and MOBI generation:

What to do about illustrations in the ePub and MOBI files distributed without illustration.
Illuminated Initial Caps
Drop Caps
Text represented as Illustration for some reason (PP thought the original text looked so cool that some of it was introduced as an Illustration)
Equations "typeset" in Unicode/HTML

I wonder whether, instead of enumerating the HTML constructs people are allowed to use, it wouldn't be better simply to enumerate the HTML practices that will lead to trouble. Again, I don't think people are trying to cause trouble; they just get seduced by some visual aspect of HTML without realizing the problems that will cause later.

From jimad at msn.com Tue Apr 20 15:13:02 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 15:13:02 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To:
References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> <4BCCBA89.2040103@novomail.net>
Message-ID:

>The first most obvious to me is to start getting serious about passing along the raw materials. Upload in as complete form as possible the matching image and text files so future modification and adaptation is possible. There's no loss to DP by doing so; and the risk is that over time they are quite capable of losing track of them.

I suggest that it is helpful, if possible, for the HTML to be submitted with the linebreaks the same as the original book, and that PG retain those linebreaks rather than changing the line lengths by, say, running the HTML through "tidy." Or else at least retain the submitted HTML internally with the original linebreaks, to make it easier to fix problems or make another pass through DP or some other process some day. Pgdiff can be used to recover the linebreaks, but it is less work if the linebreaks are never discarded in the first place.

From jimad at msn.com Tue Apr 20 15:21:35 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 15:21:35 -0700
Subject: [gutvol-d] Re: the blind men and the .epub file-format
In-Reply-To:
References: <8ce40.74ecf6f2.38fe068e@aol.com> <4BCCD051.7060508@novomail.net>
Message-ID:

20828
^
sorry, should read 30828

From hart at pglaf.org Tue Apr 20 15:39:45 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Tue, 20 Apr 2010 15:39:45 -0700 (PDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To:
References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de> <4BCDC069.2050608@yahoo.com> <4BCDC7B4.9080605@perathoner.de>
Message-ID:

As I said before, no one can or will complain vociferously if BOTH the illuminated caps AND the ASCII are included. It won't hurt the readability, and it won't matter where the illumination ends up in nearly such exact terms. Why make this so much harder than it has to be???!!! Just make it so everyone can BOTH read the text AND appreciate the illumination. So. . .please. . .stop wasting time and effort, and just make it easy on all concerned, as it should be. No more mountains made out of molehills. . . .

Thanks!!!

Give eBooks in 2010!!!

Michael S. Hart
Founder Project Gutenberg
Inventor of eBooks

Recommended Books:
Dandelion Wine, by Ray Bradbury: For The Right Brain
Diamond Age, by Neal Stephenson: To Understand The Internet
The Phantom Tollbooth, by Norton Juster: Lesson of Life. . .

If you ever do not get a prompt response, please resend, then keep resending, I won't mind getting several copies per week.

On Tue, 20 Apr 2010, James Adcock wrote: > >Its much easier to forget about fancy formatting and use only the > simplest HTML constructs.
> > I think for the most part people are after reviewing the recent submissions. > It seems like there a only a few commonly repeated mistakes PPs do that > confound ePub and MOBI generation: > > What do to about illustrations in the ePub and MOBI files distributed > without illustration. > > Illuminated Initial Caps > > Drop Caps > > Text represented as Illustration for some reason (PP thought the original > text looked so cool that some of it was introduced as an Illustration) > > Equations "typeset" in Unicode/HTML > > I wonder if instead of enumerating the HTML constructs people are allowed to > use if it wouldn't be better simply to enumerate the HTML practices that > will lead to trouble? Again, I don't think people are trying to cause > trouble, they just get seduced by some visual aspect of HTML without > realizing the problems that will cause later. > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From kevin.pulliam at gmail.com Tue Apr 20 15:48:37 2010 From: kevin.pulliam at gmail.com (Kevin Pulliam) Date: Tue, 20 Apr 2010 17:48:37 -0500 Subject: [gutvol-d] Archiving of Raw Materials was Re: Re: DP output is technically obsolete Message-ID: On Tue, Apr 20, 2010 at 2:35 PM, don kretz wrote: > Summary of the situation (as it seems to me). > SNIP > > What can DP do in the reasonable short-term future that would be low risk > and low > effort? > > The first most obvious to me is to start getting serious about passing along > the > raw materials. Upload in as complete form as possible the matching image and > text files so future modification and adaptation is possible. There's no > loss to > DP by doing so; and the risk is that over time they are quite capable of > losing > track of them. SNIP Yes please. 
Please provide a complete archive of the project files (Scans, text, clearance docs, etc) in an obvious location for all projects and submissions (DP or otherwise). This can be at PG, at DP, at archive.org or elsewhere... I don't care where, so long as it has the same permanence of a PG text, and has the same 'lack' of barriers to access as PG texts. Just as open source projects literally require distribution of the un-compiled code so that future folks can see the foundation of the work as well as the finished product, so can ebooks benefit if future users can (if they choose) see the basis of the work as well as the finished product if they feel like offering an improved or altered version. Thanks! Kevin From jimad at msn.com Tue Apr 20 15:49:19 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 15:49:19 -0700 Subject: [gutvol-d] Re: summary of the recent discussion of e-books on the ipad In-Reply-To: <12108.203769e3.38ff73b8@aol.com> References: <12108.203769e3.38ff73b8@aol.com> Message-ID: >.jim also complains about the closed nature of the ipad, and prefers his netbook, albeit hasn't reported on whether or not the added weight of the netbook hampers his use of the unit. LOL, thank you for the intelligent summary! However, I *have* reported here previously on my experience with the added weight of a netbook: Namely that when I try to hold it and read one-handed it weighs enough that my hand falls asleep. A bigger problem is the attached keyboard that I don't really need when I am reading a book, except that the attached keyboard has the page-turn buttons located in really stupid and unhelpful locations, such that when simply reading a book one-handed on a netbook turning pages is a pain in the *ss! 
Weight-wise, the iPad is slightly bigger, thicker, and heavier than a Kindle DX, which is my current go-to preferred reading device, and the iPad is about the size and weight of the non-keyboard half of a netbook - which means overall a netbook is about 2X the size and weight of an iPad.

Again, problems with the Kindle DX:

Slow and unreliable AT&T "whispernet" wireless connection
Slow and crappy basic web browser
Slow PDF loads and page turns
Difficult to use to write even basic notes - for example if one wants to take notes of the problems one sees when doing an SR.
DRM policies of the books you buy from Amazon (as opposed to free books) are overly restrictive.
Low contrast display when in low-light situations.

Problems with a netbook:

Too heavy
Battery life too short
Keyboard is not useful and is awkward when one just wants to read a book.
Screen door effect

Problems with iPad:

Can't use wifi to download an ePub or MOBI book to the iPad; one must download to a desktop computer and from there to iTunes to USB to iPad.
Screen door effect
ePub and MOBI reader apps on iPad not as good as those available on other platforms.
We don't really know yet in practice how restrictive Apple DRM policies will prove to be [on purchased books] - in practice on free ePub and MOBI books they are very annoying.
Monthly 3G wireless is pricey - and once again comes from AT&T! [At least Kindle's 3G wireless is free - and worth every penny!]

From hart at pglaf.org Tue Apr 20 15:55:14 2010 From: hart at pglaf.org (Michael S.
Hart) Date: Tue, 20 Apr 2010 15:55:14 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: If you read the following carefully, you will see that Jim Adcock has continued to change the frames of reference of his decisions, which were, obviously, already made before he did his research. As we all have heard from very reliable and value sources, quite literally even the best research techniques suffer greatly via a new interpretation by someone leaning in any biased direction. The point is, and always has been, that Apple and the iPad NEVER were a real consideration for Mr. Adcock and he just keeps on in the process of making more and more OBVIOUS objections that were ALREADY OBVIOUS from the start. ALL OBVIOUS, ALL THE TIME, NOTHING NEW, JUST RANTING AND RAVING. I use his own words from below to help you decide: "Then why in gawds name would anyone want to get an iPad as an ebook reader?" Mr. Adcock NEVER wanted to get an iPad as an eBook reader. This was never a choice that was up for discussion. The only discussion from him: all the reasons he wouldn't. This is what THE LIST OF FAMOUS FALLACIES calls the trick of: "THE DOG IN THE MANGER." Just another secondary school fallacy brought back to life as if by a rather limited Frankenstein. Frankenstein's monster was much more of a humanist. . . . On Tue, 20 Apr 2010, James Adcock wrote: > [Changed the header per Michaels request] > > >Once again, I repeat, I don't care about any one particular format, > so please stop pretending this is a valid topic with me, > > But then why do you keep offering advice to me that you know will not work > for me because I *am* interested in eBook formats and eBook content > development such as DP does which is HTML rendered into ePub and MOBI? 
PG > doesn't even typically support PDF so how does the fact tha Safari happens > to kind of support reading documents in PDF have anything to do with PG? > > >And just how many systems do you have, or know of, that wifi to two wifi > spots at the same time??? > > My generic $300 netbook has absolutely no problem whatsoever fresh out of > the box using wifi to transfer files either from the internet or to/from the > publically shared folders of all my locally networked computers. I never > have to change a wifi setting just to get a file from a differing location! > > >Sorry, reading the manual and looking up Apps does not qualify you with > any actual experience on the subject. > > OK I spent another 3+ hours at the Apple Store (bricks and mortor) last > night trying all the "books" apps there including all of your suggestions > and none of them do anything "reasonable" like actually allowing one to > transfer a ePub or MOBI file from a chosen location on the internet onto the > iPad and allowing one to read that eBook there. > > >Not to mention that YOU did not mention, once again, the actual NAME of > the products you are claiming so much expertise about. > > I tried all the ones you mentioned previous plus all the ones searchable via > the App Store searching on "book" or "ebooks" as you suggested previously. > > >Nevertheless, if, at the end of all this, you send me a list of questions, > comments, complaints, etc., I will try to go get you answers from Apple. > > The question, complain, comments, etc would be the same as from the start, > namely: > > "If in fact one can do so on iPad, how does one use iPad to download a free > eBook in ePub or MOBI format via wifi from an internet site that I choose, > storing that ePub or MOBI eBook on the iPad, and then get iPad to open that > eBook for reading at this and/or a later date -- when I may or may not have > an internet connection?" 
> > This is quite possible from Kindle (whispernet), desktops, laptops, and > netbooks, for example, so I don't think it's an unreasonable expectation. > > >And hasn't Apple made it totally obvious you can't do with with an iPad??? > > Then why in gawds name would anyone want to get an iPad as an ebook reader? > > >Ah, now, at this late stage, you have added that you want to edit eBooks > on the iPad. > > I can live without it. What makes iPad uninteresting to me is if one cannot > use wifi to download an ePub or MOBI file from a location of my own > choosing. I can't edit eBooks on the Kindle, for example, but the Kindle > does allow me to bookmark problem spots in the text I am SR'ing, and then I > can transfer the bookmarks to my desktop, use that info to locate the > problems in the text under development, and fix it there. > > >If you spent the same four hours' worth on a Kindle, and liked it > so much you were already programming with it, I can't imagine why > you are having this conversation at all. > > I think if you have been following these conversations at all over the > preceding months you would know that I am not in love with any particular > ebook reader which is why I am still on the outlook for something that would > work better. Apple hyped how much better iPad would be, so I tried it, and > found that in fact it consists of demoware. > > >Unless it is just to moan and complain in front of an audience to > somehow "get even" with Apple for being. . .well. . .Apple. > > I have to admit I have not spent much time on Apple desktops or laptops but > I cannot believe that Apple could possibly be *this* restrictive on their > desktops or laptops or they would be out of business. The question is not > then whether or not I like Apple, but rather whether or not iPad offers > anything new and interesting in terms of an eBook Reader. You claimed it > did. I tried it and it doesn't work. If iPad is that restrictive, then I > don't like it. 
I also don't like nook for the same reason -- namely nook > has a wifi but doesn't let the customers use it for anything except buying > books from B&N. Why should I pay for a "feature" I am not allowed to use? > Does that mean I hate B&N? No, if I want to buy a book or magazine I still > go to B&N -- I just don't spend my money on a nook designed to lock me into > only being able to spend more money on a nook. Do I "love" Kindle -- no, it > has a crappy web browser, is slow to open PDF files, has the lousy slow AT&T > "whispernet" connection etc. Yet even with these restrictions I CAN get > things done with Kindle, whereas iPad successfully blocks everything I try > to do. > > >At first you denied using iBooks at all, don't you remember??? > > No. Quote me when. > > >> I told you we tried Stanza, because I told you about the large blurry > iPod > > >Actually, you spoke of that as if it were a hypothetical... > > I don't think I did. Quote me when. > > >So, you are admitting you never asked for what you didn't get. > > Strange. Why would I ask for the privilege of paying a subscription fee to > develop apps for a device that doesn't work? > > >You seem to be bringing up something new, and of great interest. > > I have talked about it before on this same forum so it is not new and flamed > Amazon for their stupidity then just as I am flaming Apple for their > stupidity now. Search on "1984 Amazon" if anyone is interested in the > "1984" Amazon Kindle act of stupidity. Read > http://manuals.info.apple.com/en_US/iPad_User_Guide.pdf re iTunes syncing if > you want to read about Apple's act of stupidity. > > >Now, just above, you said you were using iBooks, doesn't that count? > > iBooks was on the iPad already, so no, I didn't download it. When I went > back to the Apple store again last night at your suggestion they repeated > that I was not allowed to download apps and that if I tried to do so it > would not work. 
I waited till they were not looking, tried downloading > apps, and eventually figured out how to get the app downloads to work. The > apps *themselves* once downloaded however do not allow downloading of free > ePub and MOBI books from a website of my choosing, storing those on the iPad > for reading later, so the apps you ask me to install don't do anything > interesting or useful to me. > > >You keep short-changing Wattpad, which I think I mentioned first. > > I did download Wattpad and it simply yet another app that ties to one > particular server on the internet downloading a subset of PG books lightly > reformatted from ASCII plaintext. > > >When at the Apps Store, just search for "ebooks" and "books" etc. > > >How many times have I said that??? > > I did that, tried everything, again the apps out there all connect to a > private server on the internet downloading a subset of PG books lightly > reformatted from ASCII plaintext. iBooks is a bit better in that they take > PG ePub, hack it to represent it as-if it comes from Apple, and redistribute > it from their servers. This also means that they only serve up a subset of > PG works, and it means that it is not useful for content development, such > as SR from DP. Kindle for iPad is a bit better in that they again take PG > books, hack it to represent it as-if it comes from Amazon, and redistribute > it from their servers -- but do it on a better reader app than iBooks. Which > again means that they only serve up a subset of PG works, and it means that > it is not useful for content development, such as SR from DP. > > > (Re Goodreader) You download it from the Apps Store. . . . > > But IT in turn cannot download ePub or MOBI books from a general location on > the internet. > > > No, they just worked around their current version of Steve Jobs, such as > working around IBM, then Apple, then Microsoft, and Intel, and ADM, Sony > and all the rest. . . . 
> > Thinking back in time I think this was a somewhat true statement when app > distribution was via computer stores. Since the internet has caught on I > haven't had problems distributing content nor apps to whoever I want. The > internet has a problem in that searching is via Google, and Google in turn > does their own monopolistic practices, such as refusing to return a search > "hit" on small websites even if you search on the exact name of that website > -- unless one sends copious advertising dollars to Google. > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From hart at pglaf.org Tue Apr 20 16:04:00 2010 From: hart at pglaf.org (Michael S. Hart) Date: Tue, 20 Apr 2010 16:04:00 -0700 (PDT) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> Message-ID: Once again I must insist Mr. Adcock stop putting words in my mouth. You notice he doesn't put his answers in the context I asked them. These products all "work as I suggested," they just NEVER WERE INTENDED TO WORK AS MR. ADCOCK WOULD HAVE WANTED, AND HE KNEW THIS BEFOREHAND. . .nothing new here, just the same old same old. There are plenty of options to read in higher res than iPod/iPhone. That was your original complaint. There are plenty of PG eBooks. Also part of your original complaint. That you want to go behind the counter and rearrange things? Sorry, it's their store, their counter, not at all up to you. Make your own. . . . On Tue, 20 Apr 2010, James Adcock wrote: > >If only Jim would have been as thorough, and polite, about the iPad. > > I don't know what your problem is, but per your suggestions I went back to > the Apple "brick and mortar" store yesterday, and spent an additional three+ > hours researching all the iPad suggestions you made. 
None of them "worked" > as you suggested. None of them allow direct access to allow one to read > ebook files either ePub or MOBI via wifi access to the internet. None of > them allow latest access to the most recent books in ePub format released on > PG. What almost all of them do is allow access to an internet server tied > to that particular applet that allows one to read some subset of the PG > offerings in some degraded form, typically a slightly spruced up "pretty > print" version of a text file. This can simply be checked by searching for > one of the latest PG books in which case you will find that NONE of these > applets offer the latest PG books. > > Unlike most full function web browsers on desktops or even netbooks, if one > uses the provided Safari web browser to click on an ePub or MOBI file Safari > says "You cannot download that file." You CAN use Safari to open a PDF > file, which makes iPad useful for reading Google Books PDF "photocopies" of > books. Just not useful for what PG offers. Also even IF you use Safari to > open a PDF file, apparently Safari does not retain a copy of that book > because when I run Safari again it redownloads that PDF file again from > scratch. Why does one care? Well, that's slow, and it means that this > doesn't work in "airplane mode" ie you cannot read the PDF file on an > airplane via Safari with the wifi turned off. The ability to actually > download and save book files is a fundamental feature of "real" book > readers, and in the design of "real" ebook formats such as ePub and MOBI, > such that when you download a book, you HAVE that book, and can then read it > wherever and whenever you choose without requiring wifi or other wired > connection. > > What the Apple manuals say (finally released yesterday and read last night) > is that one is allowed to transfer free book files in ePub format via USB > cable from your desktop to your iPad via iTunes. You download a free ePub > book to your desktop. 
From there you transfer it to the iTunes software. > Then you hook up your iPad using a USB. Then you sync to iTunes. Then you > safely unplug your iPad. Then you open iBooks. Then you find the new book > in the iBooks "shelf" which you can then finally click on and start reading. > > > As opposed to one click on a link to a free ePub or MOBI book link say at PG > using a netbook browser, which downloads the file, stores it, opens the > reader app and you are up and reading. 1 second verses 10 minutes of hassle > factor. > > Further, the iPad manuals say that Apple has *permanently* given Apple > applets priority over other applets for the file types that the iPad > supports. IE ePub type is "hardwired" to the iBooks applet such if you > transfer an ePub file via the long-winded iTunes USB "sync" process then one > can only read that ePub using the iBooks app -- which is a pretty weak app > compared to other ePub and MOBI readers if one has made the comparison. > [Imagine if Mickeysoft "hardwired" the HTML file type to the IE browser and > allowed no other browser choice! Can you say "Monopoly," I knew you could] > > What CAN iPad do? 
> > It can reasonably present paid books from Apple on iBooks (not the greatest > reader app, but not too horrible either) > > It can reasonably present a free subset of PG's offerings repackaged as-if > they came from Apple on iBooks > > It can reasonably present paid books from Amazon via Kindle for iPad > > It can reasonably present a free subset of PG's offerings repackaged as-if > they came from Amazon via Kindle for iPad > > If you have already bought books for a Kindle then Kindle for iPad will also > allow you to read them for no additional cost on iPad > > It can reasonably present PDF books and documents via Safari as long as you > have an active wifi connection > > It can store and allow you to read free ePub and other common document > formats that you have transferred to iPad using the slow and cumbersome > Desktop/USB/iTunes path. [At least the documentation claims this -- I cannot > test it in the Apple store because they don't have USB to desktop set up] > > Is this all good or bad? It depends on what you want to do. If you simply > want to be a passive consumer of content, similar to watching TV from your > cable provider, then maybe its fine. If you want to be a CREATOR of > content, such as someone who helps DP, SRs books from DP, "solos" books for > PG, etc, then it's a pretty weak offering -- IMHO you would be much better > off putting up with the hassles of a netbook which DOES allow one to quickly > and painlessly transfer content using wifi. And if you are a reader omnivore > like I am, then you will probably rapidly get sick of the Job's monopolistic > restrictions constantly getting in the way of your ability to quickly and > easily download What you want from Where you want reading it with Whatever > reader applet YOU damned well choose -- NOT Steve Jobs! > > Other reasonable approaches: > > Wait for the HP Slate and see how cobbled-up its touch abilities are. 
At > least it offers a REAL operating system -- why couldn't Apple have offered > OS X on iPad ??? > > Buy a netbook and put up with the keyboard hassles. > > Buy a Kindle and put up with the crappy web browser and slow-and-unreliable > "whispernet" AT&T connection -- at least you get a good built-in reader app > and good screen technology. > > Buy a low-cost generic reader such as Libre Pro > > Buy an iPod and at least you're admitting you are reading on a cellphone and > at least you are actually getting a cellphone--with the resulting > compromises in space, speed, and OS. > > Wait and see if the next version of the OS for iPad is less compromised. > > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From jimad at msn.com Tue Apr 20 16:33:52 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 16:33:52 -0700 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <12ba6.1c940b.38ff760e@aol.com> References: <12ba6.1c940b.38ff760e@aol.com> Message-ID: >according to its website, ibisreader can: >1) fetch an .epub from any website, and >2) download it so it can be read offline... >jim claims neither of these things is true. Well you said I could try it from my desktop and I did and it didn't have that capabilities claimed. Rather on an "experimental" basis it says if you own a website and have a database set up the way they expect that database to be set up THEN you can add that website to their list of supported websites - which currently is just Feedbooks. And I'm guessing when they say "download" they mean that if the source is in ePub then THEY will scarf it to THEIR computer and then re-present it to you as HTML live to your attached HTML browser for as long as you have an internet connection - because that's what they do - they scarf and store the books on THEIR computere "for you," re-present it as HTML and then they claim this as a "feature." 
> anyone who makes a claim that a p.g. .txt file is "not an e-book file" is a full-on bloomin' idiot.

I can never resist having BB call me an idiot (and more recently Michael) so here goes: PG txt file is NOT AN E-BOOK FILE because it does not meet at least one criterion that is universally accepted as being required of ebook file formats: namely reflow. Txt format can reflow, but PG txt format cannot reflow because it has hardwired linebreaks at around 70 chars. Yes I know that *in theory* if Apple, say (LOL), wanted to, they could write a PGTXT70 file format reader that would unwrap those line breaks more or less successfully most of the time, but since the rest of the computer world sans PG decided circa 1970 that hardwired linebreaks are A BAD IDEA it seems highly unlikely that Apple or anyone else is going back to the future to fix PG's txt problems now.

If you like, PG TXT format is a "teletype file" because its capabilities are designed around the capabilities of teletypes circa 1970 which used ASCII and had 72 chars per line. I for one thanked god when I got rid of my teletype after it burned out the third time trying to do microprocessor development circa 1976! Technician couldn't understand why all the grease in there kept getting baked into bricks - said AP never uses their machines this hard!

One good introductory read about what an eBook File IS can be found at: http://en.wikipedia.org/wiki/EPub

Other characteristics uniformly expected of eBook files include:

Encapsulation: Download one file and you have all you need to read the book in the future without wireless connection. AKA "airplane mode"

Book Metadata: Author, Title, TOC, Index, etc. at defined locations in a defined manner such that any reader app or bookshelf app can display these easily - without having to open and read the whole book.
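To make the "defined locations" point concrete, here is a minimal sketch (not taken from any post in this thread) of how a bookshelf app might pull the title and author out of an ePub without reading the book text. It assumes a standard ePub layout: a zip archive whose META-INF/container.xml points at an OPF package file holding Dublin Core metadata. The function name read_epub_metadata is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the ePub container file and by Dublin Core metadata.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_metadata(epub_file):
    """Return (title, author) from an ePub without reading the book text.

    epub_file may be a filesystem path or a binary file-like object.
    (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(epub_file) as archive:
        # META-INF/container.xml is the fixed entry point of every ePub.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf_path = rootfile.get("full-path")

        # The OPF package file carries the Dublin Core title and creator.
        package = ET.fromstring(archive.read(opf_path))
        title = package.findtext(f".//{{{DC_NS}}}title")
        author = package.findtext(f".//{{{DC_NS}}}creator")
    return title, author
```

Only two small XML files inside the archive are touched; the chapters themselves are never opened, which is the property the metadata paragraph above describes.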
Sure, one could define how one or more of these things are supposed to work, and you could put it all in a zip file to encapsulate it, and then you can just change the txt extensions to .html on these "txt" files and change the .zip package extension to .epub and then one would have, well, then I guess then one would have an epub not a PG txt file anymore. From dakretz at gmail.com Tue Apr 20 16:34:49 2010 From: dakretz at gmail.com (don kretz) Date: Tue, 20 Apr 2010 16:34:49 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <12ead.2be71cd9.38ff76b3@aol.com> References: <12ead.2be71cd9.38ff76b3@aol.com> Message-ID: Great news! Let's test this thesis. I'm currently working through the very first Britannica project ever - The Project Gutenberg Encyclopedia, Volume 1 of 28. It's etext # 200, dated "1995-01-01". It's in sad shape. Text only, many errors apparent to the casual eye. I'd like to reprocess it. Can anyone from DP tell me how to get the scans? On Tue, Apr 20, 2010 at 2:29 PM, wrote: > dakretz said: > > Summary of the situation (as it seems to me). > > DP is currently taking too long to produce texts > > that are either less (plain-text) > > or more (DP-style HTML) > > than the supply chain is able to convey to the > > end-readers to deliver the experience intended. > > "less" and "more" are value-laden terms. > _and_ incorrect. please re-summarize... > > > > > Upload in as complete form as possible > > the matching image and text files > > so future modification and adaptation is possible. > > There's no loss to DP by doing so > > > d.p. has, in their hands, the scans from all the books > that have gone through their system. so they _could_ > have pushed them to project gutenberg at any time... > > indeed, charlz originally intended that d.p. itself would > mount the scans.
he called it the "online library system", > and at one point in time, it actually came into existence. > (it's probably still there, with some 6,000 scan-sets in it.) > > why hasn't it been maintained? > > well, _i_ happen to think that it's pretty obvious. > > but maybe that's because of what i do with those scans: > i use them to point to unequivocal evidence of _errors_ > in the "final product" emerging from the d.p. workflow. > > and that's what other people might do with them, too... > > does d.p. want us unequivocally pointing out their errors? > > no. > > ergo, they are keeping their scans to themselves... > > the myth of d.p. accuracy is one that keeps d.p. going... > the powers-that-be over there do not want to put that > myth up against _any_ solid evidence to the contrary... > > and it's not that hard to understand, either. rfrank was > eager to see the results of my check on the "sitka" book, > at least when he thought that check would be _positive_. > but when it was less than flattering, he clammed right up. > it's hard for some people to admit they make mistakes... > even if they can do it in a "general" way, when it comes to > close-eyed examination of specifics, they're uninterested, > and might even go to great lengths to suppress evidence... > > > > > and the risk is that over time > > they are quite capable of losing track of them. > > they have the scans firmly in their grasp now, > and they wish to retain control, so they simply > are not worried about "losing track of them"... > > -bowerbird > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Morasch at aol.com Tue Apr 20 16:48:26 2010 From: Morasch at aol.com (Morasch at aol.com) Date: Tue, 20 Apr 2010 19:48:26 EDT Subject: [gutvol-d] a longstanding question has finally been answered Message-ID: <285bc.71539bf.38ff974a@aol.com> > "Then why in gawds name would anyone > want to get an iPad as an ebook reader?" people want the ipad for _lots_ of uses... not just as an e-book reader-machine... the fact that they get an e-book machine thrown in, for free, is icing on the cake... (and many of these users don't even read, which means they don't like cake or icing.) which reminds me... remember back in the day, when one of the most cherished merry-go-round topics on every e-book listserve was whether people would want a _dedicated_ e-book machine or a _multipurpose_ one? gosh, how many pleasant afternoons were spent composing posts on that dependable hobby-horse topic? well, folks, the winner has now been decided. amazon offered up a good dedicated machine. and they've moved about 3 million units so far. and they probably coulda moved twice as many if they would have fixed the obvious problems. throw in all the nooks and sonys, and we've got a downright respectable total for _dedicated_... but, on the other hand, however, we have apple. the iphone/ipodtouch has sold 80 million so far. and the ipad moved 300,000 units on pre-order and first-weekend sales alone, if we trust apple. and the 3g model i await isn't even available yet. so now we know... people prefer multi-purpose... the winner has been declared. which is not to say that all you people with a kindle must send it back. if you're happy with it, that's all that matters, really. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Morasch at aol.com Tue Apr 20 17:05:09 2010 From: Morasch at aol.com (Morasch at aol.com) Date: Tue, 20 Apr 2010 20:05:09 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <29487.1b1c9876.38ff9b35@aol.com> jim said: > PG txt file is NOT AN E-BOOK FILE > because it does not meet at least > one criterion that is universally accepted > as being required of ebook file formats: > namely reflow. jim, jim, jim, jim, jim, jim, jim. it's bad enough that i call you a bloomin' idiot. but it's even worse when you come right back with a reply that _proves_ that's what you are. one of the most widely-used e-book formats in the last 20 years has been the .pdf format -- a format which has not, historically, done reflow. yet you want to rule it out _by_definition_? please. i was _fighting_ against .pdf as an e-book format for many, many years before you even showed up, but even i cannot deny that it _is_ an e-book format. _any_ file-format which can express a book _is_ -- or can be considered as -- an e-book format. you seem to think you define terms of engagement, that any discussion must be conducted according to the way that _you_ define words. that's bullcrap, jim. *** besides, even if we _accepted_ your stupid definition, it still doesn't compute, jim, because an ascii-file like the p.g. e-text format _can_ be reflowed, quite easily. you just take out the mid-paragraph hard line-breaks. _any_ e-book programmer can write code to do that... voila! reflow! -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Morasch at aol.com Tue Apr 20 17:08:20 2010 From: Morasch at aol.com (Morasch at aol.com) Date: Tue, 20 Apr 2010 20:08:20 EDT Subject: [gutvol-d] oh geez Message-ID: <29721.5b0df6c8.38ff9bf4@aol.com> oh geez, now lee is gonna call me "mr. morasch" again... :+) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
From jimad at msn.com Tue Apr 20 17:10:56 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 17:10:56 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >The point is, and always has been, that Apple and the iPad NEVER were a real consideration for Mr. Adcock and he just keeps on in the process of making more and more OBVIOUS objections that were ALREADY OBVIOUS from the start. You keep saying things that are not true of me, Michael, and which are not true of the iPad. It certainly was not obvious to me that the iPad would not allow download of ePub and MOBI via wifi. It is also not true that the iPad was not a real consideration for me, and it is also not true that I wouldn't reconsider the iPad if the future OS is less restrictive. I don't understand why it is that *you* are so defensive about the iPad? Because you bought one??? I buy a lot of Dell computers, but if someone states an opinion that Dell is a load of cr*p then I'm not going to get bent out of shape, and if someone says that Amazon or Mickeysoft have made a hell of a lot of stupid decisions in their day -- well, I couldn't agree with that more! From jimad at msn.com Tue Apr 20 17:27:20 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 17:27:20 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <12ead.2be71cd9.38ff76b3@aol.com> Message-ID: >I'm currently working through the very first Britannica project ever - The Project Gutenberg Encyclopedia, Volume 1 of 28. It's etext # 200, dated "1995-01-01". It's in sad shape. Text only, many errors apparent to the casual eye. I'd like to reprocess it. I can't tell you how to get the scans but I have tools that will help you recover the original line breaks and match the PG text against a new OCR, helping identify errors in both the OCR and the existing PG text.
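The tools mentioned here are not shown anywhere in the thread, so the following is only a rough sketch of the general idea, under the assumption that a word-level diff is good enough: line up the existing text against a fresh OCR and report every place where the two disagree, so a human can check those spots against the scan. The function name find_disagreements is hypothetical.

```python
import difflib


def find_disagreements(pg_text, ocr_text):
    """Return (pg_words, ocr_words) pairs wherever the two texts differ.

    Splitting on whitespace makes the comparison independent of where
    either text happens to break its lines. (Hypothetical helper.)
    """
    pg_words = pg_text.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(a=pg_words, b=ocr_words, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Either side may be an empty list (a pure insertion or deletion).
            disagreements.append((pg_words[i1:i2], ocr_words[j1:j2]))
    return disagreements
```

A disagreement does not say which side is wrong. It only flags a spot where at least one of the two texts has an error worth checking.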
Let me know if you find the scans. Yes I have tried this on a couple texts already. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Tue Apr 20 17:35:30 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 20 Apr 2010 20:35:30 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: <1d280.45fdb600.38ffa252@aol.com> dakretz said: > Great news! Let's test this thesis. > ... > etext # 200, dated "1995-01-01". > ... > Can anyone from DP tell me how to get the scans? ok, two things... first, it looks like i was wrong when i said that d.p. had stopped maintaining the "ols", so of course my "reason" for their having stopped maintaining it was also incorrect. (or one could say it's _no_longer_ correct, but i do believe it was correct at one time.) at any rate: > http://www.pgdp.org/ols it claims 16,809 "unique books". whether that means 16,809 scansets, i do not know. but the scans for pg#31946 are right there, online... second, the scan-sets from the very earliest books were said to be "inconvenient to get to right now" at one point in time. whether they were located or lost to the wind, i don't know. but that _could've_ included pg#200. the lowest p.g. numbers which are shown as being included in "ols" presently are pg#460 and pg#464, and four without any number. but are you sure that d.p. actually digitized pg#200? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Tue Apr 20 17:37:45 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 17:37:45 -0700 Subject: [gutvol-d] Re: a longstanding question has finally been answered In-Reply-To: <285bc.71539bf.38ff974a@aol.com> References: <285bc.71539bf.38ff974a@aol.com> Message-ID: >the winner has been declared. which is not to say that all you people with a kindle must send it back. if you're happy with it, that's all that matters, really. 
Hey BB, IF you know the limitations of the iPad [or the Kindle for that matter] and you're happy then that's all that matters, really. My wife could probably use one to watch reruns of House, probably would make her happy - maybe I'll get her one for that purpose. You are blessed in that indeed iPad will display txt files - linebreaks hardwired to 70 chars so you won't be able to use the two-finger zoom feature. It even has an applet for editing txt files - not sure how well it's going to like the linebreaks. The good news about ebook readers is that most people DO seem to like what they end up buying - perhaps I'm unusual in seeing *what could have been*. As long as they read that's a good thing -- most of the people in the Apple Store I went to clearly DON'T! From jimad at msn.com Tue Apr 20 17:49:17 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 17:49:17 -0700 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <29487.1b1c9876.38ff9b35@aol.com> References: <29487.1b1c9876.38ff9b35@aol.com> Message-ID: >one of the most widely-used e-book formats in the last 20 years has been the .pdf format -- a format which has not, historically, done reflow. And which is a format that is universally recognized to be a page layout descriptor language, not an ebook file format. PDF is a terrible thing to try to read on an ebook reader, unless the page layout happens to more-or-less match the size of your reader screen, and the size of the PDF font happens to be close to something your eyes like. People tend to print PDF out if it's more than a few pages because it is so much more suitable to a laser printer than to an ebook reader. Google Books PDFs *do* happen, more often than not, to match the size of the display on my DX and then it's not too bad - although you are still reading a blurry photocopy with an occasional finger stuck in for good measure.
http://en.wikipedia.org/wiki/Portable_Document_Format -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Tue Apr 20 17:56:06 2010 From: jimad at msn.com (James Adcock) Date: Tue, 20 Apr 2010 17:56:06 -0700 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <29487.1b1c9876.38ff9b35@aol.com> References: <29487.1b1c9876.38ff9b35@aol.com> Message-ID: >besides, even if we _accepted_ your stupid definition, it still doesn't compute, jim, because an ascii-file like the p.g. e-text format _can_ be reflowed, quite easily. you just take out the mid-paragraph hard line-breaks. And it will work most of the time. Go ahead and write your reflow "txt ebook reader" for the iPad -- ideally one that will allow downloading txt from the internet to the iPad via wifi - I want to see it up on the Apple Apps Store. Charge a buck for it and see how many sell - I would be curious. Maybe you'll end up a millionaire. I'll buy one even if I don't have an iPad! ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dakretz at gmail.com Tue Apr 20 19:12:06 2010 From: dakretz at gmail.com (don kretz) Date: Tue, 20 Apr 2010 19:12:06 -0700 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <1d280.45fdb600.38ffa252@aol.com> References: <1d280.45fdb600.38ffa252@aol.com> Message-ID: Interesting page - I've never seen it before. Wonder what it's for? The Britannica projects I can actually find fall into two groups. 1. The group where I can't see the images because *Available Formats: Display of images from this source has not been > permitted.* > 2. The group where I can see a page-at-a-time image viewer, by not a single image actually shows up. They all get that thing you get when there's a url, but the file is missing. So zero for sixteen or so. But I guess the intent was good. Maybe they are all working their way through a queue somewhere. 
On Tue, Apr 20, 2010 at 5:35 PM, wrote: > dakretz said: > > Great news! Let's test this thesis. > > ... > > > etext # 200, dated "1995-01-01". > > ... > > Can anyone from DP tell me how to get the scans? > > ok, two things... > > first, it looks like i was wrong when i said > that d.p. had stopped maintaining the "ols", > so of course my "reason" for their having > stopped maintaining it was also incorrect. > (or one could say it's _no_longer_ correct, > but i do believe it was correct at one time.) > > at any rate: > > http://www.pgdp.org/ols > > it claims 16,809 "unique books". > > whether that means 16,809 scansets, i do not know. > but the scans for pg#31946 are right there, online... > > second, the scan-sets from the very earliest books > were said to be "inconvenient to get to right now" > at one point in time. whether they were located or > lost to the wind, i don't know. but that _could've_ > included pg#200. the lowest p.g. numbers which > are shown as being included in "ols" presently are > pg#460 and pg#464, and four without any number. > > but are you sure that d.p. actually digitized pg#200? > > -bowerbird > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > > -------------- next part -------------- An HTML attachment was scrubbed... 
From kevin.pulliam at gmail.com Tue Apr 20 21:07:30 2010 From: kevin.pulliam at gmail.com (Kevin Pulliam) Date: Tue, 20 Apr 2010 23:07:30 -0500 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <1d280.45fdb600.38ffa252@aol.com> References: <1d280.45fdb600.38ffa252@aol.com> Message-ID: On the Open Library System, I note that high resolution gray-scale scans (at least for the one project I checked) are not archived, though the black and white scans are (though the example I checked, the Astounding Magazine scans were actually microfilm scans IIRC, which was a strange case but also what made the higher resolution scans helpful). I also note that there is no 'bulk' download function to get a zip of all the files associated with a text. But then again, something is better than nothing. Kevin On Tue, Apr 20, 2010 at 7:35 PM, wrote: SNIP > > first, it looks like i was wrong when i said > that d.p. had stopped maintaining the "ols", > so of course my "reason" for their having > stopped maintaining it was also incorrect. > (or one could say it's _no_longer_ correct, > but i do believe it was correct at one time.) > > at any rate: > > http://www.pgdp.org/ols > SNIP > > -bowerbird From hart at pglaf.org Wed Apr 21 01:08:52 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 01:08:52 -0700 (PDT) Subject: [gutvol-d] Re: a longstanding question has finally been answered In-Reply-To: <285bc.71539bf.38ff974a@aol.com> References: <285bc.71539bf.38ff974a@aol.com> Message-ID: > the winner has been declared. which is not to say > that all you people with a kindle must send it back. > if you're happy with it, that's all that matters, really. > > -bowerbird I'll take that bet. . .another free lunch of cooked fowl. I'll bet that the iPad never even gets HALF the market for eReaders over the long haul. mh From hart at pglaf.org Wed Apr 21 01:33:26 2010 From: hart at pglaf.org (Michael S.
Hart) Date: Wed, 21 Apr 2010 01:33:26 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: On Tue, 20 Apr 2010, James Adcock wrote: > >The point is, and always has been, that Apple and the iPad NEVER > were a real consideration for Mr. Adcock and he just keeps on in > the process of making more and more OBVIOUS objections that were > ALREADY OBVIOUS from the start. > > You keep saying things that are not true of me Michael, and which are not > true of the iPad. It certainly was not obvious to me that the iPad would > not allow download of ePub and MOBI via wifi. 1. I think most of what you get on iPad IS .epub, is it not? 2. I think Apple made it pretty obvious about other formats to most. > It is also not true that the iPad was not a real consideration for me, and I only have your own words upon which to base such. I wrote an extended piece about it, but our CEO has asked me to tone down my responses to you, even if you don't, so I didn't send it. However, if you ask for it I will ask him to reconsider his request. > it is also not true that I wouldn't reconsider the iPad if the future OS is As if any Apple OS, other than UNIX based, has been so. > less restrictive. I don't understand why it is that *you* are so defensive > about the iPad? Because you bought one??? I hate defending Apple, or any other billion dollar organization. However, when you come out and give the iPod Stanza app example-- well--someone has to immediately answer THIS IS NOT THE CASE!!! I provided several such examples, with no thanks for my effort. However, when you come out and say you cannot download PG files-- well--someone has to immediately come out and download PG files! No, not all formats, and certainly not all files, but the blanket statement that it cannot be done only requires ONE example to get proven false. I provided just such examples. With no thanks. 
However, you did finally say thanks for at least one thing, and I can't say you haven't given any thanks at all, but you certainly, we all must admit, have not been encouraging my efforts. Unless you think I thrive of discouraging remarks. > I buy a lot of Dell computers, but if someone states an opinion that Dell is > a load of cr*p then I'm not going to get bent out of shape, and if someone > says that Amazon or Mickeysoft have made a hell of a lot of stupid decisions > in their day -- well, I couldn't agree with that more! I'm just trying to balance out some rather general complaints you have made with some rather specific contradictions. If you had asked, "How can I. . ." instead of your blanket typing "you can't. . ." you might have gotten something a bit different. However, blanket statements and single examples deserve proven in direct fashion to be incorrect which it is so obvious. Let's face it, you CAN get higher-resolution eBook performance on iPads than with your example of the iPod Stanza app, and in quite a few different apps that are free of charge. Let's face it, you CAN go directly to pglaf.org and get eBooks. No, not all formats, and who knows if all titles, but lots. Let's face it, you didn't even try iBooks the first four hours. You didn't seem to want to try Wattpad, either. It's hard to consider your research as open when it's like this. I spent a lot of time and effort working to answer your questions and when I stated simple results of simple experiments you said I was flaming and trashing you. If someone says 2+2 is not 4, I have a right to challenge that in plain sight without being accused of flaming or trashing. > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From hart at pglaf.org Wed Apr 21 01:39:29 2010 From: hart at pglaf.org (Michael S. 
Hart) Date: Wed, 21 Apr 2010 01:39:29 -0700 (PDT) Subject: [gutvol-d] Re: Typesetting In-Reply-To: <29487.1b1c9876.38ff9b35@aol.com> References: <29487.1b1c9876.38ff9b35@aol.com> Message-ID: Hey, when the first eBooks came out, they were all .txt files. Now someone wants to rewrite history and say they are NOT??? Because of easily strippable hard returns??? Reflow didn't even EXIST in those days. Much less WYSIWYG!!! And WYSIWYG doesn't allow reflow. . .unless you consider that as after the fact. . . . In fact this whole discussion is after the fact. eBooks have been around so much longer than these other ideal presentations. . .so let the new presentations use new names, and leave eBooks to the people who have been doing them. Don't let anyone co-opt the name eBook! Maybe they were right, and I should have trademarked "eBook," and then there would be no discussion about using the word. Sheesh! On Tue, 20 Apr 2010, Morasch at aol.com wrote: > jim said: > > PG txt file is NOT AN E-BOOK FILE > > because it does not meet at least > > one criterion that is universally accepted > > as being required of ebook file formats: > > namely reflow. > > jim, jim, jim, jim, jim, jim, jim. > > it's bad enough that i call you a bloomin' idiot. > > but it's even worse when you come right back > with a reply that _proves_ that's what you are. > > one of the most widely-used e-book formats > in the last 20 years has been the .pdf format -- > a format which has not, historically, done reflow. > > yet you want to rule it out _by_definition_? > > please. > > i was _fighting_ against .pdf as an e-book format > for many, many years before you even showed up, > but even i cannot deny that it _is_ an e-book format. > > _any_ file-format which can express a book _is_ > -- or can be considered as -- an e-book format. > > you seem to think you define terms of engagement, > that any discussion must be conducted according to > the way that _you_ define words. that's bullcrap, jim.
> > *** > > besides, even if we _accepted_ your stupid definition, > it still doesn't compute, jim, because an ascii-file like > the p.g. e-text format _can_ be reflowed, quite easily. > > you just take out the mid-paragraph hard line-breaks. > > _any_ e-book programmer can write code to do that... > > voila! reflow! > > -bowerbird > > From hart at pglaf.org Wed Apr 21 01:46:02 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 01:46:02 -0700 (PDT) Subject: [gutvol-d] Re: Typesetting In-Reply-To: References: <12ba6.1c940b.38ff760e@aol.com> Message-ID: Funny how we can have so many people arguing that we should be preserving the layout of paper books and at the same time we have so much about getting rid of line breaks. . . . However, once again I must comment that the amount of time spent on discussion would easily have made the conversions-- From Bowerbird at aol.com Wed Apr 21 01:59:34 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 04:59:34 EDT Subject: [gutvol-d] Re: Typesetting Message-ID: <15e6.16e10517.39001876@aol.com> the original linebreaks _should_ be preserved. because some people _want_ them. and those original linebreaks _should_ be easy to remove as well. because some people _want_ to remove them. what nobody wants -- not really -- is a set of _new_ linebreaks, which have no legacy import. but even those are bearable, _if_ they can be easily removed. and let us recall, again, that project gutenberg has _not_ made available a web-service which people can utilize to unwrap p.g. e-texts... _i_ have created such a web-service. but project gutenberg has not. which is a minor failing. (i'd be happy to provide my code, if you want it.) and let us recall, again, that project gutenberg does _not_ ensure that every one of its e-texts is structured so that it can be unwrapped properly. this one is a _major_ failing.
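The unwrapping discussed above can be sketched in a few lines. This is not bowerbird's code (which does not appear in this thread); it is a minimal illustration that assumes the simplest possible structure, namely that paragraphs are separated by one or more blank lines and that every other line break is a mid-paragraph hard wrap. The function name unwrap_paragraphs is hypothetical.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines within each paragraph into one long line.

    Assumes paragraphs are separated by one or more blank lines.
    (Hypothetical helper for illustration only.)
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for paragraph in paragraphs:
        # Drop the hard line breaks and any indentation around them.
        lines = [line.strip() for line in paragraph.splitlines()]
        unwrapped.append(" ".join(line for line in lines if line))
    return "\n\n".join(unwrapped)
```

The sketch also shows why the structure of the e-text matters: verse, tables, and indented block quotes would be flattened into running prose by this naive rule, so a file has to mark such passages consistently before it can be unwrapped properly.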
these are the two things that project gutenberg must do if it wants to proclaim that it has done all that it can to make its linebreaks a non-issue. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From hart at pglaf.org Wed Apr 21 02:26:46 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 02:26:46 -0700 (PDT) Subject: [gutvol-d] Re: Typesetting In-Reply-To: <15e6.16e10517.39001876@aol.com> References: <15e6.16e10517.39001876@aol.com> Message-ID: Let's have the code, and install it where everyone can find/use it. Right away, without further ado. . . . i.e. send the code now Help Newby set it up later. On Wed, 21 Apr 2010, Bowerbird at aol.com wrote: > the original linebreaks _should_ be preserved. > > because some people _want_ them. > > and those original linebreaks _should_ be easy > to remove as well. > > because some people _want_ to remove them. > > what nobody wants -- not really -- is a set of > _new_ linebreaks, which have no legacy import. > > but even those are bearable, _if_ they can be > easily removed. > > and let us recall, again, that project gutenberg > has _not_ made available a web-service which > people can utilize to unwrap p.g. e-texts... > > _i_ have created such a web-service. > > but project gutenberg has not. > > which is a minor failing. > > (i'd be happy to provide my code, if you want it.) > > and let us recall, again, that project gutenberg > does _not_ ensure that every one of its e-texts is > structured so that it can be unwrapped properly. > > this one is a _major_ failing. > > these are the two things that project gutenberg > must do if it wants to proclaim that it has done > all that it can to make its linebreaks a non-issue. 
> > -bowerbird > > From marcello at perathoner.de Wed Apr 21 03:05:19 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed, 21 Apr 2010 12:05:19 +0200 Subject: [gutvol-d] Re: Typesetting In-Reply-To: References: <29487.1b1c9876.38ff9b35@aol.com> Message-ID: <4BCECDDF.4030507@perathoner.de> Michael S. Hart wrote: > Maybe they were right, and I should have trademarked "eBook," > and then there would be no discussion out using the word. Rewriting history again? Your name for the beast was "etext". This is the oldest file by timestamp we have in the archive http://www.gutenberg.org/dirs/2/25/old/world91a.txt and it contains no reference to "ebook". -- Marcello Perathoner webmaster at gutenberg.org From greg at durendal.org Wed Apr 21 04:29:40 2010 From: greg at durendal.org (Greg Weeks) Date: Wed, 21 Apr 2010 07:29:40 -0400 (EDT) Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: References: <1d280.45fdb600.38ffa252@aol.com> Message-ID: On Tue, 20 Apr 2010, Kevin Pulliam wrote: > On the Open Library System, I note that high resolution gray-scale > scans (at least for the one project I checked) are not archived, > though the black and white scans are (though the example I checked, > the Astounding Magazine scans were actually microfilm scans IIRC, > which was a strange case but also what made the higher resolution > scans helpful). I also note that there is no 'bulk' download function > to get a zip of all the files associated with a text. In the interest of having the high-res raw gray scans available I put them on the Internet Archive before they went to DP. -- Greg Weeks http://durendal.org:8080/greg/ From hart at pglaf.org Wed Apr 21 09:15:45 2010 From: hart at pglaf.org (Michael S. 
Hart) Date: Wed, 21 Apr 2010 09:15:45 -0700 (PDT) Subject: [gutvol-d] Re: Typesetting In-Reply-To: <4BCECDDF.4030507@perathoner.de> References: <29487.1b1c9876.38ff9b35@aol.com> <4BCECDDF.4030507@perathoner.de> Message-ID: Once again Marcello [intentionally?] misses the point!!! No reason I couldn't have trademarked "ebook," too, is there? On Wed, 21 Apr 2010, Marcello Perathoner wrote: > Michael S. Hart wrote: > > > Maybe they were right, and I should have trademarked "eBook," > > and then there would be no discussion out using the word. > > Rewriting history again? Your name for the beast was "etext". > > > This is the oldest file by timestamp we have in the archive > > http://www.gutenberg.org/dirs/2/25/old/world91a.txt > > and it contains no reference to "ebook". > > > From lee at novomail.net Wed Apr 21 09:58:17 2010 From: lee at novomail.net (Lee Passey) Date: Wed, 21 Apr 2010 10:58:17 -0600 Subject: [gutvol-d] Tidy -c and tables Message-ID: <4BCF2EA9.1000402@novomail.net> OK guys, we have a problem. When one uses the "--clean" option, tidy removes any "
" elements and replaces them with "
", and adds "div.c1 {text-align: center}" to the internal style sheet. This seems reasonable, because according to the HTML spec, "The CENTER element is exactly equivalent to specifying the DIV element with the align attribute set to 'center'." In a bit of a chained dependency, it turns out the "align" attribute is /also/ deprecated in favor of the CSS "text-align" style. So Tidy's behavior is completely consistent with the HTML spec, and in theory should cause no presentational differences before and after a page is Tidy'ed. In theory, there is no difference between theory and reality; in reality, there is. Consider the following snippet:
line one
a longer line two
a very much longer line three
Using my four test browsers, Firefox 3.5, IE 8, Opera 9 and Safari 4, in each case the above table was centered in the browser, but the text inside the table data element remained left justified. When I changed the "
<center>" element to "
" the text inside the table data element became centered as well. This is the behavior I would expect; the whole notion of "Cascading" in CSS indicates that styles continue down the tree until changed. But it does illustrate the fact that there is a distinction between centering an /element/ (in this case the table), and centering the text /inside/ an element. So while, in theory, the "
<center>" element should be equivalent to "
", in practice it seems that not only are they not equivalent in /some/ browsers, they are not equivalent in /any/ browser. I believe one of our design goals was that Tidy would make no change to otherwise valid HTML that would cause it to render differently using browser defaults after Tidying. Thus, empty paragraphs, which are forbidden, are converted to /two/ "<br />" elements, to match the default paragraph presentation in browsers. Leaving aside the fact that the use of tables to control layout is simply morally reprehensible, the fact is that there are many, many pages 'in the wild' that do so. And Tidy's current behavior will cause those pages' presentations to change after running Tidy. I think that in this case we have not met our design goal. Now I can fix the code so that this doesn't happen in the future, if only I knew what the right fix /is/. I could simply remove "center" from the list of elements that get 'cleaned', and print a warning that the resulting document contains elements that are deprecated (this warning probably ought to be there whenever deprecated elements remain in the output). Or I could focus more directly on this specific issue and whenever a "" is a descendant of a "
<center>" element I could add "style='text-align:left'" to the "
" element (assuming a "text-align" style is not already attached to that element) /before/ cleaning (both styles should then be moved to the internal style sheet). Or perhaps there is yet another solution that I haven't thought of? I don't think that simply telling the end user "your HTML doesn't follow the rules; we could fix it but we won't" is an option; after all, that's what Tidy is for, right? So, what should I do? ps. I don't like the behavior that the "--drop-font-tags" option also drops "
<center>" elements; page layout is not in the same classification as font appearance, and I can envision situations where I would want to drop "<font>" elements but retain "<center>
" elements. But that is an argument for another day. From lee at novomail.net Wed Apr 21 10:13:29 2010 From: lee at novomail.net (Lee Passey) Date: Wed, 21 Apr 2010 11:13:29 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCD8BA4.6070803@perathoner.de> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <4BCC9527.3000103@novomail.net> <4BCCBA89.2040103@novomail.net> <4BCD8BA4.6070803@perathoner.de> Message-ID: <4BCF3239.6000508@novomail.net> On 4/20/2010 5:10 AM, Marcello Perathoner wrote: > Lee Passey wrote: > >> creates a "
" around the tables of contents and >> illustrations, with a corresponding style sheet that centers the >> contents (which it should not), > > HTML Tidy does that. You are correct. There is apparently a disconnect between the official HTML specification for the "
" element and the implementation on all major browser. For informational purposes, I have CC'ed this list with my message to the Tidy developers list on SourceForge. Until I get the matter resolved, I would recommend you /not/ use the --clean option with Tidy. > Direct your complaints to the w3c. Why? They wouldn't and couldn't do anything about it. Tidy was developed by a member of the W3C, but he has long since abandoned any involvement with the project. Today, the Tidy project is an independent project based at http://www.sourceforge.net/projects/tidy. If you come across a bug in Tidy (or wish an enhancement), please log it at http://sourceforge.net/tracker/?group_id=27659&atid=390963. From lee at novomail.net Wed Apr 21 10:50:00 2010 From: lee at novomail.net (Lee Passey) Date: Wed, 21 Apr 2010 11:50:00 -0600 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <4BCDC069.2050608@yahoo.com> References: <4BCAEB9A.2040105@perathoner.de> <20100418150509.8A3501008D@cardano.dm.unipi.it> <20100418170536.GA22578@pglaf.org> <20100419021856.6C702100B0@cardano.dm.unipi.it> <4BCC1F14.1090801@perathoner.de> <4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de> <4BCDC069.2050608@yahoo.com> Message-ID: <4BCF3AC8.5070405@novomail.net> On 4/20/2010 8:55 AM, Julia C. Miller wrote: > > On 4/19/2010 12:35 PM, Marcello Perathoner wrote: > >> Julia C. Miller wrote: >> >>> In order for a "paradigm shift" to happen at DP, PG has to define >>> what is and is not acceptable in the HTML and spell it out so that DP >>> can put it into practice. >> >> It would be much better if DP did that. > > So after DP goes through the time and effort to define the standards to > upload to PG, people from PG can say "No, that's not what we want"? Sure. They can do that now with any of DP's offerings. But they won't. With the exception of Mr. 
Perathoner, I would be surprised if there were any of the Powers That Be at PG who know enough about HTML to be able to determine if an HTML file were "good" or "bad." And there are plenty of "bad" HTML files in the PG archive already. If DP were to develop standards for HTML files, they would become the /de facto/ HTML standard for PG, although no one but DP would actually enforce them. If you can help convince DP to establish HTML guidelines and standards, I think you ought to try, if for no other reason than to produce guidelines that can be used independently of DP. DP is moribund, but not nearly as moribund as PG. From jimad at msn.com Wed Apr 21 12:06:17 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 12:06:17 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: >I spent a lot of time and effort working to answer your questions and when I stated simple results of simple experiments you said I was flaming and trashing you. And I spent a lot of time and effort and a little bit of money in an Apple Store trying out what you suggested and it didn't work. Yes I can download something from PG, just not ePub nor MOBI. Yes Apple provides something on iBooks, just not something with the PG name in it. Yes Apple provides something on iPad but just no way to use wifi to do content dev in ePUB or MOBI aka SR for DP or solos. Yes you can overcome these limitations if you use USB instead of wifi but I thought the whole point of iPad at least from my point of view is that it HAS wifi. Well, so does a nook and a nook doesn't allow you to use it either. Etc. I think I was pretty clear about what I wanted, and you kept claiming iPad could do it, and I kept trying it, and guess what it can't -- at least not by way of any of your suggestions, nor by way of anything else listed under "books" or "ebooks" in the Apple App Store. 
I've tried about two dozen applets by now including the ones you suggested. I think its fair to say I've wasted much more time on this subject by now and done much more research into it than you have, so its not clear to me what *you* are complaining! From hart at pglaf.org Wed Apr 21 12:14:04 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 12:14:04 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <3426.7f772dcd.38fa3269@aol.com> <4BC98D3A.6080908@perathoner.de> Message-ID: On Wed, 21 Apr 2010, James Adcock wrote: > >I spent a lot of time and effort working to answer your questions > and when I stated simple results of simple experiments you said I > was flaming and trashing you. > > And I spent a lot of time and effort and a little bit of money in an Apple > Store trying out what you suggested and it didn't work. > > Yes I can download something from PG, just not ePub nor MOBI. You are either not reading my reports, or ignoring them. Go back and try again, otherwise I'll just let you talk yourself out, as has been suggested to me already. > Yes Apple provides something on iBooks, just not something with the PG name > in it. As above. You're just not trying what I suggested. You are doing something else, then complaining it didn't work. You are obviously just not willing to put in the effort, neither on your iPad research, nor in holding up your end of the conversation. This has been a VERY BUSY WEEK/MONTH for me, and I have give you in excess of what it appears I should have. My apologies. . .I am sure you, and others, would have been happier had I simply ignored, which I will start to do now, as advised. First of all, please let me apologize for having been so busy, it has been an incredible few weeks coming up to and now after my first university wide acceptance of my work, and a speech I gave about a week ago, from which I still have not caught up a whole way to my normal energetic levels. 
I'm still catching up on my sleeping, even sleeping through an earlier half of the garage sales. If you did not know, garage sales are pretty much my favorite thing to do along with work. Therefore, my messages may have been entirely too brief, or to the point, or not full of the materials I am world renowned to borrow at length from "The Tact and Diplomacy Department." If you look up the motto of the department, it is so obvious. Meanwhile, I am now trying to make contact with all those whom I promised I would about a week ago, none of whom have done it in my direction, so I really have no idea if YOU were serious, when it came to continuing our discussion. Normally I presume if someone has not contacted me in a week-- they are not interested at all--and waiting additional weeks-- rarely proves otherwise. However. . .MY INTEREST has not waned. . . . So, if you are willing to pursue our conversation further just let me know, and if not, no reply is required. After all, it IS "The Year of the eBook," and I expect busy to busier to busiest, when it comes to all the years of my life. If you would like to keep up with my thoughts and events I can put you on a list I send to at odd times with even odder junk. Again, if not, no reply is required, no offense taken. It was very nice talking with you, Michael S. Hart Founder Project Gutenberg, Inventor of eBooks > Yes Apple provides something on iPad but just no way to use wifi to do > content dev in ePUB or MOBI aka SR for DP or solos. > > Yes you can overcome these limitations if you use USB instead of wifi but I > thought the whole point of iPad at least from my point of view is that it > HAS wifi. Well, so does a nook and a nook doesn't allow you to use it > either. > > Etc. 
> > I think I was pretty clear about what I wanted, and you kept claiming iPad > could do it, and I kept trying it, and guess what it can't -- at least not > by way of any of your suggestions, nor by way of anything else listed under > "books" or "ebooks" in the Apple App Store. I've tried about two dozen > applets by now including the ones you suggested. I think its fair to say > I've wasted much more time on this subject by now and done much more > research into it than you have, so its not clear to me what *you* are > complaining! > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From jimad at msn.com Wed Apr 21 12:31:48 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 12:31:48 -0700 Subject: [gutvol-d] Re: Typesetting In-Reply-To: <15e6.16e10517.39001876@aol.com> References: <15e6.16e10517.39001876@aol.com> Message-ID: >but even those are bearable, _if_ they can be easily removed. The linebreaks are removable if PG enforces standards on txt files submitted. When people make mistakes on those submissions, and they will, then the linebreaks will not be easily removed correctly. Books of poetry or containing poetry are one common counterexample. Make a copy of your linebreak removal routine public in the common computer formats BB, and let us test it and see just how easily it works on the existing PG txt files. The Unicode txt efforts are not too bad because at least then people can choose to represent the glyphs the typesetter chose if they choose to do so, rather than guessing and reinterpreting intent. Italic and SC is then still clearly a loss, as is graphics. Most books use a least italics, so I'd hate to see a PG file format that doesn't even support that. If you wanted to implement even a Unicode txt+ file format then you've got to provide renderers for the different machines. 
Or you auto-translate Unicode txt+ files to HTML for submitters and use the ubiquitous HTML renderers to allow people to view the Unicode txt+ version. Then submitters do not have to submit HTML unless they want to. Recent efforts about 95% of the submissions DO have HTML, but its not clear that that is because people want to provide HTML or because the WW require it. PG *is* already doing this more-or-less on the rare txt-only submissions nowadays - automagically unwrapping and translating to HTML in a way which most of the time is a win and obviously occasionally a loss. The PG legalese unfortunately is particularly unattractive in this approach, and when the unwrapping fails then it is visually distracting - "how come this paragraph isn't unwrapped - is it suppose to be poetry?" How about it? Unicode txt+ file submissions if that is what a submitter wants to do, and PG automatically renders that in HTML, and ePUB, and MOBI? But if you are willing to take txt-only submissions and autorender them into HTML accepting the resulting mistakes then why is it that you aren't willing to take HTML and autorender them into the mandatory txt70 files? Certainly going from HTML to txt70 must introduce fewer mistakes. ??? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Wed Apr 21 12:41:54 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 12:41:54 -0700 Subject: [gutvol-d] Re: Tidy -c and tables In-Reply-To: <4BCF2EA9.1000402@novomail.net> References: <4BCF2EA9.1000402@novomail.net> Message-ID: Why tidy? Many people work hard to retain linebreaks in the HTML so the code can be gone over again at a future date and then PG throws away those linebreaks. 
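For readers following the unwrapping discussion in this thread, here is a minimal sketch of the idea in Python. It is not the unwrap.pl script posted to the list, and it is not anything PG runs. The function name unwrap and the rule for which lines keep their breaks (blank lines, and lines starting with a space, a tab, or ">") are assumptions made for illustration only.

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs (illustrative sketch only).

    Assumed rule, not taken from any PG tool: a line keeps its own
    linebreak if it is blank or starts with a space, a tab, or ">".
    Every other line is joined to the paragraph in progress.
    """
    # Normalize Windows and old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing spaces and tabs so blank lines are really empty.
    lines = [line.rstrip(" \t") for line in text.split("\n")]

    out = []      # finished output lines
    current = ""  # paragraph currently being assembled
    for line in lines:
        keep_break = line == "" or line[0] in " \t>"
        if keep_break:
            # Flush the paragraph in progress, then emit this line as-is.
            if current:
                out.append(current)
                current = ""
            out.append(line)
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        out.append(current)
    return "\n".join(out)
```

Under this rule verse survives only when it is indented in the txt file. Unindented poetry would be joined into one paragraph, which is the kind of failure the poetry counterexample above describes, so a sketch like this is only as good as the structure of the file it is given.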
From Bowerbird at aol.com Wed Apr 21 12:54:34 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 15:54:34 EDT Subject: [gutvol-d] Re: Typesetting (unwrap.pl) Message-ID: <21579.59641b06.3900b1fa@aol.com> michael said: > Let's have the code sure thing, boss. > and install it where everyone can find/use it. great idea... > Help Newby set it up later. i don't think he will need any help, but yeah, sure. and, of course, i invite people to improve the script. -bowerbird =========================================== #!/usr/local/bin/perl -w use CGI::Carp qw(fatalsToBrowser); ########### read the user input read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}); # Split the name-value pairs @pairs = split(/&/, $buffer); foreach $pair (@pairs) { ($name, $value) = split(/=/, $pair); # Un-Webify plus signs and %-encoding $value =~ tr/+/ /; $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $value =~ s///g; if ($allow_html != 0) { $value =~ s/<([^>]|\n)*>//g; } $FORM{$name} = $value; $value =~ s/\cM//g; if ($name eq "theinput") {$thebook=$value}; } if ($thebook eq "") { $thebook='paste the text you want to unwrap in this field, and click "unwrap"...' }; print "content-type: text/html\n\n"; print ''; print "\n"; print "\n"; print 'unwrap p.g. paragraphs'; print ''; print "\n"; print ''; print '
';

print '
'; ######################################################### print ' '; print "\n"; print ''; print "\n"; print '

'; print ''; print '

';
print '';

### the numbers here refer to a list of steps
### i posted in a message to gutvol-d
###

#1
#skip

#2
$thebook =~ s/\r\n/\n/g ;

#3
$thebook =~ s/\r/\n/g ;

#4
#skip

#5
#skip

#6
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;

#7
$thebook =~ s/\n/ \n/g ;

#8
$thebook =~ s/ \n \n/\n\n/g ;

#9
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;

#10
# wait! not yet!

#11
$thebook =~ s/ \n /\n /g ;

# maybe clone this for an asterisk in column 1, and
# clone this for a number in column 1 which is
# followed by a period-space in columns 2-3.

$thebook =~ s/ \n>/\n>/g ;
$thebook =~ s/ \n
"; -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Wed Apr 21 13:04:57 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 16:04:57 EDT Subject: [gutvol-d] Re: EBook formats on iPad via wifi Message-ID: <22152.5a5833e1.3900b469@aol.com> jim said: > so its not clear to me what *you* are complaining! i think he's complaining because he thought he was taking part in an actual dialog, so when he realized he'd been suckered into a bitch session, he chafed... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Wed Apr 21 14:09:33 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 17:09:33 EDT Subject: [gutvol-d] Re: a longstanding question has finally been answered Message-ID: <2e057.4a7c0f95.3900c38d@aol.com> michael said: > I'll take that bet. . .another free lunch of cooked fowl. pardon me? are you implying that you have won a bet against me in the past? _when?_ > I'll be that the iPad never even gets HALF > the market for eReaders over the long haul. you can't be serious. the ipad is already up to 500,000 sold. besides, by "multipurpose machine", i certainly include the iphone and the ipodtouch in there, and -- as i said -- apple has sold 80 million... they sold 8.75 million iphones in the first quarter, an _increase_ over the previous (_holiday_) quarter, which is an absolutely astonishing accomplishment. we're seeing a juggernaut, and it's gathering steam, all because they gave people multipurpose machines that can be carried around with the greatest of ease... there are some of us who _knew_ this'd be killer. (and, gee, michael, i thought you were one of us.) -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marcello at perathoner.de Wed Apr 21 14:15:13 2010 From: marcello at perathoner.de (Marcello Perathoner) Date: Wed, 21 Apr 2010 23:15:13 +0200 Subject: [gutvol-d] Re: Tidy -c and tables In-Reply-To: References: <4BCF2EA9.1000402@novomail.net> Message-ID: <4BCF6AE1.8060900@perathoner.de> James Adcock wrote: > Why tidy? Because I have to convert all the crooked HTML that has been posted in 20 years into valid XHTML. > Many people work hard to retain linebreaks in the HTML so the code can be > gone over again at a future date and then PG throws away those linebreaks. It is simpler to fix the HTML than to fix the Epub, so why should the Epub retain the line breaks? -- Marcello Perathoner webmaster at gutenberg.org From jimad at msn.com Wed Apr 21 14:28:25 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 14:28:25 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: <22152.5a5833e1.3900b469@aol.com> References: <22152.5a5833e1.3900b469@aol.com> Message-ID: >i think he's complaining because he thought he was taking part in an actual dialog, so when he realized he'd been suckered into a bitch session, he chafed... Well, I guess we both suffered in this regard because I'm also just back from a trip yet I made two long trips to the mall to try out his suggestions and they didn't work. If I had thought it was just a B session I surely wouldn't have bothered to make the trips. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Wed Apr 21 15:02:48 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 15:02:48 -0700 Subject: [gutvol-d] Re: Tidy -c and tables In-Reply-To: <4BCF6AE1.8060900@perathoner.de> References: <4BCF2EA9.1000402@novomail.net> <4BCF6AE1.8060900@perathoner.de> Message-ID: >It is simpler to fix the HTML than to fix the Epub, so why should the Epub retain the line breaks? 
Sorry, if you say that tidy is only being used to generate epubs not to modify the posted HTML then fine. On one of my previous HTML submissions a WW said he had run tidy on it. Obviously the intent is to allow future DP'ers or PG'ers who have figured out a better scheme, TEI Lite or whatever (hypothetical), to make another DP pass or solo on the effort by extracting the already "corrected" txt matched against the original OCR rather than having to start again "from scratch." And again pgdiff can extract linebreak info given a txt which has lost linebreaks and an OCR that retains them, but, its still cleaner and easier not to have lost them in the first place. From hart at pglaf.org Wed Apr 21 15:25:24 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 15:25:24 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <22152.5a5833e1.3900b469@aol.com> Message-ID: Jim, learn to speak correctly, please. I think it would have been better for all concerned if you had said, on each occasion, something like: "I tried out his suggestions and they didn't work the way I wanted." My suggestions worked more than fine for myself and for many others, or so it would seem, but not for you. You have made a handful of absolute statements that appear to be 100 percent this or that, and I have refuted them at least to the points where they are leaking and you have start bailing water. This is what happens when ideas are half matching and half not. . .! Get used to it. . .please. Stop making such absolute statements as iPad cannot do this or that, and start making statements such as the iPad doesn't do this in some way that I would prefer, such as. . .then be specific. I did what I did. . .you can't actually deny that I did these things but you CAN say that this is not exactly what _I_ had in mind when I said the words that prompted you to try those things. 
There is a spirit of cooperation that has been lacking from the get-go and it is both between you and Apple and between you and me. It would be nice, very nice, if we could fix that up a little. Sincerely, Michael On Wed, 21 Apr 2010, James Adcock wrote: > > >i think he's complaining because he thought he was > taking part in an actual dialog, so when he realized > he'd been suckered into a bitch session, he chafed... > > Well, I guess we both suffered in this regard because I'm also just back > from a trip yet I made two long trips to the mall to try out his > suggestions and they didn't work. If I had thought it was just a B > session I surely wouldn't have bothered to make the trips. > > > From Bowerbird at aol.com Wed Apr 21 15:25:25 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 18:25:25 EDT Subject: [gutvol-d] Re: EBook formats on iPad via wifi Message-ID: <33b17.38321b87.3900d555@aol.com> well, jim, i'm sorry you wasted some of your time following up on michael's suggestions. but i think if you would have phrased your complaints a bit better in the first place, you could've avoided the misunderstanding. i'm not just talking about your disingenuous means of (mis)defining terms like "e-book", either (although that is a serious error too), but a bad case of failure to qualify yourself. to my point, if you would have said this... > i have found it impossible to download > the books i want from the sites i want > in the formats i want such that i can > read them in the viewer-apps i want... ...you wouldn't have engendered opposition. indeed, you might have gotten a whole lot of sympathy. (or, realistically, a little bit.) and perhaps even received a few pointers... but that's not what you said, not at the outset. what you said initially sounded more like: > the ipad is so locked down that you > can only get the e-books steve jobs > allows you to get, and that sucks...
that's a paraphrase, of course, but i think that that's what it sounded like to people. but of course we know that that's not true, not on the face of it. there's a browser on the ipad, so anything that's out on the web is something the ipad can readily display... put it this way. if i were to offer to pay you $100 for every e-book you read on the ipad, how many "e-books" could you find to "read"? yeah, that's what i thought; no shortage then. yes, there is a walled-in, locked-up section of the ipad, but we all know about that, and what good does it do to bitch about it here? it contributes nothing productive to a thread. to sum up, hyperbole doesn't work well if you don't know how to work it well... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Wed Apr 21 15:25:08 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 15:25:08 -0700 Subject: [gutvol-d] Re: Typesetting (unwrap.pl) In-Reply-To: <21579.59641b06.3900b1fa@aol.com> References: <21579.59641b06.3900b1fa@aol.com> Message-ID: Sorry, (not a perl programmer) perhaps you can provide some hints on how to get this to work. I tried it on my machine and this is what I got: C:\JIM\Perl>bbunwrap.pl < matra11.txt > matra11.html [Wed Apr 21 15:16:25 2010] bbunwrap.pl: Name "main::allow_html" used only once: possible typo at C:\JIM\Perl\bbunwrap.pl line 15. [Wed Apr 21 15:16:25 2010] bbunwrap.pl: Name "main::FORM" used only once: possib le typo at C:\JIM\Perl\bbunwrap.pl line 18. [Wed Apr 21 15:16:25 2010] bbunwrap.pl: Use of uninitialized value in read at C: \JIM\Perl\bbunwrap.pl line 5. [Wed Apr 21 15:16:25 2010] bbunwrap.pl: Use of uninitialized value $thebook in s tring eq at C:\JIM\Perl\bbunwrap.pl line 24. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hart at pglaf.org Wed Apr 21 15:33:17 2010 From: hart at pglaf.org (Michael S. 
Hart) Date: Wed, 21 Apr 2010 15:33:17 -0700 (PDT) Subject: [gutvol-d] Re: a longstanding question has finally been answered In-Reply-To: <2e057.4a7c0f95.3900c38d@aol.com> References: <2e057.4a7c0f95.3900c38d@aol.com> Message-ID: On Wed, 21 Apr 2010, Bowerbird at aol.com wrote: > michael said: > > I'll take that bet. . .another free lunch of cooked fowl. > > pardon me? are you implying that you have > won a bet against me in the past? _when?_ Every single time! > > I'll bet that the iPad never even gets HALF > > the market for eReaders over the long haul. > > you can't be serious. > > the ipad is already up to 500,000 sold. Then place your bets, ladies and gentlemen!!! > besides, by "multipurpose machine", i certainly > include the iphone and the ipodtouch in there, > and -- as i said -- apple has sold 80 million... > > they sold 8.75 million iphones in the first quarter, > an _increase_ over the previous (_holiday_) quarter, > which is an absolutely astonishing accomplishment. > > we're seeing a juggernaut, and it's gathering steam, > all because they gave people multipurpose machines > that can be carried around with the greatest of ease... > > there are some of us who _knew_ this'd be killer. > (and, gee, michael, i thought you were one of us.) > > -bowerbird I am fine with the iPad, iPhone and iPod. However, I stand by my offer to accept your terms. What numbers over how many years. . . . By the way, I like my cooked fowl with a little spine. Who was it that predicted this whole cellphone thing?! Eh? By when do you think iPad will have half the market? Just counting eReaders, which is to your advantage. Even just counting Kindle, Nook and Sony??? I don't mean some fake "market" narrowed down in some small portion of space-time, I mean the grand total. Pick a date!!!
From Bowerbird at aol.com Wed Apr 21 15:43:12 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 18:43:12 EDT Subject: [gutvol-d] Re: Typesetting (unwrap.pl) Message-ID: <34c52.130addae.3900d980@aol.com> sorry, jim, i'm just a kindergartner when it comes to perl, so i'm not sure i can help with any debugging, but i'll try... > Name "main::allow_html" used only once: that was code i just pulled in, but i didn't "allow html", so, practically, you can throw out that loop altogether... (but also see the note below.) > Name "main::FORM" used only once: > possible typo at C:\JIM\Perl\bbunwrap.pl line 18. that construct is just to split the buffer, so you don't really need it. the only variable of any interest is "theinput", so just strip "theinput=" off the buffer and you're good. note that $theinput is then dumped into $thebook. (but also see the note below.) > Use of uninitialized value in read at C: that command reads the buffer that's submitted to the script when it's mounted on a website, so you'll have to rewrite it if you wanna run it offline, which is what it appears you are trying to do here. read this note: what you would do instead (and it nullifies all of the errors that we've discussed so far, is to open the text-file on your machine, put it into $thebook, and proceed to this line: > print "content-type: text/html\n\n"; > Use of uninitialized value $thebook > in string eq at C:\JIM\Perl\bbunwrap.pl line 24. looks like your version of perl wants the variables to be initialized, so just go ahead and do that... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jimad at msn.com Wed Apr 21 15:44:31 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 15:44:31 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <22152.5a5833e1.3900b469@aol.com> Message-ID: >My suggestions worked more than fine for myself and for many others, or so it would seem, but not for you. OK, but then be clear that you are suggesting something different than what I was asking for. You could have said for example "I don't know how to do what you want to do but if you try this app then it allows you to read their selection of free PDF books instead of getting your choice of ePUB or MOBI books." I think I was pretty clear that what I wanted was a way to download a free ePUB or a MOBI file that I find at some location on the web to an iPad and read it there -- that is after all what most people would consider "The eBook Experience" -- the ability to actually HAVE an ePub or MOBI just like you have HAVE a paperback or you actually HAVE a printout if you prefer to print out a postscript copy of a PG book on your laserprinter. And by HAVING something I mean you can take it with you and read it on an airplane or on a beach -- all those things that people are used to doing with a paperback or a printout and are used to doing with other ebook readers. I would hope we could agree by now that this is not the iPad business model. Rather the iPad business model is either you "buy" the book from Apple (including a subset of "free" books that Apple has rebranded as coming from Apple), or if you are a publisher you write your own applet for iPad to distribute your own works (I guess PG can write its own applet if it wants to have a presence on iPad but I'm not sure I'm the one to take that one on -- maybe PG already has an iPhone programmer somewhere who can take that one on?) 
or if you are the person who actually bought the iPad you are given your own degraded transfer path via internet->desktop->iTunes->USB-iPad where presumably Apple is blocking that wifi transfer path for the same reason that B&N nook is blocking the wifi transfer path, namely to sell more books. Sorry but having already hooked up an ebook reader to my desktop by USB 1000+ times I can assure you that the USB connection path starts to get really really old! From Bowerbird at aol.com Wed Apr 21 15:55:56 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 18:55:56 EDT Subject: [gutvol-d] Re: a longstanding question has finally been answered Message-ID: <35839.586a3a6c.3900dc7c@aol.com> michael said: > Every single time! you've never won a bet against me. ever. > Then place your bets, ladies and gentlemen!!! you've already lost. 80 million versus 3 million. > I am fine with the iPad, iPhone and iPod. you've already lost. 80 million versus 3 million. > However, I stand by my offer to accept your terms. you've already lost. 80 million versus 3 million. > What numbers over how many years. . . . the only fair way is "since made available for sale". but who needs to be fair? let the sony/kindle/nook/whatever dedicated machines (even palm and rocketbook!) have their huge head-start in time... because even with it, they've fallen far behind. > By the way, I like my cooked fowl with a little spine. cook it however you like! if you can catch it, that is... :+) > Who was it that predicted this whole cellphone thing?! so why are you backing off it now? a cellphone that's also being used as an e-reader is -- by definition -- a multipurpose machine... > By when you you think iPad will have half the market? oh, now you want to make it just the ipad? ok, no problem. > Just counting eReaders, which is to your advantage. > Even just counting Kindle, Noon and Sony??? you've already lost. 80 million versus 3 million. 
> I don't mean some fake "market" narrowed down in > some small portion of space-time, I mean the grand total. > Pick a date!!! "since made available for sale"... 1-year-out vs. 1-year-out, 2-years-out vs. 2-years-out, etc. -bowerbird From hart at pglaf.org Wed Apr 21 16:00:33 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 16:00:33 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <22152.5a5833e1.3900b469@aol.com> Message-ID: On Wed, 21 Apr 2010, James Adcock wrote: > >My suggestions worked more than fine for myself and for many others, or so > it would seem, but not for you. > > OK, but then be clear that you are suggesting something different than what > I was asking for. You could have said for example "I don't know how to do > what you want to do but if you try this app then it allows you to read their > selection of free PDF books instead of getting your choice of ePUB or MOBI > books." I did what you said could not be done. You did not mention .mobi when you started, did you? I was specific about each program I used, but I perhaps should have once again made it obvious that when I download from pglaf.org I am getting the .txt files, and I don't recall you specifying formats-- just that it was impossible to download eBooks from pglaf.org Which I proved false. I don't recall anything about .pdf, .mobi, or .epub at first. To me this was merely, and I apologize, part of the bitch sessions. > I think I was pretty clear that what I wanted was a way to download a free > ePUB or a MOBI file that I find at some location on the web to an iPad and > read it there -- that is after all what most people would consider "The > eBook Experience" -- the ability to actually HAVE an ePub or MOBI just like Again, I must remind you once more, I am the wrong person to talk to about saving in YOUR favorite format. . 
.that is strictly up to YOU, not to me. I downloaded files, I can take them on a plane or to the beach. I don't deal with paper, again you have the wrong person. > you have HAVE a paperback or you actually HAVE a printout if you prefer to > print out a postscript copy of a PG book on your laserprinter. And by > HAVING something I mean you can take it with you and read it on an airplane > or on a beach -- all those things that people are used to doing with a > paperback or a printout and are used to doing with other ebook readers. This I must say I doubt, but it is really non-sequitur to what has passed. > I would hope we could agree by now that this is not the iPad business model. > Rather the iPad business model is either you "buy" the book from Apple > (including a subset of "free" books that Apple has rebranded as coming from A very large subset. . .perhaps even larger than any other comparable subset. Comparable meaning you can use something like "NOT Mark Twain" as a subset. > Apple), or if you are a publisher you write your own applet for iPad to Again, I must once again refer you to Wattpad, for the fifth? time. > distribute your own works (I guess PG can write its own applet if it wants > to have a presence on iPad but I'm not sure I'm the one to take that one on So far we have been pretty happy with the Wattpad app, but, yes, I think in time we SHOULD write our own apps. > -- maybe PG already has an iPhone programmer somewhere who can take that one > on?) or if you are the person who actually bought the iPad you are given > your own degraded transfer path via internet->desktop->iTunes->USB-iPad > where presumably Apple is blocking that wifi transfer path for the same > reason that B&N nook is blocking the wifi transfer path, namely to sell more > books. Sorry but having already hooked up a ebook reader to my desktop by > USB 1000+ times I can assure you that the USB connection path starts to get > really really old! 
I'm glad you brought up that the nook doesn't allow "real" wifi [at all!!!] I was going to get after you about that, and the other things nook, Sony or Kindle do to herd you onto the "company store" turf. Not sure how many remember "company stores" these days. mh From hart at pglaf.org Wed Apr 21 16:02:47 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 16:02:47 -0700 (PDT) Subject: [gutvol-d] Re: a longstanding question has finally been answered In-Reply-To: <35839.586a3a6c.3900dc7c@aol.com> References: <35839.586a3a6c.3900dc7c@aol.com> Message-ID: Total market. . .iPad versus Kindle, Sony and nook, leave the rest. When will there be more iPads??? A little spine here. . .come on! On Wed, 21 Apr 2010, Bowerbird at aol.com wrote: > michael said: > > Every single time! > > you've never won a bet against me. ever. > > > > Then place your bets, ladies and gentlemen!!! > > you've already lost. 80 million versus 3 million. > > > > I am fine with the iPad, iPhone and iPod. > > you've already lost. 80 million versus 3 million. > > > > However, I stand by my offer to accept your terms. > > you've already lost. 80 million versus 3 million. > > > > What numbers over how many years. . . . > > the only fair way is "since made available for sale". > > but who needs to be fair? > > let the sony/kindle/nook/whatever dedicated machines > (even palm and rocketbook!) have their huge head-start > in time... because even with it, they've fallen far behind. > > > > By the way, I like my cooked fowl with a little spine. > > cook it however you like! if you can catch it, that is... :+) > > > > Who was it that predicted this whole cellphone thing?! > > so why are you backing off it now? > > a cellphone that's also being used as an e-reader > is -- by definition -- a multipurpose machine... 
> > > > By when you you think iPad will have half the market? > > oh, now you want to make it just the ipad? ok, no problem. > > > > Just counting eReaders, which is to your advantage. > > Even just counting Kindle, Noon and Sony??? > > you've already lost. 80 million versus 3 million. > > > > I don't mean some fake "market" narrowed down in > > some small portion of space-time, I mean the grand total. > > Pick a date!!! > > "since made available for sale"... > > 1-year-out vs. 1-year-out, 2-years-out vs. 2-years-out, etc. > > -bowerbird > > From jimad at msn.com Wed Apr 21 16:03:54 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 16:03:54 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: <33b17.38321b87.3900d555@aol.com> References: <33b17.38321b87.3900d555@aol.com> Message-ID: >but of course we know that that's not true, not on the face of it. there's a browser on the ipad, so anything that's out on the web is something the ipad can readily display... I still think there is some fundamental misunderstanding here. Using the iPad I go on the web to PG. I see an ePub book I like there. I use the iPad web browser to go there. I click on the ePUB book. iPad says "sorry Hal I can't allow you to read that book." I don't see how you can say that the iPad "readily displays" something when it explicitly tells me that it refuses to display that something! I take my cheap crappy generic netbook, I go on the web to PG. I see an ePub book I like there. I use the cheap crappy generic netbook's web browser to go there. I click on the ePUB book. The cheap crappy netbook automatically downloads the ePUB book to the netbook so I can read it later in an airplane or on the beach, and it automatically opens it and I start reading. The netbook DOES "readily display" anything that's out on the web. >put it this way. if i were to offer to pay you $100 for every e-book you read on the ipad, how many "e-books" could you find to "read"? 
If you pay me $100 for every time I am part way through a book and then I pick up my iPad again and that book has magically disappeared because I am no longer in sight of a public wifi connection then I am going to come out way ahead. This is silly, Comcast offers 100s of TV channels, but if I turn on the TV channel at any moment in time the probability is 95% that Comcast will have nothing on that *I* want to watch at that moment in time. I work hard to find what I want to read, and I work hard to find texts that I want to create to submit to PG, and most of what I want to read or what I want to create to submit to PG is NOT available via the current hardwired iPad applets each distributing texts from ONE server location on the internet. If every site that offers free books writes its own applet specifically to support iPad rather than using their already existing HTML sites which support "real" HTML browsers, well, then I guess iPad would do what I want to do. But I don't understand why every organization out on the web offering free books has to write their own applet for iPad when they already HAVE written that applet -- it's called an HTML web site - it's just that Apple has deliberately pimped their web browser to make sure all these already existing "applets" aka HTML free ebook websites don't work! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gbnewby at pglaf.org Wed Apr 21 16:10:52 2010 From: gbnewby at pglaf.org (Greg Newby) Date: Wed, 21 Apr 2010 16:10:52 -0700 Subject: [gutvol-d] Re: Typesetting (unwrap.pl) In-Reply-To: <21579.59641b06.3900b1fa@aol.com> References: <21579.59641b06.3900b1fa@aol.com> Message-ID: <20100421231052.GA31654@pglaf.org>

http://pglaf.org/cgi-bin/unwrap.pl

A few small changes:

#!/usr/local/bin/perl -w
use strict; # gbn
use CGI::Carp qw(fatalsToBrowser);

########### read the user input
my $buffer; my $thebook=""; # gbn
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

# Split the name-value pairs
my @pairs = split(/&/, $buffer);
foreach my $pair (@pairs) {
    (my $name, my $value) = split(/=/, $pair);
    # Un-Webify plus signs and %-encoding
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $value =~ s///g;
    # gbn: if ($allow_html != 0) {
    # $value =~ s/<([^>]|\n)*>//g;
    # }
    # gbn: $FORM{$name} = $value;
    $value =~ s/\cM//g;
    if ($name eq "theinput") {$thebook=$value};
}

if ($thebook eq "") {
    $thebook='paste the text you want to unwrap in this field, and click "unwrap"...'
};

print "content-type: text/html\n\n";
print '';
print "\n";
print "\n";
print 'unwrap p.g. paragraphs';
print '';
print "\n";
print '';
print '
';

print '';
print 'action="http://pglaf.org/cgi-bin/unwrap.pl">';
#########################################################

print ' '; print 
"\n";
print ''; print 
"\n";
print '

'; print ''; print '

'; print '';

### the numbers here refer to a list of steps
### i posted in a message to gutvol-d ###

#1
#skip
#2
$thebook =~ s/\r\n/\n/g ;
#3
$thebook =~ s/\r/\n/g ;
#4
#skip
#5
#skip
#6
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
#7
$thebook =~ s/\n/ \n/g ;
#8
$thebook =~ s/ \n \n/\n\n/g ;
#9
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
#10
# wait! not yet!
#11
$thebook =~ s/ \n /\n /g ;
# maybe clone this for an asterisk in column 1, and
# clone this for a number in column 1 which is
# followed by a period-space in columns 2-3.
$thebook =~ s/ \n>/\n>/g ;
$thebook =~ s/ \n
";

On Wed, Apr 21, 2010 at 03:54:34PM -0400, Bowerbird at aol.com wrote: > michael said: > > Let's have the code > > sure thing, boss. > > > > and install it where everyone can find/use it. > > great idea... > > > > Help Newby set it up later. > > i don't think he will need any help, but yeah, sure. > > and, of course, i invite people to improve the script. > > -bowerbird > > ===========================================
>
>
> #!/usr/local/bin/perl -w
> use CGI::Carp qw(fatalsToBrowser);
>
> ########### read the user input
> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>
> # Split the name-value pairs
> @pairs = split(/&/, $buffer);
> foreach $pair (@pairs) {
>     ($name, $value) = split(/=/, $pair);
>     # Un-Webify plus signs and %-encoding
>     $value =~ tr/+/ /;
>     $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>     $value =~ s///g;
>     if ($allow_html != 0) {
>         $value =~ s/<([^>]|\n)*>//g;
>     }
>     $FORM{$name} = $value;
>     $value =~ s/\cM//g;
>     if ($name eq "theinput") {$thebook=$value};
> }
>
>
> if ($thebook eq "") {
>     $thebook='paste the text you want to unwrap in this field, and click
> "unwrap"...'
> };
>
>
> print "content-type: text/html\n\n";
> print '';
> print "\n"; print "\n";
> print 'unwrap p.g. paragraphs';
> print ''; print "\n";
> print '';
> print '
';
> 
> print ' 
> #########################################################
> ###   note that this line has to be changed to point to the appropriate 
> place ###
> print 'action="http://z-m-l.com/go/unwrap.pl">';
> #########################################################
> 
> print ' '; print 
> "\n";
> print ''; print 
> "\n";
> print '

'; > print ''; > print '

';
> print '';
>
>
> ### the numbers here refer to a list of steps
> ### i posted in a message to gutvol-d ###
>
> #1
> #skip
> #2
> $thebook =~ s/\r\n/\n/g ;
> #3
> $thebook =~ s/\r/\n/g ;
> #4
> #skip
> #5
> #skip
> #6
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> #7
> $thebook =~ s/\n/ \n/g ;
> #8
> $thebook =~ s/ \n \n/\n\n/g ;
> #9
> $thebook =~ s/\n \n/\n\n/g ;
> $thebook =~ s/\n \n/\n\n/g ;
> $thebook =~ s/\n \n/\n\n/g ;
> $thebook =~ s/\n \n/\n\n/g ;
> #10
> # wait! not yet!
> #11
> $thebook =~ s/ \n /\n /g ;
> # maybe clone this for an asterisk in column 1, and
> # clone this for a number in column 1 which is
> # followed by a period-space in columns 2-3.
> $thebook =~ s/ \n>/\n>/g ;
> $thebook =~ s/ \n</\n</g ;
> $thebook =~ s/ \n\t/\n\t/g ;
> $thebook =~ s| \n/tab|\n/tab|g ;
> #12
> $thebook =~ s/ \n/ /g ;
>
> print $thebook;
>
> print "
"; > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d From Bowerbird at aol.com Wed Apr 21 16:20:10 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 19:20:10 EDT Subject: [gutvol-d] Re: a longstanding question has finally been answered Message-ID: <36f79.176fd0f1.3900e22a@aol.com> michael said: > Total market. . .iPad versus Kindle, Sony and nook, leave the rest. > When will there be more iPads??? > A little spine here. . .come on! ipad had 300,000 the first weekend. it took kindle/sony/nook a _year_ to move that many. ipad has 500,000 now, less than one month out... it took kindle/sony/nook _18_months_ to sell that many. are you disputing who the eventual winner will be? or are we just arguing about how long it will take the ipad to go ahead? how long has the kindle been out now? just to make it easy on myself, i'll say that when the ipad has been out _that_long_, on that date its sales will have surpassed sales made by the kindle/sony/nook trio on that date. (and likely by a large margin.) and if you want certainty, bracket a period a-year-before that date and a-year-after... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From hart at pglaf.org Wed Apr 21 16:23:37 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 16:23:37 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <33b17.38321b87.3900d555@aol.com> Message-ID: Jim is still refusing to acknowledge that PG eBooks in .epub are so easily available that people, including him, perhaps, literally, miss that they are .epub eBooks. He has not answered my questions about this. . . . I ask again: Are not the PG iBooks actually PG .epub books? Same goes for Wattpad? Same goes for all the rest. I thought .epub was the default iPad format, no? 
After all the peacemaking just now, I must admit that Jim still seems to be playing some games so he can keep on bitching. I think what he really wants is to download .epub files and the object is to do MORE than just READ them. I'm not sure WHAT more he has in mind, unless it's editing. I'm not really sure iPads were made for all that stuff. I think it's pretty obvious they weren't. And they don't have a trunk!!! On Wed, 21 Apr 2010, James Adcock wrote: > > >but of course we know that that's not true, > not on the face of it. there's a browser on > the ipad, so anything that's out on the web > is something the ipad can readily display... > > I still think there is some fundamental misunderstanding here. Using the > iPad I go on the web to PG. I see a ePub book I like there. I use the > iPad web browser to go there. I click on the ePUB book. iPad says "sorry > Hal I can't allow you to read that book." I don't see how you can say that > the iPad "readily displays" something when it explicitly tells me that it > refuses to display that something! > > > > I take my cheap crappy generic netbook, I go on the web to PG. I see a > ePub book I like there. I use the cheap crappy generic netbook's web > browser to go there. I click on the ePUB book. The cheap crappy netbook > automatically downloads the ePUB book to the netbook so I can read it later > in an airplane or on the beach, and it automatically opens it and I start > reading. The netbook DOES "readily display" anything that's out on the > web. > > > >put it this way. if i were to offer to pay you > $100 for every e-book you read on the ipad, > how many "e-books" could you find to "read"? > > > > If you pay me $100 for every time I am part way through a book and then I > pick up my iPad again and that book has magically disappeared because I am > no longer in sight of a public wifi connection then I am going to come out > way ahead. 
This is silly, Comcast offers 100s of TV channels, but if I > turn on the TV channel at any moment in time the probability is 95% that > Comcast will have nothing on that *I* want to watch at that moment in time. > I work hard to find what I want to read, and I work hard to find texts that > I want to create to submit to PG, and most of what I want to read or what I > want to create to submit to PG is NOT available via the current hardwired > iPad applets each distributing texts from ONE server location on the > internet. If every site that offers free books writse its own applet > specifically to support iPad rather than using their already existing HTML > sites which support "real" HTML browsers, well, then I guess iPad would do > what I want to do. But I don't understand why every organization out on > the web offering free books has to write their own applet for iPad when > they already HAVE written that applet -- its call an HTML web site - its > just that Apple has deliberately pimped their web browser to make sure all > these already existing "applets" aka HTML free ebook websites don't work! > > > > > From hart at pglaf.org Wed Apr 21 16:27:15 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 16:27:15 -0700 (PDT) Subject: [gutvol-d] Re: a longstanding question has finally been answered In-Reply-To: <36f79.176fd0f1.3900e22a@aol.com> References: <36f79.176fd0f1.3900e22a@aol.com> Message-ID: You are really willing to bet that on April 21, 2013 there will be more iPads than Kindles plus Sonys plus nooks. . . . Say it in print for the folks, and I'll start practicing turducken recipes. . . . On Wed, 21 Apr 2010, Bowerbird at aol.com wrote: > michael said: > > Total market. . .iPad versus Kindle, Sony and nook, leave the rest. > > When will there be more iPads??? > > A little spine here. . .come on! > > ipad had 300,000 the first weekend. > > it took kindle/sony/nook a _year_ to move that many. 
> > ipad has 500,000 now, less than one month out... > > it took kindle/sony/nook _18_months_ to sell that many. > > are you disputing who the eventual winner will be? > > or are we just arguing about how long it will take the ipad > to go ahead? > > how long has the kindle been out now? > just to make it easy on myself, i'll say that > when the ipad has been out _that_long_, > on that date its sales will have surpassed > sales made by the kindle/sony/nook trio > on that date. (and likely by a large margin.) > > and if you want certainty, bracket a period > a-year-before that date and a-year-after... > > -bowerbird > > From Bowerbird at aol.com Wed Apr 21 16:34:25 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 19:34:25 EDT Subject: [gutvol-d] Re: EBook formats on iPad via wifi Message-ID: <37cac.38c448de.3900e581@aol.com> jim said: > I still think there is some fundamental misunderstanding here. there is. and it will remain here, until you decide to change your tune. > Using the iPad I go on the web to PG. I see a ePub book I like there. you mean that you see a _book_ you like there. a _book_ that is offered in a _number_ of different formats, including .html (which can be read in the web-browser) and as a plain-vanilla .txt file, which can be read in many apps... your problem is that you want to insist on a certain file-format. even though i'm not fully convinced that one cannot find a way to get an .epub onto an ipad to read in the app that one wants, i _might_ be inclined to take your word for it (since i don't care). but don't try to pretend that because you cannot get an .epub, you can't get "an e-book", because i don't play nonsense games. > I don't see how you can say that the iPad "readily displays" > something when it explicitly tells me that it refuses to > display that something! you just can't give it up, can you, jim? i mean, seriously, you are _incapable_, aren't you? 
but how long do you expect people to take you seriously when you _persist_ in making such nonsense arguments? what book -- what _book_, not some particular file-format -- what _book_ is it that you claim the ipad is refusing to display? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From lee at novomail.net Wed Apr 21 16:47:18 2010 From: lee at novomail.net (Lee Passey) Date: Wed, 21 Apr 2010 17:47:18 -0600 Subject: [gutvol-d] Re: Tidy -c and tables In-Reply-To: References: <4BCF2EA9.1000402@novomail.net> Message-ID: <4BCF8E86.2000004@novomail.net> On 4/21/2010 1:41 PM, James Adcock wrote: > Why tidy? As Mr. Perathoner has pointed out, it is because OCF requires that the interior text be valid XML, and it is certain that not all of the hand-crafted HTML in the PG repository is valid XHTML. Tidy should cause no harm, but /will/ guarantee XHTML output. It is not the only tool that could produce this result, but it is probably the best (although not perfect). I suspect many of the DP post-processors use tidy as part of their regular workflow. > Many people work hard to retain linebreaks in the HTML so the code can be > gone over again at a future date and then PG throws away those linebreaks. If your notion of "retaining linebreaks" is by putting a newline in your HTML text you have already lost the battle. According to the HTML specification, newlines are white space, and must be treated as such. HTML is an explicit markup language; ie. any markup which is not part of the base text must be explicit, eg.
and not CR, LF, or CRLF. I do not believe that there is any HTML authoring/editing tool which will preserve newline characters as implicit markup. If you really are "work[ing] hard to retain linebreaks in HTML" then you will make them explicit. You can do this by adding explicit markup that user agents will ignore (eg. ), using an invalid HTML element (eg. ) which browsers will ignore, or by using the HTML break element in such a way that its display can be turned on or off by the use of CSS styles (eg.
). If you expect everyone to respect newline characters as line breaks in HTML, in direct contravention to the HTML spec, you are borrowing trouble. I agree with you that line breaks need to be preserved; I just think they should be preserved explicitly, and not implicitly. From lee at novomail.net Wed Apr 21 16:51:46 2010 From: lee at novomail.net (Lee Passey) Date: Wed, 21 Apr 2010 17:51:46 -0600 Subject: [gutvol-d] Re: a longstanding question has finally been answered In-Reply-To: References: <36f79.176fd0f1.3900e22a@aol.com> Message-ID: <4BCF8F92.20604@novomail.net> On 4/21/2010 5:27 PM, Michael S. Hart wrote: > > > You are really willing to bet that on April 21, 2013 there will be > more iPads than Kindles plus Sonys plus nooks. . . . > > Say it in print for the folks, and I'll start practicing turducken > recipes. . . . I'll say it in print, and I don't even /like/ the iPad. General purpose computing devices will /always/ beat out single purpose devices. If there are more Kindles sold next year than iPads it will be because Amazon has turned the Kindle into a general purpose computing device as well. From Bowerbird at aol.com Wed Apr 21 16:56:51 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 19:56:51 EDT Subject: [gutvol-d] Re: a longstanding question has finally been answered Message-ID: <39212.7a4fea3d.3900eac3@aol.com> michael said: > You are really willing to bet that on April 21, 2013 there > will be more iPads than Kindles plus Sonys plus nooks. . . . well, every ipad and iphone and ipodtouch can run the kindle software, as can a slew of phones, even today, so all of those machines can act as "virtual kindles"... but in terms of dedicated kindle machines, you betcha. > Say it in print for the folks, and > I'll start practicing turducken recipes. . . . well, if you're even willing to take such a stupendous bet, it means you're gonna rewrite the terms of engagement, but i'll go for it anyway, to see how you're gonna do that. 
:+) *** while i've got the crystal ball out, might as well use it... amazon will drop the price of a kindle to nearly nothing. indeed, they'll have a "subscription" offer that will actually make the kindle _free_ if you agree to buy so many books. they will also offer an "all-you-can-eat" option that will attempt to do for books what "netflix" has done for films. similarly, they'll offer the kindle to school-districts with a _guarantee_ that their overall textbook-costs will drop, and increasingly-cash-starved schools will jump at that. all of these initiatives -- and more! -- will ensure that lots and lots and lots and lots of kindle units are moved. but it _still_ won't compare with the ipad juggernaut... (not even in quantity, and especially not in profitability.) why not? because amazon can't do software like apple. so they will always be three-and-a-half-steps behind... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Wed Apr 21 17:24:44 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 17:24:44 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <22152.5a5833e1.3900b469@aol.com> Message-ID: >Again, I must once again refer you to Wattpad, for the fifth? time. I DID download and install the Wattpad and I DID discuss this earlier and again it appeared to me to only be yet another app that displays a slightly hacked version of ascii txt files shipped from their own private server. It did not appear to have any way to get an ePUB or MOBI book from a location I choose on the internet. >I was going to get after you about that, and the other things nook, Sony or Kindle do to herd you onto the "company store" turf. 
iTunes and the Apple App Store are both "company store" turfs as far as I can see -- especially when Jobs can tell Stanza to take out a feature that lets customers share free books with their friends--a feature they have come to rely on-- and why -- because Jobs is introducing a competitive app and Jobs wants to cook the books so that his app wins. That if one even wants to install a free ePUB book via *USB* you *still* have to run it through the iTunes "company store" is particularly galling to me. I don't understand why you have to take this path if Apple isn't DRM'ing the free books??? For the record: Nook "company store" -- nook is hopelessly "locked down" as I have said many times. Sony "company store" -- don't know the wifi version if any, I just know that lots of people who work with PG/DP transfer ebooks to Sony via USB. Kindle "company store" -- offers about the same "features" as the iBooks "company store", plus you can USB by *direct connection* to your computer without having to go through an iTunes-like "company store" applet, plus it has an "experimental web browser" that allows you to download free MOBI books directly from the internet via whispernet, plus it allows one to quickly and easily write a "Magic Catalog" type ebook which in turn can pull down other free MOBI books from the internet via whispernet. Now whispernet is slow and unreliable compared to wifi at least in the 'burbs where I live. And the Kindle browser is weak and sucky -- but at least it hasn't been "pimped" to prevent the download of free ebooks from the internet! And you can also do all these things with PDF Files and TXT files and they will all also actually end up inside your Kindle on the standard bookshelf so that they will still be there when you want to read them, whether on the beach or on an airplane, etc. 
But the slow and unreliable whispernet plus the weak web browser are all reasons why I am still looking for an ebook reader that uses wifi, which hasn't "pimped" that wifi, and hasn't "pimped" the web browser either! From jimad at msn.com Wed Apr 21 17:35:54 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 17:35:54 -0700 Subject: [gutvol-d] Re: Typesetting (unwrap.pl) In-Reply-To: <20100421231052.GA31654@pglaf.org> References: <21579.59641b06.3900b1fa@aol.com> <20100421231052.GA31654@pglaf.org> Message-ID: >A few small changes: Sorry, I'm not too sure how you are doing it, but it shows up "unwrapped" as html in the web browser, but if I actually try to save it as a text file, either by cut-and-paste or by File Save = name.txt then the text magically shows up wrapped again. I think what people want is an unwrapped txt file that they can actually save and can read in their choice of txt reader or txt editor. From jimad at msn.com Wed Apr 21 17:48:25 2010 From: jimad at msn.com (James Adcock) Date: Wed, 21 Apr 2010 17:48:25 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <33b17.38321b87.3900d555@aol.com> Message-ID: >Are not the PG iBooks actually PG .epub books? I have acknowledged before that the subset of free books iBooks that Apple has rebranded as being from Apple look like they originated from PG, and that if you are willing to accept a subset of what PG offers and accept the Apple rebranding then this is not a bad offering on that subset. >Same goes for Wattpad? Wattpad is a non-starter piece of junk as far as I can see. At least the iBooks subset is a reasonable port of that subset of the PG books they choose to offer. >I thought .epub was the default iPad format, no? I thought so too which is why I was so so surprised when from iPAD I clicked on an ePUB book at the PG website and iPAD refused to download and display the book. Even Kindle allows that! 
>I'm not sure WHAT more he has in mind, unless it's editing. Please Michael you are being silly because I have told you a dozen times already what I had in mind: I had in mind an ebook reader that has wifi and allows me to use its internet browser to download and display ebooks in ePUB and/or MOBI format. It should also allow me to quickly and easily use the wifi to transfer ebooks that I am working on from my local computer to the reader device. Any generic $200 netbook allows you to do these things. It's just that they have a keyboard that gets in the way when you are trying to read something. I wouldn't think it would be hard for YOU to imagine a netbook but with a virtual keyboard rather than a physical keyboard except that YOU are playing games because YOU don't want to admit how much Apple has pimped their offering to keep friends from freely sharing free books with their friends. And I thought that was something that YOU always claimed PG was about? So again, why are you defending Apple? From Bowerbird at aol.com Wed Apr 21 18:09:22 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Wed, 21 Apr 2010 21:09:22 EDT Subject: [gutvol-d] Re: Typesetting (unwrap.pl) Message-ID: <3d4f6.c5288d9.3900fbc2@aol.com> jim, you really need to learn the art of asking what you are doing wrong, instead of just making a report that "the tool doesn't work". you say the text shows up unwrapped; that indicates the script is working... if the linebreaks "reappear", then you are doing something wrong... _you_... what i imagine is happening is this: you're doing a "select-all" before you copy text from the browser-window. that copies the text from the _field_ as well as the unwrapped text below. so when you paste the text elsewhere, the text you see (because it's on top) is the wrapped text from in the field. you need to copy the _unwrapped_ text out of the browser window that appears, and _only_ that unwrapped text. get it? 
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From hart at pglaf.org Wed Apr 21 19:41:02 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 19:41:02 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <33b17.38321b87.3900d555@aol.com> Message-ID: On Wed, 21 Apr 2010, James Adcock wrote: > >Are not the PG iBooks actually PG .epub books? > > I have acknowledged before that the subset of free books iBooks that Apple > has rebranded as being from Apple look like they originated from PG, and > that if you are willing to accept a subset of what PG offers and accept the > Apple rebranding then this is not a bad offering on that subset. Gee! Jim has still found yet another way NOT to say if they are .epub. Not to mention all the ways he has found NOT to say that there are some obvious ways NOT to get the Stanza iPod effect in his original complaint. Yawn!!! > >Same goes for Wattpad? > > Wattpad is a non-starter piece of junk as far as I can see. At least the > iBooks subset is a reasonable port of that subset of the PG books they > choose to offer. "Non-starter piece of junk". . .another yawn!!! Jim. . .you have to START before you have any right to such comments. Have some experience before you say such things. This is why people think you are just bitching. > >I thought .epub was the default iPad format, no? > > I thought so too which is why I was so so surprised when from iPAD I clicked > on an ePUB book at the PG website and iPAD refused to download and display > the book. Even Kindle allows that! Yes, Jim has found another way to avoid answering that direct question. I am back to having to challenge his sincerity in all this. . . . Yahn!!! > >I'm not sure WHAT more he has in mind, unless it's editing. 
Jim, until and unless you are willing to CONVERSE and do research, and to admit that Apple's iPad was never intended for what you want I don't think there is any need or reason to continue this pretense. You can have the last word. You can have ALL of the last words. I retire from the field. The field is yours. Next subject. . .are there really $200 netbooks? Would you please send me some URLs for them??? > Please Michael you are being silly because I have told you a dozen times > already what I had in mind: I had in mind a ebook reader that has wifi and > allows me to use its internet browser to download and display ebooks in ePUB > and/or MOBI format. It should also allow me to quickly and easily use the > wifi to transfer ebooks that I am working on from my local computer to the > reader device. Any generic $200 netbook allows you to do these things. > Its just that they have a keyboard that gets in the way when you are trying > to read something. I wouldn't think it would be hard for YOU to imagine a > netbook but with a virtual keyboard rather than a physical keyboard except > that YOU are playing games because YOU don't want to admit how much Apple > has pimped their offering to keep friends from freely sharing free books > with their friends. > > And I thought that was something that YOU always claimed PG was about? So > again, why are you defending Apple? Only defending our readers from you. . . . Apple can take care of itself. > > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From hart at pglaf.org Wed Apr 21 19:49:29 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 19:49:29 -0700 (PDT) Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: References: <22152.5a5833e1.3900b469@aol.com> Message-ID: On Wed, 21 Apr 2010, James Adcock wrote: > >Again, I must once again refer you to Wattpad, for the fifth? time. 
> > I DID download and install the Wattpad and I DID discuss this earlier and > again it appeared to me to only be yet another app that displays a slightly > hacked version of ascii txt files shipped from their own private server. It > did not appear to have any way to get an ePUB or MOBI book from a location I > choose on the internet. You seem to be avoiding the point that they say it IS an .epub you read with Wattpad. . .and iBooks. . .and many others. You HAVE downloaded .epubs, or so it would appear, and _I_ have, so it would appear, if we believe it's the iPad default. You are hitting the target, but trying to deny it. Do more research. . .perhaps you can find ONE that not .epub! You could be famous!!! > > >I was going to get after you about that, and the other things nook, Sony or > Kindle do to herd you onto the "company store" turf. ALL of the materials I have mentioned are free of charge. Once again. . .the target has been hit. . .you are in denial. > iTunes and the Apple App Store are both "company store" turfs as far as I > can see -- especially when Jobs can tell Stanza to take out a feature that > customers to share free books with their friends--a feature they have to > come to rely on-- and why -- because Jobs is introducing a competitive app > and Jobs wants to cook the books so that his app wins. That if one even > wants to install a free ePUB book via *USB* you *still* have to run it > through the iTunes "company store" is particularly galling to me. I don't > understand why you have to take this path if Apple isn't DRM'ing the free > books??? > > For the record: > > Nook "company store" -- nook is hopelessly "locked down" as I have said many > times. Gee, I wonder who said all that "company store" stuff here before you did??? Enough. . .you are now just talking to yourself, unless you can tempt bowerbird to keep after you. 
> Sony "company store" -- don't know the wifi version if any, I just know that > lots of people who work with PG/DP transfer ebooks to Sony via USB. > > Kindle "company store" -- offers about the same "features" as the iBooks > "company store", plus you can USB by *direct connection* to your computer > without having to go though an iTunes-like "company store" applet, plus it > has an "experiment web browser" that allows you to download free MOBI books > directly from the internet via whispernet, plus it allows one to quickly and > easily write a "Magic Catalog" type ebook which in turn can pull down other > free MOBI books from the internet via whispernet. Now whispernet is slow > and unreliable compared to wifi at least in the 'burbs where I live. And > the Kindle browser is weak and sucky -- but at least it hasn't been "pimped" > to prevent the download of free ebooks from the internet! And you can also > do all these things with PDF Files and TXT files and they will all also > actually end up inside your Kindle on the standard bookshelf so that they > will still be there when you want to read them, whether on the beach or on > an airplane, etc. But the slow and unreliable whispernet plus the weak web > browser all reasons why I am still looking for an ebook reader that uses > wifi, which hasn't "pimped" that wifi, and hasn't "pimped" the web browser > either! > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > From hart at pglaf.org Wed Apr 21 20:02:42 2010 From: hart at pglaf.org (Michael S. Hart) Date: Wed, 21 Apr 2010 20:02:42 -0700 (PDT) Subject: [gutvol-d] APRIL 21, 2013 BOWERBIRD/HART WAGER In-Reply-To: <39212.7a4fea3d.3900eac3@aol.com> References: <39212.7a4fea3d.3900eac3@aol.com> Message-ID: Quit trying to change the subject and muddy the fowl waters! iPad have to BE iPads, Kindles have to BE Kindles, Sonys=Sonys, and nooks have to be nooks. . 
.not virtual. . .not clones. April 21, 2013 you say there will be more iPads than the rest. Period! On Wed, 21 Apr 2010, Bowerbird at aol.com wrote: > michael said: > > You are really willing to bet that on April 21, 2013 there > > will be more iPads than Kindles plus Sonys plus nooks. . . . > > well, every ipad and iphone and ipodtouch can run the > kindle software, as can a slew of phones, even today, > so all of those machines can act as "virtual kindles"... > > but in terms of dedicated kindle machines, you betcha. > > > > Say it in print for the folks, and > > I'll start practicing turducken recipes. . . . > > well, if you're even willing to take such a stupendous bet, > it means you're gonna rewrite the terms of engagement, > but i'll go for it anyway, to see how you're gonna do that. :+) This was your original prediction, as far as I can make out. As listed above, and I'm not even counting the dozen other brands. . .at least for now. > > *** > > while i've got the crystal ball out, might as well use it... You mean making it a "loss leader" like Sony Playstations??? > indeed, they'll have a "subscription" offer that will actually > make the kindle _free_ if you agree to buy so many books. A virtual "Book of the Month Club"??? > they will also offer an "all-you-can-eat" option that will > attempt to do for books what "netflix" has done for films. Now THAT would be cute. . .but like Netflix, I presume your plan would mean you don't get to OWN the books. . .??? Cute. . .you should try to sell them that promotion and save these notes for "prior art"!!! I want 10% !!! Hee hee! > similarly, they'll offer the kindle to school-districts with > a _guarantee_ that their overall textbook-costs will drop, > and increasingly-cash-starved schools will jump at that. You mean like, let's see, who was it. . .APPLE used to do!?!?!? > all of these initiatives -- and more! -- will ensure that > lots and lots and lots and lots of kindle units are moved.
> > but it _still_ won't compare with the ipad juggernaut... > (not even in quantity, and especially not in profitability.) > > why not? because amazon can't do software like apple. > so they will always be three-and-a-half-steps behind... Good. . .then it's still a bet, and we are on for: APRIL 21, 2013 iPad has to have over 50% of the grand total. . . . We'll have to wait to see if any of the others take more than a few percentage points of the market. I'm sure SOMEONE will try, perhaps even Microsoft, but I don't think they will get enough out there to matter. > > -bowerbird > > From Bowerbird at aol.com Wed Apr 21 23:12:16 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 22 Apr 2010 02:12:16 EDT Subject: [gutvol-d] Re: APRIL 21, 2013 BOWERBIRD/HART WAGER Message-ID: <4871f.4a561e90.390142c0@aol.com> michael said: > iPad have to BE iPads, Kindles have to BE Kindles, Sonys=Sonys, > and nooks have to be nooks. . .not virtual. . .not clones. > April 21, 2013 you say there will be more iPads than the rest. > Period! yes sir! or i'll buy you an all-you-can-eat dinner every night for a week! _plus_ you can say you once won one bet against me! :+) -bowerbird p.s. do i win automatically if ipad surges ahead _before_then_? or is it only what the sales figures happen to be on _that_date_? p.s. heck, i'll make it all-you-can-eat every night for _two_ weeks! From joshua at hutchinson.net Thu Apr 22 07:56:48 2010 From: joshua at hutchinson.net (Joshua Hutchinson) Date: Thu, 22 Apr 2010 14:56:48 +0000 (GMT) Subject: [gutvol-d] Re: Tidy -c and tables References: <4BCF2EA9.1000402@novomail.net> <4BCF6AE1.8060900@perathoner.de> Message-ID: <1992374499.215098.1271948208072.JavaMail.mail@webmail10> An HTML attachment was scrubbed...
URL: From Bowerbird at aol.com Thu Apr 22 12:51:23 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 22 Apr 2010 15:51:23 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: <6bf7b.87321.390202bb@aol.com> kevin said: > On the Open Library System, I note that > high resolution gray-scale scans > (at least for the one project I checked) > are not archived, > though the black and white scans are it's my understanding that d.p. has kept all scans, but it's reasonable they wouldn't mount the high-res ones; no sense letting the general public burn your bandwidth. this, of course, is the problem with high-res files in general. they're nice to have, for purposes of "preservation", but you can't really make them "accessible" in a practical way until computer resources become free across-the-board, so -- in a practical sense -- they don't really do any good. it's not just bandwidth, either. storage problems quickly ensue when each page of a book eats multiple megabytes. and computers need lotsa power to crunch through them. and sure, we can all see the day coming when all of these resources _will_ be available to us. but how soon is that? are you willing to bet on it? and don't forget that you are a lucky first-worlder. how soon until _everyone_ on the whole planet has unlimited computing resources? really? are you willing to bet on it? and if the third-worlders can't have what you lucky people have, how long do you think they will sit on the sidelines without a full-out revolution? we need to think in real-world terms, and be _practical_... > I also note that there is no 'bulk' download function > to get a zip of all the files associated with a text. yeah, that would be nice. will d.p. offer that? who knows? in the meantime, you can learn the address of an image by right-clicking it and choosing the appropriate menu-item. for instance, here's the u.r.l. 
i recovered for one page: > http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/43e52c83dd501/web_ready/001.png subsequent scans have the same u.r.l., except "002.png", "003.png", etc., so it's very easy to scrape them en masse. (if anyone needs a scraper-program, just backchannel me.) -bowerbird From jimad at msn.com Thu Apr 22 13:00:18 2010 From: jimad at msn.com (James Adcock) Date: Thu, 22 Apr 2010 13:00:18 -0700 Subject: [gutvol-d] Re: EBook formats on iPad via wifi In-Reply-To: <37cac.38c448de.3900e581@aol.com> References: <37cac.38c448de.3900e581@aol.com> Message-ID: >> Using the iPad I go on the web to PG. I see a ePub book I like there. >you mean that you see a _book_ you like there. No, I mean I see an ePub book there because Steve Jobs among other people at Apple claims that the iPad supports ePub, and it doesn't really. If you read the APIs you will find that actually it is a PDF-centric machine which also has some HTML url APIs. >a _book_ that is offered in a _number_ of different formats, including .html (which can be read in the web-browser) and as a plain-vanilla .txt file, which can be read in many apps... And they all generally disappear on the iPad as soon as you lose the wifi connection (see more later) >your problem is that you want to insist on a certain file-format. No my problem is that I insist on a certain reader experience that is not important to you or Michael presumably because your eyestate is different than my eyestate, and in general you have lower expectations about what a "book" is than I have, and you have lower expectations for a user's experience than I have. ePub and MOBI are simply two real "ebook" file formats that support that user experience. Other less common file formats that support a real "ebook experience" include: azw, topaz, tr2, tr3, aeh, fb2, lit, pdb, lrf, lrx Here's a few things I expect for my "reader experience"
based on my previous experience with many many different reader machines which are either designed for reading, or which truly are "general purpose machines." I expect to be able to see a book on the internet and actually get that book onto my machine to read it -- from where I see it on the internet. I expect to be able to read a book in either full screen portrait mode or full screen landscape mode. I DO NOT, for instance expect to be forced to read a book in "two up" mode if I switch to landscape mode. I expect to be able to change font sizes, fonts, and margins. I expect reflow when these things happen. I expect to be able to keep a "library" of at least a couple hundred books on my machine in differing formats and in differing states of being read. I expect that "library" will show me spine information such as Author and Title without having to read the book. I expect that once I put a book on my machine it will be there again the next time I try to read the book whether or not I have an internet connection. AKA "airplane mode." I expect the reader machine will understand what it means to change a page and will not go off scrolling wacko in three dimensions when I just try to change a page. I expect that when I own a book and I own a desktop computer, and I own a portable computer, then I can move that book which I own to and from those two computers that I own without having to ask Steve Jobs or his Company Store's permission every time I want to move that book file from one computer to the other. What in God's Name gives Steve Jobs the right to say I've got to run MY files through his Company Store every time I want to make a file transfer between MY computers??? >but don't try to pretend that because you cannot get an .epub, you can't get "an e-book", because i don't play nonsense games. Here's what one CAN do with an iPad, because I just went back to the Apple "Bricks and Mortar" store again. You can open Safari. You can go to the PG website.
You can say "Show me all the PDF books" -- of the 30,000 books on PG exactly 483 are available in PDF format. You can say "Save this webpage to the Desktop". You can do this with a second book. Now you can go into "beach mode" aka "airplane mode" by turning off the wifi. You try to open the first book and you find that it has magically disappeared. It appears to be on the desktop, but when you open it nothing is there. Now you open the second book and you find that it IS still there. So the Safari web browser has the ability to store ONE book on the desktop. So you can get ONE book at a time off the internet from a location you choose without having a wifi connection IF that book is available in PDF format. If you try to read two books at once you either need a wifi connection so that the iPad can keep reloading those books over the internet over and over and over again, or you can confine yourself to only reading one book at a time cover to cover and then throw the first book away before you read the second book -- which will require you to find a wifi connection to reload it again. You can also "buy" (where "buy" may be free if Steve Jobs says so) a limited selection of books from the Apple Company Store using their Apple-hardwired puppy called "iBooks". iBooks does store more than one book. You will have to sign up with iTunes and give them a valid credit card before you can even download "free" books from the Apple Company Store. And you will have to content yourself with Jobs' choice of what you get to read -- a choice that may change for the better or worse a year from now. You can also do these same things through the Amazon Company Store in the form of Kindle for iPad downloaded to your iPad. >what book -- what _book_, not some particular file-format -- what _book_ is it that you claim the ipad is refusing to display? Two such books come readily to mind: PG #32085 (well I guess I can read #32085 in pgtxt70 mode but I refuse to read "books" in pgtxt70 mode --
life is too short! Also I guess I can read it in HTML mode as long as I have the wifi connection up and running -- just not in "airplane mode." Also as long as it's the ONLY book I want to currently read.) And the book that I am currently working on for submission to PG. Look its pretty simple: iPad is actually a PDF machine NOT an ePub machine in spite of Job's claims to the contrary. If you and Michael actually want to support iPad rather than claim it can do things that it can not do -- THEN SUPPORT IT! What this would take is firstly provide all the PG books in PDF format. BB has I think been working on this idea for text files. Use half page format as BB was trying previously. I spent literally a minute and as a test converted a PG HTML book file to PDF format using an Adobe tool. I put that PDF file up at http://www.freekindlebooks.org/Dev/Rainbow.pdf if you want to play with it using iPad and Safari -- or if you want to look at it in any other general purpose machine -- even a Kindle will display it if you get it from this location! Now Safari is actually a pretty sucky browser for reading books -- but at least you CAN read it. And you CAN store ONE book from PG on the iPad desktop and then read that ONE book on the beach or on an airplane. Secondly one of you or someone else at PG would need to write an iPad PDF/HTML browser program which would have its own "sand box" area which could download and store more than one book. Unlike ePub iPad DOES have APIs for reading, storing, displaying PDF, and for accessing the internet using HTML url protocols. So this app would be a RELATIVELY easy thing to do. Would this make *me* happy? Not really, because I like reflow and PDF doesn't reflow. But then at least you guys COULD legitimately claim then that one CAN read PG "ebooks" on the iPad! No Steve Jobs and no Company Store involved!
From jimad at msn.com Thu Apr 22 13:05:37 2010 From: jimad at msn.com (James Adcock) Date: Thu, 22 Apr 2010 13:05:37 -0700 Subject: [gutvol-d] Re: Tidy -c and tables In-Reply-To: <4BCF8E86.2000004@novomail.net> References: <4BCF2EA9.1000402@novomail.net> <4BCF8E86.2000004@novomail.net> Message-ID: >I do not believe that there is any HTML authoring/editing tool which will preserve newline characters as implicit markup. Sorry what I do and others do is retain the original book's linebreaks in the coding of the HTML. Fortunately HTML *is* a reflow file format which ignores those linebreaks and treats them as whitespace, allowing the end user to use whatever size device, fonts, screen orientations etc that they choose. If one later wants to make another pass at the book one simply strips out the HTML markup leaving the original text part intact with the same linebreaks as were in the original book. Then as a hypothetical example one can resubmit that plaintext with the original linebreaks back through the DP process. From jimad at msn.com Thu Apr 22 13:24:59 2010 From: jimad at msn.com (James Adcock) Date: Thu, 22 Apr 2010 13:24:59 -0700 Subject: [gutvol-d] Re: Typesetting (unwrap.pl) In-Reply-To: <3d4f6.c5288d9.3900fbc2@aol.com> References: <3d4f6.c5288d9.3900fbc2@aol.com> Message-ID: >if the linebreaks "reappear", then you are doing something wrong... _you_... You all keep making tools that don't work and then you blame the user when they report back their experience to you. Tried your "workaround" suggestion on a couple of browsers and that just makes the browser crash. Suggest YOU ought to try it out on a "REAL" PG book not a toy test and see what happens to you. I'm on a PC, so maybe you all are making things that work on the other 1% of the machines out here in the real world? Bottom line, do you really think what you are offering is really going to make a real world PG customer happy?
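[For anyone following the unwrap question without the browser in the way: a minimal sketch of what "unwrapping" a hard-wrapped plain-text file could look like. This is NOT unwrap.pl and makes no claim about how that script works; it assumes paragraphs are separated by blank lines, and the function name `unwrap` is only an illustration.]

```python
import re


def unwrap(text):
    """Join hard-wrapped lines so each paragraph sits on one line.

    Assumption (not taken from unwrap.pl): paragraphs are separated
    by one or more blank lines.
    """
    # Normalize Windows and old-Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split the text into paragraphs at blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip("\n"))
    # Inside a paragraph, turn each linebreak (plus stray spaces) into one space.
    joined = [re.sub(r"\s*\n\s*", " ", p).strip() for p in paragraphs]
    # Keep one blank line between paragraphs and end with a newline.
    return "\n\n".join(joined) + "\n"
```

[Writing the returned string straight to a .txt file sidesteps the copy-and-paste step entirely. Note that anything whose linebreaks carry meaning, such as poetry or tables, would be flattened by this sketch and would need separate handling.]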
How about offering unwrapped txt files as part of the file download choices - I think that's what people expect! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Thu Apr 22 13:55:48 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 22 Apr 2010 16:55:48 EDT Subject: [gutvol-d] Re: Typesetting (unwrap.pl) Message-ID: <7301c.365c7490.390211d4@aol.com> jim- i'm sorry you can't make my unwrap script work for you. you haven't given me enough info that i can help you out. and your attitude is so bad that i am not inclined even to give you simple advice that would sidestep the problem... i'm _quite_ sure the program works just fine; after all, it's nothing but a set of reg-ex changes. nonetheless... if _anyone_else_ finds that you can't make it work either, then i _invite_ you to make a post here on this listserve, and i will do my best to get to the root of your problem. unfortunately, jim's problem is rooted _far_ too deeply for me to be able to do anything about it... and if you don't want to mess with a perl script at all, go ahead and use the web-service that i've created: > http://z-m-l.com/go/unwrap.pl paste your text into the field, then click "unwrap it!", and then copy the unwrapped text from the window... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From lee at novomail.net Thu Apr 22 14:06:30 2010 From: lee at novomail.net (Lee Passey) Date: Thu, 22 Apr 2010 15:06:30 -0600 Subject: [gutvol-d] Cooperative proofreading Message-ID: <4BD0BA56.5020806@novomail.net> Just an update on my cooperative proofreading site (http://www.ebookcooperative.com/). 1. The Java servlets are now connected to CVS, so any changes you make will be persisted to the file system. 2. Registration is now required to proof read pages (just your identity is currently required). This identity is used to add comments to CVS so we can see who changed things. 3. 
The proofreading system tracks progress, so if you go half-way through a project, exit, and return at some time later and choose the same project you will be taken to the last page you were viewing. 4. Some new projects were added. The project table is generated from a database of projects; no editing of web pages is required. The Kupu-based UI is still a little rough; that will probably be the last thing I tackle. I have not yet written the proofing guidelines. The next step will be to create a servlet that will allow downloading an entire e-book by combining all the pages from the repository into a single file. As always, feedback is welcomed. From Bowerbird at aol.com Thu Apr 22 14:30:58 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 22 Apr 2010 17:30:58 EDT Subject: [gutvol-d] Re: EBook formats on iPad via wifi Message-ID: <754f1.14597b23.39021a12@aol.com> jim said: > No, I mean I see an ePub book there > because Steve Jobs among other people at Apple > claims that the iPad supports ePub, and it doesn't really. blah blah blah. i'm done talking to you about this, jim. > No my problem is that I insist on a certain reader experience i can relate to that. as i told you earlier, if you woulda said "i can't get the e-books that i want from the sites that i want in the format that i want", then you _might_ have received a sympathetic response. but instead, you said you couldn't get an e-book. that's false, so you get the scorn that people receive when they tell a lie... > you have lower expectations about what a "book" is than I have i'm quite sure my ideal e-book surpasses an .epub in many ways. but i'm not foolish enough to think that my idea of an e-book is the only one that deserves to be called "an e-book", and that every other idea of an e-book can be discarded _by_definition_... > I expect to ... > I expect to ... > I expect to ... > I expect to ... > I expect to ... > I expect to ... > I expect to ... > I expect to ...
that's a nice list; i applaud the thought that went into its creation. if the ipad comes up short of your expectations, just don't buy it... you might even tell us -- _once_ -- how it falls short via your list. but let me tell you a little something about listserve conversations; if they're not sufficiently interesting to the majority of the lurkers, then they aren't worth having. this conversation stopped being "sufficiently interesting" to them a long, long, long time ago. that's when you should have stopped. that's also when _i_ should've stopped, and michael too, and we both should know better than to make such a stupid mistake, but we'll both be smarter about having a dialog with you in the future. (i keep giving you "one more chance", and you keep blowing it, so i'm not going to be doing that any more, jim. you made your bed. now you're going to find that nobody wants to talk with you at all. worse, you might find nobody even bothers to _read_ your posts.) > well I guess I can read #32085 in pgtxt70 mode > but I refuse to read "books" in pgtxt70 mode i see. so it isn't that the ipad "refuses to display" this book, it's that _you_ refuse to read it the way the ipad displayed it. well, yes, then _that_ is a different matter entirely, yes it is... > And the book that I am currently working on for submission to PG. am i some kind of mind-reader, or what? > Look its pretty simple: iPad is actually a PDF machine > NOT an ePub machine in spite of Job's claims to the contrary. i think you're sputtering out of control, jim... > If you and Michael actually want to support iPad > rather than claim it can do things that it can not do > -- THEN SUPPORT IT! the ipad doesn't need our support. and we're not "claiming" that it can do anything it can't. then again, neither are we claiming that it "cannot" do something -- like display an e-book -- just because it won't display it in the way we want in the app we want.
in sum, just because i don't like the format of an e-book doesn't mean that it magically ceases to _be_ an e-book. and i can't believe that i am willing to make yet another message that repeats something so inane, and therefore contributes absolutely nothing to the signal/noise ratio. so i will stop! now! my apologies to the subscribers... -bowerbird From Bowerbird at aol.com Thu Apr 22 15:06:58 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Thu, 22 Apr 2010 18:06:58 EDT Subject: [gutvol-d] Re: Typesetting (back on track, calling the mad scientist) Message-ID: <777f8.52df4abc.39022282@aol.com> we seem to have lost the mad scientist. mike mcd, are you still out there? if so, i have some questions for you... here's another take on "gods and fighting men": > http://z-m-l.com/misc/14465-take6.pdf this .pdf just has the first page of each chapter, but it came outta my program, not a text-editor. as we can see, the p.g. linebreaks make this text practically unusable, so we'll have to do a rewrap, especially if you want to have the text _justified_... (if not, i can just rearrange the unwieldy lines and leave the vast majority of p.g. linebreaks in place.) going on, is this text-size (10-point) good for you? (again, print out some pages so you know for sure.) how about the leading? it's 15-point leading, so that's generous for 10-point type, and you might feel it's _too_ big, but i thought i'd show it to you. on pages 12 and 101, you'll see _blue_ headers... those are lines that needed to be _shrunk_ a bit, so they would not spill over into the margin area. on page 12 it's 15.5-point instead of 16-point, and on page 101 it's 13-point instead of 14-point. (and that latter one still intrudes on the margins.) the program attempts to "copy-fit" all the headers to the same size, but i'm experimenting here with allowing slight variations in size on freakish lines.
(rather than letting the freaks dictate that the other header-lines be smaller to accommodate the freaks.) so the question is, are these small variations bad? noticeable? too much so? do they bother you much? -bowerbird From schultzk at uni-trier.de Thu Apr 22 23:42:54 2010 From: schultzk at uni-trier.de (Keith J. Schultz) Date: Fri, 23 Apr 2010 08:42:54 +0200 Subject: [gutvol-d] Re: EBook formats on iPad via wifi (OT) In-Reply-To: References: <37cac.38c448de.3900e581@aol.com> Message-ID: <754242A6-FA52-4421-8EB3-0E7D4DCDEDEF@uni-trier.de> Hi James, I think you have stated your points well enough. If you have nothing more to say about the PG/DP experimental format then please drop this thread about the Apple iPad and take up your problems with the iPad with Apple. regards Keith. From vlsimpson at gmail.com Fri Apr 23 07:08:07 2010 From: vlsimpson at gmail.com (V. L. Simpson) Date: Fri, 23 Apr 2010 09:08:07 -0500 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <6bf7b.87321.390202bb@aol.com> References: <6bf7b.87321.390202bb@aol.com> Message-ID: > (if anyone needs a scraper-program, just backchannel me.) http://www.gnu.org/software/wget/ From kevin.pulliam at gmail.com Fri Apr 23 08:10:13 2010 From: kevin.pulliam at gmail.com (Kevin Pulliam) Date: Fri, 23 Apr 2010 10:10:13 -0500 Subject: [gutvol-d] Re: DP output is technically obsolete In-Reply-To: <6bf7b.87321.390202bb@aol.com> References: <6bf7b.87321.390202bb@aol.com> Message-ID: This was a special case. The high resolution scans are actually needed to read/decipher some of the text, but Greg popped up and pointed out that he uploaded the super-duper high res scans to Internet Archive. Which answers the mail on this and satisfies my desire that all those scans of hard to find issues of that work continue to be available.
As to a screen scraper, wget, or simply clicking through and downloading each image at OLS, this fails the "Same Barrier to access" test (And I admit it is my standard, not a requirement, or something someone else promised to adhere to) when compared to a PG Text. In order for the scanned pages to be similarly available as the PG Text, the images will need to be available in a single download 'click' the hypothetical generic internet user can understand and make use of. 'One Click, One Book'. Just as a bookstore doesn't make you visit 16 different locations in the store to purchase one book, PG doesn't require you to visit multiple pages to download a book, and Amazon doesn't require you to visit multiple pages (other than order confirmation) to purchase a book. In each of my examples here, Person A can give Person B a link or a location description, and Person B can go to that location and get the book in the preferred format (Paper in hand, paper in the mail, etext of various types, etc). Thanks Kevin On Thu, Apr 22, 2010 at 2:51 PM, wrote: > kevin said: >> On the Open Library System, I note that >> high resolution gray-scale scans >> (at least for the one project I checked) >> are not archived, >> though the black and white scans are > > it's my understanding that d.p. has kept all scans, but > it's reasonable they wouldn't mount the high-res ones; > no sense letting the general public burn your bandwidth. > > this, of course, is the problem with high-res files in general. > > they're nice to have, for purposes of "preservation", but > you can't really make them "accessible" in a practical way > until computer resources become free across-the-board, > so -- in a practical sense -- they don't really do any good. > > it's not just bandwidth, either. storage problems quickly > ensue when each page of a book eats multiple megabytes. > and computers need lotsa power to crunch through them.
> > and sure, we can all see the day coming when all of these > resources _will_ be available to us. but how soon is that? > are you willing to bet on it? and don't forget that you are > a lucky first-worlder. how soon until _everyone_ on the > whole planet has unlimited computing resources? really? > are you willing to bet on it? and if the third-worlders can't > have what you lucky people have, how long do you think > they will sit on the sidelines without a full-out revolution? > > we need to think in real-world terms, and be _practical_... > > >> I also note that there is no 'bulk' download function >> to get a zip of all the files associated with a text. > > yeah, that would be nice. will d.p. offer that? who knows? > > in the meantime, you can learn the address of an image by > right-clicking it and choosing the appropriate menu-item. > > for instance, here's the u.r.l. i recovered for one page: >> >> http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/43e52c83dd501/web_ready/001.png > > subsequent scans have the same u.r.l., except "002.png", > "003.png", etc., so it's very easy to scrape them en masse. > (if anyone needs a scraper-program, just backchannel me.) > > -bowerbird > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > > From Bowerbird at aol.com Fri Apr 23 12:54:54 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 23 Apr 2010 15:54:54 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: kevin said: > In order for the scanned pages to be > similarly available as the PG Text, > the images will need to be available > in a single download 'click' > the hypothetical generic internet user > can understand and make use of. > 'One Click, One Book'. i see. you were discussing p.g. policy... i thought you wanted the scans yourself.
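[editorial sketch] The numbering pattern described in the quoted message (same address, with "001.png", "002.png", "003.png" and so on at the end) can be expanded into a list of page addresses with a few lines of Python. This is only a sketch under stated assumptions: `page_urls` is a hypothetical helper, the example.org address is a placeholder, and the page count has to be supplied by the caller because nothing in the thread says how many pages a project has. It builds the list and does not download anything.

```python
def page_urls(first_url, count):
    """Return the addresses of pages 1..count, given the address of page 001.

    Assumes (as the quoted message describes) that pages differ only in a
    zero-padded number just before the file extension.
    """
    base, _, name = first_url.rpartition("/")   # split off the file name
    stem, dot, ext = name.rpartition(".")       # "001", ".", "png"
    width = len(stem)                           # keep the same zero-padding
    return ["{}/{:0{w}d}{}{}".format(base, n, dot, ext, w=width)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # Placeholder address, not a real archive location.
    for url in page_urls("http://example.org/scans/001.png", 3):
        print(url)
```

A list written out this way, one address per line, is the kind of input a tool such as wget accepts through its `-i` (input file) option.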
-bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Fri Apr 23 13:02:22 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 23 Apr 2010 16:02:22 EDT Subject: [gutvol-d] Re: DP output is technically obsolete Message-ID: vlsimpson said: > http://www.gnu.org/software/wget/ wget is a nice program, for a non-interactive commandline tool. thanks for bringing attention to it... someone will appreciate it... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From e98cuenc at gmail.com Sat Apr 24 02:18:36 2010 From: e98cuenc at gmail.com (Joaquin Cuenca Abela) Date: Sat, 24 Apr 2010 11:18:36 +0200 Subject: [gutvol-d] Removing spurious break lines Message-ID: Hi, some books, like "Don Quijote" (http://www.gutenberg.org/etext/2000) have spurious break lines all over the text. From what I understood PG generates all the derived formats from the HTML, if there is one, or from the raw text format otherwise. In this case there is an HTML version, but it also contains the spurious break lines. My guess is that the HTML was automatically generated from the text, and the text breaks the lines at ~79 - 80 characters. Are there guidelines on how to format the raw text to make it more amenable for automatic conversion to other formats by the PG tools? Is it ok to reformat this text removing the spurious break lines in the raw text? Was the HTML automatically generated? or do I have to fix also the HTML? How can I check the results in other formats before sending it to PG? Also, are the conversion tools open source? Cheers, -- Joaquin Cuenca Abela From e98cuenc at gmail.com Sun Apr 25 09:57:48 2010 From: e98cuenc at gmail.com (Joaquin Cuenca Abela) Date: Sun, 25 Apr 2010 18:57:48 +0200 Subject: [gutvol-d] Re: Removing spurious break lines In-Reply-To: References: Message-ID: Wrt "Don Quijote", the page claims the HTML has been generated manually. 
I have generated an improved HTML version with a python script, and added a few manual fixes (like adding some extra headers). The trickiest part was to accurately identify verses. The original text is inconsistent on to where it splits the lines (but most of the text cuts lines at 75 characters). How can I submit the modified HTML? Thanks, On Sat, Apr 24, 2010 at 11:18 AM, Joaquin Cuenca Abela wrote: > Hi, > > some books, like "Don Quijote" (http://www.gutenberg.org/etext/2000) > have spurious break lines all over the text. From what I understood PG > generates all the derived formats from the HTML, if there is one, or > from the raw text format otherwise. > > In this case there is an HTML version, but it also contains the > spurious break lines. My guess is that the HTML was automatically > generated from the text, and the text breaks the lines at ~79 - 80 > characters. > > Are there guidelines on how to format the raw text to make it more > amenable for automatic conversion to other formats by the PG tools? Is > it ok to reformat this text removing the spurious break lines in the > raw text? > > Was the HTML automatically generated? or do I have to fix also the HTML? > > How can I check the results in other formats before sending it to PG? > > Also, are the conversion tools open source? > > Cheers, > > -- > Joaquin Cuenca Abela > -- Joaquin Cuenca Abela From Bowerbird at aol.com Sun Apr 25 11:45:35 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Sun, 25 Apr 2010 14:45:35 EDT Subject: [gutvol-d] Re: Removing spurious break lines Message-ID: <39eb8.7a131d9b.3905e7cf@aol.com> joaquin- someone from p.g. should be along to answer your questions any minute now... we just went through a long bruising discussion about "spurious" linebreaks, which is likely why they're a bit reluctant to get on that horse and ride it again so soon... 
they took a _little_ step toward some progress by accepting a script that would remove those "spurious" linebreaks from a properly-prepared file. but the _big_ step that they still need to take is to make sure that all the files in the library are "properly-prepared". the don quixote text was one such file which is not "properly-prepared", as you discovered. (if it had been properly-prepared, the verses would've been indented, and thus you would have found it extremely easy to identify them.) so you have made them face an issue that they would rather not face, especially right now... -bowerbird From ajhaines at shaw.ca Sun Apr 25 12:21:04 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Sun, 25 Apr 2010 12:21:04 -0700 Subject: [gutvol-d] Re: Removing spurious break lines References: <39eb8.7a131d9b.3905e7cf@aol.com> Message-ID: I've contacted Joaquin directly. Al
From Bowerbird at aol.com Mon Apr 26 10:15:38 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 13:15:38 EDT Subject: [gutvol-d] Re: Removing spurious break lines Message-ID: <1d6ec.d2028aa.3907243a@aol.com> al said: > I've contacted Joaquin directly. well, i agree that there's no need to engulf an innocent newcomer in an involved discussion about p.g. policy. but that doesn't mean the discussion should be swept under the rug, eh? we'll need an answer to the question: will p.g. accept e-texts that have been "corrected" by virtue of having their not-to-be-unwrapped lines indented? transparency is the new black. sunshine is the best disinfectant. the most colorful fish deserve the most apparent aquarium. so... what is the p.g. policy on this? -bowerbird From jimad at msn.com Mon Apr 26 11:00:16 2010 From: jimad at msn.com (James Adcock) Date: Mon, 26 Apr 2010 11:00:16 -0700 Subject: [gutvol-d] Re: Removing spurious break lines In-Reply-To: <1d6ec.d2028aa.3907243a@aol.com> References: <1d6ec.d2028aa.3907243a@aol.com> Message-ID: In general, if derived formats including ePUB and MOBI from HTML, also HTML from txt, also unwrapping txt from wrapped txt, are to work "correctly" then there needs to be *some* degree of expectation on the formatting of the incoming texts. Otherwise these tasks cannot be successfully automated. Going the other way, the automated wrapping of txt has built-in support in most (all?)
modern text tools, including web browsers, e-book readers, text editors, etc. From Bowerbird at aol.com Mon Apr 26 11:28:25 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 14:28:25 EDT Subject: [gutvol-d] Re: Removing spurious break lines Message-ID: <2377d.3eb6f509.39073549@aol.com> jim said: > In general, if derived formats including ePUB and MOBI from HTML, > also HTML from txt, also unwrapping txt from wrapped txt, > are to work "correctly" then there needs to be *some* degree of > expectation on the formatting of the incoming texts. Otherwise > these tasks cannot be successfully automated. that's true. but i'm not talking just about "derivative formats", because there's no need to create a "derivative" if you'd rather just use the .txt file itself to drive the display, a la "eucalyptus". however, the .txt file does have to be formatted "correctly" if it is to be _displayed_ correctly. that's what's driving my motivation... > Going the other way, the automated wrapping of txt has > built-in support in most (all?) modern text tools, including > web browsers, e-book readers, text editors, etc. stop trying to derail the thread, jim. there's no way that project gutenberg is going to mount files that don't have mid-paragraph hard linebreaks... _no_way_... so that's not what we're talking about here. and we aren't _going_ to talk about that here, no matter how many times you try to bring it up. so stop trying. what we _are_ talking about now is formatting the .txt files _correctly_, so that they can be unwrapped automatically... -bowerbird From donovan at abs.net Mon Apr 26 11:28:59 2010 From: donovan at abs.net (D Garcia) Date: Mon, 26 Apr 2010 14:28:59 -0400 Subject: [gutvol-d] DP Archives/OLS (Was: Re: DP output ...)
In-Reply-To: References: Message-ID: <201004261428.59204.donovan@abs.net> Now that hardware replacements and follow up tasks are mostly complete for the DP production server, I'm taking a moment to at least partially respond to several comments and remarks recently brought up regarding archival materials for projects completed at DP. don kretz wrote on 2010-04-20 at 15:35 >Upload in as complete form as possible the matching image and >text files so future modification and adaptation is possible. There's >no loss to DP by doing so; and the risk is that over time they are >quite capable of losing track of them. I find your lack of faith disturbing. :) The DP test server is an Internet Archive machine and is backed up within their infrastructure. I also personally maintain a remote backup of all archived project files to dedicated storage here. Having said that, some of the earliest DP material (produced on the server in charlz's garage) is not archived in the OLS. This gap in the archives encompasses 422 known projects. At last word from charlz, he has this material, but unfortunately has not yet provided it to be incorporated into the archives. bowerbird wrote on 2010-04-20 at 20:35 >first, it looks like i was wrong when i said >that d.p. had stopped maintaining the "ols", >so of course my "reason" for their having >stopped maintaining it was also incorrect. >(or one could say it's _no_longer_ correct, >but i do believe it was correct at one time.) charlz was the original and sole maintainer of the DP archives. When he ended his active participation with DP, the archives were unmaintained until I made time to reconstruct the undocumented procedures for moving project files over and recording them in the database. Since then, they have been continuously maintained and the current process documented for the benefit of future caretakers. don kretz wrote on 2010-04-20 at 22:12 >Available Formats: Display of images from this source has not been permitted. 
DP abides by the stated wishes of the image sources with respect to public redisplay of images from various sources. For sources which do not wish images from their efforts redistributed, the files from DP are retained in the archives, but are not 'made available' in accordance with these agreements. Kevin Pulliam wrote on 2010-04-21 at 00:07 >On the Open Library System, I note that high resolution gray-scale >scans (at least for the one project I checked) are not archived, ... >I also note that there is no 'bulk' download function >to get a zip of all the files associated with a text. The hires scans are archived, however the OLS code and UI are feature-poor. Availability of hires and zip file sets are among the desired features, but development of the OLS is not currently a priority item. David (donovan) From Bowerbird at aol.com Mon Apr 26 11:36:14 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 14:36:14 EDT Subject: [gutvol-d] Re: DP Archives/OLS (Was: Re: DP output ...) Message-ID: <241c1.457defaf.3907371e@aol.com> and the takeaway is that p.g. can copy those scans and mount them any time that it chooses to do so. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Mon Apr 26 11:56:20 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 14:56:20 EDT Subject: [gutvol-d] Re: Typesetting ("gods and fighting men") Message-ID: <25a5e.64d77064.39073bd4@aol.com> looks like we lost our mad scientist... or perhaps he's off deep into programming mode, creating wondrous new tools for project gutenberg. at any rate, for anyone out there who's interested, here is my latest .pdf of "gods and fighting men": > http://z-m-l.com/misc/14465-take6.pdf feedback, public or private, would be appreciated... this version retains p.g. linebreaks, for the most part. (i did some rewrapping, to remove egregious orphans.) 
the use of 10.5-point type made even the longest lines manageable, given the 4.5-inch measure that i used... that's obtained by .5-inch margins on a 5.5-inch page. (if you were printing this at lulu.com, you could specify a 6*9-inch page, which would allow a bigger fontsize.) however, as you'll see, the lines are extremely ragged, with many short lines, since they were wrapped by the character-count, not the length of a proportional font. (furthermore, there was some real weirdness on this, in that many lines seemed to have been counted short; in particular, it was as if the algorithm was _trying_ to create very short lines as the last line of the paragraph. i don't recall having seen this before; it was _strange_.) anyway, because many of the lines were counted short, using full justification on this text would be a disaster. but otherwise, this is a _respectable_ job of typesetting. i'm gonna rewrap this text, using a bigger fontsize, and i'll mount that .pdf later this week. enjoy this one now... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From dakretz at gmail.com Mon Apr 26 12:22:29 2010 From: dakretz at gmail.com (don kretz) Date: Mon, 26 Apr 2010 12:22:29 -0700 Subject: [gutvol-d] Re: DP Archives/OLS (Was: Re: DP output ...) In-Reply-To: <241c1.457defaf.3907371e@aol.com> References: <241c1.457defaf.3907371e@aol.com> Message-ID: Except for the ones that are purportedly in Charlz' garage and/or they have a policy not to make available. donovan - are they in fact available somehow from within DP? I don't see how there can be a problem with using them to reproof old projects. On Mon, Apr 26, 2010 at 11:36 AM, wrote: > > and the takeaway is that p.g. can copy those scans > and mount them any time that it chooses to do so. 
> > -bowerbird > > _______________________________________________ > gutvol-d mailing list > gutvol-d at lists.pglaf.org > http://lists.pglaf.org/mailman/listinfo/gutvol-d > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmcdermott at mad-computer-scientist.com Mon Apr 26 12:26:32 2010 From: mmcdermott at mad-computer-scientist.com (Michael McDermott) Date: Mon, 26 Apr 2010 14:26:32 -0500 Subject: [gutvol-d] Re: Typesetting (back on track, calling the mad scientist) In-Reply-To: <777f8.52df4abc.39022282@aol.com> References: <777f8.52df4abc.39022282@aol.com> Message-ID: <1272027526-sup-6377@zion> Leslie, > mike mcd, are you still out there? Yup--everyone else seemed to be having too much fun with the iPad, though :) > here's another take on "gods and fighting men": > > http://z-m-l.com/misc/14465-take6.pdf I took a look at this and at the take5 version. Either would be quite sufficient for my needs, though, aesthetically, I like the take5 version better. > how about the leading? it's 15-point leading, so > that's generous for 10-point type, and you might > feel it's _too_ big, but i thought i'd show it to you. The leading is perfect. > so the question is, are these small variations bad? > noticeable? too much so? do they bother you much? Don't bother me--though, admittedly, if I had an ebook reader I probably would not be bothering with any of this. I also conducted some more experiments with CSS stylesheets for on the html2ps side of things (using txt2html so that the chain looked like: txt2html -> html2ps -> ps file -> printer/screen). -Michael Excerpts from Bowerbird's message of Thu Apr 22 17:06:58 -0500 2010: > we seem to have lost the mad scientist. > > mike mcd, are you still out there? > > if so, i have some questions for you... 
> > here's another take on "gods and fighting men": > > http://z-m-l.com/misc/14465-take6.pdf > > this .pdf just has the first page of each chapter, > but it came outta my program, not a text-editor. > > as we can see, the p.g. linebreaks make this text > practically unusable, so we'll have to do a rewrap, > especially if you want to have the text _justfied_... > (if not, i can just rearrange the unwieldy lines and > leave the vast majority of p.g. linebreaks in place.) > > going on, is this text-size (10-point) good for you? > (again, print out some pages so you know for sure.) > > how about the leading? it's 15-point leading, so > that's generous for 10-point type, and you might > feel it's _too_ big, but i thought i'd show it to you. > > on pages 12 and 101, you'll see _blue_ headers... > those are lines that needed to be _shrunk_ a bit, > so they would not spill over into the margin area. > on page 12 it's 15.5-point instead of 16-point, > and on page 101 it's 13-point instead of 14-point. > (and that latter one still intrudes on the margins.) > > the program attempts to "copy-fit" all the headers > to the same size, but i'm experimenting here with > allowing slight variations in size on freakish lines. > (rather than letting the freaks dictate that the other > header-lines be smaller to accommodate the freaks.) > > so the question is, are these small variations bad? > noticeable? too much so? do they bother you much? > > -bowerbird -- Michael McDermott www.mad-computer-scientist.com From Bowerbird at aol.com Mon Apr 26 13:39:15 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 16:39:15 EDT Subject: [gutvol-d] Re: DP Archives/OLS (Was: Re: DP output ...) Message-ID: <2de3e.7994bbb2.390753f3@aol.com> dakretz said: > Except for the ones that are purportedly in Charlz' garage "purportedly"? tone down the skepticism, man. save it for when it's needed. > and/or they have a policy not to make available. well, um, yeah... d.p. 
made a policy decision to "respect" the wishes of some institutions _not_ to repost the scans, even though those "wishes" have no basis in legal rights... in other words, d.p. traded away your public-domain rights for the purpose of maintaining "friendly" relations. oh well. the good news is that, for the most part, anyone else can retrieve the scans from the same place where d.p. got 'em. i don't know if the "o.l.s." tells where the scans came from, but the "credits" portion of the posted e-book often says so. > donovan - are they in fact available somehow from within DP? it probably depends upon who is asking, and for what purpose. then again, don, i'm sure you're well aware of _those_ caveats... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Mon Apr 26 13:55:40 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 16:55:40 EDT Subject: [gutvol-d] Re: Typesetting (back on track, calling the mad scientist) Message-ID: <2f24a.5dfce11.390757cc@aol.com> mike said: > Leslie, leslie is my girlfriend, not me. sometimes she checks her e-mail when i'm away from the machine, so when i come back i end up sending a message from her account. oh, but you probably got her name from the "author" box of the .pdf, now that i think about it. that was filled in by the text-editor i used... if you check out the later .pdfs, you should find that that metadata is supplied correctly by my authoring-tool. (unless i forgot to specify it.) > The leading is perfect. oops... i took it up considerably in the newer version i just posted... it has 10.5-point type, with 12-point leading; and still runs 400 pages. and bigger leading means fewer lines per page, and thus more pages. which might or might not be a big deal to you. all of these variables make it complicated to know how to create a .pdf for somebody else. 
which is why a cyberlibrary needs to put .pdf/hard-copy output creation ability into the hands of its end-users, so they can _customize_ it fully... > if I had an ebook reader I probably > would not be bothering with any of this. that's why i make many of the decisions according to a smart default. > I also conducted some more experiments with CSS stylesheets for on > the html2ps side of things (using txt2html so that the chain looked like: > txt2html -> html2ps -> ps file -> printer/screen). i'd love to see a .pdf representing your output from that... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Mon Apr 26 14:04:31 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 26 Apr 2010 14:04:31 -0700 Subject: [gutvol-d] Re: Removing spurious break lines In-Reply-To: <2377d.3eb6f509.39073549@aol.com> References: <2377d.3eb6f509.39073549@aol.com> Message-ID: >what we _are_ talking about now is formatting the .txt files _correctly_, so that they can be unwrapped automatically... In any case PG already "owns" a txt unwrapper, since PG is in some cases generating HTML from pgtxt70, and that requires unwrapping the text (txt?), which is being done, as one can tell by opening just about any PG HTML that was autogenerated from the submitted pgtxt70 file format. The text not correctly unwrapped in this case was an HTML submitted that had the linebreaks forced -- which is not usual PG convention (to the extent PG *has* a HTML convention.) Perhaps you should start by examining what PG has *already* implemented for txt unwrapping to generated HTML, find out what works and what doesn't work, and what requirements this puts on txt submission in order to make it all work right? Otherwise PG will end up with two conflicting text unwrapping standards, which will make the submitter's task even more confusing. 
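[editorial sketch] The unwrapping convention argued over in this thread (join the hard-wrapped lines of each paragraph, but leave indented lines such as verse alone) can be sketched in a few lines of Python. This is a minimal illustration of that convention and an assumption on every point of detail: it is not the unwrapper PG actually runs, and `unwrap` is a hypothetical name. It treats a blank line as a paragraph break and any line starting with a space or tab as not-to-be-unwrapped.

```python
def unwrap(text):
    """Join hard-wrapped paragraph lines; keep indented and blank lines as-is."""
    out = []    # finished output lines
    para = []   # lines of the paragraph currently being collected

    def flush():
        # Emit the collected paragraph as one long line.
        if para:
            out.append(" ".join(para))
            para.clear()

    for line in text.splitlines():
        line = line.rstrip()
        if not line:                  # blank line: paragraph boundary
            flush()
            out.append("")
        elif line[0] in " \t":        # indented: verse, tables, etc.
            flush()
            out.append(line)
        else:                         # ordinary wrapped prose
            para.append(line)
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    sample = "one two\nthree four\n\n  verse a\n  verse b\n\nfive\nsix"
    print(unwrap(sample))
```

A text whose verses are not indented, which is the complaint raised about the "Don Quijote" file, would have its verse lines joined into prose by a rule like this, which is why the thread keeps returning to how the incoming .txt file is prepared.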
If PG can successfully implement the *hard* task of unwrapping text, one would think PG could also support the *easy* task of wrapping submissions to the pgtxt70 standard. Implementing both directions to form a round-trip might even give PG a heads-up where its assumptions -- or failure of the submission to follow style guidelines -- is "breaking" the wrapping or unwrapping effort. To the extent that you guys are heading more-and-more towards the "unobtrusive" marking up of txt files, please note that Python has already got very good efforts in this regard called "reStructured Text" -- and the tools existing to support it! Not to imply that PG would have to follow their lead literally for example they use *emphasis* for italics and **strong emphasis** for bold. Rather you could just "borrow" their tools. http://docs.python.org/documenting/rest.html http://docutils.sourceforge.net/rst.html and online tools that work for trying it out: http://www.tele3.cz/jbar/rest/rest.html Now I don't like the formatting of the Python manuals -- but that is a separate choice from the markup language, and the tools they have created for making the manuals from lightweight "unobtrusive" markup. From ajhaines at shaw.ca Mon Apr 26 14:12:40 2010 From: ajhaines at shaw.ca (Al Haines (shaw)) Date: Mon, 26 Apr 2010 14:12:40 -0700 Subject: [gutvol-d] Re: Removing spurious break lines References: <1d6ec.d2028aa.3907243a@aol.com> Message-ID: <4E5F6EAA7200402CAAEC93506E30B6B1@alp2400> The question was: will p.g. accept e-texts that have been "corrected" by virtue of having their not-to-be-unwrapped lines indented? The short answer, yes. "Corrected" texts can be sent to PG's errata system (errata2010_AT_pglaf.org) as attachments. However, it seems to me that simply indenting not-to-be-formatted lines, and doing nothing else, is at least somewhat pointless. 
Many of PG's older texts have other problems than that (missing illustrations, ASCII only rather than Latin1 or UTF8, missing/incomplete indexes, etc, etc, etc.) A far more desirable approach would be to pick an old PG text, find a scanset in IA/Google/wherever, get a copyright clearance, and do a new version of it, either from scratch, or a complete re-proof of the existing file(s), doing whatever is needed to bring it up to current standards. Two examples: "Main Travelled Roads", by Hamlin Garland (PG#2809). In response to a recent errata report, I repaired several missing paragraphs. While doing that, I found that the text also has hundreds, maybe thousands, of hyphens that should be em-dashes (--), far too many for my limited time to deal with (see note below). Arizona Sketches (PG#756). It's missing all its illustrations, and the first "n" in "canon" is a plain "n", not n-tilde. Investigation may reveal other problems. Note: Complaints that the Repost team (mostly myself and David Widger, between us doing considerable clean-up work on several thousand of PG's old files) "should have done more" will fall on deaf ears. We're only two people, we're also 2/3 of the Whitewashers, and 2/3 of the Errata team, and we both produce independently. Back off. Al ----- Original Message ----- From: Bowerbird at aol.com To: gutvol-d at lists.pglaf.org ; bowerbird at aol.com Sent: Monday, April 26, 2010 10:15 AM Subject: [gutvol-d] Re: Removing spurious break lines al said: > I've contacted Joaquin directly. well, i agree that there's no need to engulf an innocent newcomer in an involved discussion about p.g. policy. but that doesn't mean the discussion should be swept under the rug, eh? we'll need an answer to the question: will p.g. accept e-texts that have been "corrected" by virtue of having their not-to-be-unwrapped lines indented? transparency is the new black. sunshine is the best disinfectant. the most colorful fish deserve the most apparent aquarium. so... 
what is the p.g. policy on this?

-bowerbird

_______________________________________________
gutvol-d mailing list
gutvol-d at lists.pglaf.org
http://lists.pglaf.org/mailman/listinfo/gutvol-d

From jimad at msn.com Mon Apr 26 14:49:05 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 26 Apr 2010 14:49:05 -0700
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
In-Reply-To: <25a5e.64d77064.39073bd4@aol.com>
References: <25a5e.64d77064.39073bd4@aol.com>
Message-ID:

Comparing to a recent pub of Studs Terkel which I happened to have at hand, the page size is almost identical -- 1/2 sheet US Letter. The Studs Terkel however has 60 chars per line compared to 70 chars per line in your example PDF -- and as compared to 50 chars per line or less for historical novels.

Less chars per line tend to make things more readable while taking more paper. Too many chars per line make things very painful to read -- which is why magazine format or newspaper format is broken up into two or more columns.

http://desktoppub.about.com/cs/finetypography/ht/line_length.htm

From Bowerbird at aol.com Mon Apr 26 14:53:49 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 17:53:49 EDT
Subject: [gutvol-d] Re: Removing spurious break lines
Message-ID: <334c2.40a4c33d.3907656d@aol.com>

al said:
> However, it seems to me that simply indenting
> not-to-be-formatted lines, and doing nothing else,
> is at least somewhat pointless.

it's not "pointless" to make _some_ "corrections", but not all, to a text. that's what whitewashers do quite often, when _you_ "correct" a text...

> Many of PG's older texts have other problems than that
> (missing illustrations, ASCII only rather than Latin1 or UTF8,
> missing/incomplete indexes, etc, etc, etc.)

that's right. but you whitewashers don't fix all of those problems.
you apply the corrections that've been submitted, and you run the text through the new version of your tools, and you might also do some other checks (and thanks for doing that), and then you post it.

> A far more desirable approach would be to
> pick an old PG text, find a scanset in IA/Google/wherever,
> get a copyright clearance, and do a new version of it, either
> from scratch, or a complete re-proof of the existing file(s),
> doing whatever is needed to bring it up to current standards.

i fully agree that that would be "more desirable".

it would also be a heck of a lot more work. and that's the trade-off, is it not?

still, someone who does _some_ of the "corrections" -- whether that is to make the text robust to rewrapping or some other subset of stuff -- is not engaging in a "pointless" exercise. they're improving the text, and -- just as i thank you whitewashers for improving the text when you do _your_ "corrections" -- i would thank any other person who improved the text when they do _their_ "corrections".

i fully approve of an iterative process that steadily cumulates "partial corrections"...

> Note: Complaints that the Repost team (mostly myself and
> David Widger, between us doing considerable clean-up work
> on several thousand of PG's old files) "should have done more"
> will fall on deaf ears. We're only two people, we're also 2/3
> of the Whitewashers, and 2/3 of the Errata team, and we both
> produce independently. Back off.

i have never "complained" about the "corrections" by whitewashers.

i _have_ pointed out that these "corrections" are _not_ complete, in the sense that many errors and inconsistencies and omissions _survive_ this "correction" process.

but again, i do not condemn any "corrections" because they are not complete. i welcome and appreciate _all_ "corrections", even incomplete ones, because they move the text closer to _perfection_, and that's what i advocate...

i wasn't "complaining" to report your "corrections" are incomplete.
i felt the need to point that out because you _do_not_ point it out. you say that "errors were corrected" and you simply leave it at that. i believe that many people probably conclude, from your statement, that you've made a good-faith effort to actually _find_ all the errors -- such as by comparing the text with a newly-obtained scan-set -- when, in point of fact, you have not actually gone to those lengths...

nobody is "blaming" you, or "criticizing" you for doing what you do, or for not doing what you're not doing. so there's no need to tell us to "back off". that's insulting, and you really shouldn't be so sensitive.

we're just stating the facts. _clearly_. because you haven't done that.

so, can we all agree that "partial corrections" are _not_ "pointless"?

because that would be a huge step in the right direction, yes it would.

-bowerbird

From Bowerbird at aol.com Mon Apr 26 15:05:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 18:05:40 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <34193.37b281e1.39076834@aol.com>

jim said:
> The Studs Terkel however has 60 chars per line
> compared to 70 chars per line in your example PDF --
> and as compared to 50 chars per line or less for historical novels.

jim, you need to pay better attention.

> Less chars per line tend to make things more readable
> while taking more paper. Too many chars per line
> make things very painful to read ...
> http://desktoppub.about.com/cs/finetypography/ht/line_length.htm

yes, jim, i know all about line-length and readability...
(and if you do a citation, quote _bringhurst_, not about.com)

***

those who _were_ paying attention know that i said explicitly i retained the p.g. linebreaks, accounting for the long lines...
it's also the case that michael mcd seemed to accept the fontsize and the margins and the pagesize, which means that his eyes didn't particular mind the long lines, and since the .pdf was intended for him, he's the ultimate judge here. even more so, people who were paying attention also know that i said i would be rewrapping the text and doing a .pdf with a bigger fontsize. since the pagesize and the margins will stay the same, that means shorter lines _by_definition_, bringing them to the 50-65 characters bringhurst suggests. in other words, jim, your post was completely unnecessary. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Mon Apr 26 15:19:05 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 26 Apr 2010 15:19:05 -0700 Subject: [gutvol-d] Re: Removing spurious break lines In-Reply-To: <334c2.40a4c33d.3907656d@aol.com> References: <334c2.40a4c33d.3907656d@aol.com> Message-ID: Someone new to PG having gotten a new Brand X ebook reader hears about PG having "Free Books" goes to the website and downloads a file in some format. Either it "works" but maybe it has a few errors in it and many people never notice those errors are there. Or that person chooses some file format from PG opens it in their ebook reader and the results are totally scrambled and unreadable. And that person then says "Holy Cow what's wrong with PG???" and goes away never to return. Which do you prefer? From jimad at msn.com Mon Apr 26 15:24:33 2010 From: jimad at msn.com (Jim Adcock) Date: Mon, 26 Apr 2010 15:24:33 -0700 Subject: [gutvol-d] Re: Typesetting ("gods and fighting men") In-Reply-To: <34193.37b281e1.39076834@aol.com> References: <34193.37b281e1.39076834@aol.com> Message-ID: I was paying attention and the insults you send my way are not necessary. Again, you ask for feedback but you are not willing to accept any graciously. 
From Bowerbird at aol.com Mon Apr 26 16:04:31 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Mon, 26 Apr 2010 19:04:31 EDT Subject: [gutvol-d] Re: Removing spurious break lines Message-ID: <37e1b.78fc9644.390775ff@aol.com> jim said: > Perhaps you should start by examining what PG has *already* > implemented for txt unwrapping to generated HTML, > find out what works and what doesn't work, and > what requirements this puts on txt submission > in order to make it all work right? the code that does unwrapping right now is marcello's. i don't know if he has updated it since i looked at it, but when i did, it was just what i'd expect from a technocrat: he made the problem much more difficult than it is, and subsequently his code is overwrought, _and_ it backfires. (for instance, he was using a rhyming dictionary to try to determine if a set of lines constituted a poem; good luck.) all in all, once you approach the problem intelligently, it's not that difficult to unwrap most p.g. files correctly, even the ones which have not been formatted correctly, because you can detect the lines that should be indented. i could run a script that auto-fixes most p.g. e-texts, with few introduced errors; too bad p.g. doesn't work that way; the whitewashers insist on fixing the books one-at-a-time. > Otherwise PG will end up with > two conflicting text unwrapping standards, > which will make the submitter's task > even more confusing. marcello had to code his unwrapper precisely because p.g. doesn't enforce its existing policy on text indents, or have the foresight to expand it to cover other cases. his code won't scale. and the indentation policy _will_... (and it'll replace his kludge code with something simple.) so there's no issue with "two conflicting standards" here. 
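An editorial sketch of the indentation convention discussed above, not marcello's code and not bowerbird's script, neither of which appears in this thread. It assumes that any line beginning with whitespace is a do-not-unwrap line (poetry, tables, block quotes) and that a blank line ends a paragraph. The function name `unwrap` is made up for this illustration.

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs, keeping indented lines as-is.

    Assumptions (illustrative only): blank lines separate paragraphs, and a
    line that starts with a space or tab must never be rewrapped.
    """
    out = []    # finished output lines
    para = []   # pieces of the paragraph currently being joined

    def flush():
        # Emit the pending paragraph, if any, as one long line.
        if para:
            out.append(" ".join(para))
            para.clear()

    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current paragraph and keep the gap.
            flush()
            out.append("")
        elif line[0] in " \t":
            # Indented line: close any pending paragraph, copy verbatim.
            flush()
            out.append(line.rstrip())
        else:
            # Ordinary wrapped prose: collect for joining.
            para.append(line.strip())
    flush()
    return "\n".join(out)


sample = (
    "This is a\n"
    "wrapped paragraph.\n"
    "\n"
    "  roses are red\n"
    "  violets are blue\n"
    "\n"
    "Another\n"
    "paragraph."
)
print(unwrap(sample))
```

Under these assumptions the prose paragraphs come out as single lines while the two indented verse lines stay where they were. A text whose verse is not indented would still need the kind of detection step mentioned above, which this sketch does not attempt.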
> To the extent that you guys are heading more-and-more
> towards the "unobtrusive" marking up of txt files, please
> note that Python has already got very good efforts in this regard
> called "reStructured Text" -- and the tools existing to support it!

you're a few years behind the threads here, jim...

"restructured text" is a light-markup format, just like z.m.l.

the main difference is z.m.l. is geared directly toward p.g., whereas restructured text has a provenance that's muddled, so if you were gonna choose between the two, choose z.m.l. (tools for any light markup system are _not_ hard to build.)

but hey, if you can get p.g. to go for restructured text, do it!

-bowerbird

From Bowerbird at aol.com Mon Apr 26 16:10:14 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 19:10:14 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <383fb.6aac4bd9.39077756@aol.com>

jim said:
> I was paying attention

then why didn't you understand the situation? why did you say something that made no sense?

> Again, you ask for feedback
> but you are not willing to accept any graciously.

ok, jim, let me make things perfectly clear.

i do not value _your_ feedback. so when i ask for feedback, i am most specifically _not_ asking for feedback from _you_. definitely not you.

are we clear?

-bowerbird
From mmcdermott at mad-computer-scientist.com Mon Apr 26 20:42:49 2010
From: mmcdermott at mad-computer-scientist.com (Michael McDermott)
Date: Mon, 26 Apr 2010 22:42:49 -0500
Subject: [gutvol-d] Re: Typesetting (back on track, calling the mad scientist)
In-Reply-To: <2f24a.5dfce11.390757cc@aol.com>
References: <2f24a.5dfce11.390757cc@aol.com>
Message-ID: <1272316031-sup-6696@zion>

> sometimes she checks her e-mail when i'm away from the machine,
> so when i come back i end up sending a message from her account.
> oh, but you probably got her name from the "author" box of the .pdf,
> now that i think about it. that was filled in by the text-editor i

"Leslie" is also the registrant of z-m-l.com according to whois. :)

> i'd love to see a .pdf representing your output from that...

That can be arranged:

http://www.mad-computer-scientist.com/files/14465-8.pdf

Fairly straightforward. The commands:

    txt2html 14465-8.txt | html2ps -D > 14465-8.ps
    ps2pdf 14465-8.ps

Some notes:

* Inconsistent recognition of minor sections (i.e., sections within the intro).
* Double dashes are not converted to em-dashes.
* Images are, of course, not put in. This can be improved by using PG HTML, where applicable.
* No TOC. This can be done by cutting the TOC from the original document and putting it in a link file for txt2html, then telling html2ps to convert links to page references.
* No way that I see to handle footnotes automatically. Some custom CSS should handle it, if the docs don't lie.

PS. The CSS:

    @html2ps {
      option {
        hyphenate: 1;
        number: 1;
      }
    }
    @page {
      size: 5.5in 8.5in;
      margin-top: 0.5in;
      margin-bottom: 0.5in;
      margin-left: 0.5in;
      margin-right: 0.5in;
    }
    p { text-indent: 1.5em; }

Excerpts from Bowerbird's message of Mon Apr 26 15:55:40 -0500 2010:
> mike said:
> > Leslie,
>
> leslie is my girlfriend, not me.
>
> sometimes she checks her e-mail when i'm away from the machine,
> so when i come back i end up sending a message from her account.
> > oh, but you probably got her name from the "author" box of the .pdf, > now that i think about it. that was filled in by the text-editor i > used... > if you check out the later .pdfs, you should find that that metadata is > supplied correctly by my authoring-tool. (unless i forgot to specify it.) > > > The leading is perfect. > > oops... i took it up considerably in the newer version i just posted... > > it has 10.5-point type, with 12-point leading; and still runs 400 pages. > and bigger leading means fewer lines per page, and thus more pages. > which might or might not be a big deal to you. all of these variables > make it complicated to know how to create a .pdf for somebody else. > > which is why a cyberlibrary needs to put .pdf/hard-copy output creation > ability into the hands of its end-users, so they can _customize_ it > fully... > > > if I had an ebook reader I probably > > would not be bothering with any of this. > > that's why i make many of the decisions according to a smart default. > > > I also conducted some more experiments with CSS stylesheets for on > > the html2ps side of things (using txt2html so that the chain looked > like: > > txt2html -> html2ps -> ps file -> printer/screen). > > i'd love to see a .pdf representing your output from that... > > -bowerbird -- Michael McDermott www.mad-computer-scientist.com From Bowerbird at aol.com Tue Apr 27 00:22:16 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 27 Apr 2010 03:22:16 EDT Subject: [gutvol-d] Re: Typesetting (back on track, calling the mad scientist) Message-ID: <4bc33.33941a9b.3907eaa8@aol.com> michael mcd said: > That can be arranged: > http://www.mad-computer-scientist.com/files/14465-8.pdf that's serviceable output. the book designers won't be giving it any awards. but if it meets your needs, that's pretty much all that counts. 
as the object was a hard-copy printout, i didn't even talk about things like a hotlinked table of contents or footnote presentation, but i _can_ deal with them if you want to discuss the pdf qua pdf. just from an ink-on-paper perspective, though, if you were to critique any of the output, exactly what points would you make? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimad at msn.com Tue Apr 27 08:21:44 2010 From: jimad at msn.com (James Adcock) Date: Tue, 27 Apr 2010 08:21:44 -0700 Subject: [gutvol-d] Re: Typesetting ("gods and fighting men") In-Reply-To: <383fb.6aac4bd9.39077756@aol.com> References: <383fb.6aac4bd9.39077756@aol.com> Message-ID: > then why didn't you understand the situation? why did you say something that made no sense? I believe I did say something that makes sense, its just that you still do not understand that your problem was not font size but rather line length. You also do not apparently understand that it is not generally true that font size and line length are inversely related. >i do not value _your_ feedback. so when i ask for feedback, i am most specifically _not_ asking for feedback from _you_. So you retract your previous statements when you said that I should have been reporting to you when your tools fail? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Tue Apr 27 11:40:06 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 27 Apr 2010 14:40:06 EDT Subject: [gutvol-d] Re: Typesetting ("gods and fighting men") Message-ID: <39f63.1062790c.39088986@aol.com> jim... you just utterly and completely fail to grok the listserve imperative to move threads forward. *** jim said: > I believe I did say something that makes sense, then you have a serious intellectual problem too. > its just that you still do not understand that > your problem was not font size but rather line length. 
this is now the third time i have said it, twice now directly to you, and you still don't seem to "get it". i retained the p.g. linebreaks. that means p.g. decided the line-length, not me. if you don't understand the exact meaning of that, continue pondering it until you _do_ understand it. because you won't be able to keep up with the thread, let alone advance it, until you've understood that fact. i retained the p.g. linebreaks. that's why each line was as exactly as long as it was. because that's how long it was in the p.g. e-text... (except for lines i rewrapped to get rid of orphans.) are the lines in p.g. files too long, in general? _yes._ does that have any bearing on our experimentation? not really. because we're just trying things out here. nothing is cast in stone. and perhaps, for this book, for michael's eyes in particular, the p.g. lines are fine, even though -- for you, or me, or somebody else -- they might be too long. our opinion doesn't matter, because michael is printing out this book for himself. that's a beauty of print-on-demand -- customization. besides, we're going to do _more_ experiments later... > You also do not apparently understand that > it is not generally true that font size > and line length are inversely related. again, you need to put in some more thought... if we talk about a pre-determined line of characters -- like in this case, where we retain p.g. linebreaks -- it is _absolutely_true_ that the space required will be directly related to the fontsize. _it's_absolutely_true_. the character-count of the line will remain unchanged -- by definition, as it was determined by linebreaks -- but the width of the line printed on a page depends on fontsize. the bigger the size, the more space required. and if we also constrain the size of the space in which we are putting that pre-determined line of characters, we have put an upper-limit on the fontsize we can use. 
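An editorial sketch of the arithmetic behind this argument, using figures quoted in the thread (70-character p.g. lines, a 4.5-inch text width, 10.5-point and 12-point type). The average character width of 0.44 em is an assumption chosen so that those figures line up; real fonts vary, and the function names are invented for this illustration.

```python
POINTS_PER_INCH = 72  # PostScript points per inch

# ASSUMPTION: average glyph width as a fraction of the font size.
# 0.44 is a guess that fits the numbers in this thread, not a measured value.
AVG_CHAR_EM = 0.44


def max_font_size(chars_per_line: int, width_in: float,
                  avg_char_em: float = AVG_CHAR_EM) -> float:
    """Largest font size (in points) at which a fixed-length line still fits."""
    width_pt = width_in * POINTS_PER_INCH
    return width_pt / (chars_per_line * avg_char_em)


def chars_that_fit(font_pt: float, width_in: float,
                   avg_char_em: float = AVG_CHAR_EM) -> int:
    """Approximate characters per line once the text is free to rewrap."""
    width_pt = width_in * POINTS_PER_INCH
    return int(width_pt / (font_pt * avg_char_em))


# Fixed 70-character lines in a 4.5-inch measure cap the font size.
print(round(max_font_size(70, 4.5), 1))

# After unwrapping, a bigger font means fewer characters per line.
print(chars_that_fit(12, 4.5))
```

With the linebreaks retained, the character count is fixed and the measure limits the font size; once the text is rewrapped, the same measure at a larger size simply holds fewer characters per line.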
in this case, we are using a space that's 4.5-inches wide, so the biggest fontsize i could use that kept all the lines reasonably within the width of the space was 10.5-point. a smaller fontsize wouldn't have filled up all of the page, plus it would've been less readable, so i used 10.5-point. in other words, all of the other factors were constrained, and fontsize was left to vary, and had to make it all fit... for my next .pdf, and i've said _this_ three times now too, twice directly to you, so it really should've sunk in by now, i will unwrap the text (i.e., free it from the p.g. linebreaks) and jack the fontsize to 12-point, so michael can see that. and then we'll do some more experimentation after _that_. > So you retract your previous statements > when you said that I should have been > reporting to you when your tools fail? just exactly how useful do you think an "it doesn't work" report is, anyway? you never gave one worthwhile report. so no, jim, i don't want any reports from you, none at all... are we clear now? or do i have to repeat that a third time? -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Tue Apr 27 12:48:26 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Tue, 27 Apr 2010 15:48:26 EDT Subject: [gutvol-d] Re: Typesetting ("gods and fighting men") Message-ID: <3f3c4.5b33dc94.3908998a@aol.com> ok, back to work... *** have you ever tried to copy text out of a .pdf? if you have, you know that it can be frustrating. because a lot of information seems to get lost. perhaps the most noticeable are "empty" lines. if you used a blank line between paragraphs, all of those blank lines are lost, which means that your paragraphs are now all run together. compounding that problem is that the "soft" linebreak at the end of mid-paragraph lines is turned into a "hard" linebreak. it's a disaster... 
you _can_ choose the "export as plain text" item from one of the menus, and that does retain the empty lines. unfortunately, it also strips styling. the copy-text route retains the styling, or at least _some_ of it. but not all of it. italics are often lost. so is the indentation on block-quotes, poetry, etc. so no matter what you do, getting text from a .pdf is a struggle. that's one reason why .pdf is called "the roach motel of documents", because text can go in, but it cannot come out again. you can demonstrate this to yourself by using the .pdf that michael mcd created... *** i do some things with my .pdf tool to solve this little problem. for instance, it doesn't output a "blank" line when it encounters one. instead, it outputs a double-colon -- "::" -- that is white, thus _invisible._ (or i'll often make it light gray.) but it's still there, and gets copied out when you copy the text, so you can then do a global change of "::" to nil, and voila, you have your blank lines. in z.m.l., italics are represented by _underbars_, so i also have my program output the underbars, again turning them white so they'll be invisible... i haven't worked on this for a while, so i cannot remember what state of success it's in right now, but my goal is to create "round-tripping", so that when you use z.m.l. to create a .pdf, the text you copy out of that .pdf, after a few global changes, can be used to create that exact same .pdf again. go ahead and copy the text out of one of my .pdfs, and you'll get a good idea what i'm talking about... *** the tricks that are built into my tool are ones that you can do "manually" in your own wordprocessor, if you'd like to create a "round-trip" capability too. surround your italicized stuff with white underbars, change your blank lines to a white double-colon, and use white periods to create your indentation. i did that in the next two .pdfs i will talk about, so you can copy the text out of 'em to see this at work. 
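An editorial sketch of the "global change" step described above, not bowerbird's tool. It assumes the copied text arrives with one "::" marker on a line of its own wherever the original had a blank line; the function name `restore_blank_lines` is made up for this illustration.

```python
def restore_blank_lines(copied: str, marker: str = "::") -> str:
    """Turn marker-only lines from copied .pdf text back into blank lines.

    ASSUMPTION: the marker sits alone on its line. Only such lines are
    changed, so a "::" that happens to occur inside running text is kept.
    """
    restored = []
    for line in copied.splitlines():
        if line.strip() == marker:
            restored.append("")   # the invisible marker stood for a blank line
        else:
            restored.append(line)
    return "\n".join(restored)


copied_text = "first paragraph\n::\nsecond paragraph"
print(restore_blank_lines(copied_text))
```

Matching whole lines rather than replacing every "::" is a design choice made here for safety. The underbars need no such step, since in z.m.l. they already are the italics markup.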
*** i used a text-editor to create two more .pdfs for us in our experiments using "gods and fighting men". i unwrapped the text, freeing it from p.g. linebreaks. then i made the fontsize a more-readable 12-point. i also put back in a more-spacious 14-point leading. all these changes pushed the .pdf to some 567 pages, from the previous 391, so that's an offsetting negative, but the positive aspect is a much more readable .pdf... i created a ragged-right version, and a justified one: > http://z-m-l.com/misc/14465-rewrapped-rag.pdf > http://z-m-l.com/misc/14465-rewrapped-just.pdf these two are exactly the same, except for justification, so you might find it odd that the first is just 1.5 megs, while the second is almost twice as big, at 2.9 megs... the reason for this discrepancy is the ragged-right .pdf stores the location of each line, rendering it right there, while the justified .pdf has to store the location of each _word_, to print it in the right place. it's a big difference. at any rate, maybe the mad scientist will look at these and advise us on what pointsize he'd like to see "final", what leading he wants, and if he prefers justification... -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowerbird at aol.com Fri Apr 30 01:24:46 2010 From: Bowerbird at aol.com (Bowerbird at aol.com) Date: Fri, 30 Apr 2010 04:24:46 EDT Subject: [gutvol-d] spill baby spill Message-ID: <79295.5d9ffc0.390bedce@aol.com> the worst slick in human history. so let's all give our thanks to the republicans and the oil corporations who pull their puppet strings. -bowerbird -------------- next part -------------- An HTML attachment was scrubbed... 
From dakretz at gmail.com Fri Apr 30 09:27:13 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 30 Apr 2010 09:27:13 -0700
Subject: [gutvol-d] Re: spill baby spill
In-Reply-To: <79295.5d9ffc0.390bedce@aol.com>
References: <79295.5d9ffc0.390bedce@aol.com>
Message-ID:

s/republicans/politicians/

On Fri, Apr 30, 2010 at 1:24 AM, wrote:
> the worst slick in human history.
>
> so let's all give our thanks to the republicans and
> the oil corporations who pull their puppet strings.
>
> -bowerbird

From Bowerbird at aol.com Fri Apr 30 11:25:36 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 Apr 2010 14:25:36 EDT
Subject: [gutvol-d] Re: spill baby spill
Message-ID:

dakretz said:
> s/republicans/politicians/

all of the politicians are rotten, it's true.

but the republicans are more rotten. way more rotten.

besides, look at the subject-header...

-bowerbird

p.s. this means if any of you consider yourself to be a republican, you'd better take a good look at yourself. not that democrats couldn't stand a look in the mirror. your whole system is now corrupt, and _you_ made it...

From dakretz at gmail.com Fri Apr 30 11:42:53 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 30 Apr 2010 11:42:53 -0700
Subject: [gutvol-d] Re: spill baby spill
In-Reply-To:
References:
Message-ID:

OK, then /republicans/politicians currently in power/

On Fri, Apr 30, 2010 at 11:25 AM, wrote:
> dakretz said:
> > s/republicans/politicians/
>
> all of the politicians are rotten, it's true.
>
> but the republicans are more rotten. way more rotten.
>
> besides, look at the subject-header...
>
> -bowerbird
>
> p.s.
> this means if any of you consider yourself to be
> a republican, you'd better take a good look at yourself.
> not that democrats couldn't stand a look in the mirror.
> your whole system is now corrupt, and _you_ made it...

From lee at novomail.net Fri Apr 30 12:53:21 2010
From: lee at novomail.net (Lee Passey)
Date: Fri, 30 Apr 2010 13:53:21 -0600
Subject: [gutvol-d] Re: Cooperative proofreading
In-Reply-To: <4BD0BA56.5020806@novomail.net>
References: <4BD0BA56.5020806@novomail.net>
Message-ID: <4BDB3531.1040307@novomail.net>

On 4/22/2010 3:06 PM, Lee Passey wrote:

> Just an update on my cooperative proofreading site
> (http://www.ebookcooperative.com/).

[snip]

> The next step will be to create a servlet that will allow downloading an
> entire e-book by combining all the pages from the repository into a
> single file.

Now completed.

It is unlikely that I will make any further bulk changes to the source documents associated with any projects; feel free to make any corrections you feel appropriate.

As I populate the database with more projects, I would like to focus on the PG "Frankentexts," that is to say, those texts in Project Gutenberg which appear to be stitched together from multiple, unspecified sources. The classic example of this is, of course, the PG edition of _Frankenstein_. Are there other instances I should be aware of?

> As always, feedback is welcomed.