From Bowerbird at aol.com  Thu Apr  1 00:48:01 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 03:48:01 EDT
Subject: [gutvol-d] the ipad and project gutenberg
Message-ID: <8ec0b.712bbbb8.38e5a9b1@aol.com>

oh boy, the ipad is almost here...

gonna change the world, you watch.

nobody has mentioned here yet how
apple scooped up the p.g. corpus...

i wonder if they gave p.g. any money?

i read apple is using the .epub files.

oh yeah, now _that_ is a good idea...

to put those crappy files in front of
a user-base that cares about quality.

yeah, that's a _really_ good idea, yep.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/8e2d1c5a/attachment.html>

From schultzk at uni-trier.de  Thu Apr  1 05:04:36 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Thu, 1 Apr 2010 14:04:36 +0200
Subject: [gutvol-d] Re: here's my collaborative proofing system you can
	look	at
In-Reply-To: <4BB38407.1040402@novomail.net>
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net>
Message-ID: <BC9F179A-8715-4355-AAF7-F01091461DF0@uni-trier.de>


Am 31.03.2010 um 19:19 schrieb Lee Passey:

> I would argue that one of the most important lessons of the internet is that there is practically no such thing as an "average" user. One of the beauties of well-constructed HTML is that it accommodates itself to /all/ users, not just the mythical "average" user.
> 
> If you ever come across a web site where you have a right/left scroll bar (and there are many) you know you have encountered a web designer who's stuck in the desktop publishing world.

	I would very much disagree here, because, as you say, there is no such thing as a standard user,
	or a standard font or size for that matter. Sure, you can specify that certain fonts and sizes are to be used.
	Yet does the user have them, or want to use them?
	
	I agree with you insofar as the general design of a page should not require a scroll bar.
	Yet one should show up if the user's setup makes it necessary.
	My main machine is a 17" MacBook with the resolution pushed all the way up.
	At times I zoom pages or change the default sizes, and then the pages become unusable,
	because scroll bars do not pop up or the page is not designed very intelligently.

	As far as proofing is concerned, it actually belongs to both the desktop domain and the web-based
	domain. So what is wrong with having scroll bars? They become more important in
	editing, especially if you have multiple views in one window with different content.
	You resize the views to accommodate your needs and use the scroll bars when you need
	to reach parts you rarely need or use.

	I can remember when it was said you need to design your page for 640x480 resolution because
	other resolutions were not used that much. Or what you still find: "this page requires XXXX
	to display (properly)". Now that is poor design.

	regards
		Keith.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/3c341fa7/attachment.html>

From Bowerbird at aol.com  Thu Apr  1 09:16:13 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 12:16:13 EDT
Subject: [gutvol-d] Re: huge hole of missing months in the listserve archives
Message-ID: <a4c96.225c1a48.38e620cd@aol.com>

missing one month (april of 2009),
but otherwise here are the archives:

>    http://z-m-l.com/gutvold/

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/05d057e6/attachment.html>

From Bowerbird at aol.com  Thu Apr  1 09:35:16 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 12:35:16 EDT
Subject: [gutvol-d] Re: jim replies to his e-mail
Message-ID: <a62d3.65ff64bc.38e62544@aol.com>

jim said:
>    You ask us to review something that was broken 

it wasn't "broken".   you were just doing it wrong.

so i politely told you how to do it right, that is:

>    make your window wider

and/or

>    make your text bigger

without ever making it explicit you were doing it wrong,
even though the solutions should've been obvious to you.

(perhaps that's why you appear angry?, because the answer
to your allegation of a "problem" was so darned obvious?)

but you _insisted_ on saying "no, it's _broken_", so i had to
become more explicit that, no indeed, jim, that "bug" was
_not_ a "bug", but was indeed a feature waiting to happen.

i have a cinema screen myself.   did you really think that
i hadn't seen the very exact same thing that you saw?


>    and then when we tell you that it is broken 
>    you say you have something hidden in your back pocket 
>    which isn't broken and that make *US* look stupid?

jim, you made _yourself_ look stupid.   and you _persist._


>    Again, post your software, including your source code, 
>    and then let us talk about it.  Until then you are just 
>    wasting everybody's time.

do you not realize what it sounds like when you just repeat
the things you said before that were rejected way back then?


>    If you want to know why nobody is interested in 
>    using your code: just keep it up.

oh, i know full well why few people here give me feedback.

it's because i defeated many of 'em in bloody hand-to-hand.

back when they thought _they_ had the upper-hand, they
were standing in line to post messages to this listserve...

look at the archives!   dozens of messages a day, day after day.
but eventually i defeated each and every one of them, soundly.

and they haven't forgotten it, either...

but hey, i don't mind if they hold a grudge.   because i do too.

and i've proven i have better aim than all of them combined...


>    At the very least please note that you are wasting time 
>    that I could be using to make PG books.

jim says i'm wasting his time.   how's _that_ for irony?        :+)

oh well, i guess april fool's day came a bit early for him...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/3dcd225c/attachment.html>

From Bowerbird at aol.com  Thu Apr  1 09:46:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 12:46:40 EDT
Subject: [gutvol-d] the importance of the scientific method
Message-ID: <a6f83.228c2afe.38e627f0@aol.com>

in the past, i have suggested that rfrank would do well to
incorporate more scientific method into his experiments.

but alas...

here's a quote from him in one of his forum threads:
>    I've always hoped that some of what we seem 
>    to conclude will be compelling enough that 
>    the people at DP or DPC or any of the others 
>    might bake it into their future releases.

there are a lot of "compelling" stories that one can dream up
that have very little basis in reality...   the scientific method is
how we filter out those "compelling" stories from _the_truth_,
and data is the currency of the realm in scientific experiments.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/e924f40a/attachment.html>

From joshua at hutchinson.net  Thu Apr  1 14:29:51 2010
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu, 1 Apr 2010 21:29:51 +0000 (GMT)
Subject: [gutvol-d] Re: here's my collaborative proofing system you can look
 at
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net>
Message-ID: <871711989.124272.1270157391343.JavaMail.mail@webmail07>

An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/8ca3b865/attachment-0001.html>

From Bowerbird at aol.com  Thu Apr  1 14:46:43 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 1 Apr 2010 17:46:43 EDT
Subject: [gutvol-d] a horizontal proofing interface, line by line
Message-ID: <b8b3a.7e327790.38e66e43@aol.com>

here's a take on a horizontal proofing interface,
where the scan is sliced up into individual lines:

>    http://z-m-l.com/go/lines/pagelines.html

might be good if you use an iphone, for instance.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100401/32869b9a/attachment.html>

From schultzk at uni-trier.de  Fri Apr  2 04:35:18 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 2 Apr 2010 13:35:18 +0200
Subject: [gutvol-d] Re: here's my collaborative proofing system you can look
	at
In-Reply-To: <871711989.124272.1270157391343.JavaMail.mail@webmail07>
References: <4ede.20ecf1cb.38e39828@aol.com> <4BB38407.1040402@novomail.net>
	<871711989.124272.1270157391343.JavaMail.mail@webmail07>
Message-ID: <1F5E0F71-30E9-46B2-9317-CB04195D5A32@uni-trier.de>

Hi Everybody, Joshua,

	Uhmmmm !

	I do not think anybody would do proofing or editing on a mobile
	device, that is, phones or PDAs.
	NetBooks most likely, and probably the iPad.

	Whether a browser supports a standard generally depends on
	the browser and not the device it is running on.

	regards
		Keith.
	
Am 01.04.2010 um 23:29 schrieb Joshua Hutchinson:

> I've had some luck using the CSS max-width property.
> 
> You can tell an img tag to max-width: 75% and it will resize the image to fit that much of the available area.  It'll even dynamically resize when you stretch your browser window.
> 
> Downside: Some browsers *suck* at resizing images.  IE6 does not support max-width, though pretty much everyone else does on the desktop (haven't tested it with any mobile devices, though).
> 
> Josh
> 
> On Mar 31, 2010, Lee Passey <lee at novomail.net> wrote:
> 
> 
> If anyone has any suggestions as to how to dynamically resize images, 
> I'm all ears, because this is one of the problems I'm going to need to 
> resolve for my own co-operative proofing demonstration.
> 
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
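Josh's max-width suggestion comes down to simple arithmetic. The Python sketch below (the helper name `rendered_size` is hypothetical, and the model is deliberately simplified: no explicit width or height on the image, no padding or borders) approximates the size a browser that supports max-width gives a page scan as the window changes.

```python
from typing import Tuple


def rendered_size(natural_w: int, natural_h: int, container_w: int,
                  max_fraction: float = 0.75) -> Tuple[int, int]:
    """Approximate the box an <img> gets under `max-width: 75%` with auto height.

    The width is the image's natural width, capped at max_fraction of the
    container's width; the height is scaled by the same factor, so the
    aspect ratio of the scan is preserved.
    """
    width = min(natural_w, container_w * max_fraction)
    scale = width / natural_w
    return round(width), round(natural_h * scale)


# A hypothetical 1600x2400 page scan in browser windows of various widths:
for container in (800, 1200, 2400):
    print(container, rendered_size(1600, 2400, container))
```

Note that max-width only ever shrinks the image: in the widest window the scan is shown at its natural size rather than being stretched.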

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/ae94c847/attachment.html>

From schultzk at uni-trier.de  Fri Apr  2 04:38:18 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 2 Apr 2010 13:38:18 +0200
Subject: [gutvol-d] Re: jim replies to his e-mail
In-Reply-To: <a62d3.65ff64bc.38e62544@aol.com>
References: <a62d3.65ff64bc.38e62544@aol.com>
Message-ID: <338FEF56-DCA7-4620-A763-0EFDC50685C3@uni-trier.de>

Hi BB,

	I used the link given and I personally did not like
	the layout or interface, but then again that's just me.

	To be honest I could not make heads or tails of
	what I was seeing or exactly what to do.

	regards
		Keith.

Am 01.04.2010 um 18:35 schrieb Bowerbird at aol.com:

> jim said:
> >   You ask us to review something that was broken 
> 
> it wasn't "broken".  you were just doing it wrong.
> 
> so i politely told you how to do it right, that is:
> 
> >   make your window wider
> 
> and/or
> 
> >   make your text bigger

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/70fa7486/attachment.html>

From jimad at msn.com  Fri Apr  2 09:05:09 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 09:05:09 -0700
Subject: [gutvol-d] [SPAM] RE:  a horizontal proofing interface, line by line
In-Reply-To: <b8b3a.7e327790.38e66e43@aol.com>
References: <b8b3a.7e327790.38e66e43@aol.com>
Message-ID: <SNT120-DS17BDFE30221BED44DF877AAE1C0@phx.gbl>


>   http://z-m-l.com/go/lines/pagelines.html

I think this option is not bad, although I suspect not every proofer will
have the patience to take it "line-by-line."  Do you have software to
automatically slice the bitmap and align the txt?  Or are you slicing the
bitmap by hand?


From vze3rknp at verizon.net  Fri Apr  2 09:16:17 2010
From: vze3rknp at verizon.net (Juliet Sutherland)
Date: Fri, 02 Apr 2010 12:16:17 -0400
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <b8b3a.7e327790.38e66e43@aol.com>
References: <b8b3a.7e327790.38e66e43@aol.com>
Message-ID: <4BB61851.9010301@verizon.net>


On 4/1/2010 5:46 PM, Bowerbird at aol.com wrote:
> here's a take on a horizontal proofing interface,
> where the scan is sliced up into individual lines:
>
> >   http://z-m-l.com/go/lines/pagelines.html
>
> might be good if you use an iphone, for instance.
If I were implementing DP again from scratch, I would almost certainly 
use an interface like this for at least part of the proofing 
interface/process. There are lots of advantages to putting the text very 
close to the image like that for close checking of individual 
characters. I would also provide a full page interface, much like the 
current DP one, since some things are easier to see when looking at the 
entire page at once.

Implementing this kind of line-by-line interface efficiently is only 
possible if one has word boundary information (actually, the line 
boundary) from the OCR. That kind of information was not available when 
DP started and to add it into the current DP would be so much effort as 
to make it out of the question.

JulietS
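Assuming the OCR engine can export a bounding box for each word (hOCR output is one such format; the `Word` tuple below is a hypothetical stand-in for whatever the engine actually emits), the line boundaries Juliet mentions can be recovered by grouping boxes that overlap vertically. A minimal Python sketch:

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    x0: int   # left edge, in pixels
    y0: int   # top edge
    x1: int   # right edge
    y1: int   # bottom edge


def group_into_lines(words: List[Word], min_overlap: float = 0.5) -> List[List[Word]]:
    """Group OCR word boxes into text lines by vertical overlap.

    Words are visited top to bottom. A word joins the current line when its
    vertical extent overlaps the line's extent by at least `min_overlap` of
    the word's own height; otherwise it starts a new line.
    """
    lines: List[List[Word]] = []
    top = bottom = 0
    for w in sorted(words, key=lambda w: ((w.y0 + w.y1) / 2, w.x0)):
        height = max(w.y1 - w.y0, 1)
        if lines:
            overlap = min(bottom, w.y1) - max(top, w.y0)
            if overlap >= min_overlap * height:
                lines[-1].append(w)
                top, bottom = min(top, w.y0), max(bottom, w.y1)
                continue
        lines.append([w])
        top, bottom = w.y0, w.y1
    # Within each line, order the words left to right.
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def line_bands(lines: List[List[Word]]) -> List[Tuple[int, int]]:
    """(top, bottom) pixel rows of each line: the crop box for one slice."""
    return [(min(w.y0 for w in line), max(w.y1 for w in line)) for line in lines]
```

The vertical-overlap rule tolerates the small baseline wobble of a scanned page. A heavily skewed scan would need deskewing first.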
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/8f6d99dc/attachment.html>

From jimad at msn.com  Fri Apr  2 09:18:53 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 09:18:53 -0700
Subject: [gutvol-d] Re: here's my collaborative proofing system you can
	look	at
In-Reply-To: <1F5E0F71-30E9-46B2-9317-CB04195D5A32@uni-trier.de>
References: <4ede.20ecf1cb.38e39828@aol.com>
	<4BB38407.1040402@novomail.net>	<871711989.124272.1270157391343.JavaMail.mail@webmail07>
	<1F5E0F71-30E9-46B2-9317-CB04195D5A32@uni-trier.de>
Message-ID: <SNT120-DS2FB60248718D4F3348000AE1C0@phx.gbl>

>I do not think anybody would do proofing or editing on a mobile device,
>that is, phones or PDAs.
>NetBooks most likely, and probably the iPad.

It would be nice to at least be able to SR on anything that allows input and
have a standardized way to flag an error.


From dakretz at gmail.com  Fri Apr  2 10:12:56 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 2 Apr 2010 11:12:56 -0600
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <4BB61851.9010301@verizon.net>
References: <b8b3a.7e327790.38e66e43@aol.com> <4BB61851.9010301@verizon.net>
Message-ID: <h2t627d59b81004021012lf53737b0ne4e9cc39dcc85705@mail.gmail.com>

I would rather have a scrolling semi-transparent gun-slit overlay on the
image that would synchronize with the cursor in the text. I can't proof
with so little context.
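One way such a synchronized slit could work, sketched in Python under the assumption that each line of the text has already been matched to a pixel band on the scan (by OCR line boxes or by image analysis; the function name and parameters are hypothetical): find the line the cursor sits on, then expose that band plus a configurable number of context lines above and below.

```python
import bisect
from typing import List, Tuple


def slit_for_cursor(text: str, cursor: int, bands: List[Tuple[int, int]],
                    context: int = 1) -> Tuple[int, int]:
    """Return the (top, bottom) pixel rows the overlay should leave uncovered.

    Assumes line i of `text` corresponds to bands[i] on the scan, one text
    line per printed line. `context` extra lines show above and below.
    """
    # Character offset at which each text line starts.
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    # The cursor's line is the last line starting at or before the cursor.
    line = min(bisect.bisect_right(starts, cursor) - 1, len(bands) - 1)
    first = max(line - context, 0)
    last = min(line + context, len(bands) - 1)
    return bands[first][0], bands[last][1]
```

Raising `context` widens the slit, which addresses the concern about proofing with too little context while keeping the current line easy to find.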

On Fri, Apr 2, 2010 at 10:16 AM, Juliet Sutherland <vze3rknp at verizon.net> wrote:

>
> On 4/1/2010 5:46 PM, Bowerbird at aol.com wrote:
>
> here's a take on a horizontal proofing interface,
> where the scan is sliced up into individual lines:
>
> >   http://z-m-l.com/go/lines/pagelines.html
>
> might be good if you use an iphone, for instance.
>
> If I were implementing DP again from scratch, I would almost certainly use
> an interface like this for at least part of the proofing interface/process.
> There are lots of advantages to putting the text very close to the image
> like that for close checking of individual characters. I would also provide
> a full page interface, much like the current DP one, since some things are
> easier to see when looking at the entire page at once.
>
> Implementing this kind of line-by-line interface efficiently is only
> possible if one has word boundary information (actually, the line boundary)
> from the OCR. That kind of information was not available when DP started and
> to add it into the current DP would be so much effort as to make it out of
> the question.
>
> JulietS
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/a101cbbb/attachment-0001.html>

From jimad at msn.com  Fri Apr  2 10:16:28 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 10:16:28 -0700
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <4BB61851.9010301@verizon.net>
References: <b8b3a.7e327790.38e66e43@aol.com> <4BB61851.9010301@verizon.net>
Message-ID: <SNT120-DS8A115B469FAFDB57D21F1AE1C0@phx.gbl>

>Implementing this kind of line-by-line interface efficiently is only
>possible if one has word boundary information (actually, the line boundary)
>from the OCR. That kind of information was not available when DP started and
>to add it into the current DP would be so much effort as to make it out of
>the question.

I've been thinking about the possibility of this kind of interleaved
bitmap/txt at least as an option for SR?  You can presumably flag the
linebreaks inside the png in a non-intrusive way. It would require
generating SR in a file format that can mix bitmap and txt, for example
either html or rtf.

If the linebreaks are coded non-intrusively inside the png then it is
back-compatible with what you have now.  Then you just need separate
optional page editor software, and you have given the proofers the option of
doing it line-by-line.
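One natural home for linebreaks "coded non-intrusively inside the png" is an ancillary tEXt chunk. The PNG specification lets decoders skip ancillary chunks they don't need, so ordinary viewers would still display the page unchanged. A stdlib-only Python sketch follows; the keyword `LineBreaks` and the comma-separated list of pixel rows are assumptions, not an existing convention.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build a PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (offset, type, data) for every chunk in a PNG file."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield pos, png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_line_breaks(png: bytes, rows, keyword: bytes = b"LineBreaks") -> bytes:
    """Insert a tEXt chunk listing line-break pixel rows just before IEND."""
    text = ",".join(str(r) for r in rows).encode("latin-1")
    chunk = make_chunk(b"tEXt", keyword + b"\x00" + text)
    for pos, ctype, _ in iter_chunks(png):
        if ctype == b"IEND":
            return png[:pos] + chunk + png[pos:]
    raise ValueError("no IEND chunk")


def read_line_breaks(png: bytes, keyword: bytes = b"LineBreaks"):
    """Return the stored rows, or None if the image carries no such chunk."""
    for _, ctype, data in iter_chunks(png):
        if ctype == b"tEXt":
            key, _, text = data.partition(b"\x00")
            if key == keyword:
                return [int(r) for r in text.decode("latin-1").split(",") if r]
    return None
```

Because the pixel data is untouched, a proofing tool that knows the keyword can slice the page, and everything else simply ignores the chunk.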


From schultzk at uni-trier.de  Fri Apr  2 10:34:54 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 2 Apr 2010 19:34:54 +0200
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <SNT120-DS8A115B469FAFDB57D21F1AE1C0@phx.gbl>
References: <b8b3a.7e327790.38e66e43@aol.com> <4BB61851.9010301@verizon.net>
	<SNT120-DS8A115B469FAFDB57D21F1AE1C0@phx.gbl>
Message-ID: <92684726-10E3-42C5-964A-DB00BB91E457@uni-trier.de>

You can hide a lot of information in an image.
There are algorithms for encrypting messages in images.
Sorry, I just cannot think of the name of the method.

regards
	Keith.
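The general technique for hiding data in an image is called steganography. Its simplest form, least-significant-bit embedding, is sketched below in Python on a raw buffer of 8-bit grayscale pixel values; decoding and re-encoding the actual PNG is left out, and the 4-byte length header is an assumption of this sketch. LSB data survives only lossless formats such as PNG; JPEG recompression would destroy it.

```python
def embed(pixels: bytearray, message: bytes) -> bytearray:
    """Hide `message` in the least significant bit of each pixel value.

    A 4-byte big-endian length header is written first, so the reader knows
    where the message ends. Each pixel value changes by at most 1 out of 255.
    """
    payload = len(message).to_bytes(4, "big") + message
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the low bit, then set it
    return out


def extract(pixels: bytes) -> bytes:
    """Recover a message written by embed() (most significant bit first)."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        value = bytearray()
        for b in range(count):
            byte = 0
            for i in range(8):
                byte = (byte << 1) | (pixels[start_bit + b * 8 + i] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)
```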

Am 02.04.2010 um 19:16 schrieb Jim Adcock:

> If the linebreaks are coded non-intrusively inside the png then it is
> back-compatible with what you have now.  Then you just need separate
> optional page editor software, and you have given the proofers the option of
> doing it line-by-line.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/a66cd098/attachment.html>

From Bowerbird at aol.com  Fri Apr  2 11:25:35 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 2 Apr 2010 14:25:35 EDT
Subject: [gutvol-d] Re: jim replies to his e-mail
Message-ID: <b30d.582f0ecc.38e7909f@aol.com>

keith said:
>    I used the link given and I personally did not like
>    the layout or interface, but then again that's just me.

saying you didn't like it is different from saying that
"it's broken".   but not that much more informative...


>    To be honest I could not make heads or tails
>    of what I was seeing or exactly what to do.

ok, that's a little bit more clear.   or maybe it's not.

i think most people would know that you make
fixes in the edit-field, so as to match the scan.

but maybe you need to be familiar with some other
proofing systems in order to use that experience...

at any rate, yes, training will certainly be available.

thanks for the feedback.           :+)

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/08bb521c/attachment.html>

From Bowerbird at aol.com  Fri Apr  2 11:28:36 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 2 Apr 2010 14:28:36 EDT
Subject: [gutvol-d]  re: a horizontal proofing interface, line by line
Message-ID: <b5ba.fa71e75.38e79154@aol.com>

jim said:
>    Do you have software to automatically 
>    slice the bitmap and align the txt? 
>    Or are you slicing the bitmap by hand?

i don't do anything by hand, except pheed my phat phace,
wave to my neighbors as i drive the car around our streets,
pet my cat and dog, and pleasure myself and loved ones...

***

juliet said:
>   Implementing this kind of line-by-line interface
>    efficiently is only possible if one has 
>    word boundary information (actually, the line boundary) 
>    from the OCR. That kind of information was not available 
>    when DP started and to add it into the current DP would
>    be so much effort as to make it out of the question.

my software doesn't need any such information from o.c.r.
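For a single-column page with straight lines of type, line bands can indeed be found from the image alone with a horizontal projection profile: count the dark pixels in each row, then split at the blank runs between lines. This is a standard technique and not necessarily the one used here. The Python sketch below works on a plain list of grayscale rows (loading the PNG is left out), and its thresholds are assumptions to tune per book.

```python
from typing import List, Optional, Sequence, Tuple


def find_text_lines(rows: Sequence[Sequence[int]], ink_threshold: int = 128,
                    min_ink: int = 1, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Find text-line bands in a grayscale page via a horizontal projection profile.

    rows: the image as a list of pixel rows, 0 = black, 255 = white.
    A row counts as inked when it has at least `min_ink` pixels darker than
    `ink_threshold`. Runs of inked rows form lines; blank gaps shorter than
    `min_gap` rows are bridged, so that, e.g., the dot of an i stays with its
    line. Returns (top, bottom) row ranges with bottom exclusive.
    """
    profile = [sum(1 for p in row if p < ink_threshold) for row in rows]
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    blank_run = 0
    for y, ink in enumerate(profile):
        if ink >= min_ink:
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The last inked row was y - blank_run; end is exclusive.
                bands.append((start, y - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        bands.append((start, len(profile) - blank_run))
    return bands
```

Each band can then be cropped out of the scan and paired with the corresponding line of text. Skewed scans should be deskewed first, or the blank gaps between lines disappear.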

***

dakretz said:
>   I would rather have a scrolling semi-transparent 
>    gun-slit overlay on the image that would synchronize with 
>    the cursor in the text. I can't proof with so little context.

i'm confused.   the whole page of context is there, both
in the form of the (sliced) image and the (sliced) text...

but please, take the text and images i've provided and
provide a demo of the system that you have described.

>    http://z-m-l.com/go/sitka/sitkap002.txt
>    http://z-m-l.com/go/sitka/sitkap002.png
>    http://z-m-l.com/go/lines

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/3d06c2de/attachment.html>

From jimad at msn.com  Fri Apr  2 11:47:07 2010
From: jimad at msn.com (Jim Adcock)
Date: Fri, 2 Apr 2010 11:47:07 -0700
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
In-Reply-To: <b5ba.fa71e75.38e79154@aol.com>
References: <b5ba.fa71e75.38e79154@aol.com>
Message-ID: <SNT120-DS147FDE3BD42D64B0243F10AE1C0@phx.gbl>

>>   Do you have software to automatically 
>>   slice the bitmap and align the txt?  
>>   Or are you slicing the bitmap by hand?
>
>i don't do anything by hand...

OK, suggest you keep working in this direction as I think it would make a
contribution.


From jimad at msn.com  Fri Apr  2 16:26:06 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 2 Apr 2010 16:26:06 -0700
Subject: [gutvol-d] Latin / HTML entities Cheat Sheet
In-Reply-To: <4BB61851.9010301@verizon.net>
References: <b8b3a.7e327790.38e66e43@aol.com> <4BB61851.9010301@verizon.net>
Message-ID: <SNT120-DS139205045649A67B23F1C2AE1C0@phx.gbl>

 
Finally got around to making myself a cheat sheet re Latin / HTML entities:

 
http://www.freekindlebooks.org/Dev/charmap.html
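Python's standard library carries the HTML 4 entity table as `html.entities.codepoint2name`, so a cheat sheet like this can also be applied mechanically. A small sketch (the function name `to_entities` is hypothetical): named entities where HTML 4 defines one, numeric references otherwise.

```python
import html
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Escape &, <, > and replace every non-ASCII character with an entity.

    Uses the HTML 4 named entity when one exists (e.g. &eacute;, &AElig;),
    otherwise a numeric character reference (e.g. &#257; for a-macron).
    """
    text = html.escape(text, quote=False)  # handle &, <, > first
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_entities("Æneid & cœur, ā"))
```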

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100402/e77d0de8/attachment.html>

From jimad at msn.com  Fri Apr  2 18:28:43 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 2 Apr 2010 18:28:43 -0700
Subject: [gutvol-d] Kindle on iPad
In-Reply-To: <SNT120-DS147FDE3BD42D64B0243F10AE1C0@phx.gbl>
References: <b5ba.fa71e75.38e79154@aol.com>
	<SNT120-DS147FDE3BD42D64B0243F10AE1C0@phx.gbl>
Message-ID: <SNT120-DS21B4D2A36A99FB0E583F57AE1B0@phx.gbl>

Amazon has announced Kindle on iPad:

http://www.amazon.com/kindleforipad

Which, depending on your point of view might mean:

"Oh Good, now I can also read my Amazon books on my iPad."

Or

"Oh Good, now I can proof both ePub and Mobi on iPad"

Or

"O. S., Amazon is *already* capitulating in the race against Apple!"

Competition is good, right? ;-)

Personally, a reader app for the iPad that looks less goofy than the one
Apple is proposing seems like a good thing to me. Tempting to get
one for SR'ing where one wants to do small edits on the fly.


From Bowerbird at aol.com  Sat Apr  3 12:22:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 3 Apr 2010 15:22:51 EDT
Subject: [gutvol-d] Re: a horizontal proofing interface, line by line
Message-ID: <78177.23fb8c5e.38e8ef8b@aol.com>

jim said:
>   suggest you keep working in this direction 
>    as I think it would make a contribution.

well, yes and no.

it's good for doing a word-by-word proofing...

but, as i've shown, that methodology is overkill,
at least when it is practiced over an entire book
that was properly and aggressively preprocessed,
as upwards of 90% of the lines are already perfect.

it is good at pulling out the other 10% of the lines,
however, and subjecting them to a closer scrutiny.

plus it does have utility in doing a _comparison_
between two files, as is shown in this screenshot:
>    http://z-m-l.com/go/sitka/findlines.png

that screenshot also gives you some hints about
how i go about slicing the lines, although that's
not really too difficult to figure out on your own.

-bowerbird

p.s.   the pinkish lines in that screenshot are lines
that have a diff between the two different versions.
the numbers by the asterisks are a simple counter
for the number of diffs cumulative to that point...
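A comparison display like the one in that screenshot can be sketched in Python, assuming both versions keep the page's original line breaks so that line i of one corresponds to line i of the other (the function names are hypothetical). Differing lines get an asterisk plus a cumulative counter, and the standard `difflib` module pinpoints the changed characters within a line.

```python
import difflib
from itertools import zip_longest
from typing import List, Tuple


def mark_diffs(a: List[str], b: List[str]) -> List[Tuple[str, str, str]]:
    """Compare two line-aligned versions of a text, line by line.

    Returns (marker, left, right) rows. Differing lines are marked '*N',
    where N counts the diffs up to and including that line; matching lines
    get an empty marker. A missing line on either side compares as "".
    """
    rows, count = [], 0
    for left, right in zip_longest(a, b, fillvalue=""):
        if left == right:
            rows.append(("", left, right))
        else:
            count += 1
            rows.append((f"*{count}", left, right))
    return rows


def changed_spans(left: str, right: str) -> List[Tuple[str, str]]:
    """The character runs that differ between two versions of one line."""
    matcher = difflib.SequenceMatcher(None, left, right)
    return [(left[i1:i2], right[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]
```

The flagged rows are exactly the ones worth showing to a proofer for closer scrutiny; the matching rows can be skipped.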
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100403/4631b9cf/attachment-0001.html>

From joyce.b.wilson at sbcglobal.net  Mon Apr  5 05:52:13 2010
From: joyce.b.wilson at sbcglobal.net (Joyce Wilson)
Date: Mon, 05 Apr 2010 07:52:13 -0500
Subject: [gutvol-d] Full text catalog search
Message-ID: <4BB9DCFD.3090008@sbcglobal.net>

The "Full Text" search option here 
<http://www.gutenberg.org/catalog/world/search> is broken, and has been 
for at least a year.  If the thinking is that it doesn't need to be 
fixed because some of the alternative searches at the bottom of this 
page <http://www.gutenberg.org/catalog/> are acceptable substitutes, 
then it makes sense to remove it as an option on the "Advanced Search" 
page.  To have the option there but never find any results with it is 
just confusing.

Regards,
Joyce

From Bowerbird at aol.com  Mon Apr  5 16:27:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 5 Apr 2010 19:27:51 EDT
Subject: [gutvol-d] the sitka book from fadedpage is now posted
Message-ID: <3ad60.6f52c518.38ebcbf7@aol.com>

the sitka book, which was produced over at fadedpage.com,
was posted over the weekend, so i will finish my version and
do a comparison on it.

the fadedpage/p.g. version is here:
>    http://www.gutenberg.org/files/31862/31862.txt

i can, however, already report that the posted version _does_
contain a few errors.   from page 19 in the original p-book:
>    http://z-m-l.com/go/sitka/sitkap019.png
"yukali" is incorrectly entered as "yuhali", and "prazdnik" is
incorrectly given as "prasdnik".   in addition, both words are
italicized in the original, but unitalicized in the p.g. version,
for a grand total of 4 errors on this single page.   (troubling
is the fact that "yukali" had another occurrence in the book,
and the similarity of these two non-dictionary words in the
book's word-list _should've_ triggered closer examination.)
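The check described in that parenthetical, comparing the book's non-dictionary words against each other, can be sketched in Python with a plain edit-distance comparison. The dictionary, the tokenizer and the threshold are all assumptions of this sketch; the pairwise loop is quadratic, which should be tolerable for a book-sized list of unknown words.

```python
import re
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suspicious_pairs(text: str, dictionary: set, max_distance: int = 1):
    """Pairs of non-dictionary words within `max_distance` edits of each other.

    Returns (word_a, word_b, count_a, count_b); a rare spelling sitting next
    to a more common near-twin is a likely OCR or proofing error.
    """
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    unknown = sorted(w for w in counts if w not in dictionary)
    return [(a, b, counts[a], counts[b])
            for a, b in combinations(unknown, 2)
            if edit_distance(a, b) <= max_distance]
```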

ironically, there was a note in the forum about this page,
saying that it was marked as "done" incorrectly after _no_
proofing, due to an inadvertent click on a wrong button, 
a note which also then pointed out these changes which
needed to be done, but evidently rfrank missed that note.

and with nobody else doing that page, the errors persisted.

plus there are a few other errors i have already noticed...

the word "sheetkah", on page 25, is missing its italics:
>    http://z-m-l.com/go/sitka/sitkap025.png

and the period after "december 14th" should be a comma,
in the footnote (#25) which was on page 80 in the p-book:
>    http://z-m-l.com/go/sitka/sitkap080.png

so there are some things worth pointing out about all of this...

the first is that, by _my_ standard of one-error-per-10-pages,
these 6 errors are _not_ a damning indictment, not nearly so...
in and of themselves, these errors are trivial.   don't forget that.

of course, you don't like to concentrate four errors on one page.
but stuff happens.   so in terms of this particular case, for _me_,
i don't think it's a big deal.   however, in terms of the _workflow_,
the fact that a page with 4 errors can float through the system
without _anyone_ having had a "second look" at it -- or, indeed,
even a _first_ look -- is not a good sign, not a good sign at all.

moreover, according to clear results of a poll i ran over at d.p.,
the majority of people over there believe that 5 errors in a whole
_book_   is right at the maximum that they are willing to tolerate
as "acceptable".   even worse, some of them -- in a defiant act of
wishful thinking -- actually have convinced themselves that they
really do _attain_ that level of accuracy with the books they do!
now, clearly that's ridiculous, and they're just fooling themselves;
they don't actually _know_ how many errors are in their books, so
they just let themselves believe that there aren't any errors there.

but in spite of this break from reality, their _expressed_desire_
is that they release books that have 5 or fewer errors in them...

viewed from that perspective, this performance -- with 4 errors
on a single page and (at least) 6 in a posted book -- is _bad_...
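The contrast between the two standards is simple arithmetic. Writing P for the book's page count (at least 80, since footnote 25 sits on page 80), the six errors found so far compare with each budget as follows:

```latex
\begin{aligned}
\text{allowed}_{\,1\text{ per }10\text{ pages}}(P) &= \frac{P}{10} \;\ge\; \frac{80}{10} = 8 \;>\; 6,\\
\text{allowed}_{\,\text{poll}} &= 5 \;<\; 6.
\end{aligned}
```

Any book of 60 pages or more stays within the first budget with six errors (P/10 is at least 6 exactly when P is at least 60), while the poll's target is exceeded regardless of length.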

-bowerbird

p.s.   i'd point to the note that was left about that page, except that
rfrank seems to take down the thread for a book once it is posted...
once again, this data is important for a proper analysis of the test.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100405/30133e53/attachment.html>

From Bowerbird at aol.com  Tue Apr  6 15:48:25 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Apr 2010 18:48:25 EDT
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
Message-ID: <53001.fb9fc78.38ed1439@aol.com>

rfrank has responded, over on the fadedpage forums,
to the note i sent yesterday on the posted "sitka" book.

as i might have expected, he fell on his sword and
took full blame for the errors that were present...

in some part, that's correct, because some errors _had_
been reported, and he simply failed to check the forum.

but that's not really the proper take-away message here.

a proofer can miss things, and still hit the "done" button,
just because they honestly felt they'd caught everything.
even the best proofer, with an accurate self-perception.

2 of the 6 errors that i reported were cases just like that.

if the only person who's gonna serve as "back-up" for that
proofer is the post-processor, that puts too much stress
on the post-processor.   that's exactly what d.p. has done,
and that's why they have so few volunteers for that task...

rfrank is willing to take on that stress, which is why he has
fallen on his sword.   but that's the wrong approach to take.

rfrank is also improving his tools, so they catch those glitches.
that's good, and it's part of the reason i reported these errors.

plus he'd already improved his workflow, to catch _italics_...

his improved workflow might work, _if_ the o.c.r. recognizes
the styling correctly.   but on any books with heavy formatting,
going back and reinserting the formatting might be a real pain.

a better approach, in my view, would be the one that rfrank
started with, when he put up his site, which is to encourage
the volunteers to do _both_ the proofing and the formatting.

it's really _not_ that difficult to do both these tasks together.

this is especially true if you give people a _formatted_display_,
because then the obtrusive markup is cleared from the screen,
and it's replaced by a rendering that resembles the actual scan.

i demonstrated this technique with my own proofing site, and
showed the additional strength that questionable words can be
highlighted in a different color, maximizing the value of a flag.
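A formatted display with flagged words might be produced along these lines. This Python sketch assumes the proofing text uses PG-style _underscores_ for italics and that the set of questionable words (for example, words missing from a dictionary) is computed elsewhere; the `flag` class name is hypothetical, and a stylesheet would supply its color.

```python
import html
import re

ITALIC = re.compile(r"_([^_]+)_")
WORD = re.compile(r"([A-Za-z]+)")  # capturing group: split() keeps the words


def _mark_words(fragment: str, flagged: set) -> str:
    """Escape the fragment and wrap flagged words in <span class="flag">."""
    parts = WORD.split(fragment)  # even indices: non-words, odd: words
    for i, part in enumerate(parts):
        if i % 2:
            if part.lower() in flagged:
                parts[i] = f'<span class="flag">{part}</span>'
        else:
            parts[i] = html.escape(part, quote=False)
    return "".join(parts)


def render_line(line: str, flagged: set) -> str:
    """Render one proofing line as HTML: _x_ becomes <i>x</i>, flags colored."""
    out, pos = [], 0
    for m in ITALIC.finditer(line):
        out.append(_mark_words(line[pos:m.start()], flagged))
        out.append("<i>" + _mark_words(m.group(1), flagged) + "</i>")
        pos = m.end()
    out.append(_mark_words(line[pos:], flagged))
    return "".join(out)
```

The proofer sees italics as italics, with no underscores on screen, and a questionable word stands out by color rather than by a marker in the text.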

***

rfrank said:
>    But the most important thing I've concluded is that 
>    the majority of reportable errors in Sitka are chargeable
>    to the post-processor (me) and not the roundless system.

see, there's the "experimenter bias" that i was talking about;
he'd rather take the blame himself than blame his system...

i believe in the roundless system too.   i believe in it so much,
so strongly, that i believe the evidence can stand up for itself.


>    We don't have nearly enough participation by a realistic 
>    cross-section of typical users here at fadedpage to 
>    conclude anything based on real science or real statistics. 
>    All I can say is that I am liking the roundless system more and 
>    more. It seems to be doing its part well and continues to improve.

there isn't a "cross-section" of "typical" proofers over at fadedpage,
it's true.   the proofers there are probably much better than average.

and, as shown, even these better-than-average proofers can _miss_
errors.   we're all human.   we make mistakes.   we need to be checked.

-bowerbird

From gbuchana at teksavvy.com  Tue Apr  6 18:20:37 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Tue, 06 Apr 2010 21:20:37 -0400
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
In-Reply-To: <53001.fb9fc78.38ed1439@aol.com>
References: <53001.fb9fc78.38ed1439@aol.com>
Message-ID: <4BBBDDE5.9090309@teksavvy.com>

On 06-Apr-2010 18:48, Bowerbird at aol.com wrote:
> a proofer can miss things, and still hit the "done" button,

What would happen if the proofing system occasionally
*inserted* an error into the page and then double-checked
that the known error had been found and fixed?  eg: find
a correctly spelled word with "m" in it and change to "rn".
Choose from amongst a list of 100 similar things.

It might be a little paternalistic towards the proofreader,
but would give the automated system some basis for judging
whether the proofreader had *actually* proofed the page
or not.  It might also help to keep proofers paying attention.

The final test for correctness is then that
(1) the fake error is found and fixed, and
(2) nothing else changed.
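A minimal Python sketch of the injection check proposed above, for illustration only. The substitution list, the names `inject_error` and `check_proofed`, and the random choice of candidate word are all hypothetical; no proofing site is known to implement it this way.

```python
import random

# Hypothetical (good, bad) OCR-style substitutions; Gardner suggests ~100 pairs.
SUBSTITUTIONS = [("m", "rn"), ("h", "li"), ("cl", "d"), ("e", "c")]


def inject_error(page, rng):
    """Plant one fake error in a page.

    Returns (altered_page, original_word, fake_word), or None when no word
    on the page contains any of the substitution targets.
    """
    words = page.split(" ")  # splitting on single spaces round-trips exactly
    candidates = [
        (i, good, bad)
        for i, word in enumerate(words)
        for good, bad in SUBSTITUTIONS
        if good in word
    ]
    if not candidates:
        return None
    i, good, bad = rng.choice(candidates)
    original = words[i]
    fake = original.replace(good, bad, 1)
    words[i] = fake
    return " ".join(words), original, fake


def check_proofed(original_page, proofed_page):
    """Both conditions collapsed into one comparison: the proofed page equals
    the pre-injection page only if (1) the fake error was fixed and
    (2) nothing else changed."""
    return proofed_page == original_page
```

Because the whole page is compared, this also fails a proofer who correctly fixes a genuine error elsewhere on the same page; a real system would have to look at each individual change instead.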

I haven't been paying much attention to this thread, so
apologies if you've all covered this ground already.

============================================================
Gardner Buchanan                     <gbuchana at teksavvy.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.

From dakretz at gmail.com  Tue Apr  6 19:03:17 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 6 Apr 2010 19:03:17 -0700
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
In-Reply-To: <4BBBDDE5.9090309@teksavvy.com>
References: <53001.fb9fc78.38ed1439@aol.com> <4BBBDDE5.9090309@teksavvy.com>
Message-ID: <r2i627d59b81004061903i18ad4cd4p2c8b929ec6bd52db@mail.gmail.com>

This comes up all the time.

a. It's socially unacceptable by acclamation.
b. All you prove is that the proofer did or didn't catch an error you
already knew about.


From Bowerbird at aol.com  Tue Apr  6 19:49:27 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 6 Apr 2010 22:49:27 EDT
Subject: [gutvol-d] Re: the sitka book from fadedpage is now posted
Message-ID: <42bf5.35a96207.38ed4cb7@aol.com>

dakretz said:
>   This comes up all the time.

it has been suggested before, yes...


>    a. It's socially unacceptable by acclamation.

...and is usually deemed socially unacceptable, true...

but i'm not sure that that couldn't be turned around,
assuming people are genuinely looking to be tested,
in a sincere desire to improve.   it is a bit repugnant,
in my opinion...   but so is the secretive collection of
data on proofing accuracy that rfrank is doing now,
especially since he's not revealing that to _anyone_,
including the very people he is collecting data on...


>    b. All you prove is that the proofer did or 
>    didn't catch an error you already knew about.

well, i don't necessarily agree with that.   i'd believe
your ability to catch the introduced errors would be
highly correlated with your general overall accuracy.
so if you value that metric, there's one good purpose.
(but, for the record, i believe that metric is valueless.)

further, the ability to detect the error _immediately_,
and show it to the proofer on-the-spot might well be
the very best feedback necessary to get their attention,
an argument which hasn't been fully considered before.

so there _could_ be some real value in this technique.

now, i certainly wouldn't do such error-injection on a
proofer without their express approval, because it is
entirely too _sneaky_ when you are doing it that way,
and jeopardizes the trust-relationship, which is vital.

but if a proofer _asked_ for it, i think it would be ok...

i never subscribed to the "i need to have some errors
to keep myself from getting bored" philosophy...   but
for a proofer who does, this could well be an answer.

bottom line, though, i just don't think it's necessary...

-bowerbird

From jimad at msn.com  Wed Apr  7 12:15:00 2010
From: jimad at msn.com (Jim Adcock)
Date: Wed, 7 Apr 2010 12:15:00 -0700
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
In-Reply-To: <53001.fb9fc78.38ed1439@aol.com>
References: <53001.fb9fc78.38ed1439@aol.com>
Message-ID: <SNT120-DS651FB845BBE85655CE7EBAE170@phx.gbl>

>if the only person who's gonna serve as "back-up" for that proofer is the
>post-processor, that puts too much stress on the post-processor.  that's
>exactly what d.p. has done, and that's why they have so few volunteers for
>that task...

Agreed that PP at DP is stressful, in part because of the high standards
expected there of PPs -- don't see how one can meet their expectations
without at least doing a SR pass oneself, which supposedly isn't required.
But to me a more fundamental part of the problem is that it is relatively
easy to start a book at DP and then expect someone else to fix the
"problems" at the other end, leading a PP to stare at a potential PP book
and say "why in g--- name would someone have started this book project in
the first place???"  Again, it's pretty easy to find a book project where one
can pretty easily predict it will be read 1000 times more than some other
book.  Which one would you rather PP?  A book that gets read 1000 times a year,
or a book that gets read 1 time a year?  If you could "solo" a compelling book
yourself, or "PP" a drab book at DP, which would you choose?

The other problem is if you PP a book you haven't lived with through the
entire process then you "start from scratch" with your knowledge of the
book, its author, its proofing problems, etc, and getting up to speed IMHO
is almost as painful as doing the whole project "solo" in the first place.
Don't get me wrong, I *love* the feeling of having others proof one's choice
of books at DP and say "wow, this is a really cool book to be proofing!"


From jimad at msn.com  Wed Apr  7 12:19:52 2010
From: jimad at msn.com (Jim Adcock)
Date: Wed, 7 Apr 2010 12:19:52 -0700
Subject: [gutvol-d] Re: : the sitka book from fadedpage is now posted
In-Reply-To: <4BBBDDE5.9090309@teksavvy.com>
References: <53001.fb9fc78.38ed1439@aol.com> <4BBBDDE5.9090309@teksavvy.com>
Message-ID: <SNT120-DS155B8EE078938D74CF7E29AE170@phx.gbl>

>What would happen if the proofing system occasionally
>*inserted* an error into the page and the double-checked
>that the known error had been found and fixed?  eg: find
>a correctly spelled word with "m" in it and change to "rn".
>Choose from amongst a list of 100 similar things.

The problem I see is that it would be hard for the system to "model" the
kinds of errors that remain unseen in the book, thus it would train proofers
to look for the wrong things.  Also, if the system can introduce errors, it
had better know how to take them back out.  And it's not fair to introduce
errors on pages that a particular individual has already proofed, for
example if for a given page I P2 and PP then I do NOT want to have to go
into "paranoid mode" and look during PP for errors introduced after I had
already P2'ed. Proofing *already* leads too much to the feeling that one is
chasing one's tail, going around in circles, and "didn't I fix that
one already!"


From jimad at msn.com  Wed Apr  7 12:25:02 2010
From: jimad at msn.com (Jim Adcock)
Date: Wed, 7 Apr 2010 12:25:02 -0700
Subject: [gutvol-d] Re: the sitka book from fadedpage is now posted
In-Reply-To: <42bf5.35a96207.38ed4cb7@aol.com>
References: <42bf5.35a96207.38ed4cb7@aol.com>
Message-ID: <SNT120-DS9515EC7ADE68657E3907DAE170@phx.gbl>

>but i'm not sure that that couldn't be turned around, assuming people are
>genuinely looking to be tested, in a sincere desire to improve.  it is a bit
>repugnant, in my opinion...

Don't see why this should be more repugnant than the other testings and
scorings that DP does on people? Except maybe the assumption is that now one
can always "hump it" for 50 pages and increase one's score enough to qualify
for the next level -- and then slack back off into "cruise mode?"


From Bowerbird at aol.com  Thu Apr  8 01:42:07 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 8 Apr 2010 04:42:07 EDT
Subject: [gutvol-d] how to create a spellcheck workflow
Message-ID: <462cd.1b65839.38eef0df@aol.com>

ok, well gee...

i had gotten the firm impression that rfrank had read
the d.p. forum thread where the development of their
"wordcheck" was discussed (thoroughly -- 30 pages).
d.p. calls it "wordcheck", but it's basically spellcheck.

so why did i think he had read it?

because he incorporated several of the things which
i had suggested (to no good effect) in that thread...

i won't bother reciting all of those particulars, since
he might well have come up with the ideas himself,
and nobody really cares anyway, not even me...

but i mention it because now i have been convinced
that roger must _not_ have read that thread, based
on some current discussions on his fadedpage site...

or, if he did read it, he didn't "get it", but given those
current discussions, he might now be more receptive.

so i will run through a quick little "how-to" refresher
on how to design and build a spellcheck functionality,
and incorporate it into the overall workflow of a book,
so roger has the benefit of my wisdom here...         ;+)

as you'll see, this will hit a very wide variety of topics.
note that what we're talking about here is _primarily_
executed during _preprocessing_, but there are some
follow-on thoughts that apply to the proofing as well.

***

0.   set o.c.r. parameters correctly.   don't dehyphenate!
name your files wisely.   look both ways before crossing.

1.   the first thing you should do with your o.c.r. results
is to make the few global changes you can do _blindly_.
the very first one is to strip trailing spaces from all lines.
another will be to replace two spaces with a single space.
yet another is to change a linebreak-doublequote-space
combination to eliminate the space, since it's superfluous.
likewise, change all cases of space-doublequote-linebreak.
you get the drift.   oh, by the way, don't do what d.p. does!
_retain_ the runheads, and pagenumbers; you need 'em...
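The blind global changes in step 1 can be sketched in Python as below. This is an illustrative reading of the four rules listed, not bowerbird's actual tool, and `blind_cleanup` is a hypothetical name. Collapsing runs of spaces also flattens any leading indentation, so skip that rule if indentation carries meaning in your text.

```python
import re


def blind_cleanup(text):
    """Global changes that are safe to make without looking (step 1)."""
    # strip trailing spaces (and tabs) from every line
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # replace runs of two or more spaces with a single space
    text = re.sub(r" {2,}", " ", text)
    # linebreak-doublequote-space: drop the space after a line-initial quote
    text = re.sub(r'^" ', '"', text, flags=re.MULTILINE)
    # space-doublequote-linebreak: drop the space before a line-final quote
    text = re.sub(r' "$', '"', text, flags=re.MULTILINE)
    return text
```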

2.   the next thing you should do is clean up the runheads.
this has little bearing on our general "spellcheck" topic,
but i include it here because it's always your second step.

3.   the third thing on your list is to fix all paragraphing...
again, not much to do with spellcheck, but it _is_ step #3.

4.   now we can focus quite specifically on spellcheck stuff.
we will take the o.c.r. and run it through a program i wrote
which pulls out all the words _not_ present in its dictionary.
i think rfrank has his own program that does the same thing,
or something similar enough.   i'll make mine available too.
this is the first draft of your "bad-words" list for this book.
(note this is not the same as how d.p. defines "bad words".)
in this regard, use a good dictionary for this first check.
(this is something that rfrank hasn't done correctly thus far.)
the dictionary i use is quite good, and it can be found here:
>    http://z-m-l.com/go/regulardictionary.txt
just so you have a feel for the output from this program,
i have posted it for the "sitka" book we've been discussing:
>    http://z-m-l.com/go/sitka/sitka-reversedictionary.txt
that output was generated in 5 seconds, so it's pretty fast...
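A minimal Python sketch of the step 4 dictionary check, assuming a plain-text dictionary with one word per line (such as the regulardictionary.txt file linked above). The tokenizer regex (ASCII letters only), the case rule, and the function names are assumptions, not a description of bowerbird's or rfrank's programs. It returns counts too, since step 6 relies on frequency.

```python
import re
from collections import Counter

# Assumed tokenizer: letters, optionally joined by internal apostrophes or
# hyphens ("don't", "to-day"). Non-ASCII letters are not handled.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def load_dictionary(lines):
    """One word per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def words_not_in_dictionary(text, dictionary):
    """Return {word: count} for every word in the text the dictionary lacks.

    A capitalized word also passes if its lowercase form is listed, so
    sentence-initial words are not reported.
    """
    counts = Counter(WORD_RE.findall(text))
    return {
        word: n
        for word, n in counts.items()
        if word not in dictionary and word.lower() not in dictionary
    }
```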

5.   you'll see, on viewing your list of supposed "bad-words",
that a bunch are not "bad-words" at all.   some of 'em will be
character-names, or jargon specific to your particular book.
some will be hyphenated fragments, some compound-words.   
you can delete these words from the list now, if you want,
but you shouldn't necessarily feel a great need to do that.
if you're looking at the list of words from the "sitka" book,
you will also notice that i separated the initial-caps words
from all-lowercase ones.   there is a good reason for that.
due to proper names, _most_ initial-cap words are correct,
whereas most of the all-lowercase words are _incorrect_...
separating the lists makes it easier to focus your attention.
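Separating initial-caps words from the rest, as step 5 describes, takes only a few lines. A hypothetical helper, sketched in Python; mixed-case oddities simply fall into the second bucket.

```python
def split_by_case(bad_words):
    """Separate initial-cap words (mostly proper names, mostly correct)
    from the rest (mostly lowercase, mostly scannos). Accepts a list or a
    dict of word -> count."""
    initial_caps = sorted(w for w in bad_words if w[:1].isupper())
    lowercase = sorted(w for w in bad_words if not w[:1].isupper())
    return initial_caps, lowercase
```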

6.   your dictionary-check program should also spit out the
_frequency_ of each bad-word.   you'll use that information
to cull some words from this list of "bad-words".   this you
_will_ want to do, most definitely, so _sort_ on frequency.
i fought tooth-and-nail on this with d.p. (i lost, of course),
but you can take this to the bank that as long as there are
a mere 4-plus occurrences of a specific string in the o.c.r.,
you can (i.e., should) delete it from this list of "bad-words".
yes, some of the words _might_indeed_ be bad, but unless
you're positive of that, you should delete 'em from the list.
and yes, this means that those words will _not_ be flagged.
but trust me, if there are 4 or more occurrences of a scanno,
your proofers will find at least _one_.   and whenever they
find _one_ "bad-word" that wasn't on that "bad-word" list,
you will automatically search the rest of the book, and thus
find those _other_ occurrences as well, so you can fix them.
i did _not_ include frequency information in my "sitka" list,
because i didn't want to make everything so bloody obvious,
and because i want you to discover the importance of that
frequency data for yourself, so it burns itself in your brain.
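Step 6's frequency sort and cull might look like this in Python. The threshold of 4 comes from the text; the function names and the alphabetical tie-breaking are assumptions.

```python
def by_frequency(counts):
    """Sort (word, count) pairs, most frequent first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def cull_frequent(counts, threshold=4):
    """Drop strings occurring `threshold` or more times from the bad-words list.

    Returns (kept, culled): `kept` stays on the bad-words list and will be
    flagged; `culled` will not be flagged, on the theory that proofers will
    catch at least one occurrence of any real scanno among them.
    """
    kept = {w: n for w, n in counts.items() if n < threshold}
    culled = sorted(w for w, n in counts.items() if n >= threshold)
    return kept, culled
```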

7.   when you narrow your focus to the words that are _not_
in the dictionary, and which occur only two or three times in
the book, you'll find you can be very productive fixing errors.
for many of the words, it'll be obvious what they should be...
building a tool that will take you _immediately_ to each word,
plus show the scan alongside, will turn you into a _machine,_
an awesome and devastatingly efficient error-fixing machine.
for this very first pass, i recommend you look only at words
you're confident are scanning errors.   (they're easy to spot.)
on your next pass, you can look at more questionable words.
also pay attention to words with several variants that'll thus
sort next to each other.   (see the asterisks in the "sitka" list.)
it'll almost certainly be the case that one variant is a scanno.
(97 times out of 100, it is the one with fewer occurrences.)
also, in a system where you're gonna have proofers doing a
word-by-word proofing, don't even bother to look at words
which look kinda reasonable.   and don't ever bother to view
any words with only _one_ occurrence.   those will be flagged,
and it's no less efficient to have the proofer see if it's correct
than to have _you_ see if it's correct.   you wanna be efficient
in preprocessing.   efficiency is _the_point_ of preprocessing!
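A sketch of two step 7 heuristics: picking out the two-to-three-occurrence band, and reporting the rarer member of a pair of near-identical spellings that sort next to each other. The similarity test uses Python's `difflib.SequenceMatcher`; the 0.8 cutoff and the function names are arbitrary assumptions, not part of the original workflow. Because only sorted neighbours are compared, variants that differ in their first letters will not be paired.

```python
from difflib import SequenceMatcher


def review_band(counts, low=2, high=3):
    """Words worth a preprocessing look: 2-3 occurrences. Single occurrences
    are left for the proofers (they will be flagged); 4+ were already culled."""
    return sorted(w for w, n in counts.items() if low <= n <= high)


def variant_suspects(counts, min_ratio=0.8):
    """Compare neighbours in sorted order; for near-identical spellings,
    report the less frequent one as the likely scanno.

    Returns (suspect, more_frequent_variant) pairs; equal counts are skipped.
    """
    words = sorted(counts, key=str.lower)
    suspects = []
    for a, b in zip(words, words[1:]):
        if counts[a] == counts[b]:
            continue
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio:
            rare, common = (a, b) if counts[a] < counts[b] else (b, a)
            suspects.append((rare, common))
    return suspects
```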

8.   the next thing you want to look at are compound-words.
my tool also separates compound-words to their own listing.
as per usual, perusal of the compound-words will show that
some are obviously correct, others are obviously incorrect,
and a bunch where a judgment can only happen with a scan.
the other thing about compound-words, which you will want
your tools to handle, is a check against the rest of the book,
to find any other instances of the compound where the parts
are separated by a space (two words) or joined (one word)...
that information will help you decide how to treat the word.
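For step 8, a Python sketch that counts how often a hyphenated compound also appears as two words or as one word elsewhere in the book. Matching is case-insensitive and `compound_forms` is a hypothetical name; end-line hyphenates are deliberately not counted, since step 9 handles them separately.

```python
import re


def compound_forms(text, compound):
    """Count the hyphenated, spaced, and joined forms of a compound."""
    parts = compound.split("-")

    def count(pattern):
        return len(re.findall(r"\b" + pattern + r"\b", text, flags=re.IGNORECASE))

    return {
        "hyphenated": count(r"\-".join(re.escape(p) for p in parts)),
        "spaced": count(r"\s+".join(re.escape(p) for p in parts)),
        "joined": count("".join(re.escape(p) for p in parts)),
    }
```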

9.   the other thing you're gonna check is end-line hyphenates.
remember that i told you _not_ to have the o.c.r. rejoin them.
my philosophy is to _retain_ the end-line hyphenates through
my final product, but i'm not arguing for that position _here_.
you can rejoin the end-line hyphenates if you want to do that.
just don't do it until _after_ all of the proofing is done, because
having original linebreaks makes a page much easier to proof.
and besides, in determining whether or not the rejoined word
contains a dash or not, we need to have uncompromised data.
if your o.c.r. program destroys that data (by joining the word
in the way that _its_ dictionary dictates), then you might just
be doing a disservice to the way that the _book_ did things...
for your spellcheck, however, you can ignore all of that stuff.
internal to your spellcheck tool, rejoin the end-line hyphenate
by eliminating the linebreak, and test the resultant compound.
if it passes, fine.   if not, try again with the dash removed too.
if it _still_ doesn't pass, flag _both_ portions of the compound.
(note that if you are using _my_ dictionary, mentioned above,
it contains no compounds, so you would skip that first check.)
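The step 9 rejoin-and-test logic, sketched in Python under the assumption that the dictionary is a set of lowercase words and that an end-line hyphenate appears as a run of letters, a hyphen, and a newline. Passing `dictionary_has_compounds=False` skips the with-dash test, as suggested for a compound-free dictionary. The names are hypothetical.

```python
import re

# Assumed shape of an end-line hyphenate: letters, "-", newline, letters.
HYPHENATE_RE = re.compile(r"([A-Za-z]+)-\n([A-Za-z]+)")


def check_endline_hyphenates(text, dictionary, dictionary_has_compounds=True):
    """Return both halves of every end-line hyphenate that fails the test."""
    flagged = []
    for match in HYPHENATE_RE.finditer(text):
        first, second = match.group(1), match.group(2)
        candidates = [first + second]  # dash removed
        if dictionary_has_compounds:
            candidates.insert(0, f"{first}-{second}")  # try with the dash first
        if any(c.lower() in dictionary for c in candidates):
            continue
        flagged.extend([first, second])  # flag _both_ portions
    return flagged
```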

10.   so at this point in time, you have a great "bad-words" list.
that is half your battle.   (that's right, just _half_.)   so now you
will make your "good-words" list.   to do this, just run the text
through your dictionary-checker using your "bad-words" list
as the dictionary.   thus, the output will be all of the words in
your text which are _not_ included on your "bad-words" list.
(or you can just use my tool, which can also create this list.)
from now on, this "good-words" list will be your dictionary...
got that?   you're not using the huge dictionary file any more.
you'll use the much-more-compact "good-words" list instead.
so now you have a "bad-words" list and a "good-words" list,
which -- taken together -- comprise all words in your book.
there are other jobs that you'll do during preprocessing, but
this is all the spellcheck work that's needed during that stage.
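Building the step 10 "good-words" list amounts to a set difference. A minimal Python sketch, with an assumed tokenizer and a hypothetical function name:

```python
import re

# Assumed tokenizer: letters with internal apostrophes or hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def make_good_words(text, bad_words):
    """Every distinct word in the text that is not on the bad-words list.

    Equivalent to running the dictionary check with the bad-words list
    as the dictionary and keeping the words it reports.
    """
    return sorted(set(WORD_RE.findall(text)) - set(bad_words))
```

Together, the two lists then cover every distinct word in the book.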

11.   so now we will move on from preprocessing to proofing,
which takes us to "flagging" -- highlighting possible errors...
a word should be flagged if it appears in the "bad-words" list.
a word _might_ be flagged if it's not on the "good-words" list,
perhaps in a different way; for instance, yellow instead of red.
notice that this is a slightly more nuanced way to do flagging.
rather than flag _everything_ that _might_ be wrong, we are
gonna flag _only_ the things we really _suspect_ are wrong...
underflagging is better than overflagging, because too many
flags makes us complacent; we start to check only the flags.
it's impossible for your mind to ignore the fact that most of
the flags are not actually errors, so it comes to expect that
if something is _not_ flagged, it certainly won't be an error.
but if you underflag, and the proofer spots an error that was
_not_ flagged, it primes them to be attentive to everything...
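The two-colour flagging of step 11 can be expressed as a small Python function. The colour names, the tokenizer, and `flag_page` are illustrative assumptions; a word lands in yellow when it is on neither list, for example because a proofer has just typed it.

```python
import re

# Assumed tokenizer: letters with internal apostrophes or hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def flag_level(word, bad_words, good_words):
    """'red' for a suspected error, 'yellow' for an unvetted word, else None."""
    if word in bad_words:
        return "red"
    if word not in good_words:
        return "yellow"  # e.g. a word a proofer just typed in
    return None


def flag_page(page, bad_words, good_words):
    """List (word, level) for each flagged word on a page, in reading order."""
    flags = []
    for word in WORD_RE.findall(page):
        level = flag_level(word, bad_words, good_words)
        if level is not None:
            flags.append((word, level))
    return flags
```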

12.   the other thing that's extremely important here is that,
when the book is finished, we should have resolved all flags.
every word in the book will be there on the "good-words" list,
and the "bad-words" list will have shrunk until it disappeared.
that is, every bad-word will have been checked, and if it was
"ok", it will have been moved to the "good-words" list, and
if it was not "ok", it will have been _changed_, which will
also remove it from the "bad-words" list.   the words do _not_
have to be physically removed from the "bad-words" list, but
if we check every word in the book on the "good-words" list,
and find it fine, then the "bad-words" list will be eliminated.
this complete "good-words" list is _useful,_ because we can
run the full book against spellcheck at any time, and it will
come out totally clean.   so we do that check periodically, so
we know that we haven't compromised the book's accuracy.
oh, and just so you'll know, it's quite easy to write the code
that does this check.   you simply sort the words in the book,
eliminating duplicates; then you sort the "good-words" list
(if it's not already sorted), and eliminate its duplicates too
(shouldn't be any); then the 2 outputs should be _identical._
this lets us envision the proofing process as movement of
all words on the "bad-words" list to the "good-words" list.
(put that image in your head; the visualization has utility.)
to help facilitate that movement, you need to make it easy
for proofers to put words on the "good-words" list, which is
why -- on my proofing site -- i let them add all the words for
a single page to the "good-words" list with one button-click.
(another option is a button-click for each individual word.)
the flip-side is that, in order to have a page be considered as
"finished", all flagged words on that page _must_ be cleared.
remember, words move from "bad-words" to "good-words".
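A Python sketch of the step 12 bookkeeping: the strict whole-book check (sorted, de-duplicated words compared with the good-words list), the page-finished rule, and a one-click page approval. All names and the tokenizer are hypothetical. As the text says, corrected words need not be physically removed from the bad-words list.

```python
import re

# Assumed tokenizer: letters with internal apostrophes or hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def book_is_clean(book_text, good_words):
    """Strict test: the sorted, de-duplicated words of the book and of the
    good-words list are identical."""
    return sorted(set(WORD_RE.findall(book_text))) == sorted(set(good_words))


def page_is_finished(page, good_words):
    """A page is finished only when every word on it is a good word,
    i.e. all of its flags have been cleared."""
    return all(word in good_words for word in WORD_RE.findall(page))


def approve_page(page, bad_words, good_words):
    """One-click approval: move every word on the page to the good-words
    list (both sets are modified in place)."""
    words = set(WORD_RE.findall(page))
    good_words |= words
    bad_words -= words
```

The strict equality also fails if the good-words list still holds words that were later edited out of the book; checking that the book's words are a subset of the good-words list is a looser alternative.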

that's good enough for now.   any questions on this?        ;+)

-bowerbird

From gbuchana at teksavvy.com  Thu Apr  8 16:30:33 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Thu, 08 Apr 2010 19:30:33 -0400
Subject: [gutvol-d] Re: how to create a spellcheck workflow
In-Reply-To: <462cd.1b65839.38eef0df@aol.com>
References: <462cd.1b65839.38eef0df@aol.com>
Message-ID: <4BBE6719.2070708@teksavvy.com>

On 08-Apr-2010 04:42, Bowerbird at aol.com wrote:

> _retain_ the runheads, and pagenumbers; you need 'em...
>

I imagine you'll eventually explain, but what use is preserving
or fixing the page headings? If they can be mechanically fixed,
they were not worth much.

Do you include or exclude running headings in your word-count
dictionary analysis?  Does it matter?

============================================================
Gardner Buchanan                     <gbuchana at teksavvy.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.

From Bowerbird at aol.com  Thu Apr  8 21:56:16 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 9 Apr 2010 00:56:16 EDT
Subject: [gutvol-d] why isn't distributed proofreaders helping fadedpage?
Message-ID: <88407.6b6f18ae.38f00d70@aol.com>

why isn't distributed proofreaders helping fadedpage?

the site needs proofers, and d.p. has an _excess_,
to the degree that they are actively attempting to
stunt the work being done by the p1 volunteers...

so why not send some people over to fadedpage?

rfrank has done a lot for distributed proofreaders.

and fadedpage is showing d.p. how they can make
a roundless system work, and that it _does_ work...

so why is d.p. being so stingy?   help your brother!

-bowerbird

p.s.   and to start, some of you people who are now
working at fadedpage can occasionally make a post
in a forum at d.p. inviting people to try fadedpage...

From Bowerbird at aol.com  Thu Apr  8 22:08:45 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 9 Apr 2010 01:08:45 EDT
Subject: [gutvol-d] Re: how to create a spellcheck workflow
Message-ID: <8888c.1468b5c0.38f0105d@aol.com>

gardner said:
>    what use is preserving or fixing the page headings?
>    If they can be mechanically fixed, they were not worth much.

well, to take your second sentence first, the runheads cannot
_always_ be mechanically fixed.   sometimes they contain text
that describes the current point in the chapter, like an outline,
and aren't just limited to a boring recitation of title and author.

and returning to the question, runheads are useful since they
help you keep your bearings in the book.   even when they are
nothing but title/author, they help keep recto/verso straight...

plus -- especially when you're working on multiple projects --
it's useful to have the reminders about which book you're in...
because without 'em, after a time, the pages all look the same.


>   Do you include or exclude running headings in 
>    your word-count dictionary analysis?  Does it matter?

never paid much attention, so i guess it doesn't matter much.

but runheads are definitely excluded from search operations.

-bowerbird

From Bowerbird at aol.com  Fri Apr  9 12:25:18 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 9 Apr 2010 15:25:18 EDT
Subject: [gutvol-d] so let's talk about my collaborative proofreading site
Message-ID: <25a83.14c958b4.38f0d91e@aol.com>

ok, so let's talk about my collaborative proofreading site...

yeah, that's a real knee-slapper, isn't it?            :+)

ok, i'll talk, and y'all can just sit and listen.   or go watch t.v.

***

to see what i'm talking about, you can visit this u.r.l.:
>    http://z-m-l.com/go/sitka/editr.pl

when you go there, you're proofing the "sitka" book...

unlike other proofing systems, pages are not "assigned"
to you.   you can go and proof any page you wanna proof.
even go back the next day and proof it again, if you like.

you'll see a row of buttons at the top-left of the screen,
which includes one that says "go", and "prev", and "next"...


navigating the pages...

"prev" and "next" are what you'd expect.   they take you to
the previous page or the next page in the book, obviously.

at the end of that row of buttons, you'll see an edit-field
that has a number in it.   (or perhaps a letter and numbers.)
that number is the pagenumber.   (and, ergo, the filename.)

if you want to jump to a specific page, put its number in the
edit-field, and then click the "go" button.   boom, you're there.

the ability to navigate the book at will, and to proof any page
you like, can be extremely powerful once you learn to use it.


certifying a page as clean...

there's also an "ok" button.   that's what you will click when
you've proofed a page and found nothing at all to change.
by clicking "ok", you certify that the page is clean.


searching the book for a string...

next to "ok" is another edit-field and a "find" button next to it.
this field lets you enter a search-term, and then when you click
"find", a screen appears that lists pages with that search-term...

the line containing the word is shown, with a link to its page.

if you want to visit any of these pages, you can open them in a
separate tab/window.   and it's fine to open a bunch at one time.

for instance, say you want to check the sitka chapter-headers.
enter the term "chapter" in the edit-field, and then click "find",
and you will get a list that includes the table of contents page
as well as each of the nine pages where each chapter starts...
(red lines are a case-insensitive hit; black are case-sensitive.)
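The search behaviour described above (each matching line, its page, and a red/black colour) can be sketched in Python as follows. This reads "black" as an exact-case match and "red" as a match found only when case is ignored; that reading, the dict-of-pages input, and `search_pages` are assumptions about a site whose code is not shown.

```python
def search_pages(pages, term):
    """Find lines containing `term` across a book's pages.

    `pages` maps a page name (the pagenumber/filename) to its text.
    Returns (page, line, color) tuples: "black" for an exact-case hit,
    "red" for a hit that only matches when case is ignored.
    """
    hits = []
    lowered = term.lower()
    for name in sorted(pages):
        for line in pages[name].splitlines():
            if term in line:
                hits.append((name, line, "black"))
            elif lowered in line.lower():
                hits.append((name, line, "red"))
    return hits
```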

there are many other ways you can use the search functionality,
but we'll save the discussion about all of those for a later time...

to return from the search-results page back to a proofing page,
just use the "back" button in your browser.   you'll have to clear
the search-term out of the edit-field, or it'll do the search again.

(i should make it so it only does the search when you click the
"find" button, but that's not the way it works now, sorry folks.)


feel the power with the "command" field...

the edit-field where you enter your search-term also serves as
a "command" field.   i'll eventually have a number of commands
that you can issue, but for now there's just a couple of them...


showmap...

the first command is "showmap".   enter that, and click "find".
the program will show the "map" of files comprising the book.
each one is a clickable link, so -- as before -- you can open
any pages you like in separate tabs and proof them just fine...
("listcat", short for "list catalog", is a synonym that works too.)


concat...

another command is "concat".   enter that, and click "find",
and the program will concatenate all the text files into one,
and put them on a web-page for you.   this will allow you to
look at the entire book, save it to your machine, and so on...
you can also use the browser's "find" command, so "concat"
is useful when you need more context to a "find" operation
than the single line output from the native find command...


showcustom...

a third command is "showcustom".   this command spits out
the "custom dictionary" that has been created for the book.

***

that's enough for now.   give you a little toy to play with over
the weekend, if you like.   we'll discuss more stuff next week.

anyway, thanks for the little chat.   so, how was the t.v. show?

-bowerbird

From Bowerbird at aol.com  Fri Apr  9 14:44:30 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 9 Apr 2010 17:44:30 EDT
Subject: [gutvol-d] my report on that sitka posting
Message-ID: <ae9d2.497e68af.38f0f9be@aol.com>

ok folks, here's my book report on the "sitka" book
which was done over at fadedpage, and posted to p.g.

it's #31862 if you want to go and take a look at it:
>    http://www.gutenberg.org/files/31862/31862.txt

***

i'll discuss the high points in the body of this message;
appended you'll find the data documenting these points...

***

1.   ...stealth scanno...

the first thing to note is that a sharp-eyed proofer
caught a stealth scanno.   my system, which relies on
spellcheck heavily, is susceptible to stealth scannos,
so i'm always interested in their frequency, and whether
or not they will cause a serious problem in comprehension.

in this book, there was only 1 stealth scanno, so they were
infrequent here.   further, it wasn't a bad one...
a "heeded" was misrecognized as "needed", so no big deal.
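A spellcheck-driven system misses stealth scannos because "needed" is a real word, so it passes the check. One way to surface such cases is to look for valid look-alike words that sit one OCR-confusable letter away. This is only a sketch. The confusion pairs below are a small hypothetical sample, not a list anyone in this thread uses.

```python
# Hypothetical sample of OCR-confusable letters; real lists are longer.
CONFUSABLE = {"h": "nb", "n": "h", "b": "h", "e": "c", "c": "e"}

def stealth_candidates(word, dictionary):
    """Return dictionary words that differ from `word` by one confusable letter.

    A plain spellcheck accepts `word` whenever it is in the dictionary, so a
    stealth scanno slips through; listing valid look-alikes can flag it.
    """
    found = set()
    for i, ch in enumerate(word):
        for alt_ch in CONFUSABLE.get(ch, ""):
            alt = word[:i] + alt_ch + word[i + 1:]
            if alt in dictionary:
                found.add(alt)
    return sorted(found)
```

With a dictionary that contains both words, `stealth_candidates("needed", ...)` returns `["heeded"]`, which is enough to send someone to the scan.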

***

2.   ...publisher corrections made; here's a list of 15...

rfrank made his digitization philosophy very clear here,
as he made a number of _corrections_ to the original book.

and i agree!   i'm firmly against the "transcription" model,
which holds that we should merely reproduce the print-book.

i maintain we are _republishing_ the book, and therefore
i strongly believe we need to correct any errors we find.

and like me, roger doesn't bother with a "list of changes",
either.   i don't think it's necessary to make such a list.
they're boring.   just make the change and keep on moving...

however, there is one aspect to note about this: it means
that rfrank sets a higher standard for his work, and so
if we find errors in the p-book which he should've found,
then he is now accountable for having missed those errors.

***

3.   ...more fixes, requiring research outside the book...

to his credit, rfrank went out of his way to check things,
so as to make sure that he was _only_ fixing actual errors,
and not introducing bad changes.   sure, it's not too hard
to look up a sailor's name on wikipedia to spell it right,
or to check that a word means "an edict from the czar", but
it takes time, and it improves the output, so give kudos...

***

4.   ...he even did "corrections" i might not have done...

as is often the case when you adopt a "re-publication"
philosophy as opposed to "transcription", some of the
changes that you make might not get universal agreement.

but that comes with the territory, so you roll with it.

***

5.   ...and some i _certainly_ wouldn't have done...

and you continue rolling, even when disagreement is thick.

***

6.   ...and yet he didn't do some that i woulda done...

everybody has their own opinions...         :+)

i won't count these as "errors".
but _i_would_ do them differently...

***

7.   ...and i disagree on how he did some formatting...

once again, everybody's got an opinion...

for the first 2 cases shown, i put a comma after
the period, just so the following lowercase letter
wouldn't throw a flag every time i did that check.
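The check being dodged here can be sketched as a regex that flags a period followed by whitespace and a lowercase letter. The exact pattern is an assumption, since the post doesn't give one, and this version only looks within a single line.

```python
import re

# A period, an optional closing quote or bracket, whitespace, then a
# lowercase letter; this can mean a comma misread as a period, or just
# an abbreviation that a human needs to glance at.
PERIOD_THEN_LOWER = re.compile(r"\.[\"')\]]?\s+[a-z]")

def period_lowercase_flags(lines):
    """Return (line_number, line) pairs that trip the check."""
    return [(n, line) for n, line in enumerate(lines, 1)
            if PERIOD_THEN_LOWER.search(line)]
```

"Co. the" trips it. "Co., the" and "Ib., p. 52." do not, which is exactly why the comma makes the flag go away.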

the third case is a bit more complicated...

i believe that the formatting rules used by rfrank
-- and d.p. in general -- are slightly misguided...

first of all, they are arbitrary, with wiggle-room,
which means they're hard to understand and interpret.
i think rules that are firmly grounded work better...

second, in many situations the rules instruct us to 
put the formatting toggles _inside_ the word itself
(at least inside the punctuation which surrounds it).

in this case, for instance, the _italicized_ word
is placed inside some _non-italicized_ parentheses.

so an italic letter butts a non-italic parenthesis.

but _none_ of the rendering engines we have today
(like browsers and e-book apps) are capable enough
to make that look reasonably good, let alone nice.

indeed, most of the time it looks absolutely dreadful;
a shift from non-italic to italic (and back) is ugly.

so even when it doesn't "make sense" grammatically,
my recommendation is to italicize the entire string.
it looks better, so that's what i do; you should too.
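Here is a small sketch of that recommendation for underscore markup. It pulls any brackets and trailing punctuation that touch an italic span inside the underscores. The bracket and punctuation sets are assumptions, and the code only illustrates the idea.

```python
import re

# Opening brackets directly before an italic span, and closing brackets or
# punctuation directly after it, are pulled inside the underscores.
ITALIC_SPAN = re.compile(r"([(\[]*)_([^_\n]+)_([)\],.;:!?]*)")

def widen_italics(line):
    """Move _..._ markers outward so adjacent brackets and punctuation
    are italicized along with the word."""
    return ITALIC_SPAN.sub(r"_\1\2\3_", line)
```

For example, `of New Archangel (_Novo Arkangelsk_),` becomes `of New Archangel _(Novo Arkangelsk),_`.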

***

8.   ...the previous errors...

ok, now we get down to the brass tacks.

were there any errors in the posted book?

well, yeah.   i've already pointed out 6.

they are repeated below, to refresh memory.

but there are more errors...

***

9.   ...there are hyphenation differences...

rfrank has clearly shown that he is more than willing
to make changes to the book, so as to correct errors,
and for consistency.   for instance, he changed several
cases of the name "wrangel" to "wrangell".

so if we now _find_ errors or inconsistencies, even if 
they were present in the original book, we'll hold him 
accountable for not having made the proper corrections.

it's only fair.

and yes, the book had several hyphenation inconsistencies.

rfrank may have found and fixed some, i don't know, but
i _do_ know that there were some that he did _not_ fix...

do you like the word "sealion"?   or is "sea-lion" better?
i like the second.   the book was split 50/50.   rfrank did
3 of them one way, then left the final 1 the other way...
i'd call the error-count 2, but you can call it 1, or 3.

there were 2 cases of "far-off" and 1 case of "far off",
so i'd go with "far-off", and raise the error-count by 1.

there was 1 "far-seeing" and 1 "far seeing".   ironically,
they're both on the same page.   talk about short-sighted.
call 1 of 'em an error.

there was 1 "guest-house" and 1 "guesthouse", so 1 error.

and 1 "ice-houses" and 1 "icehouses", so again, 1 error...

so let's say 6 hyphenation errors, due to inconsistency...
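Finding these pairs can be partly automated. For every two-part hyphenated word, count how often its closed form ("sealion") and its spaced form ("far off") also occur. This is a sketch only. It ignores plurals, so "sea-lions" won't pair with "sealion", and it ignores end-of-line hyphens. Both of those cases had to be judged by hand in the tally above.

```python
import re
from collections import Counter

def hyphenation_variants(text):
    """For each two-part hyphenated word, count its closed and spaced forms.

    Returns {hyphenated: (hyphenated_count, closed_count, spaced_count)}
    for every compound that also appears in another form.
    """
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(words)
    pairs = Counter(zip(words, words[1:]))  # adjacent words, for "far off"
    report = {}
    for word, n in counts.items():
        parts = word.split("-")
        if len(parts) != 2:
            continue
        closed = counts["".join(parts)]
        spaced = pairs[tuple(parts)]
        if closed or spaced:
            report[word] = (n, closed, spaced)
    return report
```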

***

10.   ...and the other consistency errors...

i show 11 other consistency errors, all on names...

feel free to argue about any of them.   if you can
find someone who's willing to argue them, that is.

so far, 17 new errors...

***

11.   ...and finally, the last 2 errors...

there was 1 case of some missed formatting,
as shown clearly in the scan for this page:
>    http://z-m-l.com/go/sitka/sitkap025.png

_barabaras_,
barabaras,
^^^^^^^^^^

it's humorous to note that this missed italic was on
the same page as the missed italic i reported before.
but nobody said anything about "barabaras" earlier...

last but not least, 1 misspelled word: "gooch-heat",
which should have been "gooch-haet" -- wolf-house...
yes, it was printed wrong, but it shoulda been caught.

so 2 more, for a sum of 19 new errors, plus the original 6,
making a grand total of 25.   that's twice my usual standard
-- 1 error per 10 pages -- but since _most_ of these errors
were also present in the p-book (along with many more which
were corrected by rfrank), i'd call this a good digitization.
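Here is the arithmetic behind that total, spelled out. The category counts are the ones given above. The book's page count isn't stated, so the rate helper takes it as a parameter.

```python
# Counts as stated in the post; the makeup of the "previous 6" is taken
# as given rather than re-derived here.
new_errors = {
    "hyphenation inconsistencies": 6,
    "name consistency": 11,
    "missed italics": 1,
    "misspelling": 1,
}
previous_errors = 6

def error_rate_per_10_pages(errors, pages):
    """Errors per 10 pages; `pages` must be supplied (not stated in the post)."""
    return errors * 10 / pages

new_total = sum(new_errors.values())       # 19
grand_total = new_total + previous_errors  # 25
```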

***

in conclusion...

many people will say that most of these errors are trivial.

i wouldn't argue with them.

at the same time, i'm compulsive enough to fix the trivial.

i don't want any errors in my books.   even "trivial" ones...

i'm guessing that roger will want to fix these errors too.

i'm also guessing that roger will say most of these errors
can be laid at the feet of the postprocessor, so that his
"roundless" system has passed this test with flying colors.

and, in a sense, he's right, in that most of these errors
_should_ be located within the realm of the postprocessor.
(or -- to once again stress my model -- the preprocessor.)

but to give the postprocessor the _luxury_ of enough time
to do the job of catching errors like these, we must have
the proofers perform as many duties as they possibly can.

when a page can conceivably be seen by only _one_ proofer,
then a postprocessor simply can't have enough faith that
each page is correct, and will inevitably perform checks
they would not do if they believed every page was solid...

if you're worried about simple things like missing italics,
you just won't have the focus to look for more subtle stuff.

use your proofers to make sure every page is rock-solid.

give the postprocessor the luxury to _polish_ the book...

-bowerbird

p.s.   here's a "bonus error", thrown in for good measure.
"blockhouse" isn't capitalized elsewhere in the book, and
if we did capitalize it here, we'd capitalize "upper" too.

>    near the site of the upper blockhouse. Her successor,
>    near the site of the upper Blockhouse. Her successor,
>    ===========================^^^^^^^^^^^^^^^^^^^^^^^^^^

===============================================================


1.   ...stealth scanno...

>    and was needed. Captain A. Holmes A'Court,
>    and was heeded. Captain A. Holmes A'Court,
>    ========^=================================


2.   ...publisher corrections made; here's a list of 15...

>    Narative of a Voyage Round the World.
>    Narrative of a Voyage Round the World.
>    ===^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    pupil.'
>    pupil.'"
>    =======^

>    an air of prosperity prevaded the place.
>    an air of prosperity pervaded the place.
>    ======================^^================

>    Her successor, the sceond Princess
>    Her successor, the second Princess
>    ====================^^============

>    in these waters bought skins for mere trifles, some for
>    in these waters, bought skins for mere trifles, some for
>    ===============^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    bath house
>    a bath house
>    ^^^^^=^^^^

>    island but
>    islands but
>    ======^^^^

>    ungents, combine to make
>    unguents, combine to make
>    ===^^^^^^^^^^^^^^^^^^^^^

>    Hudson Bay Co. the Russian ships that sailed
>    Hudson's Bay Co. the Russian ships that sailed
>    ======^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    of the educational institution of Sitka.
>    of the educational institutions of Sitka.
>    ========================^^^^^^^

>    The liquid refreshments serve to him
>    The liquid refreshments served to him
>    =============================^^^^^^^

>    the villianous liquor called "hoochinoo"
>    the villainous liquor called "hoochinoo"
>    ========^^==============================

>    island to the broad Pacific. What were the thoughts
>    islands to the broad Pacific. What were the thoughts
>    ======^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    or "Ranche," there is choice of two streets,
>    or "Ranche," there is a choice of two streets,
>    ======================^^^^^^^^^^^^^^^^^^^^^^

>    and three who names I do not know.
>    and three whose names I do not know.
>    =============^^^^^^^^^^^^^^^^^^^^^


3.   ...more fixes, requiring research outside the book...

>    Priest Vasili Michaeloff Ocueredin,
>    Priest Vasili Michaeloff Ocheredin,
>    ===========================^=======

>    of the harvest. Her captain, Entienne Marchand,
>    of the harvest. Her captain, Etienne Marchand,
>    ==============================^^^^=^^^^^^^^^^^

>    ukaze,
>    ukase,
>    ===^==


4.   ...he even did "corrections" i might not have done...

>    all the Russians to the Republic
>    all the Russias to the Republic
>    ==============^^^^^^^^^^^^^^^^^

>    of the American troops, Gen. Jeff C. Davis,
>    of the American troops, Gen. Jefferson C. Davis,
>    =================================^^^^^^^^^^

>    is called Kosters Trail. The first
>    is called Koster's Trail. The first
>    ================^^^^^^^^^^^^^^^^^^


5.   ...and some i _certainly_ wouldn't have done...

>    articles, carved with totemic design
>    articles, carved with totemic designs
>    ====================================


6.   ...and yet he didn't do some that i woulda done...

>     --The Author
>     The Author
>    ^^^^^^^^^^^

>    in the Bering Sea,
>    in Bering Sea
>    ===^^^^^^^^^^


7.   ...and i disagree on how he did some formatting...

>    to the Hudson's Bay Co., the Russian ships that
>    to the Hudson's Bay Co. the Russian ships that
>    =======================^^^^^^^^^^^^^^^^^^^^^^^

>    Total, 400. Ib., p. 52.
>    Total, 400. Ib. p. 52.
>    ===============^^^^^^^

>    of New Archangel _(Novo Arkangelsk),_
>    of New Archangel (_Novo Arkangelsk_,)
>    =================^^===============^^^


8.   ...the previous errors...

>    _yukali_ (dried salmon),
>    yuhali (dried salmon),
>    ^^^^^^^^^^^^^^^^^^^^^^

>    _prazdnik_ (holiday)
>    prasdnik (holiday)
>    ^^^^^^^^^^^^^^^^^^

>    _Sheetkah_,
>    Sheetkah,
>    ^^^=^^^^^

>    Seattle Intelligencer, December 14th, 1868;
>    Seattle Intelligencer, December 14th. 1868;
>    ======================^^^^^^


9.   ...there are hyphenation differences...

sea-lions (#19, but at line-end)
sealion (#29)
sealion (#48)
sea-lion (#101)

>    and sea-lion meat from Kodiak, and
>    and sealion meat from Kodiak, and
>    =======^^^^^^^^^^^^^^^^^^^^^^^^^^

>    adorned with sea-lion heads
>    adorned with sealion heads
>    ================^^^^^^^^^^

far off (#35)
far-off (#42)
far-off (#80)

>    in the far-off possession of the Czar.
>    in the far off possession of the Czar.
>    ==========^===========================

far seeing (#16)
far-seeing (#16) (yes, inconsistent on the same page)

>    wealthiest and most far-seeing of the leaders
>    wealthiest and most far seeing of the leaders
>    =======================^^^^^^^^^^^^^^^^^^^^^^

>    being entertained in the guest-house were
...but yet...
>    went to the guesthouse of the kwan. All the

>    The ice-houses were near the outlet of
...but yet...
>    icehouses was laden on the ship 250 tons, and


10.   ...and the other consistency errors...

>    in Chicagof Island, sent his ship's boat
>    in Chichagoff Island, sent his ship's boat
>    =======^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    Admiral Chicagof, Minister
>    Admiral Chichagof, Minister
>    ============^^^^^^^^^^^^^^

>    and Chicagof." (A Voyage Round the World, Lisianski, p. 235.)]
>    and Chichagof." (A Voyage Round the World, Lisianski, p. 235.)]
>    ========^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    of Golden California. Captain Hagemeister came to relieve him,
>    of Golden California. Captain Hagmeister came to relieve him,
>    =================================^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    of St. Michael
>    of St. Michaels
>    ==============

>    stir the lust for vengeance. The Keeksitties,
>    stir the lust for vengeance. The Keeksittis,
>    ==========================================^^

>    [Footnote 19: Globokoe Lake was sounded to
>    [Footnote 19: Golobokoe Lake was sounded to
>    ===============^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    Prince Dmitri Maksoutoff, Dec. 2, 1863, to Oct. 18, 1867.]
>    Prince Dmitri Maksoutof, Dec. 2, 1863, to Oct. 18, 1867.]
>    =======================^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>    Annahootz, the friendly Kokwanton war chief,
>    Annahootz, the friendly Kokwantan war chief,
>    ===============================^============

for these two, we'd have to research which variant is correct:

...in one place, we have one spelling...
>    the mouth of the Indian River, or _Kolosh Ryeku_.
...but in another place it's spelled differently...
>    the river, known as the _Kolosh Ryeka_, by the Russians

...in one place, we have one spelling...
>    Captain Leontius Andreanovich Hagemeister
...but in another place it's spelled differently...
>    Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24, 1818.

...now getting down to some smaller points...

...for consistency, and clarity, i'd change these lines...
>    Total, 400, Ib. p. 52.
>    their own account. Id. Vol. 2, p. 38.
...i would use "ibid" in both, replacing the "ib." and "id."...
...but i wouldn't call a "failure" to do this "an error"...

...for consistency, i would also change this reference...
>    Russian American Archives. Corr. Vol. I, p. 275.
...so that it matched with this reference...
>    Russian American Archives, Correspondence, Vol. II, No. 108.
...but i wouldn't count a "failure" to do so as "an error"...

...and, to finish off this consistency section...

...i'd do research to find out how these archives are named...
>    Russian American Archives. Corr. Vol. I, p. 275.
>    their own account. Id. Vol. 2, p. 38.
...because in the top line, it's "vol. i", roman-style...
...while in the bottom line, it's "vol. 2", arabic-style...
...obviously, one of these versions is incorrect...

...but so much for the trivial obsessive-compulsive points...


11.   ...and finally, the last 2 errors...

_barabaras_,
barabaras,
^^^^^^^^^^

gooch-heat
gooch-haet
========^^

From cannona at fireantproductions.com  Sat Apr 10 20:45:39 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sat, 10 Apr 2010 22:45:39 -0500
Subject: [gutvol-d] Project Gutenberg DVD Release Candidate now available
	for testing
Message-ID: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>

Hi all.

Finally, I have a release candidate of the new Gutenberg DVD available
for testing.  Currently, it is only available via BitTorrent.  Please
download and test, and let me know if you find any bugs.  At this
point, I'm not looking for suggestions on which new titles to add, but
any other feedback is welcome and appreciated.

I would not yet recommend distribution of this DVD image for any
purpose other than testing.

Please do not seed this torrent after April 30, because by then, we
should have an official version available.

This image is an .iso for a dvd-9, so you will either need a way to
open or mount .iso files, or a dual layer DVD burner.

You can download the torrent via the following magnet link:
magnet:?xt=urn:btih:JB4PMPIXNTYMFCUZABAYYTX56JWQINPI

or if your client doesn't speak magnet links, you can download the
torrent file from:
http://www.fireantproductions.com/pgdvd201004-rc1.torrent

Thanks in advance for all the feedback.

Aaron Cannon

From cannona at fireantproductions.com  Sun Apr 11 06:14:09 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Sun, 11 Apr 2010 08:14:09 -0500
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now
	available for testing
In-Reply-To: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>
References: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>
Message-ID: <s2w628c29181004110614s74a16593n93083a06adeb134@mail.gmail.com>

There was a problem with the first torrent I uploaded.  It did not
work in some older clients.  I have fixed it, so if you downloaded the
torrent, you may wish to redownload it.

Thanks.

Aaron


From gbnewby at pglaf.org  Sun Apr 11 23:42:13 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 11 Apr 2010 23:42:13 -0700
Subject: [gutvol-d] Re: Newby/Hart at Illinois symposium April 15-16
In-Reply-To: <20100316021403.GA26102@pglaf.org>
References: <20100316021403.GA26102@pglaf.org>
Message-ID: <20100412064213.GA11084@pglaf.org>

We just heard this will be webcast. I do not know whether
recordings will be made available later.

  http://go.illinois.edu/50years

The schedule is at the conference site.  Michael and I are
scheduled for 1:30-3pm (CDT) Thursday April 15.

 http://50years.lis.illinois.edu/

  -- Greg

On Mon, Mar 15, 2010 at 07:14:03PM -0700, Greg Newby wrote:
> For those in the region, this might be of interest:
>   http://50years.lis.illinois.edu/
> 
> PGLAF CEO Greg Newby will join PG founder Michael Hart
> at a symposium on the U. Illinois campus.  Registration
> is free but limited.  The panel with Michael & Greg is
> scheduled for Thursday April 15 from 1:30-3pm.
> 
>   -- Greg

From Bowerbird at aol.com  Mon Apr 12 01:20:25 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 12 Apr 2010 04:20:25 EDT
Subject: [gutvol-d] more notes from the merry-go-round
Message-ID: <6576c.204495ee.38f431c9@aol.com>

gosh i love a merry-go-round...          :+(

***

here are a few notes, repeated too many times already,
but whatchagonnado?

if this doesn't make sense to you, that's probably
because it wasn't intended for you, so pay no mind.

***

any heavy bracket markup like [i]italics[/i] is obtrusive,
and that's why proofers complain about it, justifiably...

use light markup, like _italics_, and they won't complain.

better yet, have them do the proofing with an .html field
-- using a text-area field only when editing is required --
and you can use _real_ italics (and highlight it with color).
you can also use _color_ to flag your questionable words,
which makes the flagging a whole lot harder to "miss"...
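Here is a minimal sketch of that kind of display. It assumes the page text uses _underscore_ italics and that the set of questionable words comes from somewhere else, such as a spellcheck. The colors and inline styles are arbitrary choices for illustration.

```python
import html
import re

WORD = re.compile(r"([A-Za-z']+)")

def render_for_proofing(text, flagged):
    """Turn _underscore_ italics into real italics and highlight flagged words.

    `flagged` is a set of lowercase words to color (hypothetical input,
    e.g. from a spellcheck). Underscores toggle italics on and off.
    """
    out = []
    in_italics = False
    for piece in WORD.split(text):  # capturing group keeps the words
        if WORD.fullmatch(piece):
            if piece.lower() in flagged:
                out.append(f'<span style="background:yellow">{piece}</span>')
            else:
                out.append(piece)
            continue
        for ch in piece:
            if ch == "_":
                out.append("</i>" if in_italics else '<i style="color:blue">')
                in_italics = not in_italics
            else:
                out.append(html.escape(ch))
    return "".join(out)
```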

***

spacey double-quotes are _easy_ to solve if you pay heed
to the _paragraphs_.   ergo... pay heed to the paragraphs!

yes, paragraphs cross page-boundaries...   but so what?

you start with the first file, and keep track of paragraphs
while you proceed to the second, and the third, and so on.

there's a ton of redundancy in the quotes -- open, close,
open, close, open, close.   make use of that redundancy...

it's not rocket-science; it's not even difficult programming.

get over the mind-blockage you have on this topic.
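Here is a sketch of the paragraph-tracking approach, under a few stated assumptions. Pages are plain strings, paragraphs are separated by blank lines, and a page that begins a new paragraph starts with a blank line, so a paragraph that runs across a page boundary joins back together. Within each paragraph, straight double-quotes alternate open and close. An opener loses any space after it, and a closer loses any space before it. A paragraph left with an open quote is flagged unless the next paragraph starts with a quote, which is the usual convention for multi-paragraph speech.

```python
import re

def split_paragraphs(pages):
    """Join page texts and split the result into paragraphs.

    Assumption: a page that starts a new paragraph begins with a blank line;
    otherwise its first line continues the last paragraph of the previous page.
    """
    joined = "\n".join(page.rstrip("\n") for page in pages)
    return [p for p in re.split(r"\n\s*\n", joined) if p.strip()]

def fix_paragraph(par):
    """Close up spacey straight double-quotes using open/close parity."""
    pieces = par.split('"')
    # the k-th quote sits between pieces[k] and pieces[k + 1]
    for k in range(len(pieces) - 1):
        if k % 2 == 0:                      # opening quote: no space after it
            pieces[k + 1] = pieces[k + 1].lstrip(" ")
        else:                               # closing quote: no space before it
            pieces[k] = pieces[k].rstrip(" ")
    return '"'.join(pieces)

def fix_spacey_quotes(pages):
    """Return (fixed paragraphs, indexes of paragraphs whose quotes don't balance)."""
    fixed = [fix_paragraph(p) for p in split_paragraphs(pages)]
    suspects = []
    for i, par in enumerate(fixed):
        if par.count('"') % 2 == 1:
            nxt = fixed[i + 1] if i + 1 < len(fixed) else ""
            # an unclosed quote is fine only if the next paragraph reopens it
            if not nxt.lstrip().startswith('"'):
                suspects.append(i)
    return fixed, suspects
```

For example, the pages `'He said, " Hello'` and `'there. " Then he left.'` come back as one paragraph, `He said, "Hello` / `there." Then he left.`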

***

and once you _do_ get over that mind-blockage, you just
might see that books that use _single-quotes_ for dialog
aren't really all that different.   "but", you are sputtering,
"yes they are, because contractions cause big difficulties!"

poppycock.

here's a file with a list of contractions (among other stuff):
>    http://z-m-l.com/customdictionary.txt

that file's been up since june of 2007.

use that list intelligently to control those contractions...

possessives also use the single-quote, but they're easy
to deal with as well; just do a little thinking about them.
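Here is a rough sketch of how that might go. The contraction set is a tiny hypothetical sample standing in for a real list such as the one linked above, and the sketch makes no assumption about that file's format. A trailing apostrophe after a letter is the ambiguous case. This sketch naively treats it as closing an open quote if one is pending, and as a possessive or dropped letter otherwise. That means "the sailors' boat" inside a quotation would be misread, so a real tool would need more thinking there.

```python
import re

# Hypothetical sample of leading-apostrophe contractions; a real list
# (e.g. from a book's custom dictionary) would be much longer.
LEADING_CONTRACTIONS = {"'tis", "'twas", "'em", "'twill"}
LEADING = re.compile(r"'\w+")

def classify_single_quotes(text):
    """Label each ' in text as 'apostrophe', 'open' or 'close' (heuristic)."""
    labels = []
    is_open = False
    for i, ch in enumerate(text):
        if ch != "'":
            continue
        prev_alpha = i > 0 and text[i - 1].isalpha()
        next_alpha = i + 1 < len(text) and text[i + 1].isalpha()
        if prev_alpha and next_alpha:
            kind = "apostrophe"                  # don't, o'clock, John's
        elif next_alpha:
            word = LEADING.match(text, i).group(0).lower()
            kind = "apostrophe" if word in LEADING_CONTRACTIONS else "open"
        elif prev_alpha:
            # ambiguous: close a pending quote, else possessive/dropped letter
            kind = "close" if is_open else "apostrophe"
        else:
            kind = "close" if is_open else "open"  # spacey quote: use parity
        if kind == "open":
            is_open = True
        elif kind == "close":
            is_open = False
        labels.append((i, kind))
    return labels
```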

don't give up so easily.   _try._

you'll find it's not as hard as you thought.

and if you actually honestly _try_, and hit a wall anyway,
show me your actual honest efforts, and i will help you...

***

"probable markup on this page"?

are you purposely trying to be vague?

how about listing the italicized words, _specifically_?

geez!

-bowerbird

From Bowerbird at aol.com  Mon Apr 12 14:26:55 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 12 Apr 2010 17:26:55 EDT
Subject: [gutvol-d] spell-check functionality and the sitka book
Message-ID: <1f6e.3591e780.38f4ea1f@aol.com>

in discussing how to bring about a spell-check functionality,
i talked about the "bad-words" list as a part of that workflow.

i posted the "bad-words" list from the o.c.r. for the sitka book.
>    http://z-m-l.com/go/sitka/sitka-reversedictionary.txt

i talked about how the proofing process can be envisioned as
movement from the "bad-words" list to the "good-words" list.
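In code terms, a "bad-words" list is just the distinct words of a text that a dictionary doesn't know. Here is a sketch that assumes the dictionary is a set of lowercase words. The tokenizing regex is also an assumption.

```python
import re

def bad_words(text, dictionary):
    """Sorted, de-duplicated words of `text` that `dictionary` doesn't contain.

    A word counts as known if its exact or lowercase form is in the set.
    """
    words = set(re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text))
    return sorted(w for w in words if w not in dictionary and w.lower() not in dictionary)
```

Running this on the raw o.c.r. and again on the finished text gives the two ends of the movement described above.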

now that rfrank has posted the "final" version at p.g., we can
do the same "dictionary-check" procedure on his finished book:
>    http://z-m-l.com/go/sitka/sitka-reverseposted.txt

you'll see i have introduced blank lines into these files, so as
to coordinate their lines so they can be merged into one file.
this will allow us to see how each word traversed the process.

the merged file is here:
>    http://z-m-l.com/go/sitka/sitka-reverse-review.html
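The inserted blank lines coordinate the two sorted lists so that matching words share a line. That job is a standard merge of sorted sequences. Here is a sketch, assuming case-insensitive ordering.

```python
def align_sorted(left, right, key=str.lower):
    """Pair up two word lists line by line, padding with blanks.

    Equal words share a row; a word present in only one list gets an
    empty string on the other side.
    """
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    rows, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        if j >= len(right) or (i < len(left) and key(left[i]) < key(right[j])):
            rows.append((left[i], ""))
            i += 1
        elif i >= len(left) or key(right[j]) < key(left[i]):
            rows.append(("", right[j]))
            j += 1
        else:
            rows.append((left[i], right[j]))
            i += 1
            j += 1
    return rows
```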

***

go ahead and take a quick look at the words...

one thing you might notice is that a number of the words are
marked with asterisks.   those are the ones that are suspicious,
in that they appear to be _variants_ of each other, which might
be a good indication of either (1) an o.c.r. misrecognition, or
(2) an inconsistency in the p-book which should be corrected...

(some people use levenshtein edit-distance to find these words.
that's nice, but a plain old review of a sorted list works well too.)
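The two approaches can be combined: sort the list, then compare each word only with its neighbour using edit distance. This is a sketch. The threshold of 2 is an arbitrary assumption, and comparing neighbours alone will miss variants that differ near the start of the word.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def flag_variants(words, max_distance=2):
    """Mark neighbours in a sorted word list that differ by a small edit."""
    ordered = sorted(set(words), key=str.lower)
    flagged = set()
    for a, b in zip(ordered, ordered[1:]):
        if levenshtein(a.lower(), b.lower()) <= max_distance:
            flagged.update((a, b))
    return sorted(flagged)
```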

i checked those variants against the p-book, and many of them
were indeed errors.   (the ones marked "ok" at the end were right.)

that's how i found many of these consistency errors rfrank missed.

oh, by the way, there is still at least one more of those errors that
i didn't mention earlier...   so if anyone wants to go looking for it...

ok, so let's go on to look at the list in other ways...

***

for each individual word, we're gonna see how it was handled...

the first group are some words that had garbage characters.
those words didn't have any direct equivalent in the final file,
at least none that were close enough to be sorted similarly...

once we get into the lowercase words, we get some matching.

and that continues as we get into the words with an initial cap,
and on into the compound-words, and then into the numbers...

focus on the initial-cap words -- which are primarily names --
and you'll see that the vast majority of these were recognized
correctly, in that they persisted through to the final version...
about 85% of the initial-cap words were recognized correctly.

the same is true of the compound-words, with few exceptions.
and most numbers seem to have been recognized correctly too.

but lowercase words are more of a mixed bag.   some were right,
it's true, but a relatively high percentage of them were incorrect,
as evidenced by the fact they were changed, one way or another.
only about half the lowercase words were recognized correctly.
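Those per-category figures can be computed mechanically. This sketch assumes a word counts as "recognized correctly" if it still appears in the posted version's list. The category rules are simple approximations of the groups described above.

```python
import re
from collections import defaultdict

def category(word):
    """Rough grouping, in the spirit of the review file's sections."""
    if re.fullmatch(r"[0-9][0-9,.]*", word):
        return "number"
    if "-" in word:
        return "compound"
    if word[:1].isupper():
        return "initial-cap"
    if re.fullmatch(r"[a-z']+", word):
        return "lowercase"
    return "other"   # e.g. words with garbage characters

def survival_by_category(ocr_bad, posted_words):
    """Share of o.c.r. bad-words in each category still present after proofing."""
    totals, kept = defaultdict(int), defaultdict(int)
    for w in ocr_bad:
        c = category(w)
        totals[c] += 1
        kept[c] += int(w in posted_words)
    return {c: kept[c] / totals[c] for c in totals}
```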

you might remember that i had predicted precisely this pattern.

most lowercase o.c.r. words that are in the "bad-words" list are
generally misrecognitions, while most initial-cap words are not.

indeed, the percentage of correct lowercase words in this o.c.r.
was much higher than is normal, because this was not typical
o.c.r.: the scans were clean, and rfrank probably also did some
preprocessing on the raw o.c.r. text, which we can tell because
it had very few garbage characters...

so what we see is that roughly 75% of these words were _correct_,
despite the fact they were "bad-words" (i.e., not in the dictionary).
they weren't in the dictionary, but they were "good" in this book...
that so many "bad-words" can be correct and unique to the book
is why it's important to use a "custom" book-specific dictionary.

only about 25% of the "bad-words" were actually misrecognitions.

to the extent that you can narrow down your _flagging_ of the
"bad-words" to the ones that are _really_ bad, you can relieve
your proofers of a _lot_ of unnecessary flags, which is _good_,
because false flags sap the attention of proofers unnecessarily.

once you've done this analysis a number of times, like i have,
you'll come to recognize that it is a very important analysis...

-bowerbird

p.s.   hey, dkretz, thanks for the shoutout!   but one correction!
i wasn't "baited" into "crossing a line" that got me banned at d.p.
i never get "baited" into _anything_.   i always know what i'm doing.
and i've been banned from enough places that i know how it works,
so it wasn't that i "crossed a line".   again, i know what i'm doing...
no, if i get banned from somewhere, it's something i _anticipated_,
and after a consideration of that outcome, decided it didn't matter.
which is _not_ to say that i _like_ to get banned, or that i _try_ to,
but is _rather_ to say that i won't allow myself to be banned _unless_
i have decided that i don't really care whether i'm banned or not...
as for "crossing a line", there's no need for it.   even though people
will generally say that i broke some technical rule and that is why
i was banned, the truth of the matter is that one only gets banned
if one pisses off the person with the power to push the ban button.
it has nothing at all to do with "the rules"...   it's just raw emotion...
oh, and let me say one more thing; it's _nice_ that d.p. still lets me
come to their forums and read them.   if they tried to prevent that,
i _could_ get around it, but it's a hassle.   so i thank them for that...

From gbnewby at pglaf.org  Tue Apr 13 00:32:53 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Tue, 13 Apr 2010 00:32:53 -0700
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now
 available for testing
In-Reply-To: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>
References: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>
Message-ID: <20100413073253.GA1262@pglaf.org>

On Sat, Apr 10, 2010 at 10:45:39PM -0500, Aaron Cannon wrote:
> Hi all.
> 
> Finally, I have a release candidate of the new Gutenberg DVD available
> for testing.  Currently, it is only available via BitTorrent.  Please
> download and test, and let me know if you find any bugs.  At this
> point, I'm not looking for suggestions on which new titles to add, but
> any other feedback is welcome and appreciated.

Thanks, Aaron. This is beautifully done - congratulations!

Has a link checker been run on this?  I quickly found a missing
file via file:///PGDVD_2010_04_RC1/etext/3002.html
and wonder whether some filetypes or other content might not
have made it.
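For reference, here is a minimal local link checker along those lines, using only the Python standard library. It assumes the disc contents are available as a folder. It only checks relative links, so absolute file:// URLs, http links and pure #fragment links are skipped. This is a sketch, not the tool used to build the DVD.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class LinkCollector(HTMLParser):
    """Collect href/src attribute values from every tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def missing_local_links(root):
    """Return (page, link) pairs whose relative target file doesn't exist."""
    missing = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, name)
            parser = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as f:
                parser.feed(f.read())
            for link in parser.links:
                parts = urlparse(link)
                if parts.scheme or not parts.path:
                    continue  # external, file://, mailto, or #fragment only
                target = os.path.normpath(os.path.join(dirpath, unquote(parts.path)))
                if not os.path.exists(target):
                    missing.append((page, link))
    return missing
```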

I am really impressed at the number of titles!

I have temporarily put the ISO here:
  http://pglaf.org/PGDVD201004-RC1/
 ...but don't try to download via HTTP unless you have a
fast 'net connection.  Use the .torrent instead.

And you can browse the disc contents here (fast connection not needed):
  http://pglaf.org/PGDVD201004-RC1/content/

Thanks again!  I will confirm this burns onto a DL DVD.
  -- Greg


From greg at durendal.org  Tue Apr 13 06:04:31 2010
From: greg at durendal.org (Greg Weeks)
Date: Tue, 13 Apr 2010 09:04:31 -0400 (EDT)
Subject: [gutvol-d] [SPAM] PGDP down?
Message-ID: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>


Is PGDP down, or is it just something with my link?

-- 
Greg Weeks
http://durendal.org:8080/greg/


From sankarrukku at gmail.com  Tue Apr 13 07:34:57 2010
From: sankarrukku at gmail.com (Sankar Viswanathan)
Date: Tue, 13 Apr 2010 20:04:57 +0530
Subject: [gutvol-d] {Disarmed} Re:  [SPAM] PGDP down?
In-Reply-To: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>
References: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>
Message-ID: <k2oe45c9fe71004130734l9ec5da22l692f561e81d156b4@mail.gmail.com>

Yes. It is down.

Sankar



-- 
Sankar

Service to Humanity is Service to God
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100413/804008b0/attachment.html>

From dakretz at gmail.com  Tue Apr 13 07:36:45 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 13 Apr 2010 07:36:45 -0700
Subject: [gutvol-d] {Disarmed} Re:  [SPAM] PGDP down?
In-Reply-To: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>
References: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>
Message-ID: <m2w627d59b81004130736s984a2cd1kee2134e16d49cc6f@mail.gmail.com>

It is down.

On Tue, Apr 13, 2010 at 6:04 AM, Greg Weeks <greg at durendal.org> wrote:

>
> Is PGDP down, or is something wrong with just my link?
>
> --
> Greg Weeks
> http://durendal.org:8080/greg/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100413/b500ecd3/attachment.html>

From donovan at abs.net  Tue Apr 13 10:05:53 2010
From: donovan at abs.net (D Garcia)
Date: Tue, 13 Apr 2010 12:05:53 -0500
Subject: [gutvol-d] Re: PGDP down?
In-Reply-To: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>
References: <alpine.DEB.2.00.1004130904010.26751@durendal.durendal.org>
Message-ID: <201004131305.54424.donovan@abs.net>

> Is PGDP down, or is something wrong with just my link?

At o'dark-thirty EDT, pgdp.net suffered a kernel panic during backup and
hard-locked instead of rebooting itself, as it is normally configured to do.
The hosting company has power-cycled the machine for us, and I am monitoring
remotely while integrity checks run on our filesystems. The system will
remain unavailable for several hours while these and other validations
take place.

David (donovan)

From Bowerbird at aol.com  Tue Apr 13 13:28:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 13 Apr 2010 16:28:51 EDT
Subject: [gutvol-d] so let's talk about my collaborative proofreading site,
	part 2
Message-ID: <2ab43.7bb18f9.38f62e03@aol.com>

here's more info on my collaborative proofreading site...

***

to see what we're talking about, you can visit this u.r.l.:
>    http://z-m-l.com/go/sitka/editr.pl

***

we talked about 4 main topics last week:
>    navigating the pages...
>    certifying a page as clean...
>    searching the book for a string...
>    feel the power with the "command" field...

under the 4th topic -- the command field --
we discussed 3 of the commands you can issue:
>    showmap...
>    concat...
>    showcustom...

today we'll discuss a few more commands...

***


blubberbaby...

you'll remember that i also discussed how you can
implement spellcheck functionality in your workflow.

one key to this is creating a custom dictionary for
each specific book that you are digitizing, one that
contains the words unique to that particular book...

at first, you'll have a "bad-words" list, which contains
low-frequency words not found in a regular dictionary.

the other list is the "good-words" list, which contains
high-frequency words plus those in a regular dictionary.

the process of correcting the book is one of _moving_
items on the "bad-words" list to the "good-words" list,
either by certifying that o.c.r. did recognize them correctly,
or by correcting the misrecognition to what it should be.
(or, in the case of an error by the publisher, correcting it.)
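A minimal Python sketch of how the two lists might be built, and how a certified word moves from one to the other. The post does not show how the site does this; the function names, the word regex, and the min_count frequency threshold are all hypothetical, and the "regular dictionary" is assumed to be a set of lowercase words.

```python
import re
from collections import Counter

# Hypothetical word pattern: runs of letters, allowing apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

def build_word_lists(pages, dictionary, min_count=3):
    """Split a book's vocabulary into (good_words, bad_words) sets."""
    counts = Counter(
        word.lower() for text in pages for word in WORD_RE.findall(text)
    )
    good, bad = set(), set()
    for word, n in counts.items():
        # In the regular dictionary, or frequent in this book: a good word.
        if word in dictionary or n >= min_count:
            good.add(word)
        else:
            bad.add(word)  # low-frequency and not in the dictionary
    return good, bad

def certify(word, good, bad):
    """Move a word the proofer has verified from bad-words to good-words."""
    bad.discard(word)
    good.add(word)
```

Correcting a misrecognized word would instead change the page text and then rebuild the lists; that path is left out here.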

what is handy, for this process, is knowing which pages
have words that are contained on the "bad-words" list.

you _could_ navigate through each of the pages, to see
which ones have flagged words, which are shown in red.

but why not have the computer just tell us what they are?

voila, the next command, christened "blubberbaby", to
honor alaska, for this sitka book.   enter "blubberbaby"
in the search-field and click "find", and in a little while
-- it's not yet optimized, so it takes about 20 seconds --
you will be shown a page that lists all of the pages
that have words which are still on the "bad-words" list...

from that display-page, you can use the links there to
open a number of these pages -- each in its own tab --
and work on them to deal with all of the flagged words.
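The "blubberbaby" report amounts to intersecting each page's words with the bad-words list. The sketch below assumes pages are held in a dict mapping pagename to text; it is a Python illustration with hypothetical names, not the site's own code.

```python
import re

# Hypothetical word pattern: runs of letters, allowing apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

def pages_with_flagged_words(pages, bad_words):
    """Return {pagename: sorted flagged words} for pages that still contain bad-words."""
    report = {}
    for name in sorted(pages):
        words_on_page = {w.lower() for w in WORD_RE.findall(pages[name])}
        found = words_on_page & bad_words
        if found:
            report[name] = sorted(found)
    return report
```

For example, `pages_with_flagged_words({"sitkap002.txt": "Globokoe (Deep) Lake", "sitkap003.txt": "the island"}, {"globokoe"})` returns `{"sitkap002.txt": ["globokoe"]}`, and each key could become a link that opens that page in its own tab.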

questionable words should be handled in preprocessing,
for the most part, so if the workflow is designed correctly,
you won't need to use this "blubberbaby" command often.

but it's useful to have it, so you can do the check if desired.

and if questionable words were not fixed in preprocessing,
then you'll find "blubberbaby" to be even more important.

***


pairsearch...

you'll remember that, when i was discussing _inconsistencies_
in the sitka book, i used the "bad-words" list to find
possible problems.   specifically, when two variants of a
word (usually a name) came up sorted next to each other,
it was easy to spot 'em and tell that they needed checking.

here are a few of them, so you can see what i mean...

>    Globokoe**************
>    Golobokoe**************
>    ...
>    Golofnin**************
>    Golovin**************
>    ...
>    Hagemeister**************
>    Hagmeister**************

it's pretty obvious that these _might_ be inconsistencies...

not all of them are.   for instance, "golofnin" and "golovin"
were -- apparently -- the names of two different people.

but the others were errors made by the original printer,
errors that coulda been caught (i caught 'em) and fixed.
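Eyeballing the sorted list works, but the same neighbour-to-neighbour check can be automated. This sketch swaps in the similarity ratio from Python's difflib, which the post does not mention; the 0.75 threshold is an arbitrary assumption. It only proposes candidates. As the golofnin/golovin case shows, a human still has to read the pages.

```python
from difflib import SequenceMatcher

def adjacent_variants(bad_words, threshold=0.75):
    """Return neighbouring pairs in the sorted bad-words list that look like spelling variants."""
    words = sorted({w.lower() for w in bad_words})
    pairs = []
    for first, second in zip(words, words[1:]):
        # ratio() is 2*matches/total_length: 1.0 for identical strings.
        if SequenceMatcher(None, first, second).ratio() >= threshold:
            pairs.append((first, second))
    return pairs
```

Fed the six names above (plus an unrelated word such as "sitka"), it flags the same three pairs: globokoe/golobokoe, golofnin/golovin, and hagemeister/hagmeister.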

what you have to do, though, to check these pairs out,
is go to the actual pages where they appear, and read
the text, so as to determine the correct course of action.

now, with the search capability, it's fairly easy to do that.

you just enter each term, and then click on the links to
open up the pages where that term appears.   fairly easy.

but that can get a bit tiresome if you have a lot to check.

so i programmed this "pairsearch" command to help out.

you enter the command "pairsearch", followed by pairs
of terms that you want to search for, and the program
presents the relevant pages to help you make a decision.

so, for instance, for the three pairs above, you'd enter:

>    pairsearch hagemeister hagmeister golofnin golovin globokoe golobokoe

the search-terms can be separated by spaces or line-ends.

the output from that search is appended to this message.
the lines are long, and will likely wrap, so it's also here:
>    http://z-m-l.com/go/sitka/pairsearch-output.html

the pagenames aren't linked now, but eventually will be.

this "pairsearch" command can be extremely useful in
resolving inconsistencies within the book, both those
introduced by o.c.r. and those by the original publisher.
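Here is a rough Python sketch of the pairsearch idea. It produces the three orderings shown in the p.s. below: order of appearance in the book, sorted by search-term, and order of entry. It assumes the pages are held in a dict in book order, and it matches each term as a case-insensitive substring. The sample output suggests the site does this too, since "Globokoef[72-2]" is found under "globokoe". It shows `width` characters of context on each side. All names here are hypothetical.

```python
import re

def pairsearch(pages, terms, width=40):
    """Return (book_order, by_term, by_entry) lists of 'term ... page ... context' lines."""
    # Lowercase the terms and drop duplicates while keeping entry order.
    terms = list(dict.fromkeys(t.lower() for t in terms))
    hits = []  # (page_index, offset, term, pagename, context)
    for page_index, (name, text) in enumerate(pages.items()):
        for term in terms:
            for m in re.finditer(re.escape(term), text, re.IGNORECASE):
                start = max(0, m.start() - width)
                context = text[start:m.end() + width].replace("\n", " ")
                hits.append((page_index, m.start(), term, name, context))

    entry_rank = {term: i for i, term in enumerate(terms)}
    book_order = sorted(hits)  # page first, then position on the page
    by_term = sorted(hits, key=lambda h: (h[2], h[0], h[1]))
    by_entry = sorted(hits, key=lambda h: (entry_rank[h[2]], h[0], h[1]))

    def fmt(hit):
        return f"{hit[2]} ... {hit[3]} ... {hit[4]}"

    return ([fmt(h) for h in book_order],
            [fmt(h) for h in by_term],
            [fmt(h) for h in by_entry])
```

Because the search-terms may be separated by spaces or line-ends, a plain `"hagemeister hagmeister\ngolofnin golovin".split()` is enough to turn the command's arguments into the `terms` list.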

one more note...

remember that publishers back in the old days didn't have
the wonderful tools that we now have at our disposal, so
it's no wonder that they had some problems when it came
to words like "globokoe" and "golobokoe", or russian names.
i'm sure if i had to use the primitive tools they had back then,
i'd be making 3 times as many errors as they made, or more...

***


end-page-hyphenates...

d.p. has proofers mark end-page-hyphenates with an asterisk.
i'm not sure why they feel that's necessary.   the computer can
find end-page-hyphenates just fine.   here's a routine to do it.

put the command "end-page-hyphenates" in the search-field,
and then click "find", and you'll get a list of where they occur.
the list has links for both pages, containing both fragments...

for this book, you'll get this:

>    sitkap002.txt ... and ... sitkap003.txt
>    sitkap007.txt ... and ... sitkap008.txt
>    sitkap018.txt ... and ... sitkap019.txt
>    sitkap019.txt ... and ... sitkap020.txt
>    sitkap021.txt ... and ... sitkap022.txt
>    sitkap027.txt ... and ... sitkap028.txt
>    sitkap043.txt ... and ... sitkap044.txt
>    sitkap051.txt ... and ... sitkap052.txt
>    sitkap077.txt ... and ... sitkap078.txt
>    sitkap079.txt ... and ... sitkap080.txt
>    sitkap083.txt ... and ... sitkap084.txt
>    sitkap087.txt ... and ... sitkap088.txt
>    sitkap102.txt ... and ... sitkap103.txt

those pagenames are clickable, and take you to that page...

there isn't a lot of reason you need to check those fragments,
since the computer will also rejoin 'em if you unwrap the text.
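Finding and rejoining these is a small job for the computer. This minimal Python sketch assumes the pages are held in a dict of pagename to text, in book order. It flags a page whose last non-blank line ends in a single hyphen after a letter, so a "--" dash is not flagged, and it glues that fragment to the first word of the next page. Dropping the hyphen is an assumption: a genuinely hyphenated compound split at a page break would still need a human decision. The names are hypothetical.

```python
import re

FRAGMENT_END = re.compile(r"([A-Za-z]+)-$")     # letters, then one hyphen at line end
FRAGMENT_START = re.compile(r"\s*([A-Za-z]+)")  # first word on the next page

def end_page_hyphenates(pages):
    """Return (page, next_page, rejoined_word) for each word split across a page break."""
    names = list(pages)
    results = []
    for name, next_name in zip(names, names[1:]):
        lines = [line.rstrip() for line in pages[name].splitlines() if line.strip()]
        if not lines:
            continue
        tail = FRAGMENT_END.search(lines[-1])
        head = FRAGMENT_START.match(pages[next_name])
        if tail and head:
            results.append((name, next_name, tail.group(1) + head.group(1)))
    return results
```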

but if i didn't include this functionality, you know _someone_
would say "yeah, but your system doesn't do _this_, does it?"

so now i can say, "well yes, as a matter of fact, it _does_..."

***

so, we've added "blubberbaby" and "pairsearch" commands,
as well as "end-page-hyphenates"; that's enough for today.

by now, you should have a pretty good feel for how we will
continue to implement functionalities as they are needed...

we'll discuss more stuff as i get it put into place...

-bowerbird

p.s.   here's the output from the "pairsearch" command above:

> .....here it is, in order of appearance in the book:
>
> globokoe ... sitkap002.txt ... the inlet at Ozerskoe Redoubt and Globokoe (Deep) Lake; the island-studded
> hagemeister ... sitkap006.txt ... ngland English Francisco Georgeson Hagemeister Jamestown Kashavaroffs Katle
> hagemeister ... sitkap009.txt ... g instructions previously given to Hagemeister, instructing him to find the
> golofnin ... sitkap032.txt ... r the command of Captain Vasili M. Golofnin, who was widely known for his a
> golofnin ... sitkap034.txt ... stant, nor one doctor's pupil.'?? Golofnin soon left Sitka to return to St
> hagmeister ... sitkap042.txt ... ills of Golden California. Captain Hagmeister came to re- lieve him, and in
> golofnin ... sitkap045.txt ... to trade with the Kolosh [45-1] Golofnin, Voyage of the Sloop "Kamchatka
> golofnin ... sitkap060.txt ... ccording to the account of Captain Golofnin, it was an establishment well b
> golovin ... sitkap072.txt ... erica, by Captain-Lieutenant P. N. Golovin, pp. 72-73. [[72]]
> globokoe ... sitkap072.txt ... other at the Ozer- skoe Redoubt on Globokoef[72-2] (Deep) Lake, ground the
> golobokoe ... sitkap072.txt ... f the present improvement. [72-2] Golobokoe Lake was sounded to a depth cf
> hagemeister ... sitkap075.txt ... nuary 11, 1818. Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24,
> globokoe ... sitkap105.txt ... mountainside. The Redoubt and the Globokoe Lake.-- Southwest from Sitka ab
> globokoe ... sitkap106.txt ... re in the rocky wall which divided Globokoe, or Deep Lake, from the sea, an
>
>
> .....and sorted, by search-term:
>
> globokoe ... sitkap002.txt ... the inlet at Ozerskoe Redoubt and Globokoe (Deep) Lake; the island-studded
> globokoe ... sitkap072.txt ... other at the Ozer- skoe Redoubt on Globokoef[72-2] (Deep) Lake, ground the
> globokoe ... sitkap105.txt ... mountainside. The Redoubt and the Globokoe Lake.-- Southwest from Sitka ab
> globokoe ... sitkap106.txt ... re in the rocky wall which divided Globokoe, or Deep Lake, from the sea, an
>
> golobokoe ... sitkap072.txt ... f the present improvement. [72-2] Golobokoe Lake was sounded to a depth cf
>
> golofnin ... sitkap032.txt ... r the command of Captain Vasili M. Golofnin, who was widely known for his a
> golofnin ... sitkap034.txt ... stant, nor one doctor's pupil.'?? Golofnin soon left Sitka to return to St
> golofnin ... sitkap045.txt ... to trade with the Kolosh [45-1] Golofnin, Voyage of the Sloop "Kamchatka
> golofnin ... sitkap060.txt ... ccording to the account of Captain Golofnin, it was an establishment well b
>
> golovin ... sitkap072.txt ... erica, by Captain-Lieutenant P. N. Golovin, pp. 72-73. [[72]]
>
> hagemeister ... sitkap006.txt ... ngland English Francisco Georgeson Hagemeister Jamestown Kashavaroffs Katle
> hagemeister ... sitkap009.txt ... g instructions previously given to Hagemeister, instructing him to find the
> hagemeister ... sitkap075.txt ... nuary 11, 1818. Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24,
>
> hagmeister ... sitkap042.txt ... ills of Golden California. Captain Hagmeister came to re- lieve him, and in
>
>
> .....and sorted again, this time in the order in which they were entered:
>
> hagemeister ... sitkap006.txt ... ngland English Francisco Georgeson Hagemeister Jamestown Kashavaroffs Katle
> hagemeister ... sitkap009.txt ... g instructions previously given to Hagemeister, instructing him to find the
> hagemeister ... sitkap075.txt ... nuary 11, 1818. Leonti Andreanvich Hagemeister, Jan. 11, 1818, to Oct. 24,
>
> hagmeister ... sitkap042.txt ... ills of Golden California. Captain Hagmeister came to re- lieve him, and in
>
>
> golofnin ... sitkap032.txt ... r the command of Captain Vasili M. Golofnin, who was widely known for his a
> golofnin ... sitkap034.txt ... stant, nor one doctor's pupil.'?? Golofnin soon left Sitka to return to St
> golofnin ... sitkap045.txt ... to trade with the Kolosh [45-1] Golofnin, Voyage of the Sloop "Kamchatka
> golofnin ... sitkap060.txt ... ccording to the account of Captain Golofnin, it was an establishment well b
>
> golovin ... sitkap072.txt ... erica, by Captain-Lieutenant P. N. Golovin, pp. 72-73. [[72]]
>
>
> globokoe ... sitkap002.txt ... the inlet at Ozerskoe Redoubt and Globokoe (Deep) Lake; the island-studded
> globokoe ... sitkap072.txt ... other at the Ozer- skoe Redoubt on Globokoef[72-2] (Deep) Lake, ground the
> globokoe ... sitkap105.txt ... mountainside. The Redoubt and the Globokoe Lake.-- Southwest from Sitka ab
> globokoe ... sitkap106.txt ... re in the rocky wall which divided Globokoe, or Deep Lake, from the sea, an
>
> golobokoe ... sitkap072.txt ... f the present improvement. [72-2] Golobokoe Lake was sounded to a depth cf
>
>
> --30--
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100413/29a23698/attachment-0001.html>

From greg at durendal.org  Wed Apr 14 04:44:17 2010
From: greg at durendal.org (Greg Weeks)
Date: Wed, 14 Apr 2010 07:44:17 -0400 (EDT)
Subject: [gutvol-d] [SPAM] Re:  slightly off topic, first post, scanning
In-Reply-To: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com>
Message-ID: <alpine.DEB.2.00.1004140741020.575@durendal.durendal.org>

On Sat, 13 Feb 2010, Sparr wrote:

> Remnants of glue left on the edge of a page (from the removed spine)
> get stuck to the inside of the scanner or feeder, ruining the scans of
> subsequent pages until the scanner is cleaned.

I use a knife to cut the binding off rather than try to separate the 
pages. A plough knife is actually made for this. I've had pretty good 
results with a standard construction razor knife.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From hart at pglaf.org  Wed Apr 14 06:17:35 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 14 Apr 2010 06:17:35 -0700 (PDT)
Subject: [gutvol-d] !@! #17135 Twas The Night Before Christmas
Message-ID: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>


***** This file should be named 17135-h.htm or 17135-h.zip *****
This and all associated files of various formats will be found in:
        http://www.gutenberg.org/1/7/1/3/17135/

Produced by Janet Blenkinship, Suzanne Shell and the Online
Distributed Proofreading Team at http://www.pgdp.net


Where the first characters of stanzas were "illuminated",
they were eliminated and never replaced in the .htm version.


From richfield at telkomsa.net  Wed Apr 14 07:42:47 2010
From: richfield at telkomsa.net (Jon Richfield)
Date: Wed, 14 Apr 2010 16:42:47 +0200
Subject: [gutvol-d] Re: [SPAM] Re:  slightly off topic, first post, scanning
In-Reply-To: <alpine.DEB.2.00.1004140741020.575@durendal.durendal.org>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com>
	<alpine.DEB.2.00.1004140741020.575@durendal.durendal.org>
Message-ID: <4BC5D467.4000407@telkomsa.net>

FWIW, since I bought myself a digital camera for general use, plus a 
copy of Omniscan, my scanner has been pretty well idle. As it happens, 
the camera is a 12 megapixel model, but for most purposes I find it 
better to set it to 8 MP or even less. Also, for most books I set the 
mode to black and white. It is best for mass input either to get a 
tripod as well, or to buy some sort of cheap plastic stand and mutilate 
it into a camera stand. I have been using a kindergarten table into the 
top of which I cut a camera-shaped hole with a hobby knife and due 
caution. Avoid buying a cheerfully coloured stand, because if you happen 
to need colour shots it can seriously affect the picture. The best is 
translucent white or grey, or possibly transparent. Grey or black are 
not too bad if illumination is no problem. Then it is just a matter of 
setting manual focus and clicking away till done. The table is very 
light and firm and I have had no problems with unsteadiness. Obviously 
one chooses a suitable surface to work on, so that glue and similar 
pollutants are not a consideration.

There are of course umpteen variations on the theme. You might prefer 
stands and clips to hold the objects erect. You might buy a second-hand 
camera economically, but do make sure that it will take a suitable 
memory module, the larger the better. SD cards are very good, especially
if you have a USB card reader for them. I got one pretty cheap. My main
regret is that I didn't get a mains adapter to power the camera while
one was still available. As it stands, I simply use rechargeable NiMH
batteries of the right size. Remember: the power burden is much heavier
than in most other photographic activities.

There are some definite advantages over the scanner, even though modern
scanners are remarkably good. Fewer moving parts, for one (once you have
the camera set up, only the button and the shutter move!). Unless you
have a scanner with an automatic feed, the speed is better, too; plus,
there are few books that you need to mutilate to photograph them.

Another luxury, though I have not in practice needed it, is that the 
camera can be set to various degrees of resolution. For most purposes 
very modest resolution is far more than adequate, but if you should need 
more than you can get from a single shot, then set it up to take only 
part of a page at a time, and you can magnify your material till the 
limiting factor is not the camera, but the quality of the printing.

Is my choice unusual in any way?

Jon

On 2010/04/14 13:44 PM, Greg Weeks wrote:
> On Sat, 13 Feb 2010, Sparr wrote:
>
>> Remnants of glue left on the edge of a page (from the removed spine)
>> get stuck to the inside of the scanner or feeder, ruining the scans of
>> subsequent pages until the scanner is cleaned.
>
> I use a knife to cut the binding off rather than try to separate the 
> pages. A plough knife is actually made for this. I've had pretty good 
> results with a standard construction razor knife.
>

From ajhaines at shaw.ca  Wed Apr 14 09:12:55 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Wed, 14 Apr 2010 09:12:55 -0700
Subject: [gutvol-d] Re: PDF-files
References: <7527194b0912050812s17b817f5i81e0398905f15c68@mail.gmail.com>
Message-ID: <1EA29AAF01F04633AE18B692A7AD561D@alp2400>

Fernando, PDF files are welcome, as long as they're part of a complete submission package.  

At a minimum, a submission must include a plain text file (ASCII, ISO-8859-1/Latin-1, or UTF-8, as required by the book's language). If the source book has illustrations or other graphical content, you can also prepare an HTML file.

If you also wish to submit a PDF file, it should be generated from your text or HTML file (not simply downloaded from, for example, Internet Archive at http://www.archive.org/details/americana) and included with the other submission files.

More information can be found in PG's various FAQs at http://www.gutenberg.org/wiki/Category:FAQ

Al Haines
Project Gutenberg


  ----- Original Message ----- 
  From: Fernando Maia Jr. 
  To: Project Gutenberg Volunteer Discussion 
  Sent: Saturday, December 05, 2009 9:12 AM
  Subject: [gutvol-d] PDF-files


  [Sorry, I forgot to change the subject.]

  Hello, volunteers!


  I'm new here and I have a question. Would PG be interested in having more PDF files? I've searched for this information everywhere and haven't found an answer yet, so I decided to ask about it here.


  Sorry for possible mistakes (English isn't my native language).


  Thanks in advance,


  Fernando


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100414/a862f6bb/attachment.html>

From ajhaines at shaw.ca  Wed Apr 14 09:26:24 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Wed, 14 Apr 2010 09:26:24 -0700
Subject: [gutvol-d] Re: Question about Scanned Books
References: <1266262684-sup-4204@zion>
Message-ID: <B95B1D442F314C5AA8FA691C64E16137@alp2400>

Michael, as long as the book can be proven to have been published before 
1923, it's in the public domain in the U.S., and is eligible for addition to 
the Project Gutenberg collection.  (For the past few years, Archive.org and 
Google Books have been the sources for most PG submissions, as produced by 
Distributed Proofreaders and other submitters.)  A paper copy is not 
necessary.

For most books in Internet Archive, there's an "All Files: HTTP" link in the 
"View the book" box, at the left.  That gives access to all formats in which 
the book is available--GIF, PDF, TIF, etc.

Before beginning work on any book, you should check that it's not already in
Project Gutenberg, and that no one else is working on it, by consulting David
Price's In-progress list at http://www.dprice48.freeserve.co.uk/GutIP.html.

If you haven't prepared an ebook for PG before, you should read its
various FAQs at http://www.gutenberg.org/wiki/Category:FAQ.  Section 7 of
the Volunteers' FAQ is especially important.

Al Haines
Project Gutenberg


----- Original Message ----- 
From: "Michael McDermott" <mmcdermott at mad-computer-scientist.com>
To: "gutvol-d" <gutvol-d at lists.pglaf.org>
Sent: Monday, February 15, 2010 12:38 PM
Subject: [gutvol-d] Question about Scanned Books


> Archive.org has many DJVU files of books that have lapsed into the public 
> domain. Would it comply with PG's guidelines to use one of these volumes
> as the source for an ebook? (The one I was thinking of has a copyright
> date of 1915 and can be found at
> http://www.archive.org/details/worksmartinluth00spaegoog.)
>
> The important elements here are:
>
> 1) I do not have a copy of the paper edition
> 2) This is a scan of a work that, by all appearances, qualifies, having
> been published in the US before 1923
> 3) It was digitized by Google
>
> Would this work or would its ancestry cause problems?
> -- 
> Michael McDermott
> www.mad-computer-scientist.com


From ajhaines at shaw.ca  Wed Apr 14 09:58:35 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Wed, 14 Apr 2010 09:58:35 -0700
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
Message-ID: <52E0A8172A5F417986C50DAFE5D1132A@alp2400>

Actually, it's fairly common practice that if a paragraph/verse starts with 
some kind of graphical/illuminated character, the actual character it stands 
for is not included in the HTML version.


----- Original Message ----- 
From: "Michael S. Hart" <hart at pglaf.org>
To: "The gutvol-d Mailing List" <gutvol-d at lists.pglaf.org>
Sent: Wednesday, April 14, 2010 6:17 AM
Subject: [gutvol-d] !@! #17135 Twas The Night Before Christmas


>
> ***** This file should be named 17135-h.htm or 17135-h.zip *****
> This and all associated files of various formats will be found in:
>        http://www.gutenberg.org/1/7/1/3/17135/
>
> Produced by Janet Blenkinship, Suzanne Shell and the Online
> Distributed Proofreading Team at http://www.pgdp.net
>
>
> Where the first characters of stanzas were "illuminated",
> they were eliminated and never replaced in the .htm version.
>


From marcello at perathoner.de  Wed Apr 14 11:09:07 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 14 Apr 2010 20:09:07 +0200
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <52E0A8172A5F417986C50DAFE5D1132A@alp2400>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
Message-ID: <4BC604C3.2000804@perathoner.de>

Al Haines (shaw) wrote:

> Actually, it's fairly common practice that if a paragraph/verse starts 
> with some kind of graphical/illuminated character, the actual character 
> it stands for is not included in the HTML version.

And that makes the HTML pretty useless for further processing like 
conversion to mobile formats.

It should be made a requirement that the stream of non-markup-characters 
be identical in all versions of an ebook:

   lynx --dump

should produce a text that wdiffs equal with the text version.
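Marcello's check uses lynx and wdiff. As a rough approximation that needs neither tool, the Python sketch below pulls the character data out of the HTML with the standard html.parser module. It then reports word-level differences from the plain text using difflib. This is an assumption for illustration, not how PG actually validates files, and the set of tags treated as word breaks is a guess. A dropped illuminated letter shows up as "he" versus "The". A stray space after a drop-cap shows up as "A mid" versus "Amid".

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    # Tags assumed to separate words; inline tags such as <span> do not.
    BREAKS = {"br", "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; into "\xa0"
        self.parts = []
        self._skip = 0      # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BREAKS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in self.BREAKS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_words(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).split()  # str.split() also splits on "\xa0"

def word_differences(html, text):
    """Return (op, html_words, text_words) for every non-matching run of words."""
    a, b = html_words(html), text.split()
    diffs = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if op != "equal":
            diffs.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs
```

An empty result means the two versions carry the same stream of words, which is the property Marcello proposes to require.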


-- 
Marcello Perathoner
webmaster at gutenberg.org

From jhowse at nf.sympatico.ca  Wed Apr 14 11:20:45 2010
From: jhowse at nf.sympatico.ca (Jeannie Howse)
Date: Wed, 14 Apr 2010 15:50:45 -0230
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <4BC604C3.2000804@perathoner.de>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
	<4BC604C3.2000804@perathoner.de>
Message-ID: <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca>

At 03:39 PM 14/04/2010, you wrote:
>Al Haines (shaw) wrote:
>
>>Actually, it's fairly common practice that if a paragraph/verse 
>>starts with some kind of graphical/illuminated character, the 
>>actual character it stands for is not included in the HTML version.
>
>And that makes the HTML pretty useless for further processing like 
>conversion to mobile formats.
>
>It should be made a requirement that the stream of 
>non-markup-characters be identical in all versions of an ebook:
>
>   lynx --dump
>
>should produce a text that wdiffs equal with the text version.

and it does. stripping this:

<p><span class="dropcapc">&nbsp;</span><span 
class="dropcap">T</span>he children were nestled all snug in their beds,<br />
While visions of sugar-plums danced in their heads;<br />
And mamma in her kerchief, and I in my cap,<br />
Had just settled our brains for a long winter's nap,<br /><br />
</p>

gives you this:

The children were nestled all snug in their beds,
While visions of sugar-plums danced in their heads;
And mamma in her kerchief, and I in my cap,
Had just settled our brains for a long winter's nap,

JHowse


================================================================================
"Turning a Picture into a thousand words"Preserving History One Page 
at a Time!!
Celebrating more than 17,350 books posted to Project Gutenberg!
Join Project Gutenberg's Distributed Proofreaders http://www.pgdp.net/c/
================================================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100414/35c247f6/attachment.html>

From azkar0 at gmail.com  Wed Apr 14 11:25:06 2010
From: azkar0 at gmail.com (Scott Olson)
Date: Wed, 14 Apr 2010 12:25:06 -0600
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
	<4BC604C3.2000804@perathoner.de>
	<20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca>
Message-ID: <q2m2362473e1004141125wf4743225q7b9e39a35c8ece0f@mail.gmail.com>

On Wed, Apr 14, 2010 at 12:20 PM, Jeannie Howse <jhowse at nf.sympatico.ca>wrote:

>
> and it does.
>

Except where some unfortunately placed white space gives you stuff like:

A  mid the many celebrations..

:)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100414/863a6ec4/attachment.html>

From greg at durendal.org  Wed Apr 14 13:17:16 2010
From: greg at durendal.org (Greg Weeks)
Date: Wed, 14 Apr 2010 16:17:16 -0400 (EDT)
Subject: [gutvol-d] [SPAM] Re: Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <4BC604C3.2000804@perathoner.de>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
	<4BC604C3.2000804@perathoner.de>
Message-ID: <alpine.DEB.2.00.1004141616290.3369@durendal.durendal.org>

On Wed, 14 Apr 2010, Marcello Perathoner wrote:

> And that makes the HTML pretty useless for further processing like conversion 
> to mobile formats.

I've bitched about this before at DP and it got me nowhere. I didn't 
really jump up and down either though.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From hart at pglaf.org  Wed Apr 14 14:10:19 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 14 Apr 2010 14:10:19 -0700 (PDT)
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
	<4BC604C3.2000804@perathoner.de>
	<20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca>
Message-ID: <alpine.DEB.2.00.1004141408170.8646@mail.pglaf.org>


If the between-stanza illustrations are so easily included,
then why not the illuminated characters?

6 of one, half a dozen of the other. . .eh?

Me. . .I would include both the plain ascii letter AND the graphic letter.

"Fairly common practice" = "UNfairly common practice". . . . .


On Wed, 14 Apr 2010, Jeannie Howse wrote:

> At 03:39 PM 14/04/2010, you wrote:
>       Al Haines (shaw) wrote:
>
>             Actually, it's fairly common practice that if a
>             paragraph/verse starts with some kind of
>             graphical/illuminated character, the actual
>             character it stands for is not included in the
>             HTML version.
>
>
>       And that makes the HTML pretty useless for further processing
>       like conversion to mobile formats.
>
>       It should be made a requirement that the stream of
>       non-markup-characters be identical in all versions of an
>       ebook:
>
>         lynx --dump
>
>       should produce a text that wdiffs equal with the text version.
>
>
> and it does. stripping this:
>
> <p><span
> class="dropcapc">&nbsp;</span><span
> class="dropcap">T</span>he children were nestled all
> snug in their beds,<br />
> While visions of sugar-plums danced in their heads;<br />
> And mamma in her kerchief, and I in my cap,<br />
> Had just settled our brains for a long winter's nap,<br /><br
> />
> </p>
>
> gives you this:
>
> The children were nestled all snug in their beds,
> While visions of sugar-plums danced in their heads;
> And mamma in her kerchief, and I in my cap,
> Had just settled our brains for a long winter's nap,
>
> JHowse
>
>
> ==========================================================================
> "Turning a Picture into a thousand words" Preserving History One Page at a Time!!
>      Celebrating more than 17,350 books posted to Project Gutenberg!
>  Join Project Gutenberg's Distributed Proofreaders http://www.pgdp.net/c/
> ==========================================================================
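The requirement quoted above (the lynx --dump output of the HTML should wdiff equal to the text version) can be approximated in a few lines without lynx or wdiff. This is a sketch under stated assumptions, not any tool PG actually runs: it uses Python's standard html.parser to keep only character data, treats a handful of block-level tags as word breaks (an assumed list), and compares the two word sequences with difflib. The names TextCollector, BLOCK_TAGS, html_words and word_diff are hypothetical.

```python
import difflib
from html.parser import HTMLParser

# Assumption: these tags separate words even when no whitespace surrounds them.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextCollector(HTMLParser):
    """Keeps character data and drops markup (a rough stand-in for lynx --dump)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; arrives as U+00A0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def html_words(page: str) -> list[str]:
    collector = TextCollector()
    collector.feed(page)
    collector.close()
    # Inline tags add nothing, so <span>T</span>he rejoins as "The";
    # str.split() also treats U+00A0 as whitespace.
    return "".join(collector.chunks).split()


def word_diff(page: str, text: str) -> list[str]:
    """Lists differing word runs; an empty list means the two streams agree."""
    a, b = html_words(page), text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        f"{op}: {' '.join(a[i1:i2])!r} -> {' '.join(b[j1:j2])!r}"
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


snippet = (
    '<p><span class="dropcapc">&nbsp;</span><span class="dropcap">T</span>he children '
    "were nestled all snug in their beds,<br />\n"
    "While visions of sugar-plums danced in their heads;<br />\n</p>"
)
plain = (
    "The children were nestled all snug in their beds,\n"
    "While visions of sugar-plums danced in their heads;\n"
)
print(word_diff(snippet, plain))  # → []
```

If the drop cap were an image with no real letter in the markup, the same check would report a "replace" of "he" by "The", which is exactly the case Al describes.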

From dakretz at gmail.com  Wed Apr 14 14:58:21 2010
From: dakretz at gmail.com (don kretz)
Date: Wed, 14 Apr 2010 14:58:21 -0700
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <alpine.DEB.2.00.1004141408170.8646@mail.pglaf.org>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
	<4BC604C3.2000804@perathoner.de>
	<20100414182048.HWG9392.torspm04.toronto.rmgopenwave.com@Jeannie-PC.nf.sympatico.ca>
	<alpine.DEB.2.00.1004141408170.8646@mail.pglaf.org>
Message-ID: <h2h627d59b81004141458h852e96d7gf0dc8f1499e8ac06@mail.gmail.com>

PG texts seem to be distributed to readers
by a number of different channels. In a sense,
PG has become the dominant wholesaler with
a number of retailers. And PG also provides
direct distribution.

Source texts are provided to PG by DP (with
trivial exceptions) in two formats: plain text
and HTML. But PG and other mediators distribute
ebooks in a variety of different formats; and
given the variety of devices, readers are
requiring a number of other formats. This will
if anything be increasingly true. But all these
ebook formats must somehow be derived, through
one or more transformation processes, from one
or the other of the two originals.

Here are my naive, uninformed perceptions of
the trends of what's happening among four different
segments: Untransformed plain-text, transformed
plain-text, untransformed HTML, and transformed
HTML.

1. Readers who read ebooks using
the original plain-text versions, distributed
directly or indirectly, are a significant but
declining proportion of the whole.

2. Readers who read ebooks using
the original HTML versions, distributed
directly or indirectly, are a significant
proportion of the whole, not declining as
rapidly, but still declining (because those
versions require a real browser and a large-enough
screen to read with any level of fidelity).

3. Some proportion of readers are reading ebooks
derived from plain-text versions but transformed
using some kind of software to infer formatting.
I suspect this proportion is declining as well;
it's hard to do, and readers increasingly
expect more from ebooks on their increasingly
sophisticated devices.

4. That leaves the rest, who are reading ebooks
derived from the original HTML versions. My
suspicion is that the majority of ebooks are
already provided this way, and (especially with
the increasing acceptance of de jure and de facto
sub-HTML standards) this will only increase.

How accurate is this assessment?

Based on the distribution among these four segments,
should PG and DP make any changes in the way
ebooks are prepared and supplied?

From Bowerbird at aol.com  Wed Apr 14 15:42:20 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 14 Apr 2010 18:42:20 EDT
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
Message-ID: <30c96.47708041.38f79ecc@aol.com>

dakretz said:
>   How accurate is this assessment?

it's half-assed accurate.

mostly because it's looking at the p.g. corpus
from the standpoint of its two major file-types.

but we'll want to look at it from the perspective
of the users who will access it, and how...

(the answer to that is mobile, mobile, and mobile.)

moreover, some of your points, even as given, are wrong...


>    1.   original plain-text.   significant but declining strongly.

no.   significant and _increasing._   (a rising tide lifts all boats.)


>    2.   original .html files.   significant but declining slightly.

no.   significant and _increasing._   (same tide, different boat.)


>    3.   plain-text derivatives.   declining.

dead wrong.   this is the segment that is increasing fastest.

most of the places that make derivatives use the plain-text.


>    4.   .html derivatives.   significant and increasingly so.

well, not quite "dead-wrong", but still wrong nonetheless.

the .html files have far too little consistency to be used in
a systematic creation of derivatives, not without glitches...

some places use the .html file, but then "fall back" to the
plain-text version if they see problems with the derivative.

but most can't spend that much energy on quality-control,
so they've resigned themselves to using the plain-text files.

which is not that big of a sacrifice, to be perfectly honest...

indeed, the system giving the most consistently best results
is the iphone viewer-app "eucalyptus", which utilizes _only_
the plain-text files; his converter is giving very good output.

and, to help get people's heads on, and completely straight,
it's good to remind everyone that many of the .html files are
the result of a straight-out conversion of the plain-text file.

and these files, because they're machine-generated, _are_
consistent enough to be used in a systematic conversion...
it's the "hand-crafted" ones that cause all of the problems,
which is something that i first pointed out many years ago.

when problems with the auto-generated .html files do occur,
it's usually due to an underlying glitch in the plain-text file.

so auto-conversion of plain-text is the best way to proceed.
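bowerbird's argument is that machine conversion of the plain-text file is the dependable route. A minimal sketch of such a converter follows, assuming only a few basic PG plain-text conventions (blank lines between paragraphs, hard-wrapped lines, "--" for a dash, _underscores_ around italics). The function name text_to_html is hypothetical, and this is not the converter used by any of the sites mentioned in this thread.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Turns blank-line-separated plain text into <p> elements (sketch only)."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        # Unwrap the hard line breaks inside a paragraph.
        joined = " ".join(line.strip() for line in para.splitlines())
        escaped = html.escape(joined, quote=False)
        # A lone "--" is the plain-text dash; emit the numeric reference for an em dash.
        escaped = re.sub(r"(?<!-)--(?!-)", "&#8212;", escaped)
        # _word_ marks italics in many PG texts.
        escaped = re.sub(r"_([^_\n]+)_", r"<i>\1</i>", escaped)
        out.append(f"<p>{escaped}</p>")
    return "\n".join(out)


sample = "It was--so they say--_very_ cold.\nThe end came.\n\nFish & chips."
print(text_to_html(sample))
```

Note that it unwraps verse along with prose; a usable converter would also need rules for poetry, headings, tables and front matter.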

and i've maintained for 7 years now that such a conversion is
not just _possible_, but our best course of action to follow...

for several years after i started, i left my argument unproven,
just to see who would jump at the bait and try to dispute it...

after destroying all that opposition, i have since proven that
it is indeed possible to use a plain-text file as your "master".

why y'all continue to ignore this proof, i simply do not know.

but i'll keep making the case, until all of you can see it clearly.

-bowerbird

From cannona at fireantproductions.com  Wed Apr 14 16:46:43 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Wed, 14 Apr 2010 18:46:43 -0500
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now
	available for testing
In-Reply-To: <x2x628c29181004132117r7a410a9dpf8ed49e292e4a548@mail.gmail.com>
References: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>
	<20100413073253.GA1262@pglaf.org> <4BC5241A.2040409@teksavvy.com>
	<x2x628c29181004132117r7a410a9dpf8ed49e292e4a548@mail.gmail.com>
Message-ID: <l2q628c29181004141646y450f5b85m4acc0e332bb1694b@mail.gmail.com>

Hi all.

Thanks to all those who downloaded and sent me feedback.

I figured out what was causing the broken links.  Turns out that when
I was creating an ISO file, all the "-"s in filenames were being
changed to "_"s, so as to meet the ISO9660 standard, which doesn't
permit filenames to contain "-".  So, 3533-h.zip was being changed to
3533_h.zip.  So, I am having to change the HTML to deal with this.
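Aaron doesn't say how he is changing the HTML. One way to do it, sketched here under the assumption that the ISO tool's only change is "-" to "_" in file names (as he describes) and that links are written as href="..." with double quotes, is to apply the same renaming to every relative link target while leaving #fragments and external URLs alone. The names HREF, iso_safe_target and iso_safe_links are hypothetical; a regex is used for brevity where a real HTML parser would be more robust.

```python
import re

HREF = re.compile(r'href="([^"]+)"')


def iso_safe_target(target: str) -> str:
    """Applies the '-' -> '_' renaming to the path, but not to a #fragment."""
    path, hash_mark, fragment = target.partition("#")
    return path.replace("-", "_") + hash_mark + fragment


def iso_safe_links(page: str) -> str:
    def fix(match):
        target = match.group(1)
        if "://" in target or target.startswith("mailto:"):
            return match.group(0)  # external links point off the disc; leave them alone
        return f'href="{iso_safe_target(target)}"'

    return HREF.sub(fix, page)


page = (
    '<a href="3/5/3/3533/3533-h.zip">HTML zip</a> '
    '<a href="3533-h.htm#chap-1">Chapter 1</a> '
    '<a href="http://www.gutenberg.org/etext/3533">online</a>'
)
print(iso_safe_links(page))
```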

I should have Release Candidate 2 ready for upload tomorrow.  I'll
keep everyone posted.

Thanks.

Aaron

On 4/13/10, Aaron Cannon <cannona at fireantproductions.com> wrote:
> It looks like the HTML was changed somehow from my local copy.  For
> instance, my local copy has no content directory.  All of the numbered
> directories are in the root, though come to think of it, it probably
> would have been better to create a content directory.  Nevertheless, I
> believe that it must have been altered to serve the books from a
> content directory when it was placed on the web server.  So, to answer
> your concern, the link works on the real ISO.
>
> Also, I don't think I want to mess with the books we've already got on
> there.  It won't really make much difference in size, and I don't want
> to add books at this point, so I guess I just don't see any compelling
> reason to mess with it.  Thanks for the suggestions though!
>
> Thanks a lot for looking.  I really appreciate the feedback!
>
> If you find anything else, please let me know.  Also, maybe Greg can
> comment on the 404 error.
>
> Thanks.
>
> Aaron
>
>
>
> On 4/13/10, Gardner Buchanan <gbuchana at teksavvy.com> wrote:
>> Hi Aaron,
>>
>> Similarly, I have found that this link does not work:
>>
>> http://pglaf.org/PGDVD201004-RC1/content/3/5/3/3533/3533-h.zip
>>
>> followed from here:
>> http://pglaf.org/PGDVD201004-RC1/content/etext/3533.html
>>
>> If you have to muck with things, I also noticed that the following
>> section includes the separate volumes 1 and 2 of _The Attache_ as
>> well as the omnibus.  Likely only the two volumes of the omnibus
>> are really needed.
>>
>> Canada -- Social life and customs -- Fiction
>>
>>      * Sunshine Sketches of a Little Town (English) By Leacock, Stephen, 1869-1944
>>      * The Attaché; or, Sam Slick in England -- Complete (English) By Haliburton, Thomas Chandler, 1796-1865
>>      * The Attaché; or, Sam Slick in England -- Volume 01 (English) By Haliburton, Thomas Chandler, 1796-1865
>>      * The Attaché; or, Sam Slick in England -- Volume 02 (English) By Haliburton, Thomas Chandler, 1796-1865
>>
>>
>> On 13-Apr-2010 03:32, Greg Newby wrote:
>>>
>>> Has a link checker been run on this?  I quickly found a missing
>>> file via file:///PGDVD_2010_04_RC1/etext/3002.html
>>> and wonder whether some filetypes or other content might not
>>> have made it.
>>
>> ============================================================
>> Gardner Buchanan                     <gbuchana at teksavvy.com>
>> Ottawa, ON             FreeBSD: Where you want to go. Today.
>>
>

From gbuchana at teksavvy.com  Wed Apr 14 16:57:44 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Wed, 14 Apr 2010 19:57:44 -0400
Subject: [gutvol-d] Re: Question about Scanned Books
In-Reply-To: <B95B1D442F314C5AA8FA691C64E16137@alp2400>
References: <1266262684-sup-4204@zion>
	<B95B1D442F314C5AA8FA691C64E16137@alp2400>
Message-ID: <4BC65678.7030105@teksavvy.com>

This part of Al's advice should indeed be taken seriously by
those contemplating a solo project. Twice I have been caught out:
I worked on a project for which I had a valid prior clearance,
only to find that someone else had done a duplicate project.

I am deeply grateful to David for the work he does on the in-progress
list, but I find it difficult to use. In any event, it depends on
others using it effectively too, and on their making some effort to
follow up with the prior clearance holder as well. My experience is
that this is not a reliable string of assumptions.

I have been advocating off and on for a more accurate and up-to-date
in-progress mechanism that would be driven from the core
information that forms the PG clearance database.

The process I apply when deciding to do a project is:

(1) Search for the work online in the "usual" places:
  - king kong http://www.kingkong.demon.co.uk/ngcoba/ngcoba.htm
  - online books page http://onlinebooks.library.upenn.edu/new.html
  - PG's catalogue

Search using the author and title.  Find the author and title
on a large catalogue like the LOC or AMICUS and also search
for variant names and such.

(2) Go to the PG-DP, DP-Europe and DP-Canada sites and search
the forums for any talk about my proposed project.

(3) Look in David's list.

(4) Obtain a PG clearance.

If a book I am working on is something that might be relevant to
PG Canada, I *also* obtain a clearance from them.  In the case
of parallel clearances I inform both parties that this is going
on.

(5) Look *again* in David's list to see that my project appears.

(6) Start work on my project.

This may seem somewhat paranoid, but I don't intend *again* to
be caught out this way.


On 14-Apr-2010 12:26, Al Haines (shaw) wrote:

> Before beginning work on any book, you should check that it's not
> already in Project Gutenberg, and not being worked on by someone else by
> checking David Price's In-progress list at
> http://www.dprice48.freeserve.co.uk/GutIP.html.


============================================================
Gardner Buchanan                     <gbuchana at teksavvy.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.

From gbuchana at teksavvy.com  Wed Apr 14 17:38:27 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Wed, 14 Apr 2010 20:38:27 -0400
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <30c96.47708041.38f79ecc@aol.com>
References: <30c96.47708041.38f79ecc@aol.com>
Message-ID: <4BC66003.80201@teksavvy.com>

While I agree more with BB's conjecture than with Don's, I have seen
no real statistical evidence on either side.

My own experience, which is very old now, is of encountering
titles in Palm-compatible formats that had manifestly been derived
mechanically from the PG plain-text versions.  This is just
an anecdotal point, but it matches BB's "eucalyptus" data
point.

This doesn't seem too hard to research, though.  For grins
I rummaged around for e-book versions of something I am
familiar with.  I found two separate conversions of
_Sunshine Sketches_ by Leacock.  Despite the existence
of a nice HTML version by David Widger, both the PDF and
HTML versions I found were based on the PG text version,
using the text version of the TOC and the double-hyphen
form of em-dashes.  So there are two more random
data points in BB's column.
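The dash tell-tale Gardner uses is easy to automate. A rough sketch, assuming the candidate ebook's text has already been extracted to a string: it counts plain-text style "--" dashes against real em dashes (U+2014). The names DOUBLE_HYPHEN, dash_counts and looks_text_derived and the 0.9 threshold are arbitrary, hypothetical choices rather than an established test, and the sketch does not attempt the table-of-contents check.

```python
import re

# A lone "--" (not part of a longer run such as a "-----" rule line).
DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")
EM_DASH = "\u2014"


def dash_counts(text: str) -> tuple[int, int]:
    """Returns (double-hyphen dashes, em dashes) found in text."""
    return len(DOUBLE_HYPHEN.findall(text)), text.count(EM_DASH)


def looks_text_derived(text: str, min_ratio: float = 0.9) -> bool:
    """Guesses whether a conversion came from the PG plain-text file."""
    double_hyphens, em_dashes = dash_counts(text)
    total = double_hyphens + em_dashes
    if total == 0:
        return False  # no dashes at all, so this heuristic cannot tell
    return double_hyphens / total >= min_ratio


print(looks_text_derived("He paused--then left--quickly."))    # → True
print(looks_text_derived("He paused\u2014then left quickly."))  # → False
```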


On 14-Apr-2010 18:42, Bowerbird at aol.com wrote:
> dakretz said:
>  > How accurate is this assessment?
>
> it's half-assed accurate.
>

============================================================
Gardner Buchanan                     <gbuchana at teksavvy.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.

From greg at durendal.org  Wed Apr 14 17:48:50 2010
From: greg at durendal.org (Greg Weeks)
Date: Wed, 14 Apr 2010 20:48:50 -0400 (EDT)
Subject: [gutvol-d] [SPAM] Re: Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <4BC66003.80201@teksavvy.com>
References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com>
Message-ID: <alpine.DEB.2.00.1004142043530.5281@durendal.durendal.org>

On Wed, 14 Apr 2010, Gardner Buchanan wrote:

> While I agree more with BB's conjecture, than Don's I have seen
> no real statistical evidence on either side.
>
> My own experience, which is very old now, is of encountering
> titles in Palm-compatible formats that had manifestly been derived
> mechanically from the PG plain-text versions.  This is just
> an anecdotal point, but it matches BB's "eucalyptus" data
> point.

Another couple of anecdotal points. There are two paper publishers I've
worked with a bit, not recently but a couple of years ago. Both had
scripts that took the plain text and let them typeset it in a couple of
hours. They never used the HTML because it threw too many exceptions
that required manual input to resolve, and therefore took a lot longer to
typeset. The proofreading afterward, to make sure nothing got messed up,
also took longer.
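Greg doesn't describe what the publishers' scripts did. As a hypothetical illustration of why plain text goes into a typesetter so easily, here is a minimal sketch that turns blank-line-separated paragraphs into LaTeX body text: it escapes LaTeX's special characters in a single pass and maps each lone "--" to "---", which LaTeX sets as an em dash. The names LATEX_SPECIALS, escape_latex and text_to_latex are invented for this example; headings, verse, footnotes and _italics_ are not handled.

```python
import re

LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
# One alternation, so braces produced by a replacement are never escaped again.
SPECIAL_RE = re.compile("|".join(re.escape(c) for c in LATEX_SPECIALS))


def escape_latex(s: str) -> str:
    return SPECIAL_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], s)


def text_to_latex(text: str) -> str:
    """Blank-line-separated plain text to LaTeX paragraphs (sketch only)."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    body = []
    for para in paragraphs:
        joined = " ".join(line.strip() for line in para.splitlines())
        escaped = escape_latex(joined)
        # A lone "--" is the plain-text dash; LaTeX renders "---" as an em dash.
        body.append(re.sub(r"(?<!-)--(?!-)", "---", escaped))
    return "\n\n".join(body)


print(text_to_latex("Cost: $5 & 10% off--today.\nOnly.\n\nNext"))
```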

-- 
Greg Weeks
http://durendal.org:8080/greg/


From dakretz at gmail.com  Wed Apr 14 17:53:34 2010
From: dakretz at gmail.com (don kretz)
Date: Wed, 14 Apr 2010 17:53:34 -0700
Subject: [gutvol-d] {Disarmed} Re: [SPAM] Re: Re: !@! #17135 Twas The Night
	Before Christmas
In-Reply-To: <alpine.DEB.2.00.1004142043530.5281@durendal.durendal.org>
References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com>
	<alpine.DEB.2.00.1004142043530.5281@durendal.durendal.org>
Message-ID: <s2i627d59b81004141753qf7938b5aoa578c0d10edb3883@mail.gmail.com>

Does anyone know of any epublisher other than PG that *does* distribute the
html we provide?

Don

On Wed, Apr 14, 2010 at 5:48 PM, Greg Weeks <greg at durendal.org> wrote:

> On Wed, 14 Apr 2010, Gardner Buchanan wrote:
>
>  While I agree more with BB's conjecture, than Don's I have seen
>> no real statistical evidence on either side.
>>
>> My own experience, which is very old now, is of encountering
>> titles in Palm-compatible formats that had manifestly been derived
>> mechanically from the PG plain-text versions.  This is just
>> an anecdotal point, but it matches BB's "eucalyptus" data
>> point.
>>
>
> Another couple of anecdotal points. There are two paper publishers I've
> worked with a bit. Not recently, but a couple of years ago. Both had scripts
> to take the plain text and allow them to typeset in a couple of hours. They
> didn't use the html ever because it threw too many exceptions that required
> hand input to resolve, and therefore took a lot longer to get typeset. The
> proofread after to make sure nothing got messed up took longer.

From gbuchana at teksavvy.com  Wed Apr 14 18:07:12 2010
From: gbuchana at teksavvy.com (Gardner Buchanan)
Date: Wed, 14 Apr 2010 21:07:12 -0400
Subject: [gutvol-d] Re: [SPAM] Re:  slightly off topic, first post, scanning
In-Reply-To: <4BC5D467.4000407@telkomsa.net>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com>	<alpine.DEB.2.00.1004140741020.575@durendal.durendal.org>
	<4BC5D467.4000407@telkomsa.net>
Message-ID: <4BC666C0.9040300@teksavvy.com>

Hi Jon,

Nope, I think you're part of a pretty popular movement there.
There's a whole cottage industry of building home-made
book scanners that consist of a jig to hold the book and
a pair of digital cameras positioned to capture the two
facing pages.  Look at http://www.diybookscanner.org/

Personally, I still use a flatbed, but that's because
I'm a Luddite.

On 14-Apr-2010 10:42, Jon Richfield wrote:
> FWIW, Since I bought myself a digital camera for general use, plus a
> copy of Omniscan, my scanner has been pretty well idle.
[...]
>
> Is my choice unusual in any way?
>


============================================================
Gardner Buchanan                     <gbuchana at teksavvy.com>
Ottawa, ON             FreeBSD: Where you want to go. Today.

From dakretz at gmail.com  Wed Apr 14 18:09:39 2010
From: dakretz at gmail.com (don kretz)
Date: Wed, 14 Apr 2010 18:09:39 -0700
Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning
In-Reply-To: <4BC666C0.9040300@teksavvy.com>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com>
	<alpine.DEB.2.00.1004140741020.575@durendal.durendal.org>
	<4BC5D467.4000407@telkomsa.net> <4BC666C0.9040300@teksavvy.com>
Message-ID: <t2w627d59b81004141809u3903f3bdu9ee432be5992daca@mail.gmail.com>

I bought a crappy digital camera to use
"most of the time". It didn't even come with
a manual - just a url to download it.

But it did have instructions on how to
scan a book.

Don

On Wed, Apr 14, 2010 at 6:07 PM, Gardner Buchanan <gbuchana at teksavvy.com>wrote:

> Hi Jon,
>
> Nope, I think you're part of a pretty popular movement there.
> There's a whole cottage industry of building home-made
> book scanners that consist of a jig to hold the book and
> a pair of digital cameras positioned to capture the two
> facing pages.  Look at http://www.diybookscanner.org/
>
> Personally, I still use a flatbed, but that's because
> I'm a Luddite.
>
>
> On 14-Apr-2010 10:42, Jon Richfield wrote:
>
>> FWIW, Since I bought myself a digital camera for general use, plus a
>> copy of Omniscan, my scanner has been pretty well idle.
>>
> [...]
>
>
>> Is my choice unusual in any way?
>>
>>

From sparr0 at gmail.com  Wed Apr 14 18:25:31 2010
From: sparr0 at gmail.com (Sparr)
Date: Wed, 14 Apr 2010 21:25:31 -0400
Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning
In-Reply-To: <4BC666C0.9040300@teksavvy.com>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com> 
	<alpine.DEB.2.00.1004140741020.575@durendal.durendal.org> 
	<4BC5D467.4000407@telkomsa.net> <4BC666C0.9040300@teksavvy.com>
Message-ID: <y2j811b48bd1004141825hf36875f2y134cedc6a3c46767@mail.gmail.com>

I have a few hundred thousand pages to scan, so a diy camera-style
book scanner isn't appropriate, nor is a flatbed scanner.  Thanks for
the ideas, though.

On Wed, Apr 14, 2010 at 10:42 AM, Jon Richfield <richfield at telkomsa.net> wrote:
> FWIW, Since I bought myself a digital camera for general use, plus a copy of
> Omniscan, my scanner has been pretty well idle.

On Wed, Apr 14, 2010 at 9:07 PM, Gardner Buchanan <gbuchana at teksavvy.com> wrote:
> Nope, I think you're part of a pretty popular movement there.
> There's a whole cottage industry of building home-made
> book scanners that consist of a jig to hold the book and
> a pair of digital cameras positioned to capture the two
> facing pages.  Look at http://www.diybookscanner.org/
>
> Personally, I still use a flatbed, but that's because
> I'm a Luddite.

From Bowerbird at aol.com  Wed Apr 14 20:22:48 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 14 Apr 2010 23:22:48 EDT
Subject: [gutvol-d] [SPAM] re: !@! #17135 Twas The Night Before Christmas
Message-ID: <41239.6b00c426.38f7e088@aol.com>

gardner said:
>    So there's two more random data points in BB's column.

thanks, gardner.   but there's no need to be "random".

the big sites using p.g. e-books are very easy to find,
because they are... um... big.

the first, and foremost, of course, was "black mask".
david moynihan had the scripts down to a science,
and even offered them to p.g. at one point in time,
an offer that was declined, for some stupid reason
that was not _just_ stupid, but asininely ridiculous.
(and no, i don't even know what that reason was.)

next up is "manybooks".   matthew's converters are
not nearly as good as david's, but he seems to have
many loyal users who pick up their books from him,
probably because he's always been very good about
supporting the widest possible array of machinery...

both of these providers have been using books from
p.g. going back for many years, so the fact that they
were using the plain-text versions might be due to
their history...   that is, they might just be in a rut...

but the newest big provider on the block nowadays
is "feedbooks".   and as far as i can tell, hadrien also
uses the plain-text version as his "starter" version...
makes sense, since he stores the text in a database,
so it does no good to have somebody else's markup.

the number of files being downloaded from these sites
these days -- thanks to the kindle/iphone/ipad trio --
is downright _stunning_.   hadrien at feedbooks says
he's running at about 75,000 downloads every _day_,
with 2.5 million in march.   several individual books
have over 10k downloads, some 20k, and one 30k...
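The two feedbooks figures quoted here are roughly consistent with each other. March has 31 days, so:

```latex
75{,}000 \times 31 = 2{,}325{,}000
\qquad\text{and}\qquad
\frac{2{,}500{,}000}{31} \approx 80{,}645 \text{ downloads per day}
```

That is, 2.5 million in March works out to a little over 80,000 a day, somewhat above the "about 75,000" daily figure.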

***

dakretz said:
>    Does anyone know of any epublisher other than PG 
>    that *does* distribute the html we provide?

well, in a way.   apple grabs the .epubs from here.
if i'm not mistaken, marcello uses the .html file,
if one exists.   but if not, he uses the plain-text...
i'm quite sure the .epub files are full of ugliness.

and mike cook and his "epubbooks" site might use
the .html file as his "starter", you'd have to ask him.
he hasn't done books "en masse", though, and thus
hasn't had to face the problems from inconsistencies.

-bowerbird

From richfield at telkomsa.net  Thu Apr 15 01:17:15 2010
From: richfield at telkomsa.net (Jon Richfield)
Date: Thu, 15 Apr 2010 10:17:15 +0200
Subject: [gutvol-d] Re: [SPAM] Re: slightly off topic, first post, scanning
In-Reply-To: <t2w627d59b81004141809u3903f3bdu9ee432be5992daca@mail.gmail.com>
References: <811b48bd1002131127pa394f82g13bfdb667f511479@mail.gmail.com>	<alpine.DEB.2.00.1004140741020.575@durendal.durendal.org>	<4BC5D467.4000407@telkomsa.net>
	<4BC666C0.9040300@teksavvy.com>
	<t2w627d59b81004141809u3903f3bdu9ee432be5992daca@mail.gmail.com>
Message-ID: <4BC6CB8B.1090005@telkomsa.net>

Thanks Don, I suppose I could have done the research to find that out
myself, but it never occurred to me to do so. What you say shows that
the concept is by now very routine. Hardly surprising; in university
libraries, and even surreptitiously in bookshops, energetic anti-Luddites
are busy snapping away at books and articles.

Gardner,
Thanks for the URL. Many of the devices it illustrates certainly are
impressive, but all demand more elaborate mechanisms than I have
hitherto so much as considered. I am not saying that this disqualifies
them from reasonable consideration, and certainly if I am working in
poor light it is necessary to scrounge a reading light, but so far I
have managed acceptably with a hand-held camera for a few pages at a
time for corrections or bits of newsprint etc., and the mutilated
polystyrene table for a stand. I have occasionally used a tripod, which
can be more suitable for some purposes. Another trick that is useful for
more portable requirements is to use a stick such as a walking stick as a
means of steadying the camera, a sort of monopodal tripod. It enables
one to keep the camera steady enough for most purposes, plus controlling
the distance well enough to use manual focus, which conserves battery
power, gives more consistent results and increases speed.
If I had to do it again, I probably would have chosen something taller;
my little table is only 40 cm high, and 60-80 cm would give less
distortion and more even focus. It is no problem for little paperbacks,
but large pages are not so good, so I have to work out something to
raise the level. I haven't done much scanning lately, but soon I may
consider carving up an inverted dustbin or something if I can't find a
higher small table.
One thing I have not yet found is anything that I can use as a
non-reflective overlay to flatten the pages without degrading the image
or causing reflections. Picture glass doesn't work, unless there is a
new grade that I don't know of.
Something I have not yet got round to obtaining or jury-rigging is one
of those nice little cable-attached plungers, or better, a foot pedal
for taking the snaps. After a few hundred pages, groping for the button
is a nuisance.

Sparr,
> I have a few hundred thousand pages to scan, so a diy camera-style
> book scanner isn't appropriate, nor is a flatbed scanner. Thanks for the
> ideas, though.

You are welcome. I did wonder. Then I assume that you are using a 
mechanical feed scanner. If so it is simply a matter of guillotining or 
otherwise amputating the gluey bits.
Someone once said something like: "If you can neither avoid it nor fix 
it, don't worry about it; it isn't a problem; it's reality."
Or as they said long ago, "What can't be cured must be endured."
Now, I don't know your circumstances, so everything I say is highly 
context sensitive, and please don't bite me if I tell you obviosities 
that have nothing to do with your needs and constraints (not to mention 
tastes, as Gardner instanced.)  BUT if the material cannot reasonably be 
chopped or automatically handled, then it might be time to reconsider.
How many pages per second  . . . AVERAGE, INCLUDING dealing with jams 
and messes . . . does your automated glue-hating system read clean?
If you cannot comfortably produce properly readable, OCRable pages at 
better than one per second, then you had better think of a few hundred 
thousand seconds.
One page, or half a page, per second is in any case what a camera with a
system like mine could give you, once you are up to speed. I have occasionally
torn glued pages apart for photographic work, but for me that was no problem,
so guillotining and trimming did not come into it.
At one page per second and eight hours per day, you should be able to capture
more than 100,000 pages per 5-day week. It certainly is not nice, but it beats
a "faster" system that does not work, or at least does not work faster.
Just thoughts, together with the thought: "Sooner you than me!"  ;-)
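Jon's throughput estimate is easy to check. A small sketch, assuming a steady average rate (jams included) over the eight-hour, five-day week he describes; the function names are hypothetical, and 300,000 pages is an assumed stand-in for "a few hundred thousand".

```python
SECONDS_PER_HOUR = 3600


def pages_per_week(pages_per_second: float, hours_per_day: float = 8,
                   days_per_week: int = 5) -> float:
    """Pages captured in one working week at a steady average rate."""
    return pages_per_second * hours_per_day * SECONDS_PER_HOUR * days_per_week


def weeks_needed(total_pages: int, pages_per_second: float,
                 hours_per_day: float = 8, days_per_week: int = 5) -> float:
    """Working weeks needed to capture total_pages at that rate."""
    return total_pages / pages_per_week(pages_per_second, hours_per_day, days_per_week)


for rate in (1.0, 0.5):
    print(
        f"{rate} page/s: {pages_per_week(rate):,.0f} pages/week, "
        f"{weeks_needed(300_000, rate):.1f} weeks for 300,000 pages"
    )
```

At one page per second the week comes to 144,000 pages, comfortably over 100,000; at half a page per second it drops to 72,000.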

Go well folks,

Jon


On 2010/04/15 03:09 AM, don kretz wrote:
> I bought a crappy digital camera to use
> "most of the time". It didn't even come with
> a manual - just a url to download it.
>
> But it did have instructions on how to
> scan a book.
>
> Don
>
> On Wed, Apr 14, 2010 at 6:07 PM, Gardner Buchanan 
> <gbuchana at teksavvy.com <mailto:gbuchana at teksavvy.com>> wrote:
>
>     Hi Jon,
>
>     Nope, I think you're part of a pretty popular movement there.
>     There's a whole cottage industry of building home-made
>     book scanners that consist of a jig to hold the book and
>     a pair of digital cameras positioned to capture the two
>     facing pages.  Look at http://www.diybookscanner.org/
>
>     Personally, I still use a flatbed, but that's because
>     I'm a Luddite.
>
>
>     On 14-Apr-2010 10:42, Jon Richfield wrote:
>
>         FWIW, Since I bought myself a digital camera for general use,
>         plus a
>         copy of Omniscan, my scanner has been pretty well idle.
>
>     [...]
>
>
>         Is my choice unusual in any way?
>

From mrcdh58 at gmail.com  Thu Apr 15 02:03:47 2010
From: mrcdh58 at gmail.com (Marc D'Hooghe)
Date: Thu, 15 Apr 2010 11:03:47 +0200
Subject: [gutvol-d] freeliterature.org
Message-ID: <u2m8e04d6361004150203n1c782afel15824fd095deea5a@mail.gmail.com>

Hi there,

A couple of weeks ago I started a new site in support of PG:

http://www.freeliterature.org

Two goals: to spread information about free e-books and literature on the
web (extensive link list), and to make it possible to help produce e-texts by
proofreading.

You can download the scans of a book of your choice, and the text to proof
is sent on demand.

Enjoy.

Marc.

From ricardofdiogo at gmail.com  Thu Apr 15 06:29:01 2010
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Thu, 15 Apr 2010 14:29:01 +0100
Subject: [gutvol-d] Re: PDF-files
In-Reply-To: <1EA29AAF01F04633AE18B692A7AD561D@alp2400>
References: <7527194b0912050812s17b817f5i81e0398905f15c68@mail.gmail.com>
	<1EA29AAF01F04633AE18B692A7AD561D@alp2400>
Message-ID: <t2s9c6138c51004150629h2cfc314ao3950a5dd1defc89a@mail.gmail.com>

Alternatively: http://www.gutenberg.org/wiki/Category:PT_PergFreq

Ricardo

From Bowerbird at aol.com  Thu Apr 15 16:09:22 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 15 Apr 2010 19:09:22 EDT
Subject: [gutvol-d] html/css expert advice sought
Message-ID: <33ee1.72d80d3e.38f8f6a2@aol.com>

here's an example of the code i'm using for multiple columns:

>   http://z-m-l.com/go/2-column-good-xml.html
>    http://z-m-l.com/go/3-column-good-xml.html

if anyone has advice on how to improve that code, i'm all ears...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100415/25192663/attachment.html>

From cannona at fireantproductions.com  Thu Apr 15 16:43:14 2010
From: cannona at fireantproductions.com (Aaron Cannon)
Date: Thu, 15 Apr 2010 18:43:14 -0500
Subject: [gutvol-d] Re: Project Gutenberg DVD Release Candidate now
	available for testing
In-Reply-To: <l2q628c29181004141646y450f5b85m4acc0e332bb1694b@mail.gmail.com>
References: <o2p628c29181004102045ped65b48em6d123d94bc3119fc@mail.gmail.com>
	<20100413073253.GA1262@pglaf.org> <4BC5241A.2040409@teksavvy.com>
	<x2x628c29181004132117r7a410a9dpf8ed49e292e4a548@mail.gmail.com>
	<l2q628c29181004141646y450f5b85m4acc0e332bb1694b@mail.gmail.com>
Message-ID: <p2l628c29181004151643m2e0b7a16n911500fe5f972d9c@mail.gmail.com>

Hi all.

I fixed the reported bugs and the second release candidate is now
available for download.

If you already downloaded the first DVD, you can download a binary
delta which can be used to patch the old one.  You will need xdelta3
for this (on Ubuntu you can just do sudo apt-get install xdelta3).

Use the following commands. First, change to the directory where the
.iso is located, then do:
wget http://www.fireantproductions.com/delta.bin
xdelta3 -d -s pgdvd201004-rc1.iso delta.bin pgdvd042010-rc2.iso
rm delta.bin

and optionally

rm pgdvd201004-rc1.iso

If you haven't downloaded the previous version, or if the delta patch
doesn't work, you can find the new torrent at
http://www.fireantproductions.com/pgdvd042010-rc2.torrent

The md5sum for the new ISO is
B1EA6C8C15BB2EE84126227017F56310
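
For anyone who wants to check a downloaded or patched image against that
checksum from a script, a minimal Python sketch (not part of the DVD tools)
could look like this. It reads the file in 1 MiB chunks so the whole ISO is
never held in memory, and it compares case-insensitively because md5sum
prints lowercase hex while the value above is uppercase. The function names
are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "B1EA6C8C15BB2EE84126227017F56310"  # value posted above

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected digest, ignoring case."""
    return md5_of_file(path).lower() == expected.lower()
```

Usage would be something like `matches("pgdvd042010-rc2.iso")`, using the
output filename from the xdelta3 command above.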

Thanks.

Aaron

On 4/14/10, Aaron Cannon <cannona at fireantproductions.com> wrote:
> Hi all.
>
> Thanks to all those who downloaded and sent me feedback.
>
> I figured out what was causing the broken links.  Turns out that when
> I was creating an ISO file, all the "-"s in filenames were being
> changed to "_"s, so as to meet the ISO9660 standard, which doesn't
> permit filenames to contain "-".  So, 3533-h.zip was being changed to
> 3533_h.zip.  So, I am having to change the HTML to deal with this.
>
> I should have Release Candidate 2 ready for upload tomorrow.  I'll
> keep everyone posted.
>
> Thanks.
>
> Aaron
>
> On 4/13/10, Aaron Cannon <cannona at fireantproductions.com> wrote:
>> It looks like the HTML was changed somehow from my local copy.  For
>> instance, my local copy has no content directory.  All of the numbered
>> directories are in the root, though come to think of it, it probably
>> would have been better to create a content directory.  Nevertheless, I
>> believe that it must have been altered to serve the books from a
>> content directory when it was placed on the web server.  So, to answer
>> your concern, the link works on the real ISO.
>>
>> Also, I don't think I want to mess with the books we've already got on
>> there.  It won't really make much difference in size, and I don't want
>> to add books at this point, so I guess I just don't see any compelling
>> reason to mess with it.  Thanks for the suggestions though!
>>
>> Thanks a lot for looking.  I really appreciate the feedback!
>>
>> If you find anything else, please let me know.  Also, maybe Greg can
>> comment on the 404 error.
>>
>> Thanks.
>>
>> Aaron
>>
>>
>>
>> On 4/13/10, Gardner Buchanan <gbuchana at teksavvy.com> wrote:
>>> Hi Aaron,
>>>
>>> Similarly, I have found that this link does not work:
>>>
>>> http://pglaf.org/PGDVD201004-RC1/content/3/5/3/3533/3533-h.zip
>>>
>>> followed from here:
>>> http://pglaf.org/PGDVD201004-RC1/content/etext/3533.html
>>>
>>> If you have to muck with things, I also noticed that the following
>>> section includes the separate volume 1 and 2 of _the Attache_ as
>>> well as the omnibus.  Likely only the two volumes of the omnibus
>>> are really needed.
>>>
>>> Canada -- Social life and customs -- Fiction
>>>
>>>      * Sunshine Sketches of a Little Town (English) By Leacock, Stephen,
>>> 1869-1944
>>>      * The Attaché; or, Sam Slick in England -- Complete (English) By
>>> Haliburton, Thomas Chandler, 1796-1865
>>>      * The Attaché; or, Sam Slick in England -- Volume 01 (English) By
>>> Haliburton, Thomas Chandler, 1796-1865
>>>      * The Attaché; or, Sam Slick in England -- Volume 02 (English) By
>>> Haliburton, Thomas Chandler, 1796-1865
>>>
>>>
>>> On 13-Apr-2010 03:32, Greg Newby wrote:
>>>>
>>>> Has a link checker been run on this?  I quickly found a missing
>>>> file via file:///PGDVD_2010_04_RC1/etext/3002.html
>>>> and wonder whether some filetypes or other content might not
>>>> have made it.
>>>
>>> ============================================================
>>> Gardner Buchanan                     <gbuchana at teksavvy.com>
>>> Ottawa, ON             FreeBSD: Where you want to go. Today.
>>>
>>
>
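
Greg's question about a link checker, together with the finding that "-"
became "_" in ISO9660 filenames, suggests a simple local check. The Python
sketch below is only an illustration, not the tool used to build the DVD.
It walks a directory of HTML files, resolves each relative href, and reports
targets that do not exist on disk. For a missing file it also tests whether
an underscore-for-hyphen variant (3533_h.zip for 3533-h.zip) exists. Only
the last path component is checked for that substitution, which is an
assumption made to keep the example short.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote

class HrefCollector(HTMLParser):
    """Collect the href values of all <a> tags in a document."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def check_links(root):
    """Return (html_file, href, note) for each relative link whose target is missing."""
    problems = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, name)
            collector = HrefCollector()
            with open(page, encoding="utf-8", errors="replace") as f:
                collector.feed(f.read())
            for href in collector.hrefs:
                parts = urlsplit(href)
                if parts.scheme or parts.netloc or not parts.path:
                    continue  # skip external links and same-page "#anchor" links
                target = os.path.normpath(os.path.join(dirpath, unquote(parts.path)))
                if os.path.exists(target):
                    continue
                # ISO9660 mangling: a '-' in the file name may have become '_'.
                head, tail = os.path.split(target)
                mangled = os.path.join(head, tail.replace("-", "_"))
                if mangled != target and os.path.exists(mangled):
                    note = "exists as " + os.path.basename(mangled)
                else:
                    note = "missing"
                problems.append((page, href, note))
    return problems
```

Run against the mounted ISO (or a copy of its tree), a report full of
"exists as ..." notes would point to the renaming rather than to files
that never made it onto the disc.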

From prosfilaes at gmail.com  Thu Apr 15 20:09:19 2010
From: prosfilaes at gmail.com (David Starner)
Date: Thu, 15 Apr 2010 23:09:19 -0400
Subject: [gutvol-d] Re: !@!!@!!@!Re: Re: so what is so important about
	pagination?
In-Reply-To: <1266938880-sup-4545@zion>
References: <1b8ef.3a619508.38b47a10@aol.com> <1266938880-sup-4545@zion>
Message-ID: <z2l6d99d1fd1004152009h7245507dnd8f85fe847f10185@mail.gmail.com>

On Tue, Feb 23, 2010 at 2:22 PM, Michael McDermott
<mmcdermott at mad-computer-scientist.com> wrote:
> Readers do
> not care about the original pages. There have been many editions of Twain or
> Shakespeare.

But a lot of books aren't Twain or Shakespeare. Most non-fiction books
are littered with page numbers that probably should be converted to
hyperlinks, but that's a lot of work. And non-fiction books reference
page numbers in other books.
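
As a rough illustration of where that work starts, a regular expression can
catch the simplest in-book references such as "p. 45", "page 12" or
"pp. 45-47" and link them to page anchors. This is a hypothetical Python
sketch: the anchor scheme (href="#Page_45") is an assumption, and references
to other books, roman-numeral pages and references split across lines would
still need a human to review them.

```python
import re

# "p. 45", "pp. 45-47", "page 12", "pages 3-4" (case-insensitive).
# Group 1: label, group 2: first page number, group 3: optional "-N" range.
PAGE_REF = re.compile(r"\b(pp?\.|pages?)\s+(\d+)((?:\s*[-\u2013]\s*\d+)?)",
                      re.IGNORECASE)

def link_page_refs(html_text):
    """Wrap simple page references in links to hypothetical #Page_N anchors."""
    def repl(m):
        label, first, rest = m.group(1), m.group(2), m.group(3)
        # Link to the first page of a range; keep the visible text intact.
        return '<a href="#Page_{0}">{1} {0}{2}</a>'.format(first, label, rest)
    return PAGE_REF.sub(repl, html_text)
```

For example, `link_page_refs("see p. 45.")` returns
`'see <a href="#Page_45">p. 45</a>.'`.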

-- 
Kie ekzistas vivo, ekzistas espero.

From Bowerbird at aol.com  Fri Apr 16 11:58:11 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 16 Apr 2010 14:58:11 EDT
Subject: [gutvol-d] so let's talk about my collaborative proofreading site,
	part 3
Message-ID: <261a.60807b81.38fa0d43@aol.com>

here's more info on my collaborative proofreading site...

***

to see what we're talking about, you can visit this u.r.l.:
>    http://z-m-l.com/go/sitka/editr.pl

***

we've talked about 4 main topics:
>    navigating the pages...
>    certifying a page as clean...
>    searching the book for a string...
>    feel the power with the "command" field...

under the 4th topic -- the command field --
we've discussed these commands you can issue:
>    showmap...
>    concat...
>    showcustom...
>    blubberbaby...
>    pairsearch...
>    end-page-hyphenates...

today we'll discuss a few more commands...

>    copyfootnotes...
>    movefootnotes...
>    show-end-line-hyphenates...

***


copyfootnotes...

some e-book formats want the footnotes collected
together into their own section (a la "endnotes")...

to accomplish this, enter "copyfootnotes" into the
search-field, and then click the "find" button...

all of the footnotes will be presented on a screen,
so you can copy them en masse.   this command
leaves the footnotes unmolested on their pages...

***


movefootnotes...

movefootnotes is another command that does the
same as "copyfootnotes", except "movefootnotes"
also deletes each footnote from its original page...
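
The difference between the two commands can be sketched in a few lines of
Python. This is not bowerbird's code. It assumes, hypothetically, that each
page is a string keyed by its page id, and that a footnote is a single line
beginning with a marker like "[1]". Multi-line footnotes or a different
markup convention would need a different test.

```python
import re

# Hypothetical convention: a footnote is one line starting with "[<digits>]".
FOOTNOTE_LINE = re.compile(r"^\[\d+\]")

def collect_footnotes(pages, move=False):
    """Gather footnote lines from every page, in book order.

    pages: dict of page id -> page text (insertion order = book order).
    move=False behaves like "copyfootnotes": pages are returned unchanged.
    move=True behaves like "movefootnotes": footnote lines are removed.
    Returns (notes, pages_after), where notes is a list of (page_id, line).
    """
    notes = []
    pages_after = {}
    for page_id, text in pages.items():
        kept = []
        for line in text.splitlines():
            if FOOTNOTE_LINE.match(line):
                notes.append((page_id, line))
                if move:
                    continue  # drop the footnote from its page
            kept.append(line)
        pages_after[page_id] = "\n".join(kept)
    return notes, (pages_after if move else dict(pages))
```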

i'll note that neither of these commands should be
used until proofing has been completely finished.
until that time, you want to leave the footnote in
the one place where it can be most easily proofed,
which is right there on that page, next to the scan.

for the moment, i have disabled this command...
once i've programmed the "mass revert" ability,
to reverse any sabotage effort, i will reinstate it.

***


show-end-line-hyphenates...

you'll probably recall that i encourage people to
_retain_ original linebreaks from the paper-book,
expressly including all the end-line-hyphenates...

this makes it much easier to do proofing, as even
distributed proofreaders and project gutenberg
acknowledge when it comes to _them_ proofing.

(so their rewrapping the text before giving it to
other people is a bit disingenuous; but i digress.)

at any rate, one slight problem with this approach
is that the hyphenated fragments often do _not_
pass spellcheck, and thus are unnecessarily flagged.

for instance, you might have the first part of a frag-
ment on the top line, and the second on the bottom,
and neither "frag-" nor "ment" will pass spellcheck.

this command helps you solve that little problem.

"show-end-line-hyphenates" will list all of them,
as you might expect, but it does a little bit more.

first, it tests if the rejoined form passes spellcheck.

if so, then it gives you both fragments, so that you
can include them in the book's custom dictionary...

this command also surveys the full book to see
how many times the rejoined form appears in it
-- with hyphen, without it, and as two words --
and informs you of the counts, which is good info.
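
A minimal Python sketch of that kind of report, written as an illustration
rather than as the editr.pl implementation, might look like this. It
assumes a plain set of lowercase words stands in for the spellchecker. A
line ending in a letter followed by "-" counts as an end-line-hyphenate,
while "-~" and "--" endings are skipped. For each case it reports the two
fragments, whether the rejoined form is in the dictionary, and how often the
word appears elsewhere in the book hyphenated, solid, and as two words.

```python
import re
from collections import namedtuple

Hyphenate = namedtuple(
    "Hyphenate", "line_no head tail joined in_dict hyphened solid spaced")

LINE_END = re.compile(r"([A-Za-z]+)-$")     # "frag-" ends the line; "-~" and "--" do not match
LINE_START = re.compile(r"\s*([A-Za-z]+)")  # "ment" begins the next line

def _count(pattern, text):
    """Case-insensitive count of regex matches anywhere in the book."""
    return len(re.findall(pattern, text, re.IGNORECASE))

def end_line_hyphenates(text, dictionary):
    """List end-line hyphenates with a spellcheck verdict and book-wide counts."""
    lines = text.splitlines()
    results = []
    for i in range(len(lines) - 1):
        m_end = LINE_END.search(lines[i].rstrip())
        m_start = LINE_START.match(lines[i + 1])
        if not (m_end and m_start):
            continue
        head, tail = m_end.group(1), m_start.group(1)
        h, t = re.escape(head), re.escape(tail)
        results.append(Hyphenate(
            line_no=i + 1, head=head, tail=tail, joined=head + tail,
            in_dict=(head + tail).lower() in dictionary,
            hyphened=_count(r"\b%s-%s\b" % (h, t), text),  # "frag-ment" on one line
            solid=_count(r"\b%s%s\b" % (h, t), text),      # "fragment"
            spaced=_count(r"\b%s\s+%s\b" % (h, t), text),  # "frag ment"
        ))
    return results
```

The line-break occurrence itself ("frag-" followed by a newline) matches
none of the three count patterns, so the counts describe only the rest of
the book.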

i restored all of the end-line-hyphenates on many
pages within the "sitka" book, and you can observe
the output from "show-end-line-hyphenates" here:
>    http://z-m-l.com/go/sitka/hyphenates-output.html

***

while we're on the subject of end-line-hyphenates,
i should briefly address one of the thorny matters...

i've always maintained that users should be able to
unwrap the text themselves, any time they wanted.
indeed, i've said we should give them tools to do it.

even more than _that,_ i've _provided_ such a tool:
>    http://z-m-l.com/go/unwrap.pl

in most cases, an end-line-hyphenate is _easy_ to
resolve.   you eliminate the dash and then bring up
the first string from the next line and concatenate it.

simple enough.

the glitch happens when it was a _compound_word_
-- i.e., a word that includes a dash in it _normally._

in word-processing parlance, this is known as the
difference between a "hard" and a "soft" hyphen...

so, in order to indicate to the unwrap routine that any
particular dash at the end of a line is a "hard" hyphen,
to be retained, we need to give it some kind of marker.

i've decided -- tentatively, pending testing for problems --
that this marker will be the "~" character, after the dash.

you can see cases in the sitka book where this happens:
>    http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap007
>    http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap019
>    http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap093
>    http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094
>    http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap094
>    http://z-m-l.com/go/sitka/editr.pl?bpn=sitkap107

the lines from those 6 cases are listed here, respectively:

>    sions in America. The sails of ships from far-~ 
>    off Kronstadt on the Baltic brought Russian

>    during the winter the hunters took 40 sea-~
>    lions, and in the spring many seals were

>    of ancient Venice. The picturesque, dark-~
>    skinned Thlingit women sit at the doors of

>    Russian fur warehouse. Next is the three-~
>    story building used for courthouse and jail,

>    and later of the U.S. Marines from the Man-~
>    of-War which was stationed here. East of

>    sea. Eastward crest after crest of glacier-~
>    capped peaks rise for a hundred miles,

so when these are unwrapped, the words "far-off"
and "sea-lion" and "dark-skinned" and "three-story"
and "man-of-war" and "glacier-capped" will now be
rendered as they should be -- as compound words...
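
The unwrap rules described here can be sketched in Python. This is an
illustration of the "~" convention, not the code behind unwrap.pl. A
trailing "-~" keeps the dash and drops the tilde, a letter followed by "-"
is treated as a soft hyphen and removed, a trailing "--" (em-dash) is
joined without a space, and everything else is joined with a single space.
The function works on one paragraph's lines at a time.

```python
def unwrap(lines):
    """Join one paragraph's lines into a single line, resolving line-end dashes."""
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines inside the paragraph
        if not out:
            out = line
        elif out.endswith("-~"):
            out = out[:-1] + line   # hard hyphen: keep the dash, drop the "~"
        elif out.endswith("--"):
            out = out + line        # em-dash: join with no space
        elif out.endswith("-") and len(out) > 1 and out[-2].isalpha():
            out = out[:-1] + line   # soft hyphen: remove it and rejoin the word
        else:
            out = out + " " + line  # ordinary line break becomes a space
    return out[:-1] if out.endswith("-~") else out
```

With the second sitka case, `unwrap(["during the winter the hunters took 40
sea-~", "lions, and in the spring many seals were"])` gives "during the
winter the hunters took 40 sea-lions, and in the spring many seals were".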

***

based on my long observation, i'd say dehyphenation is
one of the most _inelegant_ aspects of the d.p. system...

first of all, it causes unnecessary work for the proofers,
because it's more difficult to proof when the linebreaks
have been disturbed in any way.   even though the effect
is relatively small when it's just on end-line-hyphenates,
it still accumulates.   (and the dictum against "unclothed"
em-dashes at line-ends adds to this cumulative effect.)

this shifting of original linebreaks causes line-lengths
to become uneven, introducing a variety of problems
in that some routines that _could_ be written to help
process the text depend on line-lengths, and thus are
sabotaged when we change the line-lengths arbitrarily.

second, dehyphenation itself is work, because proofers
(who do not have access to any book-wide information)
have to make a judgment about whether the hyphen is
to be retained or not, which is fraught with ambiguity...
this leads to diffs, which chew even more proofer time.
indeed, in the "perpetual" projects, we saw cases where
one proofer would take out a hyphen, and another one
would put it back with an asterisk (meaning "check it").
and then the third proofer would take out the asterisk!
and of course, if a proofer makes a bad decision, that
pollutes the text, which can lead to more bad decisions.

decisions on all end-line-hyphenates should be made
during preprocessing.   then if the proofers challenge
any of the decisions, the postprocessor can decide that.
that's the only sensible workflow.

and this "show-end-line-hyphenates" command shows
that it is indeed possible to handle end-line-hyphenates
in a manner that is simple, yet adequately sophisticated.

***

so those are our 3 new commands for the weekend...

more later...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/339dbc6c/attachment-0001.html>

From mmcdermott at mad-computer-scientist.com  Fri Apr 16 12:03:49 2010
From: mmcdermott at mad-computer-scientist.com (Michael McDermott)
Date: Fri, 16 Apr 2010 14:03:49 -0500
Subject: [gutvol-d] Typesetting
Message-ID: <1271444206-sup-9976@zion>

Like many I'm sure (all right, I'm not really sure), I like
ebooks/etexts but do not like to read them on a computer screen. This is
largely, no doubt, because I work with a computer all day anyway--a book
should be a place to get away from it all for a little. The natural
thing to do is to print the text out and read it. The question then is:
how do we typeset it? 

The first thing I looked at was GutenMark. I was a little disappointed
when I tried it on _Gods and Fighting Men_ by Lady Augusta Gregory. The
LaTeX it generated was invalid.

Then I took the HTML version of said book and ran it through HTML2PS.
The results were serviceable, but looked like, well, a printed web page.

a2ps worked in the most rudimentary sense. The font was still a
fixed width font, the paragraphs were not reformatted, so there was a
lot of unused space on the right hand side of the page.

One of the last two would, of course, do in a pinch, but I was wondering
whether anyone else here had any ideas/recipes on how to automatically
or mostly-automatically typeset a PG etext for printing.
-- 
Michael McDermott
www.mad-computer-scientist.com

From Bowerbird at aol.com  Fri Apr 16 14:36:41 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 16 Apr 2010 17:36:41 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <3426.7f772dcd.38fa3269@aol.com>

michael said:
>    Like many I'm sure (all right, I'm not really sure), 
>    I like ebooks/etexts but do not like to read them 
>    on a computer screen.

there are many people like you.   most of them are old.

which is to say that very few young people feel that way.

it does _not_ mean that all "old" people agree with you;
many oldsters are very comfortable reading on screens.


>    This is largely, no doubt, because I work with 
>    a computer all day anyway--a book should be 
>    a place to get away from it all for a little. 

i'm sorry you hate your job...           ;+)

and maybe that's part of the problem.   perhaps you
don't really hate reading off a screen, you just hate
doing it at a computer while you're sitting at a desk.

in which case an ipad might be a very nice solution...
(it _would_ save some trees.   or maybe just one tree.
but every tree we save is one more tree on the earth.)


>    The natural thing to do is to 
>    print the text out and read it. 
>    The question then is:
>    how do we typeset it?

boy, you _are_ old, aren't you?          :+)

"typesetting" is such a quaint term, charming and cute.

even "desktop publishing" now seems badly outdated.


>    One of the last two would, of course, do in a pinch, 
>    but I was wondering whether anyone else here 
>    had any ideas/recipes on how to automatically or 
>    mostly-automatically typeset a PG etext for printing.

well, yeah.

but what are your expectations?   what are your demands?

if you were to do the job for an individual e-text, perhaps
like the one you mentioned, what changes would you make?

let's start with ripping out the legalese and go from there...

you talked about unwrapping paragraphs.   you'd do that?
(were they too long for you, or too short for you, or what?)

of course you don't want a monospaced font, but which
fonts would you settle for?   times new roman?   helvetica?
or do you need an ability to use any font on your machine?

what about paragraphing?   block paragraphs, or indentation?

do you want full-justification, or is ragged-right acceptable?

hyphenation, or not?   if you could have the original linebreaks,
complete with the original end-of-line-hyphenates, would you?

how about chapter-headings?   page-top?   recto?   double-truck?

curly-quotes?   typographic em-dashes?   footnotes or endnotes?

runheads?   do you want pagenumbers?   if so, printed where?

what pagesize would you prefer?   8.5*11?   or 5.5*8.5 for 2-up?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/4b12a17a/attachment.html>

From mmcdermott at mad-computer-scientist.com  Fri Apr 16 15:37:05 2010
From: mmcdermott at mad-computer-scientist.com (Michael McDermott)
Date: Fri, 16 Apr 2010 17:37:05 -0500
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <3426.7f772dcd.38fa3269@aol.com>
References: <3426.7f772dcd.38fa3269@aol.com>
Message-ID: <1271457420-sup-4620@zion>

Excerpts from Bowerbird's message of Fri Apr 16 16:36:41 -0500 2010:
> michael said:
> >    Like many I'm sure (all right, I'm not really sure), 
> >    I like ebooks/etexts but do not like to read them 
> >    on a computer screen.
> 
> there are many people like you.   most of them are old.
> 
> which is to say that very few young people feel that way.
> 
> it does _not_ mean that all "old" people agree with you;
> many oldsters are very comfortable reading on screens.
> 
> >    This is largely, no doubt, because I work with 
> >    a computer all day anyway--a book should be 
> >    a place to get away from it all for a little. 
> 
> i'm sorry you hate your job...           ;+)
> 
> and maybe that's part of the problem.   perhaps you
> don't really hate reading off a screen, you just hate
> doing it at a computer while you're sitting at a desk.
> 
> in which case an ipad might be a very nice solution...
> (it _would_ save some trees.   or maybe just one tree.
> but every tree we save is one more tree on the earth.)
> 
> >    The natural thing to do is to 
> >    print the text out and read it. 
> >    The question then is:
> >    how do we typeset it?
> 
> boy, you _are_ old, aren't you?          :+)
> 
> "typesetting" is such a quaint term, charming and cute.
> 
> even "desktop publishing" now seems badly outdated.
> 
> >    One of the last two would, of course, do in a pinch, 
> >    but I was wondering whether anyone else here 
> >    had any ideas/recipes on how to automatically or 
> >    mostly-automatically typeset a PG etext for printing.
> 
> well, yeah.
> 
> but what are your expectations?   what are your demands?
> 
> if you were to do the job for an individual e-text, perhaps
> like the one you mentioned, what changes would you make?
> 
> let's start with ripping out the legalese and go from there...
> 
> you talked about unwrapping paragraphs.   you'd do that?
> (were they too long for you, or too short for you, or what?)
> 
> of course you don't want a monospaced font, but which
> fonts would you settle for?   times new roman?   helvetica?
> or do you need an ability to use any font on your machine?
> 
> what about paragraphing?   block paragraphs, or indentation?
> 
> do you want full-justification, or is ragged-right acceptable?
> 
> hyphenation, or not?   if you could have the original linebreaks,
> complete with the original end-of-line-hyphenates, would you?
> 
> how about chapter-headings?   page-top?   recto?   double-truck?
> 
> curly-quotes?   typographic em-dashes?   footnotes or endnotes?
> 
> runheads?   do you want pagenumbers?   if so, printed where?
> 
> what pagesize would you prefer?   8.5*11?   or 5.5*8.5 for 2-up?
> 
> -bowerbird
-- 
Michael McDermott
www.mad-computer-scientist.com

From mmcdermott at mad-computer-scientist.com  Fri Apr 16 16:04:22 2010
From: mmcdermott at mad-computer-scientist.com (Michael McDermott)
Date: Fri, 16 Apr 2010 18:04:22 -0500
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <1271457420-sup-4620@zion>
References: <3426.7f772dcd.38fa3269@aol.com> <1271457420-sup-4620@zion>
Message-ID: <1271457428-sup-4270@zion>

> > there are many people like you.   most of them are old.

I'm in my early 20s.

> > i'm sorry you hate your job...           ;+)

> > and maybe that's part of the problem.   perhaps you
> > don't really hate reading off a screen, you just hate
> > doing it at a computer while you're sitting at a desk.

Well, your psychoanalyzing is interesting (sarc.), but I like what I do.
I do not like eyestrain and I like the variety that print media provides.

> > "typesetting" is such a quaint term, charming and cute.

An old term to be sure, but I like it. "Desktop publishing" was a lame
term, even when it was in vogue.

> > boy, you _are_ old, aren't you?          :+)

No. There are a finite number of options: a computer screen (a
blackberry screen is just a small computer screen), an eink screen
(which would be a good compromise if I had the spare cash), or print.
I'm trying to move away from 1, 2 is impractical for the time being, and
that brings us to 3.

> > but what are your expectations?   what are your demands?

I have no demands, per se. It was a question. Googling did not turn up
anything convenient. The only real option would be to convert each text
manually into LaTeX or some lightweight format like asciidoc (my
personal favorite).

Largely, I am looking to see if anyone else has a solution to a problem
before I break out an interpreter/compiler and get cracking on my own.

Nitpicking aside, you raise a valid point. What do I want? 

* Automatic or mostly automatic. Everything should be done by running a
  single command, or with some slight configuration changes to that command.
* Font family selection. I don't personally care about picking an exact
  font, but font-family selection à la CSS would be nice, with a reasonable
  default of the Roman variety.
* Paragraph lines should run to the end of the printed page--be that
  margins or whatnot.
* On screen, I like block paragraphs, but in print indented ones.
  Optimally, this would be user-settable.
* I would want to set the page size, but in practice I would print 2
  pages on an 8.5x11 sheet.
* I care little about hyphenation vs wrapping, but I would want the text
  conformed to the print media, not verbatim of the original edition.
  This is, after all, one of the advantages of an etext--the ability to
  reflow the content as desired. 
* Page numbers, of course. 
* Curly quotes do not matter one way or another to me.
* em-dashes would be preferable.
* Footnotes and endnotes should be included, of course.
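
Several items on this list (reflowed paragraphs, indented paragraphs, page
numbers, em-dashes, a chosen page size and font family) are LaTeX defaults
or one-line settings, so a first pass could be a small converter. The Python
sketch below is hypothetical and deliberately naive. It assumes the
"*** START OF" / "*** END OF" marker lines that many (not all) PG files carry
delimit the boilerplate, treats blank lines as paragraph breaks, turns PG's
"--" into LaTeX's "---", and leaves justification and hyphenation to LaTeX.
The page size defaults to 5.5in x 8.5in (half a letter sheet, for 2-up
printing). The 0.6in margin is an arbitrary choice, and the imposition onto
8.5x11 would still be done by the printer driver or a separate tool.
Footnotes, chapter headings and italics are not handled; underscores are
simply escaped.

```python
import re

# Single-pass escaping via str.translate, so replacements are not re-escaped.
_LATEX_ESCAPES = str.maketrans({
    "\\": r"\textbackslash{}", "{": r"\{", "}": r"\}", "$": r"\$",
    "&": r"\&", "#": r"\#", "%": r"\%", "_": r"\_",
    "^": r"\^{}", "~": r"\~{}",
})

def strip_pg_boilerplate(text):
    """Keep only the lines between '*** START OF' and '*** END OF', when present."""
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("*** START OF")), -1)
    end = next((i for i, ln in enumerate(lines) if ln.startswith("*** END OF")), len(lines))
    return "\n".join(lines[start + 1:end])

def paragraphs(text):
    """Treat blank lines as paragraph breaks and rejoin each paragraph's lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(b.split()) for b in blocks if b.strip()]

def to_latex(text, sans=False, paper=("5.5in", "8.5in")):
    """Build a minimal LaTeX document from a PG plain-text file."""
    body = []
    for para in paragraphs(strip_pg_boilerplate(text)):
        para = para.translate(_LATEX_ESCAPES)
        para = re.sub(r"(?<!-)--(?!-)", "---", para)  # PG "--" becomes an em-dash
        body.append(para)
    preamble = [
        r"\documentclass[11pt]{article}",
        r"\usepackage[utf8]{inputenc}",
        r"\usepackage[T1]{fontenc}",
        r"\usepackage[paperwidth=%s,paperheight=%s,margin=0.6in]{geometry}" % paper,
    ]
    if sans:
        preamble.append(r"\renewcommand{\familydefault}{\sfdefault}")
    return "\n".join(preamble + [r"\begin{document}", "\n\n".join(body),
                                 r"\end{document}", ""])
```

The article class indents paragraphs and prints page numbers by default,
which covers two of the items above without extra code.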

-Michael

-- 
Michael McDermott
www.mad-computer-scientist.com

From jimad at msn.com  Fri Apr 16 16:05:53 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 16:05:53 -0700
Subject: [gutvol-d] [SPAM] RE: Re: !@!!@!!@!Re: Re: so what is so important
	about	pagination?
In-Reply-To: <1266938880-sup-4545@zion>
References: <1b8ef.3a619508.38b47a10@aol.com> <1266938880-sup-4545@zion>
Message-ID: <SNT120-DS23FE9E81C1EDBDAF8E4AC3AE0E0@phx.gbl>

....are you interested in text preservation or in manuscript
preservation? 

PG & DP while they do good work for society don't actually do either of those things.  

What they do is transcription of a book into ASCII or something close to ASCII -- even when transcribing into HTML or ISO.  The end result is usually something that is readable and recognizable as being more-or-less related to what the original author wrote and published.  Is it "correct"?  Of course not -- one cannot talk about "correctness" when 1) the result is intended to be readable by today's audience, 2) it has been transcribed into a small subset of what was available to publishers even by the 1700s, 3) that subset is dictated primarily by what can easily be typed on a standard IBM "chiclet" keyboard and more-or-less OCR'ed by standard OCR software, and 4) a reduced set of punctuation and simplified punctuation rules has been adopted in practice, which differs somewhat from what the author and publisher obviously put in their books. 

One might be tempted to say that what PG & DP actually do is "word preservation", but they don't really do that either.  It's really re-interpretation and republishing from one format (on paper, by professional publishers, a long long time ago) into another format: either a PG-specific, non-reflowable electronic format built around the "teletype" standards of the early 1970s ("ASCII, 70 chars more-or-less per line", similar to AP wire format), or HTML for lowest-common-denominator browsers, where the real constraint in practice is more likely the HTML-to-EPUB and HTML-to-MOBI converters and the limitations of stand-alone EPUB and MOBI reader hardware.  And they do so in a way that might actually be read by one or another target audience on those devices.

Are these efforts successful?  I think so -- for example, when I see that a friend of mine has bought a new iPad and is happily reading a text I produced for PG before the iPad was announced, and my friend didn't even realize that I wrote it in HTML and PG published it -- because of course Apple strips out the PG header and transcriber acknowledgements before converting it to Apple DRM'ed EPUB and redistributing it as "Apple's own free book available only from the Apple iPad Store"! [ Thank You Jobs -- who's "1984" now??? ]


From jimad at msn.com  Fri Apr 16 16:24:23 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 16:24:23 -0700
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <52E0A8172A5F417986C50DAFE5D1132A@alp2400>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
Message-ID: <SNT120-DS1CEA6D044B4018CFBA83FAE0E0@phx.gbl>

Even more spectacular than the "illuminated letter" problem [which is bad
enough][and which I would hope most transcribers would avoid nowadays by
choosing NOT to include GIFs for illuminated letters and other trivial
"printers' art"] you also have texts where the transcriber has chosen to leave
some text in GIF only mode, and/or other text in GIF mode AND OCR'ed mode,
such that the MOBI and EPUB versions may have 0, 1 or 2 copies of a
particular entire paragraph of text. And/or the HTML was written in a
non-linear form in which case the MOBI and EPUB versions may have 0, 1, 2 or
N copies of any particular passage in the text. And captions on images may
be retained in the image, included in the HTML, and/or included in the
alt-tag meaning that a particular user with a particular reading device may
see or hear the image caption 0, 1, 2 or 3 times.


From lee at novomail.net  Fri Apr 16 16:49:33 2010
From: lee at novomail.net (Lee Passey)
Date: Fri, 16 Apr 2010 17:49:33 -0600
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <1271444206-sup-9976@zion>
References: <1271444206-sup-9976@zion>
Message-ID: <4BC8F78D.1000707@novomail.net>

On 4/16/2010 1:03 PM, Michael McDermott wrote:

[snip]

> One of the last two would, of course, do in a pinch, but I was wondering
> whether anyone else here had any ideas/recipes on how to automatically
> or mostly-automatically typeset a PG etext for printing.

As bowerbird is unfailingly quick to point out, automatic processing of 
any document file relies on the file being regularized in such a way 
that any transformation you wish to make is unambiguously identifiable. 
If it is not possible to unambiguously identify a transformation you 
wish to make, the file must include unambiguously identifiable 
meta-information (information that is not part of the primary data) that 
identifies the transformation (this kind of meta-information is commonly 
known as "markup").

Project Gutenberg requires no textual regularization of any kind for its 
impoverished text files, and therefore these files are extremely 
difficult to automatically transform. Of course, there are some 
conventions which have evolved some of which are used more regularly 
than others. Thus, if you are content with the italicization of text set 
off by underscores (_) you will probably be successful with this 
transformation more than 90% of the time. On the other hand, if you want 
to start chapters on a new page, you will probably be successful with 
that transformation less than 50% of the time. The degree of success you 
have will depend to a large extent on the degree of transformation you 
want to achieve; if you are content to simply print the file as is, 
changing only the font face (don't try to change the font size, or you 
will run into reflowing problems) you can probably achieve 99+% 
success. If you want to make a PG file look like an ordinary paperback, 
certainly less than 50%. (This is, of course, assuming you are using 
"off-the-shelf" tools. If you're comfortable with scripting languages 
you could no doubt do better).
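As an illustration of the underscore-italics transformation Lee describes as working "more than 90% of the time", here is a minimal sketch in Python. It assumes the common PG convention of `_text_` for italics and simply refuses to match across a blank line, so a stray underscore cannot italicize the rest of a chapter. It is a heuristic, not a PG-endorsed converter.

```python
import html
import re

# _text_ with the underscores at word boundaries, hugging non-space characters,
# and not running across a blank line (so a stray underscore cannot swallow
# the rest of the book).
ITALIC_RE = re.compile(r"(?<!\w)_(?=\S)((?:(?!\n\s*\n)[^_])+?)(?<=\S)_(?!\w)")


def underscores_to_html(text: str) -> str:
    """HTML-escape plain text, then turn _emphasis_ into <i>emphasis</i>."""
    escaped = html.escape(text, quote=False)
    return ITALIC_RE.sub(r"<i>\1</i>", escaped)
```

For example, `underscores_to_html("He read _Pride and Prejudice_ twice.")` yields `He read <i>Pride and Prejudice</i> twice.`, while an identifier such as `snake_case_name` is left alone because its underscores sit inside a word.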

Your degree of success will also depend on the age of the PG file you 
want to transform. As time has gone on, and conventions have evolved, 
later texts are more "regular" than earlier texts; good luck converting 
_Pride and Prejudice_.

You will probably have the most success by using the HTML version of a 
file, when it can be found (I do not believe that the majority of texts 
at Project Gutenberg are yet available in HTML versions); this is 
because while PG HTML texts are still not completely consistent in their 
use of markup, they are probably /more/ consistent than the impoverished 
text files.

I am assuming you used html2ps or html2pdf version 2.0.43 available from 
http://www.tufat.com/s_html2ps_html2pdf.htm, and that you have 
completely read the documentation (BTW, I have not). According to the 
website, html2pdf almost completely supports CSS version 2, and the 
media parameter values of CSS3. Were it I (and it will not be, because
I am completely happy reading HTML on my mobile device, and because I 
find PDF to be the one format which is actually worse than PG 
impoverished text format) I would find a css style sheet which has most 
of the features I like then use that with the PG HTML files and 
html2pdf. The resulting PDF can then be printed using Acrobat Reader or 
equivalent (if you are committed to the destruction of the environment).

I suspect that html2pdf will not consume a style sheet unless it is 
referenced by the html document itself, and DP/PG has been highly 
resistant to the notion of adding a reference to a generic style sheet 
in every HTML file, so you will probably have to edit each file to add 
"<link href="pgstd.css" type="text/css" rel="stylesheet" />" to the 
<head> section of each HTML file, but I would think that would fall 
under the category of "semi-automated." If you cannot find an HTML 
version of the text you want (be sure to look outside of PG, as there 
are many other sources) you might want to try bowerbird's ZML2HTML 
converter; I suspect it may work about 75% of the time to get basic HTML 
out of PG impoverished text.
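The "semi-automated" step of adding the stylesheet link to every HTML file could look something like this Python sketch. It uses the `pgstd.css` name from the example above, which is an illustrative file name rather than an official PG stylesheet, and it assumes all files in the directory share one text encoding (older PG HTML is often ISO-8859-1, not UTF-8).

```python
import re
from pathlib import Path

# The tag from the example above; "pgstd.css" is illustrative, not an official PG file.
LINK_TAG = '<link href="pgstd.css" type="text/css" rel="stylesheet" />'
HEAD_CLOSE_RE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_stylesheet_link(html_text: str, link_tag: str = LINK_TAG) -> str:
    """Insert link_tag immediately before the first </head>, unless already present."""
    if link_tag in html_text:
        return html_text  # idempotent: safe to run twice over the same files
    new_text, count = HEAD_CLOSE_RE.subn(
        lambda m: link_tag + "\n" + m.group(0), html_text, count=1
    )
    if count == 0:
        raise ValueError("no </head> tag found; edit this file by hand")
    return new_text


def add_link_to_directory(directory: str, encoding: str = "utf-8") -> None:
    """Apply add_stylesheet_link to every .htm/.html file in a directory.
    Assumes all files share one encoding; older PG files may be ISO-8859-1."""
    for path in Path(directory).glob("*.htm*"):
        text = path.read_text(encoding=encoding)
        path.write_text(add_stylesheet_link(text), encoding=encoding)
```

Raising an error on a missing `</head>` rather than guessing keeps the "semi" in semi-automated: those files get looked at by a person.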

FWIW, the style sheet I typically use for reading HTML files can be 
found at http://www.ebookcooperative.com/ebook.css.


From jimad at msn.com  Fri Apr 16 16:52:28 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 16:52:28 -0700
Subject: [gutvol-d] [SPAM] RE: {Disarmed} Re: [SPAM] Re: Re: !@! #17135 Twas
	The Night	Before Christmas
In-Reply-To: <s2i627d59b81004141753qf7938b5aoa578c0d10edb3883@mail.gmail.com>
References: <30c96.47708041.38f79ecc@aol.com>
	<4BC66003.80201@teksavvy.com>	<alpine.DEB.2.00.1004142043530.5281@durendal.durendal.org>
	<s2i627d59b81004141753qf7938b5aoa578c0d10edb3883@mail.gmail.com>
Message-ID: <SNT120-DS1185CDE07B2103ABB412F8AE0E0@phx.gbl>

>Does anyone know of any epublisher other than PG that *does* distribute the
html we provide?

 
Not sure exactly what you are asking, but Apple, for example, takes the PG
html, strips out the PG legalese and acknowledgment of the volunteers,
converts it to EPUB with DRM, and redistributes it "free" [where "free" in
this case means being able to get the book only in DRM form and only
directly from the Steve Jobs iPad monopoly].  One knows they
are not working from the txt versions of the files because the Apple
redistributions contain chars and formatting found only in the HTML
versions.  FreeKindleBooks redistributes the HTML converted to MOBI,
retaining all the PG legalese and requirements. Mobileread has volunteers
who take the HTML, usually heavily reformat it, strip it, and republish in
MOBI and EPUB formats while cackling about how much better their versions
are!  Many other sites appear to "down-convert" to a
least-common-denominator ASCII format before "up-converting" back to HTML,
MOBI, EPUB, etc. Presumably they are working from an ASCII version of an old
DVD distribution - getting "working" EPUB and MOBI from the HTML formats
tends to be "non-trivial", not to mention that some sites republish in, say,
two dozen different formats.

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/3c1f1600/attachment.html>

From Bowerbird at aol.com  Fri Apr 16 16:54:03 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 16 Apr 2010 19:54:03 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <1f3b1.3bbc6acd.38fa529b@aol.com>

michael said:
>    I'm in my early 20s.

ok.

age is a state of mind.


>    I like what I do.
>    I do not like eyestrain

you get eyestrain from your screens?   really?
sincerely, you need to get better equipment.
there's no good reason for eyestrain any more.


>    and I like the variety that print media provides.

ok.

i find the web has 30 times more variety,
including most stuff you can find in print,
but everybody has their own sense of taste.


>    An old term to be sure, but I like it.

i like it too.   it's charming.   and cute.         :+)


>    The only real option would be to 
>    convert each text manually into LaTeX 
>    or some lightweight format like asciidoc 
>    (my personal favorite).

if you read the archives, you'll see that
i have been the major cheerleader here
for light-markup since way back in 2003.


>    Largely, I am looking to see 
>    if anyone else has a solution

i have a solution.   it's sitting on my hard-drive.
you might be the person who springs it free...


>    before I break out an interpreter/compiler 
>    and get cracking on my own.

i encourage you to get cracking on your own...
i'd love to have someone to compare notes with.
i would offer you all kinds of advice in your work,
probably (to be frank) whether you want it or not.


>    Nitpicking aside, you raise a valid point.

we're just having fun...   don't take it too seriously.

but that wasn't nitpicking.   i was running through
the checklist that you will eventually have to make,
if you want to offer your program to anybody else.
(which you might or might not wanna do, i dunno.)


>   What do I want?

yes please, do tell...


>    * Automatic or mostly automatic. 

makes sense.

fully automatic just isn't possible, not across the full
p.g. library, but "mostly automatic" is within range...


>    * Font family selection. I don't personally 
>    care about picking an exact font

i ask about times new roman and helvetica in particular
because some aspects of my solution can only use those.

but if we take another approach, you can use any font...


>    * Paragraph lines should run to the end of the printed page

ok, but are lines as they're wrapped in a typical p.g. e-text
too short, or too long?   in other words, what's the measure?
(that's the typographic term for the length of your lines, i.e.,
the pagesize minus your margins.)

this will depend on the fontsize, which i forgot to ask about.
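Getting paragraph lines to "run to the end of the printed page" means first undoing the hard line breaks in a PG text file. A minimal Python sketch, assuming paragraphs are separated by blank lines (the usual PG plain-text layout), might be:

```python
import re


def unwrap_paragraphs(text: str) -> list[str]:
    """Split hard-wrapped plain text on blank lines and join each paragraph's
    lines with single spaces, so a typesetter can re-wrap it to any measure.

    Caveat: poetry, tables and letters, which depend on their line breaks,
    are flattened too; those need marking up by hand first."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in para.splitlines())
        for para in paragraphs
        if para.strip()
    ]
```

The caveat in the docstring is exactly why "fully automatic" is out of reach across the whole library: the function cannot tell a stanza from a paragraph.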


>    * On screen, I like block paragraphs, but in print indented ones.

ok.


>    Optimally, this would be user-settable.

optimally, _everything_ is user-settable.


>    * Page size I would want to set, but 
>    2 pages printed on an 8.5x11 sheet in practice.

that's right.
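To make "the measure" concrete for the 2-up layout: two pages side by side on a landscape 8.5x11 sheet gives a 5.5 x 8.5 in page, and the measure is that page width minus the margins. This Python sketch works the arithmetic; the 0.75 in margins and the "average glyph is half an em wide" rule of thumb are assumptions for illustration, since real fonts vary.

```python
POINTS_PER_INCH = 72


def two_up_letter_measure(margin_in: float = 0.75) -> float:
    """Measure (line length) in points for one page when two pages are printed
    side by side on a landscape 8.5 x 11 in sheet, i.e. a 5.5 x 8.5 in page."""
    page_width_in = 11.0 / 2  # the 11 in side is split between two pages
    return (page_width_in - 2 * margin_in) * POINTS_PER_INCH


def approx_chars_per_line(measure_pt: float, font_size_pt: float,
                          avg_char_em: float = 0.5) -> int:
    """Very rough characters-per-line estimate, assuming the average glyph is
    avg_char_em times the font size wide (real fonts vary)."""
    return int(measure_pt // (font_size_pt * avg_char_em))
```

With the defaults, `two_up_letter_measure()` is 288.0 pt (4 in), which gives roughly 52 characters per line at 11 pt or 57 at 10 pt, which is why the font size has to be settled before the line length question can be answered.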


>    * I care little about hyphenation vs wrapping, but 
>    I would want the text conformed to the print media, 
>    not verbatim of the original edition.

that's a bit ironic, since the original edition _was_
text that was made to conform to the print media.
so there is a certain bit of contradiction in there...
but i'll let it pass.


>    This is, after all, one of the advantages of an etext
>    --the ability to reflow the content as desired.

well, yes, of course.   but once you've printed it out,
you've lost that ability-to-reflow.   so does it matter?
(never mind, it's just another philosophical question.)


>    * Page numbers, of course.

of course, of course.


>    * Curly quotes do not matter one way or another to me.

ok.


>    * em-dashes would be preferable.

ok.
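For the em-dash request, the usual PG plain-text stand-in is a double hyphen. A small Python sketch that converts exactly two hyphens and leaves longer runs (often used for omitted names, as in "Mr. ----") for manual review:

```python
import re

EM_DASH = "\u2014"


def double_hyphens_to_em_dashes(text: str) -> str:
    """Replace the plain-text '--' convention with a real em-dash.
    Runs of three or more hyphens are left alone for a human to decide."""
    return re.sub(r"(?<!-)--(?!-)", EM_DASH, text)
```

Single hyphens (as in "well-known") are untouched, since the pattern needs two in a row with no third on either side.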


>    * Footnotes and endnotes should be included, of course.

yes, of course they should be included.   i was asking if
you had a preference for one over the other, because it
can get very hairy to do footnotes in a rewrapped e-text.
it's easier to do endnotes.   but if you _want_ footnotes...
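Why endnotes are the easier option can be seen in a sketch: moving notes to the end is a text transformation, while true footnotes require knowing where each page breaks. This Python example assumes DP-style `[Footnote N: note text]` blocks in the plain text; that markup is an assumption about the input, and notes that contain their own `]` are not handled.

```python
import re

# Assumed markup: "[Footnote N: note text]" blocks, optionally followed by the
# blank line that separated them from the next paragraph.
FOOTNOTE_RE = re.compile(r"\[Footnote (\w+):\s*(.*?)\]\n{0,2}", re.DOTALL)


def footnotes_to_endnotes(text: str) -> str:
    """Move every [Footnote N: ...] block to a NOTES section at the end,
    leaving the [N] anchors in the body untouched."""
    notes = []

    def collect(match) -> str:
        label, body = match.group(1), " ".join(match.group(2).split())
        notes.append(f"[{label}] {body}")
        return ""  # delete the note from its original position

    stripped = FOOTNOTE_RE.sub(collect, text)
    if not notes:
        return text
    return stripped.rstrip() + "\n\n\nNOTES\n\n" + "\n\n".join(notes) + "\n"
```

Keeping footnotes at the foot of each page would instead need the page-breaking pass to reserve space for each note, which is where it gets "very hairy".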

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/a7b397d8/attachment.html>

From jimad at msn.com  Fri Apr 16 17:18:12 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 17:18:12 -0700
Subject: [gutvol-d] Re: html/css expert advice sought
In-Reply-To: <33ee1.72d80d3e.38f8f6a2@aol.com>
References: <33ee1.72d80d3e.38f8f6a2@aol.com>
Message-ID: <SNT120-DS315FC5D32FB9BDC244547AE0D0@phx.gbl>

Not an expert html coder, but 2-column "works" on my 1280 display in both IE
and Firefox, whereas 3-column does bad things with the page image. 


>if anyone has advice on how to improve that code, i'm all ears...


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/f6b748b0/attachment-0001.html>

From jimad at msn.com  Fri Apr 16 17:36:40 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 17:36:40 -0700
Subject: [gutvol-d] [SPAM] RE: so let's talk about my collaborative
	proofreading site, part 3
In-Reply-To: <261a.60807b81.38fa0d43@aol.com>
References: <261a.60807b81.38fa0d43@aol.com>
Message-ID: <SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>

Agreed that mindlessly changing hyphens to check-hyphens is one way some P3
automatons introduce more damage than they're worth.

 
There is a general problem in DP that proofers introduce a check-hyphen when
they mean "I really don't like the fact that the original book had a hyphen
there."

 
Well, too bad.  If the original book had a hyphen there then the two options
are: 

 
1)      Join with hyphen, or

2)      Join without hyphen

 
"Throw the hyphen away because I do not like it" is not an option.

 
Typical example is something like:

 
..school-

teacher.

 
Where the two plausible answers could be:

 
.school-teacher.

 
or

 
.schoolteacher.

 
and of course the proofer automaton changes this to

 
.school-*teacher.

 
meaning "gee I wish the author had written this as:"

 
.school teacher.

 
which of course is not an option.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/67d61f81/attachment.html>

From jimad at msn.com  Fri Apr 16 17:47:24 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 17:47:24 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <3426.7f772dcd.38fa3269@aol.com>
References: <3426.7f772dcd.38fa3269@aol.com>
Message-ID: <SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>

>in which case an ipad might be a very nice solution...
> (it _would_ save some trees.  or maybe just one tree.
> but every tree we save is one more tree on the earth.)


iPad is a very nice solution if 

 
a)      you want to let Steve Jobs decide what you get to read, where you
can download it from, what reader app you get to read it with, and how much
you pay for it.

b)      It's not important for you to acknowledge where your books are coming
from, who did them for you, and to be allowed to redistribute those books to
your friends.

c)       You don't mind reading your books through a "screen door"

 
Good ebook readers allow:

 
YOU to easily get a book from any free site YOU choose - NOT Steve Jobs!

 
Choose from a variety of fonts.

 
Choose from a variety of font sizes - and easily change those font sizes
over the course of a day if your eyes begin to tire.

 
Choose how big the margins are - how many chars or words YOU want per line
of text.

 
Don't know about you, but I have absolutely no desire to have Steve Jobs
censor my reading materials - nor to censor the reading app that I use to
read those books - nor to monopolize the distribution channel! 

 
iPad is a HUGE step BACKWARDS as far as I can tell!

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/eadff20c/attachment.html>

From klofstrom at gmail.com  Fri Apr 16 17:57:03 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 16 Apr 2010 14:57:03 -1000
Subject: [gutvol-d] Re: [SPAM] RE: so let's talk about my collaborative
	proofreading site, part 3
In-Reply-To: <SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
Message-ID: <p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>

On Fri, Apr 16, 2010 at 2:36 PM, James Adcock <jimad at msn.com> wrote:

> There is a general problem in DP that proofers introduce a check-hyphen when they mean "I really don't like the fact that the original book had a hyphen there."

And you know this HOW?

What the asterisk means is, "I don't know how the author usually
spells this, whether closed (schoolteacher) or hyphenated
(school-teacher). This hyphen comes at the end of a line, so I don't
know whether to drop it or keep it. I'll put an asterisk there, so the
PPer can check the usage in the rest of the text to see what spelling
the author usually uses, hyphenated or closed."
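The post-processor's check that Karen describes (how does the author usually spell this word elsewhere?) can be partly mechanized. This Python sketch counts hyphenated versus closed spellings across the whole text and resolves each `first-*second` marker to the more frequent form; when there is no evidence or a tie, it leaves the asterisk in place for a human. It is an illustrative heuristic, not a DP tool.

```python
import re
from collections import Counter

CHECK_HYPHEN_RE = re.compile(r"\b([A-Za-z]+)-\*([A-Za-z]+)\b")


def resolve_check_hyphens(text: str) -> str:
    """Resolve each 'first-*second' marker by counting how the same word is
    spelled elsewhere in the text: hyphenated ('to-day') vs closed ('today').
    Ties or no evidence leave the marker in place for a human to decide."""
    # Markers themselves tokenize as two separate words, so they never count.
    words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    counts = Counter(w.lower() for w in words)

    def decide(m) -> str:
        first, second = m.group(1), m.group(2)
        hyphenated = counts[f"{first}-{second}".lower()]
        closed = counts[f"{first}{second}".lower()]
        if hyphenated > closed:
            return f"{first}-{second}"
        if closed > hyphenated:
            return f"{first}{second}"
        return m.group(0)  # no clear evidence: keep the marker

    return CHECK_HYPHEN_RE.sub(decide, text)
```

For instance, in a text that elsewhere has "To-day" and "to-day" twice but "today" only once, `to-*day` becomes `to-day`; an isolated `school-*teacher` with no other occurrences stays as it is.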

That's ALL it means. Uncertainty about the author's preferred
spelling. I know, as a professional copyeditor, that the
open/hyphenated/closed continuum is extremely mutable, that words have
changed over time (to-day becomes today), and that at any one time,
different authors and different publishing houses may make different
choices. (Copyeditor or copy-editor is just one example; you'll find
it both ways.)

Sometimes newbie proofers over-asterisk. The same word may occur on
the same page with the author's preferred spelling prominently on
display. But the newbie is afraid of making a judgment call and
asterisks anyway. No big deal. Better to be too careful than to drop
the hyphen and rejoin words that should be hyphenated rather than
closed up.

At times this list seems to function just as Encyclopedia Dramatica
does for Wikipedia; all the malcontents gather and mutter about THEM
over THERE doing it WRONG and THEY didn't listen to ME.

--
Karen Lofstrom

From jimad at msn.com  Fri Apr 16 17:57:34 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 17:57:34 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <4BC8F78D.1000707@novomail.net>
References: <1271444206-sup-9976@zion> <4BC8F78D.1000707@novomail.net>
Message-ID: <SNT120-DS671C00FB138FAC30A495AAE0D0@phx.gbl>

>...DP/PG has been highly resistant to the notion of adding a reference to a
> generic style sheet in every HTML file...

Anything more than the simplest uses of CSS tends to break the conversion of
HTML into EPUB and MOBI that can be successfully used by most ebook readers
-- not to mention older browsers.


From jimad at msn.com  Fri Apr 16 18:12:51 2010
From: jimad at msn.com (James Adcock)
Date: Fri, 16 Apr 2010 18:12:51 -0700
Subject: [gutvol-d] [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative	proofreading site, part 3
In-Reply-To: <p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
Message-ID: <SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>

>And you know this HOW?

Because 1) I have seen some P3s change EVERY hyphen to a check-hyphen. 2) As
a PP I have attempted to "fix" check-hyphens and to do so one has to try to
understand what it was that the P3 was complaining about.  I've emailed some
and said "what were you thinking?" and they say "oops, you're right, I was
basically thinking that I wished the hyphen wasn't there."

Lots of people put check-hyphen in there when they are feeling
"uncomfortable."  Feeling "uncomfortable" isn't *sufficient* reason to put a
check-hyphen in -- because if you do so then you make the PP uncomfortable
too -- and the PP has no recourse except to a) ignore the check-hyphen, b) waste
copious amounts of time trying to double check the hyphen against the
author's published corpus, or c) write the proofer an email and hope some day
they will respond honestly and tell you what they were thinking, if they were
thinking, when they entered the check-hyphen.  

Many authors put hyphens in places that feel uncomfortable to our modern
tastes.  That is not enough reason to insert a check-hyphen. A
check-hyphen is basically a punt to the PP -- who is no better placed to
resolve the issue.


From klofstrom at gmail.com  Fri Apr 16 18:19:46 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 16 Apr 2010 15:19:46 -1000
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative proofreading site, part 3
In-Reply-To: <SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
Message-ID: <q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>

On Fri, Apr 16, 2010 at 3:12 PM, James Adcock <jimad at msn.com> wrote:

> Because 1) I have seen some P3s change EVERY hyphen to a check-hyphen. 2) As a PP I have attempted to "fix" check-hyphens and to do so one has to try to understand what it was that the P3 was complaining about.  I've emailed some and said "what were you thinking?" and they say "oops, you're right, I was basically thinking that I wished the hyphen wasn't there."

Those are folks who don't understand the rules. You ran into some bad
P3ers; that doesn't mean that all of us in P3 are like that.

You could have *corrected* the misconceptions, rather than deciding
that we're all idiots.

--
Karen Lofstrom
not an idiot
(at least in THIS area)

From prosfilaes at gmail.com  Fri Apr 16 18:39:00 2010
From: prosfilaes at gmail.com (David Starner)
Date: Fri, 16 Apr 2010 21:39:00 -0400
Subject: [gutvol-d] Re: [SPAM] RE: Re: !@!!@!!@!Re: Re: so what is so
	important about pagination?
In-Reply-To: <SNT120-DS23FE9E81C1EDBDAF8E4AC3AE0E0@phx.gbl>
References: <1b8ef.3a619508.38b47a10@aol.com> <1266938880-sup-4545@zion>
	<SNT120-DS23FE9E81C1EDBDAF8E4AC3AE0E0@phx.gbl>
Message-ID: <h2i6d99d1fd1004161839n4a411e7chfb3ed54a8afb1be1@mail.gmail.com>

On Fri, Apr 16, 2010 at 7:05 PM, James Adcock <jimad at msn.com> wrote:
> Of course not -- one cannot talk about "correctness" when something is 1) intended to be readable by today's audience, and 2) has been transcribed into something that is a small subset of what was available to publishers even by the 1700s 3) the chosen subset is primarily dictated by what can be easily input from a standard IBM chicklet keyboard and more-or-less OCR'ed by standard OCR software 4) a subset of punctuation and simplified punctuation rules have been adopted in practice which differ somewhat from that which obviously the author and publisher put in their books.

One can always talk about correctness; it comes in many different
levels and varieties. Just because the New Testament was written in
Greek, doesn't mean we can't call an English translation wrong where
the Gospel of John starts: "Send David all your money in small
unmarked bills." I rather like that translation, but objectively
speaking, it doesn't represent the original Greek in any way, shape or
form.

-- 
Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)

From dakretz at gmail.com  Fri Apr 16 20:26:20 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 16 Apr 2010 20:26:20 -0700
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative proofreading site, part 3
In-Reply-To: <q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
Message-ID: <p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>

It's also possible that the P3er was responding appropriately by responding
mindlessly to a process that encourages and rewards mindlessness. (Saying
that more diplomatically, a system that encourages rote memorization and
application of universal rules rather than thoughtful consideration of the
text in the light of the available context.)

What's ironic is that the second easiest way to handle it is to let the
postprocessor (or is that post-processor? let's say post-*processor. See
what I mean?) use the available tools to simply list all the cases where
hyphenated and dehyphenated versions of the same word appear in the text,
check a page image, see which was actually used (I bet it's the most
frequent), and fix 'em all at a stroke.
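The listing step Don describes is easy to sketch. This Python function reports every word that appears both hyphenated and closed up, with the count of each form, so the post-processor can check a page image for the doubtful ones and then normalize in one pass. It is illustrative only; it does not attempt the page-image check, which stays with the human.

```python
import re
from collections import Counter


def hyphenation_variants(text: str) -> dict[str, tuple[int, int]]:
    """Find words that appear both hyphenated ('school-teacher') and closed up
    ('schoolteacher'), returning {closed_form: (hyphenated_count, closed_count)}."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)]
    counts = Counter(tokens)
    report = {}
    for token, n_hyphenated in counts.items():
        if "-" in token:
            closed = token.replace("-", "")
            if counts[closed]:  # Counter returns 0 for missing keys without inserting
                report[closed] = (n_hyphenated, counts[closed])
    return report
```

A word that only ever appears in one form (for example "well-known" with no "wellknown") does not show up in the report, which keeps the list down to the genuinely inconsistent cases.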

The first easiest way is to do this before posting the project in the first
place.

Then let the instructions say those dreaded DP words, "It doesn't matter,"
reducing the cognitive distinctions and requirements between new proofers
and old proofers.

Somehow this concept is always a non-starter unfortunately, especially among
the old proofers who get to write the rules.


On Fri, Apr 16, 2010 at 6:19 PM, Karen Lofstrom <klofstrom at gmail.com> wrote:

> On Fri, Apr 16, 2010 at 3:12 PM, James Adcock <jimad at msn.com> wrote:
>
> > Because 1) I have seen some P3s change EVERY hyphen to a check-hyphen. 2)
> As a PP I have attempted to "fix" check-hyphens and to do so one has to try
> to understand what it was that the P3 was complaining about.  I've emailed
> some and said "what were you thinking?" and they say "oops, you're right, I
> was basically thinking that I wished the hyphen wasn't there."
>
> Those are folks who don't understand the rules. You ran into some bad
> P3ers; that doesn't mean that all of us in P3 are like that.
>
> You could have *corrected* the misconceptions, rather than deciding
> that we're all idiots.
>
> --
> Karen Lofstrom
> not an idiot
> (at least in THIS area)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/ac7c2ede/attachment.html>

From klofstrom at gmail.com  Fri Apr 16 20:49:19 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 16 Apr 2010 17:49:19 -1000
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative proofreading site, part 3
In-Reply-To: <p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
Message-ID: <k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>

On Fri, Apr 16, 2010 at 5:26 PM, don kretz <dakretz at gmail.com> wrote:

> The first easiest way is to do this before posting the project in the first place.

That sounds like a good idea. Why don't we add it as a step in the
preparation process?

--
Karen Lofstrom

From dakretz at gmail.com  Fri Apr 16 21:03:41 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 16 Apr 2010 21:03:41 -0700
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative proofreading site, part 3
In-Reply-To: <k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
Message-ID: <t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>

Ummm ... I think it is. It's a standard feature of guiprep. It's *almost* a
required activity for the Content Provider. But one that the guidelines
suggest that the proofer (who doesn't know this has probably happened) will
nullify.

Here is the applicable Proofing Guideline:

Words like to-day and to-morrow that we don't commonly hyphenate now were
often hyphenated in the old books we are working on. Leave them hyphenated
the way the author did. If you're not sure if the author hyphenated it or
not, leave the hyphen, put an * after it, and join the word together like
this: to-*day. The asterisk will bring it to the attention of the
post-processor, who has access to all the pages and can determine how the
author typically wrote this word.

Now an only mildly conservative reading of that suggests that just about any
word that could possibly be hyphenated should be "-*"ed unless there's
another example showing the "right way" on the very same page. There's
certainly no moderating language encouraging the proofer to do anything
else. Especially considering the possible calumny if they should do the
wrong thing. And it says right there to leave it for the PPer if it's not
obvious to you in the context of the one page available to you at the time.

In fact, if the CPer has done what most CPers do, and left provably
hyphenated words hyphenated and closed up the rest, the Guideline actually
would lead the proofer to undo it all.

On Fri, Apr 16, 2010 at 8:49 PM, Karen Lofstrom <klofstrom at gmail.com> wrote:

> On Fri, Apr 16, 2010 at 5:26 PM, don kretz <dakretz at gmail.com> wrote:
>
> > The first easiest way is to do this before posting the project in the
> first place.
>
> That sounds like a good idea. Why don't we add it as a step in the
> preparation process?
>
> --
> Karen Lofstrom
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/9ff7e58a/attachment.html>

From dakretz at gmail.com  Fri Apr 16 21:19:38 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 16 Apr 2010 21:19:38 -0700
Subject: [gutvol-d] Re: [SPAM] RE: {Disarmed} Re: [SPAM] Re: Re: !@! #17135
	Twas The Night Before Christmas
In-Reply-To: <SNT120-DS1185CDE07B2103ABB412F8AE0E0@phx.gbl>
References: <30c96.47708041.38f79ecc@aol.com> <4BC66003.80201@teksavvy.com>
	<alpine.DEB.2.00.1004142043530.5281@durendal.durendal.org>
	<s2i627d59b81004141753qf7938b5aoa578c0d10edb3883@mail.gmail.com>
	<SNT120-DS1185CDE07B2103ABB412F8AE0E0@phx.gbl>
Message-ID: <l2w627d59b81004162119w9831fb4bwfd40a311295cdea1@mail.gmail.com>

I did some checking too. The conclusion I have provisionally arrived at
is that relatively few people benefit from our expectation of
an increasingly elegant HTML version of each project, which is also
one of the major drags on the post-processing stage and a major
contributor to the increasing residency period of projects on DP.

It appears to me that the only people who enjoy the full pleasure
of our finest work are a.) those who read the whole thing online
at PG, and b) those who personally download the HTML version
and install it locally so they can read it with a device (probably a
PC full-width screen (including laptops and similar.) Which would
be - what - 10% or less?

In fact, it appears that secondary distributors treat the removal
of all or part of the HTML as part of their value-add.

Don


On Fri, Apr 16, 2010 at 4:52 PM, James Adcock <jimad at msn.com> wrote:

>   >Does anyone know of any epublisher other than PG that *does* distribute
> the html we provide?
>
>
>
> Not sure exactly what you are asking, but Apple, for example, takes the PG
> html, strips out the PG legalese and acknowledgment of the volunteers,
> converts it to EPUB with DRM, and redistributes it "free" [where "free" in
> this case means being able to get the book only in DRM form and only
> directly from the Steve Jobs iPad monopoly].  One knows they
> are not working from the txt versions of the files because the Apple
> redistributions contain chars and formatting found only in the HTML
> versions.  FreeKindleBooks redistributes the HTML converted to MOBI,
> retaining all the PG legalese and requirements. Mobileread has volunteers
> who take the HTML, usually heavily reformat it, strip it, and republish in
> MOBI and EPUB formats while cackling about how much better their versions
> are!  Many other sites appear to "down-convert" to a
> least-common-denominator ASCII format before "up-converting" back to HTML,
> MOBI, EPUB, etc. Presumably they are working from an ASCII version of an old
> DVD distribution - getting "working" EPUB and MOBI from the HTML formats
> tends to be "non-trivial", not to mention that some sites republish in, say,
> two dozen different formats.
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100416/13063d4e/attachment-0001.html>

From klofstrom at gmail.com  Fri Apr 16 21:24:29 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Fri, 16 Apr 2010 18:24:29 -1000
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative proofreading site, part 3
In-Reply-To: <t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
	<t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
Message-ID: <x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>

On Fri, Apr 16, 2010 at 6:03 PM, don kretz <dakretz at gmail.com> wrote:

> Here is the applicable Proofing Guideline:

> Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure if the author hyphenated it or not, leave the hyphen, put an * after it, and join the word together like this: to-*day.

Ah, badly-written guideline. What it doesn't spell out is that there's
ambiguity ONLY when words are hyphenated at the end of a line. I can
see how someone would misread that guideline and add asterisks before
every dang hyphen. The more so if the proofer weren't familiar with
18th and 19th century spellings and lacked any sense of how spellings
might have changed.

I have been proofing for nearly seven years now, so I suppose some
things seem clear to me that might be opaque to a less-experienced
proofer.

You're also assuming that the proofer is doing only one page at a
time. Many of us P3ers tend to do many pages in the same book, so
begin to have some sense of what spellings the author uses.

It might make sense for the project comments to include a list of
words that the au hyphenates that might be problematic. A note to the
effect that au uses to-day and to-morrow might alleviate some anxiety
and asterisks.
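The end-of-line rule can be sketched in a few lines of Python. This is an illustration only, not DP's actual tooling: the two word lists stand in for a hypothetical project-comment note ("au uses to-day and to-morrow"), and only hyphens that fall at the very end of a line are touched. Known hyphenated words keep their hyphen, known solid words are joined, and anything else becomes `-*` as the guideline asks, with the rejoined word kept on the first line.

```python
import re

# Hypothetical project-comment word lists ("au uses to-day and to-morrow").
AUTHOR_HYPHENATED = {"to-day", "to-morrow"}
AUTHOR_SOLID = {"nevertheless"}

# A word split by a hyphen at the very end of a line. Group 3 picks up any
# punctuation after the second half; a line end right after it is consumed.
LINE_END_HYPHEN = re.compile(r"(\w+)-\n(\w+)(\S*)[ \t]*\n?")


def rejoin(text: str) -> str:
    """Rejoin end-of-line hyphenations; mid-line hyphens are left alone."""

    def fix(m: re.Match) -> str:
        head, tail, punct = m.group(1), m.group(2), m.group(3)
        hyphenated = f"{head}-{tail}"
        if hyphenated.lower() in AUTHOR_HYPHENATED:
            word = hyphenated            # author is known to hyphenate it
        elif (head + tail).lower() in AUTHOR_SOLID:
            word = head + tail           # author is known to write it solid
        else:
            word = f"{head}-*{tail}"     # not sure: keep hyphen, flag with *
        # The rejoined word stays on the first line; the rest of the
        # second line starts the next line.
        return f"{word}{punct}\n"

    return LINE_END_HYPHEN.sub(fix, text)
```

For example, `rejoin("Come to-\nday, my friend")` gives `"Come to-day,\nmy friend"`, while a word in neither list, such as `fine-\nlooking`, comes back as `fine-*looking` for a later round to settle.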

--
Karen Lofstrom

From dakretz at gmail.com  Fri Apr 16 21:36:01 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 16 Apr 2010 21:36:01 -0700
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
	collaborative proofreading site, part 3
In-Reply-To: <x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
	<t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
	<x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
Message-ID: <n2k627d59b81004162136i5f2113e2xa5b18484bcc899e7@mail.gmail.com>

> Ah, badly-written guideline. What it doesn't spell out is that there's
> ambiguity ONLY when words are hyphenated at the end of a line. I can
> see how someone would misread that guideline and add asterisks before
> every dang hyphen. The more so if the proofer weren't familiar with
> 18th and 19th century spellings and lacked any sense of how spellings
> might have changed.
>
> I have been proofing for nearly seven years now, so I suppose some
> things seem clear to me that might be opaque to a less-experienced
> proofer.
>
> You're also assuming that the proofer is doing only one page at a
> time. Many of us P3ers tend to do many pages in the same book, so
> begin to have some sense of what spellings the author uses.
>
> It might make sense for the project comments to include a list of
> words that the au hyphenates that might be problematic. A note to the
> effect that au uses to-day and to-morrow might alleviate some anxiety
> and asterisks.

Yup. Or you could say that all the hyphenated words have already been
checked once, they will all be checked again in post-processing, and it
doesn't matter. :)
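A post-processing check of the kind mentioned here might, for instance, list every `-*` flag left by the proofers plus every hyphenated word whose unhyphenated spelling also turns up elsewhere in the book. A minimal Python sketch, assuming plain-text input; the function name and report shape are illustrative, not an existing PP tool:

```python
import re
from collections import Counter

# Words, optionally joined by hyphens; "-*" is the proofers' "not sure" flag.
WORD = re.compile(r"[A-Za-z]+(?:-\*?[A-Za-z]+)*")


def hyphen_report(text: str) -> dict:
    """Summarise hyphenation a post-processor may want to review."""
    counts = Counter(w.lower() for w in WORD.findall(text))
    # Words still carrying the proofers' uncertainty flag.
    flagged = sorted(w for w in counts if "-*" in w)
    # Hyphenated forms whose solid spelling also occurs somewhere in the text.
    inconsistent = sorted(
        w for w in counts
        if "-" in w and "-*" not in w and w.replace("-", "") in counts
    )
    return {"flagged": flagged, "inconsistent": inconsistent}
```

Run over a whole book, an empty report suggests the hyphens are at least self-consistent; it says nothing about whether they match the page images.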

From gbnewby at pglaf.org  Fri Apr 16 22:21:23 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Fri, 16 Apr 2010 22:21:23 -0700
Subject: [gutvol-d] Hyphenation (Re: Re: [SPAM] RE: Re: [SPAM] RE: so let's
 talk about my collaborative proofreading site, part 3)
In-Reply-To: <n2k627d59b81004162136i5f2113e2xa5b18484bcc899e7@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
	<t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
	<x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
	<n2k627d59b81004162136i5f2113e2xa5b18484bcc899e7@mail.gmail.com>
Message-ID: <20100417052123.GA26290@pglaf.org>

Why is it that people will have a long thread about
hyphenation, yet not edit the darned screwed-up subject 
line to be clear and readable?

I'm sorry that our pglaf spam filter tags some stuff as spam,
but it doesn't mean we need to carry the tag forever!

Lovingly,
  Greg


From richfield at telkomsa.net  Fri Apr 16 23:55:38 2010
From: richfield at telkomsa.net (Jon Richfield)
Date: Sat, 17 Apr 2010 08:55:38 +0200
Subject: [gutvol-d] Re: Eyestrain (Was Typesetting)
In-Reply-To: <1271444206-sup-9976@zion>
References: <1271444206-sup-9976@zion>
Message-ID: <4BC95B6A.7000002@telkomsa.net>

My history of screen experience goes back some 44 years, which is longer 
than we have had TV in South Africa. More than half of that period at 
work (and since the very early eighties home as well) was spent on 
screens of various qualities and functionalities, everything from 8080s 
and 8600s with delusions of grandeur, to large mainframes and the whole 
bang shoot in between, and everything from 300 bits (no, not bytes) per 
second (not necessarily baud) to crwth-knows-what now.

My point? Apart from my decrepitude and the fact that I now have taken 
to wearing glasses while on line, it is that eyestrain never figured. I could 
not understand what the problem was with friends who complained of it 
(and there were plenty). Then a year or two after I got into PC work I 
realised that if I got involved in an exciting interactive game (not 
always if I was the player if things got really exciting), I soon got 
eyestrain!

Now, what follows is not the remark of your friendly corner-shop 
ophthalmologist, and as far as I can make out my experience, while not 
unique, is not shared by the majority of users, but I think it is of 
potential use to some people. In all my computer experience I have been 
emotionally comfortable with hardware, software, and their logic and 
theory of operation. Whereas many people lean forward when working at 
the screen, I lounge back, working with my eyes, not actually focussed 
on infinity (though I think it is a disgrace that our screens do not yet 
routinely and economically support that), but certainly focussed well 
past the tip of my cute little snout. In short, I am relaxed, *and so 
are my eyes*!

But obviously I am doing something different with my eyes when playing 
games. The screens are the actual same screens. I usually am sitting in 
the same attitude, etc., so dust from the screen isn't a factor. People 
have suggested all sorts of things, such as that when excited my pupils 
are more dilated or my blink rate is lower. Maybe some of those 
factors are true, but what it feels like to me (subjectively, I haven't 
been in a position to test this) is that my ciliary muscles get tired.

So???

So, unless your screen or lighting is really lousy, ditto your typeface, 
colour, layout, size, etc. really unsuited to your needs, if screen 
fatigue is a problem, maybe what you need is some well-managed 
relaxation exercises. If what knackers your eyes is games, I am sure you 
can do the arithmetic!  (No, don't mind ME!  this is my sympathetic 
look!  ;-)  )

Cheers,

Jon

From hart at pglaf.org  Sat Apr 17 02:16:39 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sat, 17 Apr 2010 02:16:39 -0700 (PDT)
Subject: [gutvol-d] Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <SNT120-DS1CEA6D044B4018CFBA83FAE0E0@phx.gbl>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>
	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>
	<SNT120-DS1CEA6D044B4018CFBA83FAE0E0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004170213300.31117@mail.pglaf.org>


I have no objection to having both the Illuminated GIF file and
the ASCII equivalent character.  I see these as just fine, with
no impediment to either reading or searching or quoting, other
than, of course, any artifact of the GIF file, which is usually not
much of a problem when I cut and paste.

As for the MOBI, EPUB, etc., formats, as long as it's easy from
the average reader's POV, it should be acceptable.


Michael


On Fri, 16 Apr 2010, James Adcock wrote:

> Even more spectacular than the "illuminated letter" problem [which is bad
> enough][and which I would hope most transcribers would avoid nowadays by
> choosing NOT to include GIFs for illuminated letters and other trivial
> "printers art"] you also have texts when the transcriber has chosen to leave
> some text in GIF only mode, and/or other text in GIF mode AND OCR'ed mode,
> such that the MOBI and EPUB versions may have 0, 1 or 2 copies of a
> particular entire paragraph of text. And/or the HTML was written in a
> non-linear form in which case the MOBI and EPUB versions may have 0, 1, 2 or
> N copies of any particular passage in the text. And captions on images may
> be retained in the image, included in the HTML, and/or included in the
> alt-tag meaning that a particular user with a particular reading device may
> see or hear the image caption 0, 1, 2 or 3 times.

From marcello at perathoner.de  Sat Apr 17 03:28:10 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sat, 17 Apr 2010 12:28:10 +0200
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
Message-ID: <4BC98D3A.6080908@perathoner.de>

James Adcock wrote:

> iPad is a very nice solution if
> 
> a)      you want to let Steve Jobs decide what you get to read, where you can 
> download it from, what reader app you get to read it with, and how much you pay 
> for it.

With Stanza you can download directly from PG and many other free 
publishers.


> iPad is a HUGE step BACKWARDS as far as I can tell!

Apple systems have always been more closed than the alternatives.

If you don't like closed systems, buy an Android tablet instead.


-- 
Marcello Perathoner
webmaster at gutenberg.org

From Bowerbird at aol.com  Sat Apr 17 11:13:30 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 17 Apr 2010 14:13:30 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <12881.668b4509.38fb544a@aol.com>

michael said:
>    Gods and Fighting Men

ok, just so we can all get "on the same page",
i've run out a first draft of this book as a .pdf.

>    http://z-m-l.com/misc/14465-take5.pdf

it's got some problems, notably with orphans,
including more than one page with one word,
but that's ok for the time being.

michael, how would this .pdf fit your needs?
(you'll need to print a few pages to evaluate.)
what, if anything, would need to be changed?

-bowerbird

p.s.   why can't people pick a _short_ book for
demo purposes?   long books clog the works...

From jimad at msn.com  Sat Apr 17 14:56:51 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 14:56:51 -0700
Subject: [gutvol-d] [SPAM] RE: Re: !@! #17135 Twas The Night Before Christmas
In-Reply-To: <alpine.DEB.2.00.1004170213300.31117@mail.pglaf.org>
References: <alpine.DEB.2.00.1004140615230.21016@mail.pglaf.org>	<52E0A8172A5F417986C50DAFE5D1132A@alp2400>	<SNT120-DS1CEA6D044B4018CFBA83FAE0E0@phx.gbl>
	<alpine.DEB.2.00.1004170213300.31117@mail.pglaf.org>
Message-ID: <SNT120-DS180859411F57F9964B657DAE0D0@phx.gbl>

>I have no objection to having both the Illuminated GIF file and
>the ASCII equivalent character.  I see these as just fine, with
>no impediment to either reading or searching or quoting, other
>than, of course, any artifact of the GIF file, which is usually not
>much of a problem when I cut and paste.

>As for the MOBI, EPUB, etc., formats, as long as it's easy from
>the average reader's POV, it should be acceptable.

OK, then someone needs to think this through and come up with standards and
expectations, because what is happening now is "not working."  Again, it is
not infrequently the case in one or more of the file formats that PG is
distributing that a particular item of text is showing up 0, 1, 2, or 3
times, where the "right answer" is once -- or maybe twice -- if, as you
suggest, one accepts redundancy in the case of illuminated letters. As you
suggest, probably the simplest answer is that if someone wants to put in
illuminated letters, they also include the plain-text version of the letter,
and then presumably one should NOT include an alt-tag on the "illustration"
[when it is actually just an illuminated letter]. What one *ought* to do for
a no-illustration distribution given a "real" illustration with an alt-tag
is yet another matter that needs to be thought out.
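One possible policy for a no-images build, sketched in Python under stated assumptions: the `Figure` record and its `is_dropcap` flag are hypothetical (nothing like them is known to exist in PG's converters), and the rule is only one answer to the open question. An illuminated letter contributes no text, on the assumption that its plain letter is already in the body. A real illustration contributes its caption, or failing that its alt text, exactly once, and nothing at all if that label already appears in the nearby body text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Figure:
    """Hypothetical record of one <img> in the HTML source."""
    alt: str          # alt attribute, "" if absent
    caption: str      # visible caption text, "" if none
    is_dropcap: bool  # True if the image is just an illuminated letter


def no_image_text(fig: Figure, nearby_text: str) -> Optional[str]:
    """Text to emit in place of an image in a no-images build, or None."""
    if fig.is_dropcap:
        # Assumes the plain letter is already in the body text, so emitting
        # the alt text as well would print the letter twice.
        return None
    label = fig.caption or fig.alt
    if not label or label.lower() in nearby_text.lower():
        # Nothing to say, or the label merely repeats the body text.
        return None
    return label
```

With this rule, "The children were nestled" would not be printed again under a stanza that already begins with those words, while "Sleeping Mouse" would appear once.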

I also suggest it would be nice if we had a naming convention for illuminated
letters or some such equivalent, such that the file format conversion
software, and/or other software, can tell whether a particular HTML file "really"
has illustrations, or if it just contains illuminated letters.  For example
in the text in question, when I ask PG for the MOBI version with *no images*
this is what I currently get (which is not quite what one would hope for!)

...

Saying her Prayers

T was the night before Christmas, when all through the house
Not a creature was stirring, not even a mouse;
The stocking were hung by the chimney with care
In hopes that St. Nicholas soon would be there;

Sleeping Mouse

Stocking in the Fireplace

The children were nestled all snug in their beds,
While visions of sugar-plums danced in their heads;
And mamma in her kerchief, and I in my cap,
Had just settled our brains for a long winter's nap,

The children were nestled

When out on the lawn there arose such a clatter,
I sprang from the bed to see what was the matter

....
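A naming convention for illuminated letters would let a converter answer "does this HTML really have illustrations?" mechanically. The sketch below assumes a hypothetical convention (a `dropcap` class on the `<img>`, or an image filename starting with `dropcap`); it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect (src, classes) for every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... />, through the default
        # handle_startendtag implementation.
        if tag == "img":
            a = dict(attrs)
            src = a.get("src") or ""
            classes = (a.get("class") or "").split()
            self.images.append((src, classes))


def is_dropcap(src: str, classes: list) -> bool:
    """Hypothetical convention: class="dropcap" or a dropcap* filename."""
    filename = src.rsplit("/", 1)[-1]
    return "dropcap" in classes or filename.startswith("dropcap")


def has_real_illustrations(html: str) -> bool:
    """True if any <img> is something other than an illuminated letter."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return any(not is_dropcap(src, cls) for src, cls in collector.images)
```

A "no images" request for a book where this returns False could then simply omit the drop-cap images, since the plain letters already carry the text.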


From jimad at msn.com  Sat Apr 17 14:59:14 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 14:59:14 -0700
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about
	my	collaborative proofreading site, part 3
In-Reply-To: <q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
Message-ID: <SNT120-DS2505BBC013CB7F6979DAD7AE0D0@phx.gbl>

>You could have *corrected* the misconceptions, rather than deciding
>that we're all idiots.

I did correct the misconceptions and I did not decide that "we" are all
idiots.  I have stated repeatedly that I found extremely competent and
dedicated volunteers at all levels of DP -- and the converse.


From jimad at msn.com  Sat Apr 17 15:15:33 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 15:15:33 -0700
Subject: [gutvol-d] [SPAM] RE: Re: [SPAM] RE: Re: [SPAM] RE: so let's talk
	about my	collaborative proofreading site, part 3
In-Reply-To: <x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
References: <261a.60807b81.38fa0d43@aol.com>	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>	<t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
	<x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
Message-ID: <SNT120-DS5FD6EECA86BF8C9683088AE0D0@phx.gbl>

>It might make sense for the project comments to include a list of
>words that the au hyphenates that might be problematic. A note to the
>effect that au uses to-day and to-morrow might alleviate some anxiety
>and asterisks.

I have tried leaving project comments and the P1s and the P2s tend to read
and follow the project comments whereas the P3s ignore them and undo the
good work of the P1s and P2s.


From jimad at msn.com  Sat Apr 17 15:25:40 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 15:25:40 -0700
Subject: [gutvol-d] Re: Eyestrain (Was Typesetting)
In-Reply-To: <4BC95B6A.7000002@telkomsa.net>
References: <1271444206-sup-9976@zion> <4BC95B6A.7000002@telkomsa.net>
Message-ID: <SNT120-DS169AFDE2954C36A5E4BFC5AE0D0@phx.gbl>

The way people use their eyes, the ways people read, the capabilities of
their eyes, and their brains to process information, vary widely, and in
ways you cannot imagine unless you personally have run into problems and
have noticed that you have them.  In the simplest almost universal case
people start experiencing eyestrain around age 40 requiring the use of
compensating visual orthotics. Age 40 also seems to be about the age of
greatest denial ;-)


From jimad at msn.com  Sat Apr 17 15:32:43 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 15:32:43 -0700
Subject: [gutvol-d] [SPAM] RE:  Re: Typesetting
In-Reply-To: <4BC98D3A.6080908@perathoner.de>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
Message-ID: <SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>

>With Stanza you can download directly from PG and many other free 
>publishers.

Sorry, but are you saying that you are actually currently running Stanza on
an iPad, that you have tested this, and that it works? From what I can see
they only have an iPod version, which yes will run on iPad -- and create a
blurry simulation of an iPod on your iPad.


From greg at durendal.org  Sat Apr 17 15:46:24 2010
From: greg at durendal.org (Greg Weeks)
Date: Sat, 17 Apr 2010 18:46:24 -0400 (EDT)
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: Re: [SPAM] RE: so let's
 talk about my collaborative proofreading site, part 3
In-Reply-To: <SNT120-DS5FD6EECA86BF8C9683088AE0D0@phx.gbl>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
	<t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
	<x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
	<SNT120-DS5FD6EECA86BF8C9683088AE0D0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004171840020.9990@durendal.durendal.org>

On Sat, 17 Apr 2010, Jim Adcock wrote:

>> It might make sense for the project comments to include a list of
> words that the au hyphenates that might be problematic. A note to the
> effect that au uses to-day and to-morrow might alleviate some anxiety
> and asterisks.
>
> I have tried leaving project comments and the P1s and the P2s tend to read
> and follow the project comments whereas the P3s ignore them and undo the
> good work of the P1s and P2s.

What I found was that there's a contingent in all rounds that seem to have 
never read the project comments. I don't know if they never read them, or 
just forgot them in the throes of proofing.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From klofstrom at gmail.com  Sat Apr 17 15:47:22 2010
From: klofstrom at gmail.com (Karen Lofstrom)
Date: Sat, 17 Apr 2010 12:47:22 -1000
Subject: [gutvol-d] Dim view of P3ers
Message-ID: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>

On Sat, Apr 17, 2010 at 12:15 PM, Jim Adcock <jimad at msn.com> wrote:

> I have tried leaving project comments and the P1s and the P2s tend to read and follow the project comments whereas the P3s ignore them and undo the good work of the P1s and P2s.

And earlier Jim wrote:

> I have stated repeatedly that I found extremely competent and
> dedicated volunteers at all levels of DP -- and the converse.

Bizarre. In one post you're drawing back from blanket accusations and
in the next, you repeat them.

Jim, I don't understand WHY you feel impelled to keep throwing stones
at DP. You don't like the way we do things, you've left ... it's all
behind you, right? But no, you have to join the grouch group here at
PG and repeatedly attack the organization that is providing the
overwhelming majority of the texts submitted to PG.

I suppose I ought to just killfile you, as I have Bowerbird.

--
Karen Lofstrom

From hart at pglaf.org  Sat Apr 17 17:07:11 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sat, 17 Apr 2010 17:07:11 -0700 (PDT)
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>


On Sat, 17 Apr 2010, Jim Adcock wrote:

> >With Stanza you can download directly from PG and many other free
> publishers.
>
> Sorry, but are you saying that you are actually currently running Stanza on
> an iPad, that you have tested this, and that it works? From what I can see
> they only have an iPod version, which yes will run on iPad -- and create a
> blurry simulation of an iPod on your iPad.

iPads have their own iBooks App and if you search for "Project Gutenberg"
and various titles what you get seems very much not to be what you call a
"blurry simulation of an iPod on your iPad."

I suggest that, instead of using Aristotle's thought processes to try to
figure out what an iPad looks like without looking at a real one of these
gizmos, you just find one and actually look at it, or, the next best thing,
look at the online demonstrations or ask someone who is trying one out to
do some experimentation for you.

In addition, you can also find a nice App from the people at Wattpad that
also has a rather nice rendering of the Project Gutenberg eBooks on iPad.

Given that eBook Apps surpassed game Apps on the iPod a while back, and,
no, I don't know exactly when that was or if games took back the crown or
eBooks kept the lead, but given that, I must presume eBook Apps will have
a decent life on the iPad.

I've tried out several reading experiences on the iPad and all seem quite
easy to read from and the Apps store makes it quite obvious which App has
been written specifically for the iPad and which for combinations.

I'm sure all the iPod reader outfits that are still in production will be
releasing iPad products that take full advantage of the 768 x 1024 res--
which works so well that you don't think at all about resolution and size
becomes the only factor you will probably worry about.

However, I should state in advance that I am sure people will find things
to worry about, all sorts of things that seem just fine to nearly everyone else.

Personally, I'm just waiting to see what comes down the pike from persons
who want to turn iPads into iMacs or whatever, and then start Apps Stores
of various and sundry varieties, just like they did with iPhones, iPods &
pretty much everything else in the computing world.

Heck, the iPod was not out but a week when the first eBook reader was out
to let people read our Project Gutenberg eBooks and others on it.

I'm sure there will be dozens, if not hundreds, of iPad eBook readers.


From hart at pglaf.org  Sat Apr 17 17:11:13 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sat, 17 Apr 2010 17:11:13 -0700 (PDT)
Subject: [gutvol-d] Re: Eyestrain (Was Typesetting)
In-Reply-To: <SNT120-DS169AFDE2954C36A5E4BFC5AE0D0@phx.gbl>
References: <1271444206-sup-9976@zion> <4BC95B6A.7000002@telkomsa.net>
	<SNT120-DS169AFDE2954C36A5E4BFC5AE0D0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004171707550.27333@mail.pglaf.org>


On Sat, 17 Apr 2010, Jim Adcock wrote:

> The way people use their eyes, the ways people read, the capabilities of
> their eyes, and their brains to process information, vary widely, and in
> ways you cannot imagine unless you personally have run into problems and
> have noticed that you have them.  In the simplest almost universal case
> people start experiencing eyestrain around age 40 requiring the use of
> compensating visual orthotics. Age 40 also seems to be about the age of
> greatest denial ;-)

I could read the OED Microprint edition without decent lighting until 42.

After that it was all downhill so fast I never really tried it any more--
with or without glasses, but would use the provided Bausch & Lomb reader.

Today I use $1 glasses with all my computers. . .I just buy every grade of
magnification and leave each with the computer it works best with.

I'll know I'm in trouble if/when I move to the 3x range. . .hee hee!


From hart at pglaf.org  Sat Apr 17 17:15:30 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sat, 17 Apr 2010 17:15:30 -0700 (PDT)
Subject: [gutvol-d] Dim View: WAS Re: [SPAM] RE: Re: [SPAM] RE: Re: [SPAM]
 RE: so let's talk about my collaborative proofreading site, part 3
In-Reply-To: <SNT120-DS5FD6EECA86BF8C9683088AE0D0@phx.gbl>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<p2j627d59b81004162026h2ceb6e74y9aa4582882b3454e@mail.gmail.com>
	<k2h1e8e65081004162049re1ac9489l5ca1c5d983f72f6f@mail.gmail.com>
	<t2p627d59b81004162103y2c50e9c7h7cb916716ba3178@mail.gmail.com>
	<x2i1e8e65081004162124s8ee31f3dz7e75f6735abdbd44@mail.gmail.com>
	<SNT120-DS5FD6EECA86BF8C9683088AE0D0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004171712551.27333@mail.pglaf.org>


On Sat, 17 Apr 2010, Jim Adcock wrote:

> >It might make sense for the project comments to include a list of
> words that the au hyphenates that might be problematic. A note to the
> effect that au uses to-day and to-morrow might alleviate some anxiety
> and asterisks.
>
> I have tried leaving project comments and the P1s and the P2s tend to read
> and follow the project comments whereas the P3s ignore them and undo the
> good work of the P1s and P2s.

Somehow, in the context of the handful of messages Jim Adcock sent,
and even in the context of this message, this does not seem to
be a dim view. . .with two plusses and one minus.

Of course, it would all work out better if the minus came first--


From hart at pglaf.org  Sat Apr 17 17:16:51 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sat, 17 Apr 2010 17:16:51 -0700 (PDT)
Subject: [gutvol-d] Re: [SPAM] RE: Re: [SPAM] RE: so let's talk about my
 collaborative proofreading site, part 3
In-Reply-To: <SNT120-DS2505BBC013CB7F6979DAD7AE0D0@phx.gbl>
References: <261a.60807b81.38fa0d43@aol.com>
	<SNT120-DS14FB512EDA804B7AFD14DFAE0D0@phx.gbl>
	<p2l1e8e65081004161757sd8a3283ega72a88f7e90c102c@mail.gmail.com>
	<SNT120-DS12B7C5943DA99055B6D403AE0D0@phx.gbl>
	<q2p1e8e65081004161819o776cfbf0pc6641dfe45223efa@mail.gmail.com>
	<SNT120-DS2505BBC013CB7F6979DAD7AE0D0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004171716190.27333@mail.pglaf.org>


On Sat, 17 Apr 2010, Jim Adcock wrote:

> >You could have *corrected* the misconceptions, rather than deciding
> that we're all idiots.  [Talk about a "Dim View". . . .]


> I did correct the misconceptions and I did not decide that "we" are all
> idiots.  I have stated repeatedly that I found extremely competent and
> dedicated volunteers at all levels of DP -- and the converse.

From Bowerbird at aol.com  Sat Apr 17 18:14:46 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sat, 17 Apr 2010 21:14:46 EDT
Subject: [gutvol-d] Re: Typesetting (not really,
	but nobody seems to read subject-headers)
Message-ID: <a4b6.2a7c583f.38fbb706@aol.com>

michael said:
>   Of course, it would all work out better if the minus came first--

hey, that's an idea.   have p3 proof first, followed by p2, then p1.

***

the p3 proofers asterisk the end-line-hyphenates because
that's the one course of action guaranteed not to be wrong.
moreover, it's the _only_ one that carries that promise.

***

michael said:
>    there will be dozens, if not hundreds, of iPad eBook readers.

and 3/4 of them will claim to support the .epub format,
but yet no two of them will do it in the exact same way...

but hey, aren't y'all glad that we have a _standard_?   i am!

***

um, and jim is right about one thing.   there's no stanza on ipad.
unless amazon changes its mind, and reverses its current stand.

***

as for eyestrain, if you have it, explore the various solutions!
because it _is_ possible for you to get rid of it, in most cases.
it might mean buying better equipment, but not necessarily;
the solution might be free, and easy, and improve your life...

***

oh, and that .pdf i created?   no comments?   whatsamatter?
is something like that too _tangible_ to be discussed here?

-bowerbird

From mmcdermott at mad-computer-scientist.com  Sat Apr 17 18:49:45 2010
From: mmcdermott at mad-computer-scientist.com (Michaelu McDermott)
Date: Sat, 17 Apr 2010 20:49:45 -0500
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <12881.668b4509.38fb544a@aol.com>
References: <12881.668b4509.38fb544a@aol.com>
Message-ID: <1271554991-sup-4922@zion>

> i've run out a first draft of this book as a .pdf.

I won't be able to run off some pages until tomorrow, but the PDF looks
quite good. Offhand, though, the page numbers look like they drop low
enough that they _could_ be out of the printable area.

> p.s.   why can't people pick a _short_ book for
> demo purposes?   long books clog the works...

Aww, come on. That wouldn't be any fun, now would it? :)

Seriously, though, shorter works are poor representatives of the problem
at hand. Picking the Declaration of Independence, or T. S. Eliot's The
Waste Land, would be too simple: you could print it up as is and ignore the
issues, load it up in a word processor, or mark it up by hand.

-Michael

Excerpts from Bowerbird's message of Sat Apr 17 13:13:30 -0500 2010:
> michael said:
> >    Gods and Fighting Men
> 
> ok, just so we can all get "on the same page",
> i've run out a first draft of this book as a .pdf.
> 
> >    http://z-m-l.com/misc/14465-take5.pdf
> 
> it's got some problems, notably with orphans,
> including more than one page with one word,
> but that's ok for the time being.
> 
> michael, how would this .pdf fit your needs?
> (you'll need to print a few pages to evaluate.)
> what, if anything, would need to be changed?
> 
> -bowerbird
> 
> p.s.   why can't people pick a _short_ book for
> demo purposes?   long books clog the works...
-- 
Michael McDermott
www.mad-computer-scientist.com

From hart at pglaf.org  Sat Apr 17 20:18:11 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sat, 17 Apr 2010 20:18:11 -0700 (PDT)
Subject: [gutvol-d] Eyestrain
In-Reply-To: <a4b6.2a7c583f.38fbb706@aol.com>
References: <a4b6.2a7c583f.38fbb706@aol.com>
Message-ID: <alpine.DEB.2.00.1004172016390.32458@mail.pglaf.org>


Try reading white on black, or charcoal, or any such mixtures.

Get the contrast where you like it, try lots of fonts, sizes,
refresh rates, etc.
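Michael's suggestion to "get the contrast where you like it" can be put in numbers. Below is a minimal Python sketch that uses the relative-luminance and contrast-ratio formulas from the W3C's WCAG 2.0 accessibility guidelines. It measures contrast only. It says nothing about which setting a particular reader will find comfortable.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel value to linear light (WCAG 2.0)."""
    c = channel / 255.0
    if c <= 0.03928:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour: 0.0 for black, 1.0 for white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1.0 (identical colours) up to 21.0 (black/white).

    Argument order does not matter: the lighter colour always goes
    in the numerator.
    """
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white, black, charcoal = (255, 255, 255), (0, 0, 0), (0x33, 0x33, 0x33)
    print(f"white on black:    {contrast_ratio(white, black):.1f}")
    print(f"white on charcoal: {contrast_ratio(white, charcoal):.1f}")
```

White on black gives the maximum ratio of 21. White on a charcoal such as #333333 comes out at roughly 12.6. That is a softer combination, and it is still well above the 7:1 level that WCAG sets for enhanced contrast.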


From jimad at msn.com  Sat Apr 17 23:27:49 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 23:27:49 -0700
Subject: [gutvol-d] Re: Dim view of P3ers
In-Reply-To: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
Message-ID: <SNT120-DS137470B6E73D91DDD1984AE0C0@phx.gbl>

>Jim, I don't understand WHY you feel impelled to keep throwing stones
>at DP. You don't like the way we do things, you've left ... it's all
>behind you, right? But no, you have to join the grouch group here at
>PG and repeatedly attack the organization that is providing the
>overwhelming majority of the texts submitted to PG.

If you read my comments carefully I think you will find that I try to speak
truthfully to what works at DP and at PG and what doesn't work so that we
all can try to fix it and make a better contribution to the world.  In the
business world this would be called "continuous improvement."  PG'ers at
least seem to be able to generally acknowledge what works and what doesn't
work.  In DP-land if you don't drink the koolaid and declare it tasty then
you fall constantly under attack.  If there are problems with how P3 works
-- and there are -- one would think DP would want to face up to that and
work to improve it -- just as in PG-land the lack of standards is causing
texts to be distributed to users frequently missing or duplicating letters
and words and in some cases whole paragraphs.  I could say "gosh let's
ignore this because DP and PG are all volunteers and their hearts are in the
right places and I wouldn't want to hurt anyone's feelings" but that
wouldn't change the facts: DP wastes a lot of volunteer time and in general
makes things more painful than need be due to aged tools and approaches.
And PG distributes a lot of stuff that ends up appearing "broken" to end
users because of the standards chosen -- and/or the lack thereof.


From jimad at msn.com  Sat Apr 17 23:47:08 2010
From: jimad at msn.com (Jim Adcock)
Date: Sat, 17 Apr 2010 23:47:08 -0700
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
Message-ID: <SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>

>> Sorry, but are you saying that you are actually currently running Stanza on
>> an iPad, that you have tested this, and that it works? From what I can see
>> they only have an iPod version, which yes will run on iPad -- and create a
>> blurry simulation of an iPod on your iPad.
>
>iPads have their own iBooks App and if you search for "Project Gutenberg"
>and various titles what you get seems very much not to be what you call a
>"blurry simulation of an iPod on your iPad."
>
>I suggest that instead of taking Aristotle's thought processing to try a
>way of figuring out what an iPad looks like without looking at a real one
>of these gizmos that instead you just find one and actually look at it or
>the next best thing, look at the online demonstrations or ask someone who
>is trying one out to do some experimentation for you.

I have done all these things.  I went to an apple store and played with an
iPad as soon as they came out and was underwhelmed.  I compared it to an
iPod and decided that if I was going to consider either one probably the
iPod made more sense to me.  A friend has bought an iPod and we spent an
evening playing with it trying to get PG books directly to it without
passing through the Steve Jobs filter.  For example in the web browser we
tried downloading an ePub format book from PG and Apple blocks this whereas
in comparison Kindle supports it -- as do PC browsers.  We downloaded and
installed Stanza and it showed up as a blurry simulation of an iPod within
the iPad.

Again, I am asking a serious question: Are you saying that you are actually
currently running Stanza on an iPad, that you have tested this, and that it
works? Because I have tested it and for me it didn't work, but rather showed
up as a blurry simulation of an iPod on the iPad.  There are also
discussions on the web about how Steve Jobs required Stanza to take out
features that allowed Stanza users to share non-DRM books with friends.

If you have found "good" ways to get PG directly to iPad how about
discussing them in detail, what you did to have success, rather than flaming
me -- because I have tried and what I have seen to date is not very
encouraging. If you own an iPad and have had luck directly loading a PG book
from PG onto your iPad and can read it then please share with us how because
that will certainly affect my purchase decision -- or lack thereof.

Yes one can use the apple ibooks app to read copies of PG books
redistributed by Apple where Apple has stripped the PG legalese and
acknowledgements - at least the first 20,000 titles, the most recent stuff
doesn't seem to be there.  I have already said this in previous emails.


From jimad at msn.com  Sun Apr 18 00:21:43 2010
From: jimad at msn.com (James Adcock)
Date: Sun, 18 Apr 2010 00:21:43 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <12881.668b4509.38fb544a@aol.com>
References: <12881.668b4509.38fb544a@aol.com>
Message-ID: <SNT120-DS7822C0F94D4FDE6DAB4EEAE0C0@phx.gbl>

>   http://z-m-l.com/misc/14465-take5.pdf


First time I tried downloading this it didn't work.  Tried it again later
from a different computer and it worked.

Tried printing out the first 10 pages.  My printer reported that the
document requested C5 page size - but the C series is an envelope size?

I would have expected A4 or US "Letter" size.

First Page title appears to print off center to the left.

Contents in an unusually small font

Page numbers in an unusually large font

Ragged Right is an unusual convention for a PDF document

Body font seems to be unusually small.

Line length of approx 70 chars seems unusually long for a book-like format.
Most books use about 50 chars per line of text because doing so makes the
book more readable.
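Jim's page-size and line-length observations can be checked mechanically before anything is printed. Below is a minimal Python sketch. It assumes that some other tool has already read the page width and height (in points) from the PDF's MediaBox and extracted the body text. The size table and the 2 mm tolerance are illustrative choices. They are not taken from the PDF or from any PG tool.

```python
PT_PER_MM = 72 / 25.4  # PDF user-space units (points) per millimetre

# Portrait (width, height) in millimetres for a few common sizes.
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "A5": (148.0, 210.0),
    "C5 (envelope)": (162.0, 229.0),
    "US Letter": (215.9, 279.4),
}


def name_page_size(width_pt, height_pt, tolerance_mm=2.0):
    """Return the name of a known page size, or None if nothing matches.

    width_pt and height_pt are page dimensions in points, e.g. from a
    PDF MediaBox. Landscape pages match their portrait equivalents.
    """
    short_mm, long_mm = sorted((width_pt / PT_PER_MM, height_pt / PT_PER_MM))
    for name, (w_mm, h_mm) in PAGE_SIZES_MM.items():
        if abs(short_mm - w_mm) <= tolerance_mm and abs(long_mm - h_mm) <= tolerance_mm:
            return name
    return None


def average_line_length(text):
    """Mean number of characters per non-blank line of extracted text."""
    lengths = [len(line.strip()) for line in text.splitlines() if line.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

A MediaBox of roughly 459 x 649 points comes out as the C5 envelope size that Jim's printer reported, and 612 x 792 points is US Letter. Running `average_line_length` over a page of extracted body text puts a number on the 70-versus-50 characters question.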


From richfield at telkomsa.net  Sun Apr 18 00:48:54 2010
From: richfield at telkomsa.net (Jon Richfield)
Date: Sun, 18 Apr 2010 09:48:54 +0200
Subject: [gutvol-d] Re: Eyestrain (Was Typesetting)
In-Reply-To: <alpine.DEB.2.00.1004171707550.27333@mail.pglaf.org>
References: <1271444206-sup-9976@zion>
	<4BC95B6A.7000002@telkomsa.net>	<SNT120-DS169AFDE2954C36A5E4BFC5AE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171707550.27333@mail.pglaf.org>
Message-ID: <4BCAB966.3010805@telkomsa.net>

Yes, I agree with both. I never was very comfortable with OEDMP at any 
age, but could read it in good light at a pinch till about 50 (can't 
remember exactly; memory going along with other virtues. Used to have 
senior moments. Now have junior moments. Not yet in my pants, but no 
doubt that too is on the way.)  Now, as it happens, I am (primarily) an 
unfrocked biologist and have discovered that the strongest "readers" I 
can find, (+3.5 to 4 if I am lucky)  though useless for proper reading 
(my current prescription is +2.5) make very useful visual aids for field 
work and are perfect for OEDMP reading; far better than the rather good 
magnifier supplied with the books.

BTW, in case anyone else in the forum still reads and enjoys books, 
paper books (a medium that needs redesign, and I am just the man to do 
it!) might be interested in a useful expedient that I happened across.
My OEDMP came in a box/shelf with magnifier and two slots, one for each 
tome. As the designers of the package obviously had experience of what 
happened to large volumes that got manhandled by their bindings, they 
had a neat expedient: behind each tome a strip of tough, transparent 
plastic was fastened to the upper back corner of the slot, and hung down 
to  the bottom, passing thence to the front, where it emerged as a tab 
below each volume.  To get the volume out without brutalising it, you 
simply pulled at the matching tab. The volume then emerged a few inches 
without damage or inconvenient scrabbling, and could then be picked up 
in a civilised, nondestructive mode.
Now, after some 40 years or so, (can't remember exactly; memory going 
along with other virtues. Used to have senior moments. Now have junior 
moments. Not yet in my pants, but no doubt that too is on the way.) those 
strips of polyester or whatever (I omitted to burn a bit, so I am 
uncertain; it might have been plasticised PVC or something (can't 
remember exactly; memory going along with other virtues. Used to have 
senior moments. Now have junior moments. Not yet in my pants, but no 
doubt that too is on the way.) )  began to go nonfunctional and their 
connections failed. So I removed them. Then an idea struck as my 
gathering senility went on strike for a while. Some idiot was lining a 
dam with plastic in the near neighbourhood and offcuts of 2mm-thick 
black HDPE were lying around as though waste were a virtue. I had  
liberated a square metre or two and cut two strips to fit where the 
transparent plastic had gone. Unlike the original, my inserts were much 
stiffer and I applied some brutal folding to make it turn the corner, 
but had no need to fasten it at the top back corner.
It works amazingly, smoothly and cleanly, and it is harmless to book, 
cabinet and reader. Two moving parts, including the book. Its only 
shortcoming for general use on broad shelves is that one needs strips 
that roughly correspond to the widths of the matching books. One could 
design shelves and attachments to overcome that (very minor) problem, 
but I seldom have such a need, so I let it go. Old age and all that.

Cheers,

Jon

On 2010/04/18 02:11 AM, Michael S. Hart wrote:
> On Sat, 17 Apr 2010, Jim Adcock wrote:
>
>    
>> The way people use their eyes, the ways people read, the capabilities of
>> their eyes, and their brains to process information, vary widely, and in
>> ways you cannot imagine unless you personally have run into problems and
>> have noticed that you have them.  In the simplest almost universal case
>> people start experiencing eyestrain around age 40 requiring the use of
>> compensating visual orthotics. Age 40 also seems to be about the age of
>> greatest denial ;-)
>>      
> I could read the OED Microprint edition without decent lighting until 42.
>
> After that it was all downhill so fast I never really tried it any more--
> with or without glasses, but would use the provided Bausch & Lomb reader.
>
> Today I use $1 glasses with all my computers. . .I just buy every grade of
> magnification and leave each with the computer it works best with.
>
> I'll know I'm in trouble if/when I move to the 3x range. . .hee hee!
>

From hart at pglaf.org  Sun Apr 18 02:47:05 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sun, 18 Apr 2010 02:47:05 -0700 (PDT)
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>


On Sat, 17 Apr 2010, Jim Adcock wrote:

> >> Sorry, but are you saying that you are actually currently running Stanza on
> >> an iPad, that you have tested this, and that it works? From what I can see
> >> they only have an iPod version, which yes will run on iPad -- and create a
> >> blurry simulation of an iPod on your iPad.
> >
> >iPads have their own iBooks App and if you search for "Project Gutenberg"
> >and various titles what you get seems very much not to be what you call a
> >"blurry simulation of an iPod on your iPad."
> >
> >I suggest that instead of taking Aristotle's thought processing to try a
> >way of figuring out what an iPad looks like without looking at a real one
> >of these gizmos that instead you just find one and actually look at it or
> >the next best thing, look at the online demonstrations or ask someone who
> >is trying one out to do some experimentation for you.
>
> I have done all these things.  I went to an apple store and played with an
> iPad as soon as they came out and was underwhelmed.  I compared it to an
> iPod and decided that if I was going to consider either one probably the
> iPod made more sense to me.  A friend has bought an iPod and we spent an
> evening playing with it trying to get PG books directly to it without
> passing through the Steve Jobs filter.  For example in the web browser we
> tried downloading an ePub format book from PG and Apple blocks this whereas
> in comparison Kindle supports it -- as do PC browsers.  We downloaded and
> installed Stanza and it showed up as a blurry simulation of an iPod within
> the iPad.

Somewhere in the previous paragraph you seem to have switched from talking about
"A friend has bought an iPod and we spent an evening playing with it...",
to "it showed up as a blurry simulation of an iPod within the iPad", with,
it would appear, no switch of topic from iPod to iPad.

Was there a typo in "friend has bought an iPod" where you meant "iPad"?,
or did I miss something else that indicated changes from iPod to iPad?


> Again, I am asking a serious question: Are you saying that you are actually
> currently running Stanza on an iPad, that you have tested this, and that it
> works? Because I have tested it and for me it didn't work, but rather showed
> up as a blurry simulation of an iPod on the iPad.  There are also
> discussions on the web about how Steve Jobs required Stanza to take out
> features that allowed Stanza users to share non-DRM books with friends.

I didn't mention Stanza at all, so how can you be asking me "a serious
question:  Are you saying you are actually running Stanza on an iPad?"

Perhaps you can restate this, and also enlighten us on the feature that
is missing: where it is, and how to use it on the other Stanza version[s].


> If you have found "good" ways to get PG directly to iPad how about
> discussing them in detail, what you did to have success,

I told you. . .I used the iBooks App that popped up at first turn on,
and I also used the Wattpad App.

If you don't like those, you might try Goodreader Lite, though I am not
sure of the details; I haven't tried it yet.


>  rather than flaming me --

Flaming you?

After all the previous harshness, you accuse ME of flaming you?

Is that because I asked if you didn't try iBooks and Wattpad?

Neither of which product mentions did you reply to, nor even "Thanks,
but no thanks for the suggestion."

Not to mention attacking me for something I said about Stanza, when I
didn't even mention Stanza.

Please. . .lighten up. . .I'm on your side. . .and trying to help.


> because I have tried and what I have seen to date is not very
> encouraging. If you own an iPad and have had luck directly loading a PG book
> from PG onto your iPad and can read it then please share with us how because
> that will certainly affect my purchase decision -- or lack thereof.


> Yes one can use the apple ibooks app to read copies of PG books
> redistributed by Apple where Apple has stripped the PG legalese and
> acknowledgements - at least the first 20,000 titles, the most recent stuff
> doesn't seem to be there.  I have already said this in previous emails.

Yet this is a case of "had luck directly loading a PG book," though not "from
PG onto your iPad" but "can read it". . . .

Personally, I don't care where anyone gets our books from, just as long as we
get them out to people.

As for the more recent titles, yes, most people "start at the beginning, and
continue on until they get to the end."  However, I am guessing even if/when
they catch up, there will still be some delay, as is true for any number of
other sites that have relayed our books from us to others in format variety,
or other change, that gives them a certain appeal beyond our own formats.

Some of these hand out nearly as many as we do from our largest sites.

Our goal is the most eBooks to the most people.

All of these people are helping us do this, and we don't pay them anything.

In a very real sense Apple, Amazon, et al, work for Project Gutenberg.



From marcello at perathoner.de  Sun Apr 18 03:37:55 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 18 Apr 2010 12:37:55 +0200
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
Message-ID: <4BCAE103.2030105@perathoner.de>

Jim Adcock wrote:

>> With Stanza you can download directly from PG and many other free
>> publishers.
> 
> Sorry, but are you saying that you are actually currently running Stanza on
> an iPad, that you have tested this, and that it works? From what I can see
> they only have an iPod version, which yes will run on iPad -- and create a
> blurry simulation of an iPod on your iPad.

I didn't because Apple sent me no iPad and I never bought from Apple in 
my life nor will I unless they radically change their business model.

By Lexcycle's own claim Stanza is compatible with the iPad:

   http://itunes.apple.com/us/app/stanza/id284956128?mt=8


I run Stanza on a Touch and download dozens of PG ePubs every day. Just 
point a new 'Book Source' at

   m.gutenberg.org

(Don't publish this url because we have not enough server horsepower 
behind it yet.)


-- 
Marcello Perathoner
webmaster at gutenberg.org

From marcello at perathoner.de  Sun Apr 18 04:23:06 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 18 Apr 2010 13:23:06 +0200
Subject: [gutvol-d] DP output is technically obsolete
In-Reply-To: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
Message-ID: <4BCAEB9A.2040105@perathoner.de>

Karen Lofstrom wrote:

> But no, you have to join the grouch group here at
> PG and repeatedly attack the organization that is providing the 
> overwhelming majority of the texts submitted to PG.

Quantity, yes ... Let's talk *quality* instead.

The problem is not that some PPers are incompetent, the problem is that 
the whole DP output is technically obsolete:

DP is producing 'HTML Facsimiles for the Desktop' while it should be 
producing eBooks.

Which do you think is more useful? A book you can only read at home on 
your desktop or a book you can read everywhere on your phone?

Ironically much of PPing clogs the queues while lessening the value of 
the books.

DP output renders ugly on all devices except desktop-sized screens.

DP HTML is almost as hard to convert to other formats as PG plain text.

DP has to enforce some standard that greatly simplifies the output.


-- 
Marcello Perathoner
webmaster at gutenberg.org

From hart at pglaf.org  Sun Apr 18 07:11:05 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sun, 18 Apr 2010 07:11:05 -0700 (PDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCAEB9A.2040105@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
Message-ID: <alpine.DEB.2.00.1004180710530.15183@mail.pglaf.org>


Hear!  Hear!


On Sun, 18 Apr 2010, Marcello Perathoner wrote:

> Karen Lofstrom wrote:
>
> > But no, you have to join the grouch group here at
> > PG and repeatedly attack the organization that is providing the overwhelming
> > majority of the texts submitted to PG.
>
> Quantity, yes ... Let's talk *quality* instead.
>
> The problem is not that some PPers are incompetent, the problem is that the
> whole DP output is technically obsolete:
>
> DP is producing 'HTML Facsimiles for the Desktop' while it should be producing
> eBooks.
>
> Which do you think is more useful? A book you can only read at home on your
> desktop or a book you can read everywhere on your phone?
>
> Ironically much of PPing clogs the queues while lessening the value of the
> books.
>
> DP output renders ugly on all devices except desktop-sized screens.
>
> DP HTML is almost as hard to convert to other formats as PG plain text.
>
> DP has to enforce some standard that greatly simplifies the output.
>
>
>
>
>

From prosfilaes at gmail.com  Sun Apr 18 07:24:57 2010
From: prosfilaes at gmail.com (David Starner)
Date: Sun, 18 Apr 2010 10:24:57 -0400
Subject: [gutvol-d] Re: Dim view of P3ers
In-Reply-To: <SNT120-DS137470B6E73D91DDD1984AE0C0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<SNT120-DS137470B6E73D91DDD1984AE0C0@phx.gbl>
Message-ID: <v2y6d99d1fd1004180724n87b1424cgb242f5ff2443d3bd@mail.gmail.com>

On Sun, Apr 18, 2010 at 2:27 AM, Jim Adcock <jimad at msn.com> wrote:
> In the
> business world this would be called "continuous improvement."

Jim, in the business world, your complaint about the fact the business
wasn't working on your preferred projects would annoy the hell out of
your coworkers the eighth time they heard it, just like here.

-- 
Kie ekzistas vivo, ekzistas espero.

From traverso at posso.dm.unipi.it  Sun Apr 18 08:05:09 2010
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Sun, 18 Apr 2010 17:05:09 +0200 (CEST)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCAEB9A.2040105@perathoner.de> (message from Marcello
	Perathoner on Sun, 18 Apr 2010 13:23:06 +0200)
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
Message-ID: <20100418150509.8A3501008D@cardano.dm.unipi.it>

>>>>> "Marcello" == Marcello Perathoner <marcello at perathoner.de> writes:

    Marcello> Karen Lofstrom wrote:

    >> But no, you have to join the grouch group here at PG and
    >> repeatedly attack the organization that is providing the
    >> overwhelming majority of the texts submitted to PG.

    Marcello> Quantity, yes ... Let's talk *quality* instead.

    Marcello> The problem is not that some PPers are incompetent, the
    Marcello> problem is that the whole DP output is technically
    Marcello> obsolete:

    Marcello> DP is producing 'HTML Facsimiles for the Desktop' while
    Marcello> it should be producing eBooks.

    Marcello> Which do you think is more useful? A book you can only
    Marcello> read at home on your desktop or a book you can read
    Marcello> everywhere on your phone?

Is PG ready to accept Epub as submission format? (i.e. one submits a
valid epub from which the other formats are derived)? If so, one can
target Epub, otherwise at best one is forced to submit HTML or txt
that converts not-too-badly with current PG tools, and this might be
extremely challenging. 

Carlo

From dakretz at gmail.com  Sun Apr 18 09:20:03 2010
From: dakretz at gmail.com (don kretz)
Date: Sun, 18 Apr 2010 09:20:03 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
Message-ID: <r2r627d59b81004180920w2205f342v59b339e5eb45c299@mail.gmail.com>

It really doesn't matter what DP targets as long as it's capable
of identifying, completely and unambiguously, the requisite
syntactic elements. But we have no agreed list, not even an
ad hoc functional one, of what those are.

Instead our focus is on subjective elegance of appearance
rather than on objective clarity and completeness. "Good work"
has come to be associated with "looks pretty and makes the
PPer feel good," plus the ability to pass two sets of incompletely
documented and sometimes inconsistent automated tests -
the postprocessor tools and the whitewashers' tools - neither
of which were intended to consider syntactic rigor and accuracy.

Interestingly, we seem to have instinctively inferred the need
for this. The HTML texts often include some basic form of it (or
more accurately an ad hoc collection of basic forms) in the CSS
stylesheets.

It seems to me that we need the "what" before we worry about the "how".

Don


On Sun, Apr 18, 2010 at 8:05 AM, Carlo Traverso
<traverso at posso.dm.unipi.it> wrote:

> >>>>> "Marcello" == Marcello Perathoner <marcello at perathoner.de> writes:
>
>    Marcello> Karen Lofstrom wrote:
>
>    >> But no, you have to join the grouch group here at PG and
>    >> repeatedly attack the organization that is providing the
>    >> overwhelming majority of the texts submitted to PG.
>
>    Marcello> Quantity, yes ... Let's talk *quality* instead.
>
>    Marcello> The problem is not that some PPers are incompetent, the
>    Marcello> problem is that the whole DP output is technically
>    Marcello> obsolete:
>
>    Marcello> DP is producing 'HTML Facsimiles for the Desktop' while
>    Marcello> it should be producing eBooks.
>
>    Marcello> Which do you think is more useful? A book you can only
>    Marcello> read at home on your desktop or a book you can read
>    Marcello> everywhere on your phone?
>
> Is PG ready to accept Epub as submission format? (i.e. one submits a
> valid epub from which the other formats are derived)? If so, one can
> target Epub, otherwise at best one is forced to submit HTML or txt
> that converts not-too-badly with current PG tools, and this might be
> extremely challenging.
>
> Carlo

From ajhaines at shaw.ca  Sun Apr 18 09:29:58 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Sun, 18 Apr 2010 09:29:58 -0700
Subject: [gutvol-d] Reporting errors in PG files (was  Dim view of P3ers)
Message-ID: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>

Jim Adcock wrote:

>just as in PG-land the lack of standards is causing
>texts to be distributed to users frequently missing or duplicating letters
>and words and in some cases whole paragraphs.

Errors in PG's files should be reported to the Errata system: 
errata2010_AT_pglaf.org

Error reports should be as specific as possible.  Mention the etext number, 
the line number(s), the line(s) of text in question, and the proposed 
correction(s) to each.  If there are many errors, feel free to download and 
correct the existing files, and send them to the above address.  (Don't 
re-wrap; don't touch the PG header or footer.)
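Al's checklist (etext number, line number(s), the text in question, the proposed correction) fits naturally into a small structured record. The sketch below is hypothetical. `ErrataReport`, `Correction`, and the plain-text layout they produce are illustrative, and the errata address does not require this format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Correction:
    line_number: int   # line number in the posted file
    original: str      # the line of text as it currently reads
    proposed: str      # the corrected line


@dataclass
class ErrataReport:
    etext_number: int
    corrections: List[Correction] = field(default_factory=list)

    def add(self, line_number, original, proposed):
        self.corrections.append(Correction(line_number, original, proposed))

    def to_text(self):
        """Plain-text report, one block per correction, in file order."""
        out = [f"Etext #{self.etext_number}", ""]
        for c in sorted(self.corrections, key=lambda item: item.line_number):
            out.append(f"Line {c.line_number}:")
            out.append(f"  was:       {c.original}")
            out.append(f"  should be: {c.proposed}")
            out.append("")
        return "\n".join(out).rstrip() + "\n"
```

Sorting by line number lets whoever applies the fixes work through the file from top to bottom, in whatever order the errors were found.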

If you feel that a text can be fixed only by a complete re-do (maybe it's 
missing the illustrations, the index, or whatever), feel free to download a 
scanset, get a copyright clearance, and have at it.  When the new fileset is 
submitted through the normal process, mention the text number that it's an 
update/correction/replacement for.  The original producer's credit will be 
added to yours, the original etext will be archived, and the new version 
posted (under the original etext number).

Simply complaining about errors isn't useful, nor are general complaints, 
especially concerning older texts, such as "italics aren't shown" or 
"all-caps are used for italics, not underscores".

Al


From marcello at perathoner.de  Sun Apr 18 09:35:36 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Sun, 18 Apr 2010 18:35:36 +0200
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
Message-ID: <4BCB34D8.8090908@perathoner.de>

Carlo Traverso wrote:
>>>>>> "Marcello" == Marcello Perathoner <marcello at perathoner.de> writes:
> 
>     Marcello> Karen Lofstrom wrote:
> 
>     >> But no, you have to join the grouch group here at PG and
>     >> repeatedly attack the organization that is providing the
>     >> overwhelming majority of the texts submitted to PG.
> 
>     Marcello> Quantity, yes ... Let's talk *quality* instead.
> 
>     Marcello> The problem is not that some PPers are incompetent, the
>     Marcello> problem is that the whole DP output is technically
>     Marcello> obsolete:
> 
>     Marcello> DP is producing 'HTML Facsimiles for the Desktop' while
>     Marcello> it should be producing eBooks.
> 
>     Marcello> Which do you think is more useful? A book you can only
>     Marcello> read at home on your desktop or a book you can read
>     Marcello> everywhere on your phone?
> 
> Is PG ready to accept Epub as submission format? (i.e. one submits a
> valid epub from which the other formats are derived)? If so, one can
> target Epub, otherwise at best one is forced to submit HTML or txt
> that converts not-too-badly with current PG tools, and this might be
> extremely challenging. 

That is not the problem. You can botch ePub as easily as you can HTML. 
(In fact ePub is only HTML + some metadata)
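Marcello's parenthetical can be made concrete. The following minimal Python sketch follows the EPUB 2 / OCF container layout. It writes a `mimetype` entry first and uncompressed, then `META-INF/container.xml` pointing to an OPF package file, then the XHTML itself. It deliberately omits the NCX table of contents that EPUB 2 requires, so a validator such as epubcheck would reject the result. Treat it as an illustration of the packaging, not a working converter. The names `content.opf` and `text.html` are arbitrary choices.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def content_opf(title, identifier, language):
    """OPF package file: the 'some metadata' wrapped around the HTML."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
    <dc:language>{escape(language)}</dc:language>
  </metadata>
  <manifest>
    <item id="text" href="text.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""


def write_epub(path, title, identifier, xhtml, language="en"):
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", content_opf(title, identifier, language),
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/text.html", xhtml, compress_type=zipfile.ZIP_DEFLATED)
```

Everything the reader actually sees is still the XHTML in `text.html`. So if that HTML is badly structured, the ePub will be badly structured too.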

You should produce HTML that is *semantically* correct and degrades 
gracefully. Ie. if you remove all CSS it should still make sense.


Most prominent offenders are non-semantic headers, preformatted text, 
positioning, floating and ornaments.
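Part of the "remove all CSS" test can be checked mechanically. Below is a minimal Python sketch using only the standard-library html.parser: it flags preformatted text and inline styles that use positioning or floats. The class and function names are hypothetical, this is not PG's or DP's tooling, and it does not try to detect non-semantic headers or ornaments, which still need a human eye.

```python
from html.parser import HTMLParser
import re

# Inline CSS properties that tie the layout to one particular screen.
LAYOUT_CSS = re.compile(r"\b(position|float)\s*:", re.IGNORECASE)


class SemanticLint(HTMLParser):
    """Collect (line, message) warnings for markup that may not degrade
    gracefully once CSS is removed. Heuristic only."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag == "pre":
            self.warnings.append((line, "preformatted text"))
        # attrs is a list of (name, value); value may be None.
        style = dict(attrs).get("style") or ""
        if LAYOUT_CSS.search(style):
            self.warnings.append((line, "inline positioning/float: " + style))


def lint(html_text):
    """Return the warnings found in html_text, in document order."""
    parser = SemanticLint()
    parser.feed(html_text)
    parser.close()
    return parser.warnings


if __name__ == "__main__":
    sample = (
        "<h2>CHAPTER I</h2>\n"
        "<pre>  A poem laid out with spaces</pre>\n"
        '<img src="orn.png" style="float: left">\n'
    )
    for line, msg in lint(sample):
        print(line, msg)
```

A real <h2> passes silently; the <pre> block and the floated ornament are reported with their line numbers.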


-- 
Marcello Perathoner
webmaster at gutenberg.org

From gbnewby at pglaf.org  Sun Apr 18 10:05:36 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 18 Apr 2010 10:05:36 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
Message-ID: <20100418170536.GA22578@pglaf.org>

On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso wrote:
> Is PG ready to accept Epub as submission format? (i.e. one submits a
> valid epub from which the other formats are derived)? If so, one can
> target Epub, otherwise at best one is forced to submit HTML or txt
> that converts not-too-badly with current PG tools, and this might be
> extremely challenging. 
> 
> Carlo

I don't think we're ready for this except in rare cases where
ePub is the best format for display for a particular item
(we just released a book where PDF was the best format, believe
it or not).

The challenge is that when books are fixed, someone (typically
the whitewasher, seldom the original submitter) needs to
regenerate all the files from that book.  

Since there is not yet any standard processing stream to 
generate static ePub files, this makes it hard for fixes (to HTML & text)
to be applied to ePubs.  

I would, of course, love to see something become our "standard"
conversion tool, usable by anyone.  Right now, the closest for PG is
Marcello's software to build the cached ePub files.  It's wonderful and
functional, but is it ready for all envisioned purposes?  I think not,
due at least in part to shortcomings of the input HTML.


ALL that said, maybe I am too hung up on automated or semi-automated
methods.  It *is* the case that an ePub can yield plain HTML, which
could be edited and zipped up into a new ePub (without too much trouble).

  Is there enough benefit in such ePubs?  Are there good examples
of hand-crafted ePubs (or automated ones, built with different software
than is used on the gutenberg.org server) that are far superior to
the alternatives?


Having a single master format, from which all subsidiary formats can be
derived, has been a long-time goal.  This has not yet been viable for
most titles, despite valiant (and productive) efforts with HTML and TeX.
From everything I've seen about ePub, adding static ePub files to
the collection would be a net increase in the effort needed to apply
fixes (i.e., it would be one MORE format to deal with by hand, not
a generated format that takes very little extra work to produce).


There are lots of people involved in creating, managing and fixing eBook
files, and there is certainly room for any experiments that people can
think of.  My response isn't intended to quell such effort, rather to
state that given the current state of things, I don't think ePub is a
great candidate for a new static file format for the PG collection.

  -- Greg

From dakretz at gmail.com  Sun Apr 18 10:07:35 2010
From: dakretz at gmail.com (don kretz)
Date: Sun, 18 Apr 2010 10:07:35 -0700
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
In-Reply-To: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
Message-ID: <u2l627d59b81004181007nfed6c66dhfdb44feb20f5dfe8@mail.gmail.com>

It seems to me that error identification, reporting, verification and repair
would be a lot easier if PG provided easily-accessible on-line access to the
page images, and a form to provide the required information, at least for
point cases.

Then the reporting person could just find the page, check the image you're
going to use for verification, and narrow things down for processing.


On Sun, Apr 18, 2010 at 9:29 AM, Al Haines (shaw) <ajhaines at shaw.ca> wrote:

> Jim Adcock wrote:
>
>  just as in PG-land the lack of standards are causing
>> texts to be distributed to users frequently missing or duplicating letters
>> and words and in some cases whole paragraphs.
>>
>
> Errors in PG's files should be reported to the Errata system:
> errata2010_AT_pglaf.org
>
> Error reports should be as specific as possible.  Mention the etext number,
> the line number(s), the line(s) of text in question, and the proposed
> correction(s) to each.  If there are many errors, feel free to download and
> correct the existing files, and send them to the above address.  (Don't
> re-wrap; don't touch the PG header or footer.)
>
> If you feel that a text can be fixed only by a complete re-do (maybe it's
> missing the illustrations, the index, or whatever), feel free to download a
> scanset, get a copyright clearance, and have at it.  When the new fileset is
> submitted through the normal process, mention the text number that it's an
> update/correction/replacement for.  The original producer's credit will be
> added to yours, the original etext will be archived, and the new version
> posted (under the original etext number).
>
> Simply complaining about errors isn't useful, nor are general complaints,
> especially concerning older texts, such as "italics aren't shown" or
> "all-caps are used for italics, not underscores".
>
> Al
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From gbnewby at pglaf.org  Sun Apr 18 10:29:18 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 18 Apr 2010 10:29:18 -0700
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
In-Reply-To: <u2l627d59b81004181007nfed6c66dhfdb44feb20f5dfe8@mail.gmail.com>
References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
	<u2l627d59b81004181007nfed6c66dhfdb44feb20f5dfe8@mail.gmail.com>
Message-ID: <20100418172918.GA24296@pglaf.org>

On Sun, Apr 18, 2010 at 10:07:35AM -0700, don kretz wrote:
> It seems to me that error identification, reporting, verification and repair
> would be a lot easier if PG provided easily-accessible on-line access to the
> page images,

We post 'em when we get 'em.  There is guidance for the file naming
convention on images.

Mostly we do not get page images.  In the case of DP, a few people
have provided page images after the eBooks were posted.  But this does
not seem to be a part of the regular DP processing chain.

> and a form to provide the required information, at least for point cases.

A form...  maybe.  I am not sure this would make things any easier to
fix (for the fixers -- there are only three people who regularly apply
fixes -- Al is one of them, so his views carry more weight than mine!).
But it might make it easier for people to report errata.

> Then the reporting person could just find the page, check the image you're
> going to use for verification, and narrow things down for processing.

Sure.  Only some errors require checking page images, but it would
be nice to have them.  It would be nice to have them for numerous
purposes to which our readers might put them.
  -- Greg



From ajhaines at shaw.ca  Sun Apr 18 10:58:06 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Sun, 18 Apr 2010 10:58:06 -0700
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
	<u2l627d59b81004181007nfed6c66dhfdb44feb20f5dfe8@mail.gmail.com>
Message-ID: <64E735529F244281A37EF3F097BFCBA5@alp2400>

The only page scans PG has are those that may have been submitted by the preparer.  (Joshua Hutchinson has submitted many scansets of DP productions.)  Some DP submitters provide page scans linked to page numbers in the HTML version, but this is rare.  (I don't think I've ever seen a scanset from an independent producer.)

The Whitewashers, a.k.a. the Errata Team, simply aren't equipped to find, download, and process pagescans for the submissions they handle.  

Any questions/policy concerning making pagescans mandatory, e.g. the cost/amount of the increased drive space needed, I leave to Greg/Michael.


An errata submission webform would be useful.  (Some emailed errata reports are sadly lacking in detail.)  Maybe sometime when Greg has a student intern?



From ajhaines at shaw.ca  Sun Apr 18 11:12:36 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Sun, 18 Apr 2010 11:12:36 -0700
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
	<u2l627d59b81004181007nfed6c66dhfdb44feb20f5dfe8@mail.gmail.com>
	<20100418172918.GA24296@pglaf.org>
Message-ID: <5EDA62DFBF564575AAFA1CD75287E15D@alp2400>

Greg said:

> A form...  maybe.  I am not sure this would make things any easier to
> fix (for the fixers -- there are only three people who regularly apply
> fixes -- Al is one of them, so his views carry more weight than mine!).
> But it might make it easier for people to report errata.

A webform would (hopefully) make reporting more consistent, possibly with 
such mandatory fields as etext number, title, and author.  (Yes, the 
occasional report arrives with none of them, only a pre-10K filename, which 
has to be tracked down in the gutindex files to find the etext number.)

However, the current volume of errata reports (several a week, if that)
probably doesn't make the work of creating such a form worthwhile.  And,
agreed--it wouldn't help the actual correction process.
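A minimal Python sketch of the checks such a webform might run before accepting a report. The field names and rules are assumptions drawn from the errata instructions (etext number, line numbers, the text in question, the proposed correction) plus the title and author fields suggested above; nothing here is an existing PG form or API.

```python
from dataclasses import dataclass, field


@dataclass
class ErrataReport:
    """Hypothetical shape of one webform submission."""
    etext_number: str
    title: str
    author: str
    # Each entry: (line_number, original_text, proposed_text)
    corrections: list = field(default_factory=list)


def missing_fields(report):
    """Return the names of mandatory fields that are blank or malformed."""
    problems = []
    # Etext numbers are plain integers; a filename here is rejected.
    if not report.etext_number.strip().isdigit():
        problems.append("etext_number")
    for name in ("title", "author"):
        if not getattr(report, name).strip():
            problems.append(name)
    if not report.corrections:
        problems.append("corrections")
    for line_no, original, proposed in report.corrections:
        if line_no < 1 or not original or original == proposed:
            problems.append(f"correction at line {line_no}")
    return problems
```

A form built on this would bounce a report whose "etext number" is really a filename (say, a hypothetical "book10.txt") before it ever reaches the errata team, instead of someone having to look it up in the gutindex files.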




From Bowerbird at aol.com  Sun Apr 18 12:26:53 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 18 Apr 2010 15:26:53 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <10bbb.720a5730.38fcb6fd@aol.com>

boy oh boy, it must really be _painful_ to marcello
that he has to now be saying the very same things
that -- when i said them here for years and years --
he constantly disagreed with, and called me names.

assholes really hate to admit that they were wrong...

you know he had to resist it for a long time, but still,
eventually, one just cannot dispute the truth, can one?

***

greg said:
>   Having a single master format, from which 
>    all subsidiary formats can be derived, has been 
>    a long-time goal.  This has not yet been viable 
>    for most titles, despite valiant (and productive)
>    efforts with HTML and TeX.

unfortunately, if you've been _ignoring_ what has
happened here over the years -- as dr. newby has,
evidently -- then you might not realize that i have
already proven z.m.l. can be that "master format"...

that's right, greg!   while you've supposedly been
"looking for" such a format, i've been right here,
shoving it up under your nose.   do you see it now?

-bowerbird

From Bowerbird at aol.com  Sun Apr 18 12:34:36 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 18 Apr 2010 15:34:36 EDT
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
Message-ID: <110e5.659b7916.38fcb8cc@aol.com>

greg said:
>   There is guidance for the file naming convention on images.

evidently, as part of his general ignorance of what goes on here,
dr. newby missed my recent devastating critique of this "convention".

-bowerbird

From Bowerbird at aol.com  Sun Apr 18 12:46:55 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 18 Apr 2010 15:46:55 EDT
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
Message-ID: <11976.78437e8e.38fcbbaf@aol.com>

al said:
>    An errata submission webform would be useful.
>    (Some emailed errata reports are sadly lacking in detail.)
>    Maybe sometime when Greg has a student intern?

my gawd.   what a bunch of idiots we have in charge here.

who needs a "student intern" to create a darn web-form?

i devised a whole error-reporting system for you to use.

it's up on my site right now.

***

as for the page-images, we have equivalent stupidity...

the page-images for the books that d.p. processes are
sitting at d.p. for the entire time that the book is there.
p.g. could scrape them with the greatest of ease, _if_
it truly wanted them.   why should anyone be forced to
"submit" them?   do we _enjoy_ wasting people's time?

look...   if y'all are too busy to do things correctly, fine.
but at least empower somebody else to do the job, ok?
because otherwise your stupidity starts to look _willful_.

-bowerbird

From Morasch at aol.com  Sun Apr 18 13:06:47 2010
From: Morasch at aol.com (Morasch at aol.com)
Date: Sun, 18 Apr 2010 16:06:47 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <40a9b.162dd4d7.38fcc057@aol.com>

michael mcd said:
>   the PDF looks quite good. Offhand, though, 
>    the page numbers look like they drop low enough 
>    that they _could_ be out of the printable area.

you're right.   so i fixed that.   download it again...
>    http://z-m-l.com/misc/14465-take5.pdf


>    shorter works are poor representatives of the problem

still, you don't need 350 pages to cover the waterfront,
especially when 315 of them have nothing happening...

***

jim said:
>    First time I tried downloading this it didn't work. Tried 
>    it again later from a different computer and it worked.

don't know what to tell you, jim.


>    Tried printing out the first 10 pages.
>    My printer reported that the document requested C5 page size,
>    but the C series is an envelope size?
>    I would have expected A4 or US "Letter" size.

the pagesize is 5.5*8.5; that's what michael wanted.

eventually, how you will print it out will depend on
what you intend to do with it in terms of _binding_.

for this preview, you have 2 convenient options...

you can print it out 2-up, on letter-size, using the
"layout" method you should find in the print dialog.
for enhanced realism, slice pages down the middle.

or you can print it out on 5.5*8.5, which is available
at most any office-supplies stores, in my experience.

we'll discuss printing and binding more, at a later time.


>    First Page title appears to print off center to the left.

looks pretty dead-on centered to me, at least in the .pdf.
did you print to 5.5*8.5 paper?   if so, it should be right...


>    Contents in an unusually small font

correct...   my preference is for the contents section to be
shown on 1 page, 2 pages max, so i had to cramp the font.

i woulda reworked it manually if i wanted to take the time.

reworking entails moving the chapters to the first page of
each _book_ section, so the contents section up front just
contains the entries relating to the _parts_ and the _books._

it's issues like these that get into the nitty-gritty questions
on how automated you want the whole process to become.

the easiest solution would be to run the table of contents
over to 3 pages, or 4, or whatever it happens to be...   but
that approach doesn't produce a lot of satisfaction for me.

so the question for me is, "how hard is it to automate what
i would _really_ like to do in various situations like these?"

since i was doing this by hand, to get on the same page
with michael, i was willing to do a little manual massage.


>    Page numbers in an unusually large font

the text-editor i used to create this .pdf does it that way,
and as far as i can see, there's no way that i can control it.

but that's not the way my program does it.

so it's not something we really need to worry about...


>    Ragged Right is an unusual convention for a PDF document

michael expressed no preference; ragged-right was easier.


>    Body font seems to be unusually small.

i think so too.   it's 10-point times new roman, i believe.
(yep, that's it, checked.)   but michael has the young eyes.

what i was doing, in case it wasn't totally clear to people,
was retaining the existing linebreaks from the p.g. e-text.

in order to get (reasonable) half-inch margins on each side,
i had to reduce point-size down to what _i_ feel is too small.
...but i wasn't doing this for me; i was doing it for michael...


>    Line length of approx 70 chars seems unusually long for a
>    book-like format.  Most books use about 50 chars per line
>    of text because doing so makes the book more readable.

so you didn't notice that i was using the existing p.g. linebreaks.

there were several tip-offs.   first and foremost, the first line of
each paragraph is too long, because of the indent i introduced.

second, the lines are more ragged than they should be, because
line-length decisions are made on a monospace character count,
not on a metric based on the line's proportionally-spaced width.
(_any_ proportionally-spaced metric is better than monospacing,
since the proportions are highly correlated across various fonts.)

and thirdly, like i said, and you said, the line-lengths are too long.
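The difference between the two strategies can be sketched in Python: a greedy line breaker that measures each candidate line with a proportional width estimate instead of a character count. The per-character widths below are rough illustrative guesses, not metrics taken from Times New Roman or any real font.

```python
# Rough advance widths in ems for a proportional serif face.
# These values are illustrative guesses, not real font metrics.
NARROW = set("iljtf.,;:'!|()[] ")
WIDE = set("mwMW")


def text_width(s):
    """Estimate the width of s in ems using the toy table above."""
    total = 0.0
    for ch in s:
        if ch in NARROW:
            total += 0.28
        elif ch in WIDE:
            total += 0.8
        elif ch.isupper():
            total += 0.65
        else:
            total += 0.5
    return total


def wrap_proportional(text, max_em):
    """Greedy line breaking: keep adding words while the estimated width
    of the line stays within max_em. Words are never split, so a single
    word wider than max_em ends up on a line of its own."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and text_width(candidate) > max_em:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

For the preview discussed here, a 5.5-inch page with half-inch margins leaves a 4.5-inch (324 pt) text block, which at 10 pt is 32.4 em, so `wrap_proportional(paragraph, 32.4)` would be the matching call. Real typesetting software uses actual font metrics and often a total-fit method such as TeX's Knuth-Plass algorithm rather than greedy filling.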

-bowerbird

From gbnewby at pglaf.org  Sun Apr 18 13:18:16 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Sun, 18 Apr 2010 13:18:16 -0700
Subject: [gutvol-d] Seeking current/past bookshelf maintainers
Message-ID: <20100418201816.GB30665@pglaf.org>

We used to have a bookshelf@ alias, but I've lost track
of who this is supposed to go to.  If it's you, or
you know who is (was?) maintaining the bookshelf
area of the gutenberg.org wikispace, please drop me a note.

This page is all we have for contact info, but it's currently a dead
end:

  http://www.gutenberg.org/wiki/Gutenberg:Bookshelf_Contributions

Thanks in advance.
  -- Greg (a.k.a. one of the idiots in charge)


From ricardofdiogo at gmail.com  Sun Apr 18 14:44:52 2010
From: ricardofdiogo at gmail.com (Ricardo F Diogo)
Date: Sun, 18 Apr 2010 22:44:52 +0100
Subject: [gutvol-d] Re: Seeking current/past bookshelf maintainers
In-Reply-To: <20100418201816.GB30665@pglaf.org>
References: <20100418201816.GB30665@pglaf.org>
Message-ID: <j2j9c6138c51004181444o46fabeccja29d81163c3b5918@mail.gmail.com>

It was Robert Marquardt, who died in Dec 2007.

From hart at pglaf.org  Sun Apr 18 15:17:07 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Sun, 18 Apr 2010 15:17:07 -0700 (PDT)
Subject: [gutvol-d] Re: Dim view of P3ers
In-Reply-To: <v2y6d99d1fd1004180724n87b1424cgb242f5ff2443d3bd@mail.gmail.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<SNT120-DS137470B6E73D91DDD1984AE0C0@phx.gbl>
	<v2y6d99d1fd1004180724n87b1424cgb242f5ff2443d3bd@mail.gmail.com>
Message-ID: <alpine.DEB.2.00.1004181508550.1241@mail.pglaf.org>


David Starner, if you only would be willing to take your own advice.

So much of what you say here, and I've said it before, is complaint,
without you providing any hope of solution.  As I have said before--
there is a word for this, but it is not used in polite conversation.

If only you took EITHER your own advice OR your signature block:

"Kie ekzistas vivo, ekzistas espero."

at all seriously, then we would be glad to hear from you; however, it
turns out that all too much of what you say goes to /dev/null or the
various other killfiles people use to filter you out.

Now. . .please. . .give some hope. . .or you will most certainly see
the result of using vinegar rather than honey to get what you want--
presuming you really do want things to get/work better.

Please. . .take a lesson from your own words. . . .

You once said something like:

As an honest person I am willing to learn from my mistakes. . . .

Please do. . . .


On Sun, 18 Apr 2010, David Starner wrote:

> On Sun, Apr 18, 2010 at 2:27 AM, Jim Adcock <jimad at msn.com> wrote:
> > In the
> > business world this would be called "continuous improvement."
>
> Jim, in the business world, your complaint about the fact the business
> wasn't working on your preferred projects would annoy the hell out of
> your coworkers the eighth time they heard it, just like here.
>
> --
> Kie ekzistas vivo, ekzistas espero.
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From Bowerbird at aol.com  Sun Apr 18 15:54:58 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 18 Apr 2010 18:54:58 EDT
Subject: [gutvol-d] Re: the idiots in charge
Message-ID: <41f08.7e064389.38fce7c2@aol.com>

greg said:
>    -- Greg (a.k.a. one of the idiots in charge)

well, good, at least you have a sense of humor about it.       :+)

look, i apologize if i have criticized anyone unduly, 
as my intentions are not to hurt anyone's feelings...

we've discussed some of these topics over and over,
and even though some solutions are rather _obvious_
and we seem to have people willing to implement 'em,
nonetheless _nothing_ ever seems to get done.   _ever._

and when the topics come up again, and again and again,
the people in charge act as if dialog has never been held.

so the only conclusion that seems plausible out here in
volunteer-land is that nobody at the top is _listening_...

-bowerbird

From traverso at posso.dm.unipi.it  Sun Apr 18 19:18:56 2010
From: traverso at posso.dm.unipi.it (Carlo Traverso)
Date: Mon, 19 Apr 2010 04:18:56 +0200 (CEST)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <20100418170536.GA22578@pglaf.org> (message from Greg Newby on
	Sun, 18 Apr 2010 10:05:36 -0700)
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
Message-ID: <20100419021856.6C702100B0@cardano.dm.unipi.it>

>>>>> "Greg" == Greg Newby <gbnewby at pglaf.org> writes:

    Greg> On Sun, Apr 18, 2010 at 05:05:09PM +0200, Carlo Traverso
    Greg> wrote:
    >> Is PG ready to accept Epub as submission format? (i.e. one
    >> submits a valid epub from which the other formats are derived)?
    >> If so, one can target Epub, otherwise at best one is forced to
    >> submit HTML or txt that converts not-too-badly with current PG
    >> tools, and this might be extremely challenging.
    >> 
    >> Carlo

    Greg> I don't think we're ready for this except in rare cases
    Greg> where ePub is the best format for display for a particular
    Greg> item (we just released a book where PDF was the best format,
    Greg> believe it or not).

    Greg> The challenge is that when books are fixed, someone
    Greg> (typically the whitewasher, seldom the original submitter)
    Greg> needs to regenerate all the files from that book.

    Greg> Since there is not yet any standard processing stream to
    Greg> generate static ePub files, this makes it hard for fixes (to
    Greg> HTML & text) to be applied to ePubs.

    Greg> I would, of course, love to see something become our
    Greg> "standard" conversion tool, usable by anyone.  Right now,
    Greg> the closest for PG is Marcello's software to build the
    Greg> cached ePub files.  It's wonderful and functional, but is it
    Greg> ready for all envisioned purposes?  I think not, due at
    Greg> least in part to shortcomings of the input HTML.

That's the whole point of my proposal. Starting with hand-crafted HTML,
we are likely to end up with poor ePub, since the inferred metadata
might be wrong, and many HTML features need to be tuned for ePub and
might not turn out correctly, while obtaining reasonable HTML from ePub
is just a matter of unzipping and discarding the metadata. It may be
harder to have "nicely handcrafted" HTML, but we have to deliver the
best available product in the standard format that most users are
likely to use (and, of course, a reasonable product in every other
format).

To maintain an ePub (e.g. to correct typos), one has to unzip it,
correct the HTML, and re-zip it.
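A minimal sketch of the unzip/correct/re-zip cycle, in Python using only the standard library. The helper names (`unzip_epub`, `fix_typo`, `rezip_epub`) are hypothetical, and `fix_typo` is a naive string replacement that does not distinguish text from markup. One detail matters when re-zipping: the ePub container format (OCF) requires the `mimetype` file to be the first entry in the archive and stored uncompressed, so a plain recursive zip of the directory may produce a file that some readers reject.

```python
import zipfile
from pathlib import Path


def unzip_epub(epub_path: str, work_dir: str) -> None:
    """Extract every file of the ePub into work_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(work_dir)


def fix_typo(work_dir: str, wrong: str, right: str) -> int:
    """Naive string replacement in every (X)HTML file; returns files changed.

    Hypothetical helper: it does not distinguish text from markup.
    """
    changed = 0
    for path in Path(work_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".xhtml", ".html", ".htm"}:
            text = path.read_text(encoding="utf-8")
            if wrong in text:
                path.write_text(text.replace(wrong, right), encoding="utf-8")
                changed += 1
    return changed


def rezip_epub(work_dir: str, epub_path: str) -> None:
    """Pack work_dir back into an ePub archive."""
    root = Path(work_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # OCF requires "mimetype" as the first entry, stored uncompressed.
        zf.write(root / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else is deflated, in a stable (sorted) order.
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Typical use would be `unzip_epub("book.epub", "work")`, then one or more `fix_typo(...)` calls (or hand edits), then `rezip_epub("work", "book-fixed.epub")`.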

Another issue is to automate the creation of txt from HTML. Currently,
the output of w3m -dump (or links -dump, or lynx -dump etc.) is pretty
good for txt, except that font changes (mainly, underscores for
italics) are lost.

It shouldn't be difficult to pre-process the HTML to mark italics with
underscores, in such a way that one obtains a reasonable PG txt file.
This might work better on HTML generated from ePub (where the HTML is
more constrained) than on handcrafted HTML.
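One way the italics pre-processing could look, assuming italics are marked with `<i>` or `<em>` (the pattern and the function name `mark_italics` are illustrative, not an existing PG tool). The regular expression rewrites each italic span as `_text_`; the result could then be dumped with `w3m -dump file.html > file.txt` as described above.

```python
import re

# <i>...</i> or <em>...</em>, with optional attributes; the backreference \1
# makes the closing tag match the opening one. Nested italics are not handled.
ITALIC_RE = re.compile(r"<(i|em)\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)


def mark_italics(html: str) -> str:
    """Replace italic markup with PG-style underscores before a text dump."""
    return ITALIC_RE.sub(lambda m: "_" + m.group(2) + "_", html)
```

The `\b` after the tag name keeps tags such as `<img>` from being mistaken for `<i>`. For real-world HTML, with nested or overlapping tags, an HTML parser would be more robust than a regex.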

It might be a bit more challenging to downgrade from UTF-8 (as
generated by -dump) to ISO-8859-1 or to ASCII, for example to handle
the Unicode characters that are used to draw tables, but this could
very well be automated too.
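A sketch of the downgrade step, under the assumption that a small hand-made fallback table is acceptable. Characters the target encoding can represent pass through unchanged, box-drawing and typographic characters map to ASCII look-alikes, accented letters lose their accents (via Unicode NFKD decomposition), and anything left becomes `?`. The table `FALLBACKS` and the function `downgrade` are hypothetical and cover only a few characters.

```python
import unicodedata

# Hypothetical fallback table: box-drawing characters (used for table
# borders in UTF-8 text dumps) plus a few common typographic characters.
FALLBACKS = {
    "\u2500": "-", "\u2501": "-", "\u2502": "|", "\u2503": "|",
    "\u250c": "+", "\u2510": "+", "\u2514": "+", "\u2518": "+",
    "\u251c": "+", "\u2524": "+", "\u252c": "+", "\u2534": "+", "\u253c": "+",
    "\u2013": "-", "\u2014": "--", "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"', "\u2026": "...",
}


def downgrade(text: str, target: str = "latin-1") -> bytes:
    """Encode text in `target`, approximating characters it cannot represent."""
    out = []
    for ch in text:
        try:
            ch.encode(target)
            out.append(ch)  # representable as-is
            continue
        except UnicodeEncodeError:
            pass
        if ch in FALLBACKS:
            out.append(FALLBACKS[ch])
            continue
        # Strip accents: decompose (NFKD) and drop the combining marks.
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        try:
            base.encode(target)
            out.append(base or "?")
        except UnicodeEncodeError:
            out.append("?")  # nothing sensible left: mark it
    return "".join(out).encode(target)
```

With `target="latin-1"`, accented Latin letters such as "é" survive unchanged. With `target="ascii"`, they are reduced to their base letters.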


On my side, this is an offer to work toward the production of a
toolchain along these lines, if it is not dismissed a priori.

Carlo

From Bowerbird at aol.com  Sun Apr 18 19:40:33 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 18 Apr 2010 22:40:33 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <1fdf6.4812f09b.38fd1ca1@aol.com>

carlo said:
>   Another issue is to automate the creation of txt from HTML.

why do it backwards?

when it's done correctly, the .txt file can create the .html...
and an xhtml file, if that's what you want.   and your .epub.
plus it can generate whatever kind of .pdf you might want...

don't you realize how stupid you sound when you say this?

-bowerbird

From marcello at perathoner.de  Mon Apr 19 02:15:00 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 19 Apr 2010 11:15:00 +0200
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <20100419021856.6C702100B0@cardano.dm.unipi.it>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
Message-ID: <4BCC1F14.1090801@perathoner.de>

Carlo Traverso wrote:
> That's the whole point of my proposal. Starting with hand-crafted HTML
> we are likely to end with poor ePub, since the inference of metadata
> might be wrong, and many features of HTML need to be tuned to ePub and
> might not turn out correct; 

And what about users who download the HTML to view on a mobile phone?
You must produce better HTML, not for the sake of ePub, but for the sake
of universal usability.


The metadata come directly from the PG database and are updated whenever 
the PG database changes. That makes our metadata far more consistent 
than your proposal would.


> While obtaining reasonable HTML from ePub
> is just unzipping and discarding metadata.

ePub HTML is often split into chapters, which may leave you with 50+ 
files after unzipping which you have to merge manually.
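Much of that merging could be scripted, because the reading order is recorded in the ePub itself: `META-INF/container.xml` points to the package (OPF) file, whose `<spine>` lists the content documents in order. The sketch below uses the hypothetical function name `merge_epub_html` and assumes UTF-8 content files and unescaped `href` values. It keeps only the `<body>` contents, so stylesheets, `<head>` metadata and links between chapter files (e.g. `ch2.xhtml#note1`) would still need fixing by hand.

```python
import posixpath
import re
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
# Everything between <body ...> and the last </body>.
BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)


def merge_epub_html(epub_path: str) -> str:
    """Concatenate the <body> contents of all spine documents, in reading order."""
    with zipfile.ZipFile(epub_path) as zf:
        # 1. container.xml tells us where the package (OPF) file lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF
        # 2. Map manifest ids to file paths.
        manifest = {item.get("id"): item.get("href")
                    for item in opf.findall("opf:manifest/opf:item", NS)}
        # 3. Walk the spine, which gives the reading order.
        bodies = []
        for itemref in opf.findall("opf:spine/opf:itemref", NS):
            href = manifest[itemref.get("idref")]
            html = zf.read(posixpath.join(base, href)).decode("utf-8")
            match = BODY_RE.search(html)
            bodies.append(match.group(1) if match else html)
    return "<html><body>\n" + "\n".join(bodies) + "\n</body></html>"
```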


> This is on my side an offer to work towards the production of a
> toolchain along these lines, if it is not discarded a priori.

Before that can happen, a major "paradigm shift" has to happen at DP.

At DP, the PPers enjoy pushing their pet preferences down the readers' 
throats: "What *I* See Is What You Get." And most PP time is spent 
weaving those personal preferences deep into the markup, making the 
markup pretty useless for anything but desktop devices with lots of 
screen, lots of cycles and lots of RAM.

What the PPers should do is to produce light semantic markup that lets 
the user choose the presentation and device: "Get It The Way You Want."

The PPers will have to relinquish their power of God -- or have it 
wrested from their hands -- and very strict guidelines will have to be 
put into place as to what markup is accepted.


-- 
Marcello Perathoner
webmaster at gutenberg.org

From jimad at msn.com  Mon Apr 19 06:45:02 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 06:45:02 -0700
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
Message-ID: <SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>

>Was there a typo in "friend has bought an iPod" where you meant "iPad"?,
or did I miss something else that indicated changes from iPod to iPad?

Yes, typo, sorry. iPad, iPod: sometimes (always?) Apple is too clever for
its own good.

>I didn't mention Stanza at all, so how can you be asking me "a serious
question:  Are you saying you are actually running Stanza on an iPad?"

You trashed me for playing mind games, whereas I was describing my
actual experience with an iPad. I bragged to my friend who had bought an
iPad about how you could get these wonderful PG books on the iPad for free,
and then proceeded to try to show him "all I know" about the subject. Of
course, nothing I tried to show him about reading these wonderful PG books
on the iPad actually worked in practice, except that we discovered that
Apple has ported at least a subset of PG books to the iBooks app, stripping
out the PG header and acknowledgements in the process, and "locked" the
books to the iBooks app, which I think is actually a pretty bad reader app
when you get right down to it: it doesn't even allow one to set the
margins. But it does contain fluff such as animated page turns, which is
cute for about the first five pages.

>Perhaps you can restate this and also enlighten us on the feature that
is missing, where it and how to use it on the other Stanza version[s].

As BB said, Lexcycle, which is now owned by Amazon, doesn't appear to be
releasing a version of Stanza for the iPad. Instead, when you download
Stanza for the iPad from the Apple Store, what you get is the iPod version
of Stanza, which runs inside the iPod simulator built into the iPad.
That simulator's display is simply "zoomed in" without even substituting
higher-resolution text, resulting in a very blurry read.

The controversy about the Apple "censorship" of Stanza can be found here:

http://www.google.com/search?q=Apple+Stanza+USB

For example, to quote: "Lexcycle's Stanza e-book reader for the iPhone and
iPod touch has been stripped of USB book sharing, at the request of
Apple...." Here "at the request of Apple" means "if you don't do what we
say, you can't distribute your app via the Apple Store, which is the only
way to distribute your app."

>> If you have found "good" ways to get PG directly to iPad how about
>> discussing them in detail, what you did to have success,
>
>I told you. . .I used the iBooks App that popped up at first turn on,
>and I also used the Wattpad App.

Sorry, but I don't know about the Wattpad app, and I'm pretty sure the
iBooks app doesn't allow one to directly load PG books from the PG site.
Or have you discovered something I didn't? Why do I care? I want to be
able to read what I want to read, and I want to be able to use the
internet and wifi to get what I want to read where I want to get it. I
don't want to send my $500 to Steve Jobs in order to *test* whether or not
he has locked down the iPad so much that I cannot read what I want to
read. The nook is worthless, for example -- too locked down. It has wifi,
which could be great, but B&N doesn't actually let you use what you have
paid for. The Kindle has a weak and slow Whispernet/AT&T connection, which
is troublesome here in the 'burbs; also, PG seems to be leaning towards
ePub instead of MOBI, which raises the question of the long-term viability
of MOBI -- but ePub in turn has problems of dueling distributors and
incompatible DRM schemes.... The iPad supposedly allows you to transfer
books via USB and iTunes, but if you have to plug in a USB cable, then the
nook and Kindle have the same capabilities, so why bother putting in wifi
in the first place if you don't let the purchaser use it?

>Is that because I asked if you didn't try iBooks and Wattpad?

How would I try these things without sending my $500 to Jobs for the
privilege of *testing* his offering? You can't download this stuff at the
Apple Store. What I really need to know is whether any PG person has
succeeded in directly transferring books of their choice from an internet
site of their choice using wifi.

>Personally, I don't care where anyone gets our books from, just as long as
we get them out to people.

I care because I would like to be able to use an iPad or whatever to read
books still in development, for example SR from DP or my own efforts. And
I don't want to wait an extra year or two for PG to make a new DVD
distribution to go out to Apple or whoever so that they can stick their
own DRM scheme on that PG effort, or reduce it all down to txt before
turning it back into HTML and from there into ePub or MOBI -- to choose a
few common examples.

>In a very real sense Apple, Amazon, et al, work for Project Gutenberg.

I would certainly disagree with this statement if they stick DRM on a PG
effort, or if they work to prevent redistribution of PG books among friends.
If they do these things then they are working AGAINST PG -- and using your
own books to do so.


From hart at pglaf.org  Mon Apr 19 06:57:26 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Mon, 19 Apr 2010 06:57:26 -0700 (PDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC1F14.1090801@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
Message-ID: <alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>


Worthy of a second look:


Marcello Perathoner said:  [re:  eBooks for cellphones, etc]


Before that can happen, a major "paradigm shift" has to happen at DP.

At DP, the PPers enjoy pushing their pet preferences down the readers' throats:
"What *I* See Is What You Get." And most PP time is spent weaving those
personal preferences deep into the markup, making the markup pretty
useless for anything but desktop devices with lots of screen, lots of cycles
and lots of RAM.

What the PPers should do is to produce light semantic markup that lets the
user choose the presentation and device: "Get It The Way You Want."

The PPers will have to relinquish their power of God -- or have it wrested
from their hands -- and very strict guidelines will have to be put into place
as to what markup is accepted.


From jimad at msn.com  Mon Apr 19 07:25:23 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 07:25:23 -0700
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <4BCAE103.2030105@perathoner.de>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<4BCAE103.2030105@perathoner.de>
Message-ID: <SNT120-DS21F85D2E75FF39083E2B7DAE0B0@phx.gbl>

>   m.gutenberg.org

Wow. Now that *is* a step in the right direction!  Hope we-all will be able
to talk about it soon.


From hart at pglaf.org  Mon Apr 19 07:50:25 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Mon, 19 Apr 2010 07:50:25 -0700 (PDT)
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>


Starting with your last comment first:

What DRM is put on PG files?

I thought the DRM was in the reader program, not the files.


In general:  you said you had hands-on experience with an iPad,
but couldn't find anything that looked good, and gave Stanza as an example.

Here is my suggestion:

The next time you get your hands on an iPad, or even ask friends
to try it for you, just do the little search they have and try a
few obvious things like "books" "ebooks" and similar things.

You'll get a handful of free or "lite" ereader programs; in
some cases they will be for the iPad, in others for the iPod, and you
can compare them yourself to your heart's content and give your
conclusions here, please.

My own conclusion was that all the iPad programs are readable:
black on white, or white on black.  If the "Accessibility" black
and white reversal doesn't work, that means the program has that
under control via its own commands.


You also complained about not getting directly from PG and DRM.

As I have said so many times, I don't care who redistributes PG,
from Tea Party people to Sarah Palin or Tina Fey. . .period.

If they put our books on their sites, or go the other way, great
difference from my POV, unless they censor out some books, but I
am not sure that is a valid reason even then to stop them.

There are right and left wing physical libraries. . .who cares?


More below:


On Mon, 19 Apr 2010, Jim Adcock wrote:

> >Was there a typo in "friend has bought an iPod" where you meant "iPad"?,
> or did I miss something else that indicated changes from iPod to iPad?
>
> Yes Typo Sorry iPad iPod sometimes (always?) Apple is too clever for its own
> good.

So are some of the people who post here. . . .


> >I didn't mention Stanza at all, so how can you be asking me "a serious
> question:  Are you saying you are actually running Stanza on an iPad?"

You still haven't made any point about Stanza, nor answered my question.

You say you are serious, and then you are all bent out of shape in both
your questions and your answers, and then you blame me and Apple. . . .

"The fault lies not in the stars; the fault lies in ourselves."


> You trashed me for playing mind games

No, you trashed yourself, if you insist on calling it that, by writing
something that was very incomplete, inconclusive and confusing. . . .

The solution is just to lighten up and try again, not to accuse the world
of "flaming" and "trashing" you.

Just make your point[s] as best you can, and move on.

An apology for when you have been confusing is also appropriate, with
no need to blame me or Apple.


> whereas I was actually describing my actual experience with an iPad.

Let's just say your "actual experience with an iPad" could have used a
little more explanation, perhaps a little more experience.

I just searched for "ebook" downloaded programs, and searched in those
for "Project Gutenberg."

I didn't expect to find a list of 30,000 titles on first try, any more
than I expect to get the books at pglaf.org or gutenberg.org or .cc on
the first try, even after lots of practice, certainly NOT first time.


> I bragged to my friend who had bought an iPad how you could get these
> wonderful PG books on the iPad for free and then proceeded to try to show
> him "all I know" about the subject -- and of course nothing I tried to show
> him about reading these wonderful PG books on the iPad actually worked in

It worked for me, but then I gave it a few tries.  However, the first two
both worked, as did all the others made for iPad, though I have not tried
each one in great detail, but enough to bring up books I know I typed in.

Keep trying. . . .

I really hate to say this, as you'll probably accuse me of flame/trashing
but it sounds as if you have spent more time complaining here than in the
actual testing of the product.

I'm sure you know that Apple wants to control how files get to iPads.

However, it certainly appears that at least several of the programs I was
testing have their own ways of getting our "Alice in Wonderland" example.


[Big Snip, will address later, if we get a few requests for it]


> >Is that because I asked if you didn't try iBooks and Wattpad?
>
> How would I try these things without sending my $500 to Jobs for the
> privilege of *testing* his offering?

My apologies, perhaps I have this all backwards, as I thought I had it
backwards when you swapped "iPad" for "iPod" or vice versa:

I thought you already had managed to "try these things without sending
my $500 to Jobs for the privilege of *testing* his offering." . . .

Did you, or did you not, make that trial run with an iPad?

If you did, then I made suggestions for how to get what you want.

Or at least what you SAY you want, but I'm not sure any longer.

If you did not take a test drive. . . .

Well, in either case I suggest more test driving, and searching for
"ebook" and "book" and the like and downloading their programs.


> You can't download this stuff at the Apple Store.

Then how did I manage to download them from the store?

I just tapped on "Apps" and did my little searches. . . .

Isn't that the way you're supposed to?

Am I really missing something here about your experience???

If so, I apologize, and am willing to start again, but I strongly
suggest a visit to where you can play with an iPad again and try,
hopefully successfully, some of the suggestions I already made.


> What I really need to know is if any PG person has succeeding in directly
> transferring books of their choice from an internet site of their choice
> using wifi.

You can do this with Goodreader.

There is a free Goodreader Lite for the iPad/iPhone that lets you
grab 5 books at a time. . .I tried it. . .it works.


> >Personally, I don't care where anyone gets our books from, just as long as
> >we get them out to people.
>
> I care because I would like to be able to use iPad or whatever to read books
> in development, say for example SR from DP or my own efforts.  And I don't
> want to wait an extra year or two for PG to make a new DVD distribution to
> go out to Apple or whoever so that they can stick their own DRM scheme on
> that PG effort or reduce it all down to txt before turning it back into HTML
> and from there into ePub or MOBI -- to choose a few common examples.

I agree that it's a pain to have to wait for books in progress.

However, Goodreader should let you download those if you find them.


> >In a very real sense Apple, Amazon, et al, work for Project Gutenberg.
>
> I would certainly disagree with this statement if they stick DRM on a PG
> effort, or if they work to prevent redistribution of PG books among friends.
> If they do these things then they are working AGAINST PG -- and using your
> own books to do so.

You insist on saying that someone who is filling this glass has left
it half empty just because it is not overflowing to the whole world.

Anything that gets more people to read more books is a positive, even
if it is a little too high on the hog for most, or is not into
file-sharing on your scale.

However, Wattpad, Goodreader, iBooks, and others do provide relief.


Times will change, people will jailbreak iPads and all. . . .


From jimad at msn.com  Mon Apr 19 07:53:06 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 07:53:06 -0700
Subject: [gutvol-d] Re: Dim view of P3ers
In-Reply-To: <v2y6d99d1fd1004180724n87b1424cgb242f5ff2443d3bd@mail.gmail.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<SNT120-DS137470B6E73D91DDD1984AE0C0@phx.gbl>
	<v2y6d99d1fd1004180724n87b1424cgb242f5ff2443d3bd@mail.gmail.com>
Message-ID: <SNT120-DS1383431122DB2EC3DF82CCAE0B0@phx.gbl>

>Jim, in the business world, your complaint about the fact the business
wasn't working on your preferred projects would annoy the hell out of
your coworkers the eighth time they heard it, just like here.

That is probably a true statement: when one talks about things being "broken" and open to possible improvement, the response is almost always scorn and derision.  Only when an organization falls into acute duress is it usually open to considering change -- if then. The US auto industry is perhaps a current, if weak, example.

Stating that I have a dim view of P3ers is probably overstating the case.  What I am sure I have a dim view of is:

Query-hyphen and especially the rote overuse of it by some P3ers.

The rote removal of whitespace on both sides of an m-dash, even when that is clearly not the author's intent.

Some P3ers who are clearly just SR'ing without looking at the page images.

Punting "bugs" down field under the assumption that *someone else* is going to fix them.

Not having a clear point in the process when the "proofing" phase is supposedly done.

Taking 3+ years to create a text, or not finishing a text that has had considerable volunteer time and effort invested in it.

Designing a process where *no one* is allowed to take responsibility for a text.

Distributing texts that contain fewer than one, or more than one, copy of some portion of an author's text.

Distributing "risen to the public domain" texts under DRM

Preventing friends and fellow citizens from sharing texts "risen to the public domain"

Otherwise claiming or enforcing restrictions on the sharing and redistribution of texts "risen to the public domain"

Creating texts that cannot be used as widely as possible on a great variety of differing reader machines including addressing issues of "accessibility"

Demoware

-- Sorry if any of these statements are controversial -- I don't think they should be!


From jimad at msn.com  Mon Apr 19 08:00:46 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 08:00:46 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <20100418150509.8A3501008D@cardano.dm.unipi.it>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
Message-ID: <SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>


>Is PG ready to accept Epub as submission format? (i.e. one submits a
valid epub from which the other formats are derived)? If so, one can
target Epub, otherwise at best one is forced to submit HTML or txt
that converts not-too-badly with current PG tools, and this might be
extremely challenging. 

It would be nice to have a portable version of the current tools, so that transcribers can see how their HTML is going to "officially" translate into ePub and MOBI prior to submission.  I tried porting the tools, but got bogged down by the amount of stuff which wouldn't port easily.


From jimad at msn.com  Mon Apr 19 08:21:59 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 08:21:59 -0700
Subject: [gutvol-d] Re: Reporting errors in PG files (was Dim view of P3ers)
In-Reply-To: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
References: <4A14932748A64E0F8B8CA199FCDF8B0E@alp2400>
Message-ID: <SNT120-DS1515A9C96C455362DE2E8CAE0B0@phx.gbl>

>Errors in PG's files should be reported to the Errata system: 
errata2010_AT_pglaf.org

Not sure how that's going to help when the problems are pretty systematic?

>If you feel that a text can be fixed only by a complete re-do (maybe it's 
missing the illustrations, the index, or whatever), feel free to download a 
scanset, get a copyright clearance, and have at it.

I'm doing one such right now, but I'm apprehensive of the flame-fest that
will ensue if someone, namely me, actually tries to redo an old text. But
I guess I'm willing to throw my body on the fire and see what happens
*next*...

>Simply complaining about errors isn't useful, nor are general complaints,
especially concerning older texts, such as "italics aren't shown" or
"all-caps are used for italics, not underscores".

The more general problem is that texts continue to be created that are
generally not readable with fidelity by many users on many different
machines.

The typical problem, as others have mentioned, lies in the choice of HTML
coding techniques, and a preference for visual cuteness on one particular
HTML machine rather than fidelity on a wide variety of HTML and
HTML-derived machines -- including issues of "accessibility."


From jimad at msn.com  Mon Apr 19 08:43:57 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 08:43:57 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC1F14.1090801@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
Message-ID: <SNT120-DS231C7BC25C8D72BE024F25AE0B0@phx.gbl>

>The PPers will have to relinquish their power of God -- or have it 
wrested from their hands -- and very strict guidelines will have to be 
put into place as to what markup is accepted.

I'm not sure that the PPers in question understand the damage they are
doing. A first step would be not to force changes but at least let people
know what problems they are creating and how NOT to cause them.  There are
some people at DP who care about these issues -- and obviously others who do
not. Obviously it's very hard to tell people to try to minimize their use of
CSS....


From tunelera at yahoo.com  Mon Apr 19 08:47:17 2010
From: tunelera at yahoo.com (Julia C. Miller)
Date: Mon, 19 Apr 2010 10:47:17 -0500
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>
	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
Message-ID: <4BCC7B05.5020506@yahoo.com>

In order for a "paradigm shift" to happen at DP, PG has to define what 
is and is not acceptable in the HTML and spell it out so that DP can put 
it into practice. I took another look at the PG HTML FAQ and it does not 
say anything that might be used as a guide to improving HTML output.

It would also be extremely helpful to have a way to preview the 
different output formats so we can test our finished HTML and make sure 
it works properly not only as HTML but also as the source for the other 
formats.

I (for one) am happy to modify the way I do things -- as long as someone 
explains what should/shouldn't be done and why. I am not a computer 
professional (and neither are many or most of the PPers at DP) and don't 
have the time or background to track down the current thinking on how to 
code HTML. But I don't have a problem modifying my practices to end up 
with a better end product.

Perhaps some of the time that is spent ranting about DP's work flow and 
DP's output could be better put to use creating more informative FAQs or 
even guidelines that DPers can use to create output that fits into the 
current thinking about acceptable HTML and/or other formats.



From prosfilaes at gmail.com  Mon Apr 19 09:13:34 2010
From: prosfilaes at gmail.com (David Starner)
Date: Mon, 19 Apr 2010 12:13:34 -0400
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC1F14.1090801@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
Message-ID: <h2x6d99d1fd1004190913m67b1f5ach6cc33e304d900f33@mail.gmail.com>

On Mon, Apr 19, 2010 at 5:15 AM, Marcello Perathoner
<marcello at perathoner.de> wrote:
> At DP the PPers enjoy pushing their pet preferences down the readers' throats:
> "What *I* See Is What You Get." And most PP time is spent weaving those
> personal preferences deep into the markup so as to make the markup pretty
> useless for anything but desktop devices with lots of screen, lots of cycles
> and lots of RAM.

You know we might have TEI-Lite now if you hadn't tried to push your
pet preferences about what the generated HTML must look like on all DP
projects, especially when you had the audacity to call it standard
when it clearly wasn't.

-- 
Where there is life, there is hope.

From dakretz at gmail.com  Mon Apr 19 09:29:40 2010
From: dakretz at gmail.com (don kretz)
Date: Mon, 19 Apr 2010 09:29:40 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <1fdf6.4812f09b.38fd1ca1@aol.com>
References: <1fdf6.4812f09b.38fd1ca1@aol.com>
Message-ID: <z2g627d59b81004190929x68edfe76s87732e0605c69caa@mail.gmail.com>

Hypothesis: A good paradigm for proofing and marking up a book is an
outline.

Several assumptions help make this work:

1. Without any exceptions I can think of, any comprehensible printed
text can be completely, unambiguously outlined.

We know from experience it works. Any XML document, including
an XHTML document, complies by definition. It's not just a good idea,
it's the law.

2. An outline is easy to define and easy to understand. Conceptually,
it's simply a regular hierarchical structure, with every syntactic
element completely embedded within another below a simple
sequential list of top-level elements.


3. Any syntactic element can be structurally identified as one
of three types.
    a.) A section.
    b.) A sequence of characters.
    c.) A position offset from the start (of the text, and/or of an
element.)

We know from experience that this works. Any HTML element can
be reduced to one of only two types: a <div> or a <span>. What we
need to do is to associate logical divs and spans with syntax.

==================================================

Benefits:

A book that has been outlined is probably simultaneously
easier to build, to read, to comprehend, to verify visually, to verify
grammatically with software, and to transform into ebook markups
than any other format. And structurally, it's self-validating.

Low barrier to entry. Anyone can proof with confidence from the
start, with a brief introduction and a list of syntax elements.


==================================================

Proofing interface.

Notice that the proofing representation can be entirely separate from
the serialized representation - i.e. how it's stored in a file for instance.

What might it look like? We have lots of history for this - there are
not many ways to represent language that are more universal than
an outline. Almost all of us come pre-trained.

Say the convention is to start an element with a newline, a plus
sign, and a syntax tag, on a line by themselves. Paragraphs are so
common that they can just start with, say, two blank lines. An
element's content continues with indented content. An element
ends with the start of another element at the same indentation
level, two blank lines (another paragraph), or outdented content.

+chapter
    +chapter-heading
        Chapter The First

    It was a dark and windy ...
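
The indentation convention above can be sketched as a small parser that builds the outline tree and renders each element as a plain <div>, with text as <p>. This is a minimal sketch under assumptions not in the post: indentation is counted in spaces, a single blank line ends a paragraph (instead of the "two blank lines" rule), and the names Node, parse_outline and to_html are hypothetical.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    # children are Node objects or plain paragraph strings
    children: list = field(default_factory=list)


def parse_outline(text: str) -> Node:
    """Parse '+tag' lines plus indentation into a tree (hypothetical format)."""
    root = Node("book")
    stack = [(-1, root)]  # (indent of the tag line, element); root never closes
    para: list = []       # lines of the paragraph currently being collected
    owner = None          # element that will receive that paragraph

    def flush() -> None:
        nonlocal para, owner
        if para:
            owner.children.append(" ".join(para))
        para, owner = [], None

    for raw in text.splitlines():
        if not raw.strip():
            flush()  # assumption: one blank line ends a paragraph
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        line = raw.strip()
        # a line at the same or lower indentation closes the deeper elements
        while stack[-1][0] >= indent:
            flush()
            stack.pop()
        if line.startswith("+"):
            flush()
            node = Node(line[1:])
            stack[-1][1].children.append(node)
            stack.append((indent, node))
        else:
            if owner is not stack[-1][1]:
                flush()
                owner = stack[-1][1]
            para.append(line)
    flush()
    return root


def to_html(node: Node, depth: int = 0) -> str:
    """Render every element as <div class="tag"> and every paragraph as <p>."""
    pad = "  " * depth
    out = [f'{pad}<div class="{html.escape(node.tag)}">']
    for child in node.children:
        if isinstance(child, Node):
            out.append(to_html(child, depth + 1))
        else:
            out.append(f"{pad}  <p>{html.escape(child)}</p>")
    out.append(f"{pad}</div>")
    return "\n".join(out)
```

Because every element is closed by indentation alone, the tree is always well nested, which is one concrete reading of the "self-validating" property.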

I think I'll play with this a bit and see how far it goes. Is anyone
familiar with other attempts in this line?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100419/63dbef3a/attachment.html>

From dakretz at gmail.com  Mon Apr 19 09:33:30 2010
From: dakretz at gmail.com (don kretz)
Date: Mon, 19 Apr 2010 09:33:30 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <z2g627d59b81004190929x68edfe76s87732e0605c69caa@mail.gmail.com>
References: <1fdf6.4812f09b.38fd1ca1@aol.com>
	<z2g627d59b81004190929x68edfe76s87732e0605c69caa@mail.gmail.com>
Message-ID: <p2z627d59b81004190933p46e005ffp329cd81469df3c88@mail.gmail.com>

Oh, and ...

Yes, indenting every line of text is a bitch. So don't do it.
If the tagging is done properly (probably an adaptation of the
example), software can indent it automatically.
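
One way software could supply the indentation is sketched below, assuming (hypothetically, this is not part of the proposal) that each element is closed by an explicit "-tag" line. The function re-indents flat text, drops the end markers, and rejects mismatched or unclosed elements; auto_indent is an invented name.

```python
def auto_indent(text: str, unit: str = "    ") -> str:
    """Re-indent flat '+tag' text using hypothetical '-tag' end markers."""
    open_tags: list = []
    out: list = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
        elif line.startswith("+"):
            out.append(unit * len(open_tags) + line)
            open_tags.append(line[1:])
        elif line.startswith("-") and line[1:] in open_tags:
            # an end marker must close the innermost open element
            if line[1:] != open_tags[-1]:
                raise ValueError(f"'{line}' closes {line[1:]!r} but {open_tags[-1]!r} is open")
            open_tags.pop()  # the marker itself is not kept in the output
        else:
            out.append(unit * len(open_tags) + line)
    if open_tags:
        raise ValueError(f"unclosed element(s): {open_tags}")
    return "\n".join(out)
```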
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100419/6a54f268/attachment.html>

From lee at novomail.net  Mon Apr 19 09:34:09 2010
From: lee at novomail.net (Lee Passey)
Date: Mon, 19 Apr 2010 10:34:09 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC7B05.5020506@yahoo.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
	<4BCC7B05.5020506@yahoo.com>
Message-ID: <4BCC8601.3020408@novomail.net>

On 4/19/2010 9:47 AM, Julia C. Miller wrote:

>   In order for a "paradigm shift" to happen at DP, PG has to define what
> is and is not acceptable in the HTML and spell it out so that DP can put
> it into practice. I took another look at the PG HTML FAQ and it does not
> say anything that might be used as a guide to improving HTML output.

The odds of this happening are about equivalent to that of having 
porcine aviators; Mr. Hart is diametrically opposed to standards of any 
kind for PG. However, PG creating an HTML standard is in fact 
unnecessary. According to Mr. Hart (although somewhat disputed by Mr. 
Haines) PG will accept just about anything it is given. Thus, DP could 
establish its own HTML guidelines with the assurance that they would be 
acceptable to PG. Non-conforming HTML could still make its way into the 
PG corpus from other sources, but at least the DP work-product would be 
consistent.

> It would also be extremely helpful to have a way to preview the
> different output formats so we can test our finished HTML and make sure
> it works properly not only as HTML but also as the source for the other
> formats.

This could be so difficult as to be nigh on impossible. For example, as
most here know, the ".epub" format is actually just a zip file
containing (among other things) the XHTML version of the document. How
that document is displayed depends far less on the nature of the
document's markup than on the capabilities of the reading device's
software. The .epub readers based on JavaScript (such as
Monocle) will probably display the text with as much richness as the
hosting browser software would, whereas standalone .epub readers (such
as µBook) will only display what the software designers felt was
important, and probably will not support CSS at all. No one viewer can
tell you if the markup is satisfactory, because with .epub the markup is
only part of the story.

On the other hand, if DP were to establish HTML guidelines and 
requirements (requirements for a baseline, guidelines for enhancements) 
I would be happy to code up a program which would test for conformance 
to those guidelines. I couldn't give you a picture, but I could give you 
a thousand words.
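
A conformance tester of this kind could start very small. The sketch below uses Python's standard html.parser; the allowed-tag list is a made-up baseline for illustration (no such DP list exists in this thread), and the checker only reports elements outside it and inline style attributes, with line numbers.

```python
from html.parser import HTMLParser

# Made-up baseline for illustration only; not an actual DP or PG rule.
BASELINE_TAGS = {
    "html", "head", "title", "meta", "link", "body", "div", "p", "h1", "h2",
    "h3", "i", "em", "b", "strong", "span", "br", "img", "a", "blockquote",
    "ul", "ol", "li", "table", "tr", "td", "th",
}


class ConformanceChecker(HTMLParser):
    """Collect one message per problem found while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list = []

    def handle_starttag(self, tag, attrs):
        line, _col = self.getpos()
        if tag not in BASELINE_TAGS:
            self.problems.append(f"line {line}: <{tag}> is not in the baseline")
        if any(name == "style" for name, _value in attrs):
            self.problems.append(f"line {line}: inline style on <{tag}>")


def check(html_text: str) -> list:
    checker = ConformanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```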

> I (for one) am happy to modify the way I do things -- as long as someone
> explains what should/shouldn't be done and why. I am not a computer
> professional (and neither are many or most of the PPers at DP) and don't
> have the time or background to track down the current thinking on how to
> code HTML. But I don't have a problem modifying my practices to end up
> with a better end product.

Adding HTML markup to a document (or modifying that which is already 
there) is nowhere near as difficult as many would have you believe. 
Check out 
http://web.archive.org/web/20080327044926/gutenberg.hwg.org/tutorials.html 
and http://www.dysfunctionals.org/~networker/HTMLeBooks.html. But you
are correct: having a document like one of these that is DP-sanctioned
would simplify a PPer's life dramatically.

> Perhaps some of the time that is spent ranting about DP's work flow and
> DP's output could be better put to use creating more informative FAQs or
> even guidelines that DPers can use to create output that fits into the
> current thinking about acceptable HTML and/or other formats.

Many have tried (among them Mr. Hutchinson and Mr. Perathoner). But 
without organizational buy-in those FAQs and guidelines will go 
nowhere--fast. Unfortunately, there appears to be no one left at DP with 
the clout to say, "this is our first draft of HTML guidelines. Comments 
and discussion are welcome, but by the end of the year some sort of
guidelines /will/ be adopted." As near as I can tell, the ranters rant 
not because DP's work flows are, shall we say, sub-optimal, or because 
the FAQs and guidelines have not been written, but because none of the 
Powers That Be at DP seem to be willing to do anything about it.

These kinds of decisions cannot be made by consensus. Somebody needs to 
step up to the plate. Mr. Adcock seems to still have enough respect for 
DP that he believes it can be improved. I do not. I would love for 
someone to prove me wrong.


From dakretz at gmail.com  Mon Apr 19 09:45:29 2010
From: dakretz at gmail.com (don kretz)
Date: Mon, 19 Apr 2010 09:45:29 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC8601.3020408@novomail.net>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
	<4BCC7B05.5020506@yahoo.com> <4BCC8601.3020408@novomail.net>
Message-ID: <o2o627d59b81004190945v89e9d6f4w111940de292a4a8@mail.gmail.com>

Also, I see maybe 3 or 4 elements that should be identified in-line using
conventions we already have. Italics, boldface, small-caps (although these
are often micro-headings), ...

One opportunity, I think, would be to break out embedded quotes and make
them visually obvious, and their boundaries checkable.
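
Once quotes are explicit characters, breaking them out and checking their boundaries is mechanical. A minimal sketch, assuming curly double quotes (U+201C/U+201D) checked one paragraph at a time. It will flag the common typographic convention of leaving a quote open at the end of a paragraph when the speech continues into the next one, so a real tool would need an exception for that; the function names are hypothetical.

```python
import re

OPEN_Q, CLOSE_Q = "\u201c", "\u201d"
# a complete quotation: opening mark, no nested marks, closing mark
QUOTED = re.compile(f"{OPEN_Q}[^{OPEN_Q}{CLOSE_Q}]*{CLOSE_Q}")


def quoted_spans(paragraph: str) -> list:
    """Break out each complete embedded quotation."""
    return [m.group(0) for m in QUOTED.finditer(paragraph)]


def check_quotes(paragraph: str) -> list:
    """Return (index, message) pairs for unbalanced curly double quotes."""
    problems = []
    start = None  # index of the quote that is currently open
    for i, ch in enumerate(paragraph):
        if ch == OPEN_Q:
            if start is not None:
                problems.append((i, "opening quote inside an open quote"))
            start = i
        elif ch == CLOSE_Q:
            if start is None:
                problems.append((i, "closing quote with no opening quote"))
            start = None
    if start is not None:
        problems.append((start, "quote still open at end of paragraph"))
    return problems
```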
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100419/42b0a436/attachment.html>

From ke at gnu.franken.de  Mon Apr 19 09:57:05 2010
From: ke at gnu.franken.de (Karl Eichwalder)
Date: Mon, 19 Apr 2010 18:57:05 +0200
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <h2x6d99d1fd1004190913m67b1f5ach6cc33e304d900f33@mail.gmail.com>
	(David Starner's message of "Mon, 19 Apr 2010 12:13:34 -0400")
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
	<h2x6d99d1fd1004190913m67b1f5ach6cc33e304d900f33@mail.gmail.com>
Message-ID: <m2sk6rxtwe.fsf@gnu.franken.de>

David Starner <prosfilaes at gmail.com> writes:

> You know we might have TEI-Lite now if you hadn't tried to push your
> pet preferences about what the generated HTML must look like on all DP
> projects, especially when you had the audacity to call it standard
> when it clearly wasn't.

At least, tidy seems to be happy with it and you can embed your own CSS
fragments.  And, finally, it's much better than all these handcrafted
HTML exercises that are mostly just a waste of time.

-- 
Karl Eichwalder

From jimad at msn.com  Mon Apr 19 10:07:28 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 10:07:28 -0700
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
Message-ID: <SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>


>What DRM is put on PG files? I thought the DRM was in the reader program,
>not the files.

DRM on books, in my experience, is typically implemented as device-specific
encryption, such that even if you move an ebook file to a different machine
that you own, that machine cannot read the file. A hidden key, say a "serial
number" on a particular hardware device, is used as the decryption key for
a file that was encrypted specifically for that device.  Thus, for example,
if you buy an in-copyright book from Amazon and physically copy that ebook
file from one Kindle you own to another Kindle you own, the second Kindle
still will not be able to read the file.  While Amazon will typically allow
you to read one purchased book on six devices simultaneously -- including
Kindle for the PC, Kindle for the Mac, Kindle for the Blackberry, etc. --
each of those ebook files has to be downloaded separately from Amazon,
because each comes with its own device-specific encryption.  Not everything
from Amazon has to have DRM: the publisher who uploads to Amazon has the
right to specify "I don't want DRM on my book."

Further, the encryption schemes are typically proprietary to a particular
company or consortium, and discussing or distributing information about
those encryption schemes is against the law.  Thus the ePub spec, for
example, doesn't describe an ePub-specific DRM scheme; rather, each
distributor of ePub files can implement its own proprietary and mutually
incompatible DRM scheme, so owning multiple ePub devices is no guarantee
that you can buy an ePub book for one device and read it on another.  And if
you have an ePub library of purchased books that you read on your Blackberry
or what have you, and now you want to move that library to your new iPad,
well, too bad, because it's probably not going to work.  Nor can you resell
your ebooks to someone else on eBay when you are done with them.
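
To make the device-binding idea concrete, here is a deliberately toy sketch. It is not Amazon's, Apple's, or anyone's actual scheme, and it is not secure: a key is derived from a hypothetical hidden device serial number, the book is XORed with a SHA-256 counter-mode keystream, and an HMAC tag lets the reader notice that the file was encrypted for a different device. All function names are hypothetical.

```python
import hashlib
import hmac
import os


def device_key(serial: str) -> bytes:
    """Derive a per-device key from a hidden serial number (toy derivation)."""
    return hashlib.sha256(b"toy-drm:" + serial.encode()).digest()


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode; fine for a demo, not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_for_device(book: bytes, serial: str) -> bytes:
    key = device_key(serial)
    nonce = os.urandom(16)
    body = bytes(a ^ b for a, b in zip(book, keystream(key, nonce, len(book))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()  # 32 bytes
    return nonce + tag + body


def decrypt_on_device(blob: bytes, serial: str) -> bytes:
    key = device_key(serial)
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        # a different device derives a different key, so the tag cannot match
        raise PermissionError("this file was not encrypted for this device")
    return bytes(a ^ b for a, b in zip(body, keystream(key, nonce, len(body))))
```

Copying the blob byte-for-byte to a second device gets you nowhere, because the second device derives a different key; that is the behavior described above for the two Kindles.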

Even without DRM, as far as I know all storage on the iPad is currently tied
to a particular app, so even if you have a non-DRM "PG" book under Apple's
iBooks applet you can't say "Gee, let me open that up in Stanza because Stanza
offers a better ebook reader" -- you can't do that because the iPad ties the
book file to the particular reader applet. If Apple were to allow book
transfer via USB then, god forbid, you could at least move non-DRM books from
one iPad reader applet to a different iPad reader applet!

>The next time you get your hands on an iPad, or even ask friends
>to try it for you, just do the little search they have and try a
>few obvious things like "books" "ebooks" and similar things.

Sorry, perhaps "friends" was too strong a word, but I thought what I have
been asking here is whether anyone in PG or DP land has actually found an
applet that will allow them to directly download a free book from the PG
website or other websites using wifi and read it in a manner that makes them
happy.  Or is it necessary, as in the case of Apple's own iBooks applet, to
*always* tie the distribution path of the books to the applet itself?  This
may seem like a strange question, except that Apple already HAS shut down
distribution of books by USB except via the iTunes monopoly.

>As I have said so many times, I don't care who redistributes PG,
>from Tea Party people to Sarah Palin or Tina Fey. . .period.

Again, I acknowledge that *you* don't care, but I do: I want to be able to
get books and publications directly from a variety of web sites via wifi,
and I don't want the applet or the device to tell me where I can get MOBI
or ePub books from, any more than I would want a web browser to tell me what
HTML I am allowed to read from which sites.  This is the ebook version of "net
neutrality", as opposed to buying a device from Big Brother and letting Big
Brother then tell you that you can only use that device to buy MORE product
from Big Brother. Even Big Bill allows me to install a large variety of
reader apps on my netbook, say, right-click on some ebook I see anywhere on
the internet, and have that book automagically open in my choice of reader app.

>An apology for when you have been confusing is also appropriate, with
>no need to blame me or Apple.

I apologize for having difficulty unambiguously discussing terminology that
Apple chooses to make deliberately ambiguous as a cute marketing device.

>It worked for me, but then I gave it a few tries.  However, the first two
>both worked, as did all the others made for iPad, though I have not tried
>each one in great detail, but enough to bring up books I know I typed in.

You and I have differing ideas of what "it worked" means.  For example on a
Kindle, which again is also not the most "unlocked" device in the world, I
can web browse to www.gutenberg.org, click on a MOBI title there, and it
"works", or I can go to FreeKindleBooks, or to Feedbooks, etc -- my choice
of publisher -- and if it's a "free book" I can get it -- it works.  I can't
get it if it's a "for pay" book because Amazon has locked up that
distribution channel -- which is not a good thing. As opposed to Nook where
none of this works at all except the direct "for pay" path from B&N.

>I really hate to say this, as you'll probably accuse me of flame/trashing
>but it sounds as if you have spent more time complaining here than in the
>actual testing of the product.

We spent about four hours playing with the iPad and trying to get it to do
what we wanted it to do -- namely, direct access to free ebooks on particular
websites on the internet. Within that same amount of time after getting my
first Kindle in Dec 2007, I was already writing software to freely distribute
books on it.

>I'm sure you know that Apple wants to control how files get to iPads.

Yes, the only question is just how badly "locked down" their device is in
the matter -- and whether or not they will take steps again in the future to
force an increase in that "lock down". Again, what I want is the ebook
version of "net neutrality" -- I want to have an ebook reader applet which
is independent of ebook publisher. I don't want to have to acquire and use a
different ebook reader applet for each book I want to purchase -- or acquire
freely on the internet.  There are WAY more internet sites offering
interesting books and other publications than there are organizations
willing to write applets for the iPad!

>I thought you already had managed to "try these things without sending
>my $500 to Jobs for the privilege of *testing* his offering. . . .

Sure, I borrowed a friend and his iPad for four hours of lack of success
trying various approaches after which he ran away screaming....

>Then how did I manage to download them from the store?

Sorry, again more Apple cuteness: there is the virtual "Apple Store" on the
internet, and there is the bricks-and-mortar "Apple Store" at the Mall. I
can download software from the virtual Apple Store to my desktop, but then I
don't have a physical iPad to test it on.  Or I can go to the Mall where
they have a physical iPad, but then I don't have permission to download and
install applets from the virtual Apple Store. And I've used up my friendship
for right now with the "bricks and mortar" friend who has a "bricks and
mortar" iPad...

>You can do this with Goodreader.

OK, good suggestion -- their website looks promising I will dig into it more
-- thanks!

>Times will change, people will jailbreak iPads and all. . . .

I am hoping that the future OS in the works for iPad may make things less
restrictive. Not personally interested in hacking anything to get increased
access. Hacking to my taste is incompatible with creating texts for PG....


From jimad at msn.com  Mon Apr 19 10:26:22 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 10:26:22 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC8601.3020408@novomail.net>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>	<4BCC7B05.5020506@yahoo.com>
	<4BCC8601.3020408@novomail.net>
Message-ID: <SNT120-DS77D0C6767F8C1A9848416AE0B0@phx.gbl>

>> It would also be extremely helpful to have a way to preview the
>> different output formats so we can test our finished HTML and make sure
>> it works properly not only as HTML but also as the source for the other
>> formats.
>
>This could be so difficult as to be nigh on impossible. For example, as 
>most here know, the ".epub" format is actually just a zip file 
>containing (among other things) the XHTML version of the document....

Sorry, but I've looked at and tried to port Marcello's HTML->epub code, and it's anything but that simple. (But I am not an experienced Python coder.)

Again, to my mind a "preview" need simply be a portable version of Marcello's code so that we can do our own HTML to ePub conversion (and from there to MOBI) and run it on the variety of ePub and MOBI reader devices and software we already own, so that we have *some* idea of the problems that the particular HTML is going to run into on various portable devices.  And I am sure there are any number of people who are willing to preview a DP candidate release on the hardware they own in order to find what problems there are to be found -- most of us are pretty passionate about our choice of hardware and would like very much for DP/PG to produce ebooks that actually work on our hardware investments!

PS: I already make preview versions of my HTML in ePub and MOBI -- it's just that the HTML->ePub and HTML->MOBI conversion software I have is not identical to Marcello's, and thus the formatting ends up different from the "official" version.


From marcello at perathoner.de  Mon Apr 19 10:35:52 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Mon, 19 Apr 2010 19:35:52 +0200
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC7B05.5020506@yahoo.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
	<4BCC7B05.5020506@yahoo.com>
Message-ID: <4BCC9478.5040204@perathoner.de>

Julia C. Miller wrote:

> In order for a "paradigm shift" to happen at DP, PG has to define what is and is 
> not acceptable in the HTML and spell it out so that DP can put it into practice. 

It would be much better if DP did that.


> It would also be extremely helpful to have a way to preview the different output 
> formats so we can test our finished HTML and make sure it works properly not 
> only as HTML but also as the source for the other formats.

Roger Frank has the converter and did extensive testing on it.


> I (for one) am happy to modify the way I do things -- as long as someone 
> explains what should/shouldn't be done and why. I am not a computer professional 
> (and neither are many or most of the PPers at DP) and don't have the time or 
> background to track down the current thinking on how to code HTML. But I don't 
> have a problem modifying my practices to end up with a better end product.

Go to the DP wiki and search for 'ePub'. I don't know the exact URL
because the site is down.


-- 
Marcello Perathoner
webmaster at gutenberg.org

From lee at novomail.net  Mon Apr 19 10:38:47 2010
From: lee at novomail.net (Lee Passey)
Date: Mon, 19 Apr 2010 11:38:47 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
Message-ID: <4BCC9527.3000103@novomail.net>

On 4/19/2010 9:00 AM, Jim Adcock wrote:

[snip]

> It would be nice to have a portable version of the current tools, so
> that transcribers can see how their HTML is going to "officially"
> translate into ePub and MOBI prior to submission.  I tried porting
> the tools, but got bogged down by the amount of stuff which wouldn't
> port easily.

Only half of this proposal is possible: the .mobi half.

As others have pointed out recently, .epub is not really an e-book
format. For reasons both technical and practical, most people agree that
HTML is the preferred markup for creating e-books. The primary drawback
to HTML is that it is inherently a multi-file solution; the HTML file is
distinct from the image files, CSS files, font files, etc. Moreover, if
you have multiple HTML files that make up the book (and sometimes there
are good technical reasons for doing so) you need yet another metafile
that describes how the different files relate to each other.

After about a year of wrangling, in September 2006 the IDPF officially
released the "Open Container Format," which specified how a collection
of HTML files and the other files on which they depend would be included
in a ZIP archive. The specification recommends using the file extension
".epub" to identify files that are OCF containers.

In other words, an ".epub" file is just a ".zip" file with a few
additional metadata files added. Software that purports to "convert"
HTML to .epub should not do /anything/ to the source file, except
perhaps to ensure that it is valid XHTML (for older HTML files). There
is no need to validate an .epub conversion, as no conversion should have
occurred. If a rendered .epub document does not look exactly like the
same collection of files rendered by a browser from the file system, it
is the fault of the .epub rendering software, not the "conversion."
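
The "just a zip file" point can be shown directly with Python's zipfile module. A minimal sketch: it follows the OCF rules that the mimetype entry comes first and is stored uncompressed, and copies the XHTML in unchanged. Assumptions: a single XHTML file, an EPUB 2 style package file with made-up internal file names, and no NCX table of contents (which EPUB 2 expects), so a strict validator would complain; build_epub is a hypothetical name.

```python
import io
import uuid
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(title: str, xhtml: str) -> bytes:
    """Package one XHTML document, unchanged, into an OCF (.epub) container."""
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:{uuid.uuid4()}</dc:identifier>
  </metadata>
  <manifest>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        # The source XHTML goes in byte-for-byte: no "conversion" happens.
        zf.writestr("OEBPS/text.xhtml", xhtml, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```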

Mobipocket, on the other hand, is a different ball of worms.

The original Mobipocket reader (which, I understand, became the basis
for the Kindle software) used a subset of HTML markup, and in a few
instances changed the meaning of tags (<hr /> does not create a
Horizontal Rule, but starts a new page in the user agent). It did not
recognize all of the named entities, and did not support CSS at all. A 
Mobipocket PRC file was simply this almost-HTML compressed using Rick 
Bram's PalmDOC compression scheme (which was actually quite elegant in 
its simplicity).
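
The PalmDOC scheme really is compact enough to show in full. Below is a sketch of the decompression side as it is commonly documented: plain literal bytes, short literal runs, two-byte back-references, and a single byte standing for "space plus character". It decodes one record's worth of data; the function name is hypothetical.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress one PalmDOC-compressed text record."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            out.append(c)                    # literal byte
        elif c <= 0x08:
            out += data[i:i + c]             # the next c bytes are copied verbatim
            i += c
        elif c <= 0xBF:
            pair = (c << 8) | data[i]        # two-byte back-reference
            i += 1
            distance = (pair >> 3) & 0x7FF   # 11-bit distance back into the output
            length = (pair & 0x07) + 3       # 3..10 bytes to copy
            for _ in range(length):
                out.append(out[-distance])   # byte by byte, so overlaps work
        else:
            out += b" " + bytes([c ^ 0x80])  # space followed by a character
    return bytes(out)
```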

The later ".mobi" format was the same almost-HTML file compressed across 
the entire package using Huffman encoding instead. It produces a 
somewhat smaller file; the contents of the archive are identical to those
in the ".prc" format.

Mobipocket Publisher (which I assume is still what is used to create 
Kindle files) claimed that Mobipocket files supported CSS. In fact what 
happened was that Mobipocket Publisher would load a CSS file if it were 
specified in the source HTML, and would convert all the style attributes 
and computed CSS to the almost-HTML the Mobipocket reader recognized. 
Thus, a style like "style='font-size: larger';" might be converted to 
"<font size='4'>", but a style like "style='margin-left: 10em';" was 
simply discarded, because the Mobipocket almost-HTML did not recognize 
any way to change margin sizes.
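
The kind of lossy flattening described above might look like the sketch below. The property-to-tag table is a guess made for illustration; Mobipocket Publisher's actual conversion rules are not given here and are likely broader. The name flatten_style is hypothetical.

```python
# Hypothetical mapping from CSS font-size keywords to <font size> values.
FONT_SIZES = {"smaller": "2", "larger": "4", "x-large": "5"}


def flatten_style(tag: str, style: str, text: str) -> str:
    """Rewrite one styled element as 'almost-HTML', discarding unsupported CSS."""
    decls = {}
    for decl in style.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls[prop.strip().lower()] = value.strip().lower()
    inner = text
    if decls.get("font-size") in FONT_SIZES:
        inner = f'<font size="{FONT_SIZES[decls["font-size"]]}">{inner}</font>'
    if decls.get("font-style") == "italic":
        inner = f"<i>{inner}</i>"
    if decls.get("font-weight") == "bold":
        inner = f"<b>{inner}</b>"
    # Everything else (margin-left, text-indent, ...) is silently dropped.
    return f"<{tag}>{inner}</{tag}>"
```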

If you wanted to test the Mobipocket conversion, I would think the way 
to do that would be to extract the modified HTML from the Mobipocket 
file, and then write whatever kind of tests you needed to be sure the 
conversion was correct. I have some 'C' code hanging around to extract 
HTML from ".mobi" files; if you want it, I could send it to you.
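
For anyone who wants to try the extraction route, the container layer is simple: a Mobipocket file is a Palm database (PDB), and record 0 starts with a PalmDOC header whose compression field is 1 (none), 2 (PalmDOC) or 17480 (HUFF/CDIC). The sketch below only splits the records and reads that header. Decompressing text records 1..record_count, and stripping the trailing bytes some MOBI files append to each record, are left out; the function names are hypothetical.

```python
import struct


def read_pdb(data: bytes):
    """Split a Palm database image into (type, creator, records)."""
    if len(data) < 78:
        raise ValueError("too short for a PDB header")
    db_type, creator = data[60:64], data[64:68]  # b"BOOK", b"MOBI" for .mobi
    (count,) = struct.unpack_from(">H", data, 76)
    # each record-list entry: 4-byte offset, 1 attribute byte, 3-byte unique id
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))
    records = [data[offsets[i]:offsets[i + 1]] for i in range(count)]
    return db_type, creator, records


def palmdoc_header(record0: bytes) -> dict:
    """Decode the first 12 bytes of record 0 (the PalmDOC header)."""
    compression, _unused, text_length, record_count, record_size = struct.unpack_from(
        ">HHIHH", record0, 0
    )
    return {
        "compression": compression,  # 1 none, 2 PalmDOC, 17480 HUFF/CDIC
        "text_length": text_length,
        "record_count": record_count,
        "record_size": record_size,
    }
```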

From dakretz at gmail.com  Mon Apr 19 11:01:08 2010
From: dakretz at gmail.com (don kretz)
Date: Mon, 19 Apr 2010 11:01:08 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC9527.3000103@novomail.net>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
	<4BCC9527.3000103@novomail.net>
Message-ID: <v2v627d59b81004191101pbff8f242gf6d3959e88d1bbf4@mail.gmail.com>

> The primary drawback to HTML is that it is inherently a multi-file solution;

I'd say that's far from the primary drawback.

Much more substantial drawbacks are that it is presentational,
not syntactic; and even if you make it even more complex
with syntactic information (or don't for that matter) the
proofers will never (nor should they) proof in that format.

For DP's purposes, for actually doing the work, HTML is
a non-starter - but so is any other equally complex (I'd say
any XML-based) representation. What we have in there
already (<i> etc.) is the locus of major headaches and
an ongoing error-trap.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100419/53e51470/attachment.html>

From Bowerbird at aol.com  Mon Apr 19 11:02:47 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 19 Apr 2010 14:02:47 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <81321.79a934f1.38fdf4c7@aol.com>

tunelera said:
>    Perhaps some of the time that is spent ranting 
>    about DP's work flow and DP's output could be 
>    better put to use creating more informative FAQs
>     or even guidelines that DPers can use to create 
>    output that fits into the current thinking about 
>    acceptable HTML and/or other formats.

my word, it must be "convenient" to simply _ignore_
all the work that i've done here in the last six years.

to reiterate, i solved this problem a long time ago...

***

michael said:
>    Worthy of a second look:
>    Marcello Perathoner said:

hey, go ahead and look a second time if you like,
but marcello is rarely worth the effort...

to some extent, he's on the right track.   then again,
to that exact same extent, i've said the same thing,
over and over, again and again, for years and years.

i'm also smart enough to know that postprocessors
at d.p. will not go for this approach.   they _want_ to
make it look pretty.   that's why they do what they do.
so you will never get them to strip down their .html.

but that doesn't matter...   if you jigger the workflow
so it will create a text-file which has semantic rigor
-- e.g., one in z.m.l. format -- you can use _that_ as
"the master file" and still let the postprocessors play
for as long as they like in fancy-markup disneyland.

plus you will make the d.p. workflow more efficient.

-bowerbird

p.s.   i'm also smart enough to know it is impossible
to "target" .epub at this time, because of the huge
inconsistencies in the way that it gets rendered by
the various apps out there.   if you focus on adobe's,
you're gonna break your file for other viewer-apps,
and vice-versa.   it will take _years_ to gain stability.
the .epub scene was mired in hype since its start...

p.p.s.   and if _you_ were smart enough, you'd know
rfrank already tried to target .epub, but he gave up.
d.p. is down right now, so i can't quote you the u.r.l.,
but use the forum search and you can find it easily...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100419/5f51a79c/attachment.html>

From hart at pglaf.org  Mon Apr 19 11:06:57 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Mon, 19 Apr 2010 11:06:57 -0700 (PDT)
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>


On Mon, 19 Apr 2010, Jim Adcock wrote:

Apple has assured me over and over there is no DRM on our files.

If you have any evidence to the opposite, we'd love to hear it.

More below:

>
> >What DRM is put on PG files? I thought the DRM was in the reader program,
> not the files.
>
> DRM on books in my experience is typically implemented as a device-specific
> encryption such that even if you move an ebook file to a different machine
> you own that machine cannot read the file. A hidden key say "serial number"
> on a particular hardware device is used as a decryption key to allow
> decryption of the file encrypted specifically for that device.  Thus for
> example if one buys an in-copyright book from Amazon and you physically copy
> that ebook file from one Kindle you own to another Kindle you own the second
> Kindle still will not be able to read the ebook file for the first Kindle.
> While Amazon will typically allow you to read one purchased book on six
> devices simultaneously -- including on Kindle for the PC, Kindle for the
> Mac, Kindle for the Blackberry, etc, each of those ebook files has to be
> downloaded separately from Amazon because each comes with a unique
> device-specific encryption.  Not everything from Amazon needs to have DRM,
> the publisher who uploads to Amazon has the right to specify "I don't want
> DRM on my book."  Further, the encryption schemes are typically
> proprietary to a particular company or consortium, and discussing or
> distributing information about those encryption schemes is against the law.
> And thus the ePub spec, for example, doesn't describe an ePub-specific
> DRM scheme; rather, each distributor of ePub files can implement
> their own proprietary and mutually incompatible DRM scheme, such that owning
> multiple ePub devices is not sufficient to ensure that one can purchase an
> ePub book for one device and read it on another device.  And if you have an
> ePub library of purchased books that you read on your BlackBerry or what
> have you and now you want to move that library to your new iPad, well, too bad,
> because it's probably not going to work.  Nor can you resell your ebooks to
> someone else on eBay when you are done with them.
>
> Even without DRM as far as I know all storage on iPad is currently tied to a
> particular app so even if you have a non-DRM "PG" book under Apple's iBooks
> applet you can't say "Gee let me open that up in Stanza because Stanza
> offers a better ebook reader" -- you can't do that because the iPad ties the
> book file to the particular reader applet. If Apple were to allow book
> transfer via USB then god forbid you could at least move non-DRM books from
> one iPad reader applet to a different iPad reader applet!
>
> >The next time you get your hands on an iPad, or even ask friends
> to try it for you, just do the little search they have and try a
> few obvious things like "books" "ebooks" and similar things.
>
> Sorry, perhaps "friends" was too strong a word but I thought what I have
> been asking here is if anyone in PG or DP land has actually found an applet
> that will allow them to directly download a free book from the PG website or
> other websites using wifi and read it in a manner that makes you happy.  Or
> is it necessary, as in the case of Apple's own iBooks applet, to *always*
> tie the distribution path of the applet to the applet itself?  This may seem
> like a strange question except that Apple already HAS shut down distribution
> of books by USB except via the iTunes monopoly.

Until you have at least tried the examples I went and found for you,
that allowed ME to read AND download directly from gutenberg.org....

I have nothing further to offer you on this subject.

You are leading me to believe I was correct in the extreme when that
thought came to me that you are spending more time complaining about
all this than actually doing your own research.

Please. . .get out there and do something between your messages so I
or we don't have the feeling this is a totally useless exercise from
your experimental labs.


> >As I have said so many times, I don't care who redistributes PG,
> from Tea Party people to Sarah Palin or Tina Fey. . .period.
>
> Again, I acknowledge that *you* don't care, but I do: I want to be able to
> get books and publications directly from a variety of web sites via wifi,

This does not remove any from your "variety of web sites via wifi."

I told you which Apps you could use, and you pretend I never said it.

Please go back and read it all again, do your homework, and prepare
for a real conversation.

You are NOT conversing here, you are not sharing the wealth.

Look up the roots of communicate.


> and I don't want the applet nor the device to tell me where I can get MOBI
> or ePub books from anymore than I would want a web browser tell what HTML I

I read all sorts of stuff with Safari and Opera Lite, what is your problem?

Please experiment and cite your specific examples that we can recreate.


> am allowed read from what HTML sites.  This is the ebook version of "net
> neutrality" as opposed to buying a device from Big Brother and letting Big
> Brother then tell you that you can only use that device to buy MORE product
> from Big Brother. Even Big Bill allows me to install a large variety of
> reader apps on my netbook say, right click on some ebook I see anywhere on
> the internet, and that book automagically opens in my choice of reader app.

And you never tried to install those reader apps I mentioned. . . .

So what right have you to complain?

That they didn't install themselves?

I'm sure you would be complaining even more if they did.

That they didn't SAVE the files by themselves?

Again, I am sure you would be complaining even more if they did.


What is it you want?!?!?!?

You haven't SAID you want anything I haven't found for you.

Yet you have refused to acknowledge those efforts.

No thanks means no thanks.


> >An apology for when you have been confusing is also appropriate, with
> no need to blame me or Apple.
>
> I apologize for having difficulty unambiguously discussing terminology that
> Apple chooses to be deliberately ambiguous as a cute marketing device.

Is that a "non-denial denial?"


> >It worked for me, but then I gave it a few tries.  However, the first two
> both worked, as did all the others made for iPad, though I have not tried
> each one in great detail, but enough to bring up books I know I typed in.
>
> You and I have differing ideas of what "it worked" means.  For example on a
> Kindle, which again is also not the most "unlocked" device in the world, I
> can web browse to www.gutenberg.org, click on a MOBI title there, and it

Are you saying you sent to gutenberg.org and tried this without success?

Are you telling us what program you used in that effort?

Are you asking us to try this for you?

Still, I'm not sure why one brand should support another proprietary format,
but I'll try it if you ask, after you thank me for my previous efforts from
your previous questions.


> "works", or I can go to FreeKindleBooks, or to Feedbooks, etc -- my choice
> of publisher -- and if it's a "free book" I can get it -- it works.  I can't
> get it if it's a "for pay" book because Amazon has locked up that
> distribution channel -- which is not a good thing. As opposed to Nook where
> none of this works at all except the direct "for pay" path from B&N.

You need to be more specific with your requests and challenges.

You say what I mean by "it works for me" is not what you mean for you,
but you are not specific about what it is you really want.

Does every program do everything the way I want?

No.

Can I manage to get the results I want?

Yes?

Are you willing to do what it takes to get what you want?

???


> >I really hate to say this, as you'll probably accuse me of flame/trashing
> but it sounds as if you have spent more time complaining here than in the
> actual testing of the product.
>
> We spent about four hours playing with the iPad and trying to get it to do
> what we wanted it to do -- namely direct access to free ebooks on particular
> websites on the internet.

You still have refused to name what programs you tried on what sites,
and what you tried to do with them.

You also have refused to comment on the programs I have suggested already.

You are not encouraging me, or anyone else, to try to help you further.


> In that amount of time I was already writing software to freely distribute
> books on the Kindle when I got my first Kindle Dec 2007.

Tell me, honestly, have you asked Apple for the documentation on how to write
for the iPad?

There were lots of people working for months on Apps before it came out.


> >I'm sure you know that Apple wants to control how files get to iPads.
>
> Yes, the only question is just how badly "locked down" their device is in

Ah. . .now we come to the point!!!

It's not the "programs" or the "ebooks" you are complaining about,
it's "how badly 'locked down' their device is". . .!!!

See???

You weren't complaining just about DRM on eBooks. . . .

You seem to have a whole pile of axes to grind, and whenever anyone
shows you how one axe will do what you want, you switch to another,
and another:  books, programs, the device itself, Microsoft, etc.

I did ALL the things you said.

You haven't even gone back and tried ONE of them.

Yet you continue to act as if you are right in there testing.


> the matter -- and whether or not they will take steps again in the future to
> force an increase in that "lock down".

Since I haven't been able to get you to state specifics on the present,
I am certainly NOT going to engage in a purely hypothetical discussion,
on this, or perhaps even other such things you may include.

Let's deal with reality before dealing with the other stuff, ok?


> Again, what I want is the ebook version of "net neutrality" -- I want to
> have an ebook reader applet which is independent of ebook publisher. I don't
> want to have to acquire and use a different ebook reader applet for each
> book I want to purchase -- or acquire freely on the internet.  There are WAY
> more internet sites offering interesting books and other publications than
> there are organizations willing to write applets for the iPad!

Again, _I_ have surfed to "normal" eBook sites and grabbed eBooks on an iPad.

I even told you about it. . . .

I'm not sure you are paying much attention, and this is very likely to be
my last message to you on this subject, perhaps for quite some time.


> >I thought you already had managed to "try these things without sending
> my $500 to Jobs for the privilege of *testing* his offering. . . .
>
> Sure, I borrowed a friend and his iPad for four hours of lack of success
> trying various approaches after which he ran away screaming....

Just because YOU can't drive a certain car, doesn't make it undriveable.

Four hours???  And you never managed to download ONE eReader App???

Something is seriously wrong here.

Either with the experimentation or the reporting of it, or both.


> >Then how did I manage to download them from the store?
>
> Sorry, again more Apple cuteness: there is the virtual "Apple Store" on the
> internet, and there is the bricks-and-mortar "Apple Store" at the Mall. I
> can download software from the virtual Apple Store to my desktop, but then I
> don't have a physical iPad to test it on.

You said you had one for four hours.

You never managed to do what most people do in the first ten minutes?

Including myself?


> Or I can go to the Mall where they have a physical iPad, but then I don't
> have permission to download and install applets from the virtual Apple
> Store. And I've used up my friendship for right now with the "bricks and
> mortar" friend who has a "bricks and mortar" iPad...

You are saying they won't let you test these features at the mall???

I have a strong feeling you didn't ask them for very much help.


> >You can do this with Goodreader.
>
> OK, good suggestion -- their website looks promising I will dig into it more
> -- thanks!

I did mention Goodreader earlier, did I not?

And Wattpad?

If this pretty much one-line reply had been forthcoming back then,
this just might have been a totally different conversation.

Think about honey vs vinegar. . .eh?


> >Times will change, people will jailbreak iPads and all. . . .
>
> I am hoping that the future OS in the works for iPad may make things less
> restrictive. Not personally interested in hacking anything to get increased
> access. Hacking to my taste is incompatible with creating texts for PG....

And just how do you think most of the great apps in history got started???

>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From lee at novomail.net  Mon Apr 19 11:07:50 2010
From: lee at novomail.net (Lee Passey)
Date: Mon, 19 Apr 2010 12:07:50 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS77D0C6767F8C1A9848416AE0B0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>	<4BCC7B05.5020506@yahoo.com>	<4BCC8601.3020408@novomail.net>
	<SNT120-DS77D0C6767F8C1A9848416AE0B0@phx.gbl>
Message-ID: <4BCC9BF6.8040409@novomail.net>

On 4/19/2010 11:26 AM, Jim Adcock wrote:

[snip]

> PS: I already make preview versions of my HTML on ePub and MOBI --
> it's just that the HTML->ePub and HTML->MOBI conversion software I
> have is not identical to Marcello's and thus the formatting ends up
> different than the "official" version.

If true, this is troubling. Because .epub is just a ZIP file, you should 
be able to open the archive in your favorite tool (WinZip, WinRAR,
7-Zip, PowerArchiver, whatever) or use unzip to extract all the
files. The HTML file(s) should be identical to whatever the source was. 
If they differ, the differences had better be harmless (making the 
source valid XHTML, for example). If they /do/ differ in substantive 
ways, Marcello should revisit his "publishing" code.

It is possible, however, that if an .epub file renders differently from
the source HTML, the archive contains a default stylesheet that alters
the appearance.
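Because an .epub is an ordinary ZIP archive, Lee's check can also be scripted. The following is a minimal Python sketch using only the standard library; it is not anything PG or Marcello actually runs, and the name inspect_epub is hypothetical. It reads META-INF/container.xml to find the .opf, then lists the (X)HTML pages and any stylesheets, which is where a default stylesheet like the one described above would show up.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/container.xml as defined by the OCF spec.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
PAGE_EXTS = (".htm", ".html", ".xhtml")


def inspect_epub(epub):
    """Return (opf_path, pages, stylesheets) for an .epub path or file object."""
    with zipfile.ZipFile(epub) as zf:
        names = zf.namelist()
        # container.xml is the one file whose location is fixed; it names the .opf.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
    pages = [n for n in names if n.lower().endswith(PAGE_EXTS)]
    stylesheets = [n for n in names if n.lower().endswith(".css")]
    return opf_path, pages, stylesheets
```

For a side-by-side look at the files themselves, zipfile.ZipFile("pg76.epub").extractall("pg76_epub") unpacks everything, just as unzip would (the filenames here are placeholders).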

BTW, to create a valid .epub file, start by creating an .opf file which
describes the publication. One extracted from an existing .epub file
should give you a good example of what is necessary. Then create a
container.xml file that references the .opf file you created. Put this
file in a subdirectory called "META-INF". Lastly, copy the mimetype
file from an existing .epub (it contains only the string
"application/epub+zip").

Now, add "mimetype" to a zip file, *without compression*. Then add the
.opf file, the content XHTML file(s), and META-INF/container.xml. Rename
the file to ".epub", and voilà, you have a valid .epub file. Of course
other files can be added as well (such as font files and stylesheets),
but they are just gilding the lily. The actual paths of the various
files are irrelevant except for mimetype, which must sit at the root of
the archive, and container.xml, which *must* be in the META-INF/ folder
(and of course the paths to the files must be correctly recorded in the
.opf file).

I think it is only polite to add the .opf file to the archive second, 
and to leave it uncompressed, but that is fairly uncommon. The OCF 
specification requires that the mimetype file be the first file in the 
archive (so it can always be found at a specific byte offset), but I 
know of no .epub reader that actually enforces this requirement.
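The recipe above can be sketched with Python's zipfile module. This is a minimal illustration under stated assumptions: the OEBPS/ folder and the content.opf name are arbitrary choices (only container.xml's location is fixed), the .opf text is supplied by the caller (for example, adapted from an existing .epub), and build_epub is a hypothetical helper. It writes mimetype first and stored, so the bytes "mimetypeapplication/epub+zip" land at byte offset 30, and it writes the .opf second and uncompressed, as suggested above.

```python
import zipfile

# container.xml must sit at META-INF/container.xml and point at the .opf.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(out, opf_xml, content):
    """Zip a minimal .epub.

    out:     output path or writable binary file object.
    opf_xml: text of the package document (assumed to be written by the caller).
    content: dict of {path relative to OEBPS/: text} for XHTML, CSS, etc.
    """
    with zipfile.ZipFile(out, "w") as zf:
        # 1. mimetype first and stored (uncompressed), so readers can sniff it
        #    at a fixed byte offset.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # 2. The .opf second and uncompressed: a courtesy, not a requirement.
        zf.writestr("OEBPS/content.opf", opf_xml, compress_type=zipfile.ZIP_STORED)
        # 3. container.xml in META-INF/, then the content files; these may be deflated.
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        for name, text in content.items():
            zf.writestr("OEBPS/" + name, text, compress_type=zipfile.ZIP_DEFLATED)
```

Note that an EPUB 2 validator such as epubcheck will generally also expect an NCX table of contents declared in the .opf, so a file built this bare may open in readers yet still draw validation complaints.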

From jimad at msn.com  Mon Apr 19 11:25:43 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 11:25:43 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC9527.3000103@novomail.net>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
	<4BCC9527.3000103@novomail.net>
Message-ID: <SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>

>In other words, an ".epub" file is just a ".zip" file with a few
>additional metadata files added. Software that purports to "convert"
>HTML to .epub should not do /anything/ to the source file, except
>perhaps to ensure that it is valid XHTML (for older HTML files). There
>is no need to validate an .epub conversion, as no conversion should have
>occurred. If a rendered .epub document does not look exactly like the
>same collection of files rendered by a browser from the file system, it
>is the fault of the .epub rendering software, not the "conversion."

You make an interesting thesis, which, rare in the case of DP/PG arguments,
is eminently testable.  I have done so, and you clearly have not.  Take a PG
HTML zip file, say "76" for the sake of completeness. Download it, and
unpack it on your computer.  Take a PG epub "zip" file, say pg76.epub for
concreteness.  Download it, and unpack it on your computer.

Now, look at the contents.

Do they have the same HTML files?

No they do not.

Do they have the same number of HTML files?

No they do not.

Are the contents of the HTML files identical?

No they are not.

For the sake of completeness, open the first HTML file of each.  Do the
files RENDER the same on your browser when you actually TRY them to see if
your thesis is correct?

No they do not RENDER the same.

It is an interesting thesis that PG epub files are "just" a zipped version
of the PG HTML files -- but it is a thesis easily demonstrated to be false.
Marcello's epub software does more than "just" pack the HTML files into an
epub package.  Ask him for a copy of his converter software, and see what
the conversion actually entails.  And/or ask Marcello what conversions he
actually does to move from the HTML version to the epub version.
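The comparison described above can be scripted so anyone can repeat it. The following is a minimal Python sketch, assuming the PG HTML zip and pg76.epub have already been downloaded locally (the filenames are placeholders, and compare_archives is a hypothetical helper). It hashes every .htm/.html/.xhtml member of each archive, then reports the file counts, the files present in only one archive, and the files whose bytes differ. Matching by basename is an assumption that ignores folder layout. Byte-level comparison also flags harmless changes such as XHTML cleanup, so a reported difference still needs a human look.

```python
import hashlib
import posixpath
import zipfile

HTML_EXTS = (".htm", ".html", ".xhtml")


def html_digests(archive):
    """Map each HTML member's basename to the SHA-256 of its bytes.

    Assumes basenames are unique within one archive; a later duplicate
    would overwrite an earlier one.
    """
    with zipfile.ZipFile(archive) as zf:
        return {
            posixpath.basename(name): hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
            if name.lower().endswith(HTML_EXTS)
        }


def compare_archives(html_zip, epub):
    """Report which HTML files differ between a PG HTML zip and an .epub."""
    a, b = html_digests(html_zip), html_digests(epub)
    return {
        "counts": (len(a), len(b)),
        "only_in_html_zip": sorted(a.keys() - b.keys()),
        "only_in_epub": sorted(b.keys() - a.keys()),
        "different_content": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Usage would be along the lines of compare_archives("76-h.zip", "pg76.epub"); an empty "different_content" list together with equal counts is what the "just a zipped version" thesis predicts.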

Thus again, I suggest that it would be a good idea to have a portable
version of Marcello's epub conversion software that we could use for testing
on our local machines.  Given a portable version of the epub conversion
software, going to mobi is easy using the same Amazon/Mobipocket provided
epub->mobi conversion software that Marcello is already using.


From hart at pglaf.org  Mon Apr 19 11:30:30 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Mon, 19 Apr 2010 11:30:30 -0700 (PDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
	<4BCC9527.3000103@novomail.net>
	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004191129450.7967@mail.pglaf.org>


If only Jim would have been as thorough, and polite, about the iPad.

mh



From lee at novomail.net  Mon Apr 19 12:05:17 2010
From: lee at novomail.net (Lee Passey)
Date: Mon, 19 Apr 2010 13:05:17 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <alpine.DEB.2.00.1004191129450.7967@mail.pglaf.org>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191129450.7967@mail.pglaf.org>
Message-ID: <4BCCA96D.4030803@novomail.net>

On 4/19/2010 12:30 PM, Michael S. Hart wrote:
>
> If only Jim would have been as thorough, and polite, about the iPad.
>
> mh

Hmmm, I thought he was ...


From jimad at msn.com  Mon Apr 19 12:14:14 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 12:14:14 -0700
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>
Message-ID: <SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>


>Apple has assured me over and over there is no DRM on our files.

And I said it wouldn't matter if they have DRM on your files or not as long
as they prevent you from moving the files from applet to applet, and/or
prevent you from sharing the files with your friends, because then they have
accomplished the same goals as DRM without actually implementing the DRM.

>Until you have at least tried the examples I went and found for you,
>that allowed ME to read AND download directly from gutenberg.org....

Well, I just spent about an hour reading all the manuals for goodreader --
the applet you most recommended -- and it talks about supporting PDF not
epub nor mobi and it talks about why when you try to read a big book it
crashes, and how if you want to set up wifi to talk to your computer then
that kills wifi to the internet, etc.  So forgive me if I am not impressed.
I also looked up everything else on the iTunes store listed under "ebooks"
or "books" and those apps are even weaker.  So clearly we are living on
non-parallel planets!

>I read all sorts of stuff with Safari and Opera Lite, what is your
>problem?

I told you the first thing I tried doing at the Apple brick-and-mortar
store was to go to PG in the Safari browser and click on an epub link, and it
said "Sorry downloading that file type is not allowed."


>Please experiment and cite your specific examples that we can recreate.

Please use your Safari browser, go to PG, pick an epub link, click on it,
and report back what happens on the iPad. When I tried this at the Apple
brick-and-mortar store, the iPad says "I'm sorry but I can't do that Hal".

For comparison, I go on my desktop using IE or Mozilla, click on an epub or
mobi link and that book opens automatically in its appropriate ebook reader,
just the same as clicking on a PDF file causes that file to open in Adobe
Reader.  Or clicking on a djvu file opens it in a LizardTech djvu ebook
reader.

For comparison, on Kindle I go to PG, I click on a mobi link and it says "Do
You Want to Download This Book?" I say "Yes" that book shows up in my Kindle
bookshelf, where I click on the book and read it any time I want.

>And you never tried to install those reader apps I mentioned. . . .
>So what right have you to complain?

I complain because every time you suggest something, I would have to spend my
$500 up front only to confirm that what I said doesn't work indeed doesn't
work.  If I spend the $500 and sure enough it doesn't work, are *you* going
to offer me my money back??? Sure, I know that iPad has Safari that can read
HTML, but I don't want to read HTML.  I want to read ePub or Mobi on a decent
ebook reader which will allow me to set things like font sizes and margins.

>What is it you want?!?!?!?
>You haven't SAID you want anything I haven't found for you.
>Yet you have refused to acknowledge those efforts.

I have checked them out and at least according to their own documentation
they don't work. What I want is a slate like device with wifi where I can
download epubs and mobis from the internet or from my intranet, read them,
perhaps lightly edit or annotate them, and I want to be able to do so as
seamlessly and as painlessly as from my netbook -- given that a slate is
simply a netbook minus the keyboard.

>Are you saying you sent to gutenberg.org and tried this without success?

Yes.

>Are you telling us what program you used in that effort?

Safari

>Are you willing to do what it takes to get what you want?

I already have done so three different ways:

1) Using a desktop.
2) Using a netbook.
3) Using a Kindle.

The question then is NOT whether I can find iPad "workaround" to get to some
subset of what someone might be doing somewhere in the ebook world.  The
question is whether or not there is some iPad reader app that allows at
least as good and as complete an experience as I am already experiencing via
1) 2) 3) above. 1) has the problem that it's not portable. 2) has the problem
that it has a keyboard that gets in the way. 3) has the problem that it has
slow and unreliable Whispernet rather than fast and reliable wifi.  Is iPad
better?  Presumably not, or you would not keep emphasizing work-arounds.
Perhaps when HP comes out with the Slate it will be "unlocked."  Perhaps
not.  But I'm not going to pay $500 for the privilege of hack work-arounds!

>You still have refused to name what programs you tried on what sites,
>and what you tried to do with them.

I think I've told you, actually. When I say I used the web browser, I think
it's pretty obvious that the web browser on iPad is Safari?  I told you we
used iBooks, because we both discussed the PG limitations of what is there.
I told you we tried Stanza, because I told you about the large blurry iPod
simulator that brought up.  I told you I spent an hour reading the
Goodreader documentation about crashes and having to reconfigure one's
computer and router to either support reading from the internet or from a
local computer, and having to reconfigure to switch between the two...

>Tell me, honestly, have you asked Apple for the documentation on how to
>write for the iPad?

I have researched the issue of developing for Apple, yes, and was turned off
by having to pay subscription fees up front.  Even Big Bill doesn't require
that.

>It's not the "programs" or the "ebooks" you are complaining about,
>it's "how badly 'locked down' their device is". . .!!!

Same thing since they lock the books to the programs...

>You haven't even gone back and tried ONE of them.

Again, how would I test them more than I have already tested them without
spending my $500 up front?

>Let's deal with reality before dealing with the other stuff, ok?

The reality is that people had bought iPods using Stanza and expecting to be
able to share books, and Apple took this away from them.  Same "1984" kind of
deal as the student who had purchased "1984" for his Kindle, was relying
on it to do his homework, and without warning Amazon removed the
purchased book without permission.

>Four hours???  And you never managed to download ONE eReader App???

Sure we did, I told you we downloaded Stanza.

>I have a strong feeling you didn't ask them for very much help.

There wasn't much help to be had, truth be told.  I will go back and see if
they will allow me to install Goodreader, since that is your top suggestion.

>I did mention Goodreader earlier, did I not?

Perhaps, but you didn't mention that it could download directly from any
particular website, in fact you have said repeatedly you don't care if it
can download from any particular website.

>And just how do you think most of the great apps in history got started???

Most of them got started somewhere where a mere say-so from Steve Jobs isn't
enough to get them *stopped!!!*


From Bowerbird at aol.com  Mon Apr 19 12:18:38 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 19 Apr 2010 15:18:38 EDT
Subject: [gutvol-d] the blind men and the .epub file-format
Message-ID: <8ce40.74ecf6f2.38fe068e@aol.com>

it's sad y'all know so much, yet so little, all at the same time.

lee is correct.   but he doesn't know what he's talking about.

jim is correct.   but he doesn't know what he's talking about.

as lee says, an .epub file is just some (x)html files zipped up.

as jim says, the .epub files at p.g. often differ from the .html.

what neither one seems to know is that marcello's converter
doesn't always use the .html; sometimes it uses the .txt file.
i don't know the particulars, but it probably has something
to do with the nature of the specifics within the .html file...

-bowerbird

From Bowerbird at aol.com  Mon Apr 19 12:29:15 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 19 Apr 2010 15:29:15 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <470.723ebfcf.38fe090b@aol.com>

jim said:
>    I want to read ePub or Mobi on a decent ebook reader 
>    which will allow me to set things like font sizes and margins.

try ibisreader.

>    http://ibisreader.com

you can try it on your desktop machine to see if you like it.
and it's on the ipad.   plus it is under _active_ development.

i don't know if "eucalyptus" is ipad-native yet, but when it is,
i would definitely recommend that as a worthwhile viewer-app.
(it has shortcomings, but from a reading perspective, it's fine.)

-bowerbird

From jimad at msn.com  Mon Apr 19 12:48:46 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 19 Apr 2010 12:48:46 -0700
Subject: [gutvol-d] Re: the blind men and the .epub file-format
In-Reply-To: <8ce40.74ecf6f2.38fe068e@aol.com>
References: <8ce40.74ecf6f2.38fe068e@aol.com>
Message-ID: <SNT120-DS1481CF1FB8469BB4317E08AE0B0@phx.gbl>

>what neither one seems to know is that marcello's converter doesn't always
>use the .html; sometimes it uses the .txt file.
>i don't know the particulars, but it probably has something to do with the
>nature of the specifics within the .html file...

Not sure what part of the elephant you've grabbed hold of, but if you looked
at the example in question it would be obvious that your answer isn't.


From lee at novomail.net  Mon Apr 19 13:18:17 2010
From: lee at novomail.net (Lee Passey)
Date: Mon, 19 Apr 2010 14:18:17 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>
	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
Message-ID: <4BCCBA89.2040103@novomail.net>

On 4/19/2010 12:25 PM, Jim Adcock wrote:

>> In other words, an ".epub" file is just a ".zip" file with a few
>> additional metadata files added. Software that purports to "convert"
>> HTML to .epub should not do /anything/ to the source file, except
>> perhaps to insure that it is valid XHTML (for older HTML files). There
>> is no need to validate an .epub conversion, as no conversion should have
>> occurred. If a rendered .epub document does not look exactly like the
>> same collection of files rendered by a browser from the file system, it
>> is the fault of the .epub rendering software, not the "conversion."
>
> You make an interesting thesis, which, rare in the case of DP/PG arguments,
> is eminently testable.  I have done so, and you clearly have not.  Take a PG
> HTML zip file, say "76" for the sake of completeness. Download it, and
> unpack it on your computer.  Take a PG epub "zip" file, say pg76.epub for
> concreteness.  Download it, and unpack it on your computer.
>
> Now, look at the contents.
>
> Do they have the same HTML files?

Yes, they do. The file names have been altered, but the content is 
virtually the same.
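Jim's "download, unpack, and look" test can be partly automated. The Python sketch below lists the HTML members of each archive and compares only their counts, because the .epub renames the files. The suffix list and the local file names in the usage line are assumptions.

```python
import zipfile

# Extensions treated as HTML; an assumption, adjust if an archive uses others.
HTML_SUFFIXES = (".htm", ".html", ".xhtml")

def html_members(archive):
    """Sorted names of the HTML/XHTML files inside a .zip or .epub archive."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(HTML_SUFFIXES))

def same_html_count(html_zip, epub):
    """True if both archives hold the same number of HTML files.

    Names are deliberately not compared, since the .epub renames them.
    """
    return len(html_members(html_zip)) == len(html_members(epub))
```

Usage, with hypothetical local file names: `same_html_count("76-h.zip", "pg76.epub")`.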

[snip]

> Do they have the same number of HTML files?

Yes they do. Each has eight parts plus the godawful and legally 
unnecessary PG header (Apple is doing the world a favor by stripping it 
away.)

[snip]

> Are the contents of the HTML files identical?
>
> No they are not.

No, they are not. Mr. Perathoner's files 1.) have been converted from 
ISO-8859 to Unicode/UTF-8; 2.) have had their internal style sheets 
extracted into external style sheets; 3.) have had links added to a 
stylesheet that centers the contents pages and to a generic "pgepub" 
stylesheet; 4.) have had "id" attributes added for use by .epub user 
agents for navigation; and 5.) have had all the internal links changed 
to match the file paths inside his archive.
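Steps 1 and 2 from that list can be sketched in Python as follows. The sketch decodes ISO-8859-1, updates the charset declaration, and moves the `<style>` blocks into separate CSS text linked from `<head>`. The regexes are a shortcut a real converter would replace with an HTML parser. The stylesheet name `pgstyle.css` is hypothetical. This is not Mr. Perathoner's code.

```python
import re

STYLE_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)

def externalize_styles(latin1_bytes, css_href="pgstyle.css"):
    """Return (html_text, css_text); write html_text back out as UTF-8.

    css_href is a hypothetical file name for the extracted stylesheet.
    """
    html = latin1_bytes.decode("iso-8859-1")            # step 1: bytes -> Unicode text
    html = re.sub(r"charset=iso-8859-1", "charset=utf-8", html, flags=re.IGNORECASE)
    css = "\n".join(block.strip() for block in STYLE_RE.findall(html))
    html = STYLE_RE.sub("", html)                       # step 2: drop inline <style>
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}"/>'
    # Put the link just before </head>; a lambda avoids escape handling in the replacement.
    html = re.sub(r"</head>", lambda m: link + m.group(0), html, count=1,
                  flags=re.IGNORECASE)
    return html, css
```

Neither step changes the text the reader sees, which supports Lee's view that they are harmless.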

All of these steps, except #3, are harmless and do not affect the 
presentation of the content. Indeed, with the exception of centering the 
tables they are probably all desirable things to do.

> For the sake of completeness, open the first HTML file of each.  Do the
> files RENDER the same on your browser when you actually TRY them to see if
> your thesis is correct?
>
> No they do not RENDER the same.

First of all, it is your thesis not mine. I rarely, if ever, download 
files from PG; instead I get them from some other source where the 
quality of the files has more importance.

But you are correct, with an unaltered archive they do /not/ render the 
same. However, if you delete the "pgepub.css" file, or delete its 
contents, they /do/ render the same with the exception of the centered 
tables of contents. If you delete all the odd numbered .css files, then 
they /do/ render identically.

This is, of course, exactly why embedding style information inside an 
HTML file is a bad thing (you can't change the presentation without 
editing the HTML) and including a link to a generic stylesheet is a good 
thing (just find the stylesheet you like, copy it over the top of the 
generic one, and voilà, your book, your way). All of this can be
accomplished by using a visual zip tool, and without ever having to edit 
a file (other than your zipper).
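You can also "copy your own stylesheet over the generic one" without unpacking by hand. Because zip archives cannot be edited in place, the Python sketch below copies the .epub entry by entry and substitutes new bytes for one member. Passing empty bytes reproduces Lee's "delete its contents" test. The member path is an assumption; check `namelist()` for the real one.

```python
import zipfile

def replace_member(src, dst, member, new_data=b""):
    """Copy an .epub from src to dst, substituting new_data for one member.

    member must match the path inside the archive, e.g. something like
    "OEBPS/pgepub.css" (an assumed path; list the archive to find it).
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # infolist() keeps the original order, so "mimetype" stays first, and
        # reusing each ZipInfo keeps its original compression (stored for mimetype).
        for info in zin.infolist():
            data = new_data if info.filename == member else zin.read(info.filename)
            zout.writestr(info, data)
```

Usage, with hypothetical names: `replace_member("pg76.epub", "pg76-mine.epub", "OEBPS/pgepub.css", open("my.css", "rb").read())`.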

Although we definitely need to talk Mr. Perathoner out of adding a link 
to a "center me" style sheet.

> It is an interesting thesis that PG epub files are "just" a zipped version
> of the PG HTML files -- but it is an easily demonstrably false thesis.

I never said that /PG/ .epub files are just a zipped version of /PG/ 
HTML files; I said that technically conforming .epub files are just 
zipped versions of their source HTML files. It is certainly possible to 
take an HTML file, alter it, and make an .epub file from the newly 
altered file. Personally, I would view that as a flaw in the conversion 
software, though, and independent of the issue of .epub encapsulation.
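One way to check whether a given converter "altered" a file is to diff the source HTML against its copy inside the .epub. The following is a minimal Python sketch using the standard `difflib` module. An empty result means the file was copied verbatim. Reading the two files (and decoding an ISO-8859-1 source correctly) is left to the caller.

```python
import difflib

def show_alterations(source_text, epub_text, source_name="source", epub_name="epub"):
    """Return unified-diff lines; an empty list means no alteration."""
    return list(difflib.unified_diff(
        source_text.splitlines(), epub_text.splitlines(),
        fromfile=source_name, tofile=epub_name, lineterm=""))
```

Added lines such as an extra `<div class="c1">` wrapper would show up with a leading `+`.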

> Marcello's epub software does more than "just" pack the HTML files into an
> epub package.  Ask him for a copy of his converter software, and see what
> the conversion actually entails.  And/or ask Marcello what conversions he
> actually does to move from the HTML version to the epub version.

True. Apparently, Mr. Perathoner's software extracts embedded CSS 
information and moves it to an external style sheet (as it should), 
creates a "<div class='c1'>" around the tables of contents and 
illustrations, with a corresponding style sheet that centers the 
contents (which it should not), and adds a link to a generic "pgepub" 
style sheet (as it should), in addition to altering names for navigation 
purposes.

Now apparently, your complaint is not that PG HTML does not make good 
.epub files, or that including a generic stylesheet "breaks" the 
".epub", but that you don't like the .epub generator that Mr. Perathoner 
wrote. That complaint, with which I sympathize, needs to be directed to 
him individually; it cannot, however, be generalized to /all/ .epub 
files, only those created by his software.

> Thus again, I suggest that it would be a good idea to have a portable
> version of Marcello's epub conversion software that we could use for testing
> on our local machines.  Given a portable version of the epub conversion
> software going to mobi is easy using the same Amazon/Mobipocket provided
> epub->mobi conversion software that Marcello is already using.

From Bowerbird at aol.com  Mon Apr 19 13:49:32 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 19 Apr 2010 16:49:32 EDT
Subject: [gutvol-d] Re: the blind men and the .epub file-format
Message-ID: <6fba.24226b0f.38fe1bdc@aol.com>

jim said:
>   Not sure what part of the elephant you've grabbed hold of, 
>    but if you looked at the example in question it would be 
>    obvious that your answer isn't.

do you really think i'm going to let myself get wrestled into
having a discussion with the blind men about the elephant?

if so, think again.

i said as much as i can in support of you -- you're half-right.
so is lee...   if you want to figure out the specifics from there,
you're welcome to do so.   but you'll be doing that without me.

-bowerbird

From hart at pglaf.org  Mon Apr 19 14:37:38 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Mon, 19 Apr 2010 14:37:38 -0700 (PDT)
Subject: [gutvol-d] Re: [SPAM] RE:  Re: Typesetting
In-Reply-To: <SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>
	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>


On Mon, 19 Apr 2010, Jim Adcock wrote:

>
> >Apple has assured me over and over there is no DRM on our files.
>
> And I said it wouldn't matter if they have DRM on your files or not as long
> as they prevent you from moving the files from applet to applet, and/or
> prevent you from sharing the files with your friends, because then they have
> accomplished the same goals as DRM without actually implementing the DRM.

This is not where you started, and I doubt it's where either you or iPad end.


> >Until you have at least tried the examples I went and found for you,
> >that allowed ME to read AND download directly from gutenberg.org....
>
> Well, I just spent about an hour reading all the manuals for goodreader --
> the applet you most recommended -- and it talks about supporting PDF not
> epub nor mobi and it talks about why when you try to read a big book it

Is The Bible a big enough book for you?  Shall I test that for you?

Once again, I repeat, I don't care about any one particular format,
so please stop pretending this is a valid topic with me; instead, I
ask you to take it to a different subject header so I do not have
to keep getting bonked over the head with it, however cartoonish.


> crashes, and how if you want to set up wifi to talk to your computer then
> that kills wifi to the internet, etc.

And just how many systems do you have, or know of, that wifi to two wifi
spots at the same time???

I have never even tried this, so I am very interested.

I hope it isn't just one more dead herring dragged across this pathway--
a pathway that still continues to weave madly across time zones.


> So forgive me if I am not impressed. I also looked up everything else on the
> iTunes store listed under "ebooks" or "books" and those apps are even
> weaker.  So clearly we are living on non-parallel planets!

Sorry, reading the manual and looking up Apps does not qualify you with
any actual experience on the subject.

Not to mention that YOU did not mention, once again, the actual NAME of
the products you are claiming so much expertise about.

I'm willing to bet you can't even name the handful I downloaded, let
alone the entire list available.

This would indicate you don't actually know the options available for a
selection of programs, not to mention those that are not spelled out in
detail in what you say you have been reading.

One of my favorite quotes of all time comes to mind here:

"Don't Confuse The Map With The Territory."


> >I read all sorts of stuff with Safari and Opera Lite, which is your
> >problem?
>
> I told you the first thing I tried doing at the Apple bricks and mortar
> store was to go to PG in the Safari Browser, clicked on an epub link, and it
> says "Sorry downloading that file type is not allowed."

And you didn't go any farther with your experimentation?

I tried other options, got other results.

However, I will try what you said, as well.


> >Please experiment and cite your specific examples that we can recreate.
>
> Please use your Safari browser, go to PG, pick an epub link, click on it,
> and report back what happens on the iPad. When I tried this at the Apple
> Brick and Mortar store iPad says "I'm sorry but I can't do that Hal".

I have no desire to go to any Apple Brick and Mortar store, but I will try
from general wifi hookups.

Nevertheless, if, at the end of all this, you send me a list of questions,
comments, complaints, etc., I will try to go get you answers from Apple.

Fair enough?


> For comparison, I go on my desktop using IE or Mozilla, click on an epub or
> mobi link and that book opens automatically in its appropriate ebook reader,
> just the same as clicking on a PDF file causes that file to open in Adobe
> Reader.  Or clicking on a djvu file opens it in a LizardTech djvu ebook
> reader.

> For comparison, on Kindle I go to PG, I click on a mobi link and it says "Do
> You Want to Download This Book?" I say "Yes" that book shows up in my Kindle
> bookshelf, where I click on the book and read it any time I want.

And hasn't Apple made it totally obvious you can't do that with an iPad???

Yet you continue to complain that they have Apple when you want Orange???

Yet, I have found plenty of ways to get that kind of end result.

No matter what your reading of manuals and reports might have said.

BTW, I haven't been able to make the iPad crash yet, even with The Bible.


> >And you never tried to install those reader apps I mentioned. . . .
> >So what right have you to complain?


About the below:

Now you resort to putting words in my mouth, like in high school days.

I have never tried to "suggest something where [you] have to spend [your]
$500 up front only to. . ." particularly when you have made it obvious
that your mind is closed to a variety of options.

> I complain because every time you suggest something where I have to spend my
> $500 up front only to determine that indeed what I said doesn't work doesn't
> work.  If I spend the $500 and sure enough it doesn't work are *you* going
> to offer me my money back??? Sure I know that iPad has Safari that can read
> HTML but I don't want to read HTML.  I want to read ePub or Mobi on a decent
> ebook reader which will allow me to set things like font sizes and margins.

You never figured out how "to set things like font sizes and margins?"

I'm beginning to wonder just what you did with your four hours. . . .


> >What is it you want?!?!?!?
> >You haven't SAID you want anything I haven't found for you.
> >Yet you have refused to acknowledge those efforts.
>
> I have checked them out and at least according to their own documentation

No, you haven't. . .not what most people mean. . .you never went back to
see how reality compares with the docs.


> they don't work. What I want is a slate like device with wifi where I can
> download epubs and mobis from the internet or from my intranet, read them,
> perhaps lightly edit or annotate them, and I want to be able to do so as
> seamlessly and as painlessly as from my netbook -- given that a slate is
> simply a netbook minus the keyboard.

Ah, now, at this late stage, you have added that you want to edit eBooks
on the iPad.


> >Are you saying you sent to gutenberg.org and tried this without success?
>
> Yes.
>
> >Are you telling us what program you used in that effort?
>
> Safari
>
> >Are you willing to do what it takes to get what you want?
>
> I already have done so three different ways:
>
> 1) Using a desktop.
> 2) Using a netbook.

Not any different in this respect, just padding your bibliography.


> 3) Using a Kindle.

If you spent the same four hours' worth on a Kindle, and liked it
so much you were already programming with it, I can't imagine why
you are having this conversation at all.

Unless it is just to moan and complain in front of an audience to
somehow "get even" with Apple for being. . .well. . .Apple.

You like Microsoft and Kindle. . .go. . .Bon Voyage!!!


> The question then is NOT whether I can find iPad "workaround" to get to some
> subset of what someone might be doing somewhere in the ebook world.  The

Sorry, but that is pretty much the entire essence of running computers.

I'm betting you have just forgotten the steep learning curve you climbed
to get to know the ones you now say you like.

I'll bet you ranted and raved about them just like you are doing now!!!


I did.


;-)


> question is whether or not there is some iPad reader app that allows at
> least as good and as complete an experience as I am already experiencing via
> 1) 2) 3) above. 1) has the problem that it's not portable. 2) has the problem
> that it has a keyboard that gets in the way. 3) has the problem that it has
> slow and unreliable whispernet rather than fast and reliable wifi.  Is iPad
> better?  Presumably not, or you would not keep emphasizing work-arounds.
> Perhaps when HP comes out with the Slate it will be "unlocked."  Perhaps
> not.  But I'm not going to pay $500 for the privilege of hack work-arounds!

Sure you are!

You do it every time you buy a computer, or somebody pays for one for you to use.

It's all built on that sort of thing.

Get used to it.

Don't ever look under the hood; you will be terribly disappointed by the
plethora of "hacks and workarounds" that make every bit of this work.


> >You still have refused to name what programs you tried on what sites,
> >and what you tried to do with them.
>
> I think I've told you, actually. When I say I used the web browser, I think
> it's pretty obvious that the web browser on iPad is Safari?  I told you we
> used iBooks, because we both discussed the PG limitations of what is there.

At first you denied using iBooks at all, don't you remember???


> I told you we tried Stanza, because I told you about the large blurry iPod

Actually, you spoke of that as if it were a hypothetical, so there was quite
literally no way to know you had actually tried it or not, or were reporting
once again what various manuals and reviews told you.


> simulator that brought up.  I told you I spent an hour reading the
> Goodreader documentation about crashes and having to reconfigure one's
> computer and router to either support reading from the internet or from a
> local computer, and having to reconfigure to switch between the two...

Same with your cell phone and most other such devices.

However, if all your systems use plain 802.11n, or 802.11g, it should be no problem,
it certainly hasn't been for me.

>
> >Tell me, honestly, have you asked Apple for the documentation on how to
> >write for the iPad?
>
> I have researched the issue of developing for Apple, yes, and was turned off
> having to pay subscription fees up front.  Even Big Bill doesn't require
> that.

So, you are admitting you never asked for what you didn't get.

"You Never Know What You Might Get If You Don't Ask For It."


> >It's not the "programs" or the "ebooks" you are complaining about,
> >it's "how badly 'locked down' their device is". . .!!!
>
> Same thing since they lock the books to the programs...

No, it's not the same thing.  Learn to speak specifically when you
say such things, ask such questions, etc.

Otherwise you are just wasting a lot of people's time.


> >You haven't even gone back and tried ONE of them.
>
> Again, how would I test them more than I have already tested them without
> spending my $500 up front?

Gee, I would think that obvious to someone who already did it once before.


> >Let's deal with reality before dealing with the other stuff, ok?
>
> The reality is that people had bought iPods using Stanza and expecting to be
> able to share books and Apple took this away from them.  Same "1984" kind of
> deal as the student who had purchased "1984" for their Kindle, was relying
> on that to do his homework, and without warning Amazon took off the
> purchased book without permission.

You seem to be bringing up something new, and of great interest.

I'm sure we'd all like to hear more about this!!!!!!!


> >Four hours???  And you never managed to download ONE eReader App???
>
> Sure we did, I told you we downloaded Stanza.

For the iPad specifically!

Now, just above, you said you were using iBooks, doesn't that count?


> >I have a strong feeling you didn't ask them for very much help.
>
> There wasn't much help to be had, truth be told.  I will go back and see if
> they will allow me to install Goodreader, since that is your top suggestion.

You keep short-changing Wattpad, which I think I mentioned first.

When at the Apps Store, just search for "ebooks" and "books" etc.

How many times have I said that???


> >I did mention Goodreader earlier, did I not?
>
> Perhaps, but you didn't mention that it could download directly from any
> particular website, in fact you have said repeatedly you don't care if it
> can download from any particular website.

You download it from the Apps Store. . . .


> >And just how do you think most of the great apps in history got started???
>
> Most of them got started somewhere where a mere say-so from Steve Jobs isn't
> enough to get them *stopped!!!*

No, they just worked around their current version of Steve Jobs, such as
working around IBM, then Apple, then Microsoft, and Intel, and AMD, Sony
and all the rest. . . .


From lee at novomail.net  Mon Apr 19 14:51:13 2010
From: lee at novomail.net (Lee Passey)
Date: Mon, 19 Apr 2010 15:51:13 -0600
Subject: [gutvol-d] Re: the blind men and the .epub file-format
In-Reply-To: <8ce40.74ecf6f2.38fe068e@aol.com>
References: <8ce40.74ecf6f2.38fe068e@aol.com>
Message-ID: <4BCCD051.7060508@novomail.net>

On 4/19/2010 1:18 PM, Bowerbird at aol.com wrote:

[snip]

> what neither one seems to know is that marcello's converter
> doesn't always use the .html; sometimes it uses the .txt file.
> i don't know the particulars, but it probably has something
> to do with the nature of the specifics within the .html file...

bb is correct when he suggests that sometimes the .epub file is 2 
generations removed from the Impoverished Text file. If there is no 
hand-crafted HTML file, there is an option to download a 
computer-generated HTML file.

If you were to download an .epub file for one of these texts for which 
only ITF is stored (I used _War of the Worlds_, etext 35) you would see 
that the internal HTML differs from the computer-generated HTML only by 
the fact that the computer-generated HTML contains the metadata in 
<meta> elements whereas the .epub contains the metadata in the 
content.opf file, and by the fact that the .epub file contains a link to 
"pgepub.css" whereas the computer-generated HTML does not (why not? what 
harm would it do? For that matter, why not leave the metadata in the 
HTML file as well?).
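To see where an .epub keeps the metadata that the computer-generated HTML carries in `<meta>` elements, here is a short Python sketch. It finds the OPF file through `META-INF/container.xml` and collects its Dublin Core (`dc:`) elements. The namespaces are the standard OCF/OPF 2.0 ones. Returning a dict of lists is just a design choice for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def epub_metadata(epub_path):
    """Return {dc element name: [values]} from an .epub's OPF metadata."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    meta = {}
    dc_prefix = "{" + NS["dc"] + "}"
    for el in opf.find("opf:metadata", NS):
        if el.tag.startswith(dc_prefix):        # skip non-DC children such as <meta>
            meta.setdefault(el.tag[len(dc_prefix):], []).append((el.text or "").strip())
    return meta
```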

Presumably .epub generation is a linked process whereby ITF is converted 
to HTML which is then encapsulated in the OCF. Because native HTML is 
relatively uncommon at PG, I would guess that most .epub files start the 
process as ITF.
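The first link in that presumed chain (ITF to HTML) might look roughly like this Python sketch, which turns blank-line-separated paragraphs into escaped `<p>` elements in an XHTML skeleton. It is a hypothetical simplification. PG's actual text-to-HTML generator surely handles headings, indentation, and more, and nothing here reproduces its output.

```python
import html
import re

def text_to_html(plain_text, title="Untitled"):
    """Hypothetical ITF-to-HTML step: each blank-line-separated block becomes a <p>."""
    text = plain_text.replace("\r\n", "\n")
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        # Re-join hard-wrapped lines with single spaces and escape &, <, >, quotes.
        "<p>" + html.escape(" ".join(line.strip() for line in p.splitlines()
                                     if line.strip())) + "</p>"
        for p in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```

The result would then be packaged into the OCF container along with the metadata files.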

From hart at pglaf.org  Mon Apr 19 17:35:43 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Mon, 19 Apr 2010 17:35:43 -0700 (PDT)
Subject: [gutvol-d] Dim View
In-Reply-To: <SNT120-DS1481CF1FB8469BB4317E08AE0B0@phx.gbl>
References: <8ce40.74ecf6f2.38fe068e@aol.com>
	<SNT120-DS1481CF1FB8469BB4317E08AE0B0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004191733520.23510@mail.pglaf.org>


I, myself, am taking a dim view of some of these conversations,
in re: that I am perhaps taking them more seriously than deserved.

So, unless there are some requests for further comments, I intend
my future comments to be more limited in seriousness and scope.


mh


From marcello at perathoner.de  Tue Apr 20 04:10:28 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 20 Apr 2010 13:10:28 +0200
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCCBA89.2040103@novomail.net>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
	<4BCCBA89.2040103@novomail.net>
Message-ID: <4BCD8BA4.6070803@perathoner.de>

Lee Passey wrote:

> creates a "<div class='c1'>" around the tables of contents and 
> illustrations, with a corresponding style sheet that centers the 
> contents (which it should not), 

HTML Tidy does that. Direct your complaints to the w3c.


-- 
Marcello Perathoner
webmaster at gutenberg.org

From tunelera at yahoo.com  Tue Apr 20 07:55:37 2010
From: tunelera at yahoo.com (Julia C. Miller)
Date: Tue, 20 Apr 2010 09:55:37 -0500
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCC9478.5040204@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>	<4BCC7B05.5020506@yahoo.com>
	<4BCC9478.5040204@perathoner.de>
Message-ID: <4BCDC069.2050608@yahoo.com>


On 4/19/2010 12:35 PM, Marcello Perathoner wrote:
> Julia C. Miller wrote:
>
>> In order for a "paradigm shift" to happen at DP, PG has to define 
>> what is and is not acceptable in the HTML and spell it out so that DP 
>> can put it into practice. 
>
> It would be much better if DP did that.
>
So after DP goes through the time and effort to define the standards to 
upload to PG, people from PG can say "No, that's not what we want"?

>
>> It would also be extremely helpful to have a way to preview the 
>> different output formats so we can test our finished HTML and make 
>> sure it works properly not only as HTML but also as the source for 
>> the other formats.
>
> Roger Frank has the converter and did extensive testing on it.
>

Yes, Roger has the converter and his discussion of the changes that need 
to be made so the conversion to ePub works properly was very helpful. I 
used what I learned in that thread in the last 8 books that I have 
uploaded. But I am working on books right now that I know will not 
convert properly (based on what I have learned from Roger's discussion). 
I would like to be able to preview, change the coding and preview again 
until I find a satisfactory solution.


From marcello at perathoner.de  Tue Apr 20 08:26:44 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Tue, 20 Apr 2010 17:26:44 +0200
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCDC069.2050608@yahoo.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>	<4BCC7B05.5020506@yahoo.com>	<4BCC9478.5040204@perathoner.de>
	<4BCDC069.2050608@yahoo.com>
Message-ID: <4BCDC7B4.9080605@perathoner.de>

Julia C. Miller wrote:

> So after DP goes through the time and effort to define the standards to 
> upload to PG, people from PG can say "No, that's not what we want"?

I don't see any danger of that as long as the new standards are more 
restrictive than the old ones. And they'd have to be a lot more 
restrictive to be worth the trouble of implementing them.


> Yes, Roger has the converter and his discussion of the changes that need 
> to be made so the conversion to ePub works properly was very helpful. I 
> used what I learned in that thread in the last 8 books that I have 
> uploaded. But I am working on books right now that I know will not 
> convert properly (based on what I have learned from Roger's discussion). 
> I would like to be able to preview, change the coding and preview again 
> until I find a satisfactory solution.

The sources are online ... But since I am a 100% linux shop, ibiblio 
is a 100% linux shop, and 99% of you want windows software, somebody 
has to take the time to port it.

OTOH the converter is just one link in the chain. You'd also have to 
test the ePub on every reader out there.

It's much easier to forget about fancy formatting and use only the 
simplest HTML constructs.
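For a preview-edit-preview loop like the one Julia describes, one cheap first check is to flag every tag outside a chosen "simple" subset. The whitelist below is a hypothetical example, not a PG or DP standard. The sketch uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical whitelist of "simplest HTML constructs"; no official list is implied.
SIMPLE_TAGS = {"html", "head", "title", "meta", "link", "style", "body",
               "h1", "h2", "h3", "h4", "p", "br", "hr", "em", "i", "b", "strong",
               "a", "img", "div", "span", "blockquote", "ul", "ol", "li"}

class TagAudit(HTMLParser):
    """Records the line numbers of start tags that are not in SIMPLE_TAGS."""

    def __init__(self):
        super().__init__()
        self.unexpected = {}

    def handle_starttag(self, tag, attrs):
        if tag not in SIMPLE_TAGS:
            line, _offset = self.getpos()
            self.unexpected.setdefault(tag, []).append(line)

def audit(html_text):
    """Return {tag: [line numbers]} for every tag outside the whitelist."""
    parser = TagAudit()
    parser.feed(html_text)
    parser.close()
    return parser.unexpected
```

This does not replace testing on real readers, as Marcello notes. It only points at the constructs most likely to cause trouble.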


-- 
Marcello Perathoner
webmaster at gutenberg.org

From dakretz at gmail.com  Tue Apr 20 10:38:01 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 20 Apr 2010 10:38:01 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCDC7B4.9080605@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
	<4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de>
	<4BCDC069.2050608@yahoo.com> <4BCDC7B4.9080605@perathoner.de>
Message-ID: <q2t627d59b81004201038i8829e3a6l82b51198c08ca60b@mail.gmail.com>

Roger is also a 100% linux shop. Well, that may not be entirely true - he
probably uses Solaris and various unices.


From jimad at msn.com  Tue Apr 20 11:41:02 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 11:41:02 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <alpine.DEB.2.00.1004191129450.7967@mail.pglaf.org>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191129450.7967@mail.pglaf.org>
Message-ID: <SNT120-DS17689D5DA8EADF9A4E6ACAE0A0@phx.gbl>

>If only Jim would have been as thorough, and polite, about the iPad.

I don't know what your problem is, but per your suggestions I went back to
the Apple "brick and mortar" store yesterday, and spent an additional three+
hours researching all the iPad suggestions you made.  None of them "worked"
as you suggested.  None of them lets one download and read ebook files,
either ePub or MOBI, directly from the internet over wifi.  None of them
gives access to the most recent books released in ePub format on PG.  What
almost all of them do is give access to an internet server tied to that
particular applet, which lets one read some subset of the PG offerings in
some degraded form, typically a slightly spruced up "pretty print" version
of a text file.  This is simple to check: search for one of the latest PG
books, and you will find that NONE of these applets offers it.

Unlike most full-function web browsers on desktops or even netbooks, if one
uses the provided Safari web browser to click on an ePub or MOBI file, Safari
says "You cannot download that file."  You CAN use Safari to open a PDF
file, which makes iPad useful for reading Google Books PDF "photocopies" of
books.  Just not useful for what PG offers.  Also, even IF you use Safari to
open a PDF file, Safari apparently does not retain a copy of that book,
because when I run Safari again it re-downloads that PDF file from scratch.
Why does one care?  Well, that's slow, and it means that this doesn't work
in "airplane mode", i.e. you cannot read the PDF file on an airplane via
Safari with the wifi turned off.  The ability to actually download and save
book files is a fundamental feature of "real" ebook readers, and of the
design of "real" ebook formats such as ePub and MOBI: when you download a
book, you HAVE that book, and can then read it wherever and whenever you
choose without requiring a wifi or wired connection.

What the Apple manuals say (finally released yesterday and read last night)
is that one is allowed to transfer free book files in ePub format via USB
cable from your desktop to your iPad via iTunes.  You download a free ePub
book to your desktop.  From there you transfer it to the iTunes software.
Then you hook up your iPad using a USB cable.  Then you sync to iTunes.  Then you
safely unplug your iPad.  Then you open iBooks.  Then you find the new book
in the iBooks "shelf" which you can then finally click on and start reading.


Compare that to one click on a link to a free ePub or MOBI book, say at PG,
using a netbook browser, which downloads the file, stores it, opens the
reader app, and you are up and reading.  1 second versus 10 minutes of
hassle factor.

Further, the iPad manuals say that Apple has *permanently* given Apple
applets priority over other applets for the file types that the iPad
supports.  I.e., the ePub type is "hardwired" to the iBooks applet, such that
if you transfer an ePub file via the long-winded iTunes USB "sync" process,
you can only read that ePub using the iBooks app -- which is a pretty weak
app compared to other ePub and MOBI readers, if one has made the comparison.
[Imagine if Mickeysoft "hardwired" the HTML file type to the IE browser and
allowed no other browser choice!  Can you say "Monopoly"?  I knew you could.]

What CAN iPad do?

It can reasonably present paid books from Apple on iBooks (not the greatest
reader app, but not too horrible either)

It can reasonably present a free subset of PG's offerings repackaged as if
they came from Apple on iBooks

It can reasonably present paid books from Amazon via Kindle for iPad

It can reasonably present a free subset of PG's offerings repackaged as if
they came from Amazon via Kindle for iPad

If you have already bought books for a Kindle then Kindle for iPad will also
allow you to read them for no additional cost on iPad

It can reasonably present PDF books and documents via Safari as long as you
have an active wifi connection

It can store and allow you to read free ePub and other common document
formats that you have transferred to iPad using the slow and cumbersome
Desktop/USB/iTunes path. [At least the documentation claims this -- I cannot
test it in the Apple store because they don't have USB to desktop set up]

Is this all good or bad?  It depends on what you want to do.  If you simply
want to be a passive consumer of content, similar to watching TV from your
cable provider, then maybe it's fine.  If you want to be a CREATOR of
content, such as someone who helps DP, SRs books from DP, "solos" books for
PG, etc., then it's a pretty weak offering -- IMHO you would be much better
off putting up with the hassles of a netbook, which DOES allow one to quickly
and painlessly transfer content using wifi.  And if you are a reader omnivore
like I am, then you will probably rapidly get sick of Jobs's monopolistic
restrictions constantly getting in the way of your ability to quickly and
easily download What you want from Where you want, and read it with Whatever
reader applet YOU damned well choose -- NOT Steve Jobs!

Other reasonable approaches: 

Wait for the HP Slate and see how cobbled-up its touch abilities are.  At
least it offers a REAL operating system -- why couldn't Apple have offered
OS X on iPad ???

Buy a netbook and put up with the keyboard hassles.

Buy a Kindle and put up with the crappy web browser and slow-and-unreliable
"whispernet" AT&T connection -- at least you get a good built-in reader app
and good screen technology.

Buy a low-cost generic reader such as Libre Pro

Buy an iPhone and at least you're admitting you are reading on a cellphone,
and at least you are actually getting a cellphone -- with the resulting
compromises in space, speed, and OS.

Wait and see if the next version of the OS for iPad is less compromised.


From jimad at msn.com  Tue Apr 20 12:01:12 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 12:01:12 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCCBA89.2040103@novomail.net>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
	<4BCCBA89.2040103@novomail.net>
Message-ID: <SNT120-DS871E80E198E99851DB7BFAE0A0@phx.gbl>

>Now apparently, your complaint is not that PG HTML does not make good
>.epub files, or that including a generic stylesheet "breaks" the
>".epub", but that you don't like the .epub generator that Mr. Perathoner
>wrote. That complaint, with which I sympathize, needs to be directed to
>him individually; it cannot, however, be generalized to /all/ .epub
>files, only those created by his software.

First, it should be obvious to all that PG ePub is NOT simply HTML repackaged and compressed: PG ePub is offered in two flavors, with and without "illustrations", and if those "illustrations" are illuminated caps, that is going to have at least SOME impact on the ePub files generated and on the end reader's enjoyment, or lack thereof!

My *complaint* rather was that YOU said it was not necessary to have access to Marcello's converter because I could easily create my own ePub files to see what my HTML would look like as an ePub. Which was clearly false.

My *suggestion*, after *others* at PG complained that DP keeps turning out HTML which breaks when turned into PG ePub files, was that maybe PG ought to offer Marcello's converter software in a portable form (I tried porting it but can't get it to work) so that DP authors (PPs) can actually TRY the ePub format as part of their content development process; and perhaps IF they saw for themselves that the choices they make in their HTML cutesiness are causing the ebook reader's experience to fail, THEN they would make better choices.  BUT currently the only way to see how the ePub or MOBI is going to turn out is to submit the completed HTML to PG for posting, at which point it's way too late to make more reasoned HTML design tradeoffs.


From jimad at msn.com  Tue Apr 20 12:15:35 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 12:15:35 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <470.723ebfcf.38fe090b@aol.com>
References: <470.723ebfcf.38fe090b@aol.com>
Message-ID: <SNT120-DS133E29A742F75F39AC6828AE0A0@phx.gbl>

Sigh.  Do any of you guys know what an eBook is, or an eBook reader???

Ibis is yet another hack workaround, in this case offering a low quality
rendering of an ePub from a list they maintain on their website, rendered
into HTML, displayed while you are attached to the internet via cable or
wifi.  It doesn't allow you to download the book, nor does it allow you to
download a book from a location you choose, but rather always from Feedbooks.
It doesn't allow you to choose font, or font size, or margins.  It doesn't
allow you to read on an airplane or on a beach or anywhere else an ebook
reader allows.  It doesn't contain all of the PG catalog, and certainly not
any of the recent titles, which 30 seconds of testing will easily demonstrate.

Eucalyptus: their website says iPod, not iPad, and it says they work from
ASCII format, not ePub or MOBI, so they are not even working from eBook
files.

 
>try ibisreader.

>i don't know if "eucalyptus" is ipad-native yet, but when it is.


From dakretz at gmail.com  Tue Apr 20 12:35:52 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 20 Apr 2010 12:35:52 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS871E80E198E99851DB7BFAE0A0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
	<4BCC9527.3000103@novomail.net>
	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
	<4BCCBA89.2040103@novomail.net>
	<SNT120-DS871E80E198E99851DB7BFAE0A0@phx.gbl>
Message-ID: <j2v627d59b81004201235zbd594f0dn81e0752da6053f0a@mail.gmail.com>

Summary of the situation (as it seems to me).

DP is currently taking too long to produce texts that are either less
(plain-text) or more (DP-style HTML) than the supply chain is able to convey
to the end-readers to deliver the experience intended.

Once DP delivers their content in one or both of those formats, it's for all
purposes stuck at PG (nicely symmetrical with how it had previously been
stuck at DP), because while DP had the raw materials but no finished goods,
PG has the finished goods but no raw materials. So for whatever purpose
(quality improvement, error correction, evolving requirements), PG's
products grow stale.

What can DP do in the reasonable short-term future that would be low risk
and low effort?

The first, most obvious to me, is to start getting serious about passing
along the raw materials. Upload in as complete form as possible the matching
image and text files so future modification and adaptation is possible.
There's no loss to DP by doing so; and the risk is that over time they are
quite capable of losing track of them.
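A minimal sketch of one check such an upload could include, assuming (hypothetically) that each page's scan and its proofed text share a file stem, such as 001.png and 001.txt. The naming scheme and the helper names unmatched_pages and check_upload are illustrative only, not an existing DP or PG convention. It lists scans that have no text and texts that have no scan:

```python
from pathlib import Path, PurePath


def unmatched_pages(scan_names, text_names):
    """Compare page stems ('001' from '001.png' or '001.txt') and return
    (scans with no text, texts with no scan), each sorted."""
    scans = {PurePath(name).stem for name in scan_names}
    texts = {PurePath(name).stem for name in text_names}
    return sorted(scans - texts), sorted(texts - scans)


def check_upload(folder, scan_ext=".png", text_ext=".txt"):
    """Run unmatched_pages over the files actually present in one folder."""
    folder = Path(folder)
    return unmatched_pages(
        [p.name for p in folder.glob("*" + scan_ext)],
        [p.name for p in folder.glob("*" + text_ext)],
    )
```

For example, unmatched_pages(["001.png", "002.png"], ["001.txt", "003.txt"]) returns (["002"], ["003"]). Comparing stems rather than full file names keeps the check independent of the image format used for the scans.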

From jimad at msn.com  Tue Apr 20 13:31:36 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 13:31:36 -0700
Subject: [gutvol-d] EBook formats on iPad via wifi
In-Reply-To: <alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>
Message-ID: <SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>

[Changed the header per Michael's request]

>Once again, I repeat, I don't care about any one particular format,
>so please stop pretending this is a valid topic with me,

But then why do you keep offering advice to me that you know will not work
for me because I *am* interested in eBook formats and eBook content
development such as DP does which is HTML rendered into ePub and MOBI? PG
doesn't even typically support PDF, so how does the fact that Safari happens
to kind of support reading documents in PDF have anything to do with PG?

>And just how many systems do you have, or know of, that wifi to two wifi
>spots at the same time???

My generic $300 netbook has absolutely no problem whatsoever fresh out of
the box using wifi to transfer files either from the internet or to/from the
publicly shared folders of all my locally networked computers.  I never
have to change a wifi setting just to get a file from a differing location!

>Sorry, reading the manual and looking up Apps does not qualify you with
>any actual experience on the subject.

OK, I spent another 3+ hours at the Apple Store (bricks and mortar) last
night trying all the "books" apps there, including all of your suggestions,
and none of them do anything "reasonable" like actually allowing one to
transfer an ePub or MOBI file from a chosen location on the internet onto
the iPad and allowing one to read that eBook there.

>Not to mention that YOU did not mention, once again, the actual NAME of
>the products you are claiming so much expertise about.

I tried all the ones you mentioned previous plus all the ones searchable via
the App Store searching on "book" or "ebooks" as you suggested previously.

>Nevertheless, if, at the end of all this, you send me a list of questions,
>comments, complaints, etc., I will try to go get you answers from Apple.

The questions, complaints, comments, etc. would be the same as from the
start, namely:

"If in fact one can do so on iPad, how does one use iPad to download a free
eBook in ePub or MOBI format via wifi from an internet site that I choose,
storing that ePub or MOBI eBook on the iPad, and then get iPad to open that
eBook for reading at this and/or a later date -- when I may or may not have
an internet connection?"

This is quite possible from Kindle (whispernet), desktops, laptops, and
netbooks, for example, so I don't think it's an unreasonable expectation. 
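The "download once, keep it, read it offline" behaviour asked about above can be sketched independently of any device. This is a minimal Python illustration, not a description of how any iPad app works: it fetches a URL with the standard library, applies a cheap check based on the EPUB container (OCF) rules (a ZIP whose first entry is an uncompressed file named mimetype containing application/epub+zip, plus META-INF/container.xml), and stores the file in a local folder. The function names and the folder layout are assumptions for illustration.

```python
import io
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

EPUB_MIMETYPE = b"application/epub+zip"


def looks_like_epub(data: bytes) -> bool:
    """Cheap structural check from the EPUB container (OCF) rules: a ZIP
    archive whose first entry is an uncompressed file named 'mimetype'
    holding 'application/epub+zip', plus META-INF/container.xml."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            entries = zf.infolist()
            if not entries or entries[0].filename != "mimetype":
                return False
            if entries[0].compress_type != zipfile.ZIP_STORED:
                return False
            if zf.read("mimetype") != EPUB_MIMETYPE:
                return False
            return "META-INF/container.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False


def save_for_offline(url: str, library: Path) -> Path:
    """Download an ePub once and keep it in a local folder, so it can be
    opened later with no network connection at all."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if not looks_like_epub(data):
        raise ValueError(f"{url} does not look like an ePub file")
    # Use the last path component of the URL as the local file name.
    name = PurePosixPath(urllib.parse.urlparse(url).path).name or "book.epub"
    library.mkdir(parents=True, exist_ok=True)
    target = library / name
    target.write_bytes(data)
    return target
```

Called with an ePub URL and a folder, save_for_offline leaves a copy on local storage that a reader application can open later without wifi; the structural check only rejects files that are obviously not ePubs and does not validate the book's content.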

>And hasn't Apple made it totally obvious you can't do that with an iPad???

Then why in gawd's name would anyone want to get an iPad as an ebook reader?

>Ah, now, at this late stage, you have added that you want to edit eBooks
>on the iPad.

I can live without it.  What makes iPad uninteresting to me is that one cannot
use wifi to download an ePub or MOBI file from a location of my own
choosing. I can't edit eBooks on the Kindle, for example, but the Kindle
does allow me to bookmark problem spots in the text I am SR'ing, and then I
can transfer the bookmarks to my desktop, use that info to locate the
problems in the text under development, and fix it there.

>If you spent the same four hours' worth on a Kindle, and liked it
>so much you were already programming with it, I can't imagine why
>you are having this conversation at all.

I think if you have been following these conversations at all over the
preceding months you would know that I am not in love with any particular
ebook reader which is why I am still on the outlook for something that would
work better.  Apple hyped how much better iPad would be, so I tried it, and
found that in fact it consists of demoware.

>Unless it is just to moan and complain in front of an audience to
>somehow "get even" with Apple for being. . .well. . .Apple.

I have to admit I have not spent much time on Apple desktops or laptops but
I cannot believe that Apple could possibly be *this* restrictive on their
desktops or laptops or they would be out of business.  The question is not
then whether or not I like Apple, but rather whether or not iPad offers
anything new and interesting in terms of an eBook Reader.  You claimed it
did.  I tried it and it doesn't work.  If iPad is that restrictive, then I
don't like it.  I also don't like nook for the same reason -- namely nook
has a wifi but doesn't let the customers use it for anything except buying
books from B&N.  Why should I pay for a "feature" I am not allowed to use?
Does that mean I hate B&N?  No, if I want to buy a book or magazine I still
go to B&N -- I just don't spend my money on a nook designed to lock me into
only being able to spend more money on a nook.  Do I "love" Kindle -- no, it
has a crappy web browser, is slow to open PDF files, has the lousy slow AT&T
"whispernet" connection etc.  Yet even with these restrictions I CAN get
things done with Kindle, whereas iPad successfully blocks everything I try
to do.

>At first you denied using iBooks at all, don't you remember???

No.  Quote me when.

>> I told you we tried Stanza, because I told you about the large blurry iPod

>Actually, you spoke of that as if it were a hypothetical...

I don't think I did.  Quote me when.

>So, you are admitting you never asked for what you didn't get.

Strange.  Why would I ask for the privilege of paying a subscription fee to
develop apps for a device that doesn't work?

>You seem to be bringing up something new, and of great interest.

I have talked about it before on this same forum, so it is not new; I flamed
Amazon for their stupidity then, just as I am flaming Apple for their
stupidity now.  Search on "1984 Amazon" if anyone is interested in the
"1984" Amazon Kindle act of stupidity.  Read
http://manuals.info.apple.com/en_US/iPad_User_Guide.pdf re iTunes syncing if
you want to read about Apple's act of stupidity.

>Now, just above, you said you were using iBooks, doesn't that count?

iBooks was on the iPad already, so no, I didn't download it.  When I went
back to the Apple store again last night at your suggestion they repeated
that I was not allowed to download apps and that if I tried to do so it
would not work.  I waited till they were not looking, tried downloading
apps, and eventually figured out how to get the app downloads to work.  The
apps *themselves* once downloaded however do not allow downloading of free
ePub and MOBI books from a website of my choosing, storing those on the iPad
for reading later, so the apps you ask me to install don't do anything
interesting or useful to me.

>You keep short-changing Wattpad, which I think I mentioned first.

I did download Wattpad, and it is simply yet another app that ties to one
particular server on the internet, downloading a subset of PG books lightly
reformatted from ASCII plaintext.

>When at the Apps Store, just search for "ebooks" and "books" etc.

>How many times have I said that???

I did that, tried everything; again, the apps out there all connect to a
private server on the internet, downloading a subset of PG books lightly
reformatted from ASCII plaintext.  iBooks is a bit better in that they take
PG ePub, hack it to represent it as if it comes from Apple, and redistribute
it from their servers.  This also means that they only serve up a subset of
PG works, and it means that it is not useful for content development, such
as SR from DP.  Kindle for iPad is a bit better in that they again take PG
books, hack them to represent them as if they come from Amazon, and
redistribute them from their servers -- but do it on a better reader app
than iBooks.  Which again means that they only serve up a subset of PG
works, and it means that it is not useful for content development, such as
SR from DP.

> (Re Goodreader) You download it from the Apps Store. . . .

But IT in turn cannot download ePub or MOBI books from a general location on
the internet.

> No, they just worked around their current version of Steve Jobs, such as
>working around IBM, then Apple, then Microsoft, and Intel, and AMD, Sony
>and all the rest. . . .
Thinking back, I think this was a somewhat true statement when app
distribution was via computer stores.  Since the internet has caught on, I
haven't had problems distributing content or apps to whoever I want.  The
internet has a problem in that searching is via Google, and Google in turn
engages in its own monopolistic practices, such as refusing to return a
search "hit" on small websites even if you search on the exact name of that
website -- unless one sends copious advertising dollars to Google.


From jimad at msn.com  Tue Apr 20 13:43:56 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 13:43:56 -0700
Subject: [gutvol-d] Re: the blind men and the .epub file-format
In-Reply-To: <4BCCD051.7060508@novomail.net>
References: <8ce40.74ecf6f2.38fe068e@aol.com> <4BCCD051.7060508@novomail.net>
Message-ID: <SNT120-DS2111200A8B41D8DDA65F95AE0A0@phx.gbl>

>Because native HTML is relatively uncommon at PG, I would guess that most
>.epub files start the process as ITF.

Please don't guess, but rather check it out.  For example, of the books
posted in the last 24 hours, 15 out of 17 came with native HTML.

Playing around with Advanced Search, it reports 21786 books in native HTML
format versus 20828 in text format.  I.e., going back over the entire
history of PG, about 2/3rds of the books have native HTML format.


From Bowerbird at aol.com  Tue Apr 20 14:16:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 20 Apr 2010 17:16:40 EDT
Subject: [gutvol-d] summary of the recent discussion of e-books on the ipad
Message-ID: <12108.203769e3.38ff73b8@aol.com>

if you wonder if the recent flurry between jim and michael
needs to be read, let me save you the time and trouble...

according to michael, who doesn't care about the format
of the books just as long as he can get to the content inside,
it's pretty easy to read e-books on the ipad, especially when
one has an always-on connection to the web, in which case
one can access many sites, using many viewer-programs,
specifically including the native safari web-browser...

according to jim, who wants his e-books in .epub or .mobi,
getting e-books on the ipad can be a large pain in the ass...

jim also complains about the closed nature of the ipad, and
prefers his netbook, albeit hasn't reported on whether or not
the added weight of the netbook hampers his use of the unit.

much drama and misunderstandings and unaddressed points
were also part of the recent exchange of e-mails, but i think
i've boiled down the important aspects.   if you have any questions,
please feel free to ask them...

-bowerbird

p.s.   if anyone actually _owns_ an ipad, would you report in?

From Bowerbird at aol.com  Tue Apr 20 14:26:38 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 20 Apr 2010 17:26:38 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <12ba6.1c940b.38ff760e@aol.com>

according to its website, ibisreader can:
1) fetch an .epub from any website, and
2) download it so it can be read offline...

jim claims neither of these things is true.

i have no dog in this fight.

-bowerbird

p.s.   as i said, _when_ eucalyptus is ipad-native,
i recommend it.   the fact that it uses p.g. .txt files,
instead of .epub files, is a _feature_, not a _bug_...
and anyone who makes a claim that a p.g. .txt file
is "not an e-book file" is a full-on bloomin' idiot.

From Bowerbird at aol.com  Tue Apr 20 14:29:23 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 20 Apr 2010 17:29:23 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <12ead.2be71cd9.38ff76b3@aol.com>

dakretz said:
>    Summary of the situation (as it seems to me).
>    DP is currently taking too long to produce texts 
>    that are either less (plain-text)
>    or more (DP-style HTML) 
>    than the supply chain is able to convey to the 
>    end-readers to deliver the experience intended.

"less" and "more" are value-laden terms.
_and_ incorrect.   please re-summarize...


>    Upload in as complete form as possible 
>    the matching image and text files 
>    so future modification and adaptation is possible.
>    There's no loss to DP by doing so

oh dear boy.   (just an expression.   don is an old man.
heck, he might even be _a_grandfather_ by this time.)

but don, how much more personal denigration are you
going to have to endure at the hands of d.p. apologists
before you realize that you do not grok their mindset?

let me 'splain it to you...

d.p. has, in their hands, the scans from all the books
that have gone through their system.   so they _could_
have pushed them to project gutenberg at any time...

indeed, charlz originally intended that d.p. itself would
mount the scans.   he called it the "online library system",
and at one point in time, it actually came into existence.
(it's probably still there, with some 6,000 scan-sets in it.)

why hasn't it been maintained?

well, _i_ happen to think that it's pretty obvious.

but maybe that's because of what i do with those scans:
i use them to point to unequivocal evidence of _errors_
in the "final product" emerging from the d.p. workflow.

and that's what other people might do with them, too...

does d.p. want us unequivocally pointing out their errors?

no.

ergo, they are keeping their scans to themselves...

the myth of d.p. accuracy is one that keeps d.p. going...
the powers-that-be over there do not want to put that
myth up against _any_ solid evidence to the contrary...

and it's not that hard to understand, either.   rfrank was
eager to see the results of my check on the "sitka" book,
at least when he thought that check would be _positive_.
but when it was less than flattering, he clammed right up.
it's hard for some people to admit they make mistakes...
even if they can do it in a "general" way, when it comes to
close-eyed examination of specifics, they're uninterested,
and might even go to great lengths to suppress evidence...


>    and the risk is that over time 
>    they are quite capable of losing track of them.

they have the scans firmly in their grasp now,
and they wish to retain control, so they simply
are not worried about "losing track of them"...

-bowerbird

From jimad at msn.com  Tue Apr 20 14:57:40 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 14:57:40 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCDC7B4.9080605@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>	<4BCC7B05.5020506@yahoo.com>	<4BCC9478.5040204@perathoner.de>	<4BCDC069.2050608@yahoo.com>
	<4BCDC7B4.9080605@perathoner.de>
Message-ID: <SNT120-DS17E69BDA51EB5D26255A23AE0A0@phx.gbl>

>It's much easier to forget about fancy formatting and use only the
>simplest HTML constructs.

After reviewing the recent submissions, I think for the most part people are.
It seems like there are only a few commonly repeated mistakes PPs make that
confound ePub and MOBI generation:

What to do about illustrations in the ePub and MOBI files distributed
without illustrations.

Illuminated Initial Caps

Drop Caps

Text represented as Illustration for some reason (PP thought the original
text looked so cool that some of it was introduced as an Illustration)

Equations "typeset" in Unicode/HTML

I wonder if, instead of enumerating the HTML constructs people are allowed
to use, it wouldn't be better simply to enumerate the HTML practices that
will lead to trouble?  Again, I don't think people are trying to cause
trouble; they just get seduced by some visual aspect of HTML without
realizing the problems this will cause later.
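As an illustration of the "list the practices that lead to trouble" idea, here is a small heuristic checker built on Python's standard html.parser. Its three rules (an img with no alt attribute; an img whose alt is a single letter, which often signals an illuminated initial; and a CSS :first-letter rule, a common drop-cap technique) are guesses drawn from the list above, not an official PG or DP rule set, and the class name EbookLint is hypothetical. Text-as-illustration and Unicode equations are left out because they are hard to detect mechanically.

```python
from html.parser import HTMLParser


class EbookLint(HTMLParser):
    """Flags a few HTML habits that can go wrong in ePub/MOBI conversion.
    The rules are illustrative heuristics, not an official PG/DP list."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self._in_style = False

    def _warn(self, message):
        line, _column = self.getpos()
        self.warnings.append(f"line {line}: {message}")

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "style":
            self._in_style = True
        elif tag == "img":
            alt = attributes.get("alt")
            if alt is None:
                self._warn("img without alt attribute: may leave nothing behind when images are dropped")
            elif len(alt.strip()) == 1 and alt.strip().isalpha():
                self._warn(f"img with alt={alt!r}: possible illuminated initial cap")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # The parser delivers the contents of <style> as raw text.
        if self._in_style and "first-letter" in data:
            self._warn("CSS :first-letter rule: drop caps often render badly on e-readers")


def lint(html_text):
    """Return the list of warnings for one HTML document."""
    checker = EbookLint()
    checker.feed(html_text)
    checker.close()
    return checker.warnings
```

Run over a PP's HTML before submission, lint() returns one line-numbered warning per suspect construct; an empty list only means none of these particular heuristics fired.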


From jimad at msn.com  Tue Apr 20 15:13:02 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 15:13:02 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <j2v627d59b81004201235zbd594f0dn81e0752da6053f0a@mail.gmail.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>	<4BCCBA89.2040103@novomail.net>	<SNT120-DS871E80E198E99851DB7BFAE0A0@phx.gbl>
	<j2v627d59b81004201235zbd594f0dn81e0752da6053f0a@mail.gmail.com>
Message-ID: <SNT120-DS9A92277DD71ACF0355817AE0A0@phx.gbl>

>The first most obvious to me is to start getting serious about passing
>along the raw materials. Upload in as complete form as possible the matching
>image and text files so future modification and adaptation is possible.
>There's no loss to DP by doing so; and the risk is that over time they are
>quite capable of losing track of them.

I suggest that it is helpful, if possible, for the HTML to be submitted with
the linebreaks the same as in the original book, and that PG retain those
linebreaks rather than changing the line lengths by, say, running the HTML
through "tidy."  Or else at least retain the submitted HTML internally with
the original linebreaks, to make it easier to fix problems or make another
pass through DP or some other process some day.  Pgdiff can be used to
recover the linebreaks, but it is less work if the linebreaks are never
discarded in the first place.
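Pgdiff itself works differently and handles far more cases; what follows is only a simplified sketch of the idea of recovering line breaks. It uses Python's difflib to align the words of the text with the original line breaks against a rewrapped copy, then copies each original line end onto the matching word. The function name restore_linebreaks is hypothetical.

```python
import difflib


def restore_linebreaks(original: str, rewrapped: str) -> str:
    """Re-impose the line breaks of `original` on the words of `rewrapped`."""
    # Tokenize the original and remember which token ends each line.
    orig_tokens, line_ends = [], set()
    for line in original.splitlines():
        words = line.split()
        if words:
            orig_tokens.extend(words)
            line_ends.add(len(orig_tokens) - 1)
    new_tokens = rewrapped.split()

    # Align the two word sequences; only exactly matching words are mapped.
    matcher = difflib.SequenceMatcher(None, orig_tokens, new_tokens, autojunk=False)
    breaks = set()
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            if block.a + k in line_ends:
                breaks.add(block.b + k)

    # Rebuild the text, breaking after mapped line-end words.
    out = []
    for j, word in enumerate(new_tokens):
        out.append(word)
        if j < len(new_tokens) - 1:
            out.append("\n" if j in breaks else " ")
    return "".join(out)
```

Where a word at the end of an original line was changed (for example a corrected typo), that word has no exact match and its line break is simply dropped, which is one reason keeping the original breaks from the start is less work than recovering them.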


From jimad at msn.com  Tue Apr 20 15:21:35 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 15:21:35 -0700
Subject: [gutvol-d] Re: the blind men and the .epub file-format
In-Reply-To: <SNT120-DS2111200A8B41D8DDA65F95AE0A0@phx.gbl>
References: <8ce40.74ecf6f2.38fe068e@aol.com> <4BCCD051.7060508@novomail.net>
	<SNT120-DS2111200A8B41D8DDA65F95AE0A0@phx.gbl>
Message-ID: <SNT120-DS21F81DF482A5C2CAB1FA02AE0A0@phx.gbl>

20828
^ sorry, should read 30828
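With the corrected figure, the share quoted earlier in the thread can be checked directly. This assumes, as that message implies, that nearly every PG book has a text version, so the text count stands in for the size of the catalog:

```python
# Advanced Search counts reported in this thread
# (text count as corrected in the follow-up message).
html_native = 21786
text_format = 30828

# Assumption: almost every PG book has a text version, so the text count
# approximates the size of the whole catalog.
share = html_native / text_format
print(f"HTML share of the catalog: {share:.1%}")  # prints "HTML share of the catalog: 70.7%"
```

That is roughly 70%, a little above "about 2/3rds"; with the uncorrected 20828 the ratio would exceed 100%, which is what gives the typo away.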


From hart at pglaf.org  Tue Apr 20 15:39:45 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Tue, 20 Apr 2010 15:39:45 -0700 (PDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS17E69BDA51EB5D26255A23AE0A0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<20100418170536.GA22578@pglaf.org>
	<20100419021856.6C702100B0@cardano.dm.unipi.it>
	<4BCC1F14.1090801@perathoner.de>
	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>
	<4BCC7B05.5020506@yahoo.com> <4BCC9478.5040204@perathoner.de>
	<4BCDC069.2050608@yahoo.com> <4BCDC7B4.9080605@perathoner.de>
	<SNT120-DS17E69BDA51EB5D26255A23AE0A0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004201535540.9849@mail.pglaf.org>


As I said before, no one can or will complain vociferously
if BOTH the illuminated caps AND the ASCII are included.

It won't hurt the readability, and it won't matter where
the illumination ends up in nearly such exact terms.

Why make this so much harder than it has to be???!!!

Just make it so everyone can BOTH read the text AND
appreciate the illumination.

So. . .please. . .stop wasting time and effort, and
just make it easy on all concerned, as it should be.

No more mountains made out of molehills. . . .


Thanks!!!


Give eBooks in 2010!!!


Michael S. Hart
Founder
Project Gutenberg
Inventor of eBooks


Recommended Books:

Dandelion Wine, by Ray Bradbury:  For The Right Brain
Diamond Age, by Neal Stephenson:  To Understand The Internet
The Phantom Tollbooth, by Norton Juster:  Lesson of Life. . .

If you ever do not get a prompt response, please resend, then
keep resending, I won't mind getting several copies per week.


On Tue, 20 Apr 2010, James Adcock wrote:

> >It's much easier to forget about fancy formatting and use only the
> simplest HTML constructs.
>
> I think for the most part people are doing that, judging from the recent submissions.
> It seems like there are only a few commonly repeated mistakes PPs make that
> confound ePub and MOBI generation:
>
> What to do about illustrations in the ePub and MOBI files distributed
> without illustrations.
>
> Illuminated Initial Caps
>
> Drop Caps
>
> Text represented as Illustration for some reason (PP thought the original
> text looked so cool that some of it was introduced as an Illustration)
>
> Equations "typeset" in Unicode/HTML
>
> I wonder if, instead of enumerating the HTML constructs people are allowed to
> use, it wouldn't be better simply to enumerate the HTML practices that
> will lead to trouble?  Again, I don't think people are trying to cause
> trouble, they just get seduced by some visual aspect of HTML without
> realizing the problems it will cause later.
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
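The list of trouble spots quoted above lends itself to an automated pre-check before a file goes to ePub/MOBI conversion. What follows is a minimal sketch using only Python's standard `html.parser`; it is not a tool DP or PG actually uses. It flags `<img>` tags with no alt text (an image-less build then loses whatever the picture said, which is the case Michael's "BOTH the illuminated caps AND the ASCII" suggestion guards against) and elements whose class name looks like a decorative capital. The fragments in `SUSPECT_CLASS_FRAGMENTS` are hypothetical naming conventions, not a DP standard.

```python
from html.parser import HTMLParser

# Hypothetical class-name fragments a post-processor might use for decorative capitals.
SUSPECT_CLASS_FRAGMENTS = ("dropcap", "drop-cap", "illum", "initial")


class TroubleScanner(HTMLParser):
    """Collect (line, message) warnings for constructs that may not survive ePub/MOBI conversion."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None, hence the "or" fallbacks below
        line, _ = self.getpos()
        classes = (attrs.get("class") or "").lower()
        if tag == "img" and not (attrs.get("alt") or "").strip():
            # No text fallback: a build without illustrations loses this content entirely.
            self.warnings.append((line, "img without alt text"))
        if any(fragment in classes for fragment in SUSPECT_CLASS_FRAGMENTS):
            self.warnings.append((line, f"decorative capital? <{tag} class='{classes}'>"))


def scan(html_text):
    """Return the list of warnings for one HTML document."""
    scanner = TroubleScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.warnings
```

Heuristics like these only point a human at places to look; they cannot tell whether an image is really text-as-illustration or whether an equation will render.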

From kevin.pulliam at gmail.com  Tue Apr 20 15:48:37 2010
From: kevin.pulliam at gmail.com (Kevin Pulliam)
Date: Tue, 20 Apr 2010 17:48:37 -0500
Subject: [gutvol-d] Archiving of Raw Materials was Re: Re: DP output is
	technically obsolete
Message-ID: <q2m78defb41004201548m2068c90cg312e4a48ff9604c1@mail.gmail.com>

On Tue, Apr 20, 2010 at 2:35 PM, don kretz <dakretz at gmail.com> wrote:
> Summary of the situation (as it seems to me).
>
SNIP
>
> What can DP do in the reasonable short-term future that would be low risk
> and low effort?
>
> The first most obvious to me is to start getting serious about passing along
> the raw materials. Upload in as complete form as possible the matching image
> and text files so future modification and adaptation is possible. There's no
> loss to DP by doing so; and the risk is that over time they are quite capable
> of losing track of them.
SNIP

Yes please.

Please provide a complete archive of the project files (Scans, text,
clearance docs, etc) in an obvious location for all projects and
submissions (DP or otherwise).  This can be at PG, at DP, at
archive.org or elsewhere... I don't care where, so long as it has the
same permanence as a PG text, and the same 'lack' of barriers to
access as PG texts.

Just as open source projects literally require distribution of the
un-compiled source code so that future folks can see the foundation of the
work as well as the finished product, so ebooks can benefit if future
users can (if they choose) see the basis of the work as well as the
finished product, should they feel like offering an improved or altered
version.
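One low-effort way to make such an archive self-verifying is to ship a checksum manifest alongside the scans and text, so anyone can later confirm that nothing was lost or altered. A minimal sketch, assuming a hypothetical project directory that holds the scans, text, and clearance files; the output uses the `digest  path` layout that GNU `sha256sum -c` reads.

```python
import hashlib
from pathlib import Path


def build_manifest(project_dir):
    """Return (relative_path, sha256_hex) pairs for every file under project_dir, sorted by path."""
    root = Path(project_dir)
    entries = []
    for path in root.rglob("*"):
        if path.is_file():
            # Whole-file read keeps the sketch short; very large scans could be hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries.append((path.relative_to(root).as_posix(), digest))
    return sorted(entries)


def manifest_text(project_dir):
    """Format the manifest as 'digest  path' lines, one per file."""
    return "".join(f"{digest}  {rel}\n" for rel, digest in build_manifest(project_dir))
```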

Thanks!

Kevin

From jimad at msn.com  Tue Apr 20 15:49:19 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 15:49:19 -0700
Subject: [gutvol-d] Re: summary of the recent discussion of e-books on the
	ipad
In-Reply-To: <12108.203769e3.38ff73b8@aol.com>
References: <12108.203769e3.38ff73b8@aol.com>
Message-ID: <SNT120-DS3FC88988284BC0DA35395AE0A0@phx.gbl>

> jim also complains about the closed nature of the ipad, and
> prefers his netbook, albeit hasn't reported on whether or not
> the added weight of the netbook hampers his use of the unit.

LOL, thank you for the intelligent summary!  However, I *have* reported here
previously on my experience with the added weight of a netbook: Namely that
when I try to hold it and read one-handed it weighs enough that my hand
falls asleep.  A bigger problem is the attached keyboard that I don't really
need when I am reading a book, except that the attached keyboard has the
page-turn buttons located in really stupid and unhelpful locations, such
that when simply reading a book one-handed on a netbook turning pages is a
pain in the *ss! Weight-wise, the iPad is slightly bigger, thicker, and heavier
than a Kindle DX, which is my current go-to preferred reading device, and the
iPad is about the size and weight of the non-keyboard half of a netbook,
which means overall a netbook is about 2X the size and weight of an iPad.

Again, problems with the Kindle DX:

Slow and unreliable AT&T "whispernet" wireless connection

Slow and crappy basic web browser

Slow PDF loads and page turns

Difficult to use to write even basic notes - for example if one wants to
take notes of the problems one sees when doing an SR.

DRM policies of the books you buy from Amazon (as opposed to free books) are
overly restrictive.

Low contrast display in low-light situations.

Problems with a netbook:

Too heavy

Battery life too short

Keyboard is not useful and is awkward when one just wants to read a book.

Screen door effect

Problems with iPad:

Can't use wifi to download an ePub or MOBI book to the iPad; must download to
a desktop computer and from there to iTunes to USB to iPad.

Screen door effect

ePub and MOBI reader apps on iPad not as good as those available on other
platforms.

We don't really know yet in practice how restrictive Apple DRM policies will
prove to be [on purchased books] - in practice on free ePub and MOBI books
they are very annoying.

3G monthly wireless plans are pricey - and once again the service comes from
AT&T! [At least Kindle's 3G wireless is free - and worth every penny!]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100420/9250fb0c/attachment.html>

From hart at pglaf.org  Tue Apr 20 15:55:14 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Tue, 20 Apr 2010 15:55:14 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>
	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>
	<SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004201544050.9849@mail.pglaf.org>


If you read the following carefully, you will see that Jim Adcock
has continued to change the frames of reference of his decisions,
which were, obviously, already made before he did his research.

As we all have heard from very reliable and valuable sources, quite
literally even the best research techniques suffer greatly under a
new interpretation by someone leaning in any biased direction.

The point is, and always has been, that Apple and the iPad NEVER
were a real consideration for Mr. Adcock and he just keeps on in
the process of making more and more OBVIOUS objections that were
ALREADY OBVIOUS from the start.

ALL OBVIOUS, ALL THE TIME, NOTHING NEW, JUST RANTING AND RAVING.

I use his own words from below to help you decide:

"Then why in gawds name would anyone want
to get an iPad as an ebook reader?"

Mr. Adcock NEVER wanted to get an iPad as an eBook reader.

This was never a choice that was up for discussion.

The only discussion from him:  all the reasons he wouldn't.

This is what THE LIST OF FAMOUS FALLACIES calls the trick of:

"THE DOG IN THE MANGER."

Just another secondary school fallacy brought back to life as
if by a rather limited Frankenstein.

Frankenstein's monster was much more of a humanist. . . .


On Tue, 20 Apr 2010, James Adcock wrote:

> [Changed the header per Michael's request]
>
> >Once again, I repeat, I don't care about any one particular format,
> so please stop pretending this is a valid topic with me,
>
> But then why do you keep offering advice to me that you know will not work
> for me because I *am* interested in eBook formats and eBook content
> development such as DP does which is HTML rendered into ePub and MOBI? PG
> doesn't even typically support PDF, so how does the fact that Safari happens
> to kind of support reading documents in PDF have anything to do with PG?
>
> >And just how many systems do you have, or know of, that wifi to two wifi
> spots at the same time???
>
> My generic $300 netbook has absolutely no problem whatsoever fresh out of
> the box using wifi to transfer files either from the internet or to/from the
> publically shared folders of all my locally networked computers.  I never
> have to change a wifi setting just to get a file from a differing location!
>
> >Sorry, reading the manual and looking up Apps does not qualify you with
> any actual experience on the subject.
>
> OK, I spent another 3+ hours at the Apple Store (brick and mortar) last
> night trying all the "books" apps there, including all of your suggestions,
> and none of them do anything "reasonable" like actually allowing one to
> transfer an ePub or MOBI file from a chosen location on the internet onto the
> iPad and allowing one to read that eBook there.
>
> >Not to mention that YOU did not mention, once again, the actual NAME of
> the products you are claiming so much expertise about.
>
> I tried all the ones you mentioned previous plus all the ones searchable via
> the App Store searching on "book" or "ebooks" as you suggested previously.
>
> >Nevertheless, if, at the end of all this, you send me a list of questions,
> comments, complaints, etc., I will try to go get you answers from Apple.
>
> The questions, complaints, comments, etc. would be the same as from the start,
> namely:
>
> "If in fact one can do so on iPad, how does one use iPad to download a free
> eBook in ePub or MOBI format via wifi from an internet site that I choose,
> storing that ePub or MOBI eBook on the iPad, and then get iPad to open that
> eBook for reading at this and/or a later date -- when I may or may not have
> an internet connection?"
>
> This is quite possible from Kindle (whispernet), desktops, laptops, and
> netbooks, for example, so I don't think it's an unreasonable expectation.
>
> >And hasn't Apple made it totally obvious you can't do this with an iPad???
>
> Then why in gawds name would anyone want to get an iPad as an ebook reader?
>
> >Ah, now, at this late stage, you have added that you want to edit eBooks
> on the iPad.
>
> I can live without it.  What makes iPad uninteresting to me is if one cannot
> use wifi to download an ePub or MOBI file from a location of my own
> choosing. I can't edit eBooks on the Kindle, for example, but the Kindle
> does allow me to bookmark problem spots in the text I am SR'ing, and then I
> can transfer the bookmarks to my desktop, use that info to locate the
> problems in the text under development, and fix it there.
>
> >If you spent the same four hours' worth on a Kindle, and liked it
> so much you were already programming with it, I can't imagine why
> you are having this conversation at all.
>
> I think if you have been following these conversations at all over the
> preceding months you would know that I am not in love with any particular
> ebook reader, which is why I am still on the lookout for something that would
> work better.  Apple hyped how much better iPad would be, so I tried it, and
> found that in fact it consists of demoware.
>
> >Unless it is just to moan and complain in front of an audience to
> somehow "get even" with Apple for being. . .well. . .Apple.
>
> I have to admit I have not spent much time on Apple desktops or laptops but
> I cannot believe that Apple could possibly be *this* restrictive on their
> desktops or laptops or they would be out of business.  The question is not
> then whether or not I like Apple, but rather whether or not iPad offers
> anything new and interesting in terms of an eBook Reader.  You claimed it
> did.  I tried it and it doesn't work.  If iPad is that restrictive, then I
> don't like it.  I also don't like nook for the same reason -- namely nook
> has a wifi but doesn't let the customers use it for anything except buying
> books from B&N.  Why should I pay for a "feature" I am not allowed to use?
> Does that mean I hate B&N?  No, if I want to buy a book or magazine I still
> go to B&N -- I just don't spend my money on a nook designed to lock me into
> only being able to spend more money on a nook.  Do I "love" Kindle -- no, it
> has a crappy web browser, is slow to open PDF files, has the lousy slow AT&T
> "whispernet" connection etc.  Yet even with these restrictions I CAN get
> things done with Kindle, whereas iPad successfully blocks everything I try
> to do.
>
> >At first you denied using iBooks at all, don't you remember???
>
> No.  Quote me when.
>
> >> I told you we tried Stanza, because I told you about the large blurry iPod
>
> >Actually, you spoke of that as if it were a hypothetical...
>
> I don't think I did.  Quote me when.
>
> >So, you are admitting you never asked for what you didn't get.
>
> Strange.  Why would I ask for the privilege of paying a subscription fee to
> develop apps for a device that doesn't work?
>
> >You seem to be bringing up something new, and of great interest.
>
> I have talked about it before on this same forum, so it is not new, and I flamed
> Amazon for their stupidity then just as I am flaming Apple for their
> stupidity now.  Search on "1984 Amazon" if anyone is interested in the
> "1984" Amazon Kindle act of stupidity. Read
> http://manuals.info.apple.com/en_US/iPad_User_Guide.pdf re iTunes syncing if
> you want to read about Apple's act of stupidity.
>
> >Now, just above, you said you were using iBooks, doesn't that count?
>
> iBooks was on the iPad already, so no, I didn't download it.  When I went
> back to the Apple store again last night at your suggestion they repeated
> that I was not allowed to download apps and that if I tried to do so it
> would not work.  I waited till they were not looking, tried downloading
> apps, and eventually figured out how to get the app downloads to work.  The
> apps *themselves* once downloaded however do not allow downloading of free
> ePub and MOBI books from a website of my choosing, storing those on the iPad
> for reading later, so the apps you ask me to install don't do anything
> interesting or useful to me.
>
> >You keep short-changing Wattpad, which I think I mentioned first.
>
> I did download Wattpad and it is simply yet another app that ties to one
> particular server on the internet downloading a subset of PG books lightly
> reformatted from ASCII plaintext.
>
> >When at the Apps Store, just search for "ebooks" and "books" etc.
>
> >How many times have I said that???
>
> I did that, tried everything, again the apps out there all connect to a
> private server on the internet downloading a subset of PG books lightly
> reformatted from ASCII plaintext.  iBooks is a bit better in that they take
> PG ePub, hack it to represent it as-if it comes from Apple, and redistribute
> it from their servers.  This also means that they only serve up a subset of
> PG works, and it means that it is not useful for content development, such
> as SR from DP.  Kindle for iPad is a bit better in that they again take PG
> books, hack it to represent it as-if it comes from Amazon, and redistribute
> it from their servers -- but do it on a better reader app than iBooks. Which
> again means that they only serve up a subset of PG works, and it means that
> it is not useful for content development, such as SR from DP.
>
> > (Re Goodreader) You download it from the Apps Store. . . .
>
> But IT in turn cannot download ePub or MOBI books from a general location on
> the internet.
>
> > No, they just worked around their current version of Steve Jobs, such as
> working around IBM, then Apple, then Microsoft, and Intel, and AMD, Sony
> and all the rest. . . .
>
> Thinking back in time I think this was a somewhat true statement when app
> distribution was via computer stores.  Since the internet has caught on I
> haven't had problems distributing content nor apps to whoever I want.  The
> internet has a problem in that searching is via Google, and Google in turn
> does their own monopolistic practices, such as refusing to return a search
> "hit" on small websites even if you search on the exact name of that website
> -- unless one sends copious advertising dollars to Google.
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From hart at pglaf.org  Tue Apr 20 16:04:00 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Tue, 20 Apr 2010 16:04:00 -0700 (PDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <SNT120-DS17689D5DA8EADF9A4E6ACAE0A0@phx.gbl>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<4BCAEB9A.2040105@perathoner.de>
	<20100418150509.8A3501008D@cardano.dm.unipi.it>
	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>
	<4BCC9527.3000103@novomail.net>
	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191129450.7967@mail.pglaf.org>
	<SNT120-DS17689D5DA8EADF9A4E6ACAE0A0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004201559350.9849@mail.pglaf.org>


Once again I must insist Mr. Adcock stop putting words in my mouth.

You notice he doesn't put his answers in the context in which I asked them.

These products all "work as I suggested," they just NEVER WERE
INTENDED TO WORK AS MR. ADCOCK WOULD HAVE WANTED, AND HE KNEW
THIS BEFOREHAND. . .nothing new here, just the same old same old.

There are plenty of options to read in higher res than iPod/iPhone.

That was your original complaint.

There are plenty of PG eBooks.

Also part of your original complaint.

That you want to go behind the counter and rearrange things?

Sorry, it's their store, their counter, not at all up to you.

Make your own. . . .


On Tue, 20 Apr 2010, James Adcock wrote:

> >If only Jim would have been as thorough, and polite, about the iPad.
>
> I don't know what your problem is, but per your suggestions I went back to
> the Apple "brick and mortar" store yesterday, and spent an additional three+
> hours researching all the iPad suggestions you made.  None of them "worked"
> as you suggested.  None of them allow direct access to allow one to read
> ebook files either ePub or MOBI via wifi access to the internet.  None of
> them allow access to the most recent books in ePub format released on
> PG.  What almost all of them do is allow access to an internet server tied
> to that particular applet that allows one to read some subset of the PG
> offerings in some degraded form, typically a slightly spruced up "pretty
> print" version of a text file. This can simply be checked by searching for
> one of the latest PG books in which case you will find that NONE of these
> applets offer the latest PG books.
>
> Unlike most full function web browsers on desktops or even netbooks, if one
> uses the provided Safari web browser to click on an ePub or MOBI file Safari
> says "You cannot download that file."  You CAN use Safari to open a PDF
> file, which makes iPad useful for reading Google Books PDF "photocopies" of
> books.  Just not useful for what PG offers.  Also even IF you use Safari to
> open a PDF file, apparently Safari does not retain a copy of that book
> because when I run Safari again it redownloads that PDF file again from
> scratch.  Why does one care?  Well, that's slow, and it means that this
> doesn't work in "airplane mode" ie you cannot read the PDF file on an
> airplane via Safari with the wifi turned off.  The ability to actually
> download and save book files is a fundamental feature of "real" book
> readers, and in the design of "real" ebook formats such as ePub and MOBI,
> such that when you download a book, you HAVE that book, and can then read it
> wherever and whenever you choose without requiring wifi or other wired
> connection.
>
> What the Apple manuals say (finally released yesterday and read last night)
> is that one is allowed to transfer free book files in ePub format via USB
> cable from your desktop to your iPad via iTunes.  You download a free ePub
> book to your desktop.  From there you transfer it to the iTunes software.
> Then you hook up your iPad using a USB.  Then you sync to iTunes.  Then you
> safely unplug your iPad.  Then you open iBooks.  Then you find the new book
> in the iBooks "shelf" which you can then finally click on and start reading.
>
>
> As opposed to one click on a link to a free ePub or MOBI book link say at PG
> using a netbook browser, which downloads the file, stores it, opens the
> reader app and you are up and reading.  1 second versus 10 minutes of hassle
> factor.
>
> Further, the iPad manuals say that Apple has *permanently* given Apple
> applets priority over other applets for the file types that the iPad
> supports.  I.e., ePub type is "hardwired" to the iBooks applet, such that if you
> transfer an ePub file via the long-winded iTunes USB "sync" process then one
> can only read that ePub using the iBooks app -- which is a pretty weak app
> compared to other ePub and MOBI readers if one has made the comparison.
> [Imagine if Mickeysoft "hardwired" the HTML file type to the IE browser and
> allowed no other browser choice! Can you say "Monopoly," I knew you could]
>
> What CAN iPad do?
>
> It can reasonably present paid books from Apple on iBooks (not the greatest
> reader app, but not too horrible either)
>
> It can reasonably present a free subset of PG's offerings repackaged as-if
> they came from Apple on iBooks
>
> It can reasonably present paid books from Amazon via Kindle for iPad
>
> It can reasonably present a free subset of PG's offerings repackaged as-if
> they came from Amazon via Kindle for iPad
>
> If you have already bought books for a Kindle then Kindle for iPad will also
> allow you to read them for no additional cost on iPad
>
> It can reasonably present PDF books and documents via Safari as long as you
> have an active wifi connection
>
> It can store and allow you to read free ePub and other common document
> formats that you have transferred to iPad using the slow and cumbersome
> Desktop/USB/iTunes path. [At least the documentation claims this -- I cannot
> test it in the Apple store because they don't have USB to desktop set up]
>
> Is this all good or bad?  It depends on what you want to do.  If you simply
> want to be a passive consumer of content, similar to watching TV from your
> cable provider, then maybe it's fine.  If you want to be a CREATOR of
> content, such as someone who helps DP, SRs books from DP, "solos" books for
> PG, etc, then it's a pretty weak offering -- IMHO you would be much better
> off putting up with the hassles of a netbook which DOES allow one to quickly
> and painlessly transfer content using wifi. And if you are a reader omnivore
> like I am, then you will probably rapidly get sick of Jobs's monopolistic
> restrictions constantly getting in the way of your ability to quickly and
> easily download What you want from Where you want reading it with Whatever
> reader applet YOU damned well choose -- NOT Steve Jobs!
>
> Other reasonable approaches:
>
> Wait for the HP Slate and see how cobbled-up its touch abilities are.  At
> least it offers a REAL operating system -- why couldn't Apple have offered
> OS X on iPad ???
>
> Buy a netbook and put up with the keyboard hassles.
>
> Buy a Kindle and put up with the crappy web browser and slow-and-unreliable
> "whispernet" AT&T connection -- at least you get a good built-in reader app
> and good screen technology.
>
> Buy a low-cost generic reader such as Libre Pro
>
> Buy an iPhone and at least you're admitting you are reading on a cellphone and
> at least you are actually getting a cellphone--with the resulting
> compromises in space, speed, and OS.
>
> Wait and see if the next version of the OS for iPad is less compromised.
>
>
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
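The download-once-and-keep behavior Jim describes, as opposed to Safari re-fetching the PDF on every visit, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any of the iPad apps mentioned here work: `library_dir` is a hypothetical local bookshelf folder, and the file name is simply taken from the last component of the URL. The download goes to a `.part` file that is renamed only once complete, so a dropped connection never leaves a truncated book on the shelf.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_once(url, library_dir):
    """Return a local copy of url, downloading it only if it is not already stored."""
    library = Path(library_dir)
    library.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming rule: use the last path component of the URL.
    name = Path(urlparse(url).path).name or "book.epub"
    local = library / name
    if local.exists():
        # Already on the shelf: no network access needed ("airplane mode").
        return local
    partial = library / (name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(local)  # rename only after the whole file has arrived
    return local
```

A second call with the same URL returns the stored file without touching the network, which is the property that makes a reader usable with wifi turned off.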

From jimad at msn.com  Tue Apr 20 16:33:52 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 16:33:52 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <12ba6.1c940b.38ff760e@aol.com>
References: <12ba6.1c940b.38ff760e@aol.com>
Message-ID: <SNT120-DS2245A36A4B0A9D031FD9E2AE0A0@phx.gbl>

>according to its website, ibisreader can:
>1) fetch an .epub from any website, and
>2) download it so it can be read offline...
>jim claims neither of these things is true.


Well, you said I could try it from my desktop, and I did, and it didn't have
the capabilities claimed.  Rather, on an "experimental" basis it says if you
own a website and have a database set up the way they expect that database
to be set up, THEN you can add that website to their list of supported
websites - which currently is just Feedbooks.  And I'm guessing when they
say "download" they mean that if the source is in ePub then THEY will scarf
it to THEIR computer and then re-present it to you as HTML live to your
attached HTML browser for as long as you have an internet connection -
because that's what they do - they scarf and store the books on THEIR
computer "for you," re-present it as HTML, and then they claim this as a
"feature."


> anyone who makes a claim that a p.g. .txt file
> is "not an e-book file" is a full-on bloomin' idiot.

I can never resist having BB call me an idiot (and more recently Michael) so
here goes:

PG txt file is NOT AN E-BOOK FILE because it does not meet at least one
criterion that is universally accepted as being required of ebook file
formats: namely reflow.  Txt format can reflow, but PG txt format cannot
reflow because it has hardwired linebreaks at around 70 chars.  Yes, I know
that *in theory* if Apple, say (LOL), wanted to, they could write a PGTXT70
file format reader that would unwrap those line breaks more or less
successfully most of the time, but since the rest of the computer world sans
PG decided circa 1970 that hardwired linebreaks are A BAD IDEA, it seems
highly unlikely that Apple or anyone else is going back to the future to fix
PG's txt problems now.
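Unwrapping PG's fixed-width lines is mostly a matter of joining consecutive non-blank lines, since PG texts separate paragraphs with blank lines. Here is a minimal sketch of that "more or less successfully most of the time" approach. It assumes blank-line paragraph breaks and will wrongly join poetry, tables, letters, and any other lines whose breaks carry meaning, which is exactly why such a reader would only work most of the time.

```python
def unwrap_paragraphs(text):
    """Join hard-wrapped lines into one string per paragraph; blank lines end a paragraph."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```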

If you like, PG TXT format is a "teletype file" because its capabilities are
designed around the capabilities of teletypes circa 1970, which used ASCII
and had 72 chars per line. I for one thanked god when I got rid of my teletype
after it burned out the third time trying to do microprocessor development
circa 1976!  The technician couldn't understand why all the grease in there kept
getting baked into bricks - said AP never uses their machines this hard!

One good introductory read about what an eBook File IS can be found at:

http://en.wikipedia.org/wiki/EPub

Other characteristics uniformly expected of eBook files include:

Encapsulation: Download one file and you have all you need to read the book
in the future without a wireless connection, AKA "airplane mode".

Book Metadata:  Author, Title, TOC, Index, etc. at defined locations in a
defined manner such that any reader app or bookshelf app can display these
easily - without having to open and read the whole book.

Sure, one could define how one or more of these things are supposed to work,
and you could put it all in a zip file to encapsulate it, and then you can
just change the txt extensions to .html on these "txt" files and change the
.zip package extension to .epub, and then one would have, well, then I guess
one would have an epub, not a PG txt file anymore.
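The zip-plus-metadata packaging described above can be sketched with Python's `zipfile`. This is a bare-bones illustration of the OCF container rules (a `mimetype` entry stored first and uncompressed, and `META-INF/container.xml` pointing at the OPF package file) and of reading the title from the OPF metadata without touching the book text. The names `OEBPS/content.opf` and `text.html` are arbitrary choices. The result has not been run through epubcheck and omits the NCX table of contents that EPUB 2 requires, so treat it as a sketch, not a conforming file.

```python
import xml.etree.ElementTree as ET
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:creator>{author}</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{identifier}</dc:identifier>
  </metadata>
  <manifest>
    <item id="text" href="text.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def package_epub(path, title, author, identifier, xhtml):
    """Write a one-chapter EPUB-style zip: one download carries the text plus its metadata."""
    opf = OPF_TEMPLATE.format(title=escape(title), author=escape(author),
                              identifier=escape(identifier))
    with zipfile.ZipFile(path, "w") as z:
        # OCF rule: "mimetype" comes first and is stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        z.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        z.writestr("OEBPS/text.html", xhtml, compress_type=zipfile.ZIP_DEFLATED)


def read_title(path):
    """Find the OPF via container.xml and return dc:title, without reading the book text."""
    with zipfile.ZipFile(path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
        return opf.find("opf:metadata/dc:title", NS).text
```

`read_title` is the kind of lookup a bookshelf app can do cheaply, because the metadata sits at a defined location inside the package.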

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100420/d24a87f6/attachment.html>

From dakretz at gmail.com  Tue Apr 20 16:34:49 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 20 Apr 2010 16:34:49 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <12ead.2be71cd9.38ff76b3@aol.com>
References: <12ead.2be71cd9.38ff76b3@aol.com>
Message-ID: <r2m627d59b81004201634p995d3ccbjd8d1a66238e38932@mail.gmail.com>

Great news! Let's test this thesis.

I'm currently working through the very first Britannica project ever
- The Project Gutenberg Encyclopedia, Volume 1 of 28. It's
etext # 200, dated "1995-01-01". It's in sad shape. Text only,
many errors apparent to the casual eye. I'd like to reprocess
it.

Can anyone from DP tell me how to get the scans?


On Tue, Apr 20, 2010 at 2:29 PM, <Bowerbird at aol.com> wrote:

> dakretz said:
> >   Summary of the situation (as it seems to me).
> >   DP is currently taking too long to produce texts
> >   that are either less (plain-text)
> >   or more (DP-style HTML)
> >   than the supply chain is able to convey to the
> >   end-readers to deliver the experience intended.
>
> "less" and "more" are value-laden terms.
> _and_ incorrect.  please re-summarize...
>
>
>
> >   Upload in as complete form as possible
> >   the matching image and text files
> >   so future modification and adaptation is possible.
> >   There's no loss to DP by doing so
>
>
> d.p. has, in their hands, the scans from all the books
> that have gone through their system.  so they _could_
> have pushed them to project gutenberg at any time...
>
> indeed, charlz originally intended that d.p. itself would
> mount the scans.  he called it the "online library system",
> and at one point in time, it actually came into existence.
> (it's probably still there, with some 6,000 scan-sets in it.)
>
> why hasn't it been maintained?
>
> well, _i_ happen to think that it's pretty obvious.
>
> but maybe that's because of what i do with those scans:
> i use them to point to unequivocal evidence of _errors_
> in the "final product" emerging from the d.p. workflow.
>
> and that's what other people might do with them, too...
>
> does d.p. want us unequivocally pointing out their errors?
>
> no.
>
> ergo, they are keeping their scans to themselves...
>
> the myth of d.p. accuracy is one that keeps d.p. going...
> the powers-that-be over there do not want to put that
> myth up against _any_ solid evidence to the contrary...
>
> and it's not that hard to understand, either.  rfrank was
> eager to see the results of my check on the "sitka" book,
> at least when he thought that check would be _positive_.
> but when it was less than flattering, he clammed right up.
> it's hard for some people to admit they make mistakes...
> even if they can do it in a "general" way, when it comes to
> close-eyed examination of specifics, they're uninterested,
> and might even go to great lengths to suppress evidence...
>
>
>
> >   and the risk is that over time
> >   they are quite capable of losing track of them.
>
> they have the scans firmly in their grasp now,
> and they wish to retain control, so they simply
> are not worried about "losing track of them"...
>
> -bowerbird
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100420/a81bb42d/attachment.html>

From Morasch at aol.com  Tue Apr 20 16:48:26 2010
From: Morasch at aol.com (Morasch at aol.com)
Date: Tue, 20 Apr 2010 19:48:26 EDT
Subject: [gutvol-d] a longstanding question has finally been answered
Message-ID: <285bc.71539bf.38ff974a@aol.com>

>    "Then why in gawds name would anyone 
>    want to get an iPad as an ebook reader?"

people want the ipad for _lots_ of uses...
not just as an e-book reader-machine...

the fact that they get an e-book machine
thrown in, for free, is icing on the cake...
(and many of these users don't even read,
which means they don't like cake or icing.)

which reminds me...

remember back in the day, when one of the
most cherished merry-go-round topics on
every e-book listserve was whether people
would want a _dedicated_ e-book machine
or a _multipurpose_ one?   gosh, how many
pleasant afternoons were spent composing
posts on that dependable hobby-horse topic?

well, folks, the winner has now been decided.

amazon offered up a good dedicated machine.
and they've moved about 3 million units so far.
and they probably coulda moved twice as many
if they would have fixed the obvious problems.
throw in all the nooks and sonys, and we've got
a downright respectable total for _dedicated_...

but, on the other hand, however, we have apple.

the iphone/ipodtouch has sold 80 million so far.
and the ipad moved 300,000 units on pre-order
and first-weekend sales alone, if we trust apple.
and the 3g model i await isn't even available yet.

so now we know...   people prefer multi-purpose...

the winner has been declared.   which is not to say
that all you people with a kindle must send it back.
if you're happy with it, that's all that matters, really.

-bowerbird

From Morasch at aol.com  Tue Apr 20 17:05:09 2010
From: Morasch at aol.com (Morasch at aol.com)
Date: Tue, 20 Apr 2010 20:05:09 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <29487.1b1c9876.38ff9b35@aol.com>

jim said:
>    PG txt file is NOT AN E-BOOK FILE 
>    because it does not meet at least 
>    one criterion that is universally accepted 
>    as being required of ebook file formats: 
>    namely reflow.

jim, jim, jim, jim, jim, jim, jim.

it's bad enough that i call you a bloomin' idiot.

but it's even worse when you come right back
with a reply that _proves_ that's what you are.

one of the most widely-used e-book formats
in the last 20 years has been the .pdf format --
a format which has not, historically, done reflow.

yet you want to rule it out _by_definition_?

please.

i was _fighting_ against .pdf as an e-book format
for many, many years before you even showed up,
but even i cannot deny that it _is_ an e-book format.

_any_ file-format which can express a book _is_
-- or can be considered as -- an e-book format.

you seem to think you define terms of engagement,
that any discussion must be conducted according to
the way that _you_ define words.   that's bullcrap, jim.

***

besides, even if we _accepted_ your stupid definition,
it still doesn't compute, jim, because an ascii-file like
the p.g. e-text format _can_ be reflowed, quite easily.

you just take out the mid-paragraph hard line-breaks.

_any_ e-book programmer can write code to do that...

voila!   reflow!
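Here is a minimal sketch of that unwrapping step in Python. It is not code from this thread, and it assumes paragraphs are separated by blank lines, which is the usual layout of a plain-text e-text. The name `unwrap` is hypothetical.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumes (hypothetically) that paragraphs are separated by one or
    more blank lines, as in a typical plain-text e-text.
    """
    # Normalize line endings first; PG files commonly use CRLF.
    text = text.replace("\r\n", "\n")
    # Split into paragraphs on blank (empty or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within a paragraph, replace each hard line-break, together with
    # any surrounding spaces, by a single space.
    joined = [re.sub(r"\s*\n\s*", " ", p) for p in paragraphs]
    return "\n\n".join(joined)
```

Verse, tables and indented block quotes are also made of hard-broken lines, and this naive version would join them as well. That is why the approach works for ordinary prose but not for every file.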

-bowerbird

From Morasch at aol.com  Tue Apr 20 17:08:20 2010
From: Morasch at aol.com (Morasch at aol.com)
Date: Tue, 20 Apr 2010 20:08:20 EDT
Subject: [gutvol-d] oh geez
Message-ID: <29721.5b0df6c8.38ff9bf4@aol.com>

oh geez, now lee is gonna call me "mr. morasch" again...         :+)

-bowerbird

From jimad at msn.com  Tue Apr 20 17:10:56 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 17:10:56 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <alpine.DEB.2.00.1004201544050.9849@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>	<alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>	<SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>
	<alpine.DEB.2.00.1004201544050.9849@mail.pglaf.org>
Message-ID: <SNT120-DS8122EB2A61A6C5F79789BAE090@phx.gbl>

> The point is, and always has been, that Apple and the iPad NEVER
> were a real consideration for Mr. Adcock and he just keeps on in
> the process of making more and more OBVIOUS objections that were
> ALREADY OBVIOUS from the start.

You keep saying things that are not true of me Michael, and which are not
true of the iPad.  It certainly was not obvious to me that the iPad would
not allow download of ePub and MOBI via wifi.  It is also not true that the
iPad was not a real consideration for me, and it is also not true that I
wouldn't reconsider the iPad if the future OS is less restrictive.  I don't
understand why it is that *you* are so defensive about the iPad? Because you
bought one??? I buy a lot of Dell computers, but if someone states an
opinion that Dell is a load of cr*p then I'm not going to get bent out of
shape, and if someone says that Amazon or Mickeysoft have made a hell of a
lot of stupid decisions in their day -- well, I couldn't agree with that
more!


From jimad at msn.com  Tue Apr 20 17:27:20 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 17:27:20 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <r2m627d59b81004201634p995d3ccbjd8d1a66238e38932@mail.gmail.com>
References: <12ead.2be71cd9.38ff76b3@aol.com>
	<r2m627d59b81004201634p995d3ccbjd8d1a66238e38932@mail.gmail.com>
Message-ID: <SNT120-DS593578974E8989C364AA2AE090@phx.gbl>


> I'm currently working through the very first Britannica project ever -
> The Project Gutenberg Encyclopedia, Volume 1 of 28. It's
> etext # 200, dated "1995-01-01". It's in sad shape. Text only,
> many errors apparent to the casual eye. I'd like to reprocess
> it.

I can't tell you how to get the scans, but I have tools that will help you
recover the original line breaks and match the PG text against a new OCR,
helping identify errors in both the OCR and the existing PG text. Let me
know if you find the scans. Yes, I have tried this on a couple of texts already.
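Jim's tools are not shown in the thread. As a hypothetical illustration of the matching step only, Python's standard `difflib.SequenceMatcher` can compare the PG text with a fresh OCR word by word. Splitting on whitespace means the comparison ignores where the lines happen to break. The function name `word_discrepancies` is an assumption, and this sketch does not try to recover the original line breaks.

```python
import difflib


def word_discrepancies(pg_text: str, ocr_text: str):
    """List places where two transcriptions of the same text disagree.

    Both inputs are split on whitespace, so differences in line
    breaking are ignored and only the words themselves are compared.
    Returns (pg_words, ocr_words) string pairs, one per mismatched run.
    """
    pg_words = pg_text.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(a=pg_words, b=ocr_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Either side may be the one in error; the scan decides which.
            diffs.append((" ".join(pg_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return diffs
```

Each pair returned is a candidate error on one side or the other. A proofer then checks that spot against the page image.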

 

From Bowerbird at aol.com  Tue Apr 20 17:35:30 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 20 Apr 2010 20:35:30 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <1d280.45fdb600.38ffa252@aol.com>

dakretz said:
>    Great news! Let's test this thesis.
>    ...
>    etext # 200, dated "1995-01-01".
>    ...
>    Can anyone from DP tell me how to get the scans?

ok, two things...

first, it looks like i was wrong when i said
that d.p. had stopped maintaining the "ols",
so of course my "reason" for their having
stopped maintaining it was also incorrect.
(or one could say it's _no_longer_ correct,
but i do believe it was correct at one time.)

at any rate:
>    http://www.pgdp.org/ols

it claims 16,809 "unique books".

whether that means 16,809 scansets, i do not know.
but the scans for pg#31946 are right there, online...

second, the scan-sets from the very earliest books
were said to be "inconvenient to get to right now"
at one point in time.   whether they were located or
lost to the wind, i don't know.   but that _could've_
included pg#200.   the lowest p.g. numbers which
are shown as being included in "ols" presently are
pg#460 and pg#464, and four without any number.

but are you sure that d.p. actually digitized pg#200?

-bowerbird

From jimad at msn.com  Tue Apr 20 17:37:45 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 17:37:45 -0700
Subject: [gutvol-d] Re: a longstanding question has finally been answered
In-Reply-To: <285bc.71539bf.38ff974a@aol.com>
References: <285bc.71539bf.38ff974a@aol.com>
Message-ID: <SNT120-DS100782580070B12CD41D8EAE090@phx.gbl>


> the winner has been declared.  which is not to say
> that all you people with a kindle must send it back.
> if you're happy with it, that's all that matters, really.


Hey BB, IF you know the limitations of the iPad [or the Kindle for that
matter] and you're happy then that's all that matters, really.  My wife
could probably use one to watch reruns of House, probably would make her
happy - maybe I'll get her one for that purpose. You are blessed in that
indeed iPad will display txt files - linebreaks hardwired to 70 chars so you
won't be able to use the two-finger zoom feature.  It even has an applet for
editing txt files - not sure how well it's going to like the linebreaks. The
good news about ebook readers is that most people DO seem to like what they
end up buying - perhaps I'm unusual in seeing *what could have been*.  As
long as they read that's a good thing -- most of the people in the Apple
Store I went to clearly DON'T!


From jimad at msn.com  Tue Apr 20 17:49:17 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 17:49:17 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <29487.1b1c9876.38ff9b35@aol.com>
References: <29487.1b1c9876.38ff9b35@aol.com>
Message-ID: <SNT120-DS10C39FAC4A95AA62CFF1BEAE090@phx.gbl>

> one of the most widely-used e-book formats
> in the last 20 years has been the .pdf format --
> a format which has not, historically, done reflow.


And which is a format that is universally recognized to be a page layout
descriptor language, not an ebook file format.  PDF is a terrible thing to
try to read on an ebook reader, unless the page layout happens to
more-or-less match the size of your reader screen, and the size of the PDF
font happens to be close to something your eyes like.  People tend to print
PDF out if it's more than a few pages because it is so much more suitable to
a laser printer than to an ebook reader.  Google Books PDFs *do* often happen to
more-or-less match the size of the display on my DX, and then it's
not too bad - although you are still reading a blurry photocopy with an
occasional finger stuck in for good measure.

 
http://en.wikipedia.org/wiki/Portable_Document_Format

 

From jimad at msn.com  Tue Apr 20 17:56:06 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 20 Apr 2010 17:56:06 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <29487.1b1c9876.38ff9b35@aol.com>
References: <29487.1b1c9876.38ff9b35@aol.com>
Message-ID: <SNT120-DS15D0932BD993BAEBFEB327AE090@phx.gbl>


> besides, even if we _accepted_ your stupid definition,
> it still doesn't compute, jim, because an ascii-file like
> the p.g. e-text format _can_ be reflowed, quite easily.
> you just take out the mid-paragraph hard line-breaks.


And it will work most of the time.  Go ahead and write your reflow "txt
ebook reader" for the iPad -- ideally one that will allow downloading txt
from the internet to the iPad via wifi - I want to see it up on the Apple
Apps Store.  Charge a buck for it and see how many sell - I would be
curious. Maybe you'll end up a millionaire. I'll buy one even if I don't
have an iPad! ;-)

 

From dakretz at gmail.com  Tue Apr 20 19:12:06 2010
From: dakretz at gmail.com (don kretz)
Date: Tue, 20 Apr 2010 19:12:06 -0700
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <1d280.45fdb600.38ffa252@aol.com>
References: <1d280.45fdb600.38ffa252@aol.com>
Message-ID: <i2u627d59b81004201912qf62e5d02q4c89a30f10948f82@mail.gmail.com>

Interesting page - I've never seen it before. Wonder what it's for?

The Britannica projects I can actually find fall into two groups.

1. The group where I can't see the images because

    Available Formats:    Display of images from this source has not been
    permitted.

2. The group where I can see a page-at-a-time image viewer, but not a single
image actually shows up. They all get that thing you get when there's a url,
but the file is missing.

So zero for sixteen or so.

But I guess the intent was good. Maybe they are all working their way
through a queue somewhere.


On Tue, Apr 20, 2010 at 5:35 PM, <Bowerbird at aol.com> wrote:

> dakretz said:
> >   Great news! Let's test this thesis.
> >   ...
>
> >   etext # 200, dated "1995-01-01".
> >   ...
> >   Can anyone from DP tell me how to get the scans?
>
> ok, two things...
>
> first, it looks like i was wrong when i said
> that d.p. had stopped maintaining the "ols",
> so of course my "reason" for their having
> stopped maintaining it was also incorrect.
> (or one could say it's _no_longer_ correct,
> but i do believe it was correct at one time.)
>
> at any rate:
> >   http://www.pgdp.org/ols
>
> it claims 16,809 "unique books".
>
> whether that means 16,809 scansets, i do not know.
> but the scans for pg#31946 are right there, online...
>
> second, the scan-sets from the very earliest books
> were said to be "inconvenient to get to right now"
> at one point in time.  whether they were located or
> lost to the wind, i don't know.  but that _could've_
> included pg#200.  the lowest p.g. numbers which
> are shown as being included in "ols" presently are
> pg#460 and pg#464, and four without any number.
>
> but are you sure that d.p. actually digitized pg#200?
>
> -bowerbird

From kevin.pulliam at gmail.com  Tue Apr 20 21:07:30 2010
From: kevin.pulliam at gmail.com (Kevin Pulliam)
Date: Tue, 20 Apr 2010 23:07:30 -0500
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <1d280.45fdb600.38ffa252@aol.com>
References: <1d280.45fdb600.38ffa252@aol.com>
Message-ID: <v2n78defb41004202107g353bc87di3ed0b2e43776d349@mail.gmail.com>

On the Open Library System, I note that high resolution gray-scale
scans (at least for the one project I checked) are not archived,
though the black and white scans are (though the example I checked,
the Astounding Magazine scans were actually microfilm scans IIRC,
which was a strange case but also what made the higher resolution
scans helpful).  I also note that there is no 'bulk' download function
to get a zip of all the files associated with a text.

But then again, something is better than nothing.

Kevin

On Tue, Apr 20, 2010 at 7:35 PM,  <Bowerbird at aol.com> wrote:
SNIP
>
> first, it looks like i was wrong when i said
> that d.p. had stopped maintaining the "ols",
> so of course my "reason" for their having
> stopped maintaining it was also incorrect.
> (or one could say it's _no_longer_ correct,
> but i do believe it was correct at one time.)
>
> at any rate:
> >   http://www.pgdp.org/ols
>
SNIP
>
> -bowerbird

From hart at pglaf.org  Wed Apr 21 01:08:52 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 01:08:52 -0700 (PDT)
Subject: [gutvol-d] Re: a longstanding question has finally been answered
In-Reply-To: <285bc.71539bf.38ff974a@aol.com>
References: <285bc.71539bf.38ff974a@aol.com>
Message-ID: <alpine.DEB.2.00.1004210105460.24234@mail.pglaf.org>

> the winner has been declared.  which is not to say
> that all you people with a kindle must send it back.
> if you're happy with it, that's all that matters, really.
>
> -bowerbird

I'll take that bet. . .another free lunch of cooked fowl.

I'll bet that the iPad never even gets HALF the market for
eReaders over the long haul.

mh

From hart at pglaf.org  Wed Apr 21 01:33:26 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 01:33:26 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS8122EB2A61A6C5F79789BAE090@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>
	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>
	<SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>
	<alpine.DEB.2.00.1004201544050.9849@mail.pglaf.org>
	<SNT120-DS8122EB2A61A6C5F79789BAE090@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004210111010.24234@mail.pglaf.org>


On Tue, 20 Apr 2010, James Adcock wrote:

> >The point is, and always has been, that Apple and the iPad NEVER
> were a real consideration for Mr. Adcock and he just keeps on in
> the process of making more and more OBVIOUS objections that were
> ALREADY OBVIOUS from the start.
>
> You keep saying things that are not true of me Michael, and which are not
> true of the iPad.  It certainly was not obvious to me that the iPad would
> not allow download of ePub and MOBI via wifi.

1.  I think most of what you get on iPad IS .epub, is it not?

2.  I think Apple made it pretty obvious about other formats to most.


> It is also not true that the iPad was not a real consideration for me, and

I only have your own words upon which to base such.

I wrote an extended piece about it, but our CEO has asked me to tone
down my responses to you, even if you don't, so I didn't send it.

However, if you ask for it I will ask him to reconsider his request.


> it is also not true that I wouldn't reconsider the iPad if the future OS is

As if any Apple OS, other than UNIX based, has been so.


> less restrictive.  I don't understand why it is that *you* are so defensive
> about the iPad? Because you bought one???

I hate defending Apple, or any other billion dollar organization.

However, when you come out and give the iPod Stanza app example--
well--someone has to immediately answer THIS IS NOT THE CASE!!!

I provided several such examples, with no thanks for my effort.


However, when you come out and say you cannot download PG files--
well--someone has to immediately come out and download PG files!

No, not all formats, and certainly not all files, but the blanket
statement that it cannot be done only requires ONE example to get
proven false.  I provided just such examples.  With no thanks.

However, you did finally say thanks for at least one thing, and I
can't say you haven't given any thanks at all, but you certainly,
we all must admit, have not been encouraging my efforts.

Unless you think I thrive on discouraging remarks.

> I buy a lot of Dell computers, but if someone states an opinion that Dell is
> a load of cr*p then I'm not going to get bent out of shape, and if someone
> says that Amazon or Mickeysoft have made a hell of a lot of stupid decisions
> in their day -- well, I couldn't agree with that more!

I'm just trying to balance out some rather general complaints you
have made with some rather specific contradictions.

If you had asked, "How can I. . ." instead of your blanket typing
"you can't. . ." you might have gotten something a bit different.

However, blanket statements and single examples deserve to be proven
incorrect in direct fashion when it is so obvious.

Let's face it, you CAN get higher-resolution eBook performance on
iPads than with your example of the iPod Stanza app, and in quite
a few different apps that are free of charge.

Let's face it, you CAN go directly to pglaf.org and get eBooks.
No, not all formats, and who knows if all titles, but lots.

Let's face it, you didn't even try iBooks the first four hours.

You didn't seem to want to try Wattpad, either.

It's hard to consider your research as open when it's like this.

I spent a lot of time and effort working to answer your questions
and when I stated simple results of simple experiments you said I
was flaming and trashing you.

If someone says 2+2 is not 4, I have a right to challenge that in
plain sight without being accused of flaming or trashing.



From hart at pglaf.org  Wed Apr 21 01:39:29 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 01:39:29 -0700 (PDT)
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <29487.1b1c9876.38ff9b35@aol.com>
References: <29487.1b1c9876.38ff9b35@aol.com>
Message-ID: <alpine.DEB.2.00.1004210134390.24234@mail.pglaf.org>


Hey, when the first eBooks came out, they were all .txt files.

Now someone wants to rewrite history and say they are NOT???

Because of easily strippable hard returns???

Reflow didn't even EXIST in those days.

Much less WYSIWYG!!!

And WYSIWYG doesn't allow reflow. . .unless you consider that
as after the fact. . . .

In fact this whole discussion is after the fact.

eBooks have been around so much longer than these other ideal
presentations. . .so let the new presentations use new names,
and leave eBooks to the people who have been doing them.

Don't let anyone co-opt the name eBook!

Maybe they were right, and I should have trademarked "eBook,"
and then there would be no discussion about using the word.

Sheesh!


On Tue, 20 Apr 2010, Morasch at aol.com wrote:

> jim said:
> >   PG txt file is NOT AN E-BOOK FILE
> >   because it does not meet at least
> >   one criterion that is universally accepted
> >   as being required of ebook file formats:
> >   namely reflow.
>
> jim, jim, jim, jim, jim, jim, jim.
>
> it's bad enough that i call you a bloomin' idiot.
>
> but it's even worse when you come right back
> with a reply that _proves_ that's what you are.
>
> one of the most widely-used e-book formats
> in the last 20 years has been the .pdf format --
> a format which has not, historically, done reflow.
>
> yet you want to rule it out _by_definition_?
>
> please.
>
> i was _fighting_ against .pdf as an e-book format
> for many, many years before you even showed up,
> but even i cannot deny that it _is_ an e-book format.
>
> _any_ file-format which can express a book _is_
> -- or can be considered as -- an e-book format.
>
> you seem to think you define terms of engagement,
> that any discussion must be conducted according to
> the way that _you_ define words.  that's bullcrap, jim.
>
> ***
>
> besides, even if we _accepted_ your stupid definition,
> it still doesn't compute, jim, because an ascii-file like
> the p.g. e-text format _can_ be reflowed, quite easily.
>
> you just take out the mid-paragraph hard line-breaks.
>
> _any_ e-book programmer can write code to do that...
>
> voila!   reflow!
>
> -bowerbird
>
>

From hart at pglaf.org  Wed Apr 21 01:46:02 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 01:46:02 -0700 (PDT)
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <SNT120-DS2245A36A4B0A9D031FD9E2AE0A0@phx.gbl>
References: <12ba6.1c940b.38ff760e@aol.com>
	<SNT120-DS2245A36A4B0A9D031FD9E2AE0A0@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004210143320.24234@mail.pglaf.org>


Funny how we can have so many people arguing that we should
be preserving the layout of paper books and at the same time
we have so much about getting rid of line breaks. . . .

However, once again I must comment that the amount of time
spent on discussion would easily have made the conversions.


From Bowerbird at aol.com  Wed Apr 21 01:59:34 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 04:59:34 EDT
Subject: [gutvol-d] Re: Typesetting
Message-ID: <15e6.16e10517.39001876@aol.com>

the original linebreaks _should_ be preserved.

because some people _want_ them.

and those original linebreaks _should_ be easy
to remove as well.

because some people _want_ to remove them.

what nobody wants -- not really -- is a set of
_new_ linebreaks, which have no legacy import.

but even those are bearable, _if_ they can be
easily removed.

and let us recall, again, that project gutenberg
has _not_ made available a web-service which
people can utilize to unwrap p.g. e-texts...

_i_ have created such a web-service.

but project gutenberg has not.

which is a minor failing.

(i'd be happy to provide my code, if you want it.)

and let us recall, again, that project gutenberg
does _not_ ensure that every one of its e-texts is
structured so that it can be unwrapped properly.

this one is a _major_ failing.

these are the two things that project gutenberg
must do if it wants to proclaim that it has done
all that it can to make its linebreaks a non-issue.

-bowerbird

From hart at pglaf.org  Wed Apr 21 02:26:46 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 02:26:46 -0700 (PDT)
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <15e6.16e10517.39001876@aol.com>
References: <15e6.16e10517.39001876@aol.com>
Message-ID: <alpine.DEB.2.00.1004210225240.28158@mail.pglaf.org>


Let's have the code, and install it where everyone can find/use it.

Right away, without further ado. . . .

i.e. send the code now

Help Newby set it up later.


On Wed, 21 Apr 2010, Bowerbird at aol.com wrote:

> the original linebreaks _should_ be preserved.
>
> because some people _want_ them.
>
> and those original linebreaks _should_ be easy
> to remove as well.
>
> because some people _want_ to remove them.
>
> what nobody wants -- not really -- is a set of
> _new_ linebreaks, which have no legacy import.
>
> but even those are bearable, _if_ they can be
> easily removed.
>
> and let us recall, again, that project gutenberg
> has _not_ made available a web-service which
> people can utilize to unwrap p.g. e-texts...
>
> _i_ have created such a web-service.
>
> but project gutenberg has not.
>
> which is a minor failing.
>
> (i'd be happy to provide my code, if you want it.)
>
> and let us recall, again, that project gutenberg
> does _not_ ensure that every one of its e-texts is
> structured so that it can be unwrapped properly.
>
> this one is a _major_ failing.
>
> these are the two things that project gutenberg
> must do if it wants to proclaim that it has done
> all that it can to make its linebreaks a non-issue.
>
> -bowerbird
>
>

From marcello at perathoner.de  Wed Apr 21 03:05:19 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 21 Apr 2010 12:05:19 +0200
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <alpine.DEB.2.00.1004210134390.24234@mail.pglaf.org>
References: <29487.1b1c9876.38ff9b35@aol.com>
	<alpine.DEB.2.00.1004210134390.24234@mail.pglaf.org>
Message-ID: <4BCECDDF.4030507@perathoner.de>

Michael S. Hart wrote:

> Maybe they were right, and I should have trademarked "eBook,"
> and then there would be no discussion out using the word.

Rewriting history again?  Your name for the beast was "etext".


This is the oldest file by timestamp we have in the archive

   http://www.gutenberg.org/dirs/2/25/old/world91a.txt

and it contains no reference to "ebook".


-- 
Marcello Perathoner
webmaster at gutenberg.org

From greg at durendal.org  Wed Apr 21 04:29:40 2010
From: greg at durendal.org (Greg Weeks)
Date: Wed, 21 Apr 2010 07:29:40 -0400 (EDT)
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <v2n78defb41004202107g353bc87di3ed0b2e43776d349@mail.gmail.com>
References: <1d280.45fdb600.38ffa252@aol.com>
	<v2n78defb41004202107g353bc87di3ed0b2e43776d349@mail.gmail.com>
Message-ID: <alpine.DEB.2.00.1004210727560.6044@durendal.durendal.org>

On Tue, 20 Apr 2010, Kevin Pulliam wrote:

> On the Open Library System, I note that high resolution gray-scale
> scans (at least for the one project I checked) are not archived,
> though the black and white scans are (though the example I checked,
> the Astounding Magazine scans were actually microfilm scans IIRC,
> which was a strange case but also what made the higher resolution
> scans helpful).  I also note that there is no 'bulk' download function
> to get a zip of all the files associated with a text.

In the interest of having the high-res raw gray scans available I put them 
on the Internet Archive before they went to DP.

-- 
Greg Weeks
http://durendal.org:8080/greg/


From hart at pglaf.org  Wed Apr 21 09:15:45 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 09:15:45 -0700 (PDT)
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <4BCECDDF.4030507@perathoner.de>
References: <29487.1b1c9876.38ff9b35@aol.com>
	<alpine.DEB.2.00.1004210134390.24234@mail.pglaf.org>
	<4BCECDDF.4030507@perathoner.de>
Message-ID: <alpine.DEB.2.00.1004210914410.9864@mail.pglaf.org>


Once again Marcello [intentionally?] misses the point!!!

No reason I couldn't have trademarked "ebook," too, is there?


On Wed, 21 Apr 2010, Marcello Perathoner wrote:

> Michael S. Hart wrote:
>
> > Maybe they were right, and I should have trademarked "eBook,"
> > and then there would be no discussion out using the word.
>
> Rewriting history again?  Your name for the beast was "etext".
>
>
> This is the oldest file by timestamp we have in the archive
>
>   http://www.gutenberg.org/dirs/2/25/old/world91a.txt
>
> and it contains no reference to "ebook".
>
>
>

From lee at novomail.net  Wed Apr 21 09:58:17 2010
From: lee at novomail.net (Lee Passey)
Date: Wed, 21 Apr 2010 10:58:17 -0600
Subject: [gutvol-d] Tidy -c and tables
Message-ID: <4BCF2EA9.1000402@novomail.net>

OK guys, we have a problem.

When one uses the "--clean" option, tidy removes any "<center>" elements 
and replaces them with "<div class='c1'>", and adds "div.c1 {text-align: 
center}" to the internal style sheet. This seems reasonable, because 
according to the HTML spec, "The CENTER element is exactly equivalent to 
specifying the DIV element with the align attribute set to 'center'." In 
a bit of a chained dependency, it turns out the "align" attribute is 
/also/ deprecated in favor of the CSS "text-align" style. So Tidy's 
behavior is completely consistent with the HTML spec, and in theory 
should cause no presentational differences before and after a page is 
Tidy'ed.
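For illustration, here is a rough Python imitation of the rewrite Lee describes `tidy --clean` performing. It is regex-based and drops any attributes on `<center>`. It is not Tidy's actual C implementation, and `clean_center` is a hypothetical name.

```python
import re

# The style rule Lee reports Tidy adding to the internal style sheet.
CENTER_RULE = "div.c1 {text-align: center}"


def clean_center(html: str) -> tuple[str, str]:
    """Replace <center>...</center> with <div class='c1'>...</div>.

    Returns the rewritten markup plus the CSS rule to add to the
    internal style sheet (empty if no <center> element was found).
    """
    out, opened = re.subn(r"<center\b[^>]*>", "<div class='c1'>", html,
                          flags=re.IGNORECASE)
    out = re.sub(r"</center\s*>", "</div>", out, flags=re.IGNORECASE)
    css = CENTER_RULE if opened else ""
    return out, css
```

`text-align` is an inherited property, so the generated rule also reaches the text inside any nested `<td>`. That is exactly where browsers render the two forms differently.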

In theory, there is no difference between theory and reality; in 
reality, there is.

Consider the following snippet:

<center>
   <table>
     <tr>
       <td>
         line one<br />
         a longer line two<br />
         a very much longer line three
       </td>
     </tr>
   </table>
</center>

Using my four test browsers, Firefox 3.5, IE 8, Opera 9 and Safari 4, in 
each case the above table was centered in the browser, but the text inside 
the table data element remained left-justified.

When I changed the "<center>" element to "<div style='text-align: 
center'>" the text inside the table data element became centered as 
well. This is the behavior I would expect; the whole notion of 
"Cascading" in CSS indicates that styles continue down the tree until 
changed. But it does illustrate the fact that there is a distinction 
between centering an /element/ (in this case the table), and centering 
the text /inside/ an element. So while, in theory, the "<center>" 
element should be equivalent to "<div style='text-align:center'>", in 
practice it seems that not only are they not equivalent in /some/ 
browsers, they are not equivalent in /any/ browser.

I believe one of our design goals was that Tidy would make no change to 
otherwise valid HTML that would cause it to render differently using 
browser defaults after Tidying. Thus, empty paragraphs, which are 
forbidden, are converted to /two/ "<br />" elements, to match the default 
paragraph presentation in browsers.
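A minimal sketch of that empty-paragraph rule, in Python for illustration only. It is not Tidy's own code (Tidy is written in C); the function name `replace_empty_paragraphs` is hypothetical, and the regular expression is an assumption that only catches `<p>` elements containing nothing but whitespace.

```python
import re

# Matches <p> or <p attr="..."> followed only by optional whitespace and </p>.
EMPTY_P = re.compile(r"<p(\s[^>]*)?>\s*</p>", re.IGNORECASE)

def replace_empty_paragraphs(html: str) -> str:
    """Swap each empty paragraph for two <br /> elements, as the
    Tidy behaviour described above does."""
    return EMPTY_P.sub("<br /><br />", html)

print(replace_empty_paragraphs("<p>one</p>\n<p> </p>\n<p>two</p>"))
# <p>one</p>
# <br /><br />
# <p>two</p>
```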

Leaving aside the fact that the use of tables to control layout is 
simply morally reprehensible, the fact is that there are many, many pages 
'in the wild' that do so. And Tidy's current behavior will cause those 
pages' presentations to change after running Tidy. I think that in this 
case we have not met our design goal.

Now I can fix the code so that this doesn't happen in the future, if 
only I knew what the right fix /is/. I could simply remove "center" from 
the list of elements that get 'cleaned', and print a warning that the 
resulting contains elements that are deprecated (this warning probably 
ought to be there whenever deprecated elements remain in the output). Or 
I could focus more directly on this specific issue and whenever a 
"<table>" is a descendant of a "<center>" element I could add 
"style='text-align:left'" to the "<table>" element (assuming a 
"text-align" style is not already attached to that element) /before/ 
cleaning (both styles should then be moved to the internal style sheet). 
Or perhaps there is yet another solution that I haven't thought of? I 
don't think that simply telling the end user "your HTML doesn't follow 
the rules; we could fix it but we won't" is an option; after all, that's 
what Tidy is for, right?
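The second option could be prototyped roughly as follows. This is a hypothetical Python sketch over well-formed, namespace-free XHTML parsed with the standard library's ElementTree, not a patch to Tidy itself; the function name `protect_centered_tables` is made up, and the `"text-align" in style` test is a deliberately crude check for an existing declaration.

```python
import xml.etree.ElementTree as ET

def protect_centered_tables(root: ET.Element) -> int:
    """Add text-align:left to every <table> nested inside a <center>,
    unless the table already sets text-align. Returns how many changed."""
    changed = 0
    for center in root.iter("center"):
        for table in center.iter("table"):
            style = table.get("style", "")
            if "text-align" in style:
                continue  # already styled (or reached again via a nested center)
            # Keep any existing declarations, then append ours.
            decls = [d.strip() for d in style.split(";") if d.strip()]
            decls.append("text-align:left")
            table.set("style", "; ".join(decls))
            changed += 1
    return changed

snippet = "<body><center><table><tr><td>line one</td></tr></table></center></body>"
root = ET.fromstring(snippet)
print(protect_centered_tables(root))  # 1
print(ET.tostring(root, encoding="unicode"))
# <body><center><table style="text-align:left"><tr><td>line one</td></tr></table></center></body>
```

Moving the resulting inline styles into the internal style sheet, as the message suggests, would be a separate pass.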

So, what should I do?

ps. I don't like the behavior that the "--drop-font-tags" option also 
drops "<center>" elements; page layout is not in the same classification 
as font appearance, and I can envision situations where I would want to 
drop "<font>" elements but retain "<center>" elements. But that is an 
argument for another day.


From lee at novomail.net  Wed Apr 21 10:13:29 2010
From: lee at novomail.net (Lee Passey)
Date: Wed, 21 Apr 2010 11:13:29 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCD8BA4.6070803@perathoner.de>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<SNT120-DS3307B0B80EC34B96B1626AE0B0@phx.gbl>	<4BCC9527.3000103@novomail.net>	<SNT120-DS18E74382C2C41139CD70B1AE0B0@phx.gbl>	<4BCCBA89.2040103@novomail.net>
	<4BCD8BA4.6070803@perathoner.de>
Message-ID: <4BCF3239.6000508@novomail.net>

On 4/20/2010 5:10 AM, Marcello Perathoner wrote:

> Lee Passey wrote:
>
>> creates a "<div class='c1'>" around the tables of contents and
>> illustrations, with a corresponding style sheet that centers the
>> contents (which it should not),
>
> HTML Tidy does that.

You are correct. There is apparently a disconnect between the official 
HTML specification for the "<center>" element and the implementation on 
all major browsers. For informational purposes, I have CC'ed this list 
with my message to the Tidy developers list on SourceForge. Until I get 
the matter resolved, I would recommend you /not/ use the --clean option 
with Tidy.

> Direct your complaints to the w3c.

Why? They wouldn't and couldn't do anything about it.

Tidy was developed by a member of the W3C, but he has long since 
abandoned any involvement with the project. Today, the Tidy project is 
an independent project based at 
http://www.sourceforge.net/projects/tidy. If you come across a bug in 
Tidy (or wish an enhancement), please log it at 
http://sourceforge.net/tracker/?group_id=27659&atid=390963.


From lee at novomail.net  Wed Apr 21 10:50:00 2010
From: lee at novomail.net (Lee Passey)
Date: Wed, 21 Apr 2010 11:50:00 -0600
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <4BCDC069.2050608@yahoo.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>	<4BCAEB9A.2040105@perathoner.de>	<20100418150509.8A3501008D@cardano.dm.unipi.it>	<20100418170536.GA22578@pglaf.org>	<20100419021856.6C702100B0@cardano.dm.unipi.it>	<4BCC1F14.1090801@perathoner.de>	<alpine.DEB.2.00.1004190655470.28647@mail.pglaf.org>	<4BCC7B05.5020506@yahoo.com>	<4BCC9478.5040204@perathoner.de>
	<4BCDC069.2050608@yahoo.com>
Message-ID: <4BCF3AC8.5070405@novomail.net>

On 4/20/2010 8:55 AM, Julia C. Miller wrote:
>
> On 4/19/2010 12:35 PM, Marcello Perathoner wrote:
>
>> Julia C. Miller wrote:
>>
>>> In order for a "paradigm shift" to happen at DP, PG has to define
>>> what is and is not acceptable in the HTML and spell it out so that DP
>>> can put it into practice.
>>
>> It would be much better if DP did that.
>
> So after DP goes through the time and effort to define the standards to
> upload to PG, people from PG can say "No, that's not what we want"?

Sure. They can do that now with any of DP's offerings. But they won't. 
With the exception of Mr. Perathoner, I would be surprised if there were 
any of the Powers That Be at PG who know enough about HTML to be able to 
determine if an HTML file were "good" or "bad." And there are plenty of 
"bad" HTML files in the PG archive already.

If DP were to develop standards for HTML files, they would become the 
/de facto/ HTML standard for PG, although no one but DP would actually 
enforce them. If you can help convince DP to establish HTML guidelines 
and standards, I think you ought to try, if for no other reason than to 
produce guidelines that can be used independently of DP. DP is moribund, 
but not nearly as moribund as PG.

From jimad at msn.com  Wed Apr 21 12:06:17 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 12:06:17 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <alpine.DEB.2.00.1004210111010.24234@mail.pglaf.org>
References: <3426.7f772dcd.38fa3269@aol.com>	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>	<4BC98D3A.6080908@perathoner.de>	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>	<alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>	<SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>	<alpine.DEB.2.00.1004201544050.9849@mail.pglaf.org>	<SNT120-DS8122EB2A61A6C5F79789BAE090@phx.gbl>
	<alpine.DEB.2.00.1004210111010.24234@mail.pglaf.org>
Message-ID: <SNT120-DS1906A5A524F13D6D5DC977AE090@phx.gbl>

> I spent a lot of time and effort working to answer your questions
> and when I stated simple results of simple experiments you said I
> was flaming and trashing you.

And I spent a lot of time and effort and a little bit of money in an Apple
Store trying out what you suggested and it didn't work.

Yes, I can download something from PG, just not ePub or MOBI.

Yes, Apple provides something on iBooks, just not something with the PG name
in it.

Yes, Apple provides something on the iPad, but there is no way to use wifi to
do content development in ePub or MOBI, aka SR for DP or solos.

Yes, you can overcome these limitations if you use USB instead of wifi, but I
thought the whole point of the iPad, at least from my point of view, is that
it HAS wifi.  Well, so does a nook, and a nook doesn't allow you to use it
either.

Etc.

I think I was pretty clear about what I wanted, and you kept claiming the iPad
could do it, and I kept trying it, and guess what: it can't -- at least not
by way of any of your suggestions, nor by way of anything else listed under
"books" or "ebooks" in the Apple App Store. I've tried about two dozen
applets by now, including the ones you suggested. I think it's fair to say
I've wasted much more time on this subject by now and done much more
research into it than you have, so it's not clear to me what *you* are
complaining about!


From hart at pglaf.org  Wed Apr 21 12:14:04 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 12:14:04 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS1906A5A524F13D6D5DC977AE090@phx.gbl>
References: <3426.7f772dcd.38fa3269@aol.com>
	<SNT120-DS2573E51EDABAAA90BE59E6AE0D0@phx.gbl>
	<4BC98D3A.6080908@perathoner.de>
	<SNT120-DS81A9933399358A005FD8FAE0D0@phx.gbl>
	<alpine.DEB.2.00.1004171631270.27333@mail.pglaf.org>
	<SNT120-DS16235B821B2DBAC3144550AE0C0@phx.gbl>
	<alpine.DEB.2.00.1004180221360.7683@mail.pglaf.org>
	<SNT120-DS13A0AAF21D124ECD266F59AE0B0@phx.gbl>
	<alpine.DEB.2.00.1004190702450.28647@mail.pglaf.org>
	<SNT120-DS22920359DA9212E384994FAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191032210.7967@mail.pglaf.org>
	<SNT120-DS207B6B0E47EE7552103B6DAE0B0@phx.gbl>
	<alpine.DEB.2.00.1004191355110.16824@mail.pglaf.org>
	<SNT120-DS13790D997AC3DEDC39B88AAE0A0@phx.gbl>
	<alpine.DEB.2.00.1004201544050.9849@mail.pglaf.org>
	<SNT120-DS8122EB2A61A6C5F79789BAE090@phx.gbl>
	<alpine.DEB.2.00.1004210111010.24234@mail.pglaf.org>
	<SNT120-DS1906A5A524F13D6D5DC977AE090@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004211209200.19972@mail.pglaf.org>


On Wed, 21 Apr 2010, James Adcock wrote:

> >I spent a lot of time and effort working to answer your questions
> and when I stated simple results of simple experiments you said I
> was flaming and trashing you.
>
> And I spent a lot of time and effort and a little bit of money in an Apple
> Store trying out what you suggested and it didn't work.
>
> Yes I can download something from PG, just not ePub nor MOBI.

You are either not reading my reports, or ignoring them.

Go back and try again, otherwise I'll just let you talk yourself out,
as has been suggested to me already.


> Yes Apple provides something on iBooks, just not something with the PG name
> in it.

As above.  You're just not trying what I suggested.

You are doing something else, then complaining it didn't work.

You are obviously just not willing to put in the effort, neither on
your iPad research, nor in holding up your end of the conversation.

This has been a VERY BUSY WEEK/MONTH for me, and I have given you in
excess of what it appears I should have.

My apologies. . .I am sure you, and others, would have been happier
had I simply ignored you, which I will start to do now, as advised.


First of all, please let me apologize for having been so busy;
it has been an incredible few weeks leading up to and now after
my first university-wide acceptance of my work, and a speech I
gave about a week ago, from which I still have not caught up all
the way to my normal energetic levels.

I'm still catching up on my sleeping, even sleeping through an
earlier half of the garage sales.  If you did not know, garage
sales are pretty much my favorite thing to do along with work.

Therefore, my messages may have been entirely too brief, or to
the point, or not full of the materials I am world renowned to
borrow at length from "The Tact and Diplomacy Department."

If you look up the motto of the department, it is so obvious.

Meanwhile, I am now trying to make contact with all those whom
I promised I would about a week ago, none of whom have done it
in my direction, so I really have no idea if YOU were serious,
when it came to continuing our discussion.

Normally I presume if someone has not contacted me in a week--
they are not interested at all--and waiting additional weeks--
rarely proves otherwise.

However. . .MY INTEREST has not waned. . . .

So, if you are willing to pursue our conversation further just
let me know, and if not, no reply is required.

After all, it IS "The Year of the eBook," and I expect busy to
busier to busiest, when it comes to all the years of my life.

If you would like to keep up with my thoughts and events I can
put you on a list I send to at odd times with even odder junk.

Again, if not, no reply is required, no offense taken.


It was very nice talking with you,


Michael S. Hart
Founder
Project Gutenberg,
Inventor of eBooks


> Yes Apple provides something on iPad but just no way to use wifi to do
> content dev in ePUB or MOBI aka SR for DP or solos.
>
> Yes you can overcome these limitations if you use USB instead of wifi but I
> thought the whole point of iPad at least from my point of view is that it
> HAS wifi.  Well, so does a nook and a nook doesn't allow you to use it
> either.
>
> Etc.
>
> I think I was pretty clear about what I wanted, and you kept claiming iPad
> could do it, and I kept trying it, and guess what it can't -- at least not
> by way of any of your suggestions, nor by way of anything else listed under
> "books" or "ebooks" in the Apple App Store. I've tried about two dozen
> applets by now including the ones you suggested. I think its fair to say
> I've wasted much more time on this subject by now and done much more
> research into it than you have, so its not clear to me what *you* are
> complaining!
>

From jimad at msn.com  Wed Apr 21 12:31:48 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 12:31:48 -0700
Subject: [gutvol-d] Re: Typesetting
In-Reply-To: <15e6.16e10517.39001876@aol.com>
References: <15e6.16e10517.39001876@aol.com>
Message-ID: <SNT120-DS1193D568973FDC2DE7B36DAE090@phx.gbl>

>but even those are bearable, _if_ they can be
> easily removed.


The linebreaks are removable if PG enforces standards on the txt files
submitted.  When people make mistakes on those submissions, and they will,
the linebreaks will not be removed correctly.  Books of poetry,
or books containing poetry, are one common counterexample.  BB, make a copy
of your linebreak-removal routine public in the common computer formats, and
let us test it and see just how easily it works on the existing PG txt files.

 
The Unicode txt efforts are not too bad, because at least then people can
choose to represent the glyphs the typesetter chose if they wish to do so,
rather than guessing and reinterpreting intent.  Italics and small caps (SC)
are then still clearly a loss, as are graphics.  Most books use at least some
italics, so I'd hate to see a PG file format that doesn't even support that.
If you wanted to implement even a Unicode txt+ file format, then you'd have to
provide renderers for the different machines.  Or you could auto-translate
Unicode txt+ files to HTML for submitters and use the ubiquitous HTML renderers
to allow people to view the Unicode txt+ version. Then submitters would not
have to submit HTML unless they wanted to.  In recent efforts about 95% of the
submissions DO have HTML, but it's not clear whether that is because people
want to provide HTML or because the WW require it.

 
PG *is* already doing this more-or-less on the rare txt-only submissions
nowadays - automagically unwrapping and translating to HTML in a way which
most of the time is a win and obviously occasionally a loss.  The PG
legalese unfortunately is particularly unattractive in this approach, and
when the unwrapping fails then it is visually distracting - "how come this
paragraph isn't unwrapped - is it supposed to be poetry?"
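To make the trade-off concrete, here is a deliberately naive sketch of such an automatic txt-to-HTML conversion in Python. It is not PG's actual converter; the function name `txt_to_html` and the `poem` class are hypothetical, and the rule that a block indented on every line is verse is an assumption, exactly the kind of guess that produces the "is it supposed to be poetry?" failures described above.

```python
import html
import re

def txt_to_html(text: str) -> str:
    """Naively convert blank-line-separated plain text to HTML paragraphs."""
    out = []
    # Blank lines separate blocks; strip only newlines so a leading indent survives.
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        if not block.strip():
            continue
        lines = block.split("\n")
        if all(line.startswith((" ", "\t")) for line in lines):
            # Assumption: a block indented on every line is verse,
            # so keep its line breaks.
            body = "<br />\n".join(html.escape(line.strip()) for line in lines)
            out.append(f'<p class="poem">{body}</p>')
        else:
            # Otherwise treat it as prose and unwrap it into one paragraph.
            body = " ".join(line.strip() for line in lines)
            out.append(f"<p>{html.escape(body)}</p>")
    return "\n".join(out)

print(txt_to_html("It was a dark\nnight.\n\n  Roses are red,\n  violets are blue.\n"))
# <p>It was a dark night.</p>
# <p class="poem">Roses are red,<br />
# violets are blue.</p>
```

Any prose paragraph that happens to be indented throughout, or any poem that is not, is misclassified; that is the "occasional loss" side of automatic conversion.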

 
How about it? Unicode txt+ file submissions if that is what a submitter
wants to do, and PG automatically renders that in HTML, and ePUB, and MOBI?

 
But if you are willing to take txt-only submissions and autorender them into
HTML accepting the resulting mistakes then why is it that you aren't willing
to take HTML and autorender them into the mandatory txt70 files?  Certainly
going from HTML to txt70 must introduce fewer mistakes. ??? 


From jimad at msn.com  Wed Apr 21 12:41:54 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 12:41:54 -0700
Subject: [gutvol-d] Re: Tidy -c and tables
In-Reply-To: <4BCF2EA9.1000402@novomail.net>
References: <4BCF2EA9.1000402@novomail.net>
Message-ID: <SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>

Why tidy?  

Many people work hard to retain linebreaks in the HTML so the code can be
gone over again at a future date and then PG throws away those linebreaks.


From Bowerbird at aol.com  Wed Apr 21 12:54:34 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 15:54:34 EDT
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
Message-ID: <21579.59641b06.3900b1fa@aol.com>

michael said:
>   Let's have the code

sure thing, boss.


>   and install it where everyone can find/use it.

great idea...


>   Help Newby set it up later.

i don't think he will need any help, but yeah, sure.

and, of course, i invite people to improve the script.

-bowerbird

===========================================


#!/usr/local/bin/perl -w
use CGI::Carp qw(fatalsToBrowser);

###########   read the user input
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

# Split the name-value pairs
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    # Un-Webify plus signs and %-encoding
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $value =~ s/<!--(.|\n)*-->//g;
    if ($allow_html != 0) {
       $value =~ s/<([^>]|\n)*>//g;
    }
    $FORM{$name} = $value;
       $value =~ s/\cM//g;
       if ($name eq "theinput") {$thebook=$value};
}


if (!defined $thebook || $thebook eq "") {
$thebook='paste the text you want to unwrap in this field, and click "unwrap"...'
};


print "content-type: text/html\n\n";
print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">';
print "\n"; print "\n";
print '<html><head><title>unwrap p.g. paragraphs';
print '</title>'; print "\n";
print '<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">';
print '</head><body><pre>';

print '<form method="post" ';

#########################################################
###   note that this line has to be changed to point to the appropriate place ###
print 'action="http://z-m-l.com/go/unwrap.pl">';
#########################################################

print '<input align="left" type="submit" value="-- unwrap --"> '; print "\n";
print '<input type="hidden" name="unwrap" value="reporting...">'; print "\n";
print '<p align="left">';
print '<textarea name="theinput"   rows=30 cols=80>';
print $thebook;
print '</textarea>';
print '</p>';
print '<input type="hidden" name="hiddenname" value="hiddenvalue">';


###   the numbers here refer to a list of steps 
###   i posted in a message to gutvol-d ###

#1
#skip
#2
$thebook =~ s/\r\n/\n/g ;
#3
$thebook =~ s/\r/\n/g ;
#4
#skip
#5
#skip
#6
$thebook =~ s/ +\n/\n/g ;   # strip every trailing space from every line
#7
$thebook =~ s/\n/ \n/g ;
#8
$thebook =~ s/ \n \n/\n\n/g ;
#9
1 while $thebook =~ s/\n \n/\n\n/g ;   # repeat until no "\n \n" remains
#10
# wait!   not yet!
#11
$thebook =~ s/ \n /\n /g ;
# maybe clone this for an asterisk in column 1, and
# clone this for a number in column 1 which is
# followed by a period-space in columns 2-3.
$thebook =~ s/ \n>/\n>/g ;
$thebook =~ s/ \n</\n</g ;
$thebook =~ s/ \n\t/\n\t/g ;
$thebook =~ s| \n/tab|\n/tab|g ;
#12
$thebook =~ s/ \n/ /g ;

print $thebook;

print "</form></pre></body></html>";
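For anyone who wants to run the same idea from a command line rather than through a CGI form, here is a rough Python approximation of the numbered steps above (steps 2-3, 6, 11 and 12). It is not bowerbird's script; the function name `unwrap`, the tuple of line prefixes that keep their break, the UTF-8 input encoding and the file names in the usage comment are assumptions, and its end-of-text handling differs slightly from the Perl.

```python
import sys

# Lines starting with one of these keep the break before them (step 11).
KEEP_BREAK_BEFORE = (" ", "\t", ">", "<", "/tab")

def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs, keeping blank lines and
    breaks before indented or specially marked lines."""
    # Steps 2-3: normalise CRLF and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    # Step 6: drop trailing spaces from every line.
    for line in (l.rstrip(" ") for l in text.split("\n")):
        joinable = (
            bool(out)
            and out[-1] != ""               # previous line is not blank
            and line != ""                  # this line is not blank
            and not line.startswith(KEEP_BREAK_BEFORE)
        )
        if joinable:
            out[-1] += " " + line           # step 12: the break becomes a space
        else:
            out.append(line)
    return "\n".join(out)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python unwrap.py book.txt > book-unwrapped.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(unwrap(f.read()))
```

Working line by line avoids the repeated substitutions the regex version needs for runs of blank lines; an ISO-8859-1 file would need `encoding="latin-1"` instead.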

From Bowerbird at aol.com  Wed Apr 21 13:04:57 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 16:04:57 EDT
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
Message-ID: <22152.5a5833e1.3900b469@aol.com>

jim said:
>    so its not clear to me what *you* are complaining!

i think he's complaining because he thought he was
taking part in an actual dialog, so when he realized
he'd been suckered into a bitch session, he chafed...

-bowerbird

From Bowerbird at aol.com  Wed Apr 21 14:09:33 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 17:09:33 EDT
Subject: [gutvol-d] Re: a longstanding question has finally been answered
Message-ID: <2e057.4a7c0f95.3900c38d@aol.com>

michael said:
>    I'll take that bet. . .another free lunch of cooked fowl.

pardon me?   are you implying that you have
won a bet against me in the past?   _when?_


>    I'll be that the iPad never even gets HALF
>    the market for eReaders over the long haul.

you can't be serious.

the ipad is already up to 500,000 sold.

besides, by "multipurpose machine", i certainly
include the iphone and the ipodtouch in there,
and -- as i said -- apple has sold 80 million...

they sold 8.75 million iphones in the first quarter,
an _increase_ over the previous (_holiday_) quarter,
which is an absolutely astonishing accomplishment.

we're seeing a juggernaut, and it's gathering steam,
all because they gave people multipurpose machines
that can be carried around with the greatest of ease...

there are some of us who _knew_ this'd be killer.
(and, gee, michael, i thought you were one of us.)

-bowerbird

From marcello at perathoner.de  Wed Apr 21 14:15:13 2010
From: marcello at perathoner.de (Marcello Perathoner)
Date: Wed, 21 Apr 2010 23:15:13 +0200
Subject: [gutvol-d] Re: Tidy -c and tables
In-Reply-To: <SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
References: <4BCF2EA9.1000402@novomail.net>
	<SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
Message-ID: <4BCF6AE1.8060900@perathoner.de>

James Adcock wrote:

> Why tidy?  

Because I have to convert all the crooked HTML that has been posted in 
20 years into valid XHTML.

> Many people work hard to retain linebreaks in the HTML so the code can be
> gone over again at a future date and then PG throws away those linebreaks.

It is simpler to fix the HTML than to fix the Epub, so why should the 
Epub retain the line breaks?


-- 
Marcello Perathoner
webmaster at gutenberg.org

From jimad at msn.com  Wed Apr 21 14:28:25 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 14:28:25 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <22152.5a5833e1.3900b469@aol.com>
References: <22152.5a5833e1.3900b469@aol.com>
Message-ID: <SNT120-DS121702B770935483138313AE090@phx.gbl>

>i think he's complaining because he thought he was
> taking part in an actual dialog, so when he realized
> he'd been suckered into a bitch session, he chafed...


Well, I guess we both suffered in this regard because I'm also just back
from a trip yet I made two long trips to the mall to try out his suggestions
and they didn't work.  If I had thought it was just a B session I surely
wouldn't have bothered to make the trips.


From jimad at msn.com  Wed Apr 21 15:02:48 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 15:02:48 -0700
Subject: [gutvol-d] Re: Tidy -c and tables
In-Reply-To: <4BCF6AE1.8060900@perathoner.de>
References: <4BCF2EA9.1000402@novomail.net>	<SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
	<4BCF6AE1.8060900@perathoner.de>
Message-ID: <SNT120-DS1104DCE8B884F170AB82E4AE090@phx.gbl>

>It is simpler to fix the HTML than to fix the Epub, so why should the Epub
> retain the line breaks?

Sorry; if you say that tidy is only being used to generate epubs, not to
modify the posted HTML, then fine.  On one of my previous HTML submissions a
WW said he had run tidy on it. Obviously the intent of keeping the linebreaks
is to allow future DP'ers or PG'ers who have figured out a better scheme,
TEI Lite or whatever (hypothetical), to make another DP pass or solo on the
effort by extracting the already "corrected" txt matched against the original
OCR rather than having to start again "from scratch."  And again, pgdiff can
extract linebreak info given a txt which has lost linebreaks and an OCR that
retains them, but it's still cleaner and easier not to have lost them in the
first place.
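The general idea of recovering linebreaks from the OCR can be sketched with a word-level alignment. The Python below uses the standard library's `difflib.SequenceMatcher`; it is not how pgdiff works internally, just a stand-in technique, and the function name `restore_linebreaks` is hypothetical. It restores a break only where the word ending an OCR line was matched in the corrected text, and it ignores blank lines between paragraphs.

```python
import difflib

def restore_linebreaks(corrected: str, ocr: str) -> str:
    """Re-wrap `corrected` so its lines end where the OCR's lines end,
    wherever the two texts agree on the word at the end of a line."""
    ocr_words, ends_line = [], []
    for line in ocr.splitlines():
        words = line.split()
        for k, word in enumerate(words):
            ocr_words.append(word)
            ends_line.append(k == len(words) - 1)  # last word on this OCR line?

    fixed_words = corrected.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, fixed_words, autojunk=False)

    # For every word pair the alignment matched, carry over the OCR's line end.
    break_after = set()
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            if ends_line[a + k]:
                break_after.add(b + k)

    lines, current = [], []
    for j, word in enumerate(fixed_words):
        current.append(word)
        if j in break_after:
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)

print(restore_linebreaks("The quick brown fox jumps over the lazy dog.",
                         "Tbe quick brown\nfox jumps over\nthe lazy dog."))
# The quick brown
# fox jumps over
# the lazy dog.
```

A correction that changes the last word on an OCR line (say `tw0` to `two`) loses that break, which is one reason keeping the linebreaks in the first place is simpler.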


From hart at pglaf.org  Wed Apr 21 15:25:24 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 15:25:24 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS121702B770935483138313AE090@phx.gbl>
References: <22152.5a5833e1.3900b469@aol.com>
	<SNT120-DS121702B770935483138313AE090@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004211516270.27938@mail.pglaf.org>


Jim, learn to speak correctly, please.

I think it would have been better for all concerned if you had said,
on each occasion, something like:

"I tried out his suggestions and they didn't work the way I wanted."

My suggestions worked more than fine for myself and for many others,
or so it would seem, but not for you.

You have made a handful of absolute statements that appear to be 100
percent this or that, and I have refuted them, at least to the point
where they are leaking and you have started bailing water.

This is what happens when ideas are half matching and half not. . .!

Get used to it. . .please.

Stop making such absolute statements as iPad cannot do this or that,
and start making statements such as the iPad doesn't do this in some
way that I would prefer, such as. . .then be specific.

I did what I did. . .you can't actually deny that I did these things
but you CAN say that this is not exactly what _I_ had in mind when I
said the words that prompted you to try those things.

There is a spirit of cooperation that has been lacking from the get-go
and it is both between you and Apple and between you and me.

It would be nice, very nice, if we could fix that up a little.


Sincerely,


Michael


On Wed, 21 Apr 2010, James Adcock wrote:

>
> >i think he's complaining because he thought he was
> taking part in an actual dialog, so when he realized
> he'd been suckered into a bitch session, he chafed...
>
> Well, I guess we both suffered in this regard because I'm also just back
> from a trip yet I made two long trips to the mall to try out his
> suggestions and they didn't work.  If I had thought it was just a B
> session I surely wouldn't have bothered to make the trips.
>
>
>

From Bowerbird at aol.com  Wed Apr 21 15:25:25 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 18:25:25 EDT
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
Message-ID: <33b17.38321b87.3900d555@aol.com>

well, jim, i'm sorry you wasted some of your
time following up on michael's suggestions.

but i think if you would have phrased your
complaints a bit better in the first place,
you could've avoided the misunderstanding.

i'm not just talking about your disingenuous
means of (mis)defining terms like "e-book",
either (although that is a serious error too),
but a bad case of failure to qualify yourself.

to my point, if you would have said this...

>    i have found it impossible to download
>    the books i want from the sites i want
>    in the formats i want such that i can
>    read them in the viewer-apps i want...

...you wouldn't have engendered opposition.
indeed, you might have gotten a whole lot
of sympathy.   (or, realistically, a little bit.)

and perhaps even received a few pointers...

but that's not what you said, not at the outset.

what you said initially sounded more like:

>    the ipad is so locked down that you
>    can only get the e-books steve jobs
>    allows you to get, and that sucks...

that's a paraphrase, of course, but i think
that that's what it sounded like to people.

but of course we know that that's not true,
not on the face of it.   there's a browser on
the ipad, so anything that's out on the web
is something the ipad can readily display...

put it this way.   if i were to offer to pay you
$100 for every e-book you read on the ipad,
how many "e-books" could you find to "read"?
yeah, that's what i thought; no shortage then.

yes, there is a walled-in, locked-up section
of the ipad, but we all know about that, and
what good does it do to bitch about it here?
it contributes nothing productive to a thread.

to sum up, hyperbole doesn't work well
if you don't know how to work it well...

-bowerbird

From jimad at msn.com  Wed Apr 21 15:25:08 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 15:25:08 -0700
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
In-Reply-To: <21579.59641b06.3900b1fa@aol.com>
References: <21579.59641b06.3900b1fa@aol.com>
Message-ID: <SNT120-DS250CCF656D2FF00F3F52D5AE090@phx.gbl>

Sorry (I'm not a Perl programmer), but perhaps you can provide some hints on
how to get this to work.  I tried it on my machine and this is what I got:

 
C:\JIM\Perl>bbunwrap.pl < matra11.txt > matra11.html

[Wed Apr 21 15:16:25 2010] bbunwrap.pl: Name "main::allow_html" used only once: possible typo at C:\JIM\Perl\bbunwrap.pl line 15.
[Wed Apr 21 15:16:25 2010] bbunwrap.pl: Name "main::FORM" used only once: possible typo at C:\JIM\Perl\bbunwrap.pl line 18.
[Wed Apr 21 15:16:25 2010] bbunwrap.pl: Use of uninitialized value in read at C:\JIM\Perl\bbunwrap.pl line 5.
[Wed Apr 21 15:16:25 2010] bbunwrap.pl: Use of uninitialized value $thebook in string eq at C:\JIM\Perl\bbunwrap.pl line 24.
 

From hart at pglaf.org  Wed Apr 21 15:33:17 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 15:33:17 -0700 (PDT)
Subject: [gutvol-d] Re: a longstanding question has finally been answered
In-Reply-To: <2e057.4a7c0f95.3900c38d@aol.com>
References: <2e057.4a7c0f95.3900c38d@aol.com>
Message-ID: <alpine.DEB.2.00.1004211527030.27938@mail.pglaf.org>


On Wed, 21 Apr 2010, Bowerbird at aol.com wrote:

> michael said:
> >    I'll take that bet. . .another free lunch of cooked fowl.
>
> pardon me?   are you implying that you have
> won a bet against me in the past?   _when?_

Every single time!


> >    I'll be that the iPad never even gets HALF
> >    the market for eReaders over the long haul.
>
> you can't be serious.
>
> the ipad is already up to 500,000 sold.

Then place your bets, ladies and gentlemen!!!


> besides, by "multipurpose machine", i certainly
> include the iphone and the ipodtouch in there,
> and -- as i said -- apple has sold 80 million...
>
> they sold 8.75 million iphones in the first quarter,
> an _increase_ over the previous (_holiday_) quarter,
> which is an absolutely astonishing accomplishment.
>
> we're seeing a juggernaut, and it's gathering steam,
> all because they gave people multipurpose machines
> that can be carried around with the greatest of ease...
>
> there are some of us who _knew_ this'd be killer.
> (and, gee, michael, i thought you were one of us.)
>
> -bowerbird

I am fine with the iPad, iPhone and iPod.

However, I stand by my offer to accept your terms.

What numbers over how many years. . . .

By the way, I like my cooked fowl with a little spine.

Who was it that predicted this whole cellphone thing?!

Eh?

By when do you think iPad will have half the market?

Just counting eReaders, which is to your advantage.

Even just counting Kindle, nook and Sony???

I don't mean some fake "market" narrowed down in some
small portion of space-time, I mean the grand total.

Pick a date!!!

From Bowerbird at aol.com  Wed Apr 21 15:43:12 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 18:43:12 EDT
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
Message-ID: <34c52.130addae.3900d980@aol.com>

sorry, jim, i'm just a kindergartner when it comes to perl,
so i'm not sure i can help with any debugging, but i'll try...


>    Name "main::allow_html" used only once:

that was code i just pulled in, but i didn't "allow html",
so, practically, you can throw out that loop altogether...
(but also see the note below.)


>    Name "main::FORM" used only once: 
>    possible typo at C:\JIM\Perl\bbunwrap.pl line 18.

that construct is just to split the buffer, so 
you don't really need it.   the only variable
of any interest is "theinput", so just strip
"theinput=" off the buffer and you're good.

note that $theinput is then dumped into $thebook.
(but also see the note below.)


>    Use of uninitialized value in read at C:

that command reads the buffer that's submitted
to the script when it's mounted on a website, so
you'll have to rewrite it if you wanna run it offline,
which is what it appears you are trying to do here.

read this note:

what you would do instead (and it nullifies all of
the errors that we've discussed so far) is to open
the text-file on your machine, put it into $thebook,
and proceed to this line:
>    print "content-type: text/html\n\n";


>    Use of uninitialized value $thebook 
>    in string eq at C:\JIM\Perl\bbunwrap.pl line 24.

looks like your version of perl wants the variables
to be initialized, so just go ahead and do that...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/d6d656d2/attachment.html>

From jimad at msn.com  Wed Apr 21 15:44:31 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 15:44:31 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <alpine.DEB.2.00.1004211516270.27938@mail.pglaf.org>
References: <22152.5a5833e1.3900b469@aol.com>	<SNT120-DS121702B770935483138313AE090@phx.gbl>
	<alpine.DEB.2.00.1004211516270.27938@mail.pglaf.org>
Message-ID: <SNT120-DS150DCE0CD3D5D71BD2EEC2AE090@phx.gbl>

>My suggestions worked more than fine for myself and for many others, or so
it would seem, but not for you.

OK, but then be clear that you are suggesting something different than what
I was asking for.  You could have said for example "I don't know how to do
what you want to do but if you try this app then it allows you to read their
selection of free PDF books instead of getting your choice of ePUB or MOBI
books."

I think I was pretty clear that what I wanted was a way to download a free
ePUB or a MOBI file that I find at some location on the web to an iPad and
read it there -- that is after all what most people would consider "The
eBook Experience" -- the ability to actually HAVE an ePub or MOBI just like
you HAVE a paperback or you actually HAVE a printout if you prefer to
print out a postscript copy of a PG book on your laserprinter.  And by
HAVING something I mean you can take it with you and read it on an airplane
or on a beach -- all those things that people are used to doing with a
paperback or a printout and are used to doing with other ebook readers.

I would hope we could agree by now that this is not the iPad business model.
Rather the iPad business model is either you "buy" the book from Apple
(including a subset of "free" books that Apple has rebranded as coming from
Apple), or if you are a publisher you write your own applet for iPad to
distribute your own works (I guess PG can write its own applet if it wants
to have a presence on iPad but I'm not sure I'm the one to take that one on
-- maybe PG already has an iPhone programmer somewhere who can take that one
on?) or if you are the person who actually bought the iPad you are given
your own degraded transfer path via internet->desktop->iTunes->USB-iPad
where presumably Apple is blocking that wifi transfer path for the same
reason that B&N nook is blocking the wifi transfer path, namely to sell more
books. Sorry, but having already hooked up an ebook reader to my desktop by
USB 1000+ times I can assure you that the USB connection path starts to get
really really old!


From Bowerbird at aol.com  Wed Apr 21 15:55:56 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 18:55:56 EDT
Subject: [gutvol-d] Re: a longstanding question has finally been answered
Message-ID: <35839.586a3a6c.3900dc7c@aol.com>

michael said:
>    Every single time!

you've never won a bet against me.   ever.


>    Then place your bets, ladies and gentlemen!!!

you've already lost.   80 million versus 3 million.


>    I am fine with the iPad, iPhone and iPod.

you've already lost.  80 million versus 3 million.


>    However, I stand by my offer to accept your terms.

you've already lost.  80 million versus 3 million.


>    What numbers over how many years. . . .

the only fair way is "since made available for sale".

but who needs to be fair?

let the sony/kindle/nook/whatever dedicated machines
(even palm and rocketbook!) have their huge head-start
in time...   because even with it, they've fallen far behind.


>    By the way, I like my cooked fowl with a little spine.

cook it however you like!   if you can catch it, that is...      :+)


>    Who was it that predicted this whole cellphone thing?!

so why are you backing off it now?

a cellphone that's also being used as an e-reader
is -- by definition -- a multipurpose machine...


>    By when do you think iPad will have half the market?

oh, now you want to make it just the ipad?   ok, no problem.


>    Just counting eReaders, which is to your advantage.
>    Even just counting Kindle, nook and Sony???

you've already lost.  80 million versus 3 million.


>    I don't mean some fake "market" narrowed down in 
>    some small portion of space-time, I mean the grand total.
>    Pick a date!!!

"since made available for sale"...

1-year-out vs. 1-year-out, 2-years-out vs. 2-years-out, etc.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/96ffc6ae/attachment.html>

From hart at pglaf.org  Wed Apr 21 16:00:33 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 16:00:33 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS150DCE0CD3D5D71BD2EEC2AE090@phx.gbl>
References: <22152.5a5833e1.3900b469@aol.com>
	<SNT120-DS121702B770935483138313AE090@phx.gbl>
	<alpine.DEB.2.00.1004211516270.27938@mail.pglaf.org>
	<SNT120-DS150DCE0CD3D5D71BD2EEC2AE090@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004211549030.27938@mail.pglaf.org>


On Wed, 21 Apr 2010, James Adcock wrote:

> >My suggestions worked more than fine for myself and for many others, or so
> it would seem, but not for you.
>
> OK, but then be clear that you are suggesting something different than what
> I was asking for.  You could have said for example "I don't know how to do
> what you want to do but if you try this app then it allows you to read their
> selection of free PDF books instead of getting your choice of ePUB or MOBI
> books."

I did what you said could not be done.

You did not mention .mobi when you started, did you?

I was specific about each program I used, but I perhaps should have
once again made it obvious that when I download from pglaf.org I am
getting the .txt files, and I don't recall you specifying formats--
just that it was impossible to download eBooks from pglaf.org

Which I proved false.

I don't recall anything about .pdf, .mobi, or .epub at first.

To me this was merely, and I apologize, part of the bitch sessions.


> I think I was pretty clear that what I wanted was a way to download a free
> ePUB or a MOBI file that I find at some location on the web to an iPad and
> read it there -- that is after all what most people would consider "The
> eBook Experience" -- the ability to actually HAVE an ePub or MOBI just like

Again, I must remind you once more, I am the wrong person to talk to about
saving in YOUR favorite format. . .that is strictly up to YOU, not to me.

I downloaded files, I can take them on a plane or to the beach.

I don't deal with paper, again you have the wrong person.


> you HAVE a paperback or you actually HAVE a printout if you prefer to
> print out a postscript copy of a PG book on your laserprinter.  And by
> HAVING something I mean you can take it with you and read it on an airplane
> or on a beach -- all those things that people are used to doing with a
> paperback or a printout and are used to doing with other ebook readers.

This I must say I doubt, but it is really non-sequitur to what has passed.


> I would hope we could agree by now that this is not the iPad business model.
> Rather the iPad business model is either you "buy" the book from Apple
> (including a subset of "free" books that Apple has rebranded as coming from

A very large subset. . .perhaps even larger than any other comparable subset.

Comparable meaning you can use something like "NOT Mark Twain" as a subset.


> Apple), or if you are a publisher you write your own applet for iPad to

Again, I must once again refer you to Wattpad, for the fifth? time.


> distribute your own works (I guess PG can write its own applet if it wants
> to have a presence on iPad but I'm not sure I'm the one to take that one on

So far we have been pretty happy with the Wattpad app, but, yes, I think in
time we SHOULD write our own apps.

> -- maybe PG already has an iPhone programmer somewhere who can take that one
> on?) or if you are the person who actually bought the iPad you are given
> your own degraded transfer path via internet->desktop->iTunes->USB-iPad
> where presumably Apple is blocking that wifi transfer path for the same
> reason that B&N nook is blocking the wifi transfer path, namely to sell more
> books. Sorry, but having already hooked up an ebook reader to my desktop by
> USB 1000+ times I can assure you that the USB connection path starts to get
> really really old!

I'm glad you brought up that the nook doesn't allow "real" wifi [at all!!!]

I was going to get after you about that, and the other things nook, Sony or
Kindle do to herd you onto the "company store" turf.

Not sure how many remember "company stores" these days.


mh


>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>

From hart at pglaf.org  Wed Apr 21 16:02:47 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 16:02:47 -0700 (PDT)
Subject: [gutvol-d] Re: a longstanding question has finally been answered
In-Reply-To: <35839.586a3a6c.3900dc7c@aol.com>
References: <35839.586a3a6c.3900dc7c@aol.com>
Message-ID: <alpine.DEB.2.00.1004211601360.27938@mail.pglaf.org>


Total market. . .iPad versus Kindle, Sony and nook, leave the rest.

When will there be more iPads???

A little spine here. . .come on!


On Wed, 21 Apr 2010, Bowerbird at aol.com wrote:

> michael said:
> >   Every single time!
>
> you've never won a bet against me.   ever.
>
>
> >   Then place your bets, ladies and gentlemen!!!
>
> you've already lost.  80 million versus 3 million.
>
>
> >   I am fine with the iPad, iPhone and iPod.
>
> you've already lost.  80 million versus 3 million.
>
>
> >   However, I stand by my offer to accept your terms.
>
> you've already lost.  80 million versus 3 million.
>
>
> >   What numbers over how many years. . . .
>
> the only fair way is "since made available for sale".
>
> but who needs to be fair?
>
> let the sony/kindle/nook/whatever dedicated machines
> (even palm and rocketbook!) have their huge head-start
> in time...   because even with it, they've fallen far behind.
>
>
> >   By the way, I like my cooked fowl with a little spine.
>
> cook it however you like!   if you can catch it, that is...      :+)
>
>
> >   Who was it that predicted this whole cellphone thing?!
>
> so why are you backing off it now?
>
> a cellphone that's also being used as an e-reader
> is -- by definition -- a multipurpose machine...
>
>
> >   By when do you think iPad will have half the market?
>
> oh, now you want to make it just the ipad?   ok, no problem.
>
>
> >   Just counting eReaders, which is to your advantage.
> >   Even just counting Kindle, nook and Sony???
>
> you've already lost.  80 million versus 3 million.
>
>
> >   I don't mean some fake "market" narrowed down in
> >   some small portion of space-time, I mean the grand total.
> >   Pick a date!!!
>
> "since made available for sale"...
>
> 1-year-out vs. 1-year-out, 2-years-out vs. 2-years-out, etc.
>
> -bowerbird
>
>

From jimad at msn.com  Wed Apr 21 16:03:54 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 16:03:54 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <33b17.38321b87.3900d555@aol.com>
References: <33b17.38321b87.3900d555@aol.com>
Message-ID: <SNT120-DS50CC460C27F4FBD7EF1DDAE090@phx.gbl>

>but of course we know that that's not true,
not on the face of it.  there's a browser on
the ipad, so anything that's out on the web
is something the ipad can readily display...


I still think there is some fundamental misunderstanding here.  Using the
iPad I go on the web to PG.  I see an ePub book I like there.  I use the iPad
web browser to go there.  I click on the ePUB book.  iPad says "sorry Hal I
can't allow you to read that book." I don't see how you can say that the
iPad "readily displays" something when it explicitly tells me that it
refuses to display that something!

 
I take my cheap crappy generic netbook, I go on the web to PG.  I see an ePub
book I like there.  I use the cheap crappy generic netbook's web browser to
go there.  I click on the ePUB book.  The cheap crappy netbook automatically
downloads the ePUB book to the netbook so I can read it later in an airplane
or on the beach, and it automatically opens it and I start reading.  The
netbook DOES "readily display" anything that's out on the web.


>put it this way.  if i were to offer to pay you
$100 for every e-book you read on the ipad,
how many "e-books" could you find to "read"?

 
If you pay me $100 for every time I am part way through a book and then I
pick up my iPad again and that book has magically disappeared because I am
no longer in sight of a public wifi connection then I am going to come out
way ahead.  This is silly, Comcast offers 100s of TV channels, but if I turn
on the TV channel at any moment in time the probability is 95% that Comcast
will have nothing on that *I* want to watch at that moment in time. I work
hard to find what I want to read, and I work hard to find texts that I want
to create to submit to PG, and most of what I want to read or what I want to
create to submit to PG is NOT available via the current hardwired iPad
applets each distributing texts from ONE server location on the internet.
If every site that offers free books writes its own applet specifically to
support iPad rather than using their already existing HTML sites which
support "real" HTML browsers, well, then I guess iPad would do what I want
to do.  But I don't understand why every organization out on the web
offering free books has to write their own applet for iPad when they already
HAVE written that applet -- it's called an HTML web site -- it's just that Apple
has deliberately pimped their web browser to make sure all these already
existing "applets" aka HTML free ebook websites don't work! 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/119b06dd/attachment-0001.html>

From gbnewby at pglaf.org  Wed Apr 21 16:10:52 2010
From: gbnewby at pglaf.org (Greg Newby)
Date: Wed, 21 Apr 2010 16:10:52 -0700
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
In-Reply-To: <21579.59641b06.3900b1fa@aol.com>
References: <21579.59641b06.3900b1fa@aol.com>
Message-ID: <20100421231052.GA31654@pglaf.org>

http://pglaf.org/cgi-bin/unwrap.pl

A few small changes:

#!/usr/local/bin/perl -w
use strict; # gbn

use CGI::Carp qw(fatalsToBrowser);

###########   read the user input
my $buffer; my $thebook=""; # gbn
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

# Split the name-value pairs
my @pairs = split(/&/, $buffer);
foreach my $pair (@pairs) {
    (my $name, my $value) = split(/=/, $pair);
    # Un-Webify plus signs and %-encoding
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $value =~ s/<!--(.|\n)*-->//g;
# gbn:    if ($allow_html != 0) {
#       $value =~ s/<([^>]|\n)*>//g;
#    }
# gbn:   $FORM{$name} = $value;
       $value =~ s/\cM//g;
       if ($name eq "theinput") {$thebook=$value};
}


if ($thebook eq "") {
    $thebook='paste the text you want to unwrap in this field, and click "unwrap"...'
};


print "content-type: text/html\n\n";
print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">';
print "\n"; print "\n";
print '<head><title>unwrap p.g. paragraphs';
print '</title>'; print "\n";
print '<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">';
print '<body><pre>';

print '<form method="post"';

#########################################################
###   note that this line has to be changed to point to the appropriate place
# gbn: print 'action="http://z-m-l.com/go/unwrap.pl">';
print 'action="http://pglaf.org/cgi-bin/unwrap.pl">';
#########################################################

print '<input align="left" type="submit" value="-- unwrap --"> '; print "\n";
print '<input type="hidden" name="unwrap" value="reporting...">'; print "\n";
print '<p align="left">';
print '<textarea name="theinput"   rows=30 cols=80>';
print $thebook;
print '</textarea>';
print '</p>';
print '<input type="hidden" name="hiddenname" value="hiddenvalue">';


###   the numbers here refer to a list of steps 
###   i posted in a message to gutvol-d ###

#1
#skip
#2
$thebook =~ s/\r\n/\n/g ;
#3
$thebook =~ s/\r/\n/g ;
#4
#skip
#5
#skip
#6
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
$thebook =~ s/ \n/\n/g ;
#7
$thebook =~ s/\n/ \n/g ;
#8
$thebook =~ s/ \n \n/\n\n/g ;
#9
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
$thebook =~ s/\n \n/\n\n/g ;
#10
# wait!   not yet!
#11
$thebook =~ s/ \n /\n /g ;
# maybe clone this for an asterisk in column 1, and
# clone this for a number in column 1 which is
# followed by a period-space in columns 2-3.
$thebook =~ s/ \n>/\n>/g ;
$thebook =~ s/ \n</\n</g ;
$thebook =~ s/ \n\t/\n\t/g ;
$thebook =~ s| \n/tab|\n/tab|g ;
#12
$thebook =~ s/ \n/ /g ;

print $thebook;

print "</form></pre></body></html>";
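The step-11 comment above suggests two more clones. A minimal sketch of them, written as a hypothetical helper keep_list_breaks that would run after step 7 and before step 12, while every line end is still " \n". Following the comment literally, the number rule matches a single digit in column 1 followed by a period and a space; \d+\. would be the obvious extension if multi-digit numbers should count as well.

```perl
use strict;
use warnings;

# Hypothetical helper: the two extra "step 11" rules suggested in the
# script's comments. It expects text after step 7, where every line end
# is " \n", and removes that space so step 12 will not join the next line.
sub keep_list_breaks {
    my ($text) = @_;
    $text =~ s/ \n(?=\*)/\n/g;       # asterisk in column 1
    $text =~ s/ \n(?=\d\. )/\n/g;    # digit in column 1, ". " in columns 2-3
    return $text;
}

# "10. ten" is left alone, because only single-digit numbers match.
print keep_list_breaks("Items: \n* one \n1. two \n10. ten \n");
```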


On Wed, Apr 21, 2010 at 03:54:34PM -0400, Bowerbird at aol.com wrote:
> michael said:
> >   Let's have the code
> 
> sure thing, boss.
> 
> 
> >   and install it where everyone can find/use it.
> 
> great idea...
> 
> 
> >   Help Newby set it up later.
> 
> i don't think he will need any help, but yeah, sure.
> 
> and, of course, i invite people to improve the script.
> 
> -bowerbird
> 
> ===========================================
> 
> 
> #!/usr/local/bin/perl -w
> use CGI::Carp qw(fatalsToBrowser);
> 
> ###########   read the user input
> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
> 
> # Split the name-value pairs
> @pairs = split(/&/, $buffer);
> foreach $pair (@pairs) {
>     ($name, $value) = split(/=/, $pair);
>     # Un-Webify plus signs and %-encoding
>     $value =~ tr/+/ /;
>     $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>     $value =~ s/<!--(.|\n)*-->//g;
>     if ($allow_html != 0) {
>        $value =~ s/<([^>]|\n)*>//g;
>     }
>     $FORM{$name} = $value;
>        $value =~ s/\cM//g;
>        if ($name eq "theinput") {$thebook=$value};
> }
> 
> 
> if ($thebook eq "") {
> $thebook='paste the text you want to unwrap in this field, and click "unwrap"...'
> };
> 
> 
> print "content-type: text/html\n\n";
> print '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">';
> print "\n"; print "\n";
> print '<head><title>unwrap p.g. paragraphs';
> print '</title>'; print "\n";
> print '<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">';
> print '<body><pre>';
> 
> print '<form method="post"';
> 
> #########################################################
> ###   note that this line has to be changed to point to the appropriate place ###
> print 'action="http://z-m-l.com/go/unwrap.pl">';
> #########################################################
> 
> print '<input align="left" type="submit" value="-- unwrap --"> '; print "\n";
> print '<input type="hidden" name="unwrap" value="reporting...">'; print "\n";
> print '<p align="left">';
> print '<textarea name="theinput"   rows=30 cols=80>';
> print $thebook;
> print '</textarea>';
> print '</p>';
> print '<input type="hidden" name="hiddenname" value="hiddenvalue">';
> 
> 
> ###   the numbers here refer to a list of steps 
> ###   i posted in a message to gutvol-d ###
> 
> #1
> #skip
> #2
> $thebook =~ s/\r\n/\n/g ;
> #3
> $thebook =~ s/\r/\n/g ;
> #4
> #skip
> #5
> #skip
> #6
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> $thebook =~ s/ \n/\n/g ;
> #7
> $thebook =~ s/\n/ \n/g ;
> #8
> $thebook =~ s/ \n \n/\n\n/g ;
> #9
> $thebook =~ s/\n \n/\n\n/g ;
> $thebook =~ s/\n \n/\n\n/g ;
> $thebook =~ s/\n \n/\n\n/g ;
> $thebook =~ s/\n \n/\n\n/g ;
> #10
> # wait!   not yet!
> #11
> $thebook =~ s/ \n /\n /g ;
> # maybe clone this for an asterisk in column 1, and
> # clone this for a number in column 1 which is
> # followed by a period-space in columns 2-3.
> $thebook =~ s/ \n>/\n>/g ;
> $thebook =~ s/ \n</\n</g ;
> $thebook =~ s/ \n\t/\n\t/g ;
> $thebook =~ s| \n/tab|\n/tab|g ;
> #12
> $thebook =~ s/ \n/ /g ;
> 
> print $thebook;
> 
> print "</form></pre></body></html>";



From Bowerbird at aol.com  Wed Apr 21 16:20:10 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 19:20:10 EDT
Subject: [gutvol-d] Re: a longstanding question has finally been answered
Message-ID: <36f79.176fd0f1.3900e22a@aol.com>

michael said:
>    Total market. . .iPad versus Kindle, Sony and nook, leave the rest.
>    When will there be more iPads???
>    A little spine here. . .come on!

ipad had 300,000 the first weekend.

it took kindle/sony/nook a _year_ to move that many.

ipad has 500,000 now, less than one month out...

it took kindle/sony/nook _18_months_ to sell that many.

are you disputing who the eventual winner will be?

or are we just arguing about how long it will take the ipad
to go ahead?

how long has the kindle been out now?
just to make it easy on myself, i'll say that
when the ipad has been out _that_long_,
on that date its sales will have surpassed
sales made by the kindle/sony/nook trio
on that date.   (and likely by a large margin.)

and if you want certainty, bracket a period
a-year-before that date and a-year-after...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/65f0a038/attachment.html>

From hart at pglaf.org  Wed Apr 21 16:23:37 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 16:23:37 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS50CC460C27F4FBD7EF1DDAE090@phx.gbl>
References: <33b17.38321b87.3900d555@aol.com>
	<SNT120-DS50CC460C27F4FBD7EF1DDAE090@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004211618150.31494@mail.pglaf.org>


Jim is still refusing to acknowledge that PG eBooks in .epub
are so easily available that people, including him, perhaps,
literally, miss that they are .epub eBooks.

He has not answered my questions about this. . . .

I ask again:

Are not the PG iBooks actually PG .epub books?

Same goes for Wattpad?

Same goes for all the rest.

I thought .epub was the default iPad format, no?


After all the peacemaking just now, I must admit that Jim still
seems to be playing some games so he can keep on bitching.

I think what he really wants is to download .epub files and the
object is to do MORE than just READ them.

I'm not sure WHAT more he has in mind, unless it's editing.

I'm not really sure iPads were made for all that stuff.

I think it's pretty obvious they weren't.

And they don't have a trunk!!!


On Wed, 21 Apr 2010, James Adcock wrote:

>
> >but of course we know that that's not true,
> not on the face of it.? there's a browser on
> the ipad, so anything that's out on the web
> is something the ipad can readily display...
>
> I still think there is some fundamental misunderstanding here.  Using the
> iPad I go on the web to PG.  I see an ePub book I like there.  I use the
> iPad web browser to go there.  I click on the ePUB book.  iPad says "sorry
> Hal I can't allow you to read that book."  I don't see how you can say that
> the iPad "readily displays" something when it explicitly tells me that it
> refuses to display that something!
>
>
>
> I take my cheap crappy generic netbook, I go on the web to PG.  I see an
> ePub book I like there.  I use the cheap crappy generic netbook's web
> browser to go there.  I click on the ePUB book.  The cheap crappy netbook
> automatically downloads the ePUB book to the netbook so I can read it later
> in an airplane or on the beach, and it automatically opens it and I start
> reading.  The netbook DOES "readily display" anything that's out on the
> web.
>
>
> >put it this way.  if i were to offer to pay you
> $100 for every e-book you read on the ipad,
> how many "e-books" could you find to "read"?
>
>
>
> If you pay me $100 for every time I am part way through a book and then I
> pick up my iPad again and that book has magically disappeared because I am
> no longer in sight of a public wifi connection then I am going to come out
> way ahead.  This is silly, Comcast offers 100s of TV channels, but if I
> turn on the TV channel at any moment in time the probability is 95% that
> Comcast will have nothing on that *I* want to watch at that moment in time.
> I work hard to find what I want to read, and I work hard to find texts that
> I want to create to submit to PG, and most of what I want to read or what I
> want to create to submit to PG is NOT available via the current hardwired
> iPad applets each distributing texts from ONE server location on the
> internet.  If every site that offers free books writes its own applet
> specifically to support iPad rather than using their already existing HTML
> sites which support "real" HTML browsers, well, then I guess iPad would do
> what I want to do.  But I don't understand why every organization out on
> the web offering free books has to write their own applet for iPad when
> they already HAVE written that applet -- it's called an HTML web site -- it's
> just that Apple has deliberately pimped their web browser to make sure all
> these already existing "applets" aka HTML free ebook websites don't work!
>
>
>
>
>

From hart at pglaf.org  Wed Apr 21 16:27:15 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 16:27:15 -0700 (PDT)
Subject: [gutvol-d] Re: a longstanding question has finally been answered
In-Reply-To: <36f79.176fd0f1.3900e22a@aol.com>
References: <36f79.176fd0f1.3900e22a@aol.com>
Message-ID: <alpine.DEB.2.00.1004211625160.31494@mail.pglaf.org>


You are really willing to bet that on April 21, 2013 there will be
more iPads than Kindles plus Sonys plus nooks. . . .

Say it in print for the folks, and I'll start practicing turducken
recipes. . . .


On Wed, 21 Apr 2010, Bowerbird at aol.com wrote:

> michael said:
> >   Total market. . .iPad versus Kindle, Sony and nook, leave the rest.
> >   When will there be more iPads???
> >   A little spine here. . .come on!
>
> ipad had 300,000 the first weekend.
>
> it took kindle/sony/nook a _year_ to move that many.
>
> ipad has 500,000 now, less than one month out...
>
> it took kindle/sony/nook _18_months_ to sell that many.
>
> are you disputing who the eventual winner will be?
>
> or are we just arguing about how long it will take the ipad
> to go ahead?
>
> how long has the kindle been out now?
> just to make it easy on myself, i'll say that
> when the ipad has been out _that_long_,
> on that date its sales will have surpassed
> sales made by the kindle/sony/nook trio
> on that date.  (and likely by a large margin.)
>
> and if you want certainty, bracket a period
> a-year-before that date and a-year-after...
>
> -bowerbird
>
>

From Bowerbird at aol.com  Wed Apr 21 16:34:25 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 19:34:25 EDT
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
Message-ID: <37cac.38c448de.3900e581@aol.com>

jim said:
>    I still think there is some fundamental misunderstanding here.

there is.   and it will remain here, until you decide to change your tune.


>    Using the iPad I go on the web to PG.  I see an ePub book I like there.

you mean that you see a _book_ you like there.

a _book_ that is offered in a _number_ of different formats,
including .html (which can be read in the web-browser) and
as a plain-vanilla .txt file, which can be read in many apps...

your problem is that you want to insist on a certain file-format.

even though i'm not fully convinced that one cannot find a way
to get an .epub onto an ipad to read in the app that one wants,
i _might_ be inclined to take your word for it (since i don't care).

but don't try to pretend that because you cannot get an .epub,
you can't get "an e-book", because i don't play nonsense games.


>    I don't see how you can say that the iPad "readily displays"
>    something when it explicitly tells me that it refuses to 
>    display that something!

you just can't give it up, can you, jim?

i mean, seriously, you are _incapable_, aren't you?

but how long do you expect people to take you seriously
when you _persist_ in making such nonsense arguments?

what book -- what _book_, not some particular file-format --
what _book_ is it that you claim the ipad is refusing to display?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/385639bd/attachment.html>

From lee at novomail.net  Wed Apr 21 16:47:18 2010
From: lee at novomail.net (Lee Passey)
Date: Wed, 21 Apr 2010 17:47:18 -0600
Subject: [gutvol-d] Re: Tidy -c and tables
In-Reply-To: <SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
References: <4BCF2EA9.1000402@novomail.net>
	<SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
Message-ID: <4BCF8E86.2000004@novomail.net>

On 4/21/2010 1:41 PM, James Adcock wrote:

> Why tidy?

As Mr. Perathoner has pointed out, it is because OCF requires that the 
interior text be valid XML, and it is certain that not all of the 
hand-crafted HTML in the PG repository is valid XHTML. Tidy should cause 
no harm, but /will/ guarantee XHTML output. It is not the only tool that 
could produce this result, but it is probably the best (although not 
perfect). I suspect many of the DP post-processors use tidy as part of 
their regular workflow.
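A minimal Python sketch of the distinction Lee is drawing: hand-crafted HTML that browsers tolerate can still fail as XML, which is what the OCF container needs. The check uses only the standard library's `xml.etree.ElementTree` and tests well-formedness, not validity against the XHTML DTD. The two sample strings are made up for illustration, and this is not a substitute for tidy itself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise.

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical hand-crafted HTML: browsers accept it, an XML parser does not
# (the <p> elements are never closed and <br> has no closing slash).
legacy_html = "<html><body><p>First paragraph<p>Second<br></body></html>"

# The same content written as XHTML, the kind of output tidy is used to produce.
xhtml = "<html><body><p>First paragraph</p><p>Second<br /></p></body></html>"

print(is_well_formed(legacy_html))  # False
print(is_well_formed(xhtml))        # True
```

ElementTree knows only XML's predefined entities, so HTML named entities such as `&nbsp;` can also trip this check; treat a failure as a prompt to look at the file, not a verdict on it.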

> Many people work hard to retain linebreaks in the HTML so the code can be
> gone over again at a future date and then PG throws away those linebreaks.

If your notion of "retaining linebreaks" means putting a newline in your 
HTML text, you have already lost the battle. According to the HTML 
specification, newlines are white space, and must be treated as such. 
HTML is an explicit markup language; ie. any markup which is not part of 
the base text must be explicit, eg. <br /> and not CR, LF, or CRLF.

I do not believe that there is any HTML authoring/editing tool which 
will preserve newline characters as implicit markup. If you really are 
"work[ing] hard to retain linebreaks in HTML" then you will make them 
explicit. You can do this by adding explicit markup that user agents 
will ignore (eg. <span class='linebreak'> </span>), using an invalid 
HTML element (eg. <lb>) which browsers will ignore, or by using the HTML 
break element in such a way that its display can be turned on or off by 
the use of CSS styles (eg. <br class='lb' />). If you expect everyone to 
respect newline characters as line breaks in HTML, in direct 
contravention to the HTML spec, you are borrowing trouble.

I agree with you that line breaks need to be preserved; I just think 
they should be preserved explicitly, and not implicitly.
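One way to follow this advice, sketched in Python: turn each source line into explicit markup instead of relying on newlines. The `<br class='lb' />` form is Lee's own example; the helper name `verse_to_html` and the sample text are hypothetical. A newline is still emitted after each break so that, if a stylesheet hides the breaks with a rule like `br.lb { display: none; }`, the words on either side remain separated by white space.

```python
import html


def verse_to_html(lines: list[str]) -> str:
    """Wrap source lines in a <p>, marking every line break explicitly.

    Each break becomes <br class='lb' /> followed by a newline, so the
    break survives reflowing tools and can be hidden with CSS without
    gluing the neighboring words together.
    """
    escaped = [html.escape(line.strip()) for line in lines]
    return "<p>" + "<br class='lb' />\n".join(escaped) + "</p>"


# A stylesheet rule that switches the preserved breaks off.
HIDE_BREAKS_CSS = "br.lb { display: none; }"

print(verse_to_html(["Tyger Tyger, burning bright,", "In the forests of the night;"]))
```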

From lee at novomail.net  Wed Apr 21 16:51:46 2010
From: lee at novomail.net (Lee Passey)
Date: Wed, 21 Apr 2010 17:51:46 -0600
Subject: [gutvol-d] Re: a longstanding question has finally been answered
In-Reply-To: <alpine.DEB.2.00.1004211625160.31494@mail.pglaf.org>
References: <36f79.176fd0f1.3900e22a@aol.com>
	<alpine.DEB.2.00.1004211625160.31494@mail.pglaf.org>
Message-ID: <4BCF8F92.20604@novomail.net>

On 4/21/2010 5:27 PM, Michael S. Hart wrote:
>
>
> You are really willing to bet that on April 21, 2013 there will be
> more iPads than Kindles plus Sonys plus nooks. . . .
>
> Say it in print for the folks, and I'll start practicing turducken
> recipes. . . .

I'll say it in print, and I don't even /like/ the iPad. General purpose 
computing devices will /always/ beat out single purpose devices. If 
there are more Kindles sold next year than iPads it will be because 
Amazon has turned the Kindle into a general purpose computing device as 
well.

From Bowerbird at aol.com  Wed Apr 21 16:56:51 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 19:56:51 EDT
Subject: [gutvol-d] Re: a longstanding question has finally been answered
Message-ID: <39212.7a4fea3d.3900eac3@aol.com>

michael said:
>    You are really willing to bet that on April 21, 2013 there 
>    will be more iPads than Kindles plus Sonys plus nooks. . . .

well, every ipad and iphone and ipodtouch can run the
kindle software, as can a slew of phones, even today,
so all of those machines can act as "virtual kindles"...

but in terms of dedicated kindle machines, you betcha.


>    Say it in print for the folks, and 
>    I'll start practicing turducken recipes. . . .

well, if you're even willing to take such a stupendous bet,
it means you're gonna rewrite the terms of engagement,
but i'll go for it anyway, to see how you're gonna do that.         :+)

***

while i've got the crystal ball out, might as well use it...

amazon will drop the price of a kindle to nearly nothing.

indeed, they'll have a "subscription" offer that will actually
make the kindle _free_ if you agree to buy so many books.

they will also offer an "all-you-can-eat" option that will
attempt to do for books what "netflix" has done for films.

similarly, they'll offer the kindle to school-districts with
a _guarantee_ that their overall textbook-costs will drop,
and increasingly-cash-starved schools will jump at that.

all of these initiatives -- and more!   -- will ensure that
lots and lots and lots and lots of kindle units are moved.

but it _still_ won't compare with the ipad juggernaut...
(not even in quantity, and especially not in profitability.)

why not?   because amazon can't do software like apple.
so they will always be three-and-a-half-steps behind...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/284cc820/attachment.html>

From jimad at msn.com  Wed Apr 21 17:24:44 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 17:24:44 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <alpine.DEB.2.00.1004211549030.27938@mail.pglaf.org>
References: <22152.5a5833e1.3900b469@aol.com>	<SNT120-DS121702B770935483138313AE090@phx.gbl>	<alpine.DEB.2.00.1004211516270.27938@mail.pglaf.org>	<SNT120-DS150DCE0CD3D5D71BD2EEC2AE090@phx.gbl>
	<alpine.DEB.2.00.1004211549030.27938@mail.pglaf.org>
Message-ID: <SNT120-DS125C615F11E8D851225C9EAE080@phx.gbl>

>Again, I must once again refer you to Wattpad, for the fifth? time.

I DID download and install the Wattpad and I DID discuss this earlier and
again it appeared to me to only be yet another app that displays a slightly
hacked version of ascii txt files shipped from their own private server.  It
did not appear to have any way to get an ePUB or MOBI book from a location I
choose on the internet.

>I was going to get after you about that, and the other things nook, Sony or
>Kindle do to herd you onto the "company store" turf.

iTunes and the Apple App Store are both "company store" turfs as far as I
can see -- especially when Jobs can tell Stanza to take out a feature that
allowed customers to share free books with their friends -- a feature they had
come to rely on -- and why?  Because Jobs is introducing a competing app
and Jobs wants to cook the books so that his app wins.  That even if one only
wants to install a free ePUB book via *USB* you *still* have to run it
through the iTunes "company store" is particularly galling to me.  I don't
understand why you have to take this path if Apple isn't DRM'ing the free
books???

For the record:

Nook "company store" -- nook is hopelessly "locked down" as I have said many
times.

Sony "company store" -- don't know the wifi version if any, I just know that
lots of people who work with PG/DP transfer ebooks to Sony via USB.

Kindle "company store" -- offers about the same "features" as the iBooks
"company store", plus you can USB by *direct connection* to your computer
without having to go through an iTunes-like "company store" applet, plus it
has an "experimental web browser" that allows you to download free MOBI books
directly from the internet via whispernet, plus it allows one to quickly and
easily write a "Magic Catalog" type ebook which in turn can pull down other
free MOBI books from the internet via whispernet.  Now whispernet is slow
and unreliable compared to wifi at least in the 'burbs where I live.  And
the Kindle browser is weak and sucky -- but at least it hasn't been "pimped"
to prevent the download of free ebooks from the internet! And you can also
do all these things with PDF Files and TXT files and they will all also
actually end up inside your Kindle on the standard bookshelf so that they
will still be there when you want to read them, whether on the beach or on
an airplane, etc. But the slow and unreliable whispernet plus the weak web
browser are all reasons why I am still looking for an ebook reader that uses
wifi, which hasn't "pimped" that wifi, and hasn't "pimped" the web browser
either!


From jimad at msn.com  Wed Apr 21 17:35:54 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 17:35:54 -0700
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
In-Reply-To: <20100421231052.GA31654@pglaf.org>
References: <21579.59641b06.3900b1fa@aol.com>
	<20100421231052.GA31654@pglaf.org>
Message-ID: <SNT120-DS216A867244381EC1B33B33AE080@phx.gbl>

>A few small changes:

Sorry, I'm not too sure how you are doing it, but it shows up "unwrapped" as
html in the web browser, but if I actually try to save it as a text file,
either by cut-and-paste or by File Save = name.txt then the text magically
shows up wrapped again.

I think what people want is an unwrapped txt file that they can actually
save and can read in their choice of txt reader or txt editor.
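For anyone who just wants the end product described here, a rough Python sketch of paragraph unwrapping that writes its result to a real .txt file. It is not bowerbird's unwrap.pl (whose code isn't shown in this thread). It assumes paragraphs are separated by blank lines and joins every other line break with a space, so indented poetry or tables would need to be protected first. The output file name is arbitrary.

```python
import re


def unwrap(text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into one long line.

    Paragraphs are assumed to be separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        " ".join(line.strip() for line in para.splitlines())
        for para in paragraphs
    )


wrapped = (
    "It was a dark and stormy\n"
    "night; the rain fell\n"
    "in torrents.\n"
    "\n"
    "Except at occasional\n"
    "intervals."
)
unwrapped = unwrap(wrapped)

# Save the result as a plain .txt file any text reader or editor can open.
with open("unwrapped.txt", "w", encoding="utf-8") as f:
    f.write(unwrapped + "\n")
```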


From jimad at msn.com  Wed Apr 21 17:48:25 2010
From: jimad at msn.com (James Adcock)
Date: Wed, 21 Apr 2010 17:48:25 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <alpine.DEB.2.00.1004211618150.31494@mail.pglaf.org>
References: <33b17.38321b87.3900d555@aol.com>	<SNT120-DS50CC460C27F4FBD7EF1DDAE090@phx.gbl>
	<alpine.DEB.2.00.1004211618150.31494@mail.pglaf.org>
Message-ID: <SNT120-DS7E36CE91F52A6E034593EAE080@phx.gbl>

>Are not the PG iBooks actually PG .epub books?

I have acknowledged before that the subset of free iBooks that Apple has rebranded as being from Apple looks like it originated from PG, and that if you are willing to accept a subset of what PG offers and accept the Apple rebranding, then this is not a bad offering on that subset.  

>Same goes for Wattpad?

Wattpad is a non-starter piece of junk as far as I can see.  At least the iBooks subset is a reasonable port of that subset of the PG books they choose to offer.

>I thought .epub was the default iPad format, no?

I thought so too, which is why I was so, so surprised when, from the iPad, I clicked on an ePUB book at the PG website and the iPad refused to download and display the book.  Even Kindle allows that!

>I'm not sure WHAT more he has in mind, unless it's editing.

Please, Michael, you are being silly because I have told you a dozen times already what I had in mind: I had in mind an ebook reader that has wifi and allows me to use its internet browser to download and display ebooks in ePUB and/or MOBI format. It should also allow me to quickly and easily use the wifi to transfer ebooks that I am working on from my local computer to the reader device.  Any generic $200 netbook allows you to do these things.  It's just that they have a keyboard that gets in the way when you are trying to read something.  I wouldn't think it would be hard for YOU to imagine a netbook, but with a virtual keyboard rather than a physical keyboard, except that YOU are playing games because YOU don't want to admit how much Apple has pimped their offering to keep friends from freely sharing free books with their friends.  

And I thought that was something that YOU always claimed PG was about?  So again, why are you defending Apple?


From Bowerbird at aol.com  Wed Apr 21 18:09:22 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Wed, 21 Apr 2010 21:09:22 EDT
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
Message-ID: <3d4f6.c5288d9.3900fbc2@aol.com>

jim, you really need to learn the art
of asking what you are doing wrong,
instead of just making a report that
"the tool doesn't work".

you say the text shows up unwrapped;
that indicates the script is working...

if the linebreaks "reappear", then you
are doing something wrong...   _you_...

what i imagine is happening is this:
you're doing a "select-all" before you
copy text from the browser-window.

that copies the text from the _field_
as well as the unwrapped text below.
so when you paste the text elsewhere,
the text you see (because it's on top)
is the wrapped text from in the field.

you need to copy the _unwrapped_ text
out of the browser window that appears,
and _only_ that unwrapped text.   get it?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100421/fca8e725/attachment.html>

From hart at pglaf.org  Wed Apr 21 19:41:02 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 19:41:02 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS7E36CE91F52A6E034593EAE080@phx.gbl>
References: <33b17.38321b87.3900d555@aol.com>
	<SNT120-DS50CC460C27F4FBD7EF1DDAE090@phx.gbl>
	<alpine.DEB.2.00.1004211618150.31494@mail.pglaf.org>
	<SNT120-DS7E36CE91F52A6E034593EAE080@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004211932010.6023@mail.pglaf.org>


On Wed, 21 Apr 2010, James Adcock wrote:

> >Are not the PG iBooks actually PG .epub books?
>
> I have acknowledged before that the subset of free iBooks that Apple
> has rebranded as being from Apple looks like it originated from PG, and
> that if you are willing to accept a subset of what PG offers and accept the
> Apple rebranding, then this is not a bad offering on that subset.

Gee!  Jim has still found yet another way NOT to say if they are .epub.

Not to mention all the ways he has found NOT to say that there are some
obvious ways NOT to get the Stanza iPod effect in his original complaint.

Yawn!!!


> >Same goes for Wattpad?
>
> Wattpad is a non-starter piece of junk as far as I can see.  At least the
> iBooks subset is a reasonable port of that subset of the PG books they
> choose to offer.

"Non-starter piece of junk". . .another yawn!!!

Jim. . .you have to START before you have any right to such comments.

Have some experience before you say such things.

This is why people think you are just bitching.


> >I thought .epub was the default iPad format, no?
>
> I thought so too, which is why I was so, so surprised when, from the iPad, I
> clicked on an ePUB book at the PG website and the iPad refused to download and
> display the book.  Even Kindle allows that!

Yes, Jim has found another way to avoid answering that direct question.

I am back to having to challenge his sincerity in all this. . . .

Yawn!!!


> >I'm not sure WHAT more he has in mind, unless it's editing.

Jim, until and unless you are willing to CONVERSE and do research,
and to admit that Apple's iPad was never intended for what you want
I don't think there is any need or reason to continue this pretense.

You can have the last word.

You can have ALL of the last words.

I retire from the field.

The field is yours.

Next subject. . .are there really $200 netbooks?

Would you please send me some URLs for them???


> Please, Michael, you are being silly because I have told you a dozen times
> already what I had in mind: I had in mind an ebook reader that has wifi and
> allows me to use its internet browser to download and display ebooks in ePUB
> and/or MOBI format. It should also allow me to quickly and easily use the
> wifi to transfer ebooks that I am working on from my local computer to the
> reader device.  Any generic $200 netbook allows you to do these things.
> It's just that they have a keyboard that gets in the way when you are trying
> to read something.  I wouldn't think it would be hard for YOU to imagine a
> netbook, but with a virtual keyboard rather than a physical keyboard, except
> that YOU are playing games because YOU don't want to admit how much Apple
> has pimped their offering to keep friends from freely sharing free books
> with their friends.
>
> And I thought that was something that YOU always claimed PG was about?  So
> again, why are you defending Apple?

Only defending our readers from you. . . .

Apple can take care of itself.


From hart at pglaf.org  Wed Apr 21 19:49:29 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 19:49:29 -0700 (PDT)
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <SNT120-DS125C615F11E8D851225C9EAE080@phx.gbl>
References: <22152.5a5833e1.3900b469@aol.com>
	<SNT120-DS121702B770935483138313AE090@phx.gbl>
	<alpine.DEB.2.00.1004211516270.27938@mail.pglaf.org>
	<SNT120-DS150DCE0CD3D5D71BD2EEC2AE090@phx.gbl>
	<alpine.DEB.2.00.1004211549030.27938@mail.pglaf.org>
	<SNT120-DS125C615F11E8D851225C9EAE080@phx.gbl>
Message-ID: <alpine.DEB.2.00.1004211944030.6023@mail.pglaf.org>


On Wed, 21 Apr 2010, James Adcock wrote:

> >Again, I must once again refer you to Wattpad, for the fifth? time.
>
> I DID download and install the Wattpad and I DID discuss this earlier and
> again it appeared to me to only be yet another app that displays a slightly
> hacked version of ascii txt files shipped from their own private server.  It
> did not appear to have any way to get an ePUB or MOBI book from a location I
> choose on the internet.

You seem to be avoiding the point that they say it IS an .epub you read
with Wattpad. . .and iBooks. . .and many others.

You HAVE downloaded .epubs, or so it would appear, and _I_ have, so it
would appear, if we believe it's the iPad default.

You are hitting the target, but trying to deny it.

Do more research. . .perhaps you can find ONE that is not .epub!


You could be famous!!!


>
> >I was going to get after you about that, and the other things nook, Sony or
> >Kindle do to herd you onto the "company store" turf.

ALL of the materials I have mentioned are free of charge.

Once again. . .the target has been hit. . .you are in denial.


> iTunes and the Apple App Store are both "company store" turfs as far as I
> can see -- especially when Jobs can tell Stanza to take out a feature that
> allowed customers to share free books with their friends -- a feature they had
> come to rely on -- and why?  Because Jobs is introducing a competing app
> and Jobs wants to cook the books so that his app wins.  That even if one only
> wants to install a free ePUB book via *USB* you *still* have to run it
> through the iTunes "company store" is particularly galling to me.  I don't
> understand why you have to take this path if Apple isn't DRM'ing the free
> books???
>
> For the record:
>
> Nook "company store" -- nook is hopelessly "locked down" as I have said many
> times.

Gee, I wonder who said all that "company store" stuff here before you did???

Enough. . .you are now just talking to yourself,
unless you can tempt bowerbird to keep after you.


> Sony "company store" -- don't know the wifi version if any, I just know that
> lots of people who work with PG/DP transfer ebooks to Sony via USB.
>
> Kindle "company store" -- offers about the same "features" as the iBooks
> "company store", plus you can USB by *direct connection* to your computer
> without having to go through an iTunes-like "company store" applet, plus it
> has an "experimental web browser" that allows you to download free MOBI books
> directly from the internet via whispernet, plus it allows one to quickly and
> easily write a "Magic Catalog" type ebook which in turn can pull down other
> free MOBI books from the internet via whispernet.  Now whispernet is slow
> and unreliable compared to wifi at least in the 'burbs where I live.  And
> the Kindle browser is weak and sucky -- but at least it hasn't been "pimped"
> to prevent the download of free ebooks from the internet! And you can also
> do all these things with PDF Files and TXT files and they will all also
> actually end up inside your Kindle on the standard bookshelf so that they
> will still be there when you want to read them, whether on the beach or on
> an airplane, etc. But the slow and unreliable whispernet plus the weak web
> browser are all reasons why I am still looking for an ebook reader that uses
> wifi, which hasn't "pimped" that wifi, and hasn't "pimped" the web browser
> either!

From hart at pglaf.org  Wed Apr 21 20:02:42 2010
From: hart at pglaf.org (Michael S. Hart)
Date: Wed, 21 Apr 2010 20:02:42 -0700 (PDT)
Subject: [gutvol-d] APRIL 21, 2013 BOWERBIRD/HART WAGER
In-Reply-To: <39212.7a4fea3d.3900eac3@aol.com>
References: <39212.7a4fea3d.3900eac3@aol.com>
Message-ID: <alpine.DEB.2.00.1004211952240.6023@mail.pglaf.org>


Quit trying to change the subject and muddy the fowl waters!

iPads have to BE iPads, Kindles have to BE Kindles, Sonys=Sonys,
and nooks have to be nooks. . .not virtual. . .not clones.

April 21, 2013 you say there will be more iPads than the rest.

Period!


On Wed, 21 Apr 2010, Bowerbird at aol.com wrote:

> michael said:
> >    You are really willing to bet that on April 21, 2013 there
> >    will be more iPads than Kindles plus Sonys plus nooks. . . .
>
> well, every ipad and iphone and ipodtouch can run the
> kindle software, as can a slew of phones, even today,
> so all of those machines can act as "virtual kindles"...
>
> but in terms of dedicated kindle machines, you betcha.
>
>
> >    Say it in print for the folks, and
> >    I'll start practicing turducken recipes. . . .
>
> well, if you're even willing to take such a stupendous bet,
> it means you're gonna rewrite the terms of engagement,
> but i'll go for it anyway, to see how you're gonna do that.        :+)

This was your original prediction, as far as I can make out.

As listed above, and I'm not even counting the dozen other
brands. . .at least for now.


>
> ***
>
> while i've got the crystal ball out, might as well use it...

You mean making it a "loss leader" like Sony Playstations???


> indeed, they'll have a "subscription" offer that will actually
> make the kindle _free_ if you agree to buy so many books.

A virtual "Book of the Month Club"???


> they will also offer an "all-you-can-eat" option that will
> attempt to do for books what "netflix" has done for films.

Now THAT would be cute. . .but like Netflix, I presume your
plan would mean you don't get to OWN the books. . .???

Cute. . .you should try to sell them that promotion and save
these notes for "prior art"!!!

I want 10% !!!


Hee hee!


> similarly, they'll offer the kindle to school-districts with
> a _guarantee_ that their overall textbook-costs will drop,
> and increasingly-cash-starved schools will jump at that.

You mean like, let's see, who was it. . .APPLE used to do!?!?!?


> all of these initiatives -- and more!   -- will ensure that
> lots and lots and lots and lots of kindle units are moved.
>
> but it _still_ won't compare with the ipad juggernaut...
> (not even in quantity, and especially not in profitability.)
>
> why not?   because amazon can't do software like apple.
> so they will always be three-and-a-half-steps behind...

Good. . .then it's still a bet, and we are on for:


APRIL 21, 2013

iPad has to have over 50% of the grand total. . . .

We'll have to wait to see if any of the others take more
than a few percentage points of the market.
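The terms as restated here reduce to simple arithmetic: iPads must outnumber Kindles, Sonys and nooks combined, which is the same as holding more than 50% of the four-brand total. A tiny Python sketch; the unit counts in the usage lines are made up, the "dozen other brands" are left out as Michael proposes, and whether the count means cumulative units or sales as of that date is a question the thread itself leaves open.

```python
def ipad_wins(ipad: int, kindle: int, sony: int, nook: int) -> bool:
    """True if iPads hold more than 50% of the four-brand total."""
    total = ipad + kindle + sony + nook
    # ipad / total > 0.5  <=>  2 * ipad > total  <=>  ipad > kindle + sony + nook
    # Integer comparison avoids division and floating-point rounding.
    return 2 * ipad > total


# Hypothetical unit counts (millions), not real sales figures:
print(ipad_wins(ipad=6, kindle=2, sony=2, nook=1))  # True
print(ipad_wins(ipad=5, kindle=3, sony=1, nook=1))  # False: an exact 50% tie is not a win
```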

I'm sure SOMEONE will try, perhaps even Microsoft, but I
don't think they will get enough out there to matter.


>
> -bowerbird
>
>

From Bowerbird at aol.com  Wed Apr 21 23:12:16 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 22 Apr 2010 02:12:16 EDT
Subject: [gutvol-d] Re: APRIL 21, 2013 BOWERBIRD/HART WAGER
Message-ID: <4871f.4a561e90.390142c0@aol.com>

michael said:
>    iPads have to BE iPads, Kindles have to BE Kindles, Sonys=Sonys,
>    and nooks have to be nooks. . .not virtual. . .not clones.
>    April 21, 2013 you say there will be more iPads than the rest.
>    Period!

yes sir!

or i'll buy you an all-you-can-eat dinner every night for a week!

_plus_ you can say you once won one bet against me!          :+)

-bowerbird

p.s.   do i win automatically if ipad surges ahead _before_then_?
or is it only what the sales figures happen to be on _that_date_?

p.s.   heck, i'll make it all-you-can-eat every night for _two_ weeks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/84a20a57/attachment.html>

From joshua at hutchinson.net  Thu Apr 22 07:56:48 2010
From: joshua at hutchinson.net (Joshua Hutchinson)
Date: Thu, 22 Apr 2010 14:56:48 +0000 (GMT)
Subject: [gutvol-d] Re: Tidy -c and tables
References: <4BCF2EA9.1000402@novomail.net>	<SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
	<4BCF6AE1.8060900@perathoner.de>
	<SNT120-DS1104DCE8B884F170AB82E4AE090@phx.gbl>
Message-ID: <1992374499.215098.1271948208072.JavaMail.mail@webmail10>

An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/72ded068/attachment.html>

From Bowerbird at aol.com  Thu Apr 22 12:51:23 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 22 Apr 2010 15:51:23 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <6bf7b.87321.390202bb@aol.com>

kevin said:
>    On the Open Library System, I note that 
>    high resolution gray-scale scans 
>    (at least for the one project I checked) 
>    are not archived, 
>    though the black and white scans are

it's my understanding that d.p. has kept all scans, but
it's reasonable they wouldn't mount the high-res ones;
no sense letting the general public burn your bandwidth.

this, of course, is the problem with high-res files in general.

they're nice to have, for purposes of "preservation", but
you can't really make them "accessible" in a practical way
until computer resources become free across-the-board,
so -- in a practical sense -- they don't really do any good.

it's not just bandwidth, either.   storage problems quickly
ensue when each page of a book eats multiple megabytes.
and computers need lotsa power to crunch through them.
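To put rough numbers on the storage point, a back-of-the-envelope Python calculation. The 300 pages per book and 5 MB per high-res page are assumptions chosen for illustration, not measurements of the d.p. archive; the 30,000-book figure is the size of the PG collection quoted elsewhere in this thread.

```python
def scan_storage_gb(pages: int, mb_per_page: float) -> float:
    """Storage for one book's page scans, in gigabytes (1 GB = 1000 MB)."""
    return pages * mb_per_page / 1000


# Assumed figures, for illustration only.
per_book = scan_storage_gb(pages=300, mb_per_page=5)   # 1.5 GB per book
collection = per_book * 30_000                          # 45000.0 GB, about 45 TB

print(per_book, collection)  # 1.5 45000.0
```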

and sure, we can all see the day coming when all of these
resources _will_ be available to us.   but how soon is that?
are you willing to bet on it?   and don't forget that you are
a lucky first-worlder.   how soon until _everyone_ on the
whole planet has unlimited computing resources?   really?
are you willing to bet on it?   and if the third-worlders can't
have what you lucky people have, how long do you think
they will sit on the sidelines without a full-out revolution?

we need to think in real-world terms, and be _practical_...


>    I also note that there is no 'bulk' download function
>    to get a zip of all the files associated with a text.

yeah, that would be nice.   will d.p. offer that?   who knows?

in the meantime, you can learn the address of an image by
right-clicking it and choosing the appropriate menu-item.

for instance, here's the u.r.l. i recovered for one page:
>    http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/43e52c83dd501/web_ready/001.png

subsequent scans have the same u.r.l., except "002.png",
"003.png", etc., so it's very easy to scrape them en masse.
(if anyone needs a scraper-program, just backchannel me.)
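A minimal Python sketch of the scraping described above, using only the standard library. It assumes the page numbering is exactly three digits starting at 001, and the page count has to be supplied by hand (`count` is a hypothetical parameter; nothing here discovers it). The base URL is the one quoted above and may no longer resolve, so try it against a live address first.

```python
import urllib.request

# Base directory quoted above: everything up to the page file name.
BASE = "http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/43e52c83dd501/web_ready/"


def page_urls(base: str, count: int) -> list[str]:
    """Build the sequential scan URLs: 001.png, 002.png, ..., up to count."""
    return [f"{base}{n:03d}.png" for n in range(1, count + 1)]


def download_pages(base: str, count: int) -> None:
    """Fetch each page image and save it locally under its own file name."""
    for url in page_urls(base, count):
        filename = url.rsplit("/", 1)[-1]  # e.g. "001.png"
        urllib.request.urlretrieve(url, filename)


if __name__ == "__main__":
    # Only print the URLs; call download_pages(BASE, count) to fetch them.
    for url in page_urls(BASE, 3):
        print(url)
```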

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/7208efa0/attachment.html>

From jimad at msn.com  Thu Apr 22 13:00:18 2010
From: jimad at msn.com (James Adcock)
Date: Thu, 22 Apr 2010 13:00:18 -0700
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
In-Reply-To: <37cac.38c448de.3900e581@aol.com>
References: <37cac.38c448de.3900e581@aol.com>
Message-ID: <SNT120-DS29B7BB3D35875B6772821AE080@phx.gbl>

>>   Using the iPad I go on the web to PG.  I see an ePub book I like there.

>you mean that you see a _book_ you like there.

 
No, I mean I see an ePub book there because Steve Jobs, among other people at Apple, claims that the iPad supports ePub, and it doesn't really.  If you read the APIs you will find that it is actually a PDF-centric machine which also has some HTML URL APIs.

>a _book_ that is offered in a _number_ of different formats,
>including .html (which can be read in the web-browser) and
>as a plain-vanilla .txt file, which can be read in many apps...

 
And they all generally disappear on the iPad as soon as you lose the wifi connection (more on this below).


>your problem is that you want to insist on a certain file-format.


No, my problem is that I insist on a certain reader experience that is not important to you or Michael, presumably because your eyestate is different from my eyestate, and in general you have lower expectations about what a "book" is than I have, and you have lower expectations for a user's experience than I have.  ePub and MOBI are simply two real "ebook" file formats that support that user experience.  Other, less common file formats that support a real "ebook experience" include: azw, topaz, tr2, tr3, aeh, fb2, lit, pdb, lrf, lrx.

 
Here are a few things I expect from my "reader experience," based on my previous experience with many, many different reader machines which are either designed for reading or truly are "general purpose machines."

 
I expect to be able to see a book on the internet and actually get that book onto my machine to read it -- from where I see it on the internet.

 
I expect to be able to read a book in either full screen portrait mode or full screen landscape mode.  I DO NOT, for instance, expect to be forced to read a book in "two up" mode if I switch to landscape mode.

 
I expect to be able to change font sizes, fonts, and margins.

 
I expect reflow when these things happen.

 
I expect to be able to keep a "library" of at least a couple hundred books on my machine in differing formats and in differing states of being read.

 
I expect that "library" will show me spine information such as Author and Title without having to read the book.

 
I expect that once I put a book on my machine it will be there again the next time I try to read the book, whether or not I have an internet connection.  AKA "airplane mode."

 
I expect the reader machine will understand what it means to change a page and will not go off scrolling wacko in three dimensions when I just try to change a page.

 
I expect that when I own a book and I own a desktop computer, and I own a portable computer, then I can move that book which I own to and from those two computers that I own without having to ask Steve Jobs or his Company Store's permission every time I want to move that book file from one computer to the other.  What in God's Name gives Steve Jobs the right to say I've got to run MY files through his Company Store every time I want to make a file transfer between MY computers??? 


>but don't try to pretend that because you cannot get an .epub,
>you can't get "an e-book", because i don't play nonsense games.

 
Here's what one CAN do with an iPad, because I just went back to the Apple "Bricks and Mortar" store again.  You can open Safari.  You can go to the PG website.  You can say "Show me all the PDF books" -- of the 30,000 books on PG exactly 483 are available in PDF format.  You can say "Save this webpage to the Desktop."  You can do this with a second book.  Now you can go into "beach mode" aka "airplane mode" by turning off the wifi.  You try to open the first book and you find that it has magically disappeared.  It appears to be on the desktop, but when you open it nothing is there.  Now you open the second book and you find that it IS still there.  So the Safari web browser has the ability to store ONE book on the desktop.  So you can get ONE book at a time off the internet, from a location you choose, and read it without a wifi connection IF that book is available in PDF format.  If you try to read two books at once, you either need a wifi connection so that the iPad can keep reloading those books over the internet over and over and over again, or you can confine yourself to reading only one book at a time, cover to cover, and then throw the first book away before you read the second book -- which will require you to find a wifi connection to reload it again.

 
You can also "buy" (where "buy" may mean free, if Steve Jobs says so) a limited selection of books from the Apple Company Store using their Apple-hardwired puppy called "iBooks".  iBooks does store more than one book.  You will have to sign up with iTunes and give them a valid credit card before you can even download "free" books from the Apple Company Store. And you will have to content yourself with Jobs' choice of what you get to read -- a choice that may change for the better or worse a year from now.

 
You can also do these same things through the Amazon Company Store in the form of Kindle for iPad downloaded to your iPad.


>what book -- what _book_, not some particular file-format --
>what _book_ is it that you claim the ipad is refusing to display?

Two such books come readily to mind:  

 
PG #32085

 
(well I guess I can read #32085 in pgtxt70 mode, but I refuse to read "books" in pgtxt70 mode -- life is too short! Also I guess I can read it in HTML mode as long as I have the wifi connection up and running -- just not in "airplane mode." And only as long as it's the ONLY book I want to currently read.)

 
And the book that I am currently working on for submission to PG.

 
Look, it's pretty simple: iPad is actually a PDF machine NOT an ePub machine in spite of Jobs' claims to the contrary. If you and Michael actually want to support iPad rather than claim it can do things that it can not do -- THEN SUPPORT IT!

 
What this would take is, firstly, providing all the PG books in PDF format.  BB has, I think, been working on this idea for text files.  Use half-page format as BB was trying previously.  I spent literally a minute and, as a test, converted a PG HTML book file to PDF format using an Adobe tool.  I put that PDF file up at http://www.freekindlebooks.org/Dev/Rainbow.pdf if you want to play with it using iPad and Safari, or if you want to look at it on any other general-purpose machine; even a Kindle will display it if you get it from this location!  Now Safari is actually a pretty sucky browser for reading books -- but at least you CAN read it.  And you CAN store ONE book from PG on the iPad desktop and then read that ONE book on the beach or on an airplane. 

 
Secondly, one of you or someone else at PG would need to write an iPad PDF/HTML browser program which would have its own "sandbox" area which could download and store more than one book.  Unlike with ePub, iPad DOES have APIs for reading, storing, and displaying PDF, and for accessing the internet via HTTP URLs.  So this app would be a RELATIVELY easy thing to do.

 
Would this make *me* happy?  Not really, because I like reflow and PDF doesn't reflow.  But then at least you guys COULD legitimately claim that one CAN read PG "ebooks" on the iPad!  No Steve Jobs and no Company Store involved!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/5c805800/attachment-0001.html>

From jimad at msn.com  Thu Apr 22 13:05:37 2010
From: jimad at msn.com (James Adcock)
Date: Thu, 22 Apr 2010 13:05:37 -0700
Subject: [gutvol-d] Re: Tidy -c and tables
In-Reply-To: <4BCF8E86.2000004@novomail.net>
References: <4BCF2EA9.1000402@novomail.net>	<SNT120-DS544E4CFBC31533B50C055AE090@phx.gbl>
	<4BCF8E86.2000004@novomail.net>
Message-ID: <SNT120-DS16BD0DA92AAA1494B72FD6AE080@phx.gbl>

>I do not believe that there is any HTML authoring/editing tool which 
>will preserve newline characters as implicit markup. 

Sorry, but what I do, and what others do, is retain the original book's
linebreaks in the HTML source. Fortunately HTML *is* a reflow file format: it
ignores those linebreaks and treats them as whitespace, allowing the end user
to use whatever size device, fonts, screen orientations, etc. they choose.  If
one later wants to make another pass at the book, one simply strips out the
HTML markup, leaving the original text intact with the same linebreaks
as were in the original book.  Then, as a hypothetical example, one can
resubmit that plaintext with the original linebreaks back through the DP
process.
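
A minimal Python sketch of the "strip out the HTML markup" step, assuming
(as described above) that the original linebreaks were kept inside the HTML
text itself. It uses only the standard-library html.parser; TextExtractor
and strip_markup are hypothetical names, and a real book file with <br> tags,
tables or footnotes would need more handling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes verbatim so the newlines inside them survive."""

    SKIP = {"script", "style"}  # content that is not part of the book text

    def __init__(self) -> None:
        # convert_charrefs=True (the default) delivers &amp; etc. already decoded
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_markup(html: str) -> str:
    """Drop all tags, keep the text (and its original linebreaks) as-is."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_markup("<p>It was a dark and\nstormy night.</p>"))
```

Whitespace between block tags (e.g. the newline between </p> and the next
<p>) is text too, so blank lines separating paragraphs in the source carry
over into the plaintext.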


From jimad at msn.com  Thu Apr 22 13:24:59 2010
From: jimad at msn.com (James Adcock)
Date: Thu, 22 Apr 2010 13:24:59 -0700
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
In-Reply-To: <3d4f6.c5288d9.3900fbc2@aol.com>
References: <3d4f6.c5288d9.3900fbc2@aol.com>
Message-ID: <SNT120-DS1747EAEC111B4219C0FB04AE080@phx.gbl>

>if the linebreaks "reappear", then you
>are doing something wrong...  _you_...

You all keep making tools that don't work and then you blame the user when
they report back their experience to you.  I tried your "workaround"
suggestion on a couple of browsers and it just makes the browser crash.
I suggest YOU try it out on a "REAL" PG book, not a toy test, and see
what happens to you.  I'm on a PC, so maybe you all are making things that
work on the other 1% of the machines out here in the real world?


Bottom line, do you really think what you are offering is going to
make a real-world PG customer happy?  How about offering unwrapped txt files
as part of the file download choices - I think that's what people expect!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/7c08cce7/attachment.html>

From Bowerbird at aol.com  Thu Apr 22 13:55:48 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 22 Apr 2010 16:55:48 EDT
Subject: [gutvol-d] Re: Typesetting (unwrap.pl)
Message-ID: <7301c.365c7490.390211d4@aol.com>

jim-

i'm sorry you can't make my unwrap script work for you.
you haven't given me enough info that i can help you out.
and your attitude is so bad that i am not inclined even to
give you simple advice that would sidestep the problem...

i'm _quite_ sure the program works just fine; after all,
it's nothing but a set of reg-ex changes.   nonetheless...

if _anyone_else_ finds that you can't make it work either,
then i _invite_ you to make a post here on this listserve,
and i will do my best to get to the root of your problem.
unfortunately, jim's problem is rooted _far_ too deeply
for me to be able to do anything about it...

and if you don't want to mess with a perl script at all,
go ahead and use the web-service that i've created:
>    http://z-m-l.com/go/unwrap.pl

paste your text into the field, then click "unwrap it!",
and then copy the unwrapped text from the window...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/09fd6628/attachment.html>

From lee at novomail.net  Thu Apr 22 14:06:30 2010
From: lee at novomail.net (Lee Passey)
Date: Thu, 22 Apr 2010 15:06:30 -0600
Subject: [gutvol-d] Cooperative proofreading
Message-ID: <4BD0BA56.5020806@novomail.net>

Just an update on my cooperative proofreading site 
(http://www.ebookcooperative.com/).

1. The Java servlets are now connected to CVS, so any changes you make 
will be persisted to the file system.

2. Registration is now required to proofread pages (just your identity 
is currently required). This identity is used to add comments to CVS so 
we can see who changed things.

3. The proofreading system tracks progress, so if you go half-way 
through a project, exit, and return at some time later and choose the 
same project you will be taken to the last page you were viewing.

4. Some new projects were added. The project table is generated from a 
database of projects; no editing of web pages is required.

The Kupu-based UI is still a little rough; that will probably be the 
last thing I tackle. I have not yet written the proofing guidelines. The 
next step will be to create a servlet that will allow downloading an 
entire e-book by combining all the pages from the repository into a 
single file.

As always, feedback is welcomed.


From Bowerbird at aol.com  Thu Apr 22 14:30:58 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 22 Apr 2010 17:30:58 EDT
Subject: [gutvol-d] Re: EBook formats on iPad via wifi
Message-ID: <754f1.14597b23.39021a12@aol.com>

jim said:
>    No, I mean I see an ePub book there 
>    because Steve Jobs among other people at Apple 
>    claims that the iPad supports ePub, and it doesn't really.   

blah blah blah.   i'm done talking to you about this, jim.


>    No my problem is that I insist on a certain reader experience 

i can relate to that.

as i told you earlier, if you woulda said "i can't get the e-books
that i want from the sites that i want in the format that i want",
then you _might_ have received a sympathetic response.

but instead, you said you couldn't get an e-book.   that's false,
so you get the scorn that people receive when they tell a lie...


>    you have lower expectations about what a ?book? is than I have

i'm quite sure my ideal e-book surpasses an .epub in many ways.

but i'm not foolish enough to think that my idea of an e-book is
the only one that deserves to be called "an e-book", and that
every other idea of an e-book can be discarded _by_definition_...


>    I expect to ...
>    I expect to ...
>    I expect to ...
>    I expect to ...
>    I expect to ...
>    I expect to ...
>    I expect to ...
>    I expect to ...

that's a nice list; i applaud the thought that went into its creation.

if the ipad comes up short of your expectations, just don't buy it...

you might even tell us -- _once_ -- how it falls short via your list.

but let me tell you a little something about listserve conversations;
if they're not sufficiently interesting to the majority of the lurkers,
then they aren't worth having.

this conversation stopped being "sufficiently interesting" to them
a long, long, long time ago.   that's when you should have stopped.

that's also when _i_ should've stopped, and michael too, and we
both should know better than to make such a stupid mistake, but
we'll both be smarter about having a dialog with you in the future.

(i keep giving you "one more chance", and you keep blowing it, so
i'm not going to be doing that any more, jim.   you made your bed.
now you're going to find that nobody wants to talk with you at all.
worse, you might find nobody even bothers to _read_ your posts.)

 
>    well I guess I can read #32085 in pgtxt70 mode 
>    but I refuse to read "books" in pgtxt70 mode

i see.   so it isn't that the ipad "refuses to display" this book,
it's that _you_ refuse to read it the way the ipad displayed it.

well, yes, then _that_ is a different matter entirely, yes it is...


>    And the book that I am currently working on for submission to PG.

am i some kind of mind-reader, or what?


>    Look its pretty simple: iPad is actually a PDF machine 
>    NOT an ePub machine in spite of Jobs' claims to the contrary.

i think you're sputtering out of control, jim...


>    If you and Michael actually want to support iPad 
>    rather than claim it can do things that it can not do 
>    -- THEN SUPPORT IT!

the ipad doesn't need our support.

and we're not "claiming" that it can do anything it can't.

then again, neither are we claiming that it "cannot" do
something -- like display an e-book -- just because it
won't display it in the way we want in the app we want.

in sum, just because i don't like the format of an e-book
doesn't mean that it magically ceases to _be_ an e-book.

and i can't believe that i am willing to make yet another
message that repeats something so inane, and therefore
contributes absolutely nothing to the signal/noise ratio.

so i will stop!   now!   my apologies to the subscribers...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/ccef3dc6/attachment.html>

From Bowerbird at aol.com  Thu Apr 22 15:06:58 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Thu, 22 Apr 2010 18:06:58 EDT
Subject: [gutvol-d] Re: Typesetting (back on track,
	calling the mad scientist)
Message-ID: <777f8.52df4abc.39022282@aol.com>

we seem to have lost the mad scientist.

mike mcd, are you still out there?

if so, i have some questions for you...

here's another take on "gods and fighting men":
>    http://z-m-l.com/misc/14465-take6.pdf

this .pdf just has the first page of each chapter,
but it came outta my program, not a text-editor.

as we can see, the p.g. linebreaks make this text
practically unusable, so we'll have to do a rewrap,
especially if you want to have the text _justified_...
(if not, i can just rearrange the unwieldy lines and
leave the vast majority of p.g. linebreaks in place.)

going on, is this text-size (10-point) good for you?
(again, print out some pages so you know for sure.)

how about the leading?   it's 15-point leading, so
that's generous for 10-point type, and you might
feel it's _too_ big, but i thought i'd show it to you.

on pages 12 and 101, you'll see _blue_ headers...
those are lines that needed to be _shrunk_ a bit,
so they would not spill over into the margin area.
on page 12 it's 15.5-point instead of 16-point,
and on page 101 it's 13-point instead of 14-point.
(and that latter one still intrudes on the margins.)

the program attempts to "copy-fit" all the headers
to the same size, but i'm experimenting here with
allowing slight variations in size on freakish lines.
(rather than letting the freaks dictate that the other
header-lines be smaller to accommodate the freaks.)
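
one way the copy-fitting rule described above could work, sketched in python:
every header starts at the common size, and only a line that overflows the
measure is stepped down, with a floor.   the width model (every glyph half an
em wide) and the function names are assumptions for illustration; a real
typesetter would measure actual font metrics.

```python
from typing import Callable


def approx_width(text: str, size_pt: float) -> float:
    """Hypothetical width model: every glyph is half an em wide.

    A real typesetter would sum per-glyph advance widths from the font.
    """
    return len(text) * size_pt * 0.5


def copy_fit(headers: list[str], measure_pt: float, base_pt: float,
             min_pt: float,
             width: Callable[[str, float], float] = approx_width,
             step: float = 0.5) -> list[tuple[str, float, bool]]:
    """Set every header at base_pt; shrink only the lines that overflow.

    Returns (header, size, fits) triples.  fits is False when even min_pt
    still intrudes on the margins.
    """
    result = []
    for header in headers:
        size = base_pt
        # Step down only this header, never the whole set, and never below min_pt.
        while size > min_pt and width(header, size) > measure_pt:
            size = max(size - step, min_pt)
        result.append((header, size, width(header, size) <= measure_pt))
    return result


sizes = copy_fit(["A" * 37, "B" * 38, "C" * 50],
                 measure_pt=300, base_pt=16, min_pt=13)
```

the alternative being rejected would compute one size that fits the longest
header and apply it to all of them; here a freak line only affects itself,
and the third header shows the page-101 case, still too wide at the floor.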

so the question is, are these small variations bad?
noticeable?   too much so?   do they bother you much?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100422/f90013f7/attachment-0001.html>

From schultzk at uni-trier.de  Thu Apr 22 23:42:54 2010
From: schultzk at uni-trier.de (Keith J. Schultz)
Date: Fri, 23 Apr 2010 08:42:54 +0200
Subject: [gutvol-d] Re: EBook formats on iPad via wifi (OT)
In-Reply-To: <SNT120-DS29B7BB3D35875B6772821AE080@phx.gbl>
References: <37cac.38c448de.3900e581@aol.com>
	<SNT120-DS29B7BB3D35875B6772821AE080@phx.gbl>
Message-ID: <754242A6-FA52-4421-8EB3-0E7D4DCDEDEF@uni-trier.de>

Hi James,

	I think you have stated your points well enough.
	
	If you have nothing more to say about the PG/DP experimental
	format, then please drop this thread about the Apple iPad and take up
	your problems with the iPad with Apple.

	regards
		Keith.


From vlsimpson at gmail.com  Fri Apr 23 07:08:07 2010
From: vlsimpson at gmail.com (V. L. Simpson)
Date: Fri, 23 Apr 2010 09:08:07 -0500
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <6bf7b.87321.390202bb@aol.com>
References: <6bf7b.87321.390202bb@aol.com>
Message-ID: <h2xbd09bf341004230708j547f7cd5m8ba7df318b5d9690@mail.gmail.com>

> (if anyone needs a scraper-program, just backchannel me.)
http://www.gnu.org/software/wget/

From kevin.pulliam at gmail.com  Fri Apr 23 08:10:13 2010
From: kevin.pulliam at gmail.com (Kevin Pulliam)
Date: Fri, 23 Apr 2010 10:10:13 -0500
Subject: [gutvol-d] Re: DP output is technically obsolete
In-Reply-To: <6bf7b.87321.390202bb@aol.com>
References: <6bf7b.87321.390202bb@aol.com>
Message-ID: <w2t78defb41004230810y6c5773cdo2118d8f79f40829e@mail.gmail.com>

This was a special case: the high resolution scans are actually
needed to read/decipher some of the text, but Greg popped up and
pointed out that he uploaded the super-duper high res scans to
Internet Archive. Which answers the mail on this and satisfies my
desire that all those scans of hard to find issues of that work
continue to be available.

As to a screen scraper, wget, or simply clicking through and
downloading each image at OLS, this fails the "Same Barrier to access"
test (And I admit it is my standard, not a requirement, or something
someone else promised to adhere to) when compared to a PG Text.

In order for the scanned pages to be similarly available as the PG
Text, the images will need to be available in a single download
'click' the hypothetical generic internet user can understand and make
use of.  'One Click, One Book'.

Just as a bookstore doesn't make you visit 16 different locations in
the store to purchase one book, PG doesn't require you to visit
multiple pages to download a book, and Amazon doesn't require you to
visit multiple pages (other than order confirmation) to purchase a
book.  In each of my examples here, Person A can give Person B a link
or a location description, and Person B can go to that location and
get the book in the preferred format (Paper in hand, paper in the
mail, etext of various types, etc).
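
The "One Click, One Book" test could be met for the scans by serving each
book's page images as a single archive. A minimal sketch with Python's
standard zipfile module, assuming the scans sit as zero-padded PNG files in
one directory; bundle_scans is a hypothetical name, not an existing DP or
OLS function.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_scans(scan_dir: str, zip_path: str) -> list[str]:
    """Pack every PNG page image in scan_dir into one zip, in page order.

    Zero-padded names (001.png, 002.png, ...) make name order = page order.
    PNG is already compressed, so the files are stored, not deflated.
    """
    pages = sorted(Path(scan_dir).glob("*.png"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for page in pages:
            zf.write(page, arcname=page.name)
    return [page.name for page in pages]


# Demo with placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for n in (2, 1, 3):
        (Path(tmp) / f"{n:03d}.png").write_bytes(b"placeholder")
    names = bundle_scans(tmp, str(Path(tmp) / "book.zip"))
    with zipfile.ZipFile(Path(tmp) / "book.zip") as zf:
        stored = zf.namelist()

print(names)
```

Storing rather than deflating keeps the zip step cheap on the server, since
re-compressing PNG data gains almost nothing.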

Thanks

Kevin

On Thu, Apr 22, 2010 at 2:51 PM,  <Bowerbird at aol.com> wrote:
> kevin said:
>>   On the Open Library System, I note that
>>   high resolution gray-scale scans
>>   (at least for the one project I checked)
>>   are not archived,
>>   though the black and white scans are
>
> it's my understanding that d.p. has kept all scans, but
> it's reasonable they wouldn't mount the high-res ones;
> no sense letting the general public burn your bandwidth.
>
> this, of course, is the problem with high-res files in general.
>
> they're nice to have, for purposes of "preservation", but
> you can't really make them "accessible" in a practical way
> until computer resources become free across-the-board,
> so -- in a practical sense -- they don't really do any good.
>
> it's not just bandwidth, either.   storage problems quickly
> ensue when each page of a book eats multiple megabytes.
> and computers need lotsa power to crunch through them.
>
> and sure, we can all see the day coming when all of these
> resources _will_ be available to us.? but how soon is that?
> are you willing to bet on it?? and don't forget that you are
> a lucky first-worlder.? how soon until _everyone_ on the
> whole planet has unlimited computing resources?? really?
> are you willing to bet on it?? and if the third-worlders can't
> have what you lucky people have, how long do you think
> they will sit on the sidelines without a full-out revolution?
>
> we need to think in real-world terms, and be _practical_...
>
>
>>   I also note that there is no 'bulk' download function
>>   to get a zip of all the files associated with a text.
>
> yeah, that would be nice.   will d.p. offer that?   who knows?
>
> in the meantime, you can learn the address of an image by
> right-clicking it and choosing the appropriate menu-item.
>
> for instance, here's the u.r.l. i recovered for one page:
>>
>> http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/43e52c83dd501/web_ready/001.png
>
> subsequent scans have the same u.r.l., except "002.png",
> "003.png", etc., so it's very easy to scrape them en masse.
> (if anyone needs a scraper-program, just backchannel me.)
>
> -bowerbird
>
> _______________________________________________
> gutvol-d mailing list
> gutvol-d at lists.pglaf.org
> http://lists.pglaf.org/mailman/listinfo/gutvol-d
>
>
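
A Python sketch of the URL pattern described in the quoted message: page
images are numbered 001.png, 002.png and so on under the same directory.
The base URL is the one quoted above; the page count, three-digit padding
for every book, and the helper names are assumptions. fetch_all uses
urllib.request.urlretrieve and needs network access; it is not called here.

```python
import os
import urllib.request

# Directory from the URL quoted above; everything else here is assumed.
BASE = ("http://pgdp01.us.archive.org/1/pgdp02-archive/texts/documents/"
        "43e52c83dd501/web_ready/")


def scan_urls(base: str, pages: int, width: int = 3, ext: str = "png") -> list[str]:
    """Build zero-padded page URLs: base + 001.png, 002.png, ..."""
    return [f"{base}{n:0{width}d}.{ext}" for n in range(1, pages + 1)]


def fetch_all(urls: list[str], dest: str) -> None:
    """Save each URL into dest under its own file name (needs network access)."""
    os.makedirs(dest, exist_ok=True)
    for url in urls:
        name = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, os.path.join(dest, name))


urls = scan_urls(BASE, pages=3)
print(urls[0])
```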

From Bowerbird at aol.com  Fri Apr 23 12:54:54 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 23 Apr 2010 15:54:54 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <b2c45.d7b1b4d.3903550e@aol.com>

kevin said:
>    In order for the scanned pages to be 
>    similarly available as the PG Text, 
>    the images will need to be available 
>    in a single download 'click' 
>    the hypothetical generic internet user 
>    can understand and make use of.
>    'One Click, One Book'.

i see.   you were discussing p.g. policy...
i thought you wanted the scans yourself.

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100423/563c0d97/attachment.html>

From Bowerbird at aol.com  Fri Apr 23 13:02:22 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 23 Apr 2010 16:02:22 EDT
Subject: [gutvol-d] Re: DP output is technically obsolete
Message-ID: <b3379.561bf02d.390356ce@aol.com>

vlsimpson said:
>   http://www.gnu.org/software/wget/

wget is a nice program, for a non-interactive commandline tool.
thanks for bringing attention to it...   someone will appreciate it...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100423/c6aab392/attachment.html>

From e98cuenc at gmail.com  Sat Apr 24 02:18:36 2010
From: e98cuenc at gmail.com (Joaquin Cuenca Abela)
Date: Sat, 24 Apr 2010 11:18:36 +0200
Subject: [gutvol-d] Removing spurious break lines
Message-ID: <z2t8b2ed8551004240218pe072305an46bfab8b9eb97a5f@mail.gmail.com>

Hi,

some books, like "Don Quijote" (http://www.gutenberg.org/etext/2000)
have spurious break lines all over the text. From what I understood PG
generates all the derived formats from the HTML, if there is one, or
from the raw text format otherwise.

In this case there is an HTML version, but it also contains the
spurious break lines. My guess is that the HTML was automatically
generated from the text, and the text breaks the lines at ~79 - 80
characters.

Are there guidelines on how to format the raw text to make it more
amenable for automatic conversion to other formats by the PG tools? Is
it ok to reformat this text removing the spurious break lines in the
raw text?

Was the HTML automatically generated, or do I also have to fix the HTML?

How can I check the results in other formats before sending it to PG?

Also, are the conversion tools open source?

Cheers,

-- 
Joaquin Cuenca Abela

From e98cuenc at gmail.com  Sun Apr 25 09:57:48 2010
From: e98cuenc at gmail.com (Joaquin Cuenca Abela)
Date: Sun, 25 Apr 2010 18:57:48 +0200
Subject: [gutvol-d] Re: Removing spurious break lines
In-Reply-To: <z2t8b2ed8551004240218pe072305an46bfab8b9eb97a5f@mail.gmail.com>
References: <z2t8b2ed8551004240218pe072305an46bfab8b9eb97a5f@mail.gmail.com>
Message-ID: <q2h8b2ed8551004250957q6e5f7c85h3f90b35c53a5c239@mail.gmail.com>

Wrt "Don Quijote", the page claims the HTML has been generated
manually. I have generated an improved HTML version with a python
script, and added a few manual fixes (like adding some extra headers).
The trickiest part was to accurately identify verses. The original
text is inconsistent about where it splits the lines (but most of the
text cuts lines at 75 characters).
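
One possible heuristic for the verse-detection problem, sketched in Python;
this is not Joaquin's script, just an illustration under stated assumptions.
Since most lines are cut near 75 characters, a line ending well short of
that either closes a paragraph or is a verse line, and a run of several
consecutive short lines is probably verse. The 20-character slack and the
minimum run of two lines are guesses, and a short two-line paragraph would
be misread as verse.

```python
WRAP_WIDTH = 75          # most lines are cut near here (per the message above)
SHORT = WRAP_WIDTH - 20  # assumption: lines this short did not wrap naturally


def classify(lines: list[str], min_run: int = 2) -> list[str]:
    """Label each line 'blank', 'prose' or 'verse'.

    A run of at least min_run consecutive short non-blank lines is treated
    as verse; a single short line is just the end of a prose paragraph.
    """
    labels = ["blank" if not ln.strip() else "prose" for ln in lines]
    i = 0
    while i < len(lines):
        j = i
        # Extend the run while lines are non-blank and short.
        while j < len(lines) and lines[j].strip() and len(lines[j]) < SHORT:
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                labels[k] = "verse"
        i = max(j, i + 1)
    return labels
```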

How can I submit the modified HTML?

Thanks,

-- 
Joaquin Cuenca Abela

From Bowerbird at aol.com  Sun Apr 25 11:45:35 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Sun, 25 Apr 2010 14:45:35 EDT
Subject: [gutvol-d] Re: Removing spurious break lines
Message-ID: <39eb8.7a131d9b.3905e7cf@aol.com>

joaquin-

someone from p.g. should be along to 
answer your questions any minute now...

we just went through a long bruising discussion
about "spurious" linebreaks, which is likely why
they're a bit reluctant to get on that horse and 
ride it again so soon...

they took a _little_ step toward some progress
by accepting a script that would remove those
"spurious" linebreaks from a properly-prepared file.

but the _big_ step that they still need to take
is to make sure that all the files in the library
are "properly-prepared".

the don quixote text was one such file which
is not "properly-prepared", as you discovered.
(if it had been properly-prepared, the verses
would've been indented, and thus you would
have found it extremely easy to identify them.)
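
a minimal python sketch of unwrapping under the convention described here,
where blank lines separate paragraphs and any indented line keeps its own
linebreak.   this is not the actual unwrap.pl, just an illustration of why
the indentation makes the job mechanical; the name unwrap is hypothetical.

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped prose lines; leave indented lines (verse etc.) alone.

    Assumed convention: blank lines separate paragraphs, and any line that
    starts with whitespace must keep its own line break.
    """
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        # Emit the pending prose paragraph as a single line.
        if para:
            out.append(" ".join(para))
            para.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()
            out.append("")
        elif line[0].isspace():
            flush()
            out.append(line)
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)
```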

so you have made them face an issue that they
would rather not face, especially right now...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100425/a7e8277a/attachment-0001.html>

From ajhaines at shaw.ca  Sun Apr 25 12:21:04 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Sun, 25 Apr 2010 12:21:04 -0700
Subject: [gutvol-d] Re: Removing spurious break lines
References: <39eb8.7a131d9b.3905e7cf@aol.com>
Message-ID: <BE7AA0E793894A028D69D68213379AA5@alp2400>

I've contacted Joaquin directly.

Al

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100425/dfbe3e06/attachment.html>

From Bowerbird at aol.com  Mon Apr 26 10:15:38 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 13:15:38 EDT
Subject: [gutvol-d] Re: Removing spurious break lines
Message-ID: <1d6ec.d2028aa.3907243a@aol.com>

al said:
>    I've contacted Joaquin directly.

well, i agree that there's no need to
engulf an innocent newcomer in an
involved discussion about p.g. policy.

but that doesn't mean the discussion
should be swept under the rug, eh?

we'll need an answer to the question:
will p.g. accept e-texts that have been
"corrected" by virtue of having their
not-to-be-unwrapped lines indented?

transparency is the new black.
sunshine is the best disinfectant.
the most colorful fish deserve
the most apparent aquarium.

so... what is the p.g. policy on this?

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100426/2b67a7bb/attachment.html>

From jimad at msn.com  Mon Apr 26 11:00:16 2010
From: jimad at msn.com (James Adcock)
Date: Mon, 26 Apr 2010 11:00:16 -0700
Subject: [gutvol-d] Re: Removing spurious break lines
In-Reply-To: <1d6ec.d2028aa.3907243a@aol.com>
References: <1d6ec.d2028aa.3907243a@aol.com>
Message-ID: <SNT120-DS88E07CCDB0FF731A5B1E3AE040@phx.gbl>

 
In general, if derived formats (ePUB and MOBI from HTML, HTML from txt,
unwrapped txt from wrapped txt) are to work "correctly", then
there needs to be *some* degree of expectation on the formatting of the
incoming texts. Otherwise these tasks cannot be successfully automated. 

 
Going the other way, the automated wrapping of txt has built-in support
in most (all?) modern text tools, including web browsers, e-book readers,
text editors, etc.
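
The asymmetry is easy to see in code: once paragraphs are stored unwrapped,
re-wrapping them to any width is a single standard-library call. A small
Python sketch using textwrap (the rewrap name and the sample sentence are
illustrative only):

```python
import textwrap


def rewrap(paragraphs: list[str], width: int) -> str:
    """Re-wrap already-unwrapped paragraphs to an arbitrary line width."""
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


para = ("In a village of La Mancha, the name of which I have no desire "
        "to call to mind, there lived not long since one of those gentlemen.")
out = rewrap([para], 30)
print(out)
```

The reverse direction has no such one-liner, because the tool must first
know which linebreaks end paragraphs and which were only wrapping.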

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100426/fd4532fd/attachment.html>

From Bowerbird at aol.com  Mon Apr 26 11:28:25 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 14:28:25 EDT
Subject: [gutvol-d] Re: Removing spurious break lines
Message-ID: <2377d.3eb6f509.39073549@aol.com>

jim said:
>    In general, if derived formats including ePUB and MOBI from HTML, 
>    also HTML from txt, also unwrapping txt from wrapped txt, 
>    are to work "correctly" then there needs to be *some* degree of 
>    expectation on the formatting of the incoming texts. Otherwise 
>    these tasks cannot be successfully automated.

that's true.   but i'm not talking just about "derivative formats",
because there's no need to create a "derivative" if you'd rather
just use the .txt file itself to drive the display, a la "eucalyptus".

however, the .txt file does have to be formatted "correctly" if it is
to be _displayed_ correctly.   that's what's driving my motivation...


>    Going the other way, the automated wrapping of txt is has 
>    built-in support by most (all?) modern text tools, including 
>    web browsers, e-book readers, text editors, etc.

stop trying to derail the thread, jim.

there's no way that project gutenberg is going to mount files
that don't have mid-paragraph hard linebreaks...   _no_way_...

so that's not what we're talking about here.

and we aren't _going_ to talk about that here,
no matter how many times you try to bring it up.

so stop trying.

what we _are_ talking about now is formatting the .txt files
_correctly_, so that they can be unwrapped automatically...

-bowerbird
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pglaf.org/mailman/private/gutvol-d/attachments/20100426/e379b6a5/attachment.html>

From donovan at abs.net  Mon Apr 26 11:28:59 2010
From: donovan at abs.net (D Garcia)
Date: Mon, 26 Apr 2010 14:28:59 -0400
Subject: [gutvol-d] DP Archives/OLS (Was: Re: DP output ...)
In-Reply-To: <j2v627d59b81004201235zbd594f0dn81e0752da6053f0a@mail.gmail.com>
References: <y2i1e8e65081004171547t58719fb1ib013175147a203bb@mail.gmail.com>
	<SNT120-DS871E80E198E99851DB7BFAE0A0@phx.gbl>
	<j2v627d59b81004201235zbd594f0dn81e0752da6053f0a@mail.gmail.com>
Message-ID: <201004261428.59204.donovan@abs.net>

Now that hardware replacements and follow up tasks are mostly complete for the 
DP production server, I'm taking a moment to at least partially respond to 
several comments and remarks recently brought up regarding archival materials 
for projects completed at DP.

don kretz wrote on 2010-04-20 at 15:35

>Upload in as complete form as possible the matching image and
>text files so future modification and adaptation is possible. There's
>no loss to DP by doing  so; and the risk is that over time they are
>quite capable of losing track of them.

I find your lack of faith disturbing. :) The DP test server is an Internet 
Archive machine and is backed up within their infrastructure. I also 
personally maintain a remote backup of all archived project files to dedicated 
storage here.

Having said that, some of the earliest DP material (produced on the server in 
charlz's garage) is not archived in the OLS. This gap in the archives 
encompasses 422 known projects. At last word from charlz, he has this 
material, but unfortunately has not yet provided it to be incorporated into 
the archives.

bowerbird wrote on 2010-04-20 at 20:35

>first, it looks like i was wrong when i said
>that d.p. had stopped maintaining the "ols",
>so of course my "reason" for their having
>stopped maintaining it was also incorrect.
>(or one could say it's _no_longer_ correct,
>but i do believe it was correct at one time.)

charlz was the original and sole maintainer of the DP archives. When he ended 
his active participation with DP, the archives were unmaintained until I made 
time to reconstruct the undocumented procedures for moving project files over 
and recording them in the database. Since then, they have been continuously 
maintained and the current process documented for the benefit of future 
caretakers.

don kretz wrote on 2010-04-20 at 22:12

>Available Formats: Display of images from this source has not been permitted.

DP abides by the stated wishes of image sources with respect to public 
redisplay of their images. For sources which do not wish images 
from their efforts redistributed, the files from DP are retained in the 
archives, but are not 'made available' in accordance with these agreements.

Kevin Pulliam wrote on 2010-04-21 at 00:07

>On the Open Library System, I note that high resolution gray-scale
>scans (at least for the one project I checked) are not archived,
...
>I also note that there is no 'bulk' download function
>to get a zip of all the files associated with a text.

The hires scans are archived; however, the OLS code and UI are feature-poor. 
Availability of hires and zip file sets are among the desired features, but 
development of the OLS is not currently a priority item.

David (donovan)

From Bowerbird at aol.com  Mon Apr 26 11:36:14 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 14:36:14 EDT
Subject: [gutvol-d] Re: DP Archives/OLS (Was: Re: DP output ...)
Message-ID: <241c1.457defaf.3907371e@aol.com>


and the takeaway is that p.g. can copy those scans
and mount them any time that it chooses to do so.

-bowerbird

From Bowerbird at aol.com  Mon Apr 26 11:56:20 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 14:56:20 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <25a5e.64d77064.39073bd4@aol.com>

looks like we lost our mad scientist...

or perhaps he's off deep into programming mode,
creating wondrous new tools for project gutenberg.

at any rate, for anyone out there who's interested,
here is my latest .pdf of "gods and fighting men":
>    http://z-m-l.com/misc/14465-take6.pdf

feedback, public or private, would be appreciated...

this version retains p.g. linebreaks, for the most part.
(i did some rewrapping, to remove egregious orphans.)

the use of 10.5-point type made even the longest lines
manageable, given the 4.5-inch measure that i used...
that's obtained by .5-inch margins on a 5.5-inch page.
(if you were printing this at lulu.com, you could specify
a 6x9-inch page, which would allow a bigger fontsize.)
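The measure arithmetic above, worked as a small Python sketch. The 6x9 figure assumes the same 0.5-inch side margins, which the message does not actually specify.

```python
POINTS_PER_INCH = 72  # PostScript/PDF points per inch


def measure_in(page_width_in: float, left_in: float, right_in: float) -> float:
    """Width of the text block (the 'measure') left between the side margins."""
    return page_width_in - left_in - right_in


measure_55 = measure_in(5.5, 0.5, 0.5)   # the 5.5-inch page described above
measure_6x9 = measure_in(6.0, 0.5, 0.5)  # 6x9 page, assuming the same margins
print(measure_55, measure_55 * POINTS_PER_INCH)    # 4.5 324.0
print(measure_6x9, measure_6x9 * POINTS_PER_INCH)  # 5.0 360.0
```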

however, as you'll see, the lines are extremely ragged,
with many short lines, since they were wrapped by the
character-count, not the length of a proportional font.

(furthermore, there was some real weirdness on this,
in that many lines seemed to have been counted short;
in particular, it was as if the algorithm was _trying_ to
create very short lines as the last line of the paragraph.
i don't recall having seen this before; it was _strange_.)

anyway, because many of the lines were counted short,
using full justification on this text would be a disaster.
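A sketch of why wrapping by character count leaves ragged lines in a proportional font: two lines with the same number of characters can differ widely in set width. The per-character widths below are made-up round numbers for illustration, not metrics taken from any real font.

```python
# Hypothetical advance widths in ems; a real tool would read them from the font.
ADVANCE_EM = {"i": 0.28, "l": 0.28, "t": 0.33, "m": 0.83, "w": 0.72, " ": 0.25}
DEFAULT_EM = 0.5  # assumed width for any character not listed above


def set_width_pt(line: str, font_size_pt: float) -> float:
    """Approximate rendered width of a line: the sum of per-character advances."""
    return sum(ADVANCE_EM.get(ch, DEFAULT_EM) for ch in line) * font_size_pt


narrow, wide = "lilt will", "mow women"
print(len(narrow), len(wide))  # 9 9: identical by character count
print(set_width_pt(narrow, 10.5), set_width_pt(wide, 10.5))  # very different widths
```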

but otherwise, this is a _respectable_ job of typesetting.

i'm gonna rewrap this text, using a bigger fontsize, and
i'll mount that .pdf later this week.   enjoy this one now...

-bowerbird

From dakretz at gmail.com  Mon Apr 26 12:22:29 2010
From: dakretz at gmail.com (don kretz)
Date: Mon, 26 Apr 2010 12:22:29 -0700
Subject: [gutvol-d] Re: DP Archives/OLS (Was: Re: DP output ...)
In-Reply-To: <241c1.457defaf.3907371e@aol.com>
References: <241c1.457defaf.3907371e@aol.com>
Message-ID: <i2q627d59b81004261222o6b31a05fo91918f4373c572cf@mail.gmail.com>

Except for the ones that are purportedly in Charlz' garage and/or
they have a policy not to make available. donovan - are they in
fact available somehow from within DP? I don't see how there
can be a problem with using them to reproof old projects.

From mmcdermott at mad-computer-scientist.com  Mon Apr 26 12:26:32 2010
From: mmcdermott at mad-computer-scientist.com (Michael McDermott)
Date: Mon, 26 Apr 2010 14:26:32 -0500
Subject: [gutvol-d] Re: Typesetting (back on track,
	calling the mad scientist)
In-Reply-To: <777f8.52df4abc.39022282@aol.com>
References: <777f8.52df4abc.39022282@aol.com>
Message-ID: <1272027526-sup-6377@zion>

Leslie,

> mike mcd, are you still out there?

Yup--everyone else seemed to be having too much fun with the iPad,
though :)

> here's another take on "gods and fighting men":
> >    http://z-m-l.com/misc/14465-take6.pdf

I took a look at this and at the take5 version. Either would be quite
sufficient for my needs, though, aesthetically, I like the take5 version
better.

> how about the leading?   it's 15-point leading, so
> that's generous for 10-point type, and you might
> feel it's _too_ big, but i thought i'd show it to you.

The leading is perfect.

> so the question is, are these small variations bad?
> noticeable?   too much so?   do they bother you much?

Don't bother me--though, admittedly, if I had an ebook reader I probably
would not be bothering with any of this. 

I also conducted some more experiments with CSS stylesheets on the
html2ps side of things (using txt2html, so that the chain looked like:

txt2html -> html2ps -> ps file -> printer/screen).

-Michael

Excerpts from Bowerbird's message of Thu Apr 22 17:06:58 -0500 2010:
> we seem to have lost the mad scientist.
> 
> mike mcd, are you still out there?
> 
> if so, i have some questions for you...
> 
> here's another take on "gods and fighting men":
> >    http://z-m-l.com/misc/14465-take6.pdf
> 
> this .pdf just has the first page of each chapter,
> but it came outta my program, not a text-editor.
> 
> as we can see, the p.g. linebreaks make this text
> practically unusable, so we'll have to do a rewrap,
> especially if you want to have the text _justfied_...
> (if not, i can just rearrange the unwieldy lines and
> leave the vast majority of p.g. linebreaks in place.)
> 
> going on, is this text-size (10-point) good for you?
> (again, print out some pages so you know for sure.)
> 
> how about the leading?   it's 15-point leading, so
> that's generous for 10-point type, and you might
> feel it's _too_ big, but i thought i'd show it to you.
> 
> on pages 12 and 101, you'll see _blue_ headers...
> those are lines that needed to be _shrunk_ a bit,
> so they would not spill over into the margin area.
> on page 12 it's 15.5-point instead of 16-point,
> and on page 101 it's 13-point instead of 14-point.
> (and that latter one still intrudes on the margins.)
> 
> the program attempts to "copy-fit" all the headers
> to the same size, but i'm experimenting here with
> allowing slight variations in size on freakish lines.
> (rather than letting the freaks dictate that the other
> header-lines be smaller to accommodate the freaks.)
> 
> so the question is, are these small variations bad?
> noticeable?   too much so?   do they bother you much?
> 
> -bowerbird
-- 
Michael McDermott
www.mad-computer-scientist.com
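
The header copy-fitting described in the quoted message (a common size for all headers, with only the freakish lines shrunk) might be sketched as below. bowerbird's program is not shown in the thread, so the half-point step, the floor size, and the sample widths are all assumptions; the floor models the page-101 case, where the shrunk header still intrudes on the margin.

```python
import math


def copy_fit(widths_at_1pt, measure_pt, target_pt, floor_pt, step_pt=0.5):
    """Pick a font size for each header line.

    widths_at_1pt: set width of each header at 1 pt (width scales linearly
    with size). Headers that fit at target_pt keep it; an overflowing line
    alone is shrunk, in step_pt increments, to the largest size that fits,
    but never below floor_pt, so it may still intrude on the margin.
    """
    sizes = []
    for w in widths_at_1pt:
        fits_at = measure_pt / w  # largest size at which this line fits
        stepped = math.floor(fits_at / step_pt) * step_pt
        sizes.append(max(floor_pt, min(target_pt, stepped)))
    return sizes


# Hypothetical header widths on a 324 pt (4.5-inch) measure.
print(copy_fit([10.0, 20.5, 26.0], 324, target_pt=16, floor_pt=13))  # [16, 15.5, 13]
```

The approach being moved away from would instead set every header at the smallest of these sizes.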

From Bowerbird at aol.com  Mon Apr 26 13:39:15 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 16:39:15 EDT
Subject: [gutvol-d] Re: DP Archives/OLS (Was: Re: DP output ...)
Message-ID: <2de3e.7994bbb2.390753f3@aol.com>

dakretz said:
>    Except for the ones that are purportedly in Charlz' garage 

"purportedly"?

tone down the skepticism, man.   save it for when it's needed.


>    and/or they have a policy not to make available.

well, um, yeah...   d.p. made a policy decision to "respect"
the wishes of some institutions _not_ to repost the scans,
even though those "wishes" have no basis in legal rights...

in other words, d.p. traded away your public-domain rights
for the purpose of maintaining "friendly" relations.   oh well.

the good news is that, for the most part, anyone else can
retrieve the scans from the same place where d.p. got 'em.
i don't know if the "o.l.s." tells where the scans came from,
but the "credits" portion of the posted e-book often says so.


>    donovan - are they in fact available somehow from within DP?

it probably depends upon who is asking, and for what purpose.
then again, don, i'm sure you're well aware of _those_ caveats...

-bowerbird

From Bowerbird at aol.com  Mon Apr 26 13:55:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 16:55:40 EDT
Subject: [gutvol-d] Re: Typesetting (back on track,
	calling the mad scientist)
Message-ID: <2f24a.5dfce11.390757cc@aol.com>

mike said:
>    Leslie,

leslie is my girlfriend, not me.

sometimes she checks her e-mail when i'm away from the machine,
so when i come back i end up sending a message from her account.

oh, but you probably got her name from the "author" box of the .pdf,
now that i think about it.   that was filled in by the text-editor i used...
if you check out the later .pdfs, you should find that that metadata is
supplied correctly by my authoring-tool.   (unless i forgot to specify it.)


>    The leading is perfect.

oops...   i took it up considerably in the newer version i just posted...

it has 10.5-point type, with 12-point leading; and still runs 400 pages.
and bigger leading means fewer lines per page, and thus more pages.
which might or might not be a big deal to you.   all of these variables
make it complicated to know how to create a .pdf for somebody else.

which is why a cyberlibrary needs to put .pdf/hard-copy output creation
ability into the hands of its end-users, so they can _customize_ it fully...
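How leading trades against page count can be shown with a little arithmetic. A sketch, assuming an 8.5-inch page height with 0.5-inch top and bottom margins (the figures in the stylesheet Michael posts later in the thread; the page height of bowerbird's .pdf is not stated) and ignoring running heads and paragraph spacing:

```python
POINTS_PER_INCH = 72


def lines_per_page(page_height_in, top_in, bottom_in, leading_pt):
    """Whole lines of body text that fit in the text block at a given leading."""
    text_height_pt = (page_height_in - top_in - bottom_in) * POINTS_PER_INCH
    return int(text_height_pt // leading_pt)


for leading in (12, 15):
    print(leading, lines_per_page(8.5, 0.5, 0.5, leading))
# 12 45
# 15 36
```

Under those assumptions, 15-point leading gives 36 lines per page against 45 at 12-point, so the same text needs about 25% more pages.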


>    if I had an ebook reader I probably
>    would not be bothering with any of this.

that's why i make many of the decisions according to a smart default.


>    I also conducted some more experiments with CSS stylesheets on
>    the html2ps side of things (using txt2html, so that the chain looked
>    like:
>    txt2html -> html2ps -> ps file -> printer/screen).

i'd love to see a .pdf representing your output from that...

-bowerbird

From jimad at msn.com  Mon Apr 26 14:04:31 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 26 Apr 2010 14:04:31 -0700
Subject: [gutvol-d] Re: Removing spurious break lines
In-Reply-To: <2377d.3eb6f509.39073549@aol.com>
References: <2377d.3eb6f509.39073549@aol.com>
Message-ID: <SNT120-DS31A3E0CB867B8B1D0009AAE040@phx.gbl>


>what we _are_ talking about now is formatting the .txt files _correctly_, so that they can be unwrapped automatically...

In any case PG already "owns" a txt unwrapper, since PG is in some cases generating HTML from pgtxt70, and that requires unwrapping the text (txt?), which is being done, as one can tell by opening just about any PG HTML that was autogenerated from the submitted pgtxt70 file format.  The text not correctly unwrapped in this case was a submitted HTML file that had its linebreaks forced -- which is not usual PG convention (to the extent PG *has* an HTML convention.)

Perhaps you should start by examining what PG has *already* implemented for txt unwrapping to generated HTML, find out what works and what doesn't work, and what requirements this puts on txt submission in order to make it all work right?  Otherwise PG will end up with two conflicting text unwrapping standards, which will make the submitter's task even more confusing.

If PG can successfully implement the *hard* task of unwrapping text, one would think PG could also support the *easy* task of wrapping submissions to the pgtxt70 standard. Implementing both directions to form a round-trip might even give PG a heads-up where its assumptions -- or failure of the submission to follow style guidelines -- is "breaking" the wrapping or unwrapping effort.
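The round trip suggested above can be sketched in a few lines of Python. This is not PG's existing unwrapper; it assumes the indentation convention discussed in this thread (any paragraph with indented lines keeps its breaks) and a 70-column width, which the name pgtxt70 suggests but which is an assumption here.

```python
import textwrap

INDENT = (" ", "\t")


def unwrap(text: str) -> list[str]:
    """Split on blank lines; join each flush-left paragraph into one line.

    Any paragraph containing an indented line is kept verbatim, following
    the convention that indentation marks line breaks that must survive.
    """
    paragraphs = []
    for para in text.split("\n\n"):
        lines = para.split("\n")
        if any(line.startswith(INDENT) for line in lines):
            paragraphs.append(para)
        else:
            paragraphs.append(" ".join(line.strip() for line in lines))
    return paragraphs


def rewrap(paragraphs: list[str], width: int = 70) -> str:
    """Inverse of unwrap: refill joined paragraphs, pass preserved ones through."""
    out = []
    for para in paragraphs:
        if "\n" in para or para.startswith(INDENT):
            out.append(para)
        else:
            out.append(textwrap.fill(para, width=width))
    return "\n\n".join(out)


prose = ("It was the best of times, it was the worst of times, "
         "it was the age of wisdom, it was the age of foolishness.")
verse = "    Tyger Tyger, burning bright,\n    In the forests of the night;"
doc = textwrap.fill(prose, width=70) + "\n\n" + verse
print(rewrap(unwrap(doc)) == doc)  # True
```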

To the extent that you guys are heading more-and-more towards the "unobtrusive" marking up of txt files, please note that Python has already got very good efforts in this regard called "reStructured Text" -- and the tools existing to support it!  Not to imply that PG would have to follow their lead literally; for example, they use *emphasis* for italics and **strong emphasis** for bold. Rather, you could just "borrow" their tools.

http://docs.python.org/documenting/rest.html

http://docutils.sourceforge.net/rst.html

and online tools that work for trying it out:

http://www.tele3.cz/jbar/rest/rest.html

Now I don't like the formatting of the Python manuals -- but that is a separate choice from the markup language, and the tools they have created for making the manuals from lightweight "unobtrusive" markup.
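As a toy illustration of the two inline conventions mentioned above, here is a regex converter. It is not docutils and ignores most of reStructuredText's inline-markup recognition rules; the linked docutils tools are what would actually be borrowed.

```python
import re

# **strong** must be handled before *emphasis*, or its asterisks would be
# consumed as two separate emphasis markers.
STRONG = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")
EMPHASIS = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*")


def inline_to_html(text: str) -> str:
    """Toy converter for two reStructuredText inline markups (not docutils)."""
    text = STRONG.sub(r"<b>\1</b>", text)
    return EMPHASIS.sub(r"<i>\1</i>", text)


print(inline_to_html("*emphasis* for italics and **strong emphasis** for bold"))
# <i>emphasis</i> for italics and <b>strong emphasis</b> for bold
print(inline_to_html("2 * 3 * 4"))  # unchanged: a * followed by a space opens nothing
```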


From ajhaines at shaw.ca  Mon Apr 26 14:12:40 2010
From: ajhaines at shaw.ca (Al Haines (shaw))
Date: Mon, 26 Apr 2010 14:12:40 -0700
Subject: [gutvol-d] Re: Removing spurious break lines
References: <1d6ec.d2028aa.3907243a@aol.com>
Message-ID: <4E5F6EAA7200402CAAEC93506E30B6B1@alp2400>

The question was:

will p.g. accept e-texts that have been "corrected" by virtue of having 
their not-to-be-unwrapped lines indented?

The short answer is yes.  "Corrected" texts can be sent to PG's errata system 
(errata2010_AT_pglaf.org) as attachments.

However, it seems to me that simply indenting not-to-be-formatted lines, and 
doing nothing else, is at least somewhat pointless.  Many of PG's older 
texts have other problems than that (missing illustrations, ASCII only 
rather than Latin1 or UTF8, missing/incomplete indexes, etc, etc, etc.)

A far more desirable approach would be to pick an old PG text, find a 
scanset in IA/Google/wherever, get a copyright clearance, and do a new 
version of it, either from scratch, or a complete re-proof of the existing 
file(s), doing whatever is needed to bring it up to current standards.

Two examples:

"Main Travelled Roads", by Hamlin Garland (PG#2809).  In response to a 
recent errata report, I repaired several missing paragraphs.  While doing 
that, I found that the text also has hundreds, maybe thousands, of hyphens 
that should be em-dashes (--), far too many for my limited time to deal with 
(see note below).

"Arizona Sketches" (PG#756).  It's missing all its illustrations, and the 
first "n" in "canon" is a plain "n", not n-tilde.  Investigation may reveal 
other problems.


Note: Complaints that the Repost team (mostly myself and David Widger, 
between us doing considerable clean-up work on several thousand of PG's old 
files) "should have done more" will fall on deaf ears.  We're only two 
people, we're also 2/3 of the Whitewashers, and 2/3 of the Errata team, and 
we both produce independently.  Back off.

Al


----- Original Message ----- 
From: Bowerbird at aol.com
To: gutvol-d at lists.pglaf.org ; bowerbird at aol.com
Sent: Monday, April 26, 2010 10:15 AM
Subject: [gutvol-d] Re: Removing spurious break lines


al said:
>   I've contacted Joaquin directly.

well, i agree that there's no need to
engulf an innocent newcomer in an
involved discussion about p.g. policy.

but that doesn't mean the discussion
should be swept under the rug, eh?

we'll need an answer to the question:
will p.g. accept e-texts that have been
"corrected" by virtue of having their
not-to-be-unwrapped lines indented?

transparency is the new black.
sunshine is the best disinfectant.
the most colorful fish deserve
the most apparent aquarium.

so... what is the p.g. policy on this?

-bowerbird


From jimad at msn.com  Mon Apr 26 14:49:05 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 26 Apr 2010 14:49:05 -0700
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
In-Reply-To: <25a5e.64d77064.39073bd4@aol.com>
References: <25a5e.64d77064.39073bd4@aol.com>
Message-ID: <SNT120-DS17E8DB82E3DE1B764A917CAE040@phx.gbl>

Comparing to a recent pub of Studs Terkel which I happened to have at hand,
the page size is almost identical -- 1/2 sheet US Letter.  The Studs Terkel
however has 60 chars per line compared to 70 chars per line in your example
PDF -- and as compared to 50 chars per line or less for historical novels.
Less chars per line tend to make things more readable while taking more
paper.  Too many chars per line make things very painful to read -- which is
why magazine format or newspaper format is broken up into two or more
columns.

http://desktoppub.about.com/cs/finetypography/ht/line_length.htm


From Bowerbird at aol.com  Mon Apr 26 14:53:49 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 17:53:49 EDT
Subject: [gutvol-d] Re: Removing spurious break lines
Message-ID: <334c2.40a4c33d.3907656d@aol.com>

al said:
>    However, it seems to me that simply indenting 
>    not-to-be-formatted lines, and doing nothing else, 
>    is at least somewhat pointless.

it's not "pointless" to make _some_ "corrections", but not all, to a text.

that's what whitewashers do quite often, when _you_ "correct" a text...


>    Many of PG's older texts have other problems than that 
>    (missing illustrations, ASCII only rather than Latin1 or UTF8, 
>    missing/incomplete indexes, etc, etc, etc.)

that's right.   but you whitewashers don't fix all of those problems.

you apply the corrections that've been submitted, and you run the
text through the new version of your tools, and you might also do
some other checks (and thanks for doing that), and then you post it.


>   A far more desirable approach would be to 
>    pick an old PG text, find a scanset in IA/Google/wherever, 
>    get a copyright clearance, and do a new version of it, either 
>    from scratch, or a complete re-proof of the existing file(s), 
>    doing whatever is needed to bring it up to current standards.

i fully agree that that would be "more desirable".   it would also be
a heck of a lot more work.   and that's the trade-off, is it not?   still,
someone who does _some_ of the "corrections" -- whether that is to
make the text robust to rewrapping or some other subset of stuff --
is not engaging in a "pointless" exercise.   they're improving the text,
and -- just as i thank you whitewashers for improving the text when
you do _your_ "corrections" -- i would thank any other person who
improved the text when they do _their_ "corrections".   i fully approve
of an iterative process that steadily cumulates "partial corrections"...


>   Note: Complaints that the Repost team (mostly myself and 
>    David Widger, between us doing considerable clean-up work 
>    on several thousand of PG's old files) "should have done more" 
>    will fall on deaf ears.  We're only two people, we're also 2/3 
>    of the Whitewashers, and 2/3 of the Errata team, and we both 
>    produce independently.  Back off.

i have never "complained" about the "corrections" by whitewashers.

i _have_ pointed out that these "corrections" are _not_ complete,
in the sense that many errors and inconsistencies and omissions
_survive_ this "correction" process.   but again, i do not condemn
any "corrections" because they are not complete.   i welcome and
appreciate _all_ "corrections", even incomplete ones, because they
move the text closer to _perfection_, and that's what i advocate...

i wasn't "complaining" when i reported that your "corrections" are incomplete.

i felt the need to point that out because you _do_not_ point it out.

you say that "errors were corrected" and you simply leave it at that.

i believe that many people probably conclude, from your statement,
that you've made a good-faith effort to actually _find_ all the errors
-- such as by comparing the text with a newly-obtained scan-set --
when, in point of fact, you have not actually gone to those lengths...

nobody is "blaming" you, or "criticizing" you for doing what you do,
or for not doing what you're not doing.   so there's no need to tell us
to "back off".   that's insulting, and you really shouldn't be so sensitive.

we're just stating the facts.   _clearly_.   because you haven't done that.

so, can we all agree that "partial corrections" are _not_ "pointless"?
because that would be a huge step in the right direction, yes it would.

-bowerbird

From Bowerbird at aol.com  Mon Apr 26 15:05:40 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 18:05:40 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <34193.37b281e1.39076834@aol.com>

jim said:
>    The Studs Terkel however has 60 chars per line 
>    compared to 70 chars per line in your example PDF -- 
>    and as compared to 50 chars per line or less for historical novels.

jim, you need to pay better attention.


>    Less chars per line tend to make things more readable 
>    while taking more paper.  Too many chars per line 
>    make things very painful to read
...
>    http://desktoppub.about.com/cs/finetypography/ht/line_length.htm

yes, jim, i know all about line-length and readability...

(and if you do a citation, quote _bringhurst_, not about.com)

***

those who _were_ paying attention know that i said explicitly
i retained the p.g. linebreaks, accounting for the long lines...

it's also the case that michael mcd seemed to accept the
fontsize and the margins and the pagesize, which means
that his eyes didn't particularly mind the long lines, and since
the .pdf was intended for him, he's the ultimate judge here.

even more so, people who were paying attention also know
that i said i would be rewrapping the text and doing a .pdf
with a bigger fontsize.   since the pagesize and the margins
will stay the same, that means shorter lines _by_definition_,
bringing them to the 50-65 characters bringhurst suggests.

in other words, jim, your post was completely unnecessary.
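The argument above rests on a simple relation: with page size and margins fixed, the measure is fixed, so characters per line fall roughly in inverse proportion to font size (the relation does not hold once the measure changes too). A sketch; the 0.45-em average character width is an assumed figure, and real typefaces differ noticeably.

```python
def chars_per_line(measure_pt: float, font_size_pt: float,
                   avg_char_em: float = 0.45) -> float:
    """Rough characters per line at a fixed measure.

    avg_char_em is an assumed average character width (letters and spaces)
    as a fraction of the font size; it varies between typefaces.
    """
    return measure_pt / (font_size_pt * avg_char_em)


MEASURE_PT = 4.5 * 72  # 324 pt: 0.5-inch margins on a 5.5-inch page
for size in (10.5, 12, 13):
    print(size, round(chars_per_line(MEASURE_PT, size)))
# 10.5 69
# 12 60
# 13 55
```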

-bowerbird

From jimad at msn.com  Mon Apr 26 15:19:05 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 26 Apr 2010 15:19:05 -0700
Subject: [gutvol-d] Re: Removing spurious break lines
In-Reply-To: <334c2.40a4c33d.3907656d@aol.com>
References: <334c2.40a4c33d.3907656d@aol.com>
Message-ID: <SNT120-DS143D0294FF66B5182A822FAE040@phx.gbl>

Someone new to PG, having gotten a new Brand X ebook reader, hears about PG
having "Free Books," goes to the website, and downloads a file in some format.
Either it "works", though maybe it has a few errors in it that many people
never notice.

Or that person chooses some file format from PG, opens it in their ebook
reader, and the results are totally scrambled and unreadable.  And that
person then says "Holy Cow, what's wrong with PG???" and goes away never to
return.

Which do you prefer?


From jimad at msn.com  Mon Apr 26 15:24:33 2010
From: jimad at msn.com (Jim Adcock)
Date: Mon, 26 Apr 2010 15:24:33 -0700
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
In-Reply-To: <34193.37b281e1.39076834@aol.com>
References: <34193.37b281e1.39076834@aol.com>
Message-ID: <SNT120-DS1325CDBFD3EEDCD352F8EAAE040@phx.gbl>

I was paying attention and the insults you send my way are not necessary.

Again, you ask for feedback but you are not willing to accept any
graciously.


From Bowerbird at aol.com  Mon Apr 26 16:04:31 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 19:04:31 EDT
Subject: [gutvol-d] Re: Removing spurious break lines
Message-ID: <37e1b.78fc9644.390775ff@aol.com>

jim said:
>    Perhaps you should start by examining what PG has *already* 
>    implemented for txt unwrapping to generated HTML, 
>    find out what works and what doesn't work, and 
>    what requirements this puts on txt submission 
>    in order to make it all work right?

the code that does unwrapping right now is marcello's.

i don't know if he has updated it since i looked at it, but
when i did, it was just what i'd expect from a technocrat:
he made the problem much more difficult than it is, and
consequently his code is overwrought, _and_ it backfires.

(for instance, he was using a rhyming dictionary to try to
determine if a set of lines constituted a poem; good luck.)

all in all, once you approach the problem intelligently,
it's not that difficult to unwrap most p.g. files correctly,
even the ones which have not been formatted correctly,
because you can detect the lines that should be indented.
i could run a script that auto-fixes most p.g. e-texts, with
few introduced errors; too bad p.g. doesn't work that way;
the whitewashers insist on fixing the books one-at-a-time.
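A minimal sketch of the kind of detection described above. It is not bowerbird's script, which is not shown; the threshold (a non-final line shorter than 60% of a 70-column width) is an arbitrary assumption. A heuristic like this misses deliberate breaks on lines that happen to be long and can flag prose broken early before a very long word, so its output would still need review.

```python
def keeps_breaks(paragraph: str, width: int = 70, ratio: float = 0.6) -> bool:
    """Heuristic: True if some line other than the last is much shorter than
    the wrap width, suggesting deliberate breaks (verse, letters, tables)
    rather than lines filled out to `width` columns."""
    body = paragraph.split("\n")[:-1]  # a paragraph's last line is normally short
    return any(len(line) < ratio * width for line in body)


def auto_indent(text: str, indent: str = "    ", **kwargs) -> str:
    """Indent flush-left paragraphs that look deliberately broken, so an
    unwrapper following the indentation convention leaves them alone."""
    out = []
    for para in text.split("\n\n"):
        if keeps_breaks(para, **kwargs) and not para.startswith((" ", "\t")):
            para = "\n".join(indent + line for line in para.split("\n"))
        out.append(para)
    return "\n\n".join(out)
```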


>    Otherwise PG will end up with 
>    two conflicting text unwrapping standards, 
>    which will make the submitter's task 
>    even more confusing.

marcello had to code his unwrapper precisely because
p.g. doesn't enforce its existing policy on text indents,
or have the foresight to expand it to cover other cases.

his code won't scale.   and the indentation policy _will_...
(and it'll replace his kludge code with something simple.)

so there's no issue with "two conflicting standards" here.


>   To the extent that you guys are heading more-and-more 
>    towards the "unobtrusive" marking up of txt files, please 
>    note that Python has already got very good efforts in this regard 
>    called "reStructured Text" -- and the tools existing to support it!

you're a few years behind the threads here, jim...

"restructured text" is a light-markup format, just like z.m.l.

the main difference is z.m.l. is geared directly toward p.g.,
whereas restructured text has a provenance that's muddled,
so if you were gonna choose between the two, choose z.m.l.
(tools for any light markup system are _not_ hard to build.)

but hey, if you can get p.g. to go for restructured text, do it!

-bowerbird

From Bowerbird at aol.com  Mon Apr 26 16:10:14 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Mon, 26 Apr 2010 19:10:14 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <383fb.6aac4bd9.39077756@aol.com>

jim said:
>    I was paying attention

then why didn't you understand the situation?
why did you say something that made no sense?


>    Again, you ask for feedback 
>    but you are not willing to accept any graciously.

ok, jim, let me make things perfectly clear.

i do not value _your_ feedback.   so when i ask for feedback,
i am most specifically _not_ asking for feedback from _you_.

definitely not you.

are we clear?

-bowerbird

From mmcdermott at mad-computer-scientist.com  Mon Apr 26 20:42:49 2010
From: mmcdermott at mad-computer-scientist.com (Michael McDermott)
Date: Mon, 26 Apr 2010 22:42:49 -0500
Subject: [gutvol-d] Re: Typesetting (back on track,
	calling the mad scientist)
In-Reply-To: <2f24a.5dfce11.390757cc@aol.com>
References: <2f24a.5dfce11.390757cc@aol.com>
Message-ID: <1272316031-sup-6696@zion>

> sometimes she checks her e-mail when i'm away from the machine,
> so when i come back i end up sending a message from her account.

> oh, but you probably got her name from the "author" box of the .pdf,
> now that i think about it.   that was filled in by the text-editor i 

"Leslie" is also the registrant of z-m-l.com according to whois. :)

> i'd love to see a .pdf representing your output from that...

That can be arranged:

http://www.mad-computer-scientist.com/files/14465-8.pdf

Fairly straightforward. The commands:

    txt2html 14465-8.txt | html2ps -D > 14465-8.ps
    ps2pdf 14465-8.ps

Some notes:

* Inconsistent recognition of minor sections (i.e., sections within the
  intro).
* Double dashes are not converted to em-dashes.
* Images are, of course, not put in. This can be improved by using PG
  HTML, where applicable.
* No TOC. This can be done by cutting the TOC from the original document
  and putting it in a link file for txt2html, then telling html2ps to
  convert links to page references.
* No way that I see to handle footnotes automatically. Some custom CSS
  should handle it, if the docs don't lie.
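For the double-dash note, one workaround is to convert the dashes before the text reaches txt2html. A minimal sketch; it assumes only exact pairs of hyphens should become em dashes and that the rest of the pipeline handles UTF-8, which this thread does not establish.

```python
import re

# Exactly two hyphens; longer runs such as "----" (often rules or elided
# words in PG texts) are deliberately left alone.
DOUBLE_DASH = re.compile(r"(?<!-)--(?!-)")


def em_dashes(text: str) -> str:
    """Replace PG-style double hyphens with U+2014 EM DASH."""
    return DOUBLE_DASH.sub("\u2014", text)


print(em_dashes("He paused--then went on.") == "He paused\u2014then went on.")  # True
```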

PS. The CSS:

    @html2ps 
    {
        option 
        { 
            hyphenate: 1;
            number: 1;
        }
    }

    @page 
    { 
        size: 5.5in 8.5in;
        margin-top: 0.5in;
        margin-bottom: 0.5in;
        margin-left: 0.5in;
        margin-right: 0.5in;
    }

    p { text-indent: 1.5em; }

-- 
Michael McDermott
www.mad-computer-scientist.com

From Bowerbird at aol.com  Tue Apr 27 00:22:16 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 Apr 2010 03:22:16 EDT
Subject: [gutvol-d] Re: Typesetting (back on track,
	calling the mad scientist)
Message-ID: <4bc33.33941a9b.3907eaa8@aol.com>

michael mcd said:
>    That can be arranged:
>    http://www.mad-computer-scientist.com/files/14465-8.pdf

that's serviceable output.

the book designers won't be giving it any awards.

but if it meets your needs, that's pretty much all that counts.

as the object was a hard-copy printout, i didn't even talk about
things like a hotlinked table of contents or footnote presentation,
but i _can_ deal with them if you want to discuss the pdf qua pdf.

just from an ink-on-paper perspective, though, if you were to
critique any of the output, exactly what points would you make?

-bowerbird

From jimad at msn.com  Tue Apr 27 08:21:44 2010
From: jimad at msn.com (James Adcock)
Date: Tue, 27 Apr 2010 08:21:44 -0700
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
In-Reply-To: <383fb.6aac4bd9.39077756@aol.com>
References: <383fb.6aac4bd9.39077756@aol.com>
Message-ID: <SNT120-DS197DB303D96402EDD1A71CAE030@phx.gbl>

> then why didn't you understand the situation?
> why did you say something that made no sense?

I believe I did say something that makes sense, it's just that you still do
not understand that your problem was not font size but rather line length.
You also do not apparently understand that it is not generally true that
font size and line length are inversely related.

> i do not value _your_ feedback.  so when i ask for feedback,
> i am most specifically _not_ asking for feedback from _you_.

So you retract your previous statements when you said that I should have
been reporting to you when your tools fail?


From Bowerbird at aol.com  Tue Apr 27 11:40:06 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 Apr 2010 14:40:06 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <39f63.1062790c.39088986@aol.com>

jim...   you just utterly and completely fail to grok
the listserve imperative to move threads forward.

***

jim said:
>   I believe I did say something that makes sense,

then you have a serious intellectual problem too.


>    it's just that you still do not understand that
>    your problem was not font size but rather line length.

this is now the third time i have said it, twice now
directly to you, and you still don't seem to "get it".

i retained the p.g. linebreaks.

that means p.g. decided the line-length, not me.

if you don't understand the exact meaning of that,
continue pondering it until you _do_ understand it.

because you won't be able to keep up with the thread,
let alone advance it, until you've understood that fact.

i retained the p.g. linebreaks.

that's why each line was exactly as long as it was.
because that's how long it was in the p.g. e-text...

(except for lines i rewrapped to get rid of orphans.)

are the lines in p.g. files too long, in general?   _yes._

does that have any bearing on our experimentation?
not really.   because we're just trying things out here.
nothing is cast in stone.   and perhaps, for this book,
for michael's eyes in particular, the p.g. lines are fine,
even though -- for you, or me, or somebody else --
they might be too long.   our opinion doesn't matter,
because michael is printing out this book for himself.
that's a beauty of print-on-demand -- customization.

besides, we're going to do _more_ experiments later...


>    You also do not apparently understand that 
>    it is not generally true that font size 
>    and line length are inversely related.

again, you need to put in some more thought...

if we talk about a pre-determined line of characters
-- like in this case, where we retain p.g. linebreaks --
it is _absolutely_true_ that the space required will be
directly related to the fontsize.   _it's_absolutely_true_.

the character-count of the line will remain unchanged
-- by definition, as it was determined by linebreaks --
but the width of the line printed on a page depends on
fontsize.   the bigger the size, the more space required.

and if we also constrain the size of the space in which
we are putting that pre-determined line of characters,
we have put an upper-limit on the fontsize we can use.

in this case, we are using a space that's 4.5 inches wide,
so the biggest fontsize i could use that kept all the lines
reasonably within the width of the space was 10.5-point.
a smaller fontsize wouldn't have filled up all of the page,
plus it would've been less readable, so i used 10.5-point.

in other words, all of the other factors were constrained,
and fontsize was left to vary, and had to make it all fit...
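The constraint described here can be sketched as a rough model in Python. When the character count of each line is fixed by the retained linebreaks, the printed width grows linearly with font size, so the longest line caps the size that fits the column. The average glyph width of 0.5 em (spaces included) is an assumption for illustration only; real fonts vary, and a real layout tool measures the actual glyphs. The function names are hypothetical.

```python
# Rough model: printed width = characters x average glyph width (ems) x font size.
# avg_char_em is an ASSUMED average width per character; treat results as estimates.


def line_width_pt(chars, font_size_pt, avg_char_em=0.5):
    """Estimated printed width, in points, of a line of `chars` characters."""
    return chars * avg_char_em * font_size_pt


def max_font_size(chars, width_pt, avg_char_em=0.5, step=0.5):
    """Largest size, in `step`-point increments, at which the longest line still fits."""
    size = 0.0
    while line_width_pt(chars, size + step, avg_char_em) <= width_pt:
        size += step
    return size


if __name__ == "__main__":
    column = 4.5 * 72  # a 4.5 in column, in points
    for longest in (60, 66, 72):
        print(longest, "chars ->", max_font_size(longest, column), "pt max")
```

With 60-character lines and the assumed 0.5 em average, the largest half-point size that fits 4.5 in is 10.5 pt; 72-character lines would force 9 pt. That is the sense in which size and line length trade off once the characters per line are fixed; these figures are illustrative, not measurements of the p.g. text.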

for my next .pdf, and i've said _this_ three times now too,
twice directly to you, so it really should've sunk in by now,
i will unwrap the text (i.e., free it from the p.g. linebreaks)
and jack the fontsize to 12-point, so michael can see that.

and then we'll do some more experimentation after _that_.


>   So you retract your previous statements 
>    when you said that I should have been 
>    reporting to you when your tools fail?

just exactly how useful do you think an "it doesn't work"
report is, anyway?   you never gave one worthwhile report.

so no, jim, i don't want any reports from you, none at all...
are we clear now?   or do i have to repeat that a third time?

-bowerbird

From Bowerbird at aol.com  Tue Apr 27 12:48:26 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Tue, 27 Apr 2010 15:48:26 EDT
Subject: [gutvol-d] Re: Typesetting ("gods and fighting men")
Message-ID: <3f3c4.5b33dc94.3908998a@aol.com>

ok, back to work...

***

have you ever tried to copy text out of a .pdf?

if you have, you know that it can be frustrating.

because a lot of information seems to get lost.

perhaps the most noticeable are "empty" lines.

if you used a blank line between paragraphs,
all of those blank lines are lost, which means
that your paragraphs are now all run together.

compounding that problem is that the "soft"
linebreak at the end of mid-paragraph lines is 
turned into a "hard" linebreak.   it's a disaster...

you _can_ choose the "export as plain text" item
from one of the menus, and that does retain the
empty lines.   unfortunately, it also strips styling.

the copy-text route retains the styling, or at least
_some_ of it.   but not all of it.   italics are often lost.
so is the indentation on block-quotes, poetry, etc.

so no matter what you do, getting text from a .pdf
is a struggle.   that's one reason why .pdf is called
"the roach motel of documents", because text can
go in, but it cannot come out again.

you can demonstrate this to yourself by using
the .pdf that michael mcd created...

***

i do some things with my .pdf tool to solve this
little problem.   for instance, it doesn't output a
"blank" line when it encounters one.   instead, it
outputs a double-colon -- "::" -- that is white,
thus _invisible._   (or i'll often make it light gray.)

but it's still there, and gets copied out when you
copy the text, so you can then do a global change
of "::" to nil, and voila, you have your blank lines.
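The global change described above fits in a few lines of Python. This sketch assumes the copied text arrives with each marker alone on its own line, which may not hold for every .pdf viewer; the function name is made up for illustration. Matching whole lines, instead of blindly replacing every "::", is a choice made here so that a "::" inside ordinary text survives.

```python
def restore_blank_lines(copied: str) -> str:
    """Undo the invisible '::' trick on text copied out of a .pdf.

    A line holding nothing but the marker (plus optional whitespace)
    becomes an empty line again; every other line is left untouched.
    """
    return "\n".join("" if line.strip() == "::" else line
                     for line in copied.split("\n"))


if __name__ == "__main__":
    pasted = "first paragraph ends here.\n::\nsecond paragraph starts here."
    print(restore_blank_lines(pasted))
```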

in z.m.l., italics are represented by _underbars_,
so i also have my program output the underbars,
again turning them white so they'll be invisible...

i haven't worked on this for a while, so i cannot
remember what state of success it's in right now,
but my goal is to create "round-tripping", so that
when you use z.m.l. to create a .pdf, the text you
copy out of that .pdf, after a few global changes,
can be used to create that exact same .pdf again.

go ahead and copy the text out of one of my .pdfs,
and you'll get a good idea what i'm talking about...

***

the tricks that are built into my tool are ones that
you can do "manually" in your own wordprocessor,
if you'd like to create a "round-trip" capability too.

surround your italicized stuff with white underbars,
change your blank lines to a white double-colon,
and use white periods to create your indentation.
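Reversing the indentation trick is another global change. The Python sketch below assumes one white period stands for one space of indentation (the message does not say how many are used per level), and the function name is hypothetical. One caveat: a line that genuinely opens with an ellipsis would also be turned into indentation.

```python
import re

# One or more periods at the very start of a line, ASSUMED to be the white
# (invisible) periods used for indentation, one period per space.
LEADING_PERIODS = re.compile(r"^\.+")


def restore_indentation(copied: str) -> str:
    """Turn each run of leading periods back into the same number of spaces."""
    def fix(line: str) -> str:
        match = LEADING_PERIODS.match(line)
        if match is None:
            return line
        return " " * len(match.group()) + line[match.end():]

    return "\n".join(fix(line) for line in copied.split("\n"))
```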

i did that in the next two .pdfs i will talk about, so
you can copy the text out of 'em to see this at work.

***

i used a text-editor to create two more .pdfs for us
in our experiments using "gods and fighting men".

i unwrapped the text, freeing it from p.g. linebreaks.
then i made the fontsize a more-readable 12-point.
i also put back in a more-spacious 14-point leading.
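In its simplest form, unwrapping means joining the lines of each blank-line-separated paragraph into one line. Here is a minimal Python sketch under that assumption. It is not bowerbird's tool, whose rules aren't described here, and it does nothing to protect poetry, tables, or other passages whose linebreaks matter.

```python
def unwrap(text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into a single line.

    Paragraphs are taken to be separated by one or more blank lines.
    Line-sensitive passages (poetry, tables) are NOT protected here.
    """
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```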

all these changes pushed the .pdf to some 567 pages,
from the previous 391, so that's an offsetting negative,
but the positive aspect is a much more readable .pdf...
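To see how much of that growth the leading alone could explain, here is a rough Python sketch. It assumes a 7.5 in text height, which is what the 5.5 x 8.5 in page and 0.5 in margins in the html2ps stylesheet earlier in the thread would give; bowerbird's own page geometry may differ. Under that assumption, going from 12-point to 14-point leading drops the page from 45 lines to 38, roughly 18% more pages before the larger type and unwrapped lines are counted.

```python
import math


def lines_per_page(text_height_pt: float, leading_pt: float) -> int:
    """Whole lines that fit in the text block at a given (baseline-to-baseline) leading."""
    return math.floor(text_height_pt / leading_pt)


if __name__ == "__main__":
    height = 7.5 * 72  # ASSUMED 7.5 in text block: 8.5 in page minus 0.5 in margins
    for leading in (12, 14):
        print(f"{leading}-pt leading: {lines_per_page(height, leading)} lines per page")
```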

i created a ragged-right version, and a justified one:
>    http://z-m-l.com/misc/14465-rewrapped-rag.pdf
>    http://z-m-l.com/misc/14465-rewrapped-just.pdf

these two are exactly the same, except for justification,
so you might find it odd that the first is just 1.5 megs,
while the second is almost twice as big, at 2.9 megs...

the reason for this discrepancy is the ragged-right .pdf
stores the location of each line, rendering it right there,
while the justified .pdf has to store the location of each
_word_, to print it in the right place.   it's a big difference.
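The size difference described here can be illustrated with a Python sketch that writes PDF text operators for one line both ways: positioned once as a whole (ragged-right) versus one positioned object per word (justified, as described above). The font name /F1, the 12-point size, and the 0.5 em average glyph width are assumptions, and the function names are hypothetical. Note that PDF also has a word-spacing operator, Tw, which some generators use to justify a line without positioning every word, so not every tool pays this cost.

```python
# Illustrative only: per-word positioning costs more bytes than per-line.
# /F1, size 12, and avg_char_em=0.5 are ASSUMPTIONS, not measured glyph widths.


def pdf_string(s: str) -> str:
    """Encode s as a PDF literal string, escaping backslashes and parentheses."""
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def ragged_line(line, x, y, size=12):
    """One text object per line: position once, show the whole line."""
    return f"BT /F1 {size} Tf {x} {y} Td {pdf_string(line)} Tj ET\n"


def justified_line(line, x, y, width, size=12, avg_char_em=0.5):
    """One text object per word, with the leftover width spread between words."""
    words = line.split()
    natural = [len(w) * avg_char_em * size for w in words]
    gaps = len(words) - 1
    gap = (width - sum(natural)) / gaps if gaps else 0.0
    ops, pos = [], float(x)
    for word, w_width in zip(words, natural):
        ops.append(f"BT /F1 {size} Tf {pos:.2f} {y} Td {pdf_string(word)} Tj ET\n")
        pos += w_width + gap
    return "".join(ops)


if __name__ == "__main__":
    text = "the reason for this discrepancy is the ragged-right pdf"
    rag = ragged_line(text, 36, 700)
    just = justified_line(text, 36, 700, width=324)
    print(len(rag), "bytes ragged vs", len(just), "bytes justified")
```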

at any rate, maybe the mad scientist will look at these
and advise us on what pointsize he'd like to see "final",
what leading he wants, and if he prefers justification...

-bowerbird

From Bowerbird at aol.com  Fri Apr 30 01:24:46 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 Apr 2010 04:24:46 EDT
Subject: [gutvol-d] spill baby spill
Message-ID: <79295.5d9ffc0.390bedce@aol.com>


the worst slick in human history.

so let's all give our thanks to the republicans and 
the oil corporations who pull their puppet strings.

-bowerbird

From dakretz at gmail.com  Fri Apr 30 09:27:13 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 30 Apr 2010 09:27:13 -0700
Subject: [gutvol-d] Re: spill baby spill
In-Reply-To: <79295.5d9ffc0.390bedce@aol.com>
References: <79295.5d9ffc0.390bedce@aol.com>
Message-ID: <l2o627d59b81004300927jf49cb0eh3298c1eff992f6d6@mail.gmail.com>

s/republicans/politicians/

On Fri, Apr 30, 2010 at 1:24 AM, <Bowerbird at aol.com> wrote:

>
> the worst slick in human history.
>
> so let's all give our thanks to the republicans and
> the oil corporations who pull their puppet strings.
>
> -bowerbird

From Bowerbird at aol.com  Fri Apr 30 11:25:36 2010
From: Bowerbird at aol.com (Bowerbird at aol.com)
Date: Fri, 30 Apr 2010 14:25:36 EDT
Subject: [gutvol-d] Re: spill baby spill
Message-ID: <b12f8.1139821.390c7aa0@aol.com>

dakretz said:
>    s/republicans/politicians/

all of the politicians are rotten, it's true.

but the republicans are more rotten.   way more rotten.

besides, look at the subject-header...

-bowerbird

p.s.   this means if any of you consider yourself to be
a republican, you'd better take a good look at yourself.
not that democrats couldn't stand a look in the mirror.
your whole system is now corrupt, and _you_ made it...

From dakretz at gmail.com  Fri Apr 30 11:42:53 2010
From: dakretz at gmail.com (don kretz)
Date: Fri, 30 Apr 2010 11:42:53 -0700
Subject: [gutvol-d] Re: spill baby spill
In-Reply-To: <b12f8.1139821.390c7aa0@aol.com>
References: <b12f8.1139821.390c7aa0@aol.com>
Message-ID: <i2i627d59b81004301142o1d03a847x9a79c0c54adda29b@mail.gmail.com>

OK, then

s/republicans/politicians currently in power/


On Fri, Apr 30, 2010 at 11:25 AM, <Bowerbird at aol.com> wrote:

> dakretz said:
> >   s/republicans/politicians/
>
> all of the politicians are rotten, it's true.
>
> but the republicans are more rotten.  way more rotten.
>
> besides, look at the subject-header...
>
> -bowerbird
>
> p.s.  this means if any of you consider yourself to be
> a republican, you'd better take a good look at yourself.
> not that democrats couldn't stand a look in the mirror.
> your whole system is now corrupt, and _you_ made it...

From lee at novomail.net  Fri Apr 30 12:53:21 2010
From: lee at novomail.net (Lee Passey)
Date: Fri, 30 Apr 2010 13:53:21 -0600
Subject: [gutvol-d] Re: Cooperative proofreading
In-Reply-To: <4BD0BA56.5020806@novomail.net>
References: <4BD0BA56.5020806@novomail.net>
Message-ID: <4BDB3531.1040307@novomail.net>

On 4/22/2010 3:06 PM, Lee Passey wrote:

> Just an update on my cooperative proofreading site
> (http://www.ebookcooperative.com/).

[snip]

> The next step will be to create a servlet that will allow downloading an
> entire e-book by combining all the pages from the repository into a
> single file.

Now completed.
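The servlet itself isn't shown. As an illustration only, in Python rather than the servlet's Java, here is a sketch of the core step of combining pages into one file. It assumes, hypothetically, that the repository can be read as a mapping from page names such as "page-012.txt" to page text; the site's actual storage and page naming aren't described. Sorting on the page number, not on the raw name, keeps "page-10" after "page-2".

```python
import re


def page_number(name: str) -> int:
    """First run of digits in a page name, e.g. 'page-012.txt' -> 12 (hypothetical naming)."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else -1


def combine_pages(repository: dict[str, str]) -> str:
    """Concatenate every page's text, in numeric page order, into one e-book string."""
    ordered = sorted(repository, key=page_number)
    return "".join(repository[name].rstrip("\n") + "\n" for name in ordered)
```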

It is unlikely that I will make any further bulk changes to the source 
documents associated with any projects; feel free to make any 
corrections you feel appropriate.

As I populate the database with more projects, I would like to focus on 
the PG "Frankentexts," that is to say, those texts in Project Gutenberg 
which appear to be stitched together from multiple, unspecified sources. 
The classic example of this is, of course, the PG edition of 
_Frankenstein_. Are there other instances I should be aware of?

> As always, feedback is welcomed.