Re: CRLF fun stuff again...


Subject: Re: CRLF fun stuff again...
From: Donald Lee (donlee_nat@icompute.com)
Date: Sat Feb 10 2001 - 02:07:09 EST


At 5:38 PM -0600 2/9/01, Duncan Sinclair wrote:
>>When you've got to remember to add a line to AppleVolumes.system just to
>>keep a file with a new suffix from being corrupted, that's just bad.
>
>Doesn't cr/lf need to be explicitly turned on in the options
>for the share? But maybe the default AppleVolumes.system wants
>to be changed too. My AppleVolumes.system file has commentary on
>this issue that doesn't seem to be in the current version:
>
> # default translation -- note that CR <-> LF translation is done on all
> # files of type TEXT. The first line turns off translation for files of
> # unknown type, the second turns this translation on.
> # . BINA UNIX
> # . TEXT UNIX
>
>The AppleVolumes.system really should be split into two different
>files - one with the actual volume info, and a second one for
>file types. I can't remember - can these settings be changed on
>a per-share basis? (Maybe they should even be settable on a
>per-directory basis, a la Apache's .htaccess file.)
>
>I'll repeat again my position: if netatalk is going to have this
>feature, and there's evidence that people want it, then it should
>work properly. People who don't want this feature should be
>unaffected by it.

The problem with this feature is threefold. First, it is not clear to
me that a reliable algorithm can be devised. Second, the conversion is
not reversible. Third, the conversion (i.e. corruption) is absolutely
silent.

The first problem is really much more serious and fundamental than it
appears: there is no way to know, reliably, that a file is of type TEXT.

This is because the file type is not intrinsic to the data.

Good examples of this are the applications (ref: archives) that
write out the data *first* and *then* change the type to "text".

The converse also happens. There are utilities that use TEXT
as the file type for files that are not text at all. (Most of these, I
admit, are programmer/geek tools.)

One _could_ come up with ways to deal with this, but I'm not at all
convinced that they will be "correct". For instance, do you *really* want
to make netatalk rip through a given file and change all the returns
to linefeeds any time the file type changes to text? Do you want to
do this even though the conversion is *not* reversible?

If a file type changes from text to something else, do you want to
rip through and change all the linefeeds to returns?

If the cr/lf conversion were symmetric, it would not bother me
so much, but a file that is incorrectly "converted" is utter
trash. There is no way to take a binary file that has been "converted"
and figure out which of the linefeeds used to be returns.
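
To make the irreversibility concrete, here is a minimal Python sketch. It
assumes the translation is a plain byte-for-byte swap of CR (0x0D) for LF
(0x0A), which may not match what netatalk actually does in every detail.
The function names are hypothetical and exist only for illustration.

```python
def cr_to_lf(data: bytes) -> bytes:
    """Assumed Mac-to-Unix translation: every CR (0x0D) becomes LF (0x0A)."""
    return data.replace(b"\r", b"\n")


def lf_to_cr(data: bytes) -> bytes:
    """Assumed Unix-to-Mac translation: every LF (0x0A) becomes CR (0x0D)."""
    return data.replace(b"\n", b"\r")


# A "binary" file that happens to contain both a CR byte and an LF byte.
original = bytes([0x00, 0x0D, 0x0A, 0xFF])

stored = cr_to_lf(original)    # what lands on the Unix disk
restored = lf_to_cr(stored)    # what the Mac reads back

# A different file that collapses to the same bytes on disk.
other = bytes([0x00, 0x0A, 0x0D, 0xFF])

print(restored == original)        # False: the round trip changed the file
print(cr_to_lf(other) == stored)   # True: two distinct inputs, one output
```

Because two different inputs produce the same stored bytes, no inverse
function can tell them apart afterwards.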

The third problem is serious when you consider that the potential corruption
from the first two is fatal to your data, and there is no way for netatalk
to know (or report) that you just lost your data. One could put
"safety" checks in the conversion code. For instance, you could check
whether a file being converted has any high bits set (a sure sign that it's
not 7-bit ASCII). OK, but then what do you *do*? Would netatalk produce an I/O
error at this point? Would it simply stop converting? There is no
clean way to deal with this.
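
A rough Python sketch of such a check, under the assumption that it runs
over the whole file contents before any translation. The function names
and the "skip conversion" policy are hypothetical, picked only to show one
of the options above. They are not netatalk behaviour.

```python
from typing import Tuple


def looks_like_7bit_ascii(data: bytes) -> bool:
    """True if no byte has its high bit (0x80) set."""
    return all(b < 0x80 for b in data)


def convert_if_safe(data: bytes) -> Tuple[bytes, bool]:
    """Translate CR to LF only when the data passes the 7-bit check.

    Returns (bytes_to_store, converted). Skipping is just one possible
    policy; raising an I/O error would be another.
    """
    if not looks_like_7bit_ascii(data):
        return data, False
    return data.replace(b"\r", b"\n"), True


plain_text = b"line one\rline two\r"
has_high_bit = b"caf\x8e\r"            # contains a byte >= 0x80
seven_bit_binary = b"\x01\x0d\x0a\x02"  # not text, but passes the check

print(convert_if_safe(plain_text))
print(convert_if_safe(has_high_bit))
print(convert_if_safe(seven_bit_binary))
```

Note that the check errs in both directions: text containing any
non-ASCII character is refused, while binary data that happens to stay
below 0x80 is still converted, and the caller is never told either way.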

The upshot of all this is that the feature may or may not be useful,
but it is **DANGEROUS** if not fully understood and very carefully used,
and I don't think it is possible to make it safe and reliable.

-dgl-


