Re: CRLF fun stuff again...


Subject: Re: CRLF fun stuff again...
From: Duncan Sinclair (sinclair@dis.strath.ac.uk)
Date: Sat Feb 10 2001 - 05:18:52 EST


Hi Folks,

I've received a number of emails about all this stuff. This is the
only one I think really _needs_ a reply...

Donald Lee writes:
>At 5:38 PM -0600 2/9/01, Duncan Sinclair wrote:
>>I'll repeat again my position: if netatalk is going to have this
>>feature, and there's evidence that people want it, then it should
>>work properly. People who don't want this feature should be
>>unaffected by it.
>
>The problem with this feature is threefold. For one, it is not clear to
>me that an algorithm can be devised that can be reliable. Two, the
>conversion is not reversible, and three, the conversion (i.e. corruption)
>is absolutely silent.

Let's address these points then...

>The first problem is
>really much more serious and fundamental than it appears. That is
>that there is really no way to know, reliably, that a file is of type TEXT.

You know a file is text by two indicators:
  1) Its file type is TEXT.
  2) It has an extension which tells netatalk it is text.

These two indicators cover files created by Macs and files created
by Unix, respectively.

With a Mac-created file you can change the extension as much as you
want and it will not change how netatalk interprets the file. Only
if it is marked as "TEXT" will the cr/lf transformation be used.

With a Unix-created file, netatalk doesn't know whether it is text or
not, so it uses the extension to decide whether to do the transform.
Once a Unix file is edited on a Mac, it becomes a Mac file.
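
To make the two indicators concrete, here is a minimal sketch of the
decision as I've described it. It is not netatalk's actual code: the
function name, the mac_type parameter and the extension list are all
made up for illustration.

```python
import os

# Hypothetical extension list, for illustration only; the real
# mapping would come from netatalk's configuration.
TEXT_EXTENSIONS = {".txt", ".c", ".h"}


def should_swap(filename, mac_type=None):
    """Decide whether the cr/lf transformation applies to a file.

    mac_type is the four-character Mac file type, or None when the
    file has no Mac metadata (i.e. it was created on the Unix side).
    """
    if mac_type is not None:
        # Mac-created file: only the file type counts, and the
        # extension is ignored entirely.
        return mac_type == "TEXT"
    # Unix-created file: fall back to the extension.
    ext = os.path.splitext(filename)[1].lower()
    return ext in TEXT_EXTENSIONS
```

So a Mac file of type TEXT named "notes.gif" is transformed, and a Mac
file of some other type named "readme.txt" is not.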

>This is because the file type is not intrinsic to the data.

True.

>A good example of this is some applications (ref: archives) that
>write out the data *first* and *then* change the type to "text".

My suggested enhancement to netatalk would catch this and do the conversion
on the fly.

>The converse also happens. There are utilities that use TEXT
>as their file types that are not TEXT at all. (most of these, I
>admit, are programmer/geek tools)

This should not matter. The transformation is reversible and so
is transparent to the Mac application.

>One _could_ come up with ways to deal with this, but I'm not at all
>convinced that they will be "correct". For instance, do you *really* want
>to make netatalk rip through a given file and change all the returns
>to linefeeds any time the file type changes to text?

Yes. And all the linefeeds to returns.

>Do you want to
>do this even though the conversion is *not* reversible?

But the transformation is reversible. It works as follows:

  CR -> LF
  LF -> CR

You do the transformation twice and you get back an identical file.
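
A small sketch of why this holds, working on raw bytes. The function
name swap_cr_lf is my own for illustration; it is not something taken
from the netatalk source.

```python
# Translation table that exchanges CR (0x0D) and LF (0x0A) and leaves
# every other byte value untouched.
_SWAP = bytes.maketrans(b"\r\n", b"\n\r")


def swap_cr_lf(data: bytes) -> bytes:
    """Swap every CR with LF and every LF with CR."""
    return data.translate(_SWAP)


# Applying the swap twice restores the input, even for binary data
# containing every possible byte value.
sample = bytes(range(256))
assert swap_cr_lf(swap_cr_lf(sample)) == sample
```

Because both bytes are exchanged, rather than one being folded into the
other, nothing is lost and the operation is its own inverse.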

>If a file type changes from text to something else, do you want to
>rip through and change all the linefeeds to returns?

Yes. Change all the linefeeds to returns, and all the returns to
linefeeds.
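
In rough terms, the handling of a type change could look like the
sketch below. This is only an illustration of the idea, under the
assumption that the stored data gets re-swapped whenever a file moves
into or out of type TEXT; on_type_change is a hypothetical name, not an
existing netatalk function.

```python
# Same CR <-> LF exchange table as in the transformation itself.
_SWAP = bytes.maketrans(b"\r\n", b"\n\r")


def on_type_change(stored: bytes, old_type: str, new_type: str) -> bytes:
    """Return the data to keep on disk after a Mac file type change."""
    was_text = old_type == "TEXT"
    is_text = new_type == "TEXT"
    if was_text != is_text:
        # The file entered or left TEXT: swap the stored bytes so the
        # Mac side keeps seeing exactly the bytes it wrote.
        return stored.translate(_SWAP)
    # TEXT status did not change, so the stored data stays as it is.
    return stored
```

A change from TEXT to something else and back again runs the swap
twice, which leaves the stored data where it started.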

>If the cr/lf conversion were symmetric, it would not bother me
>so much, but a file that is incorrectly "converted" is utter
>trash. There is no way to take a binary file that has been "converted"
>and figure out which of the linefeeds used to be returns.

It's a transformation, not a conversion, and it is symmetric.

>The third problem is serious when you consider that the potential corruption
>of the first two are fatal to your data, and there is no way for netatalk
>to know (or report) that you just lost your data. One could put in
>"safety" checks in the conversion code. For instance, you could check
>if a file being converted has any high-bits turned on (a sure sign that it's
>not 7-bit ASCII). OK, what do you *do*??? Would netatalk produce an I/O
>error at this point? Would it simply stop converting? There is no
>clean way to deal with this.

If it's handled right you never get corrupted files. You don't (and,
yes, shouldn't want to) go checking to see if the file "looks" like
text.

>The upshot of all this is that the feature may or may not be useful,
>but it is **DANGEROUS** if not fully understood and very carefully used,
>and I don't think it is possible to make it safe and reliable.

Well, I think it can be fixed to make it safe and reliable.

Cheers,

Duncan.


