Friday, June 13, 2014

Filename Sanitization

I came across the Zaru project, which provides filename sanitization for Ruby. You can learn a bit about filenames by reading the article on Wikipedia. I thought it would be nice to implement something like this in Factor.

The rules for sanitization are relatively simple, so I will list and then implement each one:

1. Certain characters generally don't mix well with certain file systems, so we filter them:

: filter-special ( str -- str' )
    [ "/\\?*:|\"<>" member? not ] filter ;

2. ASCII control characters (0x00 to 0x1f) are not usually a good idea, either:

: filter-control ( str -- str' )
    [ control? not ] filter ;

3. Unicode whitespace is trimmed from the beginning and end of the filename and collapsed to a single space within the filename:

: filter-blanks ( str -- str' )
    [ blank? ] split-when harvest " " join ;
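
To see how that one-liner works, here is a small sketch that runs the three stages one at a time in the listener. It assumes a recent Factor release where blank? lives in the unicode vocabulary (older releases kept it in unicode.categories), and the sample string is made up for illustration.

```factor
USING: prettyprint sequences splitting unicode ;

! Stage 1: split at every whitespace character. Leading, trailing,
! and repeated blanks leave empty pieces behind.
"  hello   world " [ blank? ] split-when .

! Stage 2: harvest drops the empty pieces, which is what trims the
! ends and collapses the runs.
"  hello   world " [ blank? ] split-when harvest .

! Stage 3: join the remaining pieces with a single space.
"  hello   world " [ blank? ] split-when harvest " " join .    ! "hello world"
```

Splitting and rejoining handles trimming and collapsing in one pass, so no separate trim step is needed.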

4. Certain filenames are reserved on Windows and are filtered (substituting a "file" placeholder name):

: filter-windows-reserved ( str -- str' )
    dup >upper {
        "CON" "PRN" "AUX" "NUL" "COM1" "COM2" "COM3" "COM4"
        "COM5" "COM6" "COM7" "COM8" "COM9" "LPT1" "LPT2" "LPT3"
        "LPT4" "LPT5" "LPT6" "LPT7" "LPT8" "LPT9"
    } member? [ drop "file" ] when ;

5. Empty filenames are not allowed, so they are replaced with file:

: filter-empty ( str -- str' )
    [ "file" ] when-empty ;

6. Filenames that begin with a "dot" character have file prepended to them:

: filter-dots ( str -- str' )
    dup first CHAR: . = [ "file" prepend ] when ;

Putting it all together, and requiring the filename to be no more than 255 characters:

: sanitize-path ( path -- path' )
    filter-special
    filter-control
    filter-blanks
    filter-windows-reserved
    filter-empty
    filter-dots
    255 short head ;
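
The order of the steps matters. The sketch below repeats three of the definitions so it can be pasted into the listener on its own. filter-dots calls first, which fails on an empty string, so filter-empty has to run before it. Likewise, filter-blanks has to run before filter-empty so that a whitespace-only name is caught. The USING: line is an assumption about a recent Factor release; older releases split the Unicode words across unicode.categories and unicode.case.

```factor
USING: kernel prettyprint sequences splitting unicode ;
IN: scratchpad

! Same definitions as above, repeated so the snippet stands alone.
: filter-blanks ( str -- str' )
    [ blank? ] split-when harvest " " join ;

: filter-empty ( str -- str' )
    [ "file" ] when-empty ;

: filter-dots ( str -- str' )
    dup first CHAR: . = [ "file" prepend ] when ;

! A whitespace-only name collapses to "" and then gets the placeholder.
"   " filter-blanks filter-empty filter-dots .    ! "file"

! A leading dot gets the placeholder prepended.
".bashrc" filter-blanks filter-empty filter-dots .    ! "file.bashrc"
```

The full sanitize-path word additionally needs only the vocabularies already listed here, since member?, filter, short, and head all come from sequences.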

The code for this (and some tests) is on my GitHub.
