grokking hfs+ character encoding 19. Mar 2011
Linux and (most?) other Unix-like operating systems use the so called normalization form C (NFC) for its UTF-8 encoding by default but do not enforce this. Darwin, the base of the Macintosh OS enforces normalization form D (NFD), where a few characters are encoded in a different way. On OS X it’s not possible to create NFC UTF-8 filenames because this is prevented at filesystem layer. On HFS+ filenames are internally stored in UTF-16 and when converted back to UTF-8, for the underlying BSD system to be handable, NFD is created. See here for details. I think it was a very bad idea and breaks many things under OS X which expect a normal POSIX conforming system. Anywhere else convmv is able to convert files from NFC to NFD or vice versa which makes interoperability with such systems a lot easier. (Source: convmv)
If you print the German umlaut ä, the composed form is used: the single code point U+00E4, encoded in UTF-8 as the two bytes c3 a4.
$ printf ä | hexdump
0000000 c3 a4
0000002
If you create a file named ä, the decomposed form is used instead: the letter a (61) followed by the combining diaeresis U+0308 (cc 88).
$ touch ä
$ ls | tr -d '\n' | hexdump
0000000 61 cc 88
0000003
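The two encodings can also be reproduced away from an HFS+ volume. The following is only a minimal sketch, assuming a POSIX shell with od, tr and sed available; hex_bytes is a hypothetical helper name, and the octal escapes are used so the result does not depend on how the terminal encodes a typed ä.

```shell
# Print the bytes read from stdin as space-separated lowercase hex.
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Composed form (NFC): U+00E4 as a single precomposed code point.
nfc=$(printf '\303\244' | hex_bytes)

# Decomposed form (NFD): U+0061 "a" followed by U+0308 combining diaeresis.
nfd=$(printf 'a\314\210' | hex_bytes)

echo "NFC: $nfc"
echo "NFD: $nfd"
```

Both strings render as ä, but they differ byte for byte, which is why a filename typed on Linux may not compare equal to the same name listed from HFS+.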
You can convert the decomposed form back into the composed form with iconv, using utf-8-mac as the source encoding.
$ ls | iconv -f utf-8-mac -t utf-8 | tr -d '\n' | hexdump
0000000 c3 a4
0000002
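The utf-8-mac encoding is specific to the iconv shipped with OS X, so the command above may fail elsewhere. As a rough alternative, here is a sketch that assumes python3 is on the PATH and uses its standard unicodedata module; to_nfc is an illustrative name, not an existing tool.

```shell
# Normalize UTF-8 text on stdin to NFC using Python's unicodedata module.
# Assumption: python3 is installed; to_nfc is a hypothetical helper.
to_nfc() {
    python3 -c '
import sys, unicodedata
data = sys.stdin.buffer.read().decode("utf-8")
sys.stdout.buffer.write(unicodedata.normalize("NFC", data).encode("utf-8"))
'
}

# Feed in the decomposed bytes (61 cc 88) and dump the result as hex.
composed=$(printf 'a\314\210' | to_nfc | od -An -tx1 | tr -d ' \n')
echo "$composed"
```

Reading and writing through the binary buffers keeps the result independent of the locale the shell happens to run in.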