Shell Null Termination / Separation

by Slatian

Null (or zero) termination or separation means using a null byte to separate records instead of the newline usually used in shell pipelines.

Why

Usually separating things by newlines is good enough and easy to debug, but even a little shell pipeline like ls | sort can blow up when it encounters unexpected data, such as a filename that contains a newline character.

Fun fact: most filesystems actually allow all kinds of characters in filenames, from emoji to non-printable and control characters, including newlines. One could even put ASCII art in a filename (I've already done that)!

Null separation saves you here, because the null byte is the only character guaranteed never to occur inside a file path. This becomes even more important with other sources of (untrusted) input.

The improved version would be ls --zero | sort -z, with | tr '\0' '\n' appended to convert back to newlines for a human who still wants the illusion of line separation.
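To see the problem for yourself, here is a minimal sketch that creates two files, one of them with a newline in its name, and counts how many records each style of output produces. It uses find rather than ls because find -print0 is more widely available than ls --zero. The helper names count_lines and count_nulls are made up for this example.

```shell
#!/bin/sh
# count_lines / count_nulls are hypothetical helpers for this demo:
# they count the newline and NUL separators on stdin.
count_lines() { tr -cd '\n' | wc -c; }
count_nulls() { tr -cd '\0' | wc -c; }

dir=$(mktemp -d)
# Two files, one of which has a newline in its name
touch "$dir/plain" "$dir/$(printf 'evil\nname')"

# Newline-separated: the second name is split in two, so this reports 3
find "$dir" -type f | count_lines

# Null-separated: exactly one record per file, so this reports 2
find "$dir" -type f -print0 | count_nulls

rm -r "$dir"
```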

Examples of inputs you shouldn't trust:

Translating between worlds

To translate between the newline-separated and the null-separated world, there is a little utility called tr.

null to newline: tr '\0' '\n'
newline to null: tr '\n' '\0'
swap newline and null: tr '\n\0' '\0\n'
remove potential nulls: tr -d '\0'
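A short sketch of these conversions on made-up sample records. The od -c calls are only there to make the otherwise invisible null bytes visible.

```shell
#!/bin/sh
# Null-separated records (as find -print0 or sort -z would emit them) -> lines
printf 'one\0two\0three\0' | tr '\0' '\n'

# Lines -> null-separated records; od -c shows each NUL as \0
printf 'a\nb\n' | tr '\n' '\0' | od -c

# Swap in a single pass: the newline inside "x<newline>y" becomes a NUL
# and the trailing NUL becomes a newline
printf 'x\ny\0' | tr '\n\0' '\0\n' | od -c
```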

Make sure to actually remove any characters you assume are not in your input!
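As a sketch of that advice, the hypothetical function below turns an untrusted newline-separated list into null-separated records. It deletes any stray null bytes first, so they cannot create extra record boundaries.

```shell
#!/bin/sh
# lines_to_nul is a hypothetical helper name, not a standard tool.
# It deletes any NUL bytes already in the input, then turns every
# newline into a NUL.
lines_to_nul() {
	tr -d '\0' | tr '\n' '\0'
}

# "b<NUL>c" would otherwise be split into the two records "b" and "c";
# after cleaning it stays one record, "bc".
printf 'a\nb\0c\n' | lines_to_nul | xargs -0 printf '[%s]\n'
# prints [a] and [bc] on separate lines
```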

Getting a program into null mode

Unfortunately, there is no one way to make a program use nulls instead of newlines, so here is a hopefully useful table. Please note that some programs only offer the option in their GNU version and not in, e.g., the BusyBox version.

If a program has some kind of printf option, you can use it to make the output null-separated.
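Here is a sketch of two such cases. The shell's printf reuses its format for every argument, so '%s\0' turns any list of words (here glob matches) into null-terminated records. GNU find's -printf understands \0 in its format string. print0_args is a hypothetical helper name, and the file names are arbitrary.

```shell
#!/bin/sh
# print0_args (hypothetical helper): emit each argument followed by a NUL.
# printf repeats the format until all arguments are used up; note that with
# no arguments at all it still prints one empty record.
print0_args() {
	printf '%s\0' "$@"
}

dir=$(mktemp -d)
touch "$dir/one" "$dir/two words"

# Glob matches as null-separated records, converted back for display
print0_args "$dir"/* | tr '\0' '\n'

# GNU find only: %f is the file name without its leading directories
find "$dir" -type f -printf '%f\0' | sort -z | tr '\0' '\n'

rm -r "$dir"
```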

Command      Version            Input flag(s)  Output flag(s)
awk          gnu, other modern  -v 'RS=\0'
cut          gnu                -z             -z
find                                           -print0
fzf                             --read0        --print0
grep -r      gnu                               -Z
grep         gnu                -z             -z
inotifywait                                    --format '…%0' --no-newline
ls           gnu                               --zero
rg                                             -0
sed          gnu                -z             -z
sort         gnu, busybox       -z             -z
tar -T       gnu                --null
uniq         gnu, busybox       -z             -z
xargs        gnu, busybox       -0
read                            -r -d ""
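Here is a sketch of how several rows combine. find writes null-separated paths, sort -z keeps them that way, and xargs -0 hands them to a small sh -c loop as separate arguments. list_files is a hypothetical helper name.

```shell
#!/bin/sh
# list_files: hypothetical helper name, not a standard tool.
# Prints the base name of every regular file below "$1", sorted, as [name].
list_files() {
	find "$1" -type f -print0 | sort -z |
		xargs -0 sh -c 'for f; do printf "[%s]\n" "${f##*/}"; done' sh
}

dir=$(mktemp -d)
touch "$dir/b" "$dir/a  spaced" "$dir/$(printf 'new\nline')"
list_files "$dir"
rm -r "$dir"
```

In bash, the read row of the table does the same job without xargs: find . -print0 | while IFS= read -r -d "" f; do printf '%s\n' "$f"; done. The IFS= keeps leading and trailing whitespace in names. Note that read -d is not part of POSIX sh.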

Note: The best way to automatically find out whether a program supports an option is to run grep -q over the output of its --help. Just make sure to choose a specific enough regex to avoid false positives. Actually trying the option is also possible, but that is usually a bit more complex.
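Here is a minimal sketch of that check. supports_option is a hypothetical helper name. The pattern '--zero-terminated' assumes GNU's long option spelling, so other implementations may describe -z differently. Only point this at commands you know handle --help harmlessly.

```shell
#!/bin/sh
# supports_option (hypothetical helper): succeed if the --help output of
# command "$1" matches the grep pattern "$2". stderr is merged in case a
# tool prints its help there; -e keeps a leading "--" from being parsed
# as a grep option.
supports_option() {
	"$1" --help 2>&1 | grep -q -e "$2"
}

if supports_option sort '--zero-terminated'; then
	echo "sort can use -z"
else
	echo "sort: sticking to newlines"
fi
```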