Shell Null Termination / Separation

by Slatian

Null (or zero) termination or separation means using a null byte to separate records instead of the newline that shell pipelines usually use.

Why

Usually separating records by newlines is good enough and easy to debug, but even a small shell pipeline like ls | sort can blow up when it encounters unexpected data, such as a filename that contains a newline character.

Fun fact: Most filesystems actually allow all kinds of characters in filenames, from emoji to non-printable and control characters, including newlines. One could even put ASCII art in a filename (I've already done that)!

A null byte saves you there, as it is the only character guaranteed never to occur inside a file path, and this becomes even more important with other sources of (untrusted) input.

The improved version would be ls --zero | sort -z, followed by | tr '\0' '\n' to convert back to newlines for the human who still wants the illusion of line separation.
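To see the problem for yourself, here is a minimal sketch that counts the entries of a throwaway directory both ways. It assumes GNU findutils and coreutils, and uses find -print0 in place of ls --zero because the latter only exists in newer GNU coreutils releases; demo_dir and the file names are made up for the demo.

```bash
#!/usr/bin/env bash
# Sketch: a filename containing a newline breaks newline-based counting.
# Assumes GNU find/sort; find -print0 stands in for ls --zero here.
set -eu

demo_dir=$(mktemp -d)
# One ordinary name and one name with a literal newline in it.
touch "$demo_dir/plain" "$demo_dir/two
lines"

# Newline-separated: every newline counts as a record boundary.
newline_records=$(find "$demo_dir" -mindepth 1 -print | sort | wc -l)

# Null-separated: sort -z keeps each name intact, then count the null bytes.
null_records=$(find "$demo_dir" -mindepth 1 -print0 | sort -z | tr -cd '\0' | wc -c)

rm -rf -- "$demo_dir"
echo "newline-separated records: $newline_records"  # 3: the odd name was split in two
echo "null-separated records: $null_records"         # 2: one record per file
```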

Examples of inputs you shouldn't trust:

Translating between Worlds

To translate between the newline-separated and the null-separated world there is a little utility called tr.

null to newline
tr '\0' '\n'
newline to null
tr '\n' '\0'
swap newline and null
tr '\n\0' '\0\n'
remove potential nulls
tr -d '\0'

Make sure to actually remove the characters you assume are not in your input!
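A small sketch of the tr conversions above, wrapped in shell functions. The function names to_null, to_newline and strip_null are hypothetical helpers for this example, not existing commands.

```bash
#!/usr/bin/env bash
# Sketch: the tr conversions as tiny (hypothetical) helper functions.
set -eu

to_null()    { tr '\n' '\0'; }   # newline-separated -> null-separated
to_newline() { tr '\0' '\n'; }   # null-separated -> newline-separated
strip_null() { tr -d '\0'; }     # drop stray null bytes

# Count records in null-separated data by counting its null bytes.
null_records=$(printf 'alpha\nbeta\n' | to_null | tr -cd '\0' | wc -c)

# There and back again restores the text
# (command substitution strips the final newline).
roundtrip=$(printf 'alpha\nbeta\n' | to_null | to_newline)

# A null byte sneaking into newline-separated input gets removed.
cleaned=$(printf 'al\0pha\n' | strip_null)

printf '%s\n' "records: $null_records" "$roundtrip" "$cleaned"
```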

Getting a Program into Null-Separation Mode

Unfortunately there is no single way to make a program use nulls instead of newlines, so here is a hopefully useful table. Please note that some programs only have the option in the GNU version, but not in, e.g., the busybox version.

If a program has some kind of printf-style format option, you can use it to make the output null-separated.
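As a sketch of the printf route: the shell's own printf repeats its format for every argument, and GNU find has a -printf action that understands the \0 escape. Both outputs are piped through tr '\0' '|' here only to make the record boundaries visible; the directory and file names are made up.

```bash
#!/usr/bin/env bash
# Sketch: null-separated output via printf-style formats.
# The find part assumes GNU find (-printf with the \0 escape).
set -eu

# printf reuses '%s\0' for each argument: one null-terminated record per value.
shell_records=$(printf '%s\0' 'first value' 'second value' | tr '\0' '|')

demo_dir=$(mktemp -d)
touch "$demo_dir/one" "$demo_dir/two"
# %f prints the bare file name, \0 terminates each record.
find_records=$(find "$demo_dir" -mindepth 1 -printf '%f\0' | sort -z | tr '\0' '|')
rm -rf -- "$demo_dir"

echo "$shell_records"  # first value|second value|
echo "$find_records"   # one|two|
```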

Command      | Version            | Input flag(s)         | Output flag(s)
awk          | GNU, other modern  | -v 'RS=\0'            |
cut          | GNU                | -z                    | -z
find         |                    |                       | -print0
fzf          |                    | --read0               | --print0
grep -r      | GNU                |                       | -Z
grep         | GNU                | -z                    | -z
inotifywait  |                    |                       | --format '…%0' --no-newline
ls           | GNU                |                       | --zero
rg           |                    |                       | -0
sed          | GNU                | -z                    | -z
sort         | GNU, busybox       | -z                    | -z
tar -T       | GNU                | --null                |
uniq         | GNU, busybox       | -z                    | -z
xargs        | GNU, busybox       | -0                    |
read         | bash               | IFS="" read -r -d ''  |

Note: The best way to automatically find out whether a program supports an option is to run grep -q on the output of its --help. Just make sure to choose a specific enough regex to avoid false positives. Trying the option out also works, but is usually a bit more complex.
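One way to do that check, as a sketch: supports_option is a hypothetical helper name, and the pattern assumes the GNU wording of sort --help, so adjust the regex for other implementations.

```bash
#!/usr/bin/env bash
# Sketch: detect an option by searching a program's --help text.
# supports_option is a hypothetical helper, not an existing command.

# Succeeds if the --help output of $1 matches the regex in $2.
supports_option() {
  local cmd=$1 pattern=$2
  # Some programs print their help on stderr, so search both streams.
  "$cmd" --help 2>&1 | grep -q -e "$pattern"
}

# Match the long option name to avoid false positives on a bare "-z".
if supports_option sort '--zero-terminated'; then
  echo "sort supports null-separated records"
else
  echo "sort: no --zero-terminated found in --help"
fi
```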

Note: read is a shell builtin, which is why you don't have to export the IFS (internal field separator) variable. IFS also affects other shell features that split words, like for loops.
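A minimal sketch of reading null-separated records in bash. It relies on read -d '', which bash supports but not every /bin/sh does; the sample records are made up to show that spaces, newlines and backslashes survive.

```bash
#!/usr/bin/env bash
# Sketch: reading null-separated records with bash's read -d ''.
set -eu

count=0
# IFS= keeps leading and trailing whitespace, -r keeps backslashes literal,
# and -d '' makes read stop at a null byte instead of a newline.
while IFS= read -r -d '' record; do
  count=$((count + 1))
  printf 'record %d: [%s]\n' "$count" "$record"
done < <(printf '%s\0' '  leading spaces' 'two
lines' 'back\slash')

# Process substitution keeps the loop in the current shell, so count survives.
echo "total: $count"
```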

Shell Variables and Null Bytes

Shells usually can't store null bytes in variables (bash, for example, drops them from command substitutions), so don't rely on variables to hold null-separated data: pipe it directly instead or, if your constraints allow, use bash arrays.
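A sketch of both sides of this, assuming bash 4.4 or newer for mapfile -d. The first part shows the null bytes disappearing in a command substitution, the second keeps each record as a separate array element.

```bash
#!/usr/bin/env bash
# Sketch: variables lose null bytes, bash arrays can keep the records apart.
# Assumes bash 4.4+ (mapfile -d).
set -eu

# Command substitution drops the null bytes, so the records run together.
# Newer bash may print a warning on stderr; the braces redirect it away.
{ flattened=$(printf '%s\0' alpha beta gamma); } 2>/dev/null
echo "$flattened"  # alphabetagamma

# mapfile -d '' splits on null bytes; -t strips the delimiter from each element.
mapfile -d '' -t records < <(printf '%s\0' 'first record' 'second
record')
echo "${#records[@]} records"  # 2 records
printf '[%s]\n' "${records[@]}"
```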

Because of this limitation, for loops can't make use of null separation; use while read (which will probably run in a subshell when fed by a pipe) or xargs instead.
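A sketch of the xargs route, assuming find -print0 and xargs -0 are available. Without -0, xargs would also split "with space" into two arguments; the directory and names are made up for the demo.

```bash
#!/usr/bin/env bash
# Sketch: handing null-separated names to a command with xargs -0.
set -eu

demo_dir=$(mktemp -d)
touch "$demo_dir/with space" "$demo_dir/two
lines"

# Each name, spaces and newlines included, arrives at rm as exactly one argument.
find "$demo_dir" -mindepth 1 -type f -print0 | xargs -0 rm --

remaining=$(find "$demo_dir" -mindepth 1 | wc -l)
rmdir -- "$demo_dir"
echo "files left: $remaining"  # files left: 0
```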

Note on IFS: By setting IFS to a newline with IFS="<newline>", where <newline> is a literal line break (i.e. the closing quote is the first character of the next line), you at least get clean newline separation if a for loop is really needed.
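A sketch of that workaround in bash. Setting IFS to only a newline keeps names with spaces intact, but names containing newlines would still be split, and globbing has to be switched off because it runs after word splitting. The directory and file names are made up.

```bash
#!/usr/bin/env bash
# Sketch: a for loop that splits only on newlines.
# Names containing newlines would still be split; this only protects spaces and tabs.
set -eu

demo_dir=$(mktemp -d)
touch "$demo_dir/has space" "$demo_dir/plain"

old_ifs=$IFS
# Literal newline: the closing quote is the first character of the next line.
IFS="
"
# Pathname expansion follows word splitting, so disable globbing for the loop.
set -f
count=0
for path in $(find "$demo_dir" -mindepth 1); do
  count=$((count + 1))
  printf '[%s]\n' "${path##*/}"
done
set +f
IFS=$old_ifs

rm -rf -- "$demo_dir"
echo "loop iterations: $count"  # loop iterations: 2
```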