Shell Null Termination / Separation
Null (or zero) termination or separation means using a null byte to delimit records instead of the newline usually used in shell pipelines.
Why
Usually, processing data line by line is good enough and easy to debug, but even a small shell pipeline like `ls | sort` can blow up when it encounters unexpected data, such as a filename that contains a newline character.
Fun fact: Most filesystems actually allow you to use all kinds of characters from emoji to non-printables and control characters. Including newlines. One could even put ASCII-art in a filename (I've already done that)!
Using a null byte saves you here, as it is the only byte guaranteed never to occur in a file path on Unix-like systems (a slash cannot occur in a single filename either, but it is part of paths). This becomes more important with other sources of (untrusted) input.
The improved version would be `ls --zero | sort -z`, with a `| tr '\0' '\n'` appended to convert to newlines for the human who still wants the illusion of line separation.
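The failure mode described above can be reproduced with a small sketch. It assumes GNU coreutils and findutils; the scratch directory and file names are made up for illustration, and `find -print0` stands in for `ls --zero` so the sketch also runs where `ls` lacks that option.

```shell
# Scratch directory with two files; one has a newline in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with
newline.txt"

# Newline-separated: the odd file name is split across two "records".
newline_count=$(ls "$demo_dir" | wc -l)

# Null-separated: count the null bytes, exactly one per file.
null_count=$(find "$demo_dir" -mindepth 1 -maxdepth 1 -print0 | tr -cd '\0' | wc -c)

echo "newline-separated records: $newline_count"
echo "null-separated records:    $null_count"

rm -r "$demo_dir"
```

With two files on disk, the newline-based count should come out one too high, while the null-based count matches the real number of files.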
Examples of inputs you shouldn't trust:
- The human in front of the machine (I know myself)
- Your filesystem (because USB sticks, downloads, the human, etc.)
- Anything that comes in over the network (Not even your own service, not the service you are paying for, …)
Translating between worlds
To translate between the newline-separated and the null-separated world, there is a little utility called `tr`.
- null to newline: `tr '\0' '\n'`
- newline to null: `tr '\n' '\0'`
- swap newline and null: `tr '\n\0' '\0\n'`
- remove potential nulls: `tr -d '\0'`
Make sure to actually remove the characters you assume are not in your input!
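As a rough illustration of the translations above, the following sketch compares a lossy conversion with the reversible swap, and strips stray nulls before converting newline-separated input. The `sample` function and its data are hypothetical; `tr`, `cksum` and `wc` are used as commonly found in GNU coreutils.

```shell
# Three null-terminated records; the second one contains an embedded newline.
sample() { printf 'one\0two\nlines\0three\0'; }

# Plain null-to-newline is lossy: the embedded newline now looks like a separator.
lossy_lines=$(sample | tr '\0' '\n' | wc -l)

# Swapping is reversible: swapping twice restores the original bytes.
original_sum=$(sample | cksum)
roundtrip_sum=$(sample | tr '\n\0' '\0\n' | tr '\n\0' '\0\n' | cksum)

# Newline-separated input that "should not" contain nulls: remove them first,
# then convert, so a stray null cannot split a record.
sanitized_count=$(printf 'a\0b\nc\n' | tr -d '\0' | tr '\n' '\0' | tr -cd '\0' | wc -c)

echo "lines after lossy conversion: $lossy_lines"
echo "records after sanitizing:     $sanitized_count"
```

The lossy conversion reports four lines for three records, which is why the swap is the safer choice when the data has to be converted back later.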
Getting a program into null mode
Unfortunately there is no single way to make a program use nulls instead of newlines, so here is a hopefully useful table. Please note that some programs only offer the option in the GNU version, but not in, e.g., the BusyBox version.
If a program has some kind of `printf` option, one can use that to make the output null-separated.
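One instance of such a `printf` option is the `-printf` action of GNU find, which accepts `\0` in its format string. The sketch below assumes GNU find; the scratch directory, the file names and the `list_sizes` helper are made up for illustration.

```shell
# Scratch directory with a 5-byte file and an empty file with a newline in its name.
demo_dir=$(mktemp -d)
printf 'hello' > "$demo_dir/a.txt"
touch "$demo_dir/b
c.txt"

# GNU find: %s is the size in bytes, %f the base name, \0 terminates each record.
list_sizes() {
    find "$demo_dir" -type f -printf '%s %f\0'
}

# One null byte per record, regardless of newlines inside the names.
record_count=$(list_sizes | tr -cd '\0' | wc -c)

# Sort null-separated, then convert to newlines only for display.
list_sizes | sort -z | tr '\0' '\n'

rm -r "$demo_dir"
```

Keeping the conversion to newlines as the very last step means every tool in between still sees unambiguous records.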
| Command | Version | Input Flag(s) | Output Flag(s) |
|---|---|---|---|
| awk | GNU, other modern | -v 'RS=\0' | |
| cut | GNU | -z | -z |
| find | | | -print0 |
| fzf | | --read0 | --print0 |
| grep -r | GNU | | -Z |
| grep | GNU | -z | -z |
| inotifywait | | | --format '…%0' --no-newline |
| ls | GNU | | --zero |
| rg | | | -0 |
| sed | GNU | -z | -z |
| sort | GNU, BusyBox | -z | -z |
| tar -T | GNU | --null | |
| uniq | GNU, BusyBox | -z | -z |
| xargs | GNU, BusyBox | -0 | |
| read | | -r -d "" | |
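Several of the flags from the table can be chained into one pipeline. The following is a minimal sketch, assuming bash (for `read -d ''`, `printf %q` and process substitution) plus GNU or BusyBox versions of `sort` and `xargs`; the scratch directory and file names are hypothetical.

```shell
# Scratch directory with awkward names: an embedded newline and a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/b.txt" "$demo_dir/a
odd.txt" "$demo_dir/c d.txt"

# find emits nulls, sort reads and writes nulls, xargs reads nulls.
# Each name arrives as exactly one argument; sh -c reports how many it received.
arg_count=$(find "$demo_dir" -type f -print0 | sort -z | xargs -0 sh -c 'echo "$#"' sh)

# bash's read with an empty delimiter consumes one null-terminated record at a time.
loop_count=0
while IFS= read -r -d '' path; do
    loop_count=$((loop_count + 1))
    # %q prints the name shell-quoted, so the embedded newline stays visible.
    printf 'record %d: %q\n' "$loop_count" "$path"
done < <(find "$demo_dir" -type f -print0 | sort -z)

rm -r "$demo_dir"
```

The loop reads from a process substitution rather than a pipe so that `loop_count` is still set after the loop ends; with a pipe, bash would run the loop in a subshell.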
Note: The best way to automatically find out whether a program supports an option is to run `grep -q` over the output of its `--help`. Just make sure to choose a specific enough regex to avoid false positives. Trying the option out is also possible, but that is usually a bit more complex.
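The check described in the note could look roughly like this. `supports_option` is a hypothetical helper, not part of any tool, and the regex is only one possible way to avoid matching the option as part of a longer word; it assumes the program prints its options in the usual `-z, --zero-terminated` style.

```shell
# Hypothetical helper: succeeds if the --help text of command $1 lists option $2.
# The option must be preceded by start of line, whitespace or a comma, and be
# followed by whitespace, a comma, an equals sign or the end of the line.
supports_option() {
    "$1" --help 2>&1 | grep -qE -e "(^|[[:space:],])${2}([[:space:],=]|\$)"
}

if supports_option sort -z; then
    sort_null_ok=yes
else
    sort_null_ok=no
fi
echo "sort supports -z: $sort_null_ok"
```

Redirecting stderr into the pipe covers programs that print their help text there; a program that rejects `--help` entirely will simply be reported as not supporting the option.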