Portable Shell-Scripting

Date: 2024-05-17 — Topic: Shell — Langs: awk, sh — by Slatian

Knowledge for writing portable shell scripts that work across operating systems and distributions.


What makes your shellscripts break?

Before outlining how shellscripts can be made portable … why do they break?

This is a "you should consider", not a "you have to": depending on the purpose and intended audience of your script, not all of the following may apply.

Not my Shell

Not everyone has the same shell installed. Different systems ship with different shells, most commonly bash or dash (though you might run across other shells used for running scripts).

While the language those shells use looks the same, every shell has some extensions to make life a bit easier. The problem is that those extensions aren't standardised and only work in one specific shell, which might not be installed on your target system.

This problem usually happens with bash extensions, which are also known as bashisms.
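To make this a bit more concrete, here are a few common bashisms next to a POSIX replacement. This is a small sketch, not a complete list, and the function names starts_with_foo and same_string are made up for illustration.

```sh
#!/bin/sh
# bash:  if [[ $name == foo* ]]; then ...
# POSIX: pattern matching is done with case.
starts_with_foo() {
	case "$1" in
		foo*) return 0 ;;
		*) return 1 ;;
	esac
}

# bash:  [ "$a" == "$b" ]   (== inside [ ] is not POSIX)
# POSIX: a single = compares strings.
same_string() {
	[ "$1" = "$2" ]
}

# bash:  echo -e "a\tb"   (what echo does with -e and backslashes differs between shells)
# POSIX: the format handling of printf is specified, so use it instead.
printf '%s\t%s\n' "a" "b"
```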

Not my Coreutils

The coreutils are those programs everyone uses all the time when writing more advanced shell scripts. Examples are grep, sed, ls and so on.

Well-known implementations are the GNU coreutils, busybox and Apple's implementation for macOS.

Like the shells, different implementations of those commands have different extensions. Using those extensions will break your script in unexpected ways when it runs on a system that has a different set or version of coreutils installed.
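A classic example is sed -i: GNU sed accepts it without an argument, the BSD sed on macOS and FreeBSD expects a (possibly empty) backup suffix as a separate argument, and POSIX doesn't specify -i at all. A portable sketch goes through a temporary file instead; replace_in_file is a hypothetical helper name.

```sh
#!/bin/sh
# GNU sed: sed -i 's/old/new/' file
# BSD sed: sed -i '' 's/old/new/' file
# POSIX sed has no -i, so write to a temporary file and copy it back.
replace_in_file() {
	# $1 = sed script, $2 = file to edit
	# (POSIX sh has no "local", so tmp and status are global variables)
	tmp="$2.tmp.$$"
	sed -e "$1" "$2" > "$tmp" && cat "$tmp" > "$2"
	status=$?
	rm -f "$tmp"
	return "$status"
}
```

Writing the result back with cat instead of mv keeps the original file's permissions, at the cost of the update not being atomic.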

Note on awk: awk usually isn't part of the coreutils package, but the same problem applies. See mawk for a less well-known, but still popular implementation.
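For example, gensub() is a gawk extension. The same result can be had in any POSIX awk by copying the field and editing the copy with gsub(); zero_the_os is just an illustrative name.

```sh
#!/bin/sh
# gawk only:  awk '{ print gensub(/o/, "0", "g", $1), $2 }'
# POSIX awk:  copy the field, then change the copy with gsub()
zero_the_os() {
	awk '{
		word = $1       # gsub() edits in place, so work on a copy
		gsub(/o/, "0", word)
		print word, $2
	}'
}

printf 'foo bar\n' | zero_the_os    # prints "f00 bar"
```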

Not my Environment

When writing scripts, it is easy to assume that /tmp is where temporary files go, /run/user/$UID houses session files and ~/.cache/ is a sensible cache directory … none of those assumptions hold everywhere.

That may be how your system works, and maybe your friend's system works like this too, but some people out there use a different file system layout for very good reasons (performance, security, etc.).

Those people are very, very glad when everything uses standardised environment variables like TMPDIR, XDG_RUNTIME_DIR and XDG_CACHE_HOME. (And you might become one of them faster than you expect!)
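A minimal sketch of respecting those variables, falling back to defaults only when a variable is unset or empty. The $HOME/.cache default comes from the XDG Base Directory Specification and /tmp is the conventional fallback for TMPDIR. The spec gives XDG_RUNTIME_DIR no default and asks for a replacement directory plus a warning, so the fallback chosen here is an assumption of this sketch. The application name myscript is a placeholder.

```sh
#!/bin/sh
app=myscript   # placeholder application name

# TMPDIR is specified by POSIX; /tmp is only the conventional fallback.
temp_dir() {
	printf '%s\n' "${TMPDIR:-/tmp}"
}

# XDG Base Directory Specification: XDG_CACHE_HOME defaults to $HOME/.cache.
cache_dir() {
	printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/$app"
}

# XDG_RUNTIME_DIR has no default. The spec asks for a replacement directory
# and a warning; a per-user directory below TMPDIR is an assumption here and,
# unlike XDG_RUNTIME_DIR, is not guaranteed to be private.
runtime_dir() {
	if [ -n "$XDG_RUNTIME_DIR" ]; then
		printf '%s\n' "$XDG_RUNTIME_DIR/$app"
	else
		echo "warning: XDG_RUNTIME_DIR is not set, falling back to $(temp_dir)" >&2
		printf '%s\n' "$(temp_dir)/$app-$(id -u)"
	fi
}
```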

Not my Program

In rare cases, programs have different names on different distributions.

There is no standardised mechanism to avoid this, so it is something one should generally be aware of.

One example is readlink -f, which canonicalizes filenames, except with FreeBSD's readlink, where the same option sets a format template instead.
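If all you need is the canonical path of a directory, POSIX cd -P and pwd -P can do the job without readlink. A minimal sketch, assuming only directories need resolving (cd fails on anything that isn't a directory, so a symlink pointing at a regular file is not handled); canonical_dir is a made-up name.

```sh
#!/bin/sh
# Print the symlink-free absolute path of a directory.
# Runs in a subshell so the caller's working directory is left alone;
# clearing CDPATH keeps cd from searching it and printing the result.
canonical_dir() {
	(CDPATH= cd -P -- "$1" && pwd -P)
}
```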

Another example is the CLI tool of the Tesseract OCR engine, which is called tesseract, except on Void Linux, where it is called tesseract-ocr because Void already had a different program packaged under the name tesseract.

For programs that aren't in one of the directories listed in PATH, you are likely to run into this problem even sooner.

When implementing a mechanism to find such programs: use sensible defaults, allow users to work around any issues and document how the mechanism works.
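A sketch of such a mechanism for the Tesseract case: try the known names in order via the POSIX command -v, and let users override the choice. The TESSERACT_CMD variable and the find_tesseract function are hypothetical names for illustration.

```sh
#!/bin/sh
# Print the command to use for Tesseract.
# TESSERACT_CMD (hypothetical) lets users override the detection.
find_tesseract() {
	if [ -n "$TESSERACT_CMD" ]; then
		printf '%s\n' "$TESSERACT_CMD"
		return 0
	fi
	# Known names: tesseract on most systems, tesseract-ocr on Void Linux.
	for candidate in tesseract tesseract-ocr; do
		if command -v "$candidate" >/dev/null 2>&1; then
			printf '%s\n' "$candidate"
			return 0
		fi
	done
	echo "error: tesseract not found, set TESSERACT_CMD to its name or path" >&2
	return 1
}
```

A script would then run something like tess=$(find_tesseract) || exit 1 once and call "$tess" from there on.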

Not my Display Server

Do you run Xorg, Wayland, only a TTY (over SSH) or something else? How many of your tools rely on that specific display protocol being available?

Make sure you know the limits of your tools and implement appropriate mechanisms to switch tools when necessary.
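One way to switch tools is to look at the environment variables the session sets: Wayland clients find their compositor through WAYLAND_DISPLAY and X11 clients use DISPLAY. A rough sketch, assuming wl-copy (from wl-clipboard) and xclip as the example clipboard tools; session_type and copy_to_clipboard are made-up names. Wayland is checked first because sessions with XWayland usually set both variables.

```sh
#!/bin/sh
# Guess the kind of session from the variables display servers set.
session_type() {
	if [ -n "$WAYLAND_DISPLAY" ]; then
		echo wayland
	elif [ -n "$DISPLAY" ]; then
		echo x11
	else
		echo tty
	fi
}

# Copy stdin to the clipboard with a tool that fits the session.
copy_to_clipboard() {
	case "$(session_type)" in
		wayland) wl-copy ;;
		x11) xclip -selection clipboard ;;
		*)
			echo "error: no clipboard available outside a graphical session" >&2
			return 1
			;;
	esac
}
```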

POSIX Shell

POSIX is a great thing: it standardises (amongst other things) shell syntax and command line utilities.

If you only use what POSIX defines, in the way POSIX defines it, your shellscripts will work across a wider range of systems without any extra work on your part.

The official standard document is the Single UNIX® Specification, Version 4, 2018 Edition, published by The Open Group, but that is behind a login.

Luckily, Wikipedia has a List of POSIX commands; for almost every listed command, the first external link on the command's own page is the official POSIX specification in HTML (no login needed).

Starting a shellscript with #!/bin/sh usually indicates that it uses POSIX-compatible shell syntax. In practice, this means you write normal shell code but avoid some shell-specific features. (See Linters.)
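One of those shell-specific features is arrays: bash has them, POSIX sh doesn't. A common workaround, sketched here, is to use the positional parameters as the one list POSIX sh does have; count_args is only there to show the element count.

```sh
#!/bin/sh
# bash:  files=(a b "c d"); files+=(e); echo "${#files[@]}"
# POSIX sh has no arrays, but the positional parameters form a list
# that "$@" expands without splitting elements apart.
count_args() {
	echo "$#"
}

set -- "a" "b" "c d"
set -- "$@" "e"        # append, like files+=(e) in bash
count_args "$@"        # prints 4
for item in "$@"; do
	printf '[%s]\n' "$item"    # "c d" stays one element
done
```

Keep in mind that set -- replaces the script's own arguments, so do this inside a function or after you are done with them.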

POSIX also specifies some environment variables that can be used to find out the right thing to do, like where to put temporary files.
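For temporary files, a sketch that respects TMPDIR and cleans up after itself could look like this. Note that mktemp itself isn't part of the 2018 POSIX edition, but GNU coreutils, the BSDs, macOS and busybox all ship one that accepts this template; myscript is a placeholder name.

```sh
#!/bin/sh
# Private scratch directory below $TMPDIR, falling back to /tmp.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/myscript.XXXXXX") || exit 1

# Remove it again when the script exits ...
trap 'rm -rf "$scratch"' EXIT
# ... and turn common signals into a normal exit, so the EXIT trap also
# runs in shells that skip it when killed by a signal.
trap 'exit 1' HUP INT TERM

printf 'some data\n' > "$scratch/data.txt"
```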

XDG - freedesktop.org

If you are interacting with user folders or the desktop, try to use the environment variables specified by freedesktop.org. The xdg-utils, while not perfect, also provide ready-to-use implementations of commonly needed functionality, most prominently opening files.
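For configuration files, the XDG Base Directory Specification says to look in XDG_CONFIG_HOME (default $HOME/.config) first and then in each directory of the colon-separated XDG_CONFIG_DIRS (default /etc/xdg). A POSIX sh sketch of that lookup; find_config and the relative path passed to it are illustrative.

```sh
#!/bin/sh
# Print the first existing config file for a relative path such as
# "myscript/config". The body is a subshell ( ... ), so the IFS and
# set -f changes below don't leak into the rest of the script.
find_config() (
	home_dir="${XDG_CONFIG_HOME:-$HOME/.config}"
	if [ -f "$home_dir/$1" ]; then
		printf '%s\n' "$home_dir/$1"
		exit 0
	fi
	# XDG_CONFIG_DIRS is ordered by importance, most important first.
	IFS=:
	set -f        # no globbing while splitting on ':'
	for dir in ${XDG_CONFIG_DIRS:-/etc/xdg}; do
		if [ -f "$dir/$1" ]; then
			printf '%s\n' "$dir/$1"
			exit 0
		fi
	done
	exit 1
)
```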

Also see my blog posts on the xdg-utils: "It just opens files" - xdg-open under the hood and xdg-mime: Mapping Files to Applications taken apart.

Linters

Linters can help you avoid some not very obvious mistakes and compatibility problems.

For shell scripts, the go-to linter is shellcheck (your favourite Linux distribution probably has it packaged).
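shellcheck can be told to check against POSIX sh regardless of a script's shebang with --shell=sh (short: -s sh). A small sketch that lints every *.sh file below a directory; lint_sh is a made-up name.

```sh
#!/bin/sh
# Lint all *.sh files below a directory as POSIX sh.
lint_sh() {
	# $1 = directory to search (defaults to the current one)
	find "${1:-.}" -type f -name '*.sh' -exec shellcheck --shell=sh {} +
}
```

shellcheck exits non-zero when it finds something, and find passes that on for the -exec ... {} + form, so this also works as a check in CI.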

If you need to lint an awk script or command: gawk has the --lint and --lint-old options (both still run the program) that make gawk warn you when it stumbles across something you might want to avoid. The --lint-old option, which warns about constructs not portable to the original Unix awk, is also great for catching constructs that only work in gawk but not in other implementations.
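A minimal wrapper, assuming gawk is installed: the program still runs normally and the warnings end up on stderr. lint_awk is a made-up name; gawk also accepts --lint=fatal if you want warnings to abort the run.

```sh
#!/bin/sh
# Run an awk program under gawk's linter. The program runs as usual;
# lint warnings are printed to stderr.
lint_awk() {
	# $1 = awk program text, remaining arguments = input files
	prog=$1
	shift
	gawk --lint -- "$prog" "$@"
}
```

For example, lint_awk 'BEGIN { print x }' still prints an empty line, but also warns that x is used without being initialized.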

Declare your Dependencies

When using non-POSIX tools, try to document what you are using and why.
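Beyond documenting them, a script can also check for its non-POSIX dependencies up front and fail with a clear message instead of halfway through. A sketch using the POSIX command -v; check_dependencies is a hypothetical name and jq and curl are only example dependencies.

```sh
#!/bin/sh
# Check that every named program is available; report all missing ones
# at once instead of stopping at the first.
check_dependencies() {
	missing=0
	for cmd in "$@"; do
		if ! command -v "$cmd" >/dev/null 2>&1; then
			echo "error: required program '$cmd' not found" >&2
			missing=1
		fi
	done
	return "$missing"
}

# Near the top of a script, next to a comment saying why each one is needed:
#   jq   - parsing JSON responses
#   curl - downloading files
# check_dependencies jq curl || exit 1
```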