
A Content-Aware Untabify Command

I have an application whose behavior is controlled by a YAML file. Recently, I modified the behavior and shipped the new YAML file to the application, only to have the application die because the YAML file contained tabs instead of spaces. One remedy would be to modify the application to handle tabs properly. Another would be to ensure that tabs are never present in the YAML file to begin with. In this instance, the latter was the path of least resistance; this is a brief note about creating a shell script that uses GNU Emacs to perform the tab-to-space conversion.

In some editing contexts, for example a C program, tabs and spaces are cosmetic, and the number of spaces a tab represents doesn't affect the semantics of the program. In other contexts, such as a Python program or a YAML file, the number of spaces a tab represents can affect program behavior. Because of this, I didn't want to simply write a bash script that uses tr to convert tabs to spaces. Rather, I wanted an intelligent, context-aware method of converting tabs to spaces: one that didn't risk altering the semantics of the content.
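The risk is easy to see in a small example. The Python sketch below compares a naive one-for-one substitution (what `tr '\t' ' '` does) with column-aware expansion. Python and its built-in `str.expandtabs(8)` are used here only as a stand-in for Emacs's conversion, assuming Emacs's default tab-width of 8; the YAML fragment is made up for illustration.

```python
# Compare a naive tab-to-space swap (what `tr '\t' ' '` does) with
# column-aware expansion (standing in for Emacs's untabify, assuming tab-width 8).


def leading_spaces(line: str) -> int:
    """Count the spaces before the first non-space character."""
    return len(line) - len(line.lstrip(" "))


# Hypothetical YAML fragment: the second line starts with two spaces and a
# tab, which an editor using tab-width 8 displays at column 8, lined up
# with the third line.
lines = [
    "parent:",
    "  \tchild: 1",
    "        sibling: 2",
]

naive = [line.replace("\t", " ") for line in lines]  # one space per tab
aware = [line.expandtabs(8) for line in lines]       # pad to the next multiple of 8

for original, n, a in zip(lines, naive, aware):
    print(f"{original!r:24} naive={leading_spaces(n)} aware={leading_spaces(a)}")
```

With the naive swap, `child` lands at column 3 while `sibling` stays at column 8, so two keys that looked aligned in the editor no longer are; the column-aware version keeps both at column 8.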

Most Emacs major modes for editing languages (e.g., Python, C, bash) perform syntax-aware formatting and indentation. I thought it would be nice to use this for tab-to-space substitution, but I'd never run Emacs in 'batch mode' before.

I found a few interesting references here and here.

These were helpful, but they didn't cover the exact way I hoped to process files. I finally found this, which pointed to a spectacularly clever way of shebanging a file containing elisp.

Armed with these tricks, I put together a simple script that looked something like this:

":"; exec emacs --script "$0" "$@" # -*- mode: emacs-lisp; lexical-binding: t; -*-
;; Brilliant shell trickery in the previous line is from:

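;; Visit the file named by the first command-line argument.
(find-file (pop argv))
;; Replace every tab in the buffer with equivalent spaces (per tab-width).
(untabify (point-min) (point-max))
;; Write the converted buffer contents to stdout.
(princ (buffer-substring (point-min) (point-max)))
;; Exit explicitly with status 0 once the work is done.
(kill-emacs 0)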

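The script expects a single argument: the name of the file to process. The elisp loads the file into a buffer, runs untabify over the whole buffer (replacing each tab with however many spaces are needed to keep the following text in the same column, as determined by the buffer's tab-width), and then prints the buffer contents to stdout.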

This is exactly what I needed and it worked perfectly.
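For anyone curious about the arithmetic untabify performs, here is a minimal Python re-creation of it. This is a sketch, not Emacs's implementation: the function name `untabify` and its `tab_width` parameter are hypothetical, chosen only to echo the Emacs command and variable, and it assumes every non-tab character is one column wide (Emacs also accounts for wide and control characters).

```python
def untabify(text: str, tab_width: int = 8) -> str:
    """Replace each tab with the spaces needed to reach the next tab stop.

    Sketch only: every non-tab character is treated as one column wide,
    whereas Emacs also accounts for wide and control characters.
    """
    out = []
    column = 0  # display column of the next character on the current line
    for ch in text:
        if ch == "\t":
            pad = tab_width - column % tab_width  # distance to the next tab stop
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            # A newline starts the next line back at column 0.
            column = 0 if ch == "\n" else column + 1
    return "".join(out)


# Mirror the script's pipeline on an in-memory string instead of a file.
print(untabify("key:\n  \tvalue: 1\n"), end="")
```

For plain text like this, Python's built-in `str.expandtabs` gives the same result; the sketch just makes the tab-stop calculation explicit.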

If one prefers to save the untabified results back to the original file (rather than printing them to stdout), just replace this line:

(princ (buffer-substring (point-min) (point-max)))

with this: