Standards for Path and Filename Nomenclature
Personal Standards For Path and Filename Terminology
I've been writing and designing Unix-based software for 20+ years. In this software domain, as in others, using proper names and terminology is crucial to conveying understanding. Despite a strong effort to name things properly, I plead guilty to being inconsistent and imprecise when dealing with names for files and paths. A recent encounter with some old code of mine pushed me to spend a few minutes creating a nomenclature that is accurate and well defined.
I felt the need to document my conclusions and the thought process I followed in arriving at the nomenclature, thus this note.
Authoritative Documents
A logical first step is to examine the authoritative documents and see what they offer.
The POSIX Standard
The POSIX Standard provides a good starting point. Here are some relevant excerpts:
pathname - A string used to identify a file. It has optional beginning / characters, followed by zero or more filenames separated by / characters. A pathname can optionally contain one or more trailing / characters. Multiple successive / characters are considered to be the same as one /, except for the case of exactly two leading / characters.

filename - A sequence of one or more bytes used to name a file. A filename is sometimes referred to as a pathname component.
These are the two core definitions the standard offers on the subject.
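The slash rules in the pathname definition can be illustrated in Python. This is a minimal sketch: components is a hypothetical helper written for this note, while posixpath.normpath is a real standard-library function that happens to honor the exactly-two-leading-slashes exception. Be aware that normpath also drops . components and collapses .. lexically, which can change the meaning of a pathname when symbolic links are involved.

```python
import posixpath

def components(pathname: str) -> list[str]:
    """Return the filenames (pathname components) in a pathname.

    Splitting on '/' yields empty strings for leading, trailing, and
    repeated slashes; dropping them treats successive slashes as one.
    Whether the pathname was absolute is not recorded.
    """
    return [part for part in pathname.split("/") if part]

print(components("/usr//local/bin/"))  # ['usr', 'local', 'bin']
print(components("docs/notes.txt"))    # ['docs', 'notes.txt']

# normpath collapses runs of slashes and drops a trailing slash, but keeps
# exactly two leading slashes, whose meaning POSIX leaves implementation-defined.
print(posixpath.normpath("/usr//local/bin/"))  # /usr/local/bin
print(posixpath.normpath("//host/share"))      # //host/share
print(posixpath.normpath("///etc"))            # /etc
```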
Unix Man Pages
The following are excerpts from the System V and BSD man pages for the basename command:
BASENAME(1)

NAME
    basename, dirname -- return filename or directory portion of pathname

SYNOPSIS
    basename string
    dirname string

DESCRIPTION (BSD)
    The basename utility deletes any prefix ending with the last slash character present in string.
    The dirname utility deletes the filename portion, beginning with the last slash character to the end of string.

DESCRIPTION (SYSV)
    Basename deletes any prefix ending in '/' from string.
    Dirname places on standard output the name of the directory in which a file named string would nominally be found.
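The behavior described in these man pages can be approximated in Python. This is a sketch under stated assumptions: basename and dirname here are hypothetical helpers that mirror the commands (without basename's optional suffix argument), not a library API, and the empty-string and all-slashes cases follow choices noted in the comments. One practical reason to write them out: Python's os.path.basename is not a drop-in replacement, because it returns an empty string for a pathname with a trailing slash, whereas basename(1) strips trailing slashes first.

```python
import os.path

def basename(string: str) -> str:
    """Approximate basename(1), ignoring the optional suffix operand."""
    if not string:
        return ""  # POSIX leaves the empty-string result implementation-defined
    stripped = string.rstrip("/")
    if not stripped:
        return "/"  # the string consisted entirely of slashes
    # Delete any prefix ending with the last remaining slash.
    return stripped.rsplit("/", 1)[-1]

def dirname(string: str) -> str:
    """Approximate dirname(1): the directory in which string would be found."""
    stripped = string.rstrip("/")
    if not stripped:
        # All slashes -> '/'; empty string -> '.' (this sketch's choice).
        return "/" if string else "."
    if "/" not in stripped:
        return "."  # a bare filename lives in the current directory
    # Drop the filename portion, then the slashes that separated it.
    head = stripped.rsplit("/", 1)[0].rstrip("/")
    return head or "/"

print(basename("/usr/local/bin/"))          # bin
print(dirname("/usr/local/bin/"))           # /usr/local
print(dirname("photo.jpg"))                 # .
print(os.path.basename("/usr/local/bin/"))  # prints an empty line
```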
The NAME section above is taken from the BSD man page. I chose it because it specifically uses the term pathname from the POSIX standard, even though BSD predates the standard.
Disappointingly, most Unix man pages (e.g., Seventh Edition, SVR4) have a NAME section stating 'strip filename affixes', though the SUMMARY more precisely uses pathname. This mismatch is an example of the fungible use of filename, pathname, etc., that I'm trying to avoid.
Filename Components
Unix assigns no special meaning to the characters in a filename, other than the lone . and .. names, which refer to the current and parent directories. This means that Unix has no concept of file extensions as an indicator of the contents of a file or of the application associated with it.
In practice, however, filenames often consist of two parts, a name and an extension, separated by a dot. For example, photo.jpg.
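Python's pathlib module adopts this name-plus-extension convention, although with its own vocabulary: name for the whole filename, stem for the part before the last dot, and suffix for the extension, including the dot. The short sketch below (describe is a hypothetical helper for illustration) shows that a leading dot, as in .bashrc, does not start an extension, and that only the last dot separates the stem from the suffix.

```python
from pathlib import PurePosixPath

def describe(filename: str) -> tuple[str, str, list[str]]:
    """Return pathlib's (stem, suffix, suffixes) view of a filename."""
    p = PurePosixPath(filename)
    return p.stem, p.suffix, p.suffixes

print(describe("photo.jpg"))       # ('photo', '.jpg', ['.jpg'])
print(describe("archive.tar.gz"))  # ('archive.tar', '.gz', ['.tar', '.gz'])
print(describe(".bashrc"))         # ('.bashrc', '', [])
```

The dot is purely a naming convention here: pathlib splits the string and never looks at the file's contents.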
Nomenclature
Given the preceding, I have defined the following terms for my own use in code and documentation:
pathname - The entire path necessary to unambiguously identify a file.
path - A portion of a pathname.
filename - The rightmost component of a pathname; the value returned by basename pathname.
filename extension - If a filename contains a . in other than the first character position, the characters following the . are the filename extension.
filename base - If a filename contains a . in other than the first character position, the characters preceding the . are the filename base.
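The filename, filename extension, and filename base definitions can be expressed as a small Python sketch. FilenameParts and filename_parts are hypothetical names for illustration. The definitions do not say which . to use when a filename contains several, so this sketch assumes the last one (os.path.splitext likewise splits at the last dot). Following the definitions literally, the extension excludes the . itself, unlike pathlib's suffix.

```python
from typing import NamedTuple, Optional

class FilenameParts(NamedTuple):
    filename: str             # rightmost component of the pathname
    base: Optional[str]       # filename base, or None when there is no extension
    extension: Optional[str]  # filename extension without the '.', or None

def filename_parts(pathname: str) -> FilenameParts:
    """Split a pathname into filename, filename base, and filename extension."""
    if not pathname:
        raise ValueError("an empty string is not a pathname")
    # filename: like basename(1), ignore trailing slashes and keep the
    # rightmost component; a pathname made only of slashes yields '/'.
    stripped = pathname.rstrip("/")
    filename = stripped.rsplit("/", 1)[-1] if stripped else "/"

    # ASSUMPTION: with several '.' characters, split at the last one.
    dot = filename.rfind(".")
    if dot <= 0:
        # No '.', or the only '.' is the first character (e.g. '.bashrc').
        return FilenameParts(filename, None, None)
    return FilenameParts(filename, filename[:dot], filename[dot + 1:])

print(filename_parts("/home/user/photo.jpg"))
# FilenameParts(filename='photo.jpg', base='photo', extension='jpg')
print(filename_parts("/home/user/.bashrc"))
# FilenameParts(filename='.bashrc', base=None, extension=None)
print(filename_parts("/var/backups/site.tar.gz"))
# FilenameParts(filename='site.tar.gz', base='site.tar', extension='gz')
```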
Discussion and Example Usage
- A pathname is an absolute, or fully qualified, path identifying a file. A relative path cannot be a pathname; a relative path is simply a path.
- When referring to a true pathname, use pathname rather than 'a path to a file'. In the case of a relative path, it is permissible to use 'a relative path to a file'.
- I don't care for file base, due to its similarity to the basename command, which returns the full filename. File name, on its own, seems a better choice, but it would be too easily confused with filename.
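One way to carry the pathname/path distinction into code is through parameter names and a simple check. This is a minimal sketch, assuming that "absolute" is tested lexically with pathlib's PurePosixPath.is_absolute (no filesystem access, no symlink resolution, and .. is not collapsed). The helpers is_pathname and to_pathname are hypothetical; their parameters are named pathname for absolute values and path for relative ones.

```python
import posixpath
from pathlib import PurePosixPath

def is_pathname(candidate: str) -> bool:
    """True if candidate is a pathname in this note's sense, i.e. absolute."""
    return PurePosixPath(candidate).is_absolute()

def to_pathname(root_pathname: str, rel_path: str) -> str:
    """Join a relative path onto an absolute root to form a pathname."""
    if not is_pathname(root_pathname):
        raise ValueError(f"not a pathname (must be absolute): {root_pathname!r}")
    if is_pathname(rel_path):
        raise ValueError(f"expected a relative path, got a pathname: {rel_path!r}")
    return posixpath.join(root_pathname, rel_path)

print(is_pathname("/etc/passwd"))                 # True
print(is_pathname("docs/notes.txt"))              # False
print(to_pathname("/srv/app", "logs/today.log"))  # /srv/app/logs/today.log
```

Raising on a mismatched argument keeps a relative path from silently being passed where a true pathname is expected.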