Then a while ago I read about Rob Landley’s idea of a words tool
(like cut(1) but simpler)
and because I liked the idea I want to (re)implement this tool
while trying to see if I can keep it simpler than Rob’s C version.
We’ll start with Swift.
Words is supposed to be a filter,
which means it should read from standard input and write to standard output.
Let’s get started by writing the simplest possible filter
that just echoes stdin unmodified,
like cat(1) does.
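A minimal sketch of such an echoing filter could look like the following. The function name echo and its two closure parameters are my own invention for illustration; wrapping the loop this way only serves to make it testable without a real standard input.

```swift
// Copies every line from `nextLine` to `emit` until the source is exhausted.
// (`echo`, `nextLine` and `emit` are hypothetical names used for this sketch.)
func echo(from nextLine: () -> String?, to emit: (String) -> Void) {
    while let line = nextLine() {
        emit(line)
    }
}

// Wire the loop to standard input and standard output, like cat(1).
// readLine() strips the trailing newline, print() adds it back.
echo(from: { readLine() }, to: { print($0) })
```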
To solve the problem at hand we can process each line to
- split the line into words
- select some of those words
  - in arbitrary order
  - allowing multiple selections
- combine the selected words into a new line
Let’s implement the easier part of this:
The difference in behaviour is that this version replaces sequences of whitespace with single space characters.
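As a rough sketch of the split-and-join step (the function name normalize is an assumption of mine), splitting on whitespace and joining with single spaces is enough to produce the behaviour just described:

```swift
// Splits a line into words on any whitespace and joins them with single
// spaces. (`normalize` is a hypothetical name chosen for this sketch.)
func normalize(_ line: String) -> String {
    // split(whereSeparator:) drops empty subsequences by default,
    // so runs of whitespace collapse into a single separator.
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return words.joined(separator: " ")
}

// Use it as a filter from standard input to standard output.
while let line = readLine() {
    print(normalize(line))
}
```

For example, normalize("  a \t b   c ") yields "a b c".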
Which words are being selected (and their order) needs to be
- indicated by the user in some way
- evaluated by the program in some way
We’ll handle the user interface later. The program can store the required information either as a (passive) data structure or as a(n active) function or object.
This illustrates a typical design conflict between… well, not between functional and object-oriented programming, which I believe is entirely independent from this. I’m referring to what Noel Welsh describes as “opaque and transparent interpreters”.
In our program the difference would look something like this:
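Here is my own hedged illustration of the two variants; all names are hypothetical and the selection is hard-coded. In the transparent variant the selection is plain data that a separate function interprets, in the opaque variant the selection is itself a function that we can call but not inspect.

```swift
// Transparent: the selection is passive data (zero-based word indices).
let transparentSelection: [Int] = [2, 0]

// A separate interpreter gives that data its meaning.
func apply(_ selection: [Int], to words: [String]) -> [String] {
    return selection.map { words[$0] }
}

// Opaque: the selection is an active function; we can only run it.
let opaqueSelection: ([String]) -> [String] = { words in
    [words[2], words[0]]
}

let sample = ["alpha", "beta", "gamma"]
print(apply(transparentSelection, to: sample)) // ["gamma", "alpha"]
print(opaqueSelection(sample))                 // ["gamma", "alpha"]
```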
The trade-offs boil down to the question: Which parts of our solution do we want to separate?
We’ll go for the transparent variant first and see where it takes us.
We want to enter the selection on the command line. I can think of two approaches to this:
- Each command line argument indicates a single selection, and the program only ever operates on standard input:
  words 3 4 2 < file
- The first command line argument specifies all selections, and the remaining ones indicate files to process:
  words 3,4,2 file
We’ll go with the first approach here because I believe it’s easier to implement.
There’s a lot going on here, let’s walk through the code:
The first element of CommandLine.arguments contains the path to the executable, so we only transform the remaining arguments into indices.
For each argument we try to convert the string to a number.
If that fails, like when the string doesn’t contain a number, we print an error message and abort with EX_USAGE (from sysexits(3)).
Indices supplied on the command line should start at one, so we correct by one here to get array indices.
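The parsing step described above might be sketched like this. The function name parseIndices, the constant exUsage and the wording of the error message are assumptions of mine; I spell out the value 64 for EX_USAGE instead of relying on the constant being imported on every platform.

```swift
import Foundation

// Value of EX_USAGE as documented in sysexits(3).
let exUsage: Int32 = 64

// Converts one-based command line arguments into zero-based array indices.
// Reports the offending argument on standard error and returns nil
// if one of them is not a number. (`parseIndices` is a hypothetical name.)
func parseIndices(_ arguments: [String]) -> [Int]? {
    var indices: [Int] = []
    for argument in arguments {
        guard let number = Int(argument) else {
            let message = "words: not a number: \(argument)\n"
            FileHandle.standardError.write(Data(message.utf8))
            return nil
        }
        indices.append(number - 1) // users count from one, arrays from zero
    }
    return indices
}

// Skip the executable path, then abort on malformed input.
guard let indices = parseIndices(Array(CommandLine.arguments.dropFirst())) else {
    exit(exUsage)
}
```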
Once we have the indices, we can grab the corresponding elements from the array.
This naive version of the program crashes as soon as one of the indices is out of range for any line. We can prevent this problem by ignoring indices that are out of bounds for a given line.
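One way to ignore out-of-range indices is to filter them against the valid indices of each line’s word array before subscripting. This is a sketch with a hypothetical function name, not necessarily the exact code of the original script:

```swift
// Picks the words at the given zero-based indices, silently skipping
// indices that do not exist in this particular line.
// (`select` is a hypothetical name chosen for this sketch.)
func select(_ indices: [Int], from words: [String]) -> [String] {
    return indices
        .filter { words.indices.contains($0) } // drop out-of-bounds indices
        .map { words[$0] }
}
```

With the words a, b, c and d, the indices 2, 3, 1 and 9 select c, d and b, and the 9 is simply dropped.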
One very useful feature of cut(1) is that besides numbers you can supply ranges of numbers, like -f 3-6 to print the third through sixth fields.
We can model these ranges with the CountableRange type, so let’s try adding that feature to our program.
Once again, the more complex part is parsing the indices:
We have extracted the parsing of a single number into an inner function that we can call multiple times.
As an inner function it has access to the parameters of its containing function,
and we can make use of that to provide better error messages.
This function also expects a default value that it returns when one bound of the range is missing,
which means we can write
3- to mean “every word starting with the third”.
For each range we parse the first and the last component as its lower and upper bounds (respectively), with default values for missing bounds (Int.max for the upper one), and construct a CountableRange from those bounds.
We only need to correct the lower bound by one because CountableRange expects the upper bound to be excluded.
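A sketch of this range parsing, under a few assumptions of mine: the names parseRange, number and UsageError are hypothetical, a missing lower bound defaults to the first word, and I write Range<Int>, which is what CountableRange<Int> names in recent Swift versions. The inner function captures the argument of its containing function to build its error message.

```swift
// Hypothetical error type for this sketch.
struct UsageError: Error, CustomStringConvertible {
    let description: String
}

// Parses "4", "3-6", "3-" or "-6" (one-based, inclusive) into a
// zero-based, half-open range.
func parseRange(_ argument: String) throws -> Range<Int> {
    // Inner function: sees `argument`, so errors can name the whole range.
    func number(_ text: Substring, default fallback: Int) throws -> Int {
        if text.isEmpty { return fallback }
        guard let value = Int(text), value >= 1 else {
            throw UsageError(description: "invalid number '\(text)' in '\(argument)'")
        }
        return value
    }

    let parts = argument.split(separator: "-", omittingEmptySubsequences: false)
    guard parts.count <= 2, let first = parts.first, let last = parts.last else {
        throw UsageError(description: "invalid range '\(argument)'")
    }
    let lower = try number(first, default: 1)
    let upper = try number(last, default: Int.max)
    guard lower <= upper else {
        throw UsageError(description: "reversed range '\(argument)'")
    }
    // Only the lower bound moves: the upper bound is already exclusive.
    return (lower - 1)..<upper
}
```

With these assumptions, 3-6 becomes 2..<6, 3- becomes 2..<Int.max, and a single 4 becomes 3..<4.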
To apply our selections we take each range,
trim it to the bounds of the array,
and grab the corresponding elements from the array.
Since this returns collections instead of single words,
we use flatMap instead of map to flatten all those collections.
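Applying the ranges could be sketched as follows. I’m assuming clamped(to:) from the standard library for the trimming step, and the function name select is hypothetical:

```swift
// Picks the words covered by each zero-based, half-open range.
// Ranges are trimmed to the array first, so a range that reaches past
// the end of a short line selects fewer (or no) words instead of crashing.
func select(_ ranges: [Range<Int>], from words: [String]) -> [String] {
    // Each range yields an ArraySlice; flatMap concatenates the slices.
    return ranges.flatMap { words[$0.clamped(to: words.indices)] }
}
```

A range that lies entirely beyond the end of the line clamps to an empty range and contributes nothing.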
Here’s the code as a whole:
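What follows is my own compact reconstruction of such a script, not the original listing. The names parseRange and select, the error message, the default of the first word for a missing lower bound, and the literal 64 for EX_USAGE are all assumptions.

```swift
import Foundation

// Parses "4", "3-6", "3-" or "-6" (one-based, inclusive) into a
// zero-based, half-open range; returns nil for malformed input.
func parseRange(_ argument: String) -> Range<Int>? {
    func bound(_ text: Substring, default fallback: Int) -> Int? {
        return text.isEmpty ? fallback : Int(text)
    }
    let parts = argument.split(separator: "-", omittingEmptySubsequences: false)
    guard parts.count <= 2,
          let first = parts.first, let last = parts.last,
          let lower = bound(first, default: 1),
          let upper = bound(last, default: Int.max),
          1 <= lower, lower <= upper else { return nil }
    return (lower - 1)..<upper
}

// Splits a line into words, picks the requested ranges, joins the result.
func select(_ ranges: [Range<Int>], from line: String) -> String {
    let words = line.split(whereSeparator: { $0.isWhitespace })
    return ranges
        .flatMap { words[$0.clamped(to: words.indices)] }
        .joined(separator: " ")
}

// Every argument after the executable path is one range.
var ranges: [Range<Int>] = []
for argument in CommandLine.arguments.dropFirst() {
    guard let range = parseRange(argument) else {
        let message = "words: invalid range: \(argument)\n"
        FileHandle.standardError.write(Data(message.utf8))
        exit(64) // EX_USAGE, see sysexits(3)
    }
    ranges.append(range)
}

// Filter standard input to standard output.
while let line = readLine() {
    print(select(ranges, from: line))
}
```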
The change in requirements has stayed pleasantly local to the two places that produce and consume the modified data. We left the body of the script unmodified since we had omitted the type signature of the changed data and since we didn’t need any other changes in processing.
The size of the script is comparable to the C code, but the two versions support different features (ours supports ranges while the other one supports custom word separators) and use entirely different frameworks (the toybox infrastructure is intended for Unix command line tools, while the Swift standard library has only a few bare-bones provisions for them). Take this comparison with a huge grain of salt.
I have a bunch of ideas for more work on this tool, including:
- support the
- support a -v flag to invert range matching
- support reverse ranges like
- print usage information, for example when no ranges were supplied
- rewrite the tool, maybe in Rust or Go
If you have more ideas, I’d love to hear about them!