I enjoy occasionally writing custom command line tools. Some recent examples are walk (like find(1) but without options) and stest (like test(1) but as a Unix Filter).

A while ago I read about Rob Landley’s idea of writing a words tool (like cut(1) but simpler). I liked the idea, so I want to (re)implement this tool and see whether I can keep it simpler than Rob’s C version.

We’ll start with Swift.

Swift

read/print

Words is supposed to be a filter, which means it should read from standard input and write to standard output. Let’s get started by writing the simplest possible filter that just echoes stdin unmodified, like cat(1) does.

while let line = readLine() {
	print(line)
}

split/join

To solve the problem at hand we can process each line in three steps: split it into words, select the wanted words, and join the selection back together.

Let’s implement the easier part of this:

while let line = readLine() {
	let words = line.components(separatedBy: .whitespaces)
	print(words.joined(separator: " "))
}

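The difference in behaviour is that this version replaces each whitespace character (a tab, for example) with a single space. Note that components(separatedBy:) does not collapse runs of whitespace: adjacent separators yield empty strings, which count as (empty) words.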
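To check what splitting and joining does to a line without involving standard input, here is a small sketch. The helper name roundTrip is made up for this illustration, and the split(whereSeparator:) call at the end is only one possible alternative, not something the tool in this post uses.

```swift
import Foundation

// Hypothetical helper: split on whitespace, then join with single spaces,
// which is what the loop does for each line it reads.
func roundTrip(_ line: String) -> String {
	let words = line.components(separatedBy: .whitespaces)
	return words.joined(separator: " ")
}

// A tab between two words comes back as a single space.
print(roundTrip("a\tb"))

// Two adjacent spaces yield an empty component between them.
let pieces = "a  b".components(separatedBy: .whitespaces)
print(pieces)

// Possible alternative (an assumption, not part of the tool):
// split(whereSeparator:) omits empty subsequences by default.
let collapsed = "a  b".split(whereSeparator: { $0.isWhitespace }).map(String.init)
print(collapsed)
```

Whether empty words should count as fields is a design decision; cut(1) counts them, so keeping them is at least defensible.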

transparent/opaque

Which words are being selected (and their order) needs to be specified by the user and stored by the program.

We’ll handle the user interface later. The program can store the required information either as a (passive) data structure or as a(n active) function or object.

This illustrates a typical design conflict between… well, not between functional and object-oriented programming, which I believe is entirely independent from this. I’m referring to what Noel Welsh describes as “opaque and transparent interpreters”.

In our program the difference would look something like this:

transparent

let wanted: [Int] = // TODO
let words = line.components(separatedBy: .whitespaces)
let selected = select(wanted, from: words)
print(selected.joined(separator: " "))

opaque

let select: ([String]) -> [String] = // TODO
let words = line.components(separatedBy: .whitespaces)
let selected = select(words)
print(selected.joined(separator: " "))

The trade-offs boil down to the question: Which parts of our solution do we want to separate?

We’ll go for the transparent variant first and see where it takes us.
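To make the distinction a bit more concrete, here is a minimal sketch of both variants with the TODOs filled in by hand. The names pick and makeSelector and the hard-coded indices are assumptions for this illustration only; neither variant does any bounds checking.

```swift
// Transparent: the selection is plain data that other code can inspect.
let wantedIndices: [Int] = [2, 0]

// Hypothetical interpreter for that data.
func pick(_ indices: [Int], from words: [String]) -> [String] {
	return indices.map { words[$0] }
}

// Opaque: the selection is a function; all we can do with it is call it.
func makeSelector(_ indices: [Int]) -> ([String]) -> [String] {
	return { words in indices.map { words[$0] } }
}

let sampleWords = ["alpha", "beta", "gamma"]
let transparentResult = pick(wantedIndices, from: sampleWords)
let opaqueResult = makeSelector(wantedIndices)(sampleWords)
print(transparentResult, opaqueResult)

// Only the transparent form lets us ask questions about the selection itself,
// such as the highest index it needs.
let highest = wantedIndices.max() ?? -1
print(highest)
```

Both variants produce the same words; the difference is in what else the rest of the program is able to do with the selection.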

arguments

We want to enter the selection on the command line. I can think of two approaches to this:

  1. Each command line argument indicates a single selection, the program only ever operates on standard input.
words 3 4 2 < file
  2. The first command line argument specifies all selections, the remaining ones indicate files to process.
words 3,4,2 file

We’ll go with the first approach here because I believe it’s easier to implement.

func index(_ string: String) -> Int {
	guard let int = Int(string) else {
		print("invalid index: \(string)")
		exit(EX_USAGE)
	}
	return int - 1
}

let arguments = CommandLine.arguments.dropFirst()
let wanted = arguments.map(index)

There’s a lot going on here, so let’s walk through the code:

let arguments = CommandLine.arguments.dropFirst()
let wanted = arguments.map(index)

The first element of CommandLine.arguments contains the path to the executable, so we drop it and transform the remaining arguments into indices.

guard let int = Int(string) else {
	print("invalid index: \(string)")
	exit(EX_USAGE)
}

We convert string to a number. If that fails, like when string doesn’t contain a number, we print an error message and abort with EX_USAGE (from sysexits(3)).

return int - 1

Indices supplied on the command line should start at one, so we correct by one here to get array indices.
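Because index exits the process on bad input, it is awkward to try out in isolation. The following sketch shows a variant that returns nil instead, plus a helper that writes to standard error, which is where a filter would usually report problems. Both parseIndex and complain are hypothetical names introduced here, not part of the program above.

```swift
import Foundation

// Hypothetical variant of index(_:) that returns nil instead of exiting.
func parseIndex(_ string: String) -> Int? {
	guard let int = Int(string) else { return nil }
	return int - 1 // one-based on the command line, zero-based in the array
}

// Hypothetical helper: report problems on standard error, not standard output.
func complain(_ message: String) {
	FileHandle.standardError.write(Data((message + "\n").utf8))
}

let parsed = ["3", "4", "x"].map(parseIndex)
print(parsed)
if parsed.contains(where: { $0 == nil }) {
	complain("invalid index")
}
```

Keeping error messages off standard output matters for a filter, since anything printed there ends up in the data stream of the next program in the pipeline.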

select

Once we have the indices, we can grab the corresponding elements from the array.

func select<A>(_ indices: [Int], from array: [A]) -> [A] {
	return indices.map { array[$0] }
}

“fatal error: Index out of range”

This naive version of the program crashes as soon as one of the indices is out of range for any line. We can prevent this problem by ignoring indices that are out of bounds for a given line.

func select<A>(_ indices: [Int], from array: [A]) -> [A] {
	let valid = { array.indices.contains($0) }
	return indices.filter(valid).map { array[$0] }
}
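A few sample calls show how the bounds-checked version behaves. The function is repeated so that the sketch is self-contained (with an explicit type annotation on the closure, which is my addition); the sample data is made up.

```swift
// Same logic as the bounds-checked select, repeated for a self-contained sketch.
func select<A>(_ indices: [Int], from array: [A]) -> [A] {
	let valid: (Int) -> Bool = { array.indices.contains($0) }
	return indices.filter(valid).map { array[$0] }
}

let sampleLine = ["one", "two", "three"]

// The output order follows the order of the indices.
print(select([2, 0], from: sampleLine))

// Index 7 does not exist for this line and is skipped.
print(select([0, 7, 1], from: sampleLine))

// Negative indices are skipped as well.
print(select([-1], from: sampleLine))
```

The last call is what happens when someone passes 0 on the command line: it is converted to the array index -1 and then silently ignored.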

Range

One very useful feature of cut(1) is that besides numbers you can supply ranges of numbers, like -f 3-6 to print the third through sixth fields.

We can model these ranges with the Range type so let’s try adding that feature to our program.

Once again, the more complex part is parsing the indices:

func index(_ string: String) -> CountableRange<Int> {
	func parse(_ component: String, default empty: Int) -> Int {
		if component.isEmpty {
			return empty
		}
		if let int = Int(component) {
			return int
		}
		print("invalid component: `\(component)' in range: `\(string)'")
		exit(EX_USAGE)
	}
	let components = string.components(separatedBy: "-")
	let lower = parse(components.first!, default: 1) - 1
	let upper = parse(components.last!, default: Int.max)
	return lower..<upper
}

We have extracted the parsing of a single number into an inner function that we can call multiple times. As an inner function it has access to the parameters of its containing function, and we can make use of that to provide better error messages. This function also expects a default value that it returns when one bound of the range is missing, which means we can write 3- to mean “every word starting with the third”.

For each range we parse the first and the last component as its lower and upper bounds (respectively), using 1 and Int.max as the default values, and construct a CountableRange from those bounds. We only need to correct the lower bound by one because CountableRange expects the upper bound to be excluded.
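To experiment with the accepted spellings, here is a sketch of a non-exiting variant of the range parser. The name parseRange and the nil return are assumptions for this illustration. It also rejects reversed ranges such as 5-3, because constructing a range whose lower bound exceeds its upper bound traps at runtime.

```swift
import Foundation

// Hypothetical non-exiting variant of the range parser.
// Returns nil for malformed components and for reversed ranges.
func parseRange(_ string: String) -> CountableRange<Int>? {
	func parse(_ component: String, default empty: Int) -> Int? {
		if component.isEmpty { return empty }
		return Int(component)
	}
	let components = string.components(separatedBy: "-")
	guard let first = parse(components.first!, default: 1),
	      let upper = parse(components.last!, default: Int.max),
	      first - 1 <= upper
	else { return nil }
	// Only the lower bound is shifted: the upper bound is exclusive anyway.
	return (first - 1)..<upper
}

print(["3", "3-6", "3-", "-3", "5-3", "x"].map(parseRange))
```

A single number like 3 has only one component, which serves as both the first and the last, so it becomes the one-element range 2..<3.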

func select<A>(_ ranges: [CountableRange<Int>], from array: [A]) -> [A] {
	return ranges.flatMap { range in
		return array[range.clamped(to: array.indices)]
	}
}

To apply our selections we take each range, trim it to the bounds of the array using clamped(to:), and grab the corresponding elements from the array. Since this returns collections instead of single words, we call flatMap instead of map to flatten all those collections.
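The following sketch runs the range-based select on some made-up data, including an open-ended range and one that lies entirely outside the array. The function is repeated unchanged so that the block is self-contained.

```swift
// Range-based select, repeated for a self-contained sketch.
func select<A>(_ ranges: [CountableRange<Int>], from array: [A]) -> [A] {
	return ranges.flatMap { range in
		return array[range.clamped(to: array.indices)]
	}
}

let fields = ["a", "b", "c", "d"]

// "3-" followed by "1": everything from the third word on, then the first.
let tailThenHead = select([2..<Int.max, 0..<1], from: fields)
print(tailThenHead)

// A range beyond the end is clamped to an empty range and selects nothing.
let beyond = select([9..<12], from: fields)
print(beyond)

// Overlapping ranges simply repeat the words they share.
let overlapping = select([0..<2, 1..<3], from: fields)
print(overlapping)
```

Clamping takes over the job of the earlier bounds check, so out-of-range selections are dropped quietly instead of crashing.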

Conclusion

Here’s the code as a whole:

import Foundation

func index(_ string: String) -> CountableRange<Int> {
	func parse(_ component: String, default empty: Int) -> Int {
		if component.isEmpty {
			return empty
		}
		if let int = Int(component) {
			return int
		}
		print("invalid component: `\(component)' in range: `\(string)'")
		exit(EX_USAGE)
	}
	let components = string.components(separatedBy: "-")
	let lower = parse(components.first!, default: 1) - 1
	let upper = parse(components.last!, default: Int.max)
	return lower..<upper
}

func select<A>(_ ranges: [CountableRange<Int>], from array: [A]) -> [A] {
	return ranges.flatMap { range in
		return array[range.clamped(to: array.indices)]
	}
}

let arguments = CommandLine.arguments.dropFirst()
let wanted = arguments.map(index)
while let line = readLine() {
	let words = line.components(separatedBy: .whitespaces)
	let selected = select(wanted, from: words)
	print(selected.joined(separator: " "))
}

The change in requirements has stayed pleasantly local to the two places that produce and consume the modified data. We left the body of the script unmodified since we had omitted the type signature of the changed data and since we didn’t need any other changes in processing.

The size of the script is comparable to the C code, although the two versions support different features (ours supports ranges while the other supports custom word separators) and use entirely different frameworks (the toybox infrastructure is intended for Unix command line tools, while the Swift standard library has only a few bare-bones provisions for them), so take this comparison with a huge grain of salt.

Future Work

I have a bunch of ideas for more work on this tool, including:

If you have more ideas, I’d love to hear about them!