I recently decided to dive into a new bit of learning – creating my own software language interpreter. No, I’ve not gone stark raving mad due to COVID isolation; it is an interesting challenge that I wanted to understand better. Over a year ago, I remember Gus mentioning the process of creating an online book in his blog – that book was Crafting Interpreters, and I kept a reference to that site.
I’ve started working through the book – except that instead of doing the example code in Java, I’m converting the examples and content on the fly and implementing it in Swift. I just worked through implementing the scanner portion of the code. That code required me to read through a text file (or string) and get tokens from it. And as it turns out, this was super easy to do in Swift, but quite non-obvious to me.
Since its inception, the Swift language has changed how it deals with strings, I think a couple of times. Because its strings are Unicode-aware, and a single character can take up more than one byte, you can’t just iterate through a string a byte at a time and get what you want. A lot of early (pre-Swift 3) examples did some of this, or variations on the theme, but that’s no longer valid. So a number of examples on StackOverflow tackling this kind of thing are dead code and very out-of-date.
The first, and most obvious, way to iterate through the characters in a string is to use it within a for loop. This is the pattern that you’ll see directly in The Swift Programming Language Guide on Strings and Characters:
for char in yourString {
    print(char)
}
Works great, super efficient… except that you don’t have a reference to your position in the string, which you need for some of the tricks that parsers and scanners want to do, such as peeking to see what the next character might be.
The way to interact with strings in this fashion involves a specific type called String.Index. The details are in a section just a bit further down in that same guide: Accessing and Modifying Strings.
Don’t make the mistake of thinking a String.Index is just a number that you can add to or subtract from to move around the string. Unicode makes it significantly more tricky than that, and the language represents String.Index as its own separate type, I think partially to keep me (and you) from making that mistake. Fortunately, the standard library offers a couple of properties on each string to give us reference points: startIndex and endIndex. Note that endIndex is the position just past the last character, not the last character itself. You can step forward to the next index with a method on the string: yourString.index(after: someIndex). There’s also a way to step back: yourString.index(before: someIndex). These allow you to step forward (or back) through a string, character by character, or shift the index location forward a step (or two), which you kind of need when you’re making a scanner for a language interpreter.
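To make the peek-and-advance idea concrete, here is a minimal sketch of how a scanner might hold on to a String.Index and move it with the string’s index(after:) method. The SourceCursor type and its peek, peekNext, and advance methods are names I made up for illustration; they are not from the book or the standard library, and a real scanner would need more than this.

```swift
// Hypothetical helper for illustration only; the type and method names are assumptions.
struct SourceCursor {
    let source: String
    var current: String.Index

    init(_ source: String) {
        self.source = source
        self.current = source.startIndex
    }

    // True once the index has moved past the last character.
    var isAtEnd: Bool {
        return current == source.endIndex
    }

    // Look at the current character without consuming it.
    func peek() -> Character? {
        guard !isAtEnd else { return nil }
        return source[current]
    }

    // Look one character past the current one without consuming anything.
    func peekNext() -> Character? {
        guard !isAtEnd else { return nil }
        let next = source.index(after: current)
        guard next != source.endIndex else { return nil }
        return source[next]
    }

    // Consume the current character and step the index forward by one.
    @discardableResult
    mutating func advance() -> Character? {
        guard !isAtEnd else { return nil }
        let char = source[current]
        current = source.index(after: current)
        return char
    }
}

var cursor = SourceCursor("!=")
if cursor.peek() == "!" && cursor.peekNext() == "=" {
    cursor.advance()
    cursor.advance()
    print("found a two-character token")
}
```

Returning an optional Character is just one way to handle running off the end of the string; the important part is that every move goes through the string’s own index methods rather than integer arithmetic.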
One last mechanism that’s helpful to know: you can retrieve a substring from the characters between two index locations, using a range expression made up of the two indices. A substring isn’t quite a string – it’s a different type in Swift, called Substring. Swift plays some tricks with referencing the original string when you’re using this type, but you can easily make a full string to use as an argument for another function:
let newString = String(firstString[indexA...indexB])
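As a rough sketch of how that plays out in a scanner, the snippet below walks an index forward over an identifier and then slices out the text it covered. The sample source string and the offset of 4 are arbitrary choices of mine for illustration. I use the half-open form (start..<end) here because the end index in this loop can land on endIndex, and a closed range that includes endIndex would trap at runtime.

```swift
let source = "var answer = 42"

// Start of the identifier; the offset of 4 is an illustrative assumption.
let start = source.index(source.startIndex, offsetBy: 4)

// Walk forward while the characters are letters.
var end = start
while end != source.endIndex, source[end].isLetter {
    end = source.index(after: end)
}

// Slicing with a range of indices gives a Substring that shares the original storage.
let slice = source[start..<end]

// Convert to a full String before handing it to other code.
let lexeme = String(slice)
print(lexeme)  // prints "answer"
```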
As I mentioned earlier, this is all in the Swift language guide. I hate to admit that it hasn’t been the first place I looked for help and guidance, but it probably should have been. It’s all there, and more.