RealityKit on macOS

Guessing which frameworks are going to be updated, and which aren’t, is — I think — part of developing software on Apple platforms. Sometimes it’s clear based on what’s been updated over the past three or four years, as is the case with RealityKit. I started my experiments with SceneKit, another lovely high-level API for 3D graphics, but wanted to use more of RealityKit for some of my procedural graphics work (Lindenmayer trees).

RealityKit has a huge API surface, only some of it directly related to 3D rendering. Other parts include physics simulations (including rigid body collisions), an ECS, and — as of WWDC’21 — methods for procedural geometry/mesh creation. RealityKit is fundamentally an API meant to mix 3D rendered content into live images from the real world. Because of that, the camera for rendering 3D content is tightly controlled to match the physical position and orientation of the actual camera on a device. So it isn’t surprising that you might think, “Hmmm… how’s that going to work on macOS, which only has a forward-facing camera and few to none of the fancy motion and depth sensors an iOS device has?”

The good news is that RealityKit gained the ability to run on macOS starting with macOS 10.15. It includes an ARView for macOS (even though there’s no ARKit) inside RealityKit. It’s more constrained than the UIKit version, but it’s there. After a comment from James Thomson on a podcast last year about getting that view working on macOS, I wanted to do the same. It’s relatively easy to get an ARView showing up, but figuring out what to do with it from there is not as obvious. It turns out that the macOS-only ARView doesn’t respond to mouse, touch, or trackpad input, so user interactions don’t influence the camera position. You can’t move around or look at different things without code that explicitly moves the camera. If you use the UIKit ARView on macOS built with Mac Catalyst, it’s a different story. This post is about mixing AppKit and RealityKit together without using Mac Catalyst and the UIKit APIs.

Once I’d convinced myself this wasn’t entirely a fool’s errand, I wrangled up the code to enable user interaction to move the camera. I made my own subclass of ARView and added keyboard, trackpad, and mouse support to move the camera. It isn’t identical to how SceneKit works, but it’s sufficient to display content and look around the environment. I wanted something where I could load up and view content using RealityKit, but running on macOS. In particular, I wanted to load a USD file and grab enough screen captures to render the set of images into an animated GIF. As a side note, there’s no easy and consistently clean way to describe a 3D object in documentation, but a lot of HTML-based mechanisms (including DocC, which I’ve been writing about lately) have no trouble displaying an animated GIF.

The end result took a bit, but I got there (the fish in question is from WWDC’21 sample code):

While trying to get this working, I learned that the ARView implementation for macOS that comes with RealityKit appears to swallow mouse events, which meant there wasn’t an easy way to control it from SwiftUI views using gesture recognizers. I switched to the older mouse and keyboard interaction APIs that AppKit supports, making a subclass that adds the various overrides to provide mouse and keyboard event handling. After I got it all working, I extracted the SwiftUI wrapper. Finally, I put the subclass into its own Swift package, available for anyone to use:

CameraControlARView (API Docs).
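The event-handling approach looks roughly like the sketch below: subclass the macOS ARView, accept first-responder status, and override the NSResponder methods that AppKit calls for mouse, scroll, and keyboard input. The class name and the camera-update bodies here are illustrative stand-ins, not the package’s actual implementation.

```swift
import AppKit
import RealityKit

// Sketch only: the stock macOS ARView swallows mouse events, so a subclass
// overrides the NSResponder entry points to capture input itself.
class InteractiveARView: ARView {
    // Accept first-responder status so keyDown(with:) is delivered to us.
    override var acceptsFirstResponder: Bool { true }

    override func mouseDragged(with event: NSEvent) {
        // deltaX/deltaY report movement since the last event; feed them
        // into whatever camera rotation model you're using.
        let dx = Float(event.deltaX)
        let dy = Float(event.deltaY)
        print("rotate camera by \(dx), \(dy)")
    }

    override func scrollWheel(with event: NSEvent) {
        // Trackpad scroll / mouse wheel is a natural fit for zoom.
        let zoomDelta = Float(event.scrollingDeltaY)
        print("zoom by \(zoomDelta)")
    }

    override func keyDown(with event: NSEvent) {
        // Map keys to camera translation for a first-person style mode.
        switch event.characters {
        case "w": print("move forward")
        case "s": print("move backward")
        default: super.keyDown(with: event)
        }
    }
}
```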

If you’re interested in using RealityKit on macOS, you’re welcome to use it – or to pull it apart to see how it works and use that to do something you prefer. My goal in making this tiny open-source package is to help the next soul who’s stumbling through figuring out how to get started with RealityKit and wanting to experiment a bit on macOS to see how things work.

Design decisions and technical choices

I made a (perhaps) odd choice when I assembled the SwiftUI container for it: you create the instance of my subclass either in a SwiftUI view, or hold it outside the view and pass it through. The point was to have a reference to the ARView (and its properties) available from the declarative space of SwiftUI, so that you could interact with the 3D scene and the entities within it from the SwiftUI control declarations. I didn’t see another straightforward path to creating the scene (such as a SceneKit Scene) externally and passing it in, and in order to do on-the-fly file loading (such as when you drop a USDZ file into the view) I needed access to that scene.
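The pattern of holding the view externally and passing it through can be sketched with an NSViewRepresentable like the one below. This is a minimal illustration of the idea, not the package’s actual wrapper; the view names and the box-anchoring code in the usage example are my own stand-ins.

```swift
import SwiftUI
import RealityKit

// The ARView instance is created by the caller and passed in, rather than
// being constructed inside makeNSView, so SwiftUI code retains a reference
// to the view (and its scene) and can mutate it declaratively.
struct ARViewContainer: NSViewRepresentable {
    let arView: ARView

    func makeNSView(context: Context) -> ARView { arView }
    func updateNSView(_ nsView: ARView, context: Context) {}
}

// Usage sketch: the owning view keeps the reference, so it can add
// anchors/entities (or load a dropped USDZ file) after the fact.
struct ContentView: View {
    let arView = ARView(frame: .zero)

    var body: some View {
        ARViewContainer(arView: arView)
            .onAppear {
                let anchor = AnchorEntity(world: .zero)
                anchor.addChild(ModelEntity(mesh: .generateBox(size: 0.2)))
                arView.scene.addAnchor(anchor)
            }
    }
}
```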

The other design choice I made was to have the camera motion support two different modes. The first mode is what’s called “arc-ball”: the camera orbits around, looking at a specific point in 3D space from a variety of rotations. You can imagine it as following arcs around a sphere centered on that point. I also wanted a mode where I could move about freely, which I called “first person” mode. In first person mode, you can move the camera forward, back, and side to side, as well as turn to look elsewhere. There are a couple of other popular 3D movement modes out there (turn-table being one I considered, but decided not to attempt), but I stuck to these two modes.
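The arc-ball idea reduces to a small bit of quaternion math: compose the two orbit rotations, then back the camera away from the target along the rotated axis. The function and parameter names below are my own assumptions for illustration, not the package’s actual properties.

```swift
import simd

// Sketch of arc-ball camera math: given an inclination and azimuth around a
// target point, at a given radius, produce the camera's world transform.
func arcballTransform(target: SIMD3<Float>,
                      radius: Float,
                      inclination: Float,
                      azimuth: Float) -> float4x4 {
    // Compose the orbit as yaw (around Y) then pitch (around X).
    let rotation = simd_quatf(angle: azimuth, axis: [0, 1, 0]) *
                   simd_quatf(angle: inclination, axis: [1, 0, 0])
    // Back away from the target along the rotated +Z axis, so the camera
    // (which looks down -Z) faces the target.
    let position = target + rotation.act(SIMD3<Float>(0, 0, radius))
    var transform = float4x4(rotation)
    transform.columns.3 = SIMD4<Float>(position, 1)
    return transform
}
```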

The rest is all pretty straightforward. I’d fortunately become more comfortable with the gems in the Accelerate/simd framework, which provides lovely, quick tools for creating quaternions and matrices. I still ended up embedding a few “build this rotation matrix from this angle around that axis” kinds of methods – stuff that’s handled under the covers in SceneKit, but I’m not aware of a “central” implementation in the platform tooling otherwise. It’s not hard, per se, just tricky to make sure you get it right.
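For what it’s worth, the angle-around-axis helpers mentioned above mostly collapse to a one-liner with simd, since a quaternion converts directly to a 4x4 matrix. This is a sketch of that shape, not the package’s exact code; note that simd_quatf(angle:axis:) expects a unit-length axis, hence the normalize.

```swift
import simd

// Build a 4x4 rotation matrix from an angle around an arbitrary axis.
// simd does the heavy lifting; normalizing the axis is on you.
func rotationMatrix(angle: Float, around axis: SIMD3<Float>) -> float4x4 {
    float4x4(simd_quatf(angle: angle, axis: simd_normalize(axis)))
}

// e.g. a quarter turn around the Y axis:
let quarterTurnY = rotationMatrix(angle: .pi / 2, around: [0, 1, 0])
```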

Generating Animated GIF from 3D models

I went ahead and put the macOS app project up publicly on GitHub as well: Film3D. It’s far from a complete app: it doesn’t even have a placeholder icon (as I’m writing this), and the user interface is a ludicrous “just get it working” sort of thing. But it works to load a USDZ file, set the camera, and capture images into an animated GIF.
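The GIF-assembly step itself can be done with ImageIO: collect a CGImage per captured frame, then write them all to a CGImageDestination with GIF loop and per-frame delay properties. The sketch below shows that shape under my own assumptions (the function name is hypothetical, and how you obtain each frame, e.g. via ARView’s snapshot API, is up to you).

```swift
import Foundation
import ImageIO
import UniformTypeIdentifiers

// Write an array of frames out as an animated GIF.
// frameDelay is the per-frame display time in seconds.
func writeAnimatedGIF(frames: [CGImage], to url: URL, frameDelay: Double) -> Bool {
    guard let destination = CGImageDestinationCreateWithURL(
        url as CFURL, UTType.gif.identifier as CFString, frames.count, nil
    ) else { return false }

    // File-level properties: a loop count of 0 means "loop forever".
    let fileProperties = [
        kCGImagePropertyGIFDictionary as String:
            [kCGImagePropertyGIFLoopCount as String: 0]
    ] as CFDictionary
    CGImageDestinationSetProperties(destination, fileProperties)

    // Frame-level properties: the delay before showing the next frame.
    let frameProperties = [
        kCGImagePropertyGIFDictionary as String:
            [kCGImagePropertyGIFDelayTime as String: frameDelay]
    ] as CFDictionary
    for frame in frames {
        CGImageDestinationAddImage(destination, frame, frameProperties)
    }
    return CGImageDestinationFinalize(destination)
}
```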

There’s more I’d like to do with it, but I’m not planning on selling it – I just wanted something that does what it does so that I can use it to create images for documentation.

Published by heckj

Developer, author, and life-long student.
