SMT Solving on an iPhone

5 November 2018

Why buy an expensive desktop computer when your iPhone is a faster SMT solver?

A few days ago, I tweeted this:

I’ve been seeing discussion for a while about the incredible progress
Apple’s processor design team is making,
and how it won’t be too long until Macs use Apple’s own ARM processors.
These reports usually cite some cross-platform benchmarks
like Geekbench to show that Apple’s mobile processors
are at least as fast as Intel’s laptop and desktop chips.
But I’ve always been a little skeptical
of these cross-platform benchmarks
(as are others)—do they really represent
the sorts of workloads I use my Macs for?

As a formal methods researcher,
the only real compute-intensive workload I run regularly
is SMT solving, usually the Z3 SMT solver.
At this point I’ve spent a lot of time
learning about Z3’s performance characteristics,
and it also has some peculiarities that benchmark suites won’t capture
(for example, Z3 is generally single-threaded).
I recently bought a new iPhone XS
featuring Apple’s latest A12 processor.
So, in a fit of procrastination,
I decided to cross-compile Z3 to iOS,
and see just how fast my new phone (or hypothetical future Mac) is.

The first test

Cross-compiling Z3 turns out to be remarkably simple,
with just a few lines of code changes necessary;
I open-sourced the code so you can run Z3 on your own iOS device.
For benchmarks, I drew a few queries
from my recent work on profiling symbolic evaluation,
extracting the SMT output generated by Rosette in each case.

As a first test, I compared my iPhone XS
to one of my desktop machines, which uses an Intel Core i7-7700K—the best
consumer desktop chip Intel was selling when we built the machine 18 months ago.
I expected the Intel chip to win quite handily here,
but that’s not how things turned out:

The iPhone XS was about 11% faster on this 23-second benchmark!
This is the result I tweeted about,
but Twitter doesn’t leave much room for nuance,
so I’ll add some here:

  • This benchmark is in the QF_BV fragment of SMT,
    so Z3 discharges it using bit-blasting and SAT solving.
  • This result holds up pretty well
    even if the benchmark runs in a loop 10 times—the iPhone
    can sustain this performance and doesn’t seem thermally limited.
    That said, the benchmark is still pretty short.
  • Several folks asked me if this is down to non-determinism—perhaps
    the solver takes different paths on the different platforms,
    due to use of random numbers or otherwise—but I checked
    fairly thoroughly using Z3’s verbose output and that doesn’t seem to be the case.
  • Both systems ran Z3 4.8.1, compiled by me using Clang with the same
    optimization settings.
    I also tested on the i7-7700K using Z3’s prebuilt binaries (which use GCC),
    but those were actually slower.
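
The sustained-performance check in the second bullet can be sketched as a small timing harness. This is a minimal illustration in Python, not the harness used for the measurements above: `time_runs`, `slowdown`, and `toy_workload` are hypothetical names, and the toy workload only stands in for a real solver invocation.

```python
import time
from statistics import median


def time_runs(workload, repeats=10):
    """Run `workload` back to back and return the wall-clock time of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def slowdown(timings):
    """Median of the later half of the runs divided by the median of the earlier half.

    A value well above 1.0 would mean later runs are slower, which is
    what thermal throttling would look like. Needs at least two timings.
    """
    half = len(timings) // 2
    return median(timings[half:]) / median(timings[:half])


def toy_workload():
    # Placeholder for a solver call (assumption): any deterministic,
    # CPU-bound task works for demonstrating the harness.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    runs = time_runs(toy_workload, repeats=10)
    print(f"per-run seconds: {[round(t, 4) for t in runs]}")
    print(f"late/early slowdown: {slowdown(runs):.2f}")
```

Comparing medians of the early and late runs, rather than just the first and last run, keeps a single noisy measurement from dominating the result.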

What’s going on?

How could this be possible?
The i7-7700K is a desktop CPU;
when running a single-threaded workload,
it draws around 45 watts of power
and clocks at 4.5 GHz.
In contrast, the iPhone was unplugged, probably draws less than 10% of that power,
and runs (we believe) somewhere in the 2 GHz range.
Indeed, after benchmarking I checked the iPhone’s battery usage report,
which said Slack had used 4 times more energy than the Z3 app
despite less time on screen.

Apple doesn’t expose enough information to understand Z3’s performance on the iPhone,
but luckily, Intel does for their desktop processor.
I spent some time poking around using VTune
to see where the bottlenecks were when running Z3 on the desktop.
As Mate Soos observes,
most SAT solving time is spent in propagation,
which is very cache-sensitive.
VTune agrees, and says that Z3 spends a lot of time
waiting on memory
while iterating through watched literals.
So the key to performance here seems to be
cache size and memory latency.
This effect might explain why the iPhone is so strong on this benchmark—the A12 chip
has a gigantic, low-latency L2 cache,
and also seems to have better memory latency after a cache miss compared to the 7700K.
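
To make the memory-access pattern concrete, here is a minimal sketch of unit propagation with two watched literals, the general scheme the paragraph above refers to. It is written in Python for readability and is not Z3's implementation; the clause encoding (signed integers, DIMACS style) and the names `propagate`, `watches`, and `value` are assumptions made for this example. Every assignment triggers a walk over the watch list of the literal that just became false, and each clause visited there may live anywhere in memory, which is where the cache misses come from in a real solver.

```python
from collections import defaultdict


def value(lit, assignment):
    """Return True/False if `lit` is assigned, or None if it is unassigned."""
    v = assignment.get(abs(lit))
    if v is None:
        return None
    return v if lit > 0 else not v


def propagate(clauses, assumptions):
    """Unit propagation with two watched literals.

    `clauses`: lists of distinct non-zero ints, each with at least two literals.
    `assumptions`: literals to set true before propagating.
    Returns the resulting assignment (variable -> bool), or None on conflict.
    """
    clauses = [list(c) for c in clauses]  # copy: literals get reordered
    watches = defaultdict(list)  # literal -> indices of clauses watching it
    for idx, clause in enumerate(clauses):
        # Invariant: clause[0] and clause[1] are the two watched literals.
        watches[clause[0]].append(idx)
        watches[clause[1]].append(idx)

    assignment = {}
    queue = []  # literals that became true and still need processing

    def assign(lit):
        current = value(lit, assignment)
        if current is False:
            return False  # conflict: literal is already false
        if current is None:
            assignment[abs(lit)] = lit > 0
            queue.append(lit)
        return True

    for lit in assumptions:
        if not assign(lit):
            return None

    while queue:
        false_lit = -queue.pop()
        watching = watches[false_lit]
        watches[false_lit] = []  # rebuilt as clauses are visited
        for pos, idx in enumerate(watching):
            clause = clauses[idx]
            if clause[0] == false_lit:  # keep the false literal in slot 1
                clause[0], clause[1] = clause[1], clause[0]
            if value(clause[0], assignment) is True:
                watches[false_lit].append(idx)  # clause already satisfied
                continue
            for k in range(2, len(clause)):
                if value(clause[k], assignment) is not False:
                    # Found a replacement watch; move it into slot 1.
                    clause[1], clause[k] = clause[k], clause[1]
                    watches[clause[1]].append(idx)
                    break
            else:
                # No replacement: the clause is unit or conflicting.
                watches[false_lit].append(idx)
                if not assign(clause[0]):
                    watches[false_lit].extend(watching[pos + 1:])
                    return None
    return assignment
```

For example, `propagate([[-1, 2], [-2, 3]], [1])` derives that variables 2 and 3 must be true, while `propagate([[-1, 2], [-1, -2]], [1])` reports a conflict by returning `None`.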

The rapid march of Apple silicon

To test whether that diagnosis is correct,
I ran a broader experiment,
gathering all the Apple devices I could get my hands on.
I also chose a benchmark that runs about 10 times longer (i.e., 4 minutes on a desktop)
to mitigate any concerns about mobile burst performance.

Here are the results for the devices I gathered,
graphed according to their release date,
and relative to the Apple A7,
which was their first 64-bit custom CPU design:

The first thing to note is that the i7-7700K desktop processor
beats the iPhone XS on this different, longer benchmark.
But the iPhone is incredibly competitive, falling in between
the 7700K and its predecessor, the i7-6700K,
which was the fastest consumer desktop processor until just under two years ago.

For fun, I also added the Intel Core m7-6Y75,
which is the processor in my 2016 MacBook.
The iPhone XS is about 50% faster than my laptop at running Z3.

The really remarkable thing here is the trend for Apple: a fairly
consistent 30% year-on-year improvement for this Z3 benchmark.
Obviously we shouldn’t draw too many conclusions from this one silly benchmark,
but it seems like it will only take one or two more iterations of this trend
for Apple CPUs to make total sense for my workloads.
I honestly didn’t expect it to be this close—modern smartphone architectures
are incredible!
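
As a back-of-the-envelope illustration of how a steady 30% annual improvement compounds, the sketch below computes the cumulative speedup after a number of generations and how many generations it takes to close a given gap. The 30% rate comes from the trend above; the function names are hypothetical, and the 1.5x gap in the usage example is a made-up number for illustration, not a measurement.

```python
def cumulative_speedup(rate, generations):
    """Speedup after `generations` steps of a fixed per-generation `rate`."""
    return (1.0 + rate) ** generations


def generations_to_close(gap, rate):
    """Smallest whole number of generations whose compounded speedup reaches `gap`.

    `gap` is how many times faster the target is (1.5 means 50% faster);
    `rate` must be positive.
    """
    generations = 0
    speedup = 1.0
    while speedup < gap:
        speedup *= 1.0 + rate
        generations += 1
    return generations


if __name__ == "__main__":
    # Five generations separate the A7 from the A12.
    print(f"{cumulative_speedup(0.30, 5):.2f}x after five generations")
    # Hypothetical gap of 1.5x, chosen only for illustration.
    print(f"{generations_to_close(1.5, 0.30)} generations to close a 1.5x gap")
```

If the 30% rate held exactly, five generations would compound to roughly a 3.7x improvement, and any gap of up to about 1.69x would close within two generations.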

Thanks to Meghan Cowan, Max Willsey, and Eddie Yan for helping me track down more devices and run experiments.
