Today, we’re excited to introduce a groundbreaking update to SnappyCam: full-sensor capture up to 8 Mpx at an incredible 20 pictures/sec. That’s faster than any other mobile platform, including Android, and 4x faster than any other camera app on iOS.
Full-frame continuous shooting is the holy grail of smartphone photography. It’s what makes DSLRs great, and what professional photographers rely on to get that winning shot. As smartphone camera sensors improve, so must the hardware and software that drive them. As an app developer, the best we can do is full-sensor capture, which uses every pixel on the camera sensor to produce photos of the highest quality and widest field of view.
Let’s put this into perspective. The closest competitor to the iPhone 5 is the Samsung Galaxy S4. Its “best shot” camera feature shoots up to 20 pictures continuously at full-sensor resolution, with each photo 0.1333 sec apart: an average of just 7.5 photos/sec. SnappyCam delivers more than 2.5x that continuous shooting rate on the iPhone 5 (20 vs. 7.5 pictures/sec), in pure software, on a hardware rig that’s 10 months older than the Galaxy S4.
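The comparison above is simple arithmetic, and a few lines make it easy to check. This snippet uses only the figures quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text above.
galaxy_s4_interval_s = 0.1333   # seconds between consecutive shots
snappycam_rate = 20.0           # pictures/sec on the iPhone 5

# Shots per second is the reciprocal of the interval between shots.
galaxy_s4_rate = 1.0 / galaxy_s4_interval_s

# Ratio of the two continuous shooting rates.
speedup = snappycam_rate / galaxy_s4_rate

print(f"Galaxy S4: {galaxy_s4_rate:.1f} pictures/sec")
print(f"SnappyCam: {speedup:.2f}x the Galaxy S4 rate")
```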
Full-sensor shooting also works well on the older iPhone 4S: 12 pictures/sec at 8 Mpx, or 15 pictures/sec at 5 Mpx. The competing Samsung Galaxy S3 tops out at just 3.3 full-sensor pictures/sec, even though the iPhone 4S is 7 months older than the Galaxy S3.
This is a big deal: even with older hardware, the iPhone beats competing mobile platforms in the race toward the DSLR continuous shooting experience.
Today the iPhone can be officially crowned King of Speed.
At the core of SnappyCam is a capture and image signal processing engine with innovations that took over 12 months of research and development. With it, we also shoot full-sensor 4x faster than competing iOS camera apps on the same iOS device and hardware.
Once photos are captured and buffered in real time, our multi-threaded JPEG compression engine takes over. It compresses shots in software at speeds that exceed those of the hardware encoder normally dedicated to the task.
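The general shape of a buffer-then-compress pipeline can be sketched as follows. This is a minimal illustration, not SnappyCam's engine: `zlib` stands in for the JPEG encoder, and the names `compress_worker` and `run_pipeline` are hypothetical.

```python
import queue
import threading
import zlib


def compress_worker(frames, results):
    """Pull (index, raw_bytes) items off the buffer until a None sentinel arrives."""
    while True:
        item = frames.get()
        if item is None:
            break
        index, raw = item
        # zlib is only a stand-in here; the real engine would emit JPEG data.
        results[index] = zlib.compress(raw)


def run_pipeline(raw_frames, num_workers=2):
    """Buffer frames in a queue and compress them on several worker threads."""
    frames = queue.Queue()
    results = {}
    workers = [
        threading.Thread(target=compress_worker, args=(frames, results))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    # "Capture" side: frames are buffered as they arrive, without waiting
    # for compression to finish.
    for index, raw in enumerate(raw_frames):
        frames.put((index, raw))
    # One sentinel per worker tells it to stop once the buffer is drained.
    for _ in workers:
        frames.put(None)
    for worker in workers:
        worker.join()
    # Return compressed frames in capture order.
    return [results[i] for i in range(len(raw_frames))]
```

The point of the design is that capture only has to enqueue a frame, so the shooting rate is decoupled from how long each frame takes to compress, as long as the buffer has room.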
We had to reinvent JPEG to do it. First we studied the fast discrete cosine transform (DCT) algorithms of the early 1990s, when JPEG was first introduced. We then extended some of that research to create a new algorithm that’s a good fit for the ARM NEON SIMD co-processor instruction set architecture. The final implementation comprises nearly 10,000 lines of hand-tuned assembly code, and over 20,000 lines of low-level C code. (In comparison, the SnappyCam app comprises almost 50,000 lines of Objective-C code.)
At first we did try to leverage the iPhone graphics processing unit (GPU) for the DCT computation. It turned out to be a dead end. Back then, iOS 4 limited the data transfer speed in and out of the GPU. Even with that limitation eliminated by the introduction of OpenGL pixel buffers in iOS 5, it appeared that the GPU parallelism was limited to about two render units that ran at a slower clock rate than the main CPU. Without support for OpenCL or multiple render targets, we were also forced to use a naive (slow) DCT algorithm that was essentially a full matrix multiplication.
The ARM NEON approach was optimal: the SIMD pipeline can perform up to 8 arithmetic operations in parallel at the full clock rate of the device, with no data transfer overhead, and it let us use any DCT algorithm we could conceive. When it comes to speed, it’s all about doing more with less: less computation, more work done, faster.
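To see why the choice of DCT algorithm matters, here is a small sketch that compares the naive full-matrix form with a row-column (separable) form on one 8x8 block. It is a textbook illustration, not the SnappyCam algorithm: it uses plain Python floats rather than NEON, and the function names are hypothetical.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_basis():
    """Orthonormal 8-point DCT-II matrix: C[k][n] = a(k) * cos((2n+1) k pi / 16)."""
    basis = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        basis.append(
            [scale * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
        )
    return basis


C = dct_basis()


def dct2d_naive(block):
    """Direct 2-D DCT: each of the 64 outputs sums over all 64 inputs."""
    out = [[0.0] * N for _ in range(N)]
    mults = 0
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += C[u][y] * C[v][x] * block[y][x]
                    # Counted as one multiply per term, as if the 64x64
                    # kernel C[u][y] * C[v][x] had been precomputed.
                    mults += 1
            out[u][v] = acc
    return out, mults


def dct1d(vec):
    """8-point DCT as a matrix-vector product (64 multiplies)."""
    return [sum(C[k][n] * vec[n] for n in range(N)) for k in range(N)]


def dct2d_separable(block):
    """Row-column 2-D DCT: 1-D DCT on every row, then on every column."""
    rows = [dct1d(row) for row in block]
    cols = [dct1d([rows[y][v] for y in range(N)]) for v in range(N)]
    # cols[v][u] holds output (u, v); transpose back to out[u][v].
    out = [[cols[v][u] for v in range(N)] for u in range(N)]
    mults = 2 * N * N * N  # 16 one-dimensional transforms of 64 multiplies each
    return out, mults


block = [[(x * y + 3 * x) % 17 for x in range(N)] for y in range(N)]
naive_out, naive_mults = dct2d_naive(block)
sep_out, sep_mults = dct2d_separable(block)
print(naive_mults, sep_mults)
```

Both functions produce the same coefficients, but the separable form needs a quarter of the multiplications. Published fast DCT algorithms reduce the count further still by replacing each 8-point matrix-vector product with a factored butterfly network.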
JPEG compression comprises two parts: the DCT (above), and a lossless Huffman compression stage that forms a compact JPEG file. Once we had developed a blazing-fast DCT implementation, the Huffman stage became the bottleneck. We innovated on that portion with tight hand-tuned assembly code that leverages special features of the ARM processor instruction set to make it as fast as possible.
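For readers unfamiliar with the Huffman stage, the sketch below shows the per-coefficient work that baseline JPEG requires before any Huffman table lookup: each nonzero AC coefficient becomes a run/size symbol plus a few amplitude bits. This follows the baseline JPEG format, not SnappyCam's assembly, and the function names are hypothetical. The text does not say which ARM features were used; as an assumption only, the bit-length step is the kind of operation a count-leading-zeros instruction can do in one step.

```python
def magnitude_category(value):
    """Number of bits needed for abs(value); JPEG calls this the size category."""
    return abs(value).bit_length()


def amplitude_bits(value, size):
    """Positive values are stored as-is; negative ones as value - 1 in `size` bits."""
    return value if value >= 0 else value + (1 << size) - 1


def ac_symbols(ac_coeffs):
    """Map 63 zigzag-ordered AC coefficients to (run/size symbol, amplitude bits)."""
    symbols = []
    run = 0  # zeros seen since the last nonzero coefficient
    last_nonzero = max((i for i, c in enumerate(ac_coeffs) if c != 0), default=-1)
    for coeff in ac_coeffs[: last_nonzero + 1]:
        if coeff == 0:
            run += 1
            continue
        while run >= 16:
            symbols.append((0xF0, 0))  # ZRL: a run of 16 zeros
            run -= 16
        size = magnitude_category(coeff)
        symbols.append(((run << 4) | size, amplitude_bits(coeff, size)))
        run = 0
    if last_nonzero < len(ac_coeffs) - 1:
        symbols.append((0x00, 0))  # EOB: the rest of the block is zero
    return symbols


example = ac_symbols([5, 0, 0, -3] + [0] * 59)
print([(hex(symbol), bits) for symbol, bits in example])
```

Each symbol would then be replaced by its Huffman code and packed into the output bitstream together with the amplitude bits. Because this loop runs for every block of every photo, small per-coefficient savings add up quickly.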
Similar innovations were put into a custom JPEG decoder, powering the unique SnappyCam thumb-to-interact living photo viewer. When dealing with massive 8 Mpx (32 MByte BGRX uncompressed) images, decoder performance became critical to a great user experience.
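The 32 MByte figure follows from the pixel count at four bytes per pixel. The sketch below assumes a 3264 x 2448 frame, the usual dimensions of an 8 Mpx 4:3 sensor; the text itself does not state the exact resolution.

```python
# Assumed full-sensor dimensions for an 8 Mpx sensor (not stated in the text).
WIDTH, HEIGHT = 3264, 2448
BYTES_PER_PIXEL = 4  # one byte each for B, G, R and the unused X channel

pixels = WIDTH * HEIGHT
frame_bytes = pixels * BYTES_PER_PIXEL

print(f"{pixels / 1e6:.2f} Mpx, {frame_bytes / 1e6:.1f} MByte uncompressed")
```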
We’re also hopeful that rumors of 120 picture/sec capabilities come true!
Do follow me on Twitter: @jpap