Perspective Tracking & Illusory 3D
Revisiting techniques for immersion in device-less, interactive media.
1. Background
2. Research & Journey
3. How it Works
4. Thoughts, Plans, & Personal Attempts
5. Use-cases & Implementations
1. Background
I first discovered this technique through Johnny Chung Lee’s viral (10.47M views) 2007 video demonstration:
I’m still totally floored by how smooth and accurate the perspective is in this demo. It’s such a simple technique, but the effect is so powerful.
2. Research & Journey
Okay, but what’s it called?
Most of my time researching this technique was spent trying to find it.
Typically, the demo videos (see below) are filmed in first person, with the camera operator moving around a computer monitor, TV, or projection screen to demonstrate the effect. The illusory space is usually a box or cube that matches the dimensions of the display (see the frustum and matrix projection explanations below).
I couldn’t figure out what it was called, so I had trouble even finding any other examples to see if other people had figured it out or how it was being used now.
Eventually, I found it under many different names.
The ones in italics are the most common, based on my research:
- Off-axis projection
- Camera matrix projection
- Fish Tank VR
- Frustum matrix projection
- Head-coupled perspective
- Desktop VR
- Device-less head tracking
- VR window (or some other combination of ‘tech acronym’ and ‘window’)
Johnny Lee called this ‘Desktop VR’ back in 2007. That was five years before the original Oculus Rift gave us a legitimate, commercially available VR headset.
Here in 2021, the maturation of the physical computing and human-computer interaction spaces necessitates that we standardize these terms, technologies, and their meanings so that we can all be on the same page and lay the groundwork for the future.
In the case of this technique, I’d categorize it as a device-less, illusory augmented reality (AR) or mixed reality (MR) experience, meaning that you don’t need a smartphone or body-mounted display hardware.
I’ve been calling it Perspective Tracking.
3. How it Works
The technique itself uses a lot of math that I truthfully can’t entirely grasp (calculus, trigonometry, and linear algebra).
In essence, here’s what’s happening:
Your typical real-time 3D environment (Unity, Unreal Engine, TouchDesigner, etc.) has virtual cameras. These cameras are modeled on real-world cameras and can emulate true, physically based camera behavior.
In these virtual 3D environments, the camera’s view is defined by what’s called a perspective or projection matrix, and the volume the camera can see takes the form of a square pyramid. The apex of the pyramid is the camera’s origin, and the field of view expands from that point to fill the four sides of the visible frame and the 3D perspective.
The section of this pyramid that we’re interested in is called the frustum: the chunk of the pyramid between two parallel planes (the near and far clipping planes) that “cut” it. In 3D graphics, this chunk is the viewable region. In other words, it’s what’s visible on-screen.
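To make the frustum idea concrete, here’s a minimal sketch of how a standard, symmetric perspective projection matrix is built from a field of view, aspect ratio, and near/far clip planes. It’s written in Python/NumPy rather than any engine’s own code, purely to show the math that Unity, Unreal, and TouchDesigner normally handle for you.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """Standard symmetric perspective projection (OpenGL-style clip space)."""
    # Half-height and half-width of the near plane from the vertical field of view.
    top = near * np.tan(np.radians(fov_y_deg) / 2.0)
    right = top * aspect

    m = np.zeros((4, 4))
    m[0, 0] = near / right                      # x scale (= 2n/(r-l) when symmetric)
    m[1, 1] = near / top                        # y scale (= 2n/(t-b) when symmetric)
    m[2, 2] = -(far + near) / (far - near)      # depth remap into clip space
    m[2, 3] = -2.0 * far * near / (far - near)
    m[3, 2] = -1.0                              # perspective divide by -z
    return m

# Example: a 60-degree, 16:9 camera with near/far planes at 0.1 and 100 units.
print(perspective(60.0, 16 / 9, 0.1, 100.0))
```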
To create the illusion of real-time shifting perspective, the virtual camera is linked to the viewer’s head position in physical space. As the viewer moves around the display, the camera moves with them and its projection is skewed off-axis so the on-screen image always lines up with their actual line of sight, creating the illusion that you’re looking through a window into the virtual space.
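And here is the head-coupled version of the same math: a sketch (again in Python/NumPy, with illustrative screen dimensions and coordinate conventions, not any particular plugin’s API) of the asymmetric, off-axis frustum you get when the screen edges are measured relative to the viewer’s eye. In a real implementation the virtual camera is also translated to the tracked head position; this matrix only handles the skew.

```python
import numpy as np

def off_axis_projection(head, screen_w, screen_h, near, far):
    """
    Asymmetric ("off-axis") frustum for head-coupled perspective.

    `head` is the viewer's eye position in meters, in a coordinate system
    centered on the screen: +x right, +y up, +z out of the screen toward
    the viewer. These conventions are illustrative assumptions.
    """
    hx, hy, hz = head  # hz = distance from the screen plane, must be > 0

    # Screen edges relative to the eye, scaled back onto the near plane.
    left   = (-screen_w / 2.0 - hx) * near / hz
    right  = ( screen_w / 2.0 - hx) * near / hz
    bottom = (-screen_h / 2.0 - hy) * near / hz
    top    = ( screen_h / 2.0 - hy) * near / hz

    m = np.zeros((4, 4))
    m[0, 0] = 2.0 * near / (right - left)
    m[1, 1] = 2.0 * near / (top - bottom)
    m[0, 2] = (right + left) / (right - left)   # horizontal skew
    m[1, 2] = (top + bottom) / (top - bottom)   # vertical skew
    m[2, 2] = -(far + near) / (far - near)
    m[2, 3] = -2.0 * far * near / (far - near)
    m[3, 2] = -1.0
    return m

# Example: a viewer 0.6 m from a 0.6 m x 0.34 m monitor, head offset to the left.
print(off_axis_projection((-0.15, 0.05, 0.6), 0.6, 0.34, 0.05, 10.0))
```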
4. Thoughts, Plans, & Personal Attempts
How I Made my Versions
I began with Josh Ferns’ KinectHolographic Unity plugin and integrated it into a custom Unity HDRP project with my own environment.
This project relies on another plugin from the Unity Asset Store, RF Solutions’ Kinect v2 Samples with MS-SDK, to collect real-time positional data from a Kinect v2.
Once I had these plugins integrated into my project, I spent a lot of time calibrating and tweaking the plugin parameters until the perspective shift, light, animation, and fluidity all felt real. The relative scale of digital objects and the camera settings (notably the warping and distortion from the simulated focal length as the frustum skewed) played a big role in the believability of the illusion.
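As a rough illustration of the kind of calibration this involves, here’s a small Python sketch of the step between the sensor and the camera: shifting the Kinect’s reported head position into screen-centered coordinates and smoothing it to hide per-frame jitter. The offset, axis flip, and smoothing factor are made-up example values, not the actual parameters of either plugin.

```python
import numpy as np

# Hypothetical calibration values: in practice they come from measuring where
# the Kinect physically sits relative to the center of the screen.
KINECT_OFFSET = np.array([0.0, 0.25, 0.0])  # sensor mounted 25 cm above screen center
SMOOTHING = 0.2                              # 0 = frozen, 1 = raw (jittery) data

smoothed = None

def head_in_screen_space(kinect_head_pos):
    """Map a raw Kinect head-joint position into screen-centered coordinates."""
    global smoothed
    p = np.asarray(kinect_head_pos, dtype=float)
    # Shift from the sensor's origin to the screen's origin, and flip x so the
    # mirrored camera image matches the viewer's real movement.
    p = p - KINECT_OFFSET
    p[0] = -p[0]
    # Exponential smoothing to hide per-frame tracking jitter.
    smoothed = p if smoothed is None else (1 - SMOOTHING) * smoothed + SMOOTHING * p
    return smoothed

# Example: feed in a few raw readings and watch the smoothed position settle.
for reading in [(0.10, 0.30, 1.2), (0.12, 0.31, 1.2), (0.11, 0.29, 1.19)]:
    print(head_in_screen_space(reading))
```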
5. Use-cases & Implementations
I really believe that what we’re seeing now is just the beginning of this tech — we’ll be seeing a lot more of this in the very near future.
Recently, similar techniques have been adopted heavily in virtual production for TV and film, most notably on The Mandalorian.
Artists & Creators Working with this Technique
Harvey Moon / MB Labs
Harvey and the team at MB Labs are also pushing this technique forward with really excellent ideas and tests. They’re working on integrating physical objects with the illusory space.
Michael Kozlowski @mpkoz
Michael, in my opinion, is pushing this in the direction it should be going, and his work aligns with how I’m thinking about this technique. Here are some examples of his experiments:
This is a normal computer monitor displaying a normal, live image. A lot of people thought my last few posts were vfx or composited. I’ve been experimenting with a digital variation of a centuries-old illusion technique called trompe l’oeil, which uses a viewer’s perspective to simulate depth. Thanks to modern technology, instead of drawing one static point of view, we can now track someone’s head and update that perspective continuously. This creates the illusion of 3D space on a 2D screen when viewed by a single person.
Refik Anadol @refikanadol
Refik’s studio just very recently started working with this technique by incorporating it into their Data Paintings:
Matt Workman @cinetracer • @cinematographydb
Matt is doing a lot of virtual production work with a similar technique. Rather than using a Kinect like Michael, he uses HTC’s Vive Tracker system (like Refik), but instead of attaching the tracker to a person, he mounts it to a camera rig to film real-time virtual sets.
Plugins & Getting Started
If you’re interested in creating your own version of this illusion and building off of some of the great work that’s already been published, I’ve compiled a short list of plugins and starting places here.
Holo-SDK
is a Unity plugin and toolkit developed by Perception Codes that enables a fast setup for desktop VR using just a webcam.
The Parallax View
is an iOS app developed by Algomystic that uses the TrueDepth (Face ID) camera on iOS and iPadOS devices to achieve the same effect. The Unity/Xcode app is open-source and available on GitHub.
Florian Weidner’s UE4-Plugin-OffAxis
is an Unreal Engine 4 plugin that originally used an OptiTrack motion capture system to achieve the effect. This project uses a clever Blueprint integration to handle the tracking, so in theory it could be adapted to take the head position from another sensor, e.g. a Kinect.
Josh Ferns’ KinectHolographic
is a Unity plugin. (This is what I used for my demos.)
Alternative Holographic Displays
A number of companies are taking a different approach to the illusory display with “holographic” display technologies. These create the same illusion, but through different means.
Lenticular Displays
Lenticular displays use lenticular (double-convex) lenses to show you multiple rendered angles of a virtual scene based on the angle that you’re viewing the display from. This is a popular alternative to a real-time tracked solution as it’s less expensive computationally, doesn’t require a camera or other means of tracking, and works for multiple viewers simultaneously.
Think of them like advanced, moving versions of those holographic kids’ books, toys, or billboards that change images as you move past or look at them from another angle. (The Nintendo 3DS achieved its glasses-free 3D with a closely related approach, a parallax barrier.)
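As a rough sketch of the underlying idea, here’s a tiny Python example of how a lenticular display maps a viewer’s horizontal angle to one of its pre-rendered views. The view count and viewing-cone width are illustrative numbers, not any specific product’s specs.

```python
import numpy as np

def lenticular_view_index(viewer_angle_deg, num_views=45, cone_deg=50.0):
    """
    Pick which pre-rendered view the lens sheet presents for a given
    horizontal viewing angle (0 degrees = straight on).
    """
    # Clamp the angle into the viewing cone, then quantize it to a view slot.
    half = cone_deg / 2.0
    a = np.clip(viewer_angle_deg, -half, half)
    t = (a + half) / cone_deg            # 0.0 (far left) .. 1.0 (far right)
    return int(round(t * (num_views - 1)))

# Example: which view each of these viewing angles would see.
for angle in (-25, -10, 0, 10, 25):
    print(angle, "->", lenticular_view_index(angle))
```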
The biggest player in this space right now is Looking Glass Factory with their holographic lenticular displays.
Sony is also building one that they’re calling the “Spatial Reality Display” for $5000.
Michael Kozlowski, who I mentioned earlier, is hand-building similar holographic lenticular displays and selling them paired with NFT artworks.
Back in May 2021, Google showed an incredible demo of their new Project Starline telepresence system. I believe this is also a lenticular display, but at life-scale.
Logitech is also reportedly prototyping a telepresence booth similar to Project Starline called Project Ghost.
Volumetric Displays
Some of the coolest developing technologies in this space are volumetric displays, such as the Voxon Photonics VX1.
A Voxon display operates much like a 3D printer. We take 3D data, and slice it up into hundreds of layers. Those layers are then projected one at a time onto a specially designed high-speed reciprocating screen. Due to “persistence of vision”, the human eye blends the images together, and the result is a true 3D image that can be viewed in the same way as one would view a real object, from any angle, and without special effects, headgear or glasses.
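Here’s a hedged Python sketch of that slicing step, just to illustrate the principle: a voxel volume is cut into cross-sections along one axis, and each slice would then be flashed as the reciprocating screen passes the matching height. The layer count and toy sphere data are made up for the example.

```python
import numpy as np

def volume_to_slices(volume, num_layers):
    """
    Slice a voxel volume (x, y, z) into `num_layers` cross-sections along z,
    roughly how a swept-volume display decomposes 3D data. Each slice is
    projected when the moving screen reaches the matching height, and
    persistence of vision fuses them into a single 3D image.
    """
    z_size = volume.shape[2]
    indices = np.linspace(0, z_size - 1, num_layers).astype(int)
    return [volume[:, :, z] for z in indices]

# Example: a toy 64^3 volume containing a sphere, cut into 200 layers per sweep.
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
sphere = (x**2 + y**2 + z**2 < 0.5).astype(np.uint8)
layers = volume_to_slices(sphere, 200)
print(len(layers), layers[0].shape)
```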
Physical Pixel / Tactile Displays
As I understand it, there’s no standardized name for these technologies yet because they’re still so experimental, so we can generalize them as physical pixel displays or voxel displays. (I’m working on a separate article about these technologies because I think they’re amazing.) These displays use physical objects to display an image or convey information, such as the BREAKFAST Brixel Sculptures or the experimental 3D display created by physicist David Smalley and his team at Brigham Young University.
There’s also the incredible multimodal acoustic trap display (MATD) developed by scientists at the University of Sussex, which levitates and controls the position of a plastic bead using ultrasound while simultaneously tracking it and projecting an RGB laser at the particle to create an image.
Hologram LED Fan Displays
This method uses RGB LED strips attached to a high-speed rotating motor. By taking advantage of persistence of vision (like the Voxon VX1), the spinning LEDs produce the illusion of an animated image. These displays are super versatile because they come in many sizes and can even be synced together in an array to create a holographic wall.
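As an illustration of that persistence-of-vision trick, here’s a small Python sketch that resamples an ordinary image into per-angle LED columns for a POV fan. The angle and LED counts are arbitrary example values, not any particular product’s specs.

```python
import numpy as np

def image_to_fan_frames(image, num_angles=256, num_leds=64):
    """
    Resample a square image into per-angle LED columns: for each rotation
    angle, read the pixels along the radial line the LED strip will occupy
    at that instant.
    """
    h, w = image.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    radius = min(cx, cy)
    frames = np.zeros((num_angles, num_leds), dtype=image.dtype)
    for a in range(num_angles):
        theta = 2 * np.pi * a / num_angles
        for r in range(num_leds):
            d = radius * r / (num_leds - 1)
            px = int(round(cx + d * np.cos(theta)))
            py = int(round(cy - d * np.sin(theta)))
            frames[a, r] = image[py, px]
    return frames

# Example: a 128x128 grayscale test pattern with a bright square in the middle.
img = np.zeros((128, 128), dtype=np.uint8)
img[32:96, 32:96] = 255
print(image_to_fan_frames(img).shape)  # (256, 64)
```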
Back to Perspective Tracking
Here’s what I think we need to do to take this further:
1) Device-less & Integrated
The simpler and more unobtrusive the setup, the greater the immersion and the more complete the suspension of disbelief. Think: a Kinect or other depth/CV solution for tracking face and head position. The user wouldn’t need a Vive tracker, IR glasses, body-mounted hardware, a smartphone, etc.
2) Amorphous ‘window’ shapes
Get away from the illusion being tied to the display dimensions. Think interactive projection-mapping on non-square objects/surfaces, inside objects, creating new illusory spaces using the perspective.
3) Interaction/Interface/Depth
Hand-tracking interaction for hand presence and manipulation of the illusory 3D space. (Michael Kozlowski has done some of this in recent experiments.)
4) Stylized & Photorealistic Media
With developments in real-time raytracing and technologies like Lumen in Unreal Engine 5, I’d love to see some demos that very closely mimic light & shadow using real-time lighting to totally suspend disbelief and push the illusion further.
It would be very cool to dynamically match the ‘virtual’ light to the room light, based on the time of day or the user shining a ‘light’ into the scene. Imagine a kinetic digital sculpture embedded in your wall whose shadow changes with the Sun over the course of the day, or an interactive game where you need to shine a flashlight to see inside the volume.
5) Screen Technology Developments
The next logical step is using high-refresh displays (120Hz+) to sell the effect and reduce jitter and perceived lag. Resolution/visual acuity would of course also be significant depending on the physical size of the experience and how close the viewer is able to get.
True high-output HDR would also significantly heighten the realism, especially for these kinds of “virtual window” implementations.