How Oculus uses AI for Hand Tracking

ai ar artificial intelligence vr Apr 16, 2020

Oculus has been working on hand tracking for quite some time now. The idea was dismissed as impractical by some and as too difficult by others. It therefore came as a surprise to many when Oculus announced that it had built hand tracking software and that it would be rolled out on the Oculus Quest next year. The tech is the brainchild of Facebook Reality Labs [1] and Oculus [2], who worked together to develop a state-of-the-art system, one of the only fully articulated hand tracking [3] systems for Virtual Reality [4] that does not need help from expensive hardware or sensors. Conventionally, the technology has always relied on:

Depth Sensors
Specialized Gloves
Cables etc.

Points Detection with monochromatic Cameras

How it is Better than Everything that came Before

The new tech has a lot to offer consumers of VR technologies. The new Oculus headset will be free of any cable attachments and will be the first of its kind to incorporate tracking based solely on computer vision, using monochrome cameras and no additional equipment. Such equipment not only means expensive hardware; gloves and the like also make the user experience unnatural and the tracking process cumbersome. The system uses the headset's four cameras [5] together with new machine learning models that can track the depth and position of the hands with high accuracy. This lets the new device improve on existing technologies on all the important parameters:

Reduced size: a smaller device makes for a more natural user experience.
Weight: this is critical, since the device is meant to be worn for long periods.
Power: users can spend more time with the device, including in applications and games that take longer to finish.
Cost: getting rid of the expensive hardware and sensors brings the cost down.
On-device processing: one of the most outstanding features of the new tech is that, despite the use of high-end artificial intelligence for hand tracking, the processing is optimized and happens on the device itself.

How it all Works

Tracking has been the most important component of VR/AR [6]: applications and equipment that can track the relative position of the user can then project the image or the augmented world back to that user. Positioning has mostly been dominated by conventional means of tracking. Where computer vision was used at all, it was limited to conventional trackers, which came with plenty of built-in problems and challenges [7] such as occlusion, background clutter [8], illumination changes and scale variations, to name a few.
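To make the illumination problem concrete, here is a minimal sketch, assuming a conventional tracker that matches a stored template against image patches by raw intensity. The article does not say which conventional trackers it has in mind, so template matching is only an illustrative stand-in, and the pixel values are made up. It compares the sum of squared differences (SSD) with zero-mean normalized cross-correlation (ZNCC) when the same patch is simply lit more brightly.

```python
from math import sqrt

def ssd(a, b):
    """Sum of squared differences: 0 means identical intensities."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def zncc(a, b):
    """Zero-mean normalized cross-correlation in [-1, 1]; 1 is a perfect match."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

# Hypothetical 1D "patch" of pixel intensities and a brighter view of it.
template = [10.0, 20.0, 30.0, 40.0]
brighter = [1.5 * v + 5.0 for v in template]  # same pattern, more light

ssd_same = ssd(template, template)      # identical patch
ssd_bright = ssd(template, brighter)    # same content, different lighting
zncc_bright = zncc(template, brighter)  # normalized score for the same pair
```

With these numbers the raw SSD jumps from 0 to 1350 even though the underlying pattern is unchanged, while the normalized score stays at 1. Normalization only covers uniform brightness changes, though; occlusion, clutter and scale changes need other remedies, which is part of why learned models are attractive here.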

These challenges hindered more comprehensive use of the technology. Oculus is using deep neural networks with SLAM [9] (simultaneous localization and mapping), a technique that Facebook has been developing in a number of lightweight implementations for tracking and similar tasks on mobile devices. These architectures follow a distinct tracking pipeline, which includes:

Prediction and localization of the hands, including their important features such as the joints, fingertips etc.
These key points on the hand are used to create a 26 degree-of-freedom pose of the person’s hand, where the distinct points and features help pinpoint the exact location of the fingers and other important joints. (The architecture here is similar to the PoseNets [10] used for posture detection.)

Posenet Detections can be applied to Hand Tracking.

The data is then processed further, and a second network helps construct the 3D model, which captures the geometry of the hand with great precision and supports interactions in the virtual world.

3D Model From actual image with Augmented Reality
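The keypoint step of the pipeline above can be sketched as follows. This is a minimal illustration, assuming the network emits one score heatmap per keypoint, as PoseNet-style models commonly do; the article does not describe the actual output format of the Oculus networks. The keypoint names, the `stride` value and the toy heatmaps are all hypothetical.

```python
def decode_keypoint(heatmap):
    """Return (row, col, score) of the highest-scoring cell in a 2D heatmap."""
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best[2]:
                best = (r, c, score)
    return best

def decode_hand(heatmaps, stride=8):
    """Map each named heatmap to image coordinates (x, y) plus a confidence."""
    keypoints = {}
    for name, heatmap in heatmaps.items():
        r, c, score = decode_keypoint(heatmap)
        # Each heatmap cell covers `stride` input pixels (an assumed value);
        # report the centre of the winning cell.
        keypoints[name] = ((c + 0.5) * stride, (r + 0.5) * stride, score)
    return keypoints

# Toy 3x3 heatmaps standing in for real network output.
heatmaps = {
    "wrist": [[0.0, 0.1, 0.0],
              [0.1, 0.9, 0.2],
              [0.0, 0.1, 0.0]],
    "index_tip": [[0.0, 0.0, 0.7],
                  [0.0, 0.1, 0.2],
                  [0.0, 0.0, 0.0]],
}
keypoints = decode_hand(heatmaps)
```

Keeping the score next to each coordinate matters for the later stages: a model-fitting step can down-weight or ignore a keypoint whose confidence is low instead of trusting it blindly.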

What can go Wrong

The tech is relatively new and untested on a broader scale. While it has performed well in lab environments and in the initial phase, a more thorough performance review will only be possible once the actual product has been on the market for at least a few months. The challenges to the technology come from the very feature that makes it so intuitive in the first place: camera-based tracking. The 3D model created from the tracked joints is robust, but failing to detect even a single joint correctly can produce a very different result that is difficult to augment. A background that blends with the color of the user's skin will be especially challenging when trying to determine the depth or distance of the hands.
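Why a single bad joint matters can be seen with a small forward-kinematics sketch. It assumes a simplified finger that bends in one plane, with each joint angle measured relative to the previous bone; the bone lengths and angles are hypothetical, and this is not the hand model Oculus uses. Because every bone is positioned relative to the one before it, an error at the base joint is carried all the way to the fingertip.

```python
from math import cos, sin, radians, hypot

def finger_joint_positions(bone_lengths, flexion_deg):
    """Planar forward kinematics: each angle is relative to the previous bone."""
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]  # the base of the finger sits at the origin
    for length, angle in zip(bone_lengths, flexion_deg):
        heading += radians(angle)  # accumulate bend along the chain
        x += length * cos(heading)
        y += length * sin(heading)
        positions.append((x, y))
    return positions

bones_mm = [45.0, 25.0, 20.0]   # hypothetical index-finger bone lengths
true_pose = [10.0, 20.0, 15.0]  # flexion per joint, in degrees
bad_pose = [40.0, 20.0, 15.0]   # only the base joint is off, by 30 degrees

tip_true = finger_joint_positions(bones_mm, true_pose)[-1]
tip_bad = finger_joint_positions(bones_mm, bad_pose)[-1]
tip_error_mm = hypot(tip_bad[0] - tip_true[0], tip_bad[1] - tip_true[1])
```

With these made-up numbers, one misjudged joint moves the fingertip by more than 4 cm, which is easily the difference between touching a virtual button and missing it.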
Even with all the possible challenges, the truth remains that this is, and will be, a technology of the future. With further improvements in AI-based tracking, its accuracy and possibilities are only going to increase with time.

References
