Oh, this is a topic we could go on and on about for hours. In general, to provide a perfect experience it’s important for the digital content to appear accurately placed in the real world. There are different techniques and technologies that help with that. Without going too much into detail or giving a comprehensive technical overview, these are the three most commonly used techniques:
Visual keypoint matching
This method uses a visual trigger, or marker. This is the “standard” AR that comes to mind for most people when they hear about Augmented Reality. Unique features (corners, edges) of the visual target are extracted and stored in a target database. The system utilising this method is constantly extracting features from the live camera view and comparing them with those stored in the database. Once a match is found (and after a series of other calculations), an algorithm calculates a virtual flat surface based on the position and angle of the visual target, and all of the digital content is placed in space relative to that flat surface. Whether the visual target remains in the camera view is academic at that point; the flat surface remains the anchor of the digital content.
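To make that concrete, here’s a minimal sketch of the matching step in Python using OpenCV’s ORB features (the filenames, feature count, and match threshold are illustrative assumptions, not any specific product’s pipeline):

```python
import cv2
import numpy as np

# Hypothetical inputs: the stored visual target and one live camera frame.
marker = cv2.imread("marker.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

# Extract unique features (corners, edges) from both images.
orb = cv2.ORB_create(nfeatures=1000)
kp_marker, desc_marker = orb.detectAndCompute(marker, None)
kp_frame, desc_frame = orb.detectAndCompute(frame, None)

# Compare the live frame's features with the ones stored for the target.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(desc_marker, desc_frame), key=lambda m: m.distance)

# With enough matches, estimate the transform of the flat surface that
# will anchor the digital content (its position and angle in the frame).
if len(matches) >= 10:
    src = np.float32([kp_marker[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_frame[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("Target found; anchor surface homography:\n", H)
```

A real tracker runs this on every frame and smooths the result over time; that homography is what the virtual flat surface is derived from.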
Spatial mapping
This is where AR (or MR) gets interesting. Usually supported by a depth sensor or similar, the framework is constantly building a virtual representation of the real world (sort of like a 3D scan). This virtual copy of the real world is then used mainly for two things:
Detect flat surfaces in the real world (horizontal and vertical); see the plane-fitting sketch after this list
Use the virtual “mesh” of the real world for occlusion
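As a rough illustration of the surface-detection part, here’s a toy Python sketch that fits a plane to a patch of the scanned point cloud (the points are synthetic stand-ins for real mesh samples):

```python
import numpy as np

# Hypothetical point-cloud samples from the spatial-mapping mesh, lying
# roughly on a horizontal tabletop at height y ≈ 0.7 m, with sensor noise.
rng = np.random.default_rng(0)
points = np.column_stack([
    rng.uniform(-1, 1, 200),          # x
    0.7 + rng.normal(0, 0.005, 200),  # y (up)
    rng.uniform(-1, 1, 200),          # z
])

# Least-squares plane fit: the normal is the right singular vector
# belonging to the smallest singular value of the centred points.
centroid = points.mean(axis=0)
_, _, vt = np.linalg.svd(points - centroid)
normal = vt[-1]

# A normal pointing (almost) straight up means a horizontal surface;
# a normal lying (almost) in the ground plane means a vertical one.
print("horizontal:", abs(normal[1]) > 0.95)
```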
Using spatial mapping we can define virtual anchors in real space and position virtual content in relation to those anchors. Whenever the viewer returns to that space, the virtual content will be in the same place.
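A tiny sketch of what an anchor buys you, assuming poses are plain 4x4 transforms (real frameworks resolve the anchor’s pose against their mesh of the space; the numbers here are made up):

```python
import numpy as np

def translation(x, y, z):
    # 4x4 homogeneous transform for a pure translation.
    t = np.eye(4)
    t[:3, 3] = [x, y, z]
    return t

# Hypothetical anchor: a pose the framework resolved against its scan
# of the room (say, a spot on a detected tabletop).
anchor_pose = translation(1.2, 0.0, -0.8)

# Content is stored relative to the anchor, not the viewer, so it stays
# in the same real-world place whenever the anchor is re-resolved.
content_local = translation(0.0, 0.15, 0.0)  # 15 cm above the anchor

# World pose of the content = anchor pose composed with the local offset.
content_world = anchor_pose @ content_local
print(content_world[:3, 3])  # [ 1.2   0.15 -0.8 ]
```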
The interesting part here is not really the spatial positioning of content, though. Occlusion of the virtual content by real-world objects is just as important, if not more so. To give the perfect illusion of virtual content existing in the real world, occlusion is necessary, and it will be (or already is) a focus of development. Right now occlusion is achievable by using special depth sensors (think HoloLens or Magic Leap) and building a high enough quality copy of the real world.
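At its core, depth-based occlusion is a per-pixel comparison between the depth of the virtual content and the depth of the real world. A toy sketch with hypothetical 2x2 depth buffers (in metres):

```python
import numpy as np

# real_depth would come from the depth sensor / spatial mesh,
# virtual_depth from rendering the virtual object. Values are invented.
real_depth = np.array([[1.0, 1.0],
                       [0.5, 1.0]])
virtual_depth = np.array([[0.8, 0.8],
                          [0.8, 0.8]])

# A virtual pixel is drawn only where it is closer to the viewer than
# the real world; elsewhere a real object occludes it.
visible = virtual_depth < real_depth
print(visible)  # the bottom-left pixel is hidden behind a real object
```

The quality of the illusion therefore depends directly on how accurate that real-world depth is, which is why the depth sensors matter so much.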
Device sensors (accelerometer, gyroscope, magnetometer, GPS)
This technique involves no visual search or mapping of the real world. Instead, the viewer is the anchor of the virtual space, and content is placed in relation to the viewer, usually positioned using the compass of the device and held in place by combining multiple sensors. The easiest way to imagine this is to think of a VR scene you’re watching in Cardboard, for example.
The scene has a table, and a room as a background. Now replace the room with the camera view of the real world, but keep the table there. That’s basically it. A big drawback of this approach is that the sensors are usually not accurate enough to keep the virtual content locked to a certain point, and it’s impossible to detect movement of the viewer with this technique alone. As ARKit and ARCore showed though, when you support this technique with visual tracking, the results can be very convincing. We were able to successfully mix this approach with the visual trigger technique, resulting in convincing AR experiences.
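One common way to hold content in place from these sensors alone is a complementary filter: integrate the fast but drifting gyroscope and correct it with the noisy but drift-free gravity direction from the accelerometer. A simplified sketch (the sample values and blend factor are illustrative assumptions):

```python
import math

def complementary_filter(pitch, gyro_rate, accel, dt, alpha=0.98):
    # pitch:     current estimate of the pitch angle (rad)
    # gyro_rate: angular velocity around the pitch axis (rad/s)
    # accel:     (ax, ay, az) in m/s^2; gravity gives an absolute reference
    # alpha:     how much we trust the gyro; 1 - alpha corrects its drift
    ax, ay, az = accel
    accel_pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    return alpha * (pitch + gyro_rate * dt) + (1 - alpha) * accel_pitch

# Hypothetical 100 Hz samples: the device steadily pitches up at 0.1 rad/s,
# and the accelerometer reads gravity rotated by the current true tilt.
true_pitch, pitch, dt = 0.0, 0.0, 0.01
for _ in range(100):
    true_pitch += 0.1 * dt
    accel = (-9.81 * math.sin(true_pitch), 0.0, 9.81 * math.cos(true_pitch))
    pitch = complementary_filter(pitch, 0.1, accel, dt)
print(round(pitch, 3), round(true_pitch, 3))  # estimate tracks the true tilt
```

The magnetometer plays the same role for yaw, and GPS gives a coarse position; none of this detects the viewer walking around, which is exactly the gap visual tracking fills.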
It’s important to mention that ARKit and ARCore successfully integrated visual tracking with sensor data, providing a result that rivals 3D spatial mapping in accuracy. So it is a very exciting time to watch those two frameworks, and others, to see how they will move forward with occlusion, for example.