Reality is Boring — Let’s Spice it Up
Your handbook to the world of augmented reality.
Have you ever bought a piece of furniture for your home only to realize it doesn’t coordinate with the rest of your furniture? Yeah, I have too. And it certainly wasn’t a pleasant experience. The colour was much too bright and vibrant in contrast with the muted set-up. Ew.
All in all, it looked really out of place in the middle of my home.
Let me ask you something: what if there was a way for you to see how that couch would look in your home… without it actually being there?
You read that right — what if you could literally see how that brown couch would go with your muted blue carpet (in your home) before you bought it? (That’d save 💰 and energy!)
Even better, what if you could simply pull out a handheld device that would show you how exactly that couch would look with respect to its surroundings?
That sounds sick.
But it’s possible (and already happening rn!). Introducing…
The ABCs of Augmented Reality (AR).
Yes, augmented reality is the tech that powers the above example. Reality can be boring, so it helps spice it up! This article is split into parts so you can jump around to wherever you want with the links below 😄 :
Part I: What even is it?
You are probably already wondering about augmented reality — how it works, its implications, complications and so on. But before all that, what really is augmented reality? What counts as augmented reality, and what doesn’t?
Well, from the previously mentioned example, you’d probably gather that augmented reality is a type of technology that aims to enhance the real world and make our lives easier.
That doesn’t tell us much, since that’s literally what every other technology aims to do.
Let’s break the word up in order to get a better understanding:
Augmented Reality = Augmented + Reality
Augment(ed): to make greater, more numerous, larger, or more intense. (in short, to boost or enhance)
Reality: the state or quality of having existence or substance.
Now, connect the dots.
You’ll probably gather something like augmented reality being a technology that aims to alter the real world by enhancing it for our benefit.
With augmented reality, you can have an interactive experience within the real world (as opposed to VR) with overlays of visual, immersive content on top of the real world.
A simple AR system would consist of a camera, computational unit and a display. The camera would capture the physical environment, the computational unit would help overlay and augment digital content on top while the display will, well, display the enhanced content! (Check the next part for a deeper overview)
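To make that pipeline concrete, here’s a toy sketch in Python. Every function name here is a hypothetical placeholder (not a real AR API) — it just shows the capture → compute → display flow:

```python
# A minimal sketch of the capture → compute → display loop.
# All function names are hypothetical placeholders, not a real AR API.

def capture_frame():
    """Stand-in for the camera: grabs a 'frame' of the physical world."""
    return {"pixels": "raw camera data"}

def augment(frame, virtual_object):
    """Stand-in for the computational unit: overlays digital content."""
    frame["overlay"] = virtual_object
    return frame

def display(frame):
    """Stand-in for the display: shows the enhanced content."""
    return f"showing {frame['overlay']} on top of the real world"

frame = capture_frame()
frame = augment(frame, "virtual couch")
print(display(frame))  # showing virtual couch on top of the real world
```

A real system runs this loop dozens of times per second, but the three roles stay the same.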
AR is a part of a group of tech which tries to connect the virtual world with our physical world. This group is called ‘extended reality’ (XR). Here are its contents:
Part II: The itty-bitty workings.
All this seems pretty cool, but how does it even come to life? In this part, we’ll dive into how it works, from object detection to display!
Let’s take the example of the Coco Pops Interactive Adventure Campaign.
In order to get the augmented experience, we can deduce a couple things we’d need which are:
- Learning the environment and its role — both recognition + perspective
- Overlaying digital content appropriately (virtual Coco Pops!) on top of the object
- Changing of angle/position as the device moves
Let’s go over all of these individually.
Step 1: Knowing the place: where are the Coco Pops and what is around it?
Imagine working on an AR that doesn’t even know what it has to target.
Bruh. That’ll make no sense at all — where will the augmented content be overlaid?
For this, we need object detection + knowing the environment. This basically focuses on knowing, detecting and recognizing whatever is placed in front of the system.
We have two steps to doing this: Sensors + Computer Vision.
Sensors are a technological thingamajig which basically detects (senses) information from the surroundings and responds back with some sweeeet data.
The hardware of an AR system mainly consists of its sensors. And sensors don’t just come in one type — we have many different kinds that serve different purposes. Take a look at some of these:
These sensors have a wide range of capabilities + functions. Some of them are:
Depth Sensor: calculates distance and depth (obviously) of the object
Gyroscope: Detects angle + position of the device.
Proximity sensor: Detects distance of the object from the device.
Accelerometer: this has everything to do with motion — detects elements like velocity and movement.
Light sensor: detects all things light, from intensity to brightness…duh.
Magnetometer: This is kind of like an in-built compass. It tells where the north is.
These sensors together have a wide range of capabilities, which will together help understand the object and its environment. They indulge in a process known as tracking. Tracking is basically the process of scanning, analyzing, segmenting and recognizing environmental information.
The main sensor we need for object recognition itself is the camera (it isn't mentioned, I know!). The camera supplies a live feed of the world to the system.
Note that these sensors do not always have to be in-built — tracking via sensors can be of two types depending on this.
The first type is inside-out tracking. In this form of tracking, the sensors are placed in the device. Its data is received internally. This is like the classic AR system we think of.
However, sometimes the sensors are placed outside of the device and the data is transmitted. This type of tracking where the sensors are placed externally is known as Outside-In Tracking. This isn’t as common.
The two names are super similar (which you’ve probably noticed), so this picture will sum it up for you:
One of the most important types of tracking that SDKs (like the ARCore by Google) harness is motion tracking.
To know when you move, Simultaneous Localization And Mapping, or SLAM is used. This huuuuge phrase may sound a bit intimidating but it’s nothing to worry about; all it does is try and understand the location of the phone relative to the world around it! This is done in 3 simple steps:
- Detection of visually distinct features in the image. These are called feature points.
- Computation of change in location by observing these points
- Combining of the above visual data with inertial measurements to calculate pose (position + orientation) of the camera
This helps render the digital assets in the correct alignment and you’ll actually feel like zombies are in your workplace! (There’s an actual game like that 😮)
Not only do these feature points help track changes in location — clusters of them lying along a vertical or horizontal line can also hint at the presence of surfaces (these are called planes), building up environmental understanding.
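To get a feel for the pose computation, here’s a toy Python sketch that estimates camera motion from how feature points shifted between frames. (Real SLAM solves this in full 3D geometry with inertial data mixed in — this is just the intuition.)

```python
def estimate_translation(points_prev, points_curr):
    """Estimate how the camera shifted between two frames by averaging
    how the tracked feature points moved in the image. (A toy stand-in
    for SLAM's pose step; real systems solve full 3D geometry.)"""
    n = len(points_prev)
    dx = sum(c[0] - p[0] for p, c in zip(points_prev, points_curr)) / n
    dy = sum(c[1] - p[1] for p, c in zip(points_prev, points_curr)) / n
    # If every feature slid left in the image, the camera moved right.
    return (-dx, -dy)

prev_pts = [(100, 50), (200, 80), (150, 120)]
curr_pts = [(90, 50), (190, 80), (140, 120)]  # all shifted 10 px left
shift = estimate_translation(prev_pts, curr_pts)  # camera moved ~10 px right
```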
There are a lot of different types of AR as well (discussed in part III), so these sensors come to use there!
All this data is meaningless though. *Correction: all this data is meaningless without interpretation. That’s where computer vision swoops in.
#2 Computer Vision
YES, that is exactly what it sounds like; making computers see and understand stuff! (AI x AR 😎)
We take our vision for granted, but computers don’t have it all that easy. In fact, computer vision is considered to be a relatively difficult field of computer science! Crazy, eh?
Now, AR mainly requires computer vision for semantics.
Semantics answers the ‘what?’ of the detected object. In terms of the Coco Pops example, semantics is about recognizing that the right cereal box is in front of the device. I mean, Kelloggs would probably be offended if the digital assets were augmented on top of a random cereal box (pain).
But, how does computer vision even work? How the hell do you make a computer ‘see’ the image?
Computer vision tries to mimic the way our brains detect and interpret objects. While the brain’s process in doing so is unclear, a popular hypothesis is that our brain depends on patterns to recognize and decode individual objects. Nevertheless, this concept is used to create computer vision systems.
We train the computers on a variety of training data, which the computer processes, labels and finds patterns in. Also, since computers aren’t lazy like us, they’ll just train like it’s nothing, which means quicker results. Yay!
Let’s say we want to train our computer on identifying the cereal box. What we’d do is train the computer on a million pictures of the box, which the computer would analyze + identify patterns similar to all the boxes. After analyzing and identifying all the data, it’ll be able to create a ‘model image’ of the cereal box. This would help the computer recognize the cereal box the next time it encounters it. Makes sense, right?
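That “model image” idea can be sketched as simple template matching in Python. This is a deliberately tiny toy (4-pixel “photos”, made-up values) — nothing like a production recognizer, but the averaging-and-comparing logic is the same:

```python
def build_model_image(training_images):
    """Average many example pictures into one 'model image' (a template)."""
    n = len(training_images)
    return [sum(pixels) / n for pixels in zip(*training_images)]

def looks_like(image, model, threshold=30):
    """Small average pixel difference from the model → probably our box."""
    diff = sum(abs(a - b) for a, b in zip(image, model)) / len(model)
    return diff < threshold

# Tiny 4-pixel 'photos' of the cereal box (real ones have millions):
boxes = [[200, 10, 10, 200], [198, 12, 8, 202], [202, 9, 11, 199]]
model = build_model_image(boxes)
print(looks_like([201, 10, 10, 200], model))  # True  — recognized!
print(looks_like([10, 200, 200, 10], model))  # False — some other box
```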
Now, computers LOVE numbers! So much that they represent images in numbers!
Huh? How do you do that?
Every image has teeny-weeny-itty-bitty pixels. The machine interprets these images as a series of pixels. These pixels are assigned numerical colour values.
This means that the input received by the computer (from the camera) will be interpreted as a bunch of numbers! The crazy thing is that these numbers together are analysed by the computer to give you the right output.
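Here’s what “an image is just numbers” looks like in Python, with a made-up 3×3 grayscale image:

```python
# A 3×3 grayscale 'image': each number is one pixel's brightness
# (0 = black, 255 = white). To the computer, the picture IS this grid.
image = [
    [  0, 128, 255],
    [ 64, 128, 192],
    [255, 255,   0],
]

# 'Seeing' then becomes arithmetic — e.g. the overall brightness:
pixels = [px for row in image for px in row]
average_brightness = sum(pixels) / len(pixels)  # ≈ 141.9
```

A colour image just has three such grids (red, green, blue) instead of one.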
The most effective way of doing this is via deep learning. This utilizes neural networks (inspired by ‘ze brain) which extract the features and classify the image.
Now you know the basics of AR object detection + how cool our brain is (inspiring neural networks since 1943 😏)!
Okay, so now we have the image. Our AR system now knows the presence of a Coco Pops cereal box in front of it. What’s next?
Step 2: Rendering: Overlaying the virtual Coco Pops!
The recognition of the cereal box will trigger the rendering of digital content on to the original cereal box.
Rendering is basically the whole ‘augmenting’ bit of AR — it is the actual display of the digital content (virtual Coco Pops) in the right orientation + position.
In this case, rendering uses the geometric features derived by the sensors earlier to properly adjust the Coco Pops on top of the actual box.
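A toy 2D version of that adjustment might look like this in Python — real renderers do the same idea with full 3D pose matrices, but the rotate-scale-translate recipe is identical:

```python
import math

def place_overlay(corners, position, angle, scale):
    """Rotate, scale, and translate the virtual object's corner points so
    it sits exactly on the detected box (a 2D stand-in for the 3D pose
    math a real renderer does)."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (position[0] + scale * (c * x - s * y),
         position[1] + scale * (s * x + c * y))
        for x, y in corners
    ]

# A unit square of virtual Coco Pops, placed at (100, 50), unrotated, ×20:
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(place_overlay(square, position=(100, 50), angle=0.0, scale=20))
# [(100.0, 50.0), (120.0, 50.0), (120.0, 70.0), (100.0, 70.0)]
```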
Step 3: Changing Frames: Things are getting shaky!
Okay, here is a question: when you hold a phone, do you stay absolutely still?
Unless you’re superhuman (and not as restless as the rest of us), chances are… no.
You are bound to shake the phone around, and sometimes someone might even get into the way. This will mean a change of frames.
What will then happen to the rendered content? How do we make sure it still makes sense and looks realistic?
Well, most modern phones operate at 30 frames per second, meaning we have roughly 33 milliseconds per frame to do steps 1 + 2… and that’s exactly what the system does! It repeats steps 1 + 2 constantly as the live camera moves around and the frame changes.
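The per-frame budget is easy to sketch in Python (the function body is an empty stand-in, not real detection code):

```python
import time

FPS = 30
FRAME_BUDGET = 1.0 / FPS  # ≈ 0.033 s to redo steps 1 + 2 for each frame

def process_frame():
    """Stand-in for one pass of detection, tracking and rendering."""
    pass

start = time.perf_counter()
process_frame()
elapsed = time.perf_counter() - start
if elapsed > FRAME_BUDGET:
    print("frame dropped — the overlay would visibly stutter!")
```

Blow the budget and the virtual content lags behind the real world, which your eyes notice instantly.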
Here’s a fun tidbit:
In many cases the AR feed you see through the camera is delayed by roughly 50 ms to allow all this to happen, but our brain does not notice!
Anyways, that’s basically it! That’s how a simple AR works.
Part III: Is AR just of one type?
Cutting right to the chase — no.
While the principle is the same in all of the types, they vary based on what triggers an augmented view. There are two broad categories of AR, with others coming under these categories.
For the sake of simplicity, here is an image that sums it all up:
Note: there are a lot of different versions of these types, but all of them fall along the lines of these categories.
Before we get on to that, take a look at this sheep (that was random LOL).
Notice the red marking it has on its body. This is known as a smit mark (in shepherd language). What this smit mark does is help identify and associate the sheep to a certain flock.
This smit mark is a marker. It marks a sheep to a flock.
It’s kind of the same thing with marker-based AR — the shepherd is the AR system and the smit mark is the marker.
The marker in the AR system can be a QR code or other unique designs. It MUST be unique + distinctive as things could get a little (a lot) bumpy otherwise.
Anyway, the AR system scans the surroundings for the marker and then overlays the digital image on it. It latches on to the spot once the marker is detected.
An example? (Some) Kids storybooks. The AR unit detects a distinct pattern/marker on the page and then adds on the needed content. Simple as that.
A more versatile form of AR is markerless AR. This system doesn’t rely on any markers which means you can play around with it and follow your dreams.
How then does a markerless AR operate?
It allows the user to decide where to put the digital content. It really just relies on the device’s hardware to gather the information necessary for the AR.
Take a look at this AR Mustang. You can position it anywhere (unlike the marker-based AR which latches itself to the marker) regardless of its surroundings.
Markerless AR has a lot of other types as well, the analysis of which would give you a much better understanding of how it works. So, let’s dive in!
1. Location-based AR
Location-based AR is a type of AR which overlays digital content on the basis of location.
Under this form, the digital content is mapped out (location-wise). When the user is at a specific location, the digital content is displayed (based on the map!). It utilizes the GPS, digital compass and accelerometer features of the device to do so.
Fun fact: Pokemon GO is powered by this type of AR!
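A minimal sketch of that location trigger in Python, using the standard haversine distance formula and made-up coordinates:

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points (haversine formula)."""
    R = 6371000  # Earth's mean radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def should_show_content(user, poi, radius_m=50):
    """Display the mapped digital content once the user is close enough."""
    return distance_m(*user, *poi) <= radius_m

# A (made-up) Pokéstop about 30 metres north of the user:
user = (51.5007, -0.1246)
stop = (51.5010, -0.1246)
print(should_show_content(user, stop))  # True — time to catch 'em all
```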
2. Superimposition AR
Superimposition AR is CRAZY COOL — it detects and recognizes an object in the physical (‘real’) world and attempts to replace it with an augmented view.
It enhances the detected object in some way by recreating either a portion of it or the entire thing.
The chair above, for instance, is copied, rotated and displaced from its original location. I don’t know about you, but I find this awesome!
This technology is also used by doctors to superimpose an X-ray view of the patient’s image onto a real image to map out where the damage is.
3. Projection-based AR
This is probably something you’ve come across in your life.
Don’t get it? Here is a hint: it has another name that starts with an ‘H.’
If you guessed hologram, you are absolutely right! A hologram is just another type of AR, although it seems a bit different.
If you are unfamiliar with projection-based AR, it is basically a type of AR that casts digital content onto physical objects. That’s really it.
Projection-based AR doesn’t really need to have a smart device for it to be amazing — light can project the graphics onto a surface (holograms again!) 😮
This type is especially useful for companies showing off projects and their inner workings. How cool would pitches be if they were done like this?!
4. Outlining AR
This is exactly what it sounds like; it outlines things!
Let’s suppose you are parking your car on a dark, foggy night. You look around and all you see is dust and fog. And it gets worse when there are other people around. I mean, it could be potentially fatal for both parties! Not to mention, traffic fines could go 📈📈📈
This is where outlining AR comes in; it scans the surroundings and outlines the road boundaries for you so that you can drive safely.
…And that is it with the types! You are probably thinking of how much AR has impacted your life without you even knowing it, am I right?
Part IV: Hardware + Software
What do you actually need to make all this reality?
As for everything, we can divide the stuff we need into hardware and software. Depending on the type of AR, our hardware (mainly sensors) might have to change a teeeensy bit. Here’s the scoop 🍨
- Processor: The 🧠 of the device.
- GPU (Graphic Processing Unit): The GPU handles the visual rendering. In order to have a seamless AR experience, you need a well-functioning GPU.
- Sensors: The sensors change based on the type of AR, as mentioned before. Let’s see how:
- Any motion-based AR would require an accelerometer and a gyroscope. These two would work together to ensure that with movement, the augmented content could follow along
- Location-based AR would additionally require the presence of a magnetometer and a GPS (provides geolocation and time information to a receiver) to do location-ish things like auto-rotating a map depending on where you want to go!
The reason most phones can handle super-cool augmented reality (not always the strongest but they can!) is because most of the above are already in-built in them.
I mean, just look at this:
It literally has most of the hardware! What more do you need in (AR) life?
Of course (yes, I actually answered a rhetorical question), we need software to do all of that!
- Platform: The platform of an AR system is the operating system it is specifically built for. For instance, an app built for iOS will only run on Apple devices.
- Engine: In an AR system, the engine is what specifically powers, converts and renders many different types of data on to content. Multiple engines can be used depending on the purpose (eg. physics engine, gaming engine etc…)
- Framework: You might have already figured out from the name that this is a foundation of code that makes the magic happen faster. That’s exactly what it is — a collection of predefined code that enables quicker development.
- SDK (Software Development Kit): Third-Party tools and frameworks that add some additional super-cool features come under SDKs. SDKs also provide frameworks a lot of the time.
To be honest with you, a lot of SDKs provide most of the above. The software comes neatly packaged into SDKs like ARCore and ARKit, which makes creating AR systems way easier!
Part V: Cool AR Checklist!
Now, developing an AR is no walk in the park.
You don’t just place a cube on top of a regular cube and call that a good AR. Video editing can obviously do that! For an AR to be super cool, you need to pay attention to some features like:
- Placing and positioning content as though it’s in the physical environment: does your coffee cup dance around when you place it on the table (even if you move)? Then why is the AR cup doing the Harlem Shake every time you nod? Realism, not just in the looks but also in the behaviour, is 🔑
- Scaling of AR objects: Nothing looks exactly the same from every angle. With distance and perspective, your content could become bigger/smaller in size and look different in shape. A good AR experience must be able to incorporate that.
- Occlusion: Let’s suppose you want to see how your AR mousepad will go with your computer mouse. You place your mouse on top and yikes! the mouse pad comes on top of it. This is what happens without occlusion. Occlusion is basically what happens when an object is covered by another one. For this, AR hardware would need to understand relative positions and things like that if it wants to provide realism.
- Appropriate Lighting: Look at the two images below:
Notice the change in shading, colours and even behaviour due to the change in light.
This is exactly how a real object would respond to change in lighting — its shadows, colour hues and sometimes, behaviour would change!
In order to bring the maximum degree of realism into your AR system, we need to pay attention to the response of digital assets to change in lighting.
- Context-awareness: This ties into some of the other factors like occlusion, but to re-emphasize the point, the digital assets must be aware of the surrounding objects and space. It needs to know its features + how they change with the response to different factors. This needs a lot of tracking and hence, is one of the biggest challenges faced by AR developers today.
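The occlusion point above can be sketched as a per-pixel depth test in Python (toy 4-pixel “images” with hypothetical depth values):

```python
def composite(real_depth, virtual_depth, real_px, virtual_px):
    """Per-pixel depth test: show the virtual object only where it is
    CLOSER to the camera than the real object, so the real mouse can
    sit on top of the AR mousepad."""
    out = []
    for rd, vd, rp, vp in zip(real_depth, virtual_depth, real_px, virtual_px):
        out.append(vp if vd < rd else rp)
    return out

# 4 pixels: the real mouse (depth 1.0 m) covers the middle two pixels of
# the virtual mousepad (depth 1.2 m), so the pad shows only at the edges:
real_depth    = [9.9, 1.0, 1.0, 9.9]  # 9.9 = empty desk far away
virtual_depth = [1.2, 1.2, 1.2, 1.2]
result = composite(real_depth, virtual_depth,
                   ["desk", "mouse", "mouse", "desk"],
                   ["pad", "pad", "pad", "pad"])
print(result)  # ['pad', 'mouse', 'mouse', 'pad']
```

Getting `real_depth` is the hard part — that’s why depth sensors and environmental understanding matter so much.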
Part VI: Applications
If there’s one thing you know now, it’s that AR is not an easy task.
So why do we do all this hassle?
Well, that’s because it proves its worth by its BEYOND 𝙼̶𝙴̶𝙰̶𝚃̶ AMAZING APPLICATIONS!
Let’s look at some of these applications, starting with…
1. Education + Training 👩🏫
When I was younger, I always wanted to study at Hogwarts (Don’t get me wrong, I still do). The pictures used to move and there was a lot of action involved. Interactivity levels were at max.
With AR, interactive learning could be the norm, and that’d certainly contribute a LOT to student interest. Moreover, you no longer need to get stressed out over something like DNA 🧬, which is super small — you can just see it in front of you!
To support, data shows that assembly workers training with Augmented Reality systems scored almost 10% higher for task comprehension than workers receiving manual training. Woah — that’s a really important aspect in training!
Not to mention, it’d be super cool to gaze at in awe, haha.
Oh, and did I forget to mention that a bit of this is already being done right now? It’s been taken to the classrooms with Expeditions AR by Google!
2. Healthcare 🏥:
Have you ever freaked out about a blood test? It’s especially annoying when they try and map out your veins, am I right? You aren’t alone with this opinion.
In fact, 20% of people feel discomfort in such procedures.
Surely, there must be some way. Well, no.
Not until now, anyway.
AR applications could help medical professionals (especially first-timers) get a little map of your veins as digital content. This would help them get it right and would save you seconds of fear.
But this isn't the only application in healthcare.
AR, with its enhanced visualization capabilities, could do wonders when it comes to projecting the anatomy, especially in cases like surgery. It could even guide the surgeon through the steps, which would make the procedure much safer. Crazy!
These are just two of the many, many use cases present and I’m sure you can already see them impacting you and your loved ones!
3. Gaming 🎮
Okay, raise your hand if you’ve ever downloaded Pokemon GO. Back in those days, it was all the hype (and did make some pretty bizarre headlines lol). By early 2019, it had hit a billion global downloads! Damn.
That’s not the best of AR though. To be honest, Pokemon GO wasn’t even fully AR; it didn't really understand its environment!
Imagine popping on some glasses and shooting zombies in your workplace. Now, THAT is some sick AR 😎
Oooh ooh, or maybe you could even remodel your whole place with some digital content!
Whatever it is, the possibilities are endless — and the only limit is your imagination!
Part VII: How is AR present right NOW?
All this is super-cool but how do we really harness it as of now?
Nowadays, we can use AR via our smartphones, headsets and glasses. The future of AR IS promising some dope lenses though… 👀
All of these fall under two categories: either standalone or smartphone.
Standalone AR is AR that does not require external processors, memory, or power.
Smartphone AR is well, AR on your smartphones and the complete opposite of standalone AR!
An example of AR Smart glasses. These are super cool — explore more below 👇
AR Smart Glasses 👓
Kind of self-explanatory but AR Glasses are well, your typical glasses with some added AR magic ✨
There are a lot of companies promising this wearable, including Facebook, Apple and Google.
In fact, Facebook has already hinted at what these glasses are capable of, including:
- locating keys you’ve misplaced
- navigation. literally in front of you 😮
- overlaying a cooler environment (spicing up what’s present 😉)
But, that’s not the only one we know. Apple is working on some crazy cool AR smart glasses.
While they haven’t released much information YET, insiders say that Apple Glass is going to have some pretty dope features. It’ll probably look like a regular pair of glasses, with some Lidar tech added on.
Note: Lidar is basically a simple way to map out the environment by calculating the distance between different points.
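The distance part of that is just time-of-flight math — a light pulse goes out, bounces back, and distance = c·t / 2. Here’s the one-liner in Python:

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second

def lidar_distance(round_trip_seconds):
    """A Lidar pulse travels out and bounces back: distance = c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A pulse that returns after 20 nanoseconds hit something ~3 m away:
print(round(lidar_distance(20e-9), 2))  # 3.0
```

Do this for thousands of points per second and you get a depth map of the room.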
Anyways, the Apple Glass’ data processing is said to take place on the iPhone itself and it is expected that these will synchronize with Apple products like the iPhone.
Rumours also suggest some super cool features like cutting off the need for prescription lenses or the ability to change backgrounds.
But again, these are rumours (Apple hasn’t confirmed any of these) and honestly, only time can tell.
As for smartphone AR, we have… smartphones!
I hear the moans, I hear the groans — You probably want a taste of that sick AR life right now. Well, all it takes is downloading an app on your phone.
Our smartphones, as mentioned before, have a lot of the cool hardware AR needs. So it can run AR (though not always the best 😢)
If you want to experience some AR right now, here is a page with a LOT of AR apps. Check it out after you’re through with the article so that you have some knowledge of how all of this works 😎
If you are reading this, CONGRATULATIONS! Your curiosity got the best of you in this 20-minute read.
You now have some super cool AR knowledge to brag about to all your friends. Now it’s up to you to use this knowledge and change the world 😉 🌎
Or you could always fantasize about the era where AR becomes about as prevalent as the device you are reading this on 🤩
Whatever it is, spice up the world with some AR!
Oh, and before you leave, here are some action items:
- Comment some takeaways from this article to really let it sink in (it’ll help give a TL;DR as well!)
- Share this with ONE friend who you think will find AR cool!
- If you are interested in learning more — check out this page of resources + sources!
Oh, and while you are at it — drop in some claps and give a follow to see more content like this! To stay in the loop with my crazy learnings — subscribe to my newsletter here!