Our [Digital] Planet
We are scanning every corner of the globe with equipment in space, in the air and on the ground, using devices that range from space-based radar imagers down to the LiDAR-equipped iPad. What are we going to do with all this data? How are we going to disseminate it? When will it all be ready?
This is a follow-up to my Through the Mirror World article, diving a little deeper into the acquisition and eventual usage of the digital data representing our planet.
Scanning the World
A popular term these days, and central to this article, is the digital twin. I discussed digital twins briefly in my previous article, but it’s worth exploring a bit further. IBM has a good webpage on the subject in which they describe a digital twin as follows:
“A digital twin is a virtual representation of an object or system designed to reflect a physical object accurately. It spans the object’s lifecycle, is updated from real-time data and uses simulation, machine learning and reasoning to help make decisions.”
— IBM (https://www.ibm.com/topics/what-is-a-digital-twin)
This is similar to the description in my previous article, but IBM articulates it more clearly and completely. Ultimately, digital twins are meant to aid in the decision-making process. For example, a digital twin workflow could involve the collection and analysis/simulation of real-time data from sensors attached to a system or product of interest. The resulting analysis/simulation could reveal inefficiencies in the process flow, or identify components of a product that wear faster than others. This enables process designers to improve the system, or engineers to re-design a component so the product lasts longer.
A common [adjacent] digital twin practice today is companies scanning a part of the world that interests a customer, whether it be an archeological site, a famous church, a new building’s interior, a city bridge, a nation’s farmland, and so on, and creating a 3D model from the scan. Eventually, people dream of the entire world being scanned and replicated in 3D, with every cubic centimeter across the planet accounted for. That would be a lot of data.
Coming back to the here and now, the simplest and most cost-effective form of this scanning practice is being done by ordinary folks using a LiDAR-equipped iPad. This breakthrough feature, delivered with the iPad Pro in 2020 along with Apple’s augmented reality frameworks, enables people around the world to easily measure distances without a measuring tape (surprisingly useful) as well as scan a real-world scene like their living room (which seems to be the most popular first scan). An example of such a model is shown below.
As we can see from this image of the resulting 3D model, it looks a little choppy. Some of the voids could have been “cleaned up” if the user had walked around to different vantage points, allowing the LiDAR scanner to “see” the features in the room from all angles, but we would still be left with a lackluster model that we can’t do much with. This process is constantly improving, and the scans are getting better and better. With these improved scans come higher-resolution data and increased storage requirements. The 3D mesh can be simplified using various techniques that effectively remove redundant information (e.g., vertex clustering) and/or collapse the least significant vertices (e.g., quadric error decimation); however, the end product will still require a lot of vertices to capture all the corners, dips, straights, bends, etc. of the furniture in the house. To further this point, the professional, high-resolution scan of a power sub-station shown below will require a huge amount of disk storage in comparison to the living room scan above.
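For readers who want to try this themselves, here is a minimal sketch of the two simplification techniques mentioned above using the open-source Open3D library. The file names, voxel size and target triangle count are illustrative assumptions; the right values depend entirely on the scan.

```python
# A minimal sketch of mesh simplification with Open3D.
# File names and targets below are illustrative assumptions.
import open3d as o3d

# Load a scanned mesh (e.g., exported from an iPad LiDAR scanning app).
mesh = o3d.io.read_triangle_mesh("living_room_scan.ply")
print(f"Original: {len(mesh.triangles)} triangles")

# Vertex clustering: snap vertices into a coarse voxel grid, merging redundant detail.
clustered = mesh.simplify_vertex_clustering(
    voxel_size=0.05,  # 5 cm grid; larger values simplify more aggressively
    contraction=o3d.geometry.SimplificationContraction.Average,
)

# Quadric error decimation: collapse the edges that add the least geometric error.
decimated = mesh.simplify_quadric_decimation(target_number_of_triangles=50_000)

print(f"Clustered: {len(clustered.triangles)} triangles")
print(f"Decimated: {len(decimated.triangles)} triangles")
o3d.io.write_triangle_mesh("living_room_simplified.ply", decimated)
```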
If we go outside and extend this to scan the interior of every house, every road, every object on the road, and so on, the amount of data quickly balloons. Scanning an entire city would require an enormous amount of storage.
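To give a rough sense of scale, here is a back-of-envelope calculation. Every number in it (point density, bytes per point, city footprint) is an illustrative assumption, not a measurement.

```python
# Back-of-envelope estimate of raw point-cloud storage for a city-wide scan.
# All numbers here are illustrative assumptions, not measured figures.
points_per_m2 = 1_000     # roughly a 3 cm ground sampling distance
bytes_per_point = 27      # x, y, z as 8-byte doubles plus RGB
city_area_km2 = 500       # a mid-sized city footprint

points = points_per_m2 * city_area_km2 * 1_000_000  # square metres in the city
raw_bytes = points * bytes_per_point
print(f"{raw_bytes / 1e12:.1f} TB of raw points")   # ~13.5 TB, before meshes,
                                                    # textures or interiors
```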
Scanning for the Augmented World
In addition to digital twins, augmented reality (AR) pioneers aim to scan the world, albeit in a slightly different manner. For augmented reality to work, the scan must always be from the individual’s perspective on the ground. This perspective, engineered into the emerging GeoPose standard, enables augmented reality glasses to recognize the wearer’s location precisely, allowing their surroundings to be augmented with information of interest. Examples include visual and audio presentations while standing at art installations, and menus displayed before your eyes as you stand outside a restaurant. The uses extend well beyond these simple examples.
To enable augmented reality glasses, for example, to overlay the world in front of the wearer’s eyes, the AR software must understand what it is looking at through the camera lens mounted on the glasses. Starting with the device’s geodetic position, obtained from its GPS receiver, key features must be recognized from the surroundings: corners/edges of buildings, rectangular street signs, high-contrast boundaries like grass vs. concrete, etc. To recognize these features, the data representing the buildings, street signs and concrete roads must be referenced. This data is stored as tracked point clouds. Tracked here means each significant point in the camera’s view is being tracked in 3D space and can be precisely positioned on Earth based on the GeoPose. A variety of methods can be used to identify and track these key points, known as geospatial anchors, including, but not limited to, LiDAR (laser range-finding of each point), simultaneous localization and mapping (SLAM) and structure from motion (SfM), where the latter two methods are based on the camera’s optical video imagery.
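To make the GeoPose and tracked-point ideas concrete, here is a simplified sketch of the two structures. The field names are my own shorthand for illustration, not the official OGC GeoPose schema, and the example values are made up.

```python
# A simplified sketch of the idea behind GeoPose: a geodetic position plus an
# orientation. Field names are my own shorthand, not the official OGC schema.
from dataclasses import dataclass

@dataclass
class GeoPose:
    latitude: float    # degrees (WGS 84)
    longitude: float   # degrees (WGS 84)
    height: float      # metres above the ellipsoid
    # Orientation as a unit quaternion (x, y, z, w) relative to a local frame
    # at the position above.
    qx: float
    qy: float
    qz: float
    qw: float

@dataclass
class TrackedPoint:
    """One feature point the AR device is tracking, positioned relative to the camera."""
    x: float
    y: float
    z: float
    descriptor: bytes  # visual feature descriptor used for matching against anchors

# Example: a device at a street corner, rotated 45 degrees about the vertical axis.
pose = GeoPose(51.0447, -114.0719, 1048.0, 0.0, 0.0, 0.383, 0.924)
```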
To store all this data, the concept of the AR Cloud was introduced by Ori Inbar back in 2017 and, since then, he and his colleagues have created the Open AR Cloud organization “to drive the development of open and interoperable AR Cloud technology, data and standards to connect the physical and digital worlds for the benefit of all.” Cool stuff. Even with its lofty and altruistic goals, the AR Cloud faces the same problem as digital twins: how will the huge amount of data be stored and disseminated to end users?
Delivering the Digital World
In the case of the AR Cloud, a unique aspect of augmented reality drives part of the answer. All someone wearing a pair of augmented reality glasses needs is an exact position and a precise view of their surroundings. The AR software sends the tracked points to the AR Cloud, services in the cloud match them against the millions of geospatial anchors already generated and stored and, voilà, the GeoPose is returned to the device. This obviates the need for the end user to download all the geospatial anchors (a massive amount of data) and match the tracked points locally. From here, the AR software understands what objects are in the vicinity and can augment the view as described earlier.
The other part of the answer, what is downloaded in this case, is hinted at above: the objects of interest themselves, e.g., the restaurant’s menu. This data must be downloaded by the client software, but a menu is far smaller in digital size than a 3D mesh of the restaurant’s front window.
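A minimal sketch of that client-side round trip might look like the following. The endpoint URL and payload shape are hypothetical, purely for illustration; real AR Cloud and visual positioning services each define their own APIs.

```python
# A sketch of the client-side localization round trip described above.
# The endpoint URL and payload shape are hypothetical; real services differ.
import json
import urllib.request

def request_geopose(tracked_points, approx_position):
    """tracked_points: list of dicts with x, y, z and a feature descriptor (hex string).
    approx_position: rough (latitude, longitude) from the GPS receiver."""
    payload = {
        "approx_position": {
            "latitude": approx_position[0],
            "longitude": approx_position[1],
        },
        "points": tracked_points,  # matched server-side against stored geospatial anchors
    }
    req = urllib.request.Request(
        "https://ar-cloud.example.com/localize",  # hypothetical service endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The service returns the precise GeoPose, e.g. {"latitude": ..., "qw": ...}
        return json.loads(resp.read())
```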
In the case of scanned 3D digital twins, once the data is collected, optimized and stored, the problem of disseminating it to end users is not as easily overcome as it is with the AR Cloud. For this data to be useful, you need the complete set of 3D meshes to render the corresponding scene in real-time. And we know this data can get big, fast.
* I’m purposely ignoring server-side rendering in this article, like the infamous Google Stadia, as it doesn’t change one half of the planet’s digital-data problem: storage.
Cloud storage and network egress costs have come down significantly, but we are still nowhere near the ability to stream exabytes of data to a PC or mobile device. Of course, a user would only be interested in a small corner of the world, so they wouldn’t have to download data on the order of exabytes, but even gigabytes seem too small a unit for all the data necessary to digitally replicate a city in 3D.
Do we need to wait until cloud storage and network egress costs plummet? Or for telecom companies to invent new wireless and wired technologies to deliver all this data, essentially enabling “the [3D] world in the palm of a child’s hand”? Technology moves fast, but not that fast.
If we sacrifice a little fidelity, we can build something now that will add a lot of value across many industries.
Generalized 3D Models
Engineers are notorious for being perfectionists. Google is an excellent example of this, as Paul Graham alludes to in the tweet below.
Google abhors imperfect, eclectic and interpretative products. This attitude extends to many other engineers, especially in Silicon Valley, who seek a 100% solution to a problem. In the case of the planet’s digital data, 100% perfection can never be achieved. There will always be one cubic centimeter somewhere on Earth that will not get mapped/labelled. This is a clear case of “the last 10% taking 90% of the time and resources,” a software development adage that I believe stems from the 80/20 rule, or the Pareto principle.
An application of the 80/20 rule in the context of our planet’s digital data is to fake the last 10–20%. Okay, ‘fake’ is being a little flippant but it gets the point across. If we sacrifice fidelity to the real world by using less-than-perfect data to represent the fine details of our planet, we can massively reduce storage costs and enable use cases today that wouldn’t otherwise be feasible for years to come.
This faking of data comes in the form of generalized 3D models. For example, instead of scanning a mailbox on the side of the street down to 1 cm precision/accuracy, where every last detail including the scratches on the paint is captured, we use a “stock” 3D model of the mailbox. The stock model would be at least an order of magnitude smaller in storage size, which translates directly to less bandwidth being used to download the model. This can be thought of as a very large-scale form of cartographic generalization, where the fine details of a 3D object (like the scratches on the mailbox) need not be included to achieve > 90% fidelity of that street-side view.
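The sketch below illustrates why this is so effective for dissemination: the client downloads one stock model, then only tiny “placement” records for each instance. The model IDs, file sizes and coordinates are illustrative assumptions.

```python
# A sketch of why generalized models are cheap to disseminate: download a stock
# model once, then receive only small placement records for every instance.
# Model IDs, sizes and coordinates below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ModelPlacement:
    model_id: str       # reference into a shared library of stock models
    latitude: float
    longitude: float
    height: float       # metres above the ellipsoid
    heading_deg: float  # rotation about the vertical axis
    scale: float = 1.0

# One street with 40 identical mailboxes:
placements = [
    ModelPlacement("mailbox_ca_standard", 51.04 + i * 1e-4, -114.07, 1048.0, 90.0)
    for i in range(40)
]

stock_model_bytes = 250_000              # one shared ~250 KB mesh, downloaded once
placement_bytes = 48 * len(placements)   # a few dozen bytes per instance
scanned_mesh_bytes = 40 * 25_000_000     # forty ~25 MB high-fidelity scans

print(f"Generalized: {(stock_model_bytes + placement_bytes) / 1e6:.2f} MB")
print(f"Scanned:     {scanned_mesh_bytes / 1e6:.0f} MB")
```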
This leads us to the next question: how do we create generalized 3D models of all the objects on our planet? There are several approaches:
- Apply traditional video game workflows using tools like Blender, SketchUp, etc.
- Provide tools to enable crowd-sourced (or user generated) 3D content generation
- Use AI to create generalized models based on imagery like street-level, satellite, etc.
Video game workflows are a tried-and-true practice to generate 3D content but there are some nuances that make it less than ideal for real-world objects. For example, doorways in video games are often modeled larger than in real life so it is easier for players to pass through the door. (Turns out players run into true-scale doors and get stuck or held back.) Having said that, there is a vast array of 3D models available today; some paid, some free.
Even if these models are paid for, the cost of buying the models and placing them at their correct geographic coordinates would be orders of magnitude less expensive than taking equipment to a site, scanning the scene, returning to your computer and processing the data into a 3D model, only to gain a marginal increase in fidelity in the case of non-descript objects.
Alternatively, instead of having the user exit the virtual world to use their 3D modeling app of choice, what if there were building tools available in-world? This would enable users (i.e., the global geospatial community) to build and place the model in the same workflow. This is what Second Life did with their “prims.” While the concept is interesting and could be quite powerful, you end up with non-standardized world objects, which is great for a user-built virtual world like Second Life but would detract from the global virtual world’s look and feel. This was exemplified when Google struggled to accept heterogeneous, non-standardized, user-generated 3D buildings for Google Earth. They eventually turfed the program in favour of automated photogrammetry and LiDAR.
This leads us to having a computer generate the content in a standardized format and level-of-detail/fidelity.
The capabilities of AI today are impressive indeed, and we are seeing only the tip of the iceberg of what AI will offer in the years to come. Some may think the utopian AI future is already here. However, as I’ve said many times before, people are resistant to change; even with the perfect tech, there will always be laggards. Having said that, there are practical use cases today where AI can offer significant support, and creating 3D models from imagery is one of them. Lots of people and organizations around the world have developed capabilities in this vein; however, they focus more on the detection side of things, the 100% solution that generates a perfect 3D model. When it comes to the dissemination side, they haven’t considered the size of the model and how that will affect network egress costs and download times. This is crucial for the lifelike 3D virtual world’s adoption; friction must be minimized. Like tap shoes on ice minimized.
Leveraging AI to create and/or place generalized 3D models has a lot of potential in developing the lifelike virtual world. For example, Blackshark.ai is pushing the bounds of using AI to identify and model objects from space-based imagery. Taking this to the ground, there are several AI startups (probably more) pursuing automatic segmentation, labelling and feature identification from street-level imagery like that from Mapillary. However, their efforts are almost exclusively for the autonomous vehicle industry, a major component of which is training the vehicle AI software. This blindly ignores the training of humans.
Steering a portion of these AI efforts toward identifying objects and creating/placing generalized 3D models could go a long way in getting the human-enjoyable lifelike virtual world off the ground.
Client-Side Procedural 3D Generation/Rendering
In addition to generalized 3D models, another technique to reduce the size of the models being downloaded is client-side procedural 3D generation/rendering. This involves generating the 3D content on the client device from simpler models, even 2D data, as opposed to generating the content offline, uploading it to a server and having the client download the prepared 3D model. The best example of this is generating a 3D road.
The typical process for generating a real-world 3D road is to take 2D road data, apply parameters like road width, inclusion of markings, shoulders, curbs, etc., and generate a detailed mesh. This is the process used by tools like Terragen and BISim’s TerraTools. Again, this mesh then needs to be downloaded by the client.
What if we download the much smaller 2D data and perform the road generation on the client device? In previous years, this was performance-prohibitive; there was not enough compute power on the client’s device to do it in real-time. However, today’s PCs equipped with powerful graphics cards, along with advancements in programmable rendering pipelines, enable these approaches up to a certain level of fidelity, easily >= 80% true to the real world.
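As a rough sketch of what that client-side generation could look like, the snippet below extrudes a lightweight 2D centerline (the kind of vector data that would actually be downloaded) into a flat road ribbon mesh. A production pipeline would add elevation, lane markings, shoulders and curbs; the road width is an illustrative assumption.

```python
# A minimal sketch of client-side road generation: turn a 2D centerline into a
# flat triangle ribbon. Real pipelines add elevation, markings, shoulders, curbs.
import math

def build_road_mesh(centerline, width=7.0):
    """centerline: list of (x, y) points in metres. Returns (vertices, triangles)."""
    half = width / 2.0
    vertices, triangles = [], []
    for i, (x, y) in enumerate(centerline):
        # Road direction at this point (central difference, clamped at the ends).
        ahead_x, ahead_y = centerline[min(i + 1, len(centerline) - 1)]
        behind_x, behind_y = centerline[max(i - 1, 0)]
        dx, dy = ahead_x - behind_x, ahead_y - behind_y
        length = math.hypot(dx, dy) or 1.0
        # Unit normal, perpendicular to the direction of travel.
        nx, ny = -dy / length, dx / length
        vertices.append((x + nx * half, y + ny * half, 0.0))  # left edge of the road
        vertices.append((x - nx * half, y - ny * half, 0.0))  # right edge of the road
        if i > 0:
            a, b, c, d = 2 * i - 2, 2 * i - 1, 2 * i, 2 * i + 1
            triangles += [(a, b, c), (b, d, c)]  # two triangles per road segment
    return vertices, triangles

# Example: a gentle S-curve described by just five 2D points (a few dozen bytes).
verts, tris = build_road_mesh([(0, 0), (20, 2), (40, 8), (60, 10), (80, 9)])
print(len(verts), "vertices,", len(tris), "triangles")
```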
Bringing the Digital World to Life
One thing a scanned digital twin cannot do is animate the scene. The 3D mesh generated from the scanning and optimization processes cannot [easily] have moving parts. For example, if you scan a crane and generate a 3D model, the crane cannot move; you can’t virtually hop into the cockpit and begin operating the virtual crane. This is a major limitation of using scanned digital twins to model our planet. Interactivity brings a massive jump in usability, which increases the adoption rate and, in turn, brings all sorts of tangential use cases to the forefront.
By using 3D models like those from the video game industry, we can articulate the moving parts to bring the model to life. Widen our scope to the whole planet and we get an ocean with waves, alpine forests waving in the wind, rivers flowing, animals grazing in the meadows, cities that come to life with moving cars, etc., etc.
Having said that, static scans still have their place. They are especially useful for digitally capturing important sites where all the details matter: ancient indigenous sites, historical churches, statues, etc. In these cases, the static models can still be placed inside the lifelike virtual world. The added benefit is that the model would be situated in context, with all its surroundings, as opposed to being viewed in a standalone viewer.
Conclusion
Circling back to digital twins, the value they provide is in the improvement of a system, process or product. These improvements can reduce a building’s energy usage, reduce the cost of manufacturing components or improve the quality of a product so the company can sell it at a higher price, to name a few examples. However, I have yet to see an example of a digital twin that has made a significant economic impact, and they’re generally not cheap to implement. Bluntly, the benefit-cost ratio of digital twins is low, especially for a scanned digital twin.
The AR Cloud has many exciting use cases, but it will take time to build. AR, by nature, necessitates the user be physically near the objects of interest, where they will need to upload the tracked points to the cloud.
Generalized 3D meshes are low-cost to develop and disseminate, and can be used for business cases today, especially those that require an interactive 3D digital world like training. Scanned models can then be added over time as the need arises.
AI shows huge potential in the identification of objects from imagery but it’s still early days.
Client-side procedural generation offers further promise to reduce the size of data to be downloaded which, as we know by now, further reduces cloud storage and network egress costs.
If we combine generalized 3D models, human-end-use AI object identification and placement, client-side procedural content generation and animated 3D models, we can deliver a low-friction, digital version of our planet today that offers many benefits to its human inhabitants.