AI tools such as ChatGPT, Claude, Gemini and Mistral do not understand what gravity means, how objects relate to each other spatially, or how to move around a room. At best, they know two dimensions; they still have to learn to comprehend the 3D world in which we humans live.
What is spatial intelligence?
This important evolutionary step that AI still has to take is called ‘spatial intelligence’. The term refers to the ability of AI systems to understand three-dimensional spaces, together with the physical laws that govern them, to navigate within them and to interact with them.
To enable this step in development, the training data must first be expanded. While well-known large language models (LLMs) such as ChatGPT have been trained on text, spatial intelligence also requires data that supports a comprehensive understanding of the physical world. AI developers therefore also refer to ‘large world models’ (LWMs).
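To make this more concrete, here is a minimal sketch of what a single training record for such a model might bundle together. The class name, fields and shapes are illustrative assumptions, not the format of any real LWM:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WorldModelSample:
    """One hypothetical training record for a large world model.

    Unlike a pure-text LLM sample, it bundles several views of the
    same scene so a model can learn 3D structure, not just language.
    """
    rgb: np.ndarray          # camera image, shape (H, W, 3), uint8
    depth: np.ndarray        # per-pixel depth in metres, shape (H, W)
    point_cloud: np.ndarray  # lidar points, shape (N, 3), in metres
    camera_pose: np.ndarray  # 4x4 homogeneous world-to-camera transform
    caption: str             # text grounding for the scene

# A tiny synthetic example (all values are placeholders)
sample = WorldModelSample(
    rgb=np.zeros((480, 640, 3), dtype=np.uint8),
    depth=np.full((480, 640), 2.5),
    point_cloud=np.random.rand(1024, 3) * 5.0,
    camera_pose=np.eye(4),
    caption="a ball resting on a table",
)
```

Whereas an LLM sample is just a string of tokens, a record like this ties language to geometry: the caption is grounded in depth, point-cloud and pose data describing the same scene.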
Figuratively speaking, these models should not only be able to recognise and distinguish between objects such as a ball and a feather, but also understand how they behave in a space based on their physical characteristics. While it is perfectly obvious to an adult that a dropped ball reaches the ground before a feather, because air resistance slows the feather far more, an AI must first be taught this kind of physical reasoning through such an LWM.
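The difference between the two objects can be made explicit with a toy physics model. The sketch below, using entirely made-up masses and drag coefficients, shows why the feather lands later: air resistance, not a different pull of gravity, is what separates them:

```python
def fall_time(mass, drag_coeff, height=2.0, g=9.81, dt=1e-3):
    """Seconds for an object to drop `height` metres with linear air drag.

    Equation of motion (downward positive): m * dv/dt = m * g - c * v,
    integrated with a simple forward-Euler scheme.
    """
    v = y = t = 0.0
    while y < height:
        a = g - (drag_coeff / mass) * v  # net acceleration after drag
        v += a * dt
        y += v * dt
        t += dt
    return t

# Illustrative values only: a compact ball vs a light, draggy feather
print(f"ball:    {fall_time(mass=0.5, drag_coeff=0.05):.2f} s")    # ~0.6 s
print(f"feather: {fall_time(mass=0.005, drag_coeff=0.05):.2f} s")  # ~2.1 s
```

With the same drag coefficient, the feather's tiny mass makes the drag term dominate, so it quickly settles at a low terminal velocity while the ball falls almost freely.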
From a technical perspective, spatial intelligence is based on the fusion of various data sources, including camera systems, lidar sensors, radar and other spatial sensors. If this sounds familiar, it's because these sensor systems already play an important role in the development of autonomous vehicles. Here, too, the aim is to give driving systems as comprehensive a view of the environment as possible. In other words, cars should learn to see in a similar way to humans.
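What such fusion can look like in miniature: the following sketch (invented sensor readings, and a deliberately simplified inverse-variance scheme rather than the full Kalman-filter pipelines real vehicles use) combines three noisy distance estimates of the same obstacle into one, weighting the most precise sensor most heavily:

```python
import numpy as np

def fuse_measurements(estimates, variances):
    """Inverse-variance weighted fusion of independent sensor readings
    of the same quantity (here: distance to an obstacle in metres)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
    return fused, 1.0 / np.sum(w)  # fused estimate and its variance

# Hypothetical readings: camera (least precise), radar, lidar (most precise)
distance, var = fuse_measurements(
    estimates=[12.4, 11.9, 12.1],
    variances=[0.50, 0.10, 0.02],
)
print(f"fused distance: {distance:.2f} m (variance {var:.3f})")  # ~12.08 m
```

The lidar reading dominates the result because it carries the smallest variance, which is exactly the intuition behind giving a driving system several complementary senses.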