In one gesture-recognised step, Microsoft’s Kinect has virtually made the Wii obsolete. David Braue explains the tech behind Kinect and reveals where enthusiasts are taking it.
Nine years after the movie Minority Report hit the cinemas, the film is still frequently referenced for the high-tech, keyboard-less user interface that Tom Cruise's protagonist used to interact with computers in the film's imagined year 2054.
It didn't take anywhere near that long for the interface to become reality, however: Microsoft's Kinect has emerged from a three-year skunkworks dalliance to become a major money-spinner in the months since its November 2010 launch, selling over 3 million units in its first month of release and dominating the holiday console-buying market.
The real fun was yet to come, though: within days of Kinect's release, new-technology tragics like Héctor Martin Cantero were already digging into Kinect to figure out how to use it to deliver a Minority Report-styled gesture interface, and myriad other capabilities, to everyday computers. Spain-based Cantero won a US$3,000 prize after responding to a global challenge by Adafruit, a group of technology enthusiasts known for opening up new technologies, which released a data dump of the Kinect's output and called on the global hacking community to produce a working Kinect-computer interface.
Cantero had the device working with his Linux notebook within three hours of purchasing it. His efforts, combined with the ongoing work of a 100-strong global community of hackers, have spawned the OpenKinect project (openkinect.org), an effort to produce free, open-source libraries like libfreenect that allow Kinect to be used as an input device for Windows, Linux, and Mac systems.
"Our work is done here," says an Adafruit representative that characterised the group's mission statement as being to "unlock the potential of commodity hardware allowing scientists, educators, students, artists and more to create…Others will push this forward now."
And push they are. Enthusiasts are finding new ways to exploit Kinect's unique motion-tracking capabilities, with the Minority Report-styled interface now realised and other applications emerging every day.
It's an amazing result for a product that was the stuff of fantasy nine years ago yet can now be bought at your local games shop for less than $200. But how does it work its magic, you ask? Here's a rundown.
Physical sensors
From its perch in front of the TV, Kinect relies on a pair of depth sensors to measure the distance of objects in the room in three dimensions by emitting infrared 'structured light' beams – which project a specific grid pattern that is distorted based on the person's distance from the emitter. These images are measured by an 11-bit 640 x 480 pixel monochrome CMOS sensor providing 2,048 levels of grey, which builds a map showing the distance from the sensor of every point in the image.
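To make the numbers above concrete, here is a minimal Python sketch of what such a depth map looks like as data. It is not Kinect's firmware or any library's API: the frame is synthetic, the helper names (`make_frame`, `nearest_pixel`) are made up for illustration, and it assumes that a smaller raw reading means a closer object, which the article does not specify.

```python
# Figures taken from the article: an 11-bit sensor at 640 x 480 pixels.
DEPTH_BITS = 11
WIDTH, HEIGHT = 640, 480
LEVELS = 2 ** DEPTH_BITS  # 2,048 distinct grey levels


def make_frame(background, blob_value, blob_box):
    """Build a synthetic depth frame: flat background plus one rectangular 'player'.

    blob_box is (x0, y0, x1, y1), with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = blob_box
    return [
        [blob_value if (x0 <= x < x1 and y0 <= y < y1) else background
         for x in range(WIDTH)]
        for y in range(HEIGHT)
    ]


def nearest_pixel(frame):
    """Return (x, y, raw) for the smallest raw reading in the frame.

    ASSUMPTION: smaller raw value = closer to the sensor.
    """
    best = (0, 0, LEVELS)
    for y, row in enumerate(frame):
        for x, raw in enumerate(row):
            if raw < best[2]:
                best = (x, y, raw)
    return best


frame = make_frame(background=1500, blob_value=600, blob_box=(300, 200, 340, 400))
print(LEVELS, len(frame) * len(frame[0]))  # levels of grey, pixels per frame
print(nearest_pixel(frame))                # top-left corner of the 'player' blob
```

Each frame holds 307,200 readings, and each reading is one of 2,048 values, so even a simple scan like this can pick out the part of the scene closest to the sensor.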
The unit also incorporates a conventional 640 x 480-resolution RGB webcam for videoconferencing, capturing in-game shots of players, and so on. Many experimental Kinect applications overlay the depth and RGB camera data to create interesting reality-manipulating applications. There's also a motor that improves tracking by rotating towards the speakers or players, with a 27° range of vertical motion.
In addition, four directional 16kHz microphones, which are angled towards different parts of its 180-degree field of vision, use noise-cancelling technology (calibrated during an initial setup process) to filter noise from games and the environment. The mics can distinguish different players using positional techniques: for example, if a family is playing a quiz game and family members are spread across the room, Kinect can compare the relative strength of the voice signals it receives through its four microphones to determine which person has yelled out the answer first.
Software driven
These key sensors feed a rich set of interaction capabilities such as facial recognition, which lets the console instantly recognise people when they jump in front of the camera – automatically pulling up their avatar and profile, and adding them as a player in the current game.
The microphones can also drive free voice recognition, which supported only US English at Kinect's launch but will be available locally this year as Aussie-accented dictionary files are processed and delivered through software updates. Dictionary files are being collected and analysed through both in-house work and, for users who opt in to an improve-my-experience option, a cloud-based collection and analysis system.
The goal is to enable Kinect to manage full free speech recognition as opposed to limiting interaction to a small subset of words, says Jeremy Hinton, group category manager for Xbox with Microsoft Australia. "Kinectimals does have an element of voice recognition where you can name your animal, but that's fairly simple," Hinton explains. "The body gesture and voice recognition systems work on algorithms that sit on the cloud, and the more data they have, the smarter they get. You'll eventually be able to have full control of the dashboard – for example, ask the Xbox to suggest a movie for you, and it will bring up suggestions, letting you choose from a list – using only your voice."
Kinect's most important algorithm, however, is its gesture recognition. Feedback from the infrared sensors lets it continually monitor the depth of various objects in the room by comparing the feedback one sensor gets with the feedback from the other, just centimetres away. This lets Kinect calculate a three-dimensional view of the people in front of it, and its depth perception means it can tell when an arm is outstretched in front of the player, or held behind their back, or whatever position their limbs are in.
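The article does not say how the two readings are compared. One standard way to recover depth from two viewpoints a known distance apart is triangulation, in which a nearer object appears shifted further between the two views. The Python sketch below shows that general relationship only; it is not Microsoft's algorithm, and the focal length, baseline and disparity values are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Classic triangulation: depth = focal length * baseline / disparity.

    focal_length_px: focal length in pixels (hypothetical value below)
    baseline_m:      distance between the two viewpoints in metres (hypothetical)
    disparity_px:    how far a point shifts between the two views, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Illustrative numbers only: viewpoints 7.5 cm apart, 580-pixel focal length.
far_hand = depth_from_disparity(580.0, 0.075, 29.0)   # small shift, further away
near_hand = depth_from_disparity(580.0, 0.075, 58.0)  # large shift, closer
print(far_hand, near_hand)
```

Doubling the shift halves the computed distance, which is why an outstretched arm stands out clearly from a torso behind it.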
A detailed patent application, called 'Gesture Keyboarding', gives some insight into what's going on inside the algorithm. Kinect uses a skeletal model that breaks down the human body into dozens of line segments that are drawn between joints in every part of the body – from the position of the head and neck, down to the position of individual fingers and toes.
Using this model, common physical motions can be broken down into distinctive combinations of skeletal segments: for example, walking in place would be detectable by the movement of the hips and thighs; the angle between the two lines would be continuously calculated to detect when the legs were lifted enough to count as a step. A jump could be detected by analysing the angle of the shoulders, hips and knees to see whether they're straight or at angles to each other, then monitoring those segments for a rapid change in acceleration as the person jumps up.
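The walking-in-place idea can be sketched in a few lines of Python. This is an illustration of the principle rather than the method in the patent: the joint names, the choice of the spine-hip and hip-knee segments, and the 30° and 10° thresholds are all assumptions made for the example.

```python
import math


def segment(start, end):
    """Vector from one joint to another; joints are (x, y, z) tuples."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("segments must have non-zero length")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def count_steps(frames, lift_deg=30.0, rest_deg=10.0):
    """Count leg lifts in a sequence of (spine, hip, knee) joint positions.

    'lift' is how far the thigh has swung away from a straight line with
    the torso. A step is counted when lift rises past lift_deg, and the
    leg must drop back below rest_deg before another step can count.
    Both thresholds are hypothetical.
    """
    steps = 0
    lifted = False
    for spine, hip, knee in frames:
        lift = 180.0 - angle_between(segment(hip, spine), segment(hip, knee))
        if not lifted and lift >= lift_deg:
            steps += 1
            lifted = True
        elif lifted and lift <= rest_deg:
            lifted = False
    return steps


standing = ((0, 1, 0), (0, 0, 0), (0, -1, 0))      # thigh straight down
raised = ((0, 1, 0), (0, 0, 0), (0, -0.7, 0.7))    # thigh swung 45 degrees forward
print(count_steps([standing, raised, raised, standing, raised, standing]))
```

Using two thresholds instead of one stops a leg that hovers around the trigger angle from being counted as several steps.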
To minimise false recognition, the system can return a confidence level with each calculation, indicating how certain it is that a particular gesture has occurred. This allows the system to differentiate between actions where gross movement detection is enough – for example, punching in a boxing game – and those where movement must be tracked with a high degree of certainty, such as when tracking the position of fingers for use in mooted sign-language recognition applications. A computer user interface would also require a high degree of certainty to avoid triggering unintended actions.
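In code, this amounts to each application setting its own bar for how sure the recogniser must be. The Python sketch below assumes a recogniser that reports (gesture, confidence) pairs with confidence between 0 and 1; the gesture names and threshold values are invented for illustration and do not come from Kinect.

```python
# Hypothetical minimum confidence per application type (illustrative values only).
MIN_CONFIDENCE = {
    "boxing_game": 0.50,    # gross movement detection is enough
    "desktop_ui": 0.90,     # avoid acting on accidental movements
    "sign_language": 0.95,  # finger positions must be tracked precisely
}


def accepted_gestures(detections, application):
    """Keep only the gestures whose confidence meets the application's bar."""
    threshold = MIN_CONFIDENCE[application]
    return [name for name, confidence in detections if confidence >= threshold]


# Made-up recogniser output: (gesture name, confidence).
detections = [("punch", 0.62), ("swipe_left", 0.91), ("finger_spell_a", 0.97)]

for application in MIN_CONFIDENCE:
    print(application, accepted_gestures(detections, application))
```

The same stream of detections produces a lively boxing game and a cautious desktop interface simply by changing the threshold.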
Just how Kinect will be used in the future is anybody's guess. However, by both releasing the device at an affordable price and backing away from early claims that it would work to prevent hacking of the unit, Microsoft has spawned a revolution in the way people interact with their computers. Full-body motion recognition could pave the way for everything from detailed sports analysis for push-ups or tennis swings to physical therapy and dance instruction; the list goes on.
"You're already seeing hackers in the marketplace coming up with innovative solutions," says Hinton. "We're really supportive of it, and made a conscious decision to leave that USB port to the Kinect open. Technically, it recognises anything in the room: it's a matter of developers coding anything they want it to do."