How Microsoft Kinect Works

Kinect Software Learns from "Experience"

Kinect's software layer is what gives meaning to the data the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you'll be moving in. Then, Kinect detects and tracks 48 points on each player's body, mapping them to a digital reproduction of that player's body shape and skeletal structure, including facial details [source: Rule].

In an interview with Scientific American, Alex Kipman, Microsoft's Director of Incubation for Xbox 360, describes Project Natal's approach to developing the Kinect software. "Every single motion of the body is an input," Kipman says, which creates seemingly endless combinations of actions [source: Kuchinskas]. Knowing this, developers decided not to program every one of those combinations into the software as pre-established actions and reactions. Instead, they would "teach" the system how to react the way humans learn: by classifying the gestures of people in the real world.


To start the teaching process, Kinect developers gathered massive amounts of motion-capture data from real-life scenarios. Then, they processed that data using a machine-learning algorithm developed by Jamie Shotton, a researcher at Microsoft Research Cambridge in England. Ultimately, the developers were able to map the data to models representing people of different ages, body types, genders and clothing. With selected data, developers were able to teach the system to classify the skeletal movements of each model, emphasizing the joints and the distances between those joints. An article in Popular Science describes the four steps Kinect's "brain" goes through 30 times per second to read and respond to your movements [source: Duffy].
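
To make the idea of learning from joints and the distances between them more concrete, here is a minimal sketch in Python. It is not Kinect's code or Shotton's algorithm: it swaps in a very simple nearest-centroid classifier, and the joint names, coordinates and gesture labels are all invented for illustration. It only shows how labeled example poses, rather than hand-written rules, could drive the classification.

```python
import math
from itertools import combinations

# Hypothetical joint names; these are assumptions, not Kinect's tracked points.
JOINTS = ("head", "left_hand", "right_hand", "hip")

def joint_distance_features(pose):
    """Turn a pose (joint name -> (x, y, z) in meters) into pairwise distances."""
    return [math.dist(pose[a], pose[b]) for a, b in combinations(JOINTS, 2)]

def train_centroids(labeled_poses):
    """Average the feature vectors of all example poses sharing a label."""
    sums, counts = {}, {}
    for label, pose in labeled_poses:
        feats = joint_distance_features(pose)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(pose, centroids):
    """Return the label whose average feature vector is closest to this pose."""
    feats = joint_distance_features(pose)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))

# Made-up training examples: one pose per gesture.
training = [
    ("hands_up", {"head": (0.0, 1.7, 2.0), "left_hand": (-0.3, 2.0, 2.0),
                  "right_hand": (0.3, 2.0, 2.0), "hip": (0.0, 1.0, 2.0)}),
    ("hands_down", {"head": (0.0, 1.7, 2.0), "left_hand": (-0.3, 0.9, 2.0),
                    "right_hand": (0.3, 0.9, 2.0), "hip": (0.0, 1.0, 2.0)}),
]
centroids = train_centroids(training)

# A shorter player standing farther from the sensor, hands raised.
new_pose = {"head": (0.0, 1.4, 2.5), "left_hand": (-0.25, 1.65, 2.5),
            "right_hand": (0.25, 1.65, 2.5), "hip": (0.0, 0.8, 2.5)}
result = classify(new_pose, centroids)
```

Because the features are distances between joints rather than raw positions, the new player is matched to the "hands_up" examples even though they are standing in a different spot and have a different build.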

The Kinect software goes a step further than just detecting and reacting to what it can "see." Kinect can also distinguish players and their movements even if they're partially hidden. Kinect extrapolates what the rest of your body is doing as long as it can detect some parts of it. This allows players to jump in front of each other during a game or to stand behind pieces of furniture in the room.
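
The sources cited here do not say how Kinect fills in the hidden parts of a body, so the following Python sketch is only one simple assumption about how such extrapolation could work. It takes the joints the sensor can still see, lines them up with a stored reference pose, and places the hidden joints where the reference pose says they should be. The function name, joint names and coordinates are all hypothetical.

```python
def estimate_hidden_joints(visible, reference):
    """Fill in joints missing from `visible` by shifting a reference pose.

    Both arguments map joint names to (x, y, z) positions in meters.
    Assumes the player's pose roughly matches the reference pose.
    """
    if not visible:
        raise ValueError("need at least one visible joint")
    # Average offset between the observed joints and the same reference joints.
    shift = [
        sum(visible[j][axis] - reference[j][axis] for j in visible) / len(visible)
        for axis in range(3)
    ]
    full = dict(visible)
    for joint, pos in reference.items():
        if joint not in full:
            # Hidden joint: place it at the shifted reference position.
            full[joint] = tuple(p + s for p, s in zip(pos, shift))
    return full

# Made-up reference pose for a standing player.
reference_pose = {"head": (0.0, 1.7, 0.0), "hip": (0.0, 1.0, 0.0),
                  "left_knee": (-0.1, 0.5, 0.0), "right_knee": (0.1, 0.5, 0.0)}

# The player stands behind a couch, so only the upper body is visible.
seen = {"head": (0.5, 1.7, 2.0), "hip": (0.5, 1.0, 2.0)}
estimated = estimate_hidden_joints(seen, reference_pose)
```

In this example the knees are never observed, yet the sketch still places them below the visible hip. A real system would need to handle players who are crouching, leaning or moving, which this fixed reference pose cannot do.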