Photograph by Tom Henheffer; Photograph by Cole Garside

The $150 robot revolution

How Microsoft’s affordable Kinect video game system is changing the world of advanced robotics

Tom Henheffer

November 3, 2011

A sudden gust of wind blew a six-bladed, remote-controlled helicopter over a white bus half buried in bricks and busted slabs of concrete. Jimmy Tran, a Ryerson University doctoral candidate, scrambled at the multi-levered controls as the device shot toward the horizon. “I had to land it as fast as possible,” he says. “I didn’t want to hit power lines or cars.”

Despite his efforts, the hexarotor, now a mess of shattered blades and smashed chip boards, sits among the piles of electronics at the lab of Ryerson’s Network-Centric Applied Research Team (N-CART). “That’s 5,000 bucks, another 1,000 for the parts to repair, plus man hours,” says Alex Ferworn, who oversees N-CART. But it could have been much worse if not for one piece of hardware cradled under the helicopter. Ferworn’s group uses robots and computers to help search-and-rescue, bomb-disposal and crime-scene investigation teams. The day the chopper crashed, they were testing a new technique to map rubble using a 3-D scanner that generates images to help rescuers.

And that scanner did not cost tens of thousands of dollars, like the scanners on most unmanned aerial vehicles (UAVs). It cost $150, and it came from a Microsoft Kinect video game system.

It’s the crumpled piece of hardware that was left hanging under the chopper at the crash site, and it saved the project simply because it is so cheap to replace. “Something that costs 150 bucks, we’ll laugh about it. It’s change,” says Ferworn. Now, he says, the only lasting result of the gust of wind, aside from a few costly repairs, is that everyone is calling Jimmy, the pilot, “Crash.”

Microsoft Kinect is fuelling a revolution in robotics. At its core is a simple device allowing gamers to eschew traditional hand-held controllers and interact with Xbox 360 consoles through gestures. About the size of a large remote control, it’s equipped with a standard video camera, a specialized microphone array, and a depth camera, the sensor that has the world’s roboticists and hardware tinkerers so excited. It works by projecting an infrared laser grid onto surfaces and scanning distortions in the digital mesh. This information can then be combined with the standard camera’s feed to create a full-colour, 3-D image. “A normal camera is like closing one eye,” says Patrick Bouffard, a Ph.D. student at the University of California, Berkeley, who created a flying robot that pilots itself using the Kinect. “The Kinect gives you a 3-D picture of the world.”
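For readers curious how a depth reading and a colour pixel become a point in a full-colour 3-D image, here is a minimal sketch in Python. It is not the Kinect's own software: it assumes a simple pinhole camera model, a depth grid already aligned pixel-for-pixel with the colour image, and made-up calibration values (`fx`, `fy`, `cx`, `cy`). The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# One coloured 3-D point: position in metres, then red, green, blue.
ColouredPoint = Tuple[float, float, float, int, int, int]


def depth_to_coloured_points(
    depth: Sequence[Sequence[float]],
    colour: Sequence[Sequence[Tuple[int, int, int]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[ColouredPoint]:
    """Back-project a depth grid into coloured 3-D points.

    Assumptions (not taken from the article): depth is in metres,
    a value of 0 or less means "no reading", and depth[v][u] lines
    up with colour[v][u]. fx, fy are focal lengths in pixels and
    cx, cy is the image centre, all illustrative calibration values.
    """
    points: List[ColouredPoint] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # the sensor saw nothing usable at this pixel
            # Pinhole model: how far the pixel sits from the image
            # centre, scaled by how far away the surface is.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            r, g, b = colour[v][u]
            points.append((x, y, z, r, g, b))
    return points
```

Each pixel with a valid depth yields one point that carries both a position and a colour, which is the sense in which the two camera feeds are combined.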

The device is so cheap mainly because of economies of scale: the worldwide market for a $20,000 precision-depth camera might be made up of a handful of researchers, but the potential for selling to millions of gamers drives down the Kinect’s price.

That market embraced the device with open arms when it launched in November 2010, under an ad campaign promising “You are the controller.” Sales reached eight million by January, shattering the world record, previously set by the iPhone and iPad, for the fastest-selling consumer electronics device.

But the consumer market didn’t come close to using the Kinect’s full power. The Xbox 360 used it for only two things: voice command recognition and tracking body movements. Hackers, however, quickly created programs that use it to control robots, read hand gestures for high-level interaction with computers and scan real-world objects to create 3-D digital models.

The first step was to develop software for controlling the Kinect from a computer. Adafruit Industries, an open-source electronics company, got the ball rolling by offering a $1,000 reward to the first person who posted a program, also called a driver, online. “It was pretty clear to us Microsoft didn’t have any plans at all to open any parts of the Kinect,” says Limor Fried, Adafruit’s founder. In fact, Microsoft immediately condemned the project, without realizing there was division within its ranks—one of the Kinect’s developers, a computer scientist named Johnny Chung Lee, asked Adafruit to launch the contest, putting up the cash himself.

So when Microsoft threatened lawsuits against anyone tampering with its hardware, Adafruit raised the prize to $3,000. The drivers were online within a week.

Among the vast web of loosely connected individuals working with the technology, Bouffard was one of the first to show off the Kinect’s potential, posting videos of his autonomous flight on YouTube by December.

“In order to do interesting things with autonomous systems, you need to be able to sense your environment. Being able to sense it in 3-D really makes a big difference,” he says. More expensive sensors may give better data, he concedes, but cost and availability are critical. Just like the Ryerson team, he smashed his first Kinect and was only able to continue his research because it was so easy to replace. “The data has a lot of noise and isn’t the highest resolution, but the Kinect is low cost and pretty lightweight, which is key.”

By last March, a team of researchers at the University of Konstanz in Germany had strapped the Kinect to a helmet and developed a navigational aid for the blind. It works by running the video feed through software that reads directions posted around the environment, and alerts users to nearby objects with a vibrating belt. “You suddenly have an instrument in your hands opening possibilities that weren’t available before,” says Michael Zöllner, a master’s student who helped develop the device.

Gesture technology similar to what is powering the Kinect has already been integrated in 50 million cellphones in Japan, where users can set their devices on a table and swing their hands to digitally box opponents or play Ping-Pong. And Microsoft’s sensor even has applications in hospitals—surgeons at Sunnybrook hospital in Toronto use gestures to control medical images as they operate, saving precious minutes normally spent washing up after touching a non-sterile computer.

Every time a new application is discovered, a video is posted online, and another group watches and comes up with its own ideas. That’s how Ferworn and the Ryerson team decided to start using the Kinect.

The Ontario Provincial Police had asked for their help mapping potential bomb sites. They strapped an expensive laser scanner to a bomb-disposal robot, and started creating 3-D images of test sites. It was slow and expensive. “Jimmy showed me a video of this guy in Germany sitting in his basement, doing amazing things with the Kinect,” says Ferworn. The team was impressed, so they bought a handful of the devices, strapped one to the robot and wrote a program to stitch together the information it captured into a 3-D image. It worked so well—and was so cost effective—that they decided to apply the technology to mapping collapsed buildings. So far, it’s been used only at an OPP test site, but could be sent into the field to help rescuers find the likely locations of survivors after earthquakes or terrorist attacks.
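The article does not describe how the Ryerson program stitches its scans together, so the following is only a rough sketch of one common approach, under strong assumptions: the robot's pose for each scan (a heading and a position) is already known, and overlapping points are merged by snapping them to a coarse grid. All names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
# Assumed pose format: (yaw in degrees about the vertical axis, tx, ty, tz).
Pose = Tuple[float, float, float, float]


def to_world(points: Sequence[Point], yaw_deg: float,
             tx: float, ty: float, tz: float) -> List[Point]:
    """Move points from the sensor's frame into a shared world frame.

    Rotates about the vertical (y) axis by the robot's heading, then
    shifts by the robot's position.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [(c * x + s * z + tx, y + ty, -s * x + c * z + tz)
            for x, y, z in points]


def stitch_scans(scans: Sequence[Tuple[Sequence[Point], Pose]],
                 voxel: float = 0.05) -> List[Point]:
    """Merge several scans into one point set.

    Points that land in the same grid cell (of side `voxel` metres)
    are treated as the same surface, and only the first is kept.
    """
    merged: Dict[Tuple[int, int, int], Point] = {}
    for points, pose in scans:
        for p in to_world(points, *pose):
            key = (math.floor(p[0] / voxel),
                   math.floor(p[1] / voxel),
                   math.floor(p[2] / voxel))
            merged.setdefault(key, p)
    return list(merged.values())
```

In practice the poses are rarely known exactly, and real systems typically refine the alignment between scans before merging; this sketch skips that step.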

While his team and tinkerers around the world are on the cutting edge of what hardware like the Kinect can do, the idea behind the device, video gesture control, actually has its roots in the ’80s with a psychology student and dance enthusiast from Toronto named Vincent John Vincent. He and a computer scientist friend named Francis MacDougall designed a system connecting a video camera to an early home computer running software to track a performer’s movement. The system worked in real time, placing Vincent, standing in front of a green screen, in a virtual environment on computer monitors. He was surrounded by interchangeable images of instruments which he “hit” to produce synthesized sounds as he danced.

The technology was far ahead of its time, and garnered interest from futurist investors like David Bowie and Moses Znaimer (who bought an installation for his office) and museums and science centres like the Smithsonian. By 2000, Vincent and MacDougall had created the world’s first interactive 3-D gesture interface. Sony bought some of their technology for the PlayStation EyeToy, a crude precursor to the Kinect. Then, in 2006, Microsoft licensed patents from the company, eventually turning that technology into the device sitting on top of TVs and robots across the world. “Large companies need to embrace gesture control for it to be seen as the next innovation in computer devices,” says Vincent. “Microsoft showed that this is the future of technology.”

Microsoft, in fact, has done a complete about-face since its initial threats of litigation against Kinect hackers. In June, it released a free development kit containing its own official drivers. Then the company made the algorithms powering the Kinect public, making it easier for independent developers to create their own custom tracking hardware.

Futurists like Vincent and roboticists like Ferworn see this open-source, do-it-yourself system leading to massive technological changes in the coming years. Autonomous robotic flight, full 3-D scanning and dozens of other seemingly impossible advancements have already arrived—and were created, more often than not, by university students or inventors playing with modified Kinects in their basements. “People can build things, apply them directly to a problem, and they’re on a level playing field,” says Ferworn. “It changes the whole face of robotics.”
