I took a break during a recent trip to San Francisco to control a computer with my hands. Not with a keyboard, a touchpad or a mouse, but by waving them in the air — and not just in front of one screen, as you would with an Xbox Kinect setup, but in front of a wall's worth.
The occasion was a demo set up by a Los Angeles company called Oblong Industries that spun out of MIT's Media Lab in 2006. Its chief scientist, John Underkoffler, had earlier designed the gesture-driven user interfaces that showed up in the 2002 film Minority Report, and now the company sells commercial versions of them.
Using this "g-speak" system, which Underkoffler demonstrated before I tried it, requires donning special gloves dotted with targets for the motion-capture cameras lining the room. (The gloves get a little sweaty.) This allows the system to track not just your hand, something I thought impressive enough when I first tried Microsoft's Kinect, but individual fingers.
So, for example, you point a finger with your thumb up to steer the cursor around the screen, then lower your thumb to select something. Thumb-forefinger circles serve as a "select all" command.
You can then toss items from one screen to another, making the data seem like something that lives outside any one monitor. As Underkoffler put it: "Every device is a workspace where work might flow onto."
We used this to browse an interactive panorama of downtown L.A., select actors from a movie clip and drag them to another screen, and inspect a visualization of flight traffic across the nation.
The next demo combined lower-resolution gesture tracking via a Kinect-style camera with position data sent from the accelerometers in smartphones to let us play a Breakout-style video game.
After that came a simpler system still, in which a camera determined my hand's position so I could browse an onscreen map showing earthquake intensity. This had a different grammar (if this technology takes off, we'll all need to agree on a new language of gestures): make a fist to click, then move your hand to pan or zoom in or out.
My last stop was a look at Oblong's Mezzanine video-conferencing system. Its least-fascinating aspect was using gyroscope-enabled sensors to play with 3D models onscreen. Its most fascinating: dragging any one object, from a Web page to a deck in a slide show, to the foreground on one monitor and having it show up almost instantly on all of the dozen or so other screens in the room, including an iPhone and an iPad.
Underkoffler called this "a natural fit for the living room," which may be true on a sufficiently large budget. The closest anybody got to naming a price was to suggest Mezzanine's costs lined up with those of high-end "telepresence" video systems: $300,000 or so from vendors like Cisco. Even substantial discounts would keep this a toy for the rich.
But I could see why this company has been fascinating techies for a while, in addition to getting business from Boeing and other industry and government clients. It also got me thinking, again, about how we could make more use of all the location and position data our devices gather.
And then I exited to a world in which people try to inform others by stepping through PowerPoint presentations, one tap of a keyboard at a time.
Credit: Rob Pegoraro/Discovery