Who's in the Video
Jeremy Bailenson is founding director of Stanford University’s Virtual Human Interaction Lab, Thomas More Storke Professor in the Department of Communication, Professor (by courtesy) of Education, Professor (by courtesy) Program in[…]

What if your work commute were as fast as putting on a headset? In the near future, working from home will be revolutionized—although virtual reality is not quite there yet, says Jeremy Bailenson, Founding Director of the Virtual Human Interaction Lab at Stanford University. To make virtual meetings a reality, where your avatar interacts naturally with others in real time, VR developers are chasing one quality: interactional synchrony. “Psychologists have been studying this for decades, since the 1960s, and the idea is that conversation, it’s a very—it’s an intricate dance, and when we’re in a room with people everything is so tightly choreographed. When you nod your head I change my intonation. And when she moves her elbow my knee bobs. And there’s all of these pairwise movements and that’s what makes a conversation feel special face to face,” says Bailenson. If VR programmers can capture this quality, it will be the end of commuting for those who want it. No more wasted productivity in bumper-to-bumper traffic, no more subway hotboxes of colds and flus, no more unnecessarily burned fossil fuels. “Maybe we only need to go two days to work. And for those meetings that are not essential, we need to put those in VR.” Jeremy Bailenson is the author of Experience on Demand: What Virtual Reality Is, How It Works, and What It Can Do.


Jeremy Bailenson: If I could succeed in any endeavor as an academic it would be perfecting what I call the virtual handshake. And I don’t mean an actual handshake, I mean that metaphorically. Why do we go to business meetings to be with other people? Because there’s a social connection, this intimacy that when you’re in the same room it feels like you’re there with them and you can do eye contact and you can do subtle posture changes and you can have multiway conversations with sidelong glances, and it feels real. We call that social presence.

VR is not there yet. But if you think about cars: 40,000 people died in car accidents in the United States last year, and 1.3 million worldwide. Think about the productivity lost by sitting in a box for an hour each way to and from work. Think about the fossil fuel that we’re burning while we commute back and forth to work. Think about the road rage. Think about the germs that you get on public transportation. I’m not claiming that we should not see people; I love social connection. What I’m saying is that there’s a subset of travel that, if you think about it—why do we drive all the way to work so we can sit at a desk and pound on a computer? Maybe we only need to go two days to work. And for those meetings that are not essential, we need to put those in VR.

We cannot support a planet of 11 billion people—which we’ll be at quite soon—with everybody driving and flying everywhere using fossil fuels. It’s just not going to happen. So why don’t we have networked meetings yet? And the answer is because there’s this secret sauce, this social presence that we have face-to-face that we don’t get with videoconference yet. And VR isn’t there yet. So what we need to do is to be able to track more body movements.

The bottleneck is actually not bandwidth, because avatar-based communication is cheaper from a bandwidth standpoint than video. The reason is, if you’re doing avatar-based communication, all the 3D models for the avatars are stored locally on each machine. What travels over the network is the tracking data. So locally a camera detects that I smiled, and then it sends over the network a packet that says “smile at 22 percent.” And then on the other computer it draws that smile. So you’re not sending visual information over the network. What you’re sending is very cheap information, which is semantic information about movement.

The bottleneck is that we can’t track movements that accurately. If you think of the commercial systems right now, they track what we call 18 degrees of freedom: your head and both hands. Each one gives you rotation, which has three, and X, Y and Z, which is obviously three. So you’ve got 18: six apiece for two hands and a head. In order to have a conversation flow, we need to have subtle cheek movements and the twitch of my elbow. Everything I do communicates meaning, whether I’m doing it intentionally or not.

And the theory that drives this understanding of how humans interact verbally and nonverbally is called interactional synchrony, and psychologists have been studying this for decades, since the 1960s. And the idea is that conversation, it’s a very—it’s an intricate dance, and when we’re in a room with people everything is so tightly choreographed. When you nod your head I change my intonation. And when she moves her elbow my knee bobs. And there’s all of these pairwise movements, and that’s what makes a conversation feel special face to face. We have to track all the movements of the people in the room in a way that’s sufficient to get that synchrony across.
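The packets-instead-of-pixels idea above can be sketched in a few lines of Python. This is a minimal illustration, not any real VR system’s wire format: the field names, the 6-DOF pose layout (three position values, three rotation values, per tracked point), and the JSON encoding are all assumptions made for the example. The point it demonstrates is the one Bailenson makes: a full tracking update for a head, two hands, and a “smile at 22 percent” fits in a few hundred bytes, orders of magnitude smaller than a video frame.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Pose:
    """Hypothetical 6-DOF pose: position (x, y, z) plus rotation
    (pitch, yaw, roll). Three tracked points x 6 values = 18 DOF."""
    x: float
    y: float
    z: float
    pitch: float
    yaw: float
    roll: float

@dataclass
class TrackingPacket:
    """One network update for an avatar: semantic movement data,
    not pixels. The 'smile' blendshape weight is the '22 percent'
    example from the transcript."""
    head: Pose
    left_hand: Pose
    right_hand: Pose
    smile: float  # 0.22 means "smile at 22 percent"

packet = TrackingPacket(
    head=Pose(0.0, 1.6, 0.0, -5.0, 12.0, 0.0),
    left_hand=Pose(-0.3, 1.1, 0.2, 0.0, 0.0, 90.0),
    right_hand=Pose(0.3, 1.1, 0.2, 0.0, 0.0, -90.0),
    smile=0.22,
)

# Serialize the packet as it might travel over the network.
payload = json.dumps(asdict(packet)).encode()
print(len(payload))  # a few hundred bytes, vs. tens of kilobytes per video frame
```

The receiving machine would decode this and animate its locally stored 3D model of the sender, which is why the 3D assets never cross the wire. Capturing cheek twitches and elbow movements would mean many more tracked points per packet, but each one is still just a handful of floats.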
