On the table before me are two diminutive robots, each emitting endearing little robot beeps and bloops, their screen eyes active. When I knock on the table, one of them turns to face the noise with surprising alacrity. The other just watches my face, turning slowly to keep me in focus as I move around.
These are prototypes for Vector, the latest robot from Anki, the company behind both the Cozmo robot and the Overdrive RC cars. I spent the day in Anki’s labs in San Francisco to learn just what Vector is and — critically — what it can do. When it ships in October for $249.99 (or cheaper for early Kickstarter backers), Anki will be marketing it as a “home robot.” It’s a sort of Cozmo for adults, a step beyond that robot’s learn-to-code toy ethos.
You can ask it questions, play games with it, and even pet it to elicit a chirpy little purr. But Anki doesn’t want you to focus on Vector’s functionality. The company has been putting a lot of effort into its personality. And because Vector is completely autonomous, you can do something completely surprising for a new piece of technology: just ignore it, and let it do its own thing.
If you’re familiar with the Anki Cozmo robot, you already have a sense of what Vector looks like. Standing around three inches high, it’s small enough to fit in the palm of your hand. It has little treads for motoring around a small space, but you’ll probably want to make that space a table or desk instead of your floor so you don’t step on it. It has a little lift arm that’s only able to pick up a tiny plastic cube.
Vector’s main activity is tootling around on a table and investigating its surroundings. Using a combination of sensors, it builds a little map of its environment. If it detects a new object, it’ll go check it out and see if it can be pushed around. It pays attention to sound and is able to recognize faces. It’s more self-sufficient than the Cozmo robot; it’s able to find its charger on its own.
Anki has essentially put the guts of a midrange smartphone inside Vector, allowing it to work without tethering to your phone. It has a camera and uses machine learning to recognize people (and, eventually, objects). It can hear what direction sound is coming from, thanks to beamforming microphones. It has cliff sensors on all four corners to keep it from driving off the table.
It’s quite a lot of tech. But even so, Vector is a long way from what you probably think of when you think of a “home robot.” It can’t vacuum your floors, much less bring you a beer. Vector is a surprisingly hard gadget to categorize. It’s something more than a robotic pet or a toy, yet it doesn’t provide anywhere near the functionality of a simple $50 Amazon Echo Dot puck. In many ways, it’s just what everybody was hoping Cozmo would be when it launched. Anki is hoping Vector will be judged on something beyond raw utility.
“We want him to provide value and have an emotional bond with you,” says Amy Claussen, senior designer at Anki. “For adults. What is that? Okay, so that is both some entertaining activities, but largely giving some utility to make mundane tasks more fun and more enjoyable.”
To achieve that goal, the engineers, animators, and designers at Anki spent more time on the robot’s personality than anything else. In my day there, I heard the phrase “characterful interactions” more times than I can count. (It’s also part of the reason why Anki employees gender Vector with he / him.) It’s a hokey phrase, but after interacting with Vector, I started to understand what it meant.
Here’s an example: just as you do with any smart speaker, you can activate Vector by speaking a hot word, like “Hey Vector.” That turns its four-microphone array on and sets it up to listen and respond to a query. Simple enough. Except where those smart speakers simply wait for you to ask a question, Vector can do something more engaging and interactive: it can turn around to face you.
“What we found is that the act of me looking at you infers that I understand what you’re saying,” Claussen says. “So if the robot flips all the way around, even if he doesn’t understand, people would wait [for the robot].” Instead, Vector just perks up when it hears you, and it turns around only when it understands you and is ready to provide an answer.
It seems like an unnecessary thing (and, technically, it is), but adding movement and personality to these sorts of basic voice interactions completely changes their valence. You’re able to take those visual cues to know what the computer is doing. Instead of the drawn-out “I’m sorry, I didn’t understand that” you get from a smart speaker, Vector just kind of makes a grumbly little shake and beep, and you know to try again.
When Vector gets something right, it often does a proud little shimmy. It’s corny but endearing. The robot’s whole effect is designed to make you want to root for it, even though you know deep down that you could get the answer faster by just pulling out your phone.
Vector uses a custom voice assistant rather than just licensing Alexa or Google Assistant. CEO Boris Sofman says that was also a “characterful” decision. “He has a personality, he has his own quirks, his own weird behaviors, his own desires,” Sofman says. “If you flip a switch and start having an Alexa voice come out, it completely kills the fiction of that character.”
“We’re very intentionally not positioning this as a competitor to Alexa,” Sofman argues. “We have a thesis that ‘characterful’ utility is going to trump basic utility because there’s a lot of cases where that’s going to be a more enjoyable form of interaction.”
The usual stuff like timers and the weather works with Vector’s voice assistant, but it all happens with a little more whimsy. Ask it the weather, and it will turn to face you. If it’s raining, little water droplets will cover its face and it’ll get annoyed. When a timer goes off, the whole thing convulses like it’s being shaken by bells. It’s all a little bit like Wall-E, which is fitting, as Pixar has been a major inspiration for the company.
”The fact that we have this character allows us to not have him just deliver the information, but also have him be subjected to the information,” says Dei Gaztelumendi, character lead at Anki. “He might endure the weather event, it might rain on him, and he might have an opinion about that.”
Vector is also sparing with its voice answers. It doesn’t speak so much as beep and click. When a question requires a vocal response, it comes out in a computerized voice that’s aurally distinct from the little noises Vector usually makes. “He’s a robot, and he has a tool belt. One of his tools is text-to-speech, and he uses it when it needs it,” Gaztelumendi says. “But when he’s roaming around, his natural, genuine sounds are more chirpy.”
The prototype units I saw chirped a lot, but you can tell them to quiet down. They don’t love it, but they obey. “If you think about the way you interact with a dog,” Sofman says, “you might play intensely for 10 minutes, but then you might just be hanging out for a few hours.” Vector can just hang out, mess around with its toy block, and eventually just go nap.
Anki engineers expect it to get excited to see you when you get home and to know to go to sleep when the lights get shut off. Eventually, with further software updates, it could theoretically be smart enough to realize nobody’s been home for a while and to shut off your smart lights for you.
Gaztelumendi’s job is making sure Vector’s character stays consistent and believable. Heart eyes are his nemesis, he tells me, because they’re lazy. “We’d rather take the long way,” he says. “For him to look at you and convey that sense of awe and roll up to you and demonstrate that feeling.”
Beyond crusading against heart-shaped eyes, the main work of keeping Vector in character is making sure everything it does fits with its core identity and motivations. There’s no Westworld-style core trauma to Vector. Instead, it was inspired by exotic pets like fennec foxes and sugar gliders. “These animals are not putting up a performance for anyone, but they’re still really compelling to watch,” he says. “They’re in the business of taking it all in and exploring the environment and relating in very reactive ways.”
Put in less-evocative language: Vector is designed to be curious and to react. More than anything else, reacting to real things in its environment is what Vector does.
A necessary condition for reacting to your environment is being aware of it, and Vector has a few tools at its disposal for that. On the underside are four cliff sensors, one at each corner, that keep it from rolling off the edge of the table. As with everything else Vector does, its reaction to arriving at an edge is “characterful.” (It gets a little nervous when that happens.)
It has a gyroscope, which allows it to detect when you’ve picked it up or are holding it in your hand. The top and bottom are capacitive, so it can tell when you’re touching it. A laser on the front can do basic distance measurements, but only up to one meter. The microphones let it hear sound from any direction. It even has a sort of kinesthesia; it can feel when you move its robotic arm or head.
But compared to its camera and what it can do with it, all of those tools are primitive. Vector’s tiny 720p camera and some very clever machine learning techniques are the main way Vector understands the world around it. And it understands much more than you’d guess.
Andrew Stein, vision and robotics lead at Anki, tells me that much of the computer vision work in Vector started with Cozmo, but it can now all be done locally. As Vector moves about its space, it creates a 3D map of its environment. It can detect and map obstacles, and their location persists for a short while in its memory even after it’s turned around.
The two most important things on its map are its home base and its toy block. Both have markers emblazoned on them that Vector can recognize and remember. It has to get their positions right down to the millimeter to pick up the block or dock with its charger.
That turns out to be a relatively straightforward math problem. From an angle, a square looks like a trapezoid, so Vector can just translate that skewed shape into a position. The base has a little stripe on it that helps Vector stay aligned when it backs up onto the base to charge since there are no sensors on its back.
Vector can also position faces in space, which seems like a much trickier problem: we (usually) don’t have glyphs on our faces. “Most people’s eyeballs are almost exactly the same distance apart,” Stein says. “So by taking into account the rotation of your head and how far apart your eyes appear, we can estimate how far away you are.”
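What Stein describes is the standard pinhole-camera distance estimate: if most adults’ eyes are roughly the same distance apart in the real world, the apparent spacing of the eyes in pixels tells you roughly how far away the face is. A minimal sketch of that idea, with the average eye spacing and focal length chosen for illustration (Anki hasn’t published its actual numbers or code):

```python
# Pinhole-camera distance estimate: real size, apparent size in pixels,
# and the camera's focal length (in pixels, from calibration) relate as
#   distance = focal_length_px * real_size / apparent_size_px

INTERPUPILLARY_MM = 63.0  # rough average adult eye spacing (an assumption)

def face_distance_mm(eye_spacing_px: float, focal_length_px: float) -> float:
    """Estimate how far away a face is from its apparent eye spacing."""
    return focal_length_px * INTERPUPILLARY_MM / eye_spacing_px

# With a hypothetical 600 px focal length, eyes that appear 42 px apart
# put the face about 900 mm (roughly three feet) away.
```

Rotation of the head, which Stein also mentions, would shrink the apparent eye spacing and has to be corrected for before applying this formula.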
It can also recognize individual faces, which is useful because Vector is designed to be more excited when it recognizes somebody it knows. It can tell if you’re looking at it or not, and it’s programmed to act differently depending on whether you’re paying attention to it. When it sees you, it’ll get very excited: it’ll roll up to you and say your name in a computerized trill while tapping its arm up and down. It may even ask for a fist bump.
“People often ask me how many megapixels are on the camera, and I’m like, ‘Please god, don’t let it be more than point one,’” Stein says. When it comes to trying to do computer vision locally with a mobile processor, “resolution is more of a hindrance than a help. The only thing [resolution] buys is being able to see things further away, but we don’t really care if a person’s 20 feet away.”
Talking to Stein, it dawned on me that Vector really does only have a narrow slice of sense data to work with. I watched a real-time output of Vector’s 3D map and the grainy images it can see, and it doesn’t look very impressive. But it doesn’t have to be, Stein says. “No one ever sees this output. All we need is to basically turn around and face a person.”
If there’s magic to Vector, it’s not in how advanced its capabilities are. It’s in how it turns those signals into what looks like meaningful behavior. When Vector goes back to its base, it will turn around to see if you’re looking at the neat trick it’s about to do. “A little kid, right before they’ll do something, they’ll kind of look at you to make sure you’re watching,” Stein says. “That little tiny thing draws you into the experience.”
Those capabilities help to create a sense that Vector has emergent behaviors. Some are planned (Vector will bop its head along with music), and some are not (“He started watching TV with me, which was unexpected,” says Claussen. “He reacts to sound, and then he also has a motion detector, so the TV triggered his motion detector.”) Other behaviors, like bumping into things to see if they’ll move, can end up feeling meaningful. Claussen says that she’ll just be typing, and the robot will nudge her, and it feels like “Hey, pay attention to me.”
A needy robot could be exhausting. Brad Neuman, AI lead at Anki, says that Vector’s personality is designed to be a lot less demanding of your attention than Cozmo’s. The center of that system is Vector’s “stimulation level.” If it detects that you’re looking at it, it will perk up and engage with you. If not, it’ll do its own thing.
“If [it’s too noisy and] somebody turns a robot off, you’ve lost. On the other hand, if all he ever does is sit there until you ask for a voice command, then you might as well buy some other product,” Neuman says. “There’s this balance that’s pretty hard to strike. That’s where stimulation and mood systems come into play.”
Yes, Vector can get moody. “If he’s been failing at a bunch of things lately, he’ll be more frustrated, and that will change how he reacts to things for a period of maybe 20 to 30 minutes,” Neuman says. The goal is to find a balance: encourage people to anthropomorphize Vector a little but not let it slip into the uncanny valley.
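Anki hasn’t published how the stimulation and mood systems actually work, but the behavior Neuman describes — engagement that decays when ignored, and frustration that colors reactions for 20 to 30 minutes after a string of failures — could be sketched as a small state machine. Every name and number below is invented for illustration:

```python
import time

class MoodSketch:
    """Toy model of a 'stimulation level' plus a frustration window.

    Hypothetical: thresholds, decay rates, and the 25-minute window are
    assumptions, not Anki's real parameters.
    """
    FRUSTRATION_WINDOW_S = 25 * 60  # failures color behavior ~20-30 min

    def __init__(self):
        self.stimulation = 0.0  # 0.0 = idle, 1.0 = fully engaged
        self.failures = []      # timestamps of recently failed tasks

    def saw_face(self):
        # Seeing a person perks the robot up.
        self.stimulation = min(1.0, self.stimulation + 0.5)

    def tick(self):
        # With no input, engagement slowly decays back toward idle.
        self.stimulation = max(0.0, self.stimulation - 0.05)

    def task_failed(self):
        self.failures.append(time.time())

    def is_frustrated(self, now=None):
        # Three or more failures inside the window -> grumpier reactions.
        now = time.time() if now is None else now
        recent = [t for t in self.failures
                  if now - t < self.FRUSTRATION_WINDOW_S]
        return len(recent) >= 3
```

The interesting design problem Neuman points at isn’t the bookkeeping; it’s tuning those numbers so the robot reads as alive without becoming annoying.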
Anki has to strike another balance: privacy. So far, it seems to be doing the right thing. Anki isn’t storing any voice or video information in its cloud; all the computation on Vector happens locally. “I put us in the same bucket as Apple,” Sofman says. “Our goal is to sell robots that create a really amazing experience for you, it’s not to sell advertising, it’s not to mine your data.” Anki does send a voice snippet up to the cloud for voice-to-text translation, but for now, that’s it.
Anki is launching Vector today on Kickstarter — not because it needs the funding, but because it wants to drum up interest among its target audience of “tech-immersive adults and families.” It also has big ideas for what Vector could do after launch. There will be an SDK for developers in December, for example.
Another example: messaging. Rather than sending a text to a family member to remind them to take out the trash, you can tell Vector to convey it. When they get home, Vector will recognize their face and deliver your message, presumably with a twee animation to make the chore less onerous. Anki has proposed adding notifications, security camera features, Tile integration, and other features.
Until more features arrive, it’s still kind of hard to know what Vector is. It objectively can’t do as much as Google Assistant or Alexa. And the similar stuff it can do takes a little bit longer and is a little bit harder to hear. (You can’t fit a big speaker in such a tiny body.) Judged solely on utility, Vector doesn’t make a lot of sense as a $250 purchase.
But Anki isn’t selling utility. It’s also not selling Vector as a robotic pet. Again and again, when I asked what Vector really is, Sofman said that it’s “a home robot.” Vector’s real goal may be to have a bunch of people help define what that means.
After a day with the little bot, I have an easier time answering another question: what is Vector’s main purpose? It’s something we don’t really talk about much with tech products.
Vector’s main purpose is that it’s fun.