Facebook has its own AI research group, Facebook AI Research (FAIR), which publishes its work openly. One of its many projects, Detectron, is a platform for what the group calls “object detection research”, built on image detection algorithms like Mask R-CNN. Mask R-CNN can detect and outline each distinct object in a single image, a task called “object instance segmentation”, and it is just one of the object detection algorithms Detectron uses.
They’ve also put out an excellent primer video on how they teach machines to learn. That kind of machine learning is what makes Facebook’s feed appear so tailored to you, not just through what you search and write, but through what pictures you post. It only makes sense that they’ve decided to use Instagram as a resource as well.
One of the initiatives discussed at length during the Facebook F8 developer summit was A-STAR, or A* (Agents That See, Talk, Act, and Reason).
Earlier today, Facebook CTO Mike Schroepfer mentioned our work on Embodied Question Answering (pic.twitter.com/hivrjBU7qN
— Abhishek Das (@abhshkdz) May 3, 2018
And a more focused talk by @deviparikh, @DhruvBatraDB covering work coming out of our lab (includes Grad-CAM, Visual Dialog, EmbodiedQA): pic.twitter.com/HVNBeb582B
— Abhishek Das (@abhshkdz) May 3, 2018
Act more human
A-STAR hosts different projects under its umbrella, one of them being the Visual Dialog project. In this project, researchers teach AI to analyze pictures and respond to questions asked by users about the picture. They’ve got it to a point where, when asked a question, the machine answers in very simple terms. The goal is to get the machine to answer like a human would, with nuance and further explanation.
“Machines are much more curt and to the point because generating long responses is still a challenging scientific problem,” said Dhruv Batra, a research scientist at Facebook.
A pattern the researchers noticed was that the machine would continually give “safe answers” of “I can’t tell” or “I don’t know”.
“We believe the reason this happens is because such safe responses actually are valid responses to a number of questions,” said Batra. “So the agents discover this bias in the data set and amplify it, getting partial credit for just being safe and hedging their bets.”
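To see how hedging can earn partial credit, consider a toy evaluation sketched in Python. Everything here is invented for illustration and is not Facebook’s data set or metric: the questions, the accepted answers, and the scoring rule (a response counts if it matches any answer a human annotator accepted) are all assumptions.

```python
# Toy illustration with hypothetical data (not the Visual Dialog data set).
# Each question maps to the set of answers human annotators accepted.
accepted_answers = {
    "What color is the car?": {"orange"},
    "Is the man smiling?": {"yes", "i can't tell"},
    "What is in the bag?": {"groceries", "i can't tell"},
    "Is it raining outside?": {"no", "i can't tell"},
}

def score(policy):
    """Fraction of questions where the policy's answer is an accepted one."""
    hits = sum(
        policy(question).lower() in answers
        for question, answers in accepted_answers.items()
    )
    return hits / len(accepted_answers)

def always_hedge(question):
    """A 'safe' agent that never looks at the question."""
    return "I can't tell"

safe_score = score(always_hedge)
print(f"always-hedge score: {safe_score:.2f}")
```

In this made-up setup the hedging agent is credited on three of the four questions without understanding any of them, which is the kind of bias Batra describes agents discovering and amplifying.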
One application specifically mentioned was helping visually impaired users understand what’s going on in a picture that a friend posted on social media, not just what’s in the photo, which is what Facebook can do already.
They almost don’t need us
Another project under A-STAR is focused on taking humans out of the question, literally. It uses deep reinforcement learning and has two machines, Q-Bot and A-Bot, interact and ask each other questions to figure out details from an image. Q-Bot is blindfolded, while A-Bot “sees” the picture. Q-Bot asks A-Bot questions and A-Bot responds in kind. At the end, a pool of images is revealed and Q-Bot has to pick the image in question.
“A 20-questions game, if you will,” said Batra, calling this interaction “self talk” or “self play”.
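The shape of that game can be sketched in a few lines of Python. This is a hand-written, rule-based toy and not the deep reinforcement learning setup the researchers use: the image pool, the attribute names, and the functions `a_bot`, `q_bot`, and `play` are all hypothetical, and nothing here is learned.

```python
# Hypothetical image pool: each "image" is reduced to a few attributes.
POOL = [
    {"id": 0, "animal": "dog", "setting": "beach", "color": "brown"},
    {"id": 1, "animal": "dog", "setting": "park", "color": "white"},
    {"id": 2, "animal": "cat", "setting": "park", "color": "white"},
    {"id": 3, "animal": "cat", "setting": "beach", "color": "brown"},
    {"id": 4, "animal": "dog", "setting": "park", "color": "brown"},
]
ATTRIBUTES = ["animal", "setting", "color"]

def a_bot(target, attribute):
    """A-Bot 'sees' the target image and answers a question about it."""
    return target[attribute]

def q_bot(ask, pool, attributes):
    """Q-Bot never sees the target; it narrows the pool using the answers."""
    candidates = list(pool)
    for attribute in attributes:
        answer = ask(attribute)
        candidates = [img for img in candidates if img[attribute] == answer]
    # After all questions, pick the first image still consistent with the answers.
    return candidates[0]["id"]

def play(target_id):
    """Run one round: A-Bot holds the target, Q-Bot guesses its id."""
    target = POOL[target_id]
    return q_bot(lambda attribute: a_bot(target, attribute), POOL, ATTRIBUTES)

guess = play(4)
print(f"Q-Bot picked image {guess}")
```

In the research setting the two bots have to learn what to ask and how to answer. Here the questions are fixed in advance, so the sketch only shows the flow of information between the blindfolded guesser and the bot that sees the picture.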
The next frontier
A third project under A-STAR, called Embodied Question Answering, places an agent in a virtual environment and asks it a question. To get the answer, the agent has to take actions, such as moving around, leaving one virtual room, and traveling to another. Some have called it an AI scavenger hunt that could eventually lead to the development of home robots.
Excited to announce Embodied Question Answering:
— An agent is spawned in a 3D environment and asked a question (‘What color is the car?’).
— It must intelligently navigate the environment and gather information via first-person vision to answer the question (‘orange’). pic.twitter.com/xgNeA2Qi2h
— Dhruv Batra (@DhruvBatraDB) December 1, 2017
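As a rough picture of the task, here is a toy Python version in which an agent explores a small grid until it finds the object it was asked about, then reports its color. The floor plan, the symbols, and the color table are made up, and the exploration is a plain breadth-first search rather than the learned, vision-based navigation in the actual project.

```python
from collections import deque

# Hypothetical floor plan: '#' wall, '.' free space, 'A' agent start, 'C' the car.
GRID = [
    "#######",
    "#A..#C#",
    "#.#.#.#",
    "#.#...#",
    "#######",
]
OBJECT_COLORS = {"C": "orange"}  # made-up scene annotation

def find_start():
    """Locate the agent's starting cell."""
    for r, row in enumerate(GRID):
        c = row.find("A")
        if c != -1:
            return (r, c)
    raise ValueError("no agent in grid")

def explore_for(symbol):
    """Breadth-first exploration; returns steps taken to reach the symbol."""
    start = find_start()
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if GRID[r][c] == symbol:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            # The outer wall keeps every neighbor index inside the grid.
            if GRID[nxt[0]][nxt[1]] != "#" and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None

def answer_color(symbol):
    """Answer 'What color is X?' by first navigating to X."""
    steps = explore_for(symbol)
    if steps is None:
        return None, None
    return OBJECT_COLORS[symbol], steps

color, steps = answer_color("C")
print(f"answer: {color} (after {steps} steps)")
```

The point of the sketch is only that the answer is unavailable from the starting position: the agent has to move through the environment before it can respond.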
Batra said that this project is the jumping-off point for the types of AI systems Facebook wants to develop someday.
“We believe this sort of a task really serves as a benchmark for the kinds of AI systems we want to develop,” he said. “Agents that have the ability to understand language…understand computer vision…and use common sense.”