Earlier this month, Boaty McBoatface became the favourite in a public online poll to name a UK research ship for polar exploration. Although the ship has now been named after Sir David Attenborough, the spirit of “Namey McNameface” looks set to live on.
Parsey McParseface is part of SyntaxNet, an open-source neural network that Google released to the world yesterday. SyntaxNet works out the meaning of written sentences by assigning each word a tag that indicates its syntactic function, then determining the relationships between the tagged words to decide what the sentence means.
In a technical blog post, Google explained how the system can correctly interpret sentences such as “Alice saw Bob” and “Alice, who had been reading about SyntaxNet, saw Bob in the hallway yesterday.” In the first example, SyntaxNet realises that “Alice” and “Bob” are nouns and “saw” is a verb, allowing it to conclude that “saw” is the root of the sentence. It can then determine the subject, “Alice,” and the object, “Bob.”
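The parse described above can be sketched as data. This is an illustrative representation only, not SyntaxNet's actual output format; the tag and relation labels (NOUN, VERB, nsubj, dobj) follow common dependency-grammar conventions.

```python
# Illustrative sketch (not SyntaxNet's actual API): the dependency parse of
# "Alice saw Bob" as (word, POS tag, head index, relation) tuples.
# A head index of 0 marks the root of the sentence; words are numbered from 1.
parse = [
    ("Alice", "NOUN", 2, "nsubj"),  # subject of "saw"
    ("saw",   "VERB", 0, "root"),   # root of the sentence
    ("Bob",   "NOUN", 2, "dobj"),   # direct object of "saw"
]

root = next(word for word, tag, head, rel in parse if rel == "root")
subject = next(word for word, tag, head, rel in parse if rel == "nsubj")
print(root, subject)  # → saw Alice
```

Once the root verb is identified, the subject and object fall out of the relations attached to it, which is exactly the chain of conclusions Google describes.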
This kind of natural language parsing has traditionally posed a problem for computers. Simple sentences are easy to process, but complex ones of 20 or 30 words can have hundreds or thousands of possible syntactic structures, making it difficult for a computer to determine the true role of each word in the sentence.
“A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context,” said Google. It used an example of a sentence with two distinct meanings depending on interpretation, “Alice drove down the street in her car.”
The correct meaning is obvious to humans: Alice was in her car, driving down a street. A computer, however, could incorrectly decide that the preposition “in” modifies “street,” concluding that Alice drove down a street located inside her car. Humans never even consider that interpretation because it is so implausible, but computers have struggled with exactly this kind of ambiguity, a problem SyntaxNet is designed to solve.
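The ambiguity comes down to where the phrase “in her car” attaches. A toy sketch of how a statistical parser chooses between the two readings (the scores here are invented for illustration; a real parser like SyntaxNet learns them from data):

```python
# Toy illustration of prepositional-phrase attachment ambiguity in
# "Alice drove down the street in her car". Scores are made up;
# a trained parser assigns them based on context.
candidates = {
    "attach 'in her car' to 'drove'":  0.97,  # Alice drove, while in her car
    "attach 'in her car' to 'street'": 0.03,  # a street inside her car
}

# The parser keeps both hypotheses but prefers the higher-scoring one.
best = max(candidates, key=candidates.get)
print(best)  # → attach 'in her car' to 'drove'
```

The point is that both structures are grammatically possible; only the learned scores separate the sensible reading from the absurd one.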
“Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same,” said Google. “Multiple ambiguities such as those in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence. Usually the vast majority of these sentences are wildly implausible, but are nevertheless possible and must be somehow discarded by a parser.”
SyntaxNet works by using a neural network to read a sentence from left to right. Dependencies are added incrementally as each word is processed. When a decision about meaning has to be made, the neural network always favours the most plausible scenario.
Even implausible hypotheses are retained during processing, however, in case later parts of the sentence suggest they were correct after all. Because multiple hypotheses remain available at each stage, SyntaxNet can settle on a much more accurate interpretation than earlier systems.
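The process described in the last two paragraphs is essentially beam search. A minimal sketch, assuming a hypothetical scoring function standing in for the neural network (the action names “shift” and “reduce” are illustrative, borrowed from transition-based parsing generally, not from SyntaxNet’s internals):

```python
import heapq

def beam_parse(words, score_actions, beam_size=8):
    """Process words left to right, keeping the top-k partial parses
    ("hypotheses") at each step instead of committing to one choice."""
    beam = [(0.0, [])]  # (cumulative score, actions taken so far)
    for word in words:
        expanded = []
        for score, actions in beam:
            for action, s in score_actions(word, actions):
                expanded.append((score + s, actions + [action]))
        # Keep only the best hypotheses; lower-scoring ones survive
        # just long enough for later context to revive them if needed.
        beam = heapq.nlargest(beam_size, expanded, key=lambda h: h[0])
    return max(beam, key=lambda h: h[0])

def toy_scores(word, actions):
    # Hypothetical scorer: slightly favours "shift" over "reduce".
    return [("shift", 0.6), ("reduce", 0.4)]

best_score, best_actions = beam_parse(["Alice", "saw", "Bob"], toy_scores)
print(best_actions)  # → ['shift', 'shift', 'shift']
```

With every word, each surviving hypothesis branches into several candidate structures, which is the “combinatorial explosion” Google refers to; the beam keeps the search tractable by pruning all but the most plausible.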
Parsey McParseface is an extension of SyntaxNet trained for the English language. In testing, Google found it could identify individual dependencies with over 94 percent accuracy, better than any previous approach. On a set of sentences pulled from the Internet, Parsey McParseface achieved just over 90 percent accuracy.
Google said the accuracy “is not perfect” but “it’s certainly high enough to be useful in many applications.” The company intends to keep developing the technology, improving Parsey McParseface to work across all languages and contexts.