By James Walker, Sep 23, 2016, in Technology
Google has announced a new version of its image captioning algorithm that describes the contents of images with 94 percent accuracy. It's almost as good at writing captions as humans are. It has been trained to emulate descriptions written by real people.
Google's image captioning software is part of its wider TensorFlow machine learning kit. Today, the company announced a new release of the algorithm with substantially improved performance. It produces more accurate and more detailed descriptions, enabling it to caption images to a standard approaching that of humans.
In a blog post, Google provided some examples of images captioned by the new algorithm. They include "A person on a beach flying a kite" and "A man riding a wave on top of a surfboard." The company also highlighted the accuracy improvements made since the last-generation technology. For one image, the previous model produced "A brown bear is swimming in the water," while the new model produces "Two brown bears sitting on top of rocks," a description that better matches the contents of the image.
Google has made the improvements by switching to the Inception V3 model for the image encoder. This gives the image captioning system an improved ability to recognise individual objects within images. In turn, this directly facilitates more detailed descriptions. The Inception V3 model achieves 93.9 percent top-5 accuracy on the ImageNet classification task.
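ImageNet classification results are commonly reported as top-k accuracy, most often top-5: a prediction counts as correct if the true label is among the model's five highest-scoring classes. The following is a minimal sketch of that metric, assuming it is the one behind the figure quoted above. The function name `top_k_accuracy` and the example scores are hypothetical and are not taken from Google's code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of examples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this image.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for three images over four classes.
example_scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.3, 0.1, 0.1, 0.5],
]
example_labels = [1, 2, 0]

top1 = top_k_accuracy(example_scores, example_labels, k=1)
top2 = top_k_accuracy(example_scores, example_labels, k=2)
```

In this toy data only the first image has its true label ranked first, but all three have it in the top two, which is why a top-k figure is always at least as high as the top-1 figure for the same model.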
Google's AI image captioning software
The image model has also been fine-tuned to emphasise describing images rather than classifying them. Inception V3 is primarily aimed at grouping images into categories, rather than describing their contents. Google has optimised the model so it can turn "a dog, grass and a frisbee are in the image" into a natural language response including the colour of the grass and the position of the dog relative to the frisbee.
The final significant improvement has been made to the training process. Google trains the algorithm by feeding it hundreds of thousands of images that have been manually captioned by humans. The AI analyses each image to find out what's in it and then associates the contents with the supplied caption. After the process is complete, it can reuse elements of those captions to describe images that are visually similar to those used in the training process.
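The article does not describe the training data format in detail. One common way to train a caption generator on human-captioned images is to turn each caption into a series of next-word prediction examples tied to its image. The sketch below assumes that approach; the boundary tokens, function names and image identifier are all hypothetical and are not Google's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical boundary tokens marking where a caption begins and ends.
START, END = "<start>", "<end>"


def tokenize(caption: str) -> List[str]:
    """Lowercase a caption, drop full stops, and add boundary tokens."""
    words = caption.lower().replace(".", "").split()
    return [START] + words + [END]


def build_vocab(captions: List[str]) -> Dict[str, int]:
    """Assign each distinct token an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for caption in captions:
        for token in tokenize(caption):
            vocab.setdefault(token, len(vocab))
    return vocab


def next_word_pairs(
    image_id: str, caption: str
) -> List[Tuple[str, List[str], str]]:
    """Return (image, words so far, next word) examples for one caption."""
    tokens = tokenize(caption)
    return [(image_id, tokens[:i], tokens[i]) for i in range(1, len(tokens))]


caption = "A dog catches a frisbee."
vocab = build_vocab([caption])
pairs = next_word_pairs("img1", caption)
```

Each tuple asks the model to predict one word given the image and the words before it, so a single human caption yields as many training examples as it has words, plus one for the end token.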
The upgrade has been a success, according to Google. The new algorithm offers almost 94 percent accuracy, compared with the 89.6 percent obtained by Inception V1 on the ImageNet 2012 classification benchmark.
"Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images," said Google. "Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions."
This kind of image recognition technology has a range of applications once perfected. It could help to improve the accuracy of image search results, or provide automatic screen reader descriptions of pictures when a website doesn't provide them. The captions would also be useful in photo apps including Google's own Photos.
