It’s an intuitive step: you’re sitting at home flicking through a magazine and you spot a photograph of a nice-looking dish. Is it easy to make? A one-second scan with an app and the recipe for the dish appears, complete with cooking instructions. Such technology relies on artificial intelligence to select the correct recipe based on an image alone (was that image a pilaf or a jambalaya?).
The technology, which will appeal to a number of food-related companies, has been developed by the Massachusetts Institute of Technology together with the Qatar Computing Research Institute. Called Pic2Recipe, the system relies on a special algorithm. To develop this, researchers trained a neural network on a dataset of one million photos and one million recipes. The task was to ‘learn’ how to match these up successfully, with the longer-term aim of enabling consumers to see the very food they eat in a different way.
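The matching step described above can be illustrated in miniature. A minimal sketch, assuming (hypothetically) that both images and recipes have already been mapped into a shared embedding space by the trained network: a query image is then paired with the recipe whose embedding is most similar, here measured by cosine similarity. The function names and the toy three-dimensional vectors are illustrative inventions, not the researchers' actual code.

```python
# Hypothetical sketch: retrieve the recipe whose embedding is closest to the
# embedding of a query food image (cosine similarity in a shared space).
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_recipe(image_embedding, recipe_embeddings):
    # Return the index of the most similar recipe embedding.
    scores = [cosine_similarity(image_embedding, r) for r in recipe_embeddings]
    return int(np.argmax(scores))

# Toy example with 3-dimensional embeddings (real systems use hundreds of
# dimensions and databases of a million recipes).
recipes = [np.array([1.0, 0.0, 0.0]),   # e.g. a "pilaf" recipe
           np.array([0.0, 1.0, 0.0])]   # e.g. a "jambalaya" recipe
query = np.array([0.9, 0.1, 0.0])       # image embedding near "pilaf"
print(match_recipe(query, recipes))     # → 0 (the pilaf recipe)
```

In practice the hard part is learning embeddings in which a photo and its recipe land close together; the retrieval step itself is just a nearest-neighbour search like the one above.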
The video below reveals more about the technology:
In addition, an online demo has been set up for people to play around with and see how the technology is progressing.
The trial is ongoing, with the researchers reporting greater success with desserts. More challenging, and where further work is required, are items like smoothies and sushi. MIT researcher Yusuf Aytar told the BBC that progress was gradual because: “In computer vision, food is mostly neglected because we don’t have the large-scale datasets needed to make predictions.” However, he added that by taking thousands of images of food from social media, the neural network was learning fast. An update is planned for the Computer Vision and Pattern Recognition conference in Honolulu, which takes place towards the end of July 2017.
For businesses there are many potential digital applications, such as people using the technology to track their daily nutrition, or, alternatively, photographing their meal at a restaurant and then knowing everything they need to cook the dish at home later. The technology is detailed further in a research paper titled “Learning Cross-modal Embeddings for Cooking Recipes and Food Images.”