Research scientists at Google and a separate team at Stanford University have independently developed image-recognition software that can both recognize and describe the content of photographs and videos far more accurately than ever before.
In tests, the software's descriptions of images closely resemble those a human would write for the same pictures.
The software pairs two neural networks: one handles image recognition, the other natural language processing. Although the system is not error-free, it performs comparatively well on the metrics used to judge the quality of image descriptions.
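The two-network idea can be illustrated with a deliberately tiny sketch: a vision "encoder" reduces an image to a feature vector, and a language "decoder" turns that vector into words. This is purely illustrative pseudocode-made-runnable, not the researchers' actual system, which uses a deep convolutional network as the encoder and a recurrent language model as the decoder; all function names, the toy vocabulary, and the thresholding rule here are invented for the example.

```python
def encode_image(pixels):
    """Stand-in for a CNN encoder: collapse an image (a list of rows
    of grayscale values in [0, 1]) into a tiny feature vector of
    (mean, max, min) brightness."""
    flat = [p for row in pixels for p in row]
    return [sum(flat) / len(flat), max(flat), min(flat)]

def decode_caption(features, vocab):
    """Stand-in for an RNN decoder: pick one word per feature by
    thresholding. A real decoder conditions each word on the image
    features *and* on the words generated so far."""
    words = []
    for f, (low_word, high_word) in zip(features, vocab):
        words.append(high_word if f > 0.5 else low_word)
    return " ".join(words)

# Invented toy vocabulary: (word if feature <= 0.5, word if > 0.5).
vocab = [("a dim", "a bright"), ("small", "large"), ("scene", "object")]

image = [[0.9, 0.8], [0.7, 0.6]]          # a 2x2 "photo"
features = encode_image(image)
print(decode_caption(features, vocab))     # a bright large object
```

The key structural point survives even in this caricature: the image network and the language network are trained and run as one pipeline, so improving either end (better features, better word choice) improves the captions.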
“We’ve developed a machine-learning system that can automatically produce captions to accurately describe images the first time it sees them,” Google noted in a recent blog post.
The software giant claims that this kind of system could eventually help visually impaired people understand pictures, provide alternate text for images in parts of the world where mobile connections are slow, and make it easier for everyone to search on Google for images.
The researchers said that they would continue to develop the system to improve its accuracy.
“A picture may be worth a thousand words, but sometimes it’s the words that are most useful — so it’s important we figure out ways to translate from images to words automatically and accurately. As the datasets suited to learning image descriptions grow and mature, so will the performance of end-to-end approaches like this. We look forward to continuing developments in systems that can read images and generate good natural-language descriptions,” they wrote.
“I consider the pixel data in images and video to be the dark matter of the internet,” said Li Fei-Fei, director of the Stanford Artificial Intelligence Laboratory, who conducted the other research with Andrej Karpathy, a graduate student.
“We are now starting to illuminate it.”
Li and Karpathy’s research was published as a Stanford University technical report, while the Google team’s paper was posted on arXiv.org, an open-access repository hosted by Cornell University.