Google LLC today rolled out a new version of Open Images, its photo dataset for artificial intelligence research, that adds millions of additional data points and a feature dubbed “localized narratives” intended to aid academic projects. First released in 2016, Open Images contains 9 million photos annotated with descriptive labels. Such datasets play an important role in the AI ecosystem: researchers use them to develop new varieties of machine learning models for tasks such as object recognition and autonomous driving. Beyond providing freely usable photos, Open Images contains millions of annotations valuable for AI training. Untrained neural networks can’t recognize objects in photos on their own and therefore need metadata such as annotations to learn what an image shows. The more detailed the metadata, the better an AI can learn.
The new version of Open Images released today adds 23.5 million “photo-level” labels verified by humans that provide a general description of what’s happening in images. The database now has 59.9 million of these labels in total. Google has also added more situational annotations, including 2.5 million labels that describe actions performed by people in photos and another 391,000 that describe relationships between objects. The main highlight, however, is Google’s localized narratives. Those are a new type of annotation developed by the search giant that it hopes will allow AI models to glean more information about an image than older annotation methods.
Google generates localized narratives by having human annotators hover their mouse over each object in a photo and describe it in their own words. A recording of their cursor movements is then paired with the natural-language description so that every word can be associated with the object it applies to. This approach, Google says, will allow AI models to learn more effectively when trained on the Open Images dataset. “To get a sense of the amount of additional data these localized narratives represent, the total length of mouse traces is ~6400 km long, and if read aloud without stopping, all the narratives would take ~1.5 years to listen to,” Google research scientist Jordi Pont-Tuset detailed in a blog post.
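To make the pairing of cursor movements and words concrete, here is a minimal Python sketch. It assumes that both the cursor recording and the description carry timestamps, so each word can be matched to the cursor positions recorded while it was being produced. The class names, field names and the `ground_words` helper are hypothetical illustrations, not the dataset's actual file format or Google's pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TracePoint:
    """One cursor sample (hypothetical layout)."""
    x: float  # horizontal position, normalized to 0..1
    y: float  # vertical position, normalized to 0..1
    t: float  # seconds since the recording started


@dataclass
class TimedWord:
    """One word of the description with its time span (hypothetical layout)."""
    text: str
    start: float  # seconds
    end: float    # seconds


# (x_min, y_min, x_max, y_max) in normalized image coordinates
Box = Tuple[float, float, float, float]


def ground_words(
    words: List[TimedWord], trace: List[TracePoint]
) -> List[Tuple[str, Optional[Box]]]:
    """Pair each word with the bounding box of the cursor samples
    recorded during that word's time span, or None if there are none."""
    grounded = []
    for word in words:
        points = [p for p in trace if word.start <= p.t <= word.end]
        if not points:
            grounded.append((word.text, None))
            continue
        xs = [p.x for p in points]
        ys = [p.y for p in points]
        grounded.append((word.text, (min(xs), min(ys), max(xs), max(ys))))
    return grounded


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    trace = [
        TracePoint(0.1, 0.2, 0.0),
        TracePoint(0.3, 0.4, 0.5),
        TracePoint(0.8, 0.9, 1.5),
    ]
    words = [TimedWord("dog", 0.0, 1.0), TimedWord("frisbee", 1.2, 2.0)]
    for text, box in ground_words(words, trace):
        print(text, box)
```

A bounding box is only one way to summarize the region a word refers to; a model could just as well consume the raw cursor samples for each word.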
Google has created localized narratives for about 500,000 of the images in Open Images so far. The update represents “a significant qualitative and quantitative step towards improving the unified annotations for image classification, object detection, visual relationship detection, and instance segmentation,” Pont-Tuset wrote. “We hope that Open Images V6 will further stimulate progress towards genuine scene understanding.”