As any user of Facebook will immediately tell you, photos play a major role in sharing life stories. From hiking mountains to capturing the beauty of a well-prepared dinner, photos add to the story tremendously. After all, a picture is worth a thousand words, right? (And thus the success of emoticons, I think).
But what if those photos cannot be seen by visually challenged FB friends? Not to worry. Facebook’s engineers have harnessed the power of an artificial intelligence (AI) network to describe these pictures to blind or partially sighted users.
Facebook calls the system “automatic alternative text” and it’s based on a neural network primed with billions of parameters and millions of examples. Such neural networks – vast, complex databases designed to mimic the human brain as closely as possible – are playing an increasingly important role in modern computing.
The AI doesn’t actually “see” what is in the photo and, as with most things computational at the moment, doesn’t understand context (and context is vital for a true understanding of photos). Rather, the AI compares the objects in the image with its vast internal database of similar photos and makes an educated guess about what’s being shown.
As to context, part of the challenge is in getting computers to recognize what’s most important in an image, whether that’s the people, the background, or the “action.” This requires a great deal more programming, but that’s in the future.
For now, the AI system returns a confidence score indicating how sure it is that it can identify what’s in the picture. If this is above 80 percent, an automatically generated caption appears.
When objects and people have been identified, Facebook’s software constructs a sentence to describe the picture, usually ordered by how confident the AI is about the presence of each element. If there’s some ambiguity about the picture, the sentence starts with “image may contain” to express that uncertainty.
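The caption logic described above can be sketched in a few lines. The 80 percent threshold, the confidence-based ordering, and the “image may contain” prefix come from the article; the function name, the data shapes, and everything else here are illustrative assumptions, not Facebook’s actual implementation.

```python
def build_alt_text(detections, threshold=0.8):
    """Build an alt-text caption from (label, confidence) pairs.

    'detections' is a hypothetical list of recognizer outputs, e.g.
    [("sky", 0.95), ("tree", 0.6)]. Returns None when nothing clears
    the confidence threshold, mirroring the 80 percent cutoff the
    article describes.
    """
    # Keep only the elements the model is confident enough about.
    confident = [(label, conf) for label, conf in detections
                 if conf >= threshold]
    if not confident:
        return None  # below the threshold, no caption is generated

    # Order elements by how confident the model is about each one.
    confident.sort(key=lambda pair: pair[1], reverse=True)
    labels = ", ".join(label for label, _ in confident)

    # The sentence is hedged to express remaining uncertainty.
    return f"Image may contain: {labels}"


print(build_alt_text([("sky", 0.95), ("two people", 0.9),
                      ("smiling", 0.85), ("tree", 0.6)]))
# → Image may contain: sky, two people, smiling
```

Note that “tree” is dropped entirely rather than captioned with low confidence; a wrong guess read aloud by a screen reader is worse than an omission.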
The feature is live now in the Facebook iOS app, as long as your language is set to English and you’re in the US, UK, Canada, Australia, or New Zealand. Facebook says it hopes to roll out the service to more platforms, languages, and markets in the near future. It works with any screen reader software; on iOS, for example, you can enable it via the VoiceOver tool in the Accessibility section of Settings (under General).