Computers used to be blind, and now they can see. Thanks to increasingly sophisticated algorithms, computers today can recognize and identify the Eiffel Tower, the Mona Lisa or a can of Budweiser.
Still, despite huge technological strides in the last decade or so, visual search has plenty more hurdles to clear.
At this point, it would be quicker to describe the types of things an image-search engine can interpret than the things it can’t. But rapid progress, coupled with the growing number of brilliant minds taking up the challenge, is bringing intelligent robo-eyesight within reach.
Hartmut Neven, an engineering director leading visual-search initiatives for Google, predicts that near-perfection could come in the next decade.
“Within 10 years, we can pretty much recognize, in principle, pretty much any object we’re interested in,” Neven said in a recent interview. “Scientific and technical progress is accelerating in an exponential (pace).”
Neven began his research in 1992 and, by his own forecast, is more than halfway to meeting his goal.
The product of his work and of a team of engineers is contained in a service called Goggles. It exists as a standalone application for Android phones and as a feature of the Google Mobile App for the iPhone.
With Goggles, the user snaps a picture, which is transmitted across cellular networks to Google’s servers. Google’s computers then tell the phone what they recognized in the photo. This process can take only a second or two — and sometimes even less.
Google’s algorithms, the lines of code that break down data into bits recognizable by machines, are good at picking out certain things.
Iconic buildings and artwork, products on store shelves, barcodes and magazine advertisements are a breeze. The system can recognize text on a poster and search the Web for a page with similar writing, or translate the menu at a French restaurant.
Microsoft also has a visual-search app for Bing, though its features are more limited.
So far, these computer systems are less skilled at recognizing humans. But the Google Goggles team is working on a system that can identify faces in photos, as long as those people say it’s OK for Google to include them in its database, Neven said.
But Google’s algorithms return no results for loads of common things. Furniture, clothing, accessories, gadgets, food, animals, cars, trees and many everyday objects are seen as foreign objects by the system.
“Our ambition is nothing less than being able to recognize any object on the planet,” Neven said. “But today, computer vision is not in that state yet. There are many things that, unfortunately, we cannot properly recognize.”
The biggest obstacles are in one category: Objects without a strong “visual texture,” and with few distinct markings. These include many products that are hard to identify without colorful packaging, such as purses, shoes and cell phones.
“Unpackaged products is something that has been a priority for a while, but it’s not easy to solve,” Neven said. “If we get that done much better, then suddenly 90% of relevant objects are in our reach.”
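To make the idea of “visual texture” concrete, here is a minimal sketch in Python. It is not Google’s method, which the article does not describe. It assumes a toy measure (the average brightness difference between neighboring pixels), and the function name `texture_score` and the sample images are invented for illustration.

```python
# Illustrative only: a crude "visual texture" score for a grayscale image.
# This is an assumed stand-in for the concept, not Google's algorithm.

def texture_score(image):
    """Return the mean absolute brightness difference between neighboring pixels.

    `image` is a list of rows; each row is a list of brightness values (0-255).
    """
    rows, cols = len(image), len(image[0])
    total, count = 0, 0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:  # compare with the pixel to the right
                total += abs(image[y][x] - image[y][x + 1])
                count += 1
            if y + 1 < rows:  # compare with the pixel below
                total += abs(image[y][x] - image[y + 1][x])
                count += 1
    return total / count if count else 0.0


# A plain surface, like an unmarked purse: every pixel is the same shade.
plain = [[40] * 4 for _ in range(4)]

# A busy surface, like printed packaging: a checkerboard of dark and light.
busy = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]

plain_score = texture_score(plain)
busy_score = texture_score(busy)
print(plain_score, busy_score)
```

In this toy measure, the plain surface gives a recognizer nothing to latch onto, while the busy one is full of distinct markings. Real systems rely on far richer features, but the contrast suggests why a box of cereal is easier to identify than a bare shoe.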
Google developers are hammering away at the problem. Neven is excited about the potential to enable the system to identify which species of tree a leaf fell from, or the model of car parked on the street.
In the meantime, Google lists the Goggles app in its Labs section, meaning the project is still in an experimental phase, a Google spokesman said. Allowing access this way lowers expectations and avoids exposing the technology to too many people who may find themselves turned off by the fact that it often fails to return accurate results.
The app displays a quick tutorial when it’s first launched. Google also showcases Goggles features that aren’t all that practical but that create buzz for the technology, such as a version that can solve Sudoku puzzles.
The Goggles app can also read QR codes, those black-and-white squares found on ads and posters that, when scanned by smartphones, access videos and other interactive content. Until Goggles can recognize everything, QR codes serve an interim need: Take a picture of something and get its digital counterpart.
Other image-search uses
This underlying image-search technology is important to many Google products.
The image-recognition algorithms help to recognize cars and people in Google’s Street View service in order to blur license plates and faces. They also help raise red flags when a photo reveals too much human skin, categorizing such photos for the “adult” filter in Google Images.
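As a rough illustration of how a “too much skin” flag might work, the Python sketch below counts the share of pixels that fall within a simple skin-tone rule of thumb. Everything here is an assumption made for illustration: the article does not say how Google’s filter works, and the function names, the color rule and the 40% threshold are all hypothetical.

```python
# Illustrative only: a crude skin-tone check, not Google's actual filter.

def looks_like_skin(r, g, b):
    """Apply a simple RGB rule of thumb for skin-colored pixels (an assumption)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels):
    """Return the share of pixels, given as (r, g, b) tuples, that look like skin."""
    if not pixels:
        return 0.0
    skin = sum(1 for (r, g, b) in pixels if looks_like_skin(r, g, b))
    return skin / len(pixels)


def needs_review(pixels, threshold=0.4):
    """Flag a photo when its skin share exceeds a hypothetical threshold."""
    return skin_fraction(pixels) > threshold


# A toy "photo": three skin-toned pixels and one blue-sky pixel.
photo = [(220, 170, 140)] * 3 + [(30, 60, 200)]
print(skin_fraction(photo), needs_review(photo))
```

A rule this simple would misfire often, on sand or wood for instance, which is one reason such a signal raises a flag rather than making the final call.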
Neven joined Google in 2006 when the search giant acquired his company, called Neven Vision. His former colleague, Orang Dialameh, is the CEO at IPPLEX, a holding company that also has teams of engineers working on image-recognition projects.
Dialameh’s developers have employed cameras to build apps that can help identify objects, such as cash or cereal boxes, without requiring the user to snap a picture. Some of these apps are being marketed as utilities for blind people. IPPLEX’s next venture, Nantworks, will allow users to tag objects using a cell phone’s camera, Dialameh said.
Dialameh, who, like his colleague, is based in Southern California, faces many of the same obstacles that Google does — not the least of which is convincing people to use the apps in their daily lives.
“How will this become a consumer behavior?” Dialameh said. “We’re not used to taking out a camera and showing stuff to our phone.”
Others are embedding this kind of technology in more obvious applications. Face.com can examine photos on Facebook to identify people in pictures who weren’t manually tagged. In the same way, Neven’s technology at Google can be used to identify faces in a Picasa user’s personal photo collection.
But this facial-recognition technology, which sometimes thinks your sister is actually Grandpa, has a ways to go. And not everyone is sold on its usefulness.
“Before people-tagging came out, I think most people would have said that the best way to figure out who’s in photos was to have some face-recognition algorithm,” Facebook CEO Mark Zuckerberg said in an interview with several reporters after a news conference in November.
“But it actually turns out that the best way is to just have people tagged.”