Computer Vision Technology: Friend, Foe or Fraud?

by Larry Barrett | 10/1/14

Ubiquitous smartphones, cheap and abundant digital storage repositories and the continuous development of new software applications designed to parse, label and contextualize the trillions of digital videos and images generated every year have created a whole new universe of intriguing – and some might say disturbing – business opportunities. Isn’t that wonderful?

More than 100 hours of video content are uploaded to YouTube every minute. On platforms like Facebook, Instagram, Snapchat and Flickr, users upload almost 2 billion digital images every day. Who knows how many images are captured and then deleted – well, deleted from your phone – but not really deleted from your iCloud account? I wonder if that could ever present a problem.

The volume’s staggering, but that doesn’t even count all the videos and digital images that are captured in public, reviewed and stored somewhere – just not publicly – every minute of every day. I’m talking about the non-consensual, I’m-not-looking-for-attention stuff. Everything from ATMs, traffic lights and convenience stores to restaurants, hotels, highways and yes, even elevator cars has a video camera recording all of us, all the time – or pretty much, anyway. And to think or live otherwise is beyond naïve.

Couple that with advances in facial recognition technology and an emerging and (relatively) new field of entrepreneurial fixation known as computer vision technology and you can see why businesses and government organizations are so pleased our society has somehow managed to make George Orwell’s darkest piece of fiction seem hopeful in retrospect.

Beyond ‘Big Brother’

Computer vision technology means many things to many people, but generally it’s the use of hardware and software to help devices “see,” which, in turn, allows either the device or the machine or the robot or the person operating the device to learn and know and do things more efficiently.

According to Gartner, “computer-vision-based services use facial, object and motion tracking algorithms to identify images and objects. For example, being able to identify a shoe among numerous objects on a table, Google Goggles (image-based search), or optical character recognition (OCR).”

For example, say you’re having lunch with a friend and you like his jacket. Instead of just asking him where he got it, you can take a picture with your smartphone and within seconds find out the manufacturer, the price and where you can go to buy it. Another example would be cameras that capture the license plates of cars crossing the Golden Gate Bridge to record their toll fees and charge them to a credit card or prepaid transit pass.

Still another example would be where you use your smartphone or your wearable device – like glasses perhaps – to take a picture of a stranger at a restaurant and have the software do a search on that image to find out this person’s name. Maybe even find out where they work or what they’ve tweeted about in the past hour or run an address check to find out where they live and then use Google Maps to take a Street View look at their house and then run that address through Zillow to see what that house is worth. You get the idea.

Not that anyone would do that kind of stuff. But, just in case, a startup called Image Searcher just this month rolled out its CamFind image recognition app for Google Glass if you’re interested.

Computer vision technology, which is sort of tucked under the more general umbrella of “augmented reality,” can also be used by firefighters to determine how hot a fire is burning in a building or quickly ascertain the layout of a building as they’re doing their heroic work. Within enterprise companies themselves, Gartner sees the technology “enhancing current business practices, facilitating and optimizing the use of current technologies and providing business innovation.”

I watched a video recently that showed this technology being used to collect $3 billion in toll fees, help cops spot carpool cheats and potentially help them find a missing or abducted child or an elderly person.

What caught my attention was how one of the company’s retail customers is using computer vision technology to monitor employee and customer behavior to identify the purported strengths and weaknesses of a particular business. As people made their way down a cafeteria line, little green and black boxes outlined the customers’ faces and bodies. Intermittently, adjectives like “neutral” and “happy” and “surprise” popped up in these boxes, as the camera captured their faces and the software discerned their reaction to whatever was going on as they made their way down the line. From this “wealth of data,” the computer vision system can supposedly help customers reduce lost sales opportunities and potentially increase their revenue by millions of dollars a year.

What’s a smile worth?

My personal take on this is that, as a consumer, I’m not a fan of being recorded at all. I’m there to get something to eat or a cup of coffee. I know that’s no longer the world I live in. But when I’m in line to get lunch or a cup of coffee, the last thing I’m really thinking about or paying attention to is the signage or the barista (usually) or the atmosphere. I’m thinking about many other things, including things that could manifest in an involuntary facial expression or some allegedly telling body language that has nothing whatsoever to do with the retail experience unfolding around me. I’m on autopilot, doing this drill again as I have a hundred times before while thinking about how angry I am about something work related or how amused I was by something my kid said five minutes ago. Or, more likely, I’m not thinking about anything at all. But to then use my images, my reactions – again, without my consent – as a data point to either sell me something or to determine whether a particular product or employee is registering “happy” in my mind based on my facial responses is just as offensive as it is unreliable – especially if I know (eventually) this technology is likely to be in use.

I guess there’s nothing we can do to stop companies from intruding on our illusion of privacy or the sanctity of our human emotions and reactions. If there’s technology to be used in the name of efficiency to drive profitability, companies won’t hesitate to use it. But that doesn’t mean we have to cede our individuality or our humanity. If that means putting on our poker faces at Starbucks, so be it.

Larry Barrett is a freelance journalist and blogger who has covered the information technology and business sectors for more than 15 years. Most recently, he served as the online news editor for 1105 Media’s Office Technology Group and as the online managing editor for SourceMedia’s Investment Advisory Group publications Financial Planning, On Wall Street and Bank Investment Consultant. He was also a senior writer and editor at Ziff Davis Media’s Baseline Magazine, winner of the Jesse H. Neal National Business Journalism Award, and ZDNet. In addition, he’s served as a senior writer and editor at prominent technology and business websites including CNET, InternetNews.com, Multichannel News and the San Jose Business Journal.