Op-Ed: Spotify, voice profiling, and the Rube Goldberg Effect of psych tech

Spotify, the Thinking Person’s Best Way of Annoying Musicians, has a patent for voice analysis to manage customers.

Spotify on iPhone. — Photo: © Jonathan Nackstrand, AFP

Voice analysis is built into humans. If you’ve ever met a human (the risk remains), you can guess a lot from intonations, emphases, and other subtle indicators. That’s if you have a brain, social skills, or some other optional extras, of course.

Spotify, the Thinking Person’s Best Way of Annoying Musicians, has a patent for voice analysis to manage customers. They’re not alone. Many other ferociously pig-ignorant companies use these tools in customer service. Professor Joseph Turow of the University of Pennsylvania has an interesting take on this tech in The New York Times.

Turow explains in mercifully unambiguous terms the level of psych involved in these weighty deliberations and the tech behind them. His NYT piece is well worth reading for that very clear layout of the issues.

I’m not going to rehash a good narrative. READ IT. So let’s cut to the chase about using these things to manage customers.

Sales factors? Damn straight there are sales factors.

I did 20 years in customer service. I’ve dealt with people in any emotional state from infuriating arrogance to utter despair; from complaints to negotiations with third parties for real money. I’m not impressed with this tech in any possible sense.

The theory of voice analysis is pretty simple; in fact, far too simple. Different emphases and intonations obviously reflect emotional and social interactive conditions. You can digitize this and add algorithms to your insular generic managerial heart’s content.

This IS a form of biometrics. Voice can be used as a reliable identifier of individuals. This is hardly new. It’s simply been repackaged as some sort of revelation.

As a business asset, it’s a black hole for money at best. It’s for those who know nothing about sales, business, or customers, or who simply lack all basic social skills.

Bear with a little scenario-building here:

  • People’s mindsets change and vary regularly, and often.
  • The voice reflects at best a snapshot of a moving mindset.
  • The source of negative factors may have nothing at all to do with the phone call.
  • There may be some external mood-adjusting factor.  
  • Algorithms may read the effects, but can’t possibly manage or identify the cause of the mood variations.

Hm. You mean they people-critters done got all het up about things and you and your algorithm don’t know why? Gosh.

Yep, gosh, and in this case, gush. The languid pseudo-tech babble hype about “analytics” leaves a lot to be desired. You can actually hear this tech dating by the second. Real market analysts and real tech people wouldn’t get so casual about this “aspirin for sales” approach. It could easily be proven wrong, and/or expensive.

It’s also, interestingly, quite unnecessary. Most humans (sorry to keep bringing up this unpleasant subject) have things called minds. Nobody knows or cares why, but there it is. These minds are highly tuned to (may this never be a buzzword) “social factor recognition”.

Facial recognition is well-known as a major priority in human hardwiring. Voice and threat recognition are pretty much in the same bracket. Good customer service people have exceptional skills in these areas. That’s why, and how, they sell.

A good salesperson will hear a ticked-off voice and turn it around. You can actually do people a favor this way. A lousy day turns into a sort of remedial buy-something-you-actually-want day. You can get people out of negative mindsets.

You can also find out what the hell is bothering them, and fix it. Nobody rings customer service for the entertainment. They want something. The algorithm is blissfully unaware of this very obvious fact, and simply reads things before or during any sort of interaction.

The Rube Goldberg Factor

Rube Goldberg was the cartoonist who dreamed up those truly fun, unnecessarily complex machines and processes. Think of the cartoonish scenes where you need a cruise ship to make a piece of toast, etc., or as Daffy Duck said, “Hey bub! You need a house to go with this doorknob!”

The thing here is the sheer scale and complexity of things you don’t need. That’s one of the ways of looking at voice analysis. Analysis can’t manage the issues, or affect them in any way. It takes hands-on, eyes-on, wits-on people to manage any of the actual situations.

Despite being totally unimpressed and inclined to ring up the Smithsonian for a finder’s fee, to be fair to this tech:

  • Proper calibration could find a lot of useful information about respiratory conditions, nasal obstructions, etc.
  • Speech centers may also be analyzed with good calibration to pick up neurological hiccups, impediments, etc. (It’s a dots-joining exercise.)
  • No doubt at all that sound and tissue resonance do have a lot to say about each other. Not just ultrasound, but also infrasound, etc., which relates to tissues.

It’s as a reliable psych and sales tool that I’m not at all sold on this tech.

Spotify’s patent, backlash, and a few absurdities

Spotify’s patent is a strange beast indeed. It refers directly to the “emotional state of a speaker”, which many people would consider private by definition. It also refers to age, gender, and many other things anyone with ears could easily identify.

The signal is filtered and formatted before being retrieved. …Does this mean it’s “sanitized” for analytical purposes? If so, it’s quite capable of defeating its own purpose. Removing some aspects of a voice can change interpretative values pretty easily.

The patent also factors in historical listening patterns and other not-very-well-defined things to create a playlist based on “taste”. Whose taste? The algorithm’s? The statistical base, plus or minus biases?

Nobody listens to music like that.

OK…Let’s put this in the wash and see what could happen:

  • Someone coming off a week-long meth-smoking binge gets the Oldies Channel stuff. (…And probably doesn’t notice…)
  • Someone who’s just crashed their car and been busted for something trivial but expensive gets Death Metal. (…And definitely does notice…)
  • A tweaked calibration bridges the gap between reggae and Good Old Boys anthems.
  • The listener likes jazz and is therefore un-analyzable. (There are many good reasons for liking jazz, and this is one of them. By the way, can we be a bit less “relaxed” with new jazz? It’s getting obscene.)
  • The listener likes new advanced indie tech music and is therefore un-analyzable. The playlist hasn’t caught up with the new stuff.  

Suggestion: Leave sales to the salespeople, who have reflexes this clockwork carrot of a technology will never have. Use the tech for something it can do, not something it can’t.

Spotify, if you have a brain in your heads (which, according to us musicians, has yet to be proven in dollar terms), rethink this thing. Never mind the ethics, never mind the human issues, never mind the possible class actions.

The damn thing just can’t and won’t work. If you want to throw money at things, do user discount deals, do better royalties, have lives, whatever.

Written By

Editor-at-Large based in Sydney, Australia.
