Rachel, Content Marketer at CleverX, with a massive thing for people, conversations, and marketplaces.

Transcriptions: Mind Versus Machine

3 min read


Harvard psychologist Daniel Gilbert once said that every psychologist must, at some point in their career, write a version of “The Sentence.” It reads:

The human being is the only animal that ______.

This sentence is all-pervasive today, applicable to every field and every profession. With the rise of robots in the era of hyper-digitization, industries fizzle out overnight, and everything that can be automated is going to be automated. The story of humans’ sense of self is an improvisation on this sentence. Except now, it’s not just the animals we’re competing with.

The ability to speak a language is off the list. If it was intelligence that first set us apart, it is the least of our concerns now. Machines can apply knowledge at mind-numbing speeds. Some of the field’s earliest questions still stand: can machines think? Can they have intelligent minds? The hope here is not merely to romanticize the notion and defend the human race. Nor is this blog going to be another interpretation of Garry Kasparov’s chess match against Deep Blue.

A few facts

Can machine transcription completely replace human transcriptionists? Let’s take an honest look at what’s in store with some facts.

  • If speed matters most to you, then machine-based transcription can be your cup of tea. Various applications and software can be explored online. In minutes or even seconds, you can convert your talk into text. In less than 30 minutes, a transcription app can convert a two-hour video into text. Humans cannot produce results at that pace without compromising on accuracy.
  • Every individual has a unique way of communicating, which machines do not easily tell apart, and varied dialects and occasional background noise make the task even more complex. If your work includes complex terms, relying on machine output can be risky.
  • People involved in fieldwork, like journalists or interviewers, need to engage with different speakers. Their work may involve three or more speakers in a session and three or more sessions in a recording. Machines still struggle to accurately differentiate voices, accents, and personal communication styles. To achieve accurate results, you will need human transcription.
  • The ambiance of a recording is an important part of the transcript. At times you might have to switch away from the machine when background noise makes the task tougher.
  • Content quality will define a person’s sense of selfhood, meaning, and brand perception. Take, for example, video captions, video subtitles, and audio transcripts. These things can hardly be templatized or straitjacketed.

Let’s see what happens when we redraw these battle lines, both on the field of action and from a vantage point.

The devil in the details

Think about captioning, for instance. All the words should be there, in order, with correct grammar and spelling. The placement and flow of the text need to align with the speaker’s tone of voice. ‘To-may-to’ or ‘to-mah-to’ makes a world of difference. Machines can be fast, even if their fluidity and turn-taking suffer and their grammar gets choppy. What is lost in eloquence is made up for in agility.

Meanwhile, on the balcony

Transcribing as a job may be banal in itself. For the money’s worth, bots can be efficient. If your transcription requires a stateless form, where the current discussion carries no knowledge of its history, perhaps they can also be witty enough. You can save your staff the drudgery of typing out conversations, but can you articulate the human behind the text? Can machines learn how to be better friends, artists, teachers, parents, and lovers?

Some of the most profound questions can also be practical ones: How do we connect meaningfully with other human beings? What does empathy have to do with it? How does another human being enter our life as a stranger and come to mean something? And how can machines be more human for all that?

‘Or’ or ‘And’?

The question then isn’t whether we need to choose one over the other. We’ll need to make the best of both. The gig is simple: transcribe, review, approve, and publish. But humans are far too nuanced and complex. Besides, there are other logistical and technical issues that need a human eye. Think engagement rates, search engine optimization, better quality, and formatting.

Some users can make do with the kind of transcription software we have today, like a journalist sweating a deadline or a transcriptionist at a judicial deposition. But to connect with digital-native communities, there’s a long way to go. Podcasts and YouTube videos can’t be standardized. For this reason, the stenographer’s job won’t entirely go the way of the dodo, as the ice-cream man’s did.

You get to pick 

Go with machine transcription if you need an immediate turnaround, have a limited budget, or have a ridiculously large amount of content to transcribe. In each case, make sure you have a clear audio file, accuracy isn’t your top concern, SEO doesn’t matter to you, and you have time on your hands to edit the file.

Hire a transcriptionist if:

  • There are contextual elements that are important,
  • You need to have accurate captions,
  • You need to defend your brand’s perception,
  • You don’t have the time to edit and do rote work,
  • You want to formally publish your work,
  • You want a significant SEO boost,
  • You have a low-quality recording,
  • Or you have a recording with foreign languages, diverse dialects, heavy accents, multiple speakers, or jargon.

Will transcriptionists ever go extinct?

You see, both the temp pecking at her computer and the professional transcriptionist are crucial to many important fields of business, study, and authority. Researchers, educators, government and parliamentary bodies, corporations, and law firms need work done with accuracy, emotion, and speed. Their specific target audiences will need to experience the human behind the text. And we can do better than emoticons.

Our ability to dodge a question, lighten the mood, woo the listener, change the subject, kill time, distract, deflect, and project, with all our broken speech, sighs, ums, ahs, and ughs, is what makes us human. And each new step towards AI will determine how human we are in our documented conversations and how honestly history will be told.

