Making Audio Production Scalable

You might not realize it, but audio production is manual, slow and expensive. This is about to change. Read how and why here.

When you think of it, audio production is... Well, do you actually think of anything at all?

The last five years have seen a strong increase in the consumption of audio. Statistically, half of all Americans aged 12-34 listen to at least one podcast a month. And it doesn’t stop there: Americans aged 55+ listen to 29% more podcasts compared to last year. Now, for the first time, more people listen to spoken audio content than go to church.

The latest contributor to the rise of audio has been the advent of the audio social network Clubhouse. It seems to have struck a nerve with people around the globe by creating a real-time audio forum where anybody can join the conversation. So much quicker than typing posts!

But how do you “make” audio? Do you just take your phone, record something and then post it? Well, this is how a lot of people start. But then, little by little, something happens: you buy better microphones, look for 'signature' music, add sound effects and play around with editing software. THAT is audio production.

Audio production is a horizontal technology, meaning that many people need it across all kinds of sectors. Yet it hasn’t been automated to any great extent so far. You still need to have a recording studio, hire voice talent (unless it’s yourself), find the right music (and own the copyright) and engineer ('mix') the entire thing. It requires a whole bunch of expert knowledge and it costs time and money.

What if you could automate that process and make it scalable? What if you could do it so quickly that you could send dynamic, personalized audio tracks to each and every one of your listeners at the precise moment they would appreciate it most?

Unfortunately, you can’t... yet. Not with the current status quo. But thankfully, advancements in tech that change the way we think about audio are on the way. Text-to-Speech (TTS) and Voice Cloning allow you to create limitless versions of audio snippets using synthetic speakers. This field is developing so rapidly that we will soon not be able to distinguish a real speaker from an artificial one.

But how do you get from the artificial voice actor to audio that listeners would love to listen to? This is where Aflorithmic enters the conversation (pun intended). Using our API and underlying technology, anybody can access a fully automated studio. It contains professional-sounding speakers, music libraries, personalization parameters and some amazing sound engineers. And all of them are AI. Starting with a text, you can create infinite versions of your audio, produce them on demand and deliver them wherever you want: websites, mobile apps, smart speakers - you name it!


Imagine any book could be rewritten to star you. It could be read in the voice of your favorite superhero, your grandpa or aunt Mary who lives too far away to visit. You would be the main protagonist and while reading, your story would change depending on your decisions and actions.

Think of your favorite “Choose Your Own Adventure” book and imagine it being an audio story. It would be a truly magical experience and a great way to 'gamify' education for children. No more “Maths is boring”! But how would you make this happen? You would need deep experience in audio production, delivered on demand, in unlimited versions and with any voice you choose. Oh, and it should sound great too, for an extra immersive experience.

"They're talking about my sister!"


A lot of apps rely on audio: meditation and mindfulness apps such as Calm or Headspace, or fitness and sports hits such as Peloton or Mirror.

Wouldn’t it be amazing if you could get your personal meditation journey, delivering an evolving experience with the speaker of your choice, your favorite music and references to earlier sessions?

We created the world's first cloned meditation trainer, Will from Beejameditation. Listen to an example of how his natural voice sounds, versus an automated production using his cloned voice, sound layering and mastering. All Will had to do was type a sentence and click "Produce". He never said the words in the second audio file. In this way, Will can ‘speak’ directly to each of his listeners without having to record thousands and thousands of names and messages.

And couldn’t you get the last 5% of your workout motivation from prompts of your virtual coach, who would let you know that you’ve “just taken the lead” in your friends’ workout group? They would address you by name and inform you that you will hit a “new personal best” if you “increase your cadence by 15% for the last two minutes of the workout”. “COME ON MATT, YOU CAN DO IT!”

Given the unlimited number of possible combinations of names and messages, it would be impossible for a human to deliver the exact information at the right time. For that, you need a scalable, automated audio production solution that starts from text.
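To get a feel for how quickly those combinations add up, here is a minimal Python sketch. It only generates the personalized texts that an automated production chain would start from; it does not produce any audio, and it is not our API. The template wording, the `render_scripts` helper and the sample names are all assumptions made up for illustration.

```python
from itertools import product
from string import Template

# Hypothetical script template; the placeholder names are illustrative only.
SCRIPT = Template("Come on $name, you can do it! $message")


def render_scripts(names, messages):
    """Return one personalized script text per (name, message) pair."""
    return {
        (name, message): SCRIPT.substitute(name=name, message=message)
        for name, message in product(names, messages)
    }


# Sample data, invented for this sketch.
names = ["Matt", "Paula", "Gerry"]
messages = [
    "You've just taken the lead in your friends' workout group.",
    "Increase your cadence by 15% for the last two minutes.",
]

scripts = render_scripts(names, messages)
print(len(scripts))  # 3 names x 2 messages = 6 scripts
print(scripts[("Matt", messages[0])])
```

Three names and two messages already give six scripts. With thousands of listeners and live workout data, the number of texts grows far beyond what anyone could record by hand.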


What does a good salesperson do? They know what you want and help you find it at the best price. There is a human element to this and it’s huge. Good salespeople are hard to find.

However, there is a lot of information that can be delivered to potential customers strolling through your website or shop. Imagine you could add a voiceover to your listing, helping customers to understand how this product or service is the right choice for them.

What easily comes to mind is real estate, boat sales or any other product that requires in-depth explanation. Wouldn’t it be great if you had a virtual walkthrough of an apartment, personalised to you, that would give you a voiceover as you discover the rooms?

Real estate agents would type or paste a description into a text file, choose a celebrity as a speaker, add the best matching background music and let the automated production chain create a personalized voiceover for their users.

‘Paula from London’ who is looking for something cozy and quiet would get a different audio than ‘Gerry’, who wants an apartment close to a kindergarten. The dynamic, personalized audio could highlight those things to each of them, just like a great salesperson.
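As a rough illustration of that idea, the Python sketch below picks which parts of a listing to mention based on what each visitor is looking for. It only assembles the text that would later be turned into a voiceover. The `Visitor` class, the `build_script` helper and the feature sentences are hypothetical and invented for this example; they do not describe a real listing or our product.

```python
from dataclasses import dataclass, field


@dataclass
class Visitor:
    name: str
    wants: set = field(default_factory=set)


# Hypothetical listing highlights, keyed by the preference they address.
FEATURES = {
    "quiet": "The bedroom faces a calm inner courtyard.",
    "cozy": "The living room is small, warm and full of light.",
    "kindergarten": "A kindergarten is a short walk away.",
}


def build_script(visitor, features=FEATURES):
    """Build a voiceover text that mentions only what this visitor cares about."""
    lines = [f"Hi {visitor.name}, welcome to your walkthrough."]
    lines += [text for key, text in features.items() if key in visitor.wants]
    return " ".join(lines)


paula = Visitor("Paula", {"cozy", "quiet"})
gerry = Visitor("Gerry", {"kindergarten"})

print(build_script(paula))
print(build_script(gerry))
```

Paula's text covers the quiet and cozy highlights, while Gerry's mentions only the kindergarten. Each text would then go through the same automated production steps.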

In this blog, we’ve outlined just three examples where automated audio production adds significant value. We believe that an automated audio production service is much more than just text-to-speech. You need beautiful-sounding audio that is personalized to your users, with a wide choice of background music and with a voiceover that sounds just like a professional recording. You could even connect your Slack channel as an input source or combine your tweets, make an audio track out of them and publish that mini podcast on your website or LinkedIn profile. All you need to start is a text you’d like to be read out loud.

It’s time to make your audio production scalable.

This is how we used to work in Pre-Covid times...


Aflorithmic is a London/Barcelona-based technology company. Its platform enables fully automated, scalable audio production by using synthetic media, voice cloning, and audio mastering, to then deliver it on any device, such as websites, mobile apps, or smart speakers.

With this Audio-As-A-Service, anybody can create beautiful-sounding audio, from a simple text through to music and complex audio engineering, with no previous experience required.

The team consists of highly skilled specialists in machine learning, software development, voice synthesis, AI research, audio engineering, and product development.