API.audio now makes more than 250 standard voices available to you. To top it off, we have various specialized voices available on request (like the voice that drove our Einstein avatar), and of course, you can also clone your own voice(s). It has never been easier to build innovative-sounding audio. Get ready to build!
In 2018, we felt inspired by the new generation of speech synthesis models and saw a looming horizon of audio possibilities. So, we quit our jobs and set out to build a consumer audio app, and, well... failed miserably. There was just no infrastructure for developers to create modern audio applications, and everything we touched took ages to build. So instead, we built API.audio - a one-stop-shop audio production tool for developers that pushes the boundaries of audio innovation. We took care of all the nitty-gritty so you can start building right away.
We created API.audio knowing that audio can be so much more than just a voice. Still, it is the text-to-speech feature that makes audio scalable, which was never possible before. That’s why we’re going to explain it in greater detail here.
One of the most powerful instruments ready for you within API.audio is the world’s most extensive artificial voice library - all available in one single API. Here’s an overview:
One API to rule them all
API.audio currently integrates 7 different text-to-speech providers, allowing us to offer more than 250 voices through one API.
The tool also offers a large number of specialized voices tailored to specific speaking scenarios and needs. You can add the specialized voices to your account and use them in the same way as the voices mentioned above. You can also have them speak to one another, like our Albert Einstein and Stephen Hawking piece.
Another option is to clone a unique voice (or a thousand specific voices, e.g. those of your users) - you (and only you) can access them through the same API.
Ok, it looks like we’re onto something here. However, how does this benefit you in building your product?
More is not always better, but when it comes to synthetic voices, it definitely is.
Speech can be complex and having as many options as possible definitely helps. At Aflorithmic, new voices come out each week, and we'll gradually add them into the API for you to access freely.
But why do you need so many?
Great-sounding audio is in the details: differences between voices are subtle, but they make all the difference in how your users interact with your product. When you think about your audio experience, do you want the tone to sound active or reflective? Will the voice need to appeal to younger crowds, or would a local accent increase resonance? Do you want a unique branded voice or the voice of your company’s CEO? And producing content in different languages is where synthetic audio truly shines (API.audio even has Telugu and Norwegian Bokmål).
But can't so many voice options be quite overwhelming?
Exactly. That is why our experts annotate and curate all voices, so you don't have to. You can get a glimpse here on our overview page, where you can sort and filter by language, gender, rating, and much more. We unify and digest the different offerings to make your audio production easier than ever.
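To give a feel for what sorting and filtering an annotated voice catalog can look like in your own code, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Voice` fields, the sample entries, and the `find_voices` helper are hypothetical and are not the API.audio data model or SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical annotation record for one voice (not the real schema)."""
    name: str
    provider: str
    language: str
    gender: str
    rating: float


# Made-up sample entries, only here to make the sketch runnable.
CATALOG = [
    Voice("ava", "provider-a", "en-US", "female", 4.6),
    Voice("liam", "provider-b", "en-GB", "male", 4.2),
    Voice("ingrid", "provider-a", "nb-NO", "female", 4.8),
    Voice("ravi", "provider-c", "te-IN", "male", 4.1),
    Voice("noah", "provider-c", "en-US", "male", 4.9),
]


def find_voices(catalog, language=None, gender=None, min_rating=0.0):
    """Return voices matching the given filters, best rated first."""
    matches = [
        voice
        for voice in catalog
        if (language is None or voice.language == language)
        and (gender is None or voice.gender == gender)
        and voice.rating >= min_rating
    ]
    return sorted(matches, key=lambda voice: voice.rating, reverse=True)


if __name__ == "__main__":
    for voice in find_voices(CATALOG, language="en-US"):
        print(voice.name, voice.provider, voice.rating)
```

Keeping the filters optional means the same helper can back a simple dropdown ("all Norwegian voices") as well as a stricter search ("female voices rated 4.5 or higher").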
Moving beyond text-to-speech
Once you have found the right voice, next is making sure the audio you produce with it sounds great. API.audio has a bag full of little helpers that make that possible for you (e.g. filters and audio processing or content ingestion and formatting).
You can also easily do complex things like produce a track where several voices (maybe even from different providers) speak and integrate with an original voice recording that you or your voice artists upload. We also fully automate a proper studio production process, but we will tell you about that some other time ;)
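To illustrate why combining voices from different sources takes some care, here is a small Python sketch of one step in such a pipeline: bringing every segment to the same peak level before joining the segments into a single track. It is a simplified illustration, not how API.audio does it internally. It assumes all segments are mono, share one sample rate, and are plain lists of floats between -1.0 and 1.0. The function names are hypothetical.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples so that the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty segment) cannot be scaled; return a copy.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def build_track(segments, gap_samples=0, target_peak=0.9):
    """Normalize each segment, then join them with silence in between."""
    track = []
    for index, segment in enumerate(segments):
        if index > 0:
            track.extend([0.0] * gap_samples)
        track.extend(peak_normalize(segment, target_peak))
    return track


if __name__ == "__main__":
    quiet_voice = [0.0, 0.1, -0.2, 0.1]   # e.g. a softly recorded upload
    loud_voice = [0.0, 0.8, -1.0, 0.5]    # e.g. a hot synthetic voice
    track = build_track([quiet_voice, loud_voice], gap_samples=2)
    print(len(track))  # 4 + 2 + 4 samples
```

Peak normalization is the simplest option. A production pipeline would more likely match perceived loudness and resample where sample rates differ.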
Lastly, it will simply save you a lot of time and complexity. One API (API.audio) also means only one invoice. You do not need to hunt for the latest voices (we do that for you), developers are spared many headaches, and even if you do not have any audio expertise, we make sure your audio still sounds fantastic.

If you are a developer, here are a few advantages from your side of the business:
- No need to integrate and maintain several APIs. We do this for you.
- Our tooling is created by developers for developers. It is nicely documented and built to save you work.
- SSML (Speech Synthesis Markup Language) is not as standard as the name might suggest. We came up with a robust approach that makes different voices play nicely with each other without you or your user needing to dive into the details.
- Audio gets complicated quite quickly. Different sample rates or levels, normalisation, effects. Yes, you can fix most of it with ffmpeg, but we have already built it all so it's ready for you to use.
- One of the main reasons people use synthetic voice is scalability, which normally goes hand in hand with versioning and/or personalization. With API.audio, you can easily import (text) content or connect to your CMS, version or personalize that content and transform it into a format that translates well into speech.
- Latency. If you need fast response times, we can make it happen. We also baked smart caching into the voice creation process, which often saves a lot on cost.
- A lot of the complexity we encounter is in the follow-on processing once a speech file is created. We make it easy to put the file where you want it, in the format(s) needed, even embedded in a professional-sounding audio production if you want.
- Lastly, there are many voices out there, and we make it easy to present them to your business user or creator. We annotate them, make them searchable and sortable, and attach samples and pictures, ready for you to build beautiful front ends and previews.
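The personalization and caching points above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the API.audio implementation: `SpeechCache`, `personalize`, and `fake_synthesize` are hypothetical names, and the synthesizer is a stand-in that returns placeholder bytes instead of calling a real text-to-speech provider. The idea is that identical (text, voice) pairs are rendered only once.

```python
import hashlib
import json


class SpeechCache:
    """Cache rendered speech, keyed by a hash of the text and the voice."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable(text, voice) -> bytes
        self._store = {}
        self.misses = 0  # number of times the synthesizer was actually called

    @staticmethod
    def _key(text, voice):
        payload = json.dumps({"text": text, "voice": voice}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, text, voice):
        key = self._key(text, voice)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(text, voice)
        return self._store[key]


def personalize(template, **fields):
    """Fill a script template with per-user values."""
    return template.format(**fields)


def fake_synthesize(text, voice):
    """Stand-in for a real TTS call; returns placeholder bytes."""
    return f"{voice}:{text}".encode("utf-8")


if __name__ == "__main__":
    cache = SpeechCache(fake_synthesize)
    template = "Hello {name}, your order has shipped."
    for name in ["Ada", "Grace", "Ada"]:
        cache.get(personalize(template, name=name), voice="ava")
    print(cache.misses)  # prints 2: the second "Ada" request is a cache hit
```

Hashing the full request rather than just the text keeps renders of the same script in different voices apart, while repeated requests for the same personalized script cost nothing extra.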
Sounds good, right? Don't forget to check out these links for more:
- Click here to start building with us
- Here to check out our voice library
- If you have heard a great voice that we are missing in API.audio, let us know, and we'll integrate it
- If you are a voice actor and want to hear your voice in API.audio, contact us here
- If you are a TTS provider that has created fantastic-sounding voices, contact us, and we'll integrate them
- You can also just say hi if you want - we'd love to hear from you
Conclusion
Audio is increasingly becoming an essential part of the media landscape. Speech synthesis is getting ever closer to human-like characteristics, redefining what audio technology can do. With API.audio, you can access an infrastructure that combines all major speech providers and goes beyond rendering scripts with their voices. For the first time, you have a tool at your disposal that enables you to use the incredible power of synthetic media. It allows you to create one-to-one conversations with customers, personalized content based on their interests, real-time discussions with chatbots, and much more. Now is the time to build scalable audio.
About:
Aflorithmic is a London/Barcelona-based technology company. Its api.audio platform enables fully automated, scalable audio production by using synthetic media, voice cloning, and audio mastering, to then deliver it on any device, such as websites, mobile apps, or smart speakers.
With this Audio-As-A-Service, anybody can create beautiful-sounding audio, from simple text to productions that include music and complex audio engineering, with no previous experience required.
The team consists of highly skilled specialists in machine learning, software development, voice synthesis, AI research, audio engineering, and product development.