AWS Polly

Amazon has made it easy for developers to build applications that can talk. The new AWS Polly service lets you pass in text, which it turns into lifelike speech in real time. It supports more than 24 languages in a variety of male and female voices, and the output is impressive compared with similar offerings. Advanced users can also use SSML, custom lexicons (pronunciation dictionaries), Speech Marks and more to customize the speech to their needs. You can sign up or log in to the AWS Console to see Polly in action and play around with it.
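Polly treats its input as plain text by default. To use SSML you pass TextType: 'ssml' and wrap the input in a <speak> element, escaping characters such as & and <, which would otherwise make the markup invalid. A minimal sketch, assuming the same OutputFormat and VoiceId as the polly.js example later in this post; escapeXml and ssmlParams are hypothetical helpers, not part of the SDK:

```javascript
// Hypothetical helpers: escape plain text for XML and wrap it in an SSML
// <speak> element, producing synthesizeSpeech parameters for Polly.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function ssmlParams(text, pauseMs) {
  var body = escapeXml(text);
  if (pauseMs) {
    // SSML <break> adds a pause of the given length after the text
    body += '<break time="' + pauseMs + 'ms"/>';
  }
  return {
    OutputFormat: 'pcm',
    VoiceId: 'Raveena',
    TextType: 'ssml', // tells Polly to parse Text as SSML instead of plain text
    Text: '<speak>' + body + '</speak>'
  };
}
```

The returned object can be passed to Polly.synthesizeSpeech in place of plain-text parameters; for example, ssmlParams('Hello & welcome', 500) speaks the sentence and then pauses for half a second.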

A free tier of 5 million characters, with pay-as-you-go pricing of $4.00 per million characters thereafter, makes it a ‘must add’ magic ingredient for your apps! Moreover, responses can be saved and cached for offline replay, which can reduce costs significantly.
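To make the pricing concrete, here is a rough estimator that uses only the figures quoted above. It assumes the 5 million free characters and the $4.00 rate apply to the same billing period, and it is no substitute for the AWS pricing page; estimateCostUSD is a hypothetical helper:

```javascript
// Rough Polly cost estimate based on the figures quoted in this post
// (assumption: 5M free characters per billing period, then $4.00 per 1M).
var FREE_CHARS = 5000000;
var USD_PER_MILLION_CHARS = 4.0;

function estimateCostUSD(totalChars) {
  // Only characters beyond the free tier are billed
  var billable = Math.max(0, totalChars - FREE_CHARS);
  return (billable / 1000000) * USD_PER_MILLION_CHARS;
}
```

For example, 12 million characters works out to (12 - 5) × $4.00 = $28.00, and anything up to 5 million characters costs nothing.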


Let’s look at how easy it is to write a script that uses the AWS Polly service to add voice to your text. In most cases you will need to stream the audio response to your front-end interface; for the sake of simplicity, we will create a quick Node.js script that plays it directly on the host’s speaker. This tutorial assumes you are familiar with the basics of Node.js. You will also need a registered AWS account and the access key ID and secret access key of an IAM user authorized to use Polly.

To begin, initialize a Node.js project using the npm init command.

The following npm modules need to be installed in the project:

  1. aws-sdk – the AWS SDK for JavaScript, which provides the interface to the Polly service.
    npm install aws-sdk --save
  2. speaker – a writable stream that plays PCM audio through the host’s sound output.
    npm install speaker --save

Create a generic file, polly.js, that can be included directly in any Node application to convert text to speech. In polly.js, require the modules above and configure your AWS credentials.

var AWS = require('aws-sdk');
var Speaker = require('speaker');
var Stream = require('stream');

//Warning: Don't hardcode your AWS Keys
//Read more at http://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html
var Polly = new AWS.Polly({
    region: 'eu-west-1',
    accessKeyId: 'REPLACETHISWITHYOURACCESSKEYID',
    secretAccessKey: 'rePlaceThisWithYourSecretAccessKey1234567890'
});


Warning!
Don’t hardcode your AWS keys. Read more about AWS’s recommendations for storing access keys in the developer guide linked in the code comment above.
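In Node.js, the AWS SDK for JavaScript v2 can find credentials by itself, for example in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or the shared ~/.aws/credentials file, so the keys can be left out of polly.js entirely. A minimal sketch, assuming you still want the options built explicitly; pollyOptionsFromEnv is a hypothetical helper, not part of the SDK:

```javascript
// Hypothetical helper: build the options for `new AWS.Polly(...)` from the
// environment instead of hardcoded strings.
function pollyOptionsFromEnv(env) {
  env = env || process.env;
  var options = { region: env.AWS_REGION || 'eu-west-1' };
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    options.accessKeyId = env.AWS_ACCESS_KEY_ID;
    options.secretAccessKey = env.AWS_SECRET_ACCESS_KEY;
  }
  // Without explicit keys, the SDK falls back to its default credential chain
  return options;
}
```

In polly.js this would be used as var Polly = new AWS.Polly(pollyOptionsFromEnv(process.env)); so no secret ever appears in the source file.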

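Next, prepare the Speaker to receive PCM audio from Polly. The parameters depend on the output format: Polly’s PCM output is signed 16-bit, mono (1 channel), little-endian, with a default sample rate of 16000 Hz, and the settings below match that.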

var getPlayer = function() {
    // Must match Polly's PCM output: mono, 16-bit samples, 16000 Hz
    return new Speaker({
        channels: 1,
        bitDepth: 16,
        sampleRate: 16000
    });
};

Because this is a reusable wrapper, getPlayer returns a new Speaker instance on every call: a Speaker is a writable stream, and once the piped audio has ended it cannot be written to again.
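The Speaker settings also tell you how much audio a response contains. Raw PCM has no header, so one second takes sampleRate × channels × (bitDepth / 8) bytes, which is 16000 × 1 × 2 = 32,000 bytes for Polly’s default PCM output. A small sketch; pcmDurationSeconds is a hypothetical helper, not part of either library:

```javascript
// Duration of a raw PCM buffer, given the same settings passed to Speaker.
// Defaults match Polly's default PCM output (mono, 16-bit, 16000 Hz).
function pcmDurationSeconds(byteLength, opts) {
  opts = opts || {};
  var sampleRate = opts.sampleRate || 16000;
  var bitDepth = opts.bitDepth || 16;
  var channels = opts.channels || 1;
  var bytesPerSecond = sampleRate * channels * (bitDepth / 8);
  return byteLength / bytesPerSecond;
}
```

If you request SampleRate: '8000' from Polly instead, pass the same 8000 to both getPlayer’s Speaker options and this helper, or playback will run at the wrong speed.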

With the Speaker configured, all we need to do is call Polly.synthesizeSpeech with the text and the other parameters, then stream the response into a new Speaker instance. The speak function is exported, so it can be called from any JavaScript code in your application.

var speak = function(text) {
    // Build fresh parameters per call so concurrent calls don't share a mutated object
    var params = {
        OutputFormat: 'pcm',
        VoiceId: 'Raveena',
        Text: text
    };
    Polly.synthesizeSpeech(params, function(err, res) {
        if (err) {
            console.error('Polly error:', err);
        } else if (res && res.AudioStream instanceof Buffer) {
            // Wrap the audio Buffer in a readable stream and pipe it to a new Speaker
            var bufferStream = new Stream.PassThrough();
            bufferStream.end(res.AudioStream);
            bufferStream.pipe(getPlayer());
        }
    });
};

module.exports = { Speak: speak };
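As noted in the pricing paragraph, caching responses avoids paying twice for the same text. A minimal in-memory sketch, assuming an AWS SDK v2 Polly client, whose synthesizeSpeech(params).promise() resolves to an object with an AudioStream Buffer; createCachedSynthesizer is a hypothetical helper, and its cache grows without bound for the lifetime of the process:

```javascript
// Sketch (assumes an AWS SDK v2 Polly client): cache synthesized audio in memory
// so repeated texts trigger only one synthesizeSpeech request.
function createCachedSynthesizer(client, voiceId, outputFormat) {
  var cache = new Map(); // key -> Promise resolving to the audio Buffer
  return function synthesize(text) {
    var key = voiceId + '|' + outputFormat + '|' + text;
    if (!cache.has(key)) {
      var request = client
        .synthesizeSpeech({ OutputFormat: outputFormat, VoiceId: voiceId, Text: text })
        .promise()
        .then(function (res) { return res.AudioStream; });
      // Cache the promise itself so concurrent calls for the same text share one request
      cache.set(key, request);
      // Forget failed requests so the next call can retry
      request.catch(function () { cache.delete(key); });
    }
    return cache.get(key);
  };
}
```

With var synth = createCachedSynthesizer(Polly, 'Raveena', 'pcm');, a call such as synth(text).then(...) yields a Buffer that can be piped to getPlayer() exactly as in speak. Writing those Buffers to disk instead of a Map would make the replay work offline.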

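VoiceId can be any valid voice ID; you can browse the available voices in the Polly console. The other synthesizeSpeech parameters, such as SampleRate and TextType, are described in the Polly API reference.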
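Voice IDs can also be looked up from code. A sketch, assuming the SDK v2 client from polly.js: describeVoices accepts an optional LanguageCode filter and resolves to an object whose Voices array holds entries with Id, Gender and LanguageCode fields. listVoiceIds is a hypothetical helper, and for brevity it ignores pagination (NextToken):

```javascript
// Sketch (assumes an AWS SDK v2 Polly client): list voice IDs for a language,
// optionally keeping only one gender ('Female' or 'Male').
function listVoiceIds(client, languageCode, gender) {
  return client
    .describeVoices({ LanguageCode: languageCode })
    .promise()
    .then(function (data) {
      return data.Voices
        .filter(function (v) { return !gender || v.Gender === gender; })
        .map(function (v) { return v.Id; });
    });
}
```

For example, listVoiceIds(Polly, 'en-IN', 'Female').then(console.log) prints the female Indian English voices available in your region.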

We are all set. Now, from any file, say index.js, require polly.js and call the Speak function:

var Polly = require('./polly');
Polly.Speak('Node.js is beautiful and so is AWS Polly!');

Run index.js with the node index.js command and let your computer talk to you 🙂

As mentioned, this example streams audio to the host’s speaker, but with a little work the audio can also be streamed to a browser or a phone app. As it stands, the example is handy if you are working on an IoT project on a Raspberry Pi or a similar device.
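For the browser case, one option is to ask Polly for MP3 instead of PCM and serve it over HTTP so an <audio> element can play it. A minimal sketch using only Node’s built-in http module, assuming an AWS SDK v2 Polly client; createSpeechServer and the ?text= query parameter are hypothetical choices, and there is no authentication, so anyone who can reach the server can spend your Polly characters:

```javascript
var http = require('http');

// Sketch: serve Polly MP3 output over HTTP so a browser <audio> element can play it.
// Assumes `client` is an AWS SDK v2 Polly instance (synthesizeSpeech(params).promise()).
function createSpeechServer(client, voiceId) {
  return http.createServer(function (req, res) {
    // Read the text from the query string, e.g. /speak?text=Hello
    var text = new URL(req.url, 'http://localhost').searchParams.get('text');
    if (!text) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Missing ?text= parameter');
      return;
    }
    client
      .synthesizeSpeech({ OutputFormat: 'mp3', VoiceId: voiceId, Text: text })
      .promise()
      .then(function (data) {
        // In SDK v2 for Node, AudioStream is a Buffer holding the whole MP3
        res.writeHead(200, { 'Content-Type': 'audio/mpeg' });
        res.end(data.AudioStream);
      })
      .catch(function (err) {
        res.writeHead(500, { 'Content-Type': 'text/plain' });
        res.end('Speech synthesis failed: ' + err.message);
      });
  });
}
```

After createSpeechServer(Polly, 'Raveena').listen(3000), a page can play speech with <audio controls src="http://localhost:3000/speak?text=Hello"></audio>.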

Let us know in the comments how you used Polly in your project.
