Raspberry Pi Voice Recognition Works Like Siri

Raspberry Pi Speech Recognition Introduction

This tutorial demonstrates how to use voice recognition on the Raspberry Pi. By the end, we should have a working application that understands a spoken question and answers it aloud.

This is a simple project because free APIs are available for every step. The application converts a spoken question to text, processes the query to get an answer, and finally turns the answer from text back into speech. I will divide this demonstration into four parts:

  1. speech to text
  2. query processing
  3. text to speech
  4. putting them together

Raspberry Pi Voice Recognition For Home Automation

This has been a very popular topic since the Raspberry Pi came out, and with the help of this tutorial it should be quite easy to achieve. My idea is to combine speech recognition on the Raspberry Pi with its digital/analog I/O hardware to build a useful voice control system, which could also be adopted in robotics and home automation. That will be covered in the next couple of blog posts.

Hardware and Preparation

You can use a USB microphone, but I don’t have one, so I am using the built-in mic on my webcam. It worked straight away without any driver installation or configuration.

You will, of course, need the Raspberry Pi as well.

You will also need an internet connection on your Raspberry Pi.

Speech To Text

Speech recognition can be achieved in many ways on Linux (and therefore on the Raspberry Pi), but I think the easiest way is to use Google’s voice recognition API. The accuracy is very good, even with my strong accent. The recording step relies on ffmpeg, so first make sure it is installed:

sudo apt-get install ffmpeg

To use Google’s voice recognition API, I use the following bash script. Copy it and save it as “speech2text.sh”:


#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

echo -n "You Said: "
cat stt.txt

rm file.flac  > /dev/null 2>&1

The script starts recording and saves the audio in a FLAC file. You can stop the recording by pressing Ctrl+C. The audio file is then sent to Google for conversion, and the returned text is saved in a file called “stt.txt”. Finally, the audio file is deleted.
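
The “cut -d\" -f12” step picks the twelfth quote-delimited field out of the reply, which only works while the reply keeps exactly the same layout. Here is a rough Python sketch of the same extraction done by parsing the reply as JSON instead. The sample reply is hypothetical: its field names (“hypotheses”, “utterance”) are an assumption inferred from where the cut command expects the text to sit, and “extract_utterance” is a name made up for this illustration.

```python
import json


def extract_utterance(response_body):
    """Return the top transcription from a recognizer reply, or "" if none."""
    try:
        data = json.loads(response_body)
    except ValueError:
        # Empty or non-JSON body, for example when the request failed.
        return ""
    if not isinstance(data, dict):
        return ""
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return ""
    return hypotheses[0].get("utterance", "")


# Hypothetical reply, laid out so that the 12th quote-delimited field
# is the transcription, which is what the cut command relies on.
sample = ('{"status":0,"id":"abc123","hypotheses":'
          '[{"utterance":"what time is it","confidence":0.92}]}')
print(extract_utterance(sample))
```

Unlike the cut approach, this returns an empty string rather than a stray fragment when the reply is empty or contains no hypotheses.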

Make it executable:

chmod +x speech2text.sh

To run it:

./speech2text.sh

The screenshot shows some tests I did.

Query Processing

Processing the query is just like “Googling” a question, except that we want exactly one answer returned for each question. Wolfram Alpha seems to be a good choice here.

There is a Python interface library for it, which makes our life much easier, but you need to install it first.

Installing Wolframalpha Python Library

Download the package from https://pypi.python.org/pypi/wolframalpha and unzip it somewhere. Then install setuptools and pip, and build the package:

sudo apt-get install python-setuptools python-pip
sudo python setup.py build

And finally run the installer:

sudo python setup.py install

Getting the APP_ID

To get a unique Wolfram Alpha AppID, sign up here for a Wolfram Alpha Application ID.

Once you are signed in to the Wolfram Alpha Developer Portal, go to the My Apps tab, click the “Get an AppID” button and fill out the “Get a New AppID” form. Use any application name and description you like, then click the “Get AppID” button.

Wolfram Alpha Python Interface

Save this Python script as “queryprocess.py”.


#!/usr/bin/python

import wolframalpha
import sys

# Get a free API key here http://products.wolframalpha.com/api/
# This is a fake ID, go and get your own, instructions on my blog.
app_id='HYO4TL-A9QOUALOPX'

client = wolframalpha.Client(app_id)

query = ' '.join(sys.argv[1:])
res = client.query(query)

# Index 1 is read below, so at least two pods are required.
if len(res.pods) > 1:
    pod = res.pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = "I have no answer for that"
    # Drop non-ASCII characters to avoid a UnicodeEncodeError when printing.
    texts = texts.encode('ascii', 'ignore')
    print texts
else:
    print "Sorry, I am not sure."

You can test it as shown in the screenshot below.
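
The script answers with the text of the second pod (index 1). A reply that contains only one pod has no index 1, so it is worth treating that case as “no answer” too. Below is a small sketch of that selection logic on its own, so it can be tried without an AppID. The “Pod” type is only a stand-in for the objects the wolframalpha library returns, and “pick_answer” is a hypothetical helper, not part of the library.

```python
from collections import namedtuple

# Stand-in for the pod objects returned by the wolframalpha library;
# only the "text" attribute used by queryprocess.py is modelled here.
Pod = namedtuple("Pod", ["text"])


def pick_answer(pods):
    """Return the text of the second pod, or a fallback message."""
    if len(pods) < 2:
        # Nothing at index 1, so there is no answer to read out.
        return "Sorry, I am not sure."
    text = pods[1].text
    if not text:
        # The pod exists but carries no plain text (an image, for example).
        return "I have no answer for that"
    return text


print(pick_answer([Pod("capital of France"), Pod("Paris, France")]))
print(pick_answer([Pod("capital of France")]))
```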

Text To Speech

The processed query returns an answer as text. What we need to do now is turn that text into audible speech. There are a few options available, such as Cepstral or Festival, but I chose Google’s speech service for its excellent quality. Here is a good introduction to the software mentioned.

First of all, to play audio we need to install mplayer:

sudo apt-get install mplayer

The following simple bash script downloads the MP3 file via the URL and plays it. Save it as “text2speech.sh”:


#!/bin/bash
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }
say $*

Make it executable:

chmod +x text2speech.sh

To test it, you can try:

./text2speech.sh "My name is Oscar and I am testing the audio."
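
The bash script builds the URL by joining the words with “+”, which leaves characters such as “&” or “?” in the answer unescaped. As an illustration only, here is how the same URL could be built in Python with proper escaping. The endpoint and the “tl” and “q” parameters are the ones used in the script above; “build_tts_url” is a name invented for this sketch.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def build_tts_url(text, language="en"):
    """Build the request URL that text2speech.sh hands to mplayer."""
    # A list of pairs keeps the parameter order stable; urlencode turns
    # spaces into "+" and percent-escapes characters such as "&".
    return TTS_ENDPOINT + "?" + urlencode([("tl", language), ("q", text)])


print(build_tts_url("My name is Oscar and I am testing the audio."))
```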

Google Text To Speech Text Length Limitation

Although it’s very kind of Google to share this great service, there is a limit on the length of the message. I think it’s around 100 characters.

To work around this, here is an upgraded bash script that breaks the text into multiple parts, each shorter than 100 characters, so that every part can be played successfully. I modified the original script from here to fit our application.


#!/bin/bash

# Play the given words through Google's TTS; the words are joined with "+".
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }

INPUT=$*
STRINGNUM=0
ary=($INPUT)

# Pack the words into chunks shorter than 100 characters.
for key in "${!ary[@]}"
do
    SHORTTMP[$STRINGNUM]="${SHORTTMP[$STRINGNUM]} ${ary[$key]}"
    LENGTH=${#SHORTTMP[$STRINGNUM]}

    if [[ "$LENGTH" -lt "100" ]]; then
        SHORT[$STRINGNUM]=${SHORTTMP[$STRINGNUM]}
    else
        STRINGNUM=$(($STRINGNUM+1))
        SHORTTMP[$STRINGNUM]="${ary[$key]}"
        SHORT[$STRINGNUM]="${ary[$key]}"
    fi
done

# Speak each chunk in turn.
for key in "${!SHORT[@]}"
do
    say ${SHORT[$key]}
done
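
The chunking idea is easier to follow outside of bash arrays. Here is a minimal Python sketch of the same approach: add words to the current chunk until the next word would push it to the limit, then start a new chunk. The function name “split_for_tts” and the default limit of 100 are assumptions for this example; the real limit of the service is only my estimate.

```python
def split_for_tts(text, limit=100):
    """Split text on whitespace into chunks shorter than `limit` characters.

    A single word that is itself `limit` characters or longer is kept
    as a chunk of its own rather than being cut in the middle.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) < limit or not current:
            # The word fits, or it is the first word of a new chunk.
            current = candidate
        else:
            # The chunk is full: store it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


for part in split_for_tts("word " * 60):
    print(len(part))
```

Each part could then be sent to the speech service one after another, just as the second loop in the bash script does.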

Putting It Together

For all of these scripts to work together, we have to call them from another script. I call this “main.sh”.


#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."

./speech2text.sh

QUESTION=$(cat stt.txt)
echo "Me: $QUESTION"

ANSWER=$(python queryprocess.py $QUESTION)
echo "Robot: $ANSWER"

./text2speech.sh $ANSWER

I have also removed all the “echo” commands from “speech2text.sh”:


#!/bin/bash

arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt
rm file.flac  > /dev/null 2>&1

Finally, make “main.sh” executable, run it and have a silly conversation with your computer :-)

chmod +x main.sh
./main.sh
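
If you prefer to see the whole flow in one place, the three stages can be written as a single function that takes each stage as a parameter. This is only a sketch of the structure of “main.sh”, not a replacement for it: “converse” and its three parameters are hypothetical names, the stages are stubbed out with fixed values here, and the handling of an empty question is an addition that “main.sh” does not have. In real use, each stage would wrap a call to the matching script.

```python
def converse(listen, answer, speak):
    """Run one question/answer round and return (question, reply)."""
    question = listen()
    if not question.strip():
        # Nothing was recognised, so skip the query step entirely.
        reply = "Sorry, I did not catch that."
    else:
        reply = answer(question)
    speak(reply)
    return question, reply


# Stub stages standing in for speech2text.sh, queryprocess.py and
# text2speech.sh, so the flow can be tried without a microphone.
spoken = []
question, reply = converse(
    listen=lambda: "what is the capital of France",
    answer=lambda q: "Paris",
    speak=spoken.append,
)
print("Me: " + question)
print("Robot: " + reply)
```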

The End

That’s the end of the Raspberry Pi voice recognition tutorial, but it’s just the beginning of the fun! You can now modify this project and turn it into something really cool. Let me know what you come up with. In the next project, I will use the speech-to-text feature to build a voice control system for an Arduino board and, even better, a robot.

Have fun.

Note: Errors You May Get

mplayer: could not connect to socket

If you get this error, all you need to do is disable LIRC support as follows:

sudo nano /etc/mplayer/mplayer.conf

And put in the line:

nolirc=yes

And that should sort it out.

UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\xxx in position x: ordinal not in range(128)

I tend to just ignore those characters for now. If you know a good way to convert them, please let me know. In the ‘queryprocess.py’ script, replace the ‘print’ command just above ‘else’ with these lines (which I have already done):

texts = texts.encode('ascii', 'ignore')
print texts
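
One possible way to convert accented letters instead of dropping them is to decompose each character first, so that “é” becomes a plain “e” followed by a separate accent mark, and only then strip whatever is still not ASCII. The sketch below uses the standard “unicodedata” module; “to_ascii” is a name chosen for this example, and characters with no ASCII base letter (a degree sign, for instance) are still dropped.

```python
import unicodedata


def to_ascii(text):
    """Replace accented letters with their base letter, then drop the rest."""
    # NFKD splits a character such as U+00E9 into "e" plus a combining
    # accent; the accent is then discarded by the ASCII encoding step.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii(u"na\u00efve caf\u00e9"))
```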

Some More Errors People Have Been Having

Problems

When I tried sudo apt-get install ffmpeg I got this error:

  • E: Unable to locate package ffmpeg

When I tried apt-get install python-setuptools easy_install pip I got this error:

  • E: Unable to locate package easy_install
  • E: Unable to locate package pip

Solution

Make sure your Pi is connected to the internet when you run these commands. Type the ifconfig command to make sure that your ethernet or wifi adapter has an IP address.

When using APT, you should always first run:

  • sudo apt-get update
  • sudo apt-get upgrade

This re-synchronises the package sources and ensures that Raspbian is up to date: http://linux.die.net/man/8/apt-get

Easy Install (easy_install) is a Python module bundled with setuptools, and the pip package is named python-pip. Therefore:

  • sudo apt-get update
  • sudo apt-get upgrade
  • sudo apt-get install ffmpeg
  • sudo apt-get install python-setuptools
  • sudo apt-get install python-pip
  • sudo apt-get install mplayer

—————————————————–

Problems

When I attempted to run ./queryprocess.py and typed in “what time is it?” I got this error:

  • bash: ./queryprocess.py: Permission denied

Solution

Type this command to make your script executable:

chmod +x ./queryprocess.py

—————————————————–
