Raspberry Pi Voice Recognition Works Like Siri

Raspberry-Pi-Siri-voice-recognition

Raspberry Pi Speech Recognition Introduction

This tutorial demonstrate how to use voice recognition on the Raspberry Pi. By the end of this demonstration, we should have a working application that understand and answers your oral question.

This is going to be a simple and easy project because we have a few free API available for all the goals we want to achieve. It basically converts our spoken question into to text, process the query and return the answer, and finally turn the answer from text to speech. I will divide this demonstration into four parts:

  1. speech to text
  2. query processing
  3. text to speech
  4. Putting Them Together

Result Example:

Raspberry Pi Voice Recognition For Home Automation

This has been a very popular topic since Raspberry Pi came out. With the help of this tutorial, it should be quite easily achieved. I actually having an idea of combining the Speech recognition ability on the Raspberry Pi with the powerful digital/analog i/o hardware, to build a useful voice control system, which could also be adopted in Robotics and Home Automation. This will be in the next couple of blog posts.

Hardware and Preparation

Voice speech recognition raspberry Pi

You can use an USB Microphone, but I don’t have one so I am using the built-in Mic on my webcam. It worked straight away without any driver installation or configuration.
8601p

Of course, the Raspberry Pi as well.
images

You will also need to have internet connection on your Raspberry Pi.

Speech To Text

Speech recognition can be achieved in many ways on Linux (so on the Raspberry Pi), but personally I think the easiest way is to use Google voice recognition API. I have to say, the accuracy is very good, given I have a strong accent as well. To ensure recording is setup, you first need to make sure ffmpeg is installed:

sudo apt-get install ffmpeg

To use the Google’s voice recognition API, I use the following bash script. You can simply copy this and save it as ‘speech2text.sh


#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

echo -n "You Said: "
cat stt.txt

rm file.flac  > /dev/null 2>&1

What it does is, it starts recording and save the audio in a flac file. You can stop the recording by pressing CTRL+C. The audio file is then sent to Google for conversion and text will be returned and saved in a file called “stt.txt”. And the audio file will be deleted.

And to make it executable.

chmod +x speech2text.sh

To run it

./speech2text.sh

The screen shot shows you some tests I did.

Raspberry Pi Voice Recognition Works Like Siri

Query Processing

Processing the query is just like “Google-ing” a question, but what we want is when we ask a question, only one answer is returned. Wolfram Alpha seems to be a good choice here.

There is a Python interface library for it, which makes our life much easier, but you need to install it first.

Installing Wolframalpha Python Library

Download package from https://pypi.python.org/pypi/wolframalpha, unzip it somewhere. And then you need to install setuptool and build the setup.

apt-get install python-setuptools easy_install pip
sudo python setup.py build

And finally run the setup.

sudo python setup.py

Getting the APP_ID

To get a unique Wolfram Alpha AppID, signup here for a Wolfram Alpha Application ID.

You should now be signed in to the Wolfram Alpha Developer Portal and, on the My Apps tab, click the “Get an AppID” button and fill out the “Get a New AppID” form. Use any Application name and description you like. Click the “Get AppID” button.

Wolfram Alpha Python Interface

Save this Pyhon script as “queryprocess.py”.


#!/usr/bin/python

import wolframalpha
import sys

# Get a free API key here http://products.wolframalpha.com/api/
# This is a fake ID, go and get your own, instructions on my blog.
app_id='HYO4TL-A9QOUALOPX'

client = wolframalpha.Client(app_id)

query = ' '.join(sys.argv[1:])
res = client.query(query)

if len(res.pods) > 0:
    texts = ""
    pod = res.pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = "I have no answer for that"
    # to skip ascii character in case of error
    texts = texts.encode('ascii', 'ignore')
    print texts
else:
    print "Sorry, I am not sure."

You can test it like this shown in the screen shot below.

Raspberry Pi Voice Recognition Works Like Siri

Text To Speech

From the processed query, we are returned with an answer in text format. What we need to do now is turning the text to audio speech. There are a few options available like Cepstral or Festival, but I chose Google’s speech service due to its excellent quality. Here is a good introductions of these software mentioned.

First of all, to play audio we need to install mplayer:

sudo apt-get install mplayer

We have this simple bash script. It downloads the MP3 file via the URL and plays it. Copy and call it “text2speech.sh“:


#!/bin/bash
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }
say $*

And to make it executable.

chmod +x text2speech.sh

To test it, you can try

./text2speech.sh "My name is Oscar and I am testing the audio."

Google Text To Speech Text Length Limitation

Although it’s very kind of Google sharing this great service, there is a limit on the length of the message. I think it’s around 100 characters.

To work around this, here is an upgraded bash script that breaks up the text into multiple parts so each part is no longer than 100 characters, and each parts can be played successfully. I modified the original script is from here to fit into our application.


#!/bin/bash

INPUT=$*
STRINGNUM=0
ary=($INPUT)
for key in "${!ary[@]}"
do
SHORTTMP[$STRINGNUM]="${SHORTTMP[$STRINGNUM]} ${ary[$key]}"
LENGTH=$(echo ${#SHORTTMP[$STRINGNUM]})

if [[ "$LENGTH" -lt "100" ]]; then

SHORT[$STRINGNUM]=${SHORTTMP[$STRINGNUM]}
else
STRINGNUM=$(($STRINGNUM+1))
SHORTTMP[$STRINGNUM]="${ary[$key]}"
SHORT[$STRINGNUM]="${ary[$key]}"
fi
done
for key in "${!SHORT[@]}"
do
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=${SHORT[$key]}"; }
say $*
done

Putting It Together

For all of these scripts to work together, we have to call them in a another script. I call this “main.sh“.


#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."

./speech2text.sh

QUESTION=$(cat stt.txt)
echo "Me: ", $QUESTION

ANSWER=$(python queryprocess.py $QUESTION)
echo "Robot: ", $ANSWER

./text2speech.sh $ANSWER

I have also updated and removed all the ‘echo’ commands from “speech2text.sh


#!/bin/bash

arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt
rm file.flac  > /dev/null 2>&1

Finally, make “main.sh” executable, run it and have silly conversation with your computer :-)

chmod +x text2speech.sh
./main.sh

The End

That’s the end of Raspberry Pi Voice Recognition tutorial, but it’s just the beginning of fun! You can now modify this project and turn it into something really cool, let me know what you can come up with. In the next project, I will exploit the speech to text feature, to make a voice control system to control an Arduino board, and even better, a robot.

Have fun.

Note: Errors You May Get

mplayer: could not connect to socket

If you gets this error, all you need to do to is to disable LIRC support by doing the following:

sudo nano /etc/mplayer/mplayer.conf

And put in the line:

nolirc=yes

And that should sort it out.

UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\xxx in position x: ordinal not in range(128)

I tend to just ignore those character for now, if you know a good way to convert them please let me know. In the ‘queryprocess.py’ script, replace the ‘print’ command just above ‘else’ with these lines (which I have done).

texts = texts.encode('ascii', 'ignore')
print texts

Some More errors people have been having

Problems

When I tried sudo apt-get install ffmpeg I got this error:

  • Unable to locate package ffmeg

When I tried apt-get install python-setuptools easy_install pip I got this error:

  • E: Unable to locate package easy_install
  • E: Unable to locate package pip

Solution

Make sure your Pi is connected to the internet when you run these commands. Type the ifconfig command to make sure that your ethernet or wifi adapter has an IP address.

When using APT, one should first always use:

  • sudo apt-get update
  • sudo apt-get upgrade

To re-synchronise the package / sources and ensure that Raspbian is up-to-date: http://linux.die.net/man/8/apt-get

Easy Install is a python module (easy_install) bundled with setuptools and pip should be python-pip, therefore:

  • sudo apt-get update
  • sudo apt-get upgrade
  • apt-get install ffmpeg
  • apt-get install python-setuptools
  • apt-get install python-pip
  • apt-get install mplayer

—————————————————–

Problems

When I attempt to run ./queryprocess.py and type in “what time is it?” I got this error:

  • bash: ./queryprocess.py: Permission denied

solution

Type this command to make your script executable:

chmod +x ./queryprocess.py

—————————————————–

Like us on Facebook if you haven't already.
Subscribe to our Channel for random videos :D
Don't have much on there, but follow me anyway :)
Donate any amount, to help us maintain this website.
Love Multicopters? Join our discussion Group on Facebook!

133 thoughts on “Raspberry Pi Voice Recognition Works Like Siri

  1. Well done, I certainly enjoyed reading this article. I will be trying this out as soon as I find a suitable microphone for my application. Question though, have you tried running the script using python as an active listener, so you wouldn’t have to keep restarting the script?

    • Thank you.
      Not really I was trying to keep this project as short as I can so it’s easy for everyone to pick up and modify it for their own project.
      Are you planning to do this? would you mind letting me know if your project is going to be available online?

      • I modified a few parts of the script to allow for this.
        I dont think i am going to put it online, but if I do I will surely credit you.

      • I came here looking to accomplish this exact thing. It is uncanny how similar it is to what I had in mind. Anyways, I’m looking into doing active listening for my first Pi project. My aim is to add a 16×2 lcd display that will print the response from Wolfram Alpha in addition to speaking the response. The one thing I haven’t figured out is the active listening.

        @Matt – You found the script online and did something cool with it. It is within the spirit of the community to give back the results. I can respect your decision; I just don’t understand the motivations.

  2. Thank you for your great article. actually i was stop in first step, when i was trying to do this:
    sudo apt-get install ffmpeg
    I have a error like this
    unable to connect to raspbian.caro.net:http:

    • Are you connected to the internet on the Raspberry Pi? Try this command, you should getting response if you are connected.
      ping google.com

      you could also try running this first for updates:
      sudo upgrade
      sudo update

  3. yes my rpi is connected when i write this command :sudo apt-get install ffmpeg
    it’s start and it’s going and in the middle i have a error

  4. Welldone,,good work.
    when i was trying this speech to text conversion,i didn’t get the text..
    I’ve installed ffmpeg..program also working without any error..but i didn’t get the output..

  5. Excellent Tutorial!

    Very well presented and easy to follow.

    I’m very excited to get this to work with my CEC controls to allow me to control my tv with voice :)

    awesome work

  6. when i try to install Wolfram Alpha Python Interface
    i type apt-get install python-setuptools easy_install pip
    and get: E: Unable to locate package easy_install……..
    do you have a solution?

      • After having run both upgrades, I’m running into the same issue. when i use the sudo command i get unable to locate packages, when i dont it tells me it can’t open a lock file /var/lib/dpkg/lock – open (13:permission denied)
        Then tells me unable to lock the administration directory (/var/lib/dpkg/) and asks if I’m root

      • You just enter “sudo apt-get install python-setuptools” into the command line then unpackage it and build it

        I didnt unpackage and build it but once I typed that line of code into the terminal it worked for me

        I too had that error until i did that

  7. When i start the speech2text.sh with the command chmod +x text2speech.sh
    it just makes a new line with nothin in:
    pi@raspberry …… chmod +x text2speech.sh
    pi@raspberry…….

    • that command is just to change permission on the file (executable). so it’s normal it returns nothing.
      to run the script, you need this command:

      ./text2speech.sh

      maybe I should make it more clear in the article.

  8. For the unicode error, I’ve had some success using the UnicodeDammit module from BeautifulSoup on strings in general – not perfect, since the library is more designed to work with ML documents, but it can help. Also big fan of unidecode, which is a good catchall solution (does its best to convert accented characters into their unaccented equivalents, which can be nicer than just doing an ascii ignore)

    • Thanks for the information, I also suspect if this is something to do with system language package because one of my friends claims he didn’t have this error. Mine was straight out of the box, so I didn’t install any language package at all.

    • Hi, try create a new text file, and call it “stt.txt”. Then give yourself read and write permission to it. Or simple delete the old “stt.txt” file, the script should create a new one for you with correct permission.

  9. I get an error at every stage of this :/

    When I try to capture audio it says:

    arecord: main:682: audio open error: No such file or directory
    ffmpeg version 0.8.6-6:0.8.6-1+rpi1, Copyright (c) 2000-2013 the Libav developers
    built on Mar 31 2013 13:58:10 with gcc 4.6.3
    *** THIS PROGRAM IS DEPRECATED ***
    This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
    pipe:: Invalid data found when processing input

    When I tried just to do a wolfram alpha query my python code would not work – i have done mostly perl programming but I could not find an error in the script yet it would not run

    Any help is greatly appreciated

  10. Great work !!

    It worked for me initially but now my raspberry pi is not converting wav recorded using arecord to flac using either ffmpeg / sox , the converted file sounds like static noise and google tells me no match found. Can you please help ?

    • thanks, have you checked your mic, maybe try a different mic? if it worked initially, I don’t think it would be related to the software but hardware.

  11. I get an error at every stage of this :/

    When I try to capture audio it says:

    arecord: main:682: audio open error: No such file or directory
    ffmpeg version 0.8.6-6:0.8.6-1+rpi1, Copyright (c) 2000-2013 the Libav developers
    built on Mar 31 2013 13:58:10 with gcc 4.6.3
    *** THIS PROGRAM IS DEPRECATED ***
    This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
    pipe:: Invalid data found when processing input

    When I tried just to do a wolfram alpha query my python code would not work – i have done mostly perl programming but I could not find an error in the script yet it would not run

    Any help is greatly appreciated

  12. I got Wolframalpha working thanks

    When I try to hear what is said at the text2speech stage I had to add the nolirc=yes line because of the error

    In the command line I enter ./text2speech “Hello my name is Ethan”

    This runs with no errors but I cannot hear anything even if the tv volume is at the max

    I know it is not the tv because i was playing mp3 files with omxplayer earlier when I was connecting up my wiimote

    Does anyone know how to correct this so I can hear what is said?

    Thanks

  13. Hi .. I live in South Korea is the user pie.

    Question something.

    How to add a command do?

    For example.

    When I say that rc.

    Enter RC Mode pi is to say, after having

    / home / pi / wiringPi / RC ./RCv9 in the path expressions that run the compiled file

    And the remote control RC cars try.

    • you can modify my bash script, to add a if/switch, to check what command you just told the RPi.

      and then run your script within the if/switch statement. for example

      #!/bin/bash
      
      echo "Recording... Press Ctrl+C to Stop."
      arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1
      
      echo "Processing..."
      wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt
      
      command = $( /dev/null 2>&1
      
      # Modified Parts below...
      
      if [ "$command " = "Enter RC Mode" ]; then
         # Run your script
      else
         # Do something else
      fi
      

      Might be mistake in the example (not a good bash programmer myself) but that how i would do it.

  14. Hi
    I Have problem to use the Wolframalpha Python library.
    i downloaded it and unzip it but i have no idea where am i suppose to put it. am i copy it on my bootup flash ?
    and another problem i have is, after run the ./speech2text.sh and say something in mic and push control C, takes a lot of time to show my text. like a minut. it’s not fast as video on this page.
    thank You

    • Hi
      it doesn’t matter where you put it, you just need to install it and can forget about it.
      the speed depends on: 1. how much stuff are running on your Pi, 2. How fast your internet connection is.

      • Hi
        Thank you for your reply. i download:
        https://pypi.python.org/pypi/wolframalpha
        and unzip it
        and run:
        apt-get install python-setuptools easy_install pip
        but i have this error
        E: Could not open lock file /var/lib/dpkg/lock – open (13: Permission denied)
        E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
        could you tell me why i have this error ?
        and how can i find out any other stuff running on my Rpi?
        Thank you
        Thank you

      • You need to run the command as root,

        Run sudo apt-get install python-setuptools easy_install pip instead.

  15. Hello!

    I have changed into polish:

    say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols “http://translate.google.com/translate_tts?tl=pl&q=${SHORT[$key]}”; }

    But I have troubles with reading polish characters (ą, ę, ś, ć etc.)

    It’s mobile site: http://meyerweb.com/eric/tools/dencoder/

    String “Literki ą ś ć ń ź” is translated to:
    “Literki%20%C4%85%20%C5%9B%20%C4%87%20%C5%84%20%C5%BA”

    Unfortunatelly:

    root@raspberrypi:/home/pi# /home/pi/scripts/text2speech_pl.sh “Literki%20%C4%85%20%C5%9B%20%C4%87%20%C5%84%20%C5%BA”

    Causes only read the “literki”, so that’s not it …

    Any ideas?

  16. This tutorial is really well-put-together! Thanks for taking the time to make this so readable! I’m having issues however with my text2speech.sh. When I try to run it on a test phrase, i.e. ./text2speech.sh “hello world”, I’m not able to see anything on the command line, nor does mplayer terminate. I took the -really-quiet command out and I get a printout of what is going on (I think…) and it seems that mplayer is reading in my string and is configured for the ffmpeg codec, but I just get a cursor blinking below the command. Any ideas?

    Thanks again for such a cool tutorial!

  17. Hey Oscar, I have a couple of questions:

    1. When I tried sudo apt-get install ffmpeg I got this error:
    E: Unable to locate package ffmeg

    2. When I tried apt-get install python-setuptools easy_install pip I got this error:
    E: Unable to locate package easy_install
    E: Unable to locate package pip

    3. When I attempt to run ./queryprocess.py and type in “what time is it?” I got this error:
    bash: ./queryprocess.py: Permission denied

    • have you tried running:
      sudo upgrade
      sudo update
      ?
      it also depends on what distro you are using… i was using Resbian. you might want to find out the different package name of yours.

      • I am having the same issue, when I run ./queryprocess.py “what time is it?” it returns permission denied. Did you find a solution for this issue? I have run update and upgrade, I am running current version of raspbian

      • I resolved this issue with chmod +x to make it executable, and I initially had the program “running” but kept getting the I don’t know response, now I am getting no module named wolframalpha?

  18. I’m reseaching Voice Recognition on Raspberry Pi to control GPIO , example control LED .But in my country hasn’t have document about this issue

    what operating system it works on ? Raspbian or Arch Linux

    please add my contact
    (skype) nguyenlong.0805
    (gmail) [email protected]

    I want to discuss about issue . thanks

  19. Hey,
    first: Very good articel and sorry for my bad englisch.

    I have a problem by the first step:
    I get no respond. At the moment it is the same code. The other examples working fine.MY microfone is working well on the Raspberry Pi too.

  20. Hello,
    I have been battling with the Speech api for some time now, no matter how I send the file, curl or wget, google never responds…

    This is all of my wget output:
    –2013-09-16 23:28:26– http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium
    Resolving http://www.google.com (www.google.com)… 74.125.227.115, 74.125.227.113, 74.125.227.116, …
    Connecting to http://www.google.com (www.google.com)|74.125.227.115|:80… connected.
    HTTP request sent, awaiting response…

    and it just sits there…

    Any suggestion?

    • sounds like you are having a DNS issue, maybe your network, maybe your router…

      try to force ipv4 by replacing:

      wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

      with

      wget -4 -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

      let me know if that works for you. :-)

      • Interestingly enough, I ran a perl version of this process from my laptop, and it worked properly. Thanks for the insight!

  21. Hi Oscar,

    Thank you for an amassing guide. Just a one note, I had to invoke
    sudo python setup.py install
    instead of just
    sudo python setup.py
    But everything else worked perfectly :)

    Thanks again,
    Radek

    • Please note the comment above about
      sudo pythong setup.py install

      But other than that, brilliant. Loving it, will add it to my list of tutorials with the Pi, might combine with the PiFm tutorial from another site and Pipe the audio output to an FM frequency!

  22. Thank you Oscar for posting this great idea. It’s exciting as a new programmer to start with a RPi and working with Linux for the first time. However, I do note some issues with the SpeechtoText.sh code. Here is my experience:
    a) if you run ‘ffmpeg -loglevel verbose’ and run the bash script, you see the ‘deprecated’ alert and advice to use ‘avconv’ as the new command. So I changed the code there *ffmpeg = avconv as far as I can see – compare help files to see this

    b) run ./speechtotext.sh to start recording, but when I ctrl+c to stop, say after 3 seconds of speaking, the flac file fails to be created and no translation to text occurs. I need to record for at least 5 seconds to get a successful flac file. This seems to be a result of the “pipeline” in the bash script to execute ‘arecord’ and immediately ‘ffmpeg’. I fixed this by splitting the pipeline into two distinct commands in the script:

    arecord… -t wav > speech.wav
    avconv…. -i speech.wav …. (explicitly specifying the input file to be converted)

    Just my small contribution to this blog.

    regards
    Matt Nassau

    • Could you please post your entire (modified) script?

      I’m having the same 5 second problem you describe, but modifying as you explain does not seem to fix it. In fact, it breaks, only returning “You Said: %”. It must be something stupid simple that I’m missing, but I cannot seem to nail it down.

      Thanks in advance.
      -Shane

  23. Thank you for the turorial, it´s excellent.
    However, I have a problem with the answers I get from Wolfram: if an answer has athis character -> | for example “1 | noun | some text” the text2speech script will only read the string infornt of the “|” character (in this case: “1”)
    FYI I´m using the simple text2speech script (couldn´t get the other one to work). I´m also using omxplayer instead of mplayer

  24. Such a great post, but i want to intergrate this with some commands for my pi etc.
    How to i remove the speech2text part of the main so i can test through my keyboard?
    Basically im looking to use “pi {command} {variable}” so “pi google time in london” would use this (so word google runs main.sh) but “pi turn on light” would run seperate scripts.

    Purely for testing, i will reinstate it in my finished project. (voice activated night light for a toddler)

    • To run only the “text to speech” part, you can use the “text2speech.sh”script, with the text appended to it, e.g.:

      ./text2speech.sh "Your Text Here"

      Good project, hope you have some good result :-)

      • Hi, thanks for the reply but i can do that bit.
        What i was after was queryprocess.py “question” with the text2speech.sh output.
        Currently the main.sh run from what is in the .txt file but i was hoping to just string the 2 scripts together ignoring that first section. (once again only for testing so those around me dont think im going mad talking to a computer all day)

  25. Dear friends, everything is working piece by piece, except while doing :
    wget -U “Mozilla/5.0″ –post-file file.flac –header “Content-Type: audio/x-flac; rate=16000″ -O – “http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium”

    giving me error:

    HTTP request sent, awaiting response… 500 Internal Server Error
    2013-11-16 23:33:05 ERROR 500: Internal Server Error

  26. I resolved the issue I was having before about getting the no wolframalpha module by moving the wolfram files into my speech_recognition folder/dir, now however when I try speech2text.sh or main.sh I keep getting an overrun error Recording… Press Ctrl+C to Stop.
    overrun!!! (at least 19.241 ms long)
    Processing…
    cut: invalid byte or field list
    Try `cut –help’ for more information.
    You Said: Me: ,
    Robot: , Sorry, I am not sure.

    any suggestions?
    You are certainly keeping me busy here, thank you for actually getting me using my pi!

    • ha! Seems like you have figured out most of your questions yourself.
      As for your newest question, have you tried running only the speech to text part alone? making sure the RPi can interpret what you say? From what I see here, it could be software or hardware (your mic)

  27. Hello again Oscar, I do believe you are right, that it must be a hardware issue. I do believe that although my Rpi recognizes the webcame, the built in Mic must not be picking up. Could you possibly tell me what model of webcam you are using? I am using a logitech c905. Totally got the facial recognition working today though, after 4 days of trying ;) Thank you again for keeping me busy.

  28. So did you have to modify any ALSA configs to allow use of your webcams mic to record? Or do anything else I may be missing before I run out to buy a different webcam? Add user to audio group? I’m not entirely convinced it’s my mic yet, when I try to practice recording with arecord, I get an error telling me that arecord: main:682: audio open error: no such file or directory, which is leading me to believe that maybe I don’t have something configured right to allow for recording?

    • not really, I didn’t modify any config, didn’t install any additional package apart from the ones I mentioned in the post. The video and audio on the webcam just works without any effort, it was almost as easy as plug-and-play for the webcam I used.

      What kind of Mic have you got there? would there be some sort of driver issue? have you used it previously on a Linux OS computer? I would google the brand and model of your mic to see if how people are using in in Linux OS or rasbian OS.

      sorry I am not much of a help, When it comes to hardware issue it’s difficult since it really varies from case to case.

  29. Hi, thank you so much for this article. I’m wondering is there any way i can make it run and stop recording automatically without press CTRL+C

  30. Having lots of fun with this.
    May try to mix things up with native unix espeak and
    surfraw -elvis…. news api’s
    Ill let you know if there are any advantages.
    Cheers.

  31. Great work! I have a question pertaining to your script;

    I have a python script I am running, and I would need to trigger ” ctrl c” through the script instead of manual keyboard input. Suggestions?

  32. whoah this weblog is great i love studying your posts.
    Keep up the great work! Yoou understand,
    many people are hunting around ffor this info, you can hellp
    thhem greatly.

  33. hello there and thank you for your info – I’ve certainly picked
    up something new from right here. I did however expertise several technical points using this website, since I experienced to reload the site
    lots of times previous to I could get it to load properly.

  34. I keep getting this error:

    pi@raspberrypi ~ $ ./speech2text.sh
    Recording… Press Ctrl+C to Stop.
    ^CProcessing…
    cut: invalid byte or field list
    Try `cut –help’ for more information.

    Any idea what I have to do to fix it?

      • Yes I am pretty sure it is working I was using a Turtle beach headset and I was getting feedback but I also suppose that the Turtle beach has a sound card and that’s how I was receiving feedback, perhaps it still is my microphone? Is there anything else it could possibly be?

  35. I thought my mic was working but considering it holds it’s own sound card, that is probably where I was getting the feedback from so the Pi wasn’t receiving anything?

    I have set up a bluetooth headset however, and that also doesn’t work, same problem…

    Any ideas on what might be the problem or what might be a solution?

    I am running Rasbian, if that helps…

  36. hey, nice tutorial!
    I’m using avconv instead of ffmpeg to make mt flac file, as ffmpeg was acting erratically and I got tired of fussing with it.
    To save internet lag time, I’m using flite to speak the response I get from Wolfram Alpha. Voice is less clear, of course, but it does sound more robot-ish.
    So thanks for the great writeup. Didn’t even think of trying this before reading it.
    The upside is that my robot project is now conversant. The downside is that I now need an additional RPi because doing this while doing face tracking with openCV is just too much for one board to handle.
    If you have any interest in seeing the robot into which this has been integrated, it’s the Model III in the Google+ community ‘World Robotics Project’

  37. It’s possible to change the response language ?? i wont to set the response and the question in another language…
    sorry form my English

  38. Hi, i just setup my own raspberry pi and am still pretty new to this. I am wondering if it is possible that i am able to tell the raspberry pi to play a local radio station like how you tell the raspberry the date or what is beatles? Is there any guide to it as i am still pretty new thanks!

    • i believe it’s possible although i haven’t done this. There are two parts, first you will need to google how to activate and play the radio station using command line script, then use the voice recognition to try to catch a keyword such as “play radio”, and run that script.

      • i would like to try and do your example above. Hopefully i get it working and when it’s done i would take a look at the radio station.

      • i started on this thing and came upon an error. After typing chmod +x speech2text.sh ( automatically goes to the next line ) and type in ./speech2text.sh and get a syntax error near unexpected token ‘(
        Any idea how i can solve it?

  39. Traceback (most recent call last):
    File “C:\Users\samue_000\Desktop\queryprocessing.py”, line 13, in
    res = client.query(query)
    File “C:\Python33\lib\site-packages\wolframalpha-1.0.2-py3.3.egg\wolframalpha\__init__.py”, line 68, in query
    assert resp.headers.gettype() == ‘text/xml’
    AttributeError: ‘HTTPMessage’ object has no attribute ‘gettype’

    is what I get when running it. I guessed it could be a problem (I am using Python 3.3) so incise this was written in Python 2.7. I am about to try that, however, do you know of a solution?

  40. Hi, Oscar,

    This is a ridiculously cool project! Every step worked for me, but then trying to put it all together, it just says I don’t know. Even going back and running speech2text.sh again just gives a blank.

    Ambitious idea, anyway!

    Thanks,

    Peter

  41. Hey Oscar, really cool project you got here and awesome of you to share it with the world , top bloke.
    one thing I would really like to do is have it display images on an attached small tft display, whilst still keeping its current audio features, for instance command “print usa flag” would display the stars+stripes on the tft, presume a seperate script will be needed for the “print” command ? any pointers for a noob how I could accomplish this and is it a noob friendly job ?
    many thanks and best wishes from UK

    • firstly, create a script that displays different images (flags), and you just need to feed that script to your speech recognition system, when command received.

  42. Hello, what model is your Logitech camera?. It seems the C270 but the manufacturer says that it is not compatible with Linux … ? Thank you very much. Very good tutorial. Gustavo

  43. Hi Oscar, nice project and really good of you to share your skills with others, much appreciated.
    really like this project but curious if it can be taken 1 step further, and how difficult it may be for a beginner. what I would like to configure is a “print” command, that displays the answer on a small tft monitor attached to pi, for instance I would like to say
    “print map of usa”
    and the result is stars+stripes printed to the tft

    any pointers or tips most welcome mate

  44. Hi
    I’m trying to do this using the logitech c270 and I cannot get the speech entry element to work whenever I try the light on the webcam does not come on and the screen displays ‘Invalid byte or field list’
    Any idea on how to fix this would be gratefully received

  45. Hi Oscar,

    you are just amazing! After 4 hours it finally works! But one big problem:
    1) After running the main.sh script, I have to wait 7 seconds to ask a question otherwise it will not record
    2) After typing CRL C the process takes around 13 seconds to give me an audio answer back
    3) In total the system needs 20 seconds to answer one question

    Can you help me to solve that issue? Hope to hear from you soon Oscar!

    Raphael

    • I did struggled with the speed as well. Here is what I found.

      Point 1) it could possibly related to system speed, for example the class of your SD card could affect how fast your system runs. Try to kill any other process and application that you are not using. My was taking 3 to 5 seconds to start recording after pressing ctrl + C.
      I know poin 2) is definately related to internet speed. It was much quicker for me when using LAN than Wifi. Usually it takes me 3 to 4 seconds to get an answer.

      So the total time was 6 to 9 seconds, which is reasonable. Don’t expect you will get anything instantaneous, don’t forget your are running this on the RPi which is not a powerful device at all (your smart phone is probably a few times more powerful than that!)

  46. Fine way of describing, and fastidious piece of writing to get data about my
    presentation subject, which i am going to convey in institution of higher education.

  47. Hi I have been working on voice recognization project.
    I have installed mplayer and used the following script for playing the inputted text.
    it didnt work.
    then I installed omxplayer and using the following script to play the inputted text, then it worked
    ——————————————————————————————————————————————
    #!/bin/bash
    say() { local IFS=+;/usr/bin/omxplayer “http://translate.google.com/translate_tts?tl=en&q=$*”; }
    say $*
    ——————————————————————————————————————————————-

    Now the problem with the omxplayer hangs sometimes and holds the shell and not able to come out.
    do i need to do some changes in the above script ..
    Please help me ….Your help will be appreciated.

  48. I was thinking of something similar to this in a smallish custom enclosure, inbuilt mic and a small speaker, 20×4 LCD to indicate listening and write out the response and a button to activate the script – it could be useful for quite a few things as well as the Siri like functions, especially when given the ability to talk to other networked devices.

  49. I have been experimenting with this over the past few days. As an early I’ve created a simple breadboard circuit that invokes the script via a button press – creates a ten second recording period (I figure this is long enough for any input), then submits. The microphone is taken care of by a Creative Play USB sound card, which will be stripped down and soldered directly to the Pi when I make the appropriate enclosure. The text input / output is displayed on a 20×4 LCD display, and the mp3 is played via a small speaker on the breadboard.

    The main problem is speed. It isn’t quite as seem less as it should be. The Pi is over clocked to 900mhz and I have reduced GPU RAM to a minimum, and I am using a class 10 SD card. I’m going to experiment with streamlining Raspbian as much as possible.

    • yes you are right, RPi is a bit slow for that kind of stuff (voice recognition / object recognition etc…) hope they can make the board more powerful in the future.

  50. Hey there !
    Im having just a hard time when the text is been read.
    Because of the break every 100 characters, the system pauses for 2 seconds and comeback to start reading the next line. But this little pause makes you loose the understanding of the computer is saying… I don’t know if that makes sense, but is there any trick to remove the little pause in between the 100 characters ?!

    Thank you very much !

    Thiago

  51. i have a ms lifecam vx-6000 v1.0 and the recording is too bad. Probably that’s why google can’t make any speech to text conversion.
    Here is a sample file https://www.dropbox.com/s/5uxh0w5f5ivhqok/file.flac.
    Is there any solution on this?
    Running “arecord -D plughw:1 –duration=10 -f cd -vv file.flac” i get the above,

    Recording WAVE ‘file.flac’ : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
    Plug PCM: Route conversion PCM (sformat=S16_LE)
    Transformation table:
    0 <- 0
    1 <- 0
    Its setup is:
    stream : CAPTURE
    access : RW_INTERLEAVED
    format : S16_LE
    subformat : STD
    channels : 2
    rate : 44100
    exact rate : 44100 (44100/1)
    msbits : 16
    buffer_size : 22050
    period_size : 5512
    period_time : 125000
    tstamp_mode : NONE
    period_step : 1
    avail_min : 5512
    period_event : 0
    start_threshold : 1
    stop_threshold : 22050
    silence_threshold: 0
    silence_size : 0
    boundary : 1445068800
    Slave: Rate conversion PCM (16000, sformat=S16_LE)
    Converter: linear-interpolation
    Protocol version: 10002
    Its setup is:
    stream : CAPTURE
    access : MMAP_INTERLEAVED
    format : S16_LE
    subformat : STD
    channels : 1
    rate : 44100
    exact rate : 44100 (44100/1)
    msbits : 16
    buffer_size : 22050
    period_size : 5512
    period_time : 125000
    tstamp_mode : NONE
    period_step : 1
    avail_min : 5512
    period_event : 0
    start_threshold : 1
    stop_threshold : 22050
    silence_threshold: 0
    silence_size : 0
    boundary : 1445068800
    Slave: Hardware PCM card 1 'USB20 Camera' device 0 subdevice 0
    Its setup is:
    stream : CAPTURE
    access : MMAP_INTERLEAVED
    format : S16_LE
    subformat : STD
    channels : 1
    rate : 16000
    exact rate : 16000 (16000/1)
    msbits : 16
    buffer_size : 8000
    period_size : 2000
    period_time : 125000
    tstamp_mode : NONE
    period_step : 1
    avail_min : 2000
    period_event : 0
    start_threshold : 0
    stop_threshold : 8000
    silence_threshold: 0
    silence_size : 0
    boundary : 2097152000
    appl_ptr : 0
    hw_ptr : 0
    ##################################################+| 99%overrun!!! (at least 32.046 ms long)
    Status:
    state : XRUN
    trigger_time: 2880.581903178
    tstamp : 2880.902331092
    delay : 2
    avail : 22048
    avail_max : 22089
    ##################################################+| 99%

  52. Hello,

    Thanks for this how to, it’s great!

    I’m getting issues, when I run the main.sh, only one question is allowed it seems unstable.
    After that the script output :

    Me: ,
    Answer: Sorry I’m not sure

    I have the same issue when I run the speech2text script.

    Any ideas?

    I’ve installed wolfram-engine on the rasberry pi (alpha seems to be outdated)

    Regards,

    Dav

  53. hi Oscar

    I use the same code yo run it
    but pi got no respond

    Processing…..
    You said:
    pi@raspberrypi~$

    I also try
    sudo apt-get update
    sudo apt-get upgrade
    sudo apt-get install ffmpeg

    pi is all install….= =

    so I really don’t know what ythe problem with it = =

  54. Hi Oscar,

    I have installed raspberry pi on a vm to work faster. I am using MAC.

    So when i run your script i get the following

    rpi@RaspberryPi:~/bozims$ sudo ./speech2text.sh
    Recording… Press Ctrl+C to Stop.
    Processing..

    Meaning that it doesn’t wait for the recording. I tried to run this command partially
    arecord -D “plughw:0,0″ -q -f cd -t wav

    and this is working fine and i can see recording there in binary format but when i run the part after |
    rpi@RaspberryPi:~/bozims$ ffmpeg -loglevel panic -y -i – -ar 16000 -acfile.flac > /dev/null 2>&1

    nothing happens
    rpi@RaspberryPi:~/bozims$ ls
    speech2text.sh stt.txt

    i can’t find any flac file here as well.

    Any idea what can be the problem ? I am not getting any error which is very confusing.

  55. Banana PI, New generation development Board made in China, an run Android 4.4, Ubuntu, Debian, Rasberry Pi Image, as well as the Cubieboard Image
    What’s Banana Pi?

    It’s an open-source single-board computer. It can run Android 4.4, Ubuntu, Debian, Raspberry Pi Image,more powerful than raspberry pi

    What can I do with Banana Pi?

    Build…
    A computer
    A wireless server
    Games
    Music and sounds
    HD video
    A speaker
    Android
    Scratch
    Pretty much anything else, because Banana Pi is open source

    Who’s it for?

    Banana Pi is for anyone who wants to start creating with technology – not just consuming it. It’s a simple, fun, useful tool that you can use to start taking control of the world around you.

  56. Speech2text code doesn’t work any more now. it was working on my RPI.
    Do you guys this code is working now?
    text2speech code is working now.

    Do you have any idea about this? or someone has same issue as me.

  57. It’s the google speech recognition api. It changed somehow, so you’ll have to figure out how to change the wget client for it to work.

  58. hi Oscar,
    i have webcam c270. RPi os raspbian.
    Problem: ./speech2text.sh:line 5 $’arecord\302\240-D’ :command not found
    i can record : #arecord xyz.wav
    i can play: #aplay -D hw:1,0 xyz.wav

    Thanks in advance!!!!

Leave a Reply

Your email address will not be published. Required fields are marked *


6 × five =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>