Monday, November 30, 2009

Programming Speech in WPF - Speech Synthesis

Microsoft Speech API (SAPI) version 5.3 is an integral part of Windows Vista. The .NET Framework 3.0 exposes it through a managed API that allows developers to write speech-enabled applications. This speech functionality is defined in System.Speech and its five sub-namespaces. Physically, the speech API resides in the System.Speech.dll assembly.
Here is a list of the five namespaces that define speech-related functionality.
  • System.Speech.AudioFormat
  • System.Speech.Recognition
  • System.Speech.Recognition.SrgsGrammar
  • System.Speech.Synthesis
  • System.Speech.Synthesis.TtsEngine
To access the Speech API in WPF, you must add a reference to the System.Speech.dll assembly to your project. Right-click the project name in Solution Explorer, select Add Reference, select System.Speech on the .NET tab, and click OK, as shown in Figure 1.

Figure 1.
This action adds the System.Speech assembly reference to your project. Now you can import the System.Speech namespaces in your application.
Speech Synthesis
Speech synthesis, known as text-to-speech (TTS) in previous versions of SAPI, is the process of converting text to speech.
Windows Vista comes with a default voice called Microsoft Anna. Let's take a look at it. Go to Control Panel and click Text to Speech. You will see the Speech Properties dialog with two tabs, Text to Speech and Speech Recognition, as shown in Figure 2 and Figure 3.

Figure 2.

On the Text to Speech tab, you will see the Voice Selection dropdown showing Microsoft Anna. On this tab, you can also test the voice and the audio output. If you have more voices installed, you will see them in the dropdown list as well. You can install more voices by installing Microsoft Speech SDK 5.1.

Figure 3.
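
The voices listed in the Voice Selection dropdown can also be inspected from code. The following is a minimal sketch, assuming a console project that references System.Speech.dll; the class name ListVoices is only illustrative. It uses SpeechSynthesizer.GetInstalledVoices to print each installed voice.

```csharp
using System;
using System.Speech.Synthesis;

// Illustrative class name; not part of the original article's application.
class ListVoices
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            // Each InstalledVoice wraps a VoiceInfo describing one installed voice.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2}), enabled: {3}",
                    info.Name, info.Culture, info.Gender, voice.Enabled);
            }
        }
    }
}
```

The output depends on which voices are installed on the machine.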

Table 1 describes the classes available in the System.Speech.Synthesis namespace.

Table 1.
Class               Description
FilePrompt          Represents a prompt spoken from a file.
InstalledVoice      Represents an installed Voice object.
Prompt              Plays a prompt from text or from a PromptBuilder.
PromptBuilder       Creates an empty Prompt object and provides methods for adding content.
PromptStyle         Defines a style of prompting that consists of settings for emphasis, rate, and volume.
SpeechSynthesizer   Supports the production of speech and DTMF output.
VoiceInfo           Represents a text-to-speech (TTS) voice.

In this article, our focus is on the SpeechSynthesizer class and its methods and properties.
The SpeechSynthesizer class converts text to speech.
The Speak method speaks the text synchronously, which means the call does not return until the speech has finished. The following code creates a SpeechSynthesizer object and calls the Speak method to say "Hello WPF." By default, the SpeechSynthesizer uses the system default voice, which is Microsoft Anna on Windows Vista.
SpeechSynthesizer ss = new SpeechSynthesizer();
ss.Speak("Hello WPF.");
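
For reference, the two lines above can be placed in a complete program. This is a minimal sketch, assuming a console project with a reference to System.Speech.dll; the class name HelloSpeech is illustrative. It adds the namespace import and disposes the synthesizer when done.

```csharp
using System.Speech.Synthesis;

// Illustrative class name; a WPF application would typically make the
// same calls from an event handler instead of Main.
class HelloSpeech
{
    static void Main()
    {
        // SpeechSynthesizer implements IDisposable, so wrap it in a using block.
        using (SpeechSynthesizer ss = new SpeechSynthesizer())
        {
            // Send the audio to the default output device (speakers or headphones).
            ss.SetOutputToDefaultAudioDevice();

            // Speak blocks until the whole text has been spoken.
            ss.Speak("Hello WPF.");
        }
    }
}
```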
SpeechSynthesizer Properties
The SpeechSynthesizer has four properties: Rate, State, Voice, and Volume. Rate and Volume can be read and set, while State and Voice are read-only. The value of Rate is between -10 and 10, and the value of Volume is between 0 and 100. Voice is a VoiceInfo object and State is a SynthesizerState value. To change the current voice, call a method such as SelectVoice instead of setting the Voice property. I will discuss these properties in more detail in my forthcoming articles.
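
The properties above can be exercised as follows. This is a rough sketch rather than the article's application code; the class name and the specific values (a rate of -2, a volume of 80, a female voice hint) are arbitrary choices for illustration.

```csharp
using System;
using System.Speech.Synthesis;

// Illustrative class name and values.
class SpeechProperties
{
    static void Main()
    {
        using (SpeechSynthesizer ss = new SpeechSynthesizer())
        {
            ss.Rate = -2;    // valid range: -10 (slowest) to 10 (fastest)
            ss.Volume = 80;  // valid range: 0 to 100

            // Voice and State are read-only.
            Console.WriteLine("Voice: " + ss.Voice.Name);
            Console.WriteLine("State: " + ss.State);

            ss.Speak("Speaking a little slower and quieter.");

            // The voice is changed through a method, not by setting Voice.
            ss.SelectVoiceByHints(VoiceGender.Female);
            Console.WriteLine("Voice after hint: " + ss.Voice.Name);
        }
    }
}
```

If no installed voice matches the hint, the synthesizer keeps using whichever voice it can find, so the printed name may not change.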
Asynchronous Speech
The SpeakAsync method speaks asynchronously: the call returns immediately while the speech plays in the background. It takes a Prompt, PromptBuilder, or string as input.
SpeechSynthesizer ss = new SpeechSynthesizer();
ss.SpeakAsync("Hello WPF");
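
Because SpeakAsync returns before the speech ends, it is often useful to know when it has finished. The sketch below, which assumes a console project and uses an illustrative class name, passes a PromptBuilder to SpeakAsync and handles the SpeakCompleted event. The Console.ReadLine call only keeps the process alive while the speech plays; a WPF application would not need it.

```csharp
using System;
using System.Speech.Synthesis;

// Illustrative class name.
class AsyncSpeech
{
    static void Main()
    {
        using (SpeechSynthesizer ss = new SpeechSynthesizer())
        {
            // Raised when an asynchronous prompt finishes or is cancelled.
            ss.SpeakCompleted += delegate(object sender, SpeakCompletedEventArgs e)
            {
                Console.WriteLine(e.Cancelled ? "Speech cancelled." : "Speech finished.");
            };

            // Build a prompt from several pieces instead of a single string.
            PromptBuilder builder = new PromptBuilder();
            builder.AppendText("Hello");
            builder.AppendBreak();
            builder.AppendText("WPF", PromptEmphasis.Strong);

            ss.SpeakAsync(builder);

            Console.WriteLine("Speaking... press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```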

The Application
Based on the above class, properties, and methods, I built an application that lets you browse for a text file, opens it in a RichTextBox control, lets you set the volume and rate of the speech, and speaks the text for you.
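
The core of such an application can be sketched without any UI. The following console program is a hypothetical stand-in, not the application's actual code: it reads the file path, volume, and rate from command-line arguments, where the WPF application would presumably take them from a file dialog and its controls. The class name TextFileSpeaker is an assumption.

```csharp
using System;
using System.IO;
using System.Speech.Synthesis;

// Hypothetical helper that mirrors the application's behavior from the command line.
class TextFileSpeaker
{
    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("Usage: TextFileSpeaker <file> [volume 0..100] [rate -10..10]");
            return;
        }

        string text = File.ReadAllText(args[0]);
        int volume = args.Length > 1 ? int.Parse(args[1]) : 100;
        int rate = args.Length > 2 ? int.Parse(args[2]) : 0;

        using (SpeechSynthesizer ss = new SpeechSynthesizer())
        {
            // Clamp the inputs to the ranges the properties accept.
            ss.Volume = Math.Max(0, Math.Min(100, volume));
            ss.Rate = Math.Max(-10, Math.Min(10, rate));

            ss.Speak(text);
        }
    }
}
```

In a WPF application, SpeakAsync is usually a better fit than Speak, because it keeps the UI responsive while a long text is being read.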
The application UI looks like Figure 4.

Figure 4.
The XAML code for the controls looks like the following:
