I recently had cause to write a universal app that has voice capabilities. Hopefully it will shortly be available on the Windows Store.
Adding voice synthesis, while a seemingly basic task, proved to be anything but. What follows is a series of trials and tribulations that I have overcome in order to bring this app to the public!
The Code
Here’s the basic code to get started.
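// Needs: using Windows.Media.SpeechSynthesis; using Windows.UI.Xaml.Controls;
var synth = new SpeechSynthesizer();
var voiceStream = await synth.SynthesizeTextToStreamAsync("Hello");
MediaElement mediaElement = new MediaElement();
mediaElement.SetSource(voiceStream, voiceStream.ContentType);
mediaElement.AutoPlay = false; // playback is started explicitly with Play() below
mediaElement.Volume = 1;
mediaElement.IsMuted = false;
mediaElement.Play();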
Windows Phone
The first issue I encountered with Windows Phone was the volume. Having maxed out my laptop speakers, I managed to get a faint whimper. What I finally deduced is that, for some reason, the emulator starts at half volume. To increase it you have to press the phone’s volume buttons (which I initially assumed were just there for aesthetics). Clearly Microsoft haven’t entirely abandoned skeuomorphism.
The volume controls are the top right buttons (which I happened to know because I have owned a Windows Phone in the past). Once you press one, a more sensible interface appears and you can change either the ringer volume or the game volume.
Overtalking
Although the code above does work, try calling it in a loop (or just twice). Each call starts playing without waiting for the previous one to finish, so the speech overlaps.
What I didn’t realise (until I asked a question about it) was that there are some events which supposedly fire when the media element has finished playing.
var voiceStream = await synth.SynthesizeTextToStreamAsync(toSay);
MediaElement mediaElement = new MediaElement();
mediaElement.Loaded += mediaElement_Loaded;
mediaElement.MediaEnded += mediaElement_MediaEnded;
mediaElement.MediaFailed += mediaElement_MediaFailed;
mediaElement.SetSource(voiceStream, voiceStream.ContentType);
mediaElement.AutoPlay = false;
mediaElement.Volume = 1;
mediaElement.IsMuted = false;
mediaElement.Play();
I say supposedly, because when I first tried to capture the events, they did nothing. After a bit of searching, it turns out that the element needs to be part of the visual tree! Which of course makes total sense - an AUDIO media element must be part of the VISUAL tree.
In the Visual Tree
The code below now looks for a media element in the visual tree and, if it can’t find one, adds it. It also uses a TaskCompletionSource to wait until playback has finished.
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    var voiceStream = await synth.SynthesizeTextToStreamAsync(toSay);

    // Reuse a MediaElement that is already in the visual tree, or add one.
    // (Needs using System.Linq; rootControl is assumed to be a panel on the page.)
    MediaElement mediaElement = this.rootControl.Children.OfType<MediaElement>().FirstOrDefault();
    if (mediaElement == null)
    {
        mediaElement = new MediaElement();
        this.rootControl.Children.Add(mediaElement);
    }

    mediaElement.SetSource(voiceStream, voiceStream.ContentType);
    mediaElement.Volume = 1;
    mediaElement.IsMuted = false;

    // Complete the task when MediaEnded fires, so the caller can await the end of playback.
    var tcs = new TaskCompletionSource<bool>();
    mediaElement.MediaEnded += (o, e) => { tcs.TrySetResult(true); };
    mediaElement.Play();
    await tcs.Task;
}
That works. At least, it works on Windows Phone. Because it uses a media element, I thought putting it on a shared XAML page would work… but it doesn’t. Windows 8.1 just sits there quietly and says nothing.
Windows Store
After much trial and error, it occurred to me (prompted by a comment on the above question) that if the problem was down to a deadlock, then destroying and recreating the control might clear this up.
Amazingly, that did seem to work; the working code is:
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    var voiceStream = await synth.SynthesizeTextToStreamAsync(toSay);

    // Reuse a MediaElement that is already in the visual tree, or add one.
    MediaElement mediaElement = rootControl.Children.OfType<MediaElement>().FirstOrDefault();
    if (mediaElement == null)
    {
        mediaElement = new MediaElement();
        rootControl.Children.Add(mediaElement);
    }

    mediaElement.SetSource(voiceStream, voiceStream.ContentType);
    mediaElement.Volume = 1;
    mediaElement.IsMuted = false;

    // Complete the task when MediaEnded fires, so the caller can await the end of playback.
    var tcs = new TaskCompletionSource<bool>();
    mediaElement.MediaEnded += (o, e) => { tcs.TrySetResult(true); };
    mediaElement.Play();
    await tcs.Task;

    // Remove the element again so that the next call creates a fresh one.
    rootControl.Children.Remove(mediaElement);
}
And this does seem to work on both platforms.
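The event-to-task trick that the code above relies on can be tried out in isolation, without any UI. The following is only a sketch: FakePlayer is a hypothetical stand-in for MediaElement (its Ended and Failed events play the role of MediaEnded and MediaFailed), and Speech.PlayAndWaitAsync is a made-up helper name. It also completes the task on failure, which the code above does not do, so that a failed playback cannot leave the caller waiting forever.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for MediaElement: raises Ended (or Failed) shortly after Play().
class FakePlayer
{
    public event EventHandler Ended;
    public event EventHandler Failed;

    public string Source { get; set; }

    public void Play()
    {
        // Simulate playback finishing asynchronously, a little later.
        Task.Delay(50).ContinueWith(_ =>
        {
            // With no source set, report a failure instead of a normal end.
            EventHandler handler = (Source == null) ? Failed : Ended;
            if (handler != null) handler(this, EventArgs.Empty);
        });
    }
}

static class Speech
{
    public static readonly List<string> Log = new List<string>();

    // Returns true when playback ended normally, false when it failed.
    public static async Task<bool> PlayAndWaitAsync(FakePlayer player, string text)
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler onEnded = (o, e) => tcs.TrySetResult(true);
        EventHandler onFailed = (o, e) => tcs.TrySetResult(false);
        player.Ended += onEnded;
        player.Failed += onFailed;
        try
        {
            player.Source = text;
            Log.Add("start " + text);
            player.Play();
            bool ok = await tcs.Task; // resumes only once one of the events has fired
            Log.Add("end " + text);
            return ok;
        }
        finally
        {
            // Unsubscribe so that a reused player does not accumulate handlers.
            player.Ended -= onEnded;
            player.Failed -= onFailed;
        }
    }
}

class Program
{
    static void Main()
    {
        var player = new FakePlayer();
        Task.Run(async () =>
        {
            // The second call starts only after the first has finished: no overtalking.
            await Speech.PlayAndWaitAsync(player, "Hello");
            await Speech.PlayAndWaitAsync(player, "World");
        }).Wait();
        Console.WriteLine(string.Join(", ", Speech.Log));
        // prints "start Hello, end Hello, start World, end World"
    }
}
```

Because each call awaits the task before returning, calling the helper in a loop plays the phrases one after another rather than on top of each other.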