Wolfram Language

Build an Audio-Enabled Question-Answering System

This example shows how converting between audio signals and textual transcriptions enables a wide range of voice-enabled applications, such as textual question answering with recorded questions and synthesized answers. Speech can be converted to text using SpeechRecognize, and text can be converted to speech using SpeechSynthesize.

Start by downloading the text of presidential inaugural speeches.
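The original input cell for this step is not preserved here. A minimal sketch of one way to fetch such a collection from the Wolfram Data Repository follows; the resource name passed to ResourceData is an assumption and should be replaced by a name that the search actually returns.

```wolfram
(* List repository resources whose description mentions inaugural speeches *)
ResourceSearch["presidential inaugural"]

(* Assumed resource name, for illustration only; substitute a name from the search results *)
speeches = ResourceData["Presidential Inaugural Addresses"];
```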

Select a part of one of the speeches to use as the context for question answering.
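One possible way to cut a passage out of a speech is to split it into sentences and keep a few of them. In this sketch, `speech` is a hypothetical stand-in string; in practice it would be one entry of the downloaded collection.

```wolfram
(* Stand-in text; replace with the text of one downloaded speech *)
speech = "First sentence of the speech. Second sentence. Third sentence. Fourth sentence.";

(* Split into sentences and keep at most the first three as the context *)
sentences = TextSentences[speech];
context = StringRiffle[Take[sentences, UpTo[3]], " "]
```

A shorter context generally makes it easier to check by hand that the extracted answer is sensible.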

Now record the question using AudioCapture.
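A minimal sketch of the recording step, assuming a notebook session with a working microphone. The variable name `recordedQuestion` is chosen here for illustration.

```wolfram
(* Opens a temporary recording interface; the result is an Audio object *)
recordedQuestion = AudioCapture[]

(* Check the length of the recording *)
Duration[recordedQuestion]
```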

Transcribe the recorded question with SpeechRecognize.
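The transcription step might look like the following. To keep the sketch runnable without a microphone, a synthesized question stands in for the recording; the question text itself is made up for illustration.

```wolfram
(* Stand-in for a microphone recording: a synthesized question *)
recordedQuestion = SpeechSynthesize["Who administered the oath?"];

(* Transcribe the audio; the result is a string *)
question = SpeechRecognize[recordedQuestion]
```

The transcription may differ from the spoken text in casing and punctuation, so it is worth inspecting before passing it on.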

Now you can use text processing to answer the question. Use FindTextualAnswer to identify the most probable answer to the question.
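As a sketch of the answering step, the following uses a short illustrative context and question (both invented for this example, not taken from a speech). The second call asks for several candidate answers together with their probabilities.

```wolfram
(* Illustrative context and question, not from the original example *)
context = "The ceremony took place on the east front of the Capitol. The chief justice administered the oath of office.";
question = "Who administered the oath?";

(* Most probable answer as a string *)
FindTextualAnswer[context, question]

(* Up to three candidates, each with its estimated probability *)
FindTextualAnswer[context, question, 3, {"String", "Probability"}]
```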

Use SpeechSynthesize to convert the result into an Audio object. You can also choose two different voices from $VoiceStyles to represent the surrounding context and answer, respectively.
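A minimal sketch of synthesizing an answer and inspecting the installed voices. The `answer` string is a placeholder, and the list of voices depends on the operating system.

```wolfram
(* Placeholder answer text *)
answer = "the chief justice";

(* Synthesize with the default voice; the result is an Audio object *)
SpeechSynthesize[answer]

(* List the voices available on this system *)
$VoiceStyles
```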

Define a function to synthesize speech using a specific voice for the given input.
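One possible shape for such a helper is shown below. The name `synthesizeWithVoice` is hypothetical, and the sketch assumes that SpeechSynthesize accepts a voice from $VoiceStyles as its second argument.

```wolfram
(* Hypothetical helper: synthesize text with a given voice *)
(* Assumes the two-argument form SpeechSynthesize[text, voice] *)
synthesizeWithVoice[text_String, voice_] := SpeechSynthesize[text, voice]

(* Usage: speak a short phrase with the first available voice *)
synthesizeWithVoice["Hello.", First[$VoiceStyles]]
```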

Synthesize each part of the speech with its voice and combine the results into a single Audio object.
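The combination step could be sketched as follows. The sentence and answer are illustrative placeholders, the helper is defined here so the snippet runs on its own, and the two-argument form of SpeechSynthesize is an assumption. The sketch also assumes at least two entries in $VoiceStyles.

```wolfram
(* Hypothetical helper; assumes SpeechSynthesize[text, voice] *)
synthesizeWithVoice[text_String, voice_] := SpeechSynthesize[text, voice];

(* Illustrative sentence and the answer found inside it *)
sentence = "The chief justice administered the oath of office.";
answer = "chief justice";

(* Pick two voices: one for the surrounding context, one for the answer *)
{contextVoice, answerVoice} = $VoiceStyles[[{1, -1}]];

(* Split the sentence around the answer, keeping the answer as its own part *)
parts = Select[
   StringSplit[sentence, answer -> answer],
   StringLength[StringTrim[#]] > 0 &];

(* Synthesize each part with the matching voice and join the clips *)
clips = If[# === answer,
     synthesizeWithVoice[#, answerVoice],
     synthesizeWithVoice[#, contextVoice]] & /@ parts;
AudioJoin[clips]
```

Keeping the answer as a separate part means the change of voice marks exactly where the answer begins and ends in the spoken sentence.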
