THIS IS PRETTY MUCH THE 21ST CENTURY I WAS PROMISED: Google’s DeepMind claims major milestone in making machines talk like humans.

The researchers note that today’s best TTS systems, generally considered to be Google’s, are built on “speech fragments” recorded from a single speaker. Those fragments are then recombined to form complete utterances.

While this approach, known as concatenative TTS, has produced natural-sounding speech, it is generally limited to a single voice unless a new database is provided.

Another technique, called parametric TTS, relies on voice codec synthesizers. It is more flexible, but it has not achieved speech that sounds as natural.

WaveNet differs by being trained on raw audio waveforms from multiple speakers, then using the network to model those signals and generate synthetic utterances. Each sample it creates is fed back into the network as input for generating the next sample.
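That feedback loop is the heart of the approach. As an illustrative sketch only (this is not DeepMind’s code, and `predict_next` is a hypothetical stand-in for the trained network), the sample-by-sample generation described above looks roughly like this:

```python
def predict_next(history):
    """Hypothetical stand-in for the trained network: predicts the
    next audio sample from the samples generated so far. A real
    WaveNet conditions a deep neural network on this history; here
    we use a toy rule (a damped echo of the last sample) so the
    loop structure is visible."""
    return 0.9 * history[-1]

def generate(seed, n_samples):
    """Autoregressive generation: each predicted sample is appended
    to the history and fed back in to predict the one after it."""
    samples = [seed]
    for _ in range(n_samples):
        nxt = predict_next(samples)   # model emits one sample...
        samples.append(nxt)           # ...which becomes new input
    return samples

audio = generate(1.0, 4)
print(audio)  # seed plus four generated samples
```

Generating audio one sample at a time this way is slow, but because the model sees the raw waveform rather than phonetic fragments, the same loop can in principle produce any audio, which is why the researchers could also point it at music.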

“As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music,” DeepMind researchers note in a blogpost.

Just don’t give it control of the pod bay doors.