cat | say | transcribe

Apparently I was quite remiss with my last post. I was supposed to take my exploration of speech recognition to its logical conclusion and see what happens when the computer tries transcribing its own “voice.”

Here’s the plan: take a text file containing a snippet of text, run it through Say.exe to produce a WAV file, run the WAV file through Transcribe.exe to produce a second text file, and compare the two text files.
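The comparisons in this post are done by eye. If you wanted a number for the last step instead, one common measure is word error rate: the word-level edit distance between the original text and the transcript, divided by the number of words in the original. Here is a minimal sketch in Python. The function names are my own for illustration and have nothing to do with Say.exe or Transcribe.exe, and the tokenizer simply assumes that case and punctuation should be ignored, since the recognizers don't produce punctuation.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation.

    Curly apostrophes are normalized so that "year’s" and "year's" match.
    """
    return re.findall(r"[a-z0-9']+", text.lower().replace("’", "'"))


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists.

    Counts the minimum number of word insertions, deletions, and
    substitutions needed to turn ref into hyp.
    """
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of words in the reference."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference text is empty")
    return word_edit_distance(ref, hyp) / len(ref)
```

To use it, read the original text file and the transcript into two strings and pass them to `word_error_rate(original, transcript)`. A result of 0 means every word matched, and the value can exceed 1 when the transcript contains many extra words.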

Here’s the introduction to Pride and Prejudice:

It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.

However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.

Here’s what the older Windows XP recognition engine produces. As before, the lack of punctuation is expected.

A’s the truth universally acknowledged that a single man in possession of the good fortune that the unwanted the wind out average little known the feelings organisms that chain and maybe I’ve his first entered a bridge read this to the cell wealth extend the minds of the surrounding families that he is considered the rightful property on someone or other of their daughters

And Vista’s newer engine:

He is a truth universally acknowledged that a single man in possession of a good fortune
part B1 of the white pal ever little down the feeling for the use of such a manner beyond year’s first entry neighborhood this truth in cell lab fixed in the mind of the surrounding family’s that he is considered a rightful property at someone or other of the daughters

Interestingly, Vista’s output isn’t significantly better than XP’s. Past experience showed that Vista transcribes more accurately given the same input. Thus, we are led to an intriguing hypothesis: in an effort to create a more natural-sounding computer voice for Vista, Microsoft also created a voice that was more difficult to automatically transcribe!

Another example, this time using Shakespeare’s Sonnet 68:

Thus is his cheek the map of days outworn,
When beauty lived and died as flowers do now,
Before these bastard signs of fair were born,
Or durst inhabit on a living brow:
Before the golden tresses of the dead,
The right of sepulchres, were shorn away,
To live a second life on second head,
Ere beauty’s dead fleece made another gay:
In him those holy antique hours are seen,
Without all ornament, it self and true,
Making no summer of another’s green,
Robbing no old to dress his beauty new,
And him as for a map doth Nature store,
To show false Art what beauty was of yore.

I added line breaks to Vista’s output to make it easier to compare the two versions line by line:

Not that his cheek and at a denny’s out
120 minutes and I has flowered now
the forties bastard signs of tired were born
all orders to inhabit, leading brown
deflected golden tresses added that
the right of sepulcher sit where slightly delayed
a second light on secondhand
terribly stabbed Fleetwood another game
became those calling NT Cal sky scene
but not all or none at itself a true
B. Demille summer of another screen
prodding know all too dressy is getting you
at ms foreign at Tiffany to restore
to show false are like the meet with your

I didn’t bother with XP.

