On March 18, 2010, Dr. Paul Dempster, a Research Fellow at the University of Leeds, did an experiment with voice recognition software, trying to get it to transcribe two audio files. He used one of the top-name commercial voice recognition products. The media files came from news reports, with only one or two speakers and with very high audio quality compared to most research media collected in the field, so he was really testing ideal circumstances.
Below are the results. The voice recognition transcript is on the left and my manual transcript generated using Transana is on the right. Both transcripts are presented in editable text fields so that you can try to align them, if you are so inclined, and so you can explore the experience of correcting voice-recognition transcripts.
His first attempt used this audio file.
His second attempt used this video file. It was considerably more successful, due in part to having a single speaker with careful enunciation.
Remember, these were both source files with excellent audio quality, one or two speakers, no overlapping speech, and no background noise at all. How well does this match your research media?
Voice recognition is a promising technology. (They've been promising the technology for decades!) It's just not capable of doing what we need yet.