'SAMENA Daily' - News

Pakistani researchers set to build Urdu speech recognition system

The prospect of a software developer building an Urdu speech recognition program has become more likely now that the most fundamental tool needed for it has been developed at Lahore’s Information Technology University.

Linguistic technology expert Dr. Agha Ali Raza and his team at ITU’s Center for Speech and Language Technologies (CSaLT) laboratory have released for public use a corpus of Urdu sentences that covers all the distinct sounds, called phonemes by linguists, used in everyday speech.

The corpus, comprising 708 sentences that cover all 63 phonemes, will soon be available for download at the CSaLT website.

Those interested in developing Urdu speech recognition software will now have access to the most basic ingredient needed for the purpose.

They will just need a repository of words used in everyday speech to proceed with developing the application, says Dr. Raza.

“Speech recognition is a two-step process. The corpus will give the computer application access to all possible phonemes used in formation of meaningful Urdu words from everyday speech,” he says.

Though there are 63 distinct phonemes in Urdu, in everyday speech these don’t correspond to just 63 distinct sounds. Dr. Raza explains that the sound made for a phoneme may vary from one utterance to another, depending on the phonemes that come before and after it in a word.

Thus, he says, every phoneme can occur in 63 × 63 possible contexts (one for each combination of preceding and following phoneme), each producing a distinct tri-phoneme sound. The corpus of sentences covers all these possible sounds.
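The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the article does not say how the CSaLT team measured coverage, so the helper names `triphones` and `coverage` are hypothetical, and the toy two-phoneme inventory stands in for the real 63-phoneme set.

```python
NUM_PHONEMES = 63  # figure quoted in the article

# Any of the 63 phonemes may precede, and any may follow, a given phoneme.
contexts_per_phoneme = NUM_PHONEMES * NUM_PHONEMES   # 3,969
total_triphones = NUM_PHONEMES * contexts_per_phoneme  # 250,047


def triphones(phonemes):
    """Return the set of (previous, current, next) triples in a phoneme sequence."""
    return {tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)}


def coverage(sentences, inventory):
    """Fraction of all possible triples over `inventory` that occur in `sentences`.

    Assumption: every ordered triple counts as possible, ignoring any
    sound combinations that the language never actually uses.
    """
    allowed = set(inventory)
    seen = set()
    for sentence in sentences:
        for tri in triphones(sentence):
            if all(p in allowed for p in tri):
                seen.add(tri)
    return len(seen) / len(allowed) ** 3


# Toy example: 2 phonemes give 2**3 = 8 possible triples; this "sentence"
# contains two of them, (a, b, a) and (b, a, b).
toy_coverage = coverage([["a", "b", "a", "b"]], ["a", "b"])
```

With 63 phonemes the total comes to 250,047 ordered triples, which gives a sense of how carefully the sentences must be chosen for a compact corpus to cover them.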

In the first step, sentences from the corpus will allow the application to train itself on the sounds of various Urdu words.

The separate repository of words will come into play in the second stage, allowing the application to choose the most appropriate words for the output sentences.

“This will enhance accuracy of the software,” Dr. Raza says.
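
The second stage can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the first stage hands over a list of candidate words with confidence scores; the function `pick_word`, the romanized words, and the scores are all hypothetical and are not taken from the CSaLT system.

```python
def pick_word(candidates, lexicon):
    """Return the highest-scoring candidate found in the lexicon, or None.

    `candidates` is a list of (word, score) pairs from the sound-matching
    stage; `lexicon` is the separate repository of real words.
    """
    valid = [(word, score) for word, score in candidates if word in lexicon]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]


# Hypothetical repository of everyday words (romanized for readability).
lexicon = {"kitab", "kitna"}

# Hypothetical first-stage output: the best acoustic match, "kitap",
# is not a real word, so the repository rules it out.
candidates = [("kitap", 0.52), ("kitab", 0.45), ("kitna", 0.10)]

chosen = pick_word(candidates, lexicon)
```

Here the repository overrides the raw sound match and the application outputs "kitab", which is the kind of correction the word list is meant to provide.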

Thus, the accuracy of speech recognition software depends on the written or oral sources from which words and sentences are drawn for the corpus, and on the separately maintained repository used to rule out meaningless words.