All Spoken Languages Matter for Voice Technologies

Growing up in Lagos (and sometimes Ibadan), West Africa, I always wondered whether language was a barrier preventing my Mom from understanding how to use computers. For context, my mother grew up in the 1960s in Ibadan. Her spoken English was good, but her command of the language in other aspects was not that strong, so she occasionally struggled to fully express herself in English, unlike my Dad. Her understanding of Mathematics and other technical concepts was very strong, but I could see the lack of confidence she struggled with when it came to expressing herself and her knowledge in English. Conversely, when the language being used was Yoruba (a West African language and my parents’ first language), she showed no lack of confidence and was as expressive as you can get, demonstrating full mastery of the language.

My fascination with this peculiar problem led to my undergraduate thesis work, “Localizing the Ubuntu operating system to Yoruba”. After completing the research, prototype included, I discovered that localizing the computer's text display alone was not enough: it did not increase engagement with computers by my Mom or any of her relatives and friends. Then I realized there was another barrier I hadn't considered: only a few people who speak Yoruba fluently can read and write in the language. This problem is not unique to Yoruba, however, as 80% of African languages and more than half of the world's languages have no written form. Added to this is the fact that in all human communities, the oral form of a language is used more often than its written form.

About a decade later, after having the privilege of managing products and programs in artificial intelligence (AI) organizations at Intel and Meta (previously Facebook), I had developed a deeper technical understanding of how conversational systems are built and decided to work on a side project. With the massive improvements in voice technologies over the years, I thought it would not be too challenging to build a voice experience on Google Assistant, Alexa or Siri that supported spoken Yoruba. Well, it turns out I was wrong. The development tools for these popular voice assistants did not provide the level of customization I needed to experiment with supporting more languages.

To make this even more difficult, only a few research papers in the natural language processing (NLP) and AI space focused on supporting languages such as Yoruba. This inspired me to expand the scope of my side project to include exploring state-of-the-art research ideas in order to develop the technology needed to support a wider range of spoken languages.

A few of my friends agreed to help out with the project, and voilà! After a lot of hard work, we finally had our first prototype: a voicebot for requesting songs on YouTube in Yoruba!

Shortly after this, I decided it was time to take a break from professional employment, as I found the context switch between the role of a Technical Program Manager during regular hours and an Applied Researcher/Engineer during off hours rather difficult to manage. I felt compelled to focus on improving the project and to work towards helping people access technology services via voice in any language. I deeply believe it would be a huge benefit for humanity if all voices, from any part of the world, could harness the power of computing to shape and improve their daily lives.

Hence the start of my new journey with amazéthu (ah-mah-zay-too).