
Press Release: Nepali NLP Initiative

Don’t Let the Nepali Language Die in the 21st Century

Clemson, South Carolina, USA, June 16, 2022 – Clemson University PhD student Ananya Gupta has announced the Nepali NLP Initiative, which aims to lay the foundations for getting computers to read and understand the Nepali language. Although Nepali is spoken by nearly 30 million people, it has been largely neglected in computer science and artificial intelligence. With the emergence of new open-source technologies such as NLP++ and VisualText, it is now possible to bring the Nepali language into the modern age. Gupta is calling on her fellow Nepali speakers to join the effort.

The official Nepali dictionary is maintained by the Nepal Pragya Pratisthan, a government body, and contains more than 110,000 Nepali words. Wiktionary, the go-to open-source dictionary, listed only 16,714 Nepali words as of June 1, 2022. Furthermore, the Nepali entries that do exist on Wiktionary are often incomplete: phonetics or pronunciation, all parts of speech and their various meanings, synonyms, and translations into English and other languages are either missing for most words or are not formatted properly.

Gupta, a native Nepali speaker and a researcher in Natural Language Processing (NLP), has made building up the Nepali Wiktionary, both in the number of words and in the content of each entry, the first major task of the Nepali NLP Initiative. “Without a comprehensive Nepali dictionary, we cannot get computers to understand our language. I am here today to announce to all my people and all those interested in the Nepali language to help out in this pioneering effort,” Gupta says. She is looking for volunteers both with and without programming experience. Adding words to the dictionary requires only a computer and an internet connection, but Gupta is also looking for programmers who share her desire to bring NLP to Nepali.

Gupta is currently working on an internship project with LexisNexis, developing the foundations for NLP using HPCC Systems technology together with NLP++, a newly developed open-source NLP programming language. Before NLP for Nepali is possible, a comprehensive, properly formatted dictionary must be in place to support more complex tasks such as Word Sense Disambiguation (WSD) and sentiment analysis.

Gupta is inviting everyone interested in the Nepali language to join the initiative. She adds, “You don’t have to be a computer programmer to participate. All you need is a love and desire to move the Nepali language into the 21st century and you can contribute to our current effort to add words to Wiktionary. This cannot be done with just one person. We need the entire Nepali nation to embrace this effort.” Gupta hopes to reach Nepali language teachers in schools across the nation to help build out the online dictionary. “This is a way for students to get excited about the Nepali language again, knowing that it will be part of the AI revolution,” Gupta adds. A website has been set up to help anyone interested in contributing to the dictionary, with example words and templates in the Wiktionary format.

Here are links related to the Nepali NLP Initiative:

Contact info:

Ananya Gupta

Email: ananya@nepalinlp.org
