Week 10: Final Presentation and Transforming KB to XML

Day 1 & 2: Final Presentation
On the first day of this week, I finalized my presentation and shared it with my mentor. After receiving his feedback, I made changes and focused on rehearsing the final presentation that I was to give to the entire HPCC Systems team (Richard Chapman, Vijay Raghavan, …

Week 9: Remote Access to HPCC Systems Setup and Spraying Data Files

Days 1, 2, and 3: After developing the knowledge base of Wiktionary data, I started accessing HPCC Systems remotely. First, I tried to access the server through Windows Subsystem for Linux (WSL) by following the guidelines (https://github.com/hpcc-systems/HPCC-Platform/wiki/Building-HPCC). Following them step by step, I was able to install Node.js for the Linux environment. …

Week 8: Bug Fixes in the Knowledge Base and Preparation to Run It on HPCC Systems

Day 1: After identifying all the bugs in the analyzer sequences, I went through those sequences and made changes where needed. As a result, I was able to fix most of the bugs; for example, the pronunciation zone was now showing its content correctly even though there were only phonetic(s) or phonemic(s) …

Week 7: NeWiktionary Analyzer Dev. Continuation, KB Dev. Initiation and Testing, and Planning for Remaining Weeks

Day 1: I finished parsing the Wiktionary data by building all the analyzer sequences required to match the word entry text in the Wiktionary word entry format.
Days 2 & 3: I started building a knowledge base (KB) by adding post areas to the existing analyzer sequences and by building new …

Week 2 Day 5: Planning for Next Week

Today, my mentor and I discussed possible ways to proceed with this project. So far, I have started with a small dataset by parsing the “1000 most common Nepali words” dataset. Now, my goal is to do background research on Wiktionary, get familiar with the format in which words are added to Wiktionary, and look up common Nepali words to see whether the …

Week 2 Day 4: Running NLP++ Analyzers on Actual Nepali Data

I continued parsing the text extracted from the “1000 most common Nepali words” webpage. With my mentor's guidance, I created more NLP++ passes with rules to parse rows and columns and remove whitespace, used built-in libraries such as KBFuncs, and created other NLP++ passes such as KbInit, KbBuild, and KbDisplay to parse and display the output …

Week 2 Day 3: Training on NLP++

Today, I attended a meeting with my mentor, David, and Lucas, another of his interns who is also working on Wiktionary data using NLP++. Lucas was running an NLP++ analyzer on WikiText obtained from the Chinese Wiktionary, which was organized into header, dictionary, and synonyms sections. In this meeting, I learned about the following: …

Week 2 Day 2: Azure Account Setup and Pre-processing Training Data

Microsoft Azure Account Setup
To process big data from Wiktionary, I will need Azure to build and test our NLP++ parser and analyzer in the cloud. Therefore, I signed up for a free Azure account (https://azure.microsoft.com/en-us/features/azure-portal/). I also received a $200 credit with that account, which I will be using later while …

Week 2 Day 1: Resolving HPCC Systems Access Issues and NLP++ Training

HPCC Systems Access Issues
I was having trouble accessing hpccsystems.com and the training documents available there, which I needed to complete the ECL training. Therefore, I reached out to the Help Desk, the HPCC Systems team, and the ECL team. I later learned that I was supposed to wait for approval after creating an account.

Nepali Language Enrichment: Leveraging Wiktionary for NLP

Nepali is an under-resourced language in the domain of Natural Language Processing (NLP). Nepali is my native language, and I feel it is my responsibility to take the initiative to make the Nepali language more popular and formal, and eventually to have it counted as a resource-rich language on online platforms.