Week 8: Bug Fixing in the Knowledge Base and Preparation to Run It on HPCC Systems

Day 1

Once I knew all the bugs in the analyzer sequences, I went through the sequences and made changes where needed. As a result, I was able to fix most of the bugs: the pronunciation zone now showed its content correctly whether an entry had only phonetic(s), only phonemic(s), or both, and [[ ]] was removed from most sections in each input file. However, [[ ]] still remained in some sections, such as the variation and definition sections, where I could not remove it. Additionally, the definition count was still wrong: it gave the count of all definitions of a word instead of the count of definitions under a particular part of speech.

Day 2

I continued fixing bugs. While debugging, I found that the hierarchy of content zones was wrong for definitions. I therefore had to add an “explanation” zone under the definition zone to hold the definition text. To fix the definition count for each part of speech, I had to change the code of kbDefText and kbDefVariation and add kbExp for the explanation section of definitions. As a result, I obtained a complete knowledge base of each word’s content. One example of the knowledge base structure is as follows:

Word: answer (जवाफ)

words
  जवाफ: 
    pos=[1]
    pos1: 
      pos=[नाम]
      definition=[1]
      definition1: 
        explanation=[1]
        variation=[3]
        example=[1]
        explanation1: 
          text=[उत्तर; उत्तर पक्ष; उत्तरा; प्रतिवचन]
        variation1: 
          text=[उत्तर पक्ष]
        variation2: 
          text=[उत्तरा]
        variation3: 
          text=[प्रतिवचन]
        example1: 
          text=[श्यामले आफ्नो प्रियसीलाई छोडेर जाने कारण सोध्दा प्रियसीले केहि जवाफ दीनन।]
    pronunciation: 
      phonetic=[javāpha,javaapha]
    synonym: 
      synonym=[2]
      synonym1: 
        text=[उत्तर]
      synonym2: 
        text=[प्रतिक्रिया]
    derivedTerms: 
      derived=[3]
      derived1: 
        derivedTerms=[जवाफदेही]
      derived2: 
        derivedTerms=[जवाफमाग्नु]
      derived3: 
        derivedTerms=[जवाफदिनु]
    translation
      अङ्ग्रेजी: 
        translation=[3]
        translation1: 
          text=[answer]
        translation2: 
          text=[reply]
        translation3: 
          text=[response]
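
To make the counting fix concrete, here is a minimal Python sketch of the same hierarchy. It is not the NLP++ code from kbDefText, kbDefVariation, or kbExp. The nested-dictionary layout, the function names, and the placeholder entry (one word with a noun and a verb part of speech) are all assumptions made for illustration. It contrasts the word-level total that the buggy version reported with the per-part-of-speech count that the hierarchy calls for.

```python
# Hypothetical entry mirroring the words > pos > definition hierarchy.
# The parts of speech and texts are placeholders, not real dictionary data.
entry = {
    "pos": [
        {
            "pos": "noun",
            "definition": [
                {"explanation": ["first noun sense"], "variation": [], "example": []},
                {"explanation": ["second noun sense"], "variation": [], "example": []},
            ],
        },
        {
            "pos": "verb",
            "definition": [
                {"explanation": ["only verb sense"], "variation": [], "example": []},
            ],
        },
    ]
}


def count_all_definitions(entry):
    """Word-level total: what the buggy version reported for every part of speech."""
    return sum(len(p["definition"]) for p in entry["pos"])


def count_definitions_per_pos(entry):
    """Count only the definitions nested under each part of speech."""
    return {p["pos"]: len(p["definition"]) for p in entry["pos"]}


print(count_all_definitions(entry))      # 3
print(count_definitions_per_pos(entry))  # {'noun': 2, 'verb': 1}
```

In this sketch the count falls out of the nesting itself: because each definition lives under its own part of speech, counting the children of one pos node can never pick up definitions that belong to another.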

Days 3, 4 & 5

After finishing the knowledge base in NLP++, the next step was to get HPCC Systems running on my PC. For this, I installed Ubuntu on Windows and followed the guide at https://github.com/hpcc-systems/HPCC-Platform/wiki/Building-HPCC to install Node.js and complete the required installation. I was able to install vcpkg on Windows Subsystem for Linux (WSL), but when I ran sudo cmake -j4 package, the compilation did not reach 100% and stopped partway with an error. This may be occurring because very little memory space is left on my PC. I therefore tried another approach: installing Ubuntu and running it in VirtualBox. I received a similar error to the one on WSL while trying to complete the HPCC Systems configuration on Ubuntu in VirtualBox. I spent three days on this, and even with help from my mentor and another expert, Michael Gardner, I still haven’t been able to resolve the issue.
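
One way to test the low-memory guess before the next build attempt is to check free disk space and available RAM from inside Ubuntu. The sketch below is only an illustration and is not part of the HPCC build instructions. It assumes a Linux environment (WSL or a VirtualBox guest), the helper names are hypothetical, and the 2 GiB of RAM per parallel job is a rough assumed budget, not a documented HPCC requirement.

```python
import os
import shutil

GIB = 1024 ** 3


def free_resources(path="."):
    """Return (free disk GiB at path, available RAM GiB). Linux only."""
    disk = shutil.disk_usage(path)
    page_size = os.sysconf("SC_PAGE_SIZE")       # bytes per memory page
    avail_pages = os.sysconf("SC_AVPHYS_PAGES")  # currently available pages
    return disk.free / GIB, page_size * avail_pages / GIB


def suggested_jobs(available_ram_gib, gib_per_job=2.0, max_jobs=4):
    """Pick a parallel job count from available RAM.

    gib_per_job is an assumed budget per compile job, not a measured value.
    """
    jobs = int(available_ram_gib // gib_per_job)
    return max(1, min(max_jobs, jobs))


if __name__ == "__main__":
    disk_gib, ram_gib = free_resources()
    print(f"free disk: {disk_gib:.1f} GiB, available RAM: {ram_gib:.1f} GiB")
    print(f"suggested parallel jobs: {suggested_jobs(ram_gib)}")
```

If the suggested job count comes out lower than the 4 used in -j4, retrying the build with fewer parallel jobs would be a cheap way to see whether memory is really the cause.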
