Today, I attended a meeting with David, my mentor, and Lucas, another of my mentor's interns, who is also working on Wiktionary data using NLP++. Lucas was running an NLP++ analyzer on wikitext obtained from the Chinese Wiktionary, which was structured into header, dictionary, and synonym sections. In this meeting, I learned about the following:
- Loading a .txt file into NLP++ and creating an ANALYZER in VisualText; once an ANALYZER is created, the input text is tokenized as the first pass of the ANALYZER SEQUENCE
- Writing rules to parse the text by identifying its repeated patterns (a rough sketch of such a rule follows this list)
- Running the ANALYZER SEQUENCE on a particular .txt file
- Checking whether the ANALYZER SEQUENCE has worked by looking at the tree file, the highlighted text for each pass in the ANALYZER SEQUENCE, and the final tree for the overall ANALYZER SEQUENCE; the tree grows from the parent node, i.e., ROOT, down through as many child nodes as are needed to analyze the data in the .txt file.
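
To make the rule-writing step more concrete, here is a rough sketch of the kind of pass I understood Lucas to be using for the wikitext headers (lines such as ==Noun==). This is my own reconstruction, not his actual code: the node name _header, the stored "title" variable, and the exact element attributes are all assumptions on my part.

```
# Hypothetical pass: group a wikitext header line such as ==Noun== into one node.
@NODES _ROOT

@POST
    S("title") = N("$text", 2);   # keep the header text on the new node
    single();                     # reduce the matched tokens to a single _header node
@RULES
_header <-
    \= [plus]                         # opening run of '=' characters
    _xWILD [plus fails=(\= \n \r)]    # header text: anything up to '=' or end of line
    \= [plus]                         # closing run of '=' characters
    @@
```

If a pass like this runs, each matched header becomes a _header child of the ROOT node, which is exactly the kind of node you then look for in the tree view and the highlighted text.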
Later, I also had a one-to-one meeting with my mentor. In that meeting, he helped me load a .txt file containing the cleaned data for the “1000 most common Nepali words”, create an ANALYZER in VisualText, add rules to the ANALYZER SEQUENCE to parse the text file, and run it on the actual text data.
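
As a note to myself, the first rule we wrote for the Nepali word list was essentially a line-grouping pass. I have not copied the exact file here, so the sketch below, including the node name _word and the "nepali" variable, is only an approximation of what we did, assuming the default tokenization keeps newline tokens.

```
# Hypothetical pass: treat each line of the cleaned word list as one _word node.
@NODES _ROOT

@POST
    S("nepali") = N("$text", 1);   # store the word's text on the new node
    single();                      # reduce the line to a single _word node
@RULES
_word <-
    _xWILD [plus fails=(\n \r)]    # all tokens on one line
    _xWILD [one matches=(\n \r)]   # the newline that ends the line
    @@
```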
