Today, I attended a meeting with David, my mentor, and Lucas, another of my mentor's interns, who is also working on Wiktionary data using NLP++. Lucas was running an NLP++ analyzer on wikitext obtained from the Chinese Wiktionary, which was structured into header, dictionary, and synonym sections. In this meeting, I learned about the following:
- Loading a .txt file into NLP++ and creating an ANALYZER in VisualText; once an ANALYZER is created, the input text is tokenized as the first pass of the ANALYZER SEQUENCE
- Writing rules to parse the text by identifying its repeated patterns (a rough sketch of such a rule follows this list)
- Running the ANALYZER SEQUENCE on a particular .txt file
- Checking whether the ANALYZER SEQUENCE has worked by looking at the tree file, the highlighted text for each pass in the ANALYZER SEQUENCE, and the final tree for the overall ANALYZER SEQUENCE; the tree grows from the parent node, i.e., ROOT, down through as many child nodes as are needed to analyze the data in the .txt file.
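
To make the rule-writing step more concrete, here is a rough sketch of the kind of pass I understood Lucas to be using for the wikitext headers (lines such as ==Noun==). This is my own reconstruction, not his actual code: the node name _header, the stored "title" variable, and the exact element attributes are all assumptions on my part.

```
# Hypothetical pass: group a wikitext header line such as ==Noun== into one node.
@NODES _ROOT

@POST
    S("title") = N("$text", 2);   # keep the header text on the new node
    single();                     # reduce the matched tokens to a single _header node
@RULES
_header <-
    \= [plus]                         # opening run of '=' characters
    _xWILD [plus fails=(\= \n \r)]    # header text: anything up to '=' or end of line
    \= [plus]                         # closing run of '=' characters
    @@
```

If a pass like this runs, each matched header becomes a _header child of the ROOT node, which is exactly the kind of node you then look for in the tree view and the highlighted text.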
Later, I also had a one-to-one meeting with my mentor. In that meeting, he helped me load a .txt file containing the cleaned data for the “1000 most common Nepali words”, create an ANALYZER in VisualText, add rules to the ANALYZER SEQUENCE to parse the text file, and run it on the actual text data.
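
As a note to myself, the first rule we wrote for the Nepali word list was essentially a line-grouping pass. I have not copied the exact file here, so the sketch below, including the node name _word and the "nepali" variable, is only an approximation of what we did, assuming the default tokenization keeps newline tokens.

```
# Hypothetical pass: treat each line of the cleaned word list as one _word node.
@NODES _ROOT

@POST
    S("nepali") = N("$text", 1);   # store the word's text on the new node
    single();                      # reduce the line to a single _word node
@RULES
_word <-
    _xWILD [plus fails=(\n \r)]    # all tokens on one line
    _xWILD [one matches=(\n \r)]   # the newline that ends the line
    @@
```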
