
3. delfi.lt article vocabulary

Part 3

Articles: 472 892
Period: 2000-2019
Technologies: Python, Pandas, Matplotlib, spaCy

Lemmatisation (or lemmatization) is the linguistic process of grouping together the inflected forms of a word so that they can be analysed as a single item. After lemmatising the articles with spaCy, the distribution of the most popular words changed. Some incorrect lemmas appeared, but overall the quality is quite good.
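The counting step behind a "most popular lemmas" chart can be sketched as follows. This is a minimal illustration, not the code used for this analysis: the lemmatiser is passed in as a function, and `TOY_LEMMAS` / `toy_lemmatize` are hypothetical stand-ins for spaCy so the example runs without a language model.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def lemma_frequencies(
    texts: Iterable[str],
    lemmatize: Callable[[str], List[str]],
    top_n: int = 10,
) -> List[Tuple[str, int]]:
    """Return the top_n most frequent lemmas across all texts."""
    counts: Counter = Counter()
    for text in texts:
        # Inflected forms are mapped to one lemma, so they share a single count.
        counts.update(lemmatize(text))
    return counts.most_common(top_n)


# Hypothetical toy lookup table standing in for a real lemmatiser.
TOY_LEMMAS = {
    "žuvo": "žūti",
    "žūsta": "žūti",
    "avarijoje": "avarija",
}


def toy_lemmatize(text: str) -> List[str]:
    # Lower-case, split on whitespace, and replace known word forms by their lemma.
    return [TOY_LEMMAS.get(word, word) for word in text.lower().split()]


print(lemma_frequencies(["Vyras žuvo avarijoje", "Avarijoje žūsta žmonės"], toy_lemmatize, top_n=2))
```

With spaCy, one could instead pass something like `lambda t: [tok.lemma_.lower() for tok in nlp(t) if tok.is_alpha and not tok.is_stop]` with `nlp = spacy.load("lt_core_news_sm")`; the post does not say which model or filters were used, so that choice is an assumption.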

The most frequent bigrams (sequences of two consecutive words) and trigrams (sequences of three consecutive words) were also computed. Here the original, non-lemmatised text was used, because lemmatised bigrams and trigrams added no extra information and introduced more distortion.
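A minimal sketch of counting n-grams on non-lemmatised text, using only the standard library rather than the Pandas pipeline the post mentions. The regex tokenisation (`\w+` on lower-cased text) is an assumption, and the sample sentences are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, e.g. bigrams for n=2."""
    # zip over shifted copies: tokens[0:], tokens[1:], ..., tokens[n-1:]
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(texts: Iterable[str], n: int, top_k: int = 10) -> List[Tuple[Tuple[str, ...], int]]:
    """Count n-grams over raw (non-lemmatised) texts and return the top_k."""
    counts: Counter = Counter()
    for text in texts:
        # Assumed tokenisation: lower-cased word characters (Unicode-aware, so Lithuanian letters match).
        tokens = re.findall(r"\w+", text.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(top_k)


print(top_ngrams(["Eismo įvykis Vilniuje", "eismo įvykis Kaune"], n=2, top_k=1))
```

Calling `top_ngrams(texts, n=3)` gives trigrams the same way; n-grams are counted within each article only, so no pair spans two articles.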

The first and second bigrams should be treated as noise, as they carry no information. Most of the remaining top bigrams are negative in tone and typically refer to distress, disasters or death. Only one bigram relates to religion.

The trigram chart did not change the overall picture of negative communication. We cannot extract any important events in Lithuania or the rest of the world, and there are no dominant trends in economic or social development.

Published in LT press analysis
