Part 2
Articles: 472 892
Period: 2000-2019
Technologies: Python, Pandas, Matplotlib, spaCy
All article data was tokenized using regular expressions. Lemmatization and named-entity recognition (NER) were applied using spaCy.
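The post does not show the tokenization pattern it used. As a minimal sketch, assuming a simple letters-only regular expression (the pattern `[^\W\d_]+` and the function name `tokenize` are illustrative, not the author's code), regex tokenization could look like this:

```python
import re
from typing import List

# Letters only: \w minus digits and underscore. In Python 3, str patterns
# are Unicode-aware, so Lithuanian letters such as ą, č, ė, š, ž match.
TOKEN_RE = re.compile(r"[^\W\d_]+")

def tokenize(text: str) -> List[str]:
    """Split text into word tokens using a regular expression."""
    return TOKEN_RE.findall(text)

# Made-up headline used only as sample input.
print(tokenize("JAV ir ES sankcijos Rusijai, 2014 m."))
# ['JAV', 'ir', 'ES', 'sankcijos', 'Rusijai', 'm']
```

Dropping digits keeps years and counts out of the word frequencies; lemmatization and NER would then run on the text with spaCy.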
The most common words in titles are “JAV” (USA), “Lietuvos” (of Lithuania), “žuvo” (died, was killed), “Rusijos” (of Russia) and “ES” (EU). For this count, stop words were removed but lemmatization was not applied.
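A sketch of this title word count, assuming a tiny hypothetical stop-word list (`STOP_WORDS`) and a letters-only regex; tokens are lowercased and, as in the post, not lemmatized, so “Rusijos” and “Rusija” stay separate words:

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real analysis
# would use a much fuller Lithuanian list.
STOP_WORDS = {"ir", "į", "kad", "su", "iš", "dėl", "per", "bet", "o", "ar", "nuo", "po"}

# Letters only: word characters minus digits and underscore.
WORD_RE = re.compile(r"[^\W\d_]+")

def top_title_words(titles, n=5):
    """Return the n most common title words, stop words removed, no lemmatization."""
    counts = Counter()
    for title in titles:
        for token in WORD_RE.findall(title.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)

# Made-up titles used only as sample input.
titles = [
    "Rusijos ir JAV derybos",
    "Lietuvos ir Rusijos santykiai",
    "Avarijoje žuvo du žmonės",
]
print(top_title_words(titles, n=3))
```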

Using spaCy NER, each named entity was assigned one of six labels:
GPE: geopolitical location
LOC: location
ORG: organisation
PERSON: person
PRODUCT: product
TIME: time

The most frequent label is GPE, geopolitical location (192.6 K), followed by PERSON (122.9 K).
The top named entities in articles are ‘JAV’ (USA), ‘Lietuva’ (Lithuania), ‘Rusija’ (Russia) and ‘ES’ (EU).
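Label and entity frequencies like these can be tallied with `collections.Counter`. This sketch assumes the entities were already extracted into `(text, label)` pairs, for example via `[(ent.text, ent.label_) for ent in doc.ents]` in spaCy; the sample data below is made up and does not reproduce the article's numbers:

```python
from collections import Counter

# Made-up sample: one (entity_text, label) pair per mention.
entities = [
    ("JAV", "GPE"),
    ("Lietuva", "GPE"),
    ("JAV", "GPE"),
    ("Rusija", "GPE"),
    ("R. Paksas", "PERSON"),
    ("Seimas", "ORG"),
]

# How often each NER label occurs.
label_counts = Counter(label for _, label in entities)

# Most frequently mentioned entities, regardless of label.
top_entities = Counter(text for text, _ in entities).most_common(3)

print(label_counts.most_common())  # [('GPE', 4), ('PERSON', 1), ('ORG', 1)]
print(top_entities)                # [('JAV', 2), ('Lietuva', 1), ('Rusija', 1)]
```

With pandas, `Series.value_counts()` on the label or text column produces the same tallies.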

Surprisingly, R. Paksas is the only Lithuanian politician in the top 30. Other frequently mentioned people include D. Kedys, A. Butkevičius, L. Graužinienė, A. Kubilius and D. Grybauskaitė.
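Initials can be written inconsistently (“R.Paksas” versus “R. Paksas”), and if the article text varies this way, one person's mentions get split across several keys. A hypothetical normalization step before counting PERSON entities might look like this (`normalise_name` and `top_people` are illustrative, not part of the original pipeline):

```python
import re
from collections import Counter

# A single-letter initial followed by a period and any amount of whitespace.
INITIAL_RE = re.compile(r"\b([^\W\d_])\.\s*")

def normalise_name(name):
    """Put exactly one space after each initial: 'R.Paksas' -> 'R. Paksas'."""
    return INITIAL_RE.sub(r"\1. ", name).strip()

def top_people(entities, n=5):
    """Count PERSON entities after name normalisation."""
    counts = Counter(
        normalise_name(text) for text, label in entities if label == "PERSON"
    )
    return counts.most_common(n)

# Made-up sample of (entity_text, label) pairs.
sample = [
    ("R.Paksas", "PERSON"),
    ("R. Paksas", "PERSON"),
    ("D.  Kedys", "PERSON"),
    ("JAV", "GPE"),
]
print(top_people(sample, n=2))  # [('R. Paksas', 2), ('D. Kedys', 1)]
```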
Looking at the most popular named entities in article titles, we notice a huge spike in the keyword ‘Russia’ in 2014. Here all inflected forms, such as ‘Rusija’, ‘Rusijos’ and ‘Rusijoje’, were summed together. ‘USA’ is also very widely used and has been steadily gaining popularity.
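Summing inflected forms per year can be sketched with a lookup table that maps each surface form to one keyword. The mapping `CANONICAL` is hypothetical and lists only the forms named above plus the abbreviation “JAV” for ‘USA’; the input format, `(year, title_tokens)` pairs, is also an assumption:

```python
from collections import Counter, defaultdict

# Hypothetical lookup: surface forms -> one keyword.
CANONICAL = {
    "Rusija": "Russia",
    "Rusijos": "Russia",
    "Rusijoje": "Russia",
    "JAV": "USA",
}

def yearly_keyword_counts(titles_by_year):
    """Sum all listed forms of each keyword per year.

    titles_by_year: iterable of (year, list_of_title_tokens) pairs.
    Returns {year: Counter({keyword: count})}.
    """
    per_year = defaultdict(Counter)
    for year, tokens in titles_by_year:
        for token in tokens:
            keyword = CANONICAL.get(token)
            if keyword is not None:
                per_year[year][keyword] += 1
    return per_year

# Made-up sample. Forms missing from the table, such as the dative
# "Rusijai", are not counted.
sample = [
    (2013, ["Rusijos", "derybos"]),
    (2014, ["Rusija", "ir", "JAV"]),
    (2014, ["Sankcijos", "Rusijai", "Rusijoje"]),
]
counts = yearly_keyword_counts(sample)
print(dict(counts[2014]))  # {'Russia': 2, 'USA': 1}
```

The per-year counts could then be plotted as a line chart with Matplotlib to show spikes such as the one in 2014.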
