|  |  |
|---|---|
| Period: | 2000-2019 |
| Technologies: | Python, Pandas, Matplotlib, spaCy |

All article text was tokenized with regular expressions. Lemmatization and named-entity recognition (NER) were then applied with spaCy.
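Below is a minimal sketch of these two steps. It assumes plain-text input. The token pattern and the function names `tokenize` and `lemmas_and_entities` are illustrative assumptions, not the project's actual code. The pattern `[^\W\d_]+` matches runs of Unicode letters. Lithuanian characters such as ž and ė therefore stay inside tokens, while digits, underscores and punctuation are dropped.

```python
import re

# Hypothetical token pattern: Unicode letters only (\w minus digits and "_"),
# so "žuvo" stays one token while "2014" and "." are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split raw article text into word tokens with a regex."""
    return TOKEN_RE.findall(text)


def lemmas_and_entities(text, nlp):
    """Run an already-loaded spaCy pipeline and return (lemmas, entities).

    `nlp` is assumed to contain a lemmatizer and an NER component.
    """
    doc = nlp(text)
    # Lemma of every non-punctuation token.
    lemmas = [token.lemma_ for token in doc if not token.is_punct]
    # Each recognized entity as a (surface text, label) pair.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return lemmas, entities
```

To run the spaCy part, you first need to install and load a Lithuanian pipeline, for example `nlp = spacy.load("lt_core_news_sm")`. That model name is an assumption. The label set a given model produces may also differ from the six labels used in this analysis.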
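The most common words in article titles are “JAV” (USA), “Lietuvos” (Lithuania’s), “žuvo” (was killed), “Rusijos” (Russia’s) and “ES” (EU). For this count, stop words were removed but lemmatization was not applied, so each inflected form was counted separately.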
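A sketch of this title word count follows. It assumes titles are plain strings. The stop-word list is a tiny hypothetical sample, not the list used in the project. Words keep their original case, and no lemmatization is done, so "Rusija" and "Rusijos" remain separate entries.

```python
import re
from collections import Counter

# Hypothetical, deliberately small Lithuanian stop-word sample;
# a real run would use a complete list.
STOP_WORDS = {"ir", "į", "iš", "su", "ne", "kad", "po", "per", "už", "dėl"}

WORD_RE = re.compile(r"[^\W\d_]+")


def top_title_words(titles, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent title words, skipping stop words.

    Stop words are matched case-insensitively, but counted words keep
    their original case. No lemmatization: inflected forms count separately.
    """
    counts = Counter()
    for title in titles:
        for word in WORD_RE.findall(title):
            if word.lower() not in stop_words:
                counts[word] += 1
    return counts.most_common(n)
```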
Using spaCy NER, each named entity was assigned one of six labels:

| Label | Meaning |
|---|---|
| GPE | Geopolitical location |
| LOC | Location |
| ORG | Organisation |
| PERSON | Person |
| PRODUCT | Product |
| TIME | Time |

The most frequent labels are GPE (geopolitical location, 192.6 K occurrences) and PERSON (person, 122.9 K occurrences).
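The per-label totals can be reproduced with a simple count over (text, label) pairs. This sketch assumes the entities were already extracted as such pairs. The function names are hypothetical. The thousands formatting simply mirrors how the figures are reported here.

```python
from collections import Counter

# The six labels used in this analysis; anything else is ignored.
LABELS = ("GPE", "LOC", "ORG", "PERSON", "PRODUCT", "TIME")


def label_counts(entities):
    """Count label occurrences in (text, label) pairs, most frequent first."""
    counts = Counter(label for _, label in entities if label in LABELS)
    return counts.most_common()


def in_thousands(count):
    """Format a raw count in thousands, e.g. 192600 -> '192.6 K'."""
    return f"{count / 1000:.1f} K"
```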
The top named entities in articles are ‘JAV’ (USA), ‘Lietuva’ (Lithuania), ‘Rusija’ (Russia) and ‘ES’ (EU).
Surprisingly, R. Paksas is the only Lithuanian politician among the 30 most frequent named entities. Other frequently mentioned people include D. Kedys, A. Butkevičius, L. Graužinienė, A. Kubilius and D. Grybauskaitė.
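The next sketch produces a top-N list restricted to one label, such as PERSON. It assumes entities are (text, label) pairs. Surface forms like "R.Paksas" and "R. Paksas" would otherwise be counted as different people, so the sketch includes a normalization step for spacing after initials. That step is an illustrative assumption, not something the analysis states it did. The helper names are hypothetical.

```python
import re
from collections import Counter


def normalize_name(text):
    """Put one space after single-letter initials and collapse whitespace,
    so 'R.Paksas' and 'R. Paksas' become the same string."""
    text = re.sub(r"\b(\w)\.\s*", r"\1. ", text)
    return " ".join(text.split())


def top_entities(entities, n=30, label=None):
    """Return the n most frequent normalized entity texts.

    `entities` is an iterable of (text, label) pairs; pass label="PERSON"
    to rank people only, or leave label=None to rank all entities.
    """
    counts = Counter(
        normalize_name(text)
        for text, ent_label in entities
        if label is None or ent_label == label
    )
    return counts.most_common(n)
```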
Among the most popular named entities in article titles, the keyword ‘Russia’ shows a sharp spike in 2014. For this count, all inflected variants such as ‘Rusija’, ‘Rusijos’ and ‘Rusijoje’ were summed. ‘USA’ is also widely used, and its popularity grows steadily over the period.
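A sketch of the yearly trend follows. It assumes title entities have already been extracted as (year, entity text) pairs. The `VARIANTS` mapping and the function names are hypothetical. Only the summing of ‘Rusija’, ‘Rusijos’ and ‘Rusijoje’ into one keyword comes from the analysis itself. Matplotlib is imported inside `plot_trend`, so the counting function runs without it.

```python
from collections import Counter, defaultdict

# Hypothetical variant map: inflected forms summed into one canonical keyword.
# The Russia forms follow the analysis; mapping "JAV" to "USA" is an assumption.
VARIANTS = {
    "Rusija": "Russia",
    "Rusijos": "Russia",
    "Rusijoje": "Russia",
    "JAV": "USA",
}


def keyword_trend(title_entities, variants=VARIANTS):
    """Return {keyword: Counter({year: count})} for forms listed in `variants`.

    `title_entities` is an iterable of (year, entity_text) pairs from titles;
    texts not found in `variants` are ignored.
    """
    trend = defaultdict(Counter)
    for year, text in title_entities:
        keyword = variants.get(text)
        if keyword is not None:
            trend[keyword][year] += 1
    return trend


def plot_trend(trend, keywords=("Russia", "USA")):
    """Draw one line per keyword with yearly counts (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for keyword in keywords:
        years = sorted(trend.get(keyword, {}))
        ax.plot(years, [trend[keyword][y] for y in years], label=keyword)
    ax.set_xlabel("Year")
    ax.set_ylabel("Mentions in titles")
    ax.legend()
    return fig
```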