EFFICIENCY ESTIMATION OF METHODS FOR SENTIMENT ANALYSIS OF SOCIAL NETWORK MESSAGES

Authors

Natalia Volodymyrivna Borysova, Karina Volodymyrivna Melnyk

DOI:

https://doi.org/10.20998/2079-0023.2019.02.13

Keywords:

sentiment analysis, social network message analysis, machine learning, text classification, naïve Bayesian classification, recurrent neural network, efficiency estimation

Abstract

This paper presents the results of evaluating the effectiveness of machine learning methods for sentiment analysis of social network messages. The importance of sentiment analysis as a task of natural language processing in general, and of textual information processing in particular, is substantiated. Existing methods and software for sentiment analysis are reviewed, and the choice of classifiers for this study is justified. The operating principles of a naïve Bayesian classifier and of a classifier based on a recurrent neural network are described. The classifiers were trained sequentially on two corpora: first on RuTweetCorp, a corpus of short messages from the social network Twitter, and then on the Slang corpus, a corpus of messages from the social networks Facebook and Instagram and posts from the Pikabu website, in which the sentiment (tonality) of slang words was annotated. Information about the sentiment of slang words was taken from a youth slang dictionary compiled from a survey of users. Texts were classified by sentiment into three classes: positive, negative and neutral. The efficiency of the classifiers was evaluated with the standard metrics Recall, Precision, F-measure and Accuracy. For the naïve Bayesian classifier, training on the first corpus yielded Recall = 0.853, Precision = 0.869, F-measure = 0.861 and Accuracy = 0.855; after training on the second corpus, the values were Recall = 0.948, Precision = 0.975, F-measure = 0.961 and Accuracy = 0.960.
For the classifier based on a recurrent neural network, training on the first corpus yielded Recall = 0.870, Precision = 0.878, F-measure = 0.874 and Accuracy = 0.861; after training on the second corpus, the values were Recall = 0.965, Precision = 0.982, F-measure = 0.973 and Accuracy = 0.973. These results show that additional training on the second corpus increased the efficiency of both classifiers by about 10–11 percentage points.
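The abstract describes a naïve Bayesian classifier trained first on RuTweetCorp and then on the Slang corpus. The following is a minimal sketch, assuming a multinomial model over whitespace-separated words with add-one (Laplace) smoothing; the paper's actual features, preprocessing and smoothing are not stated here. The class name `NaiveBayesSentiment` and the toy corpora are hypothetical and are not the paper's data.

```python
import math
from collections import Counter, defaultdict

CLASSES = ("positive", "negative", "neutral")


class NaiveBayesSentiment:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training texts seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def train(self, corpus):
        """Accumulate counts from (text, label) pairs; a second call continues training."""
        for text, label in corpus:
            self.doc_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the class maximising log P(c) + sum of log P(w | c)."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in CLASSES:
            if self.doc_counts[label] == 0:
                continue  # class never seen in training
            n_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # add-one smoothing keeps unseen words from zeroing the product
                score += math.log((count + 1) / (n_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for the two corpora (not the paper's data).
first_corpus = [
    ("great day love it", "positive"),
    ("awful service hate it", "negative"),
    ("the bus leaves at noon", "neutral"),
]
second_corpus = [
    ("this track is fire", "positive"),
    ("that movie was cringe", "negative"),
]

clf = NaiveBayesSentiment()
clf.train(first_corpus)   # stage 1, e.g. RuTweetCorp-style short messages
clf.train(second_corpus)  # stage 2, e.g. slang-annotated messages
print(clf.predict("fire track"))
```

Because the model stores only counts, the second call to `train` shifts the class-conditional word probabilities toward the slang vocabulary without discarding what was learned from the first corpus. This mirrors the sequential scheme in the abstract, although the authors' actual procedure may differ.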
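The abstract also describes a classifier based on a recurrent neural network, and the reference list includes Elman's simple recurrent network. The following sketch shows only the forward pass of such an Elman-style network. It assumes one-hot word inputs, a tanh hidden layer whose previous state is fed back as context, and a softmax over the three sentiment classes read from the final hidden state. The architecture, sizes and names (`ElmanSentimentRNN`, `hidden_size`) are hypothetical. The weights are random and untrained, and training (for example, backpropagation through time on the two corpora) is omitted, so the printed probabilities carry no meaning until the model is trained.

```python
import math
import random

CLASSES = ("positive", "negative", "neutral")


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Convert scores to probabilities; subtracting the max avoids overflow."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanSentimentRNN:
    """Forward pass of an Elman-style RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # small random weights; a real model would learn these by training
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.index = {word: i for i, word in enumerate(vocab)}
        self.w_xh = init(hidden_size, len(vocab))
        self.w_hh = init(hidden_size, hidden_size)
        self.b_h = [0.0] * hidden_size
        self.w_hy = init(len(CLASSES), hidden_size)
        self.b_y = [0.0] * len(CLASSES)

    def one_hot(self, word):
        """One-hot input vector; out-of-vocabulary words map to all zeros."""
        x = [0.0] * len(self.index)
        if word in self.index:
            x[self.index[word]] = 1.0
        return x

    def predict_proba(self, text):
        h = [0.0] * len(self.b_h)  # context units start at zero
        for word in text.lower().split():
            pre = [a + b + c for a, b, c in
                   zip(matvec(self.w_xh, self.one_hot(word)), matvec(self.w_hh, h), self.b_h)]
            h = [math.tanh(p) for p in pre]  # new state depends on input and previous state
        logits = [a + b for a, b in zip(matvec(self.w_hy, h), self.b_y)]
        return dict(zip(CLASSES, softmax(logits)))


model = ElmanSentimentRNN(vocab=["great", "day", "awful", "service"])
print(model.predict_proba("great day"))
```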
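The abstract reports Recall, Precision, F-measure and Accuracy for a three-class task but does not say how per-class scores were combined. The following sketch assumes macro-averaging and the balanced F-measure F = 2PR/(P + R). Under that formula, the reported F-measure values follow from the reported Precision and Recall (for example, 2 · 0.869 · 0.853 / (0.869 + 0.853) ≈ 0.861). The reported improvement of 10–11 corresponds to absolute accuracy differences of 10.5 and 11.2 percentage points. The function names `f_measure` and `evaluate` are hypothetical. Here macro-F is computed from macro-averaged P and R, which is one of two common conventions; the other averages per-class F scores.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(y_true, y_pred, classes=("positive", "negative", "neutral")):
    """Macro-averaged Precision, Recall and F-measure, plus Accuracy."""
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure(precision, recall), "accuracy": accuracy}


# Reported (Precision, Recall, F-measure, Accuracy) from the abstract.
reported = {
    ("naive Bayes", "RuTweetCorp"): (0.869, 0.853, 0.861, 0.855),
    ("naive Bayes", "+ Slang"): (0.975, 0.948, 0.961, 0.960),
    ("RNN", "RuTweetCorp"): (0.878, 0.870, 0.874, 0.861),
    ("RNN", "+ Slang"): (0.982, 0.965, 0.973, 0.973),
}
for (name, stage), (p, r, f, acc) in reported.items():
    print(name, stage, "F recomputed:", round(f_measure(p, r), 3), "reported:", f)

# Accuracy gain from the second corpus, as an absolute difference in percentage points.
for name in ("naive Bayes", "RNN"):
    gain = reported[(name, "+ Slang")][3] - reported[(name, "RuTweetCorp")][3]
    print(name, "accuracy gain:", round(100 * gain, 1), "pp")
```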

Author Biographies

Natalia Volodymyrivna Borysova, National Technical University "Kharkiv Polytechnic Institute"

Candidate of Engineering Sciences, National Technical University "Kharkiv Polytechnic Institute", Associate Professor, Department of Computer Science and Intellectual Property; Kharkiv, Ukraine

Karina Volodymyrivna Melnyk, National Technical University "Kharkiv Polytechnic Institute"

Candidate of Engineering Sciences, National Technical University "Kharkiv Polytechnic Institute", Associate Professor, Department of Software Engineering and Management Information Technology; Kharkiv, Ukraine

References

Ameur H., Jamoussi S., Hamadou A.B. A New Method for Sentiment Analysis Using Contextual Auto-Encoders. Journal of Computer Science and Technology. 2018, vol. 33, issue 6, pp. 1307–1319. DOI: https://doi.org/10.1007/s11390-018-1889-1.

Eureka Engine. Available at: http://eurekaengine.ru/ru/description (accessed 15.09.2019).

Huang M., Zhuang F., Zhang X. et al. Supervised representation learning for multi-label classification. Machine Learning. 2019, vol. 108, issue 5, pp. 747–763. DOI: https://doi.org/10.1007/s10994-019-05783-5.

Elman J. L. Finding Structure in Time. Cognitive Science. 1990, vol. 14, issue 2, pp. 179–211.

Melnyk K. V., Borysova N. V. Improving the quality of credit activity by using scoring model. Radio Electronics, Computer Science, Control. 2019, vol. 2, pp. 60–70. DOI: https://doi.org/10.15588/1607-3274-2019-2-7. e-ISSN 1607-3274.

Mikolov T., Karafiat M., Burget L., Cernocky J., Khudanpur S. Recurrent neural network based language model. Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Makuhari, Chiba, Japan, 2010, pp. 1045–1048.

Nguyen-Trang T., Vo-Van T. A new approach for determining the prior probabilities in the classification problem by Bayesian method. Advances in Data Analysis and Classification. 2017, vol. 11, issue 3, pp. 629–643. DOI: https://doi.org/10.1007/s11634-016-0253-y.

Pang B., Lee L., Vaithyanathan Sh. Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’02), Association for Computational Linguistics. 2002, vol. 10, pp. 79–86. DOI: https://doi.org/10.3115/1118693.1118704.

Rahimi Z., Noferesti S., Shamsfard M. Applying data mining and machine learning techniques for sentiment shifter identification. Language Resources and Evaluation. 2019, vol. 53, issue 2, pp. 279–302. DOI: https://doi.org/10.1007/s10579-018-9432-0.

RCO Fact Extractor SDK. Available at: http://www.rco.ru/?page_id=3554 (accessed 15.09.2019).

Rubtsova Y. Automatic Term Extraction for Sentiment Classification of Dynamically Updated Text Collections into Three Classes. Proceedings of International Conference on Knowledge Engineering and the Semantic Web (KESW 2014), Communications in Computer and Information Science. 2014, vol. 468, pp. 140–149. DOI: https://doi.org/10.1007/978-3-319-11716-4_12.

SentiStrength – sentiment strength detection in short texts. Available at: http://sentistrength.wlv.ac.uk/#About (accessed 15.09.2019).

System «Analytical Courier». Available at: http://www.iteco.ru/solutions/business_intelligence_products/analytical_courier (accessed 15.09.2019).

VAAL project. Available at: http://www.vaal.ru (accessed 15.09.2019).

Wu L., Morstatter F., Liu H. SlangSD: building, expanding and using a sentiment dictionary of slang words for short-text sentiment classification. Language Resources and Evaluation. 2018, vol. 52, issue 3, pp. 839–852. DOI: https://doi.org/10.1007/s10579-018-9416-0.

Borysova N. V., Niftilin V. V. Avtomatyzovane stvorennia elektronnogo slovnyka [Automated creation of electronic dictionary]. Informaciyni technologii: nauka, technika, technologiia, osvita, zdorov’ia: tezy dopovidei XXV Mizhnarodnoi naukovo-practychnoi konferencii MicroCAD-2017. Ch. I [Proceedings of XXV International scientific-practical conference in Information technologies: science, engineering, technology, education, health MicroCAD-2017. Part I.]. Kharkiv: NTU "KhPI", 2017, p. 32.

Borysova N. V., Niftilin V. V. Zastosuvaniia metodiv korpusnoi lingvistiki dlia doslidzhennia osoblyvostei vykorystannia suchasnogo molodizhnogo slengu [Using of corpus linguistics methods to study the features of using modern youth slang]. Informaciyni technologii: nauka, technika, technologiia, osvita, zdorov’ia: tezy dopovidei XXV Mizhnarodnoi naukovo-practychnoi konferencii MicroCAD-2018. Ch. I [Proceedings of XXV International scientific-practical conference in Information technologies: science, engineering, technology, education, health MicroCAD-2018. Part I.]. Kharkiv: NTU "KhPI", 2018, p. 27.

Korpus korotkih tekstov RuTweetCorp [Short texts corpus RuTweetCorp]. Available at: http://study.mokoron.com (accessed 15.09.2019).

Romanov A. V., Vasilieva M. I., Kurtukova A. V., Meshcheriakov R. V. Analiz tonalnosti tekstov s ispolzovaniem metodov mashinnogo obucheniia [Sentiment Analysis of Text Using Machine Learning Techniques]. Proceedings of the R. Piotrowski’s Readings in Language Engineering and Applied Linguistics. CEUR Workshop Proceedings. Vol. 2233. Saint Petersburg, Russia, 2017, pp. 86–95.

Rubtsova Yu. V. Postroenie korpusa tekstov dlia nastroyki tonovogo klassifikatora [Constructing a corpus for sentiment classification training]. Programnye produkty i sistemy [Program products and systems]. 2015, no. 1 (109), pp. 72–78. DOI: https://doi.org/10.15827/0236-235X.109.072-078.

Published

2024-07-05

How to Cite

Borysova, N. V., & Melnyk, K. V. (2024). EFFICIENCY ESTIMATION OF METHODS FOR SENTIMENT ANALYSIS OF SOCIAL NETWORK MESSAGES. Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies, (2), 76–81. https://doi.org/10.20998/2079-0023.2019.02.13

Section

INFORMATION TECHNOLOGY