EFFICIENCY ESTIMATION OF METHODS FOR SENTIMENT ANALYSIS OF SOCIAL NETWORK MESSAGES
DOI:
https://doi.org/10.20998/2079-0023.2019.02.13
Keywords:
sentiment analysis, social networks messages analysis, machine learning, text classification, naïve Bayesian classification, recurrent neural network, efficiency estimation
Abstract
This paper presents an evaluation of the effectiveness of machine learning methods for sentiment analysis of social network messages. The importance of sentiment analysis as a task of natural language processing in general, and of textual information processing in particular, is substantiated. Existing methods and software for sentiment analysis are reviewed, and the choice of classifiers for this study is justified. The operating principles of a naïve Bayesian classifier and of a classifier based on a recurrent neural network are described. The classifiers were trained sequentially on two corpora: first on RuTweetCorp, a corpus of short messages from the social network Twitter, and then on the Slang corpus, a corpus of messages from the social networks Facebook and Instagram and of posts from the Pikabu website. In the second corpus, slang words are annotated with their tonality; this information was taken from a youth slang dictionary compiled from a survey of users. Texts were classified by tonality into three classes: positive, negative, and neutral. The efficiency of the classifiers was evaluated with the standard metrics Recall, Precision, F-measure, and Accuracy. For the naïve Bayesian classifier, training on the first corpus gave Recall = 0.853, Precision = 0.869, F-measure = 0.861, and Accuracy = 0.855; after additional training on the second corpus, the values were Recall = 0.948, Precision = 0.975, F-measure = 0.961, and Accuracy = 0.960. For the classifier based on a recurrent neural network, training on the first corpus gave Recall = 0.870, Precision = 0.878, F-measure = 0.874, and Accuracy = 0.861; after additional training on the second corpus, the values were Recall = 0.965, Precision = 0.982, F-measure = 0.973, and Accuracy = 0.973. These results indicate that additional training on the second corpus improved the classifiers' metrics by about 10–11 percentage points.
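The reported figures can be cross-checked with a short sketch. It assumes that the F-measure in the abstract is the balanced F1 score (the harmonic mean of Precision and Recall), which the abstract does not state explicitly. The function names and the dictionary layout are illustrative only and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the paper uses the balanced form (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the abstract, in the order
# (recall, precision, f_measure, accuracy).
REPORTED = {
    ("naive_bayes", "RuTweetCorp"): (0.853, 0.869, 0.861, 0.855),
    ("naive_bayes", "Slang"): (0.948, 0.975, 0.961, 0.960),
    ("rnn", "RuTweetCorp"): (0.870, 0.878, 0.874, 0.861),
    ("rnn", "Slang"): (0.965, 0.982, 0.973, 0.973),
}


def recomputed_f_measures() -> dict:
    """Recompute F1 from the reported precision and recall, rounded to 3 places."""
    return {
        key: round(f_measure(precision, recall), 3)
        for key, (recall, precision, _f, _acc) in REPORTED.items()
    }


def accuracy_gain_points(model: str) -> float:
    """Accuracy gain, in percentage points, after training on the second corpus."""
    before = REPORTED[(model, "RuTweetCorp")][3]
    after = REPORTED[(model, "Slang")][3]
    return round((after - before) * 100, 1)


if __name__ == "__main__":
    for key, value in recomputed_f_measures().items():
        print(key, "recomputed F-measure:", value, "reported:", REPORTED[key][2])
    for model in ("naive_bayes", "rnn"):
        print(model, "accuracy gain (points):", accuracy_gain_points(model))
```

Under the F1 assumption, the recomputed values agree with the reported F-measures to three decimal places, and the accuracy gains fall in the 10–11 point range quoted in the abstract.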
License
Copyright (c) 2019 Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).