PARSIMONIOUS MACHINE LEARNING MODELS IN REQUIREMENTS ELICITATION TECHNIQUES SELECTION

DOI:

https://doi.org/10.20998/2079-0023.2023.01.13

Keywords:

requirements elicitation techniques, Bayesian Information Criterion, Bayes factor grades, log-likelihood, parsimonious model

Abstract

The subject of research in this article is machine learning algorithms used for requirements elicitation technique selection. The goal of the work is to build effective parsimonious machine learning models that predict the use of particular elicitation techniques in IT projects with as few predictor variables as possible and without a significant deterioration in prediction quality. The following tasks are solved in the article: designing an algorithm to build parsimonious machine learning candidate models for requirements elicitation technique selection based on gathered information on practitioners' experience; assessing the accuracy of the parsimonious machine learning models; and designing an algorithm for selecting the best candidate model. The following methods are used: algorithm theory, statistics theory, sampling techniques, data modeling theory, and scientific experiments. The following results were obtained: 1) parsimonious machine learning candidate models were built for requirements elicitation technique selection; they include fewer features, which helps to avoid the overfitting problems associated with best-fit models; 2) using the proposed algorithm for best candidate selection, a single parsimonious model with satisfactory performance was chosen. Conclusion: an algorithm is proposed to build parsimonious candidate models for requirements elicitation technique selection that avoids the overfitting problem. The algorithm for best candidate model selection identifies when a parsimonious model's performance has degraded and decides which model is suitable. Both proposed algorithms were successfully tested on four datasets and can be extended to other datasets.
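
The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea suggested by the keywords: score a best-fit model and a parsimonious candidate by log-likelihood, penalize the number of parameters with the Bayesian Information Criterion, and grade the difference. The function names, the sample numbers, and the grade thresholds (the commonly used Kass and Raftery scale applied to the BIC difference) are assumptions for illustration, not the authors' implementation.

```python
import math


def bernoulli_log_likelihood(y_true, p_pred, eps=1e-12):
    """Log-likelihood of binary labels given predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def bic(log_likelihood, n_params, n_samples):
    """BIC = k * ln(n) - 2 * ln(L); lower values are preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def evidence_grade(delta_bic):
    """Grade a non-negative BIC difference (assumed Kass-Raftery scale)."""
    if delta_bic < 2:
        return "not worth more than a bare mention"
    if delta_bic < 6:
        return "positive"
    if delta_bic <= 10:
        return "strong"
    return "very strong"


def compare(bic_full, bic_parsimonious):
    """Return the preferred model name, the BIC gap, and its grade."""
    delta = abs(bic_full - bic_parsimonious)
    preferred = "parsimonious" if bic_parsimonious <= bic_full else "full"
    return preferred, delta, evidence_grade(delta)


# Hypothetical numbers: 200 survey responses, a 12-parameter best-fit
# model and a 4-parameter parsimonious candidate.
n = 200
bic_full = bic(log_likelihood=-95.0, n_params=12, n_samples=n)
bic_small = bic(log_likelihood=-101.0, n_params=4, n_samples=n)
print(compare(bic_full, bic_small))
```

In this sketch the parsimonious candidate fits slightly worse in raw log-likelihood, but the `k * ln(n)` penalty on the larger model can outweigh that loss, which is the trade-off a parsimonious model selection relies on.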

Author Biographies

Olga Solovei, Kyiv National University of Construction and Architecture

Candidate of Technical Sciences (Ph.D.), Kyiv National University of Construction and Architecture, Associate Professor at the Department of Information Technology for Design and Applied Mathematics, Kyiv, Ukraine

Denys Gobov, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Candidate of Technical Sciences (Ph.D.), National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Associate Professor at the Department of Computer Science and Software Engineering of the Faculty of Informatics and Computer Science, Kyiv, Ukraine

Published

2023-07-15

How to Cite

Solovei, O., & Gobov, D. (2023). PARSIMONIOUS MACHINE LEARNING MODELS IN REQUIREMENTS ELICITATION TECHNIQUES SELECTION. Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies, (1 (9)), 82–88. https://doi.org/10.20998/2079-0023.2023.01.13

Section

INFORMATION TECHNOLOGY