PARSIMONIOUS MACHINE LEARNING MODELS IN REQUIREMENTS ELICITATION TECHNIQUES SELECTION
DOI: https://doi.org/10.20998/2079-0023.2023.01.13
Keywords: requirements elicitation techniques, Bayesian Information Criterion, Bayes factor grades, log-likelihood, parsimonious model
Abstract
The subject of research in the article is machine learning algorithms used for requirements elicitation technique selection. The goal of the work is to build effective parsimonious machine learning models that predict the use of particular elicitation techniques in IT projects while using as few predictor variables as possible, without a significant deterioration in prediction quality. The following tasks are solved in the article: design an algorithm to build parsimonious machine learning candidate models for requirements elicitation technique selection based on gathered information on practitioners' experience, assess the accuracy of the parsimonious machine learning models, and design an algorithm for selecting the best candidate model. The following methods are used: algorithm theory, statistics theory, sampling techniques, data modeling theory, and scientific experiments. The following results were obtained: 1) parsimonious machine learning candidate models were built for requirements elicitation technique selection; they include fewer features, which helps avoid the overfitting problems associated with best-fit models; 2) using the proposed algorithm for best candidate selection, a single parsimonious model with satisfactory performance was chosen. Conclusion: an algorithm is proposed to build parsimonious candidate models for requirements elicitation technique selection that avoids the overfitting problem. The algorithm for best candidate model selection identifies when a parsimonious model's performance is degraded and decides which model is suitable to select. Both proposed algorithms were successfully tested on four datasets and can be extended to other datasets.
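The keywords name the Bayesian Information Criterion, log-likelihood, and Bayes factor grades as the tools for comparing best-fit and parsimonious candidates. The sketch below shows one way these pieces can fit together. It assumes the standard formula BIC = k·ln(n) − 2·ln(L), the BIC approximation to the Bayes factor ln BF ≈ (BIC_b − BIC_a)/2, and the Kass and Raftery (1995) evidence grades. The `Candidate` class, the `select_parsimonious` rule, and its default threshold of 3 are hypothetical illustrations, not the authors' algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of a fitted model: maximised log-likelihood and parameter count."""
    name: str
    log_likelihood: float
    n_params: int


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: BIC = k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def log_bayes_factor(bic_a: float, bic_b: float) -> float:
    """BIC approximation ln BF_ab ~= (BIC_b - BIC_a) / 2; positive values favour model a."""
    return (bic_b - bic_a) / 2.0


def kass_raftery_grade(bf: float) -> str:
    """Map a Bayes factor >= 1 to the Kass & Raftery (1995) evidence grades."""
    if bf < 1.0:
        raise ValueError("pass the Bayes factor in favour of the preferred model (>= 1)")
    if bf < 3.0:
        return "not worth more than a bare mention"
    if bf < 20.0:
        return "positive"
    if bf < 150.0:
        return "strong"
    return "very strong"


def select_parsimonious(models, n_obs, degrade_threshold=3.0):
    """Hypothetical selection rule (an assumption, not the paper's exact algorithm).

    1. The reference is the model with the lowest BIC.
    2. A model counts as degraded if the Bayes factor of the reference against it
       reaches degrade_threshold (3.0 means at least "positive" evidence).
    3. Among non-degraded models, return the one with the fewest parameters,
       breaking ties by lower BIC.
    """
    if degrade_threshold <= 1.0:
        raise ValueError("degrade_threshold must exceed 1")
    scored = [(m, bic(m.log_likelihood, m.n_params, n_obs)) for m in models]
    best_score = min(score for _, score in scored)
    limit = math.log(degrade_threshold)  # compare in log space to avoid exp() overflow
    acceptable = [
        (m, score) for m, score in scored
        if log_bayes_factor(best_score, score) < limit
    ]
    # The reference itself has ln BF = 0 < limit, so acceptable is never empty.
    return min(acceptable, key=lambda pair: (pair[0].n_params, pair[1]))[0]


if __name__ == "__main__":
    # Made-up log-likelihoods for illustration only; not results from the paper.
    models = [
        Candidate("full", -50.0, 10),
        Candidate("A", -51.0, 6),
        Candidate("B", -53.0, 4),
        Candidate("C", -58.0, 2),
        Candidate("D", -70.0, 1),
    ]
    print(select_parsimonious(models, n_obs=100).name)  # C
```

With these illustrative numbers, model B has the lowest BIC, and the Bayes factor of B against C is about 1.48 ("not worth more than a bare mention"), so the smaller model C is not considered degraded and is chosen. Model D, in contrast, is outweighed by a Bayes factor in the thousands and is rejected. Working in log space keeps the comparison safe even when BIC differences are large.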
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).