AN ALGORITHM FOR NLP-BASED SIMILARITY MEASUREMENT OF ACTIVITY LABELS IN A DATABASE OF BUSINESS PROCESS MODELS
DOI:
https://doi.org/10.20998/2079-0023.2023.01.08
Keywords:
business process model, database of business process models, natural language processing, similarity measurement algorithm, activity labels, software implementation of the algorithm
Abstract
Business process modeling is an important part of organizational management because it gives companies insight into their operational workflows and helps them find opportunities for development. However, evaluating and quantifying the similarity of business process models is difficult because the models often differ greatly in structure and nomenclature. To address this issue, this study proposes an algorithm that uses natural language processing techniques to measure the similarity of business process models. The algorithm takes the activity labels of each business process model as input and builds a textual description of the associated business process. Several preprocessing stages keep the textual descriptions correct and consistent. First, the descriptions are split into single words, which are converted to lower case. Next, all non-alphabetic tokens and stop words are removed. The remaining words are then stemmed, that is, reduced to their base form. Once the textual descriptions have been prepared and preprocessed, the algorithm compares distinct business process models using the Jaccard, Sørensen-Dice, overlap, and simple matching coefficients. These measures provide a more detailed understanding of the similarities and differences between business process models, which can then inform decision-making and business process improvement initiatives. The software implementation of the proposed algorithm demonstrates its use for similarity measurement in a database of business process models. Experiments show that the developed algorithm is 31% faster than a search based on the SQL LIKE clause and finds 18% more similar models in the business process model database.
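The preprocessing stages and the four coefficients described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `naive_stem` suffix stripper (a crude stand-in for a real stemmer such as Porter), the function names, and the sample activity labels are all hypothetical. The sketch also assumes that a model's textual description is simply its activity labels joined by spaces, and that the simple matching coefficient is computed over an explicitly supplied vocabulary.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "for", "in", "on", "with"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(labels):
    """Turn a model's activity labels into a set of stems."""
    text = " ".join(labels).lower()            # textual description, lower-cased
    tokens = re.findall(r"\w+", text)          # split into single words
    words = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
    return {naive_stem(w) for w in words}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    smallest = min(len(a), len(b))
    return len(a & b) / smallest if smallest else 0.0


def simple_matching(a, b, vocabulary):
    """Share of vocabulary words on which A and B agree (both have it or both lack it)."""
    if not vocabulary:
        return 0.0
    agree = sum((w in a) == (w in b) for w in vocabulary)
    return agree / len(vocabulary)


if __name__ == "__main__":
    # Hypothetical activity labels of two business process models.
    model_a = ["Receive order", "Check the invoice", "Ship goods"]
    model_b = ["Receive orders", "Checking invoice 2", "Send payment"]
    a, b = preprocess(model_a), preprocess(model_b)
    vocabulary = a | b | {"approve", "archive"}
    print(jaccard(a, b), sorensen_dice(a, b), overlap(a, b),
          simple_matching(a, b, vocabulary))
```

Representing each model as a set of stems keeps all four coefficients to a few set operations. Unlike the other three, the simple matching coefficient also counts words absent from both models, so its value depends on how the vocabulary is chosen.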
License
This work is licensed under a Creative Commons Attribution 4.0 International License.