ADAPTATION OF LAMBDAMART MODEL TO SEMI-SUPERVISED LEARNING

Authors

Klym Yamkovyi

DOI:

https://doi.org/10.20998/2079-0023.2023.01.12

Keywords:

learning to rank, information retrieval, semi-supervised learning, pairwise ranking, LambdaMART, pseudo labeling, NDCG

Abstract

The problem of information search is very common in the age of the Internet and Big Data. Document collections are usually huge, while only a small fraction of the documents is relevant, so brute-force methods are useless. Search engines solve this problem efficiently. Most engines are based on learning-to-rank methods: the algorithm first produces a score for each document based on its features and then sorts the documents by that score in the appropriate order. There are many algorithms in this area, but one of the fastest and most robust ranking algorithms is LambdaMART. This algorithm is based on boosting and was developed only for supervised learning, where each document in the collection has a relevance grade assigned by an expert. However, collections in this area usually contain enormous numbers of documents, and annotating them requires a lot of resources such as time, money, and experts. In this case, semi-supervised learning is a powerful approach. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Unlabeled data, when used together with a small quantity of labeled data, can produce a significant improvement in learning accuracy. This paper is dedicated to the adaptation of LambdaMART to semi-supervised learning. The author proposes to assign different weights to labeled and unlabeled data during the training procedure to achieve higher robustness and accuracy. The proposed algorithm was implemented in the Python programming language using the LightGBM framework, which already provides a supervised implementation of LambdaMART. For testing, several datasets were used: one synthetic 2D dataset for a visual explanation of the results and two real-world datasets, MSLR-WEB10K by Microsoft and Yahoo LTRC.
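
The scheme described in the abstract (pseudo-labeling of unlabeled documents plus different sample weights for labeled and pseudo-labeled data) can be illustrated with a minimal sketch built on LightGBM's LGBMRanker. This is not the author's implementation: the two-stage structure, the quantile binning of predicted scores into integer relevance grades, and names such as fit_semi_supervised_lambdamart, unlabeled_weight and n_grades are illustrative assumptions introduced here.

# A minimal sketch of weighted pseudo-labeling for LambdaMART with LightGBM.
import numpy as np
import lightgbm as lgb

def fit_semi_supervised_lambdamart(X_lab, y_lab, group_lab,
                                   X_unlab, group_unlab,
                                   unlabeled_weight=0.3, n_grades=5):
    # Stage 1: supervised LambdaMART on the labeled queries only.
    base = lgb.LGBMRanker(objective="lambdarank", n_estimators=200)
    base.fit(X_lab, y_lab, group=group_lab)

    # Stage 2: pseudo-label the unlabeled documents. LambdaMART scores are
    # continuous, so here they are binned into integer relevance grades.
    scores = base.predict(X_unlab)
    bins = np.quantile(scores, np.linspace(0, 1, n_grades + 1)[1:-1])
    y_pseudo = np.digitize(scores, bins)

    # Stage 3: refit on labeled plus pseudo-labeled data; pseudo-labeled
    # documents get a smaller per-sample weight to limit their influence.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    group_all = np.concatenate([group_lab, group_unlab])
    weights = np.concatenate([np.ones(len(y_lab)),
                              np.full(len(y_pseudo), unlabeled_weight)])

    model = lgb.LGBMRanker(objective="lambdarank", n_estimators=200)
    model.fit(X_all, y_all, group=group_all, sample_weight=weights)
    return model

The group arrays hold the number of documents per query, with the feature matrices ordered query by query, labeled queries first; lowering unlabeled_weight reduces the contribution of noisy pseudo-labels to the boosting gradients.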

Author Biography

Klym Yamkovyi, National Technical University "Kharkiv Polytechnic Institute"

National Technical University "Kharkiv Polytechnic Institute", Assistant at the Department of Computer Mathematics and Data Analysis, Kharkiv, Ukraine

References

Burges C. J. C., Svore K. M., Wu Q., Gao J. Ranking, boosting and model adaptation. Available at: https://www.microsoft.com/en-us/research/publication/ranking-boosting-and-model-adaptation/ (accessed 07.04.2023).

Chapelle O., Chang Y. Yahoo! Learning to Rank Challenge Overview. JMLR: Workshop and Conference Proceedings 14. 2011, pp. 1–24.

Xu J., Li H. AdaRank: A Boosting Algorithm for Information Retrieval. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007, pp. 391–398.

Burges C. J. C. From RankNet to LambdaRank to LambdaMART: An Overview. Available at: https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/ (accessed 07.04.2023).

Grira N., Crucianu M., Boujemaa N. Unsupervised and Semi-supervised Clustering: a Brief Survey. Available at: http://cedric.cnam.fr/~crucianm/src/BriefSurveyClustering.pdf (accessed 07.04.2023).

Vapnik V. N. Statistical Learning Theory. New York, Wiley, 1998. 768 p.

Rahangdale A. U., Raut S. Clustering Based Transductive Semi-supervised Learning for Learning-to-Rank. International Journal of Pattern Recognition and Artificial Intelligence. 2019, vol. 33, no. 12, pp. 1951007:1–1951007:27. DOI: 10.1142/s0218001419510078.

Amini M., Truong T., Goutte C. A Boosting Algorithm for Learning Bipartite Ranking Functions with Partially Labeled Data. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008. 2008, pp. 99–106.

Szummer M., Yilmaz E. Semi-supervised Learning to Rank with Preference Regularization. Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011, pp. 269–278.

Weston J., Leslie C., Ie E., Zhou D., Elisseeff A., Noble W. S. Semi-supervised protein classification using cluster kernels. Bioinformatics. 2005, vol. 21, no. 15, pp. 3241–3247.

Valizadegan H., Jin R., Zhang R., Mao J. Learning to Rank by Optimizing NDCG Measure. Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. 2009, pp. 1883–1891.

Published

2023-07-15

How to Cite

Yamkovyi, K. (2023). ADAPTATION OF LAMBDAMART MODEL TO SEMI-SUPERVISED LEARNING. Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies, (1 (9)), 76–81. https://doi.org/10.20998/2079-0023.2023.01.12

Issue

Section

INFORMATION TECHNOLOGY