The Distance Function Optimization for the Near Neighbors-Based Classifiers

Published: 30 July 2022

Abstract

Based on an analysis of the conditions a good distance function should satisfy, we formulate four rules that such a function should fulfill. We then introduce two new distance functions, one a metric and one a pseudometric, and test how well they suit distance-based classifiers, especially the IINC classifier. We rank 23 distance functions according to several criteria and statistical tests on 24 different tasks, using the mean, the median, the Friedman aligned test, and the Quade test. The rankings depend not only on the criterion or the nature of the statistical test, but also on whether the test accounts for the differing difficulty of the tasks or treats all tasks as equally difficult. The two new distance functions rank among the four or five best of the 23 functions compared. Our results show that a suitable distance function can improve the behavior of distance-based classification rules.
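To make the setting concrete, the following sketch shows a rank-based near-neighbors classifier with a pluggable distance function. It assumes the IINC rule from the authors' earlier work, in which each class is scored by the sum of reciprocal ranks 1/i of its training points ordered by increasing distance to the query; the function names and toy data here are illustrative, not the authors' implementation, and the Euclidean distance stands in for any of the 23 distance functions compared in the paper.

```python
import numpy as np

def euclidean(x, y):
    # One possible plug-in distance; any metric or pseudometric fits here.
    return np.sqrt(np.sum((x - y) ** 2))

def iinc_classify(X_train, y_train, x, distance=euclidean):
    """Score each class by the sum of reciprocal ranks 1/i of its
    training points, ordered by increasing distance to the query x."""
    d = np.array([distance(row, x) for row in X_train])
    order = np.argsort(d)                      # rank 1 = nearest neighbor
    scores = {}
    for rank, idx in enumerate(order, start=1):
        label = y_train[idx]
        scores[label] = scores.get(label, 0.0) + 1.0 / rank
    return max(scores, key=scores.get)

# Toy example: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(iinc_classify(X, y, np.array([0.05, 0.1])))  # -> 0
```

Because the reciprocal-rank weights decay slowly, every training point contributes to the score, so the choice of distance function shapes the whole ranking, not just the single nearest neighbor.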

• Published in

  ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6 (December 2022), 631 pages.
  ISSN: 1556-4681, EISSN: 1556-472X, DOI: 10.1145/3543989


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 30 July 2022
• Online AM: 24 February 2022
• Accepted: 1 November 2020
• Revised: 1 September 2020
• Received: 1 April 2019

        Qualifiers

        • research-article
        • Refereed