Abstract
From an analysis of the conditions a good distance function should satisfy, we derive four rules that it must fulfill. We then introduce two new distance functions, one a metric and one a pseudometric, and test how well they suit distance-based classifiers, especially the IINC classifier. We rank 23 distance functions according to several criteria and statistical tests on 24 different tasks, using the mean, the median, the Friedman aligned test, and the Quade test. The rankings depend not only on the criterion or the nature of the statistical test, but also on whether the test accounts for the differing difficulty of the tasks or treats all tasks as equally difficult. The two new distance functions rank among the four or five best of the 23 tested. Our results show that a suitable distance function can improve the behavior of distance-based classification rules.
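The abstract's central idea is that the distance function is a pluggable component of a near-neighbors classifier, and that swapping it can change classification quality. The paper's new distance functions are not given in this excerpt, so as a minimal illustrative sketch (not the authors' method), here is a 1-NN rule parameterized by an arbitrary distance function, shown with the standard Euclidean metric and the well-known Canberra metric as two interchangeable choices:

```python
from math import sqrt

def euclidean(a, b):
    # Standard L2 metric.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canberra(a, b):
    # Canberra distance: weights coordinate differences relative to
    # magnitude; terms with |x| + |y| == 0 contribute nothing.
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)

def nn_classify(train, query, dist):
    # 1-NN rule: predict the label of the training point closest to
    # the query under the chosen distance function.
    _, label = min(train, key=lambda p: dist(p[0], query))
    return label

# Toy labeled data (hypothetical, for illustration only).
train = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((0.2, 0.1), "A")]
print(nn_classify(train, (0.15, 0.12), euclidean))  # A
print(nn_classify(train, (0.9, 0.8), canberra))     # B
```

Because `dist` is just a parameter, any candidate distance function, including the metric and pseudometric the paper proposes, can be benchmarked on the same tasks by substitution alone, which is exactly the kind of comparison the ranking experiments describe.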
The Distance Function Optimization for the Near Neighbors-Based Classifiers