
    A New Weighted k-Nearest Neighbor Algorithm Based on Newton's Gravitational Force

    The kNN algorithm has three main advantages that make it appealing to the community: it is easy to understand, it regularly offers competitive performance, and its structure can be easily tuned to the needs of researchers to achieve better results. One such variation is weighting the instances based on their distance. In this paper we propose a weighting based on Newton's gravitational force, so that a mass (or relevance) has to be assigned to each instance. We evaluated this idea in the kNN context over 13 benchmark data sets used for binary and multi-class classification experiments. Results in F1 score, statistically validated, suggest that our proposal outperforms the original version of kNN and is statistically competitive with the distance-weighted kNN version as well.

    This research was partially supported by CONACYT-Mexico (project FC-2410). The work of Paolo Rosso has been partially funded by the SomEMBED TIN2015-71147-C2-1-P MINECO research project.

    Aguilera, J.; González, L. C.; Montes-y-Gómez, M.; Rosso, P. (2019). A New Weighted k-Nearest Neighbor Algorithm Based on Newton's Gravitational Force. Lecture Notes in Computer Science. 11401:305-313. https://doi.org/10.1007/978-3-030-13469-3_36
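The idea of a gravity-style weighting can be sketched as follows: each of the k nearest neighbours votes with a force m / d², where m is the instance's mass (relevance) and d its distance to the query. This is an illustrative sketch with unit masses by default, not the paper's exact mass-assignment scheme; the function name and interface are hypothetical.

```python
import numpy as np

def gravity_knn_predict(X_train, y_train, x, k=5, masses=None):
    """Classify x by its k nearest neighbours, each vote weighted by a
    Newton-style 'gravitational force' m / d^2 (illustrative sketch;
    the paper's actual mass assignment is not reproduced here)."""
    if masses is None:
        masses = np.ones(len(X_train))           # unit mass per instance
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    idx = np.argsort(d)[:k]                      # indices of k nearest
    forces = masses[idx] / (d[idx] ** 2 + 1e-12) # avoid division by zero
    votes = {}
    for label, f in zip(y_train[idx], forces):
        votes[label] = votes.get(label, 0.0) + f
    return max(votes, key=votes.get)             # class with largest force
```

With unit masses this reduces to an inverse-squared-distance weighted kNN; assigning larger masses to more relevant training instances is where the paper's contribution lies.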

    Bayesian hypothesis testing in machine learning

    Most hypothesis testing in machine learning is done using the frequentist null-hypothesis significance test, which has severe drawbacks. We review recent Bayesian tests which overcome the drawbacks of the frequentist ones.
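One of the simplest Bayesian alternatives to a frequentist significance test is a Bayesian sign test: put a Beta prior on p = P(method A beats method B on a random data set) and report the posterior probability that p > 1/2, instead of a p-value. The sketch below (a simplified version that ignores ties and regions of practical equivalence; the function name is hypothetical) uses the closed form of the Beta CDF for integer parameters.

```python
from math import comb

def prob_better(wins, losses, alpha=1, beta=1):
    """Bayesian sign test (simplified sketch): with a Beta(alpha, beta)
    prior on p = P(method A beats method B on a random data set), return
    the posterior probability that p > 1/2 after observing wins/losses.
    Uses the closed-form Beta CDF for integer shape parameters."""
    a, b = alpha + wins, beta + losses
    n = a + b - 1
    # P(Beta(a, b) <= 0.5) = sum_{j=a}^{n} C(n, j) * 0.5^n
    cdf_half = sum(comb(n, j) for j in range(a, n + 1)) * 0.5 ** n
    return 1.0 - cdf_half
```

Unlike a p-value, the returned quantity is directly the probability of the hypothesis of interest given the observed outcomes.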

    Unveiling and unraveling aggregation and dispersion fallacies in group MCDM

    Priorities in multi-criteria decision-making (MCDM) convey the relevance preference of one criterion over another, which is usually reflected by imposing the non-negativity and unit-sum constraints. The processing of such priorities is different from that of other, unconstrained data, but this point is often neglected by researchers, which results in fallacious statistical analysis. This article studies three prevalent fallacies in group MCDM along with solutions based on compositional data analysis to avoid misusing statistical operations. First, we use a compositional approach to aggregate the priorities of a group of decision-makers (DMs) and show that the outcome of the compositional analysis is identical to the normalized geometric mean, meaning that the arithmetic mean should be avoided. Furthermore, a new aggregation method is developed, which is a robust surrogate for the geometric mean. We also discuss the errors in computing measures of dispersion, including standard deviation and distance functions. Discussing the fallacies in computing the standard deviation, we provide a probabilistic criteria ranking by developing proper Bayesian tests, where we calculate the extent to which a criterion is more important than another. Finally, we explain the errors in computing the distance between priorities, and a clustering algorithm is specially tailored based on proper distance metrics.
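The aggregation point above can be made concrete: given one non-negative, unit-sum priority vector per decision-maker, the compositionally sound group priority is the component-wise geometric mean, renormalized to sum to one. A minimal sketch (the function name and array layout are assumptions):

```python
import numpy as np

def aggregate_priorities(P):
    """Aggregate the priority vectors of several decision-makers with the
    normalized geometric mean, the compositionally sound alternative to the
    arithmetic mean. P is an (n_dms, n_criteria) array whose rows are
    non-negative, unit-sum priority vectors (sketch; assumes no zeros)."""
    g = np.exp(np.mean(np.log(P), axis=0))  # component-wise geometric mean
    return g / g.sum()                      # restore the unit-sum constraint
```

Averaging in log space and renormalizing respects the relative (ratio) nature of priorities, which the plain arithmetic mean does not.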

    Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections

    Sequential data such as time series, video, or text can be challenging to analyse as the ordered structure gives rise to complex dependencies. At the heart of this is non-commutativity, in the sense that reordering the elements of a sequence can completely change its meaning. We use a classical mathematical object -- the tensor algebra -- to capture such dependencies. To address the innate computational complexity of high degree tensors, we use compositions of low-rank tensor projections. This yields modular and scalable building blocks for neural networks that give state-of-the-art performance on standard benchmarks such as multivariate time series classification and generative models for video. Comment: 33 pages, 5 figures, 6 tables.
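The low-rank idea can be illustrated on a single degree-2 functional: a full degree-2 tensor functional of a length-T, dimension-d sequence needs d×d parameters and pairwise sums, whereas a rank-1 version f(X) = Σ_{i<j} ⟨u, xᵢ⟩⟨v, xⱼ⟩ needs only 2d parameters and runs in O(Td) time via a prefix sum. This is an illustrative sketch of the low-rank-projection principle, not the paper's exact model.

```python
import numpy as np

def low_rank_degree2(X, u, v):
    """Rank-1, degree-2 functional of a sequence X (shape (T, d)):
        f(X) = sum over i < j of <u, x_i> * <v, x_j>.
    Computed in O(T*d) with a prefix sum instead of O(T^2) pairwise terms.
    (Illustrative sketch of the low-rank idea, not the paper's model.)"""
    a = X @ u                   # <u, x_i> for each step i
    b = X @ v                   # <v, x_j> for each step j
    prefix = np.cumsum(a)[:-1]  # running sum over i < j
    return float(np.dot(prefix, b[1:]))
```

Because only ordered pairs i < j contribute, the value changes when the sequence is reversed, which is exactly the order sensitivity (non-commutativity) the abstract refers to.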

    Sample size estimation for power and accuracy in the experimental comparison of algorithms

    Experimental comparisons of performance represent an important aspect of research on optimization algorithms. In this work we present a methodology for defining the required sample sizes for designing experiments with desired statistical properties for the comparison of two methods on a given problem class. The proposed approach allows the experimenter to define desired levels of accuracy for estimates of mean performance differences on individual problem instances, as well as the desired statistical power for comparing mean performances over a problem class of interest. The method calculates the required number of problem instances, and of runs of the algorithms on each test instance, so that the accuracy of the estimated differences in performance is controlled at the predefined level. Two examples illustrate the application of the proposed method, and its ability to achieve the desired statistical properties with a methodologically sound definition of the relevant sample sizes.
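The flavour of such a calculation can be shown with the textbook normal-approximation sample-size formula for a two-sided test: n = ((z₁₋α/₂ + z_power) · σ / δ)². This is a simplified sketch of the kind of computation the methodology automates, not the paper's actual procedure; the function name and defaults are assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size(delta, sigma, alpha=0.05, power=0.8):
    """Runs needed for a two-sided z-test to detect a mean performance
    difference delta, given standard deviation sigma, significance level
    alpha, and desired power (textbook normal-approximation sketch)."""
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)
```

Halving the detectable difference delta quadruples the required number of runs, which is why fixing sample sizes a priori, as the proposed methodology does, matters for experimental design.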