    Ranking Median Regression: Learning to Order through Local Consensus

    This article is devoted to the problem of predicting the value taken by a random permutation $\Sigma$, describing the preferences of an individual over a set of numbered items $\{1,\; \ldots,\; n\}$ say, based on the observation of an input/explanatory r.v. $X$ (e.g. characteristics of the individual), when error is measured by the Kendall $\tau$ distance. In the probabilistic formulation of the 'Learning to Order' problem we propose, which extends the framework for statistical Kemeny ranking aggregation developed in \citet{CKS17}, this boils down to recovering conditional Kemeny medians of $\Sigma$ given $X$ from i.i.d. training examples $(X_1, \Sigma_1),\; \ldots,\; (X_N, \Sigma_N)$. For this reason, this statistical learning problem is referred to as \textit{ranking median regression} here. Our contribution is twofold. We first propose a probabilistic theory of ranking median regression: the set of optimal elements is characterized, the performance of empirical risk minimizers is investigated in this context, and situations where fast learning rates can be achieved are also exhibited. Next we introduce the concept of local consensus/median, in order to derive efficient methods for ranking median regression. The major advantage of this local learning approach lies in its close connection with the widely studied Kemeny aggregation problem. From an algorithmic perspective, this makes it possible to build predictive rules for ranking median regression by implementing efficient techniques for (approximate) Kemeny median computations at a local level in a tractable manner. In particular, versions of $k$-nearest neighbor and tree-based methods, tailored to ranking median regression, are investigated. Accuracy of piecewise constant ranking median regression rules is studied under a specific smoothness assumption for $\Sigma$'s conditional distribution given $X$.
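    The local consensus idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the squared Euclidean metric on the inputs, and the brute-force search over all permutations (feasible only for very small n) are assumptions made here for illustration. A ranking is encoded as a tuple whose entry i is the rank of item i.

```python
from itertools import combinations, permutations


def kendall_tau(sigma, pi):
    """Count the item pairs that the two rankings order differently.

    sigma[i] and pi[i] are the ranks given to item i.
    """
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
    )


def kemeny_median(rankings):
    """Brute-force Kemeny median: the permutation minimising the summed
    Kendall tau distance to the given rankings (small n only)."""
    n = len(rankings[0])
    return min(
        permutations(range(n)),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )


def knn_ranking_median(x, train, k):
    """Predict a ranking at x as the Kemeny median of the rankings attached
    to the k training inputs closest to x (squared Euclidean distance,
    an assumption of this sketch)."""
    nearest = sorted(
        train,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(x, pair[0])),
    )[:k]
    return kemeny_median([sigma for _, sigma in nearest])


# Hypothetical training pairs (X_i, Sigma_i) with a one-dimensional input.
train = [
    ((0.0,), (0, 1, 2)),
    ((0.1,), (0, 1, 2)),
    ((0.2,), (0, 2, 1)),
    ((5.0,), (2, 1, 0)),
    ((5.1,), (2, 1, 0)),
    ((5.2,), (2, 0, 1)),
]
```

    Calling `knn_ranking_median((0.05,), train, k=3)` aggregates only the three rankings observed near the query point, which is the sense in which the consensus is local.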

    Do performance measures of donors' aid allocation underperform?

    Indices of donor performance abound. Their recent popularity has occurred within the context of pessimism over aid's impact and optimism over the effect of changes in donor behaviour. Rankings of donor allocative performance aim to change donor behaviour, either through direct pressure on governments or indirectly through public engagement. The indices themselves rely on descriptive measures, and typically claim methodological superiority over positive alternatives due to their simplicity. However, there are two problems. First, measures do not seem robust to simple variations in methodology. Second, correlation amongst competing indices is low, leading to a host of contradictory judgements. This offers neither clear technical guidance nor consistent political pressure. The advantages and disadvantages of the approach are discussed, building upon the more general critique of aggregate indices. I suggest a graphical solution that embraces the advantages of the descriptive approach (including ease of public communication) while avoiding some of its major weaknesses (which typically stem from aggregation).

    A Geo-Statistical Analysis of Road Mortality in the Enlarged EU

    This paper aims to show and explain the regional spatial disparities hidden behind average national statistics on road fatalities in Europe; special attention is given to the latest EU enlargement. The work is not limited to describing differences, but unveils what lies behind the observed infra-national heterogeneity in terms of road risk. It is indeed common practice to compare countries in terms of road safety performance and to rank them by a risk indicator such as the mortality rate, which is often expressed as the number of fatalities due to road accidents per 100,000 inhabitants. Some countries are known for their very bad risk records and are often singled out by national or international authorities, without any understanding of the regional differences hidden behind a national mean value. The data analysis shows that changes in the level of spatial aggregation of the data produce significant differences in the variables describing the level of road safety, and hence in operational recommendations and conclusions. Besides the differences in national conditions and policies, the regional differences in road environment characteristics, traffic performance, road user mix, travel speeds, seat-belt use, and availability of emergency care have been major contributors to these variations. Road safety professionals and decision makers should be aware of these differences when trying to reduce the road toll of a country in a sustainable and cost-effective way.
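    As a small illustration of the aggregation effect described in this abstract, the sketch below computes the mortality rate per 100,000 inhabitants at two spatial levels. The region names and all figures are hypothetical and are not taken from the paper.

```python
def mortality_rate(fatalities, population):
    """Road fatalities per 100,000 inhabitants."""
    return fatalities * 100_000 / population


# Hypothetical regions: (road fatalities, inhabitants).
regions = {
    "Region A": (30, 1_000_000),
    "Region B": (90, 500_000),
}

# Rate computed separately for each region.
regional_rates = {
    name: mortality_rate(f, p) for name, (f, p) in regions.items()
}

# Rate computed after pooling all regions into one national figure.
national_rate = mortality_rate(
    sum(f for f, _ in regions.values()),
    sum(p for _, p in regions.values()),
)

print(regional_rates)
print(national_rate)
```

    With these made-up numbers the national rate falls between the two regional rates, so a country-level ranking would conceal the gap between the regions.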

    Statistical and Electrical Features Evaluation for Electrical Appliances Energy Disaggregation

    In this paper we evaluate several well-known and widely used machine learning algorithms for regression in the energy disaggregation task. Specifically, the Non-Intrusive Load Monitoring approach was considered and the K-Nearest-Neighbours, Support Vector Machines, Deep Neural Networks and Random Forest algorithms were evaluated across five datasets using seven different sets of statistical and electrical features. The experimental results demonstrated the importance of selecting both appropriate features and regression algorithms. Analysis at the device level showed that linear devices can be disaggregated using statistical features, while for non-linear devices the use of electrical features significantly improves the disaggregation accuracy, as non-linear appliances have non-sinusoidal current draw and thus cannot be well parametrized by their active power consumption alone. The best performance in terms of energy disaggregation accuracy was achieved by the Random Forest regression algorithm. Peer reviewed. Final published version.

    Developing archetypes for domestic dwellings : An Irish case study

    Stock modelling, based on representative archetypes, is a promising tool for exploring areas for resource and emission reductions in the residential sector. The use of archetypes developed using detailed statistical analysis (multi-linear regression analysis, clustering and descriptive statistics) rather than traditional qualitative techniques allows a more accurate representation of the overall building stock variability in terms of geometric form, constructional materials and operation. This paper presents a methodology for the development of archetypes based on information from literature and a sample of detailed energy-related housing data. The methodology involves a literature review of studies to identify the most important variables which explain energy use and regression analysis of a housing database to identify the most relevant variables associated with energy consumption. A statistical analysis of the distributions for each key variable was used to identify representative parameters. Corresponding construction details were chosen based on knowledge of housing construction details. Clustering analysis was used to identify coincident groups of parameters and construction details; this led to the identification of 13 representative archetypes.

    Stable Feature Selection for Biomarker Discovery

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received increasing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.

    Sequential Complexity as a Descriptor for Musical Similarity

    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate them into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction, respectively. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted version.
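    The compression-rate descriptor described in this abstract can be sketched roughly as follows. This is an illustrative assumption rather than the paper's pipeline: it uses a single scalar feature per frame, uniform quantisation, window averaging for the coarser temporal resolutions, and `zlib` as the compressor. All function names and the two test signals are hypothetical.

```python
import random
import zlib


def quantise(values, levels):
    """Map each value to one of `levels` equal-width bins (levels <= 256)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant signals
    return bytes(
        min(levels - 1, int((v - lo) / span * levels)) for v in values
    )


def downsample(values, factor):
    """Average consecutive non-overlapping windows of `factor` frames."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


def compression_rate(symbols):
    """Compressed length divided by original length (lower = more regular)."""
    return len(zlib.compress(symbols, 9)) / len(symbols)


def complexity_descriptor(values, factors=(1, 2, 4), levels=8):
    """One compression rate per temporal resolution."""
    return [
        compression_rate(quantise(downsample(values, f), levels))
        for f in factors
    ]


# Hypothetical feature sequences: a strictly repeating one and a noisy one.
periodic = [float(i % 4) for i in range(1024)]
rng = random.Random(0)
noisy = [rng.random() for _ in range(1024)]

periodic_desc = complexity_descriptor(periodic)
noisy_desc = complexity_descriptor(noisy)
```

    The repeating sequence compresses to a much smaller fraction of its length than the noisy one at every resolution, which is the kind of contrast such a descriptor is meant to expose.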