7,312 research outputs found

The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

Author: Anderlucci Laura
Montanari Angela
Viroli Cinzia
Publication date: 01/01/2017

In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve, or die over time, we also develop a dynamic clustering strategy, where a group in one time period is allowed to migrate or merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

High-entropy high-hardness metal carbides discovered by entropy descriptors

Author: Brenner Donald W.
Curtarolo Stefano
Harrington Tyler
Maria Jon-Paul
Oses Corey
Samiee Mojtaba
Sarker Pranab
Toher Cormac
Vecchio Kenneth S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2018

High-entropy materials have attracted considerable interest due to the combination of useful properties and promising applications. Predicting their formation remains the major hindrance to the discovery of new systems. Here we propose a descriptor - entropy forming ability - for addressing synthesizability from first principles. The formalism, based on the energy distribution spectrum of randomized calculations, captures the accessibility of equally-sampled states near the ground state and quantifies configurational disorder capable of stabilizing high-entropy homogeneous phases. The methodology is applied to disordered refractory 5-metal carbides - promising candidates for high-hardness applications. The descriptor correctly predicts the ease with which compositions can be experimentally synthesized as rock-salt high-entropy homogeneous phases, validating the ansatz, and in some cases, going beyond intuition. Several of these materials exhibit hardness up to 50% higher than rule of mixtures estimations. The entropy descriptor method has the potential to accelerate the search for high-entropy systems by rationally combining first principles with experimental synthesis and characterization. Comment: 12 pages, 2 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

eScholarship - University of California

MPG.PuRe

Cluster validation by measurement of clustering characteristics relevant to the user

Author: Bowcock
Calinski
Coretto
Fang
Franck
Halkidi
Hausdorf
Hennig
Hennig
Hennig
Hennig
Hubert
Hubert
Katsnelson
Kaufman
Lago-Fernandez
Stigler
Tibshirani
Publication date: 01/01/2019

There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. There are various characteristics of a clustering that can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria will be introduced that refer to different desirable characteristics of a clustering, and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way, specifying weights for the various criteria that are relevant in the clustering application at hand. Comment: 20 pages, 2 figures

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Shrinkage estimation of variance components with applications to microarray data

Author: An Lihua
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2007

Scholarship at UWindsor

Recent advances in directional statistics

Author: García-Portugués Eduardo
Pewsey Arthur
Publication date: 22/09/2020

Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed. Comment: 61 pages

arXiv.org e-Print Archive

Crossref

Universidad Carlos III de Madrid e-Archivo

Two Decades of Unsupervised POS tagging---How Far Have We Come?

Author: Christodoulopoulos Christos
Goldwater Sharon
Steedman Mark
Publication date: 01/01/2010

Edinburgh Research Explorer