
    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. Comment: 61 pages.
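    As a minimal illustration of why directional data need their own toolkit: the arithmetic mean of angles just below 2π and just above 0 points in roughly the opposite direction to the data, whereas the circular mean, built from the resultant of unit vectors, does not. The sketch below (plain Python, not tied to any of the packages surveyed in the paper) computes the two basic summaries of exploratory circular data analysis: the mean direction and the mean resultant length R.

```python
import math

def circular_mean_and_R(angles):
    """Mean direction (radians, in (-pi, pi]) and mean resultant length R
    of a sample of angles; R near 1 means the sample is concentrated,
    R near 0 means it is dispersed around the circle."""
    C = sum(math.cos(a) for a in angles) / len(angles)
    S = sum(math.sin(a) for a in angles) / len(angles)
    return math.atan2(S, C), math.hypot(C, S)
```

    For the sample {0.05, 2π − 0.05} the arithmetic mean is approximately π, pointing the wrong way entirely, while the circular mean is 0 with R close to 1.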

    Weighted Mahalanobis Distance for Hyper-Ellipsoidal Clustering

    Cluster analysis is widely used in many applications, ranging from image and speech coding to pattern recognition. A new method that uses the weighted Mahalanobis distance (WMD), via the covariance matrix of the individual clusters, as the basis for grouping is presented in this thesis. In this algorithm, the Mahalanobis distance is used as a measure of similarity between the samples in each cluster. This thesis discusses some difficulties associated with using the Mahalanobis distance in clustering, and the proposed method provides solutions to these problems. The new algorithm is an approximation to the well-known expectation maximization (EM) procedure used to find the maximum likelihood estimates in a Gaussian mixture model. Unlike the EM procedure, WMD eliminates the requirement of having initial parameters such as the cluster means and variances, as it starts from the raw data set. Properties of the new clustering method are presented by examining the clustering quality for codebooks designed with the proposed method and competing methods on a variety of data sets. The competing methods are the Linde-Buzo-Gray (LBG) algorithm and the Fuzzy c-means (FCM) algorithm, both of which use the Euclidean distance. The neural network for hyperellipsoidal clustering (HEC), which uses the Mahalanobis distance, is also studied and compared to the WMD method and the other techniques. The new method provides better results than the competing methods and thus becomes another useful tool for clustering.
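    To make the distance at the core of the algorithm concrete, the sketch below computes the squared Mahalanobis distance of a sample to a cluster characterized by its mean and covariance; when the covariance is the identity it reduces to the squared Euclidean distance used by LBG and FCM. This shows only the similarity measure itself, not the weighting scheme or the full WMD iteration described in the thesis.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of sample x to a cluster (mean, cov).

    Accounts for the cluster's hyper-ellipsoidal shape: displacements
    along directions of large variance contribute less to the distance
    than equal displacements along directions of small variance.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(diff @ np.linalg.inv(cov) @ diff)
```

    With an isotropic cluster (identity covariance) the value for x = (3, 4) about the origin is 25, the squared Euclidean distance; stretching the covariance along one axis shrinks the contribution of that axis accordingly.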

    Contributions of Clustering Variable Selection: Methods for International Segmentation

    Performing international activities is a challenging operation given the heterogeneity of the international market, which makes the development of successful standardized strategies for the entire world’s population practically impossible. Finding homogeneous international customer segments helps companies communicate better with the targeted customers by concentrating on a few units, a group, or several groups. Depending on the study purpose, the segmentation results may help to select potentially attractive international markets, to develop standardized strategies for a segment of countries in the context of global marketing, or to develop a totally or partially differentiated strategy for several groups in the context of international marketing. Thus, international segmentation has become an indispensable task in the strategic decision-making process for various international business research questions. Consequently, choosing the relevant segmentation bases and the statistical method are crucial steps in identifying segments of customers. Research studies in which an international market is segmented mainly employ socio-economic or cultural variables as bases. Moreover, in these studies, since the purpose of the analysis is usually to discover a priori unknown segments in an international population, the segmentation task is performed by clustering techniques. Typically, to facilitate the interpretation of the results, the segmentation task is preceded by factor analysis to reduce a large number of initial variables into a few dimensions or factors. However, on the one hand, factor analysis usually generates a loss of information and a distortion of reality. On the other hand, the set of variables initially considered may contain irrelevant variables that might lead to incorrect classification. 
Therefore, to retain only relevant information for the clustering task, variable selection should be performed to reduce the data dimension before considering a factor analysis. As shown by the numerical experiments, conducted on two secondary databases (the 03/07/2018 update of the structure of consumption expenditure published by Eurostat, covering 32 countries of the European Union and its neighboring countries, and the 15/04/2016 version of the updated European Values Study data, covering customers from 48 European countries), variable selection allows discovering relevant segments that are easy to interpret. Thus, once the variable selection is performed, the segmentation results will enable relevant and accurate analysis and support correct decision-making. JEL Classification: C38. Paper type: Empirical research.
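    The paper's actual selection criterion is not specified in the abstract; as a purely illustrative sketch of the idea of screening out uninformative variables before clustering, the simplest filter keeps the columns with the highest variance (a column that barely varies across countries cannot separate segments).

```python
import numpy as np

def select_by_variance(X, keep):
    """Illustrative filter-type variable selection: keep the `keep`
    columns of X with the highest variance. Model-based selection
    criteria are usually preferable for clustering applications."""
    idx = np.argsort(X.var(axis=0))[::-1][:keep]
    idx = np.sort(idx)                 # preserve the original column order
    return idx, X[:, idx]
```

    A constant column (zero variance) is always dropped first, which is the minimal sanity check any selection scheme should pass.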

    Nonperturbative studies of fuzzy spheres in a matrix model with the Chern-Simons term

    Fuzzy spheres appear as classical solutions in a matrix model obtained via dimensional reduction of 3-dimensional Yang-Mills theory with the Chern-Simons term. A well-defined perturbative expansion around these solutions can be formulated even for finite matrix size, and in the case of k coincident fuzzy spheres it gives rise to a regularized U(k) gauge theory on a noncommutative geometry. Here we study the matrix model nonperturbatively by Monte Carlo simulation. The system undergoes a first-order phase transition as we change the coefficient (α) of the Chern-Simons term. In the small-α phase, the large-N properties of the system are qualitatively the same as in the pure Yang-Mills model (α = 0), whereas in the large-α phase a single fuzzy sphere emerges dynamically. Various `multi fuzzy spheres' are observed as meta-stable states, and we argue in particular that the k coincident fuzzy spheres cannot be realized as the true vacuum in this model even in the large-N limit. We also perform one-loop calculations of various observables for arbitrary k, including k = 1. Comparison with our Monte Carlo data suggests that higher-order corrections are suppressed in the large-N limit. Comment: LaTeX, 37 pages, 13 figures; discussion on instabilities refined, references added, typo corrected; final version to appear in JHEP.

    An overview of clustering methods with guidelines for application in mental health research

    Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
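    Clustering evaluation and validation typically rely on internal indices. As one concrete, widely used example (a standard index, not necessarily one of those the paper recommends), here is a self-contained computation of the mean silhouette width, which can be used to compare candidate numbers of clusters:

```python
import math

def silhouette(points, labels):
    """Mean silhouette width over all points: values near 1 indicate
    compact, well-separated clusters; values near 0 indicate overlap.
    Assumes at least two clusters are present."""
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in members[lab] if j != i]
        if not own:                      # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the rest of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(sum(math.dist(points[i], points[j]) for j in members[m]) / len(members[m])
                for m in members if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(points)
```

    Running the index over clusterings obtained for several candidate values of k and keeping the k with the highest mean silhouette is a common, if imperfect, model-choice heuristic.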

    3D Robotic Sensing of People: Human Perception, Representation and Activity Recognition

    The robots are coming. Their presence will eventually bridge the digital-physical divide and dramatically impact human life by taking over tasks where our current society has shortcomings (e.g., search and rescue, elderly care, and child education). Human-centered robotics (HCR) is a vision to address how robots can coexist with humans and help people live safer, simpler and more independent lives. As humans, we have a remarkable ability to perceive the world around us, perceive people, and interpret their behaviors. Endowing robots with these critical capabilities in highly dynamic human social environments is a significant but very challenging problem in practical human-centered robotics applications. This research focuses on robotic sensing of people, that is, how robots can perceive and represent humans and understand their behaviors, primarily through 3D robotic vision. In this dissertation, I begin with a broad perspective on human-centered robotics by discussing its real-world applications and significant challenges. Then, I will introduce a real-time perception system, based on the concept of Depth of Interest, to detect and track multiple individuals using a color-depth camera that is installed on moving robotic platforms. In addition, I will discuss human representation approaches, based on local spatio-temporal features, including new “CoDe4D” features that incorporate both color and depth information, a new “SOD” descriptor to efficiently quantize 3D visual features, and the novel AdHuC features, which are capable of representing the activities of multiple individuals. Several new algorithms to recognize human activities are also discussed, including the RG-PLSA model, which allows us to discover activity patterns without supervision, the MC-HCRF model, which can explicitly investigate certainty in latent temporal patterns, and the FuzzySR model, which is used to segment continuous data into events and probabilistically recognize human activities. 
Cognition models based on recognition results are also implemented for decision making, allowing robotic systems to react to human activities. Finally, I will conclude with a discussion of future directions that will accelerate the upcoming technological revolution of human-centered robotics.

    Fuzzy spectral clustering methods for textual data

    Nowadays, the development of advanced information technologies has determined an increase in the production of textual data. This inevitable growth accentuates the need to advance in the identification of new methods and tools able to efficiently analyse such kind of data. Against this background, unsupervised classification techniques can play a key role in this process since most of this data is not classified. Document clustering, which is used for identifying a partition of clusters in a corpus of documents, has proven to perform efficiently in the analyses of textual documents and it has been extensively applied in different fields, from topic modelling to information retrieval tasks. Recently, spectral clustering methods have gained success in the field of text classification. These methods have gained popularity due to their solid theoretical foundations which do not require any specific assumption on the global structure of the data. However, even though they prove to perform well in text classification problems, little has been done in the field of fuzzy clustering. Moreover, depending on the type of documents analysed, it is often the case that textual documents do not contain only information related to a single topic: indeed, there might be an overlap of contents characterizing different knowledge domains. Consequently, documents may contain information that is relevant to different areas of interest to some degree. The first part of this work critically analyses the main clustering algorithms used for text data, involving also the mathematical representation of documents and the pre-processing phase. Then, three novel fuzzy versions of spectral clustering algorithms for text data are introduced. The first one exploits the use of fuzzy K-medoids instead of K-means. The second one derives directly from the first one but is used in combination with Kernel and Set Similarity (KS2M), which takes into account the Jaccard index. 
Finally, in the third one, in order to enhance the clustering performance, a new similarity measure S∗ is proposed. This last one exploits the inherent sequential nature of text data by means of a weighted combination between the Spectrum string kernel function and a measure of set similarity. The second part of the thesis focuses on spectral bi-clustering algorithms for text mining tasks, which represent an interesting and partially unexplored field of research. In particular, two novel versions of fuzzy spectral bi-clustering algorithms are introduced. The two algorithms differ in the approach followed to identify the document and word partitions: the first follows a simultaneous approach, while the second follows a sequential one. This difference leads also to a diversification in the choice of the number of clusters. The adequacy of all the proposed fuzzy (bi-)clustering methods is evaluated by experiments performed on both real and benchmark data sets.
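    The fuzzy ingredient shared by the proposed methods is graded cluster membership: each document belongs to every cluster to some degree. As a sketch of that idea (this is the standard fuzzy c-means membership update for given centroids, not the fuzzy K-medoids variant actually used in the thesis):

```python
import math

def fuzzy_memberships(points, centroids, m=2.0):
    """Fuzzy c-means membership degrees: U[i][k] is how strongly point i
    belongs to cluster k; each row sums to 1. m > 1 is the fuzzifier
    (larger m gives softer, more overlapping memberships)."""
    U = []
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        if 0.0 in d:                     # point coincides with a centroid
            U.append([1.0 if dk == 0.0 else 0.0 for dk in d])
            continue
        expo = 2.0 / (m - 1.0)
        U.append([1.0 / sum((dk / dj) ** expo for dj in d) for dk in d])
    return U
```

    A point equidistant from two centroids gets membership 0.5 in each, which is exactly the overlapping-topics situation that motivates the fuzzy approach for documents spanning several knowledge domains.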

    Micro-structure diffusion scalar measures from reduced MRI acquisitions

    In diffusion MRI, the Ensemble Average diffusion Propagator (EAP) provides relevant microstructural information and meaningful descriptive maps of the white matter previously obscured by traditional techniques like the Diffusion Tensor. The direct estimation of the EAP, however, requires a dense sampling of the Cartesian q-space. Due to the huge number of samples needed for an accurate reconstruction, more efficient alternative techniques have been proposed in the last decade. Even so, all of them imply acquiring a large number of diffusion gradients with different b-values. In order to use the EAP in practical studies, scalar measures must be directly derived, the most common being the return-to-origin probability (RTOP) and the return-to-plane and return-to-axis probabilities (RTPP, RTAP). In this work, we propose the so-called “Apparent Measures Using Reduced Acquisitions” (AMURA) to drastically reduce the number of samples needed for the estimation of diffusion properties. AMURA avoids the calculation of the whole EAP by assuming that the diffusion anisotropy is roughly independent of the radial direction. With such an assumption, and as opposed to common multi-shell procedures based on iterative optimization, we achieve closed-form expressions for the measures using information from one single shell. This way, the new methodology remains compatible with standard acquisition protocols commonly used for HARDI (based on just one b-value). We report extensive results showing that the potential of AMURA to reveal microstructural properties of the tissues is comparable to that of state-of-the-art EAP estimators and well above that of Diffusion Tensor techniques. At the same time, the closed forms provided for RTOP, RTPP, and RTAP-like magnitudes make AMURA both computationally efficient and robust.
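    For reference, the three scalar measures named above are standard q-space integrals (these are the usual definitions in the literature, not AMURA's closed forms). With E(q) the normalized diffusion signal, P(R) the EAP (its 3D Fourier transform), and the parallel axis chosen along the main fiber direction:

```latex
\mathrm{RTOP} = P(\mathbf{0}) = \int_{\mathbb{R}^3} E(\mathbf{q})\, d\mathbf{q}, \qquad
\mathrm{RTPP} = \int_{\mathbb{R}} E(q_{\parallel})\, dq_{\parallel}, \qquad
\mathrm{RTAP} = \int_{\mathbb{R}^2} E(\mathbf{q}_{\perp})\, d\mathbf{q}_{\perp},
```

    where the RTPP and RTAP expressions follow from the Fourier slice theorem: integrating the EAP over a plane (or along an axis) in displacement space corresponds to integrating the signal along the orthogonal axis (or over the orthogonal plane) in q-space.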