
    Analysis of FMRI Exams Through Unsupervised Learning and Evaluation Index

    In the last few years, time series clustering has grown significantly and has proven effective at providing useful information in various domains. This growing interest is the result of the effort made by the scientific community in the context of temporal data mining. For these reasons, the first phase of the thesis focused on the study of data obtained from fMRI exams carried out in task-based and resting-state mode, using and comparing different clustering algorithms: the Self-Organizing Map (SOM), Growing Neural Gas (GNG) and Neural Gas (NG), which are crisp-type algorithms, and a fuzzy algorithm, Fuzzy C-Means (FCM). The clustering results were evaluated using the Davies-Bouldin index (DBI or DB index). Clustering evaluation is the second topic of this thesis. Specific techniques exist for evaluating the validity of a clustering, but none of them is yet established for the study of fMRI exams. Furthermore, the evaluation of evaluation techniques is still an open research field. Eight clustering validation indices (CVIs) applied to fMRI data clustering will be analysed: the Pakhira-Bandyopadhyay-Maulik Index (crisp and fuzzy), the Fukuyama-Sugeno Index, the Rezaee-Lelieveldt-Reiber Index, the Wang-Sun-Jiang Index, the Xie-Beni Index, the Davies-Bouldin Index and the Soft Davies-Bouldin Index. Furthermore, the evaluation indices themselves will be evaluated, taking into account the sub-optimal performance obtained by the indices, through the introduction of new metrics. Finally, a new methodology for the evaluation of CVIs will be introduced, which will use an ANFIS model
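
    The Davies-Bouldin index named in this abstract can be illustrated with a short sketch. This is a minimal standard-library version of the textbook definition, not the thesis's implementation: the function name `davies_bouldin` and the toy points are assumptions made for illustration, and the sketch assumes at least two clusters with distinct centroids. Lower values indicate more compact, better separated clusters.

```python
import math

def davies_bouldin(points, labels):
    """Textbook Davies-Bouldin index for a crisp clustering (lower is better)."""
    ids = sorted(set(labels))
    # Group the points by cluster label.
    groups = {k: [p for p, l in zip(points, labels) if l == k] for k in ids}
    # Centroid of each cluster (coordinate-wise mean).
    centroids = {k: [sum(col) / len(g) for col in zip(*g)]
                 for k, g in groups.items()}
    # Scatter: mean Euclidean distance of the members to their centroid.
    scatter = {k: sum(math.dist(p, centroids[k]) for p in g) / len(g)
               for k, g in groups.items()}
    # For each cluster, keep the worst (largest) similarity ratio to any other.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ids if j != i)
        for i in ids
    ]
    return sum(worst) / len(worst)

# Hypothetical toy data: two tight groups that are far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(points, [0, 0, 1, 1])  # labels follow the two groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # labels cut across the groups
print(good < bad)
```

    The labelling that follows the two groups scores lower than the one that cuts across them, which is the behaviour a validation index is expected to show.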

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. Comment: 61 pages

    Vertical wind profile characterization and identification of patterns based on a shape clustering algorithm

    Wind power plants are becoming a generally accepted resource in the generation mix of many utilities. At the same time, the size and the power rating of individual wind turbines have increased considerably. Under these circumstances, the sector is increasingly demanding an accurate characterization of vertical wind speed profiles to properly estimate the incoming wind speed at the rotor swept area and, consequently, assess the potential of a wind power plant site. The present paper describes a shape-based clustering characterization and visualization of real vertical wind speed data. The proposed solution allows us to identify the most likely vertical wind speed patterns for a specific location based on real wind speed measurements. Moreover, this clustering approach also provides characterization and classification of such vertical wind profiles. This solution is highly suitable for the large amounts of data collected by remote sensing equipment, where wind speed values at different heights within the rotor swept area are available for subsequent analysis. The methodology is based on z-normalization, a shape-based distance metric and the Ward hierarchical clustering method. Real vertical wind speed profile data from a Spanish wind power plant, collected over several months with commercial Windcube equipment, are used to assess the proposed characterization and clustering process, involving more than 100,000 wind speed data values. All analyses have been implemented using open-source R software. From the results, at least four different vertical wind speed patterns are identified to properly characterize over 90% of the wind speed data collected throughout the day. Therefore, alternative analytical function criteria should be subsequently proposed for vertical wind speed characterization purposes.
    The authors are grateful for the financial support from the Spanish Ministry of the Economy and Competitiveness and the European Union (ENE2016-78214-C2-2-R) and the Spanish Education, Culture and Sport Ministry, reference FPU16/042
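
    The pipeline in this abstract (z-normalization followed by Ward hierarchical clustering) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' R implementation: it substitutes plain Euclidean distance on z-normalized profiles for the paper's shape-based distance, it uses the population standard deviation, and the function names and toy profiles are hypothetical.

```python
from statistics import fmean, pstdev

def z_normalize(profile):
    """Rescale one profile to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(profile), pstdev(profile)
    if sigma == 0:
        return [0.0 for _ in profile]  # a flat profile carries no shape information
    return [(v - mu) / sigma for v in profile]

def ward_clusters(profiles, k):
    """Naive Ward agglomeration: repeatedly merge the pair of clusters whose
    merge increases the within-cluster sum of squares the least."""
    clusters = [([i], list(p)) for i, p in enumerate(profiles)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * d2  # Ward merge cost
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ia + ib, centroid))
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical wind speeds (m/s) at four heights: two rising and two falling profiles.
raw = [[4, 5, 6, 7], [8, 10, 12, 14], [7, 6, 5, 4], [14, 12, 10, 8]]
groups = ward_clusters([z_normalize(p) for p in raw], k=2)
print(groups)
```

    Because z-normalization removes differences in mean speed and amplitude, profiles are grouped by shape alone: the two rising profiles end up together even though their magnitudes differ by a factor of two.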

    Development Of Climate Classification Through Hierarchical Clustering For Building Energy Simulation

    Climate classification plays an important role in identifying homogeneous groups of climates, from which representative locations can be extracted and used for building energy simulation analyses. Nevertheless, according to the current state of the art, the main reference systems consider just a fraction of the weather quantities that are relevant to the building energy balance, i.e., ambient temperature and humidity and solar radiation. To overcome this issue, previous research defined a new methodology based on monthly series of weather quantities, statistical analyses and data-mining techniques for climate clustering. In this work, with the aim of further developing this approach, a shorter time discretization of weather quantities, i.e., a weekly discretization, was tested, alongside additional variables describing the daily range of ambient temperature and humidity. In order to investigate the potential of those modifications, a dataset with more than 300 European reference climates was analyzed and subdivided into climate classes according to the proposed clustering procedure

    Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

    Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translates into more accurate search, learning and mining operations directly in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest estimation of the L2-norm or the correlation. We show that typical data-analysis operations, such as k-NN search or k-Means clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCA-based techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCA-based compression. The contributions of this work are generic as our methodology is applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme. Comment: 25 pages, 20 figures, accepted in VLD
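
    The premise behind distance estimation in the compressed domain can be illustrated with a short sketch. It shows only the standard baseline, not the paper's optimal-bound algorithm: an orthonormal transform preserves Euclidean distance, so a distance computed from a subset of coefficients is a valid lower bound. The sketch assumes both objects keep the same coefficient positions (the paper addresses the harder case of different sets), and all function names and sample values are hypothetical.

```python
import cmath
import math

def unitary_dft(x):
    """Discrete Fourier transform scaled by 1/sqrt(n), which makes it orthonormal."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        / math.sqrt(n)
        for f in range(n)
    ]

def euclidean(a, b):
    """Euclidean distance between two real or complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

def lower_bound(coeffs_a, coeffs_b, kept):
    """Distance restricted to the kept coefficient positions.
    Dropping non-negative terms from the sum can only shrink it."""
    return math.sqrt(sum(abs(coeffs_a[f] - coeffs_b[f]) ** 2 for f in kept))

# Two hypothetical short series.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
y = [2.0, 2.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]
X, Y = unitary_dft(x), unitary_dft(y)

exact = euclidean(x, y)
print(abs(exact - euclidean(X, Y)) < 1e-9)        # distance preserved by the transform
print(lower_bound(X, Y, kept=[0, 1, 7]) <= exact)  # a partial sum never exceeds the total
```

    How close the bound comes to the exact distance depends on how much of the signal's energy the kept coefficients capture, which is the gap that tighter bounds aim to narrow.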
