496 research outputs found

    Analysis of FMRI Exams Through Unsupervised Learning and Evaluation Index

    Get PDF
    In the last few years, the clustering of time series has seen significant growth and has proven effective in providing useful information in various domains of use. This growing interest in time series clustering is the result of the effort made by the scientific community in the context of time data mining. For these reasons, the first phase of the thesis focused on the study of the data obtained from fMRI exams carried out in task-based and resting state mode, using and comparing different clustering algorithms: SelfOrganizing map (SOM), the Growing Neural Gas (GNG) and Neural Gas (NG) which are crisp-type algorithms, a fuzzy algorithm, the Fuzzy C algorithm, was also used (FCM). The evaluation of the results obtained by using clustering algorithms was carried out using the Davies Bouldin evaluation index (DBI or DB index). Clustering evaluation is the second topic of this thesis. To evaluate the validity of the clustering, there are specific techniques, but none of these is already consolidated for the study of fMRI exams. Furthermore, the evaluation of evaluation techniques is still an open research field. Eight clustering validation indexes (CVIs) applied to fMRI data clustering will be analysed. The validation indices that have been used are Pakhira Bandyopadhyay Maulik Index (crisp and fuzzy), Fukuyama Sugeno Index, Rezaee Lelieveldt Reider Index, Wang Sun Jiang Index, Xie Beni Index, Davies Bouldin Index, Soft Davies Bouldin Index. Furthermore, an evaluation of the evaluation indices will be carried out, which will take into account the sub-optimal performance obtained by the indices, through the introduction of new metrics. Finally, a new methodology for the evaluation of CVIs will be introduced, which will use an ANFIS model

    Customer Segmentation with Subscription-based Online Media Customers

    Get PDF
    In the modern era, using personalization when reaching out to potential or current customers is essential for businesses to compete in their area of business. With large customer bases, this personalization becomes more difficult, thus segmenting entire customer bases into smaller groups helps businesses focus better on personalization and targeted business decisions. These groups can be straightforward, like segmenting solely based on age, or more complex, like taking into account geographic, demographic, behavioral, and psychographic differences among the customers. In the latter case, customer segmentation should be performed with Machine Learning, which can help find more hidden patterns within the data. Often, the number of features in the customer data set is so large that some form of dimensionality reduction is needed. That is also the case with this thesis, which includes 12802 unique article tags that are desired to be included in the segmentation. A form of dimensionality reduction called feature hashing is selected for hashing the tags for its ability to be introduced new tags in the future. Using hashed features in customer segmentation is a balancing act. With more hashed features, the evaluation metrics might give better results and the hashed features resemble more closely the unhashed article tag data, but with less hashed features the clustering process is faster, more memory-efficient and the resulting clusters are more interpretable to the business. Three clustering algorithms, K-means, DBSCAN, and BIRCH, are tested with eight feature hashing bin sizes for each, with promising results for K-means and BIRCH

    An approach to validity indices for clustering techniques in Big Data

    Get PDF
    Clustering analysis is one of the most used Machine Learning techniques to discover groups among data objects. Some clustering methods require the number of clus ters into which the data is going to be partitioned. There exist several cluster validity indices that help us to approximate the optimal number of clusters of the dataset. However, such indices are not suitable to deal with Big Data due to its size limitation and runtime costs. This paper presents two cluster ing validity indices that handle large amount of data in low computational time. Our indices are based on redefinitions of traditional indices by simplifying the intra-cluster distance calculation. Two types of tests have been carried out over 28 synthetic datasets to analyze the performance of the proposed indices. First, we test the indices with small and medium size datasets to verify that our indices have a similar effectiveness to the traditional ones. Subsequently, tests on datasets of up to 11 million records and 20 features have been executed to check their efficiency. The results show that both indices can handle Big Data in a very low computational time with an effectiveness similar to the traditional indices using Apache Spark framework.Ministerio de Economía y Competitividad TIN2014-55894-C2-1-

    Visual and semantic interpretability of projections of high dimensional data for classification tasks

    Full text link
    A number of visual quality measures have been introduced in visual analytics literature in order to automatically select the best views of high dimensional data from a large number of candidate data projections. These methods generally concentrate on the interpretability of the visualization and pay little attention to the interpretability of the projection axes. In this paper, we argue that interpretability of the visualizations and the feature transformation functions are both crucial for visual exploration of high dimensional labeled data. We present a two-part user study to examine these two related but orthogonal aspects of interpretability. We first study how humans judge the quality of 2D scatterplots of various datasets with varying number of classes and provide comparisons with ten automated measures, including a number of visual quality measures and related measures from various machine learning fields. We then investigate how the user perception on interpretability of mathematical expressions relate to various automated measures of complexity that can be used to characterize data projection functions. We conclude with a discussion of how automated measures of visual and semantic interpretability of data projections can be used together for exploratory analysis in classification tasks.Comment: Longer version of the VAST 2011 poster. http://dx.doi.org/10.1109/VAST.2011.610247
    • …
    corecore