465 research outputs found

    Hybrid Unsupervised Exploratory Plots: A Case Study of Analysing Foreign Direct Investment

    The curse of dimensionality remains an open issue after many years, because finding non-obvious and previously unknown patterns in ever-increasing amounts of high-dimensional data is not an easy task. Advancing descriptive data analysis, the present paper proposes Hybrid Unsupervised Exploratory Plots (HUEPs), a new visualization technique that combines the outputs of Exploratory Projection Pursuit and clustering methods in a novel and informative way. As a case study, HUEPs are validated in a real-world context: analysing the internationalization strategy of companies while taking into account the bilateral distance between home and host countries. Distance is a multifaceted concept that encompasses multiple dimensions. Together with data from both the countries and the companies, various psychic distances are analysed by means of HUEPs to gain deep knowledge of the internationalization strategy of large Spanish companies. Informative visualizations are obtained from the analysed dataset, leading to useful implications for business and decision making. The work was conducted during Álvaro Herrero’s research stay at KEDGE Business School in Bordeaux (France). Some results of this ongoing research, from the same dataset, have been presented at the 13th International Conference on Soft Computing Models in Industrial and Environmental Applications, in a paper entitled “Visualizing Industrial Development Distance to Better Understand Internationalization of Spanish Companies”.

    G-Tric: enhancing triclustering evaluation using three-way synthetic datasets with ground truth

    Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2020. Three-dimensional datasets, or three-way data, have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data without a known ground truth, which limits the assessments. In this context, we propose a synthetic data generator, G-Tric, which allows the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. G-Tric can replicate real-world datasets and create new ones that match researchers’ needs across several properties, including data type (numeric or symbolic), dimension, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the number of missing values, noise, and errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters.
Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, which produces more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric’s potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches. Besides reviewing the current state of the art regarding triclustering approaches, comparison studies and evaluation metrics, this work also analyzes how the lack of frameworks to generate synthetic data influences existing evaluation methodologies, limiting the scope of performance insights that can be extracted from each algorithm. It also exemplifies how the set of decisions made in these evaluations can impact the quality and validity of the results. As an alternative, a different methodology that takes advantage of synthetic data with ground truth is presented. This approach, combined with a proposed extension to an existing extrinsic clustering measure, makes it possible to assess the quality of solutions from new perspectives.
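
The idea of planting a tricluster and returning it as ground truth can be illustrated with a minimal sketch. This is not G-Tric's actual interface: the function names, the uniform background, the constant pattern, and the cell-level Jaccard score are all assumptions chosen for brevity.

```python
import random


def generate_dataset(n_obs, n_feat, n_ctx, tricluster, value=5.0, seed=0):
    """Build an observations x features x contexts dataset with one planted tricluster.

    `tricluster` is a triple (obs_ids, feat_ids, ctx_ids). Every cell in the
    Cartesian product of those index lists is overwritten with the constant
    `value`. Background cells are drawn from Uniform(0, 1).
    All names and defaults here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    data = [[[rng.random() for _ in range(n_ctx)]
             for _ in range(n_feat)]
            for _ in range(n_obs)]
    obs_ids, feat_ids, ctx_ids = tricluster
    for i in obs_ids:
        for j in feat_ids:
            for k in ctx_ids:
                data[i][j][k] = value
    # The ground truth is simply the planted index sets.
    truth = (sorted(obs_ids), sorted(feat_ids), sorted(ctx_ids))
    return data, truth


def cells(tricluster):
    """Expand a tricluster into the set of (obs, feat, ctx) cells it covers."""
    obs_ids, feat_ids, ctx_ids = tricluster
    return {(i, j, k) for i in obs_ids for j in feat_ids for k in ctx_ids}


def cell_jaccard(found, planted):
    """Extrinsic score: Jaccard index between the cells of two triclusters."""
    a, b = cells(found), cells(planted)
    return len(a & b) / len(a | b)


data, truth = generate_dataset(6, 5, 4, ([0, 2], [1, 3], [0, 1]))
print(cell_jaccard(([0, 2], [1, 3], [0]), truth))
```

Because the planted solution is known, any algorithm's output can be scored against it directly, which is not possible with real data that lacks ground truth.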

    Mining User Personality from Music Listening Behavior in Online Platforms Using Audio Attributes

    Music and emotions are inherently intertwined. Humans leave hints of their personality everywhere, and their music listening behavior in particular shows conscious and unconscious diametric tendencies and influences. So, what could be more elegant than finding the underlying character given the attributes of a certain music piece and, as such, identifying the likelihood that music preference is also imprinted on, or at least resonates with, its listener? This thesis focuses on music audio attributes, or latent song features, to determine human personality. Based on unsupervised learning, we cluster several large music datasets using multiple clustering techniques. This analysis led us to classify song genres based on audio attributes, which can be deemed a novel contribution at the intersection of Music Information Retrieval (MIR) and human psychology studies. Existing research found a relationship between Myers-Briggs personality models and music genres. Our goal was to correlate audio attributes with music genre, which ultimately helps us to determine user personality based on music listening behavior on online music platforms. This target was achieved: we showed the users’ spectral personality traits from the audio feature values of the songs they listen to online, and verified our decision process with the help of a customized Music Recommendation System (MRS). Our model performs genre classification and personality detection with 78% and 74% accuracy, respectively. The results are promising compared to competitor approaches, as they are explainable via statistics and visualizations. Furthermore, the MRS completes and validates our pursuit through 81.3% accurate song suggestions. We believe the outcome of this thesis will serve as inspiration and assistance for fellow researchers in this arena to come up with more personalized song suggestions.
As music preferences will shape specific user personality parameters, it is expected that more such elements will surface that portray the daily activities of individuals and their underlying mentality.

    Fractal and multifractal analysis of PET-CT images of metastatic melanoma before and after treatment with ipilimumab

    PET/CT images with F-18-Fluorodeoxyglucose (FDG) of patients suffering from metastatic melanoma have been analysed using fractal and multifractal analysis to assess the impact of treatment with the monoclonal antibody ipilimumab with respect to therapy outcome. Our analysis shows that the fractal dimensions which describe the tracer dispersion in the body decrease consistently with the deterioration of the patient's therapeutic outcome. In 20 out of 24 cases the fractal analysis results match those of the medical records, while 7 cases are considered special cases because the patients have non-tumour-related medical conditions or side effects which affect the results. The decrease in the fractal dimensions with the deterioration of the patient's condition (in terms of disease progression) is attributed to the hierarchical localisation of the tracer, which accumulates in the affected lesions and does not spread homogeneously throughout the body. Fractality emerges as a result of the migration patterns which the malignant cells follow when propagating within the body (circulatory system, lymphatic system). Analysis of the multifractal spectrum complements and supports the results of the fractal analysis. In the kinetic Monte Carlo modelling of the metastatic process, a small number of malignant cells diffuse throughout a fractal medium representing the blood circulatory network. Along their way the malignant cells engender random metastases (colonies) with a small probability and, as a result, fractal spatial distributions of the metastases are formed, similar to the ones observed in the PET/CT images. In conclusion, we propose that fractal and multifractal analysis has potential application in the quantitative evaluation of PET/CT images to monitor the evolution of the disease as well as the response to different medical treatments. Comment: 38 pages, 9 figures
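
A fractal dimension of the kind discussed above is commonly estimated by box counting: cover the set with boxes of side s, count the occupied boxes N(s), and take the slope of log N(s) against log(1/s). The sketch below applies this to 2D integer point sets. It is a generic illustration under that assumption, not the authors' pipeline for PET/CT volumes, and the function names are hypothetical.

```python
import math


def box_count(points, size):
    """Number of size x size grid boxes containing at least one point."""
    return len({(x // size, y // size) for x, y in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(s) versus log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(sizes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


sizes = [1, 2, 4, 8, 16]
filled_square = [(x, y) for x in range(64) for y in range(64)]
diagonal_line = [(i, i) for i in range(64)]
print(box_counting_dimension(filled_square, sizes))  # close to 2
print(box_counting_dimension(diagonal_line, sizes))  # close to 1
```

A homogeneously filled region gives a dimension near that of the embedding space, while a set concentrated on a thin structure gives a lower value, which is the qualitative effect the abstract attributes to tracer accumulating in lesions.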

    Subspace Representations and Learning for Visual Recognition

    Pervasive and affordable sensor and storage technology enables the acquisition of an ever-rising amount of visual data. The ability to extract semantic information by interpreting, indexing and searching visual data is impacting domains such as surveillance, robotics, intelligence, human-computer interaction, navigation, healthcare, and several others. This further stimulates the investigation of automated extraction techniques that are more efficient and more robust against the many sources of noise affecting the already complex visual data that carries the semantic information of interest. We address the problem by designing novel visual data representations, based on learning data subspace decompositions that are invariant to noise while being informative for the task at hand. We use this guiding principle to tackle several visual recognition problems, including detection and recognition of human interactions from surveillance video, face recognition in unconstrained environments, and domain generalization for object recognition. By interpreting visual data with a simple additive noise model, we consider the subspaces spanned by the model portion (model subspace) and the noise portion (variation subspace). We observe that decomposing the variation subspace against the model subspace gives rise to the so-called parity subspace. Decomposing the model subspace against the variation subspace instead gives rise to what we name the invariant subspace. We extend the use of kernel techniques to the parity subspace. This enables modeling the highly non-linear temporal trajectories that describe human behavior, and performing detection and recognition of human interactions. In addition, we introduce supervised low-rank matrix decomposition techniques for learning the invariant subspace for two other tasks.
We learn invariant representations for face recognition from grossly corrupted images, and we learn object recognition classifiers that are invariant to the so-called domain bias. Extensive experiments using the publicly available benchmark datasets for each of the three tasks show that learning representations based on subspace decompositions invariant to the sources of noise leads to results comparable to or better than the state of the art.

    K-Means Clustering Using Principal Component Analysis (PCA) Indonesia Multi-Finance Industry Performance Before and During Covid-19

    Cluster analysis within a specific industry, such as the multi-finance industry, is designed to be a tool for accelerating investment decisions, such as whether to buy, sell, or hold stocks, in order to construct an optimized portfolio. The purpose of the study was to apply cluster analysis to multi-finance stock data listed on the Indonesia Stock Exchange (ISE) in the years 2019 and 2021, before and during Covid-19, using the K-means algorithm on principal components obtained by PCA (Principal Component Analysis). The objective of this study is to classify stocks based on principal components in order to assist investors in segmenting multi-finance stocks into clusters. The clustering is done on the 16 stocks listed on the ISE using two time windows: 2019 data, when Covid-19 had not yet occurred, and 2021 data, when Covid-19 was still ongoing and firms were still in the recovery stage. The cluster analysis results show 12 companies worth investing in because they performed well. A further finding is that there are companies with unfavorable Covid-19 externalities: their cluster shows worsening performance and is thus not advised as a stock investment. Meanwhile, the other companies show neutral externalities because they remain in the same cluster in 2019 and 2021.
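
The PCA-then-K-means pipeline named in the abstract can be sketched as follows. This is a minimal illustration assuming NumPy, with PCA done through an SVD of the centered data and a plain Lloyd's-algorithm K-means. The synthetic 16 × 4 matrix stands in for per-stock performance indicators and is not the study's data; the function names and parameters are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # rows of Vt are principal axes


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in: 16 "stocks" with 4 indicators, in two separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (8, 4)), rng.normal(5.0, 0.1, (8, 4))])
Z = pca_project(X, n_components=2)
labels, _ = kmeans(Z, k=2)
print(labels)
```

Running the same procedure on each time window and comparing a stock's cluster across windows mirrors the before/during comparison the abstract describes. In practice the indicators would usually be standardized before PCA, which this sketch omits.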

    Mass spectral imaging of clinical samples using deep learning

    A better interpretation of tumour heterogeneity and variability is vital for the improvement of novel diagnostic techniques and personalized cancer treatments. Tumour tissue heterogeneity is characterized by biochemical heterogeneity, which can be investigated by unsupervised metabolomics. Mass Spectrometry Imaging (MSI) combined with machine learning techniques has generated increasing interest as an analytical and diagnostic tool for the analysis of spatial molecular patterns in tissue samples. Considering the high complexity of the data produced by MSI, which can consist of many thousands of spectral peaks, statistical analysis, and in particular machine learning and deep learning, have been investigated as novel approaches to deduce the relationships between the measured molecular patterns and the local structural and biological properties of the tissues. Machine learning has historically been divided into two main categories: supervised and unsupervised learning. In MSI, supervised learning methods may be used to segment tissues into histologically relevant areas, e.g. the classification of tissue regions in H&E (Haematoxylin and Eosin) stained samples. Initial classification by an expert histopathologist through visual inspection enables the development of univariate or multivariate models based on tissue regions that have significantly up- or down-regulated ions. However, complex data may result in underdetermined models, and alternative methods that can cope with high-dimensional and noisy data are required. Here, we describe, apply, and test a novel diagnostic procedure built using a combination of MSI and deep learning, with the objective of delineating and identifying biochemical differences between cancerous and non-cancerous tissue in metastatic liver cancer and epithelial ovarian cancer.
The workflow investigates the robustness of single (1D) to multidimensional (3D) tumour analyses and also highlights possible biomarkers which are not accessible from classical visual analysis of the H&E images. The identification of key molecular markers may provide a deeper understanding of tumour heterogeneity and potential targets for intervention. Open Access

    Object detection, recognition and classification using computer vision and artificial intelligence approaches

    Object detection and recognition have been used extensively in recent years to solve numerous challenges in different fields. Owing to the vital roles they play, object detection and recognition have enabled quantum leaps in many industries by helping to overcome serious challenges and obstacles. For example, worldwide security concerns have drawn attention to, and stimulated the use of, highly intelligent computer vision technology to provide security in different environments and diverse terrains. In addition, some wildlife is at present exposed to danger and extinction worldwide. Therefore, early detection and recognition of potential threats to wildlife have become essential and timely. The use of computer vision and artificial intelligence to turn a seemingly insecure world into a more secure one has been widely accepted. Such technologies are used for monitoring, tracking, organising and analysing objects in a scene, and for countless other purposes. [Continues.]

    Non-parametric Methods for Correlation Analysis in Multivariate Data with Applications in Data Mining

    In this thesis, we develop novel methods for correlation analysis in multivariate data, with a special focus on mining correlated subspaces. Our methods handle major open challenges that arise when combining correlation analysis with subspace mining. Besides traditional correlation analysis, we explore interaction-preserving discretization of multivariate data and causality analysis. We conduct experiments on a variety of real-world data sets. The results validate the benefits of our methods.