70 research outputs found

    Advances in transfer learning methods based on computational intelligence

    Traditional machine learning and data mining have made tremendous progress in many knowledge-based areas, such as clustering, classification, and regression. However, the primary assumption in all of these areas is that the training and testing data come from the same domain and follow the same distribution. This assumption is difficult to satisfy in real-world applications because labeled data are often scarce. Related data from other domains can be used to expand the prior knowledge available about future target data. In recent years, transfer learning has been used to address such cross-domain learning problems by extracting information from a related domain and transferring that knowledge to the target task. This work applies the transfer learning methodology with both unsupervised and supervised learning methods. For unsupervised learning, a novel transfer-learning possibilistic c-means (TLPCM) algorithm is proposed to handle the PCM clustering problem in a domain that has insufficient data. Moreover, TLPCM overcomes the problem of differing numbers of clusters between the source and target domains. The proposed algorithm uses the historical cluster centers of the source data as a reference to guide the clustering of the target data. The experimental studies presented here were evaluated thoroughly and demonstrate the advantages of TLPCM on both synthetic and real-world transfer datasets. For supervised learning, a transfer learning (TL) technique is used to pre-train a CNN model on posture data and then fine-tune it on sleep stage data. We used a ballistocardiography (BCG) bed sensor to collect both posture and sleep stage data, providing a non-invasive, in-home monitoring system that tracks changes in the subjects' health over time. The quality of sleep has a significant impact on health and quality of life.
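The idea of using source cluster centers to guide target clustering can be illustrated with a small possibilistic c-means variant. This is a minimal sketch under stated assumptions, not the published TLPCM: it assumes the same number of clusters in both domains (TLPCM itself handles differing numbers, which this sketch does not), a user-supplied scale `eta`, and a hypothetical objective that adds a penalty `lam * ||v_i - s_i||^2` pulling each target center `v_i` toward its source center `s_i`. The names `transfer_pcm`, `source_centers`, and `lam` are illustrative. Setting the gradient of that objective to zero gives the center update used in the loop.

```python
import numpy as np

def transfer_pcm(X, source_centers, eta, m=2.0, lam=1.0, n_iter=100, tol=1e-6):
    """Possibilistic c-means on target data X with centers pulled toward source centers.

    Hypothetical objective (an assumption, not the published TLPCM objective):
        J = sum_ij t_ij^m ||x_j - v_i||^2 + sum_i eta_i sum_j (1 - t_ij)^m
            + lam * sum_i ||v_i - s_i||^2
    Assumes the source and target domains have the same number of clusters.
    Returns (centers, typicalities from the last iteration).
    """
    X = np.asarray(X, dtype=float)
    S = np.asarray(source_centers, dtype=float)
    eta = np.broadcast_to(np.asarray(eta, dtype=float), (S.shape[0],))
    V = S.copy()  # start the target centers at the source centers
    T = None
    for _ in range(n_iter):
        # squared distances between every center and every point, shape (c, n)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        # standard PCM typicality update (the transfer term does not involve T)
        T = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
        W = T ** m
        # closed-form minimiser of the weighted SSE plus the lam * ||v - s||^2 penalty
        V_new = (W @ X + lam * S) / (W.sum(axis=1, keepdims=True) + lam)
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V, T
```

Larger `lam` trusts the historical source centers more, which is the behavior one would want when the target domain has very little data.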
This study adopts hierarchical and non-hierarchical classification structures to develop an automatic sleep stage classification system using ballistocardiogram (BCG) signals. A leave-one-subject-out cross-validation (LOSO-CV) procedure is used to test classification performance in most of the experiments. Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Deep Neural Networks (DNNs) are complementary in their modeling capabilities: CNNs have the advantage of reducing frequency variations, whereas LSTMs are good at temporal modeling. Polysomnography (PSG) data from a sleep lab were used as the ground truth for sleep stages, with an emphasis on three stages: awake, rapid eye movement (REM), and non-REM (NREM) sleep. Moreover, a transfer learning approach is employed with supervised learning to address the cross-resident training problem in early illness prediction. We validate our method with a retrospective study of three residents of TigerPlace, a retirement community in Columbia, MO, where apartments are fitted with wireless networks of motion and bed sensors. Predicting early signs of illness in older adults with a continuous, unobtrusive nursing home monitoring system has been shown to increase quality of life and decrease care costs. Illness prediction is based on sensor data and uses algorithms such as support vector machines (SVM) and k-nearest neighbors (kNN). One of the most significant challenges in developing prediction algorithms for sensor networks is using knowledge from previous residents to predict the behavior of new ones. The presence or absence of illness on each day was manually evaluated using nursing visit reports from a homegrown electronic medical record (EMR) system. In this work, the transfer learning SVM approach outperformed three other methods: regular SVM, one-class SVM, and one-class kNN. Includes bibliographical references (pages 114-127).
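Leave-one-subject-out cross-validation holds out every record of one subject per fold, so a model is never tested on a person it was trained on. A minimal sketch in plain Python follows; the function name and the `subject_ids` input (one subject label per sample) are illustrative assumptions. scikit-learn's `LeaveOneGroupOut` splitter provides the same kind of grouping-based split.

```python
from collections import defaultdict

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids: sequence giving the subject label of each sample (hypothetical input).
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]  # every sample of the held-out subject
        train_idx = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train_idx, test_idx
```

For example, with labels `["s1", "s1", "s2", "s3", "s3"]` the first fold trains on indices `[2, 3, 4]` and tests on `[0, 1]`.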

    Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition

    Multi-stage visual architectures have recently achieved high classification accuracies on image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures typically consist of one or more layers of preprocessing, feature encoding, and pooling that extract features from raw images. Training these components traditionally relies on large sets of patches extracted from a potentially large image dataset. In this context, high-dimensional feature representations often help to obtain the best classification performance and provide a higher degree of invariance to object transformations. However, large datasets with high-dimensional features complicate the implementation of visual architectures in memory-constrained environments. This dissertation constructs online learning replacements for the components of a multi-stage architecture and demonstrates that the proposed replacements (namely fuzzy competitive clustering, an incremental covariance estimator, and a multi-layer neural network) can offer performance competitive with their offline batch counterparts while reducing the memory footprint. The online nature of this solution enables a method for adjusting parameters within the architecture via stochastic gradient descent. Testing over multiple datasets shows the potential benefits of this methodology when appropriate priors on the initial parameters are unknown. The dissertation also explores alternatives to batch-based decompositions for a whitening preprocessing stage; these alternatives take advantage of natural image statistics and allow simple dictionary learners to work well in the problem domain. Expansions of the architecture using additional pooling statistics and multiple layers are presented and indicate that larger codebook sizes are not the only path to higher classification accuracies.
Experimental results from these expansions further indicate the important role of sparsity and appropriate encodings within multi-stage visual feature extraction architectures.
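To make the memory argument concrete, here is a sketch of one common incremental covariance estimator, a Welford-style rank-one update that keeps only a running mean and a d-by-d matrix no matter how many patches have been seen. The dissertation's own estimator may differ; the class name `IncrementalCovariance` is illustrative. For comparison, the hypothetical helper `zca_matrix` builds the conventional ZCA whitening matrix by batch eigendecomposition, the kind of batch decomposition the dissertation looks for alternatives to.

```python
import numpy as np

class IncrementalCovariance:
    """Running mean and covariance, updated one sample at a time (Welford-style).

    Memory is O(d^2) for d-dimensional samples, independent of the sample count.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self._m2 = np.zeros((dim, dim))  # running sum of deviation outer products

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta_old = x - self.mean        # deviation from the previous mean
        self.mean += delta_old / self.n
        delta_new = x - self.mean        # deviation from the updated mean
        self._m2 += np.outer(delta_old, delta_new)

    @property
    def covariance(self):
        """Unbiased sample covariance (divides by n - 1)."""
        if self.n < 2:
            return np.zeros_like(self._m2)
        return self._m2 / (self.n - 1)


def zca_matrix(cov, eps=1e-5):
    """Batch ZCA whitening matrix W = U diag(1 / sqrt(lambda + eps)) U^T."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
```

Whitened patches are then `(x - est.mean) @ zca_matrix(est.covariance)`; `eps` (an assumed regularizer) keeps near-zero eigenvalues from blowing up.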

    A Fast Clustering Algorithm based on pruning unnecessary distance computations in DBSCAN for High-Dimensional Data

    Clustering is an important technique for dealing with the large-scale data that are being created explosively on the Internet. Most of these data are high-dimensional and noisy, which poses great challenges for retrieval, classification, and understanding. No existing approach is “optimal” for large-scale data. For example, DBSCAN requires O(n^2) time, Fast-DBSCAN works well only in two dimensions, and ρ-Approximate DBSCAN runs in O(n) expected time only when the dimension D is a relatively small constant. However, we prove theoretically and experimentally that ρ-Approximate DBSCAN degenerates to an O(n^2) algorithm in very high dimensions, where 2^D >> n. In this paper, we propose a novel local neighborhood searching technique and apply it to improve DBSCAN; the resulting algorithm, named NQ-DBSCAN, effectively eliminates a large number of unnecessary distance computations. Theoretical analysis and experimental results show that NQ-DBSCAN runs in O(n log n) on average with the help of an indexing technique, and in O(n) in the best case if proper parameters are used, which makes it suitable for many real-time data applications.
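The general idea of pruning distance computations in DBSCAN can be sketched as follows. This is not NQ-DBSCAN's local neighborhood search; it is a generic triangle-inequality filter, assumed here for illustration: for any reference point r, |d(p, r) - d(q, r)| > eps implies d(p, q) > eps, so a sorted list of distances to r limits each neighborhood query to a narrow window of candidates. Using the data mean as the single reference point is an arbitrary choice, and the function name is hypothetical. In very high dimensions distances to one reference point tend to concentrate, so this kind of filter prunes less, which is consistent with the abstract's point that low-dimensional shortcuts degrade.

```python
import numpy as np

def dbscan_ref_pruned(X, eps, min_pts):
    """DBSCAN whose neighborhood queries skip points excluded by the triangle inequality.

    Returns (labels, n_distance_computations); label -1 marks noise.
    min_pts counts the query point itself.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    ref = X.mean(axis=0)                     # hypothetical choice of reference point
    ref_d = np.linalg.norm(X - ref, axis=1)
    order = np.argsort(ref_d)
    sorted_d = ref_d[order]
    n_dist = 0

    def neighbours(i):
        nonlocal n_dist
        # only points whose reference distance is within eps of ours can be neighbors
        lo = np.searchsorted(sorted_d, ref_d[i] - eps, side="left")
        hi = np.searchsorted(sorted_d, ref_d[i] + eps, side="right")
        cand = order[lo:hi]
        n_dist += len(cand)
        d = np.linalg.norm(X[cand] - X[i], axis=1)
        return cand[d <= eps]

    labels = np.full(n, -2)                  # -2 = unvisited, -1 = noise
    cluster = 0
    for i in range(n):
        if labels[i] != -2:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # former noise point becomes a border point
            if labels[j] != -2:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:         # only core points expand the cluster
                queue.extend(nb_j)
        cluster += 1
    return labels, n_dist
```

Comparing `n_distance_computations` against n^2 on a given dataset shows how much the filter actually saves.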

    Cluster Prototypes and Fuzzy Memberships Jointly Leveraged Cross-Domain Maximum Entropy Clustering

    No full text

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels of the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis (SFA). This methodology makes it possible to distinguish measurement error from systematic inefficiency in the estimation process, enabling an investigation of the main causes of inefficiency. Suggestions for improving efficiency are made for each hotel studied.
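For reference, a common cross-sectional production-frontier specification used in SFA is shown below. The abstract does not state the paper's functional form, whether it models a production or a cost frontier, or its distributional assumption for the inefficiency term, so the half-normal choice here is an assumption. The symmetric noise v_i absorbs measurement error, while the one-sided term u_i >= 0 captures systematic inefficiency; this split is what allows SFA to separate the two.

```latex
\ln y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i - u_i,
\qquad v_i \sim \mathcal{N}(0, \sigma_v^2),
\qquad u_i \sim \mathcal{N}^{+}(0, \sigma_u^2),
\qquad \mathrm{TE}_i = \exp(-u_i) \in (0, 1]
```

Here y_i is hotel i's output, x_i its (log) inputs, and TE_i its technical efficiency; ranking the hotels by estimated TE_i gives an efficiency ranking.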

    Applications of Molecular Dynamics simulations for biomolecular systems and improvements to density-based clustering in the analysis

    Molecular Dynamics simulations provide a powerful tool for studying biomolecular systems in atomistic detail. The key to better understanding the function and behaviour of these molecules can often be found in their structural variability. Simulations can help expose this information, which is otherwise hard or impossible to obtain experimentally. This work covers two application examples in which sampling and characterising the conformational ensemble could reveal the structural basis needed to answer a topical research question. For the fungal toxin phalloidin, a small bicyclic peptide, the product ratios observed in different cyclisation reactions could be rationalised by assessing the conformational pre-organisation of precursor fragments. For the C-type lectin receptor langerin, conformational changes induced by different side-chain protonations could explain the pH dependency of the protein's calcium binding. The investigations were accompanied by the continued development of a density-based clustering protocol into a dedicated software package, which is generally well suited to extracting conformational states from Molecular Dynamics data.
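The abstract does not specify the clustering criterion. As an illustration only, the sketch below uses a common-nearest-neighbour density criterion, one family of density-based clustering that has been applied to simulation data. It assumes each row of `X` is a per-frame feature vector (for example, aligned coordinates) and that Euclidean distance is meaningful for it. The function name and the parameters `r`, `c`, and `min_size` are hypothetical.

```python
import numpy as np
from collections import deque

def common_nn_clustering(X, r, c, min_size=2):
    """Density-based clustering with a common-nearest-neighbour criterion (sketch).

    Two points are linked when they lie within distance r of each other AND share
    at least c common neighbours within r. Clusters are connected components of
    this link graph; components smaller than min_size get label -1 (noise).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # O(n^2) memory
    adj = dist <= r
    np.fill_diagonal(adj, False)             # a point is not its own neighbour
    a = adj.astype(int)
    common = a @ a                           # common[i, j] = number of shared neighbours
    linked = adj & (common >= c)

    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    next_label = 0
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        members = [start]
        queue = deque([start])
        while queue:                         # breadth-first search over links
            i = queue.popleft()
            for j in np.flatnonzero(linked[i]):
                if not visited[j]:
                    visited[j] = True
                    members.append(j)
                    queue.append(j)
        if len(members) >= min_size:
            labels[members] = next_label
            next_label += 1
    return labels
```

Requiring shared neighbours, rather than just proximity, makes it harder for a thin bridge of sparsely sampled frames to merge two conformational states.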

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of statistics in data science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These involve different fields of statistics, machine learning, information and computer science, optimization, and pattern recognition, which together make a considerable contribution to the analysis of ‘big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of statistics on high-dimensional data quality validation, sampling extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data.

    The 8th International Conference on Time Series and Forecasting

    The aim of ITISE 2022 is to create a friendly environment that could lead to the establishment or strengthening of scientific collaborations and exchanges among attendees. Therefore, ITISE 2022 is soliciting high-quality original research papers (including significant works in progress) on any aspect of time series analysis and forecasting, in order to motivate the generation and use of new knowledge, computational techniques, and methods for forecasting in a wide range of fields.

    Image Based Biomarkers from Magnetic Resonance Modalities: Blending Multiple Modalities, Dimensions and Scales.

    The successful analysis and processing of medical imaging data is multidisciplinary work that requires applying and combining knowledge from diverse fields, such as medical engineering, medicine, computer science, and pattern classification. Imaging biomarkers are biological features detectable by imaging modalities, and their use offers the prospect of more efficient clinical studies and improvements in both diagnosis and therapy assessment. The use of Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) in diagnosis and therapy has been extensively validated; nevertheless, the question of how to process the data appropriately, or optimally, to extract relevant biomarkers that highlight the differences between heterogeneous tissues remains open. Together with DCE-MRI, the data extracted from Diffusion MRI (DWI-MR and DTI-MR) represent a promising and complementary tool. This project initially explores diverse techniques and methodologies for tissue characterization, based on the analysis and classification of voxel-level time-intensity curves from DCE-MRI data, mainly through dissimilarity-based representations and models. We will explore metrics and representations to correlate the multidimensional data acquired through diverse imaging modalities, starting with an appropriate elastic registration methodology between DCE-MRI and DWI-MR of the breast and its validation. It has been shown that combining multi-modal MRI images improves the discrimination of diseased tissue. However, fusing dissimilar imaging data for classification and segmentation is not a trivial task, because of inherent differences in information domains, dimensionality, and scales.
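A dissimilarity-based representation replaces each voxel's time-intensity curve with its vector of distances to a set of prototype curves, so that any ordinary vector-space classifier can then be applied. The sketch below assumes Euclidean distance and caller-chosen prototypes; the work itself explores a variety of metrics, and the function name is hypothetical.

```python
import numpy as np

def dissimilarity_space(curves, prototypes):
    """Represent each curve by its Euclidean distances to a set of prototype curves.

    curves: (n_voxels, n_timepoints); prototypes: (k, n_timepoints).
    Returns an (n_voxels, k) matrix usable by any vector-space classifier.
    """
    curves = np.asarray(curves, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    diff = curves[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diff, axis=2)
```

Swapping in a different metric (for example, one that is robust to time shifts in contrast uptake) changes only the distance computation, not the downstream classifier.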
This work also proposes a multi-view consensus clustering methodology for integrating multi-modal MR images into a unified segmentation of tumoral lesions for heterogeneity assessment. Using a variety of metrics and distance functions, this multi-view imaging approach calculates multiple vectorial dissimilarity spaces for each MRI modality and uses the concepts behind cluster ensembles to combine a set of base unsupervised segmentations into a unified partition of the voxel-based data. The methodology is specifically designed for combining DCE-MRI and DTI-MR, for which a manifold learning step is implemented to account for the geometric constraints of the high-dimensional diffusion information.
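One standard way to realize a cluster-ensemble step is evidence accumulation: count how often each pair of voxels falls in the same cluster across the base segmentations, then cut the resulting co-association matrix. The sketch below assumes a simple threshold cut (equivalent to a single-link cut at that level); the thesis's actual consensus function is not specified here, and the function names and `threshold` parameter are hypothetical. Because only co-membership is counted, the base segmentations do not need matching label numbers.

```python
import numpy as np

def co_association(partitions):
    """Fraction of base partitions in which each pair of voxels shares a cluster."""
    P = np.asarray(partitions)           # shape (n_partitions, n_voxels)
    n = P.shape[1]
    C = np.zeros((n, n))
    for labels in P:
        C += labels[:, None] == labels[None, :]
    return C / len(P)

def consensus_labels(partitions, threshold=0.5):
    """Link voxels co-clustered in more than `threshold` of partitions; label components."""
    C = co_association(partitions)
    n = C.shape[0]
    labels = np.full(n, -1)
    current = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        labels[s] = current
        stack = [s]
        while stack:                     # depth-first search over strong links
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For instance, base segmentations `[0, 0, 1, 1, 2]`, `[1, 1, 0, 0, 0]`, and `[0, 0, 1, 1, 1]` yield the consensus `[0, 0, 1, 1, 1]`: the last voxel joins the second group because two of the three segmentations agree.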