
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.

    Forecasting bus passenger flows by using a clustering-based support vector regression approach

    As a significant component of the intelligent transportation system, forecasting bus passenger flows plays a key role in resource allocation, network planning, and frequency setting. However, it remains challenging to capture the high fluctuations, nonlinearity, and periodicity of bus passenger flows, which arise from varied destinations and departure times. For this reason, a novel forecasting model named affinity propagation-based support vector regression (AP-SVR) is proposed, based on clustering and nonlinear simulation. In this approach, a clustering algorithm is first used to generate clustering-based intervals. A support vector regression (SVR) model is then used to forecast the passenger flow for each cluster, with particle swarm optimization (PSO) used to obtain the optimized parameters. Finally, the SVR predictions are rearranged into chronological order. The proposed model is tested using real bus passenger data from a bus line over four months. Experimental results demonstrate that the proposed model performs better than other peer models in terms of absolute percentage error and mean absolute percentage error. The results suggest that a deterministic clustering technique with stable cluster results (AP) can improve forecasting performance significantly.
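
The last two steps of the abstract above (putting per-cluster predictions back into chronological order, then scoring them with absolute percentage error and mean absolute percentage error) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(time_index, forecast)` data layout, and the toy numbers are all assumptions made for illustration, and the clustering, SVR, and PSO stages are not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each cluster id maps to (time_index, forecast) pairs.
ClusterForecasts = Dict[int, List[Tuple[int, float]]]


def merge_chronologically(cluster_forecasts: ClusterForecasts) -> List[float]:
    """Flatten per-cluster forecasts and sort them by time index."""
    pairs = [pair for items in cluster_forecasts.values() for pair in items]
    pairs.sort(key=lambda pair: pair[0])
    return [forecast for _, forecast in pairs]


def absolute_percentage_errors(actuals: List[float],
                               forecasts: List[float]) -> List[float]:
    """Return APE = 100 * |actual - forecast| / |actual| for each time step."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if any(a == 0 for a in actuals):
        raise ValueError("APE is undefined when an actual value is zero")
    return [100.0 * abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]


def mean_absolute_percentage_error(actuals: List[float],
                                   forecasts: List[float]) -> float:
    """Return MAPE, the mean of the per-step APE values."""
    errors = absolute_percentage_errors(actuals, forecasts)
    return sum(errors) / len(errors)


# Toy data (invented): four time steps split across two clusters.
toy_actuals = [100.0, 200.0, 400.0, 50.0]
toy_cluster_forecasts = {
    0: [(0, 110.0), (2, 380.0)],
    1: [(3, 55.0), (1, 190.0)],
}
toy_forecasts = merge_chronologically(toy_cluster_forecasts)
toy_mape = mean_absolute_percentage_error(toy_actuals, toy_forecasts)
```

Here `toy_forecasts` comes back in time order regardless of how the clusters were stored, so it can be compared element by element with the observed series. The sketch raises an error on zero actual values, since percentage error is undefined there; how the paper handles such cases is not stated in the abstract.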

    Predictive Liability Models and Visualizations of High Dimensional Retail Employee Data

    Employee theft and dishonesty are major contributors to loss in the retail industry. Retailers have reported the need for more automated analytic tools to assess the liability of their employees. In this work, we train and optimize several machine learning models for regression prediction and analysis on retail employee data, which will help retailers identify and manage risky employees. Since the data we use are very high dimensional, we use feature selection techniques to identify the factors that contribute most to an employee's assessed risk. We also use dimension reduction and data embedding techniques to present this dataset in an easy-to-interpret format.