4,664 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those already associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
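    The abstract lists missing data and data heterogeneity among the challenges of integration but gives no method. As a minimal sketch of the simplest baseline, often called "early integration" (an illustration, not an approach taken from the review), each omics block below is mean-imputed and z-scored per feature so that sources on very different scales become comparable, and the blocks are then concatenated sample by sample. The function names, the use of `None` for missing values, and the toy data are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List, Optional

Block = List[List[Optional[float]]]  # rows = samples, columns = features; None = missing


def impute_and_standardize(block: Block) -> List[List[float]]:
    """Mean-impute each feature, then z-score it so blocks share a common scale."""
    n_cols = len(block[0])
    out = [[0.0] * n_cols for _ in block]
    for j in range(n_cols):
        observed = [row[j] for row in block if row[j] is not None]
        mu = mean(observed) if observed else 0.0
        filled = [mu if row[j] is None else row[j] for row in block]
        sd = pstdev(filled) or 1.0  # constant feature: avoid division by zero
        for i, value in enumerate(filled):
            out[i][j] = (value - mu) / sd
    return out


def early_integration(blocks: List[Block]) -> List[List[float]]:
    """Concatenate preprocessed blocks sample by sample (same sample order assumed)."""
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all blocks must describe the same samples")
    processed = [impute_and_standardize(b) for b in blocks]
    return [[v for p in processed for v in p[i]] for i in range(n_samples)]


# Toy example (hypothetical data): two sources measured on very different scales.
genome = [[0.0, 1.0], [1.0, None], [2.0, 3.0]]
proteome = [[100.0], [200.0], [300.0]]
X = early_integration([genome, proteome])
print(X)
```

    Note that concatenation adds up the feature counts of all blocks, so this baseline makes the curse of dimensionality, also listed in the abstract, worse rather than better.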

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is exploring these large volumes of data and extracting useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.
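    The abstract does not enumerate the data quality issues it reviews. As a minimal sketch, assuming three commonly checked problems (missing values, exact duplicate records, and values outside a plausible range), a simple profiling pass could look like the code below. The field names, valid ranges, and records are hypothetical and are not taken from the coffee rust case study.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # None marks a missing value


def quality_report(records: List[Record],
                   valid_ranges: Dict[str, Tuple[float, float]]) -> dict:
    """Count missing values, exact duplicates, and out-of-range values."""
    fields = sorted({field for r in records for field in r})
    # A field absent from a record is counted as missing, like an explicit None.
    missing = {f: sum(1 for r in records if r.get(f) is None) for f in fields}
    # Two records are duplicates if they hold identical field/value pairs.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(c - 1 for c in counts.values())
    out_of_range = {
        f: sum(1 for r in records
               if r.get(f) is not None and not (lo <= r[f] <= hi))
        for f, (lo, hi) in valid_ranges.items()
    }
    return {"n_records": len(records), "missing": missing,
            "duplicates": duplicates, "out_of_range": out_of_range}


# Hypothetical field-monitoring records (illustrative values only).
records = [
    {"temperature": 22.5, "humidity": 85.0, "rust_incidence": 12.0},
    {"temperature": 22.5, "humidity": 85.0, "rust_incidence": 12.0},  # duplicate
    {"temperature": None, "humidity": 90.0, "rust_incidence": 30.0},  # missing value
    {"temperature": 21.0, "humidity": 140.0, "rust_incidence": 8.0},  # humidity > 100 %
]
ranges = {"humidity": (0.0, 100.0), "rust_incidence": (0.0, 100.0)}
report = quality_report(records, ranges)
print(report)
```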

    Imbalanced data classification using support vector machine based on simulated annealing for enhancing penalty parameter

    For pattern classification and regression problems, the support vector machine (SVM) is a prominent and computationally powerful machine learning method. It has been applied effectively to many practical problems across a wide range of domains. A key SVM parameter is the penalty factor C. Its choice has a substantial impact on the classification accuracy of an SVM, since unsuitable parameter settings can lead to poor classification results. The penalty factor C controls the trade-off between classification errors and generalisation performance. Hence, building an SVM model with appropriate performance requires parameter optimisation. We employ the simulated annealing (SA) algorithm to formulate a hybrid method for optimising SVM parameters, with the aim of obtaining the optimal penalty parameter while balancing classification performance. Our experiments on several UCI datasets indicate that the proposed technique can achieve improved classification accuracy.
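    The abstract states that simulated annealing is used to tune C but does not specify the neighbourhood move, cooling schedule, objective, or SVM solver. The sketch below is one possible reading under explicit assumptions, not the authors' method: a Pegasos-style subgradient solver stands in for a full SVM library, SA searches over log10(C) with a clipped Gaussian neighbour move and geometric cooling, and the objective is validation balanced accuracy, chosen here because the title concerns imbalanced data. All function names, the synthetic data, and the parameter values (`t0`, `cooling`, `steps`, `step_size`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int], C: float,
                     iters: int = 500, seed: int = 0) -> Vector:
    """Pegasos-style stochastic subgradient solver for a soft-margin linear SVM.

    Pegasos minimises (lam/2)*||w||^2 + mean hinge loss, which matches the
    C-SVM objective when lam = 1 / (n * C). A constant 1.0 is appended to each
    input so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    n = len(X)
    lam = 1.0 / (n * C)
    w = [0.0] * (len(X[0]) + 1)
    for t in range(1, iters + 1):
        i = rng.randrange(n)
        xi = list(X[i]) + [1.0]
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrinkage
        if margin < 1.0:  # hinge loss active: step towards y_i * x_i
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Vector, X: Sequence[Vector]) -> List[int]:
    return [1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1
            for x in X]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall; not dominated by the majority class."""
    recalls = []
    for cls in (-1, 1):
        idx = [i for i, yt in enumerate(y_true) if yt == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def anneal_penalty(score: Callable[[float], float],
                   log10_bounds: Tuple[float, float] = (-3.0, 3.0),
                   t0: float = 0.1, cooling: float = 0.95, steps: int = 60,
                   step_size: float = 0.5, seed: int = 0) -> Tuple[float, float]:
    """Simulated annealing over log10(C) that maximises score(C)."""
    rng = random.Random(seed)
    lo, hi = log10_bounds
    x = rng.uniform(lo, hi)
    fx = score(10 ** x)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(steps):
        cand = min(hi, max(lo, x + rng.gauss(0.0, step_size)))  # clipped neighbour
        fc = score(10 ** cand)
        # Metropolis rule: always accept improvements; accept a worse move with
        # probability exp(delta / temp), which shrinks as the temperature cools.
        if fc >= fx or rng.random() < math.exp((fc - fx) / temp):
            x, fx = cand, fc
            if fx > best_f:
                best_x, best_f = x, fx
        temp *= cooling
    return 10 ** best_x, best_f


# Synthetic imbalanced data: 20 minority (+1) vs 180 majority (-1) points.
rng = random.Random(42)
Xp = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(20)]
Xn = [[rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)] for _ in range(180)]
# Stratified 70/30 split so the validation set contains both classes.
X_tr, y_tr = Xp[:14] + Xn[:126], [1] * 14 + [-1] * 126
X_va, y_va = Xp[14:] + Xn[126:], [1] * 6 + [-1] * 54


def validation_score(C: float) -> float:
    w = train_linear_svm(X_tr, y_tr, C)
    return balanced_accuracy(y_va, predict(w, X_va))


best_C, best_score = anneal_penalty(validation_score)
print(f"best C = {best_C:.4g}, validation balanced accuracy = {best_score:.3f}")
```

    Searching over log10(C) rather than C itself is a common design choice because useful values of C span several orders of magnitude. In practice, an established SVM implementation and cross-validation would replace the toy solver and the single validation split used here.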