A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Author: Pandey Gaurav
Whalen Sean
Publication date: 19/09/2013

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Minin
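Two of the strategies named in this abstract, simple aggregation and ensemble selection, can be illustrated with a short sketch. This is not the authors' implementation: the function names, the accuracy criterion, the greedy selection-with-replacement loop, and the toy probabilities are all assumptions chosen for illustration.

```python
from statistics import mean


def accuracy(probs, labels):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    return mean(1.0 if (p >= 0.5) == bool(y) else 0.0 for p, y in zip(probs, labels))


def average(prob_lists):
    """Simple aggregation: per-example mean of the base classifiers' probabilities."""
    return [mean(column) for column in zip(*prob_lists)]


def ensemble_selection(base_probs, labels, rounds=3):
    """Greedy forward selection with replacement (hypothetical helper).

    In each round, add the base classifier whose inclusion gives the best
    validation accuracy of the averaged ensemble. Ties go to the first name
    in sorted order.
    """
    chosen = []
    for _ in range(rounds):
        best = max(
            sorted(base_probs),
            key=lambda name: accuracy(
                average([base_probs[n] for n in chosen + [name]]), labels
            ),
        )
        chosen.append(best)
    return chosen


# Toy validation-set probabilities from three hypothetical base classifiers.
labels = [1, 0, 1, 0]
base_probs = {
    "a": [0.9, 0.2, 0.4, 0.1],
    "b": [0.6, 0.7, 0.8, 0.3],
    "c": [0.2, 0.8, 0.3, 0.9],
}
print(ensemble_selection(base_probs, labels, rounds=3))
```

In this toy case the selection picks classifiers whose errors fall on different examples, which is one way to read the balance between diversity and performance that the abstract mentions.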

arXiv.org e-Print Archive

Crossref

In silico comparative genomics analysis of Plasmodium falciparum for the identification of putative essential genes and therapeutic candidates.

Author: Altschul
Andrews
Anishetty
Arama
Berman
Bernstein
Bhasin
Boeckmann
Chang
Chawley
Chen
Colovos
Corpet
David Charles Warhurst
Eisenberg
Finn
Franceschini
Ghosh
Irwin
Irwin
Kanehisa
Konc
Krogh
Kushwaha
Larsen
Laskowski
Ludin
Miller
Morris
Mrutyunjay Suar
Mulder
Pieper
Pierleoni
Pontius
Rajani Kanta Mahapatra
Sali
Subhashree Rout
Sun
Tam
von Mering
Wallner
Wiederstein
World Health Organization
Yang
Yeh
Zhexin
Publication venue: 'Elsevier BV'
Publication date: 05/12/2014

A sequence of computational methods was used to predict novel drug targets against the drug-resistant malaria parasite Plasmodium falciparum. Comparative genomics, orthologous protein analysis among the same and other malaria parasites, and a protein-protein interaction study provide new insights into determining the essential genes and novel therapeutic candidates. Among the predicted list of 21 essential proteins from unique pathways, 11 proteins were prioritized as anti-malarial drug targets. As a case study, we built homology models of two uncharacterized proteins using MODELLER v9.13 software from possible templates. Functional annotation of these proteins was done using the InterPro databases and the ProBiS server, by comparison of predicted binding site residues. The model has been subjected to an in silico docking study with screened potent lead compounds from the ZINC database by Dock Blaster software using AutoDock 4. Results from this study facilitate the selection of proteins and putative inhibitors for entry into drug design production pipelines.

Crossref

LSHTM Research Online

A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

Author: Khare Kshitij
Oh Sang-Yun
Rajaratnam Bala
Publication date: 14/08/2014

Sparse high dimensional graphical model selection is a topic of much interest in modern-day statistics. A popular approach is to apply l1-penalties to either (1) parametric likelihoods or (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods while retaining all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees, an important property that cannot be established using standard results when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence are demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights.
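The abstract's combination of an l1 penalty, quadratic forms, and coordinate-wise optimization can be illustrated on a generic problem. The sketch below is not the paper's objective or algorithm: it minimizes 0.5*x'Qx - b'x + lam*||x||_1 for a symmetric positive definite Q, where each coordinate update has the well-known closed-form soft-thresholding solution. The function names and the toy inputs are assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coordinate_descent(Q, b, lam, sweeps=100):
    """Minimise 0.5*x'Qx - b'x + lam*||x||_1 one coordinate at a time.

    Assumes Q is symmetric positive definite (so every Q[i][i] > 0).
    Holding the other coordinates fixed, the one-dimensional problem in
    x[i] is solved exactly by soft-thresholding.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Partial residual: linear term with the other coordinates removed.
            r = b[i] - sum(Q[i][j] * x[j] for j in range(n) if j != i)
            x[i] = soft_threshold(r, lam) / Q[i][i]
    return x


# Toy example: with an identity Q the solution is soft_threshold(b, lam).
print(coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0))
```

The penalty sets the second coordinate exactly to zero, which is the sparsity mechanism that l1-penalized graph selection relies on.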

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Genetic heterogeneity analysis using genetic algorithm and network science

Author: Chen Yuanzhu
Hu Ting
Sha Zhendong
Publication date: 11/08/2023

Through genome-wide association studies (GWAS), disease-susceptible genetic variables can be identified by comparing the genetic data of individuals with and without a specific disease. However, the discovery of these associations poses a significant challenge due to genetic heterogeneity and feature interactions. Genetic variables intertwined with these effects often exhibit lower effect sizes and can thus be difficult to detect using machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interaction. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiment showcases the effectiveness of the utilized GA-based feature selection method in identifying feature interactions through synthetic data analysis. Furthermore, we apply our novel approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.
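The abstract does not spell out how the network is built from the feature selection runs. One plausible reading, sketched below purely as an assumption, is a graph whose nodes are genetic variables and whose edge weights count how often two variables were selected together. The function name and the variable identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_selection_network(runs):
    """Count, for every pair of features, the number of runs selecting both.

    `runs` is a list of feature-name collections, one per independent
    feature selection run. Returns a Counter keyed by sorted (u, v) pairs.
    """
    edges = Counter()
    for selected in runs:
        for u, v in combinations(sorted(set(selected)), 2):
            edges[(u, v)] += 1
    return edges


# Toy example: three hypothetical runs over made-up variable names.
runs = [["snp1", "snp2", "snp3"], ["snp1", "snp2"], ["snp2", "snp4"]]
network = co_selection_network(runs)
print(network.most_common(1))
```

Subsets of variables could then be extracted from such a graph with a community detection step, which is left out here because the abstract gives no details on it.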

arXiv.org e-Print Archive

Distributed Quantile Regression Analysis and a Group Variable Selection Method

Author: Yu Liqun
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2018

This dissertation develops novel methodologies for distributed quantile regression analysis for big data by utilizing a distributed optimization algorithm called the alternating direction method of multipliers (ADMM). Specifically, we first rewrite the penalized quantile regression problem in a specific form that can be solved by the ADMM and propose numerical algorithms for solving the ADMM subproblems. This results in the distributed QR-ADMM algorithm. Then, to further reduce the computational time, we formulate the penalized quantile regression into another equivalent ADMM form in which all the subproblems have exact closed-form solutions and hence avoid iterative numerical methods. This results in the single-loop QPADM algorithm, which further improves on the computational efficiency of the QR-ADMM. Both QR-ADMM and QPADM enjoy flexible parallelization by enabling data splitting across both sample space and feature space, which makes them especially appealing for the case when both sample size n and feature dimension p are large. Besides the QR-ADMM and QPADM algorithms for penalized quantile regression, we also develop a group variable selection method by approximating the Bayesian information criterion. Unlike existing penalization methods for feature selection, our proposed gMIC algorithm is free of parameter tuning and hence enjoys greater computational efficiency. Although the current version of gMIC focuses on the generalized linear model, it can be naturally extended to quantile regression for feature selection. We provide theoretical analysis for our proposed methods. Specifically, we conduct numerical convergence analysis for the QR-ADMM and QPADM algorithms, and provide asymptotic theory and the oracle property of feature selection for the gMIC method. All our methods are evaluated with simulation studies and real data analysis.
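To make the idea of a closed-form ADMM subproblem concrete, the sketch below shows the quantile (check) loss and its proximal operator, a standard closed form for this loss. It is not the dissertation's QR-ADMM or QPADM algorithm: the function names and the parameter names `tau` and `alpha` are assumptions for illustration.

```python
def check_loss(u, tau):
    """Quantile (pinball) loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def prox_check_loss(v, tau, alpha):
    """Closed-form minimiser over r of check_loss(r, tau) + (alpha/2)*(r - v)**2.

    Assumes 0 < tau < 1 and alpha > 0. The result shifts v toward zero by
    tau/alpha on the positive side and by (1 - tau)/alpha on the negative
    side, and is exactly zero in between.
    """
    if v > tau / alpha:
        return v - tau / alpha
    if v < -(1.0 - tau) / alpha:
        return v + (1.0 - tau) / alpha
    return 0.0


# Toy values at the median (tau = 0.5) with unit penalty parameter.
for v in (2.0, 0.2, -2.0):
    print(v, prox_check_loss(v, tau=0.5, alpha=1.0))
```

Because this update needs no inner iterations, a residual step of this kind costs one pass over the data, which is the sort of saving the abstract attributes to closed-form subproblems.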

Washington University St. Louis: Open Scholarship

Using Pythagorean Fuzzy Sets (PFS) in Multiple Criteria Group Decision Making (MCGDM) Methods for Engineering Materials Selection Applications

Author: Momena Alaa Fouad
Publication venue: UWM Digital Commons
Publication date: 01/05/2019

The materials selection process is critical during the initial stages of designing manufactured products. Inefficient decision-making outcomes in the material selection process could result in poor quality of products and unnecessary costs. In the last century, numerous materials have been developed for manufacturing mechanical components in different industries. Many of these new materials are similar in their properties and performances, thus creating great challenges for designers and engineers to make accurate selections. Our main objective in this work is to assist decision makers (DMs) within the manufacturing field to evaluate material alternatives and to select the best alternative for specific manufacturing purposes. In this research, new hybrid fuzzy Multiple Criteria Group Decision Making (MCGDM) methods are proposed for the material selection problem. The proposed methods tackle some challenges that are associated with the material selection decision making process, such as aggregating DMs' decisions appropriately and modeling uncertainty. In the proposed hybrid models, a novel aggregation approach is developed to convert DMs' crisp decisions to Pythagorean fuzzy sets (PFS). This approach gives more flexibility to DMs to express their opinions than the traditional fuzzy and intuitionistic fuzzy sets (IFS). Then, the proposed aggregation approach is integrated with a ranking method to solve the Pythagorean Fuzzy Multi Criteria Decision Making (PFMCGDM) problem and rank the material alternatives. The ranking methods used in the hybrid models are the Pythagorean Fuzzy TOPSIS (The Technique for Order of Preference by Similarity to Ideal Solution) and Pythagorean Fuzzy COPRAS (COmplex PRoportional Assessment). TOPSIS and COPRAS are selected based on their effectiveness and practicality in dealing with the nature of material selection problems.
In the aggregation approach, the Sugeno Fuzzy measure and the Shapley value are used to fairly distribute the DMs' weights in the Pythagorean Fuzzy numbers. Additionally, new functions to calculate uncertainty from DMs' recommendations are developed using the Takagi-Sugeno approach. The literature reveals some work on these methods, but to our knowledge, there are no published works that integrate the proposed aggregation approach with the selected MCDM ranking methods under the Pythagorean Fuzzy environment for use in materials selection problems. Furthermore, the proposed methods might be applied, due to their novelty, to any MCDM problem in other areas. A practical validation of the proposed hybrid PFMCGDM methods is investigated through conducting a case study of material selection for high pressure turbine blades in jet engines. The main objectives of the case study were: 1) to investigate the newly developed aggregation approach in converting real DMs' crisp decisions into Pythagorean fuzzy numbers; 2) to test the applicability of both the hybrid PFMCGDM TOPSIS and the hybrid PFMCGDM COPRAS methods in the field of material selection. In this case study, a group of five DMs, faculty members and graduate students, from the Materials Science and Engineering Department at the University of Wisconsin-Milwaukee, were selected to participate as DMs. Their evaluations fulfilled the first objective of the case study. A computer application for material selection was developed to assist designers and engineers in real life problems. A comparative analysis was performed to compare the results of both hybrid MCGDM methods. A sensitivity analysis was conducted to show the robustness and reliability of the outcomes obtained from both methods. It is concluded that using the proposed hybrid PFMCGDM TOPSIS method is more effective and practical in the material selection process than the proposed hybrid PFMCGDM COPRAS method.
Additionally, recommendations for further research are suggested.
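The extra flexibility of Pythagorean fuzzy sets over intuitionistic ones comes from the constraint on membership and non-membership degrees: mu^2 + nu^2 <= 1 rather than mu + nu <= 1. The sketch below encodes that constraint together with the hesitancy degree and a commonly used score function, mu^2 - nu^2. It does not reproduce the thesis's aggregation or ranking methods: the class name, the tolerance, and the example values are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PythagoreanFuzzyNumber:
    """Membership mu and non-membership nu with mu**2 + nu**2 <= 1."""

    mu: float
    nu: float

    def __post_init__(self):
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        # Small tolerance (an assumption) to absorb floating-point rounding.
        if self.mu**2 + self.nu**2 > 1.0 + 1e-12:
            raise ValueError("mu**2 + nu**2 must not exceed 1")

    @property
    def hesitancy(self):
        """Degree of indeterminacy: sqrt(1 - mu**2 - nu**2)."""
        return math.sqrt(max(0.0, 1.0 - self.mu**2 - self.nu**2))

    def score(self):
        """A commonly used score function for ranking: mu**2 - nu**2."""
        return self.mu**2 - self.nu**2


# mu + nu = 1.2 would violate the intuitionistic constraint,
# but mu**2 + nu**2 = 0.9 satisfies the Pythagorean one.
p = PythagoreanFuzzyNumber(0.9, 0.3)
print(p.hesitancy, p.score())
```

A decision maker can therefore state strong support and moderate opposition at the same time, which an intuitionistic fuzzy number could not represent.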

University of Wisconsin-Milwaukee