9,250 research outputs found

Stable Feature Selection for Biomarker Discovery

Authors: He Zengyou; Yu Weichuan
Publication date: 01/01/2010

Feature selection techniques have been used as the workhorse in biomarker discovery applications for a long time. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.

arXiv.org e-Print Archive

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

Establishment of a integrative multi-omics expression database CKDdb in the context of chronic kidney disease (CKD)

Authors: Fernandes Marco; Husi Holger
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, CKD onset and progression are still not fully understood at the molecular level. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from available literature publications. We established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders by performing literature data mining and manual curation. The CKDdb database contains differential expression data from 49395 molecule entries (redundant), of which 16885 are unique molecules (non-redundant) from 377 manually curated studies of 230 publications. This database was intentionally built to allow disease pathway analysis through a systems approach in order to yield biological meaning by integrating all existing information and therefore has the potential to unravel and gain an in-depth understanding of the key molecular events that modulate CKD pathogenesis.

PubMed Central

Enlighten

Machine Learning and Integrative Analysis of Biomedical Big Data.

Authors: Choi Howard; Chung Neo Christopher; Mirza Bilal; Ping Peipei; Wang Jie; Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

EFSIS: Ensemble Feature Selection Integrating Stability

Authors: Jonassen Inge; Zhang Xiaokang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/11/2018

Ensemble learning, which combines the predictions from multiple learners, has been widely applied in pattern recognition and has been reported to be more robust and accurate than the individual learners. This ensemble logic has recently also been applied more widely in feature selection. There are basically two strategies for ensemble feature selection, namely data perturbation and function perturbation. Data perturbation performs feature selection on data subsets sampled from the original dataset and then selects the features consistently ranked highly across those data subsets. This has been found to improve both the stability of the selector and the prediction accuracy for a classifier. Function perturbation frees the user from having to decide on the most appropriate selector for any given situation and works by aggregating multiple selectors. This has been found to maintain or improve classification performance. Here we propose a framework, EFSIS, combining these two strategies. Empirical results indicate that EFSIS gives both high prediction accuracy and stability. Comment: 20 pages, 3 figures
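The data perturbation strategy described in this abstract can be illustrated with a minimal Python sketch. This is not the EFSIS implementation: the scorer (absolute difference of class means), the function names, the subset fraction and the rank-sum aggregation are all assumptions chosen only to show the general idea of ranking features on sampled subsets and keeping those ranked consistently high.

```python
import random
from statistics import mean


def score_features(rows, labels):
    """Score each feature by the absolute difference of its class means.

    Illustrative scorer only (an assumption); labels are expected to be 0 or 1.
    """
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for rank, j in enumerate(order):
        ranks[j] = rank
    return ranks


def ensemble_select(rows, labels, k, n_subsets=50, fraction=0.8, seed=0):
    """Select k features by aggregating ranks over random data subsets."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(rows)
    size = max(2, int(fraction * n))
    rank_sums = [0] * len(rows[0])
    used = 0
    while used < n_subsets:
        idx = rng.sample(range(n), size)  # subsample without replacement
        sub_labels = [labels[i] for i in idx]
        if len(set(sub_labels)) < 2:
            continue  # skip subsets that lost one of the classes
        sub_rows = [rows[i] for i in idx]
        for j, rank in enumerate(ranks_from_scores(score_features(sub_rows, sub_labels))):
            rank_sums[j] += rank
        used += 1
    # Features with the lowest summed rank were ranked highly most consistently.
    return sorted(range(len(rank_sums)), key=lambda j: rank_sums[j])[:k]
```

Calling `ensemble_select(rows, labels, k=10)` on a list of feature vectors with 0/1 labels returns the indices of the ten features with the best aggregated rank. Function perturbation would, under the same assumptions, repeat this with several different scorers and aggregate across them as well.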

arXiv.org e-Print Archive

University of Bergen

Crossref

NORA - Norwegian Open Research Archives

BcCluster: a bladder cancer database at the molecular level

Authors: Bhat Akshay; Jankowski Vera; Mischak Harald; Mokou Marika; Vlahou Antonia; Zoidakis Jerome
Publication venue: IOS Press
Publication date: 01/01/2016

Background: Bladder Cancer (BC) has two clearly distinct phenotypes. Non-muscle invasive BC has good prognosis and is treated with tumor resection and intravesical therapy, whereas muscle invasive BC has poor prognosis and usually requires systemic cisplatin-based chemotherapy either prior to or after radical cystectomy. Neoadjuvant chemotherapy is not often used for patients undergoing cystectomy. High-throughput analytical omics techniques are now available that allow the identification of individual molecular signatures to characterize the invasive phenotype. However, a large amount of data produced by omics experiments is not easily accessible since it is often scattered over many publications or stored in supplementary files. Objective: To develop a novel open-source database, BcCluster (http://www.bccluster.org/), dedicated to the comprehensive molecular characterization of muscle invasive bladder carcinoma. Materials: A database was created containing all reported molecular features significant in invasive BC. The query interface was developed in the Ruby programming language (version 1.9.3) using the web framework Rails (version 4.1.5) (http://rubyonrails.org/). Results: BcCluster contains the data from 112 published references, providing 1,559 statistically significant features relative to BC invasion. The database also holds 435 protein-protein interaction data and 92 molecular pathways significant in BC invasion. The database can be used to retrieve binding partners and pathways for any protein of interest. We illustrate this possibility using survivin, a known BC biomarker. Conclusions: BcCluster is an online database for retrieving molecular signatures relative to BC invasion. This application offers a comprehensive view of BC invasiveness at the molecular level and allows formulation of research hypotheses relevant to this phenotype.

PubMed Central

Publikationsserver der RWTH Aachen University

Enlighten

Quantification and expert evaluation of evidence for chemopredictive biomarkers to personalize cancer treatment.

Authors: Beckman Robert A.; Boca Simina M.; Brody Jonathan R.; Madhavan Subha; Marshall John L.; Pishvaian Michael J.; Rao Shruti; Riazi Shahla; Yabar Cinthya S.
Publication venue: Jefferson Digital Commons
Publication date: 06/06/2017

Predictive biomarkers have the potential to facilitate cancer precision medicine by guiding the optimal choice of therapies for patients. However, clinicians are faced with an enormous volume of often-contradictory evidence regarding the therapeutic context of chemopredictive biomarkers. We extensively surveyed public literature to systematically review the predictive effect of 7 biomarkers claimed to predict response to various chemotherapy drugs: ERCC1-platinums, RRM1-gemcitabine, TYMS-5-fluorouracil/Capecitabine, TUBB3-taxanes, MGMT-temozolomide, TOP1-irinotecan/topotecan, and TOP2A-anthracyclines. We focused on studies that investigated changes in gene or protein expression as predictors of drug sensitivity or resistance. We considered an evidence framework that ranked studies from high level I evidence for randomized controlled trials to low level IV evidence for pre-clinical studies and patient case studies. We found that further in-depth analysis will be required to explore methodological issues, inconsistencies between studies, and tumor-specific effects present even within high evidence level studies. Some of these nuances will lend themselves to automation; others will require manual curation. However, the comprehensive cataloging and analysis of dispersed public data utilizing an evidence framework provides a high-level perspective on clinical actionability of these protein biomarkers. This framework and perspective will ultimately facilitate clinical trial design as well as therapeutic decision-making for individual patients.

Jefferson Digital Commons

PeptiCKDdb-peptide- and protein-centric database for the investigation of genesis and progression of chronic kidney disease

Authors: Fernandes Marco; Filip Szymon; Husi Holger; Jankowski Joachim; Krochmal Magdalena; Mischak Harald; Pontillo Claudia; Vlahou Antonia; Zoidakis Jerome
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

The peptiCKDdb is a publicly available database platform dedicated to supporting research in the field of chronic kidney disease (CKD) through identification of novel biomarkers and molecular features of this complex pathology. PeptiCKDdb collects peptidomics and proteomics datasets manually extracted from published studies related to CKD. Datasets from peptidomics or proteomics, human case/control studies on CKD and kidney or urine profiling were included. Data from 114 publications (studies of body fluids and kidney tissue: 26 peptidomics and 76 proteomics manuscripts on human CKD, and 12 focusing on healthy proteome profiling) are currently deposited and the content is updated quarterly. Extracted datasets include information about the experimental setup, clinical study design, discovery-validation sample sizes and list of differentially expressed proteins (P-value < 0.05). A dedicated interactive web interface, equipped with a multiparametric search engine, data export and visualization tools, enables easy browsing of the data and comprehensive analysis. In conclusion, this repository might serve as a source of data for integrative analysis or a knowledgebase for scientists seeking confirmation of their findings and, as such, is expected to facilitate the modeling of molecular mechanisms underlying CKD and identification of biologically relevant biomarkers. Database URL: www.peptickddb.com

Institutional Repository of the Freie Universität Berlin

Maastricht University Research Portal

PubMed Central

Publikationsserver der RWTH Aachen University

Enlighten

Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

Authors: Mayr Andreas; Schmid Matthias
Publication venue: Public Library of Science (PLoS)
Publication date: 25/10/2013

The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are only suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures. Comment: revised manuscript - added simulation study, additional results
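For readers unfamiliar with the concordance index mentioned in this abstract, the following Python sketch computes a plain pairwise (Harrell-style) estimate for right-censored data. It is an illustration under stated assumptions, not the smoothed estimator or the boosting algorithm the authors propose: the function name is hypothetical, a higher score is assumed to mean higher risk (shorter survival), and tied scores are counted as half concordant.

```python
def concordance_index(times, events, scores):
    """Plain pairwise concordance index for right-censored survival data.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if the observation was censored
    scores : risk scores; higher is assumed to mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0  # earlier event has the higher risk score
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the scores order every comparable pair correctly, 0.5 corresponds to random ordering, and 0.0 to a fully reversed ordering. Because this count is a step function of the scores, it cannot be optimized by gradient-based methods directly, which is one reason a smoothed version is of interest.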

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

Open Access LMU

PubMed Central

FigShare