BcCluster: a bladder cancer database at the molecular level

Author: Bhat Akshay
Jankowski Vera
Mischak Harald
Mokou Marika
Vlahou Antonia
Zoidakis Jerome
Publication venue: 'IOS Press'
Publication date: 01/01/2016

Background: Bladder cancer (BC) has two clearly distinct phenotypes. Non-muscle-invasive BC has a good prognosis and is treated with tumor resection and intravesical therapy, whereas muscle-invasive BC has a poor prognosis and usually requires systemic cisplatin-based chemotherapy either before or after radical cystectomy. Neoadjuvant chemotherapy is not often used for patients undergoing cystectomy. High-throughput analytical omics techniques are now available that allow the identification of individual molecular signatures to characterize the invasive phenotype. However, much of the data produced by omics experiments is not easily accessible because it is often scattered over many publications or stored in supplementary files. Objective: To develop a novel open-source database, BcCluster (http://www.bccluster.org/), dedicated to the comprehensive molecular characterization of muscle-invasive bladder carcinoma. Materials: A database was created containing all reported molecular features significant in invasive BC. The query interface was developed in the Ruby programming language (version 1.9.3) using the Rails web framework (version 4.1.5) (http://rubyonrails.org/). Results: BcCluster contains data from 112 published references, providing 1,559 statistically significant features related to BC invasion. The database also holds 435 protein-protein interactions and 92 molecular pathways significant in BC invasion. The database can be used to retrieve binding partners and pathways for any protein of interest. We illustrate this possibility using survivin, a known BC biomarker. Conclusions: BcCluster is an online database for retrieving molecular signatures related to BC invasion. This application offers a comprehensive view of BC invasiveness at the molecular level and allows the formulation of research hypotheses relevant to this phenotype.
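The retrieval of binding partners and pathways for a protein of interest can be pictured with a small lookup sketch. This is not the BcCluster implementation (which the abstract says is written in Ruby on Rails); the function names, the protein labels, and the pathway labels below are hypothetical placeholders chosen only for illustration.

```python
from collections import defaultdict


def build_partner_index(interactions):
    """Index undirected protein-protein interactions by protein name."""
    index = defaultdict(set)
    for protein_a, protein_b in interactions:
        index[protein_a].add(protein_b)
        index[protein_b].add(protein_a)
    return index


def partners_of(index, protein):
    """Return the sorted binding partners of a protein (empty list if unknown)."""
    return sorted(index.get(protein, set()))


def pathways_of(pathway_members, protein):
    """Return the sorted names of pathways whose member set contains the protein."""
    return sorted(name for name, members in pathway_members.items() if protein in members)


# Placeholder data, NOT taken from BcCluster.
interactions = [("PROT_A", "PROT_B"), ("PROT_A", "PROT_C"), ("PROT_B", "PROT_D")]
pathway_members = {
    "PATHWAY_1": {"PROT_A", "PROT_B"},
    "PATHWAY_2": {"PROT_C", "PROT_D"},
    "PATHWAY_3": {"PROT_A", "PROT_D"},
}

index = build_partner_index(interactions)
print(partners_of(index, "PROT_A"))            # ['PROT_B', 'PROT_C']
print(pathways_of(pathway_members, "PROT_A"))  # ['PATHWAY_1', 'PATHWAY_3']
```

Storing each interaction under both endpoints keeps the lookup symmetric, so a query for either protein of a pair returns the other.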

PubMed Central

Publikationsserver der RWTH Aachen University

Enlighten

PaperRobot: Incremental Draft Generation of Scientific Ideas

Author: Bansal Mohit
Huang Lifu
Ji Heng
Jiang Zhiying
Knight Kevin
Luan Yi
Wang Qingyun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

We present PaperRobot, which acts as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, combining graph attention and contextual text attention; and (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate a conclusion and future work, and finally from future work to generate a title for a follow-on paper. Turing tests, in which a biomedical domain expert is asked to compare a system output with a human-authored string, show that PaperRobot-generated abstracts, conclusion and future work sections, and new titles are chosen over human-written ones up to 30%, 24% and 12% of the time, respectively. Comment: 12 pages. Accepted by ACL 2019. Code and resources are available at https://github.com/EagleW/PaperRobo

arXiv.org e-Print Archive

Crossref

PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers

Author: Campbell Moray J.
Dan Li Shuyu
Franz Eric
Huang Kun
Huang Zhi
Johnson Travis S.
Li Sihong
Zhang Yan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/05/2019

BACKGROUND: Long thought to be "relics" of evolution, pseudogenes have only recently attracted medical interest for their regulatory roles in cancer. Often, these regulatory roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data, such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of this information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene-parent gene relationships without leveraging other homologous genes/pseudogenes. RESULTS: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four "flavors" of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and are also much more powerful, including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a "one stop shop" for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers.
CONCLUSIONS: Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike.
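The local-alignment step mentioned in the abstract above can be illustrated with the classic Smith-Waterman dynamic program. The sketch below is a plain CPU version with a linear gap penalty; the scoring values and the function name are assumptions made for illustration and are not the settings or code used by the authors or by CUDAlign.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the score matrix in memory.
    """
    previous_row = [0] * (len(seq_b) + 1)
    best_score = 0
    for i in range(1, len(seq_a) + 1):
        current_row = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            substitution = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current_row[j] = max(
                0,                                 # start a new local alignment
                previous_row[j - 1] + substitution,  # align the two characters
                previous_row[j] + gap,             # gap in seq_b
                current_row[j - 1] + gap,          # gap in seq_a
            )
            best_score = max(best_score, current_row[j])
        previous_row = current_row
    return best_score


print(smith_waterman_score("ACGT", "ACGT"))      # 8: four matches at +2 each
print(smith_waterman_score("AAAA", "TTTT"))      # 0: no positive-scoring local alignment
print(smith_waterman_score("TTACGTTT", "ACGT"))  # 8: the embedded ACGT aligns exactly
```

Clamping each cell at zero is what makes the alignment local: a poorly matching prefix never drags down the score of a well-matching region later in the sequences.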

IUPUIScholarWorks


Synergistic drug combinations from electronic health records and gene expression.

Author: Chen William
Daugherty Aaron C
Desai Manisha
Farrington Carl
Gomez Scarlett L
Hastie Trevor
Kenkare Pragati
Kurian Allison W
Lim Michael
Low Yen S
Mathur Maya
Radin Andrew A
Schroeder Elizabeth A
Seto Tina
Shah Nigam H
Sirota Marina
Sledge George W
Thompson Caroline A
Weber Susan
Yu Peter P
Publication venue: eScholarship, University of California
Publication date: 01/05/2017

Objective: Using electronic health records (EHRs) and biomolecular data, we sought to discover drug pairs with synergistic repurposing potential. EHRs provide real-world treatment and outcome patterns, while complementary biomolecular data, including disease-specific gene expression and drug-protein interactions, provide mechanistic understanding. Method: We applied Group Lasso INTERaction NETwork (glinternet), an overlap group lasso penalty on a logistic regression model, with pairwise interactions to identify variables and interacting drug pairs associated with reduced 5-year mortality using EHRs of 9945 breast cancer patients. We identified differentially expressed genes from 14 case-control human breast cancer gene expression datasets and integrated them with drug-protein networks. Drugs in the network were scored according to their association with breast cancer individually or in pairs. Lastly, we determined whether synergistic drug pairs found in the EHRs were enriched among synergistic drug pairs from gene-expression data using a method similar to gene set enrichment analysis. Results: From EHRs, we discovered 3 drug-class pairs associated with lower mortality: anti-inflammatories and hormone antagonists, anti-inflammatories and lipid modifiers, and lipid modifiers and obstructive airway drugs. The first 2 pairs were also enriched among pairs discovered using gene expression data and are supported by molecular interactions in drug-protein networks and preclinical and epidemiologic evidence. Conclusions: This is a proof-of-concept study demonstrating that a combination of complementary data sources, such as EHRs and gene expression, can corroborate discoveries and provide mechanistic insight into drug synergism for repurposing.

eScholarship - University of California

Validation of Results from Knowledge Discovery: Mass Density as a Predictor of Breast Cancer

Author: CD Lehman
CM Grinstead
David Page
EA Sickles
Elizabeth Burnside
FM Hall
I Vizcaino
JA Baker
Jude Shavlik
Kazuhiko Shinki
KJ Cios
L Liberman
Louis Oliphant
M Lang
MA Helvie
PA Dang
RL Egan
Ryan W. Woods
S Ciatto
S Dzeroski
VP Jackson
X Varas
Publication venue: Springer-Verlag
Publication date: 01/01/2009

The purpose of our study is to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and to validate this association in an independent dataset. We ran our ILP algorithm on 62,219 mammographic abnormalities. We set the Aleph ILP system to generate 10,000 rules per malignant finding with a recall >5% and precision >25%. Aleph reported the best rule for each malignant finding. A total of 80 unique rules were learned. A radiologist reviewed all rules and identified potentially interesting rules. High breast mass density appeared in 24% of the learned rules. We confirmed each interesting rule by calculating the probability of malignancy given each mammographic descriptor. High mass density was the fifth highest ranked predictor. To validate the association between mass density and malignancy in an independent dataset, we collected data from 180 consecutive breast biopsies performed between 2005 and 2007. We created a logistic model with benign or malignant outcome as the dependent variable while controlling for potentially confounding factors. We calculated odds ratios based on dichotomized variables. In our logistic regression model, the independent predictors high breast mass density (OR 6.6, CI 2.5–17.6), irregular mass shape (OR 10.0, CI 3.4–29.5), spiculated mass margin (OR 20.4, CI 1.9–222.8), and subject age (β = 0.09, p < 0.0001) significantly predicted malignancy. Both ILP and conditional probabilities show that high breast mass density is an important adjunct predictor of malignancy, and this association is confirmed in an independent dataset of prospectively collected mammographic findings.
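To make the odds ratios and the age coefficient in the abstract above easier to read, here is a minimal sketch of two standard calculations: an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table, and the conversion of a logistic regression coefficient into an odds ratio. The 2x2 counts are hypothetical and are not the study's data; the study's own odds ratios come from a multivariable model, which this sketch does not reproduce. Reading β = 0.09 as a per-year effect of age is also an assumption, since the abstract does not state the unit.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and malignant      b: exposed and benign
    c: unexposed and malignant    d: unexposed and benign
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


def odds_ratio_from_beta(beta, delta=1.0):
    """Odds ratio implied by a logistic coefficient for a change of `delta` units."""
    return math.exp(beta * delta)


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(a=20, b=10, c=10, d=40)
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.2f} to {upper:.2f}")

# Coefficient reported in the abstract, assumed here to be per year of age.
print(f"OR per year: {odds_ratio_from_beta(0.09):.3f}")
print(f"OR per 10 years: {odds_ratio_from_beta(0.09, delta=10):.2f}")
```

Under that per-year assumption, a coefficient of 0.09 corresponds to roughly a 9% increase in the odds of malignancy per additional year, which compounds multiplicatively over longer age differences.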

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

The Genomic HyperBrowser: inferential genomics at the sequence level

Author: Clancy Trevor
Ferkingstad Egil
Frigessi Arnoldo
Glad Ingrid K.
Gundersen Sveinung
Holden Lars
Holden Marit
Hovig Eivind
Johansen Morten
Liestøl Knut
Nygaard Vegard
Rydbeck Halfdan
Sandve Geir K.
Tøstesen Eivind
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The immense increase in the generation of genomic-scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no

arXiv.org e-Print Archive

Springer - Publisher Connector

PubMed Central

NORA - Norwegian Open Research Archives


Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy.

Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy provides the frameworks for future studies examining deviations implicated in pregnancy-related pathologies including preterm birth and preeclampsia. Results: We performed a multiomics analysis of 51 samples from 17 pregnant women, delivering at term. The datasets included measurements from the immunome, transcriptome, microbiome, proteome and metabolome of samples obtained simultaneously from the same patients. Multivariate predictive modeling using the Elastic Net (EN) algorithm was used to measure the ability of each dataset to predict gestational age. Using stacked generalization, these datasets were combined into a single model. This model not only significantly increased predictive power by combining all datasets, but also revealed novel interactions between different biological modalities. Future work includes expansion of the cohort to preterm-enriched populations and in vivo analysis of immune-modulating interventions based on the mechanisms identified. Availability and implementation: Datasets and scripts for reproduction of results are available through: https://nalab.stanford.edu/multiomics-pregnancy/. Supplementary information: Supplementary data are available at Bioinformatics online.
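Stacked generalization, as named in the abstract above, trains one model per dataset and then fits a second-level model on their out-of-fold predictions. The sketch below shows that structure only. As a loudly stated simplification, it swaps the Elastic Net base learners for one-feature ordinary least squares and uses leave-one-out folds; the modality names, feature values, and gestational ages are hypothetical and do not come from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def out_of_fold_predictions(xs, ys):
    """Leave-one-out predictions: each sample is predicted by a model that never saw it."""
    predictions = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(intercept + slope * xs[i])
    return predictions


def fit_meta_weights(p1, p2, ys):
    """Second-level least squares for y ~ w1 * p1 + w2 * p2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, ys))
    t2 = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


# Hypothetical data: gestational age in weeks and one summary feature per modality.
gestational_age = [10.0, 14.0, 20.0, 26.0, 30.0, 36.0]
modality_a = [3.5, 5.5, 8.5, 11.5, 13.5, 16.5]  # exactly linear in age by construction
modality_b = [1.0, 2.5, 2.0, 4.0, 3.5, 6.0]     # noisier relationship with age

preds_a = out_of_fold_predictions(modality_a, gestational_age)
preds_b = out_of_fold_predictions(modality_b, gestational_age)
weight_a, weight_b = fit_meta_weights(preds_a, preds_b, gestational_age)
print(f"meta-weights: modality_a={weight_a:.3f}, modality_b={weight_b:.3f}")
```

Fitting the second-level model on out-of-fold predictions rather than in-sample fits is the key design choice: it keeps a base model that merely memorizes its training data from earning a large weight.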

eScholarship - University of California

PolyPublie

Mining expressed sequence tags identifies cancer markers of clinical interest

Author: A Aouacheria
A Cromer
AG Bader
B Vogelstein
BJ Quade
BR Zeeberg
C Cortes
CF Aliferis
CL Nutt
CM Perou
DR Rhodes
ET Munoz
Fabien Campagne
G Dennis Jr.
GP Donovan
GS Sellick
HK Lee
IB Rosenwald
JC Darnell
KF Manly
L Dyrskjot
L Skrabanek
LJ van 't Veer
Lucy Skrabanek
M Unoki
MJ Clemens
MS Boguski
R Aebersold
R Edgar
R Simon
RB Darnell
S Mukherjee
S Ramaswamy
SL Pomeroy
T Joachims
TJ MacDonald
TM Chu
VE Velculescu
W Liu
YT Chen
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Gene expression data are a rich source of information about the transcriptional dysregulation of genes in cancer. Genes that display differential regulation in cancer are a subtype of cancer biomarkers. RESULTS: We present an approach to mine expressed sequence tags to discover cancer biomarkers. A false discovery rate analysis suggests that the approach generates less than 22% false discoveries when applied to combined human and mouse whole genome screens. With this approach, we identify the 200 genes most consistently differentially expressed in cancer (called HM200) and proceed to characterize these genes. When used for prediction in a variety of cancer classification tasks (in 24 independent cancer microarray datasets, 59 classifications total), we show that HM200 and the shorter gene list HM100 are very competitive cancer biomarker sets. Indeed, when compared to 13 published cancer marker gene lists, HM200 achieves the best or second best classification performance in 79% of the classifications considered. CONCLUSION: These results indicate the existence of at least one general cancer marker set whose predictive value spans several tumor types and classification types. Our comparison with other marker gene lists shows that HM200 markers are mostly novel cancer markers. We also identify the previously published Pomeroy-400 list as another general cancer marker set. Strikingly, Pomeroy-400 has 27 genes in common with HM200. Our data suggest that a core set of genes is responsive to the deregulation of pathways involved in tumorigenesis in a variety of tumor types and that these genes could serve as transcriptional cancer markers in applications of clinical interest. Finally, our study suggests new strategies to select and evaluate cancer biomarkers in microarray studies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MOLIERE: Automatic Biomedical Hypothesis Generation System

Author: Safro Ilya
Shtutman Michael
Sybrandt Justin
Publication venue: Clemson University Libraries
Publication date: 01/05/2017

Hypothesis generation is becoming a crucial time-saving technique that allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI). These objects include but are not limited to scientific papers, keywords, genes, proteins, diseases, and diagnoses. We model hypotheses using Latent Dirichlet Allocation applied to abstracts found near shortest paths discovered within this network, and demonstrate the effectiveness of MOLIERE by performing hypothesis generation on historical data. Our network, implementation, and resulting data are all publicly available for the broad scientific community.
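The idea of collecting abstracts found near a shortest path between two concepts can be sketched on a toy network. This is not the MOLIERE implementation: the sketch assumes an unweighted graph searched with breadth-first search, the node labels and the `paper:` prefix convention are hypothetical, and the topic-modelling step (Latent Dirichlet Allocation) that would run on the collected abstracts is omitted.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in an unweighted graph."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # the two nodes are not connected


def abstracts_near_path(graph, path):
    """Collect paper nodes that lie on the path or are adjacent to it."""
    nearby = set()
    for node in path:
        for candidate in (node, *graph.get(node, ())):
            if candidate.startswith("paper:"):
                nearby.add(candidate)
    return sorted(nearby)


def make_undirected(edges):
    """Build an adjacency dictionary from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


# Toy network with hypothetical node labels.
graph = make_undirected([
    ("gene:G1", "paper:P1"),
    ("paper:P1", "keyword:K1"),
    ("keyword:K1", "paper:P2"),
    ("paper:P2", "disease:D1"),
    ("keyword:K1", "paper:P3"),
])

path = shortest_path(graph, "gene:G1", "disease:D1")
print(path)
print(abstracts_near_path(graph, path))  # papers that would feed the topic model
```

Including papers adjacent to the path, and not only those on it, widens the text sample available for describing the connection between the two query concepts.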

arXiv.org e-Print Archive

Clemson University: TigerPrints
