
15 research outputs found

Bayesian Unsupervised Learning with Multiple Data Types

Author: Colin Campbell
Phaedra Agius
Yiming Ying
Publication date: 22/07/2013

We propose Bayesian generative models for unsupervised learning with two types of data and an assumed dependency of one type of data on the other. We consider two algorithmic approaches, based on a correspondence model where latent variables are shared across datasets. These models indicate the appropriate number of clusters in addition to indicating relevant features in both types of data. We evaluate the model on artificially created data. We then apply the method to a breast cancer dataset consisting of gene expression and microRNA array data derived from the same patients. We assume dependence of gene expression on microRNA expression in this study. The method ranks genes within subtypes which have statistically significant abnormal expression and ranks associated abnormally expressing microRNA. We report a genetic signature for the basal-like subtype of breast cancer found across a number of previous gene expression array studies. Using the two algorithmic approaches we find that this signature also arises from clustering on the microRNA expression data and appears derivative from this data.

CiteSeerX

Open Research Exeter

Bayesian unsupervised learning with multiple data types

Author: Agius Phaedra
Campbell Colin
Ying Yiming
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 22/07/2013

Copyright © 2009 Walter de Gruyter. The final publication is available at www.degruyter.com

We propose Bayesian generative models for unsupervised learning with two types of data and an assumed dependency of one type of data on the other. We consider two algorithmic approaches, based on a correspondence model where latent variables are shared across datasets. These models indicate the appropriate number of clusters in addition to indicating relevant features in both types of data. We evaluate the model on artificially created data. We then apply the method to a breast cancer dataset consisting of gene expression and microRNA array data derived from the same patients. We assume dependence of gene expression on microRNA expression in this study. The method ranks genes within subtypes which have statistically significant abnormal expression and ranks associated abnormally expressing microRNA. We report a genetic signature for the basal-like subtype of breast cancer found across a number of previous gene expression array studies. Using the two algorithmic approaches we find that this signature also arises from clustering on the microRNA expression data and appears derivative from this data.

Open Research Exeter

Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites

Author: Agius Phaedra
Betel Doron
Koppal Anjali
Leslie Christina
Sander Chris
Publication venue: BioMed Central
Publication date: 01/01/2010

mirSVR is a new machine learning method for ranking microRNA target sites by a down-regulation score. The algorithm trains a regression model on sequence and contextual features extracted from miRanda-predicted target sites. In a large-scale evaluation, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting the extent of their downregulation at the mRNA or protein levels. Importantly, the method identifies a significant number of experimentally determined non-canonical and non-conserved sites

Crossref

Springer - Publisher Connector

PubMed Central

High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions

Author: Aaron Arvey
C Kissinger
C Leslie
C Zhu
Christina Leslie
CT Harbison
D Fulton
DE Newburger
E Bolotin
E Fraenkel
G Badis
G Pavesi
MF Berger
O Wallerman
P Kharchenko
Phaedra Agius
R Kuang
S Georgiev
Uwe Ohler
William Chang
William Stafford Noble
WS Noble
X Chen
XS Liu
Publication venue: Public Library of Science
Publication date: 01/01/2010

Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large numbers of sites and produce an unreliable list of target genes. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, so that individual sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned a list of thousands of scored sequence patterns. Meanwhile, high-resolution in vivo TF occupancy data from ChIP-seq experiments is also increasingly available. We have developed a flexible discriminative framework for learning TF binding preferences from high resolution in vitro and in vivo data. We first trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. We used a novel k-mer based string kernel called the di-mismatch kernel to represent probe sequence similarities. The SVR models are more compact than E-scores, more expressive than PSSMs, and can be readily used to scan genomic regions to predict in vivo occupancy. Using a large data set of yeast and mouse TFs, we found that our SVR models can better predict probe intensity than the E-score method or PBM-derived PSSMs. Moreover, by using SVRs to score yeast, mouse, and human genomic regions, we were better able to predict genomic occupancy as measured by ChIP-chip and ChIP-seq experiments. Finally, we found that by training kernel-based models directly on ChIP-seq data, we greatly improved in vivo occupancy prediction, and by comparing a TF's in vitro and in vivo models, we could identify cofactors and disambiguate direct and indirect binding.
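
As a rough illustration of what a k-mer based string kernel computes, the sketch below implements a plain spectrum kernel in Python: the dot product of two sequences' k-mer count vectors. This is a simpler stand-in and not the di-mismatch kernel named in the abstract, which is not specified here. The function names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions made for this example only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Similarity of two sequences as the dot product of their k-mer counts.

    Illustrative stand-in only: exact k-mer matches are counted, with no
    mismatch tolerance of any kind.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from b, so unshared k-mers add nothing.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Two short, made-up probe sequences (hypothetical data).
    print(spectrum_kernel("ACGTACGT", "TACGTTTT", k=3))
```

A matrix of such pairwise similarities could in principle be handed to a regression method that accepts precomputed kernels, which is the general pattern the abstract describes for learning binding intensities from probe sequences.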

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central


Unexpected similarities between C9ORF72 and sporadic forms of ALS/FTD suggest a common disease mechanism

Author: Agius Phaedra
Conlon Erin Grace
Davis-Porada Julia
Fagegaltier Delphine
Gregory James
Hubbard Isabel
Kang Kristy
Kim Duyang
Manley James L.
Phatnani Hemali
Shneider Neil A.
The New York Genome Center ALS Consortium
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) represent two ends of a disease spectrum with shared clinical, genetic and pathological features. These include near ubiquitous pathological inclusions of the RNA-binding protein (RBP) TDP-43, and often the presence of a GGGGCC expansion in the C9ORF72 (C9) gene. Previously, we reported that the sequestration of hnRNP H altered the splicing of target transcripts in C9ALS patients. Here, we show that this signature also occurs in half of 50 postmortem sporadic, non-C9 ALS/FTD brains. Furthermore, and equally surprisingly, these ‘like-C9’ brains also contained correspondingly high amounts of insoluble TDP-43, as well as several other disease-related RBPs, and this correlates with widespread global splicing defects. Finally, we show that the like-C9 sporadic patients, like actual C9ALS patients, were much more likely to have developed FTD. We propose that these unexpected links between C9 and sporadic ALS/FTD define a common mechanism in this disease spectrum

Columbia University Academic Commons

eScholarship - University of California

Bayesian Unsupervised Learning with Multiple Data Types

Author: Campbell ICG
Phaedra Agius
Yiming Ying
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/06/2009

Explore Bristol Research

Bayesian Unsupervised Learning with Multiple Data Types

Author: Colin Campbell
Phaedra Agius
Yiming Ying
Publication venue: 'Walter de Gruyter GmbH'

Crossref

Bayesian Unsupervised Learning with Multiple Data Types

Author: Agius Phaedra
Campbell Colin
Ying Yiming

We propose Bayesian generative models for unsupervised learning with two types of data and an assumed dependency of one type of data on the other. We consider two algorithmic approaches, based on a correspondence model, where latent variables are shared across datasets. These models indicate the appropriate number of clusters in addition to indicating relevant features in both types of data. We evaluate the model on artificially created data. We then apply the method to a breast cancer dataset consisting of gene expression and microRNA array data derived from the same patients. We assume partial dependence of gene expression on microRNA expression in this study. The method ranks genes within subtypes which have statistically significant abnormal expression and ranks associated abnormally expressing microRNA. We report a genetic signature for the basal-like subtype of breast cancer found across a number of previous gene expression array studies. Using the two algorithmic approaches we find that this signature also arises from clustering on the microRNA expression data and appears derivative from this data.

Research Papers in Economics

Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans

Author: Agius Phaedra
Chen Michael
Chung Wei-Jen
Lai Eric C.
Leslie Christina S.
Okamura Katsutomo
Robine Nicolas
Westholm Jakub O.
Publication venue: Cold Spring Harbor Laboratory Press

Mirtrons are intronic hairpin substrates of the dicing machinery that generate functional microRNAs. In this study, we describe experimental assays that defined the essential requirements for entry of introns into the mirtron pathway. These data informed a bioinformatic screen that effectively identified functional mirtrons from the Drosophila melanogaster transcriptome. These included 17 known and six confident novel mirtrons among the top 51 candidates, and additional candidates had limited read evidence in available small RNA data. Our computational model also proved effective on Caenorhabditis elegans, for which the identification of 14 cloned mirtrons among the top 22 candidates more than tripled the number of validated mirtrons in this species. A few low-scoring introns generated mirtron-like read patterns from atypical RNA structures, but their paucity suggests that relatively few such loci were not captured by our model. Unexpectedly, we uncovered examples of clustered mirtrons in both fly and worm genomes, including a <8-kb region in C. elegans harboring eight distinct mirtrons. Altogether, we demonstrate that discovery of functional mirtrons, unlike canonical miRNAs, is amenable to computational methods independent of evolutionary constraint

Crossref

PubMed Central