
Ghent University Academic Bibliography

Edinburgh Research Explorer

Archivsystem Ask23

HAL-CEA

Fast approximate hierarchical clustering using similarity heuristics

Author: A Saeed
AJ Saldanha
AK Jain
C Böhm
D Eppstein
J Herrero
J Vilo
Jaak Vilo
JC Gower
L Kaufmann
M Ashburner
M Lukk
MB Eisen
Meelis Kull
MJL de Hoon
P Erdös
P Legendre
P Zezula
Q Zhang
R Shyamsundar
S Datta
T Cormen
Z Du
Publication venue: BioMed Central
Publication date: 01/01/2008

CiteSeerX

Public Library of Science (PLOS)

Time-Course Analysis of Cyanobacterium Transcriptome: Detecting Oscillatory Genes

Author: AM Muro-Pastor
Carla Layana
CE Shannon
CH Johnson
ET Jaynes
HR Ueda
JB Hogenesch
JL Ditty
K Kucho
KF Storch
L Diambra
Luis Diambra
M Ahdesmaki
M Ishiura
ME Hughes
MJ McDonald
MJL de Hoon
P Chaudhuri
RD Levine
S Panda
S Wichert
SS Golden
U Albrecht
U de Lichtenberg
WA Schmitt
X Lu
Y Luan
Ying Xu
Publication venue: Public Library of Science
Publication date: 01/01/2011

The microarray technique allows the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data, one can identify the dynamics of gene expression time series. The detection of genes that are periodically expressed is an important step that allows us to study the regulatory mechanisms associated with the circadian cycle. Finding periodicity in biological time series poses many challenges, because the observed time series usually exhibit non-idealities such as noise, short length, outliers and unevenly sampled time points. Consequently, a method for finding periodicity should be robust against such anomalies in the data. In this paper, we propose a general and robust procedure for identifying genes with a periodic signature at a given significance level. This identification method is based on autoregressive models and information theory. Using simulated data, we show that the proposed method can identify rhythmic profiles even in the presence of noise and when the number of data points is small. Applying our analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles of the cyanobacterium Synechocystis.
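The abstract names autoregressive models but does not spell out the procedure. As a rough, hypothetical illustration of how an AR fit can reveal rhythmicity, the sketch below fits an AR(2) model by ordinary least squares and reads an oscillation period from its complex characteristic roots. The helper name `ar2_period` is invented here, and the sketch omits the information-theoretic criterion and the significance test the authors describe, so it should not be read as their method.

```python
import cmath
import math


def ar2_period(x):
    """Fit x[t] = a*x[t-1] + b*x[t-2] by least squares (hypothetical helper).

    Returns the implied oscillation period (in sampling intervals) when the
    characteristic roots are complex, otherwise None. Assumes evenly spaced,
    roughly mean-centred samples; no intercept is fitted.
    """
    # Accumulate the 2x2 normal equations.
    s11 = s12 = s22 = s01 = s02 = 0.0
    for t in range(2, len(x)):
        x0, x1, x2 = x[t], x[t - 1], x[t - 2]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s01 += x0 * x1
        s02 += x0 * x2
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        return None  # lagged series are collinear; AR(2) is not identifiable
    # Cramer's rule for the two AR coefficients.
    a = (s01 * s22 - s02 * s12) / det
    b = (s11 * s02 - s12 * s01) / det
    # Characteristic equation z^2 - a*z - b = 0.
    disc = a * a + 4 * b
    if disc >= 0:
        return None  # real roots: no oscillatory component
    root = (a + cmath.sqrt(disc)) / 2
    theta = abs(cmath.phase(root))  # angular frequency per sample
    return 2 * math.pi / theta
```

For a noiseless sine wave with period 12 sampled 48 times, this returns a period of about 12, while the non-oscillating series 2**t + 3**t returns None. Real expression data are noisy and short, which is exactly why the paper adds model selection and a significance level on top of the AR fit.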

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors.

Author: A Kapoor
A Krapp
A Rada-Iglesias
A Sawada
A-C Binot
AF Hezel
AJ Saldanha
B Zhao
C Cortijo
C Haumaitre
C Trapnell
CHH Cho
CY McLean
DA Stoffers
DW Huang
E Kroon
E Rodríguez-Seguel
EF Chiang
F Argenton
F Esni
F Supek
FC Lynn
FC Pan
H Fang
H Lango Allen
I Morán
I Rooman
J Bessa
J van Arensbergen
JM Oliver-Krasinski
K Kawakami
K Piper
K Skouloudaki
KJ Gaulton
KM Petzold
KS Zaret
L Elghazi
L Pasquali
M Borowiak
M Carrasco
M Gannon
MA Maestro
MC Whitlock
MF Offield
MJL de Hoon
MN Weedon
MP Creyghton
N Bardeesy
N Gao
NM George
P Jacquemin
PA Seymour
R O’Rahilly
R Xie
RE Jennings
RF Luco
S Gupta
S Heinz
S Xuan
T Derrien
T Jowett
W Zhang
WA Whyte
Y Fujitani
Y Liu-Chittenden
Publication venue: Springer Science and Business Media LLC
Publication date: 13/03/2015

The genomic regulatory programmes that underlie human organogenesis are poorly understood. Pancreas development, in particular, has pivotal implications for pancreatic regeneration, cancer and diabetes. We have now characterized the regulatory landscape of embryonic multipotent progenitor cells that give rise to all pancreatic epithelial lineages. Using human embryonic pancreas and embryonic-stem-cell-derived progenitors we identify stage-specific transcripts and associated enhancers, many of which are co-occupied by transcription factors that are essential for pancreas development. We further show that TEAD1, a Hippo signalling effector, is an integral component of the transcription factor combinatorial code of pancreatic progenitor enhancers. TEAD and its coactivator YAP activate key pancreatic signalling mediators and transcription factors, and regulate the expansion of pancreatic progenitors. This work therefore uncovers a central role for TEAD and YAP as signal-responsive regulators of multipotent pancreatic progenitors, and provides a resource for the study of embryonic development of the human pancreas.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

University of Birmingham Research Portal

Spiral - Imperial College Digital Repository

The University of Manchester - Institutional Repository

HAMSTER: visualizing microarray experiments as a set of minimum spanning trees

Author: A Bender
A Thalamuthu
D Eppstein
D Xu
E Gabriel
ER Gansner
Hajime Harada
Hiroshi Mamitsuka
J Shi
JB Kruskal
JG Siek
JR Dinneny
Larisa Kiseleva
M Maechler
MB Eisen
MJL de Hoon
P Agarwal
P Tamayo
Paul Horton
PM Magwene
R Herwig
R Sedgewick
Raymond Wan
RC Prim
SC Wieland
T Barrett
T Liu
TH Cormen
The Gene Ontology Consortium
V Olman
Y Xu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Visualization tools allow researchers to obtain a global view of the interrelationships between the probes or experiments of a gene expression (e.g., microarray) data set. Existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MST) to microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, the two can look quite different visually. Methods: HAMSTER (Helpful Abstraction using Minimum Spanning Trees for Expression Relations) is an open source system for generating a set of MSTs from the experiments of a microarray data set. While previous works generated a single MST from a data set for data clustering, we recursively merge experiments and repeat this process to obtain a set of MSTs for data visualization. Depending on the parameters chosen, each tree is analogous to a snapshot of one step of the hierarchical clustering process. We scored and ranked these trees using one of three proposed schemes. HAMSTER is implemented in C++ and uses Graphviz to lay out each MST. Results: We report on the running time of HAMSTER and demonstrate, using data sets from the NCBI Gene Expression Omnibus (GEO), that the images created by HAMSTER offer insights that differ from the dendrograms of hierarchical clustering. In addition to the C++ program, which is available as open source, we also provide a web-based version (HAMSTER+) that lets users apply our system through a web browser without any programming knowledge. Conclusion: Researchers may find it helpful to include HAMSTER in their microarray analysis workflow, as it can offer insights that differ from hierarchical clustering. We believe that HAMSTER would be useful for certain types of gradient data sets (e.g., time-series data) and data that indicate relationships between cells/tissues.
Both the source and the web server variant of HAMSTER are available from http://hamster.cbrc.jp/.
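HAMSTER's own C++ code is not reproduced here. As a hedged sketch of the general idea only, assuming Euclidean distances between experiment profiles and a simple merge rule (average the two experiments joined by the shortest MST edge), the Python below builds an MST with Prim's algorithm and yields one tree per merge step. The function names, the distance and the merge rule are illustrative assumptions, not HAMSTER's parameters or its three scoring schemes.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def prim_mst(profiles):
    """Minimum spanning tree over experiment profiles via Prim's algorithm, O(n^2).

    Returns a list of (i, j, weight) edges using Euclidean distance (an assumption).
    """
    n = len(profiles)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax edges from u to the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(profiles[u], profiles[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def merge_closest(profiles, labels):
    """Hypothetical merge rule: average the two experiments on the shortest MST edge."""
    i, j, _ = min(prim_mst(profiles), key=lambda e: e[2])
    merged = [(a + b) / 2 for a, b in zip(profiles[i], profiles[j])]
    keep = [k for k in range(len(profiles)) if k not in (i, j)]
    return ([profiles[k] for k in keep] + [merged],
            [labels[k] for k in keep] + [labels[i] + "+" + labels[j]])


def mst_series(profiles, labels):
    """Yield (labels, MST edges) once per step, merging one pair between steps."""
    while len(profiles) >= 2:
        yield labels, prim_mst(profiles)
        profiles, labels = merge_closest(profiles, labels)
```

With four toy profiles forming two tight pairs, the first tree has three edges and the series stops after three snapshots, the last one containing only the two merged groups. Each snapshot would then be handed to a layout tool such as Graphviz for drawing.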

Kyoto University Research Information Repository

A comparison of four clustering methods for brain expression microarray data

Author: A Prelić
A Riva
A Thalamuthu
AI Su
Alexander L Richards
BM Bolstad
BW Higgs
C Stansberg
CC Liu
D Dembélé
D Thain
DA Hosack
DB Allison
GC Tseng
J Ihmels
JA Hartigan
JD Cahoy
Lesley Jones
M Dai
M Dettling
M Kloster
MC O'Donovan
Michael C O'Donovan
Michael J Owen
MJL de Hoon
NR Garge
P Khatri
Peter Holmans
R Edgar
S Bergmann
SV Kyosseva
T Barrett
T Beißbarth
T Walsh
Z Fang
ZS Qin
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpreting the results. Clustering genes on the basis of the similarity of their expression profiles can simplify the data and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are applied to three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed. Results: k-means outperforms the other methods, with 100% gene coverage and GO enrichments only slightly exceeded by memISA and ISA. Those two methods produce greater GO enrichments on the datasets used, but at the cost of much lower gene coverage, fewer clusters and slower speed. The clusters they find are largely different from those produced by k-means. Combining clusters produced by k-means and memISA or ISA leads to increased GO enrichment and number of clusters produced (compared to k-means alone), without negatively impacting gene coverage. memISA can also find potentially disease-related clusters. In two independent dorsolateral prefrontal cortex datasets, it finds three overlapping clusters that are enriched for genes associated with schizophrenia, genes differentially expressed in schizophrenia, or both. Two of these clusters are enriched for genes of the MAP kinase pathway, suggesting a possible role for this pathway in the aetiology of schizophrenia. Conclusion: Considered alone, k-means clustering is the most effective of the four methods on typical microarray brain expression datasets.
However, memISA and ISA can add extra high-quality clusters to the set produced by k-means, so combining these three methods is the method of choice.
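As a point of reference for the k-means result above, a minimal Lloyd's-algorithm sketch is shown below together with a gene-coverage helper. The deterministic first-k seeding, the squared Euclidean distance and the use of None for an unassigned gene are assumptions made for illustration; the study's own k-means settings are not given in the abstract, and the function names are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's k-means over gene expression profiles.

    Initial centroids are simply the first k points, an assumption made here
    for determinism (real tools typically use random or k-means++ seeding).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def gene_coverage(labels):
    """Fraction of genes assigned to some cluster (None marks an unassigned gene)."""
    return sum(lab is not None for lab in labels) / len(labels)
```

Because k-means assigns every gene to exactly one cluster, `gene_coverage` is always 1.0 for its output in this sketch, which matches the 100% coverage reported above; biclustering methods such as ISA can leave genes out, which the helper would count as None.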

Online Research @ Cardiff

Pathogenic Bacillus anthracis in the progressive gene losses and gains in adaptive evolution

Author: A Fagerlund
A Mira
A Sorokin
AJ Saldanha
AT Maurelli
BA Wilson
C Guidi-Rontani
CK Marston
CS Han
D Xu
DA Rasko
E Borezee
G Sternbach
GX Yu
H Werbrouck
I Miras
J Parkhill
JJ Wernegreen
JL Rodriguez
JM Janda
L Gómez-Valero
LR Washburn
LS Lepore
M Kostrzynska
M Olier
MJL De Hoon
NR Thomson
P Cossart
P Sumby
R Overbeek
R Schuch
S Fedhila
S Shafazand
SB Beres
SG Andersson
TD Read
WA Day
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Sequence mutations are a driving force of adaptive evolution in bacterial pathogens. This is especially evident in reductive genome evolution, where bacteria shifted from a free-living lifestyle to a strictly intracellular or host-dependent one, resulting in loss-of-function mutations and/or the acquisition of virulence gene clusters. Bacillus anthracis shares a common soil-bacterium ancestor with closely related Bacillus species but is the only obligate, causative agent of inhalation anthrax within the genus Bacillus. Because the anthrax-causing B. anthracis underwent similar lifestyle changes, we hypothesized that this pathogen would follow a compatible evolutionary path. Results: In this study, a cluster-based evolution scheme was devised to analyze genes that were gained by or lost from B. anthracis. The study detected gene losses and gains at two separate evolutionary stages. Stage I is when B. anthracis and its sister species within the Bacillus cereus group diverged from other species in the genus Bacillus. Stage II is when B. anthracis differentiated from its two closest relatives, B. cereus and B. thuringiensis. Many genes gained at these stages are homologues of known pathogenic factors, such as those for internalin, B. anthracis-specific toxins, and large groups of surface proteins and lipoproteins. Conclusion: The analysis presented here allowed us to portray a progressive evolutionary process during the lifestyle shift of B. anthracis, providing new insights into how B. anthracis evolved and holding promise for finding drug and vaccine targets for this strategically important pathogen.
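The abstract does not describe its cluster-based scheme in detail. Purely as an illustration of how the two stages could be read off presence/absence data, the sketch below applies a simplified, hypothetical rule to a single gene cluster. The species labels and the `classify_event` helper are assumptions, and a real analysis would weigh alternative explanations (for example, a stage I "gain" could equally be independent losses in the other Bacillus species).

```python
def classify_event(presence, focal, sisters, outgroup):
    """Classify one gene cluster from presence/absence calls (hypothetical rule).

    presence: dict mapping species name -> True if the cluster is present.
    focal:    the species of interest (e.g. B. anthracis).
    sisters:  its closest relatives (e.g. B. cereus, B. thuringiensis).
    outgroup: other species in the genus.
    """
    f = presence[focal]
    sis = [presence[s] for s in sisters]
    out = [presence[s] for s in outgroup]
    if f and all(sis) and not any(out):
        return "gained at stage I"   # shared by the whole group only
    if f and not any(sis) and not any(out):
        return "gained at stage II"  # unique to the focal species
    if not f and all(sis):
        return "lost at stage II"    # relatives kept it, focal species lost it
    if not f and not any(sis) and all(out):
        return "lost at stage I"     # whole group lost what the outgroup kept
    return "unresolved"              # patchy pattern; needs a finer model
```

Patterns that fit none of the four rules are reported as "unresolved" rather than forced into a stage, which keeps the illustration conservative.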

Boise State University - ScholarWorks

Cross-Mapping Events in miRNAs Reveal Potential miRNA-Mimics and Evolutionary Implications

Author: A Azuma-Mukai
A Stark
AS O'Toole
B Czech
B Langmead
C Borel
C Pelletier
CF Hongay
DP Bartel
F Kuchenbauer
G Jagadeeswaran
J Wang
JE Babiarz
JG Ruby
K Okamura
KE Shearwin
L Guo
Li Guo
M Ghildiyal
M Lagos-Quintana
M Morlando
MA Larkin
MJL de Hoon
NC Lau
NC Schopman
P Jiang
Pawel Michalak
R Contu
RC Lee
RD Morin
RJ Taft
S Ro
Tingming Liang
Wanjun Gu
WC Cho
WK Wu
Yuming Xu
Yunfei Bai
Z Mourelatos
Zuhong Lu
Publication venue: Public Library of Science
Publication date: 26/05/2011

MicroRNAs (miRNAs) have important roles in various biological processes. miRNA cross-mapping is a prevalent phenomenon in which a miRNA sequence originating from one genomic region is mapped to another location. To better understand this phenomenon in the human genome, we performed a detailed analysis using public miRNA high-throughput sequencing data and all known human miRNAs. By analyzing the high-throughput sequencing data, we observed widespread cross-mapping events between miRNA precursors (pre-miRNAs), other non-coding RNAs (ncRNAs) and the opposite strands of pre-miRNAs. Computational analysis of all known human miRNAs also confirmed that many of them could be involved in cross-mapping events. The processing or decay of both ncRNAs and pre-miRNA opposite-strand transcripts may contribute to miRNA enrichment, although some of these sequences might be miRNA-mimics resulting from miRNA mis-annotation. Compared with canonical miRNAs, miRNAs involved in cross-mapping events between pre-miRNAs and other ncRNAs generally had shorter lengths (17–19 nt) and lower prediction scores, and were classified as pseudo miRNA precursors. Notably, 4.9% of all human miRNAs could be accurately mapped to the opposite strands of pre-miRNAs, showing that both strands of the same genomic region have the potential to produce mature miRNAs and also pointing to some potential miRNA precursors. We propose that cross-mapping events are more complex than previously thought. Sequence similarity between other ncRNAs and pre-miRNAs, and the specific stem-loop structures of pre-miRNAs, may have evolutionary implications.
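As a simplified illustration of what mapping to the opposite strand of a pre-miRNA means, the sketch below checks whether a mature sequence occurs exactly on the sense strand of a precursor or on its reverse complement. Real cross-mapping analyses use read aligners that tolerate mismatches; the exact-match rule, the helper names and the toy sequence in the usage note are assumptions for illustration only.

```python
# Watson-Crick complements for RNA (A-U, C-G); other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence, i.e. the opposite strand read 5'->3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def cross_map(mature, precursor):
    """Exact-match lookup of a mature miRNA on both strands of a precursor.

    Returns ("sense", offset) or ("antisense", offset), where the antisense
    offset is measured along the reverse-complemented precursor, or None.
    """
    i = precursor.find(mature)
    if i != -1:
        return ("sense", i)
    j = reverse_complement(precursor).find(mature)
    if j != -1:
        return ("antisense", j)
    return None
```

For the toy precursor "ACGUACGGAUCC", `cross_map("UACGG", ...)` gives ("sense", 3) and `cross_map("CCGUA", ...)` gives ("antisense", 4), since the opposite strand reads "GGAUCCGUACGU".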

Public Library of Science (PLOS)

Bayesian profiling of molecular signatures to predict event times

Author: A Al-katib
A Rosenwald
AE Papatestas
AL Shaffer
CJ Campbell
Dabao Zhang
DG Beer
DR Cox
DV Nguyen
E Bair
EA Garcia-Zepeda
F Hogervorst
H Li
H Wold
IS Lossos
J Gui
JD Kalbfleisch
L Ein-Dor
L Li
M Subramaniam
M West
M Zhang
MG Tadesse
Min Zhang
MJL De Hoon
MK Cowles
N Sha
O Troyanskaya
P Sterpetti
PH Garthwaite
PJ Park
R Jörnsten
RL Strausberg
S Henikoff
S Mocellin
T Hastie
TR Golub
V Ling
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: It is of particular interest to identify cancer-specific molecular signatures for early diagnosis, monitoring effects of treatment and predicting patient survival time. Molecular information about patients is usually generated from high-throughput technologies such as microarray and mass spectrometry. Statistically, we are challenged by the large number of candidates but only a small number of patients in the study, and the right-censored clinical data further complicate the analysis. RESULTS: We present a two-stage procedure to profile molecular signatures for survival outcomes. Firstly, we group closely-related molecular features into linkage clusters, each portraying either similar or opposite functions and playing similar roles in prognosis; secondly, a Bayesian approach is developed to rank the centroids of these linkage clusters and provide a list of the main molecular features closely related to the outcome of interest. A simulation study showed the superior performance of our approach. When it was applied to data on diffuse large B-cell lymphoma (DLBCL), we were able to identify some new candidate signatures for disease prognosis. CONCLUSION: This multivariate approach provides researchers with a more reliable list of molecular features profiled in terms of their prognostic relationship to the event times, and generates dependable information for subsequent identification of prognostic molecular signatures through either biological procedures or further data analysis.
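The first stage above groups features that are either positively or negatively correlated. One hedged way to sketch that idea, assuming Pearson correlation, an arbitrary |r| threshold of 0.9 and single-linkage merging (none of which the abstract specifies), is shown below; the function names are hypothetical, and the Bayesian ranking of centroids in the second stage is not attempted here.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def linkage_clusters(features, threshold=0.9):
    """Single-linkage groups on |r|, so opposite-signed profiles can share a cluster."""
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if abs(pearson(features[i], features[j])) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def signed_centroid(features, members):
    """Mean profile after flipping members anti-correlated with the first member."""
    ref = features[members[0]]
    aligned = []
    for m in members:
        sign = 1.0 if pearson(ref, features[m]) >= 0 else -1.0
        aligned.append([sign * v for v in features[m]])
    return [sum(col) / len(aligned) for col in zip(*aligned)]
```

Flipping the sign of anti-correlated members before averaging keeps a rising and a mirror-image falling profile from cancelling out, so each centroid can stand in for its whole cluster in a downstream survival model.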

CiteSeerX

Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency

Author: A Bairoch
A P Potapov
A Subramanian
A Wagner
AL Barabasi
Cheng-Yu Yeh
DM Parkin
DW Huang
E Segal
EW Dijkstra
G Sherlock
H Wei
Hsiang-Yuan Yeh
J Cheng
J Lapointe
JC David
L Lopez-Serra
M Ashburner
M Benson
M Kanehisa
M Mayo
MJL de Hoon
N Friedman
O Troyanskaya
P Humbert
PM Haverty
RE Neapolitan
RV Solé
S Acid
S Gordon
SA Tomlins
Shih-Fang Lin
Shih-Wu Cheng
SS Shen-Orr
T Akutsu
V Curwen
Von-Wun Soo
Y Tamada
Yu-Chun Lin
Publication venue: BioMed Central
Publication date: 01/12/2009

Background: Prostate cancer is one of the leading cancers worldwide and is characterized by aggressive metastasis. Reflecting its clinical heterogeneity, prostate cancer displays different stages and grades related to aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease process, the important gene regulations remain unclear. We present a computational method for automatically inferring genetic regulatory networks from microarray data, using transcription factor analysis and conditional independence testing, to explore potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. Results: To deal with missing values in the microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the missing expression values. We applied web services technology to wrap bioinformatics toolkits and databases so as to automatically extract the promoter regions of DNA sequences and predict the transcription factors that regulate gene expression. To evaluate our method, we used as the target dataset microarray data from the Stanford Microarray Database (SMD) consisting of 62 primary tumors and 41 normal prostate tissues. The predicted results suggest that possible biomarker genes related to cancer, which reflect androgen functions and processes, may be involved in the development of prostate cancer and promote cell death in the cell cycle. Our predicted results also showed that the sub-networks of the genes SREBF1, STAT6 and PBX1 are strongly related to a high extent, while the ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. The gene SLC22A3 may clinically explain the differentiation associated with high-grade compared with low-grade cancer. Enhancer of Zeste Homolog 2 (EZH2), regulated by RUNX1 and STAT3, is correlated with the pathological stage.
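The abstract does not state which KNN variant was used. A minimal sketch, assuming missing entries are stored as None, neighbours are other genes observed in the same sample, distance is the root-mean-square difference over co-observed samples, and the fill value is the unweighted mean of the k nearest neighbours (the name `knn_impute` is hypothetical):

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a genes x samples matrix (hypothetical helper).

    Each missing value is replaced by the unweighted mean of that sample's
    values in the k nearest other genes. Distance is the root-mean-square
    difference over samples observed in both genes. The input is not modified,
    and only originally observed values are used as neighbour values.
    """
    filled = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                if h == g or other[j] is None:
                    continue  # a neighbour must have sample j observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, h))
            candidates.sort()
            nearest = candidates[:k]
            if nearest:  # otherwise leave the gap as None
                filled[g][j] = sum(matrix[h][j] for _, h in nearest) / len(nearest)
    return filled
```

Dividing by the number of shared samples keeps distances comparable between gene pairs with different amounts of missing data; weighting neighbours by inverse distance is a common alternative.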
Conclusions: We provide a computational framework to reconstruct genetic regulatory networks from microarray data using biological knowledge and constraint-based inference. Our method is helpful in verifying possible interaction relations in gene regulatory networks and in filtering out incorrect relations inferred by imperfect methods. We not only predicted individual genes related to cancer but also discovered significant gene regulatory networks. Our method is also validated against several published papers and databases, and the significant gene regulatory networks perform critical biological functions and processes, including cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction, and GO-annotated processes. These significant gene regulations and the critical concept of tumor progression are useful for understanding cancer biology and disease treatment.
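Constraint-based network inference rests on conditional independence tests. A common textbook choice for continuous data, shown here only as an assumed example since the abstract does not name its test, is the first-order partial correlation combined with a Fisher z-test; an edge whose partial correlation is not significant is removed. The function names are hypothetical.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z, from pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def independent(r, n, cond_size, z_crit=1.96):
    """Fisher z-test: True if r is not significantly different from 0.

    n is the sample size and cond_size the number of conditioning variables;
    z_crit=1.96 corresponds to a two-sided 5% level.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - cond_size - 3)
    return abs(z) < z_crit
```

If x and y each correlate 0.5 with z and 0.25 with each other, the partial correlation is exactly 0, so in this sketch the x-y edge would be dropped once z is conditioned on; with a direct correlation of 0.9 and weak links to z, the edge survives.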