342 research outputs found

SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding

Author: Chen Kevin C.
Roy Rajat S.
Schliep Alexander
Sengupta Anirvan M.
Publication date: 09/11/2011

Scaffolding is an important subproblem in "de novo" genome assembly in which mate pair data are used to construct a linear sequence of contigs separated by gaps. Here we present SLIQ, a set of simple linear inequalities derived from the geometry of contigs on the line that can be used to predict the relative positions and orientations of contigs from individual mate pair reads and thus produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate pairs and can be used as a preprocessing step for any scaffolding algorithm. We tested the SLIQ inequalities on five real data sets ranging in complexity from simple bacterial genomes to complex mammalian genomes and compared the results to the majority voting procedure used by many other scaffolding algorithms. SLIQ predicted the relative positions and orientations of the contigs with high accuracy in all cases and gave more accurate position predictions than majority voting for complex genomes, in particular the human genome. Finally, we present a simple scaffolding algorithm that produces linear scaffolds given a contig digraph. We show that our algorithm is very efficient compared to other scaffolding algorithms while maintaining high accuracy in predicting both contig positions and orientations for real data sets.

Comment: 16 pages, 6 figures, 7 tables

arXiv.org e-Print Archive

CiteSeerX
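
The SLIQ abstract above compares its inequalities against the majority voting procedure used by other scaffolders. As a rough illustration of that baseline only (not of the SLIQ inequalities, which the abstract does not spell out), the sketch below tallies per-mate-pair votes for each contig pair and keeps a relation only when it wins a strict majority. The function name, the `(contig_a, contig_b, relation)` tuple layout, and the relation labels are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(mate_pair_votes):
    """Return the strict-majority relation for each contig pair, or None.

    mate_pair_votes: iterable of (contig_a, contig_b, relation) tuples, one per
    mate pair. The tuple layout and relation labels are illustrative
    assumptions; contig pairs are assumed to be given in a canonical order.
    """
    tallies = defaultdict(Counter)
    for contig_a, contig_b, relation in mate_pair_votes:
        tallies[(contig_a, contig_b)][relation] += 1

    consensus = {}
    for pair, counts in tallies.items():
        relation, votes = counts.most_common(1)[0]
        total = sum(counts.values())
        # Keep the relation only if more than half of the mate pairs agree.
        consensus[pair] = relation if 2 * votes > total else None
    return consensus


votes = [
    ("c1", "c2", "before"),
    ("c1", "c2", "before"),
    ("c1", "c2", "after"),
    ("c2", "c3", "before"),
    ("c2", "c3", "after"),
]
print(majority_vote(votes))
```

Returning `None` on a tie is one possible design choice; a real scaffolder might instead weight votes or discard the pair entirely.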

pGQL: A probabilistic graphical query language for gene expression time courses

Author: A Schliep
Alexander Schliep
H Hochheiser
IG Costa
Ivan G Costa
J Ernst
KY Yeung
LR Rabiner
M Ashburner
MF Ramoni
R Durbin
Ruben Schilling
S Chu
Z Bar-Joseph
Publication venue: BMC
Publication date: 01/01/2011

Background: Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. Because queries can be defined very easily, exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which are very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view, the robust handling of data through a sound statistical model is of great importance. Results: We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models (HMMs), an established method in data mining. Since HMMs are a particular class of probabilistic graphical models, we call our method Probabilistic Graphical Query Language. It is implemented in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions: We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. By its statistical nature, our approach is more expressive and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Evaluation of reference genes for RT-qPCR studies in the seagrass Zostera muelleri exposed to light limitation

Author: Bryant CV
Pernice M
Ralph PJ
Rasheed MA
Schliep M
Sinutok S
York PH
Publication venue: Springer Science and Business Media LLC
Publication date: 23/11/2015

Seagrass meadows are threatened by coastal development and global change. In the face of these pressures, molecular techniques such as reverse transcription quantitative real-time PCR (RT-qPCR) have great potential to improve management of these ecosystems by allowing early detection of chronic stress. In RT-qPCR, the expression levels of target genes are estimated on the basis of reference genes, in order to control for RNA variations. Although determination of suitable reference genes is critical for RT-qPCR studies, reports on the evaluation of reference genes are still absent for the major Australian species Zostera muelleri subsp. capricorni (Z. muelleri). Here, we used three software packages (geNorm, NormFinder and BestKeeper) to evaluate ten widely used reference genes according to their expression stability in Z. muelleri exposed to light limitation. We then combined the results from the three packages and used a consensus rank of the four best reference genes to validate regulation in Photosystem I reaction center subunit IV B and Heat Stress Transcription factor A- gene expression in Z. muelleri under light limitation. This study provides the first comprehensive list of reference genes in Z. muelleri and demonstrates RT-qPCR as an effective tool to identify early responses to light limitation in seagrass.

ResearchOnline@JCU

OPUS - University of Technology Sydney

ResearchOnline at James Cook University

PubMed Central
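
The Zostera muelleri abstract above ranks candidate reference genes by expression stability. As a rough illustration of what such a measure can look like, the sketch below computes a geNorm-style stability value M: for each gene, the average standard deviation of its log2 expression ratios against every other candidate across samples, where a lower M means more stable expression. This follows the commonly described definition and is not the geNorm package's own code; the function name, gene names, and expression values are made up for illustration.

```python
import math
import statistics


def stability_m(expr):
    """Compute a geNorm-style stability value M for each candidate gene.

    expr maps a gene name to a list of relative expression quantities
    (not raw Ct values), one per sample, in the same sample order.
    Assumes at least two genes and at least two samples.
    """
    genes = list(expr)
    m_values = {}
    for gene_j in genes:
        pairwise_sds = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # Spread of the log2 ratio between the two genes across samples.
            log_ratios = [
                math.log2(a / b) for a, b in zip(expr[gene_j], expr[gene_k])
            ]
            pairwise_sds.append(statistics.stdev(log_ratios))
        m_values[gene_j] = sum(pairwise_sds) / len(pairwise_sds)
    return m_values


# Hypothetical quantities: geneA and geneB vary together, geneC does not.
example = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],
    "geneC": [1.0, 4.0, 1.0],
}
for gene, m in sorted(stability_m(example).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

In this toy input the two co-varying genes receive the lowest M and would be ranked as the most stable candidates.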

Predicting responses to climate change using a joint species, spatially dependent physiologically guided abundance model

Author: Custer Christopher A
Hansen Gretchen JA
North Joshua S
Schliep Erin M
Verhoeven Michael R
Wagner Tyler
Publication venue: eScholarship, University of California
Publication date: 20/06/2024

Predicting the effects of warming temperatures on the abundance and distribution of organisms under future climate scenarios often requires extrapolating species-environment correlations to climatic conditions not currently experienced by a species, which can result in unrealistic predictions. For poikilotherms, incorporating species' thermal physiology to inform extrapolations under novel thermal conditions can result in more realistic predictions. Furthermore, models that incorporate species and spatial dependencies may improve predictions by capturing correlations present in ecological data that are not accounted for by predictor variables. Here, we present a joint species, spatially dependent physiologically guided abundance (jsPGA) model for predicting multispecies responses to climate warming. The jsPGA model uses a basis function approach to capture both species and spatial dependencies. We apply the jsPGA model to predict the response of eight fish species to projected climate warming in thousands of lakes in Minnesota, USA. By the end of the century, the cold-adapted species was predicted to have high probabilities of extirpation across its current range, with 10% of lakes currently inhabited by this species having an extirpation probability >0.90. The remaining species had varying levels of predicted changes in abundance, reflecting differences in their thermal physiology. Though the model did not identify many strong species dependencies, the variation in estimated spatial dependence across species suggested that accounting for both dependencies was important for predicting the abundance of these fishes. The jsPGA model provides a new tool for predicting changes in the abundance, distribution, and extirpation probability of poikilotherms under novel thermal conditions.

eScholarship - University of California

Conformational rearrangements upon start codon recognition in human 48S translation initiation complex

Author: Adio S.
Chari A.
Fischer N.
Goyal A.
Linden A.
Petrychenko V.
Rodnina M.
Schliep J.
Stark H.
Urlaub H.
Yi S.
Publication venue: Oxford University Press (OUP)
Publication date: 20/05/2022

Selection of the translation start codon is a key step during protein synthesis in human cells. We obtained cryo-EM structures of human 48S initiation complexes and characterized the intermediates of codon recognition by kinetic methods using eIF1A as a reporter. Both approaches capture two distinct ribosome populations formed on an mRNA with a cognate AUG codon in the presence of eIF1, eIF1A, eIF2–GTP–Met-tRNAiMet and eIF3. The ‘open’ 40S subunit conformation differs from the human 48S scanning complex and represents an intermediate preceding the codon recognition step. The ‘closed’ form is similar to reported structures of complexes from yeast and mammals formed upon codon recognition, except for the orientation of eIF1A, which is unique in our structure. Kinetic experiments show how various initiation factors mediate the population distribution of open and closed conformations until 60S subunit docking. Our results provide insights into the timing and structure of human translation initiation intermediates and suggest differences in the mechanisms of start codon selection between mammals and yeast.

MPG.PuRe

Photosynthetic acclimation of Nannochloropsis oculata investigated by multi-wavelength chlorophyll fluorescence analysis

Author: Guruprasad S
Kuzhiumparambil U
Larkum AWD
Lilley RMC
Parker K
Ralph PJ
Raven JA
Schliep M
Schreiber U
Szabó M
Tamburic B
Publication venue: Elsevier BV
Publication date: 01/01/2014

Multi-wavelength chlorophyll fluorescence analysis was utilised to examine the photosynthetic efficiency of the biofuel-producing alga Nannochloropsis oculata, grown under two light regimes: low (LL) and high (HL) irradiance. Wavelength dependency was evident in the functional absorption cross-section of Photosystem II (σII(λ)), absolute electron transfer rates (ETR(II)), and non-photochemical quenching (NPQ) of chlorophyll fluorescence in both HL and LL cells. While σII(λ) was not significantly different between the two growth conditions, HL cells upregulated ETR(II) 1.6-1.8-fold compared to LL cells, most significantly in the wavelength range of 440-540 nm. This indicates preferential utilisation of blue-green light, a highly relevant spectral region for visible light in algal pond conditions. Under these conditions, the HL cells accumulated saturated fatty acids, whereas polyunsaturated fatty acids were more abundant in LL cells. This knowledge is of importance for the use of N. oculata for fatty acid production in the biofuel industry.

OPUS - University of Technology Sydney

Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm

Author: A Schliep
David L. Wild
E Cooke
Emma J. Cooke
G Brock
K Heller
L Bauwens
L Hubert
M Eisen
Magnus Rattray
NA Heard
O Stegle
P Ma
Paul D. W. Kirk
PDW Kirk
Q Liu
R Cho
Richard S. Savage
Robert Darkins
RS Savage
S Datta
S Frühwirth-Schnatter
W Chu
Z Bar-Joseph
Zoubin Ghahramani
Publication venue: Public Library of Science (PLoS)
Publication date: 02/04/2013

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available at the following URL: https://sites.google.com/site/randomisedbhc/

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Fast MCMC sampling for hidden Markov models to determine copy number variations

Author: A Krogh
A Schliep
A Viterbi
AB Olshen
Alexander Schliep
AM Snijders
CM Bishop
D Pelleg
D Pinto
F Picard
H Willenbrock
J Fridlyand
J Fritsch
K Wang
L Rabiner
LE Baum
M Bredel
Md Pavel Mahmud
P Wang
PHC Eilers
Q McNemar
R Andersson
R Durbin
R Tibshirani
RJD Leeuw
S Chib
S Geman
S Guha
S Morganella
S Mozes
S Salvador
S Scott
S Srivastava
SP Shah
SR Eddy
T Harada
W Gilks
WR Lai
Y Nannya
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Hidden Markov Models (HMMs) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems. Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling. Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60 and of 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches. Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
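
The copy number abstract above refers to the likelihood-based baseline in which a segmentation is obtained with the Viterbi algorithm. The sketch below illustrates that baseline only, not the paper's approximate MCMC sampler or the GHMM library API. It assumes Gaussian emissions with a shared standard deviation, a uniform initial distribution, and a single "stay" probability with the remainder split evenly among the other states; all names and parameter values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(obs, means, sd, stay_prob):
    """Most likely state path for a Gaussian-emission HMM (log-space Viterbi).

    Assumes a non-empty obs, at least two states, a uniform initial
    distribution, and transitions that stay with probability stay_prob.
    """
    n_states = len(means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_trans(i, j):
        return log_stay if i == j else log_switch

    score = [
        math.log(1.0 / n_states) + gaussian_logpdf(obs[0], m, sd) for m in means
    ]
    back_pointers = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at this position.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(
                score[best_i] + log_trans(best_i, j) + gaussian_logpdf(x, means[j], sd)
            )
            pointers.append(best_i)
        score = new_score
        back_pointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical log-ratio signal; states 0, 1, 2 stand for loss, neutral, gain.
signal = [0.0, 0.1, -0.1, 1.0, 0.9, 1.1, 0.0, 0.05]
print(viterbi_segment(signal, means=[-1.0, 0.0, 1.0], sd=0.2, stay_prob=0.9))
```

The result is a single point estimate of the segmentation, which is the limitation the abstract contrasts with Bayesian sampling over parameters and paths.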