Search CORE

17 research outputs found

AphanoDB: a genomic resource for Aphanomyces pathogens.

Author: Couloux Arnaud
Dumas Bernard
Gaulin Elodie
Madoui Mohammed-Amine
Mathé Catherine
San Clemente Hélène
Wincker Patrick
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

BACKGROUND: The Oomycete genus Aphanomyces comprises devastating plant and animal pathogens. However, little is known about the molecular mechanisms underlying pathogenicity of Aphanomyces species. In this study, we report on the development of a public database called AphanoDB, which is dedicated to Aphanomyces genomic data. As a first step, a large collection of Expressed Sequence Tags was obtained from the legume pathogen A. euteiches, then processed and collected into AphanoDB. DESCRIPTION: Two cDNA libraries of A. euteiches were created: one from mycelium grown on synthetic medium and one from mycelium grown in contact with root tissues of the model legume Medicago truncatula. From these libraries, 18,684 expressed sequence tags were obtained and assembled into 7,977 unigenes, which were compared to public databases for annotation. Queries on AphanoDB allow users to retrieve information for each unigene, including similarity to known protein sequences, protein domains and Gene Ontology classification. A statistical analysis of EST frequency under the two growth conditions was also added to the database. CONCLUSION: AphanoDB is a public database with a user-friendly web interface. The sequence report pages form the main web interface and provide all annotation details for each unigene. These interactive pages are easily reached through text, BLAST, Gene Ontology and expression profile search utilities. AphanoDB is available from URL: http://www.polebio.scsv.ups-tlse.fr/aphano/

HAL Evry

Crossref

Springer - Publisher Connector

PubMed Central

HAL-CEA
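The AphanoDB abstract mentions a statistical analysis of EST frequency between the two growth conditions but does not say which statistic is used. As an illustration only, the sketch below implements one widely used test for comparing EST counts between two libraries, the Audic-Claverie test: the probability of seeing y ESTs for a gene in one library, given x ESTs in the other, scaled by the two library sizes. This is a swapped-in technique, not necessarily AphanoDB's; the function names and the library sizes in the usage lines are hypothetical.

```python
import math

# Sketch of the Audic-Claverie test (assumed stand-in; hypothetical names,
# not part of AphanoDB).


def audic_claverie_pmf(x: int, y: int, n1: int, n2: int) -> float:
    """P(y | x): probability of y ESTs for a gene in library 2 (n2 ESTs in
    total), given x ESTs for the same gene in library 1 (n1 ESTs in total)."""
    ratio = n2 / n1
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(ratio)
    )
    return math.exp(log_p)


def audic_claverie_upper_p(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided p-value P(Y >= y | x): evidence that the gene is
    over-represented in library 2 relative to library 1."""
    below = sum(audic_claverie_pmf(x, k, n1, n2) for k in range(y))
    return max(0.0, 1.0 - below)


# Hypothetical gene: 2 ESTs on synthetic medium vs 15 in contact with roots,
# with hypothetical library sizes of 9,000 ESTs each.
print(audic_claverie_upper_p(2, 15, 9000, 9000))
```

With equal library sizes this prints a value of about 0.0012. Because the p-value is computed as one minus a finite sum, very small p-values lose precision; summing the upper tail directly is more accurate when that matters.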

A Bayesian Nonparametric Method for Prediction in EST Analysis

Author: Antonio Lijoi
Igor Prünster
Ramsés H. Mena

In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates the available information into prediction in a statistically rigorous way. Our proposal has appealing properties compared with frequentist nonparametric methods, which become unstable when prediction is required for large future samples. The EST libraries studied in Susko and Roger (2004) with frequentist methods are analyzed in detail.

Research Papers in Economics
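The Lijoi, Prünster and Mena abstract does not name its prior. A common choice in this line of work is the two-parameter Poisson-Dirichlet (Pitman-Yor) prior with discount σ in (0, 1) and concentration θ > −σ. Assuming that prior, the quantities listed in the abstract have closed forms in terms of the number of reads n and the number k of distinct genes observed. The probability that the next read is a new gene is (θ + kσ)/(θ + n), and one minus this gives a coverage estimate in the sense of the total abundance of genes already seen. The expected number of new genes in m further reads is (k + θ/σ)[(θ + n + σ)_m/(θ + n)_m − 1], where (a)_m is the rising factorial. The sketch fixes σ and θ by hand, whereas in practice they would be estimated from the data. The function names and numbers are hypothetical.

```python
import math

# Sketch assuming a Pitman-Yor (two-parameter Poisson-Dirichlet) prior;
# sigma and theta are fixed by hand, and all names are hypothetical.


def log_rising(a: float, m: int) -> float:
    """log of the rising factorial (a)_m = a (a + 1) ... (a + m - 1), a > 0."""
    return math.lgamma(a + m) - math.lgamma(a)


def new_gene_prob(n: int, k: int, sigma: float, theta: float) -> float:
    """Probability that read n + 1 is a gene not seen among the first n reads."""
    return (theta + k * sigma) / (theta + n)


def expected_new_genes(n: int, k: int, m: int, sigma: float, theta: float) -> float:
    """Expected number of new genes in m further reads, given k distinct
    genes among the first n reads (requires 0 < sigma < 1)."""
    log_ratio = log_rising(theta + n + sigma, m) - log_rising(theta + n, m)
    return (k + theta / sigma) * math.expm1(log_ratio)


def discovery_rate(n: int, k: int, m: int, sigma: float, theta: float) -> float:
    """Expected probability that read n + m + 1 is new: the increment of the
    expected new-gene curve between m and m + 1 further reads."""
    return (expected_new_genes(n, k, m + 1, sigma, theta)
            - expected_new_genes(n, k, m, sigma, theta))


# Illustrative numbers only: 1,000 reads containing 400 distinct genes.
n, k, sigma, theta = 1000, 400, 0.5, 10.0
print("coverage:", 1 - new_gene_prob(n, k, sigma, theta))
for m in (1000, 5000, 20000):
    print(m, expected_new_genes(n, k, m, sigma, theta),
          discovery_rate(n, k, m, sigma, theta))
```

Working on the log scale with `lgamma` keeps the rising-factorial ratio stable for large future sample sizes m. With m = 1 the expected count reduces to the new-gene probability, which is a quick consistency check.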

An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags

Author: Baroiller Jean-Francois
Carleton Karen L
Conte Matthew A
D'Cotta Helena
di Palma Federica
Howe Aimee E
Kocher Thomas D
Lee Bo-Young
Pepey Elodie
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Large collections of expressed sequence tags (ESTs) are a fundamental resource for analysis of gene expression and annotation of genome sequences. We generated 116,899 ESTs from 17 normalized and two non-normalized cDNA libraries representing 16 tissues from tilapia, a cichlid fish widely used in aquaculture and biological research. RESULTS: The ESTs were assembled into 20,190 contigs and 36,028 singletons, for a total of 56,218 unique sequences and a total assembled length of 35,168,415 bp. Over the whole project, one unique sequence was discovered for every 2.079 sequence reads. Of these unique sequences, 17,722 (31.5%) had significant BLAST hits (e-value < 10^-10) to the UniProt database. CONCLUSION: Normalization of the cDNA pools with double-stranded nuclease allowed us to efficiently sequence a large collection of ESTs. These sequences are an important resource for studies of gene expression, comparative mapping and annotation of the forthcoming tilapia genome sequence.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Agritrop

Digital Repository at the University of Maryland
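The summary figures in the tilapia abstract are mutually consistent. A quick recomputation from the reported counts confirms this without using any data beyond the abstract.

```python
# Counts taken directly from the tilapia EST abstract.
contigs, singletons = 20_190, 36_028
reads, uniprot_hits = 116_899, 17_722

unique = contigs + singletons          # unique sequences after assembly
reads_per_unique = reads / unique      # reads sequenced per unique sequence found
hit_percent = 100 * uniprot_hits / unique

print(unique, round(reads_per_unique, 3), round(hit_percent, 1))  # 56218 2.079 31.5
```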

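Comparison of comprehensive gene expression between proliferative tissue associated with proliferative vitreoretinopathy and secondary epiretinal membrane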

Author: Asato Ryo
安里良

Kyushu University Institutional Repository

Large deviation principles for the Ewens-Pitman sampling model

Author: Stefano Favaro
Shui Feng
Publication date: 28/06/2014

Let $M_{l,n}$ be the number of blocks with frequency $l$ in the exchangeable random partition induced by a sample of size $n$ from the Ewens-Pitman sampling model. We show that, as $n$ tends to infinity, $n^{-1}M_{l,n}$ satisfies a large deviation principle, and we characterize the corresponding rate function. A conditional counterpart of this large deviation principle is also presented. Specifically, given an initial sample of size $n$ from the Ewens-Pitman sampling model, we consider an additional sample of size $m$. For any fixed $n$ and as $m$ tends to infinity, we establish a large deviation principle for the conditional number of blocks with frequency $l$ in the enlarged sample, given the initial sample. Interestingly, the conditional and unconditional large deviation principles coincide; that is, the given initial sample has no long-lasting impact. Potential applications of our results are discussed in the context of Bayesian nonparametric inference for discovery probabilities. Comment: 30 pages, 2 figures

arXiv.org e-Print Archive

CiteSeerX

Institutional Research Information System University of Turin
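The quantity M_{l,n} in the Favaro and Feng abstract can be made concrete by simulation. The sketch below draws a partition from the Ewens-Pitman model through its sequential (two-parameter Chinese restaurant) construction and tabulates n^{-1} M_{l,n}. It assumes the usual parametrization with discount sigma in [0, 1) and concentration theta > -sigma, since the abstract does not fix notation. It only illustrates the quantity and does not check the large deviation principle. The function names are hypothetical.

```python
import random
from collections import Counter

# Sketch under the usual (sigma, theta) parametrization; hypothetical names.


def ewens_pitman_partition(n: int, sigma: float, theta: float,
                           rng: random.Random) -> list[int]:
    """Block sizes of a random partition of n items from the Ewens-Pitman
    model (0 <= sigma < 1, theta > -sigma), built one item at a time."""
    sizes: list[int] = []
    for _ in range(n):
        if not sizes:                 # the first item always opens a block
            sizes.append(1)
            continue
        # With i items already placed in k blocks: join block j with
        # probability (n_j - sigma)/(theta + i), or open a new block with
        # probability (theta + k*sigma)/(theta + i). choices() normalizes.
        weights = [s - sigma for s in sizes] + [theta + len(sizes) * sigma]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(sizes):
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes


def block_counts(sizes: list[int]) -> Counter:
    """M_{l,n}: number of blocks of size exactly l, for every observed l."""
    return Counter(sizes)


n = 2000
sizes = ewens_pitman_partition(n, sigma=0.5, theta=1.0, rng=random.Random(0))
m = block_counts(sizes)
print({l: m[l] / n for l in (1, 2, 3)})   # n^{-1} M_{l,n} for l = 1, 2, 3
```

The sequential construction costs O(n k) for k blocks, which is adequate for exploring moderate n. Repeating the draw with different seeds shows how n^{-1} M_{l,n} concentrates as n grows.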

A new estimator of the discovery probability

Author: Favaro Stefano
Lijoi A.
Pruenster Igor
Publication venue: Wiley
Publication date: 01/01/2012

Institutional Research Information System University of Turin

Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries

Author: Cui Liying
dePamphilis Claude W
Lindsay Bruce G
Marion Josh
Wall P Kerr
Wang Ji-Ping Z
Zhang Jiaxuan
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insight into sequencing efficiency for experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed. RESULTS: We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. A simulation study establishes the superior performance of the new prediction method over an existing approach. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in four different cDNA libraries of Arabidopsis thaliana varies from 9,155 (root) to 12,005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%. CONCLUSION: The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
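The gene capture model in the Wang et al. abstract rests on a Poisson sampling idea. If gene i makes up a fraction p_i of the transcripts in a library, its EST count in a sample of t reads is approximately Poisson with mean p_i t. The expected number of distinct genes captured is therefore the sum over genes of 1 − exp(−p_i t). The sketch below evaluates this expectation for a known, hypothetical abundance profile. It does not reproduce the paper's actual contribution, which is estimating the abundance (mixing) distribution from an initial EST sample. The function name and profile are illustrative.

```python
import math

# Sketch: expected gene capture for a KNOWN abundance profile (assumed);
# the paper's estimation of that profile is not reproduced. Names hypothetical.


def expected_genes_captured(abundances: list[float], t: int) -> float:
    """Expected number of distinct genes seen in t ESTs, treating gene i's
    count as Poisson with mean p_i * t (abundances p_i sum to 1)."""
    return sum(1.0 - math.exp(-p * t) for p in abundances)


# Hypothetical library: 10,000 genes with geometrically decaying abundance.
weights = [0.999 ** i for i in range(10_000)]
total = sum(weights)
profile = [w / total for w in weights]
for t in (1_000, 5_000, 20_000, 100_000):
    print(t, round(expected_genes_captured(profile, t)))
```

The curve flattens as t grows because additional reads mostly resample abundant transcripts. This is why estimating the rare-gene tail matters when predicting capture in a larger future sample.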

Rediscovery of Good-Turing estimators via Bayesian nonparametrics

Author: Favaro Stefano
Nipoti Bernardo
Teh Yee Whye
Publication date: 16/06/2015

The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, design of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this paper we investigate the relationships between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.

arXiv.org e-Print Archive

CiteSeerX

Institutional Research Information System University of Turin
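The two estimators compared in the Favaro, Nipoti and Teh abstract can be written out explicitly. Let m_l be the number of species observed exactly l times in a sample of size n, and let k be the number of distinct species. The classical, unsmoothed Good-Turing estimator of the probability that the next draw is a species seen l times is (l + 1) m_{l+1}/n. Under a two-parameter Poisson-Dirichlet prior with parameters (σ, θ), the corresponding posterior-mean estimator is (θ + kσ)/(θ + n) for l = 0 and (l − σ) m_l/(θ + n) for l ≥ 1. The sketch computes both on a toy sample. It omits the smoothing step under which the abstract states the two are asymptotically equivalent, and the function names are hypothetical.

```python
from collections import Counter

# Sketch: unsmoothed Good-Turing vs. Poisson-Dirichlet posterior means;
# hypothetical names, toy data.


def freq_of_freqs(sample) -> Counter:
    """m_l: how many species appear exactly l times in the sample."""
    return Counter(Counter(sample).values())


def good_turing(sample, l: int) -> float:
    """Unsmoothed Good-Turing estimate (l + 1) * m_{l+1} / n."""
    m = freq_of_freqs(sample)
    return (l + 1) * m[l + 1] / len(sample)


def pitman_yor(sample, l: int, sigma: float, theta: float) -> float:
    """Posterior-mean estimate under a two-parameter Poisson-Dirichlet prior."""
    n, m = len(sample), freq_of_freqs(sample)
    if l == 0:
        k = sum(m.values())                 # number of distinct species
        return (theta + k * sigma) / (theta + n)
    return (l - sigma) * m[l] / (theta + n)


reads = ["a", "a", "b", "c", "c", "c", "d"]   # toy sample: n = 7, k = 4
for l in range(4):
    print(l, good_turing(reads, l), pitman_yor(reads, l, sigma=0.5, theta=1.0))
```

Note that the Good-Turing estimate for frequency l uses m_{l+1}, while the Bayesian one uses m_l. This mismatch is why the equivalence in the paper needs a smoothed version of the Good-Turing counts. Both sets of estimates sum to one over l.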

Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics

Author: Arbel Julyan
Favaro Stefano
Nipoti Bernardo
Teh Yee Whye
Publication venue: Institute of Statistical Science
Publication date: 02/07/2016

Given a sample of size $n$ from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability $D_{n}(l)$ that the $(n+1)$-th draw coincides with a species with frequency $l$ in the sample, for any $l=0,1,\ldots,n$. This paper contributes to the methodology of Bayesian nonparametric inference for $D_{n}(l)$. Specifically, under the general framework of Gibbs-type priors, we show how to derive credible intervals for a Bayesian nonparametric estimator of $D_{n}(l)$, and we investigate the large-$n$ asymptotic behaviour of such an estimator. Of particular interest are special cases of our results obtained under the specification of the two-parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. With respect to these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best of our knowledge, this illustration provides the first comparative study between the two-parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparametric inference for $D_{n}(l)$.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Oxford University Research Archive

Institutional Research Information System University of Turin
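The Arbel, Favaro, Nipoti and Teh abstract works with general Gibbs-type priors. In the special case of the two-parameter Poisson-Dirichlet prior (discount σ, concentration θ), the posterior distribution of D_n(l) is a Beta distribution. For l = 0 it is Beta(θ + kσ, n − kσ). For l ≥ 1 it is Beta(m_l(l − σ), θ + n − m_l(l − σ)), where k is the number of distinct species and m_l the number of species with frequency l. This follows from the Dirichlet form of the posterior weights of the observed species. The sketch below uses that fact to obtain an equal-tailed credible interval by Monte Carlo with only the standard library. It is not the paper's general Gibbs-type construction, and the function names are hypothetical.

```python
import random

# Sketch for the two-parameter Poisson-Dirichlet special case only (assumed);
# hypothetical names. m_l is ignored when l == 0.


def posterior_beta_params(n: int, k: int, m_l: int, l: int,
                          sigma: float, theta: float) -> tuple[float, float]:
    """Beta parameters (a, b) of the posterior of D_n(l)."""
    a = theta + k * sigma if l == 0 else m_l * (l - sigma)
    return a, theta + n - a


def credible_interval(n, k, m_l, l, sigma, theta,
                      level=0.95, draws=20_000, seed=0):
    """Equal-tailed credible interval for D_n(l) from Monte Carlo Beta draws."""
    a, b = posterior_beta_params(n, k, m_l, l, sigma, theta)
    if a <= 0:            # no species with frequency l: D_n(l) = 0 a posteriori
        return 0.0, 0.0
    rng = random.Random(seed)
    xs = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = xs[round((1 - level) / 2 * draws)]
    hi = xs[round((1 + level) / 2 * draws) - 1]
    return lo, hi


# Toy sample of n = 7 with k = 4 species; one species seen twice (m_2 = 1).
print(credible_interval(7, 4, 1, 2, sigma=0.5, theta=1.0))
```

The posterior mean a/(a + b) = a/(θ + n) coincides with the Poisson-Dirichlet point estimator of D_n(l). A Beta quantile function, where available, would give the interval endpoints exactly instead of by simulation.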

Identification and expression profiling of amino acid biosynthesis genes of the psychrophilic yeast Glaciozyma antarctica

Author: Abdul Munir Abdul Murad
Farah Diba Abu Bakar
Izwan Bharudin
Mohd Faizal Abu Bakar
Nazalan Najimudin
Nor Muhammad Mahadi
Radziah Zolkefli
Rosli Md. Illias
Shazilah Kamaruddin
Publication venue: Penerbit Universiti Kebangsaan Malaysia (UKM Press)
Publication date: 01/08/2018

The mechanisms of amino acid uptake and production in psychrophilic microorganisms that survive and proliferate in extremely cold environments are still not fully understood. The objective of this study was to identify the genes involved in amino acid production in the psychrophilic yeast Glaciozyma antarctica and to determine the expression of these genes in the presence and absence of amino acids in the growth medium. Genes were identified by generating expressed sequence tags (ESTs) from two cDNA libraries constructed from cells cultured in a complex growth medium and in a minimal growth medium without amino acids. A total of 3552 cDNA clones from each library were randomly selected for sequencing, yielding 1492 unique transcripts (complex medium) and 1928 unique transcripts (minimal medium). Similarity analysis identified genes encoding proteins involved in free amino acid uptake, amino acid biosynthesis and amino acid recycling, based on the pathways used by the model yeast Saccharomyces cerevisiae. Gene expression analysis by RT-qPCR showed that expression of the genes encoding proteins involved in free amino acid uptake, namely permeases, was high in the complex medium, whereas expression of most genes encoding proteins involved in amino acid recycling and biosynthesis was high in the minimal medium. In conclusion, the genes involved in amino acid production and uptake in psychrophilic microorganisms are conserved, as in mesophilic microorganisms, and the expression of these genes is induced by the presence or absence of free amino acids in the environment.

UKM Journal Article Repository