2,102 research outputs found
Computational comparison of two mouse draft genomes and the human golden path
BACKGROUND: The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods. RESULTS: We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes. CONCLUSION: The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome (BAC) clone regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes, while some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable, publicly available online CSEdb will expedite new discoveries through comparative genomics.
Boosting with stumps for predicting transcription start sites
Promoter prediction is a difficult but important problem in gene finding, and it is critical for elucidating the regulation of gene expression. We introduce a new promoter prediction program, CoreBoost, which applies a boosting technique with stumps to select important small-scale as well as large-scale features. CoreBoost improves greatly on locating transcription start sites. We also demonstrate that by further utilizing some tissue-specific information, better accuracy can be achieved.
Experimental and numerical observation of dark and bright breathers in the band gap of a diatomic electrical lattice
We observe dark and bright intrinsic localized modes (ILMs), also known as discrete breathers, experimentally
and numerically in a diatomic-like electrical lattice. The experimental generation of dark ILMs by driving a
dissipative lattice with spatially homogeneous amplitude is, to our knowledge, unprecedented. In addition, the
experimental manifestation of bright breathers within the band gap is also novel in this system. In experimental
measurements the dark modes appear just below the bottom of the top branch in frequency. As the frequency is
then lowered further into the band gap, the dark ILMs persist, until the nonlinear localization pattern reverses
and bright ILMs appear on top of the finite background. Deep into the band gap, only a single bright structure
survives in a lattice of 32 nodes. The vicinity of the bottom band also features bright and dark self-localized
excitations. These results pave the way for a more systematic study of dark breathers and their bifurcations in
diatomic-like chains.
Funding: VI Plan Propio of the University of Seville, Spain (VI PPITUS); AEI/FEDER, UE MAT2016- 79866-
Using quality scores and longer reads improves accuracy of Solexa read mapping
BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. RESULTS: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at http://rulai.cshl.edu/rmap/. CONCLUSION: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
A Dispersive Analysis on the and Resonances in Processes
We estimate the di-photon coupling of , and
resonances in a coupled channel dispersive approach. The di-photon
coupling is also reinvestigated using a single channel matrix for
scattering with better analyticity property, and it is found to be
significantly smaller than that of a state. Especially we also
estimate the di-photon coupling of the third sheet pole located near
threshold, denoted as .
It is argued that this third sheet pole may originate from a coupled
channel Breit-Wigner description of the resonance.
Comment: 24 pages and 13 eps figures. A numerical bug in the previous version is
fixed. Some results changed. References and new figures added. Version to
appear in Phys. Rev.
TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies
In order to understand gene regulation, accurate and comprehensive knowledge of transcriptional regulatory elements is essential. Here, we report our efforts in building a mammalian Transcriptional Regulatory Element Database (TRED) with associated data analysis functions. It collects cis- and trans-regulatory elements and is dedicated to easy data access and analysis for both single-gene-based and genome-scale studies. Distinguishing features of TRED include: (i) relatively complete genome-wide promoter annotation for human, mouse and rat; (ii) availability of gene transcriptional regulation information including transcription factor binding sites and experimental evidence; (iii) data accuracy ensured by hand curation; (iv) an efficient user interface for easy and flexible data retrieval; and (v) implementation of on-the-fly sequence analysis tools. TRED can provide good training datasets for further genome-wide cis-regulatory element prediction and annotation, assist detailed functional studies and facilitate the deciphering of gene regulatory networks (http://rulai.cshl.edu/TRED).