Are we there yet? Reliably estimating the completeness of plant genome sequences
Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small genomes to gigantic, repeat-rich or polyploid ones. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which complicates defining the expected gene space and, consequently, limits the accuracy of estimates used to assess genome and gene space completeness. We argue that, depending on the intended applications of a genome sequencing project, different completeness scores should be determined for the genome assembly and/or the gene space. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.
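To make the marker-gene idea concrete, here is a minimal sketch of how such a score can be tallied, loosely in the spirit of tools such as BUSCO. It is not the method of this paper or of any specific tool: the input format (a dict of alignment coverage fractions per marker), the function name `marker_completeness`, and the 0.95 "complete" threshold are all illustrative assumptions.

```python
from collections import Counter


def marker_completeness(expected, hits, full_cov=0.95):
    """Classify expected marker genes as single-copy, duplicated, fragmented or missing.

    expected: iterable of marker-gene IDs assumed to be present (hypothetical input).
    hits: dict mapping marker ID -> list of alignment coverage fractions (0..1),
          one entry per matching locus found in the assembly or annotation.
    full_cov: coverage at or above which a hit counts as complete (illustrative threshold).
    Returns the percentage of expected markers in each category.
    """
    expected = list(expected)
    counts = Counter()
    for marker in expected:
        covs = hits.get(marker, [])
        complete = [c for c in covs if c >= full_cov]
        if len(complete) == 1:
            counts["single"] += 1       # exactly one full-length copy
        elif len(complete) > 1:
            counts["duplicated"] += 1   # several full-length copies
        elif covs:
            counts["fragmented"] += 1   # only partial hits
        else:
            counts["missing"] += 1      # no hit at all
    total = len(expected)
    return {k: 100.0 * counts[k] / total
            for k in ("single", "duplicated", "fragmented", "missing")}


if __name__ == "__main__":
    markers = ["m1", "m2", "m3", "m4"]
    hits = {"m1": [0.99], "m2": [0.98, 0.97], "m3": [0.40]}
    print(marker_completeness(markers, hits))
    # {'single': 25.0, 'duplicated': 25.0, 'fragmented': 25.0, 'missing': 25.0}
```

The score depends entirely on the `expected` list. This is why the variable gene space described above matters: a marker that is genuinely absent from the sequenced individual is counted as "missing" and lowers the score even when the assembly is perfect.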
Methods to study splicing from high-throughput RNA Sequencing data
The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has made this a challenging task. In the last few years, a plethora of tools have been developed that allow researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address:
1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations.
2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods.
3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events.
4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions.
5) Visualizing splicing regulation. Various tools facilitate the visualization of RNA-Seq data in the context of alternative splicing.
In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods.
Comment: 31 pages, 1 figure, 9 tables. Small corrections added
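The review deliberately leaves out the models behind each tool. For readers new to the event-level view of quantification and differential splicing (categories 3 and 4), a minimal sketch of the most common event metric, percent spliced in (PSI), and its change between two conditions (delta PSI) may help. This is not taken from the review: the function names, the normalization by the number of junctions each count was pooled over, and the default of two inclusion junctions versus one skipping junction for a cassette exon are simplifying assumptions. Real tools add read-length corrections and statistical models of uncertainty.

```python
def psi(inclusion_reads, exclusion_reads, inc_positions=2, exc_positions=1):
    """Percent spliced in (as a fraction) for a cassette exon from junction read counts.

    inclusion_reads: reads on the junctions that include the exon (assumed two junctions).
    exclusion_reads: reads on the junction that skips the exon (assumed one junction).
    Counts are divided by the number of junctions they were pooled over, so that
    inclusion is not favored simply because it has more junctions to sample from.
    Returns None when no informative reads are available.
    """
    inc = inclusion_reads / inc_positions
    exc = exclusion_reads / exc_positions
    if inc + exc == 0:
        return None
    return inc / (inc + exc)


def delta_psi(cond_a, cond_b):
    """PSI(condition B) - PSI(condition A); each argument is an (inclusion, exclusion) pair."""
    a, b = psi(*cond_a), psi(*cond_b)
    if a is None or b is None:
        return None
    return b - a


if __name__ == "__main__":
    print(psi(60, 10), psi(20, 30), delta_psi((60, 10), (20, 30)))
    # 0.75 0.25 -0.5
```

A negative delta PSI here means the exon is skipped more often in condition B. Deciding whether such a change is significant requires replicates and a statistical model, which is where the differential splicing tools surveyed in the review differ most.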
A Systemic Receptor Network Triggered by Human Cytomegalovirus Entry
Virus entry is a multistep process that triggers a variety of cellular pathways interconnecting into a complex network, yet the molecular complexity of this network remains largely unresolved. Here, using a systems biology approach, we reveal a systemic virus-entry network initiated by human cytomegalovirus (HCMV), a widespread opportunistic pathogen. This network contains all known interactions and functional modules (i.e. groups of proteins) that respond in a coordinated way to HCMV entry. The numbers of both genes and functional modules activated in this network decline dramatically within 25 min post-infection. While modules annotated as receptor system, ion transport, and immune response are continuously activated during the entire process of HCMV entry, those for cell adhesion and skeletal movement are specifically activated during early viral attachment, and those for immune response during virus entry. HCMV entry requires a complex receptor network involving different cellular components, comprising not only cell surface receptors but also pathway components in signal transduction, skeletal development, immune response, endocytosis, ion transport, macromolecule metabolism and chromatin remodeling. Interestingly, genes that function in chromatin remodeling are the most abundant in this receptor system, suggesting that global modulation of transcription is one of the most important events in HCMV entry. Results of in silico knockout further reveal that this entire receptor network is primarily controlled by multiple elements, such as EGFR (epidermal growth factor receptor) and SLC10A1 (sodium/bile acid cotransporter family, member 1). Thus, our results demonstrate that a complex systemic network, whose components coordinate efficiently in time and space, contributes to virus entry.
Comment: 26 pages
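The abstract does not say how the in silico knockouts were carried out. One simple way to rank network elements by how much they "control" a network is to delete each node in turn and count how many other nodes lose every path from the entry point. The sketch below does this with a breadth-first search on a made-up toy graph. All node names and edges are hypothetical, and this reachability measure is a stand-in chosen for illustration, not the authors' method, which may rely on different perturbation or centrality measures.

```python
from collections import deque


def reachable(graph, source, removed=frozenset()):
    """Return the set of nodes reachable from `source` by BFS, skipping `removed` nodes."""
    if source in removed:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in seen and nbr not in removed:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def knockout_impact(graph, source):
    """For each node reachable from `source`, count how many *other* nodes become
    unreachable when that node is knocked out. Sorted from largest impact down."""
    baseline = reachable(graph, source)
    impact = {}
    for node in baseline - {source}:
        after = reachable(graph, source, frozenset({node}))
        impact[node] = len(baseline) - len(after) - 1  # do not count the removed node itself
    return dict(sorted(impact.items(), key=lambda kv: (-kv[1], kv[0])))


if __name__ == "__main__":
    # Hypothetical directed signaling graph: entry -> receptors -> kinase -> transcription factors.
    toy = {
        "entry": ["R1", "R2"],
        "R1": ["K1"],
        "R2": ["K1"],
        "K1": ["TF1", "TF2"],
        "TF1": [],
        "TF2": [],
    }
    print(knockout_impact(toy, "entry"))
    # {'K1': 2, 'R1': 0, 'R2': 0, 'TF1': 0, 'TF2': 0}
```

The redundant receptors R1 and R2 each score zero because the other still provides a path, while the bottleneck K1 disconnects both downstream nodes. Reachability is a coarse proxy; a study like this one would likely need edge weights or time-resolved activation data to rank elements meaningfully.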
Hierarchical coexistence of universality and diversity controls robustness and multi-functionality in intermediate filament protein networks
Proteins constitute the elementary building blocks of a vast variety of biological materials such as cellular protein networks, spider silk or bone, where they create extremely robust, multi-functional materials by self-organization of structures over many length- and time scales, from nano to macro. Some structural features are found in many different tissues; that is, they are highly conserved. Examples of such universal building blocks include alpha-helices, beta-sheets or tropocollagen molecules. In contrast, other features are highly specific to tissue types, such as particular filament assemblies, beta-sheet nanocrystals in spider silk or tendon fascicles. These examples illustrate that the coexistence of universality and diversity, referred to in the following as the universality-diversity paradigm (UDP), is an overarching feature of protein materials. This paradigm poses a paradox: how can a structure be universal and diverse at the same time? In protein materials, the coexistence of universality and diversity is enabled by hierarchies, which serve as an additional dimension beyond the 3D or 4D physical space. This may be crucial for understanding how their structure and properties are linked, and how these materials combine seemingly disparate properties such as strength and robustness. Here we illustrate how the UDP makes it possible to unify universal building blocks and highly diversified patterns through the formation of hierarchical structures that lead to multi-functional, robust yet highly adapted structures. We illustrate these concepts in an analysis of three types of intermediate filament proteins: vimentin, lamin and keratin.