
    Are we there yet? Reliably estimating the completeness of plant genome sequences

    Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small genomes to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

    A Systemic Receptor Network Triggered by Human Cytomegalovirus Entry

    Virus entry is a multistep process that triggers a variety of cellular pathways interconnecting into a complex network, yet the molecular complexity of this network remains largely unresolved. Here, by employing a systems biology approach, we reveal a systemic virus-entry network initiated by human cytomegalovirus (HCMV), a widespread opportunistic pathogen. This network contains all known interactions and functional modules (i.e. groups of proteins) coordinately responding to HCMV entry. The number of both genes and functional modules activated in this network declines dramatically within 25 min post-infection. While modules annotated as receptor system, ion transport, and immune response are continuously activated during the entire process of HCMV entry, those for cell adhesion and skeletal movement are specifically activated during early viral attachment, and those for immune response during virus entry. HCMV entry requires a complex receptor network involving different cellular components, comprising not only cell surface receptors, but also pathway components in signal transduction, skeletal development, immune response, endocytosis, ion transport, macromolecule metabolism and chromatin remodeling. Interestingly, genes that function in chromatin remodeling are the most abundant in this receptor system, suggesting that global modulation of transcription is one of the most important events in HCMV entry. Results of in silico knockout further reveal that this entire receptor network is primarily controlled by multiple elements, such as EGFR (epidermal growth factor receptor) and SLC10A1 (sodium/bile acid cotransporter family, member 1). Thus, our results demonstrate that a complex systemic network, in which components coordinate efficiently in time and space, contributes to virus entry. Comment: 26 page

    Hierarchical coexistence of universality and diversity controls robustness and multi-functionality in intermediate filament protein networks

    Proteins constitute the elementary building blocks of a vast variety of biological materials such as cellular protein networks, spider silk or bone, where they create extremely robust, multi-functional materials by self-organization of structures over many length and time scales, from nano to macro. Some of the structural features are commonly found in many different tissues, that is, they are highly conserved. Examples of such universal building blocks include alpha-helices, beta-sheets or tropocollagen molecules. In contrast, other features are highly specific to tissue types, such as particular filament assemblies, beta-sheet nanocrystals in spider silk or tendon fascicles. These examples illustrate that the coexistence of universality and diversity, referred to in the following as the universality-diversity paradigm (UDP), is an overarching feature in protein materials. This paradigm is a paradox: How can a structure be universal and diverse at the same time? In protein materials, the coexistence of universality and diversity is enabled by utilizing hierarchies, which serve as an additional dimension beyond the 3D or 4D physical space. This may be crucial to understand how their structure and properties are linked, and how these materials are capable of combining seemingly disparate properties such as strength and robustness. Here we illustrate how the UDP makes it possible to unify universal building blocks and highly diversified patterns through the formation of hierarchical structures that lead to multi-functional, robust yet highly adapted structures. We illustrate these concepts in an analysis of three types of intermediate filament proteins, including vimentin, lamin and keratin.