17 research outputs found

    Plant protein-coding gene families: emerging bioinformatics approaches

    Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, with similar biological functions. In plants, the size and role of gene families have been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted, and how useful the extracted information is.

    A Human-Specific De Novo Protein-Coding Gene Associated with Human Brain Functions

    To understand whether any human-specific new genes may be associated with human brain functions, we computationally screened the genetic vulnerable factors identified through Genome-Wide Association Studies and linkage analyses of nicotine addiction and found one human-specific de novo protein-coding gene, FLJ33706 (alternative gene symbol C20orf203). Cross-species analysis revealed interesting evolutionary paths of how this gene had originated from noncoding DNA sequences: insertion of repeat elements especially Alu contributed to the formation of the first coding exon and six standard splice junctions on the branch leading to humans and chimpanzees, and two subsequent substitutions in the human lineage escaped two stop codons and created an open reading frame of 194 amino acids. We experimentally verified FLJ33706's mRNA and protein expression in the brain. Real-Time PCR in multiple tissues demonstrated that FLJ33706 was most abundantly expressed in brain. Human polymorphism data suggested that FLJ33706 encodes a protein under purifying selection. A specifically designed antibody detected its protein expression across human cortex, cerebellum and midbrain. Immunohistochemistry study in normal human brain cortex revealed the localization of FLJ33706 protein in neurons. Elevated expressions of FLJ33706 were detected in Alzheimer's brain samples, suggesting the role of this novel gene in human-specific pathogenesis of Alzheimer's disease. FLJ33706 provided the strongest evidence so far that human-specific de novo genes can have protein-coding potential and differential protein expression, and be involved in human brain functions

    The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus

    Background: Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery. Results: We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses. Conclusions: Our sequencing, assembly and annotation framework provide a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms. [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven Additional files are available.]

    High throughput estimation of functional cell activities reveals disease mechanisms and predicts relevant clinical outcomes

    This work is supported by grants BIO2014-57291-R from the Spanish Ministry of Economy and Competitiveness and “Plataforma de Recursos Biomoleculares y Bioinformáticos” PT13/0001/0007 from the ISCIII, both co-funded with European Regional Development Funds (ERDF); PROMETEOII/2014/025 from the Generalitat Valenciana (GVA-FEDER); Fundació la Marató TV3 (ref. 20133134); and EU H2020-INFRADEV-1-2015-1 ELIXIR-EXCELERATE (ref. 676559) and EU FP7-People ITN Marie Curie Project (ref. 316861).

    CRK: an evolutionary approach for distinguishing biologically relevant interfaces from crystal contacts

    Protein crystals contain two different types of interfaces: biologically relevant ones, observed in protein-protein complexes and oligomeric proteins, and nonspecific ones, corresponding to crystal lattice contacts. Because of the increasing complexity of the objects being tackled in structural biology, distinguishing biological contacts from crystal contacts is not always a trivial task and can lead to wrong interpretation of macromolecular structures. We devised an approach (CRK, core-rim K(a)/K(s) ratio) for distinguishing biologically relevant interfaces from nonspecific ones. Given a protein-protein interface, CRK finds a set of homologs to the sequences of the proteins involved in the interface, retrieves and aligns the corresponding coding sequences, on which it carries out a residue-by-residue K(a)/K(s) ratio (omega) calculation. It divides interface residues into a "rim" and a "core" set and analyzes the selection pressure on the residues belonging to the two sets. We developed and tested CRK on different datasets and test cases, consisting of biologically relevant contacts, nonspecific ones or of both types. The method proves very effective in distinguishing the two categories of interfaces, with an overall accuracy rate of 84%. As it relies on different principles when compared with existing tools, CRK is optimally suited to be used in combination with them. In addition, CRK has potential applications in the validation of structures of oligomeric proteins and protein complexes
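
    The core-versus-rim comparison described in this abstract can be illustrated with a minimal sketch. It assumes per-residue K(a)/K(s) values are already computed, and it compares the two sets by a ratio of means; the function names, the averaging, and the threshold of 1.0 are illustrative assumptions, not details taken from the CRK paper.

```python
from statistics import mean


def core_rim_ratio(omega, core, rim):
    """Ratio of mean Ka/Ks over core residues to mean Ka/Ks over rim residues.

    omega: dict mapping residue index -> Ka/Ks value (assumed precomputed).
    core, rim: sets of residue indices for the two parts of the interface.
    Comparing the sets by a ratio of means is an assumption of this sketch.
    """
    core_vals = [omega[i] for i in core if i in omega]
    rim_vals = [omega[i] for i in rim if i in omega]
    if not core_vals or not rim_vals:
        raise ValueError("both core and rim need at least one scored residue")
    rim_mean = mean(rim_vals)
    if rim_mean == 0:
        raise ValueError("mean rim Ka/Ks is zero; ratio undefined")
    return mean(core_vals) / rim_mean


def looks_biological(ratio, threshold=1.0):
    """Hypothetical decision rule: core under stronger purifying selection
    than rim (ratio below threshold) hints at a biological interface."""
    return ratio < threshold


# Toy values, invented for illustration only.
omega = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.8}
ratio = core_rim_ratio(omega, core={1, 2}, rim={3, 4})
print(round(ratio, 3), looks_biological(ratio))
```

    A core that evolves more slowly than the rim gives a ratio below 1 here; a real analysis would need the alignment and K(a)/K(s) estimation steps that this sketch leaves out.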