406 research outputs found
The Selaginella Genome Identifies Changes in Gene Content Associated With the Evolution of Vascular Plants
Vascular plants appeared ~410 million years ago, then diverged into several lineages of which only two survive: the euphyllophytes (ferns and seed plants) and the lycophytes. We report here the genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first nonseed vascular plant genome reported. By comparing gene content in evolutionarily diverse taxa, we found that the transition from a gametophyte- to a sporophyte-dominated life cycle required far fewer new genes than the transition from a nonseed vascular to a flowering plant, whereas secondary metabolic genes expanded extensively and in parallel in the lycophyte and angiosperm lineages. Selaginella differs in posttranscriptional gene regulation, including small RNA regulation of repetitive elements, an absence of the trans-acting small interfering RNA pathway, and extensive RNA editing of organellar genes
Analysis of Gap Gene Regulation in a 3D Organism-Scale Model of the Drosophila melanogaster Embryo
The axial bodyplan of Drosophila melanogaster is determined during a process called morphogenesis. Shortly after fertilization, maternal bicoid mRNA is translated into Bicoid (Bcd). This protein establishes a spatially graded morphogen distribution along the anterior-posterior (AP) axis of the embryo. Bcd initiates AP axis determination by triggering expression of gap genes that subsequently regulate each other's expression to form a precisely controlled spatial distribution of gene products. Reaction-diffusion models of gap gene expression on a 1D domain have previously been used to infer complex genetic regulatory network (GRN) interactions by optimizing model parameters with respect to 1D gap gene expression data. Here we construct a finite element reaction-diffusion model with a realistic 3D geometry fit to full 3D gap gene expression data. Though gap gene products exhibit dorsal-ventral asymmetries, we discover that previously inferred gap GRNs yield qualitatively correct AP distributions on the 3D domain only when DV-symmetric initial conditions are employed. Model patterning loses qualitative agreement with experimental data when we incorporate a realistic DV-asymmetric distribution of Bcd. Further, we find that geometry alone is insufficient to account for DV-asymmetries in the final gap gene distribution. Additional GRN optimization confirms that the 3D model remains sensitive to GRN parameter perturbations. Finally, we find that incorporation of 3D data in simulation and optimization does not constrain the search space or improve optimization results
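As a rough illustration of the 1D reaction-diffusion models mentioned in the abstract above, the following is a minimal single-species sketch in Python with NumPy. It is not the authors' finite element 3D model: the explicit finite-difference scheme, the no-flux boundaries, the function name `simulate_1d`, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_1d(u0, production, D=0.1, decay=0.05, dx=1.0, dt=0.1, steps=100):
    """Explicit Euler integration of du/dt = production + D * d2u/dx2 - decay * u.

    Hypothetical sketch: one gene product on a 1D anterior-posterior axis
    with no-flux (zero-gradient) boundaries at both ends.
    """
    u = np.asarray(u0, dtype=float).copy()
    production = np.asarray(production, dtype=float)
    # The explicit scheme is only stable for small enough time steps.
    if D * dt / dx**2 > 0.5:
        raise ValueError("time step too large for explicit diffusion scheme")
    for _ in range(steps):
        # Copy the edge values into ghost cells so no material leaves the domain.
        padded = np.pad(u, 1, mode="edge")
        laplacian = (padded[:-2] - 2.0 * u + padded[2:]) / dx**2
        u = u + dt * (production + D * laplacian - decay * u)
    return u
```

Setting `production` to a nonzero value only near one end of the axis gives a concentration profile that falls off with distance from the source, which is the qualitative shape of a morphogen gradient; a gap gene network would couple several such equations through a regulatory production term.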
A methodology for determining amino-acid substitution matrices from set covers
We introduce a new methodology for the determination of amino-acid
substitution matrices for use in the alignment of proteins. The new methodology
is based on a pre-existing set cover on the set of residues and on the
undirected graph that describes residue exchangeability given the set cover.
For fixed functional forms indicating how to obtain edge weights from the set
cover and, after that, substitution-matrix elements from weighted distances on
the graph, the resulting substitution matrix can be checked for performance
against some known set of reference alignments and for given gap costs. Finding
the appropriate functional forms and gap costs can then be formulated as an
optimization problem that seeks to maximize the performance of the substitution
matrix on the reference alignment set. We give computational results on the
BAliBASE suite using a genetic algorithm for optimization. Our results indicate
that it is possible to obtain substitution matrices whose performance is either
comparable to or surpasses that of several others, depending on the particular
scenario under consideration
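The graph construction described in the abstract above can be sketched as follows, under loudly stated assumptions: two residues are taken to be joined by an edge when they share at least one set of the cover (the abstract does not spell out the rule), and distances are plain hop counts rather than the weighted distances the authors use. The function names and the toy cover are hypothetical.

```python
from collections import deque
from itertools import combinations

def exchange_graph(cover):
    """Build an undirected graph from a set cover of residues.

    Assumption: residues in the same set of the cover are exchangeable,
    so every pair within a set is connected by an edge.
    """
    adj = {}
    for group in cover:
        for residue in group:
            adj.setdefault(residue, set())
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to each reachable residue."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Toy cover over a handful of residues, for illustration only.
toy_cover = [{"I", "L", "V"}, {"V", "A"}, {"D", "E"}]
graph = exchange_graph(toy_cover)
```

A substitution score could then be defined as some decreasing function of the graph distance; choosing that function, together with the edge weights and gap costs, is the optimization problem the abstract describes.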
The effectiveness of position- and composition-specific gap costs for protein similarity searches
The flexibility in gap cost enjoyed by Hidden Markov Models (HMMs) is
expected to afford them better retrieval accuracy than position-specific
scoring matrices (PSSMs). We attempt to quantify the effect of more general gap
parameters by separately examining the influence of position- and
composition-specific gap scores, as well as by comparing the retrieval accuracy
of the PSSMs constructed using an iterative procedure to that of the HMMs
provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments.
We found that position-specific gap penalties have an advantage over uniform
gap costs. We did not explore optimizing distinct uniform gap costs for each
query. For Pfam, PSSMs iteratively constructed from seeds based on HMM
consensus sequences perform equivalently to HMMs that were adjusted to have
constant gap transition probabilities, albeit with much greater variance. We
observed no effect of composition-specific gap costs on retrieval performance.
Comment: 17 pages, 4 figures, 2 tables
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Background
Searching for similarities in protein and DNA databases has become a routine procedure in molecular biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm impractical for similarity searches in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment.
Results
In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVIDIA. CUDA allows direct access to the hardware primitives of the latest-generation G80 Graphics Processing Units (GPUs). Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any previous attempt available on commodity hardware.
Conclusions
The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches
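For readers unfamiliar with the dynamic programming recurrence that the abstract above refers to, here is a minimal CPU sketch of Smith-Waterman scoring in Python. It is not the authors' CUDA implementation: the linear gap penalty, the scoring values, and the function name `smith_waterman` are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the optimal local alignment score of sequences a and b.

    Hypothetical sketch with a simple match/mismatch score and a linear
    gap penalty; real tools use substitution matrices and affine gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

The double loop updates `len(a) * len(b)` cells, which is the quadratic cost the abstract mentions; the GCUPS figure counts how many such cell updates are performed per second.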
A Search for Parent-of-Origin Effects on Honey Bee Gene Expression
Parent-specific gene expression (PSGE) is little known outside of mammals and plants. PSGE occurs when the expression level of a gene depends on whether an allele was inherited from the mother or the father. Kin selection theory predicts that there should be extensive PSGE in social insects because social insect parents can gain inclusive fitness benefits by silencing parental alleles in female offspring. We searched for evidence of PSGE in honey bees using transcriptomes from reciprocal crosses between European and Africanized strains. We found 46 transcripts with significant parent-of-origin effects on gene expression, many of which overexpressed the maternal allele. Interestingly, we also found a large proportion of genes showing a bias toward maternal alleles in only one of the reciprocal crosses. These results indicate that PSGE may occur in social insects. The nonreciprocal effects could be largely driven by hybrid incompatibility between these strains. Future work will help to determine if these are indeed parent-of-origin effects that can modulate inclusive fitness benefits
Prediction of drug–target interaction networks from the integration of chemical and genomic spaces
Motivation: The identification of interactions between drugs and target proteins is a key area in genomic drug discovery. Therefore, there is a strong incentive to develop new methods capable of detecting these potential drug–target interactions efficiently
A graph-based integration of multimodal brain imaging data for the detection of early mild cognitive impairment (E-MCI)
Alzheimer's disease (AD) is the most common cause of dementia in older adults. By the time an individual has been diagnosed with AD, it may be too late for potential disease modifying therapy to strongly influence outcome. Therefore, it is critical to develop better diagnostic tools that can recognize AD at early symptomatic and especially pre-symptomatic stages. Mild cognitive impairment (MCI), introduced to describe a prodromal stage of AD, is presently classified into early and late stages (E-MCI, L-MCI) based on severity. Using a graph-based semi-supervised learning (SSL) method to integrate multimodal brain imaging data and select valid imaging-based predictors for optimizing prediction accuracy, we developed a model to differentiate E-MCI from healthy controls (HC) for early detection of AD. Multimodal brain imaging scans (MRI and PET) of 174 E-MCI and 98 HC participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort were used in this analysis. Mean targeted region-of-interest (ROI) values extracted from structural MRI (voxel-based morphometry (VBM) and FreeSurfer V5) and PET (FDG and Florbetapir) scans were used as features. Our results show that the graph-based SSL classifiers outperformed support vector machines for this task and the best performance was obtained with 66.8% cross-validated AUC (area under the ROC curve) when FDG and FreeSurfer datasets were integrated. Valid imaging-based phenotypes selected from our approach included ROI values extracted from temporal lobe, hippocampus, and amygdala. Employing a graph-based SSL approach with multimodal brain imaging data appears to have substantial potential for detecting E-MCI for early detection of prodromal AD warranting further investigation
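The AUC figure quoted in the abstract above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one. A minimal sketch of that pairwise definition in Python follows; the function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the ADNI data, and no cross-validation is performed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: classifier scores, higher meaning more likely positive
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is fine for a few hundred participants but would be replaced by a rank-based computation on larger datasets.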
COMPASS server for homology detection: improved statistical accuracy, speed and functionality
COMPASS is a profile-based method for the detection of remote sequence similarity and the prediction of protein structure. Here we describe a recently improved public web server of COMPASS, http://prodata.swmed.edu/compass. The server features three major developments: (i) improved statistical accuracy; (ii) increased speed from parallel implementation; and (iii) new functional features facilitating structure prediction. These features include visualization tools that allow the user to quickly and effectively analyze specific local structural region predictions suggested by COMPASS alignments. As an application example, we describe the structural, evolutionary and functional analysis of a protein with unknown function that served as a target in the recent CASP8 (Critical Assessment of Techniques for Protein Structure Prediction round 8). URL: http://prodata.swmed.edu/compas
Discovering patterns in drug-protein interactions based on their fingerprints
Background
Discovering interesting patterns in drug-protein interaction data at the molecular level can reveal hidden relationships among drugs and proteins and can therefore be of paramount importance for applications such as drug design. To discover such patterns, we propose here a computational approach to analyze the molecular data of drugs and proteins that are known to interact with each other. Specifically, we propose to use a data mining technique called Drug-Protein Interaction Analysis (D-PIA) to determine whether there are any commonalities in the fingerprints of the substructures of interacting drug and protein molecules and, if so, whether any patterns can be generalized from them.
Method
Given a database of drug-protein interactions, D-PIA performs its tasks in several steps. First, for each drug in the database, the fingerprints of its molecular substructures are obtained. Second, for each protein in the database, the fingerprints of its protein domains are obtained. Third, based on known interactions between drugs and proteins, an interdependency measure between the fingerprint of each drug substructure and protein domain is computed. Fourth, based on the interdependency measure, drug substructures and protein domains that are significantly interdependent are identified. Fifth, the existence of an interaction between a previously unknown drug-protein pair is predicted based on its constituent substructures that are significantly interdependent.
Results
To evaluate the effectiveness of D-PIA, we tested it with real drug-protein interaction data including enzymes, ion channels, and protein-coupled receptors. Experimental results show that there are indeed patterns that one can discover in the interdependency relationship between drug substructures and protein domains of interacting drugs and proteins. Based on these relationships, a testing set of drug-protein data is used to see whether D-PIA can correctly predict the existence of interaction between drug-protein pairs. The results show that the prediction accuracy can be very high. The AUC of a ROC plot could reach as high as 75%, which shows the effectiveness of this classifier.
Conclusions
D-PIA has the advantage that it performs its tasks effectively based on the fingerprints of drug and protein molecules without requiring any 3D information about their structures, and it is therefore very fast to compute. D-PIA has been tested with real drug-protein interaction data, and experimental results show that it can be very useful for predicting previously unknown drug-protein as well as protein-ligand interactions. It can also be used to tackle problems such as ligand specificity, which is related directly and indirectly to drug design and discovery.