152 research outputs found
Evolution of a domain conserved in microtubule-associated proteins of eukaryotes
The microtubule network, the major organelle of the eukaryotic cytoskeleton, is involved in cell division and differentiation as well as in many other cellular functions. In plants, microtubules seem to be involved in the ordered deposition of cellulose microfibrils by a mechanism that is so far unknown. Microtubule-associated proteins (MAPs) typically contain various domains that target or bind proteins with different functions to microtubules. Here we have investigated a proposed microtubule-targeting domain, TPX2, first identified in the Kinesin-like protein 2 in Xenopus. A TPX2-containing microtubule-binding protein, PttMAP20, has recently been identified in poplar tissues undergoing xylogenesis. Furthermore, the herbicide 2,6-dichlorobenzonitrile (DCB), a known inhibitor of cellulose synthesis, was shown to bind specifically to PttMAP20. It is thus possible that PttMAP20 has a role in coupling cellulose biosynthesis and the microtubular networks in poplar secondary cell walls. To gain more insight into the occurrence, evolution and potential functions of TPX2-containing proteins, we have carried out a bioinformatic analysis of all genes so far found to encode TPX2 domains, with special reference to poplar PttMAP20 and its putative orthologs in other plants.
GAM-NGS: genomic assemblies merger for next generation sequencing
Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions. Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS outputs an improved, reliable set of sequences. GAM-NGS is also very efficient, merging assemblies with substantially less computational resources than comparable tools. To achieve this, GAM-NGS avoids global alignment between contigs, making its strategy unique among assembly reconciliation tools. Conclusions: The difficulty of obtaining correct and reliable assemblies with a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects.
With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the generating that is most likely to be correct
PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions
As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds that of all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations of two
proteins, that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples. Comment: The 9th International Workshop on Algorithms in Bioinformatics
(WABI), Philadelphia, United States (2009)
Maximum likelihood models and algorithms for gene tree evolution with duplications and losses
Background: The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes. Results: We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm, efficient in practice, that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data. Conclusions: In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.
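For orientation, the parsimony lca-mapping that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration of the classical reconciliation idea, not DrML's maximum likelihood algorithm. The species tree, the gene-tree encoding (a leaf is a species name, an internal node is a pair of subtrees), and all function names are assumptions made for this example.

```python
# Hypothetical toy species tree, stored as child -> parent (root maps to None).
SPECIES_PARENT = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC", "ABC": None}


def ancestors(node):
    """Return the path from a species-tree node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = SPECIES_PARENT[node]
    return path


def lca(x, y):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_map(gene_tree, duplications):
    """Map a gene-tree node to the species tree, recording duplications.

    A leaf is a species name (str); an internal node is a (left, right) tuple.
    An internal node is a duplication when it maps to the same species-tree
    node as one of its children.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left, right = gene_tree
    m_left = lca_map(left, duplications)
    m_right = lca_map(right, duplications)
    mapped = lca(m_left, m_right)
    if mapped in (m_left, m_right):
        duplications.append(mapped)
    return mapped


def inferred_duplications(gene_tree):
    """Return the species-tree nodes at which duplications are inferred."""
    duplications = []
    lca_map(gene_tree, duplications)
    return duplications


if __name__ == "__main__":
    # Two copies of the gene in species A force one duplication at the root.
    print(inferred_duplications((("A", "B"), ("A", "C"))))
```

A gene tree that matches the species tree, such as `(("A", "B"), "C")`, yields no duplications under this mapping.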
Kajian Bentuk dan Sensitivitas Rumus Indeks PI, Storet, CCME untuk Penentuan Status Mutu Perairan Sungai Tropis di Indonesia (Assessment of the Form and Sensitivity of the PI, Storet, and CCME Index Formulas for Determining the Water Quality Status of Tropical Rivers in Indonesia)
The Pollution Index (USA), Storet (USA), and CCME (Canada) methods are water quality index (WQI) methods for determining water quality status. The first two are widely used by environmental practitioners in Indonesia because they are referenced in Minister of Environment Decree No. 115/2013. All three can compute the WQI using the local water quality standards of the river under study. Because the countries where these methods were developed have different environmental conditions, and each method uses specific factors to compute the WQI, the suitability of each method for tropical rivers in Indonesia needs to be assessed. For each method, the form of the equation and its sensitivity are examined, both with many water quality parameters and with a specific number of water quality parameters, with reference to WQI methods developed in other tropical countries. The study uses “Prokasih” monitoring data from the Gadjah Wong River, Yogyakarta, for 1996/1997 to 2011/2012. The research was carried out in order to develop a WQI method for tropical Indonesian rivers in general and the Gadjah Wong River in particular, along with a water quality management program for controlling river water pollution, with the target of conserving river water for multifunctional or overall/general use (meeting raw-water health criteria, aesthetic criteria, and ecological criteria, that is, safe for aquatic life). The results show that, compared with the other two methods, the CCME method is judged the most objective (statistically) in computing the WQI of the Gadjah Wong River. CCME responds most sensitively to the dynamics of the water quality index at each monitoring site and is more universal, so it can be applied outside the country that developed it. However, to be applied to the Gadjah Wong River, the CCME method needs to be adapted in several respects: the number and type of water quality parameters considered significant, and the number and classes of water quality.
The adaptation takes into account the water pollution control program and an ecological and sustainable operational/management strategy for river flow. The threshold scores and the meaning of each water quality class in the WQI must be verified against other environmental data, such as biotilik or bioassay results, so that the water quality index status does not contradict the biological condition of the river. The inclusion of bacteriological water quality parameters (Escherichia coli and total coliform) and electrical conductivity (EC) as significant water quality parameters in the WQI method still needs further study in order to develop a WQI method specific to rivers in tropical Indonesia.
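As a rough illustration of why the CCME index reacts to both how many parameters fail and by how much, the standard CCME formula can be sketched in a few lines. This is the generic published form of the CCME index (scope F1, frequency F2, amplitude F3), not the adapted version the abstract calls for. The sketch assumes every objective is an upper limit and that every sample reports every parameter; the parameter names and values are hypothetical.

```python
import math


def ccme_wqi(samples, objectives):
    """Compute the CCME water quality index for a set of samples.

    samples: list of dicts mapping parameter name -> measured value.
    objectives: dict mapping parameter name -> maximum allowed value
                (assumption: all objectives are upper limits).
    """
    failed_params = set()
    total_tests = 0
    failed_tests = 0
    excursion_sum = 0.0

    for sample in samples:
        for name, value in sample.items():
            limit = objectives[name]
            total_tests += 1
            if value > limit:
                failed_tests += 1
                failed_params.add(name)
                # Excursion: how far the failed value exceeds its objective.
                excursion_sum += value / limit - 1.0

    if total_tests == 0:
        raise ValueError("no measurements supplied")

    f1 = 100.0 * len(failed_params) / len(objectives)  # scope
    f2 = 100.0 * failed_tests / total_tests            # frequency
    nse = excursion_sum / total_tests                  # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                     # amplitude
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732


if __name__ == "__main__":
    # Hypothetical objectives and measurements, for illustration only.
    objectives = {"BOD": 3.0, "TSS": 50.0}
    samples = [{"BOD": 2.0, "TSS": 40.0}, {"BOD": 6.0, "TSS": 45.0}]
    print(round(ccme_wqi(samples, objectives), 2))
```

Adapting the method to a particular river, as the abstract proposes, would mainly change which parameters appear in `objectives` and how the resulting score is binned into quality classes.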
Fast computation of distance estimators
BACKGROUND: Distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast; for example, the popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n^2. Since the sequence length is typically much larger than the number of taxa, distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in the reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications. RESULTS: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods. CONCLUSION: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
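To make the l·n^2 cost concrete, the baseline computation that the abstract sets out to accelerate can be sketched as below. This is the naive approach, not the paper's faster algorithm. The Jukes-Cantor (JC69) correction is used here only as one familiar distance estimator, and simply skipping sites with ambiguity symbols is a simplification of this sketch, not the paper's method for handling them.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned DNA sequences.

    Sites where either sequence has a symbol other than A, C, G, T are
    skipped (a simplification made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):  # O(l) work per pair of sequences
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jc69(p):
    """Jukes-Cantor corrected distance for an observed mismatch fraction p."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive pairwise distance matrix: n(n-1)/2 pairs, O(l) each."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jc69(p_distance(seqs[i], seqs[j]))
    return d


if __name__ == "__main__":
    for row in distance_matrix(["ACGT", "ACGT", "ACGA"]):
        print(row)
```

The resulting matrix is what a distance method such as NJ takes as input.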