51 research outputs found

TaxMan: a taxonomic database manager

Author: A Rokas
C Lee
D Gordon
DA Benson
H Philippe
JD Thompson
JE Stajich
M Jones
Mark Blaxter
Martin Jones
PC Feijao
SA Olson
SF Altschul
W Ludwig
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. However, manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. Additionally, sequence curation, alignment and assessment of the results of phylogenetic analysis are made particularly difficult by the potential for a given gene in a given species to be unrepresented, or to be represented by multiple or partial sequences. We have developed a software package, TaxMan, that largely automates the processes of sequence acquisition, consensus building, alignment and taxon selection to facilitate this type of phylogenetic study. RESULTS: TaxMan uses freely available tools to allow rapid assembly, storage and analysis of large, aligned DNA and protein sequence datasets for user-defined sets of species and genes. The user provides GenBank format files and a list of gene names and synonyms for the loci to analyse. Sequences are extracted from the GenBank files on the basis of annotation and sequence similarity. Consensus sequences are built automatically. Alignment is carried out (where possible, at the protein level) and aligned sequences are stored in a database. TaxMan can automatically determine the best subset of taxa to examine phylogeny at a given taxonomic level. By using the stored aligned sequences, large concatenated multiple sequence alignments can be generated rapidly for a subset and output in analysis-ready file formats. Trees resulting from phylogenetic analysis can be stored and compared with a reference taxonomy. CONCLUSION: TaxMan allows rapid automated assembly of multigene datasets of aligned sequences for large taxonomic groups. By extracting sequences on the basis of both annotation and BLAST similarity, it ensures that all available sequence data can be brought to bear on a phylogenetic problem, but remains fast enough to cope with many thousands of records. By automatically assisting in the selection of the best subset of taxa to address a particular phylogenetic problem, TaxMan greatly speeds up the process of generating multiple sequence alignments for phylogenetic analysis. Our results indicate that an automated phylogenetic workbench can be a useful tool when correctly guided by user knowledge.
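
The supermatrix idea described in this abstract can be illustrated with a minimal Python sketch. It is not TaxMan's code: the function name `build_supermatrix`, the dictionary layout and the gene and taxon labels are all assumptions made for illustration. It concatenates per-gene alignments and pads a taxon with gap characters wherever a gene is unrepresented, which is the missing-data situation the abstract mentions.

```python
from typing import Dict, List


def build_supermatrix(alignments: Dict[str, Dict[str, str]],
                      taxa: List[str]) -> Dict[str, str]:
    """Concatenate per-gene alignments into one row per taxon.

    `alignments` maps gene name -> {taxon: aligned sequence} (hypothetical layout).
    A taxon lacking a gene is padded with '-' for that gene's full width.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):  # fixed gene order keeps columns reproducible
        aligned = alignments[gene]
        lengths = {len(seq) for seq in aligned.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {gene!r} is not aligned: lengths {sorted(lengths)}")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aligned.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Illustrative input: two genes, three taxa, each taxon missing at most one gene.
example = {
    "cox1": {"sp_a": "ATG-", "sp_b": "ATGC"},
    "18S": {"sp_a": "GG", "sp_c": "GA"},
}
matrix = build_supermatrix(example, ["sp_a", "sp_b", "sp_c"])
```

Sorting the gene names is only one way to fix the column order; a real pipeline would also record the partition boundaries so that each gene can be given its own substitution model.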

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

FLYSNPdb: a high-density SNP database of Drosophila melanogaster

Author: Adams
Altschul
Berger
Celniker
Chen
Crosby
Doris Chen
Drysdale
Ewing
Gordon
Hoskins
Jürg Berger
Lunter
Marth
Martin
Michaela Fellner
Nairz
Olson
Rice
Roberts
Rorth
Rozen
Takashi Suzuki
Teeter
Xu
Publication venue: Oxford University Press

FLYSNPdb provides high-resolution single nucleotide polymorphism (SNP) data of Drosophila melanogaster. The database currently contains 27 367 polymorphisms, including >3700 indels (insertions/deletions), covering all major chromosomes. These SNPs are clustered into 2238 markers, which are evenly distributed with an average density of one marker every 50.3 kb or 6.6 genes. SNPs were identified automatically, filtered for high quality and partly manually curated. The database provides detailed information on the SNP data including molecular and cytological locations (genome Releases 3–5), alleles of up to five commonly used laboratory stocks, flanking sequences, SNP marker amplification primers, quality scores and genotyping assays. Data specific for a certain region, particular stocks or a certain genome assembly version are easily retrievable through the interface of a publicly accessible website (http://flysnp.imp.ac.at/flysnpdb.php).

Crossref

PubMed Central

Bioinformatics tools for marine biotechnology: A practical tutorial with a metagenomic approach

Author: Allocca M.
Cubellis M. V.
Hay Mele B.
Liguori L.
Monticelli M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Background: Bioinformatics has pervaded all fields of biology and has become an indispensable tool for almost all research projects. Although teaching bioinformatics has been incorporated in all traditional life science curricula, practical hands-on experiences in tight combination with wet-lab experiments are needed to motivate students. Results: We present a tutorial that starts from a practical problem: finding novel enzymes from marine environments. First, we introduce the idea of metagenomics, a recent approach that extends biotechnology to non-culturable microbes. We presuppose that a probe for the screening of a metagenomic cosmid library is needed. The students start from the chemical structure of the substrate that should be acted on by the novel enzyme and end with the sequence of the probe. To attain their goal, they discover databases such as BRENDA and programs such as BLAST and Clustal Omega. Students' answers to a satisfaction questionnaire show that a multistep tutorial integrated into a research wet-lab project is preferable to conventional lectures illustrating bioinformatics tools. Conclusion: Experimental biologists can better operate basic bioinformatics tools if a problem-solving approach is chosen.

Archivio della ricerca - Università degli studi di Napoli Federico II

Comparative Analysis of CpG Islands in Four Fish Genomes

Author: Han Leng
Zhao Zhongming
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2008

There has been much interest in CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, because they are considered gene markers and are involved in gene regulation. To date, there has been no genome-wide analysis of CGIs in the fish genome. We first evaluated the performance of three popular CGI identification algorithms in four fish genomes (tetraodon, stickleback, medaka, and zebrafish). Our results suggest that Takai and Jones' (2002) algorithm is most suitable for comparative analysis of CGIs in the fish genome. Then, we performed a systematic analysis of CGIs in the four fish genomes using Takai and Jones' algorithm, compared to other vertebrate genomes. We found that both the number of CGIs and the CGI density vary greatly among these genomes. Remarkably, each fish genome presents a distinct distribution of CGI density with some genomic factors (e.g., chromosome size and chromosome GC content). These findings are helpful for understanding evolution of fish genomes and the features of fish CGIs.
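
As a rough illustration of what a CGI criterion looks like in practice, the Python sketch below tests a single candidate region against the thresholds commonly cited for the Takai and Jones (2002) criteria: length of at least 500 bp, GC content of at least 55%, and an observed/expected CpG ratio of at least 0.65. The abstract does not state these values, so they are an assumption here, as are the function names. The sketch does not reproduce the sliding-window search and merging steps of a full CGI finder.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Observed/expected ratio: (#CpG * N) / (#C * #G); zero if C or G is absent.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def meets_cgi_criteria(seq: str, min_len: int = 500,
                       min_gc: float = 0.55, min_obs_exp: float = 0.65) -> bool:
    """Check one region against the assumed length, GC and Obs/Exp thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


# Synthetic regions, for illustration only.
cpg_rich = "CG" * 300   # 600 bp of pure CpG
at_rich = "AT" * 300    # 600 bp with no C or G
too_short = "CG" * 10   # CpG-rich but only 20 bp
```

Keeping the thresholds as parameters makes it easy to compare stricter and looser definitions on the same genome, which is the kind of algorithm comparison the abstract describes.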

Crossref

Directory of Open Access Journals

PubMed Central

VCU Scholars Compass

A new reference genome assembly for the microcrustacean Daphnia pulex

Author: Ackerman Matthew S
Asselman Jana
Harker Brent
Jiang Xiaoqian
Lopez Jacqueline
Lynch Michael
Pfrender Michael E
Raborn R Taylor
Ramsdell Jordan
Spitze Ken
Thomas W Kelley
Xu Sen
Ye Zhiqiang
Publication venue: Genetics Society of America
Publication date: 01/01/2017

Comparing genomes of closely related genotypes from populations with distinct demographic histories can help reveal the impact of effective population size on genome evolution. For this purpose, we present a high quality genome assembly of Daphnia pulex (PA42), and compare this with the first sequenced genome of this species (TCO), which was derived from an isolate from a population with >90% reduction in nucleotide diversity. PA42 has numerous similarities to TCO at the gene level, with an average amino acid sequence identity of 98.8% and >60% of orthologous proteins identical. Nonetheless, there is a highly elevated number of genes in the TCO genome annotation, with ~7000 excess genes appearing to be false positives. This view is supported by the high GC content, lack of introns, and short length of these suspicious gene annotations. Consistent with the view that reduced effective population size can facilitate the accumulation of slightly deleterious genomic features, we observe more proliferation of transposable elements (TEs) and a higher frequency of gained introns in the TCO genome.

Ghent University Academic Bibliography

Directory of Open Access Journals

In silico analysis of κ-theraphotoxin-Cg2a from Chilobrachys guangxiensis

Author: Sankaranarayanan Kavitha
Zaheer Zubin Abdul
Publication venue: Indian Journal of Biochemistry and Biophysics (IJBB)
Publication date: 29/07/2020

κ-theraphotoxin-Cg2a is a 29-residue polypeptide extracted from the venom glands of the Chinese earth tiger tarantula Chilobrachys guangxiensis. Many cancers are associated with irregular function of potassium ion channels. A thorough understanding of the toxin's interaction with voltage-gated potassium channels is therefore necessary before it can be screened as a potential pharmacological molecule that might serve as a toxin-based therapy for cancer channelopathies. Physicochemical properties were studied, evolutionary analysis was carried out to visualize the conserved domain among different toxins of the tarantula family, and docking between κ-theraphotoxin-Cg2a and a voltage-gated potassium ion channel was performed with ClusPro 2.0. The presence of a signal peptide was observed using PSIPRED. Cysteine disulfide bonds in the amino acid sequence were predicted with the DiANNA server. Multiple sequence alignment showed residues conserved with other families of tarantula toxins. Docking indicated that κ-theraphotoxin-Cg2a interacts with the voltage-gated potassium channel, and the cysteine disulfide bonds appear to play a crucial role in the docking process. This receptor-ligand interaction could help in developing potential toxin-based pharmacological therapies.

Online Publishing @ NISCAIR

NOPR

Multigenome DNA sequence conservation identifies Hox cis-regulatory elements

Author: De Buysscher Tristan
DeModena John A.
Kuntz Steven G.
Schwarz Erich M.
Shizuya Hiroaki
Sternberg Paul W.
Trout Diane
Wold Barbara J.
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/12/2008

To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced ∼0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide
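
The notion of ungapped windows with N-way transitive conservation can be sketched in Python as follows. This is not the MUSSA implementation: the function names are hypothetical, the search is brute force, and reading "transitive" as "every pair of windows in the tuple must reach the identity threshold" is an assumption made for illustration.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> int:
    """Number of identical positions between two equal-length ungapped windows."""
    return sum(x == y for x, y in zip(a, b))


def n_way_windows(seqs: List[str], window: int, threshold: int) -> List[Tuple[int, ...]]:
    """Return start-position tuples (one start per sequence) in which every
    pair of windows shares at least `threshold` identical bases."""
    paths: List[Tuple[int, ...]] = [(i,) for i in range(len(seqs[0]) - window + 1)]
    for k in range(1, len(seqs)):
        extended = []
        for path in paths:
            for j in range(len(seqs[k]) - window + 1):
                candidate = seqs[k][j:j + window]
                # The new window must match every window already in the path.
                if all(identity(seqs[m][p:p + window], candidate) >= threshold
                       for m, p in enumerate(path)):
                    extended.append(path + (j,))
        paths = extended
    return paths


# Toy sequences: 1 and 2 differ at one base, 1 and 3 at one base, 2 and 3 at two.
toy = ["ACGTACGT", "ACGTACGA", "TCGTACGT"]
loose = n_way_windows(toy, window=8, threshold=6)   # all three pairs pass
strict = n_way_windows(toy, window=8, threshold=7)  # pair (2, 3) fails
```

The toy input shows why the all-pairs requirement is stricter than comparing everything to one reference: at a threshold of 7, sequences 2 and 3 each match sequence 1 but not each other, so no three-way hit is reported.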

Crossref

PubMed Central

Caltech Authors

DSAP: deep-sequencing small RNA analysis pipeline

Author: Altschul
Bartel
Borchert
Brennecke
Carrington
Chen
Cheng
Chi-Ching Lee
Du
Friedlander
Gardner
Glazov
Griffiths-Jones
Hackenberg
Kuchenbauer
Lee
Moretti
Mullan
Olson
Pavlidis
Petrus Tang
Ping-Chiang Lyu
Po-Jung Huang
Reinhart
Rice
Richie Ruei-Chi Gan
Schwarz
Smith
Thompson
Wang
Wei-Chen Lin
Wilm
Yi-Chung Liu
Zeng
Publication venue: Oxford University Press

DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw
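
The first two steps listed in this abstract (cleanup and clustering of a tab-delimited tag/count file) and the log2 scaling can be sketched in Python as below. This is a simplified illustration, not DSAP's code: the adaptor string, the minimum tag length, the +1 pseudocount and all function names are assumptions, and single-nucleotide reads are simply discarded rather than trimmed.

```python
import math
from collections import Counter
from typing import Iterable, Optional

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor fragment, for illustration only


def clean_tag(seq: str, adaptor: str = ADAPTOR, min_len: int = 15) -> Optional[str]:
    """Trim everything from the adaptor onward; reject short or homopolymer tags."""
    seq = seq.strip().upper()
    cut = seq.find(adaptor)
    if cut != -1:
        seq = seq[:cut]
    if len(seq) < min_len:
        return None
    if len(set(seq)) == 1:  # poly-A/T/C/G/N read
        return None
    return seq


def cluster_tags(lines: Iterable[str]) -> Counter:
    """Merge identical cleaned tags, summing the copy numbers from column two."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        seq, copies = line.rstrip("\n").split("\t")
        cleaned = clean_tag(seq)
        if cleaned is not None:
            counts[cleaned] += int(copies)
    return counts


def log2_expression(count: int) -> float:
    """log2 scaling with a +1 pseudocount (an assumption) so zero counts stay finite."""
    return math.log2(count + 1)


# Illustrative input: "sequence<TAB>copies", as in the format the abstract describes.
raw = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t10",  # tag followed by adaptor
    "TGAGGTAGTAGGTTGTATAGTT\t5",             # same tag, already clean
    "AAAAAAAAAAAAAAAAAAAA\t99",              # homopolymer, discarded
    "ACGT\t3",                               # too short, discarded
]
clusters = cluster_tags(raw)
```

Matching the resulting clusters against Rfam and miRBase, steps (iii) and (iv) in the abstract, would require a sequence-similarity search and is left out of this sketch.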

Crossref

PubMed Central

J Eukaryot Microbiol


Emerging methods based on mass spectrometry (MS) can be used in the rapid identification of microorganisms. Thus far, these practical and rapidly evolving methods have mainly been applied to characterize prokaryotes. We applied matrix-assisted laser-desorption-ionization-time-of-flight mass spectrometry (MALDI-TOF MS) in the analysis of whole cells of 18 N. fowleri isolates belonging to three genotypes. Fourteen originated from the cerebrospinal fluid or brain tissue of primary amoebic meningoencephalitis patients and four originated from water samples of hot springs, rivers, lakes or municipal water supplies. Whole Naegleria trophozoites grown in axenic cultures were washed and mixed with MALDI matrix. Mass spectra were acquired with a 4700 TOF-TOF instrument. MALDI-TOF MS yielded consistent patterns for all isolates examined. Using a combination of novel data processing methods for visual peak comparison, statistical analysis and proteomics database searching, we were able to detect several biomarkers that can differentiate all species and isolates studied, along with common biomarkers for all N. fowleri isolates. Naegleria fowleri could be easily separated from other species within the genus Naegleria. A number of peaks detected were tentatively identified. MALDI-TOF MS fingerprinting is a rapid, reproducible, high-throughput alternative method for identifying Naegleria isolates. This method has potential for studying eukaryotic agents.

CC999999/Intramural CDC HHS/United States; 2017-12-26T00:00:00Z; 25231600; PMC574320

CDC Stacks