
    NCBI BLAST: a better web interface

    Basic Local Alignment Search Tool (BLAST) is a sequence similarity search program. The public interface of BLAST at the NCBI website, http://www.ncbi.nlm.nih.gov/blast, has recently been reengineered to improve usability and performance. Key new features include simplified search forms, improved navigation, a list of recent BLAST results, saved search strategies and a documentation directory. Here, we describe the BLAST web application's new features, explain design decisions and outline plans for future improvement.

    Parallel approach to sliding window sums

    Sliding window sums are widely used in bioinformatics applications, including sequence assembly, k-mer generation, hashing and compression. New vector algorithms that use the Advanced Vector Extensions (AVX) instructions available on modern processors, or the parallel compute units on GPUs and FPGAs, would provide a significant performance boost for these applications. We develop a generic vectorized sliding sum algorithm whose speedup, for window size w and number of processors P, is O(P/w) for a generic sliding sum. For a sum with a commutative operator, the speedup improves to O(P/log(w)). When applied to the genomic application of minimizer-based k-mer table generation using AVX instructions, we obtain a speedup of over 5X. Comment: 10 pages, 5 figures
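
    The logarithmic dependence on the window size can be illustrated with a doubling scheme. The sketch below is not the algorithm from the paper; it is a minimal illustration, assuming only that the operator is associative, in which every pass is an elementwise operation over whole arrays (the kind of step that maps onto vector lanes or parallel processors). The function name `sliding_reduce` and its parameters are hypothetical.

```python
from operator import add


def sliding_reduce(xs, w, op=add):
    """Reduce every full window of length w in xs with an associative op.

    Uses O(log w) elementwise passes instead of w - 1 passes:
    block sums of length 1, 2, 4, ... are built by doubling, and the
    blocks matching the set bits of w are chained left to right.
    """
    n = len(xs)
    if w < 1 or w > n:
        return []
    result = None      # result[i] = reduction of xs[i : i + covered]
    covered = 0        # window prefix length assembled so far
    block = list(xs)   # block[i] = reduction of xs[i : i + span]
    span = 1
    remaining = w
    while remaining:
        if remaining & 1:
            if result is None:
                result = block[:]
            else:
                # Append the block that starts right after the covered prefix.
                result = [op(result[i], block[i + covered])
                          for i in range(len(block) - covered)]
            covered += span
        remaining >>= 1
        if remaining:
            # Doubling step: combine two adjacent blocks of length span.
            block = [op(block[i], block[i + span])
                     for i in range(len(block) - span)]
            span *= 2
    return result[: n - w + 1]


if __name__ == "__main__":
    print(sliding_reduce([1, 2, 3, 4, 5], 3))        # window sums
    print(sliding_reduce([5, 1, 4, 2, 3, 0], 4, min))  # window minima
```

    Passing `min` as the operator gives sliding window minima, the kind of quantity a minimizer scheme selects over hashed k-mers; the hashing itself is left out of this sketch.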

    Isolation of microsatellite loci in the Capricorn silvereye, Zosterops lateralis chlorocephalus (Aves : Zosteropidae)

    The Capricorn silvereye (Zosterops lateralis chlorocephalus) is ideally suited to investigating the genetic basis of body size evolution. We have isolated and characterized a set of microsatellite markers for this species. Seven out of 11 loci were polymorphic. The number of alleles detected ranged from two to five, and observed heterozygosities ranged between 0.12 and 0.67. One locus, ZL49, was found to be sex-linked. This moderate level of diversity is consistent with that expected in an isolated, island population.

    PROCAIN server for remote protein sequence similarity search

    Sensitive and accurate detection of distant protein homology is essential for the studies of protein structure, function and evolution. We recently developed PROCAIN, a method that is based on sequence profile comparison and involves the analysis of four signals, namely similarities of residue content at the profile positions combined with three types of assisting information: sequence motifs, residue conservation and predicted secondary structure. Here we present the PROCAIN web server that allows the user to submit a query sequence or multiple sequence alignment and perform the search in a profile database of choice. The output is structured similarly to that of BLAST, with the list of detected homologs sorted by E-value and followed by profile–profile alignments. The front page allows the user to adjust multiple options of input processing and output formatting, as well as search settings, including the relative weights assigned to the three types of assisting information.

    Accurate statistical model of comparison between multiple sequence alignments

    Comparison of multiple protein sequence alignments (MSA) reveals unexpected evolutionary relations between protein families and leads to exciting predictions of spatial structure and function. The power of MSA comparison critically depends on the quality of the statistical model used to rank the similarities found in a database search, so that biologically relevant relationships are discriminated from spurious connections. Here, we develop an accurate statistical description of MSA comparison that does not originate from conventional models of single sequence comparison and captures essential features of protein families. As a final result, we compute E-values for the similarity between any two MSA using a mathematical function that depends on MSA lengths and sequence diversity. To develop these estimates of statistical significance, we first establish a procedure for generating realistic alignment decoys that reproduce natural patterns of sequence conservation dictated by protein secondary structure. Second, since similarity scores between these alignments do not follow the classic Gumbel extreme value distribution, we propose a novel distribution that yields statistically perfect agreement with the data. Third, we apply this random model to database searches and show that it surpasses conventional models in the accuracy of detecting remote protein similarities.

    Racial-ethnic identity in mid-adolescence: Content and change as predictors of academic achievement

    Three aspects of racial-ethnic identity (REI), namely feeling connected to one's racial-ethnic group (Connectedness), being aware that others may not value the in-group (Awareness of Racism), and feeling that one's in-group is characterized by academic attainment (Embedded Achievement), were hypothesized to promote academic achievement. Youth randomly selected from 3 low-income, urban schools (n = 98 African American, n = 41 Latino) reported on their REI 4 times over 2 school years. Hierarchical linear modeling shows a small increase in REI and the predicted REI–grades relationship. Youth high in both REI Connectedness and Embedded Achievement attained better grade point average (GPA) at each point in time; youth high in REI Connectedness and Awareness of Racism at the beginning of 8th grade attained better GPA through 9th grade. Effects are not moderated by race-ethnicity. http://deepblue.lib.umich.edu/bitstream/2027.42/64271/1/Racial-ethnic_identity_in_mid-adolescence.pd

    ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes

    The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates, and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.go

    Development of 5006 Full-Length CDNAs in Barley: A Tool for Accessing Cereal Genomics Resources

    A collection of 5006 full-length (FL) cDNA sequences was developed in barley. Fifteen mRNA samples from various organs and treatments were pooled to develop a cDNA library using the CAP trapper method. More than 60% of the clones were confirmed to have complete coding sequences, based on comparison with rice amino acid and UniProt sequences. Blastn homologies (E<1E-5) to rice genes and Arabidopsis genes were 89 and 47%, respectively. Of the 5028 possible amino acid sequences derived from the 5006 FLcDNAs, 4032 (80.2%) were classified into 1678 GreenPhyl multigenic families. There were 555 cDNAs showing low homology to both rice and Arabidopsis. Gene ontology annotation by InterProScan indicated that many of these cDNAs (71%) have no known molecular functions and may be unique to barley. The cDNAs showed high homology to Barley 1 GeneChip oligo probes (81%) and the wheat gene index (84%). The high homology between FLcDNAs (27%) and mapped barley expressed sequence tags enabled assigning linkage map positions to 151–233 FLcDNAs on each of the seven barley chromosomes. These comprehensive barley FLcDNAs provide a strong platform to connect pre-existing genomic and genetic resources and accelerate gene identification and genome analysis in barley and related species.

    Characteristics of 454 pyrosequencing data: enabling realistic simulation with flowsim

    Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments.

    The cell cycle DB: a systems biology approach to cell cycle analysis

    The cell cycle database is a biological resource that collects the most relevant information related to genes and proteins involved in human and yeast cell cycle processes. The database, which is accessible at the web site http://www.itb.cnr.it/cellcycle, has been developed in a systems biology context, since it also stores the cell cycle mathematical models published in recent years, with the option of simulating them directly. The aim of our resource is to give an exhaustive view of the cell cycle process starting from its building blocks, genes and proteins, toward the pathway they create, represented by the models.