Hawkeye: An interactive visual analytics tool for genome assemblies
Genome sequencing remains an inexact science, and genome sequences can contain significant errors if they are not carefully examined. Hawkeye is our new visual analytics tool for genome assemblies, designed to aid in identifying and correcting assembly errors. Users can analyze all levels of an assembly along with summary statistics and assembly metrics, and are guided by a ranking component towards likely mis-assemblies. Hawkeye is freely available and released as part of the open source AMOS project http://amos.sourceforge.net/hawkeye. © 2007 Schatz et al.; licensee BioMed Central Ltd
The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation.
Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination
Assemblathon 1: A competitive assessment of de novo short read assembly methods
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/
Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18
Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica (type B). F. tularensis is classified as a category A biodefense agent in part because a relatively small number of organisms can cause severe illness. Three complete genomes of subspecies holarctica have been sequenced and deposited in public archives, of which OSU18 was the first and the only strain for which a scientific publication has appeared [1]. We re-assembled the OSU18 strain using both de novo and comparative assembly techniques, and found that the published sequence has two large inversion mis-assemblies. We generated a corrected assembly of the entire genome along with detailed information on the placement of individual reads within the assembly. This assembly will provide a more accurate basis for future comparative studies of this pathogen
Statistical Analysis of Microarray Data with Replicated Spots: A Case Study with Synechococcus WH8102
Until recently microarray experiments often involved relatively few arrays with only a single representation of each gene on each array. A complete genome microarray with multiple spots per gene (spread out spatially across the array) was developed in order to compare the gene expression of a marine cyanobacterium and a knockout mutant strain in a defined artificial seawater medium. Statistical methods were developed for analysis in the special situation of this case study where there is gene replication within an array and where relatively few arrays are used, which can be the case with current array technology. Due in part to the replication within an array, it was possible to detect very small changes in the levels of expression between the wild type and mutant strains. One interesting biological outcome of this experiment is the indication of the extent to which the phosphorus regulatory system of this cyanobacterium affects the expression of multiple genes beyond those strictly involved in phosphorus acquisition
Inositol 1,3,4,5,6-pentakisphosphate 2-kinase is a distant IPK member with a singular inositide binding site for axial 2-OH recognition
Inositol phosphates (InsPs) are signaling molecules with multiple roles in cells. In particular, inositol hexakisphosphate (InsP6) is involved in mRNA export and editing or chromatin remodeling, among other events. InsP6 accumulates as mixed salts (phytate) in storage tissues of plants and plays a key role in their physiology. Human diets that are exclusively grain-based provide an excess of InsP6 that, through chelation of metal ions, may have a detrimental effect on human health. Ins(1,3,4,5,6)P5 2-kinase (InsP5 2-kinase or Ipk1) catalyses the synthesis of InsP6 from InsP5 and ATP, and is the only enzyme that transfers a phosphate group to the axial 2-OH of the myo-inositide. We present the first structure for an InsP5 2-kinase in complex with both substrates and products. This enzyme presents a singular structural region for inositide binding that encompasses almost half of the protein. The key residues in substrate binding are identified, with Asp368 being responsible for recognition of the axial 2-OH. This study sheds light on the unique molecular mechanism for the synthesis of the precursor of inositol pyrophosphates
De novo assembly of the cattle reference genome with single-molecule sequencing.
Background: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species
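For readers unfamiliar with the contiguity figures quoted in this abstract, the sketch below shows the standard definition of contig N50 and L50: sort the contigs from longest to shortest and accumulate lengths until at least half of the total assembly size is covered. The function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the ARS-UCD1.2 work.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the running total, taken
    from the longest contig downward, first reaches half of the
    assembly size. L50 is the number of contigs needed to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    # Only reached when no contigs were supplied.
    raise ValueError("contig_lengths must not be empty")


# Toy example with hypothetical contig lengths (not real data).
print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Under this definition, a higher N50 together with a lower L50 indicates that fewer, longer contigs account for half of the assembly.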
NCBI GEO: archive for high-throughput functional genomic data
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as ‘Minimum Information About a Microarray Experiment’ (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/
Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes
Hall A.B.; Papathanos P.-A.; Sharma A.; Cheng C.; Akbari O.S.; Assour L.; Bergman N.H.; Cagnetti A.; Crisanti A.; Dottorini T.; Fiorentini E.; Galizi R.; Hnath J.; Jiang X.; Koren S.; Nolan T.; Radune D.; Sharakhova M.V.; Steele A.; Timoshevskiy V.A.; Windbichler N.; Zhang S.; Hahn M.W.; Phillippy A.M.; Emrich S.J.; Sharakhov I.V.; Tu Z.J.; Besansky N.J.
Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation
BACKGROUND: The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. RESULTS: The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. CONCLUSIONS: The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease