18 research outputs found
SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand
The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc
Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons
Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes. National Science Foundation [1640775]. Open access journal.
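The cosine-similarity idea mentioned in the Libra abstract can be illustrated with a minimal single-machine sketch. This is not Libra's Hadoop implementation; the function names `kmer_counts` and `cosine_similarity`, the choice of k, and the toy reads are assumptions made purely for illustration. The sketch shows why a cosine measure over k-mer abundance vectors is insensitive to sequencing depth: scaling every count by the same factor leaves the angle between the vectors unchanged.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two k-mer abundance vectors (Counters)."""
    # Only k-mers present in both samples contribute to the dot product.
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sample is treated as sharing nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy samples: "deep" is "shallow" sequenced at twice the depth.
shallow = kmer_counts(["ACGTAC"], k=3)
deep = kmer_counts(["ACGTAC", "ACGTAC"], k=3)
unrelated = kmer_counts(["GGGGGG"], k=3)
```

Here `cosine_similarity(shallow, deep)` evaluates to 1 because the two samples differ only in depth, while `cosine_similarity(shallow, unrelated)` is 0 because the samples share no k-mers.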
SyMAP v3.4: a turnkey synteny system with application to plant genomes
SyMAP (Synteny Mapping and Analysis Program) was originally developed to compute synteny blocks between a sequenced genome and a FPC map, and has been extended to support pairs of sequenced genomes. SyMAP uses MUMmer to compute the raw hits between the two genomes, which are then clustered and filtered using the optional gene annotation. The filtered hits are input to the synteny algorithm, which was designed to discover duplicated regions and form larger-scale synteny blocks, where intervening micro-rearrangements are allowed. SyMAP provides extensive interactive Java displays at all levels of resolution along with simultaneous displays of multiple aligned pairs. The synteny blocks from multiple chromosomes may be displayed in a high-level dot plot or three-dimensional view, and the user may then drill down to see the details of a region, including the alignments of the hits to the gene annotation. These capabilities are illustrated by showing their application to the study of genome duplication, differential gene loss and transitive homology between sorghum, maize and rice. The software may be used from a website or standalone for the best performance. A project manager is provided to organize and automate the analysis of multi-genome groups. The software is freely distributed at http://www.agcol.arizona.edu/software/symap
Sequencing, Mapping, and Analysis of 27,455 Maize Full-Length cDNAs
Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5′ and 3′ UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org)
Balancing, Proportionality, and Constitutional Rights
In the theory and practice of constitutional adjudication, proportionality review plays a crucial role. At a theoretical level, it lies at the core of the debate on rights adjudication; in judicial practice, it is a widespread decision-making model characterizing the action of constitutional, supra-national and international courts. Despite its circulation and centrality in contemporary legal discourse, proportionality in rights-adjudication is still extremely controversial. It raises normative questions, concerning its justification and limits, and descriptive questions, regarding its nature and distinctive features. The chapter addresses both orders of questions.
Part I centres on the justification of proportionality review, the connection between proportionality, balancing and theories of rights and the critical aspects of this connection.
Part II identifies and analyses the different forms of proportionality both in review, as a template for rights-adjudication, and of review, as a way of defining the scope and limits of adjudication
fRNAkenseq: a fully powered-by-CyVerse cloud integrated RNA-sequencing analysis tool
Background: Decreasing costs make RNA sequencing technologies increasingly affordable for biologists. However, many researchers who can now afford sequencing lack access to resources necessary for downstream analysis. This means that even as algorithms to process RNA-Seq data improve, many biologists still struggle to manage the sheer volume of data produced by next generation sequencing (NGS) technologies. Scalable bioinformatics tools that exploit multiple platforms are needed to democratize bioinformatics resources in the sequencing era. This is essential for equipping many research groups in the life sciences with the tools to process the increasingly unwieldy datasets they produce. Methods: One strategy to address this challenge is to develop a modern generation of sequence analysis tools capable of seamless data sharing and communication. Such tools will provide interoperability through offerings of interlinked resources. Systems of interlinked, scalable resources, which often incorporate cloud data storage, are broadly referred to as cyberinfrastructure. Cyberinfrastructure integrated tools will help researchers to robustly analyze large scale datasets by efficiently sharing data burdens across a distributed architecture. Additionally, interoperability will allow emerging tools to cross-adapt features of existing tools. It is important that these tools are designed to be easy to use for biologists. Results: We introduce fRNAkenseq, a powered-by-CyVerse RNA sequencing analysis tool that exhibits interoperability with other resources and meets the needs of biologists for comprehensive, easy to use RNA sequencing analysis. fRNAkenseq leverages a complex set of Application Programming Interfaces (APIs) associated with the NSF-funded cyberinfrastructure project, CyVerse, to execute FASTQ-to-differential expression RNA-Seq analyses. 
Integrating across bioinformatics platforms, fRNAkenseq also exploits cloud integration and cross-talk with another CyVerse associated tool, CoGe. fRNAkenseq offers novel features for the biologist such as more robust and comprehensive pipelines for enrichment than those currently available by default in a single tool, whether cloud-based or locally installed. Importantly, cross-talk with CoGe allows fRNAkenseq users to execute RNA-Seq pipelines on an inventory of 47,000 archived genomes stored in CoGe or upload their own draft genome. Open access journal.
PAVE: Program for assembling and viewing ESTs
Background: New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information; hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. Results: The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly; however, it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. Conclusion: The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.
Genome-Wide Patterns of Differentiation Among House Mouse Subspecies
One approach to understanding the genetic basis of speciation is to scan the genomes of recently diverged taxa to identify highly differentiated regions. The house mouse, Mus musculus, provides a useful system for the study of speciation. Three subspecies (M. m. castaneus, M. m. domesticus, and M. m. musculus) diverged ∼350 KYA, are distributed parapatrically, show varying degrees of reproductive isolation in laboratory crosses, and hybridize in nature. We sequenced the testes transcriptomes of multiple wild-derived inbred lines from each subspecies to identify highly differentiated regions of the genome, to identify genes showing high expression divergence, and to compare patterns of differentiation among subspecies that have different demographic histories and exhibit different levels of reproductive isolation. Using a sliding-window approach, we found many genomic regions with high levels of sequence differentiation in each of the pairwise comparisons among subspecies. In all comparisons, the X chromosome was more highly differentiated than the autosomes. Sequence differentiation and expression divergence were greater in the M. m. domesticus-M. m. musculus comparison than in either pairwise comparison with M. m. castaneus, which is consistent with laboratory crosses that show the greatest reproductive isolation between M. m. domesticus and M. m. musculus. Coalescent simulations suggest that differences in estimates of effective population size can account for many of the observed patterns. However, there was an excess of highly differentiated regions relative to simulated distributions under a wide range of demographic scenarios. Overlap of some highly differentiated regions with previous results from QTL mapping and hybrid zone studies points to promising candidate regions for reproductive isolation