The RNAmute web server for the mutational analysis of RNA secondary structures
RNA mutational analysis at the secondary-structure level can be useful in a wide range of biological applications. It can be used to predict an optimal site for performing a nucleotide mutation at the single-molecule level, as well as to analyze basic phenomena at the systems level. For the former, as more sequence-modification experiments are performed that include site-directed mutagenesis to find and explore functional motifs in RNAs, a pre-processing step that helps guide the planning of the experiment becomes vital. For the latter, mutations are generally accepted as a central mechanism by which evolution occurs, and mutational analysis relating to structure should yield a better understanding of system functionality and evolution. In the past several years, the structure-based program RNAmute, which relies on RNA secondary-structure prediction, has been developed to assist in RNA mutational analysis. It has been extended from single-point mutations to treat multiple-point mutations efficiently by initially calculating all suboptimal solutions, after which only the mutations that stabilize the suboptimal solutions and destabilize the optimal one are considered candidates for being deleterious. The RNAmute web server for mutational analysis is available at http://www.cs.bgu.ac.il/~xrnamute/XRNAmute
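The pre-processing idea behind single-point mutational analysis can be illustrated by enumerating every single-nucleotide substitution of an RNA sequence, so that each mutant could then be passed to a secondary-structure predictor. A minimal sketch (the function name and workflow are illustrative, not taken from RNAmute itself):

```python
def single_point_mutants(seq):
    """Yield (position, new_base, mutant_sequence) for all 3*len(seq) mutants."""
    bases = "ACGU"
    for i, original in enumerate(seq):
        for b in bases:
            if b != original:
                yield (i, b, seq[:i] + b + seq[i + 1:])

# Each mutant sequence would then be folded and compared against the wild type.
mutants = list(single_point_mutants("GCAU"))
print(len(mutants))  # 3 substitutions per position -> 12 mutants for 4 nt
```

For multiple-point mutations, RNAmute prunes this combinatorial space by considering only candidates that stabilize a suboptimal structure while destabilizing the optimal one, as described above.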
Principles of metadata organization at the ENCODE data coordination center.
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org
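The object-based metadata model described above—records that describe one entity each, link to other records by identifier, and are validated on submission—can be sketched as follows. Field names and identifiers here are hypothetical, not the actual ENCODE schema:

```python
# Hypothetical required fields; the real ENCODE schemas are JSON Schema documents.
REQUIRED = {"accession", "assay", "biosample"}

def validate(record):
    """Reject a metadata record that is missing required fields."""
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"missing required metadata fields: {sorted(missing)}")
    return record

experiment = validate({
    "accession": "EXP000001",   # hypothetical identifier
    "assay": "ChIP-seq",
    "biosample": "BIO000042",   # link to a separate biosample record
    "files": ["FILE000007"],    # links to file records derived from the assay
})
```

The point of validating at submission time is that every linked object (biosample, file, analysis) can be resolved and displayed consistently on the portal.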
SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.
The Encyclopedia of DNA Elements (ENCODE) project is an ongoing collaborative effort, initiated shortly after the completion of the Human Genome Project, to create a comprehensive catalog of functional elements. The current database exceeds 6500 experiments across more than 450 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general-purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata, and a robust API for querying the metadata. The software is fully open source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (the generic database) and http://github.com/ENCODE-DCC/encoded/ (the application that stores genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package.
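The query API mentioned above is exposed through the portal's URL scheme. A hedged sketch of composing such a query (the /search/ path and the type/format parameters are assumptions based on the portal's documented URL conventions, not verified against the SnoVault source):

```python
from urllib.parse import urlencode

def build_search_url(base, **params):
    """Compose a search URL; format=json requests machine-readable metadata."""
    query = urlencode(sorted(params.items()))  # sort for a stable URL
    return f"{base}/search/?{query}"

url = build_search_url("https://www.encodeproject.org",
                       type="Experiment", assay_title="ChIP-seq", format="json")
```

Fetching such a URL would return the matching metadata records as JSON, which is how downstream pipelines typically consume the DCC's metadata.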
Multi-tissue integrative analysis of personal epigenomes
Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements and transcriptional activity across more than 25 tissues from these donors. Integrative analysis revealed over a million variants with allele-specific activity, coordinated, locus-scale allelic imbalances, and structural variants impacting proximal chromatin structure. We relate the personal genome analysis to the ENCODE encyclopedia, annotating allele- and tissue-specific elements that are strongly enriched for variants impacting expression and disease phenotypes. These experimental and statistical approaches, and the corresponding EN-TEx resource, provide a framework for personalized functional genomics
ENCODE DATA Wrangling Vignettes
Slides and video from a presentation at the Open Science Symposium at Carnegie Mellon University on October 18th, 201
Session 3: Open Tools and Platforms
The video of the four presentations and the roundtable panel in Session 3, as well as the corresponding slides
Additional file 3: Figure S3. of A streamlined tethered chromosome conformation capture protocol
Comparison between RTCC experimental data (this work; black lines), Hi-C data from Crane et al. (2015; magenta lines), and the relative representation in anti-LEM2 ChIP-chip data [28]. The curves were obtained by running the ICE pipeline [38] on our N2 dataset (N2 DpnII, GSM2041038 - SRR3105476) and on the Crane et al. dataset (GSM1556154 - SRR1665087), as implemented in [ https://bitbucket.org/mirnylab/hiclib ], downloaded on Dec 1, 2015. To obtain the first eigenvector values, representing compartments along the chromosome axis, we followed the tutorial from [ https://bitbucket.org/mirnylab/hiclib ], using the binnedData class function doEig(numPCs = 1). To inspect the correlation to LEM-2 binding compartments, we added LEM-2 binding data [28] (MA2C-normalized log2 ratio of ChIP signal over control), lifted from the ce4 genome assembly to the ce10 assembly and averaged in 50KB bins. (PDF 46 kb)
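The compartment call above (the first eigenvector from doEig(numPCs = 1)) can be illustrated on a toy contact matrix. This is a simplified stand-in for what hiclib computes, not the pipeline itself; the checkerboard example and function name are ours:

```python
import numpy as np

def first_eigenvector(contacts):
    """Leading eigenvector of the bin-by-bin correlation matrix."""
    corr = np.corrcoef(contacts)        # correlate the contact profiles of bins
    vals, vecs = np.linalg.eigh(corr)   # eigh: the matrix is symmetric
    return vecs[:, np.argmax(vals)]     # eigenvector of the largest eigenvalue

# Toy checkerboard contact map: even and odd bins form two "compartments",
# with enriched contacts within each group and depleted contacts between them.
n = 6
idx = np.arange(n)
same = (idx[:, None] % 2) == (idx[None, :] % 2)
contacts = np.where(same, 9.0, 1.0)
ev = first_eigenvector(contacts)        # sign alternates between the two groups
```

The sign of the eigenvector then partitions bins into the two compartments, which is the signal compared against LEM-2 binding in the figure.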
Additional file 2: Figure S2. of A streamlined tethered chromosome conformation capture protocol
Correlation between N2 DpnII experimental data and data from Crane et al. [22]. The 50KB chromatin contact matrix constructed using data from N2 young adults treated with the DpnII restriction enzyme (GSM2041038 - SRR3105476) was compared with the 50KB-resolution chromatin contact matrix constructed using the Crane et al. data (GSM1556154 - SRR1665087) from [22]. The total number of paired-ended 37X2 reads in our dataset was 88,466,514, while Crane et al. provide a total of 115,983,178 paired-ended 100X2 reads. To build the chromatin contact matrix we used the iterative-mapping implementation of the ICE pipeline [38] from [ https://bitbucket.org/mirnylab/hiclib ], starting from 21nt up to 37nt in increments of 8. The number of detected Hi-C valid pairs in our dataset was 18,779,498, consisting of 4,542,078 inter-chromosomal contacts and 14,237,420 intra-chromosomal contacts. In the Crane et al. dataset the number of valid Hi-C pairs was 59,200,047, consisting of 6,457,271 inter-chromosomal contacts and 52,742,776 intra-chromosomal contacts. Similarly to Additional file 1: Figure S1, any contacts between any location on chromosome I and the rRNA-containing region on chromosome I (the bin from 15,050,000 to the end of chromosome I) are colored in green. Contacts between genomic loci in adjacent regions (up to 100KB apart) are colored in purple. Versions of software used for analysis are as follows: Bowtie2-2.2.6 [39] and mirnylib/hiclib [ https://bitbucket.org/mirnylab/hiclib ], downloaded on December 1, 2015. Slight differences in aligned read counts relative to Tables 2 and 3 reflect updates to the alignment software in the current package compared with the legacy versions used for Tables 2 and 3. (PDF 1132 kb)
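Both legends rely on aggregating a per-position signal into fixed-size genomic bins (50KB here). A minimal sketch of that binning step, with toy coordinates and values (the function is ours, not hiclib code):

```python
import numpy as np

def bin_average(positions, values, bin_size):
    """Mean of `values` per bin of width `bin_size`; empty bins become NaN."""
    positions = np.asarray(positions)
    values = np.asarray(values, dtype=float)
    n_bins = int(positions.max() // bin_size) + 1
    bins = positions // bin_size                      # bin index per data point
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    with np.errstate(invalid="ignore"):
        return sums / counts                          # NaN where a bin is empty

# Toy example with a bin size of 50: positions 120 and 130 share bin 2.
means = bin_average([10, 60, 120, 130], [1.0, 3.0, 2.0, 4.0], 50)
```

The same pattern, at 50,000 bp per bin, would produce the binned LEM-2 ChIP signal and the matrix resolution used in the comparisons above.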