19 research outputs found
An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data
An annotation database (X:MAP) and BioConductor/R package (exonmap) have been developed to support fine-grained analysis of exon array data
Modulator structures for radio-on-fiber applications, Journal of Telecommunications and Information Technology, 2003, nr 1
Traditional electro-optic modulators are not optimized for bandpass applications such as radio-on-fiber delivery systems due to their limited electro-optic response. This paper presents a resonant electrode structure that can be employed in optical modulator designs to enhance the electro-optic response of the modulator over a narrow frequency band, and improve the performance of optical radio systems. A simple model of the structure is developed, and experimental results validating the model and illustrating the effectiveness of the structure are presented
SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision
Many time-consuming analyses of next -: generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics BECAUSE OF: their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker node
rnaSeqMap: a Bioconductor package for RNA sequencing data exploration
BACKGROUND: The throughput of commercially available sequencers has recently significantly increased. It has reached the point where measuring the RNA expression by the depth of coverage has become feasible even for largest genomes. The development of software tools is constantly following the progress of biological hardware. In particular, as RNA sequencing software can be regarded genome browsers, exon junction tools and statistical tools operating on counts of reads in predefined regions. The library rnaSeqMap, freely available via Bioconductor, is an RNA sequencing software which is independent of any biological hardware platform. It is based upon standard Bioconductor infrastructure for sequencing data and includes several novel features focused on deeper understanding of coverage expression profiles and discovery of novel transcription regions.
RESULTS: rnaSeqMap is a toolbox for analyses that may be performed with the use of gene annotations or alternatively, in an unsupervised mode, on any genomic region to find novel or non-standard transcripts. The data back-end may be a MySQL database or a set of files in standard BAM format. The processing in R can be run on a machine without any particular hardware requirements, and scales linearly with the number of genomic loci and number of samples analyzed. The main features of rnaSeqMap include coverage operations, discovering irreducible regions of high expression, significance search and splicing analyses with nucleotide granularity.
CONCLUSIONS: This software may be used for a range of applications related to RNA sequencing by building customized analysis pipelines. The applicability and precision is expected to increase in parallel with the progress of the genome coverage in sequencers
Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage
The informational content of RNA sequencing is currently far from being completely explored. Most of the analyses focus on processing tables of counts or finding isoform deconvolution via exon junctions. This article presents a comparison of several techniques that can be used to estimate differential expression of exons or small genomic regions of expression, based on their coverage function shapes. The problem is defined as finding the differentially expressed exons between two samples using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments have been done using synthetic data, and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator × 4 normalizations × 8 difference measures) are compared. As a result, the best analysis pipelines are selected based on linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They point out the exons with differential expression or internal splicing, even if the counts of reads may not show this. The areas of application include significant difference searches, splicing identification algorithms and finding suitable regions for QPCR primer
Comprehensive Analysis of Affymetrix Exon Arrays Using BioConductor
ISSN:1553-734XISSN:1553-735
Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage
The informational content of RNA sequencing is currently far from being completely explored. Most of the analyses focus on processing tables of counts or finding isoform deconvolution via exon junctions. This article presents a comparison of several techniques that can be used to estimate differential expression of exons or small genomic regions of expression, based on their coverage function shapes. The problem is defined as finding the differentially expressed exons between two samples using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments have been done using synthetic data, and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator × 4 normalizations × 8 difference measures) are compared. As a result, the best analysis pipelines are selected based on linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They point out the exons with differential expression or internal splicing, even if the counts of reads may not show this. The areas of application include significant difference searches, splicing identification algorithms and finding suitable regions for QPCR primers
Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations
BACKGROUND: Microarrays measure the binding of nucleotide sequences to a set of sequence specific probes. This information is combined with annotation specifying the relationship between probes and targets and used to make inferences about transcript- and, ultimately, gene expression. In some situations, a probe is capable of hybridizing to more than one transcript, in others, multiple probes can target a single sequence. These 'multiply targeted' probes can result in non-independence between measured expression levels. RESULTS: An analysis of these relationships for Affymetrix arrays considered both the extent and influence of exact matches between probe and transcript sequences. For the popular HGU133A array, approximately half of the probesets were found to interact in this way. Both real and simulated expression datasets were used to examine how these effects influenced the expression signal. It was found not only to lead to increased signal strength for the affected probesets, but the major effect is to significantly increase their correlation, even in situations when only a single probe from a probeset was involved. By building a network of probe-probeset-transcript relationships, it is possible to identify families of interacting probesets. More than 10% of the families contain members annotated to different genes or even different Unigene clusters. Within a family, a mixture of genuine biological and artefactual correlations can occur. CONCLUSION: Multiple targeting is not only prevalent, but also significant. The ability of probesets to hybridize to more than one gene product can lead to false positives when analysing gene expression. Comprehensive annotation describing multiple targeting is required when interpreting array data
Exon level integration of proteomics and microarray data
Abstract Background Previous studies comparing quantitative proteomics and microarray data have generally found poor correspondence between the two. We hypothesised that this might in part be because the different assays were targeting different parts of the expressed genome and might therefore be subjected to confounding effects from processes such as alternative splicing. Results Using a genome database as a platform for integration, we combined quantitative protein mass spectrometry with Affymetrix Exon array data at the level of individual exons. We found significantly higher degrees of correlation than have been previously observed (r = 0.808). The study was performed using cell lines in equilibrium in order to reduce a major potential source of biological variation, thus allowing the analysis to focus on the data integration methods in order to establish their performance. Conclusion We conclude that part of the variation observed when integrating microarray and proteomics data may occur as a consequence both of the data analysis and of the high granularity to which studies have until recently been limited. The approach opens up the possibility for the first time of considering combined microarray and proteomics datasets at the level of individual exons and isoforms, important given the high proportion of alternative splicing observed in the human genome.</p