PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Informed Consent to Address Trust, Control, and Privacy Concerns in User Profiling
More and more, services and products are being personalised or tailored, based on user-related data stored in so-called user profiles or user models. Although user profiling offers great benefits for both organisations and users, there are several psychological factors hindering the potential success of user profiling. The most important factors are trust, control and privacy concerns. This paper presents informed consent as a means to address the hurdles trust, control, and privacy concerns pose to user profiling.
Towards information profiling: data lake content metadata management
There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently no systematic approach to this kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.
Peer reviewed. Postprint (author's final draft)
Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review
A variety of genome-wide profiling techniques are available to probe
complementary aspects of genome structure and function. Integrative analysis of
heterogeneous data sources can reveal higher-level interactions that cannot be
detected based on individual observations. A standard integration task in
cancer studies is to identify altered genomic regions that induce changes in
the expression of the associated genes based on joint analysis of genome-wide
gene expression and copy number profiling measurements. In this review, we
provide a comparison among various modeling procedures for integrating
genome-wide profiling data of gene copy number and transcriptional alterations
and highlight common approaches to genomic data integration. A transparent
benchmarking procedure is introduced to quantitatively compare the cancer gene
prioritization performance of the alternative methods. The benchmarking
algorithms and data sets are available at http://intcomp.r-forge.r-project.org
Comment: PDF file including supplementary material. 9 pages. Preprint
Epitope profiling via mixture modeling of ranked data
We propose the use of probability models for ranked data as a useful
alternative to a quantitative data analysis to investigate the outcome of
bioassay experiments, when the preliminary choice of an appropriate
normalization method for the raw numerical responses is difficult or subject to
criticism. We review standard distance-based and multistage ranking models and
in this last context we propose an original generalization of the Plackett-Luce
model to account for the order of the ranking elicitation process. The
usefulness of the novel model is illustrated with its maximum likelihood
estimation for a real data set. Specifically, we address the heterogeneous
nature of experimental units via model-based clustering and detail the
necessary steps for a successful likelihood maximization through a hybrid
version of the Expectation-Maximization algorithm. The performance of the
mixture model using the new distribution as mixture components is compared with
that of alternative mixture models for random rankings. A discussion
on the interpretation of the identified clusters and a comparison with more
standard quantitative approaches are finally provided.
Comment: (revised to properly include references)
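As a point of reference for the ranking models mentioned in this abstract, the standard Plackett-Luce model builds an ordering by choosing items one at a time, each with probability proportional to its support parameter among the items not yet chosen. The sketch below illustrates only that standard model, not the authors' generalization (which additionally accounts for the order in which ranks are elicited). The function name `plackett_luce_prob` and the support values are illustrative assumptions.

```python
from itertools import permutations


def plackett_luce_prob(ordering, support):
    """Probability of an ordering (most preferred first) under the
    standard Plackett-Luce model with positive support parameters."""
    # Total support of the items that are still available to be chosen.
    remaining = sum(support[item] for item in ordering)
    prob = 1.0
    for item in ordering:
        # Choose the next item proportionally to its support.
        prob *= support[item] / remaining
        remaining -= support[item]
    return prob


# Hypothetical support parameters for three items.
support = {"A": 3.0, "B": 2.0, "C": 1.0}

# P(A > B > C) = 3/6 * 2/3 * 1/1
p_abc = plackett_luce_prob(("A", "B", "C"), support)

# The probabilities of all 3! orderings should sum to one.
total = sum(plackett_luce_prob(p, support) for p in permutations(support))
```

A mixture of such components, as described in the abstract, would weight several of these distributions with different support vectors; fitting the weights and supports by Expectation-Maximization is beyond this sketch.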
Coordinating views for data visualisation and algorithmic profiling
A number of researchers have designed visualisation systems that consist of multiple components, through which data and interaction commands flow. Such multistage (hybrid) models can be used to reduce algorithmic complexity, and to open up intermediate stages of algorithms for inspection and steering. In this paper, we present work on aiding the developer and the user of such algorithms through the application of interactive visualisation techniques. We present a set of tools designed to profile the performance of other visualisation components, and provide further functionality for the exploration of high dimensional data sets. Case studies are provided, illustrating the application of the profiling modules to a number of data sets. Through this work we are exploring ways in which techniques traditionally used to prepare for visualisation runs, and to retrospectively analyse them, can find new uses within the context of a multi-component visualisation system.
Data base for the Colorado profiling network
The Colorado profiling system developed by the Wave Propagation Laboratory (WPL) includes five (soon to be six) Doppler radar wind Profilers; four operate at 49 MHz (6 m) and are located at Platteville, Fleming, Lay Creek, and Cahone, and one operates at 915 MHz (33 cm) and is located at Denver. The sixth radar, now under construction, will operate at 405 MHz (UHF) and will be located at Boulder. Microwave radiometers and surface meteorological stations are at some of the radar sites. The data base for the wind Profilers is discussed.
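The wavelengths quoted in parentheses follow from the relation wavelength = c / frequency. A minimal sketch of that arithmetic, assuming free-space propagation; the function name `wavelength_m` is illustrative, and the 405 MHz wavelength is computed here rather than stated in the abstract.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_m(freq_mhz):
    """Free-space wavelength in metres for a frequency given in MHz."""
    return C / (freq_mhz * 1e6)


# Operating frequencies named in the abstract, in MHz.
wavelengths = {f: wavelength_m(f) for f in (49, 405, 915)}

for freq, lam in wavelengths.items():
    print(f"{freq} MHz -> {lam:.3f} m")
```

The results are about 6.1 m at 49 MHz and about 0.33 m at 915 MHz, consistent with the rounded figures in the abstract, and about 0.74 m at 405 MHz.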