SIG-DB: leveraging homomorphic encryption to Securely Interrogate privately held Genomic DataBases
Genomic data are becoming increasingly valuable as we develop methods to
utilize the information at scale and gain a greater understanding of how
genetic information relates to biological function. Advances in synthetic
biology and the decreased cost of sequencing are increasing the amount of
privately held genomic data. As the quantity and value of private genomic data
grows, so does the incentive to acquire and protect such data, which creates a
need to store and process these data securely. We present an algorithm for the
Secure Interrogation of Genomic DataBases (SIG-DB). The SIG-DB algorithm
enables databases of genomic sequences to be searched with an encrypted query
sequence without revealing the query sequence to the Database Owner or any of
the database sequences to the Querier. SIG-DB is the first application of its
kind to take advantage of locality-sensitive hashing and homomorphic encryption
to allow generalized sequence-to-sequence comparisons of genomic data.
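As a rough illustration of the locality-sensitive hashing ingredient, the sketch below computes MinHash signatures over k-mer sets and estimates their Jaccard similarity. The homomorphic-encryption layer described in the abstract is omitted, and the function names and parameters are hypothetical rather than part of SIG-DB.

```python
# Minimal MinHash sketch of the locality-sensitive hashing step: summarize a
# genomic sequence by its k-mer set, then compare signatures instead of raw
# sequences.  The encryption layer from the abstract is omitted; all names and
# parameters here are illustrative.
import hashlib
import random

def kmers(seq, k=8):
    """Return the set of all overlapping k-mers of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash_signature(seq, num_hashes=64, k=8, seed=42):
    """For each of num_hashes salted hash functions, keep the minimum hash
    value over the sequence's k-mers."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    sig = []
    for salt in salts:
        best = min(
            int.from_bytes(hashlib.sha1(f"{salt}:{km}".encode()).digest()[:8], "big")
            for km in kmers(seq, k)
        )
        sig.append(best)
    return sig

def signature_similarity(sig_a, sig_b):
    """Fraction of matching slots; approximates the Jaccard similarity of the
    underlying k-mer sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two closely related sequences score higher than unrelated ones would.
q = "ACGTACGTGGCCATATCGGCTTAACGT"
d = "ACGTACGTGGCCATATCGGCTTAACGA"
print(signature_similarity(minhash_signature(q), minhash_signature(d)))
```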
A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data
Many interesting data sets available on the Internet are of a medium size: too
big to fit into a personal computer's memory, but not so large that
they won't fit comfortably on its hard disk. In the coming years, data sets of
this magnitude will inform vital research in a wide array of application
domains. However, due to a variety of constraints they are cumbersome to
ingest, wrangle, analyze, and share in a reproducible fashion. These
obstructions hamper thorough peer-review and thus disrupt the forward progress
of science. We propose a predictable and pipeable framework for R (the
state-of-the-art statistical computing environment) that leverages SQL (the
venerable database architecture and query language) to make reproducible
research on medium data a painless reality.
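The abstract describes an R framework; as a language-agnostic illustration of the extract-transform-load pattern it advocates, the hypothetical Python/SQLite sketch below stages a larger-than-memory CSV file into an on-disk SQL database in bounded chunks so that later analyses can query it reproducibly. The file, table, and column names are invented for the example.

```python
# Conceptual extract-transform-load pipeline for "medium data": stream a CSV
# into an on-disk SQL database in chunks, so downstream analyses query only
# the rows they need.  File, table, and column names are hypothetical.
import csv
import sqlite3

def etl(csv_path="flights.csv", db_path="medium_data.sqlite", chunk=10_000):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS flights (year INTEGER, carrier TEXT, delay REAL)")
    with open(csv_path, newline="") as f:
        rows, reader = [], csv.DictReader(f)
        for rec in reader:                          # extract
            rows.append((int(rec["year"]),          # transform: coerce types
                         rec["carrier"],
                         float(rec["delay"] or 0.0)))
            if len(rows) >= chunk:                  # load in bounded batches
                con.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
                rows.clear()
        if rows:
            con.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
    con.commit()
    return con

# Downstream, reproducible analyses run SQL against the staged database, e.g.
# con = etl(); con.execute("SELECT carrier, AVG(delay) FROM flights GROUP BY carrier")
```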
A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer
Recently, several classifiers that combine primary tumor data, like gene
expression data, and secondary data sources, such as protein-protein
interaction networks, have been proposed for predicting outcome in breast
cancer. In these approaches, new composite features are typically constructed
by aggregating the expression levels of several genes. The secondary data
sources are employed to guide this aggregation. Although many studies claim
that these approaches improve classification performance over single gene
classifiers, the gain in performance is difficult to assess. This stems mainly
from the fact that different breast cancer data sets and validation procedures
are employed to assess the performance. Here we address these issues by
employing a large cohort of six breast cancer data sets as a benchmark set and by
performing an unbiased evaluation of the classification accuracies of the
different approaches. Contrary to previous claims, we find that composite
feature classifiers do not outperform simple single gene classifiers. We
investigate the effect of (1) the number of selected features; (2) the specific
gene set from which features are selected; (3) the size of the training set and
(4) the heterogeneity of the data set on the performance of composite feature
and single gene classifiers. Strikingly, we find that randomization of
secondary data sources, which destroys all biological information in these
sources, does not result in a deterioration in performance of composite feature
classifiers. Finally, we show that when a proper correction for gene set size
is performed, the stability of single gene sets is similar to the stability of
composite feature sets. Based on these results, there is currently no reason to
prefer prognostic classifiers based on composite features over single gene
classifiers for predicting outcome in breast cancer.
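To make the comparison concrete, the hypothetical sketch below builds composite features by averaging expression over each gene's network neighbourhood and scores them against single-gene features with the same cross-validated classifier. The expression data and network are synthetic stand-ins, not the benchmark used in the study.

```python
# Illustrative comparison: single-gene features versus composite features
# formed by averaging expression over each gene's network neighbourhood,
# evaluated with the same cross-validated classifier.  All data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
X = rng.normal(size=(n_samples, n_genes))                  # gene expression matrix
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_samples) > 0).astype(int)  # outcome

# Toy protein-protein interaction network: four random neighbours per gene.
neighbours = {g: rng.choice(n_genes, size=4, replace=False) for g in range(n_genes)}

# Composite feature for gene g: mean expression over g and its neighbours.
X_composite = np.column_stack(
    [X[:, np.append(nbrs, g)].mean(axis=1) for g, nbrs in neighbours.items()]
)

clf = LogisticRegression(max_iter=1000)
print("single-gene AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
print("composite AUC:  ", cross_val_score(clf, X_composite, y, cv=5, scoring="roc_auc").mean())
```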
Experimental characterization of a 400 Gbit/s orbital angular momentum multiplexed free-space optical link over 120 m
We experimentally demonstrate and characterize the performance of a 400-Gbit/s orbital angular momentum (OAM) multiplexed free-space optical link over 120 meters on the roof of a building. Four OAM beams, each carrying a 100-Gbit/s QPSK channel, are multiplexed and transmitted. We investigate the influence of channel impairments on the received power, inter-modal crosstalk among channels, and system power penalties. Without laser tracking and compensation systems, the measured received power and crosstalk among OAM channels fluctuate by 4.5 dB and 5 dB, respectively, over 180 seconds. For a beam displacement of 2 mm, which corresponds to a pointing error of less than 16.7 μrad, the link bit-error rates are below the forward error correction threshold of 3.8×10⁻³ for all channels. Both experimental and simulation results show that power penalties increase rapidly as the displacement increases.
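The quoted pointing error follows from simple small-angle geometry, as the short check below illustrates (values taken from the abstract).

```python
# Pointing-error geometry: a lateral beam displacement d over a link of
# length L corresponds to an angular error of roughly d / L (small angles).
link_length_m = 120.0
displacement_m = 2e-3            # 2 mm displacement at the receiver

pointing_error_urad = displacement_m / link_length_m * 1e6
print(f"pointing error ≈ {pointing_error_urad:.1f} µrad")   # ≈ 16.7 µrad
```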
Efficacy of Interaction among College Students in a Web-Based Environment
In order to investigate the efficacy of interaction among college students in a Web-based learning environment, three interactive tools (discussion board, e-mail, and online chat) were evaluated regarding the level of interaction and tool preference among a diverse group of college students in terms of age, gender, and online learning experience. A survey instrument was developed and used to assess and encourage interactive qualities in distance courses. A four-factor split-plot ANOVA was applied to analyze the data. The survey’s questions were repeated across each of the three tools in order to determine interaction efficacy levels in a Web-based environment. Discussion board, e-mail, and online chat each had statistically significant interactions with one another across four different factors: Instructional Design, Instructor Engagement, Learner Engagement, and Tool Preference. E-mail was the most preferred method of interaction, particularly among younger students. Implications for practice and research are discussed.
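As a simplified, hypothetical illustration of the repeated-measures portion of such an analysis (tool as the within-subject factor), one might fit a one-way repeated-measures ANOVA as below; the study itself used a four-factor split-plot design, and the data here are synthetic.

```python
# Simplified illustration of the repeated-measures part of the analysis: each
# student rates interaction efficacy for the three tools, and a one-way
# repeated-measures ANOVA tests for differences among tools.  Synthetic data;
# the actual study used a four-factor split-plot design.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
students = np.repeat(np.arange(40), 3)
tool = np.tile(["discussion_board", "email", "online_chat"], 40)
score = rng.normal(loc=[3.2, 3.8, 3.0] * 40, scale=0.6)   # e-mail rated slightly higher

df = pd.DataFrame({"student": students, "tool": tool, "score": score})
print(AnovaRM(df, depvar="score", subject="student", within=["tool"]).fit())
```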
INTERACTING WITH LOCAL AND REMOTE DATA REPOSITORIES USING THE stashR PACKAGE
The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database as well as to synchronize local copies of the data to the remote version of the database. Users efficiently access information from a remote database by retrieving only the data files indexed by user-specified keys and caching this data in a local copy of the remote database.
The local and remote counterparts of the stashR package offer the potential to enhance reproducible research by allowing users of Sweave to cache their R computations for a research paper in a localDB database. This database can then be stored on the Internet as a remoteDB database. When readers of the research paper wish to reproduce the computations involved in creating a specific figure or calculating a specific numeric value, they can access the remoteDB database and obtain the R objects involved in the computation.
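The sketch below illustrates the key-value-with-caching idea in Python; stashR itself is an R package built around the localDB and remoteDB S4 classes, so the class and method names here are hypothetical analogues rather than the package's API.

```python
# Conceptual illustration of the stashR idea: values stored under string keys,
# with a "remote" repository read only for keys not already cached locally.
# stashR is an R package; these Python names are hypothetical analogues.
import pickle
from pathlib import Path

class LocalDB:
    """A simple on-disk key-value store: one pickle file per key."""
    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def insert(self, key, value):
        (self.dir / key).write_bytes(pickle.dumps(value))

    def retrieve(self, key):
        return pickle.loads((self.dir / key).read_bytes())

    def delete(self, key):
        (self.dir / key).unlink()

class RemoteDB(LocalDB):
    """Reads hit a local cache first; misses are fetched from the remote
    repository (modelled here as another LocalDB) and cached."""
    def __init__(self, remote, cache_dir):
        super().__init__(cache_dir)
        self.remote = remote

    def retrieve(self, key):
        if not (self.dir / key).exists():          # cache miss: fetch once
            self.insert(key, self.remote.retrieve(key))
        return super().retrieve(key)

# Usage sketch: cache a computation for a paper, then re-read it elsewhere.
repo = LocalDB("paper_repo"); repo.insert("figure1_fit", {"slope": 2.3})
reader = RemoteDB(repo, "local_cache"); print(reader.retrieve("figure1_fit"))
```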
Towards a Soil Information System with quantified accuracy: a prototype for mapping continuous soil properties
This report describes the potential and functionality of software for spatial analysis, prediction and stochastic simulation of continuous soil properties using data from the Dutch Soil Information System (BIS). A geostatistical framework and R codes were developed. The geostatistical model of a soil property has a deterministic component representing the mean value within a soil category, and a stochastic component of standardized residuals. The standardized residuals are interpolated or simulated based on the simple kriging system. The software was tested in four case studies: exchangeable soil pH, clay content, organic matter content and Mean Spring Water table depth (MSW). It is concluded that the geostatistical framework and R codes developed in this study make it possible to predict values of continuous soil properties spatially, and to quantify the inaccuracy of these predictions. The inaccuracy of a spatial prediction at a certain location is quantified by the kriging variance, which can be interpreted as an indication of the uncertainty about the true value.
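A minimal simple-kriging sketch, assuming an exponential covariance and a known mean of zero for the standardized residuals, shows how both the prediction and the kriging variance arise; the covariance parameters and sample points are illustrative, and the report's own implementation is in R.

```python
# Minimal simple-kriging sketch: predict a standardized residual (mean zero)
# at a new location and report the kriging variance as the uncertainty.
# Covariance parameters and sample points are illustrative only.
import numpy as np

def exp_cov(h, sill=1.0, rng_param=500.0):
    """Exponential covariance as a function of separation distance h (m)."""
    return sill * np.exp(-h / rng_param)

def simple_krige(coords, residuals, target, sill=1.0, rng_param=500.0):
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_tgt = np.linalg.norm(coords - target, axis=1)
    C = exp_cov(d_obs, sill, rng_param)           # covariance among observations
    c0 = exp_cov(d_tgt, sill, rng_param)          # covariance to the target point
    weights = np.linalg.solve(C, c0)              # simple kriging weights
    prediction = weights @ residuals              # known mean of zero assumed
    variance = sill - weights @ c0                # kriging (error) variance
    return prediction, variance

coords = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 400.0]])
residuals = np.array([0.6, -0.2, 0.1])
print(simple_krige(coords, residuals, np.array([150.0, 150.0])))
```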
Simulation and analyses of the aeroassist flight experiment attitude update method
A method which will be used to update the alignment of the Aeroassist Flight Experiment's Inertial Measuring Unit is simulated and analyzed. This method, the Star Line Maneuver, uses measurements from the Space Shuttle Orbiter star trackers along with an extended Kalman filter to estimate a correction to the attitude quaternion maintained by an Inertial Measuring Unit in the Orbiter's payload bay. This quaternion is corrupted by on-orbit bending of the Orbiter payload bay with respect to the Orbiter navigation base, which is incorporated into the payload quaternion when it is initialized via a direct transfer of the Orbiter attitude state. The method of updating this quaternion is examined through verification of baseline cases and Monte Carlo analysis using a simplified simulation. The simulation uses nominal state dynamics and measurement models from the Kalman filter as its real world models, and is programmed on a MicroVAX minicomputer using Matlab, an interactive matrix analysis tool. Results are presented which confirm and augment previous performance studies, thereby enhancing confidence in the Star Line Maneuver design methodology.
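A heavily simplified sketch of the underlying idea, with a single scalar misalignment angle standing in for the full quaternion error state, shows how repeated star-tracker measurements shrink the attitude uncertainty; the noise levels and measurements are invented, and the original study was implemented in Matlab.

```python
# Simplified sketch of the attitude-update idea: treat the small misalignment
# between the IMU quaternion and truth as the filter state, with each star
# sighting providing a noisy measurement of it.  The flight method uses a full
# extended Kalman filter on quaternions; values here are illustrative only.
import numpy as np

def kalman_update(x, P, z, R):
    """One scalar Kalman measurement update: state x, variance P,
    measurement z with noise variance R (measurement model H = 1)."""
    K = P / (P + R)                 # Kalman gain
    x = x + K * (z - x)             # corrected misalignment estimate
    P = (1.0 - K) * P               # reduced uncertainty
    return x, P

rng = np.random.default_rng(3)
true_misalignment = 2.0e-3          # rad, from payload-bay bending (illustrative)
x, P, R = 0.0, (5.0e-3) ** 2, (0.5e-3) ** 2

for _ in range(5):                  # five star-tracker sightings
    z = true_misalignment + rng.normal(scale=np.sqrt(R))
    x, P = kalman_update(x, P, z, R)

print(f"estimated misalignment: {x:.2e} rad, 1-sigma: {np.sqrt(P):.2e} rad")
```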