
    SIG-DB: leveraging homomorphic encryption to Securely Interrogate privately held Genomic DataBases

    Genomic data are becoming increasingly valuable as we develop methods to use the information at scale and gain a greater understanding of how genetic information relates to biological function. Advances in synthetic biology and the decreasing cost of sequencing are increasing the amount of privately held genomic data. As the quantity and value of private genomic data grow, so does the incentive to acquire and protect such data, which creates a need to store and process these data securely. We present an algorithm for the Secure Interrogation of Genomic DataBases (SIG-DB). The SIG-DB algorithm enables databases of genomic sequences to be searched with an encrypted query sequence without revealing the query sequence to the Database Owner or any of the database sequences to the Querier. SIG-DB is the first application of its kind to take advantage of locality-sensitive hashing and homomorphic encryption to allow generalized sequence-to-sequence comparisons of genomic data.
    Comment: 38 pages, 3 figures, 4 tables, 1 supplemental table, 7 supplemental figures
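
    SIG-DB combines locality-sensitive hashing with homomorphic encryption, and the abstract does not say which hashing scheme it uses. As a hedged illustration of the locality-sensitive hashing half only, the Python sketch below uses MinHash over k-mers, an assumed stand-in scheme and not the SIG-DB algorithm: similar sequences share many k-mers, so their signatures agree in many positions. No encryption is performed here; in SIG-DB the comparison is additionally carried out without revealing the query or the database sequences. The helper names, k = 4, and 64 hash functions are illustrative choices.

```python
# Hypothetical sketch: MinHash locality-sensitive hashing on k-mers.
# This is NOT SIG-DB itself and performs no homomorphic encryption.
import hashlib


def kmers(seq: str, k: int) -> set:
    """Set of overlapping length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed: int, kmer: str) -> int:
    # Seeded 64-bit hash; each seed plays the role of one independent hash function.
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(seq: str, k: int = 4, num_hashes: int = 64) -> list:
    """For each hash function, keep the minimum hash value over the sequence's k-mers."""
    shingles = kmers(seq, k)
    return [min(_hash(seed, s) for s in shingles) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots: an estimate of the k-mer Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

    Each slot matches with probability equal to the Jaccard similarity of the two k-mer sets, so more hash functions give a tighter estimate at the cost of longer signatures.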

    A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data

    Many interesting data sets available on the Internet are of a medium size: too big to fit into a personal computer's memory, but not so large that they won't fit comfortably on its hard disk. In the coming years, data sets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints, they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) that leverages SQL (the venerable database architecture and query language) to make reproducible research on medium data a painless reality.
    Comment: 30 pages, plus supplementary material
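
    The proposed framework is an R package, and the abstract does not name its functions. To show only the general idea behind it, keeping a medium-sized file out of memory by loading it into a SQL database in chunks and then querying it there, here is a minimal Python sketch using the standard-library sqlite3 and csv modules. The function load_csv_in_chunks is hypothetical and is not the framework's API.

```python
# Hypothetical sketch of chunked extract-and-load into SQL (not the R framework's API).
import csv
import sqlite3
from itertools import islice
from typing import TextIO


def load_csv_in_chunks(conn: sqlite3.Connection, csv_file: TextIO,
                       table: str, chunk_size: int = 1000) -> int:
    """Stream rows from a CSV file into a SQLite table, chunk_size rows at a time.

    Only one chunk is held in memory, so the file may be larger than RAM.
    Column and table names are assumed to be trusted (they are quoted, not sanitized).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    n_rows = 0
    while True:
        chunk = list(islice(reader, chunk_size))  # next chunk_size rows, fewer at the end
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        n_rows += len(chunk)
    conn.commit()
    return n_rows
```

    Once the rows are in the database, later steps can run SQL queries (filters, joins, GROUP BY aggregates) against the table instead of re-reading the raw file into memory.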

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set; and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.
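
    The composite features described above aggregate the expression levels of several genes, with the secondary data source deciding which genes belong together. The minimal Python sketch below assumes mean-of-z-scores aggregation, one common choice; the evaluated classifiers differ, and the abstract does not specify their aggregation. It also includes a randomization control in the spirit of the one described, as an assumed simplification: each gene set keeps its size, but its members are drawn at random, which removes any biological grouping. All function names are hypothetical.

```python
# Hypothetical sketch: gene-set composite features and a size-preserving randomization control.
import numpy as np


def zscore_genes(expr: np.ndarray) -> np.ndarray:
    """Standardize each gene (column) to mean 0 and unit standard deviation across samples."""
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (expr - mu) / sd


def composite_features(expr, gene_names, gene_sets):
    """One composite feature per gene set: the mean z-scored expression of its member genes."""
    z = zscore_genes(np.asarray(expr, dtype=float))
    index = {g: i for i, g in enumerate(gene_names)}
    features = []
    for genes in gene_sets:
        cols = [index[g] for g in genes if g in index]
        if not cols:
            raise ValueError(f"no measured genes in set {genes!r}")
        features.append(z[:, cols].mean(axis=1))
    return np.column_stack(features)  # shape: (n_samples, n_gene_sets)


def randomize_gene_sets(gene_sets, gene_names, rng):
    """Control: keep each set's size but draw its members at random from all genes."""
    return [list(rng.choice(gene_names, size=len(s), replace=False)) for s in gene_sets]
```

    Training the same classifier on features from the original and the randomized sets is one way to probe whether the grouping itself carries predictive information.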

    Experimental characterization of a 400 Gbit/s orbital angular momentum multiplexed free-space optical link over 120 m

    We experimentally demonstrate and characterize the performance of a 400-Gbit/s orbital angular momentum (OAM) multiplexed free-space optical link over 120 meters on the roof of a building. Four OAM beams, each carrying a 100-Gbit/s QPSK channel, are multiplexed and transmitted. We investigate the influence of channel impairments on the received power, inter-modal crosstalk among channels, and system power penalties. Without laser tracking and compensation systems, the measured received power and crosstalk among OAM channels fluctuate by 4.5 dB and 5 dB, respectively, over 180 seconds. For a beam displacement of 2 mm, which corresponds to a pointing error of less than 16.7 μrad, the link bit error rates are below the forward error correction threshold of 3.8×10⁻³ for all channels. Both experimental and simulation results show that power penalties increase rapidly as the displacement increases.
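
    Several figures in the abstract can be cross-checked with simple arithmetic. The aggregate rate is four channels times 100 Gbit/s, and, under the small-angle approximation θ ≈ d/L and assuming the 2 mm displacement is taken over the full 120 m link, the pointing error is 2×10⁻³ m / 120 m ≈ 16.7 μrad. The sketch also converts the 4.5 dB received-power fluctuation into a linear power ratio. Function names are illustrative.

```python
# Illustrative arithmetic for the link figures quoted in the abstract.


def pointing_error_urad(displacement_m: float, distance_m: float) -> float:
    """Small-angle pointing error in microradians for a lateral displacement at a given range."""
    return displacement_m / distance_m * 1e6


def db_to_ratio(db: float) -> float:
    """Convert a power change in dB to a linear power ratio."""
    return 10 ** (db / 10)


aggregate_gbps = 4 * 100                 # four OAM beams, 100 Gbit/s QPSK each
theta = pointing_error_urad(2e-3, 120)   # 2 mm displacement over the 120 m link (assumed)
swing = db_to_ratio(4.5)                 # 4.5 dB received-power fluctuation
print(aggregate_gbps, round(theta, 1), round(swing, 2))  # 400 16.7 2.82
```

    A 4.5 dB swing therefore means the received power varies by a factor of roughly 2.8 over the 180 s measurement.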

    Efficacy of Interaction among College Students in a Web-Based Environment

    In order to investigate the efficacy of interaction among college students in a Web-based learning environment, three interactive tools (discussion board, e-mail, and online chat) were evaluated for level of interaction and tool preference among a group of college students diverse in age, gender, and online learning experience. A survey instrument was developed and used to assess and encourage interactive qualities in distance courses. A four-factor split-plot ANOVA was applied to analyze the data. The survey's questions were repeated for each of the three tools in order to determine interaction efficacy levels in a Web-based environment. Discussion board, e-mail, and online chat each had statistically significant interactions with one another across four different factors: Instructional Design, Instructor Engagement, Learner Engagement, and Tool Preference. E-mail was the most preferred method of interaction, particularly among younger students. Implications for practice and research are discussed.

    INTERACTING WITH LOCAL AND REMOTE DATA REPOSITORIES USING THE stashR PACKAGE

    The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database, as well as to synchronize local copies of the data with the remote version of the database. Users efficiently access information from a remote database by retrieving only the data files indexed by user-specified keys and caching these data in a local copy of the remote database. The local and remote counterparts of the stashR package offer the potential to enhance reproducible research by allowing users of Sweave to cache their R computations for a research paper in a localDB database. This database can then be stored on the Internet as a remoteDB database. When readers of the research paper wish to reproduce the computations involved in creating a specific figure or calculating a specific numeric value, they can access the remoteDB database and obtain the R objects involved in the computation.
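
    stashR itself is an R package built on S4 classes. The Python below is only a hypothetical analog of the localDB/remoteDB idea described above: character string keys mapped to data values stored on disk, with a remote store read on demand and cached locally. The class and method names are illustrative and are not stashR's API, and the remote side is modeled as an arbitrary callable (for instance, a function that downloads one key) rather than real HTTP access.

```python
# Hypothetical analog of a local/remote key-value stash (not stashR's actual API).
import hashlib
import os
import pickle
from typing import Any, Callable


class LocalStash:
    """Directory-backed key-value store: one pickle file per key."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so any character string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest)

    def __contains__(self, key: str) -> bool:
        return os.path.exists(self._path(key))

    def insert(self, key: str, value: Any) -> None:
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def fetch(self, key: str) -> Any:
        with open(self._path(key), "rb") as f:
            return pickle.load(f)

    def delete(self, key: str) -> None:
        os.remove(self._path(key))


class CachedRemoteStash:
    """Read-through cache: only requested keys are fetched remotely, then kept locally."""

    def __init__(self, fetch_remote: Callable[[str], Any], cache_dir: str) -> None:
        self.fetch_remote = fetch_remote  # hypothetical: retrieves one key's data from the remote store
        self.cache = LocalStash(cache_dir)

    def fetch(self, key: str) -> Any:
        if key not in self.cache:
            self.cache.insert(key, self.fetch_remote(key))
        return self.cache.fetch(key)
```

    Hashing keys into file names is a design choice of this sketch; it trades readable file names for keys that may contain any characters.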

    Towards a Soil Information System with quantified accuracy: a prototype for mapping continuous soil properties

    This report describes the potential and functionality of software for spatial analysis, prediction and stochastic simulation of continuous soil properties using data from the Dutch Soil Information System (BIS). A geostatistical framework and R code were developed. The geostatistical model of a soil property has a deterministic component representing the mean value within a soil category, and a stochastic component of standardized residuals. The standardized residuals are interpolated or simulated based on the simple kriging system. The software was tested in four case studies: exchangeable soil pH, clay content, organic matter content and Mean Spring Water table depth (MSW). It is concluded that the geostatistical framework and R code developed in this study make it possible to predict continuous soil properties spatially and to quantify the inaccuracy of these predictions. The inaccuracy of a spatial prediction at a given location is quantified by the kriging variance, which can be interpreted as an indication of the uncertainty about the true value.
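
    The model above splits a soil property into a category mean and standardized residuals that are interpolated by simple kriging. The following NumPy sketch illustrates that pipeline under stated assumptions: the exponential covariance model, its range parameter, and the back-transformation z = m + s·r with a per-category standard deviation s are illustrative choices, not details given in the abstract, and the report's own code is in R. Simple kriging with a known residual mean of zero solves C λ = c₀ for the weights λ, predicts λᵀr, and reports the kriging variance C(0) − λᵀc₀.

```python
# Hypothetical sketch: simple kriging of standardized residuals plus a category mean.
import numpy as np


def exp_cov(h, sill=1.0, range_param=100.0):
    """Assumed exponential covariance model C(h) = sill * exp(-h / range_param)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / range_param)


def simple_kriging(coords, residuals, target, sill=1.0, range_param=100.0):
    """Simple kriging with known mean 0; returns (estimate, kriging variance)."""
    coords = np.asarray(coords, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    target = np.asarray(target, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = exp_cov(dists, sill, range_param)                                   # data-to-data
    c0 = exp_cov(np.linalg.norm(coords - target, axis=1), sill, range_param)  # data-to-target
    weights = np.linalg.solve(C, c0)
    estimate = weights @ residuals      # known mean is 0, so no mean term is added
    variance = sill - weights @ c0      # C(0) minus the explained covariance
    return estimate, variance


def predict_soil_property(category_mean, category_sd, coords, std_residuals, target, **kw):
    """Assumed back-transform: category mean plus the scaled kriged residual."""
    r_hat, r_var = simple_kriging(coords, std_residuals, target, **kw)
    return category_mean + category_sd * r_hat, category_sd ** 2 * r_var
```

    At an observed location the prediction reproduces the observation with zero kriging variance, and far from all observations it falls back to the category mean with the full variance, consistent with reading the kriging variance as uncertainty about the true value.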

    Simulation and analyses of the aeroassist flight experiment attitude update method

    A method that will be used to update the alignment of the Aeroassist Flight Experiment's Inertial Measuring Unit is simulated and analyzed. This method, the Star Line Maneuver, uses measurements from the Space Shuttle Orbiter star trackers along with an extended Kalman filter to estimate a correction to the attitude quaternion maintained by an Inertial Measuring Unit in the Orbiter's payload bay. This quaternion is corrupted by on-orbit bending of the Orbiter payload bay with respect to the Orbiter navigation base, which is incorporated into the payload quaternion when it is initialized via a direct transfer of the Orbiter attitude state. The method of updating this quaternion is examined through verification of baseline cases and Monte Carlo analysis using a simplified simulation. The simulation uses nominal state dynamics and measurement models from the Kalman filter as its real-world models, and is programmed on a MicroVAX minicomputer using MATLAB, an interactive matrix analysis tool. Results are presented which confirm and augment previous performance studies, thereby enhancing confidence in the Star Line Maneuver design methodology.
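
    The abstract does not give the filter's state vector or quaternion convention. As a hedged sketch of how an estimated attitude correction is commonly applied to a quaternion, the Python code below assumes scalar-first Hamilton quaternions and a small rotation-vector error δθ (radians) produced by the filter, turns it into a correction quaternion δq ≈ [1, δθ/2], and multiplies it onto the stored attitude. This is a generic multiplicative-update pattern, not the Star Line Maneuver's documented formulation, and the original simulation was written in MATLAB rather than Python.

```python
# Hypothetical sketch: applying a small-angle correction to a scalar-first quaternion.
import numpy as np


def quat_multiply(p, q):
    """Hamilton product p ⊗ q of scalar-first quaternions [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])


def apply_attitude_correction(q, delta_theta):
    """Rotate attitude q by the small rotation vector delta_theta (radians).

    Left multiplication assumes delta_theta is expressed in the reference frame;
    an error expressed in the body frame would multiply on the right instead.
    """
    dq = np.concatenate(([1.0], 0.5 * np.asarray(delta_theta, dtype=float)))
    dq /= np.linalg.norm(dq)           # first-order correction quaternion, renormalized
    q_new = quat_multiply(dq, q)
    return q_new / np.linalg.norm(q_new)  # keep the stored attitude a unit quaternion
```

    Renormalizing after each update keeps numerical drift from accumulating in the stored quaternion across repeated corrections.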