13 research outputs found

    The Genomic HyperBrowser: an analysis web server for genome-scale data

    The immense increase in availability of genome-scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers face the challenge of how best to analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. By providing several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser enables a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.
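
    As a rough illustration of the genomic-track representation described above (a minimal sketch, not code from the HyperBrowser itself), the following Python snippet stores two hypothetical tracks as (chromosome, start, end) intervals and counts the base pairs where they overlap; all names and coordinates are made up.

        # Minimal sketch: a genomic track as a list of (chrom, start, end)
        # intervals on a reference assembly (hypothetical data).

        def overlap_bp(track_a, track_b):
            """Count base pairs where intervals from the two tracks overlap."""
            total = 0
            for chrom_a, start_a, end_a in track_a:
                for chrom_b, start_b, end_b in track_b:
                    if chrom_a == chrom_b:
                        total += max(0, min(end_a, end_b) - max(start_a, start_b))
            return total

        # Hypothetical tracks, e.g. ChIP-seq peaks and gene promoters on hg38.
        peaks = [("chr1", 1000, 1500), ("chr1", 4000, 4300)]
        promoters = [("chr1", 1200, 1700), ("chr2", 500, 900)]
        print(overlap_bp(peaks, promoters))  # -> 300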

    Regression

    Regression is a statistical approach for modelling the relationship between a response variable y and one or several explanatory variables x. Various types of regression methods are extensively applied to the analysis of data from virtually all fields of quantitative research. For example, multiple linear regression, logistic regression, and Cox proportional hazards models have been the basic statistical tools of medical research for decades. In the last 20–30 years, the regression toolbox has been extended with numerous additions, such as generalized additive models, regression methods for repeated measurements, and regression methods for high-dimensional data.
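
    As a minimal, self-contained illustration of the basic setup described above (simulated data, not tied to any particular study), the Python snippet below fits a multiple linear regression by ordinary least squares with NumPy.

        import numpy as np

        # Simulate data from a known linear model: y = 2 + 1.5*x1 - 0.8*x2 + noise.
        rng = np.random.default_rng(0)
        n = 200
        X = rng.normal(size=(n, 2))
        y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)

        # Ordinary least squares: add an intercept column and solve min ||y - Xb||^2.
        X_design = np.column_stack([np.ones(n), X])
        beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        print(beta)  # close to [2.0, 1.5, -0.8]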

    Additive monotone regression in high and lower dimensions


    Partially linear monotone methods with automatic variable selection and monotonicity direction discovery

    In many statistical regression and prediction problems, it is reasonable to assume monotone relationships between certain predictor variables and the outcome. Genomic effects on phenotypes, for instance, are often assumed to be monotone. In some settings it may further be reasonable to assume a partially linear model, in which some of the covariates have a linear effect. One example is a prediction model combining high-dimensional gene expression data with low-dimensional clinical data, or combining continuous and categorical covariates. We study methods for fitting the partially linear monotone model, where some covariates are assumed to have a linear effect on the response and others a monotone (potentially nonlinear) effect. Most existing methods for fitting such models must be given the monotonicity directions of the monotone effects a priori. Here we present methods for fitting partially linear monotone models that perform both automatic variable selection and monotonicity direction discovery. The proposed methods perform comparably to, or better than, existing methods in terms of estimation, prediction, and variable selection performance, in simulation experiments in both classical and high-dimensional data settings.
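
    The authors' estimators are not reproduced here, but the Python sketch below illustrates the model class on simulated data with a naive backfitting loop: a least-squares fit for the linear covariate alternated with scikit-learn's IsotonicRegression for the monotone one, where increasing='auto' discovers the monotonicity direction from the data.

        import numpy as np
        from sklearn.isotonic import IsotonicRegression

        # Simulated partially linear monotone model:
        # y = 1.2*z + g(x) + noise, with g monotone decreasing.
        rng = np.random.default_rng(1)
        n = 300
        z = rng.normal(size=n)          # covariate with a linear effect
        x = rng.uniform(-2, 2, size=n)  # covariate with a monotone effect
        y = 1.2 * z - 3.0 / (1.0 + np.exp(-x)) + rng.normal(scale=0.3, size=n)

        # Naive backfitting: alternate linear and isotonic fits on partial residuals.
        beta, f_hat = 0.0, np.zeros(n)
        iso = IsotonicRegression(increasing="auto", out_of_bounds="clip")
        for _ in range(20):
            beta = np.dot(z, y - f_hat) / np.dot(z, z)  # slope for the linear part
            f_hat = iso.fit_transform(x, y - beta * z)  # monotone fit, auto direction
            f_hat -= f_hat.mean()                       # centre for identifiability

        print(beta)             # roughly 1.2
        print(iso.increasing_)  # inferred direction (False, i.e. decreasing)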

    Model uncertainty first, not afterwards

    Watson and Holmes propose ways of investigating the robustness of statistical decisions by examining certain neighbourhoods around a posterior distribution. This may partly amount to ad hoc modelling of extra uncertainty. Instead of creating neighbourhoods around the posterior a posteriori, we argue that it may be more fruitful to model a layer of extra uncertainty first, in the model-building process, and then allow the data to determine how big the resulting neighbourhoods ought to be. We develop and briefly illustrate a general strategy along these lines.

    Efficient on-line anomaly detection for ship systems in operation

    We propose novel modifications to an anomaly detection methodology based on multivariate signal reconstruction followed by residuals analysis. The reconstructions are made using Auto Associative Kernel Regression (AAKR), where query observations are compared to historical observations, called memory vectors, representing normal operation. When the set of historical observations grows large, the naive approach of using all observations as memory vectors leads to unacceptably large computational loads, so a reduced set of memory vectors should be intelligently selected. The residuals between the observed and reconstructed signals are analysed using standard Sequential Probability Ratio Tests (SPRT), where alarms are raised based on the sequential behaviour of the residuals. The modifications we introduce are: a novel cluster-based method for selecting the memory vectors considered by the AAKR, which gives an extensive reduction in computation time; a generalization of the distance measure, which makes it possible to distinguish between explanatory and response variables; and a regional credibility estimation used in the residuals analysis, which lets the time needed to decide whether a sequence of query vectors represents an anomalous state depend on the amount of data close to or surrounding the query vector. We demonstrate that the anomaly detection method with the proposed modifications can be successfully applied to a set of imbalanced benchmark data sets, as well as to recent data from a marine diesel engine in operation.
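
    The following Python sketch (simulated signals, not the authors' implementation) shows the unmodified building blocks the abstract refers to: an AAKR step that reconstructs a query observation as a kernel-weighted average of memory vectors, and a restarting Gaussian SPRT on the resulting residual. The cluster-based memory selection, generalized distance and regional credibility extensions are omitted.

        import numpy as np

        def aakr_reconstruct(query, memory, bandwidth=1.0):
            """Reconstruct a query observation as a Gaussian-kernel-weighted
            average of memory vectors representing normal operation."""
            d2 = np.sum((memory - query) ** 2, axis=1)   # squared distances
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))
            return w @ memory / w.sum()

        # Simulated 2-signal system: memory vectors from normal operation.
        rng = np.random.default_rng(2)
        memory = rng.normal(0.0, 0.1, size=(500, 2))

        # Restarting SPRT on the first residual: H0 ~ N(0, s) vs H1 ~ N(shift, s).
        shift, s = 0.3, 0.1
        upper, lower = np.log(0.99 / 0.01), np.log(0.01 / 0.99)  # alarm / accept H0
        llr = 0.0
        for t in range(50):
            query = rng.normal(0.0, 0.1, size=2) + (shift if t >= 25 else 0.0)
            residual = (query - aakr_reconstruct(query, memory, bandwidth=0.5))[0]
            llr += (shift / s ** 2) * (residual - shift / 2.0)  # Gaussian LLR step
            if llr > upper:
                print(f"anomaly flagged at t={t}")
                break
            if llr < lower:  # accept normal operation and restart the test
                llr = 0.0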

    AIS-Based Multiple Vessel Collision and Grounding Risk Identification based on Adaptive Safety Domain

    The continuous growth in maritime traffic and recent developments towards autonomous navigation have directed increasing attention to navigational safety, where new tools are required to identify real-time risk and complex navigation situations. Such tools are of paramount importance for avoiding the potentially disastrous consequences of accidents and promoting safe navigation at sea. In this study, an adaptive ship safety domain with spatial risk functions is proposed to identify both collision and grounding risk based on the motion and maneuverability conditions of all vessels. The algorithm is designed and validated using extensive amounts of Automatic Identification System (AIS) data for decision support over a large area, while integration of the algorithm with other navigational systems will increase effectiveness and ensure reliability. Since successful evasion of a potential vessel-to-vessel collision or vessel grounding situation depends strongly on the nearby maneuvering limitations and other possible accident situations, multi-vessel collision and grounding risk is considered in this work to identify real-time risk. The presented algorithm exploits dynamic AIS information, vessel registries and high-resolution maps, and is robust to inaccuracies in recorded position, course and speed over ground. The computation-efficient algorithm allows real-time risk identification over a large monitored area, up to country scale, and over several years of operation with very high accuracy.
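
    The paper's adaptive domain is not reproduced here, but the Python sketch below conveys the core idea with a toy elliptical ship domain whose semi-axes grow with speed over ground: a target vessel whose position falls inside the ellipse raises a risk flag. All dimensions, scaling factors and AIS records are hypothetical.

        import numpy as np

        def in_safety_domain(own, target, length=100.0):
            """Toy elliptical ship domain: semi-axes grow with own speed over
            ground (sog, knots); positions in metres, course (cog) in degrees."""
            a = length * (2.0 + 0.5 * own["sog"])  # semi-major axis, along course
            b = length * (1.0 + 0.2 * own["sog"])  # semi-minor axis, across course
            # Rotate the relative position into the course-aligned frame.
            dx, dy = target["x"] - own["x"], target["y"] - own["y"]
            theta = np.deg2rad(own["cog"])
            along = dx * np.sin(theta) + dy * np.cos(theta)
            across = dx * np.cos(theta) - dy * np.sin(theta)
            return (along / a) ** 2 + (across / b) ** 2 <= 1.0

        # Hypothetical AIS snapshots (x, y in metres; sog in knots; cog in degrees).
        own = {"x": 0.0, "y": 0.0, "sog": 12.0, "cog": 0.0}
        target = {"x": 150.0, "y": 400.0, "sog": 8.0, "cog": 180.0}
        print(in_safety_domain(own, target))  # True: target inside own domain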

    A Comparative Study of Methods for Estimating Conditional Shapley Values and When to Use Them

    Shapley values originated in cooperative game theory but are today extensively used as a model-agnostic explanation framework for predictions made by complex machine learning models in industry and academia. There are several algorithmic approaches for computing different versions of Shapley value explanations. Here, we focus on conditional Shapley values for predictive models fitted to tabular data. Estimating precise conditional Shapley values is difficult, as it requires the estimation of non-trivial conditional expectations. In this article, we develop new methods, extend earlier proposed approaches, and systematize the new refined and existing methods into different method classes for comparison and evaluation. The method classes use either Monte Carlo integration or regression to model the conditional expectations. We conduct extensive simulation studies to evaluate how precisely the different method classes estimate the conditional expectations, and thereby the conditional Shapley values, for different setups. We also apply the methods to several real-world data experiments and provide recommendations for when to use the different method classes and approaches. Roughly speaking, we recommend parametric methods when the data distribution can be specified almost correctly, as they generally produce the most accurate Shapley value explanations. When the distribution is unknown, both generative methods and regression models with a form similar to that of the underlying predictive model are good and stable options. Regression-based methods are often slow to train but produce Shapley value explanations quickly once trained; the reverse is true for Monte Carlo-based methods, making the different methods appropriate in different practical situations.
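
    To make the Monte Carlo method class concrete, the Python sketch below estimates v(S) = E[f(x) | x_S = x*_S] by sampling the remaining features from their exact Gaussian conditional distribution, under the (strong, illustrative) assumption that the features are jointly Gaussian. The toy model, covariance and explicand are made up, and this is not the authors' implementation (their methods are available in, e.g., the R package shapr).

        from itertools import combinations
        from math import factorial

        import numpy as np

        def cond_value(f, x_star, S, mu, Sigma, n_mc=2000, rng=None):
            """Monte Carlo estimate of v(S) = E[f(x) | x_S = x_star_S], assuming
            jointly Gaussian features with mean mu and covariance Sigma."""
            rng = rng if rng is not None else np.random.default_rng(0)
            Sbar = [j for j in range(len(mu)) if j not in S]
            if not Sbar:                        # full coalition: no expectation
                return f(x_star[None, :])[0]
            if S:                               # exact Gaussian conditional
                K = Sigma[np.ix_(Sbar, S)] @ np.linalg.inv(Sigma[np.ix_(S, S)])
                mu_c = mu[Sbar] + K @ (x_star[S] - mu[S])
                Sigma_c = Sigma[np.ix_(Sbar, Sbar)] - K @ Sigma[np.ix_(S, Sbar)]
            else:                               # empty coalition: marginal
                mu_c, Sigma_c = mu[Sbar], Sigma[np.ix_(Sbar, Sbar)]
            X = np.tile(x_star, (n_mc, 1))
            X[:, Sbar] = rng.multivariate_normal(mu_c, Sigma_c, size=n_mc)
            return f(X).mean()

        # Toy predictive model, correlated Gaussian features, one explicand.
        f = lambda X: X[:, 0] + 2.0 * X[:, 1] * X[:, 2]
        mu = np.zeros(3)
        Sigma = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]])
        x_star = np.array([1.0, -0.5, 2.0])

        # Conditional Shapley value of feature 0: weighted average of its
        # contribution v(S u {0}) - v(S) over coalitions S of the other features.
        p, phi0 = 3, 0.0
        for k in range(p):
            for S in combinations([1, 2], k):
                w = factorial(k) * factorial(p - k - 1) / factorial(p)
                phi0 += w * (cond_value(f, x_star, list(S) + [0], mu, Sigma)
                             - cond_value(f, x_star, list(S), mu, Sigma))
        print(phi0)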