
    Reliability and validity in comparative studies of software prediction models

    Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we examine a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, such empirical studies compare a machine learning model with a regression model; we do the same, but on simulated data. The results suggest that it is the research procedure itself that is unreliable, and this lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as the basis of model comparison. We therefore need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models.
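
    The examined procedure can be made concrete with a short sketch. The data-generating process, the choice of a random forest versus linear regression, and mean absolute error as the accuracy indicator are illustrative assumptions, not the paper's simulation design; the point is only that the "winner" of a single-sample, cross-validated comparison can flip across repeated draws.

    ```python
    # Illustrative sketch (not the paper's simulation design): repeat the common
    # procedure -- a single data sample, cross validation, one accuracy indicator --
    # and count how often the "winning" model flips across repetitions.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, cross_val_score

    rng = np.random.default_rng(0)
    n_repeats, wins_ml = 20, 0
    for rep in range(n_repeats):
        # One "single data sample" per repetition from the same data-generating process.
        X = rng.normal(size=(100, 5))
        y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + rng.normal(scale=2.0, size=100)

        cv = KFold(n_splits=10, shuffle=True, random_state=rep)
        mae_reg = -cross_val_score(LinearRegression(), X, y,
                                   scoring="neg_mean_absolute_error", cv=cv).mean()
        mae_ml = -cross_val_score(RandomForestRegressor(n_estimators=50, random_state=rep),
                                  X, y, scoring="neg_mean_absolute_error", cv=cv).mean()
        wins_ml += int(mae_ml < mae_reg)

    print(f"machine-learning model 'won' in {wins_ml}/{n_repeats} repetitions")
    ```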

    On null hypotheses in survival analysis

    The conventional nonparametric tests in survival analysis, such as the log-rank test, assess the null hypothesis that the hazards are equal at all times. However, hazards are hard to interpret causally, and other null hypotheses are more relevant in many scenarios with survival outcomes. To allow for a wider range of null hypotheses, we present a generic approach to define test statistics. This approach utilizes the fact that a wide range of common parameters in survival analysis can be expressed as solutions of differential equations. Thereby we can test hypotheses based on survival parameters that solve differential equations driven by cumulative hazards, and it is easy to implement the tests on a computer. We present simulations, suggesting that our tests perform well for several hypotheses in a range of scenarios. Finally, we use our tests to evaluate the effect of adjuvant chemotherapies in patients with colon cancer, using data from a randomised controlled trial.
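
    As an illustration of the idea that survival parameters solve differential equations driven by cumulative hazards, the sketch below computes the restricted mean survival time from Nelson-Aalen increments (via dS = -S dA) and compares two groups. The permutation test and the made-up data are generic stand-ins for this sketch, not the martingale-based test statistics developed in the paper.

    ```python
    # Sketch: a survival parameter (restricted mean survival time, RMST) obtained by
    # solving dS = -S dA along Nelson-Aalen increments, compared across two groups
    # with a generic permutation test (a stand-in for the paper's test statistics).
    import numpy as np

    def rmst(time, event, tau):
        """Restricted mean survival time up to tau from right-censored data."""
        grid = np.sort(np.unique(time[(event == 1) & (time <= tau)]))
        S, prev, area = 1.0, 0.0, 0.0
        for t in grid:
            area += S * (t - prev)                       # integrate S over (prev, t]
            dA = np.sum((time == t) & (event == 1)) / np.sum(time >= t)
            S *= 1.0 - dA                                # solve dS = -S dA at the jump
            prev = t
        return area + S * (tau - prev)

    def permutation_test(time, event, group, tau, n_perm=1000, seed=0):
        rng = np.random.default_rng(seed)
        def diff(g):
            return (rmst(time[g == 1], event[g == 1], tau)
                    - rmst(time[g == 0], event[g == 0], tau))
        obs = diff(group)
        exceed = [abs(diff(rng.permutation(group))) >= abs(obs) for _ in range(n_perm)]
        return obs, np.mean(exceed)

    # Hypothetical two-arm data, purely for illustration.
    rng = np.random.default_rng(3)
    time = np.concatenate([rng.exponential(8.0, 50), rng.exponential(12.0, 50)])
    event = (rng.uniform(size=100) < 0.8).astype(int)    # ~20% randomly censored
    group = np.repeat([0, 1], 50)
    print(permutation_test(time, event, group, tau=10.0))
    ```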

    The surprising implications of familial association in disease risk

    Background: A wide range of diseases show some degree of clustering in families; family history is therefore an important aspect for clinicians when making risk predictions. Familial aggregation is often quantified in terms of a familial relative risk (FRR), and although at first glance this measure may seem simple and intuitive as an average risk prediction, its implications are not straightforward. Methods: We use two statistical models for the distribution of disease risk in a population: a dichotomous risk model that gives an intuitive understanding of the implication of a given FRR, and a continuous risk model that facilitates a more detailed computation of the inequalities in disease risk. Published estimates of FRRs are used to produce Lorenz curves and Gini indices that quantify the inequalities in risk for a range of diseases. Results: We demonstrate that even a moderate familial association in disease risk implies a very large difference in risk between individuals in the population. We give examples of diseases for which this is likely to be true, and we further demonstrate the relationship between the point estimates of FRRs and the distribution of risk in the population. Conclusions: The variation in risk for several severe diseases may be larger than the variation in income in many countries. The implications of familial risk estimates should be recognized by epidemiologists and clinicians. Comment: 17 pages, 5 figures.
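
    A minimal sketch of the Lorenz-curve and Gini-index computation used to quantify inequality in risk is shown below. The lognormal spread of individual risks is purely hypothetical; the paper derives its risk distributions from published FRR estimates via the dichotomous and continuous risk models.

    ```python
    # Minimal sketch: from a distribution of individual disease risks to a Lorenz
    # curve and Gini index.  The lognormal risk distribution is hypothetical; it is
    # not the paper's FRR-calibrated model.
    import numpy as np

    def lorenz_and_gini(risks):
        r = np.sort(np.asarray(risks, dtype=float))
        pop = np.arange(1, r.size + 1) / r.size       # cumulative population share
        cum = np.cumsum(r) / r.sum()                  # cumulative share of total risk
        gini = 1.0 - 2.0 * np.trapz(cum, pop)         # 1 - 2 * area under Lorenz curve
        return pop, cum, gini

    rng = np.random.default_rng(1)
    risks = rng.lognormal(mean=-4.0, sigma=1.5, size=100_000)
    _, _, gini = lorenz_and_gini(risks)
    print(f"Gini index of disease risk: {gini:.2f}")  # roughly 0.7 for this spread
    ```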

    Transforming cumulative hazard estimates

    Time-to-event outcomes are often evaluated on the hazard scale, but interpreting hazards may be difficult. Recently, there has been concern in the causal inference literature that hazards have a built-in selection effect that prevents simple causal interpretations. This is a problem even in randomized controlled trials, where hazard ratios have become a standard measure of treatment effects. Modeling on the hazard scale is nevertheless convenient, e.g. to adjust for covariates, so using hazards for intermediate calculations may be desirable. Here, we provide a generic method for transforming hazard estimates consistently to other scales on which these built-in selection effects are avoided. The method is based on differential equations and generalizes a well-known relation between the Nelson-Aalen and Kaplan-Meier estimators. Using the martingale central limit theorem, we also find that covariances can be estimated consistently for a large class of estimators, thus allowing for rapid calculation of confidence intervals. Hence, given cumulative hazard estimates based on e.g. Aalen's additive hazard model, we can obtain many other parameters without much more effort. We present several examples and associated estimators. Coverage and convergence speed are explored using simulations, suggesting that reliable estimates can be obtained in real-life scenarios. Comment: 22 pages, 4 figures. Added Lemma 1 stating sufficient conditions for P-UT for our considerations, and Proposition 1 showing the conditions are satisfied for estimated additive hazard coefficients and their martingale residuals.
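
    The special case the paper generalizes can be sketched directly: transforming Nelson-Aalen cumulative hazard increments into a survival curve by solving dS(t) = -S(t-) dA(t) along the jumps of A, which recovers the Kaplan-Meier product-limit form. The data below are hypothetical, and the sketch does not cover the paper's general transformations or its martingale-based covariance estimates.

    ```python
    # Sketch of the well-known relation the paper generalizes: Nelson-Aalen increments
    # dA transformed into a survival curve by solving dS = -S dA at each jump.
    import numpy as np

    def nelson_aalen_increments(time, event):
        """Jump times and increments dA of the Nelson-Aalen estimator."""
        t_events = np.sort(np.unique(time[event == 1]))
        dA = np.array([np.sum((time == t) & (event == 1)) / np.sum(time >= t)
                       for t in t_events])
        return t_events, dA

    def survival_from_cumhaz(dA):
        """Solve dS = -S dA recursively; this is the Kaplan-Meier product-limit form."""
        return np.cumprod(1.0 - dA)

    # Hypothetical right-censored data (event = 1 means an observed event).
    time = np.array([2.0, 3.0, 3.0, 5.0, 7.0, 8.0, 11.0])
    event = np.array([1, 1, 0, 1, 0, 1, 1])
    t, dA = nelson_aalen_increments(time, event)
    print(dict(zip(t, np.round(survival_from_cumhaz(dA), 3))))
    ```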

    Famtile: An Algorithm For Learning High-level Tactical Behavior From Observation

    This research focuses on the learning of a class of behaviors defined as high-level behaviors. High-level behaviors are defined here as behaviors that can be executed using a sequence of identifiable behaviors. Represented by low-level contexts, these constituent behaviors are known a priori to learning and can be modeled separately by a knowledge engineer. The learning task, which is achieved by observing an expert within simulation, then becomes the identification and representation of the low-level context sequence executed by the expert. To learn this sequence, this research proposes FAMTILE - the Fuzzy ARTMAP / Template-Based Interpretation Learning Engine. This algorithm attempts to achieve the learning task by constructing rules that govern the low-level context transitions made by the expert. By combining these rules with models of the low-level context behaviors, it is hypothesized that an intelligent model of the expert can be created that adequately captures his behavior. To evaluate FAMTILE, four testing scenarios were developed that address three distinct evaluation goals: assessing the learning capabilities of Fuzzy ARTMAP, evaluating the ability of FAMTILE to correctly predict expert actions and context choices given an observation, and creating a model of the expert's behavior that can perform the high-level task at a comparable level of proficiency.
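
    The learning task can be made concrete with a toy sketch. FAMTILE itself uses Fuzzy ARTMAP to learn the mapping from observations to low-level context transitions; the successor-frequency rule learner and the context names below are simplified, hypothetical stand-ins meant only to illustrate what "rules governing context transitions" look like.

    ```python
    # Toy stand-in for FAMTILE's learning task: from observed sequences of an expert's
    # low-level contexts, derive rules predicting the next context.  FAMTILE uses
    # Fuzzy ARTMAP for this; a simple successor-frequency table is used here instead.
    from collections import Counter, defaultdict

    def learn_transition_rules(context_sequences):
        counts = defaultdict(Counter)
        for seq in context_sequences:
            for current, successor in zip(seq, seq[1:]):
                counts[current][successor] += 1
        # Rule: from each context, predict its most frequently observed successor.
        return {ctx: succ.most_common(1)[0][0] for ctx, succ in counts.items()}

    # Hypothetical observation traces of an expert's low-level contexts.
    traces = [
        ["patrol", "detect", "pursue", "engage"],
        ["patrol", "detect", "pursue", "retreat"],
        ["patrol", "detect", "pursue", "engage"],
    ]
    print(learn_transition_rules(traces))
    # {'patrol': 'detect', 'detect': 'pursue', 'pursue': 'engage'}
    ```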

    On Seeking Joy: An Integral Part of Personal Wellness

    The connection between joy and wellness is explored in the article On Seeking Joy: An Integral Part of Personal Wellness. The basic premise is that without joy ... one is not well. Joy helps us maintain our wellness and survive the stressful society we live in. Seeking joy is not as simple as it sounds; many people do not know what they are looking for. Some generic joy factors are offered; these elements contribute to most people's happiness. Joyseeking, a style of leisure counseling that helps people identify what joy is to them and get more of it, is described. A definition, process, and sample activity of joyseeking provide a basic understanding of how this style of leisure counseling may lead people toward joy.

    The Witch: A New England Folk Tale

    This is a film review of The Witch: A New England Folk Tale (2016), directed by Robert Eggers.

    Cost overruns in Norwegian projects – An econometric study

    Master's thesis in Industrial Economics. Cost overruns are a global phenomenon. Assuming that companies are profit maximizing, inaccurate estimates of project costs are unwanted because they weaken the basis for investment analysis. The aim of this thesis is therefore to identify which factors affect the ability to set accurate budgets and meet the estimated costs. I highlight this topic by analyzing the differences in cost overruns between projects from the Norwegian oil industry and the public sector, by introducing macroeconomic variables into the analysis, and by looking into whether the cost overruns from one sector affect the other. Firstly, descriptive statistics and univariate regressions were run in order to obtain a better overview of the topic. Both public and oil projects are statistically more prone to cost overruns than underruns; however, oil projects experience overruns of larger magnitude. Public projects show a trend where increasing project size reduces cost overruns, while cost overruns in oil projects tend to increase with the duration of the project. To further analyze the dynamics of cost overruns, multivariate regressions were performed, using forward selection in an iterative process to arrive at the final models. For oil projects, I find the variables Duration, Pension fund surprise, GDP growth and NCS investment surprise to significantly affect the magnitude of cost overruns, explaining about 25% of the variability in cost overruns for oil projects. For public projects, the corresponding model includes the variables Duration, Employment level, GDP from marine activities and Export, explaining about 13% of the variability. These models indicate that cost overruns in both sectors depend on the macroeconomic environment at the time of project execution. The explanatory power of the oil model is acceptable; however, I fail to find a good general model for cost overruns in public projects. When comparing the two sectors by running the models on the opposite dataset, no causal relationship in cost overruns between the two was found, and I therefore cannot confirm whether the two sectors affect each other's cost overruns.
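
    The forward-selection step can be sketched as follows. The variable names and the simulated data are hypothetical placeholders, not the thesis' dataset; the sketch only shows the iterative procedure of adding, at each step, the regressor that most improves adjusted R-squared and stopping when no candidate improves the fit.

    ```python
    # Sketch of forward selection for a cost-overrun regression; data and variable
    # names are hypothetical, not the thesis' dataset.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def forward_selection(df, response, candidates):
        candidates = list(candidates)
        selected, best_so_far = [], -np.inf
        while candidates:
            # Adjusted R^2 for each candidate added to the current model.
            scores = []
            for var in candidates:
                X = sm.add_constant(df[selected + [var]])
                scores.append((sm.OLS(df[response], X).fit().rsquared_adj, var))
            adj_r2, var = max(scores)
            if adj_r2 <= best_so_far:          # stop when no candidate improves the fit
                break
            selected.append(var)
            candidates.remove(var)
            best_so_far = adj_r2
        return selected, best_so_far

    rng = np.random.default_rng(2)
    n = 80
    df = pd.DataFrame({
        "duration": rng.uniform(1, 10, n),
        "gdp_growth": rng.normal(2, 1, n),
        "employment": rng.normal(70, 5, n),
        "noise": rng.normal(size=n),
    })
    df["overrun_pct"] = 3 * df["duration"] + 2 * df["gdp_growth"] + rng.normal(scale=5, size=n)
    print(forward_selection(df, "overrun_pct", ["duration", "gdp_growth", "employment", "noise"]))
    ```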

    Exact and Superconvergent Solutions of the Multi-Point Flux Approximation O-method: Analysis and Numerical Tests

    In this thesis we prove that the multi-point flux approximation O-method (MPFA) yields exact potential and flux for the trigonometric potential functions u(x,y)=sin(x)sin(y) and u(x,y)=cos(x)cos(y). This is done on uniform square grids in a homogeneous medium, with the principal directions of the permeability aligned with the grid directions and with periodic boundary conditions. Earlier theoretical and numerical convergence articles suggest that these potential functions should only yield second-order convergence. Hence, our motivation for the analysis was to gain new insight into the convergence of the method, as well as to develop theoretical proofs for what seem to be suitable examples for testing implementations. An extension of the result to uniform rectangular grids in an isotropic medium is also briefly discussed, before we develop a numerical overview of the exactness phenomenon for different types of boundary conditions. Lastly, we investigate how these results can be applied to obtain exact potential and flux with the MPFA method for general potential functions approximated by Fourier series. Master's thesis in Applied and Computational Mathematics.
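
    A full MPFA O-method implementation is beyond a short example, but the flavour of the exactness phenomenon can be illustrated on a related scheme. On uniform square grids with homogeneous, grid-aligned permeability the O-method is commonly noted to reduce to the standard two-point-flux (five-point) scheme, which is used below as a stand-in. With the source term integrated exactly over each cell, the discrete potential reproduces u(x,y)=sin(x)sin(y) to machine precision under periodic boundary conditions, whereas a pointwise source gives only second-order accuracy. This is an exploratory sketch, not the thesis' proof or its MPFA implementation.

    ```python
    # Exploratory sketch (not the thesis' MPFA implementation): five-point scheme with
    # periodic boundary conditions, solved via FFT, showing exactness for
    # u = sin(x)sin(y) when the source is integrated exactly over each cell.
    import numpy as np

    def solve_fivepoint_periodic(f_cells, h):
        """Solve -laplace(u) = f on a periodic uniform grid with the five-point scheme."""
        n = f_cells.shape[0]
        k = np.fft.fftfreq(n) * n                     # integer wave numbers
        kx, ky = np.meshgrid(k, k, indexing="ij")
        lam = (2 - 2 * np.cos(kx * h)) / h**2 + (2 - 2 * np.cos(ky * h)) / h**2
        fhat = np.fft.fft2(f_cells)
        uhat = np.zeros_like(fhat)
        nz = lam > 1e-14                              # drop the constant (zero) mode
        uhat[nz] = fhat[nz] / lam[nz]
        return np.real(np.fft.ifft2(uhat))

    n = 32
    h = 2 * np.pi / n
    xc = (np.arange(n) + 0.5) * h                     # cell centres on [0, 2*pi)
    X, Y = np.meshgrid(xc, xc, indexing="ij")
    u_exact = np.sin(X) * np.sin(Y)

    f_avg = (8 * np.sin(h / 2) ** 2 / h**2) * u_exact   # source averaged exactly per cell
    f_pt = 2 * u_exact                                  # source evaluated at cell centres

    for name, f in [("cell-averaged source", f_avg), ("pointwise source", f_pt)]:
        err = np.max(np.abs(solve_fivepoint_periodic(f, h) - u_exact))
        print(f"{name:>20}: max error = {err:.2e}")
    ```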