
    Finding Statistically Significant Interactions between Continuous Features

    The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets. Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019).
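The pruning idea in this abstract can be illustrated with the simpler binary-feature case that it mentions as prior work. The sketch below is not the continuous-feature bound from the paper. It assumes a one-sided Fisher's exact test, for which the smallest p-value a pattern can ever attain depends only on how many samples contain it. The function names, the sample sizes and the threshold are hypothetical.

```python
from math import comb


def min_attainable_p(n, n1, x):
    """Smallest one-sided Fisher p-value for a pattern present in x of n samples.

    n1 is the number of samples in the positive class. The most extreme table
    places as many of the x occurrences as possible in the positive class.
    """
    if not 0 <= x <= n or not 0 <= n1 <= n:
        raise ValueError("require 0 <= x <= n and 0 <= n1 <= n")
    a_max = min(x, n1)
    # Hypergeometric probability of the single most extreme table.
    return comb(n1, a_max) * comb(n - n1, x - a_max) / comb(n, x)


def is_testable(n, n1, x, threshold):
    """A pattern whose lower bound exceeds the threshold can be pruned."""
    return min_attainable_p(n, n1, x) <= threshold


# Hypothetical data set: 20 samples, 10 of them in the positive class.
n, n1 = 20, 10
threshold = 0.05  # assumed corrected significance threshold
for support in (1, 3, 5, 10):
    bound = min_attainable_p(n, n1, support)
    status = "keep" if is_testable(n, n1, support, threshold) else "prune"
    print(f"support={support:2d}  p_min={bound:.6f}  {status}")
```

Patterns that are pruned never count towards the multiple-testing correction, which is how a bound of this kind can increase statistical power.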

    A Framework for Spatio-Temporal Data Analysis and Hypothesis Exploration

    We present a general framework for pattern discovery and hypothesis exploration in spatio-temporal data sets that is based on delay-embedding. This is a remarkable method of nonlinear time-series analysis that allows the full phase-space behaviour of a system to be reconstructed from only a single observable (accessible variable). Recent extensions to the theory that focus on a probabilistic interpretation extend its scope and allow practical application to noisy, uncertain and high-dimensional systems. The framework uses these extensions to aid alignment of spatio-temporal sub-models (hypotheses) to empirical data, for example satellite images plus remote sensing, and to explore modifications consistent with this alignment. The novel aspect of the work is a mechanism for linking global and local dynamics using a holistic spatio-temporal feedback loop. An example framework is devised for an urban-based application, transit-centric developments, and its utility is demonstrated with real data.
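As a rough illustration of the delay-embedding step only, the sketch below builds delay vectors from a single observable. It does not cover the probabilistic extensions or the feedback loop described in the abstract. The function name `delay_embed`, the embedding dimension, the delay and the sine-wave input are all assumptions chosen for the example.

```python
import math


def delay_embed(series, dim, tau):
    """Return delay vectors (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(n_vectors)
    ]


# Hypothetical single observable: a sampled sine wave.
observable = [math.sin(0.1 * t) for t in range(200)]
embedded = delay_embed(observable, dim=3, tau=5)
print(len(embedded), embedded[0])
```

Each tuple is one point of the reconstructed trajectory. In practice `dim` and `tau` have to be chosen for the system at hand, and the values used here are arbitrary.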

    Predicting and simulating future land use pattern : a case study of Seremban district

    Rapid urbanization results from natural population growth and from rural-urban migration driven by social and economic push and pull factors, as well as from the movement of urban populations from major city centres to urban fringe areas as lifestyles change to favour more spacious, comfortable and environmentally friendly living environments. As long as it continues, towns and cities will keep growing and expanding to accommodate the growing and complex demands of the people. Experience has shown that rapid and uncontrolled expansion of towns and cities has led to, amongst other things, deterioration in the quality of the urban environment, the sprawl of urban development onto prime agricultural and forest areas, and cities starting to lose their identity. To prevent such phenomena from continuing, particularly in the Kuala Lumpur Conurbation Area, towns and cities need to be properly planned and managed so that their growth or expansion can be controlled in a sustainable manner. One of the strategies adopted to curb sprawling development is the delineation of urban growth or development limits (UGL). This means that the limits of towns and cities need to be studied and identified, so that urban development can be directed to areas identified as suitable for such development. The delineation of UGL has been included as an important task in the preparation of development plans. Following this policy, a research study is now being carried out to develop a spatial modelling framework for delineating UGL through the application and integration of spatial technologies. This will serve as a framework for land use planners, managers and policy makers to formulate urban land use policies and monitor urban land use development.
One of the main analyses involved in performing this task is to understand past urban land development trends and to predict and identify future urban growth areas in the selected study area. This paper highlights the integration of statistical modelling via binary logistic regression analysis with GIS technology to understand and predict the urban growth pattern and area, as applied to the District of Seremban, Negeri Sembilan. The results show that the urban land use pattern in the study area within the study period is significantly related to more than half of the predictors used in the analysis.
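A minimal sketch of binary logistic regression, fitted by plain gradient descent, shows the kind of model the abstract refers to. The two predictors (distance to the nearest road and distance to existing urban land, in km), the eight grid cells and the helper names are invented for illustration and are not taken from the Seremban study. The sketch also does not compute the significance tests that the abstract reports.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that a cell with predictors x becomes urban."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical cells: [distance to road, distance to existing urban land].
X = [[0.2, 0.5], [0.5, 0.3], [0.8, 1.0], [1.0, 0.6],
     [3.0, 2.5], [3.5, 4.0], [4.0, 3.0], [5.0, 4.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = converted to urban use, 0 = not converted

weights, bias = fit_logistic(X, y)
near = predict_proba(weights, bias, [0.3, 0.4])
far = predict_proba(weights, bias, [4.5, 4.0])
print(f"near-road cell: {near:.3f}, remote cell: {far:.3f}")
```

In a GIS workflow the fitted probabilities would typically be written back to the raster cells to map likely growth areas. A statistics package would normally be used instead of hand-written gradient descent so that standard errors and p-values are available for each predictor.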

    Towards Automated Performance Bug Identification in Python

    Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission-critical applications, financial systems, and real-time systems. In this work we focused on early detection of performance bugs; our software under study was a real-time system used in the advertisement/marketing domain. Goal: Find a simple and easy-to-implement solution for predicting performance bugs. Method: We built several models using four machine learning methods commonly used for defect prediction: C4.5 Decision Trees, Naïve Bayes, Bayesian Networks, and Logistic Regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file's age and size as explanatory variables, can be used to predict performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that reducing the number of changes delivered on a commit can decrease the chance of performance bug injection. Conclusions: We believe that our approach can help practitioners to eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians, establishing a link between functional bugs and (non-functional) performance bugs, and explicitly showing that attributes used for prediction of functional bugs can be used for prediction of performance bugs.
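For readers unfamiliar with the three metrics quoted in this abstract, the sketch below shows how recall, accuracy and precision follow from a confusion matrix. The counts are hypothetical and chosen only to keep the arithmetic simple. They are not the study's data and do not reproduce its reported figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of predicted performance bugs that really are bugs.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of real performance bugs that the model catches.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of all changes that are classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical counts for a set of 20 code changes.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A precision well above recall, as reported in the abstract, means the model raises few false alarms but misses some real performance bugs.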