122 research outputs found

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Get PDF
    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, that can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing

    Interpretable Models Capable of Handling Systematic Missingness in Imbalanced Classes and Heterogeneous Datasets

    Get PDF
    Application of interpretable machine learning techniques on medical datasets facilitate early and fast diagnoses, along with getting deeper insight into the data. Furthermore, the transparency of these models increase trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data, which hinder the straightforward application of machine learning techniques. In this paper we present a family of prototype-based (PB) interpretable models which are capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. However, unlike ensemble based models, which have to compromise on easy interpretation, the PB models here do not. Moreover we propose a strategy of harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic (publicly available dataset) in addition to detailed analyses of two real-world medical datasets (one publicly available). Results indicated that the models and strategies we introduced addressed the challenges of real-world medical data, while remaining computationally inexpensive and transparent, as well as similar or superior in performance compared to their alternatives

    Anthropogenic and climatic control upon vegetation fires: new insights from satelite observations to assess current and future impacts

    Get PDF
    Doutoramento em Engenharia Florestal - Instituto Superior de AgronomiaVegetation fires actively participate in ecosystem dynamics and atmospheric composition. Their contemporaneous occurrence and impacts – described under the concept of “fire regimes” – is driven by climate, vegetation, and human activities – the components of the “fire triangle”. The gaps in our understanding of those drivers hamper the proper consideration of fires in various domains, including ecosystems management, vegetation modeling, and climate change investigation. This thesis capitalizes on satellite observations to depict the anthropogenic and climatic influence on fire regimes. Fire inter-annual variability is shown to be dominated by large scale climatic patterns, of which the El Niño-Southern Oscillation has the most widespread and long term footprint. Fire frequency and seasonality are more complex, being determined by the interaction of all three factors of the fire triangle. The evaluation of a vegetation-fire model thus reveals significant discrepancies. It suggests a great margin of progress on representing of the anthropogenic factor, supported by the wide range of fire practices identified from fire season dynamics. A model specific to tropical deforestation fires is developed, as a regional application of this thesis contributions. Climate is a forceful safeguard against forest conversion progress, but ongoing environmental changes could revert the situation

    Data-Driven Modeling, Control and Tools for Cyber-Physical Energy Systems

    Get PDF
    Energy systems are experiencing a gradual but substantial change in moving away from being non-interactive and manually-controlled systems to utilizing tight integration of both cyber (computation, communications, and control) and physical representations guided by first principles based models, at all scales and levels. Furthermore, peak power reduction programs like demand response (DR) are becoming increasingly important as the volatility on the grid continues to increase due to regulation, integration of renewables and extreme weather conditions. In order to shield themselves from the risk of price volatility, end-user electricity consumers must monitor electricity prices and be flexible in the ways they choose to use electricity. This requires the use of control-oriented predictive models of an energy system’s dynamics and energy consumption. Such models are needed for understanding and improving the overall energy efficiency and operating costs. However, learning dynamical models using grey/white box approaches is very cost and time prohibitive since it often requires significant financial investments in retrofitting the system with several sensors and hiring domain experts for building the model. We present the use of data-driven methods for making model capture easy and efficient for cyber-physical energy systems. We develop Model-IQ, a methodology for analysis of uncertainty propagation for building inverse modeling and controls. Given a grey-box model structure and real input data from a temporary set of sensors, Model-IQ evaluates the effect of the uncertainty propagation from sensor data to model accuracy and to closed-loop control performance. We also developed a statistical method to quantify the bias in the sensor measurement and to determine near optimal sensor placement and density for accurate data collection for model training and control. Using a real building test-bed, we show how performing an uncertainty analysis can reveal trends about inverse model accuracy and control performance, which can be used to make informed decisions about sensor requirements and data accuracy. We also present DR-Advisor, a data-driven demand response recommender system for the building\u27s facilities manager which provides suitable control actions to meet the desired load curtailment while maintaining operations and maximizing the economic reward. We develop a model based control with regression trees algorithm (mbCRT), which allows us to perform closed-loop control for DR strategy synthesis for large commercial buildings. Our data-driven control synthesis algorithm outperforms rule-based demand response methods for a large DoE commercial reference building and leads to a significant amount of load curtailment (of 380kW) and over $45,000 in savings which is 37.9% of the summer energy bill for the building. The performance of DR-Advisor is also evaluated for 8 buildings on Penn\u27s campus; where it achieves 92.8% to 98.9% prediction accuracy. We also compare DR-Advisor with other data driven methods and rank 2nd on ASHRAE\u27s benchmarking data-set for energy prediction

    A conceptual framework for studying collective reactions to events in location-based social media

    Full text link
    Events are a core concept of spatial information, but location-based social media (LBSM) provide information on reactions to events. Individuals have varied degrees of agency in initiating, reacting to or modifying the course of events, and reactions include observations of occurrence, expressions containing sentiment or emotions, or a call to action. Key characteristics of reactions include referent events and information about who reacted, when, where and how. Collective reactions are composed of multiple individual reactions sharing common referents. They can be characterized according to the following dimensions: spatial, temporal, social, thematic and interlinkage. We present a conceptual framework, which allows characterization and comparison of collective reactions. For a thematically well-defined class of event such as storms, we can explore differences and similarities in collective attribution of meaning across space and time. Other events may have very complex spatio-temporal signatures (e.g. political processes such as Brexit or elections), which can be decomposed into series of individual events (e.g. a temporal window around the result of a vote). The purpose of our framework is to explore ways in which collective reactions to events in LBSM can be described and underpin the development of methods for analysing and understanding collective reactions to events
    • …
    corecore