12,158 research outputs found

    Definitions, methods, and applications in interpretable machine learning

    Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving increasing attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods relate to one another and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
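The abstract sorts interpretation methods into model-based and post hoc categories. As a hedged illustration of the post hoc side only, the sketch below computes permutation feature importance, a widely used post hoc technique that this abstract does not itself propose. It is a standalone NumPy version, not scikit-learn's `permutation_importance` API; the name `perm_importance`, the synthetic data, and the least-squares stand-in model are all assumptions.

```python
import numpy as np

def perm_importance(predict, X, y, n_repeats=10, seed=0):
    """Hypothetical helper: mean increase in MSE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

# Synthetic data (assumed): y depends on features 0 and 1 only; feature 2 is irrelevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)

# A least-squares linear model stands in for any fitted black-box predictor.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda A: A @ w

imp = perm_importance(predict, X, y)
print(np.round(imp, 2))
```

Shuffling a column breaks its relationship with the target, so the rise in error estimates how much the fitted model relies on that feature; whether such a score is relevant to a particular audience is the kind of question the PDR framework is meant to frame.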

    Data science for buildings, a multi-scale approach bridging occupants to smart-city energy planning

    In the context of global carbon-emission reduction goals, buildings have been identified as holding valuable energy-saving potential. With the exponential growth of smart, connected building automation systems, massive amounts of data are now available for analysis. These data, coupled with powerful data science methods and machine learning algorithms, present a unique opportunity to identify untapped energy-saving potential from field information and to turn buildings into active assets of the energy infrastructure. However, the diversity of building occupants and infrastructures, together with disparities in the collected information, has produced disjointed scales of analytics that make it difficult for approaches to scale and generalize across the building stock. This, combined with the lack of standards in the sector, has hindered broader adoption of data science practices in the field and raised the following question: how can data science facilitate the scaling of approaches and bridge disconnected spatiotemporal scales of the built environment to deliver enhanced energy-saving strategies? This thesis addresses that question by investigating data-driven, scalable, interpretable, and multi-scale approaches across different classes of analytics. In particular, it explores descriptive, predictive, and prescriptive analytics to connect occupants, buildings, and urban energy planning for improved energy performance. First, a novel multi-dimensional data-mining framework is developed, producing distinct dimensional outlines that support systematic methodological approaches and refined knowledge discovery. Second, an automated method for identifying building heat dynamics is put forward, supporting large-scale, non-intrusive examination of the thermal performance of buildings. Across 225 Dutch residential buildings, the method produced 64% good-quality model fits, 14% close fits, and 22% poor fits. […] which were open-sourced in the interest of developing benchmarks. Third, a pioneering hierarchical forecasting method was designed, bridging individual and aggregated building load predictions in a coherent, data-efficient fashion. The approach was evaluated on hierarchies of 37, 140, and 383 nodes and showed improved accuracy and coherency compared with disjointed prediction systems. Finally, building occupants and urban energy planning strategies are investigated through the lens of uncertainty. In a neighborhood of 41 Dutch residential buildings, occupants were found to significantly affect optimal energy community designs under weather and economic uncertainties. Overall, the thesis demonstrated the added value of multi-scale approaches in all analytical classes while fostering data science best practices in the sector through benchmarks and open-source implementations.
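The abstract does not state the model structure behind its heat dynamics identification. As a minimal sketch, assuming a first-order resistance-capacitance (1R1C) grey-box model, which is one common choice for building thermal dynamics, the code below simulates synthetic indoor temperatures from C dT/dt = (T_out - T)/R + Q and recovers R and C by linear least squares on the discretized equation. The function names, parameter values, and synthetic data are hypothetical.

```python
import numpy as np

def simulate_1r1c(R, C, T0, T_out, Q, dt, noise_std=0.0, seed=0):
    """Forward-Euler simulation of C dT/dt = (T_out - T)/R + Q with optional process noise."""
    rng = np.random.default_rng(seed)
    T = np.empty(len(T_out) + 1)
    T[0] = T0
    for k in range(len(T_out)):
        T[k + 1] = T[k] + dt / C * ((T_out[k] - T[k]) / R + Q[k]) + noise_std * rng.normal()
    return T

def identify_1r1c(T, T_out, Q, dt):
    """Fit T[k+1] = a*T[k] + b*T_out[k] + c*Q[k] by least squares, then map back to (R, C)."""
    A = np.column_stack([T[:-1], T_out, Q])
    (a, b, c), *_ = np.linalg.lstsq(A, T[1:], rcond=None)
    C = dt / c            # since c = dt / C
    R = dt / (b * C)      # since b = dt / (R * C)
    return R, C

# Hypothetical data: 30 days of hourly samples for one building.
rng = np.random.default_rng(42)
n, dt = 720, 3600.0                                     # number of steps, step length in s
T_out = 5 + 5 * np.sin(2 * np.pi * np.arange(n) / 24)   # outdoor temperature, degC
Q = rng.uniform(0, 2000, n)                             # heating power, W
T = simulate_1r1c(0.005, 1e7, 20.0, T_out, Q, dt, noise_std=0.01)

R_hat, C_hat = identify_1r1c(T, T_out, Q, dt)
print(f"R = {R_hat:.5f} K/W, C = {C_hat:.3e} J/K")
```

Noise is injected into the state equation rather than the measurements so that ordinary least squares stays consistent; real field data would also need a goodness-of-fit criterion to sort fits into quality classes, which this sketch does not attempt.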
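To show what "coherent" means for hierarchical load forecasts, the sketch below applies ordinary-least-squares (OLS) forecast reconciliation, a standard technique from the forecasting literature and not necessarily the method designed in this thesis. A summing matrix S maps building-level series to every node of the hierarchy, and the reconciled forecasts S (SᵀS)⁻¹ Sᵀ ŷ are guaranteed to add up. The two-building hierarchy and its base forecasts are hypothetical.

```python
import numpy as np

def reconcile_ols(S, y_hat):
    """Project base forecasts onto the coherent subspace spanned by the columns of S."""
    G = np.linalg.solve(S.T @ S, S.T)  # (S^T S)^{-1} S^T
    return S @ (G @ y_hat)

# Hypothetical hierarchy: row 0 is the aggregate load, rows 1-2 are two individual buildings.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y_hat = np.array([10.0, 4.0, 5.0])  # independent base forecasts: 4 + 5 != 10 (incoherent)

y_tilde = reconcile_ols(S, y_hat)
print(np.round(y_tilde, 3), "aggregate gap:", y_tilde[0] - (y_tilde[1] + y_tilde[2]))
```

Independently produced forecasts rarely sum correctly; projecting them through S removes the mismatch while staying as close as possible to the original forecasts in the least-squares sense.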

    Data-driven discovery of coordinates and governing equations

    The discovery of governing equations from scientific data has the potential to transform data-rich fields that lack well-characterized quantitative descriptions. Advances in sparse regression are currently enabling the tractable identification of both the structure and parameters of a nonlinear dynamical system from data. The resulting models have the fewest terms necessary to describe the dynamics, balancing model complexity with descriptive ability, and thus promoting interpretability and generalizability. This provides an algorithmic approach to Occam's razor for model discovery. However, this approach fundamentally relies on an effective coordinate system in which the dynamics have a simple representation. In this work, we design a custom autoencoder to discover a coordinate transformation into a reduced space where the dynamics may be sparsely represented. Thus, we simultaneously learn the governing equations and the associated coordinate system. We demonstrate this approach on several example high-dimensional dynamical systems with low-dimensional behavior. The resulting modeling framework combines the strengths of deep neural networks for flexible representation and sparse identification of nonlinear dynamics (SINDy) for parsimonious models. It is the first method of its kind to place the discovery of coordinates and models on an equal footing.

    Comment: 25 pages, 6 figures; added acknowledgment
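The sparse-regression half of this framework can be illustrated on its own. The sketch below is a minimal SINDy-style fit using sequentially thresholded least squares, assuming the coordinates are already known (the autoencoder that learns them is not shown). The quadratic candidate library, the threshold of 0.05, and the linear test system with exact derivatives plus small noise are all illustrative choices, not the paper's settings.

```python
import numpy as np

NAMES = ["1", "x", "y", "x^2", "xy", "y^2"]

def library(X):
    """Candidate functions Theta(X): polynomials up to degree 2 in the 2-D state [x, y]."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def stlsq(Theta, dX, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: zero small coefficients, refit the rest."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

# Assumed test system: dx/dt = -0.1 x + 2 y, dy/dt = -2 x - 0.1 y, sampled at random states.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
dX = np.column_stack([-0.1 * X[:, 0] + 2.0 * X[:, 1],
                      -2.0 * X[:, 0] - 0.1 * X[:, 1]])
dX += 1e-3 * rng.normal(size=dX.shape)  # small derivative noise

Xi = stlsq(library(X), dX)
for k, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{Xi[i, k]:+.3f} {NAMES[i]}" for i in range(len(NAMES)) if Xi[i, k] != 0]
    print(lhs, "=", " ".join(terms))
```

The thresholding step is what enforces parsimony: of the twelve candidate coefficients, only the four terms that actually drive the dynamics should survive.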

    A Smart Products Lifecycle Management (sPLM) Framework - Modeling for Conceptualization, Interoperability, and Modularity

    Autonomy and intelligence have been built into many of today’s mechatronic products, taking advantage of low-cost sensors and advanced data analytics technologies. Designing product intelligence (enabled by analytics capabilities) is no longer a trivial or optional add-on in product development. This research aims to address the challenges raised by the new data-driven design paradigm for smart product development, in which the product itself and its smartness must be carefully co-constructed. A smart product can be seen as a specific composition and configuration of its physical components, which form its body, and its analytics models, which implement its intelligence, evolving across its lifecycle stages. Based on this view, the contribution of this research is to extend the Product Lifecycle Management (PLM) concept, traditionally applied to physical products, to data-based products. As a result, a Smart Products Lifecycle Management (sPLM) framework is conceptualized based on a high-dimensional Smart Product Hypercube (sPH) representation and decomposition. First, the sPLM addresses interoperability issues by developing a Smart Component data model that uniformly represents and composes physical component models created by engineers and analytics models created by data scientists. Second, the sPLM implements an NPD3 process model that incorporates a formal data analytics process into the new product development (NPD) process model, in order to support transdisciplinary information flows and team interactions between engineers and data scientists. Third, the sPLM addresses issues related to product definition, modular design, product configuration, and lifecycle management of analytics models by adapting theoretical frameworks and methods from traditional product design and development. An sPLM proof-of-concept platform was implemented to validate the concepts and methodologies developed in this research. The platform provides a shared data repository for managing the product-, process-, and configuration-related knowledge involved in smart product development. It also provides a collaborative environment that facilitates transdisciplinary collaboration between product engineers and data scientists.
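The abstract does not publish the Smart Component schema, so the following is only a hypothetical sketch, assuming Python dataclasses, of how a single data model might place physical components and analytics models under one base type so they can be composed into a product and checked for a simple interoperability gap. Every class, field, and lifecycle-stage name here is invented for illustration and is not the sPLM data model.

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    """Hypothetical lifecycle stages shared by physical and analytics components."""
    DESIGN = "design"
    PRODUCTION = "production"
    USE = "use"
    END_OF_LIFE = "end_of_life"

@dataclass
class Component:
    """Common base so engineers' parts and data scientists' models compose uniformly."""
    name: str
    version: str
    stage: Stage = Stage.DESIGN

@dataclass
class PhysicalComponent(Component):
    cad_ref: str = ""  # hypothetical pointer to an engineering model

@dataclass
class AnalyticsModel(Component):
    inputs: list[str] = field(default_factory=list)   # signals the model consumes
    outputs: list[str] = field(default_factory=list)  # signals the model produces

@dataclass
class SmartProduct:
    name: str
    components: list[Component] = field(default_factory=list)

    def body(self):
        return [c for c in self.components if isinstance(c, PhysicalComponent)]

    def intelligence(self):
        return [c for c in self.components if isinstance(c, AnalyticsModel)]

    def unresolved_inputs(self, sensors):
        """Analytics inputs that no available sensor provides."""
        return sorted({i for m in self.intelligence() for i in m.inputs} - set(sensors))

# Hypothetical usage: a smart pump whose fault detector needs a signal nobody measures.
pump = SmartProduct("smart pump", [
    PhysicalComponent("impeller", "1.0"),
    PhysicalComponent("vibration sensor", "2.1"),
    AnalyticsModel("fault detector", "0.3", inputs=["vibration", "temperature"],
                   outputs=["fault_prob"]),
])
print([c.name for c in pump.body()], pump.unresolved_inputs(["vibration"]))
```

Giving both kinds of component a shared base type and lifecycle stage is one way to let a single repository version and configure them side by side, which is the kind of uniform representation the abstract describes.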