84 research outputs found

    Machine Learning in Discrete Molecular Spaces

    Get PDF
    The past decade has seen an explosion of machine learning in chemistry. Whether it is in property prediction, synthesis, molecular design, or any other subdivision, machine learning seems poised to become an integral, if not a dominant, component of future research efforts. This extraordinary capacity rests on the interac- tion between machine learning models and the underlying chemical data landscape commonly referred to as chemical space. Chemical space has multiple incarnations, but is generally considered the space of all possible molecules. In this sense, it is one example of a molecular set: an arbitrary collection of molecules. This thesis is devoted to precisely these objects, and particularly how they interact with machine learning models. This work is predicated on the idea that by better understanding the relationship between molecular sets and the models trained on them we can improve models, achieve greater interpretability, and further break down the walls between data-driven and human-centric chemistry. The hope is that this enables the full predictive power of machine learning to be leveraged while continuing to build our understanding of chemistry. The first three chapters of this thesis introduce and reviews the necessary machine learning theory, particularly the tools that have been specially designed for chemical problems. This is followed by an extensive literature review in which the contributions of machine learning to multiple facets of chemistry over the last two decades are explored. Chapters 4-7 explore the research conducted throughout this PhD. Here we explore how we can meaningfully describe the properties of an arbitrary set of molecules through information theory; how we can determine the most informative data points in a set of molecules; how graph signal processing can be used to understand the relationship between the chosen molecular representation, the property, and the machine learning model; and finally how this approach can be brought to bear on protein space. Each of these sub-projects briefly explores the necessary mathematical theory before leveraging it to provide approaches that resolve the posed problems. We conclude with a summary of the contributions of this work and outline fruitful avenues for further exploration

    Black-, grey-, and white-box side-channel programming for software integrity checking

    Get PDF
    Doctor of PhilosophyDepartment of Computing and Information SciencesEugene VassermanChecking software integrity is a fundamental problem of system security. Many approaches have been proposed trying to enforce that a device runs the original code. Software-based methods such as hypervisors, separation kernels, and control flow integrity checking often rely on processors to provide some form of separation such as operation modes and memory protection. Hardware-based methods such as remote attestation, secure boot, and watchdog coprocessors rely on trusted hardware to execute attestation code such as verifying memory content and examining signatures appearing on buses. However, many embedded systems do not possess such sophisticated capabilities due to prohibitive hardware costs, unacceptably high power consumption, or the inability to update fielded components. Further, security assumption may become invalid as time goes by. For Systems-on-Chip (SoCs), in particular, internal activities cannot be observed directly, while in non-SoCs, sniffing bus traffic between constituent components may suffice for integrity checking. A promising approach to check software integrity for resource-constrained SoCs is through side-channels. Side-channels have been used mostly for attacks, such as eavesdropping from vibration of glass or plant leaves, fingerprinting machines from traffic patterns, or extracting secret key materials of cryptographic routines using power consumption measurements. In this work, side-channels are used to enhance rather than undercut security. First, we study the relationships between the internal states of a target device and side-channel information. We use the uncovered relationships to monitor the internal state of a running device and determine whether the internal state is an expected one. An unexpected state may be a sign of incorrect execution or malicious activity. To further explore the possibilities inherent in side-channel-based software integrity checking, we investigate various hardware platforms, representative of different degrees of knowledge of the hardware from the side-channel profiling point of view. In other words, side-channel information is extracted by black-, grey-, and white-box analysis. Each one involves unique challenges requiring different techniques to successfully derive “side-channel profiles”. We can use these profiles to detect unexpected states with extremely high probability, even when an adversary knows that their code may be subject to side-channel analysis, i.e., the methodology is robust to side-channel-aware adversaries. The research includes: (1) Constructing systematic approaches for black- and grey-box profiling of side channels (and comparing them to white-box analysis); (2) Designing custom measurement instrumentation; and (3) Developing techniques for monitoring and enforcing software integrity utilizing side-channel profiles. We introduce the term “side-channel programming” to refer to techniques we design in which developers explicitly utilize side-channel characteristics of existing hardware to optimize run-time software integrity checking, creating executable code which is more conducive to side-channel-based monitoring. Compared with other software integrity checking techniques, our approach has numerous benefits. Among them are that the measurement process is non-invasive, non-interruptive, and backward-compatible in that it does not require any hardware modification, meaning our approach works with processors that do not include security features. Our method can even be used to augment existing protection mechanism, as it works even when all security mechanisms internal to the device fail

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    Gaussian Process Regression for Materials and Molecules.

    Get PDF
    We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    Model design for algorithmic efficiency in electromagnetic sensing

    Get PDF
    The objective of the proposed research is to develop structural changes to the design and application of electromagnetic (EM) sensing models to more efficiently and accurately invert EM measurements to extract parameters for applications such as landmine detection. Two different acquisition modalities are addressed in this research: ground-penetrating radar (GPR) and electromagnetic induction (EMI) sensors. The models needed for practical three-dimensional (3D) spatial imaging typically become impractically large, with up to seven dimensions of parameters that need to be extracted. These parameters include, but are not limited to target type, 3D location, and 3D orientation. The new special structures for these models exploit properties such as shift invariance and tensor representation, which can be combined with strategic inversion techniques, including the Fast Fourier Transform and semidefinite programming. The structures dramatically reduce the amount of computation and can eliminate the need to store up to five dimensions of parameters while still accurately estimating them.Ph.D

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    On the use of context information for an improved application of data-based algorithms in condition monitoring

    Get PDF
    xi, 124 p.En el campo de la monitorización de la condición, los algoritmos basados en datos cuentan con un amplio recorrido. Desde el uso de los gráficos de control de calidad que se llevan empleando durante casi un siglo a técnicas de mayor complejidad como las redes neuronales o máquinas de soporte vectorial que se emplean para detección, diagnóstico y estimación de vida remanente de los equipos. Sin embargo, la puesta en producción de los algoritmos de monitorización requiere de un estudio exhaustivo de un factor que es a menudo obviado por otros trabajos de la literatura: el contexto. El contexto, que en este trabajo es considerado como el conjunto de factores que influencian la monitorización de un bien, tiene un gran impacto en la algoritmia de monitorización y su aplicación final. Por este motivo, es el objeto de estudio de esta tesis en la que se han analizado tres casos de uso. Se ha profundizado en sus respectivos contextos, tratando de generalizar a la problemática habitual en la monitorización de maquinaria industrial, y se ha abordado dicha problemática de monitorización de forma que solucionen el contexto en lugar de cada caso de uso. Así, el conocimiento adquirido durante el desarrollo de las soluciones puede ser transferido a otros casos de uso que cuenten con contextos similares
    corecore