
    Exploiting Latent Features of Text and Graphs

    As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings (semantically rich vectors of latent features) that translate human constructs into a form machines can process. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system and our method for identifying new discoveries through natural language processing and graph mining algorithms. We propose heuristic ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs and propose a new bipartite graph embedding technique. Using this graph embedding, we advance the state of the art in hypergraph partitioning quality. Building on this intuition about graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criterion derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.
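    To make the idea of ranking hypotheses from embeddings concrete, here is a minimal sketch that assumes a candidate connection between two biomedical terms is scored by the cosine similarity of their embedding vectors. This is a plain similarity heuristic swapped in for illustration, not Agatha's learned ranking model; the function names (`cosine`, `rank_hypotheses`), the term names, and the toy 3-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_hypotheses(embeddings, source, targets):
    """Rank candidate (source, target) links by embedding similarity, highest first.

    Hypothetical helper: a real system would learn this scoring function
    rather than use raw cosine similarity.
    """
    scored = [(t, cosine(embeddings[source], embeddings[t])) for t in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings; terms and values are invented for illustration only.
embeddings = {
    "HIV": [0.9, 0.1, 0.3],
    "gene_A": [0.8, 0.2, 0.4],
    "gene_B": [0.1, 0.9, 0.0],
}

for term, score in rank_hypotheses(embeddings, "HIV", ["gene_A", "gene_B"]):
    print(term, round(score, 3))
```

    With these toy vectors, `gene_A` points in nearly the same direction as `HIV` and is ranked first; a learned ranker would replace `cosine` with a trained scoring function over the same embeddings.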

    Machine-Learning-Powered Cyber-Physical Systems

    In the last few years, we have witnessed the revolution of the Internet of Things (IoT) paradigm and the consequent growth of Cyber-Physical Systems (CPSs). IoT devices, which include a plethora of smart interconnected sensors, actuators, and microcontrollers, can sense physical phenomena occurring in an environment and provide copious amounts of heterogeneous data about the functioning of a system. These large amounts of data represent an opportunity to adopt artificial intelligence and machine learning techniques that support informed decisions aimed at optimizing such systems, thus enabling a variety of services and applications across multiple domains. Machine learning processes and analyzes such data to generate feedback that represents the current state of the environment. Feedback given to the user so that they can make an informed decision is called open-loop feedback; an open-loop CPS is therefore characterized by the lack of an actuation directed at improving the system. Feedback that the system uses to actuate a change aimed at optimizing itself is called closed-loop feedback; a closed-loop CPS therefore pairs feedback based on sensing data with an actuation that impacts the system directly. In this dissertation, we propose several applications in the context of CPS. We propose open-loop CPSs designed for the early prediction, diagnosis, and persistency detection of Bovine Respiratory Disease (BRD) in dairy calves, and for gait activity recognition in horses. These works use data from sensors such as pedometers and automated feeders to perform valuable real-field data collection. The data are then processed by a mix of state-of-the-art approaches and novel techniques before being fed to machine learning classifiers, whose output informs users about the status of their animals. Our work further evaluates a variety of trade-offs.
In the context of BRD, we adopt optimization techniques to explore the trade-offs of using sensor data as opposed to manual examination performed by domain experts. Similarly, we carry out an extensive analysis of the cost-accuracy trade-offs, which farmers can use to make informed decisions about their barn investments. In the context of horse gait recognition, we evaluate the benefits of lighter classification algorithms for energy and storage usage, and their impact on classification accuracy. With respect to closed-loop CPSs, we propose an incentive-based demand response approach for Heating, Ventilation, and Air Conditioning (HVAC) designed for peak load reduction in smart grids. Specifically, our approach uses machine learning to process power data from smart thermostats deployed in user homes, along with the users' personal temperature preferences. Our machine learning models predict the power savings due to thermostat changes, which are then plugged into an optimization problem that combines auction theory with behavioral science. This framework selects the set of users who fulfill the power saving requirement while minimizing the financial incentives paid to them and, as a consequence, their discomfort. Our work on BRD has been published in IEEE DCOSS 2022 and Frontiers in Animal Science. Our work on gait recognition has been published in IEEE SMARTCOMP 2019 and Elsevier PMC 2020, and our work on energy management and energy prediction has been published in IEEE PerCom 2022 and IEEE SMARTCOMP 2022. Several other works were under submission when this thesis was written and are included in this document as well.
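The user-selection step in the demand response framework can be read as a covering problem: choose a set of users whose predicted savings meet a required reduction while the total incentive paid is as small as possible. The sketch below assumes each user's predicted saving (in kW) and requested incentive are already known, and solves the toy problem by exhaustive search. It does not implement the auction-theoretic and behavioral mechanism described above; the function name `select_users`, the user ids, and all numbers are hypothetical.

```python
from itertools import combinations


def select_users(users, required_kw):
    """Return the cheapest subset of users whose predicted savings meet required_kw.

    users: dict mapping user id -> (predicted_saving_kw, incentive_cost).
    Exhaustive search over all subsets, so only practical for a handful of users.
    Returns (set_of_ids, total_cost), or None if no subset meets the requirement.
    """
    best = None
    ids = list(users)
    for size in range(1, len(ids) + 1):
        for subset in combinations(ids, size):
            saving = sum(users[u][0] for u in subset)
            cost = sum(users[u][1] for u in subset)
            # Keep the subset only if it meets the target and is cheaper than the best so far.
            if saving >= required_kw and (best is None or cost < best[1]):
                best = (set(subset), cost)
    return best


# Hypothetical predicted savings (kW) and requested incentives per user.
users = {
    "u1": (1.5, 4.0),
    "u2": (1.0, 2.0),
    "u3": (0.8, 1.0),
    "u4": (2.0, 7.0),
}

print(select_users(users, required_kw=2.5))
```

Exhaustive search grows as 2^n in the number of users, so it only serves to show what "fulfill the requirement at minimum incentive" means; realistic deployments would rely on an integer program or an auction mechanism instead.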

    Rethinking affordance

    Critical survey essay retheorising the concept of 'affordance' in the digital media context. Lead article in a special issue on the topic, co-edited by the authors for the journal Media Theory.

    The Intuitive Appeal of Explainable Machines

    Algorithmic decision-making has become synonymous with inexplicable decision-making, but what makes algorithms so difficult to explain? This Article examines what sets machine learning apart from other ways of developing rules for decision-making and the problem these properties pose for explanation. We show that machine learning models can be both inscrutable and nonintuitive and that these are related, but distinct, properties. Calls for explanation have treated these problems as one and the same, but disentangling the two reveals that they demand very different responses. Dealing with inscrutability requires providing a sensible description of the rules; addressing nonintuitiveness requires providing a satisfying explanation for why the rules are what they are. Existing laws like the Fair Credit Reporting Act (FCRA), the Equal Credit Opportunity Act (ECOA), and the General Data Protection Regulation (GDPR), as well as techniques within machine learning, are focused almost entirely on the problem of inscrutability. While such techniques could allow a machine learning system to comply with existing law, doing so may not help if the goal is to assess whether the basis for decision-making is normatively defensible. In most cases, intuition serves as the unacknowledged bridge between a descriptive account and a normative evaluation. But because machine learning is often valued for its ability to uncover statistical relationships that defy intuition, relying on intuition is not a satisfying approach. This Article thus argues for other mechanisms for normative evaluation. To know why the rules are what they are, one must seek explanations of the process behind a model's development, not just explanations of the model itself.

    Weka: A machine learning workbench for data mining

    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user interfaces are provided for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing configurations for streamed data processing. These interfaces constitute an advanced environment for experimental data mining. The system is written in Java and distributed under the terms of the GNU General Public License.