214 research outputs found

    Safety Performance Prediction of Large-Truck Drivers in the Transportation Industry

    The trucking industry and truck drivers play a key role in the United States commercial transportation sector. Accidents involving large trucks are major events that can cause serious problems for the driver, the company, customers, and other road users, including property damage and loss of life. The objective of this research is to focus on an individual transportation company and use its historical data to build statistical and machine learning models that predict accidents. The focus is on building models that have high accuracy and correctly predict accidents. Logistic regression and penalized logistic regression models were tested first to provide some interpretation of the relationship between the predictor variables and the response variable. Random forest, gradient boosting machine (GBM), and deep learning methods are explored to handle highly non-linear and complex data. The cost of fatal and non-fatal accidents is also discussed to weigh the cost of training a driver against the cost of an accident. Because accidents are very rare events, model performance should be balanced between predicting non-accidents (specificity) and predicting accidents (sensitivity). This framework can serve as a baseline for transportation companies to emphasize the benefits of prediction for safer and more productive drivers.
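Because accidents are rare, a model can score high accuracy while missing most of them. The abstract does not say how the sensitivity/specificity balance was chosen; a minimal sketch, assuming the balance is struck by picking the probability cutoff that maximizes balanced accuracy (the mean of sensitivity and specificity), could look like this. The function names, scores, and candidate cutoffs are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (recall on accidents, label 1) and specificity (recall on non-accidents, label 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def pick_threshold(y_true, scores, candidates):
    """Return the cutoff (and its score) maximizing balanced accuracy = (sens + spec) / 2."""
    best_cutoff, best_ba = None, -1.0
    for cutoff in candidates:
        y_pred = [1 if s >= cutoff else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        ba = (sens + spec) / 2
        if ba > best_ba:
            best_cutoff, best_ba = cutoff, ba
    return best_cutoff, best_ba


# Hypothetical predicted accident probabilities for 10 drivers (1 = accident).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.35, 0.60]
print(pick_threshold(y_true, scores, [0.1, 0.2, 0.3, 0.4, 0.5]))  # (0.3, 0.875)
```

With the conventional 0.5 cutoff, this toy model is 90% accurate but catches only one of the two accidents (sensitivity 0.5). Lowering the cutoff to 0.3 catches both at the price of two false alarms. A cost-weighted criterion, such as the training-versus-accident costs the abstract discusses, could replace balanced accuracy in the same loop.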

    Anomaly detection and explanation in big data

    2021 Spring. Includes bibliographical references. Data quality tests are used to validate the data stored in databases and data warehouses and to detect violations of syntactic and semantic constraints. Domain experts grapple with capturing all the important constraints and checking that they are satisfied. The constraints are often identified in an ad hoc manner, based on knowledge of the application domain and the needs of the stakeholders. Constraints can exist over single or multiple attributes, as well as over records involving time series and sequences. Constraints involving multiple attributes can involve both linear and non-linear relationships among the attributes. We propose ADQuaTe, a data quality test framework that automatically (1) discovers different types of constraints from the data, (2) marks records that violate the constraints as suspicious, and (3) explains the violations. Domain knowledge is required to determine whether the suspicious records are actually faulty. The framework can incorporate feedback from domain experts to improve the accuracy of constraint discovery and anomaly detection. We instantiate ADQuaTe in two ways to detect anomalies in non-sequence and sequence data. The first instantiation (ADQuaTe2) uses an unsupervised autoencoder for constraint discovery in non-sequence data. ADQuaTe2 analyzes records in isolation to discover constraints among the attributes. We evaluate the effectiveness of ADQuaTe2 using real-world non-sequence datasets from the human health and plant diagnosis domains. We demonstrate that ADQuaTe2 can discover new constraints that were previously unspecified in existing data quality tests and can report both previously detected and new faults in the data.
We also use non-sequence datasets from the UCI repository to evaluate the improvement in the accuracy of ADQuaTe2 after incorporating ground truth knowledge and retraining the autoencoder model. The second instantiation (IDEAL) uses an unsupervised LSTM-autoencoder for constraint discovery in sequence data. IDEAL analyzes the correlations and dependencies among data records to discover constraints. We evaluate the effectiveness of IDEAL using datasets from Yahoo servers, NASA Shuttle, and the Colorado State University Energy Institute. We demonstrate that IDEAL can detect previously known anomalies in these datasets. Using mutation analysis, we show that IDEAL can detect different types of injected faults. We also demonstrate that the accuracy of the approach improves after incorporating ground truth knowledge about the injected faults and retraining the LSTM-autoencoder model. The novelty of this research lies in the development of a domain-independent framework that effectively and efficiently discovers different types of constraints from the data, detects and explains anomalous data, and minimizes false alarms through an interactive learning process.
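The abstract describes learning constraints with an unsupervised autoencoder and flagging records that violate them as suspicious. The sketch below illustrates only that general idea, under strong simplifying assumptions. It swaps in a tiny linear autoencoder (two inputs, one bottleneck unit) trained with plain gradient descent, scores each record by its squared reconstruction error, and flags records whose error exceeds a hypothetical mean + k * std cutoff. The data, the threshold rule, and every function name are illustrative; none of them is ADQuaTe2's actual model or API.

```python
import statistics


def train_linear_autoencoder(records, lr=0.05, epochs=3000):
    """Fit a 2 -> 1 -> 2 linear autoencoder with full-batch gradient descent.

    Minimizes the mean squared reconstruction error of (x1, x2) through a single
    bottleneck unit z = e . x, reconstructed as (d1 * z, d2 * z).
    """
    n = len(records)
    e = [0.1, 0.1]  # encoder weights (arbitrary small starting values)
    d = [0.1, 0.1]  # decoder weights
    for _ in range(epochs):
        grad_e = [0.0, 0.0]
        grad_d = [0.0, 0.0]
        for x in records:
            z = e[0] * x[0] + e[1] * x[1]           # bottleneck code
            r = [d[0] * z - x[0], d[1] * z - x[1]]  # residual: reconstruction minus input
            dz = 2 * (r[0] * d[0] + r[1] * d[1]) / n
            for i in range(2):
                grad_d[i] += 2 * r[i] * z / n
                grad_e[i] += dz * x[i]
        for i in range(2):
            e[i] -= lr * grad_e[i]
            d[i] -= lr * grad_d[i]
    return e, d


def reconstruction_errors(records, e, d):
    """Squared reconstruction error of each record."""
    errors = []
    for x in records:
        z = e[0] * x[0] + e[1] * x[1]
        errors.append((x[0] - d[0] * z) ** 2 + (x[1] - d[1] * z) ** 2)
    return errors


def flag_suspicious(errors, k=2.0):
    """Indices of records whose error exceeds mean + k * (population) std dev."""
    threshold = statistics.mean(errors) + k * statistics.pstdev(errors)
    return [i for i, err in enumerate(errors) if err > threshold]


# Hypothetical data: the hidden constraint is x2 = 2 * x1; the last record violates it.
records = [(t / 10, 2 * t / 10) for t in range(1, 11)] + [(1.0, -0.5)]
e, d = train_linear_autoencoder(records)
errors = reconstruction_errors(records, e, d)
print(flag_suspicious(errors))  # expected: [10]
```

The normal records satisfy x2 = 2 * x1, so the one-unit bottleneck learns that direction and reconstructs them almost perfectly. The record that breaks the relationship cannot be reconstructed and receives a high error. A linear bottleneck only captures linear relationships, while the abstract notes that constraints can also be non-linear. That is one reason to use a non-linear autoencoder in practice.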

    A machine learning based drug discovery pipeline: finding new therapies for Cystic Fibrosis

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. Technological advances and the growing availability of public data have led to the development of robust machine learning methods for predicting compound activity. These methods are faster, more efficient, and less costly than traditional drug discovery methods. Cystic fibrosis (CF) is a progressive autosomal disease for which new therapies are urgently needed. Mutations in the CFTR gene in CF patients lead to deficient production of the CFTR anion-transport membrane channel, causing ionic imbalances and abnormal fluid transport. CF affects several organs, most severely the lungs, and lung problems are usually the cause of premature death. The most prevalent and relevant mutation in CF is the deletion of phenylalanine 508 (F508del-CFTR). For this reason, the main drug discovery efforts are aimed at correcting or mitigating the effects of this mutation. A methodology was created using classification and regression machine learning models based on support vector machines and random forests to discover compounds with therapeutic potential for CF from publicly accessible compound databases. The most promising compounds were selected and tested in the laboratory using immunofluorescence assays with automated high-throughput screening microscopy and analysis of their effect on F508del-CFTR, based on the trafficking efficiency of F508del-CFTR to the plasma membrane. The 10 compounds with the best results in this assay were validated by Western blot and compared with two known F508del-CFTR corrector compounds. Four compounds were identified as promising therapeutic compounds for CF.
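The abstract does not say how the classification and regression outputs were combined to choose which compounds to test. A minimal sketch, assuming a hypothetical two-stage rule (keep compounds the classifier predicts as active, then rank them by the regressor's predicted effect and keep the top k), might look like this. `Candidate`, `prioritize`, `min_p`, and the compound IDs are illustrative names, not part of the thesis.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    compound_id: str
    p_active: float          # classifier's predicted probability of activity (hypothetical)
    predicted_effect: float  # regressor's predicted correction effect (hypothetical units)


def prioritize(candidates, min_p=0.5, top_k=10):
    """Keep compounds the classifier calls active, then rank them by predicted effect."""
    active = [c for c in candidates if c.p_active >= min_p]
    active.sort(key=lambda c: c.predicted_effect, reverse=True)
    return [c.compound_id for c in active[:top_k]]


cands = [
    Candidate("C1", 0.9, 0.2),
    Candidate("C2", 0.4, 0.9),  # high predicted effect, but filtered out by the classifier
    Candidate("C3", 0.7, 0.5),
    Candidate("C4", 0.6, 0.1),
]
print(prioritize(cands, min_p=0.5, top_k=2))  # ['C3', 'C1']
```

Filtering first and ranking second means a compound must pass both models. An alternative design would rank by a weighted combination of the two scores.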

    Detection and Explanation of Distributed Denial of Service (DDoS) Attack Through Interpretable Machine Learning

    Distributed denial of service (DDoS) is a network-based attack in which the attacker aims to overwhelm the victim server. The attacker floods the server by sending an enormous number of network packets in a distributed manner, beyond the server's capacity, thus disrupting its normal service. In this dissertation, we focus on building intelligent detectors that can learn by themselves with less human interaction and detect DDoS attacks accurately. Machine learning (ML) has shown promising results across many technologies, including cybersecurity, and brings intelligence to Intrusion Detection Systems (IDSs). In addition, among state-of-the-art ML-based IDSs, ensemble classifiers (combinations of classifiers) outperform single classifiers. Therefore, we have implemented both supervised and unsupervised ensemble frameworks to build IDSs with better DDoS detection accuracy and fewer false alarms than existing ones. Our experiments on popular benchmark datasets such as NSL-KDD, UNSW-NB15, and CICIDS2017 achieved a detection accuracy of up to 99.1% and a false positive rate as low as 0.01%. As feature selection is one of the mandatory preprocessing phases in ML classification, we have designed several feature selection techniques to improve DDoS detection accuracy, false positive alarms, and training times. Initially, we implemented an ensemble framework for feature selection (FS) methods that combines almost all well-known FS methods and yields better outcomes than any single FS method. The goal of my dissertation is not only to detect DDoS attacks precisely but also to provide explanations for these detections. An interpretable machine learning (IML) technique is used to explain a detected DDoS attack in terms of the effectiveness of the corresponding features.
We also implemented a novel IML-based feature selection approach that helps find optimal features, which are then used to retrain our models. The retrained model performs better than with the general feature selection process. Moreover, we developed an explainer model using IML that identifies detected DDoS attacks with explanations based on the effectiveness of the features. The contribution of this dissertation is fivefold, with the ultimate goal of detecting the most frequent DDoS attacks in cybersecurity. To detect DDoS attacks, we first used ensemble machine learning classification with both supervised and unsupervised classifiers. For better performance, we then implemented and applied two feature selection approaches, an ensemble feature selection framework and an IML-based feature selection approach, both individually and in combination with the supervised ensemble framework. Furthermore, we specifically added explanations for the detected DDoS attacks using explainer models built with the LIME and SHAP IML methods. To build trustworthy explainer models, we conducted a detailed survey of interpretable machine learning methods and their associated tools. We applied the designed framework in various domains, such as smart grids and NLP-based IDSs, to verify its efficacy and its ability to perform as a generic model.
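The abstract mentions two combination steps without giving their exact rules: an ensemble of classifiers and an ensemble of feature selection methods. As a minimal sketch, assuming hard majority voting for the classifier ensemble and a Borda count for merging feature rankings (both are common choices, not necessarily the dissertation's), the two steps could look like this. The feature names and predictions are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Hard-voting ensemble: `predictions` holds one label list per classifier (0 = benign, 1 = DDoS)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def borda_feature_selection(rankings, top_k):
    """Merge several feature rankings (each ordered best first) with a Borda count."""
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            points[feature] += n - position  # the best-ranked feature gets n points
    # Sort by total points, breaking ties alphabetically so the result is deterministic.
    return sorted(points, key=lambda f: (-points[f], f))[:top_k]


# Three classifiers voting on four network flows.
votes = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
print(majority_vote(votes))  # [1, 0, 1, 0]

# Three hypothetical FS methods ranking the same four features.
rankings = [
    ["pkt_rate", "syn_ratio", "dst_port", "ttl"],
    ["syn_ratio", "pkt_rate", "ttl", "dst_port"],
    ["pkt_rate", "ttl", "syn_ratio", "dst_port"],
]
print(borda_feature_selection(rankings, top_k=2))  # ['pkt_rate', 'syn_ratio']
```

With an odd number of binary classifiers, majority voting never ties. The Borda count rewards features that rank consistently high across methods rather than features that a single method happens to favor.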

    Planet Earth 2011

    The failure of the UN climate change summit in Copenhagen in December 2009 to reach a global agreement on emission reduction targets led many in the developing world to view it as a reversal of the Kyoto Protocol and an attempt by developed nations to shirk their responsibility for climate change. Global warming has been at the top of the political agenda for a number of years and has become even more pressing with the rapid industrialization taking place in China and India. This book looks at the effects of climate change across different regions of the world and discusses cleantech and environmental initiatives, such as the destruction of fluorinated greenhouse gases and biofuels, as well as the role of plant breeding and biotechnology. The book concludes with an insight into the socio-religious impact of global warming, with reference to Christianity and Islam.

    Remote Sensing

    This dual conception of remote sensing led us to prepare two different books: in addition to the first book, which presents recent advances in remote sensing applications, this book is devoted to new techniques for data processing, sensors, and platforms. We do not intend this book to cover all aspects of remote sensing techniques and platforms, since that would be an impossible task for a single volume. Instead, we have collected a number of high-quality, original, and representative contributions in those areas.

    Embracing Analytics in the Drinking Water Industry

    Analytics can support numerous aspects of water industry planning, management, and operations. Given this wide range of touchpoints and applications, it is increasingly imperative to champion and develop broad-based analytics capability and integrate it into practice to address the current and transitional challenges facing the drinking water industry. Analytics will contribute substantially to future efforts to provide innovative solutions that make the water industry more sustainable and resilient. The purpose of this book is to introduce analytics to practicing water engineers so they can apply the covered subjects, approaches, and detailed techniques in their daily operations, management, and decision-making processes. Undergraduate and early graduate students in water-related concentrations will also be exposed to established analytical techniques, along with many methods that are currently considered new, emerging, or maturing. This book covers a broad spectrum of water industry analytics topics in an easy-to-follow manner. The overall background and context are motivated by (and directly drawn from) actual water utility projects that the authors have worked on over many recent years. The authors strongly believe that the water industry should embrace and integrate data-driven fundamentals and methods into its daily operations and decision-making processes to replace established “rule-of-thumb” and weak heuristic approaches, and that an analytics viewpoint, approach, and culture is key to this industry transformation.

    Intelligent Transportation Related Complex Systems and Sensors

    Built around innovative services related to different modes of transport and traffic management, intelligent transport systems (ITS) are being widely adopted worldwide to improve the efficiency and safety of the transportation system. They enable users to be better informed and to make safer, more coordinated, and smarter decisions about the use of transport networks. Current ITSs are complex systems made up of several components/sub-systems characterized by time-dependent interactions among themselves. Examples of these transportation-related complex systems include road traffic sensors, autonomous/automated cars, smart cities, smart sensors, virtual sensors, traffic control systems, smart roads, logistics systems, smart mobility systems, and many others emerging from niche areas. The efficient operation of these complex systems requires: (i) efficient solutions to the issues of the sensors/actuators used to capture and control the physical parameters of these systems, as well as the quality of data collected from them; (ii) tackling complexity using simulation and analytical modelling techniques; and (iii) applying optimization techniques to improve the performance of these systems. The collection includes twenty-four papers covering scientific concepts, frameworks, architectures, and various other ideas on analytics, trends, and applications of transportation-related data.

    Recovery of carbon stocks after wildfires in boreal forests : a synthesis

    Book of abstracts: Cool forests at risk? The Critical Role of Boreal and Mountain Ecosystems for People, Bioeconomy, and Climate. Peer reviewed