
    Proactive Assessment of Accident Risk to Improve Safety on a System of Freeways, Research Report 11-15

    This report describes the development and evaluation of real-time crash risk-assessment models for four freeway corridors: U.S. Route 101 NB (northbound) and SB (southbound) and Interstate 880 NB and SB. Crash data for these freeway segments for the 16-month period from January 2010 through April 2011 are used to link historical crash occurrences with real-time traffic patterns observed through loop-detector data. The crash risk-assessment models are based on a binary classification approach (crash and non-crash outcomes), with traffic parameters measured at surrounding vehicle detection station (VDS) locations as the independent variables. The analysis techniques used in this study are logistic regression and classification trees. Prior to developing the models, data-related issues such as cleaning and aggregation were addressed. The modeling efforts revealed that the turbulence resulting from speed variation is significantly associated with crash risk on the U.S. 101 NB corridor. The models estimated with data from U.S. 101 NB were evaluated on the basis of their classification performance, not only on U.S. 101 NB but also on the other three freeway segments, to assess transferability. It was found that a predictive model derived from one freeway can be readily applied to other freeways, although classification performance decreases. The models that transfer best to other roadways were those that use the fewest VDSs, that is, one upstream or downstream station rather than two or three. The classification accuracy of the models is discussed in terms of how they can be used for real-time crash risk assessment. The models can be applied to developing and testing variable speed limits (VSLs) and ramp-metering strategies that proactively attempt to reduce crash risk.
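    As a rough illustration of the report's binary classification setup, the sketch below fits a logistic regression on synthetic stand-ins for loop-detector aggregates (mean speed, speed standard deviation as a turbulence proxy, and occupancy at a single VDS). The feature names, data, and class weighting are assumptions, since the report's dataset and exact model specification are not reproduced here.

```python
# Minimal sketch of a crash/non-crash logistic regression, assuming
# hypothetical VDS features; the report's real data are not public.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Stand-ins for short-window loop-detector aggregates at one upstream VDS.
n = 2000
X = np.column_stack([
    rng.normal(60, 8, n),    # mean speed (mph)
    rng.gamma(2.0, 2.0, n),  # speed standard deviation (turbulence proxy)
    rng.uniform(5, 35, n),   # occupancy (%)
])
# Synthetic labels: higher speed variation raises crash odds, echoing the
# U.S. 101 NB finding; real labels would come from matched crash records.
logit = -4.0 + 0.5 * X[:, 1] - 0.02 * X[:, 0]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```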

    ART and ARTMAP Neural Networks for Applications: Self-Organizing Learning, Recognition, and Prediction

    ART and ARTMAP neural networks for adaptive recognition and prediction have been applied to a variety of problems. Applications include parts design retrieval at the Boeing Company, automatic mapping from remote sensing satellite measurements, medical database prediction, and robot vision. This chapter features a self-contained introduction to ART and ARTMAP dynamics and a complete algorithm for applications. Computational properties of these networks are illustrated by means of remote sensing and medical database examples. The basic ART and ARTMAP networks feature winner-take-all (WTA) competitive coding, which groups inputs into discrete recognition categories. WTA coding in these networks enables fast learning, which allows the network to encode important rare cases but may lead to inefficient category proliferation with noisy training inputs. This problem is partially solved by ART-EMAP, which uses WTA coding for learning but distributed category representations for test-set prediction. In medical database prediction problems, which often feature inconsistent training input predictions, the ARTMAP-IC network further improves ARTMAP performance with distributed prediction, category instance counting, and a new search algorithm. A recently developed family of ART models (dART and dARTMAP) retains stable coding, recognition, and prediction, but allows arbitrarily distributed category representation during learning as well as performance.

    National Science Foundation (IRI 94-01659, SBR 93-00633); Office of Naval Research (N00014-95-1-0409, N00014-95-0657)
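    The winner-take-all dynamics described above can be condensed to a short loop. The sketch below is a generic fuzzy ART pass (complement coding, choice function, vigilance test, fast learning), offered as an illustration of WTA category coding rather than the chapter's complete algorithm; rho, alpha, and beta are the standard vigilance, choice, and learning-rate parameters.

```python
# Compact fuzzy ART sketch illustrating WTA coding: the single category
# with the highest choice value that passes the vigilance test learns
# the input. An illustration, not the chapter's published algorithm.
import numpy as np

def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster rows of `inputs` (values in [0, 1]) into ART categories."""
    weights, labels = [], []
    for a in inputs:
        I = np.concatenate([a, 1.0 - a])           # complement coding
        # Rank existing categories by the choice function |I ^ w| / (alpha + |w|).
        order = sorted(range(len(weights)),
                       key=lambda j: -np.minimum(I, weights[j]).sum()
                                      / (alpha + weights[j].sum()))
        for j in order:                            # WTA search
            match = np.minimum(I, weights[j]).sum() / I.sum()
            if match >= rho:                       # vigilance test passed
                weights[j] = (beta * np.minimum(I, weights[j])
                              + (1 - beta) * weights[j])
                labels.append(j)
                break
        else:                                      # no winner: new category
            weights.append(I.copy())
            labels.append(len(weights) - 1)
    return labels

print(fuzzy_art(np.array([[0.1, 0.2], [0.12, 0.22], [0.9, 0.8]])))
```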

    A Procedure for Extending Input Selection Algorithms to Low Quality Data in Modelling Problems with Application to the Automatic Grading of Uploaded Assignments

    When selecting relevant inputs in modelling problems with low quality data, the ranking of the most informative inputs is also uncertain. In this paper, this issue is addressed through a new procedure that allows different crisp feature selection algorithms to be extended to vague data. The partial knowledge about the rank of each feature is modelled by means of a possibility distribution, and a ranking is then applied to sort these distributions. It is shown that this technique makes the most use of the available information in some vague datasets. The approach is demonstrated in a real-world application: in the context of massive online computer science courses, methods are sought for automatically grading a student's submission through code metrics. Feature selection methods are used to find the metrics involved in the most meaningful predictions. In this study, 800 source code files, collected and revised by the authors in classroom Computer Science lectures taught between 2013 and 2014, are analyzed with the proposed technique, and the most relevant metrics for the automatic grading task are discussed.

    This work was supported by the Spanish Ministerio de Economía y Competitividad under Project TIN2011-24302, including funding from the European Regional Development Fund.
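    As a loose illustration of the general idea, not the paper's possibility-distribution procedure, the sketch below Monte Carlo samples crisp datasets from interval-valued (vague) inputs, applies a crisp ranker (mutual information) to each sample, and orders features by their mean rank across samples; the data and the choice of ranker are assumptions.

```python
# When inputs are interval-valued, a crisp ranker yields a distribution
# of ranks per feature; here that distribution is approximated by Monte
# Carlo sampling of the intervals. An illustrative proxy only.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n, d = 300, 5
lo = rng.random((n, d))                       # interval lower bounds (synthetic)
hi = lo + 0.1 * rng.random((n, d))            # interval upper bounds
y = (lo[:, 0] + hi[:, 1] > 1.0).astype(int)   # toy target

ranks = []
for _ in range(50):            # sample crisp datasets from the intervals
    X = lo + (hi - lo) * rng.random((n, d))
    scores = mutual_info_classif(X, y, random_state=0)
    ranks.append(np.argsort(np.argsort(-scores)))  # rank of each feature

mean_rank = np.mean(ranks, axis=0)
print("features ordered by mean rank:", np.argsort(mean_rank))
```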

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
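    A minimal sketch of early, concatenation-based integration touching three of the challenges listed above: missing data (imputation), class imbalance (class weighting), and dimensionality (feature screening). The modalities, sizes, and pipeline choices are illustrative assumptions, not drawn from the review.

```python
# Early integration: concatenate omics blocks, impute, screen, classify.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
genomics = rng.normal(size=(100, 500))     # e.g., expression features
metabolome = rng.normal(size=(100, 200))   # second modality
X = np.hstack([genomics, metabolome])      # concatenation-based integration
X[rng.random(X.shape) < 0.05] = np.nan     # 5% missing entries
y = rng.random(100) < 0.2                  # imbalanced labels

clf = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # handle missing data
    ("select", SelectKBest(f_classif, k=50)),       # tame dimensionality
    ("model", RandomForestClassifier(class_weight="balanced",
                                     random_state=0)),
]).fit(X, y)
print("training accuracy:", clf.score(X, y))
```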

    NIPS - Not Even Wrong? A Systematic Review of Empirically Complete Demonstrations of Algorithmic Effectiveness in the Machine Learning and Artificial Intelligence Literature

    Objective: To determine the completeness of the argumentative steps necessary to conclude effectiveness of an algorithm in a sample of current ML/AI supervised learning literature. Data Sources: Papers published in the Neural Information Processing Systems (NeurIPS, née NIPS) proceedings where the official record showed a 2017 year of publication. Eligibility Criteria: Studies reporting a (semi-)supervised model, or pre-processing fused with (semi-)supervised models, for tabular data. Study Appraisal: Three reviewers applied the assessment criteria to determine argumentative completeness. The criteria were split into three groups: experiments (e.g., real and/or synthetic data), baselines (e.g., uninformed and/or state-of-the-art), and quantitative comparison (e.g., performance quantifiers with confidence intervals and formal comparison of the algorithm against baselines). Results: Of the 121 eligible manuscripts (from the sample of 679 abstracts), 99% used real-world data and 29% used synthetic data. 91% of manuscripts did not report an uninformed baseline, and 55% reported a state-of-the-art baseline. 32% reported confidence intervals for performance, but none provided references or exposition for how these were calculated. 3% reported formal comparisons. Limitations: The use of one venue as the primary information source may not be representative of all ML/AI literature. However, the NeurIPS conference is recognised to be amongst the top tier concerning ML/AI studies, so it is reasonable to consider its corpus representative of high-quality research. Conclusion: Using the 2017 sample of the NeurIPS supervised learning corpus as an indicator of the quality and trustworthiness of current ML/AI research, it appears that complete argumentative chains in demonstrations of algorithmic effectiveness are rare.
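    The two quantitative steps the review found largely missing, confidence intervals for performance quantifiers and formal comparison against baselines, are straightforward to implement. The sketch below uses placeholder predictions, not the review's protocol: a percentile bootstrap interval for accuracy, followed by an exact McNemar test on the discordant pairs.

```python
# Bootstrap CI for accuracy plus an exact McNemar test vs. a baseline.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)                          # placeholder labels
pred_model = np.where(rng.random(500) < 0.8, y_true, 1 - y_true)
pred_base = np.where(rng.random(500) < 0.7, y_true, 1 - y_true)

# Percentile bootstrap confidence interval for accuracy.
accs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))       # resample with replacement
    accs.append(np.mean(pred_model[idx] == y_true[idx]))
print("accuracy 95% CI:", np.percentile(accs, [2.5, 97.5]))

# Exact McNemar test: binomial test on the discordant pairs.
b = np.sum((pred_model == y_true) & (pred_base != y_true))
c = np.sum((pred_model != y_true) & (pred_base == y_true))
print("McNemar p-value:", binomtest(int(min(b, c)), int(b + c), 0.5).pvalue)
```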

    Managed information gathering and fusion for transient transport problems

    This paper deals with vehicular traffic management in road networks by means of communication technologies, from the Traffic Control Center's point of view. The overall goal is to manage urban traffic through road traffic operations, control, and intervention in order to minimize delays and stops and to improve road safety. The paper focuses on transient traffic conditions, when management interventions are most critical. The aim was to detect the onset of transient traffic, to gather the most appropriate data, and to derive reliable information for suggested interventions. More reliable information can be obtained by information fusion, and several fusion techniques are discussed in this paper. A semi-automatic solution with a Decision Support System has been developed to help engineers suggest interventions based on real-time traffic data. Information fusion benefits the Decision Support System: complementary sensors can fill one another's gaps, and the system can detect changes in the proportions of different vehicle types in the traffic stream. An example of detecting transient traffic and suggesting an intervention on the transport network of a small town is presented at the end of the paper. The novelty of this paper is the gathering of information, triggered by the state change from stationary to transient, from ad hoc channels, and its combination with information from established regular channels. Keywords: information gathering, information fusion, Kalman filter, transient traffic, Decision Support System
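    As a toy illustration of the Kalman-filter fusion listed in the keywords, not the paper's system, the sketch below tracks a scalar traffic flow with a random-walk model and fuses measurements from two complementary sensors with differing noise levels; a step change midway stands in for the stationary-to-transient transition. The sensor names and variances are assumptions.

```python
# Scalar Kalman filter fusing two sensors per time step.
import numpy as np

def fuse(measurements, q=4.0):
    """measurements: list of time steps, each a list of (value, variance)."""
    x, p = 0.0, 1e6                       # diffuse initial state and variance
    track = []
    for step in measurements:
        p += q                            # predict: random-walk process noise
        for z, r in step:                 # update sequentially with each sensor
            k = p / (p + r)               # Kalman gain
            x += k * (z - x)
            p *= (1 - k)
        track.append(x)
    return track

rng = np.random.default_rng(0)
true_flow = np.concatenate([np.full(20, 900.0),    # stationary phase (veh/h)
                            np.full(20, 1400.0)])  # transient jump
steps = [[(f + rng.normal(0, 50), 2500.0),         # loop detector, noisier
          (f + rng.normal(0, 25), 625.0)]          # camera count, cleaner
         for f in true_flow]
print(np.round(fuse(steps)[-5:]))
```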

    Gaussian process models for SCADA data based wind turbine performance/condition monitoring

    Wind energy has seen remarkable growth in the past decade, and installed wind turbine capacity is increasing significantly every year around the globe. The presence of an excellent offshore wind resource and the need to reduce carbon emissions from electricity generation are driving policy to increase offshore wind generation capacity in UK waters. Logistic and transport issues make offshore maintenance costlier than onshore and availability correspondingly lower; as a result, there is growing interest in wind turbine condition monitoring, which allows condition-based, rather than corrective or scheduled, maintenance.

    Offshore wind turbine manufacturers are constantly increasing the rated size of their turbines, and also their hub height, in order to access higher wind speeds with lower turbulence. However, such scaling up significantly increases the material requirements for both tower structure and foundations, as well as the costs of transportation, installation, and maintenance. Wind turbines are costly machines comprising several interconnected complex systems (e.g., hub, drive shaft, gearbox, generator, yaw system, and electric drive). The unexpected failure of these components can cause significant machine unavailability and/or damage to other components, ultimately increasing the operation and maintenance (O&M) cost and subsequently the cost of energy (COE). Therefore, identifying faults at an early stage, before catastrophic damage occurs, is the primary objective of wind turbine condition monitoring.

    Existing wind turbine condition monitoring strategies, for example vibration signal analysis and oil debris detection, require costly sensors. The additional costs can be significant given the number of wind turbines typically deployed in offshore wind farms, and costly expertise is generally required to interpret the results. By contrast, condition monitoring based on Supervisory Control and Data Acquisition (SCADA) data analysis could underpin condition-based maintenance with little or no additional cost to the wind farm operator.

    A Gaussian process (GP) is a stochastic, nonlinear, nonparametric model defined by the joint distribution of a collection of random variables; it is well suited to classification and regression problems. A GP uses a measure of similarity between data points (the covariance function) to fit a training dataset and estimate future values. GP models have been applied to numerous multivariate and multi-task problems, including spatial and spatiotemporal contexts, as well as to electricity price forecasting, residential probabilistic load forecasting, and solar power forecasting. However, the application of GPs to wind turbine condition monitoring has to date been little explored.

    This thesis focuses on GP-based wind turbine condition monitoring that utilises data from SCADA systems exclusively. The selection of the covariance function greatly influences GP model accuracy, so a comparative analysis of different covariance functions for GP models is presented, with an in-depth analysis of popular stationary covariance functions. Based on this analysis, a suitable covariance function is selected for constructing a GP model-based fault detection algorithm for wind turbine condition monitoring.

    Effective component condition indicators can be derived by constructing a reference model from SCADA data recorded on a healthy turbine and comparing it against incoming operational SCADA data from a possibly faulty turbine. In this thesis, a GP algorithm with a suitable covariance function is constructed to detect incipient turbine operational faults or failures before they result in catastrophic damage, so that preventative maintenance can be scheduled in a timely manner. To judge GP model effectiveness, two other methods, based on binning, are tested and compared with the GP-based algorithm. The thesis also considers a range of critical turbine parameters and their impact on the GP fault detection algorithm.

    Power is well known to be influenced by air density, and this is reflected in the IEC Standard air density correction procedure; proper selection of an air density correction approach can therefore improve the power curve model. The thesis explores the different types of air density correction approach and suggests the best way to incorporate them into GP models to improve accuracy and reduce uncertainty.

    Finally, a SCADA data based fault detection algorithm is constructed to detect failures caused by yaw misalignment. Two fault detection algorithms based on IEC binning methods (widely used within the wind industry) are developed to assess the performance of the GP-based fault detection algorithm in terms of its capability to detect signs of failure in advance (and by how much), and also its false positive rate, making use of extensive SCADA data and turbine fault and repair logs.

    GP models are robust in identifying early anomalies/failures that cause a wind turbine to underperform. This early detection helps prevent machines from reaching a catastrophic state and allows enough time to undertake scheduled maintenance, which ultimately reduces the O&M cost and maximises the power performance of wind turbines. Overall, the results demonstrate the effectiveness of the GP algorithm in improving the performance of wind turbines through condition monitoring.
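    The healthy-reference monitoring approach described above lends itself to a compact illustration. The sketch below, a minimal example rather than the thesis's implementation, fits a GP power-curve model (wind speed to power) on synthetic "healthy" SCADA-like data with an RBF-plus-noise kernel, then flags incoming points that fall outside a 3-sigma confidence band; the power-curve shape, kernel settings, and threshold are illustrative assumptions.

```python
# GP power-curve reference model with a confidence-band anomaly flag.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def power_curve(v):                       # idealised turbine power curve
    return np.clip(0.6 * v**3, 0, 2000)   # kW, rated at 2 MW

# Training data from a "healthy" period.
v_train = rng.uniform(3, 14, 200)
p_train = power_curve(v_train) + rng.normal(0, 40, 200)

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=2.0) + WhiteKernel(noise_level=1600.0),
    normalize_y=True,
).fit(v_train[:, None], p_train)

# Incoming data: simulated underperformance (e.g., yaw misalignment).
v_new = rng.uniform(3, 14, 50)
p_new = 0.8 * power_curve(v_new) + rng.normal(0, 40, 50)

mean, std = gp.predict(v_new[:, None], return_std=True)
alarms = np.abs(p_new - mean) > 3 * std   # outside the 3-sigma band
print(f"{alarms.sum()} of {alarms.size} points flagged as anomalous")
```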

    Learning, Categorization, Rule Formation, and Prediction by Fuzzy Neural Networks

    National Science Foundation (IRI 94-01659); Office of Naval Research (N00014-91-J-4100, N00014-92-J-4015); Air Force Office of Scientific Research (90-0083, N00014-92-J-4015)