1,503 research outputs found

    IEEE Access Special Section Editorial: Big Data Technology and Applications in Intelligent Transportation

    During the last few years, the information technology and transportation industries, along with automotive manufacturers and academia, have been focusing on leveraging intelligent transportation systems (ITS) to improve services related to driver experience, connected cars, Internet data plans for vehicles, traffic infrastructure, urban transportation systems, collaborative traffic management, road traffic accident analysis, road traffic flow prediction, public transportation service plans, personal travel route plans, and the development of an effective ecosystem for vehicles, drivers, traffic controllers, city planners, and transportation applications. Moreover, the emerging technologies of the Internet of Things (IoT) and cloud computing have provided unprecedented opportunities for the development and realization of innovative intelligent transportation systems in which sensors and mobile devices gather information and cloud computing enables knowledge discovery, information sharing, and supported decision making. However, the development of such data-driven ITS requires the integration, processing, and analysis of plentiful information obtained from millions of vehicles, traffic infrastructures, smartphones, and other collaborative systems such as weather stations and road safety and early warning systems. The huge amount of data generated by ITS devices is only of value if utilized in data analytics for decision making, such as accident prevention and detection, controlling road risks, reducing traffic carbon emissions, and other applications that bring big data analytics into the picture.

    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are detecting unknown attacks and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payment fraud, IoT malware traffic). A high accuracy was achieved in the three scenarios, even though the malicious activity differs significantly from one domain to the other. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. The results showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy.

    Data-driven learning framework for associating weather conditions and wind turbine failures

    The need for cost-effective operation and maintenance (O&M) strategies in wind farms has risen significantly with the growing wind energy sector. In order to decrease costs, current practice in wind farm O&M is switching from corrective and preventive strategies to predictive ones. Anticipating wind turbine (WT) failures requires sophisticated models to understand the complex WT component degradation processes and to facilitate maintenance decision making. Environmental conditions and their impact on WT reliability play a significant role in these processes and need to be investigated thoroughly. This paper presents a framework to assess and correlate weather conditions and their effects on WT component failures. Two approaches, using (a) supervised and (b) unsupervised data mining techniques, are applied to pre-process the weather and failure data. An apriori rule mining algorithm is subsequently employed, for both approaches, to obtain logical interconnections between the failure occurrences and the environmental data. The framework is tested using a large historical failure database of modern wind turbines. The results show the relation between environmental parameters such as relative humidity, ambient temperature and wind speed, and the failures of five major WT components: gearbox, generator, frequency converter, pitch system and yaw system. Additionally, the performance of each technique in associating weather conditions with WT component failures is assessed.
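
    The rule-mining step in the abstract above can be illustrated with a minimal sketch. It is a simplified support/confidence search over one- and two-item weather antecedents, not the full level-wise Apriori algorithm and not the authors' implementation. The item labels, the `fail:` prefix, the thresholds and the toy records are all hypothetical.

```python
from itertools import combinations

def mine_rules(records, target_prefix="fail:", min_support=0.2, min_confidence=0.6):
    """Return (antecedent, failure, support, confidence) tuples.

    records: list of sets of item labels, one set per observation window.
    Antecedents are limited to 1 or 2 weather items to keep the sketch short.
    """
    n = len(records)

    def support(itemset):
        # Fraction of records that contain every item in the itemset.
        return sum(1 for r in records if itemset <= r) / n

    items = sorted({i for r in records for i in r})
    weather = [i for i in items if not i.startswith(target_prefix)]
    failures = [i for i in items if i.startswith(target_prefix)]

    rules = []
    for size in (1, 2):
        for ante in combinations(weather, size):
            ante_set = frozenset(ante)
            s_ante = support(ante_set)
            if s_ante < min_support:
                continue  # an infrequent antecedent cannot yield a frequent rule
            for f in failures:
                s_both = support(ante_set | {f})
                if s_both >= min_support and s_both / s_ante >= min_confidence:
                    rules.append((ante, f, s_both, s_both / s_ante))
    return rules

# Hypothetical, hand-made observations (discretized weather plus failure flags).
records = [
    {"humidity:high", "temp:low", "fail:gearbox"},
    {"humidity:high", "temp:low", "fail:gearbox"},
    {"humidity:high", "temp:high"},
    {"humidity:low", "temp:high", "fail:generator"},
    {"humidity:low", "temp:low"},
]
rules = mine_rules(records, min_support=0.4, min_confidence=0.6)
for ante, failure, sup, conf in rules:
    print(ante, "->", failure, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

    A real study would discretize continuous weather measurements first, which is the role the abstract assigns to the supervised and unsupervised pre-processing steps.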

    A Data Analytics Framework for Smart Grids: Spatio-temporal Wind Power Analysis and Synchrophasor Data Mining

    Under the framework of intelligent management of power grids by leveraging advanced information, communication and control technologies, a primary objective of this study is to develop novel data mining and data processing schemes for several critical applications that can enhance the reliability of power systems. Specifically, this study is broadly organized into the following two parts: I) spatio-temporal wind power analysis for wind generation forecast and integration, and II) data mining and information fusion of synchrophasor measurements toward secure power grids. Part I is centered around wind power generation forecast and integration. First, a spatio-temporal analysis approach for short-term wind farm generation forecasting is proposed. Specifically, using extensive measurement data from an actual wind farm, the probability distribution and the level crossing rate of wind farm generation are characterized using tools from graphical learning and time-series analysis. Built on these spatial and temporal characterizations, finite state Markov chain models are developed, and a point forecast of wind farm generation is derived using the Markov chains. Then, multi-timescale scheduling and dispatch with stochastic wind generation and opportunistic demand response are investigated. Part II focuses on incorporating the emerging synchrophasor technology into the security assessment and the post-disturbance fault diagnosis of power systems. First, a data-mining framework is developed for on-line dynamic security assessment (DSA) by using adaptive ensemble decision tree learning of real-time synchrophasor measurements. 
Under this framework, novel on-line dynamic security assessment schemes are devised, aiming to handle various factors (including variations of operating conditions, forced system topology change, and loss of critical synchrophasor measurements) that can have a significant impact on the performance of conventional data-mining based on-line DSA schemes. Then, in the context of post-disturbance analysis, fault detection and localization of line outages are investigated using a dependency graph approach. It is shown that a dependency graph for voltage phase angles can be built according to the interconnection structure of the power system, and line outage events can be detected and localized through networked data fusion of the synchrophasor measurements collected from multiple locations in the power grid. Along a more practical avenue, a decentralized networked data fusion scheme is proposed for efficient fault detection and localization.
    Dissertation/Thesis, Ph.D. Electrical Engineering 201
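
    The Markov-chain point forecast mentioned in Part I of the abstract above can be sketched as follows. This is a plain first-order chain estimated by transition counting, offered only as an illustration. The thesis builds its chains on spatio-temporal characterizations that are not reproduced here, and the state sequence and the representative generation levels (in MW) are hypothetical.

```python
def fit_transition_matrix(states, n_states):
    """Estimate a first-order transition matrix by counting observed transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state never left in the data gets a uniform row (an assumption).
        matrix.append([c / total for c in row] if total else [1.0 / n_states] * n_states)
    return matrix

def point_forecast(matrix, current_state, state_levels):
    """Expected next-step generation given the current state."""
    return sum(p * level for p, level in zip(matrix[current_state], state_levels))

# Hypothetical quantized generation history: 0 = low, 1 = medium, 2 = high.
history = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2]
levels_mw = [10.0, 50.0, 90.0]  # representative output of each state (assumed)

P = fit_transition_matrix(history, n_states=3)
forecast = point_forecast(P, current_state=history[-1], state_levels=levels_mw)
print(f"next-step forecast: {forecast:.1f} MW")
```

    The forecast is the conditional mean of the next state's representative level, which is one common way to turn a finite-state chain into a point forecast.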

    Estimation Of Hybrid Models For Real-time Crash Risk Assessment On Freeways

    The relevance of reactive traffic management strategies such as freeway incident detection has been diminishing with advancements in mobile phone usage and video surveillance technology. On the other hand, the capacity to collect, store, and analyze traffic data from underground loop detectors has witnessed enormous growth in the recent past. These two facts together provide us with the motivation as well as the means to shift the focus of freeway traffic management toward proactive strategies that would involve anticipating incidents such as crashes. The primary element of a proactive traffic management strategy would be model(s) that can separate 'crash prone' conditions from 'normal' traffic conditions in real-time. The aim of this research is to establish relationship(s) between historical crashes of specific types and corresponding loop detector data, which may be used as the basis for classifying real-time traffic conditions into 'normal' or 'crash prone' in the future. In this regard, traffic data in this study were also collected for cases which did not lead to crashes (non-crash cases) so that the problem may be set up as a binary classification. A thorough review of the literature suggested that existing real-time crash 'prediction' models (classification or otherwise) are generic in nature, i.e., a single model has been used to identify all crashes (such as rear-end, sideswipe, or angle), even though traffic conditions preceding crashes are known to differ by type of crash. Moreover, a generic model would yield no information about the collision most likely to occur. To be able to analyze different groups of crashes independently, a large database of crashes reported during the 5-year period from 1999 through 2003 on the Interstate-4 corridor in Orlando was collected. The 36.25-mile instrumented corridor is equipped with 69 dual loop detector stations in each direction (eastbound and westbound) located approximately every ½ mile. 
These stations report speed, volume, and occupancy data every 30 seconds from the three through lanes of the corridor. Geometric design parameters for the freeway were also collected and collated with historical crash and corresponding loop detector data. The first group of crashes to be analyzed were the rear-end crashes, which account for about 51% of the total crashes. Based on preliminary explorations of average traffic speeds, rear-end crashes were grouped into two mutually exclusive groups: first, those occurring under extended congestion (referred to as regime 1 traffic conditions), and second, those which occurred with relatively free-flow conditions (referred to as regime 2 traffic conditions) prevailing 5-10 minutes before the crash. Simple rules to separate these two groups of rear-end crashes were formulated based on the classification tree methodology. It was found that the first group of rear-end crashes can be attributed to parameters measurable through loop detectors, such as the coefficient of variation in speed and average occupancy at stations in the vicinity of the crash location. For the second group of rear-end crashes (regime 2), traffic parameters such as average speed and occupancy at stations downstream of the crash location were significant, along with off-line factors such as the time of day and the presence of an on-ramp in the downstream direction. It was found that regime 1 traffic conditions make up only about 6% of the traffic conditions on the freeway. Almost half of the rear-end crashes occurred under regime 1 conditions even with such little exposure. This observation led to the conclusion that freeway locations operating under regime 1 traffic may be flagged for (rear-end) crashes without any further investigation. MLP (multilayer perceptron) and NRBF (normalized radial basis function) neural network architectures were explored to identify regime 2 rear-end crashes. 
The performance of individual neural network models was improved by hybridizing their outputs. Individual and hybrid PNN (probabilistic neural network) models were also explored along with matched case-control logistic regression. The stepwise selection procedure yielded a matched logistic regression model indicating the difference between average speeds upstream and downstream as significant. Even though the model provided good interpretation, its classification accuracy over the validation dataset was far inferior to the hybrid MLP/NRBF and PNN models. Hybrid neural network models, along with the classification tree model (developed to identify the traffic regimes), were able to identify about 60% of the regime 2 rear-end crashes in addition to all regime 1 rear-end crashes with a reasonable number of positive decisions (warnings). This translates into identification of more than ¾ (77%) of all rear-end crashes. Classification models were then developed for the next most frequent type, i.e., lane-change related crashes. Based on preliminary analysis, it was concluded that location-specific characteristics, such as presence of ramps, mile-post location, etc., were not significantly associated with these crashes. The average difference between occupancies of adjacent lanes and the average speeds upstream and downstream of the crash location were found significant. The significant variables were then used as inputs to MLP and NRBF based classifiers. The best models in each category were hybridized by averaging their respective outputs. The hybrid model significantly improved on the crash identification achieved through individual models, and 57% of the crashes in the validation dataset could be identified with 30% warnings. 
Although the hybrid models in this research were developed with corresponding data for rear-end and lane-change related crashes only, it was observed that about 60% of the historical single vehicle crashes (other than rollovers) could also be identified using these models. The majority of the identified single vehicle crashes, according to the crash reports, were caused by evasive actions taken by drivers to avoid another vehicle in front or in the other lane. Vehicle rollover crashes were found to be associated with speeding and curvature of the freeway section; the established relationship, however, was not sufficient to identify the occurrence of these crashes in real-time. Based on the results from the modeling procedure, a framework for parallel real-time application of these two sets of models (rear-end and lane-change) in the form of a system was proposed. To identify rear-end crashes, the data are first subjected to classification tree based rules to identify traffic regimes. If traffic patterns belong to regime 1, a rear-end crash warning is issued for the location. If the patterns are identified to be regime 2, then they are subjected to the hybrid MLP/NRBF model employing traffic data from five surrounding traffic stations. If the model identifies the patterns as crash prone, then the location may be flagged for a rear-end crash; otherwise, a final check for a regime 2 rear-end crash is applied to the data through the hybrid PNN model. If data from five stations are not available due to intermittent loop failures, the system is provided with the flexibility to switch to models with more tolerant data requirements (i.e., models using traffic data from only one station or three stations). 
To assess the risk of a lane-change related crash, if all three lanes at the immediate upstream station are functioning, the hybrid of the two best individual neural network models (NRBF with three hidden neurons and MLP with four hidden neurons) is applied to the input data. A warning for a lane-change related crash may be issued based on its output. The proposed strategy is demonstrated over a complete day of loop data in a virtual real-time application. It was shown that the system of models may be used to continuously assess and update the risk for rear-end and lane-change related crashes. The system developed in this research should be perceived as the primary component of a proactive traffic management strategy. The output of the system, along with the knowledge of variables critically associated with specific types of crashes identified in this research, can be used to formulate ways of avoiding impending crashes. However, specific crash prevention strategies, e.g., variable speed limits and warnings to commuters, demand separate attention and should be addressed through thorough future research.
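
    The rear-end warning cascade described in the abstract above can be sketched as a small decision function. The classifier scores are passed in as plain numbers, the traffic regime is assumed to have been determined already by the classification tree rules, and the 0.5 threshold is hypothetical. Hybridization by averaging is stated in the abstract for the lane-change models and is assumed here for the rear-end models as well.

```python
def hybrid_score(scores):
    """Hybridize individual model outputs by averaging them."""
    return sum(scores) / len(scores)

def rear_end_warning(regime, mlp_score, nrbf_score, pnn_scores, threshold=0.5):
    """Return True if a rear-end crash warning should be issued.

    regime: 1 (extended congestion) or 2 (relatively free flow).
    mlp_score, nrbf_score: outputs of the individual neural networks in [0, 1].
    pnn_scores: outputs of the individual PNN models in [0, 1].
    threshold: hypothetical cut-off for calling conditions crash prone.
    """
    if regime == 1:
        # Regime 1 locations are flagged without further investigation.
        return True
    if hybrid_score([mlp_score, nrbf_score]) >= threshold:
        return True
    # Final check for a regime 2 rear-end crash through the hybrid PNN model.
    return hybrid_score(pnn_scores) >= threshold

print(rear_end_warning(1, 0.0, 0.0, [0.0]))       # regime 1 always warns
print(rear_end_warning(2, 0.7, 0.5, [0.1]))       # hybrid MLP/NRBF fires
print(rear_end_warning(2, 0.2, 0.3, [0.6, 0.8]))  # PNN final check fires
print(rear_end_warning(2, 0.2, 0.3, [0.1, 0.2]))  # no warning
```

    The fallback to models that need data from only one or three stations is omitted; it would amount to choosing which scores are passed in when loop detectors fail.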

    Mining Heterogeneous Urban Data at Multiple Granularity Layers

    The recent development of urban areas and of the new advanced services supported by digital technologies has generated big challenges for people and city administrators, such as air pollution, high energy consumption, traffic congestion, and the management of public events. Moreover, understanding citizens' perception of the provided services and other relevant topics can help in devising targeted management actions. With the large diffusion of sensing technologies and user devices, the capability to generate data of public interest within the urban area has rapidly grown. For instance, different sensor networks deployed in the urban area allow collecting a variety of data useful to characterize several aspects of the urban environment. The huge amount of data produced by different types of devices and applications brings rich knowledge about the urban context. Mining big urban data can provide decision makers with knowledge useful to tackle the aforementioned challenges for a smart and sustainable administration of urban spaces. However, the high volume and heterogeneity of data increase the complexity of the analysis. Moreover, different sources provide data with different spatial and temporal references. The extraction of significant information from such diverse kinds of data also depends on how they are integrated; hence, alternative data representations and efficient processing technologies are required. The PhD research activity presented in this thesis was aimed at tackling these issues. Indeed, the thesis deals with the analysis of big heterogeneous data in smart city scenarios, by means of new data mining techniques and algorithms, to study the nature of urban related processes. The problem is addressed focusing on both infrastructural and algorithmic layers. In the first layer, the thesis proposes the enhancement of the current leading techniques for the storage and elaboration of Big Data. 
The integration with novel computing platforms is also considered to support parallelization of tasks, tackling the issue of automatic scaling of resources. At the algorithmic layer, the research activity aimed at innovating current data mining algorithms by adapting them to novel Big Data architectures and to Cloud computing environments. Such algorithms have been applied to various classes of urban data, in order to discover hidden but important information to support the optimization of the related processes. This research activity focused on the development of a distributed framework to automatically aggregate heterogeneous data at multiple temporal and spatial granularities and to apply different data mining techniques. Parallel computations are performed according to the MapReduce paradigm, exploiting in-memory computing to reach near-linear computational scalability. By exploring manifold data resolutions in a relatively short time, several additional patterns of data can be discovered, making it possible to further enrich the description of urban processes. The framework is applied to different use cases, where many types of data are used to provide insightful descriptive and predictive analyses. In particular, the PhD activity addressed two main issues in the context of urban data mining: the evaluation of building energy efficiency from different energy-related data, and the characterization of people's perception of and interest in different topics from user-generated content on social networks. For each use case within the considered applications, a specific architectural solution was designed to obtain meaningful and actionable results and to optimize the computational performance and scalability of algorithms, which were extensively validated through experimental tests.
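
    The idea of aggregating readings at several granularities with a map step and a reduce step can be illustrated in a single process. This sketch only mimics the MapReduce pattern and is not the distributed, in-memory framework the thesis describes. The district labels, the readings and the granularity table are hypothetical, and only the temporal granularity is varied, with the spatial unit fixed to a district.

```python
from collections import defaultdict
from datetime import datetime

# Temporal granularities expressed as strftime patterns (an assumption).
GRANULARITIES = {"hour": "%Y-%m-%d %H", "day": "%Y-%m-%d", "month": "%Y-%m"}

def map_reading(reading, granularity):
    """Map step: emit ((district, time bucket), (value, count))."""
    timestamp, district, value = reading
    bucket = timestamp.strftime(GRANULARITIES[granularity])
    return (district, bucket), (value, 1)

def reduce_pairs(pairs):
    """Reduce step: sum values and counts per key, then return the mean."""
    acc = defaultdict(lambda: (0.0, 0))
    for key, (value, count) in pairs:
        total, n = acc[key]
        acc[key] = (total + value, n + count)
    return {key: total / n for key, (total, n) in acc.items()}

def aggregate(readings, granularity):
    return reduce_pairs(map_reading(r, granularity) for r in readings)

# Hypothetical sensor readings: (timestamp, district, measured value).
readings = [
    (datetime(2020, 1, 1, 8, 15), "north", 10.0),
    (datetime(2020, 1, 1, 8, 45), "north", 20.0),
    (datetime(2020, 1, 1, 9, 5), "north", 30.0),
    (datetime(2020, 1, 2, 8, 0), "south", 40.0),
]
hourly = aggregate(readings, "hour")
daily = aggregate(readings, "day")
print(hourly)
print(daily)
```

    Emitting (value, count) pairs keeps the reduce step associative, so partial results from different workers could be merged in any order in a distributed setting.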