1,590 research outputs found

    A MapReduce-based nearest neighbor approach for big-data-driven traffic flow prediction

    In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system comprising two key modules: offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method that combines the current data observed in OPP with the classification results obtained from large-scale historical data in ODT to generate traffic flow predictions in real time. An empirical study on real-world traffic flow big data using leave-one-out cross-validation shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which improves by up to 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
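
The idea of letting correlation guide a nearest neighbor prediction can be illustrated with a small single-machine sketch. This is not the TFPC algorithm from the abstract, which runs on MapReduce and is not specified here; it is a simplified stand-in, and the names `pearson` and `knn_predict`, the window layout, and the weighting rule are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def knn_predict(history, current, k=3):
    """Predict the next flow value from the k most correlated historical windows.

    history: list of (window, next_value) pairs (hypothetical layout).
    current: the most recent observed window, same length as each stored window.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Rank historical windows by correlation with the current window.
    scored = sorted(
        ((pearson(window, current), nxt) for window, nxt in history),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Negative correlations get zero weight (an assumption of this sketch).
    top = [(max(corr, 0.0), nxt) for corr, nxt in scored[:k]]
    total = sum(weight for weight, _ in top)
    if total == 0:
        # No positively correlated neighbor: fall back to a plain average.
        return sum(nxt for _, nxt in top) / len(top)
    return sum(weight * nxt for weight, nxt in top) / total


history = [([1, 2, 3], 4.0), ([3, 2, 1], 0.0), ([2, 4, 6], 8.0)]
prediction = knn_predict(history, [10, 20, 30], k=2)
```

In this toy input the two rising windows correlate perfectly with the current window, so the prediction is the average of their next values. A distributed version would compute the correlation scores in the map phase and select the top k in the reduce phase, but that split is likewise an assumption.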

    On-the-fly tracing for data-centric computing: parallelization, workflow and applications

    As data-centric computing becomes the trend in science and engineering, more and more hardware systems, as well as middleware frameworks, are emerging to handle the intensive computations associated with big data. At the programming level, it is crucial to have corresponding programming paradigms for dealing with big data. Although MapReduce is now a well-known programming model for data-centric computing, in which parallelization is replaced entirely by partitioning the computing task through data, not all programs can be refactored in such a fashion, particularly those using statistical computing and data mining algorithms with interdependence. On the other hand, many traditional automatic parallelization methods put an emphasis on formalism and may not achieve optimal performance with the limited computing resources available. In this work we propose a cross-platform programming paradigm, called on-the-fly data tracing, to provide source-to-source transformation, where the same framework also provides workflow optimization for larger applications. Using a big-data approximation, computations related to large-scale data input are identified in the code and workflow, and a simplified core dependence graph is built based on the computational load, taking big data into account. The code can then be partitioned into sections for efficient parallelization; at the workflow level, optimization can be performed by adjusting the scheduling for big-data considerations, including the I/O performance of the machine. Regarding each unit in both source code and workflow as a model, this framework enables model-based parallel programming that matches the available computing resources. The dissertation presents the techniques used in model-based parallel programming, the design of the software framework for both parallelization and workflow optimization, and its implementations in multiple programming languages.
The following experiments are then performed to validate the framework: (i) benchmarking of parallelization speed-up using typical examples in data analysis and machine learning (e.g., naive Bayes, k-means) and (ii) three real-world applications in data-centric computing that illustrate its efficiency: pattern detection from hurricane and storm surge simulations, road traffic flow prediction, and text mining from social media data. The applications illustrate how to build scalable workflows with the framework, along with performance enhancements.
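
Partitioning code into sections from a dependence graph can be sketched as a level-by-level topological grouping: every unit in a stage depends only on units in earlier stages, so units within one stage could run in parallel. This is a generic illustration, not the dissertation's framework; the function name `parallel_stages` and the task names are hypothetical, and the sketch ignores computational load and I/O cost, which the abstract says the real framework takes into account.

```python
from collections import defaultdict


def parallel_stages(deps):
    """Group tasks into stages whose members have no dependencies on each other.

    deps maps each task name to the set of task names it depends on.
    Returns a list of stages; each stage is a sorted list of task names.
    """
    tasks = set(deps)
    for prerequisites in deps.values():
        tasks |= set(prerequisites)

    # Count unmet prerequisites per task and record who waits on whom.
    unmet = {task: len(set(deps.get(task, ()))) for task in tasks}
    dependents = defaultdict(list)
    for task, prerequisites in deps.items():
        for prerequisite in set(prerequisites):
            dependents[prerequisite].append(task)

    ready = sorted(task for task in tasks if unmet[task] == 0)
    stages, placed = [], 0
    while ready:
        stages.append(ready)
        placed += len(ready)
        released = []
        for task in ready:
            for dependent in dependents[task]:
                unmet[dependent] -= 1
                if unmet[dependent] == 0:
                    released.append(dependent)
        ready = sorted(released)

    if placed != len(tasks):
        raise ValueError("dependence graph contains a cycle")
    return stages


workflow = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "stats": {"clean"},
    "report": {"features", "stats"},
}
stages = parallel_stages(workflow)
```

Here `features` and `stats` land in the same stage because both depend only on `clean`, which makes them candidates for parallel execution.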

    Load Forecasting Based Distribution System Network Reconfiguration-A Distributed Data-Driven Approach

    In this paper, a network reconfiguration approach based on short-term load forecasting is proposed and solved in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide an accurate load prediction and benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in a distributed manner. Considering the limited number of switches and the increasing computation capability, the proposed network reconfiguration is solved in parallel. The numerical results demonstrate the feasibility and effectiveness of the proposed approach. Comment: 5 pages, preprint for Asilomar Conference on Signals, Systems, and Computers 201

    Empirical Formulation of Highway Traffic Flow Prediction Objective Function Based on Network Topology

    Accurate highway road predictions are necessary for timely decision making by the transport authorities. In this paper, we propose a traffic flow objective function for a highway road prediction model. The bi-directional flow function of individual roads is reported considering the net inflows and outflows obtained from a topological breakdown of the highway network. Further, we optimise and compare the proposed objective function, subject to the constraints involved, using a stacked long short-term memory (LSTM) recurrent neural network model with different loss functions and training optimisation strategies. Finally, we report the best-fitting machine learning model parameters for the proposed flow objective function for better prediction accuracy. Peer reviewed
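
The notion of net inflows and outflows over a network topology can be made concrete with a small bookkeeping sketch. It is not the paper's objective function, whose form the abstract does not give; the function name `net_flow`, the junction labels, and the vehicle counts are hypothetical.

```python
def net_flow(links):
    """Return net flow (inflow minus outflow) for each junction.

    links: iterable of (source, destination, vehicles) tuples, one per
    direction of travel, so a two-way road appears as two entries.
    """
    balance = {}
    for source, destination, vehicles in links:
        balance[source] = balance.get(source, 0) - vehicles            # leaving source
        balance[destination] = balance.get(destination, 0) + vehicles  # entering destination
    return balance


# Hypothetical counts: A<->B is a two-way road, B->C is one-way.
links = [("A", "B", 100), ("B", "A", 80), ("B", "C", 50)]
balance = net_flow(links)
```

Because every vehicle leaving one junction enters another, the balances over a closed network sum to zero, which is a handy sanity check on the input counts.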

    Doctor of Philosophy

    The Active Traffic and Demand Management (ATDM) initiative aims to integrate various management strategies and control measures so as to achieve mobility, environment and sustainability goals. To support the active monitoring and management of real-world complex traffic conditions, the first objective of this dissertation is to develop a travel time reliability estimation and prediction methodology that can support informed decisions by management and operation agencies and by travelers. A systematic modeling framework was developed to consider a corridor with multiple bottlenecks, and a series of closed-form formulas was derived to quantify the travel time distribution under both stochastic demand and capacity, with possible on-ramp and off-ramp flow changes. Traffic state estimation techniques are often used to guide operational management decisions, and accurate traffic estimates are critically needed in ATDM applications designed for reducing instability, volatility and emissions in the transportation system. By capturing the essential forward and backward wave propagation characteristics under possible random measurement errors, this dissertation proposes a unified representation with a simple but theoretically sound explanation for traffic observations under free-flow, congested and dynamic transient conditions. This study also presents a linear programming model to quantify the value of traffic measurements in a heterogeneous data environment with fixed sensors, Bluetooth readers and GPS sensors. It is important to design comprehensive traffic control measures that can systematically address deteriorating congestion and environmental issues. To better evaluate and assess the mobility and environmental benefits of transportation improvement plans, this dissertation also discusses a cross-resolution modeling framework for integrating a microscopic emission model with the existing mesoscopic traffic simulation model.
A simplified car-following model-based vehicle trajectory construction method is used to generate the high-resolution vehicle trajectory profiles and resulting emission output. In addition, this dissertation discusses a number of important issues for a cloud computing-based software system implementation. A prototype of a reliability-based traveler information provision and dissemination system is developed to offer a rich set of travel reliability information for the general public and traffic management and planning organizations.

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has great potential for gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems, such as the Earth and its ecosystem, and human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which, however, should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate human rights of individuals. Privacy, like confidentiality, is a valuable good, and damaging it would have serious side effects for society. Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c