
    Visualization For Troubleshooting CSV Files


    Interactive, multi-purpose traffic prediction platform using connected vehicles dataset

    Traffic congestion is a perennial issue because of increasing traffic demand and a limited budget for maintaining the current transportation infrastructure, let alone expanding it. Many congestion management techniques require timely and accurate traffic estimation and prediction; examples include incident management, real-time routing, and providing accurate trip information based on historical data. In this dissertation, a speech-powered traffic prediction platform is proposed, which deploys a new deep learning algorithm for traffic prediction using Connected Vehicles (CV) data. To speed up traffic forecasting, a Graph Convolution -- Gated Recurrent Unit (GC-GRU) architecture is proposed and its performance on tabular data is compared to state-of-the-art models. GC-GRU's Mean Absolute Percentage Error (MAPE) was very close to the Transformer's (3.16 vs. 3.12) while achieving the fastest inference time and a six-fold faster training time than the Transformer, although Long Short-Term Memory (LSTM) was the fastest to train. Such improved prediction performance with a shorter inference time and competitive training time allows the proposed architecture to better cater to real-time applications. This is the first study to demonstrate the advantage of a multiscale approach that combines CV data with conventional sources such as Waze and probe data. CV data was better at detecting short-duration, jam, and stand-still incidents and detected them earlier than probe data. CV data excelled at detecting minor incidents, with a 90 percent detection rate versus 20 percent for probes, and detected them 3 minutes faster. To process the big CV data faster, a new algorithm is proposed to extract the spatial and temporal features from the CSV files into a Multiscale Data Analysis (MDA). The algorithm also leverages the Graphics Processing Unit (GPU) using the Nvidia Rapids framework and a Dask parallel cluster in Python. The results show a seventy-fold speedup in the Extract, Transform, Load (ETL) of one full day of CV data for the State of Missouri across all unique CV journeys, reducing the processing time from about 48 hours to 25 minutes. The processed data is then fed into a customized UNet model that learns high-level traffic features from network-level images to predict large-scale, multi-route CV speed and volume. The accuracy and robustness of the proposed model are evaluated across different road types, times of day, and image snippets, and against comparable benchmarks. To visually analyze the historical traffic data and the results of the prediction model, an interactive web application powered by speech queries is built to offer accurate and fast insights into traffic performance and thus allow for better positioning of traffic control strategies. The product of this dissertation can be seamlessly deployed by transportation authorities to understand and manage congestion in a timely manner. Includes bibliographical references
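
    As an illustration of the GPU-accelerated ETL step described above, the following is a minimal sketch using a Dask cluster with the RAPIDS dask_cudf library. The file paths and column names (journey_id, timestamp, speed, link_id) are hypothetical placeholders, not the dissertation's actual schema.

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

if __name__ == "__main__":
    # One Dask worker per available GPU.
    client = Client(LocalCUDACluster())

    # Lazily read the raw CV trajectory CSVs onto the GPUs.
    ddf = dask_cudf.read_csv("cv_data/*.csv", parse_dates=["timestamp"])

    # A simple temporal feature, then a per-journey, per-hour aggregation.
    ddf["hour"] = ddf["timestamp"].dt.hour
    features = (
        ddf.groupby(["journey_id", "hour"])
           .agg({"speed": "mean", "link_id": "count"})
           .reset_index()
    )

    # Materialize on the cluster and persist for the downstream model.
    features.to_parquet("cv_features/")
```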

    Business Analytics Using Predictive Algorithms

    In today's data-driven business landscape, organizations strive to extract actionable insights and make informed decisions from their vast data. Business analytics, which combines data analysis, statistical modeling, and predictive algorithms, is crucial for transforming raw data into meaningful information. However, there are gaps in the field, such as limited industry focus, limited algorithm comparison, and data quality challenges. This work aims to address these gaps by demonstrating how predictive algorithms can be applied across business domains for pattern identification, trend forecasting, and accurate prediction. The report focuses on sales forecasting and topic modeling, comparing the performance of various algorithms including Linear Regression, Random Forest Regression, XGBoost, LSTMs, and ARIMA. It emphasizes the importance of data preprocessing, feature selection, and model evaluation for reliable sales forecasts, while using the unsupervised S-BERT, UMAP, and HDBSCAN algorithms to extract valuable insights from unstructured textual data
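
    The S-BERT, UMAP, and HDBSCAN combination mentioned above is a common unsupervised topic-modeling pipeline; the following minimal sketch shows how the three stages fit together. The model name and parameter values are illustrative assumptions, and the placeholder documents stand in for real business text.

```python
from sentence_transformers import SentenceTransformer
import umap
import hdbscan

# Placeholder corpus; in practice these would be real business documents.
docs = [f"customer feedback note {i}" for i in range(200)]

# 1) Embed documents with a sentence transformer (S-BERT).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(docs, show_progress_bar=False)

# 2) Reduce to a low-dimensional space where density-based clustering works well.
reduced = umap.UMAP(n_neighbors=15, n_components=5, metric="cosine",
                    random_state=42).fit_transform(embeddings)

# 3) Cluster; HDBSCAN labels outliers as -1 instead of forcing them into topics.
topics = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(reduced)
print(set(topics))
```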

    Analysis of Illegal Parking Behavior in Lisbon: Predicting and Analyzing Illegal Parking Incidents in Lisbon's Top 10 Critical Streets

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Illegal parking represents a costly and pervasive problem for most cities, as it not only increases traffic congestion and the emission of air pollutants but also compromises pedestrian, biking, and driving safety. Moreover, it obstructs the flow of emergency vehicles, delivery services, and other essential functions, posing a significant risk to public safety and impeding the efficient operation of urban services. These detrimental effects ultimately diminish the cleanliness, security, and overall attractiveness of cities, affecting the well-being of residents and visitors alike. Traditionally, decision-support systems for addressing illegal parking have relied heavily on costly camera systems and complex video-processing algorithms to detect and monitor infractions in real time. However, such systems are often challenging and expensive to implement, particularly given diverse and dynamic road conditions. Alternatively, research focusing on spatiotemporal features for predicting parking infractions offers a more efficient and cost-effective approach. This project develops a machine learning model to accurately predict illegal parking incidents in the ten most critical streets of Lisbon Municipality, taking into account the hour of day and whether it is a weekend or holiday. A comprehensive evaluation of various machine learning algorithms was conducted, and the k-nearest neighbors (KNN) algorithm emerged as the top-performing model. The KNN model exhibited robust predictive capabilities, effectively estimating the occurrence of illegal parking in the most critical streets. Together with an interactive and user-friendly dashboard, this project contributes valuable insights for urban planners, policymakers, and law enforcement agencies, empowering them to enhance public safety and security through informed decision-making
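
    As a hedged sketch of the kind of KNN model described above, the following uses scikit-learn. The input file and feature columns (street_id, hour, is_weekend_or_holiday, incident) are hypothetical stand-ins for the project's actual spatiotemporal features.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Hypothetical columns: street_id, hour, is_weekend_or_holiday, incident (0/1).
df = pd.read_csv("illegal_parking.csv")
X = df[["street_id", "hour", "is_weekend_or_holiday"]]
y = df["incident"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale features so distances are comparable, then fit KNN.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```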

    Data Profiling in Cloud Migration: Data Quality Measures while Migrating Data from a Data Warehouse to the Google Cloud Platform

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Nowadays, corporations have gained a vast interest in data. More and more, companies have realized that the key to improving their efficiency and effectiveness, and to better understanding their customers' needs and preferences, lies in mining data. However, as the amount of data grows, so must companies' storage capacity and their ability to ensure data quality for more accurate insights. As such, new data storage methods must be considered, evolving from old ones while still preserving data integrity. Migrating a company's data from an old method, like a Data Warehouse, to a new one, the Google Cloud Platform, is an elaborate task, even more so when data quality needs to be assured and sensitive data, like Personally Identifiable Information, needs to be anonymized in a Cloud computing environment. To ensure these points, profiling the data, before or after it is migrated, has significant value: a profile is designed for the data available in each data source (e.g., databases, files, and others) based on statistics, metadata information, and pattern rules. This ensures that data quality is within reasonable standards, measured through statistical metrics, and that all Personally Identifiable Information is identified and anonymized accordingly. This work reflects the process by which profiling Data Warehouse data can improve data quality for a better migration to the Cloud
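
    A minimal sketch of column-level profiling of the sort described above, using pandas: per-column statistics and metadata plus a simple pattern rule for spotting likely PII. The source file, columns, and email regex are illustrative assumptions, not the report's actual implementation.

```python
import pandas as pd

# Hypothetical extract of a warehouse table.
df = pd.read_parquet("warehouse_extract/customers.parquet")

EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

profile = {}
for col in df.columns:
    s = df[col]
    looks_like_email = False
    if s.dtype == object:
        values = s.dropna().astype(str)
        # Pattern rule: flag columns where most values look like email addresses.
        looks_like_email = bool(len(values) > 0 and
                                values.str.match(EMAIL_PATTERN).mean() > 0.5)
    profile[col] = {
        "dtype": str(s.dtype),                   # metadata
        "null_ratio": float(s.isna().mean()),    # completeness statistic
        "distinct_values": int(s.nunique()),     # uniqueness statistic
        "possible_pii_email": looks_like_email,  # candidate for anonymization
    }

print(pd.DataFrame(profile).T)
```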

    Adversarially Reweighted Sequence Anomaly Detection With Limited Log Data

    In the realm of safeguarding digital systems, the ability to detect anomalies in log sequences is paramount, with applications spanning cybersecurity, network surveillance, and financial transaction monitoring. This thesis presents AdvSVDD, a deep learning model designed for sequence anomaly detection. Built upon the foundation of Deep Support Vector Data Description (Deep SVDD), AdvSVDD stands out by incorporating Adversarial Reweighted Learning (ARL) to enhance its performance, particularly when confronted with limited training data. By leveraging the Deep SVDD technique to map normal log sequences into a hypersphere and harnessing the amplification effects of Adversarial Reweighted Learning, AdvSVDD demonstrates remarkable efficacy in anomaly detection. Empirical evaluations on the BlueGene/L (BG/L) and Thunderbird supercomputer datasets showcase AdvSVDD's superiority over conventional machine learning and deep learning approaches, including the foundational Deep SVDD framework. Performance metrics such as Precision, Recall, F1-Score, ROC AUC, and PR AUC attest to its proficiency. Furthermore, the study emphasizes AdvSVDD's effectiveness under constrained training data and offers valuable insights into the role the adversarial component plays in enhancing anomaly detection
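
    For context, the following is a minimal PyTorch sketch of the Deep SVDD objective that AdvSVDD builds on: an encoder maps log-key sequences toward a fixed center, and distance from that center becomes the anomaly score. The adversarial reweighting component is omitted here, and all shapes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=200, emb=32, hidden=64, out=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, out, bias=False)  # bias-free, per Deep SVDD

    def forward(self, seq):             # seq: (batch, seq_len) of log-key ids
        _, h = self.gru(self.emb(seq))  # h: (1, batch, hidden)
        return self.proj(h.squeeze(0))  # (batch, out)

enc = Encoder()
seqs = torch.randint(0, 200, (16, 50))  # toy batch of normal log-key sequences

with torch.no_grad():                   # center c = mean embedding of normal data
    c = enc(seqs).mean(dim=0)

opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for _ in range(5):                      # minimize mean squared distance to c
    dist = ((enc(seqs) - c) ** 2).sum(dim=1)
    loss = dist.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

scores = ((enc(seqs) - c) ** 2).sum(dim=1)  # anomaly score = distance to center
```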

    Performance Analysis of PV Power Plants Across Norway

    This thesis examines hourly aggregated data from 501 photovoltaic (PV) installations, builds a better knowledge foundation about the geographical performance of PV systems in Norway, and provides groundwork for how PV datasets with limited metadata can be analyzed. Metadata is supplemented with inferred tilt and azimuth by analyzing the power-irradiance relationship at different orientations, at 1-degree intervals. When tested on a known PV installation, the method shows a median accuracy of 12.2 and 14.1 degrees for tilt and azimuth, respectively. To analyze the performance of PV installations, the power output data is filtered with a linear filter (RANSAC) and a non-linear polynomial filter. The latter shows promising results as long as specific requirements on the number of available timestamps are met. Unknown capacity units are inferred by selecting highly probable units (Wp, kWp, and MWp) and finding highly probable specific yields. Installations where highly probable specific yields could not be found using these units were removed from further analysis
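
    The linear (RANSAC) filtering step described above can be sketched with scikit-learn as follows: power is regressed on plane-of-array irradiance and only inlier hours are kept. The file and column names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import RANSACRegressor

# Hypothetical hourly data: columns "poa_irradiance" and "power".
df = pd.read_csv("pv_hourly.csv")

X = df[["poa_irradiance"]].to_numpy()
y = df["power"].to_numpy()

# Fit a robust linear model; outliers (shading, snow, outages) are down-weighted.
ransac = RANSACRegressor(random_state=0)
ransac.fit(X, y)

# Keep only hours consistent with a linear power/irradiance relationship.
clean = df[ransac.inlier_mask_]
```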

    Quantitative Risk Analysis using Real-time Data and Change-point Analysis for Data-informed Risk Prediction

    Incidents in highly hazardous process industries (HHPI) are a major concern for various stakeholders due to their impact on human lives and the environment, and the potentially huge financial losses. Because process activities, locations, and products are unique, the risk analysis techniques applied in the HHPI have evolved over the years. Unfortunately, limitations of the various quantitative risk analysis (QRA) methods currently employed mean that alternative or improved methods are required. This research has developed one such method, called the Big Data QRA Method. This method relies entirely on big data techniques and real-time process data to identify the point at which process risk is imminent and to quantify the contribution of other interacting components up to the time index of the risk. Unlike existing QRA methods, which are static and based on unvalidated assumptions and data from single case studies, the big data method is dynamic and can be applied to most process systems. This alternative method is my original contribution to science and the practice of risk analysis. The detailed procedure, provided in Chapter 9 of this thesis, applies multiple change-point analysis and other big data techniques such as (a) time series analysis, (b) data exploration and compression techniques, (c) decision tree modelling, and (d) linear regression modelling. Since the distributional properties of process data can change over time, the big data approach was found to be more appropriate. Considering the unique conditions, activities, and process systems used within the HHPI, the dust fire and explosion incidents at the Imperial Sugar Factory and New England Wood Pellet LLC, both of which occurred in the USA, were found to be suitable case histories to guide the evaluation of data in this research. Data analysis was performed using open source software packages in RStudio. Based on the investigation, the multiple-change-point analysis packages strucchange and changepoint were found to be successful at detecting early signs of deteriorating conditions of components in process equipment, as well as the main process risk. One such process component is a bearing, which was suspected as the source of ignition that led to the dust fire and explosion at the Imperial Sugar Factory. As a result, this research applies the Big Data QRA Method procedure to bearing vibration data to predict early deterioration of bearings and the final phase of deterioration to failure. Model-based identification of these periods indicates whether the condition of a mechanical part in process equipment at a particular moment represents an unacceptable risk. The procedure starts with the selection of process operation data based on the findings of an incident investigation report on the case history of a known process incident. As the defining components of risk, both the frequency and the consequences associated with the risk were obtained from the incident investigation reports. Acceptance criteria for the risk can be applied to the periods between the risks detected by the two change-point packages. The method was validated with two case study datasets to demonstrate its applicability as a QRA procedure, and then tested with two other case study datasets as examples of its application as a QRA method. The insight obtained from the validation and the applied examples led to the conclusion that big data techniques can be applied to real-time process data for risk assessment in the HHPI
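
    The thesis performs its multiple change-point analysis with R's strucchange and changepoint packages; as a Python analogue only, the following sketch applies the PELT algorithm from the ruptures library to a synthetic vibration-like series whose variance increases partway through, mimicking bearing deterioration.

```python
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(0.0, 0.5, 500),   # healthy bearing: low-variance vibration
    rng.normal(0.0, 2.0, 200),   # deteriorating bearing: variance grows
])

# PELT with an RBF cost detects changes in the signal's distribution.
algo = rpt.Pelt(model="rbf").fit(signal)
breakpoints = algo.predict(pen=10)  # indices where the regime changes
print(breakpoints)                  # expected to include the shift near index 500
```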