End-to-end anomaly detection in stream data
Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into inexplicable black boxes. Contrary to this trend, end users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing users to the most anomalous or malicious areas of a time series and to causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step focuses on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify the observations and post-prune the unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to humans. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity and health.
How machine learning informs ride-hailing services: A survey
In recent years, online ride-hailing services have emerged as an important component of urban transportation systems. They not only make travel considerably easier for residents, but also shape new travel behavior and diversify urban mobility patterns. This study provides a thorough review of machine-learning-based methodologies for on-demand ride-hailing services. The importance of on-demand ride-hailing services in the spatio-temporal dynamics of urban traffic is first highlighted, with machine-learning-based macro-level ride-hailing research demonstrating its value in guiding the design, planning, operation, and control of urban intelligent transportation systems. Then, the research on travel behavior from the perspective of individual mobility patterns, including carpooling behavior and modal choice behavior, is summarized. In addition, existing studies on order matching and vehicle dispatching strategies, which are among the most important components of online ride-hailing systems, are collected and summarized. Finally, some of the critical challenges and opportunities in ride-hailing services are discussed.
Modeling, Predicting and Capturing Human Mobility
Realistic models of human mobility are critical for modern-day applications, specifically in the recommendation systems, resource planning, and process optimization domains. Given the rapid proliferation of mobile devices equipped with Internet connectivity and GPS functionality, aggregating large amounts of individual geolocation data is now feasible. The thesis focuses on methodologies to facilitate data-driven mobility modeling by drawing parallels between the inherent nature of mobility trajectories, statistical physics, and information theory. On the applied side, the thesis contributions lie in leveraging the formulated mobility models to construct prediction workflows from a privacy-by-design perspective. This enables end users to derive utility from location-based services while preserving their location privacy. Finally, the thesis presents several machine learning approaches for generating large-scale synthetic mobility datasets to facilitate experimental reproducibility.
Movement Analytics: Current Status, Application to Manufacturing, and Future Prospects from an AI Perspective
Data-driven decision making is becoming an integral part of manufacturing
companies. Data is collected and commonly used to improve efficiency and
produce high quality items for the customers. IoT-based and other forms of
object tracking are an emerging tool for collecting movement data of
objects/entities (e.g. human workers, moving vehicles, trolleys etc.) over
space and time. Movement data can provide valuable insights like process
bottlenecks, resource utilization, effective working time etc. that can be used
for decision making and improving efficiency.
Turning movement data into valuable information for industrial management and
decision making requires analysis methods. We refer to this process as movement
analytics. The purpose of this document is to review the current state of work
for movement analytics both in manufacturing and more broadly.
We survey relevant work from both a theoretical perspective and an
application perspective. From the theoretical perspective, we put an emphasis
on useful methods from two research areas: machine learning, and logic-based
knowledge representation. We also review their combinations in view of movement
analytics, and we discuss promising areas for future development and
application. Furthermore, we touch on constraint optimization.
From an application perspective, we review applications of these methods to
movement analytics in a general sense and across various industries. We also
describe currently available commercial off-the-shelf products for tracking in
manufacturing, and we give an overview of the main concepts of digital twins
and their applications.
Document-level sentiment analysis of email data
Sisi Liu investigated machine learning methods for email document sentiment analysis. She developed a systematic framework that has been qualitatively and quantitatively shown to be effective and efficient in identifying sentiment from massive amounts of email data. Analytical results obtained from the document-level email sentiment analysis framework support better decision making in various business settings.
Knowledge Discovery and Data Mining for Shared Mobility and Connected and Automated Vehicle Applications
The rapid development of shared mobility and connected and automated vehicles (CAVs) has not only brought new intelligent transportation system (ITS) challenges with the new types of mobility, but also brought a huge opportunity to accelerate the connectivity and informatization of transportation systems, particularly when we consider all the new forms of data that are becoming available. The primary challenge is how to take advantage of the enormous amount of data to discover knowledge, build effective models, and develop impactful applications. With the theoretical and experimental progress made over the last two decades, data mining and machine learning technologies have become key approaches for parsing data, understanding information, and making informed decisions, especially as the rise of deep learning algorithms brings new levels of performance to the analysis of large datasets. The combination of data mining and ITS can greatly benefit research and advances in shared mobility and CAVs. This dissertation focuses on knowledge discovery and data mining for shared mobility and CAV applications. When considering big data associated with shared mobility operations and CAV research, data mining techniques can be customized with transportation knowledge to initially parse the data. Then machine learning methods can be used to model the parsed data to elicit hidden knowledge. Finally, the discovered knowledge and extracted information can help in the development of effective shared mobility and CAV applications to achieve the goal of a safer, faster, and more eco-friendly transportation system. This dissertation addresses four main topics. First, new methodologies are introduced for extracting lane-level road features from rough crowdsourced GPS trajectories via data mining; these features subsequently serve as fundamental information for CAV applications.
The proposed method achieves decimeter-level accuracy, which satisfies the positioning needs of many macroscopic and microscopic shared mobility and CAV applications. Second, macroscopic ride-hailing service big data has been analyzed for demand prediction, vehicle operation, and system efficiency monitoring. The proposed deep learning algorithms increase the ride-hailing demand prediction accuracy to 80% and can help the fleet dispatching system reduce vacant travel distance by 30%. Third, microscopic automated vehicle perception data has been analyzed for a real-time computer vision system that can be used for lane change behavior detection. The proposed deep learning design combines the residual neural network image input with time series control data and reaches 95% lane change behavior prediction accuracy. Last but not least, new ride sharing and CAV applications have been simulated in a behavior modeling framework to analyze their impact on mobility and energy consumption, which addresses key barriers by quantifying the transportation system-wide mobility, energy, and behavior impacts of new mobility technologies using real-world data.