649 research outputs found

    Machine Learning-based Methods for Driver Identification and Behavior Assessment: Applications for CAN and Floating Car Data

    The exponential growth of car-generated data, increased connectivity, and advances in artificial intelligence (AI) enable novel mobility applications. This dissertation focuses on two use-cases of driving data, namely distraction detection and driver identification (ID). Low- and medium-income countries account for 93% of traffic deaths; moreover, a major contributing factor to road crashes is distracted driving. Motivated by this, the first part of this thesis explores the possibility of an easy-to-deploy solution to distracted driving detection. Most of the related work uses sophisticated sensors or cameras, which raises privacy concerns and increases the cost. Therefore, a machine learning (ML) approach is proposed that uses only signals from the CAN-bus and the inertial measurement unit (IMU). It is then evaluated against a hand-annotated dataset of 13 drivers and delivers reasonable accuracy. This approach is limited in detecting short-term distractions but demonstrates that a viable solution is possible. In the second part, the focus is on the effective identification of drivers using their driving behavior. The aim is to address the shortcomings of the state-of-the-art methods. First, a driver ID mechanism based on discriminative classifiers is used to find a set of suitable signals and features. It uses five signals from the CAN-bus, with hand-engineered features, which is an improvement over the current state of the art, which has mainly focused on external sensors. The second approach is based on Gaussian mixture models (GMMs); although it uses only two signals and fewer features, it shows improved accuracy. In this system, the enrollment of a new driver does not require retraining of the models, which was a limitation in the previous approach. To reduce the amount of training data, a triplet network is used to train a deep neural network (DNN) that learns to discriminate drivers. The training of the DNN does not require any driving data from the target set of drivers. The DNN encodes pieces of driving data to an embedding space so that in this space examples of the same driver will appear closer to each other and far from examples of other drivers. This technique reduces the amount of data needed for accurate prediction to under a minute of driving data. These three solutions are validated against a real-world dataset of 57 drivers. Lastly, the possibility of a driver ID system is explored that uses only floating car data (FCD), in particular, GPS data from smartphones. A DNN architecture is then designed that encodes the routes, origin, and destination coordinates as well as various other features computed based on contextual information. The proposed model is then evaluated against a dataset of 678 drivers and shows high accuracy. In a nutshell, this work demonstrates that proper driver ID is achievable. The constraints imposed by the use-case and data availability negatively affect the performance; in such cases, the efficient use of the available data is crucial.
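The embedding idea in the abstract above (same-driver examples close together, other drivers far apart) is commonly trained with a triplet loss. The following is a minimal sketch of that loss only, not the dissertation's implementation: the function names, the margin value, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings (hypothetical): anchor and positive stand for the same driver.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive, negative)  # well separated already
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped: violated
```

With these toy values the first triplet already satisfies the margin and contributes no loss, while the swapped triplet is penalized.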

    Document Filtering for Long-tail Entities

    Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering for popular entities are entity-dependent: they rely on, and are trained on, the differentiating features of each specific entity. Moreover, these approaches tend to use so-called extrinsic information, such as Wikipedia page views and related entities, which is typically available only for popular head entities. Entity-dependent approaches based on such signals are therefore ill-suited as filtering methods for long-tail entities. In this paper we propose a document filtering method for long-tail entities that is entity-independent and thus also generalizes to unseen or rarely seen entities. It is based on intrinsic features, i.e., features that are derived from the documents in which the entities are mentioned. We propose a set of features that capture informativeness, entity-saliency, and timeliness. In particular, we introduce features based on entity aspect similarities, relation patterns, and temporal expressions and combine these with standard features for document filtering. Experiments following the TREC KBA 2014 setup on a publicly available dataset show that our model is able to improve the filtering performance for long-tail entities over several baselines. Results of applying the model to unseen entities are promising, indicating that the model is able to learn the general characteristics of a vital document. The overall performance across all entities (i.e., not just long-tail entities) improves upon the state-of-the-art without depending on any entity-specific training data. Comment: CIKM2016, Proceedings of the 25th ACM International Conference on Information and Knowledge Management. 201

    A review of natural language processing in contact centre automation

    Contact centres have been highly valued by organizations for a long time. However, the COVID-19 pandemic has highlighted their critical importance in ensuring business continuity, economic activity, and quality customer support. The pandemic has led to an increase in customer inquiries related to payment extensions, cancellations, and stock inquiries, each with varying degrees of urgency. To address this challenge, organizations have taken the opportunity to re-evaluate the function of contact centres and explore innovative solutions. Next-generation platforms that incorporate machine learning techniques and natural language processing, such as self-service voice portals and chatbots, are being implemented to enhance customer service. These platforms offer robust features that equip customer agents with the necessary tools to provide exceptional customer support. Through an extensive review of existing literature, this paper aims to uncover research gaps and explore the advantages of transitioning to a contact centre that utilizes natural language solutions as the norm. Additionally, we will examine the major challenges faced by contact centre organizations and offer reco

    Characterizing model uncertainty in ensemble learning


    Multimodal Classification of Urban Micro-Events

    In this paper we seek methods to effectively detect urban micro-events. Urban micro-events are events which occur in cities, have limited geographical coverage and typically affect only a small group of citizens. Because of their scale these are difficult to identify in most data sources. However, by using citizen sensing to gather data, detecting them becomes feasible. The data gathered by citizen sensing is often multimodal and, as a consequence, the information required to detect urban micro-events is distributed over multiple modalities. This makes it essential to have a classifier capable of combining them. In this paper we explore several methods of creating such a classifier, including early, late, and hybrid fusion, as well as representation learning using multimodal graphs. We evaluate performance on a real-world dataset obtained from a live citizen reporting system. We show that a multimodal approach yields higher performance than unimodal alternatives. Furthermore, we demonstrate that our hybrid combination of early and late fusion with multimodal embeddings performs best in classification of urban micro-events.
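To make the distinction between early and late fusion concrete, here is a minimal sketch under stated assumptions. Early fusion is taken to mean concatenating per-modality feature vectors before classification, and late fusion to mean averaging per-modality class probabilities. The function names, modalities, class labels, and numbers are hypothetical and do not come from the paper.

```python
def early_fusion(text_features, image_features):
    """Early fusion: join modality features into one vector for a single classifier."""
    return list(text_features) + list(image_features)

def late_fusion(prob_dists, weights=None):
    """Late fusion: weighted average of class probabilities from per-modality classifiers."""
    n = len(prob_dists)
    weights = weights or [1.0 / n] * n
    classes = prob_dists[0].keys()
    return {c: sum(w * p[c] for w, p in zip(weights, prob_dists)) for c in classes}

# Hypothetical features and classifier outputs for one citizen report.
fused_vec = early_fusion([0.2, 0.7], [1.0, 0.0, 0.5])

text_probs = {"event": 0.6, "no_event": 0.4}
image_probs = {"event": 0.2, "no_event": 0.8}
fused_probs = late_fusion([text_probs, image_probs])
predicted = max(fused_probs, key=fused_probs.get)
```

A hybrid scheme, as named in the abstract, would combine both ideas, for example by feeding an early-fused classifier's output into the late-fusion average; that combination is not shown here.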

    A Study of Text Mining Framework for Automated Classification of Software Requirements in Enterprise Systems

    Text Classification is a rapidly evolving area of Data Mining, while Requirements Engineering is a less-explored area of Software Engineering which deals with the process of defining, documenting and maintaining a software system's requirements. When researchers decided to blend these two streams, the resulting research aimed at automating the classification of software requirements statements into categories easily comprehensible for developers, for faster development and delivery; until now this was mostly done manually by software engineers, which is a tedious job. However, most of the research was focused on classification of non-functional requirements pertaining to intangible features such as security, reliability, quality and so on. It is a challenging task to automatically classify functional requirements, those pertaining to how the system will function, especially those belonging to different and large enterprise systems. This requires exploitation of text mining capabilities. This thesis aims to investigate results of text classification applied to functional software requirements by creating a framework in R and making use of algorithms and techniques like k-nearest neighbors, support vector machine, and many others like boosting, bagging, maximum entropy, neural networks and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had previously been classified manually and was subsequently used for training the model. Key components for training included frequency of terms in the documents and the level of cleanliness of data. The model was applied on test data and validated for analysis, by studying and comparing parameters like precision, recall and accuracy. Dissertation/Thesis Masters Thesis Engineering 201
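The evaluation parameters named in the abstract above (precision, recall and accuracy) can be computed as in the following sketch. It is written in Python rather than the R framework the thesis describes, and the labels and predictions are made-up illustration data, not results from the study.

```python
def binary_metrics(y_true, y_pred, positive):
    """Precision and recall for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy

# Hypothetical gold labels and classifier predictions for five requirements.
y_true = ["functional", "functional", "other", "other", "functional"]
y_pred = ["functional", "other", "other", "functional", "functional"]

precision, recall, accuracy = binary_metrics(y_true, y_pred, positive="functional")
```

For a multi-class setup, the same function can be called once per category and the per-class values averaged.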

    When in doubt ask the crowd : leveraging collective intelligence for improving event detection and machine learning

    [no abstract]