55 research outputs found

    Big Data for Traffic Estimation and Prediction: A Survey of Data and Tools

    Full text link
    Big data has been used widely in many areas, including the transportation industry. Using various data sources, traffic states can be estimated well and further predicted to improve overall operational efficiency. In line with this trend, this study presents an up-to-date survey of open data and big data tools used for traffic estimation and prediction. Different data types are categorized and off-the-shelf tools are introduced. To further promote the use of big data for traffic estimation and prediction tasks, challenges and future directions are given for future studies.

    Statistical and Electrical Features Evaluation for Electrical Appliances Energy Disaggregation

    Get PDF
    In this paper we evaluate several well-known and widely used machine learning algorithms for regression in the energy disaggregation task. Specifically, the Non-Intrusive Load Monitoring approach was considered, and the K-Nearest-Neighbours, Support Vector Machines, Deep Neural Networks and Random Forest algorithms were evaluated across five datasets using seven different sets of statistical and electrical features. The experimental results demonstrated the importance of selecting both appropriate features and regression algorithms. Analysis at the device level showed that linear devices can be disaggregated using statistical features, while for non-linear devices the use of electrical features significantly improves the disaggregation accuracy, as non-linear appliances have non-sinusoidal current draw and thus cannot be well parametrized only by their active power consumption. The best performance in terms of energy disaggregation accuracy was achieved by the Random Forest regression algorithm. Peer reviewed. Final published version.
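The regression setup described above can be sketched in miniature. The following is a toy, from-scratch k-nearest-neighbours regressor on hypothetical aggregate-power features and hypothetical appliance readings; it illustrates only the general disaggregation-by-regression idea, not the paper's actual feature sets or models.

```python
import math

def knn_disaggregate(train_X, train_y, query, k=3):
    """Estimate one appliance's power draw from aggregate-level features
    by averaging the targets of the k nearest training examples
    (Euclidean distance)."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = [y for _, y in dists[:k]]
    return sum(nearest) / k

# Hypothetical features (mean power, std of power) -> fridge power in watts
train_X = [(300.0, 20.0), (310.0, 25.0), (800.0, 90.0), (820.0, 95.0)]
train_y = [120.0, 125.0, 118.0, 122.0]

estimate = knn_disaggregate(train_X, train_y, (305.0, 22.0), k=2)
```

With k=2 the query falls between the two low-power examples, so the estimate averages their fridge readings.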

    Reconciling Big Data and Thick Data to Advance the New Urban Science and Smart City Governance

    Get PDF
    Amid growing enthusiasm for "new urban science" and "smart city" approaches to urban management, "big data" is expected to create radical new opportunities for urban research and practice. Meanwhile, anthropologists, sociologists, and human geographers, among others, generate highly contextualized and nuanced data, sometimes referred to as "thick data," that can potentially complement, refine, and calibrate big data analytics while generating new interpretations of the city through diverse forms of reasoning. While researchers in a range of fields have begun to consider such questions, scholars of urban affairs have not yet engaged in these discussions. The article explores how ethnographic research could be reconciled with big data-driven inquiry into urban phenomena. We orient our critical reflections around an illustrative example: road safety in Mexico City. We argue that big and thick data can be reconciled in and through three stages of the research process: research formulation, data collection and analysis, and research output and knowledge representation.

    Deep Learning-Based Maritime Environment Segmentation for Unmanned Surface Vehicles Using Superpixel Algorithms

    Get PDF
    Unmanned surface vehicles (USVs) have been receiving increasing attention in recent years from both academia and industry. To achieve high-level autonomy for USVs, environmental situational awareness is a key capability. However, due to the richness of features in marine environments, as well as the complexity introduced by sun glare and sea fog, the development of a reliable situational awareness system remains a challenging problem that requires further study. This paper therefore proposes a new deep semantic segmentation model, combined with the Simple Linear Iterative Clustering (SLIC) algorithm, for accurate perception of various maritime environments. More specifically, powered by the SLIC algorithm, the new segmentation model can achieve refined results around obstacle edges and improved accuracy for water-surface obstacle segmentation. The overall structure of the new model employs an encoder–decoder layout, and a superpixel refinement is embedded before the final outputs. Three publicly available maritime image datasets are used in this paper to train and validate the segmentation model. The final output demonstrates that the proposed model can provide accurate results for obstacle segmentation.
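SLIC clusters pixels in a joint colour–position space. As a toy, from-scratch sketch of one SLIC-style assignment step on a hypothetical 1-row grayscale "image" (illustrative only; the paper's model embeds this refinement inside a deep encoder–decoder):

```python
def slic_assign(pixels, centers, m=10.0, s=4.0):
    """One SLIC-style assignment step: each pixel (x, y, intensity) joins
    the center minimising a combined colour + spatial distance.  m weighs
    colour against position; s is the expected superpixel spacing."""
    labels = []
    for x, y, c in pixels:
        best = min(
            range(len(centers)),
            key=lambda i: (
                (c - centers[i][2]) ** 2
                + (m / s) ** 2
                * ((x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2)
            ),
        )
        labels.append(best)
    return labels

# Two flat regions: dark pixels on the left, bright pixels on the right
pixels = [(0, 0, 10), (1, 0, 12), (6, 0, 200), (7, 0, 205)]
centers = [(0.5, 0.0, 11.0), (6.5, 0.0, 202.0)]
labels = slic_assign(pixels, centers)
```

The dark pixels are assigned to the dark center and the bright pixels to the bright one; full SLIC alternates this assignment with center updates.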

    Annual Report, 2017-2018

    Get PDF

    A text segmentation approach for automated annotation of online customer reviews, based on topic modeling

    Full text link
    Online customer review classification and analysis have been recognized as an important problem in many domains, such as business intelligence, marketing, and e-governance. To solve this problem, a variety of machine learning methods have been developed in the past decade. Existing methods, however, either rely on human labeling or have a high computing cost, or both. This makes them a poor fit for dynamic and ever-growing collections of short but semantically noisy customer review texts. In the present study, the problem of multi-topic online review clustering is addressed by generating high-quality bronze-standard labeled sets for training efficient classifier models. A novel unsupervised algorithm is developed to break reviews into sequential, semantically homogeneous segments. Segment data is then used to fine-tune a Latent Dirichlet Allocation (LDA) model obtained for the reviews, and to classify them along categories detected through topic modeling. After testing the segmentation algorithm on a benchmark text collection, it was successfully applied in a case study of tourism review classification. In all experiments conducted, the proposed approach produced results similar to or better than baseline methods. The paper critically discusses the main findings and paves the way for future work.
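The segmentation step can be illustrated with a simplified stand-in: place a boundary wherever the bag-of-words cosine similarity between adjacent sentences drops below a threshold. The sample reviews and the threshold below are hypothetical, and this is not the paper's actual algorithm.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(sentences, threshold=0.1):
    """Break a review into runs of semantically similar sentences:
    start a new segment when adjacent similarity falls below threshold."""
    bags = [Counter(s.lower().split()) for s in sentences]
    segments, current = [], [sentences[0]]
    for prev, cur, sent in zip(bags, bags[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            segments.append(current)
            current = []
        current.append(sent)
    segments.append(current)
    return segments

reviews = [
    "the room was clean and the room service was great",
    "clean room and friendly service overall",
    "breakfast tasted bland, coffee lukewarm",
]
parts = segment(reviews)
```

The first two sentences share vocabulary and stay in one segment; the third shares none and opens a new segment, which could then feed an LDA-style topic model.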

    Probabilistic Value Selection for Space Efficient Model

    Get PDF
    An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates the values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on information-theoretic metrics are proposed: PVS and P+VS. Extensive experiments on benchmark datasets of various sizes are presented, and the results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experimental results show that value selection can achieve a balance between accuracy and model size reduction. Comment: Accepted at the 21st IEEE International Conference on Mobile Data Management (July 2020).
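The idea of eliminating individual values rather than whole features can be sketched with an entropy criterion: keep a feature value only if the class labels observed with it are nearly pure. The data, the threshold, and the scoring rule here are all hypothetical simplifications, not the paper's PVS/P+VS methods.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def select_values(feature_col, labels, max_entropy=0.5):
    """Keep only the feature values whose conditional class entropy is
    low, i.e. values that strongly predict the label; drop the rest.
    A toy stand-in for probabilistic value selection."""
    by_value = defaultdict(list)
    for v, y in zip(feature_col, labels):
        by_value[v].append(y)
    return {v for v, ys in by_value.items() if entropy(ys) <= max_entropy}

# Hypothetical categorical feature and class labels
colour = ["red", "red", "blue", "blue", "green", "green"]
label  = ["spam", "spam", "ham", "ham", "spam", "ham"]
kept = select_values(colour, label)
```

"red" and "blue" each map to a single class (entropy 0) and are kept, while "green" splits evenly (entropy 1) and is dropped, shrinking the value set the model must store.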

    Network Threat Detection Using Machine/Deep Learning in SDN-Based Platforms: A Comprehensive Analysis of State-of-the-Art Solutions, Discussion, Challenges, and Future Research Direction

    Get PDF
    A revolution in network technology has been ushered in by software-defined networking (SDN), which makes it possible to control the network from a central location and provides an overview of the network’s security. Despite this, SDN has a single point of failure that increases the risk of potential threats. Network intrusion detection systems (NIDS) prevent intrusions into a network and preserve the network’s integrity, availability, and confidentiality. Much work has been done on NIDS, but improvements are still needed in reducing false alarms and increasing threat detection accuracy. Recently, advanced approaches such as deep learning (DL) and machine learning (ML) have been implemented in SDN-based NIDS to overcome security issues within a network. In the first part of this survey paper, we offer an introduction to NIDS theory, as well as recent research that has been conducted on the topic. After that, we conduct a thorough analysis of the most recent ML- and DL-based NIDS approaches to ensure reliable identification of potential security risks. Finally, we focus on the opportunities and difficulties that lie ahead for future research on SDN-based ML and DL for NIDS. Published version.

    Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation

    Full text link
    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática, on 08-07-2021. Since the emergence of the Internet and the spread of digital communications throughout the world, the amount of data stored on the Web has been growing exponentially. In this new digital era, a large number of companies have emerged with the purpose of filtering the information available on the web and providing users with interesting items. The algorithms and models used to recommend these items are called Recommender Systems. These systems are applied to a large number of domains, from music, books, or movies to dating or Point-of-Interest (POI) recommendation, an increasingly popular domain where users receive recommendations of different places when they arrive in a city. In this thesis, we focus on exploiting contextual information, especially temporal and sequential data, and applying it in novel ways in both traditional and Point-of-Interest recommendation. We believe that this type of information can be used not only for creating new recommendation models but also for developing new metrics for analyzing the quality of these recommendations. In one of our first contributions we propose different metrics, some of them derived from previously existing frameworks, using this contextual information. Besides, we also propose an intuitive algorithm that is able to provide recommendations to a target user by exploiting the last common interactions with other similar users of the system. At the same time, we conduct a comprehensive review of the algorithms that have been proposed in the area of POI recommendation between 2011 and 2019, identifying the common characteristics and methodologies used. Once this classification of the algorithms proposed to date is completed, we design a mechanism to recommend complete routes (not only independent POIs) to users, making use of reranking techniques. In addition, due to the great difficulty of making recommendations in the POI domain, we propose the use of data aggregation techniques to use information from different cities to generate POI recommendations in a given target city. In the experimental work we present our approaches on different datasets belonging to both classical and POI recommendation. The results obtained in these experiments confirm the usefulness of our recommendation proposals, in terms of ranking accuracy and other dimensions like novelty, diversity, and coverage, and the appropriateness of our metrics for analyzing temporal information and biases in the recommendations produced.
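The idea of recommending via the last common interactions with similar users can be sketched as follows. The user histories, similarity measure, and scoring rule below are all hypothetical illustrations, not the thesis's actual model.

```python
def recommend(histories, target, k=2):
    """Recommend items that other users consumed *after* their last
    interaction in common with the target user, weighting each
    candidate by the number of shared items."""
    seen = set(histories[target])
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        # indices of this user's items also seen by the target
        common = [i for i, it in enumerate(items) if it in seen]
        if not common:
            continue
        overlap = len(common)  # crude similarity: number of shared items
        for it in items[common[-1] + 1:]:  # items after the last shared one
            if it not in seen:
                scores[it] = scores.get(it, 0) + overlap
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [it for it, _ in ranked][:k]

# Hypothetical POI visit histories, in chronological order
histories = {
    "alice": ["museum", "park", "cafe"],
    "bob":   ["museum", "park", "cafe", "gallery", "theatre"],
    "carol": ["park", "cafe", "ramen_bar"],
}
recs = recommend(histories, "alice")
```

Bob shares Alice's full history, so the places he visited next score highest; Carol's later visit contributes with a lower weight.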

    AI approaches to understand human deceptions, perceptions, and perspectives in social media

    Get PDF
    Social media platforms have created virtual spaces for sharing user-generated information and for connecting and interacting among users. However, there are research and societal challenges: 1) users generate and share disinformation; 2) it is difficult to understand citizens' perceptions or opinions expressed on a wide variety of topics; and 3) there are information overload and echo-chamber problems, without an overall understanding of the different perspectives taken by different people or groups. This dissertation addresses these three research challenges with advanced AI and machine learning approaches. To address fake news, as deceptions about facts, this dissertation presents machine learning approaches for fake news detection models and a hybrid method for identifying topics, whether fake or real. To understand users' perceptions or attitudes toward certain topics, this study analyzes the sentiments expressed in social media text. The sentiment of posts can be used as an indicator of how topics are perceived by users and of how their perceptions as a whole can affect decision makers in government and industry, especially during the COVID-19 pandemic. It is difficult to measure public perception of government policies issued during the pandemic: citizen responses to these policies are diverse, ranging from security or goodwill to confusion, fear, or anger. This dissertation provides a near real-time approach to track and monitor public reactions toward government policies by continuously collecting and analyzing Twitter posts about the COVID-19 pandemic. To address social media's overwhelming number of posts, content echo chambers, and information isolation, this dissertation provides a multiple-view-based summarization framework in which the same content can be summarized according to different perspectives. 
    This framework includes components for choosing the perspectives as well as advanced text summarization approaches. The proposed approaches are demonstrated with a prototype system that continuously collects Twitter data about COVID-19 government health policies and provides analysis of citizen concerns toward the policies; the data is also analyzed for fake news detection and for generating multiple-view summaries.
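Tracking public reaction from a stream of posts, as described above, ultimately reduces to scoring each post's polarity. A crude lexicon-based sketch follows; the mini-lexicon and sample posts are hypothetical, and the dissertation's actual sentiment models are far richer.

```python
def sentiment_score(post, lexicon):
    """Crude lexicon-based polarity: +1 per positive word, -1 per
    negative word, averaged over matched words; 0.0 if no word matches.
    A toy stand-in for trained sentiment models."""
    hits = [lexicon[w] for w in post.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

# Hypothetical mini-lexicon and posts about a pandemic-era policy
lexicon = {"good": 1, "safe": 1, "confusing": -1, "angry": -1}
posts = [
    "the new policy feels safe and good",
    "rules are confusing and people are angry",
]
scores = [sentiment_score(p, lexicon) for p in posts]
```

Averaging such scores over time windows gives a simple near real-time indicator of how reactions to a policy shift.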