15 research outputs found

    Data Mining in Internet of Things Systems: A Literature Review

    The Internet of Things (IoT) and cloud technologies have been the main focus of recent research, allowing for the accumulation of a vast amount of data generated from this diverse environment. These data undoubtedly contain priceless knowledge, provided it can be correctly discovered and correlated in an efficient manner. Data mining algorithms can be applied to the IoT to extract hidden information from the massive amounts of data that IoT devices generate, which are thought to have high business value. This paper covers the most important data mining approaches: classification, clustering, association analysis, time series analysis, and outlier analysis. Additionally, a survey of recent work in this direction is included. Another significant challenge in the field is collecting, storing, and managing the large number of devices along with their associated features. The paper takes a deep look at data mining for IoT platforms, concentrating on real applications found in the literature.

    BILROST: Handling Actuators of the Internet of Things through Tweets on Twitter using a Domain-Specific Language

    In recent years, many studies have appeared that combine the Internet of Things and Social Networks. Some of them addressed the interconnection of objects in the way Social Networks interconnect people, and others addressed the connection between objects and people. However, they usually used interfaces created for that purpose instead of interfaces already familiar to users. Why not integrate Smart Objects in traditional Social Networks? Why not control Smart Objects through natural interactions in Social Networks? The goal of this paper is to make it easier to create applications that allow non-expert users to control Smart Object actuators through Social Networks, by proposing a novel approach to connect objects and people using Social Networks. This proposal addresses how to use Twitter so that objects can perform actions based on Twitter users' posts. Moreover, it presents a Domain-Specific Language that can help define the actions that objects perform when people publish specific content on Twitter.

    Semantic Linking for Event-Based Classification of Tweets

    Detecting which tweets are related to events and classifying them into categories is a challenging task due to the peculiarities of Twitter language and the lack of contextual information. We address this challenge by taking advantage of information that can be automatically acquired from external knowledge bases. In particular, we enrich and generalise the textual content of tweets by linking Named Entities (NE) to concepts in both the DBpedia and YAGO ontologies, and exploit their specific or generic types to replace NE mentions in tweets. The approach proposed in this paper is applied to build a supervised classifier that separates event-related from non-event-related tweets and assigns to event-related tweets the event categories defined by the Topic Detection and Tracking (TDT) community. We compare Naive Bayes (NB), Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) classification algorithms, showing that NE linking and replacement improve classification performance and help reduce overfitting, especially with Recurrent Neural Networks (RNN).
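The NE replacement step described in this abstract can be illustrated with a minimal sketch. It assumes the entity linking has already been done: the `ENTITY_TYPES` dictionary, its type labels, and the function name `replace_entities` are hypothetical stand-ins, not the authors' pipeline or real DBpedia/YAGO lookups.

```python
import re

# Hypothetical output of an entity linker: mention -> ontology type.
# The labels are illustrative, not actual DBpedia or YAGO identifiers.
ENTITY_TYPES = {
    "Barack Obama": "Politician",
    "Paris": "City",
    "Real Madrid": "SoccerClub",
}


def replace_entities(tweet, entity_types):
    """Replace each known entity mention in the tweet with its type label."""
    if not entity_types:
        return tweet
    # Try longer mentions first so multi-word names are matched whole.
    mentions = sorted(entity_types, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in mentions) + r")\b"
    )
    # A single pass avoids re-scanning text that was already replaced.
    return pattern.sub(lambda match: entity_types[match.group(1)], tweet)


generalised = replace_entities("Barack Obama lands in Paris tonight", ENTITY_TYPES)
```

Replacing mentions with types lets a classifier see "Politician lands in City" for many different tweets, which is one plausible reading of why the abstract reports less overfitting.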

    Cyber–Physical–Social Frameworks for Urban Big Data Systems: A Survey

    The integration of things’ data on the Web and Web linking for things’ description and discovery is leading the way towards smart Cyber–Physical Systems (CPS). The data generated in CPS represents observations gathered by sensor devices about the ambient environment that can be manipulated by computational processes of the cyber world. Alongside this, the growing use of social networks offers near real-time citizen sensing capabilities as a complementary information source. The resulting Cyber–Physical–Social System (CPSS) can help to understand the real world and provide proactive services to users. The nature of CPSS data brings new requirements and challenges to different stages of data manipulation, including identification of data sources, processing and fusion of different types and scales of data. To gain an understanding of the existing methods and techniques which can be useful for a data-oriented CPSS implementation, this paper presents a survey of the existing research and commercial solutions. We define a conceptual framework for a data-oriented CPSS and detail the various solutions for building human–machine intelligence

    A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams

    Cities have been a thriving place for citizens over the centuries due to their complex infrastructure. The emergence of Cyber-Physical-Social Systems (CPSS) and context-aware technologies has boosted interest in analysing, extracting and eventually understanding city events, which can subsequently be used to leverage citizens' observations of their cities. In this paper, we investigate the feasibility of using Twitter textual streams for extracting city events. We propose a hierarchical multi-view deep learning approach to contextualise citizen observations of various city systems and services. Our goal has been to build a flexible architecture that can learn representations useful for tasks, thus avoiding excessive task-specific feature engineering. We apply our approach to a real-world dataset consisting of event reports and tweets collected over four months in the San Francisco Bay Area, and to additional datasets collected from London. The results of our evaluations show that our proposed solution outperforms the existing models and can be used for extracting city-related events with an average accuracy of 81% over all classes. To further evaluate the impact of our Twitter event extraction model, we used two sources of authorised reports: road traffic disruption data collected from the Transport for London API, and sociocultural events parsed from the Time Out London website. The analysis showed that 49.5% of the Twitter traffic comments are reported approximately five hours prior to the authorities' official records. Moreover, we discovered that, amongst the scheduled sociocultural event topics, tweets reporting transportation, cultural and social events are 31.75% more likely to influence the distribution of the Twitter comments than sport, weather and crime topics.

    Mining and use of linguistic patterns for word sense disambiguation and discourse analysis

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2020. Information extraction from texts on the web has the potential to boost a number of applications, but many of them require the automatic capture of the exact semantics of relevant textual elements. Twitter, for example, generates hundreds of millions of short texts (tweets) daily, many of which contain rich information about users, facts, products, services, desires, opinions, etc. The semantic annotation of relevant words in tweets is a major challenge because tweets impose additional difficulties (e.g., little context information, ungrammaticality) on automatic methods for quality disambiguation, which leads to results with low precision and recall. In addition, language is a polysemic symbolic system without ready-made semantics, which shows markedly in colloquial language and particularly in social media. Current annotation solutions generally fail to find the correct sense of words in constructions involving implicit semantics, which is sometimes introduced intentionally, for example to create humor, irony, wordplay, or puns.
This work proposes an approach to mine lexical-semantic patterns in order to capture the semantics of text for use in language processing tasks. These patterns are called MSC+ patterns because they are defined by sequences of Morpho-semantic Components (MSC). An unsupervised algorithm was developed to mine such patterns, which support the identification of a new kind of semantic feature in documents, as well as methods for disambiguating the sense of words. Experimental results on the Word Sense Disambiguation (WSD) task in social media text show that instances of some MSC+ patterns appear in many tweets, though sometimes with different words conveying the sense. Tests on the WSD experiment results demonstrate that exploiting MSC+ patterns enables effective word sense disambiguation mechanisms, leading to improvements over the state of the art in precision, recall, and F-measure. The MSC+ patterns were also explored in Discourse Analysis (DA) experiments on the content of different works by the Brazilian writer Machado de Assis. The experiments reveal morpho-semantic patterns that highlight characteristics of literary works and can help classify the discourse of the works analyzed, such as the preponderance of specific verbs in short stories, feminine nouns in novels, and adjectives in poems.

    Reducing non-recurrent urban traffic congestion using vehicle re-routing

    With the worldwide trend of urbanization, some of the accompanying problems are becoming serious, including road traffic congestion. To deal with this problem, city planners now resort to the latest information and communications technologies. One example is the adaptive traffic signal control system (e.g. SCATS, SCOOT). To increase the throughput of each main intersection, it dynamically adjusts the traffic light phases according to real-time traffic conditions collected by widely deployed induction loops and sensors. Another typical application is the on-board vehicle navigation system. It can provide drivers with a personalized route according to their preferences (e.g. shortest/fastest/easiest), utilizing comprehensive geo-map data and floating car data. Dynamic traffic assignment is also one of the key proposed methodologies, as it not only benefits the individual driver, but can also provide a route assignment solution for all vehicles with guaranteed minimum average travel time. However, the non-recurrent road traffic congestion problem is still not addressed properly. Unlike recurrent traffic congestion, which is predictable from the daily traffic pattern, congestion caused by unexpected en-route events (e.g. road maintenance, an unplanned parade, car crashes) often propagates to larger areas in a very short time. Consequently, traffic conditions in areas around the event location degrade significantly. Unfortunately, the three aforementioned methods cannot reduce this unexpected congestion in real time. The first contribution of this thesis lies in emphasizing the importance of the dynamic time constraint for vehicle rerouting. Secondly, a framework for evaluating the performance of vehicle route planning algorithms is proposed, along with a case study on a simulated scenario of the city of Cologne.
Thirdly, based on the multi-agent architecture of SCATS, the next road rerouting (NRR) system is introduced. Each agent in NRR can use locally available information to provide the most promising next-road guidance in the face of unexpected urban traffic congestion. As the last contribution of this thesis, the performance of NRR is further improved by providing high-resolution, frequently updated traffic information using vehicular ad hoc networks. Moreover, NRR includes an adaptation mechanism to dynamically determine the algorithmic (i.e. factors in the heuristic routing cost function) and operational (i.e. group of agents which must be enabled) parameters. The simulation results show that in a realistic urban scenario, compared to existing solutions, NRR can significantly reduce the average travel time and improve travel time reliability. The results also indicate that, for both rerouted and non-rerouted vehicles, NRR does not introduce any obvious unfairness in which some vehicles sacrifice much of their own travel time for the global benefit of other vehicles.

    Large-Scale Traffic Flow Prediction Using Deep Learning in the Context of Smart Mobility

    Designing and developing a new generation of cities around the world (termed smart cities) is fast becoming one of the ultimate solutions to cities' problems such as population growth, pollution, energy crises, and demand pressure on existing transportation infrastructure. One of the major aspects of a smart city is smart mobility. Smart mobility aims at improving transportation systems in several aspects: city logistics, info-mobility, and people-mobility. The emergence of the Internet of Car (IoC) phenomenon alongside the development of Intelligent Transportation Systems (ITSs) opens opportunities for improving traffic management systems and assisting travelers and authorities in their decision-making process. However, this has given rise to the generation of huge amounts of data originating from human-device and device-device interaction. This is both an opportunity and a challenge, and smart mobility will not meet its full potential unless valuable insights are extracted from these big data. Although the smart city environment and IoC allow for the generation and exchange of large amounts of data, there are not yet well-defined and mature approaches for mining this wealth of information to benefit drivers and traffic authorities. The main reason is most likely related to fundamental challenges in dealing with big data of various types and uncertain frequency coming from diverse sources. In particular, the types of data and the uncertainty analysis in the predictions are indicated as the most challenging areas of study that have not been tackled yet. Important issues such as the nature of the data, i.e., stationary or non-stationary, and the prediction tasks, i.e., short-term or long-term, should also be taken into consideration. Based on this observation, this thesis proposes a data-driven traffic flow prediction framework within the context of a big data environment.
The main goal of this framework is to enhance the quality of traffic flow predictions, which can be used to assist travelers and traffic authorities in the decision-making process (whether for travel or management purposes). The proposed framework is focused on four main aspects that tackle major data-driven traffic flow prediction problems: the fusion of hard data for traffic flow prediction; the fusion of soft data for traffic flow prediction; prediction of non-stationary traffic flow; and prediction of multi-step traffic flow. All these aspects are investigated and formulated as computational tools, algorithms, and approaches tailored to the nature of the data at hand. The first tool tackles the inherent big data problems and deals with the uncertainty in the prediction. It relies on the ability of deep learning approaches to handle huge amounts of data generated by a large-scale and complex transportation system with limited prior knowledge. Furthermore, motivated by the close correlation between road traffic and weather conditions, a novel deep-learning-based approach that predicts traffic flow by fusing the traffic history and weather data is proposed. The second tool fuses the streams of data (hard data) and event-based data (soft data) using Dempster-Shafer Evidence Theory (DSET). One of the main features of DSET is its ability to capture uncertainties in probabilities. Subsequently, an extension of DSET, namely Dempster's conditional rules for updating belief, is used to fuse traffic prediction beliefs coming from streams of data and event-based data sources. The third tool consists of a method to detect non-stationarities in the traffic flow and an algorithm to perform online adaptations of the traffic prediction model. The proposed detection approach is developed by monitoring the evolution of the spectral contents of the traffic flow.
Furthermore, the approach is specifically developed to work in conjunction with state-of-the-art machine learning methods such as Deep Neural Networks (DNN). By combining the power of frequency domain features with the known generalization capability and scalability of DNNs in handling real-world data, high prediction performance is expected. The last tool is developed to improve multi-step traffic flow prediction in the recursive and multi-output settings. In the recursive setting, an algorithm that augments the information about the current time-step is proposed. This algorithm is called Conditional Data as Demonstrator (C-DaD) and is an extension of an algorithm called Data as Demonstrator (DaD). Furthermore, in the multi-output setting, a novel approach is developed that uses a Conditional Generative Adversarial Network (C-GAN) to generate new history-future pairs of data, which are aggregated with the original training data. To demonstrate the capabilities of the proposed approaches, a series of experiments using artificial and real-world data is conducted. Each of the proposed approaches is compared with state-of-the-art or currently existing approaches.
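To make the evidence-fusion idea in this abstract concrete, here is a minimal sketch of the basic Dempster's rule of combination from Dempster-Shafer theory. It is not the thesis's conditional-update extension, and the hypotheses (`FREE`, `JAM`), the source names, and all mass values are invented for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Mass assigned to contradictory hypotheses counts as conflict.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical beliefs about one road segment (illustrative numbers only).
FREE, JAM = "free", "congested"
sensor = {frozenset({JAM}): 0.6, frozenset({FREE, JAM}): 0.4}
tweets = {
    frozenset({JAM}): 0.5,
    frozenset({FREE}): 0.2,
    frozenset({FREE, JAM}): 0.3,
}
fused = combine(sensor, tweets)
```

Mass placed on the full set `{FREE, JAM}` expresses ignorance rather than a 50/50 split, which is how the theory captures uncertainty about the probabilities themselves.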

    Ontology-driven urban issues identification from social media.

    Cities worldwide face many issues directly related to the urban space, especially in infrastructure. Most of these urban issues affect the lives of both residents and visitors. For example, people can report a car parked on a footpath that is forcing pedestrians to walk on the road, or a huge pothole that is causing traffic congestion. Besides being related to the urban space, urban issues generally demand action from city authorities. There are many Location-Based Social Networks (LBSN) in the smart cities domain worldwide where people report urban issues in a structured way and local authorities become aware of them and then fix them. With the advent of social networks such as Facebook and Twitter, people tend to complain in an unstructured, sparse and unpredictable way, which makes it difficult to identify the urban issues that may be reported. Social media data, especially Twitter messages, photos, and check-ins, have played an important role in smart cities. A key problem is the challenge of identifying specific and relevant conversations when processing noisy crowdsourced data. In this context, this research investigates computational methods to provide automated identification of urban issues shared in social media streams. Most related work relies on classifiers based on machine learning techniques such as Support Vector Machines (SVM), Naïve Bayes and Decision Trees, and faces problems concerning semantic knowledge representation, human readability and inference capability.
Aiming to overcome this semantic gap, this research investigates ontology-driven Information Extraction (IE) from the perspective of urban issues, since such issues can be semantically linked in LBSN platforms. Therefore, this work proposes an Urban Issues Domain Ontology (UIDO) to enable the identification and classification of urban issues in an automated approach that focuses mainly on the thematic and geographical facets. An experimental evaluation demonstrates that the performance of the proposed approach is competitive with the most commonly used machine learning algorithms when applied to this particular domain. CNP