99 research outputs found

    LogEvent2vec : LogEvent-to-vector based anomaly detection for large-scale logs in internet of things

    Funding: This work was funded by the National Natural Science Foundation of China (No. 61802030); the Research Foundation of Education Bureau of Hunan Province, China (No. 19B005); the International Cooperative Project for “Double First-Class”, CSUST (No. 2018IC24); the Open Research Fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education (No. JZNY201905); and the Open Research Fund of the Hunan Provincial Key Laboratory of Network Investigational Technology (No. 2018WLZC003). This work was also funded by the Researchers Supporting Project (No. RSP-2019/102), King Saud University, Riyadh, Saudi Arabia. Acknowledgments: We thank the Researchers Supporting Project (No. RSP-2019/102), King Saud University, Riyadh, Saudi Arabia, for funding this research. We thank Francesco Cauteruccio for proofreading this paper. Peer reviewed.

    Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models

    Machine learning techniques are now integral to the advancement of intelligent urban services, playing a crucial role in elevating the efficiency, sustainability, and livability of urban environments. The recent emergence of foundation models such as ChatGPT marks a revolutionary shift in the fields of machine learning and artificial intelligence. Their unparalleled capabilities in contextual understanding, problem solving, and adaptability across a wide range of tasks suggest that integrating these models into urban domains could have a transformative impact on the development of smart cities. Despite growing interest in Urban Foundation Models (UFMs), this burgeoning field faces challenges such as a lack of clear definitions, systematic reviews, and universalizable solutions. To this end, this paper first introduces the concept of UFM and discusses the unique challenges involved in building them. We then propose a data-centric taxonomy that categorizes current UFM-related works, based on urban data modalities and types. Furthermore, to foster advancement in this field, we present a promising framework aimed at the prospective realization of UFMs, designed to overcome the identified challenges. Additionally, we explore the application landscape of UFMs, detailing their potential impact in various urban contexts. Relevant papers and open-source resources have been collated and are continuously updated at https://github.com/usail-hkust/Awesome-Urban-Foundation-Models

    Geo-Information Harvesting from Social Media Data

    As unconventional sources of geo-information, massive imagery and text messages from open platforms and social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Due to its complementarity to remote sensing data, geo-information from these sources offers promising perspectives, but harvesting is not trivial due to its data characteristics. In this article, we address key aspects in the field, including data availability, analysis-ready data preparation and data management, geo-information extraction from social media text messages and images, and the fusion of social media and remote sensing data. We then showcase some exemplary geographic applications. In addition, we present the first extensive discussion of ethical considerations of social media data in the context of geo-information harvesting and geographic applications. With this effort, we wish to stimulate curiosity and lay the groundwork for researchers who intend to explore social media data for geo-applications. We encourage the community to join forces by sharing their code and data. Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine.

    Understanding the Socio-infrastructure Systems During Disaster from Social Media Data

    Our socio-infrastructure systems are becoming more and more vulnerable due to the increased severity and frequency of extreme events every year. Effective disaster management can minimize the damaging impacts of a disaster to a large extent. The ubiquitous use of social media platforms on GPS-enabled smartphones offers a unique opportunity to observe, model, and predict human behavior during a disaster. This dissertation explores the opportunity of using social media data and different modeling techniques towards understanding and managing disasters more dynamically. In this dissertation, we focus on four objectives. First, we develop a method to infer individual evacuation behaviors (e.g., evacuation decision, timing, destination) from social media data. We develop an input-output hidden Markov model to infer evacuation decisions from user tweets. Our findings show that, using geo-tagged posts and text data, a hidden Markov model can be developed to capture the dynamics of hurricane evacuation decisions. Second, we develop an evacuation demand prediction model using social media and traffic data. We find that a deep learning model trained on social media and traffic data can predict evacuation traffic demand well up to 24 hours ahead. Third, we present a multi-label classification approach to identify the co-occurrence of multiple types of infrastructure disruptions, considering the sentiment towards a disruption: whether a post is reporting an actual disruption (negative), a disruption in general (neutral), or no effect from a disruption (positive). We validate our approach on data collected during multiple hurricanes. Fourth, we develop an agent-based model to understand the influence of multiple information sources on risk perception dynamics and evacuation decisions. 
In this study, we explore the effects of socio-demographic factors and information sources such as social connectivity, neighborhood observation, and weather information and its credibility in forming risk perception dynamics and evacuation decisions.

    Deep Learning from Smart City Data

    Rapid urbanisation brings severe challenges to sustainable development and the living quality of urban residents. Smart cities develop holistic solutions in the field of urban ecosystems using data collected from different types of Internet of Things (IoT) sources. Today, smart city research and applications have surged as a consequence of advances in IoT and machine learning technology. As advanced machine learning methods, deep learning techniques provide an effective framework that facilitates data mining and knowledge discovery tasks, especially in the areas of computer vision and natural language processing. In recent years, researchers from various fields have attempted to apply deep learning technologies to smart city applications in order to establish a new smart city era. Much research effort has been devoted to smart city domains such as intelligent transportation, smart healthcare, and public safety. Meanwhile, many challenges remain, as deep learning techniques are still immature for smart cities. In this thesis, we first provide a review of the latest research on the convergence of deep learning and smart cities for data processing. The review is conducted from two perspectives: the technique-oriented view presents the popular and extended deep learning models, while the application-oriented view focuses on the representative application domains in smart cities. We then focus on two areas, intelligent transportation and social media analysis, to demonstrate how deep learning can be used in real-world applications by addressing some prominent issues, e.g., external knowledge integration, multi-modal knowledge fusion, and semi-supervised or unsupervised learning. In the intelligent transportation area, an attention-based recurrent neural network is proposed to learn from traffic flow readings and external factors for multi-step prediction. 
More specifically, the attention mechanism is used to model the dynamic temporal dependencies of traffic flow data, and a general fusion component is designed to incorporate the external factors. For the traffic event detection task, a multi-modal Generative Adversarial Network (mmGAN) is designed. The proposed model contains a sensor encoder and a social encoder to learn from both traffic flow sensor data and social media data. The mmGAN model is also extended to a semi-supervised architecture by leveraging generative adversarial training to further learn from unlabelled data. In the social media analysis area, three deep neural models are proposed for crisis-related data classification and COVID-19 tweet analysis. We designed an adversarial training method to generate adversarial examples for image and textual social data to improve the robustness of multi-modal learning. As most social media data related to crises or COVID-19 is not labelled, we then proposed two unsupervised text classification models on the basis of the state-of-the-art BERT model. We used the adversarial domain adaptation technique and the zero-shot learning framework to extract knowledge from a large amount of unlabelled social media data. To demonstrate the effectiveness of our proposed solutions for smart city applications, we collected a large amount of real-time, publicly available traffic sensor data from the California Department of Transportation and social media data (i.e., traffic, crisis and COVID-19) from Twitter, and built several datasets for examining prediction or classification performance. The proposed methods successfully addressed the limitations of existing approaches and outperformed the popular baseline methods on these real-world datasets. We hope the work will move the relevant research one step further towards creating true intelligence for smart cities.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.

    Learning Sensory Representations with Minimal Supervision


    Toward enhancement of deep learning techniques using fuzzy logic: a survey

    Deep learning has emerged recently as a type of artificial intelligence (AI) and machine learning (ML); it imitates the way humans gain certain types of knowledge. Deep learning is considered an essential element of data science, which comprises predictive modeling and statistics. Deep learning makes the processes of collecting, interpreting, and analyzing big data easier and faster. Deep neural networks are a kind of ML model in which non-linear processing units are layered to extract particular features from the inputs. However, training such networks is very expensive and depends on the optimization method used, so optimal results may not be obtained. Deep learning techniques are also vulnerable to data noise. For these reasons, fuzzy systems are used to improve the performance of deep learning algorithms, especially in combination with neural networks, and to improve the representation accuracy of deep learning models. This survey paper reviews some of the deep learning based fuzzy logic models and techniques proposed in previous studies, where fuzzy logic is used to improve deep learning performance. The approaches are divided into two categories based on how both of the samples are combined. Furthermore, the models' practicality in the real world is discussed.

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has often helped in the past, with the aim of recovering links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If they are vastly different, the difference between the two sets might indicate a considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to sets of key terms, each containing the concepts of one of the systems sampled. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we have discovered that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and the FastText embeddings offer up to 80% and 90% similarity scores, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy for one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
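The set comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term lists are hypothetical, the function names are assumptions made for illustration, and the cosine measure here works on plain term counts, whereas the paper computes it over pre-trained SpaCy and FastText embeddings.

```python
import math
from collections import Counter


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two term collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


def cosine_similarity(a, b):
    """Cosine similarity of the term-count vectors built from two term lists."""
    counts_a, counts_b = Counter(a), Counter(b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty term list has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical key terms extracted from one system's code and documentation.
code_terms = ["parser", "token", "cache", "request"]
doc_terms = ["parser", "token", "install", "request", "license"]

print(jaccard_index(code_terms, doc_terms))      # 3 shared terms out of 6 distinct
print(cosine_similarity(code_terms, doc_terms))
```

A low score on either measure for a given system would be the kind of signal the abstract associates with documentation drifting away from the code.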
