299 research outputs found
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
In today's digital age, Convolutional Neural Networks (CNNs), a subset of
Deep Learning (DL), are widely used for various computer vision tasks such as
image classification, object detection, and image segmentation. There are
numerous types of CNNs designed to meet specific needs and requirements,
including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention,
depthwise convolutions, and NAS, among others. Each type of CNN has its unique
structure and characteristics, making it suitable for specific tasks. It's
crucial to gain a thorough understanding and perform a comparative analysis of
these different CNN types to understand their strengths and weaknesses.
Furthermore, studying the performance, limitations, and practical applications
of each type of CNN can aid in the development of new and improved
architectures in the future. We also dive into the platforms and frameworks
that researchers utilize for their research or development from various
perspectives. Additionally, we explore the main research fields of CNN like 6D
vision, generative models, and meta-learning. This survey paper provides a
comprehensive examination and comparison of various CNN architectures,
highlighting their architectural differences and emphasizing their respective
advantages, disadvantages, applications, challenges, and future trends.
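The convolution variants named in this abstract differ mainly in how many weights they use and which input positions each filter tap reads. The sketch below is not taken from the survey; it is a minimal illustration using the standard definitions of grouped, depthwise separable, and dilated convolutions. The function names are hypothetical and bias terms are ignored.

```python
def conv2d_params(c_in, c_out, k, groups=1):
    """Weight count (bias ignored) of a k x k 2D convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv2d_params(c_in, c_in, k, groups=c_in)  # one filter per channel
    pointwise = conv2d_params(c_in, c_out, 1)              # mixes channels
    return depthwise + pointwise


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D cross-correlation whose taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


if __name__ == "__main__":
    print(conv2d_params(64, 128, 3))               # standard convolution
    print(conv2d_params(64, 128, 3, groups=4))     # grouped convolution
    print(depthwise_separable_params(64, 128, 3))  # depthwise separable
    print(dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 0, -1], dilation=2))
```

With 64 input channels, 128 output channels, and a 3 x 3 kernel, grouping by 4 divides the weight count by 4, and the depthwise separable form is smaller still; dilation widens the receptive field without adding weights.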
User Modeling and User Profiling: A Comprehensive Survey
The integration of artificial intelligence (AI) into daily life, particularly
through information retrieval and recommender systems, has necessitated
advanced user modeling and profiling techniques to deliver personalized
experiences. These techniques aim to construct accurate user representations
based on the rich amounts of data generated through interactions with these
systems. This paper presents a comprehensive survey of the current state,
evolution, and future directions of user modeling and profiling research. We
provide a historical overview, tracing the development from early stereotype
models to the latest deep learning techniques, and propose a novel taxonomy
that encompasses all active topics in this research area, including recent
trends. Our survey highlights the paradigm shifts towards more sophisticated
user profiling methods, emphasizing implicit data collection, multi-behavior
modeling, and the integration of graph data structures. We also address the
critical need for privacy-preserving techniques and the push towards
explainability and fairness in user modeling approaches. By examining the
definitions of core terminology, we aim to clarify ambiguities and foster a
clearer understanding of the field by proposing two novel encyclopedic
definitions of the main terms. Furthermore, we explore the application of user
modeling in various domains, such as fake news detection, cybersecurity, and
personalized education. This survey serves as a comprehensive resource for
researchers and practitioners, offering insights into the evolution of user
modeling and profiling and guiding the development of more personalized,
ethical, and effective AI systems.
Comment: 71 pages
A Review of Text Corpus-Based Tourism Big Data Mining
With the massive growth of the Internet, text data has become one of the main formats of tourism big data. Because such text is an effective means of expressing tourists’ opinions, mining it has great potential to inspire innovations for tourism practitioners. In the past decade, a variety of text mining techniques have been proposed and applied to tourism analysis to develop tourism value analysis models, build tourism recommendation systems, create tourist profiles, and make policies for supervising tourism markets. The successes of these techniques have been further boosted by the progress of natural language processing (NLP), machine learning, and deep learning. Recognizing the complexity that arises from this diverse set of techniques and tourism text data sources, this work attempts to provide a detailed and up-to-date review of text mining techniques that have been, or have the potential to be, applied to modern tourism big data analysis. We summarize and discuss different text representation strategies, text-based NLP techniques for topic extraction, text classification, sentiment analysis, and text clustering in the context of tourism text mining, and their applications in tourist profiling, destination image analysis, market demand, etc. Our work also provides guidelines for constructing new tourism big data applications and outlines promising research areas in this field for the coming years.
When Urban Region Profiling Meets Large Language Models
Urban region profiling from web-sourced data is of utmost importance for
urban planning and sustainable development. We are witnessing a rising trend of
LLMs for various fields, especially dealing with multi-modal data research such
as vision-language learning, where the text modality serves as supplementary
information for the image. Since the textual modality has never been introduced
into modality combinations in urban region profiling, we aim to answer two
fundamental questions in this paper: i) Can the textual modality enhance urban
region profiling? ii) If so, in what ways and with regard to which aspects?
To answer the questions, we leverage the power of Large Language Models (LLMs)
and introduce the first-ever LLM-enhanced framework that integrates the
knowledge of textual modality into urban imagery profiling, named LLM-enhanced
Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP).
Specifically, it first generates a detailed textual description for each
satellite image by an open-source Image-to-Text LLM. Then, the model is trained
on the image-text pairs, seamlessly unifying natural language supervision for
urban visual representation learning, jointly with contrastive loss and
language modeling loss. Results on predicting three urban indicators in four
major Chinese metropolises demonstrate its superior performance, with an
average improvement of 6.1% on R^2 compared to the state-of-the-art methods.
Our code and the image-language dataset will be released upon paper
notification.
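The abstract says the model is trained on image-text pairs with a contrastive loss, but does not give its exact form. The sketch below assumes the symmetric CLIP-style (InfoNCE) formulation, in which each image should score highest against its own description and vice versa. The function names and the temperature of 0.07 are assumptions for illustration, and the language modeling loss mentioned in the abstract is omitted.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy over a batch.

    Pair i of image_embs and text_embs is assumed to be a matching pair.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[r][c]: scaled similarity between image r and text c.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    image_to_text = sum(_cross_entropy(logits[r], r) for r in range(n)) / n
    text_to_image = sum(
        _cross_entropy([logits[r][c] for r in range(n)], c) for c in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # aligned
    print(symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # swapped
```

Aligned pairs give a loss near zero, while swapping the descriptions gives a large loss, which is the signal that pulls matching image and text embeddings together during training.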
Image Geolocalization and Its Application to Media Forensics
Image geo-localization is an important research problem. In recent years, the IARPA Finder program has gathered many researchers to develop technology that addresses the geo-localization task. One particularly effective approach is to use large-scale ground-level and/or overhead imagery with image matching techniques for image geo-localization. In this dissertation, we focus on two different aspects of geo-localization. First, we focus on indoor images and use geo-localization to recognize different business venues. Second, we address the vulnerability of such a computer vision system and apply geo-localization to solve media forensics problems such as content manipulation and metadata manipulation.
With the prevalence of social media platforms, media shared on the Internet can reach millions of people in a short time. The sheer amount of media available on the Internet enables many different computer vision applications. At the same time, however, people can easily share tampered media with malicious goals, such as creating panic or distorting public opinion, with little effort.
We first propose an image localization framework for extracting fine-grained location information (i.e. business venues) from images. Our framework utilizes the information available from social media websites such as Instagram and Yelp to extract a set of location-related concepts. Using these concepts with a multi-modal recognition model, we were able to extract location information based on the image content.
Secondly, to make the system robust, we address the metadata tampering detection problem: detecting discrepancies between an image and its associated metadata, such as GPS and timestamp. We propose a multi-task learning model that verifies authenticity by detecting the discrepancy between image content and its metadata. Our model first detects meteorological properties such as weather condition, sun angle, and temperature from the image content and then compares them with the information from an online weather database. To facilitate training and evaluating our model, we create a large-scale outdoor dataset labeled with meteorological properties.
Thirdly, we address the event verification problem by designing a convolutional neural network configuration specifically targeted at image localization. The proposed networks use a bilinear pooling layer and an attention module to extract detailed location information from the image content.
Fourth, we present a generative model that produces realistic image composites using adversarial learning, which can be used to further improve the image tampering detection model. Finally, we propose an object-based provenance approach to address the content manipulation problem in media forensics.
Graph representation learning for security analytics in decentralized software systems and social networks
With the rapid advancement in digital transformation, various daily interactions, transactions, and operations typically depend on extensive network-structured systems. The inherent complexity of these platforms has become a critical challenge in ensuring their security and robustness, with impacts spanning individual users to large-scale organizations. Graph representation learning has emerged as a potential methodology to address various security analytics within these complex systems, especially in software code and social network analysis, and its applications in criminology. For software code, graph representations can capture the information of control-flow graphs and call graphs, which can be leveraged to detect vulnerabilities and improve software reliability. In the case of social network analysis in criminal investigation, graph representations can capture the social connections and interactions between individuals, which can be used to identify key players, detect illegal activities, and predict new/unobserved criminal cases.
In this thesis, we focus on two critical security topics using graph learning-based approaches: (1) addressing criminal investigation issues and (2) detecting vulnerabilities of Ethereum blockchain smart contracts. First, we propose the SoChainDB database, which facilitates obtaining data from blockchain-based social networks and conducting extensive analyses to understand Hive blockchain social data. Moreover, to apply social network analysis in criminal investigation, two graph-based machine learning frameworks are presented to address investigation issues in a burglary use case, one being transductive link prediction and the other being inductive link prediction. Then, we propose MANDO, an approach that utilizes a new heterogeneous graph representation of control-flow graphs and call graphs to learn the structures of heterogeneous contract graphs. Building upon MANDO, two deep graph learning-based frameworks, MANDO-GURU and MANDO-HGT, are proposed for accurate vulnerability detection at both the coarse-grained contract and fine-grained line levels. Empirical results show that MANDO frameworks significantly improve the detection accuracy of other state-of-the-art techniques for various vulnerability types in either source code or bytecode.
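Link prediction, which this abstract applies to criminal investigation, asks which currently unconnected pairs of nodes are most likely to be linked. The thesis uses learned graph representations; the sketch below swaps in a much simpler classical baseline, the Jaccard coefficient of two nodes' neighborhoods, purely to illustrate the task. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node of an undirected graph to the set of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def jaccard_link_scores(edges):
    """Rank unconnected node pairs by neighborhood overlap (Jaccard coefficient).

    Returns a list of ((u, v), score) sorted by descending score, then by pair.
    """
    adjacency = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        union = adjacency[u] | adjacency[v]
        shared = adjacency[u] & adjacency[v]
        scores[(u, v)] = len(shared) / len(union)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
                 ("b", "d"), ("c", "d"), ("d", "e")]
    for pair, score in jaccard_link_scores(toy_edges):
        print(pair, round(score, 3))
```

On the toy graph, the pair ("a", "d") ranks first because the two nodes share both of a's neighbors. This heuristic scores only nodes already present in the graph, so it corresponds loosely to the transductive setting rather than the inductive one.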