119 research outputs found
A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU
Deep learning (DL) has emerged as a powerful subset of machine learning (ML)
and artificial intelligence (AI), outperforming traditional ML methods,
especially in handling large, unstructured datasets. Its impact spans
various domains, including speech recognition, healthcare, autonomous vehicles,
cybersecurity, predictive analytics, and more. However, the complexity and
dynamic nature of real-world problems present challenges in designing effective
deep learning models. Consequently, several deep learning models have been
developed to address different problems and applications. In this article, we
conduct a comprehensive survey of various deep learning models, including
Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs),
Generative Models, Deep Reinforcement Learning (DRL), and Deep Transfer
Learning. We examine the structure, applications, benefits, and limitations of
each model. Furthermore, we perform an analysis using three publicly available
datasets: IMDB, ARAS, and Fruit-360. We compare the performance of six renowned
deep learning models: CNN, Simple RNN, Long Short-Term Memory (LSTM),
Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional GRU.
Comment: 16 pages, 29 figures
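The comparison above hinges on the gating mechanisms that distinguish GRUs (and LSTMs) from simple RNNs. As a minimal illustration only — not the paper's experimental setup — the update of a single-feature GRU cell can be sketched in plain Python; all weight values are hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h, w):
    """One GRU step for scalar input x and hidden state h; w holds six scalar weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h)               # update gate: how much to overwrite h
    r = sigmoid(w["wr"] * x + w["ur"] * h)               # reset gate: how much history to use
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand                    # interpolate old and candidate state

weights = {"wz": 0.5, "uz": -0.3, "wr": 0.8, "ur": 0.1, "wh": 1.2, "uh": 0.7}
h = 0.0
for x in [0.2, -0.4, 1.0]:  # toy input sequence
    h = gru_cell(x, h, weights)
```

Because the new state is a convex combination of the old state and a tanh-bounded candidate, the hidden state stays in (-1, 1), which is part of why gated cells train more stably than simple RNNs on long sequences.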
A review on deep-learning-based cyberbullying detection
Bullying is described as undesirable behavior by others that harms an individual physically, mentally, or socially. Cyberbullying is a virtual form (e.g., textual or image) of bullying or harassment, also known as online bullying. Cyberbullying detection is a pressing need in today’s world, as the prevalence of cyberbullying is continually growing, resulting in mental health issues. Conventional machine learning models were previously used to identify cyberbullying. However, current research demonstrates that deep learning surpasses traditional machine learning algorithms in identifying cyberbullying for several reasons, including handling extensive data, efficiently classifying text and images, and extracting features automatically through hidden layers, among others. This paper reviews the existing surveys and identifies the gaps in those studies. We also present a deep-learning-based defense ecosystem for cyberbullying detection, including data representation techniques and different deep-learning-based models and frameworks. We have critically analyzed the existing DL-based cyberbullying detection techniques and identified their significant contributions and the future research directions they have presented. We have also summarized the datasets being used, including the DL architecture used and the tasks accomplished for each dataset. Finally, several challenges faced by existing researchers and the open issues to be addressed in the future are presented.
Geographic information extraction from texts
A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.
Towards Mobility Data Science (Vision Paper)
Mobility data captures the locations of moving objects such as humans,
animals, and cars. With the availability of GPS-equipped mobile devices and
other inexpensive location-tracking technologies, mobility data is collected
ubiquitously. In recent years, the use of mobility data has demonstrated
significant impact in various domains including traffic management, urban
planning, and health sciences. In this paper, we present the emerging domain of
mobility data science. Towards a unified approach to mobility data science, we
envision a pipeline having the following components: mobility data collection,
cleaning, analysis, management, and privacy. For each of these components, we
explain how mobility data science differs from general data science, we survey
the current state of the art and describe open challenges for the research
community in the coming years.
Comment: Updated arXiv metadata to include two authors that were missing from the metadata. PDF has not been changed.
LIPIcs, Volume 274, ESA 2023, Complete Volume
Artificial Intelligence in Image-Based Screening, Diagnostics, and Clinical Care of Cardiopulmonary Diseases
Cardiothoracic and pulmonary diseases are a significant cause of mortality and morbidity worldwide. The COVID-19 pandemic has highlighted the lack of access to clinical care, the overburdened medical system, and the potential of artificial intelligence (AI) in improving medicine. There are a variety of diseases affecting the cardiopulmonary system, including lung cancers, heart disease, tuberculosis (TB), etc., in addition to COVID-19-related diseases. Screening, diagnosis, and management of cardiopulmonary diseases have become difficult owing to the limited availability of diagnostic tools and experts, particularly in resource-limited regions. Early screening and accurate diagnosis and staging of these diseases could play a crucial role in treatment and care, and potentially aid in reducing mortality. Radiographic imaging methods such as computed tomography (CT), chest X-rays (CXRs), and echo ultrasound (US) are widely used in screening and diagnosis. Research on using image-based AI and machine learning (ML) methods can help in rapid assessment, serve as surrogates for expert assessment, and reduce variability in human performance. In this Special Issue, “Artificial Intelligence in Image-Based Screening, Diagnostics, and Clinical Care of Cardiopulmonary Diseases”, we have highlighted exemplary primary research studies and literature reviews focusing on novel AI/ML methods and their application in image-based screening, diagnosis, and clinical management of cardiopulmonary diseases. We hope that these articles will help establish the advancements in AI.
Taking Computation to Data: Integrating Privacy-preserving AI techniques and Blockchain Allowing Secure Analysis of Sensitive Data on Premise
PhD thesis in Information Technology.
With the advancement of artificial intelligence (AI), digital pathology has seen significant progress in recent years. However, the use of medical AI raises concerns about patient data privacy. The CLARIFY project is a research project funded under the European Union’s Marie Sklodowska-Curie Actions (MSCA) program. The primary objective of CLARIFY is to create a reliable, automated digital diagnostic platform that utilizes cloud-based data algorithms and artificial intelligence to enable interpretation and diagnosis of whole-slide images (WSI) from any location, maximizing the advantages of AI-based digital pathology.
My research as an early-stage researcher in the CLARIFY project centers on securing information systems using machine learning and access control techniques. To achieve this goal, I extensively researched privacy protection technologies such as federated learning, differential privacy, dataset distillation, and blockchain. These technologies have different priorities in terms of privacy, computational efficiency, and usability. We therefore designed a computing system that supports different levels of privacy security, based on the concept of taking computation to data. Our approach rests on two design principles. First, when external users need to access internal data, a robust access control mechanism must be established to limit unauthorized access. Second, raw data should be processed to ensure privacy and security. Specifically, we use smart contract-based access control and decentralized identity technology at the system security boundary to ensure the flexibility and immutability of verification. Where raw data cannot be accessed directly, we propose using dataset distillation to filter out private information, or a locally trained model as a data agent. Our research focuses on improving the usability of these methods, and this thesis serves as a demonstration of current privacy-preserving and secure computing technologies.
Reinforcement Learning with Human Feedback for Realistic Traffic Simulation
In light of the challenges and costs of real-world testing, autonomous
vehicle developers often rely on testing in simulation for the creation of
reliable systems. A key element of effective simulation is the incorporation of
realistic traffic models that align with human knowledge, an aspect that has
proven challenging due to the need to balance realism and diversity. This work
aims to address this by developing a framework that employs reinforcement
learning with human preference (RLHF) to enhance the realism of existing
traffic models. This study also identifies two main challenges: capturing the
nuances of human preferences on realism and the unification of diverse traffic
simulation models. To tackle these issues, we propose using human feedback for
alignment and employ RLHF due to its sample efficiency. We also introduce the
first dataset for realism alignment in traffic modeling to support such
research. Our framework, named TrafficRLHF, demonstrates its proficiency in
generating realistic traffic scenarios that are well-aligned with human
preferences, as corroborated by comprehensive evaluations on the nuScenes
dataset.
Comment: 9 pages, 4 figures
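The abstract does not spell out the authors' training objective, but the preference-learning step that RLHF relies on is commonly a Bradley-Terry-style reward-model loss. As a generic sketch (all reward values hypothetical, not TrafficRLHF's implementation): the loss shrinks as the reward model ranks the human-preferred rollout above the alternative by a wider margin.

```python
import math

def preference_loss(r_preferred, r_other):
    """Bradley-Terry-style reward-model loss for one human comparison:
    -log sigmoid(r_preferred - r_other)."""
    margin = r_preferred - r_other
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy rewards for two simulated traffic rollouts (hypothetical values).
confident = preference_loss(2.0, 0.0)  # model already ranks the preferred rollout higher
uncertain = preference_loss(0.1, 0.0)  # near-tie: higher loss, stronger gradient signal
```

Minimizing this loss over many labeled comparisons is what makes a small human-feedback dataset sufficient, which is the sample-efficiency argument the abstract makes for RLHF.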
Automated Rhythmic Transformation of Drum Recordings
Within the creative industries, music information retrieval techniques are now being applied in a variety of music creation and production applications. Audio artists incorporate techniques from music informatics and machine learning (e.g., beat and metre detection) for generative content creation and manipulation systems within the music production setting. Here musicians, desiring a certain sound or aesthetic influenced by the style of artists they admire, may change or replace the rhythmic pattern and sound characteristics (i.e., timbre) of drums in their recordings with those from an idealised recording (e.g., in processes of redrumming and mashup creation). Automated transformation systems for rhythm and timbre can be powerful tools for music producers, allowing them to quickly and easily adjust the different elements of a drum recording to fit the overall style of a song. The aim of this thesis is to develop systems for automated transformation of rhythmic patterns of drum recordings using a subset of techniques from deep learning called deep generative models (DGM) for neural audio synthesis. DGMs such as autoencoders and generative adversarial networks have been shown to be effective for transforming musical signals in a variety of genres as well as for learning the underlying structure of datasets for generation of new audio examples. To this end, modular deep learning-based systems are presented in this thesis with evaluations which measure the extent of the rhythmic modifications generated by different modes of transformation, which include audio style transfer, drum translation and latent space manipulation. The evaluation results underscore both the strengths and constraints of DGMs for transformation of rhythmic patterns as well as neural synthesis of drum sounds within a variety of musical genres. New audio style transfer (AST) functions were specifically designed for mashup-oriented drum recording transformation. 
The designed loss objectives lowered the computational demands of the AST algorithm and offered rhythmic transformation capabilities that adhere to the larger rhythmic structure of the input, generating music that is both creative and realistic. To extend the transformation possibilities of DGMs, systems based on adversarial autoencoders (AAE) were proposed for drum translation and continuous rhythmic transformation of bar-length patterns. The evaluations, which investigated the lower-dimensional representations of the latent space of the proposed system based on AAEs with a Gaussian mixture prior (AAE-GM), highlighted the importance of the structure of the disentangled latent distributions of AAE-GM. Furthermore, the proposed system demonstrated improved performance, as evidenced by higher reconstruction metrics, when compared to traditional autoencoder models. This implies that the system can more accurately recreate complex drum sounds, ensuring that the produced rhythmic transformation maintains the richness of the source material. For music producers, this means heightened fidelity in drum synthesis and the potential for more expressive and varied drum tracks, enhancing creativity in music production. This work also enhances neural drum synthesis by introducing a new, diverse dataset of kick, snare, and hi-hat drum samples, along with multiple drum loop datasets for model training and evaluation. Overall, the work in this thesis raised the profile of the field and will hopefully attract more attention and resources to the area, helping to drive future research and development of neural rhythmic transformation systems.
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation
Prompt Tuning is emerging as a scalable and cost-effective method to
fine-tune Pretrained Language Models (PLMs), which are often referred to as
Large Language Models (LLMs). This study benchmarks the performance and
computational efficiency of Prompt Tuning and baselines for multi-label text
classification. This is applied to the challenging task of classifying
companies into an investment firm's proprietary industry taxonomy, supporting
their thematic investment strategy. Text-to-text classification is frequently
reported to outperform task-specific classification heads, but has several
limitations when applied to a multi-label classification problem where each
label consists of multiple tokens: (a) Generated labels may not match any label
in the label taxonomy; (b) The fine-tuning process lacks permutation invariance
and is sensitive to the order of the provided labels; (c) The model provides
binary decisions rather than appropriate confidence scores. Limitation (a) is
addressed by applying constrained decoding using Trie Search, which slightly
improves classification performance. All limitations (a), (b), and (c) are
addressed by replacing the PLM's language head with a classification head,
which is referred to as Prompt Tuned Embedding Classification (PTEC). This
improves performance significantly, while also reducing computational costs
during inference. In our industrial application, the training data is skewed
towards well-known companies. We confirm that the model's performance is
consistent across both well-known and less-known companies. Our overall results
indicate the continuing need to adapt state-of-the-art methods to
domain-specific tasks, even in the era of PLMs with strong generalization
abilities. We release our codebase and a benchmarking dataset at
https://github.com/EQTPartners/PTEC
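Limitation (a) above — generated labels falling outside the taxonomy — is what Trie Search constrained decoding prevents. The abstract does not give the authors' implementation; a minimal sketch of the idea, with a hypothetical taxonomy and stand-in token scores in place of real model logits, looks like this:

```python
def build_trie(label_token_seqs):
    """Build a prefix tree over the token sequences of all valid labels."""
    trie = {}
    for seq in label_token_seqs:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
        node["<end>"] = {}  # marks a complete label
    return trie

def constrained_decode(score_fn, trie):
    """Greedy decoding restricted to token continuations that stay in the trie."""
    node, out = trie, []
    while node:
        best = max(node, key=score_fn)  # only in-trie tokens are candidates
        if best == "<end>":
            break
        out.append(best)
        node = node[best]
    return out

# Hypothetical taxonomy labels, each pre-split into tokens.
labels = [["renewable", "energy"], ["renewable", "materials"], ["fin", "tech"]]
trie = build_trie(labels)

# Stand-in for per-token model scores (a real model would supply logits here).
toy_scores = {"renewable": 2.0, "materials": 1.5, "energy": 1.0}
decoded = constrained_decode(lambda tok: toy_scores.get(tok, 0.0), trie)
```

Every decoded sequence is by construction a prefix of some taxonomy label, so the "no matching label" failure mode cannot occur — though, as the abstract notes, this fixes only limitation (a), not the ordering sensitivity (b) or the missing confidence scores (c).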