Search CORE

446 research outputs found

Grounding deep models of visual data

Author: Bargal Sarah
Publication venue
Publication date: 21/02/2019
Field of study

Deep models are state-of-the-art for many computer vision tasks including object classification, action recognition, and captioning. As Artificial Intelligence systems that utilize deep models are becoming ubiquitous, it is also becoming crucial to explain why they make certain decisions: Grounding model decisions. In this thesis, we study: 1) Improving Model Classification. We show that by utilizing web action images along with videos in training for action recognition, significant performance boosts of convolutional models can be achieved. Without explicit grounding, labeled web action images tend to contain discriminative action poses, which highlight discriminative portions of a video’s temporal progression. 2) Spatial Grounding. We visualize spatial evidence of deep model predictions using a discriminative top-down attention mechanism, called Excitation Backprop. We show how such visualizations are equally informative for correct and incorrect model predictions, and highlight the shift of focus when different training strategies are adopted. 3) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction. This approach penalizes neurons that are most relevant for model prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization. We demonstrate better generalization ability, an increased utilization of network neurons, and a higher resilience to network compression. 4) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions at test time. Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to ensure evidenceprediction consistency, otherwise refines the prediction. We demonstrate accuracy gains for fine-grained classification. 5) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network’s classification/captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond with a specific action, or phrase from a caption, without explicitly optimizing/training for these tasks

Boston University Institutional Repository (OpenBU)

Deep Learning in Cardiology

Author: Bizopoulos Paschalis
Koutsouris Dimitrios
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/02/2021
Field of study

The medical field is creating large amount of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient in solving complicated medical tasks or for creating insights using big data. Deep learning has emerged as a more accurate and effective technology in a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus, revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply in medicine in general, while proposing certain directions as the most viable for clinical use.Comment: 27 pages, 2 figures, 10 table

arXiv.org e-Print Archive

Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting

Author: Aljohani Abdulah Jeza
Ghafoor Salman
Khan Haibat
Khan Saad Zafar
Muzammil Nazeefa
Zaidi Syed Mohammad Hassan
Publication venue
Publication date: 25/10/2023
Field of study

Accurately forecasting solar power generation is crucial in the global progression towards sustainable energy systems. In this study, we conduct a meticulous comparison between Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting. Our controlled experiments reveal promising advantages of QLSTMs, including accelerated training convergence and substantially reduced test loss within the initial epoch compared to classical LSTMs. These empirical findings demonstrate QLSTM's potential to swiftly assimilate complex time series relationships, enabled by quantum phenomena like superposition. However, realizing QLSTM's full capabilities necessitates further research into model validation across diverse conditions, systematic hyperparameter optimization, hardware noise resilience, and applications to correlated renewable forecasting problems. With continued progress, quantum machine learning can offer a paradigm shift in renewable energy time series prediction. This pioneering work provides initial evidence substantiating quantum advantages over classical LSTM, while acknowledging present limitations. Through rigorous benchmarking grounded in real-world data, our study elucidates a promising trajectory for quantum learning in renewable forecasting. Additional research and development can further actualize this potential to achieve unprecedented accuracy and reliability in predicting solar power generation worldwide.Comment: 17 pages, 8 figure

arXiv.org e-Print Archive

Enhancing Face Recognition with Deep Learning Architectures: A Comprehensive Review

Author: Bhaidasna Hetal Z.
Bhaidasna Zubin C.
Swaminarayan Priya R.
Publication venue: Auricle Global Society of Education and Research
Publication date: 27/10/2023
Field of study

The progression of information discernment via facial identification and the emergence of innovative frameworks has exhibited remarkable strides in recent years. This phenomenon has been particularly pronounced within the realm of verifying individual credentials, a practice prominently harnessed by law enforcement agencies to advance the field of forensic science. A multitude of scholarly endeavors have been dedicated to the application of deep learning techniques within machine learning models. These endeavors aim to facilitate the extraction of distinctive features and subsequent classification, thereby elevating the precision of unique individual recognition. In the context of this scholarly inquiry, the focal point resides in the exploration of deep learning methodologies tailored for the realm of facial recognition and its subsequent matching processes. This exploration centers on the augmentation of accuracy through the meticulous process of training models with expansive datasets. Within the confines of this research paper, a comprehensive survey is conducted, encompassing an array of diverse strategies utilized in facial recognition. This survey, in turn, delves into the intricacies and challenges that underlie the intricate field of facial recognition within imagery analysis

International Journal on Recent and Innovation Trends in Computing and Communication

Improving short text classification through global augmentation methods

Author: Marivate Vukosi
Sefara Tshephisho
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2020
Field of study

We study the effect of different approaches to text augmentation. To do this we use 3 datasets that include social media and formal text in the form of news articles. Our goal is to provide insights for practitioners and researchers on making choices for augmentation for classification use cases. We observe that Word2vec-based augmentation is a viable option when one does not have access to a formal synonym model (like WordNet-based augmentation). The use of \emph{mixup} further improves performance of all text based augmentations and reduces the effects of overfitting on a tested deep learning model. Round-trip translation with a translation service proves to be harder to use due to cost and as such is less accessible for both normal and low resource use-cases.Comment: Final version published in CD-MAKE 2020: Machine Learning and Knowledge Extraction pp 385-39

arXiv.org e-Print Archive

HAL Descartes

UPSpace at the University of Pretoria

Recommended from our members

Crisis Event Extraction Service (CREES) - Automatic Detection and Classification of Crisis-related Content on Social Media

Author: Alani Harith
Burel Gregoire
Publication venue
Publication date: 18/05/2018
Field of study

Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (e.g., hurricane, floods, etc.) and information categories (e.g., reports on affected individuals, donations and volunteering, etc.) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories through an easily deployable and accessible web API that can be integrated into multiple platform and tools. The annotation service is backed by Convolutional Neural Networks (CNNs) and validated against traditional machine learning models. Results show that the CNN-based API results can be relied upon when dealing with specific crises with the benefits associated with the usage word embeddings

Open Research Online (The Open University)