Towards robust real-world historical handwriting recognition
In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie, which explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as an extremely diverse collection of (pixel) images. Despite their great results, current deep-learning (DL) methods are very data greedy and time consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts for designing the models. Ideally, the use of deep-learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse labeled data. We present several approaches to dealing with these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.
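The ensemble voting mentioned in this abstract can be illustrated with a minimal sketch of plurality voting over the word hypotheses of five recognizers. The abstract does not say which voting scheme the thesis uses, so the function name `plurality_vote`, the tie-breaking rule, and the sample hypotheses are assumptions made only for illustration.

```python
from collections import Counter


def plurality_vote(hypotheses):
    """Return the most frequent hypothesis; ties go to the earliest-listed one."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    counts = Counter(hypotheses)
    best = max(counts.values())
    # Scan in input order so that ties are broken deterministically.
    for hypothesis in hypotheses:
        if counts[hypothesis] == best:
            return hypothesis


# Hypothetical outputs of five recognizers for the same word image.
votes = ["vogel", "vogel", "vagel", "vogel", "vogol"]
print(plurality_vote(votes))
```

A word-level vote like this only helps when the individual networks make different errors, which is why ensembles are usually built from differently initialized or differently configured models.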
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective - What musical content is to be generated? Examples are: melody,
polyphony, accompaniment or counterpoint. - For what destination and for what
use? To be performed by a human(s) (in the case of a musical score), or by a
machine (in the case of an audio file).
Representation - What are the concepts to be manipulated? Examples are:
waveform, spectrogram, note, chord, meter and beat. - What format is to be
used? Examples are: MIDI, piano roll or text. - How will the representation be
encoded? Examples are: scalar, one-hot or many-hot.
Architecture - What type(s) of deep neural network is (are) to be used?
Examples are: feedforward network, recurrent network, autoencoder or generative
adversarial networks.
Challenge - What are the limitations and open challenges? Examples are:
variability, interactivity and creativity.
Strategy - How do we model and control the process of generation? Examples
are: single-step feedforward, iterative feedforward, sampling or input
manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques and we propose some tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
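The one-hot and many-hot encodings named under the Representation dimension can be sketched as follows. This is an illustrative example rather than code from the survey: the one-octave pitch range (MIDI notes 60 to 71) and the function names `one_hot` and `many_hot` are assumptions.

```python
def one_hot(pitch, low=60, high=72):
    """Encode a single MIDI pitch as a vector containing exactly one 1."""
    if not low <= pitch < high:
        raise ValueError("pitch outside the encoded range")
    return [1 if p == pitch else 0 for p in range(low, high)]


def many_hot(pitches, low=60, high=72):
    """Encode simultaneous notes as a vector with one 1 per sounding pitch."""
    active = set(pitches)
    if any(not low <= p < high for p in active):
        raise ValueError("pitch outside the encoded range")
    return [1 if p in active else 0 for p in range(low, high)]


print(one_hot(64))             # a single note, E above middle C
print(many_hot([60, 64, 67]))  # a C major triad
```

One-hot vectors suit monophonic melodies, where exactly one pitch sounds per time step, while many-hot vectors can represent chords or polyphony within the same step.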
Survey on encode biometric data for transmission in wireless communication networks
This survey reviews an enhanced, AI-supported model for encoding biometric data for transmission in wireless communication networks. Such transmission can be tricky: performance decreases with increasing network size due to interference, especially if channels and network topology are not selected carefully beforehand. Additionally, network dissociation may easily occur if crucial links fail, because redundancy is neglected in signal transmission. We therefore present several algorithms and their implementations that address this problem by finding a network topology and channel assignment that minimize interference. This allows a deployment to increase its throughput by utilizing more bandwidth in the local spectrum while reducing coverage and connectivity issues across multiple AI-based techniques. Our evaluation shows a throughput increase of multiple times or more compared to a baseline scenario in which no optimization has taken place and only one channel is used for the whole network with AI-based techniques. Furthermore, our solution provides robust signal transmission that tackles network partition, for both coverage and single link failures, by using an airborne wireless network. The highest end-to-end connectivity is reached at a data rate of 10 Mbps, with a maximum propagation distance of several kilometers. Among the signal transmission data rates considered, 10 Mbps has the fewest coverage issues at a moderate propagation distance when the enhanced model is used to encode biometric data for transmission in wireless communication.
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in two concepts: planar homography, used to propose regions of interest in which to look for objects, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
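The recursive Bayesian filtering step, and the entropy reduction reported above, can be illustrated with a discrete Bayes update over object classes. This is a minimal sketch and not the paper's implementation: the three-class setup, the use of per-frame detector scores as likelihoods, and all numeric values are assumptions.

```python
import math


def bayes_update(prior, likelihood):
    """One recursive Bayes step: posterior is proportional to likelihood * prior."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return [u / total for u in unnormalized]


def entropy(dist):
    """Shannon entropy of a categorical distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Uniform initial belief over three hypothetical object classes.
belief = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical per-frame detector scores for the same tracked object.
observations = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]

for scores in observations:
    belief = bayes_update(belief, scores)
    print([round(b, 3) for b in belief], round(entropy(belief), 3))
```

Each consistent observation concentrates the belief on one class, so the categorization entropy falls as evidence accumulates across frames.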
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
CSSL-RHA: Contrastive Self-Supervised Learning for Robust Handwriting Authentication
Handwriting authentication is a valuable tool used in various fields, such as
fraud prevention and cultural heritage protection. However, it remains a
challenging task due to the complex features, severe damage, and lack of
supervision. In this paper, we propose a novel Contrastive Self-Supervised
Learning framework for Robust Handwriting Authentication (CSSL-RHA) to address
these issues. It can dynamically learn complex yet important features and
accurately predict writer identities. Specifically, to remove the negative
effects of imperfections and redundancy, we design an information-theoretic
filter for pre-processing and propose a novel adaptive matching scheme to
represent images as patches of local regions dominated by more important
features. Through online optimization at inference time, the most informative
patch embeddings are identified as the "most important" elements. Furthermore,
we employ contrastive self-supervised training with a momentum-based paradigm
to learn more general statistical structures of handwritten data without
supervision. We conduct extensive experiments on five benchmark datasets and
our manually annotated dataset EN-HA, which demonstrate the superiority of our
CSSL-RHA compared to baselines. Additionally, we show that our proposed model
can still effectively achieve authentication even under abnormal circumstances,
such as data falsification and corruption.
Comment: 10 pages, 4 figures, 3 tables, submitted to ACM MM 202
A hybrid CNN-LSTM model for predicting PM2.5 in Beijing based on spatiotemporal correlation
Long-term exposure to air full of suspended particles, especially PM2.5, can seriously damage people's health and lives (e.g., through respiratory diseases and lung cancer). Accurate PM2.5 prediction is therefore important for government authorities to take preventive measures. In this paper, the advantages of convolutional neural network (CNN) and long short-term memory (LSTM) models are combined, and a hybrid CNN-LSTM model is proposed to predict the daily PM2.5 concentration in Beijing based on spatiotemporal correlation. Specifically, Pearson's correlation coefficient is adopted to measure the relationship between PM2.5 in Beijing and air pollutants in its surrounding cities. In the hybrid CNN-LSTM model, the CNN is used to learn spatial features, while the LSTM is used to extract temporal information. To evaluate the proposed model, three evaluation indexes are introduced: root mean square error, mean absolute percentage error, and R-squared. The hybrid CNN-LSTM model achieves the best performance compared with the multilayer perceptron (MLP) and LSTM models. Moreover, the proposed model achieves higher prediction accuracy when spatiotemporal correlation is considered than when it is not. Therefore, the hybrid CNN-LSTM model can be adopted for PM2.5 concentration prediction.
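The three evaluation indexes named in this abstract have standard definitions, sketched below in plain Python. The function names and the sample concentration values are assumptions for illustration; they are not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if a true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily PM2.5 concentrations (observed vs. predicted).
observed = [50.0, 80.0, 100.0, 40.0]
predicted = [55.0, 76.0, 90.0, 44.0]

print(rmse(observed, predicted))
print(mape(observed, predicted))
print(r_squared(observed, predicted))
```

Lower RMSE and MAPE indicate better predictions, while R-squared approaches 1 as the predictions explain more of the variance in the observations.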