326 research outputs found

    Towards robust real-world historical handwriting recognition

    In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as extremely diverse collections of (pixel) images. Despite their great results, current deep learning (DL) methods are very data greedy and time consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already scarce labeled data. We present several approaches towards dealing with these problems, aiming to improve the robustness of current methods and to improve the autonomy in training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.
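
The abstract above mentions ensemble voting over five neural networks but does not say how the votes are combined. As a minimal sketch, assuming simple plurality voting over word-level predictions (the function name and the sample words are hypothetical, not taken from the thesis):

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted most often.

    Ties are broken in favour of the label that appears first in the list,
    which is an arbitrary choice made for this sketch.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of five recognizers for the same word image.
votes = ["natuur", "natuur", "nataur", "natuur", "natuer"]
print(plurality_vote(votes))
```

Plurality voting needs only the final label from each network; a weighted or confidence-based scheme would also need per-network scores.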

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file). Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot. Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity. Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation. For each dimension, we conduct a comparative analysis of various models and techniques and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects. Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
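
The survey above lists one-hot and many-hot as encoding choices. A minimal sketch of the difference, assuming a one-octave pitch vocabulary of MIDI note numbers 60 to 71 (the range and the function names are assumptions made for illustration, not taken from the paper):

```python
LOW, HIGH = 60, 72  # assumed vocabulary: MIDI notes 60..71 (one octave from middle C)


def one_hot(pitch):
    """Encode a single note: exactly one position is set to 1."""
    if not LOW <= pitch < HIGH:
        raise ValueError("pitch outside the assumed vocabulary")
    vec = [0] * (HIGH - LOW)
    vec[pitch - LOW] = 1
    return vec


def many_hot(pitches):
    """Encode simultaneous notes (a chord): one position per sounding note."""
    vec = [0] * (HIGH - LOW)
    for pitch in pitches:
        if not LOW <= pitch < HIGH:
            raise ValueError("pitch outside the assumed vocabulary")
        vec[pitch - LOW] = 1
    return vec


melody_note = one_hot(60)        # a single C
chord = many_hot([60, 64, 67])   # C, E and G sounding together
print(melody_note)
print(chord)
```

One-hot suits monophonic melody, where each time step holds a single note; many-hot lets one time step carry several simultaneous notes.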

    Survey on encode biometric data for transmission in wireless communication networks

    The aim of this research survey is to review an enhanced model, supported by artificial intelligence, for encoding biometric data for transmission in wireless communication networks. Such transmission can be tricky, as performance decreases with increasing size due to interference, especially if channels and network topology are not selected carefully beforehand. Additionally, network dissociations may occur easily if crucial links fail, as redundancy is neglected for signal transmission. Therefore, we present several algorithms and their implementations, which address this problem by finding a network topology and channel assignment that minimize interference. This allows a deployment to increase its throughput by utilizing more bandwidth in the local spectrum and by reducing coverage and connectivity issues across multiple AI-based techniques. Our evaluation survey shows a throughput increase of multiple times or more compared to a baseline scenario in which no optimization has taken place and only one channel is used for the whole network with AI-based techniques. Furthermore, our solution provides robust signal transmission that tackles the issue of network partition, both for coverage and for single link failures, by using an airborne wireless network. The highest end-to-end connectivity stands at a 10 Mbps data rate with a maximum propagation distance of several kilometers. Wireless network coverage is depicted for several signal transmission data rates, with 10 Mbps showing the fewest coverage issues at a moderate propagation distance when the enhanced model is used to encode biometric data for transmission in wireless communication.

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
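
The abstract above names recursive Bayesian filtering as the way observations are integrated over time, and reports a drop in categorization entropy. The following is a minimal sketch of that idea for a single tracked object, assuming a fixed set of three classes and made-up detector scores used as likelihoods; the paper's actual observation model and its homography-based propagation step are not reproduced here.

```python
import math


def bayes_update(prior, likelihood):
    """One recursive step: posterior is proportional to prior times likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return [u / total for u in unnormalized]


def entropy(belief):
    """Shannon entropy in bits of a categorical belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


# Hypothetical example: uniform prior over three classes and two frames whose
# detector scores both favour the first class.
belief = [1 / 3, 1 / 3, 1 / 3]
observations = [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]]

entropies = [entropy(belief)]
for scores in observations:
    belief = bayes_update(belief, scores)
    entropies.append(entropy(belief))

print(belief)
print(entropies)
```

Each consistent observation sharpens the belief, so the entropy of the class distribution falls from frame to frame in this example.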

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.

    CSSL-RHA: Contrastive Self-Supervised Learning for Robust Handwriting Authentication

    Handwriting authentication is a valuable tool used in various fields, such as fraud prevention and cultural heritage protection. However, it remains a challenging task due to complex features, severe damage, and lack of supervision. In this paper, we propose a novel Contrastive Self-Supervised Learning framework for Robust Handwriting Authentication (CSSL-RHA) to address these issues. It can dynamically learn complex yet important features and accurately predict writer identities. Specifically, to remove the negative effects of imperfections and redundancy, we design an information-theoretic filter for pre-processing and propose a novel adaptive matching scheme to represent images as patches of local regions dominated by more important features. Through online optimization at inference time, the most informative patch embeddings are identified as the "most important" elements. Furthermore, we employ contrastive self-supervised training with a momentum-based paradigm to learn more general statistical structures of handwritten data without supervision. We conduct extensive experiments on five benchmark datasets and our manually annotated dataset EN-HA, which demonstrate the superiority of our CSSL-RHA compared to baselines. Additionally, we show that our proposed model can still effectively achieve authentication even under abnormal circumstances, such as data falsification and corruption. Comment: 10 pages, 4 figures, 3 tables, submitted to ACM MM 202

    A hybrid CNN-LSTM model for predicting PM2.5 in Beijing based on spatiotemporal correlation

    Long-term exposure to air full of suspended particles, especially PM2.5, can seriously damage people's health and life (e.g., through respiratory diseases and lung cancers). Therefore, accurate PM2.5 prediction is important for government authorities to take preventive measures. In this paper, the advantages of convolutional neural network (CNN) and long short-term memory network (LSTM) models are combined, and a hybrid CNN-LSTM model is proposed to predict the daily PM2.5 concentration in Beijing based on spatiotemporal correlation. Specifically, Pearson's correlation coefficient is adopted to measure the relationship between PM2.5 in Beijing and air pollutants in its surrounding cities. In the hybrid CNN-LSTM model, the CNN model is used to learn spatial features, while the LSTM model is used to extract the temporal information. To evaluate the proposed model, three evaluation indexes are introduced: root mean square error, mean absolute percentage error, and R-squared. As a result, the hybrid CNN-LSTM model achieves the best performance compared with the multilayer perceptron (MLP) and LSTM models. Moreover, the prediction accuracy of the proposed model considering spatiotemporal correlation outperforms the same model without spatiotemporal correlation. Therefore, the hybrid CNN-LSTM model can be adopted for PM2.5 concentration prediction.
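
The three evaluation indexes named in the abstract above have standard definitions, sketched here in plain Python. The sample values are invented for illustration and are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up daily concentrations and predictions, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print(rmse(observed, predicted))
print(mape(observed, predicted))
print(r_squared(observed, predicted))
```

Lower RMSE and MAPE indicate a better fit, while R-squared closer to 1 indicates that more of the variance in the observations is explained.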