27 research outputs found

    An online handwritten music symbol recognition system

    Get PDF
    The original publication is available at www.springerlink.comArticleINTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION. 9(1): 49-58 (2007)journal articl

    Automatic analysis of electronic drawings using neural network

    Get PDF
    Neural network technique has been found to be a powerful tool in pattern recognition. It captures associations or discovers regularities with a set of patterns, where the types, number of variables or diversity of the data are very great, the relationships between variables are vaguely understood, or the relationships are difficult to describe adequately with conventional approaches. In this dissertation, which is related to the research and the system design aiming at recognizing the digital gate symbols and characters in electronic drawings, we have proposed: (1) A modified Kohonen neural network with a shift-invariant capability in pattern recognition; (2) An effective approach to optimization of the structure of the back-propagation neural network; (3) Candidate searching and pre-processing techniques to facilitate the automatic analysis of the electronic drawings. An analysis and the system performance reveal that when the shift of an image pattern is not large, and the rotation is only by nx90°, (n = 1, 2, and 3), the modified Kohonen neural network is superior to the conventional Kohonen neural network in terms of shift-invariant and limited rotation-invariant capabilities. As a result, the dimensionality of the Kohonen layer can be reduced significantly compared with the conventional ones for the same performance. Moreover, the size of the subsequent neural network, say, back-propagation feed-forward neural network, can be decreased dramatically. There are no known rules for specifying the number of nodes in the hidden layers of a feed-forward neural network. Increasing the size of the hidden layer usually improves the recognition accuracy, while decreasing the size generally improves generalization capability. We determine the optimal size by simulation to attain a balance between the accuracy and generalization. This optimized back-propagation neural network outperforms the conventional ones designed by experience in general. In order to further reduce the computation complexity and save the calculation time spent in neural networks, pre-processing techniques have been developed to remove long circuit lines in the electronic drawings. This made the candidate searching more effective

    Symbol Recognition: Current Advances and Perspectives

    Get PDF
    Abstract. The recognition of symbols in graphic documents is an intensive research activity in the community of pattern recognition and document analysis. A key issue in the interpretation of maps, engineering drawings, diagrams, etc. is the recognition of domain dependent symbols according to a symbol database. In this work we first review the most outstanding symbol recognition methods from two different points of view: application domains and pattern recognition methods. In the second part of the paper, open and unaddressed problems involved in symbol recognition are described, analyzing their current state of art and discussing future research challenges. Thus, issues such as symbol representation, matching, segmentation, learning, scalability of recognition methods and performance evaluation are addressed in this work. Finally, we discuss the perspectives of symbol recognition concerning to new paradigms such as user interfaces in handheld computers or document database and WWW indexing by graphical content

    Deep neural networks for video classification in ecology

    Get PDF
    Analyzing large volumes of video data is a challenging and time-consuming task. Automating this process would very valuable, especially in ecological research where massive amounts of video can be used to unlock new avenues of ecological research into the behaviour of animals in their environments. Deep Neural Networks, particularly Deep Convolutional Neural Networks, are a powerful class of models for computer vision. When combined with Recurrent Neural Networks, Deep Convolutional models can be applied to video for frame level video classification. This research studies two datasets: penguins and seals. The purpose of the research is to compare the performance of image-only CNNs, which treat each frame of a video independently, against a combined CNN-RNN approach; and to assess whether incorporating the motion information in the temporal aspect of video improves the accuracy of classifications in these two datasets. Video and image-only models offer similar out-of-sample performance on the simpler seals dataset but the video model led to moderate performance improvements on the more complex penguin action recognition dataset

    Neuromorphic audio processing through real-time embedded spiking neural networks.

    Get PDF
    In this work novel speech recognition and audio processing systems based on a spiking artificial cochlea and neural networks are proposed and implemented. First, the biological behavior of the animal’s auditory system is analyzed and studied, along with the classical mechanisms of audio signal processing for sound classification, including Deep Learning techniques. Based on these studies, novel audio processing and automatic audio signal recognition systems are proposed, using a bio-inspired auditory sensor as input. A desktop software tool called NAVIS (Neuromorphic Auditory VIsualizer) for post-processing the information obtained from spiking cochleae was implemented, allowing to analyze these data for further research. Next, using a 4-chip SpiNNaker hardware platform and Spiking Neural Networks, a system is proposed for classifying different time-independent audio signals, making use of a Neuromorphic Auditory Sensor and frequency studies obtained with NAVIS. To prove the robustness and analyze the limitations of the system, the input audios were disturbed, simulating extreme noisy environments. Deep Learning mechanisms, particularly Convolutional Neural Networks, are trained and used to differentiate between healthy persons and pathological patients by detecting murmurs from heart recordings after integrating the spike information from the signals using a neuromorphic auditory sensor. Finally, a similar approach is used to train Spiking Convolutional Neural Networks for speech recognition tasks. A novel SCNN architecture for timedependent signals classification is proposed, using a buffered layer that adapts the information from a real-time input domain to a static domain. The system was deployed on a 48-chip SpiNNaker platform. Finally, the performance and efficiency of these systems were evaluated, obtaining conclusions and proposing improvements for future works.Premio Extraordinario de Doctorado U

    Text Detection in Natural Scenes and Technical Diagrams with Convolutional Feature Learning and Cascaded Classification

    Get PDF
    An enormous amount of digital images are being generated and stored every day. Understanding text in these images is an important challenge with large impacts for academic, industrial and domestic applications. Recent studies address the difficulty of separating text targets from noise and background, all of which vary greatly in natural scenes. To tackle this problem, we develop a text detection system to analyze and utilize visual information in a data driven, automatic and intelligent way. The proposed method incorporates features learned from data, including patch-based coarse-to-fine detection (Text-Conv), connected component extraction using region growing, and graph-based word segmentation (Word-Graph). Text-Conv is a sliding window-based detector, with convolution masks learned using the Convolutional k-means algorithm (Coates et. al, 2011). Unlike convolutional neural networks (CNNs), a single vector/layer of convolution mask responses are used to classify patches. An initial coarse detection considers both local and neighboring patch responses, followed by refinement using varying aspect ratios and rotations for a smaller local detection window. Different levels of visual detail from ground truth are utilized in each step, first using constraints on bounding box intersections, and then a combination of bounding box and pixel intersections. Combining masks from different Convolutional k-means initializations, e.g., seeded using random vectors and then support vectors improves performance. The Word-Graph algorithm uses contextual information to improve word segmentation and prune false character detections based on visual features and spatial context. Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77% respectively for the ICDAR 2015 Robust Reading Focused Scene Text dataset, out-performing state-of-the-art systems, and producing highly accurate text detection masks at the pixel level. To investigate the utility of our feature learning approach for other image types, we perform tests on 8- bit greyscale USPTO patent drawing diagram images. An ensemble of Ada-Boost classifiers with different convolutional features (MetaBoost) is used to classify patches as text or background. The Tesseract OCR system is used to recognize characters in detected labels and enhance performance. With appropriate pre-processing and post-processing, f-measures of 82% for part label location, and 73% for valid part label locations and strings are obtained, which are the best obtained to-date for the USPTO patent diagram data set used in our experiments. To sum up, an intelligent refinement of convolutional k-means-based feature learning and novel automatic classification methods are proposed for text detection, which obtain state-of-the-art results without the need for strong prior knowledge. Different ground truth representations along with features including edges, color, shape and spatial relationships are used coherently to improve accuracy. Different variations of feature learning are explored, e.g. support vector-seeded clustering and MetaBoost, with results suggesting that increased diversity in learned features benefit convolution-based text detectors

    The Machinic Imaginary: A Post-Phenomenological Examination of Computational Society

    Get PDF
    The central claim of this thesis is the postulation of a machinic dimension of the social imaginary—a more-than-human process of creative expression of the social world. With the development of machine learning and the sociality of interactive media, computational logics have a creative capacity to produce meaning of a radically machinic order. Through an analysis of computational functions and infrastructures ranging from artificial neural networks to large-scale machine ecologies, the institution of computational logics into the social imaginary is nothing less than a reordering of the conditions of social-historical creation. Responding to dominant technopolitical propositions concerning digital culture, this thesis proposes a critical development of Cornelius Castoriadis’ philosophy of the social imaginary. To do so, a post phenomenological framework is constructed by tracing a trajectory from Maurice Merleau-Ponty’s late ontological turn, through to the process-relational philosophies of Gilbert Simondon and Castoriadis. Introducing the concept of the machinic imaginary, the thesis maps the extent to which the dynamic, interactive paradigm of twenty-first century computation is changing how meaning is socially instituted in ways incomprehensible to human sense. As social imaginary significations are increasingly created and carried by machines, the articulation of the social diverges into human and non-human worlds. This inaccessibility of the machinic imaginary is a core problematic raised by this thesis, indicating a fragmentation of the social imaginary and a novel form of existential alienation. Any political theorisation of the contemporary social condition must therefore work within this alienation and engage with the transsubjective character of social-historical creation

    Application of backpropagation-like generative algorithms to various problems.

    Get PDF
    Thesis (M.Sc.)-University of Natal, Durban, 1992.Artificial neural networks (ANNs) were originally inspired by networks of biological neurons and the interactions present in networks of these neurons. The recent revival of interest in ANNs has again focused attention on the apparent ability of ANNs to solve difficult problems, such as machine vision, in novel ways. There are many types of ANNs which differ in architecture and learning algorithms, and the list grows annually. This study was restricted to feed-forward architectures and Backpropagation- like (BP-like) learning algorithms. However, it is well known that the learning problem for such networks is NP-complete. Thus generative and incremental learning algorithms, which have various advantages and to which the NP-completeness analysis used for BP-like networks may not apply, were also studied. Various algorithms were investigated and the performance compared. Finally, the better algorithms were applied to a number of problems including music composition, image binarization and navigation and goal satisfaction in an artificial environment. These tasks were chosen to investigate different aspects of ANN behaviour. The results, where appropriate, were compared to those resulting from non-ANN methods, and varied from poor to very encouraging
    corecore