291 research outputs found

DeepScores : a dataset for segmentation, detection and classification of tiny objects

Authors: Elezi Ismail; Pelillo Marcello; Schmidhuber Jürgen; Stadelmann Thilo; Tuggener Lukas
Publication venue: IAPR
Publication date: 01/01/2018

We present the DeepScores dataset with the goal of advancing the state-of-the-art in small object recognition by placing the question of object recognition in the context of scene understanding. DeepScores contains high quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred million small objects, this makes our dataset not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, and optical music recognition (OMR) research in particular. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets like PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, as well as with other OMR datasets. Finally, we provide baseline performances for object classification, intuition for the inherent difficulty that DeepScores poses to state-of-the-art object detectors like YOLO or R-CNN, and give pointers to future research based on this dataset.

arXiv.org e-Print Archive

Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

Author: Mahdavi Mahshad
Publication venue: RIT Scholar Works
Publication date: 07/08/2020

We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules and the language model is learned from the sets of symbols/relationships and the statistics over them in the training set. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
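
The final step named in the abstract above, extracting a maximum spanning tree from a weighted directed graph with Edmonds' arborescence algorithm, can be illustrated with a minimal Python sketch. This is not the QD-GGA implementation: the function names, the edge-weight dictionary format, and the toy graph in the usage note are assumptions made for illustration, and the sketch assumes every node is reachable from the root.

```python
def find_cycle(parent, root):
    """Return the nodes of one cycle in a child -> parent map, or None."""
    for start in parent:
        seen = []
        node = start
        while node != root and node in parent and node not in seen:
            seen.append(node)
            node = parent[node]
        if node != root and node in seen:
            return seen[seen.index(node):]
    return None


def max_arborescence(weights, root):
    """Chu-Liu/Edmonds sketch.

    weights: dict mapping (source, target) -> edge weight (hypothetical format).
    Returns a dict mapping each non-root node to its chosen parent.
    Assumes every node can be reached from root.
    """
    # Step 1: greedily keep the heaviest incoming edge of every non-root node.
    parent = {}
    for (u, v), w in weights.items():
        if v == root or u == v:
            continue
        if v not in parent or w > weights[(parent[v], v)]:
            parent[v] = u

    # Step 2: if the greedy choice is acyclic, it is already optimal.
    cycle = find_cycle(parent, root)
    if cycle is None:
        return parent

    # Step 3: contract the cycle into one super node and reweight edges
    # entering it by the cost of dropping the cycle edge they would replace.
    in_cycle = set(cycle)
    super_node = ("cycle", tuple(cycle))
    contracted, origin = {}, {}
    for (u, v), w in weights.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key = (u, super_node)
            w = w - weights[(parent[v], v)]
        elif u in in_cycle:
            key = (super_node, v)
        else:
            key = (u, v)
        if key not in contracted or w > contracted[key]:
            contracted[key] = w
            origin[key] = (u, v)

    # Step 4: solve the smaller problem, then map edges back to the originals.
    sub = max_arborescence(contracted, root)
    result = {}
    for v, u in sub.items():
        orig_u, orig_v = origin[(u, v)]
        result[orig_v] = orig_u
    # Cycle nodes not entered from outside keep their in-cycle parent.
    for v in cycle:
        if v not in result:
            result[v] = parent[v]
    return result
```

For a toy graph with edges `("r", "a"): 5`, `("r", "b"): 1`, `("a", "b"): 10`, and `("b", "a"): 8`, calling `max_arborescence(weights, "r")` breaks the a/b cycle at the cheaper edge and keeps `r -> a -> b`.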

RIT Scholar Works

Structure Diagram Recognition in Financial Announcements

Authors: Hou Qiyu; Li Ruixuan; Qiao Meixuan; Wang Jun; Xiang Junfu
Publication date: 25/04/2023

Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and further improving the efficiency of various financial applications. First, we proposed a new method for recognizing structure diagrams in financial announcements, which can better detect and extract different types of connecting lines, including straight lines, curves, and polylines of different orientations and angles. Second, we developed a two-stage method to efficiently generate the industry's first benchmark of structure diagrams from Chinese financial announcements, where a large number of diagrams were synthesized and annotated using an automated tool to train a preliminary recognition model with fairly good performance, and then a high-quality benchmark can be obtained by automatically annotating the real-world structure diagrams using the preliminary model and then making a few manual corrections. Finally, we experimentally verified the significant performance advantage of our structure diagram recognition method over previous methods.

arXiv.org e-Print Archive

Symbolic and Deep Learning Based Data Representation Methods for Activity Recognition and Image Understanding at Pixel Level

Author: Karki Manohar
Publication venue: LSU Digital Commons
Publication date: 01/01/2017

Efficient representation of large amounts of data, particularly images and video, helps in the analysis, processing and overall understanding of the data. In this work, we present two frameworks that encapsulate the information present in such data. First, we present an automated symbolic framework to recognize particular activities in real time from videos. The framework uses regular expressions for symbolically representing (possibly infinite) sets of motion characteristics obtained from a video. It is a uniform framework that handles trajectory-based and periodic articulated activities and provides polynomial time graph algorithms for fast recognition. The regular expressions representing motion characteristics can either be provided manually or learnt automatically from positive and negative examples of strings (that describe dynamic behavior) using offline automata learning frameworks. Confidence measures are associated with recognitions using Levenshtein distance between a string representing a motion signature and the regular expression describing an activity. We have used our framework to recognize trajectory-based activities like vehicle turns (U-turns, left and right turns, and K-turns), vehicle start and stop, person running and walking, and periodic articulated activities like digging, waving, boxing, and clapping in videos from the VIRAT public dataset, the KTH dataset, and a set of videos obtained from YouTube. Next, we present a core sampling framework that is able to use activation maps from several layers of a Convolutional Neural Network (CNN) as features to another neural network using transfer learning to provide an understanding of an input image. The intermediate map responses of a CNN contain information about an image that can be used to extract contextual knowledge about it. Our framework creates a representation that combines features from the test data and the contextual knowledge gained from the responses of a pretrained network, processes it and feeds it to a separate Deep Belief Network. We use this representation to extract more information from an image at the pixel level, hence gaining understanding of the whole image. We experimentally demonstrate the usefulness of our framework using a pretrained VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture Radar (SAR) imagery and the CAMVID dataset. Using this framework, we also reconstruct images by removing noise from noisy character images. The reconstructed images are encoded using Quadtrees. Quadtrees can be an efficient representation in learning from sparse features. Handwritten character images are quite susceptible to noise. Hence, preprocessing stages to make the raw data cleaner can improve the efficacy of their use. We improve upon the efficiency of probabilistic quadtrees by using a pixel level classifier to extract the character pixels and remove noise from the images. The pixel level denoiser uses a pretrained CNN trained on a large image dataset and uses transfer learning to aid the reconstruction of characters. In this work, we primarily deal with classification of noisy characters and create noisy versions of the handwritten Bangla Numeral and Basic Character datasets and use them and the Noisy MNIST dataset to demonstrate the usefulness of our approach.
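
The confidence measure mentioned in the abstract above relies on Levenshtein distance. As a rough illustration only, the Python sketch below computes the distance between two strings and turns it into a score between 0 and 1. The thesis measures distance between a motion-signature string and a regular expression; comparing against a single exemplar string, the normalization by the longer length, and the function names are all simplifying assumptions of this sketch rather than the author's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def confidence(signature: str, exemplar: str) -> float:
    """Hypothetical score: 1.0 for identical strings, lower as edits accumulate."""
    longest = max(len(signature), len(exemplar))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(signature, exemplar) / longest
```

With this normalization, a four-symbol signature that differs from the exemplar in one symbol scores 0.75.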

Louisiana State University

Recognizing hand-drawn diagrams in images

Author: Schäfer Bernhard
Publication date: 01/01/2023

Diagrams are an essential tool in any organization. They are used to create conceptual models of anything ranging from business processes to software architectures. Despite the abundance of diagram modeling tools available, the creation of conceptual models often starts by sketching on a whiteboard or paper. However, starting with a hand-drawn diagram introduces the need to eventually digitize it, so that it can be further edited in modeling tools. To reduce the effort associated with the manual digitization of diagrams, research in hand-drawn diagram recognition aims to automate this task. While there is a large body of methods for recognizing diagrams drawn on tablets, there is a notable gap for recognizing diagrams sketched on paper or whiteboard. To close this research gap, this doctoral thesis addresses the problem of recognizing hand-drawn diagrams in images. In particular, it provides the following five main contributions. First, we collect and publish a dataset of business process diagrams sketched on paper. Given that the dataset originates from conceptual modeling tasks solved by 107 participants, it has a high degree of diversity, as reflected in various drawing styles, paper types, pens, and image-capturing methods. Second, we provide an overview of the challenges in recognizing conceptual diagrams sketched on paper. We find that conceptual modeling leads to diagrams with chaotic layouts, making the recognition of edges and labels especially challenging. Third, we propose an end-to-end system for recognizing diagrams modeled with BPMN, the standard language for modeling business processes. Given an image of a hand-drawn BPMN diagram, our system produces a BPMN XML file that can be imported into process modeling tools. The system consists of an object detection neural network, which we extend with network components for recognizing edges and labels. The following two contributions are related to these components. Fourth, we present several deep learning methods for edge recognition, which recognize the drawn path and connected shapes of each arrow. Last, we describe a label recognition method that consists of three steps, one of which features a network that predicts whether a label belongs to a specific shape or edge. To demonstrate the performance of the proposed methods, we evaluate them on both our collected dataset and existing diagram datasets.

MAnnheim DOCument Server