
    Geometric image segmentation via transform invariant rank cuts

    Title from PDF of title page (University of Missouri--Columbia, viewed on March 6, 2013). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Thesis advisor: Dr. Tony X. Han. Includes bibliographical references. M.S. University of Missouri--Columbia, 2012. "December 2012".
    This research proposes a novel image segmentation algorithm, Transform Invariant Rank Cuts (TIRC), based on salient 3D geometric information in natural scenes. The segmentation algorithm unifies an emerging robust statistics technique, Robust PCA, and its recent application to Transform Invariant Low-Rank Texture (TILT) extraction. The proposed algorithm addresses two critical issues that have hindered applications of the TILT feature. First, we propose a simple yet efficient algorithm to detect low-rank texture regions in natural images. Second, TIRC is a principled graph-cut solution that partitions the TILT features into groups, each representing a unique 3D planar structure. In the TILT adjacency graph, each TILT feature is a node, and two nodes are connected if they are spatially adjacent; the cut cost is defined as the total coding length of encoding the two texture regions separately as low-rank matrices. Finally, the classical graph-cut algorithm is applied to partition the graph into sub-graphs, each of which represents a unique surface texture and 3D orientation. The efficacy and visual quality of this geometric image segmentation algorithm are demonstrated on a large urban scene database.
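The grouping idea above (nodes are texture regions, edges join spatially adjacent regions, and costs compare separate versus joint low-rank coding) can be sketched as follows. This is a simplified sketch, not the thesis's method: the abstract does not give the coding-length formula, so `coding_length` is a hypothetical surrogate (rank times rows plus columns, i.e. the parameter count of a rank-r factorization), and the classical graph cut is replaced by a plain stand-in that keeps edges whose joint coding saves bits and returns connected components.

```python
def numerical_rank(matrix, tol=1e-9):
    """Rank of a dense matrix (list of rows) via Gaussian elimination."""
    m = [[float(v) for v in row] for row in matrix]
    rank, n_cols = 0, (len(m[0]) if m else 0)
    for c in range(n_cols):
        # Partial pivoting: largest remaining entry in column c.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][c]), default=None)
        if pivot is None or abs(m[pivot][c]) <= tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank:
                f = m[r][c] / m[rank][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def coding_length(matrix):
    """HYPOTHETICAL surrogate: parameters of a rank-r factorization, r * (rows + cols)."""
    return numerical_rank(matrix) * (len(matrix) + len(matrix[0]))

def merge_saving(a, b):
    """Cost saved by coding two regions jointly as [A | B] instead of separately.
    Assumes both regions have the same number of rows."""
    joint = [ra + rb for ra, rb in zip(a, b)]
    return coding_length(a) + coding_length(b) - coding_length(joint)

def group_regions(regions, adjacency, threshold=0):
    """Stand-in for the graph cut: keep adjacency edges whose saving exceeds
    `threshold`, then return connected components found with union-find."""
    parent = {name: name for name in regions}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in adjacency:
        if merge_saving(regions[u], regions[v]) > threshold:
            parent[find(u)] = find(v)
    groups = {}
    for name in regions:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Toy "texture regions" (hypothetical data): A and B share one column space.
regions = {
    "A": [[1, 2], [2, 4]],  # rank 1
    "B": [[3, 1], [6, 2]],  # rank 1
    "C": [[1, 0], [0, 1]],  # rank 2
}
print(group_regions(regions, [("A", "B"), ("B", "C")]))  # [['A', 'B'], ['C']]
```

A and B compress better together than apart, so they end up in one group, while C does not share their structure and stays separate.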

    Efficient Scene Text Localization and Recognition with Local Character Refinement

    An unconstrained end-to-end text localization and recognition method is presented. The method detects initial text hypotheses in a single pass using an efficient region-based method and subsequently refines them using a more robust local text model, which departs from the common assumption of region-based methods that every character is detected as a connected component. Additionally, a novel feature based on character stroke area estimation is introduced. The feature is computed efficiently from a region distance map, is invariant to scaling and rotation, and allows text regions to be detected efficiently regardless of what portion of the text they capture. The method runs in real time and achieves state-of-the-art text localization and recognition results on the ICDAR 2013 Robust Reading dataset.
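The stroke feature above is computed from a region distance map. The abstract does not give the feature's formula, so the sketch below shows only the building block: an exact city-block distance map computed with the standard two-pass algorithm, plus a rough stroke-width estimate derived from it. The `stroke_width` rule (twice the maximum distance minus one) is a hypothetical illustration, not the paper's feature, and the city-block metric itself is not rotation invariant.

```python
def distance_map(mask):
    """City-block distance from each foreground pixel (1) to the nearest
    background pixel (0); pixels outside the mask count as background."""
    h, w = len(mask), len(mask[0])
    inf = h + w  # larger than any possible distance inside the mask
    d = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if d[y][x]:
                up = d[y - 1][x] if y > 0 else 0
                left = d[y][x - 1] if x > 0 else 0
                d[y][x] = min(d[y][x], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if d[y][x]:
                down = d[y + 1][x] if y < h - 1 else 0
                right = d[y][x + 1] if x < w - 1 else 0
                d[y][x] = min(d[y][x], down + 1, right + 1)
    return d

def stroke_width(mask):
    """HYPOTHETICAL estimate: 2 * (max distance) - 1, exact for odd stroke widths."""
    return 2 * max(max(row) for row in distance_map(mask)) - 1

bar = [[1] * 7 for _ in range(3)]  # a 3-pixel-thick horizontal stroke
print(distance_map(bar)[1])  # [1, 2, 2, 2, 2, 2, 1]
print(stroke_width(bar))     # 3
```

Because the estimate depends only on the distance ridge and not on stroke length, it stays the same however much of a word the region covers.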

    Thematic Annotation: extracting concepts out of documents

    Unlike standard approaches to topic annotation, the technique used in this work does not rely centrally on some form of (possibly statistical) keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database, the EDR Electronic Dictionary, which provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This extraction technique takes an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document content. This conceptual hierarchy is then searched to extract the most relevant set of concepts to represent the topics discussed in the document. Notably, the algorithm can extract generic concepts that are not directly present in the document.
    Comment: Technical report EPFL/LIA. 81 pages, 16 figures.
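The idea of aggregating document words into hypernym concepts, including concepts that never appear in the text, can be illustrated with a toy greedy selector. Everything here is an assumption for illustration: the tiny `parent` hierarchy is made up (not EDR data), and the scoring rule (cover the most uncovered words, break ties toward the more specific concept) is a hypothetical simplification of the report's search.

```python
def ancestors(concept, parent):
    """The concept itself followed by every hypernym above it."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain

def annotate(words, parent, k=2):
    """Greedy concept selection (HYPOTHETICAL scoring): repeatedly pick the
    concept subsuming the most still-uncovered words; ties go to the deeper,
    more specific concept, then to the name for determinism."""
    uncovered, chosen = set(words), []
    while uncovered and len(chosen) < k:
        coverage = {}
        for w in uncovered:
            for c in ancestors(w, parent):
                coverage.setdefault(c, set()).add(w)
        best = max(coverage, key=lambda c: (len(coverage[c]), len(ancestors(c, parent)), c))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

# Toy hypernym hierarchy: child -> parent.
parent = {"dog": "mammal", "cat": "mammal", "sparrow": "bird",
          "mammal": "animal", "bird": "animal"}
print(annotate(["dog", "cat"], parent))             # ['mammal']
print(annotate(["dog", "cat", "sparrow"], parent))  # ['animal']
```

In both calls the selected concept never occurs among the input words, which mirrors the report's point that generic concepts can be extracted from the hierarchy.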

    Methods for text segmentation from scene images

    Camera-captured scene/born-digital image analysis helps in developing vision systems that let robots read text, transliterate or translate it, navigate, and retrieve search results. However, text in such images does not follow any standard layout, and its location within the image is arbitrary. In addition, motion blur, non-uniform illumination, skew, occlusion and scale-based degradations make it harder to locate and recognize the text in a scene/born-digital image. The OTCYMIST method is proposed to segment text from born-digital images. This method won first place in ICDAR 2011 and placed third in ICDAR 2013 for its performance on the text segmentation task of the robust reading competitions for the born-digital image data set. Here, Otsu’s binarization and Canny edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) obtained from the segmented image are pruned using thresholds on their area and aspect ratio, and CCs with sufficient edge pixels are retained. The centroids of the individual CCs are used as the nodes of a graph, and a minimum spanning tree is built over these nodes. Long edges are then removed from the minimum spanning tree. Pairwise height ratios are used to remove likely non-text components. CCs are grouped based on their horizontal proximity to generate bounding boxes (BBs) of text strings. Overlapping BBs are removed using an overlap-area threshold; non-overlapping and minimally overlapping BBs are retained for text segmentation. These BBs are split vertically to localize text at the word level. A word cropped from a document image can easily be recognized by a traditional optical character recognition (OCR) engine. However, recognizing a word obtained by manually cropping a scene/born-digital image is not trivial: existing OCR engines do not handle such scene word images effectively.
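The OTCYMIST grouping step (build a minimum spanning tree over component centroids, then break its long edges) can be sketched with Prim's algorithm. The thesis abstract does not say what counts as a "long" edge, so the rule used here (longer than `factor` times the mean MST edge length) is a hypothetical choice; the height-ratio and bounding-box steps are omitted.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each vertex to the tree
    link = [-1] * n        # tree vertex at the other end of that edge
    best[0], edges = 0.0, []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((link[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges

def group_components(points, factor=2.0):
    """Break MST edges longer than factor * mean edge length (HYPOTHETICAL rule)
    and return the remaining connected groups as sorted index lists."""
    edges = mst_edges(points)
    if not edges:
        return [[i] for i in range(len(points))]
    cutoff = factor * sum(e[2] for e in edges) / len(edges)
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j, length in edges:
        if length <= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical CC centroids: two words on one line, separated by a wide gap.
centroids = [(0, 0), (10, 0), (20, 0), (200, 0), (210, 0)]
print(group_components(centroids))  # [[0, 1, 2], [3, 4]]
```

Using a cutoff relative to the mean edge length, rather than a fixed pixel value, keeps the rule independent of text size.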
Our intention is to first segment the word image and then pass it to an existing OCR engine for recognition. This has two advantages: it avoids building a character classifier from scratch, and it reduces the word recognition task to a word segmentation task. Here, we propose three bottom-up approaches to segment a cropped word image; they differ in the features chosen at the initial stage of segmentation. A power-law transform (PLT) was applied to the pixels of grayscale born-digital images to enhance the histogram non-linearly. The recognition rate achieved on born-digital word images is 82.9%, over 20 percentage points higher than the top-performing entry (61.5%) in the ICDAR 2011 robust reading competition. Using PLT, the recognition rates are 82.7% and 64.6% for the born-digital and scene images of the ICDAR 2013 robust reading competition, respectively. In addition, we applied PLT to colour planes such as the red, green, blue, intensity and lightness planes while varying the gamma value. We call this technique nonlinear enhancement and selection of plane (NESP) for optimal segmentation; it is an improvement over PLT. NESP chooses a particular plane with a suitable gamma value based on the Fisher discrimination factor. The recognition rate is 72.8% for the scene images of the ICDAR 2011 robust reading competition, over 30 percentage points higher than the best entry (41.2%). Using NESP, the recognition rates are 81.7% and 65.9% for the born-digital and scene images of the ICDAR 2013 robust reading competition, respectively. Another technique, midline analysis and propagation of segmentation (MAPS), has also been proposed for word segmentation. Here, the middle-row pixels of the grayscale image are segmented first, and the statistics of the segmented pixels are used to assign text and non-text labels to the remaining image pixels using the min-cut method. A Gaussian model is fitted to the segmented middle-row pixels before the other pixels are assigned.
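The PLT and NESP steps can be sketched as a search over (plane, gamma) pairs. This is a sketch under stated assumptions: PLT is taken as the usual 8-bit power law 255 * (v / 255) ** gamma, each transformed plane is split with a brute-force Otsu threshold, and the "Fisher discrimination factor" is assumed to be the classic (m0 - m1)^2 / (var0 + var1). The gamma grid and the `eps` guard are hypothetical parameters.

```python
def power_law(values, gamma):
    """Power-law transform of 8-bit values: 255 * (v / 255) ** gamma, rounded."""
    return [round(255 * (v / 255) ** gamma) for v in values]

def otsu_threshold(values):
    """Brute-force Otsu: the t maximising between-class variance w0*w1*(m0-m1)^2."""
    best_t, best_score = None, -1.0
    for t in sorted(set(values))[:-1]:
        lo = [v for v in values if v <= t]
        hi = [v for v in values if v > t]
        w0, w1 = len(lo) / len(values), len(hi) / len(values)
        score = w0 * w1 * (sum(lo) / len(lo) - sum(hi) / len(hi)) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

def fisher_ratio(values, t, eps=1e-6):
    """ASSUMED Fisher criterion (m0 - m1)^2 / (var0 + var1) for the split at t."""
    if t is None:  # constant plane: nothing to separate
        return 0.0
    classes = [[v for v in values if v <= t], [v for v in values if v > t]]
    means = [sum(c) / len(c) for c in classes]
    variances = [sum((v - m) ** 2 for v in c) / len(c) for c, m in zip(classes, means)]
    return (means[0] - means[1]) ** 2 / (variances[0] + variances[1] + eps)

def select_plane(planes, gammas=(0.5, 1.0, 2.0)):
    """Return the (plane name, gamma) whose PLT output has the best Fisher ratio."""
    best = None
    for name, values in planes.items():
        for g in gammas:
            out = power_law(values, g)
            score = fisher_ratio(out, otsu_threshold(out))
            if best is None or score > best[0]:
                best = (score, name, g)
    return best[1], best[2]

# Hypothetical flattened colour planes of a word image.
planes = {"red": [10] * 5 + [200] * 5, "blue": list(range(100, 200, 10))}
print(select_plane(planes))  # ('red', 1.0)
```

The red plane separates into two tight clusters, so its Fisher ratio dominates; on real images the chosen plane and gamma vary from word to word, which is the point of selecting them per image.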
In the MAPS method, we assume that the middle-row pixels are the least affected by any of the degradations. This assumption is validated by the good word recognition rate of 71.7% on the scene images of the ICDAR 2011 robust reading competition. Using MAPS, the recognition rates are 83.8% and 66.0% for the born-digital and scene images of the ICDAR 2013 robust reading competition, respectively. The best reported result for ICDAR 2003 word images is 61.1%, obtained using custom lexicons containing the list of test words. By contrast, NESP and MAPS achieve 66.2% and 64.5% on ICDAR 2003 word images without any lexicon. With a similar custom lexicon, the recognition rates on ICDAR 2003 word images rise to 74.9% and 74.2% for NESP and MAPS, respectively. We manually segmented word images and recognized them with OCR to benchmark the maximum possible recognition rate for each database. The recognition rates of the proposed methods and the benchmark results are reported on seven publicly available word image data sets and compared with the results reported in the literature. We have designed a classifier to recognize Kannada characters from the Chars74k data set and Kannada words from our own image collection. Discrete cosine transform (DCT) and block DCT features are used to train separate classifiers. Kannada words are segmented using the same techniques (MAPS and NESP) and further segmented into groups of components, since a Kannada character may be represented in an image by a single component or by a group of components. The recognition rate on Kannada words is reported for different features, with and without a lexicon. The obtained Kannada character recognition performance (11.4%) is more than three times the best performance (3.5%) reported in the literature. This thesis has dealt with the principal aspects of camera-captured scene/born-digital text image analysis: text localization, text segmentation and word recognition.
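The MAPS idea (segment the middle row, fit Gaussians to its text and non-text pixels, then propagate labels to the rest of the image) can be sketched as below. Two substitutions are made and should be read as assumptions: the middle row is split at its mean intensity instead of by the thesis's own segmentation, and the min-cut labelling is replaced by an independent per-pixel maximum-likelihood decision, which ignores the neighbourhood smoothness a min-cut enforces. Dark text on a light background is also assumed.

```python
import math

def fit_gaussian(xs, min_var=1.0):
    """Mean and variance of a 1-D sample; the variance is floored at min_var."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, max(var, min_var)

def log_likelihood(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def maps_segment(image):
    """Label pixels 1 (text) / 0 (background) from middle-row statistics.
    Assumes dark text and that both classes occur in the middle row."""
    mid = image[len(image) // 2]
    threshold = sum(mid) / len(mid)  # HYPOTHETICAL stand-in for middle-row segmentation
    text = fit_gaussian([v for v in mid if v <= threshold])
    back = fit_gaussian([v for v in mid if v > threshold])
    # Per-pixel ML decision used here in place of the min-cut labelling.
    return [[1 if log_likelihood(v, *text) > log_likelihood(v, *back) else 0
             for v in row] for row in image]

image = [[220, 210, 215, 205],
         [40, 200, 50, 210],   # middle row
         [45, 205, 35, 215]]
print(maps_segment(image))  # [[0, 0, 0, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
```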
We have benchmarked the recognition rates of five word image data sets. We also conducted a multi-script robust reading competition as part of ICDAR 2013, aimed at determining whether text localization and segmentation methods are capable of handling any text, independent of the script.
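The Kannada classifiers in this thesis are trained on DCT and block DCT features. A minimal sketch of such features follows, assuming an orthonormal 2-D DCT-II and keeping only the top-left low-frequency coefficients of each tile. The tile size and the number of kept coefficients (`block`, `keep`) are hypothetical parameters; a whole-image DCT feature corresponds to setting `block` to the image size.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def block_dct_features(image, block=8, keep=2):
    """Tile the image into block x block tiles (dimensions assumed divisible by
    `block`), take each tile's DCT and keep its keep x keep low-frequency terms."""
    feats = []
    for by in range(0, len(image), block):
        for bx in range(0, len(image[0]), block):
            tile = [row[bx:bx + block] for row in image[by:by + block]]
            coeffs = dct2(tile)
            feats.extend(coeffs[u][v] for u in range(keep) for v in range(keep))
    return feats

flat = [[1] * 8 for _ in range(8)]  # a constant 8x8 image
feats = block_dct_features(flat, block=4, keep=2)
print(len(feats))  # 16: four tiles x four coefficients
```

For a constant tile all energy lands in the DC term, so each tile contributes one non-zero value (here 4.0) followed by three near-zero AC terms.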

    ROAM: a Rich Object Appearance Model with Application to Rotoscoping

    Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help with this task, professional rotoscoping tools rely on parametric curves that give artists much better interactive control over the definition, editing and manipulation of the segments of interest. Sticking to this prevalent rotoscoping paradigm, we propose a novel framework to capture and track the visual aspect of an arbitrary object in a scene, given a first closed outline of this object. This model combines a collection of local foreground/background appearance models spread along the outline, a global appearance model of the enclosed object, and a set of distinctive foreground landmarks. The structure of this rich appearance model allows simple initialization, efficient iterative optimization with exact minimization at each step, and online adaptation in videos. We demonstrate the merit of this framework qualitatively and quantitatively through comparisons with tools based on either dynamic segmentation with a closed curve or pixel-wise binary labelling.
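The "local foreground/background appearance models spread along the outline" can be illustrated with a small sketch. The abstract does not describe the models' form, so the following are all hypothetical choices: intensity histograms with add-one smoothing, equal class priors, and assignment of each pixel to the model of its nearest outline point. It shows why local models help: the same intensity can mean foreground on one side of the outline and background on the other.

```python
import math

class LocalAppearanceModel:
    """HYPOTHETICAL local model: foreground/background intensity histograms
    with add-one (Laplace) smoothing."""
    def __init__(self, fg_samples, bg_samples, bins=16):
        self.bins = bins
        self.fg = self._hist(fg_samples)
        self.bg = self._hist(bg_samples)

    def _bin(self, value):
        return min(value * self.bins // 256, self.bins - 1)

    def _hist(self, samples):
        counts = [1] * self.bins  # add-one smoothing: no empty bins
        for v in samples:
            counts[self._bin(v)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    def fg_posterior(self, value):
        """P(foreground | value), assuming equal priors."""
        b = self._bin(value)
        return self.fg[b] / (self.fg[b] + self.bg[b])

def posterior_along_outline(outline, models, pixel, value):
    """Score a pixel with the local model of its nearest outline point."""
    nearest = min(range(len(outline)), key=lambda i: math.dist(outline[i], pixel))
    return models[nearest].fg_posterior(value)

# Hypothetical outline with two control points: the object is dark against a
# bright background near the first point and bright against a dark one near the second.
outline = [(0, 0), (0, 100)]
models = [LocalAppearanceModel([10, 12, 15], [200, 210]),
          LocalAppearanceModel([200, 210], [10, 12, 15])]
print(posterior_along_outline(outline, models, (1, 2), 11) > 0.5)   # True
print(posterior_along_outline(outline, models, (1, 98), 11) > 0.5)  # False
```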

    Relational Reasoning Network (RRN) for Anatomical Landmarking

    Accurately identifying anatomical landmarks is a crucial step in deformation analysis and surgical planning for craniomaxillofacial (CMF) bones. Available methods require segmentation of the object of interest for precise landmarking. Unlike those methods, our purpose in this study is to perform anatomical landmarking using the inherent relations of CMF bones without explicitly segmenting them. We propose a new deep network architecture, called relational reasoning network (RRN), to accurately learn the local and global relations of the landmarks. Specifically, we are interested in learning landmarks in the CMF region: the mandible, maxilla and nasal bones. The proposed RRN works in an end-to-end manner, utilizing learned relations of the landmarks based on dense-block units and without the need for segmentation. Given a few landmarks as input, the proposed system accurately and efficiently localizes the remaining landmarks on the aforementioned bones. For a comprehensive evaluation of RRN, we used cone-beam computed tomography (CBCT) scans of 250 patients. The proposed system identifies the landmark locations very accurately even when there are severe pathologies or deformations in the bones. The proposed RRN has also revealed unique relationships among the landmarks that help us draw several inferences about the informativeness of the landmark points. RRN is invariant to the order of the landmarks, and it allowed us to discover the optimal configurations (number and location) of landmarks to be localized within the object of interest (mandible) or nearby objects (maxilla and nasal bones). To the best of our knowledge, this is the first algorithm of its kind to find anatomical relations of objects using deep learning.
    Comment: 10 pages, 6 figures, 3 tables.
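The abstract notes that RRN is invariant to the order of the landmarks. A generic relation-network pattern, f(sum over pairs i != j of g(o_i, o_j)), has this property because the pairwise sum does not depend on input order. The sketch below illustrates only that pattern; it does not reproduce RRN's dense-block architecture, and `g` and `f` are hypothetical fixed functions standing in for learned networks.

```python
from itertools import permutations

def g(a, b):
    """HYPOTHETICAL pairwise relation: a fixed linear map followed by ReLU.
    In a trained relation network this would be a learned MLP."""
    (ax, ay), (bx, by) = a, b
    h1 = max(0.0, (bx - ax) + 0.5 * (by - ay))
    h2 = max(0.0, (ay - by) + 0.5 * (ax - bx))
    return (h1, h2)

def f(summed):
    """HYPOTHETICAL readout applied to the aggregated relation vector."""
    return summed[0] - 2.0 * summed[1]

def relation_score(landmarks):
    """f(sum over ordered pairs i != j of g(o_i, o_j)); summing over all pairs
    makes the result independent of the order in which landmarks are listed."""
    total = [0.0, 0.0]
    for i, a in enumerate(landmarks):
        for j, b in enumerate(landmarks):
            if i != j:
                r = g(a, b)
                total[0] += r[0]
                total[1] += r[1]
    return f(total)

# Hypothetical 2-D landmark coordinates, scored under every input ordering.
pts = [(0, 0), (4, 2), (1, 5)]
scores = {relation_score(list(p)) for p in permutations(pts)}
print(len(scores))  # 1: the same score for all six orderings
```

The coordinates are small multiples of 0.5, so every floating-point sum here is exact and the orderings agree bit for bit; with arbitrary real data, reordering can change results in the last few bits.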