527 research outputs found
The Impact of Different Image Thresholding based Mammogram Image Segmentation- A Review
Images are sampled and discretized numerical functions. The goal of digital image processing is to improve the quality of pictorial information and to facilitate automatic machine interpretation. A digital imaging system should have basic components for image acquisition, special hardware for supporting image applications, and a large amount of memory for storage and input/output devices. Image segmentation is a widely studied field, particularly in medical applications, and still poses various challenges for researchers. Segmentation is a critical task for identifying regions suspicious of tumor in digital mammograms. Every image has different types of edges and different levels of boundaries. In image processing, the most commonly used method for extracting objects from an image is "thresholding". Thresholding is a popular tool for image segmentation because of its simplicity, especially in fields where real-time processing is required
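As a minimal illustration of the thresholding idea this abstract refers to, the sketch below labels each pixel as object or background by comparing it with a fixed cut-off. The function name `threshold_image`, the toy image, and the cut-off value 128 are assumptions made for illustration; they are not taken from the reviewed paper.

```python
def threshold_image(pixels, t):
    """Return a binary mask: 1 where the pixel intensity exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in pixels]


# Hypothetical 3x3 grayscale image with a bright region in the upper right.
image = [
    [10, 12, 200],
    [11, 220, 210],
    [9, 13, 14],
]

# 128 is an arbitrary cut-off chosen for this toy example.
mask = threshold_image(image, 128)
```

Each pixel is visited once, which is one reason thresholding suits settings where processing time matters.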
Text Recognition Past, Present and Future
Text recognition in images is a research domain that aims to develop computer programs able to read text from images. This creates a need for character recognition mechanisms, which has given rise to Document Image Analysis (DIA), the conversion of paper documents into computer-generated electronic format. In this paper we review and analyze various methods for recognizing text in different types of images, such as scene images, text images, born-digital images and video. Text recognition is an easy task for people who can read, but building a computer that performs character recognition is highly difficult. The reasons may include variability, abstraction and the absence of hard-and-fast rules that define the appearance of a visual character in text images. The rules to be applied therefore need to be deduced heuristically from the sample domain. This paper reviews various existing methods; its objective is to summarize the well-known ones
Parking lot monitoring system using an autonomous quadrotor UAV
The main goal of this thesis is to develop a drone-based parking lot monitoring system using low-cost hardware and open-source software. Similar to wall-mounted surveillance cameras, a drone-based system can monitor parking lots without affecting the flow of traffic while also offering the mobility of patrol vehicles. The Parrot AR Drone 2.0 is the quadrotor drone used in this work due to its modularity and cost efficiency. Video and navigation data (including GPS) are communicated to a host computer using a Wi-Fi connection. The host computer analyzes navigation data using a custom flight control loop to determine control commands to be sent to the drone. A new license plate recognition pipeline is used to identify license plates of vehicles from video received from the drone
Detecting Multilingual Lines of Text with Fusion Moves
This thesis proposes an optimization-based algorithm for detecting lines of text in images taken by hand-held cameras. The majority of existing methods for this problem assume alphabet-based texts (e.g. in Latin or Greek) and use heuristics specific to such texts: proximity between letters within one line, larger distance between separate lines, etc. We are interested in a more challenging problem where images combine alphabetic and logographic characters from multiple languages whose typographic rules vary widely (e.g. English, Korean, and Chinese). The significantly higher complexity of fitting multiple lines of text in different languages calls for an energy-based formulation combining a data fidelity term and a regularization prior. Our data cost combines geometric errors and likelihoods given by a classifier trained on low-level features in each language. Our regularization term encourages sparsity based on label costs. Our energy can be efficiently minimized by fusion moves. The algorithm was evaluated on a database of images from the subway of the Seoul metropolitan area and was shown to be robust
Multimodal Indexing of Presentation Videos
This thesis presents four novel methods to help users efficiently and effectively retrieve information from unstructured and unsourced multimedia sources, in particular the increasing amount and variety of presentation videos such as those in e-learning, conference recordings, corporate talks, and student presentations. We demonstrate a system to summarize, index and cross-reference such videos, and measure the quality of the produced indexes as perceived by the end users. We introduce four major semantic indexing cues: text, speaker faces, graphics, and mosaics, going beyond standard tag-based searches and simple video playbacks. This work aims at recognizing visual content "in the wild", where the system cannot rely on any additional information besides the video itself. For text, within a scene text detection and recognition framework, we present a novel locally optimal adaptive binarization algorithm, implemented with integral histograms. It determines an optimal threshold that maximizes the between-classes variance within a subwindow, with computational complexity independent of the size of the window itself. We obtain character recognition rates of 74%, as validated against ground truth of 8 presentation videos spanning over 1 hour and 45 minutes, which almost doubles the baseline performance of an open source OCR engine. For speaker faces, we detect, track, match, and finally select a humanly preferred face icon per speaker, based on three quality measures: resolution, amount of skin, and pose. We register an 87% accordance (51 out of 58 speakers) between the face indexes automatically generated from three unstructured presentation videos of approximately 45 minutes each, and human preferences recorded through Mechanical Turk experiments. 
For diagrams, we locate graphics inside frames showing a projected slide, cluster them according to an on-line algorithm based on a combination of visual and temporal information, and select and color-correct their representatives to match human preferences recorded through Mechanical Turk experiments. We register 71% accuracy (57 out of 81 unique diagrams properly identified, selected and color-corrected) on three hours of videos containing five different presentations. For mosaics, we combine two existing suturing measures to extend video images into an in-the-world coordinate system. A set of frames to be registered into a mosaic is sampled according to the PTZ camera movement, which is computed through least-squares estimation starting from the luminance constancy assumption. A local-feature-based stitching algorithm is then applied to estimate the homography among a set of video frames, and median blending is used to render pixels in overlapping regions of the mosaic. For two of these indexes, namely faces and diagrams, we present two novel MTurk-derived user data collections to determine viewer preferences, and show that they are matched in selection by our methods. The net result of this thesis allows users to search, inside a video collection as well as within a single video clip, for a segment of presentation by professor X on topic Y, containing graph Z
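The binarization step in this abstract selects the threshold that maximizes the between-classes variance within a subwindow. The sketch below shows that criterion in its plain form (commonly known as Otsu's method) applied to the pixel values of a single window. It is an illustrative assumption, not the thesis's implementation: it rebuilds the histogram from scratch and does not use the integral histograms the thesis relies on, and the name `otsu_threshold` and the sample values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t form one class, the rest form the other.
    `values` is a flat list of integer intensities in [0, levels).
    """
    # Histogram of the window's intensities.
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical window: five dark pixels and five bright pixels.
window = [10] * 5 + [200] * 5
t = otsu_threshold(window)
binary = [1 if v > t else 0 for v in window]
```

This version costs time proportional to the window size for every window, which is the dependence the integral-histogram formulation described in the abstract is meant to remove.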
An MRF Model for Binarization of Natural Scene Text
Inspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels in a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the binarization (or labelling) is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in foreground and background colours as we use a Gaussian Mixture Model in the energy function. In addition, our algorithm is efficient to compute, and adapts to a variety of document images. We show results on word images from the challenging ICDAR 2003 dataset, and compare our performance with previously reported methods. Our approach shows significant improvement in pixel level accuracy as well as OCR accuracy