
    Learning to Segment Every Thing

    Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories, all of which have box annotations but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We evaluate our approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world.

    Fine-Grained Product Class Recognition for Assisted Shopping

    Assistive solutions for a better shopping experience can improve people's quality of life, in particular that of visually impaired shoppers. We present a system that visually recognizes the fine-grained product classes of items on a shopping list in shelf images taken with a smartphone in a grocery store. Our system consists of three components: (a) We automatically recognize useful text on product packaging, e.g., product name and brand, and build a mapping of words to product classes based on the large-scale GroceryProducts dataset. When the user populates the shopping list, we automatically infer the product class of each entered word. (b) We perform fine-grained product class recognition when the user is facing a shelf. We discover discriminative patches on product packaging to differentiate between visually similar product classes and to increase the robustness against continuous changes in product design. (c) We continuously improve the recognition accuracy through active learning. Our experiments show the robustness of the proposed method against cross-domain challenges, and the scalability to an increasing number of products with minimal re-training. Comment: Accepted at ICCV Workshop on Assistive Computer Vision and Robotics (ICCV-ACVR) 201

    Segmentation of Football Video Broadcast

    This paper presents a novel segmentation system for football player detection in broadcast video. The proposed detection system is a complex solution incorporating a dominant-color-based segmentation technique for the football playfield, a 3D playfield modeling algorithm based on the Hough transform, a dedicated algorithm for player tracking, and a player detection system based on the combination of Histogram of Oriented Gradients (HOG) descriptors with Principal Component Analysis (PCA) and linear Support Vector Machine (SVM) classification. For shot classification, several classification techniques are used: SVM, artificial neural network and Linear Discriminant Analysis (LDA). The system is evaluated using HD (1280×720) resolution test material. Additionally, the performance of the proposed system is tested under different lighting conditions (including non-uniform pitch lighting and multiple player shadows) and various camera positions. The experimental results presented in this paper suggest that the combination of these techniques is a promising solution for locating and segmenting objects in broadcast video.
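The dominant-color playfield segmentation mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the authors' algorithm in detail, so the hue histogram, the bin width, the tolerance, and the function name `dominant_color_mask` are all assumptions chosen for illustration only.

```python
from collections import Counter


def dominant_color_mask(hues, bin_width=10, tolerance=1):
    """Mark pixels whose hue falls near the most frequent hue bin.

    hues: 2D list of hue values in degrees, in the range [0, 360).
    bin_width and tolerance are illustrative assumptions, not values
    taken from the paper. bin_width is expected to divide 360.
    """
    n_bins = 360 // bin_width

    def to_bin(h):
        return int(h) // bin_width % n_bins

    # Histogram of hue bins over the whole frame.
    counts = Counter(to_bin(h) for row in hues for h in row)
    dominant = counts.most_common(1)[0][0]

    def near_dominant(h):
        # Hue is circular, so measure the bin distance around the wheel.
        diff = abs(to_bin(h) - dominant)
        return min(diff, n_bins - diff) <= tolerance

    return [[near_dominant(h) for h in row] for row in hues]


# Tiny frame: mostly green hues (around 120 degrees) with two outliers.
frame = [[120, 118, 5], [125, 122, 240]]
mask = dominant_color_mask(frame)
```

In this toy frame the green pixels are marked as playfield and the two outliers are not. A real broadcast frame would first need conversion to a hue-based color space, and the mask would typically be cleaned up before any player detection step.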

    Web-based Application for Cancerous Object Segmentation in Ultrasound Images Using Active Contour Method

    Segmentation, the process of separating clinical objects from surrounding tissue in medical images, is an important step in a Computer-Aided Diagnosis (CAD) system. The CAD system is developed to assist radiologists in diagnosing cancer malignancy, which in this research is found in ultrasound (US) medical imaging. The manual segmentation process, which cannot be accessed remotely, is a limitation of the CAD system because cancer objects are screened frequently, continuously, and at all times. Therefore, this research aims to build a user-friendly web application called COSION (Cancerous Object Segmentation) that provides easy access for radiologists to segment cancer objects in US images by adopting an active contour method called HERBAC (Hybrid Edge & Region-Based Active Contour). The waterfall method was used to develop the web application, with Django as the web framework. The resulting application, Cosion, was tested on 114 radiology breast and thyroid US images. Functional, portability, efficiency, reliability, expert validation, and usability testing concluded that Cosion runs well and is suitable for use, with a functionality value of 0.9375, an average GTmetrix score of 96.43±0.66%, 100% stress testing percentage, 77.5% expert validation, and 75.8% usability. These quantitative results indicate that the COSION web application is suitable for implementation in the CAD system for US medical imaging.

    An efficient method for stamps recognition using Haar wavelet sub-bands

    Organizations such as insurance companies and government institutions handle a huge number of documents every day, so an automated stamp recognition system is required. The stamp image may appear on different backgrounds, at different sizes, and rotated in different directions, and it may also contain soft areas (patches) or small points that act as noise. The main objective of this paper is therefore to extract and recognize color stamp images. This paper proposes a method that recognizes stamps using Haar wavelet sub-bands. The devised method has four stages: 1) stamp image extraction; 2) image preprocessing; 3) feature extraction; and 4) matching. The method is implemented in C# (Microsoft Visual Studio 2012). Experiments conducted on a stamp dataset showed that the proposed method recognizes stamps very reliably when the Haar wavelet transform is used with two sets of features (i.e., 100% recognition rate for energy features and 99.93% recognition rate for low-order moments).
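As a rough illustration of what energy features over Haar wavelet sub-bands might look like, the sketch below computes a one-level 2D Haar decomposition and the energy of each sub-band. The abstract does not give the paper's exact feature definitions, decomposition depth, or normalization, so the function names, the orthonormal scaling, and the LL/LH/HL/HH naming convention (which varies between references) are assumptions.

```python
def haar_subbands(img):
    """One-level 2D Haar transform of a 2D list with even height and width.

    Returns a dict of four sub-bands, each half the size of the input.
    The division by 2 gives the orthonormal scaling, so the total energy
    of the sub-bands equals the energy of the input image.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, h, 2):
        rows = {name: [] for name in bands}
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows["LL"].append((a + b + d + e) / 2)  # local average
            rows["LH"].append((a - b + d - e) / 2)  # difference across columns
            rows["HL"].append((a + b - d - e) / 2)  # difference across rows
            rows["HH"].append((a - b - d + e) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands


def subband_energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for row in band for v in row)


bands = haar_subbands([[1, 2], [3, 4]])
features = {name: subband_energy(band) for name, band in bands.items()}
```

A feature vector built this way could then be compared against stored stamp templates in the matching stage, for instance with a distance measure; which measure the paper uses is not stated in the abstract.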

    Feedback-Based Gameplay Metrics and Gameplay Performance Segmentation: An audio-visual approach for assessing player experience.

    Gameplay metrics is a method and approach that is growing in popularity amongst the game studies research community for its capacity to assess players’ engagement with game systems. Yet, little has been done, to date, to quantify players’ responses to feedback employed by games that conveys information to players, i.e., their audio-visual streams. The present thesis introduces a novel approach to player experience assessment - termed feedback-based gameplay metrics - which seeks to gather gameplay metrics from the audio-visual feedback streams presented to the player during play. So far, gameplay metrics - quantitative data about a game state and the player's interaction with the game system - are directly logged via the game's source code. The need to utilise source code restricts the range of games that researchers can analyse. By using computer science algorithms for audio-visual processing, yet to be employed for processing gameplay footage, the present thesis seeks to extract similar metrics through the audio-visual streams, thus circumventing the need for access to source code, whilst also proposing a method that focuses on describing the way gameplay information is broadcast to the player during play. In order to operationalise feedback-based gameplay metrics, the present thesis introduces the concept of gameplay performance segmentation which describes how coherent segments of play can be identified and extracted from lengthy game play sessions. Moreover, in order to both contextualise the method for processing metrics and provide a conceptual framework for analysing the results of a feedback-based gameplay metric segmentation, a multi-layered architecture based on five gameplay concepts (system, game world instance, spatial-temporal, degree of freedom and interaction) is also introduced.
Finally, based on data gathered from game play sessions with participants, the present thesis discusses the validity of feedback-based gameplay metrics, gameplay performance segmentation and the multi-layered architecture. A software system has also been specifically developed to produce gameplay summaries based on feedback-based gameplay metrics, and examples of summaries (based on several games) are presented and analysed. The present thesis also demonstrates that feedback-based gameplay metrics can be conjointly analysed with other forms of data (such as biometry) in order to build a more complete picture of game play experience. Feedback-based gameplay metrics constitutes a post-processing approach that allows the researcher or analyst to explore the data however they wish and as many times as they wish. The method is also able to process any audio-visual file, and can therefore process material from a range of audio-visual sources. This novel methodology brings together game studies and computer sciences, not only by extending the range of games that can now be researched but also by providing a viable solution that accounts for the exact way players experience games.

    Comparing the Line Crossing Feature Approach with the Graph Approach for Invoice Automation

    In the business world, the invoice is an important document closely tied to a company's sales and purchasing activities. Every invoice a company receives is a source of information about the amount of debt or receivables the company holds. While the number of transactions in a company is still small, recording invoices in the company database manually remains feasible. Because the number of transactions can grow large, a method for recording invoices automatically can make data entry into the company database more efficient. The two approaches to automatic invoice recording compared in this paper are the Line Crossing Feature Approach and the Graph Approach. Based on a comparison of the strengths and weaknesses of each approach, the Graph Approach is, in the author's personal judgment, the more suitable approach for automatic invoice recording because of its flexibility in recognizing various types of documents. Keywords: invoice, information extraction, Line Crossing Feature Approach, Graph Approach, INFORMys metho

    VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS

    This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interest in the fields of computer vision and data mining. Structural semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, location and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations.
New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models.