26,911 research outputs found

    Rotation-invariant features for multi-oriented text detection in natural images.

    Get PDF
    Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes

    RGB-D-based Action Recognition Datasets: A Survey

    Get PDF
    Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets is a useful resource in guiding insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-\'{a}-vis limitations of the available datasets and evaluation protocols are also highlighted; resulting in a number of recommendations for collection of new datasets and use of evaluation protocols

    Evaluation campaigns and TRECVid

    Get PDF
    The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress

    Evaluating and combining digital video shot boundary detection algorithms

    Get PDF
    The development of standards for video encoding coupled with the increased power of computing mean that content-based manipulation of digital video information is now feasible. Shots are a basic structural building block of digital video and the boundaries between shots need to be determined automatically to allow for content-based manipulation. A shot can be thought of as continuous images from one camera at a time. In this paper we examine a variety of automatic techniques for shot boundary detection that we have implemented and evaluated on a baseline of 720,000 frames (8 hours) of broadcast television. This extends our previous work on evaluating a single technique based on comparing colour histograms. A description of each of our three methods currently working is given along with how they are evaluated. It is found that although the different methods have about the same order of magnitude in terms of effectiveness, different shot boundaries are detected by the different methods. We then look at combining the three shot boundary detection methods to produce one output result and the benefits in accuracy and performance that this brought to our system. Each of the methods were changed from using a static threshold value for three unconnected methods to one using three dynamic threshold values for one connected method. In a final summing up we look at the future directions for this work

    TRECVID 2004 - an overview

    Get PDF
    corecore