
    OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

    We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.

    Interaction Grammars

    Interaction Grammar (IG) is a grammatical formalism based on the notion of polarity. Polarities express the resource sensitivity of natural languages by modelling the distinction between saturated and unsaturated syntactic structures. Syntactic composition is represented as a chemical reaction guided by the saturation of polarities. It is expressed in a model-theoretic framework where grammars are constraint systems using the notion of tree description, and parsing appears as a process of building tree description models satisfying criteria of saturation and minimality.

    Robust pedestrian detection and tracking in crowded scenes

    In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids using hard thresholds by using biometrically inspired constraints and a number of plan view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured using a variety of different camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions and scene lighting conditions. The evaluation is performed against a manually generated ground truth for all sequences. Results point to the extremely accurate performance of the proposed approach in all cases.
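
    The track-matching formulation in this abstract can be illustrated with a minimal sketch. It assumes tracks and detections are plain string identifiers and that edge weights are similarity scores supplied by the caller; the function name and the brute-force search are illustrative choices, not the authors' implementation. The search prefers matchings with more pairs and breaks ties by total weight, which is one common reading of weighted maximum cardinality matching.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (track id, detection id); hypothetical identifiers


def max_cardinality_max_weight_matching(
    tracks: List[str],
    detections: List[str],
    weights: Dict[Edge, float],
) -> List[Edge]:
    """Return the matching with the most pairs, ties broken by total weight.

    Exhaustive search, so only suitable for small illustrative inputs.
    A pair (track, detection) is allowed only if it appears in `weights`.
    """
    best_size, best_total, best_pairs = 0, 0.0, []

    def search(i: int, used: Set[str], chosen: List[Edge], total: float) -> None:
        nonlocal best_size, best_total, best_pairs
        if i == len(tracks):
            # Compare (cardinality, weight) lexicographically.
            if (len(chosen), total) > (best_size, best_total):
                best_size, best_total, best_pairs = len(chosen), total, list(chosen)
            return
        track = tracks[i]
        # Option 1: leave this track unmatched in the current frame.
        search(i + 1, used, chosen, total)
        # Option 2: match it to any free detection it shares an edge with.
        for det in detections:
            if det not in used and (track, det) in weights:
                used.add(det)
                chosen.append((track, det))
                search(i + 1, used, chosen, total + weights[(track, det)])
                chosen.pop()
                used.remove(det)

    search(0, set(), [], 0.0)
    return best_pairs


# Example weights (made up): a greedy pick of the heaviest edge t1-d1 would
# leave t2 unmatched, whereas the maximum cardinality matching pairs both.
example_weights = {("t1", "d1"): 0.9, ("t1", "d2"): 0.6, ("t2", "d1"): 0.8}
matches = max_cardinality_max_weight_matching(
    ["t1", "t2"], ["d1", "d2"], example_weights
)
```

    For realistic numbers of pedestrians, a polynomial-time matching algorithm would replace the exhaustive search; the sketch only shows how cardinality takes priority over weight.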

    STV-based Video Feature Processing for Action Recognition

    In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events occurring over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research because of its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering to reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle of the implemented system. The encouraging features and improvements in operational performance registered in the experiments are discussed at the end.
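
    As a rough illustration of comparing two spatio-temporal volumes by region intersection, the sketch below stacks binary silhouette frames into a set of (x, y, t) voxels and scores two volumes by the ratio of shared voxels to all occupied voxels. The function names, the set-based representation, and the plain overlap ratio are assumptions made for illustration; the coefficient factor-boosted mechanism and the STV filtering described in the abstract are not reproduced here.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, t)


def stv_from_frames(frames: List[List[List[int]]]) -> Set[Voxel]:
    """Stack binary silhouette frames into a set of occupied voxels.

    `frames[t][y][x]` is nonzero where the subject is present at time t.
    """
    return {
        (x, y, t)
        for t, frame in enumerate(frames)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if value
    }


def region_intersection_score(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Plain overlap ratio |A ∩ B| / |A ∪ B| (an assumed, unboosted score)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two tiny 2x2 silhouettes over two frames (made-up data).
action_a = stv_from_frames([[[1, 1], [0, 0]], [[1, 0], [0, 0]]])
action_b = stv_from_frames([[[1, 0], [0, 0]], [[1, 0], [0, 0]]])
score = region_intersection_score(action_a, action_b)
```

    Storing only occupied voxels keeps the comparison proportional to the size of the silhouettes rather than the full video volume, which is in the spirit of reducing the number of voxels processed per cycle.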

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

    RichTags: A Social Semantic Tagging System

    Social tagging systems allow users to associate arbitrary keywords (or tags, or labels) with resources they want to save for future recall. Such saved items are called posts or bookmarks and usually constitute shared information in social tagging systems (although access control mechanisms might be applied as well). This means that users of a social tagging system can save and share their bookmarks with each other. The term social stresses the fact that much of the usefulness of the system relies on the data the users submit and share with each other. As a member of this category of tools, RichTags aims to overcome some weaknesses of conventional social tagging systems (folksonomies) by utilizing Semantic Web technologies. The defining characteristic of the system is that the tags constitute an ontology of meaningful concepts, which is collectively managed by the users of the system. Hence, the approach is called social semantic tagging. It overcomes the polysemy, synonymy, and basic level variation problems encountered in conventional systems. It also offers higher precision and recall. Current realisations of semantic tagging mainly attempt to derive semantics automatically from folksonomies without affecting the tagging mechanism applied in them. In contrast, RichTags's approach to semantic tagging is a social process relying on the collective intelligence of the users instead of automation methods. This means that the users collectively expand the tag vocabulary throughout the tagging task, while consistency mechanisms are applied to keep the vocabulary consistent during this expansion. The basic factor that differentiates RichTags from existing proposals for the enhancement of tags with meaning is that the primary mechanism relies on human collective intelligence and not on automation methods. However, this does not mean that the proposed automation techniques could not be combined with RichTags; on the contrary, they could be very useful to speed up the production of the initial set of semantic tags in the vocabulary. Finally, RichTags is not limited to enriching the tags with meaning, as current efforts primarily aim to do; instead, it utilizes this semantic information to improve the tagging and exploration tasks of tagging systems.