
    Microsoft COCO: Common Objects in Context

    We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
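    As a hedged illustration of how per-instance annotations of this kind are typically consumed, the sketch below decodes COCO-style segmentation masks with the pycocotools package; the annotation file path and the chosen category are placeholder assumptions, not details from the paper.

        from pycocotools.coco import COCO

        # Load a COCO-style annotation file (the path is a placeholder).
        coco = COCO("annotations/instances_train.json")

        # Look up one object category and the images containing it.
        cat_ids = coco.getCatIds(catNms=["person"])
        img_ids = coco.getImgIds(catIds=cat_ids)

        # Decode the per-instance segmentation masks for the first image.
        ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids)
        for ann in coco.loadAnns(ann_ids):
            mask = coco.annToMask(ann)  # binary HxW mask for one instance
            print(ann["category_id"], int(mask.sum()), "foreground pixels")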

    Efficient contour-based shape representation and matching

    This paper presents an efficient method for calculating the similarity between 2D closed shape contours. The proposed algorithm is invariant to translation, scale change and rotation. It can be used for database retrieval or for detecting regions with a particular shape in video sequences, and it is suitable for real-time applications. In the first stage of the algorithm, an ordered sequence of contour points approximating the shapes is extracted from the input binary images. The contours are translation- and scale-normalized, and small sets of the most likely starting points for both shapes are extracted. In the second stage, the starting points from both shapes are assigned into pairs and rotation alignment is performed. The dissimilarity measure is based on the geometrical distances between corresponding contour points. A fast sub-optimal method for solving the correspondence problem between contour points from two shapes is proposed. The dissimilarity measure is calculated for each pair of starting points, and the lowest value is taken as the final dissimilarity between the two shapes. Three different experiments are carried out using the proposed approach: letter recognition using a web camera, our own simulation of Part B of the MPEG-7 core experiment "CE-Shape-1", and detection of characters in cartoon video sequences. Results indicate that the proposed dissimilarity measure is aligned with human intuition.
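    A simplified sketch of this two-stage comparison follows: contours are translation- and scale-normalized, candidate starting points are paired by a circular shift, and each pairing is rotation-aligned before the point distances are averaged. The exhaustive shift search and the SVD-based rotation fit are illustrative stand-ins for the paper's faster sub-optimal correspondence method.

        import numpy as np

        def normalise(contour: np.ndarray) -> np.ndarray:
            """Translation- and scale-normalise an (N, 2) closed contour."""
            c = contour - contour.mean(axis=0)          # remove translation
            return c / np.linalg.norm(c, axis=1).max()  # remove scale

        def dissimilarity(a: np.ndarray, b: np.ndarray) -> float:
            """Lowest mean point distance over all starting-point pairings.

            Both contours must already be resampled to the same number
            of ordered points.
            """
            a, b = normalise(a), normalise(b)
            best = np.inf
            for shift in range(len(b)):             # candidate start points
                bs = np.roll(b, shift, axis=0)
                u, _, vt = np.linalg.svd(bs.T @ a)  # rotation alignment
                if np.linalg.det(u @ vt) < 0:       # forbid reflections
                    u[:, -1] *= -1
                r = u @ vt
                best = min(best, np.linalg.norm(bs @ r - a, axis=1).mean())
            return best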

    Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework

    This is the author accepted manuscript; the final version is available from Springer Verlag via the DOI in this record. ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2-6 December 2018. In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of whether different objects co-occur in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the-art performance in every dataset. Funding: European Union Horizon 2020; CERCA Programme of the Generalitat de Catalunya.
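    The alignment step can be sketched as follows: embeddings of detected salient objects are matched one-to-one to query embeddings by the Hungarian algorithm, and the matched distances form the loss. The cosine cost and the embedding shapes are illustrative assumptions rather than the paper's exact formulation.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def alignment_loss(queries: np.ndarray, objects: np.ndarray) -> float:
            """queries: (Q, D) query embeddings; objects: (O, D), O >= Q."""
            q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
            o = objects / np.linalg.norm(objects, axis=1, keepdims=True)
            cost = 1.0 - q @ o.T                      # cosine distance matrix
            rows, cols = linear_sum_assignment(cost)  # Hungarian matching
            return float(cost[rows, cols].sum())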

    Multimodal Content Delivery for Geo-services

    This thesis describes a body of work carried out over several research projects in the area of multimodal interaction for location-based services. Research in this area has progressed from using simulated mobile environments to demonstrate the visual modality, to the ubiquitous delivery of rich media using multimodal interfaces (geo-services). To effectively deliver these services, research focused on innovative solutions to real-world problems in a number of disciplines including geo-location, mobile spatial interaction, location-based services, rich media interfaces and auditory user interfaces. My original contributions to knowledge are made in the areas of multimodal interaction, underpinned by advances in geo-location technology and supported by the proliferation of mobile device technology into modern life. Accurate positioning is a known problem for location-based services; contributions in the area of mobile positioning demonstrate a hybrid positioning technology for mobile devices that uses terrestrial beacons to trilaterate position. Information overload is an active concern for location-based applications that struggle to manage large amounts of data; contributions in the area of egocentric visibility, which filters data based on field-of-view, demonstrate novel forms of multimodal input. One of the more pertinent characteristics of these applications is the delivery or output modality employed (auditory, visual or tactile). Further contributions are made in the area of multimodal content delivery, where multiple modalities are used to deliver information using graphical user interfaces, tactile interfaces and, more notably, auditory user interfaces. It is demonstrated how a combination of these interfaces can be used to synergistically deliver context-sensitive rich media to users in a responsive way, based on usage scenarios that consider the affordance of the device, its geographical position and bearing, and its location.
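    As a hedged sketch of the positioning idea, the snippet below recovers a 2D receiver position from beacon coordinates and measured ranges by linearised least squares; the beacon layout and the noise-free ranges are illustrative assumptions, not the hybrid technology described in the thesis.

        import numpy as np

        def trilaterate(beacons: np.ndarray, ranges: np.ndarray) -> np.ndarray:
            """beacons: (N, 2) known positions; ranges: (N,) distances, N >= 3."""
            # Subtracting the first range equation from the others removes
            # the quadratic terms, leaving a linear system A @ [x, y] = b.
            a = 2.0 * (beacons[1:] - beacons[0])
            b = (ranges[0] ** 2 - ranges[1:] ** 2
                 + (beacons[1:] ** 2).sum(axis=1) - (beacons[0] ** 2).sum())
            pos, *_ = np.linalg.lstsq(a, b, rcond=None)
            return pos

        # Example: three beacons, receiver truly at (2, 1).
        beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
        truth = np.array([2.0, 1.0])
        ranges = np.linalg.norm(beacons - truth, axis=1)
        print(trilaterate(beacons, ranges))  # ~ [2. 1.]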

    A web-based approach to engineering adaptive collaborative applications

    Current methods employed to develop collaborative applications have to make decisions and speculate about the environment in which the application will operate, the network infrastructure that will be used and the device type the application will run on. These decisions and assumptions about the environment in which collaborative applications are designed to work are not ideal. These methods produce collaborative applications that are characterised as being inflexible, working on homogeneous networks and single platforms, requiring pre-existing knowledge of the data and information types they need to use, and having a rigid choice of architecture. Future collaborative applications, on the other hand, are required to be flexible, to work in highly heterogeneous environments, and to be adaptable to different networks and a range of device types. This research investigates the role that the Web and its various pervasive technologies, along with a component-based Grid middleware, can play to address these concerns. The aim is to develop an approach to building adaptive collaborative applications that can operate in heterogeneous and changing environments. This work proposes a four-layer model that developers can use to build adaptive collaborative applications. The four-layer model is populated with Web technologies such as Scalable Vector Graphics (SVG), the Resource Description Framework (RDF), the SPARQL Protocol and RDF Query Language (SPARQL) and Gridkit, a middleware infrastructure based on the Open Overlays concept. The Middleware layer (the first layer of the four-layer model) addresses network and operating system heterogeneity; the Group Communication layer enables collaboration and data sharing; the Knowledge Representation layer proposes an interoperable RDF data modelling language and a flexible storage facility with an adaptive architecture for heterogeneous data storage; and finally, the Presentation and Interaction layer proposes a framework (Oea) for scalable and adaptive user interfaces. The four-layer model has been successfully used to build a collaborative application, called Wildfurt, that overcomes challenges facing collaborative applications. This research has demonstrated new applications for cutting-edge Web technologies in the area of building collaborative applications. SVG has been used to develop adaptive and scalable user interfaces that can operate on different device types. RDF and RDFS have also been used to design and model collaborative applications, providing a mechanism to define classes, properties and the relationships between them. A flexible and adaptable storage facility that is able to change its architecture based on the surrounding environment and requirements has also been achieved by combining the RDF technology with the Open Overlays middleware, Gridkit.
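    A minimal sketch of the Knowledge Representation layer idea, using the rdflib package: a shared resource is modelled as RDF triples and then retrieved with a SPARQL query. The namespace, class and property names are hypothetical, not the thesis's actual schema.

        from rdflib import Graph, Literal, Namespace, RDF

        EX = Namespace("http://example.org/collab#")  # hypothetical namespace
        g = Graph()

        # Describe one shared document as RDF triples.
        g.add((EX.doc1, RDF.type, EX.SharedDocument))
        g.add((EX.doc1, EX.title, Literal("Meeting notes")))
        g.add((EX.doc1, EX.owner, Literal("alice")))

        # Retrieve shared documents with SPARQL.
        q = """
            SELECT ?doc ?title WHERE {
                ?doc a <http://example.org/collab#SharedDocument> ;
                     <http://example.org/collab#title> ?title .
            }
        """
        for doc, title in g.query(q):
            print(doc, title)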

    Overview of ImageCLEFlifelog 2019: Solve My Life Puzzle and Lifelog Moment Retrieval

    This paper describes ImageCLEFlifelog 2019, the third edition of the Lifelog task. In this edition, the task was composed of two subtasks (challenges): the Lifelog Moments Retrieval (LMRT) challenge, which followed the same format as in the previous edition, and Solve My Life Puzzle (Puzzle), a brand new challenge focused on rearranging lifelog moments in temporal order. ImageCLEFlifelog 2019 received noticeably more submissions than the previous editions, with ten teams participating and a total of 109 runs submitted.

    A framework for the assembly and delivery of multimodal graphics in E-learning environments

    In recent years educators and education institutions have embraced E-Learning environments as a method of delivering content to and communicating with their learners. Particular attention needs to be paid to the accessibility of the content that each educator provides. In relation to graphics, content providers are instructed to provide textual alternatives for each graphic using either the "alt" attribute or the "longdesc" attribute of the HTML IMG tag. This is not always suitable for graphical concepts inherent in technical topics, due to the spatial nature of the information. As there is currently no suggested alternative to the use of textual descriptions in E-Learning environments, blind learners are at a significant disadvantage when attempting to learn Science, Technology, Engineering or Mathematics (STEM) subjects online. A new approach is required that will provide blind learners with the same learning capabilities enjoyed by their sighted peers in relation to graphics. Multimodal graphics combine the modalities of sound and touch in order to deliver graphical concepts to blind learners. Although they have proven successful, they can be time-consuming to create and often require expertise in accessible graphic design. This thesis proposes an approach based on mainstream E-Learning techniques that can support non-experts in the assembly of multimodal graphics. The approach is known as the Multimodal Graphic Assembly and Delivery Framework (MGADF). It exploits a component-based Service Oriented Architecture (SOA) to provide non-experts with the ability to assemble multimodal graphics and integrate them into mainstream E-Learning environments. This thesis details the design of the system architecture, information architecture and methodologies of the MGADF. Proof-of-concept interfaces were implemented, based on the design, that clearly demonstrate the feasibility of the approach. The interfaces were used in an end-user evaluation that assessed the benefits of a component-based approach for non-expert multimodal graphic producers.
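    As a small illustration of the textual-alternative markup referred to above, the snippet below generates an HTML IMG element whose "alt" and "longdesc" attributes carry the description; the file names and description text are hypothetical.

        import xml.etree.ElementTree as ET

        # Build an accessible IMG element (all values are placeholders).
        img = ET.Element("img", {
            "src": "circuit-diagram.png",
            "alt": "Series circuit with a battery, a switch and a lamp",
            "longdesc": "circuit-diagram-description.html",
        })
        print(ET.tostring(img, method="html").decode())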