Multiple path prediction for traffic scenes using LSTMs and mixture density models
This work presents an analysis of predicting multiple future paths of moving objects in traffic scenes by leveraging Long Short-Term Memory architectures (LSTMs) and Mixture Density Networks (MDNs) in a single-shot manner. Path prediction allows estimating the future positions of objects, which is useful in important applications such as security monitoring systems, Advanced Driver Assistance Systems and assistive technologies. Typical approaches use observed positions (tracklets) of objects in video frames to predict their future paths as a sequence of position values, which can be treated as a time series. LSTMs have achieved good performance when dealing with time series. However, LSTMs have the limitation of only predicting a single path per tracklet. Path prediction is not a deterministic task and requires predicting with a level of uncertainty. Predicting multiple paths instead of a single one is therefore a more realistic way of approaching this task. In this work, predicting a set of future paths with associated uncertainty was achieved by combining LSTMs and MDNs. The evaluation was made on the KITTI and the CityFlow datasets on three types of objects, four prediction horizons and two different points of view (image coordinates and bird's-eye view).
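The combination the abstract describes can be sketched as follows: an LSTM emits a single parameter vector which an MDN head interprets as a Gaussian mixture over future positions, from which several candidate paths are sampled. This is a minimal illustrative sketch, not the paper's implementation; the parameter layout and per-step independence are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mdn_split(params, n_mix=3, dim=2):
    """Split a raw MDN output vector into mixture weights, means and sigmas.
    Layout (an assumption, not the paper's exact parameterisation):
    [n_mix logits | n_mix*dim means | n_mix*dim log-sigmas]."""
    logits = params[:n_mix]
    means = params[n_mix:n_mix + n_mix * dim].reshape(n_mix, dim)
    log_sigma = params[n_mix + n_mix * dim:].reshape(n_mix, dim)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax -> mixture weights
    return weights, means, np.exp(log_sigma)

def sample_paths(params, horizon=4, n_paths=5, n_mix=3, dim=2):
    """Draw several plausible future paths from one MDN prediction,
    one mixture component drawn independently per step."""
    weights, means, sigmas = mdn_split(params, n_mix, dim)
    paths = np.empty((n_paths, horizon, dim))
    for p in range(n_paths):
        for t in range(horizon):
            k = rng.choice(n_mix, p=weights)
            paths[p, t] = rng.normal(means[k], sigmas[k])
    return paths

# A raw parameter vector such as an LSTM's final layer might emit.
raw = rng.normal(size=3 + 3 * 2 + 3 * 2)
paths = sample_paths(raw)
print(paths.shape)  # (5, 4, 2): 5 candidate paths, 4 steps, (x, y)
```

Sampling from the mixture rather than taking a single argmax is what turns one tracklet into a set of futures with associated uncertainty.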
"Colearning" - Collaborative Open Learning through OER and Social Media
This chapter introduces the concept of coLearning and discusses how open learning networks can collaboratively produce, share and reuse OER through social media.
COLEARNING OBJECTIVES
The aim of this investigation is to identify new forms of collaboration, as well as strategies that make the production and adaptation processes of OER more explicit, so that anyone in a social network can contribute.
REUSABILITY
This open content is an adapted version of a conference paper for OCW conference 2012, which was created by the same authors. This chapter can be reused by:
Educators who would like to create reusable OER (images, videos, maps, units)
Learners who are interested in tools for reusing and adapting OER
Content developers who are looking for different media to enrich OER
Social network users who would like to produce and share open media content
A Framework to Enable the Semantic Inferencing and Querying of Multimedia Content
Cultural institutions, broadcasting companies, academic, scientific and defence organisations are producing vast quantities of digital multimedia content. With this growth in audiovisual material comes the need for standardised representations encapsulating the rich semantic meaning required to enable the automatic filtering, machine processing, interpretation and assimilation of multimedia resources. Although significant progress has been made in recent years on automatic segmentation and low-level feature recognition for multimedia, generating high-level descriptions remains difficult and manual creation is expensive. Within this paper we describe the application of semantic web technologies to enable the generation of high-level, domain-specific, semantic descriptions of multimedia content from low-level, automatically-extracted features. By applying the knowledge reasoning capabilities provided by ontologies and inferencing rules to large multimedia data sets generated by scientific research communities, we hope to expedite solutions to the complex scientific problems they face.
A Tool for Creating, Editing and Tracking Virtual SMIL Presentations
The ability to easily find, edit and re-use content adds significant value to that content. When the content consists of complex multimedia objects, both the difficulty of implementing such capabilities and the added value are multiplied. The work described here is based on an archive of SMIL presentations, built and indexed using tools developed by the authors, which enabled digitized videos of lectures to be automatically synchronized with their corresponding PowerPoint slides and which recorded metadata about the lecture context, the origin of the video and slide files, and the temporal alignment information. As the archive grew, it became clear that the ability to edit, update or customize existing presentations by deleting, adding or replacing specific slides, without re-filming the entire lecture, was required. This paper describes an application that provides such functionality by enabling the easy editing, repurposing and tracking of presentations for web-based distance learning.
Conservation of effort in feature selection for image annotation
This paper describes an evaluation of a number of subsets of features for the purpose of image annotation using a non-parametric density estimation algorithm (described in). By applying some general recommendations from the literature and through evaluating a range of low-level visual feature configurations and subsets, we achieve an improvement in performance, measured by the mean average precision, from 0.2861 to 0.3800. We demonstrate the significant impact that the choice of visual or low-level features can have on an automatic image annotation system. There is often a large set of possible features that may be used, and a corresponding large number of variables that can be configured or tuned for each feature, in addition to other options for the annotation approach. Judicious and effective selection of features for image annotation is required to achieve the best performance with the least user design effort. We discuss the performance of the chosen feature subsets in comparison with previous results and propose some general recommendations observed from the work so far.
People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting
In this paper we propose a technique to adapt a convolutional neural network (CNN) based object counter to additional visual domains and object types while still preserving the original counting function. Domain-specific normalisation and scaling operators are trained to allow the model to adjust to the statistical distributions of the various visual domains. The developed adaptation technique is used to produce a singular patch-based counting regressor capable of counting various object types including people, vehicles, cell nuclei and wildlife. As part of this study a challenging new cell counting dataset in the context of tissue culture and patient diagnosis is constructed. This new collection, referred to as the Dublin Cell Counting (DCC) dataset, is the first of its kind to be made available to the wider computer vision community. State-of-the-art object counting performance is achieved in both the Shanghaitech (parts A and B) and Penguins datasets while competitive performance is observed on the TRANCOS and Modified Bone Marrow (MBM) datasets, all using a shared counting model.
Comment: 10 pages
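The adaptation idea — shared counting weights plus per-domain normalisation/scaling operators — can be sketched as below. This is a toy stand-in under stated assumptions: the real model is a CNN, whereas the linear layer and per-domain affine parameters here only illustrate why adding a new domain does not overwrite earlier ones.

```python
import numpy as np

class SharedCounter:
    """Toy patch-based counting regressor with domain-specific
    normalisation/scaling operators (illustrative, not the paper's model)."""
    def __init__(self, n_features, domains):
        rng = np.random.default_rng(0)
        self.w = rng.normal(size=n_features)      # shared, frozen weights
        # One (scale, shift) pair per domain: only these are trained when
        # adapting, so previously learned domains are preserved.
        self.domain_params = {d: (np.ones(n_features), np.zeros(n_features))
                              for d in domains}

    def add_domain(self, name):
        """Adapting to a new visual domain adds fresh scale/shift operators
        without touching shared weights or other domains' parameters."""
        self.domain_params[name] = (np.ones(self.w.size), np.zeros(self.w.size))

    def count(self, features, domain):
        scale, shift = self.domain_params[domain]
        normed = features * scale + shift          # domain-specific modulation
        return float(max(normed @ self.w, 0.0))    # non-negative count estimate

model = SharedCounter(n_features=8, domains=["crowd", "cells"])
model.add_domain("penguins")                       # no forgetting of "crowd"
x = np.abs(np.random.default_rng(1).normal(size=8))
print(model.count(x, "crowd"), model.count(x, "penguins"))
```

The design choice mirrors the abstract: capacity for a new domain lives entirely in the small per-domain operators, so the shared regressor serves all domains from one set of weights.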
Dynamic Generation of Intelligent Multimedia Presentations Through Semantic Inferencing
This paper first proposes a high-level architecture for semi-automatically generating multimedia presentations by combining semantic inferencing with multimedia presentation generation tools. It then describes a system, based on this architecture, which was developed as a service to run over OAI archives - but is applicable to any repositories containing mixed-media resources described using Dublin Core. By applying an iterative sequence of searches across the Dublin Core metadata published by the OAI data providers, semantic relationships can be inferred between the mixed-media objects which are retrieved. Using predefined mapping rules, these semantic relationships are then mapped to spatial and temporal relationships between the objects. The spatial and temporal relationships are expressed within SMIL files which can be replayed as multimedia presentations. Our underlying hypothesis is that by using automated computer processing of metadata to organize and combine semantically-related objects within multimedia presentations, the system may be able to generate new knowledge by exposing previously unrecognized connections. In addition, the use of multilayered information-rich multimedia to present the results enables faster and easier information browsing, analysis, interpretation and deduction by the end-user.
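The final mapping step — inferred semantic relationships becoming spatial/temporal SMIL structure — can be illustrated with a small sketch. The relation names and rules here are hypothetical stand-ins, not the paper's actual mapping rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: a semantic relation between two resources
# determines whether they are presented together (parallel, <par>) or
# one after the other (sequential, <seq>) in the generated SMIL.
RELATION_TO_SMIL = {"depicts": "par", "precedes": "seq"}

def to_smil(resource_a, resource_b, relation):
    """Build a minimal SMIL body combining two media objects according to
    an inferred semantic relation (illustrative only)."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    group = ET.SubElement(body, RELATION_TO_SMIL[relation])
    ET.SubElement(group, "img", src=resource_a)
    ET.SubElement(group, "video", src=resource_b)
    return ET.tostring(smil, encoding="unicode")

print(to_smil("map.png", "lecture.mpg", "depicts"))
```

A full system would chain this after metadata harvesting and inferencing; the point is only that each relation type deterministically selects a SMIL layout construct.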
MediaEval 2019: concealed FGSM perturbations for privacy preservation
This work tackles the Pixel Privacy task put forth by MediaEval 2019. Our goal is to decrease the accuracy of a classification algorithm while preserving the original image quality. We use the fast gradient sign method, which normally has a corrupting influence on image appeal, and devise two methods to minimize the damage. The first approach uses a map that is a combination of salient and flat areas. Perturbations are more noticeable in these locations, and so are directed away from them. The second approach adds the gradient of an aesthetic algorithm to the gradient of the attacking algorithm to guide the perturbations towards a direction that preserves appeal. We make our code available at: https://git.io/JesX
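The first approach described above — directing FGSM perturbations away from regions where they would be visible — can be sketched with a masked sign step. The mask combination and epsilon value are illustrative assumptions; the paper's actual saliency/flatness map is not reproduced here.

```python
import numpy as np

def fgsm_perturb(image, grad, epsilon=0.02, mask=None):
    """Fast gradient sign method step. `mask` (values in [0, 1]) down-weights
    perturbations in salient/flat regions where they would be most noticeable --
    a simplified stand-in for the paper's combined map."""
    step = epsilon * np.sign(grad)
    if mask is not None:
        step = step * (1.0 - mask)   # steer perturbation away from masked areas
    return np.clip(image + step, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.uniform(size=(4, 4))
grad = rng.normal(size=(4, 4))                   # gradient of the classifier loss
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0    # protect a "salient" patch
adv = fgsm_perturb(img, grad, mask=mask)
print(np.allclose(adv[1:3, 1:3], img[1:3, 1:3]))  # True: protected pixels intact
```

Pixels under the mask are left untouched while the rest move by at most epsilon, trading a little attack strength for preserved image appeal.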
Gender Bias in Multimodal Models: A Transnational Feminist Approach Considering Geographical Region and Culture
Deep learning based visual-linguistic multimodal models such as Contrastive Language Image Pre-training (CLIP) have become increasingly popular recently and are used within text-to-image generative models such as DALL-E and Stable Diffusion. However, gender and other social biases have been uncovered in these models, and this has the potential to be amplified and perpetuated through AI systems. In this paper, we present a methodology for auditing multimodal models that considers gender, informed by concepts from transnational feminism, including regional and cultural dimensions. Focusing on CLIP, we found evidence of significant gender bias with varying patterns across global regions. Harmful stereotypical associations were also uncovered related to visual cultural cues and labels such as terrorism. Levels of gender bias uncovered within CLIP for different regions aligned with global indices of societal gender equality, with those from the Global South reflecting the highest levels of gender bias.
Comment: Selected for publication at the Aequitas 2023: Workshop on Fairness and Bias in AI, co-located with ECAI 2023, Kraków, Poland.
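A common building block for this kind of audit is an embedding-space association score, sketched below. The score, prompt wording and random embeddings are all illustrative assumptions, not the paper's methodology.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_association(image_emb, fem_text_emb, masc_text_emb):
    """Signed association score: positive means the image embedding lies
    closer to the 'feminine' text embedding, negative closer to 'masculine'.
    Aggregated over images from one region, scores like this can surface
    regional differences in bias (a simplified stand-in for a full audit)."""
    return cosine(image_emb, fem_text_emb) - cosine(image_emb, masc_text_emb)

rng = np.random.default_rng(0)
img_emb = rng.normal(size=512)   # e.g. a CLIP image embedding
fem = rng.normal(size=512)       # text embedding, e.g. "a photo of a woman"
masc = rng.normal(size=512)      # text embedding, e.g. "a photo of a man"
print(round(gender_association(img_emb, fem, masc), 4))
```

In a real audit the embeddings would come from CLIP's image and text encoders, and scores would be aggregated per region before comparison against societal indices.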