Automatic tagging and geotagging in video collections and communities
Automatically generated tags and geotags hold great promise
to improve access to video collections and online communi-
ties. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, a reference algorithm used within MediaEval 2010 is presented, together with comments on lessons learned. The Tagging Task (Professional) involves automatically matching episodes in a collection of Dutch television programs with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task (Wild Wild Web) involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
Employment of artificial intelligence mechanisms for e-Health systems in order to obtain vital signs and detect diseases from medical images improving the processes of online consultations and diagnosis
Nowadays, e-Health web applications give doctors access to different types of features, such as checking which medication a patient has taken or performing online consultations. Internet systems for healthcare can be improved by using artificial intelligence mechanisms to detect diseases and obtain biological data, giving medical professionals important information that facilitates the diagnosis process and the choice of the correct treatment for each particular person.
The proposed research work presents an approach that is innovative compared to traditional platforms: it provides vital signs online in real time, access to a web stethoscope, and a medical image uploader that uses deep learning methods to predict whether a given disease is present, and it also allows the visualization of a patient's complete historical data.
This dissertation defends the concept of online consultations by using software engineering practices to provide functionalities that complement traditional methods of medical diagnosis.
Vital signs are obtained via artificial intelligence, using a computer camera as the sensor. This methodology requires that the user remain at rest during the measurements.
This investigation led to the conclusion that, in the future, many medical processes will most likely be performed online, a practice considered extremely helpful for the analysis and treatment of contagious diseases and for cases that require constant monitoring.
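The abstract does not specify how the camera-based vital signs are computed; a common approach is remote photoplethysmography (rPPG), in which the pulse is recovered from tiny color changes in the skin. The sketch below, with a synthetic signal standing in for real camera data, shows the core frequency-analysis step under that assumption: average the green channel of each frame, then pick the dominant frequency in the plausible heart-rate band.

```python
import numpy as np

def estimate_heart_rate(green_means, fps):
    """Estimate heart rate (BPM) from per-frame mean green-channel values.

    A minimal rPPG sketch (an assumption for illustration, not the
    dissertation's actual method): the blood-volume pulse slightly
    modulates skin color, so the dominant frequency of the detrended
    green signal within the plausible heart-rate band (0.7-3.0 Hz,
    i.e. 42-180 BPM) approximates the pulse rate.
    """
    signal = np.asarray(green_means, dtype=float)
    signal = signal - signal.mean()          # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 3.0)   # plausible heart-rate band
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq                  # Hz -> beats per minute

# Synthetic check: a 1.2 Hz pulse (72 BPM) sampled at 30 fps for 10 s.
fps = 30
t = np.arange(0, 10, 1.0 / fps)
fake_green = 120 + 0.5 * np.sin(2 * np.pi * 1.2 * t)
print(round(estimate_heart_rate(fake_green, fps)))  # → 72
```

This also illustrates why the abstract requires the user to be at rest: motion adds frequency content that can drown out the subtle pulse signal.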
Learning to Hash-tag Videos with Tag2Vec
User-given tags or labels are valuable resources for semantic understanding
of visual media such as images and videos. Recently, a new type of labeling
mechanism known as hash-tags has become increasingly popular on social media
sites. In this paper, we study the problem of generating relevant and useful
hash-tags for short video clips. Traditional data-driven approaches for tag
enrichment and recommendation use direct visual similarity for label transfer
and propagation. We attempt to learn a direct low-cost mapping from videos to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a
corpus of 10 million hash-tags. We then train an embedding function to map
video features to the low-dimensional Tag2vec space. We learn this embedding
for 29 categories of short video clips with hash-tags. A query video without
any tag-information can then be directly mapped to the vector space of tags
using the learned embedding and relevant tags can be found by performing a
simple nearest-neighbor retrieval in the Tag2Vec space. We validate the
relevance of the tags suggested by our system qualitatively and quantitatively
with a user study
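The retrieval step described above, mapping a query video into the Tag2Vec space and finding its nearest tags, can be sketched as a cosine nearest-neighbor lookup. The tag vectors and query embedding below are made up for illustration; in the paper they come from skip-gram training and a learned video-to-tag embedding function.

```python
import numpy as np

# Hypothetical toy Tag2Vec space: tag -> vector (the paper learns these
# with a skip-gram model over a 10M hash-tag corpus; the vectors below
# are invented for illustration).
tag_vectors = {
    "#skateboarding": np.array([0.9, 0.1, 0.0]),
    "#skatepark":     np.array([0.85, 0.2, 0.05]),
    "#cooking":       np.array([0.0, 0.1, 0.95]),
}

def nearest_tags(video_embedding, tag_vectors, k=2):
    """Rank tags by cosine similarity to an embedded query video."""
    v = video_embedding / np.linalg.norm(video_embedding)
    scores = {
        tag: float(v @ (vec / np.linalg.norm(vec)))
        for tag, vec in tag_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# A query video whose learned embedding lands near the skating tags.
query = np.array([0.88, 0.15, 0.02])
print(nearest_tags(query, tag_vectors))  # → ['#skateboarding', '#skatepark']
```

The appeal of this design is that tag suggestion for an unseen, untagged video reduces to one embedding pass plus a nearest-neighbor search.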
Quaero at TRECVID 2013: Semantic Indexing and Instance Search
The Quaero group is a consortium of French and German organizations working on multimedia indexing and retrieval. LIG participated in the semantic indexing main task, the localization task, and the concept pair task. LIG also participated in the organization of the main task. This paper describes these participations, which are quite similar to our previous year's participations. For the semantic indexing main task, our approach uses a six-stage processing pipeline to compute scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of images or shots that are the most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We used a number of different descriptors and a hierarchical fusion strategy. We also used conceptual feedback by adding a vector of classification scores to the pool of descriptors. The best Quaero run has a Mean Inferred Average Precision of 0.2848, which ranked us 2nd out of 26 participants. We also co-organized the TRECVid SIN 2013 task and the collaborative annotation.
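The fusion stages of such a pipeline can be illustrated with a generic weighted late-fusion sketch (this is not the exact Quaero implementation; the descriptor names and scores below are hypothetical): per-descriptor classifier scores are normalized to a common range and then averaged into one ranking score per shot.

```python
import numpy as np

def late_fusion(score_lists, weights=None):
    """Weighted late fusion of per-descriptor concept scores.

    A generic sketch of descriptor-level fusion (an illustration, not
    the Quaero system): each classifier emits one score per shot;
    scores are min-max normalized per descriptor, then averaged with
    optional weights to yield a single score per shot.
    """
    fused = np.zeros(len(score_lists[0]))
    if weights is None:
        weights = [1.0] * len(score_lists)
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        rng = s.max() - s.min()
        if rng > 0:
            s = (s - s.min()) / rng   # min-max normalize per descriptor
        fused += w * s
    return fused / sum(weights)

# Three hypothetical descriptors scoring four shots for one concept.
color_scores = [0.1, 0.9, 0.4, 0.2]
sift_scores  = [5.0, 40.0, 10.0, 35.0]   # unnormalized classifier margins
audio_scores = [0.2, 0.8, 0.3, 0.6]
fused = late_fusion([color_scores, sift_scores, audio_scores])
ranking = np.argsort(-fused)             # best shot first
print(ranking[0])  # → 1 (shot 1 scores highest on all descriptors)
```

Per-descriptor normalization is the key step: without it, descriptors with large raw score ranges (like the margins above) would dominate the fused ranking.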
Focused image search in the social Web.
Recently, social multimedia-sharing websites, which allow users to upload, annotate, and share online photo or video collections, have become increasingly popular. The user tags or annotations constitute the new multimedia meta-data. We present an image search system that exploits both the textual and the visual information of images. First, we use focused crawling and DOM-tree-based web data extraction methods to extract image textual features from social networking image collections. Second, we propose the concept of visual words to handle the image's visual content for fast indexing and searching. We also develop several user-friendly search options that allow users to query the index using words and image feature descriptions (visual words). The developed image search system tries to bridge the gap between scalable industrial image search engines, which are based on keyword search, and the slower content-based image retrieval systems developed mostly in academia and designed to search based on image content only. We have implemented a working prototype by crawling and indexing over 16,056 images from flickr.com, one of the most popular image-sharing websites. Our experimental results on this working prototype confirm the efficiency and effectiveness of the methods that we propose.
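The "visual words" idea named above is conventionally implemented as a bag-of-visual-words model: local descriptors are quantized against a vocabulary of centroids (usually learned by clustering), and each image is indexed by the resulting histogram, which can then be searched like text keywords. A minimal sketch, with a toy 2-D vocabulary standing in for a learned one:

```python
import numpy as np

def bag_of_visual_words(descriptors, vocabulary):
    """Quantize an image's local descriptors into a visual-word histogram.

    A standard bag-of-visual-words sketch (the abstract names the
    concept but not its exact parameters): each local descriptor is
    assigned to its nearest vocabulary centroid, and the image is
    indexed by the resulting word-count histogram.
    """
    d = np.asarray(descriptors, dtype=float)      # (n_desc, dim)
    vocab = np.asarray(vocabulary, dtype=float)   # (n_words, dim)
    # Pairwise squared distances, descriptor -> centroid.
    dists = ((d[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)                  # nearest visual word
    return np.bincount(words, minlength=len(vocab))

# Toy vocabulary of 3 visual words in a 2-D descriptor space.
vocab = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
descs = [[0.2, 0.1], [9.8, 0.3], [10.2, -0.1], [0.1, 9.9]]
print(bag_of_visual_words(descs, vocab))  # → [1 2 1]
```

Because the histogram is a sparse word-count vector, it can be served by the same inverted-index machinery used for keyword search, which is exactly the bridge to text-style querying the abstract describes.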
Poet: Product-oriented Video Captioner for E-commerce
In e-commerce, a growing number of user-generated videos are used for product
promotion. How to generate video descriptions that narrate the user-preferred
product characteristics depicted in the video is vital for successful
promoting. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not amenable to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first
represents the videos as product-oriented spatial-temporal graphs. Then, based
on the aspects of the video-associated product, we perform knowledge-enhanced
spatial-temporal inference on those graphs for capturing the dynamic change of
fine-grained product-part characteristics. The knowledge leveraging module in
Poet differs from the traditional design by performing knowledge filtering and
dynamic memory modeling. We show that Poet achieves consistent performance
improvement over previous methods concerning generation quality, product
aspects capturing, and lexical diversity. Experiments are performed on two
product-oriented video captioning datasets, buyer-generated fashion video
dataset (BFVD) and fan-generated fashion video dataset (FFVD), collected from
Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems.
Comment: 10 pages, 3 figures; to appear in the ACM MM 2020 proceedings
FrameProv: Towards End-To-End Video Provenance
Video feeds are often deliberately used as evidence, as in the case of CCTV
footage; but more often than not, the existence of footage of a supposed event
is perceived as proof of fact in the eyes of the public at large. This reliance
represents a societal vulnerability given the existence of easy-to-use editing
tools and means to fabricate entire video feeds using machine learning. And, as
the recent barrage of fake news and fake porn videos has shown, this isn't merely an academic concern; it is actively being exploited. I posit that this
exploitation is only going to get more insidious. In this position paper, I
introduce a long term project that aims to mitigate some of the most egregious
forms of manipulation by embedding trustworthy components in the video
transmission chain. Unlike earlier works, I am not aiming to do tamper
detection or other forms of forensics -- approaches I think are bound to fail
in the face of the reality of necessary editing and compression -- instead, the
aim here is to provide a way for the video publisher to prove the integrity of
the video feed as well as make explicit any edits they may have performed. To
do this, I present a novel data structure, a video-edit specification language
and supporting infrastructure that provides end-to-end video provenance, from
the camera sensor to the viewer. I have implemented a prototype of this system
and am in talks with journalists and video editors to discuss the best ways
forward with introducing this idea to the mainstream
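The paper's actual data structure is not reproduced here, but the underlying provenance idea can be illustrated with a simple hash chain over frames (an assumption for illustration, not FrameProv's real design): each link commits to the previous one, so any dropped, reordered, or altered frame changes every subsequent link and is detectable by the viewer.

```python
import hashlib

def chain_frames(frames):
    """Build a SHA-256 hash chain over a sequence of video frames.

    An illustrative sketch of end-to-end integrity, not the paper's
    actual structure: each frame's digest incorporates the previous
    link, so tampering anywhere breaks the chain from that point on.
    """
    link = b"\x00" * 32                       # fixed genesis value
    links = []
    for frame in frames:
        link = hashlib.sha256(link + frame).digest()
        links.append(link)
    return links

frames = [b"frame-0", b"frame-1", b"frame-2"]
original = chain_frames(frames)
tampered = chain_frames([b"frame-0", b"frame-X", b"frame-2"])
# The link before the edit matches; every link after it differs.
print(original[0] == tampered[0], original[2] == tampered[2])  # → True False
```

A real system in this spirit would additionally sign the links in trusted hardware and record declared edits explicitly, since (as the paper notes) legitimate editing and compression must be accommodated rather than flagged as tampering.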
Natural language processing based advanced method of unnecessary video detection
In this study we describe a process for identifying unnecessary videos using an advanced combined method of natural language processing and machine learning. The system also includes a framework that contains analytics databases, helps to measure statistical accuracy, and can detect, accept, or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps: first from video to MPEG-1 Audio Layer 3 (MP3), and then from MP3 to WAV format. We use the text-processing part of natural language processing to analyze and prepare the data set. We use both Naive Bayes and logistic regression classification algorithms in this detection system to determine the best accuracy for our system. In our research, the MP4 video data was converted to plain text using Python library functions. This brief study discusses the identification of unauthorized, unsocial, unnecessary, unfinished, and malicious videos from spoken video recordings. By analyzing our data sets with this model, we can decide which videos should be accepted or rejected for further action.
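The classification stage described above can be sketched with a minimal multinomial Naive Bayes over transcript text. The authors' exact features, preprocessing, and data are not specified, so the toy transcripts and labels below are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train a multinomial Naive Bayes text classifier.

    A minimal stand-in for the abstract's Naive Bayes stage: per-class
    word counts with Laplace smoothing over a shared vocabulary.
    """
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return classes, word_counts, doc_counts, vocab, len(docs)

def classify_nb(model, doc):
    """Return the class with the highest smoothed log-probability."""
    classes, word_counts, doc_counts, vocab, n_docs = model
    best, best_lp = None, float("-inf")
    for c in classes:
        total = sum(word_counts[c].values())
        lp = math.log(doc_counts[c] / n_docs)   # class prior
        for w in doc.lower().split():
            # Laplace-smoothed word likelihood.
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy transcripts standing in for text extracted from video audio.
docs = ["buy pills cheap scam offer", "lecture on machine learning",
        "scam offer click now", "tutorial on python programming"]
labels = ["reject", "accept", "reject", "accept"]
model = train_nb(docs, labels)
print(classify_nb(model, "cheap scam pills"))  # → reject
```

In practice the transcript would come from a speech-recognition pass over the WAV audio described above, and logistic regression could be trained on the same word-count features for the accuracy comparison the study performs.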
AXES at TRECVID 2012: KIS, INS, and MED
The AXES project participated in the interactive instance search task (INS), the known-item search task (KIS), and the multimedia event detection task (MED) for TRECVid 2012. As in our TRECVid 2011 system, we used nearly identical search systems and user interfaces for both INS and KIS. Our interactive INS and KIS systems focused this year on using classifiers trained at query time with positive examples collected from external search engines. Participants in our KIS experiments were media professionals from the BBC; our INS experiments were carried out by students and researchers at Dublin City University. We performed comparatively well in both experiments. Our best KIS run found 13 of the 25 topics, and our best INS runs outperformed all other submitted runs in terms of P@100. For MED, the system presented was based on a minimal number of low-level descriptors, which we chose to be as large as computationally feasible. These descriptors are aggregated to produce high-dimensional video-level signatures, which are used to train a set of linear classifiers. Our MED system achieved the second-best score of all submitted runs in the main track, and best score in the ad-hoc track, suggesting that a simple system based on state-of-the-art low-level descriptors can give relatively high performance. This paper describes in detail our KIS, INS, and MED systems and the results and findings of our experiments
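The MED design described above, aggregating low-level descriptors into a high-dimensional video-level signature and scoring it with a linear classifier, can be sketched as follows. Average pooling and the toy descriptors below stand in for the much richer aggregation the actual system uses.

```python
import numpy as np

def video_signature(frame_descriptors):
    """Aggregate per-frame descriptors into one video-level signature.

    A simplified stand-in for the MED pipeline in the abstract:
    average pooling plus L2 normalization (the real system aggregates
    several descriptor types into far higher-dimensional signatures).
    """
    sig = np.asarray(frame_descriptors, dtype=float).mean(axis=0)
    return sig / np.linalg.norm(sig)

def score_event(signature, w, b):
    """Score a video signature with a trained linear event classifier."""
    return float(signature @ w + b)

# Toy frame descriptors for two videos; a hypothetical classifier
# (w, b) for a single target event.
video_a = [[1.0, 0.1], [0.9, 0.2]]    # resembles the target event
video_b = [[0.1, 1.0], [0.2, 0.9]]
w, b = np.array([1.0, -1.0]), 0.0
print(score_event(video_signature(video_a), w, b) > 0)  # → True
print(score_event(video_signature(video_b), w, b) > 0)  # → False
```

The appeal of this scheme, which the abstract's results support, is that once signatures are precomputed, training and applying linear classifiers is cheap even over large video collections.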