
    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview the three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, together with comments on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
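
    As a concrete illustration of the Placing Task setting, the following is a minimal sketch of a metadata-based nearest-neighbor baseline: a test video simply inherits the geo-coordinates of the training video whose tag set overlaps most with its own. The train_videos structure and the Jaccard scoring are illustrative assumptions, not the reference algorithm used in MediaEval 2010.

        def jaccard(a, b):
            """Overlap between two tag sets, in [0, 1]."""
            a, b = set(a), set(b)
            return len(a & b) / len(a | b) if a | b else 0.0

        def predict_location(test_tags, train_videos):
            """Assign the coordinates of the training video with the most
            similar tag set: a simple nearest-neighbor geotagging baseline."""
            best = max(train_videos, key=lambda v: jaccard(test_tags, v["tags"]))
            return best["lat"], best["lon"]

        # Illustrative toy data; a real run would use the released data set.
        train_videos = [
            {"tags": ["eiffel", "paris", "tower"], "lat": 48.858, "lon": 2.294},
            {"tags": ["brooklyn", "bridge", "nyc"], "lat": 40.706, "lon": -73.997},
        ]
        print(predict_location(["paris", "seine", "tower"], train_videos))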

    Employment of artificial intelligence mechanisms for e-Health systems in order to obtain vital signs and detect diseases from medical images improving the processes of online consultations and diagnosis

    Nowadays, e-Health web applications allow doctors to access different types of features, such as knowing which medication a patient has taken or performing online consultations. Internet systems for healthcare can be improved by using artificial intelligence mechanisms to detect diseases and obtain biological data, giving medical professionals important information that facilitates the diagnosis process and the choice of the correct treatment for each particular person. The proposed research work presents an approach that is innovative compared with traditional platforms: it provides vital signs online in real time, access to a web stethoscope and to a medical image uploader that predicts, through deep learning methods, whether a certain disease is present, and it also allows the visualization of all of a patient's historical data. This dissertation aims to defend the concept of online consultations, providing functionalities complementary to the traditional methods of medical diagnosis through the use of software engineering practices. Vital signs were obtained via artificial intelligence, using a computer camera as the sensor; this methodology requires that the user be in a state of rest during the measurements. The investigation led to the conclusion that, in the future, many medical processes will most likely be carried out online, a practice considered extremely helpful for the analysis and treatment of contagious diseases, or of cases that require constant monitoring.
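
    As an illustration of camera-based vital-sign estimation in general, the sketch below follows the classical remote-photoplethysmography idea: average the green channel over a (here fixed) face region across frames and read the heart rate off the dominant frequency. This is a minimal sketch of the generic technique under simplifying assumptions, not the dissertation's implementation.

        import cv2
        import numpy as np

        def heart_rate_bpm(video_path, fps=30.0, roi=(100, 100, 200, 200)):
            """Estimate the pulse from the mean green intensity of a fixed
            face region; a minimal remote-photoplethysmography sketch."""
            x, y, w, h = roi
            samples = []
            cap = cv2.VideoCapture(video_path)
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                samples.append(frame[y:y + h, x:x + w, 1].mean())  # green channel
            cap.release()
            signal = np.asarray(samples) - np.mean(samples)
            spectrum = np.abs(np.fft.rfft(signal))
            freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
            band = (freqs > 0.7) & (freqs < 4.0)  # plausible pulse: 42-240 BPM
            return 60.0 * freqs[band][np.argmax(spectrum[band])]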

    Learning to Hash-tag Videos with Tag2Vec

    User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hash-tags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We instead attempt to learn a direct, low-cost mapping from videos to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a corpus of 10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be mapped directly into the tag vector space using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study.
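
    The two-step recipe can be sketched as follows: skip-gram vectors are trained on hash-tag co-occurrences, and the tags nearest to a video's predicted embedding are retrieved. The gensim training call and the video_to_tag_space stub below are illustrative assumptions standing in for the paper's learned embedding function and video features.

        import numpy as np
        from gensim.models import Word2Vec

        # Step 1: skip-gram (sg=1) vectors for hash-tags; each "sentence" is
        # the list of hash-tags attached to one video (toy corpus here).
        tag_lists = [["#surf", "#beach", "#waves"], ["#ski", "#snow", "#alps"]]
        tag2vec = Word2Vec(tag_lists, vector_size=50, sg=1, min_count=1, window=5)

        # Step 2 (assumed): a trained regressor maps video features into the
        # Tag2Vec space; this stub stands in for the learned embedding.
        def video_to_tag_space(video_features):
            return np.asarray(video_features, dtype=float)

        # Retrieval: nearest hash-tags to the projected query video.
        query_vec = video_to_tag_space(np.random.rand(50))
        print(tag2vec.wv.similar_by_vector(query_vec, topn=3))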

    Quaero at TRECVID 2013: Semantic Indexing and Instance Search

    The Quaero group is a consortium of French and German organizations working on Multimedia Indexing and Retrieval. LIG participated in the semantic indexing main task, the localization task, and the concept pair task. LIG also took part in organizing the semantic indexing task. This paper describes these participations, which are quite similar to our previous year's participations. For the semantic indexing main task, our approach uses a six-stage processing pipeline to compute scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We used a number of different descriptors and a hierarchical fusion strategy. We also used conceptual feedback by adding a vector of classification scores to the pool of descriptors. The best Quaero run has a Mean Inferred Average Precision of 0.2848, which ranked us 2nd out of 26 participants. We also co-organized the TRECVid SIN 2013 task and its collaborative annotation.
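
    To make the fusion stages concrete, here is a minimal sketch of hierarchical late fusion: per-descriptor classifier scores are first averaged within descriptor families, and the family scores are then combined with weights. The two-level grouping and the weights are illustrative assumptions, not the actual Quaero configuration.

        import numpy as np

        def hierarchical_fusion(scores, families, weights):
            """Two-level late fusion: average per-shot scores within each
            descriptor family, then take a weighted mean across families."""
            family_scores = {
                name: np.mean([scores[d] for d in members], axis=0)
                for name, members in families.items()
            }
            total = sum(weights.values())
            return sum(weights[n] * s for n, s in family_scores.items()) / total

        # Toy per-shot scores for one target concept from three descriptors.
        scores = {"sift_bow": np.array([0.7, 0.1]),
                  "color_hist": np.array([0.6, 0.3]),
                  "mfcc": np.array([0.2, 0.8])}
        families = {"visual": ["sift_bow", "color_hist"], "audio": ["mfcc"]}
        print(hierarchical_fusion(scores, families, {"visual": 0.7, "audio": 0.3}))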

    Focused image search in the social Web.

    Recently, social multimedia-sharing websites, which allow users to upload, annotate, and share online photo or video collections, have become increasingly popular. The user tags or annotations constitute the new multimedia metadata. We present an image search system that exploits both the textual and the visual information of images. First, we use focused crawling and DOM-tree-based web data extraction methods to extract image textual features from social networking image collections. Second, we propose the concept of visual words to handle the image's visual content for fast indexing and searching. We also develop several user-friendly search options that allow users to query the index using words and image feature descriptions (visual words). The developed image search system tries to bridge the gap between scalable industrial image search engines, which are based on keyword search, and the slower content-based image retrieval systems developed mostly in academia and designed to search based on image content only. We have implemented a working prototype by crawling and indexing over 16,056 images from flickr.com, one of the most popular image-sharing websites. Our experimental results on this working prototype confirm the efficiency and effectiveness of the methods that we propose.
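
    A minimal sketch of the visual-words idea: local descriptors are clustered into a vocabulary, and each image becomes a histogram of visual-word assignments that a text-style index can handle. The SIFT features, k-means clustering, and vocabulary size are generic assumptions, not necessarily this system's exact design.

        import cv2
        import numpy as np
        from sklearn.cluster import KMeans

        def build_vocabulary(image_paths, n_words=200):
            """Cluster local SIFT descriptors into a visual vocabulary."""
            sift = cv2.SIFT_create()
            descriptors = []
            for path in image_paths:
                image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
                _, desc = sift.detectAndCompute(image, None)
                if desc is not None:
                    descriptors.append(desc)
            return KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descriptors))

        def visual_word_histogram(image_path, vocab):
            """Represent one image as a histogram over visual words, so it
            can be indexed and searched much like a text document."""
            sift = cv2.SIFT_create()
            image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            _, desc = sift.detectAndCompute(image, None)
            return np.bincount(vocab.predict(desc), minlength=vocab.n_clusters)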

    Poet: Product-oriented Video Captioner for E-commerce

    In e-commerce, a growing number of user-generated videos are used for product promotion. How to generate video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not well suited to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge-leveraging module in Poet differs from traditional designs by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvements over previous methods in generation quality, product-aspect capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets collected from Mobile Taobao: the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD). We will release the desensitized datasets to promote further investigation of both video captioning and general video analysis problems.

    Comment: 10 pages, 3 figures; to appear in the ACM MM 2020 proceedings.
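
    To give a rough idea of what a spatial-temporal graph over video regions can look like (Poet's actual graph construction and knowledge modules are not reproduced here), the sketch below connects region nodes within each frame spatially and links corresponding regions across consecutive frames temporally; the random per-region features and the naive index-based matching are simplifying assumptions.

        import numpy as np

        def build_st_graph(region_feats):
            """region_feats: one (R, D) array of R region features per frame.
            Returns stacked node features plus spatial and temporal edges."""
            nodes, spatial, temporal = [], [], []
            prev_ids, offset = [], 0
            for feats in region_feats:
                ids = list(range(offset, offset + len(feats)))
                nodes.extend(feats)
                # Spatial edges: fully connect regions within the same frame.
                spatial += [(i, j) for i in ids for j in ids if i != j]
                # Temporal edges: naively link region r to region r of the
                # previous frame (a real system would track or match regions).
                temporal += list(zip(prev_ids, ids))
                prev_ids, offset = ids, offset + len(feats)
            return np.asarray(nodes), spatial, temporal

        frames = [np.random.rand(3, 128) for _ in range(4)]  # 4 frames, 3 regions
        nodes, spatial, temporal = build_st_graph(frames)
        print(nodes.shape, len(spatial), len(temporal))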

    FrameProv: Towards End-To-End Video Provenance

    Video feeds are often deliberately used as evidence, as in the case of CCTV footage; but more often than not, the existence of footage of a supposed event is perceived as proof of fact in the eyes of the public at large. This reliance represents a societal vulnerability given the existence of easy-to-use editing tools and means to fabricate entire video feeds using machine learning. And, as the recent barrage of fake news and fake porn videos has shown, this isn't merely an academic concern; it is actively being exploited. I posit that this exploitation is only going to get more insidious. In this position paper, I introduce a long-term project that aims to mitigate some of the most egregious forms of manipulation by embedding trustworthy components in the video transmission chain. Unlike earlier works, I am not aiming to do tamper detection or other forms of forensics -- approaches I think are bound to fail in the face of the reality of necessary editing and compression -- instead, the aim here is to provide a way for the video publisher to prove the integrity of the video feed as well as make explicit any edits they may have performed. To do this, I present a novel data structure, a video-edit specification language, and supporting infrastructure that together provide end-to-end video provenance, from the camera sensor to the viewer. I have implemented a prototype of this system and am in talks with journalists and video editors to discuss the best ways forward with introducing this idea to the mainstream.
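
    As a generic illustration of frame-level provenance (not FrameProv's actual data structure or specification language), the sketch below chains a hash over each frame with the previous chain value and authenticates the final digest, so any altered, dropped, or reordered frame changes the head of the chain. A deployed system would use a public-key signature rather than the HMAC stand-in used here.

        import hashlib
        import hmac

        def chain_frames(frames, key=b"publisher-signing-key"):
            """Hash-chain a sequence of frame byte strings; one tag over the
            chain head then attests to the whole ordered sequence."""
            head = b"\x00" * 32
            for frame in frames:
                head = hashlib.sha256(head + hashlib.sha256(frame).digest()).digest()
            return head, hmac.new(key, head, hashlib.sha256).hexdigest()

        frames = [b"frame-0-bytes", b"frame-1-bytes", b"frame-2-bytes"]
        head, tag = chain_frames(frames)
        # Dropping a frame yields a different head, so verification fails.
        tampered_head, _ = chain_frames([b"frame-0-bytes", b"frame-2-bytes"])
        print(head != tampered_head)  # True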

    Natural language processing based advanced method of unnecessary video detection

    In this study, we describe the process of identifying unnecessary video using an advanced combined method of natural language processing and machine learning. The system also includes a framework containing analytics databases, which helps to measure statistical accuracy and can detect, and then accept or reject, unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps: first from video to MPEG-1 Audio Layer 3 (MP3), and then from MP3 to WAV format. We use the text-processing part of natural language processing to analyze and prepare the data set. We use both Naive Bayes and logistic regression classification algorithms in the detection system to determine which gives the best accuracy. In our research, the MP4 video data was converted to plain text using Python library functions. This brief study discusses the identification of unauthorized, unsocial, unnecessary, unfinished, and malicious videos from spoken video-recording data. By analyzing our data sets with this model, we can decide which videos should be accepted or rejected for further action.
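
    The classification stage can be sketched as a standard text pipeline: TF-IDF features feed both a Naive Bayes and a logistic regression model, and the better-scoring model is kept. The toy transcripts and labels below are illustrative, and the speech-to-text conversion step is omitted.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Toy transcripts extracted from videos; 1 = reject, 0 = accept.
        texts = ["buy pills now cheap", "lecture on linear algebra",
                 "win money fast click here", "cooking tutorial fresh pasta"]
        labels = [1, 0, 1, 0]
        X_tr, X_te, y_tr, y_te = train_test_split(
            texts, labels, test_size=0.5, random_state=0, stratify=labels)
        for model in (MultinomialNB(), LogisticRegression(max_iter=1000)):
            pipe = make_pipeline(TfidfVectorizer(), model).fit(X_tr, y_tr)
            print(type(model).__name__, pipe.score(X_te, y_te))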

    AXES at TRECVID 2012: KIS, INS, and MED

    The AXES project participated in the interactive instance search task (INS), the known-item search task (KIS), and the multimedia event detection task (MED) for TRECVid 2012. As in our TRECVid 2011 system, we used nearly identical search systems and user interfaces for both INS and KIS. This year, our interactive INS and KIS systems focused on using classifiers trained at query time with positive examples collected from external search engines. Participants in our KIS experiments were media professionals from the BBC; our INS experiments were carried out by students and researchers at Dublin City University. We performed comparatively well in both experiments: our best KIS run found 13 of the 25 topics, and our best INS runs outperformed all other submitted runs in terms of P@100. For MED, the system presented was based on a minimal number of low-level descriptors, which we chose to be as large as computationally feasible. These descriptors are aggregated to produce high-dimensional video-level signatures, which are used to train a set of linear classifiers. Our MED system achieved the second-best score of all submitted runs in the main track and the best score in the ad-hoc track, suggesting that a simple system based on state-of-the-art low-level descriptors can give relatively high performance. This paper describes our KIS, INS, and MED systems in detail, along with the results and findings of our experiments.
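
    The query-time training idea admits a simple sketch: positive examples fetched from an external search engine are pooled with fixed background negatives, a linear classifier is trained on the spot, and its scores rank the archive. The feature dimensionality and the logistic-regression choice are assumptions for illustration, not the AXES system's exact components.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def query_time_ranker(positive_feats, background_feats, archive_feats):
            """Train a linear classifier at query time (external-engine
            positives vs. generic negatives) and rank the archive with it."""
            X = np.vstack([positive_feats, background_feats])
            y = np.r_[np.ones(len(positive_feats)), np.zeros(len(background_feats))]
            clf = LogisticRegression(max_iter=1000).fit(X, y)
            return np.argsort(-clf.decision_function(archive_feats))

        rng = np.random.default_rng(0)
        positives = rng.normal(1.0, 1.0, (20, 64))   # from an external engine
        negatives = rng.normal(0.0, 1.0, (200, 64))  # fixed background pool
        archive = rng.normal(0.5, 1.0, (1000, 64))
        print(query_time_ranker(positives, negatives, archive)[:5])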