
    VITALAS at TRECVID-2008

    In this paper, we present our experiments on the High-Level Feature Extraction task of TRECVID 2008. This being the first year of our participation in TRECVID, our system adopts several popular approaches proposed previously by other groups. We propose two advanced low-level features: a new Gabor texture descriptor and a Compact-SIFT codeword histogram. Our system uses the well-known LIBSVM library to train the SVM base classifiers. In the fusion step, several methods are employed, including voting, SVM-based fusion, HCRF and Bootstrap Average AdaBoost (BAAB).
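As a rough illustration of the pipeline this abstract describes (per-feature SVM base classifiers fused by voting), the following Python sketch trains one SVM per feature type and fuses their predictions by majority vote. All data, dimensions and names are synthetic stand-ins, not the VITALAS implementation; scikit-learn's SVC (which wraps LIBSVM) stands in for a direct LIBSVM call.

```python
# Hypothetical sketch: one binary SVM per low-level feature type
# (stand-ins for the Gabor texture and Compact-SIFT features),
# fused by majority voting. Data is synthetic and illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for two low-level descriptors of 200 shots.
gabor = rng.normal(size=(200, 16))
sift_hist = rng.normal(size=(200, 32))
labels = (gabor[:, 0] + sift_hist[:, 0] > 0).astype(int)  # toy concept label

def train_classifier(features, y):
    """One basic classifier per feature type (LIBSVM-style RBF SVM)."""
    return SVC(kernel="rbf").fit(features[:150], y[:150])

classifiers = [train_classifier(f, labels) for f in (gabor, sift_hist)]

# Voting fusion: a shot is positive if the majority of classifiers agree.
votes = np.stack([clf.predict(f[150:])
                  for clf, f in zip(classifiers, (gabor, sift_hist))])
fused = (votes.mean(axis=0) >= 0.5).astype(int)
accuracy = (fused == labels[150:]).mean()
print(f"fused accuracy on held-out shots: {accuracy:.2f}")
```

The same skeleton extends to the other fusion methods mentioned (SVM-based fusion, BAAB) by replacing the voting rule with a learned second-stage model.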

    A Novel Semantic Statistical Model for Automatic Image Annotation Using the Relationship between the Regions Based on Multi-Criteria Decision Making

    Automatic image annotation has emerged as an important research topic due to the semantic gap and to its potential applications in image retrieval and management. In this paper we present an approach that combines regional contexts and visual topics for automatic image annotation. Regional contexts model the relationships between the regions of an image, whereas visual topics provide the global distribution of topics over an image. Conventional image annotation methods neglect the relationships between the regions of an image, yet these regions are precisely what conveys the image semantics, so modelling the relationships between them helps annotate images. The proposed model extracts regional contexts and visual topics from the image and combines them with an MCDM (Multi-Criteria Decision Making) approach based on the TOPSIS (Technique for Order Preference by Similarity to the Ideal Solution) method. Regional contexts and visual topics are learned by PLSA (Probabilistic Latent Semantic Analysis) from the training data. Experiments on 5k Corel images show that integrating these two kinds of information is beneficial to image annotation. DOI: http://dx.doi.org/10.11591/ijece.v4i1.459
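TOPSIS itself is a small, well-defined procedure, so the combination step described above can be sketched concretely: each candidate annotation word gets one score per criterion (here, a regional-context score and a visual-topic score), and TOPSIS ranks words by closeness to the ideal solution. The words and scores below are hypothetical examples, not values from the paper.

```python
# Illustrative TOPSIS ranking over candidate annotation words, assuming
# two benefit criteria: regional-context score and visual-topic score.
import numpy as np

words = ["sky", "water", "tree", "building"]
# rows: candidate words; columns: (regional-context, visual-topic) scores
scores = np.array([[0.9, 0.6],
                   [0.4, 0.8],
                   [0.7, 0.7],
                   [0.2, 0.3]])

# 1. Vector-normalise each criterion column.
norm = scores / np.linalg.norm(scores, axis=0)
# 2. Ideal and anti-ideal solutions (both criteria are benefits).
ideal, anti = norm.max(axis=0), norm.min(axis=0)
# 3. Euclidean distances of each word to the ideal and anti-ideal.
d_pos = np.linalg.norm(norm - ideal, axis=1)
d_neg = np.linalg.norm(norm - anti, axis=1)
# 4. Closeness coefficient: higher means closer to the ideal annotation.
closeness = d_neg / (d_pos + d_neg)
ranked = [words[i] for i in np.argsort(-closeness)]
print(ranked)
```

With these toy scores, "sky" (strong on both criteria) ranks first and "building" (weak on both) last; criterion weights could be folded in by scaling the normalised columns before computing distances.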

    Deliverable D1.4 Visual, text and audio information analysis for hypervideo, final release

    Having extensively evaluated the performance of the technologies included in the first release of the WP1 multimedia analysis tools, using content from the LinkedTV scenarios and through participation in international benchmarking activities, we made concrete decisions regarding the appropriateness and importance of each individual method or combination of methods. Combined with an updated list of information needs for each scenario, these decisions led to a new set of analysis requirements to be addressed by the final release of the WP1 analysis techniques. To this end, coordinated efforts in three directions, namely (a) improving a number of methods in terms of accuracy and time efficiency, (b) developing new technologies and (c) defining synergies between methods to obtain new types of information via multimodal processing, resulted in the final set of multimedia analysis methods for video hyperlinking. Moreover, the developed analysis modules have been integrated into a web-based infrastructure, allowing the fully automatic linking of the multitude of WP1 technologies and the overall LinkedTV platform.

    Spanish Corpora of tweets about COVID-19 vaccination for automatic stance detection

    The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition taking into account different variants of Spanish in social posts; (2) to evaluate the efficiency of semi-supervised models for extending such a corpus with unlabelled posts; and (3) to describe such short-text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers as in favour (904), against (674) or neither (1,223), with a 0.725 Fleiss’ kappa score. Results show that the self-training method with an SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro-averaged F1 score of 0.94. A combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of trustworthiness and the validation index. Funding: Agencia Estatal de Investigación | Ref. PID2020-113673RB-I00; Xunta de Galicia | Ref. ED431C2018/55; Fundação para a Ciência e a Tecnologia | Ref. UIDB/04469/2020. Funded for open-access publication: Universidade de Vigo/CISU
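The self-training scheme with an SVM base estimator described above has a standard off-the-shelf form: fit on the labelled seed, pseudo-label the unlabelled posts the model is confident about, and refit. A minimal sketch using scikit-learn's SelfTrainingClassifier follows; the feature vectors, labels and confidence threshold are synthetic stand-ins, not the paper's corpus or settings.

```python
# Minimal self-training sketch with an SVM base estimator, in the spirit
# of the semi-supervised extension described above. Synthetic data only.
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))            # stand-in tweet feature vectors
y_true = (X[:, 0] > 0).astype(int)        # toy stance labels
y = y_true.copy()
y[100:] = -1                              # -1 marks unlabelled tweets

# The base estimator must expose predict_proba, hence probability=True.
base = SVC(kernel="rbf", probability=True)
model = SelfTrainingClassifier(base, threshold=0.8).fit(X, y)

# The fitted model pseudo-labels confident unlabelled tweets, extending
# the manually annotated seed corpus as in objective (2).
preds = model.predict(X[100:])
agreement = (preds == y_true[100:]).mean()
print(f"agreement with held-out labels: {agreement:.2f}")
```

In practice the threshold trades corpus size against pseudo-label quality: a higher threshold admits fewer but cleaner automatic labels.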

    VITALAS at TRECVID-2009

    This paper describes the participation of VITALAS in the TRECVID-2009 evaluation, where we submitted runs for the High-Level Feature Extraction (HLFE) and Interactive Search tasks. For the HLFE task, we focus on the evaluation of low-level feature sets and fusion methods. The runs employ multiple low-level features based on all available modalities (visual, audio and text) and the results show that use of such features improves the retrieval effectiveness significantly. We also use a concept score fusion approach that achieves good results with reduced low-level feature vector dimensionality. Furthermore, a weighting scheme is introduced for cluster assignment in the "bag-of-words" approach. Our runs achieved good performance compared to a baseline run and the submissions of other TRECVID-2009 participants. For the Interactive Search task, we focus on the evaluation of the integrated VITALAS system in order to gain insights into the use and effectiveness of the system's search functionalities on (the combination of) multiple modalities and study the behavior of two user groups: professional archivists and non-professional users. Our analysis indicates that both user groups submit about the same total number of queries and use the search functionalities in a similar way, but professional users save twice as many shots and examine shots deeper in the ranked retrieved list. The agreement between the TRECVID assessors and our users was quite low. In terms of the effectiveness of the different search modalities, similarity searches retrieve on average twice as many relevant shots as keyword searches, fused searches three times as many, while concept searches retrieve even up to five times as many relevant shots, indicating the benefits of the use of robust concept detectors in multimodal video retrieval.
    High-Level Feature Extraction Runs:
    1. A VITALAS.CERTH-ITI 1: early fusion of all available low-level features.
    2. A VITALAS.CERTH-ITI 2: concept score fusion for five low-level features and 100 concepts, text features and bag-of-words with color SIFT descriptor based on dense sampling.
    3. A VITALAS.CERTH-ITI 3: concept score fusion for five low-level features and 100 concepts combined with text features.
    4. A VITALAS.CERTH-ITI 4: weighting scheme for bag-of-words based on dense sampling of the color SIFT descriptor.
    5. A VITALAS.CERTH-ITI 5: baseline run, bag-of-words based on dense sampling of the color SIFT descriptor.
    Interactive Search Runs:
    1. vitalas 1: interactive run by professional archivists.
    2. vitalas 2: interactive run by professional archivists.
    3. vitalas 3: interactive run by non-professional users.
    4. vitalas 4: interactive run by non-professional users.
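The concept score fusion idea mentioned in this abstract, in which per-feature classifiers emit one score per concept and those scores replace the much higher-dimensional raw features as input to a second stage, can be sketched roughly as follows. Everything here (data, dimensions, the use of logistic regression and the toy concept labels) is an illustrative assumption, not the VITALAS setup.

```python
# Rough sketch of concept score fusion: first-stage classifiers produce a
# score per concept for each feature type, and the concatenated scores
# (a compact vector) feed a second-stage classifier. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_shots, n_concepts = 120, 5

# Synthetic low-level features for two modalities.
visual = rng.normal(size=(n_shots, 200))
text = rng.normal(size=(n_shots, 300))
target = (visual[:, 0] + text[:, 0] > 0).astype(int)

def concept_scores(features):
    """Scores from one first-stage classifier per (toy) concept."""
    cols = []
    for c in range(n_concepts):
        toy_label = (features[:, c] > 0).astype(int)
        clf = LogisticRegression().fit(features[:80], toy_label[:80])
        cols.append(clf.predict_proba(features)[:, 1])
    return np.column_stack(cols)

# Fused representation: 2 * n_concepts scores instead of 500 raw dims.
fused = np.hstack([concept_scores(visual), concept_scores(text)])
stage2 = LogisticRegression().fit(fused[:80], target[:80])
acc = stage2.score(fused[80:], target[80:])
print(f"fused dims: {fused.shape[1]}, held-out accuracy: {acc:.2f}")
```

The dimensionality reduction is the point: the second stage sees 10 concept scores here rather than 500 raw feature dimensions, which is what makes this style of fusion cheap.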

    Geo-Information Harvesting from Social Media Data

    As unconventional sources of geo-information, massive imagery and text messages from open platforms and social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Due to its complementarity to remote sensing data, geo-information from these sources offers promising perspectives, but harvesting is not trivial due to its data characteristics. In this article, we address key aspects in the field, including data availability, analysis-ready data preparation and data management, geo-information extraction from social media text messages and images, and the fusion of social media and remote sensing data. We then showcase some exemplary geographic applications. In addition, we present the first extensive discussion of ethical considerations of social media data in the context of geo-information harvesting and geographic applications. With this effort, we wish to stimulate curiosity and lay the groundwork for researchers who intend to explore social media data for geo-applications. We encourage the community to join forces by sharing their code and data. Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine

    Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions

    As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating their robustness becomes important. Existing work has focused on understanding the robustness of vision-and-language models to imperceptible variations on benchmark tasks. In this work, we investigate the robustness of multimodal classifiers to cross-modal dilutions, a plausible variation. We develop a model that, given a multimodal (image + text) input, generates additional dilution text that (a) maintains relevance and topical coherence with the image and existing text, and (b) when added to the original text, leads to misclassification of the multimodal input. Via experiments on Crisis Humanitarianism and Sentiment Detection tasks, we find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model. Metric-based comparisons with several baselines and human evaluations indicate that our dilutions show higher relevance and topical coherence, while simultaneously being more effective at demonstrating the brittleness of the multimodal classifiers. Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations, especially in human-facing societal applications. The code and other resources are available at https://claws-lab.github.io/multimodal-robustness/. Comment: Accepted at the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP); Full Paper (Oral)

    Tracking physical events on social media

    Social media platforms have emerged as widely accessed communication channels on the World Wide Web in the modern day. The first social networking website came into existence in 2002, and currently there are about 2.08 billion social media users around the globe. The participation of users within a social network can be considered an act of sensing, in which they interact with the physical world and record the corresponding observations in the form of text, pictures, videos, etc. This phenomenon is termed Social Sensing and motivates us to develop robust techniques that can estimate the physical state from human observations. This dissertation addresses a set of problems related to the detection and tracking of real-world events. The term ‘event’ refers to an entity that can be characterized by spatial and temporal properties. With the help of these properties we design novel mathematical models that help us achieve our goals. We first focus on a simple event detection technique using ‘Twitter’ as the source of information. The method described in this work allows us to perform detection in a completely language-independent and unsupervised fashion. We next extend the event detection problem to a different type of social media, ‘Instagram’, which allows users to share pictorial information of nearby observations. With the availability of geotagged data we solve two different subproblems: the first is to detect and geolocalize the instance of an event, and the second is to estimate the path taken by an event during its course. The next problem we look at is related to improving the quality of event localization with the help of text and metadata information. Twitter, in general, has a smaller volume of geotagged data available in comparison to Instagram, which demands that we design methods that exploit the supplementary information available from the detected events. Finally, we look at both social networks at the same time in order to utilize their complementary advantages and perform better than the methods designed for the individual networks.
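The detection-from-geotagged-data idea can be made concrete with a toy burst detector: bucket posts into spatio-temporal grid cells and flag cells whose volume greatly exceeds the average. This is a generic sketch of that style of social sensing, not the dissertation's actual models; the cell size, threshold factor and coordinates are all illustrative assumptions.

```python
# Toy spatio-temporal burst detection on geotagged posts: bucket posts
# into (lat, lon, hour) grid cells and flag unusually crowded cells.
from collections import Counter

# (latitude, longitude, hour) of synthetic geotagged posts
posts = [(40.71, -74.00, 9)] * 12 + [(40.71, -74.00, 10)] * 3 \
        + [(34.05, -118.24, 9)] * 2 + [(51.51, -0.13, 9)] * 2

def detect_events(posts, cell=0.1, factor=2.0):
    """Return grid cells whose post count exceeds factor * mean count."""
    counts = Counter(
        (round(lat / cell), round(lon / cell), hour)
        for lat, lon, hour in posts
    )
    mean = sum(counts.values()) / len(counts)
    return [c for c, n in counts.items() if n > factor * mean]

events = detect_events(posts)
print(events)
```

Here only the cell with 12 co-located, same-hour posts is flagged; the language-independent, unsupervised flavour comes from the fact that no post content is inspected at all, only coordinates and timestamps.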