Combining Textual Features for the Detection of Hateful and Offensive Language
The detection of offensive, hateful, and profane language has become a critical challenge, since many users of social networks are exposed to cyberbullying on a daily basis. In this paper, we present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture. The proposed architecture is evaluated on the English Subtask 1A (identifying hate, offensive, and profane content in posts) of the HASOC-2021 dataset under the team name TIB-VA. We compare different variants of contextual word embeddings combined with character-level embeddings and an encoding of collected hate terms.
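A minimal sketch of the fusion step the abstract describes: the three textual feature blocks (contextual word embeddings, character-level embeddings, and an encoding of hate terms) are combined by concatenation before further neural layers. The function name and dimensions below are hypothetical, not the authors' code.

```python
import numpy as np

def fuse_features(contextual_emb, char_emb, hate_term_enc):
    """Concatenate the per-post feature blocks into a single input vector.

    The actual architecture feeds such a vector into further neural
    layers; this only illustrates the feature-combination step.
    """
    return np.concatenate([contextual_emb, char_emb, hate_term_enc])

# Toy dimensions: 768-d contextual embedding (e.g. a BERT-like model),
# 64-d character-level embedding, 32-d hate-term encoding.
post_vec = fuse_features(np.zeros(768), np.zeros(64), np.zeros(32))
print(post_vec.shape)  # (832,) + 32 = (864,)
```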
Estimating the information gap between textual and visual representations
Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information, and the question how they can be described and automatically estimated, have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
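The paper learns CMI and SC with a deep network; as a toy stand-in only, the idea of scoring cross-modal relatedness can be illustrated with cosine similarity between a text embedding and an image embedding (all names and inputs below are hypothetical):

```python
import numpy as np

def semantic_correlation_proxy(text_emb, image_emb):
    """Toy proxy for a semantic correlation (SC) score: cosine
    similarity of the two modality embeddings, in [-1, 1].

    The actual SC measure is estimated by a trained deep model;
    this merely illustrates scoring cross-modal relatedness."""
    t = np.asarray(text_emb, dtype=float)
    v = np.asarray(image_emb, dtype=float)
    return float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))

print(semantic_correlation_proxy([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical)
print(semantic_correlation_proxy([1.0, 0.0], [0.0, 1.0]))  # 0.0 (unrelated)
```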
TVCalib: Camera Calibration for Sports Field Registration in Soccer
Sports field registration in broadcast videos is typically interpreted as the
task of homography estimation, which provides a mapping between a planar field
and the corresponding visible area of the image. In contrast to previous
approaches, we consider the task as a camera calibration problem. First, we
introduce a differentiable objective function that is able to learn the camera
pose and focal length from segment correspondences (e.g., lines, point clouds),
based on pixel-level annotations for segments of a known calibration object.
The calibration module iteratively minimizes the segment reprojection error
induced by the estimated camera parameters. Second, we propose a novel approach
for 3D sports field registration from broadcast soccer images. Compared to the
typical solution, which subsequently refines an initial estimation, our
solution does it in one step. The proposed method is evaluated for sports field
registration on two datasets and achieves superior results compared to two
state-of-the-art approaches. Comment: Accepted for publication at WACV'23.
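The core idea of the calibration module, iteratively minimizing a reprojection error induced by the estimated camera parameters, can be sketched with a deliberately simplified toy: recovering a single parameter (the focal length) of a pinhole camera by gradient descent on the squared reprojection error of known field points. The real objective optimizes full camera pose and focal length from segment correspondences; everything below is an illustrative assumption.

```python
import numpy as np

def project(points_3d, f, cx=640.0):
    """Pinhole projection of the x-coordinate only: u = f * X / Z + cx."""
    X, Z = points_3d[:, 0], points_3d[:, 2]
    return f * X / Z + cx

# Synthetic "field points" and their noiseless observed pixel positions.
rng = np.random.default_rng(0)
pts = np.stack([rng.uniform(-20, 20, 50),   # X: across the field
                np.zeros(50),               # Y: on the ground plane
                rng.uniform(10, 60, 50)],   # Z: depth from camera
               axis=1)
true_f = 1200.0
observed = project(pts, true_f)

# Iteratively minimize the squared reprojection error w.r.t. f.
f = 800.0  # initial guess
for _ in range(300):
    r = project(pts, f) - observed            # per-point reprojection error
    grad = 2.0 * np.mean(r * pts[:, 0] / pts[:, 2])  # d/df of mean squared error
    f -= 1.0 * grad                           # gradient-descent step

print(round(f, 1))  # converges to the true focal length, 1200.0
```

Since the synthetic observations are noiseless, gradient descent contracts to the generating focal length; the paper's differentiable objective plays an analogous role for the full camera model.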
A Review on Recent Advances in Video-based Learning Research: Video Features, Interaction, Tools, and Technologies
Human learning is shifting more strongly than ever towards online settings, and especially towards video platforms. There is an abundance of tutorials and lectures covering diverse topics, from fixing a bike to particle physics. While it is advantageous that learning resources are freely available on the Web, the quality of these resources varies considerably. Given the number of available videos, users need algorithmic support in finding helpful and entertaining learning resources.
In this paper, we present a review of the recent research literature (2020-2021) on video-based learning. We focus on publications that examine the characteristics of video content, analyze frequently used features and technologies, and, finally, derive conclusions on trends and possible future research directions.
On the Impact of Features and Classifiers for Measuring Knowledge Gain during Web Search - A Case Study
Search engines are normally not designed to support human learning intents and processes. The field of Search as Learning (SAL) aims to investigate the characteristics of a successful Web search with a learning purpose. In this paper, we analyze the impact of text complexity of Web pages on predicting knowledge gain during a search session. For this purpose, we conduct an experimental case study and investigate the influence of several text-based features and classifiers on the prediction task. We build upon data from a study of related work, where 104 participants were given the task to learn about the formation of lightning and thunder through Web search. We perform an extensive evaluation based on a state-of-the-art approach and extend it with additional features related to textual complexity of Web pages. In contrast to prior work, we perform a systematic search for optimal hyperparameters and show the possible influence of feature selection strategies on the knowledge gain prediction. When using the new set of features, state-of-the-art results are noticeably improved. The results indicate that text complexity of Web pages could be an important feature resource for knowledge gain prediction.
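To make "text complexity features of Web pages" concrete, here is a hypothetical example of the kind of simple surface features such a pipeline might extract before feeding them to a classifier. The study's actual feature set and classifiers are richer; names and formulas below are illustrative assumptions.

```python
import re

def complexity_features(text):
    """Compute a few simple text-complexity indicators for a page.

    Illustrative only: real complexity feature sets typically include
    readability scores, vocabulary statistics, and syntactic measures.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return {
        # mean number of words per sentence
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # mean word length in characters
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        # lexical diversity: unique words / total words
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }

feats = complexity_features("Lightning heats the air. The heated air expands fast.")
print(feats["avg_sentence_len"])  # 9 words / 2 sentences = 4.5
```

Such a feature vector per visited page could then be passed, together with behavioral features, to a trained classifier that predicts the participant's knowledge gain.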
Semi-supervised identification of rarely appearing persons in video by correcting weak labels
Some recent approaches for character identification in movies and TV broadcasts are realized in a semi-supervised manner by assigning transcripts and/or subtitles to the speakers. However, the labels obtained in this way achieve only an accuracy of 80-90%, and the number of training examples for the different actors is unevenly distributed. In this paper, we propose a novel approach for person identification in video that corrects and extends the training data with reliable predictions in order to reduce the number of annotation errors. Furthermore, the intra-class diversity of rarely speaking characters is enhanced. To address the imbalance of training data per person, we suggest two complementary prediction scores. These scores are also used to recognize whether or not a face track belongs to a (supporting) character whose identity does not appear in the transcript. Experimental results demonstrate the feasibility of the proposed approach, which outperforms the current state of the art.
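The label-correction idea can be sketched as follows: when the model predicts a different identity for a face track with high confidence, the weak label from the transcript is replaced by the prediction. The interface below is hypothetical; the paper's method additionally uses two complementary scores and detects characters absent from the transcript.

```python
def correct_weak_labels(weak_labels, predictions, threshold=0.9):
    """Replace a weak (transcript-derived) label with the model's
    prediction when the prediction is confident and disagrees.

    weak_labels: list of identity names, one per face track.
    predictions: list of (predicted_name, confidence) per face track.
    """
    corrected = []
    for weak, (pred, conf) in zip(weak_labels, predictions):
        if conf >= threshold and pred != weak:
            corrected.append(pred)   # reliable prediction overrides weak label
        else:
            corrected.append(weak)   # keep the weak label
    return corrected

labels = correct_weak_labels(
    ["Alice", "Bob", "Alice"],
    [("Alice", 0.95), ("Carol", 0.97), ("Bob", 0.60)],
)
print(labels)  # ['Alice', 'Carol', 'Alice'] - only the confident disagreement is corrected
```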
A Multimodal Approach for Semantic Patent Image Retrieval
Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress towards a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image in order to identify references to corresponding sentences in the patent document. Furthermore, we use a state-of-the-art neural CLIP model to extract structural features from illustrations, and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search, we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of the different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.
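The score-fusion re-ranking step described above, combining per-modality similarity scores by average or maximum, can be sketched like this (candidate names and score values are made up for illustration):

```python
def fuse_scores(visual_scores, textual_scores, mode="avg"):
    """Re-rank candidate patents by fusing per-modality similarities.

    visual_scores / textual_scores: dicts mapping candidate id to a
    similarity score in [0, 1]. Returns candidate ids sorted by the
    fused score, best first.
    """
    fused = {}
    for cand in visual_scores.keys() | textual_scores.keys():
        v = visual_scores.get(cand, 0.0)
        t = textual_scores.get(cand, 0.0)
        fused[cand] = max(v, t) if mode == "max" else (v + t) / 2.0
    return sorted(fused, key=fused.get, reverse=True)

visual = {"p1": 0.9, "p2": 0.6}
textual = {"p1": 0.1, "p2": 0.6}
print(fuse_scores(visual, textual, mode="avg"))  # p2 (0.60) beats p1 (0.50)
print(fuse_scores(visual, textual, mode="max"))  # p1 (0.90) beats p2 (0.60)
```

The two fusion modes can disagree, as the toy scores show, which is why the choice of averaged versus maximum scores is an experimental variable.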