484 research outputs found
An overview of video recommender systems: state-of-the-art and research issues
Video platforms have become indispensable components within a diverse range of applications, serving various purposes in entertainment, e-learning, corporate training, online documentation, and news provision. As the volume and complexity of video content continue to grow, the need for personalized access features becomes an inevitable requirement to ensure efficient content consumption. To address this need, recommender systems have emerged as helpful tools providing personalized video access. By leveraging past user-specific video consumption data and the preferences of similar users, these systems excel in recommending videos that are highly relevant to individual users. This article presents a comprehensive overview of the current state of video recommender systems (VRS), exploring the algorithms used, their applications, and related aspects. In addition to an in-depth analysis of existing approaches, this review also addresses unresolved research challenges within this domain. These unexplored areas offer exciting opportunities for advancements and innovations, aiming to enhance the accuracy and effectiveness of personalized video recommendations. Overall, this article serves as a valuable resource for researchers, practitioners, and stakeholders in the video domain. It offers insights into cutting-edge algorithms, successful applications, and areas that merit further exploration to advance the field of video recommendation
Elastic shape analysis of geometric objects with complex structures and partial correspondences
In this dissertation, we address the development of elastic shape analysis frameworks for the registration, comparison and statistical shape analysis of geometric objects with complex topological structures and partial correspondences. In particular, we introduce a variational framework and several numerical algorithms for the estimation of geodesics and distances induced by higher-order elastic Sobolev metrics on the space of parametrized and unparametrized curves and surfaces. We extend our framework to the setting of shape graphs (i.e., geometric objects with branching structures where each branch is a curve) and surfaces with complex topological structures and partial correspondences. To do so, we leverage the flexibility of varifold fidelity metrics in order to augment our geometric objects with a spatially-varying weight function, which in turn enables us to indirectly model topological changes and handle partial matching constraints via the estimation of vanishing weights within the registration process. In the setting of shape graphs, we prove the existence of solutions to the relaxed registration problem with weights, which is the main theoretical contribution of this thesis. In the setting of surfaces, we leverage our surface matching algorithms to develop a comprehensive collection of numerical routines for the statistical shape analysis of sets of 3D surfaces, which includes algorithms to compute Karcher means, perform dimensionality reduction via multidimensional scaling and tangent principal component analysis, and estimate parallel transport across surfaces (possibly with partial matching constraints).
Moreover, we also address the development of numerical shape analysis pipelines for large-scale data-driven applications with geometric objects. Towards this end, we introduce a supervised deep learning framework to compute the square-root velocity (SRV) distance for curves. Our trained network provides fast and accurate estimates of the SRV distance between pairs of geometric curves, without the need to find optimal reparametrizations. As a proof of concept for the suitability of such approaches in practical contexts, we use it to perform optical character recognition (OCR), achieving comparable performance in terms of computational speed and accuracy to other existing OCR methods.
Lastly, we address the difficulty of extracting high quality shape structures from imaging data in the field of astronomy. To do so, we present a state-of-the-art expectation-maximization approach for the challenging task of multi-frame astronomical image deconvolution and super-resolution. We leverage our approach to obtain a high-fidelity reconstruction of the night sky, from which high quality shape data can be extracted using appropriate segmentation and photometric techniques
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars designed tools to perform the automatic scansion of poetry in many languages, but none of these tools
deal with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We
implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon
and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript, and
we used the resulting corpus as labeled dataset to train the model. The evaluation of the performance of the algorithm
reached a 97% for the accuracy and a 99% of weighted average for precision, recall and F1 Score. In addition, we tested
the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that
the model predicted almost all Old Saxon metrical patterns correctly misclassified the majority of the Old English input
verses
Interdisciplinarity in the Age of the Triple Helix: a Film Practitioner's Perspective
This integrative chapter contextualises my research including articles I have published as well as one of the creative artefacts developed from it, the feature film The Knife That Killed Me. I review my work considering the ways in which technology, industry methods and academic practice have evolved as well as how attitudes to interdisciplinarity have changed, linking these to Etzkowitz and Leydesdorff’s ‘Triple Helix’ model (1995). I explore my own experiences and observations of opportunities and challenges that have been posed by the intersection of different stakeholder needs and expectations, both from industry and academic perspectives, and argue that my work provides novel examples of the applicability of the ‘Triple Helix’ to the creative industries. The chapter concludes with a reflection on the evolution and direction of my work, the relevance of the ‘Triple Helix’ to creative practice, and ways in which this relationship could be investigated further
Temporal multimodal video and lifelog retrieval
The past decades have seen exponential growth of both consumption and production of data, with multimedia such as images and videos contributing significantly to said growth. The widespread proliferation of smartphones has provided everyday users with the ability to consume and produce such content easily. As the complexity and diversity of multimedia data has grown, so has the need for more complex retrieval models which address the information needs of users. Finding relevant multimedia content is central in many scenarios, from internet search engines and medical retrieval to querying one's personal multimedia archive, also called lifelog. Traditional retrieval models have often focused on queries targeting small units of retrieval, yet users usually remember temporal context and expect results to include this. However, there is little research into enabling these information needs in interactive multimedia retrieval.
In this thesis, we aim to close this research gap by making several contributions to multimedia retrieval with a focus on two scenarios, namely video and lifelog retrieval. We provide a retrieval model for complex information needs with temporal components, including a data model for multimedia retrieval, a query model for complex information needs, and a modular and adaptable query execution model which includes novel algorithms for result fusion. The concepts and models are implemented in vitrivr, an open-source multimodal multimedia retrieval system, which covers all aspects from extraction to query formulation and browsing. vitrivr has proven its usefulness in evaluation campaigns and is now used in two large-scale interdisciplinary research projects. We show the feasibility and effectiveness of our contributions in two ways: firstly, through results from user-centric evaluations which pit different user-system combinations against one another. Secondly, we perform a system-centric evaluation by creating a new dataset for temporal information needs in video and lifelog retrieval with which we quantitatively evaluate our models.
The results show significant benefits for systems that enable users to specify more complex information needs with temporal components. Participation in interactive retrieval evaluation campaigns over multiple years provides insight into possible future developments and challenges of such campaigns
Deep Learning for Video Object Segmentation:A Review
As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout the given video sequence. Recently, with the advancements of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, where their architectures, contributions and the differences among each other are elaborated. At last, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussions on future research directions. This article is expected to serve as a tutorial and source of reference for learners intended to quickly grasp the current progress in this research area and practitioners interested in applying the video object segmentation methods to their problems. A public website is built to collect and track the related works in this field: https://github.com/gaomingqi/VOS-Review
- …