Matrix Factorization in Tropical and Mixed Tropical-Linear Algebras
Matrix Factorization (MF) has found numerous applications in Machine Learning
and Data Mining, including collaborative filtering recommendation systems,
dimensionality reduction, data visualization, and community detection.
Motivated by the recent successes of tropical algebra and geometry in machine
learning, we investigate two problems involving matrix factorization over the
tropical algebra. For the first problem, Tropical Matrix Factorization (TMF),
which has already been studied in the literature, we propose an improved
algorithm that avoids many of the local optima. The second formulation
considers the approximate decomposition of a given matrix into the product of
three matrices where a usual matrix product is followed by a tropical product.
This formulation has a very interesting interpretation in terms of the learning
of the utility functions of multiple users. We also present numerical results
illustrating the effectiveness of the proposed algorithms, as well as an
application to recommendation systems, with promising results.
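As a rough illustration of the algebra involved, the sketch below implements the max-plus (tropical) matrix product, (U ⊗ V)_{ij} = max_k (U_{ik} + V_{kj}), and the resulting approximation error; the NumPy implementation and function names are illustrative and not taken from the paper.

```python
import numpy as np

def tropical_matmul(A, B):
    # Max-plus (tropical) product: C[i, j] = max_k (A[i, k] + B[k, j]).
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def tmf_error(X, U, V):
    # Frobenius-norm error of approximating X by the tropical product of U and V.
    return np.linalg.norm(X - tropical_matmul(U, V))

rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, (4, 2))
V = rng.uniform(-1, 1, (2, 5))
X = tropical_matmul(U, V)       # exactly representable by construction
print(tmf_error(X, U, V))       # 0.0

# The mixed formulation instead approximates X by a usual product followed by a
# tropical one, i.e. tropical_matmul(A @ B, C).
```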
Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos
The recent state of the art on monocular 3D face reconstruction from image
data has made some impressive advancements, thanks to the advent of Deep
Learning. However, it has mostly focused on input coming from a single RGB
image, overlooking the following important factors: a) Nowadays, the vast
majority of facial image data of interest do not originate from single images
but rather from videos, which contain rich dynamic information. b) Furthermore,
these videos typically capture individuals in some form of verbal communication
(public talks, teleconferences, audiovisual human-computer interactions,
interviews, monologues/dialogues in movies, etc.). When existing 3D face
reconstruction methods are applied to such videos, the artifacts in the
reconstructed shape and motion of the mouth area are often severe, since they
do not match the speech audio well.
To overcome the aforementioned limitations, we present the first method for
visual speech-aware perceptual reconstruction of 3D mouth expressions. We do
this by proposing a "lipread" loss, which guides the fitting process so that
the elicited perception from the 3D reconstructed talking head resembles that
of the original video footage. We demonstrate that, interestingly, the lipread
loss is better suited for 3D reconstruction of mouth movements compared to
traditional landmark losses, and even direct 3D supervision. Furthermore, the
devised method does not rely on any text transcriptions or corresponding audio,
rendering it ideal for training on unlabeled datasets. We verify the effectiveness
of our method through exhaustive objective evaluations on three large-scale
datasets, as well as through a subjective evaluation with two web-based user studies.
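A minimal sketch of what such a lip-reading perceptual loss could look like, assuming a frozen, pretrained lip-reading feature extractor (the hypothetical `lipreader` below) that maps sequences of mouth-region crops to feature vectors; this is an illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lipread_loss(lipreader, rendered_mouth, original_mouth):
    # rendered_mouth, original_mouth: (B, T, C, H, W) mouth-region crops from the
    # rendered 3D talking head and from the original video frames, respectively.
    with torch.no_grad():                      # target features are fixed
        target = lipreader(original_mouth)     # (B, T, D) lip-reading features
    pred = lipreader(rendered_mouth)           # gradients flow back to the 3D parameters
    # Penalize perceptual mismatch between rendered and real mouth movements.
    return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
```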
3D Facial Expressions through Analysis-by-Neural-Synthesis
While existing methods for 3D face reconstruction from in-the-wild images
excel at recovering the overall face shape, they commonly miss subtle, extreme,
asymmetric, or rarely observed expressions. We improve upon these methods with
SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics), which
faithfully reconstructs expressive 3D faces from images. We identify two key
limitations in existing methods: shortcomings in their self-supervised training
formulation, and a lack of expression diversity in the training images. For
training, most methods employ differentiable rendering to compare a predicted
face mesh with the input image, along with a plethora of additional loss
functions. This differentiable rendering loss not only has to provide
supervision for optimizing 3D face geometry, camera, albedo, and lighting at
once, which is an ill-posed problem, but also suffers from the domain gap
between the rendering and the input image, which further hinders learning.
Instead, SMIRK
replaces the differentiable rendering with a neural rendering module that,
given the rendered geometry of the predicted mesh and sparsely sampled pixels of
the input image, generates a face image. As the neural rendering obtains color
information from the sampled image pixels, supervision with the neural
rendering-based reconstruction loss can focus solely on the geometry. Further,
it enables us to
generate images of the input identity with varying expressions while training.
These are then utilized as input to the reconstruction model, with their known
geometry serving as ground-truth supervision. This effectively augments the training
data and enhances the generalization for diverse expressions. Our qualitative,
quantitative, and particularly our perceptual evaluations demonstrate that SMIRK
achieves new state-of-the-art performance in accurate expression
reconstruction. Project webpage: https://georgeretsi.github.io/smirk/
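A rough sketch of the neural-rendering supervision described above, under the assumption of a toy convolutional renderer that receives rasterized geometry maps of the predicted mesh plus a sparse set of input pixels; all module and function names are illustrative placeholders rather than the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralRenderer(nn.Module):
    # Toy stand-in: maps geometry maps + sparsely sampled colors to an RGB image.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, geometry_maps, sparse_rgb):
        # geometry_maps: (B, 3, H, W) rasterized normals/depth of the predicted mesh
        # sparse_rgb:    (B, 3, H, W) input pixels, zeroed outside the sampled locations
        return self.net(torch.cat([geometry_maps, sparse_rgb], dim=1))

def sparse_sample(image, keep_ratio=0.05):
    # Keep a random ~5% of the input pixels; zero out the rest.
    mask = (torch.rand(image.shape[0], 1, *image.shape[2:]) < keep_ratio).float()
    return image * mask

# Illustrative training step: color comes from the sampled pixels, so the
# reconstruction loss mainly constrains the predicted geometry.
renderer = NeuralRenderer()
image = torch.rand(2, 3, 64, 64)      # input frames (placeholder data)
geometry = torch.rand(2, 3, 64, 64)   # rasterized predicted-mesh geometry (placeholder)
recon = renderer(geometry, sparse_sample(image))
loss = F.l1_loss(recon, image)
loss.backward()
```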
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
Text-to-Image synthesis is the task of generating an image according to a
specific text description. Generative Adversarial Networks have been considered
the standard method for image synthesis virtually since their introduction;
today, however, Denoising Diffusion Probabilistic Models are setting a new
baseline, with remarkable results in Text-to-Image synthesis, among other
fields. Aside from its usefulness per se, such synthesis can also be particularly
relevant as a tool for data augmentation when training models for other document
image processing tasks. In this work, we present a latent diffusion-based method
for styled text-content-to-image generation at the word level. Our proposed method
manages to generate realistic word image samples from different writer styles,
by conditioning on writer-style class indices and text-content prompts, without
the need for adversarial training, writer recognition, or text recognition. We
gauge system performance with Fréchet Inception Distance, writer recognition
accuracy, and writer retrieval. We show that the proposed model produces samples
that are aesthetically pleasing, help boost text recognition performance, and
yield a writer retrieval score similar to that of real data.
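A compact sketch of the kind of conditioning the abstract describes, assuming a toy denoiser over word-image latents that receives a writer-style class index and an already-embedded text-content prompt; the architecture, noise schedule, and names are illustrative placeholders, not the WordStylist model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondDenoiser(nn.Module):
    # Toy denoiser: predicts the noise added to a word-image latent, conditioned on
    # a writer-style class index and an embedded text-content prompt.
    def __init__(self, latent_dim=64, n_writers=300, text_dim=32):
        super().__init__()
        self.writer_emb = nn.Embedding(n_writers, 32)   # style given as a class index
        self.time_emb = nn.Embedding(1000, 32)          # diffusion timestep embedding
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 32 + text_dim + 32, 256), nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, t, writer_id, text_feat):
        cond = torch.cat([self.writer_emb(writer_id), text_feat, self.time_emb(t)], dim=-1)
        return self.net(torch.cat([z_t, cond], dim=-1))

# One DDPM-style training step on random placeholder data.
model = CondDenoiser()
z0 = torch.randn(8, 64)                      # clean word-image latents (placeholder)
t = torch.randint(0, 1000, (8,))
writer = torch.randint(0, 300, (8,))
text = torch.randn(8, 32)                    # embedded text-content prompt (placeholder)
alpha_bar = torch.cos(t.float() / 1000 * 1.5) ** 2   # simple noise-schedule stand-in
eps = torch.randn_like(z0)
z_t = alpha_bar.sqrt()[:, None] * z0 + (1 - alpha_bar).sqrt()[:, None] * eps
loss = F.mse_loss(model(z_t, t, writer, text), eps)  # predict the added noise
loss.backward()
```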
Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study
Purpose: An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues.
Design/methodology/approach: This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.
Findings: Transkribus has demonstrated that HTR is now a usable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure the sustainability and scaling of the platform. However, funding and resourcing issues are identified.
Research limitations/implications: The paper presents results from projects; further user studies could be undertaken involving interviews, surveys, etc.
Practical implications: Only HTR provided via Transkribus is covered; however, this is the only publicly available platform for HTR on individual collections of historical documents at the time of writing, and it represents the current state of the art in this field.
Social implications: The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.
Originality/value: This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current applications of handwriting technology in the cultural heritage sector.
Transferable Deep Features for Keyword Spotting
Deep features, defined as the activations of hidden layers of a neural network, have given promising results when applied to various vision tasks. In this paper, we explore the usefulness and transferability of deep features in the context of keyword spotting (KWS). We use a state-of-the-art deep convolutional network to extract deep features. The optimal parameters concerning their application are subsequently studied: the impact of the choice of hidden layer, the impact of applying dimensionality reduction with a manifold learning technique, as well as the choice of dissimilarity measure used to retrieve relevant word images. Extensive numerical results show that deep features lead to state-of-the-art KWS performance, even when the test and training sets come from different document collections.
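A minimal query-by-example sketch of this kind of pipeline, assuming a generic CNN whose hidden-layer activations serve as "deep features" and cosine dissimilarity for retrieval; the network and function names are illustrative stand-ins (in practice the CNN would be pretrained), and the manifold-learning dimensionality reduction step is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative feature extractor standing in for a pretrained deep CNN; in practice
# one would load a trained network and read out the activations of a hidden layer.
feature_extractor = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # hidden-layer activations -> (N, 64)
).eval()

@torch.no_grad()
def deep_features(word_images):
    # word_images: (N, 1, H, W) grayscale word crops -> L2-normalized descriptors.
    return F.normalize(feature_extractor(word_images), dim=1)

def rank_by_cosine(query_feat, collection_feats):
    # Cosine dissimilarity: smaller is more relevant; returns indices sorted by relevance.
    dissim = 1.0 - collection_feats @ query_feat
    return torch.argsort(dissim)

# Toy usage with random tensors standing in for segmented word images.
collection = deep_features(torch.rand(100, 1, 64, 160))
query = deep_features(torch.rand(1, 1, 64, 160))[0]
print(rank_by_cosine(query, collection)[:5])
```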