
    Matrix Factorization in Tropical and Mixed Tropical-Linear Algebras

    Matrix Factorization (MF) has found numerous applications in Machine Learning and Data Mining, including collaborative filtering recommendation systems, dimensionality reduction, data visualization, and community detection. Motivated by the recent successes of tropical algebra and geometry in machine learning, we investigate two problems involving matrix factorization over the tropical algebra. For the first problem, Tropical Matrix Factorization (TMF), which has already been studied in the literature, we propose an improved algorithm that avoids many of the local optima. The second formulation considers the approximate decomposition of a given matrix into the product of three matrices, where a usual matrix product is followed by a tropical product. This formulation has a very interesting interpretation in terms of learning the utility functions of multiple users. We also present numerical results illustrating the effectiveness of the proposed algorithms, as well as an application to recommendation systems with promising results.
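    The tropical product mentioned above replaces the usual sum-of-products with a max-of-sums: in the max-plus semiring, "addition" is max and "multiplication" is ordinary addition. As a minimal illustrative sketch (not the authors' algorithm, just the semiring operation itself):

```python
import numpy as np

def tropical_matmul(A, B):
    """Max-plus (tropical) matrix product:
    (A ⊗ B)[i, j] = max_k (A[i, k] + B[k, j]).
    In the tropical semiring, 'plus' is max and 'times' is ordinary +.
    """
    # Broadcast A's rows against B's columns, add, then max over the inner index k.
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])
C = tropical_matmul(A, B)
# C[0, 0] = max(A[0, 0] + B[0, 0], A[0, 1] + B[1, 0]) = max(1.0, 1.0) = 1.0
```

    A tropical factorization then seeks factors whose tropical product approximates a given matrix, which is what makes the problem piecewise-linear and prone to local optima.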

    Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos

    The recent state of the art on monocular 3D face reconstruction from image data has made some impressive advancements, thanks to the advent of Deep Learning. However, it has mostly focused on input coming from a single RGB image, overlooking the following important factors: a) Nowadays, the vast majority of facial image data of interest do not originate from single images but rather from videos, which contain rich dynamic information. b) Furthermore, these videos typically capture individuals in some form of verbal communication (public talks, teleconferences, audiovisual human-computer interactions, interviews, monologues/dialogues in movies, etc.). When existing 3D face reconstruction methods are applied to such videos, the artifacts in the reconstruction of the shape and motion of the mouth area are often severe, since they do not match well with the speech audio. To overcome the aforementioned limitations, we present the first method for visual speech-aware perceptual reconstruction of 3D mouth expressions. We do this by proposing a "lipread" loss, which guides the fitting process so that the elicited perception from the 3D reconstructed talking head resembles that of the original video footage. We demonstrate that, interestingly, the lipread loss is better suited for 3D reconstruction of mouth movements than traditional landmark losses, and even direct 3D supervision. Furthermore, the devised method does not rely on any text transcriptions or corresponding audio, rendering it ideal for training on unlabeled datasets. We verify the effectiveness of our method through exhaustive objective evaluations on three large-scale datasets, as well as subjective evaluation with two web-based user studies.

    A Novel Training Program to Improve Human Spatial Orientation: Preliminary Findings

    The ability to form a mental representation of the surroundings is a critical skill for spatial navigation and orientation in humans. Such a mental representation is known as a "cognitive map" and is formed as individuals familiarize themselves with their surroundings, providing detailed information about salient environmental landmarks and their spatial relationships. Despite evidence of the malleability and trainability of spatial orientation skills in humans, it remains unknown whether the specific ability to form cognitive maps can be improved by a purpose-built training program. Here, we present a newly developed computerized 12-day training program in a virtual environment designed specifically to stimulate the acquisition of this important skill. We asked 15 healthy volunteers to complete the training program and perform a comprehensive spatial behavioral assessment before and after the training. We asked participants to become familiar with the environment by navigating a small area before gradually building up to navigating the larger and more complex environment; we asked them to travel back and forth between environmental landmarks until they had built an understanding of where those landmarks resided with respect to one another. This process was repeated until participants had visited every landmark in the virtual town and had learned where each landmark resided with respect to the others. The results of this study confirmed the feasibility of the training program and suggested an improvement in the ability of participants to form mental representations of their spatial surroundings. This study provides preliminary findings on the feasibility of a 12-day program for training spatial orientation skills. We discuss the utility and potential impact of this training program in the lives of the many individuals affected by topographical disorientation as a result of an acquired or developmental condition.

    WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models

    Text-to-Image synthesis is the task of generating an image according to a specific text description. Generative Adversarial Networks have been considered the standard method for image synthesis virtually since their introduction; however, Denoising Diffusion Probabilistic Models are now setting a new baseline, with remarkable results in Text-to-Image synthesis, among other fields. Aside from its usefulness per se, such synthesis can also be particularly relevant as a tool for data augmentation to aid the training of models for other document image processing tasks. In this work, we present a latent diffusion-based method for styled text-to-text-content-image generation at the word level. Our proposed method manages to generate realistic word image samples from different writer styles, by using class index styles and text content prompts, without the need for adversarial training, writer recognition, or text recognition. We gauge system performance with Fréchet Inception Distance, writer recognition accuracy, and writer retrieval. We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to those of real data.
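    The Fréchet Inception Distance used above compares two feature distributions, each summarized as a Gaussian, via the Fréchet distance between those Gaussians. A hedged sketch of that underlying formula (the full metric additionally extracts the means and covariances from Inception-network features, which is omitted here):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2)).
    """
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        # Numerical noise in sqrtm can introduce tiny imaginary parts.
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical distributions give distance 0; shifting one mean increases it.
d0 = frechet_distance(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))
d1 = frechet_distance(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2))
```

    Lower values indicate that generated samples are statistically closer to real data in feature space.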