901 research outputs found

    Visual Realism Assessment for Face-swap Videos

    Deep-learning based face-swap videos, also known as deepfakes, are becoming increasingly realistic and deceptive. The malicious use of these face-swap videos has caused widespread concern. The research community has focused on the automatic detection of these fake videos, but the assessment of their visual realism, as perceived by human eyes, is still an unexplored dimension. Visual realism assessment, or VRA, is essential for assessing the potential impact of a specific face-swap video, and it is also important as a quality assessment metric for comparing different face-swap methods. In this paper, we take a small step towards this new VRA direction by building a benchmark for evaluating the effectiveness of different automatic VRA models, which range from traditional hand-crafted features to different kinds of deep-learning features. The evaluations are based on a recent competition dataset named DFGC 2022, which contains 1400 diverse face-swap videos annotated with Mean Opinion Scores (MOS) on visual realism. Comprehensive experimental results using 11 models and 3 protocols are shown and discussed. We demonstrate the feasibility of devising effective VRA models for assessing face-swap videos and methods. The particular usefulness of existing deepfake detection features for VRA is also noted. The code can be found at https://github.com/XianyunSun/VRA.git. Comment: Accepted by ICIG 202

    Phonetic accommodation to natural and synthetic voices : Behavior of groups and individuals in speech shadowing

    The present study investigates whether native speakers of German phonetically accommodate to natural and synthetic voices in a shadowing experiment. We aim to determine whether this phenomenon, which is frequently found in human-human interaction (HHI), also occurs in human-computer interaction (HCI) involving synthetic speech. The examined features pertain to different phonetic domains: allophonic variation, schwa epenthesis, realization of pitch accents, word-based temporal structure, and distribution of spectral energy. On the individual level, we found that the participants converged to varying subsets of the examined features, while they maintained their baseline behavior in other cases or, in rare instances, even diverged from the model voices. This shows that accommodation with respect to one particular feature may not predict the behavior with respect to another feature. On the group level, the participants in the natural condition converged to all features under examination, albeit only very subtly for schwa epenthesis. The synthetic voices, while partly reducing the strength of the effects found for the natural voices, triggered accommodating behavior as well. The predominant pattern for all voice types was convergence during the interaction followed by divergence after the interaction.

    Phonetic accommodation of human interlocutors in the context of human-computer interaction

    Phonetic accommodation refers to the phenomenon that interlocutors adapt their way of speaking to each other within an interaction. This can have a positive influence on the communication quality. As we increasingly use spoken language to interact with computers, the phenomenon of phonetic accommodation is also investigated in the context of human-computer interaction: on the one hand, to find out whether speakers adapt to a computer agent in a similar way as they do to a human interlocutor, and on the other hand, to implement accommodation behavior in spoken dialog systems and explore how this affects their users. To date, the focus has been mainly on the global acoustic-prosodic level. The present work demonstrates that speakers interacting with a computer agent also identify locally anchored phonetic phenomena such as segmental allophonic variation and local prosodic features as accommodation targets and converge on them. To this end, we conducted two experiments. First, we applied the shadowing method, in which the participants repeated short sentences from natural and synthetic model speakers. In the second experiment, we used the Wizard-of-Oz method, in which an intelligent spoken dialog system is simulated, to enable a dynamic exchange between the participants and a computer agent, the virtual language learning tutor Mirabella. The target language of our experiments was German. Phonetic convergence occurred in both experiments, with natural voices as well as with synthetic voices as stimuli. Moreover, both native and non-native speakers of the target language converged to Mirabella. Thus, accommodation could be relevant, for example, in the context of computer-assisted language learning. Individual variation in accommodation behavior can be attributed in part to speaker-specific characteristics, one of which is assumed to be the personality structure. We included the Big Five personality traits as well as the concept of mental boundaries in the analysis of our data. Different personality traits influenced accommodation to different types of phonetic features. Mental boundaries have not been studied before in the context of phonetic accommodation. We created a validated German adaptation of a questionnaire that assesses the strength of mental boundaries. This adaptation can be used in future studies involving mental boundaries in native speakers of German.
    Funding: Deutsche Forschungsgemeinschaft (DFG), project number 278805297: "Phonetische Konvergenz in der Mensch-Maschine-Kommunikation" (Phonetic Convergence in Human-Machine Communication).

    Phonetic accommodation in interaction with a virtual language learning tutor: A Wizard-of-Oz study

    We present a Wizard-of-Oz experiment examining phonetic accommodation of human interlocutors in the context of human-computer interaction. Forty-two native speakers of German engaged in dynamic spoken interaction with a simulated virtual tutor for learning the German language called Mirabella. Mirabella was controlled by the experimenter and used either natural or hidden Markov model-based synthetic speech to communicate with the participants. In the course of four tasks, the participants’ accommodating behavior with respect to wh-question realization and allophonic variation in German was tested. The participants converged to Mirabella with respect to modified wh-question intonation, i.e., rising F0 contour and nuclear pitch accent on the interrogative pronoun, and the allophonic contrast [ɪç] vs. [ɪk] occurring in the word ending -ig. They did not accommodate to the allophonic contrast [ɛː] vs. [eː] as a realization of the long vowel -ä-. The results did not differ between the experimental groups that communicated with either the natural or the synthetic speech version of Mirabella. Testing the influence of the “Big Five” personality traits on the accommodating behavior revealed a tendency for neuroticism to influence the convergence of question intonation. On the level of individual speakers, we found considerable variation with respect to the degree and direction of accommodation. We conclude that phonetic accommodation on the level of local prosody and segmental pronunciation occurs in users of spoken dialog systems, which could be exploited in the context of computer-assisted language learning.

    Speech Synthesis for the Generation of Artificial Personality


    Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

    The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in small-data environments (common in defence) in the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology). The absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources are identified as challenges. To address the latter, a novel general-purpose preprocessing pipeline specifically tailored for the energetic-materials domain is developed, and its effectiveness is evaluated. A study of the subjective concept of importance examines whether NLP tools in data-limited environments can supplement or completely replace human analysis. A methodology for directly comparing the ability of NLP tools and experts to identify important points in the text is presented. Results show that the participants of the study exhibit little agreement, even on which points in the text are important. The NLP tools, the expert (the author of the text being examined), and the participants only agree on general statements. However, as a group, the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot identify the important points in a technical document as effectively as an expert. A methodology for classifying journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed. Techniques to overcome challenges with using real-world data, such as class imbalances, are investigated. A methodology to evaluate the reliability of human annotations is presented. Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL. Open Access

    On Solving Selected Nonlinear Integer Programming Problems in Data Mining, Computational Biology, and Sustainability

    This thesis consists of three essays concerning the use of optimization techniques to solve four problems in the fields of data mining, computational biology, and sustainable energy devices. To the best of our knowledge, the particular problems we discuss have not been previously addressed using optimization, which is a specific contribution of this dissertation. In particular, we analyze each of the problems to capture their underlying essence, subsequently demonstrating that each problem can be modeled as a nonlinear (mixed) integer program. We then discuss the design and implementation of solution techniques to locate optimal solutions to the aforementioned problems. Running throughout this dissertation is the theme of using mixed-integer programming techniques in conjunction with context-dependent algorithms to identify optimal and previously undiscovered underlying structure.

    Identifying experts and authoritative documents in social bookmarking systems

    Social bookmarking systems allow people to create pointers to Web resources in a shared, Web-based environment. These services allow users to add free-text labels, or “tags”, to their bookmarks as a way to organize resources for later recall. Ease of use, low cognitive barriers, and a lack of controlled vocabulary have allowed social bookmarking systems to grow exponentially over time. However, these same characteristics also raise concerns. Tags lack the formality of traditional classificatory metadata and suffer from the same vocabulary problems as full-text search engines. It is unclear how many valuable resources are untagged or tagged with noisy, irrelevant tags. With few barriers to entry, annotation spamming adds noise to public social bookmarking systems. Furthermore, many algorithms for discovering semantic relations among tags do not scale to the Web. Recognizing these problems, we develop a novel graph-based Expert and Authoritative Resource Location (EARL) algorithm to find the most authoritative documents and expert users on a given topic in a social bookmarking system. In EARL’s first phase, we reduce noise in a Delicious dataset by isolating a smaller sub-network of “candidate experts”, users whose tagging behavior shows potential domain and classification expertise. In the second phase, a HITS-based graph analysis is performed on the candidate experts’ data to rank the top experts and authoritative documents by topic. To identify topics of interest in Delicious, we develop a distributed method to find subsets of frequently co-occurring tags shared by many candidate experts. We evaluated EARL’s ability to locate authoritative resources and domain experts in Delicious by conducting two independent experiments. The first experiment relies on human judges’ n-point scale ratings of resources suggested by three graph-based algorithms and Google. The second experiment evaluated the proposed approach’s ability to identify classification expertise through human judges’ n-point scale ratings of classification terms versus expert-generated data.
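The HITS-style ranking mentioned in the abstract above can be illustrated with a minimal sketch. This is not the EARL implementation: it assumes a plain bipartite graph of (user, document) bookmark pairs, treats users as hubs and documents as authorities, and omits the candidate-expert filtering and the per-topic step. The function name `hits` and the sample data are hypothetical.

```python
import math


def hits(edges, iterations=50):
    """Textbook HITS on a bipartite bookmark graph (illustrative, not EARL).

    edges: iterable of (user, document) pairs; users act as hubs,
    documents act as authorities.
    Returns two dicts: hub scores per user, authority scores per document.
    """
    edges = list(edges)
    users = {u for u, _ in edges}
    docs = {d for _, d in edges}
    hub = {u: 1.0 for u in users}
    auth = {d: 1.0 for d in docs}

    for _ in range(iterations):
        # Authority update: sum the hub scores of users who bookmarked the document.
        auth = {d: 0.0 for d in docs}
        for u, d in edges:
            auth[d] += hub[u]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {d: v / norm for d, v in auth.items()}

        # Hub update: sum the authority scores of the documents a user bookmarked.
        new_hub = {u: 0.0 for u in users}
        for u, d in edges:
            new_hub[u] += auth[d]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {u: v / norm for u, v in new_hub.items()}

    return hub, auth


# Hypothetical sample data: who bookmarked what.
bookmarks = [
    ("alice", "doc1"), ("alice", "doc2"),
    ("bob", "doc1"), ("bob", "doc2"),
    ("carol", "doc1"), ("carol", "doc3"),
]
hub, auth = hits(bookmarks)
top_document = max(auth, key=auth.get)
```

In this toy graph the document bookmarked by every user receives the highest authority score, and users who bookmarked the more widely shared documents receive higher hub scores than the user who bookmarked a document nobody else saved.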