
    Inducing Stereotypical Character Roles from Plot Structure

    If we are to understand stories, we must understand characters: characters are central to every narrative and drive the action forward. Critically, many stories (especially cultural ones) employ stereotypical character roles for different purposes, including communicating bundles of default characteristics and associations efficiently and easing the understanding of those characters' role in the overall narrative, among others. These roles include ideas such as hero, villain, or victim, as well as culturally specific roles such as the donor (in Russian tales) or the trickster (in Native American tales). My thesis aims to learn these roles automatically, inducing them from data using a clustering technique. The first step of learning character roles, however, is to identify which coreference chains correspond to characters, which narratologists define as animate entities that drive the plot forward. The first part of my work has focused on this character identification problem, specifically on animacy detection. Prior work treated animacy as a word-level property, and researchers developed statistical models to classify words as either animate or inanimate. I claimed that this formulation of the problem is ill-posed and presented a new hybrid approach for classifying the animacy of coreference chains that achieved state-of-the-art performance. The next step of my work is to develop approaches first to identify the characters and then to learn stereotypical roles with a new unsupervised clustering approach. My character identification system consists of two stages: first, I detect animate chains among the coreference chains using my existing animacy detector; second, I apply a supervised machine learning model that identifies which of those chains qualify as characters. I proposed a narratologically grounded definition of character and built a supervised machine learning model with a small set of features that achieved state-of-the-art performance. In the last step, I successfully implemented a clustering approach with plot and thematic information to cluster the archetypes. This work resulted in a completely new approach to understanding the structure of stories, greatly advancing the state of the art in story understanding.

    Proceedings of the 4th International Workshop on Reading Music Systems

    The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners who could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use cases related to written music. These are the proceedings of the 4th International Workshop on Reading Music Systems, held online on Nov. 18th, 2022. Comment: Proceedings edited by Jorge Calvo-Zaragoza, Alexander Pacha and Elona Shatr

    Methods for Addressing Data Diversity in Automatic Speech Recognition

    The performance of speech recognition systems is known to degrade in mismatched conditions, where the acoustic environment and the speaker population differ significantly between the training and target test data. Performance degradation due to this mismatch is widely reported in the literature, particularly for diverse datasets. This thesis approaches the mismatch problem in diverse datasets with various strategies, including data refinement, variability modelling and speech recognition model adaptation. These strategies are realised in six novel contributions. The first contribution is a data subset selection technique that uses a likelihood ratio, derived from a target test set, to quantify mismatch. The second contribution is a multi-style training method using data augmentation. The existing training data is augmented using a distribution of variabilities learnt from a target dataset, resulting in a matched set. The third contribution is a new approach for genre identification in diverse media data with the aim of reducing the mismatch in an adaptation framework. The fourth contribution is a novel method that performs unsupervised domain discovery using latent Dirichlet allocation. Since the latent domains correlate highly with some subjective meta-data tags, such as genre labels of media data, features derived from the latent domains are successfully applied to the genre and broadcast show identification tasks. The fifth contribution extends the latent modelling technique to acoustic model adaptation, where latent-domain-specific models are adapted from a base model. As the sixth contribution, an alternative adaptation approach is proposed in which subspace adaptation of deep neural network acoustic models is performed using the proposed latent-domain-aware training procedure. All of the proposed techniques for mismatch reduction are verified using diverse datasets. Using data selection, data augmentation and latent-domain model adaptation methods, the mismatch between the training and testing conditions of diverse ASR systems is reduced, resulting in more robust speech recognition systems.
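    The idea behind the first contribution, likelihood-ratio data selection, can be illustrated with a minimal sketch. The abstract does not say which models the thesis uses, so everything below is an assumption for illustration: one-dimensional features, univariate Gaussians standing in for the target and general models, and hypothetical function names. Each training item is scored by how much more likely it is under a model of the target set than under a model of the whole training pool, and the top-scoring items are kept.

```python
import math


def fit_gaussian(values):
    """Return (mean, variance) of a 1-D sample; the variance is floored to stay positive."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)


def log_likelihood(x, mean, var):
    """Log density of x under a univariate Gaussian (illustrative stand-in model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def select_by_likelihood_ratio(pool, target, k):
    """Keep the k pool items that look most like the target set.

    Score = log p(x | target model) - log p(x | pool model).
    """
    target_model = fit_gaussian(target)
    pool_model = fit_gaussian(pool)
    ranked = sorted(
        pool,
        key=lambda x: log_likelihood(x, *target_model) - log_likelihood(x, *pool_model),
        reverse=True,
    )
    return ranked[:k]


# Toy data: a broad training pool and a narrow target set (made-up numbers).
pool = [0.0, 0.5, 1.0, 4.8, 5.0, 5.2, 9.0, 10.0]
target = [4.9, 5.0, 5.1, 5.3]
selected = select_by_likelihood_ratio(pool, target, k=3)
print(selected)
```

    In this toy case the three pool items closest to the target distribution are retained; a real system would replace the Gaussians with acoustic or language models and the scalar features with utterance-level statistics.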

    Advances in Binders for Construction Materials

    Global binder production for construction materials is approximately 7.5 billion tons per year, contributing ~6% of global anthropogenic atmospheric CO2 emissions. Reducing this carbon footprint is a key aim of the construction industry, and current research focuses on developing innovative ways to attain more sustainable binders and concrete/mortars as a real alternative for meeting the current global demand for Portland cement. With this aim, several potential alternative binders are currently being investigated by scientists worldwide, based on calcium aluminate cement, calcium sulfoaluminate cement, alkali-activated binders, calcined clay limestone cements, nanomaterials, or supersulfated cements. This Special Issue presents contributions that address research and practical advances in i) alternative binder manufacturing processes; ii) chemical, microstructural, and structural characterization of unhydrated binders and of hydrated systems; iii) the properties and modelling of concrete and mortars; iv) applications and durability of concrete and mortars; and v) the conservation and repair of historic concrete/mortar structures using alternative binders. We believe this Special Issue will be of high interest to the binder industry and construction community, based upon the novelty and quality of the results and the real potential for applying the findings in practice and industry.

    Monitoring of wooden constructions - a key to long service life?

    Pattern-based refactoring in model-driven engineering

    Model-Driven Engineering (MDE) is a software engineering paradigm that uses models as first-class concepts from which validation, code, testing, and documentation are derived. This paradigm involves various artifacts such as models, meta-models, or model transformation programs. In an industrial context, these artifacts are increasingly complex. In particular, their maintenance is time- and resource-consuming. In order to reduce the complexity of artifacts and the cost of their maintenance, many researchers have been interested in refactoring these artifacts to improve their quality. In this thesis, we propose to study refactoring in MDE holistically, by applying it to these different artifacts. First, we use specific design patterns, as an example of prior knowledge, applied to model transformations to enable refactoring. We begin with a detection phase that finds design patterns in different forms and at different levels of completeness. The detected occurrences thus form refactoring opportunities that will be exploited to implement more desirable and/or more complete forms of these design patterns. In the absence of prior knowledge, such as design patterns, we propose an approach based on genetic programming to learn transformation rules capable of detecting refactoring opportunities and correcting them. As an alternative to prior knowledge, our approach uses examples of pairs of artifacts before and after refactoring in order to learn refactoring rules. We illustrate this approach on model refactoring.

    Advances in identifiability of nonlinear probabilistic models

    Identifiability is a highly prized property of statistical models. This thesis investigates this property in nonlinear models encountered in two fields of statistics: representation learning and causal discovery. In representation learning, identifiability leads to learning interpretable and reproducible representations, while in causal discovery, it is necessary for the estimation of correct causal directions. We begin by leveraging recent advances in nonlinear ICA to show that the latent space of a VAE is identifiable up to a permutation and pointwise nonlinear transformations of its components. Our result requires a factorized prior distribution over the latent variables conditioned on an auxiliary observed variable, such as a class label or nearly any other observation. We also extend previous identifiability results in nonlinear ICA to the case of noisy or undercomplete observations, and incorporate them into a maximum likelihood framework. Our second contribution is to develop the Independently Modulated Component Analysis (IMCA) framework, a generalization of nonlinear ICA to non-independent latent variables. We show that we can drop the independence assumption in ICA while maintaining identifiability, resulting in a very flexible and generic framework for principled disentangled representation learning. This finding is predicated on the existence of an auxiliary variable that modulates the joint distribution of the latent variables in a factorizable manner. As a third contribution, we extend the identifiability theory to a broad family of conditional energy-based models (EBMs). This novel model generalizes earlier results by removing any distributional assumptions on the representations, which are ubiquitous in the latent variable setting. The conditional EBM can learn identifiable overcomplete representations and has universal approximation capabilities. Finally, we investigate a connection between the framework of autoregressive normalizing flow models and causal discovery. Causal models derived from affine autoregressive flows are shown to be identifiable, generalizing the well-known additive noise model. Using normalizing flows, we can compute the exact likelihood of the causal model, which is subsequently used to derive a likelihood ratio measure for causal discovery. Normalizing flows are also invertible, making them well suited to causal inference tasks such as interventions and counterfactuals.
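    The likelihood-ratio measure mentioned at the end of this abstract can be made concrete for two variables. The sketch below assumes the common affine autoregressive parameterisation; the symbols s, t, z, p_z, and R are notation chosen here for illustration and do not come from the abstract. For the candidate direction x_1 → x_2, the flow and its exact log-likelihood (by the change-of-variables formula, with a triangular Jacobian) are:

```latex
\begin{align}
x_1 &= e^{s_1} z_1 + t_1, \\
x_2 &= e^{s_2(x_1)} z_2 + t_2(x_1), \\
\log p_{1 \to 2}(x_1, x_2) &= \log p_z(z_1) + \log p_z(z_2) - s_1 - s_2(x_1), \\
R &= \log p_{1 \to 2}(x_1, x_2) - \log p_{2 \to 1}(x_1, x_2).
\end{align}
```

    Here z_1 = (x_1 - t_1) e^{-s_1} and z_2 = (x_2 - t_2(x_1)) e^{-s_2(x_1)} are the recovered noise terms, and p_{2 → 1} is the same model fitted with the roles of x_1 and x_2 swapped. Under these assumptions, a positive R (summed over the data) favours the direction x_1 → x_2, and holding s_2 constant recovers an additive noise model.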