85 research outputs found

    Incorporating Annotator Uncertainty into Representations of Discourse Relations

    Annotating discourse relations is known to be difficult, especially for non-expert annotators. In this paper, we investigate novice annotators' uncertainty when annotating discourse relations in spoken conversational data. We find that dialogue context (single turn, pair of turns within speaker, and pair of turns across speakers) is a significant predictor of confidence scores. We compute distributed representations of discourse relations from co-occurrence statistics that incorporate information about confidence scores and dialogue context. We perform a hierarchical clustering analysis using these representations and show that weighting discourse relation representations with information about confidence and dialogue context coherently models our annotators' uncertainty about discourse relation labels.

    DisCGen: A Framework for Discourse-Informed Counterspeech Generation

    Counterspeech can be an effective method for battling hateful content on social media, and automated counterspeech generation can aid in this process. Generated counterspeech, however, is viable only when grounded in the context of topic, audience, and sensitivity, as these factors influence both its efficacy and its appropriateness. In this work, we propose a novel framework based on theories of discourse to study the inferential links that connect counterspeech to the hateful comment. Within this framework, we propose: i) a taxonomy of counterspeech derived from discourse frameworks, and ii) discourse-informed prompting strategies for generating contextually-grounded counterspeech. To construct and validate this framework, we present a process for collecting an in-the-wild dataset of counterspeech from Reddit. Using this process, we manually annotate a dataset of 3.9k Reddit comment pairs for the presence of hate speech and counterspeech. The positive pairs are annotated for 10 classes in our proposed taxonomy. We annotate these pairs with paraphrased counterparts to remove offensiveness and first-person references. We show that by using our dataset and framework, large language models can generate contextually-grounded counterspeech informed by theories of discourse. According to our human evaluation, our approaches can act as a safeguard against critical failures of discourse-agnostic models. Comment: IJCNLP-AACL, 202

    The roles of redox active cofactors in catalysis : structural studies of iron sulfur cluster and flavin dependent enzymes

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemistry, 2013. Cataloged from PDF version of thesis. Includes bibliographical references. Cofactors are highly prevalent in biological systems and have evolved to take on many functions in enzyme catalysis. Two cofactors, flavin adenine dinucleotide (FAD) and [4Fe-4S] clusters, were originally determined to aid in electron transfer and redox chemistry. However, additional activities for these cofactors continue to be discovered. The study of FAD in the context of rebeccamycin and staurosporine biosynthesis has yielded another role for this cofactor in the enzyme StaC. A homolog of this enzyme, RebC, uses its FAD cofactor in the oxidation of 7-carboxy-K252c. StaC also uses 7-carboxy-K252c as a substrate, but its reaction does not result in a redox transformation. Biochemical and X-ray crystallographic methods were employed to determine that, indeed, the role of FAD in the StaC system is not to catalyze redox chemistry. Instead, FAD sterically drives an initial decarboxylation event. Subtle differences in the active sites of RebC and StaC promote this redox-neutral decarboxylation by activating water for a final protonation step. In another system, the characterization of the S-adenosyl-L-methionine (AdoMet) radical superfamily showed the versatility of these cofactors. In this superfamily, which includes over 40,000 unique sequences, [4Fe-4S] clusters are responsible for the initiation of radical chemistry. A recently described subclass of this superfamily, the dehydrogenases, requires additional [4Fe-4S] clusters for activity. This requirement led to the hypothesis that these enzymes catalyze redox chemistry by directly ligating substrates to auxiliary (Aux) clusters.
X-ray structures of 2-deoxy-scyllo-inosamine dehydrogenase (BtrN), required for the biosynthesis of 2-deoxystreptamine, and an anaerobic sulfatase maturating enzyme, anSMEcpe, which installs a required formylglycine posttranslational modification, refute this hypothesis. In these structures, substrate binding is distal from each enzyme's Aux clusters. However, the Aux cluster binding architecture shared between BtrN, anSMEcpe, and another AdoMet radical enzyme, MoaA, involved in molybdenum cofactor biosynthesis, suggests that these structural features will be a staple in the AdoMet radical superfamily, common to ~30% of the AdoMet radical reactions. By Peter John Goldman. Ph.D.

    Supervision distante pour l'apprentissage de structures discursives dans les conversations multi-locuteurs

    The main objective of this thesis is to improve the automatic capture of semantic information with the goal of modeling and understanding human communication. We have advanced the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools to help explore conversations. These include the production of automatic summaries, recommendations, dialogue act detection, identification of decisions, planning, and semantic relations between dialogue acts in order to understand dialogues. In multi-party conversations it is important to understand not only the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads. An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed.
Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we do not have nearly enough data to train these machine learning models, much less deep learning models. Another problem is that, arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long-distance relations, in which an utterance is semantically connected not to the immediately preceding utterance but to another utterance from further back in the conversation, are particularly difficult and rare, though often central to comprehension. It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats. This is one goal this thesis achieves. In addition, we wanted not only to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.

    Big Social Data and GIS: Visualize Predictive Crime

    Social media is a desirable Big Data source used to examine the relationship between crime and social behavior. Observation of this connection is enriched within a geographic information system (GIS) rooted in environmental criminology theory, and produces several different results to substantiate such a claim. This paper presents the construction and implementation of a GIS artifact producing visualization and statistical outcomes to develop evidence that supports predictive crime analysis. An information system research prototype guides inquiry and uses crime as the dependent variable and a social media tweet corpus, operationalized via natural language processing, as the independent variable. This inescapable realization of social media as a predictive crime variable is prudent; researchers and practitioners will better appreciate its capability. Inclusive visual and statistical results are novel, represent state-of-the-art predictive analysis, increase the baseline R² value by 7.26%, and support future predictive crime-based research when front-run with real-time social media.

    Mechanisms of cognitive reserve : computational and experimental explorations

    Cognitive reserve is the name given to the latent variable that describes individual differences in the ability to offset cognitive decline in old age. This thesis attempts to provide mechanistic explanations for two major aspects of cognitive reserve: neural compensation and neural reserve. Furthermore, behavioural experiments carried out as part of this investigation have extended the knowledge of existing theories as to the age invariance of neural compensation and the relationship between language, other more traditional proxies of cognitive reserve, and executive control. The studies carried out in this thesis have demonstrated a biologically viable mechanism for the monitoring of task demand with resultant control of interhemispheric communication as a method of compensation. Further, this aspect of neural compensation was not found in younger participants. The neural network model in this thesis demonstrated differences over age in the spacing of representations for bilingual and monolingual networks as well as demonstrating increased inhibition in the bilingual network as a result of a negative relationship between weights from the tags of each language to nodes in the hidden layer. Finally, regression analysis using data from two large-scale behavioural experiments demonstrated a minimal influence of bilingual language use on performance in executive control tasks. The models in this thesis provide an insight into the mechanisms behind cognitive reserve whilst supporting empirical results. Further, the results from the neural network model allowed predictions to be made with regard to the performance of bilinguals in dual category retrieval tasks. The lack of a relationship between bilingualism and cognitive control is supported by emerging research in the area and suggests that the functionality underlying cognitive reserve may be better described by biological rather than cognitive processes.

    Cochlear transcriptome analysis of an outbred mouse population (CFW)

    Age-related hearing loss (ARHL) is the most common cause of hearing loss and one of the most prevalent conditions affecting the elderly worldwide. Despite evidence from our lab and others about its polygenic nature, little is known about the specific genes, cell types, and pathways involved in ARHL, impeding the development of therapeutic interventions. In this manuscript, we describe, for the first time, the complete cell-type specific transcriptome of the aging mouse cochlea using snRNA-seq in an outbred mouse model in relation to auditory threshold variation. Cochlear cell types were identified using unsupervised clustering and annotated via a three-tiered approach: first by linking to expression of known marker genes; then by using the NSForest algorithm to select minimum cluster-specific marker genes and reduce dimensional feature space for statistical comparison of our clusters with existing publicly available data sets on the gEAR website; and finally by validating and refining the annotations using Multiplexed Error Robust Fluorescence In Situ Hybridization (MERFISH) with the cluster-specific marker genes as probes. We report 60 unique cell types, more than doubling the number of defined cochlear cell types. Importantly, we show significant increases and decreases in specific cell types associated with loss of hearing acuity, implicating specific subsets of hair cell subtypes, ganglion cell subtypes, and cell subtypes within the stria vascularis in this model of ARHL. These results provide a view into the cellular and molecular mechanisms responsible for age-related hearing loss and pathways for therapeutic targeting.

    Sharing integrated spatial and thematic data : the CRISOLA case for Malta and the European project Plan4all process

    Sharing data across diverse thematic disciplines is only the next step in a series of hard-fought efforts to ensure barrier-free data availability. The Plan4all project is one such effort, focusing on the interoperability and harmonisation of spatial planning data as based on the INSPIRE protocols. The aims are to support holistic planning and the development of a European network of public and private actors as well as Spatial Data Infrastructure (SDI). The Plan4all and INSPIRE standards enable planners to publish and share spatial planning data. The Malta case tackled the wider scenario for sharing of data, through the investigation of the availability, transformation and dissemination of data using geoportals. The study is brought to the fore with an analysis of the approaches taken to ensure that data in the physical and social domains are harmonised in an internationally-established process. Through an analysis of the criminological theme, the Plan4all process is integrated with the social and land use themes as identified in the CRISOLA model. The process serves as a basis for the need to view sharing as one part of the datacycle rather than an end in itself: without a solid protocol the foundations have been laid for the implementation of the datasets in the social and crime domains. Peer-reviewed.

    Ernst Denert Award for Software Engineering 2020

    This open access book provides an overview of the dissertations of the eleven nominees for the Ernst Denert Award for Software Engineering in 2020. The prize, kindly sponsored by the Gerlind & Ernst Denert Stiftung, is awarded for excellent work within the discipline of Software Engineering, which includes methods, tools and procedures for better and efficient development of high quality software. An essential requirement for the nominated work is its applicability and usability in industrial practice. The book contains eleven papers that describe the works by Jonathan Brachthäuser (EPFL Lausanne) entitled What You See Is What You Get: Practical Effect Handlers in Capability-Passing Style, Mojdeh Golagha’s (Fortiss, Munich) thesis How to Effectively Reduce Failure Analysis Time?, Nikolay Harutyunyan’s (FAU Erlangen-Nürnberg) work on Open Source Software Governance, Dominic Henze’s (TU Munich) research about Dynamically Scalable Fog Architectures, Anne Hess’s (Fraunhofer IESE, Kaiserslautern) work on Crossing Disciplinary Borders to Improve Requirements Communication, Istvan Koren’s (RWTH Aachen U) thesis DevOpsUse: A Community-Oriented Methodology for Societal Software Engineering, Yannic Noller’s (NU Singapore) work on Hybrid Differential Software Testing, Dominic Steinhofel’s (TU Darmstadt) thesis entitled Ever Change a Running System: Structured Software Reengineering Using Automatically Proven-Correct Transformation Rules, Peter Wägemann’s (FAU Erlangen-Nürnberg) work Static Worst-Case Analyses and Their Validation Techniques for Safety-Critical Systems, Michael von Wenckstern’s (RWTH Aachen U) research on Improving the Model-Based Systems Engineering Process, and Franz Zieris’s (FU Berlin) thesis on Understanding How Pair Programming Actually Works in Industry: Mechanisms, Patterns, and Dynamics – which actually won the award. 
The chapters describe key findings of the respective works, show their relevance and applicability to practice and industrial software engineering projects, and provide additional information and findings that were discovered only afterwards, e.g. when applying the results in industry. This way, the book is interesting not only to other researchers, but also to industrial software professionals who would like to learn about the application of state-of-the-art methods in their daily work.