90 research outputs found

    Incorporating Annotator Uncertainty into Representations of Discourse Relations

    Annotation of discourse relations is known to be a difficult task, especially for non-expert annotators. In this paper, we investigate novice annotators' uncertainty when annotating discourse relations in spoken conversational data. We find that dialogue context (single turn, pair of turns within speaker, and pair of turns across speakers) is a significant predictor of confidence scores. We compute distributed representations of discourse relations from co-occurrence statistics that incorporate information about confidence scores and dialogue context. We perform a hierarchical clustering analysis using these representations and show that weighting discourse relation representations with information about confidence and dialogue context coherently models our annotators' uncertainty about discourse relation labels.

    Will it Unblend?

    Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as *innoventor*, are one particularly challenging class of OOV, as they are formed by fusing together two or more bases that relate to the intended meaning in unpredictable manners and degrees. In this work, we run experiments on a novel dataset of English OOV blends to quantify the difficulty of interpreting the meanings of blends by large-scale contextual language models such as BERT. We first show that BERT's processing of these blends does not fully access the component meanings, leaving their contextual representations semantically impoverished. We find this is mostly due to the loss of characters resulting from blend formation. Then, we assess how easily different models can recognize the structure and recover the origin of blends, and find that context-aware embedding systems outperform character-level and context-free embeddings, although their results are still far from satisfactory.

    The distribution of discourse relations within and across turns in spontaneous conversation

    Time pressure and topic negotiation may impose constraints on how people leverage discourse relations (DRs) in spontaneous conversational contexts. In this work, we adapt a system of DRs for written language to spontaneous dialogue using crowdsourced annotations from novice annotators. We then test whether discourse relations are used differently across several types of multi-utterance contexts. We compare the patterns of DR annotation within and across speakers and within and across turns. Ultimately, we find that different discourse contexts produce distinct distributions of discourse relations, with single-turn annotations creating the most uncertainty for annotators. Additionally, we find that the discourse relation annotations are of sufficient quality to predict from embeddings of discourse units. Comment: Proceedings of Computational Approaches to Discourse 2023, collocated with the 2023 meeting of the Association for Computational Linguistics, Toronto, Canada.

    Remembering you read “doctoral dissertation”: Phrase frequency effects in recall and recognition memory

    Speakers understand and produce common words like cat more easily than less common words like panther. The same pattern shows up at larger levels: common combinations of words like alcoholic beverages are processed more quickly than less common ones like psychic nephew. As a result, many researchers have concluded that these combinations of words have word-like representations in long-term memory, as a way of explaining how both words and phrases can be easier to process the more common they are. This dissertation challenges these assumptions by using episodic memory tasks such as yes-no recognition and immediate free recall of combinations of words, under the premise that word-like representations for phrases should lead to word-like patterns of episodic memory. The results and a corresponding verbal model demonstrate that combinations of words are processed more easily not because phrases have the same structures as words, but because of the strength of association between the two words within a phrase, which leads to facilitated processing.

    Knowing a thing is "a thing": The use of acoustic features in multiword expression extraction

    Speakers of a language need to have complex linguistic representations for speaking, often on the level of non-literal, idiomatic expressions like black sheep. Typically, datasets of these so-called multiword expressions come from hand-crafted ontologies or lexicons, because identifying expressions like these in an unsupervised manner is still an unsolved problem in natural language processing. In this thesis I demonstrate that prosodic features, which are helpful in parsing syntax and interpreting meaning, can also be used to identify multiword expressions. To do this, I extracted noun phrases from the Buckeye corpus, which contains spontaneous spoken language, and matched these noun phrases to page titles in Wikipedia, a massive, freely available encyclopedic ontology of entities and phenomena. Incorporating prosodic features into a model that distinguishes between multiword expressions that are found in Wikipedia titles and those that are not yields increases in classifier performance. These increases suggest that prosodic cues can help with the automatic extraction of multiword expressions from spontaneous speech, helping models and potentially listeners decide whether something is "a thing" or not.

    Technology Transfer and Innovation Policy at Canadian Universities: Opportunities and Social Costs

    This report, supported by a Social Sciences and Humanities Research Council (SSHRC) Knowledge Synthesis Grant, critically examines the role of universities in transmitting knowledge in the forms of technology transfer mechanisms, intellectual property agreements and other knowledge diffusion policies. In reviewing and synthesizing the recent literature on the topic, we seek to provide some initial evidence-based policy recommendations in order to generally strengthen Canada’s innovation ecosystem and more specifically to maximize the return on the nation’s investment in higher education research and development.

    Downstream behavioral and electrophysiological consequences of word prediction on recognition memory

    Data Availability: The datasets generated for this study are available on request to the corresponding author. Funding: This work was supported by National Institute on Aging Grant R01-AG026308, as well as a James S. McDonnell Foundation Scholar Award to KF. JR was partially supported by NWO Veni grant 275-89-032. Peer reviewed.

    A survey tool for measuring evidence-based decision making capacity in public health agencies

    BACKGROUND: While increasing attention is placed on using evidence-based decision making (EBDM) to improve public health, there is little research assessing the current EBDM capacity of the public health workforce. Public health agencies serve a wide range of populations with varying levels of resources. Our survey tool allows an individual agency to collect data that reflects its unique workforce. METHODS: Health department leaders and academic researchers collaboratively developed and conducted cross-sectional surveys in Kansas and Mississippi (USA) to assess EBDM capacity. Surveys were delivered to state- and local-level practitioners and community partners working in chronic disease control and prevention. The core component of the surveys was adopted from a previously tested instrument and measured gaps (importance versus availability) in competencies for EBDM in chronic disease. Other survey questions addressed expectations and incentives for using EBDM, self-efficacy in three EBDM skills, and estimates of EBDM within the agency. RESULTS: In both states, participants identified communication with policymakers, use of economic evaluation, and translation of research to practice as top competency gaps. Self-efficacy in developing evidence-based chronic disease control programs was lower than in finding or using data. Public health practitioners estimated that approximately two-thirds of programs in their agency were evidence-based. Mississippi participants indicated that health department leaders' expectations for the use of EBDM were approximately twice those of co-workers, and that the use of EBDM could be increased with training and leadership prioritization. CONCLUSIONS: The assessment of EBDM capacity in Kansas and Mississippi built upon previous nationwide findings to identify top gaps in core competencies for EBDM in chronic disease and to estimate a percentage of programs in U.S. health departments that are evidence-based. The survey can serve as a valuable tool for other health departments and non-governmental organizations to assess EBDM capacity within their own workforce and to assist in the identification of approaches that will enhance the uptake of EBDM processes in public health programming and policymaking. Localized survey findings can provide direction for focusing workforce training programs and can indicate the types of incentives and policies that could affect the culture of EBDM in the workplace.