
    Forced Alignment for Understudied Language Varieties: Testing Prosodylab-Aligner with Tongan Data

    Get PDF
    Automated alignment of transcriptions to audio files expedites the process of preparing data for acoustic analysis. Unfortunately, the benefits of auto-alignment have generally been available only to researchers studying majority languages, for which large corpora exist and for which acoustic models have been created by large-scale research projects. Prosodylab-Aligner (PL-A), from McGill University, facilitates automated alignment and segmentation for understudied languages. It allows researchers to train acoustic models using the same audio files for which alignments will be created. Those models can then be used to create time-aligned Praat TextGrids with word and phone boundaries marked. For the benefit of others who wish to use PL-A for research projects, this paper reports on our use of PL-A on Tongan field recordings, reviewing the software, outlining the required steps, and providing tips. Since field recordings often contain more background noise than the laboratory recordings for which PL-A was designed, the paper also discusses the relative benefits of removing background noise for both training and alignment purposes. Finally, it compares acoustic measures based on various alignments and compares boundary placements with those of human aligners, demonstrating that automated alignment is both feasible and less time-consuming than manual alignment. National Foreign Language Resource Center
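    The boundary comparison described above can be illustrated with a minimal sketch. The code below is hypothetical (it is not the paper's evaluation script and assumes both alignments segment the utterance into the same number of intervals); it computes the mean absolute offset between automated and manually placed boundaries.

```python
# Hypothetical sketch (not the paper's actual evaluation code): comparing
# boundary placements from an automated alignment against a manual one by
# computing the mean absolute offset between corresponding boundaries.

def boundary_offsets(auto_intervals, manual_intervals):
    """Each alignment is a list of (label, start_sec, end_sec) tuples,
    assumed to cover the same utterance with the same number of segments."""
    offsets = []
    for (_, a_start, a_end), (_, m_start, m_end) in zip(auto_intervals, manual_intervals):
        offsets.append(abs(a_start - m_start))
        offsets.append(abs(a_end - m_end))
    return offsets

# Invented example intervals for illustration only.
auto = [("t", 0.10, 0.18), ("o", 0.18, 0.31), ("ng", 0.31, 0.44)]
manual = [("t", 0.11, 0.19), ("o", 0.19, 0.30), ("ng", 0.30, 0.45)]
offsets = boundary_offsets(auto, manual)
print(f"mean absolute boundary offset: {sum(offsets) / len(offsets) * 1000:.1f} ms")
```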

    Analogical reasoning in uncovering the meaning of digital-technology terms: the case of backdoor

    Full text link
    [EN] The paper substantiates the critical role of analogical reasoning and figurative language in resolving the ambiguity of cybersecurity terms in various expert communities. Dwelling on the divergent interpretations of a backdoor, it uncovers the potential of metaphor to serve both as an interpretative mechanism and as a framing tool in the ongoing digital technologies discourse. By combining methods of corpus research and frame semantics analysis, the study examines the challenges of unpacking the meaning of the contested concept of the backdoor. The paper proposes a qualitatively new metaphor-facilitated mode of interpreting cybersecurity vulnerabilities based on MetaNet deep semantic metaphor analysis and outlines the merits of this hierarchically organized metaphor and frames ontology. The utility of the method is demonstrated through analyzing corpus data and top-down extraction of metaphors (linguistic metaphor – conceptual metaphor – entailed metaphor – inferences), with subsequent identification of the metaphor families dominating the cybersecurity discourse. The paper further claims that the predominant metaphors prompt certain decisions and solutions affecting information security policies. Skrynnikova, I.V. (2020). Analogical reasoning in uncovering the meaning of digital-technology terms: the case of backdoor. Journal of Computer-Assisted Linguistic Research, 4(1), 23-46. https://doi.org/10.4995/jclr.2020.12921

    PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

    Full text link
    Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents. Despite the growing prevalence of AI-driven assistants in the root cause analysis process, their effectiveness in assisting on-call engineers is constrained by low accuracy due to the intrinsic difficulty of the task, a propensity for LLM-based approaches to hallucinate, and difficulties in distinguishing these well-disguised hallucinations. To address this challenge, we propose to perform confidence estimation for the predictions to help on-call engineers decide whether to adopt the model prediction. Considering the black-box nature of many LLM-based root cause predictors, fine-tuning or temperature-scaling-based approaches are inapplicable. We therefore design an innovative confidence estimation framework based on prompting retrieval-augmented large language models (LLMs) that demands only a minimal amount of information from the root cause predictor. This approach consists of two scoring phases: the LLM-based confidence estimator first evaluates its confidence in making judgments about the current incident, reflecting its "grounded-ness" level with respect to the reference data, and then rates the root cause prediction against historical references. An optimization step combines these two scores into a final confidence assignment. We show that our method produces calibrated confidence estimates for predicted root causes, and we validate the usefulness of the retrieved historical data and the prompting strategy, as well as the generalizability across different root cause prediction models. Our study takes an important step towards reliably and effectively embedding LLMs into cloud incident management systems.
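    As a rough illustration of combining two such scores into a single calibrated confidence, the hypothetical sketch below (not the PACE-LM method itself) fits a weighted logistic combination on held-out examples with correctness labels; the coarse grid search merely stands in for the optimization step mentioned in the abstract, and all scores and labels are invented.

```python
import math

# Hypothetical sketch, not the paper's method: combine two LLM-derived scores
# (grounded-ness of the incident vs. reference data, and a rating of the
# predicted root cause) into a single confidence via a weighted logistic model
# whose weights are chosen on held-out examples with correctness labels.

def confidence(ground_score, rating_score, w):
    w0, w1, w2 = w
    z = w0 + w1 * ground_score + w2 * rating_score
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(examples, w):
    # examples: list of (ground_score, rating_score, correct_label in {0, 1})
    eps = 1e-9
    total = 0.0
    for g, r, y in examples:
        p = confidence(g, r, w)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total

def fit_weights(examples):
    # Coarse grid search stands in for the paper's optimization step.
    best_w, best_nll = None, float("inf")
    grid = [x / 2.0 for x in range(-6, 7)]
    for w0 in grid:
        for w1 in grid:
            for w2 in grid:
                nll = neg_log_likelihood(examples, (w0, w1, w2))
                if nll < best_nll:
                    best_w, best_nll = (w0, w1, w2), nll
    return best_w

held_out = [(0.9, 0.8, 1), (0.2, 0.4, 0), (0.7, 0.9, 1), (0.3, 0.2, 0)]  # made-up data
w = fit_weights(held_out)
print(confidence(0.85, 0.75, w))
```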

    Measuring the differences between human-human and human-machine dialogs

    Get PDF
    In this paper, we assess the applicability of user simulation techniques to generate dialogs which are similar to real human-machine spoken interactions. To do so, we present the results of a comparison between three corpora acquired by means of different techniques. The first corpus was acquired with real users. A statistical user simulation technique was then applied to the same task to acquire the second corpus. In this technique, the next user answer is selected by means of a classification process that takes into account the previous dialog history, the lexical information in the clause, and the subtask of the dialog to which it contributes. Finally, a dialog simulation technique was developed for the acquisition of the third corpus. This technique uses a random selection of the user and system turns, defining stop conditions for automatically deciding whether the simulated dialog is successful. We use several evaluation measures proposed in previous research to compare the three acquired corpora, and then discuss their similarities and differences with regard to these measures.
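    The random-selection simulation with stop conditions can be sketched in a few lines. The following code is a hypothetical toy version (the turn inventories, the required-slot stop condition, and the turn limit are all made up for illustration), not the acquisition setup used for the third corpus.

```python
import random

# Hypothetical sketch of random-selection dialog simulation with stop
# conditions (not the authors' implementation): user and system turns are
# sampled at random, and stop conditions decide whether the simulated dialog
# counts as successful or unsuccessful.

SYSTEM_TURNS = ["ask_date", "ask_destination", "confirm", "close"]
USER_TURNS = ["give_date", "give_destination", "affirm", "negate"]

MAX_TURNS = 20                                          # stop condition: dialog too long
REQUIRED_SLOTS = {"give_date", "give_destination", "affirm"}  # stop condition: task done

def simulate_dialog(rng=random):
    history, provided = [], set()
    while len(history) < MAX_TURNS:
        history.append(("system", rng.choice(SYSTEM_TURNS)))
        user_act = rng.choice(USER_TURNS)
        history.append(("user", user_act))
        provided.add(user_act)
        if REQUIRED_SLOTS <= provided:   # all required information supplied
            return history, True
    return history, False                # turn limit reached without success

successes = sum(simulate_dialog()[1] for _ in range(1000))
print(f"{successes} of 1000 simulated dialogs were successful")
```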

    Automated Error Detection for Developing Grammar Proficiency of ESL Learners

    Get PDF
    Thanks to natural language processing technologies, computer programs are actively being used not only for holistic scoring, but also for formative evaluation of writing. CyWrite is one such program under development. The program is built upon Second Language Acquisition theories and aims to assist ESL learners in higher education by providing them with effective formative feedback to facilitate autonomous learning and improvement of their writing skills. In this study, we focus on CyWrite’s capacity to detect grammatical errors in student writing. We specifically report on (1) computational and pedagogical approaches to the development of the tool in terms of students’ grammatical accuracy, and (2) the performance of our grammatical analyzer. We evaluated the performance of CyWrite on a corpus of essays written by ESL undergraduate students with regard to four types of grammatical errors: quantifiers, subject-verb agreement, articles, and run-on sentences. We compared CyWrite’s performance at detecting these errors to that of a well-known commercially available AWE tool, Criterion. Our findings demonstrated better performance metrics for our tool than for Criterion, and a deeper analysis of false positives and false negatives shed light on how CyWrite’s performance can be improved.
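    Evaluations of this kind typically boil down to per-error-type precision, recall, and F1 computed from true positives, false positives, and false negatives. The sketch below shows that computation on invented counts; the numbers are illustrative only and are not results from the study.

```python
# Hypothetical sketch (counts are made up, not the study's results): the kind
# of per-error-type precision/recall/F1 computation used when comparing an
# error-detection tool's output against annotated ESL essays.

counts = {
    # error type: (true positives, false positives, false negatives)
    "quantifiers": (40, 10, 15),
    "subject-verb agreement": (80, 20, 25),
    "articles": (120, 35, 40),
    "run-on sentences": (25, 8, 12),
}

for error_type, (tp, fp, fn) in counts.items():
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"{error_type}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```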

    Towards Feasible Instructor Intervention in MOOC discussion forums

    Get PDF
    Massive Open Online Courses allow numerous people from around the world to access knowledge that they otherwise would not have. However, the high student-to-instructor ratio in MOOCs restricts instructors’ ability to facilitate student learning by intervening in discussion forums, as they do in face-to-face classrooms. Instructors need automated guidance on when and how to intervene in discussion forums. Using a typology of pedagogical interventions derived from prior research, we annotate a large corpus of discussion forum contents to enable supervised machine learning to automatically identify interventions that promote student learning. Such machine learning models may enable dashboards that automatically prompt instructors on when and how to intervene in discussion forums. In the longer term, it may be possible to automate these interventions, relieving instructors of this effort. Such automated approaches are essential for allowing good pedagogical practices to scale in the context of MOOC discussion forums.
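    A supervised setup like the one described above can be prototyped with a standard text-classification pipeline. The sketch below is hypothetical (the post texts, label set, and model choice are illustrative, not the annotation scheme or classifier from the study) and uses a TF-IDF representation with logistic regression.

```python
# Hypothetical sketch, not the authors' model: a minimal supervised classifier
# over annotated forum posts that suggests whether, and what kind of,
# instructor intervention might help. Texts and labels below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "I still don't understand how gradient descent converges",
    "The deadline for assignment 2 is unclear",
    "Thanks everyone, great discussion this week",
    "Is anyone else getting an error in the week 3 quiz?",
]
labels = ["clarify_content", "logistics", "no_intervention", "technical_issue"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(posts, labels)

# Predict an intervention label for a new, unseen post.
print(model.predict(["I can't tell what the grading criteria are"]))
```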

    Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention

    Full text link
    Prompt and accurate detection of system anomalies is essential to ensure the reliability of software systems. Unlike manual efforts that exploit all available run-time information, existing approaches usually leverage only a single type of monitoring data (often logs or metrics) or fail to make effective use of the joint information among different types of data. Consequently, many false predictions occur. To better understand the manifestations of system anomalies, we conduct a systematic study on a large amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates that logs and metrics can manifest system anomalies collaboratively and complementarily, and that neither alone is sufficient. Thus, integrating heterogeneous data can help recover the complete picture of a system's health status. In this context, we propose Hades, the first end-to-end semi-supervised approach to effectively identify system anomalies based on heterogeneous data. Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns. It captures discriminative features and meaningful interactions from heterogeneous data via a cross-modal attention module, trained in a semi-supervised manner. We evaluate Hades extensively on large-scale simulated data and datasets from Huawei Cloud. The experimental results demonstrate the effectiveness of our model in detecting system anomalies. We also release the code and the annotated dataset for replication and future research. Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). arXiv admin note: substantial text overlap with arXiv:2207.0291
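    To make the fusion idea concrete, the sketch below implements a generic scaled dot-product cross-attention in NumPy, with metric-window embeddings attending over log-event embeddings. It is a hypothetical illustration of cross-modal attention in general, not the Hades module; the projection matrices here are random where a trained model would learn them, and the shapes are illustrative.

```python
import numpy as np

# Hypothetical sketch of cross-modal attention for fusing two monitoring
# modalities (not the Hades architecture itself): metric-window embeddings
# attend over log-event embeddings so each metric step is enriched with the
# most relevant log context.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(metric_emb, log_emb, d_k=32, seed=0):
    """metric_emb: (T_m, d), log_emb: (T_l, d). Returns (T_m, d_k) fused features."""
    rng = np.random.default_rng(seed)
    d = metric_emb.shape[1]
    W_q = rng.normal(size=(d, d_k)) / np.sqrt(d)   # projections would be learned in practice
    W_k = rng.normal(size=(d, d_k)) / np.sqrt(d)
    W_v = rng.normal(size=(d, d_k)) / np.sqrt(d)
    Q, K, V = metric_emb @ W_q, log_emb @ W_k, log_emb @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (T_m, T_l) attention weights
    return attn @ V

metrics = np.random.default_rng(1).normal(size=(12, 64))  # 12 metric windows, dim 64
logs = np.random.default_rng(2).normal(size=(30, 64))     # 30 log events, dim 64
fused = cross_modal_attention(metrics, logs)
print(fused.shape)  # (12, 32)
```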

    Exploring the possibilities of Thomson’s fourth paradigm transformation—The case for a multimodal approach to digital oral history?

    Get PDF
    This article seeks to reorientate ‘digital oral history’ towards a new research paradigm, Multimodal Digital Oral History (MDOH), and in so doing it builds upon Alistair Thomson’s (Thomson, A., 2007, Four paradigm transformations in oral history. Oral History Review, 34(1): 49–70) characterization of a ‘dizzying digital revolution’ and paradigmatic transformation in oral history (OH). We call for a recalibration of the current dominance of the textual transcript, for active engagement with the oral, aural, and sonic affordances of both retro-digitized and born-digital OH (DOH) collections, and for a re-orientation of the digital from passive to generative and self-reflexive in the human–machine study of spoken word recordings. First, we take stock of the field of DOH as it is currently conceived and the ways in which it has or has not answered calls for a return to the orality of the interview by digital means. Secondly, we address the predominant trend of working with transcriptions in the digital analysis of spoken word recordings and the tools being used by oral historians. Thirdly, we ask about the emerging possibilities (tools and experimental methodologies) for sonic analysis of spoken word collections within and beyond OH, looking to intersections with digital humanities, sociolinguistics, and sound studies. Lastly, we consider ethical questions and practicalities concomitant with data-driven methods, analyses, and technologies like AI for the study of sonic research artefacts, reflections that dovetail with digital hermeneutics and digital tool criticism and point towards a new MDOH departure, a sub-field with the potential to inform the many fields that seek patterns in audio, audio-visual, and post-textual materials, serially and at scale.